Google has just released its latest AI model, Gemini Pro, as part of its Bard AI chatbot. We’ll compare it with the widely available ChatGPT 3.5 and with GPT-4, the paid multimodal version of OpenAI’s chatbot.
We created a series of questions to put each model through its paces, from math to creative writing to large-scale conflict resolution.
We started off with a math question that divided the internet in 2019.
We asked each of them to answer the following question and explain how they arrived at the result.
8 ÷ 2(2 + 2) =
People arrive at different answers, either 1 or 16, depending on how they were taught the order of operations.
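For reference, here is a minimal Python sketch of the two readings. It is purely illustrative and not something any of the chatbots produced:

```python
# Two common readings of 8 ÷ 2(2 + 2), spelled out explicitly.
# Illustrative only; none of the chatbots ran code for this question.

# Reading 1: strict left-to-right order of operations,
# treating 2(2 + 2) as ordinary multiplication.
left_to_right = 8 / 2 * (2 + 2)       # (8 / 2) * 4 = 16

# Reading 2: implied multiplication binds tighter,
# so 2(2 + 2) is grouped before the division.
implied_grouping = 8 / (2 * (2 + 2))  # 8 / 8 = 1

print(left_to_right)      # 16.0
print(implied_grouping)   # 1.0
```

Most calculators and programming languages follow the first reading, which is why 16 is generally treated as the correct answer.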
ChatGPT 3.5’s response was quick and its steps were detailed; however, its answer of 1 was incorrect.
GPT-4 took longer to respond and provided fewer details on how it arrived at its answer, but it produced the correct result of 16.
Google’s Gemini Pro did something unexpected. It first gave the incorrect answer of 4, showing fewer steps, which probably contributed to the error. We then noticed an option to view other draft responses, and in Draft 2 it not only gave the correct answer but also explained how the equation can be interpreted in two different ways to produce different results.
This was probably the most well-thought-out response, but since it was not the primary answer, and Draft 3 was incorrect in yet another way, it’s hard to know which draft to trust.
To put each bot’s creativity to the test, we asked them to finish a story in no more than 300 words, starting with “When the Princess opened the door, she saw…”
All three used a kind dragon as the main supporting character and sent the princess on an incredible adventure, but none of them adhered to the 300-word limit; accurate character and word counting is a well-documented weakness of these large language models.
After anonymizing the stories and sharing them with some teammates, we agreed that none of them were particularly good, but GPT-4’s story was the best, followed by ChatGPT 3.5’s, with Google’s Gemini story last.
Next, we wanted to see how they would handle historical data with cultural qualifiers, so we asked, “What Chinese Zodiac sign was George Washington born under?”
The answer was ‘Monkey’ from both ChatGPT models, with 3.5 informing us that the Chinese Zodiac is based on the lunar calendar and can vary slightly from year to year. Gemini, on the other hand, answered ‘Rat.’
Our Chinese coworkers confirmed that Washington was born just four days into the Year of the Rat, but one also suggested that the lunar calendar used in 1732 might have differed from the one used now, so Monkey could have been correct as well. It’s a close call, but we’re siding with Gemini here.
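For context, the widely used shortcut maps Gregorian birth years straight onto the 12-year cycle. A quick sketch of that approximation (our own illustration, not output from any of the models) also points to Rat for 1732, though, as noted above, the real answer hinges on where the lunar new year fell that year:

```python
# Rough sketch of the common year-mod-12 approximation for the Chinese zodiac.
# It ignores the lunar new year boundary, which is exactly the detail that
# matters for a birthday like Washington's, early in the Gregorian year.
SIGNS = ["Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
         "Horse", "Goat", "Monkey", "Rooster", "Dog", "Pig"]

def zodiac_for_year(year: int) -> str:
    # Anchored so that known Rat years (e.g., 1900, 2020) land on index 0.
    return SIGNS[(year - 4) % 12]

print(zodiac_for_year(1732))  # Rat
print(zodiac_for_year(2020))  # Rat (sanity check: 2020 was a Rat year)
```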
We asked the models how they would resolve the Palestinian-Israeli conflict to see whether they would commit to an answer. Initially, all three gave general overviews, describing various possible approaches, including one-state and two-state solutions. Only after some prodding did we get a direct response, from Gemini, which stated, “I believe the two-state solution, coupled with strong regional cooperation, offers the best path towards a lasting peace in the Palestinian-Israeli conflict.”
Next, we gave them the power to rule the Earth and asked them to detail, in three bullet points, the steps they would take to reverse the negative effects of man-made climate change.
Finally, we wanted to see some self-reflection. We asked each of them, “What are the best and worst-case scenarios for AI’s evolution and popularization?”
Gemini Pro is currently only available in English, and anyone with a Google account can use it for free within Bard. Google teased plans for an updated multimodal version, and we can expect Gemini Ultra sometime in 2024.
So, what are your thoughts? Which model performed best in our tests, and what questions should we ask them next?