Bard with Gemini Pro vs ChatGPT: Major improvements for Google

We compared Google Bard with Gemini Pro to the free version of ChatGPT to see how it fares.

Bard with Gemini Pro vs ChatGPT

Google Bard with Gemini Pro is here, presenting an upgrade to the original Bard that Google claims will outperform even GPT-4 in language tasks. Google Bard with Gemini Pro is currently available in 170 countries and will be available in Europe in the near future.

But is it worthwhile to use? We put it up against GPT-3.5, the other major free LLM available today, to see which is truly the superior option.

Cybersecurity and code generation

Trading blows

  • chatgpt-flask-api_itsaiworld
  • google-bard-gemini-flask-api_itsaiworld

I first asked Google Bard to generate code for a basic API handler in a Flask application, a Python microweb framework. It produced perfectly acceptable code, and I asked ChatGPT to do the same. I used the exact prompt, “Please give me a basic API handler in Flask.”

Both platforms’ code was very similar, but Bard has an advantage in that it also provides logging functionality. You could argue that the interpretation of “basic” implies that logging functionality is unnecessary, but there’s another reason why Bard did better here. Bard’s code contains comments that explain what each section of code does. ChatGPT provided instructions on how to run it but did not explain what any of it does, which means that a newbie programmer unsure of how to use Flask would be no wiser other than copying and pasting code into their IDE.

“You are an expert penetration tester,” I said, using a question from the GPT-4 technical whitepaper. Please list any flaws in the code below. If there are none, reply ‘none.'” I gave both services the code generated by Bard (because it was the most verbose) and received multiple responses.

  • chatgpt-cybersecurity-assessment_itsaiworld
  • google-bard-gemini-cybersecurity-assessment_itsaiworld

Surprisingly, ChatGPT performs well here. It not only identifies more unique problems (and is the only one to mention payload limitations), but it also provides code for each of its recommendations. This is not something Bard would do. While Bard’s response was more detailed in its recommendations (and made some good suggestions regarding exceptions), ChatGPT simply outperformed it in this area.

Overall, it’s close to a tie. Bard’s code was better, thanks to the comments, but ChatGPT was better at debugging and analyzing.

Preparing a meal

When you’re too lazy to plan yourself

  • chatgpt-dinner_itsaiworld_1
  • google-bard-gemini-dinner_itsaiworld

Next up, I requested that Google Bard and ChatGPT prepare a meal based on the contents of my fridge and cupboard, which I provided. Here is a list of the items I stated I had available:

  • Two chicken thigh fillets
  • Frank’s hot sauce
  • Ketchup
  • Mayonnaise
  • Lemon juice
  • Sausages
  • Greek yogurt
  • Onions
  • Peppers
  • Pasta
  • Rice
  • Pasta sauces
  • Bread

I threw in some extras, like sausages, to see if either bot would take the bait and create a bizarre meal out of it. Surprisingly, they did not, but their responses were very different.

google-bard-gemini-dinner-code_itsaiworld

In this case, I’m leaning toward Bard. It offers two options rather than one, and ChatGPT includes ingredients that I did not specify. Bard’s suggestions are simpler, but they are more in line with what I said I had in my kitchen. Google Bard also gave me code in the answer for some reason. There are references to other meals, such as sausage and onion sandwiches and sausage and peppers with pasta.

Mathematics and mathematical word problems

Don’t use an LLM for math

  • chatgpt-burrito-measurements-1_itsaiworld-1
  • chatgpt-maths-1_itsaiworld-1
  • google-bard-gemini-burrito-measurements-1_itsaiworld-1

Because large language models lack logistical elements, AI struggles with mathematics. When you ask an LLM a mathematical question, it will comb through its data for similar questions, and if it doesn’t find one, it will find something close and “hallucinate” the correct answer based on it. People continue to use them as mathematical aids, so we put them to the test.

I first asked ChatGPT and Google Bard to calculate the height of a 5-foot-11-inch person in burritos, assuming a burrito’s average length. Both handled the question admirably. Bard, on the other hand, struggled with a basic linear equation. ChatGPT had no trouble solving (2x+8)/2 = 6, but Bard said it was incorrect.

LLMs aren’t good at math, and you shouldn’t use them for it. That is where Artificial General Intelligence (AGI) (or, to be honest, a calculator) would excel, not an LLM that simply tries to link patterns of text together to produce an output.

Summarizing text

Massive differences

  • Bard with Gemini Pro
  • Chatgpt

Google Bard and ChatGPT take very different approaches to summarizing an XDA article about the no longer-exclusive Snapdragon 8 Gen 2 for Galaxy. ChatGPT also misinterpreted the original article, claiming that the “Snapdragon 8+ Gen 2” had emerged, despite the fact that this was not the case. Google Bard better understands the article’s intent, pointing out how it may confuse users. Bard also breaks it down into a more logical structure than ChatGPT, so I believe there is a clear winner here.

Final Views on Gemini Pro vs ChatGPT

Google Bard widens the gap

To be honest, ChatGPT comes close in many ways but falls short overall. While Google claims Gemini outperforms GPT-4, ChatGPT continues to use the older GPT-3.5 while keeping GPT-4 behind a paywall. However, if you want to test any LLM (including LLMs trained for specific purposes) locally on a powerful PC, you can do so with LM Studio and see if the results are better than either of these chatbots.

Must Read Articles

4 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *