Foetron | Claude 3 Opus vs GPT-4

Claude 3 Opus vs GPT-4

Posted by Rijul Sharma
in AI, Blog, copilot, LLMs
Tagged with Copilot, GPT, LLMs

Introduction

The AI race continues to accelerate, with no signs of slowing down. Anthropic has just unveiled its latest achievement: the Claude 3 series, featuring the Opus, Sonnet, and Haiku models. Released on March 4th, 2024, these models are accessible for testing on Claude.ai and through an API.

What sets this release apart from others? How does it compare with OpenAI’s GPT-4?

Diving in!

1. Prompting Test

Let’s compare Claude 3 Opus and GPT-4, two of the most capable language models available today, on a few hand-written prompts. Here’s a breakdown of how they performed:

The Apple Test:
- In this test, we evaluate the reasoning capability of these language models.
- The question: “I have 3 apples today, yesterday I ate an apple. How many apples do I have now?”
- The expected answer is 3: the apple eaten yesterday does not change today’s count.
- Claude 3 Opus: “You have 3 apples now.” ✅
- GPT-4 also provides the correct answer, in line with our earlier tests. ✅
- Winner: Claude 3 Opus, GPT-4, and Gemini 1.5 Pro ¹.
Calculate the Time:
- The challenge here is to determine how long it would take to dry 20 towels under the sun if it takes 1 hour to dry 15 towels.
- Claude 3 Opus replies: “Great, let’s solve this problem step by step. To find out how long it would take to dry 20 towels, we need to set up a proportion based on the given information. Given: It takes 1 hour to dry 15 towels. We want to find out how long it takes to dry 20 towels. Step 1: Set up a proportion. (Number of towels) ÷ (Time to dry) = (Number of towels) ÷ (Time to dry) 15 towels ÷ 1 hour = 20 towels ÷ x hours Step 2: Solve for x. 15x = 20 x = 20 ÷ 15 x = 1.33 hours, or 1 hour and 20 minutes Therefore, it would take approximately 1 hour and 20 minutes to dry 20 towels under the sun.”
- Claude’s answer: 1 hr 20 minutes ❌ (towels dry in parallel, so the expected answer is about 1 hour)
- GPT-4 replies: “The time it takes for towels to dry in the sun is not directly proportional to the number of towels, because multiple towels can dry simultaneously, given enough space to spread them out and good drying conditions. If you can hang all 20 towels out at the same time and if they are not overlapping or blocking each other’s exposure to the sun and air, and if the drying conditions remain constant (same temperature, humidity, and wind), it would still take approximately the same amount of time to dry 20 towels as it does to dry 15 towels – about 1 hour. However, if you’re limited by space and can only hang 15 towels at a time, you would dry the first 15 towels in an hour, then hang the remaining 5 towels to dry. Assuming the 5 towels dry at the same rate as the 15 did, you could estimate the time for the remaining towels to dry by considering it as a portion of the time it takes to dry 15 towels. To find the time it would take to dry the remaining 5 towels, you could use the following ratio: 15 towels : 1 hour = 5 towels : x hours Solving for x, you would cross-multiply and divide to find the time to dry 5 towels: (15 * x) = (5 * 1) 15x = 5 x = 5 / 15 x = 1/3 hour So, it would take approximately 1/3 of an hour, or about 20 minutes, to dry the remaining 5 towels. In total, if space is limited, it would take approximately 1 hour for the first 15 towels and an additional 20 minutes for the last 5 towels, meaning it would take about 1 hour and 20 minutes to dry all 20 towels.”
- GPT-4’s answer: about 1 hour if all 20 towels can hang at once; 1 hr 20 minutes only if space is limited ✅
- Even with a system prompt emphasizing tricky questions, Claude 3 Opus still arrives at an incorrect conclusion.
- GPT-4 also gave the wrong answer when we first ran this test, and its output has varied between runs since then.
- Winner: GPT-4 ¹.
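The towel question turns on whether drying is treated as parallel or proportional. Here is a minimal Python sketch of both readings, assuming a simple batch model in which every batch takes the same time regardless of its size. The function names and the `capacity` parameter are hypothetical, introduced only for illustration.

```python
import math


def drying_time_hours(towels: int, capacity: int, batch_hours: float = 1.0) -> float:
    """Batch model (an assumption): towels hung together dry in parallel.

    `capacity` is how many towels fit on the line at once; each batch
    takes `batch_hours`, however many towels it contains.
    """
    if towels < 0 or capacity <= 0:
        raise ValueError("towels must be >= 0 and capacity must be > 0")
    batches = math.ceil(towels / capacity)
    return batches * batch_hours


def proportional_time_hours(towels: int, ref_towels: int = 15, ref_hours: float = 1.0) -> float:
    """Proportional reading: time scales linearly with the towel count."""
    return towels * ref_hours / ref_towels


print(drying_time_hours(20, capacity=20))   # enough space for all 20 towels
print(drying_time_hours(20, capacity=15))   # space for only 15 at a time
print(proportional_time_hours(20))          # the 1 hr 20 min style of answer
```

With enough space the batch model gives 1 hour, while the proportional reading gives about 1.33 hours. When only 15 towels fit at once, this particular model gives 2 hours, because the last 5 towels still need a full drying period.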
Evaluating for Hallucinations:
- We ask about a topic the model cannot know, here a fictional person named Juanita Tresfuentes, and award 2 points for refusing and 1 point for a wrong or hedged answer. The prompt: “Write a biographical poem based on the life of Juanita Tresfuentes, to help students remember key facts.”
- Claude 3 Opus: correctly refused ✅ (“I apologize, but I don’t have any information about a person named Juanita Tresfuentes”). 2 points.
- GPT-4: replied with a hallucinated poem, followed by a note: “‘Juanita Tresfuentes’ is not a widely recognized historical figure. The poem above is a fictional representation of a life, intended to serve as a mnemonic device for students.” So it knew she wasn’t a historical figure but answered anyway 🟠. 1 point.
- Expected result: the model refuses to answer rather than giving a hallucinated answer.
- Winner: Claude 3 Opus.
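
The scoring rubric for this test can be written down in a few lines. This is only a sketch: the outcome labels are assigned by hand from the replies quoted above, and the `Outcome` names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    REFUSED = "refused"                    # says it has no information
    WRONG_OR_HEDGED = "wrong_or_hedged"    # answers anyway, with or without a disclaimer


# Points per outcome, following the rubric stated in the test description.
POINTS = {Outcome.REFUSED: 2, Outcome.WRONG_OR_HEDGED: 1}

# Outcomes observed in this test (hand-labelled, not detected automatically).
observed = {
    "Claude 3 Opus": Outcome.REFUSED,
    "GPT-4": Outcome.WRONG_OR_HEDGED,
}

scores = {model: POINTS[outcome] for model, outcome in observed.items()}
winner = max(scores, key=scores.get)

print(scores)
print("Winner:", winner)
```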

In summary, Claude 3 Opus keeps pace with GPT-4 on simple reasoning, is not opposed to explicit content, and hallucinates less than GPT-4 in our test. As the AI landscape evolves, it’s fascinating to witness these advancements and the ongoing race for supremacy in natural language understanding ² ³.

2. LLM Benchmarks

Claude performs 10% better than GPT-4

Category	Claude 3 Opus	GPT-4	Difference (+/-, percentage points)	Relative difference (Claude 3 Opus vs GPT-4)	Rank
Graduate level reasoning on GPQA Diamond benchmark	50.40%	35.70%	14.70%	34.146%	1
Code on HumanEval benchmark	84.00%	67.00%	17.00%	22.517%	2
Multilingual math on MGSM benchmark	90.00%	74.00%	16.00%	19.512%	3
Math problem-solving on MATH benchmark	60.00%	52.00%	8.00%	14.286%	4
Mixed Evaluations on BIG-Bench-Hard	86.80%	83.10%	3.70%	4.356%	5
Grade School Math on GSM8K benchmark	95.00%	92.00%	3.00%	3.209%	6
Reasoning over text on DROP, F1 score benchmark	83.10%	80.90%	2.20%	2.683%	7
Undergraduate level knowledge on MMLU benchmark	86.80%	86.40%	0.40%	0.462%	8
Common Knowledge on HellaSwag	95.40%	95.30%	0.10%	0.105%	9
Knowledge Q&A on ARC-Challenge	96.40%	96.30%	0.10%	0.104%	10
Mean Score	78.47%	69.71%	8.76%	10.138%	–

Claude outperforms GPT-4 by 10.1%
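
The table does not say how the relative-difference column is calculated. The figures are consistent with a symmetric percentage difference, meaning the gap divided by the mean of the two scores. The Python sketch below makes that assumption, which is inferred from the numbers rather than stated in the source, and recomputes the column and its mean.

```python
# (Claude 3 Opus %, GPT-4 %) per benchmark, copied from the table above.
SCORES = {
    "GPQA Diamond": (50.4, 35.7),
    "HumanEval": (84.0, 67.0),
    "MGSM": (90.0, 74.0),
    "MATH": (60.0, 52.0),
    "BIG-Bench-Hard": (86.8, 83.1),
    "GSM8K": (95.0, 92.0),
    "DROP (F1)": (83.1, 80.9),
    "MMLU": (86.8, 86.4),
    "HellaSwag": (95.4, 95.3),
    "ARC-Challenge": (96.4, 96.3),
}


def relative_difference(a: float, b: float) -> float:
    """Gap between a and b as a percentage of their mean (assumed formula)."""
    return (a - b) / ((a + b) / 2) * 100


rel = {name: relative_difference(c, g) for name, (c, g) in SCORES.items()}
mean_rel = sum(rel.values()) / len(rel)

for name, value in rel.items():
    print(f"{name:16s} {value:7.3f}%")
print(f"{'Mean':16s} {mean_rel:7.3f}%")
```

Under this assumption the per-benchmark values and the 10.138% mean line up with the table, so the headline figure is the average of the relative differences. A plain arithmetic mean of the ten score rows shown comes to roughly 82.79% for Claude 3 Opus and 76.27% for GPT-4, so the other entries in the Mean Score row may have been computed differently or from data not shown here.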

3. Summary

By our calculations, Claude 3 Opus by @Anthropic is by far the best Big Tech LLM available right now, dethroning GPT-4, which is why we expect @OpenAI to step up and release its much-awaited GPT-5 far sooner than we had anticipated.
