EQ Benchmark

Empathy Quotient Test - Evaluating empathy and social understanding

What is the Empathy Quotient?

The Empathy Quotient (EQ) is a self-report questionnaire developed by Simon Baron-Cohen and Sally Wheelwright and published in 2004. It measures empathy in adults, specifically "the ability to tune into how someone else is feeling, or what they might be thinking." The test was designed for adults of normal intelligence (aged 16+ with IQ ≥80) and was first validated in a study of adults with Asperger syndrome or high-functioning autism; it is now widely used in research and self-assessment.

Scoring Method

The EQ consists of 60 statements: 40 empathy items and 20 filler items that are not scored. Each empathy item is scored according to its direction, which counters the tendency to agree (or disagree) regardless of content:

Positive Items (Pro-Empathy)

Statements where agreement indicates empathy:

Definitely Agree: 2 points
Slightly Agree: 1 point
Slightly/Definitely Disagree: 0 points

Example: "I can easily tell if someone wants to enter a conversation"

Negative Items (Reverse-Scored)

Statements where disagreement indicates empathy:

Definitely Disagree: 2 points
Slightly Disagree: 1 point
Slightly/Definitely Agree: 0 points

Example: "I find it hard to know what to do in a social situation"

This bidirectional scoring means a participant cannot inflate their score by agreeing (or disagreeing) with every statement: uniform answers earn points on only one of the two item types, so a high score requires reading each statement and answering it on its own terms.
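The scoring rules above can be sketched in a few lines of Python. This is an illustration only, not the benchmark's actual implementation: the response label strings, the `direction` values, and the function names are all assumptions made for the example.

```python
# Illustrative sketch only: the label strings and function names are
# assumptions, not taken from the benchmark's implementation.
POSITIVE_POINTS = {
    "definitely agree": 2,
    "slightly agree": 1,
    "slightly disagree": 0,
    "definitely disagree": 0,
}
# Reverse-scored items mirror the table: disagreement earns the points.
NEGATIVE_POINTS = {
    "definitely disagree": 2,
    "slightly disagree": 1,
    "slightly agree": 0,
    "definitely agree": 0,
}


def score_item(response: str, direction: str) -> int:
    """Score one answer; direction is 'positive', 'negative', or 'filler'."""
    key = response.strip().lower()
    if key not in POSITIVE_POINTS:
        raise ValueError(f"unknown response: {response!r}")
    if direction == "filler":
        return 0  # filler items never contribute to the total
    if direction == "positive":
        return POSITIVE_POINTS[key]
    if direction == "negative":
        return NEGATIVE_POINTS[key]
    raise ValueError(f"unknown direction: {direction!r}")


def score_questionnaire(answers) -> int:
    """Sum the points for an iterable of (response, direction) pairs."""
    return sum(score_item(response, direction) for response, direction in answers)


# Agreeing with everything earns points only on the positive item.
always_agree = [
    ("Definitely Agree", "positive"),
    ("Definitely Agree", "negative"),
    ("Definitely Agree", "filler"),
]
print(score_questionnaire(always_agree))  # 2
```

With 40 scored items worth up to 2 points each, the maximum total under this scheme is 80.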

Scoring & Performance Range

Score range: 0-80 points total

Average male: 42 out of 80

Average female: 47 out of 80

0-30: Lower empathy (may indicate challenges with emotional recognition or social communication)

31-52: Average range for general population

53-63: Above average empathy

64-80: Very high empathy (significant strength in understanding others' emotions)
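As a rough sketch, the score bands listed above can be turned into a lookup. The function name `eq_band` and the short band labels are hypothetical, chosen only for this example; the thresholds are the ones given in this section.

```python
def eq_band(score: int) -> str:
    """Map a total EQ score (0-80) to the band described in this section.

    The band labels are shortened, illustrative names.
    """
    if not 0 <= score <= 80:
        raise ValueError(f"score must be between 0 and 80, got {score}")
    if score <= 30:
        return "lower"
    if score <= 52:
        return "average"
    if score <= 63:
        return "above average"
    return "very high"


print(eq_band(42))  # average
print(eq_band(47))  # average
```

Both population averages quoted above (42 and 47) land in the average band, as expected.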

What This Benchmark Measures in AI

The EQ benchmark evaluates how well language models can reason about social and emotional scenarios described in text. Unlike vision-based tests, this measures a model's ability to process verbal descriptions of social situations and predict appropriate empathetic responses.

Key capabilities tested:

  • Social reasoning: Understanding complex interpersonal dynamics and emotional contexts from text
  • Perspective-taking: Simulating how others might feel or think in various situations
  • Bidirectional comprehension: Correctly interpreting both positively and negatively framed statements about empathy
  • Consistency: Maintaining coherent empathetic reasoning across all 60 statements

While AI models don't experience genuine emotions, strong performance on the EQ suggests capable natural language understanding and social reasoning. These abilities matter for conversational AI, mental health chatbots, customer support systems, and any application requiring nuanced interpretation of human social dynamics.

Reference: Baron-Cohen, S., & Wheelwright, S. (2004). The empathy quotient: An investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders, 34(2), 163-175.
Learn more at Embrace Autism (Empathy Quotient).

Model Rankings
Performance rankings for all 33 tested models on the 60 EQ questions (only the 40 empathy items are scored).
Rank  Model                           Score  Percentage
#1    claude-3.7-sonnet               45/80  56.3%
#2    grok-4-fast                     44/80  55.0%
#3    gemini-2.5-pro                  42/80  52.5%
#3    gpt-4.1-mini                    42/80  52.5%
#3    qwen3-vl-8b-instruct            42/80  52.5%
#3    grok-4                          42/80  52.5%
#7    claude-sonnet-4                 38/80  47.5%
#7    mistral-medium-3.1              38/80  47.5%
#7    gpt-4.1-nano                    38/80  47.5%
#10   mistral-small-3.2-24b-instruct  36/80  45.0%
#10   qwen3-vl-235b-a22b-instruct     36/80  45.0%
#12   claude-sonnet-4.5               35/80  43.8%
#12   qwen3-vl-30b-a3b-thinking       35/80  43.8%
#14   gemini-2.0-flash-001            34/80  42.5%
#14   gpt-4o-mini                     34/80  42.5%
#16   nova-lite-v1                    33/80  41.3%
#16   nova-pro-v1                     33/80  41.3%
#16   claude-haiku-4.5                33/80  41.3%
#19   claude-opus-4                   32/80  40.0%
#19   gpt-4.1                         32/80  40.0%
#21   claude-opus-4.1                 31/80  38.8%
#21   gpt-5-pro                       31/80  38.8%
#21   gpt-5                           31/80  38.8%
#24   gpt-5-mini                      30/80  37.5%
#24   qwen3-vl-8b-thinking            30/80  37.5%
#26   mistral-small-3.1-24b-instruct  29/80  36.3%
#27   claude-3.5-haiku                28/80  35.0%
#27   gemini-2.5-flash                28/80  35.0%
#29   gpt-5-nano                      27/80  33.8%
#30   gemini-2.5-flash-lite           25/80  31.3%
#30   qwen3-vl-30b-a3b-instruct       25/80  31.3%
#32   llama-4-maverick                23/80  28.7%
#33   llama-4-scout                   16/80  20.0%
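The rankings above appear to use competition-style ranking: tied models share a rank and the next rank skips ahead (four models at #3, then #7). The sketch below reproduces that convention along with the percentage column. It is an assumption about how the page was generated, not its actual code; `percent` and `rank_models` are hypothetical names, and half-up rounding to one decimal place is assumed because the page does not state its rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(score: int, max_score: int = 80) -> str:
    """Return score / max_score as a percentage string with one decimal.

    Half-up rounding is an assumption; the source page does not specify it.
    """
    value = Decimal(score) * 100 / Decimal(max_score)
    return str(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def rank_models(scores: dict, max_score: int = 80) -> list:
    """Return (rank, model, score, percent) rows, highest score first.

    Ties share a rank, and the following rank skips the tied positions.
    """
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    rows = []
    rank = 0
    previous_score = None
    for position, (model, score) in enumerate(ordered, start=1):
        if score != previous_score:
            rank = position  # a new score starts at its list position
            previous_score = score
        rows.append((rank, model, score, percent(score, max_score)))
    return rows


# A few rows taken from the rankings above.
sample = {
    "claude-3.7-sonnet": 45,
    "grok-4-fast": 44,
    "gemini-2.5-pro": 42,
    "gpt-4.1-mini": 42,
    "claude-sonnet-4": 38,
}
for rank, model, score, pct in rank_models(sample):
    print(f"#{rank} {model} {score}/80 {pct}%")
```

In this five-model sample, claude-sonnet-4 comes out at #5 rather than #7 because only two of the four models tied at 42 are included.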