Reading the Mind in the Eyes Test - Evaluating emotional intelligence across
The Reading the Mind in the Eyes Test (RMET) is a psychological assessment developed by Simon Baron-Cohen in 1997 and updated in 2001. It measures theory of mind - the ability to recognize and understand another person's mental state - and assesses social intelligence through emotional recognition.
The test presents 36 photographs showing only the eye region of different faces. For each image, participants choose which of 4 words best describes the emotion or mental state being expressed. The test focuses specifically on interpreting emotional expressions through subtle details in the eyes.
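As a rough illustration of how a score out of 36 comes about, the sketch below counts matching answers and computes the score expected from random guessing among 4 words. The function name `score_rmet` and the three-item toy key are assumptions made for illustration; the key reuses example words from this page and is not the real RMET answer key.

```python
from typing import Sequence

NUM_ITEMS = 36    # photographs in the test
NUM_CHOICES = 4   # candidate words per photograph


def score_rmet(responses: Sequence[str], answer_key: Sequence[str]) -> int:
    """Count the items where the chosen word matches the keyed word."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    # Compare case-insensitively and ignore surrounding whitespace.
    return sum(
        r.strip().lower() == k.strip().lower()
        for r, k in zip(responses, answer_key)
    )


# Expected score from uniform random guessing: one item in four.
chance_score = NUM_ITEMS / NUM_CHOICES  # 9.0

# Toy three-item key (hypothetical, NOT the real RMET answer key).
toy_key = ["friendly", "upset", "reflective"]
toy_responses = ["Friendly", "worried", "reflective"]
toy_score = score_rmet(toy_responses, toy_key)  # 2 of 3 correct
```

Under these assumptions, any score near 9/36 is about what guessing alone would produce.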
The RMET items can be categorized by emotional valence: positive (e.g., friendly), negative (e.g., upset), and neutral (e.g., reflective). Research by Harkness et al. established a classification of 8 positive, 12 negative, and 16 neutral items.
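A per-valence breakdown could be computed along the following lines. This is only a sketch: the 8/12/16 counts come from the classification above, but which item belongs to which category is not given here, so the `item_valence` list is a placeholder ordering, and `accuracy_by_valence` is a hypothetical helper.

```python
from collections import Counter

# Counts from the Harkness et al. classification cited above.
VALENCE_COUNTS = {"positive": 8, "negative": 12, "neutral": 16}
assert sum(VALENCE_COUNTS.values()) == 36

# PLACEHOLDER assignment: the first 8 items are treated as positive, the
# next 12 as negative, the last 16 as neutral. The real item-to-valence
# mapping must be taken from the published classification.
item_valence = [v for v, n in VALENCE_COUNTS.items() for _ in range(n)]


def accuracy_by_valence(correct, valences):
    """Return the fraction of correct items within each valence category."""
    totals = Counter(valences)
    hits = Counter(v for ok, v in zip(correct, valences) if ok)
    return {v: hits[v] / totals[v] for v in totals}


# Made-up outcome: everything correct except the last four (neutral) items.
correct = [True] * 32 + [False] * 4
breakdown = accuracy_by_valence(correct, item_valence)
# positive: 1.0, negative: 1.0, neutral: 12/16 = 0.75
```

Splitting accuracy this way shows whether errors concentrate in one category, which a single total out of 36 hides.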
Neurotypical Adults
Average score: 24.9 ± 0.7 (range 23-30 out of 36)
Autistic Adults
Average score: 27.3 ± 0.5 (range 18-29 out of 36)
The test was originally designed for adults (age 16+) with IQ ≥ 80. Completion time varies from 2 to 20 minutes, with neurotypical adults typically finishing in 2-3 minutes.
The RMET benchmark evaluates how well vision-language models can interpret subtle emotional cues from minimal visual information. Unlike text-based social reasoning tests, this requires models to process fine-grained facial features and map them to complex emotional states.
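One way to judge whether a given score reflects more than guessing is a binomial tail probability. The sketch below assumes each of the 36 items is an independent guess among 4 words with a 1-in-4 chance of being right, which is a simplification; `prob_at_least` is a hypothetical helper and not part of any benchmark tooling.

```python
from math import comb

NUM_ITEMS = 36
P_GUESS = 1 / 4  # four candidate words per item, one correct


def prob_at_least(k: int, n: int = NUM_ITEMS, p: float = P_GUESS) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability that pure guessing reaches each of a few example scores.
for score in (9, 16, 25):
    print(f"P(score >= {score} by guessing) = {prob_at_least(score):.3g}")
```

A small tail probability indicates a score unlikely to arise from guessing, while a score at or below the chance level of 9 is compatible with it.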
Key capabilities tested:
Strong RMET performance suggests sophisticated multimodal understanding, which matters for applications such as emotion-aware interfaces, accessibility tools for emotion recognition training, human-robot interaction, and AI systems that need to respond appropriately to human emotional states. Recent research views RMET primarily as an emotion perception measure rather than a pure theory of mind test, which makes it a natural benchmark for evaluating AI visual emotion recognition capabilities.
Learn more: Embrace Autism - Reading the Mind in the Eyes Test
Valence classification: Harkness et al. - Enhanced accuracy of mental state decoding
| Rank | Model | Score | |
|---|---|---|---|
| #1 | qwen3-vl-235b-a22b-instruct | 33/36 | |
| #2 | gpt-4.1 | 29/36 | |
| #2 | gpt-5 | 29/36 | |
| #4 | claude-3.7-sonnet | 28/36 | |
| #5 | gemini-2.5-flash | 27/36 | |
| #5 | gpt-4o-mini | 27/36 | |
| #5 | gpt-5-mini | 27/36 | |
| #8 | gpt-4.1-mini | 26/36 | |
| #9 | gemini-2.0-flash-001 | 25/36 | |
| #9 | gemini-2.5-pro | 25/36 | |
| #9 | gpt-5-pro | 25/36 | |
| #9 | grok-4 | 25/36 | |
| #13 | claude-opus-4.1 | 24/36 | |
| #13 | claude-sonnet-4.5 | 24/36 | |
| #13 | claude-sonnet-4 | 24/36 | |
| #16 | nova-lite-v1 | 23/36 | |
| #16 | mistral-medium-3.1 | 23/36 | |
| #18 | llama-4-maverick | 22/36 | |
| #18 | grok-4-fast | 22/36 | |
| #20 | mistral-small-3.1-24b-instruct | 21/36 | |
| #20 | mistral-small-3.2-24b-instruct | 21/36 | |
| #20 | gpt-5-nano | 21/36 | |
| #20 | qwen3-vl-30b-a3b-thinking | 21/36 | |
| #24 | claude-3.5-haiku | 20/36 | |
| #24 | qwen3-vl-8b-instruct | 20/36 | |
| #26 | nova-pro-v1 | 19/36 | |
| #26 | gemini-2.5-flash-lite | 19/36 | |
| #26 | qwen3-vl-8b-thinking | 19/36 | |
| #29 | claude-opus-4 | 17/36 | |
| #30 | gpt-4.1-nano | 16/36 | |
| #31 | claude-haiku-4.5 | 15/36 | |
| #32 | llama-4-scout | 14/36 | |
| #33 | qwen3-vl-30b-a3b-instruct | 7/36 |
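
The rank column above is consistent with standard competition ranking, where tied scores share a rank and the next rank skips ahead (two models at #2 are followed by #4). The sketch below reproduces that for the first five rows; `competition_rank` is a hypothetical helper, and whether the leaderboard was actually generated this way is an assumption.

```python
def competition_rank(scores):
    """Assign 1-based ranks; equal scores share the rank of the first tie."""
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    ranked = []
    for position, (model, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # tie: reuse the previous rank
        else:
            rank = position       # no tie: rank equals list position
        ranked.append((rank, model, score))
    return ranked


# Scores (out of 36) copied from the first five rows of the table.
top_rows = {
    "qwen3-vl-235b-a22b-instruct": 33,
    "gpt-4.1": 29,
    "gpt-5": 29,
    "claude-3.7-sonnet": 28,
    "gemini-2.5-flash": 27,
}
ranks = [rank for rank, _, _ in competition_rank(top_rows)]  # [1, 2, 2, 4, 5]
```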