
9JAVQA: MODELS UNDER EXAM

Can the world's best multimodal AI models pass a Nigerian exam? We tested them. In Yoruba, Igbo, and Hausa, they failed.

9jaVQA is a document visual question answering benchmark built from WAEC (West African Examinations Council) and NECO (National Examinations Council) standardized exams. It is designed to expose how sharply the accuracy of frontier AI models drops when the exam language is not English.

Published at AfricaNLP 2025, this benchmark tests GPT-4o, Claude-3.5 Haiku, and Gemini-1.5 Pro on exam questions in Yoruba, Igbo, and Hausa — languages spoken by over 120 million people yet largely absent from AI training data.

BENCHMARK AT A GLANCE

Dataset Size: 837 manually cropped exam question images

Languages: English, Yoruba, Igbo, Hausa

Sources: WAEC and NECO standardized examinations

Year Range: 2008–2024 exam papers

Models Evaluated: GPT-4o, Claude-3.5 Haiku, Gemini-1.5 Pro

Published: AfricaNLP Workshop @ ACL 2025

THE FINDINGS: A CLEAR GAP

State-of-the-art models that ace English benchmarks collapse under African language content.

GPT-4o on English

Achieved over 90% accuracy on English-language exam questions — comfortably above human performance.

GPT-4o on African Languages

The same model dropped below 40% on Yoruba, Igbo, and Hausa, a fall of more than 50 percentage points from its English score.

Humans Outperform All

Human participants exceeded 50% accuracy across all three African languages, outperforming every model tested.

Native Prompts Help, Barely

Prompting models in the native language improved results modestly but did not close the gap against human performance.
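
The comparisons above come down to per-language accuracy and its gap to English. A minimal sketch of that arithmetic follows, assuming each model answer has already been graded into a record with the hypothetical fields `language` and `correct`. The field names and the toy records are illustrative only; they are not the benchmark's actual schema or results.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: accuracy in percent} for graded answers.

    Each record is a dict with hypothetical keys (an assumption, not
    the dataset's real schema):
      "language": e.g. "English", "Yoruba"
      "correct":  True if the model's answer matched the answer key
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["language"]] += 1
        hits[rec["language"]] += int(rec["correct"])
    return {lang: 100.0 * hits[lang] / totals[lang] for lang in totals}


def gap_to_reference(accuracies, reference="English"):
    """Percentage-point drop of each language relative to the reference."""
    ref = accuracies[reference]
    return {lang: ref - acc for lang, acc in accuracies.items() if lang != reference}


# Toy graded answers, made up purely for illustration (not benchmark data).
toy_records = [
    {"language": "English", "correct": True},
    {"language": "English", "correct": True},
    {"language": "English", "correct": True},
    {"language": "English", "correct": False},
    {"language": "Yoruba", "correct": True},
    {"language": "Yoruba", "correct": False},
    {"language": "Yoruba", "correct": False},
    {"language": "Yoruba", "correct": False},
]

accuracies = accuracy_by_language(toy_records)
gaps = gap_to_reference(accuracies)
print(accuracies)
print(gaps)
```

Reporting the gap in percentage points, rather than as a relative change, matches how the findings are phrased: a score above 90% falling below 40% is a drop of more than 50 points.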

USE THE BENCHMARK

The 9jaVQA dataset is openly available on Hugging Face. We invite researchers to build on this benchmark and help close the gap in African language AI.


Cite Our Work

@inproceedings{olufemi-etal-2025-challenging,
  title     = {Challenging Multimodal {LLM}s with African Standardized Exams:
               A Document {VQA} Evaluation},
  author    = {Olufemi, Victor Tolulope and Babatunde, Oreoluwa Boluwatife
               and Bolarinwa, Emmanuel and Moshood, Kausar Yetunde},
  booktitle = {Proceedings of the 6th Workshop on African Natural Language Processing},
  year      = {2025},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.africanlp-1.22},
}
