
Arolsen Archives

AI for Holocaust Document Digitization

Accenture · V360 Innovation Award

LayoutLM · CRF · OCR · NLP

The Problem

The Arolsen Archives hold the world's largest collection of documents on Nazi persecution — 110 million digital objects, part of UNESCO's Memory of the World. Before AI, each document was indexed independently by three volunteers and verified by an archivist. Four people, one hour, four documents. At that rate, digitizing everything would take decades.

What We Built

An AI solution using OCR, LayoutLM, and CRF models to automatically extract structured information — names, dates, birthplaces, religions — from documents that were particularly difficult for humans: multi-row prisoner lists, faded handwriting, camp-specific abbreviations.
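To make the CRF step concrete, here is a minimal sketch of how a linear-chain CRF picks one label per token once the OCR text is available. It is an illustration under stated assumptions, not the project's implementation: the label set, the hand-set scores in `emission_scores` and `TRANSITIONS`, and the placeholder tokens are all hypothetical, and the OCR and LayoutLM stages are left out entirely. A trained model would learn these scores from annotated documents.

```python
import re

# Hypothetical label set for fields on an index card or list row.
LABELS = ["NAME", "DATE", "PLACE", "O"]

# Hypothetical transition scores: how plausible it is that one label follows another.
TRANSITIONS = {
    ("NAME", "NAME"): 0.5,
    ("NAME", "DATE"): 1.0,
    ("DATE", "PLACE"): 2.0,
}


def emission_scores(token):
    """Score each label for a single token using hand-set toy features."""
    scores = {label: 0.0 for label in LABELS}
    if re.fullmatch(r"\d{1,2}\.\d{1,2}\.\d{2,4}", token):
        scores["DATE"] += 3.0  # looks like dd.mm.yyyy
    elif token[:1].isupper() and token.isalpha():
        scores["NAME"] += 1.0  # a capitalized word could be a name...
        scores["PLACE"] += 1.0  # ...or a place; context has to decide
    else:
        scores["O"] += 1.0
    return scores


def viterbi(tokens):
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    # Each column maps label -> (best score ending in that label, previous label).
    first = emission_scores(tokens[0])
    columns = [{label: (first[label], None) for label in LABELS}]
    for token in tokens[1:]:
        emit = emission_scores(token)
        previous = columns[-1]
        column = {}
        for label in LABELS:
            prev_label, prev_score = max(
                ((p, previous[p][0] + TRANSITIONS.get((p, label), 0.0)) for p in LABELS),
                key=lambda item: item[1],
            )
            column[label] = (prev_score + emit[label], prev_label)
        columns.append(column)
    # Backtrack from the best final label to recover the full path.
    label = max(LABELS, key=lambda l: columns[-1][l][0])
    path = [label]
    for column in reversed(columns[1:]):
        label = column[label][1]
        path.append(label)
    path.reverse()
    return path


# Placeholder tokens standing in for one OCR'd row.
print(viterbi(["Mustermann", "01.02.1900", "Musterstadt"]))
```

In this toy setup the first and last tokens have identical per-token scores for NAME and PLACE. The transition scores are what settle them as a name followed by a date followed by a place, which is the kind of sequence context a CRF layer adds on top of per-token predictions.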

Impact

  • 40x productivity increase (4 → 160 documents/hour)
  • 160,000+ names indexed, 18,000+ documents extracted, 60,000+ clustered
  • 99% AI confidence on 'religion' field
  • 950+ Accenture volunteers across 70+ cities, 6 continents
  • V360 Global Innovation Award · 2 NLP patents
  • Covered by Nasdaq, BusinessWire; project part of UNESCO program

My Role

Part of an 8-person team working under a tech lead. I was responsible for technical communication with the Arolsen Archives team and contributed to the document classification pipeline. The coordination work, making sure eight people across multiple workstreams delivered coherent results, was as important as the code.
