LLM-Based Metadata Extraction from NATO Scanned Documents
Historical archives contain valuable evidence, but scanned documents are difficult to search when their metadata is incomplete or inconsistent. At the C4DHI Anniversary Workshop, I presented a workflow that uses large language models to extract structured metadata from scanned NATO archival documents. The talk focused on noisy OCR, multilingual records and the need to preserve evidence for human review.
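One way to picture "preserving evidence for human review" is to store, next to every extracted value, the verbatim OCR snippet it came from, and to flag any value whose snippet cannot be found on the cited page. The following is a minimal sketch under that assumption, not the workflow presented in the talk: the names `ExtractedField`, `normalize` and `needs_review` are hypothetical, the sample page text is invented for illustration, and the LLM call itself is left out.

```python
import re
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExtractedField:
    """One metadata value plus the OCR text that supports it (hypothetical schema)."""
    name: str      # e.g. "reference" or "date"
    value: str     # value proposed by the model
    evidence: str  # verbatim OCR snippet the value was taken from
    page: int      # page number the snippet is claimed to come from


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so OCR line breaks do not block matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def needs_review(field: ExtractedField, ocr_pages: Dict[int, str]) -> bool:
    """Flag a field when its evidence is empty or absent from the cited page."""
    page_text = normalize(ocr_pages.get(field.page, ""))
    evidence = normalize(field.evidence)
    return not evidence or evidence not in page_text


# Invented sample data, standing in for OCR output and model output.
ocr_pages = {1: "DOCUMENT AB-123\nNote by the\nSecretary"}
fields = [
    ExtractedField("reference", "AB-123", "Document AB-123", page=1),
    ExtractedField("date", "1955-03-01", "1 March 1955", page=1),
]
flags = [needs_review(f, ocr_pages) for f in fields]
```

Here the reference is supported by the page text, while the date has no matching snippet and would be routed to a reviewer. Exact substring matching is a deliberately strict choice; with noisy OCR, a fuzzy comparison may be more practical.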