A spoken image description contains information about more than pronunciation: it also shows which concepts and relations a person notices and how they organise them in language. This paper combines speech recognition, formal semantic analysis and machine learning to compare Czech descriptions with an expert reference. The resulting semantic features offer an interpretable route towards scalable screening for cognitive disorders.

When people describe an image, the choice of concepts, the relations they draw between objects and the structure of the description all provide a view of language and cognitive performance that goes beyond pronunciation and acoustics.

This paper focuses on semantic analysis of verbal image descriptions. It complements the speech-recognition work in DigiDiaDem by looking at what was said and how the described scene was organised in language.

The method is part of the wider DigiDiaDem Speech-Cognitive Dataset research and complements the nonsense-word repetition study.

Publisher record (Springer) for the conference paper "Automatic Cognitive Disorder Detection through Semantic Analysis of Verbal Image Descriptions" in Text, Speech, and Dialogue.

Authors

Tomáš Lebeda, Lucie Zajícová, Jan Švec, Luboš Šmídl

Abstract

Timely detection of cognitive disorders is critical yet often limited by resource-heavy clinical assessments. This paper presents an automated framework for early cognitive screening based on the semantic analysis of spoken image descriptions in Czech. The system integrates automatic speech recognition, formal semantic parsing, and machine learning to evaluate deviations from an expert-defined reference description. The responses of the participants are analyzed for missing or incorrect semantic content, producing structured loss vectors used for classification. Evaluation on a clinically annotated dataset of 268 samples (split into train-test subsets) shows that semantic features outperform traditional lexical and morphological baselines, highlighting the potential of the method for scalable and interpretable cognitive assessment.
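
The idea of a "structured loss vector" can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the semantic formalism, so the sketch assumes that both the expert reference and a parsed response can be reduced to sets of (subject, relation, object) triples. The scene, the triples and the function name `loss_vector` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed representation: one semantic fact = (subject, relation, object).
Triple = Tuple[str, str, str]

# Hypothetical expert reference for an invented scene (not from the paper).
REFERENCE: List[Triple] = [
    ("boy", "stands_on", "stool"),
    ("boy", "takes", "cookie"),
    ("woman", "washes", "dishes"),
    ("water", "overflows", "sink"),
]


def loss_vector(response: Sequence[Triple],
                reference: Sequence[Triple] = REFERENCE) -> List[int]:
    """Return one 0/1 'missing' flag per reference fact, followed by a
    count of response facts that do not appear in the reference."""
    said = set(response)
    missing = [0 if fact in said else 1 for fact in reference]
    incorrect = len(said - set(reference))
    return missing + [incorrect]


# A response that omits two reference facts and adds one unsupported fact.
response = [
    ("boy", "takes", "cookie"),
    ("woman", "washes", "dishes"),
    ("dog", "sits_on", "floor"),
]
print(loss_vector(response))  # [1, 0, 0, 1, 1]
```

A vector of this kind could then be passed to any standard classifier. Because each position corresponds to a named piece of scene content, a prediction can be traced back to what the speaker left out or got wrong, which is one way to read the interpretability claim in the abstract.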
