ICMI 2020 is going virtual. With the help of our Virtual Conference chairs, the ICMI organisers are preparing an online, interactive program.
The workshop "Speech, Voice, Text, and Meaning" on Oral History and Technology will take place on 29 October.
Aim of the Workshop
When considering research processes that involve interview data, we observe a variety of scholarly approaches that are typically not shared across disciplines. Scholars hold on to ingrained research practices drawn from specific research paradigms and seldom venture outside their comfort zone. The inability to ‘reach across’ methods and tools arises from tight disciplinary boundaries, where terminology and literature may not overlap, or from the different priorities placed on digital skills in research. We believe that offering accessible, customized information on how to appreciate and use technology can help to bridge these gaps.
This workshop aims to break down some of these barriers by offering scholars who work with interview data the opportunity to apply, experiment with, and exchange tools and methods developed in the realm of Digital Humanities.
As a multidisciplinary group of European scholars and tool and data professionals, spanning the fields of speech technology, social sciences, human-computer interaction, oral history and linguistics, we are interested in strengthening the position of interview data in Digital Humanities. Since 2016 we have organized a series of workshops on this topic, supported by CLARIN (see elsewhere on this website).
Our first concrete output was the development of the T-Chain, a tool that supports transcription and alignment of audio and text in multiple languages. Our second was a format for experimenting with a variety of annotation, text analysis and emotion recognition tools as they apply to interview data.
This half-day workshop will provide a cross-disciplinary knowledge-exchange session in which we will:
- Show how you can convert your AV material into a suitable format and then apply automatic speech recognition via the OH Portal
- Demonstrate how to correct the ASR results and annotate the resulting text
- Demonstrate how you can perform text analysis (with Voyant) and produce informative graphics
- Demonstrate the possibility of emotion extraction with openSMILE
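As a flavour of the kind of text analysis a tool like Voyant performs on a corrected transcript, here is a minimal word-frequency sketch in Python (the transcript fragment and stopword list are invented examples, not workshop material):

```python
import re
from collections import Counter

def word_frequencies(transcript, stopwords=frozenset({"the", "a", "and", "of", "in", "i", "we"})):
    """Count content-word frequencies in an interview transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Invented transcript fragment for illustration only
transcript = ("We lived in the village, and the village was small. "
              "I remember the market in the village.")
freqs = word_frequencies(transcript)
print(freqs.most_common(3))  # 'village' dominates this fragment
```

Voyant performs this kind of counting (plus trend and collocation views) in the browser; the sketch only illustrates the underlying idea.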
Thursday 29 October, 17:00 – 20:30 CET
17:00 – 17:30
Introduction and short presentation on ‘Digital Humanities approaches to interview data - can historians, linguists and social scientists share tools?’
Univ. Luxembourg (L)
17:30 – 18:45
Preparing your audio data, uploading the audio to the portal, and automatically recognizing the speech. Correcting the ASR results
Downloading the (corrected) results and improving their readability
LMU München (D)
RU Nijmegen (NL)
18:45 – 19:00
19:00 – 19:30
Analyzing Oral Archives by looking at BREATHING
Records of Breath was an art installation by Evrim Kavcar that concentrated on the artist's inhaling and exhaling while she talked about her personal trauma. Listening to the sharp intake of these breaths, with all their tension and the indescribable subtleties that powerfully signal the artist's distress, is a transforming experience. The artist achieved this by working on individual sound files in meticulous detail.

The BREATH project, inspired by Records of Breath, aims to apply breathing analysis on a much grander scale to trauma-related oral history archives: narratives about various types of oppression, violence and natural disasters, collected from different cultures, geographies and timespans. It seeks to gain new insights by analyzing the breathing patterns of thousands of speakers. Our starting point was to develop algorithms that automatically detect breathing patterns in conversations, and to apply the analysis of these patterns to investigate the relationship between short- and long-term affective states (emotion and mood), speech features, and heart rate.

Despite the strong effect of non-verbal information in speech, breathing is under-studied in the affect recognition literature, which focuses mostly on the spoken portions of the audio and on facial expressions from the video. We hypothesize that the non-speech parts, particularly silences and breathing, are of high importance in analyzing and understanding emotions, and will yield valuable features that enhance the literature on emotion as well as on trauma and Post-Traumatic Stress Disorder (PTSD) detection. Oral archives act as vital testimonials for oral history, politics and human rights. As such, they are usually either transcribed or meticulously indexed. Computational methods used in studying these valuable sources help with automatic transcription of the records into text, as well as with subsequent text analysis.
Looking at nonverbal signals will expand the computational toolkit for analyzing the archives, while at the same time generating an invaluable source to study the act of narrating, remembering and all the emotions such actions evoke. In this talk, I’ll focus above all on the interdisciplinary approach of our project, our first steps in analyzing breathing patterns, as well as our future plans.
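The abstract does not describe the project's actual detection algorithms. As a toy illustration of how non-speech segments such as pauses can be located in an audio signal, here is a minimal energy-based detector in Python (the thresholds, frame size and synthetic signal are all invented for this sketch):

```python
import numpy as np

def detect_pauses(signal, sr, frame_ms=25, threshold=0.02, min_pause_ms=200):
    """Flag low-energy frames and merge them into pause segments, in seconds."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    # Root-mean-square energy per frame
    rms = np.array([np.sqrt(np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2))
                    for i in range(n_frames)])
    quiet = rms < threshold
    pauses, start = [], None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i
        elif not q and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None and (n_frames - start) * frame_ms >= min_pause_ms:
        pauses.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return pauses

# Synthetic signal: 1 s of noise ("speech"), 0.5 s of silence, 1 s of noise
sr = 16000
rng = np.random.default_rng(0)
speech = rng.normal(0, 0.1, sr)
signal = np.concatenate([speech, np.zeros(sr // 2), speech])
print(detect_pauses(signal, sr))  # one pause from 1.0 s to 1.5 s
```

Real breathing detection is considerably harder than this, since breaths are quiet but not silent and must be distinguished from other low-energy noise; the sketch only shows the general frame-energy approach.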
Almila Akdag Salah
Utrecht University (NL)
19:30 – 20:00
Enriching audio databases with information hidden in the acoustic signal
Audio information retrieval is an emerging field, as users' needs for accurate search results are growing and as Media Asset Management (MAM) systems tackle a massive amount of audio and video excerpts that need to be indexed. One of the challenges is to index videos with mixed temporal information, such as exists in documentaries. Indeed, a massive amount of cultural heritage exists in documentaries that include both present-time anchorpersons and archival pieces. In this talk I will present a study in which we used a k-means clustering method to automatically identify time-related information from the acoustic signal of the video. In this way, the study offers a possible enrichment of video databases beyond traditional manual metadata.
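The talk does not specify which acoustic features or implementation the study used. As a rough illustration of the clustering step, here is a from-scratch k-means (k = 2) applied to invented two-dimensional per-segment features, standing in for the intuition that archival audio tends to be band-limited and noisy while studio audio is clean and full-band:

```python
import numpy as np

def kmeans(X, k=2, iters=50, seed=0):
    """Plain k-means clustering: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Invented features per video segment, e.g. (spectral centroid in kHz, noise level)
studio = np.random.default_rng(1).normal([3.5, 0.1], 0.2, (20, 2))
archival = np.random.default_rng(2).normal([1.5, 0.8], 0.2, (20, 2))
X = np.vstack([studio, archival])
labels, _ = kmeans(X, k=2)
# With well-separated features, segments from the same source share a cluster
print(set(labels[:20]), set(labels[20:]))
```

In practice such labels would be attached as time-coded metadata, letting a MAM system filter search results by present-time versus archival material.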
Open University of Israel (IL)
20:00 – 20:30
General discussion and close of meeting
Will follow soon!