

The dubbing artist: 'That's how Artificial Intelligence stole my voice'


(This article is an automatic translation of the original story, written in Italian.) Remie Michelle Clarke is an Irish voice actor. She usually charges up to $2,000 for 30 seconds of her voice. But...

Whisper, a new ASR engine


Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The developer of Whisper, OpenAI, shows that...

Exploring the possibilities of Thomson’s fourth paradigm transformation—The case for a multimodal approach to digital oral history?


Authors: Hannah K Smyth, Julianne Nyhan, Andrew Flinn Published: 02 January 2023 Digital Scholarship in the Humanities, fqac094,...

The State of Automatic Speech Recognition


Q&A with Kaldi's Dan Povey. This article continues the series on Automatic Speech Recognition, written at Medium.com. Few experts in the field of automatic speech recognition have the...

AI and Oral History: Applications in Holocaust Testimonies

An interesting online meeting about AI and Oral History

Date: 25 November 2024
Time: 14:25–15:45
How: Online

We are pleased to welcome Maria Dermentzi (digital humanities consultant) and Hugo Scheithauer (Inria Paris ALMAnaCH) for this 'Voices Unbound?' seminar.

All are welcome to join this online seminar. Please register to receive the joining details.

Background

This lecture will focus on the application of Automatic Speech Recognition (ASR) technology, specifically OpenAI's Whisper model, to transcribe oral testimonies from the Holocaust. We will discuss what makes ASR for oral Holocaust testimonies challenging and present examples of successes and failures. The lecture will also cover post-processing techniques for automatically generated transcripts, including Named Entity Recognition (NER). Participants will gain insights into how AI tools can support oral history research and why domain expertise is key to correcting and interpreting the results of these tools.
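As a toy illustration of NER-style post-processing on ASR output, here is a minimal gazetteer lookup over a transcript segment. The entity list and labels are our own illustrative assumptions, not the speakers' actual pipeline:

```python
# Minimal gazetteer-based NER pass over an ASR transcript segment.
# The entity lists below are illustrative, not a real testimony lexicon.
GAZETTEER = {
    "Theresienstadt": "LOCATION",
    "Ravensbrück": "LOCATION",
    "Auschwitz": "LOCATION",
    "Viktor Ullmann": "PERSON",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, label) pairs for gazetteer hits, longest names first."""
    hits = []
    for name, label in sorted(GAZETTEER.items(), key=lambda kv: -len(kv[0])):
        if name in text:
            hits.append((name, label))
    return hits

segment = "Viktor Ullmann composed in Theresienstadt before being deported to Auschwitz."
print(tag_entities(segment))
```

A real pipeline would use a trained NER model rather than string matching; the gazetteer merely shows why domain lexicons matter when correcting ASR output for testimony transcripts.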

About the Speakers

Maria Dermentzi

Digital humanities consultant

Previously, she worked at King's College London, where she developed AI-powered tools for history research in the context of the European Holocaust Research Infrastructure. She holds an MSc in Digital Humanities from KU Leuven, an MA in Digital Culture and Society from King's College London, and a degree in Law from the Aristotle University of Thessaloniki.

Hugo Scheithauer

PhD Researcher at Inria Paris ALMAnaCH

Hugo Scheithauer works on document layout analysis, automatic text recognition, and information extraction for historical documents. He holds a dual MA in Art History from Columbia University and Sorbonne University, and an MA in Digital Humanities from the École nationale des chartes (National School of Charters), Paris.

 

--------------------------------------------

Voices unbound? Exploring new and/or possible directions in digital and experimental oral history

A lecture series co-organised by TU Darmstadt, UCL Centre for Digital Humanities, Luxembourg Centre for Contemporary and Digital History (C²DH) and the Max-Planck-Institut für Wissenschaftsgeschichte.
The series is convened by Julianne Nyhan (TU Darmstadt); Andrew Flinn, Andreas Vlachidis, and Marco Humbel (UCL); Shih-Pei Chen (Max Planck Institute); and Gerben Zaagsma (C²DH). It offers an important way of keeping up to date with the methodological and theoretical state of the art in digital oral history. We invited speakers to present work on recent technological developments that may hold promise for digital oral history. In this way, the seminar series appeals to (digital) oral historians, digital humanists, and scholars of the history of information, memory and knowledge systems.

 

--------------------------------------------


What automatic speech recognition can and cannot do for conversational speech transcription

Sam O'Connor Russell, Iona Gessinger, Anna Krason, Gabriella Vigliocco, Naomi Harte,
In: Research Methods in Applied Linguistics, Volume 3, Issue 3, 2024, 100163, ISSN 2772-7661.
DOI: https://doi.org/10.1016/j.rmal.2024.100163.
WWW: https://www.sciencedirect.com/science/article/pii/S2772766124000697

Abstract

Transcripts are vital in any research involving conversation. Most transcription is conducted manually, by experts; a process which can take many times longer than the conversation itself. Recently, there has been interest in using automatic speech recognition (ASR) to automate transcription, driven by the wide availability of ASR platforms such as OpenAI’s Whisper. However as studies typically focus on metrics such as the word error rate, there is a lack of detail about ASR transcript quality and the practicalities of ASR use in research. In this paper we review six state-of-the-art ASR technologies, three commercial and three open-source. We assess their capabilities as automatic transcription tools. We find that the commercial ASR systems mostly capture an accurate representation of what was said, and overlapping speech is handled well. Unlike prior work, we show that commercial ASR also preserves the location, but not necessarily the spelling of a large majority of non-lexical tokens: short words such as uh-hum which play vital roles in conversation. We show that the open-source ASR systems produce substantially more errors than their commercial counterparts. However, we highlight how the cost and privacy advantages of open-source ASR may outweigh performance issues in certain applications. We discuss practical considerations for ASR deployment in research, concluding that present ASR technology cannot yet replace the trained transcriber. However, a high-quality initial transcript generated by ASR can provide a good starting point and may be further refined by manual correction. We make all ASR-generated transcripts available for future research in the supplementary material.

 

For the full article, see the link.
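One of the paper's findings, that commercial ASR preserves the location but not necessarily the spelling of non-lexical tokens, can be checked mechanically. A minimal sketch (the filler spellings are our illustrative assumptions, not the authors' token inventory):

```python
# Sketch: check whether an ASR hypothesis keeps fillers in place even when it
# spells them differently, by normalising common filler variants before comparing.
# The variant list below is an illustrative assumption.
FILLER_VARIANTS = {"uh-hum", "uh-huh", "mhm", "mm-hmm", "uhhuh"}

def normalise(token: str) -> str:
    """Map any known filler spelling to a single placeholder token."""
    return "<filler>" if token.lower() in FILLER_VARIANTS else token.lower()

def filler_positions(transcript: str) -> list[int]:
    """Indices of (normalised) filler tokens in a whitespace-tokenised transcript."""
    return [i for i, tok in enumerate(transcript.split())
            if normalise(tok) == "<filler>"]

reference = "mm-hmm I see uh-huh go on"
hypothesis = "uh-hum I see uhhuh go on"
# Same positions, different spellings: location preserved, spelling not.
print(filler_positions(reference), filler_positions(hypothesis))
```

If the two position lists match, the ASR output has kept the fillers where the conversation put them, which is what matters for analysing turn-taking.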

A new ASR tool: aTrain

At the end of March, we got the first version of our paper back from the LREC-COLING workshop about Holocaust Testimonies as Language Resources (Workshops – LREC-COLING 2024). We needed to fix some (minor) issues and started to do so. The paper is about the arrival of Whisper (autumn 2022) and related packages, which allow you to do better, richer and faster speech recognition. Whisper was already a kind of miracle a year ago, but certainly since the "related packages" have come of age, it has only been getting much better.

One of the reviewers noted that our listing of standalone ASR software missed a new programme from Graz, Austria: aTrain. We immediately searched the internet to find out what aTrain was. It is a Whisper-based software package for ASR, available as a free download from the Microsoft Store.

We downloaded, installed and ran it, and yes: again a joy to use.
For Apple, there has been MacWhisper for a while: a standalone package that lets you run Whisper on a modern macOS computer. For Windows, you could use SubtitleEdit, but that software is much more than a "simple" speech recogniser (although it works fine).

And now there is aTrain: a software package similar to MacWhisper that runs on Windows and Linux.

The difference from MacWhisper is that aTrain uses more of Whisper's modern variants and add-ons. MacWhisper is a C++ (whisper.cpp) implementation of the classic Whisper model as OpenAI provided it 1.5 years ago, whereas aTrain is newer, supports e.g. diarization, and is much faster than the original Whisper.

After downloading aTrain, the software asks if you want to install it. Choose yes and wait some 10 minutes; then you'll have a modern speech recogniser available.


Description

aTrain is a self-installing and encapsulated tool for automatically transcribing speech recordings using state-of-the-art machine learning models, without uploading any data. It was developed by researchers at the Business Analytics and Data Science Center at the University of Graz and tested by researchers from the Know-Center Graz.

More can be read in:
Haberl, A., Fleiß, J., Kowald, D., & Thalmann, S. (2024). Take the aTrain. Introducing an interface for the Accessible Transcription of Interviews. Journal of Behavioral and Experimental Finance, 41, 100891.
The advantage of the software is that it runs entirely on your own PC or laptop and no Internet connection is required. Especially for those who have confidential data, this is a big advantage, because it allows you to guarantee the confidentiality of the data in the best possible way.


What is offered

aTrain offers the following benefits:

Fast and accurate
aTrain provides user-friendly access to the faster-whisper implementation of OpenAI's Whisper model, ensuring best-in-class transcription quality paired with higher speeds on your local computer. Transcription with the highest-quality model takes only around three times the audio length on current mobile CPUs typically found in mid-range business notebooks (e.g., Core i5 12th Gen, Ryzen 6000 series).

Speaker detection
aTrain has a speaker detection mode and can analyze each text segment to determine which speaker it belongs to.

Privacy Preservation and GDPR compliance
aTrain processes the provided speech recordings completely offline on your own device and does not send recordings or transcriptions to the internet. This helps researchers to meet data privacy requirements arising from ethical guidelines or to comply with legal requirements such as the GDPR.

Multi-language support
aTrain can process speech recordings in some 57 languages.

MAXQDA and ATLAS.ti compatible output
aTrain provides transcription files that are seamlessly importable into the most popular tools for qualitative analysis, ATLAS.ti and MAXQDA. This allows you to directly play audio for the corresponding text segment by clicking on its timestamp.

NVIDIA GPU support
aTrain can either run on the CPU or an NVIDIA GPU (CUDA toolkit installation required). A CUDA-enabled NVIDIA GPU significantly improves the speed of transcriptions and speaker detection, reducing transcription time to 20% of audio length on current entry-level gaming notebooks.

Running aTrain

To run aTrain, you choose an AV file (video or audio), select the OpenAI model to use (tiny to large), choose the language spoken or leave it blank so the software will detect it, and optionally indicate whether you want speaker recognition. If so, you need to enter the number of different speakers in the recording.
Finally, you click Start and wait a while. On my PC (i9, NVIDIA card), this takes a little less than 20% of the duration of the recording. The results are saved in a dedicated directory.

That output contains the following files:

metadata.txt – recognition metadata (language, model, audio duration, etc.)
transcription.json – the complete result of the recognition
transcription.srt – standard subtitles
transcription.txt – the recognised text with the selected speakers
transcription_timespans.txt – the same, but with the start time of each segment
transcription_maxqda.txt – a version that can be imported into MAXQDA
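The transcription.srt file follows the standard SubRip format. As a sketch of that output stage, here is how a hypothetical list of recognised segments can be rendered as SRT cues (the segment structure is our assumption, not aTrain's actual JSON schema):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render [{'start': s, 'end': s, 'text': ...}, ...] as SubRip subtitles."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text']}\n")
    return "\n".join(cues)

segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this interview."},
    {"start": 2.5, "end": 5.0, "text": "Please state your name."},
]
print(to_srt(segments))
```

The same segment list, with speaker labels attached, is essentially what the transcription.txt and MAXQDA variants contain in different layouts.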

Conclusion

aTrain works very well and can be used by anyone on their own modern Windows machine. Using a GPU, however, speeds up the transcription considerably. The addition of diarization (speaker detection) makes it a better choice than MacWhisper (for now).

Download

aTrain can be downloaded (>10 GB) from the Microsoft Store: https://apps.microsoft.com/detail/9n15q44szns2

Update Whisper Large Model

OpenAI has announced the latest iteration of Whisper, called large-v3. Whisper large-v3 has the same architecture as the previous large models, except for some minor differences.

The large-v3 model is trained on 1 million hours of weakly labeled audio and 4 million hours of pseudolabeled audio collected using large-v2. The model was trained for 2.0 epochs over this mixture dataset.

The large-v3 model shows improved performance over a wide variety of languages, and the plot below includes all languages where Whisper large-v3 performs lower than 60% error rate on Common Voice 15 and Fleurs, showing 10% to 20% reduction of errors compared to large-v2:

[Figure: per-language error-rate breakdown of Whisper large-v3 vs large-v2 on Common Voice 15 and Fleurs]

Languages evaluated using character error rates (CERs) instead of word error rates (WERs) are shown in italics.
We used character error rates for Korean as well, in addition to the five languages for which we used CERs in the paper (Chinese, Japanese, Thai, Lao, and Myanmar). While Korean does use spaces to separate words, there are many cases where it is acceptable to omit spaces between words, and we noticed that the labels in both Common Voice 15 and Fleurs have many instances of inconsistent or incorrect spacing.
In the Fleurs dataset, we used the transcription column which contains labels that are pre-processed and normalized from the raw_transcription column, with these two exceptions:

  1. Korean labels have many phrases inside parentheses that usually repeat the preceding word in the Latin script; we removed these using regular expressions.
  2. Serbian labels contain transcriptions in both Latin and Cyrillic scripts, and Whisper's predictions for Serbian also often fluctuate between the two scripts. For evaluation, we converted both the labels and the model predictions into Cyrillic and computed the word error rates.
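The spacing issue described above is exactly where WER and CER diverge: identical characters with one extra space count as two word errors but only one character error. A small pure-Python illustration (the example strings and code are ours, not OpenAI's evaluation script):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    d = list(range(len(hyp) + 1))  # single-row DP table
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

# Identical characters, one extra space: WER penalises it heavily, CER barely.
reference, hypothesis = "오늘 날씨가 좋다", "오늘 날씨 가 좋다"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

Here the single inserted space yields a WER of 2/3 (one substitution plus one insertion at word level) but a CER of only 1/9, which is why CER is the fairer metric for languages with loose or optional word spacing.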

This article comes from: https://github.com/openai/whisper/discussions/1762

 

How Might We Create Better Benchmarks for Speech Recognition?

The applications of automatic speech recognition (ASR) systems are proliferating, in part due to recent significant quality improvements. However, as recent work indicates, even state-of-the-art speech recognition systems, some of which deliver impressive benchmark results, struggle to generalize across use cases.
In this paper, we review relevant work, and, hoping to inform future benchmark development, outline a taxonomy of speech recognition use cases, proposed for the next generation of ASR benchmarks. We also survey work on metrics, in addition to the de facto standard Word Error Rate (WER) metric, and we introduce a versatile framework designed to describe interactions between linguistic variation and ASR performance metrics.

In the paper below, four engineers from Google describe how they believe a better benchmark system for ASR can be made.

Alëna Aksënova, Daan van Esch, James Flynn & Pavel Golik (2021). How Might We Create Better Benchmarks for Speech Recognition?
In Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future, pages 22–34, August 5–6, 2021. Association for Computational Linguistics.
https://aclanthology.org/2021.bppf-1.4.pdf

or here:

How Might We Create Better Benchmarks for Speech Recognition? (PDF)

How researchers digitally preserve Holocaust evidence

The e-learning project "Music in Theresienstadt Concentration Camp" aims to provide pupils with basic knowledge about the camp.

 


Remembering through e-learning

LMU researchers are preserving the memory of the horrors of National Socialism with virtual testimonies. In the e-learning project "Music in Theresienstadt Concentration Camp", interviews with survivors are prepared for schoolchildren, and in "Voices from Ravensbrück" for researchers.

With a few clicks you can enter the virtual concentration camp Theresienstadt. There you meet the survivor Dr. Michaela Vidláková, see excerpts of a Nazi propaganda film, but also hear the wonderful music that was composed in Theresienstadt at the time. The e-learning project "Music in Theresienstadt Concentration Camp" by the Jewish Chamber Orchestra Munich aims to provide pupils with basic knowledge about the camp - and about the art and culture that was created there.

The idea came from Daniel Grossmann, founder and conductor of the Jewish Chamber Orchestra Munich. "The fact that there was an extensive cultural life in the Theresienstadt concentration camp and that numerous works were even composed and premiered there is little known," he explains. "As a conductor, it means a lot to me to do justice to the memory of these artists."

The platform was developed in cooperation with the LMU and the Technical University of Munich (TUM). "We have been working on digital tools for Holocaust education for a long time," explains German scholar Ernst Hüttl, who supervised the project as a research assistant at the Chair of German Didactics at LMU. "In the LediZ project network, for example, different interactive contemporary testimonies were developed." For example, 3D projections of Holocaust survivors can answer questions with the help of speech processing. And "Abba's Hub" presents the life story of Lithuanian Holocaust survivor Abba Naor as a walk-through timeline with a 3D environment.

"The way of love and death" in 360 degrees

The new platform "Music in Theresienstadt Concentration Camp", which you can enter with VR glasses, but also on your smartphone or computer, is a "very compact learning package", says Hüttl. It also deals with the ambivalent side of music in Theresienstadt - as a propaganda tool for the perpetrators, but also as a survival tool for the prisoners. The focus was on the life and work of Viktor Ullmann, a composer who composed in the Theresienstadt concentration camp before being murdered in Auschwitz.

Three virtual rooms, developed by students of architectural informatics at TUM, represent the life story of Ullmann. Hüttl himself prepared them technically and filled them with media: historical photos, source texts, audio examples and interviews.

With the help of these media, the pupils answer questions and, if they answer correctly, are given access to the next room. At the end of the quiz, they are taken to a 360-degree video of a performance by the Jewish Chamber Orchestra Munich in Theresienstadt. Conductor Grossmann had visited the concentration camp with his musicians for this and positioned them spatially separately - in the washroom, bedrooms or the theatre hall. This is how Viktor Ullmann's last composition "Die Weise von Liebe und Tod des Cornets Christoph Rilke" was performed - a visually and acoustically impressive recording.

Transcription with Artificial Intelligence

From a different perspective, Dr Christoph Draxler from the Institute of Phonetics and Speech Processing at LMU is dealing with Holocaust testimonies. In the project "Voices from Ravensbrück: The Value of Multilingual Oral History", he and colleagues from Italy and the Netherlands prepared eyewitness interviews for research. "As an all-women's camp with prisoners from over 30 countries, Ravensbrück concentration camp near Berlin was a very special, multilingual camp," explains the computer scientist and Romance linguist, who has been working for a long time on web tools for phonetic research, query methods in language databases and dialectology.

The starting point of the project was a set of interviews with concentration camp survivors recorded in Italy in the 1970s. The research team digitised the tape recordings, collected further interviews from various countries around the world and prepared them for research.

Draxler's part was the transcription. "The word error rate of today's transcription programmes has become much better thanks to artificial intelligence," says Draxler. "What AI cannot yet do, but scientific transcription must, is recognise hesitations, repetitions or sentence breaks, because these phenomena are crucial for understanding the conversational situation." When transcribing the eyewitness interviews, the programme also has to ensure that the very personal stories do not violate the personal rights of the interviewee or third parties.

More than abstract numbers

The interviews and transcripts were brought together on CLARIN, the European platform for language research data. In a new "oral history" area based on the work of Draxler and his project partners, researchers can now record and find corresponding contemporary testimonies and research language data on Ravensbrück available in archives worldwide. To enable researchers without IT skills to use transcription software for eyewitness interviews, the Ravensbrück project tested a self-developed, "radically easy-to-use" programme, with the same error rate as if you were typing it yourself.

"You can't convey the tragedy of the Holocaust just by teaching abstract numbers," explains conductor Daniel Grossmann. "Coming from a Jewish family that was largely annihilated, the memory of individual victims of the Holocaust is very important to me."

Remembering through e-learning

LMU researchers are preserving the memory of the horrors of National Socialism with virtual testimonies. In the e-learning project "Music in the Theresienstadt Concentration Camp", interviews with survivors are prepared for students; in "Voices from Ravensbrück", for researchers.

With a few clicks, you can enter the virtual Theresienstadt concentration camp. There you meet survivor Dr. Michaela Vidláková, watch clips from a Nazi propaganda film, but also hear the wonderful music composed at the time in Theresienstadt. The e-learning project "Music in the Theresienstadt Concentration Camp" of the Jewish Chamber Orchestra Munich aims to give students basic knowledge about the camp, and about the art and culture created there.

The idea comes from Daniel Grossmann, founder and conductor of the Jewish Chamber Orchestra Munich. "The fact that there was a rich cultural life in the Theresienstadt concentration camp, and that numerous works were composed and premiered there, is little known," explains Grossmann. "As a conductor, it means a great deal to me to do justice to the memory of these artists."

The platform was developed in collaboration with LMU and the Technical University of Munich (TUM). "We have been working on digital tools for Holocaust education for a long time," explains German-studies scholar Ernst Hüttl, who supervised the project as a research assistant at LMU's chair of German didactics. "In the LediZ project network, for example, several interactive eyewitness testimonies were developed." For instance, 3D projections of Holocaust survivors can answer questions with the help of speech processing. And "Abba's Hub" presents the life story of Abba Naor, a Holocaust survivor from Lithuania, as a walkable timeline in a 3D environment.

"The Lay of Love and Death" in 360 degrees

The new platform "Music in the Theresienstadt Concentration Camp", which can be accessed with VR glasses but also via smartphone or computer, is a "very compact teaching package", says Hüttl. It also addresses the ambivalent role of music in Theresienstadt: as a propaganda tool for the perpetrators, but also as a means of survival for the prisoners. The focus is on the life and work of Viktor Ullmann, a composer who wrote music in the Theresienstadt concentration camp before being murdered at Auschwitz.

Three virtual rooms, developed by students of architectural informatics at TUM, depict the story of Ullmann's life. Hüttl himself prepared them technically and filled them with media: historical photos, original texts, audio examples and interviews.

With the help of these materials, pupils answer questions and, when they answer correctly, gain access to the next room. At the end of the quiz, they are taken into a 360-degree video of a performance by the Jewish Chamber Orchestra Munich in Theresienstadt. Conductor Grossmann visited the camp with his musicians and positioned them in separate spaces: in the washrooms, the dormitories and the theatre hall. There they performed Viktor Ullmann's last composition, "Die Weise von Liebe und Tod des Cornets Christoph Rilke": a visually and acoustically powerful recording.

Transcription with artificial intelligence

Dr. Christoph Draxler of LMU's Institute of Phonetics and Speech Processing approaches Holocaust testimonies from a different perspective. In the project "Voices from Ravensbrück: The Value of Multilingual Oral History", he and Italian and Dutch colleagues have prepared eyewitness interviews for research. "The Ravensbrück concentration camp near Berlin was a very particular multilingual camp, as it was a women-only camp with prisoners from more than 30 countries," explains the computer scientist and Romance linguist, who has long worked on web tools for phonetic research, query methods for speech databases, and dialectology.

The starting point of the project was a set of interviews with concentration camp survivors recorded in Italy in the 1970s. The research team digitised the tape recordings, collected further interviews from various countries around the world, and prepared them for research.

Draxler took charge of the transcription. "The word error rate of current transcription programmes has improved greatly thanks to artificial intelligence," says Draxler. "What AI cannot yet do, but scientific transcription must, is recognise hesitations, repetitions and sentence breaks. These phenomena are crucial for understanding the conversational situation." When transcribing eyewitness interviews, care must also be taken that very personal stories do not violate the personal rights of the interviewee or of third parties.
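The word error rate Draxler refers to is the standard ASR evaluation metric: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the recogniser's output, divided by the number of reference words. The sketch below is purely illustrative and is not the project's software; note how a hesitation ("uh") dropped by the recogniser counts towards the error rate, which is exactly the phenomenon scientific transcription cares about:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (classic Levenshtein table)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A recogniser that silently drops "uh i" is 25% wrong by this measure:
print(word_error_rate("i uh i was taken to the camp",
                      "i was taken to the camp"))  # → 0.25
```

In practice, research workflows use established implementations of this metric rather than hand-rolled code, but the definition is the same.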


Dr. Christoph Draxler

Ernst Hüttl

The original article (in German) can be found here.
The translations were automatically generated with DeepL.


The dubbing artist: 'That's how Artificial Intelligence stole my voice'

(This article is an automatic translation of the original story, written in Italian.)

Remie Michelle Clarke is an Irish voice actor who usually charges up to $2,000 for 30 seconds of her voice. But today, for only $27, you can use her "cloned" voice without her receiving anything, thanks to artificial intelligence. "It all started with a strange phone call from a friend who asked me: 'How did you manage to offer yourself as a virtual voice actress on revoicer.com? Do they pay well?' I didn't understand what he was talking about; I had never heard of revoicer.com. It is a start-up that offers hundreds of synthesised voices for a fee to read any kind of advertisement or audiobook. Looking through the voices on offer, I came across mine," she says in an interview with the Italian newspaper La Repubblica (in Italian).

Olivia's discovery

In the interview with Giuliano Aluffi, Clarke recounts that her voice was presented on the site as "Olivia". "Clicking on the audio sample, I had an alienating experience: hearing my voice saying sentences that I had never actually spoken." The voice said to her: "Hello, my name is Olivia. I have a soft, caressing voice. I can record audiobooks and educational videos and do all kinds of dubbing." According to the voice actress, "all it takes is an audio file of a few minutes, because the AI programmes can quickly find the most characteristic waveforms of someone's speech and use that information to speak 'in the manner of' any person, faithfully, even if they end up being lifeless voices".

The contract with Microsoft

Regarding the legality of the procedure, Clarke reveals that three years ago she had dubbed Bing's voice for Microsoft: "The contract mentioned that third parties could access my voice samples. But the technology at the time did not make it possible to imagine that one day my voice could be cloned. The problem is that technology has leapt ahead while the law has lagged behind." For now, there is no legal protection: "At the moment, the voice is still not recognised as an asset that can be defended by copyright. That recognition would matter because, apart from a few really famous dubbing artists, the world of voice-over is made up of people who are not rolling in money."

OH and Speech Technology News

Here we will place news from the world of Oral History and Speech Technology, insofar as it is relevant to this website. Items may be written by one of us or copied from an existing paper or blog.