Transcription Portal - Speech & Technology

To help scholars with their audio transcriptions, CLARIN ERIC supported an initiative to build a TranscriptionPortal where scholars can upload audio files, select the spoken language and, optionally, a language model, and process the files. Once processing is done, the results can be downloaded. Click on the image or button below to go to the TranscriptionPortal. Note that an academic login is required to use this tool.

Screenshot of the TranscriptionPortal as realised for the 2018 workshop in München

Go to the Transcription-Portal

Dataflow

In September 2018 the first version of the CLARIN TranscriptionPortal was realised. The portal, currently hosted at LMU in München, collects the audio files and sends them to the ASR engines in the different countries. For example, a Dutch audio file from a scholar in Stockholm is sent to München and from there, once Dutch has been chosen as the language, to Nijmegen in the Netherlands, where the audio is recognised. The results are sent back to München and from there back to Stockholm.
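The routing described above can be sketched as a lookup from the chosen language to the site that runs the matching ASR engine. This is a minimal illustration, not the portal's actual code: the names `ASR_SITES`, `Job` and `route` are hypothetical, and the only routing entry taken from the text is Dutch to Nijmegen.

```python
from dataclasses import dataclass

# Hypothetical routing table: only the Dutch -> Nijmegen entry comes from the text.
ASR_SITES = {"nl": "Nijmegen"}
PORTAL_SITE = "München"  # where the portal is currently hosted


@dataclass
class Job:
    origin: str    # where the scholar uploads from, e.g. "Stockholm"
    language: str  # language code chosen by the scholar
    audio: bytes   # the uploaded audio file


def route(job: Job, sites: dict[str, str] = ASR_SITES) -> list[str]:
    """Return the hops the audio (and later the transcript) travels."""
    asr_site = sites.get(job.language)
    if asr_site is None:
        raise LookupError(f"no ASR engine registered for {job.language!r}")
    # upload -> portal -> ASR engine -> portal -> back to the scholar
    return [job.origin, PORTAL_SITE, asr_site, PORTAL_SITE, job.origin]
```

For the example in the text, `route(Job("Stockholm", "nl", b""))` yields the hops Stockholm, München, Nijmegen, München, Stockholm. Keeping the portal as the single hop between scholar and engine means a scholar never needs to know where a given recognizer runs.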

The underlying idea is that, for every European language, a speech recognizer should be available somewhere in Europe. This is not yet the case, so an attempt is being made to use available commercial recognisers for the missing languages. However, it cannot be guaranteed that these treat the data in accordance with the GDPR. Within CLARIN, the long-term aim is therefore to make a good recognizer available for as many languages as possible.
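The trade-off above can be sketched as an engine-selection rule: prefer an academic (CLARIN) engine, and fall back to a commercial one only when that is explicitly allowed, flagging that GDPR compliance is then not guaranteed. This is an illustrative sketch under assumptions: `choose_engine`, `Choice` and the `allow_commercial` switch are hypothetical and do not describe how the portal actually decides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Choice:
    engine: str
    gdpr_guaranteed: bool  # False for commercial engines, per the text


def choose_engine(language: str,
                  academic: dict[str, str],
                  commercial: dict[str, str],
                  allow_commercial: bool) -> Optional[Choice]:
    """Hypothetical rule: academic engine first, commercial only if allowed."""
    if language in academic:
        return Choice(academic[language], True)
    if allow_commercial and language in commercial:
        # Commercial fallback: data handling under the GDPR cannot be guaranteed.
        return Choice(commercial[language], False)
    return None  # no usable recognizer for this language
```

Making the commercial fallback an explicit opt-in keeps the GDPR risk visible to whoever submits the audio instead of hiding it inside the routing.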

Workflow

Go to the TranscriptionPortal website, select one or more sound files, upload them, select the language and, optionally, a language model, start the recognition and, once it has finished, download the automatically generated transcription(s).
Currently, the audio files must be mono or stereo WAV files, but in the near future the portal itself will convert a range of audio file formats into the required WAV format.
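Before uploading, a scholar could check locally that a file meets the mono-or-stereo WAV requirement. The sketch below uses Python's standard `wave` module, which reads PCM WAV files; `check_wav` is a hypothetical helper, not part of the portal, and the portal may apply further checks (sample rate, bit depth) that are not covered here.

```python
import io
import wave


def check_wav(data: bytes) -> int:
    """Return the channel count if `data` is a mono or stereo WAV file."""
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            channels = w.getnchannels()
    except (wave.Error, EOFError) as exc:
        # Not a RIFF/WAVE file, or truncated before the header was complete.
        raise ValueError(f"not a readable WAV file: {exc}") from exc
    if channels not in (1, 2):
        raise ValueError(f"expected mono or stereo, got {channels} channels")
    return channels
```

To check a file on disk, pass its contents, e.g. `check_wav(open("interview.wav", "rb").read())`, where the file name is only an example.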

For stereo files, the portal asks users whether they want to process the two audio channels separately or together (i.e. downmixed to one mono signal).

If "separately" is chosen, the two channels are processed one after the other. When an interview with two speakers has been recorded with each speaker on their own channel, this makes it easier to separate the speakers, determine turn-taking, and (sometimes) get a better recognition result.
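The two stereo options can be illustrated with a short sketch: "separately" de-interleaves the left and right samples into two mono signals, while "together" averages them into one. This assumes 16-bit PCM WAV input and uses only the standard library; the function names are hypothetical, and the portal's own processing may differ (for instance in how it downmixes).

```python
import array
import io
import sys
import wave


def read_stereo(wav_bytes: bytes) -> tuple[int, array.array]:
    """Read 16-bit stereo PCM; return (frame rate, interleaved samples)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit stereo PCM")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, samples


def split_channels(samples: array.array) -> tuple[array.array, array.array]:
    """'Separately': de-interleave L R L R ... into two mono signals."""
    return samples[0::2], samples[1::2]


def downmix(samples: array.array) -> array.array:
    """'Together': average both channels into a single mono signal."""
    left, right = split_channels(samples)
    return array.array("h", ((l + r) // 2 for l, r in zip(left, right)))


def to_mono_wav(samples: array.array, rate: int) -> bytes:
    """Encode 16-bit mono samples as WAV bytes, e.g. one speaker's channel."""
    out = array.array("h", samples)  # copy so byteswap leaves the input intact
    if sys.byteorder == "big":
        out.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(out.tobytes())
    return buf.getvalue()
```

With one speaker per channel, each output of `split_channels` holds a single voice, which is why processing the channels separately simplifies speaker separation and turn-taking.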

Once ready, the results can be downloaded and processed further according to the scholars' wishes.