Arezzo Workshop 2017

As a follow-up to the CLARIN-PLUS workshops on Oral History (OH) archives in Oxford (April 2016) and Utrecht (December 2016), the Arezzo workshop aims to finalize the set-up of a transcription chain for OH interviews.

The envisaged outcome of the Arezzo workshop is an implementation plan for an OH transcription chain that can be integrated into the CLARIN infrastructure. Once written, the implementation plan will be submitted to CLARIN ERIC for final approval. Funding has already been reserved.

The Arezzo workshop (10-12 May 2017) is a two-day, invitation-only workshop for a maximum of 30 participants.
The main goals of the workshop are to:

  • finalize the proposal for the "ideal transcription chain" for oral historians
  • find the necessary colleagues and partners
  • identify possible (CLARIN) hosts for OH transcription services for the three languages

Location

The meeting takes place at the Department of Education, Human Sciences and Intercultural Communication of the University of Siena (Campus ‘Il Pionta’).

The University Campus is located at Viale Cittadini 33, Arezzo (tel. +39-0575-9261).

The campus is very close to Arezzo railway station, and the historical centre is less than 10 minutes away on foot.

Directions: from the railway station, take the underpass towards Campo di Marte and use the exit on the right. Walk straight ahead to the traffic lights, cross the road and walk against the direction of traffic. After a few metres, you will find the campus on your left.

Here you can find a virtual tour of the Campus.

Programme Wednesday 10 May

14:00 Welcome (Silvia Calamai)
14:15 Overview (Henk van den Heuvel): background, objectives, agenda and targets of the workshop
14:30 Transcription chain (Henk van den Heuvel): the various building blocks of a transcription chain, as discussed at the Utrecht workshop
14:45 AD conversion (Arjan van Hessen): analogue-to-digital conversion tools

ASR-tools: Full Speech Recognition for different languages

15:00 ASR tools, English (Thomas Hain): focusing on WebASR.org
15:20 ASR tools, Dutch (Roeland Ordelman): the Dutch KALDI recognizer of NISV
15:40 ASR tools, Dutch (Henk van den Heuvel): web interface, including the OH version and results

16:00 BREAK    

ASR-tools: Alignment of audio and transcripts for various languages

16:15 WebMAUS (John Coleman & Christoph Draxler): the WebMAUS aligner
16:30 Italian alignment (Piero Cosi): the Italian aligner
16:45 Experience feedback (Graham Gibbs): participants report on their experiences with the ASR and alignment tools
17:15 DIY (Arjan van Hessen): discussion of the desired output formats of the ASR tools. What do you want to get back from the ASR engine? Hands-on experience if necessary
18:30 Close of first day (Silvia Calamai): Are you hungry?
19:30 Dinner

Programme Thursday 11 May

9:15 Buongiorno (Henk van den Heuvel): summary of day 1 and overview of day 2

Transcription: Guidelines, Standards, Editors, Crowdsourcing

9:25 Transcription guidelines (Stef Scagliola & Silvia Calamai): various standards and best practices for Oral History
9:45 Manual transcription correction services (Arjan van Hessen): what is available for individual researchers (for example SubtitleEdit)
10:00 Web-based annotation editors (Christoph Draxler): a portal for individual researchers and for use in a crowdsourcing environment
11:00 BREAK
11:15 Crowdsourcing (Arjan van Hessen): CrowdFlower (acquired by Appen in 2019) crowdsourcing strategies and transcription correction
11:25 Discussion (all): participants report on their experiences with transcription services and crowdsourcing platforms
12:00 Hands-on experience (Arjan van Hessen & Christoph Draxler): correct your own transcriptions, set up a crowdsourcing experiment in which people can help you with the transcriptions, and try out the transcription guidelines (are they good, and what is missing?)
13:00 LUNCH

Metadata: Guidelines, Standards, Editors

14:00 Metadata (Stef Scagliola & Louise Corti): overview of standards, relevant categories, language of metadata, translation, etc.
14:30 Metadata editor (Henk van den Heuvel): a metadata editor as implemented at CLST
14:45 Discussion (all): participants report on their experiences with metadata editing
15:00 BREAK

Presentations on data management and hosting in NL, UK, IT and CZ (including persistent archiving options)

15:15 National infrastructure: NL (Rene van Horik): the data infrastructure in the country and how our services could fit into it; access to data, tools and metadata for the research community at large; IPR, informed consent and ethical issues
15:30 National infrastructure: UK (Louise Corti): the data infrastructure in the country and how our services could fit into it; access to data, tools and metadata for the research community at large; IPR, informed consent and ethical issues
15:45 National infrastructure: IT (Monica Monachini): the data infrastructure in the country and how our services could fit into it; access to data, tools and metadata for the research community at large; IPR, informed consent and ethical issues
16:00 National infrastructure: CZ (Pavel Stranak): the data infrastructure in the country and how our services could fit into it; access to data, tools and metadata for the research community at large; IPR, informed consent and ethical issues
16:20 Discussion (Henk van den Heuvel)
18:00 Close of meeting (Silvia Calamai)
19:30 Dinner

Programme Friday 12 May

9:45 Buongiorno (Henk van den Heuvel): summary of day 2 and overview of day 3
10:00 Wrapping up (Henk van den Heuvel):
  • Which improvements are needed for the Google documents on the various topics?
  • Which software improvements are needed and should be included in the implementation plan?
  • Which facilities are still missing?
10:30 Proposal (Arjan van Hessen): concluding actions for finalizing the implementation proposal
11:30 BREAK
11:45 Time schedule (Arjan van Hessen): set-up of the time schedule for the coming months, from workshop to proposal
12:15 Plan for a publication (Stef Scagliola): how to set up publications based on the work done in this workshop
12:45 LUNCH
14:00 Adjourn (Henk van den Heuvel & Silvia Calamai)


ASR comments