Scientific Transcripts

Transcription is a translation between forms of data, most commonly to convert audio-visual recordings to text in qualitative and quantitative research. It should match the analytic and methodological aims of the research. Whilst transcription is often part of the analysis process, it also enhances the sharing, disclosure and reuse potential of research data.
Full transcription is recommended for data sharing.

Screenshot of the manual correction (in SubtitleEdit) of a transcription that was generated with a (Vocapia) ASR engine.

If the transcription is done with ASR or forced alignment, each transcribed (spoken) word automatically gets a start and end time. This makes it possible to access the AV files directly at word level: clicking a selected fragment in the search window can, for example, play that fragment aloud.
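Word-level access of this kind can be sketched in a few lines of Python. The sketch assumes the aligned output is already available as a list of words with start and end times in seconds; the `TimedWord` class, the sample data and both helper functions are hypothetical and do not reflect the output format of any particular ASR engine.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds from the start of the recording
    end: float


# Hypothetical aligned output: one entry per spoken word, sorted by start time.
WORDS = [
    TimedWord("transcription", 0.00, 0.82),
    TimedWord("is", 0.82, 0.95),
    TimedWord("a", 0.95, 1.02),
    TimedWord("translation", 1.02, 1.80),
]


def find_occurrences(words, query):
    """Return (start, end) for every word that matches the query, ignoring case."""
    wanted = query.casefold()
    return [(w.start, w.end) for w in words if w.text.casefold() == wanted]


def word_at(words, t):
    """Return the word spoken at time t, or None if t falls outside every word."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and words[i].start <= t < words[i].end:
        return words[i]
    return None


# A search hit yields the time span a player would need to play back.
for start, end in find_occurrences(WORDS, "Translation"):
    print(f"play from {start:.2f}s to {end:.2f}s")
```

A player component would pass the returned start and end times to whatever audio or video library is in use; that part is deliberately left out here.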

Separation of content and presentation

Transcripts contain (a lot of) information that can be parsed by computers and humans. Human parsing is robust to small errors but computer parsing is not.
The content is therefore best written in XML (or JSON) using UTF-8. XML enforces a structured way of storing the data, making it possible to parse the transcripts unambiguously with a computer.
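As a minimal sketch of what unambiguous parsing looks like, the following Python snippet reads a small UTF-8 XML transcript with the standard library. The element and attribute names (`transcript`, `turn`, `speaker`, `start`, `end`) are assumptions made for this illustration, not a prescribed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript layout; element and attribute names are illustrative only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<transcript interview="example-01">
  <turn speaker="interviewer" start="0.00" end="3.10">Where did you grow up?</turn>
  <turn speaker="respondent" start="3.40" end="7.85">Above my parents' café.</turn>
</transcript>
"""


def read_turns(xml_text):
    """Parse a transcript and return one dict per speaker turn."""
    # Encode to bytes so the parser honours the UTF-8 declaration, as it would for a file.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        {
            "speaker": turn.get("speaker"),
            "start": float(turn.get("start")),
            "end": float(turn.get("end")),
            "text": (turn.text or "").strip(),
        }
        for turn in root.iter("turn")
    ]


for turn in read_turns(SAMPLE):
    print(f"{turn['start']:7.2f}  {turn['speaker']}: {turn['text']}")

# A structural slip is reported as an error instead of being silently misread.
try:
    read_turns("<transcript><turn>unclosed</transcript>")
except ET.ParseError as err:
    print("Rejected malformed transcript:", err)
```

The second call shows the practical benefit: a malformed file fails loudly at parse time, so errors are caught before any analysis is run on the data.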

Storing the transcripts in a word-processor or page-layout format (e.g. docx or pdf) is therefore not recommended. Small, barely noticeable errors can break the parsing of the transcript. A less suitable font is one example: R followed by a lowercase L (Rl) and R followed by an uppercase I (RI) look the same in Helvetica but clearly different in Courier.
The same is true for a hard return (RETURN) and a soft return (SHIFT-RETURN): to the human eye they look the same, but to a computer they are different characters, so parsing may go wrong.
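Both pitfalls are easy to demonstrate. The Python sketch below compares the two look-alike speaker codes and splits a text on hard returns. It assumes the soft return was saved as U+2028 LINE SEPARATOR; which character an editor actually writes differs per program, so treat that choice as an assumption.

```python
import unicodedata


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'CONTROL CHARACTER')}" for ch in text]


speaker_a = "Rl"  # R followed by a lowercase L
speaker_b = "RI"  # R followed by an uppercase I

# Identical on screen in many fonts, but two different strings for a program.
print(speaker_a == speaker_b)
print(describe(speaker_a))
print(describe(speaker_b))

hard = "first line\nsecond line"       # hard return stored as a line feed
soft = "first line\u2028second line"   # soft return, assumed stored as U+2028

# Splitting on the hard return finds two lines in one text and only one in the other.
print(len(hard.split("\n")), len(soft.split("\n")))
```

Printing the code points, as `describe` does, is a quick way to find out why two strings that look the same refuse to match.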


XSLT schema: 3 XSLT files (left) for export to third-party software and 3 XSLT files (right) for reading by humans.


When presenting the transcripts, XSLT-files can be used to generate a human-readable document that

  1. shows just the information that is desired (for example all information or only the text of the transcript) 
  2. presents the information in the look-and-feel of the institution (font, size, colours, etc.) including logos and standard text.

Finally, if the layout of the transcripts needs to be modified, only one XSLT file needs to be changed (instead of hundreds of Word files).
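The principle of one source and several presentations can be illustrated without XSLT. Python's standard library has no XSLT processor, so the sketch below swaps in two plain Python functions that play the role of two stylesheets; the transcript layout and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source transcript; the same content feeds every presentation.
SOURCE = """<transcript>
  <turn speaker="interviewer" start="0.00">Where did you grow up?</turn>
  <turn speaker="respondent" start="3.40">In a small village.</turn>
</transcript>"""


def full_view(root):
    """Stand-in for a stylesheet that shows time, speaker and text."""
    return "\n".join(
        f"[{float(t.get('start')):07.2f}] {t.get('speaker').upper()}: {t.text.strip()}"
        for t in root.iter("turn")
    )


def text_only_view(root):
    """Stand-in for a stylesheet that shows only the spoken text."""
    return "\n".join(t.text.strip() for t in root.iter("turn"))


root = ET.fromstring(SOURCE)
print(full_view(root))
print()
print(text_only_view(root))
```

Changing the layout means editing one of these functions (or, in the real setup, one XSLT file); the stored transcripts stay untouched.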

Use of transcripts in third-party software

When planning the structure of the transcription template, best practice is to:

  • Consider compatibility with the import features of qualitative data analysis software. Which information is required or nice to have in that particular analysis package, and which information cannot be used at all (in which case, does it make sense to collect it in the transcription documents)? Again, an XSLT file can be used to generate XML files that can be imported into the third-party software.
    Moreover, different XSLT files can be used to generate different export files for different third-party software (for example one XSLT for export to AtlasTI, another XSLT for export to MaxQDA).
  • Write transcriber instructions or guidelines to get consistency in the transcripts, especially when different people make or correct the transcriptions. How to deal with non-verbal, unintelligible or inaudible speech? How to write foreign or dialect words? How to mark sensitive information for later anonymisation?
  • Provide a translation or at least a summary of each interview in English, when the speech is in another language.
  • Never trust the transcription results of ASR (automatic speech recognition) software without checking them. ASR keeps getting better, but the software cannot recognise words that are not in its vocabulary (jargon, foreign and dialect words, acronyms, abbreviations, etc.).
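A transcriber guideline for marking sensitive information pays off when the marking can be processed automatically. The Python sketch below assumes a hypothetical convention in which transcribers wrap sensitive fragments as `[[category:original text]]`; the marker syntax and both function names are invented for this illustration and are not an established standard.

```python
import re

# Hypothetical marker convention: [[category:original text]]
MARKER = re.compile(r"\[\[(?P<category>[a-z]+):(?P<text>[^\]]+)\]\]")


def anonymise(line):
    """Replace every marked fragment by a placeholder naming its category."""
    return MARKER.sub(lambda m: f"[{m.group('category').upper()}]", line)


def reveal(line):
    """Strip the markers but keep the original text (for the restricted version)."""
    return MARKER.sub(lambda m: m.group("text"), line)


line = "I met [[name:Jan]] in [[place:Utrecht]]."
print(anonymise(line))
print(reveal(line))
```

Because the markers follow one fixed pattern, a shareable and a restricted version can both be generated from the same transcript, provided every transcriber applies the convention consistently.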

Transcription methods

Transcription methods depend very much upon your theoretical and methodological approach, and can vary between disciplines.

  • A thematic sociological research project usually requires a denaturalised approach, i.e. most like written language (Bucholtz, 2000), because the focus is on the content of what was said and the themes that emerge from that.
  • A project using conversation analysis would use a naturalised approach, i.e. most like speech, whereby a transcriber seeks to capture all the sounds they hear and use a range of symbols to represent particular features of speech in addition to the spoken words; for example representing the length of pauses, laughter, overlapping speech, turn-taking or intonation.
  • A psycho-social method transcript may include detailed notes on emotional reactions, physical orientation, body language, use of space, as well as the psycho-dynamics in the relationship between the interviewer and interviewee.
  • Some transcribers may try to make a transcript look correct in grammar and punctuation, considerably changing the sense of flow and dynamics of the spoken interaction. Transcription should capture the essence of the spoken word, but need not go as far as the naturalised approach. This kind of transcript is, in combination with forced alignment, often used for the automatic generation of subtitles.

Reference: Bucholtz, M. (2000) The Politics of Transcription. Journal of Pragmatics 32: 1439-1465.

(this text is partly based on the information on the UK Data Service website)