There are many specialized tools for manual digital transcription. While it is not necessary to use them (one can also use Microsoft Word or Google Docs), knowing your options can make transcribing much less of a hassle and expand the possibilities of research using your transcriptions.
One advantage of transcription software is that it combines media playback with typing, so that the resulting transcription is time-aligned: the start and end time of each text fragment (a word, a couple of words, a phrase or even a paragraph) is known. This time-alignment makes it possible to search for spoken words and to generate subtitles.
Transcriptions made with an ordinary text editor (Notepad, Word, etc.) lack this time-alignment, and the result is just text. Combining this text with forced alignment, however, yields the same kind of time-aligned transcription as dedicated transcription software does. Forced alignment is explored on our page on post-transcription.
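To make this concrete, here is a minimal Python sketch of what a time-aligned transcription allows. The `Fragment` structure and the function names are illustrative assumptions, not the format of any particular tool; the only external convention used is the common SubRip (.srt) subtitle layout of a counter, a `start --> end` line and the text.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One time-aligned piece of a transcript (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style of .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(fragments: list[Fragment]) -> str:
    """Turn time-aligned fragments into subtitle text in SubRip layout."""
    blocks = []
    for index, frag in enumerate(fragments, start=1):
        times = f"{srt_timestamp(frag.start)} --> {srt_timestamp(frag.end)}"
        blocks.append(f"{index}\n{times}\n{frag.text}\n")
    return "\n".join(blocks)


def find_word(fragments: list[Fragment], word: str) -> list[float]:
    """Return the start time of every fragment in which the word is spoken."""
    needle = word.lower()
    hits = []
    for frag in fragments:
        # Strip simple punctuation so "war." still matches "war".
        words = [w.strip(".,;:!?\"'") for w in frag.text.lower().split()]
        if needle in words:
            hits.append(frag.start)
    return hits
```

With two fragments such as `Fragment(0.0, 1.5, "Good morning.")` and `Fragment(1.5, 4.0, "Tell me about the war.")`, `find_word(fragments, "war")` points to the moment in the recording where the word occurs, and `to_srt(fragments)` produces text that can be saved as a subtitle file.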
Granularity
The "time resolution" of a transcription depends on the human editor, who may select short fragments (words or even phonemes) or rather long ones (a paragraph). Another frequently used method of time-alignment is to place time-stamps at a fixed interval (e.g. every 30 seconds or every 5 minutes).
Once the transcription is made (with a text editor, with or without time-stamps, or with dedicated transcription software on an utterance level), a final forced alignment will determine the start and end times of each word more precisely and, if desired, the start and end times of the spoken phonemes.
For oral historians, time-alignment on the utterance level will usually be enough, but modern technology makes it simple to automatically add a finer granularity to time-aligned transcriptions.
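The fixed-interval method can be sketched in a few lines of Python. The function name and the `[HH:MM:SS]` marker style are assumptions chosen for illustration; transcription guidelines differ in how they write time-stamps.

```python
def interval_markers(duration_s: float, interval_s: float = 30.0) -> list[str]:
    """List the time-stamp markers to insert for a recording of the given length."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    markers = []
    count = 0
    # One marker at 0 and then one every interval_s seconds before the end.
    while count * interval_s < duration_s:
        total = int(count * interval_s)
        minutes, seconds = divmod(total, 60)
        hours, minutes = divmod(minutes, 60)
        markers.append(f"[{hours:02d}:{minutes:02d}:{seconds:02d}]")
        count += 1
    return markers
```

For a recording of 95 seconds and the default interval of 30 seconds, this yields four markers, at 0, 30, 60 and 90 seconds. The transcriber then types each marker into the text at the point where that moment is reached in the recording.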
Tools
Here, I will list three tools that are especially suited to transcription. The first is a transcription-centered text editor for plain transcripts, the second is useful for manually transcribing on a sentence-by-sentence basis, and the third goes even deeper, making it possible to transcribe on a word-by-word or phoneme basis.
oTranscribe
Simple transcription - Lightweight - Easy-to-use
Free - Web-based
oTranscribe is a simple, lightweight website that eases the process of manual transcription. It combines a straightforward text editor à la Google Docs with a simple media player controlled by keyboard shortcuts (play, pause, forward, backward, and so on). You can load your own audio and video files, or use a video from YouTube. The online tool does not store data on its own servers, but uses your browser's storage. This is good for privacy, but bad for reliability: back up your transcription every 10 minutes or so!
Subtitle Edit
Subtitle editor - Timed transcription (sentences) - Open source
Free - Installer (Windows only)
https://www.nikse.dk/subtitleedit/
As its name implies, Subtitle Edit was not created as a transcription tool. However, subtitle files are a great way to store timed transcriptions, and Subtitle Edit is a very customizable and powerful subtitle editor, so the tool can be used to create transcriptions that are timed on a sentence-by-sentence basis. In addition, results of automatic speech recognition (ASR) can be exported as subtitle files, which makes Subtitle Edit a great tool for editing and correcting these ASR results.
ELAN
Linguistic Annotator - Timed transcription (words) - Open source
Free - Installer (Windows, Mac, Linux)
https://archive.mpi.nl/tla/elan
Like Subtitle Edit, ELAN was not developed with (just) transcription in mind. It was created as a more general linguistic annotation tool. Annotations can be anything the user wants: codes, key moments, memos, and also transcriptions. Transcribing with ELAN is a lot of work compared to classic transcription, but its biggest benefit is that it is possible to transcribe on a word-by-word basis, and even to time specific phonemes. It also allows for multi-layer transcription: you can create different 'tiers' for different speakers.
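The idea of tiers can be illustrated with a small Python sketch. This is a conceptual model only and not ELAN's own file format or API: the `Annotation` class, the tier names and the `merge_tiers` function are assumptions made for the example. Each tier holds the timed annotations of one speaker, and merging the tiers by start time gives a single chronological transcript.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A timed annotation on one tier (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    value: str


def merge_tiers(tiers: dict[str, list[Annotation]]) -> list[str]:
    """Interleave all tiers into one transcript, ordered by start time."""
    rows = [
        (ann.start, name, ann.value)
        for name, annotations in tiers.items()
        for ann in annotations
    ]
    rows.sort(key=lambda row: row[0])
    return [f"[{start:.1f}s] {name}: {value}" for start, name, value in rows]


# One tier per speaker, as in a typical interview.
tiers = {
    "Interviewer": [
        Annotation(0.0, 2.0, "Where were you born?"),
        Annotation(5.5, 7.0, "And when?"),
    ],
    "Narrator": [
        Annotation(2.4, 5.0, "In Rotterdam."),
    ],
}

for line in merge_tiers(tiers):
    print(line)
```

Keeping speakers on separate tiers means overlapping speech can be recorded without forcing it into a single line of text, while a merged view remains available whenever a readable transcript is needed.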