Converting digital recordings

In recent years, many analogue interview collections have been digitized. In some cases the result may be sub-optimal and it would be better to start over, but here we only deal with converting one existing digital format into another.

There are three major groups of audio file formats:

Figure: Header of a RIFF-WAV file

Uncompressed audio formats

The first group is simply an array of integers. The signal is sampled at a certain sample frequency (see the tab "From analogue to digital") and each consecutive sample is stored in the array. The uncompressed formats differ in the way the samples are written (big-endian or little-endian byte order) and in the header (i.e. the metadata) stored in the audio file.
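To make the byte-order difference concrete, the small Python sketch below packs the same 16-bit sample values once as little-endian (the order used by RIFF-WAV) and once as big-endian (the order used by, for example, AIFF). The sample values are arbitrary and chosen only for illustration.

```python
import struct

# A few arbitrary 16-bit signed samples, for illustration only.
samples = [0, 1000, -1000, 32767]
fmt = f"{len(samples)}h"  # 'h' = signed 16-bit integer

little = struct.pack("<" + fmt, *samples)  # '<' = little-endian
big = struct.pack(">" + fmt, *samples)     # '>' = big-endian

print(little.hex(" "))  # 00 00 e8 03 18 fc ff 7f
print(big.hex(" "))     # 00 00 03 e8 fc 18 7f ff

# Reading with the matching byte order recovers the same integers;
# reading with the wrong one would produce noise-like values.
assert list(struct.unpack("<" + fmt, little)) == samples
assert list(struct.unpack(">" + fmt, big)) == samples
```

The integers are identical; only the order of the two bytes inside each sample differs, which is why a program must read the header to know how to interpret the data.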

This metadata makes it possible for software programs to "know" whether the recording is mono or stereo, what the sample frequency is, how many bits are used per sample and how many audio samples there are. Additional information can also be stored, such as the owner and the software/hardware used for the recording. Uncompressed audio formats require the most disk space, but they are "fast" because no additional computation is needed to read and write the files, and they preserve the full audio quality.
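The header fields listed above can be inspected with Python's standard-library `wave` module. The minimal sketch below writes one second of silence (16 kHz, mono, 16-bit; values chosen only for illustration) to an in-memory WAV file and reads the metadata back. For a real recording you would pass its file path to `wave.open` instead. Note that `wave` only handles uncompressed PCM WAV files.

```python
import io
import wave


def make_wav(n_channels=1, sample_rate=16000, sample_width=2, seconds=1.0):
    """Write silent PCM audio to an in-memory WAV file and return its bytes."""
    n_frames = int(sample_rate * seconds)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(n_channels)
        w.setsampwidth(sample_width)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (n_frames * n_channels * sample_width))
    return buf.getvalue()


def describe(wav_bytes):
    """Return the main header fields of a PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return {
            "channels": r.getnchannels(),
            "sample_rate": r.getframerate(),
            "bits_per_sample": 8 * r.getsampwidth(),
            "frames": r.getnframes(),  # samples per channel
        }


data = make_wav()
print(describe(data))
# {'channels': 1, 'sample_rate': 16000, 'bits_per_sample': 16, 'frames': 16000}
print(len(data))  # 32044: 32000 bytes of samples plus a 44-byte header
```

The file size follows directly from the header: sample rate × channels × bytes per sample × duration, plus a small, fixed-size header.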

Lossless Compression

The amount of disk space needed for your audio files may be an issue. With lossless compression, audio files are written in a "smart" way so that they take up less space without losing any information. From a losslessly compressed file you can always go back to the uncompressed format and no information is lost.
The disadvantage is that some computation is needed to open the file (for reading, editing or playing) and to write it back to the hard disk. So for long-term storage, when there is no need to access the files very often, lossless compression is an excellent choice.
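As a rough illustration of the lossless principle (not of any real audio codec), the Python sketch below stores each sample as the difference from the previous one and then packs the result with the general-purpose `zlib` compressor. FLAC and similar formats use more elaborate prediction and entropy coding; the point here is only that decompression returns exactly the original integers. The test signal is a synthetic 440 Hz tone, which compresses far better than real speech would.

```python
import math
import struct
import zlib


def compress(samples):
    """Delta-encode integer samples, then zlib-compress them (illustrative only)."""
    if not samples:
        return zlib.compress(b"")
    # Store the first sample, then only the change from one sample to the next.
    deltas = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    raw = struct.pack(f"<{len(deltas)}i", *deltas)  # 32-bit so deltas never overflow
    return zlib.compress(raw, 9)


def decompress(blob):
    """Undo compress(): inflate, then add the deltas back up."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    samples, current = [], 0
    for d in deltas:
        current += d
        samples.append(current)
    return samples


# One second of a synthetic 440 Hz tone at 16 kHz, as 16-bit integers.
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000)]
blob = compress(tone)
print(len(tone) * 2, "bytes as 16-bit PCM ->", len(blob), "bytes compressed")
assert decompress(blob) == tone  # lossless: the exact same samples come back
```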

Lossy Compression

The human ear does not perceive frequency linearly: we clearly hear the difference between 100 Hz and 110 Hz, but not the difference between 6000 Hz and 6010 Hz. This makes it possible to compress sound with lossy data compression: an encoding method that uses inexact approximations and discards part of the data to reduce file sizes significantly, typically by a factor of about 10, without a large loss of audio quality. However, the more we compress, the more audible the loss becomes.
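Where the "factor of 10" comes from can be checked with simple arithmetic. The sketch below computes the bit rate of uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo) and compares it with 128 kbit/s, which is assumed here as a common MP3 bit rate; other bit rates give other ratios.

```python
def pcm_bitrate(sample_rate, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels


cd_quality = pcm_bitrate(44_100, 16, 2)  # 1_411_200 bit/s
mp3_assumed = 128_000                    # assumed lossy bit rate in bit/s
ratio = cd_quality / mp3_assumed

print(cd_quality)      # 1411200
print(f"{ratio:.1f}")  # 11.0
```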

For listening to the files, this partial loss of quality is not a big problem: we can still perfectly hear and understand what someone is saying in an interview. For automatic speech recognition (ASR), however, strong compression may increase the word error rate (WER).
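The word error rate is commonly defined as the number of substituted, deleted and inserted words needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A minimal Python sketch of that definition, using a word-level edit distance, is shown below; the example sentences are invented for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimal edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# One deletion ("the") and one substitution ("fifty" -> "fifteen") in 8 words.
print(wer("we moved to the city in nineteen fifty",
          "we moved to city in nineteen fifteen"))  # 0.25
```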

So the best approach is to use the original quality (uncompressed or losslessly compressed) for ASR, alignment and (possibly) non-verbal analyses, and to use the compressed versions for access via the internet.

Conversion on the computer

For both Windows and Apple there are dozens of good, free conversion programs that turn your audio into the right format for ASR, so a good web search is the first thing to do. To help you get started, here are two programs that do the job easily:

To Wave Converter (Apple)
GoldWave (Windows)
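If you prefer a scriptable route for large collections, the command-line tool ffmpeg (not mentioned above, and assumed to be installed and on your PATH) can do the same conversion. The Python sketch below assumes a target of 16 kHz, mono, 16-bit PCM WAV, a common input format for ASR; check the actual requirements of the ASR service you use before converting. The helper names `build_ffmpeg_cmd` and `convert` are hypothetical.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that writes 16-bit PCM WAV (assumed ASR target)."""
    return [
        "ffmpeg", "-y",           # overwrite dst without asking
        "-i", src,                # input file in any format ffmpeg can decode
        "-ac", str(channels),     # number of output channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM, i.e. a plain WAV
        dst,
    ]


def convert(src, dst, **kwargs):
    """Run the conversion; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst, **kwargs), check=True)
```

Usage, with hypothetical file names: `convert("interview.mp3", "interview.wav")`. Keeping the command construction separate from running it makes it easy to print and check the command before processing a whole collection.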


Conversion on the web

There are various good web services that convert your audio into other formats, for example:

Audio Converter