All sounds we hear with our ears are pressure waves in air. Starting with Thomas Edison's demonstration of the first phonograph in 1877, it has been possible to capture these pressure waves onto a physical medium and then reproduce them later by regenerating the same pressure waves. Audio pressure waves, or waveforms, look something like this:
Analog recording media such as phonograph records and cassette tapes represent the shape of the waveform directly, using the depth of the groove for a record or the amount of magnetization for a tape. Analog recording can reproduce an impressive range of sounds, but it also suffers from noise. Notably, each time an analog recording is copied, more noise is introduced, decreasing the fidelity. This noise can be minimized but never completely eliminated.
Digital recording works differently: it samples the waveform at evenly-spaced timepoints, representing each sample as a precise number. Digital recordings, whether stored on a compact disc (CD), digital audio tape (DAT), or on a personal computer, do not degrade over time and can be copied perfectly without introducing any additional noise. The following image illustrates a sampled audio waveform:
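Sampling can be sketched in a few lines of Python. This is only an illustration: the 8,000 Hz rate and 440 Hz tone are arbitrary example values, not anything prescribed by the document.

```python
import math

SAMPLE_RATE = 8000      # samples per second (illustrative; CD audio uses 44,100)
FREQUENCY = 440.0       # pitch of the example tone in Hz
DURATION = 0.01         # length of the clip in seconds

def sample_sine(freq, rate, duration):
    """Sample a sine wave at evenly spaced points in time."""
    n = int(rate * duration)
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

samples = sample_sine(FREQUENCY, SAMPLE_RATE, DURATION)
print(len(samples))   # 80 samples: 8000 per second for 0.01 s
```

Each entry in `samples` is one precise number, the waveform's height at that instant; the continuous pressure wave is reduced to this list.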
Digital audio can be edited and mixed without introducing any additional noise. In addition, many digital effects can be applied to digitized audio recordings, for example, to simulate reverberation, enhance certain frequencies, or change the pitch.
A waveform can be represented as a sequence of signed numbers. A common sampling rate is 44,100 Hz, which means 44,100 values are stored every second, or twice as many for a stereo recording, since there is one waveform for the left ear and one waveform for the right ear.
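A quick back-of-the-envelope calculation shows what that rate implies, assuming 16-bit (2-byte) samples, which is a common but not universal choice:

```python
SAMPLE_RATE = 44_100     # samples per second
CHANNELS = 2             # stereo: one waveform per ear
BYTES_PER_SAMPLE = 2     # 16-bit signed integers (an assumed, common format)

values_per_second = SAMPLE_RATE * CHANNELS
bytes_per_minute = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * 60

print(values_per_second)   # 88200 values every second for stereo
print(bytes_per_minute)    # 10584000 bytes, roughly 10 MB per minute
```

That roughly 10 MB per minute is the uncompressed cost that compact formats like .mp3 are designed to reduce.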
Formats such as .wav store the numbers directly, with a small amount of extra overhead to give information such as whether it is a stereo or mono recording. In a stereo recording there is a great deal of similarity between the left and right channels, and the waveform itself is not just a random shape but has some regularities. This redundancy in the signal makes it possible to encode audio more compactly, especially if one is prepared to sacrifice a small amount of quality. A format such as .mp3 does this, and is very much more compact.
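Python's standard-library `wave` module shows how direct the .wav layout is: the file below is just the samples packed as 16-bit signed integers behind a small fixed header. The filename, rate, and tone here are illustrative choices.

```python
import math
import os
import struct
import wave

SAMPLE_RATE = 8000                       # illustrative rate; CD audio uses 44,100
samples = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
           for t in range(SAMPLE_RATE)]  # one second of a 440 Hz tone

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)            # mono
    wav.setsampwidth(2)            # 2 bytes, i.e. 16-bit signed samples
    wav.setframerate(SAMPLE_RATE)
    # Each number in the waveform is packed directly as a little-endian int16.
    wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

size = os.path.getsize("tone.wav")
print(size)   # 16044 bytes: 8000 samples x 2 bytes each, plus a 44-byte header
```

The overhead is a fixed few dozen bytes of header; everything else in the file is the raw sequence of sample values.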
The sequence of numbers describing audio in .wav format describes the waveform 'in the time domain'. There is a second way of representing audio, 'in the frequency domain'. Essentially, at each moment the amount of sound at each different pitch is measured, and those values are recorded instead.
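As a rough sketch of the idea, the discrete Fourier transform measures how much of each frequency a clip contains. The naive pure-Python version below is for illustration only, not a production FFT:

```python
import cmath
import math

SAMPLE_RATE = 64
# A sine wave that completes exactly 5 cycles over the 64 samples.
samples = [math.sin(2 * math.pi * 5 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

def dft_magnitudes(x):
    """How much of each frequency bin is present in the time-domain signal."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

mags = dft_magnitudes(samples)
peak = max(range(len(mags)), key=lambda k: mags[k])
print(peak)   # 5: the energy is concentrated in bin 5, matching the 5-cycle sine
```

A time-domain list of 64 numbers has become a frequency-domain list where a single bin stands out, which is exactly the kind of regularity compressed formats exploit.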
The sounds received by the left and right ears are different: voices and instruments closer to one ear are heard very slightly earlier by that ear, and this effect helps us get a 3D sense of where sounds are coming from.
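To get a feel for the size of this effect, here is a rough calculation. The 0.21 m ear spacing and 343 m/s speed of sound are approximate assumed values, not figures from this document:

```python
SPEED_OF_SOUND = 343.0   # metres per second in air, approximate room-temperature value
EAR_SPACING = 0.21       # approximate distance between the ears in metres
SAMPLE_RATE = 44_100

# A sound arriving from directly beside the head reaches the far ear later
# by roughly the time it takes sound to cross the width of the head.
max_delay_s = EAR_SPACING / SPEED_OF_SOUND
max_delay_samples = max_delay_s * SAMPLE_RATE

print(round(max_delay_s * 1000, 2))   # about 0.61 milliseconds
print(round(max_delay_samples))       # about 27 samples at 44,100 Hz
```

A delay of well under a millisecond, only a couple of dozen samples, is enough for the brain to localize a sound.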
For a sound recording, the left and right channels are digitized separately. If you zoom in on the waveform below, you will see that the left and right channels are subtly different. (You can zoom in by clicking and dragging on the ruler.)