The interactive diagram below shows a spectrogram for the word 'Audacity'. The horizontal axis is time and the vertical axis is frequency. Move the cursor over different parts of the spectrogram to see which sounds they correspond to.
Vowels and Consonants
The distinction between vowel sounds and consonant sounds is visible in the spectrogram. Each vowel shows up as a sustained sound with a few strong frequencies. The 'd' and 't' consonant sounds are much shorter, and each is preceded by a brief near-silence.
- On the spectrogram, look for the regions of near-silence; the 'd' and 't' come immediately after them. The 'd' and 't' are 'plosive' sounds - a burst of air pressure makes the sound.
- On the spectrogram, find the white area at the highest frequency, at about 5 kHz. The 'c' in 'Audacity' is soft, like the sound of 's'. Although it is a sustained sound, its energy is spread over many frequencies around 5 kHz, rather than concentrated in a few strong frequencies as in the vowels.
- At a window size of 2048, the silences before 't' and 'd' and the frequency bands of the 'i' in 'Audacity' show up most clearly.
- At a window size of 1024, the change in frequencies in the 'Au' of 'Audacity' shows up most clearly.
The underlying audio was sampled at a frequency of 44.1 kHz (see the panel to the left of the spectrogram). In other words, there are 44,100 audio samples for each second of sound, so 4,410 samples last 1/10th of a second and 441 samples last 1/100th of a second. When measuring the frequencies, Audacity looks over a range of samples called a 'window'. A window of 8192 samples, the longest window we offer, covers a duration of 8192/44100 seconds, or about 1/5th of a second.
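The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `window_duration` is made up for this example and is not part of Audacity.

```python
SAMPLE_RATE = 44_100  # samples per second, as stated in the text


def window_duration(window_size, sample_rate=SAMPLE_RATE):
    """Return how many seconds of audio a window of `window_size` samples covers."""
    return window_size / sample_rate


# Sample counts mentioned in the text, from shortest to longest.
for size in (441, 1024, 2048, 4410, 8192):
    print(f"{size:5d} samples = {window_duration(size):.4f} s")
```

Doubling the window size doubles the stretch of sound that contributes to each frequency measurement.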
The interactive diagram lets you choose a window size to see how the spectrogram varies. With a long window the algorithm sees more repeats of the same pattern of sound, and is better able to make fine distinctions between frequencies - provided the frequencies are constant rather than varying. Shorter windows measure the frequencies less precisely, but are better able to follow rapid variation in frequencies.
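The trade-off can be seen in a small experiment. The sketch below is not Audacity's code: it assumes a steady 1000 Hz test tone and a Hann window, and it uses a slow, direct Fourier sum instead of an FFT. It measures how wide the tone's peak appears, in Hz, for a short and a long window. The function name `half_height_width` is hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 44_100  # Hz, as in the text


def half_height_width(window_size, tone_hz=1000.0, lo=940, hi=1060):
    """Width in Hz (in 1 Hz steps) of the tone's spectral peak at half its height."""
    # A steady tone shaped by a Hann window (an assumed choice for this sketch).
    samples = [
        math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE)
        * 0.5 * (1 - math.cos(2 * math.pi * n / (window_size - 1)))
        for n in range(window_size)
    ]
    magnitudes = []
    for f in range(lo, hi + 1):
        # Direct Fourier sum: how strongly the window's contents match frequency f.
        total = sum(
            s * cmath.exp(-2j * math.pi * f * n / SAMPLE_RATE)
            for n, s in enumerate(samples)
        )
        magnitudes.append(abs(total))
    peak = max(magnitudes)
    # Count the 1 Hz steps where the response is at least half the peak.
    return sum(1 for m in magnitudes if m >= peak / 2)


for size in (1024, 8192):
    print(f"window {size}: peak is about {half_height_width(size)} Hz wide")
```

The longer window should report a much narrower peak, which is the finer frequency distinction described above. The cost is that each measurement averages over a longer stretch of time.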