Sanitizing speech recordings made with portable audio recorders

From Audacity Wiki

Audacity SpeechCast Processing

I have just started toying with mobile voice recordings, including personal musings, presentations at work or at events, a course I am taking, and speech recordings of any kind in general. The key aspects of this recorded speech are little control over the recording environment and the use of a variety of recording tools, many of which are likely to be of less than ideal quality for speech recording. For example, for the personal notes I have control over the environment and the intelligence-to-noise ratio, and can use Audacity on the PC with a fairly good microphone to record the data. However, in the other environments, where portable recording devices are required, I am limited in this regard. I cannot pre-arrange decent recording equipment most of the time, and standing next to the presenter/lecturer with my Zen in order to get a better quality voice recording is not an option. :)

Generally low audio volume, a poor intelligence-to-noise ratio, and a distracting high-pitched 'hiss' in the noise are just some of the problems I am experiencing.
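What is called the intelligence-to-noise ratio here is more commonly known as the signal-to-noise ratio. As a rough illustration of how it can be put into numbers, here is a minimal Python sketch. It assumes the samples have already been read into lists of floats between -1.0 and 1.0, and that a stretch of speech and a stretch of background noise have been picked out by hand; `rms` and `snr_db` are hypothetical helper names, not Audacity functions.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Speech RMS over noise RMS, expressed in decibels.

    The 'speech' segment still contains the background noise,
    so this is only a rough estimate.
    """
    noise_level = rms(noise)
    speech_level = rms(speech)
    if noise_level == 0.0:
        return math.inf    # no measurable noise at all
    if speech_level == 0.0:
        return -math.inf   # nothing but silence in the speech segment
    return 20.0 * math.log10(speech_level / noise_level)
```

A speech level ten times the noise level comes out as 20 dB. When the background noise is "close to the level of the speaker", the figure will be only a few dB.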

I have today for the first time begun looking for good audio processing tools. After playing with a few I am beginning to settle on Audacity.

However, I am at a severe disadvantage in having no idea of the best practices for voice processing in this fashion.

I will attempt to outline what I have 'learned' so far, and where I think more info would be useful to people in general who record speech sound-lets for redistribution, private records, etc. (Speech/Pod Casts)

My Process to Date

1) My portable recorder (like most portables, I imagine) has limited mic quality and gain. It generates WAV files, which I import into Audacity.

2) The import shows up as stereo, 16 kHz, 32-bit float audio (according to the info box on the left of the Audacity window).

3) I then immediately save the import as an Audacity project, copying the source WAV into the Audacity project. I then remove the source WAV file, which is now redundant (though I keep a copy of the original on a backup medium until I am happy with the processing of the data).

4) I select the envelope tool and 'widen' the envelope in preparation for normalisation (in the hope of getting a better normalisation, i.e. a larger increase in volume).

5) Next I select all the data and perform normalisation (the audio level of a speaker at a distance of a few metres is low, so this increases the audio level).

6) Unfortunately, the background noise was close to the level of the speaker most of the time, so it has now also increased in volume and is still prevalent. I select 'quiet' pieces between words/sentences and use them as 'noise training data' with the "Get Noise Profile" option of the Noise Removal effect.

7) Armed with this, I select the data for some range on either side of the training data and apply the noise filter. I repeat this across the entire recording.

8) I revisit the file searching for gaps in the speech, noises that are not intelligence, etc., and silence or cut them.

9) I have also applied the 'Click Removal' effect, but I am not sure it has done much.

10) My audio recorder (a Zen Vision:M portable thingy) has a drive in it which spins up at various intervals. Then there is a 'click' and the spin 'whine' stops. I presume this is just the buffer being written to disk, but hearing the disk in the recording is annoying, especially with low-volume speakers. I manually look for these points (which, like people coughing, are given away by a quick 'spike' in the waveform representation) and silence or remove them. Is there a better way to deal with these?

11) There is some 'hiss' and other high-pitched noise in the background. How can I get rid of this?
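Steps 5 and 10 above can also be sketched outside Audacity. The following is a minimal Python sketch under stated assumptions, not a description of what Audacity's Normalize effect does internally. It assumes the samples are already a list of floats between -1.0 and 1.0. `peak_normalise` and `find_spikes` are hypothetical helper names, and the 20 ms window and the 4x threshold are guesses that would need tuning per recording.

```python
def peak_normalise(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def find_spikes(samples, rate, window_s=0.02, ratio=4.0):
    """Return start times (in seconds) of short windows whose peak is
    more than `ratio` times the median window peak."""
    if not samples:
        return []
    size = max(1, int(rate * window_s))
    # Peak level of each consecutive window.
    peaks = [max(abs(s) for s in samples[i:i + size])
             for i in range(0, len(samples), size)]
    typical = sorted(peaks)[len(peaks) // 2]  # median window peak
    if typical == 0.0:
        return []
    return [i * size / rate
            for i, p in enumerate(peaks) if p > ratio * typical]
```

The times returned by `find_spikes` are only candidates to inspect by ear; coughs, disk clicks and loud syllables all look alike to it. One thing the sketch does make visible: peak normalisation is limited by the loudest sample, so a single click caps how far the speech can be raised. It may therefore be worth silencing the spikes before normalising rather than after.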

Areas requiring further attention (?)

1) Some more information specific to speech processing, especially when the recording is -not- made with Audacity, but rather with average or poor quality portable recorders.

2) I have read articles hinting at unneeded frequencies or frequency bands, mostly bass. I need to look deeper into this. What are they? How can one remove them in Audacity? Is there a danger of losing intelligence?

3) How to use 'filters' to remove data outside the range of the human voice. Can a specific speaker be profiled via a short speech segment, and that profile then used as a mask to remove everything else?

4) What are good audio file settings for speech? I am referring specifically to sample format (bit depth) and sample rate, stereo/mono, etc. How can one easily re-sample an audio file containing only speech in order to make it a smaller file, yet with all the intelligence still intact?
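Questions 2 and 4 above can be made a little more concrete with a sketch. The Python below is an illustration only, not how Audacity implements its filters. It shows a stereo-to-mono downmix, a simple first-order high-pass filter for removing low bass, and the arithmetic for uncompressed file size. The helper names `to_mono`, `high_pass` and `pcm_bytes` are hypothetical, and the 80 Hz cutoff is an assumed starting point, not a recommendation from any source.

```python
import math

def to_mono(left, right):
    """Average the two channels of a stereo recording into one."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def high_pass(samples, rate, cutoff_hz=80.0):
    """First-order (RC-style) high-pass filter.

    Attenuates content below cutoff_hz; the roll-off is gentle,
    so this is much milder than a steep equaliser cut.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for s in samples:
        prev_out = alpha * (prev_out + s - prev_in)
        prev_in = s
        out.append(prev_out)
    return out

def pcm_bytes(seconds, rate, channels, bytes_per_sample):
    """Size of the raw audio data in an uncompressed file."""
    return int(seconds * rate * channels * bytes_per_sample)
```

For one minute at 16 kHz, `pcm_bytes(60, 16000, 2, 4)` (stereo, 32-bit) gives 7,680,000 bytes, while `pcm_bytes(60, 16000, 1, 2)` (mono, 16-bit) gives 1,920,000 bytes, a quarter of the size at the same sample rate. Whether mono 16-bit keeps all the intelligence intact for a given recording is something to check by ear.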


I am sure I am missing loads more good tricks for processing speech data, so please chip in. This is a cool application, and good info on accomplishing various tasks with Audacity would be a great help to me, and to lots of others from what I am reading online.

With the advent of a multitude of portable digital audio players and recorders, and with it the desire to produce general SpeechCasts, we have entered a new era in which the general public, not knowing the signal processing tricks required to sanitise speech well, will want to do just that.

Any tutorials, automation, etc would be very very helpful in my opinion.

I will report back if I find anything else of interest. :)