Completed Proposal Noise Removal
Proposal pages help us get from feature requests into actual plans. This proposal page is about improving the Audacity Noise Removal effect.
Proposal pages are used on an ongoing basis by the Audacity development team and are open to edits from visitors to the wiki. They are a good way to get community feedback on a proposal.
- Note: Proposals for Google Summer of Code projects are significantly different in structure, are submitted via Google's web app and may or may not have a corresponding proposal page.
Effective noise removal is a difficult and highly subjective task, with a trade-off between the degree of noise removal and the acceptable degree of damage to the quality of the remaining audio. There are a number of differing approaches to noise reduction. This proposal brings together current work on Audacity noise reduction in one place, with the aim of improving the Audacity Noise Removal effect to a level on par with, or superior to, other noise removal applications.
- Marco Diego Aurélio Mesquita
Noise removal is one of the most common tasks among Audacity users. Transferring vinyl/tape to CD, recording a podcast, making a music recording, and many other common Audacity tasks can benefit from effective noise reduction.
Noise Reduction methods
Current Audacity method
Does the noise gating (downward expansion) take into account the full spectrum audio level as well as the level in the narrow band that is being processed?
The advantage of this method is that if both the noise and the "music" are broadband, then one would expect that low level noise would be largely masked when the music is loud and would not require removal. Considering that removing the noise will create some "damage" to the music, it would often be preferable to rely on this masking effect rather than attempting to remove the noise.
The obvious limitation of this method is that if the noise has a much broader frequency band than the "music", then it will not be masked and will remain obvious while the level of the "music" is above the noise level threshold.
Paul L 3 Oct 2014: Pictures at right illustrate the most serious shortcoming of the present method. Each displays a linearly rising chirp (20 to 2000 Hz, amplitude 0.8, duration 2 minutes) mixed with white noise (amplitude 0.01) in spectrogram view with a window size of 2048. The first uses a rectangular window, the other a Hann(ing) window. The noise removal algorithm sees what you see in the first picture. (More precisely it only sees spectral data for every 1024th sample time.)
With the rectangular window, there is great "spectral leakage" except when the frequency is near a multiple of sample rate (here 44100 Hz) divided by 2048 (which gives 21.533 Hz). Thus the noise behind the sound is usually invisible to the removal algorithm. This explains why examples elsewhere on this page, of a sine tone mixed with noise, do not clean up well in the present Audacity version (2.0.6). Such examples can clean up -- but only if your tone's frequency is just right. Otherwise the powers for frequencies far from the tone, as computed with rectangular windows, do not drop below the noise thresholds, so those frequencies pass unaltered.
That explains Bill's "black box" observation, below. No, the intention of the algorithm was indeed to examine levels in each frequency band. But spectral leakage of the rectangular window impairs the algorithm.
Noise removal should instead be given data to analyze more like the second picture, in which a windowing function lessens spectral leakage. Choosing between a spectral gating or a subtraction procedure is of secondary importance. First we need good spectral data, else it's GIGO.
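The leakage described above can be reproduced numerically. The following is a minimal sketch, not the Audacity code: it uses a direct DFT sum instead of an FFT, and the helper names (`bin_power`, `leakage_db`) and the choice of bin 300 as a "far" frequency are assumptions made only for illustration. It compares how much of a tone's energy shows up far away from the tone with a rectangular window and with a Hann window.

```python
import math

SAMPLE_RATE = 44100
N = 2048  # window size from the discussion above
BIN_HZ = SAMPLE_RATE / N  # about 21.533 Hz between bin centres

def hann(n_points):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]

def bin_power(samples, k):
    """Power in DFT bin k, computed by direct summation (hypothetical helper)."""
    step = 2 * math.pi * k / len(samples)
    re = sum(x * math.cos(step * n) for n, x in enumerate(samples))
    im = sum(x * math.sin(step * n) for n, x in enumerate(samples))
    return re * re + im * im

def leakage_db(freq_hz, window, far_bin):
    """Level at far_bin relative to the strongest bin next to the tone, in dB."""
    tone = [0.8 * w * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n, w in enumerate(window)]
    near = int(freq_hz / BIN_HZ)
    peak = max(bin_power(tone, near), bin_power(tone, near + 1))
    # The floor avoids log10(0) when a tone sits exactly on a bin centre.
    return 10 * math.log10(max(bin_power(tone, far_bin), 1e-300) / peak)

rectangular = [1.0] * N
off_bin_tone = 46.5 * BIN_HZ  # halfway between two bin centres, about 1001 Hz
print("rectangular:", round(leakage_db(off_bin_tone, rectangular, 300)), "dB")
print("Hann:       ", round(leakage_db(off_bin_tone, hann(N), 300)), "dB")
```

With the tone halfway between two bin centres, the rectangular window leaves far more energy at the distant bin than the Hann window does. Moving the tone to an exact multiple of `BIN_HZ` makes the rectangular leakage all but vanish, which matches the "just right" frequency case described above.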
Hann windows are used in the present algorithm, but only on the synthesis side after inverse FFT of the modified spectra. It is good to use the window in synthesis to continuously blend the effects of modifying the spectra in successive overlapping windows.
I believe it would be best to use Hann or another window function on both the analysis and synthesis sides. This would require the FFT windows to hop by 1/4 of the window size (or less with other functions), not 1/2 as now. This may increase computation time but the benefit will be great.
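The hop requirement can be checked with a small overlap-add test. This is a sketch under the assumption that a periodic Hann window is applied once on analysis and once on synthesis, so the effective window is Hann squared; the function name `overlap_sum_range` is made up for this example.

```python
import math

def hann(n_points):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]

def overlap_sum_range(window, hop):
    """Return (min, max) of the steady-state sum of copies of `window` shifted by `hop`."""
    n = len(window)
    totals = [sum(window[i + shift] for shift in range(0, n, hop)) for i in range(hop)]
    return min(totals), max(totals)

N = 2048
w = hann(N)
w_squared = [x * x for x in w]  # Hann on analysis and again on synthesis

print("Hann, hop N/2:        ", overlap_sum_range(w, N // 2))
print("Hann squared, hop N/2:", overlap_sum_range(w_squared, N // 2))
print("Hann squared, hop N/4:", overlap_sum_range(w_squared, N // 4))
```

A single Hann window sums to a constant at a hop of half the window size, which is why the present synthesis-only scheme works. The squared window does not: at a hop of N/2 its sum ripples between 0.5 and 1.0, whereas at a hop of N/4 it is a constant 1.5 that can be divided out.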
Noise profile statistics, and the discrimination problem
When a noise profile is analyzed, a threshold is deduced for each of 1025 frequency bands.
When removing noise, for each window of 2048 samples (windows will overlap by 50%), and for each of 1025 frequency bands, the band must be classified as noise or not. If there is a mistaken classification as non-noise in a quiet section, the result is a "tinkle-bell" or musical noise, as the frequency passes unattenuated for the duration of one window. If the opposite error occurs in foreground sound, the result is a drop-out that distorts the sound.
In more detail, what happens in noise removal is this: for each band, and for each window, take the power for that window, and for one overlapping window, and take the minimum; compare that to the threshold. Perhaps this criterion was meant to reduce the occurrence of musical noise, because a brief random fluctuation upward would be ignored. However, it creates the opposite problem that a fluctuation downward in the signal can cause a drop-out. Perhaps with an overlap of 75% instead of 50%, it would be better to use more than two windows and take a median, avoiding both problems.
Generate a long track of white noise. Use some initial portion of the track as noise profile, then remove noise from the entire track, with the maximum reduction, and 0 for Sensitivity, Attack/Decay, and Frequency Smoothing. View the result in Spectrogram, and musical noise becomes visible. You will observe that the profile portion itself is free of such artifacts, and that the longer the profile is, the lower the density of artifacts in the remainder.
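The two classification rules can be compared on toy data. This is only a sketch: the choice of the following window as the "one overlapping window", the three-window median, the edge handling, and the function names are all assumptions for illustration, not the behaviour of the Audacity source.

```python
from statistics import median

def classify_min_of_two(powers, threshold):
    """True means 'noise': the minimum of a window and its next neighbour is below threshold."""
    last = len(powers) - 1
    return [min(powers[i], powers[min(i + 1, last)]) < threshold
            for i in range(len(powers))]

def classify_median(powers, threshold, radius=1):
    """True means 'noise': the median over 2*radius+1 neighbouring windows is below threshold."""
    last = len(powers) - 1
    return [median(powers[max(0, i - radius): min(last, i + radius) + 1]) < threshold
            for i in range(len(powers))]

threshold = 4
spike_in_noise = [1, 1, 9, 1, 1]   # brief upward fluctuation in a quiet passage
dip_in_signal = [9, 9, 1, 9, 9]    # brief downward fluctuation in foreground sound

print(classify_min_of_two(spike_in_noise, threshold))  # spike ignored: all noise
print(classify_min_of_two(dip_in_signal, threshold))   # dip spreads: two windows gated
print(classify_median(spike_in_noise, threshold))      # spike ignored: all noise
print(classify_median(dip_in_signal, threshold))       # dip ignored: all signal
```

In this toy case both rules suppress the isolated spike, but only the median also ignores the isolated dip, which is the drop-out case described above.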
The noise profiling algorithm performs the same minimum-of-two operation as will happen in noise removal, and takes the maximum of such minima over the length of the profile, to determine the threshold. But then the probability of fluctuations above the threshold in the non-sampled noise remains larger if the profile is shorter. It would be better to derive the threshold from the mean and perhaps other moments of the distribution of power values, so that the 99th percentile (say) of the distribution could be guessed from a short sample, and so the quality of noise removal would be less dependent on the length of the sample.
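The dependence on profile length can be illustrated with simulated band powers. This sketch makes strong assumptions that are not taken from the Audacity source: per-window powers are drawn independently from an exponential distribution, the pairwise minima are also treated as roughly exponential, and the function names are hypothetical. Under those assumptions a percentile can be estimated from the mean alone.

```python
import math
import random

def pairwise_minima(powers):
    """Minimum of each window's power and the next window's power."""
    return [min(a, b) for a, b in zip(powers, powers[1:])]

def threshold_max_of_minima(powers):
    """Rule described above: the largest pairwise minimum seen in the profile."""
    return max(pairwise_minima(powers))

def threshold_from_mean(powers, quantile=0.99):
    """Assumed alternative: the quantile of an exponential distribution with the observed mean."""
    minima = pairwise_minima(powers)
    mean = sum(minima) / len(minima)
    return -mean * math.log(1.0 - quantile)

rng = random.Random(1)  # fixed seed so runs are repeatable
noise = [rng.expovariate(1.0) for _ in range(2000)]  # simulated powers for one band
for length in (20, 200, 2000):
    profile = noise[:length]
    print(length,
          round(threshold_max_of_minima(profile), 2),
          round(threshold_from_mean(profile), 2))
```

The max-of-minima threshold can only grow as the profile gets longer, so a short profile tends to set it too low. The mean-based estimate has no such built-in bias, although it is only as good as the assumed distribution.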
The noise profiling algorithm could use a smaller hop between windows than the noise removal uses, and so examine more overlapping windows, which might also result in better statistics from short samples of noise.
The Sensitivity setting biases all the thresholds by an equal amount. When it is positive, it can reduce the musical noise, at the risk of more distortions. But this seems like compensation for the failure of the profiling algorithm to deduce good thresholds from the sample.
The Frequency Smoothing setting exists to mitigate the windows of musical noise that still pass through, by spreading the effect over neighboring bands. The Attack/Decay setting can also have the effect of blurring a "tinkle-bell" along the time axis. Neither reduces the number of these discrimination errors.
What should Isolate do?
The current Isolate feature interacts badly with attack/decay and gives strange results with the longest setting: "swelling" of non-noise leading up to a pause.
Should Isolate simply pass the portion of the sound that is classified as noise? Then Sensitivity should affect it, but then it should be unaffected by Noise Reduction, Attack/Decay, and Frequency Smoothing settings. This should be documented and perhaps those controls should be disabled when it is selected.
Should Isolate instead give the "residue," that is the difference between the noise-removed signal that would result, and the original signal? Then those settings would have effect.
Are both of the above useful, and so should there be a three-way choice instead?
Other known problems in 2.0.6
In no particular order:
- Lowest and highest frequency bins are always set to zero before inverse FFT.
- Frequency smoothing originally averaged gains by dB (geometrically) but was inadvertently changed to average arithmetically. (Subtle, hard to make a persuasive test case, but it makes less sense.)
- Taking a noise profile in a project at one sample rate, and then removing noise in another project with a different rate, produces incorrect results. This should simply be prohibited.
- "Release" is a better term than "Decay," and it would be useful to vary attack and release separately, because normal sounds are not time symmetrical. It may be important not to attenuate the decaying tail of a note too soon; there is less need for a long attack.
- The result always fades in from zero over the first 1024 samples.
- When analyzing the noise profile, the program may see trailing zero-padded windows which may contaminate the noise thresholds. To avoid this problem, we may need to impose a minimum length on the noise profile and give an error dialog when it is too short.
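To make the frequency smoothing item in the list above concrete, here is a minimal sketch of the difference between averaging gains in dB (geometrically) and averaging them arithmetically. The three-band neighbourhood and the function names are assumptions for the example, and gains are assumed to be strictly positive.

```python
import math

def smooth_arithmetic(gains):
    """Plain average of linear gains."""
    return sum(gains) / len(gains)

def smooth_geometric(gains):
    """Average of the gains in the log domain, i.e. averaging their dB values."""
    return math.exp(sum(math.log(g) for g in gains) / len(gains))

def to_db(gain):
    return 20 * math.log10(gain)

# One attenuated band (-40 dB) between two untouched neighbours (0 dB).
gains = [1.0, 0.01, 1.0]
print("arithmetic:", round(to_db(smooth_arithmetic(gains)), 2), "dB")
print("geometric: ", round(to_db(smooth_geometric(gains)), 2), "dB")
```

The geometric result is the mean of the dB values (one third of -40 dB here), while the arithmetic mean is dominated by the two untouched bands and barely attenuates the middle one.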
Gating according to level of noise sample
Gating each frequency band according to the level of the noise sample at that frequency.
Bill 21Apr11: We have to be careful and precise in our wording. The above statement says to me "The signal level is compared to the level of the noise sample in each frequency band (call this the band threshold). If the signal is above the band threshold it is passed unaltered; if it is below the threshold, gating (or downward expansion) is applied." The question is, what do we mean by "signal level"? It could mean the overall signal level, or the signal level in the given frequency band. Treating the AudNR effect as a black box, I am led to the conclusion that it is the former. In the one other NR effect that I have experience with, it seems to be the latter.
In spectral subtraction, the average noise spectrum is subtracted from the average signal spectrum, performed independently in the frequency bands critical to human hearing. This reduces the level of each frequency band by an amount proportional to the level of that frequency in the noise sample. This could be done more softly (power subtraction) or more strongly (magnitude subtraction).
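As a rough illustration of the softer and stronger variants, the per-band gain can be written as below. These are the usual textbook forms of power and magnitude subtraction, given here as a sketch; they are not taken from any of the patches on this page, and the function names are assumptions. `signal_power` is assumed to be greater than zero.

```python
import math

def power_subtraction_gain(signal_power, noise_power):
    """Subtract noise power from signal power, then convert back to an amplitude gain."""
    return math.sqrt(max(1.0 - noise_power / signal_power, 0.0))

def magnitude_subtraction_gain(signal_power, noise_power):
    """Subtract noise magnitude from signal magnitude."""
    return max(1.0 - math.sqrt(noise_power / signal_power), 0.0)

# A band whose power is four times the noise estimate for that band.
print(round(power_subtraction_gain(4.0, 1.0), 3))
print(round(magnitude_subtraction_gain(4.0, 1.0), 3))
```

For the same band, magnitude subtraction gives the lower gain, so it removes more. Both clamp to zero when the band is at or below the noise estimate.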
A noise reduction effect for Audacity developed by Jérôme M. Berger uses a technique that has been borrowed from image/video processing. The patch is available here.
Jerome describes the process as follows:
- The coring function is a soft thresholding function which gives very small gains for frequencies whose power is lower than the threshold and gives gains close to 1 for frequencies whose power is higher than the threshold.
- The idea being that frequencies whose power is higher than the threshold will be mostly signal which will mask the noise and must be kept, while the other frequencies will be mostly noise which must be removed.
- Mathematically, the coring function I used is: 1-exp(-|FFT(f)|^2/s) where f is the frequency and s is a strength derived from the parameters.
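A minimal sketch of such a coring gain follows. It assumes a positive strength s and a negative exponent, which is what the description of very small gains below the threshold and gains close to 1 above it requires. The function name and the sample values are illustrative only and are not taken from the patch.

```python
import math

def coring_gain(power, strength):
    """Soft threshold: near 0 for power well below strength, near 1 well above it."""
    return 1.0 - math.exp(-power / strength)

strength = 1.0
for power in (0.01, 0.1, 1.0, 10.0):
    print(power, round(coring_gain(power, strength), 4))
```

Unlike a hard gate, the gain rises smoothly with power, so bands near the threshold are partly attenuated instead of being switched on or off.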
The current version of this effect has a slope control for setting the "color" of the noise that is to be removed (dB per decade). Some examples of this filter in use can be found here: Examples. ("Bruts" are the originals, "Filtre" are filtered).
A significant difference between this effect and the Audacity Noise Removal is that Noise Coring reduces hiss from within other sounds as well as during quiet sections. This makes it particularly useful for reducing noise such as tape hiss, particularly from music where the hiss is evident through the music.
The current version of this patch has a flaw: it sometimes creates small glitches in the processed audio, possibly because of the way the envelope changes from one window to the next. Jérôme is currently looking at adding "time smoothing code" to cure this issue.
Summary of Current State
So far there has been significant improvement by the introduction of the Sensitivity Slider, and fixing the attack/decay times slider.
It has been observed that increasing the FFT size produces improved noise removal with less damage caused to the remaining audio.
Sine wave at 400 Hz with background white noise
How to generate it:
- 1 Generate 30 seconds of white noise, amplitude 0.1
- 2 Into a second track, generate 10 seconds sine wave, 100 Hz, 0.8 amplitude, starting at about 5.0 seconds along the time line.
- 3 Select both tracks then Mix and Render.
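For testing outside the Audacity GUI, a comparable signal can be synthesized in a few lines. This is a sketch under assumptions: the noise is taken to be uniformly distributed within the stated amplitude, mixing is taken to be a plain sum of the two tracks, and the function name and parameters are hypothetical. The tone frequency is a parameter so that either frequency mentioned in this section can be used.

```python
import math
import random

def make_test_signal(sample_rate=44100, total_s=30.0, noise_amp=0.1,
                     tone_hz=100.0, tone_amp=0.8, tone_start_s=5.0,
                     tone_s=10.0, seed=0):
    """White noise for the whole length, plus a sine tone over part of it."""
    rng = random.Random(seed)
    start = int(tone_start_s * sample_rate)
    stop = start + int(tone_s * sample_rate)
    samples = []
    for n in range(int(total_s * sample_rate)):
        value = rng.uniform(-noise_amp, noise_amp)
        if start <= n < stop:
            value += tone_amp * math.sin(2 * math.pi * tone_hz * (n - start) / sample_rate)
        samples.append(value)
    return samples

# A short, low-rate version keeps the demonstration quick.
demo = make_test_signal(sample_rate=8000, total_s=3.0, tone_start_s=1.0, tone_s=1.0)
print(len(demo), round(max(abs(x) for x in demo), 3))
```

With the default amplitudes the mixed peak stays at or below 0.9, so the sum does not clip.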
Result: Sample 1
Proposed Patches and Modifications
Edgar, 3 October 2014 - I propose that the current Noise Removal code be left exactly as is for now and that an entirely new effect be created (probably based on the current Noise Removal code). This will make it trivial to experiment on the new code without the fear that it would be difficult to back out unwanted/unwarranted changes in the existing Noise Removal code in case the entire project was not ready for the next release. This would make it very easy to #define as experimental the entire new Effect's code base. Ultimately, removing the old Noise Removal code (and changing the name strings in the new code if required) would also be very easy.
Patch to add spectral subtraction slider to Audacity 2.0.5 source release: Spectral subtraction slider patch.
Spectral subtraction with aggressiveness setting
Patch to add spectral subtraction slider with aggressiveness setting to Audacity 2.0.6 source release: Spectral subtraction slider patch. Setting the aggressiveness slider to 0 makes the noise removal effect behave as a power spectral subtraction algorithm. Setting it to 0.5 makes it behave as a magnitude spectral subtraction algorithm. Setting it to 1.0 makes it even more aggressive than magnitude spectral subtraction.
Test results should include the name of the source file, patches and modifications used, what settings were used, and full details of any additional processing. The source file should be listed in the Test Samples section above.
Spectral subtraction vs standard Audacity 2.0.6
Standard Audacity 2.0.6 noise removal applied to Sample 1. Configuration used: Noise Reduction: 48, Sensitivity: 0, Frequency Smoothing: 200, Attack/decay: 0.1. Result: Sample 1 with standard Audacity 2.0.6 noise removal
Audacity 2.0.6 noise removal with the spectral subtraction slider patch applied to Sample 1. Configuration used: Noise Reduction: 48, Sensitivity: 0, Subtraction: 1.0, Frequency Smoothing: 200, Attack/decay: 0.1. Result: Sample 1 with Audacity 2.0.6 noise removal with spectral subtraction