Proposal Noise Removal
Proposal pages help us get from feature requests to actual plans. This proposal page is about improving the Audacity Noise Removal effect.
Proposal pages are used on an ongoing basis by the Audacity development team and are open to edits from visitors to the wiki. They are a good way to get community feedback on a proposal.
- Note: Proposals for Google Summer of Code projects are significantly different in structure, are submitted via Google's web app and may or may not have a corresponding proposal page.
Effective noise removal is a difficult and highly subjective task, with a trade-off between the degree of noise removal and the acceptable degree of damage to the quality of the remaining audio. There are a number of differing approaches to noise reduction. The aim of this proposal is to bring together current work on Audacity noise reduction in one place, so as to improve the Audacity Noise Removal effect to a level on par with, or superior to, other noise removal applications.
Developer/QA/Guest Programmers backing
- Steve Daulton
- Marco Diego Aurélio Mesquita
- Paul L
- Ed Musgrove (Edgar on the forum): As a heavy user of Audacity's Noise Removal function and an occasional contributor to Audacity's code base (but someone with absolutely no understanding of the algorithms), I feel comfortable offering my help as a beta tester and someone who can do lightweight C++ code review.
- Andrew Hallendorff
Noise removal is one of the most common tasks among Audacity users. Transferring vinyl/tape to CD, recording a podcast, making a music recording, and many other common Audacity tasks can benefit from effective noise reduction.
Noise Reduction methods
Current Audacity method
Does the noise gating (downward expansion) take into account the full spectrum audio level as well as the level in the narrow band that is being processed?
The advantage of this method is that if both the noise and the "music" are broadband, then low-level noise would be expected to be largely masked when the music is loud and would not require removal. Since removing the noise will create some "damage" to the music, it would often be preferable to rely on this masking effect rather than attempting to remove the noise.
The obvious limitation of this method is that if the noise has a much broader frequency band than the "music", then it will not be masked and will remain obvious while the level of the "music" is above the noise level threshold.
Critique of current Audacity Noise Removal by Paul Licameli
Overview of the noise removal procedure
Note that the comments at the top of NoiseRemoval.cpp are not completely correct about the sequence. I do not describe here Isolate (which is buggy) or the noise profiling procedure, which is presumed to have been done beforehand.
Divide the sound into windows of 2048 samples. Consecutive windows overlap by 50%.
Create a queue, each record of which holds FFT coefficients of one window, and an array of gain factors to apply to each of the 1025 frequency bands.
While data remain in the sound or the queue:
- Add a record to the queue. Initialize with FFT coefficients of the next window (adding trailing zero samples as needed) and gains all equal to the noise reduction setting.
- Examine the record in the middle of the queue. For each of its bands:
- Determine whether this band is noise, using data collected from the noise profile.
- If it is not noise, raise its gain to 0 dB.
- Propagate raised gain according to the attack/decay setting. This means: define a decay, both forward and backward in time, from 0 dB down to the noise reduction. Ensure that no gain, for the same band, in the nearby windows, is below this decay curve. The other gains may remain above it, on the attack side (going backward in time, to windows that already passed through these steps).
- Process the oldest record in the queue (if it has grown long enough):
- Adjust gains according to the frequency smoothing setting.
- Scale the FFT coefficients by the gain factors.
- Perform inverse FFT to obtain a new window of samples.
- Multiply this new window by a Hann window function, and add to the result sound.
- Remove from the queue.
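The attack/decay propagation step above can be illustrated for a single frequency band. This is a minimal sketch, not the NoiseRemoval.cpp code: it works on a whole array of per-window decisions rather than on a streaming queue, and it assumes a decay curve that is linear in dB per window. The name `propagate_gains` and its parameters are hypothetical.

```python
import math


def propagate_gains(is_signal, reduction_db, decay_db_per_window):
    """Per-window gains (dB) for one band after attack/decay propagation.

    is_signal[w] is True when window w was classified as non-noise.
    Hypothetical sketch: the decay curve is assumed linear in dB.
    """
    n = len(is_signal)
    gains = [-reduction_db] * n  # every record starts at the noise reduction setting
    # Number of windows after which the decay curve reaches the floor.
    reach = math.ceil(reduction_db / decay_db_per_window)
    for i, signal in enumerate(is_signal):
        if not signal:
            continue
        # Raise window i to 0 dB and keep its neighbours, backward and forward
        # in time, from falling below a curve decaying from 0 dB to -reduction_db.
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            curve = 0.0 - decay_db_per_window * abs(j - i)
            gains[j] = max(gains[j], curve)
    return gains


gains = propagate_gains([False, False, False, True, False, False, False], 12.0, 6.0)
# gains == [-12.0, -12.0, -6.0, 0.0, -6.0, -12.0, -12.0]
```

The double loop is the simplest way to express "no gain may fall below the curve"; a streaming implementation only needs to revisit the few windows within `reach` of the one just classified.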
The big problem
Pictures at right illustrate the most serious shortcoming of the present method. Each displays a linearly rising chirp (20 to 2000 Hz, amplitude 0.8, duration 2 minutes) mixed with white noise (amplitude 0.01) in spectrogram view with a window size of 2048. The first uses a rectangular window, the second a Hann(ing) window. The noise removal algorithm sees what you see in the first picture. (More precisely, it sees spectral data only at every 1024th sample.)
With the rectangular window, there is great "spectral leakage" except when the frequency is near a multiple of the sample rate (here 44100 Hz) divided by 2048, that is, of 21.533 Hz. Thus the noise behind the sound is usually invisible to the removal algorithm. This explains why examples elsewhere on this page, of a sine tone mixed with noise, do not clean up well in the present Audacity version (2.0.6). Such examples can clean up, but only if the tone's frequency is just right. Otherwise the powers at frequencies far from the tone, as computed with rectangular windows, do not drop below the noise thresholds, so those frequencies pass unaltered.
That explains Bill's "black box" observation, below. No, the intention of the algorithm was indeed to examine levels in each frequency band. But spectral leakage of the rectangular window impairs the algorithm.
Noise removal should instead be given data to analyze more like the second picture, in which a windowing function lessens spectral leakage. Choosing between a spectral gating or a subtraction procedure is of secondary importance. First we need good spectral data; otherwise it's garbage in, garbage out.
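The leakage difference can be checked numerically. This sketch computes single DFT bins directly (no FFT library) for a sine placed halfway between bins, once with a rectangular window and once with a periodic Hann window, and reports the level at a distant bin relative to the tone's peak. The function names and the chosen bins are illustrative assumptions, not taken from Audacity.

```python
import cmath
import math


def bin_magnitude(x, k):
    """|X[k]| for the DFT of x, computed directly in O(N)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))


def leakage_db(window, n=2048, tone_bin=10.5, far_bin=200):
    """Level at far_bin relative to the tone's peak bin, in dB.

    tone_bin=10.5 puts the tone halfway between bins (about 226 Hz at
    44100 Hz); far_bin=200 is about 4.3 kHz.
    """
    if window == "rectangular":
        w = [1.0] * n
    elif window == "hann":
        w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    else:
        raise ValueError(window)
    x = [w[i] * math.sin(2 * math.pi * tone_bin * i / n) for i in range(n)]
    peak = max(bin_magnitude(x, math.floor(tone_bin)), bin_magnitude(x, math.ceil(tone_bin)))
    return 20 * math.log10(bin_magnitude(x, far_bin) / peak)


print(leakage_db("rectangular"), leakage_db("hann"))
```

The Hann figure comes out many tens of dB below the rectangular one, so low-level noise in distant bands is no longer hidden under leaked tone energy.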
Hann windows are used in the present algorithm, but only on the synthesis side after inverse FFT of the modified spectra. It is good to use the window in synthesis to continuously blend the effects of modifying the spectra in successive overlapping windows.
I believe it would be best to use Hann or another window function on both the analysis and synthesis sides. This would require the FFT windows to hop by 1/4 of the window size (or less with other functions), not 1/2 as now. This may increase computation time but the benefit will be great.
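The 1/4 hop requirement follows from overlap-add: when every gain is 1, the output is the input scaled by the sum, over overlapping frames, of the analysis window times the synthesis window, and that sum should be constant. The sketch below checks this for periodic Hann windows; the helper names are hypothetical.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_sum(window, hop, power):
    """Steady-state sum of window**power over all frames covering each sample.

    power=1: window on the synthesis side only.
    power=2: the same window on both the analysis and synthesis sides.
    """
    n = len(window)
    if n % hop:
        raise ValueError("hop must divide the window length")
    return [sum(window[i + m * hop] ** power for m in range(n // hop)) for i in range(hop)]


def spread(values):
    return max(values) - min(values)


w = hann(2048)
print(spread(overlap_sum(w, 1024, 1)))  # synthesis-only Hann, half-window hop: flat (sum 1)
print(spread(overlap_sum(w, 1024, 2)))  # Hann on both sides, half-window hop: ripples
print(spread(overlap_sum(w, 512, 2)))   # Hann on both sides, quarter-window hop: flat (sum 1.5)
```

With Hann on both sides and a quarter-window hop the sum is a constant 1.5, so dividing the output by 1.5 restores unity gain; with a half-window hop the sum ripples between 0.5 and 1.0.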
Noise profile statistics, and "tinklebells"
When a noise profile is analyzed, a threshold is deduced for each of 1025 frequency bands. Each threshold should be high enough that the power in that band in windows of other noise remains below that threshold, with high enough probability to make artifacts infrequent.
Generate a long track of white noise. Use some initial portion of the track as the noise profile, then remove noise from the entire track with the maximum reduction and 0 for Sensitivity, Attack/Decay, and Frequency Smoothing. View the result in Spectrogram view: musical noise becomes visible where excursions above the thresholds occur. You will observe that the profile portion itself is free of such artifacts, and that the longer the profile is, the lower the density of artifacts in the remainder.
The noise profiling algorithm performs the same minimum operation as will happen in noise removal (see next section), and takes the maximum of such minima over the length of the profile, to determine each threshold. But then the probability of fluctuations above the thresholds in the non-sampled noise remains larger if the profile is shorter. Some bands have good thresholds, by luck, because large excursions occur early in the track and are accounted for by the profiling algorithm. For other bands, more noise must be examined before there is good luck.
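The "maximum of minima" profiling and its dependence on profile length can be simulated. This sketch assumes independent, exponentially distributed band powers as stand-in noise (real overlapping windows are correlated), and all function names are hypothetical.

```python
import random


def pairwise_minima(powers):
    """Per band, the minimum of each window and the following one.

    powers[w][b] is the power of band b in window w.
    """
    return [[min(a, b) for a, b in zip(powers[w], powers[w + 1])]
            for w in range(len(powers) - 1)]


def old_profile_thresholds(powers):
    """Per band, the maximum over the profile of those pairwise minima."""
    return [max(column) for column in zip(*pairwise_minima(powers))]


def exceed_rate(powers, thresholds):
    """Fraction of (window, band) cells whose pairwise minimum exceeds the threshold."""
    cells = [m > t for row in pairwise_minima(powers) for m, t in zip(row, thresholds)]
    return sum(cells) / len(cells)


# Stand-in noise: 2000 windows x 64 bands of independent exponential powers.
rng = random.Random(1)
noise = [[rng.expovariate(1.0) for _ in range(64)] for _ in range(2000)]
rest = noise[1000:]  # never used for profiling
for length in (10, 100, 1000):
    print(length, exceed_rate(rest, old_profile_thresholds(noise[:length])))
```

Because a longer profile only adds candidates to each maximum, its thresholds can never be lower, so the rate of "tinklebells" in the unprofiled remainder can only fall as the profile grows.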
It would be better to derive the threshold from the mean and perhaps other moments of the distribution of power values, so that some quantile of the distribution (say, 0.9999) could be guessed from a short sample, and so the quality of noise removal would be less dependent on the length of the sample.
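As one possible form of this idea, suppose the power of noise in a single band is exponentially distributed (the chi-squared distribution with two degrees of freedom, which holds for a DFT bin of Gaussian noise). Then the mean alone fixes every quantile, and a threshold for any exceedance probability follows directly. This is a sketch under that assumption; the function name is hypothetical, and real profiles may deviate from it.

```python
import math
import random


def quantile_threshold(mean_power, exceed_probability):
    """Threshold t with P(power > t) = exceed_probability.

    Assumes band power is exponentially distributed with the given mean,
    so P(power > t) = exp(-t / mean_power).
    """
    return mean_power * math.log(1.0 / exceed_probability)


# Estimate the mean from a short sample, then aim at the 0.9999 quantile.
rng = random.Random(2)
short_sample = [rng.expovariate(1.0) for _ in range(200)]
mean_estimate = sum(short_sample) / len(short_sample)
threshold = quantile_threshold(mean_estimate, 1e-4)
```

Under this model the 0.9999 quantile is about 9.21 times the mean (4 ln 10), and the mean can be estimated reasonably well from a much shorter sample than is needed to observe a 1-in-10000 excursion directly.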
The noise profiling algorithm could use a smaller hop between windows than the noise removal uses, and so examine more overlapping windows, which might also result in better statistics from short samples of noise.
The Sensitivity setting biases all the thresholds by an equal amount. When it is positive, it can reduce the musical noise, at the risk of more distortions (described below). But this seems like compensation for the failure of the profiling algorithm to deduce good thresholds to begin with. The problem with biasing all thresholds equally is that some of the lucky good thresholds might become excessive just so that the unlucky ones become adequate.
Perhaps "Sensitivity" should have new meaning when other statistics are gathered. It might instead adjust the quantile of the inferred distribution that determines the threshold. It would then no longer be a dB value.
The Frequency Smoothing setting exists to mitigate the windows of musical noise that still pass through, by spreading the effect over neighboring bands. The Attack/Decay setting can also have the effect of blurring a "tinkle-bell" along the time axis. These could also mitigate the distortions in non-noise. Neither reduces the number of these discrimination errors.
The opposite discrimination error: dropouts
Thresholds must be high enough, but not so high as to distort the signal by misclassifying parts of it as noise. The opposite error to a "tinkle-bell" in the background is a drop-out that distorts the foreground sound. (Example picture or clip?)
In more detail, what happens in noise removal is this: for each band, and for each window, take the power for that window, and for one overlapping window, and take the minimum; compare that to the threshold. (That is what happens at the usual sample rate of 44100 Hz. More windows might be checked with greater rates.) Perhaps this criterion was meant to reduce the occurrence of musical noise, because a brief random fluctuation upward would be ignored. However, this increases vulnerability to dropouts. Perhaps with window overlaps of 75% instead of 50%, it would be better to use more than two windows and take a median, steering between both problems.
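The two criteria can be compared on a single band. This sketch treats "at or below the threshold" as noise, and assumes five windows (75% overlap) for the median variant; both function names are hypothetical.

```python
import statistics


def is_noise_old(band_powers, w, threshold):
    """2.0.6-style test: the minimum of window w and the next window vs threshold."""
    return min(band_powers[w], band_powers[w + 1]) <= threshold


def is_noise_median(band_powers, w, threshold, half_span=2):
    """Suggested alternative: median of window w and its neighbours (5 windows by default)."""
    if w < half_span or w + half_span >= len(band_powers):
        raise ValueError("window too close to the edge for this sketch")
    neighbourhood = band_powers[w - half_span:w + half_span + 1]
    return statistics.median(neighbourhood) <= threshold


dip = [10.0, 10.0, 0.5, 10.0, 10.0]    # steady tone with one low window
spike = [0.5, 0.5, 20.0, 0.5, 0.5]     # noise with one high excursion
print(is_noise_old(dip, 1, 1.0), is_noise_median(dip, 2, 1.0))
print(is_noise_old(spike, 2, 1.0), is_noise_median(spike, 2, 1.0))
```

Both criteria reject the isolated noise spike, but the minimum also drops window 1 of the tone next to the dip (a dropout), while the median keeps the tone.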
What should Isolate do?
The current Isolate feature interacts badly with attack/decay and gives strange results with the longest setting: "swelling" of non-noise leading up to a pause.
Should Isolate simply pass the portion of the sound that is classified as noise? Then Sensitivity should affect it, but then it should be unaffected by Noise Reduction, Attack/Decay, and Frequency Smoothing settings. This should be documented and perhaps those controls should be disabled when it is selected.
Should Isolate instead give the "residue," that is the difference between the resulting noise-removed signal and the original signal? Then those settings would have effect.
Are both of the above useful, and so should there be a three-way choice instead?
Other known problems in 2.0.6
In no particular order:
- Attack/Decay propagation is needlessly inefficient.
- Lowest and highest frequency bins are always set to zero before inverse FFT.
- Frequency smoothing originally averaged gains by dB (geometrically) but was inadvertently changed to average arithmetically. (Subtle, hard to make a persuasive test case, but it makes less sense.)
- Taking a noise profile in a track at one sample rate, and then removing noise in another track with a different rate, makes incorrect results. This should simply be prohibited. Taking a noise profile over two or more tracks at different rates should also be prohibited.
- "Release" is a better term than "Decay," and it would be useful to vary attack and release separately, because normal sounds are not time symmetrical. It may be important not to attenuate the decaying tail of a note too soon. There is less need for a long attack.
- The result always fades in from zero over the first 1024 samples.
- When analyzing the noise profile, the program may see trailing zero-padded windows which may contaminate the noise thresholds. To avoid this problem, we may need to impose a minimum length on the noise profile and give an error dialog when it is too short.
- For certain noise profiles there may be edge artifacts in the last 1024 samples at the end of the processed selection, where noise fails to attenuate or even rises. Mostly, this occurs only for artificial examples of very peaky noise. I am less certain what to do about that.
- When it is enabled, the "OK" button should have keyboard focus, not "Get Noise Profile," for consistency with other effects.
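The frequency-smoothing item in the list above contrasts averaging gains arithmetically with averaging them in dB (geometrically). This sketch shows how far the two can differ for one band with neighbours at very different gains; the function names are hypothetical and all gains are assumed positive.

```python
import math


def neighbourhood(gains, i, half_width):
    return gains[max(0, i - half_width):i + half_width + 1]


def smooth_arithmetic(gains, i, half_width):
    """Average of the linear gain factors."""
    nb = neighbourhood(gains, i, half_width)
    return sum(nb) / len(nb)


def smooth_geometric(gains, i, half_width):
    """Average in dB, i.e. the geometric mean of the linear gains (all must be > 0)."""
    nb = neighbourhood(gains, i, half_width)
    return math.exp(sum(math.log(g) for g in nb) / len(nb))


gains = [1.0, 0.01, 0.01]  # 0 dB next to two bands at -40 dB
print(smooth_arithmetic(gains, 1, 1))                     # 0.34, about -9.4 dB
print(20 * math.log10(smooth_geometric(gains, 1, 1)))     # about -26.7 dB, the mean of the dB values
```

A single un-attenuated neighbour pulls the arithmetic average up by much more than the dB average, which is why averaging in dB makes more sense for gains spanning tens of dB.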
Other methods that have been proposed
Gating according to level of noise sample
Gating each frequency band according to the level of the noise sample at that frequency.
- Bill 21Apr11: We have to be careful and precise in our wording. The above statement says to me "The signal level is compared to the level of the noise sample in each frequency band (call this the band threshold). If the signal is above the band threshold it is passed unaltered; if it is below the threshold, gating (or downward expansion) is applied." The question is, what do we mean by "signal level"? It could mean the overall signal level, or the signal level in the given frequency band. Treating Audacity Noise Removal as a black box, I am led to the conclusion that it is the former. In the one other Noise Removal effect that I have experience with, it seems to be the latter.
In spectral subtraction, the average noise spectrum is subtracted from the average signal spectrum, performed independently in the frequency bands critical to human hearing. This reduces the level of each frequency band by an amount proportional to the level of that frequency in the noise sample. This could be done more softly (power subtraction) or more strongly (magnitude subtraction).
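For one band of one window, the two variants translate into different gain factors. This sketch writes both as gains applied to the observed spectrum, given the observed power and the noise power from the profile, with negative results floored at zero; the function names are hypothetical.

```python
import math


def power_subtraction_gain(signal_power, noise_power):
    """Gain g with (g*|X|)^2 = |X|^2 - |N|^2, floored at zero."""
    return math.sqrt(max(0.0, 1.0 - noise_power / signal_power))


def magnitude_subtraction_gain(signal_power, noise_power):
    """Gain g with g*|X| = |X| - |N|, floored at zero."""
    return max(0.0, 1.0 - math.sqrt(noise_power / signal_power))


for ratio in (1.5, 4.0, 100.0):  # observed power as a multiple of noise power
    print(ratio, power_subtraction_gain(ratio, 1.0), magnitude_subtraction_gain(ratio, 1.0))
```

At four times the noise power, power subtraction keeps a gain of about 0.87 while magnitude subtraction keeps 0.5; magnitude subtraction never attenuates less than power subtraction, which is the sense in which it is the stronger of the two.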
A noise reduction effect for Audacity developed by Jérôme M. Berger uses a technique that has been borrowed from image/video processing. The patch is available here.
Jérôme describes the process as follows:
- The coring function is a soft thresholding function which gives very small gains for frequencies whose power is lower than the threshold and gives gains close to 1 for frequencies whose power is higher than the threshold.
- The idea being that frequencies whose power is higher than the threshold will be mostly signal which will mask the noise and must be kept, while the other frequencies will be mostly noise which must be removed.
- Mathematically, the coring function I used is: 1-exp(-|FFT(f)|^2/s) where f is the frequency and s is a strength derived from the parameters.
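The coring gain can be evaluated per band as below. This sketch writes the exponent with a negative sign, which is what the described behaviour (small gains below the threshold, gains near 1 above it) requires. Here `power` stands for |FFT(f)|^2 and `strength` for s; how s is derived from the effect's parameters, including the slope control, is not described, so it is a plain argument in this hypothetical function.

```python
import math


def coring_gain(power, strength):
    """Soft-threshold gain: near 0 when power << strength, near 1 when power >> strength."""
    return 1.0 - math.exp(-power / strength)


for p in (0.01, 1.0, 10.0):
    print(p, coring_gain(p, 1.0))
```

Unlike a hard gate, the gain rises smoothly through the threshold region (about 0.63 when the power equals the strength), which is what lets the effect reduce hiss inside louder sounds as well as in quiet passages.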
The current version of this effect has a slope control for setting the "color" of the noise that is to be removed (dB per decade).
Some examples of this filter in use can be found here: Examples. ("Bruits" are the originals, "Filtre" are filtered).
A significant difference between this effect and the Audacity Noise Removal is that Noise Coring reduces hiss from within other sounds as well as during quiet sections. This makes it particularly useful for reducing noise such as tape hiss, especially in music where the hiss is evident through the music.
The current version of this patch has a flaw: it sometimes creates small glitches in the processed audio, possibly due to the way the envelope changes from one window to the next. Jérôme is currently looking at adding "time smoothing code" to cure this issue.
- Gale 04Oct14:
- There is a general perception that the interface for Noise Removal should be kept as simple as possible; indeed, the more tweak controls that are required, the stronger the indication that the offered Noise Removal algorithm is not yet optimal.
- There may be a case for a slide-out/dropdown "Advanced" interface where more controls could be provided, for example for choice of algorithm.
- Some feel that the effect is misnamed, the argument being that "Removal" wrongly implies that perfect removal without artifacts is always possible, therefore the name should be "Noise Reduction". My view is that a rename to "Reduction" fits poorly with the current interface and suggests a tacit admission that the effect is not as good as it should be. There is some consensus for renaming after a substantial update to the effect.
(originally by James 11Nov14:)
Noise Removal IS a form of Source Separation. We need a model for what noise is AND a model for what signal is. In our current noise removal, our model for noise is mean and variance of audio power gathered for each frequency bin. This works well for constant white / pink / brown noise and for mains hum and similar frequency based noise. In our noise model, for this effect, we are not currently catering for episodic noise such as clicks or coughs. Meanwhile our model for signal is anything that is significantly louder than noise level in its frequency band. The model for signal additionally has a temporal aspect, that allows for an attack time and a release time.
With the new residue feature, we can split the recording into a signal and a residue. The signal isn't quite what you'd get if you apply the model strictly. We rely on perceptual masking. So we allow some noise 'under' the signal that, if being strict, we would remove, rather as in mp3 encoding. Viewed in this light, Noise Reduction is a two step process. The idea of splitting a complex action into two parts for better control is one that recurs. In noise removal the first step, conceptually, is a true source separation into 'signal' and 'noise' tracks. The second step is a new 'effect' that combines two tracks, moving some audio from the second track to the first, when it would be perceptually masked. The amount of audio to move is related to the strength of the noise removal. If all audio on the second track is moved, then we have removed nothing. If none, we have removed as much as we can, but may have caused artifacts where the source separation was imperfect.
And why bother with the philosophical viewpoint? The reason is that source separation code could potentially lead to powerful noise removal effects. We could have different prior models for voice signals and for musical signals. Potentially it could lead to noise removal effects that can remove coughs from a piano concerto. Your noise removal dialog would essentially consist of choosing what kind of signal you have, what kind of noise you have, and the strength of the noise reduction.
Summary of Current State
So far there has been significant improvement from the introduction of the Sensitivity slider and from fixing the attack/decay time slider.
It has been observed that increasing the FFT size produces improved noise removal with less damage caused to the remaining audio.
Sine wave in 400 Hz with background white noise
How to generate it:
- Generate 30 seconds of white noise, amplitude 0.1
- Into a second track, generate 10 seconds sine wave, 100 Hz, 0.8 amplitude, starting at about 5.0 seconds along the time line.
- Select both tracks then Mix and Render.
Result: Sample 1
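The steps above can also be reproduced in code, for example to make the test signal at other rates or lengths. This is a sketch that assumes Audacity's white noise is uniformly distributed between -amplitude and +amplitude and that Mix and Render simply sums the tracks; `make_sample1` is a hypothetical name, and `tone_hz` defaults to the 100 Hz given in the steps.

```python
import math
import random


def make_sample1(rate=44100, noise_amp=0.1, tone_hz=100.0, tone_amp=0.8,
                 tone_start=5.0, tone_len=10.0, total_len=30.0, seed=0):
    """White noise over the whole length plus a sine over [tone_start, tone_start + tone_len)."""
    rng = random.Random(seed)
    n = int(round(total_len * rate))
    start = int(round(tone_start * rate))
    stop = start + int(round(tone_len * rate))
    out = []
    for i in range(n):
        s = rng.uniform(-noise_amp, noise_amp)  # white noise track (uniform: an assumption)
        if start <= i < stop:
            # Second track: the sine starts at phase 0 at its own start time.
            s += tone_amp * math.sin(2 * math.pi * tone_hz * (i - start) / rate)
        out.append(s)  # "Mix and Render": plain sum of the two tracks
    return out
```

The peak of the mix stays below 0.9 (0.8 + 0.1), so the summed tracks cannot clip.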
Proposed Patches and Modifications
Edgar, 3 October 2014 - I propose that the current Noise Removal code be left exactly as is for now and that an entirely new effect be created.
- Edgar 10Nov14 : This proposal has been committed today (reusing and extending the current interface).
Patch to add a spectral subtraction slider to the Audacity 2.0.5 source release: Spectral subtraction slider patch.
Spectral subtraction with aggressiveness setting
Patch to add a spectral subtraction slider with an aggressiveness setting to the Audacity 2.0.6 source release: Spectral subtraction slider patch. Setting the aggressiveness slider to 0 makes the noise removal effect behave as a power spectral subtraction algorithm, setting it to 0.5 makes it behave as a magnitude spectral subtraction algorithm, and setting it to 1.0 makes it behave as something even more aggressive than magnitude spectral subtraction.
Paul Licameli's Noise Reduction effect
- Update: Another patch was committed at r13595. Now, when a Spectral Selection is defined during Step 2, the effect will reduce or isolate noise only in the selected frequency band. Spectral selection does not affect Step 1. That is, statistics are still gathered for all frequencies.
This effect was added to Audacity version 2.0.7 (or 2.1.0) on 10 November 2014 in revision 13591, under an "experimental" define. It is a thorough rewriting of the Noise Removal effect, but still preserves the general outlines of the spectral noise gating procedure. It is intended ultimately as a replacement for the current Noise Removal.
As of r13596 it is enabled on Windows in HEAD in Experimental.h. Other platforms are awaiting the new files to be added to their build systems.
When defined in Experimental.h, an effect called Noise Reduction appears in the Effects menu, in addition to the Noise Removal of version 2.0.6.
The dialog includes all the familiar controls of Noise Removal, except that the attack/decay setting is now separated into two and decay is renamed "release."
New controls include a third radio button and several advanced controls. The advanced controls are intended for alpha testing purposes only and should be hidden in release versions once experiment has determined good settings. An exception is that the new sensitivity setting may replace the existing one. The new radio button "Residue" is intended for end users of 2.0.7. The new controls can be hidden by rebuilding with the lines
in NoiseReduction.cpp commented out.
Descriptions of the new controls follow. (in progress)
Note that the first radio button is renamed from Remove to Reduce, and that the behavior of Isolate has been fixed so that it is independent of the Noise reduction, Attack time, Release time and Frequency smoothing settings.
The new third button is named Residue. When it is chosen, the resulting signal equals what would result from choosing Reduce, minus the original signal. Therefore the other settings named above do have effect. Duplicating the original, applying Residue to one copy, and then listening to the mix, should play the same results (up to very small roundoff errors) as would result from Reduce.
Thus Isolate lets you play that part of the signal that is classified as noise, independently of most settings, while Residue serves the different purpose of playing the difference of original and noise-reduced signals, affected by all of the settings.
This choice control selects the two windowing functions discussed above in the critique: the first is multiplied by the waveform before the FFT, and the second is multiplied by the result of the inverse FFT, which is then added to the results of overlapping windows.
The choice none, Hann behaves as in 2.0.6 for comparison with newer methods.
The choice Hann, none improves noise reduction in examples such as a chirp mixed with noise and still permits only two steps per window. But the lack of a second window blending the overlapping windows does cause some artifacts, as the gain of a frequency band jumps discontinuously at window boundaries.
The choice Hann, Hann is what I recommend, but it requires at least four steps per window, at least doubling computation time. This is the default when advanced controls are hidden.
The choice Blackman, Hann also permits four steps per window and should not be any more expensive than the previous. I do not yet know of a strong reason to prefer it over Hann, Hann.
A warning message results if you attempt to reduce noise using a different choice from that used when profiling noise data, but noise reduction is nevertheless allowed to proceed.
This choice permits the FFT window size to vary among powers of 2. The default, as in the older effect, is 2048, and this value is used when the advanced controls are hidden.
An error message results if you attempt to reduce noise with a different window size from that used when taking the noise profile.
Steps per window
This choice permits different powers of two for the ratio of window size to step size. This may not exceed the window size and must be at least 4 for the Hann, Hann and Blackman, Hann choices for window types. Higher values will slow the computation.
The default value when advanced controls are hidden is 4, the minimum required by the default window types.
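The constraints on window size, steps per window and window types described above can be summarized as a single check. This is a hypothetical sketch of those rules (power-of-two sizes, steps not exceeding the window size, at least 4 steps for Hann, Hann and Blackman, Hann, an error for a window size that differs from the profile, and only a warning for different window types), not the dialog's actual validation code.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_settings(window_size, steps_per_window, window_types,
                   profile_window_size, profile_window_types):
    """Return (errors, warnings) for the advanced settings."""
    errors, warnings = [], []
    if not is_power_of_two(window_size):
        errors.append("window size must be a power of 2")
    if not is_power_of_two(steps_per_window):
        errors.append("steps per window must be a power of 2")
    elif steps_per_window > window_size:
        errors.append("steps per window may not exceed the window size")
    if window_types in (("Hann", "Hann"), ("Blackman", "Hann")) and steps_per_window < 4:
        errors.append("these window types need at least 4 steps per window")
    if window_size != profile_window_size:
        errors.append("window size differs from the noise profile")
    if window_types != profile_window_types:
        warnings.append("window types differ from the noise profile")
    return errors, warnings
```

With the defaults (2048, 4, Hann, Hann) and a matching profile, both lists come back empty; the hop size is then window_size / steps_per_window, 512 samples.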
This choice affects how statistics of the noise profile are used to determine thresholds, and how frequency bands of each window are classified as noise or signal.
The Old method is as in 2.0.6, and is affected by the Sensitivity slider. At a sample rate of 44100 Hz, the window is compared with the following overlapping window, and the minimum power in each band is compared with the threshold determined by the old algorithm for profiling noise, which is a maximum of such minima of neighbors over the whole profile.
The Median and Second greatest methods both ignore Sensitivity and are instead affected by New method sensitivity. Each method considers each window together with the other windows whose centers overlap it; that is, the number of windows equals Steps per window plus one. Median may be used only when Steps per window is 2 or 4 (3 or 5 windows); this method throws out both high and low values for each band. Second greatest throws out one outlying high value and uses the maximum of the rest. The resulting value is then compared with a multiple of the mean power of that band over the noise profile. The multiplier depends on the New method sensitivity setting (below).
Second greatest is now the default in case advanced controls are hidden. Experiment may prove Median to be preferable.
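The Median and Second greatest rules can be sketched for one band as follows. The multiplier is a free parameter here because the exact mapping from New method sensitivity is not given; the function names, and treating "above the threshold" as signal, are assumptions.

```python
def window_statistic(values, method):
    """Reduce one band's powers over neighbouring windows to a single value."""
    ordered = sorted(values)
    if method == "median":
        if len(ordered) % 2 == 0:
            raise ValueError("median is used here only with an odd number of windows")
        return ordered[len(ordered) // 2]
    if method == "second greatest":
        return ordered[-2]  # discard the single highest outlier, keep the max of the rest
    raise ValueError(method)


def is_signal(band_powers, w, steps_per_window, method, mean_noise_power, multiplier):
    """Classify window w from steps_per_window + 1 windows centred on it."""
    half = steps_per_window // 2
    if w < half or w + half >= len(band_powers):
        raise ValueError("window too close to the edge for this sketch")
    values = band_powers[w - half:w + half + 1]
    return window_statistic(values, method) > multiplier * mean_noise_power


burst = [1.0, 1.0, 30.0, 30.0, 1.0]  # two loud windows among five
print(is_signal(burst, 2, 4, "second greatest", 1.0, 10.0))
print(is_signal(burst, 2, 4, "median", 1.0, 10.0))
```

Both rules ignore a single high outlier, but a short burst spanning two of the five windows counts as signal under Second greatest and as noise under Median; this is the kind of trade-off between musical noise and dropouts that experiment would have to settle.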
New method sensitivity
This control determines (does not equal) the multiple of the mean power of each frequency band of the noise profile that is used as a threshold in discrimination. The default value is 6 when advanced controls are hidden.
The intention is that the setting approximates the negative of the base ten logarithm of the probability that a band of a window of noise will exceed the threshold and so remain as a "tinklebell" visible as a spot in the spectrogram. Thus the default setting is meant to correspond to 1 in 1 million. At the usual sampling rate of 44100 Hz and window size of 2048 and four steps per window, there are 1025 bands per window and 86.133 windows per second; a miss rate of one in a million should therefore mean approximately one such bell every 11 seconds (1000000 / (1025 * 86.133)), though not always at an audible frequency.
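The arithmetic in the previous paragraph can be written out as a small function, taking the setting as the negative base-ten logarithm of the miss probability, as intended. The function name and parameter names are hypothetical.

```python
def seconds_per_tinklebell(rate=44100.0, window_size=2048, steps_per_window=4, sensitivity=6.0):
    """Expected seconds between threshold misses, with sensitivity = -log10(miss probability)."""
    bands = window_size // 2 + 1                                # 1025 bands per window
    windows_per_second = rate * steps_per_window / window_size  # 86.133 windows per second
    miss_probability = 10.0 ** -sensitivity
    return 1.0 / (bands * windows_per_second * miss_probability)
```

With the defaults this gives about 11.33 seconds between bells; each step of 1 in the setting multiplies that interval by 10.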
Of course, higher settings will result in less musical noise, but excessive settings will result in more distortion of the signal that is passed unattenuated.
- A recent addition is graying out of controls which are not relevant with other settings set.
With isolate set, so many controls are grayed out it suggests that 'Isolate' is a different effect, and possibly not much use.
James is generally -1 on graying out as a UI idiom, as it can cause confusion in users as to why options are grayed out.
Test results should include the name of the source file, patches and modifications used, what settings were used, and full details of any additional processing. The source file should be listed in the Test Samples section above.
Spectral subtraction vs standard Audacity 2.0.6
Standard Audacity 2.0.6 noise removal applied to Sample 1: Configuration used: Noise Reduction: 48, Sensitivity: 0, Frequency Smoothing: 200, Attack/decay: 0.1. Result: Sample 1 with standard Audacity 2.0.6 noise removal
Standard Audacity 2.0.6 noise removal applied to Sample 1 with the spectral subtraction slider patch: Configuration used: Noise Reduction: 48, Sensitivity: 0, Subtraction: 1.0, Frequency Smoothing: 200, Attack/decay: 0.1. Result: Sample 1 with Audacity 2.0.6 noise removal with spectral subtraction