{{DISPLAYTITLE:How Audacity Works}}
{{introrel|This is a page for technical questions about the '''algorithms''' used in Audacity's programming code. An algorithm can be defined as a finite list of instructions for accomplishing a task that, given an initial state, will terminate in a defined end-state.|If you have questions about how Audacity works, please post them here and the developers will answer them!|[[ArchitecturalDesign]] describes the structure of the Audacity system architecture.
* [[AudacityLibraries]] describes the components that are combined together to make Audacity.}}
{{hint|Eventually we will get a lot more organised and have explanations of the Audacity algorithms in {{external|[http://www.stack.nl/~dimitri/doxygen/ Doxygen]}} format, where they can be both in the source code for Audacity and on a web page like here.}}

__TOC__
 
 
== When is Gain Applied? ==

'''Q:''' <i>I know that Audacity has a 32-bit sample resolution, and that when mixed down to normal 16-bit wav, it renders much of the following moot...  however...  When gain/amplification (either negative or positive) is applied, the resulting interpolation must result in a less accurate representation of the original waveform.  I'm wondering if when running down the EDL (edit decision list), Audacity performs each gain change calculation separately, or if it's smart enough to look at all the gain adjustments in total, and interpolate only once, thereby reducing the accumulation of error?</i>

'''A:''' No.  Audacity is not that smart.  The order in which effects are applied to the tracks is exactly the same as the order that you apply them in.  Audacity doesn't actually have an EDL at all.  Effects are applied at the time that you request them.
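The questioner's concern can be illustrated numerically. The following is a hypothetical sketch in plain Python, not Audacity code: it shows why quantizing after every gain step accumulates more rounding error than quantizing once after a combined gain. Audacity's 32-bit float samples make this error far smaller in practice, but the principle is the same.

```python
# Hypothetical illustration (not Audacity code): quantizing after every
# gain step accumulates more rounding error than quantizing once after
# one combined gain.

def quantize16(x):
    """Round a sample in [-1.0, 1.0) to the nearest 16-bit step."""
    return round(x * 32768) / 32768

sample = 0.123456789      # an arbitrary source sample
g1, g2 = 0.5, 1.6         # two gain changes applied in sequence

# Quantize after each step (like exporting to 16-bit between edits):
stepwise = quantize16(quantize16(sample * g1) * g2)

# Quantize once, after a single combined gain:
combined = quantize16(sample * (g1 * g2))

exact = sample * g1 * g2
print(abs(stepwise - exact) > abs(combined - exact))  # True: stepwise error is larger
```

With either approach the error stays below one 16-bit step; the point is only that errors compound when each intermediate result is re-quantized.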
  
== Noise Removal ==
 
 
=== About the algorithm ===
 
 
 
'''Q:''' <i>How do you actually remove noise?  What is the algorithm?</i>
 
 
 
'''A:''' The noise removal algorithm uses {{external|[http://en.wikipedia.org/wiki/Fourier_analysis Fourier analysis]}}: it finds the spectrum of pure tones that make up the background noise in the quiet sound segment that you selected - that's called the "frequency spectrum" of the sound.  That forms a fingerprint of the static background noise in your sound file.  When you remove noise from the music as a whole, the algorithm finds the frequency spectrum of each short segment of sound in your music.  Any pure tones that aren't sufficiently louder than the fingerprint (above the threshold to be preserved) are greatly reduced in volume.  That way, (say) a guitar note or an overtone of the singer's voice is preserved, but hiss, hum, and other steady noises can be minimized. The general technique is called {{external|[http://en.wikipedia.org/wiki/Noise_gate spectral noise gating]}}. 
 
 
 
The first pass of noise removal is done over just noise.  For each windowed sample of the sound, we take a Fast Fourier Transform (FFT) and tabulate statistics for each frequency band: specifically, the maximum level achieved by at least <n> sampling windows in a row, for various values of <n>.
 
 
 
During the noise removal phase, we start by setting a gain control for each frequency band: if the sound has exceeded the previously-determined threshold, the gain is set to 0 dB (the band is kept), otherwise the gain is set lower (e.g. -18 dB) to suppress the noise. Then frequency-smoothing is applied so that a single frequency is never suppressed or boosted in isolation, followed by time-smoothing so that the gain for each frequency band moves slowly. Lookahead is employed; this effect is not designed for real-time use, but if it were, there would be a significant delay. The gain controls are applied to the complex FFT of the signal, and then the inverse FFT is applied, followed by a Hanning window; the output signal is then pieced together using overlap/add of half the window size.
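The per-band gating decision described above can be sketched as follows. This is a deliberately simplified Python illustration, not Audacity's actual C++ implementation; the function names are assumptions, and the real code also performs the FFT, the frequency/time smoothing and the overlap/add steps.

```python
def band_gains(band_levels_db, noise_thresholds_db, floor_db=-18.0):
    """Per-band gating decision: keep a band (0 dB gain) if its level
    exceeds the threshold measured from the noise profile, otherwise
    attenuate it by floor_db."""
    return [0.0 if level > threshold else floor_db
            for level, threshold in zip(band_levels_db, noise_thresholds_db)]

def apply_gain_db(magnitude, gain_db):
    """Scale a linear spectral magnitude by a gain given in dB."""
    return magnitude * 10 ** (gain_db / 20)

# Three bands; only the middle one is below its noise threshold:
print(band_gains([-10.0, -40.0, -12.0], [-30.0, -30.0, -30.0]))
# [0.0, -18.0, 0.0]
```

In the real effect these gains are then smoothed across neighbouring bands and across time before being applied to the FFT bins.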
 
 
 
 
 
'''Q:''' <i>How many frequency bands does the noise gate use?</i>
 
 
 
'''A:''' In Audacity 1.3.3 and later we use an FFT size of 2048, which results in 1024 frequency bands.
 
 
 
 
 
=== Artifacts ===
 
 
 
'''Q:''' <i>What causes the 'tinkling' artefacts, and what steps can and have been taken to remove them?</i>
 
 
 
'''A:''' The tinkly artifacts happen when individual pure tones are near the threshold to be preserved -- they are small pieces of the background soundscape that survived the thresholding, perhaps because the background noise is slightly different from the fingerprint or because the main sound has overtones that are imperceptible but that boost them slightly over the threshold.
 
 
 
So while the Audacity noise gating algorithm could perhaps be improved, any Fourier-based noise removal algorithm will have some artifacts like the "tinkle-bells".  They are a symptom of the problem of ''discrimination'' - deciding whether a particular analogue signal is above or below a decision threshold - that is central to the fields of digital data processing and information theory.  In general the tinkle-bell artifacts are ''quieter'' than the original noise.  The real question is whether they are ''more noticeable'' than the original noise.  (For example, noise-gating the Beatles' ''Sun King'' track off the ''Abbey Road'' album is a bad idea, because the soft brushed cymbal sounds merge smoothly into the tape hiss on the original master recording, so tinkle bells and a related problem -- fluttering -- are prominent in noise-gated versions of that track.)
 
 
 
You can reduce the effect of tinkle bells by noise gating sounds that are well separated (either in volume or frequency spectrum) from the background noise, or by mixing a small amount of the original noisy track back into the noise gated sound.  Then the muted background noise tends to mask the tinkle bells. That technique works well for (e.g.) noisy microcassette recordings, where the noise floor might only be 20 dB below the loudest sounds on the tape.  You can get about 10 dB of noise reduction that way, without excessive tinkly artifacts.
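The mix-back technique in the last paragraph is just a weighted sum of the two signals. A minimal sketch (illustrative Python; the 10% default ratio is an arbitrary assumption, not an Audacity setting):

```python
def mix_back(gated, original, ratio=0.1):
    """Blend a little of the original noisy signal back into the
    noise-gated signal, so the residual noise masks tinkly artifacts."""
    return [g + ratio * o for g, o in zip(gated, original)]

# A gated stretch of silence regains a faint noise floor:
blended = mix_back([0.0, 0.0, 0.5], [0.2, -0.1, 0.4])
```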
 
  
* See [[How Effects Work]]
  
 
== Resampling ==

'''Q:''' <i>I'd like to know which resampling algorithm Audacity uses. I'm studying resampling for my thesis and I'm testing the influence of Audacity's resampler on perceived audio quality.</i>

'''A:''' Audacity uses a library called libresample, which is an implementation of the resampling algorithm from Julius Orion Smith's Resample project. Audacity contains code to use Erik de Castro Lopo's libsamplerate as an alternative, but we can't distribute that with Audacity because of licensing issues.

For more information on our choice of resampling algorithms:

*{{external|http://comments.gmane.org/gmane.comp.audio.audacity.devel/4320}}
*{{external|http://comments.gmane.org/gmane.comp.audio.audacity.devel/4307}}
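For intuition about what a resampler does, here is the simplest possible approach, linear interpolation, sketched in Python. This is not what libresample does (it uses bandlimited sinc interpolation, which preserves quality far better), but the input/output relationship is the same.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler, for intuition only:
    for each output sample, find its position in the source signal
    and interpolate between the two nearest source samples."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

# Doubling the sample rate of a short ramp:
print(resample_linear([0.0, 1.0, 2.0, 3.0], 22050, 44100))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Linear interpolation like this introduces audible aliasing on real audio, which is exactly why a bandlimited resampler is used instead.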
 
  
 
== Interpolation ==

'''Q:''' <i>Which interpolation algorithm does Audacity use to interpolate between frequency values in the spectrum analysis?</i>
  
== Waveform dB ==

'''Q:''' <i>How is the Waveform dB scale calculated?</i>

{{ednote|'''Gale: 04Apr13:''' The previous text, which was confusing and seemingly incorrect, said "If the sound amplitude (air pressure) goes up by a factor of 10 the dB goes up by one point.  If it increases 100-fold then in dB it goes up by 2 and so on.  This is very like the Richter scale for earthquakes.  A one point change is a 10-fold increase in pressure." }}

'''A:''' See [http://en.wikipedia.org/wiki/Decibel Wikipedia] for full details. The basic idea is that dB is a [http://en.wikipedia.org/wiki/Logarithmic_scale logarithmic scale] indicating a ratio of power or amplitude relative to a specified or implied reference level. In Audacity's case the ratio is of [http://en.wikipedia.org/wiki/Amplitude amplitude] relative to zero [http://en.wikipedia.org/wiki/DB_FS dBFS], which is the maximum possible level of a digital signal without [http://en.wikipedia.org/wiki/Clipping_(audio) clipping]. We use an amplitude ratio because doubling the power of an audio signal does not double its amplitude.

To give a couple of examples, doubling amplitude raises it by 6 dB (applies a ''gain'' of +6 dB) and halving amplitude reduces it by 6 dB (applies a ''gain'' of -6 dB). Increasing amplitude ten-fold (by a factor of 10) applies a gain of +20 dB and reducing amplitude to one-tenth of the original applies a gain of -20 dB.

To compare that last example with Audacity's [http://en.wikipedia.org/wiki/Linear linear] Waveform scale, an amplitude of 0 dB is '''1''' on that scale and an amplitude of -20 dB is '''0.1''' on that scale.

There is also some disorganised but useful information about decibels [[User talk:Galeandrews#Definition_of_the_decibel_scale|here]] which has never been put anywhere more appropriate.
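The amplitude-ratio formula behind those examples is standard decibel arithmetic (not Audacity-specific code):

```python
import math

def amplitude_to_db(amplitude):
    """Convert a linear amplitude ratio to dB (20 * log10 for amplitude)."""
    return 20 * math.log10(amplitude)

def db_to_amplitude(db):
    """Convert dB back to a linear amplitude ratio."""
    return 10 ** (db / 20)

print(round(amplitude_to_db(2.0), 2))    # 6.02  (doubling is roughly +6 dB)
print(round(amplitude_to_db(0.1), 6))    # -20.0 (one-tenth amplitude is -20 dB)
print(round(db_to_amplitude(-20.0), 6))  # 0.1   (-20 dB on the linear scale)
```

Note the factor is 20 for amplitude but would be 10 for power, which is the "doubling power does not double amplitude" point made above.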
  
 
== Audio Mixing ==

'''Q:''' <i>What is the algorithm used by Audacity to mix separate sound tracks (i.e. what is the process of merging the tracks to a single one when the "Mix and Render" command is used)?</i>

'''A:''' Mixing is just addition.  The waveforms show the air pressure moment by moment.  If there are two sounds at the same time then the air pressures add, so we just add the waveform values.  It ''is'' a little more complex than that, since for stereo we add right tracks to right tracks and left tracks to left tracks, and mono to both, and we apply gain and amplitude envelopes before adding. Applying gain just means multiplying the signal by some value. Left-right panning, which is also done during mixing, is similar in that it applies different gains to the left and right channels. Also, if the tracks being mixed were not at the desired sample rate for the project, we have to do sample rate conversion first.  There is also the problem of 'clipping', where the value after mixing is too loud. At the moment Audacity mixes the tracks as indicated by the waveform values and the settings of the gain and pan sliders on the Track Control Panels, without preventing clipping in the result.
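The additive mix described above can be sketched in a few lines. This is illustrative Python, not Audacity's C++ code; real mixing also applies envelopes, panning and resampling as described.

```python
def mix_tracks(tracks, gains=None):
    """Mix mono tracks sample by sample: scale each track by its gain,
    then add.  Results outside [-1.0, 1.0] would clip on export;
    as in Audacity, nothing here prevents that."""
    if gains is None:
        gains = [1.0] * len(tracks)
    length = max(len(t) for t in tracks)
    mixed = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mixed[i] += gain * sample
    return mixed

print(mix_tracks([[0.25, -0.5], [0.5, 0.25]]))   # [0.75, -0.25]
print(mix_tracks([[0.75], [0.75]]))              # [1.5]  (would clip)
```

The second call shows the clipping problem: two in-range tracks can sum to a value outside the valid range.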
  
[[Category:For Developers]][[Category:How It Works]]

Latest revision as of 18:55, 29 January 2015

