{{DISPLAYTITLE:How Audacity Works}}
----
{{introrel|This is a page for technical questions about the '''algorithms''' used in Audacity's code. An algorithm can be defined as a finite list of instructions for accomplishing a task that, given an initial state, will terminate in a defined end-state.|If you have questions about how Audacity works, please post them here and the developers will answer them!|[[ArchitecturalDesign]] describes the structure of the Audacity system architecture.
* [[AudacityLibraries]] describes the components that are combined to make Audacity.}}
{{hint|Eventually we will get a lot more organised and have explanations of the Audacity algorithms in {{external|[http://www.stack.nl/~dimitri/doxygen/ Doxygen]}} format, so that they can live both in the Audacity source code and on a web page like this one.}}
  
__TOC__
 
== When is Gain Applied? ==
  
'''Q:''' <i>I know that Audacity has a 32-bit sample resolution, and that when mixed down to a normal 16-bit WAV, it renders much of the following moot... however... When gain/amplification (either negative or positive) is applied, the resulting interpolation must result in a less accurate representation of the original waveform. I'm wondering, when running down the EDL (edit decision list), does Audacity perform each gain change calculation separately, or is it smart enough to look at all the gain adjustments in total and interpolate only once, thereby reducing the accumulation of error?</i>
  
'''A:''' No. Audacity is not that smart. Effects are applied to the tracks in exactly the order in which you apply them, and each effect is applied at the time you request it; Audacity doesn't actually have an EDL at all.
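
To see why the 32-bit float pipeline makes much of this moot in practice, here is a small illustrative sketch (not Audacity's code; the signal and gain values are made up) comparing two gain changes applied as separate passes in a 16-bit pipeline with the same gains folded into one pass:

<pre>
import numpy as np

rng = np.random.default_rng(0)
signal = rng.uniform(-0.5, 0.5, 100_000)   # a made-up "ideal" waveform

def quantize16(x):
    # Round to the nearest 16-bit step, as a 16-bit pipeline would
    return np.round(x * 32767) / 32767

# Two gain changes applied one after the other, re-quantizing each time
two_pass = quantize16(quantize16(signal * 0.3) * 3.0)

# The same two gains folded into a single pass
one_pass = quantize16(signal * (0.3 * 3.0))

ideal = signal * 0.9
print("max error, two passes:", np.max(np.abs(two_pass - ideal)))
print("max error, one pass:  ", np.max(np.abs(one_pass - ideal)))

# At 32-bit float, each pass only adds rounding error of about 2**-24,
# so even separate passes stay far below one 16-bit step (about 2**-15)
f32 = (signal.astype(np.float32) * np.float32(0.3)) * np.float32(3.0)
print("max error, float32:   ", np.max(np.abs(f32 - ideal)))
</pre>

Each 16-bit pass can add up to half a quantisation step of error, which is why separate passes come out worse there, while at 32-bit float the per-pass rounding error stays several orders of magnitude below a single 16-bit step.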
  
== How do Effects Work? ==
  
* See [[How Effects Work]]
  
== Resampling ==
  
'''Q:''' <i>I'd like to know which resampling algorithm Audacity uses. I'm studying resampling for my thesis and I'm testing the influence of Audacity's resampler on perceived audio quality.</i>
  
'''A:''' Audacity uses a library called [[libresample]], which is an implementation of the resampling algorithm from Julius Orion Smith's [[Resample]] project.  Audacity contains code to use Erik de Castro Lopo's [[libsamplerate]] as an alternative, but we can't distribute that with Audacity because of licensing issues.
  
For more information on our choice of resampling algorithms:
*{{external|http://comments.gmane.org/gmane.comp.audio.audacity.devel/4320}}
*{{external|http://comments.gmane.org/gmane.comp.audio.audacity.devel/4307}}
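
Smith's algorithm is a form of bandlimited (windowed-sinc) interpolation. Purely as an illustration of that idea (this is not libresample's code, which uses precomputed, interpolated filter tables and is far more efficient), a naive version looks something like this:

<pre>
import numpy as np

def resample_sinc(x, src_rate, dst_rate, half_width=32):
    # Naive bandlimited (windowed-sinc) interpolation, for illustration only
    ratio = dst_rate / src_rate
    cutoff = min(1.0, ratio)          # widen the sinc when downsampling
    n_out = int(len(x) * ratio)
    out = np.zeros(n_out)
    for i in range(n_out):
        t = i / ratio                 # this output sample's position in the input
        lo = max(0, int(t) - half_width)
        hi = min(len(x), int(t) + half_width + 1)
        arg = np.arange(lo, hi) - t
        kernel = cutoff * np.sinc(cutoff * arg)                     # ideal low-pass
        window = 0.5 + 0.5 * np.cos(np.pi * arg / (half_width + 1))  # Hann window
        out[i] = np.dot(x[lo:hi], kernel * window)
    return out

# Example: one second of a 440 Hz tone, converted from 44100 Hz to 48000 Hz
sr_in, sr_out = 44100, 48000
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440 * t)
print(len(resample_sinc(tone, sr_in, sr_out)))   # 48000 samples
</pre>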
  
== Interpolation ==
  
'''Q:''' <i>Which interpolation algorithm does Audacity use to interpolate between frequency values in the spectrum analysis?</i>
  
== Waveform dB ==
  
'''Q:''' <i>How is the Waveform dB scale calculated?</i>
{{ednote|'''Gale: 04Apr13:''' The previous text, which was confusing and seemingly incorrect, said:
"If the sound amplitude (air pressure) goes up by a factor of 10 the dB goes up by one point.  If it increases 100-fold then in dB it goes up by 2 and so on.  This is very like the Richter scale for earthquakes.  A one point change is a 10-fold increase in pressure." }}
  
'''A:''' See [http://en.wikipedia.org/wiki/Decibel Wikipedia] for full details. The basic idea is that dB is a [http://en.wikipedia.org/wiki/Logarithmic_scale logarithmic scale] indicating a ratio of power or amplitude relative to a specified or implied reference level. In Audacity's case the ratio is of [http://en.wikipedia.org/wiki/Amplitude amplitude] relative to zero [http://en.wikipedia.org/wiki/DB_FS dBFS], which is the maximum possible level of a digital signal without [http://en.wikipedia.org/wiki/Clipping_(audio) clipping]. We use an amplitude ratio because doubling the power of an audio signal does not double its amplitude.
  
To give a couple of examples, doubling amplitude raises it by 6 dB (applies a ''gain'' of +6 dB) and halving amplitude reduces it by 6 dB (applies a ''gain'' of -6 dB). Increasing amplitude ten-fold applies a gain of +20 dB, and reducing amplitude to one-tenth of the original applies a gain of -20 dB.
  
To compare that last example with Audacity's [http://en.wikipedia.org/wiki/Linear linear] Waveform scale, an amplitude of 0 dB is '''1''' on that scale and an amplitude of -20 dB is '''0.1''' on that scale.  
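
In code, the conversions in both directions come down to one line each; this little snippet (an illustration, not Audacity code) reproduces the figures above:

<pre>
import math

def amplitude_to_db(ratio):
    # 20*log10 for amplitude ratios (10*log10 is for power ratios, and
    # power goes as the square of amplitude)
    return 20 * math.log10(ratio)

def db_to_amplitude(db):
    return 10 ** (db / 20)

print(amplitude_to_db(2.0))    # doubling amplitude  -> about +6.02 dB
print(amplitude_to_db(0.5))    # halving amplitude   -> about -6.02 dB
print(amplitude_to_db(10.0))   # ten-fold increase   -> +20 dB
print(db_to_amplitude(-20.0))  # -20 dB on the linear Waveform scale -> 0.1
</pre>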
  
There is also some disorganised but useful information about decibels [[User talk:Galeandrews#Definition_of_the_decibel_scale|here]], which has never been put anywhere more appropriate.
  
== Audio Mixing ==
'''Q:''' <i>What is the algorithm used by Audacity to mix separate sound tracks (i.e. what is the process of merging the tracks to a single one when the "Mix and Render" command is used)?</i>
  
'''A:''' Mixing is just addition. The waveforms show the air pressure moment by moment; if there are two sounds at the same time, the air pressures add, so we just add the waveform values. It ''is'' a little more complex than that: for stereo we add right tracks to right tracks, left tracks to left tracks, and mono tracks to both, and we apply gain and amplitude envelopes before adding. Applying gain is just multiplying the signal by some value. Left-right panning, which is also done during mixing, is similar in that it applies different gains to the left and right channels. Also, if the tracks being mixed are not at the desired sample rate for the project, we first have to do sample rate conversion. Finally, there is the problem of 'clipping', where the value after mixing is too loud. At the moment Audacity mixes the tracks as indicated by the waveform values and the settings of the gain and pan sliders on the Track Control Panels, without preventing clipping in the result.
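
A minimal sketch of that process (illustrative only: the track data, gain and pan values are made up, the pan law is a simple one chosen for the example, and this is not Audacity's actual mixer code):

<pre>
import numpy as np

def mix_tracks(tracks):
    # Each entry is (mono_samples, gain, pan) with pan in -1.0 .. +1.0.
    # Tracks are assumed to be at the project rate already; otherwise,
    # as noted above, sample rate conversion would have to happen first.
    n = max(len(s) for s, _, _ in tracks)
    left, right = np.zeros(n), np.zeros(n)
    for samples, gain, pan in tracks:
        s = samples * gain                          # gain is multiplication
        left[:len(s)] += s * min(1.0, 1.0 - pan)    # a simple pan law
        right[:len(s)] += s * min(1.0, 1.0 + pan)
    return left, right

# Two test tones mixed by plain addition
t = np.arange(44100) / 44100
a = np.sin(2 * np.pi * 440 * t)
b = np.sin(2 * np.pi * 660 * t)
left, right = mix_tracks([(a, 0.8, -0.5), (b, 0.8, 0.5)])

# Nothing here stops the sum exceeding full scale (+/-1.0) -- this is
# exactly the clipping problem described above.
print(np.max(np.abs(left)), np.max(np.abs(right)))
</pre>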
  
[[Category:For Developers]][[Category:How It Works]]
 
 