Use Cases

From Audacity Wiki
Revision as of 18:59, 22 March 2008 by James (talk | contribs) (New use cases which feed in to the implementation pages.)

Ideas extracted from the Feature Requests page.


Many wildlife enthusiasts use Audacity to record and edit animal sounds (e.g. birds, bats), and they generally work from the spectrum rather than the waveform. The following would extend Audacity's sound analysis capabilities and make the spectrum feature easier to use:

  • Threshold settings to remove the displayed background noise - i.e. only sound over a certain amplitude is displayed
  • Time marks displayed along x-axis, more frequency marks along y-axis
  • User defined spectrum colours
  • Settings saved upon exit (e.g. fit vertically, spectrum, threshold)
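
The threshold idea in the first bullet could look roughly like the sketch below. It is an illustration only, not Audacity code: the function name `apply_display_threshold`, the -60 dB default and the -120 dB "blank" floor are all assumptions made for the example.

```python
import math

def apply_display_threshold(magnitudes, threshold_db=-60.0, floor_db=-120.0):
    """Convert linear spectrogram magnitudes to dB and blank everything
    quieter than threshold_db by replacing it with floor_db.

    magnitudes is a list of rows (one per time frame), each row a list of
    linear magnitudes (one per frequency bin). All names and defaults here
    are hypothetical.
    """
    out = []
    for row in magnitudes:
        out_row = []
        for m in row:
            # Zero magnitude has no finite dB value, so treat it as the floor.
            db = 20.0 * math.log10(m) if m > 0 else floor_db
            out_row.append(db if db >= threshold_db else floor_db)
        out.append(out_row)
    return out
```

A display routine would then draw every `floor_db` cell in the background colour, so only sound above the chosen amplitude remains visible.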

Audio Cleaning

Cleaning up audio is a common task. The features for it and interface could be improved:

  • Port the cleaning effects from Gnome Wave Cleaner, particularly decrackle and click removal, which are useful for cleaning LPs. Make them into LADSPA effects so that they can be used in other programs too?
  • Wrap Declip as a LADSPA effect.
  • Use Vamp to identify clicks, then an enhanced label track to allow applying a de-click effect by clicking on a label - or selecting ten and applying to ten at a time.
  • Wizard to help people with cleaning LPs.
  • Zoomed in view to help in locating clicks.
  • Picker for 'ambient noise' samples used to repair sounds during silence (more useful in vocal tracks than in music).
  • Waveform over spectrograph.
  • Enhanced Audio preview that allows you to hear the effect of different parameters and so adjust them better. The critical problem to solve is 'independence' of parameters. It is almost impossible to set parameters well when there are more than two and changing one affects the others. A preview that allows parameter changes to be heard in isolation could help.
  • Threshold on wavetrack (backported from Audacity-extra).

Education (General)

  • Ability to use X-Windows sound. This is particularly important in Canada, where schools often use 'thin clients'. That is, the classroom has computers which are little more than a screen, keyboard and mouse. Disk drives, memory and processor are on a central blade server. This is more cost effective and easier to administer, but does require that Audacity use X-Windows sound.
  • Methods for hiding advanced / optional features so as to make the interface less confusing. Each school may have a different definition of what is standard and what is advanced.
  • Method for easily resetting options to some default saved value, for example to return the system to a sampling rate and number of channels which work. Pupils may be allowed to tamper with the settings, but we want to be able to get back to known good ones easily.


  • Vowel target practice based on code from CLAM project.
    • There is a vowel-quadrilateral display we can take from Audacity-extra. It uses Voronoi regions to show the nearest vowel match to an utterance.
  • Other kinds of voice analysis, using Vamp plug-ins, e.g. for the two kinds of 's' sound in the Polish language.
  • Programmed playback, e.g. using script.
  • Elaborations to the label track so that audio lessons can be searched via text more easily.
    • Also have some info on 'audio icons' that make tree-based searching of audio, using audio only, faster.

The ideal is to create a small eco-system of programs around Audacity that help with language learning. Some code could run on PDAs and mobile phones (Java) with the lessons being prepared on Audacity.

  • Can integrate with dictionaries from freedict and language resources from open commons.

Looping and Dictation Aids

These will let someone work through a dictation, adjusting and advancing the looping region during playback, sentence by sentence or verse by verse. (At least 2 people want all of these.)

  • When the user hits right-bracket, set the end of looping region to the current position and go back to beginning immediately. (Currently need to stop and restart looping play for it to take effect.)
  • When the user hits left-bracket, set the beginning of the looping play to the current play position immediately. (Currently need to stop and restart looping play for it to take effect.)
  • When the user hits shift-right-bracket, set the end of the looping region to the end of the track immediately. (New key, new feature.)
  • When the user hits shift-left-bracket, set the beginning of the looping region to the existing end of the looping region and move the play head to this new location. Also set the end of the looping region to the end of the track. (New key, new feature.)

Note: A patch has been submitted to devel-list which seeks to implement these four requested features.
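
The intended behaviour of the four keys can be summarised as a small state model. This is only a sketch to pin down the semantics described above; it is not the submitted patch, and the class and method names (`LoopPlayer`, `press_right_bracket`, and so on) are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class LoopPlayer:
    """Hypothetical model of looping playback. Times are in seconds."""
    track_end: float
    loop_start: float = 0.0
    loop_end: float = 0.0
    position: float = 0.0  # current play position

    def press_right_bracket(self):
        # ']': end the loop at the current position, jump back to its start.
        self.loop_end = self.position
        self.position = self.loop_start

    def press_left_bracket(self):
        # '[': start the loop at the current play position.
        self.loop_start = self.position

    def press_shift_right_bracket(self):
        # Shift-']': extend the loop to the end of the track.
        self.loop_end = self.track_end

    def press_shift_left_bracket(self):
        # Shift-'[': move on to the next sentence. The old loop end becomes
        # the new start, the play head jumps there, and the loop is left
        # open-ended until ']' is pressed again.
        self.loop_start = self.loop_end
        self.position = self.loop_start
        self.loop_end = self.track_end
```

A dictation session would then be a repeated cycle: listen, press ']' at the end of the sentence, loop until it is written down, then press Shift-'[' to advance.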

"Books on tape", field recordings, etc.

This is a collection of features that will facilitate converting recordings made "live" or on physical tapes into digital form. My specific use-case is my attempt to turn foreign language instructional cassettes into MP3 files, but it should be equally useful in many other cases. (BTW, I've tried diving into the source code to implement these myself, but quickly got lost. Any assistance in getting started would be appreciated.)

  • Recordings such as these frequently feature a lot of meta-data that is spoken by the presenter, things like "Unit 3" or the title of the next segment, which I want to capture as labels. While either recording or playing a project, you need "hot keys" that instantly create labels according to a template. The template should allow the inclusion of auto-incrementing numbers. Currently, I have to quickly press a sequence like [Ctrl-M], "u", "3" and [Enter], and then post-process the .aup file to expand the labels. If several events occur in rapid succession, this is especially hard to do accurately. Finally, it would be nice if the label were inserted a bit before the time of the key-press, to allow for reaction times.
  • Use '<' and '>' to adjust the tempo of a recording during playback. The 'Change Tempo' effect does this, but I do not want to permanently change my recording.
  • Have a key that, during play-back, jumps backwards one or two seconds. Especially when scanning speech, this would reduce the time required to label everything.
  • Even with compensation for reaction times, it is unlikely that a label will be exactly where you'd want it to be. I'd like to be able to do two things to help with this. First, use the mouse to sweep a region and then have Audacity adjust the selection boundaries, for instance to move the selection's start point forward until it's a fixed time prior to the end of silence. (Ideally, this would use a new type of Nyquist extension.) Second, given a selection that covers exactly one label, adjust that label so that it matches the selection.
  • Finally, I'd like to see label-based navigation of tracks, things like "scroll to the {first,next,previous,last} label". Note: you can already click in a label then tab and SHIFT - tab back and forth between them. Thanks for the info, I didn't realize you could do that.
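
The hot-key labels from the first bullet (a template, an auto-incrementing number, and a small backwards shift for reaction time) might be modelled like this. It is a sketch under assumptions: the class name `LabelHotKey`, the `{n}` placeholder syntax and the 0.5 second default are made up for the example and do not correspond to anything in Audacity.

```python
class LabelHotKey:
    """Hypothetical hot key that stamps out labels from a template."""

    def __init__(self, template, start=1, reaction_time=0.5):
        self.template = template            # e.g. "Unit {n}"
        self.counter = start                # next auto-increment value
        self.reaction_time = reaction_time  # seconds to shift the label back

    def press(self, current_time):
        """Return (label_time, label_text) for a key press at current_time."""
        text = self.template.format(n=self.counter)
        self.counter += 1
        # Place the label slightly before the key press, but never before 0.
        label_time = max(0.0, current_time - self.reaction_time)
        return (label_time, text)
```

For instance, a key bound to `LabelHotKey("Unit {n}", start=3)` would replace the [Ctrl-M], "u", "3", [Enter] sequence with a single key press, and the next press would produce "Unit 4".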

Taken together, these features would allow someone to perform a mostly-unattended capture of a conference presentation or a jam session, play it back at high-speed while marking points of interest, and easily revisit those points to adjust the labels.


  • Create labels in real time from external source during recording: It would be nice to be able to create labels immediately during the recording process, using either remote calls or listening on a socket. I currently fake this with a little python script that logs the intended labels and (hopefully) the correct time-since-start into an external file, then "Import Labels". It would be nice if there were some way to do this directly, since Audacity always has the correct "current" time-in-recording.
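
For readers who want to try the workaround, a logger of that kind could be sketched as follows. This is not the script mentioned above, which is not shown here. The class `LabelLogger` is hypothetical, and the output layout (start time, end time and text separated by tabs, one label per line, times in seconds) is my understanding of what "Import Labels" accepts and should be checked against the Audacity documentation.

```python
import time

class LabelLogger:
    """Hypothetical external logger producing text for 'Import Labels'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # Create the logger at the moment recording starts; any gap between
        # the two is exactly the timing error the request wants to avoid.
        self._t0 = clock()
        self.labels = []

    def mark(self, text):
        """Record a point label at the current time since start."""
        t = self._clock() - self._t0
        self.labels.append((t, t, text))  # point label: start == end

    def to_text(self):
        """Return the labels as tab-separated lines, times in seconds."""
        return "".join(
            f"{start:.6f}\t{end:.6f}\t{text}\n"
            for start, end, text in self.labels
        )
```

Writing `to_text()` to a file gives something that can be imported once recording has finished. Doing the same inside Audacity would remove the guesswork about when recording actually started.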

We use this for pulling in GPS-sourced time, scene change hints, and some other stuff.

  • "Waterfall"/"voiceprint" spectrum display option (instead of, or above/below, the waveform display).

Fixing a Jam Session

  • Multiple takes of the same 'session'. Allow the user to mix and match. With two takes the idea here is to have an interface where you can choose which fragment to take for each moment, so a bum note in one track that was OK in the other will be fine. Particularly useful for long takes. A necessary step is to align the two tracks so that equivalent sounds correspond. This entails expanding/contracting one or both waveforms in various places. A variant of this uses self comparison of the same track. This is useful where a singer sings a phrase again that they got wrong the first time. Alignment can match up the repeated parts and allow easy removal of the unwanted sound.
  • Spot every cough from the audience in a recording (irregular spacing of a repeated pattern). Needs an interface for defining a 'cough' which will have one or more samples, each with parameters controlling the allowed variations. For example, we can set how precisely the volume of the cough must be matched - ignoring coughs that are in the distance is less likely to mess up audio that by chance matches the pattern.
  • De-mix the audio, e.g. take the vocals away from the instrumentals, or separate the percussion from the wind instruments onto two different tracks. In amateur recording, fewer mics than necessary may have been used, or there may have been bleed between the different sources. Separating them cleanly allows for different noise cleaning and effects to be applied to each, before remixing again.

Algorithms for doing aspects of this are discussed on Audio Diff Notes (mix and match and comparison) and on Source Separation Notes for de-mixing audio.
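
One standard technique for the alignment step in the first bullet is dynamic time warping (DTW). The sketch below is not taken from Audio Diff Notes and is not necessarily the approach proposed there. It assumes each take has already been reduced to a short sequence of per-frame numbers (for example loudness), which is itself an assumption, and the function name `dtw_path` is invented.

```python
def dtw_path(a, b):
    """Align two numeric sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    saying that frame i of a corresponds to frame j of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    # Walk back from the end, always taking the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

Where the path repeats an index on one side, that take has to be stretched at that point to match the other. The table grows with the product of the two lengths, so real takes would need coarse frames or a restricted search band.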

Audio as a measuring tool

  • Graph the speed of an engine from its sound (regular spacing of a repeated pattern, spacing not initially known)
  • Measure the speed of a bullet from the time delay between the sound of firing and of the target being hit - and the known speed of sound in air. (There's a website with pictures of Audacity being used to do just that).
  • Gather evidence of noise pollution (pneumatic drills, heavy lorries) from nearby building works over a 10-day recording. The person who requested this is planning to take the builders to court. For this one we need not only a way of classifying sound (advanced Vamp plug-ins) but also an improved time track and a way of converting the matches to a printable log. For litigation there may be additional requirements, such as ways to indicate that the data has not been tampered with. We may want to run two widely separated microphones, or one parabolic and one non-parabolic, to confirm the direction of the sound source.
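
The arithmetic behind the bullet-speed measurement can be written out as below. The sketch assumes the microphone is right next to the firearm and that the distance to the target is known; neither is stated above. Under those assumptions the measured delay is the bullet's flight time plus the time the impact sound takes to travel back. The function name and the 343 m/s default (sound in air at roughly 20 °C) are illustrative.

```python
def bullet_speed(distance_m, delay_s, speed_of_sound=343.0):
    """Estimate average bullet speed in m/s.

    distance_m: distance from microphone (at the firearm) to the target.
    delay_s:    time between the firing sound and the impact sound,
                as read off the recorded waveform.
    """
    sound_return_time = distance_m / speed_of_sound
    flight_time = delay_s - sound_return_time
    if flight_time <= 0:
        raise ValueError("delay is too short for this distance")
    return distance_m / flight_time
```

With the microphone elsewhere, the travel time of the firing sound to the microphone would also have to be accounted for.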

Algorithms for doing aspects of this are discussed on Audio Diff Notes.
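
For the engine-speed case, one common way to find the spacing of a regularly repeated pattern is autocorrelation. The sketch below swaps in that technique for illustration; it is not necessarily what Audio Diff Notes proposes, and the function names are invented. Turning the result into RPM also needs the number of firing events per revolution, which depends on the engine and is not covered here.

```python
def dominant_period(samples, sample_rate, min_lag, max_lag):
    """Return the repetition period in seconds, found as the lag (in
    samples, between min_lag and max_lag) with the largest autocorrelation.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised sum: longer lags have fewer terms, which slightly
        # favours the true period over its multiples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

def events_per_minute(period_s):
    """Convert a repetition period in seconds to events per minute."""
    return 60.0 / period_s
```

Running `dominant_period` on successive short windows of the recording, and plotting `events_per_minute` for each, would give the requested graph of speed over time.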

Recording from cassettes

Azimuth Setter

Azimuth refers to the angle between the tape head(s) and the tape medium. Sometimes tape azimuth is easy enough to set, sometimes not. Having to set it for every tape is a chore though.

Azimuth setter would display the azimuth setting on screen in real time, showing ideal setting in the middle and showing actual setting.

Azimuth setter would:

  1. look at the delay between the 2 channels, which contain a lot of mono content (slightly delayed when azimuth is out of line)
  2. graphically display the azimuth setting
  3. update display in real time, enabling easy ideal setting by user.

This would:

  • speed up the job
  • reduce errors
  • improve accuracy and thus bandwidth & signal to noise ratio
  • If it ran during recording, it would show when azimuth changed, enabling the user to readjust to the new perfect setting during recording.
  • It would also show all users the azimuth issue in a simple clear non-technical way, and result in better recordings by end users.

How might the code work?

This is just a thought, and maybe there are better options I don't know about. I'm not a computer programmer.

The idea is to take a brief sample and look for mono content, repeating with assorted time delays between the 2 channels.

One time shift will give the greatest mono content, and the amount of timeshift then gives you the degree and sign of azimuth error.

A little intelligence can presumably be applied to home in on the right delay figure while minimising the number of delay settings tried.
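
The idea in the preceding paragraphs can be sketched as a brute-force search: try each time shift between the two channels and keep the one where they agree best. Here "mono content" is measured as the sum of products of the two channels (cross-correlation), which is an interpretation, and the function name `best_channel_delay` is made up. The smarter search for homing in on the right delay is not attempted.

```python
def best_channel_delay(left, right, max_delay):
    """Return the shift, in samples, at which right best matches left.

    A positive result means the right channel lags the left, a negative
    one that it leads. Which sign corresponds to which direction of
    azimuth error depends on the tape deck and is not determined here.
    """
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_delay, max_delay + 1):
        score = 0.0
        # Use the same window of left for every shift so scores compare.
        for i in range(max_delay, len(left) - max_delay):
            score += left[i] * right[i + shift]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

Dividing the result by the sample rate gives the delay in seconds. A real-time display would repeat this on short blocks of incoming audio and show the value on a scale centred on zero.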

Note the display scale should be flippable (left to right). The computer has no way to know which way round the scale should be presented to make adjustment easy and intuitive.

If my coding skills weren't 25 years out of date I'd probably give it a go.

If you want to vote for Azimuth setter: please do so on the other recording enhancements section of Feature Requests

Other features

Other features mentioned on Feature Requests, which may be useful in their own right, are also useful for cassette recording, in particular:

  • 8 bit recording
  • SmartEQ (which tries to equalise the audio according to a known frequency spectrum).
  • "Automatically set recording settings" for sample rate and bit depth based on a sample of recorded audio