Spectral Selection Ideas
Your work on a spectral editing feature for Audacity will take place over ten weeks and will be subject to your mentor's evaluations at the halfway and final stages. If your proposal is accepted, you must draw up a schedule of tasks and have periodic discussions with your mentor to track progress.
This schedule will be part of the proposal that you formally submit before 13 April 2021. You should be estimating now what you think you can accomplish in that time, which should include the "essentials" below.
If you finish the essentials ahead of schedule and before the end of the program, then there may be further enhancements that you can implement as bonuses -- the "stretch goals."
If you have compiled Audacity and can step its execution in the debugger, then you can explore how some of its graphical editing is implemented, the better to understand how you can implement new editing operations analogously. This page contains a guide for those explorations.
Mouse interaction and painting
Some work must be done for a user interface, and it is not small, but it can mostly proceed by analogy and adaptation of existing examples. Strive for a minimum of reasonably convenient operations, but remember that the real value to the user, and the inventiveness in this project, lie in the added editing capability to be accessed through this interface.
The abstract base class TrackPanelCell represents a rectangle in the track area, with its own policies for painting itself and handling events. It requires its subclasses to implement HitTest(state, pProject), where the first argument is a structure including screen coordinates of the cell's rectangle and a point within it. The function returns a vector of zero or more pointers to UIHandle objects, which are short-lived, maintaining state only during click-drag-release sequences of mouse events; these may also be aborted with the Esc key. (Multiple hit test targets may be at one point. The user can also use the Esc key to choose a target at the point before clicking the mouse.)
TrackPanelDrawable is a common abstract base class of TrackPanelCell and UIHandle, containing the Draw(context, rect, iPass) method for painting a part of the screen. The method may be called multiple times in one painting of the screen, with different iPass values. Higher values are for later passes, which may overpaint the results of previous passes.
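The idea of painting in passes can be sketched in miniature as follows. This is only an illustration under simplifying assumptions: SketchDrawable, Background, Highlight, and Paint are hypothetical names, the "screen" is a string of characters, and none of this is Audacity's real drawing code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a drawable; not Audacity's real TrackPanelDrawable.
struct SketchDrawable {
   virtual ~SketchDrawable() = default;
   // Called once per pass; an implementation ignores passes it does not use.
   virtual void Draw(std::string &canvas, int iPass) = 0;
};

// Paints a plain background in pass 0.
struct Background : SketchDrawable {
   void Draw(std::string &canvas, int iPass) override {
      if (iPass == 0)
         for (char &c : canvas)
            c = '.';
   }
};

// Paints a highlight in pass 1, over whatever pass 0 produced.
struct Highlight : SketchDrawable {
   std::size_t begin, end;
   Highlight(std::size_t b, std::size_t e) : begin(b), end(e) {}
   void Draw(std::string &canvas, int iPass) override {
      if (iPass == 1)
         for (std::size_t i = begin; i < end && i < canvas.size(); ++i)
            canvas[i] = '#';
   }
};

// Runs every pass over every drawable. Because the outer loop is over passes,
// the order of the drawables in the list does not decide what ends up on top.
std::string Paint(const std::vector<SketchDrawable*> &drawables,
                  std::size_t width, int nPasses) {
   assert(nPasses > 0);
   std::string canvas(width, ' ');
   for (int iPass = 0; iPass < nPasses; ++iPass)
      for (SketchDrawable *drawable : drawables)
         drawable->Draw(canvas, iPass);
   return canvas;
}
```

With a Background and a Highlight covering positions 2 to 4, Paint over a width of 8 and two passes gives the same picture whichever object comes first in the list, because the highlight always paints in the later pass.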
Study the hit tests, associated handles, and drawing of class SpectrumView in particular. See how SelectHandle is constructed and handles the present simple implementation of spectral selection. But see other, simpler subclasses of UIHandle too, to understand the commonalities of various click-drag-release sequences.
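Before reading the real classes, it may help to see the click-drag-release pattern reduced to a toy. The sketch below uses hypothetical, much simplified types (Point, Rect, SketchHandle, BrushHandle, SketchCell); the real HitTest and UIHandle signatures in Audacity are different and richer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; NOT Audacity's real declarations.
struct Point { int x, y; };
struct Rect {
   int x, y, width, height;
   bool Contains(Point p) const {
      return p.x >= x && p.x < x + width && p.y >= y && p.y < y + height;
   }
};

// Short-lived object holding state for one click-drag-release sequence.
class SketchHandle {
public:
   virtual ~SketchHandle() = default;
   virtual void Click(Point p) = 0;
   virtual void Drag(Point p) = 0;
   virtual void Release(Point p) = 0;
   virtual void Cancel() = 0;   // for example, when the user presses Esc
};

// Records the path of the mouse and commits it only on release.
class BrushHandle : public SketchHandle {
public:
   explicit BrushHandle(std::vector<Point> &committed) : mCommitted(committed) {}
   void Click(Point p) override { mPath.assign(1, p); }
   void Drag(Point p) override {
      assert(!mPath.empty());   // a drag must follow a click
      mPath.push_back(p);
   }
   void Release(Point p) override {
      mPath.push_back(p);
      mCommitted.insert(mCommitted.end(), mPath.begin(), mPath.end());
      mPath.clear();
   }
   void Cancel() override { mPath.clear(); }   // nothing is committed
private:
   std::vector<Point> mPath;        // state that lives only during the gesture
   std::vector<Point> &mCommitted;  // longer-lived state owned by the cell
};

// A rectangle of the screen that answers hit tests with candidate handles.
class SketchCell {
public:
   explicit SketchCell(Rect rect) : mRect(rect) {}
   std::vector<std::shared_ptr<SketchHandle>> HitTest(Point p) {
      std::vector<std::shared_ptr<SketchHandle>> results;
      if (mRect.Contains(p))
         results.push_back(std::make_shared<BrushHandle>(mStrokes));
      return results;   // zero or more targets at this point
   }
   const std::vector<Point> &Strokes() const { return mStrokes; }
private:
   Rect mRect;
   std::vector<Point> mStrokes;
};
```

The point to notice is the division of state: the handle keeps only what is needed during one gesture, and the cell's lasting state changes only when the gesture completes rather than being cancelled.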
The Tools toolbar controls a global editing mode that influences the hit test routines within tracks. An implementation of spectral editing might add a button, or buttons, to this bar, and make modifications of the existing hit test routines for spectrum view.
Perhaps a non-modal dialog, like the Contrast analyzer (implemented in Contrast.cpp), could be shown and hidden as one enters or leaves this editing mode.
Saving changes in tracks
Having gathered information from the user's mouse gestures, the tool must then use it to apply an edit to the audio track, which could afterwards be undone or redone. A call to ProjectHistory::PushState makes a snapshot of the state of tracks after modifications. There are many examples, such as OnSilence(context) in EditMenus.cpp. ProjectHistory::RollbackState is used when a partial editing operation cannot be completed.
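The relation between pushing a state, rolling back, and undo or redo can be pictured with a toy history of snapshots. This is a sketch under heavy assumptions: SketchHistory and Snapshot are hypothetical, a "track" is just a vector of samples, and the real ProjectHistory has different signatures and stores state far more economically.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical snapshot of track contents with a description for the Undo menu.
struct Snapshot {
   std::vector<float> samples;
   std::string description;
};

// Hypothetical miniature history; Audacity's ProjectHistory differs in detail.
class SketchHistory {
public:
   explicit SketchHistory(const std::vector<float> &initial) {
      mStates.push_back({ initial, "Initial state" });
   }
   // Record the current (already modified) samples as a new undoable state.
   void PushState(const std::vector<float> &current, const std::string &desc) {
      mStates.resize(mCurrent + 1);   // discard any states that could be redone
      mStates.push_back({ current, desc });
      ++mCurrent;
   }
   // Abandon uncommitted changes by restoring the last pushed snapshot.
   void RollbackState(std::vector<float> &current) const {
      assert(mCurrent < mStates.size());
      current = mStates[mCurrent].samples;
   }
   bool Undo(std::vector<float> &current) {
      if (mCurrent == 0)
         return false;
      current = mStates[--mCurrent].samples;
      return true;
   }
   bool Redo(std::vector<float> &current) {
      if (mCurrent + 1 >= mStates.size())
         return false;
      current = mStates[++mCurrent].samples;
      return true;
   }
private:
   std::vector<Snapshot> mStates;
   std::size_t mCurrent = 0;   // index of the state the project is in now
};
```

A spectral edit would follow the same order as the existing menu commands: modify the track first, then push one state for the whole gesture, and roll back instead if the edit fails part way.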
Representation of a spectral selection
Existing spectral selections in Audacity are represented simply by four numbers -- two time bounds, and two frequency bounds. Note that this description of a part of the signal is independent of the magnification or horizontal scrolling position of the view of the track, and of display settings of the spectrogram, such as the vertical scale or windowing function.
To allow the user to choose an irregular shape on the screen, you must devise a more complicated representation: a set of time-frequency bins, which is updated in response to mouse events and is used to update the painting of the picture.
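One possible starting point, offered only as a sketch, is a set of (time column, frequency row) pairs on a fixed analysis grid. BinGrid and BinSelection are hypothetical names, and the choice of grid parameters (sample rate, hop size, FFT size) is an assumption; you may well find a better representation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical parameters fixing the analysis grid. They describe the signal,
// not the screen, so zooming or scrolling the view does not change them.
struct BinGrid {
   double sampleRate;   // samples per second
   int hopSize;         // samples between the starts of successive windows
   int fftSize;         // samples per window
   long TimeToColumn(double seconds) const {
      assert(hopSize > 0);
      return static_cast<long>(std::floor(seconds * sampleRate / hopSize));
   }
   int FrequencyToRow(double hz) const {
      assert(sampleRate > 0);
      return static_cast<int>(std::floor(hz * fftSize / sampleRate));
   }
};

// Hypothetical irregular selection: any set of bins, not only a rectangle.
class BinSelection {
public:
   using Bin = std::pair<long, int>;   // (time column, frequency row)
   void Add(long column, int row) { mBins.insert(Bin{column, row}); }
   void Remove(long column, int row) { mBins.erase(Bin{column, row}); }
   bool Contains(long column, int row) const {
      return mBins.count(Bin{column, row}) != 0;
   }
   std::size_t Size() const { return mBins.size(); }
private:
   std::set<Bin> mBins;   // ordered by column, then by row
};
```

Mouse handling would convert screen coordinates to time and frequency (as the view already can), then to bins through the grid; painting would do the reverse for each bin in the set. Note that this sketch ties the selection to one grid, so a change of window size would require converting it.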
Using the representation
An expected use of an irregular spectral selection is that it might highlight an unwanted background sound in a recording, and reduce or eliminate it, while simultaneous desirable sounds are left intact. How can this be calculated?
The spectrogram view is computed using the Fast Fourier Transform. You need not know the details of the algorithm, but should understand the usefulness of its outputs.
You should be comfortable with the following notions, some of which this page helps to illustrate:
- Time and frequency domains
- Sliding window
- Time and frequency bins
- Spectral leakage
- Windowing function
- Linearity of the transform
- Invertibility of the transform
It may help to understand the geometric interpretation of complex numbers, and that pairs of real-valued results of the FFT can also be viewed as single complex numbers, with a phase that does not affect the spectrogram view, and a magnitude that does.
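The complex-number view can be tried out with the standard library alone. In this small sketch the function names (Magnitude, Phase, RotatePhase) are made up for illustration; only std::complex, std::abs, std::arg, and std::polar are standard.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// One transform output given as a (real, imaginary) pair of numbers.
// The magnitude is the length of the arrow from the origin to (re, im);
// a spectrogram colors a pixel according to this value.
double Magnitude(double re, double im) {
   return std::abs(std::complex<double>(re, im));
}

// The phase is the angle of that arrow, in radians; the picture ignores it.
double Phase(double re, double im) {
   return std::arg(std::complex<double>(re, im));
}

// Turn the arrow by an angle without changing its length.
std::complex<double> RotatePhase(std::complex<double> z, double radians) {
   return z * std::polar(1.0, radians);
}
```

For the pair (3, 4), the magnitude is 5 whatever rotation is applied. The phase matters again at re-synthesis, though: an edit that changes magnitudes should normally leave phases alone, or treat them with care.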
Among the sound effects built into Audacity, Noise Reduction (implemented in NoiseReduction.cpp) also uses this algorithm, and its inverse transform. In effect, its algorithm "looks at" a spectrogram too, but then rewrites it. You should understand its general working as phases of
- Analysis (using FFT)
- Modification of coefficients
- Re-synthesis (using inverse FFT)
Those phases are not run once in sequence over the whole signal; rather, they are repeated over overlapping portions of it. Know too how the overlapping works, and how it affects analysis and synthesis.
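The three phases and the overlapping can be sketched in a few lines, under assumptions that are chosen for brevity and are not taken from NoiseReduction.cpp: a slow, naive transform stands in for the FFT, only an analysis window (Hann) is applied, and successive windows overlap by half. The names Dft, Process, and Spectrum are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using Spectrum = std::vector<std::complex<double>>;

// Naive O(N^2) discrete Fourier transform, standing in for a real FFT.
// sign = -1 gives the forward transform, +1 the unscaled inverse.
Spectrum Dft(const Spectrum &in, int sign) {
   const double pi = std::acos(-1.0);
   const std::size_t n = in.size();
   Spectrum out(n);
   for (std::size_t k = 0; k < n; ++k)
      for (std::size_t t = 0; t < n; ++t)
         out[k] += in[t] *
            std::polar(1.0, sign * 2.0 * pi * double(k * t % n) / double(n));
   return out;
}

// Analysis, modification, and re-synthesis over windows that overlap by half.
// `modify` receives the window index and may rewrite coefficients in place.
std::vector<double> Process(const std::vector<double> &signal, std::size_t n,
   const std::function<void(std::size_t, Spectrum &)> &modify)
{
   assert(n >= 2 && n % 2 == 0);
   const double pi = std::acos(-1.0);
   const std::size_t hop = n / 2;
   std::vector<double> output(signal.size(), 0.0);
   std::size_t index = 0;
   for (std::size_t start = 0; start + n <= signal.size(); start += hop) {
      Spectrum buffer(n);
      for (std::size_t i = 0; i < n; ++i) {
         // Periodic Hann window: two copies shifted by n/2 sum to exactly 1
         const double w = 0.5 - 0.5 * std::cos(2.0 * pi * double(i) / double(n));
         buffer[i] = w * signal[start + i];
      }
      Spectrum coefficients = Dft(buffer, -1);          // analysis
      modify(index++, coefficients);                    // modification
      const Spectrum resynth = Dft(coefficients, +1);   // re-synthesis
      for (std::size_t i = 0; i < n; ++i)
         output[start + i] += resynth[i].real() / double(n);   // overlap-add
   }
   return output;
}
```

If `modify` does nothing, every sample covered by two windows comes back unchanged, because the overlapping windows sum to one. Once coefficients are really modified, the choice of windows and overlap affects the quality of the result, which is one reason to study how the existing effect does it.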
This section describes deficiencies in an older version of the Noise Reduction effect (which have been corrected since then) and may give you other insights.
Do you understand to what extent the procedure to modify sound in response to a user's selection can be like this?
These are some hints for additional study that might further improve the quality of the results. Addressing any one of the questions could be a significant accomplishment alone. Addressing the entire list would be far too ambitious for one GSoC term! But perhaps you will wish to continue contributions that improve upon your invention after the term of the program.
Ease of selection
Look at spectrograms of common sounds like voice or a car horn. Notice the patterns of parallel overtones, which are whole number multiples of a fundamental frequency.
Can a selection tool make it easy to select fundamentals and overtones together?
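As a first approximation, and assuming the fundamental frequency is already known (say, from where the user clicked), the rows holding the overtones follow by arithmetic. HarmonicRows is a hypothetical helper, not an existing Audacity function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rows of the frequency grid nearest to each whole-number multiple of the
// fundamental, up to (but not including) half the sample rate.
// Row r is centered on the frequency r * sampleRate / fftSize.
std::vector<int> HarmonicRows(double fundamentalHz, double sampleRate,
                              int fftSize) {
   assert(fftSize > 0 && sampleRate > 0.0);
   std::vector<int> rows;
   if (fundamentalHz <= 0.0)
      return rows;
   const double binWidth = sampleRate / fftSize;
   for (int k = 1; k * fundamentalHz < sampleRate / 2.0; ++k)
      rows.push_back(static_cast<int>(std::lround(k * fundamentalHz / binWidth)));
   return rows;
}
```

With a sample rate of 8000 Hz and a window of 80 samples, each row is 100 Hz wide, and a fundamental of 1000 Hz yields rows 10, 20, and 30. Real sounds are less tidy: overtones may be slightly off the exact multiples, and leakage spreads each one over neighboring rows.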
A sound like voice or a siren rises and falls in pitch, unlike the typical car horn. Can a selection tool extrapolate in time beyond a click of the mouse, following the curve?
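One naive way to follow a curve, shown here only to make the question concrete, is to step forward one time column at a time and move to the strongest bin near the current row. FollowRidge is a hypothetical function; the greedy rule and the `maxStep` parameter are assumptions, and a robust tracker would need to do more (for instance, to stop when the sound ends).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// magnitudes[column][row] holds spectrogram magnitudes (assumed rectangular).
// Starting from the clicked bin, return one row per column from startColumn
// onward, each time choosing the largest magnitude within maxStep rows.
std::vector<int> FollowRidge(const std::vector<std::vector<double>> &magnitudes,
                             std::size_t startColumn, int startRow, int maxStep)
{
   std::vector<int> path;
   int row = startRow;
   for (std::size_t c = startColumn; c < magnitudes.size(); ++c) {
      const std::vector<double> &column = magnitudes[c];
      const int nRows = static_cast<int>(column.size());
      assert(row >= 0 && row < nRows);
      int best = row;   // stay on the same row unless a neighbor is stronger
      const int lo = std::max(0, row - maxStep);
      const int hi = std::min(nRows - 1, row + maxStep);
      for (int r = lo; r <= hi; ++r)
         if (column[r] > column[best])
            best = r;
      row = best;
      path.push_back(row);
   }
   return path;
}
```

The same loop run backward in time would extend the selection before the click as well.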
The reassignment method is implemented in Audacity as a display option for spectrograms, after the method described here. You need not know all of the mathematics, but understand that it can compensate for spectral leakage in a display, so that fundamentals and overtones of sounds are drawn as sharper curves. The intensity that an ordinary spectrogram would assign to a pixel is "reassigned," or moved a small amount in the x and y directions, according to other calculations, and one pixel then shows a summation of the reassignments that land on it.
Could these calculations find reuse in improved spectral selection?
Redefining the selection
Must the selection be merely a set of time and frequency bins, including all or none of the spectral power of each bin?
Could the effect apply another user-controlled parameter, such as from a slider control, to attenuate a background sound only partially?
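A partial attenuation could be as simple as scaling the selected coefficients by a gain derived from a slider value in decibels. The sketch below assumes that the transform coefficients of every window are available in one table, which is a simplification; Attenuate, GainFromDb, Bin, and Spectrogram are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Bin = std::pair<std::size_t, std::size_t>;   // (time column, frequency row)
using Spectrogram = std::vector<std::vector<std::complex<double>>>;   // [column][row]

// Convert a change of level in decibels to a linear factor for amplitudes.
double GainFromDb(double dB) { return std::pow(10.0, dB / 20.0); }

// Scale each selected coefficient. Multiplying a complex number by a positive
// real number changes its magnitude and leaves its phase as it was.
void Attenuate(Spectrogram &frames, const std::set<Bin> &selection, double dB) {
   assert(dB <= 0.0);   // this sketch only reduces levels
   const double gain = GainFromDb(dB);
   for (const Bin &bin : selection)
      if (bin.first < frames.size() && bin.second < frames[bin.first].size())
         frames[bin.first][bin.second] *= gain;
}
```

A setting of -20 dB multiplies magnitudes by 0.1, and very large negative values approach complete removal. Bins outside the table are ignored rather than treated as errors.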
Could the effect examine time-frequency bins outside the selection, and calculate an approximate match within the repaired region?
Avoid early overspecification of details of implementation or of the user interface (class names, button images, dialog text, etc.). These may be revised as experience suggests. Do not worry about how to add a new button icon — this is non-obvious as Audacity is now organized, and you will get help as needed.
But do write a document that provides a credible outline of a plan to add a useful feature, in stages, addressing all the essentials. Provide a mock-up image illustrating a possibility, if you like.
Paintbrush and lasso tools were suggested. A paintbrush tool, which only needs to record a path that the mouse took, may be the simpler way to reach the essential (non-stretch) goals. The lasso may require more complicated 2D geometric algorithms.
A convenient tool might also allow the user to remove bins from a spectral selection in progress, before the application of the edit to the sound. So you might want an eraser tool accompanying the paintbrush, or (again, more difficult) a means to drag a curve that bounds a lassoed area.
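A paintbrush and its eraser can share nearly all of their code, as this sketch suggests. It assumes the selection is a set of bins and that mouse positions have already been converted to bin coordinates; Stamp and Stroke are hypothetical names, and the round brush measured in bins is just one possible choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <set>
#include <utility>

using Bin = std::pair<int, int>;   // (time column, frequency row)

// Add (or, when erasing, remove) every bin within `radius` bins of `center`.
void Stamp(std::set<Bin> &selection, Bin center, int radius, bool erase) {
   assert(radius >= 0);
   for (int dc = -radius; dc <= radius; ++dc)
      for (int dr = -radius; dr <= radius; ++dr) {
         if (dc * dc + dr * dr > radius * radius)
            continue;   // outside the round brush
         const Bin bin{ center.first + dc, center.second + dr };
         if (bin.first < 0 || bin.second < 0)
            continue;   // off the grid
         if (erase)
            selection.erase(bin);
         else
            selection.insert(bin);
      }
}

// Mouse events arrive at separate points. Stamp along the straight segment
// between two successive points so that a fast drag leaves no gaps.
void Stroke(std::set<Bin> &selection, Bin from, Bin to, int radius, bool erase) {
   const int steps = std::max(std::abs(to.first - from.first),
                              std::abs(to.second - from.second));
   for (int i = 0; i <= steps; ++i) {
      const double t = steps == 0 ? 0.0 : double(i) / steps;
      const Bin point{
         static_cast<int>(std::lround(from.first + t * (to.first - from.first))),
         static_cast<int>(std::lround(from.second + t * (to.second - from.second)))
      };
      Stamp(selection, point, radius, erase);
   }
}
```

Each drag event would call Stroke from the previous mouse position to the new one, with `erase` set according to the active sub-tool or a modifier key.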
A non-modal dialog that opens when a new Tools toolbar button is down, and hides when it is up, was suggested to avoid adding more buttons than strictly needed to the toolbar. The dialog might then have buttons to switch among sub-tools of a spectral editing mode — or not. Maybe modifier keys (shift, control) could instead change the meaning of mouse drags. But the dialog might also have other controls that are not buttons. (A non-modal dialog is one that is displayed but still permits mouse and keyboard interactions with other windows. A modal dialog lets you do nothing else but interact with it, until you dismiss it, usually with an OK or Cancel button.)