More GSoC Ideas

This is a list of more potential projects.

Subheadings for each idea:

Command Scripting Support
Possible Mentors:
 * TBD (Existing work by James Crook, proposed for GSoC by Richard Ash)

Description: Audacity's core audio processing facilities would be useful for many purposes besides normal use as an audio editor. Many of these need some means of controlling Audacity from another application. Such control could also be used to write advanced scripts that make Audacity perform complex sequences of operations. Rather than trying to incorporate a full scripting language into Audacity, we want to develop a command interface so that a script written in any language (Perl, Python, JavaScript, bash, ...) can "drive" Audacity by sending commands to it and receiving feedback. Currently an experimental implementation exists for Windows only, using named pipes. This project would aim to:
 * Provide a cross-platform implementation, using appropriate IPC structures such as ptys, and possibly TCP/IP on all platforms
 * Make other changes in Audacity to move the code from experimental to production status.

The main issue in the latter category is handling of errors in Audacity. Currently most errors result in an error dialog being displayed directly. This is neither technically possible nor desirable when Audacity is the back-end of another process. Three modes of operation are possible: Normal Audacity, where the error is in the main thread; a graphical scripting client where the error is in a different thread to the user interface; and a non-graphical scripting client where there is no user interface to display a dialog. A means is needed to replace the dialog in such cases and handle the error, along the lines of:
 * error in main thread: show dialog normally
 * error in script thread, GUI mode: post a message to the main thread to show the dialog.
 * error in script thread, non-GUI mode: return a text string indicating an error.
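
The three cases above could be routed through a single reporting object. The following is only a minimal Python sketch of the idea, not Audacity code: `Mode`, `ErrorReporter`, `report` and `drain` are hypothetical names, the `show_dialog` callable stands in for the real error dialog, and a plain queue stands in for posting a message to the main thread's event loop.

```python
import queue
import threading
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"                    # ordinary interactive use
    SCRIPT_GUI = "script-gui"            # script thread next to a user interface
    SCRIPT_HEADLESS = "script-headless"  # no user interface at all


class ErrorReporter:
    """Routes an error to a dialog, to the UI thread's queue, or to a text reply."""

    def __init__(self, mode, show_dialog):
        self.mode = mode
        self.show_dialog = show_dialog       # stand-in for the real dialog
        # Assumption: the reporter is created on the user interface thread.
        self.ui_thread = threading.current_thread()
        self.pending = queue.Queue()         # messages posted to the UI thread

    def report(self, message):
        """Return None if a dialog was shown or queued, else an error string."""
        if threading.current_thread() is self.ui_thread:
            self.show_dialog(message)        # error in main thread
            return None
        if self.mode is Mode.SCRIPT_GUI:
            self.pending.put(message)        # let the UI thread show it later
            return None
        return "ERROR: " + message           # non-GUI: text for the script

    def drain(self):
        """Called on the UI thread to show queued dialogs; returns how many."""
        shown = 0
        while True:
            try:
                message = self.pending.get_nowait()
            except queue.Empty:
                return shown
            self.show_dialog(message)
            shown += 1
```

In a real implementation the queue would be replaced by whatever event-posting mechanism the GUI toolkit provides; the point of the sketch is only that the code raising the error does not need to know which of the three modes it is running in.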

Skills:
 * wxWidgets and C++.
 * Helpful to already have knowledge of IPC on Windows and at least one other platform - but a well-honed ability to find technical information on the internet will suffice if not.

Difficulty:
 * Moderate.

Notes:
 * See also Automation

Early spinoffs from this work:

Either of:
 * Scripting compiles and is usable (with existing error reporting) on more than one platform.
 * Scripting works in all three modes on Windows (but still using named pipes).

Multi-Channel Audio support
Possible Mentors:
 * Richard Ash (in principle??)

Description: Audacity is currently only designed with stereo and mono audio in mind. There is a mechanism for exporting tracks from a project to separate channels in a file, but there is no mechanism for panning audio between more than two channels. Given the widespread use of multiple channel audio (5.1 surround, ambisonics etc.) it would be good if Audacity could support working with these formats. Adding this support requires a number of separate changes, each of which could be quite far-reaching in the existing Audacity code base:
 * Allow arbitrarily sized groups of tracks to be linked, rather than just pairs of tracks. This is currently very badly abstracted in the code, so will mean a lot of fixing of existing code to use the new track group interface.
 * Create a multi-channel capable panning module to replace the current pan control in feeding mono tracks to a multi-channel output. This must cope with a number of different multi-channel formats, and be extensible in the future for more. Cinelerra may have something suitable for 5.1 as a starting point.
 * Provide multi-channel playback in Audacity. Some simple implementations of this exist, but do not use multi-channel mixing of content. The Audacity mixer needs to become multi-channel capable if it is not already, and be linked up to the sound card. Some way of coping when the sound device does not support the Audacity format will be needed (more so on platforms with inflexible audio APIs than on those with a plug-in architecture like ALSA, where this can probably be done outside Audacity).
 * Provide multi-channel export from Audacity. Mixing shared with above, but may need work on exporter modules to provide things like channel mapping control if needed by the format.
 * Enable track group support in multi-channel importers, so that multi-channel files come in as a track group, not as lots of mono tracks as is currently the case. This is probably the simplest section of the work.
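
As an illustration of the panning item above, here is a minimal Python sketch of one possible pan law: constant-power panning between the two speakers that enclose the source direction. This is an assumption chosen for simplicity, not the design proposed for Audacity; the function names and the speaker angles used in the usage note are hypothetical.

```python
import math


def pan_gains(source_deg, speaker_degs):
    """Return one gain per speaker for a mono source at source_deg.

    Angles are azimuths in degrees. The source is shared between the two
    adjacent speakers that enclose it; every other speaker gets zero gain.
    The squared gains always sum to 1 (constant power).
    """
    n = len(speaker_degs)
    if n < 2:
        raise ValueError("need at least two speakers")
    # Visit the speakers in order around the circle.
    order = sorted(range(n), key=lambda i: speaker_degs[i] % 360.0)
    src = source_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        span = (speaker_degs[b] - speaker_degs[a]) % 360.0
        if span == 0.0:
            continue  # two speakers at the same angle: skip this pair
        offset = (src - speaker_degs[a]) % 360.0
        if offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            return gains
    raise ValueError("speaker angles must not all be identical")


def pan_mono(samples, gains):
    """Spread one mono block over the output channels using the given gains."""
    return [[g * s for s in samples] for g in gains]
```

For example, with an assumed five-speaker layout of `[-30.0, 30.0, 0.0, -110.0, 110.0]` degrees, a source at 15 degrees is shared equally between the speakers at 0 and 30 degrees and is silent in the other three. Supporting further formats would mean supplying a different list of angles, or swapping in a different gain function behind the same interface.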

A simple diagram of what the system might look like is here as a PNG or here as Inkscape SVG

Skills:
 * wxWidgets and C++
 * Some portions may need knowledge of psychoacoustics for doing 3-D panning control
 * Access to some multi-channel audio hardware would be advantageous.

Difficulty:
 * Moderate.

Early spinoffs from this work:
 * Track grouping could be done independently of the rest. The same may also be true of a better panning control, and cleaning up the current odd implementation of mono/left/right/stereo tracks.

Smart Help Infrastructure
Possible Mentors:
 * TBD

Description:

(a) (2008-in progress, non GSoC) The preferences panel in Audacity is becoming more difficult for new and experienced users as more preferences are added. There are two conflicting forces, listed below. The proposal is to link a wxHTML help window with the preference panel. The preference panel will have short descriptions. The HTML help window will have longer descriptive text. The link will be two way, and be at the level of static boxes. The HTML window will highlight text, and if needed scroll, when the static boxes are clicked on. Conversely, clicking on an icon in the HTML text will highlight it in the text, move the preferences dialog on to the right page and highlight the appropriate static box.
 * We want to keep the descriptions on the dialogs short so that the dialogs are not cluttered.
 * Simultaneously we want more explanatory text so that people can find out what the preferences actually do.

(b) We are also looking for ways to build help screenshots directly from the program. We already have a built-in screenshot tool. This could be augmented to automatically collect all the screenshots needed from an Audacity build. This considerably reduces the work when there are changes in the interface. It also paves the way for a smaller distribution: images in the help files are not needed since they can be generated on the user's machine. An additional advantage is that the images are custom to the OS on which Audacity is running.

(c) Lists of the available effects and what they do can be built up by the program, so that help generates a custom HTML page describing the effects actually installed. Similarly the key bindings, mouse bindings and menu bindings can be 'walked' to generate some of the HTML help files.
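
To make the idea of 'walking' the bindings concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the menu structure can be obtained as nested `(label, shortcut, children)` tuples; that data structure and the function name `walk_menu` are hypothetical, and the real code would walk the application's own menu objects instead.

```python
from html import escape


def walk_menu(items):
    """Render a list of (label, shortcut, children) tuples as nested <ul> HTML."""
    parts = ["<ul>"]
    for label, shortcut, children in items:
        text = escape(label)
        if shortcut:
            # Show the key binding next to the menu item it triggers.
            text += " <kbd>{}</kbd>".format(escape(shortcut))
        if children:
            # Submenus become nested lists.
            text += walk_menu(children)
        parts.append("<li>{}</li>".format(text))
    parts.append("</ul>")
    return "".join(parts)
```

Because the page is generated from the running program, it lists only the items and bindings that are actually present in that build.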

It is intended that this new code be released under the wxWidgets license, which is compatible with us releasing Audacity under GPL, so that it can be used in other wxProjects too as part of the Application Framework initiative.

Skills:
 * wxWidgets and C++

Difficulty:
 * Moderate. Each part not too technically challenging, but there is a lot to do.

Notes:
 * See Smart Help Notes for more background / discussion.
 * Part (a) started as non-GSoC project in August 2008

Early spinoffs from this work:
 * Either (b) or (c) essentially completed at the half way stage.

Audio 'Diff'
Possible Mentors:
 * James Crook. Also Chris Cannam, Roger Dannenberg (both to be confirmed).

Description:

The ability to compare and align two sound sequences, just as one compares text using diff, would be a powerful new feature in Audacity. It would greatly facilitate the combining of sounds from multiple 'takes' of the same track. It would also be of use to people looking to identify particular known sounds, e.g. repeated themes in birdsong, in very long recordings.

The implementation idea is conceptually simple. The spectra in two sounds being compared are computed at regular spacings - using existing Audacity code. A metric for spectral similarity is written. In the first incarnation it can be a correlation function.

The alignment (diff) of the two sounds is computed using standard least-distance algorithms. The distance combines the spectral similarity score with an adjustable parameter, the penalty for stretching the sound.
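
The alignment step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the spectra are taken as already computed lists of magnitudes (one list per frame), the similarity metric is a plain Pearson correlation, and the least-distance algorithm is a dynamic-programming alignment in the style of dynamic time warping. The names `correlation`, `align` and `stretch_penalty` are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length spectra, in the range -1 to 1."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat spectrum carries no shape to compare
    return sum(x * y for x, y in zip(da, db)) / denom


def align(frames_a, frames_b, stretch_penalty=0.5):
    """Return (total_cost, path), where path lists matched (i, j) frame pairs."""
    na, nb = len(frames_a), len(frames_b)
    inf = float("inf")
    cost = [[inf] * (nb + 1) for _ in range(na + 1)]
    back = [[None] * (nb + 1) for _ in range(na + 1)]
    cost[0][0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            # Dissimilar spectra are expensive to match.
            local = 1.0 - correlation(frames_a[i - 1], frames_b[j - 1])
            candidates = [
                (cost[i - 1][j - 1], (i - 1, j - 1)),               # advance both
                (cost[i - 1][j] + stretch_penalty, (i - 1, j)),     # stretch b
                (cost[i][j - 1] + stretch_penalty, (i, j - 1)),     # stretch a
            ]
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = best + local
            back[i][j] = prev
    # Walk the back-pointers from the end to recover the alignment.
    path = []
    i, j = na, nb
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    path.reverse()
    return cost[na][nb], path
```

A higher `stretch_penalty` makes the alignment prefer a one-to-one match even where the spectra disagree; a lower value lets one frame stretch across several frames of the other sound. The table is quadratic in the number of frames, so long recordings would need a banded or windowed variant.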

The GUI for presenting the alignment could use the existing code that allows a track to be split into smaller chunks that can be shifted around augmented with a 2D similarity 'plot'. If there is time, an enhanced interface that caters more directly to the two use cases could be provided.

The diff would be implemented as a Vamp plug-in.

Skills:
 * wxWidgets and C++
 * Audio DSP

Difficulty:
 * Hard. Only suitable for a student who has some familiarity with this kind of problem - and possibly already doing research on something related.

Notes:
 * See Audio Diff Notes for more background / discussion.
 * We may run two projects for this, e.g. one for local homology and one for global homology.
 * A MIDI-WAV version of this project started as a non-GSoC project in July 2008.

Early spinoffs from this work:
 * A method for scoring the similarity of two spectra built into Audacity.
 * A 2D graphical display that will show the similarity of two spectra across the different frequencies.

Computed Automation Tracks
Possible Mentors:
 * James Crook.

Description: In many ways Audacity is just a specialised multi-track chart recorder. This project is to add a new type of track, a track which shows multiple computed automation variables. Rather than being stored, these are computed on demand. The immediate application for these is to give more flexibility in segmenting speech. They can give feedback on where the existing algorithms are proposing to segment a track, allowing fine tuning of the parameters by adjusting the threshold.
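
A minimal Python sketch of a compute-on-demand track follows. It is illustrative only: the computed variable is assumed to be the RMS level of fixed-size windows, and the names `ComputedTrack`, `value` and `segments` are hypothetical. Values are calculated the first time they are asked for and cached, so changing the threshold does not recompute anything.

```python
import math


class ComputedTrack:
    """A track whose values are derived from the audio only when requested."""

    def __init__(self, samples, window=4):
        self.samples = samples
        self.window = window
        self._cache = {}    # window index -> RMS value
        self.computed = 0   # number of windows actually computed so far

    def __len__(self):
        return math.ceil(len(self.samples) / self.window)

    def value(self, index):
        """RMS level of window `index`, computed on first use."""
        if not 0 <= index < len(self):
            raise IndexError("window index out of range")
        if index not in self._cache:
            start = index * self.window
            chunk = self.samples[start:start + self.window]
            self._cache[index] = math.sqrt(sum(s * s for s in chunk) / len(chunk))
            self.computed += 1
        return self._cache[index]

    def segments(self, threshold, first=0, last=None):
        """Return (start, end) window ranges where the value exceeds threshold."""
        last = len(self) if last is None else last
        runs = []
        run_start = None
        for i in range(first, last):
            above = self.value(i) > threshold
            if above and run_start is None:
                run_start = i
            elif not above and run_start is not None:
                runs.append((run_start, i))
                run_start = None
        if run_start is not None:
            runs.append((run_start, last))
        return runs
```

Passing `first` and `last` restricts the computation to the part of the track that is currently visible, which is the main benefit of computing on demand rather than storing the values.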

If there is time, the computed automation tracks could be used to control parameters in one or more other effects, not just used for segmenting audio.

Another direction this could be taken in is in improving the transcription processing in Audacity.

Skills:
 * wxWidgets and C++

Difficulty:
 * Moderate.

Early spinoff from this work:
 * Implementation of a thresholding automation track, using an existing threshold slider GUI element backported from Audacity-Extra and adding the new compute-on-demand code.

Feature Completion
Possible Mentors:
 * TBD (depends on details)

Description: Identify a feature of Audacity which is in development CVS but is in some way incomplete. A project proposal should describe how the feature would be much improved and brought to release candidate readiness.

Some features we have in CVS that are not yet ready for our stable builds include:
 * Transcription ToolBar - The algorithms for finding word boundaries are rather buggy and slow.
 * Themes - Too difficult to use as they stand, not all items are covered, does not allow theming of backgrounds yet.

Skills:
 * wxWidgets and C++

Difficulty:
 * Moderate.

Early spinoffs from this work:
 * To be specified in the proposal.

Application Framework Extraction
Possible Mentors:
 * TBD (depends on details).

Description: Enhance Audacity code through splitting generic code from application-specific code. Make the generic code more generic and more useful to Audacity at the same time.

Why is this valuable?
 * Cleanly separating the generic code makes the code easier to work with. This benefits everyone working on Audacity.
 * Audacity is already an excellent way to learn how to use wxWidgets. Extracting general application code further increases the value to programmers of learning Audacity's code. That's because the framework can be reused with little change in other GPL programs that have nothing to do with audio.

The work on refactoring highlights opportunities for making minor features of Audacity more general, particularly on the GUI side. For example, we already have a simple GUI class for matching channels. It is audio-specific. This could be made into a general purpose widget for 'matching'. Improved graphics and flexibility would benefit our application at the same time as making it useful to other projects.

We'd expect a student to propose some specific classes to work on, to say how they can refactor them and how in doing that they can add general purpose functionality that is directly valuable to us in our application. What's presented here is only an outline of possible directions. A successful student will need to convince us with sufficient detail.

Skills:
 * wxWidgets and C++
 * Very strong awareness of software reuse.

Difficulty:
 * Moderate to Hard depending on completeness.

Early spinoff from this work:
 * It is envisaged that refactoring and making code more generic would be done in stages, each stage useful in itself, rather than progressing all planned changes in tandem. The easiest way to get a visible spinoff early is to focus on a particular GUI component. For example, a focus on the graphs could give us graphs that are not just tied to a single audio channel. This could be used to demonstrate overlaying a waveform graph over a spectral plot - easy with a more generic component, but not possible with our current arrangement.

Render-On-Demand (Adapted for GSoC 2008)
Possible Mentors:
 * James Crook.

Description: Audacity effects are currently 'batch' oriented. You apply an effect and wait for it to render. This is appropriate for Audacity which often runs on low specification machines. Only the simplest effects can reliably be applied fast enough to play them as they render.

Render-On-Demand would add an option to return immediately when applying an effect, giving greater responsiveness. The render would be applied later as the effect was played. There are several complications to achieving this. They include tracking the state of effects that have been partially rendered and dealing gracefully with running out of CPU resources. A project proposal would need to demonstrate a clear strategy for dealing with these issues.
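
The core of the idea can be sketched in Python as below. This is a simplified illustration, not a proposed design: it assumes a stateless effect that can process each block independently, which sidesteps the partial-state problem mentioned above, and the names `LazyEffect`, `read_block`, `progress` and `render_remaining` are hypothetical.

```python
class LazyEffect:
    """Applies an effect block by block, the first time each block is read."""

    def __init__(self, samples, effect, block_size=4):
        self.source = list(samples)
        self.effect = effect          # callable: list of samples -> list of samples
        self.block_size = block_size
        self.rendered = {}            # block index -> processed samples

    def block_count(self):
        return (len(self.source) + self.block_size - 1) // self.block_size

    def read_block(self, index):
        """Return the processed block, rendering it now if necessary."""
        if index not in self.rendered:
            start = index * self.block_size
            block = self.source[start:start + self.block_size]
            self.rendered[index] = self.effect(block)
        return self.rendered[index]

    def progress(self):
        """Fraction of blocks rendered so far, for a visual state display."""
        total = self.block_count()
        return len(self.rendered) / total if total else 1.0

    def render_remaining(self, budget):
        """Render up to `budget` unrendered blocks, e.g. when the CPU is idle."""
        done = 0
        for index in range(self.block_count()):
            if done >= budget:
                break
            if index not in self.rendered:
                self.read_block(index)
                done += 1
        return done
```

Applying the effect returns immediately because nothing is rendered at construction time. Playback calls `read_block` for the blocks it needs, and `render_remaining` lets the rest be filled in gradually with a bounded amount of work per call.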

Skills:
 * wxWidgets and C++

Difficulty:
 * Easy to Moderate.

Notes:
 * Adapted in 2008 as Quickload

Early spinoff from this work:
 * A visual display of the rendering state of an effect, alongside the affected track, rather than a progress indicator in a dialog. This would be used before any effects had been made 'render on demand'. You'd have restricted access to Audacity menus and buttons during a render. For example, during a render you'd have the ability to examine the waveform for clipping, but not to edit sound or queue additional effects until the render completes.

LV2 Support (Proposed/Taken in GSoC 2008)
Possible Mentors:
 * Vaughan Johnson

Description: Add support for the plug-in architecture LV2, the new, improved descendant of LADSPA. This could include adding support for new features in LV2 such as hierarchical plugin categorisation (which could be adapted to work with other plugin types too) and LV2 extensions such as port grouping and basic MIDI support.

Skills:
 * C++

Difficulty:
 * Hard.

Notes:
 * Plan needs to be specific about the LV2 effects which will work once complete.
 * Linux version of LV2 completed, GSoC 2008.

Early spinoff from this work (either of):
 * Ability to use LV2 synth effects in Audacity as simple tone generators (rather than hooking them up to a MIDI stream) - giving us several new tone generators.
 * Improved hierarchical browsing of built-in and Nyquist effects in Audacity, proving the strategy for LV2 hierarchical browsing.

Enhanced Vamp plug-in support
Possible Mentors:
 * James Crook

Description: Extend support for Vamp (http://www.vamp-plugins.org/) in Audacity.

Vamp is a C/C++ binary plug-in system for analysis of audio, first used in the Sonic Visualiser (http://www.sonicvisualiser.org/) audio analysis program. Audacity currently contains a basic interface to Vamp, allowing the user to configure and run plug-ins which have simple point-time outputs, displaying the results using the label track. There are many possibilities for extending this.

Some obvious ideas involve extensions to the Audacity GUI to better display or interact with results produced from Vamp plug-ins. But the presence of Vamp support in Audacity also opens the door to using analysis results in editing as well.

Some possibilities:
 * Enable the display of results that have values as well as just point times (i.e. output curves). This has a significant overlap with "Label Track Enhancements" above.
 * Some Vamp plug-ins calculate grid data suitable for display, such as variant types of spectrogram and "chromagram" plot. Enable the display of such results by modifying the existing spectrogram track so as to show output from Vamp plug-ins that have a suitable structure. Perhaps the track could have a single simple option to select the display type, choosing from a set of known plug-ins with known configurations.
 * Make a beat-slicer function to split audio at times that have been calculated using e.g. an onset detector plug-in.
 * Provide a general feature for applying an effect only to regions in which a certain analysis result is greater than a threshold value.
 * Integrate Vamp with Nyquist (Audacity's built-in Lisp interpreter), so that Nyquist programs can use Vamp plug-ins and Vamp plug-ins can be written using Nyquist.
 * Look for analysis code and applications thereof already existing in Audacity, and consider whether these could be made to function as plug-ins. This would allow them to benefit from improved algorithms in the future, and allow other programs to benefit from the existing code.
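
Two of the editing ideas above, the beat slicer and the threshold-gated effect, can be sketched in a few lines of Python. This is a hedged illustration only: it does not call the Vamp API, the onset times and analysis values are assumed to have been produced already by some plug-in, and the function names and parameters (`slice_at_onsets`, `apply_where_above`, `hop`) are hypothetical.

```python
def slice_at_onsets(samples, onset_times, sample_rate):
    """Split samples into pieces at the given onset times (in seconds)."""
    # Convert times to sample positions, dropping duplicates and
    # positions that fall outside the audio.
    cuts = sorted({int(round(t * sample_rate)) for t in onset_times})
    cuts = [c for c in cuts if 0 < c < len(samples)]
    bounds = [0] + cuts + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


def apply_where_above(samples, values, hop, threshold, effect):
    """Apply effect only to the hops whose analysis value exceeds threshold.

    values holds one analysis result per `hop` samples. The effect is a
    callable that must return a block of the same length as its input.
    """
    out = list(samples)
    for i, value in enumerate(values):
        if value > threshold:
            start = i * hop
            end = min(start + hop, len(out))
            out[start:end] = effect(out[start:end])
    return out
```

Both functions treat the analysis output as plain data, so the same editing code would work with any plug-in that yields point times or a value curve.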

Skills:
 * C++.

Notes:
 * For most projects it is a very good idea to discuss the project on the audacity-devel mailing list before applying. With this one it is essential.
 * We may have problems finding a mentor for this project in its current form. If you're applying for this one and you haven't discussed it on audacity-devel, you are taking a risk. You might want to apply with another idea too, so as to reduce that risk.

Difficulty:
 * Moderate.

Early spinoffs from this work:
 * Sibilant extender/contractor by combining sibilant classifier with a stretch-where-marked effect.

Source Separation GUI
Possible Mentors:
 * Chris Cannam (to be confirmed).

Description: GUI for any kind of effect which splits one track into two. This also requires an extension to Vamp. New methods will be required to mark up audio for separation. It will need to work with both mono and stereo tracks.

Skills:
 * C++.

Notes:
 * For most projects it is a very good idea to discuss the project on the audacity-devel mailing list before applying. With this one it is essential. Please also read the recent archives as there is some discussion of it already. See also Source Separation Notes.

Difficulty:
 * Hard. An ideal difficult-but-achievable project for an advanced student.

Early spinoffs from this work:
 * Part of proving you are the right person for this is to describe a good spin off that could be achieved by the mid-term evaluation.

Postfish Integration
Possible Mentors:
 * Monty, James Crook

Description: Postfish is a pthreads-based Linux audio application with uncompromising quality and some unique effects such as 'deverb', which removes unwanted reverb from a track. Currently it can't be used from within Audacity.

Integrating it into Audacity will be a challenge indeed, particularly on the Windows side where we anticipate problems with a simple translation to wxThreads. The student will need to be prepared to dive into such code.

We also have to be mindful that Audacity will usually be run on machines with far less processing power than the ideal for Postfish. Postfish does heavy number crunching. Its 'deverb' effect was designed for Dual Xeon 3GHz systems. For its integration in Audacity we want to fall back gracefully to non-real-time mode when necessary.

The Ardour project has a version of JACK (JACK Audio Connection Kit) that is able to run under Windows. This might be a good way to get Audacity and Postfish working together under Windows, as it opens up other possibilities for Audacity too. An alternative route would use the new plug-in architecture, so that a Postfish bridge is made as an Audacity plug-in.

This is part of the Connected Open Source initiative, aiming to build more bridges between Open Source projects.

Skills:
 * wxWidgets and C++
 * pthreads
 * Experience with both Linux and Windows would help.

Difficulty:
 * Hard. Only suitable for someone who has debugged thread issues in some other context.

Notes:
 * It doesn't look as if Monty is going to be available for this. The proposal could be cannibalised and used to make another one. A possibly straightforward project would lift the deverb and declip effects and give them a LADSPA wrapper, which would make the effects available on Windows too.

Early spinoffs from this work:
 * Demonstration of a simple real-time echo effect using same thread structure as planned for Postfish integration.

More ideas
More ideas at GSoC Ideas; also the Use Cases and Feature Requests pages (over 200 requests) may suggest ideas for a project proposal.