GSoC Ideas 2008

Revision as of 22:00, 23 September 2007 by James (talk | contribs) (Past tense for GSoC 2007.)

The Audacity Developer Team wasn't part of Google Summer of Code 2007, but we want to be in 2008. We watched what happened in 2007 closely.

Please help us to get ready for the next round. It is never too early. Your ideas and enthusiasm will help us make it happen next year. You can contact us at this email address: [email protected].

We're also keen to promote links between projects that can benefit Audacity. wxWidgets participated in GSoC 2007, as did other projects whose code we use. Indirectly, work on those projects may benefit Audacity. If you have ideas for a project of that kind that you would like to do, we'd be keen to hear from you and discuss it with you, at the address above.

This page is mainly for project ideas.

Below is a list of potential projects, but feel free to suggest your own ideas as well.

Note that there are literally hundreds of user-contributed feature suggestions on the Feature Requests page. A good project proposal could combine several of these suggestions into one. Framing a proposal this way makes it particularly easy to specify early spinoffs, which we regard as vital to a successful project.

#1. Audio 'Diff'

(Suggested by James Crook)

Ability to compare and align two sound sequences, just as one compares text using diff, would be a powerful new feature in Audacity. It would greatly facilitate combining sounds from multiple 'takes' of the same track. It would also be of use to people looking to identify particular known sounds, e.g. repeated themes in birdsong, in very long recordings.

The implementation idea is conceptually simple. The spectra in two sounds being compared are computed at regular spacings - using existing Audacity code. A metric for spectral similarity is written. In the first incarnation it can be a correlation function.

The alignment (diff) of the two sounds is computed using standard least-distance algorithms, driven by the spectral similarity score and an adjustable penalty for stretching one sound relative to the other.
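The pipeline above (spectral frames, a correlation-based similarity metric, then a least-distance alignment with a stretch penalty) can be sketched in a few lines. This is an illustrative Python sketch, not Audacity code; the function names and the 0.1 default penalty are invented for the example:

```python
import math

def spectral_similarity(a, b):
    """Normalised correlation between two magnitude spectra (lists of floats).

    Returns 1.0 for identically-shaped spectra, near 0.0 for unrelated ones.
    """
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0 if na == nb else 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def align(spectra_a, spectra_b, stretch_penalty=0.1):
    """Dynamic-programming alignment (a 'diff') of two spectrogram sequences.

    Matching frame i to frame j costs (1 - similarity); insertions and
    deletions (stretching one sound relative to the other) each pay
    stretch_penalty.  Returns the minimum total distance.
    """
    n, m = len(spectra_a), len(spectra_b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0 and j > 0:  # match two frames
                cost = 1.0 - spectral_similarity(spectra_a[i - 1], spectra_b[j - 1])
                best = min(best, d[i - 1][j - 1] + cost)
            if i > 0:            # skip a frame of A (stretch B)
                best = min(best, d[i - 1][j] + stretch_penalty)
            if j > 0:            # skip a frame of B (stretch A)
                best = min(best, d[i][j - 1] + stretch_penalty)
            d[i][j] = best
    return d[n][m]
```

Identical sequences align at distance near zero; one extra frame costs exactly one stretch penalty, which is the knob the proposal describes as adjustable.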

The GUI for presenting the alignment could use the existing code that allows a track to be split into smaller chunks that can be shifted around, augmented with a 2D similarity 'plot'. If there is time, an enhanced interface that caters more directly to the two use cases could be provided.

Early spinoffs from this work:

  • A method for scoring the similarity of two spectra built into Audacity.
  • A 2D graphical display that will show the similarity of two spectra across the different frequencies.

#2. Computed Automation Tracks

(suggested by ???)

In many ways Audacity is just a specialised multi-track chart recorder. This project is to add a new type of track, a track which shows multiple computed automation variables. Rather than being stored, these are computed on demand. The immediate application for these is to give more flexibility in segmenting speech. They can give feedback on where the existing algorithms are proposing to segment a track, allowing fine tuning of the parameters by adjusting the threshold.
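As a sketch of the compute-on-demand idea, assuming per-block RMS as the automation variable (the real segmentation algorithms differ, and all names here are invented for the example), values are computed only when a block is first requested and then cached, so re-drawing after a threshold change is cheap:

```python
class ComputedAutomationTrack:
    """Sketch of a compute-on-demand automation variable.

    Values (here, per-block RMS of the audio) are not stored with the
    project; each block is computed the first time it is requested and
    cached for subsequent redraws.
    """
    def __init__(self, samples, block_size=256):
        self.samples = samples
        self.block_size = block_size
        self._cache = {}

    def value_at(self, block):
        if block not in self._cache:
            start = block * self.block_size
            chunk = self.samples[start:start + self.block_size]
            if not chunk:
                return 0.0
            self._cache[block] = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        return self._cache[block]

    def segments_above(self, threshold):
        """Return (start_block, end_block) runs where the variable exceeds
        the threshold -- e.g. candidate speech segments."""
        n_blocks = (len(self.samples) + self.block_size - 1) // self.block_size
        runs, start = [], None
        for b in range(n_blocks):
            if self.value_at(b) > threshold:
                if start is None:
                    start = b
            elif start is not None:
                runs.append((start, b))
                start = None
        if start is not None:
            runs.append((start, n_blocks))
        return runs
```

Adjusting the threshold slider would simply call `segments_above` again; only the cheap comparison is redone, not the per-block computation.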

If there is time, the computed automation tracks could be used to control parameters in one or more other effects, not just for segmenting audio.

Another direction would be to improve the transcription processing in Audacity.

Early spinoff from this work:

  • Implementation of a thresholding automation track, using an existing threshold slider GUI element backported from Audacity-extra and adding the new compute-on-demand code.

#3. Bridges

(suggested by Jeremy Henty)

One way to grow the feature set of Audacity and at the same time to avoid re-inventing the wheel is to build compatibility 'bridges' between Audacity and other Open Source programs. Two examples of bridges already supported by Audacity are:

  • Bridge to Octave - Octave is an Open Source program for mathematical analysis and is useful in digital signal processing (DSP). Audacity has a bridge to Octave that allows Octave to apply effects to Audacity waveforms and to annotate Audacity waveforms with labels.
  • Bridge to Rivendell - Rivendell is an Open Source program for radio station management. The Rivendell bridge allows Audacity and Rivendell to exchange play-lists.

A proposal for a new bridge should go into some detail as to what features of the other program will be bridged. Generally the plan should avoid extensive work on the other program, since the point of the project is to extend Audacity.
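One simple bridging pattern is file-based exchange: export a selection, let the other program process it, re-import the result. Here is a minimal Python sketch of the export/import half, using the standard-library wave module; the actual Octave and Rivendell bridges may use different mechanisms, and the function names are invented for the example:

```python
import struct
import wave

def export_wav(path, samples, rate=44100):
    """Write mono 16-bit PCM.  `samples` are floats in [-1.0, 1.0]."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples)
        w.writeframes(frames)

def import_wav(path):
    """Read a mono 16-bit PCM file back into floats in [-1.0, 1.0]."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return [v / 32767.0 for (v,) in struct.iter_unpack("<h", raw)]
```

The external program sits between the two calls: write the selection, invoke the tool on the file, read the processed file back. A proposal would replace this crude round-trip with whatever interface the bridged program offers.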

Early spinoff from this work:

  • A restricted bridge which exposes a smaller part of the functionality.

#4. Feature Completion

(suggested by James Crook)

Identify a feature of Audacity which is in development CVS but is in some way incomplete. The project proposal should describe how the feature would be improved and brought to release-candidate readiness.

Some features we have in CVS that are not yet ready for our stable builds include:

  • Transcription ToolBar - The algorithms for finding word boundaries are rather buggy and slow.
  • Themes - Too difficult to use as they stand; not all items are covered, and backgrounds cannot yet be themed.

Early spinoffs from this work:

  • To be specified in the proposal.

#5. Render-On-Demand

(suggested by Tom Moore)

Audacity effects are currently 'batch' oriented: you apply an effect and wait for it to render. This is appropriate for Audacity, which often runs on low-spec machines. Only the simplest effects can reliably be applied fast enough to play them as they render.

Render-On-Demand would add an option to return immediately when applying an effect, giving greater responsiveness. The render would be applied later as the effect was played. There are several complications to achieving this. They include tracking the state of effects that have been partially rendered and dealing gracefully with running out of CPU resources. A project proposal would need to demonstrate a clear strategy for dealing with these issues.
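The core bookkeeping, tracking which blocks of an effect have been rendered and rendering them only when playback asks, can be sketched like this (illustrative Python; the class and method names are invented for the example):

```python
class OnDemandRender:
    """Sketch of render-on-demand: the effect is applied per block, the
    first time the playback engine asks for that block.  The `rendered`
    list doubles as the state a progress display alongside the track
    could show."""
    def __init__(self, source_blocks, effect):
        self.source = source_blocks            # list of sample blocks
        self.effect = effect                   # callable: block -> block
        self.output = [None] * len(source_blocks)
        self.rendered = [False] * len(source_blocks)

    def get_block(self, i):
        """Called by playback; renders lazily on first access."""
        if not self.rendered[i]:
            self.output[i] = self.effect(self.source[i])
            self.rendered[i] = True
        return self.output[i]

    def progress(self):
        """Fraction rendered, for the per-track progress display."""
        return sum(self.rendered) / len(self.rendered)
```

The hard parts the proposal must address, stacking several partially rendered effects and degrading gracefully when the CPU can't keep up with playback, sit on top of this bookkeeping rather than replacing it.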

Early spinoff from this work:

  • A visual display of the rendering state of an effect, alongside the affected track, rather than a progress indicator in a dialog. This would be used before any effects had been made 'render on demand'. You'd have restricted access to Audacity menus and buttons during a render. For example during a render you'd have the ability to examine the waveform for clipping, but not to edit sound or queue additional effects until the render completes.

#6. Label Track Enhancements

(suggested by Tom Moore)

Audacity has flexible Label tracks for annotating sections of the audio. The method for positioning and dragging labels allows the same kind of label to be used to mark points and ranges, and to maintain boundaries between regions by dragging two end points at the same time.

We're looking for proposals for the next stage of enhancements, ones that integrate labels more closely into the audio editing process. Possibilities to consider include:

  • More operations on labels, such as 'apply effect at labels'.
  • Visual enhancements, such as different icons and colours associated with different kinds of labels.
  • Handling of very large numbers of labels. This requires both optimisations and new visual options.
  • Snapping of labels, so that they position at specified time intervals.
  • New ways to automatically compute labels.
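Two of these, snapping and 'apply effect at labels', are simple to sketch. This is illustrative Python with invented names; label times are in seconds:

```python
def snap_label(t, interval):
    """Snap a label time to the nearest multiple of `interval` seconds."""
    return round(t / interval) * interval

def apply_effect_at_labels(samples, rate, labels, effect):
    """Apply `effect` (a callable on a list of samples) to each labelled
    (start_sec, end_sec) range, leaving the rest of the track untouched."""
    out = list(samples)
    for start, end in labels:
        a, b = int(start * rate), int(end * rate)
        out[a:b] = effect(out[a:b])
    return out
```

Snapping is a one-line quantisation; the interesting design work is in the GUI (when to snap, visual feedback) and in doing 'apply effect at labels' nondestructively for large label counts.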

A detailed proposal should make clear the use cases for the enhanced labels that are motivating the changes.

Early spinoff from this work:

  • Label tracks to stick to the track above, so that they edit together.

#7. Postfish Integration

(suggested by ???)

Postfish is a pthreads-based Linux audio application with uncompromising quality and some unique effects, such as 'deverb', which removes unwanted reverb from a track. Currently it can't be used from within Audacity.

Integrating it into Audacity will be a real challenge, particularly on the Windows side, where we anticipate problems with a simple translation to wxThreads. The student will need to be prepared to dive into such code.

We also have to be mindful that Audacity will usually be run on machines with far less processing power than the ideal for Postfish. Postfish does heavy number crunching; its 'deverb' effect was designed for dual Xeon 3GHz systems. For its integration in Audacity we want to fall back gracefully to non-real-time mode when necessary.

Early spinoffs from this work:

  • Demonstration of a simple real-time echo effect using the same thread structure as planned for the Postfish integration.

#8. Intuitive cross-fading

(suggested by Matt Brubeck)

One of the most common operations people want to do when mixing audio is to transition smoothly between two sound clips. This is commonly called a cross-fade. This operation is technically possible in Audacity now, but it is very clunky, requiring multiple steps, and the result cannot be edited short of undoing the actions and starting again. We are looking for someone to implement a clean, intuitive, nondestructive cross-fade for Audacity. Audacity already has all of the infrastructure necessary to implement this operation nondestructively, and we already have a clear plan for how it should work. The following webpage has a mockup of what we think the GUI might look like:

This feature, while seemingly small, would represent a huge boost in usability for Audacity. This feature is intimately related to several other UI enhancements that we have proposed: for example, one element of this proposed GUI is that clips "stick" to each other or "snap" into place when you push them together. Such a snap-to behavior would be great in several other circumstances, for example having a track stick to t=0, or to a point that lines up with another track.
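The fade arithmetic itself is straightforward; the project is about the nondestructive editing and GUI around it. A minimal equal-power sketch in Python (not the planned Audacity implementation; names and the overlap handling are simplified for illustration):

```python
import math

def crossfade(clip_a, clip_b, overlap):
    """Mix the last `overlap` samples of clip_a with the first `overlap`
    samples of clip_b using equal-power gain curves, and return the
    combined clip."""
    if overlap == 0:
        return clip_a + clip_b
    assert overlap <= len(clip_a) and overlap <= len(clip_b)
    mixed = []
    for i in range(overlap):
        t = (i + 0.5) / overlap                  # 0..1 across the overlap
        gain_out = math.cos(t * math.pi / 2)     # fades clip_a out
        gain_in = math.sin(t * math.pi / 2)      # fades clip_b in
        mixed.append(clip_a[len(clip_a) - overlap + i] * gain_out
                     + clip_b[i] * gain_in)
    return clip_a[:-overlap] + mixed + clip_b[overlap:]
```

Equal-power curves keep the perceived loudness roughly constant through the overlap; a nondestructive implementation would recompute the mixed region on playback instead of baking it into the track.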

Early spinoffs from this work:

  • Ability for all effects to be faded in/out automatically. This can avoid clicks in some circumstances.

#9. Play-Back Enhancements

(First two parts suggested by James Crook)

Audacity lags behind commercial audio software in a number of details of its playback behaviour. Specific enhancements we would like a summer student to provide are:

  • No 'click' on start/stop/loop; At the moment there usually is an audible click when Audacity starts or stops playing a sound, or in iterations of playing a loop. A very short fade-in and fade-out applied only to playback should fix this.
  • Loop play adjusts dynamically to boundaries being moved; Finding the precise boundaries of a sound, for example an unwanted sound to be fixed, can be difficult with Audacity as it currently is. The location of the sound isn't obvious from the waveform. The new option would allow playing the sound in a loop, adjusting the boundaries to find out exactly where it starts and stops.
  • Vari-speed playback; Fast playback of sound allows sections of audio to be located more rapidly. Slow playback allows precise location (on timeline) of sound to be determined more accurately. There is a crude version of this on the 'Transcription Toolbar' - refining it would include allowing the speed to vary during playback without starting and stopping.
  • Drag-playback-cursor whilst playing; This requires changes to both playback and GUI. It is an extension of vari-speed playback and would make locating sound more rapid.
  • Play all 'labels' on the selected label tracks; Labels can be placed on the Audio both manually and automatically. This has many uses, one being the possibility of previewing a recording whilst skipping over periods of silence.
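The first item amounts to a very short gain ramp applied only to the playback buffer. A sketch, assuming a linear ramp and a 10 ms default (the real fade shape and length would be tuned by listening; the function name is invented):

```python
def fade_edges(block, rate, fade_ms=10):
    """Apply a very short linear fade-in and fade-out to a playback
    buffer so start/stop/loop points don't click.  Only the played
    audio is changed, never the track data."""
    n = min(len(block) // 2, int(rate * fade_ms / 1000))
    out = list(block)
    for i in range(n):
        g = (i + 1) / n
        out[i] *= g                     # fade in at the start
        out[len(out) - 1 - i] *= g      # fade out at the end
    return out
```

For loop play, the same ramp applied at each iteration boundary removes the click on wrap-around as well.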

Early spinoffs from this work:

  • The schedule should plan to have some features complete to the 'release candidate' phase by the half way point. This is more useful than progressing all of the features in tandem, and perhaps completing none of them if time runs out.

#10. FFmpeg integration

(Suggested by Richard Ash)

A patch has been produced in the past to use the FFmpeg libraries for importing and exporting audio files in a wide variety of formats. However, it was against 1.2.x and will require considerable changes to integrate with current Audacity code. There will also be issues around the build system and ensuring license requirements are met when distributing the resulting program.

  • Import needs to decode the imported files into Audacity, handling varying channel counts sensibly.
  • Metadata in imported files should be fed into the Audacity metadata handling so it can be stored, edited and used for exported files.
  • A decision is needed on what to do if video files (with and without sound tracks) are presented.
  • Export will need to provide a user interface for choosing codec and container formats.
  • Export should cater for multi-channel output (more than 2 output channels) where applicable, using the export routing code already in place.
  • Export needs to write relevant metadata to the exported file from the available Audacity metadata and user interface.
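The project targets the FFmpeg libraries (libavformat/libavcodec), not the command-line tool, but the decode pipeline can be illustrated with the CLI: ask ffmpeg for raw 16-bit PCM on stdout at a chosen rate and channel count. The flags below are standard ffmpeg options, though they should be verified against the ffmpeg version in use:

```python
import subprocess

def ffmpeg_decode_cmd(path, rate=44100, channels=2):
    """Build a command line asking ffmpeg to decode any supported file
    to raw signed 16-bit little-endian PCM on stdout."""
    return ["ffmpeg", "-i", path,
            "-f", "s16le",             # raw PCM container
            "-acodec", "pcm_s16le",    # 16-bit little-endian samples
            "-ac", str(channels),      # channel count
            "-ar", str(rate),          # sample rate
            "-"]                       # write to stdout

def decode(path, rate=44100, channels=2):
    """Run ffmpeg and return the raw PCM bytes (requires ffmpeg installed)."""
    return subprocess.run(ffmpeg_decode_cmd(path, rate, channels),
                          capture_output=True, check=True).stdout
```

The library-based importer would do the same conversion in-process via libavcodec, with no external binary and proper error reporting, which is what the GSoC project is actually about.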

Early spinoffs from this work:

  • The importer is probably easier and is independent of export. Adding an importer should not be terribly hard, at least for mono and stereo files, which cover most use cases. No interface modifications are needed, and enabling/disabling it at build time is clean and follows other importers.
  • Metadata support can be implemented after the other parts are complete and working.