From Audacity Wiki
Revision as of 22:16, 9 November 2010 by Windinthew (talk | contribs) (Remove confusing use of "we" in intro after "Audacity")
VoxForge is a Free and Open speech resource, licensed under the GPL, that will be of interest to Audacity users. VoxForge would like to work more closely with Audacity and find ways in which the two projects can help each other.

What VoxForge does

VoxForge collects speech from users from all around the world. We are working to create a Free speech corpus (a database of speech audio files and their text transcriptions) that can be used to create acoustic models for use in Free and Open Source speech recognition engines.

We have corpora in English (our largest corpus), German, Dutch, and Russian. We are also working to add Italian and Hebrew to the mix. We collect speech using a number of approaches: using a Java applet, by telephone, and, of course, using Audacity.

We take a user's submission, and depending on its original formatting, we might downsample it to our standard formats of:

Each submission contains the following directory structure:

  * [submitterID]-[date]-[3 random characters]
    * etc
       * audio file_details - contains the submission's original formatting 
                              information (sampling rate/bits per sample).
       * GPL_License - full GPL license.
       * HDMan_log - console output from HTK's HDMan tool. 
                   - gives the phone usage counts for the submission.
       * HVite_log - console output from HTK's HVite tool.
                   - runs a "re-alignment" of the training data (to find 
                     the best pronunciation for a given word), but its 
                     main purpose is a sanity check to make sure an audio 
                     recording matches its prompt.
       * PROMPTS - sanitized prompts file (with some punctuation removed); 
                   includes the path to the audio for acoustic model training.
       * prompts-original - prompts in their original format.
       * README - information about the user (gender, age range, language,
                  pronunciation dialect) and their recording environment.
    * wav file - the actual audio.
    * LICENSE - GPL license notice
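
As a rough illustration of the layout above, here is a minimal Python sketch that splits a submission directory name into its three parts and reports which of the listed etc files are missing. The function names are hypothetical and not part of any VoxForge tool. The file names are copied verbatim from the listing, and the date is treated as an opaque string because its exact format is not stated here.

```python
from pathlib import Path

# File names under etc/, copied verbatim from the listing above.
# Assumption: adjust these if the names in a real submission differ.
ETC_FILES = [
    "audio file_details",
    "GPL_License",
    "HDMan_log",
    "HVite_log",
    "PROMPTS",
    "prompts-original",
    "README",
]


def parse_submission_name(name):
    """Split '[submitterID]-[date]-[3 random characters]' into its parts.

    The split is done from the right so that a submitter ID containing
    hyphens stays intact. The date is not validated (format is assumed
    unknown).
    """
    parts = name.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError("expected three hyphen-separated parts: %r" % name)
    submitter, date, suffix = parts
    if not submitter or not date or len(suffix) != 3:
        raise ValueError("unexpected submission name: %r" % name)
    return {"submitter": submitter, "date": date, "suffix": suffix}


def missing_etc_files(submission_dir, expected=ETC_FILES):
    """Return the expected etc/ file names not present in submission_dir."""
    etc = Path(submission_dir) / "etc"
    return [name for name in expected if not (etc / name).is_file()]
```

For example, `parse_submission_name("some-user-20101109-abc")` would yield the submitter `some-user`, the date string `20101109`, and the suffix `abc`; the date shown is a made-up value used only for illustration.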

Free Speech... Recognition