From Audacity Wiki
Revision as of 16:46, 22 March 2008 by James (Header and footer.)

VoxForge is a resource that will be of interest to people working with Audacity!

We'd like to work more closely with them and find ways that our project can help theirs and their project can help ours.

What VoxForge does

The VoxForge project collects speech from users all around the world. We are working to create a Free speech corpus for building acoustic models, which can then be used by Free and Open Source speech recognition engines. All submissions are released under the GPL.

We have corpora in English (our largest corpus), German, Dutch, and Russian. We are also working to add Italian and Hebrew to the mix. We collect speech using a number of approaches: using a Java applet, by telephone, and, of course, using Audacity.

Depending on a submission's original format, we may downsample it to one of our standard formats:

 * 16kHz_16bit wav
 * 8kHz_16bit wav
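The downsampling step can be illustrated with a short Python sketch that uses only the standard library. It is not VoxForge's actual conversion pipeline: the helper names (`lowpass_kernel`, `decimate`, `downsample_wav`) are hypothetical, and the sketch assumes mono 16-bit input whose sample rate is an integer multiple of the target (for example 48 kHz to 16 kHz, or 16 kHz to 8 kHz). It low-pass filters before dropping samples so that content above the new Nyquist frequency does not alias into the result.

```python
# Sketch only: assumes mono 16-bit PCM and an integer rate ratio.
# This is an illustration, not VoxForge's own conversion tool.
import math
import struct
import wave


def lowpass_kernel(cutoff, taps=31):
    """Hamming-windowed sinc low-pass FIR; cutoff is a fraction of the input rate."""
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    return [c / total for c in kernel]  # normalize to unity gain at DC


def decimate(samples, factor, taps=31):
    """Low-pass filter, then keep every `factor`-th sample (integer ratios only)."""
    kernel = lowpass_kernel(0.45 / factor, taps)  # cutoff slightly below the new Nyquist
    half = taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, c in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(samples):  # samples outside the signal count as silence
                acc += c * samples[j]
        out.append(max(-32768, min(32767, round(acc))))  # clip to the 16-bit range
    return out


def downsample_wav(src_path, dst_path, target_rate):
    """Write a mono 16-bit copy of src_path resampled to target_rate Hz."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if rate % target_rate != 0:
        raise ValueError(f"{rate} Hz is not an integer multiple of {target_rate} Hz")
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    out = decimate(samples, rate // target_rate)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(struct.pack(f"<{len(out)}h", *out))
```

For example, `downsample_wav("in_48k.wav", "out_16k.wav", 16000)` would write a 16 kHz copy. A 44.1 kHz recording is not an integer multiple of 16 kHz or 8 kHz, so this sketch rejects it; that case needs a rational resampler such as the one in SoX.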

Each submission contains the following directory structure:

 * [submitterID]-[date]-[3 random characters]
   * etc
      * audiofile_details - contains the submission's original formatting information (sampling rate, bits per sample, and so on)
      * GPL_License - full text of the GPL
      * HDMan_log - console output from HTK's HDMan tool 
                  - gives the phone usage counts for the submission
      * HVite_log - console output from HTK's HVite tool
                  - runs a "re-alignment" of the training data (to find the best pronunciation for a given
                    word); its main purpose is a sanity check to make sure each audio recording matches
                    its prompt.
      * PROMPTS - sanitized prompts file (with some punctuation removed) that includes the path to each audio
                  file for acoustic model training.
      * prompts-original - prompts in their original format
      * README - information about the user (gender, age range, language, pronunciation dialect) and their 
                 recording environment
   * wav - the actual audio
   * LICENSE - GPL license notice
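The naming convention and layout above can be checked mechanically. The Python sketch below is hypothetical: `submission_name`, `NAME_PATTERN`, `EXPECTED_ENTRIES` and `missing_entries` are illustrative names, and it assumes the date is written as YYYYMMDD and the three random characters are lowercase letters, which the listing does not specify. It builds a directory name and reports which of the listed files are missing from a submission.

```python
# Hypothetical helpers based on the directory listing above. The YYYYMMDD date
# and lowercase three-letter suffix are assumptions, not VoxForge's stated rules.
import random
import re
import string
from datetime import date
from pathlib import Path

EXPECTED_ENTRIES = [
    "etc/audiofile_details",
    "etc/GPL_License",
    "etc/HDMan_log",
    "etc/HVite_log",
    "etc/PROMPTS",
    "etc/prompts-original",
    "etc/README",
    "wav",
    "LICENSE",
]

# [submitterID]-[date]-[3 random characters], under the assumptions above.
NAME_PATTERN = re.compile(r"^(?P<submitter>.+)-(?P<date>\d{8})-(?P<suffix>[a-z]{3})$")


def submission_name(submitter_id: str, day: date, rng=random) -> str:
    """Build a submission directory name from an ID, a date and 3 random letters."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
    return f"{submitter_id}-{day:%Y%m%d}-{suffix}"


def missing_entries(submission_dir) -> list:
    """Return the expected files/directories that are absent from submission_dir."""
    root = Path(submission_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

A script could call `missing_entries` on each incoming submission directory and flag any non-empty result before the HTK steps run.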

Free Speech... Recognition