Difference between revisions of "VoxForge"
From Audacity Wiki
Galeandrews (talk | contribs) (minor tidy)
Windinthew (talk | contribs) (Remove confusing use of "we" in intro after "Audacity")
Latest revision as of 22:16, 9 November 2010
VoxForge is a Free and Open speech resource licensed under the GPL that will be of interest to Audacity users. VoxForge would like to work more closely with Audacity, and find ways in which our projects can help each other.
What VoxForge does
VoxForge collects speech from users all around the world. We are working to create a Free speech corpus (a database of speech audio files and their text transcriptions) that can be used to create acoustic models for Free and Open Source speech recognition engines.
We have corpora in English (our largest corpus), German, Dutch, and Russian, and we are working to add Italian and Hebrew. We collect speech in several ways: through a Java applet, by telephone, and, of course, using Audacity.
We take a user's submission, and depending on its original formatting, we might downsample it to our standard formats of:
Each submission contains the following directory structure:
* [submitterID]-[date]-[3 random characters]
  * etc
    * audio file_details - contains the submission's original formatting information (sampling rate/bits per sample).
    * GPL_License - full GPL license.
    * HDMan_log - console output from HTK's HDMan tool; gives the phone usage counts for the submission.
    * HVite_log - console output from HTK's HVite tool; runs a "re-alignment" of the training data (to find the best pronunciation for a given word), but its main purpose is a sanity check to make sure an audio recording matches its prompt.
    * PROMPTS - sanitized prompts file (with some punctuation removed); includes the path to the audio for acoustic model training.
    * prompts-original - prompts in their original format.
    * README - information about the user (gender, age range, language, pronunciation dialect) and their recording environment.
  * wav file - the actual audio.
  * LICENSE - GPL license notice.
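As a rough illustration only (this is not part of VoxForge's own tooling), the top-level directory name pattern [submitterID]-[date]-[3 random characters] could be split into its parts with a few lines of Python. The function name `parse_submission_name` and the sample name in the usage note are hypothetical. The date is kept as an opaque string because its exact format is not stated here, and the sketch assumes the date and the random suffix contain no hyphens themselves.

```python
from typing import NamedTuple


class SubmissionName(NamedTuple):
    submitter_id: str
    date: str    # kept as an opaque string; the date format is an unknown here
    suffix: str  # the 3 random characters


def parse_submission_name(name: str) -> SubmissionName:
    """Split '[submitterID]-[date]-[3 random characters]' into its parts."""
    # Split from the right so a submitter ID that itself contains '-' stays intact.
    parts = name.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 hyphen-separated fields: {name!r}")
    submitter_id, date, suffix = parts
    if not submitter_id or not date or len(suffix) != 3:
        raise ValueError(f"unexpected submission name layout: {name!r}")
    return SubmissionName(submitter_id, date, suffix)
```

For a made-up name such as `some-user-20101109-abc`, this would yield the submitter ID `some-user`, the date string `20101109`, and the suffix `abc`; names that do not fit the pattern raise `ValueError` rather than being guessed at.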
Free Speech... Recognition