Proposal Transcription Editor

From Audacity Wiki
Proposal pages help us turn feature requests into actual plans. This page has proposals to improve Audacity for audio transcription and subtitling.
Proposal pages are used on an ongoing basis by the Audacity development team and are open to edits from visitors to the wiki. They are a good way to get community feedback on a proposal.


  • Note: Proposals for Google Summer of Code projects are significantly different in structure, are submitted via Google's web app and may or may not have a corresponding proposal page.

Proposed Features

Indented topics are subfeatures.

  • Multi-line labels
    • Long Labels - Label track should allow further character entry even when you can't scroll the audio track any further. This is the main "feature request" from transcribers at the moment.
  • Import/export or convert to/from other label text formats:
    • *.SRT movie subtitle format (an .xls tool is available to convert from .srt to .txt and vice versa)
    • Cue sheets
  • Easy real-time change of playback speed: We need to fix bug 133 and bug 235 before these proposals make a lot of sense.
  • Configurable Transcription Toolbar: The switched-off features can gradually be resurrected as and when they are re-implemented correctly. With a configurable toolbar, users can customise which buttons to show, including a mix of mainstream and experimental ones if they so wish. Related to 'Transcriber's Panel' below, which could be a dropdown from the toolbar.
  • Transcriber's Panel with buttons for the various label operations, and showing shortcut keys.
    • Shortcuts for Transcribers: Start a new label (when already in a label), join labels, split label, move back/forward a fixed amount of time (whilst playing, and moving label cursor too), start-recording/stop-recording with same key, start-label/stop-label with same key.
    • New clip key: Each time a key is pressed whilst recording, a new clip starts, aligned with the start of the next label in the label track (if a label track is selected). This may create new audio tracks, as clips on the same track can't overlap.
    • Clip Do-Over: Without leaving recording mode, push the current clip onto a muted 'rejected' track and rewind to 5s before its start. Play the 5s, then start recording a replacement for the clip. To be useful we do need better ways to delete mistakes without interrupting recording workflow (6 votes on Feature Requests).
    • Label Nudging: Moving labels whilst playing, when keys are pressed. For example, move label start by a fixed amount, possibly using key combinations; move label start forward in time to the next sound in the voice track.
    • Align clip to labels: Select a sequence of clips and have their starts aligned to the starts of labels in a label track.
    • Align labels to clips: Select a sequence of labels and have their starts and ends move to the starts and ends of corresponding clips, using a heuristic for what 'corresponds'.
    • Pack Clips: Spaces between clips are removed. Used with linked labels, this will move labels too.
    • Clip trimmer: Trim silence from start and end of each selected clip.
  • Open entire set of labels in external text editor
    • Find/replace text within any labels (using text editor for full flexibility and regular expressions etc)
    • Re-Import updated labels: Re-import from an external file that has been modified.
    • Html-Lite: Formatting conventions for the text version conform to a subset of HTML. For example, bold and italics using HTML tags. However, we can also have our own tags, e.g. for beeps.
    • Fully customisable font and color for any selected text in a label (over-rides default set in the label track dropdown menu)
    • Customisable prefixes or categories e.g. to annotate "type of student error" for language teaching or type of "glitch" in a recording. Integration with Label Editor to view list of different categories with occurrence data.
    • Customisable audible beeps which can be added to any label (different beeps for different categories)
  • i18n Labels: Different label tracks for different languages/translations, so that in one project we can have subtitles for multiple languages. The main thing is to tie this in with .srt i18n support, which also supports more than one set of subtitles.
    • Gale: 1.3.12 had problems with decimal character values 300 to 400 on Windows but r10457 fixes those issues (that was the problem complained about in the Forum topic that generated this Proposal).
  • Simple Movie Window to view movie whilst subtitling.
  • Text-to-Speech (TTS) to record labels (subtitles) using synthesised voices (SAPI4, SAPI5).
    • Record a sound clip for a label e.g. language teacher records the correct pronunciation of labelled student's error


Note: some overlap with Proposal Label Enhancements

Developer/QA Backing

  • James: Likes the simpler to implement features. Thinks 'Movie window' if implemented should be as a call out to VLC, not done by us. Thinks TTS is featuritis, work that is not that useful. Transcriber could use a separate app that does TTS that is then worked on in Audacity.
  • Gale: I would suggest this Proposal could be generalised a little more along the lines of "Transcription Editor" or similar rather than headlining it just for film subtitles. (done). Indeed I would think the proposal of limited value purely for subtitling unless we did actually implement a "simple movie window" or bridge to one (strong general support on Feature Requests - 24 votes as at mid-Jan 2011).


Motivation / Use Cases

Transcriber's Panel

Part of the problem is that people are not finding 'TAB' and 'SHIFT-TAB' label navigation, so a combined menu and cheat sheet 'Transcriber's Panel' might be in order. The cheat sheet could have a button with the key combination, e.g. CTRL-5, and a description. Click the button on screen or press the key combination and the action happens.

The various requests for new key combinations suggest attaching keys in more complex ways than a single key to a single fixed command. There are parameters such as 'nudge by 5s' and there are multiple actions, e.g. start/stop. These uses need something that goes beyond the current keyboard preferences.
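
One way to picture bindings that go beyond "one key, one fixed command" is a table in which each key carries a command name plus parameters, with start/stop handled by a single toggling command. The Python sketch below is purely illustrative and is not how Audacity's keyboard preferences work. The `Transport` class, the `BINDINGS` table, the `press` function and the key names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    """Toy playback state: a cursor position (seconds) and a recording flag."""
    position: float = 0.0
    recording: bool = False

    def nudge(self, seconds):
        # Move the cursor by a signed amount, never before time zero.
        self.position = max(0.0, self.position + seconds)

    def toggle_record(self):
        # One command covers both start-recording and stop-recording.
        self.recording = not self.recording


# Hypothetical bindings: key -> (command name, keyword parameters).
BINDINGS = {
    "CTRL-LEFT": ("nudge", {"seconds": -5.0}),
    "CTRL-RIGHT": ("nudge", {"seconds": 5.0}),
    "D": ("toggle_record", {}),
}


def press(transport, key, bindings=BINDINGS):
    """Look up the key and run its command with the bound parameters."""
    command, params = bindings[key]
    getattr(transport, command)(**params)


def cheat_sheet(bindings=BINDINGS):
    """Return one 'key: command(parameters)' line per binding."""
    return [
        f"{key}: {command}({', '.join(f'{k}={v}' for k, v in params.items())})"
        for key, (command, params) in bindings.items()
    ]
```

Because the parameters live in the table rather than in the command, the same table could drive both the keyboard handling and the on-screen cheat sheet buttons.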

The New clip key makes it easy to replace existing audio with new audio, subtitle by subtitle. It is also good for games developers building up a library of phrases with associated labels.

Label Nudging

Typical on-line transcription editors do things in two passes. One pass to write the text of the labels, then a second pass to set the time points.

  • In the first pass the transcriber has to start and stop the video a lot: it is very hard to type at the speed people speak, and it is often necessary to rewind to check exactly what someone said.
  • In the second pass it is nice not to have to start and stop, and single-key label nudging would help here.
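
The two nudging operations mentioned in the feature list (move a label start by a fixed amount, or forward to the next sound in the voice track) could be sketched as below. This is a rough Python illustration under stated assumptions, not Audacity code. Labels are modelled as `(start, end, text)` tuples, the audio is a plain list of samples in the range -1.0 to 1.0, and "sound" simply means the first sample whose absolute value reaches a threshold. The function names and the default threshold are hypothetical.

```python
def nudge_start(label, delta):
    """Move a label's start by delta seconds, kept within [0, end]."""
    start, end, text = label
    new_start = min(max(0.0, start + delta), end)
    return (new_start, end, text)


def snap_start_to_next_sound(label, samples, sample_rate, threshold=0.05):
    """Move a label's start forward to the next sample at or above threshold.

    Only the region between the label's current start and end is searched.
    If nothing in that region is loud enough, the label is returned unchanged.
    """
    start, end, text = label
    first = int(start * sample_rate)
    last = min(int(end * sample_rate), len(samples))
    for i in range(first, last):
        if abs(samples[i]) >= threshold:
            return (i / sample_rate, end, text)
    return label
```

A practical version would probably look at short windows of audio rather than single samples, so that an isolated click does not count as the start of speech.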

Language teachers

GA has had correspondence with a couple of long-established language teachers, who point to the need to improve visual/audible categorisation of labels and the need to more easily "attach" an audio clip to a label. Both these would help enormously in using Audacity as a tool to correct language students' work.

For example, when a student makes a mistake, the teacher opens a panel and selects the appropriate error noise and category for the mistake, then presses an inbuilt record button to record comments and the correct pronunciation. When the student receives their annotated recording and reaches the selected error, the beep and error category are audibly inserted before playing the student's rendition. Then the teacher's comments/correction are heard before the audio continues past the selection.


Some links

Past Discussion

Summary of user Mederi's post on the forum:

Recently I was looking for some simple audio tool on the Internet to adjust some movie audio track. I found Audacity, a real freeware masterpiece, that helped me very much (cutting, mixing, adjusting tempo). Then I played myself with recording my voice through microphone and got the idea to record my own spoken movie subtitles. Suddenly I found out that Audacity is already a good SUBTITLER.

If you show to users the new Audacity feature: SUBTITLES to VOICE and VOICE to SUBTITLES, they will use it. Especially users abroad (outside UK, USA) are watching foreign movies in original dubbing with a help of subtitles. Audio-video enthusiasts will be interested despite that Audacity is audio software only.

Some little adjustments in program code should not be a problem for programmers to improve this unfolded feature.

There is a label track that can be used for movie subtitles, too. Just to add import/export (conversion) of "subtitles.srt" besides "labels.txt" and to decide how to implement multiple lines of subtitles.

I prefer to put multiple lines (up to 3, or more?) of 1 subtitle screen into 1 label. Then I would add some switch button in menu at the beginning of label track to switch between 2 options: 1 line (multiple subtitle lines separated with some separator character like "|") and more lines (if subtitle screen contains 3 lines, then 3 lines in a label, too) of text in a label. One line using separator is good but multiple lines could be better.

There is something wrong with Fonts in Label Track. They are probably built-in and do not fully support Central European characters. If there were operating system fonts (Windows XP) used, there would not be the problem. I do not know how it works, whether I am right. Possible bug.

Then I would add:

  • Edit > Move Cursor > To Next Label (S) - this should search forward from the current cursor (time) position for the nearest label start time and put the cursor's vertical line there (just as clicking with the left mouse button under the label sets a new Selection Start Time at the place clicked).
  • Edit > Move Cursor > To Previous Label (A) - backward to the nearest label start time.
  • Transport > Append Record/Stop (D) - a single key both to start and to stop recording (like Space bar for Play/Stop). This could be an alternative to Append Record (Shift+R).

Right after stopping the recording of one label there could also be an option (probably an on/off button in the menu at the beginning of the audio track) for an automatic shift (jump) of the cursor (Selection Start) to under the next label, so the user needs only this one key for the whole process if he does not make any mistakes.

A recorded voice audio track can be played together with the movie or Auto Duck effect can be used and then it can be mixed with original sound audio track.

Vice versa, writing of SUBTITLES according to movie audio, translating movie, song, adding comments... for all this is Label track already good enough, not only audio track Descriptor and Karaoke.

So like I wrote above, SUBTITLES to VOICE and VOICE to SUBTITLES, Audacity could be with some effort a simple and complete SUBTITLER, too. Then I have had some ideas about video player window or Text-to-Speech (TTS) support to let a computer to speak (read subtitles) and record it. Later I could think of it and maybe come with more details, too if there was any interest.