GSoC 2008 - FFmpeg integration
- Gale 21Sep14: -1. The Functionality section is linked to in at least three places in the Manual. See the P2 I set in that section below to update the information. Perhaps it would be better for users who come here if that "functionality" information was in its own Wiki page. There might even be a case for having that information in the Manual if it was up-to-date, but until we decide if we are going to use FFmpeg or GStreamer there seems no point moving it to the Manual.
If you want to delete all the old GSoC pages except for the FFmpeg functionality I think you will need to ask permission on the -quality list. Fixing the Feature Requests page is a higher priority IMO.
The FFmpeg integration project incorporated the open source FFmpeg library into Audacity as an optional library. This greatly expanded the range of proprietary formats Audacity can import and export, as listed below. This page reviews and discusses progress of the FFmpeg integration project during GSoC 2008. Further discussion can also be found in the audacity-devel mailing list.
Essence
FFmpeg is a set of libraries for audio/video encoding, decoding, muxing, demuxing, processing and capturing. In Audacity, FFmpeg will be used to import and export audio data. FFmpeg has other uses as well, but they are outside the scope of this project.
User Documentation
User documentation was produced by the student and has been incorporated into appropriate pages in the Audacity Manual. Note that details of the documentation or of the project implementation itself may change over time.
Design
Linking
As a design decision, the FFmpeg libraries are linked dynamically and loaded at run time. This makes FFmpeg optional (Audacity runs without it) and lets it be distributed separately, which avoids licensing issues.
It is not yet clear which build of FFmpeg will be used: static or shared. A static build results in one big library that contains all the functions of the FFmpeg package. A shared build results in several libraries (which may depend on other non-FFmpeg libraries), each exporting different functions. A valid system-wide FFmpeg package is usually shared and resides somewhere in PATH, so any application can load functions from any of these libraries. This is also necessary because the libraries depend on each other. A static build is usually stored somewhere within the application's directory as a kind of plug-in. My guess is that Linux users would prefer shared FFmpeg, since they have a superior package management system, while on Windows it is easier to drop a static FFmpeg build into Audacity and forget about packages.
From Audacity's point of view there is only one difference. With shared FFmpeg, Audacity has to load each function from its respective library, and each library has to be mapped into Audacity's process memory by a separate wxDynamicLibrary object. With static FFmpeg, Audacity loads a single library and imports everything from it. Because all shared libraries should be in PATH, there is no point asking the user for the libraries' location when a shared build is used, while with a static build it may be necessary.
Later in the project the FFmpegLibs class was modified to support loading both shared and static libraries, and the library locating dialog was redesigned and moved into Preferences. Audacity now also modifies the application-wide PATH environment variable to include the directory where the located library resides. As a result, both static and shared libraries can be loaded regardless of their location, as long as all relevant shared libraries are either in one directory or in PATH (previously, shared libraries had to be either in PATH or in the Audacity directory). This means a static build no longer offers much advantage and should not be used unless there is a specific reason to do so.
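The loading strategy described above (optional libraries, one handle per shared library, PATH extended with the located directory) can be sketched roughly as follows. This is an illustration in Python using ctypes, not the FFmpegLibs code itself. The function names and the library list are assumptions made for the example, and Python's loader rules differ from wxDynamicLibrary (for instance, recent Python versions on Windows do not consult PATH when resolving DLLs), so only the logic carries over.

```python
import ctypes
import ctypes.util
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH (once) and return the new value."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


def load_optional(name):
    """Try to load one shared library; return None if it is not available."""
    found = ctypes.util.find_library(name)
    try:
        return ctypes.CDLL(found or name)
    except OSError:
        return None


def load_ffmpeg(library_dir=None, names=("avutil", "avcodec", "avformat")):
    """Load every library of a shared build separately (hypothetical helper).

    Returns a dict of handles, or None so the caller can carry on without
    FFmpeg when any library is missing.
    """
    if library_dir:
        prepend_to_path(library_dir)
    handles = {name: load_optional(name) for name in names}
    if any(handle is None for handle in handles.values()):
        return None
    return handles
```

A caller would treat a `None` result as "FFmpeg not installed" and simply leave the FFmpeg import and export choices disabled.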
Functionality overlap
FFmpeg can import and export FLAC, OGG Vorbis, MP3, MP2 and various uncompressed wave files. However, Audacity already has all the capabilities needed to work with these formats (except MP3, where exporting requires a separate plug-in). Using the built-in Audacity import/export modules is better because they support various sample formats, while FFmpeg supports only 16-bit integer samples. FFmpeg also cannot (at the time of writing) import metadata (tags) from raw FLAC files.
In some cases, however, the user may wish to use FFmpeg to import files. This could be achieved in two ways:
- Add a new preference: plug-in loading order. With this preference the user could adjust the order in which plug-ins are registered in Audacity. If the FFmpeg import plug-in were registered before all others, it would handle the import of any file without letting other plug-ins even try. At the moment FFmpeg is always registered last.
- Make Importer aware of the file filter the user chooses in FileDialog. When the user chooses the "FFmpeg-compatible file" filter, assume that the user wants to import files via FFmpeg; by choosing "MP3 files" the user suggests that Audacity use libmp3lame. Implementing this requires a few changes, some of which may involve cumbersome passing of an additional argument through several functions from ShowOpenDialog to Importer.
I implemented the second way: FileDialog stores the selected filter index in preferences, and Importer retrieves this index and uses it to guess the first import plug-in to try.
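The filter-index approach can be illustrated with a small sketch. It assumes each import plug-in is associated with one filter description; the plug-in names, the dictionary layout and the function name are hypothetical and are not taken from the Audacity source.

```python
def order_import_plugins(plugins, filter_descriptions, selected_filter_index):
    """Return the plug-ins with the one matching the chosen filter moved first.

    `plugins` keeps its registration order otherwise, so the remaining
    plug-ins are still tried as a fallback.
    """
    if not 0 <= selected_filter_index < len(filter_descriptions):
        return list(plugins)
    chosen = filter_descriptions[selected_filter_index]
    preferred = [p for p in plugins if p["filter"] == chosen]
    others = [p for p in plugins if p["filter"] != chosen]
    return preferred + others


# Hypothetical registration order: FFmpeg is registered last.
PLUGINS = [
    {"name": "mp3-importer", "filter": "MP3 files"},
    {"name": "flac-importer", "filter": "FLAC files"},
    {"name": "ffmpeg-importer", "filter": "FFmpeg-compatible files"},
]
FILTERS = ["All files", "MP3 files", "FLAC files", "FFmpeg-compatible files"]
```

With the index of "FFmpeg-compatible files" stored in preferences, the FFmpeg importer is tried first; with "All files" the registration order is left untouched.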
Overlap in export functionality is discussed later.
Export procedure
Currently Audacity offers only a limited choice of export file types:
- WAV, AIFF and other uncompressed files
- OGG Vorbis
- MP3
- MP2
- FLAC
- Command-line exporter
The first one hides a variety of export types (header formats) and codecs (sample encoding formats) in its "Options..." dialog. This is possible because there is little or no difference between all these formats. For FFmpeg such a solution is not acceptable, since FFmpeg can export audio in completely different formats and grouping them all under one choice is unintuitive. Grouping all the options for all these formats in one dialog would not be easy to implement or to use either.
To overcome this, the ExportPlugin class was slightly redesigned so that one plug-in can present itself as more than one export type while using the same code to perform the actual export procedure. This new feature is used to present FFmpeg as a few common export types, each with its own simplified options dialog, while still providing access to the complete functionality via the Custom FFmpeg Export dialog.
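The idea of one plug-in presenting itself as several export types can be sketched as below. This is a simplified model, not the real ExportPlugin interface: the method names, the format entries and their options are assumptions chosen for illustration.

```python
class MultiFormatExportPlugin:
    """One plug-in object that exposes several export types (sketch)."""

    def __init__(self):
        self._formats = []

    def add_format(self, description, extension, options):
        """Register one export type and return its index."""
        self._formats.append(
            {"description": description, "extension": extension, "options": options}
        )
        return len(self._formats) - 1

    def format_count(self):
        return len(self._formats)

    def description(self, index):
        return self._formats[index]["description"]

    def export(self, index, output_name):
        """Shared export path: only the per-format settings differ."""
        fmt = self._formats[index]
        return "{}.{} using {}".format(output_name, fmt["extension"], fmt["options"])


plugin = MultiFormatExportPlugin()
# Hypothetical entries; each would get its own simplified options dialog.
AC3 = plugin.add_format("AC3 Files (FFmpeg)", "ac3", {"bitrate": 160000})
M4A = plugin.add_format("M4A (AAC) Files (FFmpeg)", "m4a", {"quality": 100})
```

The export dialog would then list `format_count()` entries for this single plug-in, and every entry would go through the same `export` code with a different index.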
Functionality
The following table is a work-in-progress listing of FFmpeg format functionality as at FFmpeg 2.2.2 (recommended for Audacity 2.0.6 or later). Visit http://manual.audacityteam.org/man/faq_installation_and_plug_ins.html#ffdown for the recommended FFmpeg installer.
Some formats can only be exported (or offer additional export options) by using Audacity's command-line exporter. Choose this exporter by selecting (external program) in the Export File dialog then point Audacity to an FFmpeg executable binary. Examples for some specific formats are provided below for convenience. Audacity has no ability yet to import files by pointing to arbitrary versions of FFmpeg or other decoding libraries.
AMR Wide Band
- Imports and Exports are supported. Exports must be mono, 16000 Hz.
- To export, use Custom FFmpeg Export, choose "3gp" format and "libvo_amrwbenc" encoder.
- Alternatively, export using (external program). Click the button and enter the following command:
ffmpeg -i - -acodec amr_wb "%f"
Apple Lossless (ALAC)
- Import and export of 16-bit Apple Lossless files using the ALAC codec is supported.
- Exporting to ALAC only works if you export using (external program). Click the button and enter the following command:
ffmpeg -i - -acodec alac "%f"
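In the (external program) commands above, "%f" stands for the export file name and "-i -" tells FFmpeg to read the audio from standard input. How Audacity performs the substitution internally is not shown on this page, so the following is only a sketch of the idea, with a hypothetical helper name.

```python
import shlex


def build_export_command(template, output_path):
    """Split a command template and replace %f with the output file name.

    Splitting before substituting keeps a file name that contains spaces
    as a single argument.
    """
    args = shlex.split(template)
    return [arg.replace("%f", output_path) for arg in args]


command = build_export_command('ffmpeg -i - -acodec alac "%f"', "my song.m4a")
# The audio data would then be piped to the program's standard input, e.g.
# subprocess.run(command, input=wav_bytes, check=True)
```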
Importing and Exporting using the Audacity-recommended version of FFmpeg
All functionality depends on the build of FFmpeg in use and the codecs enabled in that build of FFmpeg. For file formats and codecs supported by latest FFmpeg builds, see http://ffmpeg.org/general.html#Audio-Codecs.
Long name | Short name | Default extension | Read | Write | Read metadata | Write metadata |
---|---|---|---|---|---|---|
4X Technologies format | 4xm | ??? | yes | no | no | no |
ADTS AAC | adts | aac | yes | yes 5 | no | no |
Audio IFF | aiff | aiff | yes | yes | yes | yes |
3GPP AMR NB mono (8000 Hz) | amr | amr | yes | yes | no | no |
AMR WB mono (16000 Hz) | awb | awb | yes | yes | no | no |
CRYO APC format | apc | ??? | yes | no | no | no |
Monkey's Audio | ape | ape | yes | no | yes 4 | no |
ASF format | asf | wma | yes 2 | yes 3 | yes 4 | yes |
SUN AU format | au | au | yes | yes | no | no |
AVI format | avi | avi | yes | yes | no | no |
AVISynth | avs | avs | yes | yes | no | no |
AVS format | avs | avs | yes | no | no | no |
Bethesda Softworks VID format | bethsoftvid | ??? | yes | no | no | no |
Brute Force & Ignorance | bfi | ??? | yes | no | no | no |
Interplay C93 | c93 | ??? | yes | no | no | no |
CRC testing format | crc | crc | no | yes | no | no |
D-Cinema audio format | daud | 302 | yes | yes | no | no |
Delphine Software International CIN format | dsicin | ??? | yes | no | no | no |
DV video format | dv | dv | yes | yes | no | no |
Electronic Arts cdata | ea_cdata | ??? | yes | no | no | no |
Electronic Arts multimedia format | ea | ??? | yes | no | no | no |
FFM (FFserver live feed) format | ffm | ffm | yes | yes | no | no |
FLV format | flv | flv | yes | yes | no | no |
framecrc testing format | framecrc | | no | yes | no | no |
GXF format | gxf | gxf | yes | yes | no | no |
id CIN format | idcin | ??? | yes | no | no | no |
id RoQ format | RoQ | roq | yes | yes | no | no |
IFF format | iff | ??? | yes | no | no | no |
Interplay MVE format | ipmovie | ??? | yes | no | no | no |
lmlm4 raw format | lmlm4 | ??? | yes | no | no | no |
Matroska file format | matroska | mka | yes | yes | yes | yes |
American Laser Games MM format | mm | ??? | yes | no | no | no |
mmf format | mmf | mmf | yes | yes | no | no |
QCP format containing QCELP | qcp | qcp | no | no | no | no |
QuickTime MOV format | mov | mov | yes | yes 5 | yes 1 | yes 1 |
MP4 format | mp4 | mp4 | yes | yes 5 | yes 1 | yes 1 |
3GP format | 3gp | 3gp | yes | yes | yes | yes |
PSP MP4 format | psp | mp4 | yes | yes 5 | yes 1 | yes 1 |
3GP2 format | 3g2 | 3g2 | yes | yes | yes | yes |
iPod H.264 MP4 format | ipod | m4a | yes | yes 5 | yes 1 | yes 1 |
MPEG audio layer 3 | mp3 | mp3 | yes | yes | yes | yes |
MPEG audio layer 2 | mp2 | mp2 | yes | yes | yes | yes |
Musepack | mpc | ??? | yes | no | no | no |
Musepack SV8 | mpc8 | ??? | yes | no | no | no |
MPEG-PS format | mpeg | mpeg | yes | yes | no | no |
MPEG-2 PS format | vob | vob | yes | yes | no | no |
MPEG-2 transport stream format | mpegts | ts | yes | yes | no | no |
MTV format | MTV | ??? | yes | no | no | no |
Motion Pixels MVI format | mvi | mvi | yes | no | no | no |
Material eXchange Format | mxf | mxf | yes | no | no | no |
NullSoft Video Format | nsv | nsv | yes | no | yes | no |
NUT format | nut | nut | yes | yes | yes | yes |
NuppelVideo format | nuv | ??? | yes | no | no | no |
Ogg | ogg | ogg | yes | yes | yes | yes |
Sony OpenMG audio | oma | oma | yes | no | no | no |
Sony Playstation STR format | psxstr | ??? | yes | no | no | no |
Speex | Speex | spx | yes | no | yes | no |
TechnoTrend PVA file and stream format | pva | ??? | yes | no | no | no |
Raw AC3 | ac3 | ac3 | yes | yes | no | no |
Raw DTS | dts | dts | yes | yes | no | no |
Raw FLAC | flac | flac | yes | yes | no | no |
Raw GSM | gsm | gsm | yes | no | no | no |
Raw MLP | mlp | mlp | yes | yes | no | no |
Raw Shorten | shn | ??? | yes | no | no | no |
Raw PCM | pcm_* | | yes | yes | no | no |
r2l format | r2l | ??? | yes | no | no | no |
RM format | rm | rm | yes | yes | yes | yes |
RPL/ARMovie format | rpl | ??? | yes | no | yes | no |
Sega FILM/CPK format | film_cpk | ??? | yes | no | no | no |
Sierra VMD format | vmd | ??? | yes | no | no | no |
Beam Software SIFF | siff | son | yes | no | no | no |
Smacker video | smk | smk | yes | no | no | no |
Sierra SOL format | sol | ??? | yes | no | no | no |
Flash format | swf | swf | yes | yes | no | no |
Flash 9 (AVM2) format | avm2 | | yes | yes | no | no |
THP | thp | ??? | yes | no | no | no |
Tiertex Limited SEQ format | tiertexseq | ??? | yes | no | no | no |
True Audio | tta | tta | yes | no | no | no |
Creative Voice file format | voc | voc | yes | yes | no | no |
WAV format | wav | wav | yes | yes | no | no |
Wing Commander III movie format | wc3movie | ??? | yes | no | yes | no |
Westwood Studios audio format | wsaud | ??? | yes | no | no | no |
Westwood Studios VQA format | wsvqa | ??? | yes | no | no | no |
WavPack | wv | ??? | yes | no | no | no |
Maxis XA File Format | xa | ??? | yes | no | no | no |
1 | Artist and Year not read or written (FFmpeg bug) |
2 | 24-bit WMA Lossless cannot be imported using FFmpeg 2.2.2. |
3 | A maximum of 2 channels can be written using the "WMA (version 2) Files (FFmpeg)" export choice or FFmpeg at the command-line using (external program). To encode other WMA formats, export using (external program) and point to a command-line WMA encoder. lvqcl's command-line WMA encoder can export as WMA V9, WMA Lossless and WMA 10 Professional (but limited to a maximum of 6 channels, even though WMA 10 Professional supports 8 channels). |
4 | Artist Name not supported (Audacity bug) |
5 | Output of more than 2 channels is not supported with the recommended FFmpeg 2.2.2 library, unless you export using (external program) with a command that explicitly selects the native FFmpeg encoder. This produces a maximum of 6 channels. For up to 8 channels, point the same command to the latest FFmpeg-git. |
Issues
GetOpenFileName filter length bug
On Windows, a single file filter in OpenDialog cannot exceed 260 bytes in length.
Eventually Leland found a workaround: hacking directly into the ListView control of the dialog and filtering its contents via a custom filtering routine.
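The problem and the general shape of the workaround can be illustrated as follows. This sketch does not touch the Windows dialog or its ListView control. It only shows why a long extension list overflows a single filter string and how a custom routine can filter file names itself. The wildcard syntax, the byte encoding and the function names are assumptions.

```python
MAX_FILTER_BYTES = 260  # limit quoted in the text above


def filter_too_long(extensions):
    """Check whether a '*.a;*.b;...' filter string would exceed the limit."""
    pattern = ";".join("*." + ext for ext in extensions)
    return len(pattern.encode("utf-8")) > MAX_FILTER_BYTES


def custom_filter(filenames, extensions):
    """Keep only names whose extension is in the set (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return [
        name
        for name in filenames
        if "." in name and name.rsplit(".", 1)[-1].lower() in wanted
    ]
```

Because FFmpeg supports a large number of extensions, the combined "FFmpeg-compatible files" filter is the one most likely to hit the limit, which is where a routine like `custom_filter` takes over from the dialog's own wildcard matching.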
Channel order
FFmpeg presents channels in the order in which they are stored on the media. Audacity has to check the type of file being imported and reorder channels accordingly. The same goes for export. This is made worse by the fact that Audacity lacks proper multi-channel support.
I decided to put this issue on hold. Without full support for multi-channel audio there is no sense in reordering channels on import: the order doesn't matter to Audacity, since all channels are mixed to stereo or mono at playback.
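Although the issue is on hold, the reordering itself would be a simple permutation once the source and target layouts are known. The sketch below assumes channels are held as separate sample lists and that layouts are given as tuples of labels; the labels used here are illustrative and are not claimed to match any particular format.

```python
def reorder_channels(channels, source_layout, target_layout):
    """Return the channels rearranged from source_layout to target_layout."""
    if sorted(source_layout) != sorted(target_layout):
        raise ValueError("layouts must contain the same channel labels")
    index_of = {label: i for i, label in enumerate(source_layout)}
    return [channels[index_of[label]] for label in target_layout]


# Hypothetical three-channel example: the file stores centre first.
stored = [[0.5, 0.5], [0.1, 0.2], [0.3, 0.4]]  # C, L, R
reordered = reorder_channels(stored, ("C", "L", "R"), ("L", "R", "C"))
```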
Dynamic changes
For some files or other sources, audio parameters may change over time. For example, the number of channels or the sample rate may vary. At the moment the FFmpeg importer can't handle that.
While I believe this issue does exist, the only file with such odd audio that I had turned out to be broken (a new version of FFmpeg refuses to transcode it, claiming it is malformed). Until I find a sample with dynamically changing parameters, this issue is on hold.
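Detecting such a change is straightforward even if handling it is not. As a sketch, assuming each decoded frame reports a (sample rate, channel count) pair, an importer could at least notice and log where the parameters change; the function name and the data shape are hypothetical.

```python
def find_parameter_changes(frames):
    """Return (frame index, old params, new params) for every change.

    `frames` is an iterable of (sample_rate, channels) tuples, one per frame.
    """
    changes = []
    previous = None
    for index, params in enumerate(frames):
        if previous is not None and params != previous:
            changes.append((index, previous, params))
        previous = params
    return changes
```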
Special cases
FFmpeg offers a lot of information, most of which is used to handle special cases (such as an additional time offset) and is completely ignored at the moment. I added a few log messages for special cases. While the FFmpeg importer still won't handle these cases, at least one can see in the log that something is wrong.
Error handling
Importers and exporters in Audacity are not known for their meaningful error messages, because no error messages come from them. Error reporting could be improved in all plug-ins, and error handling should be improved in the FFmpeg plug-in in particular.
The new logger partially solved this problem. The new Progress Dialog helps too: it recognizes three outcomes of the import procedure (Success, Canceled, Error) instead of two (Success, Not Success).
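The three-outcome model can be sketched with an enumeration and a result that carries a message. This is not the Progress Dialog code; the names and the callback-based structure are assumptions used to show why a separate Canceled outcome keeps a user's cancel from being reported as a failure.

```python
from enum import Enum


class ImportResult(Enum):
    SUCCESS = "success"
    CANCELED = "canceled"
    ERROR = "error"


def run_import(read_blocks, cancel_requested):
    """Run an import loop and report one of three outcomes with a message.

    `read_blocks` returns an iterable of audio blocks and may raise;
    `cancel_requested` is polled after each block.
    """
    try:
        for _block in read_blocks():
            if cancel_requested():
                return ImportResult.CANCELED, "Import canceled by user"
    except (OSError, ValueError) as exc:
        return ImportResult.ERROR, "Import failed: {}".format(exc)
    return ImportResult.SUCCESS, ""
```

A caller can then show an error dialog only for `ImportResult.ERROR` and stay silent when the user canceled.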
Implemented features
- AMR WB support is not enabled, because it is undistributable (non-free).
- Support for dynamic loading of shared FFmpeg builds (DLL, DYLIB or SO files) using Libraries Preferences or static FFmpeg builds (single EXE or other binary) using (external program).
- Dynamic loading has detailed logging in the Audacity Log.
- Audacity will feed FFmpeg with silence to ensure that the last block is full. This behaviour is controlled by the "/FileFormats/OverrideSmallLastFrame" config switch (1/0) and is enabled by default. There is no GUI for this setting at the moment.
- Custom FFmpeg Export dialog for arbitrary combinations of format and codec, including loading and saving of presets for these combinations.
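The silence padding mentioned in the list above can be illustrated with a short sketch. It assumes samples are plain integers and that the encoder consumes fixed-size frames; the function and parameter names are hypothetical, with the flag standing in for the "/FileFormats/OverrideSmallLastFrame" switch.

```python
def pad_last_frame(samples, frame_size, override_small_last_frame=True):
    """Append zero samples (silence) so the last frame is full.

    When the switch is off, or the data already fills whole frames,
    the samples are returned unchanged.
    """
    remainder = len(samples) % frame_size
    if remainder == 0 or not override_small_last_frame:
        return list(samples)
    return list(samples) + [0] * (frame_size - remainder)
```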
TODO
All planned major things are done.