Proposal General Scripting or Python Integration

Proposal pages help us get from feature requests into actual plans. This page is a proposal to create a new framework for scripting Audacity, possibly including an interactive Python Console.
Proposal pages are used on an ongoing basis by the Audacity development team and are open to edits from visitors to the wiki. They are a good way to get community feedback on a proposal.


  • Note: Proposals for Google Summer of Code projects are significantly different in structure, are submitted via Google's web app and may or may not have a corresponding proposal page.
  • NickHilton 23Jul14: Warning: As starter of this proposal I'm new to Audacity, and in fact, haven't seen a single line of Audacity's code yet. So I'm just writing about what I'd like to see.


Proposed Feature

Create a new framework for scripting many aspects of Audacity, which may include:

  • Interactive Python Console
  • Track manipulation
  • Track audio manipulation
  • Track annotation
  • GUI window manipulation
  • Adding items to menus
  • Invoking menu items
  • Recording and playing back macros
  • Processing chains in terms of the framework
  • Project template creation
  • Menu bars with clickable icons

The short-term goal would be to build a framework based on the Open/Closed Principle: that is, a framework that is easily extensible, but closed to modification.

Interactive Python Console

A Python console is a quick and easy way to explore what classes and methods the Audacity Python module would provide. It would also allow one to interact with and manipulate objects in the current project, providing a quick way to learn how to write scripts.
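
A hypothetical session might look like this; the audacity module name and all of its attributes are assumptions for illustration only:

   >>> import audacity
   >>> project = audacity.current_project()
   >>> [t.name for t in project.tracks]
   ['Audio Track']
   >>> project.tracks[0].pan
   0.0
   >>> project.tracks[0].pan = -0.5   # takes effect in the GUI immediately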

Track Manipulation

From the new framework, all aspects of tracks should be available for manipulation:

  • inserting new tracks,
  • removing existing tracks,
  • manipulating their parameters, like sample rate, pan, volume, selections.
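
A sketch of what this might look like from Python; every name here is an assumption, not an existing API:

   import audacity  # hypothetical module, as above

   project = audacity.current_project()

   # Insert a new stereo track (hypothetical call and parameters).
   track = project.add_track(channels=2, sample_rate=44100)

   # Manipulate its parameters.
   track.pan = -0.25
   track.gain = 0.9

   # Remove an existing track by index.
   project.remove_track(0)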

Track Audio Manipulation

The audio data (in Audacity, float32) should be accessible through the framework, including iterating over selections, getting the start/stop sample of each selection, then manipulating the data and writing it back to the track. Audio data insertion into, and removal from, any location in time should also be possible.

Note that selections will be multi-selections, allowing multiple spans. We may also go down a similar route to Photoshop and have 'fuzzy selections' where the selection isn't just 0 or 1. This allows the selection to include some information about blending.
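
A sketch of that read/process/write loop over a multi-selection; the names and the selection representation are assumptions (a fuzzy selection would add a per-sample weight to each span):

   import audacity  # hypothetical module, as above

   track = audacity.current_project().tracks[0]

   # A multi-selection is a list of (start, stop) sample spans.
   for start, stop in track.selection.spans():
       samples = track.read(start, stop)      # float32 samples
       samples = [s * 0.5 for s in samples]   # e.g. halve the amplitude
       track.write(start, samples)            # write the result back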

Track Annotation

Adding labels and drawing lines on tracks, which Matplotlib calls annotation, should also be scriptable.

GUI window manipulation

The framework should allow the main window to be manipulated: zoom level, window size, position, hiding, bringing to the front, etc.

The zoom level is actually a property of the ruler, rather than of the project window; we have example screens with multiple rulers at different scales. This is quite general: many GUI choices can be seen as belonging either to a component or to an item higher in the tree. For example, a property like the color of items in a list may be a fixed constant, a per-item value, or the result of a function computed on each item's properties.
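
A sketch of that lookup order, purely illustrative (none of these names exist in Audacity):

   from types import SimpleNamespace

   def resolve(item, name, parent):
       """Per-item value first, then a computed rule, then the parent's constant."""
       if name in item.own_values:
           return item.own_values[name]
       rule = parent.rules.get(name)
       if rule is not None:
           return rule(item)
       return parent.constants[name]

   parent = SimpleNamespace(
       constants={"height": 24},
       rules={"color": lambda item: "red" if item.own_values.get("muted") else "grey"},
   )
   item = SimpleNamespace(own_values={"muted": True})
   print(resolve(item, "color", parent))   # computed per item -> red
   print(resolve(item, "height", parent))  # fixed constant    -> 24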

Adding items to menus

The framework should allow the menu bar to be manipulated, the most obvious case being inserting items into the plug-in menus. But adding whole new menus should also be allowed. Taking this proposal to its ultimate extreme, all of Audacity's functionality would come from plug-ins that register themselves with the framework.

Invoking menu items

Items in Audacity's menu system should be accessible and invocable from the framework.
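
For example, a plug-in might trigger a built-in menu item with a message like the following, using the message style developed under Framework Design below (the "invoke" verb and the target path are assumptions):

{
    "target" : "menu.effects.amplify",
    "invoke" : {}
}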

Recording and playing back macros

With the framework in place, the framework events being triggered could be recorded as a macro, which would be a great starting point for writing my own plug-in to do something similar. I use jEdit as my principal text editor; it has a wonderful macro recorder and playback. Once a macro is recorded, it can be saved and manually edited to become a new plug-in.

It's also a quick way to find out what objects to manipulate.

Processing chains in terms of the framework

With the framework in place, plug-ins could be written that invoke other plug-ins, creating a processing chain.
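
A sketch of a chain plug-in that simply forwards messages to other plug-ins; the targets, the "process" verb and the send() helper are all assumptions:

   # Hypothetical chain: normalize, then low-pass filter, applied to track 0.
   CHAIN = [
       {"target": "plug-in.effect.normalize",
        "process": {"track": 0}},
       {"target": "plug-in.effect.filters.low_pass.nick",
        "process": {"track": 0, "cutoff": 500.0}},
   ]

   def run_chain(framework):
       for message in CHAIN:
           framework.send(message)  # each step is just another message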

Project template creation

With enough elements of Audacity scriptable, project templates could easily be created. For example, podcasts usually have an intro, the regular voices each on their own track, and a closing. A template project could be created with a click on the plug-in, and the user would just drag and drop the audio onto the tracks, or point to a folder of .wav files that get imported into the project.

This would replace the existing system for Simplifying Audacity.

Menu bars with clickable icons

Plug-ins could provide an icon when they register with Audacity, and menu bars could be created as a collection of icons.

Framework Design

Drawing framework.png

media:drawing_framework.svg.zip

Messages or events being passed around could carry their data as UBJSON, a binary encoding of JSON. This would allow easy extension, as JSON maps naturally onto a std::map or a Python dict.

Also consider Protocol Buffers, which could easily provide generated C++ and Python classes from a single custom message format.

Message Passing

Using a tree-like structure, such as JSON, provides flexibility in how data is passed around the framework. The actual framework would use a binary format for efficiency, but the important concept is that messages have a hierarchy.

A simple example:

{
    "target" : "gui.volume.master",
    "write" : {"volume" : 0.66},
}

Objects that provide services for plug-ins should register themselves with the framework, telling it who they are.

A registration message:

{
    "register" :
    {
       "target" : "gui.volume.master"
    }
}

At startup, the widget that controls the master volume would register itself with the framework, passing it the message above. The framework would then create a mapping to the object/Qt event for that target:

// c++ somewhere
targets["gui.volume.master"] = & obj;

Where obj is an instance of a class that implements the framework interface for emitting and receiving messages.

Next, a plug-in written in Python could compose a message and send it to the framework:

{
   "target" : "gui.volume.master",
   "write" : {"volume : 0.66}
}

The message would travel through the socket to the framework, which listens for messages and eventually dispatches them to the target. The target's callback (or Qt 'slot') that handles the message inspects the contents to process it. If the message is malformed, a log message and a popup dialog explain why.
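
A sketch of the sending side of that transport, assuming line-delimited JSON over a local TCP socket (the port number and the framing are assumptions; the real framework would use a binary encoding such as UBJSON):

   import json
   import socket

   def send_message(message, host="127.0.0.1", port=56789):
       """Serialize a message dict and hand it to the framework (hypothetical port)."""
       with socket.create_connection((host, port)) as sock:
           sock.sendall(json.dumps(message).encode("utf-8") + b"\n")

   send_message({
       "target": "gui.volume.master",
       "write": {"volume": 0.66},
   })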

A message for reading something might look like this:

{
    "target" : "gui.volume.master",
    "read" : "volume"
}


Example Plugin: Low Pass Filter

I'll try to work through the messages for implementing a low pass filter plug-in.

Registration:

{
    "register" :
    {
       "target" : "plug-in.effect.filters.low_pass.nick",
       "menu" :
       {
           "category" : "effect.filters",
           "label" : "Low Pass FIR Filter (by Nick)"
       },
       "icon" : [ // binary data stream ]
    }
}

When the menu item is clicked, the menu item widget emits this message, and eventually the plug-in's callback handler is invoked:

{
   "target" : "plug-in.effect.filters.low_pass.nick"
   "emitter" : "menu.effects.fliters"
}

This message is processed by the plug-in's callback method:

   import audacity
   
   # ...
   
   class MyPlugin(audacity.Plugin):
   
       def callback(self, message, *args, **kwargs):
           # handle the message here
           pass

The callback would then query if there is any track data selected, perhaps with this message:

{
    "target" : "gui.tracks",
    "read" : "count"
}

The plug-in would get this message:

{
    "target" : "plug-in.effect.filters.low_pass.nick",
    "emitter" : "gui.tracks",
    "count" : 1
}

This could mean there is one track in the project. To get the selection from the track:

{
    "target" : "gui.tracks",
    "read" :
    {
       "selection" : 0
   }
}

And the reply could be:

{
    "target" : "plug-in.effect.filters.low_pass.nick",
    "emitter" : "gui.tracks",
    "selection" :
    [
       [100, 1000],
       [2000, 3000]
   ]
}

This message indicates there are two regions selected. It could also be possible that nothing is selected:

"selection" : []

In which case the plug-in could default to processing the entire track.

Next, assuming the track has data to process, the plug-in would pop up a window to configure the filter parameters. Let's suppose a region was selected, so the plug-in reads some track data with this message:

{
   "target" : "gui.tracks",
   "read" :
   {
       "track" : 0,
       "start" : 0,
       "stop" : 123412341
   }
}

The reply could be:

{
   "target" : "plug-in.effect.filters.low_pass.nick",
   "emitter" : "gui.tracks",
   "track" :
   {
       "id" : 0,
       "sr" : 48000,
       "channels" : 2,
       "data" :
       [
           [ // binary data for channel 0],
           [ // binary data for channel 1]
       ]
   }
}

Next, the "Preview" button is pressed, the plug-in processes that data and writes a message to play the audio:

{
   "target" : "gui.playback",
   "play" :
   {
       "sr" : 48000,
       "data" :
       [
           [ // binary data for channel 0],
           [ // binary data for channel 1]
       ]
   }
}

The audio is heard. The user is happy and presses OK, which sends this message:

{
   "target" : "gui.tracks",
   "write" :
   {
       "track" : 0,
       "start" : 0,
       "stop" : 123412341,
       "sr" : 48000,
       "data" :
       [
           [ // binary data for channel 0],
           [ // binary data for channel 1]
       ]
   }
}

Message Design Discussion

My initial thought was that no code would need to be generated. We could leave it to the plug-ins themselves to interpret each message, and to throw an error if a message isn't understood. Let me illustrate what I mean.

In this use case, I'm writing a script that will use a low pass filter plug-in. Suppose we have a Message class that allows us to insert any data we want. For example:

// C++

Message msg;

// template <class T>
// void set(std::string const & path, T const & v);  // const ref, so literals like 500.0f can be passed

msg.set("emitter", "my_script.py(36)");  // filename & line no
msg.set("target", "plug-in.effect.filters.low_pass.nick");
msg.set("write.cutoff", 500.0f);

The JSON equivalent:

{
    "emitter" : "my_script.py(36)",
    "target" : "plug-in.effect.filters.low_pass.nick"
    "write" :
    {
       "cutoff" : 500.0
    }
}

Now I send the message to the plug-in. The plug-in will read the message and determine whether it can understand it by trying to read paths out of the message:

void message_callback(Message & msg)
{
   if(msg.has_path("write")) _do_write(msg);
   else
   if(msg.has_path("read")) _do_read(msg);
   else
   if(msg.has_path("process")) _do_process(msg);
   
   throw MalformedMessage(msg);
}

As part of the 'Framework Engine', plug-ins should advertise what messages they can process. This way, a full set of documentation could be auto-generated.
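
A sketch of such an advertisement, sent as part of registration; the "accepts" key and the schema notation are assumptions:

   ADVERTISED_MESSAGES = {
       "write":   {"cutoff": "float"},
       "read":    {"cutoff": "float"},
       "process": {"track": "int", "start": "int", "stop": "int"},
   }

   def register(framework):
       framework.send({
           "register": {
               "target": "plug-in.effect.filters.low_pass.nick",
               "accepts": ADVERTISED_MESSAGES,  # used to auto-generate docs
           }
       })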

Some pros and cons of this approach:

Pros:

  • Light-weight messages, no message definition files to maintain
  • All details of the messages a plug-in can process are contained in a single place/file, rather than scattered about in message code elsewhere
  • Extending a message's contents simply means reading/writing more paths to the message
  • No third-party tool (like SWIG) is needed to generate class code
  • Messages aren't named, so there are no naming/namespace or file-organization issues

Cons:

  • Plug-ins must populate all message content themselves
  • Malformed messages could easily be created, which could be annoying
  • Extending a message's contents means reading/writing more paths to the message, PLUS updating whatever the plug-in advertises about its messages
  • No benefit from class constructors, where default values could be populated
  • No generated classes whose methods operate on the data in the message
  • Message format changes that code generation would catch at compile time for C++ plug-ins go undetected until runtime
  • Message format errors are reported by the target, rather than being caught in the script emitting the message

Lots of trade-offs to think about.

The "Don't repeat yourself" principle argues that we should use code generation. We want to offer the 'interface' on both sides of the pipe and in multiple languages and with multiple transports, rather than being locked down.


Development

Evolutionary Approach

For Python scriptability we can choose between:

  • (a) Existing code. We already have textual scripting via the mod-script-pipe module, which accepts commands such as:

Select: Mode=Range FirstTrack=0 LastTrack=1 StartTime=3.2 EndTime=8.4

over a named pipe. Scripting is enabled by placing the existing mod-script-pipe.dll in the modules sub-directory below the Audacity executable directory, and then agreeing to load it when Audacity prompts you. mod-script-pipe is language agnostic. There is sample code to talk to the pipe written in Perl. The quickest way to get some Python scripting is to write some Python code to do the same thing (see the Python sketch below).
  • (b) New code. Switch from hand-coded functions that present Audacity functions as text to SWIG-produced functions. Use a registration system (the 'Registrar') within Audacity to collect families of related functions, e.g. functions that apply an effect, functions that create a track of some type, and accessors for preference parameters. Auto-generate the marshalling code. Once we are auto-generating, it is relatively easy to try out different transport approaches such as JSON, protocol buffers, DCOM, or human-readable text.

Approach (b) will take at least six months to produce anything at all useful to end users. I'd advocate an approach where we do (a), then start evolving it towards (b). That gives a useful result for users much sooner. Building on (a) would mean exposing more of Audacity's functionality via the text-based script interface. As Audacity does not use template classes, we can probably generate the human-readable text format automatically from a suitable SWIG definition. That gives us an evolutionary approach from (a) to (b).
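
As a starting point for (a), a minimal Python client for the pipe might look like this. It mirrors what the Perl sample does; the pipe names and the blank-line reply framing are assumptions here, so check the mod-script-pipe documentation for your platform and version:

   import os

   # Pipe names as used by mod-script-pipe on Unix-like systems (an assumption;
   # Windows uses named pipes such as \\.\pipe\ToSrvPipe and \\.\pipe\FromSrvPipe).
   TO_PIPE = "/tmp/audacity_script_pipe.to.%d" % os.getuid()
   FROM_PIPE = "/tmp/audacity_script_pipe.from.%d" % os.getuid()

   def do_command(command):
       """Send one textual command to Audacity and return its reply."""
       with open(TO_PIPE, "w") as to_pipe:
           to_pipe.write(command + "\n")
       reply = ""
       with open(FROM_PIPE) as from_pipe:
           for line in from_pipe:
               if line == "\n" and reply:
                   break  # assumed: a blank line terminates the reply
               reply += line
       return reply

   print(do_command(
       "Select: Mode=Range FirstTrack=0 LastTrack=1 StartTime=3.2 EndTime=8.4"))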

Examples

For scripting to be useful we should have some example scripts that extend Audacity in some way.

Experience

I'd like to compare ImageMagick, which is designed with external scripting in mind, with The GIMP, which supports internal scripting. Blender, and the approach we are trying here, ideally give the best of both: external scripting (the full power of the scripting language) without losing the convenience of a GUI. Some experience with ImageMagick is that its text-based syntax, being bracketless, is inconvenient for complex scripts. We need functions that will, for example, return a selection object that can be passed into another function. That needs brackets (or RPN) so that the syntax is consistent. Our current text syntax does not support that at all; we are strictly one command at a time.
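
To illustrate (the Amplify command and its parameters are invented for this example), today's pipe protocol is strictly sequential:

   Select: Mode=Range StartTime=3.2 EndTime=8.4
   Amplify: Gain=1.5

whereas a composable syntax would let one call's result feed another:

   Amplify( Select( Mode=Range, StartTime=3.2, EndTime=8.4 ), Gain=1.5 )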

Chains

We should consider taking the existing 'chains' feature out of Audacity and making it a separate program based on the scripting interface. Done carefully, this program need not 'know' anything about Audacity and could be used for simple batch operations with any program that provides a scripting interface. The main thing it adds over a command line is convenience in selecting files to iterate over, and GUI-based prompting for the files to operate on and the parameters of the functions to apply. The same chains code should be perfectly capable of loading each of 60 Blender models in turn and creating x, y, z and diagonal view screenshots.

API

Registration

The point of this section is to demonstrate that one mechanism for building trees can be used in many places in this proposal.

The TrackPanel plug-in was created to start breaking the track panel down into re-usable components that could then be recombined.

Here's an extract from some C++ code in the track panel plug-in. This code builds part of the display, using Start(), Add() and End() to make a tree. It is very similar in style to the existing code in Audacity for building dialogs using ShuttleGui.


   S.Start( new Sizer );
     S.Start( (new Sizer)->SetSizing( 50, 0 ).SetTiling( eTileVertical ) );
       S.Add( (new ButtonTrack)->SetSizing( 290,0 ) );
       S.Start( (new Sizer)->SetTiling( eTileBorder ) );
         S.Add( new Meter );
       S.End( _Sizer );
     S.End( _Sizer );
     S.Start( (new Sizer)->SetSizing( 30, 3.1 ).SetTiling(eTileOverlay ));
       S.Start( (new Sizer)->SetSizing(30,3.1) );
         S.Add( (new RulerTrack)->SetZoom( 1.5 ).SetSizing( 30, .1 ) );
         S.Add( (new LabelTrack)->SetSizing( 0, 0.6 ) );
          S.Add( new WaveTrack );
         S.Start( (new Sizer)->SetTiling( eTileOverlay ) );
           S.Add( (new EnvelopeTrack)->SetAgg( kAggShade ) ); // The background.
           S.Add( (new EnvelopeTrack)->SetAgg( kAggLines ) ); // The line.
         S.End( _Sizer );
       S.End( _Sizer );
       S.Add( new SelectionTrack );
     S.End( _Sizer );
     S.Start( (new Sizer)->SetSizing( 30, 1.1 ).SetTiling(eTileOverlay ));
       S.Start( (new Sizer)->SetSizing(30,1.1) );
         S.Add( (new RulerTrack)->SetZoom( 0.65 ).SetSizing( 30, 0.1 ) );
          S.Add( new WaveTrack );
       S.End( _Sizer );
       S.Add( new SelectionTrack );
     S.End( _Sizer );
   S.End( _Sizer );


Here is some code that adds menu items via the 'Registrar'.

   ShuttleMenuBase &M = Registrar::AtMenu();
     M.Add( _("&Edit/Modify/Sounds/Echo"),    ApplyEffect, HAS_ARGS, eEcho,     4100 ); // About 1/10th second
     M.Add( _("&Edit/Modify/Sounds/Silence"), ApplyEffect, HAS_ARGS, eAmplify,  0   );
     M.Add( _("&Edit/Modify/Sounds/Amplify"), ApplyEffect, HAS_ARGS, eAmplify,  150 );
     M.Add( _("&Edit/Modify/Sounds/Quieten"), ApplyEffect, HAS_ARGS, eAmplify,  66  );


Here is some code that adds a new module. This is the module that supports drawing a waveform. The parameters registered control the waveform color.

   ShuttleModulesBase &M = Registrar::AtModules();
     M.Add( new WaveTrack, WaveTrack_Register );
   ShuttlePreferencesBase &S= Registrar::AtPreferences();
     S.Start( &SysWaveTrack );
       S.Add( wxT("We use a different color when the wave is zoomed in to show individual samples ") );
       S.Add( wxT("Normally you are seeing a summary of several pixels ") );
       S.Add( wxT("cWaveMain"), &Sys::cWaveMain );
       S.Add( wxT("cWaveStretch"), &Sys::cWaveStretch );
     S.End( _Sys );


The aim is to create a uniform system for creating 'trees' of things. Our menus are trees of clickable items. Our screens are trees of widgets (including things like tracks). Our preferences are trees of mutable values. The scripting language should have access to the trees and the operations for building the trees.

Plugins

Here is the base class used for identification, e.g. to identify plug-in effects.

 class IdentInterface
 {
 public:
    virtual ~IdentInterface() {};
 
    virtual PluginID GetID() = 0;
    virtual aString GetPath() = 0;
    virtual aString GetName() = 0;
    virtual aString GetVendor() = 0;
    virtual aString GetVersion() = 0;
    virtual aString GetDescription() = 0;
 };

It is likely that we will use reversed URLs, as Eclipse does, for the 'vendor'. PluginIDs are GUIDs. Code cannot assume that PluginIDs are actually globally unique, as experience shows that someone will package things incorrectly and re-use an existing ID. The scripting API will need to take account of plug-ins, otherwise it won't see all the functions available within Audacity. It will need to query Audacity for a list of registered objects.

CRUD

We need commands to create, read, update and delete objects.

  • I've a preference for using prototypes to specify what to create. That way we don't hard-code some choices into the API. For example, we might initially regard point labels and labels that span a range as different things. Later we might treat them both as special cases of 'Label'. With prototypes we have a named example of a point label and a named example of a range label that can then be cloned and modified. Our python code does not need to know whether they are different kinds of object, or the same.
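
A sketch of prototype-based creation; the prototypes registry and the clone() call are assumptions:

   import audacity  # hypothetical module, as above

   project = audacity.current_project()

   # Clone a named prototype and override fields; the script never needs to
   # know whether point labels and range labels are distinct classes.
   label = project.prototypes["range_label"].clone(text="Chorus",
                                                   start=3.2, end=8.4)
   project.tracks[0].add(label)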

Tree Manipulation

We need commands to work with a tree of things. So far we have:

  • Start(), End(), Add() used in tree creation.
  • At() used to specify a place in a tree.
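
A sketch of how At() might be exposed to scripts to modify an existing tree; the path syntax and method names are assumptions, imagining a Python binding of the Registrar's tree operations:

   import audacity  # hypothetical module, as above

   # At() specifies a place in a tree; Add() inserts a new child there.
   sounds = audacity.registrar.at("menu/Edit/Modify/Sounds")
   sounds.add(label="Mute", action="plug-in.effect.mute.nick")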


Path Forward: Leveraging the mod-script-pipe Plugin

The mod-script-pipe plug-in architecture:

Nick-mod-script-pipe.png

Forking mod-script-pipe to use Google Protocol Buffers:

Nick-mod-script-pipe-pb.png

media:nick-mod-script-pipe-svg.zip
