Talk:Completed: Proposal Unitary Project

From Audacity Wiki

Gale: 28Apr11: I'm strongly in favour of a more "unitary" project format, though I am not sure about the details of presenting your scheme to the user. Is the user still opening the .aup? If so, then when the user does File > Open (Project), they will, I assume, never be able to see more than one .aup at a time, because each will be in its own folder. They will be able to see all the folders.

If we had a genuinely unitary format (somewhat like a .zip) that packed the project file and _data folder into a single entity, that would seem to me more robust and intuitive. How do other project-based apps handle making it harder to access the data?

Do we need the "Temporary Project" menu item? Why can't you just hit Record?

Not sure about "Close and Delete" either. We do want some safe way to let users rename and delete projects. My current thinking is a single menu item that leads to some kind of "Project Manager" or "Project Browser". See bug 136.

Bill03May2011: Regarding other project-based apps, iMovie and iDVD pull everything into one monolithic file e.g. myDVD.dvdproj. I have no idea what Apple does to make the data in those files easy to access and manipulate. OTOH Pro Tools uses a scheme almost identical to Audacity's. In Pro Tools (at least up to v6.x) you could not have a new untitled "session" (equivalent to an Audacity project) - you were forced to name it and save it. In the process Pro Tools would create a folder with the same name as the session, then save the .pts file in that folder, and create in that folder two folders called "Audio Files" and "Fade Files". This is more in line with Steve's proposal, although I'd expect resistance to the notion of having to save a project before you can record into it. iMovie and iDVD are consumer apps, so it makes sense that Apple would make them as bullet-proof as possible. Pro Tools is (or was) targeted at professional users. The Pro Tools M-Powered series (bundled with M-Audio consumer interfaces) seems to do the same.


Alternative Proposal transferred from the Forum

*.aua File Format

The name is just a strawman exemplar.

Koz Proposed: Photoshop has its PSD file format. You can save your work as MyPicture.psd and email it to someone and they can open it up and look at all your layers, formats, masks, channels, paths, etc.
Audacity needs an .aua file format that does the same thing (I picked the letters out of a hat). Instead of trying to email an .aup file to someone, Audacity would have its own file format -- possibly based on a variant of FLAC -- and available under File > Save.
AUP would be supported, but no longer generated.

Steve: I've been wondering if the "unitary project format" could be something like an ISO or UDF format, but I don't really have sufficiently in-depth knowledge to form an opinion.

Bruno: I don't think it necessarily needs to be an ISO or UDF format, but I understand your idea of project files being "embedded" inside some sort of "virtual disk image". There are a few file systems that could be used for that. Compatibility among all 3 platforms could be an issue though, unless all the libs and tools were already included with Audacity. In Linux virtually any kind of filesystem can be mounted as a loop device, i.e. you can make a raw copy of a disk into a file and then mount that file as if it were a real disk (you use the loop module for that). In terms of performance I'm not sure how much that would cost us... maybe too much... maybe not... I haven't done any recent performance tests on loop devices... Could there be big differences between different platforms?

Bruno: One option that could probably be easily done would be an "export project as zip" feature. And also an equivalent "import project from zip". AUP would be supported, but no longer generated.
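A minimal Python sketch of what Bruno's "export project as zip" and "import project from zip" pair could look like. It assumes the current layout of a name.aup file with a sibling name_data folder. The function names are invented for this illustration and are not part of Audacity.

```python
import zipfile
from pathlib import Path


def export_project_as_zip(aup_path, zip_path):
    """Pack name.aup and its sibling name_data folder into one archive."""
    aup = Path(aup_path)
    data_dir = aup.with_name(aup.stem + "_data")
    # Stored without compression: a choice made for this sketch to keep it fast.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        archive.write(aup, aup.name)
        if data_dir.is_dir():
            for item in sorted(data_dir.rglob("*")):
                if item.is_file():
                    # Keep paths relative to the project's parent folder.
                    archive.write(item, item.relative_to(aup.parent).as_posix())


def import_project_from_zip(zip_path, dest_dir):
    """Unpack an exported archive and return the path of the .aup inside it."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        aup_names = [name for name in archive.namelist() if name.endswith(".aup")]
    return dest / aup_names[0]
```

Note that this is exactly the "unpacking" approach Steve objects to further down: the imported project needs its full size again on disk before Audacity can work on it.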

Koz: The goal is to avoid this.
Saving a Project in its current form is an Advanced Production function for use by people putting together the Academy Award® show sound track, not somebody with simple, 40 minute lectures like this user.
Somebody complaining that the .aua file is awkward and slow can be introduced to the brittle .aup file format and its associated /data folders and high speed, efficient long show production. Aup should not be the default save format.
Please note that the .aua format can be saved as a Classic Project by simply loading and saving. Further, .aua is not subject to file splitting and scattershot data wounds that long format Projects are.
.aua will save large, complex, multi-track productions uncompressed, and remember, the base format is FLAC.
Audacity core doesn't change.

Bruno: One other thing that could be done would be changing the way the aup and the data files are stored...
On the save dialog, when the user is asked for the file name, that could be a dir name instead of a file name... Then Audacity would create a directory with the name specified by the user and the .aup file would go inside that folder along with the data folders and tiny audio files.
Instead of the .aup having the name of the project it could have a standard name, always the same for all projects. You could even change the extension from .aup to .xml and it would go like metadata.xml or projectinfo.xml or similar.
That way the user wouldn't get so confused... He'd know that an Audacity project is a folder containing many files...
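A small Python sketch of the folder-as-project layout Bruno describes, assuming the fixed metadata name projectinfo.xml (one of his suggestions) and a subfolder called data. The XML written here is only a placeholder, not the real .aup schema, and the function name is hypothetical.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr

METADATA_NAME = "projectinfo.xml"  # same file name inside every project folder
DATA_DIR_NAME = "data"             # assumed name for the audio data subfolder


def create_project_folder(parent_dir, project_name):
    """Create <parent>/<project_name>/ holding the metadata file and data folder."""
    project_dir = Path(parent_dir) / project_name
    # Raises FileExistsError if the project already exists, so nothing is overwritten.
    (project_dir / DATA_DIR_NAME).mkdir(parents=True)
    placeholder = '<?xml version="1.0"?>\n<project name=%s/>\n' % quoteattr(project_name)
    (project_dir / METADATA_NAME).write_text(placeholder, encoding="utf-8")
    return project_dir
```

With this layout the name the user typed lives only in the folder name, so renaming a project is a single folder rename.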

Steve: The main question there is how to achieve a unified project format without incurring a performance penalty. I don't think that a solution that involved "unpacking" a project into temporary files would be acceptable as large projects would then require a lot more disk space. Audacity would need to be able to work directly on the project.


James' Proposal

The tree and block design by Roger and Dominic achieves what we want in terms of performance, but not all in one file. Its scheme has multiple files because it 'delegates' management of whole blocks to the OS running the disk. An important point is that when splitting / merging blocks we copy so that a block rarely ends up less than half full. Specifically, we never allow two half-full blocks in succession. If that would happen, we copy to merge them. This gives the stutterless guarantee. Empty files aren't needed at all, and are deleted, the OS doing 'garbage collection' of any disk space freed. This makes the 'memory management' simpler.
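A rough Python sketch of the merge rule described above, assuming "half-full" means a block holding fewer than half the samples of a full block. MAX_BLOCK and the function names are made up for illustration and are not taken from the Audacity source.

```python
MAX_BLOCK = 8  # samples in a full block (tiny, for illustration only)


def is_small(block):
    """True if the block is less than half full."""
    return 2 * len(block) < MAX_BLOCK


def merge_small_neighbours(blocks):
    """Return a block sequence with no two small blocks in succession."""
    merged = []
    for block in blocks:
        if merged and is_small(merged[-1]) and is_small(block):
            # Two small blocks together always fit in one full block,
            # so copy them into a single block.
            merged[-1] = merged[-1] + list(block)
        else:
            merged.append(list(block))
    return merged
```

Because any two neighbouring blocks then hold at least half a block of audio between them, playback never has to hop between many tiny files in a row.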

James proposes instead:

  • Implement our own efficient malloc/free within a single file. We set a minimum size for mallocs in the file - which corresponds to the block size in our current scheme. We also offer a free that can free part of a block. If both parts are above the minimum size this is done in situ without copying. This combination leads to a small efficiency gain relative to the existing system, in that
(a) our 'blocks' can end up bigger than the minimum block size
(b) our block boundaries will more often end up aligned with split/join boundaries, reducing the amount of copying on a split/join.
(c) we'll more often use exactly the size needed.

We still keep the guarantee about stutterless play, which is that we have at most two jumps per 'block-size' of data.
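To make the in-file malloc/free idea more concrete, here is a toy Python sketch. It tracks extents as (offset, length) pairs only and does no real file I/O. MIN_ALLOC, the class and its method names are assumptions of this sketch. Merging of adjacent holes and storing the free list on disk are left out.

```python
MIN_ALLOC = 4  # minimum allocation size; stands in for the current block size


class SingleFileAllocator:
    def __init__(self):
        self.free_list = []  # holes inside the file, as (offset, length)
        self.end = 0         # current end of the file

    def malloc(self, length):
        """Return an (offset, length) extent of at least MIN_ALLOC."""
        length = max(length, MIN_ALLOC)
        for i, (offset, size) in enumerate(self.free_list):
            if size >= length:
                rest = size - length
                if rest >= MIN_ALLOC:
                    self.free_list[i] = (offset + length, rest)
                    return (offset, length)
                # Remainder too small to stand alone: hand out the whole hole.
                del self.free_list[i]
                return (offset, size)
        offset = self.end  # no hole fits, so grow the file
        self.end += length
        return (offset, length)

    def free(self, offset, length):
        self.free_list.append((offset, length))
        self.free_list.sort()

    def free_part(self, extent, cut_offset):
        """Free the tail of extent from cut_offset, in situ if both parts are big enough."""
        offset, length = extent
        keep = cut_offset - offset
        drop = length - keep
        if keep >= MIN_ALLOC and drop >= MIN_ALLOC:
            self.free(cut_offset, drop)
            return (offset, keep)
        return None  # too small on one side: the caller has to copy instead
```

The branch that hands out a whole hole is one way point (a) could come about: an extent may end up larger than the size requested.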

  • The 'AUP' tree becomes an edit decision list and is added to the end of the file.
  • Because we don't have the OS doing garbage collection of empty blocks, we may end up with holes in our file. Also, when we save, we in any case want to purge the undo data. The easy way to do this is to write a new file. Because the working file already has the stutterless guarantee, doing this will be very nearly as fast as copying a file, as we'll be working with large blocks of data.
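A short Python sketch of "save by writing a new file", assuming the edit decision list is simply an ordered list of (offset, length) extents in the working file. Holes and undo data are dropped just by not being listed. The function name is hypothetical, and appending the list to the end of the new file is omitted.

```python
import io


def write_compacted(working, edit_list, output):
    """Copy the listed extents, in playback order, into a fresh file.

    Returns the edit decision list for the new file, where the
    extents follow one another with no holes in between.
    """
    new_list = []
    position = 0
    for offset, length in edit_list:
        working.seek(offset)
        chunk = working.read(length)
        output.write(chunk)
        new_list.append((position, len(chunk)))
        position += len(chunk)
    return new_list


# Usage with in-memory files: "xxxx" stands for a hole or stale undo data.
working = io.BytesIO(b"AAAAxxxxBBBBCCCC")
output = io.BytesIO()
new_list = write_compacted(working, [(8, 4), (0, 4), (12, 4)], output)
```

Each extent is read in one piece, so with large extents the cost stays close to that of a plain file copy, which is the point James makes above.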

Some more details:

  • We can repurpose wav and more easily ogg-vorbis files for this, so that the Audacity specific information is held in metadata. After garbage collection these will be playable as native files. Without garbage collection they will still have 'the right audio' in them, just jumbled and partially repeated. Typically the early part of the file, and any uninterrupted recording session will be contiguous in the file.
  • Most code in Audacity should be written in a way that does not care or know about the block boundaries. It should just be processing streams of data. The system needs an abstract interface that hides the blocks.

This is too big a change for main Audacity itself. My thinking is to bring the Unitary Project format in with the new trackpanel plug-in.


A further proposed change is about parameter settings. At the moment the config file and the aup file are distinct formats. However, we ought to be able to have config file settings in the aup file.
  • As a 'unitary project' I would also like to put preferences data and project data on the same footing. I would like users to be able to choose whether a setting, such as quality settings, is per project or global. A project can therefore override settings that are global. This requires changes in the user interface so that the user can choose what level a setting lives at, and is not an essential part of the proposal.


For developers:

  • I've not spelled out the details of the edit decision list. It will reference data via pointers, and where the pointers or the data get too fragmented, consolidate that into one piece, and free the original pieces, adding on to the end of the file if there is no other free space to use in the file.
  • Each separate stream of data has the stutterless property. Audacity will need to have a fixed time interval cached in advance for each stream. Provided Audacity does that, and provided the total data rate is sustainable, there won't be stutter. Starting play and also unmuting a track may entail a caching delay.


Andrew 17Nov14

  • If an ogg-vorbis file type is used to save the chunks of data, I believe the data must be stored as int only. Do you believe that the conversion from float should be done at the point of saving or do you see a point where there is an allowance of non float native block storage in the tree? Maybe conversion done at processing time?
  • Do you see any need for dating or dirtying data to prevent the rewriting of unchanged data?
  • If a single file system is used, could blocking or thrashing between reading and writing become a problem as data goes through the transformation process? Is there a place for a rolling file system?
  • Have you thought of any changes necessary to improve past 32-bit data limitations?

James 18Nov14

A lot of details still to work out. I probably should have said vorbis-flac, as we need a lossless format.

  • I see no problem in clipping stored float values that are outside the range -2..+2 and losing precision on very faint sounds, so using 32-bit ints would be OK, though I hadn't actually considered this concern.
  • There could be value in internal pipelines using ints. If writing by hand I'd prefer to use floats throughout so that I write less code. If using a code generator, then getting it to write multiple versions of code and conversion code could make sense.
  • A single-file system shouldn't actually cause any more thrashing or blocking than the current system where multiple files live on the same disk. An in-situ, rather than copy-based, defragmentation could be rather slow and inefficient, so I am inclined to go for copy-based as the only way to start with. I don't know what you mean by a rolling file system.
  • I have not been thinking beyond 32 bits, which gives us 27 hours of recording at 44.1 kHz, taking 32 GB of storage.
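The 27 hour and 32 GB figures can be checked with a few lines of Python. The sketch assumes that "32 bits" refers to a 32-bit sample index, and that the storage figure is for stereo audio at 4 bytes per sample. Neither assumption is stated in the comment above.

```python
SAMPLE_RATE_HZ = 44100
MAX_SAMPLES = 2 ** 32      # samples addressable with a 32-bit index (assumption)
BYTES_PER_SAMPLE = 4       # assumption: 32-bit samples
CHANNELS = 2               # assumption: stereo

hours = MAX_SAMPLES / SAMPLE_RATE_HZ / 3600
gib = MAX_SAMPLES * BYTES_PER_SAMPLE * CHANNELS / 2 ** 30

print(f"{hours:.2f} hours")  # 27.05 hours
print(f"{gib:.0f} GiB")      # 32 GiB
```

Under the same assumptions a mono recording would reach the limit at the same 27 hours but need half the storage.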

Gale 21Nov14:

  • It strikes me that another necessity is a sanity check when Audacity saves the new unitary file, so that the blocks can be reopened in the correct order and can be reopened at all. Where is the Timeline information for the blocks stored - in the metadata of the file? Is the garbage collection certain to ensure sanity of its own accord?
  • (James) the writing scheme is incremental, so the file is always valid (though possibly not up to date) after writing each complete block. As an example, the file could 'think' it had three holes. We write blocks of audio data into those three holes. It still thinks it has three holes until the metadata is updated too. How much work you lose in a crash is down to how often you update the metadata. The garbage collection is incremental too, so the file is valid all the time, though as I said, it is enough and easier/faster to create a new untangled file rather than defragment in situ. If the existing metadata formats are very cumbersome to work with, I may instead fake it with metadata in the actual audio data. You would then hear a little bit of noise at occasional intervals IF you play in another program without having defragmented in Audacity first. Your audio will sound bad anyway, due to the rearrangement of blocks and unused data. Code can and should have checks to prevent errors such as trying to write -1 blocks and things like that. I am not inclined to open the file again immediately after writing it to check it, except in a test suite.
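James's three-holes example can be modelled in a few lines of Python. This is only an in-memory sketch of the ordering (audio data first, metadata afterwards). The class and method names are invented here, and real code would also need to flush to disk between the two steps.

```python
import copy


class IncrementalFile:
    def __init__(self, size, holes):
        self.data = bytearray(size)
        # Metadata as last written to the file: what a reopen would see.
        self.committed = {"holes": list(holes), "blocks": []}
        # Working view in memory, ahead of the stored metadata.
        self.pending = copy.deepcopy(self.committed)

    def write_block(self, payload):
        """Write audio into the next hole; the stored metadata is untouched."""
        offset, length = self.pending["holes"].pop(0)
        assert len(payload) <= length
        self.data[offset:offset + len(payload)] = payload
        self.pending["blocks"].append((offset, len(payload)))

    def commit_metadata(self):
        """Update the stored metadata so the new blocks become part of the file."""
        self.committed = copy.deepcopy(self.pending)

    def recover_after_crash(self):
        """Reopen from the stored metadata; uncommitted blocks are holes again."""
        self.pending = copy.deepcopy(self.committed)


f = IncrementalFile(12, [(0, 4), (4, 4), (8, 4)])
for payload in (b"AAAA", b"BBBB", b"CCCC"):
    f.write_block(payload)
holes_before_commit = len(f.committed["holes"])  # still 3
f.commit_metadata()
holes_after_commit = len(f.committed["holes"])   # now 0
```

If the crash comes before commit_metadata, the file is still valid but out of date, so the amount of lost work depends on how often the metadata is committed.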