Talk:Proposal Page Tabled Memory and Event Layout
Fixed-Size Memory Chunks
Having fixed-size chunks of memory and passing pointers to them can be better than copying data around, and may work better than repeated malloc/free. A fixed size avoids memory fragmentation. However, overall I think the proposal is too vague. The devil is in the detail.
- In my SlippyPanel plugin module, I use blocks too. My block size is determined by the sound card (512 samples). Looping is achieved without a click by fading out the last packet and fading in the next one. This leads to audible drops in sound level at the boundary, but is still a lot better than a click. Better would be a crossfade from one block to the other, i.e. playing two blocks at the same time, and better still a smart join where the join point is repaired instantly to avoid a click. I currently can only loop at 10ms boundaries.
- This proposal implies looping at 10ms boundaries (only). Is that acceptable? Maybe not, if we have sounds with sharp onset.
- The proposal looks only at sample data. What about envelope data? What about summary data (used in drawing the waveform)? Whether these are 'in the chunk' or in their own chunks matters a lot to implementation. Also do we treat stereo left and right as two chunks or do we interleave?
- How much processing do we actually allow during output? For example, do we allow equalization on output? The more processing we allow during output, the lower the latency between a change in a parameter and its taking effect, but also the higher the risk of underruns.
- MP3 encoding also uses blocks, and FFTs will have overlapping blocks. Working at block boundaries in MP3 allows for lossless rearrangement. We can't set those block sizes, and our block size might not match the block size of the sound device either. So we may still need code to merge and split blocks of different sizes. Do fixed block sizes gain us anything?
- An alternative to fixed block sizes is to set a lower size and upper size for any block. The lower size guarantees that the overhead for switching blocks will be a small proportion of the time spent on the block. The upper size puts a limit on (a) how much copying must be done when making new block boundaries and (b) how much wasted 'unused space' we allow per block.
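To make the crossfade idea above concrete, here is a minimal sketch of a level-preserving join between two blocks. The 512-sample block size matches the SlippyPanel example; the 64-sample fade length is an assumption, as is the function name.

```python
# Sketch: joining two 512-sample blocks at a loop point.
# A linear crossfade keeps the summed gain at 1.0 across the join,
# avoiding the level dip of a separate fade-out-then-fade-in.

def fade_join(last_block, next_block, fade_len=64):
    """Crossfade the tail of last_block into the head of next_block."""
    assert fade_len <= len(last_block) and fade_len <= len(next_block)
    out = list(last_block[:-fade_len])
    for i in range(fade_len):
        g = (i + 1) / fade_len                      # 0 -> 1 ramp
        out.append(last_block[-fade_len + i] * (1.0 - g)
                   + next_block[i] * g)
    out.extend(next_block[fade_len:])
    return out

joined = fade_join([1.0] * 512, [1.0] * 512, fade_len=64)
# For two constant-1.0 blocks the crossfade region still sums to 1.0,
# so the join is level-preserving.
```

A linear ramp sums the two gains to exactly 1.0, so constant signals pass through the join unchanged; an equal-power (square-root) ramp would suit uncorrelated material better.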
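The lower/upper-bound alternative can be sketched as a re-chunking pass over incoming blocks of arbitrary size. The bounds of 256 and 1024 samples are illustrative, not part of the proposal.

```python
# Re-chunk incoming blocks of arbitrary size so every output block
# has between MIN_BLOCK and MAX_BLOCK samples (bounds are invented).
MIN_BLOCK, MAX_BLOCK = 256, 1024

def rechunk(blocks):
    buf, out = [], []
    for b in blocks:
        buf.extend(b)
        while len(buf) >= MAX_BLOCK:
            out.append(buf[:MAX_BLOCK])
            buf = buf[MAX_BLOCK:]
    # Flush the remainder; a too-small tail is merged into the last
    # block (which may then slightly exceed MAX_BLOCK; a real
    # implementation would rebalance instead).
    if buf:
        if out and len(buf) < MIN_BLOCK:
            out[-1] = out[-1] + buf
        else:
            out.append(buf)
    return out

chunks = rechunk([[0] * 300, [0] * 900, [0] * 100])
```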
Yes, I am very lacking in details. I hope I can explain some of the refinements without too much trouble. I have thought about many of these issues; expressing them in text is a bit harder.
- Sub-10ms edits will be done via floating windows. When a fraction of a block is added or removed, the event is attached to the block containing the sample point to be edited. Because the event saves the index of the block it has cut, the cut remains available for further refinement and processing. Variables within the stream will record these sub-block points.
- Example: If making a cut from block 3.1 to 5.3, the event would go in the block-3 bin. The event would put the first 0.1 of block 3 into the stream and fill the rest of that block from the 0.3 point of block 5, overrunning to 0.2 of block 6. Blocks from then on would each be filled from two source blocks, with a 0.2/0.8 split at the boundary.
- Envelope data would most likely be maintained at a track level and refined and adjusted via events within the stream.
- Processing for output can vary as much as needed. The processing agent could take a block, divide it up into sub-blocks (64 samples?) and send them out early. Or it could gather as many blocks as necessary and do more complex (12k+ sample) EQ effects. Agents could also run in series or in parallel. I will detail how a processing matrix could be laid out later.
- One of the ideas within this data-flow layout is to move drawing into the effect category, thus making it purely modular. Whether it is done via textures, meshes, or procedural calls would be left up to the event implementation. The goal of this design is to get data from point A to point B.
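The cut example above can be checked with a small brute-force model. It flattens the stream and re-blocks it, which a real implementation would avoid (the whole point of the events is to do this lazily), but it verifies the 3.1-to-5.3 arithmetic. The block size of 10 samples is chosen purely for readability.

```python
# Worked version of the cut example: remove samples from block
# position 3.1 up to 5.3 (fractional block coordinates), then
# re-block.  Block size of 10 is for readability only.
B = 10

def cut(blocks, start, end):
    """Cut [start, end) given in fractional block units, then re-block."""
    flat = [s for b in blocks for s in b]
    lo, hi = round(start * B), round(end * B)
    kept = flat[:lo] + flat[hi:]
    return [kept[i:i + B] for i in range(0, len(kept), B)]

# Eight source blocks, samples numbered globally for inspection.
src = [[i * B + j for j in range(B)] for i in range(8)]
out = cut(src, 3.1, 5.3)
# out[3] now holds the first 0.1 of old block 3 (sample 30) followed
# by samples 53..61, i.e. it is filled from the 0.3 point of block 5
# overrunning into block 6, exactly as in the example above.
```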
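The two output-side strategies mentioned above (splitting a block into small sub-blocks to reduce latency, and gathering many blocks for a large EQ window) can both be sketched in a few lines. The 64-sample and 12k-sample figures come from the text; the function names are invented.

```python
SUB = 64  # sub-block size from the text (64 samples?)

def sub_blocks(block, sub=SUB):
    """Split one block into sub-blocks so the device can be fed early."""
    for i in range(0, len(block), sub):
        yield block[i:i + sub]

def gather(blocks, need):
    """Accumulate whole blocks until at least `need` samples are
    buffered, e.g. for a 12k+ sample EQ window."""
    buf, it = [], iter(blocks)
    while len(buf) < need:
        buf.extend(next(it))
    return buf

parts = list(sub_blocks([0.0] * 512))                     # low latency
window = gather(([0.0] * 512 for _ in range(30)), 12288)  # big EQ window
```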
Events encoded in 31 bits initially sound like more than enough, but then we have to consider that events carry parameters, such as 'go back by 12 packets' or 'set amplification to 37%'. We have a classic instruction-encoding design decision. Again, the devil is in the detail.
- What events do we support?
- Fixed-length encoding could save us mutex work, since we can 'change the instruction with one atomic write'. However, this tempts us to write code that makes possibly invalid assumptions about the relative speed of threads. There needs to be a discipline to how we change the play queue: for example, a strategy where increases or decreases in length are 'optimistic'. We request the change, but do not assume that it was actioned before the 'play head' reached the point where it makes a difference; the play head could have proceeded on the old path. So we can do an atomic write, but our coding discipline means we then need to test whether we did it in time.
- The block/event list is only a sequencer. It provides a coarse level of granularity, spreading things out enough to keep the design relatively simple while still allowing full functionality.
- One key point is that events are stackable upon their index. Whether they are held in a linked list or another container would be left up to that level of implementation.
- There will be many opportunities to save on mutexes, but parallelism will most likely sit a level above this. I think of the block/event stream as mostly linear.
- As much as I would love this to be a catch-all for every type of processing, it is a balancing act. Some types of processing will not be conducive to this model. However, there is nothing stopping it from rendering to a target and allowing more complicated processing to happen there.
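One way to read the instruction-encoding point: pack an opcode and its parameter into a single fixed-width word, so an event slot can be rewritten with one store. The 8/24-bit split and the opcode values here are invented for illustration.

```python
# Fixed-length event encoding sketch: 8-bit opcode + 24-bit operand
# in one 32-bit word.  Opcode values are invented.
EV_GO_BACK = 0x01      # operand: number of packets
EV_SET_AMP = 0x02      # operand: amplification in 0.1% units

def pack(op, arg):
    assert 0 <= op < (1 << 8) and 0 <= arg < (1 << 24)
    return (op << 24) | arg

def unpack(word):
    return word >> 24, word & 0xFFFFFF

word = pack(EV_SET_AMP, 370)      # 'set amplification to 37.0%'
op, arg = unpack(word)
```

The trade-off is the classic one: a wider opcode field means more event kinds but smaller in-line parameters, pushing large parameters out into the stream as separate data.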
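The optimistic discipline can be sketched as 'write, then check'. This is a toy model, not the proposal's API: the lock stands in for a genuinely atomic 32-bit store, and a real audio thread would advance play_head concurrently.

```python
# Toy model of the 'optimistic' play-queue discipline: store the
# event, then check whether the play head had already passed the
# slot; if it had, the edit may not have taken effect and must be
# handled (retried later, or applied on the next loop pass).
import threading

class PlayQueue:
    def __init__(self, n):
        self.events = [0] * n          # one fixed-width word per slot
        self.play_head = 0             # advanced by the audio thread
        self._lock = threading.Lock()  # stands in for an atomic store

    def try_set(self, index, word):
        with self._lock:               # models a single atomic write
            self.events[index] = word
        # Optimistic check: did we beat the play head?
        return self.play_head <= index

q = PlayQueue(16)
q.play_head = 5
in_time = q.try_set(9, 0x02000172)     # ahead of the head: took effect
too_late = q.try_set(3, 0x02000172)    # behind the head: maybe missed
```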
Higher-Level API Needed
The proposed design needs to be complemented by a high-level design: an API that shows us what we can do with the sound, and that hides the implementation detail of whether we have blocks at all, and if so whether blocks are fixed-size or not, interleaved or not, and whether we are single- or multithreaded.
- The proposal, especially the event codes, is pushing towards an implicit language for representing the flow of audio and control data, with annotations for what can be done in parallel, what in serial, what can be precomputed, what should be done on demand, and where data should live (memory, disk, SSE registers, GPU). We should make that language explicit.
- When we have that language, we should look at tools that convert it to a chosen lower-level design, translating audio streams into queues of blocks and chosen function calls into event tokens of certain kinds. That may sound complicated, but it is a lot better than doing it all by hand. We could choose, for example, to have a one-to-one correspondence between event codes and function names.
- Why do it this way? The high level API is likely to be simpler and show more clearly what we are trying to achieve. The implementation detail is something we may wish to change and tweak.
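A sketch of what the high-level side might look like, with the one-to-one mapping between event codes and function names made explicit through a registry. Every name here is invented; the point is only that callers never see blocks.

```python
# Invented high-level API sketch: callers work on a Clip and never
# see blocks; a registry records the one-to-one mapping from event
# codes to function names suggested above.
EVENT_CODES = {}

def event(code):
    """Register a function as the handler for one event code."""
    def wrap(fn):
        EVENT_CODES[code] = fn.__name__
        return fn
    return wrap

class Clip:
    def __init__(self, samples):
        self._samples = list(samples)  # storage detail hidden here

    @event(0x10)
    def cut(self, start, end):
        del self._samples[start:end]

    @event(0x11)
    def amplify(self, factor):
        self._samples = [s * factor for s in self._samples]

c = Clip([1.0] * 100)
c.cut(10, 20)
c.amplify(0.5)
```

A translation tool could then lower each registered call to its event token, while the storage behind Clip is free to change between fixed blocks, bounded blocks, or something else entirely.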