Talk:Proposal Page Tabled Memory and Event Layout
Fixed-Size Memory Chunks
Having fixed-size chunks of memory and passing pointers to them can be better than copying data around, and may work better than repeated malloc/free. A single fixed size avoids memory fragmentation. However, overall I think the proposal is too vague: the devil is in the detail.
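For concreteness, here is a minimal sketch of what a fixed-size chunk allocator might look like, assuming a simple free-list pool. The names (`Block`, `BlockPool`, `kBlockSamples`) are mine, not the proposal's:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: a pool of fixed-size sample blocks threaded onto a
// free list, so acquire/release are O(1), happen without malloc at audio
// time, and can never fragment the heap.
constexpr std::size_t kBlockSamples = 512;

struct Block {
    float samples[kBlockSamples];
    Block* next = nullptr;   // free-list link while the block is unused
};

class BlockPool {
public:
    explicit BlockPool(std::size_t count) : storage_(count) {
        // Thread every block onto the free list up front.
        for (std::size_t i = 0; i + 1 < count; ++i)
            storage_[i].next = &storage_[i + 1];
        free_ = count ? &storage_[0] : nullptr;
    }
    Block* acquire() {            // O(1); returns nullptr when exhausted
        Block* b = free_;
        if (b) free_ = b->next;
        return b;
    }
    void release(Block* b) {      // O(1) return to the pool
        b->next = free_;
        free_ = b;
    }
private:
    std::vector<Block> storage_;
    Block* free_ = nullptr;
};
```

A real version would need thread-safety decisions on top, but the shape shows why a fixed size sidesteps fragmentation entirely.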
- In my SlippyPanel plugin module, I use blocks too. My block size is determined by the sound card (512 samples). Looping is achieved without a click by fading out the last packet and fading in the next one. This produces audible drops in sound level at the boundary, but it is still a lot better than a click. Better would be a crossfade from one to the other, i.e. playing two blocks at the same time, and better still a smart join where an instant excision repair is done to avoid a click. I currently can only loop at 10ms boundaries.
- This proposal implies looping at 10ms boundaries (only). Is that acceptable? Maybe not, if we have sounds with a sharp onset.
- The proposal looks only at sample data. What about envelope data? What about summary data (used in drawing the waveform)? Whether these live 'in the chunk' or in chunks of their own matters a lot to the implementation. Also, do we treat stereo left and right as two separate chunks, or do we interleave them?
- How much processing do we actually allow during output? For example, do we allow equalization on output? The more processing we allow during output, the lower the latency between a change to a parameter and its taking effect, but also the higher the risk of underruns.
- MP3 encoding also uses blocks, and FFTs will have overlapping blocks. Working at block boundaries in MP3 allows for lossless rearrangement, but we can't set those block sizes. Our block size might not match the block size of the sound device either. So we may still need code to merge and split blocks of different sizes. Do fixed block sizes actually gain us anything, then?
- An alternative to fixed block sizes is to set a lower and an upper bound on the size of any block. The lower bound guarantees that the overhead of switching blocks is a small proportion of the time spent on each block. The upper bound limits (a) how much copying must be done when making new block boundaries and (b) how much wasted 'unused space' we allow per block.
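The crossfade suggested in the SlippyPanel bullet above, i.e. playing two blocks at once under complementary gain ramps rather than a fade-out followed by a fade-in, can be sketched like this (the function name and the linear ramp are my illustrative choices, not anything from the proposal):

```cpp
#include <cstddef>

// Mix the tail of the outgoing block with the head of the incoming one
// under complementary linear ramps. With complementary gains, the summed
// level stays constant for correlated material, avoiding the audible dip
// at the loop boundary that a fade-out/fade-in pair produces.
void crossfade(const float* tail, const float* head,
               float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(n);  // ramps 0 -> 1
        out[i] = (1.0f - t) * tail[i] + t * head[i];
    }
}
```

An equal-power (sine/cosine) ramp would arguably be better for uncorrelated material; the linear ramp just keeps the sketch short.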
31-bit Events
Encoding events in 31 bits initially sounds like more than enough, but then we have to consider that events will carry parameters, such as 'go back by 12 packets' or 'set amplification to 37%'. We have a classic instruction-encoding design decision. Again, the devil is in the detail.
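To make the encoding question concrete, here is one hypothetical way to pack an opcode and a parameter into 31 bits. The field widths and opcode values are purely illustrative, not the proposal's actual layout:

```cpp
#include <cstdint>

// Hypothetical layout: a 7-bit opcode plus a 24-bit parameter, leaving
// bit 31 clear (it could serve, say, as a validity flag). 24 bits of
// parameter covers 'go back by 12 packets' or 'set amplification to 37%',
// but a real design must decide which events need wider operands.
enum : std::uint32_t { kOpSeekBack = 0x01, kOpSetAmp = 0x02 };

constexpr std::uint32_t encode(std::uint32_t op, std::uint32_t param) {
    return ((op & 0x7Fu) << 24) | (param & 0x00FFFFFFu);  // bit 31 stays 0
}
constexpr std::uint32_t opcode(std::uint32_t ev) { return (ev >> 24) & 0x7Fu; }
constexpr std::uint32_t param(std::uint32_t ev)  { return ev & 0x00FFFFFFu; }
```

The point of the sketch is the trade-off it exposes: every bit given to the opcode space is a bit taken from the largest representable parameter.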
- What events do we support?
- Fixed-length encoding could save us mutex work: we can 'change the instruction with one atomic write'. However, this tempts us to write code that makes possibly invalid assumptions about the relative speed of threads. There needs to be a discipline for how we change the play queue, for example a strategy where increases or decreases in length are 'optimistic': we request the change, but do not assume it was actioned before the 'play head' reached the point where it makes a difference. The play head could have proceeded on the old path. So we can do an atomic write, but our coding discipline means we must then test whether we did it in time.
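The optimistic discipline above can be sketched as follows. The `PlayQueue` shape, slot indexing, and `playHead` counter are assumptions of mine; the essential pattern is one atomic 32-bit write followed by a check of whether playback had already passed the slot:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative sketch: the editing thread rewrites an event slot with one
// atomic store, then checks whether the audio thread's play head was still
// behind that slot. If not, the old event may already have been consumed,
// and the caller must handle the change as 'too late' (re-queue, ignore...).
struct PlayQueue {
    std::atomic<std::uint32_t> events[64];
    std::atomic<std::size_t>   playHead{0};  // advanced by the audio thread

    // Returns true only if the write certainly landed before playback
    // reached the slot.
    bool tryRewrite(std::size_t slot, std::uint32_t newEvent) {
        events[slot].store(newEvent, std::memory_order_release);
        // Re-read the play head *after* the store: if it is still at or
        // behind the slot, the audio thread cannot yet have read the old
        // value past our write.
        return playHead.load(std::memory_order_acquire) <= slot;
    }
};
```

No mutex is taken, but the caller is forced to confront the race explicitly instead of silently assuming the edit won.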
Higher-Level API Needed
The proposed design needs to be complemented by a high-level design: an API that shows us what we can do with the sound, and that hides the implementation details of whether we have blocks at all, and if so whether blocks are fixed size or not, interleaved or not, and whether we are single- or multi-threaded.
- The proposal, especially the event codes, is pushing towards an implicit language for representing the flow of audio and control data, with annotations for what can be done in parallel, what must be serial, what can be precomputed, what should be done on demand, and where data should live (memory, disk, SSE registers, GPU). We should make that language explicit.
- When we have that language, we should look at tools that convert it to a chosen lower-level design, translating audio streams into queues of blocks and chosen function calls into event tokens of particular kinds. That may sound complicated, but it is a lot better than doing it all by hand. We could choose, for example, a one-to-one correspondence between event codes and function names.
- Why do it this way? The high level API is likely to be simpler and show more clearly what we are trying to achieve. The implementation detail is something we may wish to change and tweak.
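To illustrate what 'hiding the implementation detail' might look like, here is a hypothetical interface where callers speak only in sample positions and counts; nothing in it reveals whether the store uses fixed blocks, bounded blocks, or interleaved channels. All the names are mine, not the proposal's:

```cpp
#include <cstddef>
#include <vector>

// A block-agnostic view of a run of audio. A blocked, interleaved, or
// multithreaded implementation would satisfy the same interface without
// any caller noticing.
class AudioSequence {
public:
    virtual ~AudioSequence() = default;
    virtual std::size_t length() const = 0;  // in samples
    virtual void read(float* dst, std::size_t pos, std::size_t n) const = 0;
    virtual void write(const float* src, std::size_t pos, std::size_t n) = 0;
};

// Trivial backing store, just to show the interface is implementable;
// swapping in a BlockPool-backed version would not change any caller.
class VectorSequence : public AudioSequence {
public:
    explicit VectorSequence(std::size_t n) : data_(n, 0.0f) {}
    std::size_t length() const override { return data_.size(); }
    void read(float* dst, std::size_t pos, std::size_t n) const override {
        for (std::size_t i = 0; i < n; ++i) dst[i] = data_[pos + i];
    }
    void write(const float* src, std::size_t pos, std::size_t n) override {
        for (std::size_t i = 0; i < n; ++i) data_[pos + i] = src[i];
    }
private:
    std::vector<float> data_;
};
```

With such an interface in place, the block layout, interleaving, and threading decisions all become tweakable implementation details, which is exactly the separation argued for above.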