Importing Timestamp Information

From Audacity Wiki
Audacity cannot yet read the timestamp information which can be stored by professional recording hardware using the Broadcast Wave (BWF) format. Timestamps are useful in identifying different parts of the recording.

This page proposes a workaround using a Python program to write the timestamp information to a label file which can then be imported into Audacity.



Python analysis program

The following Python program analyzes a WAV file in the Broadcast Wave (BWF) format, as produced, for example, by the M-Audio Microtrack recorder. The program writes the timestamp data to the file labels.txt, which can be imported into Audacity as a label track using the Import Labels menu item (under the File menu in current Audacity and legacy 1.3, or the Project menu in legacy Audacity 1.2).
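
For orientation, each line that the program writes to labels.txt holds a start time and an end time in seconds followed by the label text, separated by tabs. The sketch below reproduces that line format; the function name format_label and the marker positions are made up for illustration.

```python
def format_label(start, end, name):
    """Return one label line: start and end in seconds, then the text, tab-separated."""
    return "%8.6f\t%8.6f\t%s\n" % (start, end, name)

# hypothetical marker positions in seconds, for illustration only
markers = [12.5, 47.25]

lines = []
previous = 0.0
for number, position in enumerate(markers, start=1):
    # each marker closes the region that started at the previous marker
    lines.append(format_label(previous, position, "Section %d" % number))
    previous = position

print("".join(lines), end="")
```

A region whose start and end times are equal shows up in Audacity as a point label rather than a region.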

Amendment made to script 11 May 2013

Added an error message if no cues are found, and changed syntax to (hopefully) run with python3.

Amendment made to script 29 February 2016

The code line

cuechunk=="data" and cuename==i and cuepos==cueoffset):

was updated to

cuechunk=="data" and cuename==i): 

because Rmunn observed that the "cuepos==cueoffset" check was too strict for the Zoom H4n which creates markers with cuepos always equal to 0, and cueoffset equal to the actual position. The cuepos value is never used by that Python code, so it really doesn't matter what it is.
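
To make the fields in question concrete: the script reads each cue point as six 4-byte fields, 24 bytes in total. The following sketch unpacks one such record with the struct module, using the field names from the script. The sample record and the sample rate are invented, in the style described above for the Zoom H4n (cuepos equal to 0, cueoffset holding the position).

```python
import struct

CUE_POINT = "<II4sIII"  # five little-endian 32-bit numbers and one 4-byte chunk ID

def parse_cue_point(record):
    """Unpack one 24-byte cue point into the six fields the script reads."""
    (cuename, cuepos, cuechunk,
     cuechunkstart, cueblockstart, cueoffset) = struct.unpack(CUE_POINT, record)
    return {"cuename": cuename, "cuepos": cuepos, "cuechunk": cuechunk,
            "cuechunkstart": cuechunkstart, "cueblockstart": cueblockstart,
            "cueoffset": cueoffset}

# invented record: marker 1, cuepos 0, position 441000 samples into the data chunk
record = struct.pack(CUE_POINT, 1, 0, b"data", 0, 0, 441000)
cue = parse_cue_point(record)

framerate = 44100  # assumed sample rate
print("cuepos =", cue["cuepos"], " cueoffset =", cue["cueoffset"],
      "samples =", cue["cueoffset"] / framerate, "seconds")
```

Only cueoffset enters the time calculation, which is why dropping the comparison with cuepos does not change the labels that are written.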

Script

#!/usr/bin/python
# Intended to run with both Python 2.7 and Python 3.
from __future__ import print_function

import struct
import wave
from sys import exit,argv

def formatline(position1,position2,name): # for the labels.txt file
  num1 = "%8.6f" % position1
  num2 = "%8.6f" % position2
  return num1+ "\t"+ num2 +"\t" +name+"\n"

def error(s):
  print (s)
  exit(1)

def readnumber(f):
  # read a 4-byte little-endian unsigned integer
  c = f.read(4)
  if len(c)<4:
    error("Sorry, no cue information found.")
  return struct.unpack("<I",c)[0]

def findcues(filename):
  # first pass: let the wave module report the audio parameters
  f = wave.open(filename,"rb")
  framerate = f.getframerate()
  channels = f.getnchannels()
  bytespersample = f.getsampwidth()
  totalframes = f.getnframes()
  totalduration = float(totalframes)/framerate
  byterate = framerate * channels * bytespersample
  print (str(framerate) + " samples = "+str(byterate)+ " bytes per second")
  f.close()

  # second pass: walk the chunks by hand until the cue chunk is found
  f = open(filename,"rb") # binary mode, so that chunk IDs are read as bytes
  if f.read(4) != b"RIFF":
    error("Unknown file format (not RIFF)")
  f.read(4) # skip the RIFF length field
  if f.read(4) != b"WAVE":
    error("Unknown file format (not WAVE)")
  name = f.read(4)
  while name != b"cue ":
    leng= readnumber(f)
    f.seek(leng+leng%2,1) # relative skip; odd-length chunks carry one pad byte
    name = f.read(4)

  leng= readnumber(f)
  num = readnumber(f)
  if leng != 4+24*num:
    error("Inconsistent length of cue chunk")
  print (str(num) + " MARKER(S) found *********")
  if num>0:
    oldmarker = 0.0
    out=open("labels.txt","w")
    for i in range(1,num+1):
      cuename = readnumber(f)
      cuepos = readnumber(f)
      cuechunk = f.read(4)
      cuechunkstart = readnumber(f)
      cueblockstart = readnumber(f)
      cueoffset = readnumber(f)
      if not (cuechunkstart==0 and cueblockstart==0 and
          cuechunk==b"data" and cuename==i):
        print (cuename, cuepos, cuechunk,
               cuechunkstart, cueblockstart, cueoffset)
        error("unexpected marker data")
      else:
        position = float(cueoffset)/framerate
        print ("Marker",i,
               "   offset =",cueoffset,"samples =",position,"seconds")
        if position>oldmarker:
          # prefer to mark them as regions (intervals)
          out.write(formatline(oldmarker,position, "Section "+str(i)))
        else:
          out.write(formatline(position,position, "Marker "+str(i)))
        oldmarker = position
    if totalduration>oldmarker:
      out.write(formatline(oldmarker,totalduration, "Section "+str(num+1)))
    out.close()
    print ("Marker data was written to labels.txt")
  else:
    print ("No marker data file was written.")
  f.close()

if __name__ == "__main__":
  if len(argv)<=1:
    print ("Usage: python "+ argv[0] + " WAV-file")
    exit()
  filename = argv[1]
  print ("extract position markers from file " +filename)
  findcues(filename)

How to run

You need Python to run this. I wrote this file for Python version 2.7.6. I have not tested it with the new Python version 3, but I think it should run. Store this text in a file called findcues.py. Run it with

python findcues.py your-WAVE-filename.wav

from the shell command line, or with

import findcues
findcues.findcues("your-WAVE-filename.wav")
in a Python window (assuming that findcues.py is in a directory where Python can find it). If the program finds timestamp information, it generates a file labels.txt. So far, I have tested it with files from the M-Audio Microtrack-II recorder. If you have positive or negative experience with files produced by other devices, please mention it here.
  • Also works with Sony PCM-M10 when using the "T-MARK" button to set track marks. BTW: the Python program crashes when I try it on a wav file without cues.
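
To try the script without a recorder at hand, a small test file can be generated. The sketch below writes a few seconds of silence with the standard wave module and appends a cue chunk laid out with the fields that findcues.py reads. The file name test-cues.wav, the sample rate, and the marker positions are arbitrary choices, and the result is a plain WAV file with a cue chunk, not a complete BWF file.

```python
import struct
import wave

def write_test_wav(filename, framerate=8000, seconds=3, marker_seconds=(1, 2)):
    # write 16-bit mono silence with the standard wave module
    w = wave.open(filename, "wb")
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(framerate)
    w.writeframes(b"\x00\x00" * (framerate * seconds))
    w.close()

    # cue chunk body: number of cue points, then one 24-byte record per marker
    body = struct.pack("<I", len(marker_seconds))
    for number, t in enumerate(marker_seconds, start=1):
        offset = t * framerate  # marker position in samples
        body += struct.pack("<II4sIII", number, offset, b"data", 0, 0, offset)
    chunk = b"cue " + struct.pack("<I", len(body)) + body

    # append the chunk and correct the RIFF length field at byte 4
    with open(filename, "r+b") as f:
        f.seek(0, 2)
        f.write(chunk)
        total = f.tell()
        f.seek(4)
        f.write(struct.pack("<I", total - 8))

write_test_wav("test-cues.wav")
```

Running python findcues.py test-cues.wav on the result should then report two markers and write labels.txt.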

Acknowledgements

I have benefited from the info at the following pages

The following program lists all chunks of a WAV file and helped me to analyze and understand the structure of the files that I had:

# Intended to run with both Python 2.7 and Python 3.
from __future__ import print_function

import struct
from sys import argv,exit

def readnumber(f):
  # read a 4-byte little-endian unsigned integer
  c = f.read(4)
  return struct.unpack("<I",c)[0]

def readchunk(f,level=0,searchcues=False):
  pos = f.tell()
  name= f.read(4)
  leng= readnumber(f)
  totleng = leng+8
  print ("   "*level,name,"len-8 =%8d"%leng,"   start of chunk =",pos,"bytes")

  # the list chunk ID is normally upper case; the lower-case form is kept as well
  if name in (b"RIFF",b"LIST",b"list"):
    print ("   "*level,f.read(4),"recursive sublist")
    sublen = leng-4
    while sublen>0:
      sublen -= readchunk(f,level+1,searchcues)
    if sublen !=0:
      print ("ERROR:",sublen)
  elif searchcues and name==b"cue ":
    sublen=leng-4
    num = readnumber(f)
    print (num,"MARKER(S) *********")
    for i in range(num):
      sublen -= 24
      cuename = readnumber(f)
      cuepos = readnumber(f)
      cuechunk = f.read(4)
      cuechunkstart = readnumber(f)
      cueblockstart = readnumber(f)
      cueoffset = readnumber(f)
      if not (cuechunkstart==0 and cueblockstart==0 and
          cuechunk==b"data" and cuename==i+1 and cuepos==cueoffset):
        print ("unexpected marker data", cuename, cuepos, cuechunk,
               cuechunkstart, cueblockstart, cueoffset)
      else:
        print ("Marker#",i+1,"   offset =",cueoffset,"samples ***")
    if sublen !=0:
      print ("ERROR:",sublen)
  elif searchcues and name==b"labl" and level==2:
    sublen=leng-4
    labelname = readnumber(f)
    labeltext = f.read(sublen).rstrip(b"\x00").decode("latin-1")
    print ("Label #",labelname,"  name = >>"+labeltext+"<<")
  else:
    f.seek(leng,1) # relative skip
  if leng%2:
    f.seek(1,1) # odd-length chunks are followed by one pad byte
    totleng += 1
  return totleng

def allchunks(f):
  readchunk(f)
  c = f.read()
  if c != b"":
    print ("error", len(c), c[:20])

def cues(f):
  readchunk(f,searchcues=True)

if __name__ == "__main__":
  if len(argv)<=1:
    print ("Usage: python", argv[0],"WAV-file")
    exit()
  filename = argv[1]
  print ("analyze chunk structure from WAV-file",filename)
  f = open(filename,"rb") # binary mode, so that chunk IDs are read as bytes
  cues(f)
  f.close()

Future features

The script might want to look at the "list/adtl" chunks in order to find the true names of the breakpoints. Hopefully, this feature will be found in Audacity itself soon. Also, my files have a "regn" chunk that seems to contain some "regions", called "Section 1" etc. --Guenterrote 22:49, 9 March 2010 (CST)
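
As a rough sketch of that idea, the names could be collected as follows. It assumes the layout that the chunk-listing program above handles: a list chunk of type adtl containing labl sub-chunks, each holding a 4-byte cue number followed by NUL-terminated text. The function names, the Latin-1 text encoding, and the sample bytes are assumptions made for illustration, and "regn" chunks are not handled.

```python
import struct

def read_labels(adtl_body):
    """Map cue numbers to label text, given the bytes that follow the 'adtl' type ID."""
    labels = {}
    pos = 0
    while pos + 8 <= len(adtl_body):
        name, leng = struct.unpack("<4sI", adtl_body[pos:pos + 8])
        data = adtl_body[pos + 8:pos + 8 + leng]
        if name == b"labl" and leng >= 4:
            cue_number = struct.unpack("<I", data[:4])[0]
            # assumed encoding: Latin-1, with trailing NUL bytes removed
            labels[cue_number] = data[4:].rstrip(b"\x00").decode("latin-1")
        pos += 8 + leng + leng % 2  # sub-chunks are padded to an even length
    return labels

def make_labl(cue_number, text):
    """Build one labl sub-chunk for the demonstration below."""
    data = struct.pack("<I", cue_number) + text + b"\x00"
    chunk = b"labl" + struct.pack("<I", len(data)) + data
    return chunk + b"\x00" * (len(data) % 2)

# made-up labels, for illustration only
sample = make_labl(1, b"Intro") + make_labl(2, b"Verse")
print(read_labels(sample))
```

The resulting names could replace the generic "Section" and "Marker" texts when the label lines are written.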
