I have been playing around with beat detection, because two games that I have been working on would really benefit from having strong beat detection. I think I’m on to something with this one (maybe), so I’ll post my methods and results here. I don’t have any schoolin’ in fourier analysis, so things I’m doing may seem naive to the experienced fouriereer. If you’re one of these, please please please point me in the right direction.

I came up with this algorithm on the bus to school this morning. When I got home I couldn’t wait to try it on some real data with Mathematica. I describe my findings here:

I started with this 5-second sound sample (sorry about the WAViness… that’s the format I loaded it in as). It’s an excerpt from Rachmaninoff’s Symphonic Dances, which I chose because it has a wide frequency spectrum but is still very rhythmic. Mathematica plots the samples like so:

In[1]:= wav = Import["symphonic_dances_fourier_field.wav"];
In[2]:= samples = wav[[1,1]];
In[3]:= ListPlot[samples]

Listen to the sample and notice how you can see the beats in the waveform. These are raw intensities, exactly what a naive detector would use to pick the beats out. This example alone isn’t enough to tell whether my beat detector is any better than a naive one.

Next I compute the “fourier field” of a particular window size, in this case 8192 samples (about 0.18 seconds; the window argument in the code below is the half-width, so 4096 gives 8192-sample windows). This takes successive windows at a given step (in this case 256 samples), computes their fourier transforms, and sticks them in a two-dimensional array indexed by [time; frequency]. Here’s the code:

In[4]:= fourierfieldtime[data_, window_, time_] := Fourier[data[[Range[time-window, time+window-1]]]]
In[5]:= fourierfield[data_, window_, step_] := Table[fourierfieldtime[data, window, i], {i, window+1, Length[data]-window, step}]
In[6]:= field = fourierfield[samples, 4096, 256];
In[7]:= ListDensityPlot[Abs[field], Mesh -> False]
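If you don’t have Mathematica handy, here’s a rough Python sketch of the fourier-field step (my own translation, not from my notebook), using a naive O(n²) DFT on a tiny synthetic signal instead of an FFT on the real samples:

```python
import cmath
import math

def dft(xs):
    """Naive O(n^2) discrete Fourier transform; fine for a sketch."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]

def fourier_field(data, window, step):
    """Magnitude spectrum of each 2*window-sample slice, taken every
    `step` samples. Rows are time, columns are frequency, like the
    Mathematica fourierfield (modulo Mathematica's 1/sqrt(n) scaling)."""
    return [[abs(c) for c in dft(data[i - window:i + window])]
            for i in range(window, len(data) - window + 1, step)]

# Toy signal: a tone with a 4-sample period whose amplitude switches
# between loud and quiet every 16 samples (a crude "beat").
data = [math.sin(math.pi * t / 2) * (1.0 if (t // 16) % 2 == 0 else 0.1)
        for t in range(128)]
field = fourier_field(data, window=8, step=4)
```

The loud windows light up at frequency bin 4 (the tone’s period) and the quiet windows barely do, and that on/off pattern over time is exactly what the next step goes looking for.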

The next thing I do is take each frequency band (which gives that frequency’s “presence” over the time of the song) and do *another* fourier transform on it, and put it in a table. This should tell me how each frequency is pulsating in time, on the theory that most frequencies repeat themselves with the beat. Note that I take the absolute value of the previous transform, disregarding the phase. Otherwise I would just make a round trip and get back a strangely averaged, phase-rotated version of what I started with.

In[8]:= timefield[data_] := Table[Fourier[Abs[data[[All, i]]]], {i, Length[data[[1]]]}]
In[9]:= times = timefield[field];
In[10]:= ListDensityPlot[Abs[times], Mesh -> False]
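The same second-transform idea as a Python sketch (again mine, not the notebook): transpose the field so each row is one band’s history, then transform that band’s magnitude envelope.

```python
import cmath
import math

def dft(xs):
    """Naive DFT, enough for a sketch."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]

def time_field(field):
    """Transform each frequency band's magnitude envelope over time,
    mirroring the Mathematica timefield (note the Abs before the
    transform, which throws away the phase)."""
    bands = zip(*field)  # transpose: each row becomes one band's history
    return [dft([abs(x) for x in band]) for band in bands]

# One band whose energy switches on and off four times over eight
# time steps: its transform should peak at bin 4.
field = [[1.0], [0.0]] * 4
times = time_field(field)
```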

Finally I just add all of these together vertically, giving the overall strength of each pulsation frequency summed across all the frequency bands. We’re only interested in the very low pulsation frequencies (things that happen between 1 and 40 times during the sample), so I only plot those.

In[11]:= compose[data_] := Plus @@ data
In[12]:= beats = compose[times];
In[13]:= ListPlot[Abs[beats[[Range[1, 40]]]], PlotJoined -> True, PlotRange -> {0, 320}]
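In Python the composition step is just an elementwise sum over the bands (a sketch with made-up numbers, where both bands pulse strongest at “beat bin” 2):

```python
def compose(times):
    """Elementwise sum of the transformed bands (the Plus @@ data above):
    total pulsation strength at each beat frequency, across all bands."""
    return [sum(col) for col in zip(*times)]

# Two toy bands, each a row of per-bin magnitudes:
times = [[0.0, 1.0, 5.0, 0.5],
         [0.0, 0.5, 4.0, 1.0]]
beats = compose(times)  # the summed "beat spectrum"
```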

This is our final analysis, so let’s look at it carefully. What is it saying? We want to look at the dominant frequencies, the ones with the strongest correlation. Frequency one is of no interest; it is always high in phaseless fourier transforms. The largest peak is at six, followed by four. What happens six times during this sample? The half notes, which are the strongest orchestra hits. I can’t figure out four yet. The next one is twelve, which corresponds to the quarter notes: that’s the one we’re really interested in. I can’t quite tell how we’re going to communicate that to a consuming program. We might just report the five highest peaks or something, and let the program determine what kind of tempo range it is looking for.
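To sanity-check those peaks as tempos: bin k means roughly k pulses over the analyzed span. Assuming the field covers roughly the whole 5-second excerpt (the windowing trims a little off each end, so this is approximate), a tiny helper of my own invention gives:

```python
def bin_to_bpm(k, span_seconds):
    """Bin k of the beat spectrum means k pulses over the span;
    convert that pulse rate to beats per minute."""
    return k / span_seconds * 60.0

half_notes = bin_to_bpm(6, 5.0)      # bin 6 over ~5 s -> 72 BPM
quarter_notes = bin_to_bpm(12, 5.0)  # bin 12 over ~5 s -> 144 BPM
```

72 and 144 BPM for the half and quarter notes, which at least lands in a plausible range for this excerpt.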

The final step in the process is to do an inverse fourier transform on this data, to get back something that resembles our beats over time (a very dumbed-down version of the song, as it were). This is pretty cool:

In[14]:= orig = InverseFourier[beats];
In[15]:= ListPlot[orig, PlotJoined -> True]
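For completeness, here’s the inverse transform in the same naive Python style (Mathematica’s Fourier/InverseFourier pair uses a symmetric 1/√n normalization instead of the 1/n here, but the round trip works out the same either way):

```python
import cmath
import math

def dft(xs):
    """Naive forward DFT, matching the earlier sketches."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]

def idft(Xs):
    """Inverse DFT: positive exponent, divided by n, so that
    idft(dft(xs)) recovers xs up to floating-point error."""
    n = len(Xs)
    return [sum(X * cmath.exp(2j * math.pi * k * t / n) for k, X in enumerate(Xs)) / n
            for t in range(n)]

# Round trip on a toy "beat envelope": transform it and recover it.
envelope = [0.0, 1.0, 0.2, 1.0, 0.0, 1.0, 0.2, 1.0]
recovered = idft(dft(envelope))
```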

You can see each of the strong beats that you associate with the song. I can’t tell whether the beats occur at the tips of those or at the heavy turnarounds right before the tips. I’m inclined to say the latter. But this information that we’ve reconstructed could easily have been created by a dumb volume analysis. Next week (when I finally have some time again, damn homework—I’m procrastinating a huge project as I speak), I’ll try it on a harder song, a baroque piece or some jazz. My ultimate goal in this project is to get it to beat-detect jazz, which has a very implicit pulse rather than a strong beat.