
130 frames per second, derived from a wire

I built a Go daemon that drives a music visualizer on a Pi, then found out the frame rate wasn't mine to pick. The LED protocol had already decided it. The real work was a latency budget I had to measure before the lights felt locked to the music.

How fast does a music visualizer need to refresh? I thought that was up to me.

I assumed I'd pick a number. Sixty fps, maybe, because that's what everything else uses, tune the render loop until it hit that, done. What actually happened is I never got to pick. The number was already decided, in silicon, by the protocol on the wire that drives the LEDs. My job wasn't to choose the frame rate. My job was to compute it from first principles and build everything else to fit.

The project is gometer: a Go daemon on a Pi that reads live audio out of squeezelite, runs an FFT, and paints a 32x16 WS2812 panel as a spectrum analyzer with a phosphor trail. The FFT isn't the interesting part. "Looks fine" and "feels locked to the music" are separated by about twenty milliseconds, and you don't get from one to the other by guessing.

The frame rate is on the wire

WS2812 is a one-wire protocol clocked at 800 kHz. Every pixel is 24 bits, so at 800 kHz that's 30 µs per pixel on the wire. After the last pixel you hold the line low for a reset latch, about 50 µs, so the strip knows the frame is done and commits it. That's the entire timing model, and it's not negotiable: it's the protocol.

Each of the panel's two data lines carries 256 pixels, half of the 32x16 grid. So a full frame costs:

The frame-time identity
frame = pixels * per_pixel + latch
      = 256 * 30 us + 50 us
      = 7730 us
      = 7.73 ms  ->  ~129 fps

7.73 ms. About 129 fps. That's the ceiling, and it's a hard one: you can't clock the bits out faster than the strip reads them.

One optimization was already in place: the two 256-pixel halves of the panel clock in parallel on PWM0 and PWM1. You'd expect 512 pixels to take twice as long as 256, but the two channels shift simultaneously, so a full update is still 7.73 ms, not roughly 15.4. The parallelism doesn't buy speed past the ceiling. It buys me the ceiling at full panel size instead of half.

Now the part that took a moment to trust. The render loop has a ticker with a 500 µs period, nominally 2 kHz. It doesn't run at 2 kHz. It runs at about 129 Hz, and that's correct.

render.go
ticker := time.NewTicker(500 * time.Microsecond) // nominal 2 kHz
defer ticker.Stop()
 
for range ticker.C {
    frame := vis.Render(spectrum())
    // pa.Write blocks until the DMA has clocked every bit to the strip.
    // The loop does not free-run at the ticker rate. It settles at the
    // wire rate, because this call will not return until 7.73 ms have passed.
    pa.Write(frame)
}

pa.Write blocks until the DMA finishes shifting the frame out. The ticker is just there to make sure I'm ready to start the next frame the instant the previous one commits. The loop self-clocks to the hardware. I didn't set 129 fps anywhere. The wire set it, and the loop found it.

The latency budget is the real product

Frame rate is throughput. It tells you how often a new picture appears. It doesn't tell you whether that picture is showing you the audio you're hearing right now or the audio from a quarter second ago. Those are different problems, and the second one decides whether the thing feels alive.

There's a threshold, roughly 40 ms, past which a sound and a corresponding flash of light stop registering as the same event. Inside it, your brain fuses them. Outside it, the lights look like they're reacting to the music, which is a different and worse experience than looking like they are the music. So the whole design has one constraint over it: total audio-to-LED latency under 40 ms, ideally well under.

I measured the path and added it up.

FFT window: 14 ms (resolves to ~70 Hz, stays transient-reactive)
Frame time: 7.73 ms (the wire ceiling, ~129 fps)
Total a/v latency: ~22 ms (under the ~40 ms fusion threshold)

The FFT window is the tension in the whole budget. A longer window resolves lower frequencies: 14 ms of audio gets me down to about 70 Hz, low enough to see a kick drum and a bass line as distinct things. But a longer window also means more audio has to accumulate before I can transform it. That's latency. Make it longer for better bass resolution and you blur transients and push past the fusion threshold. Make it shorter for snappier transients and the low end turns to mush. Fourteen milliseconds is where those two pressures balance for this panel.

Add it up: FFT window, transform and mapping, frame time, buffering in between, and the audio reaches the LEDs about 22 ms after it reaches the speakers. Comfortably inside 40. A beat lands on the panel at the same instant it lands in your ears. That alignment isn't luck. It's a number I kept under a ceiling on purpose.

Throughput is not latency

A high frame rate with a deep buffer in front of it gives you smooth motion that arrives late. The panel looks great and feels dead. Frame rate decides how the motion looks. Latency budget decides whether it's the music or a recording of the music a moment ago.

The panel cannot do red, and other physics

With timing settled, color was next. WS2812 panels are bad at red. The red die sits at about 620 nm, and a saturated, fully-on red reads as a tiny dim blip next to the green and blue, which blast out far brighter. Drive pure red at full value and it looks like the panel is barely on.

The first instinct is to crank red and pull the others down to match. That works for exactly one panel. Real WS2812 batches vary enough that a balance tuned on one strip looks wrong on the next, so the correction had to be a per-deployment knob, not a constant baked into the binary. The default gains lean on the channels that are already too strong:

color.go
// Per-deployment, because WS2812 batches vary enough that a constant lies.
// Pull green and blue down toward the weak red die instead of overdriving red.
var ColorBalance = [3]float64{1.0, 0.85, 0.85} // R, G, B gains
 
func balance(r, g, b uint8) (uint8, uint8, uint8) {
    return scale(r, ColorBalance[0]),
        scale(g, ColorBalance[1]),
        scale(b, ColorBalance[2])
}

The second color problem was transitions. A spectrum analyzer wants to sweep cool to warm as energy rises, blue up through cyan and orange into red. The obvious way is to interpolate RGB directly, and the obvious way is wrong: the midpoint of blue and red is a muddy purple that, on a low-contrast LED panel, reads as a smear with no edges. So I don't blend through the midpoint. I route the transition through complementary pairs that hold their contrast: red against cyan, orange against blue. Those survive a low contrast ratio because the eye reads them as opposed rather than as a gradient toward gray. Motion stays legible instead of dissolving into purple mush.

The last piece is why 129 fps is worth having at all, given that film gets away with 24. Flicker fusion, the rate past which a flashing light looks steady, is around 60 Hz for most people. At about twice that, the panel isn't just past flicker. It's inside persistence of vision, where successive frames blend into continuous motion in the eye itself. A bar climbing the panel doesn't step from pixel to pixel. It reads as smooth sub-pixel motion, because by the time your retina has let go of one frame the next two have already arrived.

The phosphor trail decay is two-stage to match how a real phosphor behaves. A bright half-life of about one frame, so a freshly-hit bar snaps to full brightness and drops fast, and a dim half-life of about 50 ms underneath it, so the tail lingers and fades the way an old CRT does. One stage alone looks wrong: a single fast decay has no trail, and a single slow decay smears everything into a glowing blur. Two stages give you a sharp leading edge with a soft tail behind it.

The bug that needed a full restart

Now the correctness story. The timing math is the spine, but this is the part I didn't see coming.

gometer reads audio out of squeezelite through a shared-memory segment that squeezelite exports for its VU meter. To find it, the daemon scans /dev/shm for segments named squeezelite* and attaches to one. The original code took the first match:

shm.go
func findSqueezeliteShm() (string, error) {
    entries, _ := os.ReadDir("/dev/shm")
    for _, e := range entries {
        // os.ReadDir returns lexical order. The first squeezelite* segment
        // is not necessarily the live one. This is the bug.
        if strings.HasPrefix(e.Name(), "squeezelite") {
            return filepath.Join("/dev/shm", e.Name()), nil
        }
    }
    return "", errSegNotFound
}

os.ReadDir returns entries in lexical order. So "the first match" is "the alphabetically first match," which has nothing to do with which segment belongs to the running player. squeezelite doesn't always clean up its segment when it dies, so a stale segment from a dead PID can outlive the process and sort ahead of the live one. The visualizer would lock onto a corpse and sit there reacting to silence, or worse, to whatever garbage was last left in that buffer.

The fix is to stop trusting the order and start reading the segments. Each one carries a running byte and an updated timestamp. Peek at every candidate's header, check whether it's actually live, and among the live ones pick the most recently updated:

shm.go
func findSqueezeliteShm() (string, error) {
    entries, err := os.ReadDir("/dev/shm")
    if err != nil {
        return "", err
    }
    var best string
    var bestUpdated uint64
 
    for _, e := range entries {
        if !strings.HasPrefix(e.Name(), "squeezelite") {
            continue
        }
        hdr := peekHeader(filepath.Join("/dev/shm", e.Name()))
        // running == 0 is a corpse. Among the live ones, newest updated wins.
        if hdr.running != 0 && hdr.updated > bestUpdated {
            bestUpdated = hdr.updated
            best = filepath.Join("/dev/shm", e.Name())
        }
    }
    if best == "" {
        return "", errSegNotFound
    }
    return best, nil
}

That fixed lock-on at startup. It didn't fix the other half of the same bug: a player that started fine and then went dark. The daemon would lose audio mid-session and need a full restart to recover. The segment it had attached to was still there, still mapped, just no longer being written, because the player had moved to a new one. Discovery only ran once at startup, so nothing ever went looking again.

The answer was a stagnation timer. If no frames have arrived for five seconds, the audio has stopped flowing through whatever segment we hold, so we drop it and re-run discovery. That turned "loses audio and needs a restart" into "loses audio for five seconds and silently re-locks onto the live player." Same bug as the startup lock-on, in a slower disguise: do not trust a handle to stay valid just because it was valid when you got it.

One aside, because I earned it

The visualizer has a small web control UI: brightness, palette, which visualization. I built the first version with shadcn and React, looked at it, and didn't like it. A single-page app with a JSON API behind it, talking to a daemon whose whole job is to push pixels to a wire as fast as the wire allows. The frontend had more moving parts than the thing it controlled.

I tore it out and rebuilt it with htmx. Server renders HTML, buttons post to it, page swaps in fragments. There was no JSON API to design because there was no client that needed one. A visualizer doesn't need a state-management library. It needs three knobs and a daemon that never blinks.

What this actually was

I went in thinking the FFT or the color would be where I got stuck. The FFT is a library call. The color was physics I could measure and correct. What actually bit me was a number I never got to choose and a budget I had to keep under a threshold I couldn't see.

The frame rate was on the wire the whole time: 256 pixels per channel, 30 µs each, 50 to latch, 7.73 ms, 129 fps. I didn't set it. I computed it, built the loop to fall into it, and spent the rest of the budget making sure the light arrived while the sound was still in the air.

The code is on GitHub.