A music producer never types a song into a paragraph. They lay it on a timeline — tracks, takes, things they can move, mute, and version. So why is a scene of dialogue still a flat block of script text? Speech Score treats a performance the way a DAW treats music — and it runs today.
Where it is: real, running, and playable right here — the whole instrument, all four scores and ~1.4 MB of real neural voices, rides inside one page: no install, no signup, works offline. There's a live mode where a human actor performs one lane while the AI holds the others on the beat. What it can't do yet: perform in your voice — that reference-read layer is next.
The instrument is tracker-brained and Ableton-bodied. Voices are lanes, lines are clips, the page is a timeline. A playhead descends and each line sounds as it's struck — in real neural voices, polyphonic and latency-tight, so three witches can land on the same row and phase into one another. In the editor you drag a clip to retime it, drag it across lanes to recast it, set how many beats it spans, and warp it to lock a word to the grid — the same knob that snaps it to the beat also stretches the voice toward madness. It's explicitly not text-to-speech: the point isn't to make a machine talk, it's to make a performance you can compose, rehearse, and stage.
| t | Witch I | Witch II | Witch III | Chorus |
|---|---|---|---|---|
| 0.0 | "When shall we three—" | |||
| 0.5 | ↳ meet again? | "In thunder—" | ||
| 1.0 | lightning, | "or in rain?" | ||
| 1.5 | ✻ | ✻ | ✻ | "Fair is foul" |
| 2.0 | ↳ "and foul is fair" |
Every score is portable JSON on one engine — add a play in a single place. The point of four is the point of the instrument: it isn't one effect, it's a surface that plays any scene you bring it.
"If actors ever needed confirmation of their worth, it's this application." … then, watching the voices phase together: "the discordant music reveals itself!" — "that's pretty good."
— Chris, an actor, going from the jab to recognition on the first prototype. The flagship live human-plus-AI mode came out of that thread.
The same instinct as the Script Doctor, pointed at the ear instead of the page: take a judgment a director makes by feel — that pause is too long, those two lines should collide — and give it a track you can actually move.
Open it, hit play, and the scene performs in real neural voices. It opens on Philip Glass Buys a Loaf of Bread; switch to Macbeth's witches or the Wilde duet in the library. Runs in your browser — no install, no signup, works offline.
▶ Open the instrumentAct or direct? Put your name on a lane →