essay · on the signal · 5 min
your voice is the best dating profile you have.
Every dating app you have ever used optimised for the wrong sense. They asked for your face, your bio, sometimes your prompts. They never asked you to talk.
Voice is the highest-signal input a stranger can give you about who they are. It is harder to game than a photo, and harder still than a written paragraph. Listen to someone describe what they care about for thirty seconds, and you already know more about them than you would from a hundred swipes.
Soulmate's onboarding includes one thirty-second voice clip. Just one. Here is what we do with it.
what happens between record and match
You answer a short prompt out loud — 'what's something you turn to when you need to feel real?' We record on the device, upload to private storage, and pass the audio to Whisper for transcription. The text comes back in under five seconds.
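The shape of that step can be sketched as a small function. The essay names Whisper for transcription; everything else here, including the function name and the injected `transcriber` callable, is an assumption made so the sketch runs without a network:

```python
def transcribe_clip(audio_bytes: bytes, transcriber) -> str:
    """Take the recorded clip and return only its text.

    `transcriber` stands in for a speech-to-text call such as
    Whisper; it is injected so this sketch runs offline.
    """
    if not audio_bytes:
        raise ValueError("empty clip: nothing was recorded")
    # only the text moves forward from this point
    return transcriber(audio_bytes).strip()

# usage with a stand-in transcriber
fake_whisper = lambda audio: "  something i turn to when i need to feel real  "
print(transcribe_clip(b"\x00" * 16, fake_whisper))
# prints: something i turn to when i need to feel real
```

Injecting the transcriber also makes the boundary explicit: the rest of the pipeline only ever sees the returned string.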
From there the audio is gone, in a sense. It still lives in storage, gated by the same row-level security rules that gate your photos. But the matching engine never reads it. The engine reads the transcript.
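One way to make that boundary hard to violate is to keep audio out of the matching layer's data shape altogether. A minimal sketch under assumed field names (this is not Soulmate's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchInput:
    """Everything the matching engine is allowed to see.

    There is deliberately no audio field: the clip stays in
    storage behind its access rules, and only the transcript
    crosses into matching.
    """
    written_prompts: tuple[str, ...]
    ideology_answers: tuple[str, ...]
    voice_transcript: str

profile = MatchInput(
    written_prompts=("i read on trains",),
    ideology_answers=("slow internet",),
    voice_transcript="something i turn to when i need to feel real",
)
print(sorted(MatchInput.__dataclass_fields__))
```

If the type the engine consumes has no audio field, "the engine never reads it" is enforced by construction rather than by discipline.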
The transcript folds into the same 1536-dim embedding as your written prompts and your ideology answers. Five prompts plus thirty seconds of voice is approximately one hundred and fifty features. That is what the matching engine compares against other people's vectors.
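The fold itself can be sketched as: join all the text into one document, embed it, and compare profiles by cosine similarity. The embedding call is injected as a stand-in, since the essay gives the dimension (1536) but not the model:

```python
import math

def fold_profile(prompts, transcript, embed):
    """Combine written prompts and the voice transcript into a
    single text document, then return its embedding vector."""
    document = "\n".join([*prompts, transcript])
    return embed(document)

def cosine_similarity(a, b):
    # standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# toy 2-dim embedder; a real one returns a 1536-dim vector
toy_embed = lambda text: [1.0, 0.0] if "train" in text else [0.0, 1.0]
me = fold_profile(["i read on trains"], "long walks", toy_embed)
them = fold_profile(["trains, mostly"], "night buses", toy_embed)
print(cosine_similarity(me, them))  # 1.0 under the toy embedder
```

The point of folding before embedding, rather than embedding each answer separately, is that the transcript and the written prompts land in one vector, so voice and text contribute to the same comparison.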
why the transcript matters more than the audio
There is a real concern with voice as a matching input: that it lets the algorithm learn things about your accent, your demographic, your gender, your fluency. We didn't want that. So we don't pass the audio into the matching layer at all.
What's left is the words. The transcript captures the things you cared enough to say. It captures pacing and emphasis indirectly — people who care about something tend to use more specific nouns, longer sentences and fewer filler words when they talk about it. People who don't care tend not to.
We have run the embeddings on test pairs. The transcripts cluster meaningfully even when the underlying prompts are short. A two-sentence answer to 'something I turn to' produces a different cluster from 'a thing my mom said once that stuck', and a different cluster again from 'a song I've replayed too many times this week.' That's the signal we wanted.
what unlocks when (and only when)
If two souls vibe each other on the basis of their vectors, an LLM writes a paragraph naming why they fit and a single opening line. The chat opens. The photos unlock simultaneously, on both sides, at that moment. And — only then — both people can hear each other's voice clip.
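The simultaneity is easy to get wrong if photos and voice unlock on separate code paths. One hedged sketch (names are illustrative, not the real backend) routes every reveal through a single mutual-match handler:

```python
def on_mutual_match(user_a, user_b, unlock):
    """Reveal photos and the voice clip for both sides at once.

    `unlock(viewer, subject, asset)` stands in for whatever the
    real backend does; because every reveal goes through this one
    handler, neither asset can unlock without the other.
    """
    for viewer, subject in ((user_a, user_b), (user_b, user_a)):
        for asset in ("photos", "voice_clip"):
            unlock(viewer, subject, asset)

# usage with a recording stand-in
revealed = []
on_mutual_match("ana", "ben", lambda v, s, a: revealed.append((v, s, a)))
print(len(revealed))  # 4: two assets unlocked for each side
```

Keeping the reveals in one handler is the design choice the paragraph describes: the unlock is a single moment, not two events that happen to coincide.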
The voice clip is held back exactly as the photos are, by design. The matching engine is blind to both face and voice. The human moment of unlock is when both arrive.
what soulmate is not
We are not a voice-call dating app. There are several of those, and they have their own merits. Soulmate is text-first and async by default. The voice clip is a thirty-second input into the matching layer and a thirty-second reveal post-mutual. The day-to-day surface is reading and writing.
We are also not, structurally, a dating app at all. Three intents — friendship, relationship, community — are co-equal first-class outputs of the same engine. The voice clip works the same way for all three. Friends find each other by what they say about what they love. So do partners. So do communities.
where this goes next
Right now the voice clip is required in onboarding. We may make it optional once we have more data on whether onboarding completion improves without it. We may add a second optional clip — 'tell us about a moment that changed how you think' — once we know the first clip works.
The premise we are testing is that voice is the highest-signal input we know how to ask for at low effort. So far the embeddings agree with the premise.
The door is at byvibration.com. If you want to know whether the people you'd actually want to know are easier to find by what they said than by what they looked like, this is the app where you can answer that question.