The Fit Score: Ranking 3,500 Playlists So You Don’t Have To

Every playlist submission is a bet. You’re wagering your time, your relationship with a curator, and sometimes money on the belief that your track belongs on their playlist. Most of the time you’re guessing: scrolling through playlists, listening to a few tracks, going with your gut.
This post is about replacing that guess with a number.
The Problem With “Good Fit”
By this point in the series, I had:
- A snapshot record for every track on every active Spotify playlist, each with the same audio features computed
- Three composite scores – Energy Adjusted, Mood Score, Rhythm Score – on both the track and the playlist, expressed on the same 0–1 scale
- A scoring microservice that computes and stores a score record for every track × playlist combination
What I didn’t have was a single answer to the question: how well does this track fit this playlist? That’s what Fit Score is.
The Foundation: Audio Similarity
The core signal is Audio Similarity – how closely a track’s audio profile matches a playlist’s average profile across the three composite dimensions.
energy_delta = | Track Energy Adjusted − Playlist Energy Adjusted |
mood_delta = | Track Mood Score − Playlist Mood Score |
rhythm_delta = | Track Rhythm Score − Playlist Rhythm Score |
So then:
Audio Similarity = 1 – (energy_delta + mood_delta + rhythm_delta) / 3
Because all three scores are normalised to [0, 1], the mean delta is also in [0, 1], and similarity is its direct complement. A score of 1.0 means a perfect match across all three dimensions. A score of 0.0 means maximum divergence on all three.
This is a stored number computed by the scoring microservice across every track × playlist combination. With ~150 scored tracks and ~3,500 active playlists, that’s roughly 525,000 score records – each one a precise measurement of audio fit.
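As a minimal sketch of that calculation (Python, illustrative only; the dictionary keys are my placeholders, not the real schema used by the scoring microservice):

```python
def audio_similarity(track: dict, playlist: dict) -> float:
    """Complement of the mean absolute delta across the three composite scores.

    Both inputs carry 'energy_adjusted', 'mood_score' and 'rhythm_score',
    each already normalised to [0, 1].
    """
    dimensions = ("energy_adjusted", "mood_score", "rhythm_score")
    deltas = [abs(track[d] - playlist[d]) for d in dimensions]
    return 1.0 - sum(deltas) / len(deltas)


# Example: a fairly energetic track against a mellower playlist profile
track = {"energy_adjusted": 0.82, "mood_score": 0.64, "rhythm_score": 0.71}
playlist = {"energy_adjusted": 0.38, "mood_score": 0.55, "rhythm_score": 0.60}
print(round(audio_similarity(track, playlist), 3))  # 0.787
```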
The Problem With Audio Alone
Audio similarity tells you whether a track sounds like the playlist. It doesn’t tell you whether it belongs there.
Two playlists can have identical audio profiles but completely different identities. A mainstream pop playlist averaging 80 Spotify popularity and a deep-cut discovery playlist averaging 25 might both cluster around the same energy, mood, and rhythm scores. A track that fits the sound fits both – but it only belongs on one of them. That’s where the other signals come in.
Four Signals, One Score
Fit Score = (Audio Similarity × 0.5) + (Popularity Score × 0.2) + (Recency Score × 0.15) + (Trend Score × 0.15)
Audio Similarity – 50%
The dominant signal. If the track doesn’t sound like the playlist, nothing else matters.
Popularity Score – 20%
This is the signal that handles the mainstream vs. niche problem: it measures how well a track’s Spotify popularity matches the playlist’s typical tier. A mainstream playlist and an underground discovery playlist can sound identical on paper – same energy, same mood, same rhythm – but they’re not interchangeable. Popularity tier is part of a playlist’s identity, and a track that belongs on one doesn’t necessarily belong on the other.
The alternative – treating higher popularity as universally better – would collapse every ranking into a chart position and ignore what each playlist is actually about. Proximity is the right model.
Both sides null-default to the midpoint of the popularity scale, so unscored tracks receive a neutral score rather than a penalty. A brand new release has no Spotify play history yet; it shouldn’t be punished for that.
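The post leaves the exact proximity formula to the microservice, but assuming Spotify popularity on its usual 0–100 scale and proximity as the complement of the normalised difference, a sketch looks like this:

```python
def popularity_score(track_pop: float | None, playlist_avg_pop: float | None) -> float:
    """Proximity of a track's popularity to the playlist's typical tier.

    Spotify popularity runs 0-100. Missing values on either side default to
    the midpoint (50) so a brand new release scores neutral rather than zero.
    The 1 - |difference| / 100 shape is an assumption, not the confirmed formula.
    """
    MIDPOINT = 50.0
    t = MIDPOINT if track_pop is None else track_pop
    p = MIDPOINT if playlist_avg_pop is None else playlist_avg_pop
    return 1.0 - abs(t - p) / 100.0


print(popularity_score(25, 80))    # 0.45 -- deep-cut track vs. mainstream playlist
print(popularity_score(None, 80))  # 0.70 -- unscored new release gets the neutral midpoint
```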
Recency Score – 15%
Curators want fresh music. A track released last month fits the zeitgeist of an active playlist better than one released three years ago, even if they sound identical. The score decays linearly to zero over 365 days – by which point a track has either found its playlists or it hasn’t.
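A linear 365-day decay fits in a couple of lines (sketch only; how the real service counts days from release may differ):

```python
def recency_score(days_since_release: int) -> float:
    """Linear decay from 1.0 at release to 0.0 at 365 days and beyond."""
    return max(0.0, 1.0 - days_since_release / 365.0)


print(round(recency_score(30), 3))  # 0.918 -- released last month
print(recency_score(1095))          # 0.0   -- three years old, fully decayed
```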
Popularity Trend Score – 15%
Computed nightly, this measures the direction of a track’s Spotify popularity – is it rising, flat, or declining? A track gaining momentum is a better addition to an active playlist than one that’s fading. Null-defaults to neutral – new releases with no trend history aren’t penalised.
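Pulling the four signals together with the published weights looks roughly like the sketch below; the 0.5 neutral value for a missing trend, and treating the trend as a number already mapped onto [0, 1], are my assumptions about how the microservice encodes “no history yet”:

```python
WEIGHTS = {"audio": 0.50, "popularity": 0.20, "recency": 0.15, "trend": 0.15}


def fit_score(audio_similarity: float,
              popularity: float,
              recency: float,
              trend: float | None) -> float:
    """Weighted blend of the four signals, each already in [0, 1]."""
    trend = 0.5 if trend is None else trend  # null-default: new releases aren't penalised
    return (WEIGHTS["audio"] * audio_similarity
            + WEIGHTS["popularity"] * popularity
            + WEIGHTS["recency"] * recency
            + WEIGHTS["trend"] * trend)


# Using the example values from the sketches above
print(round(fit_score(0.787, 0.45, 0.918, None), 3))  # 0.696
```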
What I Considered and Rejected
Raw audio feature weighting – an approach that assigns individual weights to all 13 raw audio features (energy, danceability, valence, etc.). The problem: the three composite scores already encode those features in a principled, theoretically grounded way. Re-weighting the raw features introduces scale inconsistency, double-counting, and 13 hyperparameters to tune instead of 4.
Artist Affinity – tracking curator acceptance history by artist. Every track in this catalog belongs to the same artist, so a signal that’s identical for every track adds nothing to the ranking.
Successful submission history – whether a playlist has ever accepted any of your tracks. Interesting in principle but it inflates scores uniformly for friendly playlists rather than helping rank which track to submit. Not the problem I’m solving here.
The Result
Every Track Playlist Score record now carries a Fit Score between 0 and 1. For any given track, you can sort all 3,500 playlists by that score and get a ranked list – highest fit at the top.
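In practice that ranking is just a sort over the stored score records for the track (the record shape and playlist names below are placeholders, not the Salesforce schema):

```python
# Placeholder in-memory shape of the stored score records for one track
scores = [
    {"playlist": "Playlist A", "fit_score": 0.812},
    {"playlist": "Playlist B", "fit_score": 0.341},
    {"playlist": "Playlist C", "fit_score": 0.778},
]

for row in sorted(scores, key=lambda s: s["fit_score"], reverse=True):
    print(f"{row['fit_score']:.3f}  {row['playlist']}")
```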
That’s not a guess anymore. That’s a ranked list of 3,500 playlists, scored to three decimal places.
Which means the next question is: what acts on it? That’s the Recommendation Engine – the Agentforce agent that reads this data and turns it into submission decisions. But that’s the next post.
This is part of an ongoing series on building the Music Intelligence Engine on Salesforce. Previous posts covered the data model, the audio feature pipeline, the theoretical foundation behind the three-dimensional scoring system, the composite scores, and the playlist snapshot pipeline.