3,500 Moving Targets: Building the Playlist Snapshot Pipeline

The track pipeline was the proof of concept. Download a preview, run the analysis, write thirteen numbers back to Salesforce. Repeat for every track in the catalog.
The playlist pipeline is a different problem entirely.
To score a submitted track against a playlist, I need to know what the playlist actually sounds like — not just its name and follower count, but its sonic character: the average energy, the typical mood, the rhythmic profile. That profile has to come from the tracks already on it, analyzed the same way I analyze submitted tracks.
There are roughly 3,500 active Spotify playlists in my database. Each has dozens to hundreds of tracks. Many of those tracks change week to week as curators add new music, remove what isn’t performing, and reshape their lists for the season.
That’s not a dataset. That’s a moving target.
The Scale Problem
The track pipeline runs once a month against a fixed catalog. The playlist pipeline has to run continuously against something that’s actively changing.
A naive approach – query all 3,500 playlists, fetch every track, analyze everything – isn’t viable. The audio analysis is CPU-intensive and sequential by nature: one track, one preview download, one librosa pass, write the result. At that rate, processing the full catalog of playlist tracks in a single session would take days and burn through Spotify’s rate limits in the first hour.
The pipeline needed to be incremental. Process a slice each run. Remember what’s been done. Only touch what’s new.
The Snapshot Model
The solution is a snapshot table. For every track on every playlist, there’s a record in Salesforce with the full audio feature profile – the same thirteen fields that live on submitted tracks. When the pipeline runs, it queries existing snapshots for the target playlists and builds a diff against what Spotify currently shows. Only new tracks – ones without a snapshot record – get analyzed.
This changes the economics of the job completely. The first time a playlist is processed, you pay the full cost: fetch every track, analyze every preview. After that, most runs are cheap. A playlist with 80 tracks that gets 5 new additions this week costs 5 analysis passes, not 80.
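The diff itself is plain set arithmetic. A minimal sketch, assuming the two ID lists have already been fetched (the function and field names here are illustrative, not the actual pipeline's):

```python
def diff_playlist(current_track_ids, snapshot_track_ids):
    """Compare what Spotify shows now against the stored snapshots.

    Returns (to_analyze, to_prune): tracks that need a new snapshot
    record, and snapshot records whose tracks have left the playlist.
    """
    current = set(current_track_ids)
    known = set(snapshot_track_ids)
    to_analyze = current - known   # new additions: pay the analysis cost
    to_prune = known - current     # departures: old snapshots can go
    return to_analyze, to_prune


# An 80-track playlist that picked up 5 new tracks and dropped 5 old ones
# costs 5 analysis passes, not 80.
new, gone = diff_playlist(
    current_track_ids=[f"t{i}" for i in range(5, 85)],   # t5..t84
    snapshot_track_ids=[f"t{i}" for i in range(80)],     # t0..t79
)
```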
The composite scores – Energy Adjusted, Mood Score, Rhythm Score – live on each snapshot record. The playlist’s profile is the average of those scores across all its snapshot children. As the snapshot table fills in, the profile becomes more representative. As tracks turn over, old snapshots can be pruned and new ones take their place.
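Averaging across the snapshot children is the whole profile computation. A sketch under the same caveat, with illustrative field names standing in for the actual Salesforce fields:

```python
from statistics import mean

def playlist_profile(snapshots):
    """Average each composite score across a playlist's snapshot records.

    `snapshots` is a list of dicts carrying the three composites; the
    key names here are illustrative, not the real Salesforce API names.
    """
    fields = ("energy_adjusted", "mood_score", "rhythm_score")
    return {f: mean(s[f] for s in snapshots) for f in fields}


profile = playlist_profile([
    {"energy_adjusted": 0.8, "mood_score": 0.6, "rhythm_score": 0.7},
    {"energy_adjusted": 0.6, "mood_score": 0.4, "rhythm_score": 0.5},
])
# profile is roughly {"energy_adjusted": 0.7, "mood_score": 0.5, "rhythm_score": 0.6}
```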
The Ordering Problem
With 3,500 playlists and a fixed processing budget per run, the order in which playlists get processed matters.
The pipeline sorts by when each playlist was last synced, oldest first, with never-synced playlists always at the front of the queue. That keeps the initial seeding phase making steady, predictable progress: new playlists added to the database always jump ahead of re-syncing ones that already have profiles.
After the initial seed, daily runs become short by design. A playlist synced three days ago won’t appear near the front of the queue for weeks. The pipeline naturally shifts its attention to the tail – playlists that haven’t been touched recently – and leaves the fresh ones alone.
Clearing the sync timestamp on any playlist forces it back to the front of the queue on the next run. That’s the manual override for “re-process this one now.”
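The ordering rule reduces to a two-part sort key: a bucket for "never synced" that always sorts first, then the sync timestamp itself. A sketch, with a hypothetical record shape:

```python
from datetime import datetime

def sync_priority(playlist):
    """Sort key for the processing queue.

    Never-synced playlists come first, then oldest sync first. Clearing
    'last_synced' on a record forces it back to the front on the next
    run. The dict shape here is illustrative, not the actual schema.
    """
    last = playlist.get("last_synced")
    if last is None:
        return (0, datetime.min)   # never synced: head of the queue
    return (1, last)               # synced: oldest first


playlists = [
    {"id": "A", "last_synced": datetime(2026, 1, 10)},
    {"id": "B", "last_synced": None},
    {"id": "C", "last_synced": datetime(2025, 12, 1)},
]
queue = sorted(playlists, key=sync_priority)
# Each run then takes a fixed slice from the front, e.g. queue[:200].
```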
The Playlist That Disappeared
Here’s an edge case that turned into a design decision.
Spotify returns a 404 for playlists that have been deleted – but also for playlists that a curator has temporarily set to private. From the outside, you can’t tell the difference. A 404 just means “not available right now.”
The conservative choice: treat 404 as a soft delete. Mark the playlist inactive, record why, stamp the timestamp, and move on. Don’t retry aggressively. But don’t forget about it either.
The pipeline keeps inactive playlists in the processing queue. When a curator re-publishes a playlist – or when the 404 turns out to have been temporary – the pipeline detects that Spotify is now returning tracks, automatically restores the playlist to active status, clears the deletion flag, and processes it normally. Recovery is automatic; no manual intervention required.
403 is handled differently. A 403 means the playlist belongs to a private account that hasn’t authorized access. There’s no recovery path, so the pipeline skips it silently and moves on without touching the record.
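Both policies, and the automatic recovery, can be expressed as one dispatch on the status code. A sketch under stated assumptions: the record is a mutable dict, and the field names are illustrative:

```python
from datetime import datetime, timezone

def handle_fetch_status(playlist, status, tracks=None):
    """Resolve one playlist fetch against the 404/403 policy above.

    Mutates the (illustrative) playlist record in place and returns the
    track list to process, empty when there is nothing to do.
    """
    if status == 200:
        if playlist.get("inactive"):
            # Spotify is returning tracks again: automatic recovery.
            playlist["inactive"] = False
            playlist["inactive_reason"] = None
        return tracks or []
    if status == 404:
        # Deleted or temporarily private; no way to tell which. Soft
        # delete, but keep the playlist queued so recovery stays possible.
        playlist["inactive"] = True
        playlist["inactive_reason"] = "404"
        playlist["inactive_at"] = datetime.now(timezone.utc)
        return []
    if status == 403:
        # Unauthorized private account: no recovery path, skip silently.
        return []
    raise ValueError(f"unhandled status {status}")


record = {"id": "P1", "inactive": False}
handle_fetch_status(record, 404)                          # soft delete
restored = handle_fetch_status(record, 200, tracks=["t1", "t2"])  # auto-recover
```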
Running Unattended
The pipeline runs on a schedule – a daily pass that processes a fixed slice of the catalog, and a monthly pass that adds audio analysis for new tracks up to a set limit. Both run automatically, write their results back to Salesforce, and require no supervision.
The separation between the two passes reflects the different costs involved. Fetching track metadata from Spotify is fast and cheap. Downloading a preview clip and running audio analysis on it is neither. The daily pass keeps the snapshot table current – new tracks added as stubs, removed tracks pruned, playlists timestamped. The monthly pass fills in the audio features for stubs that don’t have them yet.
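The budgeted monthly pass is the part worth sketching: take the stubs, fill features for a bounded number of them, leave the rest for next month. The `analyze` stand-in and the record shape are assumptions, not the real implementation:

```python
def analyze(track_id):
    # Stand-in for the expensive step: preview download plus a librosa pass.
    return {"energy": 0.5, "mood": 0.5, "rhythm": 0.5}

def monthly_pass(snapshots, analysis_budget=500):
    """Fill in audio features for stub snapshots, up to a fixed budget.

    The daily pass creates stubs cheaply from Spotify metadata; this pass
    pays the analysis cost for a bounded number of them per run. Returns
    how many stubs were filled.
    """
    stubs = [s for s in snapshots if s.get("features") is None]
    for snap in stubs[:analysis_budget]:
        snap["features"] = analyze(snap["track_id"])
    return min(len(stubs), analysis_budget)


snapshots = [{"track_id": f"t{i}", "features": None} for i in range(8)]
done = monthly_pass(snapshots, analysis_budget=5)
# 5 stubs filled this run; the remaining 3 wait for the next pass.
```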
After the initial seed, daily runs are short. Most playlists haven’t changed. Most of the work is already done. The pipeline processes its slice, finds mostly nothing new, stamps the timestamps, and exits.
That’s the right behavior. A pipeline that does less work as the data matures isn’t stalling – it’s working correctly.
Where This Leaves Things
The snapshot table is filling in. Every active playlist in the database is accumulating a profile – energy, mood, rhythm – derived from the tracks on it, analyzed the same way submitted tracks are analyzed.
Once both sides are populated, the Scoring Engine has what it needs: composite scores for every submitted track, composite scores for every playlist. The distance between those scores, across three dimensions, is the raw signal for fit.
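In its rawest form, that signal could be an unweighted Euclidean distance in the three-dimensional score space, before any of the weighting the next post takes up. A sketch with illustrative field names:

```python
import math

def raw_fit_distance(track, playlist):
    """Unweighted distance between a track and a playlist profile.

    Smaller means a closer sonic match. Plain Euclidean distance over
    the three composites; the key names are illustrative.
    """
    dims = ("energy_adjusted", "mood_score", "rhythm_score")
    return math.sqrt(sum((track[d] - playlist[d]) ** 2 for d in dims))


d = raw_fit_distance(
    {"energy_adjusted": 0.8, "mood_score": 0.5, "rhythm_score": 0.7},
    {"energy_adjusted": 0.6, "mood_score": 0.5, "rhythm_score": 0.7},
)
# d is about 0.2: this track differs from the playlist only on energy.
```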
Turning that signal into a ranking – deciding how to weight the dimensions, how to combine them with signals like track recency, popularity, and momentum – is the fit scoring problem. That’s the next post.
Matt McGuire is an independent punk artist and Salesforce architect. He's presenting "The Music Intelligence Engine: AI-Powered Promotion on Salesforce" at True North Dreamin' in May 2026.