Choosing an audio fingerprinting service

This is a guest post by Dan Gravell – Founder of OneMusicAPI

Picture the scene: someone installs the music app you’ve developed and copies some of their own music files to the player. They open up your app and… they’re presented with a bunch of garbled album titles and generic images where the album covers should be. The user’s first experience with their precious music collection would be so much more magical if there were a way of recognising this content and displaying pristine metadata and album artwork automatically. This can be done with audio fingerprinting, an increasingly popular way of identifying music.

Why has audio fingerprinting become popular amongst developers? The core benefit is that a piece of music can be reliably identified simply by the music itself. This means there need be no reliance on extraneous managed metadata, which imposes a maintenance burden and may not exist in many cases.

The use cases are wide: music lovers trying to find out which song they are listening to in a bar, smartphone apps trying to deliver the best musical experience, and music rights holders attempting to protect the use of their music.

Service over algorithm

It’s important to separate audio fingerprint identification from the fingerprinting algorithm. To a developer, both are important, and it’s the combination of the two into a service that provides the true value to end users. Let’s break fingerprinting apart to learn a little more.

An audio fingerprinting algorithm is the process by which audio data is analysed and boiled down to the essential characteristics that make a piece of music sound the way it does. The source audio data can come from anywhere: a smartphone’s microphone, a computer music file, a radio transmission. The output is a fingerprint, a condensed summary of the audio data, ready to be sent off for identification.
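To make the idea of "boiling down" audio concrete, here is a deliberately tiny sketch. It is not how any real fingerprinter works; production algorithms analyse the frequency content of the audio in far more detail. The function names, the frame size and the test tone are all assumptions made up for illustration. The sketch splits the samples into frames and records a single bit per frame: did the audio get louder than the previous frame or not?

```python
import math

def toy_fingerprint(samples, frame_size=256):
    """Condense raw samples into bits: 1 where a frame is louder than the one before."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame))  # energy of this frame
    return tuple(1 if later > earlier else 0
                 for earlier, later in zip(energies, energies[1:]))

def demo_signal(n=4096, rate=8000):
    """Hypothetical test audio: a 440 Hz tone whose loudness swells and fades."""
    signal = []
    for t in range(n):
        envelope = 0.5 + 0.5 * math.sin(2 * math.pi * 3 * t / rate)
        signal.append(envelope * math.sin(2 * math.pi * 440 * t / rate))
    return signal

loud = demo_signal()
quiet = [0.25 * s for s in loud]  # the same "song" played at a lower volume

# Both versions should reduce to the same short summary.
print(toy_fingerprint(loud) == toy_fingerprint(quiet))
```

The point of the toy is the shape of the result: thousands of samples go in, a handful of bits come out, and a change that doesn't alter the character of the music (here, volume) leaves the summary unchanged.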

Audio fingerprint identification then takes the fingerprint and checks a database for matches. The database will typically consist of millions of fingerprints, each one representing a song. Importantly, the database also contains identifiers: each fingerprint is assigned one or more of them. This link between fingerprint and identifier is essential; it’s what provides the value.
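The lookup step can be sketched in a few lines. This is a simplified illustration, assuming fingerprints are short tuples of bits and the "database" is an in-memory dictionary; the identifiers (`rec-001` and so on) and the `max_distance` tolerance are invented for the example. A real service searches millions of entries with a purpose-built index rather than a linear scan.

```python
def hamming(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

def identify(fingerprint, database, max_distance=1):
    """Return the identifiers of the closest stored fingerprint, or [] if none is close enough."""
    best_ids, best_distance = [], max_distance + 1
    for stored, identifiers in database.items():
        if len(stored) != len(fingerprint):
            continue  # only compare like with like in this toy
        distance = hamming(stored, fingerprint)
        if distance < best_distance:
            best_ids, best_distance = identifiers, distance
    return best_ids

# Hypothetical database: each fingerprint maps to one or more identifiers.
DATABASE = {
    (1, 1, 0, 0, 1, 0, 1, 1): ["rec-001"],
    (0, 1, 0, 1, 1, 1, 0, 0): ["rec-002", "rec-002-remaster"],
}

noisy = (1, 1, 0, 0, 1, 0, 1, 0)  # one bit away from the first entry
print(identify(noisy, DATABASE))
```

Allowing a small distance rather than demanding an exact match is what lets a slightly different capture of the same recording still resolve to the same identifier.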

The reason this link provides the value is that once a piece of music is identified, possibilities open up. The identifier can be used to search for metadata about the recording. Maybe it could be used to find gigs or events by the performer. In practice, most fingerprint services also return metadata alongside the identifier, to make the app development process a little easier.

So we can see that an audio fingerprint service is, importantly, a combination of two things: algorithm and identification. It also means that some types of service or app are ruled out. Shazam, for example, may produce fingerprints on your mobile phone with its algorithm, but the identification process is not opened up for developers to use.

A graveyard of services

It’s worth noting that audio fingerprinting services have a history of appearing and disappearing. Many services have been launched and later withdrawn, victims of changes in business priorities or acquisitions. Echoprint (part of The Echo Nest) is discontinued for new submissions, although the existing database is available. Doreso’s service was stopped, as were Moomash’s and Last.fm’s.

For that reason it’s worth considering, as an app developer, how to future-proof an integration with an audio fingerprinting service.
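One common way to soften the blow of a service disappearing is to keep your app’s code behind a small interface of your own, so that a provider can be swapped or added without touching the rest of the app. The sketch below is only one possible design, and every name in it (`Match`, `FingerprintService`, `FallbackIdentifier` and the two stand-in services) is hypothetical; none of them corresponds to a real provider’s SDK.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence

@dataclass(frozen=True)
class Match:
    provider: str       # which service produced the match
    recording_id: str   # that service's identifier for the recording

class FingerprintService(Protocol):
    """The only surface the rest of the app is allowed to depend on."""
    def identify(self, audio_path: str) -> Optional[Match]: ...

class RetiredService:
    """Stand-in for a provider that has been switched off."""
    def identify(self, audio_path: str) -> Optional[Match]:
        raise ConnectionError("service no longer available")

class WorkingService:
    """Stand-in for a provider that is still running."""
    def identify(self, audio_path: str) -> Optional[Match]:
        return Match(provider="working", recording_id="rec-001")

class FallbackIdentifier:
    """Try each configured service in order and return the first match."""
    def __init__(self, services: Sequence[FingerprintService]):
        self._services = list(services)

    def identify(self, audio_path: str) -> Optional[Match]:
        for service in self._services:
            try:
                match = service.identify(audio_path)
            except ConnectionError:
                continue  # provider unreachable: move on to the next one
            if match is not None:
                return match
        return None

identifier = FallbackIdentifier([RetiredService(), WorkingService()])
print(identifier.identify("song.flac"))
```

With this shape, replacing a discontinued provider means writing one new adapter class and changing the list passed to `FallbackIdentifier`.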

Four audio fingerprinting services to consider

Let’s review some audio fingerprinting services that are available for app developers to use.

ACRCloud

ACRCloud boasts a vast database of 40m (and growing) fingerprints. You can query the service via a range of SDKs written for different programming languages. It’s a commercial service with different tiers for pricing – there’s a free tier for when you’re just starting, and enterprise tiers to guarantee service levels.

Importantly, and pertinently given my emphasis on the importance of both algorithm and identification, they link to multiple third-party identifiers, including YouTube, Apple Music, Deezer and Spotify IDs. That offers a vast range of different sources for metadata, as well as the audio itself.

Acoustid

Acoustid is an open source project which boasts a fingerprinter (Chromaprint) and a back-end lookup service. It is heavily linked to MusicBrainz, the online music encyclopedia, and uses MusicBrainz identifiers (MBIDs) to identify recordings.

As Acoustid is crowdsourced, its database is growing rapidly. There are occasional incorrect results, but I’ve written up a few hacks to improve that.

The downside of Acoustid is that it is focused on the problem of identifying recorded music with no external noise. So, for use cases where you are identifying released music, for example from a CD or from computer audio files, it works well. In less controlled environments it does not work.
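To show how the fingerprinter and the lookup service fit together, here is a rough sketch of preparing an Acoustid lookup. It rests on a few assumptions worth checking against the current Acoustid documentation: that Chromaprint’s `fpcalc` tool prints `DURATION=` and `FINGERPRINT=` lines, and that the lookup endpoint at `https://api.acoustid.org/v2/lookup` accepts `client`, `duration`, `fingerprint` and `meta` parameters. The helper names, the sample output and the client key are placeholders, and the code only builds the request URL; it does not call the network.

```python
from urllib.parse import urlencode

# Assumed lookup endpoint; verify against the Acoustid web service docs.
LOOKUP_URL = "https://api.acoustid.org/v2/lookup"

def parse_fpcalc(output):
    """Pull the duration and fingerprint out of fpcalc-style KEY=VALUE lines."""
    fields = dict(line.split("=", 1) for line in output.splitlines() if "=" in line)
    return int(float(fields["DURATION"])), fields["FINGERPRINT"]

def build_lookup_url(client_key, duration, fingerprint):
    """Assemble a lookup URL asking for the linked recordings."""
    query = urlencode({
        "client": client_key,       # your application's API key
        "meta": "recordings",       # ask for linked recording identifiers
        "duration": duration,
        "fingerprint": fingerprint,
    })
    return f"{LOOKUP_URL}?{query}"

# Placeholder text standing in for real fpcalc output.
sample_output = "DURATION=241\nFINGERPRINT=AQAAexample"

duration, fingerprint = parse_fpcalc(sample_output)
url = build_lookup_url("YOUR_CLIENT_KEY", duration, fingerprint)
print(url)
```

The recordings that come back are where the MusicBrainz identifiers mentioned above would be found, which you can then use to look up metadata.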

Gracenote MusicID

MusicID is Gracenote’s service for creating fingerprints and looking up metadata, generating playlists and lists of similar tracks, artists and releases. Gracenote had previously purchased MusicIP (later, AmpliFIND) and rolled this into their subsequent MusicID service.

Gracenote is a commercial service, and its pricing is typically high and not transparent. However, their database is extensive and detailed. They offer free plans for non-commercial projects, and cheaper plans for early-stage startups, but otherwise the cost is high.

Rovi Media Recognition

Rovi Media Recognition is a service which can recognise music and provide metadata as a result.

Like Gracenote, Rovi boasts a large and detailed database, but the cost is high.

… to summarise

So it boils down to:

Recognise you need a service that combines a fingerprint algorithm with music identification.
Ideally the service should also provide metadata.
Examine the long-term viability of the platform.
Pricing varies wildly; do you really need an “enterprise” licence?

I hope this helps in your audio fingerprinting service selection process! Feel free to drop me a question in the comments, or via email.

ACRCloud Blog