It's been years since I last asked this, so let's try again: is there yet a way to Shazam a bunch of files in bulk to identify their contents? I'd like to run such a thing over the archived DNA webcasts to generate actual playlists after the fact.
There used to be a thing called Echoprint but I never got it to work, and it's gone now anyway.
Gracenote (Sony) ended up with the patents. https://github.com/ZeroOneStudio/gracenote-cli
...but, as far as I know, it assumes a one-track-per-file model.
There's a Python library to go with it: https://github.com/beetbox/pyacoustid
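For the one-track-per-file case, pyacoustid's `acoustid.match()` helper does the fingerprint-plus-lookup in one call. A minimal sketch (the API key and file path are placeholders; you need an AcoustID application key and the chromaprint library installed):

```python
def playlist_line(artist, title):
    """Format one playlist/TOC entry."""
    return f"{artist} - {title}"

def best_match(path, apikey="your-acoustid-app-key"):
    """Return the highest-scoring (artist, title) AcoustID match, or None."""
    import acoustid  # pip install pyacoustid; needs chromaprint installed
    results = list(acoustid.match(apikey, path))  # yields (score, id, title, artist)
    if not results:
        return None
    score, recording_id, title, artist = max(results, key=lambda r: r[0])
    return artist, title

# Usage (placeholder path):
# m = best_match("track01.mp3")
# if m:
#     print(playlist_line(*m))
```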
I came across this:
They offer a Chrome extension and an API.
I tried out a few of the songs from one of the webcasts and it seemed to work reasonably well.
The extension is $0.99/month and the API is about $2 to $4 per 1,000 songs.
It seems to be geared towards user-generated content, so I'm guessing this is what you're looking for.
Looks like you can just use the RapidAPI method of using their tool; probably not more than a dozen lines of code to write something to go scrub through your URLs. I assume you'd be writing some small script to actually make those playlists/TOCs anyway so maybe this isn't any more work? https://rapidapi.com/AudD/api/audd
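Something like the dozen lines in question, sketched with only the standard library. The host, key, and parameter names ("url", "return") are my reading of the RapidAPI listing and AudD's docs, so double-check them before trusting this:

```python
import json
import urllib.parse
import urllib.request

RAPIDAPI_HOST = "audd.p.rapidapi.com"   # host as shown on the RapidAPI listing
RAPIDAPI_KEY = "your-rapidapi-key"      # placeholder

def build_request(audio_url: str) -> urllib.request.Request:
    """Build a POST request asking AudD to identify the audio at audio_url."""
    body = urllib.parse.urlencode({"url": audio_url, "return": "timecode"}).encode()
    return urllib.request.Request(
        f"https://{RAPIDAPI_HOST}/",
        data=body,
        headers={"x-rapidapi-host": RAPIDAPI_HOST, "x-rapidapi-key": RAPIDAPI_KEY},
    )

def identify(audio_url: str) -> dict:
    with urllib.request.urlopen(build_request(audio_url)) as resp:
        return json.load(resp)

# for url in show_urls:   # your list of webcast URLs
#     print(identify(url))
```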
The question is whether it'll identify more than one item in a file. If it won't then this isn't going to get you a TOC for a 30 minute file unless you chop it up to submit it which is surely over your too-much-hassle threshold. I wasn't willing to pay the $1.50 to get a subscription and test it but if you do I think you can just use those m3u links to do the test on that API sample page.
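For what it's worth, the chopping-up is less hassle than it sounds: ffmpeg's segment muxer will split a show into fixed-length pieces without re-encoding. A sketch (filenames are placeholders):

```python
import subprocess

def chunk_command(src, out_pattern="chunk_%03d.mp3", seconds=30):
    """ffmpeg command splitting src into pieces of `seconds` each, stream-copied."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy", out_pattern,
    ]

# subprocess.run(chunk_command("show-001.mp3"), check=True)
```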
Does YouTube cite the specific copyrights you're ostensibly violating when they yank your videos? You could let them take care of IDing songs and make the playlist from the takedown notice.
Not reliably enough, and it would also have the side effect of breaking my YouTube account.
Hmmm. How about chopping up a bunch of correctly labeled songs into 30-second chunks, training a neural net on them, and giving the trained nn your unlabeled audio in 30 second snippets?
A former classmate of mine wrote a bird chirp detector that uses spectrograms of forest audio clips
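For the curious, the spectrogram front end of a classifier like that is only a few lines. A numpy sketch, with a 440 Hz test tone standing in for a real 30-second clip:

```python
import numpy as np

def spectrogram(signal, frame=1024, hop=512):
    """Magnitude STFT via a sliding Hann window; rows are time, columns frequency."""
    window = np.hanning(frame)
    n = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop : i * hop + frame] * window for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame // 2 + 1)

# 30 seconds of a 440 Hz sine at an 8 kHz sample rate, as a stand-in clip
clip = np.sin(2 * np.pi * 440 * np.arange(8000 * 30) / 8000)
spec = spectrogram(clip)
```

A net would then train on images like `spec` (usually log-scaled and mel-binned first) against the song labels.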
Sounds great, let me know when you have it working.
Sorry, how would that help?
What's inside an Android phone that does local audio matches? That's apparently a small enough data set for them to just drop it into every device, and it does a halfway decent job of spotting pop music. It's not going to tell me which track off "These Guys Are From England" I'm listening to, but it knows stuff like "Africa" and "thank u, next", and I can imagine that sort of thing would be a decent start?
Are you sure it's not using a network API?
They say it operates locally, but it has a small database (blog post, technical paper), so it's not useful in multiple dimensions. There's a better remote service accessible through Google Assistant.
But none of that is made accessible in any useful way. There's no public API.
What makes you think that these matches aren't all network-based?
Just because it tends to operate very quickly at times does not mean that it is a local operation.
I'm very aware of the network. I use it a lot. But universal ambient network access is not yet a reality even in my relatively high-population country. So there are two possible explanations for why my phone is still quite capable of telling me which Shakira song my friend is playing from a tinny Bluetooth speaker in the stone cottage we're renting in the middle of nowhere, where there's no Wi-Fi and no mobile phone signal:
1. The COSPAS-SARSAT return link (a global, very-low-bandwidth communication channel piggybacked on the various GPS-style satellites operated by superpowers) is finished, and working, and enabled in my phone, but Google has bribed the contracting parties (that would be the Russian and American governments, as that bilingual name hints, so er... seems unlikely?) to use it not as part of the global distress system but to enable Google phones to have a Shazam-style feature.
2. Google's AI division figured out how to teach a machine to do some semblance of the trick that enables humans to know which popular song they can hear, from recognisable fragments of sound.
Ok that's all fascinating but "does the secret, private Android API use the network" is not terribly relevant to the question I asked.
It seems like anything useful will be based on AcoustID:
I'm using audfprint (https://github.com/dpwe/audfprint) for a thing at work, and it works pretty well. Unfortunately it has one large limitation which you might not be able to work around: it recognizes recordings, not songs. The same band playing the same song twice will never have sufficient precision to have the same timings both times, so the fingerprints of those recordings will be different. (I have some ideas on how to get around that, but they're rather speculative.) It has other technical problems as well, but working around those took me less than a day; I doubt you'll have any trouble with them either. If you want to give it a shot, here are some instructions:
The first step is to generate fingerprints for all of your shows, using audfprint's "precompute" function. I recommend specifying the --samplerate 8000 --density 40 options; this down-samples the audio more than the default, but I find that it gets a higher match rate with fewer hashes per second that way.
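To make that first step concrete, here's a sketch of the precompute invocation wrapped in a small driver. The --precompdir flag is my reading of audfprint's README, and the paths are placeholders:

```python
import subprocess

def precompute_cmd(show_path, outdir="fingerprints"):
    """audfprint 'precompute' invocation with the options suggested above."""
    return [
        "python", "audfprint.py", "precompute",
        "--samplerate", "8000", "--density", "40",
        "--precompdir", outdir, show_path,
    ]

# import glob
# for show in sorted(glob.glob("shows/*.mp3")):
#     subprocess.run(precompute_cmd(show), check=True)
```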
Then import the fingerprint files into a database (actually a very simple hash table) using the "new" command. This is where the limitations are: in order to support >6-hour shows, you'll need to specify --maxtimebits 20. However, this leaves only 12 bits for the file id, so it will only be able to recognize 4096 unique files. I see that you've done just over 5k shows, so that's not quite enough. You could make two databases, or you could modify hash_table.py to dump the hashes into a better database, something with a btree index that can grow over time. For my own experiments I've just used sqlite3. I use the apsw library rather than python's sqlite3 library because it was easier to install that and use an up-to-date version of sqlite instead of the ancient one that came with my distro. It's still an incomplete hack though so I haven't sent a patch upstream yet.
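The sqlite route gestured at there might look like this. The schema is my own guess at what audfprint's (hash, file id, time offset) triples need, not the commenter's actual hack:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # use an on-disk file for 5k shows
con.execute("""
    CREATE TABLE hashes (
        hash        INTEGER NOT NULL,  -- landmark hash from audfprint
        file_id     INTEGER NOT NULL,  -- plain integer: no 12-bit ceiling
        time_offset INTEGER NOT NULL   -- frame offset within the show
    )
""")
con.execute("CREATE INDEX idx_hash ON hashes(hash)")  # btree index, grows freely

# Two shows sharing one hash -- the collision case the lookup must handle
con.executemany("INSERT INTO hashes VALUES (?, ?, ?)",
                [(48879, 5001, 120), (48879, 17, 4400)])
hits = con.execute("SELECT file_id, time_offset FROM hashes WHERE hash = ? "
                   "ORDER BY file_id", (48879,)).fetchall()
```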
Now you have to query against your database. Find some songs you have metadata for and feed them into audfprint's "match" command. Make sure to specify the same sample rate and density options you specified when making your database, and use the --find-time-range option to get all the info about which part of the query matches which part of the reference. That'll tell you when those recordings were played.
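And the query step as a sketch, mirroring the sample rate and density options used when the database was built (the database filename and query path are placeholders):

```python
import subprocess

def match_cmd(query_path, dbase="shows.db"):
    """audfprint 'match' invocation with the time-range reporting enabled."""
    return [
        "python", "audfprint.py", "match", "--dbase", dbase,
        "--samplerate", "8000", "--density", "40",
        "--find-time-range", query_path,
    ]

# subprocess.run(match_cmd("known-songs/africa.mp3"), check=True)
```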
Of course commercial services exist which already have a database of music that you can use for queries, but I haven't tested that aspect of things so I have no idea what the coverage is like. YMMV.
Yeah, the answer I'm taking from the replies here is:
So how about a second question: do you know someone who works at Shazam? How can I convince that person to just run their code on my files and send me the result?
Not sure why it didn't show up, but I posted a reply 3-4 days ago about using ACRCloud, which I was able to run on some of your webcasts. They permit 10k requests a day in their free trial.
beets is a pretty spiffy CLI for music management, and has an AcoustID plugin.
Alternatively, there's a decent GUI program on Mac called Yate that might do the trick.
As a warning, AcoustID fingerprinting takes seconds per track, so it can get slow for large numbers of files.
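If you go the beets route, its AcoustID plugin is called chroma, and enabling it is a small config change. A hypothetical config.yaml fragment (option names per my reading of the plugin docs):

```yaml
# config.yaml -- enable beets' AcoustID fingerprinting plugin
plugins: chroma

chroma:
    auto: yes    # fingerprint files as they're imported
```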