bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

New Audio Hashing technique and sample application (with source code)

Mon Feb 13, 2017 4:35 pm

As discussed with Zag, I have created a sample application that creates a file hash for MP3/FLAC files while extracting additional TAG/ID3 meta-data.

The MusicBrainz algorithm might be more accurate, but it's also a lot heavier (in both CPU and bandwidth) than the much simpler hash I'm suggesting.

The algorithm itself is based on a modified version of the OpenSubtitles.org code:
http://www.yanniel.info/2012/01/open-su ... elphi.html

Unlike the OpenSubtitles.org hash, the offsets of the two hashed regions are determined by the file size. This supports smaller files and lets larger TAG data (such as embedded images) change without affecting both hashes, unless the embedded image pushes the file size from under 2048 KiB to over 2048 KiB.

The purpose of the hashing at offsets close to the start and the end of the file is to ensure that if a TAG editing tool modifies either the beginning or the end of the file, at least one of the two hashes should survive and allow us to get meta-data on a specific file.
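
As a rough illustration of the idea above, here is a minimal Python sketch (it is not the Delphi code from the repository). The block size, the 1 MiB margin, the quarter-of-file rule for small files and all function names are assumptions made for this example; only the 2048 KiB boundary and the pairing of a start-side and an end-side hash come from the post. The checksum is a wrapping 64-bit word sum in the style of the OpenSubtitles hash, without the file size mixed in.

```python
import os
import struct

BLOCK_SIZE = 64 * 1024        # assumed; borrowed from the OpenSubtitles hash
SIZE_BOUNDARY = 2048 * 1024   # the 2048 KiB boundary mentioned in the post
MARGIN = 1024 * 1024          # assumed distance from each end for large files
MASK64 = 0xFFFFFFFFFFFFFFFF


def sum64(data):
    """Wrapping sum of the little-endian 64-bit words in data."""
    usable = len(data) - len(data) % 8  # ignore a trailing partial word
    total = 0
    for (word,) in struct.iter_unpack("<Q", data[:usable]):
        total = (total + word) & MASK64
    return total


def pick_offsets(file_size):
    """Hypothetical rule: fixed margin for large files, scaled for small ones."""
    block = min(BLOCK_SIZE, file_size)
    margin = MARGIN if file_size >= SIZE_BOUNDARY else file_size // 4
    first = min(margin, file_size - block)        # counted from the start
    second = max(file_size - margin - block, 0)   # counted from the end
    return first, second


def dual_hash(path):
    """Return (hash1, hash2) as 16-digit hex strings."""
    size = os.path.getsize(path)
    first, second = pick_offsets(size)
    with open(path, "rb") as f:
        f.seek(first)
        hash1 = sum64(f.read(BLOCK_SIZE))
        f.seek(second)
        hash2 = sum64(f.read(BLOCK_SIZE))
    return f"{hash1:016X}", f"{hash2:016X}"
```

With this rule, a tag editor that grows or shrinks the tag at the start of a large file shifts the first block but leaves the second block at the same distance from the end, so hash2 survives.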

Delphi 7 compatible source code:
https://github.com/bLightZP/Audio-File- ... udioDB.com

Compiled binary:
http://zoomplayer.com/t/AudioHash.zip

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Mon Feb 13, 2017 9:31 pm

Thanks for this. I'm going to look at it properly in the next few days, but on initial quick tests it looks very fast and accurate.

I wonder if the XML can be simplified a little so that the path and filename are separated?

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: New Audio Hashing technique and sample application (with source code)

Mon Feb 13, 2017 9:44 pm

Of course, this is why I posted this: for feedback. I'll split the path and name in the next build.

I looked into using the AcoustID Fingerprinter, but it seems it only submits the data to their servers and it's not command line driven either.
This means that adding support for an AcoustID hash would require someone to create either a DLL or EXE file that can generate the hash and then I could bundle it into the AudioHash code.

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Mon Feb 13, 2017 9:55 pm

Regarding the MusicBrainz AcoustID, take a look at their command-line program, fpcalc.exe.

It's installed with MusicBrainz Picard in the main directory and outputs a text hash. It's slow and I can't work out how to use the JSON output, but it could be another hash to include (we could store both quite easily).

EDIT: After a few tests with fpcalc, I found the matching code quite hard to work with. Apparently just comparing the strings in PHP is not enough. A FLAC and a directly converted MP3 show only 33% similarity, and the comparison is very slow, so it is a bit beyond my small brain :)
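
Plain string comparison tends to fail here because a Chromaprint fingerprint is a sequence of 32-bit integers: near-identical audio usually differs by a few bits per value and may be shifted by a few items. Below is a minimal Python sketch of a bit-level comparison, assuming the raw integer form that `fpcalc -raw` prints. The function names and the shift window are choices made for this example, not part of AcoustID's own matcher.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return differing / (32 * n)


def best_similarity(fp_a, fp_b, max_shift=10):
    """Try small alignments in both directions and keep the best score (0.0 to 1.0)."""
    best = 0.0
    for shift in range(-max_shift, max_shift + 1):
        a = fp_a[shift:] if shift >= 0 else fp_a
        b = fp_b if shift >= 0 else fp_b[-shift:]
        if not a or not b:
            continue
        best = max(best, 1.0 - bit_error_rate(a, b))
    return best
```

Since unrelated fingerprints would be expected to agree on roughly half of their bits by chance, a score only means something relative to that baseline.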

I'll look into XML/CSV file submission via the API tomorrow, should be pretty easy I think.

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: New Audio Hashing technique and sample application (with source code)

Tue Feb 14, 2017 12:45 am

I'm not sure what the metrics are; 33% might be an amazing match if other audio tracks only reach a 0.001% match.

Check this out:
https://acoustid.org/webservice#lookup

They have a web service that converts the hashes into MusicBrainz IDs.

I may be able to use their API to convert the hash into a MusicBrainz ID, but I'm not sure how that will work with regard to their usage license, and it won't be fast (they limit clients to 3 queries/sec) unless they provide faster DB access.
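
A minimal Python sketch of what such a lookup loop might look like, assuming the `v2/lookup` endpoint described on the page linked above. The client key is a placeholder you would obtain from AcoustID, and the `Throttle` helper is a hypothetical name introduced here to stay under the 3 queries/sec limit; no request is actually sent in this sketch.

```python
import time
from urllib.parse import urlencode

LOOKUP_URL = "https://api.acoustid.org/v2/lookup"


def build_lookup_url(client_key, duration, fingerprint):
    """Build a lookup URL that asks only for MusicBrainz recording IDs."""
    query = urlencode({
        "client": client_key,          # placeholder application key
        "duration": int(duration),     # track length in seconds, as reported by fpcalc
        "fingerprint": fingerprint,    # compressed fingerprint string from fpcalc
        "meta": "recordingids",
    })
    return f"{LOOKUP_URL}?{query}"


class Throttle:
    """Hypothetical helper: blocks so that calls are at least 1/max_per_second apart."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each request keeps a batch run within the limit; at 3 queries/sec, a 10,000-track collection would need close to an hour.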

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Wed Feb 15, 2017 2:48 pm

I've written a manual XML importer for now which is up and running, so if anyone wants to send me an XML file of their collection feel free.

You can see the result here for a test song I did. I'll update the thread soon with more feedback.

http://www.theaudiodb.com/track/34705640

This can of course easily be made available on the API for hash lookups now, which will return the MusicBrainz Artist, Album and Track ID for an item ;)

There are obvious problems like filename matching, but it's looking good so far.

I'd recommend everyone tag their music with MusicBrainz Picard so the naming is perfect for our site to match the records. Picard is pretty automated.

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Wed Feb 15, 2017 3:15 pm

I didn't have perfect success with the hashing algorithm in testing, but the 2nd hash is working in most cases...

I get a different hash if I remove the tag completely. Everything else seems to work fine though.

The label at the start of each line below describes the change I made. It's the same file all the way through, with small changes.

original: filename="2.mp3" fileext="mp3" filesize="5793818" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197" image="True"
changefilename: filename="1.mp3" fileext="mp3" filesize="5793818" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197" image="True"
yearchange: filename="1.mp3" fileext="mp3" filesize="5793818" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197" image="True"
totallyremovetag: filename="1.mp3" fileext="mp3" filesize="5789611" hash1="9F65097AD1E38352" hash2="EDD5972919703D39" image="True"
addimage: filename="1.mp3" fileext="mp3" filesize="5903937" hash1="CC2E2B80F6A3C3AC" hash2="0AF56452067CF197" image="True"
retag with picard: filename="1.mp3" fileext="mp3" filesize="5791797" hash1="F46F746965C28CF8" hash2="0AF56452067CF197" image="True"
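
The results above can be checked mechanically. The following Python sketch is an illustration only: it embeds the hash values from the lines above and reports which hashes still match the original after each change. The helper names are made up for this example.

```python
import re

ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

RESULTS = """\
original filename="2.mp3" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197"
changefilename filename="1.mp3" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197"
yearchange filename="1.mp3" hash1="659D200D3B4F2BAD" hash2="0AF56452067CF197"
totallyremovetag filename="1.mp3" hash1="9F65097AD1E38352" hash2="EDD5972919703D39"
addimage filename="1.mp3" hash1="CC2E2B80F6A3C3AC" hash2="0AF56452067CF197"
retag with picard filename="1.mp3" hash1="F46F746965C28CF8" hash2="0AF56452067CF197"
"""


def parse_line(line):
    """Split a result line into its leading label and a dict of attributes."""
    label, _, _ = line.partition(" filename=")
    return label.strip(), dict(ATTRIBUTE.findall(line))


def surviving_hashes(original, changed):
    """Names of the hashes that are identical in both records."""
    return [key for key in ("hash1", "hash2") if original[key] == changed[key]]


records = [parse_line(line) for line in RESULTS.splitlines()]
_, original = records[0]
survivors = {label: surviving_hashes(original, attrs) for label, attrs in records[1:]}

for label, kept in survivors.items():
    print(f"{label}: {', '.join(kept) if kept else 'no hash survived'}")
```

Only the complete tag removal loses both hashes; adding an image and retagging with Picard each keep hash2.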

I used Mp3tag to remove the ID3 tag completely, and MusicBrainz Picard to retag the file once I had removed the tag.

Here is the MP3 file I used for testing:

http://www.theaudiodb.com/testfiles/01- ... magine.mp3 (it's creative commons licensed)

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: New Audio Hashing technique and sample application (with source code)

Thu Feb 16, 2017 1:20 pm

Some MP3 files may have multiple TAGs, for example "ID3v2" at the beginning of the file and "ID3v1" at the end of the file.
If the tool removing the tags removes both, thus modifying both the beginning and end of the file, both hashes get invalidated.
The only solution for such cases would be the audio fingerprinting method...
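
To make the failure case concrete, here is a small Python sketch that locates the two tag regions in an MP3. It assumes the standard layout: an ID3v2 tag at the start, with its size stored as a 4-byte synchsafe integer, and a fixed 128-byte ID3v1 tag at the end that begins with "TAG". The function names are invented for this illustration, and other tag formats (APE, Lyrics3) are ignored.

```python
def synchsafe_to_int(four_bytes):
    """Decode the 4-byte ID3v2 size field (7 useful bits per byte)."""
    value = 0
    for byte in four_bytes:
        value = (value << 7) | (byte & 0x7F)
    return value


def audio_bounds(data):
    """Return (start, end) of the bytes that lie between the two tags."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + synchsafe_to_int(data[6:10])   # 10-byte header plus tag body
        if data[5] & 0x10:                          # footer flag (ID3v2.4) adds 10 bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                                  # fixed-size ID3v1 tag
    return start, end
```

If a tool strips both regions, every byte of audio moves relative to the start of the file and the file also shrinks at the end, which would explain why neither hash matched in the "totallyremovetag" test.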

I updated the hashing tool to separate the filepath and filename into different fields as requested (same link).
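
For anyone writing their own exporter, a record with separate path and name fields could be built along these lines in Python. This is a sketch only: the element name `file` and the attribute name `filepath` are guesses for illustration, and the real AudioHash output may use different names. The other attribute names follow the sample lines posted earlier in the thread.

```python
import ntpath
import xml.etree.ElementTree as ET


def file_element(full_path, filesize, hash1, hash2):
    """Build one XML record with the folder and the file name in separate attributes."""
    folder, name = ntpath.split(full_path)               # Windows-style path handling
    ext = ntpath.splitext(name)[1].lstrip(".").lower()
    return ET.Element("file", {
        "filepath": folder,      # hypothetical attribute name
        "filename": name,
        "fileext": ext,
        "filesize": str(filesize),
        "hash1": hash1,
        "hash2": hash2,
    })


record = file_element(r"C:\Music\Imagine\1.mp3", 5793818,
                      "659D200D3B4F2BAD", "0AF56452067CF197")
print(ET.tostring(record, encoding="unicode"))
```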

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Thu Feb 16, 2017 1:42 pm

Thanks, it's working as expected.

I'm going to run it on my full music collection later as a test run. Will report back.

zag
Site Admin
Posts: 1246
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: New Audio Hashing technique and sample application (with source code)

Fri Feb 17, 2017 5:39 pm

I've been testing with FLAC files, but unfortunately I don't get any tag information for them.

Is it possible to add it?
