bLight
Registered User
Posts: 35
Joined: Wed Jan 11, 2017 2:42 pm

API issues

Wed Jan 11, 2017 2:54 pm

I've begun integrating the API into an open-source Zoom Player plugin I'm writing.

I stumbled upon a few issues.

For example:
http://www.theaudiodb.com/api/v1/json/[apikey]/searchalbum.php?a=The%20Adventures%20Of%20Priscilla%20Queen%20Of%20The%20Desert
or
http://www.theaudiodb.com/api/v1/json/[apikey]/searchalbum.php?a=Hamilton
both return nothing: no error, and the HTTP status is 200.


Another issue (and I'm not 100% sure the database is at fault here, I'm still investigating) is that I'm not sure whether the "strDescriptionEN" value is UTF-8 encoded before it's sent out. For example, take a look at:
http://www.theaudiodb.com/api/v1/json/[apikey]/searchalbum.php?s=Adele&a=21

"21 is the second studio album by British singer Adele. It was released on 24 January 2011 in most of Europe, and on 22 February 2011 in North America. The album was named after the age of the singer during its production. 21 shares the folk and Motown soul influences of her 2008 debut album 19, but was further inspired by the American country and Southern blues music to which she had been exposed during her 2008\u201309 North American tour An Evening with Adele... "

zag
Site Admin
Posts: 1189
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: API issues

Thu Jan 12, 2017 9:36 am

Morning bLight,

First let me say I'm a big fan of Zoom Player. I used to use it back in the days when Home Theatre software needed an external player. Very nice interface you have there :)

Regarding your comments:

1) Searching by album name only is not implemented at the moment, but it probably can be if needed. The only reason I didn't do it is that searching by album name alone may be a little inefficient on the API (we have 157,674 albums in the database). Let me know how you're using it and I can probably introduce it.

2) I remember a discussion about the JSON encoding from years ago when we launched. I believe the JSON specification requires a certain encoding, and I think we comply with that. I know the Kodi scrapers certainly accept special characters now without any kind of post-processing. The database fields use the "utf8_general_ci" collation, and I use the standard PHP json_encode function to return the API data. Let me know how you get on with it.
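
For anyone else reading along, the "\u2013" in the Adele description is a standard JSON string escape rather than an encoding fault: it stands for U+2013 (en dash), and the "09" after it is ordinary text. As far as I know, PHP's json_encode emits such escapes for non-ASCII characters by default. A minimal Python sketch, using a shortened fragment of the response as sample input:

```python
import json

# Shortened fragment of the strDescriptionEN value as it arrives on the wire.
# The raw string keeps the backslash escape exactly as the API sends it.
raw = r'{"strDescriptionEN": "during her 2008\u201309 North American tour"}'

# Any compliant JSON parser turns the escape back into the real character.
record = json.loads(raw)
text = record["strDescriptionEN"]

print(text)
# Only after parsing does it make sense to pick a byte encoding such as UTF-8.
print(text.encode("utf-8"))
```

So the plugin only needs to run the response through a real JSON parser; no extra decoding step should be required.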

bLight
Registered User
Posts: 35
Joined: Wed Jan 11, 2017 2:42 pm

Re: API issues

Thu Jan 12, 2017 1:31 pm

I semi-expected your answer on the JSON encoding; as I wrote, I wasn't entirely sure.

With regard to album-only search, I'm currently trying to find the best way to automatically detect various folder naming schemes.

Some albums simply don't have an artist, or multiple artists are involved (like the two examples I posted above), so I run an album-only search based on the folder name (this appears as an option in the API tutorial post).
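
A rough sketch of what that lookup could look like in Python. The "Artist - Album" folder scheme and both helper names are assumptions made for illustration; only the searchalbum.php endpoint and its s/a parameters come from the URLs earlier in this thread.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted in this thread; the key is filled in by the caller.
BASE = "http://www.theaudiodb.com/api/v1/json/{key}/searchalbum.php"


def parse_folder_name(folder):
    """Split an assumed 'Artist - Album' folder name into (artist, album).

    Returns (None, album) when no ' - ' separator is present.
    """
    artist, sep, album = folder.partition(" - ")
    if sep and artist.strip() and album.strip():
        return artist.strip(), album.strip()
    return None, folder.strip()


def build_search_url(folder, api_key):
    """Build an artist+album query, or an album-only query as a fallback."""
    artist, album = parse_folder_name(folder)
    params = {"a": album}
    if artist is not None:
        params = {"s": artist, "a": album}
    # quote_via=quote encodes spaces as %20, matching the URLs above.
    return BASE.format(key=api_key) + "?" + urlencode(params, quote_via=quote)


print(build_search_url("Adele - 21", "[apikey]"))
print(build_search_url("Hamilton", "[apikey]"))
```

Real folder names are messier than this (years, disc numbers, different separators), so the parser would need more schemes before it is useful in practice.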

I would welcome any other suggestions on improving search accuracy.

I posted an early screenshot of what the scraping results currently look like in ZP:
https://www.facebook.com/zoomplayer/pho ... 28/?type=3

BTW,
If you're open to suggestions, it is possible to improve 'searching for track' accuracy by using a hash of the file's content instead of a track name. It would mean a larger data set because of multiple versions of the same track, and probably other complications on the back end, but it would make identification 99.999% accurate. This is how the opensubtitles.org API lookups work.

zag
Site Admin
Posts: 1189
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: API issues

Thu Jan 12, 2017 2:38 pm

Searchalbum should now accept just the album name.

Example:
http://www.theaudiodb.com/api/v1/json/1 ... m.php?a=21

Unfortunately, it didn't work for either of your examples as (I think) they are wildcard searches. This method only matches exact hits for performance reasons. We sometimes serve 10 million hits a night, so I have to be very careful about using too much CPU time on these kinds of searches.

My advice is to try to get the MusicBrainz ID for any tracks, albums and artists you search for. Once you have that unique ID, it's very easy to look up artwork and details from a number of online sites such as ours. You can either use the MusicBrainz web service to search for the proper album name or use one of our other search methods.

Regarding hashes, TADB is already set up to store any type of file hash or AcoustIDs. But so far I have never found an easy-to-use Windows app that can create the file hashes from a folder of MP3s and export them to a text file or CSV. If you know of (or can write) something like that, I'd happily expose hash lookups to the song ID on the API. It is very easy to implement once we have some hash data.

NOTE: A lot of people change their file tagging, which will affect the file hash, so this might not be as effective as you think. I'd still like to do it though :)

bLight
Registered User
Posts: 35
Joined: Wed Jan 11, 2017 2:42 pm

Re: API issues

Thu Jan 12, 2017 7:34 pm

Of the samples I gave, one is a movie soundtrack and the other is a theater play soundtrack. I'm not sure what you mean by wildcard searches.

I did mean to ask you about fuzzy searches. For example a search for "Coldplay / Viva la Vida" failed, while a search for "Coldplay / Viva la Vida or Death and All His Friends" worked. It would be nice if the database would return some fuzzy results sorted by best match.

Another case: "REM / Automatic For The People" fails while "R.E.M. / Automatic For The People" works.
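
To make the request concrete, here is a small Python sketch of the kind of matching I have in mind. It is purely illustrative and says nothing about how the server works: normalize() and rank_matches() are hypothetical helpers, and the candidate list is sample data.

```python
import difflib
import re


def normalize(name):
    """Casefold, drop punctuation and collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(name.split())


def rank_matches(query, candidates):
    """Sort candidates by best match: prefix hits first, then by similarity."""
    q = normalize(query)

    def score(candidate):
        n = normalize(candidate)
        ratio = difflib.SequenceMatcher(None, q, n).ratio()
        return (n.startswith(q), ratio)

    return sorted(candidates, key=score, reverse=True)


# "R.E.M." and "REM" compare equal once punctuation is stripped.
print(normalize("R.E.M.") == normalize("REM"))

albums = ["Parachutes", "Viva la Vida or Death and All His Friends", "X&Y"]
print(rank_matches("Viva la Vida", albums)[0])
```

Scoring every row like this is too slow to run server-side against the whole database, but something similar could be applied client-side to a short list of results.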

You're right about tag editing affecting the hash. The opensubtitles.org hash works by hashing certain parts of the file (beginning/end).

That won't work for audio files with tags, but it can be partially circumvented by taking, say, the middle 64KB of an audio file to generate the hash.
So as long as the tag editing tool doesn't modify the file's start, the hash will be maintained.

As far as creating the hashes goes, that's not really an issue. I can even make a tool for you and use a hash similar to opensubtitles.org's (with just a different starting index), so you'll have nearly complete pre-existing code samples in many programming languages.

The major issue (in my view) is how to populate the database.

I can't afford MusicBrainz.

zag
Site Admin
Posts: 1189
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: API issues

Thu Jan 12, 2017 9:38 pm

bLight wrote:I did mean to ask you about fuzzy searches. For example a search for "Coldplay / Viva la Vida" failed, while a search for "Coldplay / Viva la Vida or Death and All His Friends" worked. It would be nice if the database would return some fuzzy results sorted by best match.


Thanks, I just fixed that by adding fuzzy searching after the album name. Try again and it should work. This is about as good as it can get with the current API, as MySQL has a number of limitations on search and indexes.

http://www.theaudiodb.com/api/v1/json/1 ... 0la%20Vida

bLight wrote:Another case: "REM / Automatic For The People" fails while "R.E.M. / Automatic For The People" works.

Yes, this is unfortunate... REM should work using an artist-only search, but not when an album is given as well. It's a limitation of our API at this time, sorry.

http://www.theaudiodb.com/api/v1/json/1 ... .php?s=rem

bLight wrote:As far as creating the hashes goes, that's not really an issue. I can even make a tool for you and use a hash similar to opensubtitles.org's (with just a different starting index), so you'll have nearly complete pre-existing code samples in many programming languages.

The major issue (in my view) is how to populate the database.


That's an interesting thought. If you do have a tool, I will take a look. At first I can possibly import things from a CSV text file.

It looks like ID3 tags can be at the start and the end of music files, which makes it a bit more complicated.

https://acoustid.org/ is another option but I have not looked into it much.

bLight
Registered User
Posts: 35
Joined: Wed Jan 11, 2017 2:42 pm

Re: API issues

Sun Jan 15, 2017 10:24 am

I've given the hashing issue more thought...

Assumptions:
1. In my experience, ID3 (and other format) tags are always at the beginning or end of the file.
2. ID3 tags are almost always (99.999%) under 1MB in size (usually much smaller, but sometimes they include an embedded image).
3. Most audio tracks are over 2MB in size.
4. Editing a tag doesn't modify both the start and the end of the file.

In the above scenario, the best hashing approach would be to take two hashes:
1. Hash 64KB starting 1MB into the file.
2. Hash 64KB starting 1MB before the end of the file (from FileSize-1MB to FileSize-1MB+64KB).

If a hash search matches at least one of the two hashes we can be fairly certain there's a match.
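
The two-hash idea above can be sketched in a few lines of Python. MD5 is only a stand-in here (the hash function is still an open choice at this point), and window_hashes() and is_match() are names made up for the example.

```python
import hashlib

WINDOW = 64 * 1024       # 64KB hashed per window
MARGIN = 1024 * 1024     # 1MB skipped at each end of the file


def window_hashes(path):
    """Return (hash1, hash2) as hex strings, or None for files under 2MB."""
    with open(path, "rb") as f:
        f.seek(0, 2)                     # seek to the end to learn the size
        size = f.tell()
        if size < 2 * MARGIN:
            return None
        f.seek(MARGIN)                   # window 1 starts 1MB into the file
        hash1 = hashlib.md5(f.read(WINDOW)).hexdigest()
        f.seek(size - MARGIN)            # window 2 starts 1MB before the end
        hash2 = hashlib.md5(f.read(WINDOW)).hexdigest()
    return hash1, hash2


def is_match(a, b):
    """Treat two files as the same track if at least one hash agrees."""
    return a[0] == b[0] or a[1] == b[1]
```

If a tag edit grows the start of the file, window 1 shifts and hash1 changes, but hash2 is measured from the end and stays the same, and vice versa. That is why a single matching hash is accepted.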

zag
Site Admin
Posts: 1189
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: API issues

Sun Jan 15, 2017 1:33 pm

bLight wrote:I've given the hashing issue more thought...

Assumptions:
1. In my experience, ID3 (and other format) tags are always at the beginning or end of the file.
2. ID3 tags are almost always (99.999%) under 1MB in size (usually much smaller, but sometimes they include an embedded image).
3. Most audio tracks are over 2MB in size.
4. Editing a tag doesn't modify both the start and the end of the file.

In the above scenario, the best hashing approach would be to take two hashes:
1. Hash 64KB starting 1MB into the file.
2. Hash 64KB starting 1MB before the end of the file (from FileSize-1MB to FileSize-1MB+64KB).

If a hash search matches at least one of the two hashes we can be fairly certain there's a match.


Yep, that's pretty much how I would approach it. It will mean some filler tracks are excluded due to being less than 1MB, but that's not a huge issue.

The biggest issue with hashing is doing it fast, so any hash creator must be as quick as possible.

bLight
Registered User
Posts: 35
Joined: Wed Jan 11, 2017 2:42 pm

Re: API issues

Sun Jan 15, 2017 7:18 pm

I propose using the same hashing code as opensubtitles.org as it's already been available for a while and it's rather simple to implement.

Since it only processes 128KB (2 x 64KB), it should be pretty speedy.

As for the minimum file size, we can reduce the hash windows' distance from the start/end of the file if the file size is under 1MB, maybe dropping to 250KB (which would still cover any small embedded image in smaller audio files).
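
A hedged Python sketch of how this could look. The checksum is the opensubtitles.org-style 64-bit sum of little-endian words, but two things here are my own assumptions and not settled in this thread: the file-size term that opensubtitles.org adds is left out (a tag edit changes the file size), and the cut-over picks the largest margin that fits twice into the file. All function names are hypothetical.

```python
import struct

WINDOW = 64 * 1024                       # 64KB per window
MARGINS = (1024 * 1024, 250 * 1024)      # preferred 1MB, fallback 250KB
MASK64 = 0xFFFFFFFFFFFFFFFF


def checksum64(chunk):
    """Sum the chunk as little-endian 64-bit words, wrapping at 2**64."""
    usable = len(chunk) - len(chunk) % 8     # ignore a trailing partial word
    words = struct.unpack("<%dQ" % (usable // 8), chunk[:usable])
    return sum(words) & MASK64


def pick_margin(size):
    """Largest margin that fits twice into the file, or None if too small."""
    for margin in MARGINS:
        if size >= 2 * margin:
            return margin
    return None


def audio_hashes(path):
    """Return two 16-digit hex checksums, or None for very small files."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        margin = pick_margin(size)
        if margin is None:
            return None
        f.seek(margin)                       # window measured from the start
        h1 = checksum64(f.read(WINDOW))
        f.seek(size - margin)                # window measured from the end
        h2 = checksum64(f.read(WINDOW))
    return "%016x" % h1, "%016x" % h2
```

Each file costs two seeks and two 64KB reads regardless of its size, which is where the speed comes from.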

I'll write a sample open-source app in Delphi to mass-hash the files.
What fields other than filename/hash1/hash2 are required for the CSV output?
myvideo.avi,AABBCCDDEEFFAABBCCDDEEFFAABBCCDD,AABBCCDDEEFFAABBCCDDEEFFAABBCCDD,???

Do you need the full file path? File size? Other details?

zag
Site Admin
Posts: 1189
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: API issues

Mon Jan 16, 2017 10:08 am

File size and file extension would be useful.

Also, could it scan subfolders?

MP3 and FLAC are the two main formats I see.
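
Putting the requested fields together, the export step could look roughly like this in Python. The column order, the header row and the scan_to_csv() name are assumptions for illustration; hash_file stands for whatever hashing routine is finally agreed on and is passed in by the caller.

```python
import csv
import os

AUDIO_EXTENSIONS = {".mp3", ".flac"}     # the two main formats mentioned


def scan_to_csv(root, out_path, hash_file):
    """Walk root and its subfolders, writing one CSV row per audio file.

    hash_file is a caller-supplied function: path -> (hash1, hash2) or None.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "hash1", "hash2", "size", "extension"])
        for folder, _dirs, files in os.walk(root):
            for name in sorted(files):
                ext = os.path.splitext(name)[1].lower()
                if ext not in AUDIO_EXTENSIONS:
                    continue
                path = os.path.join(folder, name)
                hashes = hash_file(path)
                if hashes is None:       # file too small to hash, skip it
                    continue
                writer.writerow([name, hashes[0], hashes[1],
                                 os.path.getsize(path), ext.lstrip(".")])
```

Using the csv module rather than joining strings by hand keeps file names that contain commas or quotes from breaking the import.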
