bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Matching album names containing invalid file path characters

Thu Feb 23, 2017 3:17 pm

I'm trying to make a folder match for this album:
http://www.theaudiodb.com/album/2120877

Since the album name contains ":" in the database, I don't think I can ever make a match if the source is based on a folder name (which can never have ":" as part of the name).

I tried matching "The best of R.E.M. In Time" and "In Time The best of R.E.M" (I can't write "R.E.M." with the final dot, as Windows doesn't allow trailing dots in file names), neither of which works.

Here's another example:
With this album "http://www.theaudiodb.com/album/2224366" containing the name "Suzanne Vega - Tried and True: The Best of Suzanne Vega", I can't make a match if I search for "Suzanne Vega" and "Tried and True The Best of Suzanne Vega".

Perhaps it would be best to make the search fuzzy by stripping certain characters from the DB value when making the match, for example ":", "/", "\", ".", "-", etc.

zag
Site Admin
Posts: 1199
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: Matching album names containing invalid file path characters

Thu Feb 23, 2017 3:32 pm

Is that because you are searching from a Windows folder name?

I'm not sure I want to change this, as most searches are done from the tags, where : is a perfectly valid character.

http://www.theaudiodb.com/api/v1/json/1/searchalbum.php?s=R.E.M.&a=In%20Time:%20The%20Best%20of%20R.E.M.%201988-2003


There is a workaround where you could search for the artist and list all the albums in that case:

http://www.theaudiodb.com/api/v1/json/1/discography.php?s=R.E.M.


Then do some kind of fuzzy search post processing yourself.
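A minimal Python sketch of that client-side post-processing, assuming the album titles have already been pulled out of the discography.php response (its JSON layout isn't shown in this thread, so the parsing step is left out). The helper names `normalize` and `best_album_match` and the 0.8 cutoff are made up for illustration; the matching itself uses the standard library's `difflib`.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, drop punctuation (keeps letters, digits, underscores), collapse spaces."""
    kept = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(kept.split())


def best_album_match(folder_name, album_names, cutoff=0.8):
    """Return the album title closest to folder_name, or None if nothing clears the cutoff."""
    # Map each normalized title back to the original title from the API.
    by_key = {normalize(title): title for title in album_names}
    hits = difflib.get_close_matches(
        normalize(folder_name), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


# Titles as they might come back for one artist (illustrative list only).
albums = [
    "Out of Time",
    "In Time: The Best of R.E.M. 1988-2003",
    "Automatic for the People",
]
print(best_album_match("In Time The Best of R.E.M 1988-2003", albums))
```

This costs one discography request per artist instead of one search per album, and the cutoff can be tuned to trade missed matches against false ones.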

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: Matching album names containing invalid file path characters

Thu Feb 23, 2017 9:25 pm

Yes, but that will be a DB hit for you.

How about a parameter that specifies whether to ignore the problematic characters? That way you don't change the default behavior.

Ideally, a probability-based approach to name matching would work best, but I'm not sure how many resources you have to spare for that.

zag
Site Admin
Posts: 1199
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: Matching album names containing invalid file path characters

Fri Feb 24, 2017 2:07 pm

bLight wrote: Yes, but that will be a DB hit for you.

How about a parameter that specifies whether to ignore the problematic characters? That way you don't change the default behavior.

Ideally, a probability-based approach to name matching would work best, but I'm not sure how many resources you have to spare for that.


Yeah, that's the real issue: we serve 10 million API hits a night from a single server, so any changes really need to be thought through in terms of resourcing.

I will look at making an option to ignore non-windows characters later tonight.

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: Matching album names containing invalid file path characters

Sun Mar 12, 2017 9:31 am

Any update on this?

zag
Site Admin
Posts: 1199
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: Matching album names containing invalid file path characters

Mon Mar 13, 2017 3:12 pm

Can you try this:

http://www.theaudiodb.com/api/v1/json/1/searchalbum.php?a=In%20Time%20The%20Best%20of%20R.E.M.%201988-2003&i=1


Notice the i=1 at the end, which tells the search to ignore the irregular character ":".

The speed is VERY slow for this though :( Almost 0.2 seconds per query, when a normal one takes about 0.0002 seconds (roughly 1,000 times slower).

This is only a test to see if it works first; I have not rolled it out to other API methods. I am hesitant because of the speed issue, even if it's optional.
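For anyone scripting against this test, here is a small Python sketch that builds the same request URL from a folder name. It only constructs the URL and makes no request; the function name `album_search_url` is made up, and `i=1` is the test-only flag described above, currently on searchalbum.php only.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://www.theaudiodb.com/api/v1/json/1/searchalbum.php"


def album_search_url(folder_name: str) -> str:
    """Build a searchalbum.php URL with the experimental i=1 flag set."""
    # quote_via=quote encodes spaces as %20; the default would use '+'.
    query = urlencode({"a": folder_name, "i": 1}, quote_via=quote)
    return f"{SEARCH_URL}?{query}"


print(album_search_url("In Time The Best of R.E.M. 1988-2003"))
```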

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: Matching album names containing invalid file path characters

Tue Mar 14, 2017 11:02 pm

There is another solution that would eliminate the speed-hit.

You can pre-calculate it for all album entries by adding another field to the DB that contains the stripped-down string.
If DB space is not an issue, it's a far faster solution.
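To make the idea concrete, here is a rough Python sketch using an in-memory SQLite database as a stand-in (I don't know what the site actually runs on). The table name `album`, the column `title_stripped`, and the stripping rule are all hypothetical. The point is that the stripping happens once at insert time, so the lookup becomes a plain indexed equality match.

```python
import re
import sqlite3


def strip_chars(name: str) -> str:
    """Lowercase and keep only ASCII letters, digits and single spaces.

    ASCII-only for brevity; non-Latin titles would need a Unicode-aware rule.
    """
    kept = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(kept.split())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, title_stripped TEXT)"
)
# The index on the precomputed column is what keeps lookups fast.
conn.execute("CREATE INDEX idx_album_title_stripped ON album (title_stripped)")

title = "In Time: The Best of R.E.M. 1988-2003"
conn.execute(
    "INSERT INTO album (title, title_stripped) VALUES (?, ?)",
    (title, strip_chars(title)),
)


def find_album(conn, query):
    """Look up an album by comparing stripped forms on both sides."""
    row = conn.execute(
        "SELECT title FROM album WHERE title_stripped = ?",
        (strip_chars(query),),
    ).fetchone()
    return row[0] if row else None


print(find_album(conn, "In Time The best of R.E.M 1988-2003"))
```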

I'll test the new code shortly.

zag
Site Admin
Posts: 1199
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: Matching album names containing invalid file path characters

Wed Mar 15, 2017 8:55 am

Hmm yes, that's actually a good idea. I will see if that's possible too.

Are you still wanting all those characters removed? Or just the ":"?

bLight
Registered User
Posts: 39
Joined: Wed Jan 11, 2017 2:42 pm

Re: Matching album names containing invalid file path characters

Wed Mar 15, 2017 1:16 pm

As a base, I would like an option to ignore all invalid Windows filename characters.

And to improve the chances of matching on a name alone, even legitimate characters should probably be removed, but this should be made very clear in the docs, something like:

&stripchars=1

Which would remove all characters: !@#$%^&*()_-+=[]{};:"'<>| etc...

Here's my rationale:
1. The number of album/artist names that require these characters for matching should be close to zero, and the client can always switch to the full-name search mode if it suspects that a query may fail because of the characters in the name.
2. If the char-stripping is done on both sides (the client strips the search request, and the DB holds the stripped names precalculated in a separate field), I believe the chances of a successful DB hit will go up by quite a nice percentage.

For example:
If the band is named "R.E.M." but someone named the folder "REM", then using the logic above, it would still match it easily.
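A quick Python sketch of the two levels I have in mind. The function names are made up, `WINDOWS_INVALID` is the set of characters Windows rejects in file names, and using `string.punctuation` for the aggressive mode is just one possible definition of what "stripchars" removes:

```python
import string

# Characters that Windows does not allow in file or folder names.
WINDOWS_INVALID = '<>:"/\\|?*'


def strip_windows_invalid(name: str) -> str:
    """Base mode: remove only the characters a Windows folder name cannot hold."""
    table = str.maketrans("", "", WINDOWS_INVALID)
    return " ".join(name.translate(table).split())


def strip_all_punctuation(name: str) -> str:
    """Aggressive mode: remove every ASCII punctuation character."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(name.translate(table).split())


print(strip_windows_invalid("In Time: The Best of R.E.M. 1988-2003"))
# Both spellings of the band collapse to the same key in aggressive mode.
print(strip_all_punctuation("R.E.M.") == strip_all_punctuation("REM"))
```

The base mode only fixes names that could not be typed as a folder name; the aggressive mode is what makes "R.E.M." and "REM" compare equal.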

The only downsides I can think of are the extra DB space, which we already discussed, and possible false matches, but I think the additional positive matches would far outweigh any failures.

This is actually a metric you can measure over time by seeing the % of successful hits with or without the "stripchars" flag.

zag
Site Admin
Posts: 1199
Joined: Wed Jun 06, 2012 9:19 am
Country: United Kingdom

Re: Matching album names containing invalid file path characters

Thu Mar 16, 2017 12:21 pm

Yep, I don't see any major downsides really; it's a great idea.

I'm working on it now, but it will be a little while until the data is copied into the tables.
