paxomega
13-05-2009 22:55:31
I'm using your program religiously :-) I have it scheduled as a daily task and it mostly works flawlessly. However, I have come across a problem that is rather annoying for me.
I have automated broadcatching of the shows I'm interested in. I have it set up to get the MKV version if it's available; if not, I get the SD (AVI) version of the file. I keep looking for the higher-def version until the next ep comes out, so I sometimes end up with two versions of the same episode (I don't mind this, as sometimes it's a battle between "gimme gimme NOW" and "I want the best quality").
My problem is that your missing-episode check appears to work purely off the season/episode tuple as the key, in conjunction with the likely show name.
This fails in my situation because it misses the second version of the episode (usually the MKV, as it's larger, comes out later and takes longer to obtain).
Is there any way you can cater for this scenario?
I have several suggestions as to how this could be achieved:
easiest option
- maintain multiple *missing* lists based upon file type (avi, mpeg, mkv) using your current processing logic. This may lead to more false identifications, depending upon how much you favour the season/episode tuple.
more powerful (more complicated) option
- identify a show by its actual name first and its episode coordinates second
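A minimal Python sketch of the easiest option, assuming files have already been parsed into (show, season, episode, filename) tuples; `missing_by_type` and `TRACKED_TYPES` are hypothetical names, not part of TVRename. It keeps the current (show, season, episode) key but tracks one missing list per file extension, so having the AVI no longer hides a missing MKV:

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical: the file types to track separately, taken from the post.
TRACKED_TYPES = ("avi", "mpeg", "mkv")

def missing_by_type(expected, have, types=TRACKED_TYPES):
    """Return {extension: sorted (show, season, episode) keys not yet on disk in that type}.

    expected: (show, season, episode) keys the show guide says exist.
    have: (show, season, episode, filename) tuples for files already on disk.
    """
    expected = list(expected)  # may be iterated once per tracked type
    # Index what is on disk by extension, so one file type no longer "covers" another.
    present = defaultdict(set)
    for show, season, episode, filename in have:
        ext = PurePath(filename).suffix.lower().lstrip(".")
        present[ext].add((show, season, episode))
    # Same (show, season, episode) key as before, but one missing list per type.
    return {
        ext: sorted(key for key in expected if key not in present[ext])
        for ext in types
    }
```

The trade-off mentioned above shows up directly: an episode you only ever grab as MKV stays in the avi and mpeg lists forever, so in practice you would probably only track the types you actually collect.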
I know this is difficult, as different groups have different standards, using "." or "_" or "-" as separator characters ... and some abbreviate show names (btvs -> Buffy the Vampire Slayer, CSI-NY -> CSI New York). The separator char is easy to handle; it's mostly "." these days (I defined a set of possible separator chars and counted them in the name, and the most frequent one was the one being used).
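The separator-counting trick could look something like this in Python. The candidate set (the three characters above plus a plain space) and the function names are assumptions for illustration:

```python
from collections import Counter

# Assumed candidate separators: ".", "_" and "-" from the post, plus a space.
SEPARATORS = "._- "

def detect_separator(name, candidates=SEPARATORS):
    """Return the candidate character that occurs most often in name, or None."""
    counts = Counter(ch for ch in name if ch in candidates)
    if not counts:
        return None
    # most_common(1) gives [(char, count)]; on a tie the first-seen character wins.
    return counts.most_common(1)[0][0]

def split_words(name):
    """Split name on its detected separator, dropping empty pieces."""
    sep = detect_separator(name)
    if sep is None:
        return [name]
    return [part for part in name.split(sep) if part]
```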
The abbreviated names make lookups more difficult, but still possible I think. If you can obtain a set of all the possible show names (I'm not sure if this is available from the TVDB API), you could readily compute all the possible abbreviations and then match against those as well.
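A rough sketch of that lookup, assuming a list of show names is available from somewhere (TVDB or otherwise; the names in the usage are made up). It indexes each show under its plain initials and under its punctuation-free full name; `initials`, `squash`, `build_abbreviation_index` and `lookup` are hypothetical helpers:

```python
import re
from collections import defaultdict

def initials(name):
    """First letter of each alphanumeric word, upper-cased: "Buffy the Vampire Slayer" -> "BTVS"."""
    return "".join(word[0].upper() for word in re.findall(r"[A-Za-z0-9]+", name))

def squash(name):
    """Lower-case name with everything but letters and digits removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def build_abbreviation_index(show_names):
    """Map upper-case initials and squashed full names to the set of shows producing them."""
    index = defaultdict(set)
    for show in show_names:
        index[initials(show)].add(show)
        index[squash(show)].add(show)
    return index

def lookup(index, fragment):
    """Return every show whose initials or squashed full name match fragment."""
    key = squash(fragment)
    # .get() on a defaultdict does not insert missing keys.
    return index.get(key.upper(), set()) | index.get(key, set())
```

Lookups return a set because different shows can share the same initials, so an ambiguous abbreviation would still need a tie-breaker.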
Another useful trick I used was convention over configuration. Most users will have some or all of their old shows in correctly named folders, so you could use those folder names as the starting set of names to compute abbreviations for, and also to decide which file/folder name most closely matches the TVRename name.
e.g. I have the folder TV/CSI-NY/Season 01
and these new files:
/New/Crime Scene Investigators : New York S01E03
/New/CSI: New York S01E06
/New/CSI:NY 2x13
Computing the abbreviated version of the name before the episode coordinates (first letter of each word, with capitals ALWAYS being included) yields CSINY, CSINY and CSINY, which also matches the abbreviation of the CSI-NY folder.
Another matching idea: you could save the previous matching history (perhaps using your episode tuple system), so the system learns that both "Crime Scene Investigators : New York" and "CSI:NY" were successfully mapped back to the show CSI-NY.
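The abbreviation rule, applied to the folder and file names in the example, might be sketched in Python as follows. It assumes episode coordinates look like S01E03 or 2x13 (the two styles in the example) and that paths use "/"; `EPISODE_RE`, `abbreviate`, `show_part` and `matching_folders` are hypothetical names:

```python
import re

# Assumed episode-coordinate styles, taken from the example: S01E03 and 2x13.
EPISODE_RE = re.compile(r"[Ss]\d+[Ee]\d+|\d+x\d+")

def abbreviate(name):
    """First letter of each word, plus every other capital letter inside that word."""
    letters = []
    for word in re.findall(r"[A-Za-z0-9]+", name):
        letters.append(word[0].upper())
        letters.extend(ch for ch in word[1:] if ch.isupper())
    return "".join(letters)

def show_part(path):
    """Drop leading folders and everything from the episode coordinates onwards."""
    base = path.rsplit("/", 1)[-1]
    match = EPISODE_RE.search(base)
    return base[:match.start()] if match else base

def matching_folders(path, folder_names):
    """Return the show folders whose abbreviation equals the file's abbreviation."""
    wanted = abbreviate(show_part(path))
    return [folder for folder in folder_names if abbreviate(folder) == wanted]
```

One design caveat: always keeping capitals means CamelCase names such as "MacGyver" abbreviate to "MG", so this probably works best as one matching key among several rather than the only one.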
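The learning idea can be as simple as a dictionary from a normalised raw name to the show it was confirmed against. A minimal in-memory Python sketch, where `AliasMemory` and `normalise` are hypothetical names; persisting the dictionary between runs (for example as JSON) is left out:

```python
import re

def normalise(name):
    """Case- and punctuation-insensitive key: "CSI:NY" and "csi.ny" both become "csiny"."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

class AliasMemory:
    """Remembers which raw show-name strings were previously matched to which show."""

    def __init__(self):
        self._aliases = {}

    def learn(self, raw_name, show):
        """Record that raw_name was successfully mapped to show."""
        self._aliases[normalise(raw_name)] = show

    def lookup(self, raw_name):
        """Return the previously learned show for raw_name, or None."""
        return self._aliases.get(normalise(raw_name))
```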
I know that's a big wishlist but it doesn't hurt to share ideas.
I recall another thread where opening the source was discussed ... if you are open to the idea, I would be interested in participating. I realise this is a .NET (C#?) app, but as a Java dev I can learn to Sentence case my APIs ;-) System.Out.Print vs system.out.print :-)
There are a large number of places for hosting code such as this: Google Code (especially if they are starting to use Mercurial instead of SVN), GitHub (if you prefer Git) or SourceForge etc. I recall you were concerned about losing control of the source, so using a distributed tool (Mercurial, Git etc.) would let others readily develop independently while you maintain control without granting commit access rights (you can do this with non-distributed tools too, but it's easier). The worst case I can see is that someone forks your code and you end up with a competitive situation, e.g. XBMC vs Plex (Mac version), where they cherry-pick bits and pieces from each other's source trees (as they're still open).
Hmmmm ... I've spent way too long on this.
Thanks VERY much for all your efforts to date
pax