True show identification - not episode missing= likely show

paxomega

13-05-2009 22:55:31

I'm using your app religiously :- ) I have it scheduled as a daily task and it mostly works flawlessly. However, I have come across a problem that is rather annoying for me.

I have automated broadcatching of the shows that I'm interested in. I have it set to try to get the MKV version if it's available; if not, I get the SD (AVI) version of the file. I keep looking for the higher-def version until the next ep comes out, so I sometimes get two versions of the same show (I don't mind this, as sometimes it's a battle between "gimmie gimmie NOW" and "I want the best quality").

My problem is that your missing algorithm appears to work purely off the season/episode tuple as the key, in conjunction with the likely show name.

This fails in my situation as it misses the 2nd version of the episode (usually the MKV as it's larger, comes out later and takes longer to obtain).

Is there any way you can cater for this scenario?

I have several suggestions as to how this could be achieved:
easiest option
- maintain multiple *missing* lists based upon file type (avi, mpeg, mkv) using your current processing logic; this may lead to more false identifications, depending upon how much you favour the season/episode tuple

more powerful (complicated option)
- identify a show by its actual name first and episode coordinates second
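
To illustrate the easiest option, here is a minimal sketch of a per-file-type missing list. It is written in Python purely for illustration and is not how TVRename works internally; the function name `missing_by_type`, its parameters and the default file types are all made up for this example.

```python
from collections import defaultdict


def missing_by_type(wanted, present, file_types=("avi", "mkv")):
    """Return the missing (season, episode) tuples for each file type.

    wanted:  iterable of (season, episode) tuples that should exist.
    present: iterable of (season, episode, extension) tuples found on disk.
    All names here are hypothetical, not part of TVRename.
    """
    have = defaultdict(set)
    for season, episode, ext in present:
        have[ext.lower()].add((season, episode))
    # One missing list per file type, each using the same tuple key.
    return {ft: sorted(set(wanted) - have[ft]) for ft in file_types}
```

With this, an episode that is already present as an AVI would still show up as missing in the MKV list, which is the behaviour I'm after.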

I know this is difficult as different groups have different standards using "." or "_" or "-" as separator characters ... and some tend to abbreviate show names (btvs - buffy the vampire slayer, CSI-NY -> CSI New York). The separator char is easy to handle - it's mostly "." these days (I defined a set of possible sep chars and counted them in the name - the most present was the one being used).
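
The separator-counting trick can be sketched roughly as follows. This is an illustrative Python version, not my original code; the candidate set and the name `detect_separator` are assumptions made for the example.

```python
from collections import Counter

# Candidate separators mentioned above; the exact set is an assumption.
SEPARATORS = "._-"


def detect_separator(filename, candidates=SEPARATORS):
    """Return the candidate separator that occurs most often, or None."""
    counts = Counter(ch for ch in filename if ch in candidates)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Note that the dot before the file extension is counted too, so in practice it may be worth stripping the extension first.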

The abbreviated name makes lookups more difficult - but possible I think. If you can obtain a set of all the possible show names (not sure if this is available from the TVDB API) you could compute all the possible abbreviations readily and then also match against those.

Another useful trick I used was "convention" over configuration. Most users will have some or all of their old shows in correctly named folders. So you could use those as the starting set of names to compute abbreviations for, and also check which file/folder name most closely matches the TVRename name.

eg. TV/CSI-NY/Season 01
and I have
/New/Crime Scene Investigators : New York S01E03
/New/CSI: New York S01E06
/New/CSI:NY 2x13

Computing the abbreviated version (first letter of each word, with capitals ALWAYS being included) of the part before the episode coordinates yields CSINY, CSINY and CSINY :-)
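
A rough sketch of that abbreviation rule, matched against existing folder names, might look like this. It is Python for illustration only; the regular expression for the episode coordinates and the names `show_part`, `abbreviate` and `match_folder` are assumptions for this example, and only the "S01E03" and "2x13" styles are handled.

```python
import re

# Episode coordinates such as "S01E03" or "2x13" (an assumed, incomplete set).
EPISODE = re.compile(
    r"(?<![A-Za-z0-9])(?:S\d{1,2}E\d{1,2}|\d{1,2}x\d{1,2})(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def show_part(name):
    """Return the text before the episode coordinates, or the whole name."""
    match = EPISODE.search(name)
    return name[:match.start()] if match else name


def abbreviate(name):
    """First character of each word plus any other capitals, upper-cased."""
    letters = []
    for word in re.findall(r"[A-Za-z0-9]+", show_part(name)):
        letters.append(word[0])
        letters.extend(ch for ch in word[1:] if ch.isupper())
    return "".join(letters).upper()


def match_folder(name, folders):
    """Return the first folder whose abbreviation equals that of name."""
    key = abbreviate(name)
    for folder in folders:
        if abbreviate(folder) == key:
            return folder
    return None


print(abbreviate("Crime Scene Investigators : New York S01E03"))  # CSINY
print(match_folder("CSI:NY 2x13", ["Buffy the Vampire Slayer", "CSI-NY"]))  # CSI-NY
```

One limitation of this sketch: an all-lowercase abbreviation such as "btvs" collapses to just "B", so it would need a separate check against the already-computed abbreviations.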

Another matching idea is to save the previous matching history (perhaps using your episode tuple system), so the system could learn that both "Crime Scene Investigators : New York" and "CSI:NY" were successfully mapped back to the show CSI-NY.

I know that's a big wishlist but it doesn't hurt to share ideas.

I recall another thread where opening the source was discussed ... if you were open to the idea I would be interested in participating. I realise this is a Dot Net (C#??) App but as a Java Dev I can learn to Sentence case my APIs ;- ) System.Out.Print vs system.out.print :- )

There are a large number of places for hosting code such as this - Google code (esp if they are starting to use Mercurial instead of SVN), GitHub (if you prefer GIT) or SourceForge etc. I recall you were concerned about losing control of the source ... so using a Distributed tool (Mercurial, GIT etc) would allow others to develop independently readily and for you to maintain control without granting commit access rights (you can do so with other non distrib tools but it's easier). Worst case I can see is that someone forks your code. And you end up with a competitive situation - eg XBMC vs Plex (Mac version) where they cherry pick bits and pieces from each other's source tree (as it's still open).

Hmmmm ... I've spent way too long on this.

Thanks VERY much for all your efforts to date :-)
pax

sstteevvee

13-05-2009 23:51:13

Thanks for the feedback.. A few quick answers, as I should be going to sleep soon..

At the moment, it will only look for files if they are deemed to be missing. So, if "S01E03" is there, it won't even consider looking for a better version. A few people have asked for the ability to have it "upgrade" files if a better version is found, but that would take some redoing of how TVRename approaches things. It may not be too hard, but it's not something I've really thought about yet.

If a file is in the final "Media Library" folder, it is assumed to belong to that show - no matter what show name might be in it. If the season and episode number can be found, it will be renamed. For files being picked up in "finding and organising", then it has to have the show name in it. One thing I'm planning on doing soon (due to popular demand) is "alternate names" for a show, so you can tell it that "Buffy The Vampire Slayer" may also be named "BTVS" or "Buffy". It'll then rename them to whatever you prefer the show's name to be.
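
To give a feel for the "alternate names" idea, here is a minimal sketch in Python. It is for illustration only and is not the planned TVRename implementation; the configuration layout and the names `ALTERNATE_NAMES`, `build_lookup` and `preferred_name` are assumptions.

```python
import re

# Hypothetical user configuration: preferred show name -> alternate names.
ALTERNATE_NAMES = {
    "Buffy The Vampire Slayer": ["BTVS", "Buffy"],
}


def _normalise(name):
    """Lower-case the name and collapse separators into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def build_lookup(alternate_names):
    """Map every normalised name (preferred or alternate) to the preferred one."""
    lookup = {}
    for preferred, alternates in alternate_names.items():
        for name in [preferred, *alternates]:
            lookup[_normalise(name)] = preferred
    return lookup


def preferred_name(found_name, lookup):
    """Return the preferred show name for a name found in a file, or None."""
    return lookup.get(_normalise(found_name))
```

For example, with the configuration above, both "btvs" and "Buffy.The.Vampire.Slayer" would resolve to "Buffy The Vampire Slayer", while an unknown name returns None.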

The code is in C++.NET, and I'm currently using CVS (private server, just for my own backup/protection). Google code does look quite good - I've seen a couple of projects on there before.