Will you like it?

This is a Help or FAQ page

Please answer LibraryThing related questions here (and on the other Help and FAQ pages). To ASK questions, however, please use Talk.

Will you like it? is a predictive algorithm that attempts to assess whether you will like a book. It predicts how you will feel about the book and reports how certain that prediction is.

The algorithm is based on comparing libraries, and does not factor in the "star rankings" that people assign to books. If a book is shared by many people whose libraries are highly similar to yours, taking into account the relative popularity of the book, then you are predicted to like the book.

If there is a relatively small number of copies of the book in LibraryThing, then the certainty of the prediction is low.

How it works
What the algorithm is fundamentally telling you is "Do the 1,000 users most like you have a statistically lower or higher number of copies of the work?" That's it.
Simple as it may sound, that's a tough piece of data to calculate. Not in theory, but in practice. It requires calculating the 1,000 users, which is the hardest calculation on LibraryThing (it hits memory once for every work, and then once for every owner of that book). Then it does some rather complex statistical sampling and analysis, involving a few thousand more hits. I'm not going to spell that part out more, insofar as it's the key to moving it from "terrible" to "meh" and, well, I gotta have secrets, mom.
The basic problem with the method is that it's a secondary calculation--first LibraryThing needs to compute a good 1,000-members list for you. If it's not doing that well, Will you like it? won't be better, and will probably be worse. Second, because it's about who shares your books, it can fail to understand your individuality. As I wrote, most people who like Greek history--a big piece of my collection--also like Republican Roman history. I don't, but LibraryThing thinks I should.
Similarly, my two largest sets of books--Greek and Latin language and history, and computer programming--match up nicely with people who think they are above Harry Potter. People who read Plutarch and PHP in the original are snobs of a sort, generally, and have an unusually low number of Harry Potter books in their libraries. But I love Harry Potter! Well, that's life.
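
The core question above, whether your 1,000 most similar members hold a work more or less often than LibraryThing as a whole, can be sketched as a comparison of two ownership rates. The actual sampling and analysis are deliberately not spelled out here, so the one-sample z-score below is a stand-in chosen for illustration, not LibraryThing's method. The function name, its parameters, and the numbers in the usage note are all hypothetical.

```python
import math


def ownership_z_score(neighbor_owners: int, neighbor_count: int,
                      total_owners: int, total_members: int) -> float:
    """Compare a work's ownership rate among similar members to the site-wide rate.

    Illustrative stand-in only: a one-sample z-score for a proportion,
    not the undisclosed LibraryThing calculation.
    """
    if neighbor_count <= 0 or total_members <= 0:
        raise ValueError("counts must be positive")
    base_rate = total_owners / total_members          # how popular the work is overall
    if not 0.0 < base_rate < 1.0:
        raise ValueError("site-wide ownership rate must be strictly between 0 and 1")
    neighbor_rate = neighbor_owners / neighbor_count  # how popular it is among similar members
    # Standard error of a proportion under the assumption that
    # similar members own the work at the site-wide rate.
    std_error = math.sqrt(base_rate * (1.0 - base_rate) / neighbor_count)
    return (neighbor_rate - base_rate) / std_error
```

With made-up numbers, `ownership_z_score(30, 1000, 6000, 600000)` is positive (similar members own the work more often than average), while `ownership_z_score(100, 1000, 120000, 600000)` is negative, which is the Harry Potter situation described above. When a work has very few copies, the normal approximation behind a z-score is unreliable, which is one way to read the low-certainty caveat.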
Why not use ratings?
1. Ratings are not very useful. Most people like their books, and they rate them within a point or two of each other--the average is 3.80 stars and the standard deviation is 1.03. Further, only about 5 million books have ratings, out of 35 million books. When you're comparing small population samples--the people you are closest to who also have the book vs. the entire population of LibraryThing--having only 1/7 of all books rated is a problem. Think of it this way: If every one of your closest, bestest friends has seen a movie, that's good data. That one of them gave it three stars and you don't know what the others thought of it isn't good data.
Netflix uses ratings because we more often see movies we don't like. It's less of a commitment and we often see movies socially, ceding our movie choice to a loved one whose idea of a good time is a romantic comedy with a dog, not the bullets and mayhem that we like. So the ratings have a higher standard deviation--the data is more interesting. Books are different. Also, Netflix doesn't know every movie you've seen. It only knows how you've rated their stuff.
2. It's technical. For speed reasons LibraryThing stores a complete list of all books you have in memory, available at any time without going to the database. It also stores a list of every user who has a work. That is the theory for both lists; in practice only a portion of users and works is in memory--the portion it had to think about recently; the rest it gets off disk. To find your 1,000-closest we run thousands of queries against this memory--and then store the result in memory as a simple list.
The data is simple to keep the memory size down. After all, we have almost 600,000 members and 35 million books. It doesn't include ratings because ratings aren't as useful and would swell the data size. Getting it off the disk, however, would bring the site to a crashing halt.<ref>Message #51, Tim Spalding, "Will you like it?", "New features" group (Jan. 8, 2009, 12:01 pm).</ref>
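
The two in-memory lists described above, each member's works and each work's owners, behave like a pair of inverted indexes. The following is a minimal sketch of that layout and of one plausible way to rank similar members, by counting shared works. The class and method names are hypothetical, the raw shared-work count is an assumed similarity measure, and the site's real ranking and caching are not described in the source.

```python
from collections import Counter, defaultdict


class LibraryIndex:
    """Toy in-memory index: member -> works and work -> owners (no ratings stored)."""

    def __init__(self) -> None:
        self.works_by_member = defaultdict(set)  # member name -> set of work ids
        self.owners_by_work = defaultdict(set)   # work id -> set of member names

    def add(self, member: str, work_id: int) -> None:
        """Record that a member owns a work, updating both lists."""
        self.works_by_member[member].add(work_id)
        self.owners_by_work[work_id].add(member)

    def closest_members(self, member: str, limit: int = 1000) -> list:
        """Rank other members by number of shared works (an assumed measure)."""
        shared = Counter()
        # One lookup per work in the library, then one per owner of that work.
        for work_id in self.works_by_member.get(member, set()):
            for owner in self.owners_by_work[work_id]:
                if owner != member:
                    shared[owner] += 1
        # Sort by shared count (descending), then by name so ties are deterministic.
        ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
        return [name for name, _ in ranked[:limit]]
```

The result is a plain list of member names, mirroring the "simple list" kept in memory. Storing only ids and names, with no ratings, is what keeps each entry small in this sketch.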


Feature debuted: Jan. 8, 2009

Discussion of the feature:

(As of November 12, 2009, LibraryThing now has more than 920,000 members and over 45.5 million books.)

