Am I missing something or is the algorithm?

Talk: Recommendations function in LT

1aulsmith
Nov 12, 2010, 9:22 am

Recommendation: South Riding
Why? Karen Armstrong's Beginning the World

South Riding is a novel with a romance plot taking place in Yorkshire in the 1930s. Karen Armstrong's book is the continuation of her autobiography after she left the convent.

Both books are about single women living in England in the 20th century, but that's the only connection I can see. If that's the connection, I suspect it'll recommend Radclyffe Hall's The Well of Loneliness any day now ...

2jjwilson61
Nov 12, 2010, 11:35 am

LT Recommendations are mostly based on which books appear together in all the libraries in the system. So you're getting the above recommendation because there are a lot of libraries with Beginning the World that also have South Riding. There are other factors, such as rarer books affecting the algorithm, and I think tags and subjects of the two books have some influence, but mostly it's just whether those two books commonly appear together in libraries on LT.
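
A minimal sketch of the "books that appear together" idea, for illustration only. LibraryThing has not published its algorithm, so the member names, the toy libraries, and the function names `cooccurrence_counts` and `also_have` are all hypothetical; the sketch only counts how many libraries hold two books at once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each member's library as a set of titles.
libraries = {
    "member_a": {"Beginning the World", "South Riding", "Through the Narrow Gate"},
    "member_b": {"Beginning the World", "South Riding"},
    "member_c": {"Beginning the World", "The Well of Loneliness"},
    "member_d": {"South Riding", "The Well of Loneliness"},
}


def cooccurrence_counts(libs):
    """Count, for every unordered pair of books, how many libraries hold both."""
    counts = Counter()
    for books in libs.values():
        for pair in combinations(sorted(books), 2):
            counts[pair] += 1
    return counts


def also_have(book, libs):
    """Rank other books by how many libraries share them with `book`."""
    ranked = Counter()
    for (a, b), n in cooccurrence_counts(libs).items():
        if a == book:
            ranked[b] += n
        elif b == book:
            ranked[a] += n
    return ranked.most_common()


# In this toy data, South Riding shares two libraries with Beginning the World.
shared = also_have("Beginning the World", libraries)
```

Raw counts like these favour popular books, which is presumably why rarity is said to be one of the other factors.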

3aulsmith
Edited: Nov 12, 2010, 11:46 am

Okay the tags for women and England would do it then. I don't really like the idea of getting fiction recommended for non-fiction (unless it's by a human being who's read both books and can explain the connection).

So maybe I won't get The Well of Loneliness, since there may not be a lot of people with both nun books and lesbian books.

ETA: second paragraph

4lorax
Nov 12, 2010, 12:00 pm

3>

Tim hasn't divulged the exact details, but mostly it's commonality, not tags.

Why people trust one person who says "I liked both X and Y, so if you like X you should try Y" but not a hundred people who all say that together, because a computer is involved in making the aggregation of those hundred people, completely escapes me, but enough people feel that way that I've given up trying to talk them out of it.

5aulsmith
Nov 12, 2010, 1:38 pm

I guess it doesn't resonate with me because so many of the books in my library are not books I liked or even read. And if I did like them once, it doesn't mean I'd like them now.

It's actually kind of scary that LT might be recommending that people read some of the really abhorrent books I have in my library simply because the other person and I share a lot of books in common.

6lorax
Edited: Nov 12, 2010, 2:03 pm

5>

So put all those "abhorrent" books in a separate Collection and uncheck "Use for recommendations". You can't expect LT to read your mind. (And it's not just you, and not just "abhorrent" books. I see lots of people who trust individual Member Recommendations from complete strangers over automated recommendations that are produced from aggregates of many people, and I just don't understand it at all.)

7aulsmith
Nov 12, 2010, 3:10 pm

6: I am doing that (or rather the reverse, putting only "good" (for me) books in the one collection I have marked for recommendations). I understand that LT is using only that collection to generate my recommendations. Is it also only using that collection of books from my library to generate recommendations for other people? Or is it using my whole library, bad books and all, to aggregate recommendations for others?

8jjwilson61
Nov 12, 2010, 3:32 pm

7> It only works one way.

6> That won't work unless those books are not included in any other collections that have Use for Recommendations checked.
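
A small sketch of the rule described in this post, assuming a book counts toward recommendations whenever at least one collection containing it has "Use for recommendations" checked. The `Collection` class and `books_used_for_recommendations` are hypothetical names for illustration, not LibraryThing's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    use_for_recommendations: bool
    books: set = field(default_factory=set)


def books_used_for_recommendations(collections):
    """Union of all books that sit in at least one checked collection."""
    used = set()
    for collection in collections:
        if collection.use_for_recommendations:
            used |= collection.books
    return used


# "Book Q" is in an unchecked collection but also in a checked one,
# so under this assumed rule it still feeds the recommendations.
both = [
    Collection("Abhorrent", False, {"Book Q"}),
    Collection("Your library", True, {"Book A", "Book B", "Book Q"}),
]
used = books_used_for_recommendations(both)

# Removing it from the checked collection is what actually excludes it.
separated = [
    Collection("Abhorrent", False, {"Book Q"}),
    Collection("Your library", True, {"Book A", "Book B"}),
]
used_separated = books_used_for_recommendations(separated)
```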

9aulsmith
Nov 12, 2010, 5:20 pm

6: I see I was unclear in message 5. I meant that if my library has weird crap in it that I don't want other people to read, why shouldn't I assume that other people's libraries that are like mine do too? So why should I trust an aggregate recommendation based on that kind of data?

If a human being recommends a book and tells me why it's related to another book, at least I have some idea of how the two books might be related and that at least one other person liked both books.

10jjwilson61
Nov 12, 2010, 6:19 pm

Given that most recommendations seem reasonable to most people then the amount of weird crap in people's libraries may be assumed to be minimal. Or perhaps all the weird crap is different weird crap in different libraries so when you aggregate among hundreds of libraries the weird crap comes out in the wash (or something like that).

11Aerrin99
Nov 12, 2010, 8:10 pm

6> You can't expect LT to read your mind.

No, but I can expect it to use the data I provide in order to determine my opinion of the books in my library.

Gee, if only there were some easy-to-use and easy-to-quantify way that I did that, huh? ;)

12aulsmith
Nov 13, 2010, 8:45 am

10: Do we know that "most recommendations seem reasonable to most people"? Or do the people who think the recommendations aren't reasonable self-select out of using the feature?

13jjwilson61
Nov 13, 2010, 10:03 am

I think we know that most people who use talk to discuss features of LT find them reasonable. It seems to depend on your library. If you have a large chunk of a certain common genre, like mine, the algorithm picks more books of that genre and ignores the rest of your library which you might want to get recommendations on. But if it completely didn't work I think you'd hear a lot more complaining.

14eromsted
Nov 13, 2010, 5:28 pm

>1 aulsmith:
If you want to see more specifically where a recommendation came from and the "Why?" option is functioning you can click through to a work page listed under "Why?" and then to the more recommendations section.

In the recommendations for Beginning the World, South Riding comes in at #3 in "People with this book also have... (more common)" and #20 in "People with this book also have... (more obscure)." That's enough to get it to #9 overall and apparently into recommendations for your library, an aggregate of these aggregate work recs.

So in this case it was people having both books more often than statistically expected, not tags or subjects.
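
One way to picture "an aggregate of these aggregate work recs" is sketched below. The reciprocal-rank scoring is an assumption chosen for illustration, not LibraryThing's method, and the work names, rec lists, and `aggregate_recommendations` are hypothetical.

```python
from collections import defaultdict


def aggregate_recommendations(my_books, work_recs, top_n=5):
    """Combine per-work ranked rec lists into one list for a whole library.

    Each rec earns 1/rank from every owned work that lists it (an assumed
    scoring rule); books already in the library are skipped.
    """
    scores = defaultdict(float)
    for book in my_books:
        for rank, rec in enumerate(work_recs.get(book, []), start=1):
            if rec not in my_books:
                scores[rec] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical per-work rec lists, best match first.
work_recs = {
    "Work 1": ["Rec A", "Rec B", "Rec C"],
    "Work 2": ["Rec C", "Work 1", "Rec A"],
}
library_recs = aggregate_recommendations({"Work 1", "Work 2"}, work_recs)
```

Under a scheme like this, a book that ranks fairly high for even one owned work can surface in the library-level list.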

15auntSteelbreaker
Nov 13, 2010, 9:12 pm

Well, there are only 23 copies of Beginning the World, so you can't expect the recs to be perfect. I get a lot of faulty recommendations for Swedish books because the books I have read aren't that common on LT. I have, for example, on several occasions been recommended to read some of the most popular Swedish crime/mystery/whatever-crap just because a lot of Swedish readers happened to have that book. And it is obvious that if the Swedish user base were ten times as big I would not get those recs.

16lorax
Nov 13, 2010, 11:53 pm

9>

I think the idea is that everyone's weird crap is different, so it doesn't correlate.

To grossly oversimplify, say you have books A, B, and C, which reflect your tastes, and book Q, which doesn't, and which you have excluded from "use for recommendations". Of people who have A, B, and C, 95% of them have D; 5% have X, 5% have Y, and 5% have Z, which are all their "weird stuff." So you'll get the recommendation for D, but not for X, Y, or Z. On the other hand, if you put your confidence in a single individual, there's a chance you'll get a recommendation for Z. You may end up really liking it -- member recommendations are great for that sort of lateral, non-intuitive recommendation -- but for the higher-confidence recommendation I'd go for the numbers every time.

(Full disclosure: I'm building a recommendation engine based on collaborative filtering in my current job, so I'm somewhat predisposed to think it's a good technique.)
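
The A/B/C example above can be sketched as follows. This is a toy illustration of the argument, not LibraryThing's or anyone's production algorithm; the 100 simulated libraries, the 50% cut-off, and the function names are assumptions.

```python
from collections import Counter


def build_toy_libraries():
    """100 hypothetical libraries that all hold A, B and C."""
    libs = []
    for i in range(100):
        books = {"A", "B", "C"}
        if i < 95:
            books.add("D")  # 95% also hold D
        if i % 20 == 0:
            books.add("X")  # 5% hold X
        if i % 20 == 1:
            books.add("Y")  # 5% hold Y
        if i % 20 == 2:
            books.add("Z")  # 5% hold Z
        libs.append(books)
    return libs


def recommend(my_books, libs, min_share=0.5):
    """Return books held by at least `min_share` of libraries containing all of `my_books`."""
    neighbours = [lib for lib in libs if my_books <= lib]
    counts = Counter(book for lib in neighbours for book in lib - my_books)
    return {
        book: n / len(neighbours)
        for book, n in counts.items()
        if n / len(neighbours) >= min_share
    }


# Book Q is excluded from recommendations, so only A, B and C are passed in.
recs = recommend({"A", "B", "C"}, build_toy_libraries())
```

Only D clears the cut-off; X, Y and Z each appear in too few of the matching libraries to surface.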

17aulsmith
Nov 14, 2010, 9:57 am

16: In light of your full disclosure, let me toss out some more information for whatever it's worth to you.

The weird crap that I was thinking about in 9 included the following:

The Koran
Catcher in the Rye
Farewell to Arms
Judith Butler's Gender Trouble

Gender Trouble has the fewest holdings (ca. 1200) and Catcher the most (over 10,000). So, you can see that, especially in the case of Catcher, it's more one of your D cases than one of your X, Y, Zs. These are all books that are either assigned as textbooks or that everyone is supposed to read. So a lot of people have them, but whether they've read them or liked them isn't clear (unless you use ratings as Aerrin stated earlier).

I suspect the South Riding recommendation was the same kind of thing. It's a Virago classic, so it's probably one of those books that gets assigned as a text in certain classes in Great Britain, but who knows if the people who own it read it or liked it (or would have liked it if they hadn't been told they should ...). Based on the reviews, I might like parts of it, but it seems to be basically a romance, so it's really low on my list. I'm certainly not going to like it because I liked Through the Narrow Gate.

Which brings me to toffte's comment. Actually, since I thought tags were factored in, I had every expectation of Through the Narrow Gate generating a number of good recommendations. It's an ex-nun autobiography. I'm fond of real life stories of disaffected former religious. I've read a number of them, but Through the Narrow Gate is particularly well-written, so I put it on my list in the hopes of generating other recommendations for well-written, autobiographical pieces by former religious. This is actually where I was hoping recommendations could help me as it's hard to get at this topic with traditional library subject headings. So of all the ways that it's possible to look at Through the Narrow Gate (autobiography, autobiography of best selling author, second wave feminism, academic life in Great Britain in the early 1970s ...), it's only that one particular way that I care about.

I think this is what Netflix is trying to do with all those weird sub-sub-genres it keeps coming up with, though they haven't helped me that much yet. But one question I have is whether there's some way to use folksonomies to do this better? Is there some way for me to tap into other folks who read disaffected religious people's life stories and find the books they've read that I haven't? Because that's the kind of thing I really want to do.

18auntSteelbreaker
Nov 14, 2010, 11:04 am

I think what you have to do is take the recs for rare books with a grain of salt. For most books some of them ought to be good, but sometimes there will be strange things. For me it has been rather easy to ignore well known Swedish authors because I know they just happened to become recs, while other fiction-for-fiction recommendations seemed strange but I couldn't know that until actually looking at the book at the library or the book store.

What I would do to find books is to look at the full list of LT recs (opening up all of the specific lists) and check the ones that actually seem to be what I want. If you're lucky you find at least one other book on the specific subject you want that can give you more recs. Or you could go for tagmashes.

19lorax
Nov 14, 2010, 8:40 pm

17>

The recommendations are based on overdensities, though. If people with books like yours aren't any more likely to have Catcher than people with books unlike yours, it won't show up as a recommendation. So you can't point to overall popularity as an indicator that a book showing up in other people's libraries will automatically mean it shows up in yours.
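
A rough sketch of what "overdensity" could mean here: compare a book's share among libraries like yours with its share among everyone else. The ratio used below, the toy counts, and the names `make_libs` and `overdensity` are assumptions for illustration; the actual LT formula has not been published.

```python
def make_libs(total, holdings):
    """Build `total` toy libraries; `holdings` maps book -> number of libraries holding it."""
    libs = [set() for _ in range(total)]
    for book, n in holdings.items():
        for lib in libs[:n]:
            lib.add(book)
    return libs


def overdensity(book, similar_libs, other_libs):
    """Ratio of the book's share among similar libraries to its share elsewhere."""
    share_similar = sum(book in lib for lib in similar_libs) / len(similar_libs)
    share_other = sum(book in lib for lib in other_libs) / len(other_libs)
    if share_other == 0:
        return float("inf") if share_similar > 0 else 0.0
    return share_similar / share_other


# Hypothetical counts: Catcher is equally common in both groups,
# while South Riding is concentrated in the similar libraries.
similar = make_libs(10, {"Catcher in the Rye": 4, "South Riding": 3})
others = make_libs(100, {"Catcher in the Rye": 40, "South Riding": 2})

catcher_ratio = overdensity("Catcher in the Rye", similar, others)
riding_ratio = overdensity("South Riding", similar, others)
```

A ratio near 1 means the book is no more common among similar libraries than anywhere else, so popularity alone would not make it a recommendation; a ratio well above 1 is the overdensity being described.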

20aulsmith
Nov 15, 2010, 8:50 am

19:

If you can't go into this much detail, I understand. I also understand that you'd be answering generally, and not in terms of what LT is actually doing (or your own algorithm for that matter).

How does size of library factor in to deciding who has books "like" me? With a 7000+ book library, I'm creeping my way up into the top 50 libraries on LT. I suspect I'm in the 90th percentile for library size. With the science fiction bias on LT, I'm fairly likely to have a hundred or so books in common with a substantial minority of people on the site. So could all these libraries be counted as "like" mine and skew the "common book effect" for me and other large libraries, but not for smaller, less science fiction dominated libraries?

21auntSteelbreaker
Nov 15, 2010, 10:07 am

@20 Connections-based things might have that kind of effect, so Read-alike recs will be skewed in different ways depending on library size and reading style. My list is for example dominated by 20th century Swedish language classics, because I have a rather small library with a fair amount of books read by Swedish people with an interest in literature. But that is because these books usually range from 15 to 100 registered copies, and therefore are considered more important than for example internationally well known English language literature. So if your SF books are popular they will not have the same effect as your less well known books. For a library of my size it seems that the least mainstream of my genres will dominate. (Which should be the point of Read-alike recommendations.)

The book (work) based recommendations don't work that way. They don't care at all about your library but will instead focus on the owners of the specific book. If it is statistically expected that among the 150 owners of book A 0.7 users should also have book B, but in reality 11 users have it, it is likely to become a recommendation. So the size of your own library is not important.
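
The arithmetic in that example can be worked through as below. The site totals (1,500,000 libraries, 7,000 owners of book B) are invented so that the expected overlap comes out at 0.7, and the Poisson tail is only one assumed way to judge how surprising 11 is; nothing here is confirmed as LT's actual test.

```python
import math


def expected_overlap(owners_a, owners_b, total_libraries):
    """Expected number of A-owners who also own B if ownership were independent."""
    return owners_a * owners_b / total_libraries


def poisson_tail(observed, expected):
    """P(X >= observed) for a Poisson variable with mean `expected`."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical totals chosen to reproduce the 0.7 in the post.
expected = expected_overlap(owners_a=150, owners_b=7_000, total_libraries=1_500_000)
observed = 11

ratio = observed / expected            # how many times more common than chance
chance = poisson_tail(observed, expected)  # probability of seeing 11 or more by chance
```

Here the observed overlap is roughly sixteen times the expected one, and the chance of that happening at random is tiny, which is why such a pair would stand out regardless of how big your own library is.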

22jjwilson61
Nov 15, 2010, 10:31 am

20> If you can't go into this much detail, I understand. I also understand that you'd be answering generally, and not in terms of what LT is actually doing (or your own algorithm for that matter).

Are you under the impression that lorax works for LT? She doesn't. LT employees will have an iconic L next to their names when they post.

23aulsmith
Nov 15, 2010, 12:45 pm

22: No, I'm under the impression that she works for somebody else (unspecified) and that she is currently working on a recommendations algorithm which may or may not be proprietary. And that even though the data I bring up is from LT, she can only guess at how the algorithm works, so can't speak for it specifically.

21: Okay, so let me adjust my speculations in 9. Of the 23 people who own Beginning the World (I realized last night that I switched book titles in mid-stream. They are basically two parts of a two-part biography.), more own South Riding than would otherwise be expected. They have a statistical anomaly of owning more copies of South Riding than other folks on this site (it being a British classic and the site being predominantly U.S. residents). So, since I happen to have listed Beginning the World in my library, I "benefit" from this statistical anomaly by getting it recommended to me. This kind of algorithm does explain many of the useless recommendations I get based on my current recommendations collection. It doesn't explain the ca. 800 useless science fiction recommendations I got when I used to have my entire library generate the recommendations.

I think I should warn folks with science fiction collections that I just added the books I read for my paper on glossolalia in the 1970s, thereby increasing the statistical improbability of science fiction fans with glossolalia books. That should make for some recommendation weirdness!