Give higher weight to exact title matches for Touchstones
This is a continuation of the topic Give higher weight to exact title matches for Touchstones.
1MarthaJeanne
In the past two years the discussion has gotten up over the 200 mark, which is traditionally the time to start a new topic.
In the meantime, we still have the problem that touchstones often give ridiculous results; in some cases the work wanted doesn't show up at all, or is so buried in a long list that it is difficult to find. Many users don't know how to change the result, and even more don't know how to force it if an easy change doesn't work.
Touchstones are an important part of how discussions of books happen on LT, so having them be so unreliable for so long is a real issue.
2klarusu
>1 MarthaJeanne: Well said & thanks for the new thread!
3gilroy
Oh, I'll add an example to the new thread, just cause:
The Forgotten Girls by Alexa Steele comes up with Pride and Prejudice by Jane Austen.
4lesmel
Company Town by Madeline Ashby gets you The Adventures of Tom Sawyer -- b/c hello, someone's review is in the title and another person put the publisher details in the title. (psssst: that's BAD DATA)
5lorax
Here, for the record, is the original text of my post #1 in the original thread, dated from March 2015:
If I recall correctly, Touchstones use the work search feature under the hood and provide the results sorted by overall popularity of the work. This results in frequent mind-boggling results where the default result, which many users don't know how to change, doesn't actually match any of the words in the title, but connects because each of the words appears in some edition or another.
(Recent case in point: the first Touchstone suggestion for my new ER win, The New Wild, is "The Portrait of Dorian Gray". Which does admittedly contain the word "The"; one edition has "New" in the title, and a couple people have misspelled the author's name without the "e". That's all it takes to rocket this book with 22,000 copies to the top of the list.)
Touchstone searches, though, are different from title searches where people may be uncertain of the title or not have a particular title in mind; the user has a particular title in mind. A minimal change would be to do a phrase search rather than a words-search, so the Touchstone logic searches for, in this case, "The New Wild" - providing four results including the correct one. (The correct one doesn't show up at all in the existing version.)
The New Wild now gives "Wuthering Heights" as the first touchstone, with Dorian Gray as the second and the correct work in the 23rd position.
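To make the words-search versus phrase-search difference concrete, here is a minimal Python sketch. It is only an illustration of the idea in the quoted post, not LibraryThing's actual code: the record layout, the function names, and the sample data (including the copy counts and author names) are all made up.

```python
import re

def tokens(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def words_match(query, work):
    """Words search: every query word appears somewhere in the work's
    edition titles or author spellings, not necessarily together."""
    bag = set()
    for field in work["titles"] + work["authors"]:
        bag.update(tokens(field))
    return all(word in bag for word in tokens(query))

def phrase_match(query, work):
    """Phrase search: the query words appear consecutively in one title."""
    q = tokens(query)
    for title in work["titles"]:
        t = tokens(title)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            return True
    return False

def search(query, works, matcher):
    """Return names of matching works, most popular (most copies) first."""
    hits = [w for w in works if matcher(query, w)]
    return [w["name"] for w in sorted(hits, key=lambda w: -w["copies"])]

# Hypothetical records for illustration only; not real LibraryThing data.
works = [
    {"name": "The Picture of Dorian Gray", "copies": 22000,
     "titles": ["The Picture of Dorian Gray", "Dorian Gray (New Edition)"],
     "authors": ["Oscar Wilde", "Oscar Wild"]},
    {"name": "The New Wild", "copies": 50,
     "titles": ["The New Wild"],
     "authors": ["Example Author"]},
]

print(search("The New Wild", works, words_match))
print(search("The New Wild", works, phrase_match))
```

With the words matcher, the popular work qualifies because "The", "New", and "Wild" each turn up in some edition title or author misspelling, and popularity then puts it first. With the phrase matcher, only the work whose title actually contains "The New Wild" survives.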
6Petroglyph
It's been two years, and it's been getting progressively worse. Please fix this! Please!
A recent example for me: Paris stories by Mavis Gallant comes up as Dickens' A tale of two cities. Some of the other titles suggested by the algorithm are: Madame Bovary; Candide; A moveable feast; The travels of Marco Polo; In our time; For your eyes only; Green hills of Africa; The mystery of the 99 steps; César Birotteau; Kobbe's complete opera book; ...
7jnwelch
Good to see a new thread for this. I wonder what the status is, as I understood the problem was being worked on.
8lorax
>7 jnwelch:
Really? Nothing in the other thread ever gave me any indication or hope that the problem was being worked on. Loranne said she would put it on her list to mention to the developers, but that's far from meaning there's actual code being written or even looked at.
9jnwelch
>8 lorax: It probably was wishful thinking. I thought someone said they hoped to have some changes in January. (Ha!) Maybe I dreamed it.
10laytonwoman3rd
>9 jnwelch: @lorannen did say she'd tried to bring it up to the development peeps in mid-January, Joe. http://www.librarything.com/topic/189572#5802687
We must keep agitating.
11jnwelch
>10 laytonwoman3rd: Ah, excellent, thanks, Linda. I'm really good at annoying agitating people.
13wester
Just a few recent examples then, to remind everybody how bad it is.
On Looking yields as touchstone The Girl With The Dragon Tattoo.
How we die yields The Very Hungry Caterpillar.
Cheap yields The Hound of the Baskervilles.
14laytonwoman3rd
*washing machine agitation sounds*
16charl08
My favourite one recently: looking for Julia by Otto de Kat.
Brings up: Fahrenheit 451.
Then...
Lord of the Flies
Hamlet
Romeo and Juliet
Les Misérables
Mansfield Park
Charlie and the Chocolate Factory
18Storeetllr
My latest "favorite": The Children. A loooong list of off-the-wall titles without the correct one ever showing up! Here's the image of the first 22. There are at least 250 titles listed. (Here's a link to the actual book: http://www.librarything.com/work/13779.)
19jnwelch
>18 Storeetllr: Wow. *shaking my head*
20lorax
That may be a new record - my previous "best" was one where the correct title eventually showed up at spot 218.
21laytonwoman3rd
*bump* Not enough grumbling going on around here lately!
22Petroglyph
It's been two years and a month since the original thread got started, and things have gotten progressively worse. If any staff members read this: please do something about it.
23timspalding
The simple truth is that this is a hard problem. The search structures that power sitesearch generally have a hard time with works--and all their editions. Setting up a whole new search system, which is what we feel is necessary, and which we did for catalog search, is resource constrained--we don't have enough servers for it (and for the second, parallel system you always need). I have been playing with the issue within the constraints of what we have, and will continue to do so, but I can't say a solution is on the horizon.
24gilroy
>23 timspalding: Is there any way to tweak the existing algorithm so it isn't so ... bloody far off?
25lorax
>23 timspalding:
First, thank you very much for weighing in.
Several years ago, I suggested that doing a phrase search rather than a words search would get most of what we need, and that's something that already exists for works search - for one of the examples I gave in my initial post so long ago, The New Wild, the correct title comes up second out of six (and has the most copies) on a sitewide works search with the phrase, versus fifteenth for a sitewide works search with the words (i.e. with vs. without quotes); it's currently 21 on the Touchstone search. Would this particular tweak really require a new search system, or is this a case where the best is the enemy of the good and you want the perfect solution, which we'll never get, rather than a decent one which could actually be implemented?
26jjwilson61
>25 lorax: I think a phrase search is exactly what Tim was talking about that would require the new search system. But, Tim, could you, after the site search do a local search of just the search results for exact title matches and move those to the top?
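A rough Python sketch of that post-filtering idea, assuming the existing search hands back an ordered list of results. The `title` field and both function names are hypothetical, and the sample results are just titles borrowed from earlier in this thread:

```python
def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.lower().split())

def promote_exact_matches(query, results):
    """Move results whose title equals the query to the front.

    sorted() is stable, so everything else keeps the order the
    underlying search engine returned it in.
    """
    wanted = normalize(query)
    return sorted(results, key=lambda r: normalize(r["title"]) != wanted)

# Hypothetical search output, in the engine's original (popularity) order.
results = [
    {"title": "Fahrenheit 451"},
    {"title": "Lord of the Flies"},
    {"title": "Hamlet"},
    {"title": "Julia"},
]

for r in promote_exact_matches("julia", results):
    print(r["title"])
```

This only helps when the wanted work is somewhere in the result list to begin with; it cannot rescue a title the underlying search never returns.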
27lorax
>26 jjwilson61:
It already exists as part of the work search, though, which we know Touchstones uses (or at least it did, at one point. They could have changed that, I suppose.)
28jnwelch
Good to hear from Tim on this. As gilroy says in >24 gilroy: and lorax says in >25 lorax:, just somewhat better would be a help - not so "bloody far off". If the phrase search is doable and improves things, I'm all for it.
29librisissimo
>23 timspalding:
Glad to hear you are looking at it.
In the meantime, after laughing at some of the results, maybe someone would start a game of guessing what Touchstones will come up with for a given title, and giving a prize to the closest or funniest answer?
30lesmel
>29 librisissimo: I think that was already done.
31Petroglyph
>23 timspalding:
Thank you for responding at least.
32librisissimo
>30 lesmel: Fun! Do you have a link?
34librisissimo
>33 lesmel: Thanks. It didn't seem to catch on, though. No prizes??
35gilroy
Okay, so here is where we have a serious problem with the touch stones that is really bad.
Green by Ted Dekker comes up as The Hobbit. Clicking on Other ... Not a single volume has the proper author as an option. In fact, not a single option that comes up is the SINGLE WORD title for any work. The closest it comes is Anne of Green Gables at number 8.
36lorax
Bump. Is there any chance that, even if this specific request cannot be implemented, the new Works search can be leveraged to make Touchstones somewhat less terrible?
37timspalding
Okay, this is fixed. See https://www.librarything.com/topic/258659
>36 lorax:
"Okay, Green by Ted Dekker comes up as The Hobbit"
They now work, except that it chooses "Green" by REM. I'm going to continue to tweak the algorithm, but the logic is that "Green" by Dekker isn't always called "Green" at all, but very often something with the series, book number and so forth. This makes it a weaker match for a mere "Green" than REM. (See https://www.librarything.com/work/8430269/editions )
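One way to picture that "weaker match" logic is to score each work by the share of its edition titles that are exactly the query. The Python below is only a guess at the general shape, not the actual algorithm; the function name and both edition lists are invented for illustration.

```python
def exact_title_share(query, edition_titles):
    """Fraction of a work's edition titles that equal the query exactly
    (ignoring case and surrounding whitespace)."""
    wanted = query.strip().lower()
    if not edition_titles:
        return 0.0
    hits = sum(1 for t in edition_titles if t.strip().lower() == wanted)
    return hits / len(edition_titles)

# Made-up edition lists; the real works have many more editions.
album_editions = ["Green", "Green", "Green"]
novel_editions = [
    "Green",
    "Green (Series Name, Book 0)",
    "Green: A Novel",
]

print(exact_title_share("Green", album_editions))
print(exact_title_share("Green", novel_editions))
```

Under this kind of scoring, a work that is always catalogued as plain "Green" outranks one whose editions usually carry series and subtitle details, which matches the behavior described above.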
38gilroy
>37 timspalding: Yes, but Green by REM is closer than the Hobbit. It shows up in the list of selections (I think 8th now) so much better than before.
39lorax
>37 timspalding:
Matching the wrong "Green" is fine with me.
Tests:
The Sound Book
The New Wild
Those both come up with the right result as the first option. Thank you SO MUCH.
41Storeetllr
This is excellent! A big thank you to you and your team, Tim!
42jnwelch
The Underground Railroad works. Jane Steele works. Hmm. This seems like really good news. Thanks!


