Work search: What's left to fix?


1timspalding
Nov 20, 2010, 1:07 am

We've been working on the work search and I'm getting more satisfied with it. So I'm putting out a call for people to nominate situations that still aren't working.

Note that I will periodically have to "stop" this conversation as I redo a search index or parameter. While I'm changing an index--or anyway changing it significantly--there's no point to criticizing it. It's like complaining about the arrangement of living room furniture not in the living room, but in a moving van.

So, anyone got examples that don't work?

2timspalding
Edited: Nov 20, 2010, 1:30 am

Taking from the other thread:

"Room" is now showing Donoghue's recent book second, after "The Room by Hubert Jr. Selby." First would be better, but Selby's book isn't obscure. The technical explanation is two-fold: (1) the index ignores the "the," and (2) the sharding that makes search fast means that different search pools will have different proportions of a word; in this case the shard with Selby has the word "room" fewer times, so it's emphasizing it more. Anyway, #2 is fine for me.

"1984" is now showing "Nineteen Eighty-Four" third. It's below "1984" by Saul Bellow. That seems fair. Bellow's book isn't exactly unknown, and it has the exact title. It's also below "Animal Farm/1984." I think this too is fair. The work in question should probably be renamed "Animal Farm/Nineteen Eighty-Four."

"Interviews Barnes" is now working well.

"Shakespeare" is currently preferencing books with Shakespeare in both the title AND author spot, like "Romeo and Juliet (Shakespeare Made Easy)." But the results are much more sensible. I'm running something to equalize that situation a bit overnight.

3timspalding
Nov 20, 2010, 1:34 am

See bug on http://www.librarything.com/topic/100237 — an interesting one.

4brightcopy
Nov 20, 2010, 1:38 am

You've probably already noticed, but I'm running through all the Other Search bugs (after moving a bunch to there from Uncategorized - ugh) to try to update them with regard to the new search. I'm being a little ruthless on some when they're completely based around the old search but describing general problems (like "the author results suck"). Just seems to me like it'd help having new bugs with actual test cases in the new system.

5timspalding
Nov 20, 2010, 1:40 am

Yeah, it's a fine line between "tuning" and bugs. That bug above is more like a bug insofar as it should CLEARLY take note of the canonical title not just the ones that are actually given. But a result where the most logical thing shows up #6 not #1 is more of a tuning thing--for one thing, it may require trade-offs.

6brightcopy
Nov 20, 2010, 1:47 am

2>

The Room result still makes me a bit queasy. But yeah, I think #2 is good enough, given the technical limitations. I'd say we should try to cast our net and see if we can find a large number of places where this would actually be problematic.

1984 sounds good. It's a damned if you do/don't situation, I think.

The Shakespeare one bothers me, but I don't know exactly what "equalizing" you plan to do. Something really makes me feel that all the books with the title of just "Shakespeare" should definitely come higher. I think this could be alleviated a lot if you added a Titles search between Works and Authors that would search only titles.

7brightcopy
Nov 20, 2010, 1:49 am

5> Right, that bug was so specific even though it was based on the old search that I felt it still applied and just updated it to note that.

8timspalding
Edited: Nov 20, 2010, 2:01 am

> feel that all the books with the title of just "Shakespeare" should definitely come higher

Yeah, I'm not sure about this. I can see it both ways anyway. It's a bit mechanical. I note that it's not true of either Amazon or Google Books. WorldCat is doing basically what LT does—preferencing books with "Shakespeare" in the title, wherever it is.

9brightcopy
Nov 20, 2010, 3:14 am

8> Yeah, I can see it both ways, too. It's a gut feeling that I can't come up with a good logical argument about. My gut says:
Romeo and Juliet (Shakespeare Made Easy) by William Shakespeare (160 copies)
Hamlet (No Fear Shakespeare) by William Shakespeare (247 copies)
Love Poems & Sonnets of William Shakespeare by William Shakespeare (130 copies)
The First Folio of Shakespeare: The Norton Facsimile by William Shakespeare (179 copies)
shouldn't come way at the top with
Shakespeare: The World as Stage by Bill Bryson (1,735 copies)
relegated to way down the page.

If the goal of search is to find the thing you're most likely to be looking for, are you very likely to be looking for Manga Shakespeare (179 copies) or Shakespeare: The World as Stage (1,735 copies)? Of course, we've seen what happens when too much weight is put on copies, too. I think maybe Shakespeare is just a bad example.

I also really wonder if Works is a very useful default search at all. If you had a Titles search, would it be a better one? Are people much more likely to be searching Titles most of the time and Authors a small minority? Are they likely to click on Authors ANYWAY when searching for authors? Is the desire to have a single search box worth an unavoidable degradation in the search results because you aren't helping the system by telling it which thing to search?

These are the thoughts that go through my head on this whole issue.

10EveleenM
Edited: Nov 20, 2010, 9:05 am

Since I'm interested in travel guides, I tried a few searches on London.

1. Searching on the one word London, the first book about London (the city) is at 13 in the list. I don't mind it coming lower than The Unabridged Jack London, but I think things need more tweaking when it's below Froggy Plays in the Band. Works by an author London with 93, 97, 139, 146 etc. copies are coming up higher than books with London in the title with 1,614 copies (London: The Biography by Peter Ackroyd), 1,792 copies (London Fields by Martin Amis), 2,020 copies (London: The Novel by Edward Rutherfurd), 3,608 copies (Down and Out in Paris and London by George Orwell). I would have expected the latter four to come out at or near the top of the list.

2. Searching on London guide produces a reasonable-looking list at first, but includes a lot of classics such as Pride and Prejudice which were published in London and include a reader's guide. I certainly wouldn't have expected Jane Austen or Macbeth or The Complete Poetry of John Milton to come up on the first page of this search.

3. Searching on London travel produces a well-focused list: the first odd-looking one http://www.librarything.com/work/443623 Frommer's Paris 2005, turns out to have a few wrong editions combined into it, including a London one (which I will fix later, when I don't need it as an example). So this search is exactly what I would have wanted.

Edited to add: is it possible to filter the results by title, the way the author search filters them by author? Excluding works whose London is only the author (first list) or city of publication (second list) would solve most of my problems here.

11EveleenM
Nov 20, 2010, 8:36 am

Searching on Ireland produces a rather strange scatter of works whose title is the single word Ireland. The first two are Ireland by Sonya Newland (21 copies) and Ireland by David Lyons (36 copies); further down the first page are quite a few more such as Ireland by Joseph Coohill (19 copies) and Ireland by Dervla Murphy (6 copies), and lots with single copies.

However, if I keep going through the next pages, I get:
Page 2: Ireland by William Trevor (44 copies), Ireland by Nick Constable (24 copies), and Ireland by Joe McCarthy (33 copies), plus more singletons.
Page 4: Ireland by Terence J. Sheehy (17 copies).
Page 6: Ireland by Quigley (1 copies).
Page 9: Ireland by Edwin Smith (9 copies).
Page 10: Ireland by Micheál Mac Liammóir (9 copies) and Ireland by John Fraser Hart (8 copies).
Page 12: Ireland by J.W.P. Rowledge (4 copies) and Ireland by Michael O'Mara (4 copies).
Page 13: Ireland by Bill Doyle (2 copies).
Page 16: Ireland by James Reynolds (2 copies).
Page 18: Ireland by Roland Hill (1 copies) and Ireland by Debra Shipley (1 copies)

I've no idea what way the weighting is working here: if the number of copies is weighted, why is the 44-copy William Trevor book on the second page, or the 17-copy Sheehy book on the fourth? Single-word titles are always going to be a pain in the neck, since it's impossible to narrow down the title. Still, the old search had the alphabetical sort option, with which it was possible to get all these works in a block:
http://www.librarything.com/search_works.php?q=Ireland&offset=3450&so=40...
http://www.librarything.com/search_works.php?q=Ireland&offset=3475&so=40...
http://www.librarything.com/search_works.php?q=Ireland&offset=3500&so=40...

12timspalding
Nov 20, 2010, 10:12 am

A key question is whether you expect the works search to search anything other than titles and authors--for example, tags and subjects.

As conceived, it does not. But maybe it should. I'm just worried that would be both harder and more prone to make it hard to find a book you are looking for.

I come down on the idea that it should not. We aren't a library. The primary use should be to find books you know about, not a book you don't.

13eromsted
Edited: Nov 20, 2010, 10:27 am

Here's something unexpected, for me at least:

The works search is considering word order when weighting results.

So Hawthorn, The Scarlet Letter has the main work at #4.

The Scarlet Letter, Hawthorne has the main work at #1.

The comma also matters. So Hawthorne The Scarlet Letter has the main work at #6.

*Fixed bad third link*

14_Zoe_
Nov 20, 2010, 11:32 am

Yup, it's looking much better now. Thanks.

15_Zoe_
Nov 20, 2010, 11:36 am

I'm not sure why La Chute isn't turning up, though. Is it deliberately excluding non-English titles?

16EveleenM
Nov 20, 2010, 11:46 am

Another test search, this time on Amber: as a gemstone, a colour, a famous fantasy location, and a name, it should produce a few different clusters of results.

Results:
1. Candlemas: Feast of Flames by Amber K (139 copies)
2. Amber by Lauren Royal (35 copies)
3. Tarot for Dummies by Amber Jayanti (53 copies)
4. Witness: For the Prosecution of Scott Peterson by Amber Frey (91 copies)
5. Philosophy of Wicca by Amber Laine Fisher (61 copies)
6. True Magick: A Beginner's Guide by Amber K (266 copies)
7. Blood of Amber by Roger Zelazny (881 copies)
8. Coven Craft: Witchcraft for Three or More by Amber K (187 copies)
9. Chaos and Amber by John Gregory Betancourt (144 copies)
10. Dragonfly in Amber by Diana Gabaldon (5,355 copies)
11. Amber by Night by Sharon Sala (40 copies)
12. Nine Princes in Amber by Roger Zelazny (1,333 copies)
13. I, Amber Brown by Paula Danziger (205 copies)
14. Black Amber by Phyllis A. Whitney (68 copies)
15. Forever Amber by Kathleen Winsor (841 copies)
16. Fishing for Amber: A Long Story by Ciaran Carson (41 copies)
17. The Chronicles of Amber, Volume I (Nine Princes in Amber and The Guns of Avalon) by Roger Zelazny (566 copies)
18. Forever Amber Brown by Paula Danziger (220 copies)
19. Cheyenne Amber by Catherine Anderson (28 copies)
20. The Amber Cat by Hilary McKay (35 copies)
21. The Great Book of Amber: The Complete Amber Chronicles, 1-10 by Roger Zelazny (1,630 copies)
22. The Amber Spyglass by Philip Pullman (12,182 copies)

Again, Amber as author name is weighted way too high compared with Amber in the title: 5 of the first six works are author rather than title results. Even allowing for this, the listing of titles with an author Amber is still totally unpredictable to a lay person. For example: Candlemas: Feast of Flames by Amber K (139 copies) is #1 on the list; Meridian by Amber Kizer (218 copies) is #401. I don't understand how that is working at all.

Looking at titles with Amber, I find it inexplicable that The Amber Cat by Hilary McKay (35 copies) is higher in the list than The Amber Spyglass by Philip Pullman (12,183 copies): they're both three-word titles with amber in the middle - how does the weighting manage to favour the cat over the spyglass?

My expectation was that this search would produce the hugely popular Pullman and Zelazny titles as the most relevant results: I certainly didn't expect to find Zelazny at #7 and Pullman at #22. Even filtering by title would still leave The amber spyglass below The amber cat - why? Filtering by author would still leave Amber Kizer's works way below Amber K's works, with no reason I can see.

17timspalding
Nov 20, 2010, 12:43 pm

La Chute

Hard problem. It does show up—the top result on page two. It's not currently seeing its French title as anything more than a fairly rare alternate title. I'll look into how to improve the situation.

18jjwilson61
Nov 20, 2010, 1:05 pm

I wonder if tags should be used, not to retrieve the results, but to sort the results. For example, if someone did a work search for christmas, maybe the works that score highest on the tag page should get a boost on the search page.

19jjwilson61
Nov 20, 2010, 1:09 pm

Searching for "paradise"

Paradise by Abdulrazak Gurnah (102 copies)
Road to Paradise by Paullina Simons (90 copies)
Paradise by Toni Morrison (2,197 copies)

Why does an exact title match by Gurnah appear above an exact title match by Morrison when Morrison has way more copies?

And why is Road to Paradise #2 when it has fewer copies than either of the others and Paradise is only a part of the title?

20brightcopy
Nov 20, 2010, 1:11 pm

18> It's possible, though my mind goes back to someone posting about how Catcher in the Rye is the number 5 result for the "immigration" tag search:
http://www.librarything.com/tag/immigration

I think it'd need to take into account not the number of copies of a book tagged X but the number of times a book has been tagged X. To apply that to the previous example, Catcher has only been tagged with "immigration" by one user.

21brightcopy
Nov 20, 2010, 1:13 pm

19> Possibly due to the sharding issue Tim explained in message 2.
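The sharding effect Tim describes in message 2 can be sketched in a few lines. This is a generic TF-IDF illustration, not LibraryThing's actual scoring; the shard sizes, the document counts, and the `idf` helper are all made up for the example. The point is only that when each shard computes its own term statistics, the same word is worth more in the shard where it is rarer.

```python
import math


def idf(num_docs: int, docs_with_term: int) -> float:
    """Smoothed inverse document frequency (a textbook form, assumed here)."""
    return math.log((num_docs + 1) / (docs_with_term + 1)) + 1.0


# Hypothetical shards of equal size; "room" happens to be rarer in shard B.
shard_a = {"num_docs": 100_000, "docs_with_room": 900}
shard_b = {"num_docs": 100_000, "docs_with_room": 300}

idf_a = idf(shard_a["num_docs"], shard_a["docs_with_room"])
idf_b = idf(shard_b["num_docs"], shard_b["docs_with_room"])

# Two titles that each contain "room" exactly once get the same term
# frequency, but each is scored with its own shard's statistics.
term_frequency = 1.0
score_in_a = term_frequency * idf_a
score_in_b = term_frequency * idf_b

print(f"shard A score: {score_in_a:.3f}")
print(f"shard B score: {score_in_b:.3f}")
```

If something like this is what is going on, two equally good matches can land in a different order purely because of which shard each one lives in, which would fit both the "Room" and the "Paradise" results.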

22EveleenM
Nov 20, 2010, 2:23 pm

I think the crucial test for me is Would you expect to find this book with that search?

I think I can safely say that no one would expect to find these 13 books by a search on London guide:
Romeo and Juliet by William Shakespeare, Macbeth by William Shakespeare, Pride and Prejudice by Jane Austen, Tess of the D'Urbervilles by Thomas Hardy, The Importance of Being Earnest by Oscar Wilde, The Warden by Anthony Trollope, Mary Queen of Scots by Antonia Fraser, The Complete Poetry of John Milton by John Milton, The Nigger of the Narcissus by Joseph Conrad, Mrs Beeton's Book of Household Management (Oxford World's Classics) by Isabella Beeton, Rocks & Minerals (Eyewitness Books) by Dorling Kindersley Ltd, Dress Your Best: The Complete Guide to Finding the Style That's Right for Your Body by Clinton Kelly. I'd count it as a real weakness that they're taking up 13 spaces on the first page of that search.

On the other hand, if you know the exact title of a book, I think it's reasonable to expect the search to find it. If that title is unfortunately a single word like Ireland or London, while it's technically correct that the search will find it, it could be anywhere over 18 or 20 results pages. In practice, having to check up to 1800 or 2000 titles is pretty discouraging - the alphabetical sort of the old results, while clunky, at least gave some hope of finding this kind of work.

23_Zoe_
Nov 20, 2010, 2:32 pm

>17 timspalding: Maybe you can give extra weight to Original Title, if it's included in the search at all?

24timspalding
Nov 20, 2010, 2:37 pm

Would you expect to find this book with that search

I don't know, is that REALLY the criterion? I mean, do you judge a search by whether you find good stuff or whether it also includes some meh stuff?

ordering

Working on it.

25EveleenM
Nov 20, 2010, 2:58 pm

#24
(my #22) Would you expect to find this book with that search

I don't know, is that REALLY the criterion? I mean, do you judge a search by whether you find good stuff or whether it also includes some meh stuff?


I judge a search by whether it finds relevant stuff - if I search on amber (thinking gemstones) and get the adventures of Amber Brown, that's fine. While of no interest to me personally, it's a relevant result for that search. If the search includes meh stuff at the expense of more good stuff, then, yes, I'd fault it. If the irrelevant Mrs Beeton's Book of Household Management is on the first page of a search on London guide while actual London guides are on page 9, then I think it's a problem.

26Heather19
Nov 20, 2010, 5:20 pm

Um.... Maybe this is a tad off-topic, but it's something I've always meant to bring up, so with the new search on everyone's minds....

Is it possible to *just* search book titles? You can search just "authors", but searching "works" brings up titles *and* authors. Is there any way to find, for example, books titled "Carrie" without bringing up pages and pages of authors named Carrie?

27lorax
Nov 20, 2010, 5:49 pm

26>

No, it never has been, despite repeated requests. I think that Tim's considering adding it to the new search, though.

28Aerrin99
Nov 20, 2010, 6:29 pm

Yeah, what you actually are doing is searching 'works' in a title and/or author search. It seems like Tim is willing to allow us to limit our search to only the title field or the author field if he can get the code to do it properly.

29lorax
Nov 20, 2010, 7:00 pm

28>

He allows, and always has allowed, author-only searches. It's just that now it defaults to "works", rather than requiring you to choose from the outset. If you click on the magnifying glass without terms, or go directly to /search.php, you can choose your search type without the intermediate step.

30_Zoe_
Nov 20, 2010, 7:36 pm

It's weird that searching "lion the witch and the wardrobe" brings up five other results before the C.S. Lewis work.

31timspalding
Nov 20, 2010, 7:37 pm

Yeah.

I'm going to condense the works from 10 to 4 shards, hoping to improve the quality. We'll see.

32_Zoe_
Nov 20, 2010, 7:42 pm

I don't know enough about the technical side of things, but I hope it works :)

Searching for C.S. Lewis or CS Lewis also doesn't bring up the expected results at the top. It seems like you've swung too far in focusing on the title rather than the author. I think number of copies needs to play a much larger role, as long as the term in question regularly appears as the title or author of the popular work.

33eromsted
Nov 20, 2010, 8:14 pm

I don't know if it's technically possible, but I suspect that the results could be dramatically improved by depreciating the stuff in the edition titles that's not the main title. So that would mean anything after a colon, semicolon or period or question mark and anything contained inside parentheses or square brackets.
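One way to picture that suggestion is a small function that splits an edition title into a main part and a secondary part, so the two could be indexed with different weights. This is only a sketch of the rule as stated in the post; `split_title` is a hypothetical helper, and nothing here reflects how the LibraryThing index is actually built.

```python
import re

# Text inside (...) or [...] is treated as secondary.
_BRACKETED = re.compile(r"\([^)]*\)|\[[^\]]*\]")
# The first colon, semicolon, period, or question mark ends the main title.
_SPLIT = re.compile(r"[:;.?]")


def split_title(title: str) -> tuple[str, str]:
    """Return (main_title, secondary_text) for one edition title."""
    # Collect bracketed text without its surrounding brackets.
    bracketed = " ".join(m.group(0)[1:-1] for m in _BRACKETED.finditer(title))
    stripped = _BRACKETED.sub(" ", title)
    parts = _SPLIT.split(stripped, maxsplit=1)
    main = " ".join(parts[0].split())
    rest = parts[1] if len(parts) > 1 else ""
    secondary = " ".join((rest + " " + bracketed).split())
    return main, secondary


for example in (
    "Shakespeare: The World as Stage",
    "Romeo and Juliet (Shakespeare Made Easy)",
    "Dune (First Putnam Edition, 1984 Book Club)",
):
    print(split_title(example))
```

A rule this blunt has obvious costs: splitting on a period would cut a title such as "Mrs. Dalloway" after "Mrs", so a real version would need exceptions or a gentler down-weighting rather than a hard split.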

34rsterling
Nov 20, 2010, 8:19 pm

When does spam get removed from the index? A search of member reviews turned up several reviews for spam works that have already been suppressed, and where the reviews are already flagged and hidden:
http://www.librarything.com/search.php?search=seo&searchtype=10

Do the spammers' catalogs have to be deleted for the reviews to be removed from the index? Or is spam-work-suppression enough, but with a delay?

I can see this new search being very useful for finding and removing spam, but it would probably be more so if things already suppressed didn't come up.

35rsterling
Nov 20, 2010, 8:22 pm

Also, the counts for tag search don't seem to reflect the fact that works are suppressed:
http://www.librarything.com/search.php?search=seo&searchtype=tags

For instance, the results say 158 results for "SEO services," but there's only one book (and an actual book) on the tag page. Everything else would have been spam that is suppressed.

36brightcopy
Nov 20, 2010, 10:54 pm

First off, I think the #1 goal of a search is to find something specific that you are looking for. I think the #2 goal is finding interesting stuff based on a search term. To give examples, #1 is when I search for "Tom Sawyer" and I'm trying to find the The Adventures of Tom Sawyer. #2 is when I search for "jupiter" because I'm looking for books about "jupiter". I think #1 should take priority over #2 because it's much more of a pain in the ass when it doesn't work rather than when #2 gives you sub-optimal (but not terrible) results. At least, that's how I see it.

I'm beginning to think it's a near-impossible task to make a search that searches BOTH the title and author and returns the results the searcher expects, except in very specific cases. It's like doing a search that may search the internet and your email at the same time. I think the two sides fight each other and wind up being impossible to balance. This is my opinion and probably won't be too popular, but I almost think the Works search should be ditched, or at least moved to third place. Title search should be the default, with Author being a choice you can switch to. I think it would reduce frustration and actually make the search more likely to fit the user's actual need. Again, my opinion.

37jjwilson61
Nov 20, 2010, 11:22 pm

I agree. I don't know how you'd do it in the UI, but a search where the title terms were put in a title box and author terms were put in another box would be much more likely to find what you wanted. Maybe you could allow them in one box separated by a comma but people would have to know somehow to do that (although it's already a common technique for finding books on the Add Books page).

38EveleenM
Nov 21, 2010, 6:43 am

#36
I'm beginning to think it's a near-impossible task to make a search that searches BOTH the title and author and returns the results the searcher expects, except in very specific cases. It's like doing a search that may search the internet and your email at the same time. I think the two sides fight at each other and wind up being impossible to balance.

That's pretty automatically true of single-word searches, but I wonder how often people use them. Seeing Froggy plays in the band so high on the London search results had me muttering to myself, but thinking about it, if someone is searching for Froggy books knowing the author's name is London, they're much more likely to search for Froggy London which will produce a very well-focused list, http://www.librarything.com/search.php?term=froggy+london . Likewise a search for amber Zelazny will work perfectly http://www.librarything.com/search.php?term=amber+zelazny . I'd hate to lose that capability.

For the series, doing the general search first and then selecting the relevant series pages seems to work very well. I was hoping that in the same way, we could do the general search first, and then select an 'author term only' or 'title term only' filter on the works. But since I'm not a technical person, I've no idea if that's actually possible.

39SimonW11
Nov 21, 2010, 6:50 am

title search would be a great option

40jjwilson61
Nov 21, 2010, 10:37 am

The ultimate search would be like they have on a lot of job search engines, where after your initial search the left-hand column has several search terms with the 5 or 6 most common items *from that search* filled in under them (with an option for more). Then you can select one or more of those items to narrow the search.

For example, for your amber search one of the authors on the left would be zelazny and if you selected it LT would sub-select the list for only those items with an author of zelazny.
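The faceting idea above can be sketched with a handful of the "Amber" results quoted in message 16. The titles and authors come from that list; the function names and the in-memory data structure are assumptions for illustration, since a real implementation would presumably ask the search engine for facet counts rather than count in application code.

```python
from collections import Counter
from typing import List, Tuple

Result = Tuple[str, str]  # (title, author)

# A subset of the "Amber" results from message 16.
AMBER_RESULTS: List[Result] = [
    ("Candlemas: Feast of Flames", "Amber K"),
    ("True Magick: A Beginner's Guide", "Amber K"),
    ("Blood of Amber", "Roger Zelazny"),
    ("Coven Craft: Witchcraft for Three or More", "Amber K"),
    ("Nine Princes in Amber", "Roger Zelazny"),
    ("I, Amber Brown", "Paula Danziger"),
    ("The Chronicles of Amber, Volume I (Nine Princes in Amber and The Guns of Avalon)", "Roger Zelazny"),
    ("Forever Amber Brown", "Paula Danziger"),
    ("The Great Book of Amber: The Complete Amber Chronicles, 1-10", "Roger Zelazny"),
    ("The Amber Spyglass", "Philip Pullman"),
]


def author_facets(results: List[Result], limit: int = 5) -> List[Tuple[str, int]]:
    """Most common authors within this result set, for the sidebar."""
    return Counter(author for _title, author in results).most_common(limit)


def narrow_by_author(results: List[Result], author: str) -> List[Result]:
    """Sub-select the results once a facet has been clicked."""
    return [r for r in results if r[1] == author]


print(author_facets(AMBER_RESULTS))
print(narrow_by_author(AMBER_RESULTS, "Roger Zelazny"))
```

Clicking "Roger Zelazny" in such a sidebar would leave only the four Amber titles by that author, without the user having to retype the query.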

41infiniteletters
Nov 21, 2010, 1:50 pm

40: Ooo. Like the tag page and tagmash pages show related versions.

42timspalding
Edited: Nov 22, 2010, 12:42 am

Current status:

"Room" : A+. Donoghue is now #1.

"1984" : B+. Nineteen Eighty-Four is now at #3 (see above)

"interviews barnes" : A+

"Amber" : B+. Amber Spyglass is 4.

"La Chute" : B. The Fall is number 9.

"Paradise" : B+? Six works titled "paradise" now first.

"Shakespeare" : A- : Two works titled "Shakespeare" lead the list, followed by big Shakespeares--Macbeth, Romeo and Juliet...

"london guide" : C It's still including big fiction works with both "london" and "guide" somewhere in the subtitles.

43brightcopy
Nov 22, 2010, 1:31 am

42> That seems like a pretty good hit rate for such (mostly) vague queries. I've been trying a few, and I haven't really run into anything too surprising. But here's another example of the 1984 B+:

Hamlet

Hamlet by Bruce Coville (30 copies)
Hamlet (33 copies)
The Tragedy of Hamlet, Prince of Denmark by William Shakespeare (13,727 copies)

It's a tough situation, as exact names are probably normally a great match. But maybe there's something that could be done for 30 copies vs 13,727 copies. Of course, looking at the 1984, I note that Dune comes in #5 with 16,168 copies or Frankenstein at #8 with 17,033 copies.

I notice Dune shows up due to 1 copy of the 16,168 copies being "Dune (First Putnam Edition, 1984 Book Club)". I wonder if there's a lesson somewhere in there.

44reading_fox
Nov 22, 2010, 7:29 am

Two Bugs:

Jim Butcher still doesn't appear on the author search results! (Although the work pages get lots of his titles.) He is the top hit of the Beta aliased results though, once you think to look there. How many other authors are missing?

"Open in new tab" in IE8 doesn't work. I get a copy of the page (ie talk) I was on. This is annoying.

Is there a help page for allowed wildcards and search modifiers? Please link to it from the search page!

45EveleenM
Nov 22, 2010, 7:48 am

#42
That looks good!

You didn't mention the search on plain "London" - I'd rate it an A now as well (allowing for the huge breadth of the search). The high-volume works about London are now respectably high on the list.

Another test search, on "Persephone": http://www.librarything.com/search.php?term=persephone

I'd rate it A+: the top works are titled persephone; the more numerous works published by Persephone Books are further down the list.

46lorax
Nov 22, 2010, 8:41 am

42>

That looks like a major improvement!

47lorax
Nov 22, 2010, 8:43 am

43>

I notice Dune shows up due to 1 copy of the 16,168 copies being "Dune (First Putnam Edition, 1984 Book Club)". I wonder if there's a lesson somewhere in there.

Yeah, ideally it would be able to weight by copies that actually contain the search phrase, rather than by overall copies when only one or two copies of a 10,000+ copy book contain the phrase. That would probably be quite slow, though.

I think that for the "Hamlet" situation having the presumable target show up third below two exact matches when the presumable target isn't actually an exact match is about the best an automated system can be expected to do.

48jjwilson61
Nov 22, 2010, 9:47 am

Paradise looks good, but I'm still puzzled why identically titled books aren't ordered by number of copies.

49timspalding
Nov 22, 2010, 11:32 am

Paradise looks good, but I'm still puzzled why identically titled books aren't ordered by number of copies.

It's weighing the whole string of alternate titles too, and determining what percentage of them (or rather their text) correspond to the search term. Thus, without ANY popularity, a work entitled "Twilight" alone would rank much higher for the term than the real work, which has various secondary titles and translated titles in the mix. Popularity compensates for that in most situations, but the basic math of searching is still operative.

50jjwilson61
Nov 22, 2010, 12:33 pm

By "string of alternate titles" do you mean from the individual editions that make up the work? So the Paradise by Morrison is sorted lower because a lot of the editions have (Oprah's Book Club) in the title? I think that's reasonable, I just want to understand it.

51timspalding
Nov 22, 2010, 2:05 pm

Yes, or Paradiso or the Arabic title or whatever.
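A toy version of the "percentage of the title text" idea from messages 49 to 51 might look like the following. The edition lists are illustrative guesses built from the examples mentioned in the thread (an Oprah's Book Club edition, "Paradiso"), not real LibraryThing data, and `match_fraction` is a hypothetical stand-in for whatever the search engine really computes.

```python
import re
from typing import List


def match_fraction(term: str, edition_titles: List[str]) -> float:
    """Share of all words across a work's edition titles that equal the term."""
    term = term.lower()
    words = [w for title in edition_titles for w in re.findall(r"\w+", title.lower())]
    if not words:
        return 0.0
    return sum(w == term for w in words) / len(words)


# Illustrative only: a work whose every edition is titled just "Paradise"...
plain_work = ["Paradise"]
# ...versus a popular work with longer and translated edition titles mixed in.
popular_work = ["Paradise", "Paradise (Oprah's Book Club)", "Paradiso"]

print(match_fraction("paradise", plain_work))
print(match_fraction("paradise", popular_work))
```

Before any popularity boost, the plain work matches the term completely while the popular one is diluted by its extra title text, which is the ordering jjwilson61 noticed.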

52mvrdrk
Edited: Nov 22, 2010, 5:27 pm

There needs to be some method of searching on "single characters" so that a search for something like "紅" shows both 紅摟夢 and 紅 (集英社スーパーダッシュ文庫)

53timspalding
Nov 22, 2010, 6:14 pm

Casey, how does that work—can we make word breaks around them?

54IamAleem
Nov 22, 2010, 6:47 pm

WHEN WILL 'Tagmirror' BE BACK UP?!?!?!?!?!?!?!?!?!?

55eromsted
Edited: Nov 22, 2010, 6:51 pm

>54 IamAleem:
Try out the new talk search. This is the second hit on a talk search for tag mirror.

56timspalding
Nov 22, 2010, 7:04 pm

In Spanish, would that have to start with ¡¿¡¿¡¿¡¿¡¿¡¿¡¿¡¿¡¿¡¿¡¿ ?

57justjim
Nov 22, 2010, 10:22 pm


¡ʎɐʍʎuɐ pɹɐoqʎəʞ doʇdɐl ʎɯ ɟo ʇno sqɯnɹɔ əɥʇ ʇəƃ oʇ pəpəəu ı

58romula
Nov 22, 2010, 11:04 pm

57> I just hit CTRL-ALT-Down arrow, does about the same thing =)

59Mr.Durick
Nov 23, 2010, 12:07 am

Oh my!

Robert

60timspalding
Nov 23, 2010, 1:04 am

ISBNs now work.

61DaynaRT
Nov 23, 2010, 8:36 am

Yes they do. All is right with the world again.

62lorax
Nov 23, 2010, 8:49 am

60>

Thanks, Tim!

63eromsted
Edited: Nov 23, 2010, 11:58 am

>42 timspalding:
In a search for Moby Dick, Melville's novel comes in at #11.

Since the title is often Moby-Dick this may be a result of the search treating - as a non-character instead of a space.

So mobydick does better with the novel coming in at #4.
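If that guess about the hyphen is right, the mismatch is easy to reproduce with two toy tokenizers. Both functions below are hypothetical; they only show why a query of two words cannot match a title that was indexed as one fused word, and they say nothing certain about how the real index treats punctuation.

```python
import re
from typing import List


def tokens_hyphen_dropped(text: str) -> List[str]:
    """Treat '-' as a non-character: 'Moby-Dick' becomes one token."""
    return re.findall(r"\w+", text.lower().replace("-", ""))


def tokens_hyphen_as_space(text: str) -> List[str]:
    """Treat '-' as a word break: 'Moby-Dick' becomes two tokens."""
    return re.findall(r"\w+", text.lower().replace("-", " "))


title = "Moby-Dick"
query = "Moby Dick"

print(tokens_hyphen_dropped(title), tokens_hyphen_dropped(query))
print(tokens_hyphen_as_space(title), tokens_hyphen_as_space(query))
```

With the first tokenizer only the query "mobydick" lines up with the indexed title; with the second, "Moby Dick" and "Moby-Dick" produce the same tokens.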

64caseydurfee
Edited: Nov 23, 2010, 11:31 am

52, 53 >

That would require having a separate text index just for CJK characters. The index would either use a probabilistic filter to guess where the word breaks actually are, or generate N-grams (overlapping sequences of bytes in the string, eg. "casey" => {"ca", "as", "se", "ey" }). But doing CJK -- split into "real words" or not -- requires fundamentally indexing in a way you don't want to apply to general-purpose text. So the key is being able to identify those strings and putting them in a special field, as well as the text/boost_text fields.

There will be something better baked into future versions of Lucene/Solr that can guess the right word-delimiting to do based on guessing the language, then applying the best algorithm for that language:

https://issues.apache.org/jira/browse/LUCENE-1488
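The N-gram approach mentioned above is simple to sketch. Note one assumption: this version slides over Python characters (code points) rather than raw bytes, which is the more convenient unit for CJK text; `char_ngrams` is a hypothetical helper, not part of Lucene or Solr.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return every overlapping window of n characters in text."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The example from the post above.
print(char_ngrams("casey"))

# For a title with no spaces, unigrams would let a one-character query match,
# and bigrams would keep some sense of which characters sit next to each other.
title = "紅摟夢"
print(char_ngrams(title, 1))
print(char_ngrams(title, 2))
```

Indexing unigrams alongside bigrams is one way a one-character query like the one in message 52 could find this title, at the cost of a larger and noisier index.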

65mvrdrk
Nov 23, 2010, 5:38 pm

Thanks!

66eromsted
Edited: Nov 29, 2010, 12:16 am

This one's odd. A search for Tin Drum doesn't find Grass's The Tin Drum at all.

67brightcopy
Edited: Nov 23, 2010, 6:58 pm

66> Furthermore, it only returns 13 total results.

ETA: I should mention that one of those results should probably be combined with the main work. Please don't until Tim looks at this. Otherwise it may ruin the example.

68_Zoe_
Nov 23, 2010, 7:44 pm

Now that the basic functionality has been improved, I really think we should revisit the issue of what displays. It seems likely that one of the first things a new user will do is search for a favourite book to see what the site is about. The results don't look very friendly. The results are plain text, title and author only, and there's a garbled repetition of the search results under each item.

Compare this to GoodReads. Each book that turns up in the search results has, in addition to title and author, a cover picture, a notification if it's by a GR author, an average rating with visual representation, the number of users, the original publication year, and the number of editions. It's very clear what all the information means and why it's there, and it's just a lot more welcoming overall.

69brightcopy
Nov 23, 2010, 7:48 pm

68> All great points, but perhaps more applicable to the search in general than this topic (Works search). I think there's a lot of cool stuff that could be done to spiff up the results for works, authors, series, etc. A lot of it could be dynamically loaded so it doesn't have to slow down the actual return of search results.

70_Zoe_
Nov 23, 2010, 7:55 pm

>69 brightcopy: Well, my main concern is works, but yeah, I could post in the other thread as well.

71keristars
Nov 23, 2010, 8:25 pm

64> I wish there were some way to do that kind of search, particularly in authors. Because it's just not useful for character strings where a character is a whole word and spaces aren't used. Plus, it's going to be troublesome if someone doesn't remember an S or something (I think plurals would be the most common example): with an author search, "Gidden" received no results, though you might hope that it would show some for "Giddens" (the old author search would bring those results, until it got tweaked, but might do it again now).

Even a back-door version would do: you'd have to know the secret password to initiate the search, and then wait a bit longer, as with the tagmash option.

72brightcopy
Nov 23, 2010, 9:52 pm

71> If nothing else, I wonder if there could be a way that a search that returns 0 results could be converted into a substring search, in other words changing it to "*gidden*".
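A rough Python sketch of that fallback idea, assuming a plain in-memory list of names rather than the real search index. The function name and the sample names are made up for illustration. It tries a whole-word match first and only falls back to a substring match when that returns nothing.

```python
def search_with_fallback(query, names):
    """Whole-word match first; substring match only if that finds nothing."""
    q = query.lower()
    # First pass: the query must equal one of the whitespace-separated words.
    exact = [name for name in names if q in name.lower().split()]
    if exact:
        return exact
    # Fallback, equivalent to searching for *query*: match anywhere in the name.
    return [name for name in names if q in name.lower()]


# Sample data for illustration only.
authors = ["Anthony Giddens", "Sample Author", "Another Giddens"]

print(search_with_fallback("gidden", authors))
print(search_with_fallback("author", authors))
```

Because the substring pass only runs on zero results, normal searches keep their current behaviour and cost.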

73Collectorator
Nov 23, 2010, 9:59 pm

This member has been suspended from the site.

74Heather19
Nov 23, 2010, 10:10 pm

73: Thank you for bringing that up so I didn't have to feel like a nagger. I really, really want search by title only... That's bugged me for a long time.

75keristars
Nov 23, 2010, 10:15 pm

72> If only such a substring search actually did something :(

76Aerrin99
Nov 23, 2010, 10:32 pm

Tim has expressed interest in trying to filter Work search results by author and by title (IE, letting you do a title search), but has not talked about specifics as far as how hard it will be or whether it is coming for certain or soon. My impression was that he needed to look more closely at how things functioned after the works search was fixed up a bit.

I don't recall much at all about sorting - maybe someone else does.

77brightcopy
Nov 23, 2010, 10:49 pm

75> Right, I didn't mean to say substring works now. But even if they don't immediately provide a way to do the substring search from search.php, it's possible that it could be done as a fallback on 0 results.

78jjwilson61
Nov 23, 2010, 11:07 pm

Tim said earlier that their new fancy search engine can do wildcards anywhere but at the beginning of the string.

79keristars
Nov 23, 2010, 11:16 pm

Huh. Must not be implemented - try "gidden*"

80brightcopy
Nov 23, 2010, 11:36 pm

79> It was actually casey who said it here:
http://www.librarything.com/topic/102604#2311807

I think he was referring to what they COULD do, not what they are currently doing.

81eromsted
Nov 23, 2010, 11:42 pm

Following up on message 66, another search where the main work is missing entirely: Divine Comedy.

The result list for this one is longer so someone let me know if I just missed it.

82EveleenM
Nov 24, 2010, 4:34 am

#73
I don't want to read this whole thread. Could someone just tell me if we are going to get search by title only, and if we are going to be able to sort results on anything?

In #22, I raised two issues: irrelevant works turning up in the results, and finding results for single-word titles, where I gave the example that the old search, sorted alphabetically, worked for that.

In #24, Tim answered
ordering
Working on it.

I assumed that he meant he is still working on sorting the results.

83_Zoe_
Nov 24, 2010, 11:48 am

The search results for Brandon Sanderson are strange. Why are a bunch of one-copy works showing up above the major works?

84timspalding
Nov 24, 2010, 11:58 am

Are you doing works or titles? There's a bug in titles I'm fixing.

85_Zoe_
Nov 24, 2010, 12:02 pm

86andejons
Nov 24, 2010, 3:19 pm

I get zero results when searching for "Blecktrumman" (The Tin Drum): it shows up under "Titles only", but not under "Works". Searching for "The Tin Drum" gives similar results: the main work just isn't shown.

87timspalding
Nov 24, 2010, 3:33 pm

88_Zoe_
Nov 25, 2010, 8:47 am

I'm still getting bad result ordering for works search on Brandon Sanderson.

89brightcopy
Nov 29, 2010, 12:32 pm

One more for the Unexpected But Maybe Unfixable Results Department:

"Moby-Dick" has weird results:

http://www.librarything.com/newsearch.php?search=Moby-Dick
Moby-Dick
1 member, 1 edition, 3 stars

Moby-Dick, Second Edition (Norton Critical Editions) by Herman Melville
644 members, 3 reviews, 25 editions, 4.18 stars

Redburn, Whitejacket, Moby Dick by Herman Melville
308 members, 4 reviews, 45 editions, 4.28 stars

Moby-Dick: A Pop-Up Book by Sam Ita
50 members, 1 review, 7 editions, 4.5 stars

Moby-Dick; or, The Whale by Herman Melville
13,837 members, 184 reviews, 2,287 editions, 3.93 stars

And when the search is changed to "titles only":

Moby-Dick
1 member, 1 edition, 3 stars

Moby-Dick by Jan Needle
1 member, 2 editions

Moby-Dick by Herman Mehlville
1 member, 1 edition

Moby-Dick by Jens Hoffmann
1 member, 2 editions

Moby-Dick, Second Edition (Norton Critical Editions) by Herman Melville
644 members, 3 reviews, 25 editions, 4.18 stars

Redburn, Whitejacket, Moby Dick by Herman Melville
308 members, 4 reviews, 45 editions, 4.28 stars

The trying-out of Moby-Dick by Howard Paton Vincent
8 members, 5 editions

The Trying-Out of Moby-Dick by Howard P. Vincent
1 member, 2 editions

A reading of Moby-Dick by Milton Oswin Percival
1 member, 1 edition

Discussions of Moby-Dick by Milton R. Stern
6 members, 6 editions

Moby-Dick: A Pop-Up Book by Sam Ita
50 members, 1 review, 7 editions, 4.5 stars

Moby Dick: A Commentary by Harold Beaver
4 members, 4 editions, 2.5 stars

Moby-Dick; or, The Whale by Herman Melville
13,837 members, 184 reviews, 2,287 editions, 3.93 stars

90infiniteletters
Nov 29, 2010, 6:31 pm

I combined some singleton and doubleton works. The main Moby-Dick is now miraculously the top result in works search.

91infiniteletters
Nov 29, 2010, 6:40 pm

And now it's 10th again. Strange.

92brightcopy
Nov 29, 2010, 6:42 pm

90> First off, it's really best not to combine stuff that's being used as an example to get Tim to fix an algorithm. Otherwise, you just make it harder to track down.

Second, I'm not getting it as the top result:

http://www.librarything.com/newsearch.php?search=Moby-Dick
Moby-Dick, Second Edition (Norton Critical Editions) by Herman Melville
644 members, 3 reviews, 25 editions, 4.18 stars

Redburn, Whitejacket, Moby Dick by Herman Melville
308 members, 4 reviews, 45 editions, 4.28 stars

Moby-Dick: A Pop-Up Book by Sam Ita
50 members, 1 review, 7 editions, 4.5 stars

Herman Melville: Moby-Dick by Nick Selby
24 members, 1 review, 15 editions, 4 stars

Moby-Dick by Jan Needle
1 member, 2 editions

Moby-Dick by Jens Hoffmann
1 member, 2 editions

Moby Dick: A Commentary by Harold Beaver
4 members, 4 editions, 2.5 stars

A reading of Moby-Dick by Milton Oswin Percival
1 member, 1 edition

The trying-out of Moby-Dick by Howard Paton Vincent
9 members, 7 editions

Moby-Dick; or, The Whale by Herman Melville
13,840 members, 184 reviews, 2,291 editions, 3.93 stars


As you can see, it's actually further down than before.

93timspalding
Nov 29, 2010, 8:19 pm

What is this book of which you speak? Never heard of it.

94SimonW11
Nov 30, 2010, 3:21 am

Searching for eager, I get
http://www.librarything.com/work/8339578 as the first result,

http://www.librarything.com/work/1630 as the second,

and http://www.librarything.com/work/1363072 as the third,
followed by a mass of other works by Edward Eager.

I would like the third to take precedence, that is, word in title over word in author's name,
and the current first, with only two copies and lousy data, buried somewhere far down the list.
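One way to picture that ordering is the small Python sketch below. It is not how the site's search actually scores results: the weights, the log damping on copy counts, and the sample records are all assumptions made up for illustration. A title match counts for more than an author match, and the number of copies scales the score so that a two-copy work sinks.

```python
import math


def score(query, work, title_weight=2.0, author_weight=1.0):
    """Weight a title hit above an author hit, then scale by popularity."""
    q = query.lower()
    match = 0.0
    if q in work["title"].lower().split():
        match += title_weight
    if q in work["author"].lower().split():
        match += author_weight
    # log10 damps popularity so huge works don't swamp everything else.
    return match * math.log10(1 + work["copies"])


def rank(query, works):
    """Return works sorted from highest to lowest score."""
    return sorted(works, key=lambda w: score(query, w), reverse=True)


# Hypothetical records; titles and copy counts are invented.
works = [
    {"title": "Eager", "author": "", "copies": 2},
    {"title": "Sample Novel", "author": "Edward Eager", "copies": 3000},
    {"title": "The Eager Reader", "author": "Someone Else", "copies": 500},
]

for w in rank("eager", works):
    print(w["title"])
```

Splitting on whitespace is a simplification; punctuation in titles would need real tokenizing.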

95Aerrin99
Dec 2, 2010, 6:33 pm

Search: harry potter and the sorcerer's stone

Results:

Harry Potter and Sorcerer's Stone
1 member, 1 edition

Harry Potter and the Sorcerer's Stone by KnowWonder
1 member, 1 edition, 2.5 stars

The Mysteries of Harry Potter and the Sorcerer's Stone
2 members, 2 editions

Harry Potter and the Sorcerer's Stone p4t
2 members, 2 editions

Harry Potter and the Sorcerer's Stone 2002 Calendar
4 members, 2 editions

Harry Potter and the Sorcerer's Stone Postcard Book
14 members, 2 editions, 4.5 stars

Harry Potter and the sorcerer's stone by Copyright Collection DLC
18 members, 3 editions, 5 stars

Harry Potter and the Sorcerer's Stone by Electronic Arts Inc.
1 member, 1 edition

Harry Potter and the Sorcerer's Stone by S. J. Rozan
2 members, 1 edition, 5 stars

In fact, the book does not appear anywhere in the 63 (!!) results - and changing it to 'Philosopher's Stone' (which is the canonical title at present) fares about the same. What's going on here? This is so screwy that I have to suspect a bug rather than bad algorithmness.

96brightcopy
Dec 2, 2010, 6:41 pm

Please resist the urge to "clean up" the previous result by doing combining. This does not help track down bugs. :)

97EveleenM
Dec 3, 2010, 5:14 am

#95

A further example of the same thing: search on 'Harry Potter' by number of copies and you get a neat list with the other six Potter books at the top, but Harry Potter and the Philosopher's Stone is missing.

98andejons
Dec 3, 2010, 5:24 am

Searching for titles in other languages doesn't seem to work either. It seems to be the same problem as with The Tin Drum.

99_Zoe_
Dec 4, 2010, 11:25 am

So, is search now "done"? Brandon Sanderson results are still very badly ordered.