Update the Series Cloud

Talk Recommend Site Improvements



1Collectorator
May 26, 2010, 5:57 am

This member has been suspended from the site.

2EveleenM
May 26, 2010, 7:23 am

I think none of the CK clouds have updated since I joined in January: not just series, but things like nationality, locations, burial places as well.

3Collectorator
May 26, 2010, 7:40 am

This member has been suspended from the site.

4MarthaJeanne
Edited: May 26, 2010, 10:31 am

But which two hours? Please not the two hours I'm usually on.

5KingRat
May 26, 2010, 1:30 pm

I've been complaining about the CK clouds not being updated since at least August.

6Collectorator
May 27, 2010, 7:08 am

This member has been suspended from the site.

7FicusFan
May 27, 2010, 7:47 am

I also would like to see tags on the book page updated regularly.

There is a book (I forget which now) that has few owners. I recently added it, and the tags section doesn't list mine (even when I click the All button). It's probably been more than a couple of weeks.

8DaynaRT
May 27, 2010, 8:34 am

I also would like to see tags on the book page updated regularly.

This. Please, dear Zeus, let this happen.

9keristars
May 27, 2010, 9:49 am

It seems there are a lot of data-collection clouds/lists (not sure what to call them as a group) that don't get updated regularly: the tag clouds, series clouds, tag lists, wishlisted books, currently reading, and I'm pretty certain a few others that aren't coming to mind at the moment because I haven't encountered them recently and thought to check. I think for a while, certain zeitgeists weren't getting updated without a manual push?

Is there any particular reason why these things don't refresh themselves regularly? Is it just the strain on the database (I guess, how long it takes)?

10timspalding
May 27, 2010, 3:02 pm

Tags on books are updated every two weeks. We're now running 4 weeks because of admin stuff that John needs to finish before we can bang on the servers.

11_Zoe_
May 27, 2010, 3:06 pm

I'd be happy with a tag mirror that refreshed every four months... ;)

12DaynaRT
May 27, 2010, 3:09 pm

I'm sorry, but the last part of Tim's comment forced me to go seek out this Robot Chicken video (kinda NSFW).

13KingRat
May 27, 2010, 6:57 pm

CK Clouds have been updated for the first time in 9 months!

14Heather19
May 27, 2010, 9:55 pm

11: THIS!!!
I realize that the tag mirror must be a huge strain on the server, but do we really have to wait until it's *perfected*? Is it possible to just have it back but not updated regularly?

15timspalding
May 28, 2010, 2:32 am

Yeah. There's a system for updating such stuff. (Nothing should be older than about a month, or, in the case of a cache dependent on another cache, two.) The CK clouds had "keys" starting with ":", but the system expects keys to start with a number or a letter. Anyway, that meant the update could never run. Now it can.
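Message 15 describes a refresh system that skips any cache whose key does not start with a letter or digit. The following is a minimal Python sketch of that failure mode, not LibraryThing's code: the regex, the function names, and the 30/60-day limits (read off the "about a month, or two" remark) are all assumptions.

```python
import re
from datetime import datetime, timedelta

# Assumption: the refresh job only accepts cache keys that begin with a letter or digit.
VALID_KEY_START = re.compile(r"[A-Za-z0-9]")


def max_age(depends_on_cache: bool) -> timedelta:
    """About a month for an ordinary cache, about two for a cache built from another cache."""
    return timedelta(days=60 if depends_on_cache else 30)


def due_for_refresh(key: str, last_built: datetime, now: datetime,
                    depends_on_cache: bool = False, strict_keys: bool = True) -> bool:
    """Hypothetical check: should this cached item (e.g. a CK cloud) be rebuilt?"""
    if strict_keys and not VALID_KEY_START.match(key):
        # A key such as ":series" fails validation and is silently skipped,
        # so however stale it gets, it is never rebuilt.
        return False
    return now - last_built > max_age(depends_on_cache)


if __name__ == "__main__":
    now = datetime(2010, 5, 27)
    print(due_for_refresh(":series", datetime(2009, 8, 1), now))                     # False
    print(due_for_refresh(":series", datetime(2009, 8, 1), now, strict_keys=False))  # True
```

The fix in this sketch is either to relax the validation or to rename the keys; either way, a nine-month-old cloud becomes eligible for rebuilding again.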

16r.orrison
May 28, 2010, 4:42 am

Most of the clouds have updated, but the Blurbers cloud still includes HRF Keating 100 Best Crime & Mystery Books (98) and Julian Symons 100 Best Crime & Mystery Books (96). If you check the links, neither has any entries.

17timspalding
May 28, 2010, 4:44 am

It's about ampersand escaping. Sending this one to Chris/ConceptDawg.

http://www.librarything.com/bookaward/Julian+Symons+100+Best+Crime+%2526+Mystery...
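The linked URL contains `%2526`, which is a double-encoded ampersand: `&` becomes `%26`, and encoding that string a second time turns its `%` into `%25`. Below is a minimal Python sketch of how that can happen, using the standard library's `urllib.parse`. The two function names are hypothetical, and the assumption that the bug was a hand-escape followed by a full encode is for illustration only.

```python
from urllib.parse import quote_plus, unquote_plus


def buggy_award_url_part(name: str) -> str:
    # Hypothetical bug: "&" is escaped once by hand...
    pre_escaped = name.replace("&", "%26")
    # ...and then the whole string is URL-encoded again, turning "%" into "%25".
    return quote_plus(pre_escaped)


def fixed_award_url_part(name: str) -> str:
    # Encode exactly once; quote_plus already turns "&" into "%26" and "'" into "%27".
    return quote_plus(name)


if __name__ == "__main__":
    award = "Julian Symons 100 Best Crime & Mystery Books"
    print(buggy_award_url_part(award))  # Julian+Symons+100+Best+Crime+%2526+Mystery+Books
    print(fixed_award_url_part(award))  # Julian+Symons+100+Best+Crime+%26+Mystery+Books
```

Decoding the buggy form once yields a literal "%26" in the award name, which would match no award, consistent with the empty link pages reported in message 16.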

18Collectorator
May 28, 2010, 5:51 am

This member has been suspended from the site.

19timspalding
May 28, 2010, 5:56 am

My guess is that's also about encoding (the '). Chris?

20Collectorator
May 28, 2010, 5:59 am

This member has been suspended from the site.

21FicusFan
May 28, 2010, 7:21 am

Now I think the Tags problem on the book page is even worse.

I went to my most recent low-entry book (4 owners). It used to have tags. Now it says it has none. Even when I press All, no tags are listed.

I have tagged the book with 10 tags. None are listed in the Tags area of the book page.

http://www.librarything.com/work/3007563/book/59872204

22timspalding
May 28, 2010, 7:24 am

Tags didn't change. Probably it got combined into a new work. That kills the cached tags.

23FicusFan
May 28, 2010, 7:26 am

It still says there are only 4 owners. If it had been combined, wouldn't that go up?

24FicusFan
May 28, 2010, 7:29 am

Does not appear to be combined with any other book.

25conceptDawg
May 28, 2010, 12:56 pm

The ampersand problem in CK clouds has been corrected. And I reran the blurbers cloud because it had some old/bad info in it for some reason.

26Collectorator
Edited: May 28, 2010, 6:28 pm

This member has been suspended from the site.

27Collectorator
May 29, 2010, 8:34 am

This member has been suspended from the site.

28FicusFan
May 29, 2010, 10:36 am

So even if the book lost its tags because of a combination, it would still seem to be a bug.

It now has no tags listed, yet it does have tags.

Are you saying every time a book is combined (correctly or mistakenly) it will be tag-less for 2-4 or an indeterminate number of weeks?

And with a straight face, that this is a built-in feature and not a bug?

What is the point of tags then? More and more I am seeing combining as an evil and not a benefit.

There are no publisher's series, and now no tags because people are running around willy-nilly combining.

The 'Social' benefit does not outweigh the loss of data and usability, especially since tagwatch is still among the missing.

29Collectorator
May 29, 2010, 10:48 am

This member has been suspended from the site.

30Collectorator
May 29, 2010, 10:55 am

This member has been suspended from the site.

31timspalding
May 29, 2010, 1:30 pm

LibraryThing has 63 million tags. We cannot go through live information to find work->books->tags every single time you look at a work page, still less author->works->books->tags every time you look at an author. That alone would be hard enough, but because LT has tag combination there's always an extra step of tags->resolved tags (so, for example, the tag cloud for the Book Thief lists wwii, not all the variants). And that's not counting spam tags, spam users, etc.

Instead, we calculate work->resolved tags on a regular basis. When two works are combined, we do not recalculate all the tags. Most of the time, the change will be minimal, since the larger work "wins" and keeps its work code. If you go through a series of combinations and separations that results in a new work code, a work can lose some of its cached information. To recalculate that information for a single book would be quite time consuming. Most works have tag clouds and most are quite up to date--and the median page-generation time for work pages is 0.6 seconds.
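Tim's description in message 31 amounts to a precomputed cache: a periodic batch job maps each work to counts of resolved tags, page views only read that cache, and combining works does not trigger a recalculation. The Python sketch below illustrates that design under stated assumptions; the alias table, data shapes, and function names are hypothetical, not LibraryThing's schema.

```python
from collections import Counter

# Hypothetical tag-combination table: variant -> resolved tag.
TAG_ALIASES = {"ww2": "wwii", "world war ii": "wwii", "world war 2": "wwii"}


def resolve(tag: str) -> str:
    t = tag.strip().lower()
    return TAG_ALIASES.get(t, t)


def rebuild_tag_cache(books_by_work, tags_by_book):
    """Periodic batch job: work -> Counter of resolved tags (never run per page view)."""
    cache = {}
    for work_id, book_ids in books_by_work.items():
        counts = Counter()
        for book_id in book_ids:
            counts.update(resolve(t) for t in tags_by_book.get(book_id, []))
        cache[work_id] = counts
    return cache


def combine_works(books_by_work, a, b):
    """The work with more books keeps its work code; the tag cache is NOT touched."""
    winner, loser = (a, b) if len(books_by_work[a]) >= len(books_by_work[b]) else (b, a)
    books_by_work[winner].extend(books_by_work.pop(loser))
    return winner


def tag_cloud(cache, work_id):
    """Page view: a cheap dictionary lookup. An uncached work code shows no tags."""
    return cache.get(work_id, Counter())


if __name__ == "__main__":
    books_by_work = {1: [10, 11], 2: [20]}
    tags_by_book = {10: ["WWII", "ww2"], 11: ["World War II"], 20: ["fiction"]}
    cache = rebuild_tag_cache(books_by_work, tags_by_book)
    winner = combine_works(books_by_work, 1, 2)
    print(tag_cloud(cache, winner))  # stale: still only "wwii" until the next batch run
    print(tag_cloud(cache, 3))       # a brand-new work code: empty until the next run
```

In this sketch, combining work 2 into work 1 leaves work 1's cloud missing "fiction" until the next batch run, and a new work code (here 3) shows an empty cloud, which matches the symptoms described in messages 21 and 28.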

especially since tagwatch is still among the missing

Like a number of other features, tag watch was not designed to be quick. It was significantly hurting tag-page speed--adding multiples of time. I am no longer allowing programmers to add features without seriously dealing with speed issues. Overall, this has been a good thing--site speed has improved. I am committed to keeping the speed up, even if that means delaying the reintroduction of ill-designed features.

32conceptDawg
May 29, 2010, 1:39 pm

WOW. This thread has taken on a life of its own. There are at least 3 or 4 different issues being talked about. Let's limit it to problems with the CK clouds page. Tags, combining, details section of the work page, etc. should be moved to their own thread where they can be properly tracked and fixed (if they are bugs). Dropping into a new subject is a sure way for that discussion to get lost and forgotten.
(posted before Tim answered, but it still holds true)

All of the clouds were regenerated yesterday. If you see some that seem out of whack let me know and I'll see why that is so.

I also fixed a few other minor bugs when I pushed out the changes yesterday, but nothing that is worth mentioning really.

33timspalding
Edited: May 29, 2010, 2:03 pm

The Children's Books one was messed up by works like this. We weren't correctly handling the combination of work-combination and changes to CK.

http://www.librarything.com/work/9542649

Anyway, we're fixing it now, but what is up with that series? Why are you renaming the series—and the book titles—that way?

34timspalding
May 29, 2010, 2:06 pm

Fixed.

35FicusFan
May 29, 2010, 2:38 pm

Tim, I realize there are lots of tags and you have to optimize speeds.

I am not talking about looking at books, but about books that have lost all their tags.

Surely there is a way to single out books that had tags (tags > 0) and now show none (tags = 0), and fix only those on a shorter schedule, without bringing the site down?

Or adjust how combinations are done so that all the tags don't go walkabout ?
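FicusFan's suggestion in message 35 (refresh only works whose tags have gone missing, on a shorter schedule) could look roughly like this Python sketch. It assumes, hypothetically, that combine and separate operations log the work codes they touch, so the job never has to scan all tags; `targeted_refresh` and its parameters are illustrative names, not an existing LibraryThing feature.

```python
from collections import Counter


def targeted_refresh(cache, dirty_works, live_tags_for_work):
    """Rebuild the cached cloud only for recently changed works whose cloud is empty.

    cache: work code -> Counter of tags (the periodic cache).
    dirty_works: set of work codes touched by combining/separating (assumed to be logged).
    live_tags_for_work: callable doing the expensive live lookup for ONE work.
    Returns the work codes that were rebuilt.
    """
    rebuilt = []
    for work_id in sorted(dirty_works):
        if not cache.get(work_id):  # cache entry is missing or empty
            tags = live_tags_for_work(work_id)
            if tags:  # the work really has tags, so the empty cloud is stale
                cache[work_id] = Counter(tags)
                rebuilt.append(work_id)
    dirty_works.clear()
    return rebuilt
```

The expensive per-work lookup runs only for the handful of works changed since the last pass, rather than for all works.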

Otherwise you might as well say the book in question is 'under construction' for 2 weeks or however long it takes to repopulate.

I have seen missing tags talked about in the Combiners group, so it's an issue.

Tagwatch is a feature, I don't need the background on it, just that it work.

If you didn't like its performance, then fix it. Now it just sounds like you are more interested in punishing the programmer and us.

36r.orrison
May 29, 2010, 5:59 pm

33: That is just one of the ways people try to deal with zero-copy works that perpetually clutter up the system: works that have been deleted, or whose book has been edited so that it ended up combined with another work. I know that you like to keep them, and I seem to recall the justification: if they are left combined, then when someone else enters the same incorrect data, it will auto-combine with the correct work. The problem is that much of the time these zero-copy works contain incorrect data. I'm not sure exactly how it happens, but I've seen many instances (while cleaning up the Lakeland Fells guides by A. Wainwright) in which zero-copy works cause incorrect combination suggestions. Splitting off the zero-copy works gets rid of the inappropriate suggestions. I beg and plead: please, please, please just delete these bogus entries from the database.

37timspalding
Edited: May 29, 2010, 6:06 pm

Okay, but someone HAS that book, member "Chale." They have something they described as:

"Best in Children's Books - Jason and the Golden Fleece; Gregoria and the White Llama; The Magic Skipping Rope; A Frog He would A-Wooing Go; The Terrible Mr. Twitmeyer; Toads and Diamonds; Crunch, Crunch"

It's not my fault that such book data exists. In this case it's Amazon's fault. But in any case, it's not a bogus work that should be deleted, and it's only to the credit of the system that it doesn't allow members to rashly and wrongly decide that someone's work is bogus and without copies and remove it.

38Collectorator
May 29, 2010, 6:07 pm

This member has been suspended from the site.

39timspalding
Edited: May 29, 2010, 6:26 pm

http://www.librarything.com/work/9542649/book/59228628

Look, I don't want to be too harsh here, but I do think we need to reconsider using the system in weird, unintended ways. The title and series fields were not intended to be used to marginalize works for their data or for being spam. It is, no doubt, my fault that it took us a while to make a better way to mark spam works, but the result of doing it the wrong way has been big spam links in clouds—and this.

40Collectorator
May 29, 2010, 6:24 pm

This member has been suspended from the site.

41timspalding
May 29, 2010, 6:36 pm

Thanks. Do me a favor. I'm going to see why that book is showing up with copies of other editions. Don't change anything related to it, since this'll take a while.

42Collectorator
May 29, 2010, 6:49 pm

This member has been suspended from the site.

43rsterling
Edited: May 29, 2010, 7:03 pm

36, 37. On this zero-copies thing, there's been a discussion about this for a while, but I still have never understood the rationale (or been convinced by it) for separating out zero-copy works. OK, separating makes sense if there's a zero-copy edition of the Tempest with the wrong ISBN that's somehow been attached to the work page of Of Mice and Men. But in general? I think there's a much better argument for combining zero-copy editions with their appropriate and correct work rather than having them hanging around the system.

ETA: I think there are 2 connected but slightly separate issues here. One is what to do with zero copy editions, whether to separate them from or combine them with the main work page. Another - and this is what I think Tim's talking about with the references to spam - is the use of CK fields to label real works as bad data or to label spam works as spam. I've been trying to go through and delete all the references to "spam" and "spammer" in CK, to try to get a handle on how much of it is left in the system and also to make sure no legitimate authors and titles got caught up in the fray and labeled as spam. But it's practically impossible to check and clean out all the references to spam in CK because once works have been combined, the CK for the books that were combined into another book becomes inaccessible.

44timspalding
Edited: May 29, 2010, 7:02 pm

>43 rsterling:-43

I don't think it makes sense to use title and series for things other than titles and series—for example to change the title to reflect that the work has no copies, or is spam. And I can't see why separating out zero-copy works makes any sense at all, ever. Indeed, the whole point is that if someone comes along and puts a work with that title/author in, it will have its natural home, not be all alone. If there's some rationale I'm missing, what is it?

"Using the system in weird, unintended ways," : BCIB 0 Copies
"marginalize works," : BCIB 0 Copies
"spam," : Marking works as being in the spam series or having the title "spam"

I don't see what point you're missing here, but I don't think I'm from another planet for describing clearly the systematic use of titles and series to mean something other than titles and series.

45Collectorator
May 29, 2010, 7:02 pm

This member has been suspended from the site.

46Collectorator
May 29, 2010, 7:03 pm

This member has been suspended from the site.

47timspalding
May 29, 2010, 7:06 pm

If an edition has data that means it shouldn't belong with a work, it should be separated. If it has data that indicates it should, it should. I don't see how the number of copies matters here in the slightest.

48rsterling
Edited: May 29, 2010, 7:07 pm

Yes.
I'm not sure exactly how it happens, but I've seen many instances (while cleaning up the Lakeland Fells guides by A. Wainwright) in which zero-copy works cause incorrect combination suggestions. Splitting off the zero-copy works gets rid of the inappropriate suggestions.

In cases like these, where presumably 2 completely different books have been combined somehow, and one has zero copies on the editions page, yes, of course the wrong title should be separated out. But it should then be combined with its proper title, if there is one. If it's just a misspelling, that should not cause inappropriate suggestions. If it's two different books, then obviously the two books should be separated - but that's the case whether one has zero copies or whether it has 10.

Edited for typo

49timspalding
Edited: May 29, 2010, 7:13 pm

The concepts of "work" and "edition" sit above and apart from the item. This is a feature, not a bug.

For example, if someone added the Finnish edition of the Omnivore's Dilemma and combined it with the common edition, that edition is supposed to be part of that work forever--or at least until it's separated out. That allows future Finns to have their books work, even if the original Finnish person left the site or changed the title of their work in some idiosyncratic way. Given members' penchant for editing their book data, there are quite a few editions which are perfectly fine and should check through to some work, but which have no copies at the moment because some (probably Amazon) tick in the title has been removed by a user.

I take out the trash. When I'm done I don't hit my head with a hammer until the word "trash" no longer means anything to me, so I can learn it again when the occasion presents. And I certainly don't go around hitting other people with the hammer because, on Tuesday morning, we live in trashless world.

50Collectorator
May 29, 2010, 7:11 pm

This member has been suspended from the site.

51EveleenM
May 29, 2010, 7:11 pm

#44
And I can't see why separating out zero-copy works makes any sense at all.

Suppose a work has 10 actual copies with the author Elizabeth Brown, and 8 with the author E. Brown, but the presence of 6 zero-copy works is tilting the balance to the author E. Brown, and that's where the work is listed. Separating the zero-copy works brings the main work back to the Elizabeth Brown page, where it's better off.

52Collectorator
May 29, 2010, 7:14 pm

This member has been suspended from the site.

53rsterling
May 29, 2010, 7:14 pm

50. I can. 1. Many orphan works in the system. 2. The next time someone adds a book with data that matches the zero copy work it doesn't get combined properly with the main work.

54timspalding
May 29, 2010, 7:16 pm

47, can you give me an example of a *negative* result of zero copies being separated out?

See message 49.

Suppose a work has 10 actual copies with the author Elizabeth Brown, and 8 with the author E. Brown, but the presence of 6 zero-copy works is tilting the balance to the author E. Brown, and that's where the work is listed. Separating the zero-copy works brings the main work back to the Elizabeth Brown page, where it's better off.

Then let's work on that. I am not at all suggesting that zero-copy works should affect a title or author. They shouldn't, except as a last resort--in a work without copies. I suspect that part of the problem is actually that edition counts aren't being updated enough. (I'm running it now.)

But, again, this is a problem where work titles aren't being calculated right, and should be fixed by recommending changes to how it's calculated or pointing out bugs in how it's working, not by bending the system in odd ways to accomplish the result.

55timspalding
May 29, 2010, 7:17 pm

Do you think that is where zero copies come from?

Yeah. I think the latter is the most common. They change titles.

56Collectorator
May 29, 2010, 7:19 pm

This member has been suspended from the site.

57Collectorator
May 29, 2010, 7:20 pm

This member has been suspended from the site.

58DaynaRT
May 29, 2010, 7:20 pm

not by bending the system in odd ways to accomplish the result

But then what will combiners and separators do all day? ;)

59timspalding
May 29, 2010, 7:24 pm

Take a look at the link in your message 39. Where did that one come from?

I searched the library of the person listed as having it.

But then what will combiners and separators do all day? ;)

Well, right. So, I need to make it possible to do everything. That's going to be hard, since everything is a big thing. But I do think some of these weird techniques have gotten a bit out of hand.

60Collectorator
May 29, 2010, 7:25 pm

This member has been suspended from the site.

61timspalding
May 29, 2010, 7:26 pm

If users could update edition counts, the necessity of separating out zero copies would not exist. If users could get staff to update edition counts, users wouldn't have to do either.

How would users know how many people held an edition?

62rsterling
May 29, 2010, 7:26 pm

There's always plenty of combining and separating work to be done; I don't think most of what combiners and separators do involves what you're calling weird techniques.

63timspalding
May 29, 2010, 7:27 pm

No, I mean click on it. It's a zero copy now, since I fixed it. It used to be a 1, now it is a 0. THAT is the SOURCE of the zero copies.

How did you fix it? You separated out the edition that had a user and combined it with the work in question?

64timspalding
May 29, 2010, 7:28 pm

I don't think most of what combiners and separators do involves what you're calling weird techniques.

I completely agree.

65EveleenM
May 29, 2010, 7:33 pm

I am not at all suggesting that zero-copy works should affect a title or author. They shouldn't, except as a last resort--in a work without copies.

Live and learn! I honestly thought that was just a part of the system.

66Collectorator
May 29, 2010, 7:41 pm

This member has been suspended from the site.

67Collectorator
Edited: May 29, 2010, 7:45 pm

This member has been suspended from the site.

68rsterling
May 29, 2010, 8:01 pm

Something else I really don't understand is why combine different zero-copy titles together into one work, when they aren't the same work?

69jjwilson61
May 29, 2010, 8:10 pm

68> I think the idea is to be able to separate out the zero-copy works from the main works but without the drawback that someone posted a little while ago of creating a zillion more works.

70jjwilson61
May 29, 2010, 8:13 pm

Tim, have you been following any of the threads in the Combiners group? There have been many times that someone has posted about a weird problem and separating out the zero-copy works solves the problem.

I'll try to dig around and see if I can find any examples. You should also start a thread in the Combiners group to solicit the opinions of any combiners that don't frequent this group. I say you should because if I did it there'd be no guarantee that you'd be following it which would make it a big waste of time.

71rsterling
May 29, 2010, 8:34 pm

69 - But where you have several different books, all with zero copies, but with different isbns and titles, all combined together into one zero-copy work, isn't that (a) not how we are supposed to use combination, i.e. combining different works into the same work, and (b) going to create problems if someone adds a book with one of those ISBNs later?

72prosfilaes
May 29, 2010, 11:13 pm

#47: Why should I spend time playing around with an edition that no one has?

#49: if someone added the Finnish edition of the Omnivore's Dilemma and combined it with the common edition, that edition is supposed to be part of that work forever--or at least until it's separated out. That allows future Finns to have their books work,

Assuming it worked right. It seems you're adding a lot of complexity and noise to the system for editions that may or may not have ever been correct, and in many, probably most, cases weren't.

73timspalding
May 29, 2010, 11:31 pm

>72 prosfilaes:

You shouldn't have to spend time playing around. This isn't that situation. This is the systematic playing around with them to force a result.

74timspalding
May 29, 2010, 11:40 pm

75rsterling
Edited: May 30, 2010, 12:21 am

So, are you asking us definitively:
- not to separate zero-copy editions (unless they are for a different work or similar major problem for which we'd separate anyway, regardless of the number of copies)?
and
- not to use/alter the CK fields (e.g. canonical title, series) to indicate something peculiar about the quality of the record (e.g. "zero copy" titles or "spam" series, etc.), but only to standardize the title, enter a real series, etc.?
- oh: and not to combine different works together?

(edited to change "work" to "edition", and to add last question)

76Collectorator
May 30, 2010, 4:24 am

This member has been suspended from the site.

77EveleenM
May 30, 2010, 5:56 am

Tim #54:
I am not at all suggesting that zero-copy works should affect a title or author. They shouldn't, except as a last resort--in a work without copies. I suspect that part of the problem is actually that edition counts aren't being updated enough. (I'm running it now.)

But, again, this is a problem where work titles aren't being calculated right, and should be fixed by recommending changes to how it's calculated or pointing out bugs in how it's working, not by bending the system in odd ways to accomplish the result.

Here's an example:
http://www.librarything.com/work/9701443/editions/57806259

I entered that book on the 16th March from amazon.co.uk, which gave the long title Freshwater Life in Ireland: Keys to the more easily identified Irish freshwater plants and animals and checklists for most groups. As far as I remember, I immediately corrected the title to the shorter version Freshwater Life in Ireland. (Then at some point, in an attempt to get the work title to change, I seem to have added and deleted another copy with the newer title).

Anyway, more than two months later the work title is still that belonging to the zero-copy work, while the title of the one real copy hasn't dislodged it. How many times have edition counts been run since the 16th March? From my point of view, separating the work from itself would sort this out right away: I haven't bothered in this case, but that's what I was advised to do in other cases where I'd added an author later but the work was still, weeks or months afterwards, showing the zero-copy blank author.

78Noisy
May 30, 2010, 6:08 am

I think you've been up too long, and are making bad decisions, Tim.

79timspalding
Edited: May 30, 2010, 9:20 am

>76 Collectorator:

The work in question was messed up in the following ways:

1. Ampersand linking, which Chris fixed.
2. The CK-cloud count didn't correctly discount aliased works in the total. Since so much work had been done moving these BICB books around, there were literally hundreds of entries.
3. The BICB book, marked with the title "BICB Copies 0" either had originally or had acquired a real book--both potential sources of error--demonstrating the problem with taking works and giving them purposefully wrong titles and copies.

Updates seem to be the source of many, many problems on LT. I resent your obfuscation of legitimate complaints with irrelevant practices that you have Never shown an interest in before.

I'm sorry you think a practice that is bad for LibraryThing is irrelevant: purposefully combining editions together for reasons having nothing to do with which editions should go together, and renaming works for administrative purposes unrelated to the work title.

Anyway, more than two months later the work title is still that belonging to the zero-copy work, while the title of the one real copy hasn't dislodged it. How many times have edition counts been run since the 16th March? From my point of view, separating the work from itself would sort this out right away: I haven't bothered in this case, but that's what I was advised to do in other cases where I'd added an author later but the work was still, weeks or months afterwards, showing the zero-copy blank author.

So, let's figure out a way for you to get what you want in the title without doing weird things to the work system to force results.

I'm sorry people advised you to do this, but it really makes no sense to be intentionally breaking the system to force it to behave in a way you want it to or even should.

Incidentally, this is a perfect example of the problem. You added the book from Amazon. When the next person adds the book, it's going to be added to whatever work the separated edition now belongs to. If you subsequently go on to combine that separated edition into some omnigatherum of editions, the work will go there.

I think you've been up too long, and are making bad decisions, Tim.

In what possible way? I want people to combine editions that belong to the same essential work and give works titles that describe the work's real title.

Look, I'm sure most of this would be solved if I simply provided a way to force edition renumbering and title recalculation. Last night I worked on a complete redo of edition counts and a faster approach to title and author calculation. (The author calculation also treats blank authors as 1/3 less numerous.) I'll try to get a "recalculate" button out as soon as I can. In the meantime, I'd appreciate it if people tried to understand why hitting the television might be a negative approach to maintenance.
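Messages 54 and 79 state two rules for choosing a work's title and author: zero-copy editions should matter only as a last resort, in a work without copies, and blank authors count for about a third less. Here is a minimal Python sketch of those rules. It assumes each edition contributes a (value, copies) pair and reads "1/3 less numerous" as a weight of 2/3; the function name and weighting details are assumptions, not the actual recalculator.

```python
from collections import defaultdict

# Assumption: "treats blank authors as 1/3 less numerous" read as a weight of 2/3.
BLANK_WEIGHT = 2 / 3


def pick_work_value(editions, blank_weight=BLANK_WEIGHT):
    """Choose a work-level title or author from (value, copies) pairs, one per edition.

    Copies decide; zero-copy editions only count (one vote each) as a last resort,
    when no edition of the work has any copies. Blank values are down-weighted.
    """
    if not editions:
        return None
    has_copies = any(copies > 0 for _, copies in editions)
    scores = defaultdict(float)
    for value, copies in editions:
        weight = float(copies) if has_copies else 1.0
        if not value.strip():
            weight *= blank_weight
        scores[value] += weight
    return max(scores, key=scores.get)
```

With the numbers from message 51 (10 copies as Elizabeth Brown, 8 as E. Brown, 6 zero-copy E. Brown editions), this returns Elizabeth Brown, since zero-copy editions contribute nothing while real copies exist. The same rule would let the single real copy of Freshwater Life in Ireland (message 77) outvote the long zero-copy Amazon title.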

80Noisy
May 30, 2010, 9:31 am

The right decision is to eliminate zero-copy works. Why you aren't taking this decision is totally beyond me. JDI.

81Collectorator
May 30, 2010, 9:52 am

This member has been suspended from the site.

82timspalding
May 30, 2010, 10:18 am

The right decision is to eliminate zero-copy works. Why you aren't taking this decision is totally beyond me. JDI.

Zero-copy works, or zero-copy editions? If zero-copy works, where do the editions go?

And why would you care about zero-copy works anyway--I mean literally? Is it because you don't like seeing them somewhere specific? Let's talk about that.

At present, I'm minded to remove them and their copies for the simple reason that they've been wrongly gathered together. I don't think people understand that the work "BICB 0 Copies" will receive any book that falls under any of the 88 titles on http://www.librarything.com/work/9542649/editions . Worse, because edition-picking for works follows a cascade from exact to best-guess matches, simple titles like "Best in Children's Books" will also resolve to that work.
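Message 82 says edition-picking follows a cascade from exact to best-guess matches, which is why a bare title like "Best in Children's Books" can land on the combined BICB work. The Python sketch below shows a cascade of that general shape. The three steps, the normalization, the "main title before a subtitle separator" rule, and the work ids are assumptions for illustration, not LibraryThing's matcher.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def main_title(title: str) -> str:
    """Assumed best-guess key: the part before a ' - ' or ':' subtitle separator."""
    return normalize(re.split(r"\s-\s|:", title, maxsplit=1)[0])


def pick_work(title, author, editions):
    """editions: list of (title, author, work_id). Returns a work_id or None.

    Hypothetical cascade: steps run from most to least exact; the first hit wins.
    """
    steps = (
        # 1. exact title and author
        lambda t, a: t == title and a == author,
        # 2. normalized title and author
        lambda t, a: normalize(t) == normalize(title) and normalize(a) == normalize(author),
        # 3. best guess: main title only, ignoring the author entirely
        lambda t, a: main_title(t) == main_title(title),
    )
    for matches in steps:
        for t, a, work_id in editions:
            if matches(t, a):
                return work_id
    return None
```

In this sketch, once a long compound edition title such as "Best in Children's Books - Jason and the Golden Fleece; ..." sits in a work, step 3 sends any book titled plainly "Best in Children's Books" to that work, whoever its author is.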

But, fundamentally, LibraryThing's system is designed to show the relationship between editions and works. Whether or not an edition has books in it is irrelevant to whether it belongs to the work. In the case mentioned above (message 77), the current method would consign all future copies of the book added from Amazon to separation from the appropriate work and, if the methods were carried out fully, to some nonsense work with a "0 copies" title.

What happens in BICB is irrelevant to the Series Cloud.

Look, I'm sorry. I ran the database queries, not you. There were hundreds of entries for that series because of edition combination. Yes, as I said, that's wrong. The cloud code—written by Chris and he can back me up here—should have discounted the CK work on works that were subsequently combined. It wasn't doing that. In doing the wrong thing, however, it showed me that there were hundreds of Best in Children's Books editions combined into a non-work with a non-title.

83timspalding
Edited: May 30, 2010, 10:23 am

I think the solution is:

1. For me to carefully re-explain what the various levels are for, and how they interact.
2. For me to make clear that separating to force results and not-work titles are not to be done.
3. For me to provide a link on each work that: (a) recalculates edition counts, (b) recalculates title and author based on a.

So far I've:

1. Re-run the foreign-language identification script on editions.
2. Rewritten and rerun the edition-count algorithm. You'll notice that many 1-count editions are now 0-count editions.
3. Rewritten the title and author recalculator to be faster and to discount no-author editions more.
4. Rerun it globally. It's currently on work 603,499.

84Collectorator
May 30, 2010, 10:25 am

This member has been suspended from the site.

85EveleenM
May 30, 2010, 11:06 am

#83

1. Re-run the foreign-language identification script on editions.
2. Rewritten and rerun the edition-count algorithm. You'll notice that many 1-count editions are now 0-count editions.
3. Rewritten the title and author recalculator to be faster and to discount no-author editions more.
4. Rerun it globally. It's currently on work 603,499.

That will certainly deal with the cases I was thinking of where separating zero-copy editions seemed to help. Thanks, Tim!

86Noisy
May 30, 2010, 11:23 am

>82 timspalding:

Sorry, I did mean zero-copy editions.

>83 timspalding:

"2. For me to make clear that separating to force results and not-work titles are not to be done."
And risk hard-core combiners going against your wishes because that's the only way to get things correctly matched? Why force people into that position?

87jjwilson61
May 30, 2010, 11:50 am

I think the solution is:

1. For me to carefully re-explain what the various levels are for, and how they interact.
2. For me to make clear that separating to force results and not-work titles are not to be done.

Could you do these things in the Combiners! group? I'm not sure that the people who need to be informed are following this group, much less this off-topic tangent of this thread.

88klarusu
Edited: May 30, 2010, 11:58 am

#87 Hear, hear! I only just happened upon this by chance. This is something that deserves a thread of its own. For the record, I'm with Tim on "For me to make clear that separating to force results and not-work titles are not to be done."

89timspalding
May 30, 2010, 2:11 pm

Written on iPhone. My goal is to make the changes I spoke of and post to the combiners group about it. I intend to make it clear both what people should not do and try to get as much feedback as possible on what changes are necessary to make it so people aren't tempted to.

90FicusFan
May 30, 2010, 4:15 pm

The tags have made it back to my book. Thank You, Tim.

With all this talk about zero-copy books hanging around, is it the same for zero-copy authors?

I notice that often when searching for an author I will get a list of 10 names all the same, and 9 will have zero books and say they have been combined with Author X (same name).

Is there a reason all those empty authors are left on the system when they have been combined?

91timspalding
May 30, 2010, 5:23 pm

The tags have made it back to my book. Thank You, Tim.

Just to be clear, it's not about tags on your book. It's about whether your tags show up in the global tag cloud for the book. There's no caching on your own data--indeed the whole reason LT needs to do this stuff is that we maintain a strict distinction between personal libraries and the global level. Anyway.

With all this talk about zero-copy books hanging around, is it the same for zero-copy authors?

Honestly I'm not sure. I need to look at that system as well. In general, we have a second generation to our search system coming, but it's hanging on finishing up Overcat (you're in the beta group, right?).

92Heather19
May 30, 2010, 5:49 pm

Yes, it's pretty much the same. I've found more than a few zero-copy authors in my author cloud: authors that have no books attributed to them at all, but obviously used to before combining/name-changing/editing/whatever happened. (For what it's worth, this observation was made a few days ago, so I don't know if anything has changed since then.)

93timspalding
May 30, 2010, 5:52 pm

The author situation is different in structure. It's not the fairly elegant edition->work concept, but a rather more problematic author name->author name, with severe limitations on how the name is recorded, and a separate marker for author->splits by work codes one. I'm loath to refine it further; it needs redoing.

94Noisy
May 30, 2010, 6:06 pm

>89 timspalding:

The only thing that will stop me separating out zero-copy editions when I see them is if I don't see them.

95FicusFan
May 30, 2010, 6:20 pm

> 91 Tim,

The tags have made it back to my book. Thank You, Tim.

Just to be clear, it's not about tags on your book. It's about whether your tags show up in the global tag cloud for the book.

I was being unclear. I didn't mean that the tags had gone or returned from my personal book. Rather, I meant that the tags had returned to the work page for that book.

I don't use clouds, so I have no idea about that.

But thank you again.

96rsterling
May 30, 2010, 8:18 pm

94 - But why? I really do hope Tim starts a discussion about this in combiners, because I really do want to know exactly what benefits (are thought to) come from separating out zero-copy editions. If they're not hurting anyone - i.e. not for a completely different/wrong book - then why not leave them alone?

97prosfilaes
May 30, 2010, 8:26 pm

#96: If they're not hurting anyone - i.e. not for a completely different/wrong book - then why not leave them alone?

Because the editions with copies in the system mean that they have titles and authors that someone is willing to accept in their library. Editions that have no copies frequently do so because the name and author are screwed up in a way that no one wants to stand behind them, and yet they can influence and even dominate the name of the work in the system.

98rsterling
May 30, 2010, 8:43 pm

Ok, but canonical title can fix that, and if someone else adds the same screwed up version of the title, their work won't get combined properly. Anyway, it seems like Tim's trying to figure out other ways to deal with these problems that don't involve separation.

It's also possible someone could have deleted the book from their library, or just deleted a space from the title or changed the capitalization. That doesn't mean, as a general rule, that the zero copy title is bad, or that it won't later be used for someone else's book.

99timspalding
May 31, 2010, 12:31 am

and yet they can influence and even dominate the name of the work in the system.

But THAT is the problem. You're complaining about something that should be implemented better by proposing abuse of a feature that has been implemented correctly. Zero-copy editions shouldn't have the power to rename works.

In fact, since yesterday, zero-copy editions no longer count as they once did (they count for a .01 vote, so they can only matter if 101 0-copy editions say X and one 1-copy edition says Y). The only wrinkle is that the counts are updated nightly, not immediately, so a zero-copy edition can matter in the short term. Anyway, I'll be introducing tools to force recalculations of both edition counts and titles/authors.
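The weighting rule described above can be sketched in a few lines. This is a minimal illustration, not LibraryThing's code: it assumes each edition votes with its copy count and a 0-copy edition counts for 0.01, and it keeps weights in hundredths of a vote so the 101-versus-1 arithmetic stays exact. The names `edition_weight` and `dominant_title` are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights, in hundredths of a vote, so the arithmetic stays exact.
ZERO_COPY_WEIGHT = 1    # a 0-copy edition counts as 0.01 of a vote (assumed)
PER_COPY_WEIGHT = 100   # each real copy counts as one full vote (assumed)


def edition_weight(copies: int) -> int:
    """Voting weight of one edition, in hundredths of a vote."""
    return copies * PER_COPY_WEIGHT if copies > 0 else ZERO_COPY_WEIGHT


def dominant_title(editions: list[tuple[str, int]]) -> str:
    """Pick the title with the largest total weight.

    `editions` is a list of (title, copy_count) pairs. On a tie, the
    title seen first wins, because dicts preserve insertion order.
    """
    totals: dict[str, int] = defaultdict(int)
    for title, copies in editions:
        totals[title] += edition_weight(copies)
    return max(totals, key=lambda t: totals[t])
```

Under these assumptions, 101 zero-copy editions titled X narrowly outvote one 1-copy edition titled Y (101 against 100 hundredths), while 99 of them do not.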

100r.orrison
Edited: May 31, 2010, 3:26 am

The problem that I have is, as I mentioned in #36, inappropriate combinations being suggested by zero-copy works. In the A. Wainwright (touchstone not working) lake district guides, there are dozens, perhaps hundreds, of zero copy editions/works that if combined into the works that seem appropriate will cause suggested combinations with all the other books in the series. And.. now it looks like someone has merged a lot of them back in. Take a look at the suggested combinations on this page:

http://www.librarything.com/work/1688915/editions/10036749

EVERY BOOK IN BOTH SERIES!

(And yes, I used canonical title on some things like http://www.librarything.com/work/9767795 to help prevent it being combined where it does not belong)

I bet that if I separate off all those "no current copies" a lot of those suggestions will go away.

Many of those zero-copy works have ISBNs that don't match their titles, they've obviously been entered wrong and corrected. Sometimes the title didn't have enough information to show which series it should be in. Where should they go? Should they be combined by title? By ISBN? Or should they be deleted because they're wrong and nobody actually has them in their library?

(Another cause of inappropriate suggestions is the "match title only on text before the colon" rule. Take a look at that editions page again, and think how that rule will affect these books.)

Another case of bad data being kept around and fouling things up: http://www.librarything.com/topic/91773 in which a book was entered with no author, which created the work with no author, then when the book was edited to have an author it was auto-combined with the work which still had no author. C.f. http://www.librarything.com/topic/91861

101MarthaJeanne
May 31, 2010, 4:08 am

I found a number of books yesterday in my singletons where the proper author shows as a zero-work author, and my book shows up in a mangled version of the author. In several of these cases I had actually entered a lot of data in the author CK. Please don't delete these authors unless it is clear that they are really 0-work authors.

My connection seems to be going down. I'm going to try and post this. Examples later if I can find them.

102infiniteletters
May 31, 2010, 9:43 am

Another reason I have seen for 0-copy editions to be separated out is when there are, oh, 20 lines of the exact same title showing under a 4-copy work on the author combine/separate page. It makes the author combine/separate pages shorter.

103EveleenM
May 31, 2010, 9:59 am

#102 Another reason I have seen for 0-copy editions to be separated out is when there are, oh, 20 lines of the exact same title showing under a 4-copy work on the author combine/separate page. It makes the author combine/separate pages shorter.

Here's an example, Honolulu by Glenda Bendure, with 3 actual copies and 23 edition lines, http://www.librarything.com/work/6751072/editions. The zero-copy editions seem to be showing up on the work edition page at the moment, but even if they are taken off that again, they can still be seen on the author combine page at http://www.librarything.com/combine.php?author=bendureglenda (scroll down to Honolulu).

104Collectorator
Edited: May 31, 2010, 10:06 am

This member has been suspended from the site.

105rsterling
May 31, 2010, 11:28 am

Ok, so so far I've seen 3 reasons for why people are separating out zero-copy editions:

1) zero-copy editions can sometimes dominate the calculation for which title and author is dominant for the work.
But: canonical title can fix the title part, and better recalculation should fix this, per Tim's 99.

2) zero-copy editions are/might be responsible for making the "suggested combinations" on the editions page have too many and too irrelevant works suggested.
3) zero-copy editions make the author combination page too long.

Both 2 and 3 seem to be more aesthetic issues. With 2, I can see that there would be a problem if combinations were happening automatically, but that doesn't seem to be what's at issue; if it's just about the length and inappropriateness of the combination suggestions, suggestions can just be ignored. I also suspect that these aren't coming from the fact that the editions have zero-copies but from the fact that some things that were combined were the wrong book or had bad data (e.g. a title for one book but ISBN for another) and that there's some caching of these connections in the database. To me, it seems like the solution is to separate out editions that are for the wrong book, but that's regardless of how many copies there are. Perhaps the recalculation of editions could also help with refreshing the combination suggestions more often.
3 is annoying, because it's annoying when you have to scroll through a very long combination page for an author, but it doesn't seem like a major problem to me. It's also annoying to scroll through an author combination page with very popular books where there are hundreds of non-zero-copy editions, but it's doable, and it's not a bug per se. (There does seem to be some slow caching going on, though, so that a book you separate out might still show up as an edition for the wrong book on the author combination page for a while; that could definitely use fixing.)

106rsterling
May 31, 2010, 11:33 am

By the way, I also agree that it would be nice not to have those cases where there are a dozen or more of the exact same zero-copy work, like the example in 103. I could be wrong, but I don't think those zero-copy works are coming from people changing their work, but from something else in the system.

But, I think it would be better to find a database, back-end solution to that, rather than using combination/separation from the member end, which is more haphazard and may create other complications in the database that we can't see.

107infiniteletters
May 31, 2010, 11:45 am

I agree with 106.

108Noisy
May 31, 2010, 12:51 pm

>105 rsterling:

I think you've missed the most likely reason: the data was bad in the first place, otherwise why would it have been changed! If you buy a washing machine and it's defective so you get a new one, do you keep the old one around just so you can see what it looked like? If a sock develops a hole so you throw it away, or if you lose a sock - do you keep the odd one lying around just in case? Neither of these are accurate analogies, of course, but I fail to understand why anyone would want to keep BAD DATA lying around in a database, and particularly if it's kept associated with the 'good' data: to do so would be mad.

109Collectorator
May 31, 2010, 1:07 pm

This member has been suspended from the site.

110jjwilson61
Edited: May 31, 2010, 1:09 pm

108> Because bad data is an integral part of LibraryThing. The data people enter in their catalogs can't be changed, so there is going to be bad data lying around in people's catalogs. The idea, I believe, is that keeping the bad data associated with the work it was supposed to be for, in the form of zero-copy editions, makes it easier for other bad data in people's catalogs to get associated with the works they were meant to be.

The trick, then, is in making sure that the bad data doesn't overwhelm the good in the cases of determining the title and author of the work. Tim said something about minimizing the effect of zero-copy editions when calculating the title and author for the work, but I don't really see any reason that they should have any effect at all.

111jjwilson61
May 31, 2010, 1:12 pm

105> Having bad suggested combinations is dangerous because someone will click them eventually especially when the titles look similar as is the case with the travel books.

Perhaps Tim should allow the option to say No Thanks to those suggestions. Or perhaps to allow combiners to delete selected zero-copy works that are causing problems, like where the wrong ISBN is getting associated with a Title.

112rsterling
Edited: May 31, 2010, 1:19 pm

108 - No, actually that's more or less the point I am making, but the question is about what constitutes bad data, how we can tell, and what we - members - should be doing about it if anything rather than what LT back-end processes should be doing.

If the data is bad -- wrong title or something -- that is the reason to separate it, regardless of how many copies there are. If it's not bad data, there's no reason to separate it. If it's just a misspelling or something similar, there's no reason to separate IMHO, because later instances where someone imports the same bad, mistaken data can be associated correctly with the rest of the work regardless of the misspelling.

the data was bad in the first place, otherwise why would it have been changed!
We can't know that. We can't know why someone else decided to change their data. People are assuming this, but we don't know. It's also probably not the only reason that zero copies get produced, as in post 103.

It may be that sometimes the reason that zero copies exist is because people changed bad data, but that's not the only reason zero copies might exist. The data might have been changed not because it was bad, but because I want no space before my colon, or I want the full title capitalized. That doesn't mean it's bad, nor does it imply anything about whether someone else, who imports the same data that I changed, will be happy or unhappy with that data. It seems people are making a lot of assumptions about what "zero copies" means, without knowing for sure.

All bad data (wrong title, etc.) should probably be separated. But not all zero copies are bad data.
(edited to add post reference)
ETA perhaps it would be useful to clarify what bad data means, and separate really bad/mistaken data from typos and smaller errors. I consider two different books combined into the same book to be a case of really bad/mistaken data, which should definitely be separated. Typos, alternate spelling or capitalization, even if someone changes those and it generates a zero-copy work - not particularly bad data. Just data.

113rsterling
May 31, 2010, 1:20 pm

111 - A "nevering" option for works would be nice. But with great power would come great responsibility.... ;)

114timspalding
May 31, 2010, 1:55 pm

I have reply from 102 on, but multiple editions with exactly the same data should not be in the system. I have made it so they can't happen in the future, and I am now fixing the past ones.

115KingRat
May 31, 2010, 4:55 pm

All I know is that someone separating zero copy editions caused a recommendation of mine to become attached to a Frankenstein work rather than the original work.

116Noisy
May 31, 2010, 7:03 pm

>115 KingRat:

So that's another addition to rsterling's list: zero-copy editions can acquire and carry CK (and other?) data, which is patently meaningless and wouldn't happen if the zero-copy editions didn't exist in the first place.

117rsterling
May 31, 2010, 7:34 pm

116 - But only if they're separated. That problem wouldn't arise if they weren't separated.

118jjwilson61
May 31, 2010, 8:39 pm

116> Only works carry CK, but zero-copy editions may be connected to zero-copy works, though, as rsterling says, only if they're separated from the main work.

119Collectorator
May 31, 2010, 9:10 pm

This member has been suspended from the site.

120rsterling
May 31, 2010, 9:49 pm

115.

121timspalding
Jun 1, 2010, 5:07 am

Update: I'm still refreshing/regenerating data. You'll notice that some works, like Honolulu, above (http://www.librarything.com/work/6751072/editions), have all their redundant editions removed. But not all have. And edition copies will be wonky today.

More coming. Sorry the system is so large that major changes take time.

122infiniteletters
Jun 1, 2010, 11:45 am

Thanks Tim! I know I was one of the people complaining about the redundant 0-copy editions, and I think that's the main complaint others had too.

123timspalding
Jun 1, 2010, 11:53 am

Whoops... http://www.librarything.com/topic/92085

124lorax
Jun 1, 2010, 1:10 pm

37>

Tim, nobody's suggesting that we 'marginalize' anyone's works by "deciding" that they have zero copies; the zero copy thing is an artifact introduced by the person who has the work editing it. So they saw, and corrected, the bad data, yet the bad data lingers on, now unconnected to their book or their data. How is this helpful or necessary? Who does it benefit?

125jjwilson61
Jun 1, 2010, 1:53 pm

As Tim has already expressed, it helps the next person to import that book from that source, whose book will now be automatically combined with the correct work. And it isn't necessarily bad data but just differently formatted data.

126timspalding
Jun 1, 2010, 2:01 pm

Right. In the case above someone added from Amazon

Freshwater Life in Ireland: Keys to the more easily identified Irish freshwater plants and animals and checklists for most groups / Cedric S. Woods (ISBN 0716522810)

And then, presumably disliking the subtitle, changed it to

Freshwater Life in Ireland / Cedric S. Woods (ISBN 0716522810)

By keeping the original title we allow the next person who adds it from Amazon, using the data it provides, to be connected to the right work.

127lorax
Jun 1, 2010, 2:09 pm

126>

Fine, in that case, the data wasn't actually wrong, though it is in many such cases. How about just not showing zero-copy works, then?

128timspalding
Jun 1, 2010, 2:13 pm

Zero copy works or editions?

129lorax
Jun 1, 2010, 2:17 pm

128>

Zero-copy works.

130saltmanz
Jun 1, 2010, 2:24 pm

126> Yeah.

I'm usually the first guy to add a new Star Wars book to LT as soon as the Amazon listing goes up. And I almost always rename them. Something like

Star Wars: Fate of the Jedi: Allies

gets retitled

Allies (Star Wars: Fate of the Jedi, Book 5).

That doesn't mean the original data was bad, just that I prefer not to look at it that way. A real problem is that, because the Amazon-generated titles for big SW series like the above are so long, LT always automatically tries combining the new books with earlier books in the series. So keeping the 0-copy edition of the new book is vital to correct combining when someone else adds the book to their library.

131timspalding
Jun 1, 2010, 2:26 pm

Well, I'm unsure what to do about the "X 0 COPIES" works, gathered artificially together and purposefully a mix of real works.

But if the system is used correctly--if editions are combined with where they ought to go--there should NEVER be zero-copy works. The only zero-copy works should be aliased to other works. (Members don't really see that side of it, but it exists.) Or, conceivably, very very occasionally an edition should not belong to the work it's with, or to any other currently-onsite-work, and should be by itself, patiently waiting for someone to show up and love it. But that should be very very rare. Under normal use, there shouldn't be any zero-copy works.

132timspalding
Edited: Jun 1, 2010, 2:34 pm

A real problem is that, because the Amazon-generated titles for big SW series like the above are so long, LT always automatically tries combining the new books with earlier books in the series.

For the record, it's not length, it's dividers. LT fails in a cascade--full title comes first. If it has to guess, it starts with the ISBN, and then falls back further to a "guess title," which removes everything after a : or inside parentheses and normalizes the capitalization. After that it just gives up and assigns it to The World's Greatest Book of Useless Information.
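The fallback cascade above can be sketched roughly as follows. This is a hypothetical illustration, not LibraryThing's implementation: the lookup tables (`by_title`, `by_isbn`, `by_guess`) and function names are invented, matching here uses the title alone even though the real system presumably also weighs the author, and the final "give up" step simply returns `None`. Per message 135, this guessing only happens when a book is first entered.

```python
import re
from typing import Optional


def guess_title(title: str) -> str:
    """Hypothetical 'guess title': drop parenthesized text, keep only the
    part before the first colon, and normalize case and whitespace."""
    no_parens = re.sub(r"\([^)]*\)", "", title)
    before_colon = no_parens.split(":", 1)[0]
    return " ".join(before_colon.lower().split())


def match_work(
    title: str,
    isbn: Optional[str],
    by_title: dict[str, int],
    by_isbn: dict[str, int],
    by_guess: dict[str, int],
) -> Optional[int]:
    """Return a work ID by falling through: exact title, then ISBN,
    then guess title. Returns None when every step fails."""
    # 1. An exact full-title match wins outright.
    if title in by_title:
        return by_title[title]
    # 2. Otherwise try the ISBN, if the record has one.
    if isbn is not None and isbn in by_isbn:
        return by_isbn[isbn]
    # 3. Last resort: the normalized "guess title".
    return by_guess.get(guess_title(title))
```

With the guess-title step, "Star Wars: Fate of the Jedi: Allies" reduces to "star wars", which illustrates why a newly added Amazon-titled book in that series can land on an earlier one unless an exact-title edition (such as a kept 0-copy edition) matches first.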

133lorax
Jun 1, 2010, 2:36 pm

131>

So, if there's something with bad Amazon data -- let's say there's a typo in the title, or an incorrect author -- and the user then fixes that, so that it gets combined with the correct work -- what, "if the system is used correctly", should happen to that now-orphaned zero-copy work that has the typo in the title?

134infiniteletters
Jun 1, 2010, 2:37 pm

131: At one point last year(?), 0-copy works started showing up when titles were changed (as well as the redundant 0-copy editions mentioned earlier).

There have been discussions in combiners about what to do about them, and it's gone back and forth.

135timspalding
Jun 1, 2010, 2:38 pm

>133 lorax:

If they combine the work, the whole work should go there. But you're right, if they make an edit that changes the title/author/ISBN such that it now matches another work's edition exactly*, the old edition will be marooned.

*The "guessing" stuff in message 132 only happens on entry. After that it's about exact correspondences.

136Collectorator
Jun 1, 2010, 6:36 pm

This member has been suspended from the site.

137rsterling
Jun 1, 2010, 6:47 pm

Not so that they can be added, but so they can be linked to the correct works.

Although, maybe some changes to Add Books are afoot that would make previous records make more of a difference. (?)

138Noisy
Jun 1, 2010, 7:53 pm

I've been racking my brains trying to think why Tim (or anyone) would want to keep bad data in the system, when the current reason for doing so seems so weak. OK, auto-matching of works on the basis of common edition criteria sounds like a good goal, but when you have the mighty Combiners group who can actually research and apply intelligence to sparse-information matches and then effect the combinations with a great deal of accuracy (in spite of the need to apply arcane mechanisms like creating invalid intermediate works that will match with both) then what are you gaining?

Well, what you are gaining is a 'no-effort' fuzzy-matching system. Now, if this had been expressed as the goal, then I'd have understood. I'd have disagreed with the method, of course, but I'd have understood. I have no knowledge of fuzzy-matching algorithms, but I expect they are complicated, so I can see why Tim wouldn't want to invest the effort in implementing such an algorithm (at this point, when there are so many other things to work on).

If the zero-copy editions had remained hidden, then I wouldn't have been any the wiser, and this debate probably wouldn't be happening. On the other hand, I have now uncovered Tim's nefarious plot.

So, I think the intermediate question, if my postulate is correct, is - what are acceptable as positive fuzzy-match rules to keep, and which should be discarded. Misspelled titles and incomplete author names sound as if they are acceptable. Sometimes totally different author names are acceptable (if there are multiple author names for a work) and sometimes they are not (if a title has been used by different authors in different countries). Different ISBNs are usually acceptable, but sometimes the same ISBN has been re-used by a publisher for different works; sometimes a Thingamabrarian has added a book that was in the same series, and just modified the title but left the old ISBN. What to do in these cases?

I think that these are just the starting points for a much wider discussion, but if Tim or anyone else is going to persuade me that keeping bad data in the system without applying at least some discrimination is an acceptable thing, then they've got an uphill struggle.

139r.orrison
Jun 2, 2010, 2:42 am

In message 131 Tim said: But if the system is used correctly--if editions are combined with where they ought to go--there should NEVER be zero-copy works. The only zero-copy works should be aliased to other works.

Simple question: when the title and ISBN of an edition don't match, and there are zero copies of the edition because everyone who entered it has fixed the error, which work should that edition be a part of?

In my experience it will be suggested as a combination for both, and when combined with either will cause the two to be suggested for each other. If separated, at least it does not cause a suggested combination between the two different works.

And people who don't see the problem with inappropriate combinations have never spent hours separating out a mess that was created in a few seconds.

140r.orrison
Jun 2, 2010, 9:16 am

Another case where "using the system in weird, unintended ways" fixed a problem. How would you recommend we fix things like that, Tim?

141jjwilson61
Edited: Jun 2, 2010, 9:19 am

139> Of course, there could be a book with the wrong ISBN in it in someone's library, so that isn't really a zero-copy work problem. But it would be nice if, once everyone had actually fixed their copies, a ghost edition didn't stick around gumming up the works.

142timspalding
Jun 2, 2010, 9:49 am

>140 r.orrison:

You mean #151. Yes, you CAN cause LibraryThing to rejigger some data by combining and uncombining books. You could also get the stats to update by combining Shakespeare with Stephen King. No, it's not a good thing for the site for you to do it.

And people who don't see the problem with inappropriate combinations have never spent hours separating out a mess that was created in a few seconds.

Right. And here's the problem. You guys aren't distinguishing between how the data should be organized and how the system should behave. If such things are a problem, let's figure out a way to prevent them from being a problem that doesn't involve intentionally messing up the data.

For example, one problem you've raised is copy-refreshing. You want to screw up the data in order to force it. I'd rather add a copy-refresh button.

You've raised the problem of "truly" incorrect editions. I think these are fairly rare, but in any case what's needed is some way for editions to be suggested for deletion if they are actually incorrect rather than merely 0-copy.

You've raised the problem of combinations that shouldn't be suggested. Why don't we make it possible to "never" a combination like that?

143r.orrison
Jun 2, 2010, 10:02 am

but in any case what's needed is some way for editions to be suggested for deletion if they are actually incorrect rather than merely 0-copy.

Sounds fine to me. A mechanism like the spam marking?

(On that subject, could spam works get deleted, or is there some value in keeping them and merely hiding them? I suppose if someone else entered the same spam work, it would automatically get combined and hidden, so perhaps it is worth keeping them in the database.)

Why don't we make it possible to "never" a combination like that?

That sounds great too.

The problem is that we have never had mechanisms like these, and although the problems have been reported time and again, and discussed for years in the Combiners, Bug Collectors, and Recommended Site Improvements groups, there has never previously been any sustained interest on your part. When problems with data have occurred, we've only had data manipulation as a way to solve them. (Do you want me to find some old threads?)

I'm sorry if I'm coming across as snarky, but after years of using this sort of data manipulation to solve problems in the database, and hours of cleaning up hundreds of inappropriate combinations, there's a bit of repressed frustration coming out. In all seriousness, thank you sincerely for joining in this discussion.

144jjwilson61
Jun 2, 2010, 10:09 am

Tim, the reason people look for work-arounds is that these problems have existed for years and you've shown no interest before now in finding technical solutions for them. The Combiners are a dedicated group of people interested in solving problems and they aren't likely to let technical limitations stop them.

Have you looked at the methods people use to float a book to the right author or to combine a book with no author? I have to say that I'm not terribly familiar with them but I believe their use creates lots of zero-copy editions. Do you approve of these methods? People haven't made their use a secret and your silence on the matter seems to give them your tacit approval.

So, to sum up, it must seem frustrating to long-time combiners to find you all of a sudden criticizing their methods.

145MarthaJeanne
Jun 2, 2010, 10:18 am

Say I have manually entered a book. No one else owns it. I made a mistake in the title or author. If I correct it, the work stays with the wrong title and/or on the wrong author page.

Would you rather I separated my copy from the bad work to get a good one, or would you rather I deleted my copy and tried again instead of trying to correct it?

146timspalding
Jun 2, 2010, 10:52 am

>144 jjwilson61:

Look, I understand. The criticism I've leveled has only come about when some members stubbornly insist that I'm wrong about how the data is supposed to be, and insist that their methods are not only necessary work-arounds but better models of how the system should work.

147jjwilson61
Jun 2, 2010, 12:49 pm

146> They're only going by their experience, which is that when zero-copy editions are attached to a work, sometimes bad things happen, and when they are separated, the bad things go away. That there may be good things about zero-copy editions is not in their realm of experience. If you implemented some alternate ways to clean up these bad situations (a recalculate button, for example), I'm sure they'd be happy to use the approved methods.

148infiniteletters
Jun 2, 2010, 2:49 pm

142:

"For example, one problem you've raised is copy-refreshing. You want to screw up the data in order to force it. I'd rather add a copy-refresh button.

You've raised the problem of "truly" incorrect editions. I think these are fairly rare, but in any case what's needed is some way for editions to be suggested for deletion if they are actually incorrect rather than merely 0-copy.

You've raised the problem of combinations that shouldn't be suggested. Why don't we make it possible to "never" a combination like that?"

Excellent.

149jjwilson61
Jun 2, 2010, 5:02 pm

You've raised the problem of "truly" incorrect editions. I think these are fairly rare, but in any case what's needed is some way for editions to be suggested for deletion if they are actually incorrect rather than merely 0-copy.

You mean like deleting an incorrect 1-copy edition? Isn't that deleting actual user data and therefore forbidden?

150timspalding
Jun 2, 2010, 5:11 pm

No, I think s/he means 0-copy. Obviously 1-copies will never be deleted.

I'm torn between allowing members to delete 0-copies that are wrong, and some sort of "ratty data" marker that members could apply, which might have various effects.

151r.orrison
Jun 2, 2010, 5:11 pm

I think he means incorrect zero-copy editions, as opposed to zero-copy editions that are not incorrect, but merely zero-copy.

152infiniteletters
Jun 2, 2010, 5:19 pm

150: I think a flag for bad data would be safest. I would also like the same thing for covers.

153r.orrison
Jun 2, 2010, 5:27 pm

I think deleting the bad data would be safest, otherwise it will keep popping up in unexpected places (such as the very popular "SPAM! SPAM! SPAM!" series that appeared in the series cloud recently, which consisted solely of spam works that had been hidden, but not deleted). If it's only hidden, the flag that hides it needs to be added to all existing code that can display editions or make calculations based on them, and it needs to be remembered in all future code that deals with possibly hidden editions. If it's gone, it's gone.

(If the only edition in a work is deleted, would the work be deleted? If the only edition is flagged as bad and to be hidden, then the work should also be hidden.)
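A minimal sketch of the "trickle up" rule in the parenthetical above, assuming a hypothetical in-memory model where each edition carries a boolean bad-data flag. A work counts as hidden only when every one of its editions is flagged. The class and field names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    edition_id: int
    ratty: bool = False  # hypothetical bad-data flag set by members


@dataclass
class Work:
    work_id: int
    editions: list[Edition] = field(default_factory=list)


def work_is_hidden(work: Work) -> bool:
    """Hide a work only when all of its editions are flagged.

    all() on an empty list is True, so a work with no editions left
    is treated as hidden too.
    """
    return all(e.ratty for e in work.editions)
```

A single unflagged edition is enough to keep the work visible, which matches the idea that only works consisting solely of bad data should disappear.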

154FicusFan
Jun 2, 2010, 6:02 pm

If there is bad data: would it be so horrible to just allow edits to fix it, rather than all this adding, deleting, combining, separating and creating zero-copy works?

I realize the data in a specific account would still be bad because you can't change their data, but the public work in the LT world would be correct for everyone else. Maybe a marker so the bad-data edition can be seen, but just left alone because it belongs to a user? The marker would make sure that edition doesn't 'dominate'?

155timspalding
Jun 2, 2010, 6:04 pm

>154 FicusFan:

Yes, that's what I'm thinking of as the ratty-data marker.

156FicusFan
Jun 2, 2010, 6:20 pm

> 155

And the edits part? I mean, the rigmarole you have to go through to correct things is amazing.

157rsterling
Jun 2, 2010, 7:24 pm

153: such as the very popular "SPAM! SPAM! SPAM!" series that appeared in the series cloud recently, which consisted solely of spam works that had been hidden, but not deleted

Actually, that series is still there not because the spam works haven't been deleted, but because a bunch of works that had that series in the CK got combined with other spam works, and for some reason, some of the CK gets lost in a kind of limbo world, where it's still attached to editions but not accessible or editable on the work it's combined into. I've been trying to go through and delete all instances of this kind of CK where they appear, even for flagged and hidden spam works, but there are a bunch of combined editions that still have spam CK that I can't get to in order to delete it.

158Collectorator
Jun 2, 2010, 7:54 pm

This member has been suspended from the site.

159infiniteletters
Jun 2, 2010, 7:55 pm

157: And that's another bug that needs to be corrected... inaccessible CK after combining.

160r.orrison
Edited: Jun 3, 2010, 3:12 am

157:
At the time I spotted it in the series cloud, there were no unhidden works in the series. It appeared in the cloud after the cloud was regenerated, but if you clicked it, there was nothing there. CK History showed the works that were in the series, and all of them were hidden spam works. It is now gone from the cloud*, because the appropriate check for hidden spam works was added in a place where it had been missed. If those works and their constituent editions had been deleted, then they could not have appeared in the cloud**. If a "ratty-data" flag is added, it will also have to be checked in all the places where the spam flag is checked***, and missing a check will mean the bad data appears where it's not wanted.

Whether or not it was possible to remove the series from the combined spam works (#159) is, as infiniteletters says, another bug.

* But still appears in the CK history, if you search for "spam" in the series field; this is another place the flag should be checked.
** Yes, I realize that the CK entries could theoretically be left around even if the work was deleted. I'm assuming that wouldn't be the case.
*** Yes, I realize the ratty-data flag will probably be attached to editions rather than works, and won't hide the whole work. But it will need to trickle up to works that are solely ratty-data, so perhaps it will actually have to be checked in more places.

161r.orrison
Jun 3, 2010, 2:46 am

But -- however they choose to implement it -- I welcome every bit of work done to help make it easier to clean up the data. Seconding #158, heartily.

162justjim
Jun 3, 2010, 2:55 am

#158 I realise that this isn't Pedants' Corner, but I hope you mean 'prescribed'!

163Collectorator
Jun 3, 2010, 3:23 am

This member has been suspended from the site.

164jjwilson61
Jun 3, 2010, 9:13 am

If there were a ratty-data flag on editions, then any work that was created solely because of a flagged edition should probably just be removed, as there would be no need for it.

165rsterling
Jun 3, 2010, 3:27 pm

160 - if you look at the spam works that have been hidden, though, I'm pretty sure none of them actually have SPAM!SPAM!SPAM in the series on the work page. I went through and deleted every instance I could find/get to.

The series may be hidden from the cloud now that the books assigned to the series were combined into other books and the latter books have now been hidden.

When you click on "spam" or "spam!spam!spam" in the CK history, the link points to a different work number from the one you ultimately land on. So the CK is attached to a different "work" than the one it was ultimately combined into, which is the one that got flagged and hidden.
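The redirect described above (a CK history link to one work number landing on another) can be pictured as following a chain of "combined into" pointers. This is a sketch under the assumption of a hypothetical mapping from old work IDs to the work each was combined into; it is not how LibraryThing actually stores combinations. It also guards against a cycle in the mapping.

```python
def resolve_work(work_id: int, combined_into: dict[int, int]) -> int:
    """Follow hypothetical 'combined into' pointers until reaching a work
    that was not itself combined into another one.

    Raises ValueError if the pointers loop back on themselves.
    """
    seen: set[int] = set()
    while work_id in combined_into:
        if work_id in seen:
            raise ValueError(f"combination cycle at work {work_id}")
        seen.add(work_id)
        work_id = combined_into[work_id]
    return work_id
```

With `{10: 20, 20: 30}`, work 10 resolves to 30. Any CK still keyed to work 10 would only show up on work 30 if the page doing the lookup resolved the chain like this first.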

166r.orrison
Jun 3, 2010, 5:50 pm

My point is that if the works that are now hidden by the spam flag had instead been deleted, the series would not have appeared in the series cloud, without having to add another check for the spam flag when calculating what goes into the cloud.

It doesn't matter whether it's the work directly linked to from the history list, or some other work that it has been combined into. It doesn't matter that it has been removed from as many works as possible.

The series appeared in the series cloud a couple of days ago when it was recalculated; Tim then added a check for the spam flag where there had not been one previously, and now it is gone. The works were combined and edited and voted away as spam long before it appeared.

If the works that he is checking in order to hide it had instead been deleted, it wouldn't have appeared in the first place.

167Collectorator
Jun 3, 2010, 7:41 pm

This member has been suspended from the site.

168kathrynnd
Jun 3, 2010, 7:54 pm

I don't think spam is the only factor in this problem. I came across a work of zero copies by Jules Verne yesterday. I separated out the ones correctly belonging to the author Malvina G. Vogel, but some belonging to other authors' works (or other Jules Verne works) remain. Should this be broken?

http://www.librarything.com/work/9471614/editions

169Collectorator
Jun 3, 2010, 8:06 pm

This member has been suspended from the site.

170rsterling
Jun 3, 2010, 10:13 pm

I don't think the spam issue is connected to zero copies per se. I think the spam issue is about people modifying CK to add markers that something is spam (series, canonical titles, canonical authors, etc.). It's related to zero copies in the sense that zero-copy works are another place where people are modifying CK.

So I think there are 2 distinct but sometimes related issues Tim was talking about:
- separating out zero copies
- changing CK to indicate something about data quality (whether a book is spam or whether a book has no copies)

171r.orrison
Jun 4, 2010, 1:21 am

I'm not claiming any relationship between spam and zero copies works, and I haven't seen anyone else doing it. It's a question of hiding vs. deleting data.

Confirmed spam is currently marked as such in the database and hidden. It sometimes shows up in places where the flag is not being checked. If it had been deleted it couldn't show up anywhere.

The proposal is that incorrect zero-copy editions be marked and hidden. Its flag will need to be checked everywhere that it could show up. If it's deleted it can't show up anywhere.

The problem that I pointed out with the spam series is an illustration of what can occur when you just try to hide data. You have to check whether it needs to be hidden in every place that it could possibly show up, and checks can be missed. If you delete the data, it won't be there.
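A small illustration of the hide-versus-delete point, using a hypothetical in-memory list of works with a spam flag. Here the series cloud applies the hide check through one shared filter. The `apply_hide_check=False` path shows what leaks through when a view forgets the check. Simply removing the spam works from the list gives the same cloud with no check at all. Names and data are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    work_id: int
    series: Optional[str]
    spam: bool = False  # hypothetical "hide but keep" flag


def visible(works: list[Work]) -> list[Work]:
    """The single place the hide flag is checked."""
    return [w for w in works if not w.spam]


def series_cloud(works: list[Work], apply_hide_check: bool = True) -> Counter:
    """Count works per series name.

    With apply_hide_check=False this mimics a view that forgot the check.
    """
    source = visible(works) if apply_hide_check else works
    return Counter(w.series for w in source if w.series)
```

With one hidden spam work in the list, the checked cloud shows only the real series, the unchecked one also shows "SPAM! SPAM! SPAM!", and a list with the spam work deleted needs no check at all.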

172Collectorator
Jun 4, 2010, 6:08 am

This member has been suspended from the site.

173rsterling
Jun 4, 2010, 10:12 am

172 - Weird. Why would that have been suppressed as spam? That doesn't make sense to me. It would make more sense to separate those editions back out and combine them with their proper works. And to remove the CK altogether.

Anyway, you can see any of the sub-level pages of a suppressed work by putting this at the end of the URL: &spam=1

174rsterling
Jun 4, 2010, 11:50 am

172-3: I made a note of this (apparent bug?) on the spam thread. Tim's looking into it, and asks for people not to do any separating or anything on it in the meantime:
http://www.librarything.com/topic/89778#2005672

175r.orrison
Edited: Jun 5, 2010, 6:23 am

Tim, could you take a look at http://www.librarything.com/work/6337002/editions ?

It's showing as 2 members, one edition has 4 copies, and there are 11 editions listed with "no current copies" -- all of which appear to be identical.

In message 114 you said that these were getting fixed. Has the system not gotten to this one yet, was it missed, or is it a slightly different case that wasn't being detected?

176EveleenM
Edited: Jun 14, 2010, 5:00 pm

177timspalding
Jun 6, 2010, 9:21 pm

I'm still noticing quite a few works with these duplicate zero-copy editions. Has the process by any chance stopped?

Yes. It was removing some editions that were real ones. (That was the cause of some books seeming to get disconnected from their works the other day.) I need to look at the code again and start it up. It's very irritating insofar as the code is really quite ironclad and simple. Ugh.

178r.orrison
Jun 9, 2010, 6:41 am

Another test case (as if you need more!): http://www.librarything.com/work/7719317/editions

179Collectorator
Jun 9, 2010, 7:23 am

This member has been suspended from the site.

180r.orrison
Jun 9, 2010, 7:38 am

And in the absence of further progress in the various changes* Tim has mentioned, I'll start separating off the junk again**.

* Such as:
nevering suggested work combinations
fixing multiple-identical-editions
manual recalculation of author names

** just kidding, I'll wait a while longer

181timspalding
Jun 9, 2010, 1:02 pm

Patience please. The last time I ran this it practically brought down the system. I need to retool it.

182r.orrison
Jun 15, 2010, 11:40 am

No problem. I've actually started combining zero-copy works into the works where they belong, as long as the title and ISBN agree with that work. Two weeks?

183timspalding
Edited: Jun 15, 2010, 4:44 pm

tail -f /var/www/html/admin/logs/onetime_getridofduplicateeditionsinworks.log
Work: 6535502 / Done: 1222818
Work: 6535633 / Done: 1222996
Work: 6535748 / Done: 1223141
Work: 6535880 / Done: 1223290
Work: 6536011 / Done: 1223450
Work: 6536130 / Done: 1223620
Work: 6536256 / Done: 1223790
Work: 6536381 / Done: 1223876
Work: 6536525 / Done: 1223982
Work: 6536655 / Done: 1224090

Actually, it no longer deletes the edition, but it moves everyone off of ones that have duplicates. Stage two deletes it.
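A sketch of the two-stage approach described above, under loudly stated assumptions: editions are compared by a hypothetical identity key (what actually counts as "the same edition" is not given in the thread), and copies are simply counted. Stage one moves every copy onto one keeper per group of identical editions and returns the emptied duplicates. Stage two deletes those, re-checking that each is still empty. Distinct zero-copy editions are left alone. This is not the actual LibraryThing script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edition:
    edition_id: int
    key: tuple  # hypothetical identity, e.g. (title, author, isbn)
    copies: int


def stage_one(editions: list[Edition]) -> list[int]:
    """Move copies off duplicates onto one keeper per identical group.

    Returns the IDs of the now-empty duplicates.
    """
    groups: dict[tuple, list[Edition]] = defaultdict(list)
    for ed in editions:
        groups[ed.key].append(ed)
    emptied = []
    for group in groups.values():
        # Keep the edition with the most copies; ties go to the lowest ID.
        keeper = max(group, key=lambda e: (e.copies, -e.edition_id))
        for ed in group:
            if ed is not keeper:
                keeper.copies += ed.copies
                ed.copies = 0
                emptied.append(ed.edition_id)
    return emptied


def stage_two(editions: list[Edition], emptied: list[int]) -> list[Edition]:
    """Delete emptied duplicates, but only if they still have no copies."""
    doomed = set(emptied)
    return [ed for ed in editions if not (ed.edition_id in doomed and ed.copies == 0)]
```

Splitting the work this way means nothing is deleted until every copy has been moved, so a bad run of stage one can be caught before stage two removes anything.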

184Collectorator
Jun 15, 2010, 6:32 pm

This member has been suspended from the site.

185r.orrison
Edited: Jun 15, 2010, 6:44 pm

Take a look at the links in messages 175-178. Note that there are many exact duplicate editions listed. Check again in a few days. Be happy.

(Although, from the description and sequence, I wouldn't have expected http://www.librarything.com/work/1673324/editions to have two lines that say "Lonely Planet Best of Bangkok / Rebecca Turner (ISBN 1741044421) (1 copy separate)")

186timspalding
Jun 15, 2010, 7:10 pm

>184 Collectorator:

Sorry. That's the "tail" of an ongoing process. It's slowly chewing through the works. It's at work number 6536655.
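A sketch of a batch job that "slowly chews through the works" without swamping the database, in the spirit of Tim's note that an earlier run nearly brought down the system. It walks work IDs in ascending order, pauses between batches, logs in the same shape as the log lines above, and can resume after a given work ID. The `process_work` callback is hypothetical, and reading "Done" as a running count of editions handled is a guess, not something the thread states.

```python
import time
from typing import Callable, Iterable, Optional


def run_batches(
    work_ids: Iterable[int],
    process_work: Callable[[int], int],
    batch_size: int = 100,
    pause_seconds: float = 0.5,
    log: Callable[[str], None] = print,
    start_after: Optional[int] = None,
) -> int:
    """Process works in ascending ID order, sleeping between batches.

    process_work returns how many editions it handled; the running total
    is logged after each batch and returned at the end.
    """
    done = 0
    in_batch = 0
    last_id: Optional[int] = None
    for work_id in sorted(work_ids):
        if start_after is not None and work_id <= start_after:
            continue  # resume: skip works finished in an earlier run
        done += process_work(work_id)
        last_id = work_id
        in_batch += 1
        if in_batch == batch_size:
            log(f"Work: {last_id} / Done: {done}")
            time.sleep(pause_seconds)  # give the database room to breathe
            in_batch = 0
    if in_batch and last_id is not None:
        log(f"Work: {last_id} / Done: {done}")  # final partial batch
    return done
```

The pause and batch size are the knobs to turn if the job starts hurting the site; the logged work number doubles as the resume point for `start_after`.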

187Collectorator
Jun 17, 2010, 7:56 am

This member has been suspended from the site.

188timspalding
Jun 17, 2010, 8:37 am

You mean in the alphabetization? Hmm. Not a high priority for me to change, especially as I'd need to dig deeper into it to do it right--other languages, for example.

189jjwilson61
Jun 17, 2010, 9:27 am

Couldn't you just use the same algorithm you use for book titles?

190EveleenM
Edited: Jun 18, 2010, 11:30 am

And the big slabs of duplicates in #175, #176, #178 are gone. Great!

Edited to add: the process seems to have left one zero-copy edition for each real-copy edition. I think that's manageable compared to the long lists of duplicates there were before.

191timspalding
Jun 18, 2010, 11:42 am

seems to have left one zero-copy edition for each real-copy edition

If they are the same, it shouldn't. I'm investigating. But it SHOULD be leaving zero-copies when it's a different edition.

192timspalding
Jun 18, 2010, 11:42 am

seems to have left one zero-copy edition for each real-copy edition

If they are the same, it shouldn't. I'm investigating. But it SHOULD be leaving zero-copies when it's a different edition.

193AnnaClaire
Jun 18, 2010, 12:00 pm

>191 timspalding:/192
Oh dear. Now we have zero-copy posts.

194timspalding
Jun 18, 2010, 12:22 pm

Snort.

195Noisy
Jul 3, 2010, 6:07 am

http://www.librarything.com/topic/94108

Yet another illustration of how zero-copy editions cause grief when something goes wrong. In this instance there was some over-ambitious combining, and when that touches Tolstoy and 'Remembrance of Things Past', well - the nightmare is compounded by the page after page of zero-copy editions that have to be separated out. Argh! GET RID OF THEM!!!
