1timspalding
Very short version
Something changed that most members won't notice, but members who do a lot of data improvements should like.
Short version
I recently revisited LibraryThing's system for calculating and recalculating the primary author of works. This resulted in significant improvements to the authors picked. A HUGE number of works were affected, but almost all were works with only a handful of copies.
If you run across works with a bad primary author, let me know, and I'll look at it.
Longer version
As you may know, LibraryThing's basic notion is that data "bubbles up" from the book level--the level of members' data--to the work level. Because book-level data disagrees, there's a complicated system for picking the best authors (and titles) for the work overall.
For various reasons, the system was not appropriately combing through past works and recalculating the authors as new books were added to them. Also, when a recalculation was triggered manually (by a member clicking the link to recalculate, by a combination, or by a split), the system wasn't perfect. Even worse, the two processes--combing-through and the individual process--were slightly different.(1)
Anyway, I spent two work days on this, and the end result is a new system, used everywhere and any time the author needs recalculating.
The raw changes are impressive:
Statistics
Total works changed 732,708
163,820 went from no primary author to a primary author
8,507 went from a primary author to no primary author
The rest changed primary author.
Of these, 35% changed author variant, but not true author. 360,000 changed their author completely(2).
Changes were strongly concentrated on low-copy works. 95% had fewer than 10 copies. 72% had one or two copies. A very common circumstance was a work with two copies whose two books have different authors, and it switched from one to the other. Academic books with multiple authors are one common cause.
The New System
The new system sorts in a cascade. That is, it only moves on to the next level if the authors are still tied.
1. Whether there's a manually-added primary author
2. Number of books using the author, all variants combined
3. Number of books using the specific variant
4. Length of author code; this is arbitrary but tends to produce better results.
5. Alphabetical by author code; arbitrary but provides the same answer every time.
Books without authors are counted as 1/3 of a book. That is, if two books in a work have no author, but a third book has an author, it will pick the third author. Most copies need to have no author for it to prefer no author.
The system gives deleted books a 1/100th vote. This occasionally breaks ties.
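The cascade above can be sketched roughly as follows. This is only an illustration, not LibraryThing's actual code: the `Book` class, the `true_author` field (standing in for "all variants combined"), and the function names are assumptions made for the example, as is the choice to give a deleted book with no author the 1/100 vote.

```python
from collections import defaultdict
from dataclasses import dataclass
from fractions import Fraction

# Weights described in the post: a normal book is one vote, a book with
# no author is 1/3 of a vote, and a deleted book is 1/100 of a vote.
NO_AUTHOR_WEIGHT = Fraction(1, 3)
DELETED_WEIGHT = Fraction(1, 100)


@dataclass(frozen=True)
class Book:
    author_code: str        # specific variant, e.g. "richardsonadeled"; "" = no author
    true_author: str = ""   # hypothetical key that groups variants of one author
    deleted: bool = False


def book_weight(book: Book) -> Fraction:
    # Assumption: a deleted book always counts 1/100, even with no author.
    if book.deleted:
        return DELETED_WEIGHT
    if book.author_code == "":
        return NO_AUTHOR_WEIGHT
    return Fraction(1)


def pick_primary_author(books, manual_author=None) -> str:
    """Return the winning author code; "" means no primary author."""
    combined = defaultdict(Fraction)  # votes per author, all variants combined
    variant = defaultdict(Fraction)   # votes per specific variant
    owner = {}                        # variant code -> combined-author key
    for book in books:
        weight = book_weight(book)
        key = book.true_author or book.author_code
        combined[key] += weight
        variant[book.author_code] += weight
        owner[book.author_code] = key
    if manual_author is not None:
        owner.setdefault(manual_author, manual_author)
    if not owner:
        return ""

    def rank(code):
        # Each element is only consulted if everything before it is tied.
        return (
            code != manual_author,     # 1. manually-added primary author first
            -combined[owner[code]],    # 2. most books, all variants combined
            -variant[code],            # 3. most books using this variant
            len(code),                 # 4. shorter author code
            code,                      # 5. alphabetical, for a stable answer
        )

    return min(owner, key=rank)
```

With two no-author books and one book by "smithjoe", the no-author candidate has 2/3 of a vote against 1, so "smithjoe" is picked; with four no-author books it has 4/3 and no author wins.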
The Upshot
* Most users won't notice a thing.
* Members who do a lot of librarian work will find the author-selections more rational.
* Members who do a lot of combining and split-assigning will see new low-copy works, from all the works that went from no author to a good author. (One member already noticed, and thought they were from uncombining. See https://www.librarything.com/topic/241944 .)
If you see any work-level author picks you disagree with, take a look at the editions page for the work. If you still see a problem, let me know on this thread, and I'll explain why the author was chosen, or improve the algorithm.
1. This wasn't just idiocy. For speed and memory reasons, the code for processing millions of works at a time can't be the same as the code that processes one, quickly. It's now the same, so recalculations take a little longer than I'd like (4-5 seconds for high-copy works).
2. minus some aliasing of authors that's irritating to calculate across so many works efficiently.
2r.orrison
Possibly related? On https://www.librarything.com/work/3175555/summary look at the Other Authors section first; the first author there is "Lederman, Ross" with a valid link. At the top of the page, the name Ross Lederman links to /author/
3timspalding
Interesting. Where'd you find that?
4MarthaJeanne
>2 r.orrison: The link may be valid, but the work doesn't appear on the page.
6rodneyvc
>2 r.orrison: I'm seeing the bad link for author Ross Lederman at the top of the summary page for the Tarzan the Fearless / Tarzan's Revenge [videorecording]
7omargosh
Thanks for your work on this, Tim. Those are some pretty big numbers. Should the changes mean that manual recalculations in general shouldn't be very necessary anymore (i.e. going forward)?
8lorannen
>7 omargosh: That's the idea! It should also mean that, when manual recalculations are initiated, they're more successful at finding the right author.
10timspalding
First, I fixed something that was causing the manual process to use the old algorithm. Grumble. It's working now.
Tarzan the Fearless / Tarzan's Revenge
So I need to understand where this link came from. It does indeed have no author on the work level. (This fact is half overcome by there being a manual author but, as noted, it's not changing the link. Let's ignore that secondary bug for now.) When I reran the reconciliation script, it was one of only 291 works that changed. So I'm guessing the work was JUST made, or just split, or whatever. I want to find out what happened.
The problem MAY be explained by my note at the top. So… wait for another one?
1) works that have more than one clear author choice will still need help making a decision
http://www.librarything.com/work/3844472/editions
Well, there is a clear author. 2/3 books have "Richardson" as the author. 1/3 has Richardson, Adele. The system chooses "Richardson." It's surely "wrong" but it's working as it was designed to work. We have to work with the data we have.
http://www.librarything.com/work/1260901/editions
Yeah, if there are three answers, it has to choose one. In this case, there are two answers with the same number of votes--the no-author variant is counted for less. It does the best it can.
2) works that have one best choice will still need help choosing it
http://www.librarything.com/work/11851919/editions
I see:
Interceptive Orthodontics/Richardson/ISBN 0904588459 (1 copy separate)
Interceptive Orthodontics/Richardson, Andrew/ISBN 0904588564 (1 copy separate)
That's two choices. Note: It's now choosing "Richardson" as the shorter of the two. I've found that usually works better--because longer ones are more often full of garbage, like a second author jammed in.
http://www.librarything.com/work/5137215/editions
Yeah, it has to make a choice.
Bears: Paws, Claws, and Jaws (Wild World of Animals)/Richardson, Adele D./ISBN 073680823X (1 copy separate)
Bears: Paws, Claws, and Jaws (Wild World of Animals)/Richardson/ISBN 073680823X (1 copy separate)
Bears: Paws, Claws, and Jaws (Wild World of Animals (Bridgestone))/Richardson/ISBN 073680823X (no current copies separate)
In this case, it decides by the deleted copies--they break ties.
Should the changes mean that manual recalculations in general shouldn't be very necessary anymore (i.e. going forward)?
Yes, it shouldn't be necessary, except in serious cases of lag.
Those are some pretty big numbers.
By the way, it's a little hard to calculate, but it looks to me like 50% were basically moving from one arbitrary choice to another. That is, if a work—usually a very low copy work—has two authors that are tied, the system has to make a choice. The old algorithm and the new algorithm solved that problem differently.
11r.orrison
Looking at the Helpers Log for Other Authors, a few changes were made to Tarzan the Fearless / Tarzan's Revenge yesterday between 3-4pm EST:
casaloma added author Lederman, Ross to Tarzan the Fearless / Tarzan's Revenge [videorecording] (Director, primary, all editions)
casaloma added author Lederman, Ross to Tarzan the Fearless / Tarzan's Revenge [videorecording] (primary, all editions)
casaloma added author Hill, Robert F. to Tarzan the Fearless / Tarzan's Revenge [videorecording] (Director, main, all editions)
casaloma added author Crabbe, Buster to Tarzan the Fearless / Tarzan's Revenge [videorecording] (secondary, all editions)
casaloma added author Morris, Glenn to Tarzan the Fearless / Tarzan's Revenge [videorecording] (secondary, all editions)
[...]
casaloma added author Lederman, D. Ross to Tarzan the Fearless / Tarzan's Revenge [videorecording] (Director, primary, all editions)
That work doesn't show in the Work Combination or Work Separation logs.
13MDGentleReader
Thank you, @timspalding
15timspalding
I forgot to thank Collectorator for prompting this with a sort of perfect bug--simple, clear, reproducible, etc. See https://www.librarything.com/topic/227454#5807250
16Noisy
Anything that advances the cause is truly welcome.
>10 timspalding: Not sure I'd have gone with the 'Always choose shortest' option. Perhaps an impossible request, but what about calculating the average of all (across LT; excluding the bits in brackets) name-strings and picking the one that's nearest to that average.
17timspalding
>16 Noisy:
Hmm. I'm not sure that's right. I think the best answer would be to compare the strings. I mean, for example. If you have
twainmark
twainmarksmithjoe
You want twainmark. Ditto if you have
smithjoe
smithjoeeditor
But other times you have
smithjoe
smith
I see no good way of deciding here. Names are of different lengths.
In theory, I could try to remove some words (ed., editor, by, etc.) and preference ones that didn't have that. But at some point we're devising complicated rules that advance things by 1%. This will, after all, ONLY happen when all the other conditions are tied. That's rare.
18lorax
>17 timspalding:
Can you check that both first name and last name are populated, and choose the one where both are prior to the length check, as a way of distinguishing between "smith" vs. "smithjoe" and "smithjoe" vs. "smithjoeeditor"?
19Noisy
>17 timspalding: Just a thought. Really have no idea what the average might be, so I assumed something like '12'. As you say, it's not one of the leading conditions, and the times the rule would be employed would probably be small. Also, the people who will actually notice this are the more 'involved' cleaners and will accept there has to be a compromise.
20timspalding
>18 lorax:
What we have are strings, with, sometimes, commas. There's no explicit first-name, last-name distinction. Bob, Ed. has both parts.
21jjwilson61
I think, though, that a closest to 12 (or something like that) test is still simple but is likely to eliminate both the multiple name messes and the last name only cases.
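For what it's worth, the "closest to 12" idea is easy to try against the example strings from earlier in this thread. The sketch below is only an illustration: the function name is made up, and 12 is just the guessed target mentioned above, not a measured average.

```python
def closest_to_target(codes, target=12):
    """Pick the author code whose length is nearest the target length.

    Ties on distance fall back to the shorter code, then alphabetical
    order, so the answer is the same every time.
    """
    return min(codes, key=lambda code: (abs(len(code) - target), len(code), code))


print(closest_to_target(["twainmark", "twainmarksmithjoe"]))  # twainmark
print(closest_to_target(["smithjoe", "smith"]))               # smithjoe
print(closest_to_target(["smithjoe", "smithjoeeditor"]))      # smithjoeeditor
```

With a target of 12, the first two pairs come out the way one would want, but the third picks "smithjoeeditor" (14 characters is closer to 12 than 8 is), so the result depends heavily on the target chosen.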
22r.orrison
I did recalculate author on this work https://www.librarything.com/work/9579395/editions and it chose no author (most popular edition, 2 copies) over the author that appeared on 1 copy.
Edit: Another work where no author is chosen over numerous non-blank options: https://www.librarything.com/work/2197009/editions
23MarthaJeanne
>22 r.orrison: There are three copies without an author. No author is shorter than GeoCenter.
25r.orrison
Would it be possible to recognize a list of non-authors and prioritize them lower than other author names? E.g. "anonymous" and "no author".
On a work like https://www.librarything.com/work/10876265/editions where there's a choice between Anonymous and a real name, it would be nice if the system would have a preference for the real name.
26lorax
>25 r.orrison:
Would it be possible to recognize a list of non-authors and prioritize them lower than other author names? E.g. "anonymous" and "no author".
There's a long-standing RSI for this, let me go find it.
Edited: Never mind, I was misremembering the RSI that pertained to "special" authors; it was about combination. For reference it's at https://www.librarything.com/topic/155018 .
27r.orrison
(You're thinking of https://www.librarything.com/topic/93378. So was I.)
30timspalding
Give me an example, C.
31SimoneA
One example of this being counterproductive can be seen on the Smith author page http://www.librarything.com/combine.php?author=smith. There are several works that end up on the 'wrong' author page, because the calculation uses the shortest author form. I understand that a calculation has to be chosen, so nothing can be done about this.
However, I also noticed that the zero copy editions don't seem to contribute to the calculation, for example here http://www.librarything.com/work/2568835/editions. Maybe that could be looked into?
33timspalding
"This is not the only one that already knew to which author division to go once I changed its author name."
What? Say that another way?
35KoobieKitten
I would guess it's doing that because back on Oct 12, 2012 user bw42 other-authored that work and "The Essentials of IT" to John Hamilton (11). How exactly it "remembers" which other-author it was previously after the change to John due to the recalc, and then the change back to John Hamilton after the manual add author change, I don't know.
Edit: In this case, when the author's name is changed using add author, this "remembering" part is good, yes C.?
36timspalding
Imagine if you have a book authored by two well-known split authors. Because it's a coauthorship situation, the book has bounced back and forth between primary authors. When the author was X, members other-authored it to the correct split. When the author becomes Y, members other-authored it as well. If it flips back, what should happen? I think it should remember WHICH X it was with.
It remembers because that makes the most sense. If the link between a work and its other author was broken, all sorts of information would be lost. Sometimes like this and sometimes when someone wrongly combined or changed an author.
Yeah, you can see problems if members engage in extensive renumbering. But I've always been against that…
37r.orrison
This is why most people like to keep the Disambiguation Notices about splits around, even when there are no books assigned to those splits...
39timspalding
>38 Collectorator:
Sure. We can go back to it staying with the first author ever entered, and not recalculating if the majority-author changes.
Would that be better? No.
In this case, it has a choice between two authors, each with exactly one edition. When the edition counts are tied, it has to decide between them. What metric do you suggest?
If it can't decide by counts, including deleted books to break the tie of undeleted books, it uses length. In general, I find the shorter author is more commonly correct. That is, wrongness like "Smith, John, editor" is more common than "Smith." But either way is going to have problems, and "Why can't it just remember" isn't a solution to those problems, but merely a decision to pick one error and stick with it forever.
43r.orrison
Could you take a look at https://www.librarything.com/work/18465958/editions - the author name appears at the top of the page as a URL segment ("topdemirhuumlseyinga"), although the name is correct on the one edition.
44PhaedraB
>43 r.orrison: It looks fine now.
46timspalding
>43 r.orrison:
Those things can happen, but they should be quickly replaced with the full name. Click to recalculate the author name to be sure.
48timspalding
>47 Collectorator:
When you combine works, it picks one work to "win"—the work with the most copies. The old work gets aliased into the winning work, all of its editions get pointed at the new one too.
Obviously we can't have the losing work's split win by default. That is, if work A is assigned to 1 and work B is assigned to 2, and B gets combined into A, we can't have the losing work's split-assignment triumph, right?
What you're proposing is, I think, that, if the losing work is listed as belonging to split 1, and the winning work is not assigned to any split, then it should "take the hint" and assign the winning work to split 1.
Right?
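To make the two options concrete, here is a rough sketch of the combination rule as described above, with the "take the hint" proposal as an optional flag. It is only an illustration under assumptions: the `Work` class, its fields, and the tie rule for equal copy counts are invented for the example and are not the real data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    work_id: int
    copies: int
    split: Optional[int] = None  # None = not assigned to any split


def combine(a: Work, b: Work, take_hint: bool = False) -> Work:
    # The work with the most copies "wins". Assumption: on equal copy
    # counts the first work passed in wins.
    winner, loser = (a, b) if a.copies >= b.copies else (b, a)

    # Current rule: the winner's split assignment stands, even if it is
    # unassigned. The loser's assignment never overrides the winner's.
    split = winner.split

    # Proposed rule: if the winner has no split but the loser does,
    # "take the hint" and use the loser's split.
    if take_hint and split is None:
        split = loser.split

    return Work(winner.work_id, winner.copies + loser.copies, split)


unassigned = Work(work_id=1, copies=10)
assigned = Work(work_id=2, copies=3, split=1)
print(combine(unassigned, assigned).split)                  # None
print(combine(unassigned, assigned, take_hint=True).split)  # 1
```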
49krazy4katz
>48 timspalding: Actually, if I understand this correctly, I think the opposite seems to happen. Often the work with the most copies is assigned to the correct author. On the multiple author page, if you combine it with a single from "Unknown", the entire group goes to the Unknown author. I have gotten around this by reassigning the works from unknown to the correct author before combining.
If that is not what you and Collectorator are talking about, I apologize and please ignore me. k4k
50timspalding
Okay, you're saying that work combination always wipes out split assignment?
51r.orrison
No, sometimes. I think in my experience it usually does the right thing, but I've seen it lose the assignment as well.
52krazy4katz
>50 timspalding: I guess I don't know if it is "always" since I stopped combining works before reassigning them once I noticed this happening.
53PhaedraB
>50 timspalding: "Always remember, never say 'never' or 'always'."
In my experience, combining an already assigned work with copies from the Unknown section takes the work out of the split and categorizes it as Unknown. Every time in my experience.
As I recall, this does not happen when combining works already assigned to the same split.
I don't recall combining when the works combined were assigned to two different splits, so I don't know what happens then.
57MarthaJeanne
I don't think it always wipes out the split assignment, but often enough that you need to check the assignment before and after.
58timspalding
Thanks. Checking tomorrow—it's 2am here. Resetting my read-to marker after this.
59AnnieMod
>58 timspalding:
It almost feels like it keeps tabs of what was assigned before combinations and goes with the highest number of works - so works with a lot of copies will end up as unassigned - maybe because there was a 2000 works one assigned once and then separate non-assigned were folded in but if it is 1 on 1, it sometimes remembers to put it in the proper place after that.
At least that is what seems to happen when I do not assign before combining (or it is actually random)
60timspalding
>56 Collectorator:
Thank you.
First, the David Wood one acted unexpectedly. It lost its assignment even though the one with more copies was assigned. So, problem--not working as intended.
Moving on.
62timspalding
Argh. I do need another.
I'll look.
63timspalding
Okay, see New Features: http://www.librarything.com/topic/244676
As you can see, I am now explaining where the combined work is going to "go" in the splits. But I decided against:
"What you're proposing is, I think, that, if the losing work is listed as belonging to split 1, and the winning work is not assigned to any split, then it should "take the hint" and assign the winning work to split 1."
I decided against it, because it's not always a question of assigning between splits. Sometimes you're assigning to splits that are aliased away to another author--more and more of this apparently! And sometimes the works are to different authors, or different authors that are combined, etc. You can even have the author change in the course of combination.
Anyway, the logic and ramifications here just spun out of control when I tried to pin down every edge case. Trying to come up with an automatic "best" answer and apply it was too much. So I decided to do what it was supposed to do--have the "winner"'s split data triumph--and be clearer about what was going on and where it would end up.
64timspalding
Thanks for all your help on this, Collectorator. I really tried to do the thing you proposed--this has been my work most of today. It's just a hard problem. And, in retrospect, I think sticking with "the winner is the winner" is a better principle than doing selective magic.
65leselotte
Digging out this thread because I'm not sure this is a bug: I've come across several entries lately that had as main author name a name that doesn't show up in copies / editions of that title.
Example:
https://www.librarything.com/work/9812570/summary
Is it possible to see whether the author was changed manually? Recalculating doesn't help, by the way!
Tia!
66MarthaJeanne
>65 leselotte: In that case, the author name on the work is the author page the work is on due to (quite proper) combining.
67leselotte
>66 MarthaJeanne: Thank you, MarthaJeanne! Maybe I'll come across some more examples that stumped me (that weren't as clear as the Confucius one to me)
69leselotte
>68 Collectorator: Thanks a lot!
70leselotte
Here's one that baffles me: Franz Carl Weiskopf. Lots of titles where the author shows up as F.C. Weiskopf, while it's F. C. Weiskopf in work details. Canonical name not set (and shows up as Franz Carl Weiskopf anyways). How come?
71MarthaJeanne
>70 leselotte: Look at the 'includes' page: https://www.librarything.com/author/weiskopffranzcarl/names
The FC page was combined into the Franz Carl page.

