"Recalculate author name" clears the author name for authors without works

Talk: Bug Collectors


1gangleri
Edited: Apr 4, 2012, 9:48 pm

Hi!

I have seen this in the past.

Example from today:

1) /work/5469516/summary is showing
by Yan Sh. Hammes, Frank Hammes, Yan Sh. Hammes (Autor), Frank Hammes (Autor)

2) I clicked on /author/hammesfrank. The name was "Frank Hammes".
I opened the link at "Recalculate author name" in another browser tab.
The name changed there to "hammesfrank", showing only the author url.

Note 1: You may reproduce this by choosing some of the works at Publisher Series: Kauderwelsch AusspracheTrainer. If another user recalculated that other author before you did, the author name will already be lowercase and you will need to find another example.

Note 2: In order to make something useful out of such urls you need to add the canonical name and recalculate the author a second time.

Note 3: Authors other than the first author may use scripts other than Latin. In such cases their url will be numerical only, i.e. without showing their name.

Note 4: In author search you may see author urls generated from CK via canonical name, legal name, alternative name (sometimes relationships). These will not behave as described above; they show the author url only.

2jjwilson61
Apr 4, 2012, 11:41 pm

I believe that's working as intended since the author page uses the most common author name from their works and since there are no works (and no canonical name) it has no way to know what name to use, so it uses the one from the url.

3gangleri
Apr 5, 2012, 12:29 am

>2 jjwilson61: The point / the benefit / the intention of "Recalculate ..." is to add or correct / replace something, not to delete.

4brightcopy
Apr 5, 2012, 12:56 am

#3 by @gangleri> Not really. It will delete it, if the editions with that data have been changed/deleted by the users since the previous recalculation.

As for the actual problem in #1, I didn't really want to wade through the post and try to figure it out. So I'm not saying it's not a bug. Just clarifying the point of recalculation.

5gangleri
Apr 5, 2012, 2:22 am

>4 brightcopy: Ok! But that would make sense only if there is a basis on which one could "recalculate" the name. If no books / works are found, one should leave the name as it is. If a language CN has been added, one should consider this; i.e. one should do whatever is done now, or an improved version of it, because "last CN is the winner" (at least for that language and all languages without CN; the languages having CK set should keep their own CK).

So only an additional IF statement is required.

6AnnieMod
Apr 5, 2012, 2:26 am

It's not that simple.

Sometimes as a result of bad combinations, a wrongly spelled name ends up showing up at the top. Once all is cleared, I'd rather see the url than the bad name... So recalculate should do what it is doing now.

Why anyone will click on recalculate on these pages if there is no need to is a different story.

7gangleri
Edited: Apr 5, 2012, 5:46 am

>6 AnnieMod: "Sometimes as a result of bad combinations ...."

These are exceptions. In that case add the appropriate CN to eng.CK (English CK).

There are thousands of other cases as authors tagged ¬á, ¬à, ¬â, ¬ä, ¬ă, ¬ā, ¬å, ¬æ, ¬ĉ, ¬č, ¬ç, ¬ð, ¬é, ¬è, ¬ë, ¬ı, ¬í, ¬ï, ¬ij, ¬ł, ¬ń, ¬ñ, ¬ó, ¬ö, ¬ō, ¬ő, ¬ø, ¬ř, ¬ś, ¬š, ¬ş⇼ș, ¬ß, ¬ţ⇼ț, ¬þ, ¬ú, ¬ü, ¬ý, ¬ż, ¬ž see my tag cloud.

If you import Herta Müller from one of the many, many data sources and the ISBN is not already known to LT, then the author url will be "muumllerherta" until you edit the book for the first time. With the first saved edit the author url becomes "mllerherta", and the first url mentioned here will no longer have any book, i.e. until another user imports a book and does not edit it or combine it with another work.
This is why:
a) the author needs to be combined; this is c)
b) a properly UTF-8 encoded CN needs to be added
c) that author needs to be recalculated
d) both author urls should be combined
e) these correlated author urls should never ever be separated; at most, disambiguations are required (at some point in time) in cases such as
Erik Möller, Erik Möller, Erik Møller

Not sure where this is documented. I did it for hundreds and hundreds of authors. Besides the characters mentioned, all characters which can be encoded via "&*;", such as ©, and apostrophe characters in names at Israel Union List (IUL) etc., cause this extra work.

This is a real pain and could be done via a script.

Note: Authors with books imported from IUL (maybe from other data sources as well) will have one numerical url and shift to another. Deleting the existing name there is a greater loss than deleting it from "skelterhelter".
----
>6 AnnieMod: "Why anyone will click on recalculate on these pages if there is no need to is a different story."

Why are people sp$amm*ing LT?

8AnnieMod
Apr 5, 2012, 3:50 am

So fix a non-existing bug (it works as it is supposed to work) and add an additional complication instead? Oh well

9gangleri
Apr 5, 2012, 4:07 am

>7 gangleri: Sorry! You are very fast with your comments. Did you ever check "/profile/*/stats/gender"? See profile/gangleri/stats/gender. There are reports that this statistic lists author names (only one of the correlated urls). If you click on the author url, there is a discrepancy between how many books and works it says you own and what you see when you follow the link to your catalog. (Investigating this for other users needs manual url manipulation or other advanced skills.) This is exactly what a "non-existing bug" is.

10gangleri
Edited: Apr 5, 2012, 4:25 am

>6 AnnieMod: re: "d) both author urls should be combined"

You either should know how to do it immediately via url combination syntax or will need to wait until the search used at GUI (using non real time data) can assist you. In the second case you need to maintain a log about what is not combined yet or ask support at the group "Combiners".

Note: Not sure how many English-speaking users import records only / mainly from LoC. LoC is using "CDM" (combining diacritical marks), which do not harm author urls. But you need to remember that most of the correlated author urls have a third form such as "mullerherta", "mullereric", "mollereric" etc. So this is a "non issue" at LT.

11AnnieMod
Apr 5, 2012, 4:27 am

>You either should know how to do it immediately via url combination syntax or will need to wait until the search used at GUI (using non real time data) can assist you. In the second case you need to maintain a log about what is not combined yet or ask support at the group "Combiners".

If at least one of the names can be found in the search, all you need to do is to search for it from the other author and you can combine. No need to know anything special...

12MarthaJeanne
Edited: Apr 5, 2012, 4:43 am

I think now that you are complaining that if your book has a different author listed than the work, you may get sent from the gender page to an author page that doesn't have your book listed on it.

I agree that this is rather disconcerting the first time around, but since I figured out what was going on, I have come to value this as a way of catching mistaken entries in my books and/or author combinations that need to be made. Yes, most often this has to do with different codings for non-ASCII characters, but not always; sometimes I have used an alternative form of the author's name; sometimes I have seen it happen with names that look identical to me.

Have you ever looked at the lists of authors suggested for combining on certain author pages? There is so much garbage there! Let's not have the computer decide which are valid. People can do a much better job.

13gangleri
Edited: Apr 5, 2012, 5:54 am

>11 AnnieMod: You did not get the point. Within the first 5 minutes neither of the correlated authors is in the system. I am talking about authors whose first book is being added. Imagine it is a Hungarian author. They have many diacritical characters; 99% of Icelandic female names end with "dóttir" (as in "Magnúsdóttir"). Imagine the French authors and all other languages using "ç"; there are scaron characters too. Do you mean that a newbie will have the endurance to find out what is different from what was expected?
Please remember that combination of "empty" author urls was blocked at LT and was allowed only a few months ago. Who would try to identify the catalog discrepancies? Go to the male or female statistics of Icelandic, Hungarian etc. users to see how much of a problem this is.
----
A "TangenTopic" relates to the useless (?) results from author name search listing "empty" author names generated by CN, LN, Alt.N (maybe also relationships). Via improved functionality, a CN should be added to these author urls in the same language if it is not populated. Combination could be done as long as the correlated author url is not a disambiguation. If that is the case, one should notify the user of this situation. This case does not need to be discussed here in detail.
Another "TangenTopic" related to the combinations I made concerns the loss of the available links at the "losing author url".
----
>7 gangleri: Homoglyphs may let you think the name is the same. These use characters as in Cyrillic CCCP.
But I think most cases relate to the usage of CDMs (combining diacritical marks), mainly used by LoC. The solution for this is normally Unicode normalization. If you see such cases, copy them; go to an empty Wikipedia page such as http://en.wikipedia.org/w/index.php?title=talk:zzzzzzzzzzzzz&action=edit&amp... paste what you have copied, make a preview and copy what appears as the result. Now I have explained what I did to one of the 1,500,000 LT users. This is not what I should do.
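
The copy, paste and preview step above can also be done locally. What follows is only a minimal sketch in Python, using the standard `unicodedata` module; it assumes nothing about how LT stores or normalizes names, and the helper names `has_combining_marks` and `normalize_name` are made up for illustration.

```python
import unicodedata


def has_combining_marks(name: str) -> bool:
    """Return True if the string contains any combining character (e.g. U+0308)."""
    return any(unicodedata.combining(ch) for ch in name)


def normalize_name(name: str) -> str:
    """Compose base letter + combining mark into one precomposed character (NFC)."""
    return unicodedata.normalize("NFC", name)


# "u" followed by COMBINING DIAERESIS, as often found in LoC-style records
decomposed = "Mu\u0308ller"
composed = normalize_name(decomposed)

print(has_combining_marks(decomposed))  # True
print(has_combining_marks(composed))    # False
print(composed == "M\u00fcller")        # True
```

Note that this only helps with combining marks. Real homoglyphs (a Cyrillic letter that looks like a Latin one) are different characters and are left untouched by normalization.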

P.S. fixed html

14gangleri
Apr 5, 2012, 5:52 am

fixed broken html in >7 gangleri:
writing fast I was stating the opposite of what I was thinking - especially in e)

conclusion: You / users are either affected by this problem "en masse" (French) or they are not. For the latter this is a non issue.

15gangleri
Edited: Apr 5, 2012, 6:31 am

Explaining why this report and the TangenTopics are so important for me: The graph at http://viaf.org/viaf/7524651/ shows how many National Libraries participating in combining their authority control numbers (here for Aristoteles) agreed that that VIAF number relates to the same spelling, here using 4 scripts. Looking at the "About" box at /topic/134611#aboutBox you will see a lot of articles linking today to YIVO Encyclopedia and using the same spelling as that Encyclopedia. This is no automated work. The Yiddish CK is generating author urls, of which many are not used at any book source in member imports until today. Many authors have combinations of books where both Hebrew and Latin script are used. Waiting until some user uses LT to add these books can be like waiting forever.
I stopped this work three weeks ago because of the ongoing Esperanto translation and involvement in localization issues. Many issues addressed during the last weeks were known to me for a long time. I am very happy both about the LT staff's engagement (?) / help / support in fixing many of them and about the wide and warm support and help received from a lot of active users.

16gangleri
Edited: Apr 5, 2012, 7:15 am

testcaseLT please do not fix

>13 gangleri: "... Imagine it is a Hungarian author. ..."
/profile/fazsef/stats/gender
Please note the number of authors with diacriticals in their name.

Zsombor Bódy
a) with html-entity: /author/boacutedyzsombor - one work | two members - these two books have never been edited
b) with ignored html-entity: /author/bdyzsombor - "empty" - place where the work will be if books are edited
c) LoC variant - might arrive at any time: /author/bodyzsombor - today empty - lowercase anyway
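
The three url forms a) to c) can be reproduced with a few lines of Python. This is only a sketch of how such codes might arise, not LT's actual code: the function names are hypothetical, the "last name + first name, letters only" rule is inferred from the urls in this thread, and the 20-letter limit is an assumption based on codes such as "duumlrrenmattfriedri".

```python
import unicodedata
from html.entities import codepoint2name

MAX_LEN = 20  # assumption: author codes appear to be cut at 20 letters


def letters_only(text: str) -> str:
    """Lowercase, keep only a-z, then truncate to MAX_LEN."""
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")[:MAX_LEN]


def to_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one exists."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};"
        if ord(ch) > 127 and ord(ch) in codepoint2name
        else ch
        for ch in text
    )


def strip_marks(text: str) -> str:
    """Decompose (NFD) and drop the combining marks, keeping the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def code_variants(last: str, first: str) -> dict:
    """Return the three hypothetical author codes for one name."""
    name = last + first
    return {
        "entity": letters_only(to_entities(name)),   # case a)
        "ignored": letters_only(name),               # case b)
        "plain": letters_only(strip_marks(name)),    # case c)
    }


print(code_variants("Bódy", "Zsombor"))
# {'entity': 'boacutedyzsombor', 'ignored': 'bdyzsombor', 'plain': 'bodyzsombor'}
```

A script along these lines could list the candidate urls that need combining for a given name, instead of guessing them by hand.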

catalog.php?view=fazsef&author=boacutedyzsombor - empty - discrepancy
/catalog.php?view=fazsef&author=bdyzsombor - one work - discrepancy
----
similar:
catalog.php?view=fazsef&author=boacutedyzsombor - empty - discrepancy
/catalog.php?view=fazsef&author=bdyzsombor - one work - discrepancy

17gangleri
Edited: Apr 5, 2012, 10:00 am

>12 MarthaJeanne: please see /search.php?search=mu%CC%88ller&searchtype=work to see how many works and talks are using the homoglyph u + diaeresis (the combining diaeresis is encoded in urls as "%CC%88").
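
To see where "%CC%88" comes from, here is a small Python illustration using only the standard library. It shows how the two spellings of the same visible name percent-encode differently in a url; it says nothing about how LT's search treats them.

```python
from urllib.parse import quote

decomposed = "mu\u0308ller"  # "u" followed by U+0308 COMBINING DIAERESIS
composed = "m\u00fcller"     # single precomposed character U+00FC

print(quote(decomposed))  # mu%CC%88ller
print(quote(composed))    # m%C3%BCller
```

Both strings look the same on screen, but they are different byte sequences, so a search or url built from one will not match the other unless something normalizes them first.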
----
Imagine authors using html-entities: if 49% are editing their work while 51% do not, all 100% have the problem.
I looked at the German Zeitgeist. There the top authors with diacritics are
/author/duumlrrenmattfriedri&norefer=1 included in Friedrich Dürrenmatt and
/author/schaumltzingfrank&norefer=1 included in Frank Schätzing.
While the first is better known abroad (and copies in languages other than German may influence the users' choice of where they import books from), the percentage of the "auml" url is higher than 10 %.
I wonder why LT import fails if the "ü" character is used in the author name. This has been happening for more than half a year, but I recall that in the past it worked. So you need to insert "Friedrich Durrenmatt" in the upper field at "Add books". This is similar to telling people from the United States that they need to use "Vashington" because Latin did not use "W".
----
>6 AnnieMod: "Why anyone will click on recalculate on these pages if there is no need to is a different story."
BTW: I realized last year that "Recalculate author / title" does not shift authors from the html-entity based url to its correlated url; that one is reached today only if / when 51% of the work owners have edited their book.
This means that haus$tile members can not influence the evaluation of the main url, but they can delete the minimal information at author urls without works.
----
In the "About" box at /topic/134611#aboutBox one can see odd lowercase links. This is because at the time of combining the manually added url with the existing one I forgot to do either b) or c) from >7 gangleri:. Doing neither would not allow you to add "empty" author urls to the "About" box. ☛ "#aboutBox"

18timspalding
Apr 6, 2012, 5:22 pm

It seems to me that the core bug here is not a bug, and is unfixable. A given author code (e.g., hammesfrank) gets its full form from all the full forms of it in the system. At some point there was one, so it was "Frank Hammes." Once there are none in the system, it can't do any better than hammesfrank, unless you add a canonical name.
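
As a rough illustration of the behaviour described here, a minimal sketch in Python. This is not LT's code: `display_name` and its parameters are hypothetical, "most common full form wins" follows the description in #2, and letting a canonical name take priority is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional


def display_name(author_code: str,
                 name_forms: Iterable[str],
                 canonical_name: Optional[str] = None) -> str:
    """Pick the name shown on an author page.

    name_forms: the full forms of the name found on books in the system.
    """
    if canonical_name:                       # assumption: a canonical name always wins
        return canonical_name
    counts = Counter(name_forms)
    if counts:                               # at least one book carries a full form
        return counts.most_common(1)[0][0]
    return author_code                       # nothing left to go on but the url code


print(display_name("hammesfrank", ["Frank Hammes"]))          # Frank Hammes
print(display_name("hammesfrank", []))                        # hammesfrank
print(display_name("hammesfrank", [], "Frank Hammes"))        # Frank Hammes
```

With no books and no canonical name, the only input left is the author code itself, which is why the page falls back to "hammesfrank".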

19jjwilson61
Apr 6, 2012, 5:44 pm

18> I believe that gangleri is asking that when it comes time to recalculate the name of an author with zero works, whether it's a manual or automatic recalculation, that the system just not do it and let whatever the last name was stand. I guess whether that works depends on whether the last calculated name was reasonable.

20timspalding
Apr 6, 2012, 5:54 pm

Right. That makes a certain sense but, well, meh. There's no right answer. But you can make the answer right, with the Canonical Author.

21rsterling
Apr 6, 2012, 6:02 pm

Actually, isn't part of it that he wants a consistent/stable URL for an author page, one that he can link to from that other site?

Unfortunately, that just doesn't seem to be how LT author pages work, since the URLs where they sit can move/change around through combination or through addition of new works, or other things.

22rsterling
Apr 6, 2012, 6:11 pm

Another difficulty is that author recalculation can happen not only from the author page but from the work page. There's no way to know from the work page whether an author page has zero works or not, or whether recalculating will result in an author page suddenly having zero works. I just don't see any way people can or should be asked not to recalculate authors, since concerns about data accuracy and commonality on works would seem to take precedence over (and be more general than) concerns about the particular author URL on which the books land.

The problem really is with less common authors, especially if they have diacritics, since the author page is more likely to shift around when people edit their book data, other authors, or recalculate authors. However, this can be fixed through combining, for the most part, especially since it's now possible to combine even zero-work author pages.

23gangleri
Apr 6, 2012, 10:44 pm

Thank you for all the comments! I am working on authors with different spellings and language specific transliterations. This is why the LT system should have a "peer point": search should find "foobar", and during the last year I combined "foobar" with existing "bar" author names. This is why combinations with zero work authors should never be split from their "targets". LT does a good job in combining authors.

At this point I want to point out the difference between some LT functions:
a) some are near real time based
b) others are based on an LT dump, so search and GUI combinations are functional the next day or a few days after (I suppose).

I managed to combine newly "created" zero work author urls with existing newly created author urls in near real time. This can be done with non dis$clo*se author combination via url syntax. So I do not need to wait until the next day.

What came to my mind these days is to use an "LT Meta"-word such as "enforce" in order to query near real time data. It should be used at least in the following situations:
a) when asking "combine this author with" (or similar)
b) when adding other authors to a work in the other authors container at work pages (anchor is #otherauthors_container)
c) when adding relationships
d) in author search; with some skill this can be done by "guessing" the LT author url for author names in Latin based scripts; however, there one needs to know how html entity characters are handled at LT and what the correlated authors will be; limiting to 20 characters only is mandatory; accidental "miscounts / erroneous counting" are a potential source of errors experienced in the past.
Guessing author urls for scripts other than Latin based ones is a Houdini job.

Syntax should be "enforce:Smith, John".

Not sure how many advanced users would use and benefit from "enforce:". Maybe there are better suggestions around here.
----
Regarding the actual topic about "Recalculate author name".

The suggestion was quite simple: One IF statement only: If an author url "smithjohn" has no works associated, then IF a nontrivial name (not "smithjohn") is associated with that author url and NO CN is defined for any language, then "Recalculate author name" will preserve the nontrivial name. I assume that many LT users can fix the few exceptional situations by setting a useful CN where the nontrivial name makes no sense (to them). If in doubt one could ask. Group "Combiners" would probably be glad to assist.
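
Spelled out, the suggested IF could look like the following sketch. It is written in Python purely for illustration; `recalculate_name` and its parameters are hypothetical, and the "most common name wins" rule for authors with works is taken from the description earlier in this topic, not from LT's source.

```python
from collections import Counter
from typing import List, Optional


def recalculate_name(author_code: str,
                     current_name: str,
                     name_forms: List[str],
                     canonical_name: Optional[str] = None) -> str:
    """Recalculate an author's display name with the proposed extra IF.

    name_forms: full name forms found on the author's works (empty = zero works).
    """
    if canonical_name:                        # a CN is defined: use it (assumption)
        return canonical_name
    has_works = bool(name_forms)
    is_nontrivial = current_name != author_code
    if not has_works and is_nontrivial:       # the proposed IF: nothing to recalculate from
        return current_name                   # so preserve the existing nontrivial name
    if has_works:
        return Counter(name_forms).most_common(1)[0][0]
    return author_code                        # zero works and only the trivial name


print(recalculate_name("smithjohn", "John Smith", []))   # John Smith
print(recalculate_name("smithjohn", "smithjohn", []))    # smithjohn
```

The trade-off is visible in the code: a name that was wrong before the works disappeared is preserved just like a correct one, and stays until someone sets a CN.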
----
"Recalculate author name" can not be done on a combined author. This might be a convenient "how it works now" and a protection against adding CN to non top level author urls. On the other hand one can not see for "Smith, John" whether the url is a Japanese, Russian, Hebrew, Korean or whatever "LT weighted variant". To "regenerate" the original spelling one would need to split the author url, delete the (mainly) English CN, recalculate the name, if necessary add a CN in the relevant language and combine the urls again. Then one could see all the language names in the "Combined with" list and at the "/issues" subpage.

24gangleri
Apr 6, 2012, 11:22 pm

P.S. Why did I report this bug? I clicked on "Recalculate author name" and noticed later that the author page made less sense than before. I did not know the consequences. So I will add the CN as long as this is the "status quo".
The main problem is in situations where you do not arrive in time. Somebody else has already recalculated that author url. You have no chance to see why this url is in the system. You might find it via search or while clicking on the "Other authors" in the third line of your work pages.
This (showing more than the main author) is a feature that has been available during the last year.

25brightcopy
Apr 7, 2012, 12:03 am

It just doesn't make sense. There are two situations:

1) The original name is right. You recalculate and now John Smith becomes smithjohn.
2) The original name is wrong. You recalculate and now Johannes Smythe stays Johannes Smythe, even though it's supposed to be "John Smith"

You propose doing method #2 by default and having people correct it if wrong. But if someone doesn't go in and add a Canonical Name, it stays wrong.

If you do method #1 instead, what is the downside? That it won't look as "pretty"? I seriously don't understand what is so horrible about an author with ZERO works showing "smithjohn". And who is really going to see it that often if they have no works? How are they even getting there? If an author has no books, is he/she really an author?

26gangleri
Apr 7, 2012, 7:41 am

short answer to >25 brightcopy:
re: "If an author has no books, is he/she really an author?"
After three years I finally managed to import records from LT data sources about Zamenhof, Lidja / Zamenhof, Lidia. I did not finish adding relations (links, pictures) to the Zamenhof family because I was documenting many issues in reports. I only have one book using Cyrillic script for her grandfather; for Felix Zamenhof (as far as I remember the brother of Zamenhof, L. L.) I did not verify what I have in my catalog. The only living descendant of L. L. Zamenhof is using a double name containing a "-". That name and the works can not be found without knowing the exact spelling (maybe google helps, but one might use the browser search in my tag cloud).
The answer to the above question: one may identify only data sources with some of the required spellings, as with Wales, Jimmy and Wales, Jimbo.
"Patronymic" names used in Russia and in countries of the former SSSR are not common in all cultures. This is why so many alternative names used in CN, LN and OT need to be properly populated and combined with the available author url having books. That url might use a script other than Latin, as is the case in my catalog with urls generated by names in Cyrillic and / or Hebrew.
Visiting my friend from Ukraine yesterday we added CN, LN, ON such as
http://epo.librarything.com/search.php?search=%D0%9D%D0%B5%D0%BC%D0%B8%D1%80%D0%...
http://epo.librarything.com/search.php?search=%D0%9B%D0%B5%D0%B2+%D0%9B%D0%B0%D0...
http://epo.librarything.com/search.php?search=%D0%A8%D0%BE%D0%BB%D0%BE%D0%BC+%D0...
http://epo.librarything.com/search.php?search=%D0%90%D0%B1%D1%80%D0%B0%D0%BC+%D0...
http://epo.librarything.com/search.php?search=%D0%93%D0%BE%D0%BB%D1%8C%D0%B4%D0%...
http://epo.librarything.com/search.php?search=%D0%9B%D0%B5%D0%B2+%D0%9A%D0%B2%D0...
I am waiting for an LT dump where both variants, with the father's name and without, will show.
It should be easy to make such additions nearly real time. Else you need to maintain extra lists for maintenance.

Need to leave, will fix spelling errors and urls later.
----
P.S. to >7 gangleri: You should never ever insert diacritics in CN, LN, ON, relations using unverified copy and paste. Unicode normalization (the filtering of CDM - combining diacritical marks) is the responsibility of LT users. Such spellings enter the system and propagate there.

27brightcopy
Apr 7, 2012, 9:38 am

If there were answers to my questions somewhere in that wall of text, I didn't see them.