Search cannot identify names which are not Unicode-normalized
1 gangleri
Please search for Herta Müller.
Try the query q=Herta+Mu%CC%88ller (http://www.librarything.com/search_author.php?q=Herta+Mu%CC%88ller), where the Unicode character 'COMBINING DIAERESIS' (U+0308, http://www.fileformat.info/info/unicode/char/0308/index.htm) is used.
The search does not find the author, although author/mllerherta&norefer=2 (http://www.librarything.com/author/mllerherta&norefer=2) is in the system.
This bug also affects combine.php.
However, I am interested in knowing how such authors could be identified / combined. (It would be a bad idea to use such spellings as the main author name.)
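The difference between the two forms of the name can be made visible with a few lines of Python. This is only a minimal sketch using the standard library; the helper name normalize_query is made up for illustration, and nothing here is taken from LibraryThing's actual code. It shows that the decomposed spelling ('u' followed by U+0308) percent-encodes differently from the precomposed 'ü', and that NFC normalization turns the former into the latter:

```python
import unicodedata
from urllib.parse import quote_plus


def normalize_query(query: str) -> str:
    """Hypothetical helper: compose base letters and combining marks (NFC)."""
    return unicodedata.normalize("NFC", query)


decomposed = "Herta Mu\u0308ller"  # 'u' followed by COMBINING DIAERESIS (U+0308)
composed = normalize_query(decomposed)

print(quote_plus(decomposed))  # Herta+Mu%CC%88ller
print(quote_plus(composed))    # Herta+M%C3%BCller
print(decomposed == composed)  # False: the two strings differ code point by code point
```

If a search normalized its input this way before matching, both spellings would presumably lead to the same lookup; whether that would be enough for search_author.php and combine.php is an open question.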
Thanks for any advice.
3 gangleri
q=Herta+M*ller (http://www.librarything.com/search_author.php?q=Herta+M*ller)
finds
author/muumlllerherta (http://www.librarything.com/author/muumlllerherta&norefer=1) and author/matildefracllerherta (http://www.librarything.com/author/matildefracllerherta&norefer=1).
The first one is what I would call a « premium » spelling variant because it uses « uuml » and the character is not lost. I assume that, for historical reasons, characters not belonging to the Unicode Basic Latin block (http://www.fileformat.info/info/unicode/block/basic_latin/list.htm) were ignored.
This way all the variants
Herta Müller (using the Unicode Character 'COMBINING DIAERESIS' (U+0308))
Herta Möller
Herta Müller
Herta Møller
generate the same author name, author/mllerherta (http://www.librarything.com/author/mllerherta&norefer=1).
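The collapse described above can be imitated with a small sketch. Everything in it is an assumption made for illustration: the function name author_slug, the "surname first, then forenames" order (guessed from mllerherta and mllererik), and the NFC step before stripping. The NFC step is only there so that the combining-diaeresis spelling also loses its 'u'; without it, that spelling would give mullerherta instead. This is not LibraryThing's real code:

```python
import unicodedata


def author_slug(name: str) -> str:
    """Hypothetical slug: surname + forenames, lowercased, letters a-z only."""
    # Assumption: compose first, so 'u' + U+0308 becomes the single character 'ü'.
    name = unicodedata.normalize("NFC", name)
    *forenames, surname = name.split()
    raw = (surname + "".join(forenames)).lower()
    # Drop every character outside the Basic Latin letters a-z.
    return "".join(ch for ch in raw if "a" <= ch <= "z")


variants = [
    "Herta Mu\u0308ller",  # combining diaeresis
    "Herta M\u00f6ller",   # ö
    "Herta M\u00fcller",   # ü
    "Herta M\u00f8ller",   # ø
]
for variant in variants:
    print(author_slug(variant))  # mllerherta, four times
```

Because the non-Basic-Latin letter is simply dropped, the slug alone cannot tell Möller, Müller and Møller apart.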
At first I assumed that the search works by a kind of reverse engineering, a kind of guessing:
Erik Möller: q=Erik+M%C3%B6ller (http://www.librarything.com/search_author.php?q=Erik+M%C3%B6ller)
Erik Müller: q=Erik+M%C3%BCller (http://www.librarything.com/search_author.php?q=Erik+M%C3%BCller)
Erik Møller: q=Erik+M%C3%B8ller (http://www.librarything.com/search_author.php?q=Erik+M%C3%B8ller)
But they do not show the same result: only the first finds author/mllererik&norefer=1 (http://www.librarything.com/author/mllererik&norefer=1), where all three spelling variants are available.
So the code seems to be more complex.

