Encoding problem with "ш"

1Deil
Apr 21, 2025, 1:48 pm

The Cyrillic letter "ш" (lower case only) is no longer displayed in my library. It now looks like "� " (with a trailing space). Adding a book whose author has "ш" in their name also creates a new author page with "� " in its name.
Books created earlier have a broken title and author name on the library page, but the proper name on the work page. Books added today all have this broken symbol on the work page as well.

It also looks like if the "publication" field contains "ш", it is broken on the library page too, and all characters after it are broken as well.




2Deil
Apr 21, 2025, 1:50 pm

Sorry, it got even worse when I copied it. The letter is lower case https://uk.wikipedia.org/wiki/%D0%A8

3bnielsen
Apr 21, 2025, 2:14 pm

Where do you add the books from? It might be an encoding problem with the source, so a couple of examples would be nice.

4AnnieMod
Edited: Apr 21, 2025, 2:14 pm

Let's try this way:

ш

Ha - yep, something is wrong for some reason.

5Deil
Apr 21, 2025, 2:18 pm

>3 bnielsen: All from the "Add Books > Manual entry" page, so it's not a source problem.

6bnielsen
Apr 21, 2025, 2:18 pm

The added book is this one, but I don't think the rest of us can see where it was added from? (Please correct me if I'm wrong).

https://www.librarything.com/work/33983778/285110109

7AnnieMod
Apr 21, 2025, 2:20 pm

It looks ok on the author pages (see https://www.librarything.com/author/236212755 - he has the letter both in his name and in a title) but is messed up in the catalog.

And it is in both the author field and the title field (I have a book that happens to have both here: https://www.librarything.com/work/13740937/book/285113595).

To help the developers:
Book page: (broken)
https://www.librarything.com/work/13740937/book/285113595

Work page: https://www.librarything.com/work/13740937 (broken)

Author page: https://www.librarything.com/author/236212755 (works ok)

Catalog: broken

8AnnieMod
Edited: Apr 21, 2025, 2:21 pm

>3 bnielsen: Not a source problem and one of the ones I see broken in my catalog was ok last week when I looked at it. So it is something new (I wonder if the fix for the Hunt that was miscounting work pages is not the culprit).

9bnielsen
Edited: Apr 21, 2025, 2:25 pm

>8 AnnieMod: Yes, this is a nicely isolated bug. I just tried typing "trying ш for sha" into a review and that displays the same bug. And when I edit this post it displays the character as expected, but when displaying the post we get the inverted ?-mark.

10Deil
Edited: Apr 21, 2025, 2:27 pm

>6 bnielsen: Yes, that's the one

>7 AnnieMod:
Here's the broken author page:
https://www.librarything.com/author/3933194830
It's for this book: https://www.librarything.com/work/25043567/book/285110716 . It should've been merged with https://www.librarything.com/author/vernonursula but I'm not merging them for now.
I added this book about an hour ago; all author names added before that (the last time was a few days ago) are fine.

Here's the library, there's plenty of broken names there https://www.librarything.com/catalog/Deil (change to "All Collections")

11AnnieMod
Apr 21, 2025, 2:25 pm

More data: https://www.librarything.com/work/28744533/book/223853646 looks ok on the book/work page but not in my catalog (it has the letter in the author name).

So something with the adding today is messing it up even worse (the work/book page is broken for the one I added today to test after the bug started). For the older record that I am not touching at all, the Catalog shows it as broken and the work/book page works.

12AnnieMod
Edited: Apr 21, 2025, 2:27 pm

duplicate of >11 AnnieMod:. LT is slowish today...

13AnnieMod
Apr 21, 2025, 2:27 pm

>10 Deil: Oh, yes, I saw it in yours.

But I am also reproducing in mine and with an author and book that do not need merging and that worked as of last week. Thus me posting my research as well.

14AnnieMod
Apr 21, 2025, 2:29 pm

And I have a feeling that https://www.librarything.com/topic/370277 may also be related.

15kristilabrie
Apr 22, 2025, 8:51 am

We're looking at this (and the related issue) now, thanks.

16kristilabrie
Edited: Apr 22, 2025, 9:59 am

I think we'll need to roll back a change...

ETA: I think we need Lucy for this one, which means we'll have to wait until tomorrow. Thanks for your patience in the meantime!

17knerd.knitter
Apr 22, 2025, 2:16 pm

We think we've got this fixed, but let us know.

18idiosyncratic
Apr 22, 2025, 2:16 pm

I can confirm this bug. I experimented a bit and noticed that all offending characters have "88" as the second byte in their UTF-8 encoding.

A few examples (2 and 3 byte characters):
Cyrillic small letter sha (U+0448) / UTF-8: D1 88
Greek small letter psi (U+03C8) / UTF-8: CF 88
Latin capital letter C with circumflex (U+0108) / UTF-8: C4 88
Combining diaeresis (U+0308) / UTF-8: CC 88
CJK, Pinyin hù (U+6237) / UTF-8: E6 88 B7
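
For anyone who wants to check other characters, here is a minimal Python sketch of the observation above. It only inspects the UTF-8 bytes of a character and reports whether the second byte is 88; it says nothing about what LibraryThing's code actually does. The helper names (`utf8_hex`, `second_byte_is_suspect`) are made up for this illustration.

```python
import unicodedata

# Byte value observed as the second UTF-8 byte of every broken character.
SUSPECT_BYTE = 0x88


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def second_byte_is_suspect(ch: str) -> bool:
    """True if ch encodes to at least two bytes and the second one is 0x88."""
    encoded = ch.encode("utf-8")
    return len(encoded) >= 2 and encoded[1] == SUSPECT_BYTE


if __name__ == "__main__":
    # The five characters from the list, plus capital sha for comparison.
    samples = ["\u0448", "\u03c8", "\u0108", "\u0308", "\u6237", "\u0428"]
    for ch in samples:
        name = unicodedata.name(ch, "UNKNOWN")
        print(f"U+{ord(ch):04X} {name}: {utf8_hex(ch)} "
              f"suspect={second_byte_is_suspect(ch)}")
```

Capital sha (U+0428) encodes as D0 A8, so it is not flagged, which fits the report that only the lower case letter is affected.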

19idiosyncratic
Apr 22, 2025, 2:23 pm

I made a test entry and everything looked as it should. THANKS!

20Deil
Edited: Apr 22, 2025, 2:57 pm

Yeah, looks like it's fixed for old entries. Nice, thanks!
I assume for the books created when this bug was present I just need to fix them manually?

test: ш

21Deil
Apr 22, 2025, 2:45 pm

>18 idiosyncratic: That's some funny bug, would be nice to hear what actually happened.

22Deil
Edited: Apr 22, 2025, 3:55 pm

*deleted*
moved to a new thread

23idiosyncratic
Edited: Apr 22, 2025, 3:02 pm

This message has been deleted by its author.

24bnielsen
Apr 22, 2025, 3:47 pm

>18 idiosyncratic: Ah, that's a nice hint of what's happening.

25bnielsen
Apr 22, 2025, 3:48 pm

>22 Deil: Yes, please make a new thread for it.