Export as Marc21 contains garbled records

Talk > Bug Collectors

1 bnielsen
Jun 8, 2011, 6:58 am

When exporting my library as "Basic + MARC" I get 4781 records just fine and 336 garbled records.
Each of the 336 records turns into "0x11 0x14 N", which makes converters like

http://marcpm.sourceforge.net/cgi-bin/converter.cgi

stop at the sight of the first one. If I remove the "0x11 0x14 N" parts of the marc21 file, the 4781 remaining records display just fine in the converter tool.

Here is my recipe for removing the errors:
cat lt.marc | sed -e 's/\x11\x14N//g' > lt2.marc

I'll take a look at the 336 affected records and see if I can guess what makes this happen.
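
The sed recipe above is meant to delete the byte sequence 0x11 0x14 'N'. A minimal sketch of a variant that builds those three bytes with printf octal escapes, so it does not depend on how a particular sed treats hex escapes. The function names strip_garbage and count_garbage are invented for this example, and the demo data is made up:

```shell
#!/bin/sh
# Sketch only: the function names are invented for this example.
# The three-byte marker 0x11 0x14 'N', written with octal escapes that
# a POSIX printf understands (0x11 = \021, 0x14 = \024).
GARBAGE=$(printf '\021\024N')

# Copy stdin to stdout with every occurrence of the marker removed.
strip_garbage() {
    LC_ALL=C sed -e "s/${GARBAGE}//g"
}

# Print how many times the marker occurs on stdin.
# -a makes grep treat the binary stream as text, -o prints each match
# on its own line so that wc -l counts matches rather than lines.
count_garbage() {
    LC_ALL=C grep -a -o "${GARBAGE}" | wc -l | tr -d ' '
}

# Small self-contained demo: two fake records (ended by the MARC record
# terminator 0x1D) and two markers standing in for garbled records.
demo=$(mktemp)
printf 'rec1\035%srec3\035%s' "$GARBAGE" "$GARBAGE" > "$demo"
count_garbage < "$demo"
strip_garbage < "$demo" | cat -v
rm -f "$demo"
```

With the real export this would be used as `strip_garbage < lt.marc > lt2.marc`, and `count_garbage < lt.marc` should report the number of garbled records.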

2 timspalding
Jun 9, 2011, 5:28 pm

Assigning to Casey. This needs to be fixed.

3 bnielsen
Jun 10, 2011, 7:20 am

I've taken a look at the 336 unlucky records. They come from only a few sources:

269 Bibliotek.dk
1 BURK
66 Det Kongelige Bibliotek

Encoding alone doesn't explain it:
211 ASCII
125 UTF-8

Most of them were entered a long time ago. The newest is book id
70091801 John Engelbrecht: Nu vender vinden entered Feb 12, 2011.

The 10 oldest ones

22823282
23609527
23609664
23610208
23611138
23613936
23616181
23616276
23635752
23639535

The 10 newest ones
53349318
53349610
53349956
56088903
57759433
58234048
64342172
64744989
66071079
70091801

I can't see anything obvious in which books get exported and which get garbled,
so I think it must be something to do with the internal representation of the records.

4 timspalding
Jun 15, 2011, 1:26 am

Bumping.

5 bnielsen
Jun 17, 2011, 8:09 am

BTW this is fun stuff (for the non-garbled records). I've written a script that shows me each book's review/description from the MARC record alongside my own review, if one exists.

Example (sorry it's in Danish, but all my reviews are in Danish, so ...)
64708299 Stephen King: Misery
Marc review:
Efter en bilulykke plejes forfatteren Paul Sheldon i et
ensomt beliggende hus af en ukendt, sindssyg kvinde, der
samtidig forsøger at tvinge ham til at genoplive sin
berømte romanserie ved udspekulerede ydmygelser.
My review:
Paul Sheldon, 42 år, er forfatter til 4 meget populære
bøger med en heltinde, Misery Chastain, som han er blevet
grundigt træt af. I hans nyeste bog "Misery's child" tager
han livet af Misery og han vil nu skrive en ordentlig bog
"Fast cars".
...

Another example:
21399188 Douglas R. Hofstadter: I Am a Strange Loop
Marc review:
Hofstadter's long-awaited return to the themes of Gödel,
Escher, Bach--an original and controversial view of the
nature of consciousness and identity. What do we mean when
we say "I"? Can a self, a soul, a consciousness, an "I"
arise out of mere matter? If it cannot, then how can you or
I be here? This book argues that the key to understanding
selves and consciousness is a special kind of abstract
feedback loop inhabiting our brains. Deep down, a human
brain is a chaotic soup of particles, on a higher level it
is a jungle of neurons, and on a yet higher level it is a
network of abstractions that we call "symbols." The most
central and complex symbol in your brain or mine is the one
we both call "I." But how can such a mysterious abstraction
be real--or is our "I" merely a convenient fiction?--From
publisher description.

6 ccatalfo
May 4, 2012, 12:01 pm

Hi bnielsen,

I don't see these showing up in the marc export anymore. Are you still seeing them? I tried exporting your data and all of the records looked OK (although some were of course quite brief).

7 bnielsen
May 4, 2012, 2:12 pm

Yes, the "0x11 0x14 N" stuff is gone. The corresponding records are just not exported.
My checking script returns:
- - checking 5294 books -- 649 tagged as recycled - - 4942 in marc format - -
so 352 records are just silently omitted from the export.

8 ccatalfo
May 7, 2012, 8:14 am

Hi,

Ah, ok. What's an example record that is silently omitted?

9 bnielsen
Edited: May 10, 2012, 11:23 am

Hmm, let's see.

Note to self:
cat /tmp/lt.marc | tr "\t" "\n" | grep ^001 | cut -f2 -d' ' > /tmp/marc-no
cat /tmp/lt.utf8 | cut -f1 > /tmp/tab-no
wc -l /tmp/*no
4734 /tmp/marc-no
5294 /tmp/tab-no
10028 total

Ah, it's probably a bit worse than the 352. Anyway, let's look for something missing.

grep 23611138 /tmp/lt.utf8
23611138 Fru Jøeks Bagebog Jøek, Gerda Gerda Jøek Kbh. 1946 i.e.: 1945 1946 Det Kongelige Bibliotek Danish (blank) Danish Nov 23, 2007

grep 23611138 /tmp/lt.marc

Nothing!

Let's spill the beans on the tab-exported version:

grep 23611138 /tmp/lt.utf8 | sed -e 's/\t/ ## /g'
23611138 ## Fru Jøeks Bagebog ## Jøek, Gerda ## Gerda Jøek ## ## Kbh. 1946 i.e.: 1945 ## 1946 ## ## ## Det Kongelige Bibliotek ## Danish ## (blank) ## Danish ## ## ## ## Nov 23, 2007 ## ## ## ## ## Your library ## Baking,Recipes ## ## Fru Jøeks Bagebog by Gerda Jøek (1946) ## ## ## 1 ## UTF-8 ##

So why doesn't this record show up in the marc export?

ETA: some backslashes in the Unix commands are not displayed.
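
Rather than grepping for one id at a time, the two id lists built above can be compared directly. A minimal sketch using sort and comm. The helper name missing_ids is invented for this example, and the demo lists are made up apart from the example id 23611138:

```shell
#!/bin/sh
# Sketch only: missing_ids is an invented helper name.
# Usage: missing_ids ALL_IDS_FILE EXPORTED_IDS_FILE
# Prints the ids that appear in the first file but not in the second.
missing_ids() {
    all_sorted=$(mktemp)
    got_sorted=$(mktemp)
    # comm needs both inputs sorted the same way; -u also drops duplicates.
    LC_ALL=C sort -u "$1" > "$all_sorted"
    LC_ALL=C sort -u "$2" > "$got_sorted"
    # -2 hides lines unique to the second file, -3 hides lines common to
    # both, so only lines unique to the first file remain.
    LC_ALL=C comm -23 "$all_sorted" "$got_sorted"
    rm -f "$all_sorted" "$got_sorted"
}

# Tiny demo with made-up lists.
tab=$(mktemp)
marc=$(mktemp)
printf '%s\n' 23609527 23611138 70091801 > "$tab"
printf '%s\n' 70091801 23609527 > "$marc"
missing_ids "$tab" "$marc"
rm -f "$tab" "$marc"
```

With the files from the note above this would be `missing_ids /tmp/tab-no /tmp/marc-no`. The output is in byte order rather than numeric order, because that is the order comm expects its input in.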

10 bnielsen
May 23, 2012, 5:07 pm

As far as I can see, I don't get all my records exported in the marc export, and that is a bug, especially as I can't see any pattern in what gets included and what doesn't.

11 timspalding
Sep 26, 2012, 10:04 pm

Bump CC.

12 bnielsen
Jun 15, 2013, 7:32 am

Bump.

My sanity-check script now reports:

- - checking 5500 books -- 736 tagged as recycled - - 5175 in marc format - -

so I'm still missing at least 325 books. Not a big thing since it doesn't seem to affect newly added books, but still a bit annoying. I've discovered a nifty tool called sqlite that allows me to create an offline copy of my tab-export and marc-export, so I can do things like:

sqlite3 lthing.db 'select title,"marc review" from LT where "review" = "" and not "marc review" = "" and title like "%morse%" ;'
Et kors for Morse Kriminalroman, hvor kriminalkommissær Morse for uigenkaldeligt sidste gang udreder en speget mordsag; denne gang er den myrdede sygeplejersken Yvonne Harrison, som ikke var helt ukendt for Morse.

I.e. it finds a review for a book that I haven't reviewed myself.

If I ask sqlite about the missing records I get a slightly higher number but still nothing

sqlite3 lthing.db 'select "title" from LT where "marc record" = ""; ' | wc -l
366

So the problem still seems to be there. And
23611138 ## Fru Jøeks Bagebog ## Jøek, Gerda ## Gerda Jøek ## ## Kbh. 1946 i.e.: 1945 ## 1946 ## ## ## Det Kongelige Bibliotek ## Danish ## (blank) ## Danish ## ## ## ## Nov 23, 2007 ## ## ## ## ## Your library ## Baking,Recipes ## ## Fru Jøeks Bagebog by Gerda Jøek (1946) ## ## ## 1 ## UTF-8 ##
is still a good example.

13 Collectorator
Jun 15, 2013, 1:34 pm

This member has been suspended from the site.

14 bnielsen
Apr 20, 2014, 7:31 pm

Bump. The export as MARC (basic + marc) gives me 5167 records but I have over 5800 books. It would be nice if I could see all of my books.

15 ccatalfo
May 15, 2014, 12:12 pm

>14 bnielsen: OK I believe I have fixed this now: exporting your catalog, bnielsen, gives 5723 records now, including the 245 10 $a Fru Jøeks Bagebog. book.

Let me know if you still see problems.

16 bnielsen
Edited: May 16, 2014, 11:15 am

#15:
Much better than before, but still not perfect. I'm missing some 153 books, it seems.

Edited because some of those were due to the marc converter I'm using.
The list is down to 92:

22823282
36461016
39422108
39422142
39422152
39438067
39438107
39741316
40541508
40831074
41060471
45673161
46131293
46131617
46131659
46131684
46131924
46131993
56088903
57759433
58234048
64342172
64744989
66071079
70091801
74315388
78508135
79130010
79146073
79146648
79154972
79154983
79154992
79165530
79165859
79703920
79749338
79749407
83750990
84426098
85228124
85752930
86651249
86651502
88602363
90788815
90904026
90905117
91406149
91570699
91829781
91850444
92785511
92878497
96198432
99186300
99362304
100175837
100371732
100735621
100901921
101008808
101152088
101564805
102656651
102808030
103703545
103708389
104042274
104142763
105685886
105725536
105742505
106458145
106474517
106474543
106541139
106542336
106542837
106645628
106653017
106653024
106653031
106808256
106996352
106996394
107001390
107020058
107765200
107802781
108916859
108917010

Any idea why? (And I'm also curious about the bug you have most certainly fixed :-)

17 ccatalfo
May 15, 2014, 8:20 pm

>16 bnielsen: Hmm, I do not know why offhand, but I'll take a look tomorrow. Are those the relevant LT book ids you've listed?

18 bnielsen
Edited: May 16, 2014, 4:40 am

Yes, exactly. I've stuffed it into a sqlite database, so I can do things like:

sqlite3 lthing.db 'select "book id" from LT where "marc record" = ""; '

Not that it helped in finding out what these omitted books have in common, since it is
probably something inside the marc record that I never get to see :-)

19 bnielsen
May 16, 2014, 5:44 am

The "0x11 0x14 N" stuff from #1 is back. If I try to import the marc export from LT into

http://marcpm.sourceforge.net/cgi-bin/converter.cgi

it stops after 1834 records. If I remove the "0x11 0x14 N" parts, it converts 5734 records just fine.

20 bnielsen
Edited: May 16, 2014, 7:13 am

4 of the converted records contain a char(1) control character.

001 24668524
001 41062188
001 78797217
001 87707785

Field 730 in these four books contains a ^A control character.

Might be another bug. Your call :-)

Note to self:
cat l.t | tr " " "\n" | sed -n '/\x01/p' | cat -v

^A^_aBibelen.^_pNT.^^\\^_aRecycled^_aReligion^^\\^_a6^^\\^_aYour
^A^_aRegnar
^A^_aRM-CM-&vebogen.^^\\^_aFables^_aFiction^^\\^_a10^^\\^_aYour
^A^_aNeedful
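
The note above searches the converted text file. A minimal sketch of the same check on a raw record stream, assuming the export is plain MARC 21 in which each record ends with the record terminator 0x1D. The function name records_with_ctrl_a is invented for this example and the demo records are made up:

```shell
#!/bin/sh
# Sketch only: records_with_ctrl_a is an invented name.
# Reads a raw MARC stream on stdin and prints the 1-based position of
# every record that contains the control character 0x01 (^A).
records_with_ctrl_a() {
    # Replace any newline inside the data with a space, and turn each
    # record terminator (0x1D) into a newline, so one record = one line.
    LC_ALL=C tr '\n\035' ' \n' |
        # -a: treat binary data as text, -n: prefix the line number.
        LC_ALL=C grep -a -n "$(printf '\001')" |
        # Keep only the line number, i.e. the record position.
        cut -d: -f1
}

# Demo: four fake records, the 2nd and 4th contain ^A.
printf 'one\035t\001wo\035three\035fo\001ur\035' | records_with_ctrl_a
```

The position counts records in file order, so it still has to be matched to a 001 field by other means, for instance with the converter output.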

21 ccatalfo
May 16, 2014, 9:46 am

>19 bnielsen:

OK: I've added some previously missing logic to handle errors coming back from the MARC generation code, which I believe has eliminated those "N" records.

Next step is to figure out why the generation is failing on those to begin with.

But you should be able to re-export and find that those are gone: they are being omitted now rather than outputting that N stuff.
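
To illustrate the behaviour described above (skip a record when generation fails instead of writing garbage), here is a minimal sketch. It is not LibraryThing's code: make_marc, export_all and BAD_IDS are invented stand-ins, and the sketch additionally reports each skipped id on stderr so that an omission is visible.

```shell
#!/bin/sh
# Sketch only: make_marc, export_all and BAD_IDS are invented stand-ins,
# not LibraryThing's actual code.

# Pretend generator: prints a record for an id, or fails (non-zero exit
# status) for any id listed in $BAD_IDS.
make_marc() {
    case " $BAD_IDS " in
        *" $1 "*) return 1 ;;
    esac
    printf 'record-for-%s\n' "$1"
}

# Export every id given as an argument. A failed generation is skipped,
# but its id is reported on stderr so the omission is not silent.
export_all() {
    for id in "$@"; do
        if rec=$(make_marc "$id"); then
            printf '%s\n' "$rec"
        else
            printf 'skipped %s: MARC generation failed\n' "$id" >&2
        fi
    done
}

BAD_IDS="23611138"
export_all 23609527 23611138 70091801
```

Redirecting stderr to a file (`export_all ... 2> skipped.log`) would give a list of omitted ids to compare against the catalog.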

22 bnielsen
Edited: May 16, 2014, 11:59 am

Just tested. Export as marc (Basic + MARC) now gives me 5734 records and http://marcpm.sourceforge.net/cgi-bin/converter.cgi converts them to text just fine.

So the only problem is that 5734 is 92 records short of the 5826 books in my catalog.

_All_ of them are imported from Det Kongelige Bibliotek! This is an important clue, I think.

sqlite3 lthing.db 'select "book id",source from LT where "marc record" = ""; '
22823282|Det Kongelige Bibliotek
36461016|Det Kongelige Bibliotek
39422108|Det Kongelige Bibliotek
39422142|Det Kongelige Bibliotek
39422152|Det Kongelige Bibliotek
39438067|Det Kongelige Bibliotek
39438107|Det Kongelige Bibliotek
39741316|Det Kongelige Bibliotek
40541508|Det Kongelige Bibliotek
40831074|Det Kongelige Bibliotek
41060471|Det Kongelige Bibliotek
45673161|Det Kongelige Bibliotek
46131293|Det Kongelige Bibliotek
46131617|Det Kongelige Bibliotek
46131659|Det Kongelige Bibliotek
46131684|Det Kongelige Bibliotek
46131924|Det Kongelige Bibliotek
46131993|Det Kongelige Bibliotek
56088903|Det Kongelige Bibliotek
57759433|Det Kongelige Bibliotek
58234048|Det Kongelige Bibliotek
64342172|Det Kongelige Bibliotek
64744989|Det Kongelige Bibliotek
66071079|Det Kongelige Bibliotek
70091801|Det Kongelige Bibliotek
74315388|Det Kongelige Bibliotek
78508135|Det Kongelige Bibliotek
79130010|Det Kongelige Bibliotek
79146073|Det Kongelige Bibliotek
79146648|Det Kongelige Bibliotek
79154972|Det Kongelige Bibliotek
79154983|Det Kongelige Bibliotek
79154992|Det Kongelige Bibliotek
79165530|Det Kongelige Bibliotek
79165859|Det Kongelige Bibliotek
79703920|Det kongelige Bibliotek
79749338|Det Kongelige Bibliotek
79749407|Det Kongelige Bibliotek
83750990|Det Kongelige Bibliotek
84426098|Det kongelige Bibliotek
85228124|Det Kongelige Bibliotek
85752930|Det Kongelige Bibliotek
86651249|Det Kongelige Bibliotek
86651502|Det Kongelige Bibliotek
88602363|Det kongelige Bibliotek
90788815|Det Kongelige Bibliotek
90904026|Det Kongelige Bibliotek
90905117|Det Kongelige Bibliotek
91406149|Det Kongelige Bibliotek
91570699|Det Kongelige Bibliotek
91829781|Det kongelige Bibliotek
91850444|Det Kongelige Bibliotek
92785511|Det Kongelige Bibliotek
92878497|Det Kongelige Bibliotek
96198432|Det Kongelige Bibliotek
99186300|Det Kongelige Bibliotek
99362304|Det Kongelige Bibliotek
100175837|Det Kongelige Bibliotek
100371732|Det Kongelige Bibliotek
100735621|Det kongelige Bibliotek
100901921|Det Kongelige Bibliotek
101008808|Det kongelige Bibliotek
101152088|Det kongelige Bibliotek
101564805|Det Kongelige Bibliotek
102656651|Det Kongelige Bibliotek
102808030|Det Kongelige Bibliotek
103703545|Det kongelige Bibliotek
103708389|Det kongelige Bibliotek
104042274|Det Kongelige Bibliotek
104142763|Det Kongelige Bibliotek
105685886|Det Kongelige Bibliotek
105725536|Det Kongelige Bibliotek
105742505|Det Kongelige Bibliotek
106458145|Det Kongelige Bibliotek
106474517|Det Kongelige Bibliotek
106474543|Det Kongelige Bibliotek
106541139|Det Kongelige Bibliotek
106542336|Det kongelige Bibliotek
106542837|Det Kongelige Bibliotek
106645628|Det Kongelige Bibliotek
106653017|Det kongelige Bibliotek
106653024|Det kongelige Bibliotek
106653031|Det kongelige Bibliotek
106808256|Det Kongelige Bibliotek
106996352|Det Kongelige Bibliotek
106996394|Det Kongelige Bibliotek
107001390|Det Kongelige Bibliotek
107020058|Det kongelige Bibliotek
107765200|Det Kongelige Bibliotek
107802781|Det Kongelige Bibliotek
108916859|Det Kongelige Bibliotek
108917010|Det Kongelige Bibliotek

But I have lots of books imported from Det Kongelige Bibliotek, so it is not something that goes wrong for all of them:

sqlite3 lthing.db 'select "book id" from LT where source like "Det Kongelige Bibliotek"; ' | wc -l
1766
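
The source column shows up in two spellings ("Det Kongelige" and "Det kongelige"), so a per-source tally needs to fold case. A minimal sketch with awk, assuming a three-column "id|source|marc" listing similar to the output above, with an empty third column when the MARC record is missing. That layout, the name missing_by_source and the demo rows' third column are assumptions for this example:

```shell
#!/bin/sh
# Sketch only: the "id|source|marc" input layout and the name
# missing_by_source are assumptions made for this example.
# Prints "source<TAB>missing<TAB>total" per source, folding case so that
# "Det Kongelige Bibliotek" and "Det kongelige Bibliotek" count as one.
missing_by_source() {
    awk -F'|' '
        {
            src = tolower($2)          # case-insensitive grouping key
            total[src]++
            if ($3 == "") missing[src]++
        }
        END {
            for (src in total)
                printf "%s\t%d\t%d\n", src, missing[src], total[src]
        }
    ' | LC_ALL=C sort
}

# Demo rows: ids and sources are taken from this thread, the third
# column is a placeholder for "has a MARC record".
missing_by_source <<'EOF'
22823282|Det Kongelige Bibliotek|
79703920|Det kongelige Bibliotek|
23609527|Det Kongelige Bibliotek|present
64708299|Bibliotek.dk|present
EOF
```

A source where missing equals total would point at the source itself. A source with only some records missing, as here, points at something inside the individual records.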

23 ccatalfo
May 16, 2014, 12:48 pm

Ok good, progress!

I will take a look next at what is happening in the MARC generation for those records. An error is being spit out in the generator: need to figure out why and hopefully implement a fix.

24 bnielsen
May 30, 2014, 10:32 am

Just confirming that the bug is still there. I just generated a marc export and the same 92 records are missing. The good news is that all my recently added books are present in the marc export.

25 ccatalfo
Jun 2, 2014, 8:40 am

>24 bnielsen:
Yes, thanks for confirming. I haven't forgotten: I've been working on other code in the codebase and will circle back to this issue.

26 bnielsen
Sep 6, 2014, 6:21 pm

Bump. (And the bug is now well over three years old.)

A few more examples from recent additions to my catalogue brought the tally to 119, all of them from Det Kongelige Bibliotek:

22823282|Det Kongelige Bibliotek
36461016|Det Kongelige Bibliotek
39422108|Det Kongelige Bibliotek
39422142|Det Kongelige Bibliotek
39422152|Det Kongelige Bibliotek
39438067|Det Kongelige Bibliotek
39438107|Det Kongelige Bibliotek
39741316|Det Kongelige Bibliotek
40541508|Det Kongelige Bibliotek
40831074|Det Kongelige Bibliotek
41060471|Det Kongelige Bibliotek
45673161|Det Kongelige Bibliotek
46131293|Det Kongelige Bibliotek
46131617|Det Kongelige Bibliotek
46131659|Det Kongelige Bibliotek
46131684|Det Kongelige Bibliotek
46131924|Det Kongelige Bibliotek
46131993|Det Kongelige Bibliotek
56088903|Det Kongelige Bibliotek
57759433|Det Kongelige Bibliotek
58234048|Det Kongelige Bibliotek
64342172|Det Kongelige Bibliotek
64744989|Det Kongelige Bibliotek
66071079|Det Kongelige Bibliotek
70091801|Det Kongelige Bibliotek
74315388|Det Kongelige Bibliotek
78508135|Det Kongelige Bibliotek
79130010|Det Kongelige Bibliotek
79146073|Det Kongelige Bibliotek
79146648|Det Kongelige Bibliotek
79154972|Det Kongelige Bibliotek
79154983|Det Kongelige Bibliotek
79154992|Det Kongelige Bibliotek
79165530|Det Kongelige Bibliotek
79165859|Det Kongelige Bibliotek
79703920|Det kongelige Bibliotek
79749338|Det Kongelige Bibliotek
79749407|Det Kongelige Bibliotek
83750990|Det Kongelige Bibliotek
84426098|Det kongelige Bibliotek
85228124|Det Kongelige Bibliotek
85752930|Det Kongelige Bibliotek
86651249|Det Kongelige Bibliotek
86651502|Det Kongelige Bibliotek
88602363|Det kongelige Bibliotek
90788815|Det Kongelige Bibliotek
90904026|Det Kongelige Bibliotek
90905117|Det Kongelige Bibliotek
91406149|Det Kongelige Bibliotek
91570699|Det Kongelige Bibliotek
91829781|Det kongelige Bibliotek
91850444|Det Kongelige Bibliotek
92785511|Det Kongelige Bibliotek
92878497|Det Kongelige Bibliotek
96198432|Det Kongelige Bibliotek
99186300|Det Kongelige Bibliotek
99362304|Det Kongelige Bibliotek
100175837|Det Kongelige Bibliotek
100371732|Det Kongelige Bibliotek
100735621|Det kongelige Bibliotek
100901921|Det Kongelige Bibliotek
101008808|Det kongelige Bibliotek
101152088|Det kongelige Bibliotek
101564805|Det Kongelige Bibliotek
102656651|Det Kongelige Bibliotek
102808030|Det Kongelige Bibliotek
103703545|Det kongelige Bibliotek
103708389|Det kongelige Bibliotek
104042274|Det Kongelige Bibliotek
104142763|Det Kongelige Bibliotek
105685886|Det Kongelige Bibliotek
105725536|Det Kongelige Bibliotek
105742505|Det Kongelige Bibliotek
106458145|Det Kongelige Bibliotek
106474517|Det Kongelige Bibliotek
106474543|Det Kongelige Bibliotek
106541139|Det Kongelige Bibliotek
106542336|Det kongelige Bibliotek
106542837|Det Kongelige Bibliotek
106645628|Det Kongelige Bibliotek
106653017|Det kongelige Bibliotek
106653024|Det kongelige Bibliotek
106653031|Det kongelige Bibliotek
106808256|Det Kongelige Bibliotek
106996352|Det Kongelige Bibliotek
106996394|Det Kongelige Bibliotek
107001390|Det Kongelige Bibliotek
107020058|Det kongelige Bibliotek
107765200|Det Kongelige Bibliotek
107802781|Det Kongelige Bibliotek
108916859|Det Kongelige Bibliotek
108917010|Det Kongelige Bibliotek
109807491|Det Kongelige Bibliotek
109850947|Det Kongelige Bibliotek
110171706|Det Kongelige Bibliotek
110171739|Det Kongelige Bibliotek
110235177|Det Kongelige Bibliotek
110282535|Det Kongelige Bibliotek
110292834|Det Kongelige Bibliotek
110295447|Det Kongelige Bibliotek
110308830|Det Kongelige Bibliotek
110311242|Det Kongelige Bibliotek
110326560|Det Kongelige Bibliotek
110470209|Det kongelige Bibliotek
110479436|Det Kongelige Bibliotek
110481319|Det Kongelige Bibliotek
110481501|Det Kongelige Bibliotek
110858261|Det Kongelige Bibliotek
110862620|Det Kongelige Bibliotek
111012486|Det Kongelige Bibliotek
111510167|Det Kongelige Bibliotek
111510469|Det Kongelige Bibliotek
111574438|Det Kongelige Bibliotek
111793666|Det Kongelige Bibliotek
111802747|Det kongelige Bibliotek
111886334|Det kongelige Bibliotek
111996515|Det kongelige Bibliotek
112020648|Det kongelige Bibliotek
112288329|Det Kongelige Bibliotek

27 bnielsen
Oct 2, 2014, 6:29 pm

Another 4 weeks :-) The tally is now 127 books missing. 8 new missing books. And still they all come from one source:

112360422|Det Kongelige Bibliotek
112360424|Det Kongelige Bibliotek
112375192|Det Kongelige Bibliotek
112375218|Det Kongelige Bibliotek
112375572|Det kongelige Bibliotek
112457297|Det kongelige Bibliotek
112570577|Det Kongelige Bibliotek
112854360|Det kongelige Bibliotek

28 davidgn
Oct 14, 2016, 8:45 am

Bump for good measure.

29 lorax
Oct 14, 2016, 12:46 pm

Re-opening, since bumping a closed-as-fixed bug is unlikely to get attention.

30 lorannen
Oct 14, 2016, 12:49 pm

>29 lorax: Thanks! You beat me to the punch.

31 bnielsen
Oct 14, 2016, 5:19 pm

>29 lorax: Thanks. I hadn't noticed that ccatalfo had closed it.

32 davidgn
Oct 14, 2016, 8:24 pm

>29 lorax: >31 bnielsen: Nor I. Nice catch.

33 ccatalfo
Nov 10, 2016, 2:55 pm

Thanks for the subsequent reports, will return to this one.

34 ccatalfo
Edited: Nov 14, 2016, 10:23 am

All: I *think* I've got this fixed, at least mostly.

The bug appears to be one related to the raw MARC data coming back for certain records. The make MARC code was actually re-parsing it in order to make HTML output possible. I've moved that logic down into the section where it applies so that the export can complete.

This means making an HTML version of these particular records will still, for now, fail. Part of the problem appears to be the data getting too large for the MARC directory. I have not delved into exactly why that's happening yet.
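
For background on "too large for the MARC directory": in MARC 21 each directory entry stores a field's length in 4 digits and the leader stores the record length in 5 digits, so a single field cannot exceed 9999 bytes and a record cannot exceed 99999 bytes. Which of these limits the failing records hit is not stated here, so the following is only a sketch of the size checks, with invented helper names:

```shell
#!/bin/sh
# Sketch only: the helper names are invented; the limits follow from the
# fixed-width digit counts in the MARC 21 leader and directory.
MAX_FIELD_LEN=9999     # directory "length of field" is 4 digits
MAX_RECORD_LEN=99999   # leader "record length" is 5 digits

# Byte length of stdin (bytes, not characters: in UTF-8 a letter such
# as "ø" takes 2 bytes).
byte_len() {
    LC_ALL=C wc -c | tr -d ' '
}

# Succeeds if a field of the given byte length (which must already
# include indicators, subfield codes and the field terminator) can be
# described by one directory entry.
field_fits() {
    [ "$1" -le "$MAX_FIELD_LEN" ]
}

# Succeeds if a whole record of the given byte length fits the leader.
record_fits() {
    [ "$1" -le "$MAX_RECORD_LEN" ]
}

len=$(printf 'Fru Jøeks Bagebog' | byte_len)
echo "title bytes: $len"
field_fits "$len" && echo "fits in one directory entry"
field_fits 12000 || echo "12000 bytes: too long for one directory entry"
```

A long review or description stored in a single field would be one plausible way to exceed the 9999-byte field limit, but that is a guess rather than something confirmed in this thread.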

But at least the export should get everything now, which seems more urgent to me.

I tested with your books, bnielsen, and it got all 7,032 this time.

Marking "fixed" until we get examples of not working again.

35 bnielsen
Nov 15, 2016, 8:46 am

Nice!

I'll go test it soonish :-)