Treating marc records like a database?

Talk Hacking LibraryThing


1bnielsen
Edited: Jul 19, 2025, 11:32 am

I found out that TinyCat has a "MARC View" that shows the MARC record for any book. The button is simply a link to another web page, so it is easy to download the MARC record for a single book:

https://www.librarycat.org/lib/bnielsen/item/109650282#

I've been doing this for fun, so I have a bunch of these files, and they can be used as a poor man's database with some of the standard Unix tools. I've pretty-printed the table view from the web page so it looks like a standard MARC record.

001109650282
003MePoLT
00520250707180301.0
008890123s1968 dk 000 0 dan
010 $a 74354398
015 $aD68-29/30
035 $a(OCoLC)19051836
040 $aDLC $cWaU $dDLC $dMePoLT $erda
042 $apremarc
050 0 0$aPT8175.S36 $bI3 1968
100 1 $aScherfig, Hans, $d1905-
245 1 0$aIdealister.
250 $a6. opl.
260 $aKøbenhavn, $bGyldendal, $c1968.
300 $a256 p. $c19 cm.
490 0 $aGyldendals tranebøger ; $v160
920 $aFiction $aRecycled $aTranebog
921 $aDanmark, november 1938returnIndeholder ...
923 $aYour library

So I can do stuff like this:

grep '650.*xProg' *mrc | cut -f1 -d: | xargs cat | grep '920.*\$a'
920 $aProgramming $aPython $aRaspberry Pi
920 $aMicroPython $aMicrobit $aProgramming $aRecycled
920 $aAwk $aC $aComputers $aProgramming $aRecycled $aUnix
920 $aComputers
920 $aJava $aProgramming

but I wondered if any of the others here do something similar and have any advice to share?

(My example above is finding records where 650 x has something with Programming in it and then seeing if 920 a (i.e. my own tags) has something similar. So I'll be looking at the book I tagged Computers but not Programming and see if that was an error)
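For anyone who would rather do the same comparison in a script, here is a minimal Python sketch of the idea. It assumes the pretty-printed text layout shown above (three-digit tag first, subfields introduced by "$" plus a one-letter code) and that no subfield value contains a "$" itself. The function names are made up for illustration and are not part of any MARC library.

```python
import re
from collections import defaultdict

# "$" + one-letter code + value up to the next "$" (assumed layout).
SUBFIELD = re.compile(r"\$(\w)\s*([^$]*)")

def parse_line(line):
    """Split one pretty-printed line into (tag, [(code, value), ...])."""
    tag = line[:3]
    subfields = [(code, value.strip()) for code, value in SUBFIELD.findall(line[3:])]
    return tag, subfields

def parse_record(text):
    """Collect the fields of one record, keyed by tag."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) >= 3 and line[:3].isdigit():
            tag, subfields = parse_line(line)
            fields[tag].append(subfields)
    return fields

def subject_without_tag(fields, word="Programming"):
    """True if some 650 $x mentions `word` but no 920 $a is exactly `word`."""
    in_subject = any(
        code == "x" and word in value
        for subfields in fields["650"]
        for code, value in subfields
    )
    own_tags = [
        value
        for subfields in fields["920"]
        for code, value in subfields
        if code == "a"
    ]
    return in_subject and word not in own_tags
```

Calling subject_without_tag(parse_record(text)) on each file's contents would flag the record tagged only Computers in the output above, while the records that already carry the Programming tag would pass.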

Comments?

ETA: https://www.loc.gov/marc/marctools.html

ETA: https://github.com/hectorcorrea/marcli

marcli seems to be very close to what I was looking for.

2Keeline
Jul 20, 2025, 12:33 am

How complete is the LT implementation of MARC? It is my understanding that the format is complex but not always fully filled in for a given record, even for records from libraries.

I use a Mac so many of the tools you mention here and elsewhere are familiar to me from the command line.

It seems that some of this is a bit like working with a JSON source, only with different markup and different tools to use on it. It is more grep-based.

These files could get big if one has a large collection. So it would be possible to compress them and use zgrep to query them. The zcat tool is also helpful. I normally use them with .gz files. I don't know if they work with .zip files. A .gz file is usually about 1/10 the size of an uncompressed text file with ordinary characters.

James
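The zgrep idea above can also be done from a script. This is a small Python sketch using the standard gzip module. It assumes the records are stored as gzip-compressed UTF-8 text, and grep_gz is just an illustrative name. (.zip archives are a different container and are handled by Python's separate zipfile module, not by gzip.)

```python
import gzip
import re

def grep_gz(path, pattern):
    """Yield the lines of a gzip-compressed text file that match a regex."""
    rx = re.compile(pattern)
    # "rt" decompresses on the fly and decodes to text, so the file
    # never has to be unpacked on disk.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if rx.search(line):
                yield line.rstrip("\n")
```

For example, list(grep_gz("records.txt.gz", r"^920.*\$aFiction")) would return the matching 920 lines, where records.txt.gz is a hypothetical file name.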

3bnielsen
Edited: Jul 29, 2025, 6:00 am

So far my modus operandi is to download the https://www.librarycat.org/lib/bnielsen/item/## Insert Book_Id here ### files.
I've written a minimal Perl script to convert them to a readable MARC format, and a rather long ad-hoc script to repair broken character set conversions.
I then use yaz-marcdump to convert them into binary MARC format.
(I seem to run into a bug/feature/whatever with lines longer than 10000 characters. And yes, the limit seems to be 10000 and not 10200. So I just chop the rather few affected records at 10000 characters.)
I need the binary version because that's what the marcli program uses. I can then do stuff like this:

$ ./marcli_linux -match kraniebrud -fields 100 -file /tmp/m2 | tr -d "\r" | grep .
=100 1\$aSayers, Dorothy L.
=100 1\$aJohansen, Orla, $df. 1912.
=100 1\$aSimenon, Georges.
=100 1\$aUtzon, Mette Vibe.
=100 1\$astergaard, Leif.
=100 1\$aSimenon, Georges.
=100 1\$aSimenon, Georges.
=100 1\$aMeister, Knud.
=100 1\$aSimenon, Georges.
=100 1\$aSayers, Dorothy L.
=100 1\$aMarric, J. J.
=100 1\$aTurèll, Dan.
=100 1\$aNielsen, Niels E.
=100 1\$aO'Donnell, Peter.
=100 1\$aWandrei, Donald.
=100 1\$aFossum, Karin.
=100 1\$aGiménez, Carlos.
=100 1\$aPini, Wendy.
=100 1\$aSayers, Dorothy L.
=100 1\$aJaprisot, Sébastien.
=100 1\$aGeertinger, Preben.

$ ls -la /tmp/m2
-rw-rw-r-- 1 bnielsen bnielsen 32493942 jul 20 07:47 /tmp/m2

$ ./marcli_linux -format=count-only -file /tmp/m2
9873

The file with the 9873 records is about 32 MB, which is not a problem.

Caveats, so far:
marcli gives me a lot of unwanted carriage returns.
I have 9 records that have an empty "MARC View" (bug report filed).
yaz-marcdump croaks on lines longer than 10000 characters.

The benefit is that the MARC View gives some data that aren't otherwise available.
Example: Looking at 260c I noticed a series that I had overlooked.
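As a rough sketch of post-processing the marcli output shown above, the following Python function tallies one subfield (here 100 $a) and drops the stray carriage returns along the way. It only assumes the line layout visible in the output above ("=TAG", indicators, then "$"-prefixed subfields). count_subfield is an illustrative name, not something marcli provides.

```python
import re
from collections import Counter

# "=100 1\$aName" : "=", three-digit tag, whitespace, then the rest.
LINE = re.compile(r"^=(\d{3})\s+(.*)$")

def count_subfield(lines, tag="100", code="a"):
    """Count the values of one subfield in marcli-style text output."""
    counts = Counter()
    for raw in lines:
        line = raw.replace("\r", "").strip()  # drop carriage returns
        m = LINE.match(line)
        if not m or m.group(1) != tag:
            continue
        # The first piece before any "$" holds the indicators; skip it.
        for part in m.group(2).split("$")[1:]:
            if part[:1] == code:
                counts[part[1:].strip().rstrip(",")] += 1
    return counts
```

Feeding it the lines of the kraniebrud query (for instance read from sys.stdin) and calling most_common() on the result would list the authors by number of matching records.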

4bnielsen
Jul 20, 2025, 3:27 am

>2 Keeline: "How complete is the LT implementation of MARC ? It is my understanding that it is complex but not always filled in for a given record, even ones from libraries."

Good question. I've given up on the LT export as MARC, since it seems to produce weird results for my books. Line breaks in Comments seem to be passed through as line breaks in the MARC export file. (Also, there are several flavors of MARC export and I couldn't get any of them to work properly :-)

But the TinyCat MARC view seems to be quite useful.

I think there is a third place where Marc is exposed. But I haven't looked closer at that.

It was many and many years ago that I used to work with a couple of Aleph libraries and wrote custom tools for exporting and importing Marc records, so I've known about Marc records for a long, long time.

5bnielsen
Jul 29, 2025, 6:07 am

I solved the problem with long lines. It happens because I use yaz-marcdump to convert the records into binary MARC format (MARC21), where each directory entry has only four decimal digits for the field length, so 9999 is a hard limit. I've just made my script split the long lines into several lines, i.e. one "921 a" field is split into several "921 a" fields.

The resulting marc records look fine and the marcli tool also works fine with them.
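A minimal Python sketch of that kind of splitting, not the actual script: it cuts a long subfield value into pieces that each fit under the four-digit limit. It assumes UTF-8 output and a field with a single subfield, so the overhead counted against the 9999 bytes is two indicators, the subfield delimiter plus code, and the field terminator. It splits between characters, not at word boundaries, so a combining mark could end up separated from its base letter.

```python
MAX_FIELD_LEN = 9999   # largest value that fits in four decimal digits
OVERHEAD = 2 + 2 + 1   # indicators, delimiter + subfield code, field terminator

def split_value(value, max_bytes=MAX_FIELD_LEN - OVERHEAD):
    """Split `value` into chunks of at most `max_bytes` UTF-8 bytes each."""
    chunks, current, size = [], [], 0
    for ch in value:
        n = len(ch.encode("utf-8"))
        # Start a new chunk before this character would overflow the limit,
        # so a multi-byte character is never cut in half.
        if size + n > max_bytes and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned chunk would then be written as its own "921 a" field.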

6bnielsen
Aug 19, 2025, 4:35 pm

Short status:
My modus operandi is to download the https://www.librarycat.org/lib/bnielsen/item/## Insert Book_Id here ### files.
I've written a minimal perl script to convert them to readable Marc format. And a rather long ad-hoc script to repair broken character set conversions.
Nine records give ERROR: 'NoneType' object has no attribute 'get_fields'. I've written a small script to create ersatz entries for them.
I then use yaz-marcdump to convert them into a binary Marc format.

The resulting lt.marc file is 34390843 bytes long and I can do stuff like:

$ ./marcli_linux -match kraniebrud -fields 245,100 -file lt.marc | tr -d "\r" | grep . | head
=245 1\$$aBusman's honeymoon.
=100 1\$aSayers, Dorothy L.
=245 1\$$aSæsonens mord : $bkriminalroman / $caf Orla Johansen.
=100 1\$aJohansen, Orla, $df. 1912.
=245 1\$$aMaigret bliver bange.
=100 1\$aSimenon, Georges.
=245 1\$$aFørstedamer / $cMette Vibe Utzon.
=100 1\$aUtzon, Mette Vibe.
=245 1\$$aHjernen.
=100 1\$aØstergaard, Leif.

I've also written a couple of scripts to look for errors in the MARC file. For example, Østergaard was converted to stergaard, so the Ø was missing completely. Stuff like that can be fixed by an ad-hoc script, but if there were a lot of it, I'd give up on the project. These errors seem to be few, though, so I'm happy with the lt.marc file at the moment.

The idea here is that the MARC View gives some data that aren't otherwise available.
Example: Looking at 260c I noticed a series that I had overlooked.

Also using the marcli program is a bit of fun. Lesson so far is that XML is almost unusable.

7bnielsen
Aug 19, 2025, 4:56 pm

Comparing author names turns up a lot of pairs like

:Aldiss, Brian Wilson: versus :Aldiss, Brian W:

and

:Clarke, Arthur Charles: versus :Clarke, Arthur C:

and

:Niven, Laurence Van Cott: versus :Niven, Larry:
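One way such pairs could be found automatically is sketched below in Python. It groups names by surname plus the first initial of the given names, which is a guess at a useful heuristic and not necessarily how the pairs above were found. It catches all three examples, but it will also lump together different authors who happen to share a surname and an initial, so the groups still need a human look.

```python
from collections import defaultdict

def name_key(name):
    """Comparison key: (surname, first initial of given names), case-folded."""
    surname, _, given = name.partition(",")
    given = given.strip()
    initial = given[0].casefold() if given else ""
    return surname.strip().casefold(), initial

def group_variants(names):
    """Return only the keys under which more than one spelling occurs."""
    groups = defaultdict(set)
    for name in names:
        clean = name.strip().rstrip(".,")  # ignore trailing punctuation
        groups[name_key(clean)].add(clean)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}
```

The input would be the list of 100 $a values from the whole file; the result maps each key to the differing spellings found for it.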
