1bnielsen
I found out that TinyCat has a "MARC View" allowing one to see a Marc Record for any book. The button is simply a link to another web page, so it is easy to download the Marc Record for a single book
https://www.librarycat.org/lib/bnielsen/item/109650282#
I've been doing this for fun, so I have a bunch of these files, and they can be used as a poor man's database with some of the standard Unix tools. I've pretty-printed the table view from the webpage so it looks like a standard Marc Record.
001109650282
003MePoLT
00520250707180301.0
008890123s1968 dk 000 0 dan
010 $a 74354398
015 $aD68-29/30
035 $a(OCoLC)19051836
040 $aDLC $cWaU $dDLC $dMePoLT $erda
042 $apremarc
050 0 0$aPT8175.S36 $bI3 1968
100 1 $aScherfig, Hans, $d1905-
245 1 0$aIdealister.
250 $a6. opl.
260 $aKøbenhavn, $bGyldendal, $c1968.
300 $a256 p. $c19 cm.
490 0 $aGyldendals tranebøger ; $v160
920 $aFiction $aRecycled $aTranebog
921 $aDanmark, november 1938returnIndeholder ...
923 $aYour library
So I can do stuff like this:
grep '650.*xProg' *mrc | cut -f1 -d: | xargs cat | grep '920.*\$a'
920 $aProgramming $aPython $aRaspberry Pi
920 $aMicroPython $aMicrobit $aProgramming $aRecycled
920 $aAwk $aC $aComputers $aProgramming $aRecycled $aUnix
920 $aComputers
920 $aJava $aProgramming
but I wondered if any of the others here do something similar and have any advice to share?
(My example above finds records where 650 $x contains something with Programming in it and then shows what 920 $a (i.e. my own tags) holds for those records. So I'll be looking at the book I tagged Computers but not Programming to see whether that was an error.)
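The same check can be sketched in Python instead of a grep pipeline. This is only a rough sketch: it assumes the pretty-printed layout shown above (three-digit tag, optional indicators, then `$<code><value>` subfields) and that no value contains a literal `$`. The function names and the sample record are made up for illustration.

```python
import re

# Assumed layout: "$" + one-character subfield code + value up to the next "$".
SUBFIELD = re.compile(r"\$(\w)([^$]*)")

def parse_line(line):
    """Return (tag, [(code, value), ...]) for one pretty-printed field line."""
    tag = line[:3]
    subfields = [(code, value.strip()) for code, value in SUBFIELD.findall(line[3:])]
    return tag, subfields

def values(record_lines, tag, code):
    """Collect every value of subfield `code` in fields with the given tag."""
    found = []
    for line in record_lines:
        line_tag, subfields = parse_line(line)
        if line_tag == tag:
            found.extend(v for c, v in subfields if c == code)
    return found

def missing_tag(record_lines, word="Programming"):
    """True when 650 $x mentions `word` but no 920 $a tag does."""
    in_subject = any(word in v for v in values(record_lines, "650", "x"))
    in_tags = any(word in v for v in values(record_lines, "920", "a"))
    return in_subject and not in_tags

# Made-up record, for illustration only.
sample = [
    "245 1 0$aExample title.",
    "650  0$aComputers $xProgramming.",
    "920 $aComputers",
]
print(missing_tag(sample))
```

Unlike the grep version, this reports only the records where the tag is actually missing, so there is no need to eyeball the 920 lines.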
Comments?
ETA: https://www.loc.gov/marc/marctools.html
ETA: https://github.com/hectorcorrea/marcli
marcli seems to be very close to what I was looking for.
2Keeline
How complete is the LT implementation of MARC? It is my understanding that it is complex but not always filled in for a given record, even ones from libraries.
I use a Mac, so many of the tools you mention here and elsewhere are familiar to me from the command line.
It seems that some of this is a bit like working with a JSON source, just with different markup and different tools. It is more grep-based.
These files could get big if one has a large collection. So it would be possible to compress them and use zgrep to query them. The zcat tool is also helpful. I normally use them with .gz files. I don't know if they work with .zip files. A .gz file is usually about 1/10 the size of an uncompressed text file with ordinary characters.
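For anyone who would rather script this than use zgrep, here is a minimal Python sketch of searching a gzip-compressed text file line by line. The helper name `grep_gz`, the file name and the sample lines are made up for the demonstration. Python's standard `gzip` module reads .gz files; .zip archives are handled by the separate `zipfile` module.

```python
import gzip
import os
import tempfile

def grep_gz(path, needle):
    """Return the lines of a gzip-compressed text file that contain `needle`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]

# Made-up sample lines written to a temporary .gz file for the demonstration.
lines = ["920 $aFiction $aRecycled", "920 $aJava $aProgramming", "300 $a256 p."]
path = os.path.join(tempfile.mkdtemp(), "records.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    handle.write("\n".join(lines) + "\n")

print(grep_gz(path, "Programming"))
```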
James
3bnielsen
So far my modus operandi is to download the https://www.librarycat.org/lib/bnielsen/item/## Insert Book_Id here ### files.
I've written a minimal perl script to convert them to readable Marc format. And a rather long ad-hoc script to repair broken character set conversions.
I then use yaz-marcdump to convert them into binary Marc format.
(It seems I run into a bug/feature/whatever with lines longer than 10000 characters (and yes, it seems to be 10000 and not 10200), so I just chop the rather few records at 10000 characters.)
I need the binary version because that's what the marcli program uses. I can then do stuff like this:
$ ./marcli_linux -match kraniebrud -fields 100 -file /tmp/m2 | tr -d "\r" | grep .
=100 1\$aSayers, Dorothy L.
=100 1\$aJohansen, Orla, $df. 1912.
=100 1\$aSimenon, Georges.
=100 1\$aUtzon, Mette Vibe.
=100 1\$astergaard, Leif.
=100 1\$aSimenon, Georges.
=100 1\$aSimenon, Georges.
=100 1\$aMeister, Knud.
=100 1\$aSimenon, Georges.
=100 1\$aSayers, Dorothy L.
=100 1\$aMarric, J. J.
=100 1\$aTurèll, Dan.
=100 1\$aNielsen, Niels E.
=100 1\$aO'Donnell, Peter.
=100 1\$aWandrei, Donald.
=100 1\$aFossum, Karin.
=100 1\$aGiménez, Carlos.
=100 1\$aPini, Wendy.
=100 1\$aSayers, Dorothy L.
=100 1\$aJaprisot, Sébastien.
=100 1\$aGeertinger, Preben.
$ ls -la /tmp/m2
-rw-rw-r-- 1 bnielsen bnielsen 32493942 jul 20 07:47 /tmp/m2
$ ./marcli_linux -format=count-only -file /tmp/m2
9873
The file with the 9873 records is about 32 MB, which is not a problem.
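The record count can also be cross-checked without marcli. In binary MARC 21 each record starts with a leader whose first five bytes give the record length as ASCII digits, so one can hop from leader to leader. The sketch below does only that; `fake_record` is a made-up helper that builds length-prefixed dummy blobs for the demonstration, not real MARC records.

```python
def count_records(data: bytes) -> int:
    """Count records by reading each 5-digit record length and skipping ahead."""
    count = 0
    pos = 0
    while pos + 5 <= len(data):
        length = int(data[pos:pos + 5].decode("ascii"))
        if length <= 0:
            raise ValueError(f"bad record length at offset {pos}")
        pos += length
        count += 1
    return count

def fake_record(body: bytes) -> bytes:
    """Hypothetical helper: a dummy blob with a correct 5-digit length prefix."""
    length = 5 + len(body) + 1          # length digits + body + record terminator
    return f"{length:05d}".encode("ascii") + body + b"\x1d"

sample = b"".join(fake_record(body) for body in (b"abc", b"defgh", b""))
print(count_records(sample))
```

On a real file one would pass the result of `open(path, "rb").read()`; a 32 MB file fits in memory without trouble.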
Caveats, so far:
marcli gives me a lot of unwanted carriage returns.
I have 9 records that have an empty "MARC View" (bug report filed).
yaz-marcdump croaks on lines longer than 10000 characters.
The benefit is that the MARC View gives some data that aren't otherwise available.
Example: Looking at 260c I noticed a series that I had overlooked.
4bnielsen
>2 Keeline: "How complete is the LT implementation of MARC ? It is my understanding that it is complex but not always filled in for a given record, even ones from libraries."
Good question. I've given up on the LT export as Marc, since it seems to produce weird results for my books. Line breaks in Comments seem to be passed through as line breaks in the Marc export file. (Also there are several flavors of Marc Export and I couldn't get any of them to work well :-)
But the TinyCat MARC view seems to be quite useful.
I think there is a third place where Marc is exposed. But I haven't looked closer at that.
It was many and many years ago that I used to work with a couple of Aleph libraries and wrote custom tools for exporting and importing Marc records, so I've known about Marc records for a long, long time.
5bnielsen
I solved the problem with long lines. It happened because I use yaz-marcdump to convert them into binary Marc format (Marc21), where the field length has only four decimal digits, so 9999 is a hard limit. I've just made my script split the long lines into several lines, i.e. one "921 a" field is split into several "921 a" fields.
The resulting marc records look fine and the marcli tool also works fine with them.
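The splitting can be sketched roughly like this in Python. This is not my actual script: it assumes the data ends up UTF-8 encoded and that each piece becomes its own field with a single $a subfield, so the overhead per field is two indicators, the subfield delimiter and code, and the field terminator. The names `split_value`, `MAX_FIELD` and `OVERHEAD` are made up.

```python
MAX_FIELD = 9999        # largest value that fits in four decimal digits
OVERHEAD = 2 + 2 + 1    # two indicators, delimiter + subfield code, field terminator

def split_value(value: str, limit: int = MAX_FIELD - OVERHEAD):
    """Split `value` into pieces whose UTF-8 encoding is at most `limit` bytes."""
    pieces, current, size = [], [], 0
    for ch in value:
        width = len(ch.encode("utf-8"))
        # Start a new piece before a character that would exceed the limit,
        # so multi-byte characters such as "ø" are never cut in half.
        if size + width > limit and current:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += width
    if current:
        pieces.append("".join(current))
    return pieces

long_note = "ø" * 12000     # made-up value, 24000 bytes in UTF-8
pieces = split_value(long_note)
print(len(pieces), [len(p.encode("utf-8")) for p in pieces])
```

Counting bytes rather than characters matters here, because the directory length is a byte count and Danish letters take two bytes each in UTF-8.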
6bnielsen
Short status:
My modus operandi is to download the https://www.librarycat.org/lib/bnielsen/item/## Insert Book_Id here ### files.
I've written a minimal perl script to convert them to readable Marc format. And a rather long ad-hoc script to repair broken character set conversions.
Nine records give ERROR: 'NoneType' object has no attribute 'get_fields'. I've written a small script to create ersatz entries for them.
I then use yaz-marcdump to convert them into a binary Marc format.
The resulting lt.marc file is 34390843 bytes long and I can do stuff like:
$ ./marcli_linux -match kraniebrud -fields 245,100 -file lt.marc | tr -d "\r" | grep . | head
=245 1\$$aBusman's honeymoon.
=100 1\$aSayers, Dorothy L.
=245 1\$$aSæsonens mord : $bkriminalroman / $caf Orla Johansen.
=100 1\$aJohansen, Orla, $df. 1912.
=245 1\$$aMaigret bliver bange.
=100 1\$aSimenon, Georges.
=245 1\$$aFørstedamer / $cMette Vibe Utzon.
=100 1\$aUtzon, Mette Vibe.
=245 1\$$aHjernen.
=100 1\$aØstergaard, Leif.
I've also written a couple of scripts to look for errors in the marc file. For example, Østergaard was converted to stergaard, so the Ø was missing completely. Stuff like that can be fixed by an ad-hoc script, but if there were a lot of it, I'd give up on the project. These errors seem to be few, though, so I'm happy with the lt.marc file at the moment.
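One cheap way to spot the stergaard kind of error is to flag 100 $a values that start with a lowercase letter. The sketch below is only a heuristic and not the scripts I actually use: it assumes marcli-style output lines like the ones above, it will also flag names that legitimately start in lowercase, and the function names are made up.

```python
def main_entry(line):
    """Return the $a value of a marcli-style '=100' line, or None."""
    if not line.startswith("=100"):
        return None
    _, found, rest = line.partition("$a")
    if not found:
        return None
    # Keep only the text up to the next subfield delimiter.
    return rest.split("$", 1)[0].strip()

def suspicious(lines):
    """Names whose first character is lowercase (possible dropped letter)."""
    names = (main_entry(line) for line in lines)
    return [name for name in names if name and name[0].islower()]

sample = [
    r"=100 1\$aSayers, Dorothy L.",
    r"=100 1\$astergaard, Leif.",
    r"=100 1\$aJohansen, Orla, $df. 1912.",
]
print(suspicious(sample))
```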
The idea here is that the MARC View gives some data that aren't otherwise available.
Example: Looking at 260c I noticed a series that I had overlooked.
Also using the marcli program is a bit of fun. Lesson so far is that XML is almost unusable.

