LC Classification (LCCs) are better, part 1


1timspalding
Edited: Mar 16, 2017, 11:39 am

LT is in the process of improving its LCC (Library of Congress Classification) code.

First, @ChrisCatalfo and I fixed a longstanding bug with sorting. See https://www.librarything.com/topic/114262 . To sort by LCC, make sure you have the field in your display styles ( https://www.librarything.com/settings/styles ). Then click on the sort icon or, better, the blue column heading "LC Classification."

Sorting works as follows:

1. Works without LCCs at all (i.e., blank)
2. LCCs, correctly sorted
3. Things that aren't valid LCCs.
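
As a rough illustration of that three-way ordering (not LT's actual code), here is a minimal Python sketch. The names `bucket_key` and `sort_by_lcc` and the loose `LOOKS_LIKE_LCC` pattern are assumptions made up for this example, and the ordering inside the "valid" group is only a plain string sort standing in for real LCC sorting.

```python
import re

# Assumption: a very loose "looks like an LCC" test (1-3 letters, then a digit).
LOOKS_LIKE_LCC = re.compile(r"^[A-Z]{1,3}\s*\d")

def bucket_key(lcc):
    """Sort key: blank values first, then valid-looking LCCs, then everything else."""
    text = (lcc or "").strip().upper()
    if not text:
        return (0, "")      # 1. works without an LCC at all
    if LOOKS_LIKE_LCC.match(text):
        return (1, text)    # 2. LCCs (string order here is only a placeholder)
    return (2, text)        # 3. things that aren't valid LCCs

def sort_by_lcc(values):
    return sorted(values, key=bucket_key)

print(sort_by_lcc(["NOT IN LC", "PQ2662 .A6523", "", "BX 385 .A8", "TX"]))
```

Under this sketch a bare "TX" lands in the third group, since it has letters but no number.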

NOTE: If you have questions or concerns about the sorting, put them here. There may well be problems remaining; we aim to solve them.

Next, the work-level (green) LCC numbers were recalculated. The new data cover 20% more works (2.2m), and are more "generic"--correctly removing second cutters and other edition-specific data.

Next, we're going to make LCC pages much as we have DDC (MDS) pages. Ideally, we'd like to roll up the call numbers under the relatively high-level terms used on the LCC site. The trick is, LCC doesn't have the sort of simple structure that DDC has. Subjects cross numbers quite arbitrarily--in effect, the numbers map arbitrarily to a separate classification system. Example:



So we need to figure out how to get these schedules ( https://www.loc.gov/catdir/cpso/lcco/ ) in a data format, without paying 6k for the FULL schedules from LC. In theory we could ask users to help us but, man, it's a lot of work, and very fiddly. If anyone knows where we could get it in a better format, let us know.

Info about LCCs

* Basic schedules, that is, an outline of the system (classes, numbers) https://www.loc.gov/catdir/cpso/lcco/
* Wikipedia: https://en.wikipedia.org/wiki/Library_of_Congress_Classification
* Sorting them, Rutgers PDF https://www.libraries.rutgers.edu/rul/staff/access_serv/student_coord/LibConSys....
* Some of the nasty variants we had to deal with http://web.library.yale.edu/cataloging/LC-call-numbers-overview/examples-7-11 and https://library.duke.edu/about/depts/cataloging/documentation/call-numbers

See below for more technical detail.

More detail

Doing this involved writing code to parse LCCs better than we have before--to determine which parts are the letters, class, decimal expansion, first cutter, second cutter, etc. "Valid LCC" is fairly generous. For example, it doesn't stop "HD70 C2132 Q3 D278 2008," which seems to have three cutters. But it will stop obvious non-LCCs ("NOT IN LC", "ANT", etc.). To be valid, an LCC needs (1) letter(s) and (2) numbers, and (3) the numbers and decimal expansion must be integers, without zero-padding.
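
To make the parts concrete, here is a hedged Python sketch of a parser along these lines. It is not LT's parser: `ParsedLCC`, `parse_lcc`, and both regular expressions are assumptions invented for this example, and the "no zero-padding" rule is read here as "the class number may not start with 0".

```python
import re
from typing import NamedTuple, Optional

class ParsedLCC(NamedTuple):
    letters: str     # class letters, e.g. "HD"
    number: str      # class number, e.g. "70"
    decimal: str     # decimal expansion of the class number, "" if absent
    cutters: tuple   # cutters in order, without the leading dot
    rest: str        # anything left over (dates, volume notes, ...)

# Letters, a class number with no leading zero, optional decimal expansion.
HEAD = re.compile(r"^([A-Z]{1,3})\s*([1-9]\d*)(?:\.(\d+))?")
# One cutter token: optional dot, one letter, then digits.
CUTTER = re.compile(r"^\.?([A-Z]\d+)$")

def parse_lcc(text: str) -> Optional[ParsedLCC]:
    """Return the parsed parts, or None if the text has no letters+number head."""
    text = text.strip().upper()
    head = HEAD.match(text)
    if head is None:
        return None
    cutters, rest = [], []
    for token in text[head.end():].split():
        m = CUTTER.match(token)
        if m and not rest:       # cutters must come before any trailing material
            cutters.append(m.group(1))
        else:
            rest.append(token)
    return ParsedLCC(head.group(1), head.group(2), head.group(3) or "",
                     tuple(cutters), " ".join(rest))

print(parse_lcc("HD70 C2132 Q3 D278 2008"))
print(parse_lcc("NOT IN LC"))
```

Like the "fairly generous" check described above, this sketch accepts any number of cutters and only rejects strings with no letters-plus-number head.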

We aren't currently exposing the "is it valid" data. In theory we could, but it could get fiddly.

Work-level LCCs keep the first cutter. Most of the time this will be right--the author of a work doesn't change. But it can. There's no perfect way to do this, but keeping one cutter seems more likely correct.
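
A minimal sketch of that "keep only the first cutter" idea, in Python. The function name `work_level_lcc`, the pattern, and the output spacing are assumptions for illustration, not how LT actually computes the green numbers.

```python
import re

# Letters, class number (with optional decimal expansion), optional first cutter.
WORK_LEVEL = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)\s*(?:\.?([A-Z]\d+))?")

def work_level_lcc(text):
    """Drop everything after the first cutter (second cutters, dates, etc.)."""
    m = WORK_LEVEL.match(text.strip().upper())
    if m is None:
        return None
    letters, number, cutter = m.groups()
    return f"{letters}{number} .{cutter}" if cutter else f"{letters}{number}"

print(work_level_lcc("PG 3337 .L4 G413"))
print(work_level_lcc("HD70 C2132 Q3 D278 2008"))
```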

2Collectorator
Mar 16, 2017, 2:30 pm

This member has been suspended from the site.

3lorax
Edited: Mar 16, 2017, 2:44 pm

>2 Collectorator:

If they aren't in the LoC, looking in academic libraries on Worldcat is probably the best bet. Fiction tends to be harder to find this way.

Edit:

I'll add that this has prompted me to see if I can fill out the "stub" classifications I've given a few of my books where I couldn't find a full one (plopping a cookbook in TX, for instance) since now those sort all the way to the bottom as invalid.

4Collectorator
Mar 16, 2017, 2:46 pm

This member has been suspended from the site.

5jjwilson61
Mar 16, 2017, 2:57 pm

I had one book with an invalid LC Classification which was CPB Box no. 1955 vol. 14. Its source was the Library of Congress.

7lesmel
Mar 16, 2017, 3:03 pm

There's also: http://www.loc.gov/cds/classweb/classweborder.html -- again, not sure how helpful it will be.

8lesmel
Mar 16, 2017, 3:08 pm

This has the complete (I think) schedules in PDF form: http://www.loc.gov/aba/publications/FreeLCC/freelcc.html

9gilroy
Mar 16, 2017, 3:16 pm

>5 jjwilson61: I have 48 from the LoC that start with CPB Box no. Plus I have some that say Not in LC yet

10timspalding
Mar 16, 2017, 3:37 pm

I had one book with an invalid LC Classification which was CPB Box no. 1955 vol. 14. Its source was the Library of Congress.

Well, LC has it in their catalog, but CPB is not part of the LCC system.

11timspalding
Mar 16, 2017, 3:49 pm

It's tricky. Should such a book be shelved in the C's? I don't know. It seems to me it's some sort of separate collection. Including invalid LCCs would cause confusion, but not including them can also cause it.

12jjwilson61
Mar 16, 2017, 3:49 pm

LT thinks this is a valid LCC, CGC 6832-6838 (ref print), but I don't think it is. The book is Fear and Loathing in Las Vegas.

13lorax
Mar 16, 2017, 4:01 pm

>11 timspalding:

No, that's not a valid LC number. It's helpful for locating the actual physical copy housed at the Library of Congress. It really does seem to be the exact equivalent of an LT tag "Box 35, number 12". It does not belong in the Cs.

14timspalding
Mar 16, 2017, 6:03 pm

>12 jjwilson61:

Yeah, it's going to think things are valid that aren't.

Someone sent me a text file of the top-level classifications. I'll put it in and maybe check against it.

15Proclus
Edited: Mar 17, 2017, 6:09 pm

I've noticed a couple problems still. For example:
BX 385 .A82 is sorting before
BX 385 .A8
and same thing with translation cutters:
PG 3337 .L4 G413 is coming before
PG 3337 .L4 G4

This little group shows the problems with both the 1st & 2nd cutters:
http://www.librarything.com/catalog/Proclus&deepsearch=lermontov

16davidgn
Mar 17, 2017, 11:22 pm

Glad this is finally progressing. Let me know if there's gonna be a barn-raising; I'm in.

17timspalding
Mar 18, 2017, 2:20 am

>15 Proclus:

Okay, will take a look!

18timspalding
Mar 18, 2017, 3:29 am

Test case:

BX 385 .A82 19T2 2007
BX 385 .A8 A4 2007

Okay, so the rule WAS that the two cutters are squished, so it's effectively

BX 385 .A8219T2 2007
BX 385 .A8A4 2007

That seemed to me what Chris told me was the rule, but I seem to find the opposite in descriptions of sorting these.

I fixed all the BXes. The whole system will refresh in the next few hours. And I'll talk to Chris on Monday, to see whether the rule was right or wrong.
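
To show why squishing matters for this test case, here is a small Python comparison. It assumes the alternative rule is "compare each cutter on its own, with its digits read as a decimal fraction"; the names `squished_key` and `separate_key` are made up for the example, and this is not LT's code.

```python
def squished_key(parts):
    """Old behaviour as described: join the parts after the class number."""
    return "".join(parts)

def separate_key(parts):
    """Compare part by part: letter first, then digits as a decimal fraction.

    Comparing the digit strings as text gives decimal order ("8" < "82" < "9").
    """
    return tuple((p[0], p[1:]) for p in parts)

a = ["A82", "19T2"]   # BX 385 .A82 19T2 2007
b = ["A8", "A4"]      # BX 385 .A8 A4 2007

print(sorted([a, b], key=squished_key))   # .A82 lands first, since "2" < "A"
print(sorted([a, b], key=separate_key))   # .A8 lands first, since .8 < .82
```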

19timspalding
Mar 18, 2017, 11:43 am

Okay, should be good everywhere.

20Edward
Mar 25, 2017, 5:25 am

This is great – many of my LCCs that weren't sorting correctly before are now.

However, I have three LCCs sorting in the following order:

G5753.O8F7 1994
G5753.O8 2009
G5753.O8A25 2014

I think the second cutter (F7/blank/A25) should be considered before the date, and the order should be (blank/A25/F7):

G5753.O8 2009
G5753.O8A25 2014
G5753.O8F7 1994

(These are maps, and the two cutters are for place and subject respectively. The subject cutter is omitted if the map has no subject focus.)

21timspalding
Edited: Mar 26, 2017, 11:23 pm

The problem is with these call numbers:

G5753.O8F7 1994
G5753.O8A25 2014

Both have no space between the first and second cutter. I don't believe this occurs in library data, at least as LibraryThing parses it. I notice both are manually entered.

The lack of the space makes it think that the two cutters are one. It's tricky. We could try to fix the parsing to catch this, but I'm worried it will just cause something else to come unglued, like understanding (valid) monstrosities like "G6713.F7:3G6P2 1976 .L5"
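
A quick Python sketch of that trade-off, using a deliberately naive splitter (the function `naive_split_cutters` is hypothetical, not anything LT runs). It separates the fused map cutters, but it also mangles the longer example by silently dropping the ":3".

```python
import re

def naive_split_cutters(fused):
    """Split a run like "O8F7" into letter+digits pieces. Deliberately naive."""
    return re.findall(r"[A-Z]\d+", fused.upper())

print(naive_split_cutters("O8F7"))       # the fused map cutters come apart
print(naive_split_cutters("F7:3G6P2"))   # but here the ":3" is lost
```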

22Edward
Mar 27, 2017, 3:46 pm

>21 timspalding: Thanks. I've added a space between the cutters, and they're now sorting as follows:

G5753.O8 A25 2014
G5753.O8 F7 1994
G5753.O8 2009

So, the two-cutter numbers are now in the correct order, but the number without the second cutter is still out of sequence.

23timspalding
Mar 28, 2017, 10:53 am

Okay, good. Thanks for the example. It was reading the 2009 *as* the cutter. We fixed how it works. I've re-run the G5753s, and confirmed it works, but it's going to be at least 8 hours for all the others to be recalculated.
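
For illustration only, here is one way to keep a trailing year from being read as a cutter, sketched in Python. The patterns and the name `tail_key` are assumptions, not the actual fix: a token counts as a cutter only if it starts with a letter, a bare four-digit token counts as a year, and a missing second cutter sorts before any second cutter.

```python
import re

CUTTER = re.compile(r"^[A-Z]\d+$")   # letter followed by digits
YEAR = re.compile(r"^\d{4}$")        # bare four-digit token

def tail_key(tokens):
    """Sort key for the tokens that follow the first cutter."""
    cutter, year = "", ""
    for token in tokens:
        if CUTTER.match(token) and not cutter:
            cutter = token
        elif YEAR.match(token) and not year:
            year = token
    # An absent cutter gives ("", ""), which sorts ahead of every real cutter.
    return ((cutter[:1], cutter[1:]), year)

tails = [["F7", "1994"], ["2009"], ["A25", "2014"]]   # after G5753.O8
print(sorted(tails, key=tail_key))
```

With this key the three G5753.O8 examples come out in the order blank, A25, F7.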

24pauldavidrowe
May 28, 2017, 5:27 pm

I'm having a hard time getting LT to parse several call numbers that seem valid to me. I should say that I am very new to the LC system, so I apologize if it's my misunderstanding. They are all in either French literature or in

I have three works by the same author. Without the date, they all have the same call number that parses fine:

PQ2662 .A6523

However if I add a date at the end it doesn't parse:

PQ2662 .A6523 2016

I think this fits the recommendation for subarranging each author found here:

http://www.loc.gov/aba/publications/FreeLCC/PQ-text.pdf

I have also tried adding my own classification for a recent novel that doesn't have an LCC that I can find online. My attempt doesn't parse:

PQ2702 .U875

Even accounting for inaccuracies in developing the author cutter, I would expect it to parse ok.

Any thoughts?

25nate48281
May 29, 2017, 2:26 am

Is the $6K a one time purchase or an annual/subscription purchase of some sort?

26Edward
May 31, 2017, 2:25 pm

>24 pauldavidrowe: I'm also having a problem with some recently entered LCCs. I have a book classified as DA670.C83 E9 2017 (I've tried variations on spacing and punctuation), and it's being sorted right at the end of the sequence (after ZA4375).

That was a manual entry, but I tested adding a book from the Library of Congress catalogue with the very similar classification DA670.C83E23, and that's also being sorted at the end.

27Edward
May 31, 2017, 2:25 pm

>23 timspalding: Thanks, the G5753s are working fine now! I'm sorry I didn't thank you for this at the time!

28Edward
Jun 10, 2017, 6:49 pm

>26 Edward: The LCCs I mentioned above are now sorting correctly – maybe just a caching delay.

29timspalding
Jun 11, 2017, 4:05 am

Hey, sorry. I should have posted here. The first report was by email; I replied to that and forgot about this.

30davidgn
Edited: Mar 19, 2018, 2:26 am

Hmm. Are we aware of this?
https://raw.githubusercontent.com/edsu/lcco/master/lcco.rdf

via: https://pdfs.semanticscholar.org/e27f/332724835b325533afef2f8c275588dab509.pdf
pp. 162-163

That's the full outline (or first four levels) from https://www.loc.gov/catdir/cpso/lcco/ scraped and converted to SKOS.

31davidgn
Edited: Mar 19, 2018, 2:22 am

Also: you mentioned asking "users" to help in >1 timspalding:. I think there are a lot of ideas in the PDF in >30 davidgn: that would attract a lot of crowdsourced attention from across the library world if a framework could be put in place, redounding to general notoriety and benefit. (Naturally, non-compete agreements and the like would need to be considered).