1 timspalding
LT is in the process of improving its LCC (Library of Congress Classification) code.
First, @ChrisCatalfo and I fixed a longstanding bug with sorting. See https://www.librarything.com/topic/114262 . To sort by LCC, make sure you have the field in your display styles ( https://www.librarything.com/settings/styles ). Then click on the sort icon or, better, the blue column heading "LC Classification."
Sorting works as follows:
1. Works without LCCs at all (i.e., blank)
2. LCCs, correctly sorted
3. Things that aren't valid LCCs.
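A minimal Python sketch of that three-tier order, assuming a much simpler validity test than LT's real parser (the `LCC_HEAD` pattern and the `sort_key` name are hypothetical): blanks get tier 0, parseable LCCs tier 1, and everything else tier 2.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: class letters, an integer class number
# with no leading zero, an optional decimal expansion, then anything else.
LCC_HEAD = re.compile(r"^([A-Z]{1,3})\s*([1-9]\d*)(?:\.(\d+))?(.*)$")

def sort_key(call_number: Optional[str]) -> tuple:
    """Blanks first, then valid LCCs in order, then invalid values last."""
    text = (call_number or "").strip().upper()
    if not text:
        return (0,)                  # tier 1: no LCC at all
    m = LCC_HEAD.match(text)
    if not m:
        return (2, text)             # tier 3: not a valid LCC
    letters, number, decimal, rest = m.groups()
    # Tier 2: letters, then the class number numerically, then the decimal
    # expansion as a true decimal (.73 before .9), then the rest as text.
    return (1, letters, int(number), float("0." + (decimal or "0")), rest.strip())

calls = ["QA76.9 .D3", "", "NOT IN LC", "QA76.73 .P98", "BX 385 .A8"]
print(sorted(calls, key=sort_key))
# ['', 'BX 385 .A8', 'QA76.73 .P98', 'QA76.9 .D3', 'NOT IN LC']
```

Putting the tier number first in each tuple means the tiers never interleave, whatever the rest of the key looks like.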
NOTE: If you have questions or concerns about the sorting, put them here. There may well be problems remaining; we aim to solve them.
Next, the work-level (green) LCC numbers were recalculated. The new data cover 20% more works (2.2m) and are more "generic"--correctly removing second cutters and other edition-specific data.
Next, we're going to make LCC pages much as we have DDC (MDS) pages. Ideally, we'd like to roll up the call numbers under the relatively high-level terms used on the LCC site. The trick is, LCC doesn't have the sort of simple structure that DDC has. Subjects cross numbers quite arbitrarily--in effect, the numbers map arbitrarily to a separate classification system. Example:

So we need to figure out how to get these schedules ( https://www.loc.gov/catdir/cpso/lcco/ ) in a data format, without paying 6k for the FULL schedules from LC. In theory we could ask users to help us but, man, it's a lot of work, and very fiddly. If anyone knows where we could get it in a better format, let us know.
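Because LCC numbers don't nest the way DDC digits do, one way to roll them up, once the outline is available as data, is an explicit range table: look up the narrowest listed range that contains the number. A minimal sketch; the `OUTLINE` rows and `roll_up` name are hypothetical, and the ranges are illustrative and should be checked against the LC outline.

```python
from typing import List, Optional, Tuple

# Hypothetical outline rows: (letters, low, high, caption). Real rows would
# come from the LC outline; these captions are placeholders, not LC's text.
OUTLINE: List[Tuple[str, float, float, str]] = [
    ("QA", 1, 939, "caption for QA1-939"),
    ("QA", 75.5, 76.95, "caption for QA75.5-76.95"),
    ("PQ", 1, 3999, "caption for PQ1-3999"),
]

def roll_up(letters: str, number: float) -> Optional[str]:
    """Return the caption of the narrowest listed range containing number.

    Ranges are looked up explicitly rather than derived from the digits,
    since LCC subjects don't follow the number structure.
    """
    best: Optional[Tuple[float, str]] = None
    for row_letters, low, high, caption in OUTLINE:
        if row_letters == letters and low <= number <= high:
            width = high - low
            if best is None or width < best[0]:
                best = (width, caption)
    return best[1] if best else None

print(roll_up("QA", 76.73))  # caption for QA75.5-76.95
```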
Info about LCCs
* Basic schedules, that is, an outline of the system (classes, numbers) https://www.loc.gov/catdir/cpso/lcco/
* Wikipedia: https://en.wikipedia.org/wiki/Library_of_Congress_Classification
* Sorting them, Rutgers PDF https://www.libraries.rutgers.edu/rul/staff/access_serv/student_coord/LibConSys....
* Some of the nasty variants we had to deal with http://web.library.yale.edu/cataloging/LC-call-numbers-overview/examples-7-11 and https://library.duke.edu/about/depts/cataloging/documentation/call-numbers
See below for more technical detail.
More detail
Doing this involved writing code to parse LCCs better than we have before--to determine what the letters, class number, decimal expansion, first cutter, second cutter, etc. are. "Valid LCC" is fairly generous. For example, it doesn't reject "HD70 C2132 Q3 D278 2008," which seems to have three cutters. But it will reject obvious non-LCCs ("NOT IN LC", "ANT", etc.). To be valid, an LCC needs (1) letter(s) and (2) numbers, and (3) the numbers and decimal expansion must be integers, without zero-padding.
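A rough Python sketch of a similarly generous parser, assuming a regex approach (LT's actual code isn't shown; `parse_lcc` and `ParsedLCC` are hypothetical names). It rejects a zero-padded class number but only requires the decimal expansion to be digits, and it files anything after the class that isn't letter-plus-digits (dates, volumes) under `extra`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed, deliberately generous pattern: class letters, a class number with
# no zero-padding, an optional decimal expansion of digits, then the rest.
HEAD = re.compile(r"([A-Z]{1,3}) ?([1-9][0-9]*)(?:\.([0-9]+))?\s*(.*)")
CUTTER = re.compile(r"[A-Z][0-9]+")  # one letter followed by digits

@dataclass
class ParsedLCC:
    letters: str
    number: int
    decimal: Optional[str]
    cutters: List[str] = field(default_factory=list)
    extra: List[str] = field(default_factory=list)  # dates, volumes, etc.

def parse_lcc(text: str) -> Optional[ParsedLCC]:
    """Split an LCC into parts, or return None if it isn't plausibly an LCC."""
    m = HEAD.fullmatch(text.strip().upper())
    if m is None:
        return None  # e.g. "NOT IN LC", "ANT", or a zero-padded "QA076"
    letters, number, decimal, rest = m.groups()
    parsed = ParsedLCC(letters, int(number), decimal)
    for token in rest.split():
        token = token.lstrip(".")
        if CUTTER.fullmatch(token):
            parsed.cutters.append(token)
        else:
            parsed.extra.append(token)
    return parsed

print(parse_lcc("HD70 C2132 Q3 D278 2008"))
# ParsedLCC(letters='HD', number=70, decimal=None, cutters=['C2132', 'Q3', 'D278'], extra=['2008'])
```

As with the rule described above, three cutters pass without complaint; the check only throws out things that can't be an LCC at all.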
We aren't currently exposing the "is it valid" data. In theory we could, but it could get fiddly.
Work-level LCCs keep the first cutter. Most of the time this will be right--the author of a work doesn't change. But it can. There's no perfect way to do this, but keeping one cutter seems more likely to be correct.
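The "keep the first cutter" rule can be sketched as a single anchored match that stops after one cutter. This is a hypothetical helper (`work_level_lcc`), not LT's code; it treats anything after the first cutter (second cutters, dates, volumes) as edition-specific and returns non-LCC input unchanged.

```python
import re

# Hypothetical pattern: class letters, class number, optional decimal,
# then at most ONE cutter (optional period, a letter, then digits).
HEAD_AND_FIRST_CUTTER = re.compile(
    r"[A-Z]{1,3} ?[1-9][0-9]*(?:\.[0-9]+)?(?:\s*\.?[A-Z][0-9]+)?"
)

def work_level_lcc(call_number: str) -> str:
    """Keep the class and first cutter; drop second cutters, dates, etc."""
    m = HEAD_AND_FIRST_CUTTER.match(call_number.strip().upper())
    return m.group(0) if m else call_number  # non-LCCs pass through unchanged

print(work_level_lcc("BX 385 .A82 19T2 2007"))  # BX 385 .A82
print(work_level_lcc("QA76.73.P98 1999"))       # QA76.73.P98
```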
3 lorax
>2 Collectorator:
If they aren't in the LoC, looking in academic libraries on WorldCat is probably the best bet. Fiction tends to be harder to find this way.
Edit:
I'll add that this has prompted me to see if I can fill out the "stub" classifications I've given a few of my books where I couldn't find a full one (plopping a cookbook in TX, for instance) since now those sort all the way to the bottom as invalid.
5 jjwilson61
I had one book with an invalid LC Classification, which was CPB Box no. 1955 vol. 14. Its source was the Library of Congress.
6 lesmel
Not sure if this will help any: http://id.loc.gov/authorities/classification.html
Example: http://id.loc.gov/authorities/classification/B.html
7 lesmel
There's also: http://www.loc.gov/cds/classweb/classweborder.html -- again, not sure how helpful it will be.
8 lesmel
This has the complete (I think) schedules in PDF form: http://www.loc.gov/aba/publications/FreeLCC/freelcc.html
9 gilroy
>5 jjwilson61: I have 48 from the LoC that start with CPB Box no. Plus, I have some that say "Not in LC yet".
10 timspalding
I had one book with an invalid LC Classification, which was CPB Box no. 1955 vol. 14. Its source was the Library of Congress.
Well, LC has it in their catalog, but CPB is not part of the LCC system.
11 timspalding
It's tricky. Should such a book be shelved in the C's? I don't know. It seems to me it's some sort of separate collection. Including invalid LCCs would cause confusion, but not including them can also cause it.
12 jjwilson61
LT thinks this is a valid LCC, CGC 6832-6838 (ref print), but I don't think it is. The book is Fear and Loathing in Las Vegas.
13 lorax
>11 timspalding:
No, that's not a valid LC number. It's helpful for locating the actual physical copy housed at the Library of Congress. It really does seem to be the exact equivalent of an LT tag "Box 35, number 12". It does not belong in the Cs.
14 timspalding
>12 jjwilson61:
Yeah, it's going to think things are valid that aren't.
Someone sent me a text file of the top-level classifications. I'll put it in and maybe check against it.
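Checking against a list of top-level classes would catch cases like "CGC 6832-6838 (ref print)", which a pattern-only check accepts. A minimal sketch; `KNOWN_SUBCLASSES` is a hypothetical excerpt standing in for the real text file, and `has_known_subclass` is an illustrative name.

```python
import re

# Hypothetical excerpt; the real set would be loaded from the list of
# top-level classifications mentioned above.
KNOWN_SUBCLASSES = {"BX", "DA", "G", "HD", "PG", "PQ", "QA"}

LETTERS = re.compile(r"[A-Z]+")

def has_known_subclass(call_number: str) -> bool:
    """True only if the leading letters name a class on the list."""
    m = LETTERS.match(call_number.strip().upper())
    return bool(m) and m.group(0) in KNOWN_SUBCLASSES

print(has_known_subclass("CGC 6832-6838 (ref print)"))  # False
print(has_known_subclass("PQ2662 .A6523 2016"))         # True
```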
15 Proclus
I've noticed a couple problems still. For example:
BX 385 .A82 is sorting before
BX 385 .A8
and same thing with translation cutters:
PG 3337 .L4 G413 is coming before
PG 3337 .L4 G4
This little group shows the problems with both the 1st & 2nd cutters:
http://www.librarything.com/catalog/Proclus&deepsearch=lermontov
18 timspalding
Test case:
BX 385 .A82 19T2 2007
BX 385 .A8 A4 2007
Okay, so the rule WAS that the two cutters are squished, so it's effectively
BX 385 .A8219T2 2007
BX 385 .A8A4 2007
That's what I understood Chris to say the rule was, but descriptions of how to sort these seem to say the opposite.
I fixed all the BXes. The whole system will refresh in the next few hours. And I'll talk to Chris on Monday to see whether the rule was right or wrong.
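To see how the two rules diverge on the test case above, here is a small sketch comparing "squished" cutters with a cutter-by-cutter comparison. Both functions are hypothetical illustrations, not LT's code. The second compares each cutter's digits as text, which orders them like decimal fractions (.8 before .82) as long as there are no trailing zeros.

```python
from typing import List, Tuple

def squished_key(cutters: List[str]) -> str:
    """The earlier rule: glue all cutters into one string."""
    return "".join(c.lstrip(".") for c in cutters)

def separate_key(cutters: List[str]) -> List[Tuple[str, str]]:
    """Alternative: compare cutter by cutter, so the first cutter decides.

    Each cutter becomes (letter, digits); digit strings compared as text
    sort like decimals, e.g. "8" before "82".
    """
    return [(c.lstrip(".")[:1], c.lstrip(".")[1:]) for c in cutters]

a82 = [".A82", "19T2"]  # BX 385 .A82 19T2 2007
a8 = [".A8", "A4"]      # BX 385 .A8 A4 2007
print(squished_key(a82) < squished_key(a8))  # True: .A82 files first
print(separate_key(a8) < separate_key(a82))  # True: .A8 files first
```

Squished, the "2" of A82 meets the "A" of A4 and wins, because digits sort before letters; cutter by cutter, .A8 is decided against .A82 before the second cutter is ever looked at.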
19 timspalding
Okay, should be good everywhere.
20 Edward
This is great – many of my LCCs that weren't sorting correctly before are now.
However, I have three LCCs sorting in the following order:
G5753.O8F7 1994
G5753.O8 2009
G5753.O8A25 2014
I think the second cutter (F7/blank/A25) should be considered before the date, and the order should be (blank/A25/F7):
G5753.O8 2009
G5753.O8A25 2014
G5753.O8F7 1994
(These are maps, and the two cutters are for place and subject respectively. The subject cutter is omitted if the map has no subject focus.)
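Edward's expected order follows if all cutters are compared before the date, a missing second cutter compares lowest, and a four-digit token is read as a date rather than a cutter. A minimal sketch, assuming the part after the class number has already been split off and uses spaces between its parts (`tail_key` is a hypothetical name).

```python
import re
from typing import List, Tuple

CUTTER = re.compile(r"\.?([A-Z])([0-9]+)")  # optional period, letter, digits
YEAR = re.compile(r"[0-9]{4}")

def tail_key(tail: str) -> Tuple[List[Tuple[str, str]], str]:
    """Key for the part after the class number: cutters first, then date.

    A shorter cutter list that matches the start of a longer one sorts
    first, so a map with no subject cutter precedes those that have one.
    """
    cutters: List[Tuple[str, str]] = []
    year = ""
    for token in tail.split():
        m = CUTTER.fullmatch(token)
        if m:
            cutters.append((m.group(1), m.group(2)))
        elif YEAR.fullmatch(token):
            year = token  # a date, never a cutter
    return (cutters, year)

tails = [".O8 F7 1994", ".O8 2009", ".O8 A25 2014"]
print(sorted(tails, key=tail_key))
# ['.O8 2009', '.O8 A25 2014', '.O8 F7 1994']
```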
21 timspalding
The problem is with these call numbers:
G5753.O8F7 1994
G5753.O8A25 2014
Both have no space between the first and second cutter. I don't believe this occurs in library data, at least as LibraryThing parses it. I notice both are manually entered.
The lack of a space makes it think that the two cutters are one. It's tricky. We could try to fix the parsing to catch this, but I'm worried it will just make something else come unglued, such as understanding (valid) monstrosities like "G6713.F7:3G6P2 1976 .L5".
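One cautious way to catch fused cutters is to split a token only when it consists entirely of letter-plus-digits runs, and to leave anything else (such as the colon form above) alone. A sketch of that guard, with a hypothetical `split_fused_cutters` helper; it is one possible approach, not a claim that it handles every variant.

```python
import re
from typing import List

CUTTER_RUN = re.compile(r"[A-Z][0-9]+")  # one letter followed by digits

def split_fused_cutters(token: str) -> List[str]:
    """Split ".O8F7" into ["O8", "F7"]; return [] if the token contains
    anything other than back-to-back cutters, so odd forms are untouched."""
    body = token.lstrip(".")
    parts = CUTTER_RUN.findall(body)
    return parts if "".join(parts) == body else []

print(split_fused_cutters(".O8F7"))     # ['O8', 'F7']
print(split_fused_cutters("F7:3G6P2"))  # []
```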
22 Edward
>21 timspalding: Thanks. I've added a space between the cutters, and they're now sorting as follows:
G5753.O8 A25 2014
G5753.O8 F7 1994
G5753.O8 2009
So, the two-cutter numbers are now in the correct order, but the number without the second cutter is still out of sequence.
23 timspalding
Okay, good. Thanks for the example. It was reading the 2009 *as* the cutter. We fixed how it works. I've re-run the G5753s, and confirmed it works, but it's going to be at least 8 hours for all the others to be recalculated.
24 pauldavidrowe
I'm having a hard time getting LT to parse several call numbers that seem valid to me. I should say that I am very new to the LC system, so I apologize if it's my misunderstanding. They are all in either French literature or in
I have three works by the same author. Without the date, they all have the same call number that parses fine:
PQ2662 .A6523
However if I add a date at the end it doesn't parse:
PQ2662 .A6523 2016
I think this fits the recommendation for subarranging each author found here:
http://www.loc.gov/aba/publications/FreeLCC/PQ-text.pdf
I have also tried adding my own classification for a recent novel that doesn't have an LCC that I can find online. My attempt doesn't parse:
PQ2702 .U875
Even accounting for inaccuracies in developing the author cutter, I would expect it to parse ok.
Any thoughts?
26 Edward
>24 pauldavidrowe: I'm also having a problem with some recently entered LCCs. I have a book classified as DA670.C83 E9 2017 (I've tried variations on spacing and punctuation), and it's being sorted right at the end of the sequence (after ZA4375).
That was a manual entry, but I tested adding a book from the Library of Congress catalogue with the very similar classification DA670.C83E23, and that's also being sorted at the end.
27 Edward
>23 timspalding: Thanks, the G5753s are working fine now! I'm sorry I didn't thank you for this at the time!
28 Edward
>26 Edward: The LCCs I mentioned above are now sorting correctly – maybe just a caching delay.
29 timspalding
Hey, sorry. I should have posted here. The first report was by email; I replied to that and forgot about this.
30 davidgn
Hmm. Are we aware of this?
https://raw.githubusercontent.com/edsu/lcco/master/lcco.rdf
via: https://pdfs.semanticscholar.org/e27f/332724835b325533afef2f8c275588dab509.pdf
pp. 162-163
That's the full outline (or first four levels) from https://www.loc.gov/catdir/cpso/lcco/ scraped and converted to SKOS.
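If the file follows common SKOS RDF/XML conventions, Python's standard `xml.etree.ElementTree` is enough to pull out each concept's notation, label, and parent. A sketch under that assumption: the inline `SAMPLE` is made up for illustration, and the real lcco.rdf may use different properties (for example `rdf:Description` elements), so check it before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical sample in the general shape of SKOS RDF/XML.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/lcco/Q">
    <skos:notation>Q</skos:notation>
    <skos:prefLabel>Example top-level caption</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/lcco/QA">
    <skos:notation>QA</skos:notation>
    <skos:prefLabel>Example subclass caption</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/lcco/Q"/>
  </skos:Concept>
</rdf:RDF>"""

def load_concepts(xml_text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each notation to (label, URI of its broader concept or None)."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for concept in root.findall("skos:Concept", NS):
        notation = concept.findtext("skos:notation", default="", namespaces=NS)
        label = concept.findtext("skos:prefLabel", default="", namespaces=NS)
        broader = concept.find("skos:broader", NS)
        parent = broader.get(RDF_RESOURCE) if broader is not None else None
        result[notation] = (label, parent)
    return result

print(load_concepts(SAMPLE)["QA"])
```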
31 davidgn
Also: you mentioned asking "users" to help in >1 timspalding:. I think there are a lot of ideas in the PDF in >30 davidgn: that would attract a lot of crowdsourced attention from across the library world if a framework could be put in place, redounding to general notoriety and benefit. (Naturally, non-compete agreements and the like would need to be considered).