Physical description, pages, sizes and LibraryThing philosophy

1timspalding
Edited: May 4, 2010, 2:49 am

In the near future LibraryThing is going to have a number of cataloging upgrades. They have been a long time coming. Casey is doing most of the work here, but I'm doing some.

Among the first improvements will be access to what's known as the "physical description" fields (MARC field 300, see here) in library records, and to the much simpler but less detailed page-count and physical-size fields in Amazon records.

There are some critical questions of method. Today I want to address one key one: how to mash up two similar but incompatible views of book data--library and Amazon--as well as how to strike a balance between quality and requiring every LibraryThing member to get a degree in library science and data processing. In some ways, this is a tempest in a teapot. In others, this cuts to the soul of the site.

Amazon's data. Amazon data is straightforward. For page numbers there is a "pages" field that counts the Arabic-numeral pages in the book. (That is, if a book has i through xix, they aren't counted, but only the pages 1 through whatever.) It doesn't mention maps, fold-outs, illustrations, etc. Amazon sizing data is also straightforward, but split into many fields--x, y, z and a units field.

Size information can be expressed as a "string" (i.e., "8.5 x 11 x 2 inches"), but edits must be done to the "fielded" numerical version if the data is to be "used" in any way, for example for calculating how much shelf space your books require.
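As a rough illustration of why the fielded version matters, here is a minimal sketch of the shelf-space calculation. The class and function names are hypothetical, the unit labels are assumed, and it guesses that the smallest of the three numbers is the spine thickness, which Amazon's x, y, z fields may not guarantee.

```python
from dataclasses import dataclass

# Assumed unit labels; the real Amazon units field may use other values.
CM_PER_UNIT = {"cm": 1.0, "inches": 2.54}


@dataclass
class Dimensions:
    """Hypothetical fielded size record: three numbers plus a units field."""
    x: float
    y: float
    z: float
    units: str

    def thickness_cm(self) -> float:
        # Assumption: the smallest dimension is the spine thickness.
        return min(self.x, self.y, self.z) * CM_PER_UNIT[self.units]


def shelf_space_cm(books) -> float:
    """Total shelf width needed if the books stand spine-out."""
    return sum(book.thickness_cm() for book in books)


books = [Dimensions(8.5, 11, 2, "inches"), Dimensions(13, 20, 3, "cm")]
print(round(shelf_space_cm(books), 2))  # 2 in (5.08 cm) + 3 cm
```

None of this is possible with the bare string "8.5 x 11 x 2 inches" unless it is parsed first.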

Library data. The MARC "Physical Description" field (300) is much more complicated. Like much of library data it's simultaneously much richer and much less regular and machine-friendly than Amazon data. Here are some examples of the physical description field, with "English" translations. (You can see another random hundred here.)

xxii, 888 p. : ill., maps ; 25 cm.
"Book has 22 Roman-numeral pages, 888 Arabic-numeral pages, illustrations, maps and is 25 centimeters tall."

974 p. : ill., port. (on inside front cover) ; 18 cm.
"Book has 974 Arabic numeral pages, illustrations, a portrait on the inside cover and is 18 centimeters tall."

xvi, 751 p., {20} leaves of plates : maps ; 25 cm.
"Book has 16 Roman-numeral pages, 751 Arabic-numeral pages, twenty unpaginated plates, maps and is 25 centimeters tall."

1 v. (unpaged) : col. ill. ; 24 x 28 cm.
"One volume without page numbers, colored illustrations and is 24 centimeters high and 28 centimeters wide."

lx, 468 p., 1 l., 38 p. front., illus. (plans) maps (partly fold.) 16 cm.
"I am more complicated than Tim can understand."

We have slightly more to go on than these bare "strings." MARC data includes "subfields." As the link explains, the text is divided into sections for "extent," "other physical details," and so forth, such that:

11 v. ill., port. (on inside front cover); 24 cm.

is effectively

Extent: 11 v.
Other physical details: ill., port. (on inside front cover);
Dimensions: 24 cm.

That's "chunked" but still very "string-centric" and not fully parseable as data.
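A minimal sketch of that chunking, assuming only the bare string is available and that it follows the usual " : " and " ; " punctuation seen in the examples above. The function name and dictionary keys are made up for illustration. Real MARC records carry the subfields explicitly, so this guessing would not be needed there.

```python
def split_300(text: str) -> dict:
    """Split a physical-description string into three chunks by punctuation."""
    fields = {"extent": "", "other": "", "dimensions": ""}

    # Dimensions are assumed to follow the last semicolon.
    head, sep, tail = text.rpartition(";")
    if sep:
        fields["dimensions"] = tail.strip()
    else:
        head = text  # no semicolon at all: nothing to peel off

    # Other physical details are assumed to follow the first " : ".
    extent, _, other = head.partition(" : ")
    fields["extent"] = extent.strip()
    fields["other"] = other.strip()
    return fields


print(split_300("xxii, 888 p. : ill., maps ; 25 cm."))
# Older records without the punctuation fall through into "extent" whole:
print(split_300("lx, 468 p., 1 l., 38 p. front., illus. (plans) maps (partly fold.) 16 cm."))
```

The second call shows the limit: with no punctuation to go on, everything lands in one chunk, and the chunks that do come out are still strings.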

The problems: There are three basic problems.

The smaller problem is that the Amazon and library models are different.

The larger problem is that the library data isn't really "data." It reads well to a trained eye, but it can't always be parsed into cut-and-dried fields. And any step away from its current format, toward either ease of editing or the "meaning" of the words, "breaks the link" with the original data, creating inconsistencies and thwarting any attempt to revert to library data or use other library data to improve things.

How complex can editing and input be? Is it reasonable to expect members to examine and/or fill out large numbers of boxes?

Approaches:

1. Keep the data in its current form. In this scenario library data is kept as strings. It can't be analyzed properly, but at least it's not garbled by automatic analysis or to fit the limited mindset of Amazon. The key question would be whether to have a single physical-description field, or to have the subfields.

The Amazon data could be kept simple--allowing you to, for example, sort all your Amazon books by page-count. Or it could be turned into a library-ish "Description field."

2. Parse the string into data as much as possible and then discard the string. 90% of the time we can detect both Arabic-page counts and the height of the book. And when we can, we're right about 90% of the time too. So, one approach would be for LibraryThing to examine the physical description field, parse it as best as it could, and come out with the facts necessary to produce Amazon-compatible information.

The rest of the data is tricky. There's a powerful argument for trying to parse most of it--having an "illustrations" checkbox and an "illustrations comments" field. But no matter how much of this we attempt, we will lose data, and we can only do it once. We can't keep going back to re-parse the string, after members have changed the calculated data. Worse, while pages and height are pretty easy, the rest is harder. Is it worth it to pick up high-quality data and get it wrong a significant percentage of the time?

3. Parse the numbers out, but keep the strings. This is attractive--allowing members to get the benefit of both systems. The downside is that users will produce illogical edits--changing the page number field but not the physical description field. Those that want to do it right will be forced to make all changes in two places, and follow a mysterious librarian system.

4. Keep strings but calculate numbers. We can keep the library-ish strings, like "xxii, 888 p. : ill., maps ; 25 cm." but produce a "calculated" version of the data for display and sorting. So, for example, we'd keep

xxii, 888 p. : ill., maps ; 25 cm.

But the integer-only, sortable "pages" field would perform some magic to determine that a number plus a space and a "p." represents the Arabic-numeral pages.
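A minimal sketch of that "magic," assuming the rule is exactly as stated: a number, a space and "p." The function names are hypothetical, and the height rule (a single number before "cm." after the last semicolon) is an extra assumption. Both functions refuse to guess when the string is ambiguous.

```python
import re
from typing import Optional

PAGES_RE = re.compile(r"\b(\d+) p\.")
HEIGHT_RE = re.compile(r";\s*(\d+)\s*cm\.?\s*$")


def calculated_pages(description: str) -> Optional[int]:
    """Return the Arabic-numeral page count, or None if absent or ambiguous."""
    matches = PAGES_RE.findall(description)
    # More than one "NNN p." sequence means we cannot tell which one counts.
    return int(matches[0]) if len(matches) == 1 else None


def calculated_height_cm(description: str) -> Optional[int]:
    """Return the height when the string ends in a single '; NN cm.'."""
    match = HEIGHT_RE.search(description)
    return int(match.group(1)) if match else None


for text in (
    "xxii, 888 p. : ill., maps ; 25 cm.",
    "1 v. (unpaged) : col. ill. ; 24 x 28 cm.",
    "lx, 468 p., 1 l., 38 p. front., illus. (plans) maps (partly fold.) 16 cm.",
):
    print(calculated_pages(text), calculated_height_cm(text))
```

Only the first string yields both numbers. The other two return None for both, which is the price of never guessing.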

This approach has advantages but also drawbacks. For starters, how is the data edited? There's something crazy about having members have to edit library data in library-ish ways. To expect them to have an eye on how LibraryThing will parse that information is crazier still.

Editing presents another problem. Can the calculation be overridden? This is how the "summary" field currently works in LibraryThing--if you leave it blank, it's calculated from the title and author, but you can edit it. But if it can, inconsistency problems reemerge. And it requires members to understand a rather slippery concept--seemingly simple fields that are really calculations, but can also be overridden.
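The slippery concept can be sketched in a few lines. This is an illustration only, with a hypothetical `BookRecord` class, and is not how the summary field is actually implemented: the page count is calculated from the string unless a member has overridden it, at which point the two can silently disagree.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookRecord:
    """Hypothetical record: a library-ish string plus an optional override."""
    physical_description: str
    pages_override: Optional[int] = None

    def calculated_pages(self) -> Optional[int]:
        # Same rule as in the post: a number, a space and "p."
        matches = re.findall(r"\b(\d+) p\.", self.physical_description)
        return int(matches[0]) if len(matches) == 1 else None

    @property
    def pages(self) -> Optional[int]:
        # The override wins; otherwise fall back to the calculation.
        if self.pages_override is not None:
            return self.pages_override
        return self.calculated_pages()

    def is_inconsistent(self) -> bool:
        calculated = self.calculated_pages()
        return (
            self.pages_override is not None
            and calculated is not None
            and calculated != self.pages_override
        )


record = BookRecord("xxii, 888 p. : ill., maps ; 25 cm.")
print(record.pages, record.is_inconsistent())   # calculated value, consistent
record.pages_override = 900
print(record.pages, record.is_inconsistent())   # override wins, string disagrees
```

The check at the end is the whole difficulty: the system can detect the disagreement, but it cannot know which of the two the member meant.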

5. Some combination or hybrid. Your choice.

Final questions:

1. Should we move to a tiered model of record types--e.g., simple records and complex ones? Does record-type determine what edits can be made? If so, can a user change the record type--for example, from a simple Amazonish model to a library-ish model?

2. How does any of this apply to manual edits?

3. Do we care about being able to produce MARC records from edited or member-entered data? Should we aim to have our manual entries be good enough to be picked up by libraries?

4. Assuming complicated things take longer, and changing systems afterwards can be very painful, what's an acceptable trade-off between doing it right and doing it soon?

5. Should LibraryThing always aim to be better at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members? Should we aim to stay "in tune" with library data, or just pull what we can out and try to improve on the model?

2KingRat
May 4, 2010, 2:34 am

I like #4 with an editable page number field.

1. Perhaps, please explain more. No. No.

2. Yes. It's really irritating that I can't edit subject headings for my own books. I'd hate to add more fields to that category.

3. It would be nice, but you can't enforce it. I suspect that there's a core group of folks on LT that will between them produce extremely high quality data, because they are anal-retentive about getting the information on their books right. (I am one of them.) This is potentially valuable, but there are millions of books and not enough active contributors to cover everything.

4. Can o worms.

5. Yes. I'd have to think about this; perhaps the details editing page could group info into "social" data (i.e., commonly used) and cataloging data. I don't know enough about library data to make a really considered judgment, but what I've seen is that library data is all over the map in quality and adherence to "library data standards" so I'd be leaning toward the latter.

3r.orrison
Edited: May 4, 2010, 2:46 am

Answering Final Question 1 first, I really think you should only have a single record type. I tend to import recent paperbacks from Amazon, because that's often the only place I can find them, but don't want to be limited in the data I can keep because of that initial choice.

Partially-formed suggestion for the data:
On import, parse the data, whether it be from a library or Amazon, and then generate a MARC string if necessary. Store the MARC text and separate fields for the numbers. Display the MARC text and the numbers in parentheses after it.

When editing, break it out into separate fields - a long text field for the MARC record, and small appropriately labelled fields for the numbers. Depending on which data the user changed, rebuild the other fields when they click Save. If the text field is unparseable after an edit, or they've entered conflicting data, don't worry about it; just save what the user entered. (This is the hard part.)
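A minimal sketch of that Save step, limited to the page count and using a hypothetical `save` function. It assumes the same "number, space, p." rule discussed earlier in the thread, and it follows the suggestion above: rebuild the untouched side when that is unambiguous, otherwise store exactly what was entered.

```python
import re
from typing import Optional, Tuple

PAGES_RE = re.compile(r"\b(\d+) p\.")


def save(old_text: str, old_pages: Optional[int],
         new_text: str, new_pages: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return the (text, pages) pair to store after an edit."""
    text_changed = new_text != old_text
    pages_changed = new_pages != old_pages

    if text_changed and not pages_changed:
        # Only the text was edited: re-derive the number if it parses cleanly.
        matches = PAGES_RE.findall(new_text)
        if len(matches) == 1:
            new_pages = int(matches[0])
    elif pages_changed and not text_changed:
        # Only the number was edited: patch it into the text if there is
        # exactly one place it could go.
        if new_pages is not None and len(PAGES_RE.findall(new_text)) == 1:
            new_text = PAGES_RE.sub(f"{new_pages} p.", new_text, count=1)
    # Both changed (possibly conflicting) or unparseable: save as entered.
    return new_text, new_pages


original = "xxii, 888 p. : ill., maps ; 25 cm."
print(save(original, 888, original, 900))
print(save(original, 888, "xxii, 890 p. : ill. ; 25 cm.", 888))
print(save(original, 888, "1 v. (unpaged)", 40))
```

The third call is the conflicting case: the text and the number no longer agree, and both are stored untouched.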

Final questions:
2. as above
3. I don't, but others may. I think you'd have a very hard time ensuring that user data is usable by libraries, but maybe you could.
4. Do it soon, in a way that can be extended later, and then finish it later. (I make it sound so easy, don't I?) (Like you did with author roles, and author splitting, and pictures, except don't forget the "finish it later" bit. Sorry, sore points.)
5. You have to decide who your market is. If you want to be the best social book cataloging site, then you have to cater for Facebook users. If you want to be the best online book catalog, with user data, then you have to cater for libraries. If you want to be both, you won't have it easy.

Edited to add: I like KingRat's suggestion about separate social and library data sections.

4justjim
May 4, 2010, 2:48 am

Should LibraryThing always aim to be better at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members? Should we aim to stay "in tune" with library data, or just pull what we can out and try to improve on the model?

Aye, there's the rub.

//Rest of post cut to notepad while I really think about it!//

5timspalding
Edited: May 4, 2010, 2:55 am

For what it's worth, I'd love help—from users, and especially from computer and cataloging people—about the problems. The final questions are easier because they're opinions, not solutions :)

Incidentally, I'm not even getting into two further complications:

1. We now have library data for virtually all Amazon-sourced books—and will allow members to jump to it, in full or in part. This opens the issue up to a much wider audience.
2. The work level.

6felius
May 4, 2010, 3:01 am

2. Parse the string into data as much as possible and then discard the string.

We should aim to create brand new edition-level records based on whatever aggregate data we can extract. If we can extract it automatically, that's great - but otherwise we should give users a tool to allow them to enter properly fielded data themselves.

1. Should we move to a tiered model of record types--e.g., simple records and complex ones? Does record-type determine what edits can be made? If so, can a user change the record type--for example, from a simple Amazonish model to a library-ish model?

The amazon/library records should be read-only. We should have an edition-level record which works like CK - everyone can contribute to it, it has separate fields with declared data types, and it's the canonical record (within LT) for a particular edition of a work.

3. Do we care about being able to produce MARC records from edited or member-entered data? Should we aim to have our manual entries be good enough to be picked up by libraries?

Yes, and yes. I'd say that the 300 field should be generated in an appropriate format from data people enter regarding the work, not a free-form text field. i.e. it should be backwards compatible with any existing use of the field elsewhere, but we ought to be able to declare how it's constructed from our data such that it becomes machine readable to anyone wishing to use our MARC records.
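A minimal sketch of generating the 300 string from fielded data rather than storing free text. The field names and the small set of options are assumptions chosen for illustration, not a proposal for the full set, and the output format simply mirrors the examples in the first post.

```python
from dataclasses import dataclass
from typing import Optional

ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
         (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
         (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n: int) -> str:
    """Lower-case Roman numeral, as used for preliminary pages."""
    parts = []
    for value, numeral in ROMAN:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)


@dataclass
class PhysicalDescription:
    """Hypothetical fielded record with declared data types."""
    arabic_pages: int
    roman_pages: Optional[int] = None
    illustrated: bool = False
    maps: bool = False
    height_cm: Optional[int] = None


def to_300(d: PhysicalDescription) -> str:
    extent = f"{d.arabic_pages} p."
    if d.roman_pages:
        extent = f"{to_roman(d.roman_pages)}, {extent}"
    details = [label for label, on in (("ill.", d.illustrated), ("maps", d.maps)) if on]
    out = extent
    if details:
        out += " : " + ", ".join(details)
    if d.height_cm is not None:
        out += f" ; {d.height_cm} cm."
    return out


print(to_300(PhysicalDescription(888, roman_pages=22, illustrated=True, maps=True, height_cm=25)))
# xxii, 888 p. : ill., maps ; 25 cm.
```

Because the string is derived, it always agrees with the fields. The cost is that anything the fields cannot express never reaches the string.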

5. Should LibraryThing always aim to be better at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members?

Yes, and yes.

7timspalding
May 4, 2010, 3:07 am

But what's "properly fielded data?" To represent the possible options of just the 300 field in a series of checkboxes, radio buttons, string- and integer fields would produce a monstrosity—hard to work with and opaque to all.

8andyl
May 4, 2010, 5:18 am

#7

That's the real issue. Only library people (and computer programming people with an interest) will care enough to figure it out correctly anyway. A number of the fields are pretty much free text governed by convention (and not specification). For example when it comes to size you may see stuff like "200 x 350 cm., folded to 20 x 15 cm., in plastic case 25 x 20 cm." in subfield c.

As you are probably aware repetition can occur (both another 300 record and also on a subfield basis) and you can have alternate extents (the pages/vols count) in parentheses.

The UI to handle all that would be horrendously cumbersome and very confusing. However the Amazon solution isn't enough for sophisticated needs.

Some thoughts -

i) Don't throw the original away. In fact don't throw the original MARC record away full stop. One of the things I liked in LT a few years ago was being able to view the original MARC records.

ii) Parsing the numbers may be OK iff the roman numeral bit is kept separate, and is also editable.

iii) Tiered model of record types - I think this is going to be inevitable. Also I know you have mentioned 'work level' up thread but this is really 'edition level' data. If stored at edition level what happens when two trusted sources give different data - say LOC and BL. Most users would be happy with the Amazon level of data - but some will not. Switching between simple and complex may be a solution.

iv) Manual edits? A switchable UI between simple and more (but not insanely more) detailed.

v) Yes LT should be able to produce compliant MARC records - but maybe more simplified than the full 300 field spec.

So I think from that I favour (on pragmatic grounds) a parsed approach to the MARC data simplifying it into a max of 3 extents and simple dimensions fields. The data should be editable (I doubt a full parser that works 100% can be achieved - for example the dimensions of the folded map in a case) and the original imported data should be viewable.
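A minimal sketch of point ii), keeping the Roman-numeral count separate from the Arabic one. The function names are hypothetical, and only the leading "roman, NNN p." sequence of the extent is examined, so multi-sequence extents are not fully handled.

```python
import re
from typing import Optional, Tuple

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
EXTENT_RE = re.compile(r"^(?:([ivxlcdm]+), )?(\d+) p\.")


def from_roman(numeral: str) -> int:
    """Convert a well-formed Roman numeral such as 'xxii' to an integer."""
    total = 0
    largest = 0
    for ch in reversed(numeral.lower()):
        value = ROMAN_VALUES[ch]
        if value < largest:
            total -= value      # subtractive form, e.g. the i in 'ix'
        else:
            total += value
            largest = value
    return total


def split_pages(description: str) -> Optional[Tuple[Optional[int], int]]:
    """Return (roman_pages, arabic_pages), or None if the extent doesn't fit."""
    match = EXTENT_RE.match(description)
    if not match:
        return None
    roman, arabic = match.groups()
    return (from_roman(roman) if roman else None, int(arabic))


print(split_pages("xxii, 888 p. : ill., maps ; 25 cm."))
print(split_pages("974 p. : ill., port. (on inside front cover) ; 18 cm."))
print(split_pages("1 v. (unpaged) : col. ill. ; 24 x 28 cm."))
```

Each number can then sit in its own editable field, with the original string kept alongside for viewing.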

9felius
May 4, 2010, 5:43 am

>7 timspalding: Well, that comes back to your final question #4, and finding an acceptable trade-off between complete and adequate.

I say we collect three dimensions and a single page count as a minimum; optionally we can collect separate page counts for roman/arabic numerals if that's important, and have check boxes for other reasonably common options (illustrated, colour, etc).

This is also why I think the CK approach is the right one - we can start off with the bare minimum of useful fields, and add more over time as necessary - assuming people find it worthwhile to enter/convert the data.

10reading_fox
May 4, 2010, 5:54 am

A few thoughts:

Fields that you can't do anything with are fairly pointless. The very least you need to be able to do is sort by page number - hence you're going to need to extract that somehow. Maybe you can have two fields {Page number} and {extra pages}

Uneditable fields are even worse. I've long since abandoned looking at Subject. I pretend it doesn't exist.

An edition layer is going to be vital here - or will you ignore this at Work level?

I used to like seeing the full Marc record - it helps in combining.

Do it right - make sure it retroactively applies to all existing records. Don't leave a few key bits hanging off for "later" - we're still waiting for author roles to be completed.

Have I mentioned before that I'd prefer older features to work properly first before more new features that also don't work properly are added?

11pgmcc
May 4, 2010, 6:36 am

I recommend Tim's reading Systemantics by John Gall before getting too complex. (Especially the earlier editions.)

It won't provide any technical solutions, but it may support the development of a mind-set that prevents one going crazy when considering the minutiae of LT system requirements.

12MarthaJeanne
May 4, 2010, 6:50 am

Please don't make a field that can't be edited.

Quite outside of anything else, there are plenty of times when I settle for a different edition and edit it. I don't mind not having data, but having wrong data really bothers me.

13Collectorator
May 4, 2010, 6:54 am

This member has been suspended from the site.

14andyl
May 4, 2010, 7:33 am

#13

I presume it will be all of those sources that use Z39.50. We used to get MARC records from all the library sources for example the National Library Of Scotland. WorldCat isn't a LT source (and has a proprietorial interest in their data anyway) so I doubt very much we would get MARC records from them.

15Collectorator
May 4, 2010, 7:58 am

This member has been suspended from the site.

16JonathanGorman
May 4, 2010, 9:07 am

>9 felius:

The lack of three dimensions is one thing that has bugged me about current cataloging practice. It is a minor issue in some ways, but it has actually been a problem in practice in a few cases. (Moving a large number of books, trying to estimate how many boxes are needed.)

Occasionally catalogers do actually record height and width. I forget the exact AACR2 rule (my copy is sitting at home), but essentially there are three rules that boil down to something like...:

1) if the width of the cover is between half the size of the height and twice the size of the cover, don't record height. Estimate height.

2) if the width of the cover is more than twice the height of cover

3) I'm sure there's three rules, who knows what this one is.

(*This is off the top of my head, probably mostly wrong, except for the 1/2 to 2x rule*)

Just putting in the height x width x depth seems so much more straightforward once you also consider the additional machine readability and possibilities. "Ha! You got a virtual shelf browse! Well, ours you can rotate the book and see the dimensions IN SCALE!" Ok, maybe that's too much.

Then again of course there's the physical issue of how widely can you open the book/inside margins that are important for xeroxing/scanners/etc.

I think too much about minutiae though perhaps.

17countrylife
May 4, 2010, 9:17 am

Neither a programmer nor a librarian. But as a user, I would like the 3 dimensions to be in separate fields and each of them sortable.

Agree with felius/6 on everything he addressed.

18jjwilson61
May 4, 2010, 9:45 am

I'm not sure about the work-level like CK idea. I'm afraid a lot of people might want to count page numbers in their own way (number of physical pages instead of just the last roman numeral page number for example).

I think the original MARC record should be stored somewhere so that you can go back and try to get more info from it later. You shouldn't try to modify this original record due to changes the user has made. The field shouldn't be hidden from the user but there should be an extra step to get to it vs. the editable data. And if it is blank there ought to be a way to load it from a library without deleting and reentering the book.

19EveleenM
May 4, 2010, 10:02 am

Is the number of pages from Amazon the number which at the moment is imported into the Publication field? I've used amazon.co.uk for the basic entry of most of my books, which I then edit. The page number in the publication field is the piece of data which is most likely to be wrong - wrong not just some of the time, but up to nine times in ten. It seems to go back to the publisher: a few imprints and small presses are consistently right. I don't know what happens with the others; my impression is that Amazon are sent an estimate at an early stage in the production process, which never gets corrected.

Anyway, if that's the number we're talking about, I think it's essential for users to be able to correct it, whether at an individual level or at an edition-level CK. If Amazon have a better number in the system which we haven't been getting up to now, you can ignore this post.

20Larxol
May 4, 2010, 11:29 am

Are we getting an edition separation and combination function? For example, this edition and that edition of this work are the same edition, even though some of the edition-level data has been recorded differently.

21timspalding
May 4, 2010, 11:33 am

Don't throw the original away. In fact don't throw the original MARC record away full stop. One of the things I liked in LT a few years ago was being able to view the original MARC records.

No, we won't. We have it now. But when the MARC and the LT no longer share an underlying structure, they can't "speak to each other" as it were.

Edition level, work level

I shouldn't have raised the issue because it's mostly a red herring. No problems are solved at the edition level except, as Felius writes, a potential problem of getting enough eyeballs to look at your data. But the structure questions aren't solved.

From how many sources will we receive MARC records? Basically I am asking if it is LoC only or WorldCat, too?

Virtually everything found in a library will have a record. (The exception is some early records, where we had to go find the record again and for whatever reason it's gone.) Most Amazon records also have a good-guess record too. We will make it easy to move from one to the other, and between versions of a record generally.

22Heather19
May 4, 2010, 12:05 pm

Um. I'll respond, but I'll probably be in the minority (at least among fellow RSI-talkers).

I personally use Amazon data almost exclusively, despite their drawbacks, and I just edit things when data is wrong. I wouldn't use and don't care about more detailed physical descriptions and MARC record stuff. HOWEVER, I can definitely see how important and interesting this could be for the site itself, and more detail-oriented users. As long as it doesn't interfere with how I choose to catalogue, go for it! Or, um, something along those lines.

"is it reasonable to expect members to examine and/or fill out large numbers of boxes?"
No. To a certain extent, this already happens... If you want to make sure the data you pull from your Add Books source is correct, that is. But many users don't look twice at that data (even I don't, sometimes), and shouldn't be expected to put more time/energy into this... and I'm having a really hard time putting my thoughts into words.

The only response I really have to the questions is to number 5. Confusing some members happens regularly, I think, when changes are made and new features put out. However, LibraryThing should NEVER choose better data over overall member-comfort. If the majority of members would be confused by it ("it" being whatever), it shouldn't happen. LT is so unique in its "putting users first" model, I'd hate to see that change.

23_Zoe_
May 4, 2010, 12:14 pm

Should we move to a tiered model of record types--e.g., simple records and complex ones? Does record-type determine what edits can be made? If so, can a user change the record type--for example, from a simple Amazonish model to a library-ish model?

I definitely think the Details page should distinguish between personal and bibliographic data, maybe even with the two split onto separate pages.

I would go with your simple/complex distinction, but I think there would be a lot of disagreement about what counts as simple and what counts as complex. I still hate the Book Information section on the work pages.

But regardless of what you do with display, all books should have all data available somewhere.

Also, no matter what happens with all the MARC data, there should be a basic field for number of pages: one number, editable and sortable.

24stephmo
May 4, 2010, 12:25 pm

How complex can editing and input be? Is it reasonable to expect members to examine and/or fill out large numbers of boxes?

To answer the first part - infinitely so - and individuals will request all manner of complexity. How does the saying go? A camel is a horse designed by committee...I think the object here is to make sure you don't end up with a camel after tacking on all manner of endless improvements to the horse. After all, people still want the horse.

On the second part. The more manual it is, the less appealing it is and the faster the feature becomes this lovely niche thing that appeals only to hardcore data folks. I realize that there's a population on the site that seems to feel that ease of use means we attract "lesser" membership, but I personally think that's a load of hooey. Don't we take enough flak over the green plus sign not working to know that yet another feature that appears straightforward should behave that way (I see that the page numbers already show for other users, why do I have to enter them for all of mine as well?)?

I say import the Arabic numbers and throw them in and allow the field to be editable. That way, the 80% of the users that want a page count imported and don't care if it's off by i-xiv or 4 maps or the photo pages are happy and the 20% that are hardcore can edit at will - because that's what they'll end up doing anyway when it's discovered that no one parsing method makes everyone happy.

And I'm pulling 80/20 out of my butt - mostly because everyone loves to use the 80/20 rule when doing Pareto charts.

25PhaedraB
May 4, 2010, 12:29 pm

12>Please don't make a field that can't be edited.

Quite outside of anything else, there are plenty of times when I settle for a different edition and edit it. I don't mind not having data, but having wrong data really bothers me.


To which I say amen. I am much more likely to choose a "pretty close" edition that I then edit, than I am to do a manual entry. It would make me nuts to have things such as dimensions or page counts that were not editable.

>22 Heather19: I personally use Amazon data almost exclusively, despite their drawbacks, and I just edit things when data is wrong. I wouldn't use and don't care about more detailed physical descriptions and MARC record stuff. HOWEVER, I can definitely see how important and interesting this could be for the site itself, and more detail-oriented users. As long as it doesn't interfere with how I choose to catalogue, go for it! Or, um, something along those lines.

What she said.

I'm not terribly interested in dimensions, though I do note (in comments) if a book is oversized or smaller than a mass market paperback. I'd be happy to move that (also my notes "illus. b/w" or "color" or "maps") out of comments and into a field. But please, please let me edit imported records. (See "make me nuts" above.)

Page counts I put in "Publication." I could move that, but don't give me uneditable. (See "make me nuts" above.)

I could darn near clear my comments field completely if we also had fields or tick boxes for (square bracket)contains(square bracket) "bibliography", "endnotes", "footnotes", "glossary", "indexed", "DJ" (dustjacket), and "marginalia". Which pretty much fills my stable with ponies, so I'll stop.

26Katya0133
May 4, 2010, 1:19 pm

I think I'm generally in favor of approach number 3, because it preserves the original strings but also allows for the data to be parsed. Although, I'm not exactly sure what you'd want to do with the original strings, but I'm reluctant to throw anything away. :)

Answers to questions:

1. The problem I see here is that it's not just a matter of simple vs. complex. I'm all in favor of allowing users to enrich their data (that is, arguably, the whole point of this site), but the rules for entering library data are often counterintuitive and only make sense in a library catalog or, worse, only make sense if you're familiar with the history of library catalogs.

2. I don't know.

3. Again, I don't know. I have actually used LT data to enrich a library catalog (by using it as a source for volume numbers in a publisher's series), but it's a big jump between adding CK-type series data to a record and understanding the interaction between 440, 490, and 8xx MARC fields.

4. I don't know.

5. I think that LT should be the best at cataloging, because the site seems to attract the members who care the most about the data.

At the same time, a "milk before meat" UI approach to new users might do a better job of not overwhelming them or scaring them away. (Maybe you could set up the features like a video game, where you have to pass certain levels--by entering data--in order to "Unlock" additional features? ;) )

For now, I'd say we should focus on pulling out what we can from library records without throwing anything away. (The idea of staying "in tune" is awfully tricky right now, anyway, with the impending approach of RDA.)

27lorax
May 4, 2010, 2:02 pm

I'll look at this in more detail later, after reading the responses, but my immediate response is, to final question 5:

5. Should LibraryThing always aim to be better at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members?

Yes, yes, yes, a thousand times yes. This is your core strength.

Should we aim to stay "in tune" with library data, or just pull what we can out and try to improve on the model?

I think that's a separate question, and one I'm not qualified to answer. Improvement is good. Oversimplification is bad.

28Collectorator
May 4, 2010, 2:09 pm

This member has been suspended from the site.

29lorax
May 4, 2010, 2:11 pm

This is also why I think the CK approach is the right one - we can start off with the bare minimum of useful fields, and add more over time as necessary - assuming people find it worthwhile to enter/convert the data.

Except CK is work-level, and page counts and so forth are book-level.

Please don't make a field that can't be edited.

Quite outside of anything else, there are plenty of times when I settle for a different edition and edit it. I don't mind not having data, but having wrong data really bothers me.


Yes. This.

30_Zoe_
May 4, 2010, 2:20 pm

Should LibraryThing always aim to be better at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members?

I think there's another issue to consider too: isn't LT already the best by far in terms of cataloguing? Does GR or Shelfari have even the faintest hope of catching up? And if not, wouldn't it make more sense to focus on the areas where you're comparatively weaker (i.e., pay attention to things like ease of use), so that you can compete more for the "average" person looking for a social book site? The hard-core catalogers have already been won over, so going even more into the hard-core cataloguing at the expense of confusing more casual users seems like a bad trade-off.

31timspalding
Edited: May 4, 2010, 2:22 pm

Thanks for piping up Katya. (I specifically asked her to.)

with the impending approach of RDA

That brings up another thing--should LT be working in an RDA-compatible space instead?

The hard-core catalogers have already been won over, so going even more into the hard-core cataloguing at the expense of confusing more casual users seems like a bad trade-off.

Right. In general, though, I don't think we should hang up our hats. To take an example, both GR and Shelfari have page counts. They use Amazon's and that's that. You can't edit them, but they're there. So, we should at LEAST have them. The question is how we approach that issue.

32lorax
May 4, 2010, 2:31 pm

General thoughts:

You cannot rely exclusively on Amazon data -- whatever approach you use must include library data, even if all you do is keep the raw strings.

I think options 3 and 4 (and I'm not clear on the difference between the two) would be acceptable. Keeping the raw MARC string for libraries, and having the Amazon-style "page number" field available for users to fill in (so, not trying to parse the data, but allowing users to do so) would also be an alternative.

"Final questions:"


1. Should we move to a tiered model of record types--eg., simple records and complex ones. Does record-type determine what edits can be made?


Can you expand a little bit on what you mean here?

If so, can a user change the record type--for example, from a simple Amazonish model to a library-ish model?

Depends on whether there's a shared "edition" level or not. If it's purely individual, let users do whatever they want. If it's shared, though, despite having said I don't want un-editable fields in my catalog, I think that this is not a good idea. If users care about data quality enough to want library-ish data, they're unlikely to be sourcing the books from Amazon anyway, except in rare cases. This only opens a can of worms of clueless Amazon-only users corrupting the fields with bad data. (If we get the "change record source" thing, where those of us who added books from Amazon before we knew any better can fix our old shame, there would be basically no need for this.)

2. How does any of this apply to manual edits?

Edits, or additions? I think fields should be editable though probably not type-changeable, but users who do manual addition should be able to put in the fully-complex data if they want to.

3. Do we care about being able to produce MARC records from edited or member-entered data? Should we aim to have our manual entries be good enough to be picked up by libraries?


A nice goal to strive for, but unlikely to ever happen in a world where "All books by author" is an acceptable record.

4. Assuming complicated things take longer, and changing systems afterwards can be very painful, what's an acceptable trade-off between doing it right and doing it soon?

Get it right. We've waited this long, we can wait a little longer.

5. Should LibraryThing always aim to be the best at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members?

I've already answered this, but YES. Data quality is what distinguishes LT from the masses. If users don't understand the "Physical description" field, they can just ignore it. Increased cataloging ability doesn't interfere with anyone's usage of the site. Getting passed by someone else with better cataloging risks losing your power users.

33lorax
May 4, 2010, 2:33 pm

The hard-core catalogers have already been won over

For the time being. If LT ceases to be the best at cataloging, a lot of us would leave for greener pastures. (Neither GR nor Shelfari is ever likely to be what beats LT here, of course.)

34_Zoe_
May 4, 2010, 2:38 pm

>33 lorax: I guess I'm just less concerned about this hypothetical better cataloguing site that may come into existence at some point in the future than I am about the real competition LT is facing--and in many respects, losing to--right now.

35Steven_VI
May 4, 2010, 2:38 pm

Old books cataloger and ditto bibliographer here (yes, those are different), speaking from experience with three different ways to fill out this field - albeit not in MARC. I'll give you an example.

In our catalog, books are either 'old books' (pre-1800) or 'modern books'. The records have a different lay-out, the old books records being much more elaborate. But not all pre-1800 books have the 'old books' lay-out, since this was introduced only about 10 years ago and some of these books have been in the catalog since the 1860's. Upgrading about 35.000 books takes quite a while. The main problem is that if you see a record which has "38 p." you don't know if it really does only have 38 pages, or if it should be upgraded to "2, xxij, 38, 2, 2 blank p."

Of course you could argue that very few users worry about this, which is very true. The users that *are* confused are the ones that really need the extra pagination information, and they still can't get it with certainty from the catalog. This is a problem that will very slowly go away while we are upgrading the current "modern books" records for old books. But this is a controlled environment, whereas LT is uncontrolled (which is its great strength, I feel).

A problem which Tim failed to mention is that MARC only tells you what field you're entering, not how to enter it. If all libraries use the same cataloguing system (like AACR2) this shouldn't be a problem; but that isn't the case. For example, the abbreviations are usually translated (maps in Dutch is krtn., folding is uitsl.); some libraries note blank pages (as in my example above) but most don't; some include tables next to illustrations, others don't. This will make parsing rather difficult.

Another note about dimensions (can you tell that I'm on my hobby horse?). Apart from the fact that in order to be useful these must be in the same measurement (cm, mm or inch?) the dimensions of a book depend on the copy, not the edition. True, in most cases the original edition size will be kept. But library books can be (re)bound or cut. This is especially true for old books, but also for many older French books ("broché", in paper wrappers) which are/were usually bound by the library or the owner. So the dimensions of your copy of the book may very well differ from those of the copy in the catalog you're importing from.

Concluding, I would say: keep it simple, and clear for everybody.

36dchaikin
May 4, 2010, 3:03 pm

Should LibraryThing always aim to be the best at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members?

The more I think about this the more I think that the answer is a resounding yes - limited by your resources, which should be social-aspect heavy, I imagine.

Social users won't care about the physical descriptions and will allow you to do what you want. They will be OK being confused.

Serious catalogers will want all the best cataloging and won't be confused.

Left in the middle are casual catalogers (like me) who will be confused and will have to put in some effort to learn the system (and pay for our pre-upgrade mistakes). I think (but I'm not sure) these users would be interested in learning a new system.

OTHERWISE On a personal level, I want to know:
1. That the LT record will be the same regardless of the source format.
2. How would this be applied to existing records?
3. How much freedom will users have to manually edit these records?

37lorax
May 4, 2010, 3:38 pm

34>

Whereas I was answering Tim's actual question, which was "Should we remain the best at cataloging?" A "no" answer would mean "No, let someone else pass us by."

LT's only "losing" to GoodReads if you don't care about data quality. Otherwise it's like saying they're "losing" to Facebook; sure, the other may have more members, but they aren't even trying to do the same thing.

38FicusFan
May 4, 2010, 3:43 pm

I guess I would like to think that you could do both, be the best Cataloging site, and do it in such a way that you don't confuse people.

Perhaps simple, auto-filled (but editable) boxes that you parse. Save the string and then expanded fields that you can drill down to, if you are interested in the more complex data (auto-filled the best you can, but again editable).

I am not really interested in a lot of these details, so I wouldn't want them to disrupt my quick and easy import. At the same time, I know others are interested, so they should be allowed to do what they feel is important.

To echo some of the others:
I think whatever you do should be editable, sortable and it should be fully implemented, not 'some for later'.

I think in disabling the green plus sign you have already opted for data over users.

If you are going to make changes to the catalog data structure, then the display needs to be upgraded as well, otherwise you are just robbing Peter to pay Paul.

I would like to see editions implemented.

39DaynaRT
May 4, 2010, 3:45 pm

What Zoe said needs repeating:

Also, no matter what happens with all the MARC data, there should be a basic field for number of pages: one number, editable and sortable.

40_Zoe_
May 4, 2010, 3:50 pm

>37 lorax: I was answering Tim's question too: "Should we aim to be better even at the expense of confusing some members?"

but they aren't even trying to do the same thing

Fortunately I don't think Tim is foolhardy enough to believe that LT is as different from GR as it is from Facebook. Plenty of people use both LT and Facebook, but when it comes to LT vs. GR there seems to be a pretty strong tendency to choose just one.

41jrochkind
May 4, 2010, 3:50 pm

Here's what I would do.

Yes, parse the numbers out of the MARC string you have.

Keep the original MARC string around, but do NOT let users edit it. Let the users edit _structured_ boxes of the elements you actually want. Pages: Height: Whatever else you want:

These are pre-populated with what was parsed out of the MARC string. But the first time a user actually edits these values, the MARC string is thrown out (or kept for historical purposes, but never used for anything ever again).

Until a user does edit it, you've got the original string around, and if you improve your parsing algorithms, then you'll improve what you parse out of the marc string you've kept around. But as soon as someone edits it, the human edit is probably more reliable than your heuristics trying to parse out of the MARC string, that's now better data you should keep.

If you're actually letting users edit, there is no good reason to have them edit in hard-to-parse MARC/AACR2-ish format. Have them actually enter the individual elements in individual reliable boxes.

If you have the actual elements in individual data structures internally, you CAN still create MARC exports that will be good enough, you can apply AACR2 rules to those individual elements to produce MARC exports. I'm not sure if you should care about this or not, so I'm not sure you should care if it won't be perfect, but it'll be pretty good.

There is no reason to stick to ridiculous MARC data when you can do better.

--jrochkind, a library coder
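
A minimal Python sketch of the scheme described in this post, not anything LibraryThing actually runs. The class name, field names and the two regular expressions are all assumptions made for illustration, and the patterns only handle the simplest "468 p. ... 24 cm." shape of a 300 field.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately naive patterns (assumptions); real 300 fields are far messier.
PAGES_RE = re.compile(r"(\d+)\s*p\b")
HEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*cm\b")


@dataclass
class PhysicalDescription:
    marc_300: Optional[str]            # original string, kept but never edited
    pages: Optional[int] = None
    height_cm: Optional[float] = None
    user_edited: bool = False

    def reparse(self) -> None:
        """Refresh the parsed values, unless a human has already edited them."""
        if self.user_edited or not self.marc_300:
            return
        pages = PAGES_RE.search(self.marc_300)
        height = HEIGHT_RE.search(self.marc_300)
        self.pages = int(pages.group(1)) if pages else None
        self.height_cm = float(height.group(1)) if height else None

    def edit(self, pages: Optional[int] = None,
             height_cm: Optional[float] = None) -> None:
        """Record a human edit; from now on the MARC string is ignored."""
        if pages is not None:
            self.pages = pages
        if height_cm is not None:
            self.height_cm = height_cm
        self.user_edited = True


book = PhysicalDescription("xii, 468 p. : ill. ; 24 cm.")
book.reparse()          # pages and height_cm are filled from the string
book.edit(pages=480)    # the owner corrects the page count
book.reparse()          # no effect: the human edit wins
```

Because the original string is kept until the first edit, a better parser can be re-run over every untouched record later.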

42_Zoe_
May 4, 2010, 4:12 pm

For reference, here's a poll about whether people use more than one book cataloguing site. So far, no.

43timspalding
May 4, 2010, 4:19 pm

For what it's worth, I'd love to keep the topic somewhat focused. I'm almost sorry to have raised the larger issue. I need to decide the smaller issue--how to handle physical description. The slightly larger issue is how to handle the REST of the improvements we can glean from library data--which are considerable, but involve still other factors I didn't want to get into. The general question of whether LT should be more catalog-y or social or whatever isn't going to help me decide this question. And, generally, we are committed to doing both.

Also, no matter what happens with all the MARC data, there should be a basic field for number of pages: one number, editable and sortable.

Should that number include numbered pages only? That's the usual convention, but with library data we also often know un-numbered pages.

44lorax
May 4, 2010, 4:26 pm

43>

I think that by default that number should include only numbered pages; users can always edit it to add in non-numbered pages, but it's less likely to be confusing if they see "436 p." on a book where, when they flip to the last-numbered page, the number they see is "436".

38>

It's not about "data vs. users". I'm a user. Data is extremely important to me. It's about "users who care about data" and "users who couldn't care less" and "users who are actively hostile to data". The latter group would be the only ones who would be turned off by cataloging improvements -- users who don't care wouldn't use them, but wouldn't be driven away, either.

45DaynaRT
May 4, 2010, 4:27 pm

>43 timspalding:
Why does the page number field need to be autopopulated? No one will ever agree on how it should be calculated which will lead to the feature never being implemented.

All I've wanted, for years, is a blank, editable, sortable field so that I can stop shoving my page numbers into the BCID field.

46_Zoe_
May 4, 2010, 4:28 pm

>43 timspalding: I'd stick with the usual convention.

47brightcopy
May 4, 2010, 4:49 pm

45> I can give you an example, though I don't think it should drive the conversation as it's something I can do without. It'd be fun for me to add up all the pagecount of the books I've read in the last year. Or look at the longest, the shortest, etc. In such a situation, I don't actually care that the pagecounts are exactly correct on all my data. So if it auto-populated, that'd be handy for me. But I sure as hell am not going to go and hand-fill in the page counts for all the books in my library.

But yeah, if it's a stopping point, I think the very first thing to do is add the field and leave it blank. It'd be a shame if the more complicated stuff delayed simply having the field for another "two weeks."

48AnnaClaire
May 4, 2010, 4:52 pm

I haven't read every single post thoroughly, and other people will have added to this in the time it takes to type it, but I go home soon and once I leave the office I won't necessarily remember to come back.

I think approach 3 would be the best. The downside of approach 2 -- the risk of getting things weirdly wrong and not being able to fix the code and try again -- is a little too unpleasant. Same thing with approach 4 -- the good thing about LT is the ability to adjust things, and it looks like we'd lose that.

As for the questions:
1: I like the idea of "tiered" but am not sure how that would apply in terms of complexity. But I do think tiers could be useful, for things like having an edition tier between the work and book levels. That said, I don't claim to be a programmer and have no idea how it could be implemented.

2: Like I said, I don't know how tiers would be implemented, but perhaps it could suggest an edition in much the same way that autocombine works now: If your copy shares, say, an ISBN and publisher and format with other editions of the same work, the system could somehow suggest that your copy is the same edition.

3: I don't much care either way about producing MARC records from my data. I am not a library.

4: Considering how long collections took, I think I can wait -- as long as I see some progress, and you don't let it get in the way of fixing stuff. I'd say spend a little time now mapping out the big picture (to get things right), then give us bits of this as they get finished (letting us see progress).

5: I think LT already is the better cataloging site, compared to what I've seen. The challenge is making it better still without making it worse/confusing.
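
One way the edition suggestion in point 2 could work, sketched in Python. The dictionary keys (`id`, `isbn`, `publisher`, `format`) and the function name are hypothetical, and this is not a description of how autocombine works; it only groups copies whose three fields match exactly.

```python
from collections import defaultdict
from typing import Dict, List


def suggest_editions(books: List[Dict]) -> List[List[int]]:
    """Group book ids that share ISBN, publisher and format."""
    groups = defaultdict(list)
    for book in books:
        key = (book.get("isbn"), book.get("publisher"), book.get("format"))
        if None in key:
            continue  # incomplete data: make no suggestion for this copy
        groups[key].append(book["id"])
    # Only groups with more than one copy are worth suggesting.
    return [ids for ids in groups.values() if len(ids) > 1]


copies = [
    {"id": 1, "isbn": "0000000001", "publisher": "Example Press", "format": "Paperback"},
    {"id": 2, "isbn": "0000000001", "publisher": "Example Press", "format": "Paperback"},
    {"id": 3, "isbn": "0000000001", "publisher": "Example Press", "format": "Hardcover"},
    {"id": 4, "isbn": None, "publisher": "Example Press", "format": "Paperback"},
]
print(suggest_editions(copies))
```

The output is only a suggestion; the owner of each copy would still confirm or reject it.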

49_Zoe_
May 4, 2010, 4:56 pm

>47 brightcopy: I'd enjoy something like that too, though I'm sure it will never be implemented in a way I'd use (ideally it could be based on the reading dates, but I have a feeling we'd be forced to create yet more extraneous collections).

The problem with creating the field and leaving it blank is that chances are it would be years before they got around to auto-filling it, if ever. I think it has to be now or never.

50Helcura
May 4, 2010, 5:30 pm

I'd consider myself to be in the middle of the cataloger spectrum and I need to do a little more thinking on the topic, but I do think that at minimum:

1. User editable, sortable, sum-able fields for page count, height, length and width.
2. Keep the original MARC record available.
3. Parse the basic data as much as reasonable
4. Something for extras - illustrations, maps, etc. but I need to think more about what format, and how fine a discrimination level.

51lorax
May 4, 2010, 5:38 pm

50>

I hadn't thought of it before, but summable width -- aka "how many shelf-feet of bookcases do I need?" would be very nice, wouldn't it?
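
A rough Python sketch of the shelf-feet calculation, assuming each book carries a spine thickness in centimetres (a hypothetical field) and that some books will have no value at all. The function name is made up for illustration.

```python
from typing import Iterable, Optional, Tuple

CM_PER_FOOT = 30.48


def shelf_feet(thicknesses_cm: Iterable[Optional[float]]) -> Tuple[float, int]:
    """Return (total shelf feet, number of books with no thickness recorded)."""
    values = list(thicknesses_cm)
    known = [t for t in values if t is not None]
    missing = len(values) - len(known)
    return sum(known) / CM_PER_FOOT, missing


feet, unknown = shelf_feet([3.0, 2.5, None, 25.0])
print(f"{feet:.2f} shelf-feet, plus {unknown} book(s) of unknown thickness")
```

Reporting the count of books without data alongside the total keeps a blank field from silently shrinking the answer.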

52DaynaRT
May 4, 2010, 5:54 pm

>49 _Zoe_:
I don't ever want my page number field auto-filled.

53_Zoe_
May 4, 2010, 5:59 pm

>52 DaynaRT: Isn't it the kind of thing that could be green, like Dewey and LC numbers? I don't think any other fields have the option to just not import data that's available. I'd assume that once there's a page field, it will come with the data from whatever source you use to add the book--at least for books added after the fact. For books added before, I guess it has to be optional.

54DaynaRT
May 4, 2010, 6:02 pm

More stuff filled with incorrect data (no matter the color). No thanks. I'll stick with the BCID field if that's the case.

55brightcopy
May 4, 2010, 6:11 pm

54> More stuff filled with incorrect data (no matter the color).

I think you've just described the entire process of trying to catalog your books based on data that's out there on someone else's servers. ;)

I'd think in this case, you'd just have a tag/collection that is for books you've verified all the data on (author names, titles, publication data, etc. in addition to page count). At that point, you jump on the imaginary pony that is Enabling Power-Edit to Mass Update Fields Other Than Collections and Tags and blank out all the unverified page numbers (correct or incorrect) and you'd be all set. :D

56_Zoe_
May 4, 2010, 6:15 pm

I'd think in this case, you'd just have a tag/collection that is for books you've verified all the data on (author names, titles, publication data, etc. in addition to page count).

Right. Isn't this how people usually go about correcting data after the initial entry? The data comes in with the source, and those who care go through systematically changing it. I'm not sure why number of pages would be different from any other field.

57jjwilson61
May 4, 2010, 6:36 pm

54> If that's your attitude then you're better off entering all your books manually. Adding books from outside sources is just going to lead to bad data.

58DaynaRT
May 4, 2010, 6:39 pm

I never would have thought of adding books manually, even after all these years. How enlightening.

59jjwilson61
May 4, 2010, 6:41 pm

41> Until a user does edit it, you've got the original string around, and if you improve your parsing algorithms, then you'll improve what you parse out of the marc string you've kept around. But as soon as someone edits it, the human edit is probably more reliable than your heuristics trying to parse out of the MARC string, that's now better data you should keep.

But what if Tim wants to get more data from the MARC record than he did in the first pass? So if he wants to give illustrated its own checkbox, for example, why should he not be able to do that for books where the owner edited the width field?

Instead, if Tim keeps track internally of which LT fields had been changed since they were imported, he could hold on to the MARC record to rescan it in the future but only update the LT fields if the user hadn't changed that field.
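
A small Python sketch of that per-field bookkeeping. The function and field names are hypothetical; the only idea taken from the post is that a rescan may overwrite a field unless the user has edited that particular field.

```python
from typing import Any, Dict, Set


def rescan(record: Dict[str, Any], edited_fields: Set[str],
           parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Merge freshly parsed values into a record, skipping user-edited fields."""
    merged = dict(record)
    for name, value in parsed.items():
        if name not in edited_fields:
            merged[name] = value
    return merged


record = {"pages": 468, "width_cm": 15.5}        # the owner changed width_cm by hand
edited = {"width_cm"}
new_parse = {"pages": 468, "width_cm": 14.0, "illustrated": True}

print(rescan(record, edited, new_parse))
```

Here the new `illustrated` value is picked up from the rescan while the hand-edited width is left alone.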

60jjwilson61
May 4, 2010, 6:43 pm

58> I guess all that work Tim put into that feature to import data was just wasted then. I mean, what's the point. It can't be trusted anyway.

61Heather19
May 4, 2010, 6:45 pm

AnnaClaire in message 48 said I think LT already is the better cataloging site, compared to what I've seen. The challenge is making it better still without making it worse/confusing.

That is what I was trying to say earlier.

It seems a lot of people in this thread would rather see LT get "better" at the expense of its users, and I don't agree with that. LT *IS* better. And I think one of the things that *makes* it better is that they actually *care* about their users. Please, PLEASE don't sacrifice user-comfort for "better" data.

Would it be *possible* for me to learn complicated stuff about MARC and new, complicated work pages? Sure. Would I be mad and turned off to LT in a HUGE way if I was suddenly "expected" to put in more time and effort to maintain my catalogue, when I don't give a crap about all those extra fields? Yes. I think a LOT of non-power-users would feel the same way.

62brightcopy
May 4, 2010, 6:57 pm

58> I think you forgot an extra "." at the end of that sentence. ;)

63lorax
May 4, 2010, 6:59 pm

61>

I am a user too. And with equal validity I can say "It seems a lot of people in this thread would rather see LT get 'easier to use' at the expense of its users, and I don't agree with that." It's not about data vs users, or ease-of-use vs users; it's about balancing the desires of different groups of users.

Nobody's expecting you to use fields you don't want to. We're expecting you not to throw a fit when they're added so that those of us who want to use them are allowed to do so.

64brightcopy
May 4, 2010, 7:12 pm

63> I see it as kind of like the notorious Subjects field. I find them useless. So I didn't use it as a column in my catalog. Everyone was happy with the arrangement.

65Heather19
May 4, 2010, 7:23 pm

63: If these discussed-changes had no impact on how I catalogue, I wouldn't "throw a fit". However, Tim himself said in the OP:

How complex can editing and input be? Is it reasonable to expect members to examine and/or fill out large numbers of boxes?

And I'm saying that I don't think that's reasonable. That's all. I'm not trying to interfere with your user-desires, I'm just trying to get my voice out there so that MY user-desires aren't interfered with. That's all.

66lorax
May 4, 2010, 7:39 pm

65>

It was obvious to me that members aren't going to be required to fill out anything, given that currently not even author is required. It seems to me that Tim meant "Is anyone going to fill this out if we provide the whole complexity, or do we need to simplify it to get anyone to use it?"

67brightcopy
Edited: May 4, 2010, 7:47 pm

65/66> Ditto what lorax said (Why do I find myself saying this more often? It was more fun when I always disagreed with you). Tim means if they want to put that data in their catalog, should they be expected to use a more complicated interface.

68felius
May 4, 2010, 10:11 pm

>18 jjwilson61:
That's why I'm talking about edition-level CK rather than work level.

>21 timspalding:
re: the edition level/work level distinction, the point of raising that is that physical description is meaningless at a work level. The work level includes editions with completely different sizes and numbers of pages. On the other hand if you restrict yourself to the book level for this data then you have no easy way of aggregating edits and resolving conflicts.

69jjwilson61
Edited: May 4, 2010, 10:26 pm

On the other hand if you restrict yourself to the book level for this data then you have no easy way of aggregating edits and resolving conflicts.

Plus you end up duplicating a lot of what should be identical data. How much space will these new book-level fields take in the database that could be saved if the fields were in an edition level?

ETA: Swapped "space" for "memory" because databases exist on disk as well as in memory.

70nichtich
Edited: May 5, 2010, 4:27 am

That's why I'm talking about edition-level CK rather than work level.

1. Should we move to a tiered model of record types?
There already is such a model, for instance Common Knowledge (CK) is a record of its own. What is missing is a CK-like record on the edition level.

2. How does any of this apply to manual edits?
The core distinction is not between manual edits and pre-filled fields but between fields editable by one user and fields editable by all users (CK). The former should not be restricted too much. If I like to put in roman page numbers please let me do it as I do not conflict with other users.

3. Do we care about being able to produce MARC records from edited or member-entered data? Should we aim to have our manual entries be good enough to be picked up by libraries?
No! Just provide the best data we have in LT and let libraries think about how to make use of it.

4. What's an acceptable trade-off between doing it right and doing it soon?
42.

5. Should LibraryThing always aim to be the best at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members? Should we aim to stay "in tune" with library data, or just pull what we can out and try to improve on the model?
LT already provides the best cataloging and it already confuses members because cataloging confuses people. As long as you can also just collect books without having to edit any fields that's fine. But if you want to describe your books, editions, and works, you are into it and there is no way to keep it simple because cataloging is complex. Please keep on looking at library data to keep in touch but aim at doing it right for LT users. They already provided better data than libraries at the places where you provided them easy tools to collaborate (especially on works and series).

71andyl
May 5, 2010, 4:35 am

#65

Which is why a switchable UI might work. For example, initially you could have just a field for number of pages plus a button to switch "more detail" on - which would then let you add some extra detail. It is the complexity of the extra detail which is being talked about.

Now obviously as we have seen from some of the examples this could get complicated - multiple page counts, multiple extents, alternate units for length, etc - and I'm not proposing that all that functionality is added (it might end up too complex for all but the masochistic), however at the basic level all a 'non-catalogue oriented user' should see is a single field for number of pages and fields for size (H x W x D).

72MarthaJeanne
May 5, 2010, 8:54 am

Since other people want these fields, I have no real problem with them being added, and filled if possible from the sources when the book is added, if I then have the option of accepting, changing or deleting the information in my catalogue.

HWD are a real problem on an international site. If information comes from different sites, some of it will be in inches and some in metric units, which could be either cm or mm.

I am against filling these fields automatically, even with green information, for books already entered. Too much of it will be wrong.

73_Zoe_
May 5, 2010, 9:02 am

I also wonder how the dimensions are going to take into account different units, unless it's just a text field.

If we're not going to fill in the data automatically, though, I'd say not to bother with the whole thing. Waste of time for too little gain.

74jjwilson61
May 5, 2010, 9:14 am

HWD are a real problem on an international site. If information comes from different sites, some of it will be in inches and some in metric units, which could be either cm or mm.

That's not really a problem since the units are in the data from the Marc record. If LT let users select their preferred unit then it could convert it before storing it into the individual fields.
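
A minimal Python sketch of that conversion, assuming the unit has already been recognised in the source data and is one of `cm`, `mm` or `in`. The table and function name are illustrative only.

```python
# Centimetres per one of each unit (1 inch is exactly 2.54 cm).
TO_CM = {"cm": 1.0, "mm": 0.1, "in": 2.54}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a length between the units listed in TO_CM."""
    return value * TO_CM[from_unit] / TO_CM[to_unit]


# A record gives 9.5 in; this member prefers centimetres.
print(round(convert(9.5, "in", "cm"), 1))
```

Converting before storing, as suggested, loses the original unit; keeping the source value and unit next to the converted one would avoid repeated rounding if the member later switches preference.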

75andyl
May 5, 2010, 9:59 am

#74

Sure the units are in the Marc record and in most cases it is reasonably easy to parse but it is free text.

"c200 x 350 cm., folded to 20 x 15 cm., in plastic case 25 x 20 cm." is an unusual example of what could be in the dimensions. What dimensions should be used there? If you are worried about fitting things on shelves then the last is more important. But from a "usability of content" pov the first is more important.

It may be that the only sane way of attacking things is to pick something that works in the majority of cases*, and fails reasonably gracefully when presented with something odd.

* most of us don't have any (and nearly all of us don't have many) old books that have been catalogued as 8 v. (unpaged) or other material that pushes the complexity boundaries. For most of us the majority of our items are either normal books (hardcover and paperback) and maybe CDs (or DVDs).
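
A Python sketch of "works in the majority of cases, fails gracefully", under stated assumptions: the regular expression, the function name and the rule "give up unless exactly one dimension group is found" are all choices made here for illustration. When it gives up, the caller would leave the structured fields blank and keep the raw string.

```python
import re
from typing import Dict, Optional

# Matches "24 cm" or "20 x 15 cm" (height, optionally x width, then a unit).
DIM_RE = re.compile(r"(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(cm|mm|in)\b")


def parse_dimensions(text: str) -> Optional[Dict]:
    """Return height/width/unit, or None when the text is missing or ambiguous."""
    matches = DIM_RE.findall(text)
    if len(matches) != 1:
        return None  # nothing found, or several candidates: do not guess
    first, second, unit = matches[0]
    return {
        "height": float(first),
        "width": float(second) if second else None,
        "unit": unit,
    }


print(parse_dimensions("24 cm."))
print(parse_dimensions(
    "c200 x 350 cm., folded to 20 x 15 cm., in plastic case 25 x 20 cm."))
```

The folded-map example yields three candidates, so the sketch returns `None` rather than picking one; choosing the "shelf" dimension would need an extra rule or a human.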

76jjwilson61
May 5, 2010, 10:40 am

I believe Tim said that he realizes that the data he gets won't be 100% accurate. I think that what most LT members want from dimensions is whether they'll fit on the shelves and how many books will fit on a shelf, so if it's possible then I think that Tim should extract the "shelf dimensions" out of the above example. And if it isn't feasible programmatically then members can fill it in manually.

77shmjay
May 9, 2010, 11:12 pm

I’m a professional library cataloguer, but I’m not sure I understand what you want to do here.

1. What do you want out of the data?
2. Perhaps more important, what do people want out of the data? What do they want to do with it?

The only question I can definitely answer is #3: No. People have to be trained to catalogue according to AACR2, etc., so you won’t be able to generate a library-quality description field from the data of a lot of people who are using their own private rules for cataloguing.

78timspalding
May 10, 2010, 3:32 am

>77 shmjay:

I think you could do it if you were very explicit about the boxes, at least for "simple" books (ie., without a lizard taped to page 202 and an audiotape wrapped around the map). Give me an example where you wouldn't be able to do it?

That said, the prompts would need to be much too explicit for most peoples' tastes.

79shmjay
Edited: May 10, 2010, 4:52 am

Sure, if it was for standard books with:

numbered pages {What's the last page with a number on it?}
Does it have illustrations?
Does it have maps? {which is the only special illustration the Library of Congress bothers to note for regular books}
size {Give the height of the book, in centimetres}.

But I suspect adding Roman numerals, specifying other kinds of illustrations, specifying whether or not some are in colour, and non-standard sizes (width greater than height, tiny books or huge books) would increase the error rate, as well as being tedious. Centimetres might even be a problem for some Americans. And why only height?, they would say. Why not width and depth?, as people have asked above.

In a library you can enforce it by saying 1. This is the international standard; 2. Your paycheque requires you to follow this standard. But why should a casual person on the Internet care?

lx, 468 p., 1 l., 38 p. front., illus. (plans) maps (partly fold.) 16 cm.

This is the archaic mode of description. For non-rare books, we no longer need to count blank leaves or note the presence of a frontispiece, thank goodness.
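
To make the four prompts earlier in this post concrete, here is a Python sketch that turns their answers into a description string. The function name and parameters are hypothetical. The punctuation ("p. : ill., maps ; 16 cm.") and the rounding of height up to the next whole centimetre follow the usual AACR2-style pattern as understood here; treat both as assumptions, not as a validated rule set.

```python
import math


def format_physical_description(last_numbered_page: int, illustrated: bool,
                                has_maps: bool, height_cm: float) -> str:
    """Build a simple description from the answers to the four prompts."""
    parts = [f"{last_numbered_page} p."]
    other = []
    if illustrated:
        other.append("ill.")
    if has_maps:
        other.append("maps")
    if other:
        parts.append(": " + ", ".join(other))
    # Height is rounded up to the next whole centimetre.
    parts.append(f"; {math.ceil(height_cm)} cm.")
    return " ".join(parts)


print(format_physical_description(468, True, True, 15.2))
print(format_physical_description(200, False, False, 24))
```

Anything beyond these four answers (Roman-numbered preliminaries, colour plates, oblong formats) falls outside the sketch, which is where the error rate discussed in this post would start to climb.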

80justjim
May 10, 2010, 5:10 am

But why should a casual person on the Internet care?

But now we are talking about Thingamabrarians! That might make a difference.

Seriously, not wanting to belittle your professional achievements, I'm sure it is something that can be learned, here, by anyone who wants to put in the effort.

81StephenBarkley
May 10, 2010, 8:54 am

Just an idea: how about a 1 hour(ish) web-training seminar to teach LTers to edit advanced fields? I'm sure many members and non-members would be interested and grateful for the knowledge.

82infiniteletters
May 10, 2010, 10:12 am

81: 1 hour? *blinks* There are semester-long classes on it, and people still don't get it "right".

83lorax
May 10, 2010, 12:21 pm

81>

How about you don't assume that anyone who isn't a librarian is automatically an idiot? I can understand that a field labeled "height in centimeters" wants to be filled with the height in centimeters without spending an hour being told that's what it means, thanks.

Maybe this one would be different, but I haven't yet seen a web training course that wasn't horrifically slow and oversimplified and geared at people who need to be told everything five times in very small words.

84brightcopy
May 10, 2010, 12:31 pm

82 & 83> These responses together are a hoot. One person is shocked because the poster suggests that such a huge amount of training could possibly be condensed down to an hour, the other is offended because the poster suggests that it'd take an entire hour to explain such a simplistic task.

Clearly, there's a bit of a disconnect here.

85lorax
May 10, 2010, 12:59 pm

84>

I suspect infiniteletters was thinking of the full complexity, with no prompts, while I was thinking of what was suggested by Tim, which would be reduced complexity with explicit prompts.

I've seen many suggestions over the years for restricting one or another ability on LT to librarians, which always gets my hackles up. There are lots of competent and hard-working Combiners, for instance, who aren't librarians, and we know what we're doing.

86StephenBarkley
May 10, 2010, 2:48 pm

83, 85> I agree. I consider myself a competent (albeit slothful) combiner too. My suggestion for a course would keep the less-than-competent out of the mix.

Maybe a short LT skills quiz would be sufficient—failure would lead you to the course. Just thinking out loud here.

82> There will always be errors, but what if we could up the get-it-right quotient from 60% to 95%?

84> "Damned if you do, damned if you don't" :)

87shmjay
May 10, 2010, 11:55 pm

Yes, but I’m thinking of the casual people who don’t want to put in an effort. I mean, it’s not hard, it’s just nit-picking, that’s all, and I can imagine people getting bored with it. There are a lot of *librarians* who don’t see the point of it either. We call them "reference librarians" ;)

88shmjay
May 10, 2010, 11:58 pm

85> I’d be happy with reduced complexity with explicit prompts, but the question is still, "what kind of data do people *want* to record"? There might be something obvious that people want to record but isn't in the library cataloguing rules, such as all three dimensions.

89EveleenM
May 11, 2010, 5:53 am

#87
I mean, it’s not hard, it’s just nit-picking, that’s all

I think the most nit-picking book owners all find their way here.

90infiniteletters
May 11, 2010, 9:40 am

87: Hence my quotation marks around "right". I rarely got it right after said semester course either.

91bell7
May 11, 2010, 12:25 pm

1. Should we move to a tiered model of record types--eg., simple records and complex ones. Does record-type determine what edits can be made? If so, can a user change the record type--for example, from a simple Amazonish model to a library-ish model? Hmm...I kind of like the idea of being able to switch some of the records I took from Amazon (generally early in my adding books to this site or for manga, because getting individual records from library data was pretty much impossible if they were cataloged correctly) over to a model that was more library. I would want more details of exactly how this would work, however, before giving a definite answer. How would this work at the edition level, for example? Would the manual entries that currently have title and author be lumped together as the most “basic” edition?

2. How does any of this apply to manual edits? Yes, please, let me edit the physical description of my books. How do you anticipate the manual edits affecting the rest of the edition or work? Would different information for the same ISBN, such as page number - which I've seen in library records I've copy cataloged - mean a different edition, would it get lumped in by ISBN but kept with separate pagination, or would it change the information for everything under that ISBN? Or something else entirely?

3. Do we care about being able to produce MARC records from edited or member-entered data? Should we aim to have our manual entries be good enough to be picked up by libraries? No, I think in too many cases members would be interested in different data than libraries, as the above discussion regarding height x width x depth of a book. Also, a MARC record is just not friendly for the non-cataloger.

4. Assuming complicated things take longer, and changing systems afterwards can be very painful, what's an acceptable trade-off between doing it right and doing it soon? This looks like it's shaping up to be a big change, and I would say take time to get it right. The length of discussion on page numbers alone suggests that this is going to be complicated...

5. Should LibraryThing always aim to be the best at cataloging than any similar site? Should we aim to be better even at the expense of confusing some members? Should we aim to stay "in tune" with library data, or just pull what we can out and try to improve on the model? Other folks have said it better, but I'll repeat – be the best, but not so librarian-ish that others get lost with the information thrown at them. I think most people here care about getting the information about their books right. And I would say pull out what you can from a string and “translate” it. Even as a librarian, I wouldn't use every aspect of physical description, and for the really complicated books I can't parse it either. What I would care most about is Arabic page numbering, and possibly being able to say what type of illustration, if applicable – black and white, photographs, etc. As far as height and all the rest is concerned, I would probably leave it alone much like I do the green text summaries that currently show up.

Sorry for the long (and late) response, just wanted to think it through a bit before putting something out there.

92jjwilson61
May 11, 2010, 2:10 pm

I would hope that the dimensions would be stored in some standard format, probably cm, and convert it when the page is rendered to the format that the user desires.
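A minimal sketch of that idea in Python, assuming dimensions are stored in centimetres and each member has one saved unit preference. The function name and the "cm"/"in" codes are made up for illustration; nothing here is LibraryThing's actual code.

```python
CM_PER_INCH = 2.54  # exact by definition

def format_dimension(value_cm: float, unit: str = "cm") -> str:
    """Render a stored centimetre value in the member's preferred unit.

    `unit` is a hypothetical per-member setting: "cm" or "in".
    """
    if unit == "cm":
        return f"{value_cm:.1f} cm"
    if unit == "in":
        return f"{value_cm / CM_PER_INCH:.1f} in"
    raise ValueError(f"unknown unit: {unit!r}")

# The stored value never changes; only the rendering does.
stored_height_cm = 24.0
print(format_dimension(stored_height_cm))        # rendered in centimetres
print(format_dimension(stored_height_cm, "in"))  # rendered in inches
```

Storing one canonical unit means sums and comparisons never have to mix units; the conversion happens only at display time.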

93AnnaClaire
May 11, 2010, 3:46 pm

>92 jjwilson61:
Only if those of us who use inches only have to say so once.

That said, even if I could set my own default to inches the one time (and not have to select inches for every dimension every time), I don't think I'll be entering dimensions for my books. It just seems like a lot of effort on my part that doesn't really do anything useful for me in return. The kind of things I would use are fields for the stuff one would expect to see in the current "Publication" field -- things like format and the name of the publisher.

94Helcura
May 12, 2010, 12:58 pm

>93 AnnaClaire:

I can see that a lot of people might not use dimensions, but I think I'd do it for one simple reason: I've been thinking about having some custom bookcases built and dimensions would allow me to group together by topic using tags and then specify height, depth and shelf length. That's why I'd like sum-able fields.

I'm at the point where I have odd little bookshelves in every nook of my house and physical characteristics of books have become more important to me than they used to be.
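For what it's worth, here is roughly what "sum-able" fields would buy, as a Python sketch. It assumes each book carries separate numeric height, width and thickness values in centimetres plus a set of tags; the `Book` class and `shelf_requirements` function are hypothetical names, not an existing LibraryThing feature.

```python
from dataclasses import dataclass, field

@dataclass
class Book:
    title: str
    height_cm: float
    width_cm: float      # becomes the shelf depth when the book is shelved spine-out
    thickness_cm: float  # the spine width, i.e. the shelf length the book uses
    tags: set = field(default_factory=set)

def shelf_requirements(books, tag):
    """Return the shelf space needed for every book carrying `tag`."""
    chosen = [b for b in books if tag in b.tags]
    if not chosen:
        return None
    return {
        "shelf_length_cm": sum(b.thickness_cm for b in chosen),
        "min_shelf_height_cm": max(b.height_cm for b in chosen),
        "min_shelf_depth_cm": max(b.width_cm for b in chosen),
    }

library = [
    Book("A", height_cm=24.0, width_cm=16.0, thickness_cm=2.5, tags={"history"}),
    Book("B", height_cm=20.0, width_cm=13.0, thickness_cm=3.0, tags={"history"}),
    Book("C", height_cm=18.0, width_cm=11.0, thickness_cm=4.0, tags={"fiction"}),
]
print(shelf_requirements(library, "history"))
```

None of this works if size only exists as a display string such as "8.5 x 11 x 2 inches", which is why the fielded version matters.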

95countrylife
Edited: May 12, 2010, 1:19 pm

EXACTLY what she said! (Helcura/94)

eta: Well, maybe not ~exactly~... Every time we move, we have to build book cases from scratch and the space utilized, of course, varies. Between tags and dimensions (I want them all, in separate fields, sum-able), it would make that job so much easier.

96SchanleyMedia
Edited: May 28, 2010, 8:35 pm

I'm late to the game, having procrastinated with school and the still-as-yet-unsuccessful search for employment, but this is a topic near to my heart. I'm going to address the questions roughly backwards.

First, let me say that I catalog at my place of employment, and my experience has convinced me that librarians can't even be counted upon to get things right. And no, I'm not even talking about obscure fields. I'm talking records that have errors in the most important line:

Title : $b subtitle / $c author.

looking like

Title. subtitle {by} author. (but substitute brackets for {} to avoid touchstone)

Sure, a human can read it, but a computer can't parse it the same any more. Don't even get me started on what counts as authorship for government docs...

Worse yet, this happens for records which started out from the Library of Congress. Lots of automated batch loads and "enhancements" plus faulty in-publication and vendor records are making their way into OCLC, and sloppy copy cataloging accepts poor records as-is (due to slashed library budgets.) Hey, my library doesn't have time to fix its own old sloppy records acquired from other libraries either, even though I'm careful with my new cataloging. "Good enough" rules by necessity. The arrival of RDA and blended standards as ONIX conversions are embraced will make it even worse. So to Final question 3, I must say: use what MARC offers on import, but don't bother to try to fully preserve it, much less export it. LT-ers won't learn it if even the pros can't be bothered to enforce standards.

I do think that LT should aim to be the best cataloging site and should support functionality equal to the most heavily used MARC fields and subfields. They're heavily used for a reason. Check out http://www.oclc.org/research/publications/library/2010/2010-06.pdf for the most-used fields. This report doesn't identify sub-fields, but just looking at sample records en-masse will tell you important things. For instance, even catalogers don't always detail the type of illustrations. Maps and charts get called out most after the plain ill. Maybe it's best to parse out the most frequently used options and not worry about the rest. Do we really need to indicate fold-out pages and leaves versus plates here? Probably not. All those 1 v. (various pagings) can be left blank on import...someone wants to count, let them have at it.

Yes, I'm advocating for a hybrid model, where MARC gets parsed and turned into a very few options (not exhaustive) and a new, editable MARC comments field gets retained. Parse out number of volumes (default to 1 if not stated), single numerical field for pages, checkboxes for ill. and maybe maps and charts (and these are debatable), plus height (defaulting blank for many books) and width. Though MARC doesn't code it, LT members might want the option of thickness; that's a tough call. If the user edits any imported field manually, have the MARC change color (or go to italics) and display an asterisk that indicates that manual edits were made and data might conflict. Allow this MARC comments field to be manually edited to keep the "real catalogers" happy.
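To make the hybrid idea concrete, here is a rough Python sketch of the parsing step. It is only an illustration under stated assumptions: the field names are invented, the regular expressions cover just the common abbreviations ("p.", "v.", "ill."/"illus.", "maps", "cm"), and taking the largest Arabic page run as the single page count is my own simplification. The original string is kept untouched as the "MARC comments" value.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhysicalDescription:
    raw: str                  # original 300 string, kept as the editable comments field
    volumes: int              # defaults to 1 when no "v." is stated
    pages: Optional[int]      # None when nothing countable is found
    illustrated: bool
    maps: bool
    height_cm: Optional[int]
    width_cm: Optional[int]   # only present for "H x W cm" sizes

def parse_physical_description(raw: str) -> PhysicalDescription:
    volumes = re.search(r"(\d+)\s*v\.", raw)
    # Arabic page runs only; Roman-numeral preliminaries such as "lx" are skipped.
    page_runs = [int(n) for n in re.findall(r"(\d+)\s*p\.", raw)]
    size = re.search(r"(\d+)(?:\s*x\s*(\d+))?\s*cm", raw)
    return PhysicalDescription(
        raw=raw,
        volumes=int(volumes.group(1)) if volumes else 1,
        pages=max(page_runs) if page_runs else None,  # assumption: largest run wins
        illustrated=bool(re.search(r"\bill(?:us)?\.", raw)),
        maps=bool(re.search(r"\bmaps?\b", raw)),
        height_cm=int(size.group(1)) if size else None,
        width_cm=int(size.group(2)) if size and size.group(2) else None,
    )

old_style = "lx, 468 p., 1 l., 38 p. front., illus. (plans) maps (partly fold.) 16 cm."
print(parse_physical_description(old_style))
print(parse_physical_description("1 v. (various pagings) ; 28 cm."))
```

Anything the patterns miss simply stays blank for a member to fill in, and the raw string is still there for anyone who wants the full detail.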

I do think that the site is going to need to embrace a tiered record model at some point. Not everyone will want or need the MARC record. In fact, not everyone even needs an edition/manifestation-level record with publication details now. Some people just want a FRBR work record, a "generic edition," and we should make it possible for those who simply want to track a reading list to do so. The generic edition would, in effect, be the "work" record and a stable URI, to which editions would be added automatically (and combined/separated as appropriate).

And with my husband walking in the door, that must be all I can say on this for now. I suppose that is enough to chew on, though!

(Edited to fix erroneous auto-touchstone)

97Kathleen828
Sep 26, 2010, 8:41 am

Sigh! I am a Technical Services Librarian, and I have to read about this monstrosity (FRBR) on a daily basis at work. Pfaugh!

I have been away from Library Thing for some time, and am just now returning to engage with it. And what do I find???!!!

FRBR has invaded here too!

Case in point, I have just entered a newer edition of War and Peace, this one translated by Andrew Bromfield. I have already put my Guerra Y Paz into LT, and eventually I will enter my original W & P, translated by Constance Garnett.

When I entered my Bromfield translation today, the red "duplicate copy exists in your collection" appeared in my recently entered information.

THIS IS PATENTLY UNTRUE! I do NOT have another copy of THIS BOOK. I have other copies of War and Peace, yes, but not of this. Having this stupid note incenses me. I know what I have; I do NOT have duplicates, I have 3 different translations of the same book.

I perfectly understand what FRBR is trying to do and I violently disagree with it.

98Nicole_VanK
Sep 26, 2010, 8:52 am

You do not have multiple copies of the book, but you do of the "work" as LT defines it*. I agree the wording "duplicate copy exists in your collection" (I get it too, for having several editions of "Alice's Adventures in Wonderland" for instance) leaves something to be desired. But it's not necessarily a FRBR thing.

* "Works connect all the different editions of a book, so that members with one edition can connect to members with other editions. Works also improve the recommendation system and much more."

99Kathleen828
Sep 26, 2010, 12:56 pm

I understand, WEMI, Matt - I just hate it

100timspalding
Sep 27, 2010, 4:24 pm

>99 Kathleen828:

Your hatred is noted. Can you explain it?

101WholeHouseLibrary
Sep 27, 2010, 6:01 pm

Gee! A hundred messages, and no one has even bothered to consider probably the most important physical characteristic of a book - its weight.

Do I want to read this book? Nah! It's too heavy to want to carry around.

102PhaedraB
Sep 27, 2010, 7:21 pm

101 >

If it's too heavy, it becomes the bedside table book. Unless it doesn't fit on the bedside table, then maybe it needs to sit on the floor.

Of course, even if it's lightweight, it has to be of a shape that will fit in my purse. If not that, then it becomes bedside table fodder. But if it's too big for the purse, will it fit on the bedside table or become another floor sitter?

So, we need a column to compare book size to handbag sizes and end-table sizes and nightstand sizes, sorted by outfit and room ...

103SylviaC
Sep 27, 2010, 7:33 pm

Don't forget smell. If it smells like smoke or perfume, I don't want to read it. If it smells like bacon, I do. We need a column for that.

104infiniteletters
Sep 27, 2010, 8:17 pm

103: If it smells like Bad Things (bad being whatever you dislike, of course), then get rid of it. ;P

As for good things, tag it bacon. :P

105Suncat
Sep 27, 2010, 9:01 pm

>103 SylviaC:

Read it or eat it? I guess people can devour books.

106justjim
Sep 27, 2010, 11:36 pm

This member certainly does.

107timspalding
Sep 27, 2010, 11:51 pm

This message has been deleted by its author.

108timspalding
Sep 27, 2010, 11:51 pm

Oh, I follow her on Twitter. I've never seen the picture so big.

109Nicole_VanK
Sep 30, 2010, 8:38 am

> 103: If it smells like bacon it will probably end up in my other account : http://www.librarything.com/profile/Ankher

110AnnaClaire
Sep 30, 2010, 11:14 am

>109 Nicole_VanK:
How about if it smelled like cheese? Would it end up there then?

111Nicole_VanK
Sep 30, 2010, 11:22 am

> 110: She's not a fussy eater.

112AnnaClaire
Sep 30, 2010, 11:29 am

>111 Nicole_VanK:
So the only reason her catalog isn't any bigger is because you've hidden all your books? I hope you've done the same with your shoes.

113Nicole_VanK
Edited: Sep 30, 2010, 11:34 am

> 112: No, most of my walls are side to side, floor to ceiling bookshelves. So that wasn't an option. I managed to teach her not to touch books. But if they start smelling like regular food all bets are off.

114Helcura
Sep 30, 2010, 5:51 pm

>101 WholeHouseLibrary:

I actually have two books (Footrot Flats: The Dog Strips and Ten Years of User Friendly.org) that are so heavy I have to put them on the floor and read them lying on my stomach because I can't hold them.

I still love them, though, although if a quality multi-volume set became available, I'd buy it in a heartbeat.

115Kathleen828
Oct 3, 2010, 2:04 pm

Thank you for asking, Tim, and please forgive my delay in replying. I have been sick and am still under the weather, so I don't feel that I will be able to be either clear or persuasive in any explanation of my dislike of FRBR/RDA, etc.

So, a few random points are the best I can manage for now...

1) Its very basis, a computer-programming-style, data-entity relationship model seems to me to be Procustean in its very essence when applied to cataloging practice

2) Its terminology is execrable - NO ONE understands what an "Expression" is, and I have heard myriads of experts discourse on it

3) It is not useful; rather it causes confusion. The idea of trying to explain it to patrons is horrific.

4) It is unwieldy - RDA's table of contents is 110 pages long

5) Only a few people are willing to say "The Emperor has no clothes" in relation to it

In lieu of clarity from me, at least for the moment, I recommend anything by Martha Yee regarding the subject.

I hope to be better able to express myself about this when I am feeling better.

116Kathleen828
Oct 3, 2010, 2:06 pm

I really am sick, I mean Procrustean...

117infiniteletters
Oct 3, 2010, 4:56 pm

115: So if FRBR was less complicated, would you be happier?

118jjmcgaffey
Oct 3, 2010, 6:01 pm

I'm not a librarian, I have no idea what FRBR or RDA or whatever it is is. But I love that LT links books that are the same 'work' - the same content - into one object. The problem lies with the edge cases, of course; if this is what FRBR is about, then that may be why there's so much documentation about it. Is an abridged work the same as the original? (no) What if it's the same work, but with different foreword/afterword? (maybe) What about a critical edition, where the footnotes can outnumber the text? (no, but should be linked to the original). And translations, and...

Real things don't fit neatly into categories (DDC is fun, and MDC is being equally fun (fun pronounced sarcastically, both times) because of this), but treating each book, or even each edition of a book, as a completely separate entity would lose out on most of the connectivity that makes LT interesting - even within my own library, I'd miss some info about having different versions of the same 'work'. I find Duplicate copy interesting and useful, not annoying - I even have some Duplicate ISBN copies, mostly for loaning out.

119Kathleen828
Oct 9, 2010, 10:15 am

Ok - here we go again.

TIM -- is there some way I can get you to shut off this maddening WORK thing just in my account? PLEASE THIS IS DRIVING ME CRAZY!!!

It's just happened to me AGAIN! I've just entered Geoffrey Barraclough's "The Crucible of Europe, the ninth and tenth centuries in European history."

ONCE AGAIN LT told me that "a duplicate work exists..." etc.

I did not think that I had this particular edition of this book, but the note forced me to go and look through my library just to make sure. If I already had it, I would pass it on to someone else.

I DO NOT HAVE THIS BOOK!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Yes, I have a different edition of it, but I don't have this one.

I WANT YOU TO STOP TELLING ME THIS, PLEASE! It is annoying, unnecessary and irritating in the extreme. I know what I mean, I don't need you to help me. WEMI is not of use to me, please LET ME OUT OF THIS. I am perfectly capable of managing my own account and that is what I would like to do.

May I please be excused?

Thank you!

120keristars
Edited: Oct 9, 2010, 10:44 am

119> I think it helps to not think of it as trying to manage your account for you or anything. It's not like it has gone and messed with your data or is saying that in ten seconds, if you don't take action, one of those records will be eaten by the other. Instead, consider that by saying those books are the same work, the system is allowing connections between people who have different editions, to facilitate finding similar libraries, which in turn can help you find new-to-you books in those libraries, or for the people with similar libraries to be more likely to see the books in yours and get recommendations and so on, thanks to Connection News or Recommendations.

You can also completely ignore that page.

I really don't understand the hostility towards the function, to be honest. :/

eta: also, I'm not sure how different the 1968 and 1998 books are. One is from the Folio Society, right? but does it change the contents any? There are different page counts, but that could be due to layout variance, not actual content change. If the only difference is the presentation of the individual copy, why can't they be considered effectively the same? They still show up as unique in your catalogue...

121Kathleen828
Oct 9, 2010, 12:11 pm

@120 Thank you for your kind and soothing words. I'm sure my reaction to this must appear excessive. In part that is because this noxious scheme is all over my work life and I cannot express my feelings in that venue. So when I encounter it here, where I come for joy and relaxation, it incenses me. :(

I object to the theory behind this because I think it is flawed in principle when applied in this arena, AND I think that a computer-based model (entity-relationship) has been thrust into library world by theoreticians who like that sort of thing. I do not like it, nor do I find it in the least helpful.

I can find like minded others by searching title or author or any other category I choose.

Also, I do not accept the idea that my two editions are the same. They patently are not, as several things about them differ - pagination, cover, publisher, etc. The only thing they share is authorship, and I already know that. I don't need LT to tell me. It's annoying and distracting.

I hate FRBR, upon which this is based because I think it is fatally flawed in concept. "Muddle-headed in the extreme" would not be a bad description of it.

Just Google it, if you'd like to see how bad it really is. I defy any normal person to make heads or tails of it without deep study. A tool ostensibly created to help people should do that; FRBR does not. To find its ugly visage in my beloved Library Thing makes me sad - and angry! Sigh!

122jjwilson61
Edited: Oct 9, 2010, 12:30 pm

I don't think it's based on FRBR or WEMI or anything else other than a common sense notion that if I read Lord of the Rings and you read Lord of the Rings most people would say that we've read the same book. LibraryThing doesn't take this information from any library records in imports, it infers it based on the Title and the Author being the same. The inference is sometimes wrong and LibraryThing members combine and separate works all the time. Take a look at the group Combiners! sometime.

So I think your hostility is entirely misplaced here. LT's work system isn't what you think it is.
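A toy Python illustration of what "inferring from Title and Author" could look like. This is a guess at the general shape, not LibraryThing's real algorithm: `work_key` and `group_into_works` are invented names, and the normalisation (lowercase, ASCII letters and digits only) is deliberately crude.

```python
import re
from collections import defaultdict

def _norm(text: str) -> str:
    # Lowercase and collapse anything that isn't an ASCII letter or digit.
    # Crude on purpose: accented letters are dropped too.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def work_key(title: str, author: str) -> tuple:
    """Hypothetical matching key: same key means 'probably the same work'."""
    return (_norm(title), _norm(author))

def group_into_works(editions):
    works = defaultdict(list)
    for edition in editions:
        works[work_key(edition["title"], edition["author"])].append(edition)
    return dict(works)

editions = [
    {"title": "The Lord of the Rings", "author": "Tolkien, J. R. R.", "year": 1968},
    {"title": "the lord of the rings.", "author": "Tolkien, J.R.R.", "year": 1991},
    {"title": "War and Peace", "author": "Tolstoy, Leo", "year": 2007},
    {"title": "Guerra y paz", "author": "Tolstoy, Leo", "year": 1998},
]
for key, members in group_into_works(editions).items():
    print(key, len(members))
```

The two Tolkien records fall together on their own, while the Spanish translation of War and Peace does not match the English title. A purely automatic match like this misses such cases, which is where members combining and separating works by hand comes in.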

123SchanleyMedia
Oct 9, 2010, 2:14 pm

I don't think the idea behind FRBR is only of use to people who like computers. If I go into a rural library that doesn't even have an online catalog yet, one that physically stamps its books and uses a card catalog because they don't have money for computers, and you tell the librarian you liked The Lord of the Rings and would like to read something similar, the librarian is NOT going to ask you if you read the one with the castle on the cover, or the one with the large print, or the one put out by XYZ publisher after the movie came out. These things matter only for the management of physical volumes, and those physical volumes are the most important aspect of the book only for those who collect books as objects. For most people, the physical aspect of the book is much less relevant than its content; after all, the author didn't write an edition, he wrote a "book," and one or more distributors merely made copies (none of which are actually 100% faithful to the author's creation in terms of cover, pages, etc.) and sent them out to the public. This was true even long before the printing press, much less the computer. The Canon which used to be taught in Western universities wasn't tied to specific editions, and with good reason. LibraryThing takes seriously the underlying truth of the book as artistic creation, not physical manifestation. For straight-up traditional cataloging of physical objects, with every physical description field a volume-worshipper could desire, places like biblios.net are better; LT aims for something with more meaning, more context, and more connectivity between readers and the books that inspire them.

124prosfilaes
Edited: Oct 9, 2010, 3:49 pm

#121: If I have two copies of "I, Robot", then I want LibraryThing to tell me that, even if one is paperback and one is hardback and paginated and even illustrated differently. At its simplest level, the work system is pure user-friendly. Even at its more complex level, it drives a lot of features that most people using LibraryThing love; recommendations are a lot more powerful when they include all the editions into one work.

125Heather19
Oct 9, 2010, 4:54 pm

*just putting my 2cents in here* I don't know about the whole "FRBR" thing, but...

Kathleen, I do understand your frustration about the "work" system here. But as much as I may hate to admit it, LibraryThing is a social site. I hope it'll always be a cataloguing site first and social site second, but... Anyways. What I mean is this:

A lot of the features and/or way things work around here is geared towards helping readers connect to people who read the same books. If I've read a first-edition of "such-and-such" book, and another person has read the 2nd edition of that same book, *usually* we would have something to talk about. The "book" is the same in the sense that the plot is the same.

And that's how Tim defines "works", basically. He has said that if, at a party, person A and person B have two editions of the same "work", and they can talk about the book and be talking about the same thing, then it's essentially the same work.

And that sounds confusing as I write it. Basically, leaving out possible author's notes and such, different editions usually *are* the same book. If they are different enough that it impacts the plot, story outcome, information, etc, then they should be different works. If not... well, that's just one of those things LT does that not everyone agrees with, I guess.

126eromsted
Oct 9, 2010, 7:21 pm

>119 Kathleen828:
As others have said the work concept is quite central to LT and Tim's not going to change it.

However, I noticed that you say you were "forced to look through {your} library" to check whether the newly entered was the same "particular edition" as your older copy (however you would mean that).

It's true that edition information does not show up on the Add books page. But you don't need to do a search to find it. Just click on the title of the newly added book in the Recently added column of the add books page. Now on the work page, you should see a section called Book Information. At the top of this section is the edition info for the book you just entered. Below you should see "Kathleen828's other editions" and the edition info for your other copy(s) of the work. The publication info visible here is usually enough to decide whether the editions are the same or not.

127TineOliver
Oct 9, 2010, 8:23 pm

> 99, 119

Can I suggest that with respect to the 'duplicate work' notice it be amended as follows:

(a) Remove the word duplicate - that seems to be at least part of the problem. What about simply stating that "You have another version of this work in your library"? The sentence as it reads now seems to suggest that I've stuffed up (pardon the expression) and bought two copies of the same book.

(b) Change the colour - the red, again, makes it look like I've made a mistake somewhere.

So in other words, I'm happy for you to tell me that I have two versions of the Count of Monte Cristo in my library - but not that they're duplicates, because they're not (different translations).

128Heather19
Oct 9, 2010, 9:43 pm

127: I'd definitely like that.

129ABVR
Oct 9, 2010, 10:22 pm

> 127

As T. H. Huxley is said to have remarked, upon reading the manuscript of Darwin's On the Origin of Species: "How stupid of me not to have thought of that!" :-)

Seriously . . . TineOliver's suggestion strikes me as exceptionally elegant, and very much in the spirit of the "Work" concept. Bravo!

130Kathleen828
Oct 10, 2010, 8:53 am

Yes! Indeed! TineOliver has come up with "an exceptionally elegant" solution and I vote for it wholeheartedly. And s/he has identified that which I did not state clearly - what bothered me so about the notice I was receiving.

I note also that my complaints have not been clearly stated, as I seem to have given the impression that I am concerned with the physicality of my books and not their content which is quite the opposite of how I regard them. That is entirely my fault. I have obviously not expressed myself clearly enough.

I love the content firstly, but I do also like the information about the specific item I have. Before I had so many, I knew my books individually by appearance - "that's my first LOTR. It's the Ballantine with the pink spine. My French one is wine-colored, etc..."

Also, it was very enlightening for me to realize that many/most regard this as primarily a social site. That aspect of it is nice (like this discussion) but it is not the primary reason I got a life membership in LT. I already own a cataloging program and have all of my books in it, stored in my computer. But we know how uncertain that is.

My computer could die, and all my work would be lost. So when I found LT, I happily joined and am very appreciative of the ability to store my information in ether that Tim maintains.

So I suppose, in some sense, I expected my information to remain static. Now that I understand that social aspects are primary for most, I think my frustration level will diminish.

So thank you very much to this great LT community. You have expanded my understanding and I salute you.

Thanks to all who contributed to this discussion. It was very enlightening.

Tim, any chance you would adopt TineOliver's wording?

131staffordcastle
Oct 10, 2010, 1:52 pm

I like TineOliver's solution as well; I was only mildly bothered by the duplicate notice, but this would be nicer.

The one time I actually appreciate getting the notice is when it says this is a duplicate ISBN, because that means I really have entered the book twice, and should fix it.

132brightcopy
Edited: Oct 10, 2010, 1:56 pm

131> The one time I actually appreciate getting the notice is when it says this is a duplicate ISBN, because that means I really have entered the book twice, and should fix it.

Agreed! I actually do like a big red notice then, because either I've screwed up in my book cataloging, or I screwed up in my book buying. When I have multiple copies of the exact same book, I pull the others and sell/donate them. I understand that there's probably a small percentage of users that have duplicate copies of the exact same edition on purpose, though (collectors, for one).

Edit: I do note that I do something similar when I buy a hardback version of something I already had in paperback, though. But that's a less clear-cut case, since there's no good way to tell it apart in code from just buying two different editions. I have multiple editions of LOTR, The Hobbit, and HHGTTG because they're some of my favorite books.

133staffordcastle
Oct 10, 2010, 2:09 pm

Yeah, pretty similar here. Lots of people might have duplicates on purpose to have a loaner copy of something (I have a really battered copy of The Game of Kings because it's been loaned so much, while I jealously guard my hardcover copy.)

In my fiction account I tag the books HB for hardback and PB for paperback, because they're shelved separately and it helps finding the book.

134jjwilson61
Oct 10, 2010, 4:35 pm

I would bet that most people have few or zero duplicate works and the warning is helpful for those people who may have accidentally entered a work twice. For example, all those people who don't care about editions and add a random one whenever they're adding books.

135brightcopy
Oct 10, 2010, 5:24 pm

134> Right. I don't think anyone is disputing that. What we're saying is that there are different groups of people for whom a given behavior is applicable, and some groups for whom the behavior for one group might be quite annoying.

136TineOliver
Oct 10, 2010, 10:05 pm

134/135> I completely agree, that's why I suggested the rewording above - it should still bring 'duplicates' to the attention of those who don't want two of the same 'work', but it wouldn't seem so annoying to those of us who have different editions of the same work on purpose.

I don't think anyone would object to the duplicate ISBN notice (which I believe is actually a different notice from the duplicate work notice) staying as it is.

Does anyone from the "I never deliberately have two copies of the same work" camp feel that amending the notice as I've suggested in 127 above would create an issue for them?

137JonathanGorman
Oct 10, 2010, 10:24 pm

The major reason I get this message is because I at some point had the book in a "To Read" or "Wishlist". Later I start reading the book and add it without checking to see if I already added it.

138prosfilaes
Oct 10, 2010, 10:39 pm

#136: I wouldn't describe myself as part of the "I never deliberately have two copies of the same work" camp, but I have no objections to that change.

139shmjay
Oct 11, 2010, 3:02 pm

I like this suggestion too, because sometimes I actually have duplicate items and sometimes I just have different editions of the same book, and it would be handy to know when I need to weed something.

140prosfilaes
Oct 11, 2010, 3:22 pm

Though I would point out that just because there's no ISBN match doesn't mean you don't have the same edition in your catalog, just that they don't have ISBNs or the ISBNs got changed (which is unfortunately common). And red is about the only thing making this stand out from the text that follows it.

141MarthaJeanne
Oct 12, 2010, 1:34 am

Some sort of message is very useful. I get it fairly often. Sometimes because I really have (and want to have) two copies of the same work. Sometimes because someone in the house bought a second copy of something someone else already had. (Get rid of one of them!) Sometimes because Volumes one and two got autocombined and I need to separate.

I agree that the word 'duplicate' ought to go. Probably the red colour as well, but the message itself needs to stay, and be noticeable.

142TineOliver
Edited: Oct 12, 2010, 3:33 am

141: I completely agree

ETA - my inability to effectively use the English language

143ExVivre
Oct 12, 2010, 2:18 pm

>141 MarthaJeanne: Yes, for all those reasons and others. I have a number of works where I've purposely collected multiple editions.

144Mr.Durick
Oct 12, 2010, 5:06 pm

I entered a stack of books yesterday from piles of earlier acquisitions and was surprised at some of the duplicate ISBN's and duplicate works. I was glad, however, that they were flagged prominently, among other things, so that I can ponder whether to try to dig out my other edition of Will and Representation. The language seemed explicit to me, and the red coloring distinguished the fine print from the fine print that did not interest me.

Robert

145MyriadBooks
Oct 14, 2010, 2:25 pm

I've just had an occasion to wonder, "What book do I own with the highest page count?" I know I have a lot of page-count data in LT (and I leaped to LT immediately with the thought that I could modify my catalog Styles to sort by 'page count') but all my page-count data is buried within the publication field and thus not sortable. Or searchable.

In thinking about this more, I would also love the ability to sort by book weight. I would find it useful to command LT to, say, show me all my books classified as not-yet-read and weighing under half a pound, which would help me when selecting books I need to haul on plane rides or car trips.

(....though the desire to sort by this data may be mine alone....)
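For illustration only, here is a rough sketch of the two queries described above, assuming page count and weight were stored as their own numeric fields rather than buried in the publication field. The `Record` class and its field names are hypothetical and are not LibraryThing's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical catalog entry; these fields are assumptions for the sketch."""
    title: str
    pages: Optional[int] = None        # None when no page count is recorded
    weight_lb: Optional[float] = None  # None when no weight is recorded
    read: bool = False


def by_page_count(records: List[Record]) -> List[Record]:
    """Return records that have a page count, longest first."""
    with_pages = [r for r in records if r.pages is not None]
    return sorted(with_pages, key=lambda r: r.pages, reverse=True)


def light_unread(records: List[Record], max_lb: float = 0.5) -> List[Record]:
    """Return unread records whose recorded weight is under max_lb pounds."""
    return [
        r for r in records
        if not r.read and r.weight_lb is not None and r.weight_lb < max_lb
    ]
```

Records with no page count or weight are simply left out of the results here; a real catalog would have to decide whether to list them separately.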

146paradoxosalpha
Oct 14, 2010, 4:08 pm

The quantitative physical datum I most covet for each book is width, so that I can sum my shelf-feet of books. But I can't say I'd actually go back in and enter it for my whole existing catalog, and this statistic is one that traditional catalogers seem to have at the bottom of their priority list. It sounds like Amazon has it, though!
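As a minimal sketch of that shelf-feet sum, assuming each record carried a thickness value plus a units field similar to the Amazon sizing fields described at the top of the thread: the `Book` class, its field names, and the unit labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Conversion factors to inches; the unit labels are assumptions for this sketch.
INCHES_PER_UNIT = {"inches": 1.0, "cm": 1.0 / 2.54}


@dataclass
class Book:
    """Hypothetical record holding only what the sum needs."""
    title: str
    thickness: float       # spine width, in the unit named by `units`
    units: str = "inches"


def shelf_feet(books: Iterable[Book]) -> float:
    """Total spine width of the given books, expressed in feet."""
    total_inches = sum(b.thickness * INCHES_PER_UNIT[b.units] for b in books)
    return total_inches / 12.0
```

The sum is only as good as the data: any book without a recorded thickness would have to be skipped or estimated before calling `shelf_feet`.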

147Kathleen828
Nov 14, 2010, 5:03 pm

I am sorry to say that I am back again with the same complaint which has been thoroughly discussed and answered above.

But every time it happens to me, I get that same old frustrated feeling.

Today I input into my library a paperback copy of Thomas Hardy's The Mayor of Casterbridge. LibraryThing informed me that more than 3,000 other members own this book.

I perfectly understand that these 3,000 people own a copy of some edition of The Mayor of Casterbridge. But I find it difficult to believe that they ALL own the same little 1956 paperback that I do.

In fact, I KNOW that they don't and I understand that that is actually NOT what LT is saying, but semantically, it sounds as if it is, and that just makes me crazy.

I hate this. It's actually a lie. All these other people DON'T own the same book I do, they own some other edition of this novel.

Sigh! I wonder if some other online location would just let me enter my stuff without insisting on telling me things I don't agree with. I did pay a little bit of money for a lifetime LT membership - can't I have even the smallest say in this? Just leave me out of the whole FRBR/WEMI thing and let me enter my individual books -- please?!

148Heather19
Nov 14, 2010, 5:11 pm

If it is still bothering you so much, despite all the explanations in this thread, then maybe LT just isn't the place for you.

I know you see it differently, but those 3,000 people *do* have the same book you do. The same plot. The same words. The same ending. It's the same in every way except for maybe publication date and the cover. If you really can't get past that, and realize that a different publication date doesn't make a work substantially different, then maybe LT isn't what you need.

The very core of LT-works is to have different editions in the same work. Most people here understand that and want it. And Tim has been fairly adamant that it won't change.

149keristars
Nov 14, 2010, 5:22 pm

Why not just go to private mode and never visit a work page? Stick to your catalogue view and maybe talk, and you won't have to be faced with knowing that I own a Penguin Classics edition of Mayor of Casterbridge published in 1998 (or whatever) and that I consider us to have the same book.

I really don't understand why you hate the idea of works so much, since the whole point of the concept on LibraryThing is to link people's catalogues together and show connections, which in turn allows for discovering new works that might be interesting, or easily access other reviews of the same work, or maybe just find someone to talk to about a book.

150brightcopy
Edited: Nov 14, 2010, 5:45 pm

147> Yeah, I totally don't get that. If you say to me "I just finished The Mayor of Casterbridge. Have you read that book?", I'm likely to say "Oh, I have that book. It's one of my favorite books!" Not. "Which The Mayor of Casterbridge? The 1956 paperback version? No, I haven't read that book."

The word you're looking for is, of course, "edition."

Not only is LT not claiming that you both own the same edition, that's not what normal everyday people are typically saying when they say "book" in casual conversation.

There's a lot of other quibbles I could see people having over LT terminology, but I honestly don't understand getting hung up on this one.

151BTRIPP
Edited: Nov 14, 2010, 6:12 pm

I'd like to jump in with my two cents on this ...

While I certainly understand that each of several dozen editions of a classic book are "the same book" (possibly with variations on accessory material), the physical presence of "the book" in my possession has a certain identifiability that makes me somewhat cranky when L.T. tells me I have "duplicates" of a book. Now, I realize that I have a few duplicates in my collection (where I'd bought another copy of the same edition, either intentionally, such as the Fourth Edition of Turabian at various points in school, or accidentally), but it aggravates me to be told that my 1961 hardcover edition of Mao's On Guerrilla Warfare is the same as the 2005 cheap Dover paperback! Yes, the words might be the same (I'm assuming these are a single translation), but almost everything else about the books is quite different.

It's something like how I feel about e-books, "there's no there there" ... perhaps I'm enough of a geezer that the "shelf presence" of a book (as, frankly, my recall of books tends to be very visual, relating the particular book with places and times when it was read) is important, but I can certainly understand why folks get their noses out of joint by having a particular instance of "a book" being folded in indiscriminately with all manifestations of that book on the "work" level.

 

152jjwilson61
Nov 14, 2010, 7:01 pm

I just checked out a work page and on the members sub-page there is a section with the heading "All members who have the book". I would be in favor of changing this to "All members who have the work". Would that satisfy you?

153jjwilson61
Nov 14, 2010, 7:07 pm

I see that TineOliver also made some suggestions which may or may not have been implemented. Unfortunately there is no Feature Suggestion tracking like Tim has recently added for Bug Tracking, so if you have an idea that you want to see implemented you need to keep bringing it up periodically. It would also help if those ideas had their own thread.

154JonathanGorman
Edited: Nov 15, 2010, 1:46 pm

> 147

Well, really, if we want to get down to semantics, you and whoever else you purchased that particular copy with own it. I mean, I bought the 2,1567 printed copy of a particular print run. I share the purchase with my wife. Only two people own it, right?

Of course, it's still unresolved if I still own the same book as I used to if I rip out an empty page...

At some point in order to do this in a somewhat sane and automated way, choices have to be made at the level to aggregate counts like this. The level LibraryThing chooses so far seems to work for me and most others for most cases. Going by edition or even print run seems too detailed. While it would be nice to toggle things on an edition-level, it's nice to see the popularity of the overall work. When I care about that, I suppose. Typically I'm more concerned with which of my friends have read the book, but none of my non-librarian friends use LibraryThing.

155eromsted
Edited: Nov 15, 2010, 3:26 pm

Add to that the difficulty of getting rather simple statements to be "true" if we really want to parse things out.

Take jjwilson's rather sensible "All members who have the work."

All (Now by all we don't really mean all we mean all that we know of given vague user input, the diligence of combiners, the problem of contained/contains and the definition of a work - see below) members who (By members you might think we mean real individual people, but individuals can have more than one account, accounts can be owned by more than one individual or corporate entities, or may indicate lists of dead people's library holdings or other abstract concepts) have (By have you might think physically have but that's not the case. Really it means have listed in their LT catalogs for whatever reason: ownership, readership, want lists, or other purposes) the work (see here for more info but also be warned that there are grey areas and a specific work at a specific moment may or may not live up to the general definition).

Despite all that I'm still with Jonathan that the total members number is useful.

156Kathleen828
Nov 16, 2010, 7:11 am

@151 - Thank you. You have expressed, far more eloquently than I, the point which I am trying to make. That's it exactly.

157Kathleen828
Nov 16, 2010, 7:31 am

This is what I mean:

I have a set of Lord of the Rings which I bought in high school. It's the Ballantine set with the pink and white covers. I have another of the SAME set which someone gave us for a wedding present. I have a boxed set, leather-bound also a gift, I have a paperback set with really ugly covers which I bought myself, and I now have the set in French which is what started this whole thing.

Of course I know that the contents of each of these is the same. But the PHYSICAL books are different.

That's why it annoys me when LT says, "duplicate copy exists in your library." Except for the 2 Ballantine sets, they are NOT duplicates - they are quite different.

When I talk about "my books" I mean my PHYSICAL books which is what I am recording in LT.

When I talk about what I am reading, it still makes some difference. There are now 2 sets of translations of Proust's entire oeuvre. A set by Moncrieff and a set by a variety of authors. Same work, yes. Same expression, no. And the difference is fundamental, as translations radically alter the sense and feel of a reading encounter.

So the Prousts are patently NOT the same, yet LT would tell me they were. Grrrrr

158Nicole_VanK
Nov 16, 2010, 7:49 am

Yes, I would prefer it if LT said "work" instead of "book" in such cases.

159_Zoe_
Nov 16, 2010, 8:22 am

It seemed like everyone in this thread already agreed that the "duplicate" wording was less than ideal. See TineOliver's suggestion in #127: "You have another version of this work in your library", with the positive response following.

Unfortunately Tim doesn't make all suggested improvements immediately. And ranting against the whole work system seems likely just to make him dismiss the issue entirely.

160jjwilson61
Nov 16, 2010, 9:15 am

I already suggested starting a new thread for just the issue of changing the warning text, but you ignored it.

Another thought is that this can be seen as an error in the wording so maybe entering a bug report in Bug Collectors is warranted.

161brightcopy
Edited: Nov 16, 2010, 5:18 pm

156/157> This seems completely different from your complaint in 147, yet you seem to be implying it's just a restatement of it.

162jjmcgaffey
Nov 16, 2010, 5:12 pm

157> Note that in Add Books, it does in fact say "Duplicate WORK in your catalog". If you can accept that on LT, book =/= work, it's telling you what you already know - that you already have another edition of this book.

151> Yeesss, that's true - the physicality of the book makes a difference. But don't you find it useful for LT to tell you that you have multiple editions of whatever, so that you can compare them directly? I've winnowed down my duplicates, pulling out the ones that are damaged or too big (I prefer paperbacks to hardbacks - more of them fit on the shelf! Of course, an illustrated version is something different), largely because LT told me that I did have duplicates. Not to mention if I accidentally buy another copy of a book/work, LT will tell me right away. With your style of entering, that's not particularly helpful to you (assuming you remember during reading the second version that you read it before), but it's very handy to me and others.

163paradoxosalpha
Nov 16, 2010, 7:07 pm

Another time it can be useful is adding something to a wishlist collection. "Oh, I have that already? Well then, time to dig it out."

164Kathleen828
Nov 20, 2010, 4:51 pm

@ 162 - No, I don't find it useful for LT to tell me that I have multiple editions. I already know I have multiple editions and I want to have multiple editions for various reasons. That is precisely why it annoys me.

165Kathleen828
Nov 20, 2010, 4:53 pm

However, I surrender. This is obviously going to continue to happen, so I shall say no more. I know, I know, I should have decided that many posts ago...:)

166LolaWalser
Edited: Nov 23, 2010, 3:54 pm

Kathleen, if your main concern is keeping the various editions separate in your catalogue, you could try adding distinctive information to your titles, PLUS a note saying "do not combine".

For example, Your Title. Signet56. DO NOT COMBINE

Not pretty, but it would prevent automatic combining, and probably most "manual" combining.

Of course, this would orphan your books, but it simply isn't possible in the current LT system to group identical editions (without ruining the concept of LT work).

167lorax
Nov 23, 2010, 4:01 pm

166>

I ignore notices like that when I see them, and grumble about the arrogance of people who add them who think that their own personal edition is so super-special that they can't stand to have it mingle with the common herd.

The editions are separate in Kathleen828's catalog. They each have their own entirely independent record with associated data. She just needs to never look at the social data, because that's where they get mingled.

168timspalding
Edited: Feb 9, 2011, 11:26 am

I have gone live with the wording "There is another version of this work in your books."

I chose "your books" because that's the interface name for all your books. Your library is the old name. I changed the color from red to green.

I think it needs to stand out. The message is there to prompt you to consider whether you really have cataloged the same thing twice. The ISBN message only happens when the book HAS an ISBN, after all. If you catalog the same older book, there will be no such message. It may also, I think, happen if the ISBNs are expressed differently (ISBN10/13 or with dashes). I haven't checked that.
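To illustrate the unchecked case above, one way to make differently written ISBNs comparable is to strip dashes and spaces and convert ISBN-10 to ISBN-13 before matching. This is a sketch only, not LibraryThing's actual code; the function name `normalize_isbn` is made up for the example. The conversion itself is the standard one: prefix 978, keep the first nine digits, and recompute the check digit.

```python
def normalize_isbn(raw: str) -> str:
    """Return a bare 13-digit ISBN for an ISBN-10 or ISBN-13 string.

    Dashes and spaces are ignored. The incoming check digit is not
    validated; this sketch only makes equivalent ISBNs compare equal.
    """
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        return s
    if len(s) == 10 and s[:9].isdigit():
        core = "978" + s[:9]  # drop the old ISBN-10 check digit
        # ISBN-13 check digit: weights alternate 1, 3 across the 12 digits.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")


def same_isbn(a: str, b: str) -> bool:
    """True when two ISBN strings identify the same ISBN after normalizing."""
    return normalize_isbn(a) == normalize_isbn(b)
```

With this, `same_isbn("0-306-40615-2", "9780306406157")` is true. Books with no ISBN at all still fall outside any check of this kind.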

Kathleen is not, however, going to be happy that LT is considering a move to a much more thoroughgoing FRBR-ish model. See http://www.librarything.com/topic/109523 . Or perhaps she will, as it will then be possible to see just the reviews applied to the Moncrieff translation. Of course, there were multiple editions of that book, and then the book was copied and sent to different bookstores, some of which dinged the side and some of which didn't. If you are talking about your physical book, and you add more than one exclamation point, such details may be important to you.

Generally-speaking, I sympathize with all cries of the heart. But if we're speaking about libraries, there is something truly silly about a system in which someone can walk into a library, look up a book, see that all the copies are checked out and not be in any way informed that the library has the same book in another format.

Fundamentally, however, if you don't want to be informed of how your item connects to the larger bibliographic world, you should be cataloging offline.

169paradoxosalpha
Feb 9, 2011, 12:40 pm

>168 timspalding:

Good change, but "There is another instance of this work in your catalog" would have been even better. What if they aren't different versions? And my total catalog has never been exclusively "my books" since I started using Collections as suggested: i.e. with Wishlist and Read but not owned (the latter primarily for reviewing purposes). The total catalog = "My books" terminology has caused a lot of anguish among Thingamabrarians in my experience.

170paradoxosalpha
Feb 9, 2011, 12:41 pm

Or "...another record of this work...."

171brightcopy
Feb 9, 2011, 12:42 pm

Or "entry". This is one place where I'm fine with that generic term.

172lorax
Feb 9, 2011, 12:51 pm

169>

"Your books" is what it says on the tab, though, which is why Tim's using it.

173timspalding
Feb 9, 2011, 12:53 pm

Right, but "catalog" is no longer used on the tab. The term is "Your books."

If we use "entry" then people will assume it's the "same" entry. I think version is best. It might actually be exactly the same version, but this leaves it open.

174brightcopy
Feb 9, 2011, 1:10 pm

173> If we use "entry" then people will assume it's the "same" entry.

I can see that, though the same line of reasoning means people will assume it's a DIFFERENT version if you say "another version." Probably a no-win scenario.

"There is another version of this work or possibly the same version but just another entry or maybe even it's not even the same thing but somehow the ISBN got screwed up and tacked onto the wrong books and do you REALLY think the author name is King Stephen and not Stephen King because, honestly, that seems much more likely...

err... In your books."

175_Zoe_
Feb 9, 2011, 1:25 pm

Please make it "among your books" rather than "in your books".

176timspalding
Feb 9, 2011, 1:34 pm

Yeah, it's a no-win. I mean, what you really care about is whether you did the same book, but it can't know. It's not telepathic. Maybe you have two identical copies and want to catalog them. Maybe you do that, but then you misshelve them and you catalog them AGAIN. What if you use four different sources? No answers. At some point you've got to remember what you have--at least remember that you don't in fact have five copies of that book.

"There is another version of this work or possibly the same version but just another entry or maybe even it's not even the same thing but somehow the ISBN got screwed up and tacked onto the wrong books and do you REALLY think the author name is King Stephen and not Stephen King because, honestly, that seems much more likely...

How about just a red, "you may or may not have screwed up there, chum."

177timspalding
Feb 9, 2011, 1:35 pm

>175 _Zoe_:

How about in Your books?

Can I use "amongst"?

178_Zoe_
Feb 9, 2011, 1:38 pm

>177 timspalding: Yeah, "in Your books" would work too, though it doesn't seem as clean.

I'd also prefer "amongst" to the status quo :)

179TheoClarke
Feb 9, 2011, 1:42 pm

Verily, a similar tome lies amongst Your books

180brightcopy
Feb 9, 2011, 1:48 pm

"Amongst your books are such diverse entries as..."

181timspalding
Feb 9, 2011, 2:04 pm

Amongƒt your books are ƒuch diverƒe entries as...

182TLCrawford
Feb 9, 2011, 2:05 pm

"you may or may not have screwed up there, chum."

I really like that one.

183prosfilaes
Feb 9, 2011, 6:29 pm

It's "Amongſt your books are ſuch diverſe entries as..."

184Heather19
Feb 9, 2011, 6:31 pm

Am I the only one who finds "another version of this *work* in your *books*" a little awkward?

On LibraryThing, I've come to understand that "work" basically means "book". I've gotten used to that, so to see that specific wording, it feels a little redundant... It's not clear that "your books" means "Your Books" and not just a general type of "the books you have" or whatever.

185TineOliver
Feb 9, 2011, 7:04 pm

168: Tim - thank you! I like this much better!

184: It's probably just semantics, but seeing as I'm pretty sure it was me who suggested that particular wording, the reason I said another version of the "work" not "book" was to do with translations (which I think came up a number of times on the thread where this originally came up). People seemed much more comfortable with the concept that Les miserables (for example) in the original French, in an English translation by person A and in an English translation by person B were the same work, but not that they were the same book, and if you owned all three, they were certainly not "duplicates" of each other (which was the old wording). I'll admit that 'instance' probably works just as well (if not better) as 'version' in this context.

Although, I'm not clear from Tim's post whether you still get the same message if you enter two versions of Les Mis with two separate ISBNs.

186timspalding
Feb 9, 2011, 8:10 pm

>183 prosfilaes:

Thanks. No ſ key I could find :)