Extend meta data from other/several sources?

1hbbk
Apr 16, 3:48 pm

I am adding the library of a club with >1500 books to LibraryThing. The majority are older than 1975 and have no ISBN or barcode. I use the phone app and scan the barcode (if there is one) or use speech-to-text for the title and sometimes the author until I get hits for the book in the data sources. Initially, I only see whether a hit has a cover photo, the title and the author in the respective system. Much later, on the computer, I see which other metadata fields came from the selected source (or rather: which are missing). In many cases, the selected source has no Dewey number, which is my biggest pain. Other metadata fields are also filled in one source but not in another. It would be great if there were a function to collect metadata fields from several (or all) source systems. Does anyone have ideas on how to extend the metadata fields in my LT catalog from source systems?

In "Search where?" I currently use these sources for my German books:
- Amazon Deutschland, Bücher
- GBV - Gemeinsamer Bibliotheksverbund / GBV Common Library Network (Göttingen, NI)
- Amazon Deutschland, alle Medien
- Universität Wien, UBW / University of Vienna (Wien, Wien)
- Library of Congress (Washington, DC)
- Amazon.com, Bücher
- Overcat (über)

2SandraArdnas
Apr 16, 4:16 pm

There is no way to automatically pull data from several sources.

On desktop, you can preview what data will be imported beyond the basics. There's a ? to the right of each search result, and clicking it expands the result to show all the data the source provides. AFAIK, this isn't possible through the app. Since non-ISBN cataloguing through the app isn't much faster than on desktop, perhaps switching to desktop for adding those might alleviate some issues.

Also, Amazon will never have some data that only libraries include, most notably Dewey classification and subject headings. So, prioritize library sources if those matter. DDC can be edited, so worst case you can add it later. Subject headings cannot be edited and can only come from a library source.

3gilroy
Apr 16, 5:00 pm

Amazon is one of the worst sources for pre-ISBN books. You should make it a last resort.
If you want good details, you want to prioritize libraries.

But you can't draw from multiple sources.

4MarthaJeanne
Apr 16, 5:17 pm

>3 gilroy: Manual entry is easier than using old Amazon data and having to edit every field.

5gilroy
Apr 16, 5:20 pm

>4 MarthaJeanne: I do believe that still fits what I said. Amazon as last resort, which would put it after Manual Entry.

6Buchmerkur
Apr 17, 5:34 pm

Did you try the DNB (https://portal.dnb.de/opac/showSearchForm), the catalog of the Deutsche Nationalbibliothek (German National Library), Leipzig? They have books from the 1970s and older.

7hbbk
Apr 19, 11:29 am

Thank you for your ideas. I exported a CSV file for the >500 books I have already added to my LT catalog. When I sort it by ISBN, Dewey, or Source, I see no clear pattern that any of the sources is better than the others: not Amazon.com, not Amazon German books, not the German National Library, not GBV - Gemeinsamer Bibliotheksverbund / GBV Common Library Network (Göttingen, NI).
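A more systematic check than sorting by eye would be to compute, per source, how often each field is actually filled. The following is only a minimal sketch: the column names (`Source`, `Dewey`, `ISBN`, `Author`) and the sample rows are assumptions for illustration, not the actual LT export layout, so they would need to be adjusted to the header row of the real file.

```python
import csv
import io
from collections import defaultdict


def fill_rate_by_source(csv_text, source_col="Source",
                        fields=("Dewey", "ISBN", "Author")):
    """Return {source: {field: fraction of rows with a non-empty value}}."""
    totals = defaultdict(int)                       # rows seen per source
    filled = defaultdict(lambda: defaultdict(int))  # non-empty cells per source/field
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get(source_col) or "").strip() or "(no source)"
        totals[source] += 1
        for field in fields:
            if (row.get(field) or "").strip():
                filled[source][field] += 1
    return {
        source: {field: filled[source][field] / totals[source] for field in fields}
        for source in totals
    }


# Made-up rows; the column names are hypothetical, not the real export header.
sample = """Title,Author,ISBN,Dewey,Source
Book A,Author One,,193,Library X
Book B,,,,Shop Y
Book C,Author Two,,,Shop Y
Book D,Author Three,123,830,Library X
"""

rates = fill_rate_by_source(sample)
for source, by_field in sorted(rates.items()):
    print(source, by_field)
```

For a real export, the file contents would be read with `open(path, encoding="utf-8", newline="").read()` and passed in place of `sample`. A fill rate only says whether a field is present, not whether its content is correct.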

It would be incredibly time-consuming to import each book individually from various sources and then check which one provides the best data. Only automated processing in LT would be feasible. Otherwise, I would have to perform a web search for each book and manually add missing fields, which is also incredibly tedious with hundreds of books.

Currently, I'm doing exactly this manual work for books that weren't assigned an author name during import into LT. Then I also try to add missing publication data and publishers.

I had hoped for more automated support from LT.

Regards, Jürgen

8SandraArdnas
Edited: Apr 19, 12:04 pm

>7 hbbk: It is unclear now whether you're doing a mass import ( https://www.librarything.com/newimport/main ) or entering books individually, and whether by "automatically" you mean getting all the data without checking which source provides it. Also, libraries rarely have missing basic data, such as authors. If you haven't already, check all the German sources available and see if any others are likely to have your books. There are around 30, some university and some public libraries, as far as I can see.

If you give us some idea of the kind of books, several examples and your workflow, perhaps we can suggest something that will speed things up.

ETA: I checked some records in your catalogue and out of a dozen citing literally 'unknown author', all are from Amazon. Amazon is really terrible for older books. The data comes from their marketplace, so it's as good or as bad as those selling it make it, which is usually the latter.

9hbbk
Apr 19, 12:30 pm

I'm not doing a bulk import because I don't yet have a complete catalog of my books. I developed the following workflow because it seemed the most efficient: I take the books from a shelf and process them from left to right. If the book has a barcode, I scan it with the app. If not, I enter keywords from the title and author using the speech-to-text function on my iPhone or by typing. LT then searches the libraries I specified above. From the search results, I select one with the correct cover image. If there's no match with a cover photo, I take one myself and add it. (Unfortunately, some photos get lost in the process, and I have to photograph the book again later. It seems to be a bug in LT.) Later, on my computer, I see missing data fields and painstakingly try to correct them.

I've cataloged over 500 books this way. I still have over 1000 to go. Do you have any better suggestions on how I could proceed?

10lilithcat
Apr 19, 1:18 pm

>9 hbbk:

I would suggest that, rather than selecting the one with the correct cover image, you use the “?” link next to the titles to find the one that has the correct (or most correct) data. It’s far easier to use a photo that is already on the work page* or add one of your own, than it is to correct a slew of incorrect data.

* I caution you against using an Amazon photo, as these will change if they are changed on the Amazon site.

11SandraArdnas
Apr 19, 1:53 pm

>9 hbbk: You can specify the publisher too. I'd search 'title, author, publisher' or just 'title, publisher' if the title is distinctive enough. Note that for libraries you have to separate them with a comma. Amazon doesn't care, but I'd really ditch Amazon altogether for older books; even entering manually is faster than correcting that data. Desktop adding of books gives you more flexibility than the app, so I'd definitely switch for anything not scannable by ISBN. Add all sources that might be helpful; you're not limited to just a few, and it's a single click to search in the next one if the previous doesn't have what you need.

Also, books with no ISBN will not have any cover associated with them, because covers are tied to the ISBN (or the ASIN when you use Amazon as a source): without an identifier, the system has no way of knowing which edition it is. But it's possible there's an appropriate cover uploaded already and you just need to choose it once you enter the book.

12MarthaJeanne
Apr 19, 1:56 pm

You also might want to check that your author is in the correct format, and also whether the book has autocombined into the correct work. This is more necessary in German books because there are fewer existing editions in the works.

13hbbk
Edited: Apr 19, 2:10 pm

>10 lilithcat: This sounds good. But in my phone app I cannot find the “?” link next to the titles.
I tap "Add to Catalog", enter title and author (of such an old book without ISBN), and get a results list with photo, title, author, source. But there is no “?”.

Addition:
Ok, I see: You suggest the Web-App (PC) for adding such books. There is a “?”. I will try that. Thanks!

14hbbk
Apr 24, 5:03 pm

I tried what was suggested by lilithcat for some books. Unfortunately, the result was unsatisfying. Many of the old books I found in several source systems, e.g. German national library, GBV - Gemeinsamer Bibliotheksverbund, and sometimes more than one from Amazon.

In most cases, each of these entries has a few fields. But each of them has different fields. Even worse: In many cases, some of these fields have wrong data. Sometimes they have wrong characters in e.g. the title, sometimes complete nonsense in the title, wrong names in the author fields, and more errors.
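As far as this thread goes, LT has no built-in way to do this, but the "each entry has different fields" situation can be handled offline by merging the records field by field. The following is a rough sketch under stated assumptions: the source names, field names and values are made up, and `merge_records` is a hypothetical helper, not an LT function. It fills each field from the most trusted source that has it, and lists disagreements for manual review instead of silently picking one.

```python
def merge_records(records, priority):
    """Merge per-source records field by field.

    records:  {source_name: {field_name: value}}
    priority: source names, most trusted first
    Returns (merged, provenance, conflicts).
    """
    merged, provenance, conflicts = {}, {}, {}
    for source in priority:
        for field, value in records.get(source, {}).items():
            value = (value or "").strip()
            if not value:
                continue  # this source has nothing for the field
            if field not in merged:
                merged[field] = value
                provenance[field] = source  # remember where the value came from
            elif value != merged[field]:
                # A lower-priority source disagrees: keep it for manual review.
                conflicts.setdefault(field, []).append((source, value))
    return merged, provenance, conflicts


# Made-up records for one book; all names and values are illustrative only.
records = {
    "Library X": {"title": "Ion", "author": "", "dewey": "184"},
    "Library Y": {"title": "Ion", "author": "Platon", "publisher": "Example Verlag"},
    "Shop Z": {"title": "Ion (German Edition)", "author": "unknown author"},
}
merged, provenance, conflicts = merge_records(
    records, priority=["Library X", "Library Y", "Shop Z"]
)
print(merged)
print(provenance)
print(conflicts)
```

Putting library sources ahead of shop sources in `priority` mirrors the advice given earlier in the thread. The merge cannot detect values that are wrong in the most trusted source, so the conflict list is only a hint about where to look.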

Example: We have a collection of nine books with texts by Platon (Plato, the ancient Greek philosopher). The record from the German National Library shows strange characters in the title, like "Bd. 1. Iōn". The author field does not show Platon at all but "Heinz Mitwirkender Hofmann Friedrich Übersetzer Schleiermacher". This is the entry: https://www.librarything.com/work/36220041/details/312199801
I could add many more terrible examples.

At the end, I started correcting many entries manually to have something usable at all.

-> My question now is: How safe / how stable are these manual changes? (I am worried that changes from someone else can overwrite my changes because someone said that when I change the cover photo, it might be changed again when the photo is changed in the source system.)

15SandraArdnas
Apr 24, 6:33 pm

>14 hbbk: No one can edit your data without logging into your account. The same goes for a cover that you either upload yourself or choose from those uploaded by someone else on LT. (Amazon covers can change or disappear if they no longer have the book.)

There are often issues with umlauts, diacritics and such, depending on what encoding is used by the source. I'll leave to someone more experienced to comment what sources have or don't have them.

16LeslieWx
Apr 24, 6:45 pm

>14 hbbk: In most cases, each of these entries has a few fields. But each of them has different fields. Even worse: In many cases, some of these fields have wrong data.

Yes.
I've been here 13 months, cataloging books no older than about 120 years, and had the same problem. Good metadata is hard to come by! And typos are not a modern invention.

ALSO, I have learned that classification systems (Library of Congress, Dewey Decimal, Melville Decimal System) are not quite as full of hard-and-fast rules as I thought, and MARC records have an option for "local call number"s, and sometimes even the people doing the cataloging at the Library of Congress are ill-trained or overworked or I-don't-know-what. I've become less sure that there's one totally right way to catalog a book and more comfortable using my own judgement. Sometimes I'm happy about that, sometimes not!

I have almost 2000 books cataloged so far; about 11% are completely manual entries. About 25% I've marked as being quick entries, meant so I know what titles by whom we've (temporarily) got in which boxes. Of the remaining 1200 or so books, I bet I've added or changed metadata on over 1000 of them. Picky? Crazy? You decide.

How safe / how stable are these manual changes?
Your textual (metadata) changes are as safe and stable as the LT servers and the internet. No other user can change them.

someone said that when I change the cover photo, it might be changed again when the photo is changed in the source system
I believe that that's only true of covers that are in the "Amazon covers" options, because those are live links to whatever Amazon has put up for that title/ISBN/ISSN. If you upload your own photo, or you download a picture of a book from somewhere to your computer and then upload it, that picture then exists on the LT servers. I believe that if you tell LT to "grab a photo from the web", that picture also then resides on the LT servers.

=====
On a related note: have you discovered Worldcat.org? An account is free, and it's a great way to find library records for even hard-to-find works. I've done a lot of manual entries from library records I've found that way; I've also gone to the "add sources" link on the "Add books" page and added a lot of libraries to my search options once I realize they've got good collections & metadata in areas of my interests.

17MarthaJeanne
Apr 25, 2:27 am

Amazon covers can change. Very rarely, the member-cover best guess for an ISBN can change (for example, if a movie comes out and suddenly lots of people buy copies with a movie-themed cover), but member covers chosen by you are stable.

18hbbk
Apr 25, 2:46 am

>16 LeslieWx:

On the one hand, it's reassuring to hear that the data is secure. On the other hand, it's also a shame that everyone works on their own catalog and doesn't collaborate on improvements. For example, I worked on Wikipedia for over ten years. There, many people collaborate to create the best possible database and continuously develop it further.

I wasn't familiar with WorldCat.org, thanks for the tip!

With your advice, I now feel confident continuing to work on my data. Thank you very much!

By the way, one idea I have is to have AI generate Dewey/LCC categorizations in Excel spreadsheets and then add them to the books using PowerEdit. Or does anyone have a better idea?

19MarthaJeanne
Edited: Apr 25, 3:20 am

There seems to be an opinion going around that there is a single 'right' classification for each book. Many books cross categories, and the person who sets a classification has to choose which to use. This can be different for different libraries. For example https://www.librarything.com/work/57170/classification

You can't really use Power edit for classification data unless you have several books that you want to give the same call number to.

We do collaborate on work data. Common Knowledge and related areas are worked on by many people. If you change the birth date for an author in the CK area of the Author page, it is changed for everybody (and anybody can change it back). I do not have to set birth and death dates and gender information for each of my authors, because those with that information have already done it. On the other hand, only I know which edition of Tom Sawyer is on my shelf, so there isn't much point in collaboration on the publisher or date of my copy. Original publication date is CK.

Although the computer does most of the work of combining, members do a lot of combining and separating to correct the autocombiner and fix things it is not suited to. We also collaborate in cleaning up spam. LT could not exist without a lot of collaboration.

20LeslieWx
Apr 25, 1:09 pm

>19 MarthaJeanne: There seems to be an opinion going around that there is a single 'right' classification for each book.

I've been a voracious reader for many, many decades. My normally don't-rock-the-boat mother argued the city library into letting me check out books from the adult section without her presence when I was in late elementary school. (That is somehow more shocking and amazing as I think of it now than it was when it happened in front of me then.) I helped the school librarians before/after school or during free periods in both elementary and high school. I currently frequent 3 public and 1 university library. And yet, until I started 13 months ago to catalog on LT the books we own, it was indeed my understanding that there was one 'right' classification for each book.

Isn't it great that there are always new mistakes to make, new things to learn?

That said ... I am NOT going to buy paint for the cats!!!

21davidgn
Apr 25, 1:20 pm

>18 hbbk: You know, "collaborative multi-source metadata retrieval, harmonization, and refinement" does sound like an interesting long-term goal for LT.

22LeslieWx
Apr 25, 1:30 pm

>21 davidgn: Oh my goodness, "data harmonization". A buzzword (buzz phrase?) from the jog my non-library career took in its last few years, just a few years ago. And on a Saturday!!

I may need to log off and go grab a mystery to recover ...

23davidgn
Edited: Apr 25, 1:34 pm

>22 LeslieWx: Clearly the trick would be to figure out how to make it painless. e.g., how many people select a given variant for a given row, given the choice, among those who choose to make a choice.

24LeslieWx
Edited: Apr 25, 1:39 pm

>23 davidgn:
Heads to the basement library for that mystery ...

25MarthaJeanne
Apr 25, 2:35 pm

>20 LeslieWx: But if the cats want paint?

Actually, I did a tagmash of unrelated tags (Cats, art). That book was first on the list and had an interesting title, and also provided different call numbers in both Dewey and LCC. I have not read the book, and do not have access to it. Nor do I have cats.

26LeslieWx
Apr 25, 6:55 pm

>25 MarthaJeanne: But if the cats want paint?

No.
That way lies madness. Especially with a German Shepherd around who would want to supervise.

But the book might be a good present for a couple people I can think of ...