HomeGroupsTalkZeitgeist
This site uses cookies to deliver our services, improve performance, for analytics, and (if not signed in) for advertising. By using LibraryThing you acknowledge that you have read and understand our Terms of Service and Privacy Policy. Your use of the site and services is subject to these policies and terms.
  • LibraryThing
  • Book discussions
  • Your LibraryThing
  • Join to start using.

Cataloging improvement I: Better export

New features

Join LibraryThing to post.

This topic is currently marked as "dormant"—the last message is more than 90 days old. You can revive it by posting a reply.

1timspalding
Edited: Nov 20, 2014, 1:48pm Top

Check out the blog post announcement:
http://blog.librarything.com/main/2014/11/cataloging-improvement-i-better-export...

Try the export here:
https://www.librarything.com/more/import

New features include 41 fields (up from 16/29), and export filtering.

2lorax
Nov 20, 2014, 2:10pm Top

There appear to be some formatting issues with the tab-delimited export. In several cases for my catalog there's no newline at the end of several records, meaning that the next record just starts on the same line (and thus, there are too few rows in the file.)

I'll open an actual bug report with more details.

3JBD1
Nov 20, 2014, 2:14pm Top

Niiiiice!

4timspalding
Edited: Nov 20, 2014, 2:17pm Top

>2 lorax:

Thanks. Give us the book IDs.

5lauralkeet
Edited: Nov 20, 2014, 3:44pm Top

Very happy to see this new feature!!

I ran a tab-delimited export to find books to be read (a collection), that I already own (a tag). This produces a list of books that matches a similar library search in terms of number of books and their titles.

However, in my library all of the books have an Entry Date. In the export only about 1/3 of the books have an Entry Date, and that date also appears in the "Acquired" field, which I do not use. Here are a few book IDs in case they are useful.

Entry Date blank
35786320
54250932
56653621
56836574

Entry Date=Acquired:
36417734
39368263
46678590
61176882

6Foretopman
Nov 20, 2014, 3:58pm Top

This is great.

The ability to export more fields is good.

JSON format is good. (I thought I wanted an XML export, but that was just because I didn't know JSON. Now that I've seen what a JSON export looks like I think I'm glad we didn't get XML.)

The ability to filter is good.

There's only one problem that I've found. When I search by tags I can use a minus sign (or hyphen) to search for books that DON'T have a tag. Trying to do that in a filtered export seems to cause the export to hang up processing the first book.

7saltmanz
Nov 20, 2014, 4:21pm Top

Solid. I haven't played with it yet, but it definitely look promising.

Any chance of adding the ability to export directly from "Your books"? A single button that would take into account your current view (collection/tag/search) and prompt you for export format?

8timspalding
Nov 20, 2014, 4:24pm Top

>5 lauralkeet:

Okay, I'm opening a bug for that, and pasting your comments in.

See https://www.librarything.com/topic/183237

10timspalding
Nov 20, 2014, 4:25pm Top

Any chance of adding the ability to export directly from "Your books"? A single button that would take into account your current view (collection/tag/search) and prompt you for export format?

We'll consider it. I could do it easily for most of what you do. But it's not clear I can easily do it for all searches, clicks, etc.

11LibraryPerilous
Nov 20, 2014, 4:38pm Top

12saltmanz
Edited: Nov 20, 2014, 4:44pm Top

10: Cool. Not a real necessity (I probably wouldn't even use it much) but it just seems like the next logical step.

I did my own dump (xls) of this year's additions (93 records), and only two issues jump out at me:

1. The missing entry dates from #5, upthread (I'll add detail in that bug thread.)
2. Some character-encoding issues, specifically:
• exports as •
é exports as é
è exports as è

13timspalding
Nov 20, 2014, 4:50pm Top

I'm working on the encoding. The problem is, I think, that it's not being recognized as UTF8 by programs that open it.

14timspalding
Nov 20, 2014, 5:08pm Top

I'm going to tear my hair out. I'm worried we have to do that convert to UTF-16 with extra characters to get it to be read correctly by Excel. Argh.

15eromsted
Nov 20, 2014, 5:14pm Top

>12 saltmanz:,13
I had similar issues with character encoding but when I told Excel to open the file with UTF8 everything worked properly.

16timspalding
Nov 20, 2014, 5:17pm Top

Right. So the trick is convincing Excel that it's UTF8 to begin with. Anyone have an idea how to do this? Files don't have MIME types the way files do.

17LucindaLibri
Nov 20, 2014, 5:27pm Top

>5 lauralkeet:
>8 timspalding:
Also noticed the Entry Date problem right away (also would prefer that the Entry Date column was in its old position with the other dates for easy comparison). I checked my previous export from a few weeks ago and all the books have entry dates.

Off to check the other date columns now . . . and verifying that many of the Entry Dates are in the Acquired column instead (which is a column I DO use, so concerned about where those dates ended up). And the "Date Finished/Ended" column is "Date Read" instead, which is confusing/inaccurate. I seem to be missing some finish dates for books with no start date, but I'll have to go through one by one comparing with the previous export to locate the specific books because when I sort they aren't appearing, but could be buried somewhere in the 2400 lines.

SO, while I appreciate all the options, I really just want the Export All file to be accurate . . . which right now it is not.

I was glad to see the additional author column, though many of my books have more than two, so I'll still have to use my "more author info" tag and copy those by hand. I'm thinking I would rather the extra author info port off into a single column with all the info: author#2name, author#2role; author#3name, author#3role, etc.

Must go take out the garbage, clean the litter box, and mop the kitchen floor . . . will explore this more later.

Thanks for taking a step in the right direction.

19saltmanz
Nov 20, 2014, 5:32pm Top

Looks like you changed the filetype for the TSV export just now? My initial test from an hour ago gave me an .xls file, while my second try just a minute ago gave me a .tsv. Of course, Win7 doesn't know what to do with a .tsv file. On the other hand, with the .xls Excel gave me a "this file's format and extension don't match" warning (though it opened just fine anyway.)

20LucindaLibri
Nov 20, 2014, 5:34pm Top

One more encoding thing . . . my Excel 2011 for Mac doesn't recognize .tsv as a readable file. I can change the open options to allow "all files" but then it wants to import it, which is where I can designate what type of file it is, etc. but adds several steps to the opening process. When I go to the property info of the file itself and designate it to Open With . . . Excel. It opens right away, but I'll need to check to see what happens with a few of the columns where I need it to import as text rather than General format (to preserve the 0s at the beginning of ISBNs that I've moved to the comments and private comments columns).

21timspalding
Nov 20, 2014, 5:34pm Top

We're going to have a big conversation with the team tomorrow about the best way to satisfice the file-type issue. So it may change again. Fundamentally, the file is UTF-8. Grumble.

22timspalding
Nov 20, 2014, 5:35pm Top

We may need to convert it all to that a miserable XLS format. Ick.

23saltmanz
Nov 20, 2014, 5:37pm Top

20: You can also just drag-and-drop the .tsv into Excel to open it without any issues. (In Windows, anyway; it's been like 15 years since I've used a Mac.)

24bnielsen
Nov 20, 2014, 6:45pm Top

Very nice. The tab-export converts easily to an ancient format called rdb. A side effect is that I got this list of the exported data. This might come handy when discussing anything amiss with the new export. So far I'm quite happy with it.

1 Book_Id
2 Title
3 Sort_Character
4 Primary_Author
5 Primary_Author_Role
6 Secondary_Author
7 Secondary_Author_Roles
8 Publication
9 Date
10 Review
11 Rating
12 Comment
13 Private_Comment
14 Summary
15 Physical_Description
16 Weight
17 Height
18 Thickness
19 Length
20 Dimensions
21 Page_Count
22 LCCN
23 Acquired
24 Date_Started
25 Date_Read
26 BCID
27 Tags
28 Collections
29 Languages
30 Original_Languages
31 LC_Classification
32 ISBN
33 ISBNs
34 Subjects
35 Dewey_Decimal
36 Dewey_Wording
37 Copies
38 Source
39 Entry_Date
40 OCLC
41 Work_id

25jjmcgaffey
Nov 20, 2014, 9:20pm Top

>17 LucindaLibri: re: Other Authors. I get multiple Other Authors in that column, separated by pipes; and their roles in the next field, Other Author Roles. So it might require a bit of attention, but they're all in there and so are their roles.

Order - I would truly love it if the fields were grouped more logically. All the dates together (instead of Entry Date being way off at the end). All the classifications together - as it is, LCCN is in one place, DDC a bit later, and LC Classification even further on. And all the work-level stuff together and at the end - work ID, subjects, etc. It's great to have it, but I won't be using it 99% of the time and keeping it together and out of the way would be wonderful.

Pony - a cookie or something that lets us arrange the fields in our preferred order (I can see someone wanting Work ID right next to Book ID, for instance) and that's the way they export. Or possibly export based on views, or something like views? But then not all the fields would be covered (I'll never have Subject in my views...).

And mild puzzlement - in what order do the records export? It's not book ID order, like the old TD, not entry order (which would be the same thing), not...??? Not a problem, since I resort them anyway as I investigate, but puzzling.

I can open the file nicely in Calc, because it asks anyway. I can also set the types for the fields that want it (Text for title and Dewey (otherwise it loses leading 0s), Dates for the various date fields) which helps. Theoretically it's supposed to recognize dates, but it was missing some.

Oh yeah - and the dates are coming out in different formats. Specifically, Acquired, Started and Read are displaying as day mon year (1 Apr 2014), and sort as text (if I don't specifically set it to date); Entry defaults to yyyy-mm-dd and sorts as number/date. I prefer the latter - especially for sorting, that's a heck of a lot more useful than it sorting
1 Apr 2010
1 Apr 2014
26 Jul 2011
23 Jun 2014
...

As I said, I can set it to notice them as dates, in the appropriate order (dmy, mdy, ymd) - but to do so I have to page through each column (and with Review and Subjects, that's rather a pain) before I let Calc actually open the file.

Very happy! Some tweaks, yeah sure, but this is lovely.

Oh...just noticed, I don't see the multiple reading dates. Is that not happening? Awwww. It would make it hard to sort, unless you did a(nother) separate field for "previous read dates", but it would be a very nice thing to have...

26timspalding
Nov 20, 2014, 10:41pm Top

No, fair enough. Nobody build anything based on the current order, okay?

I'll get on some of your suggestions soon.

27LucindaLibri
Nov 20, 2014, 11:18pm Top

>25 jjmcgaffey:
In excel (earlier this evening) I was getting dates formated mm/dd/yy . . . the old export gave me the nn-mmm-yy . . . so not sure if the dates format you're getting is due to using Calc rather than Excel?

I'll have to double check tomorrow about the multiple authors, but last I exported (several hours ago), the columns were labelled "Second Author" and "Second Author Role" so if multiple contributors are in those columns they need different names.

28saltmanz
Edited: Nov 20, 2014, 11:27pm Top

@25: Yeah, multiple reading dates would be (very) nice, but I'm also not sure what form that would/should take. Pipe separation would be consistent with other authors, but also seems like it would feel odd in what is ostensibly a date field.

@27: Excel for me is kinda inconsistent with the date formats: Acquired and Reading dates show as 20-Nov-14, and Publication and Entry dates show as 11/20/2014...

29LucindaLibri
Edited: Nov 20, 2014, 11:57pm Top

>12 saltmanz:
I'm getting different variants for accented characters:
é comes through as √©
"Light on Prāṇāyāma : the yogic art of breathing" turns into "Light on PrƒÅ·πáƒÅyƒÅma : the yogic art of breathing"
Brontë turns into Bront√´

I'm a bit unclear on what to choose for "File Origin" when importing into Excel. I get a drop-down list of many options:
Macintosh
Windows (ANSI)
DOS or OS/2 (PC-8)
. . .
many foreign language options
. . .
Western (ASCII)
Western (DOS Latin 1)
Western (ISO Latin 1)
Western (Mac OS Roman)
Western (Windows Latin 1)

the only UTF-8 on the list is:
Unicode 6.1 UTF-8

Is THAT last one the one I should be choosing?

30jjmcgaffey
Nov 21, 2014, 2:56am Top

>27 LucindaLibri: Ah, it's actually Secondary Author (& ...Role). So it's singular, but putting in parentheses is probably not a good idea... technically it should be Secondary Author(s), but probably not worth it.

>29 LucindaLibri: Um...probably? In Calc it's "Unicode (UTF-8)", looks about the same. Does it come out clean (without weird characters) if you choose that one?

>28 saltmanz: Huh. Tried it again on a different machine - the exact same file, I think the same version of Calc (though that may not be true), and nothing set on the date fields (I didn't choose any options) - and I got dd/mm/yy for all four date fields. So possibly not anything that needs changing. Oh, no, on more careful examination - while it shows up as dd/mm/yy, it's a text field not a date (sorts wrong). But if I choose "detect special numbers" they are dates. So that's a matter of how smart Calc is being - it exports (checking in a text program) as the unquoted text dd/mm/yy, and it depends on the program reading it correctly and displaying it reasonably. And I have no idea why it would display them differently in different fields - same format (to my eye, at least).

Though it's really weird (on Calc's part, not LT's) that it could read the numbers as dates, write them as dd mon yyyy, and then sort them as text...

31bnielsen
Edited: Nov 21, 2014, 6:36am Top

>30 jjmcgaffey: Too few data :-)

The columns with dates are:
9 Date
24 Date Started
25 Date Read
39 Entry Date

I get the formats (given by example :-)

1975
2013-02-24
2012-06-14
2007-10-28

I think the difference between your dd/mm/yy and my yyyy-mm-dd is a setting on our profiles? Maybe you have the default date format and I have chosen the ISO standard?

Yes, Home/Profile/Edit profile and account/Account Settings/Data format

https://www.librarything.com/editprofile/account

32timspalding
Edited: Nov 21, 2014, 9:05am Top

The date formatting on your profile won't change this. Perhaps it should, but I think we're better off with standardization.

What's the best date format for us to provide? Whatever that is, we'll do it. YYYY-MM-DD?

33bnielsen
Nov 21, 2014, 10:28am Top

>32 timspalding: My vote is for YYYY-MM-DD. But anything I can parse is okay with me.

34Lyndatrue
Edited: Nov 21, 2014, 11:40am Top

>32 timspalding: I would like to second the vote for YYYY-MM-DD, please. I've been following along on most of the export threads even though I don't export data (although I probably should, as a backup, if for no other reason), but that format is the most useful.

We could always just write out the date properly, such as:

November twenty-first, two thousand and fourteen

Of course, to be proper, that would have had a lot more capitalization.

35LucindaLibri
Nov 21, 2014, 12:29pm Top

>30 jjmcgaffey:
Yes, when I went back and exported again I see the pipes in the Secondary Authors & Roles columns. I then added a space before and after each pipe to make it more readable (quick find & replace). Will try another export later today and open as Unicode 6.1 UTF-8 to see what happens. Thanks!!

36timspalding
Nov 21, 2014, 3:00pm Top

Okay, four options now:

37lauralkeet
Nov 21, 2014, 3:35pm Top

>36 timspalding: Export as Excel -- wonderful.
This works great if I click on the link and tell Windows I want to download the file, and then I open it in Excel. The downloaded file has an "xls" extension. I get an error about the file being in a different format than the extension, but click OK and it loads into Excel just fine.

If I click on the download link and tell Windows to open it in Excel instead of downloading, I get a different error message: "Excel cannot open the file .xlsx because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file"

Note that in the second case the extension is "xlsx".

I'm using Win7 and Excel 2010.

38anglemark
Nov 21, 2014, 5:43pm Top

Still a bad tab text and bad buttons on localised sites.

39jouni
Nov 21, 2014, 6:07pm Top

Voting for date format YYYY-MM-DD, more readable than unix timestamp.

> 36 Excel XLS is a closed file format owned by Microsoft. They have nowadays released a 1000+ page specification about it, but it's a bit hard to read :) Excel XLSX is more or less open file format created by Microsoft. Some parts refer to XLS specification, so it's even more complex. Quite achievement :)

Just saying that trying to create such files is hard. I would stick in CSV or TSV, since Excel can import those (at least with some human help).

40Keeline
Nov 21, 2014, 6:10pm Top

The Excel export still appears to be tab-delimited text (completely what I'd expect and recommend). Excel can deal with this most of the time. Occasionally the OS gets in the way.

I have found in the past that it is sometimes necessary to provide a MIME type hint when the file is downloaded. The best one to use for Excel is:

header("Content-type: application/vnd.ms-excel\r\n\r\n");

It is sometimes necessary to add this to a file like mime.types to help associate the extension with the type in addition to your dynamic code in PHP.

James

41jouni
Nov 21, 2014, 6:10pm Top

Bug: after selecting "export as TSV" I get on page, where I seem to be able to change export format with radio buttons. Selection does not seem to have any effect i.e. even after changing to JSON I still download TSV file. The webpage url seems to contain file format selection, did not change.

42jouni
Nov 21, 2014, 6:28pm Top

Bug: run my book catalog JSON through JSONLint, got error note:

Error: Parse error on line 5287:
... "copies": "1" }}{ "53240931": {
---------------------^
Expecting 'EOF', '}', ',', '', got '{'

If output isn't valid JSON, then I cannot import it and do anything with the data :(

43ringman
Nov 21, 2014, 8:08pm Top

I would like an option to omit the green.

44bnielsen
Nov 22, 2014, 3:30pm Top

>42 jouni: Please see http://www.librarything.com/topic/183318

sed -e 's/^}{/,/'

will fix your broken json output file and hopefully Tim et all will fix the buggy json exporter soonish, so we don't need to fix it .

PS If you post the bugs in the group Bug Collectors the chances of getting them fixed are much better than if you just post them as posting #42 in a long thread.

45LucindaLibri
Nov 23, 2014, 5:05pm Top

Finally got back to checking some more things in the new export.

Used Export as Excel, which still walks me through the import steps when I try to open it, but turns out I need/want to do that anyway so I can specify column formats, etc.

Choosing Unicode 6.1 UTF-8 as the "File Origin" in Excel got rid of the odd characters, even in the Sanskrit words/titles! Yeah.

Excel is messing with the date formats, so I have to reformat once I open the file. (While I'm importing I see them as yyyy-mm-dd, which is what I want, but then when the file opens they're mm/dd/yy . . . definitely an Excel thing, I'll need to experiment with changing the date options in the import phase . . . still I can reformat once it opens, so okay.)

Still need to check my special list of 30+ books that had dates problems to make sure they are now all fixed, but everything else is looking good.

46ScarletBea
Nov 24, 2014, 3:04am Top

I also played around with it this weekend, and boy, did I think I had a clean catalogue?
I spent some hours putting 'original language' in most of my books, because I realised that when I see them as a green 'english' on the screen, it just means that it's guessing and when it exports, it puts whatever weird language someone included in *their* books (Nick Hornby wrote originally in german? really?)
I see many more similar fields, and many happy hours of tidying up :)

47jjwilson61
Nov 24, 2014, 9:31am Top

That doesn't make sense. If you're getting a green English in Your Library then it should either export as English (Tim's already done that for some other fields that can have green entries) or nothing. Exporting German in that case is bizarre. I think you should report it as a bug.

48jjmcgaffey
Nov 24, 2014, 3:42pm Top

Tim said green text would export now - so yeah, if you're seeing English the export should show English.

49ScarletBea
Nov 25, 2014, 2:44am Top

Thanks :) I've already fixed all of that field, but at least that will save me fixing the other fields where I see green entries (if they're ok).

50birder4106
Nov 28, 2014, 10:21am Top

For that I was waiting for a long time...
Thank you Tim et all.

I was trying to export my library and parts of it (filtered library) as excel, as tab-separeted and as json files.
But nothing worked. The weehl turns for hours and hours ...
(Win7, Windows Explorer)

Am I alone with that problem?

Martin

51timspalding
Nov 28, 2014, 10:33am Top

Ammar—help him out?

52librisissimo
Nov 29, 2014, 5:29pm Top

Had posted this earlier today on 183319, but think it is better here (didn't find this thread earlier) - Martin is not alone (#50):

I have tried to export tab-delimited files, and the "working" icon comes on but stays stuck at either 100 records processed (a few days ago) or 1 record processed (today). The Excel option doesn't even start up the working icon. Windows 8 with Internet Explorer as browser.

53Foretopman
Nov 30, 2014, 3:25pm Top

>50 birder4106: You say you are doing a filtered export.
>52 librisissimo: Are you filtering or exporting all?

I suggest you both take a look at this thread:
https://www.librarything.com/topic/183238

I was the original reporter of this "bug", but I don't think it's actually a bug, I think it's really about using "tags" and "search" for different things.

54birder4106
Dec 1, 2014, 3:56am Top

>53 Foretopman:

I've tried both. To export all books and to export single collections (for example: my library)
I also tried to export as excel, as json and as tab-delimited text.

55librisissimo
Dec 2, 2014, 12:19am Top

I tried once to export my smallest collection (only 3 books) in case there was a size-of-file problem, but it didn't work either.
Today (exporting all records as Excel file) I didn't even get the turning wheel icon, just the message "attempting export" and then nothing (waited about ten minutes to see if the processing completed but no dice; in the past exporting has taken less than a minute).
I do have an inordinate number of tags in every record, if that is a consideration for some reason.

56librisissimo
Jan 6, 2015, 10:32pm Top

JFTR, since I was the last poster, my export is working now.

57LucindaLibri
Feb 9, 2015, 1:47pm Top

Another FYI: I discovered the hard way that the new export switched the order of the Collections and Tags columns. I was combining records from different 2014 exports. I knew I would have to deal with new columns and that the date columns were in a different place, etc. but after I thought I had everything lined up just right and sorted it altogether :( I found the mixup with the Collections and Tags columns. I fixed everything using copy and paste and an empty column, but just thought it was worth a warning for anyone who is combining data from the various exports.

Now off to make myself a cheat sheet of reminders about the old and new columns, character formats, etc. and what I need to remember to do when exporting.

(Not an LT problem, just a LucindaLibri old brain/failing memory problem :)

58librisissimo
Apr 13, 2015, 12:58pm Top

JFTR, a complete export of my file on 2015-04-012 does not have this problem.

59CarolO
Apr 25, 2015, 2:41am Top

Sorry, late to the party, I didn't see this in the prior posts...

I did a small test filtering for a single tag, the export included books without that tag...but they did have that tag many months ago.

I filtered for the tag "want" and the book People of the Book was included in the export without the tag. I probably removed that tag last summer or early fall from that book. There are other examples in my export if needed. If it is working off of old info then it seems like the tag should also show up - but it doesn't.

Group: New features

45,201 messages

This group does not accept members.

About

This topic is not marked as primarily about any work, author or other topic.

About | Contact | Privacy/Terms | Help/FAQs | Blog | Store | APIs | TinyCat | Legacy Libraries | Early Reviewers | Common Knowledge | 134,234,487 books! | Top bar: Always visible