1timspalding
I'm thinking of redoing the export formats. Right now we have two formats, a CSV (comma-separated values) and a TSV (tab-separated values). As many have pointed out, they are imperfect.
I want to change them, but the code is crap. I need to rewrite them from the ground up. Ideally, I'd like to switch to an XML format. There is no good way to represent book data in a row/column format like CSV or TSV. Something like "other authors" has to be squished into a comma-delimited list, for example. There is also a fun new application that uses LT data that I'd like to have use good LT data—which, in this case, means XML.
Anyway, I need to keep the amount of coding and re-coding down. If you care about exports, please tell me:
Question: Assuming there are three formats—XML, CSV, TSV.
1. If there could be two export formats, which two would you pick?
2. If there could be one export format, which would you pick?
Please answer 1-2 before suggesting anything else.
2Foretopman
1. XML and flip a coin
2. XML
3lorax
1. TSV, XML.
2. Not XML. I want something I can easily manipulate in a spreadsheet or Perl script.
Really, though, as long as it's complete, TSV versus CSV doesn't matter much. I mostly voted for TSV because the current CSV is so pathetically incomplete, and I would be absolutely furious if you left us with no way to have a reasonably-complete export.
ETA: Numbers refer to the questions in Tim's post, not to posts numbered 1 and 2.
4brightcopy
1. TSV and XML. Tab delimited data is less problematic than CSV in many ways. The main one is that in most datasets (like this one), you don't find tabs in the actual data like you do in CSV. Otherwise, you have to worry about quoting and such. I especially like that the ubiquitous Excel handles tab delimited data MUCH more sanely than CSV. XML is the programmer-ish solution. I'm not a big fan of XML as a technology, but it serves a purpose and most languages have a library so you can easily pull it in and change it into your other favorite format like JSON.
2. TSV, with the caveat that the multiple value fields are separated by something programmatically useful, like a pipe character. Commas work poorly for multiple values, even for the non-techie user who might just be using Excel. I think for this dataset, this would work as well as XML.
The big gotcha here is line breaks. XML is nice in that you get to have line breaks (albeit with XML's annoying penchant for collapsing whitespace). Any line-by-line format like TSV and CSV will always require either data loss (stripping the line breaks) or kludges (substing in a special character for line breaks).
The thing is, if you did it right, all you need to write is XML and a function to transform it. For example:
<Books>
<Book>
<Author>Smith, John</Author>
<Title>Fantastic Thriller</Title>
</Book>
<Book>
<Author>Doe, Jane</Author>
<Author>Jones, John</Author>
<Title>Collaborative Novel</Title>
</Book>
</Books>
could be transformed to
author \t title
Smith, John \t Fantastic Thriller
Doe, Jane|Jones, John \t Collaborative Novel
Very easily and with no book-specific code. Just some generic XML-to-tabs code.
ETA: And if you go for fancier XML, something like:
<Author role="illustrator">Doe, Jane</Author>
this could become:
Doe, Jane(illustrator)
in the TSV. As long as it was done consistently and in a way you could easily rejigger the data using common tools like Excel, you should be able to map things pretty easily.
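A minimal sketch of the generic XML-to-tabs idea described above, in Python. The element names come from the example in this post; the function name `xml_to_tsv`, the pipe separator default, and the handling of a `role` attribute are assumptions made for illustration, not anything LT has implemented. Escaping of tabs and line breaks inside values is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the example above (hypothetical, not a real LT export).
SAMPLE = """<Books>
  <Book>
    <Author>Smith, John</Author>
    <Title>Fantastic Thriller</Title>
  </Book>
  <Book>
    <Author>Doe, Jane</Author>
    <Author role="illustrator">Jones, John</Author>
    <Title>Collaborative Novel</Title>
  </Book>
</Books>"""


def xml_to_tsv(xml_text, multi_sep="|"):
    """Flatten each child of the root element into one tab-separated row."""
    root = ET.fromstring(xml_text)
    columns = []  # column order follows the first appearance of each tag
    rows = []
    for record in root:
        row = {}
        for field in record:
            if field.tag not in columns:
                columns.append(field.tag)
            value = (field.text or "").strip()
            role = field.get("role")  # assumed attribute, as in the ETA above
            if role:
                value = f"{value}({role})"
            # Repeated tags (e.g. several authors) accumulate in one cell.
            row.setdefault(field.tag, []).append(value)
        rows.append(row)
    lines = ["\t".join(c.lower() for c in columns)]
    for row in rows:
        lines.append("\t".join(multi_sep.join(row.get(c, [])) for c in columns))
    return "\n".join(lines)


print(xml_to_tsv(SAMPLE))
```

Nothing in the function refers to books specifically: it only looks at whatever child tags each record happens to contain.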
5kevmalone
1. XML, TSV
2. TSV
And if you go for XML, why not keep it simple? I think you're making a rod for your own back if you go for the complex stuff.
6wademlee
1. TSV, XML
2. TSV
I'm not a programmer. If I want the data (MY data), it's primarily to do simple manipulations or to export from LT to import into some other database/site/spreadsheet.
8aethercowboy
1. XML
2. TSV (originally CSV)
On second thought, it's a whole lot less likely for a book title or other bibliographic information to contain a tab character versus a comma....
9infiniteletters
I also prefer XML and TSV because book data will have commas.
10lucien
1. XML, TSV
2. XML*
For reasons that have already been said. XML seems cleanest for multiple values, loses the least info, and can be used to create the others. TSV because of the likelihood of commas in other fields. The asterisk on number 2 is just to say that if there is only one, XML is my personal preference. If you asked me which is best for the site (and the largest number of users), I'd say TSV.
Also, thank you for working on this. It'll be great to have a clean and complete export available.
12andyl
1. XML and TSV
2. XML
Comments:
#3 Not XML. I want something I can easily manipulate in a spreadsheet or Perl script.
I'm pretty sure basic scripts (in Perl or other languages) would appear in no time to extract reasonable subsets of data as TSV (or other formats). I know I am a programmerly type, but processing XML isn't too difficult.
#4 albeit with XML's annoying penchant for collapsing whitespace). Any line-by-line format like TSV and CSV will always require either data loss (stripping the line breaks) or kludges (substing in a special character for line breaks).
The solution could be done in XML without any collapsing whitespace or kludging. XML parsers shouldn't (according to the spec) throw away any non-markup whitespace - however some (by default) do. However the DTD can be defined such that tags have xml:space="preserve" defined which will fix things even in those cases.
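A small sketch of the whitespace point, using Python's standard `xml.etree.ElementTree`. The `Review` element and the helper `read_review` are hypothetical names chosen for illustration; this parser hands back text content with its line breaks and indentation intact, and the `xml:space` attribute is simply exposed under the XML namespace for applications that want to honor it.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def read_review(xml_text):
    """Return the element's text and its xml:space setting (if any)."""
    elem = ET.fromstring(xml_text)
    return elem.text, elem.get(f"{{{XML_NS}}}space")


doc = '<Review xml:space="preserve">Line one\n  indented line\n\nLine four</Review>'
text, space = read_review(doc)
print(repr(text))
print(space)
```

Other libraries may behave differently by default, which is the case the `xml:space="preserve"` declaration is meant to cover.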
13ari.joki
Whatever you choose, please PLEASE do retain something that we who don't know the difference between PERL scripts and pearl necklaces can easily use. And "easily" does not mean "get the awfully most recent version of XXX and install this this and this script, addon, tool".
I can see the advantages of XML, I must admit. I also remember that in many applications that I have seen, "simple XML" and "flexibly useful XML" have not been simultaneously achieved.
I just don't know how I personally am going to be able to take advantage of the advantages of XML.
I also wish that whatever easy format is taken, the Universal Import will continue to digest that format, with all the data fields there.
so. my votes are a direct copy from #6, wademlee
1. TSV, XML
2. TSV
14justjim
Q1. TSV & *SV (where * is a character of your choice, probably from a limited list)
Q2. TSV
Most datasets that I have been forced to work with recently are pipe (|) separated. I can make MySQL or MS Access (at smaller sites) work with that just fine.
The only time I ever work with XML is for sites that have/want Crystal Reports developed. If you think that I'm going to develop a Crystal report for my LT data, you are wrong, my friend, wrong.
Way back in the dim, dark ages (2007), I'd take an export, when I remembered, as a backup in case Tim took my $25 and did a runner to Belize or something.
Since the move to the new server farm last (?) year, I haven't bothered. I've still got my books, what's the worst that could happen? I have to re-catalogue them?
I've actually seriously thought about starting another paid account and doing that anyway, using all the tagging and collection-ing features; and more importantly, not using Amazon.
15MarthaJeanne
TSV
16paulhurtley
1. XML, TSV
2. XML
171dragones
To answer the questions posed in message #1,
1. TSV, XML
2. TSV (or maybe something else I could manipulate in a spread sheet)
XML is my second choice only because others seem to know what use they could make of it. I don't know if the XML format would be useful for me or not, as I know little about it.
18aulsmith
1 & 2: TSV
Like 1dragones, I have no idea how I could use XML. The only thing I use export for is a backup for the data in case LT goes belly up. In that case, I'd have to be able to fiddle with the data and get some kind of delimited txt file of my essential data. XML doesn't sound handy for that.
19saltmanz
1. TSV, XML
2. TSV
I'm certain that XML will be worthless for the vast majority of LT users.
20andyl
Supposedly Excel (2007 and more recent editions) supports XML reasonably well now.
It looks to be fairly easy to map a LT XML export into a spreadsheet (and that mapping could be distributed here on LT too). There can be many different mappings which could be shared amongst the users.
For OpenOffice things aren't quite as easy but similar import filters could also be shared.
21brightcopy
My experience is that there are myriad ways in which various XML libraries decide to do things not quite right. Each one seems to have its own quirks. That's why I think XML should definitely take a backseat. TSV isn't perfect, either, but IMO it is more user-friendly for a greater number of people.
22ari.joki
>20 andyl:
Yes, andyl, indeed. Always get the most recent version of whatever tools there are to be had and install everything in the most recent hardware that can be found in the sales catalogs.
I know it can be hard to believe, but as recently as two months ago I was using an eight-year-old computer with age-appropriate software.
I have no intention of updating this computer, either, before it says *sprkrltghsrk* (or perhaps stays silent) and ceases to operate. Until then, perhaps I am just not supposed to need any import/export functionality.
23jjwilson61
If all you want is to be able to reimport what you export, then a more robust format like XML is what you want (assuming Tim will build an XML import, which, given an XML export, he'd be crazy not to do).
24brightcopy
23> I believe the way I outlined the TSV export in #4 would make it work just as well as XML (and have the benefit of being more approachable for more users).
Of course, as I explained in #4, if you do it right there's no reason you need to choose.
25andyl
#22
Yes I know some people will have older computers and not install new software. However Excel 2007 and Open Office 3.2 are both fairly prevalent. OO 3.2 is even free. However they aren't the only choices. Even Excel 2003 has a basic XML import - although I admit it ain't that good.
If Tim goes down the XML route I would expect that within a few weeks there would be a basic LTXML -> CSV program in something that could be run on most computers. If not I will write one.
LT data is difficult to export as plain CSV - it can contain free text including commas, quotes, significant space and line breaks. It also has a variable number of fields (Other Authors and their roles). In message #4 brightcopy mentions putting the role in brackets. This will break if an author name contains brackets (I don't know if any do). Of course escaping can be done - but that is often not handled as cleanly as one would like in spreadsheet programs and is certainly less robust than the XML alternative.
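To make the quoting problem concrete, here is a sketch using Python's standard `csv` module, which implements one common convention (fields containing commas, quotes or line breaks are wrapped in double quotes, and embedded quotes are doubled). The sample values and the `round_trip` helper are made up for illustration. That this round trip works in Python says nothing about how a given spreadsheet program will read the same file.

```python
import csv
import io


def round_trip(fields):
    """Write one CSV row to text, then parse it back into a list of fields."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(fields)
    text = buf.getvalue()
    parsed = next(csv.reader(io.StringIO(text, newline="")))
    return text, parsed


# Hypothetical record: a comma, embedded quotes, a line break, significant spaces.
record = ["Doe, Jane", 'He said "hi"', "Line one\nLine two", "  padded  "]
text, parsed = round_trip(record)
print(text)
print(parsed == record)
```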
26brightcopy
25> And if the author name includes < or >?
Oh yes, escaping must be done, etc. etc.
;)
The reality is that no solution is perfect. But I think if you have to pick one, tab delimited is more accessible.
27MarthaJeanne
I use Office 2000 and have no intention of updating.
29brightcopy
28> Yes, it's called escaping. Hence my mention of escaping.
30jjwilson61
29> Yes, but Andy mentioned well-defined, and I don't believe there is any universal way of escaping using TSV.
31brightcopy
30> Actually, there are plenty of well-defined ways of escaping strings. They don't have to be TSV specific. The typical way is to just use backslashes, of course. I just think far more is being made of the escaping part than is necessary. I think I probably could have coded it in the time I've spent posting to this thread.
The end result is that for XML, you need a program or programs to help you actually do anything useful with it. With TSV, you're much more likely to be able to make use of it with a much simpler program (an old crusty version of Excel, a text editor, etc.)
And we have to keep this in the context of pretty simple data. If we were talking about a much more complicated schema, I'd agree that TSV would be useless. But what we're talking about here can flatten pretty easily.
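A sketch of the backslash approach mentioned above, in Python. The exact escape set (`\\`, `\t`, `\n`, `\r`) and all function names are assumptions chosen for illustration; as noted elsewhere in the thread, a reader of the file has to know this convention, because TSV itself does not define one.

```python
def escape_field(value):
    """Replace characters that would break a one-record-per-line TSV file."""
    # Escape the backslash first so the escapes added below are not re-escaped.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def unescape_field(value):
    """Reverse escape_field, leaving unknown escape sequences untouched."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def write_row(fields):
    return "\t".join(escape_field(f) for f in fields)


def read_row(line):
    return [unescape_field(f) for f in line.split("\t")]


row = ["Doe, Jane", "Line one\nLine two\twith a tab", "C:\\books"]
line = write_row(row)
print(line)
print(read_row(line) == row)
```

The escaped line contains no raw line breaks and exactly one tab per column boundary, so line-oriented tools can still split it safely.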
32jjwilson61
The point is that there isn't *a* way of escaping strings for TSV or CSV files. With XML you know that any program that properly handles XML will properly handle escaped characters. With the other formats there isn't a proper way to handle it so the way Tim decides to escape the characters may work for Excel but won't for something else.
33andyl
#31
On flattening - that rather depends on what is being exported. Will the export contain some (or all) of the data from CK?
What about when LT is expanded to deal with contents cataloguing?
Personally I am more in favour of a XML export PLUS conversion of that XML to TSV.
34brightcopy
32/33> These are all problems that can be fairly easily handled by any competent programmer. So I'm going to hold off on any more of the nit-picking back and forth.
35staffordcastle
1. TSV, XML
2. TSV, because XML is an unknown quantity, as far as I am concerned.
36legallypuzzled
#10 expresses my feeling too: "if there is only one, XML is my personal preference. If you asked me which is best for the site (and the largest number of users), I'd say TSV."
37jjmcgaffey
1. TSV, XML
2. XML with an on-site 'flatten to TSV' link/method?
I use TSV a lot - I don't use CSV at all. I don't use XML and don't really know the necessary methods to put it into the formats I need (which is actually CSV, to import it to my mobile database). However, I know such methods exist, and that XML can handle much more complex data. So for a chance to get CK:Series, Other Authors complete with roles, and multiple Read dates, I'd happily go figure out a way to handle XML.
As brightcopy pointed out, XML is more or less designed to have bits extracted. It should be simple to flatten it to TSV - so if LT did all the coding bits to extract an XML record of your library, then upon request output a TSV, both sections (the programmers and the non-programmers) would be well-served. And LT wouldn't have to keep two separate extraction coding modules.
38aethercowboy
>3 lorax:
Perl can trivially parse and modify XML. There's, of course, XML::Parser, and several others which I can't think of right now, but are well documented in the Perl Cookbook and Advanced Perl Programming (go for the latest edition, though).
Of course, I speak Perl like a second language, so maybe I'm being hyperbolic when I say it's trivial.
39ari.joki
>38 aethercowboy:,
the most recent programming language I used with anything approaching competence was procedural Pascal. Perl? Objects? err....
40lorax
38>
Bleah, I've dealt with XML::Parser and hated it. It may just have been that the particular XML I was working with was nasty, though.
41JonathanGorman
1) TSV, XML
2) TSV
I'm a little torn here. I think my issue is that I can think of a lot of uses of the export function:
* To preserve a backup of the data you've entered in LibraryThing
* To transfer in and out of another system
* To create spreadsheets for manipulation that is too complicated for the catalog view.
* To format shift part or all of your collection into something like a printed inventory list of what you own/want to read/etc.
* Data harvesting and manipulation to do things like visualizations of your data.
* To power some sort of application or interface. In other words, to supplement the existing APIs.
For most people, most software, and most of these categories, TSV would work. However, for several categories I'd much rather work with XML or JSON, and either one would allow more programming libraries to be used. (XML would allow XSLT, SAX, DOM, and simple-type XML approaches to be used.) I choose TSV over CSV just barely, mainly because I've seen it go wrong less than CSV.
42reading_fox
TSV
- I would say CSV but a very good point was made about the commas in LT records. |SV would be good - apart from the series field.... hmm ... ^SV maybe. I know in XL you can specify what the break symbol is, so that might work.
I have no idea what I'd do with XML. I wouldn't want my export in IE, the point of exporting would be to have it as a spreadsheet.
But please export ALL the columns!
44aethercowboy
>40 lorax:
If I remember correctly, one of the two books I cited (I read them back-to-back-ish, so I kinda got their content muddled) showed a relatively painless way to parse and modify XML. For example, XML::Simple, I believe, does a lot of the XML::Parser stuff, only simpler (at some cost to memory).
For HTML, I used HTML::TokeParser for tokenizing out tags and attributes. But there are literally hundreds of HTML/XML-related modules on CPAN.
45brightcopy
44>But there are literally hundreds of HTML/XML-related modules on CPAN.
Just had to say it again.
But probably not in the complimentary way you mean. ;)
47PaulFoley
With TSV, you're much more likely to be able to make use of it with a much simpler program (an old crusty version of Excel, a text editor, etc.)
It's far easier to read XML data, if it's at all sanely designed (as much as anything XML can be called sane) in a text editor than a TSV file with more than a couple of columns...
48brightcopy
47> Yes, but do you think people want an export so they can pop it open in a text editor and read it?
49PaulFoley
Unlikely...but you were the one who brought up text editors :)
I don't know what people want to export for. I can't imagine any reason I'd ever want to export into a spreadsheet, but that seems to be everyone else's default expectation. Makes about as much sense as a text editor, to me...maybe less. First thing I do with the LibraryThing TSV data is convert it to a more convenient form—it makes no real difference to me what format I'm converting from (except that the TSV processor is already written), but it's easier to get more useful data (e.g., multiple authors with roles) into a more structured format.
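As an illustration of converting a flat export into a more structured form, here is a Python sketch that reads the pipe-and-parentheses TSV layout proposed in #4 back into nested records. That layout is brightcopy's suggestion, not the actual LT export, and the column names, the `parse_author` and `tsv_to_records` helpers, and the sample data are all assumptions.

```python
import csv
import io
import re

# Hypothetical TSV in the layout suggested in #4.
TSV = (
    "author\ttitle\n"
    "Smith, John\tFantastic Thriller\n"
    "Doe, Jane|Jones, John(illustrator)\tCollaborative Novel\n"
)

# A trailing "(role)" is treated as the author's role.
_ROLE = re.compile(r"^(?P<name>.*?)\((?P<role>[^()]*)\)$")


def parse_author(cell):
    """Split 'Name(role)' into its parts; role is None when absent."""
    m = _ROLE.match(cell)
    if m:
        return {"name": m.group("name").strip(), "role": m.group("role")}
    return {"name": cell.strip(), "role": None}


def tsv_to_records(text):
    """Turn each TSV row into a dict with a list of author dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {
            "title": row["title"],
            "authors": [parse_author(a) for a in row["author"].split("|")],
        }
        for row in reader
    ]


for rec in tsv_to_records(TSV):
    print(rec)
```

As andyl notes in #25, this breaks if an author name itself ends in parentheses, which is the kind of ambiguity a structured format avoids.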
50jjwilson61
I don't know what a TSV processor is, but generally I believe people use a spreadsheet to convert the data to a more convenient form (moving and rearranging columns would be the more common transformations I would imagine).
51brightcopy
49> Yes, but maybe I wasn't clear: I was talking about using text editors (and Excel, etc.) to manipulate the data, not to peruse it.
Spreadsheets can be VERY handy, especially when you know how to use their features to the maximum (very true with Excel). Editors can be the same way (UltraEdit and macros being one of my favorites). I know how to do all sorts of fancypants stuff, but often I can get the quick-and-dirty stuff done a lot faster with something like UltraEdit, Excel or command-line tools such as cut.
52theapparatus
+1 for the XML.
+1 for whatever we decide, please consider writing up an API that docs the fields and in what order please. :)
53ari.joki
Hear! Hear! API with documentation that doesn't require post-graduate studies in computer science or 10 years' expertise in SOAP Web Services.
54theapparatus
Gotta use those 3 phds on something.....
55ari.joki
Heh. I have three abandoned Master's programs on top of one B.Eng. Can do a bit of procedural programming, but modern software technology ... on a very good day, a little.
56theapparatus
And here we go off topic again.... :)
My BS is in Computer Eng which, at the time, was 50% software, 50% hardware. I came up on Turbo Pascal. Yes, I'm that old. *sigh*
57LucindaLibri
Not sure why a poll isn't being used here . . . but one more vote for a format easily imported into a spreadsheet . . . for me that would be tab delimited format which I guess is now known as TSV. And yet another vote for documentation of what fields will be included (all please?) and which excluded (none please?).
My rationale is simple. I found LT just as I was about to enter my whole library into an Excel spreadsheet/workbook. I thought LT would make it easier, though now I'm not entirely convinced it has saved me any time. If/when I give up on LT, I'll want to be able to export what I have entered and import it into a spreadsheet. I know how to do that with the old version of Excel I use and TSV files. I realize the newer Office versions are XML friendly, but I'm poor and don't have them yet.
58brightcopy
57> Not sure why a poll isn't being used here
Mainly because it's not a single simple Yes/No question. Beyond those, polls get really muddled.
59Mr.Durick
Once you folks resolve this issue, will I be able to download my catalog into an Access database?
Robert
60brightcopy
59> If XML - yes, but with caveats. You'll likely need at least Access 2007. And it will likely wind up importing into several tables that must be joined together to form any kind of useful view.
If TSV - import will go pretty smoothly, even in much much older versions of Access. Columns that can have multiple values (authors, reading dates, etc.) will likely appear as a single delimited value (such as "Smith, John <illustrator>|Doe, Jane <author>").
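To illustrate the multi-valued column brightcopy describes, here is a minimal Python sketch that splits such a cell back into name/role pairs. The "|" separator and the "Name <role>" layout are taken from the example in the post above and are only a guess at what an eventual export would look like; the function name is made up for this example.

```python
import re

# Assumed layout (from the example in the post): entries separated by "|",
# each entry written as "Name <role>", with the role part optional.
ENTRY = re.compile(r"^(?P<name>.*?)\s*(?:<(?P<role>[^<>]*)>)?$", re.DOTALL)

def split_people(cell):
    """Split one delimited cell into (name, role) pairs; role may be None."""
    people = []
    for part in cell.split("|"):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing "|"
        match = ENTRY.match(part)
        people.append((match.group("name"), match.group("role")))
    return people

authors = split_people("Smith, John <illustrator>|Doe, Jane <author>")
```

A spreadsheet or Access import would see the whole cell as one string, so a step like this would still be needed to get one row per contributor.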
61Mr.Durick
Okay, I have Access 2007. Thank you. I may ask you questions if it all comes about.
Robert
62brightcopy
61> I'll be learning it at the same time you are. Haven't had to do much with Access and XML yet. This is one of those cases where TSV is more "accessible" - meaning I have to expand my brain a bit if I'm going to use XML. Which isn't actually bad, if I actually need something in the XML that the TSV can't give me.
63theapparatus
I've been poking around other sites this morning and seems like they prefer xml files for the most part. CSV a close second if that's what they mean by tab limited. (Tab limited means different things to different people I've noticed over the years.)
64jjwilson61
CSV is Comma Separated Values and TSV is Tab Separated Values. So tab delimited (not limited) would be TSV.
65PaulFoley
I don't know what a TSV processor is
A program that reads LT's TSV export format and turns it into something useful to me.
but generally I believe people use a spreadsheet to convert the data to a more convenient form (moving and rearranging columns would be the more common transformations I would imagine)
Exactly. But why is one arrangement of columns more convenient than any other? A non-columnar format that eliminates all the blank entries and splits the multi-valued entries and text containing newlines, etc., is more useful, IMO.
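As a rough idea of what such a "TSV processor" might do, the Python sketch below reads tab-separated rows, drops the blank entries, and splits multi-valued cells into lists. The column names, the sample rows, and the per-column delimiters are all invented for illustration; the real export's header and conventions may differ.

```python
import csv
import io

# Hypothetical column names and sample rows; the real export may differ.
SAMPLE_TSV = (
    "title\tauthor\tother authors\ttags\n"
    "Example Book\tDoe, Jane\tSmith, John|Roe, Richard\tfiction, sea\n"
    "Another Book\tPoe, Ann\t\t\n"
)

# Assumed delimiters for the columns that can hold several values.
MULTI_VALUED = {"other authors": "|", "tags": ","}

def tsv_to_records(text):
    """Turn TSV rows into dicts, dropping blanks and splitting multi-valued cells."""
    records = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        record = {}
        for column, value in row.items():
            if not value:
                continue  # skip blank entries entirely
            if column in MULTI_VALUED:
                parts = [p.strip() for p in value.split(MULTI_VALUED[column])]
                record[column] = [p for p in parts if p]
            else:
                record[column] = value
        records.append(record)
    return records

records = tsv_to_records(SAMPLE_TSV)
```

Each record then carries only the fields that actually have data, which is roughly the non-columnar shape argued for above.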
66bnielsen
I'll go along with anything Tim is happy with. It's taken ages to get a few simple bugs fixed and a couple of things are still missing mostly because Tim gets the creeps when looking at the code.
So I vote for XML as long as it at least contains the data we get from TSV at the moment.
Hmm, can we get tab-separated XML with a few commas thrown in for fun? That should satisfy everyone. :-)
67jandm
1. TSV, XML
2. TSV
Please go with both TSV (simple enough for almost anyone to work with in Excel or OpenOffice) and XML (to make it easier to have a more complete set of data, and for programmers to have more fun with).
+1 for vote about reasonable documentation on what field is which.
68fancett
1. TSV, XML
2. Not sure
I use tsv to export as much as possible as an offline backup to LT and also occasionally via Excel to print out lists in a custom format not easily done within LT. I also use it in Spacejock's BookDB, which although somewhat clunky will import LT's TSV file and provide me with an offline version of my library on my netbook, which is very handy for checking up what books I have when I'm away from home (I don't use internet on my mobile and aged parents don't have internet connection either so an offline version is handy).
Like quite a few others I am not that familiar with xml and how to use it but would be happy for this to be one of the export versions if I was given easy guidance into how to convert it into a form I could use in BookDB or Excel.
So in summary my main concerns are that the export:
1. Includes as much of the data as possible just in case it ever needs to be reimported into LT.
2. Can easily be used or converted to use in other spreadsheet or database type programs.
3. And leading on from 2 can be manipulated to produce printout lists (or LT itself has much more customisable printing options which may mean 3 was unnecessary).
69bnielsen
#69: I have the same considerations. It would be nice though if this was something that was nearly automatic for Tim to update with new fields. As is, this feature is at least half a year late with anything new added to LT.
70brightcopy
So, five months later...
71theapparatus
All I know is that I've yet to find a site that took the export correctly out of the box. I bugged many a site admin elsewhere to blank my account with them because what exported out didn't export in elsewhere.
72Keeline
1: I prefer TSV but would look at XML for some applications
2: TSV
Since there is no mobile phone app on the horizon, I want to keep options open to letting me export to a format that can easily be brought into Excel or PHP/MySQL.
For example, if I wanted to put my LT data in an iPhone app such as "My Library", it can accept a CSV with certain columns that don't align with LT's export. Making a program to handle this is possible but the kind I'd make wouldn't always be helpful to others. With this in mind, being able to set up an export profile with specific columns and file format would be very helpful. If I could access this with a specific user/password in the URL that is not my main LT login and could only be used for this purpose, I would like it even more.
Make sure that the export includes other authors (with type) and image links.
James Keeline
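The "export profile" idea in the post above could look something like the following Python sketch, which picks, renames, and reorders columns from a TSV export and writes them out as CSV. The profile contents, the source column names, and the placeholder ISBN are hypothetical; nothing here reflects an actual LT feature.

```python
import csv
import io

# Hypothetical profile: target column name -> source column name in the export.
PROFILE = {"Title": "title", "Author": "author", "ISBN": "isbn"}

def apply_profile(tsv_text, profile):
    """Re-emit selected TSV columns as CSV, renamed and ordered per the profile."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(profile.keys())
    for row in reader:
        # Missing or empty source columns become empty cells.
        writer.writerow(row.get(source) or "" for source in profile.values())
    return out.getvalue()

# Made-up sample row; the ISBN is a placeholder, not a real one.
sample = "title\tauthor\tisbn\ttags\nExample Book\tDoe, Jane\t0000000000\tfiction\n"
converted = apply_profile(sample, PROFILE)
```

Note that the csv writer quotes "Doe, Jane" on output because the value contains a comma.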
73Singpolyma
TSV and CSV are basically the same (though you seem to export different columns into each one... weird) and are thus interchangeable.
If you do create an XML format, please start with the excellent Dublin Core instead of building a new vocabulary from scratch :)
74lorax
73>
TSV and CSV are basically the same (though you seem to export different columns into each one... weird) and are thus interchangeable
In principle, yes, they're interchangeable, but in LT-specific parlance they're extremely different. The TSV export is nearly complete, but the CSV export has very, very few columns. I'd be fine with a complete CSV export, but not with the current CSV export being the only option.
76timepiece
I have to say that importing into Excel (or similar programs, presumably) is much easier with unique delimiters (i.e., not commas). Pipe-delimited plain text would be far preferable to comma-delimited.
77justjim
The hitherto little-known pipe character ( | ) has been making somewhat of a comeback recently. While it has long been available as an export delimiter (eg. Microsoft Access 97 had it as an option), it has been slowly infiltrating itself into more general use. Email signatures and Twitter spring immediately to mind. I predict that it won't be long until publishers re-discover it and we find it in book titles.*
Then we are again up that famous creek without a paddle.
My point, if indeed I have one, is that XML is probably the way to go.
*This may already be the case, I can't say for sure that it isn't.
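On the delimiter worry in the last two posts: whichever character is chosen, a writer that quotes fields containing it avoids the collision. The Python sketch below is only an illustration using the standard csv module with sample titles; it says nothing about how LT's exporter actually behaves. It writes the same row with comma and pipe delimiters and reads each back.

```python
import csv
import io

# Sample titles containing a comma, a pipe, and double quotes.
titles = ["Eats, Shoots & Leaves", "A | B", 'The "Quoted" Title']

def round_trip(values, delimiter):
    """Write one row with the given delimiter, read it back, return both."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerow(values)
    text = buffer.getvalue()
    parsed = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed

comma_text, comma_row = round_trip(titles, ",")
pipe_text, pipe_row = round_trip(titles, "|")
```

Both rows come back intact because any field containing the delimiter or a quote is wrapped in quotes on output. The catch is that the importing program has to honor the same quoting rules.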
78brightcopy
Tab tab tab tab
tab tab tab tab
lovely taaaaaaaaaab
wonderful taaaaaaaaab
79brightcopy
Also, it's a good thing there's nothing in titles that would ever need to be escaped when exporting to XML.
;)
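Joking aside, characters like "&" and "<" in a title do need escaping in XML, and an XML library normally handles that rather than hand-built strings. A small Python sketch, with invented element names and an invented title (not LT's schema or data):

```python
import xml.etree.ElementTree as ET

def book_to_xml(title, author):
    """Build a tiny <book> element; element names are hypothetical."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "author").text = author
    # tostring() escapes &, < and > in text content.
    return ET.tostring(book, encoding="unicode")

original_title = "Salt & Pepper <A Memoir>"
xml_text = book_to_xml(original_title, "Doe, Jane")

# Parsing the result gives back the original, unescaped title.
parsed_title = ET.fromstring(xml_text).find("title").text
```

The serialized form contains `&amp;`, `&lt;` and `&gt;`, and a conforming parser turns them back into the original characters.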
80justjim
Yes that is fortunate.
I'll have the Lobster Thermidor aux crevettes with a Mornay sauce, garnished with truffle pate, brandy and a fried egg on top and Tab.
82antqueen
Add my vote for an XML export. I do think we need a direct-to-spreadsheet export too, for those who just want to look at it quickly. I don't care what format that is, though.

