Group: New features
Topic: Tagmash overlap

Sep 7, 2009, 4:06am (top) Message 1: timspalding

Come Monday or Tuesday I'm going to reintroduce the "tagmash" feature. It's been redone, and is better in various ways. I think tagmashes are really great.

I also added a profile-statistics page.

Mine: http://www.librarything.com/profile/tims...
Yours: http://www.librarything.com/profile/MEMB...

At 4:04am, that's about all I can say!

Sep 7, 2009, 4:10am (top)Message 2: BarkingMatt

Will need some getting used to, but looks good. Thanks.

Sep 7, 2009, 4:22am (top)Message 3: staffordcastle

Nice! One question, though - are the entries in any particular order, or is it random? They don't seem to be either alphabetical or in order of how many books in the overlap.

Sep 7, 2009, 4:25am (top) Message 4: timspalding

It's by a ranking, which is mostly the overlap, but conditioned by how HIGH in the list a book appears, and how many books are on the list. Anyway, it's trying to give you the strongest, most interesting ones.

T

Sep 7, 2009, 4:27am (top)Message 5: staffordcastle

Thanks - I'll have to study the list for a while, to wrap my head around that. It did list one of the ones I made myself as the first!

Sep 7, 2009, 4:33am (top)Message 6: VisibleGhost

I'm glad to see that tagmash will be reintroduced. I think it's one of the gems of LT.

Sep 7, 2009, 6:19am (top)Message 7: MrsLee

Is there an explanation for this somewhere? I'm not sure I understand the function and use of it.

Sep 7, 2009, 6:21am (top)Message 8: AnnieMod

Sep 7, 2009, 7:45am (top)Message 9: markbarnes

I'm sure we won't reach unanimity on this, but should the order be left to right, not top to bottom? Far more useful info is above the fold that way.

Message edited by its author, Sep 7, 2009, 7:46am.

Sep 7, 2009, 8:34am (top)Message 10: 235711

Ah, with green checkmarks added, as with single tags. And the stats page is very interesting. Thank you.

Sep 7, 2009, 9:57am (top)Message 11: stephmo

Checkmark comparison is very nice.

Sep 7, 2009, 12:15pm (top)Message 12: PortiaLong

Very interesting - if you had asked me how many of my books others would have tagged as "children" and "christian" and "series" I would have thought - oh, maybe a 1/2 dozen. Nope - 67! Who knew?

Sep 7, 2009, 12:35pm (top)Message 13: Anneli

>12
You learn new things about your books! I have 9 books about dystopia, non-fiction - among others Crime and Punishment and The Hitchhiker’s Guide to the Galaxy.

Sep 7, 2009, 2:01pm (top)Message 14: kristenn

This is really interesting. First I knew of tag mashes.

And now I'm trying to figure out the tag 'Fred.'

Sep 7, 2009, 2:09pm (top)Message 15: prosfilaes

I want a switch to hide this.

Okay, no, not really. But it shows a depressing number of Children's, juvenile, young adult, etc. A lot of it I have to admit is fair, but it seems like any adventure work more than a hundred years old is automatically juvenile; Jules Verne, The Three Musketeers, Robinson Crusoe, H. G. Wells, Frankenstein, etc. In a hundred years will Jurassic Park be dismissed as juvenile?

(Edit: It seems that it is in fact private. Which seems a little odd, since comparable things, like the tag mirror and recommendations, are public.)

Message edited by its author, Sep 7, 2009, 6:02pm.

Sep 7, 2009, 3:33pm (top)Message 16: infiniteletters

15: You're assuming Jurassic Park survives. :)

Sep 7, 2009, 11:43pm (top)Message 17: Heather19

Yeah. I saw this thread earlier, clicked the link, and got lost in my tagmash overlap. An hour later, I have eight more wishlisted books and waaaay too many new tag ideas.

I love this.

Sep 8, 2009, 12:26am (top)Message 18: prosfilaes

It's creating bad links in some cases; "fantasy, literature/19th century" links to http://www.librarything.com/tag/fantasy%... which gives a 404 error.

Sep 8, 2009, 1:38am (top)Message 19: keristars

Amazing!

Because of the tagmashes, and looking to see what books certain ones covered, I discovered at least five books that I've owned for years but never entered into LT. That is, I expected to see certain books I own show up under the mashes, which they did, but they didn't have the green checkmark.

I'm going to consider tagmashes a fantastic feature, if only because I now know to double check my books against LT before I box them up when I move, since I seem to have skipped over one or two books from each shelf when I was entering everything in last year. :P (strangely, each of the skipped-over books have been read many times, and a few might could use replacing)

Sep 8, 2009, 3:57am (top)Message 20: AnnieMod

>18
That's because of the / in the name of the tag...

Sep 8, 2009, 8:01am (top)Message 21: eromsted

>13 Small numbers of oddball tags cause problems for several of the tag based social functions (tag mirror, tag based recommendations). But the fiction/nonfiction distinction on this function would seem to me to be especially useful and especially screwy when it's wrong.

To use Anneli's examples, Crime and Punishment is tagged fiction 2298 times and non-fiction 8 times. The Hitchhiker’s Guide to the Galaxy is tagged 1529/2, fiction/non-fiction.

Would it be possible for LT to check the ratio of fiction/non-fiction tags and ignore the smaller one when computing the tagmash? Since I can think of some good reasons a book could be tagged with both, perhaps only when the ratio is running more than 10/1 in one direction.
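
The ratio check proposed in this message could look roughly like the sketch below. It is only an illustration: the function name, the shape of the tag-count dictionary, and the default 10:1 threshold are assumptions taken from the wording of the post, not from how LibraryThing actually stores or filters tags.

```python
def drop_minority_tag(counts, pair=("fiction", "non-fiction"), ratio=10):
    """Return a copy of `counts` without the minority tag of an opposed pair.

    `counts` maps tag name -> number of times the work carries that tag.
    The minority tag is dropped only when the majority tag outnumbers it
    by more than `ratio` to 1; otherwise both tags are kept.
    (Hypothetical helper, for illustration only.)
    """
    first, second = pair
    n_first = counts.get(first, 0)
    n_second = counts.get(second, 0)
    kept = dict(counts)
    if n_first > ratio * n_second:
        kept.pop(second, None)   # e.g. 2298 fiction vs 8 non-fiction
    elif n_second > ratio * n_first:
        kept.pop(first, None)
    return kept


# Tag counts quoted in the message above.
crime_and_punishment = {"fiction": 2298, "non-fiction": 8}
hitchhikers_guide = {"fiction": 1529, "non-fiction": 2}
print(drop_minority_tag(crime_and_punishment))
print(drop_minority_tag(hitchhikers_guide))
```

A work tagged both ways in comparable numbers (say 30 and 12) falls under the threshold and keeps both tags, which matches the "good reasons a book could be tagged with both" caveat.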

Sep 8, 2009, 9:59am (top)Message 22: Aerrin99

> 9 I'm sure we won't reach unanimity on this, but should the order be left to right, not top to bottom? Far more useful info is above the fold that way.

Agreed!

Sep 8, 2009, 11:26am (top) Message 23: timspalding

>21

Actually, we store a single value elsewhere for this—fiction, nonfiction or undecided/undetermined—based on the ratio. But tagmash is what tagmash does. There are lots of similar tags people will disagree on, and no way to police the issue. If you say --fiction, well, you've got rid of everything with a single fiction tag—live with it! Instead, use -fiction to "demote" the fictions. And you're done! :)

>22

No. Because if you do it that way, you can't keep everything lined up without making a little self-contained box for each line. Since some lines are two lines long, it looks terrible. I'm not sure I can explain this well without a visual aid.

Sep 8, 2009, 1:22pm (top)Message 24: eromsted

>23
It does appear that subject tag, non-fiction, -fiction (or vice-versa) gives the desired result.

By the way, is the separately stored fiction/non-fiction/undecided data used for anything at the moment?

Sep 8, 2009, 1:36pm (top) Message 25: timspalding

>24

Minimally, for recommendations.

Sep 8, 2009, 1:57pm (top)Message 26: Aerrin99

> 23 Sad. :(

Stupid formatting issues.

Sep 8, 2009, 2:02pm (top)Message 27: jjwilson61

How about if you paginated the results and we could control the page size so it fit on our screens. That way we could see all the strongest matches on one page without scrolling.

Sep 8, 2009, 4:08pm (top)Message 28: lilithcat

I'm sorry, but I don't understand what this means.

Sep 8, 2009, 4:40pm (top)Message 29: jjwilson61

It's probably too much trouble for Tim to do it, but he could calculate how many tag-mashes would fit on one page without scrolling and just show that many. Then the next group could be shown when a "next" button is pushed.

Sep 8, 2009, 4:52pm (top)Message 30: AnnieMod

>It's probably too much trouble for Tim to do it, but he could calculate how many tag-mashes would fit on one page without scrolling and just show that many.

It will depend on your browser (and what kind of bars you have on it), the font that you use, the resolution and so on. Doing it individually every time? I do not think this would be easily done. It might be easier to do something like the catalog size (everyone specifies at the top of the page or something how many to see...)

Sep 8, 2009, 5:13pm (top) Message 31: timspalding

Handed to Chris. Sorry. I meant another thread.

Message edited by its author, Sep 8, 2009, 5:15pm.

Sep 8, 2009, 5:29pm (top)Message 32: infiniteletters

28: Which part?

Sep 8, 2009, 8:23pm (top)Message 33: jjwilson61

30> That's what I meant by "we could control the page size."

Sep 8, 2009, 9:38pm (top)Message 34: lilithcat

> 28

This part: How your books overlap with LibraryThing tagmashes.

I know what a tagmash is. But what does it mean for a book to "overlap" with a tagmash?

Sep 8, 2009, 9:55pm (top)Message 35: infiniteletters

34: That it shows up on the list for the tagmash.

The tagmash "overlap" is more about your library than individual books. It shows which tagmashes have a lot of your books on them, like the tag mirror but for multiple tags at a time.

Sep 8, 2009, 11:42pm (top)Message 36: jjwilson61

I'm assuming that these aren't all the possible tagmashes, but only those that people have actually searched for?

Sep 9, 2009, 8:00am (top)Message 37: eromsted

I too would assume the caveat from any specific tagmash page applies here as well, "Tagmashes do not exist until someone enters them."

This text could be added to the top of the tagmash overlap page for clarity.

Sep 9, 2009, 12:08pm (top)Message 38: prosfilaes

>20 I know. The quick solution would be to choose, as the canonical form of a tag with multiple spellings, one without problematic characters; several bugs would be fixed if it used 19th century literature instead of literature/19th century. The better solution, one that would fix a lot of bugs that have been annoying me for a long time, is to consistently hash tag names in a way that avoids problems with special characters; perhaps even convert all of them to a number, just as works are, and never send a tag name by URL.
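
Both ideas in this message can be sketched in a few lines. The sketch below is an assumption-laden illustration, not LibraryThing's code: `tag_to_segment`, `TagRegistry`, and the numeric-id scheme are made-up names. It uses Python's standard `urllib.parse.quote` with `safe=""` so that `/` and `&` are percent-encoded instead of being read as URL syntax.

```python
from urllib.parse import quote, unquote


def tag_to_segment(tag):
    """Percent-encode a tag so it is safe as a single URL path segment."""
    # safe="" makes quote() encode "/" too; by default it leaves "/" alone.
    return quote(tag, safe="")


class TagRegistry:
    """Hypothetical id table: URLs carry a number, never the tag text."""

    def __init__(self):
        self._id_by_tag = {}
        self._tag_by_id = {}

    def id_for(self, tag):
        # Assign the next integer the first time a tag is seen.
        if tag not in self._id_by_tag:
            new_id = len(self._id_by_tag) + 1
            self._id_by_tag[tag] = new_id
            self._tag_by_id[new_id] = tag
        return self._id_by_tag[tag]

    def tag_for(self, tag_id):
        return self._tag_by_id[tag_id]


print(tag_to_segment("literature/19th century"))
print(unquote(tag_to_segment("D&D")))

registry = TagRegistry()
print(registry.id_for("literature/19th century"))
```

Percent-encoding alone can still go wrong if a server decodes `%2F` before routing the request, which is one argument for the numeric-id route the message prefers.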

Sep 9, 2009, 12:38pm (top)Message 39: infiniteletters

20/38: And series names. Twould be lovely.

Sep 9, 2009, 12:41pm (top) Message 40: timspalding

It's funny members think every combination of X million tags—X million to the fifth power?—will be pre-generated. There are more potential tagmashes than atoms in the universe, people!

Sep 9, 2009, 12:56pm (top)Message 41: aethercowboy

>40.

If there are x tags, there should be x! + (x x-1) + ... + (x 1) different tagmashes, of course (for x > 1).

(where (y z) represents a combination, such that:

(y z) = y! / (z! * (y - z)!)

)

I'd put it into summation notation, but it's scary enough as it is.
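
For reference, the summation form is short. This is a sketch under two assumptions not stated in the post: that the order of tags within a mash does not matter, and that a mash may contain anywhere from 1 to x tags. Under those assumptions every term is a combination, including the last one, which equals 1.

```latex
\[
  \sum_{k=1}^{x} \binom{x}{k} \;=\; 2^{x} - 1,
  \qquad
  \binom{x}{k} = \frac{x!}{k!\,(x-k)!},
  \qquad
  \binom{x}{x} = 1 .
\]
```

If a mash is capped at m tags, the sum simply stops at k = m instead of k = x.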

Sep 9, 2009, 1:34pm (top) Message 42: timspalding

I don't think the system allows a single mash to have one thousand tags, though :)

Sep 9, 2009, 1:52pm (top)Message 43: prosfilaes

It would be possible if you wanted to do it. Dumping all the tags with too little usage would probably leave you with, say, 100,000 tags, which can be compared pairwise fairly easily. Three-tag tagmashes can then be assembled pretty easily from the three pairwise mashes that cover those three tags, and if we want four-tag tagmashes, they can be assembled from the four three-tag tagmashes that each contain three of those tags. It wouldn't be cheap, it wouldn't be worth it, but it's not silly to think it could be done.

Sep 9, 2009, 2:16pm (top) Message 44: timspalding

But 100,000 four ways would be 100,000 to the fourth power. That's 100,000,000,000,000,000,000 possibilities. That's one hundred quintillion. According to somewhere online, the number of atoms in the universe has around 80 zeroes. Still, one hundred quintillion is a lot.

Sep 9, 2009, 2:22pm (top)Message 45: suitable1

#44 - So, is the issue processor power or disk space?

Sep 9, 2009, 2:35pm (top)Message 46: prosfilaes

But if you follow the algorithm I gave, you never do 100,000 four ways.

If you're looking at these tagmash overlaps, and a, b, c, and d are tags, you should only look at creating an a,b,c,d tagmash if tagmashes (a, b, c), (a, b, d), (a, c, d) and (b, c, d) were all interesting. You never evaluate any larger tagmash containing Esperanto literature, French literature because that tagmash had one element, and you can probably ignore any tagmash containing medieval literature, science fiction (18 results). The number of two-tag tagmashes that have reasonable overlaps is nowhere near 100,000^2.

I suspect you could create every non-empty tagmash, not reasonably, but certainly within a "Tim Spalding has the brain-fever and is willing to run LT into the ground computing this function" budget. A lot of the stuff you do couldn't be done with naive functions.
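
The pruning rule described in this message (only build a larger mash when every smaller mash inside it was interesting) resembles the level-wise candidate generation used in Apriori-style frequent-itemset mining. A minimal sketch follows. Everything in it is an assumption for illustration: the function name, the `books_by_tag` input, and the definition of "interesting" as "at least `min_overlap` books carry all the tags".

```python
from itertools import combinations


def interesting_mashes(books_by_tag, min_overlap, max_size):
    """Build multi-tag mashes level by level, pruning as we go.

    books_by_tag: dict mapping tag -> set of book ids carrying that tag.
    Returns a dict mapping frozenset(tags) -> set of shared book ids,
    for every mash of 2..max_size tags with at least min_overlap books.
    """
    # Level 1: single tags with enough books to be worth combining.
    current = {
        frozenset([tag]): set(books)
        for tag, books in books_by_tag.items()
        if len(books) >= min_overlap
    }
    results = {}
    size = 1
    while current and size < max_size:
        size += 1
        candidates = {}
        for left, right in combinations(list(current), 2):
            union = left | right
            if len(union) != size or union in candidates:
                continue
            # Prune: every (size - 1)-tag subset must itself be interesting.
            if any(frozenset(sub) not in current
                   for sub in combinations(union, size - 1)):
                continue
            shared = current[left] & current[right]
            if len(shared) >= min_overlap:
                candidates[union] = shared
        results.update(candidates)
        current = candidates
    return results


# Toy library: the rare tag never reaches the pairwise stage.
toy = {
    "fantasy": {1, 2, 3, 4},
    "magic": {1, 2, 3},
    "epic": {1, 2, 5},
    "esperanto": {9},
}
for tags, books in interesting_mashes(toy, min_overlap=2, max_size=3).items():
    print(sorted(tags), sorted(books))
```

The pairwise loop over each level is still quadratic in the number of surviving mashes, so this shows the pruning logic, not a scalable implementation.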

Sep 9, 2009, 2:40pm (top) Message 47: timspalding

>46

The problem would be "read," "unread," "fiction," "nonfiction," etc.

Sep 9, 2009, 3:14pm (top)Message 48: infiniteletters

47: Then exclude those 4. :)

Sep 10, 2009, 3:53am (top)Message 49: bnielsen

If he also excludes etc. he is done :-)

Sep 10, 2009, 9:42am (top)Message 50: infiniteletters

Except for the complaints about lack of content. :)

Sep 10, 2009, 10:21am (top)Message 51: aethercowboy

It's remotely more feasible to create every c(x, 2) tagmash, since, given 100,000 tags, that'd be like 4,999,950,000 combinations.
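
The figures quoted in this exchange are easy to check with Python's standard library. The helper name below is made up for illustration; `math.comb` is the standard binomial-coefficient function, and 100,000 is the hypothetical tag count used in the thread.

```python
import math

TAG_COUNT = 100_000  # hypothetical number of tags, as used in the thread


def unordered_mashes(n_tags, size):
    """Number of distinct mashes of `size` tags when tag order is ignored."""
    return math.comb(n_tags, size)


pairs = unordered_mashes(TAG_COUNT, 2)
ordered_four_ways = TAG_COUNT ** 4          # "100,000 to the fourth power"
unordered_four_ways = unordered_mashes(TAG_COUNT, 4)

print(f"two-tag mashes:            {pairs:,}")
print(f"ordered four-tag choices:  {ordered_four_ways:,}")
print(f"unordered four-tag mashes: {unordered_four_ways:,}")
```

Ignoring order shrinks the four-tag figure by a factor of roughly 24 (that is, 4!), which is still far too many to pre-generate.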

Sep 10, 2009, 1:06pm (top)Message 52: thorold

>47
I'm just trying to work out why on earth someone created the tagmash reread, unread, and what if anything to deduce from the fact that it's fairly high on my list of overlaps?

I also wonder about some of the redundant tagmashes that people have created: like "glbt, lgbt, queer" and "England, fiction, sex" - obviously the difference between AND and OR isn't universally understood.

It's nice to see that British authors hold three of the top ten places in the tagmash "German, satire", anyway...

Sep 10, 2009, 1:48pm (top)Message 53: lorax

52>

Oh, "reread, unread" sounds very interesting, actually -- books that some people adore, and others just can't seem to get to?

I can see how "glbt, lgbt, queer" is likely to be fairly redundant, though "lgbt, --glbt" and vice versa might be interesting, but how is "England, fiction, sex" redundant? Is this a "No sex please, we're British"-style swipe that I'm missing?

Sep 10, 2009, 1:54pm (top) Message 54: timspalding

> England, fiction, sex

Yeah, the England, good cooking tagmash too

Sep 10, 2009, 2:05pm (top)Message 55: jjmcgaffey

Nonsense, there's lots of English *fiction* about sex. Consider Fanny Hill, Tess of the D'Urbervilles...

Sep 10, 2009, 2:07pm (top) Message 56: timspalding

How about sex, --friction?

Message edited by its author, Sep 10, 2009, 2:08pm.

Sep 10, 2009, 6:30pm (top)Message 57: jjmcgaffey

Did you mean to put that r in there? It does bring up interesting images...

Sep 10, 2009, 8:11pm (top)Message 58: MerryMary

Talk about someone being rubbed the wrong way...or the right way...

Sep 10, 2009, 11:08pm (top)Message 59: Heather19

Wooooaaaah. Innocently wander into the thread, and *wide-eyed*.

Sep 11, 2009, 7:29am (top)Message 60: thorold

Oops - I seem to have started something...

One oddity I noticed: between all the tagmashes on my Tagmash Overlap page, there is one single tag: "english fiction" (no comma). Is the system listing it as though it were a tagmash because it happens to contain two words in alphabetical order?

Sep 11, 2009, 9:30am (top)Message 61: apple2e

> 60

No, because my tagmash page has one singleton tag as well: comedy

(edit) found two more: action & fantasy fiction

There are some odd tagmashes that I did not expect:

dwarves, non-fiction (I would have expected this to be a very small set)

Message edited by its author, Sep 11, 2009, 9:37am.

Sep 11, 2009, 9:33am (top) Message 62: timspalding

Yes, I'm not sure how the singletons are getting there. May be an error left over from long ago.

Sep 11, 2009, 10:57am (top)Message 64: AnnieMod

>dwarves, non-fiction (I would have expected this to be a very small set)

Well - it depends on how someone is marking works on myths and the like. For example I consider The World Guide to Gnomes, Fairies, Elves and Other Little People non-fiction in my tagging.

Sep 11, 2009, 11:03am (top)Message 65: aethercowboy

>61.

a very small set

Pun unintended?

Sep 11, 2009, 7:09pm (top)Message 66: prosfilaes

Sep 14, 2009, 4:13pm (top)Message 67: romula

This may be due to some ongoing tag regeneration, but I'm getting quite a few empty tagmashes (there's a number in parentheses, but no tags, http://www.librarything.com/profile/romu...).

For example here's my first row:
epic fantasy, high fantasy, magic (61)
awesome, science fiction (19)
(8)
(24)

I would've expected tags in front of (8) and (24)

Sep 14, 2009, 4:34pm (top) Message 68: timspalding

Oooh, nice catch. I'll look into it.

Sep 14, 2009, 4:51pm (top)Message 69: jjwilson61

And I'm still getting singleton tags. For example, science fiction, and another is british authors. Hm, could it be because they are two word tags?

ETA: I also spotted this pair:

YA, good vs. evil (27)
good vs. evil, ya (27)

Is this because the mashes are still being generated?

Message edited by its author, Sep 14, 2009, 4:54pm.

Sep 14, 2009, 6:24pm (top) Message 70: timspalding

Is this because the mashes are still being generated?

Yeah. Starting with the a and going to the z... :)

Sep 14, 2009, 10:51pm (top)Message 71: prosfilaes

I'm seeing #67 pretty heavily. As for singleton tags, I'm getting mystery and fantasy, so it's not just two word tags.

I'm also seeing D&D, RPG, fantasy (33), and I'm wondering how anyone ever generated that, since using any alias for D&D gets translated to D&D and then brings up the tagmash for D.
