Cleaning up member-added HTML, links and related stuff

Talk » New features

1timspalding
Nov 28, 2009, 3:34 pm

We've changed to a new library that deals with member data, standardizing HTML where it's allowed and preventing spam and other malicious attacks.

This code is now in use in two places: your catalog and Talk. It's intended to clean up member-provided data, either removing all HTML code in fields that aren't supposed to have it (e.g., titles, ISBNs) or cleaning it up and making it "safe" on fields that do allow it. For the latter, the library is supposed to:

1. Standardize line breaks. One or two can happen, but no more.
2. Turn things that look like links into links (e.g., http://www.cnn.com)
3. Truncate display of very long links.
4. Add target="_top" (so links get you out of the catalog) and rel="nofollow" (an anti-spam measure) to all member-added links
5. Allow only certain HTML. The default list is

<cite><pre><br><b><i><u><a><em><strong><strike><ul><ol><li><img>

6. Remove all other attributes, except href for links and src for images; this is a spam-fighting thing
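For anyone curious how rules 1, 4, 5 and 6 fit together, here is a minimal sketch in Python using the standard library's html.parser. It is not LibraryThing's code. The names (ALLOWED_TAGS, sanitize, and so on) are hypothetical, the tag list is the default list above with duplicates removed, and the http/https-only check on href and src is an extra assumption. It also skips autolinking, link truncation and the balancing of unclosed tags.

```python
import re
from html import escape
from html.parser import HTMLParser

# Default whitelist from the post (duplicates removed).
ALLOWED_TAGS = {"cite", "pre", "br", "b", "i", "u", "a", "em", "strong",
                "strike", "ul", "ol", "li", "img"}
# Rule 6: only href on links and src on images survive.
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src"}}
VOID_TAGS = {"br", "img"}
# Assumption of this sketch: only plain web URLs are accepted.
SAFE_SCHEMES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Rebuilds member HTML, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is kept
        kept = [(name, value) for name, value in attrs
                if name in ALLOWED_ATTRS.get(tag, set())
                and value is not None
                and value.strip().lower().startswith(SAFE_SCHEMES)]
        if tag == "a":
            # Rule 4: break out of the catalog frame and mark as nofollow.
            kept += [("target", "_top"), ("rel", "nofollow")]
        rendered = "".join(f' {n}="{escape(v)}"' for n, v in kept)
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> and <img .../> like their plain start tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped, so a stray "<" can never become markup.
        self.out.append(escape(data))


def normalize_breaks(text):
    # Rule 1: one or two line breaks are fine, three or more collapse to two.
    text = text.replace("\r\n", "\n")
    return re.sub(r"\n{3,}", "\n\n", text)


def sanitize(text):
    parser = WhitelistSanitizer()
    parser.feed(normalize_breaks(text))
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Fed `<a href="http://www.cnn.com" title="t">CNN</a>`, this keeps the href, drops the title, and adds target="_top" and rel="nofollow".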

The code is deployed, but is not yet invoked everywhere it might be. I want to spot problems before multiplying places where they might happen. So, if you see problems in either place, let me know, with as much specificity as possible, and I'll look into it.

Thanks,
Tim

2MaidMeri
Nov 28, 2009, 3:46 pm

Is publication one of the fields where HTML is allowed?

3timspalding
Nov 28, 2009, 3:50 pm

It is now. Thanks.

4MaidMeri
Edited: Nov 28, 2009, 4:01 pm

Great! Thank you. Just found another one: <i> used to work in the comments field, is that supposed to be gone?

5timspalding
Nov 28, 2009, 4:33 pm

Show me a book where it's not showing up? Be sure you're talking about comments on the catalog page, not elsewhere.

6MaidMeri
Nov 28, 2009, 4:44 pm

Oh, sorry. Seems to work fine when comments has its own column, actually. I was looking at the combo column with tags and comments. Doesn't work there.

7Boobalack
Nov 28, 2009, 4:49 pm

Could we please still have HTML on our profile page? I feel a need to underline book titles, no matter where I post them, and I like to bold dates when I add more comments to my profile page. I'd appreciate your considering this but will understand if you choose not to do it. Thanks.

8TomVeal
Edited: Nov 28, 2009, 5:10 pm

How about allowing the paragraph tag, which is useful for book reviews?

9timspalding
Edited: Nov 28, 2009, 5:19 pm

>6 MaidMeri:

The "combo" column now works.

>7 Boobalack:

Yes, I think it's time to bring them back. Right now I'm focusing on Talk and the catalog. Then we can move onto more.

10timspalding
Nov 28, 2009, 5:18 pm

>8 TomVeal:

Is it? How about just adding line breaks, no?

11Boobalack
Nov 28, 2009, 5:22 pm

Thank you. :-)

12TomVeal
Nov 28, 2009, 5:59 pm

Well, okay. I should have thought of that myself.

13bragan
Nov 28, 2009, 6:08 pm

No blockquote tag? I'd think that could be useful in discussions about books. But maybe it's not something people generally bother to use.

14jjwilson61
Edited: Nov 28, 2009, 9:08 pm

I would think that some people might be used to using p tags and be confused when they don't work. What's the harm to allowing them?

edited to add back the disappearing tag.

15Mr.Durick
Nov 28, 2009, 7:17 pm

I have used or tried to use blockquote. I think it should be available.

Robert

16justjim
Edited: Nov 28, 2009, 7:32 pm

I also use blockquote sometimes for little excerpts from something I'm reading.

eta: usually in Talk

17lquilter
Edited: Nov 28, 2009, 7:38 pm

I second fourth the blockquote tag. I would really like to have that one.

18christiguc
Edited: Nov 28, 2009, 7:53 pm

Can we keep height (or width) so that images can be resized to fit into a message? (And maybe margins for spacing pictures on profiles?)

19christiguc
Edited: Nov 28, 2009, 8:00 pm

Also, why are some of my instructions now hidden on messages such as this one?

Edited to fix link

20jjwilson61
Nov 28, 2009, 9:07 pm

This message has been deleted by its author.

21Talbin
Nov 28, 2009, 9:14 pm

Would it be possible to have height as well as width for images, particularly on the profile pages? Thanks.

22FicusFan
Edited: Nov 28, 2009, 9:55 pm

Height and Width for pictures would be good.

Also on my Wiki page I use " table , tr , td " and would hope to be able to keep using them.

23christiguc
Nov 28, 2009, 9:55 pm

I assume nothing would be stripped from Wiki pages?

24timspalding
Nov 28, 2009, 11:41 pm

I've added blockquote and p. I am not totally behind p, but I'll go with it.

I've added back height and width. They need to be done as height="x" and width="x" not using a CSS "style=" syntax.
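Here is a hypothetical per-attribute check for images along the lines Tim describes. Plain height and width attributes survive, while a style attribute does not. The accepted size format (up to four digits, optionally followed by %) is an assumption of this sketch, not the site's documented rule.

```python
import re

# Hypothetical rule set for <img>: src plus plain height/width attributes.
IMG_ATTRS = {"src", "height", "width"}
# Assumption: sizes are plain pixel counts or percentages, e.g. "100" or "50%".
SIZE_RE = re.compile(r"\d{1,4}%?")


def filter_img_attrs(attrs):
    """Return only the (name, value) pairs this sketch allows on <img>."""
    kept = []
    for name, value in attrs:
        if name not in IMG_ATTRS or value is None:
            continue  # drops style=, align=, onerror=, ...
        if name in ("height", "width") and not SIZE_RE.fullmatch(value.strip()):
            continue  # e.g. width="100px; position:fixed" is rejected
        kept.append((name, value))
    return kept
```

So `width="100"` is kept, but the same size written as `style="width:100px"` is dropped.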

>23 christiguc:

Yes, Wiki pages are and will be untouched.

25timspalding
Nov 28, 2009, 11:42 pm

>19 christiguc:

What do you mean?

26christiguc
Nov 29, 2009, 12:19 am

>25 timspalding:

When I follow my link, I see this as my message:



Whereas, if you look at what I typed, and what originally showed before, it should say something like this:

To refresh, a hyperlink is:
< a href="URL" >text< /a >

If you want a picture to act as the link instead of text, then you would do:
< a href="URL" >picture< /a >

etc.

If we use < or >, will the things between them just disappear now, even if there is a space after < and before >?

27timspalding
Nov 29, 2009, 12:24 am

Yeah, you've got to use the &lt; and &gt; work-around. Putting spaces around them doesn't make them any less HTML, or less dangerous. We just weren't catching them.

28Lman
Edited: Nov 29, 2009, 3:50 am

>27 timspalding:

I'm trying to figure out what you mean here but I can't even copy what you have written?? I keep getting brackets for HTML, so how does the &... work?

29Noisy
Nov 29, 2009, 5:06 am

If you want to show < and > in a talk message, then you have to replace them with &lt; and &gt;. What Tim is saying is that the software will detect the use of < in a message, and check what comes after it. If it isn't one of the HTML commands in the list supplied in the OP (as amended by later messages), then the parser will think that this is the start of an illegal HTML command, and remove whatever's after it (up to the next >?).

(We get the &lt; to show in talk messages by replacing the & with &amp;, like this &amp;lt;.)
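Python's standard html.escape performs the substitution Noisy describes, which makes it a handy way to check what to type. With quote=False it only touches &, < and >. The example string is christiguc's link instructions from earlier in the thread.

```python
from html import escape, unescape

typed = '< a href="URL" >text< /a >'

# Replace &, < and > with entities so the brackets display literally
# instead of being treated as (suppressed) HTML.
shown = escape(typed, quote=False)
print(shown)  # &lt; a href="URL" &gt;text&lt; /a &gt;

# Escaping an entity once more turns its "&" into "&amp;", which is how
# you display the entity itself in a message.
print(escape("&lt;", quote=False))  # &amp;lt;

# unescape() undoes one level of escaping.
assert unescape(shown) == typed
```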

30Noisy
Nov 29, 2009, 5:11 am

Stupidly I forgot to take a copy of that message before posting it, so I'm not going to go back and change my mistakes, because I'll probably get the ampersands in the wrong places.

What I will correct is that where I said "illegal HTML command", I meant "suppressed HTML command".

31Lman
Nov 29, 2009, 6:12 am

I get it now - thank you Noisy!

32avaland
Nov 29, 2009, 11:20 am

OK, working from examples on my Club Read book log here: http://www.librarything.com/topic/61835 (and bearing in mind that I use some html but don't speak it well:-)

returning height to images didn't fix my images. I use sort of code (having adjusted to the loss of style codes on the threads).

Also, it seems a simple return after the line of code no longer results in placing the title below the image. Do I have to insert codes on all the entries now?

And how to explain this: the comments for three of my book entries have DISAPPEARED at the same time all this happened. (messages 185, 194, 207) OY! I hope I copied the text to reviews ... Why did that happen?

And the one photobucket picture (142) seems to have lost its proportions also. At this point I don't dare to go in and edit any of this or even look at it for fear I will lose something or muck it up...

Do we have capabilities to define margins between images?

33christiguc
Nov 29, 2009, 12:10 pm

>32 avaland: That's weird about your Club Read thread. If you click to edit those posts, do you see all the text of your reviews?

34christiguc
Nov 29, 2009, 12:19 pm

>24 timspalding:

I've added back height and width.

I don't see it working yet. e.g., here

35avaland
Nov 29, 2009, 1:16 pm

>33 christiguc: call me cowardly but I haven't dared to look. Maybe I will tomorrow...

36Noisy
Nov 29, 2009, 1:25 pm

>35 avaland:

I had a look at your 185 using view source and didn't see any text - search for 'messagehead185'. (Didn't look at the others.)

37christiguc
Nov 29, 2009, 3:31 pm

>36 Noisy: True, but it won't show in source, just as the things with < etc didn't show in source here. (But it did show my typing when I clicked to edit the message).

38TomVeal
Nov 30, 2009, 11:51 pm

Another vote in favor of blockquote.

39timspalding
Nov 30, 2009, 11:55 pm

Both p and blockquote are now allowed.

40clamairy
Dec 1, 2009, 9:32 am

Could we still use the code to resize posted images, please?

We post pictures like crazy in the Dragon, and now every re-sized image we've posted for the last 3 1/2 years is whacked out.

41timspalding
Dec 1, 2009, 9:38 am

Size should work. Can you give me a thread and an image you knew to be resized?

43clamairy
Dec 1, 2009, 10:26 am

Thank you!

44aethercowboy
Dec 1, 2009, 10:32 am

Can we also have <sup> and <sub>?

It's not critical. I just like these tags.

45timspalding
Dec 1, 2009, 10:42 am

Fixed. It wasn't taking case into account, so width worked, but not WIDTH...
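The bug Tim describes is easy to reproduce in a hand-rolled attribute filter. HTML attribute names are case-insensitive, so a whitelist lookup has to normalize them first. Python's html.parser already lowercases names, but a regex-based scanner like the hypothetical one below does not.

```python
import re

ALLOWED_IMG_ATTRS = {"src", "width", "height"}

# Hypothetical attribute scanner: name="value" pairs inside a tag.
ATTR_RE = re.compile(r'([A-Za-z-]+)\s*=\s*"([^"]*)"')


def allowed_attrs_buggy(tag_text):
    # Compares names exactly as typed, so WIDTH="100" is silently dropped.
    return {n: v for n, v in ATTR_RE.findall(tag_text)
            if n in ALLOWED_IMG_ATTRS}


def allowed_attrs(tag_text):
    # Normalize the name before the whitelist lookup.
    return {n.lower(): v for n, v in ATTR_RE.findall(tag_text)
            if n.lower() in ALLOWED_IMG_ATTRS}
```

With `<img SRC="http://example.com/c.jpg" WIDTH="100">`, the buggy version keeps nothing, while the fixed one keeps both src and width.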

46clamairy
Dec 1, 2009, 11:14 am

#45 - But I tried all lower case first, and it didn't work. I edited it to upper case, and when THAT didn't work I yelped.

47timspalding
Dec 1, 2009, 11:17 am

It's good for you now, though, right?

48clamairy
Edited: Dec 1, 2009, 11:22 am

Yes! Absolutely!
But I need to know if it was good for you, too. ;o)

And thank you, again!

49clamairy
Dec 1, 2009, 11:31 am

So sorry. I hate to be a pain in the arse, but could you allow align="right" as well. I just noticed our group page is a bit off now.

50timspalding
Dec 1, 2009, 1:02 pm

Which group?

Right now it's only about talk posts, not group descriptions, etc.

51clamairy
Dec 1, 2009, 1:43 pm

Ah, okay. I was talking about The Green Dragon group page.

52timepiece
Dec 1, 2009, 2:57 pm

So ... I'm afraid to try editing my profile and have this stuff go away on save, never to return -

Can we still include style="" inside a tag? In profiles, at least? Being able to float my images (and blockquotes) is really nice.

53timepiece
Dec 1, 2009, 3:07 pm

Oh, and bless you, Tim, for encouraging use of <cite> instead of just <i> or <em>. Semantic html - learn it, live it, love it. (actually, I for one wouldn't cry if you got rid of <b> and <i> entirely in favor of <strong> and <em>)

54timspalding
Dec 1, 2009, 3:25 pm

Oh, I'm gonna fight you on that one. b has a meaning that cannot be reduced to strong. It's like saying the Mona Lisa should be reduced to a prettypicture tag. ;)

55timepiece
Dec 1, 2009, 4:28 pm

I don't want to hijack the thread, but what does bold mean that strong doesn't? Apart from "I want it to look bold" which I assume is not what you mean (not to mention not what html is for).

56timspalding
Dec 1, 2009, 4:50 pm

All sorts of things! Bold is used in the production of printed books for all sorts of things. Sometimes they explicitly say it, sometimes not. For example, I was just talking about an edition of Artemidorus where the bold words were all in the topical index. Are they emphasized, or is the concept of plain, emphasized and "strong" a rather impoverished abstraction of the rich world of meaning and display that is typography—exactly the sort of reduction you'd expect from computer people at standards bodies who named everything in CSS as if they'd never talked to printing or typography people!

57timspalding
Dec 1, 2009, 4:51 pm

(runs screaming from the room)

58Larxol
Dec 1, 2009, 7:36 pm

Timepiece has it right, Tim. You can run <strong> screaming, </strong> but not <bold> screaming. </bold> Standard Generalized Mark-up Language, of which html is a dialect, tries to be independent of the medium.

59timspalding
Dec 1, 2009, 8:04 pm

The medium is the message.

60justjim
Dec 1, 2009, 8:08 pm

Unless the message is more than 500 words, then it's more of a large.

61timspalding
Dec 1, 2009, 8:11 pm

Big gulp™

62timepiece
Dec 1, 2009, 9:42 pm

So in the Artemidorus example, in html those words should be tagged with a class of "indexed", which can then be styled to display as bold. And if someone decided, in the next version, that they wanted indexed words to appear in small caps rather than bold, it could be easily changed. I don't see how bold is inherent to the meaning "indexed", whereas emphasis and strong do have inherent meaning.

Think about how all these things are conveyed via screen reader.

63prosfilaes
Dec 1, 2009, 11:51 pm

#62: And in the Artemidorus example, how is the header supposed to explain how the indexed words are labeled? "The bold (well, maybe, who knows) words are in the topical index"? There's an old rule of thumb in both computer science and mathematics, that you can always add more abstraction, and the tyro will always do so.

64markbarnes
Edited: Dec 4, 2009, 10:13 am

<li> and <ul> don't work properly in Talk because the CSS is poor. Could you change this? There's an example here: http://www.librarything.com/topic/78213#1633690

65Lman
Dec 6, 2009, 1:13 am

Is there any reason why <strike> won't work in profile comments now - I know I am doing it correctly because it worked in my post comments - it just won't in profile ones.

66timspalding
Dec 6, 2009, 1:50 am

The new code isn't being used on profile comments. It will be extended. I'm still waiting for any other problems to crop up. The code change was so massive, I was expecting more problems :)

67Lman
Dec 6, 2009, 2:13 am

Oh, right! Umm...what does that mean for profile comments then - all my other HTML works there, just not strike through?
Thanks for answering - sorry about so many problems...

68timspalding
Dec 6, 2009, 2:25 am

Not sure. It's old code. Probably it allows some tags, but not strike.

69GirlFromIpanema
Dec 6, 2009, 2:30 pm

A bug, a bug, I found a bug! :-)

Round parentheses "break" the ability of LT to change a http:// string into a clickable link:

http://www.librarything.com --no parentheses
(http://www.librarything.com) --directly connected p.
( http://www.librarything.com ) --parentheses with a space
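A common cause of this kind of bug is that the URL pattern swallows the closing parenthesis. One way to handle it is to trim trailing punctuation and any ")" that has no matching "(" inside the URL, so Wikipedia-style links such as .../Foo_(bar) still work. The regex and helper names below are hypothetical, and this is not the fix LibraryThing shipped. It is meant for plain-text runs, not text already inside tags.

```python
import re
from html import escape

# Hypothetical URL matcher: http(s):// followed by non-space, non-markup chars.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
TRAILING_PUNCT = ".,;:!?"


def trim_url(url):
    """Drop trailing punctuation and any ')' with no matching '(' in the URL."""
    while url:
        if url[-1] in TRAILING_PUNCT:
            url = url[:-1]
        elif url[-1] == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url


def autolink(text):
    def repl(match):
        url = trim_url(match.group(0))
        rest = match.group(0)[len(url):]  # trimmed characters go back to the text
        return (f'<a href="{escape(url)}" target="_top" rel="nofollow">'
                f'{escape(url)}</a>{rest}')
    return URL_RE.sub(repl, text)
```

All three of the test cases above then produce the same link, with the parentheses left outside it.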

70timspalding
Dec 6, 2009, 3:24 pm

Fixed. Thanks.

71rsterling
Dec 6, 2009, 3:30 pm

Are these changes at all connected to the fact that award pages based on CK entries no longer display any non-numeric parenthetical data? Can that be reinstated?

This page, for instance, used to show the information people had entered about the type and region of the award, not just the year:
http://www.librarything.com/bookaward/Commonwealth%20Writers%27%20Prize

Similarly, this page used to show the genre next to the year (e.g. Fiction and Poetry, Nonfiction).
http://www.librarything.com/bookaward/New+York+Times+Notable+Book+of+the+Year

All that information is still in the CK entries for the works, but is no longer showing up on the Award page.
(I also mentioned this here.)

72VictoriaPL
Dec 9, 2009, 10:42 am

my pictures are HUGE again. Will they go back to the size I specified, or is this the new normal?

73timspalding
Dec 9, 2009, 11:06 am

Fixed. Sorry about that. New code had a bug in it.

74VictoriaPL
Dec 9, 2009, 11:07 am

Thanks Tim!

75justjim
Jan 13, 2010, 6:02 am

Are the cite and pre tags expected to work in talk?

Is the cite tag expected to give mouseover text? It doesn't here.

Should the pre tag allow display of more than one 'space'? It doesn't here.

76Aerrin99
Jan 13, 2010, 8:55 am

  • Unordered lists do not appear to be working properly

  1. Although ordered lists do.

Also, could you clarify whether this is also the set that works for profile pages, as well as your catalog and Talk?

77foggidawn
Edited: Jan 13, 2010, 9:00 am

  • Oddly enough
  • bullet points (the "li" tag) work,
  • just not in an unordered list ("ul" tag)

78Aerrin99
Jan 13, 2010, 9:58 am

> 77

Well /that's/ very strange. Although good to know. I'll wait and see if Tim comments (perhaps next week, after ALA) before I go adding it to the HelpThing page, though, as it doesn't seem like it /should/ work that way.

79timspalding
Jan 13, 2010, 11:57 am

It definitely shouldn't. I'm not sure it'll get fixed quickly, though. What needs to happen is to have all user input go through the new, correct functions we made. These functions handle all the weird edge cases, like un-closed tags, and also allow much more HTML. But they aren't everywhere they should be. Under-abstraction was, unfortunately, a side effect of rapid development.

80justjim
Jan 13, 2010, 12:07 pm

I'm invisible again!

81TheoClarke
Jan 13, 2010, 12:33 pm

I could have sworn I heard an Oz accent come out of thin air.

82Aerrin99
Jan 13, 2010, 12:55 pm

Thanks Tim!

83rsterling
Jan 13, 2010, 7:25 pm

What happened to the safeguards that used to prevent brand-new users from having live URLs on their profiles (and group pages too? or was it only profiles?)?
See for instance this new spammer:
http://www.librarything.com/profile/adultaccessnow

84reconditereader
Jan 14, 2010, 1:24 am

Yeah, and why can spammers put URLs in book titles? That's not right.

85jjmcgaffey
Jan 14, 2010, 5:40 am

Give an example? Yes, that's not right, and should be fixed. I haven't come across it, and there are an awful lot of books to search...make it easy on the programmers and link to a work page.

86skittles
Jan 14, 2010, 8:20 am

possibly accidental, but an example

http://www.librarything.com/work/9417961/details

87jjwilson61
Jan 14, 2010, 10:05 am

86> But that's not an active URL; you can't click on it and the browser takes you somewhere. I think they were talking about active URLs above.

88jjmcgaffey
Jan 14, 2010, 3:49 pm

Ah - actually whoever did the work (was it cD?) may just have inactivated URLs in titles and CK and so on - not deleted them. So it's a horribly messy title and I hope it doesn't get combined, but it's not a spam danger. reconditereader, was the one you saw active?

89reconditereader
Jan 14, 2010, 4:38 pm

Oooh, now they seem to redirect to the "spam" author page. Nice!

90avaland
Feb 23, 2010, 11:36 am

The book covers I so meticulously place (and size) on my profile page just went bonkers (I'm assuming "height" is no longer working). While I have had problems adding images to the same "about me" field - the code's there, but the images don't appear - (under my 'now reading' heading), I have been able to add to the existing collection of covers under the "last read" heading (and downsizing all the covers to 100px) - until today.

Other random trivial bit: I noticed when placing an image in a thread post, I now have to add a break code to put the text below it.

While I can live without ever posting another birthday cake on a thread, I am extremely fond of using book covers (of various sizes) in posts and I like to be able to use them on my profile page (in multiples) as it breaks up what is a somewhat visually boring bit of text. Is there some wiki place of reference where I can see exactly what is and is not allowed now for code?

91brightcopy
Feb 23, 2010, 12:26 pm

90> Might want to check out this thread.

92Aerrin99
Feb 23, 2010, 1:04 pm

> 90

Re: wiki, there's this page, but it's certainly still a work in progress.

93SilentInAWay
Feb 26, 2010, 2:44 am

So that the title and author name pop up in a tool tip for each cover image on my profile page, I use the title parameter in the html link for each image:

<a href="xxxxx" title="xxxxx">

Was there a recent push that disabled the ability to use the title parameter in this way?

94brightcopy
Feb 26, 2010, 10:01 am

Just FYI, TITLE doesn't work in Firefox. It's an old and very annoying bug.

95legallypuzzled
Feb 27, 2010, 10:45 am

TITLE doesn't work in Firefox

I think that's wrong. TITLE has always worked in Firefox. It's the ALT attribute that doesn't "work" by offering a pop-up, although it does in IE.

96brightcopy
Edited: Feb 27, 2010, 3:54 pm

95> Nope, I mean TITLE. But while researching the post to give you an example, I ran across a funny thing. It won't work in my normal firefox that I use all the time, but will when I run it as a different user that I just use every once in a while to test stuff. Here's the URL I used to test.. Go there and hover over the "link label".

In my FF, no popup hint. Under the test user, it pops up "Anchor Text". I don't get it. I'm in the process of picking everything apart. I've started FF in safe mode (all addons disabled) and it still does it. I've combed through the prefs trying to find something. I've been googling. Nothing so far. If anyone knows, please pass on the solution. FF 3.5.7 on XP.

ETA: Oh, and btw, I know I'm not the only one to run into this. My coworker's Firefox behaved the same way, at least in regards to TITLE tooltips on images. I even made a greasemonkey script for xkcd so that we could see the titles for the comics.

97lorax
Feb 27, 2010, 4:04 pm

96>

I even made a greasemonkey script for xkcd so that we could see the titles for the comics.

There's an add-on for that.

https://addons.mozilla.org/en-US/firefox/addon/1933

98legallypuzzled
Edited: Feb 27, 2010, 5:02 pm

>96 brightcopy:

Ah. I misread "TITLE doesn't work in Firefox" as meaning it just didn't work in Firefox, instead of *your* version of Firefox. Because it's working fine for me, even on the sites you mentioned.

There's sometimes odd behavior if you have
browser.chrome.toolbar_tips
set to false; it should only turn off the tooltips on the standard buttons but it often turns off tooltips on sites. See
https://bugzilla.mozilla.org/show_bug.cgi?id=64232

I'm embarrassed to say I read xkcd for about three months before running into the hidden text....

ETA: Whoops! Forgot the link.

99brightcopy
Feb 27, 2010, 9:44 pm

98> No, you didn't misread it, that's what I said. Up until I started investigating, I thought TITLE wasn't working in FF. I think at some point in the foggy past, there was a version of FF where there was a bug in TITLE (I think it was the one with extra long titles being truncated) and that got conflated in my mind with TITLE just not working, period.

And sweet jesus that was the pref! I was poring over what the differences were between my profile and the test profile I was running where tooltips were working. I gave up after about 10 minutes and I'm glad I did. Thanks so much! I would pass this tip along to my coworker who had the same problem, but he's already moved on to Chrome. What's the world coming to?

PS: I think I'll actually stick with my GM script for xkcd. It puts the TITLE text under the image, kind of like a caption. It makes it easy to see while reading the comic, and it makes sure I don't forget to check it. And yes, I had been reading it for a while before I ran into the hidden text, too. Worse than that was SMBC. I went through and read the entire archive before figuring out what the secret comic button was...