downtime / site status page?
Just a thought: is there any way to have a site status page hosted on a different server or something so there's a way to find out what's going on when there are down times? I used to check the blog when the rest of the site wouldn't come up, at least to see if it's going to be a big one, but now the blog seems to be inaccessible too when the rest of the site goes down.
Tonight's outage wasn't the servers. Our servers were just fine. It was the entire data center that went down. So having another, dedicated status machine at our data center wouldn't have done much good.
That being said, we've talked about having a separate status machine located somewhere other than our data center. We just haven't had the resources (human) to get it done quite yet.
Thanks for the response. Twitter's a good idea - bookmarking it...
5 felius Aug 6, 2009, 2:02am
You could try mine, too, seeing as I post more about outages than Tim does, and I get paged when something breaks. ;)
Yes - I saw your tweets (both of you) about the outage today. Fortunately it didn't go down until after I'd done a shelf of my mom's books to lure her into using LT...
The (repeated) questions as to whether Boston was a smoking hole in the ground were...almost amusing. Would have been entirely so if I was quite certain the answer was 'no'!
7 BTRIPP Aug 6, 2009, 7:42am
LiveJournal used to (it may still) have a status page hosted on Warped.com that if LJ went down would somehow get the hits (and let folks know there was a problem) ... I'm not sure exactly how that worked, but something like that could be an option.
8 felius Aug 6, 2009, 8:26am
(gets a bit technical, apologies in advance!)
I definitely want a status page hosted somewhere other than the main web server, and I want to track availability at a much more granular level. Now that the migration to the new colo has happened we have enough hardware to handle outages in much smarter ways, and there are lots of things on my TODO list that are all about improving reliability and availability. This will include being able to show a status/outage page even if a web server dies (though preferably we'll keep going without you noticing, while I get paged to fix it!).
Our uptime *has* been much better since the move - we almost hit 100% in July until something broke right near the end of the month. :/ We've had a couple of short outages since then that were due to changes in the code causing excess load on the DB servers.
This case was pretty unusual though - our network connectivity is backed by multiple backbone providers, and traffic should be transparently shifted between them in the event of a failure. What happened was that our hosting provider had a power failure to the network core itself, which cut off connectivity to their two datacenters in Boston. (Actually I suspect it was a drastic equipment failure - they said "The power event was limited to DC power systems that provide power to the Internap PNAP and not customer UPS systems." Still waiting to hear the full story..)
In order to deal with that we'd need to be able to dynamically re-route an IP address to a server hosted in another facility. That's technically possible, but not something we're set up to do (and I suspect not something that's high on the priority list at the moment.)
10 felius Aug 6, 2009, 10:00am
How is it that I've forgotten to mention (until now) that we actually *do* have an external status page? At present it only indicates whether or not the main web server is up, so it doesn't count outages where you can see a "LibraryThing is down" page. I'm planning to add better reports here before I do anything more fancy, though..
11 BTRIPP Aug 6, 2009, 10:05am
Oddly enough ... LiveJournal (along with Twitter, for some reason) is down this morning, so I got a chance to actually check out http://status.livejournal.com/ ... which is still out there and still on Warped.com ... this at least gives a place for the organization to communicate info about the downtime to its users!
12 BTRIPP Aug 6, 2009, 10:09am
Hey ... that http://downforeveryoneorjustme.com/ is cool! I guess it's "not just me" that's having problems with both LiveJournal and Twitter this morning ... of course, I'm going nuts now since the only place I can go and blither about it is FaceBook :-(
> 12
I feel your pain (sort of). FB was acting wonky for me this morning. Luckily, most of my blithering is on another site, with which y'all may be familiar ...
The problem with our status page is that "up" to you is a technical term. It means the server is responding. To you, if the server is delivering a "down" page, we're up.
Can Pingdom calculate the "real" up?
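(Editorial sketch, not from the thread.) The distinction drawn here, between a server that responds at all and a site that is actually usable, can be made concrete with a small classifier. This is a minimal illustration only: the marker text, the function names, and the rule that any 5xx counts as a "down page" are all assumptions, not a description of how LibraryThing or Pingdom actually measure things.

```python
from typing import List, Optional

# Assumed marker text for the "down" page; the real wording may differ.
DOWN_PAGE_MARKER = "LibraryThing is down"


def classify(status_code: Optional[int], body: str = "") -> str:
    """Classify a single probe result.

    status_code is None when no HTTP response arrived at all
    (network or data-center outage, i.e. "down down").
    """
    if status_code is None:
        return "unreachable"
    # The server answered, but only to say the site is down.
    if status_code >= 500 or DOWN_PAGE_MARKER in body:
        return "down page"
    return "up"


def real_uptime(results: List[str]) -> float:
    """Percentage of probes in which the site was genuinely usable."""
    if not results:
        return 0.0
    return 100.0 * sum(r == "up" for r in results) / len(results)
```

With this split, a server-level monitor would count both "up" and "down page" as up, while the "real" figure counts only "up".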
Ah, yes, the old, 'yes the program is running, it's just not responding to users' problem.
14: If the server is delivering a down web page, then we don't need to check an external page...
>16
Yes, but I'd love to know "how we're doing" over time.
18 felius Aug 6, 2009, 7:57pm
> The problem with our status page is that "up" to you is a technical term. It means the server is responding. To you, if the server is delivering a "down" page, we're up.
No, "up" means the same thing to me as it does to everybody else. I just track a lot more things that can be "up" or "down". However I agree that most people care whether or not "LibraryThing" is up, rather than the status of individual components.
> Can Pingdom calculate the "real" up?
Yes. We just tell them to request scripts which return a tiny block of XML indicating the status of whatever service we're tracking. We just need to write the scripts!
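(Editorial sketch, not from the thread.) A status script of the kind described might look roughly like the following. The element names follow what Pingdom's custom HTTP check is understood to expect, but treat them as an assumption and confirm against Pingdom's own documentation; `status_xml` and the `check` callable are hypothetical names introduced for illustration.

```python
import time
from typing import Callable
from xml.sax.saxutils import escape


def status_xml(check: Callable[[], bool]) -> str:
    """Run one service check and report the result as a tiny XML block.

    `check` is a hypothetical callable that returns True when the
    service (database, search, cache, ...) is healthy.
    """
    start = time.perf_counter()
    try:
        ok = bool(check())
    except Exception:
        # A check that blows up counts as the service being down.
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    status = "OK" if ok else "DOWN"
    # Element names are an assumption about the monitor's expected format.
    return (
        "<pingdom_http_custom_check>"
        f"<status>{escape(status)}</status>"
        f"<response_time>{elapsed_ms:.3f}</response_time>"
        "</pingdom_http_custom_check>"
    )
```

A web handler would serve this string for each tracked service, so the external monitor sees the service's health rather than just whether the web server answered.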
Cool. So, let's focus on something that combines when we're unreachable (down down) and when we're "down" down.
20 justjim Aug 7, 2009, 3:04am
Can't you just phone everybody? Or accept collect calls from everybody who wants to know what's going on.
//runs and hides//
Dude, just come over. If the site's down, I'll be up and we can have a beer.
22 justjim Aug 7, 2009, 3:22am
Mate, if any outage lasts long enough for me to get from here to there, your site is toast!
22: Oh, just take the orbital flight. ;)
24 reading_fox Edited: Aug 10, 2009, 11:00am
#22 you live 4 days (the length of the longest downtime I remember back in summer '07?) away from Maine? That's truly remote, only a tiny percentage of the world is 4 days away from "civilisation".
And yes it was stressful for us poor users unable to access our daily fix.
Tonight I hauled out a huge pile of books to add . . . got through a handful and then the LT site went down. Now it seems to be back up, but every library I try to add from gives the "Don't Panic" message and when I try to follow the link to "Search Other Libraries" that doesn't work either . . . none of the Twitter or other links mentioned above seems to mention the problem . . . Thus, I renew the request for some sort of status feed we can subscribe to in order to assess "What's up with LT?" I will now leave my pile of books and go drink some decaf.
26 felius Dec 1, 2009, 10:17pm
>25 Ugh. Sorry about that. The outage was my fault - I killed a server by exhausting all physical RAM, taking most of our in-memory cache and the library search services with it.
I had it back up fairly quickly, but failed to notice that some of the services on the search side didn't come back on boot. I've made changes to the configuration of that server to prevent that happening again.
You're absolutely right though, we need better monitoring and better communication of current status. The ball is in my court.