Major Board Hiccup - Part 2



jimnyc
11-16-2010, 11:01 PM
Earlier this evening, at about 8pm EST, the board was brought down briefly by the hosting company to replace a hard drive. This was announced ahead of time and the ETA was 10 minutes. When our site was back at about 8:15, we noticed there were 3 days of posts missing and I immediately opened a support ticket.

I made the initial announcement on the board while I waited for an explanation from the hosting provider. Unfortunately, some people, like Psychoblues, had already gone back and started re-posting things they had lost from the past 3 days. I even made a few posts myself.

Then I got an "explanation" from the provider that they were restoring the account from last evening. Anything after that will be lost. So we move forward from last night. :(

I don't know what's worse, losing the 3 days and moving on, or assuming it was gone and moving on, only to have the "moving on" posts deleted when it gets restored to the night before!!

You all have my sincerest apologies. What was supposed to be a quick and easy maintenance step on the server turned into a day's worth of lost posts.

jimnyc
11-16-2010, 11:17 PM
It appears the last post was by RSR sometime around 4:30am EST today. Then the rest is lost until I started posting again just a short while ago.

You all have my sincerest apologies.

The only time-sensitive posts were the NFL threads. I have already updated last week's results and posted this week's schedule along with the picks received thus far.

http://www.debatepolicy.com/showthread.php?29855-NFL-2010-Week-11

Mr. P
11-16-2010, 11:29 PM
It wiped out today's PMs also. Chit happens.

Psychoblues
11-17-2010, 12:08 AM
I certainly hope I didn't offend you, jimbo. I did not attempt to repost the last three days, only to revive, to some degree, the conversations from the last few hours. That would approximately realign to the parameters of the available recoupable information, wouldn't it? Please don't tell me you're turning into a drama queen!!!!!!!!!!

I hope you can get it all straightened out!!!!!!!!!

Love :laugh2:

Psychoblues

SassyLady
11-17-2010, 01:12 AM
Thank you, Jim, for working so hard to get things back on track.

darin
11-17-2010, 04:49 AM
To help folks, I've decided to pay for EVERYONE's use of the board for the rest of the week! Open bar, folks.

:D

Thanks Jim

btw - the PM you sent in reply to mine? It's gone, too. I read it thru the email notification though.

:)

jimnyc
11-17-2010, 06:52 AM
I still have not received an acceptable response from my provider as to how this happened. All they told me is that they restored the site from early yesterday morning, and they asked me to verify that it was working. While it won't help us, I still want to know how the procedure they were doing caused my site to lose data.

But yes, everything and anything is unfortunately gone from the time of the restore point until the time the servers came back up at 8:15, including PMs.

chloe
11-17-2010, 09:00 AM
Maybe it's divine intervention for some :laugh: ....I kid, I kid

jimnyc
11-17-2010, 11:29 AM
I suppose the discounts won't hurt! But I'll need to read more about this to fully understand their response. I've worked with RAID arrays very often in a server environment. There would be an array of 3-10 drives in our systems, and when a red light came on for one of them it meant the drive needed to be changed. Hell, on most of our servers at the time, and this was 10 years ago, we were able to swap these drives while the server was still running. We also had some servers that we would bring down gently, perform the quick swap, and bring back up again. We've had complete failures where we had to restore entire servers from backups, but I don't recall ever losing days of data after successfully swapping a drive in an array. I was always under the impression that the drives in these RAID arrays were kept up to date with one another. (But I'm more of a PC guy than a server guy.)

Here's their official response. Maybe another techie reading this can explain better what they think happened. I think this may be above my pay grade!!


Hi,

I have had our senior admin take a look at this. It seems that due to the one bad drive, the RAID was not syncing properly to the secondary drive. When we replaced the hard drive and then synced data over to it from the secondary, it was not up to date. We do apologize for this, but these types of issues will happen from time to time.

I have added a 15% discount to your plan, and given you a one month credit on the account.


Regards,

NightTrain
11-17-2010, 01:57 PM
The way I read it, Jim, she's telling you that the faulty hard drive wasn't providing complete, uncorrupted data to the backup.

So when it failed completely, and they put in the new hard drive, the backup data they had on the secondary was incomplete.

I don't work on servers, I just install & troubleshoot the network that makes them talk but I have watched the IT guys get frustrated when fighting problems like these while getting dozens of irate messages from the customer-interfacing crew.

A nice discount though!
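
To make the stale-mirror explanation above concrete, here is a toy Python sketch. It assumes a hypothetical two-drive mirror where writes stop reaching the secondary once the primary starts failing. `ToyMirror`, `sync_ok`, and the post names are made up for illustration and do not describe the provider's actual setup or any real RAID controller.

```python
class ToyMirror:
    """Toy two-drive mirror; not a model of any real RAID controller."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.sync_ok = True  # assumption: flips to False when the primary goes bad

    def write(self, key, value):
        # Every write lands on the primary; it only reaches the
        # secondary while syncing still works.
        self.primary[key] = value
        if self.sync_ok:
            self.secondary[key] = value

    def replace_primary(self):
        # Swap in a blank drive and rebuild it from the secondary.
        self.primary = dict(self.secondary)
        self.sync_ok = True


def simulate():
    mirror = ToyMirror()
    mirror.write("post-1", "written while the mirror was healthy")
    mirror.sync_ok = False  # hypothetical: syncing silently stops
    mirror.write("post-2", "written after syncing stopped")
    mirror.replace_primary()
    return sorted(mirror.primary)


print(simulate())  # ['post-1']
```

Anything written after syncing stopped exists only on the failing drive, so rebuilding from the secondary rolls the data back to the last good sync.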

jimnyc
11-18-2010, 11:53 AM
Oh boy, just got this email, which is 100% identical to the one I received the morning of the last incident.


Maintenance Notice
Dear James,
We are contacting you to inform you about some upcoming scheduled maintenance.

The Semi-Dedicated server "trinculo" needs a bad drive replaced in its RAID array. Server will need to be taken down to replace the drive.
This has been scheduled to be performed at 08:00pm EST on Friday, November 19th, 2010. The estimated downtime is 10 to 15 minutes.
Thank you for your cooperation.

I'm already on top of them to find out what preparations are being taken to ensure the same thing doesn't happen again. I'm also going to perform my own backups around 7pm tomorrow so that if something goes wrong again we can go right back one hour.

KitchenKitten99
11-18-2010, 04:33 PM
And here I thought my post was deleted for whatever reason, without explanation. I thought it was weird, but I wasn't worried enough about it to ask anyone.

BoogyMan
11-18-2010, 07:09 PM
Do you know what RAID level they are using on this server? It sounds like they are doing a simple RAID1 (mirroring), and doing it badly.

Trinity
11-18-2010, 07:19 PM
Yep, that pretty much sums it up.

Trinity
11-18-2010, 07:20 PM
That's what I was wondering too.

jimnyc
11-19-2010, 10:27 AM
I made the mistake of "assuming" that they were running at least RAID 5, but it appears you guys were right in your assumptions. I asked them how they would ensure we don't have a repeat adventure and here is what I got yesterday:


Both drives are sync'd at this time so replacing the drive will not cause that issue and data will remain intact.


Sure sounds like basic mirroring to me.

I will be running an entire backup between 7pm-8pm EST tonight, so the server may be a little slow right before the downtime.
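
For anyone wanting to take a similar precaution before a maintenance window, here is a rough Python sketch of a timestamped file backup. The function name `backup_dir` and any paths passed to it are hypothetical, and this only archives files on disk. A forum's database would need its own dump, which is not shown here.

```python
import tarfile
import time
from pathlib import Path


def backup_dir(source, dest_dir, now=None):
    """Archive `source` into `dest_dir` as a timestamped .tar.gz; return its path."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name so each run produces a separate archive
    # (now=None means "use the current time").
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive
```

Running it shortly before the scheduled downtime, and keeping the archive somewhere other than the server being repaired, means a bad restore costs about an hour of posts instead of a day.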