As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
-Mike
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
Dell has been dispatched and will arrive in 4 hours. CVS will be down until further notice.
-Mike
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
Dell has been dispatched and will arrive in 4 hours. CVS will be down until further notice.
Further notice will go well into tonight and probably into tomorrow. The drives are pretty messed up, and mgalgoci has done great work to get the arrays rebuilt, but we aren't confident the filesystem will be in working order; we may have to restore from backup.
-Mike
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
Dell has been dispatched and will arrive in 4 hours. CVS will be down until further notice.
Further notice will go well into tonight and probably into tomorrow. The drives are pretty messed up, and mgalgoci has done great work to get the arrays rebuilt, but we aren't confident the filesystem will be in working order; we may have to restore from backup.
Thanks for keeping us informed.
Just for information, what backup type/program are you using? How easy is it to recover from bare metal?
On 11/18/06, Gianluca Sforna giallu@gmail.com wrote:
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/17/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
Dell has been dispatched and will arrive in 4 hours. CVS will be down until further notice.
Further notice will go well into tonight and probably into tomorrow. The drives are pretty messed up, and mgalgoci has done great work to get the arrays rebuilt, but we aren't confident the filesystem will be in working order; we may have to restore from backup.
Thanks for keeping us informed.
Just for information, what backup type/program are you using? How easy is it to recover from bare metal?
We're using BackupPC. At present we can't back up everything on every box, so we're only getting what's important (that will change with the new backup server). Bare metal for the cvs box, though, shouldn't be too bad once we get the OS back on it.
-Mike
Mike McGrath wrote:
As of about 8:40 today the CVS box threw a SMART error regarding one of its drives. As a result most of the CVS box is presently mounted read only. We are working to correct this issue but expect CVS to be up and down today.
Just curious: why the read-only mount? Shouldn't the RAID have continued in degraded mode?
"AK" == Avi Kivity avi@argo.co.il writes:
AK> Just curious: why the read-only mount? shouldn't the RAID have
AK> continued in degraded mode?
Probably because something else bad happened that just completely screwed up the SCSI bus and corrupted data on multiple disks.
- J<
On 18 Nov 2006 09:31:26 -0600, Jason L Tibbitts III tibbs@math.uh.edu wrote:
"AK" == Avi Kivity avi@argo.co.il writes:
AK> Just curious: why the read-only mount? shouldn't the RAID have
AK> continued in degraded mode?
Probably because something else bad happened that just completely screwed up the SCSI bus and corrupted data on multiple disks.
- J<
We're talking about multiple failures across multiple drives, possibly a backplane. Here's the current plan.
1) Move proxy 3-4 into the f.rh.c cluster so we can take our new dells back.
2) Grab one of the new Dells and build the new cvs box. This will allow us to A) trust the hardware (we're all a little wary about the current cvs box) and B) build a new box with at least access to the old box if we're missing something. It will also allow us greater capacity with regard to future growth and the whole FC+FC=Fedora thing.
3) Restore backups to the new cvs box.
4) test test test
5) Release to the wild and fix bugs as needed.
6) Take the old cvs box and run full diagnostics before we rebuild it (it'll become one of our db servers, either primary or backup).
Right now mgalgoci is working on steps 1 and 2. When they are done I'll be on step 3, and we'll need a few people for 4. We'll probably discuss in #fedora-extras when the time comes.
Bottom line, this sucks but we're working on it. Should be up and better than ever by Monday.
-Mike
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
Bottom line, this sucks but we're working on it. Should be up and better than ever by Monday.
Buhhh, late Monday.
-Mike
On 11/19/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
Bottom line, this sucks but we're working on it. Should be up and better than ever by Monday.
Buhhh, late Monday.
-Mike
We've got the box, and restores are happening as we speak: 48G restored so far. I'll be restoring the ssh host keys last, so if you're still getting ssh mismatch errors... it's not ready to be logged into.
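For anyone who wants to check this without logging in, here is a minimal sketch of comparing a host key you already trust against one the server currently offers. It assumes you have both as plain `host keytype base64blob` lines (the usual known_hosts layout, and also what a tool such as ssh-keyscan prints), with no marker prefix on the line. The hostname and key blobs below are made up for illustration.

```python
def parse_key_line(line):
    """Split a 'host keytype base64blob [comment]' line into (keytype, blob)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected: host keytype base64blob")
    return fields[1], fields[2]


def host_key_matches(trusted_line, offered_line):
    """True if both lines carry the same key type and the same key blob."""
    return parse_key_line(trusted_line) == parse_key_line(offered_line)


# Hypothetical host and key material, not real keys.
trusted = "cvs.example.org ssh-rsa AAAAB3NzaOLDKEY"
offered_new_box = "cvs.example.org ssh-rsa AAAAB3NzaNEWKEY"
offered_restored = "cvs.example.org ssh-rsa AAAAB3NzaOLDKEY restored"

print(host_key_matches(trusted, offered_new_box))   # keys differ: not ready yet
print(host_key_matches(trusted, offered_restored))  # same key: host keys are back
```

A mismatch here corresponds to the ssh warning described above, so the sensible reaction is to wait rather than to delete the trusted entry.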
-Mike
On 11/20/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/19/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
Bottom line, this sucks but we're working on it. Should be up and better than ever by Monday.
Buhhh, late Monday.
-Mike
We've got the box, and restores are happening as we speak: 48G restored so far. I'll be restoring the ssh host keys last, so if you're still getting ssh mismatch errors... it's not ready to be logged into.
Alrighty then, new box is up. The following services should be working (if they are not, let me know immediately):
legacy - cvs
extras - cvs
fedora - cvs
dist (copy) - cvs
docs - cvs
font - cvs (whatever that is)
viewcvs
Services known NOT to work: git (needs a new chroot but no data loss)
Services in ????: There were a number of web services on the box that were never properly configured in the first place. Let me know if there are services that don't work now that did before the crash.
AFAIK we had no actual data loss except for commits that happened after the last backup and before the failure (which was about an 8 hour window or so, with 10 commits). I believe warren is going to contact these people, or already has.
I'll be working on getting the git repo back up tomorrow morning and will need someone to test. Any volunteers, please contact me.
CVS is a very strange box in that many people have access to make changes on it (and they do). It could take a while before we get every service back online, but we're doing well.
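As a rough illustration of how such a loss window can be worked out, the sketch below filters a commit log down to the entries made after the last backup and no later than the failure. The 8:40 failure time comes from the first message in this thread; the backup time, authors, and module names are hypothetical, and the tuple layout is only an assumption about what a commit log might provide.

```python
from datetime import datetime


def commits_in_loss_window(commits, last_backup, failure):
    """Return commits made after the last backup and up to the failure."""
    return [c for c in commits if last_backup < c[0] <= failure]


# Hypothetical backup time and commit entries, for illustration only.
last_backup = datetime(2006, 11, 17, 0, 30)
failure = datetime(2006, 11, 17, 8, 40)
commits = [
    (datetime(2006, 11, 16, 23, 50), "alice", "pkg-a"),  # before the backup
    (datetime(2006, 11, 17, 2, 15), "bob", "pkg-b"),     # inside the window
    (datetime(2006, 11, 17, 7, 5), "carol", "pkg-c"),    # inside the window
]

lost = commits_in_loss_window(commits, last_backup, failure)
for when, author, module in lost:
    print(f"{when:%Y-%m-%d %H:%M} {author} {module}")
```

The resulting list is the set of people who would need to re-commit their changes.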
Please contact me (or admin@fedoraproject) with any issues or bugs, we'll get to them soon.
-Mike
On Tuesday, 21 November 2006 at 08:00, Mike McGrath wrote: [...]
Alrighty then, new box is up. The following services should be working (if they are not, let me know immediately):
legacy - cvs
extras - cvs
fedora - cvs
dist (copy) - cvs
docs - cvs
font - cvs (whatever that is)
viewcvs
Thank you very much for the hard work! I've just imported and built another package.
Regards, R.
On 11/21/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/20/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/19/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
On 11/18/06, Mike McGrath mmcgrath@fedoraproject.org wrote:
Bottom line, this sucks but we're working on it. Should be up and better than ever by Monday.
Buhhh, late Monday.
-Mike
We've got the box, and restores are happening as we speak: 48G restored so far. I'll be restoring the ssh host keys last, so if you're still getting ssh mismatch errors... it's not ready to be logged into.
Alrighty then, new box is up. The following services should be working (if they are not, let me know immediately)
Good job, thanks.