This sounds good, but first we need to determine why the RPM database got corrupted.
----- Original Message ----
From: Sam Varshavchik mrsam@courier-mta.com
To: Development discussions related to Fedora Core fedora-devel-list@redhat.com
Sent: Sunday, November 19, 2006 11:50:09 PM
Subject: Re: SUG: RPM database verification / repair, nightly and in Anaconda
Tony Nelson writes:
I propose that there should be a nightly cron task to check the RPM
I proposed this several years ago, and got pooh-poohed.
I now just have a cron.daily script that just makes a copy of /var/lib/rpm on a five day rotation.
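A minimal sketch of the kind of cron.daily job Sam describes, in Python. The backup location (/var/backups/rpmdb), the rpm.0 ... rpm.4 naming, and picking the slot by day number modulo 5 are assumptions for illustration, not details of his actual script.

```python
import datetime
import shutil
from pathlib import Path

# Assumptions (not from the original post): copies live under BACKUP_ROOT,
# named rpm.0 .. rpm.4, and the slot is chosen by day number mod 5.
RPMDB_DIR = Path("/var/lib/rpm")
BACKUP_ROOT = Path("/var/backups/rpmdb")  # hypothetical location
ROTATION_DAYS = 5


def backup_slot(day: datetime.date, rotation: int = ROTATION_DAYS) -> int:
    """Return which of the `rotation` slots this day's copy overwrites."""
    return day.toordinal() % rotation


def rotate_copy(src: Path, dest_root: Path, day: datetime.date,
                rotation: int = ROTATION_DAYS) -> Path:
    """Copy `src` into dest_root/rpm.<slot>, replacing the copy that was
    made `rotation` days earlier in the same slot."""
    dest = dest_root / f"rpm.{backup_slot(day, rotation)}"
    if dest.exists():
        shutil.rmtree(dest)  # drop the oldest copy in this slot
    dest_root.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest
```

A cron.daily wrapper would then call `rotate_copy(RPMDB_DIR, BACKUP_ROOT, datetime.date.today())`. A plain file copy of a Berkeley DB environment is only as consistent as the files are at that moment, so this assumes no rpm or yum transaction is running when the job fires.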
At 4:38 AM -0800 11/20/06, Otto Rey wrote:
From: Sam Varshavchik mrsam@courier-mta.com
Sent: Sunday, November 19, 2006 11:50:09 PM
Tony Nelson writes:
I propose that there should be a nightly cron task to check the RPM
I proposed this several years ago, and got pooh-poohed.
I now just have a cron.daily script that just makes a copy of /var/lib/rpm on a five day rotation.
This sounds good, but first we need to determine why the RPM database got corrupted.
First? Do both at the same time! I'm verifying the database while other people are working on the RPM bugs. Plus, knowing more about the corruption would provide useful data to the effort to fix it.
On Mon, Nov 20, 2006 at 04:38:23AM -0800, Otto Rey wrote:
This sounds good, but first we need to determine why the RPM database got corrupted.
It's Berkeley DB. Corruption should be expected. :-/
Steve
On Mon, 2006-11-20 at 17:49 -0600, Steven Pritchard wrote:
On Mon, Nov 20, 2006 at 04:38:23AM -0800, Otto Rey wrote:
This sounds good, but first we need to determine why the RPM database got corrupted.
It's Berkeley DB. Corruption should be expected. :-/
I guess I'm a little confused here, actually.
Let's take bsddb out of the example.
If I'm a client of an Oracle db and I repeatedly open, read, and close the database using the interface available, then for small amounts of data that might not be the most efficient use of the database connection.
However, if as a result of open-read-close the database is corrupted and/or rendered unusable, where would you say the bug lies?
To me it seems like a valid client connection should not be able to corrupt a database simply by open-read-closing, no matter how many times. And if it can, then clearly there is something wrong with the database code.
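Seth's expectation can be phrased as a test: populate a database, run many open/read/close cycles, and check that the contents never change. The sketch below uses Python's `dbm.dumb` purely as a portable stand-in; it says nothing about Berkeley DB itself, and the record names and cycle count are made up. Pointing the same loop at a Berkeley DB binding, with concurrent openers added, would be the version that actually exercises the suspected bug.

```python
import dbm.dumb
import os
import tempfile

# dbm.dumb stands in for Berkeley DB here only to show the shape of the
# test; it is not the engine under discussion.


def populate(path: str, records: dict) -> None:
    """Create a fresh database at `path` holding `records`."""
    with dbm.dumb.open(path, "n") as db:
        for key, value in records.items():
            db[key] = value  # str keys/values are stored UTF-8 encoded


def open_read_close(path: str, cycles: int) -> dict:
    """Open read-only, read every record, close; repeat `cycles` times.
    Returns the contents seen on the last cycle (bytes keys and values)."""
    seen: dict = {}
    for _ in range(cycles):
        with dbm.dumb.open(path, "r") as db:
            seen = {k: db[k] for k in db.keys()}
    return seen


def survives_open_read_close(records: dict, cycles: int = 1000) -> bool:
    """True if the data read back after `cycles` rounds matches what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "packages")
        populate(path, records)
        expected = {k.encode(): v.encode() for k, v in records.items()}
        return open_read_close(path, cycles) == expected
```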
-sv
On Tue, 21 Nov 2006, seth vidal wrote:
On Mon, 2006-11-20 at 17:49 -0600, Steven Pritchard wrote:
On Mon, Nov 20, 2006 at 04:38:23AM -0800, Otto Rey wrote:
This sounds good, but first we need to determine why the RPM database got corrupted.
It's Berkeley DB. Corruption should be expected. :-/
I guess I'm a little confused here, actually.
Let's take bsddb out of the example.
If I'm a client of an Oracle db and I repeatedly open, read, and close the database using the interface available, then for small amounts of data that might not be the most efficient use of the database connection.
However, if as a result of open-read-close the database is corrupted and/or rendered unusable, where would you say the bug lies?
To me it seems like a valid client connection should not be able to corrupt a database simply by open-read-closing, no matter how many times. And if it can, then clearly there is something wrong with the database code.
Well, obviously. I think that's exactly what Steven means... Personal experience with both Subversion repos using Berkeley DB storage and rpmdb has made me, too, expect nothing but eventual corruption from BDB :-/
- Panu -
On Tue, Nov 21, 2006 at 09:43:17AM +0200, Panu Matilainen wrote:
On Tue, 21 Nov 2006, seth vidal wrote:
To me it seems like a valid client connection should not be able to corrupt a database simply by open-read-closing, no matter how many times. And if it can, then clearly there is something wrong with the database code.
Well, obviously. I think that's exactly what Steven means... Personal experience with both Subversion repos using Berkeley DB storage and rpmdb has made me, too, expect nothing but eventual corruption from BDB :-/
That's been my experience as well. For example, every system that I have running openldap eventually corrupts the database (usually in 2-3 months of continuous use). This is on servers that run for months (sometimes years) at a time without a reboot.
Luckily slapd_db_recover and slapindex have fixed the problem every time so far, which is why I'd *really* like it if "service ldap restart" would run both.
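A sketch of the restart sequence Steven asks for, with the command runner passed in so the ordering can be checked without touching a real server. The exact commands, the `-h /var/lib/ldap` database home, and stopping at the first failure are assumptions for illustration, not what the linked bug report proposes.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical sequence; commands and the database home path are
# assumptions about a typical Fedora openldap layout.
DEFAULT_STEPS: List[List[str]] = [
    ["service", "ldap", "stop"],
    ["slapd_db_recover", "-h", "/var/lib/ldap"],
    ["slapindex"],
    ["service", "ldap", "start"],
]


def run_steps(steps: Sequence[Sequence[str]],
              run: Callable[[Sequence[str]], int]) -> bool:
    """Run each command in order; stop at the first non-zero exit status."""
    for cmd in steps:
        if run(cmd) != 0:
            return False
    return True


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Real runner: execute the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode
```

Stopping at the first failure leaves slapd down rather than starting it on a database that failed recovery; that trade-off is a judgment call. In practice both tools may also need to run as the ldap user so the database files keep their ownership.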
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199322
Steve
On Wed, 2006-11-22 at 11:23 -0600, Steven Pritchard wrote:
That's been my experience as well. For example, every system that I have running openldap eventually corrupts the database (usually in 2-3 months of continuous use). This is on servers that run for months (sometimes years) at a time without a reboot.
Luckily slapd_db_recover and slapindex have fixed the problem every time so far, which is why I'd *really* like it if "service ldap restart" would run both.
Do you think we should be relying on berkeley db, then?
-sv
"sv" == seth vidal skvidal@linux.duke.edu writes:
sv> Do you think we should be relying on berkeley db, then?
It would sure be nice if we didn't have to. Unfortunately there's not really an adequate replacement; sqlite works well for some things but it's not really working in the same problem space. It might work for RPM, though.
- J<
At 9:43 AM +0200 11/21/06, Panu Matilainen wrote:
On Tue, 21 Nov 2006, seth vidal wrote:
On Mon, 2006-11-20 at 17:49 -0600, Steven Pritchard wrote:
On Mon, Nov 20, 2006 at 04:38:23AM -0800, Otto Rey wrote:
This sounds good, but first we need to determine why the RPM database got corrupted.
It's Berkeley DB. Corruption should be expected. :-/
I guess I'm a little confused here, actually.
Let's take bsddb out of the example.
If I'm a client of an Oracle db and I repeatedly open, read, and close the database using the interface available, then for small amounts of data that might not be the most efficient use of the database connection.
However, if as a result of open-read-close the database is corrupted and/or rendered unusable, where would you say the bug lies?
To me it seems like a valid client connection should not be able to corrupt a database simply by open-read-closing, no matter how many times. And if it can, then clearly there is something wrong with the database code.
Well, obviously. I think that's exactly what Steven means... Personal experience with both Subversion repos using Berkeley DB storage and rpmdb has made me, too, expect nothing but eventual corruption from BDB :-/
It would be a good idea to find out how many Fedora users have corrupt RPM databases. My rpm_verifydb package at http://georgeanelson.com/rpm-verifydb.htm is a start, but it won't report problems back to you guys. ISTM that if yum were to (possibly temporarily) verify the RPM database with the only tool available in RPM, "rpm --verifydb" (or its synonym "rpmdb_verify"), and then report only broken databases to Fedora, this would not be the sort of privacy invasion that is causing so much angst. If RPM's developer, Jeff Johnson, is correct, you won't receive any reports, and the issue can be dropped. See the thread I started on Red Hat's rpm-list, "SUG: Automatic RPM database verification and repair", for his take on the whole matter.
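Tony's proposal boils down to: run the verifier, and send something only when it fails. A minimal sketch, assuming (not verified here) that `rpm --verifydb` exits non-zero on a damaged database; `report` is a hypothetical hook standing in for whatever would deliver the report to Fedora.

```python
import subprocess
from typing import Callable, Sequence

# Command taken from the post; treating a non-zero exit status as
# "database broken" is an assumption about rpm's behaviour.
VERIFY_CMD = ["rpm", "--verifydb"]


def check_and_report(report: Callable[[str], None],
                     cmd: Sequence[str] = VERIFY_CMD) -> bool:
    """Run the verifier. Return True if it passed; on failure, pass only
    the exit status and the verifier's stderr to `report` (hypothetical
    hook), then return False."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return True  # healthy database: nothing is sent anywhere
    report(f"exit={result.returncode}\n{result.stderr}")
    return False
```

Nothing about installed packages leaves the machine, and a passing check produces no traffic at all, which is the privacy property the post is after.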
If you want me to do some of this, ask.
On Wed, 2006-11-29 at 16:02 -0500, Tony Nelson wrote:
It would be a good idea to find out how many Fedora users have corrupt RPM databases. My rpm_verifydb package at http://georgeanelson.com/rpm-verifydb.htm is a start, but it won't report problems back to you guys. ISTM that if yum were to (possibly temporarily) verify the RPM database with the only tool available in RPM, "rpm --verifydb" (or its synonym "rpmdb_verify"), and then report only broken databases to Fedora, this would not be the sort of privacy invasion that is causing so much angst. If RPM's developer, Jeff Johnson, is correct, you won't receive any reports, and the issue can be dropped. See the thread I started on Red Hat's rpm-list, "SUG: Automatic RPM database verification and repair", for his take on the whole matter.
If you want me to do some of this, ask.
Tony, We'd like for the problem to be fixed 'correctly' where correctly means not just papering over it. However, I definitely understand wanting to fix problems for users one way or the other. So the best I can say is that if this problem isn't fixed properly before Fedora 8 then I'll push hard for something similar to the work-around solution you've described to be implemented.
-sv
At 4:39 PM -0500 11/29/06, seth vidal wrote:
On Wed, 2006-11-29 at 16:02 -0500, Tony Nelson wrote:
It would be a good idea to find out how many Fedora users have corrupt RPM databases. My rpm_verifydb package at http://georgeanelson.com/rpm-verifydb.htm is a start, but it won't report problems back to you guys. ISTM that if yum were to (possibly temporarily) verify the RPM database with the only tool available in RPM, "rpm --verifydb" (or its synonym "rpmdb_verify"), and then report only broken databases to Fedora, this would not be the sort of privacy invasion that is causing so much angst. If RPM's developer, Jeff Johnson, is correct, you won't receive any reports, and the issue can be dropped. See the thread I started on Red Hat's rpm-list, "SUG: Automatic RPM database verification and repair", for his take on the whole matter.
If you want me to do some of this, ask.
Tony, We'd like for the problem to be fixed 'correctly' where correctly means not just papering over it.
...
Measuring a problem does not paper over it. You did not understand my post. Please ask me about anything in it that is unclear.
On 20/11/06, Otto Rey otto_rey@yahoo.com.ar wrote:
This sounds good, but first we need to determine why the RPM database got corrupted.
It seems that the rpm guys think this issue is arising due to races and signal handling problems which are related to the repeated opening and closing of the database inherent to the latest versions of yum:
See: https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001862.html
which is part of this thread on the problem: https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001849.html
Jonathan.
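If races and signal handling around repeated open/close are the culprit, one general technique for the signal side is to hold signals pending while the database is being modified, so an interrupt cannot land halfway through a write. The sketch below shows that technique with POSIX signal masks in Python (Unix only); it is not a description of what rpm or yum actually do, and the particular signal set is an assumption.

```python
import signal
from contextlib import contextmanager
from typing import Iterable, Iterator

# Signals that could otherwise interrupt a write halfway; this set is an
# assumption chosen for illustration.
CRITICAL_SIGNALS = {signal.SIGINT, signal.SIGTERM, signal.SIGHUP}


@contextmanager
def signals_deferred(sigs: Iterable[int] = CRITICAL_SIGNALS) -> Iterator[None]:
    """Block `sigs` for the duration of the with-block. Any that arrive are
    held pending by the kernel and delivered once the old mask is restored."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigs)
    try:
        yield
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

A caller would wrap only the code that modifies the database in `with signals_deferred():`. This addresses only the signal half of the problem; concurrent openers still need proper locking.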
It seems that the rpm guys think this issue is arising due to races and signal handling problems which are related to the repeated opening and closing of the database inherent to the latest versions of yum:
See:
https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001862.html
which is part of this thread on the problem:
https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001849.html
Jonathan.
It might be worth pointing out that, in the current context, the issue appears to be kernel-dependent and related to disk I/O traffic. The corruption, then, is a consequence of repeated system freezes during "rpm" transactions [kernel package rev. 2835 from "fc6-updates-testing" is the only one since my switch to "FC6" and recent "rawhide" that reliably lets me avoid these problems].