Strange yum update hang (or something else) in Rawhide .. how to debug this further?

Panu Matilainen pmatilai at laiskiainen.org
Tue Feb 5 07:24:41 UTC 2013


On 02/05/2013 12:32 AM, Richard W.M. Jones wrote:
> On Mon, Feb 04, 2013 at 07:17:35PM +0200, Panu Matilainen wrote:
>> On 02/04/2013 07:01 PM, Richard W.M. Jones wrote:
>>> On Mon, Feb 04, 2013 at 04:38:08PM +0000, Richard W.M. Jones wrote:
>>>>
>>>>    Cleanup    : cpp-4.8.0-0.7.fc19.x86_64                                215/262
>>>>    Cleanup    : gdb-7.5.50.20130118-2.fc19.x86_64                        216/262
>>>>    Cleanup    : 1:findutils-4.5.10-7.fc19.x86_64                         217/262
>>>>    Cleanup    : spice-server-0.12.2-2.fc19.x86_64                        218/262
>>>>    Cleanup    : cracklib-2.8.22-2.fc19.x86_64                            219/262
>>>>    Cleanup    : libvirt-daemon-driver-interface-1.0.1-6.fc19.x86_64      220/262
>>>>    Cleanup    : libvirt-daemon-driver-nodedev-1.0.1-6.fc19.x86_64        221/262
>>>>    Cleanup    : libvirt-daemon-driver-nwfilter-1.0.1-6.fc19.x86_64       222/262
>>>>    Cleanup    : libvirt-daemon-driver-secret-1.0.1-6.fc19.x86_64         223/262
>>>>    Cleanup    : libvirt-daemon-1.0.1-6.fc19.x86_64                       224/262
>>>>    Cleanup    : libvirt-client-1.0.1-6.fc19.x86_64                       225/262
>>>>    Cleanup    : cyrus-sasl-2.1.25-2.fc19.x86_64                          226/262
>>>>    Cleanup    : openldap-2.4.33-3.fc19.x86_64                            227/262
>>>>    Cleanup    : nss-tools-3.14.1-3.fc19.x86_64                           228/262
>>>>    Cleanup    : nss-sysinit-3.14.1-3.fc19.x86_64                         229/262
>>>>    Cleanup    : nss-3.14.1-3.fc19.x86_64                                 230/262
>>>> (and here it hangs, for at least 20 minutes)
>>>
>>> So how odd is this?  Suddenly it leaps back into life, after maybe
>>> 30-40 minutes.
>>
>> Sounds like https://bugzilla.redhat.com/show_bug.cgi?id=860500
>
> Yes, this looks similar.
>
> It's possible that I ran a non-root yum command in another terminal.

A non-root yum/rpm/similar command wouldn't do it. Only processes running
as root can participate in the shared-environment locking (those __db.*
files); others use a "private environment", which pretty much amounts to
no locking at all.

So whatever is causing the jam is running as root, and equally only a
root process can unjam it. It could even be the same thing that caused
the jam re-running. It's quite clearly something that runs automatically
in the background, more or less periodically, and occasionally exits or
crashes without freeing the rpmdb iterator it holds. Whether it's
time-based or triggered by some other "external" event, I dunno. And
when it causes a jam, it's either still running when yum is started, or
it has started after yum.
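
One way to look for a still-running culprit is to list the processes
that have files under the rpmdb directory open. The following is only a
rough sketch: the function name rpmdb_holders is made up for
illustration, /var/lib/rpm is assumed to be the rpmdb location, and it
relies on the Linux /proc/<pid>/fd symlinks. It has to run as root to see
other users' processes, and it cannot find a process that has already
crashed or exited.

```python
import os


def rpmdb_holders(proc_root="/proc", db_dir="/var/lib/rpm"):
    """Return {pid: [open paths]} for processes with files under db_dir open.

    Hypothetical helper for illustration; not part of rpm or yum.
    """
    holders = {}
    prefix = db_dir.rstrip("/") + "/"
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process went away, or we are not allowed to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    for pid, paths in sorted(rpmdb_holders().items()):
        print(pid, sorted(set(paths)))
```

Running this repeatedly while yum is stuck should show yum itself plus
whatever else has the database open at that moment.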

Rpm uses Berkeley DB's "Concurrent Data Store" model for its database.
This is a simple model which supposedly provides deadlock-free
operation without the caller having to bother with explicit locking, but
unfortunately this only works when all callers are well-behaved. Not
entirely unlike multitasking in Windows 3.x... All it takes is a single
buggy application forgetting to release its rpmdb iterators (or crashing
while holding them) to block a concurrent writer "forever". Stale locks
from no longer active processes are cleaned up automatically, but only on
rpmdb open, so a potentially long-running application like yum can get
stuck if the bad apple comes along after yum has started.
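
The failure mode can be illustrated with a toy multiple-reader,
single-writer lock in Python. This is not rpm or Berkeley DB code; the
class and method names are invented, and the timeout exists only so the
demonstration terminates (the real writer just keeps waiting).

```python
import threading


class ToyConcurrentStore:
    """Toy model: many readers may hold cursors, a writer needs them all gone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # open read cursors ("iterators")
        self._writing = False

    def open_read_cursor(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def close_read_cursor(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def try_write(self, timeout):
        """Return True if the write got in within `timeout` seconds."""
        with self._cond:
            free = self._cond.wait_for(
                lambda: self._readers == 0 and not self._writing, timeout)
            if not free:
                return False    # still blocked by somebody's read cursor
            self._writing = True
        # ... the actual database write would happen here ...
        with self._cond:
            self._writing = False
            self._cond.notify_all()
        return True


store = ToyConcurrentStore()
store.open_read_cursor()                    # buggy caller never closes this
blocked = not store.try_write(timeout=0.1)  # the writer (think yum) is stuck
store.close_read_cursor()                   # something finally releases it
unblocked = store.try_write(timeout=0.1)    # and the writer proceeds
print(blocked, unblocked)
```

The writer cannot do anything about the leaked cursor itself; it can only
wait until the holder releases it or the lock is cleaned up as stale.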

Come to think of it, it should be possible to have rpm check for stale
locks when opening write cursors. That would at least help in some of the
cases (where the bad caller has already exited or died), but it'd still
be "vulnerable" to a long-running process hanging on to iterators.
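
As a sketch of that idea only (not how rpm or Berkeley DB implement it),
a stale-lock check boils down to asking whether each lock holder's
process still exists. The lock table layout and the function names below
are assumptions made up for the example; os.kill(pid, 0) is the usual
POSIX way to probe for a process without actually signalling it.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0: error checking only, nothing sent
    except ProcessLookupError:
        return False             # no such process: its locks are stale
    except PermissionError:
        return True              # exists, but belongs to someone else
    return True


def drop_stale_locks(lock_table):
    """Split a hypothetical {pid: locks_held} table into (live, stale_pids)."""
    live = {pid: n for pid, n in lock_table.items() if pid_alive(pid)}
    stale = sorted(set(lock_table) - set(live))
    return live, stale
```

Calling drop_stale_locks() before each write-cursor open would clear locks
left behind by dead processes, but a holder that is alive and simply never
closes its iterator still passes pid_alive() and keeps blocking the writer.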

	- Panu -

