On Fri, Oct 28, 2022 at 13:49:21 +0100, Richard W.M. Jones wrote:
On Fri, Oct 28, 2022 at 01:45:32PM +0100, Tom Hughes wrote:
> On 28/10/2022 13:31, Richard W.M. Jones wrote:
> >On Fri, Oct 28, 2022 at 12:29:30PM +0100, Tom Hughes wrote:
> >>The reason it hadn't completed is that rpmdb-migrate.service
> >>was enabled on that machine.
> >[was not, I guess?]
>
> Yes ;-)
>
> >>Enabling (and starting) that service made it complete.
> >
> >Interesting.  The current state of that service is:
> >
> >○ rpmdb-migrate.service - RPM database migration to /usr
> >      Loaded: loaded (/usr/lib/systemd/system/rpmdb-migrate.service; disabled; vendor preset: enabled)
> >      Active: inactive (dead)
> >
> >There are no log entries for this service, but my logs only go back to
> >around April which is probably too late to see anything.
> >
> >After starting the service:
> >
> >Oct 28 13:29:33 systemd[1]: Starting rpmdb-migrate.service - RPM database migration to /usr...
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed '/var/lib/rpm/.migratedb'
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed '/var/lib/rpm/rpmdb.sqlite-shm'
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed '/var/lib/rpm/rpmdb.sqlite'
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed '/var/lib/rpm/rpmdb.sqlite-wal'
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed '/var/lib/rpm/.rpm.lock'
> >Oct 28 13:29:35 rpmdb_migrate[1722092]: removed directory '/var/lib/rpm'
> >Oct 28 13:29:35 rpmdb_migrate[1722468]: '/var/lib/rpm' -> '../../usr/lib/sysimage/rpm'
> >Oct 28 13:29:35 systemd[1]: rpmdb-migrate.service: Deactivated successfully.
> >Oct 28 13:29:35 systemd[1]: Finished rpmdb-migrate.service - RPM database migration to /usr.
> >Oct 28 13:29:35 systemd[1]: rpmdb-migrate.service: Consumed 2.433s CPU time.
> >
> >... and the migration has been successful.  So at least we know how to
> >fix this.  Although it's quite curious that the service was installed
> >and supposed to run but didn't.
>
> The idea is that the service is enabled (not sure off hand what is
> supposed to do that) but not started, so that it runs when you reboot
> and completes the migration as part of the boot.
>
> When it runs it removes /var/lib/rpm/.migratedb which was created
> by the rpm scripts and hence prevents the service running on future
> boots as it's no longer needed.

It's hard to tell what happened without logs going back all the way,
but the machine was rebooted 65 days ago and several times before
that.  Within the available logs there is no mention of the service so
it may already have tried to run on a boot prior to the logs and
failed for some reason.

We do have reports of this happening from 3 independent people now so
it's not a one-off.

I wonder if folks rebooted or shut down the system after upgrading, and after /var/lib/rpm/.migratedb is removed but before the migration actually finished.  The example above shows the file being removed before the cleanup completed.  That run only took two seconds, so it seems to be an unlikely cause, but is that typically how long the migration takes?

  -- Steve