Hi everyone,

Reading all the ideas about solving the problems of upgrading a running system, I see more or less ad hoc fixes for individual issues, or even more or less reinventing the wheel. IMO none of those ideas will solve anything; they will only increase the total level of entropy. After that, sooner or later it will be necessary to add even more ad hoc workarounds, and so on ..

I've already mentioned that some of the solutions are close to reinventing the wheel.
Why? Because these problems were solved a long time ago. To be more precise, more than a decade ago.

I've been working with rpm (RPM Package Manager) for more than two decades (try executing "rpm -qa --changelog | grep kloczek" and you will find some of my earliest activity still present in any RH based distribution). For 3 years I maintained PLD, which at its peak was an rpm based distribution with more than 5k src.rpm packages.
Initially rpm was a huge step forward because it formalized many install, upgrade, uninstall, verification, build and test problems under a single hood.
In particular, many things related to building packages were solved very well. So well that even today only small improvements need to be made from time to time.

From the beginning of rpm (from the time when it was 100% implemented in perl), compared to SysV packages (used on Solaris and the BSDs) and deb (more or less just new scaffolding on top of the original SysV packaging tool ideas), up to now, the problem of consistent upgrades has never been solved completely. Why? Because the main assumption, that an upgrade is done on the image/resources of a working system, is broken by design. As long as the upgrade process deletes files that are still used by running processes, or files that those processes will reopen, there is always a relatively high chance that those processes will not be able to use their resources normally, or will try to use resources from the wrong version.
Whatever can be done in the package manager to avoid those icebergs is not enough and will never solve those two fundamental uncertain scenarios.
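
The "wrong version" scenario can be reproduced in a few lines of shell. This is only a minimal sketch, assuming the package manager replaces a file by renaming a new copy over the old one; the file name libfoo is made up and file descriptor 3 stands in for a long running process.

```shell
#!/bin/sh
# Sketch: one "process" keeps an old file open while an "upgrade"
# replaces it. The file name libfoo is hypothetical.
demo_mixed_versions() {
    dir=$(mktemp -d) || return 1
    echo "version 1" > "$dir/libfoo"        # file shipped by the old package
    exec 3< "$dir/libfoo"                   # a running process keeps it open

    echo "version 2" > "$dir/libfoo.new"    # the upgrade unpacks the new file...
    mv "$dir/libfoo.new" "$dir/libfoo"      # ...and renames it over the old one

    held=$(cat <&3)                         # what the running process still sees
    fresh=$(cat "$dir/libfoo")              # what a newly started process sees
    exec 3<&-                               # close the old handle
    rm -rf "$dir"
    echo "open handle: $held / fresh open: $fresh"
}

demo_mixed_versions
# prints: open handle: version 1 / fresh open: version 2
```

Both versions are in use on the same system at the same time, and nothing in the package manager can see that.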

So it is probably more or less obvious now why the upgrade dilemmas cannot be solved with rpm as it exists.
It seems that this is yet another iteration of clashes with rpm limitations, and the only question is how (and by whom) those problems have already been solved.
The answer is very simple: those problems were solved almost a decade ago on Solaris, with the introduction of two crucial technologies: ZFS (originally the Zettabyte File System) and IPS (Image Packaging System). These two pieces interact very closely in maintaining system resources and they cannot be used separately (yes .. atm only).

So how have ALL the upgrade problems been solved at the level of ideas?
Very simply: by assuming that a system upgrade will never (ever) be done on the resources of the working system.
Someone may scratch his head and ask "how is it possible to do an upgrade if the system resources are not touched?". The answer is that this idea cannot be implemented just by adding some functionality to the package management (PM) software. An operation like upgrade needs to be supported by the OS, and to be more precise by the FS layer.

So how has the problem of consistent upgrades been solved on Solaris using ZFS and IPS?
ZFS can create a snapshot of a volume (a RO resource) and create a clone (a RW resource) on top of that snapshot.
The whole upgrade process consists of a few steps:
- find the volumes which need to be snapshotted and cloned
- create the clones
- mount the clones as a separate tree and perform the upgrade there
  This part is crucial. If anything goes wrong during the upgrade, the still working system is not affected. It is possible to inspect the state of the broken upgrade and produce very precise diagnostic data, which allows fixing the upgrade process at the package level. In other words, the impact of the upgrade on the still working system is NULL/ZERO!!!
- when the upgrade process is finished, the grub boot loader configuration is updated to add the new root from which the updated system image will be booted.
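
The steps above can be sketched roughly as follows. This is a dry run which only prints the commands, not the actual Solaris implementation; the dataset names and the mount point are made up, and the boot loader step is left as a comment.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Dataset names and mount point are hypothetical examples.
plan_clone_upgrade() {
    src="$1"    # dataset holding the running root
    new="$2"    # dataset for the clone which will be upgraded
    mnt="$3"    # temporary mount point for the clone
    snap="${src}@pre-upgrade"

    echo "zfs snapshot ${snap}"                           # RO snapshot of the running root
    echo "zfs clone -o mountpoint=${mnt} ${snap} ${new}"  # RW clone on top of the snapshot
    echo "pkg -R ${mnt} update"                           # IPS upgrades only the clone
    # Last step (not shown): add a boot entry pointing at ${new}.
}

plan_clone_upgrade rpool/ROOT/current rpool/ROOT/upgraded /mnt/upgraded
```

Nothing in this sequence writes to the dataset the running system was booted from, which is the whole point.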

As I wrote, two technologies (together) are crucial here to solve 100% of the upgrade issues: ZFS and IPS. The 3rd, minor part is the boot loader. Originally Solaris 10 used grub, and grub2 on Solaris 11 only simplified the whole workflow.

So what is missing on Linux to implement this idea? To be honest .. not too much, which is good :)
Only a few small nuts and bolts are missing :)

On Linux, btrfs is available at the moment, and it provides RW snapshots (the equivalent of ZFS clones). All that needs to be added at this layer is a btrfs volume attribute indicating that the volume needs to be cloned during an upgrade, for the more complicated scenarios.
Why? Because automatic discovery may not be enough in cases like a major database upgrade, where part of the upgrade may be a format change which has to be applied to, for example, the database files used by some application. If the boot loader has two boot entries, one booting the original state from before the upgrade and one booting everything that was done by the upgrade, and if the data changed by the post PM upgrade operations on top of the upgraded software is cloned as well, then in case of any trouble at this stage the whole rollback/downgrade procedure consists only of a reboot and choosing another BE (Boot Environment).
All BE management on Solaris is done with one command, beadm. This command is also used for cloning existing OS resources manually. The BE idea is connected to two other small bits: the running BE and the active BE. The running BE is the BE which is in use now, and the active BE is the BE which will be used automatically if the reboot command is issued without specifying the BE from which the system should boot after shutdown.
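
As an illustration, a rollback with beadm could look like the sketch below. It is a dry run which only prints the commands; the BE name is made up, and "init 6" is just one way of rebooting on Solaris.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# The BE name passed in is a hypothetical example.
plan_rollback() {
    old_be="$1"                       # BE holding the state from before the upgrade
    echo "beadm list"                 # shows which BE is running and which is active
    echo "beadm activate ${old_be}"   # make the old BE the active one
    echo "init 6"                     # reboot; the active BE is booted by default
}

plan_rollback before-upgrade
```

Until the reboot, the running BE stays what it was, and only the active BE has changed.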

Another small bit which needs to be sorted out is related to the install procedures implemented in anaconda and the post installation procedures in the kernel package.
What is missing here? anaconda currently does not allow /boot on btrfs. It forces the use of ext3/4.
A few weeks ago the disk in my laptop started failing, so I attached a new disk in place of the CD drive. Initially I started replicating the whole partition layout as created by the anaconda installer: one partition for swap, a second one for an ext3/4 /boot, and / on btrfs.
When that was done and I had started the "btrfs send | btrfs receive" commands, I found out that the ext modules were loaded in kernel space and that they were used only by /boot. So I stopped everything and changed the layout to have only a swap partition (without LVM) and a btrfs root pool.
After copying all resources and generating a proper boot loader on the new disk everything still works, so there is no technical reason now to keep /boot separate!!!
The only obstacle is that the post installation procedure implemented in the kernel package does not like btrfs on /boot and does not update the grub boot entries, so after one or two kernel upgrades from rawhide I found that my grub menu was getting shorter and shorter :)
All that needs to be done to fix this issue is to execute "grub2-mkconfig -o /boot/grub2/grub.cfg"
I'm pretty sure that the above will not break booting from other FSes :)

Coming to the end of this long email ..

All that needs to be done to solve all upgrade issues on Fedora in some minimalistic scenario is:
- add BE management to dnf
- switch to btrfs as the default FS
- adapt the kernel post installation procedure
On top of the above, a few other small bits can be added to make the whole of BE management consistent.
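
To make the minimalistic scenario more concrete, here is a dry-run sketch of what a BE aware upgrade on btrfs could look like. It only prints the commands. The mount point and subvolume names are made up, nothing like this exists in dnf today, and generating a boot entry which selects the new subvolume (for example via rootflags=subvol=...) is exactly the part which is still missing.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Mount point and subvolume names are hypothetical examples.
plan_btrfs_be_upgrade() {
    top="$1"    # where the top-level btrfs volume is mounted
    cur="$2"    # subvolume holding the running root
    new="$3"    # name for the RW snapshot which will be upgraded

    echo "btrfs subvolume snapshot ${top}/${cur} ${top}/${new}"  # RW snapshot = new BE
    echo "dnf --installroot=${top}/${new} upgrade"               # upgrade only the snapshot
    echo "grub2-mkconfig -o /boot/grub2/grub.cfg"                # regenerate the boot menu
}

plan_btrfs_be_upgrade /mnt/top root root-upgraded
```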

Anyone who chooses an FS other than btrfs will need to accept that they will be more or less dealing with the limitations of the classic, more than 30 years old ideas of non-snapshottable/non-cloneable volumes and the limitations of the old SysV package ideas.

rpm needs to die sooner or later as well, and probably the best option would be to adopt IPS :)
IPS is fully OSS (https://java.net/projects/ips/sources/pkg-gate/show/) and many people have so far been thinking about porting it to Linux. However, the lack of a stable enough btrfs was the main obstacle. As btrfs is now quite stable, IMO it is time to start thinking about moving away from rpm as well. However, no rush :)
As I said, and I think I've proved above, IPS is not essential right now. Maybe another time I'll try to write a longer comment on why rpm is already dead :)

kloczek
-- 
Tomasz Kłoczko | LinkedIn: http://lnkd.in/FXPWxH