Given the plans to make btrfs the default, I'll share some of my own recent experiences; hopefully this can make it easier for the next person.
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem.
E.g. x86-64 kernels have a 4k default page size, but powerpc64le kernels have been compiled with the optional 64k page size. This impacts various distributions.
If somebody creates some filesystems with the 4k parameter and then migrates them to the powerpc64le host, they won't mount.
If they try to go the other way, the filesystems won't mount either.
There are other non-btrfs issues related to the 64k page size; for example, the nouveau driver won't work either.
To make things easier for btrfs, could it be worthwhile changing the default page size from 64k back to 4k on default kernels for most ordinary users?
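To make the incompatibility concrete, here is a minimal Python sketch that reads the sectorsize recorded in a btrfs superblock and compares it with the running kernel's page size. It assumes the on-disk layout (primary superblock 64 KiB into the device, magic `_BHRfS_M` at offset 0x40, little-endian u32 `sectorsize` at offset 0x90) and models only the rule described in this thread for btrfs without sub-page support, namely that sectorsize must equal PAGE_SIZE. The helper names are hypothetical, not part of any btrfs tool.

```python
import os
import struct

# Assumed on-disk layout (verify against the btrfs documentation):
BTRFS_SUPER_OFFSET = 0x10000  # primary superblock lives 64 KiB into the device
BTRFS_SUPER_SIZE = 4096       # the superblock itself is 4 KiB
BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40           # magic string within the superblock
SECTORSIZE_OFFSET = 0x90      # little-endian u32 sectorsize


def read_sectorsize(superblock: bytes) -> int:
    """Return the sectorsize stored in a raw btrfs superblock."""
    if superblock[MAGIC_OFFSET:MAGIC_OFFSET + len(BTRFS_MAGIC)] != BTRFS_MAGIC:
        raise ValueError("no btrfs magic found")
    (sectorsize,) = struct.unpack_from("<I", superblock, SECTORSIZE_OFFSET)
    return sectorsize


def mountable_without_subpage(sectorsize: int, page_size: int) -> bool:
    """Rule for btrfs without sub-page support: sectorsize must equal PAGE_SIZE."""
    return sectorsize == page_size


def check_device(path: str) -> bool:
    """Hypothetical helper: could this kernel mount the btrfs fs at `path`?"""
    with open(path, "rb") as dev:
        dev.seek(BTRFS_SUPER_OFFSET)
        sectorsize = read_sectorsize(dev.read(BTRFS_SUPER_SIZE))
    page_size = os.sysconf("SC_PAGE_SIZE")  # page size of the running kernel
    print(f"sectorsize={sectorsize} page_size={page_size}")
    return mountable_without_subpage(sectorsize, page_size)
```

Running `check_device("/dev/sdX")` as root against an unmounted device on an x86_64 host and on a 64k-page ppc64le host would be expected to disagree for the same filesystem. `btrfs inspect-internal dump-super` from btrfs-progs prints the same sectorsize field and is the authoritative way to inspect it.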
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
Kevin Kofler
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler kevin.kofler@chello.at wrote:
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Coming back to Btrfs specifically, there is work underway upstream to resolve this issue. My (semi-blind) estimate is that we'll see a fix in Linux 5.11, but Josef (cc'd to this email) may know more about it.
-- 真実はいつも一つ!/ Always, there's only one truth!
On 9/15/20 7:29 PM, Neal Gompa wrote:
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler kevin.kofler@chello.at wrote:
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
I'm not sure that it would have been deemed any more important than other concerns which were raised at the time, TBH.
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
That's simply not accurate. Handling 32/64 bit interfaces, endianness, etc., are long-solved problems. Longstanding lack of design or support for sub-page block support in a filesystem is not /at all/ the same thing.
Are there occasional endianness bugs, pointer size bugs, etc? Sure. But that's different from "We did not design this."
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
It's something that is generally available in nearly every other widely used Linux filesystem. It's not accurate to suggest that this is some unexpected side effect of page size choice, or that 64k pages were somehow a "mistake" now that this btrfs compatibility issue has been made more obvious.
btw, Fedora has shipped kernels with 64k pages for almost a decade:
commit 737c9c7da818f1da0bdf3f6a0dda5c38a3cba769 Author: Josh Boyer jwboyer@redhat.com Date: Fri Sep 9 11:21:22 2011 -0400
Change to 64K page size for ppc64 kernels (rhbz 736751)
-Eric
On Wed, Sep 16, 2020 at 10:32 AM Eric Sandeen esandeen@redhat.com wrote:
On 9/15/20 7:29 PM, Neal Gompa wrote:
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler kevin.kofler@chello.at wrote:
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
I'm not sure that it would have been deemed any more important than other concerns which were raised at the time, TBH.
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
That's simply not accurate. Handling 32/64 bit interfaces, endianness, etc are long-solved problems. Longstanding lack of design or support for sub-page block support in a filesystem is not /at all/ the same thing.
Are there occasional endianness bugs, pointer size bugs, etc? Sure. But that's different from "We did not design this."
Almost no filesystem was originally designed for mixing page sizes, endianness, etc. These issues *have* been fixed over time, for sure. But it is not worth it for me or anyone else to get into a blame game. Is it unfortunate that Btrfs didn't have that? Sure. Did I know this was a problem? No, because I have no access to POWER systems, like almost everyone else here. And ARM, the other architecture we have, does not use 64K page sizes in Fedora (though it does in RHEL, and that is pretty much considered a mistake there, as it didn't take off, caused interop and performance issues, and added complexity where it was unneeded).
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
It's something that is generally available in nearly every other widely used Linux filesystem. It's not accurate to suggest that this is some unexpected side effect of page size choice, or that 64k pages were somehow a "mistake" now that this btrfs compatibility issue has been made more obvious.
btw, Fedora has shipped kernels with 64k pages for almost a decade:
commit 737c9c7da818f1da0bdf3f6a0dda5c38a3cba769 Author: Josh Boyer jwboyer@redhat.com Date: Fri Sep 9 11:21:22 2011 -0400
Change to 64K page size for ppc64 kernels (rhbz 736751)
I am aware that we shipped them for a long time. They are a mistake for many other reasons unrelated to Btrfs. Regardless, the choice was made and things have been fixed over time for it. There is already a patch set being reviewed[1] for the first stage of mixed page support.
[1]: https://lore.kernel.org/linux-btrfs/12ecf2f9-c262-8b00-2165-486684ba2fef@sus...
On Wed, Sep 16, 2020 at 10:44 AM Neal Gompa ngompa13@gmail.com wrote:
On Wed, Sep 16, 2020 at 10:32 AM Eric Sandeen esandeen@redhat.com wrote:
On 9/15/20 7:29 PM, Neal Gompa wrote:
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler kevin.kofler@chello.at wrote:
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
I'm not sure that it would have been deemed any more important than other concerns which were raised at the time, TBH.
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
That's simply not accurate. Handling 32/64 bit interfaces, endianness, etc are long-solved problems. Longstanding lack of design or support for sub-page block support in a filesystem is not /at all/ the same thing.
Are there occasional endianness bugs, pointer size bugs, etc? Sure. But that's different from "We did not design this."
Almost every filesystem was not originally designed for mixing page sizes, endianness, etc. These issues *have* been fixed over time, for sure. But it is not worth it for me or anyone else to go into a blame game. Is it unfortunate that Btrfs didn't have that? Sure. Did I know this was a problem? No, because I have no access to POWER systems, like almost everyone else here. And ARM, the other architecture we have, does not use 64K page sizes in Fedora (though it does in RHEL, and that is pretty much considered a mistake there, as it didn't take off, caused interop and performance issues, and added complexity where it was unneeded).
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
It's something that is generally available in nearly every other widely used Linux filesystem. It's not accurate to suggest that this is some unexpected side effect of page size choice, or that 64k pages were somehow a "mistake" now that this btrfs compatibility issue has been made more obvious.
btw, Fedora has shipped kernels with 64k pages for almost a decade:
commit 737c9c7da818f1da0bdf3f6a0dda5c38a3cba769 Author: Josh Boyer jwboyer@redhat.com Date: Fri Sep 9 11:21:22 2011 -0400
Change to 64K page size for ppc64 kernels (rhbz 736751)
I am aware that we shipped them for a long time. They are a mistake for many other reasons unrelated to Btrfs. Regardless, the choice was made and things have been fixed over time for it. There is already a patch set being reviewed[1] for the first stage of mixed page support.
Apropos of nothing else in this thread, I love that I can continue my trend of "everything bad in Fedora comes from Josh" :)
64k pages have significant performance advantages on large memory machines and with specific workloads. Is that worth the hassle and complexity of using a page size that doesn't match the de facto standard? For the people that run those workloads, yes. For Fedora, probably not. At the time of that decision ppc64 was a secondary architecture with a lot of participation from IBM, which is naturally focused more on server class workloads (and the bug is clear that we didn't switch ppc32 because it makes no sense for that class of hardware).
On the ppc64le architecture there has been significant benefit to keeping the page size the same between Fedora and RHEL. Up until relatively recently it was almost exclusively server class hardware still. Some of the more recent offerings from IBM are smaller configs, and the Talos machines are squarely aimed at developer workstations. If you'd like to revisit the page size for ppc64le in Fedora, start a discussion with the kernel team.
(I had no input or direction on aarch64 in RHEL. My opinion there is that 64k pages were premature, predicated on similar benefits for server class hardware that for all practical purposes hasn't materialized. The market just isn't there yet.)
josh
On Wed, Sep 16, 2020 at 09:31:50AM -0500, Eric Sandeen wrote:
On 9/15/20 7:29 PM, Neal Gompa wrote:
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler kevin.kofler@chello.at wrote:
Daniel Pocock wrote:
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
Ewww! That alone should disqualify btrfs as a default file system!
Why does a file system depend on the kernel page size? The kernel page size is an internal implementation detail of the kernel, whereas a file system ought to be a stable interchange format that is compatible across all machines.
It is unfortunate that this showstopper was not mentioned when the switch to btrfs by default was proposed.
I'm not sure that it would have been deemed any more important than other concerns which were raised at the time, TBH.
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
That's simply not accurate. Handling 32/64 bit interfaces, endianness, etc are long-solved problems. Longstanding lack of design or support for sub-page block support in a filesystem is not /at all/ the same thing.
Are there occasional endianness bugs, pointer size bugs, etc? Sure. But that's different from "We did not design this."
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
These kinds of problems are not really that rare across different filesystems.
Try creating an XFS fs on a system with 64k PAGE_SIZE and a blocksize of 64k, then try mounting that fs on an x86_64 machine. It won't work: https://elixir.bootlin.com/linux/v5.8/source/fs/xfs/xfs_mount.c#L165 And IIRC xfs is the default for RHEL, no?
On 9/16/20 10:22 AM, Benjamin Block wrote:
On Wed, Sep 16, 2020 at 09:31:50AM -0500, Eric Sandeen wrote:
...
Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
These kinds of problems are not really that rare across different Filesystems.
Try creating a XFS fs on a system with 64k PAGE_SIZE and a blocksize of 64k, then try mounting that fs on a x86_64 machine. It won't work: https://elixir.bootlin.com/linux/v5.8/source/fs/xfs/xfs_mount.c#L165 And IIRC xfs is the default for RHEL, no?
It is. mkfs.xfs defaults to 4k blocks, so XFS filesystems are, by default, compatible across all supported architectures. RHEL would not choose a default fs with this sort of incompatibility across arches.
Block size > page size is a different problem from the one described in this thread.
If you /manually/ create a large block size fs, overriding the defaults, then yes you will have a compatibility problem on smaller page systems.
That's not the same as "you cannot create any btrfs filesystem that is usable on both 4k and 64k page systems"
-Eric
Eric Sandeen wrote:
Block > page size is a different problem vs what is described in this thread.
Well, the thread is about block size ≠ page size, of which that is one of the two cases to handle.
Though of course, if (as is the case for xfs), mkfs does not produce large block sizes by default, missing support for that case is less of a problem than if it does.
Kevin Kofler
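Kevin's point is that "block size ≠ page size" splits into two distinct cases. A small Python model of the behaviour described in this thread (not a kernel API) makes the distinction explicit: per the discussion and the v5.8 source linked above, XFS accepts sub-page blocks but rejects blocks larger than a page, while btrfs without the sub-page work accepts only the equal case. The enum, table, and function names are hypothetical.

```python
from enum import Enum


class Fit(Enum):
    EQUAL = "block size == page size"
    SUBPAGE = "block size < page size"
    LARGE_BLOCK = "block size > page size"


def classify(block_size: int, page_size: int) -> Fit:
    """Place a (block size, page size) pair into one of the three cases."""
    if block_size == page_size:
        return Fit.EQUAL
    return Fit.SUBPAGE if block_size < page_size else Fit.LARGE_BLOCK


# Which cases each filesystem handled, as described in this thread (circa Linux 5.8).
SUPPORTED = {
    "xfs": {Fit.EQUAL, Fit.SUBPAGE},
    "btrfs-pre-subpage": {Fit.EQUAL},
}


def can_mount(fs: str, block_size: int, page_size: int) -> bool:
    """True if `fs` handles this block/page combination in the model above."""
    return classify(block_size, page_size) in SUPPORTED[fs]
```

For example, a default 4k XFS filesystem moved to a 64k-page ppc64le kernel falls into the sub-page case and mounts, whereas a 4k btrfs filesystem in the same situation does not; a manually created 64k-block XFS filesystem moved to x86_64 falls into the large-block case that neither handles.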
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler <kevin.kofler(a)chello.at> wrote:
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Coming back to Btrfs specifically, there is work underway upstream to resolve this issue. My (semi-blind) estimate is that we'll see a fix in Linux 5.11, but Josef (cc'd to this email) may know more about it.
Frankly I'm disappointed that the response is to deflect criticism of btrfs by claiming that this is an expected issue with the kernel, and then placing the blame on Red Hat for using a larger page size. To my knowledge page size differences aren't an issue on ext4 or xfs as they default to using a 4k block size, so saying that "it's in basically everything in the kernel" is at best inaccurate and at worst intentionally misleading.
On Wed, Sep 16, 2020 at 2:05 PM Tom Seewald tseewald@gmail.com wrote:
On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler <kevin.kofler(a)chello.at> wrote:
I hate to break it to you, but this problem is not just in filesystems, it's in basically everything in the kernel. And we've had variations of problems like this for years (endianness, page size, pointer size, single bit vs multi-bit booleans, etc.). I've personally been bitten by all of these issues in some way. This comes from the fact that there's no such thing as "internal implementation detail of the kernel" by design. This is the "joy" of the monorepo "design" where everything leaks into everything else.
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
Coming back to Btrfs specifically, there is work underway upstream to resolve this issue. My (semi-blind) estimate is that we'll see a fix in Linux 5.11, but Josef (cc'd to this email) may know more about it.
Frankly I'm disappointed that the response is to deflect criticism of btrfs by claiming that this is an expected issue with the kernel, and then placing the blame on Red Hat for using a larger page size. To my knowledge page size differences aren't an issue on ext4 or xfs as they default to using a 4kb block size, so saying that "it's in basically everything in the kernel" is at best inaccurate and at worst intentionally misleading.
I *expect* issues in general for stuff like this to come up when it comes to knobs like this being turned. That does not excuse the fact this issue exists. And it is true that all kinds of things are impacted by those kinds of changes.
That said, there *is* work going on to resolve it *now*. I have even asked upstream to consider just forcing 4K sizes going forward since that is easy enough for everything to handle.
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
QEMU can emulate a virtual machine for any of the Fedora arches. It won't be fast like native, but that shouldn't be a fundamental blocker for a bit of ad hoc debugging / testing of something that isn't performance critical.
Regards, Daniel
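Following Daniel's suggestion, here is a minimal Python sketch that assembles a QEMU command line for a fully emulated ppc64le guest on an x86_64 host. Because the guest runs Fedora's own ppc64le kernel, btrfs inside it would see that kernel's 64k page size. The image and ISO names are placeholders, and the option set (`-machine pseries`, `-m`, `-nographic`, `-drive`, `-cdrom`) is an assumption about a workable setup built from standard QEMU flags, not a tested recipe.

```python
from typing import List, Optional


def qemu_ppc64le_argv(disk_image: str, memory: str = "4G",
                      install_iso: Optional[str] = None) -> List[str]:
    """Build argv for an emulated (TCG) POWER guest; nothing is executed here."""
    argv = [
        "qemu-system-ppc64",      # runs both big- and little-endian ppc64 guests
        "-machine", "pseries",    # paravirtualised POWER platform
        "-m", memory,
        "-nographic",             # use the serial console in the terminal
        "-drive", f"file={disk_image},if=virtio,format=qcow2",
    ]
    if install_iso is not None:
        argv += ["-cdrom", install_iso]  # boot media for a fresh install
    return argv
```

The list can be passed to `subprocess.run(argv, check=True)`, or turned into a pasteable shell line with `shlex.join(argv)`. Inside the guest, `getconf PAGESIZE` should report 65536 on a 64k-page Fedora ppc64le kernel, which is the configuration needed to reproduce the btrfs mount failure.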
On Wed, Sep 16, 2020 at 2:15 PM Daniel P. Berrangé berrange@redhat.com wrote:
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
QEMU can emulate a virtual machine for any of the Fedora arches. It won't be fast like native, but that shouldn't be a fundamental blocker for a bit of adhoc debugging / testing of something that isn't performance critical.
Right, though on my Fedora 32 machine I can't create btrfs volumes in QEMU. That's fixed in F33, but my machine that ran F33 kinda died last week. :(
I guess I'll upgrade one of the machines hanging around...
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
We do: https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainer.... There's one ppc64le on that list.
Zbyszek
On Fri, Sep 18, 2020 at 10:07 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
We do: https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainer.... There's one ppc64le on that list.
They were all down, last I checked a month ago. But it seems like the ppc64le one is back online now. :)
On Fri, Sep 18, 2020 at 10:08:46AM -0400, Neal Gompa wrote:
On Fri, Sep 18, 2020 at 10:07 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
We do: https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainer.... There's one ppc64le on that list.
They were all down, last I checked a month ago. But it seems like the ppc64le one is back online now. :)
The aarch64/armv7 ones are still down, but we got a vm elsewhere for ppc64le. The x86_64 ones should all be up and working.
kevin
Hi, Kevin.
On Friday, 18 September 2020 at 19:46, Kevin Fenzi wrote:
On Fri, Sep 18, 2020 at 10:08:46AM -0400, Neal Gompa wrote:
On Fri, Sep 18, 2020 at 10:07 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
We do: https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainer.... There's one ppc64le on that list.
They were all down, last I checked a month ago. But it seems like the ppc64le one is back online now. :)
The aarch64/armv7 ones are still down, but we got a vm elsewhere for ppc64le. The x86_64 ones should all be up and working.
I'd really appreciate having an ARM machine available. I've had a few bugs that occur only on ARM. Is there any ETA for their availability?
Regards, Dominik
On Sat, 19 Sep 2020 at 16:14, Dominik 'Rathann' Mierzejewski < dominik@greysector.net> wrote:
Hi, Kevin.
On Friday, 18 September 2020 at 19:46, Kevin Fenzi wrote:
On Fri, Sep 18, 2020 at 10:08:46AM -0400, Neal Gompa wrote:
On Fri, Sep 18, 2020 at 10:07 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Wed, Sep 16, 2020 at 02:09:42PM -0400, Neal Gompa wrote:
I'm annoyed in general that we still have problems like this, and I'm even more annoyed that I basically have no way to even test or deal with these things. We *still* do not have packager test machines, so I can't even figure out how to craft a workaround if there is one (and I suspect one is possible).
We do: https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainer.... There's one ppc64le on that list.
They were all down, last I checked a month ago. But it seems like the ppc64le one is back online now. :)
The aarch64/armv7 ones are still down, but we got a vm elsewhere for ppc64le. The x86_64 ones should all be up and working.
I'd really appreciate having an ARM machine available. I've had a few bugs that occur only on ARM. Is there any ETA for their availability?
Not at the moment. I need to schedule multiple accesses to the data centre due to COVID-19 rules of 4-hour work windows and other controls. Once that is done, I will then need to spend several days getting each one of these working:
- a bastion server
- a dns/dhcp server on the local network
- get the mgmt consoles working
- get the serial server working
- find out what each arm box thinks it is doing
- get them installed with something newer than F30
- get their ips working so main infra can get them working
- get main infra to Ansible the systems.
Several of these can be done 'remotely', but a lot of them require me to go in and sit at a keyboard/video/mouse on a box, then work with networking and other groups to iron out issues. It is also a 'smooge only' thing on a lot of them because I am the only one in the near vicinity of these boxes.
I am currently on extended vacation because I had no time off since January while working on the move from PHX2 to IAD2. However, when that is done, and I have a task list of what is considered higher priority, we can get this project done.
Neal Gompa ngompa13@gmail.com wrote:
This didn't become a serious problem until Red Hat made the unfortunate (though not realized at the time) mistake of switching to 64k pages for ARM and POWER. We got that change in Fedora for POWER but not ARM. It has led to all kinds of unfortunate problems that are gradually being worked on and fixed upstream.
There have been arches in the kernel that didn't support a 4K page size. FRV, for example, only supported 16K pages. I don't know if any of the "non-regular" arches still in the kernel have similar constraints.
David
On 9/14/20 3:31 AM, Daniel Pocock wrote:
Given the plans to make btrfs the default, I'll share some of my own recent experiences, hopefully this can make it easier for the next person
One issue I've come across is that a btrfs filesystem can only be used on hosts with the same page size as the host that created the filesystem
E.g. x86-64 kernels have a 4k default page size but powerpc64le kernels have been compiled with the optional 64k page size. This impacts various distributions.
If somebody creates some filesystems with the 4k parameter and then they migrate them to the powerpc64le host, they won't mount
If they try to go the other way, the filesystems won't mount either
There are other non-btrfs issues related to the 64k page size, for example, nouveau driver won't work either
To make things easier for btrfs, could it be worthwhile changing the default page size from 64k back to 4k on default kernels for most ordinary users?
Yeah, sub-page blocksize support isn't something that we've prioritized. When btrfs was originally written the only option for that was to use buffer heads, which removed a lot of the flexibility we needed to support things like multi-device file systems. At the time we tied the fs blocksize to the page size, because it was unlikely that a user would mkfs a fs on one arch and move it over to another arch.
There is ongoing work from SUSE to bring this support in; a patchset was posted last week to add read-only support for sub-page blocksizes. Write support will be harder but is coming along. However, these are obviously not going to be ready for the F33 timeline, and probably not for F34 either. Thanks,
Josef
On Wed, Sep 16, 2020 at 03:04:45PM -0400, Josef Bacik wrote:
At the time we tied the fs blocksize to the page size, because it was unlikely that a user would mkfs a fs on one arch and move it over to another arch.
But one doesn't need "another arch" for page size to change; many architectures (arm, mips, powerpc, sparc, to name a few) support multiple page sizes.
On 9/16/20 3:18 PM, Eugene Syromiatnikov wrote:
But one doesn't need "another arch" for page size to change; many architectures (arm, mips, powerpc, sparc, to name a few) support multiple page sizes.
Sure, but again you are not likely to change page size for an existing system. The decision early on was to forgo this particular ability for simplicity, and then we would revisit the decision later on. It's been a while and there's still not been enough demand to justify the work until recently. Thanks,
Josef
On 16/09/2020 21:29, Josef Bacik wrote:
Sure, but again you are not likely to change page size for an existing system. The decision early on was to forgo this particular ability for simplicity, and then we would revisit the decision later on. It's been a while and there's still not been enough demand to justify the work until recently. Thanks,
This is messy but important.
Is it possible for Fedora to offer two flavours of the kernel package, like Debian? There, I created a -4k flavour, so it builds two kernel packages, one with 4k pages and the other with 64k pages. They can both be installed on the same machine, and one or the other can be selected in the grub menu on each boot. Either can mount an ext4 root, but obviously they can't share the same btrfs root; only the one that created it can mount that root.
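The compatibility rule in that last sentence can be written down as a small model. This is a simplification under stated assumptions: btrfs requires the filesystem sectorsize to equal the kernel page size (as described earlier in the thread), while ext4 is assumed to accept any block size up to the page size. The flavour names used in the usage example are hypothetical.

```python
from typing import Dict, List


def can_mount(fs_type: str, fs_block_size: int, page_size: int) -> bool:
    """Simplified model of the page-size constraint described in this thread."""
    if fs_type == "btrfs":
        # btrfs at the time: sectorsize must equal the kernel page size.
        return fs_block_size == page_size
    if fs_type == "ext4":
        # ext4 block size may be smaller than, or equal to, the page size.
        return fs_block_size <= page_size
    raise ValueError(f"no model for {fs_type!r}")


def bootable_flavours(flavours: Dict[str, int], fs_type: str,
                      fs_block_size: int) -> List[str]:
    """Names of kernel flavours (name -> page size) that could mount this root."""
    return sorted(name for name, page in flavours.items()
                  if can_mount(fs_type, fs_block_size, page))
```

With hypothetical flavours `{"kernel": 65536, "kernel-4k": 4096}`, a 4k ext4 root is bootable by both, while a 4k btrfs root is bootable only by `kernel-4k`.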
Once an alternative kernel is available, people need an installer/rescue ISO that includes it. This may mean making both permutations available as separate installer ISOs, or including two kernels in the same ppc64el ISO.
Installer logic: if somebody is using ANY non-4k page size, on any architecture, it would be useful to display a pop-up window with a warning about btrfs before they create their root filesystem. This will save people a lot of trouble. They might not realize there is a problem until they've been using the system for a few days, and then they have to reinstall it.
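A minimal sketch of the proposed installer check follows, assuming the installer knows the chosen root filesystem type and the page size of the kernel it is running. The function and message are hypothetical and not part of any existing installer.

```python
from typing import Optional

BASELINE_PAGE_SIZE = 4096  # the page size most other machines use


def btrfs_root_warning(fs_type: str, page_size: int) -> Optional[str]:
    """Hypothetical installer check run before creating the root filesystem.

    Returns a warning to show the user, or None when no warning is needed."""
    if fs_type != "btrfs" or page_size == BASELINE_PAGE_SIZE:
        return None
    kib = page_size // 1024
    return (f"This kernel uses {kib}k pages. A btrfs filesystem created now can "
            f"only be mounted by kernels that also use {kib}k pages, so it will "
            "not mount on a typical 4k-page system. Consider ext4 instead.")
```

An installer could call it as `btrfs_root_warning(chosen_fs, os.sysconf("SC_PAGE_SIZE"))` and show the pop-up whenever the result is not None.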
Finally, if both page sizes are available, it is desirable to build every package on every page size. Some packages appear to detect the page size at compile time and assume it will always be the same at runtime, which is unfortunate. Reproducible-builds techniques could perhaps be used to build each package on two different page sizes, detect whether the binaries differ and, if so, suggest checking for a hard-coded page size.
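The comparison step of that idea can be sketched as follows, assuming the build is otherwise reproducible, so that any remaining difference between a build made on a 4k-page builder and one made on a 64k-page builder points at page-size sensitivity. The helper names are hypothetical; real reproducible-builds tooling such as diffoscope goes much further than a hash comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Mapping, Optional


def load_tree(root: Path) -> Dict[str, bytes]:
    """Read every file under a build output directory, keyed by relative path."""
    return {str(p.relative_to(root)): p.read_bytes()
            for p in root.rglob("*") if p.is_file()}


def differing_outputs(build_a: Mapping[str, bytes],
                      build_b: Mapping[str, bytes]) -> List[str]:
    """Paths whose contents differ, or that exist in only one build."""
    def digest(data: Optional[bytes]) -> Optional[str]:
        # A missing file gets None, so it never matches a present file.
        return hashlib.sha256(data).hexdigest() if data is not None else None

    names = set(build_a) | set(build_b)
    return sorted(n for n in names
                  if digest(build_a.get(n)) != digest(build_b.get(n)))
```

Typical use would be `differing_outputs(load_tree(Path(dir_4k)), load_tree(Path(dir_64k)))`; each reported path is a candidate for a closer look at hard-coded page-size assumptions.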
Regards,
Daniel
On Fri, Sep 18, 2020 at 8:19 AM Daniel Pocock daniel@pocock.pro wrote:
This is messy but important.
Is it possible for Fedora to offer two flavours of the kernel package, like Debian? There, I created a -4k flavour, so it builds two kernel packages, one with 4k pages and the other with 64k pages. They can both be installed on the same machine, and one or the other can be selected in the grub menu on each boot. Either can mount an ext4 root, but obviously they can't share the same btrfs root; only the one that created it can mount that root.
Once an alternative kernel is available, people need an installer/rescue ISO that includes it. This may mean making both permutations available as separate installer ISOs, or including two kernels in the same ppc64el ISO.
Installer logic: if somebody is using ANY non-4k page size, on any architecture, it would be useful to display a pop-up window with a warning about btrfs before they create their root filesystem. This will save people a lot of trouble. They might not realize there is a problem until they've been using the system for a few days, and then they have to reinstall it.
Finally, if both page sizes are available, it is desirable to build every package on every page size. Some packages appear to detect the page size at compile time and assume it will always be the same at runtime, which is unfortunate. Reproducible-builds techniques could perhaps be used to build each package on two different page sizes, detect whether the binaries differ and, if so, suggest checking for a hard-coded page size.
I think all of this effort required is why we probably *won't* do that.
I would be interested in seeing what kind of performance differences there are between 4K page sizes and 64K page sizes, but I don't particularly want to make ppc64le change back to 4K page sizes, simply because there's not much point to it. POWER systems are still largely unavailable to people and that's not going to change anytime soon.
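Before measuring, some back-of-the-envelope arithmetic shows the trade-off involved: larger pages extend how much memory the TLB can map at once, but round every mapping up to a bigger unit. The TLB entry count below is a hypothetical figure, not a specification of any real POWER CPU, and this is arithmetic rather than a benchmark.

```python
def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes that a TLB with `entries` entries can map at once."""
    return entries * page_size


def pages_needed(size: int, page_size: int) -> int:
    """Pages required to map `size` bytes (rounded up)."""
    return -(-size // page_size)


def rounding_waste(size: int, page_size: int) -> int:
    """Bytes lost when a `size`-byte mapping is rounded up to whole pages."""
    return pages_needed(size, page_size) * page_size - size


ENTRIES = 1024  # hypothetical TLB size, not a figure for any real CPU
for page in (4 * 1024, 64 * 1024):
    print(f"{page // 1024}k pages: reach {tlb_reach(ENTRIES, page) // 2**20} MiB, "
          f"waste for a 5000-byte mapping {rounding_waste(5000, page)} bytes")
```

With these assumptions, 64k pages give 64 MiB of reach instead of 4 MiB, but a 5000-byte mapping wastes 60536 bytes instead of 3192. Which effect dominates depends on the workload, which is exactly what real measurements would have to show.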
As for a warning, I do not have the cycles to do that. As it stands, if I'm going to put my limited contributing energy into something, it'd probably be to help upstream fix the problem with non-4K page sizes in the first place. Since you're interested in this problem, perhaps you could help upstream with testing the patches and writing code to fix this? Especially since it sounds like you have hardware and use-cases to test this with.
On 18/09/2020 14:34, Neal Gompa wrote:
I think all of this effort required is why we probably *won't* do that.
I would be interested in seeing what kind of performance differences there are between 4K page sizes and 64K page sizes, but I don't particularly want to make ppc64le change back to 4K page sizes, simply because there's not much point to it. POWER systems are still largely unavailable to people and that's not going to change anytime soon.
It looks like a lot of packages are affected by this issue; it is not just btrfs.
Examples that I have experienced personally:
- Firefox, especially things involving media content, like WebRTC
- Nouveau
Both of those were fixed without any recompiling, simply by running the same Firefox and Nouveau binaries on a 4k kernel.
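The actual Firefox and Nouveau defects are not identified here, but one common way that unmodified binaries break on a 64k kernel is a hard-coded 4096 used where the runtime page size is required, for example when computing `mmap` offsets. The sketch below illustrates that general pattern; `HARD_CODED_PAGE_SIZE` and the helper names are hypothetical.

```python
import mmap

HARD_CODED_PAGE_SIZE = 4096  # the kind of constant that misbehaves on 64k kernels


def align_up(n: int, page_size: int) -> int:
    """Round n up to a multiple of page_size (which must be a power of two)."""
    return (n + page_size - 1) & ~(page_size - 1)


def is_valid_mmap_offset(offset: int,
                         granularity: int = mmap.ALLOCATIONGRANULARITY) -> bool:
    """mmap offsets must be a multiple of the allocation granularity, which on
    Linux equals the runtime page size."""
    return offset % granularity == 0
```

An offset computed with `align_up(x, HARD_CODED_PAGE_SIZE)` is always valid on a 4k kernel, but, for example, 4096 fails `is_valid_mmap_offset(4096, 65536)` on a 64k one. Querying `mmap.PAGESIZE` (or `sysconf` in C) at runtime avoids this class of bug without needing separate builds.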
As for a warning, I do not have the cycles to do that. As it stands, if I'm going to put my limited contributing energy into something, it'd probably be to help upstream fix the problem with non-4K page sizes in the first place. Since you're interested in this problem, perhaps you could help upstream with testing the patches and writing code to fix this? Especially since it sounds like you have hardware and use-cases to test this with.
In the 10 years since the 64k page size was selected, a lot of things still haven't been patched. People who buy these workstations don't necessarily have the bandwidth to fix all that. It feels like we're being pushed to fix stuff like that in other people's code when we could just use a 4k page size and not worry about it.
It is a chicken-and-egg problem: people might be deterred from using the platform if they keep hearing reports that Firefox doesn't work. From my experience, Firefox is working and it is working better with the 4k page size kernel that I compiled.
It looks like interest in the platform will increase too: the FSF recently certified it as Respects Your Freedom, and Vikings[1], in Germany, are making plans to distribute it in Europe. Overall, the machines are very nice for developers. Given the amount of parallelism in the architecture, a small development or support team of two to four people could share one of these machines in a multi-seat configuration.
Regards,
Daniel