On Wed, Sep 16, 2020 at 10:44 AM Neal Gompa <ngompa13(a)gmail.com> wrote:
On Wed, Sep 16, 2020 at 10:32 AM Eric Sandeen <esandeen(a)redhat.com> wrote:
> On 9/15/20 7:29 PM, Neal Gompa wrote:
> > On Tue, Sep 15, 2020 at 7:57 PM Kevin Kofler <kevin.kofler(a)chello.at> wrote:
> >> Daniel Pocock wrote:
> >>> One issue I've come across is that a btrfs filesystem can only be
> >>> mounted on hosts with the same page size as the host that created the
> >>> filesystem.
> >> Ewww! That alone should disqualify btrfs as a default file system!
> >> Why does a file system depend on the kernel page size? The kernel page size
> >> is an internal implementation detail of the kernel, whereas a file system
> >> ought to be a stable interchange format that is compatible across all
> >> machines.
> >> It is unfortunate that this showstopper was not mentioned when the switch to
> >> btrfs by default was proposed.
> I'm not sure that it would have been deemed any more important than other
> concerns which were raised at the time, TBH.
> > I hate to break it to you, but this problem is not just in
> > filesystems, it's in basically everything in the kernel. And we've had
> > variations of problems like this for years (endianness, page size,
> > pointer size, single bit vs multi-bit booleans, etc.). I've personally
> > been bitten by all of these issues in some way. This comes from the
> > fact that there's no such thing as "internal implementation detail of
> > the kernel" by design. This is the "joy" of the monorepo
> > where everything leaks into everything else.
> That's simply not accurate. Handling 32/64-bit interfaces, endianness, etc.
> are long-solved problems. Longstanding lack of design or support for
> sub-page block support in a filesystem is not /at all/ the same thing.
> Are there occasional endianness bugs, pointer size bugs, etc? Sure.
> But that's different from "We did not design this."
Almost every filesystem was not originally designed for mixing page
sizes, endianness, etc. These issues *have* been fixed over time, for
sure. But it is not worth it for me or anyone else to go into a blame
game. Is it unfortunate that Btrfs didn't have that? Sure. Did I know
this was a problem? No, because I have no access to POWER systems,
like almost everyone else here. And ARM, the other architecture we
have, does not use 64K page sizes in Fedora (though it does in RHEL,
and that is pretty much considered a mistake there, as it didn't take
off, caused interop and performance issues, and added complexity where
it was unneeded).
> > This didn't become a serious problem until Red Hat made the
> > unfortunate (though not realized at the time) mistake of switching to
> > 64k pages for ARM and POWER. We got that change in Fedora for POWER
> > but not ARM. It has led to all kinds of unfortunate problems that are
> > gradually being worked on and fixed upstream.
> Sub-page block support in filesystems is not a wild, esoteric, unexpected feature.
> It's something that is generally available in nearly every other widely used
> Linux filesystem. It's not accurate to suggest that this is some unexpected
> side effect of page size choice, or that 64k pages were somehow a mistake
> now that this btrfs compatibility issue has been made more obvious.
> btw, Fedora has shipped kernels with 64k pages for almost a decade:
> commit 737c9c7da818f1da0bdf3f6a0dda5c38a3cba769
> Author: Josh Boyer <jwboyer(a)redhat.com>
> Date: Fri Sep 9 11:21:22 2011 -0400
> Change to 64K page size for ppc64 kernels (rhbz 736751)
I am aware that we shipped them for a long time. They are a mistake
for many other reasons unrelated to Btrfs. Regardless, the choice was
made and things have been fixed over time for it. There is already a
patch set being reviewed for the first stage of mixed page support.
Apropos of nothing else in this thread, I love that I can continue my
trend of "everything bad in Fedora comes from Josh" :)
64k pages have significant performance advantages on large memory
machines and with specific workloads. Is that worth the hassle and
complexity for using a page size that doesn't match the de facto
standard? For the people that run those workloads, yes. For Fedora,
probably not. At the time of that decision ppc64 was a secondary
architecture with a lot of participation from IBM, which is naturally
focused more on server class workloads (and the bug is clear that we
didn't switch ppc32 because it makes no sense for that class of hardware).
On the ppc64le architecture there has been significant benefit to
keeping the page size the same between Fedora and RHEL. Up until
relatively recently it was almost exclusively server class hardware
still. Some of the more recent offerings from IBM are smaller
configs, and the Talos machines are squarely aimed at developer
workstations. If you'd like to revisit the page size for ppc64le in
Fedora, start a discussion with the kernel team.
(I had no input or direction on aarch64 in RHEL. My opinion there is
that 64k pages were premature, predicated on similar benefits for
server class hardware that for all practical purposes hasn't
materialized. The market just isn't there yet.)