On 16/02/2021 17:05, Peter Robinson wrote:
> On 15/02/2021 19:47, Gary Buhrmaster wrote:
>> On Mon, Feb 15, 2021 at 6:39 PM Dan Horák <dan(a)danny.cz> wrote:
>>> The open question still is whether we should try to keep 64k as the
>>> default, as it would allow us to find the remaining bugs, and offer a 4k
>>> kernel variant (COPR for ppc64le should be coming back soon), and
>>> similarly for the installer (a new remix/spin). After BTRFS removes the
>>> page size dependency, switching kernels shouldn't cause any issues for users.
>> I think it may be instructive to look at the effect that enabling IPv6
>> had on the entire ecosystem (and going to IPv6-first
>> networking). It definitely broke things (and there
>> remain, in the greater world, lots of things still broken
>> when IPv6 is enabled). However, if we still used IPv4-first
>> networking, even more would almost certainly still be
>> broken, because no one would experience or report
>> the issues with IPv6.
>> If you agree that fixing the 64K bugs is important
>> (and I personally think it is), you need to go
>> 64K first to get the reports, and get the fixes.
> The problem is that not all ppc64le bugs are related to page size.
Welcome to the world of non-x86 architectures.
Welcome or welcome back...
I started on the Color Computer 3 with a Motorola 6809.
> I was recently looking at ffmpeg issues that happen on any
> that is now fixed and it also fixes issues in Blender.
> Going to 4k page size, we effectively drain the swamp to the half-water
> mark. Some bugs will go away, other bugs will still be there.
It doesn't drain the swamp at all; it just swaps one kind of water for another.
My personal impression is that the combination of the Btrfs page size issue
and GPUs not working was the darker water. To get around that, I had to not
only compile a kernel but also create a custom installer image with my
kernel. I'm happy to share those things with other users.
> The volume of workstation bugs is actually quite intimidating.
> somebody with a lot of experience, it takes away a certain amount of
> energy. Some users and maybe even some developers will spend so much
> time on hacks and workarounds that they have no time or energy left to
> report the bugs, bisect them or even fix them.
At least you don't have to deal with big endian bugs in there too, and
a bunch of us that have been working on non-x86 architectures for
years have no doubt solved a number of the problems already. aarch64
had a mix of 4K (Fedora) and 64K (RHEL) and we've dealt with hundreds of
these already; of course, that doesn't rule out POWER-specific ones.
Actually, as an upstream developer, when politics doesn't prevent me from
uploading packages to every distribution, I would carefully check unit
test results from all architectures on both Fedora and Debian. If
builds failed on a specific architecture or on big endian, I would make the
effort to support it. But I understand some people spend a far greater
percentage of their time on that than I do, and I'm glad so many things
just work already on POWER9.
> But I do agree that we can't avoid 64k indefinitely. If there is a way
> to support both page sizes and run unit tests for all packages on both,
> that would be really useful. In addition to unit tests, it would be
> useful to have a manual check of Firefox, Thunderbird, LibreOffice, etc.
> on 64k before each major release.
Not easily. Apparently having something you can set via the kernel
command line for this isn't straightforward; I started asking
for that functionality for page sizes back in the early days of
aarch64 and I'm still waiting.
I wasn't really thinking about a runtime option; I was thinking about
two completely parallel environments, each with its own copy of the
userland compiled on kernels with the corresponding page size.
Beyond the unit tests, it would also be interesting to use reproducible
builds methods to compare userland binaries and see if they vary
depending on the page size of the host where they were built. This
could flush out more problems.
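A minimal sketch of that comparison, assuming the same packages have already been built into two separate output trees, one on a 4k host and one on a 64k host (the directory names are hypothetical). It only flags files whose bytes differ or that exist in just one tree; it assumes the builds are otherwise reproducible (e.g. SOURCE_DATE_EPOCH is set), since timestamps or build paths would otherwise show up as noise. A tool such as diffoscope would be the natural next step for explaining why a flagged file differs.

```python
import hashlib
import sys
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(tree_4k: Path, tree_64k: Path) -> dict[str, list[str]]:
    """Compare two build output trees built on hosts with different page sizes."""
    a, b = digest_tree(tree_4k), digest_tree(tree_64k)
    return {
        # Same relative path in both trees, but different contents.
        "differ": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        # Files produced by only one of the two builds.
        "only_4k": sorted(a.keys() - b.keys()),
        "only_64k": sorted(b.keys() - a.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python compare.py build-4k/ build-64k/
    report = compare_trees(Path(sys.argv[1]), Path(sys.argv[2]))
    for category, names in report.items():
        for name in names:
            print(f"{category}: {name}")
```

Hashing whole files keeps the sketch simple; any file listed under "differ" is a candidate for closer inspection rather than proof of a page size bug.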
> 64k issues for ppc64le will also get more attention when other
> architectures go 64k, then we won't have all the pressure on ppc64le
> users. IPv6 was for every architecture so the effort was spread a lot
> more widely.
aarch64 also has 64K page sizes, so it's already a shared problem, and
I've dealt with and fixed more bugs around that than I care to remember, but
you're not the only one that's had to deal with it.
Yes, and I've seen more reports from people trying that platform too, for
example the recent blog post about the HoneyComb.
Are there other ways we can collaborate on this, for example, with a
wiki page about known 64k issues?
Does anybody want to capture anything from this thread in the wiki page
for the change or is there any other place where it would be useful
to have a summary?