On 08/30/2017 09:29 AM, Matthew Miller wrote:
Yesterday's compose stalled. So, no compose yesterday. The day
2017-08-28 03:19:38 [...] Fedora-27-20170828.n.0 started [...]
2017-08-28 20:26:31 [...] Fedora-27-20170828.n.0 FINISHED_INCOMPLETE [...]
That's *over 17 hours*. We're not too far off from actually literally
taking more than 24 hours to make a day's compose.
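For reference, the elapsed time can be checked directly from the two timestamps in the log lines above. This is only a quick sketch: the helper name `compose_duration` is made up for illustration and is not part of any compose tooling.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the compose log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S"


def compose_duration(started: str, finished: str) -> timedelta:
    """Return the wall-clock time between two log timestamps."""
    return datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)


# Start and FINISHED_INCOMPLETE times of Fedora-27-20170828.n.0.
elapsed = compose_duration("2017-08-28 03:19:38", "2017-08-28 20:26:31")
print(elapsed)
```

That comes out to a little over 17 hours for a single compose.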
If this were something like one, two, or even four hours, when there
is a problem, we could fix that and restart without losing a whole day
(which often cascades into months).
I understand that a large part of the slowness is simply IO, from
moving a bunch of packages around, and there are tens of thousands of
packages. But, the day-to-day changeset isn't that big. Looking back at
F26 compose reports from June, I see changes ranging from a few
megabytes up to 3GB or so. Even processing at one megabyte per
second, that should be less than an hour even on those busy days. (Plus
whatever time for creating images, which I understand has its own
things with RPM scripts being slow and etc.)
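The back-of-the-envelope estimate above can be written out as follows. This is a rough sketch that assumes 3GB means 3000 megabytes and that the only cost is moving the changed bytes at a flat one megabyte per second; the function name `transfer_minutes` is hypothetical.

```python
def transfer_minutes(changeset_mb: float, rate_mb_per_s: float = 1.0) -> float:
    """Minutes needed to process a changeset at a constant throughput."""
    return changeset_mb / rate_mb_per_s / 60


# Largest day-to-day changeset mentioned above: roughly 3GB (assumed 3000 MB).
busy_day = transfer_minutes(3000)
print(f"{busy_day:.0f} minutes")
```

Under those assumptions even the busiest day stays under an hour, which is the gap being pointed out against the 17-hour compose.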
It's not that simple. There's lots of things that have to be complete
before the next thing can run, i.e., you have to make the repos before you
can compose images from them, etc.
Anyway, there has to be some low-hanging fruit here. Is there anything
we can optimize by caching day-to-day, or caching as new packages are
Is there a place we can throw money at the problem and do this
all on a machine with RAID 10 SSDs or something?
I think we could, but then there might be issues copying it and
hardlinks and such.
There are plans to split out the gigantic koji volume (~40TB) into a
'active' and 'not active' set, and then get the active set to be on
netapp flash and the not active to be on sata drives. We just haven't
gotten to this yet, but we could prioritize it...