Do we need this anymore, or can it be dropped to something more reasonable like HZ=100 or HZ=250?
Does anyone have an actual issue where they require HZ=1000?
P.
On Sun, 2013-03-03 at 07:18 -0500, Prarit Bhargava wrote:
> Do we need this anymore, or can it be dropped to something more reasonable like HZ=100 or HZ=250?
> Does anyone have an actual issue where they require HZ=1000?
I guess it depends what the implementation actually does. What does HZ even control these days? If it controls the scheduler timeslice then yes I very much do need 1kHz, xserver needs to be able to post a buffer swap at least every 16ms and if the scheduler only task-switches every 10ms then I stand a very good chance of dropping frames.
Why do you want it lower?
- ajax
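The frame-budget concern above can be put into numbers. This is a minimal sketch, assuming (as the message hypothesizes, not as established fact) that task switches happen only on timer ticks, and assuming a 60 Hz display. `ticks_per_frame` is a hypothetical helper for illustration, not a kernel or X server function.

```c
/*
 * Whole timer ticks that fit inside one display frame.
 * Assumption for this sketch: rescheduling happens only on ticks.
 * Both arguments must be nonzero.
 */
unsigned int ticks_per_frame(unsigned int hz, unsigned int refresh_hz)
{
    /* frame period / tick period = (1/refresh_hz) / (1/hz) = hz / refresh_hz */
    return hz / refresh_hz;
}
```

For a 60 Hz display this gives 16 ticks per frame at HZ=1000, 4 at HZ=250, and 1 at HZ=100, so under that assumption a single late tick at HZ=100 would already cost a frame.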
On 03/04/2013 10:31 AM, Adam Jackson wrote:
> On Sun, 2013-03-03 at 07:18 -0500, Prarit Bhargava wrote:
>> Do we need this anymore, or can it be dropped to something more reasonable like HZ=100 or HZ=250?
>> Does anyone have an actual issue where they require HZ=1000?
> I guess it depends what the implementation actually does. What does HZ even control these days? If it controls the scheduler timeslice then yes I very much do need 1kHz, xserver needs to be able to post a buffer swap at least every 16ms and if the scheduler only task-switches every 10ms then I stand a very good chance of dropping frames.
IIRC, the scheduler timeslice isn't impacted by HZ. There's a definition of the default timeslice, RRTIMESLICE or something like that (brain fail me.) which takes into account that HZ is really some number of events per second, rather than just some number of events.
(brain ... still ... fail. ... not RRTIMESLICE ... geez.)
I'm going to end up owing someone a beer when they tell me what that default is...
> Why do you want it lower?
You know me ;) Always thinking of big systems :)
Some HPC colleagues (who would strongly prefer to use Fedora instead of another OS) mentioned to me that they can actually "see" the effect of having HZ @ 1000. While things have gotten better, the bottom line appears to be that there is a noticeable impact on performance on compute nodes, caused by acquiring the clock-related locks in the timer interrupt code path when you have 1000s of physical cpus.
So I took a look at some of the other Linux OSes (Ubuntu, openSUSE) and the major ones seem to be at HZ=250. I'm wondering if there is an actual real reason we still have it at 1000 or if this is a purely historical value?
Aside: I had heard from SGI in the past year that they also noticed some problems with the locking @ HZ=1000, however, I'm certain those issues have been resolved upstream. It seems like for most performance situations we're okay @ 1000, but there are some circumstances where 1000 still won't cut it.
I've advised them to compile their own kernel with a lower value of HZ. But, as you can imagine, that isn't exactly the best option for them.
P.
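The earlier point in this message, that a default timeslice can be defined in terms of HZ so that its wall-clock length does not change with the tick rate, can be sketched as follows. The 100 ms figure and the names `TIMESLICE_MS`, `timeslice_jiffies` and `jiffies_to_ms` are assumptions made for this illustration. They are not the kernel definition that the message is trying to recall.

```c
/* Illustrative only: a timeslice expressed as "100 ms worth of jiffies".
 * The 100 ms value is an assumption for this sketch. */
#define TIMESLICE_MS 100UL

/* Timeslice length in jiffies for a given tick rate (ticks per second). */
unsigned long timeslice_jiffies(unsigned long hz)
{
    return TIMESLICE_MS * hz / 1000UL;
}

/* Convert a jiffy count back to milliseconds at the same tick rate. */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    return jiffies * 1000UL / hz;
}
```

With this shape of definition the jiffy count scales with HZ (10, 25 and 100 jiffies at HZ=100, 250 and 1000), while the duration stays at 100 ms in every case.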
> IIRC, the scheduler timeslice isn't impacted by HZ.
Yeah. As Dave was pointing out, a lot of the timeout aliasing problems with HZ have been fixed by moving implementations of waits from the HZ granular interfaces to hrtimers. f.e.
commit 8ff3e8e85fa6c312051134b3953e397feb639f51
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Sun Aug 31 08:26:40 2008 -0700

    select: switch select() and poll() over to hrtimers

has at its core:
- __timeout = schedule_timeout(__timeout);
+ if (!schedule_hrtimeout(to, HRTIMER_MODE_ABS))
A slightly different and disappointing result of dropping HZ is increasing the duration of waits in code that is still using schedule_timeout(1) for a short timeout.
btrfs has a bunch of these that are trying to wait for more work to accumulate before carrying on. If you drop HZ you'll be adding 4ms (or 10ms) delays to a few paths.
jbd has similar code that is sensitive to jiffies, but it's a little more involved because it's measuring journal commit times rather than using a dumb single jiffie timeout.
Anyway, just a data point.
- z
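The cost of a one-jiffy wait at different tick rates, as described above, is plain arithmetic. Here is a user-space sketch of it. `jiffy_usec` and `ms_to_jiffies_roundup` are hypothetical helpers written for this illustration and are not kernel interfaces. The rounding-up behaviour is an assumption about how a millisecond request maps onto whole ticks.

```c
/* Length of one jiffy in microseconds for a given tick rate. */
unsigned long jiffy_usec(unsigned long hz)
{
    return 1000000UL / hz;
}

/*
 * Round a millisecond request up to whole jiffies (user-space model).
 * A wait cannot be shorter than the request, so partial ticks round up.
 */
unsigned long ms_to_jiffies_roundup(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999UL) / 1000UL;
}
```

One jiffy is 1000 usec at HZ=1000, 4000 usec at HZ=250 and 10000 usec at HZ=100, which is where the 4ms and 10ms figures come from. A 5 ms request at HZ=250 rounds up to 2 jiffies, that is 8 ms.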
On 03/14/2013 02:43 PM, Zach Brown wrote:
>> IIRC, the scheduler timeslice isn't impacted by HZ.
> Yeah. As Dave was pointing out, a lot of the timeout aliasing problems with HZ have been fixed by moving implementations of waits from the HZ granular interfaces to hrtimers. f.e.
> commit 8ff3e8e85fa6c312051134b3953e397feb639f51
> Author: Arjan van de Ven <arjan@linux.intel.com>
> Date:   Sun Aug 31 08:26:40 2008 -0700
>
>     select: switch select() and poll() over to hrtimers
>
> has at its core:
> - __timeout = schedule_timeout(__timeout);
> + if (!schedule_hrtimeout(to, HRTIMER_MODE_ABS))
> A slightly different and disappointing result of dropping HZ is increasing the duration of waits in code that is still using schedule_timeout(1) for a short timeout.
> btrfs has a bunch of these that are trying to wait for more work to accumulate before carrying on. If you drop HZ you'll be adding 4ms (or 10ms) delays to a few paths.
> jbd has similar code that is sensitive to jiffies, but it's a little more involved because it's measuring journal commit times rather than using a dumb single jiffie timeout.
IOW .. don't modify HZ=1000 yet ;). Durnit ...
P.
> Anyway, just a data point.
> - z
On 15/03/13 06:46, Prarit Bhargava wrote:
> IOW .. don't modify HZ=1000 yet ;). Durnit ...
1000HZ has long been recommended for accurate MIDI playback.
regards,
Brendan
On Sun, Mar 03, 2013 at 07:18:32AM -0500, Prarit Bhargava wrote:
> Do we need this anymore, or can it be dropped to something more reasonable like HZ=100 or HZ=250?
> Does anyone have an actual issue where they require HZ=1000?
Historically, one reason was that at HZ=100 timer interrupts were not frequent enough for short delays to be accurate.
The test program below would consistently fail. That doesn't seem to be the case against a current kernel afaict.
Dave
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* usleep() */
#include <sys/time.h>   /* gettimeofday() */

/* Sleep for usecs microseconds and return the measured elapsed time in usec. */
unsigned long do_time(unsigned long usecs)
{
    struct timeval after, before;
    long secs, usec;

    gettimeofday(&before, NULL);
    usleep(usecs);
    gettimeofday(&after, NULL);
    secs = after.tv_sec - before.tv_sec;
    usec = after.tv_usec - before.tv_usec;   /* may be negative; secs compensates */
    return secs * 1000000 + usec;
}

int main(int argc, char **argv)
{
    unsigned long delays = 0;
    int i;

    /* take the average over 1000 measurements */
    for (i = 0; i < 1000; i++)
        delays += do_time(1000);
    delays = delays / 1000;
    printf("%d -> %lu\n", 1000, delays);

    /* we asked for a 1.000 msec delay, if this takes more than 2.5 msec that's unacceptable. */
    if (delays > 2500) {
        printf("Unacceptable long delay; asked for 1000 usec, got %lu usec\n", delays);
        exit(EXIT_FAILURE);
    }

    delays = 0;
    for (i = 0; i < 1000; i++)
        delays += do_time(2000);
    delays = delays / 1000;
    printf("%d -> %lu\n", 2000, delays);

    /* we asked for a 2.000 msec delay, if this takes more than 3.5 msec that's unacceptable. */
    if (delays > 3500) {
        printf("Unacceptable long delay; asked for 2000 usec, got %lu usec\n", delays);
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
}
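For anyone rerunning this kind of measurement today, the same idea can be expressed with a monotonic clock, so that wall-clock adjustments during the run cannot skew the result. This is only a sketch under stated assumptions: `measure_sleep_usec` is a hypothetical helper, it swaps `usleep()` and `gettimeofday()` for the POSIX calls `nanosleep()` and `clock_gettime(CLOCK_MONOTONIC, ...)`, and it is not part of the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Sleep for usecs microseconds and return the elapsed time in
 * microseconds as measured on CLOCK_MONOTONIC, or -1 on error.
 */
long measure_sleep_usec(long usecs)
{
    struct timespec req, rem, before, after;
    long long elapsed_ns;

    req.tv_sec = usecs / 1000000L;
    req.tv_nsec = (usecs % 1000000L) * 1000L;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;

    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    /* Work in nanoseconds first so a negative tv_nsec difference is handled. */
    elapsed_ns = (long long)(after.tv_sec - before.tv_sec) * 1000000000LL
               + (after.tv_nsec - before.tv_nsec);
    return (long)(elapsed_ns / 1000LL);
}
```

Calling `measure_sleep_usec(1000)` in a loop and averaging the results reproduces the structure of the test above. The 2.5 msec and 3.5 msec thresholds would carry over unchanged.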
kernel@lists.fedoraproject.org