HZ=1000 in Fedora kernels - kernel - Fedora Mailing-Lists

HZ=1000 in Fedora kernels

Prarit Bhargava

Sunday, 3 March 2013

6:18 a.m.

Do we need this anymore, or can it be dropped to something more reasonable like HZ=100 or HZ=250?

Does anyone have an actual issue where they require HZ=1000?

P.

Adam Jackson

Monday, 4 March

9:31 a.m.

On Sun, 2013-03-03 at 07:18 -0500, Prarit Bhargava wrote:

> Do we need this anymore, or can it be dropped to something more reasonable like
> HZ=100 or HZ=250?
>
> Does anyone have an actual issue where they require HZ=1000?

I guess it depends what the implementation actually does. What does HZ even control these days? If it controls the scheduler timeslice then yes I very much do need 1kHz, xserver needs to be able to post a buffer swap at least every 16ms and if the scheduler only task-switches every 10ms then I stand a very good chance of dropping frames.

Why do you want it lower?

- ajax

Prarit Bhargava

Wednesday, 13 March

5:45 p.m.

On 03/04/2013 10:31 AM, Adam Jackson wrote:

> On Sun, 2013-03-03 at 07:18 -0500, Prarit Bhargava wrote:
>> Do we need this anymore, or can it be dropped to something more reasonable like
>> HZ=100 or HZ=250?
>>
>> Does anyone have an actual issue where they require HZ=1000?
>
> I guess it depends what the implementation actually does. What does HZ
> even control these days? If it controls the scheduler timeslice then
> yes I very much do need 1kHz, xserver needs to be able to post a buffer
> swap at least every 16ms and if the scheduler only task-switches every
> 10ms then I stand a very good chance of dropping frames.

IIRC, the scheduler timeslice isn't impacted by HZ. There's a definition of the default timeslice, RRTIMESLICE or something like that (brain fail me.) which takes into account that HZ is really some number of events per second, rather than just some number of events. (brain ... still ... fail. ... not RRTIMESLICE ... geez.) I'm going to end up owing someone a beer when they tell me what that default is...

> Why do you want it lower?

You know me ;) Always thinking of big systems :)

Some HPC colleagues (who would strongly prefer to use Fedora instead of another OS) mentioned to me that they can actually "see" the effect of having HZ @ 1000. While things have gotten better, the bottom line appears to be that there is a noticeable impact on performance on compute nodes due to acquiring the clock-related locks in the timer interrupt code path when you have 1000s of physical cpus.

So I took a look at some of the other Linux OSes (Ubuntu, openSUSE) and the major ones seem to be at HZ=250. I'm wondering if there is an actual reason we still have it at 1000 or if this is a purely historical value?

Aside: I had heard from SGI in the past year that they also noticed some problems with the locking @ HZ=1000, however, I'm certain those issues have been resolved upstream. It seems like for most performance situations we're okay @ 1000, but there are some circumstances where 1000 still won't cut it. I've advised them to compile their own kernel with a lower value of HZ. But, as you can imagine, that isn't exactly the best option for them.

P.

Zach Brown

Thursday, 14 March

1:43 p.m.

> IIRC, the scheduler timeslice isn't impacted by HZ.

Yeah. As Dave was pointing out, a lot of the timeout aliasing problems with HZ have been fixed by moving implementations of waits from the HZ granular interfaces to hrtimers. f.e.

commit 8ff3e8e85fa6c312051134b3953e397feb639f51
Author: Arjan van de Ven <arjan(a)linux.intel.com>
Date: Sun Aug 31 08:26:40 2008 -0700

select: switch select() and poll() over to hrtimers

has at its core:

- __timeout = schedule_timeout(__timeout);
+ if (!schedule_hrtimeout(to, HRTIMER_MODE_ABS))

A slightly different and disappointing result of dropping HZ is increasing the duration of waits in code that is still using schedule_timeout(1) for a short timeout.

btrfs has a bunch of these that are trying to wait for more work to accumulate before carrying on. If you drop HZ you'll be adding 4ms (or 10ms) delays to a few paths.

jbd has similar code that is sensitive to jiffies, but it's a little more involved because it's measuring journal commit times rather than using a dumb single jiffie timeout.

Anyway, just a data point.

- z

Prarit Bhargava

3:46 p.m.

On 03/14/2013 02:43 PM, Zach Brown wrote:

>> IIRC, the scheduler timeslice isn't impacted by HZ.
>
> Yeah. As Dave was pointing out, a lot of the timeout aliasing problems
> with HZ have been fixed by moving implementations of waits from the HZ
> granular interfaces to hrtimers. f.e.
>
> commit 8ff3e8e85fa6c312051134b3953e397feb639f51
> Author: Arjan van de Ven <arjan(a)linux.intel.com>
> Date: Sun Aug 31 08:26:40 2008 -0700
>
> select: switch select() and poll() over to hrtimers
>
> has at its core:
>
> - __timeout = schedule_timeout(__timeout);
> + if (!schedule_hrtimeout(to, HRTIMER_MODE_ABS))
>
> A slightly different and disappointing result of dropping HZ is
> increasing the duration of waits in code that is still using
> schedule_timeout(1) for a short timeout.
>
> btrfs has a bunch of these that are trying to wait for more work to
> accumulate before carrying on. If you drop HZ you'll be adding 4ms (or
> 10ms) delays to a few paths.
>
> jbd has similar code that is sensitive to jiffies, but it's a little
> more involved because it's measuring journal commit times rather than
> using a dumb single jiffie timeout.

IOW .. don't modify HZ=1000 yet ;). Durnit ...

P.

> Anyway, just a data point.
>
> - z

Brendan Jones

6:24 p.m.

On 15/03/13 06:46, Prarit Bhargava wrote:

> On 03/14/2013 02:43 PM, Zach Brown wrote:
>>> IIRC, the scheduler timeslice isn't impacted by HZ.
>>
>> Yeah. As Dave was pointing out, a lot of the timeout aliasing problems
>> with HZ have been fixed by moving implementations of waits from the HZ
>> granular interfaces to hrtimers. f.e.
>>
>> commit 8ff3e8e85fa6c312051134b3953e397feb639f51
>> Author: Arjan van de Ven <arjan(a)linux.intel.com>
>> Date: Sun Aug 31 08:26:40 2008 -0700
>>
>> select: switch select() and poll() over to hrtimers
>>
>> has at its core:
>>
>> - __timeout = schedule_timeout(__timeout);
>> + if (!schedule_hrtimeout(to, HRTIMER_MODE_ABS))
>>
>> A slightly different and disappointing result of dropping HZ is
>> increasing the duration of waits in code that is still using
>> schedule_timeout(1) for a short timeout.
>>
>> btrfs has a bunch of these that are trying to wait for more work to
>> accumulate before carrying on. If you drop HZ you'll be adding 4ms (or
>> 10ms) delays to a few paths.
>>
>> jbd has similar code that is sensitive to jiffies, but it's a little
>> more involved because it's measuring journal commit times rather than
>> using a dumb single jiffie timeout.
>
> IOW .. don't modify HZ=1000 yet ;). Durnit ...

1000HZ has long been recommended for accurate MIDI playback.

regards,
Brendan

http://wiki.linuxaudio.org/faq/start

Dave Jones

Monday, 4 March

4:26 p.m.

On Sun, Mar 03, 2013 at 07:18:32AM -0500, Prarit Bhargava wrote:

> Do we need this anymore, or can it be dropped to something more reasonable like
> HZ=100 or HZ=250?
>
> Does anyone have an actual issue where they require HZ=1000?

Historically, one reason was that with 100HZ we wouldn't get timer interrupts frequently enough for delays to be accurate. The test program below would consistently fail. That doesn't seem to be the case against a current kernel afaict.

Dave

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

/* Sleep for usecs microseconds; return the elapsed wall-clock time in usec. */
unsigned long do_time(unsigned long usecs)
{
	struct timeval after, before;
	long secs, usec;

	gettimeofday(&before, NULL);
	usleep(usecs);
	gettimeofday(&after, NULL);

	secs = after.tv_sec - before.tv_sec;
	usec = after.tv_usec - before.tv_usec;	/* may be negative; secs compensates */
	return secs * 1000000 + usec;
}

int main(void)
{
	unsigned long delays = 0;
	int i;

	/* take the average over 1000 measurements */
	for (i = 0; i < 1000; i++)
		delays += do_time(1000);
	delays = delays / 1000;
	printf("%d -> %lu\n", 1000, delays);

	/* we asked for a 1.000 msec delay, if this takes more than 2.5 msec
	   that's unacceptable. */
	if (delays > 2500) {
		printf("Unacceptable long delay; asked for 1000 usec, got %lu usec\n", delays);
		exit(EXIT_FAILURE);
	}

	delays = 0;
	for (i = 0; i < 1000; i++)
		delays += do_time(2000);
	delays = delays / 1000;
	printf("%d -> %lu\n", 2000, delays);

	/* we asked for a 2.000 msec delay, if this takes more than 3.5 msec
	   that's unacceptable. */
	if (delays > 3500) {
		printf("Unacceptable long delay; asked for 2000 usec, got %lu usec\n", delays);
		exit(EXIT_FAILURE);
	}

	exit(EXIT_SUCCESS);
}


kernel@lists.fedoraproject.org

participants (5)

Adam Jackson
Brendan Jones
Dave Jones
Prarit Bhargava
Zach Brown