I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
On 09/30/2010 10:12 AM, Tom Horsley wrote:
On Thu, 30 Sep 2010 09:59:38 -0700 JD wrote:
What I would like to know is the degree of granularity of the SMP implementation in Linux.
Don't know what granularity means :-), but we have run kernel.org kernels on up to 64 core machines here at work.
Granularity applies to the locking scheme in the kernel: it is how finely the kernel divides up the locks it uses to prevent different cores/CPUs from clobbering the same kernel global data at the same time.
Some locking schemes are coarser than others. There is an optimal point beyond which further granularity will decrease performance.
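To make that concrete, here is a minimal user-space sketch in C with pthreads. It is only an illustration of the trade-off (the kernel itself uses spinlocks, RCU, per-CPU data and so on), and every name in it is made up:

/* Coarse vs. fine locking on a hash table; illustration only. */
#include <pthread.h>

#define NBUCKETS 256

struct entry {
    struct entry *next;
    int key;
};

struct bucket {
    pthread_mutex_t lock;   /* fine-grained: one lock per bucket */
    struct entry *head;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* coarse */
static struct bucket buckets[NBUCKETS];

static void init_buckets(void)
{
    for (int i = 0; i < NBUCKETS; i++)
        pthread_mutex_init(&buckets[i].lock, NULL);
}

/* Coarse-grained: every thread serializes on the one table lock,
   even when they touch unrelated buckets. Simple, but scales poorly. */
static void insert_coarse(struct entry *e)
{
    struct bucket *b = &buckets[(unsigned)e->key % NBUCKETS];
    pthread_mutex_lock(&table_lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&table_lock);
}

/* Fine-grained: threads hitting different buckets proceed in parallel.
   The price is more locks to maintain and more lock/unlock overhead per
   operation; past some point the overhead outweighs the win, which is
   exactly the optimum I am asking about. */
static void insert_fine(struct entry *e)
{
    struct bucket *b = &buckets[(unsigned)e->key % NBUCKETS];
    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}

Hammer both versions with enough threads and the coarse one flattens out as cores are added, while the fine one keeps scaling until the locking overhead catches up.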
I was looking for papers/studies that may have been done to see at what degree of granularity the payoff was highest.
On Thu, 2010-09-30 at 10:26 -0700, JD wrote:
Granularity applies to the locking scheme in the kernel: it is how finely the kernel divides up the locks it uses to prevent different cores/CPUs from clobbering the same kernel global data at the same time.
Some locking schemes are coarser than others. There is an optimal point beyond which further granularity will decrease performance.
I was looking for papers/studies that may have been done to see at what degree of granularity the payoff was highest.
Most of the 2.6 kernel no longer uses the BKL (a.k.a. the Big Kernel Lock); it uses fine-grained locking instead, making some code paths better suited than others to heavily SMP'd environments with tens of cores (e.g. IP vs. IPX).
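To illustrate the difference, here is a user-space pthreads sketch, not real kernel code; the shard array below is a made-up stand-in for the kernel's per-CPU data:

/* BKL-style global lock vs. sharded locking; illustration only. */
#include <pthread.h>

#define NSHARDS 64

/* One global lock: every increment from every core contends here,
   the way everything once contended on the BKL. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static long global_count;

static void inc_global(void)
{
    pthread_mutex_lock(&global_lock);
    global_count++;
    pthread_mutex_unlock(&global_lock);
}

/* Sharded: each core mostly touches its own slot, so increments on
   different cores rarely contend; a read has to sum all the shards. */
static struct shard {
    pthread_mutex_t lock;
    long count;
} shards[NSHARDS];

static void init_shards(void)
{
    for (int i = 0; i < NSHARDS; i++)
        pthread_mutex_init(&shards[i].lock, NULL);
}

static void inc_sharded(unsigned cpu)
{
    struct shard *s = &shards[cpu % NSHARDS];
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_mutex_unlock(&s->lock);
}

static long read_sharded(void)
{
    long sum = 0;
    for (int i = 0; i < NSHARDS; i++) {
        pthread_mutex_lock(&shards[i].lock);
        sum += shards[i].count;
        pthread_mutex_unlock(&shards[i].lock);
    }
    return sum;
}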
As you are talking about recent hardware (an AMD Opteron 6xxx, I presume) with (only) 12 cores, I doubt that you'll hit any major performance barrier.
As I said in a previous comment, some additional information on what you want to do with the machine will be helpful.
On 10/02/2010 11:09 AM, Gilboa Davara wrote:
On Thu, 2010-09-30 at 10:26 -0700, JD wrote:
Granularity applies to the locking scheme in the kernel: it is how finely the kernel divides up the locks it uses to prevent different cores/CPUs from clobbering the same kernel global data at the same time.
Some locking schemes are coarser than others. There is an optimal point beyond which further granularity will decrease performance.
I was looking for papers/studies that may have been done to see at what degree of granularity the payoff was highest.
Most of the 2.6 kernel no longer uses the BKL (a.k.a. the Big Kernel Lock); it uses fine-grained locking instead, making some code paths better suited than others to heavily SMP'd environments with tens of cores (e.g. IP vs. IPX).
As you are talking about recent hardware (an AMD Opteron 6xxx, I presume) with (only) 12 cores, I doubt that you'll hit any major performance barrier.
As I said in a previous comment, some additional information on what you want to do with the machine will be helpful.
I just wanted to read the papers/studies, if they exist somewhere.
JD wrote:
On 09/30/2010 10:12 AM, Tom Horsley wrote:
On Thu, 30 Sep 2010 09:59:38 -0700 JD wrote:
What I would like to know is the degree of granularity of the SMP implementation in Linux.
Don't know what granularity means :-), but we have run kernel.org kernels on up to 64 core machines here at work.
Granularity applies to the locking scheme in the kernel: it is how finely the kernel divides up the locks it uses to prevent different cores/CPUs from clobbering the same kernel global data at the same time.
Some locking schemes are coarser than others. There is an optimal point beyond which further granularity will decrease performance.
I was looking for papers/studies that may have been done to see at what degree of granularity the payoff was highest.
In case you don't see a mention in the papers: one of the areas where the kernel can't currently be tuned, or can only be tuned using affinity, is having the scheduler decide between running two threads on the same core (hyperthreading) or on two separate cores. Depending on the application it does matter; two threads working on shared memory will go a lot faster if the memory is in a shared L1 cache than if it's off on another core.
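If you want to make that placement choice yourself, the affinity API is the knob. A minimal sketch, assuming CPUs 0 and 1 are hyperthread siblings of the same core (that numbering varies by machine, so check the sysfs topology files first):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one logical CPU. Which CPU numbers are
   siblings of the same core is machine-dependent; see
   /sys/devices/system/cpu/cpu0/topology/thread_siblings_list. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *arg)
{
    pin_to_cpu(*(int *)arg);
    /* ... work on the shared buffer here ... */
    return NULL;
}

int main(void)
{
    int cpus[2] = { 0, 1 };  /* assumed siblings of one physical core */
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &cpus[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}

Pin the pair to the two siblings of one core, then to two separate cores, and time the difference; whether sharing the L1 wins depends on the access pattern.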
Happy reading.
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT). As for application support, it greatly depends on the application being used (e.g. database, web server, math, 3D, etc.).
On 10/02/2010 10:54 AM, Gilboa Davara wrote:
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT).
Not to be mean or anything, so please take it lightly: did I say Linux has issues supporting large SMP? No! You are answering your own question, not mine.
As for application support, it greatly depends on the application being used (e.g. database, web server, math, 3D, etc.).
Did I say anything about application support? No! Again, you are answering your own question.
I have found a few articles mentioning SMP granularity, but they do not discuss the Linux SMP implementation in the context of granularity, overhead, and the optimum degree of granularity.
On Sat, 2010-10-02 at 11:18 -0700, JD wrote:
Not to be mean or anything, so please take it lightly: did I say Linux has issues supporting large SMP? No! You are answering your own question, not mine.
...
Did I say anything about application support? No! Again, you are answering your own question.
I have found a few articles mentioning SMP granularity, but they do not discuss the Linux SMP implementation in the context of granularity, overhead, and the optimum degree of granularity.
It might be a language barrier on my part, but from your question it wasn't apparent (at least not to me) whether you were talking about the locking granularity inside the Linux kernel (read: a theoretical question) or application performance running on Linux. As you talked about actual hardware (an AMD Opteron CPU with 12 cores), I -assumed- that you were talking about application performance.
-However-, saying "take it lightly" doesn't excuse you from the "be excellent to each other" rule. Your answer was rude and uncalled for.
- Gilboa
On 10/02/2010 11:30 AM, Gilboa Davara wrote:
On Sat, 2010-10-02 at 11:18 -0700, JD wrote:
Not to be mean or anything, so please take it lightly: did I say Linux has issues supporting large SMP? No! You are answering your own question, not mine.
...
Did I say anything about application support? No! Again, you are answering your own question.
I have found a few articles mentioning SMP granularity, but they do not discuss the Linux SMP implementation in the context of granularity, overhead, and the optimum degree of granularity.
It might be a language barrier on my part, but from your question it wasn't apparent (at least not to me) whether you were talking about the locking granularity inside the Linux kernel (read: a theoretical question) or application performance running on Linux. As you talked about actual hardware (an AMD Opteron CPU with 12 cores), I -assumed- that you were talking about application performance.
-However-, saying "take it lightly" doesn't excuse you from the "be excellent to each other" rule. Your answer was rude and uncalled for.
- Gilboa
Sorry, Gilboa. I felt that you were mis-steering the gist of my question. Have a nice day!
On Sat, 02 Oct 2010 19:54:27 +0200 Gilboa Davara gilboad@gmail.com wrote:
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT).
Apparently it does have issues: http://www.conceivablytech.com/3166/science-research/current-operating-systems-may-only-make-sense-up-to-48-cores/
On Sun, Oct 3, 2010 at 3:49 AM, Yorvyk yorvik.ubunto@googlemail.com wrote:
On Sat, 02 Oct 2010 19:54:27 +0200 Gilboa Davara gilboad@gmail.com wrote:
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT).
Apparently it does have issues: http://www.conceivablytech.com/3166/science-research/current-operating-systems-may-only-make-sense-up-to-48-cores/
-- Steve Cook (Yorvyk)
http://lubuntu.net
The MIT paper they refer to in the article: http://ppi.fudan.edu.cn/system/publications/paper/corey-osdi08.pdf
To be honest, your bigger problem is finding enough application parallelism and enough parallel user-space apps. That, and memory or I/O bandwidth on servers.
The kernel will run on supercomputers with over 1000 processors. Not all workloads are handled well at that scale so if you threw 1000 random user instances on it you wouldn't get great results in a lot of cases.
On a desktop it's instructive to measure how many processor cores ever end up running at once. About the only time an 8-core box seems to use all its cores at once is compiling kernels.
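A rough sketch of how to measure it, sampling the idle counters in /proc/stat once a second; the 50-tick threshold assumes the usual USER_HZ of 100 and simply means "busy for more than half the second", so adjust to taste:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAXCPU 1024

/* Read the per-CPU idle tick counters (4th field of each "cpuN" line
   in /proc/stat); returns how many CPUs were found. */
static int read_idle(long long idle[])
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    int n = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        long long user, nice, sys, idl;
        int cpu;

        /* Skip the aggregate "cpu " line; only take "cpu0", "cpu1"... */
        if (strncmp(line, "cpu", 3) != 0 || line[3] < '0' || line[3] > '9')
            continue;
        if (sscanf(line + 3, "%d %lld %lld %lld %lld",
                   &cpu, &user, &nice, &sys, &idl) == 5 && n < MAXCPU)
            idle[n++] = idl;
    }
    fclose(f);
    return n;
}

int main(void)
{
    long long before[MAXCPU], after[MAXCPU];
    int n = read_idle(before);
    int busy = 0;

    if (n < 0)
        return 1;
    sleep(1);
    read_idle(after);
    for (int i = 0; i < n; i++)
        if (after[i] - before[i] < 50)  /* idle for less than half the second */
            busy++;
    printf("%d of %d CPUs were busy in the last second\n", busy, n);
    return 0;
}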
On a server you've got more chance, as you've often got a lot of work hitting the box from multiple sources, but in many cases the bottleneck ends up being I/O and memory bandwidth, unless you've got a board with separate RAM hanging off all the CPUs and you spent real money on the I/O subsystem.
This is actually one of the things that really hurt certain workloads. There are some that simply don't parallelise and the move to many cores and to clusters has left them stuck.
Alan
On 10/02/2010 02:41 PM, Alan Cox wrote:
To be honest, your bigger problem is finding enough application parallelism and enough parallel user-space apps. That, and memory or I/O bandwidth on servers.
The kernel will run on supercomputers with over 1000 processors. Not all workloads are handled well at that scale so if you threw 1000 random user instances on it you wouldn't get great results in a lot of cases.
On a desktop it's instructive to measure how many processor cores ever end up running at once. About the only time an 8-core box seems to use all its cores at once is compiling kernels.
On a server you've got more chance, as you've often got a lot of work hitting the box from multiple sources, but in many cases the bottleneck ends up being I/O and memory bandwidth, unless you've got a board with separate RAM hanging off all the CPUs and you spent real money on the I/O subsystem.
This is actually one of the things that really hurt certain workloads. There are some that simply don't parallelise and the move to many cores and to clusters has left them stuck.
Alan
Yes, I agree. Most (if not all) open-source apps are not implemented with hundreds or thousands of parallel threads. But many commercial applications are heavily multithreaded, and it is there that the optimum locking granularity will have substantial value.
I am wondering if we will ultimately end up splitting massively parallel architectures into specialized sets, where each set is responsible for some major operating-system task, in order to reduce lock contention and head off the race conditions that can easily crop up when an optimal locking scheme is pursued. Perhaps some things from the past will be the future. Remember the I/O channel? :) :)
On 10/02/2010 01:05 PM, Samuel Kidman wrote:
On Sun, Oct 3, 2010 at 3:49 AM, Yorvyk <yorvik.ubunto@googlemail.com> wrote:
On Sat, 02 Oct 2010 19:54:27 +0200 Gilboa Davara <gilboad@gmail.com> wrote:
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT).
Apparently it does have issues: http://www.conceivablytech.com/3166/science-research/current-operating-systems-may-only-make-sense-up-to-48-cores/
Seems like the writing is on the wall: Linux must evolve into an efficient OS for massively parallel architectures.
On Sat, 2010-10-02 at 20:49 +0100, Yorvyk wrote:
On Sat, 02 Oct 2010 19:54:27 +0200 Gilboa Davara gilboad@gmail.com wrote:
On Thu, 2010-09-30 at 09:59 -0700, JD wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
I'm not sure I understand the question. The Linux kernel itself has no issues supporting hundreds of CPUs (either real or SMT).
Apparently it does have issues: http://www.conceivablytech.com/3166/science-research/current-operating-systems-may-only-make-sense-up-to-48-cores/
I saw this on /. yesterday, but didn't have time to read the actual paper (only the story linked above).
Nevertheless, talking about the kernel as a single blob is plain wrong. Different applications/workloads exercise different code paths in the kernel; some are better equipped to handle hundreds of cores, others aren't.
Claiming that you should "redesign" the kernel in order to handle huge SMP configurations (which do exist today, mind you) is irrelevant unless you point at specific parts of the kernel that should be redesigned (e.g. network? FS? scheduler? etc.), including the target workloads. (E.g. there's no point in optimizing the kernel for a 32-core desktop machine, at least not for now...)
Beyond that, at least in my own experience, if you're running an application that can actually *fully* utilize 32+ cores, your application is usually memory- or CPU-bound and not kernel-bound (so to speak).
... But then again, as I said, I only had time to read the story linked above and didn't have time to read the actual MIT paper.
On Fri, Oct 1, 2010 at 12:59 AM, JD jd1008@gmail.com wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
IBM has some good, if outdated, information here:
http://www.ibm.com/developerworks/library/l-linux-smp/
http://www.ibm.com/developerworks/linux/library/l-scheduler/
It might be a bit broad compared to what you said you're looking for, but could be a start for further reading.
Hope this helps
Sam
On 10/02/2010 11:11 AM, Samuel Kidman wrote:
On Fri, Oct 1, 2010 at 12:59 AM, JD <jd1008@gmail.com> wrote:
I was browsing for info on 12-core CPUs and found that AMD released them, or at least announced them, back in March. The price is steep, of course. What I would like to know is the degree of granularity of the SMP implementation in Linux. Does anyone have an inside track on that? Or can anyone point to some internal documentation?
IBM has some good, if outdated, information here:
http://www.ibm.com/developerworks/library/l-linux-smp/
http://www.ibm.com/developerworks/linux/library/l-scheduler/
It might be a bit broad compared to what you said you're looking for, but could be a start for further reading.
Hope this helps
Sam
Thank you, Sam. The articles are very helpful.
Cheers,
JD