On Sep 22, 2010, JD jd1008@gmail.com wrote:
On my notebook, which has an old 2.2 GHz single-core Athlon 64 (3700+), cpuinfo shows cpu MHz as 798.103
OK
Does that mean that as I am typing this message, the cpu is running at only 790MHz??
Approximately, yes. Your machine is also not discharging its battery quite so fast, nor is it generating more heat unnecessarily for the modest level of CPU activity you are now requesting of the machine.
How can I speed it up?
Ask the CPU to do more work. Recalculate a large spreadsheet. Spell-check a long document. Do a database lookup. Better yet, do them all at the same time. If your bandwidth, as opposed to the machine's, isn't interested in all that excitement, but you still want to exercise the processor more, find some program to run in the background while you do less compute-intensive tasks. For example, you could join the folding@home project:
http://en.wikipedia.org/wiki/Folding@home
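If you just want a throwaway load to watch the machine speed up, even a do-nothing hog will serve (a quick, untested sketch; run it niced so it stays out of your way):

nice -n 19 yes > /dev/null &   # keeps one core spinning at low priority
# kill %1 when you have seen enough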
Or you could just be content that your computer knows how to run in an idle mode instead of racing around at top speed when it doesn't have anything to compute at the moment (which is most of the time, usually).
One of the larger challenges of contemporary computer science is to figure out how to use, most efficiently and effectively, the multiple processor resources now more commonly available. Software has to be made aware of how to best use the newer hardware, and this is a non-trivial task.
Ken
On 09/22/2010 10:56 AM, Kenneth Marcy wrote:
On Sep 22, 2010, JD jd1008@gmail.com wrote:
On my notebook, which has an old 2.2 GHz single-core Athlon 64 (3700+), cpuinfo shows cpu MHz as 798.103
OK
Does that mean that as I am typing this message, the cpu is running at only 790MHz??
Approximately, yes. Your machine is also not discharging its battery quite so fast, nor is it generating more heat unnecessarily for the modest level of CPU activity you are now requesting of the machine.
How can I speed it up?
Ask the CPU to do more work. Recalculate a large spreadsheet. Spell-check a long document. Do a database lookup. Better yet, do them all at the same time. If your bandwidth, as opposed to the machine's, isn't interested in all that excitement, but you still want to exercise the processor more, find some program to run in the background while you do less compute-intensive tasks. For example, you could join the folding@home project:
I ran a super CPU hog: Celestia. CPU utilization reached 99.9% and stayed there. In a terminal window, I ran this shell loop:
while true; do
    cat /proc/cpuinfo | grep -i mhz
    sleep 3
done
The speed stayed at 790MHz.
I think there must be something wrong with speed-step, or somehow the BIOS does not update this value (I understand that cpuinfo is populated by calls to the BIOS).
I wish I could find a program that could actually test the cpu MHz by timing, in a loop, a complex set of instructions representative of the instruction mix used by apps and the kernel. I am not sure if such a program exists. The old "mips" calculation programs do not work on modern architectures.
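Something along these lines is roughly what I have in mind, though it only shows relative speed, not MHz (a rough, untested sketch; the loop body and iteration count are arbitrary):

busyloop() {
    local start end i
    start=$(date +%s%N)                  # nanoseconds (GNU date)
    for ((i = 0; i < 5000000; i++)); do :; done
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))  # elapsed milliseconds
}

echo "idle machine: $(busyloop) ms"
yes > /dev/null &                        # crude background hog
hogpid=$!
sleep 3                                  # give the governor time to react
echo "under load:  $(busyloop) ms"
kill $hogpid

If the governor ever ramps the clock up, the second run should take noticeably less time.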
http://en.wikipedia.org/wiki/Folding@home
Or you could just be content that your computer knows how to run in an idle mode instead of racing around at top speed when it doesn't have anything to compute at the moment (which is most of the time, usually).
One of the larger challenges of contemporary computer science is to figure out how to use, most efficiently and effectively, the multiple processor resources now more commonly available. Software has to be made aware of how to best use the newer hardware, and this is a non-trivial task.
Ken
On 22 September 2010 13:00, JD jd1008@gmail.com wrote:
I wish I could find a program that could actually test the cpu MHz
cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq
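If those files exist, you can also watch them change as you load the machine, and see which governor is making the decisions (assuming the usual cpufreq sysfs layout):

# current frequency of every logical CPU, refreshed each second
watch -n 1 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

# which governor decides the speed (e.g. ondemand, performance)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor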
On 09/22/2010 03:09 PM, suvayu ali wrote:
On 22 September 2010 13:00, JD jd1008@gmail.com wrote:
I wish I could find a program that could actually test the cpu MHz
cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq
$ cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq
cat: /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq: No such file or directory

$ find /sys -name cpufreq
/sys/devices/system/cpu/cpufreq

$ ls -l /sys/devices/system/cpu/cpufreq
total 0K

$ ls -l /sys/devices/system/cpu/cpu0
total 0K
drwxr-xr-x 5 root root 0 Sep 22 15:24 cache/
drwxr-xr-x 6 root root 0 Sep 22 15:38 cpuidle/

$ find /sys -name scaling*
$
On 22 September 2010 15:44, JD jd1008@gmail.com wrote:
On 09/22/2010 03:09 PM, suvayu ali wrote:
On 22 September 2010 13:00, JD jd1008@gmail.com wrote:
I wish I could find a program that could actually test the cpu MHz
cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq
$ cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq
cat: /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq: No such file or directory

$ find /sys -name cpufreq
/sys/devices/system/cpu/cpufreq

$ ls -l /sys/devices/system/cpu/cpufreq
total 0K

$ ls -l /sys/devices/system/cpu/cpu0
total 0K
drwxr-xr-x 5 root root 0 Sep 22 15:24 cache/
drwxr-xr-x 6 root root 0 Sep 22 15:38 cpuidle/

$ find /sys -name scaling*
$
After your response I tried to find those files on my work machine, which is rather old. They are not there. Maybe your cpu is an older model like my work machine and this is a newer feature... Just a guess. :-/
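It could also be that the frequency scaling driver simply isn't loaded. Worth checking (more guesswork on my part; powernow-k8 is the usual driver for single-core Athlon 64s):

$ lsmod | grep -iE 'powernow|cpufreq'
# if nothing shows up, try loading the driver by hand (as root):
# modprobe powernow-k8
# then look for the scaling_* files under /sys again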
JD wrote:
On 09/22/2010 10:56 AM, Kenneth Marcy wrote:
On Sep 22, 2010, JD jd1008@gmail.com wrote:
On my notebook, which has an old 2.2 GHz single-core Athlon 64 (3700+), cpuinfo shows cpu MHz as 798.103
OK
Does that mean that as I am typing this message, the cpu is running at only 790MHz??
Approximately, yes. Your machine is also not discharging its battery quite so fast, nor is it generating more heat unnecessarily for the modest level of CPU activity you are now requesting of the machine.
How can I speed it up?
Ask the CPU to do more work. Recalculate a large spreadsheet. Spell-check a long document. Do a database lookup. Better yet, do them all at the same time. If your bandwidth, as opposed to the machine's, isn't interested in all that excitement, but you still want to exercise the processor more, find some program to run in the background while you do less compute-intensive tasks. For example, you could join the folding@home project:
I ran a super CPU hog: Celestia. CPU utilization reached 99.9% and stayed there. In a terminal window, I ran this shell loop:
while true; do
    cat /proc/cpuinfo | grep -i mhz
    sleep 3
done
The speed stayed at 790MHz.
I think there must be something wrong with speed-step, or somehow the BIOS does not update this value (I understand that cpuinfo is populated by calls to the BIOS).
I wish I could find a program that could actually test the cpu MHz by timing, in a loop, a complex set of instructions representative of the instruction mix used by apps and the kernel. I am not sure if such a program exists. The old "mips" calculation programs do not work on modern architectures.
http://en.wikipedia.org/wiki/Folding@home
Or you could just be content that your computer knows how to run in an idle mode instead of racing around at top speed when it doesn't have anything to compute at the moment (which is most of the time, usually).
One of the larger challenges of contemporary computer science is to figure out how to use, most efficiently and effectively, the multiple processor resources now more commonly available. Software has to be made aware of how to best use the newer hardware, and this is a non-trivial task.
Ken
In the BIOS, check your C2 or C3 states and make sure they are disabled. Cool'n'Quiet also needs to be turned off.
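You can get much the same effect from inside Linux without rebooting, if the cpufreq sysfs files are there (a rough, untested sketch; needs root, and the performance governor simply pins the clock at top speed):

# as root: force every core to run at its top speed
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done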
I hate that feature, as my rig is at 100% load on all 4 cores 24/7 (SETI@home). One very nice thing about this processor (Phenom II 965 @ 3.6) is that I do not even notice it is under 100% load all the time.
I can't wait to see the Bulldozer series in action (16 cores, Hyperthreaded). Yeah baby..........
Michael Miles wrote:
I can't wait to see the Bulldozer series in action (16 cores, Hyperthreaded). Yeah baby..........
Unfortunately, Bulldozer doesn’t do conventional SMT (which is what Intel usually¹ means by hyperthreading). It has two integer cores sharing a wide floating point engine and level 2 cache. This combination is what AMD call a “module”, but they will be selling it as two cores.
So a 16 core Bulldozer will have 16 hardware threads.
A module takes more power and area than a traditional core with hyperthreading, but you should get more performance out of it, too.
Sorry,
James.
¹ Hyperthreading is an Intel trademark, and, as such, means precisely what Intel wants it to mean at the moment. This can change (it means something different for the Itanium).
James Wilkinson wrote:
Michael Miles wrote:
I can't wait to see the Bulldozer series in action (16 cores, Hyperthreaded). Yeah baby..........
Unfortunately, Bulldozer doesn’t do conventional SMT (which is what Intel usually¹ means by hyperthreading). It has two integer cores sharing a wide floating point engine and level 2 cache. This combination is what AMD call a “module”, but they will be selling it as two cores.
So a 16 core Bulldozer will have 16 hardware threads.
A module takes more power and area than a traditional core with hyperthreading, but you should get more performance out of it, too.
Sorry,
James.
¹ Hyperthreading is an Intel trademark, and, as such, means precisely what Intel wants it to mean at the moment. This can change (it means something different for the Itanium).
Thanks for the clear-up. My question is about Hyperthreading: if each core does double duty, so to speak, by looking after two threads, would it not do basically the same work as one core running full bore on one thread? Is there a speed difference (faster or slower)?
It seems to me that two threads time-share one core, thus making the workload the same as if one core were doing one task but at twice the speed.
It is confusing why they would have Hyperthreading there. An i7 920 with 4 cores does the same amount of work as the same chip showing 8 cores with Hyperthreading active.
Michael
Michael Miles wrote:
Thanks for the clear-up. My question is about Hyperthreading: if each core does double duty, so to speak, by looking after two threads, would it not do basically the same work as one core running full bore on one thread? Is there a speed difference (faster or slower)?
Good question. The answer is “it depends, but it’s usually faster”.
Reasons why it can be faster:
* Most modern processors can despatch up to three or four instructions at a time (IF the front end can identify enough instructions that logically can be run at the same time), but will have six to ten execution units to actually run the instructions¹. Therefore, one thread might be able to make use of execution units the other thread isn’t using.
* Compared to CPU speed, it takes a seriously long time to get data from main memory. If one thread is waiting for data to arrive, the other one can make full use of the processor.
* Most modern CPUs do out-of-order execution, which means they can often find things to do while waiting for data to come from (L2/L3) cache. That’s not guaranteed, though, so the other thread might get more resources to play with.
On the other hand, Atom isn’t out-of-order, and can’t do anything while it’s waiting for data from Level 2 cache. So the other thread has full run of the core.
Why it can be slower:
* The cache memory is having to look after two sets of data, not just one, which means there’ll be a lot more cache misses. The worst case example would be something like two threads, each of which is regularly hitting a different 6K of data, on a Pentium 4 with only 8K of Level 1 data cache. Each thread will be constantly replacing the other’s data, meaning each thread is continually having to wait for data from Level 2 cache.
This effect was especially noticeable on Pentium 4-based CPUs: a lot of high-end benchmarks would be run with SMT turned off.
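If you want to check which logical CPUs are actually SMT siblings on a given box, sysfs will tell you (assuming your kernel exposes the topology files):

# each line lists the logical CPUs that share one physical core
grep -H . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list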
Hope this helps,
James.
¹ The instruction units are specialised: if a thread is 100% integer, the FPU units won’t be of any use to it.
On 09/25/2010 12:35 PM, James Wilkinson wrote:
Michael Miles wrote:
Thanks for the clear-up. My question is about Hyperthreading: if each core does double duty, so to speak, by looking after two threads, would it not do basically the same work as one core running full bore on one thread? Is there a speed difference (faster or slower)?
Good question. The answer is “it depends, but it’s usually faster”.
Reasons why it can be faster:
Most modern processors can despatch up to three or four instructions at a time (IF the front end can identify enough instructions that logically can be run at the same time), but will have six to ten execution units to actually run the instructions¹. Therefore, one thread might be able to make use of execution units the other thread isn’t using.
Compared to CPU speed, it takes a seriously long time to get data from main memory. If one thread is waiting for data to arrive, the other one can make full use of the processor.
Most modern CPUs do out-of-order execution, which means they can often find things to do while waiting for data to come from (L2/L3) cache. That’s not guaranteed, though, so the other thread might get more resources to play with.
On the other hand, Atom isn’t out-of-order, and can’t do anything while it’s waiting for data from Level 2 cache. So the other thread has full run of the core.
Why it can be slower:
- The cache memory is having to look after two sets of data, not just one, which means there’ll be a lot more cache misses. The worst case example would be something like two threads, each of which is regularly hitting a different 6K of data, on a Pentium 4 with only 8K of Level 1 data cache. Each thread will be constantly replacing the other’s data, meaning each thread is continually having to wait for data from Level 2 cache.
This effect was especially noticeable on Pentium 4-based CPUs: a lot of high-end benchmarks would be run with SMT turned off.
Hope this helps,
James.
¹ The instruction units are specialised: if a thread is 100% integer, the FPU units won’t be of any use to it.
Correct, James. The clobbering of the cache by two different threads does not depend on whether or not the CPU is hyperthreaded. Any two threads can cause this clobbering on any CPU, and it often happens. The only situation where hyperthreading will show a noticeable improvement in execution speed is where the threads are all children of the same process, are well behaved, and work almost entirely on the parent process' data space, with proper synchronization. However, if the parent's data and text space is larger than the cache, then the sibling threads can still cause cache refills every time a sibling accesses a different part of the data space than the other siblings. Ditto for the instruction cache: different threads have different sets of instructions.
My basic attitude is: forget hyperthreading. IMHO it is largely hype!
JD wrote:
On 09/25/2010 12:35 PM, James Wilkinson wrote:
Michael Miles wrote:
Thanks for the clear-up. My question is about Hyperthreading: if each core does double duty, so to speak, by looking after two threads, would it not do basically the same work as one core running full bore on one thread? Is there a speed difference (faster or slower)?
Good question. The answer is “it depends, but it’s usually faster”.
Reasons why it can be faster:
Most modern processors can despatch up to three or four instructions at a time (IF the front end can identify enough instructions that logically can be run at the same time), but will have six to ten execution units to actually run the instructions¹. Therefore, one thread might be able to make use of execution units the other thread isn’t using.
Compared to CPU speed, it takes a seriously long time to get data from main memory. If one thread is waiting for data to arrive, the other one can make full use of the processor.
Most modern CPUs do out-of-order execution, which means they can often find things to do while waiting for data to come from (L2/L3) cache. That’s not guaranteed, though, so the other thread might get more resources to play with.
On the other hand, Atom isn’t out-of-order, and can’t do anything while it’s waiting for data from Level 2 cache. So the other thread has full run of the core.
Why it can be slower:
- The cache memory is having to look after two sets of data, not just one, which means there’ll be a lot more cache misses. The worst case example would be something like two threads, each of which is regularly hitting a different 6K of data, on a Pentium 4 with only 8K of Level 1 data cache. Each thread will be constantly replacing the other’s data, meaning each thread is continually having to wait for data from Level 2 cache.
This effect was especially noticeable on Pentium 4-based CPUs: a lot of high-end benchmarks would be run with SMT turned off.
Hope this helps,
James.
¹ The instruction units are specialised: if a thread is 100% integer, the FPU units won’t be of any use to it.
Correct, James. The clobbering of the cache by two different threads does not depend on whether or not the CPU is hyperthreaded. Any two threads can cause this clobbering on any CPU, and it often happens. The only situation where hyperthreading will show a noticeable improvement in execution speed is where the threads are all children of the same process, are well behaved, and work almost entirely on the parent process' data space, with proper synchronization. However, if the parent's data and text space is larger than the cache, then the sibling threads can still cause cache refills every time a sibling accesses a different part of the data space than the other siblings. Ditto for the instruction cache: different threads have different sets of instructions.
My basic attitude is: forget hyperthreading. IMHO it is largely hype!
Thanks for the explanation!!!
One more question that I am a bit confused about: if I run Hardware Lister (lshw), it tells me my Phenom II 965 is Hyperthreaded:
product: AMD Phenom(tm) II X4 965 Processor
vendor: Advanced Micro Devices [AMD]
bus info: cpu@0
version: AMD Phenom(tm) II X4 965 Processor
serial: To Be Filled By O.E.M.
slot: AM2
size: 3600MHz
capacity: 3600MHz
width: 64 bits
clock: 200MHz
capabilities: mathematical co-processor, FPU exceptions reporting, wp, virtual mode extensions, debugging extensions, page size extensions, time stamp counter, model-specific registers, 4GB+ memory addressing (Physical Address Extension), machine check exceptions, compare and exchange 8-byte, on-chip advanced programmable interrupt controller (APIC), memory type range registers, page global enable, machine check architecture, conditional move instruction, page attribute table, 36-bit page size extensions, clflush, multimedia extensions (MMX), fast floating point save/restore, streaming SIMD extensions (SSE), streaming SIMD extensions (SSE2), HyperThreading, fast system calls, no-execute bit (NX), multimedia extensions (MMXExt), fxsr_opt, pdpe1gb, rdtscp, 64bits extensions (x86-64), multimedia extensions (3DNow!Ext), multimedia extensions (3DNow!), constant_tsc, rep_good, nonstop_tsc, extd_apicid, pni, monitor, cx16, popcnt, lahf_lm, cmp_legacy, svm, extapic, cr8_legacy, abm, sse4a, misalignsse, 3dnowprefetch, osvw, ibs, skinit, wdt
So is this true and can it be turned on?
Michael
JD wrote:
Correct, James. The clobbering of the cache by two different threads does not depend on whether or not the CPU is hyperthreaded. Any two threads can cause this clobbering on any CPU, and it often happens.
This last sentence is true, but with normal multitasking, and no multi-threading, each software thread gets a slice of the processor time to itself – usually several million clock cycles, these days¹. So the thread has a chance to fill the level 1 cache with its own data before another thread gets a look in. With multi-threading, each thread is *constantly* clobbering the other’s data.
The only situation where hyperthreading will show a noticeable improvement in execution speed is where the threads are all children of the same process, are well behaved, and work almost entirely on the parent process' data space, with proper synchronization. However, if the parent's data and text space is larger than the cache, then the sibling threads can still cause cache refills every time a sibling accesses a different part of the data space than the other siblings. Ditto for the instruction cache: different threads have different sets of instructions.
This does not appear to match reality for all processors.
The Pentium 4 was both the first generally-available processor with multi-threading available, and a pretty poor example of multi-threading. So a lot of people got a poor first impression.
Even there, there were other cases when multi-threading made a lot of sense: if, for example, the algorithm was such that you’re going to get mostly cache misses *anyway*, then you might as well have two threads hanging around waiting for data as one.
Other processors (current Core i7 and i5, for example) tend not to have such a microscopic Level 1 cache, so there’s more chance for both working sets to fit in cache at the same time.²
http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=8900... (and following thread) gives a link to an Intel benchmark claiming a 50%+ performance improvement due to hyperthreading on Atom. Linus Torvalds³ effectively says “it’s easy to get 50% performance improvements if the CPU can’t make good use of all its resources with just one thread.”
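If you want a feel for how big the effect is on your own machine, you can time the same two workers pinned onto sibling threads and then onto a single logical CPU (an untested sketch; the CPU numbers are an assumption, so check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list first):

# assumes cpu0 and cpu1 are SMT siblings
work() { local i; for ((i = 0; i < 20000000; i++)); do :; done; }

echo "two workers, one on each SMT sibling:"
time {
    taskset -c 0 bash -c "$(declare -f work); work" &
    taskset -c 1 bash -c "$(declare -f work); work" &
    wait
}

echo "two workers sharing one logical CPU:"
time {
    taskset -c 0 bash -c "$(declare -f work); work" &
    taskset -c 0 bash -c "$(declare -f work); work" &
    wait
}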
I’d note, too, that Bulldozer’s FPU is effectively multi-threading, and that doesn’t use Level 1 data cache *at all*: the data all comes from Level 2. AMD apparently believes they can get enough out-of-order re-ordering to hide the latency.
My basic attitude is: forget hyperthreading. IMHO it is largely hype!
You know, I’d actually agree with that on the desktop⁴ – but for different reasons. The number of hardware threads has mushroomed over the last ten years, but desktop software is still largely single-threaded. It’s still fairly rare for there to be a situation where desktop software can make efficient use of six or eight threads. The main exceptions are things like transcoding and compression – and few people buy desktops to do that – and compiling large software projects, like the Linux kernel.
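(A kernel build is about the only desktop job that will happily use every hardware thread, e.g. something like

make -j$(getconf _NPROCESSORS_ONLN)   # one make job per hardware thread

for building the kernel.)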
Personally, I prefer to let the Fedora Project do most of that for me!
Hope this helps,
James.
¹ IF the thread needs it.
² You don’t need the entire program in cache, just the bits that the program is currently using.
³ As far as we can tell, yes, *that* Linus. He certainly has the same use of language, the same arguing style, and knows stuff the real Linus would.
⁴ Servers often do have enough software threads to make use of all the hardware threads they can get – see Sun’s Niagara for an example. And single-core Atoms benefit from hyperthreading to improve latency.