Are all cores unlocked?
Michael Miles
mmamiga6 at gmail.com
Sat Sep 25 23:02:27 UTC 2010
JD wrote:
>
> On 09/25/2010 12:35 PM, James Wilkinson wrote:
>
>> Michael Miles wrote:
>>
>>> Thanks for clearing that up. My question about Hyperthreading is this: if
>>> each core does double duty, so to speak, by looking after two threads,
>>> would it not do basically the same work as one core running full bore on
>>> one thread? Is there a speed difference (faster or slower)?
>>>
>> Good question. The answer is “it depends, but it’s usually faster”.
>>
>> Reasons why it can be faster:
>> * Most modern processors can despatch up to three or four instructions
>> at a time (IF the front end can identify enough instructions that
>> logically can be run at the same time), but will have six to ten
>> execution units to actually run the instructions¹. Therefore, one
>> thread might be able to make use of execution units the other thread
>> isn’t using.
>>
>> * Compared to CPU speed, it takes a seriously long time to get data
>> from main memory. If one thread is waiting for data to arrive, the
>> other one can make full use of the processor.
>>
>> * Most modern CPUs do out-of-order execution, which means they can
>> often find things to do while waiting for data to come from (L2/L3)
>> cache. That’s not guaranteed, though, so the other thread might get
>> more resources to play with.
>>
>> On the other hand, Atom isn’t out-of-order, and can’t do anything
>> while it’s waiting for data from Level 2 cache. So the other thread
>> has full run of the core.
>>
>> Why it can be slower:
>> * The cache memory is having to look after two sets of data, not just
>> one, which means there’ll be a lot more cache misses. The worst case
>> example would be something like two threads, each of which is
>> regularly hitting a different 6K of data, on a Pentium 4 with only 8K
>> Level 1 data cache. Each thread will be constantly replacing the
>> other’s data, meaning each thread is continually having to wait for
>> data from Level 2 cache.
>>
>> This effect was especially noticeable on Pentium 4-based CPUs: a lot of
>> high-end benchmarks would be run with SMT turned off.
>>
>> Hope this helps,
>>
>> James.
>>
>> ¹ The instruction units are specialised: if a thread is 100% integer,
>> the FPU units won’t be of any use to it.
>>
>>
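
As a rough illustration of the worst case James describes above (two threads, each regularly hitting its own 6K of data, sharing an 8K Level 1 data cache), here is a minimal Python sketch. The 64-byte line size, the fully associative LRU replacement, the strict one-for-one interleaving of the two threads, and all of the function names are assumptions made for the illustration; it is not a model of any real Pentium 4.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


def count_misses(accesses, cache_bytes, line_size=LINE_SIZE):
    """Count misses in a fully associative LRU cache (a simplification)."""
    capacity = cache_bytes // line_size
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for addr in accesses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def working_set(base, size_bytes, passes, line_size=LINE_SIZE):
    """Addresses for sweeping a region repeatedly, one touch per cache line."""
    one_pass = list(range(base, base + size_bytes, line_size))
    return one_pass * passes


def interleave(a, b):
    """Alternate accesses from two threads: a0, b0, a1, b1, ..."""
    merged = []
    for x, y in zip(a, b):
        merged.append(x)
        merged.append(y)
    return merged


def demo(passes=100):
    l1_bytes = 8 * 1024   # 8K Level 1 data cache, as in the example above
    ws_bytes = 6 * 1024   # each thread's 6K of data
    t0 = working_set(0, ws_bytes, passes)
    t1 = working_set(1 << 20, ws_bytes, passes)  # a separate, hypothetical region
    alone = count_misses(t0, l1_bytes)
    shared = count_misses(interleave(t0, t1), l1_bytes)
    return alone, len(t0), shared, len(t0) + len(t1)


if __name__ == "__main__":
    alone, n_alone, shared, n_shared = demo()
    print("one thread alone:     ", alone, "misses in", n_alone, "accesses")
    print("two threads, one core:", shared, "misses in", n_shared, "accesses")
```

Under these assumptions a lone thread misses only on its first pass over the data, while the interleaved pair misses on every access, because 12K of combined data keeps cycling through an 8K cache. Real caches are set associative and real threads do not interleave this neatly, so treat the numbers as the shape of the effect rather than a measurement.
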
> Correct, James. The clobbering of the cache by two different threads
> does not depend on whether or not the CPU is hyperthreaded.
> Any two threads can cause this clobbering on any CPU, and it is
> often the case.
> The only situation where hyperthreading will show noticeable
> improvement of execution speed is where the threads are all
> children of the same process and are well behaved and work
> almost entirely on the parent process' data space, with proper
> synchronization. However, if the parent data space and text
> space is larger than the cache, then the sibling threads can
> still cause cache refill every time a sibling accesses a different
> data space than other siblings. Ditto with the instruction cache.
> Different threads have a different set of instructions.
>
> My basic attitude is: forget hyperthreading. IMHO it is largely
> hype!
>
>
>
Thanks for the explanation!
One more question that I am a bit confused about:
if I run Hardware Lister (lshw), it tells me my Phenom II X4 965 is Hyperthreaded:
product: AMD Phenom(tm) II X4 965 Processor
vendor: Advanced Micro Devices [AMD]
bus info: cpu@0
version: AMD Phenom(tm) II X4 965 Processor
serial: To Be Filled By O.E.M.
slot: AM2
size: 3600MHz
capacity: 3600MHz
width: 64 bits
clock: 200MHz
capabilities:
mathematical co-processor,
FPU exceptions reporting,
wp,
virtual mode extensions,
debugging extensions,
page size extensions,
time stamp counter,
model-specific registers,
4GB+ memory addressing (Physical Address Extension),
machine check exceptions,
compare and exchange 8-byte,
on-chip advanced programmable interrupt controller (APIC),
memory type range registers,
page global enable,
machine check architecture,
conditional move instruction,
page attribute table,
36-bit page size extensions,
clflush,
multimedia extensions (MMX),
fast floating point save/restore,
streaming SIMD extensions (SSE),
streaming SIMD extensions (SSE2),
HyperThreading,
fast system calls,
no-execute bit (NX),
multimedia extensions (MMXExt),
fxsr_opt,
pdpe1gb,
rdtscp,
64bits extensions (x86-64),
multimedia extensions (3DNow!Ext),
multimedia extensions (3DNow!),
constant_tsc,
rep_good,
nonstop_tsc,
extd_apicid,
pni,
monitor,
cx16,
popcnt,
lahf_lm,
cmp_legacy,
svm,
extapic,
cr8_legacy,
abm,
sse4a,
misalignsse,
3dnowprefetch,
osvw,
ibs,
skinit,
wdt
So is this true and can it be turned on?
Michael
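
A note on the question above: the Phenom II does not implement SMT, so there is nothing to turn on. As far as I know, the HyperThreading entry in lshw reflects the CPUID "ht" bit, which on a multi-core chip can simply mean "more than one logical processor per package"; the cmp_legacy capability in the same list points the same way. One hedged way to check on Linux is to compare the "siblings" and "cpu cores" fields in /proc/cpuinfo: if siblings is larger than cpu cores, each core is running more than one hardware thread. The sketch below does that comparison in Python. The function names and the sample text are made up for illustration and are not the actual output from this machine.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the block for one logical CPU.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def smt_active(text):
    """Return True if any logical CPU reports more siblings than cores."""
    for cpu in parse_cpuinfo(text):
        try:
            siblings = int(cpu["siblings"])
            cores = int(cpu["cpu cores"])
        except (KeyError, ValueError):
            continue  # field missing or not numeric; skip this block
        if siblings > cores:
            return True
    return False


# Hypothetical excerpt for a quad-core chip without SMT (illustration only).
SAMPLE_NO_SMT = """\
processor : 0
model name : Example quad-core CPU
siblings : 4
cpu cores : 4

processor : 1
model name : Example quad-core CPU
siblings : 4
cpu cores : 4
"""


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_NO_SMT  # fall back to the illustrative sample
    print("SMT active:", smt_active(text))
```

On a four-core part with no SMT, both fields would be expected to read 4, and the check reports False even though the ht flag is present.
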