Are all cores unlocked?

Sat Sep 25 23:02:27 UTC 2010

JD wrote:
>
> On 09/25/2010 12:35 PM, James Wilkinson wrote:
>    
>> Michael Miles wrote:
>>      
>>> Thanks for clearing that up. My question is about Hyperthreading: if
>>> each core does double duty, so to speak, by looking after two threads,
>>> wouldn't it do basically the same work as one core running full bore on
>>> one thread? Is there a speed difference (faster or slower)?
>>>        
>> Good question. The answer is “it depends, but it’s usually faster”.
>>
>> Reasons why it can be faster:
>>    * Most modern processors can despatch up to three or four instructions
>>      at a time (IF the front end can identify enough instructions that
>>      logically can be run at the same time), but will have six to ten
>>      execution units to actually run the instructions¹. Therefore, one
>>      thread might be able to make use of execution units the other thread
>>      isn’t using.
>>
>>    * Compared to CPU speed, it takes a seriously long time to get data
>>      from main memory. If one thread is waiting for data to arrive, the
>>      other one can make full use of the processor.
>>
>>    * Most modern CPUs do out-of-order execution, which means they can
>>      often find other work to do while waiting for data to arrive from
>>      L2/L3 cache. That's not guaranteed, though; when they can't, the
>>      other thread gets more resources to play with.
>>
>>      On the other hand, Atom isn’t out-of-order, and can’t do anything
>>      while it’s waiting for data from Level 2 cache. So the other thread
>>      has full run of the core.
>>
>> Why it can be slower:
>>    * The cache memory has to look after two sets of data, not just
>>      one, which can mean many more cache misses. The worst-case
>>      example would be something like two threads, each of which is
>>      regularly hitting a different 6K of data, on a Pentium 4 with only 8K
>>      of Level 1 data cache. Together they need 12K, so each thread will be
>>      constantly replacing the other's data, meaning each thread is
>>      continually having to wait for data from Level 2 cache.
>>
>> This effect was especially noticeable on Pentium 4-based CPUs: a lot of
>> high-end benchmarks would be run with SMT (simultaneous multithreading,
>> i.e. Hyperthreading) turned off.
>>
>> Hope this helps,
>>
>> James.
>>
>> ¹ The execution units are specialised: if a thread is 100% integer,
>> the FPU units won't be of any use to it.
>>
>>      
> Correct, James. The clobbering of the cache by 2 different threads
> does not depend on whether or not the CPU is hyperthreaded.
> Any two threads can achieve this clobbering on any CPU, and it is
> often the case.
> The only situation where hyperthreading will show a noticeable
> improvement in execution speed is where the threads are all
> children of the same process, are well behaved, and work
> almost entirely on the parent process's data space, with proper
> synchronization. However, if the parent's data and text
> space is larger than the cache, then the sibling threads can
> still cause cache refills every time a sibling accesses a different
> part of the data space than the other siblings. Ditto with the
> instruction cache: different threads execute different sets of
> instructions.
>
> My basic attitude is: forget hyperthreading. IMHO it is largely
> hype!
>
>
>    
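James's Pentium 4 example can be made concrete with a toy model. This is a simplification, not a model of the real chip: it assumes a fully associative LRU cache with 64-byte lines (the real L1 is set-associative) and two threads whose accesses interleave perfectly. With one thread, the 6K working set (96 lines) fits in the 8K cache (128 lines), so only the first pass misses. With two threads, the combined 12K (192 lines) cycles through a 128-line LRU cache, and every access finds its line already evicted.

```python
from collections import OrderedDict

LINE_SIZE = 64          # bytes per cache line (assumed for this model)
CACHE_SIZE = 8 * 1024   # 8K L1 data cache, as in the Pentium 4 example
WORKING_SET = 6 * 1024  # each thread's hot data

class LRUCache:
    """Fully associative LRU cache (simplified: real L1 caches are set-associative)."""

    def __init__(self, size_bytes, line_size):
        self.capacity = size_bytes // line_size
        self.lines = OrderedDict()  # line address -> present, oldest first
        self.misses = 0
        self.accesses = 0

    def access(self, line_addr):
        self.accesses += 1
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[line_addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def run(threads, passes=10):
    """Interleave `threads` threads, each sweeping its own 6K working set."""
    cache = LRUCache(CACHE_SIZE, LINE_SIZE)
    n_lines = WORKING_SET // LINE_SIZE
    for _ in range(passes):
        for i in range(n_lines):
            for t in range(threads):
                base = t * 1_000_000  # each thread's data in a separate range
                cache.access(base + i * LINE_SIZE)
    return cache.misses / cache.accesses

print(f"one thread:  {run(1):.1%} miss rate")
print(f"two threads: {run(2):.1%} miss rate")
```

Under these assumptions it prints a 10.0% miss rate for one thread (96 compulsory misses out of 960 accesses) and 100.0% for two. Real hardware would land somewhere less extreme, but the direction matches James's description.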

Thanks for the explanation!!!

One more question that I am a bit confused about: if I run Hardware
Lister (lshw), it tells me my Phenom II 965 is Hyperthreaded:

product: AMD Phenom(tm) II X4 965 Processor
vendor: Advanced Micro Devices [AMD]
bus info: cpu@0
version: AMD Phenom(tm) II X4 965 Processor
serial: To Be Filled By O.E.M.
slot: AM2
size: 3600MHz
capacity: 3600MHz
width: 64 bits
clock: 200MHz
capabilities:
     mathematical co-processor,
     FPU exceptions reporting,
     wp,
     virtual mode extensions,
     debugging extensions,
     page size extensions,
     time stamp counter,
     model-specific registers,
     4GB+ memory addressing (Physical Address Extension),
     machine check exceptions,
     compare and exchange 8-byte,
     on-chip advanced programmable interrupt controller (APIC),
     memory type range registers,
     page global enable,
     machine check architecture,
     conditional move instruction,
     page attribute table,
     36-bit page size extensions,
     clflush,
     multimedia extensions (MMX),
     fast floating point save/restore,
     streaming SIMD extensions (SSE),
     streaming SIMD extensions (SSE2),
     HyperThreading,
     fast system calls,
     no-execute bit (NX),
     multimedia extensions (MMXExt),
     fxsr_opt,
     pdpe1gb,
     rdtscp,
     64bits extensions (x86-64),
     multimedia extensions (3DNow!Ext),
     multimedia extensions (3DNow!),
     constant_tsc,
     rep_good,
     nonstop_tsc,
     extd_apicid,
     pni,
     monitor,
     cx16,
     popcnt,
     lahf_lm,
     cmp_legacy,
     svm,
     extapic,
     cr8_legacy,
     abm,
     sse4a,
     misalignsse,
     3dnowprefetch,
     osvw,
     ibs,
     skinit,
     wdt

So is this true, and can it be turned on?

Michael
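lshw appears to build its capability list from the CPU flags the kernel reports, and the `ht` flag (shown as "HyperThreading") comes from a CPUID bit that, as far as I know, is also set on multi-core processors that have no SMT. So the flag alone may not settle the question. A more direct check on Linux is to compare two fields in /proc/cpuinfo: `siblings` (logical CPUs per physical package) and `cpu cores` (cores per package). If they are equal, each core runs a single thread. Below is a minimal sketch of that comparison. The function name `smt_status` is hypothetical, and the sketch only reads the first processor entry.

```python
import os

def smt_status(cpuinfo_text):
    """Return (siblings, cpu_cores) from the first processor block of
    /proc/cpuinfo-style text. Missing fields default to 1."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    siblings = int(fields.get("siblings", 1))
    cores = int(fields.get("cpu cores", 1))
    return siblings, cores

if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            siblings, cores = smt_status(f.read())
        if siblings > cores:
            print(f"SMT appears active: {siblings} logical CPUs on {cores} cores")
        else:
            print(f"No SMT: {siblings} logical CPUs on {cores} cores")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

On a chip without SMT the two numbers match. On an Intel chip with Hyperthreading enabled, `siblings` is typically twice `cpu cores`.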