Hyperthreading and multi-threading [Are all cores unlocked?]

James Wilkinson fedora at aprilcottage.co.uk
Mon Sep 27 20:39:31 UTC 2010


JD wrote:
> Correct James. The clobbering of the cache by 2 different threads
> does not depend on whether or not the cpu is hyperthreaded.
> Any two threads can achieve this clobbering on any cpu, and it is
> often the case.

This last sentence is true, but with normal multitasking and no hardware
multi-threading, each software thread gets a slice of the processor time
to itself – usually several million clock cycles, these days¹. So the
thread has a chance to fill the Level 1 cache with its own data before
another thread gets a look in. With hardware multi-threading, two
threads share the core at the same time, so each is *constantly*
clobbering the other’s cached data.
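
To make that clobbering concrete, here’s a rough sketch (mine, not from
any real benchmark; the buffer sizes and the 32 KiB L1 figure are
assumptions): two threads, each walking its own ~24 KiB buffer. Pin them
to two hyperthreads that share one L1 data cache (taskset -c with the
sibling CPU numbers from /proc/cpuinfo) and the combined 48 KiB working
set keeps evicting the other thread’s lines; pin them to separate
physical cores and each buffer fits in its own L1.

/* Sketch only: two threads whose combined working set (48 KiB) overflows
 * a typical 32 KiB L1 data cache when they have to share one. */
#include <pthread.h>
#include <stdlib.h>

#define WORKING_SET (24 * 1024)   /* bytes per thread */
#define PASSES      200000

static void *walker(void *arg)
{
    volatile char *buf = arg;
    unsigned long sum = 0;
    int pass, i;

    for (pass = 0; pass < PASSES; pass++)
        for (i = 0; i < WORKING_SET; i += 64)   /* one read per cache line */
            sum += buf[i];

    (void)sum;
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    char *buf_a = calloc(1, WORKING_SET);
    char *buf_b = calloc(1, WORKING_SET);

    if (!buf_a || !buf_b)
        return 1;

    pthread_create(&a, NULL, walker, buf_a);
    pthread_create(&b, NULL, walker, buf_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    free(buf_a);
    free(buf_b);
    return 0;
}

Time it both ways (gcc -O2 -pthread) and the difference is the
clobbering.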

> The only situation where hyperthreading will show noticeable
> improvement of execution speed is where the threads are all
> children of the same process and are well behaved and work
> almost entirely on the parent process' data space, with proper
> synchronization. However, if the parent data space and text
> space is larger than the cache,  then the sibling threads can
> still cause cache refill every time a sibling accesses a different
> data space than other siblings. Ditto with the instruction cache.
> Different threads have a different set of instructions.

This does not appear to match reality for all processors.

The Pentium 4 was both the first generally available processor with
hardware multi-threading and a pretty poor example of it. So a lot of
people got a poor first impression.

Even there, there were other cases where multi-threading made a lot of
sense: if, for example, the algorithm is such that you’re going to get
mostly cache misses *anyway*, then you might as well have two threads
hanging around waiting for data as one.
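
As a sketch of that case (again mine, with made-up sizes): a random
pointer chase over a buffer far bigger than any cache, so nearly every
load goes all the way to memory and the core spends most of its time
stalled. Run two copies pinned to the two hyperthreads of one core and
they tend to finish in not much more wall time than one copy alone,
because each chases its own chain while the other is waiting on a miss.

/* Sketch: a latency-bound pointer chase; nearly every load misses to
 * DRAM.  NODES is an assumption, just "much bigger than any cache". */
#include <stdio.h>
#include <stdlib.h>

#define NODES (64 * 1024 * 1024)   /* 64M entries = 256 MiB of indices */
#define STEPS (20 * 1000 * 1000)

int main(void)
{
    unsigned *next = malloc(NODES * sizeof *next);
    unsigned i, j, tmp, idx = 0;
    unsigned long step;

    if (!next)
        return 1;

    /* Build one big random cycle (Sattolo's shuffle) so the chase
     * wanders over the whole buffer with no useful locality. */
    for (i = 0; i < NODES; i++)
        next[i] = i;
    for (i = NODES - 1; i > 0; i--) {
        j = rand() % i;
        tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    /* Each load depends on the previous one, so this one thread has no
     * way to hide the miss latency by itself. */
    for (step = 0; step < STEPS; step++)
        idx = next[idx];

    printf("%u\n", idx);
    free(next);
    return 0;
}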

Other processors (current Core i7 and i5, for example) tend not to have
such a microscopic Level 1 cache, so there’s more chance for both
working sets to fit in cache at the same time.²

http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=89001&threadid=89001&roomid=2
(and following thread) gives a link to an Intel benchmark claiming a
50%+ performance improvement due to hyperthreading on Atom. Linus
Torvalds³ effectively says “it’s easy to get 50% performance
improvements if the CPU can’t make good use of all its resources with
just one thread.”

I’d note, too, that Bulldozer’s FPU (shared between the two cores of a
module) is effectively multi-threaded, and that it doesn’t use the Level
1 data cache *at all*: its data all comes from Level 2. AMD apparently
believes it can get enough out-of-order execution to hide the latency.

> My basic attitude is forget hyperthreading. IMHO it is largely
> a hype!

You know, I’d actually agree with that on the desktop⁴ – but for
different reasons. The number of hardware threads has mushroomed over
the last ten years, but desktop software is still largely
single-threaded. It’s still fairly rare for desktop software to be able
to make efficient use of six or eight threads.
The main exceptions are things like transcoding and compression – and
few people buy desktops to do that – and compiling large software
projects, like the Linux kernel.
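
(For what it’s worth, the thread count a parallel build can actually use
is just the number of logical CPUs the kernel exposes. A trivial check –
_SC_NPROCESSORS_ONLN is a common but non-POSIX extension, so treat this
as Linux/glibc-specific:)

/* How many logical CPUs (hardware threads, SMT siblings included) the
 * kernel reports online. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("%ld hardware threads online\n", n);
    return 0;
}

That’s the figure you’d pass to make -j for a kernel build.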

Personally, I prefer to let the Fedora Project do most of that for me!

Hope this helps,

James.

¹ IF the thread needs it.
² You don’t need the entire program in cache, just the bits that the
program is currently using.
³ As far as we can tell, yes, *that* Linus. He certainly has the same
use of language, the same arguing style, and knows the things the real
Linus would know.
⁴ Servers often do have enough software threads to make use of all the
hardware threads they can get – see Sun’s Niagara for an example.
And single-core Atoms benefit from hyperthreading to improve latency.

-- 
E-mail:     james@ | “My aunt’s camel has fallen in the mirage.”
aprilcottage.co.uk |     -- “Soul Music”, Terry Pratchett.

