os that rather uses the gpu?

Zoltan Boszormenyi zboszor at freemail.hu
Thu Jul 15 20:03:05 UTC 2010


JD wrote:
>   On 07/15/2010 12:07 PM, Zoltan Boszormenyi wrote:
>   
>> JD wrote:
>>     
>>>    On 07/15/2010 11:23 AM, Michael Miles wrote:
>>>
>>>       
>>>> On 07/15/2010 12:18 AM, JD wrote:
>>>>
>>>>         
>>>>>      On 07/14/2010 11:41 PM, mike cloaked wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> On Thu, Jul 15, 2010 at 5:27 AM, john wendel<jwendel10 at comcast.net>     wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Agreed that an OS kernel hasn't much use for a GPU. But it should be
>>>>>>> easy to add a small general purpose CPU (ARM or Intel Atom) and a couple
>>>>>>> of usb ports to the card and move X completely to the video card. Just
>>>>>>> like a remote X server only in the same box.
>>>>>>>
>>>>>>> I really think the OP was referring to having user mode code take
>>>>>>> advantage of the high processing power of modern GPUs. It works now, but
>>>>>>> could be improved if the OS contained specialized scheduling support for
>>>>>>> these kinds of jobs.
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> I understand that the GPU has no page faults, and is missing many of
>>>>>> what we regard as the essential functions of a normal processor?  Also
>>>>>> getting large amounts of data in or out of the GPU is slow - it is
>>>>>> fast partly because there is a lot less overhead compared to a single
>>>>>> processor and partly from the advantage of multiple cores. I was
>>>>>> speaking to someone who has been working with GPU processing for
>>>>>> several years and was skeptical about getting code to run reliably
>>>>>> across different GPUs...  and of course CUDA is vendor specific as far
>>>>>> as I know? So speed gain is dependent on the kind of processing needed
>>>>>> but if anything goes wrong then it can easily crash the system.
>>>>>>
>>>>>> Could anyone who has had experience with using the GPU comment?
>>>>>>
>>>>>>
>>>>>>             
>>>>> Sorry to barge in this late into this thread....
>>>>> Was the originator of the thread interested in the kernel
>>>>> to use the gpu for floating point operations or integer
>>>>> operations?
>>>>> If floating point, the x86 (among others) already has an
>>>>> integrated fpu, and the integer logic is already in the cpu (or alu).
>>>>> So I do not understand what sort of computations the originator
>>>>> of the thread would like to see done on the gpu.
>>>>>
>>>>> jd
>>>>>
>>>>>
>>>>>           
>>>> The other OSes, Mac and Windows, already use the GPU in their
>>>> video-conversion programs.
>>>> The newer programs will have selections to activate the GPU for computation.
>>>>
>>>> I have been using the GPU for scientific computation for quite a while now.
>>>> Seti at home is very much a hobby; it takes samples from the Arecibo
>>>> telescope and analyzes the data looking for "You guessed it, ET".
>>>> It will crunch numbers very fast compared to a normal CPU.
>>>>
>>>> I bench my Phenom II 965 at 3 GFLOPS per CPU, while the GPU will be
>>>> doing 54 GFLOPS.
>>>>
>>>> I have a slow video card Nvidia 9400GT. The bigger ones will go right up
>>>> to  a full teraflop.
>>>> That kind of speed would be very welcome if an OS used it generally,
>>>> or if video-conversion software used it to greatly reduce processing
>>>> time.
>>>>
>>>>
>>>> That's what I would like to see: more focus on speeding up video
>>>> conversion, especially with HD video. The GPU seems a very inexpensive
>>>> way to add a lot of power to your machine.
>>>>
>>>>         
>>> A teraflop?? WHoa! Can the PCI  bus really feed the  gpu with
>>> an instruction stream that will yield that performance?
>>>
>>>       
>> Err, no. GPUs are massively parallel beasts. They can't reach that level
>> of performance via a discrete instruction stream, and the PCIe bus
>> couldn't really cope with such a stream anyway. The high performance
>> comes from GPU programs being executed by many hardware threads in
>> parallel on large amounts of data.
>>
>>     
>>> I mean most PCs out there in people's homes are still on plain PCI
>>> (33 or 66 MHz bus).
>>> Relatively fewer are on PCIe x16, which is a much faster bus.
>>>
>>> Thanks for your feedback.
>>>
>>>
>>>
>>>       
> So, given the extreme limitations of the bus relative to the
> teraflop speed of the GPU, how can the CPU feed the GPU with
> "data" at a rate that sustains a continuous teraflop?
>   

Usually the GPU is used for matrix computations. The tricks involve
computations done under strict alignment requirements, cleverly
parallelized algorithms, and computations whose result is much smaller
than the input data; for example, a vector's length is a single number
no matter the vector's dimension. Only such computations are worth
rewriting for the GPU, because the card has to upload the initial data
to its memory, do the computation, then push the results back to main
RAM so the CPU can access them. And the upload + the massively parallel
GPU computation + the download must still be faster than the naively
parallelized (2, 4 or a few more threads, to exploit the CPU cores) or
non-parallelized CPU computation to be worth the rewrite.
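This tradeoff can be sketched with back-of-envelope arithmetic. All the
rates below (PCIe bandwidth, the 54 GFLOPS and 3 GFLOPS figures quoted
earlier in the thread) are illustrative assumptions, not measurements:

```python
# Back-of-envelope model of when offloading to the GPU pays off.
# All rates are illustrative assumptions for a ~2010 machine.

def offload_wins(bytes_up, bytes_down, flops,
                 pcie_bw=4e9,    # assumed effective PCIe bandwidth, bytes/s
                 gpu_rate=54e9,  # assumed GPU throughput, FLOP/s
                 cpu_rate=3e9):  # assumed CPU throughput, FLOP/s
    """True if upload + GPU compute + download beats doing it on the CPU."""
    gpu_time = bytes_up / pcie_bw + flops / gpu_rate + bytes_down / pcie_bw
    cpu_time = flops / cpu_rate
    return gpu_time < cpu_time

# Vector length of 100 million doubles: ~2 FLOPs per element, one number back.
# Transfer time dominates, so the GPU loses despite its raw speed.
n = 100_000_000
print(offload_wins(8 * n, 8, 2 * n))                      # False

# 2000x2000 matrix multiply: 2*m^3 FLOPs for only 3*m^2 doubles of traffic.
# Plenty of work per byte moved, so the GPU wins easily.
m = 2000
print(offload_wins(2 * m * m * 8, m * m * 8, 2 * m**3))   # True
```

The point of the two cases: raw GFLOPS don't decide it; the ratio of
computation to data moved over the bus does.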

> Is there a pci-e32 with a faster bus clock on the horizon?
>   

No idea.

> There are so many programs that use/need floating point operations
> on matrices that could benefit vastly from this. I hope someone(s)
> can point to such future HW development.
>   

NVIDIA's big Tesla machines with 6 GB of GPU memory were built exactly
to save on the upload/download time. Many large matrices (the initial
data, partial results and final results) can fit into that much memory...
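The saving is easy to quantify. The 8 GB/s figure below is an assumed
PCIe 2.0 x16 peak, not a measurement:

```python
# Time spent just moving data across the host-GPU bus. The bandwidth is
# an illustrative assumption (PCIe 2.0 x16 peaks around 8 GB/s).

def transfer_seconds(gigabytes, bus_gb_per_s=8.0):
    """Seconds to move `gigabytes` between main RAM and GPU memory."""
    return gigabytes / bus_gb_per_s

# A hypothetical iterative solver with a 6 GB working set, 100 iterations:
per_upload = transfer_seconds(6.0)
print(per_upload)         # 0.75 s for one full upload
print(100 * per_upload)   # 75.0 s of pure transfer if re-uploaded every step
# Keeping the matrices resident in the card's 6 GB pays that cost once.
```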


