os that rather uses the gpu?

JD jd1008 at gmail.com
Thu Jul 15 19:19:22 UTC 2010


  On 07/15/2010 12:07 PM, Zoltan Boszormenyi wrote:
> JD wrote:
>>    On 07/15/2010 11:23 AM, Michael Miles wrote:
>>
>>> On 07/15/2010 12:18 AM, JD wrote:
>>>
>>>>      On 07/14/2010 11:41 PM, mike cloaked wrote:
>>>>
>>>>
>>>>> On Thu, Jul 15, 2010 at 5:27 AM, john wendel<jwendel10 at comcast.net>     wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Agreed that an OS kernel hasn't much use for a GPU. But it should be
>>>>>> easy to add a small general purpose CPU (ARM or Intel Atom) and a couple
>>>>>> of usb ports to the card and move X completely to the video card. Just
>>>>>> like a remote X server only in the same box.
>>>>>>
>>>>>> I really think the OP was referring to having user mode code take
>>>>>> advantage of the high processing power of modern GPUs. It works now, but
>>>>>> could be improved if the OS contained specialized scheduling support for
>>>>>> these kinds of jobs.
>>>>>>
>>>>>>
>>>>> I understand that the GPU has no page faults, and is missing many of
>>>>> what we regard as the essential functions of a normal processor?  Also
>>>>> getting large amounts of data in or out of the GPU is slow - it is
>>>>> fast partly because there is a lot less overhead compared to a single
>>>>> processor and partly from the advantage of multiple cores. I was
>>>>> speaking to someone who has been working with GPU processing for
>>>>> several years and was skeptical about getting code to run reliably
>>>>> across different GPUs...  and of course CUDA is vendor specific as far
>>>>> as I know? So speed gain is dependent on the kind of processing needed
>>>>> but if anything goes wrong then it can easily crash the system.
>>>>>
>>>>> Anyone had any experience with using the GPU could perhaps comment?
>>>>>
>>>>>
>>>> Sorry to barge in this late into this thread....
>>>> Was the originator of the thread interested in having the kernel
>>>> use the GPU for floating point operations or integer
>>>> operations?
>>>> If floating point, the x86 (among others) already has an
>>>> integrated fpu, and the integer logic is already in the cpu (or alu).
>>>> So I do not understand what sort of computations the originator
>>>> of the thread would like to see done on the gpu.
>>>>
>>>> jd
>>>>
>>>>
>>> The other OSes, Mac and Windows, are using the GPU in their video
>>> conversion programs.
>>> The newer programs will have selections to activate the GPU for computation.
>>>
>>> I have been using the GPU for scientific computation for quite a while now.
>>> SETI@home is very much a hobby; it takes samples from the Arecibo
>>> telescope and analyzes the data looking for, you guessed it, ET.
>>> It will crunch numbers very fast compared to a normal CPU.
>>>
>>> I bench my Phenom II 965 at 3 Gflops/CPU while the GPU will be doing 54
>>> Gflops.
>>>
>>> I have a slow video card, an Nvidia 9400GT. The bigger ones will go right up
>>> to a full teraflop.
>>> That kind of speed would be welcome if an OS used it generally, or if
>>> video conversion software were written to use it, greatly reducing
>>> conversion time.
>>>
>>>
>>> That's what I would like to see: more focus on speeding up video
>>> conversion, especially with HD video, and it seems that the GPU is a very
>>> inexpensive way to add a lot of power to your machines.
>>>
>> A teraflop?? Whoa! Can the PCI bus really feed the GPU with
>> an instruction stream that will yield that performance?
>>
> Err, no. GPUs are massively parallel beasts. They can't reach that level of
> performance via a discrete instruction stream, not to mention that even
> the PCIe bus couldn't really cope with it. The high performance comes
> from GPU programs being executed by the hardware threads
> in parallel on large amounts of data.
>
>> I mean, most PCs in people's homes are still on PCI (33 or 66 MHz
>> bus).
>> Relatively few are on PCIe x16, which is a much faster bus.
>>
>> Thanks for your feedback.
>>
>>
>>
So, given the extreme limitations of the bus relative to the
teraflop speed of the GPU, how can the CPU feed the GPU with
"data" at a rate that can sustain a continuous 1 teraflop?
Is there a PCIe x32 with a faster bus clock on the horizon?
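
One rough way to frame the question is to ask how many floating point
operations the GPU would have to perform on each byte it receives over
the bus. Below is a minimal sketch, assuming nominal peak figures of
about 133 MB/s for 32-bit/33 MHz PCI and about 8 GB/s per direction for
PCIe 2.0 x16. The function name is made up for illustration, and real
sustained transfer rates would be lower than these peaks.

```python
# Back-of-the-envelope: flops needed per transferred byte to keep a GPU busy.
# Bandwidth figures are nominal peaks (assumptions), not measurements.

def flops_per_byte_needed(target_flops, bus_bytes_per_sec):
    """Arithmetic intensity required so the bus is not the bottleneck."""
    return target_flops / bus_bytes_per_sec

TERAFLOP = 1e12

# Assumed nominal peak bandwidths in bytes per second.
buses = {
    "PCI 32-bit/33 MHz": 133e6,
    "PCIe 2.0 x16": 8e9,
}

for name, bandwidth in buses.items():
    need = flops_per_byte_needed(TERAFLOP, bandwidth)
    print(f"{name}: about {need:.0f} flops per byte")
```

If these figures are roughly right, a job only gets near the peak when
it reuses each byte many times on the card rather than streaming fresh
data for every operation.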

There are so many programs that use/need floating point operations
on matrices that could benefit vastly from this. I hope someone(s)
can point to such future HW development.
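
For matrix work in particular, the picture may be less bleak than the
raw bus speed suggests, because a matrix multiply does far more
arithmetic than it moves data. Here is a rough sketch, assuming an
n x n single-precision multiply where A and B are sent to the card
once, C is read back once, and everything else stays in the card's own
memory. The helper names and the 8 GB/s bus figure are assumptions for
illustration only.

```python
# Rough estimate of how fast the bus alone would let an n x n
# single-precision matrix multiply run, ignoring the GPU's own limits.

def matmul_intensity(n, bytes_per_element=4):
    """Flops performed per byte moved over the bus."""
    flops = 2 * n**3                             # about n^3 multiplies and n^3 adds
    bytes_moved = 3 * n * n * bytes_per_element  # A and B in, C out
    return flops / bytes_moved

def bus_limited_flops(n, bus_bytes_per_sec):
    """Upper bound on flops/s if only the bus transfer limited the job."""
    return matmul_intensity(n) * bus_bytes_per_sec

BUS = 8e9  # assumed nominal bandwidth in bytes per second

for n in (64, 600, 4096):
    gflops = bus_limited_flops(n, BUS) / 1e9
    print(f"n={n}: bus-limited rate of about {gflops:.0f} Gflops")
```

Under these assumptions the intensity works out to n/6 flops per byte,
so the bus limit grows with the matrix size and small matrices are the
ones that starve.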

