Performance difference of CUDA in Windows and Linux

**Roasted** · 14-05-2010

Using CUDA-Z, I noticed what looks like a substantial difference in 'Device to Device' memory copy speed between the two 64-bit OSs (Windows and Linux). In my test, Windows was about 6x faster at 'Device to Device' copies. What exactly is a 'Device to Device' copy? Is it a memory operation within the graphics card's own memory, or something else?

Also, are there substantial performance differences between CUDA running on Windows and CUDA running on Linux?

**Devasis** · 14-05-2010

I don't think this should happen. If you are seeing it, something is probably wrong with your configuration. That Linux device-to-device speed is way too low for the card you have. Either the test you ran to get those numbers is incorrect, or you have some kind of driver issue. What driver version are you currently running?

**Roasted** · 14-05-2010

Originally Posted by Devasis

That Linux device-to-device speed is way too low for the card you have. Either the test you ran to get those numbers is incorrect, or you have some kind of driver issue. What driver version are you currently running?

I am running the 195.36.15 drivers on Ubuntu 9.10 x86_64. They were downloaded from NVIDIA's official site, not the 'Developer Drivers for Linux (195.36.15)' found on the CUDA website. This is the output of the bandwidth test from the SDK on Linux:

Running on......
device 0:GeForce 8800 GTS
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1645.9

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1468.4

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 10080.2

&&&& Test PASSED

Press ENTER to exit...

**Netorious** · 14-05-2010

Hi, everyone.

I like the CUDA API; it is very good and clean. But there seems to be a major speed difference between the two most popular operating systems:

My Mandelbrot code runs at around 200 fps under Linux
My Mandelbrot code runs at around 400 fps under Windows XP

In both cases I disabled VSync to get the highest frame rate. I am using CUDA 2.3 (the stable version) on both OSs. Perhaps my code is buggy somehow? So I tried the official nbody sample from the SDK:

Under Linux:
./nbody -benchmark -n=30000
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
30000 bodies, total time for 100 iterations: 19701.262 ms
= 4.568 billion interactions per second
= 91.365 GFLOP/s at 20 flops per interaction

Under Windows XP:
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
30000 bodies, total time for 100 iterations: 12137.919 ms
= 7.415 billion interactions per second
= 148.296 GFLOP/s at 20 flops per interaction

So it's not just my code... For some reason, my GT240 runs at least 60% faster under Windows XP. Does anyone have an idea why? Is this a driver bug, or what could be the reason?
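The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the sample counts n × n pairwise interactions per iteration, which matches the printed numbers, and it takes the 20 flops per interaction from the benchmark output. The function name `nbody_rates` is made up for illustration.

```python
# Sanity check of the nbody benchmark figures quoted in the post.
# Assumption: n * n pairwise interactions per iteration; the 20 flops
# per interaction figure is taken from the benchmark's own output.

def nbody_rates(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (interactions per second, GFLOP/s) for one benchmark run."""
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    per_second = interactions / seconds
    gflops = per_second * flops_per_interaction / 1e9
    return per_second, gflops

linux_ips, linux_gflops = nbody_rates(30000, 100, 19701.262)
windows_ips, windows_gflops = nbody_rates(30000, 100, 12137.919)

print(f"Linux:   {linux_ips / 1e9:.3f} billion interactions/s, {linux_gflops:.3f} GFLOP/s")
print(f"Windows: {windows_ips / 1e9:.3f} billion interactions/s, {windows_gflops:.3f} GFLOP/s")
print(f"Windows / Linux: {windows_gflops / linux_gflops:.2f}x")
```

The ratio comes out a little above 1.6, which is consistent with the "at least 60%" estimate.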

**Calvin K** · 14-05-2010

The NVIDIA binary driver supports several performance levels. Each level has two parameters: the GPU clock frequency and the on-board memory clock frequency (if the card has on-board memory). To find out the supported performance levels, use the 'nvidia-settings' command.

Here is an example showing three performance levels:

$ nvidia-settings -q GPUPerfModes -t
perf=0, nvclock=100, memclock=0 ; perf=1, nvclock=350, memclock=0 ; perf=2, nvclock=425, memclock=0

The parameter is read-only.
memclock=0 indicates that no on-board memory is available.
The number of performance levels varies between cards (I have seen cards with 1, 3 and 4 levels).
For the same performance level, the memclock value may differ depending on the color depth of the screen mode (16/32 bits).
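If you want to work with these levels in a script, the one-line output is easy to split up. The sketch below assumes the output has exactly the `key=value, ... ; key=value, ...` shape shown in this post, with integer values; other driver versions may print additional fields. The helper name `parse_perf_modes` is hypothetical.

```python
# Parse the one-line GPUPerfModes output into a list of dicts.
# Assumption: entries are separated by ';' and fields by ',' with
# integer values, as in the sample output quoted in this post.

def parse_perf_modes(text):
    modes = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or empty segment
        mode = {}
        for pair in chunk.split(","):
            key, _, value = pair.strip().partition("=")
            mode[key] = int(value)
        modes.append(mode)
    return modes

sample = ("perf=0, nvclock=100, memclock=0 ; "
          "perf=1, nvclock=350, memclock=0 ; "
          "perf=2, nvclock=425, memclock=0")

for mode in parse_perf_modes(sample):
    print(mode)
```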

**Russell** · 15-05-2010

At any given moment the NVIDIA driver operates at one of these levels. The following commands display the current performance level and the current clock frequencies:

Code:

    $ nvidia-settings -q GPUCurrentPerfLevel -t
    2

    $ nvidia-settings -q GPUCurrentClockFreqs -t
    425,666

In the output above, "425,666" means the current GPU frequency is 425 MHz. But the card has no on-board memory, so the value should be "425,0" (memclock=0 for perf=2). This looks like an issue in the NVIDIA driver or in the nvidia-settings tool.
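The mismatch can be checked mechanically by comparing the reported clocks with the table of performance levels. This is a rough sketch with the values from this thread typed in by hand; `check_clocks` is a made-up helper, and it assumes the current-clock output is a plain "gpu,mem" pair in MHz.

```python
# Compare the reported current clocks with the expected values for the
# current performance level. All data below is copied from this thread.

def check_clocks(perf_modes, current_level, current_freqs):
    """Return (gpu_mhz, mem_mhz, consistent) for the current level."""
    gpu_mhz, mem_mhz = (int(x) for x in current_freqs.strip().split(","))
    expected = next(m for m in perf_modes if m["perf"] == current_level)
    consistent = (gpu_mhz == expected["nvclock"]
                  and mem_mhz == expected["memclock"])
    return gpu_mhz, mem_mhz, consistent

perf_modes = [
    {"perf": 0, "nvclock": 100, "memclock": 0},
    {"perf": 1, "nvclock": 350, "memclock": 0},
    {"perf": 2, "nvclock": 425, "memclock": 0},
]

gpu, mem, ok = check_clocks(perf_modes, 2, "425,666")
print(f"GPU {gpu} MHz, memory {mem} MHz, matches perf table: {ok}")
```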

**SecRecy** · 17-05-2010

There is one more read-only NVIDIA driver parameter, 'GPUCurrentPerfMode' (performance mode). Its possible values are:

1 - Desktop (2D) - 3D features are not used at the moment;
2 - Highest Performance (3D) - some application is using 3D features.

To switch to 3D mode you can run, for example, the 'glxgears' application from the 'mesa-demos' package (the graphics/mesa-demos port for FreeBSD users).