NVIDIA Optimus Technology and CUDA

lovnet

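NVIDIA describes Optimus as a revolutionary technology that extends notebook battery life while still delivering good performance. It lets the system decide automatically which graphics processor handles a given workload. GPU compute applications, video, and 3D games are generally scheduled on the high-performance NVIDIA GPU, while applications such as Office, web browsing, and email run on the integrated graphics.

If your notebook has two graphics processors, an Intel integrated graphics processor (IGP) and a discrete NVIDIA GPU, it most likely uses Optimus. To develop CUDA programs on such a notebook, you need to understand how Optimus works.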

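1. When you run your own CUDA program, it may fail to find a CUDA device! (The default graphics processor is the IGP.)

2. When using Direct3D/CUDA or OpenGL/CUDA interop, developers need to be aware of some restrictions!

The notes below address these two points:

1. Querying CUDA devices: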

// CUDA Runtime API Version
#include <algorithm>       // std::max
#include <cstdio>          // printf
#include <cuda_runtime.h>  // cudaGetDeviceCount, cudaGetDeviceProperties

// Returns the index of the CUDA device with the highest estimated throughput.
inline int cutGetMaxGflopsDeviceId()
{
    int current_device   = 0, sm_per_multiproc = 0;
    int max_compute_perf = 0, max_perf_device  = 0;
    int device_count     = 0, best_SM_arch     = 0;
    // Cores per multiprocessor, indexed by compute capability major version
    int arch_cores_sm[3] = { 1, 8, 32 };
    cudaDeviceProp deviceProp;

    cudaGetDeviceCount( &device_count );

    // Find the best major SM Architecture GPU device
    while ( current_device < device_count ) {
        cudaGetDeviceProperties( &deviceProp, current_device );
        if (deviceProp.major > 0 && deviceProp.major < 9999) {
            best_SM_arch = std::max(best_SM_arch, deviceProp.major);
        }
        current_device++;
    }

    // Find the best CUDA capable GPU device
    current_device = 0;
    while ( current_device < device_count ) {
        cudaGetDeviceProperties( &deviceProp, current_device );
        if (deviceProp.major == 9999 && deviceProp.minor == 9999) {
            // 9999.9999 marks the emulation device
            sm_per_multiproc = 1;
        } else if (deviceProp.major <= 2) {
            sm_per_multiproc = arch_cores_sm[deviceProp.major];
        } else { // Device has SM major > 2
            sm_per_multiproc = arch_cores_sm[2];
        }

        // Estimated throughput: multiprocessors * cores per multiprocessor * clock
        int compute_perf = deviceProp.multiProcessorCount *
                           sm_per_multiproc * deviceProp.clockRate;

        if ( compute_perf > max_compute_perf ) {
            // If we find GPU of SM major > 2, search only these
            if ( best_SM_arch > 2 ) {
                // If device==best_SM_arch, choose this, or else pass
                if (deviceProp.major == best_SM_arch) {
                    max_compute_perf  = compute_perf;
                    max_perf_device   = current_device;
                }
            } else {
                max_compute_perf  = compute_perf;
                max_perf_device   = current_device;
            }
        }
        ++current_device;
    }

    // Report the selected device
    cudaGetDeviceProperties(&deviceProp, max_perf_device);
    printf("\nDevice %d: \"%s\"\n", max_perf_device,
                                    deviceProp.name);
    printf("Compute Capability   : %d.%d\n",
            deviceProp.major, deviceProp.minor);
    return max_perf_device;
}

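A quick disclaimer: I did not write this code. I copied it from reference [1], and it is NVIDIA's official sample. Use it as a starting point for writing code that fits your own needs. If you only plan to copy it and use it as-is, you may as well stop reading and ask someone else to do the work for you!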
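
If you do adapt it, one option is to separate the ranking heuristic of cutGetMaxGflopsDeviceId() from the CUDA calls so that it can be checked on a machine without a GPU. The following is only a minimal sketch, not part of the official sample: the `DeviceInfo` struct and the `pickMaxGflopsDevice` function are hypothetical names introduced here, and `DeviceInfo` is assumed to be filled from the matching `cudaDeviceProp` fields. Unlike the original, it returns -1 instead of device 0 when no device qualifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the cudaDeviceProp fields the heuristic reads.
struct DeviceInfo {
    int ccMajor;              // compute capability major (9999 = emulation device)
    int ccMinor;              // compute capability minor
    int multiProcessorCount;  // number of multiprocessors
    int clockRate;            // clock rate as reported by the runtime
};

// Cores per multiprocessor, using the same { 1, 8, 32 } table as the sample.
static int coresPerMultiprocessor(const DeviceInfo& d) {
    static const int archCores[3] = { 1, 8, 32 };
    assert(d.ccMajor >= 0);  // a negative major version would index out of bounds
    if (d.ccMajor == 9999 && d.ccMinor == 9999) return 1;  // emulation device
    if (d.ccMajor <= 2) return archCores[d.ccMajor];
    return archCores[2];
}

// Returns the index of the preferred device, or -1 if none qualifies.
// (Assumption of this sketch: the original falls back to index 0 instead.)
int pickMaxGflopsDevice(const std::vector<DeviceInfo>& devices) {
    // Pass 1: find the highest real major architecture present.
    int bestArch = 0;
    for (const DeviceInfo& d : devices) {
        if (d.ccMajor > 0 && d.ccMajor < 9999) {
            bestArch = std::max(bestArch, d.ccMajor);
        }
    }

    // Pass 2: pick the device with the highest estimated throughput.
    int bestIndex = -1;
    long long bestPerf = 0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const DeviceInfo& d = devices[i];
        long long perf = static_cast<long long>(d.multiProcessorCount) *
                         coresPerMultiprocessor(d) * d.clockRate;
        if (perf <= bestPerf) continue;
        // When a device with major > 2 exists, consider only that architecture.
        if (bestArch > 2 && d.ccMajor != bestArch) continue;
        bestPerf  = perf;
        bestIndex = static_cast<int>(i);
    }
    return bestIndex;
}
```

In a real program you would fill one `DeviceInfo` per device from cudaGetDeviceProperties() and pass the returned index to cudaSetDevice(), treating -1 as "no usable CUDA device found".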

2. Direct3D/CUDA and OpenGL/CUDA interop:

The code says it all:

// CUDA/Direct3D9 interop
// You will need to create the D3D9 context first
IDirect3DDevice9 *g_pD3D9Device; // Initialize D3D9 rendering device
// After creation, bind your D3D9 context to CUDA
cudaD3D9SetDirect3DDevice(g_pD3D9Device);

// CUDA/Direct3D11 interop
// You will need to create the D3D11 context first
ID3D11Device *g_pD3D11Device; // Initialize D3D11 rendering device
// After creation, bind your D3D11 context to CUDA
cudaD3D11SetDirect3DDevice(g_pD3D11Device);

// CUDA/OpenGL interop
// You will need to create the OpenGL context first
// After creation, bind your OpenGL context to CUDA
// (deviceID is the index of the CUDA device to use)
cudaGLSetGLDevice(deviceID);

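Even if you can write the code above and are fluent with these interop interfaces, you may still get incorrect results, or your program may not run at all! Why? The most likely cause is that the DirectX or OpenGL graphics context was already created on the IGP before you called the CUDA interop interface, so the call fails. There are two fixes: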

1. Users of an Optimus notebook can open the NVIDIA Control Panel, go to 3D Settings -> Manage 3D Settings, add the program to the list, and set its preferred graphics processor to "High-performance NVIDIA processor".

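2. Developers should update their code to follow the current recommended sequence: first create an OpenGL/D3D context that is not bound to a device; then enumerate all CUDA devices and create a CUDA context; next bind the device that holds the CUDA context to the OpenGL/D3D context; and finally bind the CUDA context to the OpenGL/D3D context.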

References:

[1] NVIDIA CUDA Developer Guide for NVIDIA Optimus Platforms

[2] http://hpcbbs.it168.com/forum.php?mod=viewthread&tid=5958&fromuid=26008496
