CUDA Functions

The CUDA API can be thought of as an extension to C. It adds a set of functions, called from the host, that manage the interaction between the host and the device. Although the API is fairly small, a handful of functions provide most of the functionality we need for a GPU implementation:

cudaGetDevice: find the current GPU “context”

cudaGetDeviceCount: find the number of graphics devices (independent GPUs) available

cudaSetDevice: set the current graphics device (0-indexed integer)
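Taken together, these three calls let a program enumerate the available GPUs and pick one. A minimal sketch (error checking omitted for brevity):

```cuda
#include <stdio.h>
#include <cuda_runtime_api.h>

int main(void)
{
    int count, dev;

    /* How many CUDA-capable GPUs are visible to this process? */
    cudaGetDeviceCount(&count);
    printf("found %d device(s)\n", count);

    /* Select device 0 (devices are numbered 0 .. count-1) ...  */
    cudaSetDevice(0);

    /* ... and confirm which device this host thread is now using. */
    cudaGetDevice(&dev);
    printf("using device %d\n", dev);

    return 0;
}
```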

cudaMalloc: allocate an array on the device

cudaMallocHost: an (optional) way to allocate an array on the host. Memory allocated this way is page-locked ("pinned"), which lets CUDA track and align it on the host and accelerates transfers between host and device.
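The two allocators pair naturally: pinned memory on the host, global memory on the device. A sketch, assuming a hypothetical array of N floats (note that pinned memory must be released with cudaFreeHost, not free):

```cuda
#include <cuda_runtime_api.h>

#define N 1024

int main(void)
{
    float *h_a;   /* host array (pinned)          */
    float *d_a;   /* device array (global memory) */

    /* Page-locked host allocation: speeds up host<->device copies. */
    cudaMallocHost((void **)&h_a, N * sizeof(float));

    /* Device allocation: d_a lives in GPU global memory. */
    cudaMalloc((void **)&d_a, N * sizeof(float));

    /* ... use the arrays ... */

    cudaFree(d_a);
    cudaFreeHost(h_a);   /* pinned memory: cudaFreeHost, not free() */
    return 0;
}
```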

cudaBindTexture: attach a handle to a CUDA array so that it can be read as a texture by the device. Texture reads are cached, which can speed up accesses with spatial locality (constant memory is cached as well). Note that in recent CUDA toolkits the texture-reference API has been deprecated in favor of texture objects (cudaTextureObject_t).
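A sketch of the legacy texture-reference usage, for illustration only (the reference is declared at file scope, bound on the host, and read with tex1Dfetch in a kernel; modern code would use a cudaTextureObject_t instead):

```cuda
#include <cuda_runtime.h>

#define N 1024

/* File-scope texture reference (legacy, deprecated API). */
texture<float, 1, cudaReadModeElementType> texRef;

/* Device side: reads of d_a now go through the cached texture path. */
__global__ void read_first(float *out)
{
    out[0] = tex1Dfetch(texRef, 0);
}

void setup(float *d_a)
{
    /* Host side: attach the reference to an existing device array. */
    cudaBindTexture(NULL, texRef, d_a, N * sizeof(float));
}
```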

cudaMemcpy: copy data between arrays on the host and device, or between two device arrays; the direction is selected by a cudaMemcpyKind argument.
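For example, assuming hypothetical arrays of N floats (h_a on the host, d_a and d_b on the device), the last argument selects the direction of the copy:

```cuda
/* host -> device */
cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);

/* device -> host */
cudaMemcpy(h_a, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);

/* device -> device */
cudaMemcpy(d_b, d_a, N * sizeof(float), cudaMemcpyDeviceToDevice);
```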

cudaThreadSynchronize: the CPU and the GPU operate asynchronously. This function blocks the CPU until all previously launched device kernels have completed [ analogous to an MPI_Barrier ]. In recent CUDA toolkits it has been renamed cudaDeviceSynchronize.
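The asynchrony matters in practice: a kernel launch returns to the CPU immediately, so the host must synchronize before it can rely on the kernel's results. A sketch with a hypothetical scaling kernel:

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *x, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= alpha;
}

void run(float *d_x, int n)
{
    /* Launch returns immediately; the GPU works in the background. */
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

    /* Block the host until the kernel has finished.
       (cudaDeviceSynchronize() in current toolkits.) */
    cudaThreadSynchronize();
}
```

Note that cudaMemcpy on the default stream also implicitly waits for prior kernels, so an explicit synchronize is most useful for timing or when the host touches results by other means.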

These runtime API functions are prototyped in cuda_runtime_api.h (cuda.h declares the lower-level driver API) and can be called from your usual C code.
