The CUDA API can be thought of as an extension to C. It adds a set of functions that are called from the host and handle the interaction between host and device. A handful of these functions provide most of what we need to create a GPU implementation:
cudaGetDevice: find the device currently in use by the calling host thread
cudaGetDeviceCount: find the number of graphics devices (independent GPUs) available
cudaSetDevice: set the current graphics device (0-indexed integer)
cudaMalloc: allocate an array on the device
cudaMallocHost: (optional) way to allocate an array on the host. Memory allocated with this function is page-locked (pinned), so CUDA can track it and transfer data between host and device faster than with memory from malloc.
cudaBindTexture: bind device memory to a texture reference so that the device can read it as a texture. Texture reads are cached; constant memory is cached as well, and global memory is uncached only on early GPUs. Newer CUDA releases replace texture references with texture objects.
cudaMemcpy: copy data between host & device arrays or between two device arrays. The last argument gives the direction (for example cudaMemcpyHostToDevice).
cudaThreadSynchronize: kernel launches return control to the CPU immediately, so the CPU and the GPU operate asynchronously. This function blocks the host until all previously launched device kernels have completed. [ analogous to an MPI_Barrier ] Newer CUDA releases deprecate it in favor of cudaDeviceSynchronize.
These functions are prototyped in cuda_runtime_api.h (cuda.h declares the lower-level driver API) and can be called from your usual C code, provided the program is linked against the CUDA runtime library.