Computer Architecture: Nvidia CUDA Technology

Compare Nvidia's CUDA technology to the general-purpose microprocessor.


CUDA supports parallel programming through co-processor offload and a small set of additions to C syntax. Work is expressed as kernels that are mapped over a domain and executed by many threads. Each kernel is scalar C code that describes the work to be done at a single point of the domain (Che, et al., 2008). The threads run in parallel, can share memory, and can be synchronized. A qualifier on the function declaration determines whether the function executes on the GPU or the CPU, and a type qualifier on a variable determines where in GPU memory the variable is stored. Each thread identifies the data it works on through its index, so the same kernel can be run on separate data sets. CUDA organizes threads in a two-level hierarchy that spans five dimensions in total: a thread block holds up to 512 threads (the limit on GPUs of that generation) arranged in up to three dimensions, and the blocks are arranged in a two-dimensional grid. The threads within a single block work simultaneously, but the results can only be retrieved once the blocks have completed (Halfhill, 2008).
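The qualifiers, shared memory, and synchronization described above can be sketched in a few lines of CUDA. This is a minimal illustration rather than code from the cited sources: the names `add_pair`, `neighbor_sum_kernel`, `neighbor_sum_on_gpu`, and `BLOCK_SIZE` are hypothetical, the block size of 256 is an arbitrary choice below the 512-thread limit, and error checking of the runtime calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 256  // assumed block size, below the 512-thread limit

// __device__: runs on the GPU and can only be called from GPU code.
__device__ float add_pair(float a, float b) { return a + b; }

// __global__: a kernel, launched from the CPU and run on the GPU.
// The body is scalar C code for one point of the domain.
__global__ void neighbor_sum_kernel(const float* in, float* out, int n) {
    // __shared__: variable qualifier placing the tile in memory
    // shared by all threads of one block.
    __shared__ float tile[BLOCK_SIZE];

    int t = threadIdx.x;                  // index inside the block
    int i = blockIdx.x * blockDim.x + t;  // this thread's point in the domain

    tile[t] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // barrier: the next read uses a neighbour's write

    if (i < n) {
        // The first thread of each block has no left neighbour in the tile.
        float left = (t > 0) ? tile[t - 1] : 0.0f;
        out[i] = add_pair(left, tile[t]);
    }
}

// Hypothetical host wrapper: offloads the work to the GPU co-processor.
std::vector<float> neighbor_sum_on_gpu(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_SIZE);                  // threads per block
    dim3 grid((n + block.x - 1) / block.x);  // enough blocks to cover n
    neighbor_sum_kernel<<<grid, block>>>(d_in, d_out, n);

    // This copy waits for the kernel, so the results are read back
    // only after all blocks have completed.
    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

On a machine with a CUDA-capable GPU, calling `neighbor_sum_on_gpu({1, 2, 3, 4})` should return 1, 3, 5, 7: each element is added to its left neighbour within the block. The example uses a one-dimensional grid and block for simplicity; the same indexing extends to the `y` and `z` components of `threadIdx` and `blockIdx`.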

The general-purpose microprocessor is built around a data path and a controller, both of which depend on memory. The data path performs three kinds of operation: load, store, and ALU operations. It includes circuitry that stores and converts temporary data (Wei-Wu, Rui, Jun-Hua, X, & Long-Bin, 2006). The ALU carries out the arithmetic and logical calculations that transform data, and it also produces status signals. The controller sequences the operations and carries them out as directed by the program. The main difference between CUDA and the general-purpose microprocessor is the capacity for parallel computing. In CUDA the programmer does not schedule individual tasks, because the hardware distributes the threads across the GPU, whereas the general-purpose microprocessor works through its instruction sequence step by step, one clock cycle after another.
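The contrast can be made concrete with the same element-wise addition written both ways. The sketch below is a simplified illustration, not a model of any particular processor: `add_on_cpu`, `add_kernel`, and `add_on_gpu` are hypothetical names, both input vectors are assumed to have the same length, and error checking is omitted. The CPU version walks through the load, ALU, and store steps one element at a time, while the kernel states the work for one element and leaves the distribution of threads to the hardware.

```cuda
#include <cuda_runtime.h>
#include <vector>

// CPU version: a single instruction stream handles the elements in sequence.
// Assumes a.size() == b.size().
std::vector<int> add_on_cpu(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> c(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        int x = a[i];     // load
        int y = b[i];     // load
        int sum = x + y;  // ALU operation
        c[i] = sum;       // store
    }
    return c;
}

// GPU version: the kernel describes the work for one element only.
__global__ void add_kernel(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical host wrapper around the kernel. Assumes a.size() == b.size().
std::vector<int> add_on_gpu(const std::vector<int>& a, const std::vector<int>& b) {
    int n = static_cast<int>(a.size());
    std::vector<int> c(n);
    if (n == 0) return c;

    size_t bytes = n * sizeof(int);
    int* d_a = nullptr;
    int* d_b = nullptr;
    int* d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // The copy back waits until the kernel has finished.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Both functions should produce the same result for the same inputs. Only the way the work is issued differs: a loop on the CPU, a grid of threads on the GPU.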



