GPGPU
General-Purpose Computation Using Graphics Hardware

Introduction

GPGPU stands for General-Purpose computation on GPUs. With the increasing programmability of commodity graphics processing units (GPUs), these chips are capable of performing more than the specific graphics computations for which they were designed. They are now capable coprocessors, and their high speed makes them useful for a variety of applications. The goal of this page is to catalog the current and historical use of GPUs for general-purpose computation.

Contribute
Have some GPGPU News to Contribute? Submit it!

Contact Us


Subscribe to a syndicated RSS feed of GPGPU.
Subscribe to a syndicated RSS feed of GPGPU.

Powered by Blosxom.

Hosted by ibiblio.org

A MapReduce Framework on Graphics Processors

This paper describes the design and implementation of Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of development of web search applications on a large number of commodity CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth, but can be harder to program because their architectures are designed as a special-purpose co-processor and they have only recently introduced non-graphics programming interfaces. The authors developed Mars on an NVIDIA G80 GPU, which contains 128 processors, and evaluated it in comparison with Phoenix, the state-of-the-art MapReduce framework on multi-core CPUs. Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface. It is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine. Additionally, the authors propose a MapReduce framework with coprocessing between the GPU and the CPU for further performance improvement. Mars is developed by Bingsheng He (HKUST) and Wenbin Fang(HKUST) under the supervision of Naga K. Govindaraju (Microsoft Corp.), Qiong Luo (HKUST), and Tuyong Wang (Sina.com). Source code of Mars can be downloaded from the authors' website. (A MapReduce Framework on Graphics Processors. Bingsheng He, Wenbin Fang, Qiong Lo, Naga K. Govindaraju, and Tuyong Want. To appear in PACT 2008.)

Posted: 16 Oct 2008 [GPGPU /Data Parallel Algorithms] #

Case studies on GPU usage and data structure design

Abstract: Big improvements in the performance of graphics processing units (GPUs) turned them into a compelling platform for high performance computing. In this thesis, we discuss the usage of NVIDIA's CUDA in two applications -- Einstein@Home, a distributed computing software, and OpenSteer, a game-like application. Our work on Einstein@Home demonstrates that CUDA can be integrated into existing applications with minimal changes, even in programs designed without considering GPU usage. However the existing data structure of Einstein@Home performs poorly when used on the GPU. We demonstrate that using a redesigned data structure improves the performance to about three times as fast as the original CPU version, even though the code executed on the device is not optimized. We further discuss the design of a novel spatial data structure called "dynamic grid" that is optimized for CUDA usage. We measure its performance by integrating it into the Boids scenario of OpenSteer. Our new concept outperforms a uniform grid by a margin of up to 15%, even though the dynamic grid still provides optimization potential. (Case studies on gpu usage and data structure design. J. Breitbart, Master's thesis, Universität Kassel, 2008)

Posted: 11 Aug 2008 [GPGPU /Data Parallel Algorithms] #

A Fast Similarity Join Algorithm Using Graphics Processing Units

This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common operation in spatial databases. A similarity join takes two sets of points A, B and returns pairs pA, qB where the distance D(p,q) ≤ ε. The similarity join is a common spatial database operation with many applications. An algorithm named LSS is presented that executes on a GPU, taking advantage of the GPU's parallelism and large data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiencies and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)

Posted: 25 May 2008 [GPGPU /Data Parallel Algorithms] #

Graph Layout on the GPU

A graph is an ordered pair G=(V,E) where V is a set of nodes and E is a set of edges connecting nodes. Graph drawing addresses the problem of creating geometric representations of graphs. Unlike matrices or images, graphs are unstructured and hence graph layout does not seem to be suitable for acceleration on the GPU. We present two GPU-accelerated graph drawing algorithms which are able to quickly compute aesthetic layouts of large graphs. One is for the layout of a single graph and the other is for computing stable layouts of a sequence of graphs. Speedups of 5.5x to 17x relative to a CPU implementation are demonstrated. (Yaniv Frishman and Ayellet Tal, Multi-Level Graph Layout on the GPU, IEEE Transactions on Visualization and Computer Graphics (Proceedings Information Visualization 2007), 13(6):1310-1317, 2007)
(Yaniv Frishman and Ayellet Tal, Online Dynamic Graph Drawing, accepted to IEEE Transactions on Visualization and Computer Graphics)

Posted: 25 May 2008 [GPGPU /Data Parallel Algorithms] #

Relational Joins on Graphics Processors

Abstract: "We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts. (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.)

Posted: 02 Apr 2008 [GPGPU /Database] #

Ph.D. Dissertation: Glift Generic GPU Data Structures, by Aaron Lefohn

This Ph.D. dissertation by Aaron Lefohn at the University of California, Davis describes the Glift GPU data structure abstraction and its application to both GPU-based data-parallel and interactive rendering algorithms. The applications include octree 3D painting, adaptive shadow maps, resolution matched shadow maps, heat-diffusion depth-of-field, and a GPU-based direct solver for tridiagonal linear systems. While much of this work has been posted previously, this dissertation contains a more in-depth discussion of the Glift data structure library and introduces several GPGPU and rendering algorithms that are not yet published. This dissertation demonstrates that a data structure abstraction for GPUs can simplify the description of new and existing data structures, stimulate development of complex GPU algorithms, and perform equivalently to hand-coded implementations. The dissertation also presents a case that future interactive rendering solutions will be an inseparable mix of general-purpose, data-parallel algorithms and traditional graphics programming. (Aaron Lefohn, "Glift: Generic Data Structures for Graphics Hardware," Ph.D. dissertation, Computer Science Department, University of California Davis, September 2006.)

Posted: 18 Jan 2007 [GPGPU /Data Parallel Algorithms] #

GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

This paper presents a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). This approach is competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is also very fast from a practical viewpoint. The paper presents an implementation on modern programmable graphics hardware (GPUs). On recent GPUs this optimal parallel sorting approach has shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. (GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures Alexander Gress and Gabriel Zachmann. Proc. 20th IEEE Int'l Parallel and Distributed Processing Symposium (IPDPS), 2006.)

Posted: 14 Aug 2006 [GPGPU /Database/Sort & Search] #

SIGGRAPH Poster: GPU Histogram Computation

This SIGGRAPH poster by Oliver Fluck et al. presents an approach to computing histograms in fragment shaders. The proposed method enables iterative and histogram-guided algorithms to run on GPUs and avoids data transfer between the GPU and main memory. The algorithm has been demonstrated using the example of a GPU level set segmentation. (GPU Histogram Computation)

Posted: 10 Aug 2006 [GPGPU /Data Parallel Algorithms] #

A work-efficient step-efficient prefix-sum algorithm

This extended abstract by Sengupta et al. presents a work-efficient step-efficient prefix-sum algorithm. This algorithm achieves a three to four fold speedup over the step-efficient prefix-sum algorithm presented by Daniel Horn in GPU Gems 2. It can also be tuned to efficiently run on future hardware which would have a higher degree of parallelism. (A work-efficient step-efficient prefix-sum algorithm Shubhabrata Sengupta, Aaron E. Lefohn, John D. Owens in in Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures.)

Posted: 25 Jun 2006 [GPGPU /Data Parallel Algorithms] #

GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management

GPUTeraSort sorts billion-record wide-key databases using the data and task parallelism on the graphics processing unit (GPU) to perform memory-intensive and compute-intensive tasks while the CPU performs I/O and resource management. It exploits both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface to achieve higher aggregate memory bandwidth than purely CPU-based algorithms. It also pipelines disk transfers to achieve near-peak I/O performance. GPUTera-Sort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. We tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny; the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks. (GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management . Naga K. Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. Proceedings of ACM SIGMOD 2006.)

Posted: 04 Apr 2006 [GPGPU /Database] #

Data Visualization and Mining using the GPU

Sudipto Guha, Shankar Krishnan and Suresh Venkatasubramanian are presenting a tutorial on the use of the GPU for data visualization and mining at the ACM International Conference on Knowledge Discovery and Data Mining (KDD 2005). ( Data Visualization and Mining on the GPU)

Posted: 28 Jul 2005 [GPGPU /Database] #

High Performance Sorting on a GPU

This paper by Govindaraju et al. describes a cache-efficient bitonic sorting algorithm on GPUs. The algorithm uses the special purpose texture mapping and programmable hardware to sort IEEE 32-bit floating point data including pointers, and has been used to perform stream data mining and relational database queries. Their results indicate a significant performance improvement over prior CPU-based and GPU-based sorting algorithms. ( GPUSORT: A High Performance Sorting Library" . Also see this Tom's Hardware article)

Posted: 01 Jul 2005 [GPGPU /Database/Sort & Search] #

Hardware Acceleration for Spatial Database Operations

These works from the Database Systems Lab at UC Santa Barbara describe how a graphics processor can be effectively used to accelerate the performance of spatial database (GIS databases) operations. Spatial database operations, especially which involve polygon datasets, have been known to be computationally expensive. Sun et al. describe a novel hardware / software co-processing technique which uses basic features of a GPU to reduce the spatial query processing cost. Experimental evaluation shows that their hardware-based approach can significantly outperform leading software-based techniques. (Hardware Acceleration for Spatial Selections and Joins Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. Proceedings of SIGMOD 2003.) However, this evaluation is done in a stand-alone setting where there are no indices, preprocessing or other optimizations available in a database. Bandi et al. extend Sun et al.'s work and integrate the hardware-based technique into a popular commercial database. Rigorous experimentation over real-life data sets shows that the hardware-based approach is very effective and can be complimentary to the optimizations available in a commercial database setting. (Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations Nagender Bandi, Chengyu Sun, Divyakant Agrawal, Amr El Abbadi to appear in VLDB 2004.)

Posted: 19 Jul 2004 [GPGPU /Database] #

Fast Database Operations using Graphics Processors

This paper by Govindaraju et al. describes new algorithms for performing fast computation of several common database operations on commodity graphics processors. Specifically, the paper considers operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. The proposed algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. These algorithms have been implemented on a programmable GPU (e.g. NVIDIA's GeForce FX 5900) and applied to databases consisting of up to a million records. The paper compares their performance with an optimized CPU-based implementation. The experiments indicate that the graphics processor available on commodity computer systems is an effective coprocessor for performing database operations. (Fast Database Operations using Graphics Processors. Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming C. Lin, Dinesh Manocha to appear at SIGMOD 2004.)

Posted: 11 Jun 2004 [GPGPU /Database] #


Categories