<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>GPGPU</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi</link>
<description>General Purpose Computation Using Graphics Hardware</description>
<language>en</language>
<generator>Blosxom</generator>
<ttl>180</ttl>

<item>
<title>CUDPP 1.0a Adds Segmented Scan and Sparse Matrix-Vector Multiplication</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Miscellaneous/Developer Resources/cudpp_1.0a.html</link>
<description><![CDATA[Version 1.0 alpha of &lt;a href=&quot;http://www.gpgpu.org/developer/cudpp&quot;&gt;CUDPP&lt;/a&gt;, the CUDA Data-Parallel Algorithms Library, has been released. This version adds the &lt;a href=&quot;http://www.gpgpu.org/developer/cudpp/rel/cudpp_1.0a/html/group__public_interface.html#g34b57db9dc1207031be684922559cf8d&quot;&gt;segmented scan&lt;/a&gt; algorithm and &lt;a href=&quot;http://www.gpgpu.org/developer/cudpp/rel/cudpp_1.0a/html/group__public_interface.html#g1a1c352037b5fd16a5468b0dae3b4cae&quot;&gt;sparse matrix-vector multiplication&lt;/a&gt; to CUDPP's repertoire. Other new features include an improved &quot;plan&quot;-based configuration interface, an improved scan algorithm for higher performance, support for inclusive scans and additional scan operators, and an improved stream compaction interface. In addition, CUDPP 1.0a adds support for CUDA 2.0 and for the Windows Vista and Mac OS X (10.5.2 and higher) operating systems. CUDPP works with NVIDIA &lt;a href=&quot;http://www.nvidia.com/cuda&quot;&gt;CUDA&lt;/a&gt; versions 1.1 and higher.]]></description>
<pubDate>Sun, 20 Apr 2008 18:02:00 PST</pubDate>
<category>Developer Resources</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Miscellaneous/Developer Resources/cudpp_1.0a.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Shader Maker: a simple, truly cross-platform GLSL editor</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Tools/zachmanshadermaker08.html</link>
<description><![CDATA[Shader Maker is a simple, cross-platform GLSL editor that runs on Windows, Linux, and Mac OS X. It provides the basics of a shader editor so that students can start writing their own shaders as quickly as possible. Features include syntax highlighting in the GLSL editors; vertex, fragment, and geometry shader editors; interactive editing of uniform variables; light source parameters; predefined simple shapes (e.g., a torus); a simple OBJ loader; and more. (&lt;a href=&quot;http://cg.in.tu-clausthal.de/publications.shtml#shader_maker&quot;&gt;http://cg.in.tu-clausthal.de/publications.shtml#shader_maker&lt;/a&gt;)]]></description>
<pubDate>Sun, 20 Apr 2008 16:22:00 PST</pubDate>
<category>Tools</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Tools/zachmanshadermaker08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>GRIP - A Rugged GPU Accelerated Image Processing System</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/GPUs/grippc08.html</link>
<description><![CDATA[Vision4ce launched a new line of General-purpose Rugged Image Processing (GRIP) products at the SPIE Defense and Security Symposium, held in Orlando on 18-20 March 2008. The GRIP-Beta demonstrated cutting-edge GPGPU-based image processing, analog and Gigabit Ethernet video streaming, and the functionality of the Gripworkx image processing framework. With GRIP, Vision4ce offers a cost-effective, readily available rugged solution for embedded computing problems that would otherwise call for more expensive and time-consuming FPGA-based development. See &lt;a href=&quot;http://www.vision4ce.com&quot;&gt;www.vision4ce.com&lt;/a&gt; for more information.]]></description>
<pubDate>Sun, 20 Apr 2008 16:20:00 PST</pubDate>
<category>GPUs</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/GPUs/grippc08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>SHARCNET Symposium on GPU and CELL Computing</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Conferences/sharcnet08.html</link>
<description><![CDATA[&lt;p&gt;University of Waterloo&lt;br/&gt;
Waterloo, Ontario, Canada&lt;br/&gt;
May 27th 2008
&lt;/p&gt;
&lt;p&gt;This one-day symposium will explore the use of GPUs, CELL processors, FPGAs, and multi-core CPUs for large-scale scientific computing. The symposium program includes invited talks on the LANL Roadrunner CELL supercomputer, the RapidMind platform for multi-core CPUs and many-core accelerators, and NVIDIA CUDA. For more information, see &lt;a href=&quot;http://www.sharcnet.ca/events/ssgc2008/&quot;&gt;http://www.sharcnet.ca/events/ssgc2008/&lt;/a&gt;.&lt;/p&gt;]]></description>
<pubDate>Sun, 20 Apr 2008 16:17:00 PST</pubDate>
<category>Conferences</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Conferences/sharcnet08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>gDEBugger V4.0 Adds Linux Support and a Buffer Viewer</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Tools/gDebugger4.0.html</link>
<description><![CDATA[The new &lt;a href=&quot;http://www.gremedy.com&quot;&gt;gDEBugger V4.0&lt;/a&gt; introduces gDEBugger Linux, which adds 32-bit and 64-bit Linux support and brings all of gDEBugger's debugging and profiling capabilities to OpenGL developers on Linux. A new Texture and Buffer Viewer displays textures, static buffers, and pbuffers as images or as raw data in their original format, including non-RGB data formats (float, depth, integer, luminance, etc.). This version also includes significant performance improvements. gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API so that programmers can see what is happening inside the graphics system implementation, find bugs, and optimize OpenGL application performance. (&lt;a href=&quot;http://www.gremedy.com&quot;&gt;http://www.gremedy.com&lt;/a&gt;)]]></description>
<pubDate>Wed, 02 Apr 2008 05:02:00 PST</pubDate>
<category>Tools</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Tools/gDebugger4.0.html</guid>
<author>Administrator</author>
</item>
<item>
<title>CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/manavskiBMCBio08.html</link>
<description><![CDATA[The Smith-Waterman algorithm has been available for more than 25 years. It uses dynamic programming to explore all possible alignments between two biological sequences and returns the optimal local alignment. Unfortunately, its computational cost is very high: the number of operations is proportional to the product of the lengths of the two sequences. &lt;a href=&quot;http://www.biomedcentral.com/1471-2105/9/S2/S10&quot;&gt;This paper&lt;/a&gt; by &lt;a href=&quot;http://www.manavski.com&quot;&gt;Svetlin Manavski&lt;/a&gt; and &lt;a href=&quot;http://grup.cribi.unipd.it/%7Evalle/&quot;&gt;Giorgio Valle&lt;/a&gt; describes &lt;a href=&quot;http://bioinformatics.cribi.unipd.it/cuda/&quot;&gt;SmithWaterman-CUDA&lt;/a&gt;, an open-source project for fast sequence alignment on the GPU. Although the software computes the optimal Smith-Waterman alignment, it is faster than heuristic approaches such as FASTA and BLAST. Tests on protein databases show speedups of up to 30x over reference CPU implementations. (Svetlin A. Manavski, Giorgio Valle, &lt;a href=&quot;http://www.biomedcentral.com/1471-2105/9/S2/S10&quot;&gt;CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment&lt;/a&gt;, BMC Bioinformatics 2008, 9(Suppl 2):S10 (26 March 2008))]]></description>
<pubDate>Wed, 02 Apr 2008 04:57:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/manavskiBMCBio08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Relational Joins on Graphics Processors</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Database/hesigmod08.html</link>
<description><![CDATA[Abstract: &quot;We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.&quot; (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. &lt;a href=&quot;http://www.cse.ust.hk/catalac/papers/gpujoin_sigmod08.pdf&quot;&gt;Relational Joins on Graphics Processors&lt;/a&gt;. ACM SIGMOD 2008.)]]></description>
<pubDate>Wed, 02 Apr 2008 04:53:00 PST</pubDate>
<category>Database</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Database/hesigmod08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>A SIMD interpreter for Genetic Programming on GPU Graphics Cards</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/langdonGP08.html</link>
<description><![CDATA[Abstract: &quot;Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations directly on parallel consumer gaming graphics processing units. Using a Linux KDE computer equipped with an nVidia GeForce 8800 GTX graphics processing unit card the C++ SPMD interpreter evolves programs at Giga GP operations per second (895 million GPops). We use the RapidMind general processing on GPU (GPGPU) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation (RPN) tree based GP is given.&quot; (&lt;a href=&quot;http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/langdon_2008_eurogp.pdf&quot;&gt;A SIMD interpreter for Genetic Programming on GPU Graphics Cards&lt;/a&gt;. W.B. Langdon and W. Banzhaf. In M. O'Neill, L. Vanneschi, A.I. Esparcia Alcazar, S. Gustafson, eds., EuroGP 2008, pp. 73-85. Springer, LNCS 4971, 26-28 March, Naples.)]]></description>
<pubDate>Wed, 02 Apr 2008 04:49:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/langdonGP08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>GPGPU Based Image Segmentation Livewire Algorithm Implementation</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Image And Volume Processing/baggioImageSegmentation08.html</link>
<description><![CDATA[This thesis presents a GPU implementation of the Livewire algorithm. The algorithm is divided into three phases: convolution with a Sobel or Laplacian filter, modeling the image as a grid graph, and solving the single-source shortest path problem on that graph, whose edge weights are non-negative. To compute the shortest path, an adapted version of the delta-stepping algorithm was developed for GPUs using CUDA. Analysis of the results shows large speedups for the image filtering stage. On the other hand, heavy use of dependent device-memory lookups prevented the delta-stepping implementation from outperforming the CPU implementation, although better performance is expected for wider graphs. In addition to demonstrating the viability of a GPU implementation of Livewire, this thesis provides an open-source, GPU-based image segmentation application that can serve as an example for future GPU algorithm implementations, available at &lt;a href=&quot;http://code.google.com/p/gpuwire/&quot;&gt;http://code.google.com/p/gpuwire/&lt;/a&gt;.]]></description>
<pubDate>Tue, 01 Apr 2008 09:46:00 PST</pubDate>
<category>Image And Volume Processing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Image And Volume Processing/baggioImageSegmentation08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Quantum Chemistry on GPUs</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/ufimtsevCTC08.html</link>
<description><![CDATA[Ivan Ufimtsev and &lt;a href=&quot;http://mtzweb.scs.uiuc.edu&quot;&gt;Todd Martínez&lt;/a&gt; at the &lt;a href=&quot;http://www.uiuc.edu&quot;&gt;University of Illinois at Urbana-Champaign&lt;/a&gt; have implemented an efficient method of calculating two-electron repulsion integrals over Gaussian basis functions on the GPU. Virtually all modern quantum chemical calculations require evaluating millions to billions of these integrals. This problem turns out to be well-suited to the massively parallel architecture of GPUs by an appropriate  partitioning of the problem. A &lt;a href=&quot;http://pubs.acs.org/cgi-bin/abstract.cgi/jctcce/2008/4/i02/abs/ct700268q.html&quot;&gt;benchmark test&lt;/a&gt; performed for the evaluation of approximately one million (ss|ss) integrals over contracted s-orbitals showed that a naïve algorithm implemented on the GPU achieves up to 130-fold speedup over a traditional CPU implementation on an AMD Opteron. Subsequent calculations on a 256-atom DNA strand show that the GPU advantage is maintained for basis sets including higher angular momentum functions.
(&lt;a href=&quot;http://pubs.acs.org/cgi-bin/abstract.cgi/jctcce/2008/4/i02/abs/ct700268q.html&quot;&gt;Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation&lt;/a&gt;, Ivan S. Ufimtsev and Todd J. Martínez, &lt;i&gt;J. Chem. Theory Comput.&lt;/i&gt;, 4 (2), 222-231, 2008. doi:10.1021/ct700268q)]]></description>
<pubDate>Tue, 01 Apr 2008 09:42:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/ufimtsevCTC08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>A Flexible Kernel for Adaptive Mesh Refinement on GPU</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computational Geometry/Surfaces and Modeling/boubekeurCGF08.html</link>
<description><![CDATA[This paper by &lt;a href=&quot;http://user.cs.tu-berlin.de/~boubek&quot;&gt;Boubekeur&lt;/a&gt; (TU Berlin) and Schlick (INRIA) presents a flexible GPU kernel for adaptive on-the-fly refinement of meshes with arbitrary topology. By reserving a small amount of GPU memory to store a set of adaptive refinement patterns, the GPU performs on-the-fly refinement without any preprocessing or additional topology data structure. The level of adaptive refinement can be controlled by specifying a per-vertex depth tag in addition to the usual position, normal, color and texture coordinates. The kernel uses this depth tag to instantiate the correct refinement pattern. Finally, the refined patch produced for each triangle can be displaced by the vertex shader using any kind of geometric refinement, such as Bézier patch smoothing, scalar-valued displacement, procedural geometry synthesis, or subdivision surfaces. This refinement engine requires no multi-pass rendering, fragment processing, or special preprocessing of the input mesh structure, and it can be implemented on any GPU with vertex shading capabilities. (&lt;a href=&quot;http://iparla.labri.fr/publications/2008/BS08/&quot;&gt;A Flexible Kernel for Adaptive Mesh Refinement on GPU&lt;/a&gt;, Tamy Boubekeur and Christophe Schlick, Computer Graphics Forum, 2008.)]]></description>
<pubDate>Tue, 01 Apr 2008 09:29:00 PST</pubDate>
<category>Surfaces and Modeling</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computational Geometry/Surfaces and Modeling/boubekeurCGF08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/vogtPhysChem08.html</link>
<description><![CDATA[In this paper, we describe a modification of a general-purpose code for quantum mechanical calculations of molecular properties (Q-Chem) to use a graphical processing unit. We report a 4.3x speedup in the execution time of resolution-of-the-identity second-order Møller-Plesset perturbation theory for single-point energy calculations of linear alkanes. Furthermore, we obtain the correlation and total energies of n-octane conformers as the torsional angle of the central bond is rotated, to show that precision is not lost for these types of calculations. The code modification uses the NVIDIA CUDA Basic Linear Algebra Subprograms (CUBLAS) library on an NVIDIA Quadro FX 5600 graphics card. Finally, we anticipate further speedups for other electronic structure calculations based on matrix algebra using a similar approach. (&lt;a href=&quot;http://pubs.acs.org/cgi-bin/abstract.cgi/jpcafh/asap/abs/jp0776762.html&quot;&gt;Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units&lt;/a&gt;. Vogt, L., Olivares-Amaya, R., Kermes, S., Shao, Y., Amador-Bedolla, C., and Aspuru-Guzik, A. &lt;i&gt;J. Phys. Chem. A&lt;/i&gt;, 2008, DOI: 10.1021/jp0776762)]]></description>
<pubDate>Sun, 10 Feb 2008 13:23:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/vogtPhysChem08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Microprocessor Report: Parallel Processing With CUDA</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Press/halfhillMicroprocessorReport08.html</link>
<description><![CDATA[&lt;a href=&quot;http://www.mdronline.com/watch/watch_Issue.asp?Volname=Issue+%23012808&amp;on=1#item2&quot;&gt;This article&lt;/a&gt; in the January 28, 2008 issue of Microprocessor Report discusses parallel computing with massive multiprocessing on GPUs using NVIDIA CUDA.  While the full article requires a subscription, a summary is available &lt;a href=&quot;http://www.mdronline.com/watch/watch_Issue.asp?Volname=Issue+%23012808&amp;on=1#item2&quot;&gt;here&lt;/a&gt;.]]></description>
<pubDate>Sun, 10 Feb 2008 13:20:00 PST</pubDate>
<category>Press</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Press/halfhillMicroprocessorReport08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>GPGPUs: Neat Idea or Disruptive Technology?</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Press/farberSciComputing08.html</link>
<description><![CDATA[&quot;General purpose graphics processing units can perform amazingly well when used effectively.&quot; &lt;a href=&quot;http://www.scimag.com/ShowPR.aspx?PUBCODE=030&amp;ACCT=3000000100&amp;ISSUE=0801&amp;RELTYPE=PR&amp;ORIGRELTYPE=HPCC&amp;PRODCODE=00000000&amp;PRODLETT=C&amp;CommonCount=0&quot;&gt;This article&lt;/a&gt; by Rob Farber at &lt;a href=&quot;http://www.scimag.com&quot;&gt;Scientific Computing&lt;/a&gt; provides a brief high-level discussion of GPGPU and NVIDIA CUDA.
]]></description>
<pubDate>Sun, 10 Feb 2008 13:16:00 PST</pubDate>
<category>Press</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Press/farberSciComputing08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>High-throughput sequence alignment using Graphics Processing Units</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/schatzSequenceAlignment07.html</link>
<description><![CDATA[The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. &lt;a href=&quot;http://www.biomedcentral.com/1471-2105/8/474/abstract&quot;&gt;This paper&lt;/a&gt; by University of Maryland researchers &lt;a href=&quot;http://www.cbcb.umd.edu/~mschatz&quot;&gt;Michael Schatz&lt;/a&gt;, &lt;a href=&quot;http://www.cs.umd.edu/~cole&quot;&gt;Cole Trapnell&lt;/a&gt;, &lt;a href=&quot;http://www.cbcb.umd.edu/~adelcher&quot;&gt;Art Delcher&lt;/a&gt;, and &lt;a href=&quot;http://www.cs.umd.edu/~varshney&quot;&gt;Amitabh Varshney&lt;/a&gt; describes &lt;a href=&quot;http://mummergpu.sourceforge.net/&quot;&gt;MUMmerGPU&lt;/a&gt;, an open-source high-throughput parallel pairwise local sequence alignment program that runs on GPUs. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from NVIDIA to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, despite the very low arithmetic intensity of the task. (&lt;a href=&quot;http://www.biomedcentral.com/1471-2105/8/474/abstract&quot;&gt;High-throughput sequence alignment using Graphics Processing Units&lt;/a&gt;, Schatz, M.C., Trapnell, C., Delcher, A.L., Varshney, A. (2007), BMC Bioinformatics 8:474.)
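As a rough illustration of matching many queries against one indexed reference, the Python sketch below builds a naive suffix trie (quadratic in size, so only suitable for short references; MUMmerGPU itself uses a proper suffix tree stored on the GPU) and reports, for every query offset, the longest exact match of at least min_len characters. This is a simplification: it does not filter for left-maximal matches. The class, function names, and output tuple layout are hypothetical, and the loop over queries runs serially here even though each query is independent.

```python
class TrieNode:
    __slots__ = ("children", "ref_start")

    def __init__(self, ref_start):
        self.children = {}          # character -> TrieNode
        self.ref_start = ref_start  # start of one reference suffix that passes through this node

def build_suffix_trie(reference):
    """Insert every suffix of the reference into a character trie (naive O(n^2) build)."""
    root = TrieNode(0)
    for start in range(len(reference)):
        node = root
        for ch in reference[start:]:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode(start)
                node.children[ch] = child
            node = child
    return root

def longest_match(root, query, qpos):
    """Follow query[qpos:] down from the root; return (match length, reference start)."""
    node, length = root, 0
    for ch in query[qpos:]:
        child = node.children.get(ch)
        if child is None:
            break
        node, length = child, length + 1
    return length, node.ref_start

def align_queries(reference, queries, min_len):
    """Report (query id, query offset, reference offset, length) for matches >= min_len.
    Each query is handled independently, which is what a GPU version can parallelize."""
    root = build_suffix_trie(reference)
    results = []
    for qid, query in enumerate(queries):
        for qpos in range(len(query)):
            length, rpos = longest_match(root, query, qpos)
            if length >= min_len:
                results.append((qid, qpos, rpos, length))
    return results
```

For example, align_queries("ACGTACGGT", ["TACGG"], 3) returns [(0, 0, 3, 5), (0, 1, 4, 4), (0, 2, 5, 3)]: the whole query matches the reference at offset 3, and the shorter suffix matches are reported as well.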
]]></description>
<pubDate>Sun, 10 Feb 2008 13:11:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/schatzSequenceAlignment07.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Applying graphics hardware to achieve extremely fast geometric pattern matching</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computational Geometry/aigerGeomPatternMatching08.html</link>
<description><![CDATA[Abstract: &quot;We present a GPU-based approach to geometric pattern matching. We reduce this problem to finding the depth (maximally covered point) of an arrangement of polytopes in transformation space and describe hardware-assisted (GPU) algorithms which exploit the available set of graphics operations to perform a fast rasterized depth computation.&quot; (&lt;a href=&quot;http://www.cs.bgu.ac.il/~aiger/EWCG07_GPU.pdf&quot;&gt;Applying graphics hardware to achieve extremely fast geometric pattern matching in two and three dimensional transformation space&lt;/a&gt;. Dror Aiger and Klara Kedem. &lt;i&gt;&lt;a href=&quot;http://dx.doi.org/10.1016/j.ipl.2007.09.003&quot;&gt;Information Processing Letters&lt;/a&gt;&lt;/i&gt;. 2008.)
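A minimal CPU sketch of the rasterized-depth idea, assuming the simplest case of 2D translations and square (L-infinity) tolerance regions: each pattern/model point pair contributes one square of feasible translations, every square is rasterized onto a grid of candidate translations by incrementing counters (the role additive blending would play on a GPU), and the grid sample with the highest count approximates the deepest point. Depth here counts covering pairs, not distinct pattern points. The function name, grid parameters, and tolerance model are illustrative assumptions, not the authors' algorithm.

```python
import math

def max_depth_translation(pattern, model, eps, t_min, t_max, cell):
    """Approximate the deepest point of the arrangement of squares
    S(p, q) = {t : |p + t - q|_inf <= eps} over all pattern/model pairs (p, q).

    Candidate translations are sampled on a square grid over [t_min, t_max]^2
    with spacing `cell`; each square increments the counter of every sample it
    covers. Returns (depth, (tx, ty)) for the sample with the highest count.
    """
    n = int(round((t_max - t_min) / cell)) + 1  # samples per axis
    counts = [[0] * n for _ in range(n)]
    for px, py in pattern:
        for qx, qy in model:
            cx, cy = qx - px, qy - py  # center of S(p, q) in translation space
            # Sample indices whose coordinates fall inside [c - eps, c + eps], clipped to the grid
            i0 = max(0, math.ceil((cx - eps - t_min) / cell))
            i1 = min(n - 1, math.floor((cx + eps - t_min) / cell))
            j0 = max(0, math.ceil((cy - eps - t_min) / cell))
            j1 = min(n - 1, math.floor((cy + eps - t_min) / cell))
            for i in range(i0, i1 + 1):
                for j in range(j0, j1 + 1):
                    counts[i][j] += 1
    depth, i, j = max((counts[i][j], i, j) for i in range(n) for j in range(n))
    return depth, (t_min + i * cell, t_min + j * cell)
```

With pattern [(0, 0), (1, 0), (0, 1)], model [(3, -2), (4, -2), (3, -1), (10, 10)], eps 0.25, and a unit grid over [-5, 5], the sketch reports depth 3 at translation (3, -2). Grid resolution trades accuracy for work, which is the same trade-off a rasterization-based GPU approach makes.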

 ]]></description>
<pubDate>Thu, 24 Jan 2008 08:17:00 PST</pubDate>
<category>Computational Geometry</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computational Geometry/aigerGeomPatternMatching08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/tenlladoDWT08.html</link>
<description><![CDATA[Abstract: &quot;The widespread usage of the Discrete Wavelet Transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as Filter Bank (FBS) and Lifting (LS), and have always concluded that Lifting is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current generation GPUs. In our experiments, the actual FBS gains range between 10% and 140%, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future generation GPUs.&quot; (&lt;a href=&quot;http://doi.ieeecomputersociety.org/10.1109/TPDS.2007.70716&quot;&gt;Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting&lt;/a&gt;. Christian Tenllado, Javier Setoain, Manuel Prieto, Luis Piñuel, and Francisco Tirado. &lt;i&gt;IEEE Transactions on Parallel and Distributed Systems&lt;/i&gt;, vol. 19, no. 3, pp. 299-310, March 2008.)]]></description>
<pubDate>Thu, 24 Jan 2008 08:09:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/tenlladoDWT08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computer Architecture/fungDynamicWarpFormation08.html</link>
<description><![CDATA[Abstract: &quot;Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead incurred by control hardware. Scalar threads are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One approach is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance. In this paper, we explore mechanisms for more efficient SIMD branch execution on GPUs. We show that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.&quot; (Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt, &lt;a href=&quot;http://www.ece.ubc.ca/~aamodt/papers/wwlfung.micro2007.pdf&quot;&gt;Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow&lt;/a&gt;, in the 40th IEEE/ACM International Symposium on Microarchitecture (&lt;a href=&quot;http://www.microarch.org/micro40/&quot;&gt;MICRO-40&lt;/a&gt;), Chicago, IL, December 1-5, 2007.)]]></description>
<pubDate>Thu, 17 Jan 2008 20:42:00 PST</pubDate>
<category>Computer Architecture</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Computer Architecture/fungDynamicWarpFormation08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Toward efficient GPU-accelerated N-body simulations</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/stockNBody08.html</link>
<description><![CDATA[Abstract: &quot;N-body algorithms are applicable to a number of common problems in computational physics including gravitation, electrostatics, and fluid dynamics. Fast algorithms (those with better than O(N&lt;sup&gt;2&lt;/sup&gt;) performance) exist, but have not been successfully implemented on GPU hardware for practical problems. In the present work, we introduce not only best-in-class performance for a multipole-accelerated treecode method, but a series of improvements that support implementation of this solver on highly-data-parallel graphics processing units (GPUs). The greatly reduced computation times suggest that this problem is ideally suited for the current and next generations of single and cluster CPU-GPU architectures. We believe that this is an ideal method for practical computation of large-scale turbulent flows on future supercomputing hardware using parallel vortex particle methods.&quot; (Mark J. Stock and Adrin Gharakhani, &quot;Toward efficient GPU-accelerated N-body simulations,&quot; in &lt;a href=&quot;http://www.aiaa.org/agenda.cfm?lumeetingid=1065&amp;viewcon=agenda&amp;pageview=2&amp;programSeeview=1&amp;formatview=2&quot;&gt;46th AIAA Aerospace Sciences Meeting and Exhibit&lt;/a&gt;, AIAA 2008-608, January 2008, Reno, Nevada.)]]></description>
<pubDate>Thu, 17 Jan 2008 20:30:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/stockNBody08.html</guid>
<author>Administrator</author>
</item>
<item>
<title>Acceleration of a 3D Euler Solver Using Commodity Graphics Hardware</title>
<link>http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/brandvikEuler08.html</link>
<description><![CDATA[Abstract: &quot;The porting of two- and three-dimensional Euler solvers from a conventional CPU implementation to the novel target platform of the Graphics Processing Unit (GPU) is described. The motivation for such an effort is the impressive performance that GPUs offer: typically 10 times more floating point operations per second than a modern CPU, with over 100 processing cores and all at a very modest financial cost. Both codes were found to generate the same results on the GPU as the FORTRAN versions did on the CPU. The 2D solver ran up to 29 times quicker on the GPU than on the CPU; the 3D solver 16 times faster.&quot; (Tobias Brandvik and Graham Pullan, &lt;a href=&quot;http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf&quot;&gt;Acceleration of a 3D Euler Solver Using Commodity Graphics Hardware&lt;/a&gt;. 46th AIAA Aerospace Sciences Meeting and Exhibit.  January, 2008.)]]></description>
<pubDate>Thu, 17 Jan 2008 18:39:00 PST</pubDate>
<category>Scientific Computing</category>
<guid isPermaLink="true">http://www.gpgpu.org/cgi-bin/blosxom.cgi/Scientific Computing/brandvikEuler08.html</guid>
<author>Administrator</author>
</item></channel>
</rss>