<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GPGPUSearch results for 'supercomputing+2009' (page 1 of 15)</title>
	<atom:link href="http://gpgpu.org/search/supercomputing+2009/feed/rss2/" rel="self" type="application/rss+xml" />
	<link>http://gpgpu.org</link>
	<description>General-Purpose Computation on Graphics Hardware</description>
	<lastBuildDate>Mon, 06 Feb 2012 04:59:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Supercomputing 2009 Tutorial: High-Performance Computing with CUDA</title>
		<link>http://gpgpu.org/2009/11/30/sc2009-cuda-tutorial</link>
		<comments>http://gpgpu.org/2009/11/30/sc2009-cuda-tutorial#comments</comments>
		<pubDate>Tue, 01 Dec 2009 04:54:34 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Supercomputing]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1994</guid>
		<description><![CDATA[The presentation slides from the Supercomputing 2009 full-day tutorial “High-Performance Computing with CUDA” are now available at http://gpgpu.org/sc2009 . Abstract: NVIDIA’s CUDA is a general-purpose architecture for writing highly parallel applications. CUDA provides several key abstractions—a hierarchy of thread blocks, shared memory, and barrier synchronization—for &#8230;]]></description>
			<content:encoded><![CDATA[<p>The presentation slides from the <a href="http://sc09.supercomputing.org/" target="_blank">Supercomputing 2009</a> full-day tutorial &#8220;High-Performance Computing with CUDA&#8221; are now available at <a href="http://gpgpu.org/sc2009">http://gpgpu.org/sc2009</a>.</p>
<p>Abstract:</p>
<blockquote><p>NVIDIA’s CUDA is a general-purpose architecture for writing highly parallel applications. CUDA provides several key abstractions—a hierarchy of thread blocks, shared memory, and barrier synchronization—for scalable high-performance parallel computing. Scientists throughout industry and academia use CUDA to achieve dramatic speedups on production and research codes. The CUDA architecture supports many languages, programming environments, and libraries including C, Fortran, OpenCL, DirectX Compute, Python, Matlab, FFT, LAPACK, etc.</p>
<p>In this tutorial NVIDIA engineers will partner with academic and industrial researchers to present CUDA and discuss its advanced use for science and engineering domains. The morning session will introduce CUDA programming, motivate its use with many brief examples from different HPC domains, and discuss tools and programming environments. The afternoon will discuss advanced issues such as optimization and sophisticated algorithms/data structures, closing with real-world case studies from domain scientists using CUDA for computational biophysics, fluid dynamics, seismic imaging, and theoretical physics.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/11/30/sc2009-cuda-tutorial/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Supercomputing 2009 CUDA Tutorial</title>
		<link>http://gpgpu.org/sc2009</link>
		<comments>http://gpgpu.org/sc2009#comments</comments>
		<pubDate>Tue, 01 Dec 2009 04:48:26 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?page_id=1991</guid>
		<description><![CDATA[High Performance Computing with CUDA Welcome to the course notes for the full-day SUPERCOMPUTING 2009 CUDA Tutorial! (Note: the slides below are also available on the NVIDIA website.) Abstract NVIDIA&#8217;s CUDA is a general purpose architecture for writing highly parallel applications. CUDA provides several key abstractions&#8211;a hierarchy of thread blocks, shared memory, and barrier synchronization&#8211;for scalable [...]]]></description>
			<content:encoded><![CDATA[<h3 style="font-size: 21px; margin-top: 0px; margin-right: 0px; margin-bottom: 20px; margin-left: 0px; outline-width: 0px; outline-style: initial; outline-color: initial; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; background-position: initial initial; padding: 0px; border: 0px initial initial;">High Performance Computing with CUDA</h3>
<p style="margin-top: 0px; margin-right: 0px; margin-bottom: 20px; margin-left: 0px; outline-width: 0px; outline-style: initial; outline-color: initial; font-size: 13px; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; font-style: italic; background-position: initial initial; padding: 0px; border: 0px initial initial;">Welcome to the course notes for the full-day <a style="outline-width: 0px; outline-style: initial; outline-color: initial; font-size: 13px; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; text-decoration: none; color: #336699; background-position: initial initial; padding: 0px; margin: 0px; border: 0px initial initial;" href="http://sc09.supercomputing.org/">SUPERCOMPUTING 2009</a> CUDA Tutorial!</p>
<p style="margin-top: 0px; margin-right: 0px; margin-bottom: 20px; margin-left: 0px; outline-width: 0px; outline-style: initial; outline-color: initial; font-size: 13px; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; font-style: italic; background-position: initial initial; padding: 0px; border: 0px initial initial;">(Note: the slides below are also available on the <a href="http://www.nvidia.com/object/SC09_Tutorial.html" target="_blank">NVIDIA website</a>.)</p>
<p><strong>Abstract</strong></p>
<p>NVIDIA&#8217;s CUDA is a general-purpose architecture for writing highly parallel applications. CUDA provides several key abstractions—a hierarchy of thread blocks, shared memory, and barrier synchronization—for scalable high-performance parallel computing. Scientists throughout industry and academia use CUDA to achieve dramatic speedups on production and research codes. The CUDA architecture supports many languages, programming environments, and libraries including C, Fortran, OpenCL, DirectX Compute, Python, Matlab, FFT, LAPACK, etc.</p>
<p>In this tutorial NVIDIA engineers will partner with academic and industrial researchers to present CUDA and discuss its advanced use for science and engineering domains. The morning session will introduce CUDA programming, motivate its use with many brief examples from different HPC domains, and discuss tools and programming environments. The afternoon will discuss advanced issues such as optimization and sophisticated algorithms/data structures, closing with real-world case studies from domain scientists using CUDA for computational biophysics, fluid dynamics, seismic imaging, and theoretical physics.</p>
<p><strong>8:30 Introduction-Overview and CUDA Basics</strong><br />
<em>David Luebke, NVIDIA<br />
</em><a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_CUDA_luebke_Intro.pdf"><strong>[Download PDF]</strong></a><strong> </strong></p>
<p><strong>9:00 CUDA Programming Environments<br />
<em>Ian Buck, NVIDIA<br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_CUDA_ProgModel_Buck.pdf"><strong><span style="font-style: normal;">[Download PDF]</span></strong></a></em></strong></p>
<p><strong>10:30 CUDA Libraries &amp; Tools<br />
<em>Jonathan Cohen, NVIDIA<br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_CUDA_Tools_Cohen.pdf"><strong><span style="font-style: normal;">[Download PDF]</span></strong></a></em></strong></p>
<p><strong>11:15 Optimizing GPU Performance and CPU-GPU Performance<br />
<em>Paulius Micikevicius, NVIDIA<br />
<span style="font-style: normal;"><a style="outline-width: 0px; outline-style: initial; outline-color: initial; font-size: 13px; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; text-decoration: none; color: #336699; background-position: initial initial; padding: 0px; margin: 0px; border: 0px initial initial;" href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Optimization_Micikevicius.pdf"><strong>[Download PDF]</strong></a></span></em></strong></p>
<p><strong>1:45 Irregular Algorithms &amp; Data Structures<br />
<em>John Owens, University of California Davis<br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Irregular_Data_Structures_Owens.pdf"><span style="font-style: normal;"><strong>[Download PDF]</strong></span></a></em></strong></p>
<p><strong>2:30 Molecular Modeling<br />
<em>John Stone, University of Illinois at Urbana-Champaign<br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Molecular_Stone.pdf"><span style="font-style: normal;"><strong>[Download PDF]</strong></span></a></em></strong></p>
<p><strong>3:30 Seismic Imaging<br />
<em>Scott Morton, Hess<br />
<strong><a style="outline-width: 0px; outline-style: initial; outline-color: initial; font-size: 13px; vertical-align: baseline; background-image: initial; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: transparent; text-decoration: none; color: #336699; background-position: initial initial; padding: 0px; margin: 0px; border: 0px initial initial;" href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Seismic_Hess.pdf">[Download PDF]</a></strong> </em></strong></p>
<p><strong>4:00 Computational Fluid Dynamics<br />
<em>Jonathan Cohen, NVIDIA<br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Fluid_Sim_Cohen.pdf"><span style="font-style: normal;"><strong>[Download PDF]</strong></span></a></em></strong></p>
<p><strong>5:00 Quantum Chromodynamics<br />
</strong><em>Michael Clark, Harvard University</em><br />
<a href="http://gpgpu.org/wp/wp-content/uploads/2009/11/SC09_Clark.pdf"><strong>[Download PDF]</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/sc2009/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Workshop on GPU Supercomputing 2009, National Taiwan University</title>
		<link>http://gpgpu.org/2009/02/03/workshop-on-gpu-supercomputing-2009-national-taiwan-university</link>
		<comments>http://gpgpu.org/2009/02/03/workshop-on-gpu-supercomputing-2009-national-taiwan-university#comments</comments>
		<pubDate>Tue, 03 Feb 2009 09:57:09 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Workshops]]></category>

		<guid isPermaLink="false">http://www.gpgpu.org/newgpgpu/?p=1131</guid>
		<description><![CDATA[The first NTU workshop on GPU supercomputing was held at NTU on January 16, 2009. Organized by the Center for Quantum Science and Engineering (CQSE) at National Taiwan University, this workshop consisted of seminars on applications of GPU/CUDA in high-performance computing in science and engineering, as well as other fields. Slides from the presentations [...]]]></description>
			<content:encoded><![CDATA[<p>The first <a title="Workshop Website" href="http://cqse.ntu.edu.tw/cqse/gpu2009.html" target="_blank">NTU workshop on GPU supercomputing</a> was held at NTU on January 16, 2009. Organized by the Center for Quantum Science and Engineering (CQSE) at National Taiwan University, this workshop consisted of seminars on applications of GPU/CUDA in high-performance computing in science and engineering, as well as other fields. <a title="Presentation Slides" href="http://cqse.ntu.edu.tw/cqse/gpu2009.html" target="_blank">Slides from the presentations</a> are now online.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/02/03/workshop-on-gpu-supercomputing-2009-national-taiwan-university/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond Programmable Shading SIGGRAPH 2009 Course</title>
		<link>http://gpgpu.org/2009/08/06/beyond-programmable-shading-siggraph-2009</link>
		<comments>http://gpgpu.org/2009/08/06/beyond-programmable-shading-siggraph-2009#comments</comments>
		<pubDate>Fri, 07 Aug 2009 00:33:20 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1784</guid>
		<description><![CDATA[The course notes and supplementary material for &#8220;Beyond Programmable Shading&#8221;, a full-day course held at SIGGRAPH 2009 on August 6, are now available online. This course is presented in two parts, Beyond Programmable Shading I and Beyond Programmable Shading II. There are strong indications that the future of interactive graphics programming is a more flexible model than [...]]]></description>
			<content:encoded><![CDATA[<p>The course notes and supplementary material for &#8220;Beyond Programmable Shading&#8221;, a full-day course held at SIGGRAPH 2009 on August 6, are <a href="http://s09.idav.ucdavis.edu/" target="_blank">now available online</a>.</p>
<p>This course is presented in two parts, <em><a href="http://www.siggraph.org/s2009/sessions/courses/details/?type=course&amp;">Beyond Programmable Shading I</a></em> and <em><a href="http://www.siggraph.org/s2009/sessions/courses/details/?type=course&amp;">Beyond Programmable Shading II</a></em>.</p>
<p>There are strong indications that the future of interactive graphics programming is a more flexible model than today&#8217;s OpenGL/Direct3D pipelines. Graphics developers need a basic understanding of how to combine emerging parallel programming techniques and more flexible graphics processors with the traditional interactive rendering pipeline. The first half of the course introduces the trends and directions in this emerging field. Topics include: parallel graphics architectures, parallel programming models for graphics, and game-developer investigations of the use of these new capabilities in future rendering engines.</p>
<p>The second half of the course has leaders from graphics hardware vendors, game development, and academic research present case studies that show how general parallel computation is being combined with the traditional graphics pipeline to boost image quality and spur new graphics algorithm innovation. Each case study discusses the mix of parallel programming constructs used, details of the graphics algorithm, and how the rendering pipeline and computation interact to achieve the technical goals. <span id="more-1784"></span>The focus is on what can currently be done, how it is done, and near-future trends. Topics include volumetric and hair lighting, alternate rendering pipelines including ray tracing and micropolygon rendering, in-frame data structure construction, and complex image processing. The course concludes with a panel, moderated by OpenGL creator Kurt Akeley, on the future of interactive graphics programming models.</p>
<p>The course presenters are experts on advanced rendering, graphics hardware, and parallel computing for graphics from academia and industry, and have presented papers and tutorials on the topic at SIGGRAPH, High Performance Graphics, Supercomputing, IEEE Visualization, and elsewhere.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/08/06/beyond-programmable-shading-siggraph-2009/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ISC 2009 CUDA/OpenCL Tutorial Slides Posted</title>
		<link>http://gpgpu.org/2009/06/25/isc-2009-tutorial</link>
		<comments>http://gpgpu.org/2009/06/25/isc-2009-tutorial#comments</comments>
		<pubDate>Fri, 26 Jun 2009 00:16:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Developer Resources]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Fluid Simulation]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Tutorials & Courses]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1715</guid>
		<description><![CDATA[A tutorial on High Performance Computing with CUDA was held at the International Supercomputing Conference in Hamburg on Monday, June 22, 2009. The tutorial included an introduction to the CUDA programming model and C for CUDA, along with details on the CUDA Toolkit, libraries, and optimization. The tutorial also provided an introduction to OpenCL, [...]]]></description>
			<content:encoded><![CDATA[<p>A tutorial on High Performance Computing with CUDA was held at the International Supercomputing Conference in Hamburg on Monday, June 22, 2009. The tutorial included an introduction to the CUDA programming model and C for CUDA, along with details on the CUDA Toolkit, libraries, and optimization. The tutorial also provided an introduction to OpenCL, and finished with a case study on Computational Fluid Dynamics by Dr. Graham Pullan from Cambridge University. <a href="http://gpgpu.org/isc2009">Slides from the tutorial</a> are now posted here on GPGPU.org.</p>
<p>(Massimiliano Fatica, Timo Stich, and Graham Pullan. <em><a href="http://gpgpu.org/isc2009">High Performance Computing with CUDA</a></em>. Tutorial. International Supercomputing Conference 2009. Hamburg, Germany.)</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/06/25/isc-2009-tutorial/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ISC 2009 CUDA Tutorial</title>
		<link>http://gpgpu.org/isc2009</link>
		<comments>http://gpgpu.org/isc2009#comments</comments>
		<pubDate>Fri, 26 Jun 2009 00:05:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?page_id=1699</guid>
		<description><![CDATA[High Performance Computing with CUDA Welcome to the course notes for the full-day CUDA Tutorial from the 2009 International Supercomputing Conference! The tutorial was held at the International Supercomputing Conference in Hamburg, Germany, on Monday, June 22, 2009. Course Organizers Dr. Massimiliano Fatica, NVIDIA Corporation Course Speakers Dr. Timo Stich, NVIDIA Corporation Dr. Graham Pullan, University [...]]]></description>
			<content:encoded><![CDATA[<h3>High Performance Computing with CUDA</h3>
<p>Welcome to the course notes for the full-day CUDA Tutorial from the <a href="http://www.supercomp.de/isc09/" target="_blank">2009 International Supercomputing Conference</a>!</p>
<p>The tutorial was held at the International Supercomputing Conference in Hamburg, Germany, on Monday, June 22, 2009.</p>
<h4><strong>Course Organizers</strong></h4>
<p>Dr. Massimiliano Fatica, <a href="http://www.nvidia.com/">NVIDIA Corporation</a></p>
<h4>Course Speakers</h4>
<p>Dr. Timo Stich, <a href="http://www.nvidia.com/">NVIDIA Corporation</a><br />
<a href="http://www.eng.cam.ac.uk/~gp10006/" target="_blank">Dr. Graham Pullan</a>, University of Cambridge, UK</p>
<h4>Tutorial Slides</h4>
<ul>
<li>Introduction to GPU Computing <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/01-Intro.pdf">(PDF)</a></li>
<li>Basic CUDA <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/02-CUDA_basic.pdf">(PDF)</a></li>
<li>CUDA Toolkit &amp; Libraries <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/03-Toolkit.pdf">(PDF)</a></li>
<li>CUDA Optimization <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/04-OptimizingCUDA.pdf">(PDF)</a></li>
<li>Introduction to OpenCL <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/05-OpenCLIntroduction.pdf">(PDF)</a></li>
<li>Case Study: Computational Fluid Dynamics <a href="http://gpgpu.org/wp/wp-content/uploads/2009/06/06-CFD.pdf">(PDF)</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/isc2009/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Triangular matrix inversion on Graphics Processing Unit</title>
		<link>http://gpgpu.org/2010/02/06/triangular-matrix-inversion</link>
		<comments>http://gpgpu.org/2010/02/06/triangular-matrix-inversion#comments</comments>
		<pubDate>Sat, 06 Feb 2010 10:24:16 +0000</pubDate>
		<dc:creator>dom</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Linear Algebra]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=2114</guid>
		<description><![CDATA[Abstract: Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial [...]]]></description>
			<content:encoded><![CDATA[<p>Abstract:</p>
<blockquote><p>Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial Graphics Processing Units (GPU) in a standard PC. Our implementation is based on a divide and conquer type recursive TMI algorithm, efficiently adapted to the GPU architecture. Our implementation obtains a speedup of 34x versus a CPU-based LAPACK reference routine, and runs at up to 54 gigaflops/s on a GTX 280 in double precision. Limitations of the algorithm are discussed, and strategies to cope with them are introduced. In addition, we show how inversion of an L- and U-matrix can be performed concurrently on a GTX 295 based dual-GPU system at up to 90 gigaflops/s.</p></blockquote>
<p>(Florian Ries, Tommaso De Marco, Matteo Zivieri and Roberto Guerrieri, <em>Triangular Matrix Inversion on Graphics Processing Units</em>, Supercomputing 2009, DOI <a href="http://dx.doi.org/10.1145/1654059.1654069" target="_blank">10.1145/1654059.1654069</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2010/02/06/triangular-matrix-inversion/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Supercomputing 2009 birds-of-a-feather session on &#8220;The Art of Performance Tuning for CUDA and Manycore Architectures&#8221;</title>
		<link>http://gpgpu.org/2009/12/02/supercomputing-2009-performance-tuning-for-cuda</link>
		<comments>http://gpgpu.org/2009/12/02/supercomputing-2009-performance-tuning-for-cuda#comments</comments>
		<pubDate>Thu, 03 Dec 2009 00:41:17 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Birds-of-a-Feather]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>
		<category><![CDATA[Supercomputing]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=2036</guid>
		<description><![CDATA[High throughput architectures for HPC seem likely to emphasize many cores with deep multithreading, wide SIMD, and sophisticated memory hierarchies. GPUs present one example, and their high throughput has led a number of researchers to port computationally intensive applications to NVIDIA&#8217;s CUDA architecture. This session explored the art of performance tuning for CUDA using several [...]]]></description>
			<content:encoded><![CDATA[<p>High throughput architectures for HPC seem likely to emphasize many cores with deep multithreading, wide SIMD, and sophisticated memory hierarchies. GPUs present one example, and their high throughput has led a number of researchers to port computationally intensive applications to NVIDIA&#8217;s CUDA architecture.</p>
<p><a href="http://www.cs.virginia.edu/~skadron/Papers/cuda_tuning_bof_sc09_final.pdf" target="_blank">This session</a> explored the art of performance tuning for CUDA using several case studies. Topics included profiling to identify bottlenecks, effective use of the GPU&#8217;s memory hierarchy and DRAM interface to maximize bandwidth, data versus task parallelism, and avoiding SIMD divergence.  Many of the lessons learned in the context of CUDA are likely to apply to other many-core architectures used in HPC applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/12/02/supercomputing-2009-performance-tuning-for-cuda/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CfP: International Conference on Supercomputing (ICS&#8217;10)</title>
		<link>http://gpgpu.org/2009/11/30/cfp-ics2010</link>
		<comments>http://gpgpu.org/2009/11/30/cfp-ics2010#comments</comments>
		<pubDate>Tue, 01 Dec 2009 00:58:04 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[High-Performance Computing]]></category>
		<category><![CDATA[Supercomputing]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1976</guid>
		<description><![CDATA[24th International Conference on Supercomputing (ICS&#8217;10) June 1-4, 2010 Epochal Tsukuba (Tsukuba International Congress Center) Tsukuba, Japan Sponsored by ACM/SIGARCH ICS is the premier international forum for the presentation of research results in high-performance computing systems.  In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ics-conference.org/" target="_blank">24th International Conference on Supercomputing (ICS&#8217;10)</a><br />
June 1-4, 2010<br />
<a href="http://www.epochal.or.jp/eng/" target="_blank">Epochal Tsukuba (Tsukuba International Congress Center)</a><br />
Tsukuba, Japan<br />
Sponsored by ACM/SIGARCH</p>
<p>ICS is the premier international forum for the presentation of research results in high-performance computing systems. In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the largest high-tech and academic city in Japan.</p>
<p>Papers are solicited on all aspects of research, development, and application of high-performance experimental and commercial systems. Special emphasis will be given to work that leads to better understanding of the implications of the new era of million-scale parallelism and Exa-scale performance; including (but not limited to):<span id="more-1976"></span></p>
<ul>
<li>Computationally challenging scientific and commercial applications: studies and experiences in exploiting ultra-large-scale parallelism, a large number of accelerators, and/or the cloud computing paradigm.</li>
<li>High-performance computational and programming models: studies and proposals of new models, paradigms and languages for scalable application development, seamless exploitation of accelerators, and grid/cloud computing.</li>
<li>Architecture and hardware aspects: processor, accelerator, memory, interconnection network, storage and I/O architecture to make future systems scalable, reliable and power efficient.</li>
<li>Software aspects: compilers and runtime systems, programming and development tools, middleware and operating systems to enable us to scale applications and systems easily, efficiently and reliably.</li>
<li>Performance evaluation studies and theoretical underpinnings of any of the above topics, especially those giving us perspective toward future generation high-performance computing.</li>
<li>Large scale installations in the Petaflop era: design, scaling, power, and reliability, including case studies and experience reports, to show the baselines for future systems.</li>
</ul>
<p>In order to encourage open discussion on future directions, the program committee will give higher priority to papers that present highly innovative and challenging ideas.</p>
<p>Papers should not exceed 6,000 words, and should be submitted electronically, in PDF format using the ICS&#8217;10 submission web site. Submissions should be blind.  The review process will include a rebuttal period. Please refer to the ICS&#8217;10 web site for detailed instructions.</p>
<p>Workshop and tutorial proposals are also solicited and are due by January 18, 2010.  For further information and future updates, refer to the ICS&#8217;10 web site at <a href="http://www.ics-conference.org/" target="_blank">http://www.ics-conference.org</a> or contact the General Chair (<a href="mailto:ics10-chair@hpcs.cs.tsukuba.ac.jp">ics10-chair@hpcs.cs.tsukuba.ac.jp</a>) or Program Co-Chairs (<a href="mailto:ics10-chairs@ac.upc.edu">ics10-chairs@ac.upc.edu</a>).</p>
<p><strong>Important Dates</strong></p>
<ul>
<li>Abstract submission:  January 11, 2010</li>
<li>Paper submission:     January 18, 2010</li>
<li>Author notification:  March 22, 2010</li>
<li>Final papers:         April 15, 2010</li>
</ul>
<p>For more information, please visit the conference web site at <a href="http://www.ics-conference.org/" target="_blank">http://www.ics-conference.org</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/11/30/cfp-ics2010/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NVIDIA Announces Next-Generation CUDA GPU Architecture &#8211; Codenamed &#8220;Fermi&#8221;</title>
		<link>http://gpgpu.org/2009/10/01/nvidia-next-generation-gpu-codenamed-fermi</link>
		<comments>http://gpgpu.org/2009/10/01/nvidia-next-generation-gpu-codenamed-fermi#comments</comments>
		<pubDate>Fri, 02 Oct 2009 02:17:00 +0000</pubDate>
		<dc:creator>Mark Harris</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Press]]></category>
		<category><![CDATA[GPUs]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[NVIDIA CUDA]]></category>

		<guid isPermaLink="false">http://gpgpu.org/?p=1922</guid>
		<description><![CDATA[On September 30th NVIDIA unveiled its latest GPU architecture, codenamed &#8220;Fermi&#8221;.  The first Fermi GPUs will contain 512 &#8220;CUDA Cores&#8221;, capable of more than 8x the double precision floating-point throughput of its predecessor, the GT200 GPU.  The GPU also incorporates error correcting (ECC) memories and caches, a new cache hierarchy, increased shared memory and register [...]]]></description>
			<content:encoded><![CDATA[<p>On September 30th NVIDIA unveiled its latest GPU architecture, codenamed &#8220;Fermi&#8221;.  The first Fermi GPUs will contain 512 &#8220;CUDA Cores&#8221;, capable of more than 8x the double precision floating-point throughput of its predecessor, the GT200 GPU.  The GPU also incorporates error correcting (ECC) memories and caches, a new cache hierarchy, increased shared memory and register file sizes, and the ability to execute C++ programs.</p>
<p>From the <a href="http://www.nvidia.com/object/io_1254288141829.html" target="_blank">press release</a>:</p>
<blockquote><p><span>SANTA CLARA, Calif. -Sep. 30, 2009- </span>NVIDIA Corp. today introduced its next generation CUDA™ GPU architecture, codenamed &#8220;Fermi&#8221;. An entirely new ground-up design, the &#8220;Fermi&#8221;™ architecture is the foundation for the world&#8217;s first computational graphics processing units (GPUs), delivering breakthroughs in both graphics and GPU computing.</p>
<p>&#8220;NVIDIA and the Fermi team have taken a giant step towards making GPUs attractive for a broader class of programs,&#8221; said Dave Patterson, director Parallel Computing Research Laboratory, U.C. Berkeley and co-author of Computer Architecture: A Quantitative Approach. &#8220;I believe history will record Fermi as a significant milestone.&#8221;</p>
<p>Presented at the company&#8217;s inaugural GPU Technology Conference, in San Jose, California, &#8220;Fermi&#8221; delivers a feature set that accelerates performance on a wider array of computational applications than ever before. Joining NVIDIA&#8217;s press conference was <a style="color: #2a5db0;" href="http://www.nvidia.com/object/pr_oakridge_093009.html" target="_blank">Oak Ridge National Laboratory</a> who announced plans for a new supercomputer that will use NVIDIA<span>®</span> GPUs based on the &#8220;Fermi&#8221; architecture. &#8220;Fermi&#8221; also garnered the support of leading organizations including Bloomberg, Cray, Dell, HP, IBM and <a style="color: #2a5db0;" href="http://www.nvidia.com/object/io_1254126305481.html" target="_blank">Microsoft</a>.</p>
<p><span id="more-1922"></span></p>
<p>&#8220;It is completely clear that GPUs are now general purpose parallel computing processors with amazing graphics, and not just graphics chips anymore,&#8221; said Jen-Hsun Huang, co-founder and CEO of NVIDIA. &#8220;The Fermi architecture, the integrated tools, libraries and engines are the direct results of the insights we have gained from working with thousands of CUDA developers around the world. We will look back in the coming years and see that Fermi started the new GPU industry.&#8221;</p>
<p>As the foundation for NVIDIA&#8217;s family of next generation GPUs, namely GeForce<span>®</span>, Quadro<span>®</span> and Tesla<span>®</span> &#8211; &#8220;Fermi&#8221; features a host of new technologies that are &#8220;must-have&#8221; features for the computing space, including:</p>
<ul>
<li style="margin-left: 15px;">C++, complementing existing support for C, Fortran, Java, Python, OpenCL and DirectCompute.</li>
<li style="margin-left: 15px;">ECC, a critical requirement for datacenters and supercomputing centers deploying GPUs on a large scale</li>
<li style="margin-left: 15px;">512 CUDA Cores™ featuring the new IEEE 754-2008 floating-point standard, surpassing even the most advanced CPUs</li>
<li style="margin-left: 15px;">8x the peak double precision arithmetic performance over NVIDIA&#8217;s last generation GPU. Double precision is critical for high-performance computing (HPC) applications such as linear algebra, numerical simulation, and quantum chemistry</li>
<li style="margin-left: 15px;">NVIDIA Parallel DataCache™ &#8211; the world&#8217;s first true cache hierarchy in a GPU that speeds up algorithms such as physics solvers, raytracing, and sparse matrix multiplication where data addresses are not known beforehand</li>
<li style="margin-left: 15px;">NVIDIA GigaThread™ Engine with support for concurrent kernel execution, where different kernels of the same application context can execute on the GPU at the same time (e.g., PhysX<span>®</span> fluid and rigid body solvers)</li>
<li style="margin-left: 15px;"><a style="color: #2a5db0;" href="http://www.nvidia.com/object/pr_nexus_093009.html" target="_blank">Nexus</a> &#8211; the world&#8217;s first fully integrated heterogeneous computing application development environment within Microsoft Visual Studio</li>
</ul>
<p>Images, technical whitepapers, presentations, videos and more on &#8220;Fermi&#8221; can all be found at: <a style="color: #2a5db0;" href="http://www.nvidia.com/object/fermi_architecture.html" target="_blank">www.nvidia.com/fermi</a>.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://gpgpu.org/2009/10/01/nvidia-next-generation-gpu-codenamed-fermi/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

