www.gpgpu.org Forum Index www.gpgpu.org
General Purpose Computation on GPUs
 

ATTN: SSE and OpenMP upgrade of CPU backend complete

 
ned14
Joined: 13 May 2007
Posts: 65
Location: St. Andrews University, Scotland

Posted: Mon Nov 12, 2007 8:51 pm

Dear all,

I am very pleased to announce that the SSE and OpenMP upgrade of Brook's CPU backend is finished. The speedup for kernels that use float4 is dramatic: on a dual-core processor, the same kernel went from 88 seconds on old Brook to 4 seconds with this upgrade! :)

You can find this update on the "Niall's Update" branch in SVN at rev 1871: http://brook.svn.sourceforge.net/viewvc/brook/branches/Niall's%20Update/

Some notes:
    * SSE acceleration only works for float4, nothing else. It is turned on by default, so your x86 processor needs at least SSE1 - to disable it, define BRT_USE_SSE to 0.
    * If you have SSE4 on your target processor, defining BRT_USE_SSE to 4 enables the use of the new horizontal instructions to significantly speed up various miscellaneous calculations.
    * OpenMP support only works when you compile the Brook output with your compiler's command line switches for enabling OpenMP. If you don't use these switches, the output keeps running on a single core. Note that you don't need to compile the runtime or anything else with OpenMP support enabled - I figured this was easier for everyone concerned.
    * Only normal kernels will be OpenMP accelerated. Reduction kernels stay single threaded - this is due to the obvious difficulties of multiple threads using the reduction destination at once.
    * GCC produces vastly faster output if you use the -mfpmath=sse -ffast-math command line options.
    * You MUST ensure your arrays of float4 are 16-byte aligned!!! Use BRTALIGNED for all stack-allocated arrays. Use brmalloc() and brfree() for dynamically allocated arrays. The SSE unit needs 16-byte alignment, so getting this wrong will usually be caught by assertion checks in the runtime, and otherwise by a fatal CPU exception.
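
To make the SSE notes above a bit more concrete, here is a rough sketch of how a compile-time switch can choose between an SSE path and a plain scalar path for a four-float type. This is not Brook's code: DEMO_USE_SSE, float4, add4 and dot4 are hypothetical names made up for illustration, standing in for whatever the runtime does behind BRT_USE_SSE. The SSE4 path here is keyed off the compiler's own __SSE4_1__ macro rather than a value of 4, which is an assumption of the sketch.

```cpp
// Sketch only: DEMO_USE_SSE, float4, add4 and dot4 are hypothetical
// stand-ins, not Brook's BRT_USE_SSE macro or its runtime types.
#include <cstddef>

#ifndef DEMO_USE_SSE
#  if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#    define DEMO_USE_SSE 1   // on by default when the compiler targets SSE
#  else
#    define DEMO_USE_SSE 0   // no SSE on this target: scalar fallback
#  endif
#endif

#if DEMO_USE_SSE
#  include <xmmintrin.h>     // SSE1 intrinsics
#  if defined(__SSE4_1__)
#    include <smmintrin.h>   // SSE4.1 dot product, _mm_dp_ps
#  endif
#endif

// Four packed floats, forced onto a 16-byte boundary for aligned loads.
struct alignas(16) float4 { float v[4]; };

inline float4 add4(const float4& a, const float4& b) {
    float4 r;
#if DEMO_USE_SSE
    // One instruction adds all four lanes.
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

inline float dot4(const float4& a, const float4& b) {
#if DEMO_USE_SSE && defined(__SSE4_1__)
    // 0xF1: multiply all four lanes, write the sum to the lowest lane.
    return _mm_cvtss_f32(_mm_dp_ps(_mm_load_ps(a.v), _mm_load_ps(b.v), 0xF1));
#else
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2] + a.v[3] * b.v[3];
#endif
}
```

Building with -DDEMO_USE_SSE=0 forces the scalar path, which is the same kind of opt-out that defining BRT_USE_SSE to 0 gives you.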

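The OpenMP points above can be illustrated with a small sketch, written by hand and not taken from Brook's generated output. The function names are made up for the example. It shows why a normal kernel parallelises trivially, why the pragma costs nothing when the OpenMP switch is missing, and why a reduction is the awkward case.

```cpp
// Sketch only: these functions are illustrative, not Brook's generated code.
#include <cstddef>
#include <vector>

// True only if the compiler was invoked with its OpenMP switch
// (for example -fopenmp with GCC), because that is what defines _OPENMP.
inline bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// "Normal" kernel: each output element depends only on its own input,
// so the iterations can be shared between threads with no coordination.
// Without the OpenMP switch the pragma is ignored and the loop runs serially.
inline void scale_bias(const std::vector<float>& in, std::vector<float>& out,
                       float scale, float bias) {
    out.resize(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = in[i] * scale + bias;
}

// Reduction: every iteration updates the same destination, so a naive
// parallel loop would have threads racing on `total`. Kept single threaded.
inline float sum(const std::vector<float>& in) {
    float total = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i)
        total += in[i];
    return total;
}
```

The same source gives the same results with or without the switch; only the number of cores used by scale_bias changes.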

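On the alignment requirement above: the sketch below shows, in standard C++ only, what "16-byte aligned" means for a stack array and how to check it. It deliberately does not use BRTALIGNED, brmalloc() or brfree(), since their exact usage is not shown in this post; is_aligned16 and stack_alignment_demo are hypothetical helpers for illustration.

```cpp
// Sketch only: plain C++11, no Brook macros or runtime calls are used.
#include <cstdint>

// True when the address is a multiple of 16, which aligned SSE loads need.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

inline bool stack_alignment_demo() {
    // Room for eight four-float elements, forced onto a 16-byte boundary.
    alignas(16) float data[4 * 8] = {};

    const bool base_ok = is_aligned16(data);       // start of the array
    const bool elem_ok = is_aligned16(data + 4);   // next four-float element
    const bool off_bad = !is_aligned16(data + 1);  // 4 bytes in: misaligned

    return base_ok && elem_ok && off_bad;
}
```

A pointer that is off by even one float fails the check, and that is the situation an aligned SSE load faults on.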
Obviously this upgrade may have bugs in it. Please report these here!

Thanks in advance,
Niall
All times are GMT