ned14
Joined: 13 May 2007 Posts: 65 Location: St. Andrews University, Scotland
Posted: Mon Nov 12, 2007 8:51 pm
Post subject: ATTN: SSE and OpenMP upgrade of CPU backend complete
Dear all,
I am very pleased to announce that the SSE and OpenMP upgrade of Brook's CPU backend is finished. The speed increase for kernels using float4 is dramatic: on a dual-core processor, the same kernel went from 88 seconds on the old Brook to 4 seconds with this upgrade!
You can find this update on the "Niall's Update" branch in SVN at rev 1871: http://brook.svn.sourceforge.net/viewvc/brook/branches/Niall's%20Update/
Some notes:
* SSE acceleration only applies to float4, nothing else. It is turned on by default, so your x86 processor needs at least SSE1; to disable it, define BRT_USE_SSE to 0.
* If your target processor has SSE4, defining BRT_USE_SSE to 4 enables its new horizontal instructions, which significantly speed up various miscellaneous calculations.
* OpenMP support only takes effect when you compile the Brook output with your compiler's OpenMP switches (e.g. -fopenmp for GCC, /openmp for MSVC). Without them, execution stays on a single core. Note that you don't need to compile the runtime or anything else with OpenMP enabled; I figured this was easier for everyone concerned.
* Only normal kernels are OpenMP accelerated. Reduction kernels stay single threaded because of the obvious difficulty of multiple threads writing to the reduction destination at once.
* GCC produces vastly faster output if you use the -mfpmath=sse -ffast-math command line options.
* You MUST ensure your float4 arrays are 16-byte aligned!!! Use BRTALIGNED for all stack-allocated arrays, and brmalloc() and brfree() for dynamically allocated arrays. Getting this wrong will usually be caught by assertion checks in the runtime; if not, it causes a fatal CPU exception, since the SSE unit requires 16-byte alignment.
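To illustrate what the alignment rule means in practice, here is a minimal sketch. BRTALIGNED, brmalloc() and brfree() are the runtime's own helpers and are not reproduced here; instead the sketch assumes standard C11 `_Alignas` and `aligned_alloc`/`free` as stand-ins. The `float4` layout and the names `is_aligned16`, `alloc_float4_array` and `scale2` are hypothetical, chosen only to show an aligned SSE load/store guarded by an assertion similar in spirit to the runtime's checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE1 intrinsics: __m128, _mm_load_ps, _mm_store_ps */

/* Hypothetical float4 layout: aligning the first member makes the whole
   struct 16-byte aligned and exactly 16 bytes long, so arrays of it stay aligned. */
typedef struct {
    _Alignas(16) float x;
    float y, z, w;
} float4;

/* True when p sits on a 16-byte boundary, as aligned SSE loads/stores require. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Stand-in for brmalloc() (assumption, not Brook's code): C11 aligned_alloc wants
   the size to be a multiple of the alignment, which n * sizeof(float4) always is.
   Release the memory with free(), the stand-in for brfree(). */
static float4 *alloc_float4_array(size_t n) {
    float4 *p = aligned_alloc(16, n * sizeof(float4));
    assert(p == NULL || is_aligned16(p));
    return p;
}

/* Doubles every element using aligned SSE loads and stores. The assert catches
   a misaligned array before _mm_load_ps would raise a CPU exception. */
static void scale2(float4 *a, size_t n) {
    assert(is_aligned16(a));
    const __m128 two = _mm_set1_ps(2.0f);
    for (size_t i = 0; i < n; ++i) {
        __m128 v = _mm_load_ps(&a[i].x);
        _mm_store_ps(&a[i].x, _mm_mul_ps(v, two));
    }
}
```

A stack array such as `float4 tmp[8];` is aligned by the type itself here, which is roughly the job BRTALIGNED does; a heap array would be `float4 *h = alloc_float4_array(1024); scale2(h, 1024); free(h);`.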
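The BRT_USE_SSE levels described in the notes can be pictured as compile-time dispatch on float4 operations. The sketch below is an assumption about how such a switch might look, not the runtime's actual code: level 0 falls back to scalar math, level 1 uses SSE1 intrinsics, and level 4 uses the SSE4.1 DPPS dot-product instruction as one example of a horizontal instruction. Only the macro name BRT_USE_SSE comes from the post; `float4_add` and `float4_dot` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#ifndef BRT_USE_SSE
#define BRT_USE_SSE 1          /* SSE on by default, as in the post */
#endif

#if BRT_USE_SSE >= 1
#include <xmmintrin.h>         /* SSE1 intrinsics */
#endif
#if BRT_USE_SSE >= 4 && defined(__SSE4_1__)
#include <smmintrin.h>         /* SSE4.1: _mm_dp_ps (needs -msse4.1 with GCC) */
#endif

/* Hypothetical 16-byte aligned float4 (first member aligned, size 16). */
typedef struct {
    _Alignas(16) float x;
    float y, z, w;
} float4;

/* Component-wise addition. */
static float4 float4_add(const float4 *a, const float4 *b) {
    float4 r;
#if BRT_USE_SSE >= 1
    assert(((uintptr_t)a & 15u) == 0 && ((uintptr_t)b & 15u) == 0);
    _mm_store_ps(&r.x, _mm_add_ps(_mm_load_ps(&a->x), _mm_load_ps(&b->x)));
#else
    r.x = a->x + b->x; r.y = a->y + b->y; r.z = a->z + b->z; r.w = a->w + b->w;
#endif
    return r;
}

/* Four-component dot product. */
static float float4_dot(const float4 *a, const float4 *b) {
#if BRT_USE_SSE >= 4 && defined(__SSE4_1__)
    /* DPPS mask 0xF1: multiply all four lanes, write the sum into lane 0. */
    return _mm_cvtss_f32(_mm_dp_ps(_mm_load_ps(&a->x), _mm_load_ps(&b->x), 0xF1));
#elif BRT_USE_SSE >= 1
    /* SSE1 has no horizontal add, so sum the lanes with shuffles. */
    __m128 m = _mm_mul_ps(_mm_load_ps(&a->x), _mm_load_ps(&b->x));
    __m128 s = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)); /* (m1, m0, m3, m2) */
    __m128 sums = _mm_add_ps(m, s);                          /* (m0+m1, m0+m1, m2+m3, m2+m3) */
    s = _mm_movehl_ps(s, sums);                              /* lane 0 = m2+m3 */
    sums = _mm_add_ss(sums, s);                              /* lane 0 = m0+m1+m2+m3 */
    return _mm_cvtss_f32(sums);
#else
    return a->x * b->x + a->y * b->y + a->z * b->z + a->w * b->w;
#endif
}
```

With GCC this might be built as `gcc -O2 -msse -mfpmath=sse -ffast-math`, adding `-DBRT_USE_SSE=4 -msse4.1` to try the SSE4 path. The SSE4 branch replaces four SSE1 shuffle/add steps with a single instruction, which is the kind of saving the SSE4 note refers to.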
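The OpenMP notes follow a common pattern, sketched below under stated assumptions (this is not the code Brook generates, and `saxpy_elem`, `run_map_kernel` and `run_sum_reduction` are hypothetical names). A normal kernel writes one output element per iteration, so iterations are independent and a single `#pragma omp parallel for` can spread them across cores. When the file is compiled without the OpenMP switch, the pragma is ignored and the loop runs on one core, which matches the fallback described above. A reduction folds every element into one destination, so it is left as a plain serial loop.

```c
#include <assert.h>

/* Hypothetical normal-kernel body: each output depends only on its own inputs. */
static float saxpy_elem(float a, float x, float y) {
    return a * x + y;
}

/* Each iteration writes only out[i], so iterations may run on any core in any
   order. Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored. */
static void run_map_kernel(float a, const float *x, const float *y, float *out, int n) {
    assert(n >= 0);
#pragma omp parallel for
    for (int i = 0; i < n; ++i)   /* signed index: OpenMP before 3.0 required one */
        out[i] = saxpy_elem(a, x[i], y[i]);
}

/* Every iteration updates the same accumulator, so this stays single threaded,
   mirroring the decision to keep reduction kernels serial. */
static float run_sum_reduction(const float *x, int n) {
    assert(n >= 0);
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}
```

Only the generated file containing loops like `run_map_kernel` needs the OpenMP switch; code calling it is unaffected, which is consistent with not having to rebuild the runtime.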
Obviously this upgrade may have bugs in it. Please report these here!
Thanks in advance,
Niall