ssinha .
Joined: 21 Feb 2006 Posts: 4 Location: UNC Chapel Hill
Posted: Tue Jul 04, 2006 3:50 pm Post subject: 16-bit float sufficient on ATI, but not NVIDIA
I have implemented a GPU version (OpenGL/Cg/WinXP) of KLT, a well-known feature-tracking algorithm in computer vision, and have tested it on various ATI and NVIDIA cards:
ATI: Radeon X850, X1800 XT, X1900 XT
NVIDIA: GeForce Go 6800 Ultra (laptop), 7800 GTX, 7900 GTX
On ATI, I use RGBA 16-bit float textures (GL_RGBA_FLOAT16_ATI) with texture target GL_TEXTURE_2D.
On NVIDIA, I use GL_TEXTURE_RECTANGLE_NV. To get the same degree of accuracy, I had to use RGBA 32-bit float textures (GL_FLOAT_RGBA32_NV); 16-bit floating point gave me inaccurate results. I found the same behaviour on 3 different PCs.
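A minimal C sketch of the two texture setups described above. The enum values are taken from the ATI_texture_float and NV_float_buffer extension specs and are guarded so the file also compiles alongside <GL/glext.h>; the `Vendor` type and the `float_tex_config` helper are hypothetical names used only for illustration.

```c
/* Enum values from the extension specs; guards avoid clashing with glext.h. */
#ifndef GL_TEXTURE_2D
#define GL_TEXTURE_2D 0x0DE1
#endif
#ifndef GL_TEXTURE_RECTANGLE_NV
#define GL_TEXTURE_RECTANGLE_NV 0x84F5
#endif
#ifndef GL_RGBA_FLOAT16_ATI
#define GL_RGBA_FLOAT16_ATI 0x881A /* ATI_texture_float: 4 x fp16 */
#endif
#ifndef GL_FLOAT_RGBA32_NV
#define GL_FLOAT_RGBA32_NV 0x888B  /* NV_float_buffer: 4 x fp32 */
#endif

/* Hypothetical vendor switch for this sketch. */
typedef enum { VENDOR_ATI, VENDOR_NVIDIA } Vendor;

typedef struct {
    unsigned int target;          /* target for glBindTexture/glTexImage2D */
    unsigned int internal_format; /* floating-point internal format */
} FloatTexConfig;

/* Returns the texture target and internal format used on each vendor. */
FloatTexConfig float_tex_config(Vendor v)
{
    FloatTexConfig c;
    if (v == VENDOR_ATI) {
        c.target = GL_TEXTURE_2D;
        c.internal_format = GL_RGBA_FLOAT16_ATI;
    } else {
        /* NV_float_buffer float formats are only valid with texture rectangles */
        c.target = GL_TEXTURE_RECTANGLE_NV;
        c.internal_format = GL_FLOAT_RGBA32_NV;
    }
    return c;
}
```

The result would then be passed as `glTexImage2D(cfg.target, 0, cfg.internal_format, w, h, 0, GL_RGBA, GL_FLOAT, data)`. With the rectangle target, texture coordinates are unnormalized (0..w, 0..h), and the Cg shader samples with `samplerRECT`/`texRECT` rather than `sampler2D`/`tex2D`.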
Has anyone seen this behaviour before? Is there a known explanation for it?
Because of the need for 32-bit precision, my tracker runs slower on machines with NVIDIA cards.
For the same problem size, I observed the following timings:
GeForce 7800 GTX: 50.68 ms
GeForce 7900 GTX: 46.79 ms
Radeon X1900 XT: 29.24 ms
Radeon X1800 XT: 36.01 ms
Radeon X850: 39.10 ms
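Expressed as ratios of the posted times (which, as noted later in the thread, include setup and download), the two GeForce 7 boards take roughly 1.2 to 1.3 times as long as the X850, and the 7900 GTX takes about 1.6 times as long as the X1900 XT:

```latex
\frac{t_{\mathrm{7800\,GTX}}}{t_{\mathrm{X850}}} = \frac{50.68}{39.10} \approx 1.30, \qquad
\frac{t_{\mathrm{7900\,GTX}}}{t_{\mathrm{X850}}} = \frac{46.79}{39.10} \approx 1.20, \qquad
\frac{t_{\mathrm{7900\,GTX}}}{t_{\mathrm{X1900\,XT}}} = \frac{46.79}{29.24} \approx 1.60
```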
What looks odd to me is that the X850 beats both the 7800 GTX and the 7900 GTX. Could this difference be due simply to using GL_TEXTURE_2D with 16-bit floats on ATI and GL_TEXTURE_RECTANGLE_NV with 32-bit floats on NVIDIA?
Some input from those of you who have an idea would be great!
Thanks in advance,
Sudipta
mhouston GPGPU

Joined: 02 Sep 2003 Posts: 1001 Location: Santa Clara
Posted: Tue Jul 04, 2006 5:44 pm
All ATI boards do the math at 32-bit precision and then convert the output to 16 bits. NVIDIA will actually do the math at 16-bit. So it's likely that you need more than 16-bit precision for intermediate results. You should be able to write the pixel shader assembly (fp40) by hand to get 32-bit intermediates, and you might even be able to convince Cg to use 32-bit values as intermediates.
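To see why 16-bit intermediates can hurt an algorithm like KLT, which accumulates sums over a window and iterates, here is a minimal C sketch that emulates fp16 rounding on the CPU by dropping the low 13 mantissa bits of a 32-bit float (round to nearest, ties to even). Assumptions: values stay inside the normal fp16 range, so fp16's narrower exponent range, subnormals, infinities and NaNs are not modeled; `round_to_half` and `accumulate` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Round a 32-bit float to the nearest value with an fp16-sized significand
   (10 stored mantissa bits), ties to even. Exponent range is NOT clamped. */
float round_to_half(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);            /* reinterpret the float's bits */
    uint32_t keep_lsb = (u >> 13) & 1u;  /* lowest mantissa bit that survives */
    /* Bias carries into the kept bits when the dropped part is more than
       half an fp16 ulp, or exactly half and the kept LSB is odd. */
    u += 0x0FFFu + keep_lsb;
    u &= ~(uint32_t)0x1FFFu;             /* drop the 13 extra mantissa bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Sum n copies of `step`. With use_half != 0, the step and every running
   sum are rounded to fp16 precision, mimicking 16-bit intermediates. */
float accumulate(float step, int n, int use_half)
{
    float s = use_half ? round_to_half(step) : step;
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += s;
        if (use_half)
            sum = round_to_half(sum);
    }
    return sum;
}
```

With `accumulate(0.01f, 10000, 1)` the running sum stalls at 32: from there on the fp16 spacing is 2^-5 = 0.03125, so adding 0.01 rounds straight back down. The 32-bit version, `accumulate(0.01f, 10000, 0)`, stays close to 100.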
ATI boards tend to do a better job of hiding memory latency and of dealing with register pressure. When you move to 32-bit on NVIDIA, you increase the bandwidth requirements and you double your register count. Texture rectangles shouldn't make a difference.
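To put the bandwidth point in numbers, a back-of-the-envelope C sketch: the image size, number of texture reads per pass and number of passes are hypothetical parameters, not figures from this thread. Going from 2 to 4 bytes per channel on RGBA textures doubles the bytes fetched for the same work.

```c
#include <stddef.h>

/* Bytes fetched by `passes` full-image passes, each reading `reads` RGBA
   textures of width x height texels with the given bytes per channel.
   All parameters are illustrative; caching and compression are ignored. */
size_t bytes_per_frame(size_t width, size_t height, size_t bytes_per_channel,
                       size_t reads, size_t passes)
{
    const size_t channels = 4; /* RGBA */
    return width * height * channels * bytes_per_channel * reads * passes;
}
```

For example, one read of a hypothetical 640x480 RGBA fp16 texture is `bytes_per_frame(640, 480, 2, 1, 1)`, which is 2,457,600 bytes; the fp32 equivalent is twice that.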
Also, from your timing numbers for X850 -> X1900, it appears that your application is memory-bound, unless you are including setup and download/readback in your numbers.
Edit: correction, the X800/X850 (R4xx) do math at *24-bit* precision for intermediates.
ssinha .
Joined: 21 Feb 2006 Posts: 4 Location: UNC Chapel Hill
Posted: Tue Jul 04, 2006 6:10 pm
Hi Mike,
thanks for your quick reply. I just wanted to make sure I understand you correctly.
Did you mean that on NVIDIA, even if my fragment shaders declare variables as float, float2 or float4 (NOT half), they are treated as 16-bit floats when the shaders are used with float16 textures?
OK, I will look into ways of convincing Cg to use 32-bit intermediates...
The posted timings include setup and download time, but you are right that my application is memory-bound.
mhouston GPGPU

Joined: 02 Sep 2003 Posts: 1001 Location: Santa Clara
Posted: Tue Jul 04, 2006 6:42 pm
You'll have to look at the generated assembly. If Cg can use reduced precision, it often does. There are also a plethora of options you can pass to Cg.
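One way to check the generated assembly mechanically: get the listing from cgc's output file or, at runtime, from `cgGetProgramString(program, CG_COMPILED_PROGRAM)`, then look for half-precision temporaries. In the NV_fragment_program register naming, fp16 temporaries are called H0, H1, and so on. The C helper below is a hypothetical heuristic built on the assumption that cgc's listing uses those names; a nonzero count suggests part of the shader runs at 16 bits, but it is not a complete precision analysis.

```c
#include <ctype.h>

/* Heuristic: count references to half-precision temporaries (tokens such as
   H0 or H12, optionally followed by a swizzle like .xyz) in an assembly
   listing. Opcodes like MADH are not counted. */
int count_half_registers(const char *asm_text)
{
    int count = 0;
    const char *p = asm_text;
    while (*p) {
        /* a token starts at the beginning or after a non-identifier char */
        int at_token_start = (p == asm_text) ||
            !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        if (at_token_start && *p == 'H' && isdigit((unsigned char)p[1])) {
            const char *q = p + 1;
            while (isdigit((unsigned char)*q))
                ++q;
            /* the digits must end the identifier (rules out e.g. "H0abc") */
            if (!(isalnum((unsigned char)*q) || *q == '_'))
                ++count;
            p = q;
            continue;
        }
        ++p;
    }
    return count;
}
```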
You might also want to try the NVIDIA or ARB version of the float16 texture format. I don't remember it off the top of my head, and the enum may map to the ATI one.
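For reference, a small C check of how the 16-bit RGBA float enums relate, with values taken from the ATI_texture_float, ARB_texture_float and NV_float_buffer specs (guarded so they do not clash with <GL/glext.h>). The ARB format reuses the ATI value, so code that already passes GL_RGBA_FLOAT16_ATI is requesting the same format; the NV_float_buffer enum is distinct and, per that extension, only valid with texture rectangles.

```c
#ifndef GL_RGBA_FLOAT16_ATI
#define GL_RGBA_FLOAT16_ATI 0x881A /* ATI_texture_float */
#endif
#ifndef GL_RGBA16F_ARB
#define GL_RGBA16F_ARB 0x881A      /* ARB_texture_float */
#endif
#ifndef GL_FLOAT_RGBA16_NV
#define GL_FLOAT_RGBA16_NV 0x888A  /* NV_float_buffer, rectangles only */
#endif

/* Compile-time checks (C11): ARB and ATI share one value, NV does not. */
_Static_assert(GL_RGBA16F_ARB == GL_RGBA_FLOAT16_ATI, "ARB and ATI fp16 enums match");
_Static_assert(GL_FLOAT_RGBA16_NV != GL_RGBA16F_ARB, "NV_float_buffer enum is distinct");
```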
Chris Dodd GP
Joined: 17 Feb 2005 Posts: 161
Posted: Wed Jul 05, 2006 4:23 pm Post subject: Precision issues
Variables of type float/float2/float4 should be stored and operated on at 32-bit precision, but the driver has some optimizations that tend to perform operations on the results of reads from 16-bit or smaller textures at 16-bit precision instead of 32-bit. With newer versions of cgc (1.5), you can use the -bestprecision option to defeat this optimization. With older versions, you can use -nofastmath/-nofastprecision, which may or may not fix the issue.
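The advice above can be captured in a small C helper that returns a NULL-terminated option list suitable for the `args` parameter of `cgCreateProgramFromFile()`: `-bestprecision` for Cg 1.5 and later, otherwise the older `-nofastmath -nofastprecision` pair, which may not help. The helper name and the idea of passing the version in explicitly are assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: compiler options to request full precision,
   chosen by Cg version. The returned array is NULL-terminated. */
const char **precision_options(int cg_major, int cg_minor)
{
    static const char *modern[] = { "-bestprecision", NULL };
    static const char *legacy[] = { "-nofastmath", "-nofastprecision", NULL };
    if (cg_major > 1 || (cg_major == 1 && cg_minor >= 5))
        return modern;
    return legacy;
}
```

Usage would look like `cgCreateProgramFromFile(fContext, CG_SOURCE, FPsource, fProfile, NULL, precision_options(1, 5))`.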
ssinha .
Joined: 21 Feb 2006 Posts: 4 Location: UNC Chapel Hill
Posted: Wed Jul 05, 2006 10:19 pm
I tried passing Cg compiler options in this way, with the following 3 combinations:
-nofastmath
-nofastprecision
-nofastmath -nofastprecision
but none of them solved the problem.
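```c
/* Extra compiler options, passed as a NULL-terminated array */
const char* args[] = { "-nofastprecision", NULL };
if (programInFile)
    /* NULL entry point: Cg uses the default entry point "main" */
    _FP = cgCreateProgramFromFile(fContext,
                                  CG_SOURCE, FPsource,
                                  fProfile, NULL, args);
```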
The entry point of my fragment program is main().
I then ran cgc.exe on the fragment program and inspected the assembly, but -nofastmath/-nofastprecision made no difference to the generated assembly.
mhouston GPGPU

Joined: 02 Sep 2003 Posts: 1001 Location: Santa Clara
Posted: Thu Jul 06, 2006 12:14 am
Try Cg 1.5, as Chris mentioned. It includes various bug fixes.