www.gpgpu.org Forum Index www.gpgpu.org
General Purpose Computation on GPUs
 

16bit float sufficient on ATI, but not NVIDIA

 
ssinha
.


Joined: 21 Feb 2006
Posts: 4
Location: UNC Chapel Hill

Posted: Tue Jul 04, 2006 3:50 pm    Post subject: 16bit float sufficient on ATI, but not NVIDIA

I have implemented an [OpenGL/Cg/WinXP] GPU version of KLT, a well-known feature tracking algorithm in computer vision, and have tested it on various ATI and NVIDIA cards.

ATI: Radeon 850, 1800XT, 1900XT
NV : 6800 Ultra GO(Laptop), 7800 GTX, 7900 GTX

On ATI, I use 16-bit float RGBA textures (the float_ATI formats) with texture target GL_TEXTURE_2D.
On NVIDIA, I use GL_TEXTURE_RECTANGLE_NV. To get the same degree of accuracy there, I had to use 32-bit float RGBA textures (the float_NV formats); 16-bit floating point gave me inaccurate results. I found the same behaviour on 3 different PCs.

Has anyone seen this behaviour before? Is there a known explanation for it?

Due to the need for 32 bit precision, my tracker runs slower on machines with Nvidia cards.

For the same problem size, the following timings were observed.

    NV_7800 50.68 msec
    NV_7900 46.79 msec
    ATI_1900 29.24 msec
    ATI_1800 36.01 msec
    ATI__850 39.10 msec


What looks odd to me is that the ATI-850 beats the NV-7800 and NV-7900. Could this difference be due just to using GL_TEXTURE_2D and 16-bit float on ATI versus GL_TEXTURE_RECTANGLE_NV and 32-bit float on NVIDIA?

Some input from those of you who have an idea would be great!
Thanks in advance,
Sudipta
mhouston
GPGPU


Joined: 02 Sep 2003
Posts: 1001
Location: Santa Clara

Posted: Tue Jul 04, 2006 5:44 pm

All ATI boards do the math at 32-bit precision and then clamp the output to 16 bit. Nvidia will actually do the math at 16-bit. So, it's likely you need higher precision for intermediate results than 16-bit. You should be able to write pixel shader asm (fp40) by hand to get this behavior, and you might even be able to convince Cg to use 32-bit values as intermediates.

ATI boards tend to do a better job at hiding memory latency, as well as dealing with register pressure better. When you move to 32-bit on Nvidia, you increase the bandwidth requirements, and you double your register count. Texture rectangles shouldn't make a difference.

Also, from your timing numbers for X850->X1900, it appears that your application is memory bound, unless you are including setup and download/readback in your numbers.

edit: correction, the X800/850 (R4XX) do math at *24-bit* precision for intermediates.
ssinha

Posted: Tue Jul 04, 2006 6:10 pm

Hi Mike,
thanks for your quick reply. I just wanted to make sure I understand you correctly.

Did you mean that on Nvidia, even if my fragment shaders have variables declared as float, float2, float4 (NOT half), they would be treated as 16-bit floats when these shaders are used with float16 textures?

Ok, I will look into ways of convincing Cg to use 32 bit for intermediates ...

The posted timings include setup and download time, although you are right that my application is memory bound.
mhouston

Posted: Tue Jul 04, 2006 6:42 pm

You'll have to look at the generated assembly. If Cg can use reduced precision, it often does. There are also a plethora of options you can give to Cg.

You also might want to try the Nvidia or ARB version of the float_16 texture format. I don't remember it off the top of my head, and the enum may map to the ATI one.
Chris Dodd
GP


Joined: 17 Feb 2005
Posts: 161

Posted: Wed Jul 05, 2006 4:23 pm    Post subject: Precision issues

Variables of type float/float2/float4 should be stored and operated on in 32 bit precision, but the driver has some optimizations that tend to do operations on the results of reads from 16-bit or smaller textures in 16-bit precision instead of 32-bit precision. With new versions of cgc (1.5), you can use the -bestprecision option to defeat this optimization. With older versions you can use -nofastmath/-nofastprecision which may or may not fix the issue.
ssinha

Posted: Wed Jul 05, 2006 10:19 pm

I tried passing Cg compiler options through the runtime, as in the code below, using the following 3 combinations:
-nofastmath
-nofastprecision
-nofastmath -nofastprecision

None of these solved the problem.

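Code:

// Compiler options for the Cg runtime; the array must be null-terminated.
const char* args[] = { "-nofastprecision", 0 };
if (programInFile)
   // Entry point is passed as 0, so the runtime falls back to "main".
   _FP = cgCreateProgramFromFile(fContext,
                                 CG_SOURCE, FPsource,
                                 fProfile, 0, args);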


The entry point of my fragment program is main().

I then tried running cgc.exe on the fragment program and inspected the assembly, but the -nofastmath/-nofastprecision options made no difference to the generated assembly.
mhouston

Posted: Thu Jul 06, 2006 12:14 am

Try Cg 1.5 as Chris mentioned. There are various bug fixes there.
All times are GMT