AMD's 3DNow! imprecise

by "kuratkull@[EMAIL PROTECTED] " <spamtrap@[EMAIL PROTECTED] > Apr 2, 2008 at 12:50 PM

The subject may be a bit misleading, since the imprecision really
derives from how binary works, but maybe you smart people can help me
out with a workaround or something.
OK, I'm relatively new to ASM (a few days, actually), but I have a
background in C, Python, CLisp and such.

Anyway, I am trying to build a very fast prime counter in assembly,
and I succeeded when using only the regular instructions, but 6.5
seconds for 100k primes is way too slow, so I looked around and
profiled my code... and of course, a major bottleneck was the idivl. So
I decided to use the 3DNow! instructions.
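
For reference, the integer-only approach described above can be sketched in C roughly like this. It is only an illustration of plain trial division against the primes found so far: the function name first_primes and the array-based storage are assumptions made for the sketch, not the actual assembly.

```c
#include <stddef.h>

/* Hypothetical helper: stores the first `want` primes in primes[]
 * and returns how many were stored. */
size_t first_primes(int *primes, size_t want)
{
    size_t found = 0;

    if (want == 0)
        return 0;

    primes[found++] = 2;                    /* 2 is stored up front */

    /* Only odd candidates are tried, starting from 3. */
    for (int cand = 3; found < want; cand += 2) {
        int is_prime = 1;

        /* Test the candidate against every prime found so far. */
        for (size_t i = 0; i < found; i++) {
            if (cand % primes[i] == 0) {    /* signed remainder: the idivl hot spot */
                is_prime = 0;
                break;
            }
        }
        if (is_prime)
            primes[found++] = cand;
    }
    return found;
}
```

Calling first_primes(buf, 100000) with a large enough buffer corresponds to the 100k-primes run; every pass through the inner loop costs one integer division.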

ecx = the current prime candidate; it starts at 3 and is incremented by 2 each pass (2 is already on the stack)
edx = each previously found prime less than the current ecx; they are looped through in turn.

MOVD    %edx, %MM0      /* divisor (a previously found prime) */
PI2FD   %MM0, %MM0      /* divisor to float */
MOVQ    %MM0, %MM2      /* copy of divisor */
PFRCP   %MM0, %MM0      /* 1/divisor */
MOVD    %ecx, %MM1      /* prime_candidate */
PI2FD   %MM1, %MM1      /* prime. to float */
MOVQ    %MM1, %MM3      /* copy of prime. */
PFMUL   %MM0, %MM1      /* 1/divisor * prime */
PF2ID   %MM1, %MM1      /* answer to int ...*/
PI2FD   %MM1, %MM1      /* ... and back to float */     /* NOTE1 */
PFMUL   %MM1, %MM2      /* roundeddown(1/divisor*prime.)*divisor */
PFCMPEQ %MM2, %MM3      /* MM3 = all ones if equal to the copy of prime., else 0 */
MOVD    %MM3, %eax      /* prime check "good" if the two were != (eax == 0) */


I will reorganize it into a tight loop later, and will, of course, do
other logical optimizations, but the real problem is currently the
following:
It runs perfectly until ecx=9 and edx=3. After "NOTE1", MM1
should be 3.0, but it seems to get rounded down to 2 on the
to-int/to-float round trip. It seems the result of the calculation
just before the conversion stays a bit below 3, and then gets rounded down.
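
The effect can be reproduced in C with a rough model. As far as I know, PFRCP is documented as a low-precision estimate (about 14 bits) and PF2ID converts by truncation. The sketch below assumes a stand-in for that estimate: the exact float reciprocal with its mantissa cut to 14 fraction bits. The names approx_recip and truncated_quotient are made up for illustration, and the real instruction uses its own lookup table, so actual values may differ.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMED model of a low-precision reciprocal: take the exact float 1/d
 * and drop the low 9 of its 23 mantissa bits. Not the real PFRCP table. */
float approx_recip(float d)
{
    float r = 1.0f / d;
    uint32_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFE00u;        /* keep sign, exponent, top 14 mantissa bits */
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* Quotient the way the posted sequence computes it:
 * multiply by the reciprocal, then convert to int by truncation. */
int truncated_quotient(int n, int d)
{
    float q = (float)n * approx_recip((float)d);
    return (int)q;              /* float-to-int conversion in C truncates toward zero */
}
```

Under this model approx_recip(3.0f) comes out slightly below 1/3, so 9 times it is slightly below 3.0 and truncated_quotient(9, 3) gives 2, which matches the symptom above.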

Are there any workarounds for this problem? I tried using the SSE instructions,
but then I read that GNU "as" doesn't support them :/
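
One possible workaround, sketched in C under the same assumed 14-bit reciprocal model (approx_recip and divides_evenly are hypothetical names): round the quotient to nearest instead of truncating, then confirm with an exact integer multiply. This should only be trusted for non-negative inputs and for quotients small enough that the reciprocal error stays well under one half, roughly a few thousand under this model.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMED model of a low-precision reciprocal (not the real PFRCP table):
 * exact float 1/d with the mantissa cut to 14 fraction bits. */
static float approx_recip(float d)
{
    float r = 1.0f / d;
    uint32_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFE00u;
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* Returns 1 if d divides n exactly, 0 otherwise (n >= 0, d > 0). */
int divides_evenly(int n, int d)
{
    float q = (float)n * approx_recip((float)d);
    int qi = (int)(q + 0.5f);   /* round to nearest for non-negative q */

    return qi * d == n;         /* exact integer check, no float compare */
}
```

With rounding, 9 and 3 give a quotient of 3 and the integer check 3 * 3 == 9 succeeds, while 10 and 3 still fail the check. The final comparison is done in integers, so a slightly-off float product can no longer flip the answer.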

Any help (even if just saying "you're screwed!") will be greatly
appreciated :)

Thanks,
Tanel




