Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: AMD's 3DNow...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 9 of 12 Topic 4602 of 4729
Post > Topic >>

Re: AMD's 3DNow! imprecise

by kuratkull <spamtrap@[EMAIL PROTECTED] > Apr 3, 2008 at 10:45 PM

On Apr 4, 5:23 am, Waldek Hebisch  <spamt...@[EMAIL PROTECTED]
> wrote:
> kuratk...@[EMAIL PROTECTED]
 <spamt...@[EMAIL PROTECTED]
> wrote:
> > Anyway, I am trying to build the very-fast-prime-counter i assembly
> > and I succeeded when using only the regular instructions, but a 6.5
> > secs for a 100k primes is way too slow, so I looked around and
> > profiled my code...and of course, a major bottleneck was the idivl.
>
> I am not sure what "prime-counter" mean -- I you want just generate
> primes than sieve of Eratosthenes is the standard method (much faster
> than method using division).
>
> In case you really want to use division, you may consider storing
> recipracals -- you are using the same divisor many times.
>
>
>
> > So
> > I decided to use the 3DNow! instructions.
>
> > ecx  = ecx+2 (begins from 3; 2 is already in the stack)
> > edx = all found primes less than current ecx, are looped through it.
>
> > MOVD    %ecx, %MM0      /* divisor */
> > PI2FD   %MM0, %MM0      /* divisor to float */
> > MOVQ    %MM0, %MM2      /* copy of divisor */
> > PFRCP   %MM0, %MM0      /* 1/divisor */
> > MOVD    %edx, %MM1      /* prime_candidate */
> > PI2FD   %MM1, %MM1      /* prime. to float */
> > MOVQ    %MM1, %MM3      /* copy of prime. */
> > PFMUL   %MM0, %MM1      /* 1/divisor * prime */
> > PF2ID   %MM1, %MM1      /* answer to int ...*/
> > PI2FD   %MM1, %MM1      /* ... and back to float */     /* NOTE1 */
> > PFMUL   %MM1, %MM2    /* roundeddown(1/divisor*prime.)*divisor
> > PFCMPEQ %MM2, %MM3
> > MOVD %MM3, %eax   /* if prime check "good", then != */
>
> > It will run perfectly until ecx=9 and edx=3. After "NOTE1", the MM1
> > should be 3.0, but it seems to get rounded down to 2 on the toint-
> > tofloat round. It seems the previous calculation before the converting
> > stays a bit below 3, and then gets rounded down.
>
> 3DNow! PFRCP intruction give only approximate result (unlike 387 FPU
> the result in _not_ correctly rounded).  Worse, according to AMD
> do***entation PFRCP has only 12 bits of accuracy, so you are limited
> to really tiny numbers.  So, you probably should add PFRCPIT1 and
> PFRCPIT2 instructions, which together should give you 24 bits of
> accuracy (however, the do***entation I have is pretty unclear what
> accuraccy is guaranteed, so I would not trust in the last 1-2 bits).
>
> Given inaccuracy of reciprocal and rounding of multiplicatin you
> may get result which is smaller than true quotient.  If the true
> result is an integer and rounded result is smaller then rounding
> down give you wrong result.  To avoid this error you must make
> sure that approximate result is _always_ bigger than true result.
> Simple way to to multiply approximate a/b by a number slightly
> bigger than 1, say 1+2^(-20) (this should be enough if the
> approximate a/b has 21 bits of accuracy).  Of course, there is
> a danger that real a/b is smaller than an integer, but
> after biasing up may get bigger and you may get the result
> which is bigger than correct result.  However, this can not
> happen if the numbers are small enough.  Namely, if a and b
> are positive integer and a/b in not an integer, than distance
> from a/b to the nearest integer is at least 1/b.  As long as
> absolute error of the calculation is smaller than 1/b there
> is no risk that after rounding you get number bigger than
> true result.  Error bounds for floating point computations
> are relative, so you really want relative error smaller than
> 1/a.  Which means that you are limited to 18-19 bit a.
> If you want bigger numbers than it is hard to beat IDIV.
> One may try the same trick used by PFRCP, PFRCPIT1, PFRCPIT2 --
> first get rough approximantion (say, via table lookup) and
> refine using Newton iteration.  But it is not clear if
> table lookup and several aritmetic instructions needed
> for Newton iteration will be faster than IDIV...
>
> --
>                               Waldek Hebisch
> hebi...@[EMAIL PROTECTED]
 our posts are falling out of sync with eachother.

Sorry Sebastian, I mixed up: the gas doc. says:
"Currently, as does not sup****t Intel's floating point SIMD, Katmai
(KNI)."
->
SSE works, but the examples I tried didn't work for some reason, after
I had "converted" them from Intel to AT&T. Enough about this :)

Waldek:
By prime-counter I meant trial division with every previously found
prime. (I have looked into number theory as a personal interest ).

Thanks for the detailed info on the workaround, but I am more
staggered to find that the 3DNow! actually IS imprecise :/ So it is
basically useless, because as I said, it generates ~500 primes more
for 1000,000 than it should. (I previously wrote 10 mil, sorry).

The solution you offered is very nice, but I was hoping the solution
to be without limits, though I appreciate your effort. It just looks
like I have to dump 3DNow! altogether. (I'm going to get myself a new
IntC2D laptop, so I will be able to use SSE upto SSSE3).
 




 12 Posts in Topic:
AMD's 3DNow! imprecise
"kuratkull@[EMAIL PR  2008-04-02 12:50:35 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 14:23:43 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 14:13:07 
Re: AMD's 3DNow! imprecise
Sebastian Biallas <sp  2008-04-04 00:23:22 
Re: AMD's 3DNow! imprecise
Waldek Hebisch <spamt  2008-04-04 02:23:23 
Re: AMD's 3DNow! imprecise
Terje Mathisen <spamt  2008-04-04 07:36:03 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 23:13:00 
Re: AMD's 3DNow! imprecise
Phil Carmody <thefatph  2008-04-04 23:18:38 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 22:45:12 
Re: AMD's 3DNow! imprecise
Sebastian Biallas <sp  2008-04-07 13:55:19 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-08 05:19:12 
Re: AMD's 3DNow! imprecise
"James Van Buskirk&q  2008-04-10 13:08:32 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Thu Jul 24 14:54:06 CDT 2008.