The subject may be a bit misleading, since the imprecision really
stems from how binary floating point works, but maybe you smart people
can help me out with a workaround or something.
OK, I'm relatively new to ASM (a few days, actually), but I have a
background in C, Python, CLisp and such.
Anyway, I am trying to build a very fast prime counter in assembly.
I succeeded using only the regular integer instructions, but 6.5
seconds for 100k primes is way too slow, so I looked around and
profiled my code... and of course, a major bottleneck was idivl. So
I decided to use the 3DNow! instructions instead.
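For reference, the integer-only version being replaced boils down to trial division with a remainder, which is where the hardware divide comes in. This is a minimal C sketch, not the original code. It assumes, as described below, that each odd candidate is tested against every prime found so far. `first_primes` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd array holding the first
   `count` primes (caller frees it), or NULL if count is 0 or malloc
   fails. Each odd candidate is trial-divided by every odd prime found
   so far; with variable divisors, % compiles to a div/idiv on x86. */
static unsigned *first_primes(size_t count)
{
    if (count == 0)
        return NULL;
    unsigned *primes = malloc(count * sizeof *primes);
    if (primes == NULL)
        return NULL;

    size_t found = 0;
    primes[found++] = 2;                  /* 2 is stored up front */
    for (unsigned cand = 3; found < count; cand += 2) {
        int is_prime = 1;
        /* start at index 1: an odd candidate is never divisible by 2 */
        for (size_t i = 1; i < found; i++) {
            if (cand % primes[i] == 0) {  /* one integer division per test */
                is_prime = 0;
                break;
            }
        }
        if (is_prime)
            primes[found++] = cand;
    }
    return primes;
}
```

Calling `first_primes(100000)` corresponds to the 100k-prime workload mentioned above. Every `cand % primes[i]` in the inner loop is one division.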
ecx = ecx+2 (starts at 3; 2 is already on the stack)
edx = each previously found prime less than the current ecx, looped over in turn
MOVD %edx, %MM0 /* divisor (a previously found prime) */
PI2FD %MM0, %MM0 /* divisor to float */
MOVQ %MM0, %MM2 /* copy of divisor */
PFRCP %MM0, %MM0 /* 1/divisor (approximate) */
MOVD %ecx, %MM1 /* prime_candidate */
PI2FD %MM1, %MM1 /* prime. to float */
MOVQ %MM1, %MM3 /* copy of prime. */
PFMUL %MM0, %MM1 /* 1/divisor * prime */
PF2ID %MM1, %MM1 /* answer to int ...*/
PI2FD %MM1, %MM1 /* ... and back to float */ /* NOTE1 */
PFMUL %MM1, %MM2 /* roundeddown(1/divisor*prime.)*divisor */
PFCMPEQ %MM2, %MM3 /* all ones if equal, else 0 */
MOVD %MM3, %eax /* eax != 0: divisible (not prime); eax == 0: passes */
I will reorganize it into a tight loop later, and will, of course, do
other logical optimizations, but the real problem is currently the
following:
It runs perfectly until ecx=9 and edx=3. After NOTE1, MM1 should be
3.0, but it seems to get rounded down to 2.0 in the to-int/to-float
round trip. It looks like the product before the conversion comes out
slightly below 3, and then gets truncated down to 2.
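The failure can be reproduced without 3DNow! hardware using a scalar C model of the sequence above (one lane only, IEEE-754 single precision assumed). PFRCP returns only an approximate reciprocal, about 14 bits. `approx_recip` is a hypothetical stand-in that fakes this by clearing the low mantissa bits of the exact result, so its error is always on the low side. Real PFRCP works differently and its errors will differ, so treat this as an illustration, not a bit-exact emulator. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical PFRCP stand-in: the correctly rounded 1/x with its 9
   lowest mantissa bits cleared (14 of 23 stored bits kept). This always
   errs on the low side; real PFRCP is not modelled exactly. */
static float approx_recip(float x)
{
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);       /* reinterpret the float's bits */
    bits &= ~(uint32_t)0x1FF;             /* drop the 9 lowest mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Scalar model of the posted sequence for one lane:
   returns 1 if the test reports that d divides cand, else 0. */
static int divides_3dnow_model(int32_t cand, int32_t d)
{
    float fd = (float)d;                  /* PI2FD: divisor to float          */
    float r  = approx_recip(fd);          /* PFRCP: approximate 1/divisor     */
    float fc = (float)cand;               /* PI2FD: candidate to float        */
    float q  = fc * r;                    /* PFMUL: estimated quotient        */
    float qt = (float)(int32_t)q;         /* PF2ID + PI2FD: truncate toward 0 */
    return qt * fd == fc;                 /* PFMUL + PFCMPEQ                  */
}
```

With this model, `divides_3dnow_model(9, 3)` multiplies 9 by about 0.3333282, gets roughly 2.99995, and truncates that to 2. Since 2 * 3 != 9, it wrongly reports that 3 does not divide 9, which is the same symptom as described.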
Are there any workarounds for this problem? I tried using the SSE
instructions, but then I read that GNU "as" doesn't support them :/
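One direction that might help is sketched below in C, assuming positive inputs below 2^24 and IEEE single precision; it is untested on 3DNow! hardware. The idea is to treat the float product only as an estimate of the quotient, round it to nearest instead of truncating, and confirm the result with an exact multiply. If d divides cand, any estimate within 0.5 of the true quotient rounds to it. If d does not divide cand, no integer passes the final check, so there are no false positives. In 3DNow! terms this would presumably mean adding 0.5 (PFADD) before PF2ID. For large candidates the raw PFRCP estimate is probably not accurate enough to stay within 0.5, so AMD's PFRCPIT1/PFRCPIT2 refinement steps would likely be needed as well. `divides_rounded` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical divisibility test from an approximate reciprocal.
   Assumes 0 < cand < 2^24 and d > 0. Exact as long as cand * recip is
   within 0.5 of cand / d whenever d really divides cand. */
static int divides_rounded(int32_t cand, int32_t d, float recip)
{
    float q = (float)cand * recip;        /* estimated quotient (may be a bit low) */
    int32_t qi = (int32_t)(q + 0.5f);     /* round to nearest instead of truncating */
    return (int64_t)qi * d == cand;       /* exact integer check, no float compare */
}
```

For example, with a reciprocal of 1/3 that is slightly too low (0.3333282), 9 * 0.3333282 is about 2.99995. That still rounds to 3, and 3 * 3 == 9 confirms divisibility.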
Any help (even if it's just "you're screwed!") will be greatly
appreciated :)
Thanks,
Tanel

