Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: AMD's 3DNow...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 8 of 12 Topic 4602 of 4728
Post > Topic >>

Re: AMD's 3DNow! imprecise

by Phil Carmody <thefatphil_demunged@[EMAIL PROTECTED] > Apr 4, 2008 at 11:18 PM

kuratkull  <spamtrap@[EMAIL PROTECTED]
> writes:
> On Apr 4, 8:36 am, Terje Mathisen  <spamt...@[EMAIL PROTECTED]
> wrote:
> > Waldek Hebisch wrote:
> > > kuratk...@[EMAIL PROTECTED]
 <spamt...@[EMAIL PROTECTED]
> wrote:
> > >> Anyway, I am trying to build the very-fast-prime-counter i assembly
> > >> and I succeeded when using only the regular instructions, but a 6.5
> > >> secs for a 100k primes is way too slow, so I looked around and
> > >> profiled my code...and of course, a major bottleneck was the idivl.
> >
> > > I am not sure what "prime-counter" mean -- I you want just generate
> > > primes than sieve of Eratosthenes is the standard method (much
faster
> > > than method using division).
> >
> > Indeed, that is the way to do it for any kind of range processing,
until
> > you get into stuff like 500+ bit prime number searching (for RSA
keys),
> > in which case you should be using statistical procedures, i.e. modular
> > arithmetic.
> >
> > > In case you really want to use division, you may consider storing
> > > recipracals -- you are using the same divisor many times.
> >
> > Using the same divisor/reciprocal many times simply means using a
sieve,
> > and for a properly compressed base table (personally I prefer storing
30
> > numbers in each 8-bit entry) you can even do stuff like casting out
> > multiple trial divisor multiples at the same time, when the divisors
are
> > still pretty small.
> >
> > Do a search on +prime +"sieve of Eratosthenes"!
> >
> > I'm guessing you have at least a couple of orders of magnitude in
speed
> > to gain. :-)
> 
> Primes are like my personal hobby, so yes, I know pretty much about
> them (trying not to boast). I have ~10 different algorythms
> implemented in C, I also have 4 primes which belong to the TOP5000
> largest primes in the world list(found using LLR and sieving).
> So please stop suggesting me to use the SOE, I know :)

*cough*

The problem is that if you're looking at optimising assembly 
at this stage, you've obviously ignored everything that Knuth
has said about optimisation. As has been pointed out before, 
you've not even clearly described exactly what task you wish
to perform. 

If, as you first said, you want a prime counter, and the 
magnitudes of your bounds are quite small, but the range
between them not insignificant (say 1e18-1.01e18) then the 
quickest way is to use something like a Meissel-Lehmer 
algorithm, or at least Legendre's algorithm, rather than 
a sieve. PD C and Java source can be found on sci.math
dating about 5 years ago. (When James Harris was in full 
flood.)

However, if your range is narrow, then a sieve may be best.
The precise type of sieve that's best will depend on the
magnitude of the numbers involved. If your implementation 
is polished, then an eratosthenes with intelligent queuing
of the small primes will be optimal up to 10^17 or so.
Tomás Oliveira e Silva has a pretty good open source 
implementation which if tweaked correctly can approach
James van Buskirk's, which I believe to be the fastest
Eratosthenes. 

Beyond that, you'll want an Atkin or Galway sieve. DJB's
primegen does work well at ~10^18 and beyond despite his
disclaimers, as long as you parametrise it correctly.
(Most people who have complained about its speed have 
simply not read the instructions, it degrades badly if
you mis-parametrise.) As far as I know there are no 
particularly fast implementations of a Galway sieve, 
but as long as you're prepared to handle numbers >64
bits, a Galway sieve should run quite happily up to 10^20
on a commodity PC.

If you're actually looking at finding moderately large 
primes in a modest range, then deep sieving becomes 
relatively unim****tant, as the PRP tests begin to dominate. 
Shallow sieving, of course, is vital. The cutoff between 
shallow and deep is something for which you'll need to do 
a little bit of arithmetic. (I.e. what Terje said above.)

However, whatever you're doing, to reiterate, the first 
thing to do is make sure you know what you really want 
to do. The second most im****tant thing is to chose the 
right algorithm, and only the final thing to worry about
is the bit ba****ng itself.

Phil
-- 
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
 




 12 Posts in Topic:
AMD's 3DNow! imprecise
"kuratkull@[EMAIL PR  2008-04-02 12:50:35 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 14:23:43 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 14:13:07 
Re: AMD's 3DNow! imprecise
Sebastian Biallas <sp  2008-04-04 00:23:22 
Re: AMD's 3DNow! imprecise
Waldek Hebisch <spamt  2008-04-04 02:23:23 
Re: AMD's 3DNow! imprecise
Terje Mathisen <spamt  2008-04-04 07:36:03 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 23:13:00 
Re: AMD's 3DNow! imprecise
Phil Carmody <thefatph  2008-04-04 23:18:38 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-03 22:45:12 
Re: AMD's 3DNow! imprecise
Sebastian Biallas <sp  2008-04-07 13:55:19 
Re: AMD's 3DNow! imprecise
kuratkull <spamtrap@[  2008-04-08 05:19:12 
Re: AMD's 3DNow! imprecise
"James Van Buskirk&q  2008-04-10 13:08:32 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Thu Jul 24 0:35:08 CDT 2008.