On Apr 4, 11:18 pm, Phil Carmody <thefatphil_demun...@[EMAIL PROTECTED]
>
wrote:
> kuratkull <spamt...@[EMAIL PROTECTED]
> writes:
> > On Apr 4, 8:36 am, Terje Mathisen <spamt...@[EMAIL PROTECTED]
> wrote:
> > > Waldek Hebisch wrote:
> > > > kuratk...@[EMAIL PROTECTED]
<spamt...@[EMAIL PROTECTED]
> wrote:
> > > >> Anyway, I am trying to build the very-fast-prime-counter i
assembly
> > > >> and I succeeded when using only the regular instructions, but a
6.5
> > > >> secs for a 100k primes is way too slow, so I looked around and
> > > >> profiled my code...and of course, a major bottleneck was the
idivl.
>
> > > > I am not sure what "prime-counter" mean -- I you want just
generate
> > > > primes than sieve of Eratosthenes is the standard method (much
faster
> > > > than method using division).
>
> > > Indeed, that is the way to do it for any kind of range processing,
until
> > > you get into stuff like 500+ bit prime number searching (for RSA
keys),
> > > in which case you should be using statistical procedures, i.e.
modular
> > > arithmetic.
>
> > > > In case you really want to use division, you may consider storing
> > > > recipracals -- you are using the same divisor many times.
>
> > > Using the same divisor/reciprocal many times simply means using a
sieve,
> > > and for a properly compressed base table (personally I prefer
storing 30
> > > numbers in each 8-bit entry) you can even do stuff like casting out
> > > multiple trial divisor multiples at the same time, when the divisors
are
> > > still pretty small.
>
> > > Do a search on +prime +"sieve of Eratosthenes"!
>
> > > I'm guessing you have at least a couple of orders of magnitude in
speed
> > > to gain. :-)
>
> > Primes are like my personal hobby, so yes, I know pretty much about
> > them (trying not to boast). I have ~10 different algorythms
> > implemented in C, I also have 4 primes which belong to the TOP5000
> > largest primes in the world list(found using LLR and sieving).
> > So please stop suggesting me to use the SOE, I know :)
>
> *cough*
>
> The problem is that if you're looking at optimising assembly
> at this stage, you've obviously ignored everything that Knuth
> has said about optimisation. As has been pointed out before,
> you've not even clearly described exactly what task you wish
> to perform.
>
> If, as you first said, you want a prime counter, and the
> magnitudes of your bounds are quite small, but the range
> between them not insignificant (say 1e18-1.01e18) then the
> quickest way is to use something like a Meissel-Lehmer
> algorithm, or at least Legendre's algorithm, rather than
> a sieve. PD C and Java source can be found on sci.math
> dating about 5 years ago. (When James Harris was in full
> flood.)
>
> However, if your range is narrow, then a sieve may be best.
> The precise type of sieve that's best will depend on the
> magnitude of the numbers involved. If your implementation
> is polished, then an eratosthenes with intelligent queuing
> of the small primes will be optimal up to 10^17 or so.
> Tomás Oliveira e Silva has a pretty good open source
> implementation which if tweaked correctly can approach
> James van Buskirk's, which I believe to be the fastest
> Eratosthenes.
>
> Beyond that, you'll want an Atkin or Galway sieve. DJB's
> primegen does work well at ~10^18 and beyond despite his
> disclaimers, as long as you parametrise it correctly.
> (Most people who have complained about its speed have
> simply not read the instructions, it degrades badly if
> you mis-parametrise.) As far as I know there are no
> particularly fast implementations of a Galway sieve,
> but as long as you're prepared to handle numbers >64
> bits, a Galway sieve should run quite happily up to 10^20
> on a commodity PC.
>
> If you're actually looking at finding moderately large
> primes in a modest range, then deep sieving becomes
> relatively unim****tant, as the PRP tests begin to dominate.
> Shallow sieving, of course, is vital. The cutoff between
> shallow and deep is something for which you'll need to do
> a little bit of arithmetic. (I.e. what Terje said above.)
>
> However, whatever you're doing, to reiterate, the first
> thing to do is make sure you know what you really want
> to do. The second most im****tant thing is to chose the
> right algorithm, and only the final thing to worry about
> is the bit ba****ng itself.
>
> Phil
> --
> Dear aunt, let's set so double the killer delete select all.
> -- Microsoft voice recognition live demonstration
Thanks for the long post, I really appreciate it Phil. Although the
point was not about the algorithm itself, but about any means to have
the app run faster with "simple" ASM tricks. But It doesn't matter any
more, since I have implemented a Sieve of E. in 55 lines which finds
10 million primes in 0.54 seconds, so I guess I managed to write fast
code anyway(because it's possible to write a rather slow S. of E.
too).
The 3DNow! problem is fixed by simply not using it, so I guess this
topic has served its purpose.
Thanks to all who responded!
Tanel


|