Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: Population ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 10 of 33 Topic 4614 of 4710
Post > Topic >>

Re: Population count in SSE2, again

by "Maarten Kronenburg" <spamtrap@[EMAIL PROTECTED] > Apr 14, 2008 at 08:53 PM

"James Van Buskirk"  wrote in message
> "Maarten Kronenburg"  wrote in message
>
> > For what it's worth below I give my 32-bit assembler for popcnt.
>
> This is a crude version of the well-known code in AMD's optimization
> manual, and is not as fast as the AMD code.
>

Yes it's the basic one, I'm now putting it in xmm and see where I get.
I work in 32-bit system so the number of registers is a limitation.

> > Also I peeked for popcnt in GMP 4.2.2, see:
> > http://gmplib.org/
> > They have an IA-64 Itanium popcnt with 1 cycle/limb (64-bit), using
the
> > IA-64 popcnt instruction.
>
> I couldn't find where you got that number from.  In
>

Just go to:
http://gmplib.org
and download gmp-4.2.2.tar.gz.
Then look for the files popcount.asm.
I just copied their timings, perhaps they have already improved.

> http://www.technovelty.org/code/arch/popcount.html
>
> I see a result of 6 clocks per 8 bytes for Itanium 2.
>
> > Their table lookup popcnt runs 8 cycle/limb (32-bit).
>
> Obviously one should be able to get close to 1 byte per cycle
> with a Core 2 Duo and 8-bit LUT.  Only crappy code would be
> short of this mark by a factor of 2.
>
> > Clearly your xmm popcnt1 is also very fast, about 1 cycle/limb
(32-bit)!
>
> Also on a machine with throughput of one 64-bit hardware per clock
> cycle it may be possible to get about 10 bytes per cycle, if the
> other throughput problem I am experiencing is not also a problem in
> the code I envision.
>
> > But because it seems more complicated you should also test it for all
> > possible 128-bit bit patterns.
>
> No, the thing to do is to follow the references given and assure
> oneself that the algorithm is mathematically sound.  The compression
> stage only needs a truth table for all combinations of three input
> bits (2**3 antries) to be exhaustive and the wrap-up stage can
> be thought of as 16 independent bytes in parallel.  The only
> aspects that I haven't seen in previous code were use PSADBW
> for the final sweep stage (which is, if anything, more
> transparent than the multiply and ****ft given by AMD, but integer
> registers don't have SIMD operations in the x86 ISA) and the
> negative logic in the compression stage (which can be extablished
> by an 8-entry truth table if you flunked logic circuits.)
>

Yes I'm sure you are right, but a typo in assembler is easily made.

> After that, testing with extreme conditions such as all bits set
> and all bits clear as well as random bits to guard against gross
> errors in coding, while not exhaustive of all possibilities, is
> at least persuasive to me.  Besides, the algorithm mashes
> together 1024 bits in each loop trip, distilling out one 128-bit
> drop to count up, not to mention that the dregs of previous
> iterations is an input to each successive iteration, so even
> an unattainable number like 128 bits doesn't even start to
> describe the complexity of the algorithm.  It's explained fairly
> well in the second link I gave in the seminal post of this
> thread.  Check it out and see what's going on here.
>
> The Wikipedia entry on Hamming weight is pretty good for the
> wrap stage of the algorithm, but the code there is almost
> identical to your example, so it probably doesn't help you.
>

Yes I found:
http://www.hackersdelight.org/HDcode/newCode/pop_arrayHS.c
To put those in assembler as you did is quite some work.
Perhaps you may find and eliminate dependency chains in your assembler
code,
that's the normal way to optimize exisiting assembler, but sometimes it's
rather a puzzle.
My popcnt is part of an integer library, just like GMP, and it's linear so
its performance is not so critical, but for some other special
applications
it may be critical. Amazing that you can get so far with optimizing
tricks.
Regards, Maarten.
 




 33 Posts in Topic:
Population count in SSE2, again
"James Van Buskirk&q  2008-04-12 03:14:45 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-12 16:55:40 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-13 15:08:36 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-13 09:20:49 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-13 21:48:40 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-13 21:43:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 01:55:14 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-14 11:19:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:38:07 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 20:53:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 17:13:38 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 21:58:21 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-13 17:14:55 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-14 12:42:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 13:53:21 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-16 15:34:09 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 10:05:48 
Re: Population count in SSE2, again
Robert Redelmeier <red  2008-04-14 14:21:05 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:58:34 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 18:09:21 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-15 07:28:26 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 05:00:42 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 15:09:35 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-15 02:29:34 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-15 02:56:39 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 00:33:16 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 14:42:37 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 19:38:21 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:41:36 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 21:39:47 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-17 16:43:31 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-16 09:58:11 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:59:38 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sun Jul 6 20:32:02 CDT 2008.