Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: Population ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 9 of 33 Topic 4614 of 4729
Post > Topic >>

Re: Population count in SSE2, again

by "James Van Buskirk" <spamtrap@[EMAIL PROTECTED] > Apr 14, 2008 at 02:38 AM

"Maarten Kronenburg" <spamtrap@[EMAIL PROTECTED]
> wrote in message 
news:4802639c$0$720$7ade8c0d@[EMAIL PROTECTED]
> For what it's worth below I give my 32-bit assembler for popcnt.

This is a crude version of the well-known code in AMD's optimization
manual, and is not as fast as the AMD code.

> Also I peeked for popcnt in GMP 4.2.2, see:
> http://gmplib.org/
> They have an IA-64 Itanium popcnt with 1 cycle/limb (64-bit), using the
> IA-64 popcnt instruction.

I couldn't find where you got that number from.  In

http://www.technovelty.org/code/arch/popcount.html

I see a result of 6 clocks per 8 bytes for Itanium 2.

> Their table lookup popcnt runs 8 cycle/limb (32-bit).

Obviously one should be able to get close to 1 byte per cycle
with a Core 2 Duo and 8-bit LUT.  Only crappy code would be
short of this mark by a factor of 2.

> Clearly your xmm popcnt1 is also very fast, about 1 cycle/limb (32-bit)!

Also on a machine with throughput of one 64-bit hardware per clock
cycle it may be possible to get about 10 bytes per cycle, if the
other throughput problem I am experiencing is not also a problem in
the code I envision.

> But because it seems more complicated you should also test it for all
> possible 128-bit bit patterns.

No, the thing to do is to follow the references given and assure
oneself that the algorithm is mathematically sound.  The compression
stage only needs a truth table for all combinations of three input
bits (2**3 antries) to be exhaustive and the wrap-up stage can
be thought of as 16 independent bytes in parallel.  The only
aspects that I haven't seen in previous code were use PSADBW
for the final sweep stage (which is, if anything, more
transparent than the multiply and ****ft given by AMD, but integer
registers don't have SIMD operations in the x86 ISA) and the
negative logic in the compression stage (which can be extablished
by an 8-entry truth table if you flunked logic circuits.)

After that, testing with extreme conditions such as all bits set
and all bits clear as well as random bits to guard against gross
errors in coding, while not exhaustive of all possibilities, is
at least persuasive to me.  Besides, the algorithm mashes
together 1024 bits in each loop trip, distilling out one 128-bit
drop to count up, not to mention that the dregs of previous
iterations is an input to each successive iteration, so even
an unattainable number like 128 bits doesn't even start to
describe the complexity of the algorithm.  It's explained fairly
well in the second link I gave in the seminal post of this
thread.  Check it out and see what's going on here.

The Wikipedia entry on Hamming weight is pretty good for the
wrap stage of the algorithm, but the code there is almost
identical to your example, so it probably doesn't help you.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
 




 33 Posts in Topic:
Population count in SSE2, again
"James Van Buskirk&q  2008-04-12 03:14:45 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-12 16:55:40 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-13 15:08:36 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-13 09:20:49 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-13 21:48:40 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-13 21:43:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 01:55:14 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-14 11:19:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:38:07 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 20:53:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 17:13:38 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 21:58:21 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-13 17:14:55 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-14 12:42:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 13:53:21 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-16 15:34:09 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 10:05:48 
Re: Population count in SSE2, again
Robert Redelmeier <red  2008-04-14 14:21:05 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:58:34 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 18:09:21 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-15 07:28:26 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 05:00:42 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 15:09:35 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-15 02:29:34 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-15 02:56:39 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 00:33:16 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 14:42:37 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 19:38:21 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:41:36 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 21:39:47 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-17 16:43:31 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-16 09:58:11 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:59:38 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Jul 26 2:14:07 CDT 2008.