Re: Population count in SSE2, again

by "James Van Buskirk" <spamtrap@[EMAIL PROTECTED] > Apr 16, 2008 at 12:41 PM

"Maarten Kronenburg" <spamtrap@[EMAIL PROTECTED]> wrote in message
news:4805f444$0$717$7ade8c0d@[EMAIL PROTECTED]
> Below is my xmm version, it runs a little under 3 cycles/32-bit.
> It's not as fast as your code (1 cycle/32-bit), but on 32-bit system there
> are not enough xmm registers to do the CSA trick, so this is how far I 
> get.

> @[EMAIL PROTECTED]
> movdqu xmm1, [esi+4*edi]
> @[EMAIL PROTECTED]
> movdqa xmm2, xmm1
> psrld xmm2, 1
> pand xmm1, xmm4
> pand xmm2, xmm4
> paddd xmm1, xmm2
> movdqa xmm2, xmm1
> psrld xmm2, 2
> pand xmm1, xmm5
> pand xmm2, xmm5
> paddd xmm1, xmm2
> movdqa xmm2, xmm1
> psrld xmm2, 4
> paddd xmm1, xmm2
> pand xmm1, xmm6
> psadbw xmm1, xmm7
> paddd xmm0, xmm1
> sub edi, 4
> jnz short @[EMAIL PROTECTED]
3 cycles per 4 bytes is pretty slow.  What kind of processor are you
testing with?  There are some tweaks that can speed up your code.
First, look at how the AMD optimization manual performs the first
stage, where 1-bit counters are combined to form 2-bit counters.
Notice that it uses a subtraction instead of an addition, a clever
optimization that saves you an AND.
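A minimal scalar sketch of the two forms of the first stage, applied to one 32-bit lane in portable C. This is an illustration, not the AMD manual's code: uint32_t stands in for one dword of an xmm register, and the names pairs_add, pairs_sub, popcount32 and check_first_stage are hypothetical. The subtraction form works because a 2-bit field holding bits b1 b0 has value 2*b1 + b0, and subtracting b1 leaves b1 + b0 without borrowing from the neighbouring field. The later stages follow the quoted loop, assuming xmm4, xmm5 and xmm6 hold the 0x55..., 0x33... and 0x0F... masks; the final byte sum stands in for psadbw against a zero register.

```c
#include <assert.h>
#include <stdint.h>

/* First stage as in the quoted loop: two ANDs and one add. */
uint32_t pairs_add(uint32_t x)
{
    return (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
}

/* First stage with the subtraction trick: one AND and one subtract.
   Per 2-bit field: (2*b1 + b0) - b1 == b1 + b0, and since b1 never
   exceeds the field value, no borrow crosses into the next field. */
uint32_t pairs_sub(uint32_t x)
{
    return x - ((x >> 1) & 0x55555555u);
}

/* Confirms the two first-stage forms agree for a given input. */
void check_first_stage(uint32_t x)
{
    assert(pairs_add(x) == pairs_sub(x));
}

/* Full SWAR popcount of one 32-bit lane. */
uint32_t popcount32(uint32_t x)
{
    x = pairs_sub(x);                                  /* 2-bit counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit counts */
    /* psadbw against zero would sum the bytes; add them directly here. */
    return (x & 0xFFu) + ((x >> 8) & 0xFFu) + ((x >> 16) & 0xFFu) + (x >> 24);
}
```

For example, popcount32(0x12345678u) returns 13, and pairs_add and pairs_sub both turn 0xFFFFFFFF into 0xAAAAAAAA (every 2-bit field holds a count of 2).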

You might think that it's a good thing to keep all your constants in
registers, but that is more problematic on Intel processors, where
ROB read bandwidth is limited.  What that amounts to is that you can
read at most two registers per cycle that aren't "in flight", or
three if the third is used as an index.  See 248966.pdf (the Intel 64
and IA-32 Architectures Optimization Reference Manual), section
3.5.2.1 on ROB read port stalls.  Originally my code had all the
constants in registers, but moving some of them to memory gave a
speed-up.  You may also consider making edi 4X as big and addressing
as movdqu xmm1, [edi+1*esi] so that your dead register (esi, which
the loop never writes) is the index register.  Thus it won't
participate in ROB read port stalls.

Also, when you have to perform a movaps to copy a value because of
the 2-register ISA we are laboring under (movaps is one byte shorter
than movdqa, which is why it is more frequently seen, even in integer
code), copy a constant rather than the in-flight variable.  This
allows the processor to issue the copy instruction out of order,
well in advance of where it's needed, so that it can't get in the
way of the critical-path instructions.

After you have moved some of the constants from registers to memory
and tweaked your code a little, you may find that, as a side effect
of having constants in memory, you now have some free registers, so
you may be able to do a little CSA compression after all.
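To illustrate the carry-save adder (CSA) idea, here is a minimal one-level sketch in portable C; it is not the code discussed in this thread. A CSA adds three bit-vectors column by column, producing a sum vector and a carry vector. Folding two new words into a running "ones" vector per step means only the carry vector ("twos") needs a full popcount, roughly halving the popcount work. The names csa, popcount32 and popcount_array_csa are hypothetical; deeper trees (fours, eights, ...) extend the same idea but need more live registers, which is presumably where freed-up registers would help.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SWAR popcount of one 32-bit word. */
static uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x & 0xFFu) + ((x >> 8) & 0xFFu) + ((x >> 16) & 0xFFu) + (x >> 24);
}

/* Carry-save adder: per bit column, a + b + c == 2 * (*h) + (*l). */
static void csa(uint32_t *h, uint32_t *l, uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t u = a ^ b;
    *h = (a & b) | (u & c);   /* carry: at least two of the three bits set */
    *l = u ^ c;               /* sum: odd number of bits set */
}

/* Counts set bits in p[0..n-1].  Invariant after each step:
   total + popcount32(ones) == bits in the words consumed so far. */
uint64_t popcount_array_csa(const uint32_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint64_t total = 0;
    uint32_t ones = 0, twos;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        csa(&twos, &ones, ones, p[i], p[i + 1]);
        total += 2u * (uint64_t)popcount32(twos);   /* each carry bit counts 2 */
    }
    total += popcount32(ones);                      /* flush the running sum */
    if (i < n)
        total += popcount32(p[i]);                  /* odd leftover word */
    return total;
}
```

For an input such as {0xFFFFFFFF, 0x1, 0x3, 0x80000000, 0xF0}, popcount_array_csa returns 40 while calling popcount32 on only four vectors instead of five; the saving grows with deeper CSA trees.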

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
 



