
Re: Population count in SSE2, again

by "Maarten Kronenburg", Apr 16, 2008 at 09:39 PM

"James Van Buskirk"  wrote in message
> "Maarten Kronenburg"  wrote in message
>

> 3 cycles per 4 bytes is pretty slow.  What kind of processor are you
> testing with?  There are some tweaks that can speed up your code.
> Firstly, look at how the first stage where 1-bit counters are
> combined to form 2-bit counters is performed in the AMD manual.
> Notice that they use a subtraction instead of an addition, a
> clever optimization that can save you an AND.
>
Yes, I saw that in your code as well; I will try to study and understand it.
This would indeed save me one AND in the loop.
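
For reference, here is a minimal scalar sketch in C of the subtraction trick for that first stage. It is not the SSE2 code from this thread or the AMD manual listing; the function names are made up for illustration, and the same masks would simply be widened to 128 bits in an SSE2 version. The idea is that a 2-bit field holding 2a+b becomes a+b once you subtract a, so the low bits need no mask of their own.

```c
#include <stdint.h>

/* Stage 1 with an addition: needs two ANDs (hypothetical helper name). */
static uint32_t pairs_add(uint32_t x)
{
    return (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
}

/* Stage 1 with a subtraction: each 2-bit field 2a+b becomes (2a+b) - a,
   which is a+b, so only one AND is needed. */
static uint32_t pairs_sub(uint32_t x)
{
    return x - ((x >> 1) & 0x55555555u);
}

/* Full 32-bit population count built on the subtraction form. */
static uint32_t popcount32(uint32_t x)
{
    x = pairs_sub(x);                                  /* 2-bit counters */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit counters */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit counters */
    return (x * 0x01010101u) >> 24;                    /* sum the four bytes */
}
```

Both stage-1 variants give the same 2-bit counters for any input; the subtraction form just drops one AND per iteration.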

> You might think that it's a good thing to keep all your constants in
> registers, but that is more problematic on Intel processors where
> you have only limited ROB read bandwidth.  What that amounts to is
> you can only read at most two registers per cycle that aren't "in
> flight" or three if the third is used as an index.  See 248966.pdf,
> section 3.5.2.1 on ROB read port stalls.  Originally my code had all
> the constants in registers but I moved some of them to memory with
> consequent speed-up.  Also you may consider making edi 4X as big
> and addressing as movdqu xmm1, [edi+1*esi] so that your dead
> register is the index register.  Thus it won't participate in ROB
> read port stalls.

OK I will try that also.

>
> Also when you have to perform a movaps to copy a value because of
> the 2-register ISA we are laboring under (movaps is one byte shorter
> than movdqa which is why it is more frequently seen, even in integer
> code) copy a constant rather than the in-flight variable.  This
> allows the processor to issue the copy instruction out of order
> well in advance of where it's needed so that it can't get in the way
> of the critical path instructions.

Yes I understand, then the copy is out of the dependency chain.

>
> After you have moved some of the constants from registers to memory
> and tweaked your code a little, you may find that as a side effect
> of having constants in memory you have now some free registers so
> that you may be able to do a little CSA compression after all.
>
> --
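
Assuming "CSA" above means a carry-save adder, the compression step can be sketched in scalar C as below. This is only an illustration with made-up names, not code from this thread: three input words are reduced to a "low" word (weight 1) and a "high" word (weight 2), so two counting passes cover three words of input. The extra temporaries are where the freed registers would be spent.

```c
#include <stdint.h>

/* Simple reference bit count (clears the lowest set bit each pass). */
static uint32_t popcount32_ref(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Carry-save adder: per bit position, a + b + c == 2*high + low. */
static void csa(uint32_t a, uint32_t b, uint32_t c,
                uint32_t *high, uint32_t *low)
{
    uint32_t u = a ^ b;
    *high = (a & b) | (u & c);  /* carry bits, weight 2 */
    *low  = u ^ c;              /* sum bits, weight 1 */
}

/* Count the bits of three words with only two counting passes. */
static uint32_t popcount3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t high, low;
    csa(a, b, c, &high, &low);
    return 2u * popcount32_ref(high) + popcount32_ref(low);
}
```

In a real loop the reference counter would be replaced by the fast masked version, and the CSA outputs could be chained further before any counting is done.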

Thanks for your suggestions, I have some work to do.
My library has many other assembler routines for multiplication etc,
optimizing those also means getting instructions out of the dependency
chain. Mostly the first step is getting a working version, then the second
step is this kind of optimization.
Regards, Maarten.
 




