Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: Population ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 14 of 33 Topic 4614 of 4710
Post > Topic >>

Re: Population count in SSE2, again

by "Wolfgang Kern" <spamtrap@[EMAIL PROTECTED] > Apr 14, 2008 at 12:42 PM

"Terence" wrote:

> Someone tell me why 21 instructions and a branch, per word, is somehow
> better/faster than
>
> Initializtion all methods need.
>
>   zero summing index   (eg ax or eax)
>   load address of string to esi
>   load string count to ecx
>   load table address to ebx
> back:                              ; 3 instructions per word follow
>   add to current sum, from count table indexed by the 8 byte or 16 bit
> word (one instruction)
>   increment string pointer
> (esi)                                            (one instruction)
>   branch to repeat if not finished string byte/word string length
> (dec-branch instruction)
>
> ?
> I want to know, if I haven't understood what you all seem to be trying
> to do!
> Is it that all those instuctions are all somehow loaded to the cpu
> chip inststruction store and processed on chip? Because there is
> always the memory access to be considered.

The shown code reminds me of what Randy Hyde posted a while back
in ALA for counting set bits.
Even this algo suffer on many dependency stalls, it reads four bytes
and processes 32 bits per iteration, so it reduces the costly memory
accesses which you encounter with a byte by byte indexed table lookup.
Granted, a 256 byte table could be easy held in cache, so it would be
worth to benchmark and compare the performance of the two methods.

I never needed a bit counter yet, but if I ever need one, I'd try
if the known to be awful slow BSF/BSR instructions wont save on
many iterations, ...sure the worst case would be all bits set then,
so your suggestion may perform quite well:

xor eax,eax         ;init count
xor ecx,ecx         ;used as dwords_size
mov esi,start_of
mov edi, Byte_LUT   ;should be prepared already
prefetchnta [edi]
            [edi+40h]
            [edi+80h]
            [edi+C0h]
L0:
mov ebx,[esi+ecx*4]
movzx edx,bl         ;four times unrolled
shr ebx,8            ;because a loop branch
add AL,[edi+edx]     ;may cost more time
adc eax,0x0100       ;
movzx edx,bl
shr ebx,8
add AL,[edi+edx]
adc eax,0x0100
movzx edx,bl
shr ebx,8
add AL,[edi+edx]
adc eax,0x0100
movzx edx,bl
;shr ebx,8           ;not needed in last
add AL,[edi+edx]
adc eax,0x0100
inc ecx
cmp ecx,dword_size
jc L0
; eax = bit count

but I'm not sure that your short variant:

;ecx=byte_size
L0:
movzx edx,[esi]
inc esi
add AL,[edi+edx]
adc eax,0x0100
dec ecx
jnz L0

could be faster, because it reads 'bytes' more often from memory.

perhaps I type in in one day and post timing results then.
__
wolfgang
 




 33 Posts in Topic:
Population count in SSE2, again
"James Van Buskirk&q  2008-04-12 03:14:45 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-12 16:55:40 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-13 15:08:36 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-13 09:20:49 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-13 21:48:40 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-13 21:43:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 01:55:14 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-14 11:19:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:38:07 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 20:53:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 17:13:38 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 21:58:21 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-13 17:14:55 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-14 12:42:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 13:53:21 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-16 15:34:09 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 10:05:48 
Re: Population count in SSE2, again
Robert Redelmeier <red  2008-04-14 14:21:05 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:58:34 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 18:09:21 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-15 07:28:26 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 05:00:42 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 15:09:35 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-15 02:29:34 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-15 02:56:39 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 00:33:16 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 14:42:37 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 19:38:21 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:41:36 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 21:39:47 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-17 16:43:31 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-16 09:58:11 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:59:38 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sun Jul 6 20:05:58 CDT 2008.