Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: Population ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 16 of 33 Topic 4614 of 4729
Post > Topic >>

Re: Population count in SSE2, again

by "Wolfgang Kern" <spamtrap@[EMAIL PROTECTED] > Apr 16, 2008 at 03:34 PM

"James Van Buskirk" <spamtrap@[EMAIL PROTECTED]
> schrieb im Newsbeitrag
news:P9adnbQZvO6pK57VnZ2dnUVZ_vOlnZ2d@[EMAIL PROTECTED]
> "Wolfgang Kern" <spamtrap@[EMAIL PROTECTED]
> wrote in message
> news:ftvcf7$umq$1@[EMAIL PROTECTED]
>> Even this algo suffer on many dependency stalls, it reads four bytes
>> and processes 32 bits per iteration, so it reduces the costly memory
>> accesses which you encounter with a byte by byte indexed table lookup.
>> Granted, a 256 byte table could be easy held in cache, so it would be
>> worth to benchmark and compare the performance of the two methods.

> Well, I tried rewriting the 8-bit LUT in assembly to minimize
> memory access, also I tried the 4-bit LUT.  In the following,
> popcnt1 is the topical code of this thread, popcnt2 is the
> 8-bit LUT, popcnt3 is popcnt1 without compression as in the AMD
> manual, and popcnt4 is the 4-bit LUT:

I see in your later posts you found already faster solutions ...

[code ..]
when I look at this table ...
>         LUT:
>         .byte   0,1,1,2,1,2,2,3  00       1.
>         .byte   1,2,2,3,2,3,3,4  08       2.
>         .byte   1,2,2,3,2,3,3,4  10 =08  [2
>         .byte   2,3,3,4,3,4,4,5  18       3.

>         .byte   1,2,2,3,2,3,3,4  20 = 08 [2
>         .byte   2,3,3,4,3,4,4,5  28 = 18 [3
>         .byte   2,3,3,4,3,4,4,5  30 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  38       4.

>         .byte   1,2,2,3,2,3,3,4  40 = 08 [2
>         .byte   2,3,3,4,3,4,4,5  48 = 18 [3
>         .byte   2,3,3,4,3,4,4,5  50 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  58 = 38 [4

>         .byte   2,3,3,4,3,4,4,5  60 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  68 = 38 [4
>         .byte   3,4,4,5,4,5,5,6  70 = 38 [4
>         .byte   4,5,5,6,5,6,6,7  78       5.

>         .byte   1,2,2,3,2,3,3,4  80 = 08 [2
>         .byte   2,3,3,4,3,4,4,5  88 = 18 [3
>         .byte   2,3,3,4,3,4,4,5  90 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  98 = 38 [4

>         .byte   2,3,3,4,3,4,4,5  a0 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  a8 = 38 [4
>         .byte   3,4,4,5,4,5,5,6  b0 = 38 [4
>         .byte   4,5,5,6,5,6,6,7  b8 = 78 [5

>         .byte   2,3,3,4,3,4,4,5  c0 = 18 [3
>         .byte   3,4,4,5,4,5,5,6  c8 = 38 [4
>         .byte   3,4,4,5,4,5,5,6  d0 = 38 [4
>         .byte   4,5,5,6,5,6,6,7  d8 = 78 [5

>         .byte   3,4,4,5,4,5,5,6  e0 = 38 [4
>         .byte   4,5,5,6,5,6,6,7  e8 = 78 [5
>         .byte   4,5,5,6,5,6,6,7  f0 = 78 [5
>         .byte   5,6,6,7,6,7,7,8  f8       6.

I wonder if a few calculations wont reduce the LUT to 6*8 bytes
or even smaller with a double '0,1,1,2' index conversion,
which then could be held in registers ...
Haven't checked on the compressed method in detail, so this
may be in use already anyway.

[code..]

> So we got down to about 1 byte per 1.5 clocks for the 8-bit LUT.
> Quite disappointing.  The 4-bit LUT is close to the AMD manual
> method, though.  For a processor with fast PSHUFB and no
> hardware POPCNT this may be a good thing.  PSHUFB unfortunately is
> slow on my Core 2 Duo E6700.

My AMD64 docs show latency/throughput = 2(5)/1 for POPCNT, and I see
this as 0.25 cycles per byte on GPR64 and 0.625 on mem64, .. not bad
and hard to beat with an algo.

__
wolfgang
 




 33 Posts in Topic:
Population count in SSE2, again
"James Van Buskirk&q  2008-04-12 03:14:45 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-12 16:55:40 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-13 15:08:36 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-13 09:20:49 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-13 21:48:40 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-13 21:43:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 01:55:14 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-14 11:19:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:38:07 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 20:53:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 17:13:38 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 21:58:21 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-13 17:14:55 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-14 12:42:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 13:53:21 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-16 15:34:09 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 10:05:48 
Re: Population count in SSE2, again
Robert Redelmeier <red  2008-04-14 14:21:05 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:58:34 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 18:09:21 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-15 07:28:26 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 05:00:42 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 15:09:35 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-15 02:29:34 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-15 02:56:39 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 00:33:16 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 14:42:37 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 19:38:21 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:41:36 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 21:39:47 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-17 16:43:31 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-16 09:58:11 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:59:38 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Fri Jul 25 15:11:20 CDT 2008.