Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: Population ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 17 of 33 Topic 4614 of 4729
Post > Topic >>

Re: Population count in SSE2, again

by "James Van Buskirk" <spamtrap@[EMAIL PROTECTED] > Apr 16, 2008 at 10:05 AM

"Wolfgang Kern" <spamtrap@[EMAIL PROTECTED]
> wrote in message 
news:fu4v9j$b5g$1@[EMAIL PROTECTED]
> "James Van Buskirk" <spamtrap@[EMAIL PROTECTED]
> schrieb im Newsbeitrag
> news:P9adnbQZvO6pK57VnZ2dnUVZ_vOlnZ2d@[EMAIL PROTECTED]
> when I look at this table ...
>>         LUT:
>>         .byte   0,1,1,2,1,2,2,3  00       1.
>>         .byte   1,2,2,3,2,3,3,4  08       2.
>>         .byte   1,2,2,3,2,3,3,4  10 =08  [2
>>         .byte   2,3,3,4,3,4,4,5  18       3.

>>         .byte   1,2,2,3,2,3,3,4  20 = 08 [2
>>         .byte   2,3,3,4,3,4,4,5  28 = 18 [3
>>         .byte   2,3,3,4,3,4,4,5  30 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  38       4.

>>         .byte   1,2,2,3,2,3,3,4  40 = 08 [2
>>         .byte   2,3,3,4,3,4,4,5  48 = 18 [3
>>         .byte   2,3,3,4,3,4,4,5  50 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  58 = 38 [4

>>         .byte   2,3,3,4,3,4,4,5  60 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  68 = 38 [4
>>         .byte   3,4,4,5,4,5,5,6  70 = 38 [4
>>         .byte   4,5,5,6,5,6,6,7  78       5.

>>         .byte   1,2,2,3,2,3,3,4  80 = 08 [2
>>         .byte   2,3,3,4,3,4,4,5  88 = 18 [3
>>         .byte   2,3,3,4,3,4,4,5  90 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  98 = 38 [4

>>         .byte   2,3,3,4,3,4,4,5  a0 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  a8 = 38 [4
>>         .byte   3,4,4,5,4,5,5,6  b0 = 38 [4
>>         .byte   4,5,5,6,5,6,6,7  b8 = 78 [5

>>         .byte   2,3,3,4,3,4,4,5  c0 = 18 [3
>>         .byte   3,4,4,5,4,5,5,6  c8 = 38 [4
>>         .byte   3,4,4,5,4,5,5,6  d0 = 38 [4
>>         .byte   4,5,5,6,5,6,6,7  d8 = 78 [5

>>         .byte   3,4,4,5,4,5,5,6  e0 = 38 [4
>>         .byte   4,5,5,6,5,6,6,7  e8 = 78 [5
>>         .byte   4,5,5,6,5,6,6,7  f0 = 78 [5
>>         .byte   5,6,6,7,6,7,7,8  f8       6.

> I wonder if a few calculations wont reduce the LUT to 6*8 bytes
> or even smaller with a double '0,1,1,2' index conversion,
> which then could be held in registers ...
> Haven't checked on the compressed method in detail, so this
> may be in use already anyway.

I think I see what you are driving at.  Actually if you look
closer at the 4-bit LUT code it holds the first 16 bytes of the
LUT in one register, and looks up bytes in two stages.  Look at
the _popcnt4 function (latest version to be found at
http://groups.google.com/group/comp.lang.asm.x86/msg/ee1ed0bbd6f35874
) and it may already be doing what you are proposing.

> My AMD64 docs show latency/throughput = 2(5)/1 for POPCNT, and I see
> this as 0.25 cycles per byte on GPR64 and 0.625 on mem64, .. not bad
> and hard to beat with an algo.

This doesn't beat the _popcnt1 version, which is getting 4.7 bytes
per clock, but the latency/throughput numbers quoted above for
Phenom imply almost 8 bytes/clock to me:

popcnt   (%rcx), %rbx
addq     %rbx, %rax
popcnt   8(%rcx), %rbx
addq     %rbx, %rax
.....

But I think it can go even faster because Phenom should be capable
of 3 DirectPath operations per cycle as well as two loads, and the
above sequence only uses one load and two DirectPath operations
per cycle.  Thus every third instruction could be devoted to
a compression algorithm like the one seen in _popcnt1, whose
final counting scheme could be accelerated with hardware popcount.
Of course my attempts to get Core 2 Duo do sustain 3 useful
calculational operations per cycle have been in vain, so there may
also be throughput issues with Phenom that prevent this idea from
becoming reality, but if anyone out there has a Phenom running
64-bit Windows I would be eager to run some tests...

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
 




 33 Posts in Topic:
Population count in SSE2, again
"James Van Buskirk&q  2008-04-12 03:14:45 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-12 16:55:40 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-13 15:08:36 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-13 09:20:49 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-13 21:48:40 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-13 21:43:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 01:55:14 
Re: Population count in SSE2, again
Jake Waskett <spamtra  2008-04-14 11:19:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:38:07 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 20:53:32 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 17:13:38 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-15 21:58:21 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-13 17:14:55 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-14 12:42:35 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 13:53:21 
Re: Population count in SSE2, again
"Wolfgang Kern"  2008-04-16 15:34:09 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 10:05:48 
Re: Population count in SSE2, again
Robert Redelmeier <red  2008-04-14 14:21:05 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-14 02:58:34 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-14 18:09:21 
Re: Population count in SSE2, again
Terje Mathisen <spamt  2008-04-15 07:28:26 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 05:00:42 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-14 15:09:35 
Re: Population count in SSE2, again
Terence <spamtrap@[EM  2008-04-15 02:29:34 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-15 02:56:39 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 00:33:16 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 14:42:37 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 19:38:21 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:41:36 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-16 21:39:47 
Re: Population count in SSE2, again
"Maarten Kronenburg&  2008-04-17 16:43:31 
Re: Population count in SSE2, again
Gerd Isenberg <spamtr  2008-04-16 09:58:11 
Re: Population count in SSE2, again
"James Van Buskirk&q  2008-04-16 12:59:38 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Thu Jul 24 15:01:34 CDT 2008.