"Maarten Kronenburg" <spamtrap@[EMAIL PROTECTED]> wrote in message
news:4805f444$0$717$7ade8c0d@[EMAIL PROTECTED]
> Below is my xmm version, it runs a little under 3 cycles/32-bit.
> It's not as fast as your code (1 cycle/32-bit), but on a 32-bit system
> there are not enough xmm registers to do the CSA trick, so this is how
> far I get.
> @[EMAIL PROTECTED]
> movdqu xmm1, [esi+4*edi]
> @[EMAIL PROTECTED]
> movdqa xmm2, xmm1
> psrld xmm2, 1
> pand xmm1, xmm4
> pand xmm2, xmm4
> paddd xmm1, xmm2
> movdqa xmm2, xmm1
> psrld xmm2, 2
> pand xmm1, xmm5
> pand xmm2, xmm5
> paddd xmm1, xmm2
> movdqa xmm2, xmm1
> psrld xmm2, 4
> paddd xmm1, xmm2
> pand xmm1, xmm6
> psadbw xmm1, xmm7
> paddd xmm0, xmm1
> sub edi, 4
> jnz short @[EMAIL PROTECTED]
A little under 3 cycles per 4 bytes is pretty slow. What kind of
processor are you testing with? There are some tweaks that can speed
up your code.
Firstly, look at how the AMD manual performs the first stage, where
1-bit counters are combined to form 2-bit counters. Notice that they
use a subtraction instead of an addition, a clever optimization that
can save you an AND.
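As a rough illustration, here is a scalar C sketch of what each 32-bit lane of the quoted loop computes, next to a variant whose first stage uses the subtraction described above. The function names are hypothetical, and the final byte sum uses a multiply by 0x01010101 instead of psadbw; that is a substitute technique chosen only to keep the sketch portable, not what either SSE2 version does.

```c
#include <stdint.h>

/* Mirrors one 32-bit lane of the quoted SSE2 loop:
   two AND-masked stages, then an add-then-mask nibble stage. */
static uint32_t popcount_and_add(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u); /* 2-bit counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit counts */
    /* Sum the four byte counts into the top byte (stand-in for psadbw). */
    return (x * 0x01010101u) >> 24;
}

/* Same result, but the first stage subtracts instead of adding, which
   drops one AND: a 2-bit field b1b0 has value 2*b1 + b0, and
   subtracting b1 leaves b1 + b0, the number of set bits. */
static uint32_t popcount_sub(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit counts */
    return (x * 0x01010101u) >> 24;
}
```

For example, both functions return 13 for 0x12345678 and 32 for 0xFFFFFFFF; the subtraction version just gets there with one fewer mask operation.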
You might think that it's a good thing to keep all your constants in
registers, but that is more problematic on Intel processors, where
you have only limited ROB read bandwidth. What that amounts to is
that you can read at most two registers per cycle that aren't "in
flight", or three if the third is used as an index. See 248966.pdf,
section 3.5.2.1 on ROB read port stalls. Originally my code had all
the constants in registers, but I moved some of them to memory, with
a consequent speed-up. You may also consider making edi 4X as big (a
byte offset rather than a dword count) and addressing as
movdqu xmm1, [edi+1*esi], so that your dead register is the index
register. Thus it won't participate in ROB read port stalls.
Also, when you have to perform a movaps to copy a value because of
the 2-register ISA we are laboring under (movaps is one byte shorter
than movdqa, which is why it is more frequently seen, even in integer
code), copy a constant rather than the in-flight variable. This
allows the processor to issue the copy instruction out of order,
well in advance of where it's needed, so that it can't get in the way
of the critical-path instructions.
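One way to apply this to the popcount stages is sketched below in scalar C, assuming the masks 0xAAAAAAAA and 0x33333333 are available as constants (the names mask_aa and mask_33 in the comments are hypothetical). The copied register starts as the constant, so the copy does not depend on x; the rest follows from (x & 0xAAAAAAAA) >> 1 == (x >> 1) & 0x55555555, and from pandn computing ~dest & src. A C compiler will not preserve this instruction order, so the sketch only checks that the rearranged arithmetic still gives the right counts; the comments show the intended SSE2 mapping.

```c
#include <stdint.h>

/* First stage in two-operand style: copy the constant, not x. */
static uint32_t stage1_const_copy(uint32_t x)
{
    uint32_t t = 0xAAAAAAAAu; /* movaps t, [mask_aa]  (no dependency on x) */
    t &= x;                   /* pand   t, x                               */
    t >>= 1;                  /* psrld  t, 1   == (x >> 1) & 0x55555555   */
    return x - t;             /* psubd  x, t   -> 2-bit counts            */
}

/* Second stage: copy 0x33333333 and use and-not to get x & 0xCCCCCCCC. */
static uint32_t stage2_const_copy(uint32_t x)
{
    uint32_t t = 0x33333333u; /* movaps t, [mask_33]  (no dependency on x) */
    t = ~t & x;               /* pandn  t, x   -> x & 0xCCCCCCCC          */
    t >>= 2;                  /* psrld  t, 2                               */
    x &= 0x33333333u;         /* pand   x, [mask_33]                       */
    return x + t;             /* paddd  x, t   -> 4-bit counts            */
}

static uint32_t popcount_const_copy(uint32_t x)
{
    x = stage1_const_copy(x);
    x = stage2_const_copy(x);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu; /* 8-bit counts */
    return (x * 0x01010101u) >> 24;   /* byte sum (portable stand-in for psadbw) */
}
```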
After you have moved some of the constants from registers to memory
and tweaked your code a little, you may find that, as a side effect
of having constants in memory, you now have some free registers, so
that you may be able to do a little CSA compression after all.
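For readers unfamiliar with the term, a minimal scalar sketch of CSA (carry-save adder) compression follows, assuming the "CSA trick" refers to the usual carry-save approach to bulk popcounts. A CSA adds three words bit by bit into a "high" (carry) word and a "low" (sum) word, so a + b + c == 2*high + low at every bit position, and three popcounts become two. The function names are hypothetical, and real SIMD versions chain several CSAs to compress more words before counting.

```c
#include <stddef.h>
#include <stdint.h>

/* SWAR popcount of one word, with the subtraction-trick first stage. */
static uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Full adder applied to 32 independent bit positions. */
static void csa(uint32_t *high, uint32_t *low,
                uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t u = a ^ b;
    *high = (a & b) | (u & c); /* carry bits */
    *low = u ^ c;              /* sum bits   */
}

/* Counts set bits in p[0..n-1], compressing each group of three words
   into two before counting. */
static uint64_t popcount_array_csa(const uint32_t *p, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 3 <= n; i += 3) {
        uint32_t high, low;
        csa(&high, &low, p[i], p[i + 1], p[i + 2]);
        total += 2u * (uint64_t)popcount32(high) + popcount32(low);
    }
    for (; i < n; i++) /* leftover words, counted one at a time */
        total += popcount32(p[i]);
    return total;
}
```

The saving comes from the CSA itself costing only a few AND/XOR/OR operations, which is cheaper than a full popcount reduction; that is why the extra registers freed up by moving constants to memory matter.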
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end