
Re: Code performance, rdtsc and code alignment

by "Wolfgang Kern" <nowhere@[EMAIL PROTECTED] > Apr 27, 2008 at 10:50 AM

James Harris wrote:

....
>> btw: I don't have a CPUID before the second RDTSC,
>> so I get an almost constant 14-cycle count for an empty test.

> If I remove the second cpuid I seem to get a consistent 32 cycles (on
> a Pentium 3). As I understand it, though, the second serialisation is
> needed to ensure any previous instructions (which there would be if
> this was testing a real code sequence) have completed.

CPUID seems to do a bit more than just wait for the pipes to complete,
and its behaviour may vary a lot between CPUs, even within a family.

OK, the second RDTSC may stall on preceding instructions that still use
its registers (eax, edx), but this can be covered by a few NOPs in front
of it (which I usually have in my test field anyway).

> I've tried various offsets both before the three tests and within but
> cannot see a consistent pattern. The CPU reports the following
> instruction cache characteristics.
>
> 08 1st-level instruction cache: 16 KB, 4-way set associative, 32-byte
> line size

You can try to put the first RDTSC at the very end of a (physical)
cache line, so that the code under test always starts on a cache-line
boundary.
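
The arithmetic behind this placement can be sketched roughly as follows. This is only an illustration, not code from the thread: the function names and the example address are made up, the 32-byte line and 16 KB 4-way geometry are taken from the descriptor quoted above, and RDTSC is assumed to be the 2-byte encoding 0F 31. With 128 sets of 32 bytes the set index lies entirely inside the 4 KB page offset, so the result is the same for virtual and physical addresses.

```python
# Sketch only: names and the example address are hypothetical.
LINE_SIZE = 32           # bytes per I-cache line (from the quoted descriptor)
CACHE_SIZE = 16 * 1024   # 16 KB first-level instruction cache
WAYS = 4                 # 4-way set associative
RDTSC_LEN = 2            # RDTSC encodes as 0F 31

NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)   # 128 sets


def pad_before(addr, insn_len=RDTSC_LEN, line=LINE_SIZE):
    """Padding bytes needed at addr so that an insn_len-byte instruction
    ends exactly on a cache-line boundary."""
    return -(addr + insn_len) % line


def set_index(addr):
    """Cache set that the line containing addr maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


if __name__ == "__main__":
    addr = 0x1005                          # where the RDTSC would otherwise land
    pad = pad_before(addr)
    code_start = addr + pad + RDTSC_LEN    # first byte of the code under test
    print(pad, hex(code_start), set_index(code_start))   # 25 0x1020 1
```

In an assembler the same effect is normally obtained with an alignment directive plus explicit padding; the calculation only shows how many filler bytes that amounts to.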


> 01 Instruction TLB: 4 KB Pages, 4-way set associative, 32 entries
> 02 Instruction TLB: 4 MB Pages, fully associative, 2 entries
>
> This isn't a problem as it stands unless the symptoms persist within a
> loop. Something to watch out for, though, for tight code!

Yes, there are many things to consider when optimising ... besides
alignment, cache-line boundaries, dependencies and register/pipeline
stalls there is the code prefetch, with its CPU-dependent size and
decode capabilities.

Sometimes a redundant-looking jmp or seemingly useless NOPs can speed up
the tiny loop that follows by a factor of four, but it always depends ...
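
One way padding can matter, shown here purely as an illustration: count how many aligned fetch blocks a small loop overlaps before and after it is padded to a block boundary. The 16-byte block size is an assumption (the prefetch size is CPU-dependent, as noted above), the function names and addresses are hypothetical, and this is not meant to account for the whole speedup.

```python
# Sketch only: block size, names and addresses are assumptions.
def blocks_touched(start, length, block=16):
    """Number of aligned block-byte fetch blocks the code range overlaps."""
    first = start // block
    last = (start + length - 1) // block
    return last - first + 1


def nops_to_align(start, block=16):
    """Single-byte NOPs needed to move start up to the next block boundary."""
    return -start % block


if __name__ == "__main__":
    loop_start, loop_len = 0x100A, 14
    print(blocks_touched(loop_start, loop_len))     # 2
    pad = nops_to_align(loop_start)                 # 6
    print(blocks_touched(loop_start + pad, loop_len))   # 1
```

Here a 14-byte loop starting at 0x100A straddles two blocks, while the same loop moved up by six NOPs fits in one.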

__
wolfgang
 




 2 Posts in Topic:
Re: Code performance, rdtsc and code alignment
James Harris <james.ha  2008-04-26 15:00:44 
Re: Code performance, rdtsc and code alignment
"Wolfgang Kern"  2008-04-27 10:50:26 
