Re: Code performance, rdtsc and code alignment

by James Harris <james.harris.1@[EMAIL PROTECTED]> Apr 26, 2008 at 03:00 PM

On 26 Apr, 13:17, "Wolfgang Kern" <nowh...@[EMAIL PROTECTED]> wrote:
> James Harris wrote:
> > Oddly, no replies as yet. Will see if others are interested. The test
> > below was on an Intel chip. Testing on an AMD CPU shows similar (but
> > not as marked) effect.
>
> :) I also waited for some answers ...
>
> The align will affect the timing of all instructions after the
> first RDTSC.  My time measurement works on a dedicated test area
> filled with nops, aligned to cache-bounds and includes debugger
> overhead, so I can step/trace my code under test and see the
> reported TSC-difference.
>
> But this is just useful for comparing algos or finding obvious
> dependencies early; it won't tell anything about code duration when
> this test code becomes part of a greater scenario somewhere else ...
>
> The different figures you get without align may come from
> cache-bound-crossing of your test code, and if I look in more detail:
> +71,+91,+112,+126  could this be burst-read penalties caused by
> the 2nd serialising?
>
> btw: I don't have a CPUID before the second RDTSC
> so I get an almost constant 14-cycle count for an empty test.

If I remove the second cpuid I seem to get a consistent 32 cycles (on
a Pentium 3). As I understand it, though, the second serialisation is
needed to ensure any previous instructions (which there would be if
this was testing a real code sequence) have completed.
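
An empty-test figure like this is usually used as a baseline to subtract from later readings. A minimal sketch of that bookkeeping in Python follows. It assumes the baseline was measured with the same harness (same serialisation, same alignment) as the readings; the function name and the sample readings are invented for illustration, and only the 32-cycle figure comes from the test above.

```python
# Illustrative sketch: remove the fixed cost of the timing harness itself.
# EMPTY_TEST_CYCLES is the Pentium 3 figure quoted in this post for an empty
# test without the second cpuid; the sample readings below are made up.

EMPTY_TEST_CYCLES = 32


def net_cycles(raw_readings, overhead=EMPTY_TEST_CYCLES):
    """Return the best-case cost of the code under test, in cycles.

    The minimum reading is used on the assumption that interrupts, cache
    misses and similar disturbances add cycles to a run rather than
    remove them.
    """
    if not raw_readings:
        raise ValueError("need at least one reading")
    best = min(raw_readings)
    # Clamp at zero: a reading below the baseline suggests the baseline is
    # off, not that the code ran in negative time.
    return max(best - overhead, 0)


hypothetical_readings = [47, 45, 61, 45, 112]  # invented raw TSC differences
print(net_cycles(hypothetical_readings))  # 45 - 32 = 13
```

If the harness changes (for example, the second cpuid is put back), the baseline has to be re-measured and passed in as `overhead`.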

I've tried various offsets both before the three tests and within but
cannot see a consistent pattern. The CPU reports the following
instruction cache characteristics.

08 1st-level instruction cache: 16 KB, 4-way set associative, 32-byte line size
01 Instruction TLB: 4 KB Pages, 4-way set associative, 32 entries
02 Instruction TLB: 4 MB Pages, fully associative, 2 entries

This isn't a problem as it stands unless the symptoms persist within a
loop. Something to watch out for, though, for tight code!
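
For anyone wanting to check whether a given test block straddles a line boundary, the instruction cache figures reported above work out as follows. This is only a sketch in Python: the helper names and the example addresses are hypothetical, and it models nothing beyond the 16 KB / 4-way / 32-byte geometry.

```python
# Geometry taken from the reported descriptor 08:
# 16 KB, 4-way set associative, 32-byte lines.
CACHE_BYTES = 16 * 1024
WAYS = 4
LINE_BYTES = 32
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 16384 / 128 = 128 sets


def set_index(addr):
    """Cache set that the byte at addr maps to (address bits 5..11)."""
    return (addr // LINE_BYTES) % NUM_SETS


def lines_touched(addr, length):
    """Number of 32-byte lines covered by length bytes starting at addr."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = addr // LINE_BYTES
    last = (addr + length - 1) // LINE_BYTES
    return last - first + 1


def crosses_line(addr, length):
    """True if the block spans more than one cache line."""
    return lines_touched(addr, length) > 1


# Hypothetical addresses: a 6-byte sequence placed 2 bytes before a line
# boundary, and the same sequence aligned to the boundary.
print(NUM_SETS)                   # 128
print(lines_touched(0x401E, 6))   # 2
print(lines_touched(0x4020, 6))   # 1
```

With 128 sets of 32 bytes, the set index and line offset together use the low 12 address bits, so code addresses that are a multiple of 4 KB apart map to the same set.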
