On 26 Apr, 13:17, "Wolfgang Kern" <nowh...@[EMAIL PROTECTED]> wrote:
> James Harris wrote:
> > Oddly, no replies as yet. Will see if others are interested. The test
> > below was on an Intel chip. Testing on an AMD CPU shows similar (but
> > not as marked) effect.
>
> :) I also waited for some answers ...
>
> The align will affect the timing of all instructions after the
> first RDTSC. My time measurement works on a dedicated test area
> filled with nops, aligned to cache-bounds and includes debugger
> overhead, so I can step/trace my code under test and see the
> reported TSC-difference.
>
> But this is only useful for comparing algos or finding obvious
> dependencies early; it won't tell anything about code duration when
> this test code becomes part of a greater scenario somewhere else ...
>
> The different figures you get without align may come from cache-
> bound-crossing of your test code and if I look in more detail:
> +71,+91,+112,+126 could this be burst-read penalties caused by
> the 2nd serialising ?
>
> btw: I don't have a CPUID before the second RDTSC
> so I get an almost constant 14-cycle count for an empty test.
If I remove the second cpuid I seem to get a consistent 32 cycles (on
a Pentium 3). As I understand it, though, the second serialisation is
needed to ensure any previous instructions (which there would be if
this was testing a real code sequence) have completed.
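
The measurement pattern being discussed (CPUID, RDTSC, code under test, CPUID, RDTSC) can be sketched roughly as below. This is only an illustration, assuming GCC or Clang inline assembly on an x86 CPU; the helper names `serialise`, `read_tsc` and `empty_test_overhead` are made up for the example and it is not the code that produced the figures quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID is a serialising instruction: earlier instructions must
 * complete before execution continues past it. */
static inline void serialise(void)
{
    unsigned eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Read the 64-bit time-stamp counter (RDTSC returns it in EDX:EAX). */
static inline uint64_t read_tsc(void)
{
    unsigned lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest TSC difference seen over `runs` attempts with an empty
 * test region, i.e. the overhead of the measurement itself. */
uint64_t empty_test_overhead(int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        serialise();                 /* first serialisation */
        uint64_t start = read_tsc();
        /* code under test would go here */
        serialise();                 /* second serialisation */
        uint64_t stop = read_tsc();
        if (stop - start < best)
            best = stop - start;
    }
    return best;
}
```

Taking the minimum over many runs is a choice made for this sketch: it filters out interrupts and cache misses, so it hides rather than explains the run-to-run variation described above. The second `serialise()` call is the one whose removal changes the empty-test figure.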
I've tried various offsets both before the three tests and within but
cannot see a consistent pattern. The CPU reports the following
instruction cache and TLB characteristics:
  08  1st-level instruction cache: 16 KB, 4-way set associative, 32-byte line size
  01  Instruction TLB: 4 KB pages, 4-way set associative, 32 entries
  02  Instruction TLB: 4 MB pages, fully associative, 2 entries
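
To make the cache figures concrete, here is a small sketch of the arithmetic they imply: 16 KB / (4 ways x 32 bytes) gives 128 sets, and a code sequence touches more than one line whenever it straddles a 32-byte boundary. The constants come from the descriptor list above; the function names are hypothetical, and the sketch assumes the usual address-modulo set indexing rather than anything confirmed for this particular CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures from the reported descriptor for the L1 instruction cache. */
enum {
    L1I_SIZE = 16 * 1024,  /* total size in bytes */
    L1I_WAYS = 4,          /* associativity */
    L1I_LINE = 32          /* line size in bytes */
};

/* Number of sets: size / (ways * line size). */
unsigned l1i_sets(void)
{
    return L1I_SIZE / (L1I_WAYS * L1I_LINE);
}

/* Set that an address maps to, assuming simple modulo indexing. */
unsigned l1i_set_index(uintptr_t addr)
{
    return (unsigned)((addr / L1I_LINE) % l1i_sets());
}

/* How many cache lines a code region of `len` bytes starting at
 * `start` occupies; a result above 1 means it crosses a boundary. */
unsigned lines_touched(uintptr_t start, size_t len)
{
    assert(len > 0);
    uintptr_t first = start / L1I_LINE;
    uintptr_t last = (start + len - 1) / L1I_LINE;
    return (unsigned)(last - first + 1);
}
```

For example, a 4-byte sequence starting at offset 0x1E within a line spills into the next line, whereas the same sequence at offset 0x10 stays within one. That is the kind of difference alignment would remove, though it may not be the whole explanation for the timings seen here.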
This isn't a problem as it stands unless the symptoms persist within a
loop. Something to watch out for, though, for tight code!