On 17 Mar, 13:16, Stargazer <spamt...@[EMAIL PROTECTED]> wrote:
....
> > > A little more testing showed that almost all cycles (2300+) were
> > > spent at access to a global variable (via ds:[] addressing).
....
> It appears
> that the "inc dword [_running_irq]" accounts for all the mess.
>
> BTW, now I notice that there is not only a read through ds:[] (bt), but
> also two writes to ds:[] following the rdtsc that I use to store the
> time stamp for further comparison, and neither causes any anomaly. So
> it just strengthens my suspicion that an RMW instruction has some
> strange effect on more distant caches. Does anybody have an idea?
--- <code snipped> ---
I'm a bit confused as to how the measurements are taken. Why not
measure with the following, which is based on
http://cs.smu.ca/~jamuir/rdtscpm1.pdf
; Calibration: three back-to-back passes with nothing in between
cpuid
rdtsc
mov [subtime], eax
cpuid
rdtsc
sub eax, [subtime]
mov [subtime], eax
cpuid
rdtsc
mov [subtime], eax
cpuid
rdtsc
sub eax, [subtime]
mov [subtime], eax
cpuid
rdtsc
mov [subtime], eax
cpuid
rdtsc
sub eax, [subtime]
mov [subtime], eax ; Only the last value of subtime is kept
; subtime should now represent the overhead cost of the
; MOV and CPUID instructions
....other instructions...
; Test 1: the single inc instruction
cpuid ; Serialize execution
rdtsc ; Read time stamp to EAX
mov [time_1], eax ; Start time for this instruction
inc dword [_running_irq] ; Taken from your code
cpuid ; Serialize again for time-stamp read
rdtsc
sub eax, [time_1] ; Find the difference
mov [time_1], eax
....other instructions...
; Now time_1 minus subtime should give the length of test 1
(As you know, cpuid clobbers eax, ebx, ecx and edx each time it runs,
so push/pop them around the measurement if you need their values.)
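If it is easier to try this from C first, the same idea can be sketched
roughly as below. This is only a sketch under assumptions: it uses
GCC/Clang inline asm on x86, the names serialized_tsc, measure_overhead
and measure_inc are mine, and running_irq is just a stand-in for your
_running_irq. Timings taken in user mode will not match what you see
inside an IRQ handler, so treat it as a way to check the method, not the
numbers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the _running_irq global in the original code. */
static volatile uint32_t running_irq;

/* CPUID to serialize, then RDTSC: the same pairing as in the listing.
   EAX is zeroed first so CPUID always runs the same leaf. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Three back-to-back passes; only the last delta is kept (subtime). */
static uint64_t measure_overhead(void)
{
    uint64_t delta = 0;
    for (int i = 0; i < 3; i++) {
        uint64_t t0 = serialized_tsc();
        uint64_t t1 = serialized_tsc();
        delta = t1 - t0;
    }
    return delta;
}

/* One timed read-modify-write of the global (time_1 in the listing).
   The compiler chooses the actual instruction for the increment. */
static uint64_t measure_inc(void)
{
    uint64_t t0 = serialized_tsc();
    running_irq++;
    uint64_t t1 = serialized_tsc();
    return t1 - t0;
}
```

Subtracting measure_overhead() from measure_inc() then gives the
estimate for the increment alone, in the same way as time_1 minus
subtime.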
If the time is still of the order of 2000 cycles, maybe try splitting
your inc instruction into
mov esi, [_running_irq]
inc esi
mov [_running_irq], esi
where esi is used because eax will be trashed by cpuid and rdtsc. If it
is still as long (unlikely, for the reasons you mentioned) then you
could split the measurement points.
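For comparing the two forms side by side outside the handler, something
like the following might do. Again this is a sketch with assumptions:
GCC/Clang inline asm (AT&T syntax) on x86, function names of my own
choosing, and running_irq standing in for _running_irq. The compiler
picks the scratch register instead of esi.

```c
#include <stdint.h>

/* Hypothetical stand-in for the _running_irq global. */
static volatile uint32_t running_irq;

/* CPUID (leaf 0) to serialize, followed by RDTSC. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Single read-modify-write instruction, as in the original code. */
static uint64_t time_rmw_inc(void)
{
    uint64_t t0 = serialized_tsc();
    __asm__ __volatile__("incl %0" : "+m"(running_irq) : : "cc");
    uint64_t t1 = serialized_tsc();
    return t1 - t0;
}

/* The same update split into load, register increment and store. */
static uint64_t time_split_inc(void)
{
    uint32_t tmp;
    uint64_t t0 = serialized_tsc();
    __asm__ __volatile__(
        "movl %1, %0\n\t"
        "incl %0\n\t"
        "movl %0, %1"
        : "=&r"(tmp), "+m"(running_irq)
        :
        : "cc");
    uint64_t t1 = serialized_tsc();
    (void)tmp;
    return t1 - t0;
}
```

Both results still include the cpuid/rdtsc overhead, so compare them
with each other or subtract a calibrated overhead first.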
--
James

