On Oct 1, 3:58 pm, "Dmitriy V'jukov" <dvyu...@[EMAIL PROTECTED]> wrote:
> On Oct 1, 3:35 pm, Alexander Chemeris <alexander.cheme...@[EMAIL PROTECTED]> wrote:
>
> > > I run a simple benchmark for around 10 seconds. Every thread counts
> > > the number of executed operations in a thread-local counter. After 10
> > > seconds I stop all threads and sum all the thread-local counters.
> > > Then I divide the execution time, measured in cycles, by the total
> > > number of operations. I measure execution time with the rdtsc
> > > instruction. The results are usually very stable, i.e. the deviation
> > > is no more than a few cycles.
>
> > How do you get rid of task switches and the time spent in other
> > threads? Some people just assume that if a task switch occurred, the
> > time spent inside a calculation cycle deviates a lot from the mean,
> > and simply throw that timing away.
> > Do you use the same technique?
>
> I don't get rid of context switches or external load. Since I measure
> around 10^9 operations, I don't think they have any significant effect.
> The machine is kept as free of external load as possible while the tests
> run. I think a bigger error would be introduced by measuring the time
> of every single operation.
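A minimal C sketch of the aggregate method described in the quoted text, for illustration only. It assumes POSIX threads and the GCC/Clang `__rdtsc()` intrinsic from `<x86intrin.h>` on x86-64, with a `CLOCK_MONOTONIC` nanosecond fallback elsewhere. `operation_under_test()` and `ticks_per_op()` are hypothetical names, and the operation itself (a relaxed atomic increment) is only a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* Raw time-stamp counter read, as in the benchmark described above. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: nanoseconds, not cycles. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

static atomic_int stop_flag;       /* set once by the controlling thread */
static atomic_long shared_counter; /* state touched by the placeholder op */

/* Hypothetical operation under test: one relaxed atomic increment. */
static void operation_under_test(void) {
    atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
}

static void *worker(void *arg) {
    uint64_t local_ops = 0; /* thread-local counter, not shared while running */
    while (!atomic_load_explicit(&stop_flag, memory_order_relaxed)) {
        operation_under_test();
        local_ops++;
    }
    *(uint64_t *)arg = local_ops; /* published once, after the loop */
    return NULL;
}

/* Runs nthreads workers for about ms milliseconds and returns elapsed ticks
   divided by the total number of operations completed by all threads.
   Thread start-up is inside the timed interval; over a long run it is
   negligible. */
double ticks_per_op(int nthreads, long ms) {
    assert(nthreads > 0 && ms > 0);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    uint64_t *counts = calloc((size_t)nthreads, sizeof *counts);
    if (!tids || !counts) abort();
    atomic_store(&stop_flag, 0);

    uint64_t start = read_ticks();
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &counts[i]) != 0) abort();

    struct timespec d = { .tv_sec = ms / 1000,
                          .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&d, NULL);
    atomic_store(&stop_flag, 1);
    uint64_t elapsed = read_ticks() - start;

    uint64_t total = 0; /* sum of all thread-local counters */
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        total += counts[i];
    }
    free(tids);
    free(counts);
    return total ? (double)elapsed / (double)total : 0.0;
}
```

Calling `ticks_per_op(4, 10000)` would roughly correspond to the ten-second run described above. Because elapsed wall-clock ticks are divided by the operations of all threads together, the figure reflects aggregate throughput rather than the latency of one operation on one thread.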
I think using rdtsc won't introduce much distortion, even when timing
small pieces of code. You can look at ffmpeg's timing facilities: the
FFmpeg developers really do care about performance and have developed
a good set of C macros for performance measurement.
But with current long-pipeline CPUs there is a problem: you can't
actually say that "this operation takes exactly N cycles". You have to
clarify whether you mean the number of cycles from the start of the
first instruction to the start of the last instruction, or from the
start of the first instruction to the end of the last instruction. If I
recall correctly, using rdtsc to measure every operation will give you
the latter, while measuring the overall time will give you the former
plus loop overhead plus context-switch overhead. The context-switch
overhead is probably negligible because of the nature of the
experiment, but you can't get rid of the loop overhead (though it
should be just a few cycles).
So, to make a long story short: it would be interesting if you timed
on a per-operation basis and compared those results with your
previous ones.
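A per-operation measurement of the kind suggested above could be sketched in C as follows, under stated assumptions: x86-64 with the GCC/Clang intrinsics `__rdtsc`, `__rdtscp` and `_mm_lfence`, and a `CLOCK_MONOTONIC` fallback elsewhere. Fencing the counter reads is a common way to keep out-of-order execution from moving the timed instructions across them. Subtracting the median cost of an empty timed region removes the measurement overhead, and using the median instead of the mean discards samples hit by an interrupt or a task switch, similar in spirit to throwing away outlying timings. `operation_under_test()`, `median_u64()` and `ticks_per_single_op()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* lfence on both sides: earlier work finishes before the read,
   later work does not start before it. */
static inline uint64_t ticks_begin(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
/* rdtscp waits for preceding instructions; lfence stops later ones
   from starting early. */
static inline uint64_t ticks_end(void) {
    unsigned int aux;
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
#else
/* Assumed non-x86 fallback: nanoseconds, far coarser than the TSC. */
static inline uint64_t ticks_begin(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
static inline uint64_t ticks_end(void) { return ticks_begin(); }
#endif

static volatile uint64_t sink;

/* Hypothetical operation under test: one LCG step on a volatile. */
static void operation_under_test(void) {
    sink = sink * 6364136223846793005ULL + 1442695040888963407ULL;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place and returns its middle element. */
static uint64_t median_u64(uint64_t *v, size_t n) {
    qsort(v, n, sizeof *v, cmp_u64);
    return v[n / 2];
}

/* Median ticks for a single call, minus the median empty-region cost. */
uint64_t ticks_per_single_op(size_t samples) {
    assert(samples > 0);
    uint64_t *empty = malloc(samples * sizeof *empty);
    uint64_t *timed = malloc(samples * sizeof *timed);
    if (!empty || !timed) abort();
    for (size_t i = 0; i < samples; i++) {
        uint64_t t0 = ticks_begin();
        uint64_t t1 = ticks_end();
        empty[i] = t1 - t0;
    }
    for (size_t i = 0; i < samples; i++) {
        uint64_t t0 = ticks_begin();
        operation_under_test();
        uint64_t t1 = ticks_end();
        timed[i] = t1 - t0;
    }
    uint64_t overhead = median_u64(empty, samples);
    uint64_t raw = median_u64(timed, samples);
    free(empty);
    free(timed);
    return raw > overhead ? raw - overhead : 0;
}
```

The two figures need not agree: a single fenced operation cannot overlap with its neighbours the way operations in a tight loop can, which is exactly the start-to-start versus start-to-end distinction discussed above.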
--
Regards,
Alexander Chemeris.
SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000

