On Apr 19, 12:41 pm, "Chris Thomasson" <spamt...@[EMAIL PROTECTED]> wrote:
> "Alexei A. Frounze" <spamt...@[EMAIL PROTECTED]> wrote in message
> news:45fb62be-cae7-492a-81b1-4ca933227a45@[EMAIL PROTECTED]
>
> > On Apr 18, 5:30 pm, "Chris Thomasson" <spamt...@[EMAIL PROTECTED]> wrote:
> >> Before I convert this code into AT&T syntax for GAS to assemble, and to
> >> MASM, I was wondering if there are any possible optimizations I can
> >> perform on the following code. I will show the C header first:
>
> > Are you sure your performance problem is here and not elsewhere?
>
> I just wanted to know if I could gain some optimizations in the assembly
> itself before I translate it into AT&T syntax...
>
> > If you have serious resource contention in your application, squeezing
> > a few cycles out of the lock/queue primitives will be of little help.
> > If you profile your application and see that it spends too much time
> > in these primitives compared to the rest of the useful work it's
> > doing, then you should think of ways to restructure the application,
> > the components and their interaction to minimize this.
>
> Great advice. I am always thinking of ways to reduce contention. However, I
> need to use non-blocking operations in certain scenarios, one of those being
> signal handlers. It helps when your synchronization operations are
> reentrant and async-signal-safe. Also, I find that non-blocking
> algorithms tend to scale better than their lock-based equivalents. One
> example would be a wait-free single-producer/single-consumer unbounded FIFO
> data structure. This can be implemented without using any interlocked RMW
> operations or explicit memory barriers on the x86. A sample implementation
> can be found here:
>
> http://appcore.home.comcast.net
>
> I find that a clever marriage between careful lock-based and non-blocking
> programming techniques can usually yield optimal results.
I don't know if you got what I was trying to say. If contention
(for a lock, for the bus, for cache lines) is the real bottleneck, you
should rethink the design. Maybe simply processing data in bigger
chunks (and so competing for the shared resource less often) would be enough.
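
To illustrate the "bigger chunks" point, here is a minimal sketch in C. Everything in it is an assumption made up for the example, not code from this thread: the `shared_queue` type, the `push_one`/`push_batch` names, the fixed capacity, and the C11 `atomic_flag` spinlock standing in for whatever lock the real application uses. The `lock_acquisitions` counter exists only to make the difference visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_CAP 1024

/* Hypothetical shared queue guarded by a simple spinlock (illustration only). */
struct shared_queue {
    atomic_flag lock;                /* stand-in for the application's real lock */
    int items[QUEUE_CAP];
    size_t count;
    unsigned long lock_acquisitions; /* instrumentation for this example only */
};

static void queue_init(struct shared_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->count = 0;
    q->lock_acquisitions = 0;
}

static void queue_lock(struct shared_queue *q)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
    q->lock_acquisitions++;
}

static void queue_unlock(struct shared_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Fine-grained: one lock round trip (and one contended cache line) per item. */
static void push_one(struct shared_queue *q, int item)
{
    queue_lock(q);
    assert(q->count < QUEUE_CAP);
    q->items[q->count++] = item;
    queue_unlock(q);
}

/* Coarse-grained: one lock round trip for the whole batch of n items. */
static void push_batch(struct shared_queue *q, const int *items, size_t n)
{
    queue_lock(q);
    assert(q->count + n <= QUEUE_CAP);
    for (size_t i = 0; i < n; i++)
        q->items[q->count++] = items[i];
    queue_unlock(q);
}
```

Pushing 64 items through `push_one` takes the lock 64 times, while `push_batch` takes it once for the same data. The trade-off is that the lock is held longer per acquisition, so the batch size would need tuning against latency in a real application.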
Alex

