Talk About Network



Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Assembly x86 > Re: MMX speedup...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 1 Topic 4644 of 4646
Post > Topic >>

Re: MMX speedup for Floyd Steinberg error diffusion

by "Novosad, Stephan R." <spamtrap@[EMAIL PROTECTED] > May 8, 2008 at 11:16 AM

This is a multi-part message in MIME format.

------_=_NextPart_001_01C8B137.A53B5190
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Terje Mathisen wrote:

> rep_movsd wrote:
> > The thing is Floyd-Steinberg dithering works serially one pixel at a
> > time... Each pixel processed affects the pixel to its right and the
> > pixels on the next scanline.
>
> Interesting algorithm, very serially dependent. :-(
>
> How does it handle the corner/edge cases?

   It ignores them.

> I.e. the last pixel on a scan line has no down-right neighbor, the =
first
> pixel has no down-left and the last scanline has nothing below at all!

   The errors are just thrown away.

> > So the best I can hope for is to handle one pixels R, G and B byte
> > values in one go.
>
> > Let me clarify with actual code ( highly unoptimal C++ , for =
clarities
> > sake )
>
> It seems to me like the getNearestPalColor() function could easily
> dominate the processing time here!

   It does in my implementation.  Simple exhaustive search through
up to 256 palette colors.  Not too bad with 16.

> I guess a 3D space partition would be a good way to go, using an =
initial
> lookup table based on something like 3:3:3 bits to start in the proper
> subvolume, or simply use that table to point to lists of possible
> palColor candidates for each starting point.
>
> (Due to truncation effects, several pal entries might belong to =
multiple
> lists, depending upon the value of the rounded/truncated RGB bits, but
> with a good set of palette colors, the number of values to test =
against
> should be (on average) pretty low.

   The palette colors can be quite constrained in some pictures.
And occupy almost the entire RGB space in others.  Maybe some
sort of binary partition scheme?

> Anyway, to speed up the dithering stage I would use MMX/SSE and work =
on
> multiple lines in parallel, keeping the intermediate results in
> registers until we're finished with them.

   For all of the multiple lines, except the first, you would have to
process an extra line to provide the error values for the first
plotted.

Steve N.

------_=_NextPart_001_01C8B137.A53B5190
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Diso-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.5.7653.19">
<TITLE>Re: MMX speedup for Floyd Steinberg error diffusion</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

<P><FONT SIZE=3D2>Terje Mathisen wrote:<BR>
<BR>
&gt; rep_movsd wrote:<BR>
&gt; &gt; The thing is Floyd-Steinberg dithering works serially one =
pixel at a<BR>
&gt; &gt; time... Each pixel processed affects the pixel to its right =
and the<BR>
&gt; &gt; pixels on the next scanline.<BR>
&gt;<BR>
&gt; Interesting algorithm, very serially dependent. :-(<BR>
&gt;<BR>
&gt; How does it handle the corner/edge cases?<BR>
<BR>
&nbsp;&nbsp; It ignores them.<BR>
<BR>
&gt; I.e. the last pixel on a scan line has no down-right neighbor, the =
first<BR>
&gt; pixel has no down-left and the last scanline has nothing below at =
all!<BR>
<BR>
&nbsp;&nbsp; The errors are just thrown away.<BR>
<BR>
&gt; &gt; So the best I can hope for is to handle one pixels R, G and B =
byte<BR>
&gt; &gt; values in one go.<BR>
&gt;<BR>
&gt; &gt; Let me clarify with actual code ( highly unoptimal C++ , for =
clarities<BR>
&gt; &gt; sake )<BR>
&gt;<BR>
&gt; It seems to me like the getNearestPalColor() function could =
easily<BR>
&gt; dominate the processing time here!<BR>
<BR>
&nbsp;&nbsp; It does in my implementation.&nbsp; Simple exhaustive =
search through<BR>
up to 256 palette colors.&nbsp; Not too bad with 16.<BR>
<BR>
&gt; I guess a 3D space partition would be a good way to go, using an =
initial<BR>
&gt; lookup table based on something like 3:3:3 bits to start in the =
proper<BR>
&gt; subvolume, or simply use that table to point to lists of =
possible<BR>
&gt; palColor candidates for each starting point.<BR>
&gt;<BR>
&gt; (Due to truncation effects, several pal entries might belong to =
multiple<BR>
&gt; lists, depending upon the value of the rounded/truncated RGB bits, =
but<BR>
&gt; with a good set of palette colors, the number of values to test =
against<BR>
&gt; should be (on average) pretty low.<BR>
<BR>
&nbsp;&nbsp; The palette colors can be quite constrained in some =
pictures.<BR>
And occupy almost the entire RGB space in others.&nbsp; Maybe some<BR>
sort of binary partition scheme?<BR>
<BR>
&gt; Anyway, to speed up the dithering stage I would use MMX/SSE and =
work on<BR>
&gt; multiple lines in parallel, keeping the intermediate results in<BR>
&gt; registers until we're finished with them.<BR>
<BR>
&nbsp;&nbsp; For all of the multiple lines, except the first, you would =
have to<BR>
process an extra line to provide the error values for the first<BR>
plotted.<BR>
<BR>
Steve N.</FONT>
</P>

</BODY>
</HTML>
------_=_NextPart_001_01C8B137.A53B5190--




 1 Posts in Topic:
Re: MMX speedup for Floyd Steinberg error diffusion
"Novosad, Stephan R.  2008-05-08 11:16:33 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Wed May 14 3:22:56 CDT 2008.