On May 7, 11:35 pm, "Maarten Kronenburg" <spamt...@[EMAIL PROTECTED]> wrote:
> "Maarten Kronenburg" wrote in message
>
> Now I see the data is in bytes. In that case it seems better to put 16 bytes
> in a 128-bit xmm register, then put 8 bytes each time into 8 16-bit words
> by shifting and anding, and do the above with PSLLW/PSRLW and PADDW/PSUBW in
> the 16-bit words. Then the scaling mentioned is not needed because the upper
> 8 bits in the 16-bit words should be zero.
> Maarten.
Thanks...
The thing is, Floyd-Steinberg dithering works serially, one pixel at a
time. Each pixel processed affects the pixel to its right and the
pixels on the next scanline.
So the best I can hope for is to handle one pixel's R, G and B byte
values in one go.
Let me clarify with actual code (highly unoptimized C++, for clarity's
sake):
/////////////////////////////////////////////////
struct RGBA { unsigned char r, g, b, a; };

// Add b to a and clamp the result to the 0..255 range of a byte.
int saturateAdd(int a, int b)
{
    int ret = a + b;
    if (ret < 0) return 0;
    if (ret > 255) return 255;
    return ret;
}

// getNearestPalColor() is assumed to be defined elsewhere.
void diffuse(RGBA* pImg, int w, int h)
{
    for (int y = 0; y < h; ++y)
    {
        RGBA* pPix = pImg + (y * w);
        for (int x = 0; x < w; ++x, ++pPix)
        {
            RGBA bestMatch = getNearestPalColor(pPix);
            // Signed quantization error per channel.
            int rDiff = pPix->r - bestMatch.r;
            int gDiff = pPix->g - bestMatch.g;
            int bDiff = pPix->b - bestMatch.b;
            // The right neighbour gets 7/16 of the error (>> 4 divides by 16).
            if (x + 1 < w)
            {
                RGBA* pNext = pPix + 1;
                pNext->r = saturateAdd(pNext->r, (rDiff * 7) >> 4);
                pNext->g = saturateAdd(pNext->g, (gDiff * 7) >> 4);
                pNext->b = saturateAdd(pNext->b, (bDiff * 7) >> 4);
            }
            // Repeat the three lines above for the pixels below, below left
            // and below right with coefficients 5, 3 and 1.
        }
    }
}
///////////////////////////////////////////////
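For reference, here is a rough scalar sketch of the whole loop with all
four neighbours and the image-edge checks filled in. The names
nearestPalColorSketch, spread and diffuseSketch are made up for this
example, and the palette lookup is only a stand-in (each channel snaps
to 0 or 255), not a real palette search. It divides by 16 because the
Floyd-Steinberg weights are 7/16, 3/16, 5/16 and 1/16.

```cpp
struct RGBA { unsigned char r, g, b, a; };

// Clamp a + b into the 0..255 range of an 8-bit channel.
static int saturateAdd(int a, int b)
{
    int ret = a + b;
    if (ret < 0) return 0;
    if (ret > 255) return 255;
    return ret;
}

// Hypothetical stand-in for the palette search: an 8-colour palette
// where every channel is either 0 or 255.
static RGBA nearestPalColorSketch(const RGBA& p)
{
    RGBA q;
    q.r = p.r < 128 ? 0 : 255;
    q.g = p.g < 128 ? 0 : 255;
    q.b = p.b < 128 ? 0 : 255;
    q.a = p.a;
    return q;
}

// Add (diff * coeff) / 16 to one neighbouring pixel, channel by channel.
static void spread(RGBA& dst, int rDiff, int gDiff, int bDiff, int coeff)
{
    dst.r = static_cast<unsigned char>(saturateAdd(dst.r, (rDiff * coeff) / 16));
    dst.g = static_cast<unsigned char>(saturateAdd(dst.g, (gDiff * coeff) / 16));
    dst.b = static_cast<unsigned char>(saturateAdd(dst.b, (bDiff * coeff) / 16));
}

void diffuseSketch(RGBA* pImg, int w, int h)
{
    for (int y = 0; y < h; ++y)
    {
        for (int x = 0; x < w; ++x)
        {
            RGBA& pix = pImg[y * w + x];
            RGBA best = nearestPalColorSketch(pix);
            int rDiff = pix.r - best.r;
            int gDiff = pix.g - best.g;
            int bDiff = pix.b - best.b;
            pix = best;  // store the quantized colour

            // Push the error to the neighbours that exist.
            if (x + 1 < w)
                spread(pImg[y * w + x + 1], rDiff, gDiff, bDiff, 7);
            if (y + 1 < h)
            {
                if (x > 0)
                    spread(pImg[(y + 1) * w + x - 1], rDiff, gDiff, bDiff, 3);
                spread(pImg[(y + 1) * w + x], rDiff, gDiff, bDiff, 5);
                if (x + 1 < w)
                    spread(pImg[(y + 1) * w + x + 1], rDiff, gDiff, bDiff, 1);
            }
        }
    }
}
```

Calling diffuseSketch(img, w, h) dithers the image in place. Note that
the integer division truncates toward zero, so negative errors round
slightly differently than an arithmetic right shift would.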
Since shifts and multiplies aren't available for 8-bit operands, here's
what I'm thinking:
The pixel bytes, let's say, are RGBA.
I do a PUNPCKLBW, getting bytes XRXGXBXA in an MMX register.
The X's are unwanted values.
Then I do a PAND with 0x00FF00FF00FF00FF, getting rid of the X's.
I repeat the same process for the new palette pixel.
Then I can do a PSUBW to get 4 signed differences.
Then PMULLW with the coefficient value, like 0x0007000700070007,
and PSRAW to shift.
Now I have 4 signed WORDs in some MMX register, which are the scaled
signed differences between the original and palettized pixel colors.
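The steps so far could be sketched with compiler intrinsics roughly as
follows. This is only an illustration under a few assumptions: it uses
the SSE2 intrinsics from <emmintrin.h> on the low half of an xmm
register instead of real MMX registers, the helper names widenPixel and
scaledDiffSketch are made up, and the shift count of 4 assumes the
usual divide-by-16. Unpacking against a zeroed register leaves the high
byte of every word at zero, so the separate PAND is not needed here.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Widen one 4-byte RGBA pixel to four 16-bit words (r, g, b, a in the
// four lowest word lanes). Interleaving with zero clears the high bytes.
static __m128i widenPixel(const unsigned char* p)
{
    int packed;
    std::memcpy(&packed, p, 4);
    return _mm_unpacklo_epi8(_mm_cvtsi32_si128(packed),   // PUNPCKLBW
                             _mm_setzero_si128());
}

// Writes ((orig - pal) * coeff) >> 4 for each channel into out[0..3].
void scaledDiffSketch(const unsigned char* orig, const unsigned char* pal,
                      int coeff, std::int16_t out[4])
{
    __m128i diff = _mm_sub_epi16(widenPixel(orig), widenPixel(pal));  // PSUBW
    __m128i scaled = _mm_mullo_epi16(
        diff, _mm_set1_epi16(static_cast<short>(coeff)));             // PMULLW
    scaled = _mm_srai_epi16(scaled, 4);                               // PSRAW
    // Store only the low 64 bits, i.e. the four words we care about.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), scaled);
}
```

The differences lie between -255 and 255, so multiplying by a
coefficient of at most 7 cannot overflow a signed 16-bit word. PSRAW
rounds toward negative infinity, so a negative error comes out one
lower than a truncating division would give when it is not an exact
multiple of 16.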
Now how do I add these to the destination pixel with saturated
addition?
There are instructions for adding signed values with signed saturation
and unsigned values with unsigned saturation.
How do I add signed differences to unsigned values with unsigned
saturation?
Perhaps some sort of tricky bit manipulation can work?
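One possibility that may be worth trying: keep the destination pixel
unpacked as 16-bit words, add the signed differences with a plain
PADDW, and let PACKUSWB do the clamping, since it converts signed words
to unsigned bytes with unsigned saturation. A rough sketch with SSE2
intrinsics follows (xmm rather than MMX registers; the function name
addDeltaSaturated is made up for this example):

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Adds four signed 16-bit deltas to one RGBA pixel in place, clamping
// every channel to 0..255.
void addDeltaSaturated(unsigned char* dst, const std::int16_t delta[4])
{
    int packed;
    std::memcpy(&packed, dst, 4);

    const __m128i zero = _mm_setzero_si128();
    // Bytes 0..255 become words 0..255 (PUNPCKLBW against zero).
    __m128i wide = _mm_unpacklo_epi8(_mm_cvtsi32_si128(packed), zero);
    // Load the four signed deltas into the low 64 bits.
    __m128i d = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(delta));

    // Plain wrapping add is enough: 0..255 plus a small delta cannot
    // overflow a signed 16-bit word.
    __m128i sum = _mm_add_epi16(wide, d);             // PADDW
    // Signed words -> unsigned bytes: negatives become 0, values
    // above 255 become 255.
    __m128i bytes = _mm_packus_epi16(sum, zero);      // PACKUSWB

    packed = _mm_cvtsi128_si32(bytes);
    std::memcpy(dst, &packed, 4);
}
```

This relies on the deltas being small enough that the 16-bit sum does
not wrap, which holds for error terms in the -255..255 range. Passing a
delta of 0 for the alpha lane leaves alpha unchanged.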
Anyhow, even if I get this far and have to do the rest normally without
MMX, it should be much simpler code than the horror that my compiler
generates for the above C++ code.
Further ideas appreciated...
Vivek

