---[ PutPixel v2.0 ]----------------------------------[04/18/2008]---
by Timothy Trussell
---[ A new PutPixel ]------------------------------------------------
I found myself writing resolution-specific versions of my PutPixel
word, because the fastest code for calculating the offset into the
display screen is different for each resolution.
While writing code for plotting pixels in different screen
resolutions, it occurred to me that I did not have to fall back on a
general integer multiply, which works with all resolutions but is
slower. It should be possible to keep using the SHL opcode, which
performs an extremely fast multiplication by a power of two.
Now, the fastest code for any one resolution will always be code
written for that resolution alone, but I wanted a module that did
not need to be rewritten for each of the different resolutions that
I would be working with.
The code modification presented here adds a certain amount of CPU
overhead, though not a truly significant amount, since that overhead
is minimized by virtue of the code being in pure assembly, and not
mixed with higher-level Forth code.
For reference, the original coding of my PutPixel word, which plots
data to a double buffer, is:
code PutPixel ( &buf x y c -- )
\ c in bx on entry as TOS
bx push \ put all parameters onto the stack
bp sp xchg \ setup the stack frame
es push \ save registers to be used
di push
ds push
es pop
3 cells [bp] di mov \ buffer address to EDI
2 cells [bp] di add \ di=&buf[x]
\ Calculate Y*320
1 cells [bp] ax mov \ ax=y
6 # cl mov
ax cl shl \ ax=y*64
ax di add \ di=&buf[y*64+x]
2 # cl mov
ax cl shl \ ax=y*256
ax di add \ di=&buf[y*320+x]
0 cells [bp] ax mov \ al=c
es: al 0 [di] mov \ plot the pixel
di pop \ restore registers
es pop
bp sp xchg \ restore stack frame
4 cells # sp add \ drop parameters
bx pop \ get new TOS
end-code
no-expand
---[Note]------------------------------------------------------------
This code is written specifically for the 32Forth system, which is
the DOS DPMI version of Rick van Norman's OS2FORTH package, available
from the Taygeta Scientific Forth Archives, to be found at:
www.taygeta.com
along with the archives of my previous graphics columns. Again, many
thanks to Dr. Everett Carter for maintaining these archives.
--------------------------------------------------------[End Note]---
As coded, the above word will only work for video modes with an X
resolution of 320 pixels, such as 320x200 and 320x240.
The Y*320 calculation is performed by a pair of SHL (bit shift left)
operations:
1 cells [bp] ax mov \ move the Y parameter to EAX
6 # cl mov \ set for 1st shift of 6 bits
ax cl shl \ ax=y*64
ax di add \ di=y*64+x
2 # cl mov \ set for 2nd shift of 2 bits
ax cl shl \ ax=y*256
ax di add \ di=y*320+x
The first shift operation moves the value in EAX 6 bit positions to
the left, so if the value in EAX was a 1, after the shift the value
in EAX would now be 64.
---[Note]------------------------------------------------------------
The 32Forth system is a 32-bit Protected Mode version of Forth.
In CODE words, even though AX/BX/CX/DX/SI/DI are used, the full
32-bit version of these registers - EAX/EBX/ECX/EDX/ESI and EDI -
are what actually get compiled and implemented in the final code.
It is possible to specifically use the 16-bit registers, by putting
the OP: word before the register in the source code.
--------------------------------------------------------[End Note]---
This value is added to the X parameter which had been stored in the
EDI register, giving EDI a result of &buf+Y*64+X.
Next, we shift the value in EAX two more bit positions to the left,
giving us Y*256 in EAX, which we then add to EDI, giving us the
final result of &buf+Y*320+X in EDI.
Extending the basic concept of factoring and adding the shifts, we
get the following calculations:
X resolution of 320: (y*64)+(y*256)
X resolution of 640: (y*128)+(y*512)
X resolution of 800: (y*32)+(y*256)+(y*512)
X resolution of 1024: (y*1024)
X resolution of 1280: (y*256)+(y*1024)
X resolution of 1600: (y*64)+(y*512)+(y*1024)
The above calculations are for the resolutions
320x200 640x400 640x480 800x600 1024x768 1280x1024 1600x1200
This gives us high-level code equivalents of:
: *320 ( y -- y*320 ) 6 LSHIFT dup 2 LSHIFT + ;
: *640 ( y -- y*640 ) 7 LSHIFT dup 2 LSHIFT + ;
: *800 ( y -- y*800 ) 5 LSHIFT dup 3 LSHIFT dup 1 LSHIFT + + ;
: *1024 ( y -- y*1024 ) 10 LSHIFT ;
: *1280 ( y -- y*1280 ) 8 LSHIFT dup 2 LSHIFT + ;
: *1600 ( y -- y*1600 ) 6 LSHIFT dup 3 LSHIFT dup 1 LSHIFT + + ;
: PutPixel ( &buf x y c -- )
-rot \ &buf c x y
%ScreenWidth @
\ &buf c x y maxx
case
320 of *320 endof
640 of *640 endof
800 of *800 endof
1024 of *1024 endof
1280 of *1280 endof
1600 of *1600 endof
endcase \ &buf c x yofs
+ \ &buf c yofs+x
rot \ c yofs+x &buf
+ \ c &buf[yofs+x]
C! \ --
;
The above code will work correctly, but with a fairly large speed
penalty, because it is written in high-level Forth code.
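Before trusting the factored multipliers, it is worth checking them
against an ordinary multiply. The following is a minimal sketch in
standard Forth. The word Mismatches and the 0..1199 range of Y values
are assumptions made for this illustration, and only three of the six
multipliers are repeated to keep it short.

```forth
\ Shift-and-add multipliers, repeated from the text above
: *320  ( y -- y*320 )   6 LSHIFT dup 2 LSHIFT + ;
: *800  ( y -- y*800 )   5 LSHIFT dup 3 LSHIFT dup 1 LSHIFT + + ;
: *1600 ( y -- y*1600 )  6 LSHIFT dup 3 LSHIFT dup 1 LSHIFT + + ;

\ Hypothetical checker: count the Y values in 0..1199 for which
\ the word given by xt disagrees with a plain multiply by width
: Mismatches ( xt width -- n )
  0                              \ xt width n
  1200 0 do
    2 pick i swap execute        \ xt width n shifted
    2 pick i *                   \ xt width n shifted multiplied
    <> if 1+ then                \ bump n on a disagreement
  loop
  nip nip ;                      \ leave only the count

' *320   320 Mismatches .        \ prints 0
' *800   800 Mismatches .        \ prints 0
' *1600 1600 Mismatches .        \ prints 0
```

A count of zero for every width means the shift sequence and the
multiply agree over the whole range tested.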
---[Binary Notation]-------------------------------------------------
These values (32, 64, 128, 256 etc) are the values of the individual
bit positions in binary notation, that is to say, in Base 2, where
numbers are represented by sequences of 0's and 1's.
In binary:
0000 0000 0000 0001 =   1      0000 0001 0000 0000 =   256
0000 0000 0000 0010 =   2      0000 0010 0000 0000 =   512
0000 0000 0000 0100 =   4      0000 0100 0000 0000 =  1024
0000 0000 0000 1000 =   8      0000 1000 0000 0000 =  2048
0000 0000 0001 0000 =  16      0001 0000 0000 0000 =  4096
0000 0000 0010 0000 =  32      0010 0000 0000 0000 =  8192
0000 0000 0100 0000 =  64      0100 0000 0000 0000 = 16384
0000 0000 1000 0000 = 128      1000 0000 0000 0000 = 32768
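The table can be reproduced at the Forth prompt by printing numbers
in base 2. This is a small sketch in standard Forth, and the word
.bin is a name chosen for this illustration rather than anything
provided by 32Forth.

```forth
: .bin ( u -- )        \ print u in binary, leaving BASE unchanged
  base @ swap          \ save the current base underneath u
  2 base !  u.         \ switch to base 2 and print u
  base ! ;             \ restore the saved base

64   .bin              \ prints 1000000
320  .bin              \ prints 101000000
1600 .bin              \ prints 11001000000
```

Each 1 in the printed pattern is one of the bit position values from
the table, so 320 shows up as the 256 bit plus the 64 bit.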
Each time we shift a number a bit position to the left (with SHL) we
multiply that number by 2 ** for each bit position shifted **.
Conversely, shifting a number a bit position to the right (SHR) will
divide that number by two, again ** for each bit position shifted **.
We specify multiple bit positions by placing the count value in the
CL register.
6 # cl mov \ set for 1st shift of 6 bits
Therefore, shifting a number 6 bit positions to the left (SHL) will
multiply that number by 64.
6 # cl mov \ set for 1st shift of 6 bits
ax cl shl \ ax=y*64
Shifting that result another 2 bit positions to the left (SHL) will
multiply it by a further factor of four, leaving a result of
[ number*256 ] in the EAX register.
In 32Forth, you can test this directly by using the high-level Forth
word LSHIFT, as follows:
1 6 LSHIFT .
LSHIFT then shifts the first number on the stack, [ 1 ], the number
of bit positions specified by the second number on the stack, [ 6 ],
leaving the result of [ 64 ] on the stack.
If you look at this carefully, you will see that all the SHL (and
the SHR) opcode does is multiply (or divide) by a power of 2, by
moving the bits that represent the number to the left (or right).
Each bit position doubles (or halves) the value of the number.
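The same experiment can be carried a little further with LSHIFT and
its counterpart RSHIFT, both standard Forth words. In this sketch the
helper names *64 and /64 are hypothetical and exist only for the
demonstration. Note that RSHIFT is a logical shift, so it behaves as
a divide only for unsigned values.

```forth
: *64 ( u -- u*64 )  6 LSHIFT ;   \ multiply by 2 to the 6th power
: /64 ( u -- u/64 )  6 RSHIFT ;   \ divide by 64, remainder discarded

1 *64 .               \ prints 64
1 *64 2 LSHIFT .      \ prints 256, the same result as 1 8 LSHIFT
5 *64 .               \ prints 320
320 /64 .             \ prints 5
100 /64 .             \ prints 1, since 100 = 1*64 + 36
```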
So, to make this work for the X resolutions the VGA display can be
set to, it is simply a matter of factoring each resolution into
power of two elements.
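The factoring does not have to be done by hand, since the power of
two elements of a number are simply the 1 bits in its binary form.
The sketch below is written in standard Forth, and the words
.pow2-terms and #terms are names made up for this illustration.

```forth
: .pow2-terms ( u -- )   \ print each power of two present in u
  1                      \ u bit
  begin over while       \ keep going while u still has bits set
    over 1 and if dup . then     \ low bit set: print this bit value
    swap 1 RSHIFT  swap 1 LSHIFT \ next bit of u, next bit value
  repeat 2drop ;

: #terms ( u -- n )      \ count the power of two elements in u
  0 swap                 \ n u
  begin dup while
    dup 1 and rot + swap \ add the low bit of u to the count
    1 RSHIFT
  repeat drop ;

320  .pow2-terms         \ prints 64 256
800  .pow2-terms         \ prints 32 256 512
1600 .pow2-terms         \ prints 64 512 1024
1024 #terms .            \ prints 1
```

The count from #terms is the number of shift and add steps a given
width will need, so 1024 needs one and 800 or 1600 need three.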
An X resolution of 1600 can be broken down into three power of two
segments: 1024, 512 and 64 (1600-1024=576. 576-512=64.)
What we do for this resolution is three multiplications by bit shift
of the Y coordinate value:
Y*64, Y*512 and Y*1024
Rather than waste cycles by specifically doing shifts of 6, 9 and 10,
we will do incremental shifts. The first is the Y*64 shift:
6 # cl mov \ shift 6 bit positions
ax cl shl \ ax=y*64
giving us the value of y*64. In the assembly code, we add this to
our result register EDI, where we have already placed the buffer
address and added the X coordinate offset to it:
ax di add \ di=&buf[y*64+x]
We now shift the y*64 value in EAX three more bit positions, giving
us the value y*512 in EAX, and add this to our result register EDI:
3 # cl mov \ shift 3 bit positions
ax cl shl \ ax=y*512
ax di add \ di=&buf[y*512+y*64+x]
And finally, we shift the y*512 value in EAX one more bit position,
giving us the value y*1024 in EAX, then add this to EDI for our final
result of &buf[y*1600+x]:
ax shl \ shift 1 bit position, ax=y*1024
ax di add \ di=&buf[y*1024+y*512+y*64+x]
This incremental bit shifting saves us from having to reload the EAX
register with the Y coordinate each time, as well as the cycles for
doing the additional bit shifts. This makes the operation of the
PutPixel word much faster, especially when plotting a LOT of pixels.
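To make the difference concrete, here are the two approaches side by
side in high-level Forth. This is only a sketch of the idea, with
word names invented for the comparison, and it says nothing about
actual cycle counts on any particular CPU.

```forth
\ Direct version: three independent shifts of 6, 9 and 10 positions,
\ each one starting again from the original Y value
: *1600-direct ( y -- y*1600 )
  dup 6 LSHIFT           \ y y*64
  over 9 LSHIFT +        \ y y*576
  swap 10 LSHIFT + ;     \ y*1600

\ Incremental version: shifts of 6, then 3 more, then 1 more,
\ each one starting from the previous result
: *1600-incr ( y -- y*1600 )
  6 LSHIFT dup           \ y*64 y*64
  3 LSHIFT dup           \ y*64 y*512 y*512
  1 LSHIFT + + ;         \ y*1600

3 *1600-direct .         \ prints 4800
3 *1600-incr .           \ prints 4800
```

The direct version shifts a total of 25 bit positions and has to
keep a copy of Y around, while the incremental version shifts only
10 and never needs Y again after the first shift.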
---[Modifying PutPixel]----------------------------------------------
In order for PutPixel to use the correct calculation, it has to be
able to determine what the current resolution is. In this case, we
will create a pair of variables which hold the height and width of
the screen in pixels.
variable %ScreenHeight
variable %ScreenWidth
These need to be initialized during the InitGraph call, which sets up
the screen resolution.
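How the initialization is done depends on your own InitGraph. As a
minimal sketch, assuming a helper word that InitGraph could call
once the video mode has been set, it might look like the following.
SetResolution is a hypothetical name and is not part of the original
library.

```forth
variable %ScreenHeight
variable %ScreenWidth

\ Hypothetical helper: record the resolution for PutPixel to read
: SetResolution ( width height -- )
  %ScreenHeight !  %ScreenWidth ! ;

320 200 SetResolution    \ for example, after setting Mode 13h
%ScreenWidth @ .         \ prints 320
%ScreenHeight @ .        \ prints 200
```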
The modified PutPixel, plotting to a double buffer, is now coded as:
code PutPixel ( &buf x y c -- )
\ c in bx on entry as TOS
bx push \ put all parameters onto the stack
bp sp xchg \ setup the stack frame
es push \ save registers to be used
di push
ds push \ set ES:DI to the buffer
es pop
3 cells [bp] di mov \ buffer address to EDI
2 cells [bp] di add \ di=&buf[x]
1 cells [bp] ax mov \ ax=y
%ScreenWidth #) bx mov \ get the screen width value
320 # bx cmp \ is it 320?
1 L# jne \ no, check next
\ Calculate Y*320
6 # cl mov
ax cl shl \ ax=y*64
ax di add \ di=&buf[y*64+x]
2 # cl mov
ax cl shl \ ax=y*256
ax di add \ di=&buf[y*320+x]
6 L# ju \ go plot the pixel
1 L: 640 # bx cmp \ 640?
2 L# jne \ no, check next
\ Calculate Y*640
7 # cl mov
ax cl shl \ ax=y*128
ax di add \ di=&buf[y*128+x]
2 # cl mov
ax cl shl \ ax=y*512
ax di add \ di=&buf[y*640+x]
6 L# ju \ go plot the pixel
2 L: 800 # bx cmp \ 800?
3 L# jne \ no, check next
\ Calculate Y*800
5 # cl mov
ax cl shl \ ax=y*32
ax di add \ di=&buf[y*32+x]
3 # cl mov
ax cl shl \ ax=y*256
ax di add \ di=&buf[y*32+y*256+x]
ax shl \ ax=y*512
ax di add \ di=&buf[y*800+x]
6 L# ju \ go plot the pixel
3 L: 1024 # bx cmp \ 1024?
4 L# jne \ no, check next
\ Calculate Y*1024
10 # cl mov
ax cl shl \ ax=y*1024
ax di add \ di=&buf[y*1024+x]
6 L# ju \ go plot the pixel
4 L: 1280 # bx cmp \ 1280?
5 L# jne \ no, check next
\ Calculate Y*1280
8 # cl mov
ax cl shl \ ax=y*256
ax di add \ di=&buf[y*256+x]
2 # cl mov
ax cl shl \ ax=y*1024
ax di add \ di=&buf[y*1280+x]
6 L# ju \ go plot the pixel
5 L: 1600 # bx cmp \ 1600?
7 L# jne \ no, exit with no plot
\ Calculate Y*1600
6 # cl mov
ax cl shl \ ax=y*64
ax di add \ di=&buf[y*64+x]
3 # cl mov
ax cl shl \ ax=y*512
ax di add \ di=&buf[y*512+y*64+x]
ax shl \ ax=y*1024
ax di add \ di=&buf[y*1600+x]
\ Plot the pixel to the buffer space
6 L:
0 cells [bp] ax mov \ al=c
es: al 0 [di] mov \ plot the pixel
\ Exit
7 L:
di pop \ restore registers
es pop
bp sp xchg \ restore stack frame
4 cells # sp add \ drop parameters
bx pop \ get new TOS
end-code
no-expand
If the final check (for %ScreenWidth==1600) fails, the routine will
fall through to removing the parameters and exiting WITHOUT plotting
a pixel to the buffer, and will NOT give any indication that it
failed to plot. The only way for this to happen is for the
programmer to leave %ScreenWidth uninitialized, or to set it to a
value that PutPixel does not support.
Remember, this is Forth. If you wanted your compiler to hold your
hand and wipe your nose, you'd be using C++ right now.
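That said, if you would like a safety net while debugging, the check
can be made once in high-level code after the resolution is set,
where it costs nothing per pixel. This is a sketch only, and the
words ValidWidth? and CheckWidth are hypothetical names that are not
part of the PutPixel code above.

```forth
variable %ScreenWidth    \ as created earlier in the article

\ Hypothetical check: is w one of the widths PutPixel handles?
: ValidWidth? ( w -- flag )
  case
     320 of true endof
     640 of true endof
     800 of true endof
    1024 of true endof
    1280 of true endof
    1600 of true endof
    false swap           \ default: leave false under the selector,
  endcase ;              \ which ENDCASE then drops

\ Hypothetical guard: abort with a message on an unsupported width
: CheckWidth ( -- )
  %ScreenWidth @ ValidWidth? 0=
  abort" PutPixel: unsupported %ScreenWidth" ;

320 %ScreenWidth !  CheckWidth     \ passes silently
```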
---[Closing it down]-------------------------------------------------
There we have it.
In a moment, if it hasn't hit you already, you'll start wondering
"Hey, how do I get into these other resolutions so I can use this
wonderful bit of programming excellence I now have?"
Really, I know this is what you're thinking - or soon will be...
Well, that will be coming. I'm presently working on a VESA library
that will make use of PutPixel v2.0. It will also implement the
Linear Frame Buffer, which lets us access all the video memory on
our VGA card directly from Protected Mode, at full 32-bit
programming speeds, without having to worry about things like
Selectors, bank switching, or continuously switching back to real
mode to access the VGA memory map.
So, the code for PutPixel is barely a scratch on the surface of the
next level of graphics manipulations I'll be getting into.
As it is, this code will work correctly in the 320x200x256 Mode 13h
resolution, but until the new library is worked out, the additional
resolution capabilities will be idle.
If you want to use this version with your code, remember to create
and initialize the %ScreenWidth variable specifically in your code.
Adding this to your InitGraph word would be an excellent idea, if
you choose to use this for now.
---------------------------------------------[End of transmission]---

