On Mar 18, 4:24 pm, "rh...@[EMAIL PROTECTED]" <spamt...@[EMAIL PROTECTED]> wrote:
> Hi all,
>
> To test HLA, I generate several different equivalent source files in
> FASM, MASM, NASM, Gas, and HLA, compile them, disassemble the
> executables they produce, and then diff the disassembly files. This
> works great as long as HLA can be coerced to produce object code in
> the exact same form as (each of) the other assemblers. As the test is
> automated, it provides a great regression test tool that I can run
> anytime I make changes to the HLA system.
>
> In the past, if a particular assembler generated different code from
> HLA, and both code sequences were semantically equivalent (i.e.,
> different encodings for the same instruction) I simply disabled that
> particular test and left it up to one of the other tests (with a
> different assembler) to catch any defects that crept into the code
> generator.
>
> For HLA v1.102, however, I'm adding a feature that allows HLA v1.102
> to generate the same "object code signature" as MASM, FASM, TASM, Gas,
> and NASM, as much as is reasonable (i.e., I don't generate bad opcodes
> if one of these assemblers has a bug in the instruction encoding).
>
> One curious thing I've noticed is that the presence/absence of a 0x66
> size prefix byte, for 16-bit only instructions, is all over the map.
> For example, consider the following two instructions:
>
> mov ds, ax
> mov ax, ds
>
This issue revolves around the machine code generated by:
    mov eax,ebx
-and-
    mov ax,bx
...both assemble to the same opcode bytes (89 D8); only the 0x66 prefix tells them apart!
                   ;; -f bin -l tst4.lst -o tst4.bin tst4.nsm
                   ;;
                   [bits 16]

00000000 89D8      mov ax,bx
00000002 6689D8    mov eax,ebx
                   [bits 32]
00000005 89D8      mov eax,ebx
00000007 6689D8    mov ax,bx

                   ;; -= eof =-
You'll notice that the sense of the two encodings flips between the two
sections. The second set is destined for a segment described by a 32-bit
descriptor, the first for a segment described by a 16-bit descriptor.
There is a bit in a code-segment descriptor (the D flag) which indicates
the default operand size: that is to say, whether 'this' descriptor
describes a 32-bit code segment _or not_ (i.e. 16-bit code).
> Clearly, there are only 16-bit versions of these two instructions.
> Some assemblers *always* put a 0x66 size prefix byte in front of the
> encodings, some never do, and at least one (MASM) puts size prefix
> bytes before one but not the other.
>
> The question I have is this: Is there some situation (i.e., some
> specific CPU) where the size prefix is absolutely necessary? Quite
> honestly, I've never executed these instructions in 32-bit mode, so I
> don't even know if they work. But given the number of assemblers that
> emit these instructions without the size prefix (in 32-bit mode), I
> assume that they still work properly? Or is this just a bug in those
> assemblers?
>
I dunno, what does
    mov eax,DS
do? Automatically clear the high word? Or just overwrite the low word?
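For reference, here is a minimal Python sketch of the segment-register forms. The helper encode_mov_sreg is hypothetical (not from any of the assemblers discussed) and handles register operands only. It shows that mov ax,ds assembled for a 32-bit segment without the 0x66 prefix is byte-for-byte mov eax,ds (8C D8). Intel's SDM describes the upper two bytes of a 32-bit destination in that case as implementation dependent (zero-filled on P6-family and later processors, undefined on earlier 32-bit IA-32 processors). The prefix policy in the sketch (emit 0x66 whenever the general register's size differs from the segment default) is an assumption for illustration; for the 8E direction (loading a segment register) the Intel text quoted below calls the prefix optional.

```python
# Minimal sketch (hypothetical helper, not from any real assembler): encode the
# register-operand forms of MOV that involve a segment register:
#   mov r16/r32, Sreg  -> 8C /r
#   mov Sreg, r16/r32  -> 8E /r
# Assumed policy: emit 0x66 whenever the general register's size differs from
# the segment's default operand size.

SREGS = ["es", "cs", "ss", "ds", "fs", "gs"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def gpr(name):
    """Return (register number, size in bits) for a general-purpose register."""
    if name in REG16:
        return REG16.index(name), 16
    if name in REG32:
        return REG32.index(name), 32
    raise ValueError(f"not a general-purpose register: {name}")


def encode_mov_sreg(dst, src, default_bits):
    """Encode 'mov dst, src' where exactly one operand is a segment register."""
    if dst in SREGS and src not in SREGS:
        opcode, sreg, reg = 0x8E, SREGS.index(dst), src  # MOV Sreg, r/m
    elif src in SREGS and dst not in SREGS:
        opcode, sreg, reg = 0x8C, SREGS.index(src), dst  # MOV r/m, Sreg
    else:
        raise ValueError("exactly one operand must be a segment register")
    num, size = gpr(reg)
    prefix = b"\x66" if size != default_bits else b""
    # ModRM: mod=11 (register-direct), reg=segment register, r/m=general register
    return prefix + bytes([opcode, 0xC0 | (sreg << 3) | num])


for dst, src in [("ds", "ax"), ("ax", "ds"), ("eax", "ds")]:
    print(f"[bits 32] mov {dst},{src}:", encode_mov_sreg(dst, src, 32).hex().upper())
```

Dropping the prefix from mov ax,ds in a 32-bit segment leaves 8C D8, which the CPU executes as the 32-bit form.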
> The Intel documentation states: "In 32-bit mode, the assembler may
> insert the 16-bit operand-size prefix with this instruction (see the
> following "Description" section for further information)."
>
> As I read this, the 0x66 prefix byte is purely optional. However, that
> statement may not apply to some non-Intel CPUs (unlikely, but I'm
> asking because I just don't know).
>
Not optional.
> Is there any reason to waste a byte and emit these prefixes?
So as not to hobble code, confuse users?
> Thanks,
> Randy Hyde
Steve

