Hi Stephen,
> If you've never tried it before, start with the smallest subset of BASIC
> that you can write nontrivial programs in. I've never tried to analyze
> it, but I suspect that QBasic's grammar isn't LL1. You can write a
> recursive-descent parser in any language. It helps if the language
> supports recursive functions, so you don't have to maintain your own
> stack, but you could theoretically do it on a Turing machine, given a
few
> man-centuries of programming effort.
I was thinking of grabbing a copy of the ISO BASIC standard, but I
suspect the grammar is in a form suitable for a LALR parser generator
and it would probably be too much for me to handle at this early stage.
Now that you mention it, I'll go with a small subset of BASIC. That
seems like a much easier way to start. I think I'll go with a subset
that does arithmetic expressions, has variable and constant
declarations, control structures and PRINT and INPUT. From there I'll
go on to add arrays, subroutines and functions. I'd like to work on the
code generation in a serious way, so the minimal subset makes a great
deal of sense. Instead of creating a full and new dialect of BASIC with
a monster standard library and stuff, I'd like a smaller language so I
can spend more time on code generation.
> Seriously, it's been done in Pascal and QuickBasic. I worked through
most
> of Jack Crenshaw's "Let's Build A Compiler!" tutorial (see
> http://compilers.iecc.com/crenshaw/),
translating his Pascal code into
> QuickBasic as I went along. It wasn't that hard, given QB's Pascal -
like
> control structures and the trick of using strings to represent sets, so
> that INSTR() can serve as the membership function. Crenshaw's "Tiny"
> language is far smaller than any BASIC, but I wound up with a compiler
> that actually worked (under real-mode DOS; *don't* try this in a
> Windows environment).
I worked through Crenshaw's tutorial, but screwed up somewhere and
couldn't get a few parts working as promised. I attempted to interpret
the Pascal code as best I could and wrote it in C++ and generated 32-bit
flat address space assembler for use in protected mode. It was great
fun though, and at that time it was the only compiler-construction
material I had a hope of understanding! I had a bit of trouble
following the 68000 assembler code in the early stages.
To begin with, I'll use the C library. I'll generate assembler like
this when generating code for PRINT:
pushl $string_addr
call _printf
addl $4, %esp
.... for example (that was GNU assembler aka AT&T syntax).
> Some caveats: Crenshaw never finished the series; it peters out after
> introducing procedures. The target processor is the Motorola 68000 (a
> much nicer architecture than the 8086; WHY did IBM have to bless Intel's
> kluge?). One of his fans published an assembly-language runtime package
> for the 8086, and I grabbed that and wrote my own code generator
routines
> to emit 8086 assembler code.
Yeah, I've always wondered why IBM chose Intel's awful processors...
probably because they were cheap and Intel invented the system on a chip
thing? IBM was a lumbering bureaucracy and they like things such as "we
created the microprocessor..." inferring that made their processors
somehow superior to others. To IBM, Intel was probably an eminent
authority. Don't see why IBM didn't design their own CPU for the PC,
come to think of it.
> It would be a _lot_ of work to go from the end of Crenshaw's series to a
> practical compiler, even in a DOS environment, but it's a good exercise.
> Dogpile still shows the original link, along with some others for
> "Crenshaw compiler tutorial."
I'm pretty much doing this as a learning exercise and don't have any
plans to make this any kind of real compiler for everyday use. But I do
have plans to get to that stage eventually. I'd also like to write a
front end for GCC and learn the tree language and RTL that GNU uses as
an IR in the back end.
Have you seen Pat Terry's (http://www.scifac.ru.ac.za/cspt/)
compiler
texts? Check this out: http://www.scifac.ru.ac.za/compilers/.
The new
edition of this book is due out this year and the compilers developed in
the text generate code for Sun's JVM and Microsoft's CLR. Instead of
using C++ as the current online version, the new edition uses C# and
Java. Terry also describes the use of Cocol/R, which generates
recursive descent/LL(k) parsers. Cocol/R is very easy to use for
beginners at my level of (un)sophistication. Terry's book is a nice
intermediate between very basic tutorials like Crenshaw and heavy
advanced works like Aho, Sethi & Ullman.
Thanks very much for your response. Are you still hacking compilers?
Care for some further discussion by email?
Cheers,
Johnathan


|