In article <645afqF29q67nU1@[EMAIL PROTECTED]
>,
=?ISO-8859-1?Q?J=FCrgen_Kahrs?= <Juergen.KahrsDELETETHIS@[EMAIL PROTECTED]
> wrote:
> problems@[EMAIL PROTECTED]
schrieb:
>
> > I've just looked at awk's docos for the first time and it seems
> > suitable for implememting an augumented-finite-state-machine.
>
> Right.
>
> > At each/most node there's an action to be performed.
> > Some nodes are not terminals/leaves and just lead to
> > another nested syntax-diagram via push & return via pop.
>
> What you describe is a recursive-descent compiler.
> Terminals are consumed by the scanner.
> Non-terminals are handled by a dedicated function.
>
Well in terms of traversing the AFSM, at terminals the corresponding
actions are performed, which is why I thought awk might be suitable,
and at non-terminals, eg. after the "BEGIN" of a procedures outer
block, the train-on-the-tracks [=prog.counter] jumps to the
<STATEMENT> sub-diagram, which just consists of a sequence of
lines of: (token, action set); somewhere else on the AFSM
representation. Importantly this AFSM representation is just
data, which is interpreted, by traversing the paths and doing the
corresponding action.
> > I've built several p-code compilers like this.
>
> Me too, with AWK.
>
> > From what I see from awks docos, it's very suitable for
> > doing the [token,action] task, as the source-code is
> > scanned into tokens and passed to awk.
>
> Right, but remember that the traditional distinction
> between a low-level "scanner" (with regular expressions)
> and a high-level "parser" (doing recursive-descent parsing)
> should be made.
>
At the time I did it [70s] I didn't know about regex, and the
p-code compiler was multipass, so 'scanning' meant converting
each token to either a
* single-char for reserved-words, or
* a-number: in an array of numbers [the desktop HP programmable
calculator did not distinguish between interger & real types, it
just auto converted where appropriate] or
* a-name: store in an array of names.
> > As I remember, at the branch-points a one token look ahead
> > only was needed for the pascal-like syntax.
>
> Yes, it was called the LL(1). Wirth always designed
> his languages so that they can be parsed with a
> lookahead of one symbol.
>
At that time I didn't know about LL(1), nor have I studied
compliler theory up to now. What I liked about this recursive
descent simplicity, was that while compiling [but less so with
my multipass HP-calculator version] the source can/is just
displayed, and where it stops IS the error-found's location.
> > Can awk handle this ?
> >
> > How would it handle the branching, eg. the sub-diagram for
> > statement would be like:
> > Case TOKEN
> > 'IF': next state := IF-node;
> > 'WHILE: next state := WHILE-node;
> > ...
>
> I have implemented a compiler for PL/0 and I
> did it this way:
>
> http://en.wikipedia.org/wiki/PL/0
>
OK, I fetch and examine that.
> function Statement( m, n) {
> if (EatSym("ident", -1)) {
> n = FindIdent(tokenprev, "variable")
> EatSym(":=", 14)
> Expression()
> Encode(0xb3, n, 2)
> } else if (EatSym("CALL", -1)) {
> EatSym("ident", 12)
> Encode(0xb8, FindIdent(tokenprev, "procedure"), 2)
> } else if (EatSym("BEGIN", -1)) {
> do { Statement() } while (EatSym(";", -1))
> EatSym("END", 9)
> } else if (EatSym("IF", -1)) {
> Condition()
> EatSym("THEN", 10)
> n = pc
> Encode(0xa0, 0, 2)
> Statement()
> PatchAddr(n, pc)
> } else if (EatSym("WHILE", -1)) {
> n = pc
> Condition()
> EatSym("DO", 1)
> m = pc
> Encode(0xa0, 0, 2)
> Statement()
> Encode(0xa7, n-pc, 2) # goto
> PatchAddr(m, pc)
> }
> }
>
OK, apparently "Encode" is the code generation.
Since I haven't worked with my stuff for many years,
I only now realise that awk would give me a program
driven implementation, rather than a data-driven AFSM,
which is what I want.
So, looking at your version I'd have had:
....
IF: Condition()
THEN: n = pc; Encode(0xa0, 0, 2); Statement(); PatchAddr(n, pc);
ELSE: <set of actions>
and AFAI-can remember, each node lead to the next-one-down,
by default, except when the field/column, perhaps left-most/
before the token, was not empty and gave the next-node/line's
number.
What brought me onto this query, is that I need to clean-up
texts which I fetch everyday, by deleting eg. all text between:
"["<1 to 3 numerics> "]" "Reply" <rest of line>
<between 3 an 5 lines>
<start of line> "author" <rest of line>.
And it seems that awk can easily handle this.
And I don't want to invest in learning awk just to do this task.
So I thought perhaps I could use awk to do other usefull tasks.
Apparently I need first to get familiar with *nix regex to
clean-up my fetched texts ?
Apparently my ideas of an AFSM compiler is closer to the
table-driven compilers ?
clean-up my fetched texts ?
Thanks for any feedback,
==Chris Glur.


|