Christopher Diggins wrote:
> I was wondering if there has been any research into adding higher-
> order instruction to the Java bytecode? In other words instructions
> that either push or pop instructions on the evaluation stack.
I don't know whether there has been any research on such ideas, I haven't
seen
anything myself, but I (think I) miss quite a bit more of what's going on
than
I catch.
I'm rather sceptical about the approach -- at least, if taken as a
candidate
for extending the expressiveness of the JVM. (It does look interesting in
and
of itself.)
The problems I see are "political" and practical, I'm willing to believe
your
claim that there are no real killer problem in theory.
(Although I think you have a fair bit more work to do to sup****t that
claim.
Just for one example: how do you ensure that a constant-pool index pushed
onto
the stack to form part of a generated invokevirtual instruction call
sequence
is in range, let alone correctly "typed" ?).
The "political" problems (not really a very good term, but I can't think
of a
better one off-hand) are boring and obvious. The JVM is a standard (in
the
sense of having a public specification independent of any one
implementation),
and non-standard extensions are doomed to die in obscurity. No one will
use a
program which requires a non-standard JVM as an execution platform, and so
any
JVM language will surely remain (to say the best of it) very much a niche
language unless it targets the /real/ JVM. But if only very unim****tant
(in
numerical terms) languages require the non-standard extensions then
there's not
a lot of chance of them making it into the standard.
The practical problems are more interesting ;-)
The best way I can put it is that the idea seem to assume, or require, an
approach to JVM implementation which is not used in practise. If the JVM
were
typically implemented as an interpreter with a classic "big-switch" on the
opcode, then the adding this feature would obviously be trivial. If the
implementation approach were to translate bytecode into native code by
doing a
table lookup to translate each opcode into a (stereotyped) native
instruction
sequence, concatenating the results, then shoving everything through a
peephole
optimiser, then adding this kind of feature would also be pretty trivial.
But
neither of those approaches are relevant (as far as I know no production
JVM
for desktop class machines uses either of them, or has done since about
JDK
1.0.2).
In fact the stack-based nature of the JVM bytecode language is a bit of a
red
herring. For one thing JVM bytecode isn't really that much of a
stack-based
language[*]. More im****tantly the stack is only an incidental feature of
the
intermediate language used to compress source code into a ****table
delivery
format. It isn't necessarily a particularly well chosen format (though it
is
reasonably compact, and does keep javac simple), since the JVM has to
translate
code back from the stack-based expression into something suitable for
input to
its optimising compiler (SSA or whatever). I don't know the details of
how the
current crop of JVMs generate native code, but it is at least a plausible
approach that the JITer never sees bytecode at all, nor ever thinks in
terms of
a stack (except the stack of activation records, of course).
It may well be that there are ways of extending the analysis the JVM must
do to
allow higher-order bytecodes in the context of a state of the art JITer,
but I
don't see any reason to expect that to be simple at all -- nor any reason
to
expect it even to fit into the current framework at all (it would be like
adding eval(char *) to an optimising C compiler -- how can the compiler
optimise code that is unknown until runtime ?)
But then, perhaps the overtly stack-based nature of your idea is itself a
red-herring. It doesn't seem to me that much would change if you wanted
to be
able to assemble sequences of executable bytecode in byte[] arrays,
instead of
on the stack (the latter might significantly simplify the implementation
of
languages like Cat, but it doesn't seem to me that it's similarly
im****tant for
languages outside that niche-within-niche). But the techniques for doing
that
are already well-understood, rather widely used, and do not require
changes to
the semantics or implementation of the JVM. So I suspect there's more
mileage
to be gained in looking for ways to optimise that process (e.g. creating
lighter-weight classes) than in a wholesale reorganisation of the JVM
semantics.
-- chris
[*] It is true that there is an explicit stack, with operations to
manipulate
it, but that is only the expression stack and it isn't used in anything
like
the way "the" stack is used in Forth or PostScript -- the character of the
JVM
bytecode language would be essentially unchanged if it used, say, the
unbounded-number-of-named-registers-per-activation approach -- which it
does,
anyway, for everything except expression evaluation.
P.S. Very small notes re: your paper. "PostScript" (which is a
registered
trademark) should be spelled with a capital S. Also, I don't understand
why
you say that PostScript isn't higher order -- you manipulate instructions
as
data every time you write conditional code, or define a procedure.
Lastly, I'm
not sure that citing the PostScript reference manual as "Inc. 1999" is
quite in
tune with established academic norms ;-)


|