Context: I'm writing a search query parser with Lucene underneath. I have
to support a different query syntax than vanilla Lucene, so I've used
JavaCC to create the AST, which I then process in a separate class. For
the most part it works just as I want it to (with help from this group in
the past!), but there's a problem I haven't been able to solve.
The issue has to do with Boolean expressions as operands to a proximity
operator. For example:
(a OR b) NEAR c
(a AND b) NEAR (c NOT d)
Behind the scenes I am using Lucene's span queries to create SpanQuery
objects that handle the proximity part. Lucene provides a SpanOrQuery
that makes it easy to create the right objects when only OR is used in
the Boolean expressions that are the operands of the proximity operator.
As such, I can handle the first of the two examples but not the second. I
have tried to fiddle with Lucene's SpanQuery classes, but haven't yet
managed to figure out how to create the right objects. While I am able to
generate the AST properly, I can't do what I need to with what I've got.
As far as I can tell, there are two approaches available to me: extend
some core Lucene classes to do what I want (downstream), or refactor the
AST so that it's workable (upstream). I would like to avoid the
downstream approach for two reasons: if the Lucene classes change in any
fundamental way I will either have to refactor the classes I've written
or never upgrade Lucene versions; also, I am always more comfortable
"fixing" something as far upstream as possible.
So, what I would like to do, at the JavaCC level (i.e. in the class
generated by JavaCC), is turn the AST on its head so that I end up with
an expression that I can handle down the line. Using the example above, I
would like to be able to turn:
(a AND b) NEAR (c NOT d)
into:
((a NEAR c) AND (b NEAR c)) NOT ((a NEAR d) OR (b NEAR d))
or (closer to the original):
((a NEAR c) AND (b NEAR c)) AND NOT (a NEAR d) AND NOT (b NEAR d)
Either of these consists only of simple proximity expressions joined by
Boolean operators. Each proximity expression is easily turned into a
simple SpanQuery object and added as a clause of a BooleanQuery object.
My gut tells me that I'll need to use the Visitor pattern to handle this,
but that's as far as I am able to get. Any ideas?
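
To make the question concrete, here is a minimal sketch of the kind of
rewrite described above, written as a hand-rolled Visitor over a toy AST.
Everything in it is hypothetical: the Node, Term, Bool, Near and
Distributor names are stand-ins, not the classes JavaCC/JJTree generates,
and treating NOT with the same distribution rule as AND and OR is an
assumption taken from the example above. It pushes NEAR down through the
Boolean operators, left operand first, until every NEAR has only
non-Boolean operands.

```java
public class NearRewriter {

    // Hypothetical AST node types; a generated tree would have its own.
    interface Node { <R> R accept(Visitor<R> v); }

    interface Visitor<R> {
        R visitTerm(Term t);
        R visitBool(Bool b);
        R visitNear(Near n);
    }

    enum Op { AND, OR, NOT }

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        public <R> R accept(Visitor<R> v) { return v.visitTerm(this); }
        @Override public String toString() { return text; }
    }

    static final class Bool implements Node {
        final Op op;
        final Node left, right;
        Bool(Op op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        public <R> R accept(Visitor<R> v) { return v.visitBool(this); }
        @Override public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    static final class Near implements Node {
        final Node left, right;
        Near(Node left, Node right) { this.left = left; this.right = right; }
        public <R> R accept(Visitor<R> v) { return v.visitNear(this); }
        @Override public String toString() {
            return "(" + left + " NEAR " + right + ")";
        }
    }

    // Rebuilds the tree bottom-up, distributing NEAR over Boolean operands.
    static final class Distributor implements Visitor<Node> {
        public Node visitTerm(Term t) { return t; }

        public Node visitBool(Bool b) {
            return new Bool(b.op, b.left.accept(this), b.right.accept(this));
        }

        public Node visitNear(Near n) {
            // Rewrite the operands first, then push this NEAR down into them.
            return distribute(n.left.accept(this), n.right.accept(this));
        }

        private Node distribute(Node l, Node r) {
            if (l instanceof Bool) {          // (x OP y) NEAR r
                Bool b = (Bool) l;
                return new Bool(b.op, distribute(b.left, r), distribute(b.right, r));
            }
            if (r instanceof Bool) {          // l NEAR (x OP y)
                Bool b = (Bool) r;
                return new Bool(b.op, distribute(l, b.left), distribute(l, b.right));
            }
            return new Near(l, r);            // neither operand is Boolean
        }
    }

    static Node rewrite(Node root) { return root.accept(new Distributor()); }

    public static void main(String[] args) {
        // (a AND b) NEAR (c NOT d)
        Node q = new Near(
            new Bool(Op.AND, new Term("a"), new Term("b")),
            new Bool(Op.NOT, new Term("c"), new Term("d")));
        // prints (((a NEAR c) NOT (a NEAR d)) AND ((b NEAR c) NOT (b NEAR d)))
        System.out.println(rewrite(q));
    }
}
```

The result groups the clauses differently from the two target forms above,
but if each NEAR is read as a single true/false condition it is
Boolean-equivalent to them: both (a NEAR c) and (b NEAR c) are required,
and both (a NEAR d) and (b NEAR d) are excluded. Note that the order
matters: distributing the right operand first would put AND rather than OR
inside the negated part, which is not the intended meaning.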
Thanks,
-- Robert

