
String tokenizer place in Chomsky hierarchy?

by Tegiri Nenashi <TegiriNenashi@[EMAIL PROTECTED] > Apr 8, 2008 at 10:44 AM

The string tokenizer is the simplest parser of all. In Java, arguably,
it is used much more frequently than regular expressions. Yet I have
not seen any parsing theory book ever mention it.

In my practical experience it is easier to write a scanner on a string
tokenizer foundation than to use an off-the-shelf regular expression
engine. I'm interested, however, in what the language-theory
perspective on this phenomenon is.

Formally, the set of terminals is partitioned into a set of separators
(possibly a single one) and the rest of the terminals. The string
tokenizer then translates a given word into a set (or list) of words.
Here we have the first technical difficulty: what exactly is this
translation? Is it a mapping from a monoid to an idempotent semiring?
Certainly not if we insist on the result being a list of words rather
than a set. And if we try to extend this mapping naturally to the
domain of sets, do we then have to deal with a set of sets (or a set
of lists) on the range side of the mapping?
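
To make the list-versus-set distinction concrete, here is a minimal
Java sketch. The class and method names (ListVersusSet, tokenList,
tokenSet) and the input "babab" are made up for illustration; only
java.util.StringTokenizer is the real library class. It is just one
way to look at the question, not a formal answer to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical demo class: the same word tokenized into a list and into a set.
class ListVersusSet {
    // Collect tokens in order of appearance, keeping duplicates.
    static List<String> tokenList(String word, String separators) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(word, separators);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Collapse the same tokens into a set: order of repeats and multiplicity are lost.
    static Set<String> tokenSet(String word, String separators) {
        return new LinkedHashSet<>(tokenList(word, separators));
    }

    public static void main(String[] args) {
        System.out.println(tokenList("babab", "a")); // [b, b, b]
        System.out.println(tokenSet("babab", "a"));  // [b]
    }
}
```

The list result remembers that "b" occurred three times, while the set
result does not, which is why the two choices of range behave
differently.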

Consider an example: an alphabet of terminals {a,b,c} with "a" being
the separator. Can you suggest a grammar that would be able to
tokenize the word "babcac" into {b,bc,c}? Sure, something like

s >= a
t >= b
t >= c
w >= wt
w >= 1
u >= ws
v >= 1
v >= uv

would work, but does this 8(!)-rule grammar present any new insight
into what a string tokenizer is? Besides, assuming this grammar
produces an unambiguous parse tree, there is still an extra step of
extracting the set of words from the tree.
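
For comparison with the grammar, here is a rough Java sketch of the
same example done two ways: with java.util.StringTokenizer, and with a
hand-written single pass over the input. The class and method names
(TokenizeExample, withStringTokenizer, byHand) are invented for this
illustration, and the separator "a" is hard-coded as in the example
above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class for the {a,b,c} example with "a" as the separator.
class TokenizeExample {
    // Library version: StringTokenizer with 'a' as the only delimiter.
    static List<String> withStringTokenizer(String word) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(word, "a");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Hand-written version: one left-to-right pass with a buffer for the current token.
    static List<String> byHand(String word) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : word.toCharArray()) {
            if (ch == 'a') {
                // A separator closes the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        // Emit the last token when the word does not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(withStringTokenizer("babcac")); // [b, bc, c]
        System.out.println(byHand("babcac"));              // [b, bc, c]
    }
}
```

Both versions return the list of words directly, so there is no
separate step of walking a parse tree to collect them.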




