
Regexp with backreferences in J

by brian.b.mcguinness@[EMAIL PROTECTED] Apr 20, 2008 at 11:30 AM

The rxrplc function in the standard J library script regex.ijs doesn't
seem to handle backreferences in the replacement text.  This function,
derived from rxrplc, seems to work:

NB. ----------------------------------------------------------------------------
NB. rxsubst - Regular expression substitution
NB.
NB. str =. (pattern[;index];newtext) rxsubst str
NB.
NB. Brian B. McGuinness     J     April, 2008
NB. ----------------------------------------------------------------------------
rxsubst =: 4 : 0
  pat=. >{.x
  new=. >{:x
  if. L. pat do. 'pat ndx'=. pat else. ndx=. ,0 end.
  if. 1 ~: #$ ndx do. 13!:8[3 end.

  mat=. pat rxmatches y

  if. 0 = # mat do.
    y
    return.
  end.

  NB. --- Find any back references in the replacement text
  escaped =. 0
  newtxt  =. 0 2 $ 0
  last    =. 0

  for_i. i. # new do.
    c =. i { new
    if. c = '\' do.
      escaped =. -. escaped
    else.
      if. escaped do.
        c =. '123456789' i. c
        if. c < 9 do.
          newtxt =. newtxt, (last }. (i - 1) {. new); c
          last   =. i + 1
        end.
      end.
      escaped =. 0
    end.
  end.

  if. last < #new do.
    newtxt =. newtxt, (last }. new); 9
  end.

  if. 1 = #newtxt do.
    NB. --- No back references in the replacement text; act like rxrplc
    (0 { newtxt) (({.ndx) {"2 mat) rxmerge y
  else.
    NB. --- Expand the back references
    repl =. ''

    for_i. mat do.
      a =. ''
      for_j. }. i do.
        a =. a, < (1 { j) {. (0 { j) }. y
      end.
      a =. ;((1 {::"1 newtxt) { 10 {. a) (< a: ; 1) } newtxt

      repl =. repl, <a
    end.

    NB. --- Now perform the replacements
    repl (({.ndx) {"2 mat) rxmerge y
  end.
)

For example:

   ('@([biu]){(.*?)}';'<\1>\2</\1>') rxsubst 'We can have @b{bold}, @i{italic}, and @u{underlined} text.'

We can have <b>bold</b>, <i>italic</i>, and <u>underlined</u> text.
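
The expansion step can be tried on its own, without calling the regex library. This is only a sketch under assumptions: the text, the hand-written (start, length) match table, the helper name extract, and the pre-parsed newtxt table are all made up for illustration and stand in for what rxmatches and the parsing loop would produce.

```j
NB. Hypothetical data: one match with two groups, written out by hand
NB. as (start, length) rows instead of calling rxmatches.
text  =. 'We can have @b{bold} text.'
match =. 3 2 $ 12 8  13 1  15 4      NB. whole match, group 1, group 2

NB. Cut each group out of the text; the first row (the whole match) is dropped.
extract =. 4 : '(1 { x) {. (0 { x) }. y'
groups  =. (}. match) <@extract"1 _ text

NB. Assumed parsed form of '<\1>\2</\1>': literal text in column 0, group
NB. index in column 1, with 9 marking trailing text that has no group after it.
newtxt =. 4 2 $ '<' ; 0 ; '>' ; 1 ; '</' ; 0 ; '>' ; 9

NB. Pad the groups to 10 boxes so index 9 selects an empty box, overwrite
NB. column 1 with the selected groups, then raze into the replacement string.
repl =. ; ((1 {::"1 newtxt) { 10 {. groups) (< a: ; 1) } newtxt
```

Here repl should come out as <b>bold</b>, which is the text that would be merged back in place of the whole match.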


I have experimented a bit with the J sequential machine (dyadic ;:), for example:

NB. ----------------------------------------------------------------------------
NB. Find escaped digits
NB.
NB. State 0: normal
NB. State 1: escaped
NB. State 2: escaped digit found (used so we break *after* the digit)
NB. State 3: initial state (needed since we're not allowed to initialize j to 0)
NB. ----------------------------------------------------------------------------

test4 =: 3 : 0
  conv   =. (a. = '\') + +: a. e. '123456789'
  trans  =. 4 3 2 $  0 0  1 0  0 0   0 0  0 0  2 0   0 2  0 2  0 2   0 1  1 1  0 1
  newtxt =. (0;trans;conv; 0 _1 3 _1) ;: y
  field  =. (<"0 '123456789' i. {: & > }: newtxt), <9
  newtxt =. ((_2 }. &. > }: newtxt), {: newtxt) ,. field
  NB. *** for each match, replace 2nd col of newtxt with corresponding substr, then raze
)

But I don't see any great advantage to using it, and it makes the code
less readable.  The state machine also has some odd restrictions, such
as requiring that the "j" value be initialized to _1, so a special
"initial" state is then required to set j to 0.
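
For anyone trying to follow the tables in test4, it may help to look at the character classification by itself. The sketch below reuses the conv definition from test4; the sample string and the name classes are assumptions chosen only for illustration.

```j
NB. Same mapping as conv in test4: 0 = other, 1 = backslash, 2 = digit 1-9.
conv =. (a. = '\') + +: a. e. '123456789'

NB. Look up the class of each character of a sample string.
NB. Note that '0' is not a valid back reference digit, so it is class 0.
classes =. (a. i. 'a\1\\0') { conv
```

Each class selects a column of trans, so the three columns in each row of trans correspond to "other", "backslash" and "digit" in that order.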

--- Brian



