
Re: Less greedy pattern match

by "Rajan" <svrajan@[EMAIL PROTECTED] > Apr 18, 2008 at 06:28 AM

"Ed Morton" <morton@[EMAIL PROTECTED]> wrote in message
news:48081FEB.60605@[EMAIL PROTECTED]
>
>
> On 4/17/2008 6:20 PM, Rajan wrote:
>>
>> "Ed Morton" <morton@[EMAIL PROTECTED]> wrote in message
>> news:4807A618.5020501@[EMAIL PROTECTED]
>>
>>>
>>>On 4/17/2008 1:32 PM, pk wrote:
>>>
>>>>On Wednesday 16 April 2008 05:23, Ed Morton wrote:
>>>>
>>>>
>>>>
>>>>>There's no "less greedy" operator (to be honest, I'm not sure what that
>>>>>really means or why it'd be more useful than a different RE)
>>>>
>>>>
>>>>Well, perl REs have non-greedy match. It can be useful, for example, when
>>>>parsing html or xml. The classical example is something like this (please
>>>>note that I'm not a perl guru - alas -):
>>>>
>>>>$ cat file.html
>>>><tag><section><t1>foo</t1><t2>blah</t2><t1>bar
>>>></t1></section>
>>>><section><t1>baz
>>>></t1><t2>blah</t2><t1>baz</t1></section></tag>
>>>>
>>>>Suppose you want to remove only what's inside <t1> tags (tags included)
>>>>and keep everything else, without a priori knowledge of how the lines are
>>>>formatted.
>>>>
>>>>The simple perl one-liner:
>>>>
>>>>$ perl -p0e 's%<t1>.+?</t1>%%gs' file.html
>>>><tag><section><t2>blah</t2></section>
>>>><section><t2>blah</t2></section></tag>
>>>>
>>>>does what can't be done with sed. The "+?" is the non-greedy notation
>>>>for "one or more", just like "*?" is the non-greedy operator for "zero
>>>>or more", etc.
>>>>
>>>>Note that I'm not saying that that cannot be done using other
>>>>methods...that was just an example to demonstrate how non-greedy match
>>>>can be useful.
>>>>
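
For anyone following along, here is a small sketch of the problem being discussed: what happens when the obvious greedy RE is used in awk. The function name greedy_strip and the sample line are made up for this illustration; they are not from the thread.

```shell
# greedy_strip is a hypothetical name used only for this sketch.
greedy_strip() {
    # ".*" is greedy: the match starts at the first <t1> and runs
    # all the way to the LAST </t1> on the line.
    awk '{ gsub("<t1>.*</t1>", ""); print }'
}

# The <t2> element sits between two <t1> elements, so it is swallowed too.
printf '%s\n' '<s><t1>foo</t1><t2>blah</t2><t1>bar</t1></s>' | greedy_strip
# prints: <s></s>
```

The `<t2>blah</t2>` part is lost, which is exactly what a non-greedy quantifier (or one of the workarounds below) avoids.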
>>>
>>>Yeah, that is a bit simpler than the awk equivalent:
>>>
>>>$ cat file
>>><tag><section><t1>foo</t1><t2>blah</t2><t1>bar</t1></section>
>>><section><t1>baz</t1><t2>blah</t2><t1>baz</t1></section></tag>
>>>
>>>$ awk '{o="</t1>";n=SUBSEP;gsub(o,n);gsub("<t1>[^"n"]*"n,"");gsub(n,o)}1' file
>>><tag><section><t2>blah</t2></section>
>>><section><t2>blah</t2></section></tag>
>>>
>>>or something else to chop up and reassemble the record. It is a pity there
>>>isn't a "not" operator for RE elements other than characters within [...].
>>>
>>>Oh well...
>>>
>>>Ed.
>>>
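
The awk one-liner quoted above is dense, so here is the same idea spelled out step by step, as a rough sketch. The function name strip_t1 and the variable names are invented for readability. It assumes the SUBSEP character never occurs in the input, and, like the one-liner, it works one line at a time, so a `<t1>` element that spans lines is not handled.

```shell
# strip_t1 is a hypothetical name; the logic mirrors the quoted one-liner.
strip_t1() {
    awk '{
        close_tag = "</t1>"   # the multi-character terminator
        mark = SUBSEP         # single character assumed absent from the input
        gsub(close_tag, mark)               # 1. shrink each terminator to one char
        gsub("<t1>[^" mark "]*" mark, "")   # 2. [^mark]* cannot run past the first terminator
        gsub(mark, close_tag)               # 3. put back any terminator that was not consumed
        print
    }'
}

printf '%s\n' '<s><t1>foo</t1><t2>blah</t2><t1>bar</t1></s>' | strip_t1
# prints: <s><t2>blah</t2></s>
```

The trick is that a bracket expression can negate a single character but not a string, so the closing tag is temporarily turned into a single character that `[^...]` can exclude.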
>>
>>
>> Wouldn't this do?
>> gawk -v RS='</*t1>' -v ORS=""  'RT!="</t1>"{print}'
>>
>
> Yes, that's a nice solution. It's obviously gawk specific which may be fine.
> Just a couple of tweaks: the "*" should be a "?"

Agreed!

> and you don't need the "{print}"

I knocked my head after clicking send.

> and you don't NEED the "" when assigning a null variable:
>
>  gawk -v RS='</?t1>' -v ORS=  'RT!="</t1>"'
>
>
Fore!
> Fore!
>
> Ed.
>
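
To make the final gawk version easier to try, here it is wrapped in a shell function with comments. This is only a sketch: the name strip_t1_gawk is made up, it needs gawk (RT and a regexp RS are gawk extensions), and it assumes `<t1>` elements are balanced and not nested.

```shell
# strip_t1_gawk is a hypothetical wrapper around the one-liner from the thread.
strip_t1_gawk() {
    # RS='</?t1>' : every <t1> or </t1> tag ends a record
    # ORS=        : print records back to back, adding nothing
    # RT          : the text that actually matched RS for this record
    # A record ended by </t1> is the body of a <t1> element, so it is skipped;
    # everything else is printed unchanged.
    gawk -v RS='</?t1>' -v ORS= 'RT != "</t1>"'
}
```

For example, `printf '%s\n' '<s><t1>foo' '</t1><t2>blah</t2></s>' | strip_t1_gawk` should print `<s><t2>blah</t2></s>`. Because the records are split on the tags rather than on newlines, a `<t1>` element that spans lines is removed as well. The "?" tweak matters because `</*t1>` would also accept things like `<//t1>`, while `</?t1>` matches only the two real tags.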



