On 4/16/2008 5:01 PM, Rajan wrote:
>
> "Ed Morton" <morton@[EMAIL PROTECTED]> wrote in message
> news:4805EBDC.8080700@[EMAIL PROTECTED]
>
>>
>>On 4/15/2008 11:54 PM, Rajan wrote:
>>
>>>"Prateek" <prateek.a@[EMAIL PROTECTED]> wrote in message
>>>news:9807e551-cdc4-4006-9b52-68d833d56787@[EMAIL PROTECTED]
>>>
>>>
>>>>On Apr 16, 12:02 am, "Rajan" <svra...@[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>>>"Ed Morton" <mor...@[EMAIL PROTECTED]> wrote in message
>>>>>
>>>>>news:48057145.9090408@[EMAIL PROTECTED]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>On 4/15/2008 9:59 PM, Prateek wrote:
>>>>>>
>>>>>>
>>>>>>>Hi,
>>>>>>>This question is regarding the less greedy regular expression pattern
>>>>>>>match. As you know, Perl provides less greedy pattern match parameter
>>>>>>>in the form of repetition?
>>>>>>
>>>>>>>But does gawk provide anything similar?
>>>>>>
>>>>>>>The sample I was trying was:
>>>>>>
>>>>>>>echo ccccccd | gawk '{ print match($0, /c*?d/) }'
>>>>>>
>>>>>>>I was expecting value 6 as output
>>>>>>
>>>>>>why?
>>>>>
>>>>>>>but was always getting value 1.
>>>>>>
>>>>>>>(The above is only an example; my patterns are more complex and was
>>>>>>>trying to find less greedy operators)
>>>>>>
>>>>>>There's no "less greedy" operator (to be honest, I'm not sure what that
>>>>>>really means or why it'd be more useful than a different RE), but if you
>>>>>>post some sample input and expected output we can probably help get what
>>>>>>you want using the existing functionality.
>>>>>
>>>>>>Ed.
>>>>>
>>>>>match returns the index of the start of the match within the string, not
>>>>>the number of occurrences.
>>>>>For less greedy expressions you can use interval expressions like c{1}.
>>>>>If you are looking for the number of consecutive occurrences, change your
>>>>>c*?d to c* and print RLENGTH. If you are looking for occurrences that are
>>>>>non-consecutive, use a gsub in combination with an interval expression.
>>>>>
>>>>>Rajan
>>>>
>>>>Hi Rajan,
>>>>Can you please give an example of "if you are looking for occurrences
>>>>that are non-consecutive use a gsub in combination with an interval
>>>>expression."
>>>>
>>>>Thanks,
>>>>Prateek
>>>
>>>
>>>Prateek, if I want to count all occurrences of "cici", I would do it like
>>>below:
>>>echo ccccccdcicidcicicicid | gawk --re-interval '{tmp=$0; print gsub(/(ci){2}/,"anytext",tmp)}'
>>>Rajan
>>>
>>
>>I'd do:
>>
>>echo ccccccdcicidcicicicid | awk -F'cici' '{print NF ? NF-1 : 0}'
>>
>>Ed.
>>
>
>
> Cool, that is better and easier. As long as you are resetting FS and forcing
> gawk to re-split fields before doing anything else.

If I needed to use a different FS for the "main" part of the script, then I'd
instead do:

echo ccccccdcicidcicicicid | awk '{c=split($0,t,"cici"); print c ? c-1 : 0}'
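
A quick way to check the split() count against the sample string, plus the
no-match and empty-input cases, is sketched below. The count_cici wrapper is a
hypothetical helper added only for illustration; it assumes a POSIX awk and a
fixed separator with no regex metacharacters in it.

```shell
# count_cici: hypothetical helper, not something from the thread.
# Counts non-overlapping occurrences of "cici" in its first argument by
# splitting the line on "cici" and subtracting one from the piece count.
count_cici() {
    printf '%s\n' "$1" |
        awk '{ c = split($0, t, "cici"); print c ? c - 1 : 0 }'
}

count_cici ccccccdcicidcicicicid   # prints 3
count_cici ccccccd                 # prints 0 (one piece, no separator found)
count_cici ""                      # prints 0 (split returns 0 on an empty line)
```

Because split() fills its own array, FS, NF and the fields of the current
record are left alone, which is the point of using it over -F here.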

I'd really like to see the OP post some real examples though - I still don't
know what the real problem is he's trying to solve but I doubt if it's the one
we're addressing!

Ed.
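
Coming back to the original example: awk regexps match leftmost-longest, and
match() reports where the match starts (RSTART) and how long it is (RLENGTH),
not how many c's it saw. A small sketch of both points, assuming a POSIX awk;
show_match and count_leading_c are hypothetical names used only here.

```shell
# show_match: hypothetical helper. Prints RSTART and RLENGTH for /c*d/.
show_match() {
    printf '%s\n' "$1" | awk '{ match($0, /c*d/); print RSTART, RLENGTH }'
}

# count_leading_c: hypothetical helper. Follows the "change c*?d to c* and
# print RLENGTH" suggestion to get the length of the leading run of c's.
count_leading_c() {
    printf '%s\n' "$1" | awk '{ match($0, /c*/); print RLENGTH }'
}

show_match ccccccd        # prints: 1 7
count_leading_c ccccccd   # prints: 6
```

So the 1 seen in the original post is the start position of the match, and the
6 that was expected is available as RLENGTH once the trailing d is dropped
from the pattern.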

