On 4/8/2008 4:39 PM, ric wrote:
> On 8 Apr, 09:03, ric <ricar...@[EMAIL PROTECTED]> wrote:
>
>>On 8 Apr, 06:58, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>>
>>>On 4/8/2008 7:47 AM, Janis wrote:
>>
>>>>On 8 Apr., 14:16, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>>>
>>>>>The posted code so far handles sentences that end in ".". What if they
>>>>>end in "!" or "?"? Are there any other punctuation characters that have
>>>>>meaning? The posted code also assumes that newlines have meaning wrt when
>>>>>to start/stop counting "words". Do they really or is a period the true
>>>>>"end of record" character?
>>>>
>>>>I'd take the pragmatic approach, add ? and ! to the character
>>>>set if necessary, and wait for new requirements as soon as the
>>>>OP becomes aware of any. Experience shows that it takes a lot of
>>>>back-and-forth postings to get somewhat complete requirements,
>>>>but mostly that isn't necessary here and can be fixed on the fly.
>>>
>>>Yes, but I'm thinking the approach should probably be changed to use an RS
>>>that's whatever set of characters really represent the end of a "sentence",
>>>which would introduce a fair amount of churn in the script and may warrant a
>>>gawk-specific solution, so it's worth poking at the requirements a bit before
>>>going any further.
>>
>>>>BTW, initially I had used [[:punct:]] in the program I posted
>>>>but for apparent reasons (< and >) that was not appropriate.
>>>
>>>and now I'm thinking that after peeling the requirements onion a bit more we
>>>MAY end up suggesting xmlawk or some such instead...
>>
>>> Ed.
>>
>>Good morning Janis and Ed,
>>
>>I think you are right. I would prefer to change the file format to an
>>XML-tagged style to avoid all these little problems; XML looks like the
>>best fit for this job.
>>
>>#cat newfile
>><instance id="bass.v.bnc.001" docsrc="BNC">
>><context>
>>I went fishing for some sea <head>bass</head> .
>></context>
>></instance>
>>
>><instance id="bass.v.bnc.002" docsrc="BNC">
>><context>    <--- it's OK, it can contain a period
>>The <head>bass</head> part of the song is very moving.
>></context>
>></instance>
>>
>><instance id="program.v.bnc.001" docsrc="BNC">
>><context>    <--- can finish without a period too
>>he proposed an elaborate <head>program</head> of public works . This
>>information was taken
>></context>
>></instance>
>>
>><instance id="program.v.bnc.002" docsrc="BNC">
>><context>
>>the <head>program</head> required several hundred lines of code .
>></context>
>></instance>
>>
>><instance id="smell.v.bnc.001" docsrc="BNC">
>><context>    <--- in a single line, and ends with "?"
>>It 's making me annoyed .I did n't want to stay there and I did n't
>>want to go to Combe Court , cos I hate it and it <head>smells</head>
>>and the Captain slobbers in his food and Christmas is horrible with no
>>good prezzies and Annie not there . Why did n't you visit me ? Why
>>not ?
>></context>
>></instance>
>>
>>Returning exactly this:
>>#----------------------------------
>>for some sea bass
>>The bass part the song
>>proposed elaborate program public works
>>the program required several hundred
>>cos hate and smells and the Captain
>>
>>best regards from central america! ;)
>
> Just a little progress:
>
> #cat solution.awk
>
> /context/{flag=1} /\/context/{flag=0} !/context/{
>     if (flag==1)
>         gsub (/[,;:]/, " ", $0) ;
>     gsub (/[.]/, " . ", $0) }
> /<.*>/ { for (i = 1; i <= NF; i++)
>     if ($i~/<.*>/) { s = substr ($i, 2, length($i)-2)
>         c = 0
>         for (j = i-1; j > 0 && c != 3 && $j != "." ; j--)
>             if (length($j)>2) { s = $j FS s ; c++ }
>         c = 0
>         for (j = i+1; j <= NF && c != 3 && $j != "." ; j++)
>             if (length($j)>2) { s = s FS $j ; c++ }
>     }
>     print s
> }
>
>
> But I'm getting some extra text:
>
> #cat m
> context
> for some sea bass
> /context
> /instance
> /instance
> context
> The bass part the song
> /context
> /instance
> /instance
> context
> proposed elaborate program public works
> /context
> /instance
> /instance
> context
> the program required several hundred
> /context
> /instance
> /instance
> context
> cos hate and smells and the Captain
> /context
> /instance
> #
>
>
> regards,
>
> -ric
Take a look at xmlawk: http://home.vrweb.de/~juergen.kahrs/gawk/XML/
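
As for the extra lines in your output: the /<.*>/ rule fires on every line
that carries a tag, so <context>, </context> and </instance> each print their
own name, and on the <instance ...> line no single field matches, so the stale
value of s gets printed a second time. If you'd rather stay with plain awk for
now, here is a minimal sketch, assuming the newfile layout you posted (the
target word wrapped in <head>...</head> inside a <context>...</context> block)
and treating ".", "?" and "!" all as sentence enders. The names extract_context
and emit are made up for illustration, and this is line-oriented tag matching,
not a real XML parser:

```shell
# Sketch only: extract_context is a hypothetical helper, not part of the posted script.
extract_context() {
    awk '
    # Print the head word with up to three neighbours (longer than 2 chars)
    # on each side, never crossing a sentence boundary.
    function emit(text,    n, w, i, j, c, s) {
        gsub(/[,;:]/, " ", text)       # drop minor punctuation
        gsub(/[.?!]/, " . ", text)     # assumption: . ? ! all end a sentence
        n = split(text, w, " ")
        for (i = 1; i <= n; i++) {
            if (w[i] !~ /^<head>.*<\/head>$/) continue
            s = w[i]
            gsub(/<\/?head>/, "", s)   # strip the head tags from the word
            c = 0
            for (j = i - 1; j > 0 && c < 3 && w[j] != "."; j--)
                if (length(w[j]) > 2) { s = w[j] " " s; c++ }
            c = 0
            for (j = i + 1; j <= n && c < 3 && w[j] != "."; j++)
                if (length(w[j]) > 2) { s = s " " w[j]; c++ }
            print s
        }
    }
    /<context>/   { inctx = 1; text = ""; next }               # start collecting
    /<\/context>/ { if (inctx) emit(text); inctx = 0; next }   # process the block
    inctx         { text = text " " $0 }                       # join context lines
    ' "$@"
}

# Example run on the first instance from newfile.
extract_context <<'EOF'
<instance id="bass.v.bnc.001" docsrc="BNC">
<context>
I went fishing for some sea <head>bass</head> .
</context>
</instance>
EOF
```

Because only lines between <context> and </context> are collected, and only
the <head> word triggers a print, the tag names no longer show up in the
output, and a context that wraps over several lines is handled as one unit.
On the sample above this should print "for some sea bass".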
Ed.

