On 4/8/2008 7:47 AM, Janis wrote:
> On 8 Apr., 14:16, Ed Morton <mor...@[EMAIL PROTECTED] wrote:
>
>>The posted code so far handles sentences that end in ".". What if they end in
>>"!" or "?"? Are there any other punctuation characters that have meaning? The
>>posted code also assumes that newlines have meaning wrt when to start/stop
>>counting "words". Do they really or is a period the true "end of record"
>>character?
>
>
> I'd take the pragmatic approach, add ? and ! to the character
> set if necessary, and wait for new requirements as soon as the
> OP becomes aware of any. Experience shows that it takes a lot of
> back-and-forth postings to get somewhat complete requirements,
> but mostly that isn't necessary here and can be fixed on the fly.

Yes, but I'm thinking the approach should probably be changed to use an RS
that's whatever set of characters really represents the end of a "sentence",
which would introduce a fair amount of churn in the script and may warrant a
gawk-specific solution, so it's worth poking at the requirements a bit before
going any further.

> BTW, initially I had used [[:punct:]] in the program I posted
> but for apparent reasons (< and >) that was not appropriate.

and now I'm thinking that after peeling the requirements onion a bit more we
MAY end up suggesting xmlawk or some such instead...
Ed.