On 8 Apr, 09:03, ric <ricar...@[EMAIL PROTECTED]> wrote:
> On 8 Apr, 06:58, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> > On 4/8/2008 7:47 AM, Janis wrote:
>
> > > On 8 Apr., 14:16, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> > >>The posted code so far handles sentences that end in ".". What if they end in
> > >>"!" or "?"? Are there any other punctuation characters that have meaning? The
> > >>posted code also assumes that newlines have meaning wrt when to start/stop
> > >>counting "words". Do they really or is a period the true "end of record" character?
>
> > > I'd take the pragmatic approach, add ? and ! to the character
> > > set if necessary, and wait for new requirements as soon as the
> > > OP gets aware of any. Experience shows that it needs a lot of
> > > forth-and-back postings to get somewhat complete requirements,
> > > but mostly that isn't necessary here and can be fixed on the fly.
>
> > Yes, but I'm thinking the approach should probably be changed to use an RS
> > that's whatever set of characters really represent the end of a "sentence" which
> > would introduce a fair amount of churn in the script and may warrant a
> > gawk-specific solution so it's worth poking at the requirements a bit before
> > going any further.
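
To make the RS idea above concrete, here is a rough sketch rather than a tested solution for the real data. It assumes that ".", "?" and "!" are the only sentence terminators, and it relies on a regular-expression RS, which is an extension (gawk and mawk support it; POSIX awk only promises to honour the first character of RS). The name split_sentences is made up for the illustration.

```sh
# split_sentences: hypothetical helper, prints one "sentence" per line.
split_sentences() {
    awk 'BEGIN { RS = "[.?!]" }        # any of . ? ! ends a record (extension)
         NF    { $1 = $1; print }      # skip empty records, squeeze whitespace
        ' "$@"
}

printf 'I hate it . Why not ? It smells !\n' | split_sentences
# prints:
#   I hate it
#   Why not
#   It smells
```

With records cut this way, newlines inside a sentence become ordinary field separators, so the word-counting loops no longer depend on how the text was wrapped.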
>
> > > BTW, initially I had used [[:punct:]] in the program I posted
> > > but for apparent reasons (< and >) that was not appropriate.
>
> > and now I'm thinking that after peeling the requirements onion a bit more we MAY
> > end up suggesting xmlawk or some such instead...
>
> >         Ed.
>
> Good morning Janis and Ed
>
> I think you are right. I'd prefer to change the file format to an
> XML-tagged style to avoid all these little problems; XML looks like
> the best fit for this job.
>
> #cat newfile
> <instance id="bass.v.bnc.001" docsrc="BNC">
> <context>
> I went fishing for some sea <head>bass</head> .
> </context>
> </instance>
>
> <instance id="bass.v.bnc.002" docsrc="BNC">
> <context>                <--- it's OK, can contain a period
> The <head>bass</head> part of the song is very moving.
> </context>
> </instance>
>
> <instance id="program.v.bnc.001" docsrc="BNC">
> <context>                <--- can finish without a period too
> he proposed an elaborate <head>program</head> of public works . This
> information was taken
> </context>
> </instance>
>
> <instance id="program.v.bnc.002" docsrc="BNC">
> <context>
> the <head>program</head> required several hundred lines of code .
> </context>
> </instance>
>
> <instance id="smell.v.bnc.001" docsrc="BNC">
> <context>                <--- in a single line, and ends with "?"
> It 's making me annoyed .I did n't want to stay there and I did n't
> want to go to Combe Court , cos I hate it and it <head>smells</head>
> and the Captain slobbers in his food and Christmas is horrible with no
> good prezzies and Annie not there . Why did n't you visit me ? Why
> not ?
> </context>
> </instance>
>
> Returning exactly this:
> #----------------------------------
> for some sea bass
> The bass part the song
> proposed elaborate program public works
> the program required several hundred
> cos hate and smells and the Captain
>
> best regards from Central America! ;)
Just a little progress:
#cat solution.awk
/context/   { flag = 1 }
/\/context/ { flag = 0 }
!/context/ {
    if (flag == 1)
        gsub(/[,;:]/, " ", $0)
    gsub(/[.]/, " . ", $0)
}
/<.*>/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /<.*>/) {
            s = substr($i, 2, length($i) - 2)
            c = 0
            for (j = i - 1; j > 0 && c != 3 && $j != "."; j--)
                if (length($j) > 2) { s = $j FS s; c++ }
            c = 0
            for (j = i + 1; j <= NF && c != 3 && $j != "."; j++)
                if (length($j) > 2) { s = s FS $j; c++ }
        }
    print s
}
But I'm getting some extra text:
#cat m
context
for some sea bass
/context
/instance
/instance
context
The bass part the song
/context
/instance
/instance
context
proposed elaborate program public works
/context
/instance
/instance
context
the program required several hundred
/context
/instance
/instance
context
cos hate and smells and the Captain
/context
/instance
#
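
The extra lines probably come from the /<.*>/ rule: it also fires on the tag lines themselves, so a lone <context> field is cut down to "context" by substr(), and an <instance ...> line, whose individual fields never match /<.*>/, just reprints whatever was left in s. Below is a minimal sketch of one way around that, not a definitive fix. It assumes <context> and </context> always sit on their own lines as in newfile, treats "?" and "!" like "." as suggested earlier in the thread, and uses a made-up wrapper name, extract_context. It buffers the text between the two tags and only then looks for the <head> word.

```sh
# extract_context: hypothetical helper, prints one line per <head> word.
extract_context() {
    awk '
    # Buffer everything between <context> and </context>; other lines are ignored.
    /<context>/   { inside = 1; buf = ""; next }
    /<\/context>/ { inside = 0; emit(buf); next }
    inside        { buf = buf " " $0 }

    # Print up to 3 words longer than 2 characters on each side of the
    # <head> word, never crossing a sentence terminator.
    function emit(text,    n, w, i, j, c, s) {
        gsub(/[,;:]/, " ", text)        # drop minor punctuation
        gsub(/[.?!]/, " . ", text)      # every terminator becomes a lone "."
        n = split(text, w, " ")
        for (i = 1; i <= n; i++) {
            if (w[i] !~ /^<head>.*<\/head>$/) continue
            s = w[i]
            gsub(/<\/?head>/, "", s)    # strip the surrounding tags
            c = 0
            for (j = i - 1; j > 0 && c < 3 && w[j] != "."; j--)
                if (length(w[j]) > 2) { s = w[j] " " s; c++ }
            c = 0
            for (j = i + 1; j <= n && c < 3 && w[j] != "."; j++)
                if (length(w[j]) > 2) { s = s " " w[j]; c++ }
            print s
        }
    }
    ' "$@"
}

extract_context <<'EOF'
<instance id="bass.v.bnc.001" docsrc="BNC">
<context>
I went fishing for some sea <head>bass</head> .
</context>
</instance>
EOF
# prints: for some sea bass
```

Because the whole context is joined before splitting, it should not matter whether the text arrives on one line or wrapped over several; anything written on the <context> line itself is dropped.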
regards,
-ric

