Re: parsing a text file

by ric <ricardo7@[EMAIL PROTECTED] > Apr 8, 2008 at 02:39 PM

On Apr 8, 09:03, ric <ricar...@[EMAIL PROTECTED]> wrote:
> On Apr 8, 06:58, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> > On 4/8/2008 7:47 AM, Janis wrote:
>
> > > On Apr 8, 14:16, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> > >>The posted code so far handles sentences that end in ".". What if they end in
> > >>"!" or "?"? Are there any other punctuation characters that have meaning? The
> > >>posted code also assumes that newlines have meaning wrt when to start/stop
> > >>counting "words". Do they really or is a period the true "end of record" character?
>
> > > I'd take the pragmatic approach, add ? and ! to the character
> > > set if necessary, and wait for new requirements as soon as the
> > > OP gets aware of any. Experience shows that it needs a lot of
> > > forth-and-back postings to get somewhat complete requirements,
> > > but mostly that isn't necessary here and can be fixed on the fly.
>
> > Yes, but I'm thinking the approach should probably be changed to use an RS
> > that's whatever set of characters really represent the end of a "sentence" which
> > would introduce a fair amount of churn in the script and may warrant a
> > gawk-specific solution so it's worth poking at the requirements a bit before
> > going any further.
>
> > > BTW, initially I had used [[:punct:]] in the program I posted
> > > but for apparent reasons (< and >) that was not appropriate.
>
> > and now I'm thinking that after peeling the requirements onion a bit more we MAY
> > end up suggesting xmlawk or some such instead...
>
> >         Ed.
>
> Good morning, Janis and Ed,
>
> I think you are right. I would prefer to change the file format to an
> XML-tagged style to avoid all these little problems; XML looks like the
> best fit for this job.
>
> #cat newfile
> <instance id="bass.v.bnc.001" docsrc="BNC">
> <context>
> I went fishing for some sea <head>bass</head> .
> </context>
> </instance>
>
> <instance id="bass.v.bnc.002" docsrc="BNC">
> <context>          <--- it's OK, it can contain a period
> The <head>bass</head> part of the song is very moving.
> </context>
> </instance>
>
> <instance id="program.v.bnc.001" docsrc="BNC">
> <context>          <--- it can finish without a period too
> he proposed an elaborate <head>program</head> of public works . This
> information was taken
> </context>
> </instance>
>
> <instance id="program.v.bnc.002" docsrc="BNC">
> <context>
> the <head>program</head> required several hundred lines of code .
> </context>
> </instance>
>
> <instance id="smell.v.bnc.001" docsrc="BNC">
> <context>          <--- in a single line, and ends with "?"
> It 's  making me annoyed .I did n't want to stay there and I did n't
> want to go to Combe Court , cos I hate it and it <head>smells</head>
> and the Captain slobbers in his food and Christmas is horrible with no
> good prezzies and Annie not there . Why did n't you visit me ?  Why
> not ?
> </context>
> </instance>
>
> Returning exactly this:
> #----------------------------------
> for some sea bass
> The bass part the song
> proposed elaborate program public works
> the program required several hundred
> cos hate and smells and the Captain
>
> Best regards from Central America! ;)



Just a little progress:

#cat solution.awk

/context/   { flag = 1 }
/\/context/ { flag = 0 }
!/context/ {
  if (flag == 1)
    gsub (/[,;:]/, " ", $0) ;
  gsub (/[.]/, " . ", $0) }
/<.*>/ { for (i = 1; i <= NF; i++)
           if ($i~/<.*>/) { s = substr ($i, 2, length($i)-2)
             c = 0
             for (j = i-1; j > 0 && c != 3 && $j != "." ; j--)
               if (length($j)>2) { s = $j FS s ; c++ }
             c = 0
             for (j = i+1; j <= NF && c != 3 && $j != "." ; j++)
               if (length($j)>2) { s = s FS $j ; c++ }
           }
         print s
}


But I'm getting some extra text:

#cat m
context
for some sea bass
/context
/instance
/instance
context
The bass part the song
/context
/instance
/instance
context
proposed elaborate program public works
/context
/instance
/instance
context
the program required several hundred
/context
/instance
/instance
context
cos hate and smells and the Captain
/context
/instance
#


regards,

  -ric
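
A possible reading of the extra lines, offered as a guess rather than a diagnosis: the /<.*>/ rule matches every tag line, not only the line holding the marked word, and "print s" runs once for each of them. A line consisting of <context> or </context> alone is a single field that matches, so its name is printed; an <instance ...> line matches as a whole but none of its fields does, so the previous value of s is printed again. Below is a minimal sketch of one way around this, assuming the <context>/<head> layout from the quoted sample file. The wrapper name extract_context, the choice to join wrapped context lines, and treating "?" and "!" as sentence ends are assumptions of this sketch, not something settled in the thread.

```sh
# Hypothetical wrapper: reads the XML-style file on stdin and prints one
# line per <head>-tagged word.
extract_context() {
    awk '
    /<context>/   { inctx = 1; buf = ""; next }     # start collecting a context
    /<\/context>/ { inctx = 0; report(buf); next }  # context complete, report it
    inctx         { buf = buf " " $0 }              # join wrapped context lines

    # Print up to 3 words longer than 2 characters on each side of the
    # tagged word, without crossing a sentence-ending "." token.
    function report(text,    w, n, i, j, c, s) {
        gsub(/[,;:]/, " ", text)     # drop minor punctuation
        gsub(/[.?!]/, " . ", text)   # assumption: ? and ! also end a sentence
        n = split(text, w, " ")
        for (i = 1; i <= n; i++) {
            if (w[i] !~ /^<head>.*<\/head>$/) continue
            s = w[i]
            gsub(/<\/?head>/, "", s) # strip the tags, keep the word
            c = 0
            for (j = i - 1; j > 0 && c < 3 && w[j] != "."; j--)
                if (length(w[j]) > 2) { s = w[j] " " s; c++ }
            c = 0
            for (j = i + 1; j <= n && c < 3 && w[j] != "."; j++)
                if (length(w[j]) > 2) { s = s " " w[j]; c++ }
            print s
        }
    }
    '
}

# Usage with the first instance from the sample file:
extract_context <<'EOF'
<instance id="bass.v.bnc.001" docsrc="BNC">
<context>
I went fishing for some sea <head>bass</head> .
</context>
</instance>
EOF
# prints: for some sea bass
```

Because nothing is printed outside the report function, the <instance> and <context> lines themselves never reach the output. Collecting the whole context before splitting also means a sentence wrapped over several lines is handled as one unit.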




