ric wrote:
> Hi People:
>
> Could anybody tell me how I can do this with awk or the bash shell?
> I have a file with this syntax:
>
> #cat file
> I went fishing for some sea <bass>.
> The <bass> part of the song is very moving.
>
> he proposed an elaborate <program> of public works.This information
> was taken from the magazine.
> the <program> required several hundred lines of code.
>
> I need a script that gets the 3 previous words and the 3 words after
> the tag (<>) from a text file, EXCEPT words of 2 characters or
> fewer (for example: of, a, an).
>
> Returning something like this:
>
> #----------------------------------
> for some sea bass
> The bass part the song
> proprosed elaborate program public works
> the program required several hundred
Your data format makes a solution somewhat ugly (note the missing space
after the sentence end in " works.This "). The following solution does
not stop collecting words at a '.', as your expected output suggests it
should; if that is a requirement, the program (especially the
conditions of the inner for loops) must be refined.
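# replace punctuation by blanks; this also re-splits the fields
{ gsub (/[,.;:]/, " ", $0) }

/<.*>/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /<.*>/) {
            # the tag name without the enclosing < and >
            s = substr ($i, 2, length($i)-2)
            # prepend up to 3 preceding words longer than 2 chars
            c = 0
            for (j = i-1; j > 0 && c != 3; j--)
                if (length($j) > 2) { s = $j FS s ; c++ }
            # append up to 3 following words longer than 2 chars
            c = 0
            for (j = i+1; j <= NF && c != 3; j++)
                if (length($j) > 2) { s = s FS $j ; c++ }
        }
    print s
}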
This will produce...
for some sea bass
The bass part the song
proposed elaborate program public works This
the program required several hundred
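
If stopping at the end of a sentence is required, one possible
refinement is sketched below. It is only a sketch under some
assumptions: instead of deleting the '.', it is turned into a field of
its own, and both inner loops stop when they reach that field. Treating
'!' and '?' like '.' is my assumption, and the shell function name
context_words is made up for the example. The print has also been moved
inside the if, so that a line with several tags yields one output line
per tag.

```shell
# context_words is a hypothetical helper name, not part of the original post.
context_words() {
  awk '
    # Drop clause punctuation, but keep sentence ends as a standalone "." field.
    # Treating "!" and "?" like "." is an assumption.
    { gsub(/[,;:]/, " "); gsub(/[.!?]/, " . ") }

    /<.*>/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /<.*>/) {
          s = substr($i, 2, length($i) - 2)   # tag name without < and >
          c = 0
          # up to 3 long words before the tag, stopping at a sentence end
          for (j = i - 1; j > 0 && c != 3 && $j != "."; j--)
            if (length($j) > 2) { s = $j FS s; c++ }
          c = 0
          # up to 3 long words after the tag, stopping at a sentence end
          for (j = i + 1; j <= NF && c != 3 && $j != "."; j++)
            if (length($j) > 2) { s = s FS $j; c++ }
          print s                             # one line per tag found
        }
    }
  ' "$@"
}

# Usage with the sample data from the question
context_words <<'EOF'
I went fishing for some sea <bass>.
The <bass> part of the song is very moving.

he proposed an elaborate <program> of public works.This information
was taken from the magazine.
the <program> required several hundred lines of code.
EOF
```

With the sample data, the third output line should now end at "works"
rather than running on into "This"; the other three lines are
unchanged.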
Janis
>
>
> Thanks!
> -ric

