"Janis Papanagnou" <Janis_Papanagnou@[EMAIL PROTECTED]> wrote in message
news:fte63p$nif$1@[EMAIL PROTECTED]
>
> { gsub (/[,.;:]/, " ", $0) }
> /<.*>/ { for (i = 1; i <= NF; i++)
>            if ($i~/<.*>/) { s = substr ($i, 2, length($i)-2)
>              c = 0
>              for (j = i-1; j > 0 && c != 3; j--)
>                if (length($j)>2) { s = $j FS s ; c++ }
>              c = 0
>              for (j = i+1; j <= NF && c != 3; j++)
>                if (length($j)>2) { s = s FS $j ; c++ }
>            }
>          print s
>        }
>
> Will produce...
>
> for some sea bass
> The bass part the song
> proposed elaborate program public works This
> the program required several hundred
Now that a working solution has been posted I can't help playing with
it...
/<.*>/ {
    gsub(/[,.;:]/, " ", $0)
    gsub(/[ \t]+[^ \t][^ \t]?[ \t]+/, " ", $0)
    for ( i = 1; $i !~ /<.*>/; i++ )
        ;
    t = substr($i, 2, length($i)-2 )
    j = i - 3 > 1 ? i - 3 : 1
    e = i + 3 < NF ? i + 3 : NF
    s = i != j ? $j : t
    while ( ++j <= e )
        # parenthesize: concatenation binds tighter than ?:
        s = s FS ((i != j) ? $j : t)
    print s
}
...the idea being to simplify by getting rid of all the "words" of one or
two characters right at the start (not tested). There's still a whole bunch
of assumptions, including that "<" and ">" always are the first and last
characters of the target word and that there is at most one target word per
line.
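
For what it's worth, here is a minimal sketch of the same idea that drops
the short words with a field loop instead of a gsub(). A single gsub() with
a pattern of the form space-word-space cannot remove two short words in a
row (the whitespace between them is consumed by the first match) and never
sees a short word at the very start or end of the line. The shell function
name "context", the array "w" and the sample sentence are my own inventions
for illustration, not taken from the thread's actual input, and the same
assumptions as above (one target per line, "<" and ">" wrapping the whole
word) still apply.

```shell
#!/bin/sh
# Sketch only: print up to three "long" words on each side of a <target>.
context() {
awk '
/<.*>/ {
    gsub(/[,.;:]/, " ")          # punctuation becomes a separator
    n = 0; k = 0
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^<.*>$/) {     # the target word, stored without < and >
            w[++n] = substr($i, 2, length($i) - 2)
            k = n
        } else if (length($i) > 2)
            w[++n] = $i          # keep only words longer than two characters
    }
    if (k == 0) next             # "<" and ">" were not in the same word
    lo = (k - 3 > 1) ? k - 3 : 1
    hi = (k + 3 < n) ? k + 3 : n
    s = w[lo]
    for (j = lo + 1; j <= hi; j++)
        s = s " " w[j]
    print s
}'
}

# Hypothetical sample line:
printf '%s\n' 'He went to fish for some sea <bass>.' | context
# prints: for some sea bass
```

Collecting the surviving words into an array first means the window
arithmetic works on positions in the filtered list, so no second pass over
$0 is needed.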
- Anton Treuenfels

