On Jan 28, 12:30 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
> On 1/28/2008 11:21 AM, z.entropic wrote:
>
> > On Jan 28, 9:56 am, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>On 1/24/2008 8:56 PM, z.entropic wrote:
>
> >>>On Jan 24, 5:40 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>On 1/24/2008 4:35 PM, z.entropic wrote:
>
> >>>>>On Jan 24, 3:33 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>>>On 1/24/2008 2:25 PM, z.entropic wrote:
>
> >>>>>>>On Jan 24, 2:41 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>>>>>On 1/24/2008 1:19 PM, z.entropic wrote:
> >>>>>>>><snip>
>
> >>>>>>>>>Here is a fragment of my input file:
>
> >>>>>>>>>============
> >>>>>>>>>100    24479.33    14399.09    1/23/2008 19:55 6    1    0    0              3.293   1.287
> >>>>>>>>>101    24480.25    14400.01    1/23/2008 19:55 6    1    0    0              3.296   1.288
> >>>>>>>>>102    24480.36    0.11        1/23/2008 19:55 7    1    0    -0.00185954    3.167   1.287
> >>>>>>>>>=============
>
> >>>>>>>>>Thus, if field $6 in line 100 is equal to 6 AND field $6 in line 101
> >>>>>>>>>is equal to 7, store the value 14399.09 in array1[1] and 1.287 in
> >>>>>>>>>array2[1]. When the next match is found, store the two values in
> >>>>>>>>>array1[2] and array2[2], etc.  Basically, I'm comparing the same
> >>>>>>>>>fields in consecutive rows.
>
> >>>>>>>>Try this:
>
> >>>>>>>>awk '($6==7)&&(p6==6){array1[++n]=p3;array2[n]=p11} {p6=$6;p3=$3;p11=$11}' file
>
> >>>>>>>>You should use an array for the "p" (previous) field values if you need to
> >>>>>>>>access more of them.
>
> >>>>>>>>     Ed.
>
> >>>>>>>I think my example is a bit confusing due to poor formatting (copying
> >>>>>>>from Excel with the wrong date format didn't help...)  In essence, I
> >>>>>>>can't get the script working even after some changes etc., so let me
> >>>>>>>explain again as best as I can.  Here is an interesting section from
> >>>>>>>one of my data files, this time with proper formatting that awk would
> >>>>>>>see (tab-separated fields):
>
> >>>>>>>100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0  0          3.293399
> >>>>>>>101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0  0          3.293234
> >>>>>>>102 24480.36     0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
> >>>>>>>103 24480.46     0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836
>
> >>>>>>>Simply put, I want to find pairs of lines in which the counter in
> >>>>>>>field $7 changes, here from 6 to 7, and then store in array array1[1]
> >>>>>>>the value found in field $11 (3.293234, line 101). The next pair of
> >>>>>>>found lines would change the array counter to 2 (array[2]).
>
> >>>>>>So now we're back to one array? ok, look:
>
> >>>>>>$ awk '($7==7)&&(p7==6){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array)
> >>>>>>print i, array[i]}' file
> >>>>>>1 3.293234
>
> >>>>>>>Once I figure out with your help how to do that, I'll try to expand
> >>>>>>>this script to store more values, including some from line 102 in the
> >>>>>>>example above.
>
> >>>>>>If the above still isn't what you're looking for either, maybe posting a little
> >>>>>>more sample input and some expected output would help.
>
> >>>>>>      Ed.
>
> >>>>>Your script works--in part, probably because I underspecified the
> >>>>>requirements.  I think the problem is a bit more complex; I'll provide
> >>>>>a larger example of input and output.
>
> >>>>OK, but if it's just that you want to get output every time the 7th field
> >>>>changes rather than when it specifically changes from 6 to 7, then all you'd
> >>>>need is:
>
> >>>>$ awk 'p7&&($7!=p7){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array) print i, array[i]}' file
> >>>>1 3.293234
>
> >>>>so also see if that's what you're really looking for....
>
> >>>>>I believe this kind of a problem may be of interest to a wider group
> >>>>>of readers and awk users as it concerns data extraction and processing
> >>>>>that I, at least, often encounter.
>
> >>>>>z.e.
>
> >>>I think this is the closest so far to my goal--the $11 values printed
> >>>out are those I am after, and the lines I'm interested in always are
> >>>those where one of the fields, a loop counter of sorts, changes a
> >>>value. Now, three questions on the modification of the latest script
> >>>to expand its functionality:
>
> >>>1. how to store the value in an additional field, e.g., $10, and print
> >>>it out on the same line?  I've tried
>
> >>>awk 'p7&&($7!=p7){V[++n]=p11}{c[++m]==p10} {p7=$7;p10=$10;p11=$11} END
>
> >>ITYM c[++m]=p10 instead of c[++m]==p10.
>
> >>>{for (i in V) print i, V[i],c[i]}'
>
> >>>but obviously this expression doesn't work as intended (the for loop
> >>>is incomplete...)  Should I use two independent loops and a \n at the
> >>>end of the first statement?  The n and m indices are always the same,
> >>>but I can't use n twice as it increases in both expressions...
>
> >>If you don't want n to increase twice, just don't increment it twice:
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i]}'
>
> >>>2. how could I store and print out $11 from the next line (with an
> >>>already changed $7?)
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i]}'
>
> >>>3. I'd like to store and print the source FILENAME on each line.
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME}
> >>{p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>
> >>but you don't need to store it if it's just one input file:
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],FILENAME}'
>
> >>>4. I'd like to skip the first 5 or 10 lines (I think I know how to do
> >>>that...)
>
> >>awk 'NR<=10{next}
> >>p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>
> >>>5. I assume that if I wanted more complex conditions, I could combine
> >>>them as in
>
> >>>awk '(p7&&($7!=p7))&&(p8&&($8!=p8))...'
>
> >>>but what if I'd like to use $8 on the next line, with a changed value
> >>>of $7?
>
> >>I don't know what you mean by that.
>
> >>>Hmmm... this is getting more complex than I initially expected...
>
> >>Just follow the pattern....
>
> >>        Ed.
>
> > I took your latest script, cleaned it up a bit for clarity, changed
> > some letters (to make them more meaningful for me during the debugging
> > and learning process)--and it almost works the way I would want it to
> > work!
>
> > ( NR < 8 ) && ( $7 < 6 ) { next } s7 && ( $7 != s7 ) { V[++n] = V11;
> > c[n] = c10; U[n] = $11; f[n] = FILENAME } { s7 = $7; c10 = $10; V11 =
> > $11 } END { for (i in V ) print i, f[i], c[i], V[i], U[i] }
>
> > However, I still have a few problems:
>
> > 1. the first two inequalities seem to be disregarded, and unwanted
> > data are stored in the array and then printed out.
>
> I think you probably meant "||" rather than "&&".
>
> > 2. the data are printed out in reverse order (from the highest i to
> > 1)... how come?
>
> No, they aren't. It's a random order due to the way array indexing works in awk.
> If you care about the order, use this:
>
> for (i=1;i<=n;i++)
>
> instead of
>
> for (i in V)
>
> > 3. how to impose an i>6 condition in the last 'for' printout loop (see
> > #1).
>
> for (i=7;i<=n;i++)
>
> Let's fix up the white space a bit for readability:
>
> ( NR < 8 ) || ( $7 < 6 ) { next }
> s7 && ( $7 != s7 ) { V[++n] = V11; c[n] = c10; U[n] = $11; f[n] = FILENAME }
> { s7 = $7; c10 = $10; V11 = $11 }
> END { for (i=7;i<=n;i++) print i, f[i], c[i], V[i], U[i] }
>
> and note that you don't NEED several different arrays to just print that
> information:
>
> ( NR < 8 ) || ( $7 < 6 ) { next }
> s7 && ( $7 != s7 ) { V[++n] = FILENAME OFS c10 OFS V11 OFS $11 }
> { s7 = $7; c10 = $10; V11 = $11 }
> END { for (i=7;i<=n;i++) print i, V[i] }
>
> Regards,
>
>         Ed.
Great thanks, Ed--that was a wonderful lesson--and a good starting
point for further exploits! Works just great!
z.e.
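
For anyone reading this thread later, here is a minimal sketch of the pattern discussed above: remember fields from the previous row, record a result whenever $7 changes, and print with an indexed loop so the output keeps insertion order. The sample rows are the four whitespace-separated lines quoted earlier in the thread; the names `transitions`, `prev7`, `prev10`, `prev11` and `rec` are made up for this illustration and do not appear in Ed's scripts. It is only a sketch under those assumptions, not a replacement for the scripts above.

```shell
# Four rows copied from the data fragment quoted in the thread.
sample='100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0 0 3.293399
101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0 0 3.293234
102 24480.36 0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
103 24480.46 0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836'

# transitions: hypothetical helper name; reads rows on stdin.
transitions() {
  awk '
    # $7 differs from the previous row: keep previous $10, previous $11, current $11
    prev7 != "" && $7 != prev7 { rec[++n] = prev10 OFS prev11 OFS $11 }
    # remember this row for the next comparison
    { prev7 = $7; prev10 = $10; prev11 = $11 }
    # indexed loop prints in insertion order; "for (i in rec)" order is unspecified
    END { for (i = 1; i <= n; i++) print i, rec[i] }
  '
}

result=$(printf '%s\n' "$sample" | transitions)
echo "$result"
```

The one deliberate difference from the scripts in the thread is the `prev7 != ""` test. The scripts above use `p7 &&`, which also skips a previous value of 0; comparing against the empty string avoids that, if a zero counter can occur in the data.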

