Re: compare values in the same field in consecutive rows--and store

by "z.entropic" <subPlanck@[EMAIL PROTECTED] > Jan 29, 2008 at 05:34 AM

On Jan 28, 12:30 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
> On 1/28/2008 11:21 AM, z.entropic wrote:
>
> > On Jan 28, 9:56 am, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>On 1/24/2008 8:56 PM, z.entropic wrote:
>
> >>>On Jan 24, 5:40 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>On 1/24/2008 4:35 PM, z.entropic wrote:
>
> >>>>>On Jan 24, 3:33 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>>>On 1/24/2008 2:25 PM, z.entropic wrote:
>
> >>>>>>>On Jan 24, 2:41 pm, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
>
> >>>>>>>>On 1/24/2008 1:19 PM, z.entropic wrote:
> >>>>>>>><snip>
>
> >>>>>>>>>Here is a fragment of my input file:
>
> >>>>>>>>>============
> >>>>>>>>>100  24479.33  14399.09  1/23/2008 19:55  6  1  0   0           3.293  1.287
> >>>>>>>>>101  24480.25  14400.01  1/23/2008 19:55  6  1  0   0           3.296  1.288
> >>>>>>>>>102  24480.36      0.11  1/23/2008 19:55  7  1  0  -0.00185954  3.167  1.287
> >>>>>>>>>=============
>
> >>>>>>>>>Thus, if field $6 in line 100 is equal to 6 AND field $6 in line 101
> >>>>>>>>>is equal to 7, store the value 14399.09 in array1[1] and 1.287 in
> >>>>>>>>>array2[1]. When the next match is found, store the two values in
> >>>>>>>>>array1[2] and array2[2], etc.  Basically, I'm comparing the same
> >>>>>>>>>fields in consecutive rows.
>
> >>>>>>>>Try this:
>
> >>>>>>>>awk '($6==7)&&(p6==6){array1[++n]=p3;array2[n]=p11} {p6=$6;p3=$3;p11=$11}' file
>
> >>>>>>>>You should use an array for the "p" (previous) field values if you need to
> >>>>>>>>access more of them.
>
> >>>>>>>>     Ed.
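
A minimal sketch of the "array for the previous field values" idea above, assuming the three whitespace-separated sample rows from the fragment and an awk with POSIX split(). The shell variable names data and result are only for illustration. Instead of one pN variable per field, the whole previous row is kept in p[]:

```sh
#!/bin/sh
# Sample rows copied from the fragment quoted above (whitespace-separated).
data='100 24479.33 14399.09 1/23/2008 19:55 6 1 0 0 3.293 1.287
101 24480.25 14400.01 1/23/2008 19:55 6 1 0 0 3.296 1.288
102 24480.36 0.11 1/23/2008 19:55 7 1 0 -0.00185954 3.167 1.287'

result=$(printf '%s\n' "$data" | awk '
    # current $6 is 7 and the previous row had 6: keep the previous $3 and $11
    ($6 == 7) && (p[6] == 6) { array1[++n] = p[3]; array2[n] = p[11] }
    # save every field of this row as the "previous" row for the next record
    { split($0, p) }
    END { for (i = 1; i <= n; i++) print i, array1[i], array2[i] }
')
printf '%s\n' "$result"
```

With these rows the match happens on row 102, so the stored values come from row 101. Any previous field is then available as p[k] without adding another variable.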
>
> >>>>>>>I think my example is a bit confusing due to poor formatting (copying
> >>>>>>>from Excel with the wrong date format didn't help...)  In essence, I
> >>>>>>>can't get the script working even after some changes etc., so let me
> >>>>>>>explain again as best as I can.  Here is an interesting section from
> >>>>>>>one of my data files, this time with proper formatting that awk would
> >>>>>>>see (tab-separated fields):
>
> >>>>>>>100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0  0          3.293399
> >>>>>>>101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0  0          3.293234
> >>>>>>>102 24480.36     0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
> >>>>>>>103 24480.46     0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836
>
> >>>>>>>Simply put, I want to find pairs of lines in which the counter in
> >>>>>>>field $7 changes, here from 6 to 7, and then store in array array1[1]
> >>>>>>>the value found in field $11 (3.293234, line 101). The next pair of
> >>>>>>>found lines would change the array counter to 2 (array[2]).
>
> >>>>>>So now we're back to one array? ok, look:
>
> >>>>>>$ awk '($7==7)&&(p7==6){array[++n]=p11} {p7=$7;p11=$11}
> >>>>>>END{for (i in array) print i, array[i]}' file
> >>>>>>1 3.293234
>
> >>>>>>>Once I figure out with your help how to do that, I'll try to expand
> >>>>>>>this script to store more values, including some from line 102 in the
> >>>>>>>example above.
>
> >>>>>>If the above still isn't what you're looking for either, maybe posting a little
> >>>>>>more sample input and some expected output would help.
>
> >>>>>>      Ed.
>
> >>>>>Your script works--in part, probably because I underspecified the
> >>>>>requirements.  I think the problem is a bit more complex; I'll provide
> >>>>>a larger example of input and output.
>
> >>>>OK, but if it's just that you want to get output every time the 7th field
> >>>>changes rather than when it specifically changes from 6 to 7, then all you'd
> >>>>need is:
>
> >>>>$ awk 'p7&&($7!=p7){array[++n]=p11} {p7=$7;p11=$11}
> >>>>END{for (i in array) print i, array[i]}' file
> >>>>1 3.293234
>
> >>>>so also see if that's what you're really looking for....
>
> >>>>>I believe this kind of a problem may be of interest to a wider group
> >>>>>of readers and awk users as it concerns data extraction and processing
> >>>>>that I, at least, often encounter.
>
> >>>>>z.e.
>
> >>>I think this is the closest so far to my goal--the $11 values printed
> >>>out are those I am after, and the lines I'm interested in always are
> >>>those where one of the fields, a loop counter of sorts, changes its
> >>>value. Now, three questions on the modification of the latest script
> >>>to expand its functionality:
>
> >>>1. how to store the value in an additional field, e.g., $10, and print
> >>>it out on the same line?  I've tried
>
> >>>awk 'p7&&($7!=p7){V[++n]=p11}{c[++m]==p10} {p7=$7;p10=$10;p11=$11} END
>
> >>ITYM c[++m]=p10 instead of c[++m]==p10.
>
> >>>{for (i in V) print i, V[i],c[i]}'
>
> >>>but obviously this expression doesn't work as intended (the for loop
> >>>is incomplete...)  Should I use two independent loops and a \n at the
> >>>end of the first statement?  The n and m indices are always the same,
> >>>but I can't use n twice as it increases in both expressions...
>
> >>If you don't want n to increase twice, just don't increment it twice:
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i]}'
>
> >>>2. how could I store and print out $11 from the next line (with an
> >>>already changed $7?)
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i]}'
>
> >>>3. I'd like to store and print the source FILENAME on each line.
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME}
> >>{p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>
> >>but you don't need to store it if it's just one input file:
>
> >>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],FILENAME}'
>
> >>>4. I'd like to skip the first 5 or 10 lines (I think I know how to do
> >>>that...)
>
> >>awk 'NR<=10{next}
> >>p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME} {p7=$7;p10=$10;p11=$11}
> >>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>
> >>>5. I assume that if I wanted more complex conditions, I could combine
> >>>them as in
>
> >>>awk '(p7&&($7!=p7))&&(p8&&($8!=p8))...'
>
> >>>but what if I'd like to use $8 on the next line, with a changed value
> >>>of $7?
>
> >>I don't know what you mean by that.
>
> >>>Hmmm... this is getting more complex than I initially expected...
>
> >>Just follow the pattern....
>
> >>        Ed.
>
> > I took your latest script, cleaned it up a bit for clarity, changed
> > some letters (to make them more meaningful for me during the debugging
> > and learning process)--and it almost works the way I would want it to
> > work!
>
> > ( NR < 8 ) && ( $7 < 6 ) { next } s7 && ( $7 != s7 ) { V[++n] = V11;
> > c[n] = c10; U[n] = $11; f[n] = FILENAME } { s7 = $7; c10 = $10;
> > V11 = $11 } END { for (i in V ) print i, f[i], c[i], V[i], U[i] }
>
> > However, I still have a few problems:
>
> > 1. the first two inequalities seem to be disregarded, and unwanted
> > data are stored in the array and then printed out.
>
> I think you probably meant "||" rather than "&&".
>
> > 2. the data are printed out in reverse order (from the highest i to
> > 1)... how come?
>
> No, they aren't. It's a random order due to the way array indexing works in awk.
> If you care about the order, use this:
>
> for (i=1;i<=n;i++)
>
> instead of
>
> for (i in V)
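
A small sketch of the ordering point above, assuming nothing beyond a standard awk. The array contents ("row1" to "row12") and the shell variable result are made up for illustration. Counting from 1 to n returns the elements in the order they were stored, whatever order the awk in use would pick for "for (i in V)":

```sh
#!/bin/sh
result=$(awk 'BEGIN {
    n = 12
    for (i = 1; i <= n; i++) V[i] = "row" i
    # for (i in V) visits the same keys, but the order is up to the awk implementation;
    # counting from 1 to n gives insertion order on any awk
    for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") V[i]
    print out
}')
printf '%s\n' "$result"
```

This only works because the indices were assigned as consecutive integers with ++n, which is the case in all the scripts in this thread.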
>
> > 3. how to impose an i>6 condition in the last 'for' printout loop (see
> > #1).
>
> for (i=7;i<=n;i++)
>
> Let's fix up the white space a bit for readability:
>
> ( NR < 8 ) || ( $7 < 6 ) { next }
> s7 && ( $7 != s7 ) { V[++n] = V11; c[n] = c10; U[n] = $11; f[n] = FILENAME }
> { s7 = $7; c10 = $10; V11 = $11 }
> END { for (i=7;i<=n;i++) print i, f[i], c[i], V[i], U[i] }
>
> and note that you don't NEED several different arrays to just print that
> information:
>
> ( NR < 8 ) || ( $7 < 6 ) { next }
> s7 && ( $7 != s7 ) { V[++n] = FILENAME OFS c10 OFS V11 OFS $11 }
> { s7 = $7; c10 = $10; V11 = $11 }
> END { for (i=7;i<=n;i++) print i, V[i] }
>
> Regards,
>
>         Ed.

Great thanks, Ed--that was a wonderful lesson--and a good starting
point for further exploits!  Works just great!

z.e.
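
For anyone who wants to try the pattern without a data file, here is a hedged sketch of the single-array version run on the four sample rows from earlier in the thread. It makes some assumptions that differ from the final script: the NR and $7 filter is left out and the loop starts at 1 because the sample is so small, and FILENAME is dropped because the rows are piped in on stdin. The shell variables data and result are only for illustration.

```sh
#!/bin/sh
# Sample rows copied from the thread; whitespace-separated, so that
# $7 is the loop counter and $11 is the last column.
data='100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0 0 3.293399
101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0 0 3.293234
102 24480.36 0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
103 24480.46 0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836'

result=$(printf '%s\n' "$data" | awk '
    # $7 differs from the previous row: record previous $10, previous $11, current $11
    s7 && ($7 != s7) { V[++n] = c10 OFS V11 OFS $11 }
    # remember this row for comparison with the next one
    { s7 = $7; c10 = $10; V11 = $11 }
    END { for (i = 1; i <= n; i++) print i, V[i] }
')
printf '%s\n' "$result"
```

The counter changes once, between rows 101 and 102, so a single line is printed. Note that the "s7 &&" guard treats a previous counter value of 0 as "no previous row", which is harmless here but worth knowing if the counter can start at 0.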
 



