On 1/28/2008 11:21 AM, z.entropic wrote:
> On Jan 28, 9:56 am, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
>
>>On 1/24/2008 8:56 PM, z.entropic wrote:
>>
>>
>>
>>
>>>On Jan 24, 5:40 pm, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
>>
>>>>On 1/24/2008 4:35 PM, z.entropic wrote:
>>>
>>>>>On Jan 24, 3:33 pm, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
>>>>
>>>>>>On 1/24/2008 2:25 PM, z.entropic wrote:
>>>>>
>>>>>>>On Jan 24, 2:41 pm, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
>>>>>>
>>>>>>>>On 1/24/2008 1:19 PM, z.entropic wrote:
>>>>>>>><snip>
>>>>>>>
>>>>>>>>>Here is a fragment of my input file:
>>>>>>>>
>>>>>>>>>============
>>>>>>>>>100 24479.33 14399.09 1/23/2008 19:55 6 1 0 0 3.293 1.287
>>>>>>>>>101 24480.25 14400.01 1/23/2008 19:55 6 1 0 0 3.296 1.288
>>>>>>>>>102 24480.36 0.11 1/23/2008 19:55 7 1 0 -0.00185954 3.167 1.287
>>>>>>>>>=============
>>>>>>>>
>>>>>>>>>Thus, if field $6 in line 100 is equal to 6 AND field $6 in line 101
>>>>>>>>>is equal to 7, store the value 14399.09 in array1[1] and 1.287 in
>>>>>>>>>array2[1]. When the next match is found, store the two values in
>>>>>>>>>array1[2] and array2[2], etc. Basically, I'm comparing the same
>>>>>>>>>fields in consecutive rows.
>>>>>>>>
>>>>>>>>Try this:
>>>>>>>
>>>>>>>>awk '($6==7)&&(p6==6){array1[++n]=p3;array2[n]=p11} {p6=$6;p3=$3;p11=$11}' file
>>>>>>>
>>>>>>>>You should use an array for the "p" (previous) field values if you need to
>>>>>>>>access more of them.
>>>>>>>
>>>>>>>> Ed.
>>>>>>>
>>>>>>>I think my example is a bit confusing due to poor formatting (copying
>>>>>>>from Excel with the wrong date format didn't help...) In essence, I
>>>>>>>can't get the script working even after some changes etc., so let me
>>>>>>>explain again as best as I can. Here is an interesting section from
>>>>>>>one of my data files, this time with proper formatting that awk would
>>>>>>>see (tab-separated fields):
>>>>>>
>>>>>>>100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0 0 3.293399
>>>>>>>101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0 0 3.293234
>>>>>>>102 24480.36 0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
>>>>>>>103 24480.46 0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836
>>>>>>
>>>>>>>Simply put, I want to find pairs of lines in which the counter in
>>>>>>>field $7 changes, here from 6 to 7, and then store in array array1[1]
>>>>>>>the value found in field $11 (3.293234, line 101). The next pair of
>>>>>>>found lines would change the array counter to 2 (array[2]).
>>>>>>
>>>>>>So now we're back to one array? ok, look:
>>>>>
>>>>>>$ awk '($7==7)&&(p7==6){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array) print i, array[i]}' file
>>>>>>1 3.293234
>>>>>
>>>>>>>Once I figure out with your help how to do that, I'll try to expand
>>>>>>>this script to store more values, including some from line 102 in the
>>>>>>>example above.
>>>>>>
>>>>>>If the above still isn't what you're looking for either, maybe posting a little
>>>>>>more sample input and some expected output would help.
>>>>>
>>>>>> Ed.
>>>>>
>>>>>Your script works--in part, probably because I underspecified the
>>>>>requirements. I think the problem is a bit more complex; I'll provide
>>>>>a larger example of input and output.
>>>>
>>>>OK, but if it's just that you want to get output every time the 7th field
>>>>changes rather than when it specifically changes from 6 to 7, then all you'd
>>>>need is:
>>>
>>>>$ awk 'p7&&($7!=p7){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array) print i, array[i]}' file
>>>>1 3.293234
>>>
>>>>so also see if that's what you're really looking for....
>>>
>>>>>I believe this kind of a problem may be of interest to a wider group
>>>>>of readers and awk users as it concerns data extraction and processing
>>>>>that I, at least, often encounter.
>>>>
>>>>>z.e.
>>>
>>>I think this is the closest so far to my goal--the $11 values printed
>>>out are those I am after, and the lines I'm interested in always are
>>>those where one of the fields, a loop counter of sorts, changes a
>>>value. Now, three questions on the modification of the latest script
>>>to expand its functionality:
>>
>>>1. how to store the value in an additional field, e.g., $10, and print
>>>it out on the same line? I've tried
>>
>>>awk 'p7&&($7!=p7){V[++n]=p11}{c[++m]==p10} {p7=$7;p10=$10;p11=$11} END
>>
>>ITYM c[++m]=p10 instead of c[++m]==p10.
>>
>>
>>>{for (i in V) print i, V[i],c[i]}'
>>
>>>but obviously this expression doesn't work as intended (the for loop
>>>is incomplete...) Should I use two independent loops and a \n at the
>>>end of the first statement? The n and m indices are always the same,
>>>but I can't use n twice as it increases in both expressions...
>>
>>If you don't want n to increase twice, just don't increment it twice:
>>
>>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10} {p7=$7;p10=$10;p11=$11}
>>END{for (i in V) print i, V[i],c[i]}'
>>
>>
>>
>>
>>>2. how could I store and print out $11 from the next line (with an
>>>already changed $7?)
>>
>>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
>>END{for (i in V) print i, V[i],c[i],d[i]}'
>>
>>
>>>3. I'd like to store and print the source FILENAME on each line.
>>
>>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME}
>>{p7=$7;p10=$10;p11=$11}
>>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>>
>>but you don't need to store it if it's just one input file:
>>
>>awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
>>END{for (i in V) print i, V[i],c[i],d[i],FILENAME}'
>>
>>
>>>4. I'd like to skip the first 5 or 10 lines (I think I know how to do
>>>that...)
>>
>>awk 'NR<=10{next}
>>p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME}
>>{p7=$7;p10=$10;p11=$11}
>>END{for (i in V) print i, V[i],c[i],d[i],e[i]}'
>>
>>
>>>5. I assume that if I wanted more complex conditions, I could combine
>>>them as in
>>
>>>awk '(p7&&($7!=p7))&&(p8&&($8!=p8))...'
>>
>>>but what if I'd like to use $8 on the next line, with a changed value
>>>of $7?
>>
>>I don't know what you mean by that.
>>
>>
>>>Hmmm... this is getting more complex than I initially expected...
>>
>>Just follow the pattern....
>>
>> Ed.
>
>
> I took your latest script, cleaned it up a bit for clarity, changed
> some letters (to make them more meaningful for me during the debugging
> and learning process)--and it almost works the way I would want it to
> work!
>
> ( NR < 8 ) && ( $7 < 6 ) { next } s7 && ( $7 != s7 ) { V[++n] = V11;
> c[n] = c10; U[n] = $11; f[n] = FILENAME } { s7 = $7; c10 = $10; V11 =
> $11 } END { for (i in V ) print i, f[i], c[i], V[i], U[i] }
>
> However, I still have a few problems:
>
> 1. the first two inequalities seem to be disregarded, and unwanted
> data are stored in the array and then printed out.
I think you probably meant "||" rather than "&&".
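To see the difference, here is a rough sketch on made-up input (ten rows whose 7th field alternates between 5 and 6; none of this is from your data files, and the variable names are only for illustration). With "&&" a row is skipped only when both tests are true, so late rows with a small counter still get through; with "||" either test alone skips the row.

```shell
#!/bin/sh
# Made-up input: 10 rows; $1 is the row number, $7 is 5 on odd rows, 6 on even rows.
rows=$(awk 'BEGIN { for (i = 1; i <= 10; i++) print i, "a", "b", "c", "d", "e", (i % 2 ? 5 : 6) }')

# With &&, a row is skipped only if it is among the first 7 AND has $7 < 6.
kept_and=$(printf '%s\n' "$rows" |
    awk '(NR < 8) && ($7 < 6) { next } { printf "%s%s", sep, $1; sep = " " } END { print "" }')

# With ||, a row is skipped if it is among the first 7 OR has $7 < 6.
kept_or=$(printf '%s\n' "$rows" |
    awk '(NR < 8) || ($7 < 6) { next } { printf "%s%s", sep, $1; sep = " " } END { print "" }')

echo "&& keeps rows: $kept_and"
echo "|| keeps rows: $kept_or"
```

The "&&" version keeps rows 2 4 6 8 9 10, while the "||" version keeps only rows 8 and 10.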
> 2. the data are printed out in reverse order (from the highest i to
> 1)... how come?
No, they aren't. The order in which "for (i in V)" visits the indices is
unspecified in awk, so it can look random or reversed.
If you care about the order, use this:
    for (i=1;i<=n;i++)
instead of
    for (i in V)
> 3. how to impose an i>6 condition in the last 'for' printout loop (see
> #1).
for (i=7;i<=n;i++)
Let's fix up the white space a bit for readability:
( NR < 8 ) || ( $7 < 6 ) { next }
s7 && ( $7 != s7 ) { V[++n] = V11; c[n] = c10; U[n] = $11; f[n] = FILENAME }
{ s7 = $7; c10 = $10; V11 = $11 }
END { for (i=7;i<=n;i++) print i, f[i], c[i], V[i], U[i] }
and note that you don't NEED several different arrays to just print that
information:
( NR < 8 ) || ( $7 < 6 ) { next }
s7 && ( $7 != s7 ) { V[++n] = FILENAME OFS c10 OFS V11 OFS $11 }
{ s7 = $7; c10 = $10; V11 = $11 }
END { for (i=7;i<=n;i++) print i, V[i] }
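As a quick check of the pattern, here is a sketch that runs the single-array version on the four sample rows quoted earlier in this thread. It makes some assumptions of its own: the rows are piped in on stdin, so it leaves out FILENAME and the NR/$7 skip rule (the sample is only four rows), and it starts the loop at 1. The function name detect_changes and the shell variables are just names made up for the illustration.

```shell
#!/bin/sh
# The four sample rows from the thread. With default whitespace splitting,
# the date, time and AM/PM marker are fields 4, 5 and 6, so the counter is $7.
sample='100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0 0 3.293399
101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0 0 3.293234
102 24480.36 0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
103 24480.46 0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836'

# Hypothetical helper: record one entry each time $7 differs from the previous row.
detect_changes() {
    awk '
        # On a change: previous $10, previous $11, and the current $11.
        s7 && ($7 != s7) { V[++n] = c10 OFS V11 OFS $11 }
        # Remember this row for comparison with the next one.
        { s7 = $7; c10 = $10; V11 = $11 }
        # Numeric loop, so entries come out in the order they were found.
        END { for (i = 1; i <= n; i++) print i, V[i] }
    '
}

result=$(printf '%s\n' "$sample" | detect_changes)
printf '%s\n' "$result"
```

On those rows it prints "1 0 3.293234 3.166826": the index, then $10 and $11 from line 101, then $11 from line 102. One thing to keep in mind is that the "s7 &&" guard treats a previous counter value of 0 the same as "no previous row", so a change away from 0 would not be recorded.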
Regards,
Ed.

