Al wrote:
> Janis Papanagnou wrote:
>
>
>>Al wrote:
>>
>>>I'm having trouble getting my head around the most basic of things about
>>>arrays, and wonder if anyone could post something very simple about them,
>>>or point to some elementary link. I understand how to do the notation as
>>>listed in Sed and Awk, or similar books. I can see it working when I copy
>>>the formulae, but don't really understand why or how it works. What I
>>>don't get is sort of crazy, it's 'what an array is'.
>>
>>Basically an array is a collection of values of the same type[*] that are
>>ordered in a certain way or numbered (or named, in case of an associative
>>array) and whose elements can be accessed by that number (or name).
>>
>>[*] You need additional concepts (like polymorphism or hybrid basic types)
>>to define arrays as inhomogeneous collections. And generally (non-awk'ish)
>>arrays can carry not only "values".
>>
>>
>>>Like, I have a file which goes something like this:
>>>
>>>1 item1 2.99 sept
>>>2 item2 4.25 may
>>>
>>>When you use the array commands to link the first column to the second, as
>>
>>I don't understand what you mean by "linked columns".
>>
>>
>>>shown in many of the examples, are you doing something like assigning names
>>>in Excel? Is it a sort of view into the file?
>>
>>Besides the conceptual view I described above an array is a technical tool
>>to memorize data; e.g. data in a file that you may need to store for later
>>usage.
>>
>>Awk will read the file sequentially from the beginning and you can
>>instruct awk to do certain tasks on each of the lines, like stripping only
>>certain fields ( {print $2, $4} ); that way you get something like a view
>>on a file.
>>
>>Whenever you need to refer to previous data you have to store all
>>necessary data in an array, though.
>>
>>
>>>If I'm trying to use the
>>>array commands to total, for instance, all sales of item1 in sept, am I
>>>to create two views into the file....or?
>>
>>You will instruct awk to consider only lines that match your "view" by
>>defining a pattern[**] $2=="item1" && $4=="sept" then define the action
>>to do on those lines, summing up sum += $3 . Your program would be...
>>
>> $2=="item1" && $4=="sept" { sum += $3 }
>> END { print sum }
>>
>>So you don't need any arrays for this task since you store just a scalar
>>value (and not a collection of named/indexed elements).
>>
>>[**] Assuming there's no whitespace in the second column (which awk uses
>>as a field separator by default).
>>
>>Janis
>>
>>
>>>I realize this is confused. Any basic help or pointers would be greatly
>>>appreciated.
>>>
>>>Peter
>
>
> I see that awk does this without arrays, and thanks for the explanation,
> which helps a lot. The example is a bit oversimplified. Do I need to use
> arrays for a slightly extended case?
>
> In real life, it's a text file of about 15,000 transactions, each of which is
> a sale of one of about 200 items. So I want to run through the file and
> add up the total sales per item per month. However, the problem is, they
> may add more items at any time. So the program has to run against the
> transaction file without being able to specify in advance what items it has
> to be totalling.
>
> So one month it could be the above example, but the next month it could look
> something like:
>
> 1 item1 2.99 sept
> 2 item2 4.25 may
> 4 item5 250 aug
> 7 item6 4.25 aug
> 1 item1 2.99 aug
>
> the file is item code number, item name, price, date of sale.
>
> I was doing this in OO Calc with an "array formula" (hence some of the
> confusion) which works but too slowly, but am now rewriting it in awk for
> performance reasons.
>
> Do I not need to use arrays for that situation in awk? I'd assumed yes.

Yes, indeed. If I understand your requirement correctly, you don't want to
specify any fixed item but to do the summation a) for all items, or b) for
all items and months, all in a single pass through the data.

First, case a), the easy one

   { sum[$2] += $3 }
   END { for (item in sum) print item, sum[item] }

or if you want the summation for a specific month only

   $4=="sept" { sum[$2] += $3 }
   END { for (item in sum) print item, sum[item] }
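
As a rough illustration of case a), this is what a run might look like on the
five sample lines quoted above. The shell wrapper, the printf-generated input,
the variable name result and the final sort are assumptions added for this
sketch. The sort is there because the order in which for (item in sum) visits
the elements is unspecified in awk.

```shell
# Feed the five sample transactions to the case a) program.
# Fields: item code number, item name, price, month of sale.
result=$(printf '%s\n' \
    '1 item1 2.99 sept' \
    '2 item2 4.25 may' \
    '4 item5 250 aug' \
    '7 item6 4.25 aug' \
    '1 item1 2.99 aug' |
  awk '{ sum[$2] += $3 }
       END { for (item in sum) print item, sum[item] }' |
  LC_ALL=C sort)
echo "$result"
```

On this input, item1 should come out as 5.98 (two sales of 2.99), item2 and
item6 as 4.25 each, and item5 as 250. New items need no change to the program:
an element of sum is created the first time its item name is seen.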

Now case b) for two indices, item and month, a bit more complicated

   { sum[$2,$4] += $3 }
   END { for (key in sum) {
           split (key, x, SUBSEP)
           print x[1], x[2], sum[x[1],x[2]]
         }
       }
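
A note on why the split is needed in case b): the comma in sum[$2,$4] does not
create a true two-dimensional array. Awk joins the two values into a single
string key with the built-in SUBSEP character between them, so the END block
has to take the key apart again. A sketch on the same sample data follows. As
before, the shell wrapper, the printf input, the variable name result and the
sort are assumptions for illustration, and sum[key] is used as an equivalent
shorthand for sum[x[1],x[2]].

```shell
# Total sales per item per month for the five sample transactions.
result=$(printf '%s\n' \
    '1 item1 2.99 sept' \
    '2 item2 4.25 may' \
    '4 item5 250 aug' \
    '7 item6 4.25 aug' \
    '1 item1 2.99 aug' |
  awk '{ sum[$2,$4] += $3 }              # key is $2 SUBSEP $4
       END { for (key in sum) {
                 split(key, x, SUBSEP)   # x[1] = item, x[2] = month
                 print x[1], x[2], sum[key]
             }
       }' |
  LC_ALL=C sort)
echo "$result"
```

Here item1 should appear twice, once for aug and once for sept with 2.99 each,
since the two sales fall into different months.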

If I misunderstood your requirement, feel free to ask and clarify.

Janis
>
> Cheers
>
> Al

