Janis Papanagnou wrote:
> Al wrote:
>> I'm having trouble getting my head around the most basic of things about
>> arrays, and wonder if anyone could post something very simple about them,
>> or point to some elementary link. I understand how to do the notation as
>> listed in Sed and Awk, or similar books. I can see it working when I copy
>> the formulae, but don't really understand why or how it works. What I
>> don't get is sort of crazy: it's 'what an array is'.
>
> Basically an array is a collection of values of the same type[*] that are
> ordered in a certain way or numbered (or named, in the case of an
> associative array) and whose elements can be accessed by that number (or
> name).
>
> [*] You need additional concepts (like polymorphism or hybrid basic types)
> to define arrays as inhomogeneous collections. And in general (non-awk'ish)
> arrays can hold more than just "values".
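To make the definition concrete in awk terms, here is a minimal sketch (the helper names `sample` and `lookup`, the array name `price`, and the input lines are illustrative, not from the post): each element of an associative array is stored and retrieved under a name rather than a position.

```sh
#!/bin/sh
# Sketch: an awk associative array indexed by a name instead of a number.
# sample() feeds hypothetical data in the format used in this thread.
sample() { printf '%s\n' '1 item1 2.99 sept' '2 item2 4.25 may'; }

lookup() {
    awk '
    { price[$2] = $3 }                      # element named by field 2 holds field 3
    END {
        print "item2:", price["item2"]      # access one element by its name
        n = 0
        for (name in price) n++             # visit every index (order unspecified)
        print "elements:", n
    }'
}

sample | lookup
```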
>
>>
>> Like, I have a file which goes something like this:
>>
>> 1 item1 2.99 sept
>> 2 item2 4.25 may
>>
>> When you use the array commands to link the first column to the second, as
>
> I don't understand what you mean by "linked columns".
>
>> shown in many of the examples, are you doing something like assigning names
>> in Excel? Is it a sort of view into the file?
>
> Besides the conceptual view I described above, an array is a technical tool
> for keeping data around, e.g. data from a file that you need to store for
> later use.
>
> Awk reads the file sequentially from the beginning, and you can instruct
> awk to perform certain tasks on each line, like extracting only certain
> fields ( {print $2, $4} ); that way you get something like a view on a
> file.
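A minimal sketch of such a view, assuming whitespace-separated input like Al's sample lines; awk handles one line at a time and keeps nothing, so no array is involved. The helper names `sample` and `view` are hypothetical.

```sh
#!/bin/sh
# Sketch: awk processes one line at a time; printing only some fields
# gives a "view" of the file. No array is needed because nothing is kept.
sample() { printf '%s\n' '1 item1 2.99 sept' '2 item2 4.25 may'; }

view() { awk '{ print $2, $4 }'; }    # item name and month of each line

sample | view
```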
>
> Whenever you need to refer back to earlier data, though, you have to store
> all the necessary data in an array.
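As an illustration of storing data for later use (a sketch; the helper names and the choice of reversing the file are just an example), every line is kept in an array indexed by its line number `NR` and printed back in reverse order in the `END` block. Without remembering the lines, awk could not do this, because by the time it reaches the end the earlier lines are gone.

```sh
#!/bin/sh
# Sketch: to use earlier lines after awk has moved on, keep them in an array.
# Every line is stored under its line number NR and printed in reverse
# order once all input has been read (in the END block).
sample() { printf '%s\n' '1 item1 2.99 sept' '2 item2 4.25 may' '4 item5 250 aug'; }

reverse() {
    awk '
    { line[NR] = $0 }                    # remember the whole line, indexed by line number
    END { for (i = NR; i >= 1; i--) print line[i] }'
}

sample | reverse
```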
>
>> If I'm trying to use the array commands to total, for instance, all sales
>> of item1 in sept, am I to create two views into the file... or?
>
> You instruct awk to consider only lines that match your "view" by
> defining a pattern[**], $2=="item1" && $4=="sept", and then define the
> action to perform on those lines, summing up with sum += $3 . Your program
> would be...
>
> $2=="item1" && $4=="sept" { sum += $3 }
> END { print sum }
>
> So you don't need any arrays for this task since you store just a scalar
> value (and not a collection of named/indexed elements).
>
> [**] Assuming there's no whitespace in the second column (awk uses
> whitespace as the field separator by default).
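If the item names could contain spaces, one option is a different field separator. This sketch assumes a tab-separated file, which is not the format shown in the thread, and runs the same kind of pattern-and-sum program with `-F'\t'` on hypothetical data; the helper names `sample` and `sept_red` are illustrative.

```sh
#!/bin/sh
# Sketch: with the default separator, "red item" would be split into two
# fields. Assuming a tab-separated file instead, -F'\t' keeps each column intact.
sample() { printf '1\tred item\t2.99\tsept\n2\tblue item\t4.25\tmay\n1\tred item\t1.01\tsept\n'; }

sept_red() {
    awk -F'\t' '$2 == "red item" && $4 == "sept" { sum += $3 }
                END { print sum }'
}

sample | sept_red
```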
>
> Janis
>
>>
>> I realize this is confused. Any basic help or pointers would be greatly
>> appreciated.
>>
>> Peter
I see that awk does this without arrays, and thanks for the explanation,
which helps a lot. The example is a bit oversimplified. Do I need to use
arrays for a slightly extended case?
In real life, it's a text file of about 15,000 transactions, each of which is
a sale of one of about 200 items. So I want to run through the file and
add up the total sales per item per month. The problem, however, is that
they may add more items at any time. So the program has to run against the
transaction file without my being able to specify in advance which items it
has to total.
So one month it could be the above example, but the next month it could look
something like:
1 item1 2.99 sept
2 item2 4.25 may
4 item5 250 aug
7 item6 4.25 aug
1 item1 2.99 aug
The fields are: item code number, item name, price, date of sale.
I was doing this in OO Calc with an "array formula" (hence some of the
confusion), which works but is too slow, so I am now rewriting it in awk for
performance reasons.
Do I not need to use arrays for that situation in awk? I'd assumed yes.
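For the per-item, per-month totals described above, an associative array is the usual awk tool. A minimal sketch, assuming the whitespace-separated layout described above (code, item name, price, month): the array `total` (a hypothetical name) is indexed by the pair of item name and month, so new items need no advance declaration. The sample data repeats the lines above plus one extra, hypothetical item1 sale in sept to show the accumulation.

```sh
#!/bin/sh
# Sketch: totals per item per month without knowing the item names in advance.
# Assumes whitespace-separated fields: code, item name, price, month.
sample() {
    printf '%s\n' '1 item1 2.99 sept' '2 item2 4.25 may' '4 item5 250 aug' \
                  '7 item6 4.25 aug' '1 item1 2.99 aug' '1 item1 2.99 sept'
}

totals() {
    awk '
    { total[$2, $4] += $3 }              # new (item, month) keys are created on first use
    END {
        for (key in total) {
            split(key, part, SUBSEP)     # recover item name and month from the combined key
            printf "%s %s %.2f\n", part[1], part[2], total[key]
        }
    }' | LC_ALL=C sort                   # for-in order is unspecified, so sort the report
}

sample | totals
```

Because awk creates an array element the first time an index is used, an item that shows up for the first time next month simply gets its own entries; nothing has to be listed beforehand.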
Cheers
Al

