
Re: arrays, what they actually are?

by Janis Papanagnou <Janis_Papanagnou@[EMAIL PROTECTED] > Dec 26, 2007 at 08:39 AM

Al wrote:
> Janis Papanagnou wrote:
> 
> 
>>Al wrote:
>>
>>>I'm having trouble getting my head around the most basic of things about
>>>arrays, and wonder if anyone could post something very simple about them,
>>>or point to some elementary link.  I understand how to do the notation as
>>>listed in Sed and Awk, or similar books.  I can see it working when I copy
>>>the formulae, but don't really understand why or how it works.  What I
>>>don't get is sort of crazy, it's 'what an array is'.
>>
>>Basically an array is a collection of values of the same type[*] that are
>>ordered in a certain way or numbered (or named, in case of an associative
>>array) and whose elements can be accessed by that number (or name).
>>
>>[*] You need additional concepts (like polymorphism or hybrid basic types)
>>to define arrays as inhomogeneous collections. And generally (non-awk'ish)
>>arrays can carry not only "values".
>>
>>
>>>Like, I have a file which goes something like this:
>>>
>>>1       item1   2.99    sept
>>>2       item2   4.25    may
>>>
>>>When you use the array commands to link the first column to the second, as
>>
>>I don't understand what you mean by "linked columns".
>>
>>
>>>shown in many of the examples, are you doing something like assigning names
>>>in Excel?  Is it a sort of view into the file?
>>
>>Besides the conceptual view I described above, an array is a technical tool
>>to memorize data; e.g. data in a file that you may need to store for later
>>usage.
>>
>>Awk will read the file sequentially from the beginning and you can
>>instruct awk to do certain tasks on each of the lines, like extracting only
>>certain fields ( {print $2, $4} ); that way you get something like a view
>>on a file.
>>
>>Whenever you need to refer to previous data you have to store all
>>necessary data in an array, though.
>>
>>
>>>If I'm trying to use the
>>>array commands to total, for instance, all sales of item1 in sept, am I
>>>to create two views into the file....or?
>>
>>You will instruct awk to consider only lines that match your "view" by
>>defining a pattern[**]  $2=="item1" && $4=="sept"  then define the action
>>to do on those lines, summing up  sum += $3 . Your program would be...
>>
>>   $2=="item1" && $4=="sept" { sum += $3 }
>>   END { print sum }
>>
>>So you don't need any arrays for this task since you store just a scalar
>>value (and not a collection of named/indexed elements).
>>
>>[**] Assuming there's no whitespace in the second column (which awk uses
>>as a field separator by default).
>>
>>Janis
>>
>>
>>>I realize this is confused.  Any basic help or pointers would be greatly
>>>appreciated.
>>>
>>>Peter
> 
> 
> I see that awk does this without arrays, and thanks for the explanation,
> which helps a lot.  The example is a bit oversimplified.  Do I need to use
> arrays for a slightly extended case?
> 
> In real life, it's a text file of about 15,000 transactions, each of which is
> a sale of one of about 200 items.  So I want to run through the file and
> add up the total sales per item per month.  However, the problem is, they
> may add more items at any time.  So the program has to run against the
> transaction file without being able to specify in advance what items it has
> to be totalling.
> 
> So one month it could be the above example, but the next month it could look
> something like:
> 
> 1       item1   2.99    sept
> 2       item2   4.25    may
> 4       item5   250     aug
> 7       item6   4.25    aug
> 1       item1   2.99    aug
> 
> The columns are item code number, item name, price, date of sale.
> 
> I was doing this in OO Calc with an "array formula" (hence some of the
> confusion) which works but too slowly, but am now rewriting it in awk for
> performance reasons.
> 
> Do I not need to use arrays for that situation in awk?  I'd assumed yes.

Yes, indeed. If I understand your requirement correctly, you don't want to
specify any fixed item but do the summation a) for all items, or b) for
each combination of item and month, all in a single pass through the data.

First, case a), the easy one:

   { sum[$2] += $3 }
   END { for (item in sum) print item, sum[item] }

or, if you want the summation for a specific month only:

   $4 == "sept" { sum[$2] += $3 }
   END { for (item in sum) print item, sum[item] }
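
A minimal sketch of case a) run against the five sample lines quoted above. The file name sales.txt and the awk variable month are assumptions made for illustration, not part of the original question. Since "for (item in sum)" visits the keys in an unspecified order, the output is piped through sort.

```shell
# Hypothetical sample file built from the five transactions quoted above.
printf '%s\n' \
  '1 item1 2.99 sept' \
  '2 item2 4.25 may' \
  '4 item5 250 aug' \
  '7 item6 4.25 aug' \
  '1 item1 2.99 aug' > sales.txt

# Case a): total per item over the whole file.
awk '{ sum[$2] += $3 }
     END { for (item in sum) print item, sum[item] }' sales.txt | sort

# Same idea, restricted to one month that is passed in with -v
# instead of being hard-coded in the pattern.
awk -v month=aug '$4 == month { sum[$2] += $3 }
     END { for (item in sum) print item, sum[item] }' sales.txt | sort
```

No item name appears anywhere in the program: an element of sum comes into existence the first time its name is seen in column 2, so items added to the file later are picked up without any change.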

Now case b), with two indices, item and month, which is a bit more complicated:

   { sum[$2,$4] += $3 }
   END { for (key in sum) {
            split(key, x, SUBSEP)
            print x[1], x[2], sum[key]
         }
   }
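
A note on why the split is needed, with a sketch under the same assumptions as before (sales.txt is a hypothetical file name, part is an arbitrary array name). Awk arrays have a single string subscript; writing sum[$2,$4] joins the two values with the built-in variable SUBSEP into one key. In the END loop each key therefore has to be split at SUBSEP to get item and month back, and the key itself can be used to look up the total.

```shell
# Hypothetical sample file built from the five transactions quoted above.
printf '%s\n' \
  '1 item1 2.99 sept' \
  '2 item2 4.25 may' \
  '4 item5 250 aug' \
  '7 item6 4.25 aug' \
  '1 item1 2.99 aug' > sales.txt

# Case b): one total per (item, month) pair.
awk '{ sum[$2, $4] += $3 }
     END {
         for (key in sum) {
             split(key, part, SUBSEP)   # part[1] = item, part[2] = month
             print part[1], part[2], sum[key]
         }
     }' sales.txt | sort

# A single pair can be tested and read directly with the same notation.
awk '{ sum[$2, $4] += $3 }
     END { if (("item1", "aug") in sum) print "item1 aug:", sum["item1", "aug"] }' sales.txt
```

This relies on the SUBSEP character not occurring in the data itself, which is normally safe for plain text files because the default is a non-printing character.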

If I misunderstood your requirement, feel free to ask and clarify.

Janis

> 
> Cheers
> 
> Al
 




