(N.B. this is all reproduced from memory, as I don't currently have
access to the file I was working on; so beware typos....)
Today I was working on a CSV file in which the first field in every line
was the system date, as follows:
Tue,19 Feb 2008.08:13:24,28374658,29387034
Wed,20 Feb 2008.22:45:33,40028373,29387574
The file contained thousands of entries over a period of ten years or
so, and I wanted to graph the change in value of the final field. To
simplify the data while retaining the general trend I wanted to print
just the first value encountered in each month, using a
(month!=oldmonth) condition.
To get the month value I set FS to "[, .]" in order to split each
record at every comma, space or full stop, giving these fields:
Tue
19
Feb
2008
08:13:24
28374658
29387034
Thus I could use
{oldmonth=month
month=$3 " " $4}
(month!=oldmonth) {print}
to print out the first value for "Feb 1999", "Mar 1999" etc., which
seemed a nice simple way of doing it. However, I didn't actually want
values such as "Tue,19 Feb 2008.08:13:24" for the data labels in my
graph, so thought I might as well doctor the output by substituting my
already-calculated value of 'month' back into field $4 and deleting
fields $1 to $3. So far as I could see, though, there isn't actually a
way of doing this. (If I do $1="";$2="";$3="" with OFS set to "," I
get a row of commas at the start of the output.)
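
To make the problem concrete, here is a minimal sketch of what I was attempting, again reconstructed from memory rather than copied from the real script. The two March records are invented sample data, and the function names sample and blank_fields are just labels for this example.

```sh
# Two records from the post plus two invented March records.
sample() {
  printf '%s\n' \
    'Tue,19 Feb 2008.08:13:24,28374658,29387034' \
    'Wed,20 Feb 2008.22:45:33,40028373,29387574' \
    'Sat,01 Mar 2008.09:00:00,40028999,29388000' \
    'Sun,02 Mar 2008.10:30:00,40029111,29388500'
}

# Print the first record of each month, with "Mon YYYY" put into $4
# and $1..$3 blanked. Blanking does not remove the fields: $0 is
# rebuilt with OFS between all seven of them.
blank_fields() {
  awk -F'[, .]' -v OFS=',' '
    { oldmonth = month; month = $3 " " $4 }
    month != oldmonth {
      $4 = month
      $1 = ""; $2 = ""; $3 = ""
      print
    }'
}

sample | blank_fields
# prints:
# ,,,Feb 2008,08:13:24,28374658,29387034
# ,,,Mar 2008,09:00:00,40028999,29388000
```

The three leading commas are the separators that belong to the now-empty fields $1 to $3.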
In this case it proved simpler to {print $4,$6,$7} for what I was
interested in, since there were only a few fields per record, but I was
wondering if it is in fact theoretically possible to 'remove' fields
from $0 (in the same way that one can, for example, add to them by
assigning a value to a non-existent $8)? Gawk appears to remember the
prior existence of the fields even if they don't currently have a value.
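
One thing that might do what I am asking, though I offer it only as a sketch under assumptions and not as a confirmed answer: copy each field down by three places and then reduce NF, so that the record is rebuilt with fewer fields. The gawk manual describes decrementing NF as discarding the trailing fields and recomputing $0, but I gather not every awk behaves the same way, so treat that as gawk-specific. The sample data, the variable n and the function names below are all made up for the example.

```sh
# Two records from the post plus two invented March records.
sample() {
  printf '%s\n' \
    'Tue,19 Feb 2008.08:13:24,28374658,29387034' \
    'Wed,20 Feb 2008.22:45:33,40028373,29387574' \
    'Sat,01 Mar 2008.09:00:00,40028999,29388000' \
    'Sun,02 Mar 2008.10:30:00,40029111,29388500'
}

# Print the first record of each month with the first n fields removed.
drop_fields() {
  awk -F'[, .]' -v OFS=',' -v n=3 '
    { oldmonth = month; month = $3 " " $4 }
    month != oldmonth {
      $4 = month                     # "Mon YYYY" replaces the bare year
      for (i = n + 1; i <= NF; i++)  # shift every later field down n places
        $(i - n) = $i
      NF = NF - n                    # assumption: this awk rebuilds $0 here
      print
    }'
}

sample | drop_fields
# with gawk this prints:
# Feb 2008,08:13:24,28374658,29387034
# Mar 2008,09:00:00,40028999,29388000
```

For a record with only a handful of fields, {print $4,$6,$7} is still shorter; the loop only seems worth it when there are many trailing fields to keep.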
--
Harriet Bazley == Loyaulte me lie ==
A bachelor is footloose and fiancee free.

