On 2/20/2008 5:08 PM, Harriet Bazley wrote:
> (N.B. this is all reproduced from memory, as I don't currently have
> access to the file I was working on; so beware typos....)
>
>
> Today I was working on a CSV file in which the first field in every line
> was the system date, as follows:
>
> Tue,19 Feb 2008.08:13:24,28374658,29387034
> Wed,20 Feb 2008.22:45:33,40028373,29387574
>
> The file contained thousands of entries over a period of ten years or
> so, and I wanted to graph the change in value of the final field. To
> simplify the data while retaining the general trend I wanted to print
> just the first value encountered in each month, using a
> (month!=oldmonth) condition.
>
> To get the month value I altered the setting of FS to "[, .]" in order
> to split each record
> Tue
> 19
> Feb
> 2008
> 08:13:24
> 28374658
> 29387034
>
>
> Thus I could use
>
> {oldmonth=month
> month=$3 " " $4}
>
> (month!=oldmonth) {print}
>
> to print out the first value for "Feb 1999", "Mar 1999" etc., which
> seemed a nice simple way of doing it. However, I didn't actually want
> values such as "Tue,19 Feb 2008.08:13:24" for the data labels in my
> graph, so thought I might as well doctor the output by substituting my
> already-calculated value of 'month' back into field $4 and deleting
> fields $1 to $3. So far as I could see, though, there isn't actually a
> way of doing this. (If I do $1="";$2="",$3="" with OFS set to "," I
> get a row of commas at the start of the output.)
>
> In this case it proved simpler to {print $4,$6,$7} for what I was
> interested in, since there were only a few fields per record, but I was
> wondering if it is in fact theoretically possible to 'remove' fields
> from $0 (in the same way that one can, for example, add to them by
> assigning a value to a non-existent $8)? Gawk appears to remember the
> prior existence of the fields even if they don't currently have a value.
>
Kinda, but not as easily as you'd like. You need to substitute text for a
pattern that includes the field separator, e.g.:
$ echo "1,2,3,4,5" | gawk --re-interval 'sub(/([^,]+,){3}/,"rep,")1'
rep,4,5
which you could use to replace the first 3 fields in a CSV file with the text
"rep" if you have GNU awk or some other awk that supports RE intervals.
Replacing the middle 3 becomes more interesting, especially if you don't have
gawk's gensub().
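
For the middle-fields case, one approach that should work in any POSIX awk
(no RE intervals, no gensub()) is to rebuild the line field by field instead
of editing $0 in place. This is only a rough sketch: the function name
replace_fields and the variables first, last and rep are made up for
illustration, and it assumes plain comma-separated data with no quoted commas.

```shell
# Hypothetical helper (not from the original post): replace comma-separated
# fields FIRST..LAST of each input line with the text REP.
replace_fields() {
    awk -F, -v first="$1" -v last="$2" -v rep="$3" '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == first) {
                # emit the replacement once, in place of the whole range
                out = out sep rep; sep = ","
            } else if (i < first || i > last) {
                # keep every field outside the range unchanged
                out = out sep $i; sep = ","
            }
        }
        print out
    }'
}

echo "1,2,3,4,5,6" | replace_fields 2 4 rep
```

That prints 1,rep,5,6. Building a separate output string sidesteps the
problem of awk remembering the emptied fields, since $0 is never modified.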
Ed.

