On 4/16/2008 9:08 AM, r wrote:
> On Apr 4, 4:19 am, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
> ... awk -F, -f file.awk file2.csv file2.csv > newfile
>
>
>>>Could you tell me which section of the documentation explains why the
>>>file needs to be read twice, please?
>>
>>It's not a documentation thing, your problem just requires a 2-pass
>>solution. The first time to identify which columns always have the null
>>value, the second to output just the columns that have at least one
>>non-null value.
>
>
> Thanks for your criticism of the so-called tutorial; I'd already
> discarded it and started to read the gnu guide to gawk.
>
> Returning to my example: sorry to re-start with this issue of reading
> the file twice, would you direct me towards the section of the gawk
> guide that explains what you call a "2-pass solution". I'd like to
> learn the gawk technical phrase for this task and read the background.
It's a software design concept, not an awk one. If you need to do something
with some of the contents of a file based on the entire contents of that same
file, then you need to read the file twice - that's all.
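
To make the 2-pass idea concrete, here is a minimal sketch rather than your actual script. The file name sample.csv and its contents are made up for illustration, and I'm assuming that "0.0e0" is the null value and that the first line is a header. The same file is named twice on the command line so awk reads it twice:

```shell
# Hypothetical sample data: column 2 is "0.0e0" on every data row.
cat > sample.csv <<'EOF'
a,b,c
1.5e0,0.0e0,2.0e0
0.0e0,0.0e0,3.0e0
EOF

awk -F, '
# Pass 1 (NR==FNR): remember every column that holds a non-null value.
NR == FNR {
    if (FNR > 1)                        # skip the assumed header line
        for (i = 1; i <= NF; i++)
            if ($i"" != "0.0e0")
                good[i]
    next
}
# Pass 2: print only the columns remembered in pass 1.
{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (i in good) {
            out = out sep $i
            sep = ","
        }
    print out
}' sample.csv sample.csv > newfile

cat newfile
```

With that sample input, column 2 is dropped and columns 1 and 3 are kept, header included.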
> Similarly, I've understood the terms NR and FNR but could not find
> reference to the equation NR==FNR, which section of the gawk guide
> should I read to learn that it is possible to conceive this equation?
> As a complete novice, where do I learn that after learning the
> definition of NR and FNR independently, it is possible to use these
> functions with ==?
They aren't functions, they're variables, and "==" is just the ordinary
comparison operator, so NR==FNR is a condition like any other. NR is only equal
to FNR while reading the first file (or the second file if the first one is
empty - watch out for that gotcha!). You can alternatively test for:

    FILENAME == ARGV[1]    - watch out for non-filename arguments
                             (e.g. var=value assignments)
or
    ARGIND == 1            - GNU awk only

I don't think that particular thing is specifically addressed in the GNU awk
guide.
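
A small sketch of the empty-first-file gotcha, in case it helps. The file names empty.txt and data.txt are hypothetical, made up just for this demonstration:

```shell
: > empty.txt                  # hypothetical empty first file
printf 'x\ny\n' > data.txt     # hypothetical second file

# NR==FNR is still true while reading data.txt, because no records
# were counted while "reading" empty.txt.
awk 'NR == FNR { print "first-pass:", FILENAME, $0 }' empty.txt data.txt > nrfnr.out

# Comparing FILENAME with ARGV[1] is not fooled by the empty file,
# so this prints nothing.
awk 'FILENAME == ARGV[1] { print "first-pass:", FILENAME, $0 }' empty.txt data.txt > filename.out

cat nrfnr.out filename.out
```

The first command wrongly treats both lines of data.txt as belonging to the first pass; the second produces no output at all.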
> The gawk guide introduces the term 'i++' in section 1.3 but '++' is
> explained (correctly?) as increment operator in section 5.8? Why is
> increment necessary in this example?
I assume by "this example" you mean:

    for (i=1;i<=NF;i++)
        if ($i"" != "0.0e0")
            good[i]

If so, think about what value(s) "i" would have if it was never incremented.
>
> So far, apart from my failure to understand NR==FNR, I understand as
> follows:
>
> if (NR>1) #if there is a line, or record, go to the next step
> for (i=1;i<=NF;i++) #where there is a record of data, each datum is
> considered equivalent to a variable i. As long as i is less than or
> equal to the total number of fields, or columns, add a value 1 to the
> value of i. So, if my file consists of data arranged in three columns,
> for the datum at row 1 column 2, the value of that datum i becomes i
> +1.
>
I think you're on the right track but I'm not 100% sure you really get it.
Maybe this will help:

$ cat file
r1f1 r1f2 r1f3
r2f1 r2f2
r3f1 r3f2 r3f3 r3f4
r4f1
$ awk '{
    printf "record %d with %d fields = {\n",NR,NF
    for (i=1;i<=NF;i++) {
        printf "\tfield %d = %s\n",i,$i
    }
    print "}"
}' file
record 1 with 3 fields = {
        field 1 = r1f1
        field 2 = r1f2
        field 3 = r1f3
}
record 2 with 2 fields = {
        field 1 = r2f1
        field 2 = r2f2
}
record 3 with 4 fields = {
        field 1 = r3f1
        field 2 = r3f2
        field 3 = r3f3
        field 4 = r3f4
}
record 4 with 1 fields = {
        field 1 = r4f1
}

As an exercise, modifying the above to print FNR in addition to NR, and then
adding a second "file" to the end of the awk argument list, would probably
help you.
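
If you get stuck, one possible shape for that exercise is sketched below. The input files file1 and file2 and their contents are hypothetical, and printing FILENAME is my addition so you can see where each record comes from:

```shell
printf 'r1f1 r1f2\nr2f1\n' > file1    # hypothetical first input
printf 'r1f1\nr2f1 r2f2\n' > file2    # hypothetical second input

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{ printf "%s: NR=%d FNR=%d NF=%d\n", FILENAME, NR, FNR, NF }' file1 file2 > counters.out

cat counters.out
```

Compare the NR and FNR columns for the file2 lines: that difference is what the NR==FNR test relies on.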
Ed.

