On 16 Apr., 16:08, r <inp...@[EMAIL PROTECTED]> wrote:
> On Apr 4, 4:19 am, Ed Morton <mor...@[EMAIL PROTECTED]> wrote:
> ... awk -F, -f file.awk file2.csv file2.csv > newfile
>
> > > Could you tell me which section of the documentation explains why
> > > the file needs to be read twice, please?
>
> > It's not a documentation thing, your problem just requires a 2-pass
> > solution. The first time to identify which columns always have the
> > null value, the second to output just the columns that have at least
> > one non-null value.
>
> Thanks for your criticism of the so-called tutorial; I'd already
> discarded it and started to read the gnu guide to gawk.
If you read further in it you'll find answers to some of
your questions below.
>
> Returning to my example: sorry to re-start with this issue of reading
> the file twice, would you direct me towards the section of the gawk
> guide that explains what you call a "2-pass solution". I'd like to
> learn the gawk technical phrase for this task and read the background.
That has nothing to do with gawk. Many tools operate on data streams
sequentially but have tasks to do that can only be completed if the
data is already fully known to them; so you read the same file twice.
In awk that can be done by providing the filename two times in the
argument list.
awk -f awkprog datafile datafile
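
To make the two passes concrete, here is a minimal sketch for the task
described above. The sample data is made up, and I am assuming that "null"
means an empty field and that the fields are comma-separated; adjust both
to your real file.

```shell
# Hypothetical sample data: column 2 is empty in every row.
datafile=$(mktemp)
printf 'a,,c\nd,,\ng,,i\n' > "$datafile"

# Pass 1 (NR == FNR): remember every column that holds a non-empty value.
# Pass 2: print only the remembered columns.
result=$(awk -F, '
NR == FNR { for (i = 1; i <= NF; i++) if ($i != "") keep[i] = 1; next }
{
    line = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (i in keep) { line = line sep $i; sep = "," }
    print line
}' "$datafile" "$datafile")

printf '%s\n' "$result"
rm -f "$datafile"
```

The first pass cannot print anything useful yet, because a column that looks
empty so far might still get a value on the last line; that is why the file
is named twice.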
>
> Similarly, I've understood the terms NR and FNR but could not find
> reference to the equation NR==FNR, which section of the gawk guide
It's just the equality operator as known from mathematics.
Since you know what each variable means you just have to think about
the values they will take while awk is processing the first file and
while processing the second file.
> should I read to learn that it is possible to conceive this equation?
> As a complete novice, where do I learn that after learning the
> definition of NR and FNR independently, it is possible to use these
> functions with ==?
Most generally, you can compare values of the same type.
(Sometimes awk performs implicit conversions before comparison.)
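
A small illustration of how the type affects a comparison, using made-up
values: two string constants are compared as strings, while two input
fields that look like numbers are compared as numbers.

```shell
# String constants are compared as strings: "10" sorts before "9".
str_cmp=$(awk 'BEGIN { print ("10" < "9") }')

# Fields that look like numbers are compared numerically: 10 is not less than 9.
num_cmp=$(printf '10 9\n' | awk '{ print ($1 < $2) }')

printf 'string: %s  numeric: %s\n' "$str_cmp" "$num_cmp"
```

A comparison yields 1 when it is true and 0 when it is false, so this
prints "string: 1  numeric: 0".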
>
> The gawk guide introduces the term 'i++' in section 1.3 but '++' is
> explained (correctly?) as increment operator in section 5.8? Why is
> increment necessary in this example?
The ++ is the increment operator. You need a variable to increment.
Here i is the name of the variable to increment.
(To explain the necessity in "the" example I'd need the context.)
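
As a generic sketch (not your example, and with a made-up input line): in a
loop over the fields, i is a counter holding the field number, $i is the
content of that field, and i++ moves the counter on by one after each round
so that the loop eventually reaches NF and stops.

```shell
# Print each field number next to the field's content.
out=$(printf 'x y z\n' | awk '{ for (i = 1; i <= NF; i++) print i, $i }')
printf '%s\n' "$out"
```

The data itself is never changed here; only the counter i is incremented.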
>
> So far, apart from my failure to understand NR==FNR, I understand as
NR==FNR is true while you're processing the first file of the
argument list. If the same filename is given twice then you can
call that condition "first pass" on the file.
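
You can watch both variables yourself with a small test; the two-line file
and the pass labels below are made up for illustration.

```shell
# Hypothetical two-line file, named twice on the command line.
f=$(mktemp)
printf 'one\ntwo\n' > "$f"

# NR counts all records read so far; FNR restarts at 1 for each file.
trace=$(awk '{ print NR, FNR, (NR == FNR ? "first pass" : "second pass") }' "$f" "$f")

printf '%s\n' "$trace"
rm -f "$f"
```

During the first reading both counters run 1, 2; during the second reading
NR continues with 3, 4 while FNR starts again at 1, so the two are no longer
equal.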
> follows:
>
> if (NR>1) #if there is a line, or record, go to the next step
> for (i=1;i<=NF;i++) #where there is a record of data, each datum is
> considered equivalent to a variable i. As long as i is less than or
> equal to the total number of fields, or columns, add a value 1 to the
> value of i. So, if my file consists of data arranged in three columns,
> for the datum at row 1 column 2, the value of that datum i becomes i
> +1.
All of that is really the very basics of the awk language;
I'd suggest reading some more of the GNU awk guide.
Janis

