On 25 Mar, 14:32, Hermann Peifer <pei...@[EMAIL PROTECTED]
> wrote:
> On Mar 25, 11:39 am, Malapha <mala...@[EMAIL PROTECTED]
> wrote:
>
> > Thanks again. I got everything up and running - and it worked :-) I
> > also modified XMLCOPY as suggested.
>
> > Here are some benchmarks:
> > Type                  Minutes        Size
> > BYTESHRED_XMLCOPY     7.966666667    322 MB
> > COUNTSHRED            0.583333333    322 MB
> > COUNTSHRED_XMLCOPY    7.55           322 MB
>
> > COUNTSHRED_XMLCOPY uses the XmlCopy method. As you can see, the
> > text-based method (Hermann's first) is by far the fastest. Its
> > disadvantage is that the XML input file has to be well formed. I am
> > still undecided about which method to use. As I have file sizes of up
> > to 3 GB, "COUNTSHRED" seems to be the one.
>
> If you already have "nicely formatted" XML files (or manage to get
> there via xmllint --format), then I'd recommend using the faster
> solution with regular awk.
>
> If not... then you have to use xgawk in combination with XmlCopy. Some
> performance tuning might be possible. I guess that Jürgen might have
> some good ideas.
>
> > One more question: In my XML Files there is another tag next to the
> > <OfferInfo>, named <CancelOfferInfo>. Where do I need to place this in
> > the code, so that it also gets processed?
>
> There is no single answer to this question, as there are 3 scripts now
> with slightly different code. However, these rules will find both
> OfferInfo and CancelOfferInfo elements:
>
> /<.*OfferInfo>/ {do something with regular awk}
>
> XMLPATH ~ /OfferInfo/ {do something with xgawk}
>
> Another xgawk option could be to define the condition via XMLDEPTH,
> e.g.:
>
> XMLDEPTH > 2 {do something}
>
> Hermann
Having decided to use the fastest way, please let me come back to
the original problem: I also want some logging of the shredding
process at runtime, written after each chunk is finished, so that the
file sizes on the filesystem after the shredding correspond to the
values of the table's attributes in the logfile. Here is my idea:
logfile = "shredder_log.txt"
cmd_original_length = "ls -l " FILENAME " | gawk '{print $5;}'"
cmd_original_length | getline original_size
cmd_part_length = "ls -l " outfile " | gawk '{print $5;}'"
cmd_part_length | getline part_size
print outfile ";" sprintf("%03d", num+1) ";" FILENAME ";" strftime("%m%d%Y%H%M%S", systime()) ";" original_size ";" part_size >> logfile
So far so good, but I have trouble placing that piece of code. I
tried several places in the script, but it is either too early (the
file does not exist yet -> size 0), in the middle of the process
(wrong file size) or too late.
Many regards
Mala
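
For illustration only, here is a minimal sketch of one place where the logging could go. It assumes the splitter writes each chunk with print > outfile and starts a new chunk every N matching records. The idea is that a chunk's size on disk is only final after close(outfile), so the log line is written right after that call, and once more in END for the last chunk. The sample data, the part_NNN.xml names, the chunk variable and the helpers file_size and log_chunk are all hypothetical and not taken from the scripts discussed in this thread. It uses wc -c and date instead of ls -l and strftime so that it also runs under a non-GNU awk.

```shell
# Hypothetical sample input: three records, split into chunks of two.
printf '%s\n' '<OfferInfo>a</OfferInfo>' '<OfferInfo>bb</OfferInfo>' \
    '<CancelOfferInfo>c</CancelOfferInfo>' > sample.xml
rm -f shredder_log.txt

awk -v chunk=2 -v logfile=shredder_log.txt '
# Return the size of a file in bytes, read from a shell command.
function file_size(path,    cmd, size) {
    cmd = "wc -c < \"" path "\""
    cmd | getline size
    close(cmd)                 # so the command runs afresh on the next call
    return size + 0
}
# Close the finished chunk first, then write one log line for it.
function log_chunk(    cmd, stamp, line) {
    close(outfile)             # the size on disk is final only after this
    cmd = "date +%m%d%Y%H%M%S"
    cmd | getline stamp
    close(cmd)
    line = outfile ";" sprintf("%03d", num) ";" FILENAME ";" stamp
    line = line ";" file_size(FILENAME) ";" file_size(outfile)
    print line >> logfile
}
/<.*OfferInfo>/ {
    if (count % chunk == 0) {              # time to start a new chunk
        if (outfile != "") log_chunk()     # previous chunk is complete
        outfile = sprintf("part_%03d.xml", ++num)
    }
    print > outfile
    count++
}
END { if (outfile != "") log_chunk() }     # the last chunk is logged here
' sample.xml
```

With this placement, every line in shredder_log.txt is written only after its chunk has been closed, so the logged part size should match what ls -l reports afterwards.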

