On Mar 25, 11:39 am, Malapha <mala...@[EMAIL PROTECTED]> wrote:
>
> Thanks again. I got everything up and running, and it worked :-) I
> also modified XMLCOPY as suggested.
>
> Here are some benchmarks:
> Type                  Minutes        Size
> BYTESHRED_XMLCOPY     7.966666667    322 MB
> COUNTSHRED            0.583333333    322 MB
> COUNTSHRED_XMLCOPY    7.55           322 MB
>
> COUNTSHRED_XMLCOPY uses the XmlCopy method. As you can see, the
> text-based method (Hermann's first one) is by far the fastest. Its
> disadvantage is that the XML input file has to be well formatted. I am
> still undecided which method to use. As I have file sizes of up to
> 3 GB, COUNTSHRED seems to be the one.
>
If you already have "nicely formatted" XML files (or manage to get
there via xmllint --format), then I'd recommend using the faster
solution with regular awk.
If not, you have to use xgawk in combination with XmlCopy. Some
performance tuning might be possible; I guess Jürgen might have
some good ideas.
> One more question: in my XML files there is another element next to
> <OfferInfo>, named <CancelOfferInfo>. Where in the code do I need to
> add it so that it also gets processed?
>
There is no single answer to this question, as there are now 3 scripts
with slightly different code. However, these rules will find both
OfferInfo and CancelOfferInfo elements:
/<.*OfferInfo>/ {do something with regular awk}
XMLPATH ~ /OfferInfo/ {do something with xgawk}
Another xgawk option could be to define the condition via XMLDEPTH,
e.g.:
XMLDEPTH > 2 {do something}
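A quick way to see what the plain-awk rule /<.*OfferInfo>/ actually matches is to run it over a small sample laid out one element per line, as xmllint --format produces. The sample below is hypothetical. Note that the rule also fires on the closing tags </OfferInfo> and </CancelOfferInfo>, and it would miss an opening tag with attributes, such as <OfferInfo id="1">, sitting on a line of its own. The second pattern is only an assumed, tighter variant (not taken from the three scripts) that counts opening tags only, with or without attributes.

```shell
#!/bin/sh
# Hypothetical sample input, one element per line (like "xmllint --format" output).
sample='<Offers>
  <OfferInfo>
    <Id>1</Id>
  </OfferInfo>
  <CancelOfferInfo>
    <Id>2</Id>
  </CancelOfferInfo>
</Offers>'

# The rule from above: matches opening AND closing tags of both elements.
printf '%s\n' "$sample" | awk '/<.*OfferInfo>/ { n++ } END { print n + 0 }'
# prints 4

# Assumed tighter variant: opening tags only, optionally followed by attributes.
printf '%s\n' "$sample" | awk '/<(Cancel)?OfferInfo[ >]/ { n++ } END { print n + 0 }'
# prints 2
```

If the closing tags should trigger something different (for example, finishing an output record), they can get their own rule such as /<\/(Cancel)?OfferInfo>/.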
Hermann

