Re: Splitting huge XML Files into fixsized wellformed parts

by Hermann Peifer <peifer@[EMAIL PROTECTED] > Mar 25, 2008 at 06:32 AM

On Mar 25, 11:39 am, Malapha <mala...@[EMAIL PROTECTED]> wrote:
>
> Thanks again. I got everything up and running - and it worked :-) I
> also modified XMLCOPY as suggested.
>
> Here are some benchmarks:
> Type                  Minutes        Size
> BYTESHRED_XMLCOPY     7,966666667    322 MB
> COUNTSHRED            0,583333333    322 MB
> COUNTSHRED_XMLCOPY    7,55           322 MB
>
> COUNTSHRED_XMLCOPY uses the XmlCopy method. As you can see, the
> text-based method (Hermann's first) is by far the fastest. It has the
> disadvantage that the XML input file has to be well formed. I am
> still undecided about which method to use. As I have file sizes of up
> to 3 GB, "COUNTSHRED" seems to be the one.
>

If you already have "nicely formatted" XML files (or manage to get
there via xmllint --format), then I'd recommend using the faster
solution with regular awk.

If not... then you have to use xgawk in combination with XmlCopy. Some
performance tuning might be possible. I guess that Jürgen might have
some good ideas.
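
For the first case (nicely formatted input, regular awk), a count-based split could look roughly like the sketch below. This is only an illustration, not one of the scripts posted earlier in the thread: the <Offers> root element, the sample file, the split_offers function and the part_NNN.xml naming are all assumptions made up for the example. It relies on every <OfferInfo> start and end tag sitting on its own line, which is what xmllint --format would give you.

```shell
# Hypothetical sample input: one tag per line, as `xmllint --format` would produce.
workdir=$(mktemp -d)
cat > "$workdir/offers.xml" <<'EOF'
<?xml version="1.0"?>
<Offers>
  <OfferInfo>
    <Id>1</Id>
  </OfferInfo>
  <OfferInfo>
    <Id>2</Id>
  </OfferInfo>
  <OfferInfo>
    <Id>3</Id>
  </OfferInfo>
</Offers>
EOF

# split_offers INPUT MAX PREFIX  (hypothetical helper)
# Writes parts holding at most MAX OfferInfo records each.
# Every part gets its own <Offers> wrapper (assumed root element)
# so that each output file is well-formed by itself.
split_offers() {
  awk -v max="$2" -v prefix="$3" '
    function open_part() {
      out = sprintf("%s%03d.xml", prefix, ++part)
      print "<?xml version=\"1.0\"?>" > out
      print "<Offers>" > out
      count = 0
    }
    function close_part() {
      print "</Offers>" > out
      close(out)          # release the file handle before opening the next part
      out = ""
    }
    # Start of a record: open a new part if none is open.
    /^[ \t]*<OfferInfo>/ { if (out == "") open_part(); inrec = 1 }
    # Copy every line that belongs to a record.
    inrec { print > out }
    # End of a record: close the part once it holds max records.
    /^[ \t]*<\/OfferInfo>/ {
      inrec = 0
      if (++count >= max) close_part()
    }
    END { if (out != "") close_part() }
  ' "$1"
}

split_offers "$workdir/offers.xml" 2 "$workdir/part_"
```

With the sample above and a maximum of 2 records per part, part_001.xml receives the first two records and part_002.xml the third. Splitting by byte size instead of record count would need a running length counter in place of count.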

> One more question: In my XML Files there is another tag next to the
> <OfferInfo>, named <CancelOfferInfo>. Where do I need to place this in
> the code, so that it also gets processed?
>

There is no single answer to this question, as there are 3 scripts now
with slightly different code. However, these rules will find both
OfferInfo and CancelOfferInfo elements:

/<.*OfferInfo>/ {do something with regular awk}

XMLPATH ~ /OfferInfo/ {do something with xgawk}

Another xgawk option could be to define the condition via XMLDEPTH,
e.g.:

XMLDEPTH > 2 {do something}
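
To see what the regular awk rule actually matches, here is a small check on made-up input (the sample document and the variable names are assumptions for illustration only). Note that /<.*OfferInfo>/ also fires on the closing tags. The second, stricter pattern is a variant of mine, not something from the scripts in this thread, and matches opening tags only.

```shell
# Hypothetical sample with both element types, one tag per line.
sample='<Offers>
  <OfferInfo>
    <Id>1</Id>
  </OfferInfo>
  <CancelOfferInfo>
    <Id>2</Id>
  </CancelOfferInfo>
</Offers>'

# Rule from the post: matches every line with a tag ending in "OfferInfo>",
# opening and closing tags alike.
all_tags=$(printf '%s\n' "$sample" |
  awk '/<.*OfferInfo>/ { n++ } END { print n + 0 }')

# Stricter variant (an assumption, not from the post): opening tags only.
open_tags=$(printf '%s\n' "$sample" |
  awk '/<(Cancel)?OfferInfo>/ { n++ } END { print n + 0 }')

echo "lines matched by /<.*OfferInfo>/: $all_tags"
echo "opening tags only: $open_tags"
```

On this sample the first rule matches 4 lines and the stricter one 2, so which pattern fits depends on whether the action should run once per element or on both its start and end.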

Hermann




