
Re: Splitting huge XML Files into fixsized wellformed parts

by Janis Papanagnou <Janis_Papanagnou@[EMAIL PROTECTED]> Mar 17, 2008 at 01:37 PM

Malapha wrote:
> Hi,
> 
> I am kind of depressed :-) I want to split XML files with sizes
> greater than 2 GB into smaller chunks. As I don't want to end up with
> billions of files, I want the split files to have configurable
> sizes like 250 MB. Each file should be well-formed, having an exact
> copy of the header (and the footer, i.e. the closing of the header)
> from the original file. Furthermore, a table should be generated where
> I can see that file X is separated into part N, with a timestamp:

A nice, well-described little homework assignment with clear requirements.

I'd refrain from splitting the file according to file size in MB
and instead suggest a simpler measure for splitting, like the number
of XML blocks or the number of lines.

> 
> Table:
> 
> Originalfilename|Name of PartN|Size of PartN|Timestamp
> 
> 
> 
> The Original XML-Files look like this:
> <?xml ...>
> <Headerelement with some infos to be copied 1to1>
>          <OfferInfo>
>                          <OfferID></OfferID>
>                           ...
>           </OfferInfo>
>          <OfferInfo>
>                          <OfferID></OfferID>
>                           ...
>           </OfferInfo>
>          <OfferInfo>
>                          <OfferID></OfferID>
>                           ...
>           </OfferInfo>
> </Headerelement>
> 
> 
> 
> All in all I ended up reading the XML processing docs for gawk,
> but it seems I am lacking some deeper programming skills...

Given your data above, you can solve all of that with basic awk
pattern matching; no deeper skills are required. What have you tried
so far?

> Could
> someone please help?

Since, apparently, you don't have a complex XML structure, the use of
xgawk seems unnecessary. The quick way I'd go would be...

Save every line in a variable until you match /Headerelement/; that
is your header. Write that header to a file whose name contains a
running number kept in a variable. Then write each following line to
that file, counting the lines. At the end of every block, i.e. on a
line matching /<\/OfferInfo>/, check whether the line count has
exceeded some constant limit; if so, write the constant trailer,
close() the file, and increment the variable that counts the files,
so the next part starts with a fresh copy of the header. To create
the separate table, just write the information you already have to a
file with a fixed name (use awk's date functions, e.g. GNU awk's
systime() and strftime(), or, if those are unavailable, an external
date program read with getline).
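A minimal sketch of those steps, written as a POSIX shell function around a plain awk program. It assumes the layout shown in the question: the `<Headerelement ...>` start tag, every `</OfferInfo>` and the closing `</Headerelement>` each sit on a line of their own. The function name `xmlsplit`, the default limit of 1000000 lines, the `_partN.xml` naming and the fixed table name `table.txt` are all hypothetical choices. Parts are measured in lines and only cut after a complete `</OfferInfo>` block, so a part can run a few lines over the limit; the size column in the table is in bytes.

```sh
# xmlsplit FILE [MAXLINES]  (hypothetical helper name and default limit)
# Splits FILE into <base>_part1.xml, <base>_part2.xml, ... where each part
# gets an exact copy of the header, whole <OfferInfo> blocks and the
# closing </Headerelement> trailer; one row per part goes to table.txt.
# Assumes the start tag, every </OfferInfo> and the closing
# </Headerelement> each sit on a line of their own.
xmlsplit() {
    LC_ALL=C awk -v max="${2:-1000000}" -v base="${1%.xml}" '
    function stamp(    cmd, ts) {
        # portable: read the time from an external date program;
        # with gawk, strftime("%Y-%m-%d %H:%M:%S") would also do
        cmd = "date \"+%Y-%m-%d %H:%M:%S\""
        cmd | getline ts
        close(cmd)
        return ts
    }
    function newpart() {
        src = FILENAME
        out = base "_part" (++n) ".xml"
        printf "%s", header > out      # exact copy of the header lines
        lines = 0
        bytes = length(header)
    }
    function closepart() {
        print "</Headerelement>" > out # constant trailer
        bytes += length("</Headerelement>") + 1
        close(out)                     # release the file before opening the next
        print src "|" out "|" bytes "|" stamp() > table
        out = ""
    }
    BEGIN {
        table = "table.txt"
        print "Originalfilename|Name of PartN|Size of PartN|Timestamp" > table
    }
    !inbody {                          # header: all lines up to and including the start tag
        header = header $0 "\n"
        if (/<Headerelement/) inbody = 1
        next
    }
    /<\/Headerelement>/ { exit }       # end of document; END closes the open part
    out == "" { newpart() }            # open the next part lazily
    {
        print > out
        lines++
        bytes += length($0) + 1        # +1 for the newline; LC_ALL=C counts bytes
    }
    /<\/OfferInfo>/ && lines >= max { closepart() }   # split between blocks only
    END { if (out != "") closepart() }
    ' "$1"
}
```

For example, `xmlsplit offers.xml 2000000` would write `offers_part1.xml`, `offers_part2.xml`, ... plus `table.txt`. Opening each part lazily, when its first body line arrives, avoids an empty extra part when the last block happens to fill a part exactly; closing each part with close() keeps awk from running out of file descriptors on a large input.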

If you have concrete questions feel free to ask.
(Or did you mean for someone to write that program for you?)

Janis

> 
> Thx
> Malapha



