Re: Splitting huge XML Files into fixsized wellformed parts

by Malapha <malapha@[EMAIL PROTECTED] > Mar 17, 2008 at 06:35 AM

On 17 Mar., 13:37, Janis Papanagnou <Janis_Papanag...@[EMAIL PROTECTED]> wrote:
> Malapha wrote:
> > Hi,
>
> > I am kind of depressed :-) I want to split XML files with sizes
> > greater than 2 GB into smaller chunks. As I don't want to end up with
> > billions of files, I want the split files to have a configurable
> > size, such as 250 MB. Each file should be well-formed, with an exact
> > copy of the header (and the footer that closes the header) from the
> > original file. Furthermore, a table should be generated where I can
> > see that File X is separated into Part N with a timestamp:
>
> A nice and well described little homework with clear requirements.
>
> I'd abstain from splitting the file according to file sizes in MB
> and suggest taking a simpler measure for splitting, like the number
> of XML blocks or the number of lines.
>

I totally agree with you. Using the number of XML blocks as an
approximation for file size is good enough.
The problem I see is that line counts only work when the XML document
contains EOLs. If the input data file has no EOLs, I run into
problems. So I decided to use the xgawk framework in order to make
use of the "node hopping" technique. This lets me count the Offers
without having to solve the problem mentioned above.
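
To illustrate the point about missing EOLs, here is a minimal sketch in plain awk (not the xgawk node-hopping approach mentioned above) that counts closing </Offer> tags without relying on line breaks. It sets the record separator to ">", so every record ends at a tag boundary. The function name count_offers is made up for this example, and the sketch assumes that no attribute value, comment, or CDATA section contains a literal ">".

```sh
# count_offers FILE: count closing </Offer> tags, independent of line breaks.
# Hypothetical helper; assumes ">" only ever appears as the end of a tag.
count_offers() {
    awk '
    BEGIN { RS = ">" }        # a record is the text up to the next ">"
    /<\/Offer$/ { n++ }       # record ends with "</Offer": one offer was closed
    END { print n + 0 }       # "+ 0" prints 0 instead of an empty line
    ' "$1"
}
```

Calling count_offers on a file prints the number of Offer blocks, which can then be used to decide how many blocks go into each part.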

>
> > All in all I ended up reading the XML processing docs for gawk,
> > but it seems I am lacking some deeper programming skills.
>
> Given your data above you can solve that all with basic awk pattern
> matching capabilities, no deeper skills required. What have you tried
> so far?

As I come from the VBA world, I have been trying to get familiar with
awk. What I do have is a theoretical solution in the form of a
structured process diagram :-)

Copy Header and Footer from Original to Var
Set Start_Offer = First Offer (from <Offer> to </Offer>)
Set End_Transaction = 0
Set Part = 0
Set FileSize = 0
Set MaxFileSize = 250
While Start_Offer < EOF(OriginalXMLFile)
     Part = Part + 1
     Open NewFile OriginalXMLFileName + Part + ".xml"
     Paste Header from Var to NewFile
     While FileSize(NewFile) < MaxFileSize And Start_Offer < EOF(OriginalXMLFile)
         Copy Offer (Start_Offer) from OriginalXMLFile to NewFile
         Start_Offer = Start_Offer + 1
     Wend
     Paste Footer from Var to NewFile
     Close NewFile
Wend

I am right now trying to translate this into awk. Please don't ask me
how far I am, it's frustrating :-)


> Save everything in a variable until you match the /Headerelement/.
> Write that header to a file whose name contains a variable as number.
> Write everything until the end of the block /<\/OfferInfo>/ to the
> file whose name contains a variable as number, while counting lines.
> If the number of lines exceeded some constant value write the constant
> trailer, and close() the file, and increase the variable that counts
> the files. To create a separate table just write out the information
> you already have to a file with fixed name (use awk's date functions
> or if unavailable an external date program and getline).

This looks very much like my approach - so I am quite happy that I am
not that wrong...
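
The recipe quoted above can be sketched in plain awk roughly as follows. This is an illustration under assumptions, not a finished solution: it counts XML blocks instead of lines (so it also works on input without EOLs), it takes the constant trailer as an argument instead of extracting it from the input, and it fetches the timestamp from the external date program via getline. The function name split_xml, the output names FILE.N.xml and FILE.parts.txt, and the table layout are all made up for this example. It also assumes that blocks are not nested and that ">" never appears inside attribute values, comments, or CDATA.

```sh
# split_xml FILE MAX_BLOCKS FOOTER [TAG]   (hypothetical helper)
# Writes FILE.1.xml, FILE.2.xml, ... with at most MAX_BLOCKS <TAG> blocks each,
# and FILE.parts.txt with one line per part: input, part number, file, timestamp.
split_xml() {
    awk -v max="$2" -v footer="$3" -v tag="${4:-Offer}" -v prefix="$1" '
    function finish_part() {                  # write the constant trailer, close the part
        printf "%s\n", footer > out
        close(out)
        out = ""
    }
    BEGIN {
        RS = ">"                              # records end at ">", so EOLs do not matter
        open_re  = "<" tag "([ \t\r\n]|$)"    # "<Offer" followed by attributes or ">"
        close_re = "</" tag "$"               # "</Offer" right before ">"
        table = prefix ".parts.txt"
    }
    # Everything before the first block is the header.
    !seen && $0 !~ open_re { header = header $0 ">"; next }
    # A block starts; open a new part first if none is open.
    !inblock && $0 ~ open_re {
        seen = inblock = 1
        if (out == "") {
            out = prefix "." (++part) ".xml"
            printf "%s", header > out
            cmd = "date +%Y-%m-%dT%H:%M:%S"
            cmd | getline stamp               # timestamp from the external date program
            close(cmd)
            print prefix, part, out, stamp > table
        }
    }
    # Copy the block; after MAX_BLOCKS complete blocks, finish the part.
    inblock {
        printf "%s>", $0 > out
        if ($0 ~ close_re) {
            inblock = 0
            if (++count % max == 0) finish_part()
        }
    }
    END { if (out != "") finish_part() }      # last, possibly shorter, part
    ' "$1"
}
```

For example, split_xml big.xml 100000 '</Offers>' would write parts of up to 100000 Offer blocks each. The block count is only a stand-in for the 250 MB target and has to be tuned to the average block size.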




