Re: Splitting huge XML Files into fixsized wellformed parts

by Hermann Peifer <peifer@[EMAIL PROTECTED] > Mar 18, 2008 at 08:49 PM

Malapha wrote:
> On 18 Mar., 00:01, Hermann Peifer <pei...@[EMAIL PROTECTED]
> wrote:
>> Malapha wrote:
>>
>>> As I come from the VBA world - I tried to get familiar with awk. What
>>> I do have is theoretical solution in form of a structured process
>>> diagram :-)
>>> Copy Header and Footer from Original to Var
>>> Set Start_Offer = First Offer (from <Offer> to </Offer>)
>>> Set End_Transaction = 0
>>> Set Part = 0
>>> Set FileSize = 0
>>> Set MaxFileSize = 250
>>> while not Start_Offer < EOF(OriginalXMLFile)
>>>      Part=part+1
>>>      Open NewFile OriginalXMLFileName + Part + ".xml"
>>>      Paste Header from Var to NewFile
>>>      While filesize(NewFile)<MaxFileSize do
>>>          Copy Offer (Start_Offer) from OriginalXMLDatei to NewFile
>>>          Start_Offer=Start_Offer + 1
>>>      wend
>>>      Paste Footer from Var to NewFile
>>> wend
>>> I am right now trying to translate this into awk.. Please dont ask me
>>> how far i am, its frustrating :-)
>> Below one solution for splitting in well-formed chunks, here: 100
>> OfferInfos each.  There might be better solutions (I just don't know
>> them ;-) It only works if the XML data is in "pretty print format", as
>> the sample data you posted.
>>
>> $ cat split_bigfile.awk
>>
>> BEGIN { new_chunk = 1 ; size = 100 }
>>
>> NR == 1 { header = $0 ; next }
>> NR == 2 { header = header ORS $0 ; footer = "</" substr($1,2) ">" ; next }
>>
>> $0 !~ footer {
>>         if (new_chunk) {
>>                 outfile = "chunk" sprintf("%07d", num) ".xml"
>>                 print header > outfile
>>                 new_chunk = 0
>>         }
>>         print > outfile
>>
>> }
>>
>> /<\/OfferInfo>/ {
>>         num = int(++count/size)
>>         if (num > prev_num) {
>>                 print footer > outfile
>>                 new_chunk = 1
>>         }
>>         prev_num = num
>>
>> }
>>
>> END { if (!new_chunk) print footer > outfile }
> 
> Herman you are great. As I have written in to Jürgen, I am unable to
> check it. But as soon as possible I ll give it a try!!
> 
> Thanks again
> Malapha


Here is the xgawk version of the same script. It works fine for me with
your test data, and no pre-formatting of bigfile.xml is needed. However,
this solution requires xgawk and the library xmlcopy.awk. In xmlcopy.awk,
I made a minor change at the very end:

    # printf( "%s", token )
    return token
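
Why the change matters, as far as the script below shows: it assigns the result of XmlCopy() to a variable and redirects it to a chunk file, so the function has to hand the text back rather than print it. A minimal plain-awk sketch of that difference follows. It does not use xgawk; `write_tokens` and `copy_token` are hypothetical names made up for this illustration, and `copy_token` is only a stand-in for XmlCopy(), not its real implementation.

```sh
# write_tokens is a hypothetical helper; $1 is an existing output directory.
write_tokens() {
    awk -v dir="$1" '
    # Hypothetical stand-in for XmlCopy(): builds the text and returns it
    function copy_token(text,    token) {
        token = "<" text ">"
        # printf("%s", token)    # would always go to standard output
        return token
    }
    {
        # Because the text is returned, the caller picks the destination
        outfile = dir "/part" NR ".txt"
        printf "%s\n", copy_token($0) > outfile
        close(outfile)
    }
    '
}

tmpdir=$(mktemp -d)
printf '%s\n' a b | write_tokens "$tmpdir"
cat "$tmpdir/part1.txt" "$tmpdir/part2.txt"
```

With the printf variant, both tokens would land on standard output and the part files would never be written.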

Usage of the script: xgawk -f split_big_xmlfile.awk bigfile.xml

$ cat split_big_xmlfile.awk

# Include the xmlcopy.awk library
# Make sure that xgawk finds it
@include xmlcopy

BEGIN { new_chunk = 1 ; size = 100 }

# Remember XML declaration of bigfile.xml
XMLDECLARATION { header = XmlCopy() }

# Remember root element, define the footer
XMLSTARTELEM && XMLDEPTH == 1 {
	header = header XmlCopy()
	footer = "</" XMLSTARTELEM ">"
}

# Only care about OfferInfos and their children
XMLPATH ~ /OfferInfo/ {
	if (new_chunk) {
		outfile = "chunk" sprintf("%07d", num) ".xml"
		printf "%s", header > outfile
		new_chunk = 0
	}
	printf "%s", XmlCopy() > outfile
}

# Decide if it's time to add a footer and start a new chunk
XMLENDELEM == "OfferInfo" {
	num = int(++count/size)
	if (num > prev_num) {
		print footer > outfile
		close(outfile)	# finished chunk; frees the file handle
		new_chunk = 1
	}
	prev_num = num
}

# Write the final footer, unless the last chunk was just closed (count % size == 0)
END { if (!new_chunk) print footer > outfile }
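
The chunk bookkeeping can be tried without xgawk or any XML at all. The sketch below is plain awk in which every input line stands in for one closing OfferInfo element; `chunk_events` is a hypothetical helper name and the chunk size of 3 is chosen only to keep the output short. It prints when a chunk would be opened and closed instead of writing files.

```sh
# Plain awk, no xgawk needed: every input line stands in for one </OfferInfo>.
# chunk_events is a hypothetical helper name; $1 is the chunk size.
chunk_events() {
    awk -v size="$1" '
    BEGIN { new_chunk = 1 }
    {
        if (new_chunk) {
            print "open chunk " (num + 0)
            new_chunk = 0
        }
        # Pre-increment: count is 1 for the first item, so the chunk
        # closes exactly when count reaches a multiple of size
        num = int(++count / size)
        if (num > prev_num) {
            print "close chunk " prev_num " after item " count
            new_chunk = 1
        }
        prev_num = num
    }
    # Only add a footer if the last chunk is still open
    END { if (!new_chunk) print "close chunk " prev_num " after item " count }
    '
}

printf '%s\n' 1 2 3 4 5 6 7 | chunk_events 3
```

For seven items this reports chunks closing after items 3, 6 and 7. With exactly six items the END rule prints nothing, because the second chunk was already closed in the main rule. A post-increment (count++) in the same place would close the first chunk one item late.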



