
Re: Open source library for generating and parsing (x)html

by Ian Collins <ian-news@[EMAIL PROTECTED] > May 10, 2008 at 07:26 PM

{ Accepted as follow-up.  Further discussion of general tools for HTML
tidying would be off-topic (as I see you're aware :-) ) unless there is
some C++ content. -mod }

marlow.andrew@[EMAIL PROTECTED] wrote:
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
> 
<OT>
htmltidy is your friend in this case.  Your system may have it
installed, otherwise it is very easy to build and use.
</OT>
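
For anyone who wants to stay in C++, one way to use htmltidy from a scraper is to shell out to the command-line tool and read the repaired markup back. This is a minimal sketch, not a definitive implementation. It assumes a POSIX system (for `popen`) and a `tidy` executable on the PATH. The helper names `shell_quote`, `tidy_command` and `run_command` are hypothetical and made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap a string in single quotes so a POSIX shell treats it as one argument.
// An embedded single quote is emitted as '\'' (close, escaped quote, reopen).
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build a tidy command line (hypothetical helper).
// -q suppresses nonessential output, -asxhtml asks for XHTML output, and
// --show-warnings no hides the per-line warnings about the ill-formed input.
std::string tidy_command(const std::string& html_path) {
    return "tidy -q -asxhtml --show-warnings no " + shell_quote(html_path);
}

// Run a shell command and return everything it wrote to standard output.
// The exit status is ignored here; a real scraper would likely inspect it.
std::string run_command(const std::string& command) {
    assert(!command.empty());
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed");
    std::string output;
    char buffer[4096];
    std::size_t n = 0;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);
    pclose(pipe);
    return output;
}
```

Calling `run_command(tidy_command("page.html"))` should then return the cleaned-up XHTML for a saved page, ready to hand to an XML parser. HTML Tidy also ships as a C library, which avoids the extra process, but that route is not shown here.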

-- 
Ian Collins.

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]
 




 5 Posts in Topic:
Open source library for generating and parsing (x)html
Mitchel Haas <mhaas@[E  2008-05-09 09:25:17 
Re: Open source library for generating and parsing (x)html
marlow.andrew@[EMAIL PROT  2008-05-10 06:14:38 
Re: Open source library for generating and parsing (x)html
Ian Collins <ian-news@  2008-05-10 19:26:36 
Re: Open source library for generating and parsing (x)html
"AnonMail2005@[EMAIL  2008-05-10 19:25:28 
Re: Open source library for generating and parsing (x)html
Mitchel Haas <mhaas@[E  2008-05-11 09:47:14 
