Re: Open source library for generating and parsing (x)html
by Ian Collins <ian-news@[EMAIL PROTECTED]>
May 10, 2008 at 07:26 PM
{ Accepted as follow-up. Further discussion of general tools for HTML
tidying would be off-topic (as I see you're aware :-) ) unless there is
some C++ content. -mod }
marlow.andrew@[EMAIL PROTECTED] wrote:
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
>
<OT>
htmltidy is your friend in this case. Your system may already have it
installed; if not, it is very easy to build and use.
</OT>
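For some C++ content, here is a rough sketch of driving the tidy
command-line tool from C++ so that a scraper can hand well-formed XHTML
to an ordinary XML parser. It assumes a POSIX system (for popen) and a
tidy executable on the PATH. The function names shell_quote,
tidy_command and tidy_to_xhtml are made up for this example and are not
part of tidy itself. The library API would avoid the extra process, but
this is about the smallest thing that could work.

```cpp
#include <cassert>
#include <stdexcept>
#include <stdio.h>   // popen/pclose are POSIX, not standard C++
#include <string>

// Hypothetical helper: quote a string for a POSIX shell by wrapping it
// in single quotes and escaping any embedded single quote.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '\'')
            out += "'\\''";   // close quote, escaped quote, reopen quote
        else
            out += s[i];
    }
    out += "'";
    return out;
}

// Hypothetical helper: build the tidy command line.
//   -q                  suppress non-essential output
//   -asxhtml            convert the input to XHTML
//   --show-warnings no  do not report warnings
//   --force-output yes  write output even when errors were found
// Diagnostics on stderr are discarded.
std::string tidy_command(const std::string& html_path) {
    assert(!html_path.empty());
    return "tidy -q -asxhtml --show-warnings no --force-output yes "
           + shell_quote(html_path) + " 2>/dev/null";
}

// Hypothetical helper: run tidy on a file and return whatever it wrote
// to standard output. Throws if tidy could not be started or produced
// no output (for example, because it is not installed).
std::string tidy_to_xhtml(const std::string& html_path) {
    FILE* pipe = popen(tidy_command(html_path).c_str(), "r");
    if (!pipe)
        throw std::runtime_error("could not start tidy");

    std::string xhtml;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0)
        xhtml.append(buf, n);
    pclose(pipe);

    if (xhtml.empty())
        throw std::runtime_error("tidy produced no output");
    return xhtml;
}
```

Calling tidy_to_xhtml("page.html") on a saved page should then give
you markup that a strict parser will accept. Whether the repaired tree
matches what a browser would have built is another matter, so check the
result on the pages you actually scrape.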
--
Ian Collins.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]