Scraping HTML Content

The Simple HTML DOM parser works well for scraping. Tutorial: http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/
Project page: http://sourceforge.net/projects/simplehtmldom/

PHP's XML parsers will also work, but only if the HTML is well-formed (XHTML): http://www.php.net/manual/en/refs.xml.php

Example: http://www.php.net/manual/en/simplexml.examples.php
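As a minimal sketch of the XML-parser approach, the snippet below runs SimpleXML over a small, well-formed XHTML string and collects the links it contains. The markup and the variable names ($xhtml, $links) are made up for illustration. Real XHTML pages usually declare a default namespace, in which case the XPath query would need a registered namespace prefix.

```php
<?php
// Hypothetical, well-formed XHTML standing in for a scraped page.
$xhtml = <<<XHTML
<html>
  <body>
    <h1>Sample page</h1>
    <ul>
      <li><a href="/first">First link</a></li>
      <li><a href="/second">Second link</a></li>
    </ul>
  </body>
</html>
XHTML;

// Collect parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = simplexml_load_string($xhtml);

$links = [];
if ($doc === false) {
    // The markup was not well-formed XML; report what libxml found.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
} else {
    // A plain XPath query works here because the snippet has no default namespace.
    foreach ($doc->xpath('//a') as $a) {
        $links[(string) $a['href']] = (string) $a;
    }
}

print_r($links);
```

If the input has a single unclosed tag, simplexml_load_string() returns false, which is why this route only suits clean XHTML.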

 

To grab HTML from a URL, you can use cURL:

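	// create a new cURL handle
	$ch = curl_init();

	// set the URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $url);                     // $url: the page to fetch
	curl_setopt($ch, CURLOPT_USERPWD, "username:password");  // only needed for HTTP authentication
	curl_setopt($ch, CURLOPT_HEADER, false);                 // leave response headers out of the output
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);          // return the body as a string instead of printing it

	// fetch the URL; $result holds the HTML, or false on failure
	$result = curl_exec($ch);
	if ($result === false) {
		echo 'cURL error: ' . curl_error($ch);
	}

	// close the cURL handle and free up system resources
	curl_close($ch);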
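Once curl_exec() has returned the page as a string, it still has to be parsed. One option for markup that is not perfect XHTML is PHP's DOM extension, whose loadHTML() method tolerates unclosed tags. The sketch below is not from the original post: it substitutes a hard-coded string for $result so it runs without network access, and the names $hrefs and $title are illustrative.

```php
<?php
// Hypothetical markup standing in for $result from curl_exec();
// note the unclosed <p> and <li> tags.
$result = '<html><body><h1>Sample page</h1><p>Intro text'
        . '<ul><li><a href="/first">First link</a>'
        . '<li><a href="/second">Second link</a></ul></body></html>';

// Keep libxml from emitting warnings about the imperfect markup.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($result);
libxml_clear_errors();

// Query the repaired tree with XPath.
$xpath = new DOMXPath($dom);

$hrefs = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

$heading = $xpath->query('//h1')->item(0);
$title = $heading ? trim($heading->textContent) : null;

print_r($hrefs);
echo $title, "\n";
```

The same XPath queries would apply to a page fetched with cURL, provided $result is checked for false before it is passed to loadHTML().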

 

  From: http://sitestree.com/?p=556
Categories: Web Development, Root
By Sayed Ahmed
Post Date: 2013-11-18 02:03:36