Create Simple Web Crawler Using PHP And MySQL

Datetime: 2016-08-23 02:22:22    Topic: PHP, MySQL, Web Crawlers

Tags:-PHP MySQL HTML

A web crawler visits webpages and collects details such as the page title, description, and links, then stores them in a database so that a search engine can return relevant results when someone searches. That makes the crawler one of the most important parts of a search engine. In this tutorial we will show you how to create a simple web crawler using PHP and MySQL.

To Create Simple Web Crawler It Takes Only One Step:-

  1. Make a PHP file to crawl webpages and store details in database

Step 1. Make a PHP file to crawl webpages and store details in database

We make a PHP file and save it with the name crawl.php. It writes to the database table shown first.

-- Database Structure
-- MySQL quotes identifiers with backticks, not single quotes
CREATE TABLE `webpage_details` (
 `link` text NOT NULL,
 `title` text NOT NULL,
 `description` text NOT NULL,
 `internal_link` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

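<?php
 $main_url = "http://samplesite.com";
 $str = @file_get_contents($main_url);
 if ($str === false || $str === "")
 {
  die("Could not fetch " . $main_url);
 }

 // Gets Webpage Title
 $title = "";
 // Non-greedy match stops at the first </title>; "i" ignores case, "s" lets "." span line breaks
 if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $str, $match))
 {
  $title = trim(preg_replace('/\s+/', ' ', $match[1])); // collapse whitespace inside the title
 }

 // Gets Webpage Description
 // get_meta_tags() downloads the given URL itself; here it reads the site's home page
 $url = parse_url($main_url);
 $tags = @get_meta_tags($url['scheme'] . '://' . $url['host']);
 $description = ($tags !== false && isset($tags['description'])) ? $tags['description'] : "";

 // Gets Webpage Links
 $doc = new DOMDocument;
 @$doc->loadHTML($str); // @ hides the warnings that invalid HTML produces

 $sec_url = array();
 $items = $doc->getElementsByTagName('a');
 foreach ($items as $value)
 {
  // Skip anchors that have no href attribute
  if ($value->hasAttribute('href'))
  {
   $sec_url[] = $value->getAttribute('href');
  }
 }
 $all_links = implode(",", $sec_url);

 // Store Data In Database
 $host = "localhost";
 $username = "root";
 $password = "";
 $databasename = "sample";
 // mysqli replaces the mysql_* functions, which were removed in PHP 7
 $connect = mysqli_connect($host, $username, $password, $databasename);
 if (!$connect)
 {
  die("Connection failed: " . mysqli_connect_error());
 }

 // A prepared statement keeps quotes in the crawled text from breaking the query
 $stmt = mysqli_prepare($connect, "INSERT INTO webpage_details (link, title, description, internal_link) VALUES (?, ?, ?, ?)");
 mysqli_stmt_bind_param($stmt, "ssss", $main_url, $title, $description, $all_links);
 mysqli_stmt_execute($stmt);
 mysqli_stmt_close($stmt);
 mysqli_close($connect);
?>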

In this step we create a table called 'webpage_details' to store the webpage details extracted by our crawler. First we set the webpage URL and fetch its content using the file_get_contents() function, and then we use a regular expression to get the webpage title. To get the webpage description we use the parse_url() function to build the site's URL and the get_meta_tags() function to read the description value. Finally, to get all the links present on the webpage, we create a DOMDocument object, load the HTML, get all the anchor tags using the getElementsByTagName() function, and use a foreach loop to read the value of each href attribute. After that we store all the details in our database table.
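As a rough sketch of the same extraction done in one pass, the helper below reads the title, meta description, and links from HTML that has already been downloaded, using DOMDocument only. The name extract_page_details() and the sample markup are made up for illustration and are not part of crawl.php.

```php
<?php
// Hypothetical helper (not part of crawl.php): extracts the title, meta
// description, and href values from an HTML string that is already in memory.
function extract_page_details(string $html): array
{
    $details = ['title' => '', 'description' => '', 'links' => []];
    if (trim($html) === '') {
        return $details; // loadHTML() rejects empty input, so stop early
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Title: text of the first <title> element, if there is one.
    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $details['title'] = trim($titleNodes->item(0)->textContent);
    }

    // Description: content of <meta name="description">, matched case-insensitively.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $details['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    // Links: every non-empty href found on an <a> element.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $details['links'][] = $href;
        }
    }

    return $details;
}

// Made-up sample page for demonstration.
$sample = '<html><head><title>Sample Site</title>'
        . '<meta name="description" content="A demo page"></head>'
        . '<body><a href="/about">About</a><a>No link</a>'
        . '<a href="http://example.org/">Elsewhere</a></body></html>';

$details = extract_page_details($sample);
echo $details['title'], "\n";
echo $details['description'], "\n";
echo implode(',', $details['links']), "\n";
```

Reading the description from the downloaded HTML avoids the second request that get_meta_tags() makes, and it describes the crawled page itself instead of the site's home page.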

That's all, this is how to create a simple web crawler using PHP and MySQL. You can customize this code further as per your requirements. Please feel free to leave comments on this tutorial.
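As one example of such a customization: the script above stores every href it finds, although the column is named internal_link. A minimal sketch, assuming that "internal" means a relative link or a link whose host matches the crawled URL, could filter the list before it is saved. filter_internal_links() is a hypothetical helper, not part of the original code.

```php
<?php
// Hypothetical helper: keeps only links that stay on the same host as $base_url.
// Assumption: a link with no host part (for example "/about") counts as internal.
function filter_internal_links(array $links, string $base_url): array
{
    $base_host = strtolower((string) parse_url($base_url, PHP_URL_HOST));
    $internal = [];

    foreach ($links as $link) {
        $link = trim($link);
        // Skip empty values, in-page anchors, and non-web schemes.
        if ($link === '' || $link[0] === '#'
            || preg_match('/^(mailto|tel|javascript):/i', $link)) {
            continue;
        }

        $host = parse_url($link, PHP_URL_HOST);
        // null means no host was present; false means the URL could not be parsed.
        if ($host === false) {
            continue;
        }
        if ($host === null || strtolower($host) === $base_host) {
            $internal[] = $link;
        }
    }

    // Drop duplicates and reindex the array from 0.
    return array_values(array_unique($internal));
}

$found = [
    '/about',
    'http://samplesite.com/contact',
    'http://example.org/',
    '#top',
    'mailto:me@samplesite.com',
    '/about',
];
$internal_links = filter_internal_links($found, 'http://samplesite.com');
echo implode(',', $internal_links), "\n";
```

In crawl.php the filtered array would take the place of $sec_url just before the implode() call.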




