How does a web crawler work

**Elbanco** · 09-01-2010

Hi,
I was just wondering how a popular search engine like Google works. All I have found is that these are called crawler-type search engines, or web robots. I need some detailed information on this. First, how does a search engine work and find the web page I want? Why are they called crawlers? Also, what is the basic working of this type of search engine, that is, how does a web crawler work? Thanks in advance.

**Jackson2** · 09-01-2010

Sometimes you want to go through a particular site thoroughly but do not have a high-speed connection to do so. To remedy this, you can download an entire site so you can view it when you are not connected. To do this, you use a web spider. A web spider reproduces all, or part, of a site on your hard drive. This means that the URL in the browser's address bar shows a path on your hard drive. You can navigate within the site down to the level of depth that you set when importing the site with the website copier.
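
As a rough illustration of why the address bar ends up showing a path on your hard drive, here is a minimal Python sketch of how a site copier might map each page URL to a local file. The `local_path` helper, the `mirror` folder name and the `example.test` URLs are all made up for the example; real tools have their own naming rules.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path(url, root="mirror"):
    """Map a page URL to the file path its saved copy would use.

    `root` is a hypothetical folder name for the downloaded site.
    Query strings are ignored to keep the sketch short.
    """
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))


print(local_path("http://example.test/docs/intro.html"))
# mirror/example.test/docs/intro.html
print(local_path("http://example.test/docs/"))
# mirror/example.test/docs/index.html
```

Links inside the saved pages would then be rewritten to point at these local paths, which is what lets you browse the copy offline.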

**Techno01** · 09-01-2010

A web crawler is a type of search engine process, also known as a web robot. It is a program or script that runs automatically on the internet. The script looks for the pages on the internet that match a query passed via the web browser. The most basic example is a search engine. The WWW is like a web, and search engines are like spiders that crawl on the web and fetch information.

**Trio** · 09-01-2010

Search engines mostly crawl sites every day to update their records in case any new content has been added. The search engine saves a copy of the pages it visits most so that it is easy to index them later. It is the most basic technique used on the internet and is widely successful.

**deveritt** · 09-01-2010

Here is how a web crawler works. First, when you enter a keyword in the Google search box, it searches for the web address, or URL. Then, to browse, users access that site via HTTP. This is the protocol all of us use to access web pages. Once the web server reads the request, you get the pages on your computer screen. This happens in a matter of seconds.

**Techguru01** · 09-01-2010

It is right to say that we depend entirely on a single protocol to access the internet. You can search for text, images, videos, etc. on the internet with the help of these crawlers. It is not easy to develop a search engine. A crawler-based search engine collects information such as the URL, title, meta tags, content and links. Once collected, the information is returned to you on your desktop.
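
To give an idea of what collecting the URL, title, meta tags, content and links could look like, here is a small Python sketch using only the standard library's `html.parser`. The `PageSummary` class, the `summarize` function and the sample page are invented for illustration, not taken from any real search engine.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Pulls out the title, meta tags, visible text and links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def summarize(url, html):
    """Return the fields a crawler might store for one page."""
    parser = PageSummary()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "meta": parser.meta,
        "content": " ".join(parser.text_parts),
        "links": parser.links,
    }


SAMPLE = """<html><head><title>Crawler notes</title>
<meta name="description" content="How crawlers work">
</head><body><p>Spiders follow links. <a href="/links">More</a></p></body></html>"""

record = summarize("http://example.test/notes", SAMPLE)
print(record["title"], record["meta"], record["links"])
```

A record like this is what would be handed on to the indexing step; the search results you see are built from such stored records rather than from a fresh visit to each site.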

**lunalovegod** · 11-01-2010

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms, Web spiders, Web robots, or (especially in the FOAF community) Web scutters.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
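
As a sketch of the "later processing" step, the snippet below builds a tiny inverted index (word to set of URLs) over some already-downloaded page text and answers queries from it. The page contents, the `build_index` and `search` names, and the "every word must match" rule are assumptions made for the example; real search engines are far more elaborate.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical text of pages a crawler has already copied.
PAGES = {
    "http://example.test/a": "Web crawlers follow links",
    "http://example.test/b": "Search engines index pages",
    "http://example.test/c": "Crawlers feed the search index",
}

index = build_index(PAGES)
print(search(index, "search index"))
# ['http://example.test/b', 'http://example.test/c']
print(search(index, "crawlers links"))
# ['http://example.test/a']
```

Because the lookup only touches the index, a query never has to re-read the downloaded pages, which is where the speed comes from.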

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
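
The seed and frontier idea can be sketched in a few lines of Python. This is only a minimal illustration under some assumptions: the "web" is a hypothetical in-memory dictionary (`FAKE_WEB`) instead of real HTTP requests, the only policy is plain breadth-first order with a page limit, and the loop uses a queue rather than literal recursion.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c#top">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the seeds.

    `fetch(url)` returns the HTML of a page, or None if it is unavailable.
    Returns the URLs in the order they were visited.
    """
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # URLs already queued, to avoid repeats
    visited = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

    return visited


order = crawl(["http://example.test/"], FAKE_WEB.get)
print(order)
```

A production crawler would add more policies on top of this, for example politeness delays between requests to the same host, honouring robots.txt, and rules for when to revisit a page.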