DepthFirstSearch

Following the last link that gets added - a depth-first search is an algorithm that, as it crawls web pages, follows the links on a page, adds the links it finds there to its list, and then follows the last link that was added. It therefore does not get to look at the second link on the first page until it has followed all of the links that can be reached from the last link on the first page.
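
As a rough sketch of the idea, the crawler below keeps its frontier of unfollowed links as a stack. The LINKS graph and the page names are made-up stand-ins for the web; a real crawler would fetch each page and parse out its links instead.

    # A made-up in-memory link graph standing in for real web pages.
    LINKS = {
        "page1": ["page2", "page3", "page4"],
        "page2": ["page5"],
        "page3": [],
        "page4": ["page6", "page7"],
        "page5": [],
        "page6": ["page3"],
        "page7": [],
    }

    def crawl_depth_first(start):
        visited = []
        stack = [start]                # frontier: links not yet followed
        while stack:
            page = stack.pop()         # always follow the LAST link added
            if page in visited:
                continue
            visited.append(page)
            stack.extend(LINKS[page])  # add this page's links to the frontier
        return visited

    print(crawl_depth_first("page1"))
    # ['page1', 'page4', 'page7', 'page6', 'page3', 'page2', 'page5']

Note how page2, the first link on page1, is visited nearly last: everything reachable from the later links is exhausted first.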

If the goal is to quickly build a good corpus from the web, doing a depth-first search would probably not be the best way to do it.

If a search can be completed, it will always find the same set of pages no matter what order it follows the links in. However, if it is not able to complete the search (and with a real web crawler there are far too many pages to wait until it finds them all before returning a result), then the order in which it visits the pages matters a lot.
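
As an illustrative sketch of this point (the crawl function below is an assumption for illustration, not any particular crawler's code, and it reuses the made-up LINKS graph above), swapping the frontier from a stack to a queue changes the visit order, but a search that runs to completion finds the same set of pages either way:

    from collections import deque

    def crawl(start, links, depth_first=True):
        """Traverse the link graph; the frontier type decides the order."""
        visited = []
        frontier = deque([start])
        while frontier:
            # pop() takes the newest link (stack: depth-first);
            # popleft() takes the oldest (queue: breadth-first).
            page = frontier.pop() if depth_first else frontier.popleft()
            if page in visited:
                continue
            visited.append(page)
            frontier.extend(links[page])
        return visited

    # Different orders, same set of pages once the search completes.
    print(crawl("page1", LINKS, depth_first=True))
    print(crawl("page1", LINKS, depth_first=False))
    assert set(crawl("page1", LINKS, True)) == set(crawl("page1", LINKS, False))

If the crawler were stopped after only a few pages, the two orders would have captured very different slices of the graph.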

Figure out ways to change the search order that would result in a better way of capturing content on the web.


Tags: crawler
