Web crawler software for Frontier.
Crawlers are also known as spiders. Same idea...
This suite crawls a website, downloading all files it can find to your local hard disk.
I wrote this suite because I needed to grab some sites for testing with Frontier 4.2.
Full source is provided; it makes good sample code for lots of other kinds of projects, including a remote site outliner, a database-oriented remote search engine, or a site loader that brings a remote site into Frontier's object database.
You are welcome to take this code apart and put it back together any way you like.
Download and install
The Crawler suite requires: Frontier 4.2, NetEvents, and the Tag Extraction Kit.
Please download the Crawler suite:
ftp://ftp.scripting.com/userland/crawler.sit.hqx or
http://ws3.scripting.com/ftpMirror/userland/crawler.sit.hqx
Double-click each of the Frontier files, clicking OK on all confirmation prompts.
Setup
Choose Crawler from Frontier's Suites menu. This adds a new menu to your menu bar called Crawl.
Choose Open Site List from the Crawl menu. An outline opens with a single entry, the URL to a website. This is the site that will be downloaded. You can add as many sites as you want to this list.
Set the folder to hold crawled sites using the Choose Folder command in the Crawl menu.
You may not want to download large files. Open the Exception list and add file suffixes that you don't want to download. Initially, we set it up so the crawler won't download files ending with .hqx or .bin.
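To make the suffix check concrete, here's a small sketch in Python (not the suite's actual UserTalk code) of how an exception list like this can be applied to a URL. The `.hqx` and `.bin` defaults come from the paragraph above; everything else is illustrative.

```python
import os.path
from urllib.parse import urlparse

# Default exceptions, as shipped: don't download Mac archive formats.
EXCEPTIONS = {".hqx", ".bin"}

def should_download(url):
    """Return False if the URL's file suffix is on the exception list."""
    path = urlparse(url).path          # ignore any query string
    suffix = os.path.splitext(path)[1].lower()
    return suffix not in EXCEPTIONS
```

A URL like `crawler.sit.hqx` is skipped because only the final suffix (`.hqx`) is compared, which matches how suffix-based filters usually behave.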
Get ready to crawl!
It's time. Let's do it.
Choose the Crawl! command from the Crawl menu. The NetEvents app launches. A log outline window opens. URLs start piling up. In the Finder, the website starts materializing. It's methodical. It gets everything and puts it in the right place. I love watching this thing!
Eventually the spider finishes, and the formerly remote site is now on my hard disk.
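The methodical behavior described above is the classic crawl loop: take a URL off the queue, fetch it, save it under a local path that mirrors its URL path, and queue any same-site links it contains. Here's a minimal sketch of that loop in Python rather than UserTalk; the `fetch` callable and the in-memory mirror are assumptions for illustration, not how the suite is actually structured.

```python
import re
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch):
    """Sketch of a crawl loop. fetch(url) -> page text.
    Returns {local_path: page_text}, mirroring the site's URL paths."""
    site = urlparse(start_url).netloc
    queue, seen, mirror = [start_url], {start_url}, {}
    while queue:
        url = queue.pop(0)
        text = fetch(url)
        path = urlparse(url).path or "/"
        if path.endswith("/"):
            path += "index.html"       # a folder URL maps to its default page
        mirror[path.lstrip("/")] = text
        # Queue every same-site link we haven't seen yet.
        for href in re.findall(r'href="([^"]+)"', text, re.I):
            link = urljoin(url, href)
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return mirror
```

Because every fetched URL goes into `seen` before it is queued, the loop terminates even when pages link back to each other, and each file lands at the local path matching its place on the site.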
A couple of crawler logs are on the Sample Crawler Log page.
Other options
Choose the Open Crawler Data command from the Crawl menu. A table, user.crawler, opens. You can use Frontier's table editor to change many of the parameters for crawling. Here are some notes.
Excellence!
The Crawler suite builds on the work of Chuck Shotton, Arnold Lesikar, Danis Georgiadis and Brent Simmons.
It's a community at work producing excellent results.
That's realllly coooool.
Thanks!
© Copyright 1996-97 UserLand Software. This page was last built on 5/7/97; 1:16:47 PM.
It was originally posted on 2/18/97; 5:38:49 PM.
Internet service provided by Conxion.