Checkbot
Checkbot is a tool to verify links on a set of HTML
pages. Checkbot can check a single document, or a set of documents
on one or more servers. Checkbot creates a report summarizing
all links that caused some kind of warning or error.
Getting Checkbot
Current releases of Checkbot can always be found on CPAN,
the Comprehensive Perl Archive Network. The latest version is
checkbot-1.80.tar.gz.
Checkbot is now hosted on sourceforge.net.
Changes in version 1.80 (15-Oct-2008)
- Fix handling of nofollow robots tag.
- Require newer version of LWP for better handling of character
encodings.
- Ignore mms scheme.
- Minor clarification in output.
Changes in version 1.79 (3-Feb-2007)
- Correctly parse documents to avoid problems with UTF-8
documents. This avoids the "Parsing of undecoded UTF-8 will give
garbage when decoding entities" messages.
- Allow regular expressions in the suppression file, and complain if
the suppression file is not a proper file.
- More robust handling of HTTP and FTP servers that have problems
responding to HEAD requests.
- Use the original URL to report problems.
- Ensure XHTML compliance.
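
To illustrate the idea of a suppression file that accepts regular expressions, here is a minimal sketch in Python (not Checkbot's own Perl code). The line format is an assumption made for this example: a status code, whitespace, then either a literal URL or a pattern enclosed in forward slashes. The function names are hypothetical, and Python's regex dialect differs from Perl's in places; consult the Checkbot documentation for the real format.

```python
import re


def parse_suppressions(text):
    """Parse suppression lines into (status, matcher) pairs.

    Assumed line format (for illustration only):
        <status code> <URL>        exact URL match
        <status code> /<regex>/    regular-expression match
    Blank lines and lines starting with '#' are skipped.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"malformed suppression line: {raw!r}")
        status, target = parts
        if len(target) >= 2 and target.startswith("/") and target.endswith("/"):
            # Strip the enclosing slashes and compile the pattern.
            rules.append((status, re.compile(target[1:-1])))
        else:
            rules.append((status, target))
    return rules


def is_suppressed(rules, status, url):
    """Return True if a (status, url) problem matches any rule."""
    for rule_status, matcher in rules:
        if rule_status != str(status):
            continue
        if isinstance(matcher, str):
            if matcher == url:
                return True
        elif matcher.search(url):
            return True
    return False
```

A problem is dropped from the report only when both the status code and the URL (or pattern) match, so a suppressed 403 for a URL does not hide a later 404 for the same URL.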
Changes in version 1.78 (3-May-2006)
- Don't throw errors for links that cannot be expected to be valid
all the time (e.g. the classid attribute of an object element)
- Better fallbacks for some cases where the HEAD request does not
work
- Add more classes and ids to allow more styling of results pages
(including example CSS file)
- Ensure XHTML compliance
- Better checks for optional dependencies
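
The HEAD fallback mentioned above can be sketched roughly as follows. This is a Python illustration of the general technique, not Checkbot's implementation: the set of status codes that trigger a retry, the function names, and the injectable `opener` parameter are all assumptions made for this example.

```python
import urllib.error
import urllib.request

# Status codes treated here as "HEAD may not be supported". This set is an
# assumption for the sketch, not Checkbot's actual list.
HEAD_SUSPECT = {400, 403, 405, 500, 501}


def fetch_status(url, method, opener=urllib.request.urlopen, timeout=30):
    """Issue one request with the given method and return the status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still the answer.
        return err.code


def check_link(url, opener=urllib.request.urlopen):
    """Try HEAD first; retry with GET if the HEAD result looks unreliable."""
    status = fetch_status(url, "HEAD", opener)
    if status in HEAD_SUSPECT:
        status = fetch_status(url, "GET", opener)
    return status
```

HEAD is tried first because it avoids transferring the body; the GET retry costs more bandwidth but avoids reporting a working link as broken when the server mishandles HEAD.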
Changes in version 1.77 (28-Jul-2005)
- Fix silly build-related problem that prevented checkbot 1.76
from running at all.
- Check for presence of robots meta tag and act on it.
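
As a rough illustration of acting on a robots meta tag, the following Python sketch decides whether the links on a page may be followed. It is not Checkbot's code: the class and function names are hypothetical, and treating "none" as implying "nofollow" is a common convention assumed here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_follow_links(html_text):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return not ({"nofollow", "none"} & parser.directives)
```

A page without any robots meta tag yields an empty directive set, so its links are followed by default.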
Changes in version 1.76 (25-Jul-2005)
- Error reports now include the page title for easier
identification.
- javascript: links are now ignored because they cannot be
checked.
- Documentation updates.
Changes in version 1.75 (22-Apr-2004)
- New --cookies option to accept cookies from servers while
checking.
- New --noproxy option indicates which domains should not be passed
through the proxy.
- New error code for unknown schemes; only known non-checkable
schemes are ignored now.
- Minor bug fixes.
- Documentation updates.
Changes in version 1.74 (17-Dec-2003)
- New --suppress option allows specific response code/URL combinations
not to be reported as problems.
- Checkbot warnings are now handled as pseudo-HTTP status messages so
that they can make use of all Checkbot features such as --dontwarn.
- Option --allow-simple-hosts is deprecated due to this change.
- More robust handling of (lack of) status messages.
- Checkbot now requires LWP 5.70 because of bug fixes in that LWP
release, although it should still work with older LWP versions.
- Documentation fixes.
Changes in version 1.73 (31-Aug-2003)
- Checkbot now tries to produce valid XHTML 1.1
- URLs matching the --ignore option are now completely ignored; they
used to be checked but not reported.
- Proxy support works again, but --proxy now applies to all
links
- Documentation fixes
Changes in version 1.72 (04-May-2003)
- URLs with query strings are now checked by default; the --exclude
option can be used to revert to the previous behavior
- The server results page contains shortcut links to each section
- Removed warning for unqualified hostnames for news: URLs
- Handling of signals such as SIGINT
- Bug and documentation fixes
Changes in version 1.71 (29-Dec-2002)
- New --filter option allows rewriting of URLs before they
will be checked
- Problematic links are now reported for each page on which
they occur
- New statistics which should work correctly
- Much simplified storage of information on problem links
- Duplicate links are now properly detected and not checked twice
- Rewritten internals for link checking; as a consequence,
internal and external links are now checked at the same time,
not in two passes as before
- Rewritten internals for message output
- A simple test case for 'make test'
- Minor cleanups of the code
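
The combination of URL rewriting and duplicate detection described in this release can be sketched as follows. This is a Python illustration under stated assumptions, not Checkbot's implementation: the function name `build_queue` and its `pattern`/`replacement` parameters are hypothetical stand-ins for whatever the --filter option supplies.

```python
import re
from collections import deque


def build_queue(urls, pattern=None, replacement=""):
    """Rewrite each URL (if a pattern is given), then queue each distinct URL once."""
    seen = set()
    queue = deque()
    for url in urls:
        if pattern is not None:
            # Apply the rewrite before the duplicate check, so URLs that
            # differ only in the filtered part collapse into one entry.
            url = re.sub(pattern, replacement, url)
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return queue
```

For example, stripping a session identifier with the pattern `\?sid=\d+$` makes two links to the same page compare equal, so the page is checked only once.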
Dependencies
Checkbot requires the following additional software, all of
which is available on CPAN. Try running the 'cpan' command
to install the modules into your Perl installation.
- perl 5 (version 5.8 recommended)
- LWP 5.803 (the libwww-perl 5 module)
- HTML::Parser 3.33
- URI 1.10
- Net::FTP 2.58 (available in the libnet package)
- Mail::Send 1.03 (optional, needed to use --mailto option,
available in the MailTools package)
- Time::Duration (optional, used to show elapsed time in the
reports)
In general, it is recommended to always use the most recent version of
these modules.
Example
I usually keep an example of a Checkbot
run around.
Additional information
Contact information
Please report bugs and submit patches at the SourceForge project page.
Announcements mailing list
There is an announcement
mailing list to which announcements of new versions are
posted.
Hans de Graaff
Last modified: Wed Oct 15 14:22:04 CEST 2008