### Site Checker

[![Gem Version](https://badge.fury.io/rb/site_checker.png)](http://badge.fury.io/rb/site_checker)
[![Build Status](https://travis-ci.org/ZsoltFabok/site_checker.png)](https://travis-ci.org/ZsoltFabok/site_checker)
[![Dependency Status](https://gemnasium.com/ZsoltFabok/site_checker.png)](https://gemnasium.com/ZsoltFabok/site_checker)
[![Code Climate](https://codeclimate.com/github/ZsoltFabok/site_checker.png)](https://codeclimate.com/github/ZsoltFabok/site_checker)
[![Coverage Status](https://coveralls.io/repos/ZsoltFabok/site_checker/badge.png)](https://coveralls.io/r/ZsoltFabok/site_checker)

Site Checker is a simple Ruby gem that helps you check the integrity of your website by recursively visiting the referenced pages and images. I use it in my test environments to make sure that my websites don't have any dead links.

### Install

    gem install site_checker

### Usage

#### In Test Code

First, load `site_checker` by adding this line to the file where you would like to use it:

    require 'site_checker'

If you want to use it for testing, the line should go in `test_helper.rb`.

The usage is quite simple:

    check_site("http://localhost:3000/app", "http://localhost:3000")
    puts collected_remote_pages.inspect
    puts collected_local_pages.inspect
    puts collected_remote_images.inspect
    puts collected_local_images.inspect
    puts collected_problems.inspect

The snippet above opens the `http://localhost:3000/app` link and looks for links and images. If it finds a link to a local page, it recursively checks that page, too. The second argument, `http://localhost:3000`, defines the root URL of your website.

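To illustrate the idea of recursive checking, here is a minimal, self-contained sketch. It is not the gem's implementation: `sketch_check` and the in-memory `pages` hash are hypothetical names, and a real checker would fetch and parse HTML instead of reading a hash. The sketch assumes that a link counts as local when it starts with the root URL.

```ruby
require 'set'

# Hypothetical stand-in for a checker: `pages` maps each URL to the links
# found on that page. Every link is sorted into local or remote based on the
# `root` prefix (an assumption), and unvisited local pages are visited recursively.
def sketch_check(pages, url, root, result = { local: Set.new, remote: Set.new, problems: Set.new })
  result[:local] << url
  pages.fetch(url).each do |link|
    if !link.start_with?(root)
      result[:remote] << link      # outside the root: recorded, not followed
    elsif !pages.key?(link)
      result[:problems] << link    # local link with no page behind it
    elsif !result[:local].include?(link)
      sketch_check(pages, link, root, result)
    end
  end
  result
end

# Made-up site used only for this illustration.
pages = {
  "http://localhost:3000/app" => [
    "http://localhost:3000/app/about",
    "http://example.com/",
    "http://localhost:3000/app/missing"
  ],
  "http://localhost:3000/app/about" => ["http://localhost:3000/app"]
}

result = sketch_check(pages, "http://localhost:3000/app", "http://localhost:3000")
puts result[:local].to_a.inspect
puts result[:remote].to_a.inspect
puts result[:problems].to_a.inspect
```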
If you don't want to use the DSL-like API, you can call the module methods directly:

    SiteChecker.check("http://localhost:3000/app", "http://localhost:3000")
    puts SiteChecker.remote_pages.inspect
    puts SiteChecker.local_pages.inspect
    puts SiteChecker.remote_images.inspect
    puts SiteChecker.local_images.inspect
    puts SiteChecker.problems.inspect

##### Using on Generated Content
If you have a static website (e.g. generated by [Octopress](https://github.com/imathis/octopress)), you can tell `site_checker` to use folders from the file system. With this approach, you don't need a web server to verify your website:

    check_site("./public", "./public")
    puts collected_problems.inspect

##### Configuration
You can instruct `site_checker` to ignore certain links:

    SiteChecker.configure do |config|
      config.ignore_list = ["/", "/atom.xml"]
    end

By default, it does not check the status of remote links and images (e.g. whether they return 404 or 500), but you can enable this:

    SiteChecker.configure do |config|
      config.visit_references = true
    end

Deep recursion can be expensive, so you can limit the maximum recursion depth with the following attribute:

    SiteChecker.configure do |config|
      config.max_recursion_depth = 3
    end

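The sketch below shows how an ignore list and a depth limit might interact during a crawl. It is only an illustration under stated assumptions: `sketch_crawl` and the `links` hash are hypothetical, and the gem may count recursion depth differently (here the starting page is depth 0).

```ruby
# Hypothetical crawl over an in-memory link graph (`links` maps a path to the
# paths it links to). Ignored paths are skipped entirely, and pages at
# `max_depth` are recorded but their links are not followed.
def sketch_crawl(links, path, ignore_list:, max_depth:, depth: 0, visited: [])
  return visited if ignore_list.include?(path) || visited.include?(path)

  visited << path
  return visited if depth >= max_depth

  links.fetch(path, []).each do |link|
    sketch_crawl(links, link, ignore_list: ignore_list, max_depth: max_depth,
                 depth: depth + 1, visited: visited)
  end
  visited
end

# Made-up site structure used only for this illustration.
links = {
  "/" => ["/blog", "/atom.xml"],
  "/blog" => ["/blog/post"],
  "/blog/post" => ["/blog/post/comments"],
  "/blog/post/comments" => []
}

visited = sketch_crawl(links, "/", ignore_list: ["/atom.xml"], max_depth: 2)
puts visited.inspect
```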
##### Examples
Make sure that there are no dead local links on the website (I'm using [RSpec](https://github.com/rspec/rspec) syntax):

    before(:each) do
      SiteChecker.configure do |config|
        config.ignore_list = ["/atom.xml", "/rss"]
      end
    end

    it "should not have dead local links" do
      check_site("http://localhost:3000", "http://localhost:3000")
      # on failure this prints the collected problems, so I don't have to re-run with print statements
      collected_problems.should be_empty
    end

Check that all the local pages can be reached in at most two steps:

    before(:each) do
      SiteChecker.configure do |config|
        config.ignore_list = ["/atom.xml", "/rss"]
        config.max_recursion_depth = 2
      end

      @number_of_local_pages = 100
    end

    it "all the local pages have to be visited" do
      check_site("http://localhost:3000", "http://localhost:3000")
      collected_local_pages.size.should eq @number_of_local_pages
    end

#### Command line
From version 0.3.0, `site_checker` can also be used from the command line. Here is the list of available options:

    ~ % site_checker -h
    Visits the <site_url> and prints out the list of those URLs which cannot be found

    Usage: site_checker [options] <site_url>
    -e, --visit-external-references  Visit external references (may take a bit longer)
    -m, --max-recursion-depth N      Set the depth of the recursion
    -r, --root URL                   The root URL of the path
    -i, --ignore URL                 Ignore the provided URL (can be applied several times)
    -p, --print-local-pages          Prints the list of the URLs of the collected local pages
    -x, --print-remote-pages         Prints the list of the URLs of the collected remote pages
    -y, --print-local-images         Prints the list of the URLs of the collected local images
    -z, --print-remote-images        Prints the list of the URLs of the collected remote images
    -h, --help                       Show a short description and this message
    -v, --version                    Show version

### Troubleshooting
#### undefined method 'new' for SiteChecker:Module
This error occurs when the test code calls v0.1.1 methods, but a newer version of the gem has already been installed. Update your test code following the examples above.

### Copyright

Copyright (c) 2013 Zsolt Fabok and Contributors. See LICENSE for details.