                    ____           __
                   / __/__  ____ _/ /___  __________
                  / /_/ _ \/ __ `/ __/ / / / ___/ _ \
                 / __/  __/ /_/ / /_/ /_/ / /  /  __/
                /_/  \___/\__,_/\__/\__,_/_/   \___/
                    __                __
                   / /_  __  ______  / /____  _____
                  / __ \/ / / / __ \/ __/ _ \/ ___/
                 / / / / /_/ / / / / /_/  __/ /
                /_/ /_/\__,_/_/ /_/\__/\___/_/



A Python module that trawls music websites, detects changes in their lists of feature albums and sends notifications by email

For Scrapy to work, you're going to have to install a couple of packages; this guide explains it all


Clone this repository and cd into it

    git clone
    cd feature-hunter/


Install and test the Python package

    sudo python install
    python test/

Play with some databases (an example database is provided)

    cp example_db.json ~
    cd ~
    python -m feature_hunter --db example_db.json

Schedule it in your crontab to receive alerts (this entry runs once an hour, on the hour):

    0 * * * * python -m feature_hunter --db ~/example_db.json --enable-alerts --smtp-host --smtp-port 465 --smtp-pass <email_password> --smtp-sender <your_email> --smtp-domain <your_domain>

Targets are configured by modifying the `targets` table of the database file. Here's the example DB, which reads feature albums from the Triple J website:

    "targets": {
        "1": {
            "url": "",
            "record_spec": "{\"css\": \"div.podlist_item\"}",
            "field_specs": "{\"album\": {\"regex\": \" - \\\\s*(\\\\S[\\\\s\\\\S]+\\\\S)\\\\s*$\", \"css\": \"div.text div.title::text\"}, \"artist\": {\"regex\": \"^\\\\s*(\\\\S[\\\\s\\\\S]+\\\\S)\\\\s* - \", \"css\": \"div.text div.title::text\"}}",
            "name": "triplej"
        }
    }


It's JSON within JSON (JSON all the way down), so quote characters and backslashes have to be backslash-escaped. That makes it easier to create your own database using `feature_hunter.db.DBWrapper.insert_target()`, but if I get enough interest in this repo, I'll add something to make the targets easier to enter into the database.
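
To see where the extra backslashes come from, here is a minimal sketch that builds the same entry with nothing but the standard `json` module. It does not call `insert_target()`, since its signature isn't shown here; the variable names are only for illustration, and the `url` is left blank as in the example above.

```python
import json

# Specs written as ordinary Python dicts; raw strings keep the regexes readable.
record_spec = {"css": "div.podlist_item"}
field_specs = {
    "album": {
        "css": "div.text div.title::text",
        "regex": r" - \s*(\S[\s\S]+\S)\s*$",
    },
    "artist": {
        "css": "div.text div.title::text",
        "regex": r"^\s*(\S[\s\S]+\S)\s* - ",
    },
}

# Inner encoding: each spec is stored as a JSON string.
target = {
    "url": "",  # left blank, as in the example DB above
    "record_spec": json.dumps(record_spec),
    "field_specs": json.dumps(field_specs),
    "name": "triplej",
}

# Outer encoding: serialising the table escapes the inner quotes and doubles
# the backslashes a second time.
db_text = json.dumps({"targets": {"1": target}}, indent=4)
print(db_text)

# Decoding twice recovers the original specs.
decoded = json.loads(db_text)["targets"]["1"]
print(json.loads(decoded["field_specs"]) == field_specs)  # True
```

Writing the specs as dicts and letting `json.dumps` do both layers of escaping avoids hand-counting backslashes.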

In this example, our target webpage looks something like this:

    <!-- ... -->
    <div id="two_col">
        <h2 id="latest">latest feature albums</h2>

        <!--item start-->
        <div class="podlist_item">
            <a href=""><img width="300" height="300" alt="Banks - The Altar" src=""></a>
            <div class="text" style="height: 66px;">
                <div class="title">Banks - The Altar</div>
                Following up her 2013 debut <i>Goddess</i>, the L.A. singer pushes personal boundaries with her alt-pop R&amp;B sound.
            </div>
            <a href="" class="more">More</a>
            <div class="clear"></div>
        </div>
        <!--item end-->
        <!-- ... -->
    </div>
    <!-- ... -->

We want to target every `<div class="podlist_item">` as a record, using the CSS target spec `div.podlist_item` (XPath targeting is also supported). To obtain the fields `album` and `artist` from each record, we apply another CSS target spec, `div.text div.title::text`. Since the format of the title is `<artist> - <album>`, we then pick each field out of this text element with a regular expression: ` - \s*(\S[\s\S]+\S)\s*$` for the album and `^\s*(\S[\s\S]+\S)\s* - ` for the artist.
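
As a quick check of the two regular expressions, this small sketch runs them against the title text from the example page using only the standard `re` module. The helper name `split_title` is made up for illustration and is not part of feature_hunter.

```python
import re

# The two field regexes from the example target, written as raw strings.
ALBUM_RE = re.compile(r" - \s*(\S[\s\S]+\S)\s*$")
ARTIST_RE = re.compile(r"^\s*(\S[\s\S]+\S)\s* - ")


def split_title(title):
    """Return (artist, album) from an '<artist> - <album>' string, or None."""
    artist = ARTIST_RE.search(title)
    album = ALBUM_RE.search(title)
    if artist is None or album is None:
        return None
    return artist.group(1), album.group(1)


print(split_title("Banks - The Altar"))  # ('Banks', 'The Altar')
```

Note that each capture group is `\S[\s\S]+\S`, so it needs at least three characters; an artist or album name shorter than that will not match these particular patterns.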

That's all you need to specify a target: a CSS or XPath target spec for the records, and a CSS or XPath target spec for each field. The regex is optional, and not needed if your fields are already separated in the HTML.

You may need to fiddle with your mail settings to get mail to work. At the moment it connects to localhost as a plaintext SMTP server, so if you're using macOS you'll have to follow this guide:

If I get enough interest I'll write an SSL SMTP client, because sending credentials in plaintext is bad
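
For a rough idea of what "plaintext SMTP to localhost" means in practice, here is a minimal sketch using the standard `smtplib` and `email` modules. It is not feature_hunter's own mail code: it assumes Python 3, a mail server listening on localhost port 25, and the function names, subject line and addresses are all placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, new_albums):
    """Build an alert email listing (artist, album) pairs, one per line."""
    msg = EmailMessage()
    msg["Subject"] = "feature-hunter: %d new feature album(s)" % len(new_albums)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "\n".join("%s - %s" % (artist, album) for artist, album in new_albums)
    )
    return msg


def send_alert(msg, host="localhost", port=25):
    """Send over an unencrypted SMTP connection (no TLS, no login)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


msg = build_alert("me@example.com", "me@example.com", [("Banks", "The Altar")])
# send_alert(msg)  # uncomment once a local SMTP server is running
```

Building the message separately from sending it makes it easy to inspect the alert before any mail server is configured.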

 - [x] Correctly identify changes in targets specified in database
 - [ ] Interface to easily add targets to database
 - [ ] Send alerts when changes are detected
 - [ ] Get rid of ScrapyDeprecationWarning