bomquote/transistor

View on GitHub
examples/books_to_scrape/__init__.py

Summary

Maintainability
A
0 mins
Test Coverage
# -*- coding: utf-8 -*-
"""
transistor.examples.books_to_scrape
~~~~~~~~~~~~
This module implements a full working example for scraping a list of book titles on
the books.toscrape.com website, with the book titles themselves ingested from a
column in the `book_titles.xlsx` workbook.

- scraper.py
    - subclasses SplashScraper to create a customized BooksToScrapeScraper
    - see SplashScraper in transistor.scrapers.abstract.splash_scraper_abc

- workgroup.py
    - subclasses BaseWorker to create BooksWorker
    - sublcasses BaseGroup to create BooksToScrapeGroup
    - see BaseWorker and BaseGroup in transistor.workers.abstract

- manager.py
    - sublasses BaseWorkGroupManager to create BooksWorkGroupManager
    - see BaseWorkGroupManager in transistor.managers.manager_abc

- main.py
    - Entry point to run the books_to_scrape example

- settings.py
    - An example application configuration, for database URI's and other attributes

- persistence/serialization.py
    - classes to serialize data from a scrape object in a suitable format for newt.db

- persistence/newt_db.py
    - an example of setting up a newt.db connection, which we name ndb


:copyright: Copyright (C) 2018 by BOM Quote Limited
:license: The MIT License, see LICENSE for more details.
~~~~~~~~~~~~
"""