doc/notebooks/feature_extraction.rst from DynaSlum/satsense

doc/notebooks/feature_extraction.rst
Summary

Maintainability

Test Coverage

Issues

Feature Extraction Example
--------------------------

In this example we will extract the Histogram of Gradients (HoG),
Normalized Difference Vegetation Index (NDVI) and the Pantex features
from a test satelite image.

-  The HoG feature captures the distribution of structure orientations.
-  The NDVI feature captures the level of vegetation.
-  The Pantex feature captures the level of built-up structures.

The image will be split into blocks, in this example 20 by 20 pixels,
and each feature is calculated for this block using a certain amount of
context information called a window. A feature can be calculated on
multiple windows to allow for context at different scales.

In this example
~~~~~~~~~~~~~~~

-  First we will define the Features we would like to extract and with
   which window shapes.
-  We will then load the image using the ``Image`` class.
-  Then we will split the image into blocks using the ``FullGenerator``
   Class.
-  Then we will extract the features using the ``extract_features``
   function.

Live iPython Notebook
^^^^^^^^^^^^^^^^^^^^^

If you are reading this example on readthedocs.io a notebook of this
example is available `in the
repository <https://github.com/DynaSlum/satsense/blob/master/notebooks/FeatureExtraction/feature_extraction.ipynb>`__

.. code:: ipython3

    # General imports
    import numpy as np

    import matplotlib.pyplot as plt
    import matplotlib.gridspec as gridspec
    %matplotlib inline

    # Satsense imports
    from satsense import Image
    from satsense.generators import FullGenerator
    from satsense.extract import extract_features
    from satsense.features import NirNDVI, HistogramOfGradients, Pantex

Define the features to calculate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First we define a list of windows for each of the features to use.

Hog and Pantex will be calculated on 2 windows of 25x25 pixels and 23x27
pixels. NDVI will be calculated on one window with 37x37 pixels.

These window shapes are chose arbitrarily to show the capabilities of
satsense, for your own feature extraction you should think and
experiment with these window shapes to give you the best results.

N.B. The NDVI feature here is called NirNDVI because that implementation
uses the near-infrared band of the image, there are several other
implementations of NDVI available in satsense, see `the
documentation <https://satsense.readthedocs.io/en/latest/api/satsense.features.html>`__

.. code:: ipython3

    # Multiple windows
    two_windows = [(25, 25), (23, 37)]
    # Single window
    one_window = [(37, 37),]
    features = [
        HistogramOfGradients(two_windows),
        NirNDVI(one_window),
        Pantex(two_windows),
    ]

Load the image
~~~~~~~~~~~~~~

Here we load the image and normalize it to values between 0 and 1.
Normalization by default is performed per band using the 2nd and 98th
percentiles.

The image class can provide the individual bands, or a number of useful
derivatives such as the RGB image or Grayscale, we call these base
images. More advanced base images are also available, for instance Canny
Edge

.. code:: ipython3

    image = Image('../../test/data/source/section_2_sentinel.tif',
                    'quickbird')
    image.precompute_normalization()

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8), sharey=True)

    ax1.axis('off')
    ax1.imshow(image['rgb'])
    ax1.set_title('RGB image')

    ax2.axis('off')
    ax2.imshow(image['grayscale'], cmap="gray")
    ax2.set_title('Grayscale image')

    ax3.axis('off')
    ax3.imshow(image['canny_edge'], cmap="gray")
    ax3.set_title('Canny Edge Image')

    plt.show()



.. image:: feature_extraction_files/feature_extraction_5_0.png


Generator
~~~~~~~~~

Next we create a FullGenerator which creates patches of the image in
steps of 20x20 pixels.

In this cell we also show the images, therefore we load the rgb base
image into the generator. This is only needed here so we can show the
blocks using matplotlib. In the next section we will be using the
``extract_features`` function to extract features, which will be loading
the correct base images for you based on the features that will be
calculated.

The patch sizes are determined by the list of window shapes that you
supply the ``load_image`` function. This is normally also provided by
the ``extract_features`` function.

.. code:: ipython3

    generator = FullGenerator(image, (20, 20))
    print("The generator is {} by {}".format(*generator.shape), " blocks")

    # Create a gridspec to show the images
    gs = gridspec.GridSpec(*generator.shape)
    gs.update(wspace=0.05, hspace=0.05)

    # Load a baseimage into the generator.
    # The window is the same as the block size to show the blocks used
    generator.load_image('rgb', ((20, 20),))

    fig = plt.figure(figsize=(8, 8))
    for i, img in enumerate(generator):
        ax = plt.subplot(gs[i])
        ax.imshow(img.filled(0.5))
        ax.axis('off')


.. parsed-literal::

    The generator is 8 by 8  blocks



.. image:: feature_extraction_files/feature_extraction_7_1.png


Calculate all the features and append them to a vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this cell we use the ``extract_features`` function from satsense to
extract all features.

``extract_features`` returns a python generator that we can loop over.
Each invocation of this generator returns the feature vector for one
feature in the order of the features list. The shape of this vector is
(x, y, w, v) where:

    - x is the number of blocks of the generator in the x direction
    - y is the number of blocks of the generator in the y direction
    - w is the number of windows the feature is calculated on
    - v is the length of the feature per window

We use a little numpy reshaping to merge these feature vectors into a
single feature vector of shape (x, y, n) where n is the total length of
all features over all windows. In this example it will be (8, 8, 13)
because:

    - HoG has 5 numbers per window and 2 windows:   10
    - NirNDVI has 1 number per window and 1 window:  1
    - Pantex has 1 number per window and2 windows:   2
    -                                        Total: 13

.. code:: ipython3

    vector = []
    for feature_vector in extract_features(features, generator):
        # The shape returned is (x, y, w, v)
        # Reshape the resulting vector so it is (x, y, w * v)
        # e.g. flattened along the windows and features
        data = feature_vector.vector.reshape(
                    *feature_vector.vector.shape[0:2], -1)
        vector.append(data)
    # dstack reshapes the vector into and (x, y, n)
    # where n is the total length of all features
    featureset = np.dstack(vector)

    print("Feature set has shape:", featureset.shape)


.. parsed-literal::

    Feature set has shape: (8, 8, 13)


Showing the resulting features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below we show the results for the calculated features.

In the result images you can see the edges of the feature vector have
been masked as the windows at the edge of the original image contain
masked values. Furthermore, please keep in mind that the value for the
feature in each block depends on an area around the block.

HoG
^^^

Here is the result of the HoG feature, we display the first value for
each window.

Histogram of Gradients is a feature that first calculates a histogram of
the gradient orientations in the window. Using this histogram 5 values
are calculated. This first value is the 1st heaved central shift moment.
Heaved central shift moments are a measure of spikiness of a histogram.

The other values are: the 2nd heaved central shift moment, the
orientation of the highest and second highest peaks and the sine of the
absolute difference between the highest and second highest peak (this is
1 for right angles).

.. code:: ipython3

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))

    ax1.axis('off')
    ax1.imshow(image['rgb'])
    ax1.set_title('Input image')

    ax2.axis('off')
    ax2.imshow(featureset[:, :, 0], cmap="PRGn")
    ax2.set_title('Hog Feature for window {}'.format(two_windows[0]))

    ax3.axis('off')
    ax3.imshow(featureset[:, :, 5], cmap="PRGn")
    ax3.set_title('Hog Feature for window {}'.format(two_windows[1]))

    plt.show()



.. image:: feature_extraction_files/feature_extraction_11_0.png


Normalized Difference Vegetation Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we show the result for the NDVI feature The NDVI feature captures
the level of vegetation, purple means very little vegetation, green
means a lot of vegetation.

.. code:: ipython3

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))

    ax1.axis('off')
    ax1.imshow(image['rgb'])
    ax1.set_title('Input image')

    ax2.axis('off')
    ax2.imshow(featureset[:, :, 10], cmap="PRGn")
    ax2.set_title('NirNDVI for window {}'.format(one_window[0]))

    plt.show()



.. image:: feature_extraction_files/feature_extraction_13_0.png


Pantex
^^^^^^

Here we show the results for the Pantex feature. The Pantex feature
captures the level of built-up structures, purple means very little
built-up while green means very built-up.

.. code:: ipython3

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))

    ax1.axis('off')
    ax1.imshow(image['rgb'])
    ax1.set_title('Input image')

    ax2.axis('off')
    ax2.imshow(featureset[:, :, 11], cmap="PRGn")
    ax2.set_title('Pantex for window {}'.format(two_windows[0]))

    ax3.axis('off')
    ax3.imshow(featureset[:, :, 12], cmap="PRGn")
    ax3.set_title('Pantex for window {}'.format(two_windows[1]))

    plt.show()



.. image:: feature_extraction_files/feature_extraction_15_0.png