Radar internals
===============

Radar has been carefully designed to keep its code base clean and
understandable, so everyone can take a look at its internals (and hopefully
play with the code).

This section of the documentation tries to expose the main ideas that were
implemented to make this project possible. We won't describe every single
detail because that would take a huge amount of time and you would get bored.
Instead I've decided to describe as few things as possible and to explain
why each decision was taken. Also consider that, like everybody else, I make
mistakes. No perfect software design exists, and Radar is a long way
from achieving that.


Overview
--------

Radar is designed to be a small tool. Its core isn't intended to grow
indefinitely, apart from gaining some currently lacking features. The reason
behind this is that a tool that is small and controlled in its size and its
objectives is easier to understand and does its work better than an
all-problem-solving solution.
This also has a downside : a small tool may not offer advanced or complex
features. Radar's main goal is to be a simple and easy to use tool, which is
why you might not find as many features as other solutions offer.

Radar makes use of object oriented programming : every component is modeled
using a class. A few classes make use of mixins and all errors are
handled through exceptions. Radar also makes heavy use of list comprehensions
across the project.

If you take a quick look at the code you'll realize that almost every method
is only a few lines long. Every class is intended to perform a specific task
and each method solves a concrete piece of that task.
The result is that you won't find complex or twisted code, and reading any
piece of code and getting the idea of what it is doing should take little time.
The code mostly lacks comments because it is intended to be self-describing
(care has been taken to make classes and methods describe and reflect their
intentions). Radar tries to stick to this rule.


Project layout
--------------

Radar has the following project structure :

.. code-block:: bash

    /Radar
        /docs                # Includes project's documentation in reStructuredText.
                             # Sphinx is used for documentation generation.

        /scripts             # Launch scripts of Radar server and client.
                             # Configuration scripts for Radar server and client.

        /init_scripts        # Init scripts for different operating systems.
        /tests               # Project's tests.

        /radar
            /check           # Check and CheckGroup abstractions.
            /check_manager   # CheckManager governs check execution.
            /class_loader    # Loading mechanism for plugins.
            /client          # Main RadarClient abstraction.

            /client_manager  # ClientManager handles Radar clients when they
                             # connect, disconnect and send replies.

            /config          # Includes builders that handle initializations of
                             # both Radar client and server.

            /contact         # Contact and ContactGroup abstractions.

            /initial_setup   # Includes facilities to configure Radar after
                             # it has been installed.

            /launcher        # Includes classes that fire up both Radar server
                             # and client.

            /logger          # Logging services.
            /misc            # A few helper classes mainly used by Radar server.
            /monitor         # Monitor abstraction.

            /network         # Low level network facilities to handle both 
                             # server and clients. Different platform specific
                             # network monitors are found here.

            /platform_setup  # Platform specific configuration and setup.

            /plugin          # Plugin and PluginManager classes work closely
                             # together to allow plugin functionality.

            /protocol        # Low level network protocol that Radar uses for
                             # communicating between server and clients.
                             
            /server          # Main RadarServer abstraction.


Initialization
--------------

Both Radar client and server go through almost the same steps before going
into operational mode. When Radar (client or server) is fired up it 
instantiates a launcher (RadarClientLauncher for the client and
RadarServerLauncher for the server) and immediately calls its run() method.

From that point a three phase initialization takes place :

1. First the command line is processed. This is done in the RadarLauncher
   class. After this, objects and configurations are read from the main
   configuration file. In the case of the server, alternate files are
   also parsed and processed.
2. Client and server proceed to define, create and configure threads.
3. Finally threads are launched.

After all threads have been launched successfully, client and server diverge
and start performing completely different tasks.
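
The three phases above can be sketched roughly as follows. This is only an
illustration and not Radar's actual code : the ``Launcher`` and
``DemoLauncher`` classes, the ``--config`` option and the ``_read_config``
placeholder are all assumptions made for the example.

.. code-block:: python

    import argparse
    import threading


    class Launcher(object):
        """Hypothetical launcher; the real RadarLauncher may differ."""

        def __init__(self, argv):
            self._argv = argv

        def _parse_command_line(self):
            parser = argparse.ArgumentParser()
            # Made-up option, used only to have something to parse.
            parser.add_argument('-c', '--config', default='main.yml')
            return parser.parse_args(self._argv)

        def _read_config(self, path):
            # Placeholder : a real launcher would parse the file at 'path'.
            return {'path': path}

        def _build_threads(self, config):
            # Client and server define different threads.
            raise NotImplementedError

        def run(self):
            # Phase 1 : command line first, then configuration.
            options = self._parse_command_line()
            config = self._read_config(options.config)
            # Phase 2 : define, create and configure threads.
            threads = self._build_threads(config)
            # Phase 3 : launch them.
            for thread in threads:
                thread.start()
            return threads


    class DemoLauncher(Launcher):
        def _build_threads(self, config):
            return [threading.Thread(target=lambda: None, name='demo')]

Calling ``DemoLauncher(['-c', 'radar.yml']).run()`` walks through the three
phases and returns the list of started threads.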


Operational overview
--------------------

Both Radar client and server operate in an event triggered fashion and make
use of threads to distribute the workload.
If you look at the code of the RadarServer and RadarClient classes you'll
find methods called 'on_something'. Every time a network event occurs it is
reflected in one of those methods. The heart of Radar is two abstract
classes : Client and Server, which can be found under the network module.
The Client and Server classes operate in a very similar way, although they
differ in the way they handle network sockets.
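
To give an idea of the 'on_something' style, here is a minimal sketch. The
hook names (``on_connect``, ``on_receive``, ``on_disconnect``), the
``dispatch`` helper and the ``LoggingServer`` subclass are hypothetical and
are not taken from Radar's network module.

.. code-block:: python

    class Server(object):
        """Hypothetical base class that maps events to 'on_*' hooks."""

        def on_connect(self, client):
            pass

        def on_receive(self, client, message):
            pass

        def on_disconnect(self, client):
            pass

        def dispatch(self, event, *args):
            # Look the hook up by name, e.g. 'connect' -> self.on_connect.
            return getattr(self, 'on_' + event)(*args)


    class LoggingServer(Server):
        """Overrides only the hooks it cares about."""

        def __init__(self):
            self.log = []

        def on_connect(self, client):
            self.log.append(('connect', client))

        def on_receive(self, client, message):
            self.log.append(('receive', client, message))

A subclass only overrides the events it is interested in; every other event
falls through to the empty default in the base class.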

The network module also provides some network monitors that are platform
dependent. Before Radar server goes into operational mode it tries to select
the best I/O multiplexing method available. If the platform can't
be detected or an efficient multiplexing method cannot be found, Radar will
fall back to the SelectMonitor (which relies on the select system call).
The currently supported multiplexing strategies are : select, poll, epoll
and kqueue.
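
One possible way to pick a multiplexing method is to probe Python's standard
``select`` module, as in the sketch below. The function name and the
preference order are assumptions; Radar's own selection logic (which also
looks at the platform) may work differently.

.. code-block:: python

    import select


    def best_multiplexing_method():
        """Return the name of the preferred I/O multiplexing call available."""
        # Assumed preference order; select is the universal fallback.
        for name in ('epoll', 'kqueue', 'poll'):
            if hasattr(select, name):
                return name
        return 'select'

On a typical Linux system this picks ``epoll``, on BSD and macOS ``kqueue``,
and on platforms offering neither it ends up with ``poll`` or ``select``.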

Radar's client and server also operate in a non-blocking way. Their main
thread loops iterate every 200 milliseconds. This prevents any
single client from blocking the server indefinitely due to a malformed or
incomplete network message. This mechanism is also used as an easy way
to gracefully terminate threads : one thread Event is shared among all defined
threads. When this event is triggered the loop condition no longer
holds and the threads end cleanly.
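
The shared event idea can be sketched with ``threading.Event``. The
``Worker`` class and its attributes are made up for this example, and setting
the event is assumed to be what ends the loops.

.. code-block:: python

    import threading


    class Worker(threading.Thread):
        """Hypothetical thread whose loop ends once a shared event is set."""

        def __init__(self, stop_event, timeout=0.2):
            super(Worker, self).__init__()
            self.stop_event = stop_event
            self.timeout = timeout
            self.iterations = 0

        def run(self):
            while not self.stop_event.is_set():
                self.iterations += 1
                # Stands in for network I/O bounded by a 200 ms timeout.
                self.stop_event.wait(self.timeout)


    def stop_all(workers, stop_event):
        # One call is enough : every worker checks the same event.
        stop_event.set()
        for worker in workers:
            worker.join()

Because no iteration lasts longer than the timeout, every thread notices the
event shortly after it is set.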


Server operation
----------------

The main work of the server is split across three main threads :

* RadarServer.
* RadarServerPoller.
* PluginManager.


RadarServer :

This thread is responsible for accepting clients and receiving replies from
them. A client is only accepted if it is defined in at least one monitor
and is not duplicated (that is, if the same client isn't already connected).

Once a client is accepted it is registered within the ClientManager.
The ClientManager acts as a proxy that talks directly to all defined monitors.
Every monitor internally knows if it has to accept a client when it connects.
If the client is indeed accepted, a copy of the checks and contacts is stored
along with the instance of that client. This copy is needed because more than
one client may match against the same monitor.

The reverse process applies when a client disconnects : the RadarServer
unregisters that client and the connection is closed.

When a client sends a reply it is also initially processed by the ClientManager.
The reason for this is that we need to get a list of checks and contacts
that are affected by such a reply. These two lists of objects are later on
transferred to the PluginManager to be processed by any defined plugins.
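
The accept and copy logic described above might look roughly like this. The
class names mirror the ones used in this section, but every method name, the
use of plain addresses to identify clients and the use of ``copy.deepcopy``
are assumptions made for the sketch.

.. code-block:: python

    import copy


    class Monitor(object):
        def __init__(self, addresses, checks, contacts):
            self.addresses = set(addresses)
            self.checks = checks
            self.contacts = contacts
            self.active_clients = {}

        def add_client(self, address):
            # Reject clients that are unknown or already connected.
            if address not in self.addresses or address in self.active_clients:
                return False
            # Each client gets its own copy of the checks and contacts.
            self.active_clients[address] = {
                'checks': copy.deepcopy(self.checks),
                'contacts': copy.deepcopy(self.contacts),
            }
            return True

        def remove_client(self, address):
            return self.active_clients.pop(address, None) is not None


    class ClientManager(object):
        def __init__(self, monitors):
            self._monitors = monitors

        def register(self, address):
            # Ask every monitor; accepted if at least one takes the client.
            return any([m.add_client(address) for m in self._monitors])

        def unregister(self, address):
            return any([m.remove_client(address) for m in self._monitors])

The list comprehensions inside ``any()`` are deliberate : they make sure every
monitor is consulted instead of stopping at the first one that accepts.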


RadarServerPoller :

This is the simplest thread. Every N seconds it simply asks the ClientManager
to poll all of its monitors. This thread exists because it makes
sense to have a separate abstraction that decides when it's time to poll
the clients. Had this work been done in the RadarServer, we would
be mixing asynchronous (network activity) and synchronous (wait a certain amount
of time) events, making the overall design more complex to both understand
and work with.
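
A poller of this kind fits in a few lines. The sketch below is not Radar's
RadarServerPoller : the ``Poller`` class, its constructor arguments and the
use of ``Event.wait`` as an interruptible sleep are assumptions.

.. code-block:: python

    import threading


    class Poller(threading.Thread):
        """Hypothetical poller : calls 'poll' every 'interval' seconds."""

        def __init__(self, poll, stop_event, interval):
            super(Poller, self).__init__()
            self.poll = poll
            self.stop_event = stop_event
            self.interval = interval

        def run(self):
            # wait() returns False on timeout and True once the event is
            # set, so it works as a sleep that can be interrupted.
            while not self.stop_event.wait(self.interval):
                self.poll()

Here ``poll`` would be whatever callable asks the ClientManager to poll its
monitors; the thread itself knows nothing about the network.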


PluginManager :

As its name indicates, this is the place where all plugins are executed and
controlled. Whenever the RadarServer receives a reply from a client, and after
a little processing, it writes a dictionary containing all relevant plugin data
to a queue that both RadarServer and PluginManager share.
This queue is the mechanism of communication between those objects.
The PluginManager quietly waits for a new dictionary to arrive from this
queue. When one arrives it unpacks all parameters and performs object id
dereferencing of two lists that contain the affected checks and the
related contacts. This dereferencing is possible because threads share the
same address space. This solution seems more elegant and effective than
re-instantiating those objects from their states.
After this pre-processing every plugin's run method is called with the
appropriate arguments. If a plugin does not work properly, all exceptions are
caught and recorded in Radar's log file.
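
The flow above can be sketched as follows. This is an illustration under
several assumptions : the dictionary keys, the ``process_one`` method, the
arguments passed to ``run`` and the id lookup table are all invented here, and
Radar may dereference object ids in a different way.

.. code-block:: python

    import logging
    import queue


    class PluginManager(object):
        def __init__(self, plugins, objects, work_queue):
            self._plugins = plugins
            # id(obj) -> obj, so ids read from the queue can be resolved.
            # Only valid because all threads share one address space.
            self._objects = {id(obj): obj for obj in objects}
            self._queue = work_queue

        def _dereference(self, ids):
            return [self._objects[object_id] for object_id in ids]

        def process_one(self, timeout=0.2):
            try:
                message = self._queue.get(timeout=timeout)
            except queue.Empty:
                return False
            checks = self._dereference(message['check_ids'])
            contacts = self._dereference(message['contact_ids'])
            for plugin in self._plugins:
                try:
                    plugin.run(message['address'], checks, contacts)
                except Exception:
                    # A broken plugin must not take the others down.
                    logging.exception('Plugin %r failed', plugin)
            return True

Only integers travel through the queue, and the receiving side turns them back
into the very same check and contact objects the server already holds.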


Client operation
----------------

The client relies on two threads :

* RadarClient.
* CheckManager.

RadarClient :

This thread is responsible for receiving messages from the Radar server and
replying to them. Every message received is deserialized and
written to a queue (that is shared with the CheckManager). Both RadarClient
and CheckManager actually share two queues to support bidirectional
communication between threads. One queue is used to write checks that need
to be executed, the other is used to read the results of those executions.
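
A two queue setup like the one described can be sketched with the standard
``queue`` module. The function names, the shape of the check dictionaries and
the fake ``'OK'`` results are assumptions; no real check is executed here.

.. code-block:: python

    import queue
    import threading


    def check_manager_loop(checks_queue, results_queue, stop_event):
        """Read checks from one queue and publish results on the other."""
        while not stop_event.is_set():
            try:
                checks = checks_queue.get(timeout=0.2)
            except queue.Empty:
                continue
            # A real CheckManager would execute every check at this point.
            results = [{'id': check['id'], 'status': 'OK'} for check in checks]
            results_queue.put(results)


    def start_check_manager():
        checks_queue, results_queue = queue.Queue(), queue.Queue()
        stop_event = threading.Event()
        thread = threading.Thread(
            target=check_manager_loop,
            args=(checks_queue, results_queue, stop_event))
        thread.start()
        return checks_queue, results_queue, stop_event, thread

The network thread only ever writes to ``checks_queue`` and reads from
``results_queue``, so neither side needs explicit locking.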

In case the Radar client is unable to connect to the Radar server it will
wait a certain amount of time and try to connect again. This is repeated
indefinitely if the reconnect option is set to True. It will try to connect
after 5, 15 and 60 seconds (cyclically). This option is useful because after
updating the Radar server's configuration you need to restart it, and all
connections are lost. Radar currently does not provide a reload mechanism.
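
The cyclic waiting times can be expressed with ``itertools.cycle``. Both
function names below are hypothetical, and giving up immediately when the
reconnect option is off is an assumption of this sketch.

.. code-block:: python

    import itertools
    import time


    def reconnect_delays(reconnect=True, delays=(5, 15, 60)):
        """Return an iterator over the seconds to wait between attempts."""
        return itertools.cycle(delays) if reconnect else iter(())


    def connect_with_retry(connect, delays, sleep=time.sleep):
        """Call 'connect' until it succeeds or no delays are left."""
        while True:
            try:
                return connect()
            except OSError:
                delay = next(delays, None)
                if delay is None:
                    raise  # reconnect disabled : propagate the error
                sleep(delay)

Passing ``sleep`` as a parameter makes it possible to try the retry logic
without actually waiting.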


CheckManager :

Whenever a CHECK message is received by the RadarClient thread, and after
a little processing, it is immediately sent to the CheckManager. When the check
information is received the CheckManager proceeds to instantiate a bunch
of Checks (depending on the platform it is running on, it may instantiate a
UnixCheck or a WindowsCheck) and finally executes them sequentially.
Every check's output is collected and verified (the CheckManager makes sure
that the Check didn't blow up and that a valid status was returned). It also
discards all fields that are not relevant (it will only keep the status,
details and data fields of the returned JSON).

Once the outputs have been collected they're sent back to the RadarClient
through the other queue and RadarClient sends those results back to the
RadarServer.
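
Verifying and trimming a check's output could look like the sketch below. The
function name is made up, the list of valid statuses is an assumption, and
raising ``ValueError`` on bad output is just one possible way to report the
problem.

.. code-block:: python

    import json

    RELEVANT_FIELDS = ('status', 'details', 'data')
    # Assumed set of statuses; adjust to whatever Radar really accepts.
    VALID_STATUSES = ('OK', 'WARNING', 'SEVERE', 'ERROR')


    def clean_output(raw_output):
        """Validate a check's JSON output and keep only the relevant fields."""
        try:
            output = json.loads(raw_output)
            status = output['status']
        except (ValueError, KeyError, TypeError) as error:
            raise ValueError('Invalid check output : {}'.format(error))
        if status not in VALID_STATUSES:
            raise ValueError('Invalid status : {!r}'.format(status))
        return {key: output[key] for key in RELEVANT_FIELDS if key in output}

Any extra field a check prints (for example a ``pid``) is silently dropped, so
only the three known fields ever travel back to the server.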


Network protocol
----------------

Radar client and server use TCP for all of their communications. Here is the
network protocol that is used by Radar :

    +------+---------+--------------+---------+
    | TYPE | OPTIONS | PAYLOAD SIZE | PAYLOAD |
    +------+---------+--------------+---------+

* TYPE (1 byte) : Current message types are TEST, TEST REPLY, CHECK
  and CHECK REPLY.

* OPTIONS (1 byte) : Current options are NONE and COMPRESS. 

* PAYLOAD SIZE (2 bytes) : Indicates the size (in bytes) of the payload.

* PAYLOAD (variable) : N bytes make up the payload. The payload's maximum
  size is 64 KiB.
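
The header layout maps naturally onto Python's ``struct`` module. In the
sketch below the numeric values of the message types and options, the network
byte order and both function names are assumptions; only the field sizes come
from the description above. Note that an unsigned 2 byte field can hold values
up to 65535.

.. code-block:: python

    import struct

    # TYPE (1 byte), OPTIONS (1 byte), PAYLOAD SIZE (2 bytes).
    # Network byte order is assumed.
    HEADER = struct.Struct('!BBH')

    # Assumed numeric values, chosen only for this example.
    TEST, TEST_REPLY, CHECK, CHECK_REPLY = range(4)
    NONE, COMPRESS = range(2)


    def pack_message(message_type, payload, options=NONE):
        """Prefix 'payload' (bytes) with the 4 byte header."""
        if len(payload) > 0xFFFF:
            raise ValueError('Payload does not fit in a 2 byte size field.')
        return HEADER.pack(message_type, options, len(payload)) + payload


    def unpack_message(data):
        """Split raw bytes into (type, options, payload)."""
        message_type, options, size = HEADER.unpack_from(data)
        payload = data[HEADER.size:HEADER.size + size]
        if len(payload) != size:
            raise ValueError('Incomplete payload.')
        return message_type, options, payload

For example, ``pack_message(CHECK, b'{"checks": []}')`` yields the 4 header
bytes followed by the JSON payload, and ``unpack_message`` reverses it.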

Every time the poller needs to query its clients a CHECK message is built
and broadcast to all clients that are managed by any monitor. When
the client receives this CHECK message it proceeds to run all checks that
the server instructs it to run. After all checks are executed their outputs
are collected and a CHECK REPLY message is built and sent to the server.

The TEST and TEST REPLY messages are not yet implemented (just defined). The
idea is to have a user-controlled way to explicitly force the run of specific
checks. This is useful because if a check is not working as expected and
a developer or sysadmin fixes it, then it doesn't make sense to wait until
the next poll round to verify whether that check performs as expected or fails
again. This feature will be implemented in a future release along with a small
console that allows the user to have more control of the running server.

The payload is always JSON. The decision behind using JSON is that it
provides flexibility and an easy way to validate and convert data that
comes from the other side of the network. Besides that it also allows the
final user to lay out the data field of checks as she or he wishes.
This also has downsides : more bytes are sent through the network and an
extra overhead is paid every time we serialize and deserialize a JSON
string.

Currently messages are not being compressed at all. This feature makes
sense only if the client replies with a message longer than 64 KiB. This
feature will certainly be included in a future release.


Class diagrams
--------------

Sometimes class diagrams help you see the big picture of a design and also
act as useful documentation. Here are some diagrams that may help you
understand what is cumbersome to describe in words.

The diagrams contain the most relevant classes of both Radar server and client.
Only the most important methods of every class are mentioned.
You should follow these diagrams along with the code to have a detailed
understanding of what's happening in a certain part of the project.

Radar client :

    +----------------+-------------------------+
    |  RadarClient   | RadarClientLauncher     |
    +================+=========================+
    | |radar-client| | |radar-client-launcher| |
    +----------------+-------------------------+


Radar server :

    +----------------+----------+
    | RadarServer    | Server   |
    +================+==========+
    | |radar-server| | |server| |
    +----------------+----------+

| 

    +-----------+-----------------+
    | Monitor   | ServerConfig    |
    +===========+=================+
    | |monitor| | |server-config| |
    +-----------+-----------------+


Notes :

 * RadarServerLauncher is analogous to RadarClientLauncher.


.. Radar client class-diagrams.

.. |radar-client| image:: _static/class-diagrams/radar-client.svg
    :target: _static/class-diagrams/radar-client.svg
    :width: 60%
    :align: middle

.. |radar-client-launcher| image:: _static/class-diagrams/radar-client-launcher.svg
    :target: _static/class-diagrams/radar-client-launcher.svg
    :width: 60%
    :align: middle


.. Radar server class-diagrams.

.. |radar-server| image:: _static/class-diagrams/radar-server.svg
    :target: _static/class-diagrams/radar-server.svg
    :width: 60%
    :align: middle

.. |server| image:: _static/class-diagrams/server.svg
    :target: _static/class-diagrams/server.svg
    :width: 60%
    :align: middle

.. |monitor| image:: _static/class-diagrams/monitor.svg
    :target: _static/class-diagrams/monitor.svg
    :width: 60%
    :align: middle

.. |server-config| image:: _static/class-diagrams/server-config.svg
    :target: _static/class-diagrams/server-config.svg
    :width: 60%
    :align: middle