Usage
-----

To start, create a directory with the following structure, where ``manifest.json`` is a file generated by **dbt**:

.. code-block:: bash

 .
 ├── config
 │   ├── base
 │   │   ├── airflow.yml
 │   │   ├── dbt.yml
 │   │   └── k8s.yml
 │   └── dev
 │       └── dbt.yml
 ├── dag.py
 └── manifest.json
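
Inside ``config``, the ``base`` directory holds settings shared by all environments, and per-environment directories such as ``dev`` layer overrides on top. As a rough illustration (not a complete file), ``config/base/airflow.yml`` could look like the sketch below; the ``default_args`` and ``dag`` keys are the ones consumed in the pre-2.0 snippet further down, while the fields inside them are placeholders:

.. code-block:: yaml

 # Illustrative sketch only: the factory reads the default_args and dag
 # sections (passed to the DAG constructor, as the pre-2.0 example below
 # shows); the concrete values here are placeholders.
 default_args:
   owner: airflow
   retries: 2
 dag:
   dag_id: my-dbt-pipeline
   schedule_interval: "@daily"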

Then, put the following code into ``dag.py``:

.. code-block:: python

 from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory
 from airflow.models import Variable
 from os import path

 dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env")).create()
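
The ``env`` Airflow Variable tells the factory which configuration directory (here, ``dev``) to overlay on top of ``base``. Assuming an Airflow 2.x installation, you can set it from the CLI:

.. code-block:: bash

 # Sets the "env" Variable that dag.py reads via Variable.get("env")
 airflow variables set env dev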

For older versions of Airflow (before 2.0), the DAG file needs to be slightly longer:

.. code-block:: python

 from os import path

 from airflow import DAG
 from airflow.models import Variable

 from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

 dag_factory = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env"))
 config = dag_factory.read_config()

 # Create the DAG object explicitly and let the factory attach the dbt tasks.
 with DAG(default_args=config["default_args"], **config["dag"]) as dag:
     dag_factory.create_tasks(config)


Once uploaded to the Airflow DAGs directory, the file gets picked up by Airflow, parses ``manifest.json``, and prepares a DAG to run.

Configuration files
+++++++++++++++++++

It is best to look at the example configuration files in the
`tests directory <https://github.com/getindata/dbt-airflow-factory/tree/develop/tests/config>`_ to get a glimpse
of correct configs.

You can use `Airflow template variables <https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#variables>`_
in your ``dbt.yml`` and ``k8s.yml`` files, as long as they are inside quotation marks:

.. code-block:: yaml

 target: "{{ var.value.env }}"
 some_other_field: "{{ ds_nodash }}"

Analogously, you can use ``"{{ var.value.VARIABLE_NAME }}"`` in ``airflow.yml``, but only this Airflow variable
getter is supported there; any other Airflow template variables will not work in ``airflow.yml``.
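
For example, a field in ``airflow.yml`` could read an owner name from an Airflow Variable like this (the ``dag_owner`` Variable name is made up for illustration):

.. code-block:: yaml

 # Only the variable getter is templated in airflow.yml;
 # "dag_owner" is a hypothetical Variable name.
 default_args:
   owner: "{{ var.value.dag_owner }}"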


Creation of the directory with data-pipelines-cli
+++++++++++++++++++++++++++++++++++++++++++++++++

**DBT Airflow Factory** works best in tandem with the `data-pipelines-cli <https://pypi.org/project/data-pipelines-cli/>`_
tool. **dp** not only prepares the directory for the library to digest, but also automates Docker image building and pushes
the generated directory to the cloud storage of your choice.
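
As a rough outline of that workflow (command names taken from the **dp** documentation; exact flags vary between versions, so treat this as a sketch rather than a recipe):

.. code-block:: bash

 # Outline only; check the data-pipelines-cli docs for your version's flags.
 dp compile   # render the dbt manifest and the directory structure shown above
 dp deploy    # push the generated directory to your cloud storage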