# DBT Airflow Factory

[![Python Version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue)](https://github.com/getindata/dbt-airflow-factory)
[![PyPI Version](https://badge.fury.io/py/dbt-airflow-factory.svg)](https://pypi.org/project/dbt-airflow-factory/)
[![Downloads](https://pepy.tech/badge/dbt-airflow-factory)](https://pepy.tech/project/dbt-airflow-factory)
[![Maintainability](https://api.codeclimate.com/v1/badges/47fd3570c858b6c166ad/maintainability)](https://codeclimate.com/github/getindata/dbt-airflow-factory/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/47fd3570c858b6c166ad/test_coverage)](https://codeclimate.com/github/getindata/dbt-airflow-factory/test_coverage)
[![Documentation Status](https://readthedocs.org/projects/dbt-airflow-factory/badge/?version=latest)](https://dbt-airflow-factory.readthedocs.io/en/latest/?badge=latest)

A library that converts DBT manifest metadata into Airflow tasks.

## Documentation

Read the full documentation at [https://dbt-airflow-factory.readthedocs.io/](https://dbt-airflow-factory.readthedocs.io/en/latest/index.html)

## Installation

Use the package manager [pip][pip] to install the library:

```bash
pip install dbt-airflow-factory
```

## Usage

The library is expected to be used inside an Airflow environment with a Kubernetes image referencing **dbt**.

**dbt-airflow-factory**'s main task is to parse `manifest.json` and create an Airflow DAG from it. It also reads config
files from the `config` directory, which makes it highly customizable (e.g., the user can set the path to `manifest.json`).

To start, create a directory with the following structure, where `manifest.json` is a file generated by **dbt**:
```
.
├── config
│   ├── base
│   │   ├── airflow.yml
│   │   ├── dbt.yml
│   │   └── k8s.yml
│   └── dev
│       └── dbt.yml
├── dag.py
└── manifest.json
```
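
Before uploading, it can help to confirm that the layout is complete. The sketch below is a hypothetical helper (`missing_files` is not part of **dbt-airflow-factory**) that checks only for the files shown in the tree above. It makes no claim about which of them the library strictly requires.

```python
from pathlib import Path

# Files shown in the example tree, relative to the DAG directory.
# This list mirrors the README layout; it is an assumption, not the
# library's own validation logic.
EXPECTED_FILES = [
    "config/base/airflow.yml",
    "config/base/dbt.yml",
    "config/base/k8s.yml",
    "config/dev/dbt.yml",
    "dag.py",
    "manifest.json",
]


def missing_files(dag_dir):
    """Return the expected files that are absent from dag_dir."""
    root = Path(dag_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current working directory and report anything absent.
    for name in missing_files("."):
        print(f"missing: {name}")
```

An empty result means every file from the example tree is present.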

Then, put the following code into `dag.py`:
```python
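from os import path

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

# Build the DAG from the directory containing this file;
# "dev" names the environment directory under `config/`.
dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), "dev").create()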
```

When uploaded to the Airflow DAGs directory, the file is picked up by Airflow, and the factory parses `manifest.json` and prepares a DAG to run.
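
To give a sense of what parsing `manifest.json` involves, the sketch below reads the `nodes` section of a manifest and maps each model to the models it depends on. It uses only the standard library and a tiny hand-written manifest; `model_dependencies` and the `shop` project names are hypothetical, and this is not the library's actual implementation.

```python
import json


def model_dependencies(manifest):
    """Map each dbt model's unique_id to the models it depends on."""
    nodes = manifest.get("nodes", {})
    # Keep only models; tests, seeds and snapshots are ignored in this sketch.
    models = {
        node_id
        for node_id, node in nodes.items()
        if node.get("resource_type") == "model"
    }
    return {
        node_id: sorted(
            parent
            for parent in nodes[node_id].get("depends_on", {}).get("nodes", [])
            if parent in models
        )
        for node_id in sorted(models)
    }


# A tiny hand-written stand-in for a real manifest.json.
example_manifest = json.loads("""
{
  "nodes": {
    "model.shop.orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.daily_revenue": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.orders"]}
    }
  }
}
""")

for model, parents in model_dependencies(example_manifest).items():
    print(model, "<-", parents)
```

Each key of the resulting mapping would correspond to a task, and each parent to an upstream dependency of that task.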

### Configuration files

The example configuration files in the [tests directory][tests] show what correct configs look like.

You can use [Airflow template variables][airflow-vars] in your `dbt.yml` and `k8s.yml` files, as long as they are inside
quotation marks:
```yaml
target: "{{ var.value.env }}"
some_other_field: "{{ ds_nodash }}"
```

Similarly, you can use `"{{ var.value.VARIABLE_NAME }}"` in `airflow.yml`, but only this Airflow variable getter is supported there.
Any other Airflow template variable will not work in `airflow.yml`.
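
One way to picture this restriction is a substitution that recognizes only the `{{ var.value.NAME }}` form and leaves every other template untouched. The sketch below is illustrative only: `resolve_var_getters` is a hypothetical function, the `variables` dictionary stands in for Airflow Variables, and the library may resolve these values differently.

```python
import re

# Matches only the "{{ var.value.NAME }}" form; nothing else is recognized.
VAR_GETTER = re.compile(r"\{\{\s*var\.value\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_var_getters(text, variables):
    """Replace each {{ var.value.NAME }} in text with variables[NAME].

    Raises KeyError if NAME is not present in variables.
    """
    return VAR_GETTER.sub(lambda match: str(variables[match.group(1)]), text)


# Stand-in for Airflow Variables (an assumption for this example).
variables = {"env": "dev"}

print(resolve_var_getters("{{ var.value.env }}", variables))
print(resolve_var_getters("{{ ds_nodash }}", variables))
```

The first call yields the variable's value, while the second string comes back unchanged because `ds_nodash` is not a variable getter.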

### Creation of the directory with data-pipelines-cli

**DBT Airflow Factory** works best in tandem with the [data-pipelines-cli][dp-cli] tool. **dp** not only prepares the directory
for the library to digest, but also automates Docker image building and pushes the generated directory to the cloud storage
of your choice.

[airflow-vars]: https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#variables
[dp-cli]: https://pypi.org/project/data-pipelines-cli/
[pip]: https://pip.pypa.io/en/stable/
[tests]: https://github.com/getindata/dbt-airflow-factory/tree/develop/tests/config