---
title: Docker Builds
hide_title: true
sidebar_position: 5
version: 1
---

# Docker builds, images and tags

The Apache Superset community extensively uses Docker for developing, releasing,
and productionizing Superset. This page details our Docker builds and tag naming
schemes to help users navigate our offerings.

Images are built and pushed to the [Superset Docker Hub repository](
https://hub.docker.com/r/apache/superset) using GitHub Actions.
Different sets of images are built and/or published at different times:

- **Published releases** (`release`): published using
  tags like `3.0.0` and the `latest` tag.
- **Pull request iterations** (`pull_request`): for each pull request, we
  build the Docker image to validate the build, but for security reasons we
  do not publish it; we simply run `docker build --load`.
- **Merges to the main branch** (`push`): resulting in new SHAs, with tags
  prefixed with `master` for the latest `master` version.
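
To illustrate the difference between validating a build and publishing it, the sketch below maps each GitHub event to a `docker buildx build` output flag: `--load` keeps the resulting image in the local Docker engine, while `--push` sends it to a registry. The helper name `build_output_flag` and the `superset:pr-check` tag are hypothetical, and the real logic in the project's build script may differ.

```shell
# Hypothetical helper: choose the buildx output flag for a GitHub event.
# Illustrative only; this is not the logic used by the Superset CI.
build_output_flag() {
  case "$1" in
    pull_request) echo "--load" ;;   # build only, keep the image local
    release|push) echo "--push" ;;   # build and publish to the registry
    *) echo "unknown event: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the command a pull request build might use.
echo "docker buildx build $(build_output_flag pull_request) -t superset:pr-check ."
```

Printing the command instead of executing it keeps the example safe to run on a machine without Docker.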

## Build presets

We have a set of build "presets" that each represent a combination of
parameters for the build, mostly pointing to a different target layer
for the build and/or a different base image.

Here are the build presets that are exposed through the `build_docker.py` script:
- `lean`: The default Docker image, including both frontend and backend. Tags
  without a build preset are lean builds, e.g., `latest`.
- `dev`: For development, with a headless browser, dev-related utilities and root access.
- `py311` (and similar): Like `lean` but with a different Python version (in this example, 3.11).
- `ci`: For certain CI workloads.
- `websocket`: For Superset clusters supporting advanced features.
- `dockerize`: Used by Helm.

## Key tag examples

- `latest`: The latest official release build
- `latest-dev`: The `dev` image of the latest official release build, with a
  headless browser and root access.
- `master`: The latest build from the `master` branch, implicitly the lean build
  preset
- `master-dev`: Similar to `master` but includes a headless browser and root access.
- `pr-5252`: The latest commit in PR 5252.
- `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: The `dev` build for
  this specific SHA, which could come from a `master` merge or a release.
- `websocket-latest`: The WebSocket image for use in a Superset cluster.

For insights or modifications to the build matrix and tagging conventions,
check the [build_docker.py](https://github.com/apache/superset/blob/master/scripts/build_docker.py)
script and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml)
GitHub action.
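
As a rough sketch of the convention visible in most of the tags above, a tag is a base (a release, a branch, or a SHA) followed by `-<preset>`, with the `lean` preset adding no suffix. The helper name `superset_tag` is hypothetical, and the sketch deliberately ignores exceptions such as `websocket-latest`, where the preset comes first; the build script remains the authoritative source.

```shell
# Hypothetical helper: compose a tag from a base (release, branch, or SHA)
# and an optional build preset. Lean builds carry no preset suffix.
superset_tag() {
  case "${2:-lean}" in
    lean) echo "$1" ;;
    *)    echo "$1-$2" ;;
  esac
}

superset_tag latest        # lean build of the latest release
superset_tag master dev    # dev build of the master branch
echo "apache/superset:$(superset_tag latest dev)"
```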

## Caching

To accelerate builds, we follow Docker best practices and use `apache/superset-cache`.

## About database drivers

Our Docker images come with little to no database driver support, since
each environment requires different drivers, and maintaining a build with
wide database support would be both challenging (dozens of databases,
Python drivers, and OS dependencies) and inefficient (longer
build times, larger images, lower layer cache hit rate, ...).

For production use cases, we recommend that you derive from our `lean` image(s)
and add support for the database(s) you need.
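
A minimal sketch of such a derived image follows. It writes a small Dockerfile that starts from a Superset image and installs one extra Python driver. The helper name `write_derived_dockerfile`, the output path, and the `psycopg2-binary` package are illustrative choices. The sketch also assumes that `pip` is usable as root inside the image and that the runtime user is named `superset`; check both against the image you actually use.

```shell
# Hypothetical helper: write a Dockerfile that derives from a Superset image
# and installs one extra Python database driver.
# Assumptions: pip works as root in the image, and the runtime user is
# named "superset". Verify both for the image tag you pick.
write_derived_dockerfile() {
  base_image="$1"   # e.g. apache/superset:latest
  driver_pkg="$2"   # e.g. psycopg2-binary (example driver, swap for yours)
  out_file="$3"     # where to write the Dockerfile
  cat > "$out_file" <<EOF
FROM ${base_image}
USER root
RUN pip install --no-cache-dir ${driver_pkg}
USER superset
EOF
}

out="${TMPDIR:-/tmp}/Dockerfile.superset"
write_derived_dockerfile apache/superset:latest psycopg2-binary "$out"
cat "$out"
# Then build it yourself, for example:
#   docker build -f "$out" -t my-superset .
```

Pinning the base image to a specific release tag rather than `latest` generally makes derived builds more reproducible.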

## On supporting different platforms (namely arm64 AND amd64)

Currently all automated builds are multi-platform, supporting both `linux/arm64`
and `linux/amd64`. This enables higher level constructs like `helm` and
docker-compose to point to these images and effectively be multi-platform
as well.

Pull request and `master` builds are one-image-per-platform so that they can
be parallelized. The build matrix for those is sparser, since we don't need
to build every build preset on every platform and can generally be more
selective. For those builds, we suffix tags with `-arm` where it applies.

### Working with Apple silicon

Apple's current generation of computers uses ARM-based CPUs, and Docker
running on Macs seems to require `linux/arm64/v8` (at least one user's M2 was
configured that way). Setting the environment
variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to work for
using and building upon the Superset builds provided here.

```
# Make Docker default to amd64 images for commands that accept --platform
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```

Presumably, `linux/arm64/v8` would be more optimized for this generation
of chips, but less compatible across the ARM ecosystem.
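
As an alternative to a shell-wide default, the platform can be pinned per command with Docker's `--platform` flag. The sketch below maps the output of `uname -m` to a Docker platform string and prints a matching pull command. The helper name `docker_platform_for_arch` is hypothetical, and the mapping only covers the two platforms mentioned on this page.

```shell
# Hypothetical helper: map a `uname -m` value to a Docker platform string.
docker_platform_for_arch() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) a pull pinned to the host's native platform.
platform="$(docker_platform_for_arch "$(uname -m)")" &&
  echo "docker pull --platform ${platform} apache/superset:latest"
```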