README.md
# ActiveRecord::Dbt
[![Gem Version](https://badge.fury.io/rb/activerecord-dbt.svg)](https://badge.fury.io/rb/activerecord-dbt)
[![Maintainability](https://api.codeclimate.com/v1/badges/ef9a0a71c73dae7b8156/maintainability)](https://codeclimate.com/github/yamotech/activerecord-dbt/maintainability)
[![Ruby CI](https://github.com/yamotech/activerecord-dbt/actions/workflows/ruby-ci.yml/badge.svg)](https://github.com/yamotech/activerecord-dbt/actions/workflows/ruby-ci.yml)
`ActiveRecord::Dbt` generates [dbt](https://www.getdbt.com/) files from the information of the database connected via ActiveRecord.
Currently, it can generate `yaml` files for `sources` and `models` files for `staging`.
## Installation
To install `activerecord-dbt`, add this line to your application's Gemfile:
```ruby
gem 'activerecord-dbt'
```
Since it is only used in the development environment, it is recommended to add it to the development group:
```ruby
group :development do
gem 'activerecord-dbt'
end
```
Then run:
```bash
$ bundle
```
Alternatively, you can install it manually by running:
```bash
$ gem install activerecord-dbt
```
## Usage
### Configuration
#### ActiveRecord::Dbt Configuration
Create an initializer file for dbt:
```bash
$ bin/rails generate active_record:dbt:initializer
```
This will generate the `config/initializers/dbt.rb` file.
Configuration | Description
--------- | ---------
config_directory_path | The path to the directory where files generated by `bin/rails generate active_record:dbt:*` are stored. The default is `lib/dbt`.
export_directory_path | The path to the directory where configuration files are stored. The default is `doc/dbt`.
dwh_platform | Specify the data warehouse platform to which dbt connects. The default is `bigquery`.
data_sync_delayed | Indicates whether there is a data delay. If set to `true`, `severity: warn` is applied to the `relationships` test. The default is `false`.
logger | The destination for log output. The default is `Logger.new('./log/active_record_dbt.log')`.
used_dbt_package_names | An array of `dbt` package names to use.
locale | I18n locale. The default is `I18n.locale`.
List of platforms that can currently be set with `dwh_platform`.
Data Warehouse Platform | Link
--------- | ---------
bigquery | [BigQuery enterprise data warehouse - Google Cloud](https://cloud.google.com/bigquery?hl=en)
postgres | [PostgreSQL: The world's most advanced open source database](https://www.postgresql.org/)
redshift | [Cloud Data Warehouse - Amazon Redshift - AWS](https://aws.amazon.com/redshift/)
snowflake | [The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI](https://www.snowflake.com/en/)
spark | [Apache Spark™ - Unified Engine for large-scale data analytics](https://spark.apache.org/)
List of packages that can currently be set with `used_dbt_package_names`.
dbt Package Name | Link
--------- | ---------
dbt-labs/dbt-utils | [dbt-labs/dbt-utils: Utility functions for dbt projects.](https://github.com/dbt-labs/dbt-utils)
datnguye/dbterd | [datnguye/dbterd: Generate the ERD as a code from dbt artifacts](https://github.com/datnguye/dbterd)
Example:
Adjust the settings according to your environment.
```ruby
# frozen_string_literal: true
if Rails.env.development?
require 'active_record/dbt'
ActiveRecord::Dbt.configure do |c|
c.config_directory_path = 'lib/dbt'
c.export_directory_path = 'doc/dbt'
c.data_sync_delayed = false
c.used_dbt_package_names = [
'dbt-labs/dbt_utils',
'datnguye/dbterd'
]
end
end
```
#### Create Configuration Files
Create configuration files for dbt:
```bash
$ bin/rails generate active_record:dbt:config
```
This will create the following files.
File | Description
--------- | ---------
`#{config_directory_path}/source_config.yml` | Used to generate `#{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml`.
`#{config_directory_path}/staging_model.sql.tt` | Used to generate `#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql`.
### Generate dbt Source File
#### dbt Source Configuration
In the `#{config_directory_path}/source_config.yml` file, describe the properties you want to set for the source.
You can configure `sources`, `table_overrides`, `defaults`, and `table_descriptions` in this file.
The available properties for `sources` and `table_overrides` are detailed in [Source properties | dbt Developer Hub](https://docs.getdbt.com/reference/source-properties).
##### sources
Set all properties except for `tables`.
Set the items you want to exclude with `exlude` in `meta`.
Configuration | Description
--------- | ---------
table_names | Specify which table names you do not want output in `sources`.
Example:
```yml
sources:
name: dummy
meta:
generated_by: activerecord-dbt
exclude:
table_names:
- ar_internal_metadata
- schema_migrations
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
```
##### table_overrides
Set all properties for `tables` except for `name` and `description`.
Example:
```yml
table_overrides:
users:
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
created_at:
data_tests:
- not_null:
where: 'id != 1'
```
##### defaults
Set the default value for the `description`(`logical_name`, `description`) of `tables`.
In the `logical_name` and `description` of `table_descriptions`, you can refer to the table name with `{{ table_name }}`.
In the `description` of `table_descriptions.columns`, you can refer to the table name with `{{ table_name }}` and the column name with `{{ column_name }}`.
Example:
```yml
defaults:
table_descriptions:
logical_name: Write a logical_name of the '{{ table_name }}' table.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
```
If nothing is set, it defaults to the following:
```yml
defaults:
table_descriptions:
logical_name: Write a logical_name of the '{{ table_name }}' table.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
```
##### table_descriptions
Set the `name` and `description` for `tables`.
Configuration | Description
--------- | ---------
logical_name | A title or one-line description to be output in the dbt `description`.
description | A detailed description of `logical_name` to be output in the dbt `description`.
Example:
```yml
table_descriptions:
ar_internal_metadata:
logical_name: Internal Metadata
description: |-
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
key: Key
value: Value
created_at: Created At
updated_at: Updated At
schema_migrations:
logical_name: Schema Migrations
description: |-
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
version: The version number of the migration.
```
##### Example:
Adjust the settings according to your environment.
```yml
sources:
name: dummy
meta:
generated_by: activerecord-dbt
exclude:
table_names:
- profiles
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
table_overrides:
users:
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
created_at:
data_tests:
- not_null:
where: 'id != 1'
defaults:
table_descriptions:
logical_name: Write a logical_name of the '{{ table_name }}' table.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
seed_descriptions:
enum:
description: "{{ source_name }} {{ translated_table_name }} {{ translated_attribute_name }} enum"
table_descriptions:
ar_internal_metadata:
logical_name: Internal Metadata
description: |-
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
key: Key
value: Value
created_at: Created At
updated_at: Updated At
schema_migrations:
logical_name: Schema Migrations
description: |-
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
version: The version number of the migration.
```
#### Generate `#{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml`
Generate a source file for dbt:
```bash
$ bin/rails generate active_record:dbt:source
```
Generate `#{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml`.
##### Example:
> [!NOTE]
>
> The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
> [!WARNING]
>
> If you are using a version of dbt lower than v1.8, replace `tests:` with `data_tests:` in the generated file.
>
> [Add data tests to your DAG | dbt Developer Hub](https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax)
>
>> Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from `tests:` to `data_tests:`.
```yaml
---
version: 2
sources:
- name: dummy
meta:
generated_by: activerecord-dbt
exclude:
table_names:
- profiles
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
tables:
- name: ar_internal_metadata
description: |-
# Internal Metadata
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
- name: key
description: Key
data_type: string
data_tests:
- unique
- not_null
- name: value
description: Value
data_type: string
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: companies
description: Write a logical_name of the 'companies' table.
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: name
description: Write a description of the 'companies.name' column.
data_type: string
data_tests:
- not_null
- name: establishment_date
description: Write a description of the 'companies.establishment_date' column.
data_type: string
- name: average_age
description: Write a description of the 'companies.average_age' column.
data_type: float64
- name: published
description: Write a description of the 'companies.published' column.
data_type: bool
data_tests:
- not_null
- accepted_values:
values:
- true
- false
quote: false
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: posts
description: Post
columns:
- name: id
description: ID
data_type: int64
data_tests:
- unique
- not_null
- name: user_id
description: User
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: title
description: Title
data_type: string
- name: content
description: Content
data_type: string
- name: created_at
description: Post Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Post Updated At
data_type: datetime
data_tests:
- not_null
- name: status
description: Status
data_type: int64
data_tests:
- accepted_values:
values:
- 0
- 1
- 2
quote: false
- name: posts_tags
description: Write a logical_name of the 'posts_tags' table.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- post_id
- tag_id
columns:
- name: post_id
description: post_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'posts')
field: id
meta:
relationship_type: many-to-one
active_record_dbt_error:
class: NameError
message: uninitialized constant PostsTag
- name: tag_id
description: tag_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'tags')
field: id
meta:
relationship_type: many-to-one
active_record_dbt_error:
class: NameError
message: uninitialized constant PostsTag
- name: relationships
description: Write a logical_name of the 'relationships' table.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- follower_id
- followed_id
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: follower_id
description: follower_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: followed_id
description: followed_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: schema_migrations
description: |-
# Schema Migrations
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
- name: version
description: The version number of the migration.
data_type: string
data_tests:
- unique
- not_null
- name: tags
description: Write a logical_name of the 'tags' table.
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: name
description: Write a description of the 'tags.name' column.
data_type: string
data_tests:
- unique
- not_null
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: user_tags
description: Write a logical_name of the 'user_tags' table.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- user_id
- tag_id
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: user_id
description: user_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: tag_id
description: tag_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'tags')
field: id
meta:
relationship_type: many-to-one
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: users
description: User
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
- name: id
description: ID
data_type: int64
data_tests:
- unique
- not_null
- name: created_at
description: User Created At
data_type: datetime
data_tests:
- not_null:
where: id != 1
- name: updated_at
description: User Updated At
data_type: datetime
data_tests:
- not_null
- name: company_id
description: company_id
data_type: int64
data_tests:
- relationships:
to: source('dummy', 'companies')
field: id
meta:
relationship_type: many-to-one
```
### Generate dbt Staging Files
#### dbt Staging Configuration
In the `#{config_directory_path}/staging_model.sql.tt` file, write the SQL template for the `staging` model you want to create.
You can use `sql.source_name`, `sql.table_name`, `sql.select_column_names`, `sql.primary_key_eql_id?`, and `sql.rename_primary_id` within this file.
Example:
```sql
with
source as (
select * from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}
),
renamed as (
select
<%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
-- <%= column_type %>
<%- columns.each_with_index do |column, column_index| -%>
<%- is_rename_primary_id = sql.primary_key_eql_id? && sql.primary_key?(column.name) -%>
<%- is_last_column = column_type_index == sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%= is_rename_primary_id ? "id as #{sql.rename_primary_id}" : column.name %><% unless is_last_column -%>,<%- end %>
<%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%- end -%>
<%- end -%>
<%- end -%>
from source
)
select * from renamed
```
Different Pattern:
```sql
#standardSQL
with source as (
select
<%- if sql.primary_key_eql_id? -%>
id as <%= sql.rename_primary_id %>
, * except(id)
<%- else -%>
*
<%- end -%>
from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}
)
, final as (
select
<%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
-- <%= column_type %>
<%- columns.each_with_index do |column, column_index| -%>
<% unless column_type_index == 0 && column_index == 0 -%>, <%- end %><%= (sql.primary_key_eql_id? && sql.primary_key?(column.name) ? sql.rename_primary_id : column.name) %>
<%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%- end -%>
<%- end -%>
<%- end -%>
from source
)
select
*
from final
```
#### Generate dbt Staging Files
Generate staging model files for dbt:
```bash
$ bin/rails generate active_record:dbt:staging_model TABLE_NAME
```
Generate staging model files for dbt that reference the specified `TABLE_NAME`.
File | Description
--------- | ---------
`#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql` | Staging model file for dbt.
`#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml` | Staging model documentation file for dbt.
Example:
```bash
$ bin/rails generate active_record:dbt:staging_model profiles
```
##### Generate `#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql`
Example:
```sql
with
source as (
select * from {{ source('dummy', 'profiles') }}
),
renamed as (
select
-- ids
id as profile_id,
user_id,
-- strings
first_name,
last_name,
-- datetimes
created_at,
updated_at
from source
)
select * from renamed
```
Different Pattern:
```sql
#standardSQL
with source as (
select
id as profile_id
, * except(id)
from {{ source('dummy', 'profiles') }}
)
, final as (
select
-- ids
profile_id
, user_id
-- strings
, first_name
, last_name
-- datetimes
, created_at
, updated_at
from source
)
select
*
from final
```
##### Generate `#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml`
Example:
> [!NOTE]
>
> The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
> [!WARNING]
>
> If you are using a version of dbt lower than v1.8, replace `tests:` with `data_tests:` in the generated file.
>
> [Add data tests to your DAG | dbt Developer Hub](https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax)
>
>> Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from `tests:` to `data_tests:`.
```yaml
---
version: 2
models:
- name: stg_dummy__profiles
description: Write a logical_name of the 'profiles' table.
columns:
- name: profile_id
description: profile_id
data_type: int64
data_tests:
- unique
- not_null
- relationships:
to: source('dummy', 'profiles')
field: id
meta:
relationship_type: one-to-one
- name: user_id
description: user_id
data_type: int64
data_tests:
- unique
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: one-to-one
- name: first_name
description: Write a description of the 'profiles.first_name' column.
data_type: string
data_tests:
- not_null
- name: last_name
description: Write a description of the 'profiles.last_name' column.
data_type: string
data_tests:
- not_null
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
```
### Generated dbt Seed Files
#### dbt Seed Configuration
In the `#{config_directory_path}/source_config.yml` file, describe the properties you want to set for the seed enum.
You can configure `defaults` in this file.
##### defaults
Set the default value for the `description` of the `seeds` enum.
In the `description` of `seed_descriptions.enum`, you can refer to the source name with `{{ source_name }}`, the translated table name with `{{ translated_table_name }}`, and the translated column name with `{{ translated_attribute_name }}`.
Example:
```yml
defaults:
seed_descriptions:
enum:
description: "{{ source_name }} {{ translated_table_name }} {{ translated_attribute_name }} enum"
```
If nothing is set, it defaults to the following:
```yml
defaults:
seed_descriptions:
enum:
description: "{{ source_name }} {{ translated_table_name }} {{ translated_attribute_name }} enum"
```
#### Generate dbt Seed Enum Files
Generate seed enum files for dbt:
```bash
$ bin/rails generate active_record:dbt:enum TABLE_NAME ENUM_COLUMN_NAME
```
Generate seed enum files for dbt from the specified `TABLE_NAME` and `ENUM_COLUMN_NAME`.
File | Description
--------- | ---------
`#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv` | Seed enum file for dbt.
`#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml` | Seed enum documentation file for dbt.
Example:
```bash
$ bin/rails generate active_record:dbt:enum posts status
```
##### Generate `#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv`
Example:
```csv
status_before_type_of_cast,status_key,status_en,status_ja
0,draft,Draft,下書き
1,published,Published,公開
2,deleted,Deleted,削除
```
##### Generate `#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml`
Example:
> [!NOTE]
>
> The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
> [!WARNING]
>
> If you are using a version of dbt lower than v1.8, replace `tests:` with `data_tests:` in the generated file.
>
> [Add data tests to your DAG | dbt Developer Hub](https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax)
>
>> Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from `tests:` to `data_tests:`.
```yaml
---
version: 2
seeds:
- name: seed_dummy__post_enum_statuses
description: dummy Post Status enum
config:
column_types:
status_before_type_of_cast: int64
status_key: string
status_en: string
status_ja: string
columns:
- name: status_before_type_of_cast
description: Status
data_tests:
- unique
- not_null
- name: status_key
description: Status(key)
data_tests:
- unique
- not_null
- name: status_en
description: Status(en)
data_tests:
- unique
- not_null
- name: status_ja
description: Status(ja)
data_tests:
- unique
- not_null
```
## Contributing
Contribution directions go here.
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).