fedspendingtransparency/usaspending-api

View on GitHub
usaspending_api/etl/management/commands/load_transactions_in_delta.md

Summary

Maintainability
Test Coverage
# Overview
This command is used to initialize and update Transaction related delta tables.
## Key Tables being updated
### `int.transaction_id_lookup` 
This table maps raw transaction data (`raw.published_fabs` and `raw.detached_award_procurement`) to intermediary transaction data (`int.transaction_fabs`, `int.transaction_fpds`, and `int.transaction_normalized`). One entry exists in this table for each transaction represented by a `int.transaction_normalized` record and **either** a `int.transaction_fabs` or `int.transaction_fpds` record.

#### Fields
- `transaction_id` - Generated by this loader.
- `transaction_unique_id` - This corresponds to `afa_generated_unique` for fabs transactions and `detached_award_proc_unique` for fpds transactions.
- `published_fabs_id` - Unique identifier for `transaction_fabs` data.
- `detached_award_procurement_id` - Unique identifier for `transaction_fpds` data.

### `int.award_id_lookup`
This table maps award data to transaction data. Because an award can have many transactions, a single award may appear multiple times in this table. A record exists for each transaction, which maps it to its award.

#### Fields
- `award_id` - Generated by this loader. This is an id unique to an award, so this column allows for duplicates.
- `transaction_unique_id` - 
- `published_fabs_id` - Unique identifier for `transaction_fabs` data
- `detached_award_procurement_id` - Unique identifier for `transaction_fpds` data
- `generated_unique_award_id` - This value comes from the `unique_award_key` from incoming raw transactions

### `int.transaction_fabs`
Represents FABS transactions. Has one corresponding `int.transaction_normalized` record.
### `int.transaction_fpds`
Represents FPDS transactions. Has one corresponding `int.transaction_normalized` record.
### `int.transaction_normalized`
Represents data elements of a transaction not specific to FABS or FPDS. This has either a corresponding `int.transaction_fabs` or `int.transaction_fpds` record.
### `int.awards`
Represents data corresponding to an Award, which comprises of one or more Transactions. Transactions are grouped together as an award by their `unique_award_key`. Fields on awards come from either:
- The most recent Transaction for the Award
- The earliest Transaction for the Award
- Aggregate values of all Transactions for the Award

# Usage

This command will perform different actions depending on the argument provided with the `--etl-level` flag. 

## Initial Run
Before this script can be used routinely, it must first be run with the `--etl-level` flag set to `initial_run`. This will perform a few setup actions.

1. Create the the `transaction_id_lookup` table.
2. Use the existing `raw.transaction_normalized` table to prepopulate the `transaction_id_lookup` table. This will ensure that existing `transaction_id`s created by our system will be retained. If the `raw.transaction_normalized` table doesn't exist, this skep is skipped.
3. Set the `transaction_id_seq` sequence to the maximum `id` in the `transaction_id_lookup` table. If there are no records in the table, the sequence will be set to 1.
4. Create the `award_id_lookup` table.
5. Use the existing `raw.awards` table to prepopulate the `award_id_lookup` table. This will ensure any existing `id`s created by our system will be retained. If the `raw.awards` table does not exist, this step is skipped.
6. Set the `award_id_seq` to the maximum `id` value in the `award_id_lookup` table. If there are no records in the table, the sequence will be set to 1. 
7. Create each of the following intermediary tables and attempt to backfill them with data from their raw counterpart. If the `--no-initial-copy` flag is provided, the backfill will be skipped.
    - `int.transaction_fabs`
    - `int.transaction_fpds`
    - `int.transaction_normalized`

### Note on `--no-initial-copy`
This flag should not be used in production. The intermediary tables should be backfilled during the initial run so that update and create dates are preserved from the old, raw versions of the tables.
This flag may be used during testing to compare the data from the raw tables to the intermediary tables.

## Daily Usage

During the nightly pipeline, this command should be run multiple times using different `--etl-level` flag arguments. Below is a description of the steps required:
1. `transaction_id_lookup` and `award_id_lookup` can be run in parallel. These will update the lookup tables with the latest data from the raw tables: `raw.published_fabs` and `raw.detached_award_procurement`.
2. `transaction_fabs`, `transaction_fpds`, and `transaction_normalized` can be run in parallel.
3. `awards` must be run by itself.