docs/ADRs/0004.database-schema-validation.md from opcotech/elemo

docs/ADRs/0004.database-schema-validation.md
Summary

Maintainability

Test Coverage

Issues
# Database Schema Validation

| author       | created at | updated at | status   |
|:-------------|:-----------|------------|:---------|
| @gabor-boros | 2023-03-16 | -          | accepted |

## Abstract

The database of choice for the project is Neo4j. Neo4j is a graph database, and
it is a good fit for the project, because project management is basically about
managing a bunch of interconnected objects.

However, there are some things we need to consider when working with Neo4j:

* The ID of vertices and edges generated by Neo4j are meant to be used only
  internally by the database. Furthermore, IDs of generated by Neo4j _may_ be
  reused by the database upon need after deleting its previous owner.
  Therefore, we need to generate our own IDs for the vertices and edges, and
  use them in the application.
* The database is schemaless, which means that we can't use the database to
  validate the data we are storing. We need to validate the data in the
  application.

This ADR describes how we are going to solve schema validation. For generating
IDs for vertices and edges, see the [Vertex And Edge IDs] ADR.

[Vertex And Edge IDs]: 0003.vertex-and-edge-ids.md

## Decision

Before jumping into the solution, let's take it clear that we are not going to
validate the data in the application. We are going to validate/ensure that the
database looks like we want it to **look like**. We are going to use the
combination of constraints, indexes, and `struct`s to ensure that the schema is
matching our expectations.

Since we are not using the Enterprise Edition of Neo4j, we can only use "unique
node property constraints", meaning that we can only use constraints on the
properties of the vertices which are creating indexes as well. Although this is
a limitation, we can compensate the lack of schema by using `struct`s to ensure
node property existence.

This is where the `struct`s come into play. We are going to use `struct`s to
ensure that the properties of the vertices are present. The repository layer is
responsible for defining these `struct`s without the business logic knowing
about them.

However, we have no way at the moment to ensure that property values are valid.
For example, we can't ensure that a property value is a valid email address, we
can only ensure that the property is present.

## Consequences

As a consequence of this decision, we need to make sure that the repository
layer is defining the `struct`s for the vertices and edges. This brings in
potential code duplication. However, this is a minor issue compared to the
benefits of having a schema.

## References

* [Neo4j constraints](https://neo4j.com/docs/cypher-manual/current/constraints/)
* [Neo4j indexes](https://neo4j.com/docs/cypher-manual/current/indexes-for-search-performance/)