# Understanding the ComplianceAsCode build system

## Introduction

This section aims to provide an introduction to the ComplianceAsCode build
system to developers interested in extending or debugging it.

Before beginning, readers are expected to have some familiarity with the
relevant standards in this space. Among others, these are:

- [XCCDF](https://csrc.nist.gov/projects/security-content-automation-protocol/specifications/xccdf),
  the eXtensible Configuration Checklist Description Format; this is an XML
  format for expressing security checklists, such as the steps for hardening
  a particular system.
- [OVAL](https://oval.cisecurity.org/), the Open Vulnerability and Assessment
  Language; this is a standardized language for expressing the automated
  checks used to audit a system's compliance.
- [OCIL](https://csrc.nist.gov/projects/security-content-automation-protocol/specifications/ocil),
  the Open Checklist Interactive Language, an expressive language for
  handling manual compliance checks.
- [CPE](https://nvd.nist.gov/products/cpe), the Common Platform Enumeration;
  a scheme for identifying software and systems.
- SCAP source data stream format, a mechanism for combining the above into a single
  redistributable file.

Some familiarity with the content layout (as discussed in previous chapters)
is also assumed.

While this document serves as a guide, the build system is continually
evolving, so inspecting the code is ultimately the only reliable way to
answer many questions.


## High-Level Overview

ComplianceAsCode's content project is ultimately the combination of three
things:

- A collection of content in a format-agnostic manner,
- A build system for collecting this content and combining it to form
  artifacts understood by other systems,
- A test system for validating both the compliance of these artifacts
  to various standards and the correctness of the content in the repo.

Since previous sections describe the expectations for content in the repo in
detail, this section focuses on the build system. To understand the test
system, see the README under `tests/` in the repo.

The build system is generated by CMake and combines local Python utilities
with XML tools (such as `xmllint` and `xsltproc`) and OpenSCAP's `oscap`
command-line tool. The Python utilities transform the various input files
into a more standardized format and expand the Jinja macros in them. Because
many of the artifacts we generate are XML-based, extensive XSLT processing
takes place after the initial structure is built in Python. Finally, OpenSCAP
combines several of these files, which reference one another, into the
finished artifacts.


### CMake Structure

CMake requires that a project have an entry point, the top-level
`/CMakeLists.txt` file. Written in the CMake language, it drives building and
installing the project. This file contains several things:

- The many build-time options for customizing the types of content generated,
- The hand-off for generating each product's specific content,
- Common installation, testing, and distribution targets.

However, the specifics of building a particular product are contained in the
shared module `cmake/SSGCommon.cmake`. This file contains all of the CMake
logic needed to build a particular product and exposes the top-level macro
`ssg_build_product(...)`, which generates per-product build, installation,
and testing targets. The details are best understood by reading this file
directly, but in general the build follows these steps, in rough order of
occurrence:

- Generate SCE content and metadata.
- Generate the product dictionary.
- Resolve rules, profiles, groups, static checks and static remediations to the product-specific resolved form (also known as compiled form).
- Generate templated checks and remediations from the templates.
- Collect all available remediations.
- Combine all available OVAL checks into a single unlinked OVAL document.
- Load resolved rules, profiles, groups, collected remediations and the unlinked OVAL document and generate XCCDF, OVAL and OCIL documents from this data.
- Generate CPE OVAL and CPE dictionary.
- Combine the OVAL, OCIL, CPE and XCCDF documents into a single SCAP source data stream.
- Generate content for derived products (such as CentOS and Scientific Linux).
- Generate HTML tables, Bash scripts, Ansible Playbooks and other secondary artifacts.
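Because these steps consume each other's outputs, one way to picture them is as a dependency graph that the generated build walks for each product. The sketch below is a hypothetical model, not code from the repository: the step names are illustrative labels rather than real CMake targets, and the dependency edges are assumptions inferred from the order of the list above. It uses Python's standard `graphlib` module (Python 3.9 or later) to derive one valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the per-product build steps. Step names are
# illustrative labels, not real CMake target names, and the dependency
# edges (step -> steps it needs first) are assumptions.
BUILD_STEPS = {
    "sce_content": set(),
    "product_dictionary": set(),
    "compiled_content": {"product_dictionary"},
    "templated_content": {"compiled_content"},
    "collected_remediations": {"compiled_content", "templated_content"},
    "oval_unlinked": {"compiled_content", "templated_content"},
    "xccdf_oval_ocil": {
        "compiled_content",
        "collected_remediations",
        "oval_unlinked",
    },
    "cpe": {"product_dictionary"},
    "source_datastream": {"xccdf_oval_ocil", "cpe", "sce_content"},
    "derived_products": {"source_datastream"},
    "secondary_artifacts": {"source_datastream"},
}


def build_order(steps):
    """Return one order in which every step runs after all of its dependencies."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for position, step in enumerate(build_order(BUILD_STEPS), start=1):
        print(position, step)
```

Any order returned satisfies every edge, but steps with no dependency between them, such as `derived_products` and `secondary_artifacts`, may appear in either order, which is one reason the list above can only describe a rough order.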

### Python Build Scripts

Various Python utilities under `/build-scripts` contribute to this process;
refer to their help text for more information and usage:

- `build_all_guides.py` -- generates separate HTML guides for every profile
  in an XCCDF document.
- `build_rule_playbooks.py` -- generates per-rule, per-profile Ansible
  Playbooks.
- `build_sce.py` -- outputs SCE content and combined metadata.
- `build_templated_content.py` -- generates templated audit and remediation
  content.
- `build_xccdf.py` -- generates XCCDF, OVAL and OCIL documents from resolved
  content.
- `collect_remediations.py` -- finds the separate (per-rule and templated)
  remediations and places them into a single directory.
- `combine_ovals.py` -- combines separate (per-rule, shared, and templated)
  OVAL XML trees into a single larger OVAL XML document.
- `compile_all.py` -- resolves rules, groups, profiles, static checks and
  remediations to the product-specific resolved form (also known as compiled
  form).
- `compile_product.py` -- resolves `product.yml` and the distributed product
  attributes.
- `compose_ds.py` -- composes an SCAP source data stream from individual
  SCAP components.
- `cpe_generate.py` -- generates the product-specific CPE dictionary and
  checks.
- `enable_derivatives.py` -- generates derivative product content from a
  base product.
- `expand_jinja.py` -- helper script used by BATS (a Bash unit-testing
  framework) to expand Jinja in test scripts.
- `generate_guides.py` -- generates HTML guides and an HTML index for every
  profile in the built SCAP source data stream.
- `generate_man_page.py` -- generates the ComplianceAsCode man page.
- `generate_profile_remediations.py` -- generates profile-oriented Bash
  remediation scripts or profile-oriented Ansible Playbooks from the built
  SCAP source data stream. The output is similar to that of the
  `oscap xccdf generate fix` command, but this tool generates the scripts or
  Playbooks for all profiles in the given SCAP source data stream at once.
- `profile_tool.py` -- generates statistics about profiles in a specific
  XCCDF or data stream file.
- `verify_references.py` -- used by the test system to verify cross-linkage
  of identifiers between XCCDF and OVAL/OCIL documents.

Many of these utilities are simply front-ends over code in the SSG Python
module located under `ssg/`.
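To illustrate the front-end pattern, here is a minimal sketch in which the script layer only parses arguments and delegates the actual work to a library function. Both `merge_unique()` and the command-line interface are hypothetical; they do not correspond to any real script in `build-scripts/` or function in `ssg/`.

```python
import argparse
import sys


# Library layer: a hypothetical stand-in for a helper in the `ssg/` package.
def merge_unique(id_lists):
    """Merge several lists of IDs, keeping only the first occurrence of each."""
    merged = {}
    for ids in id_lists:
        for item in ids:
            merged.setdefault(item, None)  # dicts preserve insertion order
    return list(merged)


# Front-end layer: a hypothetical script that only parses arguments,
# calls the library and writes the result.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Merge comma-separated ID lists (illustrative only).")
    parser.add_argument("lists", nargs="+", help="comma-separated IDs, e.g. a,b")
    parser.add_argument("--output", help="output file; defaults to stdout")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    merged = merge_unique(arg.split(",") for arg in args.lists)
    text = "\n".join(merged) + "\n"
    if args.output:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        sys.stdout.write(text)
    return merged


if __name__ == "__main__":
    main()
```

For example, `main(["rule_a,rule_b", "rule_b,rule_c"])` writes `rule_a`, `rule_b` and `rule_c` on separate lines and returns them as a list. Keeping the logic in the library layer means it can be imported and unit-tested without going through the command line.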

## How OVAL is Built

The build of the OVAL document takes place in two steps.

### 1. Combination of OVALs

In the first step, all available and applicable OVAL checks are combined into a single unlinked OVAL document, stored in the file `build/${PRODUCT}/oval-unlinked.xml`.
The `oval-unlinked.xml` document is generated by the `combine_ovals.py` script.
OVAL shorthands are loaded into the OVAL Document object in a fixed order: checks from the benchmark are loaded first, followed by checks from the shared directory.
If a shorthand has already been loaded into the OVAL Document object, it is skipped, so when the same check exists in both places, the benchmark version is used.
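The load order and the skip rule together mean that the first copy of a check wins. A minimal sketch of that precedence logic follows, assuming shorthands are represented as `(check_name, shorthand_text)` pairs and identified by check name; the real OVAL Document object may identify them differently.

```python
from itertools import chain


def load_shorthands(benchmark_checks, shared_checks):
    """Load OVAL shorthands in precedence order: benchmark first, then shared.

    Both arguments are iterables of (check_name, shorthand_text) pairs; this
    data model and keying by check name are assumptions for illustration.
    Returns the loaded shorthands and the names of skipped duplicates.
    """
    loaded = {}
    skipped = []
    for name, text in chain(benchmark_checks, shared_checks):
        if name in loaded:
            # Already loaded from an earlier source, so this copy is ignored.
            skipped.append(name)
            continue
        loaded[name] = text
    return loaded, skipped
```

With a benchmark copy of `accounts_tmout` and shared copies of `accounts_tmout` and `sshd_enabled`, the benchmark `accounts_tmout` and the shared `sshd_enabled` are kept, and the shared `accounts_tmout` is reported as skipped.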

Steps of loading an OVAL shorthand:

1. The OVAL shorthand file is loaded as a string; if the shorthand is not templated, it is expanded using Jinja macros before loading.
2. The OVAL shorthand string is processed by the OVAL Document object:
   1. The OVAL shorthand string is loaded into an OVAL Shorthand object.
   2. The OVAL Shorthand object is validated.
      The following properties are checked:
      - Whether the OVAL definitions are applicable to the product.
      - Whether the shorthand contains an OVAL definition whose ID matches the given `rule_id`.
3. If the OVAL Shorthand object is valid, it is added to the OVAL Document object.
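The two validation checks can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not the project's implementation: the sample shorthand has no namespace-prefixed elements, Jinja expansion is assumed to have already happened, and "applicable" is modeled as one of the definition's `<affected>`/`<platform>` values appearing in a caller-supplied set. The platform value `multi_platform_all` is only an example.

```python
import xml.etree.ElementTree as ET


def validate_shorthand(shorthand, rule_id, product_platforms):
    """Check an OVAL shorthand string; return (is_valid, list_of_problems).

    Simplifying assumptions: the shorthand is a <def-group> without namespace
    prefixes, Jinja has already been expanded, and a definition is
    "applicable" when one of its <affected>/<platform> values is listed in
    `product_platforms`.
    """
    root = ET.fromstring(shorthand)
    definitions = root.findall("definition")
    problems = []

    # Applicability: collect every platform named by the definitions.
    platforms = {
        platform.text.strip()
        for definition in definitions
        for platform in definition.iterfind("metadata/affected/platform")
        if platform.text
    }
    if not platforms & set(product_platforms):
        problems.append("not applicable to the product")

    # ID check: some definition must carry the rule's ID.
    if not any(definition.get("id") == rule_id for definition in definitions):
        problems.append("no definition with id " + repr(rule_id))

    return not problems, problems


SAMPLE = """<def-group>
  <definition class="compliance" id="accounts_tmout" version="1">
    <metadata>
      <title>Set Interactive Session Timeout</title>
      <affected family="unix">
        <platform>multi_platform_all</platform>
      </affected>
    </metadata>
  </definition>
</def-group>"""

if __name__ == "__main__":
    print(validate_shorthand(SAMPLE, "accounts_tmout", {"multi_platform_all"}))
```

Running the module prints `(True, [])`; passing a different `rule_id`, or a platform set that does not contain `multi_platform_all`, returns `False` together with the corresponding problem.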

After all OVAL shorthands are loaded, the affected platforms of the loaded OVAL definitions are filled in.
The OVAL document is then saved as an XML file in `build/${PRODUCT}/oval-unlinked.xml`.

### 2. Linking OVAL Document

The second step is performed when the XCCDF document is built by the `build_xccdf.py` script.
In this step, the `oval-unlinked.xml` document from the previous step is linked to the XCCDF document being built, meaning that the IDs used by rules and checks are aligned.
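Linking can be pictured as translating the short names used in shorthands into OVAL-format identifiers, which have the form `oval:<namespace>:<type>:<number>` with types such as `def`, `tst`, `obj`, `ste` and `var`, and then pointing each XCCDF check reference at the translated ID. The sketch below is hypothetical and is not the `IDTranslator` API: the `ssg-<name>` namespace and the trailing `1` are assumptions modeled on IDs found in built content. It also rejects references to undefined definitions, which corresponds to step 4 of the list that follows.

```python
# Abbreviations of OVAL component types as used in identifiers of the form
# oval:<namespace>:<type>:<number>.
OVAL_TYPE_ABBREV = {
    "definition": "def",
    "test": "tst",
    "object": "obj",
    "state": "ste",
    "variable": "var",
}


def translate_id(short_id, component_type, namespace="ssg"):
    """Hypothetical translation of a shorthand ID into an OVAL-style ID."""
    abbrev = OVAL_TYPE_ABBREV[component_type]
    return "oval:{}-{}:{}:1".format(namespace, short_id, abbrev)


def link_checks(xccdf_check_refs, oval_definition_ids):
    """Point each XCCDF rule at the translated ID of the OVAL definition it uses.

    xccdf_check_refs: dict of XCCDF rule ID -> short OVAL definition name.
    oval_definition_ids: short names of definitions in the unlinked document.
    """
    available = set(oval_definition_ids)
    missing = sorted(set(xccdf_check_refs.values()) - available)
    if missing:
        raise ValueError("undefined OVAL definitions: " + ", ".join(missing))
    return {
        rule_id: translate_id(ref, "definition")
        for rule_id, ref in xccdf_check_refs.items()
    }
```

For example, `translate_id("accounts_tmout", "definition")` returns `oval:ssg-accounts_tmout:def:1`. Translating both sides with the same function is what keeps the rule references and the OVAL definitions aligned.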

Steps to link an OVAL document to an XCCDF document:

1. The unlinked OVAL document `oval-unlinked.xml` is loaded into the OVAL Document object.
2. The integrity of the references to the components of the OVAL Document object is verified.
3. For each XCCDF rule that has a CCE identifier and
   an implemented OVAL check, a new `<reference>` element with the CCE ID is added to the OVAL definition.
4. Each OVAL definition referenced by the XCCDF document is checked to exist in the OVAL document.
5. The `type` of each `<xccdf:Value>` is verified to match the `datatype` of the OVAL variable it is exported to, as required by the [constraint](http://csrc.nist.gov/publications/nistpubs/800-126-rev2/SP800-126r2.pdf#page=30&zoom=auto,69,313) in NIST SP 800-126r2.
   Where necessary, the `type` attribute of those `<xccdf:Value>` elements is corrected so that the produced content meets this constraint.
6. The referenced CCE identifiers are verified to be correct.
7. The identifiers in the OVAL Document object are translated using `IDTranslator`.
8. The OVAL Document object is saved as the XML file `build/ssg-${PRODUCT}-oval.xml`.
9. For each XCCDF rule, a minimal OVAL document is generated as an artifact.
10. For each reference to an OVAL check in the XCCDF document, a link to the check content (a `check-content-ref` element) and the corresponding `check-export` elements are added.
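Step 5 is the least obvious of these steps, so here is a minimal sketch of the type reconciliation. The mapping it uses (XCCDF `number` for OVAL `int` or `float`, `boolean` for `boolean`, and `string` for every other datatype) is our reading of the constraint linked in step 5 and should be checked against SP 800-126r2 before relying on it. Values and exports are modeled as plain dictionaries and pairs rather than XML elements, and the function names are hypothetical.

```python
# Assumed XCCDF <-> OVAL type correspondence (verify against SP 800-126r2).
NUMERIC_OVAL_TYPES = {"int", "float"}


def expected_xccdf_type(oval_datatype):
    """Return the XCCDF Value type assumed to correspond to an OVAL datatype."""
    if oval_datatype in NUMERIC_OVAL_TYPES:
        return "number"
    if oval_datatype == "boolean":
        return "boolean"
    return "string"


def reconcile_value_types(xccdf_types, exports):
    """Correct XCCDF Value types so they agree with the OVAL variables they feed.

    xccdf_types: dict of XCCDF Value ID -> declared type; a missing entry is
        treated as "string", the XCCDF default.
    exports: iterable of (value_id, oval_datatype) pairs, one per check-export.
    Returns a new dict of corrected types and a list of
    (value_id, old_type, new_type) changes.
    """
    corrected = dict(xccdf_types)
    changes = []
    for value_id, datatype in exports:
        wanted = expected_xccdf_type(datatype)
        current = corrected.get(value_id, "string")
        if current != wanted:
            corrected[value_id] = wanted
            changes.append((value_id, current, wanted))
    return corrected, changes
```

If a single Value is exported to variables with different datatypes, this sketch simply applies the mapping of the last export; the real build may handle that case differently.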