# CF Resource GUID Format

Date: 2021-08-19

## Status

Accepted

## Context

Every Cloud Foundry API resource has a globally unique ID (GUID) that identifies it. Many GET endpoints use the GUID as the key for fetching an individual resource. In CF for VMs these GUIDs are (usually) generated by the database and are random "Version 4" UUIDs.

While every Kubernetes resource [does have an associated UID](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids), these UIDs [are not used as keys](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids) in the etcd store and are not indexed either. As a result, objects cannot be retrieved by UID, only by namespace and name. The only way to retrieve objects by GUID is therefore to establish some sort of mapping between GUID and namespace + name.

To support the V3 Cloud Foundry APIs we need to be able to identify CF resources by a single identifier while maintaining the ability to find these resources on Kubernetes.

For more details see [our exploration (Issue #2)](https://github.com/cloudfoundry/cf-k8s-api/issues/2) and [RFC doc](https://docs.google.com/document/d/1lUZ1kyZpExJNOHhXkFp0tqkjLJgEFqUFSyTDOoKCXuk/edit#).

## Decision

We’ve decided to generate globally unique, non-deterministic GUIDs for resources we create. This GUID will be used as a resource’s name on Kubernetes. This aligns more closely with what existing Cloud Foundry clients expect and avoids encoding extra information in the identifier that may be difficult for us to work with long term.

> **NOTE:** in [ADR 0004](0004-resource-name-prefixes.md) we somewhat modified
> this decision to include a prefix on generated GUIDs for orgs, spaces and
> CFProcesses.
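
As a rough illustration of using a random GUID as the Kubernetes resource name, here is a minimal Python sketch. The `apiVersion` value, the `spec` field, and the helper names are placeholders invented for this example, not the project's actual schema, and the sketch ignores the prefixes introduced in ADR 0004. It also checks that a Version 4 UUID string satisfies the RFC 1123 label format (lowercase alphanumerics and `-`, at most 63 characters), which is one of the name formats Kubernetes accepts.

```python
import re
import uuid

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character, at most 63 characters long.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def new_guid():
    """Return a random (Version 4) UUID in its canonical string form."""
    return str(uuid.uuid4())


def is_valid_label(name):
    """Check a candidate resource name against the RFC 1123 label rules."""
    return len(name) <= 63 and DNS_LABEL.fullmatch(name) is not None


def build_manifest(namespace, display_name):
    """Build an illustrative custom resource whose name is a fresh GUID."""
    guid = new_guid()
    return {
        "apiVersion": "example.org/v1alpha1",  # placeholder, not the real group
        "kind": "CFApp",  # illustrative kind
        "metadata": {"name": guid, "namespace": namespace},
        "spec": {"displayName": display_name},  # hypothetical field
    }


manifest = build_manifest("space-a", "my-app")
print(manifest["metadata"]["name"], is_valid_label(manifest["metadata"]["name"]))
```

Because the name carries no information about the namespace, a client holding only the GUID still needs some other way to locate the object.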

We believe we can address some of the performance implications of looking up resources by GUID alone by using an in-memory cache on the CF API Shim that maps known GUIDs to the Kubernetes namespace in which the object lives.

Requests to the CF API Shim to get an individual resource (e.g. `GET /v3/apps/:app-guid`) will first check the cache to see if there is a known namespace for the resource. If the namespace is known, the API Shim can then perform a Get call to the Kubernetes API with that namespace + resource GUID (since the GUID is the object’s name). If the namespace is not found in the cache, the API Shim will need to perform a List across all of the namespaces the user has access to. Once it finds the resource, it can store the GUID-to-namespace mapping in the cache for future requests. Additionally, the API Shim can add resources to the cache when it is the one creating them.

For the initial implementation this cache will be in-memory, non-durable, and local to each instance of the API Shim.
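
The lookup flow described above can be sketched roughly as follows. This is a simplified Python model, not the API Shim's implementation: `FakeCluster` is a hypothetical in-memory stand-in for the Kubernetes API, `ApiShim` and its method names are invented for illustration, and evicting a cache entry when the cached Get finds nothing is an assumption that the text above does not specify.

```python
class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API."""

    def __init__(self):
        self._objects = {}  # namespace -> {resource name (GUID) -> object}
        self.list_calls = 0  # counts List calls so the cache effect is visible

    def create(self, namespace, name, obj):
        self._objects.setdefault(namespace, {})[name] = obj

    def get(self, namespace, name):
        """Direct lookup by namespace + name, like a Kubernetes Get."""
        return self._objects.get(namespace, {}).get(name)

    def list(self, namespace):
        """Return every object in one namespace, like a Kubernetes List."""
        self.list_calls += 1
        return dict(self._objects.get(namespace, {}))


class ApiShim:
    """Illustrative shim holding a local, non-durable GUID -> namespace cache."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.namespace_by_guid = {}

    def create_app(self, namespace, guid, app):
        # The shim created the resource, so it can cache the mapping right away.
        self.cluster.create(namespace, guid, app)
        self.namespace_by_guid[guid] = namespace

    def get_app(self, guid, user_namespaces):
        namespace = self.namespace_by_guid.get(guid)
        if namespace is not None:
            app = self.cluster.get(namespace, guid)
            if app is not None:
                return app
            # Assumption: a cached entry that no longer resolves is evicted.
            del self.namespace_by_guid[guid]
        # Cache miss: List each namespace the user can access.
        for candidate in user_namespaces:
            apps = self.cluster.list(candidate)
            if guid in apps:
                self.namespace_by_guid[guid] = candidate
                return apps[guid]
        return None


cluster = FakeCluster()
cluster.create("space-a", "guid-1", {"name": "my-app"})
shim = ApiShim(cluster)
shim.get_app("guid-1", ["space-b", "space-a"])  # miss: Lists space-b, then space-a
shim.get_app("guid-1", ["space-b", "space-a"])  # hit: a single Get, no List calls
```

Since each API Shim instance keeps its own cache, every instance pays the List cost the first time it sees a GUID it did not create.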

## Consequences

### Pros
* GUID format and characteristics align closely with resource identifiers on CF for VMs
* We do not have to implicitly encode extra information in the identifier, nor document and communicate that encoding to other consumers of the CF custom resources

### Cons
* Fetching individual resources from the API Shim becomes more complicated.
* We need to manage a (hopefully lightweight) cache.
* Using non-deterministic GUIDs as a Kubernetes resource name does not support the [optimistic locking](https://github.com/kubernetes-sigs/controller-runtime/blob/master/FAQ.md#q-my-cache-might-be-stale-if-i-read-from-a-cache-how-should-i-deal-with-that) pattern employed by many Kubernetes controllers. We will need to use more complicated approaches (look into how the Deployments controller handles its children) or look into other approaches (some are mentioned on the RFC doc) to ensure our controllers function correctly.
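
To make the last con more concrete, the sketch below models a controller that reconciles twice against a stale cache. It reflects one reading of the linked controller-runtime FAQ, which recommends deterministic names so that a repeated create fails with a conflict instead of producing a duplicate. `FakeNamespace`, `reconcile`, and the naming helpers are hypothetical and exist only for this illustration.

```python
import uuid


class AlreadyExists(Exception):
    """Raised when creating an object whose name is already taken."""


class FakeNamespace:
    """Hypothetical stand-in for a single namespace in the Kubernetes API."""

    def __init__(self):
        self.objects = {}

    def create(self, name, obj):
        if name in self.objects:
            raise AlreadyExists(name)
        self.objects[name] = obj


def deterministic_name(parent):
    # The same parent always yields the same child name.
    return f"{parent}-child"


def random_name(parent):
    # Every call yields a new name, as with non-deterministic GUIDs.
    return str(uuid.uuid4())


def reconcile(namespace, cached_parents, name_for, parent):
    """Create a child for `parent` unless the (possibly stale) cache has one."""
    if parent in cached_parents:
        return
    try:
        namespace.create(name_for(parent), {"parent": parent})
    except AlreadyExists:
        pass  # an earlier reconcile already created the child


stale_cache = set()  # never refreshed, so both reconciles see "no child yet"

with_fixed_names = FakeNamespace()
with_random_names = FakeNamespace()
for _ in range(2):
    reconcile(with_fixed_names, stale_cache, deterministic_name, "app-1")
    reconcile(with_random_names, stale_cache, random_name, "app-1")

print(len(with_fixed_names.objects), len(with_random_names.objects))
```

With deterministic names the second create is rejected and one child remains; with random names nothing collides, so the stale read leaves two children for the same parent.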

## Open Questions

Do we need to enforce that CF custom resource names must be a GUID? Is it OK for a `kubectl` user to name a resource whatever they want? If our caching mechanism relies on names being globally unique, then we might need to enforce this.