Document the cluster aggregate module #985

Merged
merged 3 commits into from
Nov 16, 2022
49 changes: 48 additions & 1 deletion lib/trento/domain/cluster/cluster.ex
@@ -1,5 +1,52 @@
defmodule Trento.Domain.Cluster do
@moduledoc false
@moduledoc """
The cluster aggregate manages all the domain logic related to
deployed HA clusters (Pacemaker, Corosync, etc.).
The HA cluster handles the high availability scenarios of the installed
SAP infrastructure. That's why this domain is tailored to work on clusters managing
SAP workloads.

Each deployed cluster is registered as a new aggregate entry, meaning that all the hosts belonging
to the same cluster are part of the same stream. A cluster is first registered, and its details are updated
afterwards, only by cluster discovery messages coming from the **designated controller** node. Once a cluster is
registered, other hosts can be added when discovery messages arrive from other nodes. All the hosts
are listed in the `hosts` field.

The cluster aggregate stores and updates information coming in the cluster discovery messages such as:

- Cluster name
- Number of hosts and cluster resources
- Platform where the host is running (the cloud provider for instance)
- Managed SAP workload SID

## Cluster health

The cluster health is one of the most relevant concepts of this domain.
It shows whether the cluster is working as expected and, if it is not,
what the root cause of the issue is and whether some remediation is possible.
It is composed of two sub-health elements:

- Discovered health
- Checks health

The main cluster health is computed from these two values: it is the
worst of the two.

### Discovered health

The discovered health comes from the cluster discovery messages and it depends on the cluster type.
Each cluster type has a different way of evaluating the health.

### Checks health

The checks health is obtained from the [Checks Engine executions](https://github.com/trento-project/wanda/).
Every time a checks execution is started, the selected checks for this cluster are executed, and based on the result
the health value is updated. The checks are started from a user request or periodically following the
project scheduler configuration.

This domain only knows about the health; the details about the execution are stored in the

just a little change here: "This bounded context" is more accurate.

@nelsonkopliku (Member), Nov 15, 2022

Question: who is "This bounded context"?

If we refer to web or wanda I'd be fine, but if we refer to the Cluster Aggregate, I wouldn't agree.

Then, what about other references to "domain" in this Aggregate's documentation?

> That's why this domain is tailored to work on clusters managing SAP workloads.

> The cluster health is one of the most relevant concepts of this domain.

What does "domain" refer to in these sentences?

Just asking ☮️

[Checks Engine](https://github.com/trento-project/wanda/).
"""

require Trento.Domain.Enums.Provider, as: Provider
require Trento.Domain.Enums.ClusterType, as: ClusterType
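
The moduledoc in this diff states that the cluster health is the worst of the discovered health and the checks health. A minimal sketch of that rule follows. It is not taken from the Trento codebase: the module name `ClusterHealthSketch`, the function `aggregate/2`, the health atoms `:passing`, `:warning`, `:critical`, and their severity ordering are all assumptions made for illustration.

```elixir
defmodule ClusterHealthSketch do
  # Hypothetical module for illustration only, not part of Trento.
  # Assumed health values and their severity ranking (higher means worse).
  @severity %{passing: 0, warning: 1, critical: 2}

  # Returns the worse of the two sub-health values.
  # Raises KeyError if a value is not one of the assumed health atoms.
  def aggregate(discovered_health, checks_health) do
    Enum.max_by([discovered_health, checks_health], &Map.fetch!(@severity, &1))
  end
end
```

With these assumptions, `ClusterHealthSketch.aggregate(:passing, :critical)` evaluates to `:critical`, so a single failing sub-health is enough to degrade the overall cluster health.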