# etcd.openshift.io API Group

This API group contains CRDs related to etcd cluster management in Two Node OpenShift with Fencing deployments.

## API Versions

### v1alpha1

Contains the `PacemakerCluster` custom resource for monitoring Pacemaker cluster health in Two Node OpenShift with Fencing deployments.

#### PacemakerCluster

- **Feature Gate**: `DualReplica`
- **Component**: `two-node-fencing`
- **Scope**: Cluster-scoped singleton resource (must be named "cluster")
- **Resource Path**: `pacemakerclusters.etcd.openshift.io`

The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster.
It is periodically updated by the cluster-etcd-operator's status collector.

### Status Subresource Design

This resource uses the standard Kubernetes status subresource pattern (`+kubebuilder:subresource:status`).
The status collector creates the resource without status, then immediately populates it via the `/status` endpoint.

**Why not atomic create-with-status?**

We initially explored removing the status subresource to allow creating the resource with status in a single
atomic operation. This would ensure the resource is never observed in an incomplete state. However:

1. The Kubernetes API server strips the `status` field from create requests when a status subresource is enabled
2. Without the subresource, we cannot use separate RBAC for spec vs status updates
3. The OpenShift API test framework assumes a status subresource exists for its status update tests

The status collector therefore performs a two-step operation: create the resource, then immediately update its status.
The brief window in which the status is empty is acceptable because the healthcheck controller handles a missing status gracefully.
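
For illustration, a minimal sketch of that two-step flow with a controller-runtime client is shown below. The import path and the generated type names (`PacemakerCluster`, `PacemakerClusterStatus`, a pointer `Status` field) are assumptions based on the shape described in this document, not the actual generated code.

```go
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	etcdv1alpha1 "github.com/openshift/api/etcd/v1alpha1" // assumed import path
)

// publishStatus ensures the singleton PacemakerCluster exists and then writes
// its status through the /status subresource.
func publishStatus(ctx context.Context, c client.Client, status etcdv1alpha1.PacemakerClusterStatus) error {
	pc := &etcdv1alpha1.PacemakerCluster{
		ObjectMeta: metav1.ObjectMeta{Name: "cluster"}, // singleton: must be named "cluster"
	}

	// Step 1: make sure the resource exists. Any status sent on create would be
	// stripped by the API server because the status subresource is enabled.
	if err := c.Get(ctx, client.ObjectKeyFromObject(pc), pc); apierrors.IsNotFound(err) {
		if err := c.Create(ctx, pc); err != nil {
			return err
		}
	} else if err != nil {
		return err
	}

	// Step 2: populate the status through the /status endpoint.
	pc.Status = &status
	return c.Status().Update(ctx, pc)
}
```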

### Pacemaker Resources

A **pacemaker resource** is a unit of work managed by pacemaker. In pacemaker terminology, resources are services
or applications that pacemaker monitors, starts, stops, and moves between nodes to maintain high availability.

For Two Node OpenShift with Fencing, we manage three resource types:
- **Kubelet**: The Kubernetes node agent and a prerequisite for etcd
- **Etcd**: The distributed key-value store
- **FencingAgent**: Used to isolate failed nodes during a quorum loss event (tracked separately)

### Status Structure

```yaml
status:                            # Optional on creation, populated via status subresource
  conditions:                      # Required when status present (min 3 items)
  - type: Healthy
  - type: InService
  - type: NodeCountAsExpected
  lastUpdated: <timestamp>         # Required when status present, cannot decrease
  nodes:                           # Control-plane nodes (0-5, expects 2 for TNF)
  - name: <hostname>               # RFC 1123 subdomain name
    addresses:                     # Required: List of node addresses (1-8 items)
    - type: InternalIP             # Currently only InternalIP is supported
      address: <ip>                # First address used for etcd peer URLs
    conditions:                    # Required: Node-level conditions (min 9 items)
    - type: Healthy
    - type: Online
    - type: InService
    - type: Active
    - type: Ready
    - type: Clean
    - type: Member
    - type: FencingAvailable
    - type: FencingHealthy
    resources:                     # Required: Pacemaker resources on this node (min 2)
    - name: Kubelet                # Both Kubelet and Etcd must be present
      conditions:                  # Required: Resource-level conditions (min 8 items)
      - type: Healthy
      - type: InService
      - type: Managed
      - type: Enabled
      - type: Operational
      - type: Active
      - type: Started
      - type: Schedulable
    - name: Etcd
      conditions: [...]            # Same 8 conditions as Kubelet (abbreviated)
    fencingAgents:                 # Required: Fencing agents for THIS node (1-8)
    - name: <nodename>_<method>    # e.g., "master-0_redfish"
      method: <method>             # Fencing method: redfish, ipmi, fence_aws, etc.
      conditions: [...]            # Same 8 conditions as resources (abbreviated)
```

### Fencing Agents

Fencing agents are STONITH (Shoot The Other Node In The Head) devices used to isolate failed nodes.
Unlike regular pacemaker resources (Kubelet, Etcd), fencing agents are tracked separately because:

1. **Mapping by target, not schedule**: Resources are mapped to the node where they are scheduled to run.
   Fencing agents are mapped to the node they can *fence* (their target), regardless of which node
   their monitoring operations are scheduled on.

2. **Multiple agents per node**: A node can have multiple fencing agents for redundancy
   (e.g., both Redfish and IPMI). Expected: 1 per node; supported: up to 8.

3. **Health tracking via two node-level conditions** (aggregation sketched below):
   - **FencingAvailable**: True if at least one agent is healthy (fencing works), False if all agents are unhealthy (degrades the operator)
   - **FencingHealthy**: True if all agents are healthy (ideal state), False if any agent is unhealthy (emits warning events)
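
The relationship between the two conditions boils down to "any agent healthy" versus "all agents healthy". A minimal, self-contained sketch of that aggregation (the `Agent` type and field names are illustrative, not the actual API types):

```go
package example

// Agent is an illustrative stand-in for a fencing agent's observed health;
// it is not the actual API type.
type Agent struct {
	Name    string
	Healthy bool
}

// fencingConditions derives the two node-level fencing conditions from the
// health of the node's fencing agents:
//   - available: true if at least one agent is healthy (fencing still works)
//   - healthy:   true if every agent is healthy (the ideal state)
func fencingConditions(agents []Agent) (available, healthy bool) {
	if len(agents) == 0 {
		return false, false // no agents at all: fencing is unavailable
	}
	healthy = true
	for _, a := range agents {
		if a.Healthy {
			available = true
		} else {
			healthy = false
		}
	}
	return available, healthy
}
```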

### Cluster-Level Conditions

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Cluster is healthy (`ClusterHealthy`) | Cluster has issues (`ClusterUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `NodeCountAsExpected` | Node count is as expected (`AsExpected`) | Wrong count (`InsufficientNodes`, `ExcessiveNodes`) |

### Node-Level Conditions

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Node is healthy (`NodeHealthy`) | Node has issues (`NodeUnhealthy`) |
| `Online` | Node is online (`Online`) | Node is offline (`Offline`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Active` | Node is active (`Active`) | Node is in standby (`Standby`) |
| `Ready` | Node is ready (`Ready`) | Node is pending (`Pending`) |
| `Clean` | Node is clean (`Clean`) | Node is unclean (`Unclean`) |
| `Member` | Node is a member (`Member`) | Not a member (`NotMember`) |
| `FencingAvailable` | At least one agent healthy (`FencingAvailable`) | All agents unhealthy (`FencingUnavailable`) - degrades operator |
| `FencingHealthy` | All agents healthy (`FencingHealthy`) | Some agents unhealthy (`FencingUnhealthy`) - emits warnings |

### Resource-Level Conditions

Each resource in the `resources` array and each fencing agent in the `fencingAgents` array has its own conditions.

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Resource is healthy (`ResourceHealthy`) | Resource has issues (`ResourceUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Managed` | Managed by pacemaker (`Managed`) | Not managed (`Unmanaged`) |
| `Enabled` | Resource is enabled (`Enabled`) | Resource is disabled (`Disabled`) |
| `Operational` | Resource is operational (`Operational`) | Resource has failed (`Failed`) |
| `Active` | Resource is active (`Active`) | Resource is not active (`Inactive`) |
| `Started` | Resource is started (`Started`) | Resource is stopped (`Stopped`) |
| `Schedulable` | Resource is schedulable (`Schedulable`) | Resource is not schedulable (`Unschedulable`) |

### Validation Rules

**Resource naming:**
- The resource name must be "cluster" (singleton)

**Node name validation** (see the sketch below):
- Must be a lowercase RFC 1123 subdomain name
- Consists of lowercase alphanumeric characters, '-' or '.'
- Must start and end with an alphanumeric character
- Maximum 253 characters
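
These are the standard DNS-1123 subdomain rules, and Kubernetes ships a helper that performs exactly this check. A minimal sketch of how a status collector or test could validate node names (illustrative, not part of the API itself):

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

// validateNodeName checks that a node name is a lowercase RFC 1123 subdomain,
// mirroring the constraints listed above (max 253 chars, lowercase
// alphanumerics, '-' or '.', alphanumeric at both ends).
func validateNodeName(name string) error {
	if errs := validation.IsDNS1123Subdomain(name); len(errs) > 0 {
		return fmt.Errorf("invalid node name %q: %v", name, errs)
	}
	return nil
}
```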

**Node addresses:**
- Uses the `PacemakerNodeAddress` type (similar to `corev1.NodeAddress` but with IP validation)
- Currently only the `InternalIP` type is supported
- Pacemaker allows multiple addresses for Corosync communication between nodes (1-8 addresses)
- The first address in the list is used for IP-based peer URLs for etcd membership
- IP validation (sketched below):
  - Must be a valid global unicast IPv4 or IPv6 address
  - Must be in canonical form (e.g., `192.168.1.1` not `192.168.001.001`, and `2001:db8::1` not `2001:0db8::1`)
  - Excludes loopback, link-local, and multicast addresses
  - Maximum length is 39 characters (a full uncompressed IPv6 address)
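
For illustration only (the real validation lives in the CRD schema, not in client code), those rules roughly correspond to the following check using Go's `net/netip`:

```go
package example

import (
	"fmt"
	"net/netip"
)

// validateNodeIP mirrors the address rules listed above: a canonical-form,
// global unicast IPv4 or IPv6 address that is not loopback, link-local, or
// multicast, and no longer than 39 characters.
func validateNodeIP(s string) error {
	if len(s) > 39 {
		return fmt.Errorf("address %q exceeds 39 characters", s)
	}
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return fmt.Errorf("not a valid IP address: %w", err)
	}
	// Canonical form: re-serializing the parsed address must reproduce the
	// input exactly (e.g. "2001:0db8::1" is rejected in favour of "2001:db8::1").
	if addr.String() != s {
		return fmt.Errorf("address %q is not in canonical form (expected %q)", s, addr.String())
	}
	if !addr.IsGlobalUnicast() || addr.IsLoopback() || addr.IsLinkLocalUnicast() || addr.IsMulticast() {
		return fmt.Errorf("address %q is not a usable global unicast address", s)
	}
	return nil
}
```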

**Timestamp validation** (see the marker sketch below):
- `lastUpdated` is required when status is present
- Once set, it cannot be set to an earlier timestamp (the validation uses `!has(oldSelf.lastUpdated)` to handle initial creation)
- Timestamps must always increase, which prevents stale updates from overwriting newer data
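
In kubebuilder terms, this kind of ratchet is typically expressed as a CEL transition rule on the status struct. A hypothetical sketch using the expression quoted above; the real struct name and marker text may differ:

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// statusSketch illustrates the transition rule described above; it is not the
// real PacemakerClusterStatus type, and the marker text may differ.
//
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.lastUpdated) || self.lastUpdated >= oldSelf.lastUpdated",message="lastUpdated cannot be set to an earlier timestamp"
type statusSketch struct {
	// lastUpdated records when the status collector last refreshed this status.
	// +required
	LastUpdated metav1.Time `json:"lastUpdated"`
}
```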

**Status fields:**
- `status` - Optional on creation (pointer type), populated via the status subresource
- When status is present, all fields within it are required:
  - `conditions` - Required array of cluster conditions (min 3 items)
  - `lastUpdated` - Required timestamp for staleness detection
  - `nodes` - Required array of control-plane node statuses (min 0, max 5; an empty list is allowed for catastrophic failures)

**Node fields (when a node is present):**
- `name` - Required, RFC 1123 subdomain
- `addresses` - Required (min 1, max 8 items)
- `conditions` - Required (min 9 items, with specific types enforced via XValidation)
- `resources` - Required (min 2 items: Kubelet and Etcd)
- `fencingAgents` - Required (min 1, max 8 items; see the marker sketch below)
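
Cardinality bounds like these are typically written as `MinItems`/`MaxItems` kubebuilder markers. A hypothetical sketch of the `fencingAgents` bound, using stub types rather than the real generated ones:

```go
package example

// fencingAgentStub stands in for the real fencing agent status type.
type fencingAgentStub struct {
	Name   string `json:"name"`
	Method string `json:"method"`
}

// nodeSketch illustrates how the 1-8 fencing agent bound above is typically
// expressed; the field and type names are illustrative only.
type nodeSketch struct {
	// fencingAgents lists the agents that can fence this node.
	// +kubebuilder:validation:MinItems=1
	// +kubebuilder:validation:MaxItems=8
	// +required
	FencingAgents []fencingAgentStub `json:"fencingAgents"`
}
```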

**Conditions validation:**
- Cluster-level: MinItems=3 (Healthy, InService, NodeCountAsExpected)
- Node-level: MinItems=9 (Healthy, Online, InService, Active, Ready, Clean, Member, FencingAvailable, FencingHealthy)
- Resource-level: MinItems=8 (Healthy, InService, Managed, Enabled, Operational, Active, Started, Schedulable)
- Fencing agent-level: MinItems=8 (same conditions as resources)

All condition arrays have XValidation rules to ensure specific condition types are present.
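
For example, a rule requiring the three cluster-level condition types could look roughly like the sketch below; the actual rule text in the generated types may differ:

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// conditionsSketch illustrates a CEL rule that requires specific condition
// types to be present; it is not the real type, and the rule text may differ.
type conditionsSketch struct {
	// +kubebuilder:validation:XValidation:rule="self.exists(c, c.type == 'Healthy') && self.exists(c, c.type == 'InService') && self.exists(c, c.type == 'NodeCountAsExpected')",message="conditions must include Healthy, InService, and NodeCountAsExpected"
	// +kubebuilder:validation:MinItems=3
	// +required
	Conditions []metav1.Condition `json:"conditions"`
}
```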

**Resource names:**
- Valid values are: `Kubelet`, `Etcd`
- Both resources must be present in each node's `resources` array

**Fencing agent fields:**
- `name`: The pacemaker resource name (e.g., "master-0_redfish"), max 253 characters
- `method`: The fencing method (e.g., "redfish", "ipmi", "fence_aws"), max 63 characters
- `conditions`: Required, same 8 conditions as resources

### Usage

The cluster-etcd-operator healthcheck controller watches this resource and updates operator conditions based on
the cluster state. The aggregate `Healthy` conditions at each level (cluster, node, resource) provide a quick
way to determine overall health.
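
For example, reading the aggregate cluster-level condition might look roughly like the sketch below; the typed client, the import path, and the use of standard `metav1.Condition` values for the conditions field are assumptions, not copied from the real controller.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	etcdv1alpha1 "github.com/openshift/api/etcd/v1alpha1" // assumed import path
)

// clusterIsHealthy reads the singleton PacemakerCluster and reports whether
// its aggregate cluster-level Healthy condition is True.
func clusterIsHealthy(ctx context.Context, c client.Client) (bool, error) {
	pc := &etcdv1alpha1.PacemakerCluster{}
	if err := c.Get(ctx, client.ObjectKey{Name: "cluster"}, pc); err != nil {
		return false, err
	}
	// Status is optional on creation; treat a missing status as "not healthy yet".
	if pc.Status == nil {
		return false, nil
	}
	cond := meta.FindStatusCondition(pc.Status.Conditions, "Healthy")
	return cond != nil && cond.Status == metav1.ConditionTrue, nil
}
```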