# Autopilot

A tool for updating your k0s controller and worker nodes using specialized plans.
## How it works
- You run autopilot instances alongside the controller and/or worker nodes that you want to have updated.
    - For controllers, autopilot can be run standalone as `root` in order to access the running `k0s` (also typically `root`).
    - For workers, autopilot can be run in privileged pods that have `hostPID=true` configured, or standalone as `root`.
- You create a `Plan` YAML (a minimal sketch follows this list), which involves:
    - Defining the update payload (new version of `k0s`, URLs for platforms, etc.)
    - Adding definitions for all the nodes that should receive the update.
        - Either statically, or dynamically using label/field selectors
- Apply the `Plan`
    - Applying a `Plan` is a simple `kubectl apply` operation.
- Monitor the progress
    - The applied `Plan` provides a status that details the progress.
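
The Configuration section below gives the full field reference; as a quick orientation, a minimal plan might look roughly like the sketch below. The file name, download URL, and label value are placeholders, not part of the autopilot API.

```yaml
# plan.yaml -- a minimal sketch; the URL and label values are placeholders.
# Apply with `kubectl apply -f plan.yaml`, then follow progress with
# `kubectl get plan autopilot -oyaml`.
apiVersion: autopilot.k0sproject.io/v1beta2
kind: Plan
metadata:
  name: autopilot                     # plans must be named "autopilot"
spec:
  commands:
    - k0supdate:
        version: v1.23.3+k0s.1
        platforms:
          linux-amd64:
            url: https://example.com/k0s-v1.23.3+k0s.1-amd64   # placeholder URL
        targets:
          workers:
            discovery:
              selector:
                labels: environment=staging                    # placeholder label
```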
## Safeguards
There are a number of safeguards in place to avoid breaking a cluster.
### Stateless Controllers
- Autopilot controllers are designed to not require heavy state or extensive synchronization: controllers can disappear, and backup controllers can resume the autopilot operations.
### Workers Update Only After Controllers
- The version skew policy between the kubelet and the Kubernetes API server requires that kubelets are never newer than the API server.
- When a `Plan` is applied that has both controller and worker nodes, all of the controller nodes are updated first. Only when all controllers have updated successfully do the worker nodes receive their update instructions.
### Plans are Immutable
- When you apply a `Plan`, autopilot evaluates all of the controllers and workers that should be included in the `Plan`, and tracks them in the status. After this point, no additional changes to the plan (other than status) will be recognized.
    - This helps in largely dynamic worker node environments where nodes that were matched by the `selector` discovery method may no longer exist by the time the update is ready to be scheduled.
### Controller Quorum Safety
- Prior to scheduling a controller update, autopilot queries the API server of all controllers to ensure that they report a successful `/ready`.
- Only once all controllers are `/ready` will the current controller be sent update signaling.
- If any controller reports as non-ready, the `Plan` transitions into an `InconsistentTargets` state, and the `Plan` execution ends.
### Controllers Update Sequentially
- Although the configuration exposes a concurrency option for controller targets, only one controller will be updated at a time.
### Update Payload Verification
- Each `update` object payload can provide an optional `sha256` hash of the update content (specified in `url`), which is compared against the update content after it downloads.
## Configuration
Autopilot relies on a `Plan` object for its instructions on what to update.

Here is an example autopilot plan:
```yaml
apiVersion: autopilot.k0sproject.io/v1beta2
kind: Plan
metadata:
  name: autopilot
spec:
  id: id1234
  timestamp: now
  commands:
    - k0supdate:
        version: v1.23.3+k0s.1
        platforms:
          linux-amd64:
            url: https://github.com/k0sproject/k0s/releases/download/v1.23.3%2Bk0s.1/k0s-v1.23.3+k0s.1-amd64
            sha256: 0cd1f7c49ef81e18d3873a77ccabb5e4095db1c3647ca3fa8fc3eb16566e204e
        targets:
          controllers:
            discovery:
              static:
                nodes:
                  - ip-172-31-44-131
                  - ip-172-31-42-134
                  - ip-172-31-39-65
          workers:
            limits:
              concurrent: 5
            discovery:
              selector:
                labels: environment=staging
                fields: metadata.name=worker2
```
### Core Fields
#### `apiVersion <string>` (required)

- The current version of the autopilot API is `v1beta2`, with a full group-version of `autopilot.k0sproject.io/v1beta2`.
#### `metadata.name <string>` (required)

- The name of the plan should always be `autopilot`.
    - Note: Plans will not execute if they don't follow this convention.
### Spec Fields
#### `spec.id <string>` (optional)

- An identifier that can be provided by the creator for informational and tracking purposes.
#### `spec.timestamp <string>` (optional)

- A timestamp value that can be provided by the creator for informational purposes. Autopilot does nothing with this information.
#### `spec.commands[]` (required)

- The `commands` array contains all of the commands that should be performed as a part of the plan.
### `k0supdate` Command
#### `spec.commands[].k0supdate.version <string>` (required)

- The version of the binary being updated. This version is used to compare against the installed version before and after the update to ensure success.
#### `spec.commands[].k0supdate.platforms.*.url <string>` (required)

- A URL for where the updated binary should be downloaded from, for this specific platform.
    - The naming of platforms is a combination of `$GOOS` and `$GOARCH`, separated by a hyphen (`-`).
        - eg: `linux-amd64`, `linux-arm64`, `linux-arm`
    - Note: The main supported platform is `linux`. Autopilot may work on other platforms, however this has not been tested.
#### `spec.commands[].k0supdate.platforms.*.sha256 <string>` (optional)

- If a SHA256 hash is provided for the binary, the completed download will be verified against it.
#### `spec.commands[].k0supdate.targets.controllers <object>` (optional)

- This object provides the details of how `controllers` should be updated.
#### `spec.commands[].k0supdate.targets.controllers.limits.concurrent <int>` (fixed as 1)

- The configuration allows for specifying the number of concurrent controller updates through the plan spec, however for controller targets this is always fixed to `1`.
- By ensuring that only one controller updates at a time, we aim to avoid scenarios where quorum may be disrupted.
#### `spec.commands[].k0supdate.targets.workers <object>` (optional)

- This object provides the details of how `workers` should be updated.
#### `spec.commands[].k0supdate.targets.workers.limits.concurrent <int>` (optional, default = 1)

- Specifying a `concurrent` value for worker targets allows that number of workers to be updated at a time. If no value is provided, `1` is assumed.
### `airgapupdate` Command
#### `spec.commands[].airgapupdate.version <string>` (required)

- The version of the airgap bundle being updated.
#### `spec.commands[].airgapupdate.platforms.*.url <string>` (required)

- A URL for where the updated airgap bundle should be downloaded from, for this specific platform.
    - The naming of platforms is a combination of `$GOOS` and `$GOARCH`, separated by a hyphen (`-`).
        - eg: `linux-amd64`, `linux-arm64`, `linux-arm`
    - Note: The main supported platform is `linux`. Autopilot may work on other platforms, however this has not been tested.
#### `spec.commands[].airgapupdate.platforms.*.sha256 <string>` (optional)

- If a SHA256 hash is provided for the airgap bundle, the completed download will be verified against it.
#### `spec.commands[].airgapupdate.targets.workers <object>` (optional)

- This object provides the details of how `workers` should be updated.
#### `spec.commands[].airgapupdate.targets.workers.limits.concurrent <int>` (optional, default = 1)

- Specifying a `concurrent` value for worker targets allows that number of workers to be updated at a time. If no value is provided, `1` is assumed.
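
The example plan in the Configuration section only includes a `k0supdate` command. Since `spec.commands[]` is an array, the two command types can presumably be combined in a single plan, for example to ship a new airgap bundle to workers alongside the `k0s` update itself. Below is a rough sketch using only the fields documented above; the URLs are placeholders.

```yaml
# Sketch only: a spec combining an airgapupdate and a k0supdate command.
# The URLs are placeholders; an empty selector matches all nodes of a target type.
spec:
  commands:
    - airgapupdate:
        version: v1.23.3+k0s.1
        platforms:
          linux-amd64:
            url: https://example.com/k0s-airgap-bundle-v1.23.3+k0s.1-amd64   # placeholder
        targets:
          workers:
            limits:
              concurrent: 2
            discovery:
              selector:
                labels: environment=staging
    - k0supdate:
        version: v1.23.3+k0s.1
        platforms:
          linux-amd64:
            url: https://example.com/k0s-v1.23.3+k0s.1-amd64                 # placeholder
        targets:
          controllers:
            discovery:
              selector: {}
          workers:
            discovery:
              selector:
                labels: environment=staging
```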
### Static Discovery

This defines the `static` discovery method used for this set of targets (`controllers`, `workers`). The `static` discovery method relies on a fixed set of hostnames defined in `.nodes`.

It is expected that a `Node` (workers) or `ControlNode` (controllers) object exists with the same name.
```yaml
static:
  nodes:
    - ip-172-31-44-131
    - ip-172-31-42-134
    - ip-172-31-39-65
```
#### `spec.commands[].k0supdate.targets.*.discovery.static.nodes[] <string>` (required for static)

- A list of hostnames that should be included in the target set (`controllers`, `workers`).
### Selector Target Discovery

The `selector` target discovery method relies on a dynamic query to the Kubernetes API using labels and fields to produce a set of hosts that should be updated.

Providing both `labels` and `fields` in the `selector` definition will result in a logical `AND` of both operands.
```yaml
selector:
  labels: environment=staging
  fields: metadata.name=worker2
```
Specifying an empty selector will result in all nodes being selected for this target set.
```yaml
selector: {}
```
#### `spec.commands[].k0supdate.targets.*.discovery.selector.labels <string>` (optional)

- A collection of name/value labels that should be used for finding the appropriate nodes for the update of this target set.
#### `spec.commands[].k0supdate.targets.*.discovery.selector.fields <string>` (optional)

- A collection of name/value fields that should be used for finding the appropriate nodes for the update of this target set.
    - Note: Currently only the field `metadata.name` is available as a query field.
## Status Reporting

After a `Plan` has been applied, its progress can be viewed in the `.status` of the autopilot `Plan`:

```shell
kubectl get plan autopilot -oyaml
```
An example of a `Plan` status:

```yaml
status:
  state: SchedulableWait
  commands:
    - state: SchedulableWait
      k0supdate:
        controllers:
          - lastUpdatedTimestamp: "2022-04-07T15:52:44Z"
            name: controller0
            state: SignalCompleted
          - lastUpdatedTimestamp: "2022-04-07T15:52:24Z"
            name: controller1
            state: SignalCompleted
          - lastUpdatedTimestamp: "2022-04-07T15:52:24Z"
            name: controller2
            state: SignalPending
        workers:
          - lastUpdatedTimestamp: "2022-04-07T15:52:24Z"
            name: worker0
            state: SignalPending
          - lastUpdatedTimestamp: "2022-04-07T15:52:24Z"
            name: worker1
            state: SignalPending
          - lastUpdatedTimestamp: "2022-04-07T15:52:24Z"
            name: worker2
            state: SignalPending
```
This status indicates that:

- The overall status of the update is `SchedulableWait`, meaning that autopilot is waiting for the next opportunity to process a command.
- There are three controller nodes:
    - Two controllers have `SignalCompleted` successfully.
    - One is waiting to be signalled (`SignalPending`).
- There are also three worker nodes:
    - All are awaiting signaling updates (`SignalPending`).
### Plan Status

The `Plan` status at `.status.state` represents the overall status of the autopilot update operation. There are a number of statuses available:
| Status | Description | Ends Plan? |
| ------ | ----------- | ---------- |
| `IncompleteTargets` | There are nodes in the resolved `Plan` that do not have associated `Node` (worker) or `ControlNode` (controller) objects. | Yes |
| `InconsistentTargets` | A controller has reported itself as not-ready during the selection of the next controller to update. | Yes |
| `Schedulable` | Indicates that the `Plan` can be re-evaluated to determine which node to update next. | No |
| `SchedulableWait` | Scheduling operations are in progress, and no further update scheduling should occur. | No |
| `Completed` | The `Plan` has run successfully to completion. | Yes |
| `Restricted` | The `Plan` included node types (controller or worker) that violate the `--exclude-from-plans` restrictions. | Yes |
### Node Status

Similar to the Plan status, the individual nodes can have their own statuses:

| Status | Description |
| ------ | ----------- |
| `SignalPending` | The node is available and awaiting an update signal. |
| `SignalSent` | Update signaling has been successfully applied to this node. |
| `MissingPlatform` | This node is a platform that an update has not been provided for. |
| `MissingSignalNode` | This node does not have an associated `Node` (worker) or `ControlNode` (controller) object. |
## FAQ
### Q: How do I apply the `Plan` and `ControlNode` CRDs?

A: These CRD definitions are embedded in the autopilot binary and applied on startup. No additional action is needed.
### Q: How will `ControlNode` instances get removed?

A: `ControlNode` instances are created by autopilot controllers as they start up. When controllers disappear, they will not remove their associated `ControlNode` instance. It is the responsibility of the operator/administrator to ensure their maintenance.
### Q: I upgraded my workers, and now Kubelets are no longer reporting

A: You probably upgraded your workers to an API version greater than what is available on the API server; see the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/). Make sure that your controllers are at the desired version before upgrading workers.
### Q: If running workers in a Pod, what minimum RBAC access is needed?

A: The following RBAC entries are required for running autopilot workers in Pods:
| API Group | Resource | Verbs |
| --------- | -------- | ----- |
| `autopilot.k0sproject.io` | `plans` | `list`, `watch` |
| `autopilot.k0sproject.io` | `plans/status` | `update` |
| `apps` | `daemonsets` | `get` |
| (empty) | `nodes` | `list`, `patch`, `update`, `watch` |
| (empty) | `pods` | `list` |
| (empty) | `pods/eviction` | `create` |
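
For reference, the table above could be expressed roughly as the following `ClusterRole` sketch. The role name is a placeholder, and you would still need to bind it (for example with a `ClusterRoleBinding`) to the service account used by the worker pods.

```yaml
# Sketch only: a ClusterRole mirroring the table above. The name is a
# placeholder; bind it to the worker pods' service account separately.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: autopilot-worker            # placeholder name
rules:
  - apiGroups: ["autopilot.k0sproject.io"]
    resources: ["plans"]
    verbs: ["list", "watch"]
  - apiGroups: ["autopilot.k0sproject.io"]
    resources: ["plans/status"]
    verbs: ["update"]
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["get"]
  - apiGroups: [""]                 # the core API group ("(empty)" in the table)
    resources: ["nodes"]
    verbs: ["list", "patch", "update", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
```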