Upgrade from v1.4.0 to v1.4.1
General information
An Upgrade button appears on the Dashboard screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see Start an upgrade.
For air-gapped environments, see Prepare an air-gapped upgrade.
Known issues
1. Upgrade is stuck in the "Pre-drained" state
The upgrade process may become stuck in the "Pre-drained" state. Kubernetes is supposed to drain the workload on the node, but some factors may cause the process to stall.
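To find the node on which the upgrade is stuck, you can list the cluster nodes; a node that is being drained stays cordoned, so its status includes SchedulingDisabled. A quick check, assuming kubectl access to the cluster:

```bash
# The stuck node remains cordoned, e.g. "Ready,SchedulingDisabled".
kubectl get nodes
```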
A possible cause is processes related to orphaned engines of the Longhorn Instance Manager. To determine whether this applies to your situation, perform the following steps; a script that runs the same checks in one pass is sketched after the list.
1. Check the name of the `instance-manager` pod on the stuck node.

   Example: The stuck node is `harvester-node-1`, and the name of the Instance Manager pod is `instance-manager-d80e13f520e7b952f4b7593fc1883e2a`.

   ```
   $ kubectl get pods -n longhorn-system --field-selector spec.nodeName=harvester-node-1 | grep instance-manager
   instance-manager-d80e13f520e7b952f4b7593fc1883e2a   1/1   Running   0   3d8h
   ```

2. Check the Longhorn Manager logs for informational messages.

   Example:

   ```
   $ kubectl -n longhorn-system logs daemonsets/longhorn-manager
   ...
   time="2025-01-14T00:00:01Z" level=info msg="Node instance-manager-d80e13f520e7b952f4b7593fc1883e2a is marked unschedulable but removing harvester-node-1 PDB is blocked: some volumes are still attached InstanceEngines count 1 pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0" func="controller.(*InstanceManagerController).syncInstanceManagerPDB" file="instance_manager_controller.go:823" controller=longhorn-instance-manager node=harvester-node-1
   ```

   In this example, the `instance-manager` pod cannot be drained because of the engine `pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0`.

3. Check if the engine is still running on the stuck node.

   Example:

   ```
   $ kubectl -n longhorn-system get engines.longhorn.io pvc-9ae0e9a5-a630-4f0c-98cc-b14893c74f9e-e-0 -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}'
   Current state: stopped
   Node ID:
   ```

   The issue likely exists if the output shows that the engine is not running, or if the engine cannot be found at all.

4. Check if all volumes are healthy.

   ```
   kubectl get volumes -n longhorn-system -o yaml | yq '.items[] | select(.status.state == "attached") | .status.robustness'
   ```

   All attached volumes must be marked `healthy`. If any volume is not, report the issue instead of proceeding to the next step.

5. Remove the `instance-manager` pod's PodDisruptionBudget (PDB).

   Example:

   ```
   kubectl delete pdb instance-manager-d80e13f520e7b952f4b7593fc1883e2a -n longhorn-system
   ```
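The checks above can also be run in a single pass. The following is a minimal sketch, assuming kubectl (with cluster access) and yq are available; the node name, and optionally the engine name reported in the Longhorn Manager logs, are passed as arguments. The PDB deletion from step 5 is intentionally left as a commented-out manual step so it is not run before the volume health check passes.

```bash
#!/usr/bin/env bash
# Sketch: run the "Pre-drained" diagnostics from the steps above in one pass.
# Usage: ./check-predrained.sh <node-name> [engine-name]
set -euo pipefail

NODE="${1:?usage: $0 <node-name> [engine-name]}"
ENGINE="${2:-}"

# Step 1: instance-manager pods on the stuck node.
echo "--- instance-manager pods on $NODE ---"
kubectl get pods -n longhorn-system \
  --field-selector spec.nodeName="$NODE" | grep instance-manager || true

# Step 2: log lines showing that removing the node's PDB is blocked.
echo "--- blocking-PDB log lines ---"
kubectl -n longhorn-system logs daemonsets/longhorn-manager --tail=2000 \
  | grep "removing $NODE PDB is blocked" || echo "none found"

# Step 3: state of the blocking engine, if one was named in the logs.
if [ -n "$ENGINE" ]; then
  echo "--- engine $ENGINE ---"
  kubectl -n longhorn-system get engines.longhorn.io "$ENGINE" \
    -o jsonpath='{"Current state: "}{.status.currentState}{"\nNode ID: "}{.spec.nodeID}{"\n"}' \
    || echo "engine not found"
fi

# Step 4: robustness of every attached volume; all must be "healthy".
echo "--- attached volume robustness ---"
kubectl get volumes -n longhorn-system -o yaml \
  | yq '.items[] | select(.status.state == "attached") | .status.robustness'

# Step 5 is intentionally manual: only delete the PDB once every attached
# volume above is healthy, e.g.
#   kubectl delete pdb <instance-manager-pod-name> -n longhorn-system
```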
Related issues:
- [BUG] v1.4.0 -> v1.4.1-rc1 upgrade stuck in Pre-drained and the node stay in Cordoned
- [IMPROVEMENT] Cleanup orphaned volume runtime resources if the resources already deleted
2. Upgrade with default StorageClass that is not harvester-longhorn
Harvester adds the annotation `storageclass.kubernetes.io/is-default-class: "true"` to `harvester-longhorn`, which is the original default StorageClass. When you replace `harvester-longhorn` with another default StorageClass, the following issues occur:

- The Harvester ManagedChart shows the error message `cannot patch "harvester-longhorn" with kind StorageClass: admission webhook "validator.harvesterhci.io" denied the request: default storage class %!s(MISSING) already exists, please reset it first`.
- The webhook denies the upgrade request.
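Before choosing a workaround, it can help to see which StorageClasses currently carry the default annotation; kubectl marks them with a `(default)` suffix:

```bash
# Classes annotated storageclass.kubernetes.io/is-default-class: "true"
# are shown with "(default)" after their name.
kubectl get storageclass
```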
You can perform any of the following workarounds (non-interactive equivalents are sketched after the list):

- Set `harvester-longhorn` as the default StorageClass.
- Add `spec.values.storageClass.defaultStorageClass: false` to the `harvester` ManagedChart.

  ```
  kubectl edit managedchart harvester -n fleet-local
  ```

- Add `timeoutSeconds: 600` to the Harvester ManagedChart spec.

  ```
  kubectl edit managedchart harvester -n fleet-local
  ```
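If you prefer non-interactive commands over `kubectl edit`, the same changes can be applied with `kubectl annotate` and `kubectl patch`. A sketch, in which `other-storageclass` is a placeholder for your replacement class and the field paths come from the workarounds above:

```bash
# Workaround 1: make harvester-longhorn the default again and remove the
# default annotation from the replacement class (placeholder name).
kubectl annotate storageclass harvester-longhorn \
  storageclass.kubernetes.io/is-default-class="true" --overwrite
kubectl annotate storageclass other-storageclass \
  storageclass.kubernetes.io/is-default-class-

# Workaround 2: stop the harvester chart from managing a default class.
kubectl patch managedchart harvester -n fleet-local --type merge \
  -p '{"spec":{"values":{"storageClass":{"defaultStorageClass":false}}}}'

# Workaround 3: give the chart operation a longer timeout.
kubectl patch managedchart harvester -n fleet-local --type merge \
  -p '{"spec":{"timeoutSeconds":600}}'
```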
For more information, see Issue #7375.