Upgrade from v1.5.x to v1.6.0
General Information
An Upgrade button appears on the Dashboard screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see Start an upgrade.
Clusters running v1.5.0 and v1.5.1 can be upgraded to v1.6.x directly because Harvester allows a maximum of one minor version upgrade for underlying components: Harvester v1.5.0, v1.5.1, and v1.5.2 all use the same minor version of RKE2 (v1.32), while Harvester v1.6.0 uses the next minor version (v1.33). For more information, see Upgrade paths.
You need to install v1.5.2 only if you are affected by the issues listed in the Bug Fixes section of the release notes.
For information about upgrading Harvester in air-gapped environments, see Prepare an air-gapped upgrade.
Update Harvester UI Extension on Rancher v2.12.0
You must use v1.6.0 of the Harvester UI Extension to import Harvester v1.6.0 clusters on Rancher v2.12.0.
1. On the Rancher UI, go to local > Apps > Repositories.
2. Locate the repository named harvester, and then select ⋮ > Refresh.
3. Go to the Extensions screen.
4. Locate the extension named Harvester, and then click Update.
5. Select version 1.6.0, and then click Update.
6. Allow some time for the extension to be updated, and then refresh the screen.
Known Issues
1. Upgrade is Stuck in the "Pre-drained" State
In certain situations, the Longhorn Instance Manager might fail to clean up an engine instance, even after the state of the engine CR has changed to "Stopped". The upgrade process becomes stuck in the "Pre-drained" state because the instance-manager pod cannot be deleted while the corresponding PodDisruptionBudget (PDB) still exists.
The workaround is to delete the instance-manager PDB after ensuring that all volumes are healthy.
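The following is a minimal sketch of the workaround, assuming kubectl access to the longhorn-system namespace; the PDB name is a placeholder and must match the instance-manager PDB reported in your cluster.

```
# Confirm that all Longhorn volumes report a healthy state before removing the PDB.
kubectl get volumes.longhorn.io -n longhorn-system

# List the PodDisruptionBudgets and identify the one protecting the stuck
# instance-manager pod.
kubectl get pdb -n longhorn-system

# Delete the blocking PDB (placeholder name) so the node drain can proceed.
kubectl delete pdb <instance-manager-pdb-name> -n longhorn-system
```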
Related issues: #8977 and #11605
2. Guest Cluster is Stuck in the "Updating" State
An RKE2 guest cluster may become stuck in the "Updating" state after Harvester is upgraded. The following error message is displayed on the Harvester UI:
```
Configuring etcd node(s) rke2-pool1-xdvfc-qf4vb: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown, waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager
```
The issue occurs when the guest node's IP address changes after the upgrade, causing etcd to malfunction. It is likely that the underlying virtual machine was rebooted several times and received a new IP address from the DHCP server.
To address the issue, perform the following steps:
- On the Rancher UI, delete the error-causing node from the guest cluster.
- On the Harvester UI, check the status of the underlying virtual machine.
- If necessary, restart the virtual machine.
The virtual machine is removed, and the guest cluster attempts to create a new node. Once the node is created, the status of the guest cluster changes to "Active".
Related issue: #8950
3. Stopped Virtual Machine is Stuck in the "Starting" State
A Longhorn volume can flap between the "Detaching" and "Detached" states after a live migration. Because the volume is not ready, the associated virtual machine is unable to fully start.
The workaround is to clear the Longhorn volume's status.currentMigrationNodeID field using the following command:
```
kubectl patch -n longhorn-system volume <volume> \
  --type=merge \
  --subresource status \
  -p '{"status":{"currentMigrationNodeID":""}}'
```
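As a follow-up check (a sketch, not part of the documented workaround), you can confirm that the field was cleared before starting the virtual machine again:

```
# Prints an empty value once the migration node ID has been cleared.
kubectl get volume <volume> -n longhorn-system \
  -o jsonpath='{.status.currentMigrationNodeID}'
```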
Related issues: #8949 and #11479
4. Nodes Stuck in "Waiting Reboot" State Due to Network Setup Error
Nodes may become stuck in the "Waiting Reboot" state during an upgrade if the following criteria are met:
- Harvester v1.2.1 or an earlier version was initially installed, and the nodes were upgraded incrementally.
- The vlan_id field in the install.management_interface setting is either set to 1 or is empty.
The issue occurs because of a network setup error, as indicated by the message yaml: line did not find expected key in the node logs.
During the upgrade, the /oem/90_custom.yaml file is updated to reflect changes in the behavior of v1.5.x, which added VLANs 2–4094 to mgmt-br and mgmt-bo. Two scripts in that file (/etc/wicked/scripts/setup_bond.sh and /etc/wicked/scripts/setup_bridge.sh) may be truncated by a sed operation if they use the format generated by gopkg.in/yaml.v2, which was used in the installer of Harvester versions earlier than v1.2.2. The sed operation removes the line bridge vlan add vid 2-4094 dev $INTERFACE. This truncation issue does not affect scripts that use the format generated by gopkg.in/yaml.v3.
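As a quick check (a sketch, assuming shell access to the node), you can grep /oem/90_custom.yaml for the line that the sed operation removes; on an affected node the pattern is missing from the embedded scripts:

```
# Prints matching lines with line numbers; no output indicates a truncated script.
grep -n 'bridge vlan add vid 2-4094' /oem/90_custom.yaml
```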
Content of /etc/wicked/scripts/setup_bond.sh within the /oem/90_custom.yaml file generated from gopkg.in/yaml.v2:

```
"#!/bin/sh\n\nACTION=$1\nINTERFACE=$2\n\ncase $ACTION in\n\tpost-up)\n\t\t#
inherit MAC address\n\t\tip link set dev mgmt-br address $(ip -json link show
dev $INTERFACE | jq -j '.[0][\"address\"]')\n\n\t\t# accept all vlan, PVID=1
by default\n\t\tbridge vlan add vid 2-4094 dev $INTERFACE\n\t\t;;\n\nesac\n"
```
Content of /etc/wicked/scripts/setup_bond.sh within the /oem/90_custom.yaml file generated from gopkg.in/yaml.v3:

```
#!/bin/sh

ACTION=$1
INTERFACE=$2

case $ACTION in
    post-up)
        # inherit MAC address
        ip link set dev mgmt-br address $(ip -json link show dev $INTERFACE | jq -j '.[0]["address"]')

        #accept all vlan,PVID=1 by default
        bridge vlan add vid 2-4094 dev $INTERFACE
        ;;

esac
```
Content of /etc/wicked/scripts/setup_bridge.sh within the /oem/90_custom.yaml file generated from gopkg.in/yaml.v2:

```
"#!/bin/sh\n\nACTION=$1\nINTERFACE=$2\n\ncase $ACTION in\n\tpre-up)\n\t\t#
enable vlan-aware\n\t\tip link set dev $INTERFACE type bridge vlan_filtering 1\n\t\t\t;;\n\n\tpost-up)\n\t\t#
accept all vlan, PVID=1 by default\n\t\tbridge vlan add vid 2-4094 dev $INTERFACE
self\n\t\tbridge vlan add vid 2-4094 dev mgmt-bo\n\t\t;;\n\nesac\n"
```
Content of /etc/wicked/scripts/setup_bridge.sh within the /oem/90_custom.yaml file generated from gopkg.in/yaml.v3:

```
#!/bin/sh

ACTION=$1
INTERFACE=$2

case $ACTION in
    pre-up)
        #enable vlan-aware
        ip link set $INTERFACE type bridge vlan_filtering 1
        ;;

    post-up)
        #accept all vlan, PVID=1 by default
        bridge vlan add vid 2-4094 dev $INTERFACE self
        bridge vlan add vid 2-4094 dev mgmt-bo
        ;;

esac
```
The workaround is to replace the truncated contents of /etc/wicked/scripts/setup_bond.sh and /etc/wicked/scripts/setup_bridge.sh in the /oem/90_custom.yaml file with the gopkg.in/yaml.v3-format contents shown above. Once the file is updated, the upgrade process should resume.
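For illustration only, the following sketch shows how the restored setup_bond.sh entry might look inside /oem/90_custom.yaml, assuming the file uses the same stages/files layout as the example in the next section; the stage layout and permissions value are illustrative, and only the script body is taken from the gopkg.in/yaml.v3 content above.

```
stages:
  initramfs:
    - files:
        - path: /etc/wicked/scripts/setup_bond.sh
          permissions: 493   # illustrative value (0755 in decimal)
          content: |
            #!/bin/sh

            ACTION=$1
            INTERFACE=$2

            case $ACTION in
                post-up)
                    # inherit MAC address
                    ip link set dev mgmt-br address $(ip -json link show dev $INTERFACE | jq -j '.[0]["address"]')

                    #accept all vlan,PVID=1 by default
                    bridge vlan add vid 2-4094 dev $INTERFACE
                    ;;

            esac
```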
Related issue: #9033
5. Network Connectivity Lost on Secondary VLAN Interfaces on the mgmt Cluster Network
In v1.6.0, a feature change was introduced to attach only the required VLAN interfaces to mgmt-br and mgmt-bo, instead of all secondary VLANs. This is intended behavior that reduces unnecessary VLAN provisioning. Because of this change, all secondary VLAN interfaces previously attached to the mgmt-br bridge and the mgmt-bo interface are removed from the management hosts after the cluster is upgraded to v1.6.x.
Workloads that rely on these interfaces lose network connectivity.
For more information, see issue #7650.
After upgrading to v1.6.x, perform the following steps:
1. Verify the VLANs attached to mgmt-br and mgmt-bo by running the following command on the management hosts:

   ```
   bridge vlan show
   ```

   The output shows only the primary VLAN for mgmt-br and mgmt-bo.

2. Manually add the required secondary VLANs to the mgmt-br bridge and the mgmt-bo interface by adding the following commands to the /oem/90_custom.yaml file (see the sketch after these steps).

   In the /etc/wicked/scripts/setup_bond.sh section:

   ```
   bridge vlan add vid <vlan-id> dev $INTERFACE
   ```

   In the /etc/wicked/scripts/setup_bridge.sh section:

   ```
   bridge vlan add vid <vlan-id> dev $INTERFACE self
   bridge vlan add vid <vlan-id> dev mgmt-bo
   ```

   Important: You must include a separate command for each distinct VLAN ID. Ensure that the vlan-id placeholder is replaced with the actual ID.

3. Once the /oem/90_custom.yaml file is updated, reboot the management hosts.

4. Verify that all the required VLANs were added by running the following command on the hosts:

   ```
   bridge vlan show
   ```
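For illustration only, assuming a single secondary VLAN with ID 2113 (the ID used in the scenario example below), the commands added to the two script sections would look like this sketch:

```
# /etc/wicked/scripts/setup_bond.sh section: attach VLAN 2113 to the bond
bridge vlan add vid 2113 dev $INTERFACE

# /etc/wicked/scripts/setup_bridge.sh section: attach VLAN 2113 to the bridge and the bond
bridge vlan add vid 2113 dev $INTERFACE self
bridge vlan add vid 2113 dev mgmt-bo
```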
Upgrade Scenario Example
In the following example, a v1.5.x cluster was initially installed with a primary VLAN interface (VLAN ID: 2021). To add a secondary VLAN interface (VLAN ID: 2113), the /oem/99_vlan-ifcfg.yaml file was created on the management hosts with the following contents:
```
stages:
  initramfs:
    - name: "Host VLAN interface mgmt-br.353"
      files:
        - path: /etc/sysconfig/network/ifcfg-mgmt-br.2113
          owner: 0
          group: 0
          permissions: 384
          content: |
            STARTMODE='onboot'
            BOOTPROTO='static'
            IPADDR='10.255.113.150/24'
            VLAN_ID='2113'
            ETHERDEVICE='mgmt-br'
            VLAN='yes'
            DEFROUTE='no'
```
The typical expectation is that an additional VLAN sub-interface is created on the mgmt interface (mgmt-br.2113) and assigned an IPv4 address. In addition, this sub-interface and the primary interface (mgmt-br.2021) are both expected to be used for L3 connectivity after the cluster is upgraded to v1.6.x.
After the upgrade to v1.6.0, however, the VLAN sub-interface is created but the secondary VLAN (VLAN ID: 2113) is removed from the mgmt-br bridge and the mgmt-bo interface. After a reboot, only the primary VLAN ID is assigned to the mgmt-br bridge and the mgmt-bo interface (using the /oem/90_custom.yaml file).
To mitigate the effects of this change, you must perform the workaround described in the previous section. This involves identifying secondary VLAN interfaces and then adding the necessary ones to the mgmt-br bridge and the mgmt-bo interface.
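As a brief sketch for confirming this behavior on a management host (assuming the scenario's interface names and VLAN IDs):

```
# The VLAN sub-interface still exists and keeps its address...
ip -d addr show dev mgmt-br.2113

# ...but until the workaround is applied, only the primary VLAN (2021) is
# listed for the bridge and the bond.
bridge vlan show dev mgmt-br
bridge vlan show dev mgmt-bo
```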