This document contains the steps required to configure high availability (HA) on a 128T router. Traditional high availability involves deploying two separate routers and using a protocol such as VRRP or HSRP to provide failover protection. The 128T instead deploys software instances (referred to as "nodes") in pairs, and each pair is collectively treated as a single, logical router.
Configuring high availability requires that two 128T routing nodes have at least one device-interface that is shared between them (referred to in this document as a shared interface). Shared interfaces are configured on both nodes, but are active on only one node at a time. These shared interfaces must be in the same L2 broadcast domain; this is because the 128T uses gratuitous ARP messages to announce an interface failover, so that it may receive packets in place of its counterpart.
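To make the gratuitous ARP mechanism concrete, the following minimal sketch builds the bytes of a generic gratuitous ARP frame. It illustrates the standard frame layout only and is not the 128T's implementation: the source does not say whether the 128T uses the request or the reply form, so the request form (opcode 1) is an assumption here, and the MAC and IP addresses are made-up example values.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: broadcast destination, source MAC, EtherType 0x0806 (ARP).
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request; an assumption).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes    # sender MAC and sender IP
    arp += b"\x00" * 6 + ip_bytes  # target MAC unset, target IP equals sender IP
    return ethernet + arp


# Example values only: a locally administered MAC and a documentation-range IP.
frame = build_gratuitous_arp("02:00:5e:10:00:01", "192.0.2.1")
print(len(frame), frame.hex())
```

Because the frame is sent to the Ethernet broadcast address, only devices in the same L2 broadcast domain receive it, which is why the shared interfaces must reside in one broadcast domain.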
The two 128T router nodes that make up a high availability pair must be collocated due to latency sensitivities for the information that they synchronize between themselves.
Before You Begin
There are several things to be mindful of before configuring HA; the two nodes must be informed that they are part of a high availability set, and they must have a dedicated interface between themselves for synchronizing state information about active sessions. These steps will be covered in this section.
Configuration Change Operations
For High Availability (HA) configurations, all configuration edit and commit operations must be performed on only one node. A severe but very rare race condition may occur if changes to the configuration are made concurrently on both nodes of the HA router or conductor; when it does, the generated configuration is lost upon commit. This applies to all methods of editing: PCLI, Web GUI, or external APIs (NETCONF or REST).
To avoid this situation, designate one node as the edit node. For example, in an HA configuration with two nodes, DataCenter Node A and DataCenter Node B, designate DataCenter Node A as the edit node. All configuration changes are made on Node A. All other commands can be run on Node B, but no configuration changes or commits, whether from the GUI, PCLI, or an API, are made from Node B.
This issue is resolved in release 5.3.0, but affects all prior releases.
Because highly available nodes synchronize time-series data, it is critical that the two nodes that comprise an HA pair have synchronized clocks. It is sufficient to manually synchronize the clocks until 128T software is installed, after which point NTP (Network Time Protocol) can be used to automatically synchronize the clocks.
Within the 128T configuration, you should configure NTP servers within authority > router > system > ntp.
To confirm that you have NTP configured, use the command show config running as shown here:
To confirm that NTP is synchronized, use the show ntp command and confirm that at least one NTP server is in the active state (some columns have been removed for display purposes):
Migrating from Standalone to HA
For an established standalone router of one node, converting it to be highly available requires configuring a second node within the 128T configuration (PCLI or GUI) at the outset.
Converting an existing router from standalone to HA will require downtime, and is therefore only to be undertaken during a maintenance window, as applicable.
Adding a second node is simply a matter of configuring another node container within the router. Eventually, this node will contain one or more shared interfaces, which will protect the router if an interface, a link, or a node fails. Configuring shared interfaces is covered later in this document.
Follow the steps in Non-forwarding HA Interfaces to provision an interface that connects the peer 128T nodes.
Configuring the Shared Interface(s)
A highly available router is composed of exactly two routing nodes within the same router container. (Two routers, each composed of one node, cannot be made highly available.) Additionally, as mentioned previously, these nodes must have at least one shared interface in common.
Configuring the basic properties of the two nodes is described elsewhere in this documentation. For high availability, the crucial step is identifying to the 128T the interfaces that are to be shared between them. This is done by establishing a common Layer 2 address, known as a MAC address, that is maintained by the active node in the pair. (I.e., when node1 of the pair has active control over the interface, it will respond to ARP requests for the addresses on that interface with the shared MAC address, whereas node2 will not.) The configuration element for this MAC address is the shared-phys-address, within the device-interface element.
The shared-phys-address is simply a series of six octets; the only requirement is that it is unique on a given broadcast domain. (The 128T Conductor also enforces that the shared-phys-address be unique among all routers within an Authority.) There are no hard-and-fast rules for creating "globally unique" MAC addresses; there are, however, many websites available that will generate random values. Again, since these MAC addresses are only used on a broadcast domain, they do not need to be globally unique to suit the 128T router's needs. Irrespective of how you choose to generate the value, the shared-phys-address is configured using the format "00:00:00:00:00:00".
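As an alternative to a website, a random value can be generated locally. The sketch below is one possible approach, not a 128T requirement: it sets the "locally administered" bit and clears the multicast bit in the first octet, which is the general IEEE 802 convention for self-assigned unicast MAC addresses. The function name is hypothetical, and you must still confirm that the result is unique on the broadcast domain.

```python
import secrets


def random_shared_phys_address() -> str:
    """Return a random, locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    # Format as six colon-separated octets, matching 00:00:00:00:00:00.
    return ":".join(f"{octet:02x}" for octet in octets)


print(random_shared_phys_address())
```

Configure the same generated value on the paired device-interface of each node.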
Configuring the same shared-phys-address on two different interfaces (one per node in the high availability pair) informs the 128T that you wish to have the interfaces protect one another. This in turn causes the 128T to assign all corresponding pairs of network-interfaces that belong to this shared interface the same common global ID. (I.e., each network-interface on a node will have a unique global ID, but each counterpart network-interface on a highly available node will have the same global ID.) The global ID is an internal identifier, used by the 128T, to refer to the shared interface.
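The pairing rule described above can be modeled in a few lines. This is only an illustrative sketch: the real allocation is internal to the 128T, and the data layout, the function name, and the numbering order used here are all assumptions. It shows network-interfaces that sit behind the same shared-phys-address receiving one common global ID, while a non-shared interface receives its own.

```python
from itertools import count


def assign_global_ids(network_interfaces):
    """Map (node, network-interface name) to a global ID (hypothetical model)."""
    next_id = count(1)
    shared = {}  # (shared MAC, network-interface name) -> global ID
    result = {}
    for ni in network_interfaces:
        mac = ni.get("shared_phys_address")
        if mac is None:
            gid = next(next_id)  # non-redundant interface: its own ID
        else:
            key = (mac.lower(), ni["name"])
            if key not in shared:
                shared[key] = next(next_id)
            gid = shared[key]    # counterpart reuses the same ID
        result[(ni["node"], ni["name"])] = gid
    return result


# Example values only.
interfaces = [
    {"node": "node1", "name": "lan", "shared_phys_address": "02:00:00:00:00:01"},
    {"node": "node2", "name": "lan", "shared_phys_address": "02:00:00:00:00:01"},
    {"node": "node1", "name": "wan", "shared_phys_address": "02:00:00:00:00:02"},
    {"node": "node2", "name": "wan", "shared_phys_address": "02:00:00:00:00:02"},
    {"node": "node1", "name": "mgmt", "shared_phys_address": None},
]
ids = assign_global_ids(interfaces)
print(ids)
```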
About the Global ID
Each network-interface within a 128T configuration has a global ID assigned to it. The term "global" refers to the value being global across the two nodes of a router; it is not unique across an Authority, so different routers within an Authority can reuse the same global ID values. When two nodes share an interface for high availability, each pair of network-interfaces (one per node) that fail over to one another is assigned the same global ID.
This value is also present in the output of show rib, where it is the trailing value within each RIB entry:
The g1 value in the line above refers to the interface assigned global-id value 1.
Network Interface Consistency
When configuring shared interfaces, it is crucial that the network-interface elements within a shared device-interface are mirror images of one another. This is to prevent any behavioral changes when ownership of a shared interface changes from one node to its counterpart. The configuration validation step will prevent committing configuration changes when the network-interface elements are not identical.
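A pre-commit comparison along these lines can catch mismatches before validation does. This is a minimal sketch under stated assumptions: the dictionaries stand in for the network-interface elements of each node, and the field names (vlan, address) and values are illustrative, not the actual 128T data model.

```python
def mirror_mismatches(node_a, node_b):
    """Compare network-interface settings of two nodes and list the differences.

    Each argument maps a network-interface name to a dict of its settings.
    """
    problems = []
    for name in sorted(set(node_a) | set(node_b)):
        if name not in node_a or name not in node_b:
            problems.append(f"{name}: present on only one node")
        elif node_a[name] != node_b[name]:
            keys = set(node_a[name]) | set(node_b[name])
            differing = sorted(
                k for k in keys if node_a[name].get(k) != node_b[name].get(k)
            )
            problems.append(f"{name}: differs in {', '.join(differing)}")
    return problems


# Example values only.
node1 = {
    "lan": {"vlan": 0, "address": "10.0.0.1/24"},
    "wan": {"vlan": 100, "address": "192.0.2.1/30"},
}
node2 = {
    "lan": {"vlan": 0, "address": "10.0.0.1/24"},
    "wan": {"vlan": 200, "address": "192.0.2.1/30"},
}
print(mirror_mismatches(node1, node2))  # ['wan: differs in vlan']
```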
Confirming that Interfaces are Shared
Once you've configured two device-interface elements on individual nodes within a router for high availability, the show device-interface summary command will identify which devices are redundant (shared) within the pair, as well as whether the interface is active or standby (or non-redundant, for interfaces that do not have a counterpart).
In this sample output, the interfaces on node1 are active from a redundancy standpoint. Adding the optional argument node all to the command will show all interfaces on the nodes that comprise the router:
Configuring the Fabric Interface
An optional, but common inclusion in highly available routers is a fabric interface, also known as a "dogleg" interface. Named to evoke the imagery of a fabric backplane or midplane of a chassis-based router, the fabric interface is a forwarding interface between two nodes in a router, and is used when the ingress interface and egress interface for a given session are active on different nodes.
Fabric interfaces are not required for simple active/standby deployments where the two nodes are mirror images of one another (e.g., each WAN interface and LAN interface is protected using shared interfaces). A fabric interface does, however, offer additional protection against failure even in these active/standby setups: it covers the double failure of a LAN port on node 1 and a WAN port on node 2. For deployments where Ethernet ports are not at a premium, a fabric interface is strongly recommended.
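The double-failure case can be reasoned through with a small model. This is a simplified sketch, not the 128T's forwarding logic: it assumes each shared interface becomes active on whichever node still has it up, and that a session can cross between nodes only over a fabric interface. The function names and data layout are hypothetical.

```python
def active_node(up_on):
    """Return the first node on which a shared interface is up, or None."""
    for node in ("node1", "node2"):
        if up_on[node]:
            return node
    return None


def can_forward(lan_up, wan_up, has_fabric):
    """Report whether a LAN-to-WAN session can be forwarded."""
    lan = active_node(lan_up)
    wan = active_node(wan_up)
    if lan is None or wan is None:
        return False  # one side is down on both nodes
    # Same node: forward locally. Different nodes: only the fabric bridges them.
    return lan == wan or has_fabric


# Double failure: LAN port down on node1, WAN port down on node2.
lan_up = {"node1": False, "node2": True}
wan_up = {"node1": True, "node2": False}
print(can_forward(lan_up, wan_up, has_fabric=False))  # False
print(can_forward(lan_up, wan_up, has_fabric=True))   # True
```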
Configuring Redundancy Groups
Redundancy groups are sets of interfaces that share fate, such that if one of the interfaces in the group fails, leadership of all interfaces in the group will be relinquished to the counterpart node in the router. Redundancy groups are required when the two nodes in a router do not have a fabric interface between them; otherwise, you could end up in a situation where the active LAN interface is on node 1 and the active WAN interface is on node 2, with no way to transit packets from node 1 to node 2.
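Fate sharing can be sketched as follows. The model is deliberately simplified and is not the 128T's implementation: it evaluates only the current leader's group and ignores the case where the counterpart is also unhealthy. The function name and data layout are assumptions.

```python
PEER = {"node1": "node2", "node2": "node1"}


def leaders_after_check(group_members, interface_up, current_leader):
    """Return {interface name: leading node} after checking the leader's group.

    group_members: interface names in each node's redundancy group.
    interface_up: maps (node, interface name) to True or False.
    """
    healthy = all(interface_up[(current_leader, name)] for name in group_members)
    # One failed member is enough to hand the whole group to the peer node.
    leader = current_leader if healthy else PEER[current_leader]
    return {name: leader for name in group_members}


interface_up = {
    ("node1", "lan"): True,
    ("node1", "wan"): False,  # WAN fails on node1
    ("node2", "lan"): True,
    ("node2", "wan"): True,
}
print(leaders_after_check(["lan", "wan"], interface_up, "node1"))
# {'lan': 'node2', 'wan': 'node2'}
```

Note that the healthy LAN interface moves together with the failed WAN interface, which keeps both active interfaces on the same node.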
While redundancy groups are most commonly found in legacy deployments (i.e., those that predate 128 Technology's introduction of the fabric interface), they are still useful in simple HA deployments. Furthermore, the redundancy group affords administrators the ability to assert a preference for which node is active in an HA pair in the "sunny day" scenario where no interfaces are administratively or operationally down.
Generally, you will configure two nodes that each have a set of forwarding interfaces (for illustrative purposes, assume an interface on an internal network named lan and an interface on an external network named wan). Each node will require a redundancy-group that contains its pair of internal and external interfaces, as is seen in the following example:
In this example, our two redundant nodes (node1 and node2) each have two interfaces contained within a redundancy-group. Note that each group collects the interfaces for a node, not interfaces that share a global-id.
The priority value indicates, all things being otherwise equal, an administrative preference for which group should be active. When configuring two redundancy-groups with differing priority values, the failover of the systems is said to be “revertive”: the group with the higher priority will be active unless it experiences a failure, and once that failure is cleared it will become active again.
When configuring two redundancy-groups with the same priority value, the 128T router will select an active member using an internal election algorithm, which is not guaranteed to be revertive in the event of a failure, nor is it guaranteed to be non-revertive. For this reason, it is suggested that you configure redundancy-group elements with different priority values.
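The revertive behavior that results from differing priorities can be sketched like this. The sketch assumes that a numerically larger priority value means a stronger preference, and it does not attempt to model the internal election used for equal priorities. The group names and priority values are examples.

```python
def elect_active(groups):
    """Return the name of the healthy group with the highest priority, or None."""
    healthy = [g for g in groups if g["healthy"]]
    if not healthy:
        return None
    # Assumption: a larger priority value expresses a stronger preference.
    return max(healthy, key=lambda g: g["priority"])["name"]


groups = [
    {"name": "group1", "priority": 100, "healthy": True},
    {"name": "group2", "priority": 50, "healthy": True},
]

history = [elect_active(groups)]      # sunny day
groups[0]["healthy"] = False          # group1 experiences a failure
history.append(elect_active(groups))
groups[0]["healthy"] = True           # the failure is cleared
history.append(elect_active(groups))
print(history)  # ['group1', 'group2', 'group1']
```

With equal priorities, max() would simply return the first healthy entry in the list. That tie-break is an artifact of this sketch and says nothing about how the 128T elects a member.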
Below is a sample, minimal configuration that includes both fabric interfaces and redundancy-groups. This topology consists of four interfaces per node: one LAN, one WAN, one Fabric dog-leg, and one Fabric forwarding interface.