Alarms

| Field | Data |
| --- | --- |
| Category | asset |
| Severity | major |
| Message | `Asset <id>, which is configured as <node>.<router>, is not running` |
| Threshold | Issued when the 128T service stops on a node (the node must be managed by ZTP). Cleared when 128T starts. |

| Cause | Troubleshooting Step |
| --- | --- |
| 128T is not running on node `<node>` of router `<router>`. | Start 128T from the Conductor PCLI by entering `send command start router <router> node <node>`, or click the start button on the Conductor's router page in the GUI. If 128T cannot start, check the output of `systemctl status 128T` on that node. |

| Field | Data |
| --- | --- |
| Category | asset |
| Severity | major |
| Message | `Asset <id>, which is configured as <node>.<router>, failed to install.` |
| Threshold | Issued when the 128T installation fails on a node (the node must be managed by ZTP). Cleared once 128T is installed. |

| Cause | Troubleshooting Step |
| --- | --- |
| 128T failed to install on node `<node>` of router `<router>`, which is asset `<id>`. | Issue the command `show assets <id>` to see detailed information on why the installation failed, follow the instructions to fix the issue, and then retry the installation. |

| Field | Data |
| --- | --- |
| Category | asset |
| Severity | critical |
| Message | `<node>.<router> has a software version that does not match its peer.` |
| Threshold | Issued if the nodes in a router have mismatched software versions. Cleared when all versions are equal. |

| Cause | Troubleshooting Step |
| --- | --- |
| Multiple nodes configured within one router have different software versions. | Manually upgrade the node that has the lower 128T version, or upgrade the router from the PCLI by issuing `send command upgrade router <router> <version>`, or click the upgrade button on the Conductor's Software Management Studio page in the GUI. |
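To spot which node lags behind its peer, the version comparison can be sketched in Python. This is a minimal illustration, not 128T tooling: it assumes purely numeric dotted version strings (such as `4.5.3`), and the node names and versions are hypothetical. Real 128T version strings may carry build suffixes that would need extra parsing. Comparing integer tuples rather than raw strings avoids ranking `4.10.0` below `4.9.2`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '4.5.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def nodes_needing_upgrade(node_versions: dict[str, str]) -> list[str]:
    """Return the nodes whose version is lower than the newest version in the router."""
    if not node_versions:
        return []
    newest = max(parse_version(v) for v in node_versions.values())
    return sorted(
        node
        for node, version in node_versions.items()
        if parse_version(version) < newest
    )


# Hypothetical two-node router with mismatched versions.
print(nodes_needing_upgrade({"node1": "4.10.0", "node2": "4.9.2"}))  # ['node2']
```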

| Field | Data |
| --- | --- |
| Category | asset |
| Severity | critical |
| Message | `A duplicate asset with id <id> has been detected. Ensure all assets have a unique id and restart salt-minion on asset <id>, which is configured as <node>.<router>.` |
| Threshold | Issued if any node managed by a conductor has the same asset ID as another node in the authority. |

| Cause | Troubleshooting Step |
| --- | --- |
| Multiple nodes configured within an authority have the same asset ID. | Execute `show assets` to identify the `<router>.<node>` that has a duplicate ID. Change the asset ID for that node on the Conductor to a unique value. |

Tip: Clearing the asset-id value generates a random value.
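If you export the asset list (for example, by copying node names and asset IDs out of `show assets`), finding the collisions is a simple grouping step. The sketch below is a hypothetical helper, not part of 128T; the inventory keys follow the `<node>.<router>` form used in the alarm message, and the IDs are invented.

```python
from collections import defaultdict


def find_duplicate_asset_ids(assets: dict[str, str]) -> dict[str, list[str]]:
    """Map every asset ID used by more than one node to the nodes sharing it."""
    nodes_by_id: dict[str, list[str]] = defaultdict(list)
    for node, asset_id in assets.items():
        nodes_by_id[asset_id].append(node)
    return {
        asset_id: sorted(nodes)
        for asset_id, nodes in nodes_by_id.items()
        if len(nodes) > 1
    }


# Hypothetical inventory keyed by <node>.<router>.
inventory = {
    "node1.branch1": "a1b2c3",
    "node2.branch1": "d4e5f6",
    "node1.branch2": "a1b2c3",
}
print(find_duplicate_asset_ids(inventory))
# {'a1b2c3': ['node1.branch1', 'node1.branch2']}
```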

| Field | Data |
| --- | --- |
| Category | giid |
| Severity | major |
| Message | `DHCP address for interface [<interface name>] has not been resolved` |
| Threshold | Issued when the DHCP address for an interface is unresolved. |

| Cause | Troubleshooting Step |
| --- | --- |
| The interface is configured to obtain an address dynamically using DHCP but was not able to acquire one in time. | Ensure the interface is operationally up. Ensure the interface is connected to a network with a DHCP server, and that the server will accept the node's request for a DHCP address. Collect the DHCP statistics to check for any failures. Collect packet traces on the DHCP interface to investigate any protocol-level failures. |

| Field | Data |
| --- | --- |
| Category | interface |
| Severity | info |
| Message | `interface administratively down` |
| Threshold | up/down |

| Cause | Troubleshooting Step |
| --- | --- |
| The interface is down because it is disabled in the configuration. | Re-enable the interface in the configuration. |

| Field | Data |
| --- | --- |
| Category | interface |
| Severity | critical |
| Message | `interface operational down` |
| Threshold | up/down |

| Cause | Troubleshooting Step |
| --- | --- |
| The interface is down for an Ethernet WAN connection. | The next-hop networking equipment is down. Troubleshoot by checking the link status on adjacent equipment, adjacent switch ports, etc. |
| The interface is down for an HA or LAN connection. | The next-hop networking equipment is down. Troubleshoot by checking the link status on adjacent equipment, adjacent switch ports, etc. |
| The down interface is an LTE interface. | Check the signal strength and status of the LTE connection using the `show device-interface router <router name> id <interface id>` command, and compare the output against the conditions listed below. |

For an LTE interface:

- If the signal strength is marginal, poor, or 0, the LTE interface is malfunctioning.
- If the system mode is not listed as LTE, the LTE signal is malfunctioning.
- If the operation status is down, the LTE interface is malfunctioning.

If any of the conditions above apply, contact 128 Technology.

| Field | Data |
| --- | --- |
| Category | peer |
| Severity | major |
| Message | `Peer <name> path is down` |
| Threshold | Issued when a single path is marked down by BFD. The source of the alarm includes the node, interface, IP address, and VLAN. |

| Cause | Troubleshooting Step |
| --- | --- |
| The router interface is down. | Enter the `show device-interface router <router> node <node> <interface>` command to verify the router's interface status. If the interface is down, the next-hop equipment is likely down; troubleshoot the adjacent device(s). |
| The adjacent router's interface is down. | Enter the `show device-interface router <router>` command to verify the adjacent router's interface status. If the interface is down, troubleshoot the adjacent device's interface. |
| Path health has degraded enough to impact performance. | In the GUI, click the Home icon and select the appropriate view for the current environment. Examine the graph for any anomalies at the time of the alarm. If the loss is 5% or higher, the path has degraded. |
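The 5% rule in the last row is a simple ratio. The sketch below applies it to probe counts; it assumes you have a count of packets sent and received on the path (the counter source is an assumption, since the table only says to read loss from the GUI graph), and the function names are hypothetical.

```python
def loss_percent(sent: int, received: int) -> float:
    """Percentage of probe packets lost on a path."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    lost = max(sent - received, 0)  # guard against counters that briefly disagree
    return 100.0 * lost / sent


def path_degraded(sent: int, received: int, threshold_pct: float = 5.0) -> bool:
    """Apply the '5% loss or higher' rule from the troubleshooting table."""
    return loss_percent(sent, received) >= threshold_pct


print(path_degraded(1000, 950))  # True  (5.0% loss)
print(path_degraded(1000, 990))  # False (1.0% loss)
```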

| Field | Data |
| --- | --- |
| Category | peer |
| Severity | critical |
| Message | `Peer <name> is not reachable` |
| Threshold | Issued when all paths to a peer are marked down by BFD. |

| Cause | Troubleshooting Step |
| --- | --- |
| All "Peer path" alarms to a given peer are triggered. | Review the output of `show stats bfd by-peer-path` for anomalies. Capture packets on the interface(s) that communicate with the peer and look for successful UDP traffic to and from the peer on port 1280. |
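The two peer alarms above relate to each other in a fixed way: each path marked down by BFD raises a major "path is down" alarm, and when every path is down the critical "not reachable" alarm is added on top. A minimal sketch of that relationship, assuming a hypothetical mapping from a path description to its BFD up/down state (the path key format is invented and is not how 128T names sources):

```python
def peer_alarms(peer: str, path_up: dict[str, bool]) -> list[tuple[str, str, str]]:
    """Derive (severity, message, source) alarms from per-path BFD state."""
    down_paths = sorted(path for path, up in path_up.items() if not up)
    alarms = [("major", f"Peer {peer} path is down", path) for path in down_paths]
    # When every path to the peer is down, the peer itself is unreachable.
    if path_up and len(down_paths) == len(path_up):
        alarms.append(("critical", f"Peer {peer} is not reachable", peer))
    return alarms


# Hypothetical paths described as node/interface/IP/VLAN.
paths = {
    "node1/wan0/203.0.113.10/0": False,
    "node1/lte0/198.51.100.7/0": True,
}
for alarm in peer_alarms("datacenter", paths):
    print(alarm)
```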

| Field | Data |
| --- | --- |
| Category | peer |
| Severity | major |
| Message | `Peer <name> path MTU is unresolvable.` |

| Cause | Troubleshooting Step |
| --- | --- |
| The Maximum Transmission Unit (MTU) for the path cannot be determined. | Set the MTU for the device-interface statically. |

| Field | Data |
| --- | --- |
| Category | platform |
| Severity | major |
| Message | `flow table limit exceeded` |
| Threshold | greater than 90% of the total flow table |

| Cause | Troubleshooting Step |
| --- | --- |
| Occurs when 90% or more of the total flow table is utilized. | The alarm is cleared when 80% or less of the total flow table is utilized. |
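The flow, FIB, action, and ARP table alarms all raise at 90% utilization and clear only at 80%. That gap is hysteresis: it keeps the alarm from flapping when usage hovers near a single threshold. The sketch below illustrates the raise/clear behavior described in these tables; it is not 128T's implementation, and the class name is hypothetical.

```python
class TableUsageAlarm:
    """Hysteresis alarm: raise at raise_pct utilization or more, clear at clear_pct or less."""

    def __init__(self, raise_pct: float = 90.0, clear_pct: float = 80.0) -> None:
        if clear_pct >= raise_pct:
            raise ValueError("clear_pct must be below raise_pct")
        self.raise_pct = raise_pct
        self.clear_pct = clear_pct
        self.active = False

    def update(self, used: int, capacity: int) -> bool:
        """Feed one utilization sample and return whether the alarm is active."""
        pct = 100.0 * used / capacity
        if not self.active and pct >= self.raise_pct:
            self.active = True
        elif self.active and pct <= self.clear_pct:
            self.active = False
        return self.active  # between the two thresholds the previous state is kept


alarm = TableUsageAlarm()
print([alarm.update(used, 1000) for used in (850, 905, 850, 800, 870)])
# [False, True, True, False, False]
```

Note that the 85% sample after the alarm is raised leaves it active; only dropping to 80% clears it.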

| Field | Data |
| --- | --- |
| Category | platform |
| Severity | major |
| Message | `fib table limit exceeded` |
| Threshold | greater than 90% of the total FIB table |

| Cause | Troubleshooting Step |
| --- | --- |
| Occurs when 90% or more of the total FIB table is utilized. | The alarm is cleared when 80% or less of the total FIB table is utilized. This may be due to suboptimal configuration or insufficient memory allocated to the 128T software. Contact 128T support if this alarm persists. |

| Field | Data |
| --- | --- |
| Category | platform |
| Severity | major |
| Message | `action table limit exceeded` |
| Threshold | greater than 90% of the total action table |

| Cause | Troubleshooting Step |
| --- | --- |
| Occurs when 90% or more of the action table is utilized. | The alarm is cleared when 80% or less of the action table is utilized. Usage of this table is proportional to the number of active flows. |

| Field | Data |
| --- | --- |
| Category | platform |
| Severity | major |
| Message | `arp table limit exceeded` |
| Threshold | greater than 90% of the total ARP table |

| Cause | Troubleshooting Step |
| --- | --- |
| Occurs when 90% or more of the ARP table is used. | The alarm is cleared when 80% or less of the table is used. |

| Field | Data |
| --- | --- |
| Category | platform |
| Severity | critical |
| Message | `Security Rekey failed for: <node-name(s)>` |

| Cause | Troubleshooting Step |
| --- | --- |
| Issued when a conductor fails to distribute newly created security keys to any managed router during the rekey process. | Make sure the failed nodes are running and have connectivity to the conductor. If the problem persists, contact 128T customer support. |

| Field | Data |
| --- | --- |
| Category | process |
| Severity | major |
| Message | `Process has exited unexpectedly: <process-name>` |
| Threshold | Issued when a 128T system process exits. Cleared when the process is successfully restarted. |

| Cause | Troubleshooting Step |
| --- | --- |
| The process exits once and restarts to normal operation. | The 128T system is designed to restart processes in the event of a failure. If this alarm is seen only briefly and then clears, the system has likely self-recovered. Report the event to 128T customer support. |
| The process exits continuously. | Contact 128T customer support. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | critical |
| Message | `system memory exceeded` |
| Threshold | greater than 90% |

| Cause | Troubleshooting Step |
| --- | --- |
| A process is consuming excessive memory. | Locate the system processes consuming large amounts of memory by running `show stats process memory rss` from the PCLI. |
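To confirm overall memory pressure from the Linux shell before drilling into processes, you can compute used memory from `/proc/meminfo`. This is a sketch under an assumption: it treats "used" as `MemTotal - MemAvailable`, which may not be exactly how the 128T alarm measures system memory. The sample text is invented.

```python
def memory_used_percent(meminfo_text: str) -> float:
    """Compute used memory as a percentage from /proc/meminfo-style text."""
    fields: dict[str, int] = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    # Assumption: "used" means memory the kernel does not consider available.
    return 100.0 * (total - available) / total


sample = "MemTotal: 16000000 kB\nMemFree: 500000 kB\nMemAvailable: 1200000 kB\n"
used = memory_used_percent(sample)
print(f"{used:.1f}% used, over 90%: {used > 90}")  # 92.5% used, over 90%: True
```

On a live node, pass the contents of `/proc/meminfo` (read with `open("/proc/meminfo").read()`) instead of the sample string.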

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `disk space low` |
| Threshold | less than 10% disk space left |

| Cause | Troubleshooting Step |
| --- | --- |
| Disk usage is high. | Using standard Linux tools such as `df` and `ls`, determine which files are consuming large amounts of disk space. Remove any unneeded files. |
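The same checks can be scripted with the Python standard library when `df` and `ls` output is hard to scan. This is a minimal sketch: which filesystem the alarm watches is an assumption (it checks whichever path you pass), and the helper names are hypothetical.

```python
import os
import shutil


def free_percent(path: str = "/") -> float:
    """Percentage of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def largest_files(root: str, limit: int = 10) -> list[tuple[int, str]]:
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                continue  # skip symlinks so their targets are not counted twice
            try:
                sizes.append((os.path.getsize(full_path), full_path))
            except OSError:
                continue  # the file vanished or is unreadable
    sizes.sort(reverse=True)
    return sizes[:limit]
```

For example, `free_percent("/") < 10` mirrors the alarm threshold, and `largest_files("/var/log")` is one place to start looking for candidates to remove.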

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `No connectivity to <router>.<node>` |
| Threshold | Issued when an expected connection to a node that is present in the configuration (or environment config) is not established. |

| Cause | Troubleshooting Step |
| --- | --- |
| The node is not reachable by the conductor. | Enter `show system connectivity router all node all`. |
| The node is not reachable by its HA peer. | Enter `show system connectivity router all node all`. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `Host cpu utilization exceeded` |
| Threshold | greater than 85% for 30 seconds |

| Cause | Troubleshooting Step |
| --- | --- |
| An intermittent process is consuming a large amount of CPU. | If the alarm triggers and clears intermittently, this could indicate a periodic load spike or an intermittent process workload. Check the current CPU utilization of all processes in the system using the Linux command `top` or the PCLI command `show stats process cpu`. If the `highway` process is consuming a large amount of CPU, this could indicate a high network load event. |
| A process is consistently consuming a large amount of CPU. | If the alarm is constantly active, this could indicate an under-provisioned system. Check the current CPU utilization of all processes in the system using the Linux command `top` or the PCLI command `show stats process cpu`. If the `highway` process is consuming a large amount of CPU, this could indicate a high network load. Contact 128T support for guidance on provisioning the system. |
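The "greater than 85% for 30 seconds" threshold means a single spike is not enough; utilization must stay above 85% continuously. A minimal sketch of such a sustained-threshold check, assuming timestamped CPU samples and that any dip to 85% or below resets the timer (how 128T samples and clears this alarm is not specified here):

```python
from typing import Optional


class SustainedThreshold:
    """Report True once a metric has stayed above `threshold` for `duration` seconds."""

    def __init__(self, threshold: float = 85.0, duration: float = 30.0) -> None:
        self.threshold = threshold
        self.duration = duration
        self.above_since: Optional[float] = None  # start of the current run above threshold

    def update(self, timestamp: float, value: float) -> bool:
        if value > self.threshold:
            if self.above_since is None:
                self.above_since = timestamp
            return timestamp - self.above_since >= self.duration
        self.above_since = None  # a dip at or below the threshold resets the timer
        return False


monitor = SustainedThreshold()
samples = [(0, 90.0), (10, 92.0), (20, 88.0), (30, 91.0), (40, 70.0)]
print([monitor.update(t, cpu) for t, cpu in samples])
# [False, False, False, True, False]
```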

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `Received config sync info state for node <node-name> with syncVersion=<version>, error=<error>, message=<message> and resulting action=<action>` |
| Threshold | Configuration synchronization error |

| Cause | Troubleshooting Step |
| --- | --- |
| The router is unable to receive the configuration from the Conductor. | Run the command `show assets <asset-id>` for the system exhibiting the problem. This returns the status of the asset and provides more detailed information about the nature of the problem. |
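When several of these alarms arrive, pulling the node, version, error, and action out of the message makes them easier to compare. The pattern below is built only from the message format shown in the table; the sample values are hypothetical, and real messages could contain characters (such as commas inside the error text) that this sketch does not handle.

```python
import re
from typing import Optional

CONFIG_SYNC_RE = re.compile(
    r"Received config sync info state for node (?P<node>\S+) "
    r"with syncVersion=(?P<version>[^,]*), error=(?P<error>[^,]*), "
    r"message=(?P<message>.*) and resulting action=(?P<action>.*)$"
)


def parse_config_sync(text: str) -> Optional[dict[str, str]]:
    """Split a config sync alarm message into its fields, or return None if it does not match."""
    match = CONFIG_SYNC_RE.match(text)
    return match.groupdict() if match else None


# Hypothetical alarm text following the documented format.
sample = (
    "Received config sync info state for node node1 with syncVersion=42, "
    "error=timeout, message=no response and resulting action=retry"
)
print(parse_config_sync(sample)["action"])  # retry
```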

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `Hostname [<hostname>] is unresolved` |
| Threshold | Issued when a configured hostname is unresolved. |

| Cause | Troubleshooting Step |
| --- | --- |
| The router was unable to resolve the hostname given in the configuration. | Verify that the hostname is resolvable from Linux using a utility such as `dig`, or that it has a corresponding entry in `/etc/hosts`. |
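Because `dig` queries DNS directly and does not read `/etc/hosts`, it can help to check both sources separately. The sketch below parses `/etc/hosts`-format text and asks the system resolver (which normally consults both) through `socket.getaddrinfo`. The helper names and the sample hosts entries are hypothetical.

```python
import socket


def hosts_entries(hosts_text: str) -> dict[str, list[str]]:
    """Map each name or alias in /etc/hosts-style text to the addresses listed for it."""
    entries: dict[str, list[str]] = {}
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            entries.setdefault(name.lower(), []).append(address)  # hostnames are case-insensitive
    return entries


def resolves(hostname: str) -> bool:
    """True if the system resolver (DNS, /etc/hosts, ...) can resolve `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True


sample = "127.0.0.1 localhost\n192.0.2.20 conductor.example.net conductor  # HA pair\n"
print(hosts_entries(sample).get("conductor"))  # ['192.0.2.20']
```

On a node, pass the contents of `/etc/hosts` to `hosts_entries`, and call `resolves()` with the hostname from the alarm.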

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `No active NTP server` |
| Threshold | Issued when the system is not connected to any active NTP servers. |

| Cause | Troubleshooting Step |
| --- | --- |
| The router has connectivity problems reaching the selected NTP server. | Specify the NTP server(s) to connect to. From the PCLI: `configure authority router <router name> system ntp server <ntp server address>`. Make this more resilient by specifying multiple NTP servers; a common practice is to specify four. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | critical |
| Message | `Node <node-name> went offline` |
| Threshold | Issued when an HA node goes offline. |

| Cause | Troubleshooting Step |
| --- | --- |
| The HA peer node has shut down or stopped running. | Verify that the HA peer node is powered on and running. If the node is running, verify that the 128T service is running without error by issuing the command `systemctl status 128T`. If the system appears to be running correctly, check connectivity between the systems by issuing the PCLI command `show system connectivity` on both nodes. |
| Connectivity between the HA nodes is down. | Evaluate HA node connectivity with the PCLI command `show system connectivity`. If the state to the peer node is not connected, check the inter-node tunnel status by running the PCLI command `show system connectivity internal`. All tunnels to the peer node should report "connected". If connectivity is down, verify the links between the systems; if they are up, contact 128T support. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `Corrupt entitlement certificate received`, `Invalid entitlement certificate received`, or `Unable to obtain entitlement certificate` |
| Threshold | Certificate failure |

| Cause | Troubleshooting Step |
| --- | --- |
| Entitlement data cannot be read from the certificate. | Ensure that the certificate installed on the system matches the one received from 128 Technology. Run `install128t` to reinstall the certificate. If the problem persists, contact customer support to obtain a new certificate. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `SNMP server failure` |
| Threshold | Unable to communicate with the SNMP server |

| Cause | Troubleshooting Step |
| --- | --- |
| Network connectivity failure or misconfiguration. | Ensure that the SNMP server defined in the configuration is reachable. This can usually be determined by pinging the server address. If the server does not respond, run a packet capture on the interface used for SNMP to observe whether the 128T generates traffic when an event occurs. |

| Field | Data |
| --- | --- |
| Category | system |
| Severity | major |
| Message | `Restart required.` |
| Threshold | A restart is required for the configuration to take effect. |

| Cause | Troubleshooting Step |
| --- | --- |
| A field that cannot be reconfigured dynamically has been edited. | Some fields within the 128T configuration are not dynamic and require a restart of the 128T process to take effect (e.g., forwarding-cores). From the Conductor's router page, click the gear icon to restart the 128T process. Alternatively, from the Linux shell of the 128T router, issue `systemctl restart 128T`. |