Skip to main content

Alarms

Asset

FieldData
Categoryasset
Severitycritical
Message<node>.<router> has a software version that does not match its peer.
ThresholdIssued if any node in a router has mismatched versions. Cleared when they are all equal.
CauseTroubleshooting Step
Multiple nodes configured within one router have different software versions.Manually upgrade the node that has the lower version. Upgrade the router from the PCLI by issuing the send command upgrade router <router> <version> or the upgrade button on the Router page on the Conductor's GUI.

FieldData
Categoryasset
Severitycritical
MessageA duplicate asset with id <id> has been detected. Ensure all assets have a unique id and restart salt-minion on asset <id>, which is configured as <node>.<router>.
ThresholdIssued if any node being manged by a conductor has the same asset id as another node in the authority.
CauseTroubleshooting Step
Multiple nodes configured within an authority have the same asset id.Execute show assets to identify the <router>.<node> that has a duplicate ID. Change the asset id for that node in the Conductor to have a unique id.
Tip: Clearing the asset-id value will generate a random value.

FieldData
Categoryasset
Severitymajor
MessageAsset <id>, which is configured as <node>.<router>, is not running
ThresholdIssued when the SSR service stops on a node (must be managed by ZTP). Clears on SSR start.
CauseTroubleshooting Step
SSR is not running on node <node> router <router>Start SSR from the Conductor PCLI by entering with send command start router <router> node <node> or pressing the start button on the Conductor’s router page in the GUI. If the SSR cannot start check systemctl status 128T on that node.

FieldData
Categoryasset
Severitymajor
MessageAsset<id>, which is configured as <node>.<router>, failed to install.
ThresholdIssued when the SSR install fails on a node (must be managed by ZTP). Clears once SSR is installed.
CauseTroubleshooting Step
SSR failed to install on node <node> router <router>, which is asset <id>Issue command show assets <id> to see detailed information on why the install failed and follow the instructions to fix the issue and retry the installation.

bgp_neighbor

FieldData
Categorybgp_neighbor
Severitymajor
MessageNeighbor <ipaddress> failed to reach the ESTABLISHED state.
ThresholdIssued when the BGP neighbor is not in the ESTABLISHED state. Clears when the BGP neighbor returns to the ESTABLISHED state.
CauseTroubleshooting Step
1. The remote IP address is not reachable due to some network connectivity problem.
2. The remote router is not configured to accept a BGP connection.
3. The OPEN message exchange fails.
Use the command show bgp neighbors and review the content for misconfiguration, state machine connection status, and disconnect failures.

giid

FieldData
Categorygiid
Severitymajor
MessageDHCP address for interface [<interface name>] has not been resolved
ThresholdIssued when DHCP address for interface is unresolved.
CauseTroubleshooting Step
Interface configured to obtain address dynamically using DHCP but was not able to acquire one in time.Ensure the interface is operationally up
Ensure the interface is connected to a network with a DHCP server and the server will accept the node’s request for DHCP address.
Collect the DHCP statistics to check for any failures.
Collect packet traces on the DHCP interface to investigate any protocol level failures.

Interface

FieldData
Categoryinterface
Severitycritical
Messageinterface operational down
Thresholdup/down
CauseTroubleshooting Step
Interface is down for an Ethernet WAN connectionThe next hop networking equipment is down. Troubleshoot by checking for link status on adjacent equipment, adjacent switch ports, etc.
Interface is down for an HA or LAN connectionThe next hop networking equipment is down. Troubleshoot by checking for link status on adjacent equipment, adjacent switch ports, etc.
The down interface is an LTE interfaceCheck that strength and status of the LTE connection by using the show device-interface router <router name> id <interface id> command.
• If the signal strength is marginal, poor, or 0 the LTE interface is malfunctioning.
• If the system mode is not listed as LTE the signal is malfunctioning.
• If the Operation Status is down, the LTE interface is malfunctioning.

In the event of the conditions above, contact Juniper.

FieldData
Categoryinterface
Severityinfo
Messageinterface administratively down
Thresholdup/down
CauseTroubleshooting Step
The interface is down due to being disabled in the configurationRe-enable the interface in the configuration.

Peer

FieldData
Categorypeer
Severitycritical
MessagePeer <name> is not reachable.
ThresholdWhen all paths to a peer are marked down by BFD.
CauseTroubleshooting Step
All “Peer path” alarms to a given peer are triggered.Review the statistics for show stats bfd by-peer-path to investigate for anomolies.
Capture packets on the interface(s) that talk to the peer and look for successful UDP traffic to and from the peer at port 1280.

FieldData
Categorypeer
Severitymajor
MessagePeer <name> path is down
ThresholdWhen a single path is marked down by BFD. The source of the alarm includes the Node/interface/IP/VLAN.
CauseTroubleshooting Step
Router Interface is down.Enter the show device-interface router <router> node <node> <interface> command to verify the router's interface status. If the interface is down, the next hop equipment is likely down. Troubleshoot the adjacent device(s).
Adjacency router's interface is down.Enter the show device-interface router <router> command to verify the adjacency router's interface status. If the interface is down, the troubleshoot the adjacent device’s interface.
Path health has degraded sufficiently and is impacting performance.Using the GUI, click the Home icon and select the appropriate view for the current environment. Examine the graph for any anomalies at the time of the alarm. If the loss is 5% or higher the path has degraded.

FieldData
Categorypeer
Severitymajor
MessagePeer <name> path MTU is unresolvable.
CauseTroubleshooting Step
Maximum Transmit Unit for packet size is unable to be determined.Set the MTU for the device-interface statically.

Platform

FieldData
Categoryplatform
Severitycritical
MessageSecurity Rekey failed for: <node-name(s)>
CauseTroubleshooting Step
Issued when a conductor fails to distribute newly created security keys during rekey process to any managed routers.Make sure failed nodes are running and have connectivity to the conductor. If the problem still persists please contact Juniper customer support.

FieldData
Categoryplatform
Severitycritical
MessageSecurity Rekey failed for: <node-name(s)>
CauseTroubleshooting Step
Issued when a conductor fails to distribute newly created security keys during rekey process to any managed routers.Make sure failed nodes are running and have connectivity to the conductor. If the problem still persists please contact Juniper customer support.

FieldData
Categoryplatform
Severitymajor
Messageflow table limit exceeded
Thresholdgreater than 90% of the total flow table
CauseTroubleshooting Step
Occurs when 90% or more of the total flow table is utilized.The alarm is cleared when 80% or less of the total flow table is utilized.

FieldData
Categoryplatform
Severitymajor
Messagefib table limit exceeded
Thresholdgreater than 90% of the total FIB table
CauseTroubleshooting Step
Occurs when 90% or more of the total FIB table is utilized.The alarm is cleared when 80% or less of the total FIB table is utilized. This may be due to suboptimal configuration or insufficient memory allocated to the SSR software. Contact Juniper support if this alarm persists.

FieldData
Categoryplatform
Severitymajor
Messageaction table limit exceeded
Thresholdgreater than 90% of the total action table
CauseTroubleshooting Step
Occurs when 90% or more of the action table is utilized.The alarm is cleared when 80% or less of the action table is utilized. This table's use is proportional to the number of active flows.

FieldData
Categoryplatform
Severitymajor
Messagearp table limit exceeded
Thresholdgreater than 90% of the total arp table
CauseTroubleshooting Step
Occurs when 90% or more of the ARP table is used.The alarm is cleared when 80% or less of the table is used.

FieldData
Categoryprocess
Severitymajor
MessageProcess has exited unexpectedly: <process-name>
ThresholdIssued when a SSR system process exits and is cleared when it is successfully restarted
CauseTroubleshooting Step
Process exits once and restarts to normal operationThe SSR is designed to restart processes in the event of a failure. If this alarm state is only seen briefly and then clears it is likely that the system has self-recovered. Please report to Juniper customer support.
Process exits continuouslyContact Juniper customer support

System


FieldData
Categorysystem
Severitycritical
MessageNode <node-name> went offline
ThresholdIssued when an HA node goes offline
CauseTroubleshooting Step
The HA peer node has shut down or stopped runningVerify that the HA peer node is powered on and running. If the node is running verify that the SSR service is running without error by issuing the command systemctl status 128T. If the system appears to be running correctly check connectivity between the systems by issuing the PCLI command show system connectivity on both nodes.
Connectivity between HA nodes is downHA node connectivity can be evaluated with the PCLI command show system connectivity. If the state to the peer node is not connected check the inter node tunnel status by running the PCLI command show system connectivity internal. All tunnels to the peer node should report “connected”. If connectivity is down verify links between the systems and if they are up then please contact Juniper support.

FieldData
Categorysystem
Severitycritical
Messagesystem memory exceeded
Thresholdgreater than 90%
CauseTroubleshooting Step
Alarm triggers above 90% memory usageThe Alarm resets/clears when memory usage drops below 80%.
A process is consuming excessive memoryLocate the system processes consuming large amounts of system memory by running show stats process memory rss from the PCLI.

FieldData
Categorysystem
Severitymajor
Messagedisk space low
Thresholdless than 10% disk space left
CauseTroubleshooting Step
Disk usage is highUsing standard Linux tools such as “df” and “ls” determine which files are consuming large amounts of disk space. In the event that there are unneeded files they should be removed.

FieldData
Categorysystem
Severitymajor
MessageNo connectivity to <router>.<node>
ThresholdWhen a connection between a node that is present in config/environment config is not present.
CauseTroubleshooting Step
The node is not reachable by the conconductor.Enter `show system connectivity router all node all
The node is not reachable by its HA peer.Enter `show system connectivity router all node all

FieldData
Categorysystem
Severitymajor
MessageHost cpu utilization exceeded
Thresholdgreater than 85% for 30 seconds
CauseTroubleshooting Step
Alarm triggers above 85% CPU usage for 30 secondsThe alarm clears when the CPU usage drops below 70%.
Intermittent process consuming large amount of CPUIf the alarm triggers and clears intermittently this could indicate a periodic load spike or intermittent process workload. Check the current cpu utilization of all processes in the system by using the linux command top or the PCLI command show stats process cpu.
Process consistently consuming large amount of CPUIf the alarm is constantly active this could indicate an under-provisioned system. Check the current cpu utilization of all processes in the system by using the linux command top or the PCLI command show stats process cpu. Contact Juniper support for guidance on provisioning the system.

FieldData
Categorysystem
Severitymajor
MessageReceived config sync info state for node <node-name> with syncVersion=<version>, error=<error>, message=<message> and resulting action=<action>
ThresholdConfiguration synchronization error
CauseTroubleshooting Step
The router is unable to receive the configuration from the ConductorRun the command show asset <asset-id> of the system exhibiting the problem. This will return the status of the asset and provide more detailed information regarding the nature of the problem.

FieldData
Categorysystem
Severitymajor
MessageHostname [<hostname>] is unresolved
ThresholdWhen a configured hostname is unresolved.
CauseTroubleshooting Step
The router was unable to resolve the hostname given in the configVerify that the hostname is resolvable from linux using a utility like dig. Verify that the hostname has a corresponding /etc/hosts entry

FieldData
Categorysystem
Severitymajor
MessageNo active NTP server
ThresholdIssued when the system is not connected to any active NTP servers.
CauseTroubleshooting Step
The router is having connectivity problems to the NTP server that was selected.Specify NTP server(s) to connect to. From PCLI, “configure authority router <router name> system ntp server <ntp server address>”.
Make this more resilient by specifying more NTP servers. A common practice is to specify 4 servers.

FieldData
Categorysystem
Severitymajor
MessageCorrupt entitlement certificate received
Invalid entitlement certificate received
Unable to obtain entitlement certificate
ThresholdCertificate failure
CauseTroubleshooting Step
Unable to read entitlement data from certificateEnsure that the certificate installed on the system matches the one received from Juniper. Run install128t to reinstall the certificate. If the problem persists, contact customer support to obtain a new certificate.

FieldData
Categorysystem
Severitymajor
MessageSNMP server failure
ThresholdUnable to communicate to SNMP server
CauseTroubleshooting Step
Network connectivity failure or misconfigurationEnsure that the SNMP server defined in the configuration is reachable. Usually this can be determined by issueing a ping to the server address. If the server does not respond, run a packet capture on the interface used for SNMP to observe if traffic is being generated from the SSR upon event generation.

FieldData
Categorysystem
Severitymajor
MessageRestart required
ThresholdRestart is required for configuration to take effect
CauseTroubleshooting Step
Non-dynamically reconfiguable filed has been editedSome fields within the SSR configuration are not dynamic and requires a restart of the SSR process to take effect (e.g. forwarding-cores). From the Conductor Router page, click on the gear icon to issue a restart of the SSR process. Alternatively, from within the linux shell of the SSR Router, issue systemctl restart 128T

FieldData
Categorysystem
Severityminor
MessageApplication Identification cache utilization is approaching maximum capacity
ThresholdFires at 95% of max-capacity value (default is 10,000)
CauseTroubleshooting Step
Capacity of the cache exceeds 95% of the max-capacity configured valueThe alarm is cleared once the capacity of the cache goes below 85% of the configured value, and as sessions using those stats expire. The alarm can be addressed by adjusting the max-capacity value under application-identification. App-id stats are tracked per application, per client, and per next-hop. The granularity of per-application, per-client traffic stats will be reduced while the alarm is active on the system.