CPU Spikes

Use this guide to investigate the root cause when a 128T router reports CPU spike alarms.

Expected Behavior

A CPU alarm is triggered when the average CPU utilization across the cores on a system exceeds 85% for thirty (30) seconds. CPU cores that are pinned for packet forwarding are excluded from this average. The alarm is cleared when the average CPU utilization is below 85% for five (5) seconds.
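The raise and clear rule above can be sketched as a small state machine. This is only an illustration of the rule as described, not the 128T implementation: the `CpuAlarm` class, the one-sample-per-second cadence, and the handling of an average of exactly 85% are all assumptions.

```python
from dataclasses import dataclass

THRESHOLD_PCT = 85.0  # average utilization threshold from the rule above
RAISE_HOLD_S = 30     # seconds above the threshold before the alarm raises
CLEAR_HOLD_S = 5      # seconds below the threshold before the alarm clears


@dataclass
class CpuAlarm:
    """Hypothetical model of the alarm rule, fed one sample per second."""
    active: bool = False
    above_s: int = 0
    below_s: int = 0

    def update(self, core_util, forwarding_cores):
        """core_util maps core id -> percent busy; returns the alarm state."""
        # Cores pinned for packet forwarding are left out of the average.
        counted = [pct for core, pct in core_util.items()
                   if core not in forwarding_cores]
        if not counted:
            return self.active
        avg = sum(counted) / len(counted)

        if avg > THRESHOLD_PCT:
            self.above_s += 1
            self.below_s = 0
        elif avg < THRESHOLD_PCT:
            self.below_s += 1
            self.above_s = 0
        else:
            # Exactly at the threshold: counts toward neither timer (assumption).
            self.above_s = self.below_s = 0

        if not self.active and self.above_s >= RAISE_HOLD_S:
            self.active = True
        elif self.active and self.below_s >= CLEAR_HOLD_S:
            self.active = False
        return self.active


alarm = CpuAlarm()
busy = {0: 100.0, 1: 90.0, 2: 90.0, 3: 90.0}  # core 0 is the forwarding core
states = [alarm.update(busy, forwarding_cores={0}) for _ in range(30)]
print(states[28], states[29])  # False True
```

Because the clear window is much shorter than the raise window, a brief dip below the threshold is enough to clear the alarm, so a system hovering near 85% may raise and clear repeatedly.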

You can determine whether a system has pinned CPU cores by checking the configuration. The CPU allocation is defined within the node configuration of a router.

admin@gouda.novigrad# show config running authority router novigrad node gouda
config
    authority
        router novigrad
            name novigrad
            node gouda
                name gouda
                enabled true
                forwarding-core-mode automatic

In particular, forwarding-core-count indicates how many cores are dedicated to fast packet forwarding.

Note

The default configuration sets forwarding-core-mode to automatic and leaves forwarding-core-count undefined. The 128T platform then attempts to right-size the core allocation based on the system's available resources. For some deployments, it may be desirable to override the defaults to optimize the platform for your environment.

When forwarding-core-mode is set to automatic, you can see how the system has allocated resources by issuing the show platform cpu command.

admin@gouda.novigrad# show platform cpu
Thu 2020-03-19 15:23:24 UTC
===================================================================
gouda
===================================================================
---------------
CPU Information
---------------
Type: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
Speed: 2.400096893310547 GHz
Hyper-Threading: no
Cores: 4
Forwarding Cores: 1
Isolated Cores: 1
Power-Saver: disabled
Completed in 2.75 seconds
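If you capture this output as text, the core counts can be pulled out with a few lines of Python. This is a minimal sketch that assumes the "Key: value" layout shown in the sample above and assumes that "Forwarding Cores" is the number of cores excluded from the alarm average. The helper names `parse_platform_cpu` and `non_forwarding_core_count` are hypothetical and not part of any 128T tooling.

```python
SAMPLE_OUTPUT = """\
Thu 2020-03-19 15:23:24 UTC
Type: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
Speed: 2.400096893310547 GHz
Hyper-Threading: no
Cores: 4
Forwarding Cores: 1
Isolated Cores: 1
Power-Saver: disabled
Completed in 2.75 seconds
"""


def parse_platform_cpu(text):
    """Collect 'Key: value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        # Splitting on ": " skips the timestamp line, whose colons have no space.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def non_forwarding_core_count(fields):
    """Cores left over after the forwarding cores are set aside."""
    return int(fields["Cores"]) - int(fields["Forwarding Cores"])


fields = parse_platform_cpu(SAMPLE_OUTPUT)
print(non_forwarding_core_count(fields))  # 3
```

For the sample system, three of the four cores would contribute to the average that the CPU alarm evaluates.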

With an understanding of how the system is configured, you can now examine the history of CPU usage over time. The 128T stores time series data for a number of KPIs that are relevant to system and service health and operation. Time series data is best viewed in the GUI.

Navigate to Custom Reports on the dashboard. From there, create two reports: one for total utilization per CPU and another for utilization per 128T process, as indicated by the images below.

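[Image: ts_cpu_spikes_per_cpu_chart (custom report of total utilization per CPU)]

[Image: ts_cpu_spikes_per_process_chart (custom report of utilization per 128T process)]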

Alarms can optionally be overlaid on all charts generated by the 128T, which is very helpful for correlating system events with system behavior. The time window of custom reports can be set anywhere from 5 minutes to 6 months. If you see anomalies in per-CPU utilization and, correspondingly, in a particular application, this may indicate that the system is not performing properly.
