This document describes the process for troubleshooting issues with Automated Provisioning (AP) using Salt.
The terms "master" and "conductor" are used interchangeably throughout this document. "Master" refers to the salt-master process running on conductor, which orchestrates tasks for AP. Likewise, the terms "minion", "salt-minion" and "asset" are used interchangeably. A "minion" runs on an "asset", which is a system hosting a 128T router. Minions are responsible for carrying out the tasks on the host that the master assigns to them.
Several conditions are symptomatic of issues that may be affecting AP, such as an asset being disconnected from conductor and general asset errors. These can be seen from the conductor CLI using the show assets command with a provided asset-id. For example:
Some errors are transient and appear only during periods of intermittent connectivity, or while the salt master corrects things it has found in an incorrect state on the minion. Errors that persist are often the result of connectivity issues between the minion and conductor, or of the minion timing out while trying to complete tasks given to it by the master.
Steps to Rectify
Throughout the lifecycle of a 128T asset, some errors are normal and will clear on their own over time. The first steps are to ensure there is working connectivity between the minion and the master.
Diagnosing the issues
The following steps can help diagnose issues with salt-minion which may be affecting AP:
- Ensure that the salt-minion process is running:
- Ensure the host has a working route to conductor by inspecting the route table. For example the following host has a default route via the
- Verify that the minion is successfully opening connections to the master on TCP ports 4505 and 4506. For example, you can use the ss Linux utility to look for active sockets:
Connections on TCP port 4506 are transient, and only exist when the minion needs to report information back to the master. Seeing no connections, or a varying number of connections, on TCP port 4506 is normal.
- If you suspect network issues are impacting the minion's ability to function, you can further diagnose by verifying that packets are flowing as expected on the wire using tcpdump. For example:
- If salt-minion does not appear to be attempting connections to the master, or has healthy connections to conductor but continues to have persistent errors, you can try restarting salt-minion.
If it appears there are connectivity issues that are preventing AP from functioning, correct any networking issues that exist.
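The checks in the list above can be approximated with a few shell helpers. This is a minimal sketch rather than the documented procedure: the function names are hypothetical, the unit name salt-minion assumes the minion is managed by systemd, and 192.0.2.10 is a placeholder for the conductor address. Only the utilities (ss, tcpdump) and the port numbers 4505 and 4506 come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the unit name
# "salt-minion" assumes the minion is managed by systemd.

# Succeeds if systemd reports the salt-minion unit as active.
minion_active() {
    systemctl is-active --quiet salt-minion
}

# Prints the route the kernel would use to reach the conductor address in $1.
route_to_conductor() {
    ip route get "$1"
}

# Counts established TCP connections whose peer port is $1.
# Reads `ss -tn` output on stdin; column 1 is the socket state and
# column 5 is the peer address:port.
count_established() {
    awk -v port="$1" '$1 == "ESTAB" && $5 ~ (":" port "$") { n++ } END { print n + 0 }'
}

# Example usage on an asset (192.0.2.10 is a placeholder conductor address):
#   minion_active || echo "salt-minion is not active"
#   route_to_conductor 192.0.2.10
#   ss -tn | count_established 4505
#   ss -tn | count_established 4506   # may legitimately be 0
```

A count of zero on port 4506 is not by itself a fault, since those connections are transient. To confirm that traffic is actually on the wire, a capture filter such as "tcp port 4505 or tcp port 4506" can be given to tcpdump. Under the same systemd assumption, a restart would be issued with systemctl restart salt-minion.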