Troubleshooting
Troubleshooting is generally done with the assistance of Altair Support.
Checking status of Altair SLC Hub services
To check that the Altair SLC Hub services are running, use the hubctl command:
hubctl service status
This should print a summary table of all of the Altair SLC Hub services and whether they are active (running).
If any are marked as inactive, try restarting them with:
hubctl service start <name>
If they are still inactive, or marked as failed, then view the logs of the service.
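The check-and-restart step above can be scripted. The following is a minimal bash sketch, assuming that the service <name> passed to hubctl corresponds to the systemd unit slchub.<name> (as with the authentication service and the unit slchub.auth used below). The function name ensure_started is hypothetical. It asks systemctl whether each unit is active and, if it is not, runs hubctl service start for it.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the <name> accepted by "hubctl service start"
# maps to the systemd unit slchub.<name> (e.g. auth -> slchub.auth).
ensure_started() {
  local name
  for name in "$@"; do
    # is-active --quiet prints nothing; its exit status is 0 only if active.
    if systemctl is-active --quiet "slchub.${name}"; then
      echo "slchub.${name}: active"
    else
      echo "slchub.${name}: not active, starting it"
      hubctl service start "${name}"
    fi
  done
}
```

For example, ensure_started auth nomad checks slchub.auth and slchub.nomad. Any service that is still inactive or failed afterwards should be investigated through its logs.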
Viewing service logs
The logs from the services are captured by systemd/journald and made available using the
standard journalctl command.
For example, to see the last 50 log records from the authentication service, use the command:
sudo journalctl -u slchub.auth -n 50
The -u parameter names the systemd unit to show log entries for. The unit names
of the Altair SLC Hub services all start with the prefix slchub. (for example, slchub.auth).
To see the full list of Altair SLC Hub service unit names, use the command:
sudo hubctl service status
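Because the services run as ordinary systemd units, the standard systemctl command can also list them. This is a minimal sketch, assuming the units are currently loaded by systemd; list_slchub_units is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list loaded systemd units matching slchub.*, whatever their
# state. --all includes inactive/failed units; --plain drops the status
# bullets and --no-legend drops header/footer, so field 1 is the unit name.
list_slchub_units() {
  systemctl list-units --all --no-legend --plain 'slchub.*' | awk '{ print $1 }'
}
```

It prints one unit name per line, including inactive and failed units. Units that systemd has not loaded do not appear, so the hubctl command above remains the primary way to see the full list.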
Note
If a service fails unexpectedly, systemd can fail to correctly associate the final log
messages with the relevant systemd unit. It is sometimes necessary to use
journalctl -t name rather than journalctl -u slchub.name to see these last
few messages.
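A small helper can run both forms of the query from the note in one go. This is a sketch that assumes the journal identifier used with -t is the service name without the slchub. prefix, as the note implies; show_service_logs is a hypothetical name, and the default of 50 records mirrors the earlier example.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the journal identifier for -t is the service name
# without the slchub. prefix, as in "journalctl -t name" in the note.
show_service_logs() {
  local name="$1" lines="${2:-50}"
  echo "== records for unit slchub.${name} =="
  sudo journalctl --no-pager -u "slchub.${name}" -n "${lines}"
  # Catches final messages that journald did not attribute to the unit.
  echo "== records with identifier ${name} =="
  sudo journalctl --no-pager -t "${name}" -n "${lines}"
}
```

For example, show_service_logs auth 100 shows the last 100 records of each kind for the authentication service.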
For full details of the journalctl command, see the journalctl man page.
On Windows, the logs from the services are located in [var directory]\log.
Log viewing tips on Linux
To see the log messages from all Altair SLC Hub services, use the pattern slchub.* as the
argument for the -u option, quoting it so that the shell passes it to journalctl unexpanded, as in:
sudo journalctl -u 'slchub.*' -n 50
By default, journalctl pipes the log entries through a pager such as less or more.
Redirecting the output to a file, and then opening the file in an editor such as vi, can
be a useful alternative way of viewing the logs.
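A minimal sketch of writing the records to a file. The output file name slchub-logs.txt and the default of 1000 records are arbitrary example values, and save_hub_logs is a hypothetical helper. The pattern is quoted so the shell does not expand it, and --no-pager makes it explicit that no pager is involved.

```shell
#!/usr/bin/env bash
# Sketch only: save recent records from all slchub.* units to a text file.
# The default file name and record count are arbitrary example values.
save_hub_logs() {
  local outfile="${1:-slchub-logs.txt}" lines="${2:-1000}"
  # Quoting 'slchub.*' stops the shell globbing it against local files.
  # The redirection runs as the calling user, not as root.
  sudo journalctl --no-pager -u 'slchub.*' -n "${lines}" > "${outfile}"
}
```

After running save_hub_logs, open the result with vi slchub-logs.txt.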
Rather than limiting the display to a fixed number of entries, the output can be limited based on the timestamp of the log record. To return the log entries for all Altair SLC Hub services that were logged in the last 5 minutes, use the command:
sudo journalctl -u 'slchub.*' -S -5m
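The -S (--since) option can be combined with -U (--until) to select an absolute time window, for example around the time a task failed. In this sketch hub_logs_between is a hypothetical helper, and the timestamps in the usage line are placeholders; journalctl accepts timestamps in the form YYYY-MM-DD HH:MM:SS.

```shell
#!/usr/bin/env bash
# Sketch only: records from all slchub.* units between two timestamps.
# -S/--since and -U/--until accept "YYYY-MM-DD HH:MM:SS" or relative values.
hub_logs_between() {
  local from_ts="$1" to_ts="$2"
  sudo journalctl --no-pager -u 'slchub.*' -S "${from_ts}" -U "${to_ts}"
}
```

For example: hub_logs_between "2024-05-01 09:00:00" "2024-05-01 09:15:00".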
Missing Nomad logs on worker nodes
Nomad has a garbage collector which, by default, deletes Nomad log files when disk usage exceeds 80%.
This can lead to nomad deleting log files as soon as a task completes, making it extremely difficult to diagnose the reason for a task failure.
This is unlikely to occur in a production environment.
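To judge whether the 80% threshold is the likely cause, check how full the filesystem holding Nomad's data is. This sketch assumes that data lives in the nomad directory under the installation's [var directory], which is site-specific and must be passed in (for example through a hypothetical VAR_DIR variable). It uses POSIX df -P; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: how full is the filesystem holding a given directory?
disk_usage_percent() {
  # df -P prints one data line; field 5 is the capacity, e.g. "42%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage is above a threshold (default 80, matching the default
# garbage-collection threshold described above).
check_nomad_gc_risk() {
  local dir="$1" threshold="${2:-80}" used
  used=$(disk_usage_percent "${dir}")
  if [ "${used}" -gt "${threshold}" ]; then
    echo "WARNING: ${dir} is on a filesystem that is ${used}% full (threshold ${threshold}%)"
  else
    echo "OK: ${dir} is on a filesystem that is ${used}% full (threshold ${threshold}%)"
  fi
}
```

For example: check_nomad_gc_risk "$VAR_DIR/nomad".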
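If it does occur and there is an urgent need to diagnose a task failure, then as a short-term measure add a file named 90-gc-config.hcl to the [etc directory]/nomad.d directory of the Altair SLC Hub installation with this content:
client {
  gc_disk_usage_threshold = 99
}
This raises the disk usage at which the garbage collector starts deleting files from 80% to 99%. For the change to take effect, restart the Nomad service:
hubctl service restart nomad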
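Writing the override file can also be scripted. This sketch assumes a hypothetical ETC_DIR variable holding the [etc directory] path of the installation, and that it is run by a user allowed to write there; write_gc_override is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: write the Nomad client override that raises the garbage-
# collection disk-usage threshold to 99%. The argument is the installation's
# [etc directory] (site-specific; e.g. held in a hypothetical ETC_DIR).
write_gc_override() {
  local etc_dir="$1"
  # The quoted 'EOF' stops the shell expanding anything inside the heredoc.
  cat > "${etc_dir}/nomad.d/90-gc-config.hcl" <<'EOF'
client {
  gc_disk_usage_threshold = 99
}
EOF
}
```

For example: write_gc_override "$ETC_DIR" && hubctl service restart nomad.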
A proper remedy is to increase the disk space available, for example, on Linux, by putting the [var directory]/nomad directory of the Altair SLC Hub installation on its own volume.
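To confirm that the nomad directory is on its own volume, the util-linux mountpoint command can be used. In this sketch nomad_dir_on_own_volume is a hypothetical helper, and the argument is the nomad directory under your [var directory].

```shell
#!/usr/bin/env bash
# Sketch only: is the given directory a mount point (its own volume)?
# Uses mountpoint(1) from util-linux; -q suppresses its own output.
nomad_dir_on_own_volume() {
  local dir="$1"
  if mountpoint -q "${dir}"; then
    echo "${dir} is a separate mount point"
  else
    echo "${dir} is not a mount point"
  fi
}
```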