Troubleshooting¶
Troubleshooting is generally carried out with the assistance of Altair Support.
Checking status of Altair SLC Hub services¶
To check that the Altair SLC Hub services are running, use the hubctl command:
hubctl service status
This should print a summary table of all of the Altair SLC Hub services and whether they are active (running).
If any are marked as inactive, try restarting them with:
hubctl service start <name>
If they are still inactive, or marked as failed, then view the logs of the service.
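The restart-then-recheck sequence described above can be wrapped in a small helper. This is a minimal sketch that uses only the two hubctl subcommands shown in this section; the function name restart_and_check is hypothetical, and the service name is whatever the status table reports.

```shell
# Sketch only: start one named service, then print the status table again.
# restart_and_check is a hypothetical helper, not part of hubctl.
restart_and_check() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: restart_and_check <name>" >&2
        return 2
    fi
    # Stop here if the start command itself reports an error.
    hubctl service start "$name" || return 1
    # Re-print the summary so the new state of the service is visible.
    hubctl service status
}
```

For example, `restart_and_check nomad` would start the nomad service and then show the status summary. If the service is still inactive or failed afterwards, move on to the logs.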
Viewing service logs¶
The logs from the services are captured by systemd/journald and can most easily be accessed using the hubctl log command.
For more details see Logging.
Note
In most scenarios the logs can be retrieved by the hubctl log command.
However, if a service fails unexpectedly, systemd can fail to associate the final log messages with the relevant service. In this case it is necessary to use journalctl to view the systemd output.
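Because the final messages may not be attached to the service's unit, one approach is to query the journal by time window rather than by unit. The sketch below assumes a standard journalctl; the helper name recent_journal and the 10-minute default are illustrative choices, not something this documentation prescribes.

```shell
# Sketch only: show everything journald recorded in a recent time window,
# regardless of which unit the entries were attributed to.
# recent_journal is a hypothetical helper name.
recent_journal() {
    window="${1:-10 minutes ago}"
    # --since accepts relative times; --no-pager prints straight to stdout.
    journalctl --since "$window" --no-pager
}
```

Calling `recent_journal "1 hour ago"` widens the window. Depending on the system configuration, reading the full journal may require sudo or membership of a group such as systemd-journal.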
Log viewing tips on Linux¶
By default, hubctl log pipes the log entries through a pager.
Redirecting the output to a file, and then opening the file in an editor such as vi, can be a useful alternative way of viewing the logs.
Rather than limiting the display to a fixed number of entries, the output can be limited based on the timestamp of the log record. To return the log entries recorded by all Altair SLC Hub services in the last 5 minutes, use the command:
sudo hubctl log --since -5m
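The two tips above can be combined by sending the time-limited output to a file. This is a minimal sketch: the helper name save_recent_logs and the default path /tmp/hub-recent.log are hypothetical, and it assumes that hubctl log writes plain text to standard output when it is redirected rather than attached to a terminal.

```shell
# Sketch only: write the last 5 minutes of log entries to a file.
# save_recent_logs and the default output path are hypothetical.
save_recent_logs() {
    out="${1:-/tmp/hub-recent.log}"
    # Same command as shown above, with stdout redirected to the file.
    sudo hubctl log --since -5m > "$out" || return 1
    # Print the path so it can be passed to an editor.
    echo "$out"
}
```

The saved file can then be opened with, for example, `vi "$(save_recent_logs)"`.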
The logs from the services are located in [var directory]/log.
Missing nomad logs on worker nodes¶
Nomad has a garbage collector which by default deletes Nomad log files when disk space usage exceeds 80%.
This can lead to Nomad deleting log files as soon as a task completes, making it extremely difficult to diagnose the reason for a task failure.
This is unlikely to occur in a production environment.
If it does occur and there is an urgent need to diagnose a task failure, as a short-term measure add a file named 90-gc-config.hcl to the [etc directory]/nomad.d directory of the Altair SLC Hub installation with this content:
client {
  gc_disk_usage_threshold = 99
}
Then restart the Nomad service:
hubctl service restart nomad
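The short-term measure above can also be scripted. The sketch below is illustrative only: write_gc_config is a hypothetical helper, and its first argument stands for the [etc directory] of the installation, which must be supplied by the operator.

```shell
# Sketch only: write the temporary garbage-collection override file.
# write_gc_config is a hypothetical helper; $1 is the [etc directory].
write_gc_config() {
    etc_dir="$1"
    threshold="${2:-99}"
    conf="$etc_dir/nomad.d/90-gc-config.hcl"
    if [ ! -d "$etc_dir/nomad.d" ]; then
        echo "no nomad.d directory under $etc_dir" >&2
        return 1
    fi
    # Same content as shown above, with the threshold filled in.
    cat > "$conf" <<EOF
client {
  gc_disk_usage_threshold = $threshold
}
EOF
    echo "$conf"
}
# After writing the file, restart Nomad: hubctl service restart nomad
```

Remove the file and restart Nomad again once the task failure has been diagnosed, since this is intended only as a temporary override.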
The proper remedy is to increase the available disk space, for example, on Linux, by putting the [var directory]/nomad directory of the Altair SLC Hub installation on its own volume.
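To judge whether a worker node is close to the 80% default, the usage of the filesystem holding the Nomad directory can be checked with df. This is a rough sketch: the helper names are hypothetical, the path argument stands for [var directory]/nomad, and the figure df reports is only an approximation of whatever Nomad itself measures.

```shell
# Sketch only: report the usage percentage of the filesystem containing a path.
# disk_usage_percent and over_gc_threshold are hypothetical helper names.
disk_usage_percent() {
    # df -P gives POSIX-format output; column 5 of the data row is "NN%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when usage is above the threshold (default 80, as described above).
over_gc_threshold() {
    [ "$(disk_usage_percent "$1")" -gt "${2:-80}" ]
}
```

For example, `over_gc_threshold "/path/to/var/nomad" && echo "above 80%"` would flag a node where log files are at risk of early deletion (the path here is a placeholder).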