Batch Jobs
Overview
Altair SLC Hub provides a facility for executing batch jobs from a command line tool, hubcli.
The batch jobs can consist of a number of different types of workload:
- SAS language programs
- Shell scripts (Linux bash, PowerShell, Windows BAT)
- Programs from an Altair SLC Hub package stored in an Altair SLC Hub artefact repository.
The hubcli command enables users to authenticate with Altair SLC Hub and to submit and monitor batch jobs. The Altair SLC Hub Portal also has administrative facilities to list and manage batch job workloads.
Installation
The hubcli command line tool is provided as a single standalone executable. It is distributed as a Windows zip file or a Linux tar.gz file. It can be installed anywhere on any host, and has no dependency on any other parts of Altair SLC Hub.
It is recommended to keep the batch job submission hosts separate from the Altair SLC Hub worker hosts. The Altair SLC Hub workload scheduler works best when no other workloads are running on a worker machine, apart from normal operating system processes.
Operation and Use of hubcli
Management of Altair SLC Hub Connection Profiles
Connections to multiple instances of Altair SLC Hub can be configured for use with the hubcli command.
Connections can be defined at two levels. A user can define their own connections using the hubcli connection add command. These connections are stored in the hubcli configuration file in the user's home directory.
Connections can also be defined at host level and shared among users on that host. When hubcli is invoked, it reads the shared connections defined in the file identified by the ALTAIR_HUBCLI_SHARED_CFG environment variable, if that environment variable is set.
The format of that file is the same as the user configuration file stored in the user's home directory. The recommended method of populating that file is to run hubcli connection add. You can then copy the relevant section out of the resulting user configuration file, ~/.hubcli/config.yaml. For example:
connections:
  - name: prod
    url: https://prodhub.example.org:9090
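The two configuration sources described above can be summarised in a short sketch. This is a minimal Python illustration that assumes only what is stated here. The shared file is read when ALTAIR_HUBCLI_SHARED_CFG is set, and the user file is ~/.hubcli/config.yaml. The function name hubcli_config_files is hypothetical. The order of the returned list does not say which file wins when both define the same connection name, because this section does not describe that.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def hubcli_config_files(env: Optional[Mapping[str, str]] = None,
                        home: Optional[str] = None) -> List[Path]:
    """List the configuration files hubcli is described as reading (hypothetical helper)."""
    env = os.environ if env is None else env
    home_dir = Path.home() if home is None else Path(home)
    files: List[Path] = []
    shared = env.get("ALTAIR_HUBCLI_SHARED_CFG")
    if shared:  # the shared file is only read when the variable is set (non-empty here)
        files.append(Path(shared))
    # Per-user connections and defaults live in ~/.hubcli/config.yaml.
    files.append(home_dir / ".hubcli" / "config.yaml")
    return files
```

A host setup script could call this helper and confirm that each listed file exists before users start running hubcli.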
Multiple connection profiles can be defined. Most uses of the hubcli command require a connection profile, specified using the --connection argument. Alternatively, use the hubcli connection use command to set a default connection. The --connection argument then does not have to be specified on each command.
The --connection argument can still be used to work temporarily with a connection that is not the current default.
For example, to log on to a connection, a command such as the following is needed:
hubcli logon --connection prod
This command can be simplified if hubcli connection use has been run first:
hubcli connection use prod
hubcli logon
The hubcli connection use command sets the default connection for all subsequent commands. The default is stored in the user's ~/.hubcli/config.yaml file.
To see the list of connection profiles, and which one is currently selected as the default, use the hubcli connection list command.
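The rule for choosing a connection can be expressed as a small function. This is a Python sketch of the behaviour described above, not hubcli's own code:

1. An explicit --connection value wins.
2. Otherwise, the default set by hubcli connection use applies.
3. Otherwise, there is no connection to use.

The function name resolve_connection is hypothetical.

```python
from typing import Optional


def resolve_connection(cli_connection: Optional[str],
                       default_connection: Optional[str]) -> str:
    """Pick the connection profile for one hubcli command (hypothetical helper)."""
    # An explicit --connection argument overrides the stored default.
    if cli_connection:
        return cli_connection
    # Otherwise fall back to the default recorded by `hubcli connection use`.
    if default_connection:
        return default_connection
    raise ValueError("no connection: pass --connection or run 'hubcli connection use <name>'")
```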
Authentication to Altair SLC Hub
Once a connection has been defined, it is necessary to authenticate to Altair SLC Hub.
There are two supported logon methods when invoking the hubcli command:
- If a web browser is available on the host on which hubcli is run, browser logon is supported. This is the recommended method.
- If there is no web browser available on the host, command line authentication can be used. In this method, prompts for username and password are presented when the hubcli command is run.
Altair SLC Hub uses the standard OpenID Connect protocol for authentication. An initial authentication process has to take place, in which the user provides a username and password. This is done either in a web browser or at prompts given by the hubcli command.
This initial authentication generates what is called a refresh token. The refresh token typically has a lifespan of 30 days. Each time the hubcli command is run to perform an operation that requires communication with Altair SLC Hub, it uses the refresh token to acquire a short-term access token. The access token typically has a lifespan of 1 hour. The hubcli application generates new access tokens as required for as long as it needs them.
The refresh token is stored securely in a token cache in the user's home directory.
Each time the hubcli command performs an operation that requires it to communicate with Altair SLC Hub, it acquires an access token and also regenerates the refresh token. The new refresh token has an independent lifespan, so it is valid for, say, 30 days from the time it was generated. As long as the hubcli command is run at least once every 30 days (or whatever the configured refresh token lifespan is), there is no further need to enter a username and password.
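The rolling refresh-token behaviour can be checked with simple date arithmetic. This Python sketch rests on two assumptions. First, it uses the typical 30-day lifespan quoted above, although the real value is whatever is configured. Second, it assumes every hubcli operation that contacts Altair SLC Hub restarts the 30-day window. The function names are hypothetical.

```python
from datetime import datetime, timedelta

REFRESH_LIFESPAN = timedelta(days=30)  # typical default; the configured value may differ


def refresh_token_expiry(last_hub_operation: datetime,
                         lifespan: timedelta = REFRESH_LIFESPAN) -> datetime:
    """Each hubcli operation that contacts the hub issues a new refresh token,
    so expiry is measured from the most recent such operation."""
    return last_hub_operation + lifespan


def password_needed(last_hub_operation: datetime, now: datetime,
                    lifespan: timedelta = REFRESH_LIFESPAN) -> bool:
    """True if the refresh token chain has lapsed and the user must log on again."""
    return now >= refresh_token_expiry(last_hub_operation, lifespan)
```

For example, a job submitted on 1 January keeps the chain alive until 31 January. Another submission on 20 January moves the expiry out to 19 February.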
If necessary, the hubcli login-refresh command can be used to generate a new refresh token and keep the chain of refresh tokens going.
On each host, hubcli may not be used at least once every 30 days to submit a batch job (or perform another operation that requires communicating with Altair SLC Hub). Unless you know that it will be, schedule hubcli login-refresh to run at least once every 30 days (or whatever the configured refresh token lifespan is). This ensures that there is no further need to enter a username and password.
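A scheduled job, such as a cron entry or a Windows scheduled task, can run a small wrapper like the following. This Python sketch makes two assumptions. First, hubcli is on the PATH. Second, login-refresh accepts the --connection argument in the same way as other hubcli commands. If a default connection has been set with hubcli connection use, the argument can be omitted. The helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def login_refresh_argv(connection: Optional[str] = None) -> List[str]:
    """Build the hubcli login-refresh command line."""
    argv = ["hubcli", "login-refresh"]
    if connection:
        # Assumption: login-refresh accepts --connection like other hubcli commands.
        argv += ["--connection", connection]
    return argv


def run_login_refresh(connection: Optional[str] = None) -> bool:
    """Run the refresh and report success; a scheduler can alert on False."""
    completed = subprocess.run(login_refresh_argv(connection))
    return completed.returncode == 0
```

Running this weekly leaves a wide margin within a 30-day lifespan, so one or two missed runs do not break the token chain.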
Submission and Monitoring of Batch Jobs
Submission
There are two styles of submitting batch jobs. Which one to use depends on whether the source program being submitted is available to the workers or needs to be copied into Altair SLC Hub.
The hubcli job run command is used when the source program (SAS language file, bash script, and so on) is in a directory available to the Altair SLC Hub workers. The hubcli application does not read the contents of the file. Instead, the command takes the path of the file and passes that path string to the chosen worker, which opens the specified file. The path therefore needs to have meaning to the worker.
The specified file path is not required to be available to the hubcli tool, so the tool cannot verify the file's existence or content. When identifying a file, specify its absolute path. Do not specify the file location as a path relative to the current directory in which hubcli is running, because the relative path string is unlikely to have any meaning to the worker node.
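Because hubcli passes the path through unchanged, a script that wraps hubcli job run can reject relative paths before submitting. This Python sketch is illustrative only and assumes that the program path is given to hubcli job run as a positional argument. Check hubcli job run --help for the actual syntax.

The sketch accepts either a POSIX or a Windows absolute path, because the worker's operating system may differ from that of the submitting host. It does not check that the file exists locally, because the path only has to be valid on the worker. The function names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import List, Optional


def is_absolute_for_worker(path: str) -> bool:
    # Accepts /data/x.sas (POSIX) or C:\data\x.sas and \\server\share\x.sas (Windows).
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def job_run_argv(program_path: str, namespace: Optional[str] = None,
                 execution_profile: Optional[str] = None,
                 connection: Optional[str] = None) -> List[str]:
    """Build a hubcli job run command line for a program visible to the workers."""
    if not is_absolute_for_worker(program_path):
        raise ValueError(f"job run needs an absolute path on the worker, got {program_path!r}")
    argv = ["hubcli", "job", "run"]
    if connection:
        argv += ["--connection", connection]
    if namespace:
        argv += ["--namespace", namespace]
    if execution_profile:
        argv += ["--execution-profile", execution_profile]
    argv.append(program_path)  # Assumption: the path is a positional argument.
    return argv
```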
The hubcli job submit command is used when the source program is not in a directory available to the Altair SLC Hub workers. When this command is run, the hubcli tool opens and reads the specified file, and the contents of the file are passed to the worker to run. In this style of execution, the source program has to be available to the hubcli tool, and the location of the file can be specified as a relative path if desired.
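With hubcli job submit the situation is reversed. The file must be readable on the submitting host, and a relative path is acceptable, so a wrapper can check the local file before calling hubcli. This hypothetical Python sketch assumes the file is given to hubcli job submit as a positional argument. Check hubcli job submit --help for the actual syntax.

```python
from pathlib import Path
from typing import List, Optional


def job_submit_argv(local_path: str, namespace: Optional[str] = None,
                    execution_profile: Optional[str] = None) -> List[str]:
    """Build a hubcli job submit command line for a program on this host."""
    # hubcli reads this file itself, so check it exists on this host.
    # A relative path is checked against the current directory.
    if not Path(local_path).is_file():
        raise FileNotFoundError(local_path)
    argv = ["hubcli", "job", "submit"]
    if namespace:
        argv += ["--namespace", namespace]
    if execution_profile:
        argv += ["--execution-profile", execution_profile]
    argv.append(local_path)  # Assumption: the file is a positional argument.
    return argv
```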
Monitoring
Batch jobs submitted using hubcli can be monitored from the Batch Jobs page in the Deployment Services section of the Altair SLC Hub Portal. They can also be monitored using the hubcli application itself.
Monitoring via the Portal
On the Batch Jobs page, the status of executed jobs is shown, along with other job details. On any individual job, the history can be viewed, showing the job progress as it is executed, including the exit code on completion. System options and environment variables that were used are also visible, and the job log and results can be downloaded from a job's outputs.
It is also possible to cancel a job that is in the 'Pending' or 'Executing' state (as may be necessary if a runaway job is consuming too many resources). Historical job executions that are no longer required can also be deleted here.
Monitoring via hubcli
The hubcli utility can also be used to monitor the progress of batch jobs. Some examples of hubcli commands that can be used to monitor batch jobs are:
- The hubcli job list command lists all the batch jobs that have been submitted. This command also shows the job ID, which is needed for many of the other hubcli commands below.
- The hubcli job status command gets the status of a specific job.
- The hubcli job log command gets the log of a specific job.
- The hubcli job results command gets the results of a specific job.
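A script that submits a job often needs to wait until the job finishes. This Python sketch polls until the job reaches one of the end states listed in the Job States table: Completed successfully, Completed with error, Failed, or Cancelled.

The caller supplies get_state, the function that obtains the state. For example, it could run hubcli job status for the job ID and extract the state. That parsing is not shown, because the output format of hubcli job status is not described here. All names are hypothetical.

```python
import time
from typing import Callable

# End states from the Job States table; any other state means the job is still in progress.
TERMINAL_STATES = {"Completed successfully", "Completed with error", "Failed", "Cancelled"}


def wait_for_job(get_state: Callable[[], str],
                 poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_state() until the job reaches an end state, then return that state."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

The sleep and clock parameters let the loop be tested without real waiting. In a script, the defaults are used.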
Namespaces and Execution Profiles
To submit a batch program, it is necessary to specify a namespace and an execution profile to the hubcli command.
Specify a namespace using the --namespace argument. Alternatively, a default namespace can be set using the hubcli namespace use command.
Specify an execution profile using the --execution-profile argument. Alternatively, a default execution profile can be set in the same way as a default namespace.
The specified defaults are used, on a per-connection basis, for future hubcli commands that require a namespace or execution profile. That is, the default namespace for one connection can be different from the default namespace for another connection. The default values for namespace and execution profile are stored in the user's ~/.hubcli/config.yaml file, also on a per-connection basis.
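The per-connection defaults can be pictured as a lookup keyed first by connection name. This Python sketch uses a plain dictionary as a hypothetical stand-in for the stored defaults, because this section does not describe the actual layout of ~/.hubcli/config.yaml. An explicit --namespace or --execution-profile value wins. Otherwise, the default recorded for the active connection is used.

```python
from typing import Dict, Optional

# Hypothetical in-memory view of per-connection defaults; not the real config.yaml layout.
Defaults = Dict[str, Dict[str, str]]


def effective_value(cli_value: Optional[str], connection: str,
                    defaults: Defaults, key: str) -> str:
    """Return the namespace or execution profile for one command on one connection."""
    if cli_value:
        return cli_value  # explicit argument overrides any stored default
    value = defaults.get(connection, {}).get(key)
    if value is None:
        raise ValueError(f"no {key} given and no default set for connection {connection!r}")
    return value
```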
Note
hubcli overrides the ALTLOG system option when running Altair SLC programs. Setting this option in an execution profile has no effect.
Running Altair SLC Hub Package Programs
A program can be run from an Altair SLC Hub package that has been authored in Altair Analytics Workbench and uploaded into Altair SLC Hub. This is done with the hubcli job runpkg command.
The package does not need to be deployed to use the hubcli job runpkg command; it only needs to have been uploaded to Altair SLC Hub.
If you specify the --program parameter to invoke a program from an Altair SLC Hub package, use the API entry point for the program, not the path to the source file within the package.
For example, the following shows the editor for a program in Altair Analytics Workbench. The name of the source file in the package is Program1.sas. The value required for the --program parameter is the value of the "Program path" field, which in this example is examples/example1. The command to invoke this program using the hubcli command is:
hubcli job runpkg --program examples/example1 [other required parameters]
Other required parameters for this program include the repository, group, name, and version arguments.
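A wrapper script for hubcli job runpkg might assemble these arguments as in the Python sketch below. The --program flag is taken from the example above. The flag names --repository, --group, --name and --version are assumed from the argument names, so check them against hubcli job runpkg --help. The check on the program value is a hypothetical heuristic. It catches a source file name such as Program1.sas passed where the "Program path" entry point belongs.

```python
from typing import List


def job_runpkg_argv(program: str, repository: str, group: str,
                    name: str, version: str) -> List[str]:
    """Build a hubcli job runpkg command line (flag names other than --program are assumed)."""
    # Heuristic: the entry point is a path such as examples/example1, not a .sas source file.
    if program.lower().endswith(".sas"):
        raise ValueError(f"{program!r} looks like a source file; use the program's 'Program path'")
    return [
        "hubcli", "job", "runpkg",
        "--program", program,
        "--repository", repository,  # assumed flag name
        "--group", group,            # assumed flag name
        "--name", name,              # assumed flag name
        "--version", version,        # assumed flag name
    ]
```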
Job States¶
A batch job goes through a number of states as it is processed by Altair SLC Hub.
State | Meaning |
---|---|
Creating | Jobs go through a two-phase creation process. First the job object is created, then any required inputs are uploaded. A job should only exist in a creating state for a few moments. If a job remains in creating state for more than a few moments, it typically indicates that the client disconnected during the process. Jobs that remain in creating state for too long are automatically removed. |
Pending | Once a job is created, it is placed in Pending state. The reason for the job being in a pending state might be: there are too many other jobs being executed, and therefore insufficient resources to place this job; or there is a constraint on where the job can be run and that constraint cannot be satisfied. |
Executing | Once a job has been committed to a node it is placed in Executing state. This state means the job is being prepared for execution, currently running, or any execution results are being recovered. Individual events in the job status will indicate exactly what phase of execution the job is in. |
Completed successfully | This state indicates that a job has run to completion with a success exit code (either returning a zero exit code or one of the additional exit codes that have been explicitly defined as indicating successful completion). |
Completed with error | This state indicates the job has run to completion but produced an unexpected exit code. |
Failed | This state indicates that something has gone wrong during the execution. For example, failure to recover result files, or failure to communicate with a host. This state will be accompanied by a reason text. |
Cancelled | This state indicates that the batch job was cancelled by the user. |
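The states in the table fall into two groups. Creating, Pending and Executing are transitional; the remaining four are end states. The Python sketch below encodes three things:

- That grouping of states.
- The Portal's ability to cancel a job in the Pending or Executing state.
- The exit-code rule that separates Completed successfully from Completed with error.

The exit-code function covers only jobs that ran to completion, since Failed is decided by other conditions. All names are hypothetical.

```python
from enum import Enum
from typing import Iterable


class JobState(Enum):
    CREATING = "Creating"
    PENDING = "Pending"
    EXECUTING = "Executing"
    COMPLETED_SUCCESSFULLY = "Completed successfully"
    COMPLETED_WITH_ERROR = "Completed with error"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# End states: the job has finished, one way or another.
FINAL_STATES = {JobState.COMPLETED_SUCCESSFULLY, JobState.COMPLETED_WITH_ERROR,
                JobState.FAILED, JobState.CANCELLED}

# States in which the Portal allows a job to be cancelled.
CANCELLABLE_STATES = {JobState.PENDING, JobState.EXECUTING}


def completion_state(exit_code: int, extra_success_codes: Iterable[int] = ()) -> JobState:
    """Classify a job that ran to completion: zero, or any explicitly configured
    success code, counts as success; any other exit code is an error."""
    if exit_code == 0 or exit_code in set(extra_success_codes):
        return JobState.COMPLETED_SUCCESSFULLY
    return JobState.COMPLETED_WITH_ERROR
```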
Configuration
Token lifespan
The lifespans of the refresh and access tokens are governed by standard Altair SLC Hub configuration settings. See the etc\config.d\auth.yaml file for more information about their use and current default values.
The token lifespans can be set so that they affect all clients (the Altair SLC Hub Portal, Altair Analytics Workbench, and the hubcli command), or they can be set individually for each of those client types. More details and examples are given in the etc\config.d\auth.yaml file.