Batch Jobs

Overview

Altair SLC Hub provides a facility for executing batch jobs from a command line tool, hubcli.

The batch jobs can consist of a number of different types of workload:

  • SAS language programs
  • Shell scripts (Linux bash, PowerShell, Windows BAT)
  • Programs from an Altair SLC Hub package stored in an Altair SLC Hub artefact repository

The hubcli command is used to authenticate with Altair SLC Hub and to submit and monitor batch jobs. The Altair SLC Hub Portal also has administrative facilities to list and manage the batch job workloads.

Installation

The hubcli command line tool is provided as a single standalone executable, and is distributed in the form of a Windows zip file or Linux tar.gz file. This can be installed anywhere on any host, and has no dependency on any other parts of Altair SLC Hub.

It is recommended to keep the batch job submission hosts separate from the Altair SLC Hub worker hosts. The Altair SLC Hub workload scheduler works best when there are no other workloads running on the worker machine (other than normal operating system processes).

Operation and Use of hubcli

Management of Altair SLC Hub Connection Profiles

Connections to multiple instances of Altair SLC Hub can be configured for use with the hubcli command. Connections can be defined at two levels. A user can define their own connections using the hubcli connection add command. These connections are stored in the hubcli configuration file in the user's home directory.

Connections can also be defined at a host level and shared among users on that host. On hubcli invocation, shared connections defined in the file identified by the ALTAIR_HUBCLI_SHARED_CFG environment variable are read, if that environment variable is set.

The format of that file is the same as the user configuration file stored in the user's home directory. The recommended method of populating that file is to use hubcli connection add and then copy the relevant section out of the resulting user configuration file, which is stored in ~/.hubcli/config.yaml. For example:

connections:
  - name: prod
    url: https://prodhub.example.org:9090
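As a hedged illustration of the host-level setup, the shared file could be exposed to all users by exporting the environment variable from a login script. Only the ALTAIR_HUBCLI_SHARED_CFG name comes from this document; the file location below is a hypothetical choice, and any path readable by the users of the host would do.

```shell
# Hypothetical location of the shared connection file (an assumption, not a
# documented default). The file uses the same format as ~/.hubcli/config.yaml.
ALTAIR_HUBCLI_SHARED_CFG=/etc/hubcli/shared-config.yaml
export ALTAIR_HUBCLI_SHARED_CFG
```

On a Linux host these lines might be placed in a system-wide profile script so that every login shell sees the variable before hubcli is invoked.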

Multiple connection profiles can be defined. Most uses of the hubcli command require a connection profile, specified using the --connection argument. Alternatively, use the hubcli connection use command to set a default connection so that the --connection argument does not have to be specified on each command.

The --connection argument can be used to temporarily work with a connection that is not the current default.

For example, to log on to a connection, it is necessary to use a command such as

hubcli logon --connection prod

This command can be simplified if hubcli connection use has been specified.

hubcli connection use prod
hubcli logon

The hubcli connection use command sets the default connection to use for all subsequent commands. The default is stored in the user's ~/.hubcli/config.yaml file.

It is possible to see the list of connection profiles, and to see which is currently selected as the default by using the hubcli connection list command.

Authentication to Altair SLC Hub

Once a connection has been defined, it is necessary to authenticate to Altair SLC Hub.

There are two supported logon methods when invoking the hubcli command:

  • If a web browser is available on the host on which hubcli is run, then browser logon is supported. This is the recommended method to use.
  • If there is no web browser available on the host then command line authentication can be used. In this method, prompts for username and password are presented when the hubcli command is run.

Altair SLC Hub uses the standard OpenID Connect protocol for authentication. An initial authentication process has to take place in which the user provides a username and password, either in a web browser or at prompts given by the hubcli command. This initial authentication generates a refresh token, which typically has a lifespan of 30 days. Each time the hubcli command is run to perform an operation that requires communication with Altair SLC Hub, the application uses the refresh token to acquire a short-term access token. The access token typically has a lifespan of 1 hour. The hubcli application generates new access tokens as required for as long as it needs them.

The refresh token is stored securely in a token cache in the user's home directory. Each time the hubcli command is used to perform an operation that requires it to communicate with Altair SLC Hub, as well as acquiring an access token, it will regenerate the refresh token. The new refresh token has an independent lifespan, so it will be valid for, say, 30 days from the time it was generated. As long as the hubcli command is run at least once every 30 days (or whatever the configured refresh token lifespan is), there will be no further need to enter a username and password.

If necessary, the hubcli login-refresh command can be used to generate a new refresh token to keep the chain of refresh tokens going. Unless it is known that hubcli will be used to submit a batch job (or other similar operation that requires communicating with Altair SLC Hub) at least once every 30 days from each host, it is recommended to schedule execution of hubcli login-refresh at least once every 30 days (or whatever the configured refresh token lifespan is) to ensure that there is no further need to enter a username and password.
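The scheduled refresh described above could be wrapped in a small script, sketched here under several assumptions: hubcli is on the PATH, a default connection has already been set with hubcli connection use, and hubcli login-refresh returns a non-zero exit code on failure. The HUBCLI variable and the script path in the crontab comment are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Keep the chain of refresh tokens alive. Intended to be run from a scheduler.
# HUBCLI is a hypothetical override for the location of the hubcli executable.
refresh_hub_token() {
    if "${HUBCLI:-hubcli}" login-refresh; then
        echo "refresh ok"
    else
        # Assumption: a non-zero exit code means the token could not be renewed.
        echo "refresh failed: an interactive logon may be required" >&2
        return 1
    fi
}

# Example crontab entry (weekly, well inside a 30 day token lifespan) for a
# script that defines and then calls refresh_hub_token:
#   0 3 * * 0  /usr/local/bin/refresh-hub-token.sh
```

Running the refresh much more often than the token lifespan leaves room for a few missed runs before a username and password are needed again.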

Submission and Monitoring of Batch Jobs

Submission

There are two styles of submitting batch jobs, depending on whether the source program being submitted is available to the workers or needs to be copied into Altair SLC Hub.

The hubcli job run command is used when the source program (SAS language file, bash script etc) is in a directory available to the Altair SLC Hub workers. The hubcli application does not read the contents of the file. This command takes the path of the file, and that path string is passed to the chosen worker. The worker receives the path string and will open the specified file; the path needs to have meaning to the worker.

The specified file path is not required to be available to the hubcli tool. This means the tool cannot verify the file content or the file's existence. When identifying a file, you should specify its absolute path. You should not specify the file location using a path relative to the current directory in which hubcli is running, because a relative path string is unlikely to have any meaning to the worker node.
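Because hubcli cannot check the path given to hubcli job run, a submission script may want to reject relative paths itself before submitting. The helper below is a minimal sketch for POSIX-style paths only (Windows worker paths are not covered), and is_absolute_path is a hypothetical name. How the path is then passed to hubcli job run is not shown here, since the exact argument syntax is not given in this section.

```shell
# Succeeds when the argument is an absolute POSIX path (starts with "/").
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage sketch: refuse to continue if the program path is relative.
#   is_absolute_path "$program" || { echo "use an absolute path" >&2; exit 1; }
```

The check only confirms the form of the path. Whether the file actually exists at that location on the worker still cannot be verified from the submission host.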

The hubcli job submit command is used when the source program is not in a directory available to the Altair SLC Hub workers. When this command is run, the hubcli tool opens and reads the specified file, and the contents of the file are passed to the worker to run. In this style of execution, the source program has to be available to the hubcli tool, and the location of the file can be specified as a relative path if desired.

Monitoring

Batch jobs submitted via hubcli can be monitored via the Altair SLC Hub Portal, from the Batch Jobs page in the Deployment Services section, as well as via the hubcli application itself.

Monitoring via the Portal

On the Batch Jobs page, the status of executed jobs is shown, along with other job details. On any individual job, the history can be viewed, showing the job progress as it is executed, including the exit code on completion. System options and environment variables that were used are also visible, and the job log and results can be downloaded from a job's outputs.

It is also possible to cancel a job that is in the 'Pending' or 'Executing' state (as may be necessary if a runaway job is consuming too many resources). Historical job executions that are no longer required can also be deleted here.

Monitoring via hubcli

The hubcli utility can also be used to monitor the progress of batch jobs.

Some examples of hubcli commands that can be used to monitor batch jobs are:

  • The hubcli job list command can be used to list all the batch jobs that have been submitted.
    This command will also show the job ID, which is needed for many of the other hubcli commands below.
  • The hubcli job status command can be used to get the status of a specific job.
  • The hubcli job log command can be used to get the log of a specific job.
  • The hubcli job results command can be used to get the results of a specific job.

Namespaces and Execution Profiles

To submit a batch program it is necessary to specify a namespace and an execution profile to the hubcli command.

Specify a namespace using the --namespace argument. Alternatively, a default namespace can be set using the hubcli namespace use command. Specify an execution profile using the --execution-profile argument; a default execution profile can also be set, so that the argument does not have to be given on each command.

The specified defaults are used for future hubcli commands that require a namespace or execution profile on a per-connection basis. That is, the default namespace for one connection can be different to the default namespace for another connection.

The specified default values for namespace and execution profile are stored in the user's ~/.hubcli/config.yaml file, also on a per-connection basis.

Note

hubcli overrides the ALTLOG system option when running Altair SLC programs. Setting this option in an execution profile will have no effect.

Running Altair SLC Hub Package Programs

A program can be run from an Altair SLC Hub package that has been authored in Altair Analytics Workbench and uploaded into Altair SLC Hub. This can be done with the hubcli job runpkg command.

There is no requirement for the package to be deployed to use the hubcli job runpkg command, only that the package has been uploaded to Altair SLC Hub.

If you specify the --program parameter to invoke a program from an Altair SLC Hub package, you should use the API entry point for the program, not the path to the source file within the package.

For example, the following shows the editor for a program in Altair Analytics Workbench.

Image

The name of the source file in the package is Program1.sas. The value required for the --program parameter is the value of the "Program path" field, which in this example is examples/example1. The command to invoke this program using the hubcli command is:

hubcli job runpkg --program examples/example1 [other required parameters]

Other required parameters for this program include the repository, group, name, and version arguments.

Job States

A batch job goes through a number of states as it is processed by Altair SLC Hub.

  • Creating: Jobs go through a two-phase creation process. First the job object is created, then any required inputs are uploaded. A job should only exist in a creating state for a few moments. If a job remains in creating state for more than a few moments, it typically indicates that the client disconnected during the process. Jobs that remain in creating state for too long are automatically removed.
  • Pending: Once a job is created, it is placed in Pending state. The reason for the job being in a pending state might be: there are too many other jobs being executed, and therefore insufficient resources to place this job; or there is a constraint on where the job can be run and that constraint cannot be satisfied.
  • Executing: Once a job has been committed to a node it is placed in Executing state. This state means the job is being prepared for execution, currently running, or any execution results are being recovered. Individual events in the job status will indicate exactly what phase of execution the job is in.
  • Completed successfully: This state indicates that a job has run to completion with a success exit code (either returning a zero exit code or one of the additional exit codes that have been explicitly defined as indicating successful completion).
  • Completed with error: This state indicates the job has run to completion but produced an unexpected exit code.
  • Failed: This state indicates that something has gone wrong during the execution. For example, failure to recover result files, or failure to communicate with a host. This state will be accompanied by a reason text.
  • Cancelled: This state indicates that the batch job was cancelled by the user.
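A monitoring script usually needs to know whether a job has finished and whether it finished well. The sketch below encodes the states listed above, treating Creating, Pending and Executing as in progress and the remaining four as final. It assumes the state names are compared exactly as written here; the text format in which hubcli job status reports a state is not specified in this section, and both function names are hypothetical. The second function mirrors the rule for Completed successfully, taking any additional success exit codes as extra arguments.

```shell
# Succeeds when the named state is final (no further transitions expected).
is_terminal_state() {
    case "$1" in
        "Completed successfully"|"Completed with error"|"Failed"|"Cancelled")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Succeeds when exit code $1 is zero or equals one of the additional
# success codes given as the remaining arguments.
is_success_exit_code() {
    code=$1
    shift
    if [ "$code" -eq 0 ]; then
        return 0
    fi
    for ok in "$@"; do
        if [ "$code" -eq "$ok" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, is_success_exit_code 4 4 8 succeeds because 4 is listed as an additional success code, whereas is_success_exit_code 2 4 8 does not.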

Configuration

Token lifespan

The lifespan of the refresh and access tokens is governed by standard Altair SLC Hub configuration settings. See the etc\config.d\auth.yaml file for more information about their use and current default values.

The lifespan of the tokens can be set such that they affect all clients (the Altair SLC Hub portal, Altair Analytics Workbench, and the hubcli command), or they can be set individually for each of those client types. More details and examples are given in the etc\config.d\auth.yaml file.