Help:Autoscaling HPC Cluster


Kogence Autoscaling Cloud HPC Cluster Architecture

Kogence High Performance Computing (HPC) clusters are NOT shared clusters, unlike traditional on-premise HPC clusters. Each model execution on Kogence starts a new private, personal autoscaling HPC cluster in the cloud. When your model terminates, your cluster terminates too. If you add collaborators to your model, those collaborators can also access the cluster that your model has started.

Cluster Architecture.png

Each Kogence HPC cluster consists of two types of nodes: one master node and multiple compute nodes. The master node is an interactive node and supports latency-free remote user interaction: you can open graphical applications or shell terminals (TTY). The master node is also used to monitor cluster and job status. The compute nodes, on the other hand, are batch processing nodes. They do not provide graphics or TTY support. Jobs you send to the compute nodes must be able to execute without TTY support, must not require user interaction and must not try to open any graphical windows.

Kogence clusters allow you to select the HPC server hardware independently for the master node and for the compute nodes. All compute nodes in a cluster are of the same server hardware type. Typically, users select small, inexpensive server hardware for the master node and use this node for pre-processing, post-processing, and cluster and job monitoring; these uses usually do not require much computing power. The server hardware for the compute nodes can be selected based on the scalability and memory needs of your model. If your model scales linearly as you increase the number of CPUs, you are better off choosing compute nodes with large CPU counts. If your model requires a lot of memory, select compute nodes with a large RAM-to-CPU ratio.

The number of compute nodes in a Kogence cluster scales up and down automatically depending on the number of jobs you send to the cluster and on the resource requirements of each of these jobs. You use the Kogence Cluster Builder Graphical User Interface (GUI) to select the cluster hardware resources you want in your cluster.

Configuring the Cloud HPC Cluster Hardware

You use the Kogence Cluster Builder GUI to select the cluster hardware resources you want in your cluster. The Kogence Cluster Builder GUI is accessible through the Cluster tab of your model. Follow the steps below to set up your cluster.

  • Step1: Select a Compute Plan

On the left hand panel, first select one of your Compute Plans (i.e. one of your CPU-Credit accounts) that you want to use for the simulations. All logged-in Kogence users will see at least an Individual Plan in the dropdown menu. If you are a member of a Team Subscription, you might see more than one Compute Plan to choose from in the dropdown menu. Typically, users use different Compute Plans for work under different projects or different grants.

KogenceClusterBuilder1.jpg

  • Step2: Select a Time Limit for the Cluster

On the left hand panel, enter a time limit after which you want your cluster to be terminated automatically, irrespective of whether your simulation has completed or not. When you start your simulation, some CPU Credits are blocked from your Compute Plan based on the time limit you select here (as well as on the number and type of nodes in the cluster). After your cluster terminates (either automatically because your simulation ended, or because you terminated it manually using the Stop button at the top right corner of the NavBar), this block is removed and the unused CPU Credits are credited back to your Compute Plan. The net number of CPU Credits debited depends on the number of hours each compute node in your cluster was actually up.
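
For illustration only (the credit rate here is hypothetical; actual rates depend on your subscription and the hardware you select): if a compute node of the selected type is billed at 10 CPU Credits per hour, and you select a 10 hour time limit with up to 4 compute nodes, roughly 10 x 10 x 4 = 400 CPU Credits would be blocked when the cluster starts. If the compute nodes are actually up for only 3 hours each, about 10 x 3 x 4 = 120 CPU Credits would be debited and the rest of the blocked credits would be released back to your Compute Plan.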

Typically, users do not want to end their computations prematurely after hours of computing on several nodes. We recommend maintaining a large pool of CPU Credits (these credits do not expire) and selecting a large time limit so that jobs do not end prematurely. Unused CPU Credits are automatically refunded to your Compute Plan.

If we have a credit card on file with the Auto Refill box checked, you can also select the Unlimited option for the time limit. In that case, once your Compute Plan is about to be depleted, your credit card is charged and, if the charge attempt is successful, your Compute Plan is replenished so that your simulation can continue to run.

  • Step3: Select "Run on autoscaling cluster" Option

Once you select this option, you will be able to send selected jobs and selected commands within your model to the compute nodes while the others execute on the master node. If this option is not selected, only one cloud HPC server is booted up and all jobs run on that same server. You execute jobs through the Software Stack Builder GUI, through the CloudShell terminal or through scripts, as described below.

  • Step4: Select Maximum Number of Compute Nodes

Autoscaling clusters start with a minimum number of compute nodes (currently set at 2). The cluster then scales up and down automatically depending on the jobs submitted to the compute nodes through the Software Stack Builder GUI, through the CloudShell terminal or through scripts. You can select the maximum number of compute nodes that you want in your cluster; you do not need to change your job submission scripts or your settings in the Software Stack Builder GUI. If the submitted workload is larger than the cluster size, jobs simply wait in the queue until compute nodes become available. If the submitted workload is smaller than the cluster size, some of the compute nodes are automatically terminated soon after. The cluster can scale down to zero compute nodes if no workload has been waiting in the queue for some time.

See the screen shots below. You can click on the images to see a larger version.

  • Step5: Select Server Hardware for the Master and the Compute Nodes

As described above, you can select the HPC server hardware independently for the master node and for the compute nodes; all compute nodes in a cluster use the same server hardware type. Choose small, inexpensive hardware for the master node (pre-processing, post-processing and monitoring), and choose the compute node hardware based on the scalability and memory needs of your model.

First select "Master Node" from the dropdown menu and then click on one of the machine icons of the different HPC server hardware available under your subscription to select the HPC hardware for the master node of the cluster. Next select the "Compute Node" from the same dropdown menu and then again click on one of the machine icons to select the HPC server hardware for the compute node.

  • Step6: Save the Cluster Settings

On the right hand panel you will see a summary of your final selection. Press the Save button to confirm the selections.

Running Jobs on Autoscaling Cluster Using Kogence Software Stack Builder GUI

Running Jobs on Autoscaling Cluster Using CloudShell Terminal

For most users, running jobs on cloud HPC clusters using the Kogence Software Stack Builder GUI is the most convenient, easiest and quickest method. More advanced users may prefer to invoke the executable binaries of available software, solvers and simulator applications from a shell terminal, which offers greater flexibility. Use cases vary widely and include redirecting outputs and errors to specific files, stopping and restarting an application without stopping the entire simulation, and invoking an executable with a large number of different input files (e.g. parameter scans).

  • Step1: Connect Specific Software to Your Model

First, go to the Stack tab of your model and connect one or more software, solver or simulator applications that you want to use from the shell terminal. Do not select any entrypoint binaries from the dropdown menu for any of these software. See the example screenshot below. This downloads the software docker containers onto your HPC server after it boots up, which may take a few seconds. Since you did not select any entrypoint binaries, no command is invoked. After the containers are downloaded, the commands are automatically configured and added to the PATH so you can invoke them from the shell terminal.

  • Step2: Connect the CloudShell to Your Model

Staying in the Stack tab of your model, connect the CloudShell to your model. From the entrypoint binary dropdown menu, select shell-terminal. Currently, we offer two shell terminal emulators: xterm and gnome-terminal. In the command textbox, type either xterm or gnome-terminal, depending on your preference.

  • Step3: Save the Software Stack Settings

Save the settings of the Stack tab.

  • Step4: Run the Model and Connect to Your Cloud HPC Server

From the top NavBar, click the Run button. It may take up to 5 minutes for your autoscaling cloud HPC cluster to boot up. Once the cluster has booted, the Run button turns into a Stop button and the Visualizer button becomes active. Connect to the master node of your cluster by clicking the Visualizer button. Once all the software containers have been downloaded, you will see your shell terminal.

You are ready to invoke any of the binaries that are documented on the simulator wiki page. Any command that you invoke on the shell terminal is executed on the master node. If instead you want to send a job to the compute nodes of your cluster from the shell terminal, use the qsub command of the Grid Engine resource manager, as below:
qsub -b y -pe mpi num_of_CPU -cwd YourCommand
Be careful with num_of_CPU: it must be the same as or less than the number of CPUs in the compute node type that you selected in the Cluster tab of your model. Grid Engine offers a lot of flexibility through many command line switches; see the qsub man page. You may find the following switches particularly useful (a combined example is shown after this list):
    • -b y: The command you are invoking is treated as a binary, not as a job submission script.
    • -cwd: Makes the current folder the working directory. Output and error files are generated in this folder.
    • -wd working_dir_path: Makes working_dir_path the working directory. Output and error files are generated in this folder.
    • -o stdout_file_name: Job output goes to this file.
    • -e stderr_file_name: Job errors go to this file.
    • -j y: Sends both the output and the errors to the output file.
    • -N job_name: Gives the job a name. Output and error files are generated with this name if they are not explicitly specified using the -o and/or -e switches. You can also monitor and manage your job using the job name.
    • -sync y: By default, qsub returns control to you immediately after submitting the job to the cluster, so you can continue to do other things in the CloudShell terminal, including submitting more jobs to the scheduler. This option tells qsub not to return control until the job is complete.
    • -V: Exports all environment variables active in the qsub environment to the context of the job.
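
For example (the binary name my_solver, the CPU count 16 and the file names below are hypothetical), several of these switches can be combined in a single submission:
qsub -b y -pe mpi 16 -cwd -N scan01 -o scan01.out -e scan01.err -sync y -V my_solver input01.dat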

Running MPI and openMP Jobs Using Shell Terminal
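
One possible pattern, shown here only as a sketch (the binary names my_mpi_app and my_openmp_app, the CPU counts and the use of mpirun are assumptions; mpirun would come from the MPI software you connected in the Stack tab):
qsub -b y -pe mpi 32 -cwd -N mpi_run mpirun -np 32 my_mpi_app input.dat
export OMP_NUM_THREADS=16
qsub -b y -pe mpi 16 -cwd -N omp_run -V my_openmp_app input.dat
The first command requests 32 slots through the mpi parallel environment and launches the MPI binary on the compute nodes. The last two commands set the OpenMP thread count in the terminal and pass it to the job through the -V switch.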

Running Jobs on Autoscaling Cluster Using Scripts

For most users, running jobs on cloud HPC clusters using the Kogence Software Stack Builder GUI is the most convenient, easiest and quickest method. More advanced users may prefer to invoke the executable binaries of available software, solvers and simulator applications from scripts, which offers greater flexibility. Use cases vary widely and include redirecting outputs and errors to specific files, stopping and restarting an application without stopping the entire simulation, and invoking an executable with a large number of different input files (e.g. parameter scans).

  • Step1: Connect Specific Software to Your Model

First, go to the Stack tab of your model and connect one or more software, solver or simulator applications that you want to use from your script. Do not select any entrypoint binaries from the dropdown menu for any of these software. See the example screenshot below. This downloads the software docker containers onto your HPC server after it boots up, which may take a few seconds. Since you did not select any entrypoint binaries, no command is invoked. After the containers are downloaded, the commands are automatically configured and added to the PATH so you can invoke them from your script.

  • Step2: Connect the CloudShell to Your Model

Staying in the Stack tab of your model, connect the CloudShell to your model. From the entrypoint binary dropdown menu, select bash-shell. In the command textbox, type the name of your script. Make sure the script is available under the Files tab of your model.

  • Step3: Save the Software Stack Settings

Save the settings of the Stack tab.

  • Step4: Run the Model and Connect to Your Cloud HPC Server

From the top NavBar, click the Run button. It may take up to 5 minutes for your autoscaling cloud HPC cluster to boot up. Once the cluster has booted, the Run button turns into a Stop button and the Visualizer button becomes active. Connect to the master node of your cluster by clicking the Visualizer button. Once all the software containers have been downloaded, your shell script executes automatically.

In your shell script you can invoke any of the binaries that are documented on the simulator wiki page. Any command that you invoke in your shell script is executed on the master node. If instead you want to send a job to the compute nodes of your cluster from the script, use the qsub command of the Grid Engine resource manager, as below:
qsub -b y -pe mpi num_of_CPU -cwd YourCommand
Be careful with num_of_CPU: it must be the same as or less than the number of CPUs in the compute node type that you selected in the Cluster tab of your model. Grid Engine offers a lot of flexibility through many command line switches; see the qsub man page. You may find the following switches particularly useful (a sample script is shown after this list):
    • -b y: The command you are invoking is treated as a binary, not as a job submission script.
    • -cwd: Makes the current folder the working directory. Output and error files are generated in this folder.
    • -wd working_dir_path: Makes working_dir_path the working directory. Output and error files are generated in this folder.
    • -o stdout_file_name: Job output goes to this file.
    • -e stderr_file_name: Job errors go to this file.
    • -j y: Sends both the output and the errors to the output file.
    • -N job_name: Gives the job a name. Output and error files are generated with this name if they are not explicitly specified using the -o and/or -e switches. You can also monitor and manage your job using the job name.
    • -sync y: By default, qsub returns control immediately after submitting the job to the cluster, so your script immediately moves on to other things, including submitting more jobs to the scheduler. This option tells qsub not to return control until the job is complete.
    • -V: Exports all environment variables active in the qsub environment to the context of the job.
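
As an illustrative sketch (the binary my_solver and the input file names are hypothetical), a simple parameter scan script could submit one compute-node job per input file and use -sync y on a final submission if it needs to wait before post-processing on the master node:
#!/bin/bash
# Submit one compute-node job per input file; -j y merges errors into the per-job log file
for f in input01.dat input02.dat input03.dat; do
    qsub -b y -pe mpi 8 -cwd -N scan_${f%.dat} -j y -o ${f%.dat}.log my_solver $f
done
# Wait for a final job to complete before continuing on the master node
qsub -b y -pe mpi 8 -cwd -sync y my_solver final.dat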

Running MPI and openMP Jobs Using Scripts

Managing and Monitoring Cloud HPC Cluster Jobs

For all your Cloud HPC Cluster jobs, we strongly recommend adding the CloudShell software to your workflow using the Stack Builder GUI, accessible through the Stack tab of your model. Make sure that the CloudShell is marked as "Run With Previous" so that it becomes available for use immediately and is not blocked until the jobs before it have completed.

On the CloudShell terminal, you can use the qhost, qstat and qdel utilities to manage and monitor your Cloud HPC Cluster jobs.

ClusterJobMonitoring.png

  • qhost shows the number of compute nodes available in your cluster. NCPU, NSOC, NCOR and NTHR show the number of CPUs, sockets, cores and hardware threads available on each of those compute nodes. LOAD shows the demand for CPUs by all runnable processes, averaged over the past 5 minutes. If you have 36-CPU nodes, LOAD should be very close to 36 if you are utilizing your nodes properly.
  • qstat shows the state of each job.
    • "r" represents a properly running job.
    • "qw" represents a job that is waiting in the queue. This usually happens while your autoscaling cluster is waiting for nodes to be added. The Kogence platform uses predictive algorithms and there can sometimes be a delay of up to 5 minutes before nodes are added to your cluster. Your job automatically switches to the "r" state once nodes are added.
    • "Eqw" indicates that the job is waiting in the queue due to an error in the submission command. Run qstat -explain c -j jobIDXYZ to see more details. You can use qdel jobIDXYZ to delete the job, fix the submission command and resubmit it using qsub.