# Quantum Espresso

Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. Quantum ESPRESSO has evolved into a distribution of independent and inter-operable codes. The Quantum ESPRESSO distribution consists of a core set of components (e.g. PWscf and CP), and a set of plug-ins that perform more advanced tasks, plus a number of third-party packages designed to be inter-operable with the core components.

The program starts with pseudopotentials and calculates a first estimate of the stationary-state wavefunctions at each k-point and band (i.e. each Kohn-Sham state). This is done by diagonalizing the Hamiltonian matrix using highly parallel linear algebra packages. Some algorithms (e.g. CP) require the calculated stationary-state wavefunctions to be orthonormalized, which is again done using parallel linear algebra packages. The program then uses 3D FFTs (fast Fourier transforms) to calculate the electron density, the charge density and a new estimate of the potentials. This process is repeated until self-consistency is achieved. For some simulations, e.g. Nudged Elastic Band (neb.x) and atomic displacement (ph.x) calculations, once the self-consistent potential is achieved, the program calculates forces and stresses on all atoms. It then moves the atoms to reduce the forces and stresses toward zero. For each new set of atomic positions, the self-consistent field calculation is repeated.

## Quantum Espresso Access on Kogence

On Kogence, Quantum Espresso is pre-configured and pre-optimized and can be run on a single cloud HPC server of your choice or on an autoscaling cloud HPC cluster. On Kogence, Quantum Espresso runs in a docker container. Quantum Espresso is available under Free Individual Plans. The Kogence deployment supports both openMP and MPI parallelism. It is compiled, linked and built with Intel MPI and Intel compilers (with AVX-512 advanced vector extensions of the SIMD instruction set), Intel MKL, ScaLAPACK, ELPA, libxc and parallel HDF5, among other hardware-specific optimizations.

### Versions Available on Kogence

The following versions are deployed, provisioned and performance-optimized on Kogence.

• Quantum Espresso 6.7
• Thermo PW 1.5.0

### Packages Available on Kogence

Several executables and solvers are available from several native as well as third party Quantum Espresso packages:

• CP: manycp.x, cppp.x, wfdd.x and cp.x
• EPW: epw.x
• GUI: pwgui
• GWW: pw4gww.x, simple.x, simple_bse.x, bse_main.x, head.x, simple_ip.x, gww_fit.x, gww.x
• HP: hp.x
• NEB: neb.x, path_interpolation.x
• PHonon: phcg.x, alpha2f.x, epa.x, matdyn.x, dynmat.x, fqha.x, lambda.x, postahc.x, q2qstar.x, dvscf_q2r.x, ph.x, q2r.x
• PP: open_grid.x, bands.x, plotband.x, plotrho.x, ppacf.x, projwfc.x, plan_avg.x, pw2wannier90.x, wfck2r.x, fs.x, wannier_ham.x, fermi_velocity.x, plotproj.x, dos.x, fermi_proj.x, xctest_qe_libxc.x, molecularpdos.x, epsilon.x, pw2bgw.x, average.x, sumpdos.x, pp.x, pw2gw.x, pmw.x, initial_state.x, pw2critic.x, wannier_plot.x, pawplot.x
• PW: ev.x, scan_ibrav.x, kpoints.x, ibrav2cell.x, pwi2xsf.x, cell2ibrav.x, pw.x
• PWCOND: pwcond.x
• TDDFPT: turbo_spectrum.x, turbo_lanczos.x, turbo_davidson.x, turbo_eels.x
• XSpectra: spectra_correction.x, molecularnexafs.x, xspectra.x
• Atomic: ld1.x
• Thermo PW: rotate_tensors.x, hex_trig.x, translate.x, space_groups.x, crystal_point_group.x, units.x, pdec.x, mag_point_group.x, epsilon_tpw.x, supercell.x, kovalev.x, gener_nanowire.x, gener_3d_slab.x, elastic.x, plot_sur_states.x, optical.x, test_colors.x, debye.x, gener_2d_slab.x, bravais_lattices.x, thermo_pw.x
• Upflib: virtual_v2.x, upfconv.x
• Wannier90-3.1.0: wannier90.x, postw90.x

## Entrypoint Binaries

Quantum Espresso is deployed in an individual docker container on Kogence. Among the large number of executables and post-processing codes offered by Quantum Espresso, as listed above, we only offer a limited set of significant solvers as Entrypoint Binaries for convenient invocation through the Stack tab GUI, through CloudShell or through other software applications running in their own containers. All executables and solvers listed above are available when a bash script is run inside the Quantum Espresso container (using the bash-shell Entrypoint Binary) or when a shell terminal is opened inside the Quantum Espresso container (using the shell-terminal Entrypoint Binary).

The Quantum Espresso container offers the following Entrypoint Binaries.

• pw.x: Basic code for Self Consistent Field (plane-wave Kohn-Sham DFT) calculation, structure optimization and molecular dynamics.
• ph.x: Phonon code, Gamma-only and third-order derivatives.
• hp.x: Calculation of the Hubbard parameters from DFPT (density functional perturbation theory).
• pwcond.x: Ballistic conductance.
• neb.x: Code for Nudged Elastic Band method.
• cp.x: Car-Parrinello (CP) molecular dynamics code.
• xspectra.x: X-ray core-hole spectroscopy calculations.
• epw.x: Electron-Phonon Coupling with Wannier functions.
• pwgui: Graphical User Interface.
• bash-shell: Run a bash shell script inside Quantum Espresso container.
• shell-terminal: Start a shell terminal emulator inside Quantum Espresso container.

## Quantum Espresso Usage on Kogence

• Invoking Entrypoint Binaries directly through the Stack tab: all of these binaries (with the exception of pwgui, bash-shell and shell-terminal) expect < *.in > *.out or -i *.in > *.out to be provided in the Arguments/Options textbox on the Stack tab. For parallel computing, the Stack tab GUI lets you select the number of nodes in the cluster and the number of processes and threads on each node, so you do not need to modify the command that you entered in the Stack tab GUI (e.g. you do not call mpirun). You still need to provide the $PARA_POSTFIX parameters in the Arguments/Options textbox: for example, -nk 6 -nt 6 -nd 4 < *.in > *.out. Please see the Parallel Computing in Quantum Espresso section below for a detailed description of the $PARA_POSTFIX parameters.
• Invoking Entrypoint Binaries on CloudShell (using the shell-terminal Entrypoint Binary) or inside other software containers: use double quotes around the Arguments/Options, like: pw.x "< *.in > *.out". For parallel computing, you will need to add $PARA_PREFIX and $PARA_POSTFIX around the invocation command, like: $PARA_PREFIX pw.x $PARA_POSTFIX "< *.in > *.out". See the following sections for more details.
• Invoking any of the Quantum Espresso solvers/executables through a container shell script (using the bash-shell Entrypoint Binary): use the full command, like: pw.x < *.in > *.out. For parallel computing, you will need to add $PARA_PREFIX and $PARA_POSTFIX around the invocation command, like: $PARA_PREFIX pw.x $PARA_POSTFIX < *.in > *.out. See the following sections for more details.
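
In the commands above, *.in and *.out stand for one specific input file and one specific output file. As a minimal sketch, assuming a container bash script run through the bash-shell Entrypoint Binary, the following shows one way to run pw.x on several inputs in turn while deriving each output name from its input name. The helper names out_name and run_pw, the DRY_RUN variable and the input file names are hypothetical; only pw.x and the -i switch come from the text above.

```shell
#!/bin/bash
# Sketch only: run pw.x once per input file, deriving the output name.
# out_name, run_pw, DRY_RUN and the input file names are illustrative assumptions.

# Map an input name such as "si.scf.in" to "si.scf.out".
out_name() {
    printf '%s\n' "${1%.in}.out"
}

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run_pw() {
    local input="$1"
    local output
    output="$(out_name "$input")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "pw.x -i $input > $output"
    else
        pw.x -i "$input" > "$output"
    fi
}

for f in si.scf.in si.nscf.in; do
    run_pw "$f"
done
```

Setting DRY_RUN=0 before running the script would execute the commands instead of printing them.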

### Single Node Invocation Options

To use Quantum Espresso on Kogence, you first create a new Quantum Espresso model or copy an existing one. You can see the list of ready-to-copy-and-execute Quantum Espresso models on Kogence here: Category:Quantum Espresso. Step-by-step instructions for creating and copying models are here: How To Use Kogence.

1. Using Kogence Software Stack Builder GUI: This is the easiest way to use Quantum Espresso on Kogence. The Stack Builder GUI also allows you to connect multiple software packages (such as pre-processing and post-processing software) to your model and create complex multi-software workflows. Open your Quantum Espresso model. On the top navigation bar, click on the Stack tab. Click the + button to connect a software to your model. A pop-up will appear in which you can search/filter/scroll to find Quantum Espresso. Click the + button next to Quantum Espresso. The Quantum Espresso container is now added to your workflow. Now you will see a dropdown menu that lets you pick an Entrypoint Binary. Pick, for example, the pw.x Entrypoint Binary. Now you will see an empty textbox called Arguments/Options. Type < myScript.in > myOutput.out in the box. You will also see dropdown options that allow you to select the number of parallel processes and the number of parallel threads. The product of these two cannot be more than the number of CPUs in the hardware you selected. We recommend setting the number of parallel threads to 1, as most operations in Quantum Espresso are coded for parallel processing and not for multithreading. Click the Save button to save the software stack and then click the Run button on the top navigation bar. You may have to wait 2 minutes for your HPC server to boot up. Once it is ready, you will see that the Run button turns into a Stop button and there is a Visualizer button that lets you connect to your HPC server. Results are saved in the myOutput.out file. Logs and errors are printed in the ___titusiOutput and ___titusiError files. All of these are available in real time under the Visualize tab while your simulation is running, or under the Files tab after the simulation has ended.
2. Using Container Bash Shell Script: Follow the same process as above, but instead of picking pw.x as the Entrypoint Binary, pick the bash-shell binary. Type the name of your *.sh input file (this file should be available under the Files tab of your model) in the Arguments/Options textbox. Click the Save button to save the software stack and then click the Run button on the top navigation bar. When you execute Quantum Espresso through this method, it uses OpenMP parallelism automatically and you do not have to do anything. If you like, you can also use MPI parallelism (we recommend doing this). In order to effectively use all the CPUs in your chosen cloud HPC server (in the Cluster tab of your Model), you may need to modify the $PARA_PREFIX and $PARA_POSTFIX parameters in the input bash shell script. The $PARA_PREFIX tells the system whether openMP, MPI or hybrid parallelism is desired and the maximum number of CPUs available for parallelism in the cloud HPC server that has been orchestrated for this simulation, while the $PARA_POSTFIX tells the system how many of these available CPUs should be used for which portion of the calculation (please see the Parallel Computing in Quantum Espresso section below for a detailed description of $PARA_POSTFIX). Here is an example that uses only MPI parallelism (i.e. 36 MPI parallel processes will be started) on a 36 CPU machine:

   #Code Section Example
   PARA_PREFIX="mpirun -np 36"
   PARA_POSTFIX="-nk 6 -nt 6 -nd 4"
   export OMP_NUM_THREADS=1
   PW_COMMAND="$PARA_PREFIX pw.x $PARA_POSTFIX"

   Please refer to User Manual and Intel MPI for details on these command line switches. There are some constraints in the $PARA_PREFIX:
   • $OMP_NUM_THREADS (set in your input bash script, as in the example above) cannot be more than the number of CPUs on each compute node. openMP provides shared memory parallelism within each node.
   • The product of $OMP_NUM_THREADS and the number of MPI processes (set in $PARA_PREFIX through the mpirun -np switch) cannot be more than the maximum total number of CPUs available in the cloud HPC server that you requested in the Cluster tab of your Model.
3. CloudShell Shell Terminal Access: Follow the steps described above in the 'Using Kogence Software Stack Builder GUI' section. Connect the Quantum Espresso container to the Stack tab of your Model, but this time do not select any Entrypoint Binary. Next, click the + button one more time to connect the CloudShell to your model. A pop-up will appear in which you can search/filter/scroll to find CloudShell. Click the + button next to CloudShell. CloudShell will now be added to your workflow. Kogence offers 2 different shell terminal emulators: xterm and gnome-terminal. In the Arguments/Options textbox, type either xterm or gnome-terminal, depending upon which emulator you prefer. When you run the model, you can go to the Visualizer tab and you will see your shell terminal. Here you can invoke individual solvers like pw.x or execute your own bash scripts. Make sure you add/upload your bash script under the Files tab of your model before running the model. For example you can do:
   1. export OMP_NUM_THREADS=1; mpirun -np 36 pw.x -nk 6 -nt 6 -nd 4 "< MyCode.in > MyOutput.out" : This will invoke the pw.x Entrypoint Binary on a single node with MPI parallelism instead of OpenMP.
   2. YourScript.sh YourScriptArgs: This will run your custom shell script.
4. Using PWGUI: Follow the steps described above in the 'Using Kogence Software Stack Builder GUI' section. Connect the Quantum Espresso container to the Stack tab of your Model, but this time select pwgui as the Entrypoint Binary. Leave the Arguments/Options textbox empty. Save the settings and run the model. Once the Visualizer button is active, you can click that button to connect to your cloud HPC server. You will see the pwgui interface.

### Autoscaling Cluster Invocation Options

1. Using Kogence Software Stack Builder GUI: First, make sure that you check the "Run on Autoscaling Cluster" checkbox in the Cluster tab of your model. Then, follow the steps described above in the 'Using Kogence Software Stack Builder GUI' section under the Single Node Invocation Options. This time you will see a "Run on Cluster" checkbox on the Stack tab next to the pw.x Entrypoint Binary that you just added to your model. Check that box. This will execute pw.x on the compute nodes of the cluster instead of running it on the master node (also known as the manager/visualization/login node).
2. Using Container Bash Shell Script: First, make sure that you check the "Run on Autoscaling Cluster" checkbox in the Cluster tab of your model. Then, follow the steps described above in the 'Using Container Bash Shell Script' section under the Single Node Invocation Options. NOTE: this time, do not check the "Run on Cluster" checkbox. The bash script itself needs to run on the master node: it contains the qsub command, which sends the job to the compute nodes of the cluster. In order to effectively use all the CPUs in the Kogence autoscaling cloud HPC cluster, you may need to modify the $PARA_PREFIX and $PARA_POSTFIX parameters in the input bash shell script (please see the Parallel Computing in Quantum Espresso section below for a detailed description of $PARA_POSTFIX). Here is an example that uses only MPI parallelism (i.e. 72 MPI parallel processes will be started) on a cluster with 72 CPUs:
   #Code Section Example:
   PARA_PREFIX="qsub -pe mpi 72 -b y -N job1 -cwd -o ./___titusiOutput -j y -V mpirun -np 72"
   PARA_POSTFIX="-nk 12 -nt 6 -nd 4"
   PW_COMMAND="$PARA_PREFIX pw.x $PARA_POSTFIX"

Please refer to User Manual, Job Scheduler and Intel MPI for details on these command line switches. There are some constraints in the $PARA_PREFIX. Just as in the single node case, $OMP_NUM_THREADS cannot be more than the number of CPUs on each compute node. openMP provides shared memory parallelism within each node. In addition, the product of $OMP_NUM_THREADS (set in your input bash script) and the number of MPI processes (set in $PARA_PREFIX through the mpirun -np switch) cannot be more than the maximum total number of CPUs available in your Kogence autoscaling cloud HPC cluster. The maximum number of CPUs is determined by the maximum number of compute nodes that you allow your cluster to scale up to and the number of CPUs in each compute node -- both of these choices are made in the Cluster tab of your Model.
3. CloudShell Shell Terminal Access: First, make sure that you check the "Run on Autoscaling Cluster" checkbox in the Cluster tab of your model. Then, follow the steps described above in the 'CloudShell Shell Terminal Access' section under the Single Node Invocation Options. When you run the model, you can go to the Visualizer tab and you will see your shell terminal. Here you can invoke individual solvers like pw.x or execute your own bash scripts. Make sure you add/upload your bash script under the Files tab of your model before running the model. For example you can do:
   1. export OMP_NUM_THREADS=1; qsub -pe mpi 72 -b y -N job1 -cwd -o ./___titusiOutput -j y -V mpirun -np 72 pw.x -i MyCode.in -nk 12 -nt 6 -nd 4 : This will invoke Quantum Espresso on the Kogence autoscaling cloud HPC cluster. See Job Scheduler and Intel MPI for details on these command line switches.
   2. YourScript.sh YourScriptArgs: This will run your custom shell script. The script itself runs on the master node, but it can contain qsub commands like the one above. Those particular commands will be sent to the Kogence autoscaling cloud HPC cluster.
4. Using PWGUI: Follow the steps as described above in the 'Using PWGUI' section under the Single Node Invocation Options.
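
The single node and autoscaling cluster examples above differ only in the $PARA_PREFIX. As a minimal sketch, assuming a pure MPI run, the script below assembles the pw.x command for either case and prints it for inspection rather than executing it. The variables NP, MODE and INPUT are hypothetical names introduced for illustration; the qsub and mpirun switches are copied from the examples above.

```shell
#!/bin/bash
# Sketch only: assemble the pw.x command line for a single node or for the
# autoscaling cluster. NP, MODE and INPUT are illustrative assumptions.

NP=72              # total number of MPI processes
MODE="cluster"     # "single" or "cluster"
INPUT="MyCode.in"  # input file uploaded under the Files tab

export OMP_NUM_THREADS=1   # pure MPI: one thread per process

if [ "$MODE" = "cluster" ]; then
    # qsub switches taken from the cluster example above
    PARA_PREFIX="qsub -pe mpi $NP -b y -N job1 -cwd -o ./___titusiOutput -j y -V mpirun -np $NP"
else
    PARA_PREFIX="mpirun -np $NP"
fi
PARA_POSTFIX="-nk 12 -nt 6 -nd 4"

PW_COMMAND="$PARA_PREFIX pw.x $PARA_POSTFIX -i $INPUT"

# Print the assembled command; run $PW_COMMAND instead of echoing it to execute.
echo "$PW_COMMAND"
```

Printing the command first makes it easy to confirm that the process count in -pe mpi, the count in -np, and the product of -nk and -nt all agree before any CPU time is spent.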

## Parallel Computing in Quantum Espresso

Quantum Espresso supports shared memory parallelism through multi-threading (openMP), distributed memory parallelism through multi-processing (MPI), as well as hybrid parallelism, where you may choose to use shared memory parallelism within each node and distributed memory parallelism between the nodes. In order to effectively use all the CPUs in the Kogence autoscaling cloud HPC cluster, you may need to modify the $PARA_PREFIX and $PARA_POSTFIX parameters, either in the input bash shell script (if you are invoking Quantum Espresso using a bash shell script) or on the command prompt (if you are invoking Quantum Espresso using a shell terminal).

$PARA_PREFIX tells the system whether openMP, MPI or hybrid parallelism is desired. On Kogence, by default, Quantum Espresso is set up to use openMP parallelism. If you are running simulations on a single node and openMP parallelism is desired, you do not need to modify the $PARA_PREFIX parameter. If, on the other hand, MPI parallelism is desired, then $PARA_PREFIX should be modified. For single node simulations, the $PARA_PREFIX would look like mpirun -np 4096, while for multi-node cluster simulations the $PARA_PREFIX would look like qsub -pe mpi 4096 mpirun -np 4096. Please see the Single Node Invocation Options and Autoscaling Cluster Invocation Options sections above for more details.

The $PARA_POSTFIX tells the system how many of these available CPUs (i.e. 4096 in the above example) should be used for which portion of the calculations. To control the number of CPUs assigned to the calculations in each group, use the following $PARA_POSTFIX switches in your bash shell script or on the shell terminal, depending upon how you are invoking Quantum Espresso on Kogence clusters:

• -nimage (or the shorthand -ni). Available only with neb.x or ph.x. -ni provides a loosely coupled parallelization mechanism to simulate replicas of the same system in parallel. Different images can be run on different nodes even with a poor network, as the images are loosely coupled and rarely interact.
• -npools (or the shorthand -nk). If the simulation in each image (i.e. -ni) consists of multiple k-points (sampling points in the irreducible Brillouin zone of reciprocal space), then those can be simulated in parallel in -nk pools of CPUs. For example, if your simulation includes 1000 k-points and -nk is set to 4, then 4 pools of multiple CPUs will be created for parallel calculations. Each pool would work on 250 k-points. The number of CPUs available in each pool depends on the settings of the other switches in $PARA_POSTFIX as well as the total number of CPUs available through the settings of $PARA_PREFIX. k-point parallelism is also loosely coupled and pools can be located on different nodes with a poor network. CPUs within each pool are instead tightly coupled and communications can be significant. This means that fast communication hardware is needed if your pool extends over more than a few CPUs on different nodes. If that is the case, then you should use the Kogence Network Limited Workload nodes that come with an OS-bypass remote direct memory access network such as Infiniband.
• -nband (or the shorthand -nb). Each pool (i.e. -nk) can be subpartitioned into "band groups", each taking care of a group of Kohn-Sham orbitals (also called bands). This type of parallelism is especially useful for calculations with hybrid functionals. Depending upon the problem at hand, bands may or may not interact much. If you are running different bands on different nodes of the cluster and you expect your system to be significantly "hybridized", meaning you expect bands in your system to interact significantly, then you should use the Kogence Network Limited Workload nodes that come with an OS-bypass remote direct memory access network such as Infiniband.
• -ntg (or the shorthand -nt). Each band group can then be partitioned into task groups. Task groups perform the 3D FFT operations to calculate the electron density and charge density given a set of stationary-state wavefunctions. Tasks are very tightly coupled. We recommend running them on the same node. If that is not possible, then please use the Kogence Network Limited Workload nodes that come with an OS-bypass remote direct memory access network such as Infiniband.
• -ndiag (or -northo or the shorthand -nd). This switch specifies a number of CPUs, in contrast to the other switches, which specify a number of partitions. It creates a linear-algebra group of -nd CPUs from within the band group as and when needed (it does not conflict with the -nt settings) and parallelizes the matrix diagonalization and matrix-matrix multiplications needed for the iterative diagonalization (SCF) or orthonormalization (CP). CPUs within the ortho group should be very tightly coupled. We recommend running them on the same node. If that is not possible, then please use the Kogence Network Limited Workload nodes that come with an OS-bypass remote direct memory access network such as Infiniband.

There are 2 constraints that should be kept in mind while selecting appropriate values for these switches in the $PARA_POSTFIX.

• The product of -ni, -nk, -nb and -nt cannot be more than the maximum total number of CPUs available in your Kogence autoscaling cloud HPC cluster. The maximum number of CPUs is determined by the maximum number of compute nodes that you allow your cluster to scale up to and the number of CPUs in each compute node -- both of these choices are made in the Cluster tab of your Model.
• The diagonalization and the orthonormalization operations block-partition the matrices into a square 2D array of smaller matrices. As a consequence, the number of CPUs in the linear-algebra group is given by $n_d = m^2$, where $m$ is an integer such that $m^2$ is less than or equal to the number of CPUs in each band group. The diagonalization is then performed in parallel using ScaLAPACK on Kogence clusters.
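
The second constraint can be checked with a few lines of shell arithmetic. The following is a minimal sketch that finds the largest perfect square not exceeding a given number of CPUs, which is the largest admissible value for -nd in a band group of that size. The function name largest_square is hypothetical.

```shell
#!/bin/bash
# Sketch only: largest perfect square m*m that does not exceed n.
# The function name largest_square is an illustrative assumption.
largest_square() {
    local n="$1"
    local m=1
    # Grow m while the next square still fits within n.
    while [ $(( (m + 1) * (m + 1) )) -le "$n" ]; do
        m=$(( m + 1 ))
    done
    echo $(( m * m ))
}

largest_square 6     # 6 CPUs per band group   -> prints 4
largest_square 256   # 256 CPUs per band group -> prints 256
```

Any smaller perfect square (for example 144 in a band group of 256 CPUs) also satisfies the constraint; the function only reports the upper bound.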

There are also some constraints in the $PARA_PREFIX.

• $OMP_NUM_THREADS (set in your input bash script) cannot be more than the number of CPUs on each compute node. openMP provides shared memory parallelism within each node.
• The product of $OMP_NUM_THREADS (set in your input bash script) and the number of MPI processes (set in $PARA_PREFIX through the mpirun -np switch) cannot be more than the maximum total number of CPUs available in your Kogence autoscaling cloud HPC cluster. As described above, the maximum number of CPUs is determined by the maximum number of compute nodes that you allow your cluster to scale up to and the number of CPUs in each compute node -- both of these choices are made in the Cluster tab of your Model.
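
Both $PARA_PREFIX constraints are simple inequalities, so they can be verified before a job is submitted. The sketch below is one way to do that; the function name check_prefix and its argument order (threads, MPI processes, CPUs per node, maximum number of nodes) are hypothetical and not part of Kogence or Quantum Espresso.

```shell
#!/bin/bash
# Sketch only: check the two $PARA_PREFIX constraints before submitting.
# check_prefix and its argument order are illustrative assumptions.
# Usage: check_prefix <omp_threads> <mpi_procs> <cpus_per_node> <max_nodes>
check_prefix() {
    local threads="$1" procs="$2" cpus_per_node="$3" max_nodes="$4"
    local total=$(( cpus_per_node * max_nodes ))
    # Constraint 1: threads share memory, so they must fit on one node.
    if [ "$threads" -gt "$cpus_per_node" ]; then
        echo "error: OMP_NUM_THREADS=$threads exceeds $cpus_per_node CPUs per node"
        return 1
    fi
    # Constraint 2: threads x processes must fit in the whole cluster.
    if [ $(( threads * procs )) -gt "$total" ]; then
        echo "error: $threads threads x $procs processes exceeds $total CPUs"
        return 1
    fi
    echo "ok: using $(( threads * procs )) of $total CPUs"
}

check_prefix 1 72 36 2            # pure MPI on two 36-CPU nodes
check_prefix 2 72 36 2 || true    # hybrid run that oversubscribes the cluster
```

The first call passes because 1 x 72 equals the 72 CPUs available; the second fails the product check because 2 x 72 exceeds them.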

As an example of combined effect of both $PARA_PREFIX and $PARA_POSTFIX, consider the following command line:

mpirun  -np  4096  neb.x  -ni  8  -nk  2  -nt  4  -nd  144  < my.input > my.output

This executes a NEB calculation on 4096 CPUs, with 8 images (points in the configuration space in this case) running at the same time, each of which is distributed across 512 processors ($8 \times 512 = 4096$). k-points are distributed across 2 pools of 256 processors each, the 3D FFT is performed using 4 task groups (64 processors each, so the 3D real-space grid is cut into 64 slices), and the diagonalization of the subspace Hamiltonian is distributed to a square grid of 144 processors (12x12).
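
The processor counts quoted for this command follow from successive integer divisions. As a small sketch, the arithmetic can be reproduced in the shell as follows; the variable names are chosen here for illustration only.

```shell
#!/bin/bash
# Sketch only: reproduce the processor counts quoted for the neb.x example.
# Variable names are illustrative assumptions.
NP=4096   # mpirun -np
NI=8      # -ni, number of images
NK=2      # -nk, number of k-point pools per image
NT=4      # -nt, number of task groups per pool
ND=144    # -nd, CPUs in the linear-algebra group

PER_IMAGE=$(( NP / NI ))              # processors per image
PER_POOL=$(( PER_IMAGE / NK ))        # processors per k-point pool
PER_TASK_GROUP=$(( PER_POOL / NT ))   # processors per task group

echo "per image:      $PER_IMAGE"
echo "per pool:       $PER_POOL"
echo "per task group: $PER_TASK_GROUP"

# -nd must be a perfect square no larger than the pool size (12 x 12 = 144).
if [ "$ND" -le "$PER_POOL" ]; then
    echo "-nd $ND fits in a pool of $PER_POOL processors"
fi
```

This yields 512 processors per image, 256 per pool and 64 per task group, matching the description above.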

Default values are: -ni 1 -nk 1 -nt 1. The default value for the -nd switch is 1 if ScaLAPACK is not compiled in. When ScaLAPACK is available (on Kogence, Quantum Espresso is compiled with ScaLAPACK and ELPA), -nd is set by default to the largest square integer smaller than or equal to half the number of processors in each pool.