# Introduction to Supercomputing Architecture, Linux and job scheduling (SLURM)

Intro to Supercomputing Architecture: Slides
--------------------------------------------

\*Just section II\*

{% file src="/files/4CkAaXcw2y9GCyCcx6TD" %}
Rebecca Hartman-Baker, PhD User Engagement Group Lead Charles Lively III, PhD Science Engagement Engineer Helen He, PhD User Engagement Group June 28, 2024
{% endfile %}

### Introduction to SLURM: Theory and Usage

#### What is SLURM?

SLURM (Simple Linux Utility for Resource Management) is a widely used open-source workload manager that allocates computing resources on High-Performance Computing (HPC) clusters. It controls how computational jobs are scheduled, executed, and monitored across the cluster.

#### How SLURM Works

SLURM operates based on the following key concepts:

* **Nodes**: Individual computers within a cluster, each with multiple CPU cores and, on some systems, GPUs.
* **Partitions**: Logical groups of nodes configured by administrators, typically organized by node capability or job duration.
* **Jobs**: Tasks or programs submitted by users to be executed on the cluster.
* **Scheduler**: The core component of SLURM, responsible for managing resources and scheduling jobs based on priority, availability, and job requirements.

When a user submits a job, SLURM places it into a queue. The scheduler prioritizes and allocates resources to jobs based on user requests, resource availability, and cluster policies. Once resources become available, the scheduler assigns the necessary nodes and executes the job automatically.
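The queue-then-allocate behaviour described above can be illustrated with a toy model. This is a minimal sketch, not SLURM's actual algorithm: the function `schedule`, the job tuples, and the priority numbers are all made up for illustration, and real schedulers also weigh fairness, time limits, and reservations.

```python
import heapq


def schedule(jobs, free_nodes):
    """Toy scheduler: start queued jobs in priority order while nodes remain.

    jobs: list of (name, priority, nodes_needed) tuples in submission order.
    Returns (started, pending) as lists of job names.
    """
    # Build the queue: higher priority first, ties broken by submission order.
    queue = [(-priority, order, name, nodes)
             for order, (name, priority, nodes) in enumerate(jobs)]
    heapq.heapify(queue)

    started, pending = [], []
    while queue:
        _, _, name, nodes = heapq.heappop(queue)
        if nodes <= free_nodes:
            free_nodes -= nodes   # allocate the nodes and start the job
            started.append(name)
        else:
            pending.append(name)  # not enough free nodes: the job keeps waiting
    return started, pending


# Hypothetical jobs: (name, priority, nodes needed)
jobs = [("align", 10, 2), ("train", 50, 3), ("plot", 10, 1)]
started, pending = schedule(jobs, free_nodes=4)
print("started:", started)
print("pending:", pending)
```

In this toy model a small job can start while a larger job of the same priority is still waiting for nodes, which is why short, small requests often leave the queue sooner on a busy cluster.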

#### Submitting Jobs to SLURM

To submit jobs to SLURM, users typically write a simple batch script and then submit it using the `sbatch` command.

Here's a basic example of a SLURM batch script:

```bash
#!/bin/bash
#SBATCH --job-name=my_first_job
#SBATCH --output=output.txt
#SBATCH --error=error.txt
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Load modules if necessary
module load python

# Run your program or command
python myscript.py
```

* `--job-name`: Specifies the name of your job.
* `--output` and `--error`: Files to save the standard output and error messages.
* `--time`: Requested maximum runtime of the job (format HH:MM:SS).
* `--partition`: Partition (group of nodes) your job should run on.
* `--nodes`: Number of nodes required.
* `--ntasks`: Number of parallel tasks (typically equal to the number of processes you want to run).
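To make the link between these options and the script text concrete, here is a small Python sketch that assembles the same batch script from a dictionary. The helpers `render_batch_script` and `walltime` are hypothetical names introduced for this example; they are not part of SLURM, and the partition name `standard` is simply copied from the example above.

```python
def walltime(hours=0, minutes=0, seconds=0):
    """Format a duration as HH:MM:SS, the form used for --time above."""
    total = hours * 3600 + minutes * 60 + seconds
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def render_batch_script(options, commands):
    """Build batch-script text from #SBATCH options and shell commands."""
    lines = ["#!/bin/bash"]
    for key, value in options.items():
        lines.append(f"#SBATCH --{key}={value}")
    lines.append("")          # blank line between directives and commands
    lines.extend(commands)
    return "\n".join(lines) + "\n"


script = render_batch_script(
    {
        "job-name": "my_first_job",
        "output": "output.txt",
        "error": "error.txt",
        "time": walltime(minutes=90),
        "partition": "standard",
        "nodes": 1,
        "ntasks": 1,
    },
    ["module load python", "python myscript.py"],
)
print(script)
```

Writing the returned text to a file such as `myscript.sh` gives a script that can be submitted with `sbatch` in the usual way.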

#### Useful SLURM Commands

* `sbatch myscript.sh`: Submits a job script.
* `squeue`: Lists jobs currently in the queue.
* `scancel <job_id>`: Cancels a job based on its Job ID.
* `sinfo`: Displays information about partitions and node availability.

#### Checking Job Status

You can monitor your job status with:

```bash
squeue -u <username>
```

This will show all your current jobs and their status (pending, running, etc.).
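If you want to process the queue listing in a script, one option is to ask `squeue` for a delimiter-separated format and parse it. The sketch below assumes the format specifiers `%i` (job ID), `%j` (job name), `%T` (state), and `%M` (elapsed time); the helper names and the sample text are invented for illustration and are not real cluster output.

```python
from collections import Counter

# Fields requested from squeue, separated by "|": ID, name, state, elapsed time.
SQUEUE_FORMAT = "%i|%j|%T|%M"


def squeue_command(username):
    """Argument list for a header-less, machine-readable squeue call."""
    return ["squeue", "-u", username, "--noheader", f"--format={SQUEUE_FORMAT}"]


def parse_squeue(text):
    """Turn '|'-separated squeue output into a list of dictionaries.

    Assumes job names do not themselves contain the '|' character.
    """
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id, name, state, elapsed = line.strip().split("|")
        jobs.append({"id": job_id, "name": name, "state": state, "elapsed": elapsed})
    return jobs


# Made-up sample text in the format above, used so the sketch runs anywhere.
sample = "1001|ship|RUNNING|0:42\n1002|ship|PENDING|0:00\n"
jobs = parse_squeue(sample)
print(Counter(job["state"] for job in jobs))
```

On a cluster, the text would come from something like `subprocess.run(squeue_command("<username>"), capture_output=True, text=True).stdout` instead of the hard-coded sample.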

#### Canceling Jobs

If you need to cancel a job, use:

```bash
scancel <job_id>
```

Replace `<job_id>` with the actual ID of the job you want to cancel.

### Key SLURM Commands

| Command   | Purpose                              |
| --------- | ------------------------------------ |
| `sinfo`   | View available resources             |
| `squeue`  | See running/pending jobs             |
| `sbatch`  | Submit a batch job                   |
| `srun`    | Launch a job step or interactive job |
| `scancel` | Cancel a pending or running job      |
| `sacct`   | View accounting/history (if enabled) |
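These commands are usually chained through the job ID: `sbatch` prints a confirmation line of the form `Submitted batch job <id>`, and that ID is what `squeue`, `scancel`, and `sacct` take as input. A minimal Python sketch of that hand-off, with hypothetical helper names and a made-up job ID:

```python
import re


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's confirmation line, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return int(match.group(1)) if match else None


def follow_up_commands(job_id):
    """Argument lists for checking, cancelling, and reviewing one job."""
    return {
        "status": ["squeue", "-j", str(job_id)],
        "cancel": ["scancel", str(job_id)],
        "history": ["sacct", "-j", str(job_id)],
    }


# Made-up confirmation text; a real ID is assigned by the cluster.
job_id = parse_job_id("Submitted batch job 123456\n")
for purpose, argv in follow_up_commands(job_id).items():
    print(purpose, "->", " ".join(argv))
```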

***


## Try it out on TAMU FASTER

{% hint style="info" %}
You need an ACCESS account and a TAMU account obtained through ACCESS.
{% endhint %}

To get both accounts, follow the guide at <https://hprc.tamu.edu/kb/User-Guides/FASTER/ACCESS-CI/#getting-an-access-account>

**Authorized ACCESS users can log in using the Web Portal:**

{% embed url="<https://portal-faster-access.hprc.tamu.edu>" %}

### Compose a job using Drona Composer

<figure><img src="/files/4FpjRkasespS0xK0l3cX" alt=""><figcaption></figcaption></figure>

Click on Drona Composer.

Set up your SLURM job using the GUI:

Job name: ship\_fractal

Location: leave as is

Environments: Generic

Upload files: choose "select file" and upload the ship.py file from your local machine (its contents are given below).

Sample job code:

Create a file on your local machine called ship.py. This is the sample script that we will run on the HPC cluster.

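```python
# ship.py
# Renders the Burning Ship fractal and saves it as burning_ship.png.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: batch jobs have no display
import matplotlib.pyplot as plt
import numpy as np

# Set image resolution and iteration limit
width, height = 1000, 1000
max_iter = 256

# Define viewing window in the complex plane
xmin, xmax = -2.0, 1.5
ymin, ymax = -2.0, 0.5

# Generate complex grid
x = np.linspace(xmin, xmax, width)
y = np.linspace(ymin, ymax, height)
X, Y = np.meshgrid(x, y)
C = X + 1j * Y

# Z holds the current iterate; img records when each point escaped
Z = np.zeros_like(C)
img = np.zeros(C.shape, dtype=int)
active = np.ones(C.shape, dtype=bool)  # points that have not escaped yet

# Compute Burning Ship fractal
for i in range(max_iter):
    # Iterate only the points that are still bounded; this avoids overflow
    Za = Z[active]
    Z[active] = (np.abs(Za.real) + 1j * np.abs(Za.imag)) ** 2 + C[active]
    escaped = active & (np.abs(Z) > 2)
    img[escaped] = i + 1  # 0 stays reserved for points that never escape
    active &= ~escaped

# Plot and save the result
plt.figure(figsize=(10, 10))
plt.imshow(img, cmap='hot', extent=(xmin, xmax, ymin, ymax))
plt.axis('off')
plt.tight_layout()
plt.savefig("burning_ship.png", dpi=300, bbox_inches='tight')
```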
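Before uploading, it can help to spot-check the iteration rule on a few individual points on your local machine. The following is a minimal sketch of the same update step for a single complex number; `escape_time` is a name introduced here for illustration and is not part of ship.py.

```python
def escape_time(c, max_iter=256, radius=2.0):
    """Iteration index at which c leaves the given radius, or None if it never does.

    Uses the Burning Ship update z -> (|Re z| + i|Im z|)^2 + c, starting at z = 0.
    """
    z = 0j
    for i in range(max_iter):
        z = complex(abs(z.real), abs(z.imag)) ** 2 + c
        if abs(z) > radius:
            return i
    return None  # still bounded after max_iter iterations


# The origin never moves, while a point far from the origin escapes immediately.
print(escape_time(0j))
print(escape_time(3 + 0j))
```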

Number of Tasks: 1

No Accelerator

Total memory: 40GB

Expected Run Time: 10 Minutes

Project Account: Default one

<figure><img src="/files/LxqoCvClJjm6iOSEFvSf" alt=""><figcaption></figcaption></figure>

### Click Preview, then add the following code below the line that says ADD YOUR COMMANDS BELOW

<pre><code>module load GCC/13.3.0 GCC/9.3.0  CUDA/11.0.2  OpenMPI/4.0.3  GCC/9.3.0  OpenMPI/4.0.3 iccifort/2020.1.217  impi/2019.7.217
module load  SciPy-bundle/2020.03-Python-3.8.2 matplotlib/3.2.1-Python-3.8.2
python ship.py
</code></pre>

Your template.txt should look similar to the following (your account number, scratch path, and requested resources will differ):

```
#!/bin/bash
#SBATCH --job-name=ship
#SBATCH --time=1:0:00 --mem=2G
#SBATCH --ntasks=1 --nodes=1 --cpus-per-task=1
#SBATCH --output=out.%j --error=error.%j
#SBATCH   --account=145332967756

module purge
module load WebProxy 
cd /scratch/user/u.sc126842/drona_composer/runs/ship
# ADD YOUR COMMANDS BELOW


module load GCC/13.3.0 GCC/9.3.0  CUDA/11.0.2  OpenMPI/4.0.3  GCC/9.3.0  OpenMPI/4.0.3 iccifort/2020.1.217  impi/2019.7.217
module load  SciPy-bundle/2020.03-Python-3.8.2 matplotlib/3.2.1-Python-3.8.2
python ship.py


```
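Because the preview packs several options onto one `#SBATCH` line, it is easy to miss a value when comparing it with what you entered in the GUI. As a rough aid, this Python sketch lists the options found in a script. The function name is hypothetical, it only understands the `--key=value` form used in this template, and the embedded text is an excerpt of the preview above.

```python
import shlex


def parse_sbatch_directives(script_text):
    """Collect --key=value options from the #SBATCH lines of a batch script."""
    options = {}
    for line in script_text.splitlines():
        if not line.startswith("#SBATCH"):
            continue
        # A single #SBATCH line may carry several options separated by spaces.
        for token in shlex.split(line[len("#SBATCH"):]):
            key, _, value = token.lstrip("-").partition("=")
            options[key] = value
    return options


# Excerpt of the template shown above.
template = """#!/bin/bash
#SBATCH --job-name=ship
#SBATCH --time=1:0:00 --mem=2G
#SBATCH --ntasks=1 --nodes=1 --cpus-per-task=1
#SBATCH --output=out.%j --error=error.%j
"""

for key, value in parse_sbatch_directives(template).items():
    print(f"{key:15s} {value}")
```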

Click submit.

Go back to the main dashboard and open Jobs, then Active Jobs, to view the job and its output files.

Once your job completes, go to the dashboard, open Files, then scratch, and browse to a path like this to find the job's files:

```
/scratch/user/u.sc126842/drona_composer/runs/ship
```


