Troubleshooting
Fix Container Image Permissions
Mount an EmptyDir Volume
Build a New Image
Debugging
Debug a pod in CrashLoopBackoff
Get a shell inside a pod