OLCF Inference Service Documentation
Welcome to the documentation for the OLCF Inference Service. This service leverages the Secure Scientific Service Mesh (S3M) to provide access to powerful Large Language Models (LLMs) running on a highly optimized vLLM runtime, offering OpenAI-compatible API endpoints.
Requesting OLCF Inference Service Access
Please email OLCF Support at help@olcf.ornl.gov if you are interested in using the OLCF Inference Service. Include the following information in your email:
- Existing OLCF Project ID
- Project PI
- Your project's use case: please explain how your project will use the OLCF Inference Service in your workflow.
Authentication
To use the inference service, you must authenticate your requests using a Bearer token.
Mint your token: Tokens must be minted via S3M on myOLCF. More information can be found in the S3M documentation.
Set your environment variable: Once you have your token, we recommend exporting it securely in your terminal environment to prevent hardcoding it in your scripts.
export S3M_TOKEN="your_minted_token_here"
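As a quick sanity check that the token is set correctly, here is a minimal sketch using the standard Python requests library and the List Models endpoint described later in this document:
import os
import requests

# Read the minted token from the environment rather than hardcoding it
token = os.environ.get("S3M_TOKEN")

# The List Models endpoint (documented below) is a lightweight way to verify access
response = requests.get(
    "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/model/info",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.status_code)  # 200 indicates the token was accepted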
Endpoint URL
The primary base endpoint for chat completions is:
https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions
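Note that this is the full Chat Completions route. The OpenAI Python client examples later in this document pass only the base portion as base_url, and the client appends the route itself; a short sketch of the distinction:
# Base URL used when configuring the OpenAI Python client (the SDK appends the route)
BASE_URL = "https://s3m.olcf.ornl.gov/olcf/open/v1/inference"

# Full Chat Completions endpoint used with curl or the requests library
CHAT_COMPLETIONS_URL = BASE_URL + "/chat/completions"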
Available Models
Note
This list is not exhaustive. For a complete list, see List Models below.
Currently, the service supports the following models:
| Model | Aliases | Features |
|---|---|---|
| gpt-oss-120b | | Text, Reasoning |
| nemotron-nano-fp8 | | Text, Reasoning |
| apriel-1.6-15b-thinker | | Text, Reasoning, Vision |
| nomic-embed-text-v2-moe | | Text Embedding |
Multi-Modal Inputs
For a working multimodal example, refer to the Computer Vision section below.
For additional multimodal examples, please see https://docs.vllm.ai/en/stable/features/multimodal_inputs/?h=multimodal#online-serving
OpenAI's file uploads are not supported on the OLCF Inference Service, and the service will not accept URL data pointing to web addresses.
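Because images must be embedded directly in the request rather than fetched from the web, a minimal sketch of turning a local image into a base64 data URL looks like this (the full request is shown in the Computer Vision example below):
import base64

# Local images must be embedded as base64 data URLs; frontier.jpg is a placeholder path
with open("frontier.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

data_url = f"data:image/jpeg;base64,{encoded}"
# data_url is then passed as the image_url value in a chat completion request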
Usage Examples
Note
When using cURL, wget, or other command-line programs, please follow the best practices described in the S3M documentation under Avoid Command-Line Arguments.
You can follow the examples below by simply running:
echo "Authorization: Bearer ${S3M_TOKEN}" > .env
echo ".env" >> .gitignore
Because the service uses a vLLM backend, the request body is compatible with the standard OpenAI Chat Completions API format.
Note
In order to use the OpenAI Python library, you must first install it or activate an environment with it installed.
You can install it via pip with pip install openai.
Since the API is OpenAI-compatible, you can easily use the standard Python openai library. Simply override the base URL and pass your token.
Querying gpt-oss-120b
curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-oss-120b",
"messages": [
{"role": "user", "content": "Your prompt here."},
{"role": "user", "content": "We can be 120b(s)."}
],
"stream": false
}'
from openai import OpenAI
import os
client = OpenAI(
base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
api_key=os.environ.get("S3M_TOKEN")
)
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[
{"role": "user", "content": "Your prompt here"},
{"role": "user", "content": "We can be 120b(s)."}
],
stream=False
)
print(response.choices[0].message.content)
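If you want tokens as they are generated, the same request can be made with stream=True. Below is a minimal sketch, assuming the service streams responses the way a standard OpenAI-compatible vLLM server does:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
    api_key=os.environ.get("S3M_TOKEN")
)

# With stream=True the client returns an iterator of chunks instead of a single response
stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Your prompt here"}],
    stream=True
)
for chunk in stream:
    # Some chunks (for example, a final usage chunk) may carry no content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()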
Querying nemotron-nano-fp8
curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "nemotron-nano-fp8",
"messages": [{"role": "user", "content": "Your prompt here."}],
"stream": false
}'
from openai import OpenAI
import os
client = OpenAI(
base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
api_key=os.environ.get("S3M_TOKEN")
)
response = client.chat.completions.create(
model="nemotron-nano-fp8",
messages=[
{"role": "user", "content": "Your prompt here"}
],
stream=False
)
print(response.choices[0].message.content)
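Other standard Chat Completions parameters are also accepted in the request body. Here is a hedged sketch adding a system message and the sampling controls used in the Standard Completions example further below (temperature, max_tokens):
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
    api_key=os.environ.get("S3M_TOKEN")
)

# Standard OpenAI parameters such as temperature and max_tokens can be
# passed alongside the messages
response = client.chat.completions.create(
    model="nemotron-nano-fp8",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Your prompt here"}
    ],
    temperature=0.7,
    max_tokens=256,
    stream=False
)
print(response.choices[0].message.content)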
Computer Vision
Note
Computer vision is not supported by every model. Please see Available Models and List Models for details on which models support vision.
jq -n --arg content "$(base64 < frontier.jpg)" '{
"model": "apriel-1.6-15b-thinker",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "Describe the image."},
{
"type": "image_url",
"image_url": {"url": ("data:image/jpeg;base64," + $content)}
}
]
}
]
}' | curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d @-
import base64
import os
from openai import OpenAI
client = OpenAI(
base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
api_key=os.environ.get("S3M_TOKEN")
)
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Path to your image
image_path = "frontier.jpg"
# Getting the Base64 string
base64_image = encode_image(image_path)
response = client.chat.completions.create(
model="apriel-1.6-15b-thinker",
messages=[
{
"role": "user",
"content": [
{ "type": "text", "text": "Describe the image." },
{
"type": "image_url",
"image_url": { "url": f"data:image/jpeg;base64,{base64_image}" },
},
],
},
],
)
print(response.choices[0].message.content)
import base64
import os
from openai import OpenAI
client = OpenAI(
base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
api_key=os.environ.get("S3M_TOKEN")
)
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Path to your image
image_path = "frontier.jpg"
# Getting the Base64 string
base64_image = encode_image(image_path)
response = client.responses.create(
model="apriel-1.6-15b-thinker",
input=[
{
"role": "user",
"content": [
{ "type": "input_text", "text": "Describe the image." },
{
"type": "input_image",
"image_url": f"data:image/jpeg;base64,{base64_image}",
},
],
},
],
)
print(response.output[0].content[0].text)
Simple Text Files
The file data type is not currently supported by any model on the OLCF Inference Service.
You will need to send the full contents of the text file as a long-context string in your request.
jq -n --arg content "$(< <your simple text file>)" '{
"model": "gpt-oss-120b",
"messages": [
{
"role": "user",
"content": ("Process this text.\n\n" + $content)
}
]
}' | curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d @-
from openai import OpenAI
import os
client = OpenAI(
base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
api_key=os.environ.get("S3M_TOKEN")
)
with open("<your simple text file>", "r") as f:
    data = f.read()
response = client.chat.completions.create(
model="nemotron-nano-fp8",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Process this text."
},
{
"type": "text",
"text": f"{data}"
},
],
},
],
stream=False
)
print(response.choices[0].message.content)
Core API Endpoints
Because the service uses a vLLM backend, the API routing and request bodies are compatible with standard OpenAI API formats.
The API base URL is: https://s3m.olcf.ornl.gov/olcf/open/v1/inference
Chat Completions
Endpoint: /chat/completions
Used for conversational interactions and instruction-following models.
curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/chat/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-oss-120b",
"messages": [{"role": "user", "content": "Explain quantum computing."}],
"stream": false
}'
Standard Completions
Endpoint: /completions
Used for traditional text continuation (base models rather than instruction-tuned chat models).
curl -N -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/completions" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "nemotron-nano-fp8",
"prompt": "The future of high-performance computing is",
"max_tokens": 50,
"temperature": 0.7
}'
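A rough Python equivalent of the request above, using the OpenAI client's legacy completions interface with the same client configuration as the earlier examples:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
    api_key=os.environ.get("S3M_TOKEN")
)

# Legacy-style text completion: the model continues the prompt
response = client.completions.create(
    model="nemotron-nano-fp8",
    prompt="The future of high-performance computing is",
    max_tokens=50,
    temperature=0.7
)
print(response.choices[0].text)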
List Models
Endpoint: /model/info
If you need deeper configuration specs—such as maximum context length, supported modalities (e.g., vision/audio), or max output tokens—LiteLLM exposes a custom /model/info endpoint.
Important
Because /model/info is a custom LiteLLM proxy route rather than a standard OpenAI route, you will use the standard Python requests library to fetch this data instead of the OpenAI SDK.
curl -s -X GET "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/model/info" \
-H @.env
import os
import requests
url = "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/model/info"
headers = {
"Authorization": f"Bearer {os.environ.get('S3M_TOKEN')}"
}
response = requests.get(url, headers=headers)
if response.status_code == 200:
specs = response.json()
for model in specs.get("data", []):
name = model.get("model_name")
info = model.get("model_info", {})
print(f"Model: {name}")
print(f" - Max Input Tokens: {info.get('max_input_tokens', 'Unknown')}")
print(f" - Max Output Tokens: {info.get('max_tokens', 'Unknown')}\n")
else:
print(f"Failed to fetch specs: {response.status_code}")
Expected JSON Response:
This will return a richer JSON payload containing the backend parameters and capabilities for each model on the server.
{
"data": [
{
"model_name": "gpt-oss-120b",
"litellm_params": {
"model": "vllm/gpt-oss-120b"
},
"model_info": {
"max_tokens": 8192,
"max_input_tokens": 128000,
"mode": "chat"
}
},
{
"model_name": "nemotron-nano-fp8",
"litellm_params": {
"model": "vllm/nemotron-nano-fp8"
},
"model_info": {
"max_tokens": 4096,
"max_input_tokens": 32768,
"mode": "chat"
}
}
]
}
Embeddings
Endpoint: /embeddings
Generates vector embeddings for a given text.
Note
Embeddings require the use of an embedding-specific model. Check Available Models and List Models for embedding models.
curl -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/embeddings" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text-v2-moe",
"input": "This is good news."
}'
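A rough Python equivalent, using the OpenAI client's embeddings interface with the same client configuration as the earlier examples:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
    api_key=os.environ.get("S3M_TOKEN")
)

response = client.embeddings.create(
    model="nomic-embed-text-v2-moe",
    input="This is good news."
)
vector = response.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding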
Responses
Endpoint: /responses
A newer OpenAI API endpoint that offers performance benefits and some additional features over Chat Completions.
Read more on OpenAI’s Docs: https://developers.openai.com/api/docs/guides/migrate-to-responses
curl -s -X POST "https://s3m.olcf.ornl.gov/olcf/open/v1/inference/responses" \
-H @.env \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-oss-120b",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "What is an AI agent?" },
{ "type": "input_text", "text": "Which agentic framework should I use for gpt-oss-120b?" }
]
}
]
}'
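A rough Python equivalent, using the OpenAI client's responses interface (the Computer Vision section above uses the same interface for image input):
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://s3m.olcf.ornl.gov/olcf/open/v1/inference",
    api_key=os.environ.get("S3M_TOKEN")
)

response = client.responses.create(
    model="gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is an AI agent?"},
                {"type": "input_text", "text": "Which agentic framework should I use for gpt-oss-120b?"}
            ]
        }
    ]
)
print(response.output_text)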
Additional Resources
You can refer to both vLLM’s API reference and OpenAI’s API reference documentation for additional examples and instructions.
vLLM: https://docs.vllm.ai/en/stable/serving/openai_compatible_server/
OpenAI Chat Completions: https://developers.openai.com/api/reference/chat-completions/overview