Swift/T

Overview

Swift/T is a completely new implementation of the Swift language for high-performance computing that translates Swift scripts into MPI programs using the Turbine (hence, /T) and ADLB runtime libraries. This tutorial shows how to get up and running with Swift/T on Summit specifically. For more information about Swift/T, please refer to its documentation.

Prerequisites

Swift/T is available as a module on Summit, and it can be loaded as follows:

$ module load workflows
$ module load swift/1.5.0
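
To confirm the module put the launcher on your PATH, you can run the following (assuming the standard swift-t flags, -v should print the Swift/T version):

$ which swift-t
$ swift-t -v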

You will also need to set the PROJECT environment variable to the ID of an allocation you can charge jobs to:

$ export PROJECT="ABC123"

Hello world!

To run an example “Hello world” program with Swift/T on Summit, create a file called hello.swift with the following contents:

trace("Hello world!");

Now, run the program from a shell or script; the -m lsf flag tells Swift/T to submit the job through Summit's LSF scheduler:

$ swift-t -m lsf hello.swift

The output should look something like the following:

TURBINE-LSF SCRIPT
NODES=2
PROCS=2
PPN=1
TURBINE_OUTPUT=/ccs/home/seanwilk/turbine-output/2021/06/18/17/11/29
wrote: /ccs/home/seanwilk/turbine-output/2021/06/18/17/11/29/turbine-lsf.sh
PWD: /autofs/nccs-svm1_home2/seanwilk/turbine-output/2021/06/18/17/11/29
Job <1095064> is submitted to default queue <batch>.
JOB_ID=1095064
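
The defaults shown above (PROCS=2, PPN=1) can be overridden through environment variables that the Turbine LSF script reads, for example (using the same variables that appear in the output above and in the later example):

$ export PROCS=4           # total MPI processes
$ export PPN=2             # processes per node
$ export WALLTIME=00:05:00
$ swift-t -m lsf hello.swift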

Congratulations! You have now submitted a Swift/T job to Summit. Inspect the TURBINE_OUTPUT directory to find the workflow outputs and other artifacts.
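
The program's standard output is captured in a file named output.txt under TURBINE_OUTPUT (the same file the cross-facility example below reads from), so once the job finishes you can check it with, e.g.:

$ cat /ccs/home/seanwilk/turbine-output/2021/06/18/17/11/29/output.txt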

Cross Facility Workflow

This example demonstrates a continuously running cross-facility workflow. The idea is that a science facility (e.g., SNS at ORNL) produces scientific data to be processed by a remote compute facility (e.g., OLCF at ORNL). Data continuously arrives from the science facility into a designated directory at the compute facility. The workflow picks up data from that directory and processes it to produce output. The Swift source file workflow.swift looks as follows:

import files;
import io;

// App function that runs the processing script on one input file;
// the void output serves as a completion signal for each call.
app (void v) processdata(file f)
{
  // change path per your location
  "/gpfs/alpine/scratch/ketan2/stf019/swift-work/cross-facility/processdata.sh" f;
}

// Loop indefinitely: each pass globs the directory and processes the
// files currently present; c = true triggers the next iteration.
for (boolean b = true; b; b = c)
{
  boolean c;
  // You can change the number of data files while the workflow is running
  file data[] = glob("*.jpg");
  void V[];
  foreach f, i in data
  {
    V[i] = processdata(f);
  }
  printf("processed %i files.", size(V)) => c = true;
}
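
Before submitting to the scheduler, you can smoke-test the workflow locally, since swift-t without -m runs the program with a local MPI launch. Note that the loop runs until interrupted (Ctrl-C), and this assumes the processdata.sh path in the script is valid and some .jpg files exist in the current directory:

$ swift-t -O0 workflow.swift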

In order to demonstrate the data generation, we have a script that periodically downloads image data from the NOAA website. The image is a geographical image showing the current cloud cover over the southeastern US. The script gendata.sh looks like so:

#!/bin/bash
set -eu

# Remove the downloaded images when the script exits
function cleanup() {
  \rm -f ./data/earth*.jpg
}
trap cleanup EXIT

# Fetch the latest GOES-16 southeast-sector image every 5 seconds,
# giving each download a unique file name
while true
do
  uid=$(uuidgen | awk -F- '{print $1}')
  wget -q https://cdn.star.nesdis.noaa.gov/GOES16/ABI/SECTOR/se/GEOCOLOR/1200x1200.jpg -O ./data/earth${uid}.jpg
  sleep 5
done

Next, we have the data processing script called processdata.sh that looks as follows:

#!/bin/bash
set -eu

TASK=convert
DATA=$1
echo -e "\nProcessing ${DATA}\n"
# Map near-white pixels (10% fuzz) to pure white and everything else
# to black, then print the mean pixel value scaled to a percentage
${TASK} ${DATA} -fuzz 10% -fill white -opaque white -fill black +opaque white -format "%[fx:100*mean]" info:
sleep 5

The above script estimates the cloud cover percentage from the fraction of white pixels in the image: near-white pixels are forced to pure white, all remaining pixels to black, and the fx expression 100*mean then yields the percentage of white pixels. Note that it uses ImageMagick’s convert utility.
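
You can exercise the script by hand on a single downloaded image; the file name below is hypothetical (the uid portion comes from uuidgen):

$ module load imagemagick
$ ./processdata.sh data/earth3f2a1b9c.jpg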

The suggested directory structure is an outer directory, say swift-work, that holds the Swift source and the shell scripts. Inside swift-work, create a new directory called data.
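
With the file names from the listings above, the layout looks like this:

swift-work/
├── workflow.swift
├── gendata.sh
├── processdata.sh
└── data/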

Additionally, we will need two terminals open. In the first terminal window, navigate to the swift-work directory and invoke the data generation script like so:

$ ./gendata.sh

In the second terminal, we will run the Swift workflow as follows (make sure to change the project name and the TURBINE_OUTPUT path per your allocation and location):

$ module load imagemagick # for convert utility
$ export WALLTIME=00:10:00
$ export PROJECT=STF019
$ export TURBINE_OUTPUT=/gpfs/alpine/scratch/ketan2/stf019/swift-work/cross-facility/data
$ swift-t -O0 -m lsf workflow.swift

If all goes well, once the job starts running, the output will be written to the file output.txt in the data directory (the TURBINE_OUTPUT location set above).
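
Since the workflow runs continuously, you can watch results stream in with, e.g. (path per the TURBINE_OUTPUT setting above):

$ tail -f /gpfs/alpine/scratch/ketan2/stf019/swift-work/cross-facility/data/output.txt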