Using an external simulator

As demonstrated in the Quickstart notebook, SWYFT lets the user define a simulator in Python for use in the inference problem. Importantly, SWYFT also lets the user employ any simulator that can be called from the command line. Users can therefore keep working with the simulators they are familiar with, without having to worry about how those simulators are implemented.

In this notebook, based on the Quickstart example, we demonstrate the use of an external simulator.

NB: Here we demonstrate the use of SWYFT’s command-line-based simulator invocation. The user can of course also write a Python wrapper for the simulator in question.

[1]:
%load_ext autoreload
%autoreload 2
[2]:
# DON'T FORGET TO ACTIVATE THE GPU when on google colab (Edit > Notebook settings)
from os import environ
GOOGLE_COLAB = "COLAB_GPU" in environ
if GOOGLE_COLAB:
    !pip install git+https://github.com/undark-lab/swyft.git
[3]:
import numpy as np
import torch
import pylab as plt
import os

import swyft
[4]:
# Set randomness
np.random.seed(25)
torch.manual_seed(25)

# cwd
cwd = os.getcwd()

# swyft
device = 'cpu'
n_training_samples = 100
n_parameters = 2
observation_key = "x"

Set input …

To make use of an external simulator called from the command line, the user must specify a function that sets up the simulator input. It should take one input argument (the array with the input parameters) and return any input to be passed to the program via stdin. If the simulator requires any input files to be present, this function should write them to disk.

[5]:
def set_input(v):
    v0 = v[0]
    v1 = v[1]
    v_str = str(v0).strip()+' '+str(v1).strip()
    return v_str
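
If the simulator reads its parameters from a file rather than from stdin, the same function can write that file. The following is a minimal sketch, not part of SWYFT: the function name set_input_file and the file name params.txt are hypothetical, and it assumes that the function is called with the simulator's run directory as the current working directory and that an empty string is acceptable when nothing needs to be passed via stdin.

```python
import numpy as np


def set_input_file(v, filename="params.txt"):
    # Hypothetical variant of set_input: write the parameters to a file that
    # the simulator is assumed to read from its working directory.
    np.savetxt(filename, np.atleast_2d(v))
    # Assumption: nothing has to be sent on stdin, so return an empty string.
    return ""
```

The parameters end up as one whitespace-separated row, which a simulator could read back with np.loadtxt or an equivalent routine in its own language.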

… output methods …

Analogously, the user must define a function to retrieve results from the simulator output. It should take two input arguments (stdout and stderr of the simulator run) and return a dictionary with the simulator output shaped as described by the sim_shapes argument. If the simulator writes output to disk, this function should parse the results from the file(s).

[6]:
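def get_output(stdout, stderr):
    # Treat anything written to stderr as a failed simulator run
    if stderr:
        raise RuntimeError('simulator returned on stderr: ' + stderr)
    try:
        # The simulator prints two whitespace-separated floats
        x0, x1 = stdout.split()
        x = np.array([float(x0), float(x1)])
    except ValueError as err:
        raise RuntimeError('Error in output retrieval') from err
    return dict(x=x)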
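
When the simulator writes its results to disk instead of stdout, the retrieval function can read them back from the file. This is a minimal sketch under stated assumptions: get_output_file and output.txt are hypothetical names, the file is assumed to hold the values as whitespace-separated numbers, and the function is assumed to be called from the directory in which the simulator ran.

```python
import numpy as np


def get_output_file(stdout, stderr, filename="output.txt"):
    # Hypothetical variant of get_output: the simulator is assumed to have
    # written its result to `filename` in the current working directory.
    if stderr:
        raise RuntimeError("simulator returned on stderr: " + stderr)
    x = np.loadtxt(filename).reshape(-1)  # flatten to a 1-D array
    return dict(x=x)  # the key must match the one used in sim_shapes
```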

… and invocation

Here we use the %%writefile cell magic to create an external Python script, randgauss.py, containing the simulator defined as model in the Quickstart notebook. This script is then invoked from the command line.

[7]:
%%writefile randgauss.py
#!/usr/bin/env python

import numpy as np
import sys



def rgmodel(v,sigma=0.05):
    x = v + np.random.randn(2)*sigma
    return x

def main():
    sigma = None
    args = sys.stdin.readline()
    arg1, arg2 = args.split(' ')
    try:
        v0 = float(arg1.rstrip())
        v1 = float(arg2.rstrip())

    except ValueError:
        # Re-raise so that the parsing error is reported on stderr
        raise

    v = np.array([v0,v1])

    if sigma is not None:
        x = rgmodel(v,sigma=sigma)
    else:
        x = rgmodel(v)

    print(str(x[0]).strip()+' '+str(x[1]).strip())



if __name__ == "__main__":
    main()


Overwriting randgauss.py

It is up to the user to ensure adequate permissions for all relevant files.

[8]:
!chmod 755 randgauss.py
command = cwd+'/randgauss.py'

The user must also ensure that the root temporary directory in which the simulator is run exists. Each instance of the simulator will run in a separate sub-folder.

[9]:
!mkdir -p ./tmp
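
Before handing the command to SWYFT, the round trip from parameters to parsed output can be checked by hand with the standard subprocess module. This is only a sketch of the stdin-to-stdout exchange, not SWYFT's own implementation. The helper run_once, the functions to_stdin and from_stdout, and the noiseless stand-in script echo_sim.py are hypothetical and exist only to keep the example self-contained; in this notebook one would pass command, set_input and get_output instead.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

import numpy as np


def run_once(command, v, set_input, get_output, workdir):
    # Hypothetical helper: feed the input string to the program on stdin,
    # capture stdout/stderr as text, and parse them into a dictionary.
    result = subprocess.run(
        command, input=set_input(v), capture_output=True, text=True, cwd=workdir
    )
    return get_output(result.stdout, result.stderr)


# Noiseless stand-in for randgauss.py: echoes the two parameters it receives.
SCRIPT = (
    "import sys\n"
    "v0, v1 = sys.stdin.readline().split()\n"
    "print(float(v0), float(v1))\n"
)


def to_stdin(v):
    return f"{v[0]} {v[1]}"


def from_stdout(stdout, stderr):
    if stderr:
        raise RuntimeError(stderr)
    return dict(x=np.array([float(s) for s in stdout.split()]))


with tempfile.TemporaryDirectory() as workdir:
    script_path = Path(workdir) / "echo_sim.py"
    script_path.write_text(SCRIPT)
    out = run_once(
        [sys.executable, str(script_path)],
        np.array([0.25, -0.5]),
        to_stdin,
        from_stdout,
        workdir,
    )

print(out["x"].shape)
```

Checking that the returned array has the shape declared for the simulator output catches most mismatches between the two helper functions and the program before any training samples are simulated.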

Defining the simulator

The simulator itself can then be defined using the from_command() method of the Simulator class.

[10]:
simulator = swyft.Simulator.from_command(
    command=command,
    parameter_names=["x0","x1"],
    sim_shapes=dict(x=(n_parameters,)),
    set_input_method=set_input,
    get_output_method=get_output,
    tmpdir=cwd+'/tmp/',
    shell = False
)

PLEASE NOTE

The from_command() method of the Simulator class makes use of the subprocess module to execute a command line program or function.

As in the subprocess module, the shell keyword defaults to False. In some cases, however, the user may want to execute their program or function via the shell. This enables the invocation and use of different environments, as well as features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.

This can be achieved by setting shell = True in the from_command() method.

We do, however, encourage the user to be aware of the security considerations connected to the use of shell = True in the subprocess module.
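
The main concern is that a shell interprets the command string, so any text that ends up in it can change what is executed. The difference can be illustrated with the subprocess module directly. This is a minimal sketch, assuming a POSIX shell with echo available; the string untrusted is a made-up example of a value that should have been a single number.

```python
import shlex
import subprocess
import sys

untrusted = "1.0; echo injected"

# shell=False with an argument list: the string reaches the program verbatim.
literal = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
)

# shell=True: the shell parses the string, so ';' starts a second command.
interpreted = subprocess.run(
    "echo " + untrusted, shell=True, capture_output=True, text=True
)

# shlex.quote makes the shell treat the value as one literal argument again.
quoted = subprocess.run(
    "echo " + shlex.quote(untrusted), shell=True, capture_output=True, text=True
)

print(repr(literal.stdout))
print(repr(interpreted.stdout))
print(repr(quoted.stdout))
```

Only the second call runs the extra echo command. If shell = True is needed, quoting every value that is interpolated into the command string is a reasonable precaution.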

The remaining workflow is identical to that described in Quickstart.ipynb.

[11]:
store = swyft.Store.memory_store(simulator)
Creating new store.
[12]:
low = -1 * np.ones(n_parameters)
high = 1 * np.ones(n_parameters)
prior = swyft.get_uniform_prior(low, high)

# The number of samples drawn from the store is Poisson distributed. Simulating slightly more than we need avoids attempting to draw more than we have.
store.add(n_training_samples + 0.02 * n_training_samples, prior)
store.simulate()
Store: Adding 115 new samples to simulator store.
[13]:
dataset = swyft.Dataset(n_training_samples, prior, store)

The store / dataset is populated with samples drawn from an external simulator.

[14]:
print(len(dataset))
113
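
The dataset length differs from n_training_samples because, as noted in the comment above, the number of samples drawn is Poisson distributed. The spread to expect can be sketched with NumPy alone. This assumes that the realised count follows a Poisson distribution whose mean equals the requested number; SWYFT's internals may differ in detail, and the names requested and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
requested = 100  # plays the role of n_training_samples

# Assumption: the realised count is Poisson with mean equal to the request.
counts = rng.poisson(lam=requested, size=10_000)

print("mean:", counts.mean())
print("standard deviation:", counts.std())
```

For a Poisson distribution the standard deviation is the square root of the mean, about 10 here, so a count such as 113 for a request of 100 is unremarkable.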