swyft.store

class swyft.store.DaskSimulator(model, parameter_names, sim_shapes, sim_dtype='f8', fail_on_non_finite=True)[source]

Set up and run the simulator engine, powered by Dask.

Parameters:
  • model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) –

  • parameter_names (Union[Sequence[str], int]) –

  • sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –

  • sim_dtype (str) –

  • fail_on_non_finite (bool) –

classmethod from_command(command, parameter_names, sim_shapes, set_input_method, get_output_method, shell=False, tmpdir=None, sim_dtype='f8')

Set up a simulator from a command-line program.

Parameters:
  • command (str) – Command-line program using shell-like syntax.

  • set_input_method (Callable) – Function to set up the simulator input. It should take one input argument (the array with the input parameters) and return any input to be passed to the program via stdin. If the simulator requires any input files to be present, this function should write them to disk.

  • get_output_method (Callable) – Function to retrieve results from the simulator output. It should take two input arguments (stdout and stderr of the simulator run) and return a dictionary with the simulator output shaped as described by the sim_shapes argument. If the simulator writes output to disk, this function should parse the results from the file(s).

  • shell (bool) – execute the specified command through the shell. NOTE: the following security considerations apply: https://docs.python.org/3/library/subprocess.html#security-considerations

  • tmpdir (Optional[Union[str, Path]]) – Root temporary directory in which to run the simulator. Each instance of the simulator will run in a separate sub-folder. The directory must already exist.

  • parameter_names (Union[Sequence[str], int]) –

  • sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –

  • sim_dtype (str) –

classmethod from_model(model, prior, fail_on_non_finite=True)

Instantiate a Simulator with the correct sim_shapes.

Parameters:
  • model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Simulator model.

  • prior (Prior) – Model prior.

  • fail_on_non_finite (bool) –

Note

The simulator model is run once in order to infer observable shapes from the output.

set_dask_cluster(cluster=None)[source]

Connect to Dask cluster.

Parameters:

cluster – Cluster address or Cluster object from dask.distributed (default is LocalCluster).

Return type:

None

class swyft.store.Dataset(N, prior, store, bound=None, simhook=None, simkeys=None)[source]

Dataset for access to swyft.Store.

Parameters:
  • N (int) – Number of samples.

  • prior (Prior) – Parameter prior.

  • store (swyft.store.store.Store) – Store reference.

  • simhook (Optional[Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]]) – Hook applied on the fly to each sample, called as simhook(x, v).

  • simkeys (Optional[Sequence[Hashable]]) – List of simulation keys that should be exposed (None means that all store sims are exposed)

  • bound (Optional[Bound]) –

Note

swyft.Dataset is essentially a list of indices that point to corresponding entries in the swyft.Store. It is a subclass of torch.utils.data.Dataset and can be used directly for training. Due to the statistical nature of the Store, the returned number of samples is effectively drawn from a Poisson distribution with mean N.
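
To illustrate the Poisson behaviour, the sketch below draws hypothetical dataset sizes with NumPy. It only mimics the statistics described in the note and does not call swyft. The seed and the value of N are arbitrary.

```python
import numpy as np

N = 1000  # requested mean number of samples (arbitrary)
rng = np.random.default_rng(seed=0)

# Each draw mimics the length of a dataset requested with the same N.
lengths = rng.poisson(lam=N, size=10_000)

mean_length = lengths.mean()  # close to N
std_length = lengths.std()    # close to sqrt(N)
```

In practice this suggests relying on len(dataset) rather than on N whenever the actual sample count matters.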

__getitem__(idx)[source]

Return datastore entry.

Return type:

Tuple[Dict[Hashable, Union[ndarray, Tensor]], Tensor, Tensor]

__len__()[source]

Return length of dataset.

Return type:

int

property bound: Bound

Return bound of truncated prior of dataset (swyft.Bound).

classmethod load(filename, store, simhook=None)[source]

Load dataset.

Parameters:
  • filename (Union[str, Path]) –

  • store (Store) – Corresponding datastore.

  • simhook (Optional[Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]]) – Simulation hook.

Warning

Make sure that the store is the same that was originally used for creating the dataset.

property parameter_names: Sequence[str]

Return parameter names (inherited from store and simulator).

property prior: Prior

Return prior of dataset.

property requires_sim: bool

Check if simulations are required for points in the dataset.

save(filename)[source]

Note

The store and the simhook are not saved. They must be loaded independently by the user.

Parameters:

filename (Union[str, Path]) –

Return type:

None
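
Because neither the store nor the simhook is saved, a round trip has to supply both again. A minimal sketch, assuming only the save and load signatures documented here (the helper name save_and_reload is hypothetical):

```python
def save_and_reload(dataset, store, filename, simhook=None):
    """Save a dataset, then load it back with the same store and simhook."""
    dataset.save(filename)
    # load is a classmethod, so it is looked up on the dataset's class.
    return type(dataset).load(filename, store, simhook=simhook)
```

The store passed here should be the one originally used to create the dataset.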

simulate(batch_size=None, wait_for_results=True)[source]

Trigger simulations for points in the dataset.

Parameters:
  • batch_size (Optional[int]) – Number of simulations submitted per batch.

  • wait_for_results (bool) – Wait for simulations to complete before returning.

Return type:

None

property v: ndarray

Return all parameters as (n_points, n_parameters) array.

class swyft.store.Simulator(model, parameter_names, sim_shapes, sim_dtype='f8', fail_on_non_finite=True)[source]

Wrapper class for simulator.

Parameters:
  • model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Model function.

  • parameter_names (Union[Sequence[str], int]) – List of parameter names, or number of parameters (interpreted as ‘z0’, ‘z1’, …).

  • sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) – Dict describing model function output shapes.

  • sim_dtype (str) – Model output data type.

  • fail_on_non_finite (bool) – Whether to flag a simulation as failed if it returns NaN or infinite values. Default: True.
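
The following sketch shows what a non-finite check on a model output could look like. It illustrates the idea behind fail_on_non_finite using NumPy and is not the library's implementation. The function name has_non_finite is hypothetical.

```python
import numpy as np


def has_non_finite(sim):
    """Return True if any output array contains NaN or +/-inf."""
    return any(not np.all(np.isfinite(np.asarray(x))) for x in sim.values())
```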

Examples

>>> import numpy as np
>>> import swyft
>>> def model(v):
...     mu = sum(v)  # mu = x + y + z
...     nu = np.array([v[1], 2*v[2]])  # nu = [y, 2*z]
...     return dict(mu=mu, nu=nu)
>>> simulator = swyft.Simulator(model, ["x", "y", "z"], sim_shapes=dict(mu=(1,), nu=(2,)))

classmethod from_command(command, parameter_names, sim_shapes, set_input_method, get_output_method, shell=False, tmpdir=None, sim_dtype='f8')[source]

Set up a simulator from a command-line program.

Parameters:
  • command (str) – Command-line program using shell-like syntax.

  • set_input_method (Callable) – Function to set up the simulator input. It should take one input argument (the array with the input parameters) and return any input to be passed to the program via stdin. If the simulator requires any input files to be present, this function should write them to disk.

  • get_output_method (Callable) – Function to retrieve results from the simulator output. It should take two input arguments (stdout and stderr of the simulator run) and return a dictionary with the simulator output shaped as described by the sim_shapes argument. If the simulator writes output to disk, this function should parse the results from the file(s).

  • shell (bool) – execute the specified command through the shell. NOTE: the following security considerations apply: https://docs.python.org/3/library/subprocess.html#security-considerations

  • tmpdir (Optional[Union[str, Path]]) – Root temporary directory in which to run the simulator. Each instance of the simulator will run in a separate sub-folder. The directory must already exist.

  • parameter_names (Union[Sequence[str], int]) –

  • sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –

  • sim_dtype (str) –

classmethod from_model(model, prior, fail_on_non_finite=True)[source]

Instantiate a Simulator with the correct sim_shapes.

Parameters:
  • model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Simulator model.

  • prior (Prior) – Model prior.

  • fail_on_non_finite (bool) –

Note

The simulator model is run once in order to infer observable shapes from the output.
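
The idea of inferring shapes from a single run can be sketched with NumPy alone. The helper infer_sim_shapes is hypothetical and is not the library's code. In from_model the parameter vector presumably comes from the prior, whereas a fixed vector is used here.

```python
import numpy as np


def model(v):
    mu = np.array([np.sum(v)])       # shape (1,)
    nu = np.array([v[1], 2 * v[2]])  # shape (2,)
    return dict(mu=mu, nu=nu)


def infer_sim_shapes(model, v):
    """Run the model once and read off the shape of each output."""
    return {key: np.asarray(value).shape for key, value in model(v).items()}


sim_shapes = infer_sim_shapes(model, np.array([0.1, 0.2, 0.3]))
```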

class swyft.store.Store(zarr_store, simulator=None, sync_path=None, chunksize=1, pickle_protocol=4, from_scratch=True)[source]

Store of sample parameters and simulation outputs.

Based on Zarr, it should be instantiated via its methods memory_store, directory_store or load.

Parameters:
  • zarr_store (Union[MemoryStore, DirectoryStore]) – Zarr store object.

  • simulator (Optional[Simulator]) – simulator object.

  • sync_path (Optional[Union[str, Path]]) – if specified, it will enable synchronization using file locks (files will be stored in the given path). Must be accessible to all processes working on the store and the underlying filesystem must support file locking.

  • chunksize (int) – the parameters and simulation output will be stored as arrays with the specified chunk size along the sample dimension (a single chunk will be used for the other dimensions).

  • pickle_protocol (int) – pickle protocol number used for storing intensity functions.

  • from_scratch (bool) – if False, load the sample store from the Zarr store provided.
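
To make the effect of chunksize concrete, the sketch below computes the chunk shape and the number of chunks for one stored array, following the description above (chunking along the sample dimension only). The helper chunk_layout is hypothetical and does not touch Zarr.

```python
import math


def chunk_layout(n_samples, sim_shape, chunksize=1):
    """Return (chunk shape, number of chunks) for one stored array."""
    # One chunk spans `chunksize` samples and the full extent of other axes.
    chunk_shape = (chunksize, *sim_shape)
    n_chunks = math.ceil(n_samples / chunksize)
    return chunk_shape, n_chunks
```

For example, 1000 samples of shape (2,) with chunksize=64 would be laid out as 16 chunks of shape (64, 2), the last one only partly filled.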

__getitem__(i)[source]

Returns data store entry with index i.

Parameters:

i (int) –

Return type:

Tuple[Mapping[str, ndarray], ndarray]

__len__()[source]

Returns number of samples in the store.

Return type:

int

add(N, prior, bound=None)[source]

Adds points to the store.

Parameters:
  • N (int) – Number of samples

  • prior (Prior) – Prior

  • bound (Optional[Bound]) – Bound object for prior truncation

Return type:

None

Warning

Calling this method will alter the content of the store by adding additional points. Currently this cannot be reverted, so use with care when applying it to the DirectoryStore.

property any_failed: bool

Check whether there are parameters which currently lead to a failed simulation.

coverage(N, prior, bound=None)[source]

Returns fraction of already stored data points.

Parameters:
  • N (int) – Number of samples

  • prior (Prior) – Prior

  • bound (Optional[Bound]) – Bound object for prior truncation

Returns:

Fraction of samples that is already covered by content of the store.

Return type:

float

Note

A coverage of zero means that all points need to be newly simulated. A coverage of 1.0 means that all points are already available for this (truncated) prior.

Warning

Results are Monte Carlo estimated and subject to sampling noise.
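
One way to use the coverage estimate is to gauge how many new simulations a request would trigger. The sketch below assumes that roughly (1 - coverage) * N points would need to be simulated. This is an interpretation of the note above, and the result inherits the Monte Carlo noise of the estimate. The helper name is hypothetical.

```python
def expected_new_simulations(store, N, prior, bound=None):
    """Rough estimate of how many of N requested points are not yet stored."""
    coverage = store.coverage(N, prior, bound=bound)
    return round((1.0 - coverage) * N)
```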

classmethod directory_store(path, simulator=None, sync_path=None, overwrite=False)[source]

Instantiate a new Store based on a Zarr DirectoryStore.

Parameters:
  • path (Union[str, Path]) – path to storage directory

  • simulator (Optional[Simulator]) – simulator object

  • sync_path (Optional[Union[str, Path]]) – path for synchronization via file locks (files will be stored in the given path). It must differ from path, it must be accessible to all processes working on the store, and the underlying filesystem must support file locking.

  • overwrite (bool) – if True, and a store already exists at the specified path, overwrite it.

Returns:

Store based on a Zarr DirectoryStore

Return type:

Store
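
Since sync_path must differ from path, a small pre-flight check can catch a misconfiguration before the store is created. This is a sketch using only pathlib, and the helper check_sync_path is hypothetical.

```python
from pathlib import Path


def check_sync_path(path, sync_path):
    """Raise ValueError if sync_path resolves to the same location as path."""
    store_dir = Path(path).resolve()
    lock_dir = Path(sync_path).resolve()
    if store_dir == lock_dir:
        raise ValueError("sync_path must differ from path")
    return store_dir, lock_dir
```

The check does not verify that the filesystem supports file locking or that all processes can reach the directory.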

Example

>>> store = swyft.Store.directory_store(PATH_TO_STORE)

get_simulation_status(indices=None)[source]

Determine the status of sample simulations.

Parameters:

indices (Optional[Sequence[int]]) – List of indices. If None, check the status of all samples

Returns:

list of simulation statuses

Return type:

ndarray

classmethod load(path, simulator=None, sync_path=None)[source]

Open an existing sample store using a Zarr DirectoryStore.

Parameters:
  • path (Union[str, Path]) – path to the Zarr root directory

  • simulator (Optional[Simulator]) – simulator object

  • sync_path (Optional[Union[str, Path]]) – path for synchronization via file locks (files will be stored in the given path). It must differ from path, it must be accessible to all processes working on the store, and the underlying filesystem must support file locking.

Return type:

Store

lock()[source]

Lock store for the current process.

Return type:

None

log_lambda(z)[source]

Intensity function of the store.

Parameters:

z (ndarray) – Array with the sample parameters. Should have shape (num. samples, num. parameters per sample).

Returns:

Array with the sample intensities.

Return type:

ndarray

classmethod memory_store(simulator)[source]

Instantiate a new Store based on a Zarr MemoryStore.

Parameters:

simulator (Simulator) – simulator object

Returns:

Store based on a Zarr MemoryStore

Return type:

Store

Note

The store returned is in general expected to be faster than an equivalent store based on the Zarr DirectoryStore, and thus useful for quick explorations, or for loading data into memory before training.

Example

>>> store = swyft.Store.memory_store(simulator)

requires_sim(indices=None)[source]

Check whether there are parameters which require simulation.

Parameters:

indices (Optional[Sequence[int]]) – List of indices. If None, check all samples.

Returns:

True if one or more samples require simulations, False otherwise.

Return type:

bool

sample(N, prior, bound=None, check_coverage=True, add=False)[source]

Return samples from store.

Parameters:
  • N (int) – Number of samples

  • prior (Prior) – Prior

  • bound (Optional[Bound]) – Bound object for prior truncation

  • check_coverage (bool) – Check whether requested points are contained in the store.

  • add (bool) – If necessary, add requested points to the store.

Returns:

Index list pointing to the relevant store entries.

Return type:

Indices
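
A typical sequence might combine sample, requires_sim and simulate. The sketch below relies only on the signatures documented in this section and takes the store as an argument. The helper name sample_and_simulate is hypothetical.

```python
def sample_and_simulate(store, N, prior, bound=None):
    """Draw indices, adding missing points, then simulate where needed."""
    indices = store.sample(N, prior, bound=bound, add=True)
    if store.requires_sim(indices):
        store.simulate(indices)
    return indices
```

Note that add=True alters the content of the store, which cannot currently be reverted.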

save(path)[source]

Save the Store to disk using a Zarr DirectoryStore.

Parameters:

path (Union[str, Path]) – path at which to create the Zarr root directory

Return type:

None

set_simulator(simulator)[source]

(Re)set simulator.

Parameters:

simulator (Simulator) – Simulator.

Return type:

None

simulate(indices=None, batch_size=None, wait_for_results=True)[source]

Run the simulator on stored parameters that lack corresponding simulations.

Parameters:
  • indices (Optional[Sequence[int]]) – list of sample indices for which a simulation is required

  • batch_size (Optional[int]) – simulations will be submitted in batches of the specified size

  • wait_for_results (Optional[bool]) – if True, return only when all simulations are done

Return type:

None
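
The effect of batch_size can be pictured as splitting the index list into consecutive groups. The sketch below is an illustration in plain Python, not the library's scheduling code. Treating batch_size=None as a single batch is an assumption.

```python
def split_into_batches(indices, batch_size=None):
    """Split indices into consecutive batches of at most batch_size items."""
    indices = list(indices)
    if batch_size is None:
        # Assumption: no batch size means everything goes in one batch.
        return [indices]
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
```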

to_memory()[source]

Make an in-memory copy of the existing Store using a Zarr MemoryStore.

Return type:

Store

unlock()[source]

Unlock store so that other processes can access it.

Return type:

None
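
To make sure a lock is always released, lock and unlock can be paired in a context manager. This is a sketch built on the two documented methods. The name locked is hypothetical and is not part of swyft.

```python
from contextlib import contextmanager


@contextmanager
def locked(store):
    """Hold the store lock for the duration of a with-block."""
    store.lock()
    try:
        yield store
    finally:
        # Release the lock even if the body raises.
        store.unlock()
```

It would be used as `with locked(store): ...`, so that unlock is called even when the body raises.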

wait_for_simulations(indices)[source]

Wait for a set of sample simulations to be finished.

Parameters:

indices (Sequence[int]) – list of sample indices

Return type:

None
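
Together with simulate(..., wait_for_results=False), this method allows other work to proceed while simulations run. A minimal sketch, where simulate_then_wait and the do_other_work callback are hypothetical:

```python
def simulate_then_wait(store, indices, do_other_work, batch_size=None):
    """Submit simulations, do unrelated work, then block until they finish."""
    store.simulate(indices, batch_size=batch_size, wait_for_results=False)
    result = do_other_work()
    store.wait_for_simulations(indices)
    return result
```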