swyft.store
- class swyft.store.DaskSimulator(model, parameter_names, sim_shapes, sim_dtype='f8', fail_on_non_finite=True)
Set up and run the simulator engine, powered by dask.
- Parameters:
model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) –
parameter_names (Union[Sequence[str], int]) –
sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –
sim_dtype (str) –
fail_on_non_finite (bool) –
- classmethod from_command(command, parameter_names, sim_shapes, set_input_method, get_output_method, shell=False, tmpdir=None, sim_dtype='f8')
Set up a simulator from a command line program.
- Parameters:
command (str) – Command-line program using shell-like syntax.
set_input_method (Callable) – Function to set up the simulator input. It should take one input argument (the array with the input parameters), and return any input to be passed to the program via stdin. If the simulator requires any input files to be present, this function should write these to disk.
get_output_method (Callable) – Function to retrieve results from the simulator output. It should take two input arguments (stdout and stderr of the simulator run) and return a dictionary with the simulator output shaped as described by the sim_shapes argument. If the simulator writes output to disk, this function should parse the results from the file(s).
shell (bool) – Execute the specified command through the shell. NOTE: the following security considerations apply: https://docs.python.org/3/library/subprocess.html#security-considerations
tmpdir (Optional[Union[str, Path]]) – Root temporary directory in which to run the simulator. Each instance of the simulator will run in a separate sub-folder. The directory must already exist.
parameter_names (Union[Sequence[str], int]) –
sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –
sim_dtype (str) –
- classmethod from_model(model, prior, fail_on_non_finite=True)
Instantiate a Simulator with the correct sim_shapes.
- Parameters:
model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Simulator model.
prior (Prior) – Model prior.
fail_on_non_finite (bool) –
Note
The simulator model is run once in order to infer observable shapes from the output.
- class swyft.store.Dataset(N, prior, store, bound=None, simhook=None, simkeys=None)
Dataset for access to swyft.Store.
- Parameters:
N (int) – Number of samples.
prior (Prior) – Parameter prior.
store (swyft.store.store.Store) – Store reference.
simhook (Optional[Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]]) – Hook applied on the fly to each sample; called as simhook(x, v).
simkeys (Optional[Sequence[Hashable]]) – List of simulation keys that should be exposed (None means that all store sims are exposed)
bound (Optional[Bound]) –
Note
swyft.Dataset is essentially a list of indices that point to corresponding entries in the swyft.Store. It is a subclass of torch.utils.data.Dataset and can be used directly for training. Due to the statistical nature of the Store, the returned number of samples is effectively drawn from a Poisson distribution with mean N.
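To make the Poisson behaviour in the note concrete, the following sketch draws a few realised sample counts with NumPy. It does not call swyft; the value of `N` and the seed are arbitrary choices for illustration.

```python
import numpy as np

N = 1000  # requested mean number of samples (illustrative value)
rng = np.random.default_rng(seed=0)

# Each draw mimics the realised length of a dataset requested with mean N.
realised_sizes = rng.poisson(lam=N, size=5)

# The counts scatter around N with a standard deviation of sqrt(N).
expected_scatter = np.sqrt(N)
print(realised_sizes, expected_scatter)
```

In practice this means that code consuming a dataset should use its actual length rather than assume exactly N entries.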
- __getitem__(idx)
Return datastore entry.
- Return type:
Tuple[Dict[Hashable, Union[ndarray, Tensor]], Tensor, Tensor]
- classmethod load(filename, store, simhook=None)
Load dataset.
- Parameters:
filename (Union[str, Path]) –
store (Store) – Corresponding datastore.
simhook (Optional[Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]]) – Simulation hook.
Warning
Make sure that the store is the same that was originally used for creating the dataset.
- property parameter_names: Sequence[str]
Return parameter names (inherited from store and simulator).
- property requires_sim: bool
Check if simulations are required for points in the dataset.
- save(filename)
Note
The store and the simhook are not saved. They must be loaded independently by the user.
- Parameters:
filename (Union[str, Path]) –
- Return type:
None
- simulate(batch_size=None, wait_for_results=True)
Trigger simulations for points in the dataset.
- Parameters:
batch_size (Optional[int]) – Number of batched simulations.
wait_for_results (bool) – Wait for simulations to complete before returning.
- Return type:
None
- property v: ndarray
Return all parameters as (n_points, n_parameters) array.
- class swyft.store.Simulator(model, parameter_names, sim_shapes, sim_dtype='f8', fail_on_non_finite=True)
Wrapper class for simulator.
- Parameters:
model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Model function.
parameter_names (Union[Sequence[str], int]) – List of parameter names, or number of parameters (interpreted as ‘z0’, ‘z1’, …).
sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) – Dict describing model function output shapes.
sim_dtype (str) – Model output data type.
fail_on_non_finite (bool) – Whether to return an invalid code if the simulation returns NaN or infinite values. Defaults to True.
Examples
>>> import numpy as np
>>> import swyft
>>> def model(v):
...     mu = sum(v)  # mu = x + y + z
...     nu = np.array([v[1], 2*v[2]])  # nu = [y, 2*z]
...     return dict(mu=mu, nu=nu)
>>> simulator = swyft.Simulator(model, ["x", "y", "z"], sim_shapes=dict(mu=(1,), nu=(2,)))
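The fail_on_non_finite flag concerns model outputs that contain NaN or infinite values. The sketch below shows one way such a check could be written with NumPy. The helper `is_finite_output` is hypothetical and is not taken from swyft's implementation.

```python
import numpy as np

def is_finite_output(sim):
    """Return True if every array in the simulation dict is finite.

    Hypothetical helper for illustration; not part of swyft.
    """
    return all(np.all(np.isfinite(np.asarray(x))) for x in sim.values())

good = dict(mu=np.array([6.0]), nu=np.array([2.0, 6.0]))
bad = dict(mu=np.array([np.nan]), nu=np.array([2.0, np.inf]))

print(is_finite_output(good), is_finite_output(bad))
```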
- classmethod from_command(command, parameter_names, sim_shapes, set_input_method, get_output_method, shell=False, tmpdir=None, sim_dtype='f8')
Set up a simulator from a command line program.
- Parameters:
command (str) – Command-line program using shell-like syntax.
set_input_method (Callable) – Function to set up the simulator input. It should take one input argument (the array with the input parameters), and return any input to be passed to the program via stdin. If the simulator requires any input files to be present, this function should write these to disk.
get_output_method (Callable) – Function to retrieve results from the simulator output. It should take two input arguments (stdout and stderr of the simulator run) and return a dictionary with the simulator output shaped as described by the sim_shapes argument. If the simulator writes output to disk, this function should parse the results from the file(s).
shell (bool) – Execute the specified command through the shell. NOTE: the following security considerations apply: https://docs.python.org/3/library/subprocess.html#security-considerations
tmpdir (Optional[Union[str, Path]]) – Root temporary directory in which to run the simulator. Each instance of the simulator will run in a separate sub-folder. The directory must already exist.
parameter_names (Union[Sequence[str], int]) –
sim_shapes (Mapping[Hashable, Union[Size, Tuple[int, ...]]]) –
sim_dtype (str) –
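As a minimal sketch of the two callbacks described above, the functions below format parameters for stdin and parse numbers from stdout. The names `set_input` and `get_output`, the whitespace-separated text format, and the assumption that stdout arrives as a string are all illustrative; the actual program and its I/O format determine what these functions must do.

```python
import numpy as np

def set_input(v):
    """Format the parameter array as one whitespace-separated stdin line."""
    return " ".join(f"{x:.17g}" for x in np.asarray(v, dtype=float)) + "\n"

def get_output(stdout, stderr):
    """Parse whitespace-separated numbers printed by the program.

    Assumes the program prints three numbers: one for mu, two for nu.
    stderr is accepted to match the two-argument contract but is unused.
    """
    values = np.array(stdout.split(), dtype=float)
    return dict(mu=values[:1], nu=values[1:3])

# Exercise the callbacks on hand-written text instead of a real program.
stdin_text = set_input([1.0, 2.0, 3.0])
sim = get_output("6.0 2.0 6.0\n", "")
print(repr(stdin_text), sim)
```

These functions would be passed as set_input_method and get_output_method, with sim_shapes=dict(mu=(1,), nu=(2,)) matching the shapes returned by `get_output`.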
- classmethod from_model(model, prior, fail_on_non_finite=True)
Instantiate a Simulator with the correct sim_shapes.
- Parameters:
model (Callable[[...], Dict[Hashable, Union[ndarray, Tensor]]]) – Simulator model.
prior (Prior) – Model prior.
fail_on_non_finite (bool) –
Note
The simulator model is run once in order to infer observable shapes from the output.
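The idea of inferring shapes from a single model run can be sketched as follows. The helper `infer_sim_shapes` is hypothetical and only illustrates the principle; `v_example` stands in for a parameter vector that would otherwise come from the prior.

```python
import numpy as np

def model(v):
    mu = np.array([np.sum(v)])       # shape (1,)
    nu = np.array([v[1], 2 * v[2]])  # shape (2,)
    return dict(mu=mu, nu=nu)

def infer_sim_shapes(model, v_example):
    """Run the model once and record the shape of each output array.

    Hypothetical helper for illustration; not part of swyft.
    """
    sim = model(v_example)
    return {key: tuple(np.shape(value)) for key, value in sim.items()}

sim_shapes = infer_sim_shapes(model, np.array([1.0, 2.0, 3.0]))
print(sim_shapes)
```

One consequence of this approach is that the model should return arrays of the same shape for every parameter vector, since only one run is inspected.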
- class swyft.store.Store(zarr_store, simulator=None, sync_path=None, chunksize=1, pickle_protocol=4, from_scratch=True)
Store of sample parameters and simulation outputs.
Based on Zarr, it should be instantiated via its methods memory_store, directory_store or load.
- Parameters:
zarr_store (Union[MemoryStore, DirectoryStore]) – Zarr store object.
simulator (Optional[Simulator]) – simulator object.
sync_path (Optional[Union[str, Path]]) – if specified, it will enable synchronization using file locks (files will be stored in the given path). Must be accessible to all processes working on the store and the underlying filesystem must support file locking.
chunksize (int) – the parameters and simulation output will be stored as arrays with the specified chunk size along the sample dimension (a single chunk will be used for the other dimensions).
pickle_protocol (int) – pickle protocol number used for storing intensity functions.
from_scratch (bool) – if False, load the sample store from the Zarr store provided.
- __getitem__(i)
Returns data store entry with index \(i\).
- Parameters:
i (int) –
- Return type:
Tuple[Mapping[str, ndarray], ndarray]
- add(N, prior, bound=None)
Adds points to the store.
- Parameters:
- Return type:
None
Warning
Calling this method will alter the content of the store by adding additional points. Currently this cannot be reverted, so use with care when applying it to the DirectoryStore.
- property any_failed: bool
Check whether there are parameters which currently lead to a failed simulation.
- coverage(N, prior, bound=None)
Returns fraction of already stored data points.
- Parameters:
- Returns:
Fraction of samples that is already covered by content of the store.
- Return type:
float
Note
A coverage of zero means that all points need to be newly simulated. A coverage of 1.0 means that all points are already available for this (truncated) prior.
Warning
Results are Monte Carlo estimated and subject to sampling noise.
- classmethod directory_store(path, simulator=None, sync_path=None, overwrite=False)
Instantiate a new Store based on a Zarr DirectoryStore.
- Parameters:
path (Union[str, Path]) – path to storage directory
simulator (Optional[Simulator]) – simulator object
sync_path (Optional[Union[str, Path]]) – path for synchronization via file locks (files will be stored in the given path). It must differ from path, it must be accessible to all processes working on the store, and the underlying filesystem must support file locking.
overwrite (bool) – if True, and a store already exists at the specified path, overwrite it.
- Returns:
Store based on a Zarr DirectoryStore
- Return type:
Store
Example
>>> store = swyft.Store.directory_store(PATH_TO_STORE)
- get_simulation_status(indices=None)
Determine the status of sample simulations.
- Parameters:
indices (Optional[Sequence[int]]) – List of indices. If None, check the status of all samples
- Returns:
list of simulation statuses
- Return type:
ndarray
- classmethod load(path, simulator=None, sync_path=None)
Open an existing sample store using a Zarr DirectoryStore.
- Parameters:
path (Union[str, Path]) – path to the Zarr root directory
simulator (Optional[Simulator]) – simulator object
sync_path (Optional[Union[str, Path]]) – path for synchronization via file locks (files will be stored in the given path). It must differ from path, it must be accessible to all processes working on the store, and the underlying filesystem must support file locking.
- Return type:
Store
- log_lambda(z)
Intensity function of the store.
- Parameters:
z (ndarray) – Array with the sample parameters. Should have shape (num. samples, num. parameters per sample).
- Returns:
Array with the sample intensities.
- Return type:
ndarray
- classmethod memory_store(simulator)
Instantiate a new Store based on a Zarr MemoryStore.
- Parameters:
simulator (Simulator) – simulator object
- Returns:
Store based on a Zarr MemoryStore
- Return type:
Store
Note
The store returned is in general expected to be faster than an equivalent store based on the Zarr DirectoryStore, and thus useful for quick explorations, or for loading data into memory before training.
Example
>>> store = swyft.Store.memory_store(simulator)
- requires_sim(indices=None)
Check whether there are parameters which require simulation.
- Parameters:
indices (Optional[Sequence[int]]) – List of indices. If None, check all samples.
- Returns:
True if one or more samples require simulations, False otherwise.
- Return type:
bool
- sample(N, prior, bound=None, check_coverage=True, add=False)
Return samples from store.
- Parameters:
- Returns:
Index list pointing to the relevant store entries.
- Return type:
Indices
- save(path)
Save the Store to disk using a Zarr DirectoryStore.
- Parameters:
path (Union[str, Path]) – path at which to create the Zarr root directory
- Return type:
None
- set_simulator(simulator)
(Re)set simulator.
- Parameters:
simulator (Simulator) – Simulator.
- Return type:
None
- simulate(indices=None, batch_size=None, wait_for_results=True)
Run the simulator on stored parameters that lack corresponding simulations.
- Parameters:
indices (Optional[Sequence[int]]) – list of sample indices for which a simulation is required
batch_size (Optional[int]) – simulations will be submitted in batches of the specified size
wait_for_results (Optional[bool]) – if True, return only when all simulations are done
- Return type:
None
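Submitting simulations in batches of a given size can be pictured with the sketch below, which splits a list of sample indices into consecutive groups. The helper `make_batches` is hypothetical, and treating batch_size=None as a single batch is an assumption made for illustration rather than documented swyft behaviour.

```python
def make_batches(indices, batch_size=None):
    """Split indices into consecutive batches of at most batch_size items.

    Hypothetical helper for illustration; not part of swyft.
    """
    indices = list(indices)
    if batch_size is None:
        # Assumption: no batch size means everything goes in one batch.
        return [indices] if indices else []
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]

batches = make_batches(range(7), batch_size=3)
print(batches)
```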