The Python API is a concise API for using LiberTEM from Python code. It is suitable both for interactive scripting, for example from Jupyter notebooks, and for usage from within a Python application or script.
This basic example loads the API, creates a local cluster, loads a file and runs an analysis. For complete examples of how to use the Python API, please see the Jupyter notebooks in the example directory.
```python
import sys
import logging

# Changed in 0.5.0: The thread count is set dynamically
# on the workers. No need for setting environment variables anymore.

import numpy as np
import matplotlib.pyplot as plt
from libertem import api

logging.basicConfig(level=logging.WARNING)

# Protect the entry point.
# LiberTEM uses dask, which uses multiprocessing to
# start worker processes.
# https://docs.python.org/3/library/multiprocessing.html
if __name__ == '__main__':

    # api.Context() starts a new local cluster.
    # The "with" clause makes sure we shut it down in the end.
    with api.Context() as ctx:
        try:
            # Use the file given as the first command line argument, if any
            path = sys.argv[1]
            ds = ctx.load(
                'auto',
                path=path,
            )
        except IndexError:
            # No argument given: fall back to an example EMPAD file
            # (adjust this path to your system)
            path = ('C:/Users/weber/Nextcloud/Projects/'
                    'Open Pixelated STEM framework/Data/EMPAD/'
                    'scan_11_x256_y256.emd')
            ds = ctx.load(
                'hdf5',
                path=path,
                ds_path='experimental/science_data/data',
            )

        (scan_y, scan_x, detector_y, detector_x) = ds.shape
        mask_shape = (detector_y, detector_x)

        # LiberTEM sends functions that create the masks
        # rather than mask data to the workers in order
        # to reduce transfers in the cluster.
        def mask():
            return np.ones(shape=mask_shape)

        analysis = ctx.create_mask_analysis(dataset=ds, factories=[mask])

        result = ctx.run(analysis, progress=True)

        # Do something useful with the result:
        print(result)
        print(result.mask_0.raw_data)

        # For each mask, one channel is present in the result.
        # This may be different for other analyses.
        # You can access the result channels by their key on
        # the result object:
        plt.figure()
        plt.imshow(result.mask_0.raw_data)
        plt.show()

        # Otherwise, results behave like lists.
        # For example, you can iterate over the result channels:
        raw_result_list = [channel.raw_data for channel in result]
```
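To make the result of a mask analysis more concrete, here is a pure NumPy sketch of the computation it performs: for every scan position, each detector frame is multiplied element-wise with each mask and summed, giving one channel per mask. It runs on a small synthetic array rather than a real dataset and does not reflect how LiberTEM distributes the work. The helper `apply_masks` and the use of `functools.partial` objects as mask factories are hypothetical choices for illustration; `channels[0]` plays roughly the role of `result.mask_0.raw_data`. The last lines hint at why sending a factory can be cheaper than sending the mask itself.

```python
import functools
import pickle

import numpy as np

# Small synthetic 4D dataset: (scan_y, scan_x, detector_y, detector_x)
rng = np.random.default_rng(0)
data = rng.random((4, 5, 16, 16)).astype(np.float32)
scan_y, scan_x, detector_y, detector_x = data.shape
mask_shape = (detector_y, detector_x)

# Mask factories: callables that build a mask on demand.
# functools.partial objects can be pickled with the standard
# pickle module, unlike nested functions.
factories = [
    functools.partial(np.ones, mask_shape),               # sum of each frame
    functools.partial(np.eye, detector_y, detector_x),    # trace of each frame
]


def apply_masks(data, factories):
    """Hypothetical helper: return one (scan_y, scan_x) channel per factory."""
    masks = np.stack([f() for f in factories])  # (n_masks, detector_y, detector_x)
    # Contract the detector axes of every frame with every mask
    return np.einsum('yxij,mij->myx', data, masks)


channels = apply_masks(data, factories)
print(channels.shape)  # (2, 4, 5)

# Shipping a factory instead of the mask array keeps the payload small
factory_bytes = len(pickle.dumps(functools.partial(np.ones, (256, 256))))
mask_bytes = len(pickle.dumps(np.ones((256, 256))))
print(factory_bytes < mask_bytes)  # True
```

With an all-ones mask the channel is simply the total intensity of each frame, which is what the `mask()` factory in the example computes.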
Connect to a cluster
See Starting a custom cluster for how to start a scheduler and workers.
```python
from libertem import api
from libertem.executor.dask import DaskJobExecutor

# Connect to a Dask.Distributed scheduler at 'tcp://localhost:8786'
with DaskJobExecutor.connect('tcp://localhost:8786') as executor:
    ctx = api.Context(executor=executor)
    ...
```
Customize CPUs and CUDA devices
To control how many CPUs and which CUDA devices are used, you can specify them as follows:
```python
from libertem import api
from libertem.executor.dask import DaskJobExecutor, cluster_spec
from libertem.utils.devices import detect

# Find out what would be used, if you like
# returns dictionary with keys "cpus" and "cudas", each with a list of device ids
devices = detect()

# Example: Deactivate CUDA devices by removing them from the device dictionary
devices['cudas'] = []

# Example: Deactivate CuPy integration
devices['has_cupy'] = False

# Example: Use 3 CPUs. The IDs are ignored at the moment, i.e. no CPU pinning
devices['cpus'] = range(3)

# Generate a spec for a Dask.distributed SpecCluster
# Relevant kwargs match the dictionary entries
spec = cluster_spec(**devices)

# Start a local cluster with the custom spec
with DaskJobExecutor.make_local(spec=spec) as executor:
    ctx = api.Context(executor=executor)
    ...
```
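If you adjust the device selection in several places, it can help to wrap the dictionary edits in a small function that leaves the detected devices untouched. The sketch below is hypothetical: `restrict_devices` is not part of LiberTEM, and the hand-written `detected` dictionary only mimics the shape assumed for the output of `detect()` (keys `"cpus"`, `"cudas"` and `"has_cupy"`).

```python
import copy


def restrict_devices(devices, n_cpus=None, use_cuda=True):
    """Hypothetical helper: return a modified copy of a device dictionary.

    Assumes the keys "cpus", "cudas" and "has_cupy", mirroring the
    entries edited in the cluster_spec example.
    """
    restricted = copy.deepcopy(devices)  # keep the detected devices intact
    if not use_cuda:
        # Disable both CUDA devices and the CuPy integration
        restricted['cudas'] = []
        restricted['has_cupy'] = False
    if n_cpus is not None:
        # CPU IDs are not used for pinning, so only the count matters
        restricted['cpus'] = list(range(n_cpus))
    return restricted


# Hand-written stand-in for detect() on a machine with 8 CPUs and 2 GPUs
detected = {'cpus': list(range(8)), 'cudas': [0, 1], 'has_cupy': True}
spec_kwargs = restrict_devices(detected, n_cpus=3, use_cuda=False)
print(spec_kwargs)
# {'cpus': [0, 1, 2], 'cudas': [], 'has_cupy': False}
```

The returned dictionary would then be passed on as `cluster_spec(**spec_kwargs)`, just like `devices` in the example.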
For a full API reference, please see Python API reference.
To go beyond the included capabilities of LiberTEM, you can implement your own analyses using User-defined functions (UDFs).
Integration with Dask arrays
The make_dask_array() function can generate a distributed Dask array from a DataSet, using its partitions as blocks. The typical LiberTEM partition size is close to the optimum size for Dask array blocks under most circumstances. The Dask array is accompanied by a map of optimal workers. This map should be passed to the compute() method in order to construct the blocks on the workers that have them in local storage.
```python
from libertem.contrib.daskadapter import make_dask_array

# Construct a Dask array from the dataset
# The second return value contains information
# on workers that hold parts of a dataset in local
# storage to ensure optimal data locality
dask_array, workers = make_dask_array(dataset)

# Use the Dask.distributed client of LiberTEM, since it may not be
# the default client:
result = ctx.executor.client.compute(
    dask_array.sum(axis=(-1, -2))
).result()
```
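Partitions work well as blocks for a reduction like `sum(axis=(-1, -2))` because each frame is reduced on its own: every block can be processed where it is stored, and the partial results only need to be concatenated. The pure NumPy sketch below illustrates this on a synthetic array. The number of partitions and the way frames are split are arbitrary assumptions, not LiberTEM's actual partitioning scheme, and no Dask or cluster is involved.

```python
import numpy as np

# Synthetic 4D dataset: (scan_y, scan_x, detector_y, detector_x)
rng = np.random.default_rng(42)
data = rng.random((8, 6, 32, 32))

# Flatten the navigation axes and cut them into contiguous "partitions"
frames = data.reshape(-1, *data.shape[-2:])     # (48, 32, 32)
partitions = np.array_split(frames, 4, axis=0)  # 4 blocks of 12 frames

# Each block can be reduced independently; in a distributed setting
# this could happen on the worker that holds the block ...
partial_results = [p.sum(axis=(-1, -2)) for p in partitions]

# ... and the per-block results are concatenated afterwards
combined = np.concatenate(partial_results).reshape(data.shape[:2])
print(combined.shape)  # (8, 6)
```

Reductions that mix information across frames, such as a sum over the navigation axes, would additionally need a step that merges the partial results rather than just concatenating them.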