Python API

The Python API is a concise API for using LiberTEM from Python code. It is suitable both for interactive scripting, for example from Jupyter notebooks, and for use within a Python application or script.

Basic example

This basic example shows how to import the API, create a local cluster, load a file and run an analysis. For complete examples on how to use the Python API, please see the Jupyter notebooks in the example directory.

For more details, please see Loading data, Data Set API and format-specific reference. See Sample Datasets for publicly available datasets.

import sys
import logging
# Changed in 0.5.0: The thread count is set dynamically
# on the workers; setting environment variables is no longer needed.

import numpy as np
import matplotlib.pyplot as plt

from libertem import api

logging.basicConfig(level=logging.WARNING)


# Protect the entry point.
# LiberTEM uses dask, which uses multiprocessing to
# start worker processes.
# https://docs.python.org/3/library/multiprocessing.html
if __name__ == '__main__':

    # api.Context() starts a new local cluster.
    # The "with" clause makes sure we shut it down in the end.
    with api.Context() as ctx:
        try:
            path = sys.argv[1]
            ds = ctx.load(
                'auto',
                path=path,
            )
        except IndexError:
            path = ('C:/Users/weber/Nextcloud/Projects/'
                    'Open Pixelated STEM framework/Data/EMPAD/'
                    'scan_11_x256_y256.emd')
            ds = ctx.load(
                'hdf5',
                path=path,
                ds_path='experimental/science_data/data',
                tileshape=(1, 8, 128, 128)
            )

        (scan_y, scan_x, detector_y, detector_x) = ds.shape
        mask_shape = (detector_y, detector_x)

        # LiberTEM sends functions that create the masks
        # rather than mask data to the workers in order
        # to reduce transfers in the cluster.
        def mask():
            return np.ones(shape=mask_shape)

        analysis = ctx.create_mask_analysis(dataset=ds, factories=[mask])

        result = ctx.run(analysis, progress=True)

        # Do something useful with the result:
        print(result)
        print(result.mask_0.raw_data)

        # For each mask, one channel is present in the result.
        # This may be different for other analyses.
        # You can access the result channels by their key on
        # the result object:
        plt.figure()
        plt.imshow(result.mask_0.raw_data)
        plt.show()

        # Otherwise, the result object behaves like a list.
        # For example, you can iterate over the result channels:
        raw_result_list = [channel.raw_data for channel in result]
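
        # Mask factories can also build non-trivial masks. As an
        # illustrative sketch (center and radius are made-up values
        # for this dataset), this uses the libertem.masks helpers
        # to create a disk-shaped mask:
        from libertem import masks

        def disk():
            return masks.circular(
                centerX=detector_x // 2, centerY=detector_y // 2,
                imageX=detector_x, imageY=detector_y,
                radius=16,
            )

        disk_analysis = ctx.create_mask_analysis(dataset=ds, factories=[disk])
        disk_result = ctx.run(disk_analysis, progress=True)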

To control how many CPUs and which CUDA devices are used, you can specify them as follows:

from libertem import api
from libertem.executor.dask import DaskJobExecutor, cluster_spec
from libertem.utils.devices import detect

# Find out what would be used, if you like.
# Returns a dictionary with keys "cpus" and "cudas" (each a list of
# device ids) and "has_cupy" (whether CuPy is available).
devices = detect()

# Example: Deactivate CUDA devices by removing them from the device dictionary
devices['cudas'] = []

# Example: Deactivate CuPy integration
devices['has_cupy'] = False

# Example: Use 3 CPUs. The IDs are ignored at the moment, i.e. no CPU pinning
devices['cpus'] = range(3)

# Generate a spec for a Dask.distributed SpecCluster
# Relevant kwargs match the dictionary entries
spec = cluster_spec(**devices)
# Start a local cluster with the custom spec
with DaskJobExecutor.make_local(spec=spec) as executor:
    ctx = api.Context(executor=executor)
    ...

For a full API reference, please see Reference.

To go beyond the included capabilities of LiberTEM, you can implement your own analyses using User-defined functions, as sketched below.
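
As a minimal sketch of what such a UDF can look like (the class name MySumUDF and the buffer name intensity are made up for illustration; ctx and ds are assumed to be set up as in the basic example above):

import numpy as np

from libertem.udf import UDF


class MySumUDF(UDF):
    def get_result_buffers(self):
        # Declare one result value per scan position ("nav" buffer)
        return {
            'intensity': self.buffer(kind='nav', dtype='float32'),
        }

    def process_frame(self, frame):
        # Called once per detector frame; store the reduced value
        # for the current scan position
        self.results.intensity[:] = np.sum(frame)


result = ctx.run_udf(dataset=ds, udf=MySumUDF(), progress=True)
print(result['intensity'].data.shape)  # one value per scan position

The full interface, including tile- and partition-based processing, is described in the User-defined functions documentation.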

Integration with Dask arrays

The make_dask_array() function can generate a distributed Dask array from a DataSet, using its partitions as blocks. The typical LiberTEM partition size is close to the optimal Dask block size in most circumstances. The Dask array is accompanied by a map of optimal workers; this map should be passed to the compute() method in order to construct the blocks on the workers that hold them in local storage.

from libertem.contrib.daskadapter import make_dask_array

# Construct a Dask array from the dataset
# The second return value contains information
# on workers that hold parts of a dataset in local
# storage to ensure optimal data locality
dask_array, workers = make_dask_array(dataset)

# Use the Dask.distributed client of LiberTEM, since it may not be
# the default client:
result = ctx.executor.client.compute(
    dask_array.sum(axis=(-1, -2)),
    workers=workers,
).result()
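
Summing over the last two (signal) axes as shown yields one value per scan position, equivalent to the all-ones mask analysis in the basic example above.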