Python API
The Python API is a concise API for using LiberTEM from Python code. It is suitable both for interactive scripting, for example from Jupyter notebooks, and for usage from within a Python application or script.
Context
The libertem.api.Context object is the entry point for most interaction and processing with LiberTEM. It is used to load datasets and to specify and run analyses. The following snippet initializes a Context ready for use with default parameters and backing.
import libertem.api as lt
ctx = lt.Context()
See the API documentation for the full capabilities exposed by the Context.
Basic example
This basic example imports the API, creates a local cluster, loads a file and runs an analysis. For complete examples of how to use the Python API, please see the Jupyter notebooks in the example directory.
For more details, please see Loading data, Data Set API and format-specific reference. See Sample Datasets for publicly available datasets.
import sys
import logging

# Changed in 0.5.0: The thread count is set dynamically
# on the workers. No need to set environment variables anymore.
import numpy as np
import matplotlib.pyplot as plt

from libertem import api

logging.basicConfig(level=logging.WARNING)

# Protect the entry point.
# LiberTEM uses dask, which uses multiprocessing to
# start worker processes.
# https://docs.python.org/3/library/multiprocessing.html
if __name__ == '__main__':
    # api.Context() starts a new local cluster.
    # The "with" clause makes sure we shut it down in the end.
    with api.Context() as ctx:
        try:
            path = sys.argv[1]
            ds = ctx.load(
                'auto',
                path=path,
            )
        except IndexError:
            path = ('C:/Users/weber/Nextcloud/Projects/'
                    'Open Pixelated STEM framework/Data/EMPAD/'
                    'scan_11_x256_y256.emd')
            ds = ctx.load(
                'hdf5',
                path=path,
                ds_path='experimental/science_data/data',
            )

        (scan_y, scan_x, detector_y, detector_x) = ds.shape

        mask_shape = (detector_y, detector_x)

        # LiberTEM sends the functions that create the masks,
        # rather than the mask data itself, to the workers in
        # order to reduce transfers in the cluster.
        def mask():
            return np.ones(shape=mask_shape)

        analysis = ctx.create_mask_analysis(dataset=ds, factories=[mask])

        result = ctx.run(analysis, progress=True)

        # Do something useful with the result:
        print(result)
        print(result.mask_0.raw_data)

        # For each mask, one channel is present in the result.
        # This may be different for other analyses.
        # You can access the result channels by their key on
        # the result object:
        plt.figure()
        plt.imshow(result.mask_0.raw_data)
        plt.show()

        # Otherwise, results behave like lists.
        # For example, you can iterate over the result channels:
        raw_result_list = [channel.raw_data for channel in result]
Custom processing routines
To go beyond the included capabilities of LiberTEM, you can implement your own using User-defined functions (UDFs).
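As a minimal sketch of what a UDF looks like, the following sums all detector frames. The class name SumAllUDF is illustrative (LiberTEM also ships a ready-made sum UDF), and a Context ctx and dataset ds as in the basic example above are assumed:
import numpy as np
from libertem.udf import UDF

class SumAllUDF(UDF):
    def get_result_buffers(self):
        # One result buffer with the shape of the detector ("sig"):
        return {'intensity': self.buffer(kind='sig', dtype=np.float32)}

    def process_frame(self, frame):
        # Called for each detector frame on the workers:
        self.results.intensity[:] += frame

    def merge(self, dest, src):
        # Combine partial results from different partitions:
        dest.intensity[:] += src.intensity

res = ctx.run_udf(dataset=ds, udf=SumAllUDF())
print(res['intensity'].data.shape)  # (detector_y, detector_x)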
Reference
For a full reference, please see Python API reference.
Executors
New in version 0.9.0: The executor API is internal. Since the choice and parameters of executors are important for integration with Dask and other frameworks, they are now documented. Only the names and creation methods of executors are reasonably stable. The rest of the API is subject to change without notice; for that reason it is documented in the developer section and not in the API reference.
The default executor is DaskJobExecutor, which uses the dask.distributed scheduler. To support all LiberTEM features and achieve optimal performance, the methods provided by LiberTEM to start a dask.distributed cluster should be used. However, LiberTEM can also run on a “vanilla” dask.distributed cluster. Please note that dask.distributed clusters that are not created by LiberTEM might use threading or a mixture of threads and processes, and therefore might behave or perform differently from a LiberTEM-instantiated cluster.
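As a hedged sketch of running on such a vanilla cluster, assuming a dask.distributed scheduler started outside of LiberTEM is already reachable at the given address, an existing Client can be wrapped in a DaskJobExecutor:
from dask import distributed as dd
from libertem import api
from libertem.executor.dask import DaskJobExecutor

# Assumption: a scheduler that was started outside of LiberTEM
# is listening at this address.
client = dd.Client('tcp://localhost:8786')
ctx = api.Context(executor=DaskJobExecutor(client=client))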
The InlineJobExecutor runs all tasks synchronously in the current thread. This is useful for debugging and for special applications, such as running UDFs that perform their own multithreading efficiently, or for other non-standard use that requires tasks to be executed sequentially and in order.
See also Threading for more information on multithreading in UDFs.
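As a short sketch, a Context backed by the inline executor for a debugging session:
from libertem import api
from libertem.executor.inline import InlineJobExecutor

# All tasks run synchronously in the current thread, so
# breakpoints and debuggers work as expected.
ctx = api.Context(executor=InlineJobExecutor())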
New in version 0.9.0.
The ConcurrentJobExecutor runs all tasks using concurrent.futures. Using a concurrent.futures.ThreadPoolExecutor, which is the default behaviour, allows sharing large amounts of data as well as other resources between the main thread and workers efficiently, but is severely slowed down by the Python global interpreter lock under many circumstances. Furthermore, it can create thread safety issues such as https://github.com/LiberTEM/LiberTEM-blobfinder/issues/35.
It is also possible, in principle, to use a concurrent.futures.ProcessPoolExecutor as backing for the ConcurrentJobExecutor, though this is untested and likely to lead to worse performance than the LiberTEM default DaskJobExecutor.
For special applications, the DelayedJobExecutor can use dask.delayed to delay the processing. This is experimental; see Dask integration for more details. It might use threading as well, depending on the Dask scheduler that is used by compute().
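A minimal sketch of this experimental executor; note that results only materialize once compute() is called on them:
from libertem import api
from libertem.executor.delayed import DelayedJobExecutor

ctx = api.Context(executor=DelayedJobExecutor())
# Load a dataset and run a UDF as usual; the result buffers are
# then backed by dask arrays and only computed on demand, e.g.
# (assuming a dataset ds and a UDF udf):
# res = ctx.run_udf(dataset=ds, udf=udf)
# intensity = res['intensity'].raw_data.compute()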
Common executor choices
New in version 0.9.0.
libertem.api.Context.make_with() provides a convenient shortcut to start a Context with common executor choices. See the API documentation for available options!
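For instance (the exact set of recognized names is version-dependent; 'inline', 'threads' and 'dask-make-default' are among the documented choices):
import libertem.api as lt

# Inline executor, e.g. for debugging:
ctx_inline = lt.Context.make_with('inline')

# Thread-based ConcurrentJobExecutor:
ctx_threads = lt.Context.make_with('threads')

# Default Dask executor, also registered as default
# scheduler for other dask.distributed computations:
ctx_dask = lt.Context.make_with('dask-make-default')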
Connect to a cluster
See Starting a custom cluster on how to start a scheduler and workers.
from libertem import api
from libertem.executor.dask import DaskJobExecutor

# Connect to a Dask.Distributed scheduler at 'tcp://localhost:8786'
with DaskJobExecutor.connect('tcp://localhost:8786') as executor:
    ctx = api.Context(executor=executor)
    ...
Customize CPUs and CUDA devices
To control how many CPUs and which CUDA devices are used, you can specify them as follows:
from libertem import api
from libertem.executor.dask import DaskJobExecutor, cluster_spec
from libertem.utils.devices import detect
# Find out what would be used, if you like
# returns dictionary with keys "cpus" and "cudas", each with a list of device ids
devices = detect()
# Example: Deactivate CUDA devices by removing them from the device dictionary
devices['cudas'] = []
# Example: Deactivate CuPy integration
devices['has_cupy'] = False
# Example: Use 3 CPUs. The IDs are ignored at the moment, i.e. no CPU pinning
devices['cpus'] = range(3)
# Generate a spec for a Dask.distributed SpecCluster
# Relevant kwargs match the dictionary entries
spec = cluster_spec(**devices)
# Start a local cluster with the custom spec
with DaskJobExecutor.make_local(spec=spec) as executor:
    ctx = api.Context(executor=executor)
    ...
Please see Dask.Distributed for a reference of the Dask-based executor.
Pipelined executor
New in version 0.10.0.
For live data processing using LiberTEM-live, the PipelinedExecutor provides a multiprocessing executor that routes the live data source in a round-robin fashion to worker processes. This is important to support processing that cannot keep up with the detector speed on a single CPU core. This executor also works for offline data sets in principle, but is not optimized for that use case.
Similar to the Dask-based executor, it is possible to customize the devices used for computation:
from libertem import api
from libertem.executor.pipelined import PipelinedExecutor
from libertem.utils.devices import detect
spec = PipelinedExecutor.make_spec(cpus=[0, 1, 2], cudas=[])
executor = PipelinedExecutor(
    spec=spec,
    # Set to True to keep worker processes pinned to
    # specific CPU cores or CPUs:
    pin_workers=False,
)
ctx = api.Context(executor=executor)
...
executor.close()
# you can also use the `detect` function as above:
spec2 = PipelinedExecutor.make_spec(**detect())
Please see Pipelined executor for a reference of the pipelined executor, and the LiberTEM-live documentation for details on live processing.