Python API
The Python API is a concise API for using LiberTEM from Python code. It is suitable both for interactive scripting, for example from Jupyter notebooks, and for use within a Python application or script.
For a full API reference, please see Python API reference.
Context
The libertem.api.Context object is the entry point for most interaction and processing with LiberTEM. It is used to load datasets and to specify and run analyses. The following snippet initializes a Context ready for use with default parameters, backed by a parallel processing engine.

import libertem.api as lt

with lt.Context() as ctx:
    ...

See the API documentation for the full capabilities exposed by the Context.
Note
The use of a with block in the above code ensures the Context will correctly release any resources it holds on to when it goes out of scope, but it is also possible to use the object as a normal variable, i.e. ctx = lt.Context().
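When used as a normal variable, resources can be released explicitly instead. A minimal sketch, assuming the usual pattern of closing the Context when done (Context.close() releases the executor and its workers):

import libertem.api as lt

ctx = lt.Context()
try:
    ...  # load data and run analyses here
finally:
    # explicitly release workers and other held resources
    ctx.close()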
Basic example
This is a basic example that imports the API, creates a local cluster, loads a file and runs an analysis.
import matplotlib.pyplot as plt
import libertem.api as lt

if __name__ == '__main__':
    # A path to a Quantum Detectors Merlin header file
    # Adapt to your data and data format
    dataset_path = './path_to_dataset.hdr'
    # Create a Context object to load data and run analyses
    # Here we specify we want to use 4 CPU workers for parallel jobs
    with lt.Context.make_with(cpus=4) as ctx:
        # Next we define a dataset; at this time no data is loaded
        # into memory, we only specify where the files are.
        # The key 'mib' tells LiberTEM which format to load;
        # it is possible to supply 'auto' and the Context will
        # try to auto-detect the correct dataset format.
        ds = ctx.load('mib', path=dataset_path)
        # Create a sum-over-disk analysis, i.e. a brightfield image
        # Values for disk centre x/y and radius are in pixels
        disk_sum_analysis = ctx.create_disk_analysis(ds, cx=32, cy=32, r=8)
        disk_sum_result = ctx.run(disk_sum_analysis, progress=True)
        # Plot the resulting brightfield image
        plt.imshow(disk_sum_result.intensity.raw_data)
        plt.show()
For complete examples on how to use the Python API, please see the Jupyter notebooks in the example directory.
For more details on the data formats that LiberTEM supports, please see Loading data, Data Set API and format-specific reference. See Sample Datasets for publicly available datasets.
Custom processing routines
To go beyond the included capabilities of LiberTEM, you can implement your own analyses using User-defined functions (UDFs). UDFs are dataset-agnostic and benefit from the same parallelisation as the built-in tools.
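As a flavour of the interface, here is a minimal sketch of a UDF that sums each detector frame to a single value per scan position; the class name and the 'intensity' buffer name are illustrative choices, not part of the UDF API:

import numpy as np
from libertem.udf import UDF

class FrameSumUDF(UDF):
    def get_result_buffers(self):
        # allocate one float per scan position ('nav' dimensions)
        return {'intensity': self.buffer(kind='nav', dtype=np.float32)}

    def process_frame(self, frame):
        # called once for each detector frame
        self.results.intensity[:] = np.sum(frame)

It runs like any built-in analysis, e.g. result = ctx.run_udf(dataset=ds, udf=FrameSumUDF()). See the UDF documentation for the full interface.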
Executors
An Executor is the internal engine which the Context uses to compute user-defined functions or run other tasks. Executors can be serial or parallel, and can differ substantially in their implementation, but all adhere to a common interface which the Context understands.
New in version 0.9.0: The executor API is internal. Since the choice and parameters of executors are important for integration with Dask and other frameworks, they are now documented. Only the names and creation methods for executors are reasonably stable; the rest of the API is subject to change without notice. For that reason it is documented in the developer section and not in the API reference.
The default executor is the DaskJobExecutor, which uses the dask.distributed scheduler. To support all LiberTEM features and achieve optimal performance, the methods provided by LiberTEM to start a dask.distributed cluster should be used. However, LiberTEM can also run on a “vanilla” dask.distributed cluster. Please note that dask.distributed clusters that are not created by LiberTEM might use threading or a mixture of threads and processes, and therefore might behave or perform differently to a LiberTEM-instantiated cluster.
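For example, a LiberTEM-managed local cluster can be started with the make_local() convenience method; a minimal sketch with default parameters:

import libertem.api as lt
from libertem.executor.dask import DaskJobExecutor

# Start a LiberTEM-managed local dask.distributed cluster
with DaskJobExecutor.make_local() as executor:
    ctx = lt.Context(executor=executor)
    ...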
The InlineJobExecutor runs all tasks synchronously in the current thread. This is useful for debugging and for special applications such as running UDFs that perform their own multithreading efficiently, or for other non-standard uses that require tasks to be executed sequentially and in order.
See also Threading for more information on multithreading in UDFs.
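A minimal sketch of backing a Context with it directly:

import libertem.api as lt
from libertem.executor.inline import InlineJobExecutor

# All tasks run sequentially in the current thread
ctx = lt.Context(executor=InlineJobExecutor())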
New in version 0.9.0.
The ConcurrentJobExecutor runs all tasks using concurrent.futures. Using a concurrent.futures.ThreadPoolExecutor, which is the default behaviour, allows sharing large amounts of data as well as other resources between the main thread and workers efficiently, but is severely slowed down by the Python global interpreter lock under many circumstances. Furthermore, it can create thread safety issues such as https://github.com/LiberTEM/LiberTEM-blobfinder/issues/35.

It is also possible in principle to use a concurrent.futures.ProcessPoolExecutor as backing for the ConcurrentJobExecutor, though this is untested and is likely to lead to worse performance than the LiberTEM default DaskJobExecutor.
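A minimal sketch, assuming this executor also provides a make_local() convenience constructor like the Dask executor:

import libertem.api as lt
from libertem.executor.concurrent import ConcurrentJobExecutor

# Thread-pool based executor built on concurrent.futures
with ConcurrentJobExecutor.make_local() as executor:
    ctx = lt.Context(executor=executor)
    ...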
For special applications, the DelayedJobExecutor can use dask.delayed to delay the processing. This is experimental; see Dask integration for more details. It might use threading as well, depending on the Dask scheduler that is used by compute().
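A minimal sketch of creating a Context backed by it; as noted, the behaviour is experimental:

import libertem.api as lt
from libertem.executor.delayed import DelayedJobExecutor

# Results are built lazily and only evaluated on compute()
ctx = lt.Context(executor=DelayedJobExecutor())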
Pipelined executor
New in version 0.10.0.
For live data processing using LiberTEM-live, the PipelinedExecutor provides a multiprocessing executor that routes the live data source in a round-robin fashion to worker processes. This is important to support processing that cannot keep up with the detector speed on a single CPU core. This executor also works for offline data sets in principle, but is not optimized for that use case.
Please see Pipelined executor for a reference of the pipelined executor, and the LiberTEM-live documentation for details on live processing.
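A minimal sketch of constructing it with default settings (assumed here; see the pipelined executor reference for the worker specification options):

import libertem.api as lt
from libertem.executor.pipelined import PipelinedExecutor

# Worker processes receive incoming data round-robin
executor = PipelinedExecutor()
ctx = lt.Context(executor=executor)
...
# closing the Context also shuts down the executor's workers
ctx.close()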
Specifying executor type, CPU and GPU workers
New in version 0.9.0.
Changed in version 0.12.0: Added the cpus and gpus keyword arguments, as well as the explicit plot_class keyword argument passed to the Context initializer (replacing the prior *args, **kwargs form).
libertem.api.Context.make_with() provides a convenient shortcut to start a Context with a specific executor and customise the number of workers it uses.
import libertem.api as lt

# Create a Dask-based Context with 4 CPU workers and 2 GPU workers
with lt.Context.make_with('dask', cpus=4, gpus=2) as ctx:
    ...
The default behaviour is to create a Dask-based Context, but the same method can be used to create any executor, as described in the documentation of the method. A useful shortcut is lt.Context.make_with('inline') to quickly create a synchronous executor for debugging.
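For instance, the snippet below combines this shortcut with the with-block pattern from above:

import libertem.api as lt

# Synchronous executor, convenient for stepping through with a debugger
with lt.Context.make_with('inline') as ctx:
    ...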
Note
Not all executor types allow specifying the number of workers, and not all executor types are GPU-capable. In these cases the make_with method will raise an ExecutorSpecException.
See the API documentation for more information.
Connect to an existing cluster
The DaskJobExecutor is capable of connecting to an existing dask.distributed scheduler, which may be a centrally managed installation on a physical cluster, or a local, single-machine scheduler started for some other purpose (by LiberTEM or directly through Dask). Cluster re-use can reduce startup times, as there is no need to spawn new workers each time a script or notebook is executed.
See Starting a custom cluster for more on how to start a scheduler and workers.
import libertem.api as lt
from libertem.executor.dask import DaskJobExecutor

# Connect to a dask.distributed scheduler at 'tcp://localhost:8786'
with DaskJobExecutor.connect('tcp://localhost:8786') as executor:
    ctx = lt.Context(executor=executor)
    ...