UDF API reference

Defining UDFs

See User-defined functions (UDFs) for an introduction and in-depth explanation.

Mixins for processing methods

class libertem.udf.base.UDFFrameMixin(*args, **kwargs)[source]

Implement process_frame for per-frame processing.

process_frame(frame: ndarray)[source]

Implement this method to process the data in a frame-by-frame manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

frame (numpy.ndarray or cupy.ndarray) – A single frame or signal element from the dataset. The shape is the same as dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 2D, for spectra 1D etc.

class libertem.udf.base.UDFTileMixin(*args, **kwargs)[source]

Implement process_tile for per-tile processing.

process_tile(tile: ndarray)[source]

Implement this method to process the data in a tiled manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

tile (numpy.ndarray or cupy.ndarray) – A small number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
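As the shape description above implies, a per-frame reduction inside process_tile reduces over every axis except the first. A plain NumPy sketch of the axis handling:

```python
import numpy as np

# A tile stacks N frames along a new leading axis: shape (N,) + sig shape.
tile = np.ones((5, 16, 16), dtype=np.float32)  # N=5 frames of 16x16 signal

# Reducing each frame to one value sums over all axes except the first:
per_frame = tile.sum(axis=tuple(range(1, tile.ndim)))
assert per_frame.shape == (5,)

# Reducing over the tile's frames (e.g. for a kind='sig' sum) keeps the
# signal shape instead:
sig_sum = tile.sum(axis=0)
assert sig_sum.shape == (16, 16)
```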

class libertem.udf.base.UDFPartitionMixin(*args, **kwargs)[source]

Implement process_partition for per-partition processing.

process_partition(partition: ndarray) → None[source]

Implement this method to process the data partitioned into large (100s of MiB) partitions.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Note

Only use this method if you know what you are doing; especially if you are running a processing pipeline with multiple steps, or multiple processing pipelines at the same time, performance may be adversely impacted.

Parameters

partition (numpy.ndarray or cupy.ndarray) – A large number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.

Base UDF class

class libertem.udf.base.UDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

The main user-defined functions interface. You can implement your functionality by overriding methods on this class.

If you override __init__, please take care, as it is called multiple times during evaluation of a UDF. You can handle some pre-conditioning of parameters, but you also have to accept the results as input again.

Arguments passed as **kwargs will be automatically available on self.params when running the UDF.

Example

>>> class MyUDF(UDF):
...     def __init__(self, param1, param2="def2", **kwargs):
...         param1 = int(param1)
...         if "param3" not in kwargs:
...             raise TypeError("missing argument param3")
...         super().__init__(param1=param1, param2=param2, **kwargs)
Parameters

kwargs

Input parameters. They are scattered to the worker processes and available as self.params from here on.

Values can be BufferWrapper instances, which, when accessed via self.params.the_key_here, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

classmethod aux_data(data, kind, extra_shape=(), dtype='float32')[source]

Use this method to create auxiliary data. Auxiliary data should have a shape like (dataset.shape.nav, extra_shape) and on access, an appropriate view will be created. For example, if you access aux data in process_frame, you will get the auxiliary data for the current frame you are processing.

Example

We create a UDF to demonstrate the behavior:

>>> class MyUDF(UDF):
...     def get_result_buffers(self):
...         # Result buffer for debug output
...         return {'aux_dump': self.buffer(kind='nav', dtype='object')}
...
...     def process_frame(self, frame):
...         # Extract value of aux data for demonstration
...         self.results.aux_dump[:] = str(self.params.aux_data[:])
...
>>> # for each frame, provide three values from a sequential series:
>>> aux1 = MyUDF.aux_data(
...     data=np.arange(np.prod(dataset.shape.nav) * 3, dtype=np.float32),
...     kind="nav", extra_shape=(3,), dtype="float32"
... )
>>> udf = MyUDF(aux_data=aux1)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)

process_frame for frame (0, 7) received a view of aux_data with values [21., 22., 23.]:

>>> res['aux_dump'].data[0, 7]
'[21. 22. 23.]'
buffer(kind: Literal['nav', 'sig', 'single'], extra_shape: Tuple[int, ...] = (), dtype: nt.DTypeLike = 'float32', where: Optional[Literal['device']] = None, use: Optional[Literal['private', 'result_only']] = None) → BufferWrapper[source]

Use this method to create BufferWrapper objects in get_result_buffers().

Parameters
  • kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.

  • extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. If this is specified as a Shape object, it is converted to a tuple first.

  • dtype (string or numpy dtype) – The dtype of this buffer

  • where (string or None) –

    None means NumPy array, specify 'device' to use a device buffer (for example on a CUDA device)

    New in version 0.6.0.

  • use ("private", "result_only" or None) –

    If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.

    Specify "result_only" here if the buffer is only used in get_results(), this means we don’t have to allocate and return it on the workers without actually needing it.

    None means the buffer is used both as a final and intermediate result.

    New in version 0.7.0.

copy_for_partition(partition: Partition, roi: ndarray) → UDF[source]

create a copy of the UDF, specifically slicing aux data to the specified partition and roi

get_backends() → Union[Literal['numpy', 'cuda', 'cupy'], Iterable[Literal['numpy', 'cuda', 'cupy']]][source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing the possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]

get_preferred_input_dtype() → nt.DTypeLike[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32, since most numerical processing tasks, in particular dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype. int32 or wider input data would switch from float32 to float64, and complex data in the dataset will switch the input dtype kind to complex, following the NumPy casting rules.

In case your UDF only works with specific input dtypes, it should raise an error or emit a warning if incompatible dtypes are used, and/or implement a meaningful conversion in your UDF's process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.
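The casting rules described above can be checked directly with numpy.result_type():

```python
import numpy as np

# The float32 preference combined with the dataset's native dtype:
assert np.result_type(np.uint8, np.float32) == np.float32        # narrow ints: stays float32
assert np.result_type(np.int32, np.float32) == np.float64        # int32 or wider: widens to float64
assert np.result_type(np.complex64, np.float32) == np.complex64  # complex input stays complex

# bool (USE_NATIVE_DTYPE) acts as a neutral element, keeping the native dtype:
assert np.result_type(np.int16, np.bool_) == np.int16
```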

get_result_buffers() → Dict[str, BufferWrapper][source]

Return result buffer declaration.

Values of the returned dict should be BufferWrapper instances, which, when accessed via self.results.key, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

The values also need to be serializable via pickle.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation. Please note that partition metadata will not be set when this method is executed on the head node.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.results.the_key_here.

Return type

dict

get_results() → Dict[str, ndarray][source]

Get results, allowing a postprocessing step on the main node after a result has been merged. See also: UDFPostprocessMixin.

New in version 0.7.0.

Note

You should return all values as numpy arrays, they will be wrapped in BufferWrapper instances before they are returned to the user.

See the Post-processing after merging section in the documentation for details and examples.

Returns

results – A dict containing the final post-processed results.

Return type

dict

get_task_data() → Dict[str, Any][source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict

get_tiling_preferences() → TilingPreferences[source]

Configure tiling preferences. Return a dictionary with the following keys:

  • “depth”: number of frames/frame parts to stack on top of each other

  • “total_size”: total size of a tile in bytes

New in version 0.6.0.

merge(dest: MergeAttrMapping, src: MergeAttrMapping)[source]

Merge a partial result src into the current global result dest.

Data available in this method:

  • self.params - the parameters of this UDF

Parameters
  • dest – global results; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (dest.your_buffer_name)

  • src – results for a partition; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (src.your_buffer_name)

Note

This function is running on the main node, which means self.results and self.task_data are not available.
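A sketch of a custom merge() body for a kind='sig' sum buffer, where the default nav-slicing merge does not apply. The buffer name intensity is illustrative, and SimpleNamespace stands in for the MergeAttrMapping attribute-access pattern:

```python
import numpy as np
from types import SimpleNamespace

# merge() body as it would appear in a UDF subclass: accumulate each
# partition's partial sum into the global result.
def merge(dest, src):
    dest.intensity[:] += src.intensity

# MergeAttrMapping exposes buffers via attribute access; SimpleNamespace
# mimics that here for demonstration:
dest = SimpleNamespace(intensity=np.zeros((4, 4), dtype=np.float32))
src = SimpleNamespace(intensity=np.ones((4, 4), dtype=np.float32))
merge(dest, src)
assert float(dest.intensity.sum()) == 16.0
```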

property requires_custom_merge

Determine if buffers with kind != 'nav' are present, in which case the default merge doesn't work and a custom merge() has to be implemented

New in version 0.5.0.

Alternative merging for Dask arrays

class libertem.udf.base.UDFMergeAllMixin(*args, **kwargs)[source]
merge_all(ordered_results: OrderedDict[Slice, MergeAttrMapping]) → Mapping[str, nt.ArrayLike][source]

Combine stack of ordered partial results ordered_results to form complete result.

Combining partial results this way can be more efficient than direct merging into a result buffer for cases where the results are not NumPy arrays. Currently this only applies to the libertem.executor.delayed.DelayedJobExecutor, where it provides an efficient pathway to construct Dask arrays from delayed UDF results.

The input and the returned arrays are in flattened navigation dimension with ROI applied.

Data available in this method:

  • self.params - the parameters of this UDF

For UDFs with only kind='nav' result buffers a default implementation is used automatically.

Parameters

ordered_results – Ordered dict mapping partition slice to UDF partial result

Returns

Dictionary mapping result buffer name to buffer content

Return type

dict[buffername] -> array_like

Note

This function is running on the main node, which means self.results and self.task_data are not available.
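For a kind='nav' buffer, a merge_all() implementation amounts to concatenating the ordered partial results along the flattened navigation axis. A sketch, with the buffer name intensity illustrative and SimpleNamespace standing in for the partial-result mappings:

```python
from collections import OrderedDict
from types import SimpleNamespace

import numpy as np

def merge_all(ordered_results):
    # partial results arrive ordered by partition slice, so stacking them
    # along the flattened nav axis yields the complete result
    chunks = [res.intensity for res in ordered_results.values()]
    return {'intensity': np.concatenate(chunks, axis=0)}

# illustrative stand-ins for two partition slices and their partial results:
ordered = OrderedDict([
    ('slice_0', SimpleNamespace(intensity=np.zeros(8))),
    ('slice_1', SimpleNamespace(intensity=np.ones(8))),
])
result = merge_all(ordered)
assert result['intensity'].shape == (16,)
```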

Meta information

class libertem.udf.base.UDFMeta(partition_slice: Optional[Slice], dataset_shape: Shape, roi: Optional[ndarray], dataset_dtype: nt.DTypeLike, input_dtype: nt.DTypeLike, tiling_scheme: TilingScheme = None, tiling_index: int = 0, corrections: Optional[CorrectionSet] = None, device_class: Optional[Literal['cpu', 'cuda']] = None, threads_per_worker: Optional[int] = None)[source]

UDF metadata. Makes all relevant metadata accessible to the UDF. Can be different for each task/partition.

Changed in version 0.4.0: Added distinction of dataset_dtype and input_dtype

Changed in version 0.6.0: Information on compute backend, corrections, coordinates and tiling scheme added

Changed in version 0.9.0: tiling_scheme_idx and sig_slice added

property coordinates: ndarray

Array of coordinates that correspond to the frames in the actual navigation space which are part of the current tile or partition.

New in version 0.6.0.

Type

np.ndarray

property corrections: CorrectionSet

correction data that is available, either from the dataset or specified by the user

New in version 0.6.0.

Type

CorrectionSet

property dataset_dtype: nt.DTypeLike

Native dtype of the dataset

Type

numpy.dtype

property dataset_shape: Shape

The original shape of the whole dataset, not influenced by the ROI

Type

Shape

property device_class: str

Which device class is used.

The back-end library can be accessed as libertem.udf.base.UDF.xp. This additional string information is provided so that the back-end can be determined without importing all back-end libraries and testing them against libertem.udf.base.UDF.xp.

Current values are cpu (default) or cuda.

New in version 0.6.0.

property input_dtype: nt.DTypeLike

dtype of the data that will be passed to the UDF

This is determined from the dataset’s native dtype and UDF.get_preferred_input_dtype() using numpy.result_type()

New in version 0.4.0.

Type

numpy.dtype

property partition_shape: Shape

The shape of the partition this UDF currently works on. If a ROI was applied, the shape will be modified accordingly.

Type

Shape

property roi: Optional[ndarray]

Boolean array which limits the elements the UDF is working on. Has a shape of dataset_shape.nav.

Type

numpy.ndarray

property sig_slice: Slice

Signal slice of the current tile.

Since all tiles follow the same tiling scheme, this avoids repeatedly calculating the signal part of the slice. Instead, the appropriate slice from the tiling scheme can be re-used.

property slice: Optional[Slice]

A Slice instance that describes the location within the dataset with navigation dimension flattened and reduced to the ROI.

Type

Slice

property threads_per_worker: Optional[int]
Number of threads that a UDF is allowed to use in the process_* methods.

For Numba, pyfftw, Torch, NumPy and SciPy (OMP, MKL, OpenBLAS), this limit is set automatically; this property can be used for other cases, like manually creating thread pools or setting limits for unsupported modules. None means no limit is set, and the UDF can use any number of threads it deems necessary (should be limited to system limits, of course).

Note

Changed in version 0.8.0: Since discovery of loaded libraries can be slow with threadpoolctl (#1117), they are cached now. In case a UDF triggers loading of a new library or instance of a library that is supported by threadpoolctl, it will only be discovered in the first run on a Context. The threading settings for such other libraries or instances can therefore depend on the execution order. In such cases the thread count for affected libraries should be set in the UDF based on threads_per_worker. Numba, pyfftw, Torch, NumPy and SciPy should not be affected since they are loaded before the first discovery.

See also: libertem.common.threading.set_num_threads()

New in version 0.7.0.

Type

int or None
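A sketch of honoring the limit when manually creating a thread pool. The helper name is illustrative; inside a UDF the value would come from self.meta.threads_per_worker:

```python
from concurrent.futures import ThreadPoolExecutor

def make_pool(threads_per_worker):
    # None means no limit was set; fall back to a single thread here
    n = threads_per_worker if threads_per_worker is not None else 1
    return ThreadPoolExecutor(max_workers=n)

with make_pool(None) as pool:
    squares = list(pool.map(lambda x: x * x, [1, 2, 3]))
assert squares == [1, 4, 9]
```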

property tiling_scheme: Optional[TilingScheme]

the tiling scheme that was negotiated

New in version 0.6.0.

Type

TilingScheme

property tiling_scheme_idx: int

Index of the current tile in tiling_scheme.

Pre- and postprocessing

class libertem.udf.base.UDFPreprocessMixin(*args, **kwargs)[source]

Implement preprocess to initialize the result buffers of a partition on the worker before the partition data is processed.

New in version 0.3.0.

preprocess() → None[source]

Implement this method to preprocess the result data for a partition.

This can be useful to initialize arrays of dtype='object' with the correct container types, for example.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

class libertem.udf.base.UDFPostprocessMixin(*args, **kwargs)[source]

Implement postprocess to modify the result buffers of a partition on the worker after the partition data has been completely processed, but before it is returned to the main node for the final merging step.

postprocess() → None[source]

Implement this method to postprocess the result data for a partition.

This can be useful in combination with process_tile() to implement a postprocessing step that requires the reduced results for whole frames.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

Running UDFs

Three methods of libertem.api.Context are relevant for running user-defined functions:

class libertem.api.Context(executor: Optional[JobExecutor] = None, plot_class=None)[source]

Context is the main entry point of the LiberTEM API. It contains methods for loading datasets, creating analyses on them and running them. In the background, instantiating a Context creates a suitable executor and spins up a local Dask cluster unless the executor is passed to the constructor.

Changed in version 0.7.0: Removed deprecated methods create_mask_job, create_pick_job

Parameters
  • executor (JobExecutor or None) – If None, create a local dask.distributed cluster and client using make_local() with optimal configuration for LiberTEM. It uses all cores and compatible GPUs on the local system, but is not set as default Dask scheduler to not interfere with other uses of Dask.

  • plot_class (libertem.viz.base.Live2DPlot) –

    Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.

    New in version 0.7.0.

plot_class

Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.

New in version 0.7.0.

Type

libertem.viz.base.Live2DPlot

Examples

>>> ctx = libertem.api.Context()  
>>> # Create a Context using an inline executor for debugging
>>> from libertem.executor.inline import InlineJobExecutor
>>> debug_ctx = libertem.api.Context(executor=InlineJobExecutor())
map(dataset: DataSet, f, roi: Optional[ndarray] = None, progress: bool = False, corrections: Optional[CorrectionSet] = None, backends=None) → BufferWrapper[source]

Create an AutoUDF with function f() and run it on dataset

Changed in version 0.5.0: Added the progress parameter

Changed in version 0.6.0: Added the corrections and backends parameter

Parameters
  • dataset – The dataset to work on

  • f – Function that accepts a frame as the only parameter. It should return a strongly reduced output compared to the size of a frame.

  • roi (numpy.ndarray) – region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the function. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

Returns

BufferWrapper – The result of the UDF. Access the underlying numpy array using the data property. Shape and dtype are inferred automatically from f.

Return type

libertem.common.buffers.BufferWrapper

run_udf(dataset: DataSet, udf: Union[UDF, Iterable[UDF]], roi: Optional[ndarray] = None, corrections: Optional[CorrectionSet] = None, progress: bool = False, backends=None, plots=None, sync=True) → Union[Mapping[str, BufferWrapper], List[Mapping[str, BufferWrapper]], Coroutine[None, None, Mapping[str, BufferWrapper]], Coroutine[None, None, List[Mapping[str, BufferWrapper]]]][source]

Run udf on dataset, restricted to the region of interest roi.

Changed in version 0.5.0: Added the progress parameter

Changed in version 0.6.0: Added the corrections and backends parameter

Changed in version 0.7.0: Added the plots and sync parameters, and the ability to run multiple UDFs on the same data in a single pass.

Parameters
  • dataset – The dataset to work on

  • udf – UDF instance you want to run, or a list of UDF instances

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

  • plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –

    • None: don’t plot anything (default)

    • True: plot all 2D UDF result buffers

    • List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.

    • List[LivePlot]: LivePlot instance for each channel you want to plot

    New in version 0.7.0.

  • sync (bool) –

    By default, run_udf is a synchronous method. If sync is set to False, it is awaitable instead.

    New in version 0.7.0.

Returns

Return value of the UDF containing the result buffers of type libertem.common.buffers.BufferWrapper. Note that a BufferWrapper can be used like a numpy.ndarray in many cases because it implements __array__(). You can access the underlying numpy array using the data property.

If a list of UDFs was passed in, the returned type is a Tuple[dict[str,BufferWrapper]].

Return type

dict or Tuple[dict]

Examples

Run the SumUDF on a data set:

>>> from libertem.udf.sum import SumUDF
>>> result = ctx.run_udf(dataset=dataset, udf=SumUDF())
>>> np.array(result["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF

Running a UDF on a subset of data:

>>> from libertem.udf.sumsigudf import SumSigUDF
>>> roi = np.zeros(dataset.shape.nav, dtype=bool)
>>> roi[0, 0] = True
>>> result = ctx.run_udf(dataset=dataset, udf=SumSigUDF(), roi=roi)
>>> # to get the full navigation-shaped results, with NaNs where the `roi` was False:
>>> np.array(result["intensity"]).shape
(16, 16)
>>> # to only get the selected results as a flat array:
>>> result["intensity"].raw_data.shape
(1,)
run_udf_iter(dataset: DataSet, udf: Union[UDF, Iterable[UDF]], roi: Optional[ndarray] = None, corrections: Optional[CorrectionSet] = None, progress: bool = False, backends=None, plots=None, sync=True) → Union[Generator[UDFResults, None, None], AsyncGenerator[UDFResults, None]][source]

Run udf on dataset, restricted to the region of interest roi. Yields partial results after each merge operation.

New in version 0.7.0.

Parameters
  • dataset – The dataset to work on

  • udf – UDF instance you want to run, or a list of UDF instances

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

  • plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –

    • None: don’t plot anything (default)

    • True: plot all 2D UDF result buffers

    • List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.

    • List[LivePlot]: LivePlot instance for each channel you want to plot

  • sync (bool) – By default, run_udf_iter is a synchronous method. If sync is set to False, an async generator will be returned instead.

Returns

Generator of UDFResults container objects. Their attribute buffers is the list of result buffer dictionaries for the UDFs. Attribute damage is a BufferWrapper of kind='nav', dtype=bool indicating the positions in nav space that have been processed already.

Return type

Generator[UDFResults]

Examples

Run the SumUDF on a data set:

>>> from libertem.udf.sum import SumUDF
>>> for result in ctx.run_udf_iter(dataset=dataset, udf=SumUDF()):
...     assert np.array(result.buffers[0]["intensity"]).shape == (32, 32)
>>> np.array(result.buffers[0]["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF

Result type for UDF result iterators:

class libertem.udf.base.UDFResults(buffers: Iterable[Mapping[str, BufferWrapper]], damage: BufferWrapper)[source]

Container class to combine UDF results with additional information.

This class allows returning additional information from UDF execution together with UDF result buffers. This is currently used to pass “damage” information when running UDFs as an iterator using libertem.api.Context.run_udf_iter(). “Damage” is a map of the nav space that is set to True for all positions that have already been processed.

New in version 0.7.0.

Parameters
  • buffers – Iterable containing the result buffer dictionaries for each of the UDFs being executed

  • damage (BufferWrapper) – libertem.common.buffers.BufferWrapper of kind='nav', dtype=bool. It is set to True for all positions in nav space that have been processed already.

Buffers

BufferWrapper objects are used to manage data in the context of user-defined functions.

class libertem.common.buffers.AuxBufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]
get_view_for_dataset(dataset)[source]
new_for_partition(partition, roi)[source]

Return a new AuxBufferWrapper for a specific partition, slicing the data accordingly and reducing it to the selected roi.

This is assumed to be called on an AuxBufferWrapper that was not created by this method, that is, one that still has global coordinates without the ROI applied.

set_buffer(buf, is_global=True)[source]

Set the underlying buffer to an existing numpy array.

If is_global is True, the shape must match with the shape of nav or sig of the dataset, plus extra_shape, as determined by the kind and extra_shape constructor arguments.

class libertem.common.buffers.BufferPool[source]

allocation pool for explicitly re-using (aligned) allocations

bytes(size, alignment=4096)[source]
checkin_bytes(size, alignment, buf)[source]
checkout_bytes(size, alignment)[source]
empty(size, dtype, alignment=4096)[source]
zeros(size, dtype, alignment=4096)[source]
class libertem.common.buffers.BufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]

Helper class to automatically allocate buffers, either for partitions or the whole dataset, and create views for partitions or single frames.

This is used as a helper to allow easy merging of results without needing to manually handle indexing.

Usually, as a user, you only need to instantiate this class, specifying kind, dtype and sometimes extra_shape parameters. Most methods are meant to be called from LiberTEM-internal code, for example the UDF functionality.

This class is array_like, so you can directly use it, for example, as argument for numpy functions.

Changed in version 0.6.0: Add option to specify backend, for example CuPy

Parameters
  • kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.

  • extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. For a Shape object, sig_dims is discarded and the entire shape is used.

  • dtype (string or numpy dtype) – The dtype of this buffer

  • where (string or None) –

None means NumPy array, 'device' to use a back-end specified in allocate().

    New in version 0.6.0.

  • use ("private", "result_only" or None) –

    If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.

    Specify "result_only" here if the buffer is only used in get_results(), this means we don’t have to allocate and return it on the workers without actually needing it.

    None means the buffer is used both as a final and intermediate result.

    New in version 0.7.0.

add_partitions(partitions)[source]

Add a list of dataset partitions to the buffer such that self.allocate() can make use of the structure

allocate(lib=None)[source]

Allocate a new buffer, in the shape previously set via one of the set_shape_* methods.

Changed in version 0.6.0: Support for allocating on device

property data

Get the buffer contents in shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values have NaN value for floating point types, False for boolean, 0 for integer types and structs, ‘’ for string types and None for objects.

Changed in version 0.7.0: Better initialization values for dtypes other than floating point.
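The ROI embedding described above can be sketched with plain NumPy (shown for a float buffer; other dtypes use the fill values listed):

```python
import numpy as np

# a 4x4 nav space with a single position selected by the ROI:
roi = np.zeros((4, 4), dtype=bool)
roi[1, 2] = True

raw = np.array([42.0], dtype=np.float32)  # one result per True ROI position

# .data embeds the raw results into the full nav shape, NaN elsewhere:
full = np.full(roi.shape, np.nan, dtype=raw.dtype)
full[roi] = raw
assert full[1, 2] == 42.0
assert np.isnan(full[0, 0])
```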

property dtype

Get the declared dtype of this buffer.

New in version 0.7.0.

export()[source]

Convert device array to NumPy array for pickling and merging

property extra_shape

Get the extra_shape of this buffer.

New in version 0.5.0.

flush(debug=False)[source]

Write back any cached contiguous copies

New in version 0.5.0.

get_contiguous_view_for_tile(partition, tile)[source]

Make a cached contiguous copy of the view for a single tile if necessary.

Currently this is only necessary for kind="sig" buffers. Use flush() to write back the cache.

Boundary condition: tile.tile_slice.get(sig_only=True) does not overlap for different tiles while the cache is active, i.e. the tiles follow LiberTEM slicing for libertem.udf.base.UDFTileMixin.process_tile().

New in version 0.5.0.

Returns

view – View into data or contiguous copy if necessary

Return type

np.ndarray

get_view_for_dataset(dataset)[source]
get_view_for_frame(partition, tile, frame_idx)[source]

get a view for a single frame in a partition- or dataset-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

get_view_for_partition(partition)[source]

get a view for a single partition in a whole-result-sized buffer

get_view_for_tile(partition, tile)[source]

get a view for a single tile in a partition-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

has_data()[source]
property kind

Get the kind of this buffer.

New in version 0.5.0.

property raw_data: Optional[ndarray]

Get the raw data underlying this buffer, which is flattened and may even be filtered to a ROI

replace_array(data: nt.ArrayLike) None[source]

Replace the data backing this BufferWrapper, even if the BufferWrapper already has self._data allocated.

data can be any array-like object.

The shape and dtype of data are checked against the buffer's declaration.

result_buffer_type()[source]

Define the type of Buffer used to return final UDF results

More specialised buffers can override this

property roi_is_zero
set_roi(roi)[source]
set_shape_ds(dataset_shape, roi=None)[source]
set_shape_partition(partition, roi=None)[source]
property shape
property where

Get the place where this buffer is to be allocated.

New in version 0.6.0.

class libertem.common.buffers.ManagedBuffer(pool, size, alignment)[source]

Allocate size bytes from pool, and return them to the pool once we are GC’d

class libertem.common.buffers.PlaceholderBufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]

A declaration-only version of BufferWrapper that doesn’t actually allocate a buffer. Meant as a placeholder for results that are only materialized in UDF.get_results.

allocate(lib=None)[source]

Allocate a new buffer, in the shape previously set via one of the set_shape_* methods.

Changed in version 0.6.0: Support for allocating on device

property data

Get the buffer contents in a shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values are NaN for floating-point types, False for boolean, 0 for integer types and structs, ‘’ for string types, and None for objects.

Changed in version 0.7.0: Better initialization values for dtypes other than floating point.

export()[source]

Convert device array to NumPy array for pickling and merging

get_contiguous_view_for_tile(partition, tile)[source]

Make a cached contiguous copy of the view for a single tile if necessary.

Currently this is only necessary for kind="sig" buffers. Use flush() to write back the cache.

Boundary condition: tile.tile_slice.get(sig_only=True) must not overlap between different tiles while the cache is active, i.e. the tiles follow the LiberTEM slicing conventions for libertem.udf.base.UDFTileMixin.process_tile().

New in version 0.5.0.

Returns

view – View into data or contiguous copy if necessary

Return type

np.ndarray

get_view_for_frame(partition, tile, frame_idx)[source]

Get a view for a single frame in a partition- or dataset-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

get_view_for_partition(partition)[source]

Get a view for a single partition in a whole-result-sized buffer

get_view_for_tile(partition, tile)[source]

Get a view for a single tile in a partition-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

has_data()[source]
property raw_data

Get the raw data underlying this buffer, which is flattened and may even be filtered to a ROI

class libertem.common.buffers.PreallocBufferWrapper(data, *args, **kwargs)[source]
libertem.common.buffers.bytes_aligned(size: int) memoryview[source]
libertem.common.buffers.disjoint(sl: Slice, slices: Iterable[Slice])[source]
libertem.common.buffers.empty_aligned(size: Union[int, Tuple[int, ...]], dtype: nt.DTypeLike) ndarray[source]
libertem.common.buffers.reshaped_view(a: ndarray, shape)[source]

Like numpy.ndarray.reshape(), just guaranteed to return a view, raising an AttributeError if no view can be created.

New in version 0.5.0.

Parameters
  • a (numpy.ndarray) – Array to create a view of

  • shape (tuple) – Shape of the view to create

Returns

view – View into a with shape shape

Return type

numpy.ndarray
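
The documented view-or-raise behaviour can be sketched with a hypothetical helper (an assumption about one possible implementation: assigning to .shape, which NumPy rejects with AttributeError when a copy would be required):

```python
import numpy as np

# Hypothetical sketch of the documented behaviour, not the actual
# implementation: assigning to .shape never copies -- NumPy either
# creates the view in place or raises AttributeError.
def reshaped_view_sketch(a, shape):
    v = a.view()
    v.shape = shape  # raises AttributeError if no view can be created
    return v

a = np.arange(6)
v = reshaped_view_sketch(a, (2, 3))
v[0, 0] = 99         # writes through to a, since v is a genuine view
```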

libertem.common.buffers.to_numpy(a: Union[ndarray, Any]) ndarray[source]
libertem.common.buffers.zeros_aligned(size: Union[int, Tuple[int, ...]], dtype: nt.DTypeLike) ndarray[source]

Included utility UDFs

Some generally useful UDFs are included with LiberTEM:

Note

See Application-specific APIs for application-specific UDFs and analyses.

Sum of frames

class libertem.udf.sum.SumUDF(dtype='float32')[source]

Sum up frames, preserving the signal dimension

Parameters

dtype (numpy.dtype, optional) – Preferred dtype for computation, default ‘float32’. The actual dtype will be determined from this value and the dataset’s dtype using numpy.result_type(). See also Preferred input dtype.

Examples

>>> udf = SumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(32, 32)
get_preferred_input_dtype()[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32, since most numerical processing tasks, in particular dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype, int32 or wider input data switches from float32 to float64, and complex data in the dataset switches the input dtype kind to complex, following the NumPy casting rules.

If your UDF only works with specific input dtypes, it should raise an error or warning when incompatible dtypes are used, and/or implement a meaningful conversion in its process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.
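
The promotion rules described above can be checked directly with numpy.result_type():

```python
import numpy as np

# How the actual input dtype follows from the preferred dtype and the
# dataset's native dtype, per the documented numpy.result_type() rule.
assert np.result_type(np.float32, np.uint16) == np.float32     # stays float32
assert np.result_type(np.float32, np.int32) == np.float64      # wide int promotes
assert np.result_type(np.float32, np.complex64) == np.complex64  # kind switches
assert np.result_type(bool, np.uint16) == np.uint16            # bool is neutral
```

The last line shows why USE_NATIVE_DTYPE behaves as a neutral element: combining bool with any dtype yields that dtype unchanged.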

Sum of log-scaled frames

class libertem.udf.logsum.LogsumUDF[source]

Sum up logscaled frames

In comparison to log-scaling the sum, this highlights regions of slightly higher intensity that appear in many frames, relative to very high intensity that appears in only a few frames.

Examples

>>> udf = LogsumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["logsum"]).shape
(32, 32)
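
The effect can be illustrated with plain NumPy (np.log1p stands in for the exact log scaling here, so this is a conceptual sketch rather than the actual implementation):

```python
import numpy as np

# Why summing log-scaled frames differs from log-scaling the sum:
frames = np.zeros((10, 1, 2))        # 10 frames, 1x2 signal
frames[:, 0, 0] = 1.0                # pixel A: moderate intensity in every frame
frames[0, 0, 1] = 10.0               # pixel B: high intensity in a single frame

plain_sum = frames.sum(axis=0)       # both pixels sum to 10
logsum = np.log1p(frames).sum(axis=0)

# In the plain sum the pixels are indistinguishable; in the logsum the
# pixel that appears in many frames dominates.
```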

Standard deviation

class libertem.udf.stddev.StdDevUDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

Compute sum of variances and sum of pixels from the given dataset

The one-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed.

Changed in version 0.7.0: var, mean, and std are now returned directly from the UDF via get_results.

Examples

>>> udf = StdDevUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["varsum"])        # variance times number of frames
array(...)
>>> np.array(result["num_frames"])  # number of frames for each tile
array(...)
>>> np.array(result["sum"])  # sum of all frames
array(...)
>>> np.array(result["var"])
array(...)
>>> np.array(result["mean"])
array(...)
>>> np.array(result["std"])
array(...)
get_result_buffers()[source]

Initializes BufferWrapper objects for sum of variances, sum of frames, and the number of frames

Returns

A dictionary that maps ‘varsum’, ‘num_frames’, ‘sum’ to the corresponding BufferWrapper objects

Return type

dict

get_results()[source]

Calculate variance, mean and standard deviation from raw UDF results

Returns

pass_results – Result dictionary with keys 'sum', 'varsum', 'num_frames', 'var', 'std', and 'mean' as BufferWrapper

Return type

Dict[str, BufferWrapper]
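
For illustration, the derived quantities can be reconstructed from the raw aggregation buffers with plain NumPy (a sketch of the relationship, not the actual implementation):

```python
import numpy as np

# Deriving var, mean, and std from the raw aggregation buffers
# sum, varsum, and num_frames.
rng = np.random.default_rng(42)
data = rng.random((100, 4, 4))              # 100 frames of 4x4 signal

frame_sum = data.sum(axis=0)                               # 'sum' buffer
varsum = ((data - data.mean(axis=0)) ** 2).sum(axis=0)     # 'varsum' buffer
num_frames = data.shape[0]                                 # 'num_frames' buffer

var = varsum / num_frames
mean = frame_sum / num_frames
std = np.sqrt(var)
```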

get_task_data()[source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict

merge(dest, src)[source]

Given destination and source buffers that contain the sum of variances, the sum of frames, and the number of frames used in each of the buffers, merge the source buffers into the destination buffers by computing the joint sum of variances and sum of frames over all frames used.

Parameters
  • dest – Aggregation buffer that contains the sum of variances, the sum of frames, and the number of frames

  • src – Partial results that contain the sum of variances, the sum of frames, and the number of frames of a partition to be merged into the aggregation buffers
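
The joint sum of variances follows the standard parallel-variance update from [SG18]; a minimal NumPy sketch with a hypothetical helper (not the actual implementation):

```python
import numpy as np

# Merge two partial (num_frames, sum, varsum) aggregates into one,
# using the parallel-variance update.
def merge_stats(n_a, sum_a, varsum_a, n_b, sum_b, varsum_b):
    n = n_a + n_b
    delta = sum_b / n_b - sum_a / n_a           # difference of the two means
    varsum = varsum_a + varsum_b + delta ** 2 * n_a * n_b / n
    return n, sum_a + sum_b, varsum

rng = np.random.default_rng(0)
a = rng.random((30, 8))                         # two "partitions" of frames
b = rng.random((50, 8))

stats_a = (len(a), a.sum(axis=0), ((a - a.mean(axis=0)) ** 2).sum(axis=0))
stats_b = (len(b), b.sum(axis=0), ((b - b.mean(axis=0)) ** 2).sum(axis=0))
n, s, varsum = merge_stats(*stats_a, *stats_b)
```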

process_tile(tile)[source]

Calculate a sum and variance minibatch for the tile and update partition buffers with it.

Parameters

tile – tile of the data

libertem.udf.stddev.run_stddev(ctx, dataset, roi=None, progress=False)[source]

Compute sum of variances and sum of pixels from the given dataset

The one-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed.

Changed in version 0.5.0: Added progress parameter for progress bar.

Parameters
  • ctx (libertem.api.Context) – Context to run the UDF in

  • dataset (DataSet) – Dataset to compute the statistics for

  • roi (numpy.ndarray, optional) – Region of interest, a boolean mask of the navigation dimensions

  • progress (bool, optional) – Show a progress bar. Default False

Returns

pass_results – A dictionary of ndarrays that contains the sum of variances, the sum of pixels, and the number of frames used to compute the above statistics. To retrieve a statistic, use the following keys:

  • variance: pass_results['var']

  • standard deviation: pass_results['std']

  • sum of pixels: pass_results['sum']

  • mean: pass_results['mean']

  • number of frames: pass_results['num_frames']

libertem.udf.stddev.consolidate_result(udf_result)[source]

Calculate variance, mean and standard deviation from raw UDF results and consolidate the per-tile frame counter into a single value. Convert all result arrays to ndarray.

Note

This is mostly here for backwards-compatibility - nowadays, ‘var’, ‘std’, and ‘mean’ are already calculated in StdDevUDF.get_results().

Parameters

udf_result (Dict[str, BufferWrapper]) – UDF result with keys ‘sum’, ‘varsum’, ‘num_frames’, ‘var’, ‘std’, ‘mean’

Returns

pass_results – Result dictionary with keys 'sum', 'varsum', 'var', 'std', 'mean' as numpy.ndarray, and 'num_frames' as int

Return type

Dict[str, Union[numpy.ndarray, int]]

Sum per frame

class libertem.udf.sumsigudf.SumSigUDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

Sum over the signal axes. For each navigation position, the sum of all pixels is calculated.

Examples

>>> udf = SumSigUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(16, 16)
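
The reduction is equivalent to a plain NumPy sum over the signal axes (an illustrative sketch with the shapes from the example above, not the tiled implementation):

```python
import numpy as np

# SumSigUDF reduces over the signal axes only, producing one value
# per navigation position.
data = np.ones((16, 16, 32, 32))    # (nav_y, nav_x, sig_y, sig_x)
intensity = data.sum(axis=(2, 3))   # sum over the two signal dimensions
```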

Apply masks

class libertem.udf.masks.ApplyMasksUDF(mask_factories, use_torch=True, use_sparse=None, mask_count=None, mask_dtype=None, preferred_dtype=None, backends=None)[source]

Apply masks to signals/frames in the dataset. This can be used not only to integrate over regions with a binary mask - the integration can also be weighted by using float- or complex-valued masks.

The result will be returned in a single nav-shaped buffer called intensity. Its shape will be (*nav_shape, len(masks)).

Parameters
  • mask_factories (Union[Callable[[], array_like], Iterable[Callable[[], array_like]]]) – Function or list of functions that take no arguments and create masks. The returned masks can be numpy arrays, scipy.sparse matrices, or sparse (https://sparse.pydata.org/) matrices. The mask factories should not reference large objects, because these can create significant overhead when they are pickled and unpickled. Each factory function should, when called, return a numpy array with the same shape as the frames in the dataset (i.e. dataset.shape.sig).

  • use_torch (bool, optional) – Use pytorch back-end if available. Default True

  • use_sparse (Union[None, False, True, 'scipy.sparse', 'scipy.sparse.csc', 'sparse.pydata'], optional) – Which sparse back-end to use:

    • None (default): use sparse matrix multiplication if all factory functions return a sparse mask, otherwise convert all masks to dense matrices and use dense matrix multiplication

    • True: convert all masks to sparse matrices and use the default sparse back-end

    • False: convert all masks to dense matrices

    • ‘scipy.sparse’: use scipy.sparse.csr_matrix (default sparse back-end)

    • ‘scipy.sparse.csc’: use scipy.sparse.csc_matrix

    • ‘sparse.pydata’: use a sparse.pydata COO matrix

  • mask_count (int, optional) – Specify the number of masks if a single factory function is used so that the number of masks can be determined without calling the factory function.

  • mask_dtype (numpy.dtype, optional) – Specify the dtype of the masks so that mask dtype can be determined without calling the mask factory functions. This can be used to override the mask dtype in the result dtype determination. As an example, setting this to np.float32 means that masks of type float64 will not switch the calculation and result dtype to float64 or complex128.

  • preferred_dtype (numpy.dtype, optional) – Let get_preferred_input_dtype() return the specified type instead of the default float32. This can perform the calculation with integer types if both input data and mask data are compatible with this.

  • backends (Iterable containing strings "numpy" and/or "cupy", or None) – Control which back-ends are used. Default is numpy and cupy

Examples

>>> dataset.shape
(16, 16, 32, 32)
>>> def my_masks():
...     return [np.ones((32, 32)), np.zeros((32, 32))]
>>> udf = ApplyMasksUDF(mask_factories=my_masks)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> res.data.shape
(16, 16, 2)
>>> np.allclose(res.data[..., 1], 0)  # same order as in the mask factory
True

Mask factories can also return all masks as a single array, stacked on the first axis:

>>> def my_masks_2():
...     masks = np.zeros((2, 32, 32))
...     masks[1, ...] = 1
...     return masks
>>> udf = ApplyMasksUDF(mask_factories=my_masks_2)
>>> res_2 = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> np.allclose(res_2.data, res.data)
True

New in version 0.4.0.
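
Conceptually, applying a mask stack is a dot product between each frame and each mask over the signal axes; a plain NumPy sketch (not the optimized back-end, and without the factory indirection):

```python
import numpy as np

# Applying a stack of masks as a contraction over the signal axes.
frames = np.random.default_rng(1).random((16, 16, 32, 32))
masks = np.stack([np.ones((32, 32)), np.zeros((32, 32))])  # (2, 32, 32)

# Contract the signal axes of the frames with the signal axes of the masks,
# yielding one value per navigation position and mask: shape (16, 16, 2).
intensity = np.tensordot(frames, masks, axes=((2, 3), (1, 2)))
```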

get_backends()[source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing the possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]

Load data

class libertem.udf.raw.PickUDF[source]

Load raw data from ROI

This UDF is meant for frame picking with a very small ROI, usually a single frame.

New in version 0.4.0.

NoOp

class libertem.udf.base.NoOpUDF(preferred_input_dtype=<class 'bool'>)[source]

A UDF that does nothing and returns nothing.

This is useful for testing.

Parameters

preferred_input_dtype (numpy.dtype) – Perform dtype conversion. By default, this is UDF.USE_NATIVE_DTYPE.

get_preferred_input_dtype()[source]

Return the value passed in the constructor.

get_result_buffers()[source]

No result buffers.

process_tile(tile)[source]

Do nothing.