UDF API reference

Defining UDFs

See User-defined functions (UDFs) for an introduction and in-depth explanation.

Mixins for processing methods

class libertem.udf.base.UDFFrameMixin(*args, **kwargs)[source]

Implement process_frame for per-frame processing.

process_frame(frame: ndarray)[source]

Implement this method to process the data in a frame-by-frame manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

frame (numpy.ndarray or cupy.ndarray) – A single frame or signal element from the dataset. The shape is the same as dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 2D, for spectra 1D etc.

class libertem.udf.base.UDFTileMixin(*args, **kwargs)[source]

Implement process_tile for per-tile processing.

process_tile(tile: ndarray)[source]

Implement this method to process the data in a tiled manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

tile (numpy.ndarray or cupy.ndarray) – A small number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
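As the shape description above implies, a per-frame reduction inside process_tile reduces over every axis except the first. A plain NumPy sketch of the axis handling:

```python
import numpy as np

# A tile stacks N frames along a new leading axis: shape (N,) + sig shape.
tile = np.ones((5, 16, 16), dtype=np.float32)  # N=5 frames of 16x16 signal

# Reducing each frame to one value sums over all axes except the first:
per_frame = tile.sum(axis=tuple(range(1, tile.ndim)))
assert per_frame.shape == (5,)

# Reducing over the tile's frames (e.g. for a kind='sig' sum) keeps the
# signal shape instead:
sig_sum = tile.sum(axis=0)
assert sig_sum.shape == (16, 16)
```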

class libertem.udf.base.UDFPartitionMixin(*args, **kwargs)[source]

Implement process_partition for per-partition processing.

process_partition(partition: ndarray) → None[source]

Implement this method to process the data partitioned into large (100s of MiB) partitions.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Note

Only use this method if you know what you are doing; especially if you are running a processing pipeline with multiple steps, or multiple processing pipelines at the same time, performance may be adversely impacted.

Parameters

partition (numpy.ndarray or cupy.ndarray) – A large number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.

Base UDF class

class libertem.udf.base.UDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

The main user-defined functions interface. You can implement your functionality by overriding methods on this class.

If you override __init__, please take care, as it is called multiple times during evaluation of a UDF. You can handle some pre-conditioning of parameters, but you also have to accept the results as input again.

Arguments passed as **kwargs will be automatically available on self.params when running the UDF.

Example

>>> class MyUDF(UDF):
...     def __init__(self, param1, param2="def2", **kwargs):
...         param1 = int(param1)
...         if "param3" not in kwargs:
...             raise TypeError("missing argument param3")
...         super().__init__(param1=param1, param2=param2, **kwargs)
Parameters

kwargs

Input parameters. They are scattered to the worker processes and available as self.params from here on.

Values can be BufferWrapper instances, which, when accessed via self.params.the_key_here, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

classmethod aux_data(data, kind, extra_shape=(), dtype='float32')[source]

Use this method to create auxiliary data. Auxiliary data should have a shape like (dataset.shape.nav, extra_shape) and on access, an appropriate view will be created. For example, if you access aux data in process_frame, you will get the auxiliary data for the current frame you are processing.

Example

We create a UDF to demonstrate the behavior:

>>> class MyUDF(UDF):
...     def get_result_buffers(self):
...         # Result buffer for debug output
...         return {'aux_dump': self.buffer(kind='nav', dtype='object')}
...
...     def process_frame(self, frame):
...         # Extract value of aux data for demonstration
...         self.results.aux_dump[:] = str(self.params.aux_data[:])
...
>>> # for each frame, provide three values from a sequential series:
>>> aux1 = MyUDF.aux_data(
...     data=np.arange(np.prod(dataset.shape.nav) * 3, dtype=np.float32),
...     kind="nav", extra_shape=(3,), dtype="float32"
... )
>>> udf = MyUDF(aux_data=aux1)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)

process_frame for frame (0, 7) received a view of aux_data with values [21., 22., 23.]:

>>> res['aux_dump'].data[0, 7]
'[21. 22. 23.]'
buffer(kind: Literal['nav', 'sig', 'single'], extra_shape: Tuple[int, ...] = (), dtype: nt.DTypeLike = 'float32', where: Optional[Literal['device']] = None, use: Optional[Literal['private', 'result_only']] = None) → BufferWrapper[source]

Use this method to create BufferWrapper objects in get_result_buffers().

Parameters
  • kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.

  • extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. If this is specified as a Shape object, it is converted to a tuple first.

  • dtype (string or numpy dtype) – The dtype of this buffer

  • where (string or None) –

    None means NumPy array, specify 'device' to use a device buffer (for example on a CUDA device)

    New in version 0.6.0.

  • use ("private", "result_only" or None) –

    If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.

    Specify "result_only" here if the buffer is only used in get_results(), this means we don’t have to allocate and return it on the workers without actually needing it.

    None means the buffer is used both as a final and intermediate result.

    New in version 0.7.0.

copy_for_partition(partition: Partition, roi: ndarray) → UDF[source]

create a copy of the UDF, specifically slicing aux data to the specified partition and roi

get_backends() → Union[Literal['numpy', 'cuda', 'cupy'], Iterable[Literal['numpy', 'cuda', 'cupy']]][source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing the possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]

get_preferred_input_dtype() → nt.DTypeLike[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32, since most numerical processing tasks, in particular dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype. int32 or wider input data would switch from float32 to float64, and complex data in the dataset will switch the input dtype kind to complex, following the NumPy casting rules.

In case your UDF only works with specific input dtypes, it should raise an error or emit a warning if incompatible dtypes are used, and/or implement a meaningful conversion in your UDF's process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.
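The casting rules described above can be checked directly with numpy.result_type():

```python
import numpy as np

# The float32 preference combined with the dataset's native dtype:
assert np.result_type(np.uint8, np.float32) == np.float32        # narrow ints: stays float32
assert np.result_type(np.int32, np.float32) == np.float64        # int32 or wider: widens to float64
assert np.result_type(np.complex64, np.float32) == np.complex64  # complex input stays complex

# bool (USE_NATIVE_DTYPE) acts as a neutral element, keeping the native dtype:
assert np.result_type(np.int16, np.bool_) == np.int16
```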

get_result_buffers() → Dict[str, BufferWrapper][source]

Return result buffer declaration.

Values of the returned dict should be BufferWrapper instances, which, when accessed via self.results.key, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

The values also need to be serializable via pickle.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation. Please note that partition metadata will not be set when this method is executed on the head node.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.results.the_key_here.

Return type

dict

get_results() → Dict[str, ndarray][source]

Get results, allowing a postprocessing step on the main node after a result has been merged. See also: UDFPostprocessMixin.

New in version 0.7.0.

Note

You should return all values as numpy arrays, they will be wrapped in BufferWrapper instances before they are returned to the user.

See the Post-processing after merging section in the documentation for details and examples.

Returns

results – A dict containing the final post-processed results.

Return type

dict

get_task_data() → Dict[str, Any][source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict

get_tiling_preferences() → TilingPreferences[source]

Configure tiling preferences. Return a dictionary with the following keys:

  • “depth”: number of frames/frame parts to stack on top of each other

  • “total_size”: total size of a tile in bytes

New in version 0.6.0.

merge(dest: MergeAttrMapping, src: MergeAttrMapping)[source]

Merge a partial result src into the current global result dest.

Data available in this method:

  • self.params - the parameters of this UDF

Parameters
  • dest – global results; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (dest.your_buffer_name)

  • src – results for a partition; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (src.your_buffer_name)

Note

This function is running on the main node, which means self.results and self.task_data are not available.
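A sketch of a custom merge() body for a kind='sig' sum buffer, where the default nav-slicing merge does not apply. The buffer name intensity is illustrative, and SimpleNamespace stands in for the MergeAttrMapping attribute-access pattern:

```python
import numpy as np
from types import SimpleNamespace

# merge() body as it would appear in a UDF subclass: accumulate each
# partition's partial sum into the global result.
def merge(dest, src):
    dest.intensity[:] += src.intensity

# MergeAttrMapping exposes buffers via attribute access; SimpleNamespace
# mimics that here for demonstration:
dest = SimpleNamespace(intensity=np.zeros((4, 4), dtype=np.float32))
src = SimpleNamespace(intensity=np.ones((4, 4), dtype=np.float32))
merge(dest, src)
assert float(dest.intensity.sum()) == 16.0
```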

property requires_custom_merge

Determine if buffers with kind != 'nav' are present, in which case the default merge doesn't work and a custom merge() has to be implemented

New in version 0.5.0.

Alternative merging for Dask arrays

class libertem.udf.base.UDFMergeAllMixin(*args, **kwargs)[source]
merge_all(ordered_results: OrderedDict[Slice, MergeAttrMapping]) → Mapping[str, nt.ArrayLike][source]

Combine stack of ordered partial results ordered_results to form complete result.

Combining partial results this way can be more efficient than direct merging into a result buffer for cases where the results are not NumPy arrays. Currently this only applies to the libertem.executor.delayed.DelayedJobExecutor, where it provides an efficient pathway to construct Dask arrays from delayed UDF results.

The input and the returned arrays are in flattened navigation dimension with ROI applied.

Data available in this method:

  • self.params - the parameters of this UDF

For UDFs with only kind='nav' result buffers a default implementation is used automatically.

Parameters

ordered_results – Ordered dict mapping partition slice to UDF partial result

Returns

Dictionary mapping result buffer name to buffer content

Return type

dict[buffername] -> array_like

Note

This function is running on the main node, which means self.results and self.task_data are not available.
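For a kind='nav' buffer, a merge_all() implementation amounts to concatenating the ordered partial results along the flattened navigation axis. A sketch, with the buffer name intensity illustrative and SimpleNamespace standing in for the partial-result mappings:

```python
from collections import OrderedDict
from types import SimpleNamespace

import numpy as np

def merge_all(ordered_results):
    # partial results arrive ordered by partition slice, so stacking them
    # along the flattened nav axis yields the complete result
    chunks = [res.intensity for res in ordered_results.values()]
    return {'intensity': np.concatenate(chunks, axis=0)}

# illustrative stand-ins for two partition slices and their partial results:
ordered = OrderedDict([
    ('slice_0', SimpleNamespace(intensity=np.zeros(8))),
    ('slice_1', SimpleNamespace(intensity=np.ones(8))),
])
result = merge_all(ordered)
assert result['intensity'].shape == (16,)
```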

Meta information

class libertem.udf.base.UDFMeta(partition_slice: Optional[Slice], dataset_shape: Shape, roi: Optional[ndarray], dataset_dtype: nt.DTypeLike, input_dtype: nt.DTypeLike, tiling_scheme: TilingScheme = None, tiling_index: int = 0, corrections: Optional[CorrectionSet] = None, device_class: Optional[Literal['cpu', 'cuda']] = None, threads_per_worker: Optional[int] = None)[source]

UDF metadata. Makes all relevant metadata accessible to the UDF. Can be different for each task/partition.

Changed in version 0.4.0: Added distinction of dataset_dtype and input_dtype

Changed in version 0.6.0: Information on compute backend, corrections, coordinates and tiling scheme added

Changed in version 0.9.0: tiling_scheme_idx and sig_slice added

property coordinates: ndarray

Array of coordinates that correspond to the frames in the actual navigation space which are part of the current tile or partition.

New in version 0.6.0.

Type

np.ndarray

property corrections: CorrectionSet

correction data that is available, either from the dataset or specified by the user

New in version 0.6.0.

Type

CorrectionSet

property dataset_dtype: nt.DTypeLike

Native dtype of the dataset

Type

numpy.dtype

property dataset_shape: Shape

The original shape of the whole dataset, not influenced by the ROI

Type

Shape

property device_class: str

Which device class is used.

The back-end library can be accessed as libertem.udf.base.UDF.xp. This additional string information is provided so that the back-end can be determined without importing all back-end libraries and testing them against libertem.udf.base.UDF.xp.

Current values are cpu (default) or cuda.

New in version 0.6.0.

property input_dtype: nt.DTypeLike

dtype of the data that will be passed to the UDF

This is determined from the dataset’s native dtype and UDF.get_preferred_input_dtype() using numpy.result_type()

New in version 0.4.0.

Type

numpy.dtype

property partition_shape: Shape

The shape of the partition this UDF currently works on. If a ROI was applied, the shape will be modified accordingly.

Type

Shape

property roi: Optional[ndarray]

Boolean array which limits the elements the UDF is working on. Has a shape of dataset_shape.nav.

Type

numpy.ndarray

property sig_slice: Slice

Signal slice of the current tile.

Since all tiles follow the same tiling scheme, this avoids repeatedly calculating the signal part of the slice. Instead, the appropriate slice from the tiling scheme can be re-used.

property slice: Optional[Slice]

A Slice instance that describes the location within the dataset with navigation dimension flattened and reduced to the ROI.

Type

Slice

property threads_per_worker: Optional[int]
Number of threads that a UDF is allowed to use in the process_* methods.

For Numba, pyfftw, Torch, NumPy and SciPy (OMP, MKL, OpenBLAS), this limit is set automatically; this property can be used for other cases, like manually creating thread pools or setting limits for unsupported modules. None means no limit is set, and the UDF can use any number of threads it deems necessary (should be limited to system limits, of course).

Note

Changed in version 0.8.0: Since discovery of loaded libraries can be slow with threadpoolctl (#1117), they are cached now. In case a UDF triggers loading of a new library or instance of a library that is supported by threadpoolctl, it will only be discovered in the first run on a Context. The threading settings for such other libraries or instances can therefore depend on the execution order. In such cases the thread count for affected libraries should be set in the UDF based on threads_per_worker. Numba, pyfftw, Torch, NumPy and SciPy should not be affected since they are loaded before the first discovery.

See also: libertem.common.threading.set_num_threads()

New in version 0.7.0.

Type

int or None
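A sketch of honoring the limit when manually creating a thread pool. The helper name is illustrative; inside a UDF the value would come from self.meta.threads_per_worker:

```python
from concurrent.futures import ThreadPoolExecutor

def make_pool(threads_per_worker):
    # None means no limit was set; fall back to a single thread here
    n = threads_per_worker if threads_per_worker is not None else 1
    return ThreadPoolExecutor(max_workers=n)

with make_pool(None) as pool:
    squares = list(pool.map(lambda x: x * x, [1, 2, 3]))
assert squares == [1, 4, 9]
```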

property tiling_scheme: Optional[TilingScheme]

the tiling scheme that was negotiated

New in version 0.6.0.

Type

TilingScheme

property tiling_scheme_idx: int

Index of the current tile in tiling_scheme.

Pre- and postprocessing

class libertem.udf.base.UDFPreprocessMixin(*args, **kwargs)[source]

Implement preprocess to initialize the result buffers of a partition on the worker before the partition data is processed.

New in version 0.3.0.

preprocess() → None[source]

Implement this method to preprocess the result data for a partition.

This can be useful to initialize arrays of dtype='object' with the correct container types, for example.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

class libertem.udf.base.UDFPostprocessMixin(*args, **kwargs)[source]

Implement postprocess to modify the result buffers of a partition on the worker after the partition data has been completely processed, but before it is returned to the main node for the final merging step.

postprocess() → None[source]

Implement this method to postprocess the result data for a partition.

This can be useful in combination with process_tile() to implement a postprocessing step that requires the reduced results for whole frames.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

Running UDFs

Three methods of libertem.api.Context are relevant for running user-defined functions:

class libertem.api.Context(executor: Optional[JobExecutor] = None, plot_class=None)[source]

Context is the main entry point of the LiberTEM API. It contains methods for loading datasets, creating analyses on them and running them. In the background, instantiating a Context creates a suitable executor and spins up a local Dask cluster unless the executor is passed to the constructor.

Changed in version 0.7.0: Removed deprecated methods create_mask_job, create_pick_job

Parameters
  • executor (JobExecutor or None) – If None, create a local dask.distributed cluster and client using make_local() with optimal configuration for LiberTEM. It uses all cores and compatible GPUs on the local system, but is not set as default Dask scheduler to not interfere with other uses of Dask.

  • plot_class (libertem.viz.base.Live2DPlot) –

    Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.

    New in version 0.7.0.

plot_class

Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.

New in version 0.7.0.

Type

libertem.viz.base.Live2DPlot

Examples

>>> ctx = libertem.api.Context()  
>>> # Create a Context using an inline executor for debugging
>>> from libertem.executor.inline import InlineJobExecutor
>>> debug_ctx = libertem.api.Context(executor=InlineJobExecutor())
map(dataset: DataSet, f, roi: Optional[ndarray] = None, progress: bool = False, corrections: Optional[CorrectionSet] = None, backends=None) → BufferWrapper[source]

Create an AutoUDF with function f() and run it on dataset

Changed in version 0.5.0: Added the progress parameter

Changed in version 0.6.0: Added the corrections and backends parameter

Parameters
  • dataset – The dataset to work on

  • f – Function that accepts a frame as the only parameter. It should return a strongly reduced output compared to the size of a frame.

  • roi (numpy.ndarray) – region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the function. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

Returns

BufferWrapper – The result of the UDF. Access the underlying numpy array using the data property. Shape and dtype are inferred automatically from f.

Return type

libertem.common.buffers.BufferWrapper

run_udf(dataset: DataSet, udf: Union[UDF, Iterable[UDF]], roi: Optional[ndarray] = None, corrections: Optional[CorrectionSet] = None, progress: bool = False, backends=None, plots=None, sync=True) → Union[Mapping[str, BufferWrapper], List[Mapping[str, BufferWrapper]], Coroutine[None, None, Mapping[str, BufferWrapper]], Coroutine[None, None, List[Mapping[str, BufferWrapper]]]][source]

Run udf on dataset, restricted to the region of interest roi.

Changed in version 0.5.0: Added the progress parameter

Changed in version 0.6.0: Added the corrections and backends parameter

Changed in version 0.7.0: Added the plots and sync parameters, and the ability to run multiple UDFs on the same data in a single pass.

Parameters
  • dataset – The dataset to work on

  • udf – UDF instance you want to run, or a list of UDF instances

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

  • plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –

    • None: don’t plot anything (default)

    • True: plot all 2D UDF result buffers

    • List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.

    • List[LivePlot]: LivePlot instance for each channel you want to plot

    New in version 0.7.0.

  • sync (bool) –

    By default, run_udf is a synchronous method. If sync is set to False, it is awaitable instead.

    New in version 0.7.0.

Returns

Return value of the UDF containing the result buffers of type libertem.common.buffers.BufferWrapper. Note that a BufferWrapper can be used like a numpy.ndarray in many cases because it implements __array__(). You can access the underlying numpy array using the data property.

If a list of UDFs was passed in, the returned type is a Tuple[dict[str,BufferWrapper]].

Return type

dict or Tuple[dict]

Examples

Run the SumUDF on a data set:

>>> from libertem.udf.sum import SumUDF
>>> result = ctx.run_udf(dataset=dataset, udf=SumUDF())
>>> np.array(result["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF

Running a UDF on a subset of data:

>>> from libertem.udf.sumsigudf import SumSigUDF
>>> roi = np.zeros(dataset.shape.nav, dtype=bool)
>>> roi[0, 0] = True
>>> result = ctx.run_udf(dataset=dataset, udf=SumSigUDF(), roi=roi)
>>> # to get the full navigation-shaped results, with NaNs where the `roi` was False:
>>> np.array(result["intensity"]).shape
(16, 16)
>>> # to only get the selected results as a flat array:
>>> result["intensity"].raw_data.shape
(1,)
run_udf_iter(dataset: DataSet, udf: Union[UDF, Iterable[UDF]], roi: Optional[ndarray] = None, corrections: Optional[CorrectionSet] = None, progress: bool = False, backends=None, plots=None, sync=True) → Union[Generator[UDFResults, None, None], AsyncGenerator[UDFResults, None]][source]

Run udf on dataset, restricted to the region of interest roi. Yields partial results after each merge operation.

New in version 0.7.0.

Parameters
  • dataset – The dataset to work on

  • udf – UDF instance you want to run, or a list of UDF instances

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

  • plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –

    • None: don’t plot anything (default)

    • True: plot all 2D UDF result buffers

    • List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.

    • List[LivePlot]: LivePlot instance for each channel you want to plot

  • sync (bool) – By default, run_udf_iter is a synchronous method. If sync is set to False, an async generator will be returned instead.

Returns

Generator of UDFResults container objects. Their attribute buffers is the list of result buffer dictionaries for the UDFs. Attribute damage is a BufferWrapper of kind='nav', dtype=bool indicating the positions in nav space that have been processed already.

Return type

Generator[UDFResults]

Examples

Run the SumUDF on a data set:

>>> from libertem.udf.sum import SumUDF
>>> for result in ctx.run_udf_iter(dataset=dataset, udf=SumUDF()):
...     assert np.array(result.buffers[0]["intensity"]).shape == (32, 32)
>>> np.array(result.buffers[0]["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF

Result type for UDF result iterators:

class libertem.udf.base.UDFResults(buffers: Iterable[Mapping[str, BufferWrapper]], damage: BufferWrapper)[source]

Container class to combine UDF results with additional information.

This class allows returning additional information from UDF execution together with UDF result buffers. This is currently used to pass “damage” information when running UDFs as an iterator using libertem.api.Context.run_udf_iter(). “Damage” is a map of the nav space that is set to True for all positions that have already been processed.

New in version 0.7.0.

Parameters
  • buffers – Iterable containing the result buffer dictionaries for each of the UDFs being executed

  • damage (BufferWrapper) – libertem.common.buffers.BufferWrapper of kind='nav', dtype=bool. It is set to True for all positions in nav space that have been processed already.

Buffers

BufferWrapper objects are used to manage data in the context of user-defined functions.

class libertem.common.buffers.AuxBufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]
get_view_for_dataset(dataset)[source]
new_for_partition(partition, roi)[source]

Return a new AuxBufferWrapper for a specific partition, slicing the data accordingly and reducing it to the selected roi.

This is assumed to be called on an AuxBufferWrapper that was not created by this method, that is, one that still has global coordinates without the ROI applied.

set_buffer(buf, is_global=True)[source]

Set the underlying buffer to an existing numpy array.

If is_global is True, the shape must match with the shape of nav or sig of the dataset, plus extra_shape, as determined by the kind and extra_shape constructor arguments.

class libertem.common.buffers.BufferPool[source]

allocation pool for explicitly re-using (aligned) allocations

bytes(size, alignment=4096)[source]
checkin_bytes(size, alignment, buf)[source]
checkout_bytes(size, alignment)[source]
empty(size, dtype, alignment=4096)[source]
zeros(size, dtype, alignment=4096)[source]
class libertem.common.buffers.BufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]

Helper class to automatically allocate buffers, either for partitions or the whole dataset, and create views for partitions or single frames.

This is used as a helper to allow easy merging of results without needing to manually handle indexing.

Usually, as a user, you only need to instantiate this class, specifying kind, dtype and sometimes extra_shape parameters. Most methods are meant to be called from LiberTEM-internal code, for example the UDF functionality.

This class is array_like, so you can directly use it, for example, as argument for numpy functions.

Changed in version 0.6.0: Add option to specify backend, for example CuPy

Parameters
  • kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.

  • extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. For a Shape object, sig_dims is discarded and the entire shape is used.

  • dtype (string or numpy dtype) – The dtype of this buffer

  • where (string or None) –

None means NumPy array, 'device' to use a back-end specified in allocate().

    New in version 0.6.0.

  • use ("private", "result_only" or None) –

    If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.

    Specify "result_only" here if the buffer is only used in get_results(), this means we don’t have to allocate and return it on the workers without actually needing it.

    None means the buffer is used both as a final and intermediate result.

    New in version 0.7.0.

add_partitions(partitions)[source]

Add a list of dataset partitions to the buffer such that self.allocate() can make use of the structure

allocate(lib=None)[source]

Allocate a new buffer, in the shape previously set via one of the set_shape_* methods.

Changed in version 0.6.0: Support for allocating on device

property data

Get the buffer contents in shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values have NaN value for floating point types, False for boolean, 0 for integer types and structs, ‘’ for string types and None for objects.

Changed in version 0.7.0: Better initialization values for dtypes other than floating point.
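The ROI embedding described above can be sketched with plain NumPy (shown for a float buffer; other dtypes use the fill values listed):

```python
import numpy as np

# a 4x4 nav space with a single position selected by the ROI:
roi = np.zeros((4, 4), dtype=bool)
roi[1, 2] = True

raw = np.array([42.0], dtype=np.float32)  # one result per True ROI position

# .data embeds the raw results into the full nav shape, NaN elsewhere:
full = np.full(roi.shape, np.nan, dtype=raw.dtype)
full[roi] = raw
assert full[1, 2] == 42.0
assert np.isnan(full[0, 0])
```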

property dtype

Get the declared dtype of this buffer.

New in version 0.7.0.

export()[source]

Convert device array to NumPy array for pickling and merging

property extra_shape

Get the extra_shape of this buffer.

New in version 0.5.0.

flush(debug=False)[source]

Write back any cached contiguous copies

New in version 0.5.0.

get_contiguous_view_for_tile(partition, tile)[source]

Make a cached contiguous copy of the view for a single tile if necessary.

Currently this is only necessary for kind="sig" buffers. Use flush() to write back the cache.

Boundary condition: tile.tile_slice.get(sig_only=True) does not overlap for different tiles while the cache is active, i.e. the tiles follow LiberTEM slicing for libertem.udf.base.UDFTileMixin.process_tile().

New in version 0.5.0.

Returns

view – View into data or contiguous copy if necessary

Return type

np.ndarray

get_view_for_dataset(dataset)[source]
get_view_for_frame(partition, tile, frame_idx)[source]

get a view for a single frame in a partition- or dataset-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

get_view_for_partition(partition)[source]

get a view for a single partition in a whole-result-sized buffer

get_view_for_tile(partition, tile)[source]

get a view for a single tile in a partition-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

has_data()[source]
property kind

Get the kind of this buffer.

New in version 0.5.0.

property raw_data: Optional[ndarray]

Get the raw data underlying this buffer, which is flattened and may even be filtered to a ROI

replace_array(data: nt.ArrayLike) None[source]

Replace the data backing this BufferWrapper, even if the BufferWrapper already has self._data allocated.

data can be any array-like object.

The shape and dtype of data are checked against the buffer's declaration.

result_buffer_type()[source]

Define the type of Buffer used to return final UDF results

More specialised buffers can override this

property roi_is_zero
set_roi(roi)[source]
set_shape_ds(dataset_shape, roi=None)[source]
set_shape_partition(partition, roi=None)[source]
property shape
property where

Get the place where this buffer is to be allocated.

New in version 0.6.0.

class libertem.common.buffers.ManagedBuffer(pool, size, alignment)[source]

Allocate size bytes from pool, and return them to the pool once we are GC’d

class libertem.common.buffers.PlaceholderBufferWrapper(kind, extra_shape=(), dtype='float32', where=None, use=None)[source]

A declaration-only version of BufferWrapper that doesn’t actually allocate a buffer. Meant as a placeholder for results that are only materialized in UDF.get_results.

allocate(lib=None)[source]

Allocate a new buffer, in the shape previously set via one of the set_shape_* methods.

Changed in version 0.6.0: Support for allocating on device

property data

Get the buffer contents in a shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values are NaN for floating-point types, False for boolean, 0 for integer types and structs, ‘’ for string types, and None for objects.

Changed in version 0.7.0: Better initialization values for dtypes other than floating point.

export()[source]

Convert device array to NumPy array for pickling and merging

get_contiguous_view_for_tile(partition, tile)[source]

Make a cached contiguous copy of the view for a single tile if necessary.

Currently this is only necessary for kind="sig" buffers. Use flush() to write back the cache.

Boundary condition: tile.tile_slice.get(sig_only=True) must not overlap between different tiles while the cache is active, i.e. the tiles follow the LiberTEM slicing conventions for libertem.udf.base.UDFTileMixin.process_tile().

New in version 0.5.0.

Returns

view – View into data or contiguous copy if necessary

Return type

np.ndarray

get_view_for_frame(partition, tile, frame_idx)[source]

Get a view for a single frame in a partition- or dataset-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

get_view_for_partition(partition)[source]

Get a view for a single partition in a whole-result-sized buffer

get_view_for_tile(partition, tile)[source]

Get a view for a single tile in a partition-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

has_data()[source]
property raw_data

Get the raw data underlying this buffer, which is flattened and may even be filtered to a ROI

class libertem.common.buffers.PreallocBufferWrapper(data, *args, **kwargs)[source]
libertem.common.buffers.bytes_aligned(size: int) memoryview[source]
libertem.common.buffers.disjoint(sl: Slice, slices: Iterable[Slice])[source]
libertem.common.buffers.empty_aligned(size: Union[int, Tuple[int, ...]], dtype: nt.DTypeLike) ndarray[source]
libertem.common.buffers.reshaped_view(a: ndarray, shape)[source]

Like numpy.ndarray.reshape(), just guaranteed to return a view, raising an AttributeError if no view can be created.

New in version 0.5.0.

Parameters
  • a (numpy.ndarray) – Array to create a view of

  • shape (tuple) – Shape of the view to create

Returns

view – View into a with shape shape

Return type

numpy.ndarray
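
The documented view-or-raise behaviour can be sketched with a hypothetical helper (an assumption about one possible implementation: assigning to .shape, which NumPy rejects with AttributeError when a copy would be required):

```python
import numpy as np

# Hypothetical sketch of the documented behaviour, not the actual
# implementation: assigning to .shape never copies -- NumPy either
# creates the view in place or raises AttributeError.
def reshaped_view_sketch(a, shape):
    v = a.view()
    v.shape = shape  # raises AttributeError if no view can be created
    return v

a = np.arange(6)
v = reshaped_view_sketch(a, (2, 3))
v[0, 0] = 99         # writes through to a, since v is a genuine view
```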

libertem.common.buffers.to_numpy(a: Union[ndarray, Any]) ndarray[source]
libertem.common.buffers.zeros_aligned(size: Union[int, Tuple[int, ...]], dtype: nt.DTypeLike) ndarray[source]

Included utility UDFs

Some generally useful UDFs are included with LiberTEM:

Note

See Application-specific APIs for application-specific UDFs and analyses.

Sum of frames

class libertem.udf.sum.SumUDF(dtype='float32')[source]

Sum up frames, preserving the signal dimension

Parameters

dtype (numpy.dtype, optional) – Preferred dtype for computation, default ‘float32’. The actual dtype will be determined from this value and the dataset’s dtype using numpy.result_type(). See also Preferred input dtype.

Examples

>>> udf = SumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(32, 32)
get_preferred_input_dtype()[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32, since most numerical processing tasks, in particular dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype, int32 or wider input data switches from float32 to float64, and complex data in the dataset switches the input dtype kind to complex, following the NumPy casting rules.

If your UDF only works with specific input dtypes, it should raise an error or warning when incompatible dtypes are used, and/or implement a meaningful conversion in its process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.
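
The promotion rules described above can be checked directly with numpy.result_type():

```python
import numpy as np

# How the actual input dtype follows from the preferred dtype and the
# dataset's native dtype, per the documented numpy.result_type() rule.
assert np.result_type(np.float32, np.uint16) == np.float32     # stays float32
assert np.result_type(np.float32, np.int32) == np.float64      # wide int promotes
assert np.result_type(np.float32, np.complex64) == np.complex64  # kind switches
assert np.result_type(bool, np.uint16) == np.uint16            # bool is neutral
```

The last line shows why USE_NATIVE_DTYPE behaves as a neutral element: combining bool with any dtype yields that dtype unchanged.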

Sum of log-scaled frames

class libertem.udf.logsum.LogsumUDF[source]

Sum up logscaled frames

In comparison to log-scaling the sum, this highlights regions of slightly higher intensity that appear in many frames, relative to very high intensity that appears in only a few frames.

Examples

>>> udf = LogsumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["logsum"]).shape
(32, 32)
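
The effect can be illustrated with plain NumPy (np.log1p stands in for the exact log scaling here, so this is a conceptual sketch rather than the actual implementation):

```python
import numpy as np

# Why summing log-scaled frames differs from log-scaling the sum:
frames = np.zeros((10, 1, 2))        # 10 frames, 1x2 signal
frames[:, 0, 0] = 1.0                # pixel A: moderate intensity in every frame
frames[0, 0, 1] = 10.0               # pixel B: high intensity in a single frame

plain_sum = frames.sum(axis=0)       # both pixels sum to 10
logsum = np.log1p(frames).sum(axis=0)

# In the plain sum the pixels are indistinguishable; in the logsum the
# pixel that appears in many frames dominates.
```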

Standard deviation

class libertem.udf.stddev.StdDevUDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

Compute sum of variances and sum of pixels from the given dataset

The one-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed.

Changed in version 0.7.0: var, mean, and std are now returned directly from the UDF via get_results.

Examples

>>> udf = StdDevUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["varsum"])        # variance times number of frames
array(...)
>>> np.array(result["num_frames"])  # number of frames for each tile
array(...)
>>> np.array(result["sum"])  # sum of all frames
array(...)
>>> np.array(result["var"])
array(...)
>>> np.array(result["mean"])
array(...)
>>> np.array(result["std"])
array(...)
get_result_buffers()[source]

Initializes BufferWrapper objects for sum of variances, sum of frames, and the number of frames

Returns

A dictionary that maps ‘varsum’, ‘num_frames’, ‘sum’ to the corresponding BufferWrapper objects

Return type

dict

get_results()[source]

Calculate variance, mean and standard deviation from raw UDF results

Returns

pass_results – Result dictionary with keys 'sum', 'varsum', 'num_frames', 'var', 'std', and 'mean' as BufferWrapper

Return type

Dict[str, BufferWrapper]
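
For illustration, the derived quantities can be reconstructed from the raw aggregation buffers with plain NumPy (a sketch of the relationship, not the actual implementation):

```python
import numpy as np

# Deriving var, mean, and std from the raw aggregation buffers
# sum, varsum, and num_frames.
rng = np.random.default_rng(42)
data = rng.random((100, 4, 4))              # 100 frames of 4x4 signal

frame_sum = data.sum(axis=0)                               # 'sum' buffer
varsum = ((data - data.mean(axis=0)) ** 2).sum(axis=0)     # 'varsum' buffer
num_frames = data.shape[0]                                 # 'num_frames' buffer

var = varsum / num_frames
mean = frame_sum / num_frames
std = np.sqrt(var)
```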

get_task_data()[source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict

merge(dest, src)[source]

Given destination and source buffers that contain the sum of variances, the sum of frames, and the number of frames used in each of the buffers, merge the source buffers into the destination buffers by computing the joint sum of variances and sum of frames over all frames used.

Parameters
  • dest – Aggregation buffer that contains the sum of variances, the sum of frames, and the number of frames

  • src – Partial results that contain the sum of variances, the sum of frames, and the number of frames of a partition to be merged into the aggregation buffers
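
The joint sum of variances follows the standard parallel-variance update from [SG18]; a minimal NumPy sketch with a hypothetical helper (not the actual implementation):

```python
import numpy as np

# Merge two partial (num_frames, sum, varsum) aggregates into one,
# using the parallel-variance update.
def merge_stats(n_a, sum_a, varsum_a, n_b, sum_b, varsum_b):
    n = n_a + n_b
    delta = sum_b / n_b - sum_a / n_a           # difference of the two means
    varsum = varsum_a + varsum_b + delta ** 2 * n_a * n_b / n
    return n, sum_a + sum_b, varsum

rng = np.random.default_rng(0)
a = rng.random((30, 8))                         # two "partitions" of frames
b = rng.random((50, 8))

stats_a = (len(a), a.sum(axis=0), ((a - a.mean(axis=0)) ** 2).sum(axis=0))
stats_b = (len(b), b.sum(axis=0), ((b - b.mean(axis=0)) ** 2).sum(axis=0))
n, s, varsum = merge_stats(*stats_a, *stats_b)
```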

process_tile(tile)[source]

Calculate a sum and variance minibatch for the tile and update partition buffers with it.

Parameters

tile – tile of the data

libertem.udf.stddev.run_stddev(ctx, dataset, roi=None, progress=False)[source]

Compute sum of variances and sum of pixels from the given dataset

The one-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed.

Changed in version 0.5.0: Added progress parameter for progress bar.

Parameters
  • ctx (libertem.api.Context) – Context to run the UDF in

  • dataset (DataSet) – Dataset to compute the statistics for

  • roi (numpy.ndarray, optional) – Region of interest, a boolean mask of the navigation dimensions

  • progress (bool, optional) – Show a progress bar. Default False

Returns

pass_results – A dictionary of ndarrays that contains the sum of variances, the sum of pixels, and the number of frames used to compute the above statistics. To retrieve a statistic, use the following keys:

  • variance: pass_results['var']

  • standard deviation: pass_results['std']

  • sum of pixels: pass_results['sum']

  • mean: pass_results['mean']

  • number of frames: pass_results['num_frames']

libertem.udf.stddev.consolidate_result(udf_result)[source]

Calculate variance, mean and standard deviation from raw UDF results and consolidate the per-tile frame counter into a single value. Convert all result arrays to ndarray.

Note

This is mostly here for backwards-compatibility - nowadays, ‘var’, ‘std’, and ‘mean’ are already calculated in StdDevUDF.get_results().

Parameters

udf_result (Dict[str, BufferWrapper]) – UDF result with keys ‘sum’, ‘varsum’, ‘num_frames’, ‘var’, ‘std’, ‘mean’

Returns

pass_results – Result dictionary with keys 'sum', 'varsum', 'var', 'std', 'mean' as numpy.ndarray, and 'num_frames' as int

Return type

Dict[str, Union[numpy.ndarray, int]]

Sum per frame

class libertem.udf.sumsigudf.SumSigUDF(**kwargs: Union[Any, AuxBufferWrapper])[source]

Sum over the signal axes. For each navigation position, the sum of all pixels is calculated.

Examples

>>> udf = SumSigUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(16, 16)
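
The reduction is equivalent to a plain NumPy sum over the signal axes (an illustrative sketch with the shapes from the example above, not the tiled implementation):

```python
import numpy as np

# SumSigUDF reduces over the signal axes only, producing one value
# per navigation position.
data = np.ones((16, 16, 32, 32))    # (nav_y, nav_x, sig_y, sig_x)
intensity = data.sum(axis=(2, 3))   # sum over the two signal dimensions
```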

Apply masks

class libertem.udf.masks.ApplyMasksUDF(mask_factories, use_torch=True, use_sparse=None, mask_count=None, mask_dtype=None, preferred_dtype=None, backends=None)[source]

Apply masks to signals/frames in the dataset. This can be used not only to integrate over regions with a binary mask - the integration can also be weighted by using float- or complex-valued masks.

The result will be returned in a single nav-shaped buffer called intensity. Its shape will be (*nav_shape, len(masks)).

Parameters
  • mask_factories (Union[Callable[[], array_like], Iterable[Callable[[], array_like]]]) – Function or list of functions that take no arguments and create masks. The returned masks can be numpy arrays, scipy.sparse matrices, or sparse (https://sparse.pydata.org/) matrices. The mask factories should not reference large objects, because these can create significant overhead when they are pickled and unpickled. Each factory function should, when called, return a numpy array with the same shape as the frames in the dataset (i.e. dataset.shape.sig).

  • use_torch (bool, optional) – Use pytorch back-end if available. Default True

  • use_sparse (Union[None, False, True, 'scipy.sparse', 'scipy.sparse.csc', 'sparse.pydata'], optional) – Which sparse back-end to use:

    • None (default): use sparse matrix multiplication if all factory functions return a sparse mask, otherwise convert all masks to dense matrices and use dense matrix multiplication

    • True: convert all masks to sparse matrices and use the default sparse back-end

    • False: convert all masks to dense matrices

    • ‘scipy.sparse’: use scipy.sparse.csr_matrix (default sparse back-end)

    • ‘scipy.sparse.csc’: use scipy.sparse.csc_matrix

    • ‘sparse.pydata’: use a sparse.pydata COO matrix

  • mask_count (int, optional) – Specify the number of masks if a single factory function is used so that the number of masks can be determined without calling the factory function.

  • mask_dtype (numpy.dtype, optional) – Specify the dtype of the masks so that mask dtype can be determined without calling the mask factory functions. This can be used to override the mask dtype in the result dtype determination. As an example, setting this to np.float32 means that masks of type float64 will not switch the calculation and result dtype to float64 or complex128.

  • preferred_dtype (numpy.dtype, optional) – Let get_preferred_input_dtype() return the specified type instead of the default float32. This can perform the calculation with integer types if both input data and mask data are compatible with this.

  • backends (Iterable containing strings "numpy" and/or "cupy", or None) – Control which back-ends are used. Default is numpy and cupy

Examples

>>> dataset.shape
(16, 16, 32, 32)
>>> def my_masks():
...     return [np.ones((32, 32)), np.zeros((32, 32))]
>>> udf = ApplyMasksUDF(mask_factories=my_masks)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> res.data.shape
(16, 16, 2)
>>> np.allclose(res.data[..., 1], 0)  # same order as in the mask factory
True

Mask factories can also return all masks as a single array, stacked on the first axis:

>>> def my_masks_2():
...     masks = np.zeros((2, 32, 32))
...     masks[1, ...] = 1
...     return masks
>>> udf = ApplyMasksUDF(mask_factories=my_masks_2)
>>> res_2 = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> np.allclose(res_2.data, res.data)
True

New in version 0.4.0.
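
Conceptually, applying a mask stack is a dot product between each frame and each mask over the signal axes; a plain NumPy sketch (not the optimized back-end, and without the factory indirection):

```python
import numpy as np

# Applying a stack of masks as a contraction over the signal axes.
frames = np.random.default_rng(1).random((16, 16, 32, 32))
masks = np.stack([np.ones((32, 32)), np.zeros((32, 32))])  # (2, 32, 32)

# Contract the signal axes of the frames with the signal axes of the masks,
# yielding one value per navigation position and mask: shape (16, 16, 2).
intensity = np.tensordot(frames, masks, axes=((2, 3), (1, 2)))
```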

get_backends()[source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing the possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]

Load data

class libertem.udf.raw.PickUDF[source]

Load raw data from ROI

This UDF is meant for frame picking with a very small ROI, usually a single frame.

New in version 0.4.0.

NoOp

class libertem.udf.base.NoOpUDF(preferred_input_dtype=<class 'bool'>)[source]

A UDF that does nothing and returns nothing.

This is useful for testing.

Parameters

preferred_input_dtype (numpy.dtype) – Perform dtype conversion. By default, this is UDF.USE_NATIVE_DTYPE.

get_preferred_input_dtype()[source]

Return the value passed in the constructor.

get_result_buffers()[source]

No result buffers.

process_tile(tile)[source]

Do nothing.