UDF API reference

Defining UDFs

See User-defined functions for an introduction and in-depth explanation.

class libertem.udf.base.Task(partition, idx)[source]

A computation on a partition. Inherit from this class and implement __call__ for your specific computation.

Changed in version 0.4.0: Moved from libertem.job.base to libertem.udf.base as part of Job API deprecation

__init__(partition, idx)[source]

Initialize self. See help(type(self)) for accurate signature.

get_resources()[source]

Specify the resources that a Task will use.

The resources are designed to work with resource tags in Dask clusters: See https://distributed.dask.org/en/latest/resources.html

The resources allow scheduling of CPU-only compute, CUDA-only compute, hybrid (CPU or CUDA) compute, and service tasks in such a way that all resources are used without oversubscription. Furthermore, they distinguish whether the given resource can be accessed with a transparent NumPy ndarray interface – namely, whether CuPy is installed to access CUDA resources.

Each CPU worker gets one CPU, one compute and one ndarray resource assigned. Each CUDA worker gets one CUDA and one compute resource assigned. If CuPy is installed, it additionally gets an ndarray resource assigned. A service worker doesn’t get any resources assigned.

A CPU-only task consumes one CPU, one ndarray and one compute resource, i.e. it will be scheduled only on CPU workers. A CUDA-only task consumes one CUDA, one compute and possibly an ndarray resource, i.e. it will only be scheduled on CUDA workers. A hybrid task that can run on both CPU or CUDA using a transparent ndarray interface only consumes a compute and an ndarray resource, i.e. it can be scheduled on CPU workers or CUDA workers with CuPy. A service task doesn’t request any resources and can therefore run on CPU, CUDA and service workers.

Which device a hybrid task uses is only decided at runtime by the environment variables that are set on each worker process.

That way, CPU-only and CUDA-only tasks can run in parallel without oversubscription, and hybrid tasks use whatever compute workers are free at the time. Furthermore, it allows using CUDA without installing CuPy. See https://github.com/LiberTEM/LiberTEM/pull/760 for detailed discussion.

Compute tasks will not run on service workers, so that these can serve shorter service tasks with lower latency.

At the moment, workers only get a single “item” of each resource assigned since we run one process per CPU. Requesting more of a resource than any of the workers has causes a RuntimeError, including requesting a resource that is not available at all.

For that reason one has to make sure that the workers are set up with the correct resources and matching environment. libertem.utils.devices.detect(), libertem.executor.dask.cluster_spec() and libertem.executor.dask.DaskJobExecutor.make_local() can be used to set up a local cluster correctly.

New in version 0.6.0.
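
As an illustration, a minimal sketch of what an override could return for a CPU-only task, assuming the Dask-style mapping of resource tags to counts described above (the exact tag names depend on the worker setup):

>>> def get_resources(self):  # hypothetical override on a Task subclass
...     # CPU-only: consume one CPU, one compute and one ndarray resource
...     return {'CPU': 1, 'compute': 1, 'ndarray': 1}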

class libertem.udf.base.UDF(**kwargs)[source]

The main user-defined functions interface. You can implement your functionality by overriding methods on this class.

USE_NATIVE_DTYPE

alias of builtins.bool

__init__(**kwargs)[source]

Create a new UDF instance. If you override __init__, please take care, as it is called multiple times during evaluation of a UDF. You can handle some pre-conditioning of parameters, but you also have to accept the results as input again.

Arguments passed as **kwargs will be automatically available on self.params when running the UDF.

Example

>>> class MyUDF(UDF):
...     def __init__(self, param1, param2="def2", **kwargs):
...         param1 = int(param1)
...         if "param3" not in kwargs:
...             raise TypeError("missing argument param3")
...         super().__init__(param1=param1, param2=param2, **kwargs)
Parameters

kwargs

Input parameters. They are scattered to the worker processes and available as self.params from here on.

Values can be BufferWrapper instances, which, when accessed via self.params.the_key_here, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

classmethod aux_data(data, kind, extra_shape=(), dtype='float32')[source]

Use this method to create auxiliary data. Auxiliary data should have a shape like (dataset.shape.nav, extra_shape) and on access, an appropriate view will be created. For example, if you access aux data in process_frame, you will get the auxiliary data for the current frame you are processing.

Example

We create a UDF to demonstrate the behavior:

>>> class MyUDF(UDF):
...     def get_result_buffers(self):
...         # Result buffer for debug output
...         return {'aux_dump': self.buffer(kind='nav', dtype='object')}
...
...     def process_frame(self, frame):
...         # Extract value of aux data for demonstration
...         self.results.aux_dump[:] = str(self.params.aux_data[:])
...
>>> # for each frame, provide three values from a sequential series:
>>> aux1 = MyUDF.aux_data(
...     data=np.arange(np.prod(dataset.shape.nav) * 3, dtype=np.float32),
...     kind="nav", extra_shape=(3,), dtype="float32"
... )
>>> udf = MyUDF(aux_data=aux1)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)

process_frame for frame (0, 7) received a view of aux_data with values [21., 22., 23.]:

>>> res['aux_dump'].data[0, 7]
'[21. 22. 23.]'
buffer(kind, extra_shape=(), dtype='float32', where=None)[source]

Use this method to create BufferWrapper objects in get_result_buffers().

copy_for_partition(partition: numpy.ndarray, roi: numpy.ndarray)[source]

create a copy of the UDF, specifically slicing aux data to the specified partition and roi

get_backends()[source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]
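
For example, a hybrid UDF that can run on NumPy or CuPy might declare (a sketch):

>>> class HybridUDF(UDF):  # hypothetical
...     def get_backends(self):
...         return ('numpy', 'cupy')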

get_preferred_input_dtype()[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32 since most numerical processing tasks, for example dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype. int32 or wider input data would switch the dtype from float32 to float64, and complex data in the dataset will switch the input dtype kind to complex, following the NumPy casting rules.

In case your UDF only works with specific input dtypes, it should throw an error or warning if incompatible dtypes are used, and/or implement a meaningful conversion in your UDF’s process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.
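
The following standalone NumPy calls illustrate the promotion rules described above:

>>> np.result_type(np.float32, np.int32)  # int32 or wider switches to float64
dtype('float64')
>>> np.result_type(np.float32, np.uint8)  # narrow integer data keeps float32
dtype('float32')
>>> np.result_type(np.float32, np.complex64)  # complex data switches the kind
dtype('complex64')
>>> np.result_type(bool, np.int16)  # bool acts as a neutral element
dtype('int16')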

get_result_buffers()[source]

Return result buffer declaration.

Values of the returned dict should be BufferWrapper instances, which, when accessed via self.results.key, will automatically return a view corresponding to the current unit of data (frame, tile, partition).

The values also need to be serializable via pickle.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

    Please note that partition metadata will not be set when this method is executed on the head node.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.results.the_key_here.

Return type

dict
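
A minimal sketch of a declaration (the buffer names are hypothetical):

>>> class MyResultUDF(UDF):
...     def get_result_buffers(self):
...         return {
...             'pixelsum': self.buffer(kind='nav', dtype='float32'),
...             'maxframe': self.buffer(kind='sig', dtype='float32'),
...         }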

get_task_data()[source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict
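
A sketch of allocating a reusable temporary buffer per task (the key 'scratch' is hypothetical):

>>> class ScratchUDF(UDF):
...     def get_task_data(self):
...         # signal-shaped scratch buffer, reused for each frame of the task
...         return {
...             'scratch': np.zeros(tuple(self.meta.dataset_shape.sig), dtype='float32'),
...         }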

get_tiling_preferences()[source]

Configure tiling preferences. Return a dictionary with the following keys:

  • “depth”: number of frames/frame parts to stack on top of each other

  • “total_size”: total size of a tile in bytes
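
A sketch of an override, assuming plain integer values for both keys:

>>> class DeepTileUDF(UDF):  # hypothetical
...     def get_tiling_preferences(self):
...         return {
...             "depth": 128,  # stack up to 128 frames/frame parts
...             "total_size": 1024 * 1024,  # aim for 1 MiB tiles
...         }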

merge(dest: Dict[str, numpy.array], src: Dict[str, numpy.array])[source]

Merge a partial result src into the current global result dest.

Data available in this method:

  • self.params - the parameters of this UDF

Parameters
  • dest – global results; dictionary mapping the buffer name (from get_result_buffers) to a numpy array

  • src – results for a partition; dictionary mapping the buffer name (from get_result_buffers) to a numpy array

Note

This function is running on the leader node, which means self.results and self.task_data are not available.
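
A sketch of a custom merge for a kind='sig' sum buffer named 'intensity' (hypothetical), where the default merge for kind='nav' buffers does not apply:

>>> class MergeExampleUDF(UDF):
...     # get_result_buffers() would declare a kind='sig' buffer 'intensity'
...     def merge(self, dest, src):
...         # add the per-partition result onto the global result
...         dest['intensity'][:] += src['intensity']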

property requires_custom_merge

Determine if buffers with kind != 'nav' are present, in which case the default merge doesn't work and a custom merge() has to be implemented

New in version 0.5.0.

class libertem.udf.base.UDFBase[source]

Base class for UDFs with helper functions.

property xp

Compute back-end library to use.

Generally, use self.xp instead of np to use NumPy or CuPy transparently.

New in version 0.6.0.
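
A sketch of transparent NumPy/CuPy use (class and buffer names hypothetical):

>>> class MaxFrameUDF(UDF):
...     def get_result_buffers(self):
...         return {'maxframe': self.buffer(kind='sig', dtype='float32')}
...     def process_tile(self, tile):
...         # self.xp is numpy on CPU workers and cupy on CUDA workers
...         self.results.maxframe[:] = self.xp.maximum(
...             self.results.maxframe, tile.max(axis=0)
...         )
...     def merge(self, dest, src):
...         dest['maxframe'][:] = np.maximum(dest['maxframe'], src['maxframe'])
...     def get_backends(self):
...         return ('numpy', 'cupy')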

class libertem.udf.base.UDFData(data: Dict[str, libertem.common.buffers.BufferWrapper])[source]

Container for result buffers, return value from running UDFs

__init__(data: Dict[str, libertem.common.buffers.BufferWrapper])[source]

Initialize self. See help(type(self)) for accurate signature.

allocate_for_part(partition: libertem.common.shape.Shape, roi: numpy.ndarray, lib=None)[source]

allocate all BufferWrapper instances in this namespace; for pre-allocated buffers (i.e. aux data), only set shape and roi

class libertem.udf.base.UDFFrameMixin[source]

Implement process_frame for per-frame processing.

process_frame(frame: numpy.ndarray)[source]

Implement this method to process the data in a frame-by-frame manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

frame (numpy.ndarray or cupy.ndarray) – A single frame or signal element from the dataset. The shape is the same as dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 2D, for spectra 1D etc.
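
For example, a per-frame pixel sum (a sketch; the buffer name is hypothetical):

>>> class PixelsumUDF(UDF):
...     def get_result_buffers(self):
...         return {'pixelsum': self.buffer(kind='nav', dtype='float32')}
...     def process_frame(self, frame):
...         # writes into the result slot for the current frame
...         self.results.pixelsum[:] = np.sum(frame)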

class libertem.udf.base.UDFMeta(partition_shape: libertem.common.shape.Shape, dataset_shape: libertem.common.shape.Shape, roi: numpy.ndarray, dataset_dtype: numpy.dtype, input_dtype: numpy.dtype, tiling_scheme: libertem.io.dataset.base.tiling.TilingScheme = None, tiling_index: int = 0, corrections=None, device_class: str = None)[source]

UDF metadata. Makes all relevant metadata accessible to the UDF. Can be different for each task/partition.

Changed in version 0.4.0: Added distinction of dataset_dtype and input_dtype

Changed in version 0.6.0.dev0: Information on compute backend added

__init__(partition_shape: libertem.common.shape.Shape, dataset_shape: libertem.common.shape.Shape, roi: numpy.ndarray, dataset_dtype: numpy.dtype, input_dtype: numpy.dtype, tiling_scheme: libertem.io.dataset.base.tiling.TilingScheme = None, tiling_index: int = 0, corrections=None, device_class: str = None)[source]

Initialize self. See help(type(self)) for accurate signature.

property corrections

correction data that is available, either from the dataset or specified by the user

New in version 0.6.0.

Type

CorrectionSet

property dataset_dtype

Native dtype of the dataset

Type

numpy.dtype

property dataset_shape

The original shape of the whole dataset, not influenced by the ROI

Type

Shape

property device_class

Which device class is used.

The back-end library can be accessed as libertem.udf.base.UDF.xp. This string is provided in addition so that the device class can be determined without importing all back-end libraries and testing them against libertem.udf.base.UDF.xp.

Current values are cpu (default) or cuda.

New in version 0.6.0.

property input_dtype

dtype of the data that will be passed to the UDF

This is determined from the dataset’s native dtype and UDF.get_preferred_input_dtype() using numpy.result_type()

New in version 0.4.0.

Type

numpy.dtype

property partition_shape

The shape of the partition this UDF currently works on. If a ROI was applied, the shape will be modified accordingly.

Type

Shape

property roi

Boolean array which limits the elements the UDF is working on. Has a shape of dataset_shape.nav.

Type

numpy.ndarray

property slice

A Slice instance that describes the location within the dataset with navigation dimension flattened and reduced to the ROI.

Type

Slice

property tiling_scheme

the tiling scheme that was negotiated

Type

TilingScheme

class libertem.udf.base.UDFPartitionMixin[source]

Implement process_partition for per-partition processing.

process_partition(partition: numpy.ndarray)[source]

Implement this method to process the data partitioned into large (100s of MiB) partitions.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Note

Only use this method if you know what you are doing; especially if you are running a processing pipeline with multiple steps, or multiple processing pipelines at the same time, performance may be adversely impacted.

Parameters

partition (numpy.ndarray or cupy.ndarray) – A large number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
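
A sketch that reduces each partition to a single scalar (names hypothetical):

>>> class GrandTotalUDF(UDF):
...     def get_result_buffers(self):
...         return {'total': self.buffer(kind='single', dtype='float64')}
...     def process_partition(self, partition):
...         self.results.total[:] += partition.sum()
...     def merge(self, dest, src):
...         dest['total'][:] += src['total']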

class libertem.udf.base.UDFPostprocessMixin[source]

Implement postprocess to modify the result buffers of a partition on the worker after the partition data has been completely processed, but before it is returned to the master node for the final merging step.

postprocess()[source]

Implement this method to postprocess the result data for a partition.

This can be useful in combination with process_tile() to implement a postprocessing step that requires the reduced results for whole frames.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances
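
A sketch that normalizes the per-partition result once all frames have been processed (names hypothetical):

>>> class NormalizedSumUDF(UDF):
...     def get_result_buffers(self):
...         return {'normsum': self.buffer(kind='nav', dtype='float32')}
...     def process_frame(self, frame):
...         self.results.normsum[:] = np.sum(frame)
...     def postprocess(self):
...         # runs once per partition, before results are sent for merging
...         self.results.normsum[:] /= np.prod(tuple(self.meta.dataset_shape.sig))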

class libertem.udf.base.UDFPreprocessMixin[source]

Implement preprocess to initialize the result buffers of a partition on the worker before the partition data is processed.

New in version 0.3.0.

preprocess()[source]

Implement this method to preprocess the result data for a partition.

This can be useful to initialize arrays of dtype='object' with the correct container types, for example.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances
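
A sketch of initializing an object-dtype buffer with container instances, assuming the per-partition view is flat (names hypothetical):

>>> class CollectUDF(UDF):
...     def get_result_buffers(self):
...         return {'items': self.buffer(kind='nav', dtype='object')}
...     def preprocess(self):
...         # give every nav position in this partition its own list instance
...         for i in range(len(self.results.items)):
...             self.results.items[i] = []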

class libertem.udf.base.UDFTileMixin[source]

Implement process_tile for per-tile processing.

process_tile(tile: numpy.ndarray)[source]

Implement this method to process the data in a tiled manner.

Data available in this method:

  • self.params - the parameters of this UDF

  • self.task_data - task data created by get_task_data

  • self.results - the result buffer instances

  • self.meta - meta data about the current operation and data set

Parameters

tile (numpy.ndarray or cupy.ndarray) – A small number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
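
For example, a tiled sum over the navigation axis (a sketch; this is essentially what SumUDF below does):

>>> class TileSumUDF(UDF):
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='sig', dtype='float32')}
...     def process_tile(self, tile):
...         # the buffer view covers exactly the signal region of this tile
...         self.results.intensity[:] += tile.sum(axis=0)
...     def merge(self, dest, src):
...         dest['intensity'][:] += src['intensity']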

Running UDFs

Two methods of libertem.api.Context are relevant for running user-defined functions:

class libertem.api.Context(executor: libertem.executor.base.JobExecutor = None)[source]

Context is the main entry point of the LiberTEM API. It contains methods for loading datasets, creating analyses on them and running them.

map(dataset: libertem.io.dataset.base.dataset.DataSet, f, roi: numpy.ndarray = None, progress: bool = False) → libertem.common.buffers.BufferWrapper[source]

Create an AutoUDF with function f() and run it on dataset

Changed in version 0.5.0: Added the progress parameter

Parameters
  • dataset – The dataset to work on

  • f – Function that accepts a frame as the only parameter. It should return a strongly reduced output compared to the size of a frame.

  • roi (numpy.ndarray) – region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

Returns

BufferWrapper – The result of the UDF. Access the underlying numpy array using the data property. Shape and dtype are inferred automatically from f.

Return type

libertem.common.buffers.BufferWrapper
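
Example usage (a sketch; the doctest dataset has a 16×16 navigation shape):

>>> result = ctx.map(dataset=dataset, f=lambda frame: frame.sum())
>>> result.data.shape
(16, 16)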

run_udf(dataset: libertem.io.dataset.base.dataset.DataSet, udf: libertem.udf.base.UDF, roi: numpy.ndarray = None, corrections: libertem.corrections.corrset.CorrectionSet = None, progress: bool = False, backends=None) → Dict[str, libertem.common.buffers.BufferWrapper][source]

Run udf on dataset, restricted to the region of interest roi.

Changed in version 0.5.0: Added the progress parameter

Changed in version 0.6.0: Added the corrections and backends parameters

Parameters
  • dataset – The dataset to work on

  • udf – UDF instance you want to run

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

  • corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any.

  • backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.

Returns

Return value of the UDF containing the result buffers of type libertem.common.buffers.BufferWrapper. Note that a BufferWrapper can be used like a numpy.ndarray in many cases because it implements __array__(). You can access the underlying numpy array using the data property.

Return type

dict
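
Example usage with the SumUDF described below:

>>> from libertem.udf.sum import SumUDF
>>> result = ctx.run_udf(dataset=dataset, udf=SumUDF(), progress=False)
>>> result["intensity"].data.shape
(16, 16)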

Buffers

BufferWrapper objects are used to manage data in the context of user-defined functions.

class libertem.common.buffers.AuxBufferWrapper(kind, extra_shape=(), dtype='float32', where=None)[source]
get_view_for_dataset(dataset)[source]
new_for_partition(partition, roi)[source]

Return a new AuxBufferWrapper for a specific partition, slicing the data accordingly and reducing it to the selected roi.

This is assumed to be called on an AuxBufferWrapper that was not created by this method, that is, one that still has global coordinates without the ROI applied.

set_buffer(buf, is_global=True)[source]

Set the underlying buffer to an existing numpy array.

If is_global is True, the shape must match the shape of nav or sig of the dataset, plus extra_shape, as determined by the kind and extra_shape constructor arguments.

class libertem.common.buffers.BufferPool[source]

allocation pool for explicitly re-using (aligned) allocations

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

bytes(size)[source]
empty(size, dtype)[source]
zeros(size, dtype)[source]
class libertem.common.buffers.BufferWrapper(kind, extra_shape=(), dtype='float32', where=None)[source]

Helper class to automatically allocate buffers, either for partitions or the whole dataset, and create views for partitions or single frames.

This is used as a helper to allow easy merging of results without needing to manually handle indexing.

Usually, as a user, you only need to instantiate this class, specifying kind, dtype and sometimes extra_shape parameters. Most methods are meant to be called from LiberTEM-internal code, for example the UDF functionality.

This class is array_like, so you can directly use it, for example, as argument for numpy functions.
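
For example (a sketch reusing SumUDF from below):

>>> from libertem.udf.sum import SumUDF
>>> buf = ctx.run_udf(dataset=dataset, udf=SumUDF())['intensity']
>>> np.asarray(buf).shape == tuple(dataset.shape.sig)  # used as array_like
True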

__init__(kind, extra_shape=(), dtype='float32', where=None)[source]

Changed in version 0.6.0: Add option to specify backend, for example CuPy

Parameters
  • kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.

  • extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. For a Shape object, sig_dims is discarded and the entire shape is used.

  • dtype (string or numpy dtype) – The dtype of this buffer

  • where (string or None) – None means NumPy array; 'device' means that a back-end specified in allocate() is used. New in 0.6.0.dev0

allocate(lib=None)[source]

Allocate a new buffer, in the shape previously set via one of the set_shape_* methods.

Changed in version 0.6.0.dev0: Support for allocating on device

property data

Get the buffer contents in a shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values are NaN, if supported by the underlying dtype.

export()[source]

Convert device array to NumPy array for pickling and merging

property extra_shape

Get the extra_shape of this buffer.

New in version 0.5.0.

flush(debug=False)[source]

Write back any cached contiguous copies

New in version 0.5.0.

get_contiguous_view_for_tile(partition, tile)[source]

Make a cached contiguous copy of the view for a single tile if necessary.

Currently this is only necessary for kind="sig" buffers. Use flush() to write back the cache.

Boundary condition: tile.tile_slice.get(sig_only=True) does not overlap for different tiles while the cache is active, i.e. the tiles follow LiberTEM slicing for libertem.udf.base.UDFTileMixin.process_tile().

New in version 0.5.0.

Returns

view – View into data or contiguous copy if necessary

Return type

np.ndarray

get_view_for_dataset(dataset)[source]
get_view_for_frame(partition, tile, frame_idx)[source]

get a view for a single frame in a partition- or dataset-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

get_view_for_partition(partition)[source]

get a view for a single partition in a whole-result-sized buffer

get_view_for_tile(partition, tile)[source]

get a view for a single tile in a partition-sized buffer (partition-sized here means the reduced result for a whole partition, not the partition itself!)

has_data()[source]
property kind

Get the kind of this buffer.

New in version 0.5.0.

property raw_data

Get the raw data underlying this buffer, which is flattened and may even be filtered to a ROI

property roi_is_zero
set_roi(roi)[source]
set_shape_ds(dataset, roi=None)[source]
set_shape_partition(partition, roi=None)[source]
property where

Get the place where this buffer is to be allocated.

New in version 0.6.0.dev0.

libertem.common.buffers.bytes_aligned(size)[source]
libertem.common.buffers.disjoint(sl: libertem.common.slice.Slice, slices: Iterable[libertem.common.slice.Slice])[source]
libertem.common.buffers.empty_aligned(size, dtype)[source]
libertem.common.buffers.reshaped_view(a: numpy.ndarray, shape)[source]

Like numpy.ndarray.reshape(), just guaranteed to return a view or throw an AttributeError if no view can be created.

New in version 0.5.0.

Parameters
  • a (numpy.ndarray) – Array to create a view of

  • shape (tuple) – Shape of the view to create

Returns

view – View into a with shape shape

Return type

numpy.ndarray
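
For example (a sketch):

>>> from libertem.common.buffers import reshaped_view
>>> a = np.zeros((4, 4))
>>> v = reshaped_view(a, (16,))
>>> v[0] = 1  # writes through to the original array
>>> a[0, 0]
1.0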

libertem.common.buffers.to_numpy(a)[source]
libertem.common.buffers.zeros_aligned(size, dtype)[source]

Included utility UDFs

Some generally useful UDFs are included with LiberTEM:

Note

See Application-specific API for application-specific UDFs and analyses.

Sum of frames

class libertem.udf.sum.SumUDF(dtype='float32')[source]

Sum up frames, preserving the signal dimension

Examples

>>> udf = SumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(16, 16)
__init__(dtype='float32')[source]
Parameters

dtype (numpy.dtype, optional) – Preferred dtype for computation, default ‘float32’. The actual dtype will be determined from this value and the dataset’s dtype using numpy.result_type(). See also dtype support.

get_preferred_input_dtype()[source]

Override this method to specify the preferred input dtype of the UDF.

The default is float32 since most numerical processing tasks, for example dot products, perform best with this dtype.

The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype. int32 or wider input data would switch the dtype from float32 to float64, and complex data in the dataset will switch the input dtype kind to complex, following the NumPy casting rules.

In case your UDF only works with specific input dtypes, it should throw an error or warning if incompatible dtypes are used, and/or implement a meaningful conversion in your UDF’s process_<...> routine.

If you prefer to always use the dataset’s native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().

New in version 0.4.0.

Sum of log-scaled frames

class libertem.udf.logsum.LogsumUDF[source]

Sum up logscaled frames

In comparison to log-scaling the sum, this highlights regions with slightly higher intensity that appear in many frames in relation to very high intensity in a few frames.

Examples

>>> udf = LogsumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["logsum"]).shape
(16, 16)

Standard deviation

class libertem.udf.stddev.StdDevUDF(**kwargs)[source]

Compute sum of variances and sum of pixels from the given dataset

The one-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed

Examples

>>> udf = StdDevUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> # Note: These are raw results. Use run_stddev() instead of
>>> # using the UDF directly to obtain
>>> # variance, standard deviation and mean
>>> np.array(result["varsum"])        # variance times number of frames
array(...)
>>> np.array(result["num_frames"])  # number of frames for each tile
array(...)
>>> np.array(result["sum"])  # sum of all frames
array(...)
get_result_buffers()[source]

Initializes BufferWrapper objects for sum of variances, sum of frames, and the number of frames

Returns

A dictionary that maps ‘varsum’, ‘num_frames’, ‘sum’ to the corresponding BufferWrapper objects

Return type

dict

get_task_data()[source]

Initialize per-task data.

Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.

If you want to distribute static data, use parameters instead.

Data available in this method:

  • self.params - the input parameters of this UDF

  • self.meta - relevant metadata, see UDFMeta documentation.

Returns

Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.

Return type

dict

merge(dest, src)[source]

Given destination and source buffers that contain sum of variances, sum of frames, and the number of frames used in each of the buffers, merge the source buffers into the destination buffers by computing the joint sum of variances and sum of frames over all frames used

Parameters
  • dest – Aggregation buffer that contains sum of variances, sum of frames, and the number of frames

  • src – Partial results that contains sum of variances, sum of frames, and the number of frames of a partition to be merged into the aggregation buffers

process_tile(tile)[source]

Calculate a sum and variance minibatch for the tile and update partition buffers with it.

Parameters

tile – tile of the data

libertem.udf.stddev.run_stddev(ctx, dataset, roi=None, progress=False)[source]

Compute sum of variances and sum of pixels from the given dataset

One-pass algorithm used in this code is taken from the following paper: [SG18].

Changed in version 0.5.0: Result buffers have been renamed

Changed in version 0.5.0: Added progress parameter for progress bar

Parameters
  • ctx (libertem.api.Context) – Context to run the UDF in

  • dataset (libertem.io.dataset.base.dataset.DataSet) – The dataset to work on

  • roi (numpy.ndarray) – Region of interest as bool mask over the navigation axes of the dataset

  • progress (bool) – Show progress bar

Returns

pass_results – A dictionary of ndarrays that contains sum of variances, sum of pixels, and number of frames used to compute the above statistics. To retrieve the statistics, use the following keys:

  • variance: pass_results['var']

  • standard deviation: pass_results['std']

  • sum of pixels: pass_results['sum']

  • mean: pass_results['mean']

  • number of frames: pass_results['num_frames']
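
Example usage (a sketch; the key list follows the documentation of consolidate_result() below):

>>> from libertem.udf.stddev import run_stddev
>>> pass_results = run_stddev(ctx, dataset)
>>> sorted(pass_results.keys())
['mean', 'num_frames', 'std', 'sum', 'var', 'varsum']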

libertem.udf.stddev.consolidate_result(udf_result)[source]

Calculate variance, mean and standard deviation from raw UDF results and consolidate the per-tile frame counter into a single value.

Parameters

udf_result (Dict[str, BufferWrapper]) – UDF result with keys ‘sum’, ‘varsum’, ‘num_frames’

Returns

pass_results – Result dictionary with keys 'sum', 'varsum', 'var', 'std', 'mean' as numpy.ndarray, and 'num_frames' as int

Return type

Dict[str, Union[numpy.ndarray, int]]
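
A sketch of consolidating raw StdDevUDF results manually:

>>> from libertem.udf.stddev import StdDevUDF, consolidate_result
>>> raw = ctx.run_udf(dataset=dataset, udf=StdDevUDF())
>>> stats = consolidate_result(raw)
>>> stats['std'].shape == tuple(dataset.shape.sig)
True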

Sum per frame

class libertem.udf.sumsigudf.SumSigUDF(**kwargs)[source]

Sum over the signal axes. For each navigation position, the sum of all pixels is calculated.

Examples

>>> udf = SumSigUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(16, 16)

Apply masks

class libertem.udf.masks.ApplyMasksUDF(mask_factories, use_torch=True, use_sparse=None, mask_count=None, mask_dtype=None, preferred_dtype=None, backends=None)[source]

Apply masks to signals/frames in the dataset.

New in version 0.4.0.

__init__(mask_factories, use_torch=True, use_sparse=None, mask_count=None, mask_dtype=None, preferred_dtype=None, backends=None)[source]
Parameters
  • mask_factories (Union[Callable[[], array_like], Iterable[Callable[[], array_like]]]) – Function or list of functions that take no arguments and create masks. The returned masks can be numpy arrays, scipy.sparse or sparse (https://sparse.pydata.org/) matrices. The mask factories should not reference large objects because they can create significant overheads when they are pickled and unpickled. Each factory function should, when called, return a numpy array with the same shape as frames in the dataset (so dataset.shape.sig).

  • use_torch (bool, optional) – Use pytorch back-end if available. Default True

  • use_sparse (Union[None, False, True, 'scipy.sparse', 'scipy.sparse.csc', 'sparse.pydata'], optional) – Which sparse back-end to use:

      • None (default): Use sparse matrix multiplication if all factory functions return a sparse mask, otherwise convert all masks to dense matrices and use dense matrix multiplication

      • True: Convert all masks to sparse matrices and use the default sparse back-end

      • False: Convert all masks to dense matrices

      • ‘scipy.sparse’: Use scipy.sparse.csr_matrix (default sparse back-end)

      • ‘scipy.sparse.csc’: Use scipy.sparse.csc_matrix

      • ‘sparse.pydata’: Use sparse.pydata COO matrix

  • mask_count (int, optional) – Specify the number of masks if a single factory function is used so that the number of masks can be determined without calling the factory function.

  • mask_dtype (numpy.dtype, optional) – Specify the dtype of the masks so that mask dtype can be determined without calling the mask factory functions. This can be used to override the mask dtype in the result dtype determination. As an example, setting this to np.float32 means that masks of type float64 will not switch the calculation and result dtype to float64 or complex128.

  • preferred_dtype (numpy.dtype, optional) – Let get_preferred_input_dtype() return the specified type instead of the default float32. This can perform the calculation with integer types if both input data and mask data are compatible with this.

  • backends (Iterable containing strings "numpy" and/or "cupy", or None) – Control which back-ends are used. Default is numpy and cupy
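
Example usage (a sketch; a single all-ones mask sums each frame, similar to SumSigUDF above):

>>> def ones_mask():
...     return np.ones(tuple(dataset.shape.sig), dtype=np.float32)
>>> udf = ApplyMasksUDF(mask_factories=[ones_mask])
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> result['intensity'].data.shape  # nav shape plus one entry per mask
(16, 16, 1)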

get_backends()[source]

Signal which computation back-ends the UDF can use.

numpy is the default CPU-based computation.

cuda is CUDA-based computation without CuPy.

cupy is CUDA-based computation through CuPy.

New in version 0.6.0.

Returns

backend – An iterable containing possible values 'numpy' (default), 'cuda' and 'cupy'

Return type

Iterable[str]

Load data

class libertem.udf.raw.PickUDF[source]

Load raw data from ROI

This UDF is meant for frame picking with a very small ROI, usually a single frame.

New in version 0.4.0.
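
Example usage (a sketch picking a single frame):

>>> roi = np.zeros(tuple(dataset.shape.nav), dtype=bool)
>>> roi[0, 0] = True  # select exactly one frame
>>> result = ctx.run_udf(dataset=dataset, udf=PickUDF(), roi=roi)
>>> picked = np.squeeze(result['intensity'].raw_data)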