UDF API reference
Defining UDFs
See User-defined functions (UDFs) for an introduction and in-depth explanation.
Mixins for processing methods
- class libertem.udf.base.UDFFrameMixin(*args, **kwargs)[source]
Implement process_frame for per-frame processing.
- process_frame(frame: ndarray)[source]
Implement this method to process the data in a frame-by-frame manner.
Data available in this method:
self.params - the parameters of this UDF
self.task_data - task data created by get_task_data
self.results - the result buffer instances
self.meta - meta data about the current operation and data set
- Parameters:
frame (numpy.ndarray or cupy.ndarray) – A single frame or signal element from the dataset. The shape is the same as dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 2D, for spectra 1D etc.
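For illustration, a minimal sketch of a frame-wise UDF (class and buffer names are only for this example):
>>> class FrameSumUDF(UDF):
...     def get_result_buffers(self):
...         # one value per navigation position
...         return {'framesum': self.buffer(kind='nav', dtype='float32')}
...     def process_frame(self, frame):
...         # self.results.framesum is a view for the current frame
...         self.results.framesum[:] = np.sum(frame)
Since the result buffer is kind='nav', the default merge handles combining the partial results.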
- class libertem.udf.base.UDFTileMixin(*args, **kwargs)[source]
Implement process_tile for per-tile processing.
- process_tile(tile: ndarray)[source]
Implement this method to process the data in a tiled manner.
Data available in this method:
self.params - the parameters of this UDF
self.task_data - task data created by get_task_data
self.results - the result buffer instances
self.meta - meta data about the current operation and data set
- Parameters:
tile (numpy.ndarray or cupy.ndarray) – A small number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
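A minimal sketch of a tiled UDF; because it reduces into a kind='sig' buffer, it also needs a custom merge() (names are illustrative):
>>> class TileSumUDF(UDF):
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='sig', dtype='float32')}
...     def process_tile(self, tile):
...         # tile has shape (N,) + sig shape: reduce over the stacked frames
...         self.results.intensity[:] += np.sum(tile, axis=0)
...     def merge(self, dest, src):
...         dest.intensity += src.intensity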
- class libertem.udf.base.UDFPartitionMixin(*args, **kwargs)[source]
Implement process_partition for per-partition processing.
- process_partition(partition: ndarray) None [source]
Implement this method to process the data partitioned into large (100s of MiB) partitions.
Data available in this method:
self.params - the parameters of this UDF
self.task_data - task data created by get_task_data
self.results - the result buffer instances
self.meta - meta data about the current operation and data set
Note
Only use this method if you know what you are doing; especially if you are running a processing pipeline with multiple steps, or multiple processing pipelines at the same time, performance may be adversely impacted.
- Parameters:
partition (numpy.ndarray or cupy.ndarray) – A large number N of frames or signal elements from the dataset. The shape is (N,) + dataset.shape.sig. In case of pixelated STEM / scanning diffraction data this is 3D, for spectra 2D etc.
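A rough sketch of per-partition processing, analogous to the tile example above (illustrative names):
>>> class PartitionSumUDF(UDF):
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='sig', dtype='float32')}
...     def process_partition(self, partition):
...         # partition has shape (N,) + sig shape, with a large N
...         self.results.intensity[:] += np.sum(partition, axis=0)
...     def merge(self, dest, src):
...         dest.intensity += src.intensity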
Base UDF classes
- class libertem.udf.base.UDF(**kwargs: Any | AuxBufferWrapper)[source]
The main user-defined functions interface. You can implement your functionality by overriding methods on this class.
If you override __init__, please take care, as it is called multiple times during evaluation of a UDF. You can handle some pre-conditioning of parameters, but you also have to accept the results as input again.
Arguments passed as **kwargs will be automatically available on self.params when running the UDF.
Example
>>> class MyUDF(UDF):
...     def __init__(self, param1, param2="def2", **kwargs):
...         param1 = int(param1)
...         if "param3" not in kwargs:
...             raise TypeError("missing argument param3")
...         super().__init__(param1=param1, param2=param2, **kwargs)
- Parameters:
kwargs –
Input parameters. They are scattered to the worker processes and available as self.params from here on.
Values can be BufferWrapper instances, which, when accessed via self.params.the_key_here, will automatically return a view corresponding to the current unit of data (frame, tile, partition).
- BACKEND_ALL = ('cupyx.scipy.sparse.csr_matrix', 'cupyx.scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.coo_matrix', 'scipy.sparse.csr_matrix', 'scipy.sparse.csc_matrix', 'scipy.sparse.coo_matrix', 'cupy', 'numpy', 'sparse.COO', 'sparse.GCXS')
Tuple with all backends in suggested priority
- BACKEND_CUDA = 'cuda'
NumPy array, but run on CUDA device class
- BACKEND_CUPY = 'cupy'
CuPy array
- BACKEND_CUPY_SCIPY_COO = 'cupyx.scipy.sparse.coo_matrix'
cupyx.scipy.sparse.coo_matrix
- BACKEND_CUPY_SCIPY_CSC = 'cupyx.scipy.sparse.csc_matrix'
cupyx.scipy.sparse.csc_matrix
- BACKEND_CUPY_SCIPY_CSR = 'cupyx.scipy.sparse.csr_matrix'
cupyx.scipy.sparse.csr_matrix
- BACKEND_NUMPY = 'numpy'
NumPy array
- BACKEND_SCIPY_COO = 'scipy.sparse.coo_matrix'
scipy.sparse.coo_matrix
- BACKEND_SCIPY_CSC = 'scipy.sparse.csc_matrix'
scipy.sparse.csc_matrix
- BACKEND_SCIPY_CSR = 'scipy.sparse.csr_matrix'
scipy.sparse.csr_matrix
- BACKEND_SPARSE_COO = 'sparse.COO'
sparse.COO array
- BACKEND_SPARSE_DOK = 'sparse.DOK'
sparse.DOK array – not recommended since slow!
- BACKEND_SPARSE_GCXS = 'sparse.GCXS'
sparse.GCXS array
- CPU_BACKENDS = frozenset({'numpy', 'numpy.matrix', 'scipy.sparse.coo_matrix', 'scipy.sparse.csc_matrix', 'scipy.sparse.csr_matrix', 'sparse.COO', 'sparse.DOK', 'sparse.GCXS'})
Set of backends that run on device class CPU
- CUDA_BACKENDS = frozenset({'cuda', 'cupy', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.csr_matrix'})
Set of backends that run on device class CUDA
- CUPY_BACKENDS = frozenset({'cupy', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.csr_matrix'})
Set of backends that use CuPy, subset of class CUDA
- D2_BACKENDS = frozenset({'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.csr_matrix', 'numpy.matrix', 'scipy.sparse.coo_matrix', 'scipy.sparse.csc_matrix', 'scipy.sparse.csr_matrix'})
Set of backends that only support two-dimensional arrays
- DENSE_BACKENDS = frozenset({'cuda', 'cupy', 'numpy', 'numpy.matrix'})
Set of backends that are dense arrays
- ND_BACKENDS = frozenset({'cuda', 'cupy', 'numpy', 'sparse.COO', 'sparse.DOK', 'sparse.GCXS'})
Set of backends that support n-dimensional arrays
- SPARSE_BACKENDS = frozenset({'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.csr_matrix', 'scipy.sparse.coo_matrix', 'scipy.sparse.csc_matrix', 'scipy.sparse.csr_matrix', 'sparse.COO', 'sparse.DOK', 'sparse.GCXS'})
Set of backends that are sparse arrays
- TILE_DEPTH_DEFAULT = <object object>
Suggest using recommended tile depth
- TILE_DEPTH_MAX = inf
Suggest using maximum tile depth
- TILE_SIZE_BEST_FIT = <object object>
Suggest using recommended tile size
- TILE_SIZE_MAX = inf
Suggest using maximum tile size
- classmethod aux_data(data, kind, extra_shape=(), dtype='float32')[source]
Use this method to create auxiliary data. Auxiliary data should have a shape like (dataset.shape.nav, extra_shape) and on access, an appropriate view will be created. For example, if you access aux data in process_frame, you will get the auxiliary data for the current frame you are processing.
Example
We create a UDF to demonstrate the behavior:
>>> class MyUDF(UDF):
...     def get_result_buffers(self):
...         # Result buffer for debug output
...         return {'aux_dump': self.buffer(kind='nav', dtype='object')}
...     def process_frame(self, frame):
...         # Extract value of aux data for demonstration
...         self.results.aux_dump[:] = str(self.params.aux_data[:])
>>> # for each frame, provide three values from a sequential series:
>>> aux1 = MyUDF.aux_data(
...     data=np.arange(np.prod(dataset.shape.nav) * 3, dtype=np.float32),
...     kind="nav", extra_shape=(3,), dtype="float32"
... )
>>> udf = MyUDF(aux_data=aux1)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)
process_frame for frame (0, 7) received a view of aux_data with values [21., 22., 23.]:
>>> res['aux_dump'].data[0, 7]
'[21. 22. 23.]'
- buffer(kind: Literal['nav', 'sig', 'single'], extra_shape: tuple[int, ...] = (), dtype: nt.DTypeLike = 'float32', where: Literal['device'] | None = None, use: Literal['private', 'result_only'] | None = None) BufferWrapper [source]
Use this method to create BufferWrapper objects in get_result_buffers().
- Parameters:
kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.
extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. If this is specified as a Shape object, it is converted to a tuple first.
dtype (string or numpy dtype) – The dtype of this buffer
where (string or None) – None means NumPy array, specify 'device' to use a device buffer (for example on a CUDA device)
New in version 0.6.0.
use ("private", "result_only" or None) – If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.
Specify "result_only" here if the buffer is only used in get_results(); this means we don't have to allocate and return it on the workers without actually needing it.
None means the buffer is used both as a final and intermediate result.
New in version 0.7.0.
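As an example, a get_result_buffers() implementation could combine these arguments as follows (buffer names are illustrative):
>>> class BufferDemoUDF(UDF):
...     def get_result_buffers(self):
...         return {
...             # one (y, x) coordinate pair per navigation position
...             'coords': self.buffer(kind='nav', extra_shape=(2,), dtype='float32'),
...             # intermediate result, hidden from the user of the UDF
...             'scratch': self.buffer(kind='sig', dtype='float32', use='private'),
...         }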
- copy_for_partition(partition: Partition, roi: ndarray) UDF [source]
Create a copy of the UDF, specifically slicing aux data to the specified partition and roi
- forbuf(arr, target)[source]
Convert array to a backend that is compatible with the result buffers, and reshape it.
This function should be wrapped around assignments to result buffers if the array arr might have an incompatible array backend and/or a flattened sig dimension. It converts any of the supported input backends into the array backend of the target buffer. If the argument already matches the buffer or is not recognized, it is a no-op.
In particular, the result of array operations on a sparse input tile can't always be merged into dense result buffers directly, since arrays from the sparse package are not converted to NumPy arrays automatically.
New in version 0.11.0.
- Parameters:
arr (array-like) – Any array-like object that is supported by the sparseconverter package or allows the desired operation with the result buffer at hand.
target (array-like) – The LiberTEM result buffer that the desired operation targets.
Examples
>>> def process_tile(self, tile):
...     res = ...  # Processing result from tile
...     self.results.intensity[:] += self.forbuf(
...         res,
...         self.results.intensity
...     )
See Sparse arrays for a complete example of how to use this function in the context of a UDF!
- get_backends() Literal['numpy', 'numpy.matrix', 'cuda', 'cupy', 'sparse.COO', 'sparse.GCXS', 'sparse.DOK', 'scipy.sparse.coo_matrix', 'scipy.sparse.csr_matrix', 'scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csr_matrix', 'cupyx.scipy.sparse.csc_matrix'] | Iterable[Literal['numpy', 'numpy.matrix', 'cuda', 'cupy', 'sparse.COO', 'sparse.GCXS', 'sparse.DOK', 'scipy.sparse.coo_matrix', 'scipy.sparse.csr_matrix', 'scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csr_matrix', 'cupyx.scipy.sparse.csc_matrix']] [source]
Signal which computation back-ends the UDF can use.
Allowed are values in libertem.udf.base.UDF.BACKEND_ALL. (UDF.BACKEND_NUMPY, ) is returned by default for CPU-based computation.
New in version 0.6.0.
Changed in version 0.11.0: Extended from just NumPy vs. CuPy to include more generic specification of backends, including sparse arrays. See Sparse arrays for details! The backend specifications are backwards-compatible with the previously supported values.
- Returns:
A single value or iterable containing values from the supported backends in
libertem.udf.base.UDF.BACKEND_ALL
- Return type:
backends
- get_method() UDFMethod
Return a member of the libertem.udf.base.UDF.UDF_METHOD enum to indicate which processing method to use during run_udf.
By default, this method returns based on the process method precedence of tile > frame > partition.
The return value is checked at runtime to ensure the requested method is implemented on the UDF itself.
Changed in version 0.13.0: get_method() was exposed to enable choice of processing method at runtime
- Returns:
A member of the enum libertem.udf.base.UDF.UDF_METHOD
- get_preferred_input_dtype() nt.DTypeLike [source]
Override this method to specify the preferred input dtype of the UDF.
The default is float32, since most numerical processing tasks perform best with this dtype, namely dot products.
The back-end uses this preferred input dtype in combination with the dataset's native dtype to determine the input dtype using numpy.result_type(). That means float data in a dataset switches the dtype to float even if this method returns an int dtype. int32 or wider input data would switch from float32 to float64, and complex data in the dataset will switch the input dtype kind to complex, following the NumPy casting rules.
In case your UDF only works with specific input dtypes, it should throw an error or warning if incompatible dtypes are used, and/or implement a meaningful conversion in your UDF's process_<...> routine.
If you prefer to always use the dataset's native dtype instead of floats, you can override this method to return UDF.USE_NATIVE_DTYPE, which is currently identical to numpy.bool and behaves as a neutral element in numpy.result_type().
New in version 0.4.0.
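A sketch of overriding this method to keep the dataset's native dtype:
>>> class NativeDtypeUDF(UDF):
...     def get_preferred_input_dtype(self):
...         # process data in the dataset's native dtype instead of float32
...         return self.USE_NATIVE_DTYPE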
- get_result_buffers() dict[str, BufferWrapper] [source]
Return result buffer declaration.
Values of the returned dict should be BufferWrapper instances, which, when accessed via self.results.key, will automatically return a view corresponding to the current unit of data (frame, tile, partition).
The values also need to be serializable via pickle.
Data available in this method:
self.params - the parameters of this UDF
self.meta - relevant metadata, see UDFMeta documentation. Please note that partition metadata will not be set when this method is executed on the head node.
- Returns:
Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.results.the_key_here.
- get_results() dict[str, ndarray] [source]
Get results, allowing a postprocessing step on the main node after a result has been merged. See also: UDFPostprocessMixin.
New in version 0.7.0.
Note
You should return all values as numpy arrays, they will be wrapped in BufferWrapper instances before they are returned to the user.
See the Post-processing after merging section in the documentation for details and examples.
- Returns:
results – A dict containing the final post-processed results.
- get_task_data() dict[str, Any] [source]
Initialize per-task data.
Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.
If you want to distribute static data, use parameters instead.
Data available in this method:
self.params - the input parameters of this UDF
self.meta - relevant metadata, see UDFMeta documentation.
- Returns:
Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.
- get_tiling_preferences() TilingPreferences [source]
Configure tiling preferences. Return a dictionary with the following keys:
“depth”: number of frames/frame parts to stack on top of each other
“total_size”: total size of a tile in bytes
New in version 0.6.0.
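A sketch of overriding the tiling preferences with the constants documented above (the concrete size is arbitrary):
>>> class DeepTilesUDF(UDF):
...     def get_tiling_preferences(self):
...         return {
...             "depth": UDF.TILE_DEPTH_MAX,
...             "total_size": 4 * 1024 * 1024,  # aim for tiles of about 4 MiB
...         }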
- merge(dest: MergeAttrMapping, src: MergeAttrMapping)[source]
Merge a partial result src into the current global result dest.
Data available in this method:
self.params - the parameters of this UDF
- Parameters:
dest – global results; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (dest.your_buffer_name)
src – results for a partition; you can access the ndarrays for each buffer name (from get_result_buffers) by attribute access (src.your_buffer_name)
Note
This function is running on the main node, which means self.results and self.task_data are not available.
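For example, a sketch of a custom merge() for a kind='sig' maximum reduction (assuming non-negative data, since result buffers are zero-initialized):
>>> class MaxUDF(UDF):
...     def get_result_buffers(self):
...         return {'maximum': self.buffer(kind='sig', dtype='float32')}
...     def process_tile(self, tile):
...         np.maximum(self.results.maximum, np.max(tile, axis=0),
...                    out=self.results.maximum)
...     def merge(self, dest, src):
...         # fold the partial maximum of a partition into the global result
...         np.maximum(dest.maximum, src.maximum, out=dest.maximum)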
- property requires_custom_merge
use != ‘result_only’ are present where the default merge doesn’t work
New in version 0.5.0.
Changed in version 0.12.0: Ignore
use != 'result_only'
buffer since they are not present in the merge function- Type:
Determine if buffers with
kind != 'nav'
that are not- Type:
code
- property requires_custom_merge_all
Determine if buffers with kind != 'nav' are present where the default merge doesn't work
New in version 0.12.0.
- static with_mask(data: ndarray, mask: ndarray | bool) ArrayWithMask [source]
Add a mask to indicate the valid parts in a result. The mask should be an array of bools, with a shape matching the data array. The mask should have True in positions that are valid, and False otherwise.
You can pass mask=True or mask=False as shortcuts for all-valid or all-invalid, and also pass in masks that can be broadcast to data.
This should be used in get_results().
See the Valid data masking section in the documentation for details and more examples.
Example
>>> class SomeUDF(UDF):
...     # ... other methods omitted for brevity ...
...     def get_results(self):
...         return {
...             'some_result': self.with_mask(
...                 data=self.results.some_result,
...                 mask=False,
...             ),
...         }
- property xp
Compute back-end library to use.
Generally, use self.xp instead of np to use NumPy or CuPy transparently.
New in version 0.6.0.
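A sketch of using self.xp together with forbuf() so the same code runs on NumPy and CuPy (see get_backends() above):
>>> class XpSumUDF(UDF):
...     def get_backends(self):
...         return (self.BACKEND_NUMPY, self.BACKEND_CUPY)
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='sig', dtype='float32')}
...     def process_tile(self, tile):
...         # self.xp is numpy or cupy, depending on where this runs
...         self.results.intensity[:] += self.forbuf(
...             self.xp.sum(tile, axis=0), self.results.intensity
...         )
...     def merge(self, dest, src):
...         dest.intensity += src.intensity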
Alternative merging for Dask arrays
- class libertem.udf.base.UDFMergeAllMixin(*args, **kwargs)[source]
- merge_all(ordered_results: OrderedDict[Slice, MergeAttrMapping]) Mapping[str, nt.ArrayLike] [source]
Combine a stack of ordered partial results ordered_results to form the complete result.
Combining can be more efficient than direct merging into a result buffer for cases where the results are not NumPy arrays. Currently this is only applicable for the libertem.executor.delayed.DelayedJobExecutor, where it provides an efficient pathway to construct Dask arrays from delayed UDF results.
The input and the returned arrays are in flattened navigation dimension with ROI applied.
Data available in this method:
self.params - the parameters of this UDF
For UDFs with only kind='nav' result buffers, a default implementation is used automatically.
- Parameters:
ordered_results – Ordered dict mapping partition slice to UDF partial result
- Returns:
Dictionary mapping result buffer name to buffer content
- Return type:
dict[buffername] -> array_like
Note
This function is running on the main node, which means self.results and self.task_data are not available.
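A rough sketch of a merge_all() implementation, assuming a single kind='sig' sum buffer (a regular merge() is kept for executors that merge eagerly):
>>> from libertem.udf.base import UDFMergeAllMixin
>>> class MergeAllSumUDF(UDFMergeAllMixin, UDF):
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='sig', dtype='float32')}
...     def process_partition(self, partition):
...         self.results.intensity[:] += np.sum(partition, axis=0)
...     def merge(self, dest, src):
...         dest.intensity += src.intensity
...     def merge_all(self, ordered_results):
...         # sum the ordered per-partition partial results into one array
...         intensity = np.stack(
...             [b.intensity for b in ordered_results.values()]
...         ).sum(axis=0)
...         return {'intensity': intensity}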
Meta information
- class libertem.udf.base.UDFMeta(partition_slice: Slice | None, dataset_shape: Shape, roi: ndarray | None, dataset_dtype: nt.DTypeLike, input_dtype: nt.DTypeLike, tiling_scheme: TilingScheme | None = None, tiling_index: int = 0, corrections: CorrectionSet | None = None, device_class: Literal['cpu', 'cuda'] | None = None, threads_per_worker: int | None = None, array_backend: Literal['numpy', 'numpy.matrix', 'cuda', 'cupy', 'sparse.COO', 'sparse.GCXS', 'sparse.DOK', 'scipy.sparse.coo_matrix', 'scipy.sparse.csr_matrix', 'scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csr_matrix', 'cupyx.scipy.sparse.csc_matrix'] | None = None, valid_nav_mask: ndarray | None = None)[source]
UDF metadata. Makes all relevant metadata accessible to the UDF. Can be different for each task/partition.
Changed in version 0.4.0: Added distinction of dataset_dtype and input_dtype
Changed in version 0.6.0: Information on compute backend, corrections, coordinates and tiling scheme added
Changed in version 0.9.0: tiling_scheme_idx and sig_slice added
Changed in version 0.11.0: array_backend added
- property array_backend: Literal['numpy', 'numpy.matrix', 'cuda', 'cupy', 'sparse.COO', 'sparse.GCXS', 'sparse.DOK', 'scipy.sparse.coo_matrix', 'scipy.sparse.csr_matrix', 'scipy.sparse.csc_matrix', 'cupyx.scipy.sparse.coo_matrix', 'cupyx.scipy.sparse.csr_matrix', 'cupyx.scipy.sparse.csc_matrix'] | None
Array backend, one of the BACKEND_* constants defined in libertem.udf.base.UDF, or None if not known at that time.
New in version 0.11.0.
- property coordinates: ndarray
Array of coordinates that correspond to the frames in the actual navigation space which are part of the current tile or partition.
New in version 0.6.0.
- Type:
np.ndarray
- property corrections: CorrectionSet
Correction data that is available, either from the dataset or specified by the user
New in version 0.6.0.
- property dataset_dtype: nt.DTypeLike
Native dtype of the dataset
- property dataset_shape: Shape
The original shape of the whole dataset, not influenced by the ROI
- property device_class: str
Which device class is used.
The back-end library can be accessed as libertem.udf.base.UDF.xp. This additional string information is used since that way the back-end can be probed without importing them all and testing them against libertem.udf.base.UDF.xp.
Current values are cpu (default) or cuda.
New in version 0.6.0.
- get_valid_nav_mask(full_nav: bool = False) ndarray | None [source]
Return a mask of the already computed nav positions, as a flattened 1D array.
Only available in merge() and get_results().
In case of merge(), the mask does not yet contain the positions of the data that will be merged into the result, but only those positions that have already been merged into the result.
NOTE: these positions may include dropped frames or missing data.
- Parameters:
full_nav – If a roi is applied, still include all elements of the navigation space in the mask. If this is not set, the mask is compressed to the positions selected in the roi.
- property input_dtype: nt.DTypeLike
dtype of the data that will be passed to the UDF
This is determined from the dataset's native dtype and UDF.get_preferred_input_dtype() using numpy.result_type()
New in version 0.4.0.
- property partition_shape: Shape
The shape of the partition this UDF currently works on. If a ROI was applied, the shape will be modified accordingly.
- property roi: ndarray | None
Boolean array which limits the elements the UDF is working on. Has a shape of dataset_shape.nav.
- property sig_slice: Slice
Signal slice of the current tile.
Since all tiles follow the same tiling scheme, this avoids repeatedly calculating the signal part of slice. Instead, the appropriate slice from the tiling scheme can be re-used.
- property slice: Slice | None
A Slice instance that describes the location within the dataset with navigation dimension flattened and reduced to the ROI.
- property threads_per_worker: int | None
Number of threads that a UDF is allowed to use in the process_* method.
For Numba, pyfftw, Torch, NumPy and SciPy (OMP, MKL, OpenBLAS), this limit is set automatically; this property can be used for other cases, like manually creating thread pools or setting limits for unsupported modules.
None means no limit is set, and the UDF can use any number of threads it deems necessary (should be limited to system limits, of course).
Note
Changed in version 0.8.0: Since discovery of loaded libraries can be slow with threadpoolctl (#1117), they are cached now. In case a UDF triggers loading of a new library or instance of a library that is supported by threadpoolctl, it will only be discovered in the first run on a Context. The threading settings for such other libraries or instances can therefore depend on the execution order. In such cases the thread count for affected libraries should be set in the UDF based on threads_per_worker. Numba, pyfftw, Torch, NumPy and SciPy should not be affected since they are loaded before the first discovery.
See also: libertem.common.threading.set_num_threads()
New in version 0.7.0.
- Type:
int or None
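For instance, a sketch of respecting this limit when manually creating a thread pool in get_task_data() (the pool name is illustrative):
>>> from concurrent.futures import ThreadPoolExecutor
>>> class ThreadedUDF(UDF):
...     def get_task_data(self):
...         # stay within the limit communicated by the executor
...         n_threads = self.meta.threads_per_worker or 1
...         return {'pool': ThreadPoolExecutor(max_workers=n_threads)}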
- property tiling_scheme: TilingScheme | None
The tiling scheme that was negotiated
New in version 0.6.0.
- property tiling_scheme_idx: int
Index of the current tile in tiling_scheme.
Pre- and postprocessing
- class libertem.udf.base.UDFPreprocessMixin(*args, **kwargs)[source]
Implement preprocess to initialize the result buffers of a partition on the worker before the partition data is processed.
New in version 0.3.0.
- preprocess() None [source]
Implement this method to preprocess the result data for a partition.
This can be useful to initialize arrays of dtype='object' with the correct container types, for example.
Data available in this method:
self.params - the parameters of this UDF
self.task_data - task data created by get_task_data
self.results - the result buffer instances
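A sketch of initializing a dtype='object' buffer in preprocess() (names are illustrative):
>>> from libertem.udf.base import UDFPreprocessMixin
>>> class ObjectBufferUDF(UDFPreprocessMixin, UDF):
...     def get_result_buffers(self):
...         return {'events': self.buffer(kind='nav', dtype='object')}
...     def preprocess(self):
...         # give every navigation element of this partition its own list
...         for i in range(len(self.results.events)):
...             self.results.events[i] = []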
- class libertem.udf.base.UDFPostprocessMixin(*args, **kwargs)[source]
Implement postprocess to modify the result buffers of a partition on the worker after the partition data has been completely processed, but before it is returned to the main node for the final merging step.
- postprocess() None [source]
Implement this method to postprocess the result data for a partition.
This can be useful in combination with process_tile() to implement a postprocessing step that requires the reduced results for whole frames.
Data available in this method:
self.params - the parameters of this UDF
self.task_data - task data created by get_task_data
self.results - the result buffer instances
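A sketch combining process_tile() with a postprocess() step that runs once the per-frame reduction for the partition is complete (the normalization is just an example):
>>> from libertem.udf.base import UDFPostprocessMixin
>>> class NormalizedSumUDF(UDFPostprocessMixin, UDF):
...     def get_result_buffers(self):
...         return {'intensity': self.buffer(kind='nav', dtype='float32')}
...     def process_tile(self, tile):
...         # accumulate per-frame sums; the sig axes are the trailing ones
...         self.results.intensity[:] += np.sum(tile, axis=tuple(range(1, tile.ndim)))
...     def postprocess(self):
...         # normalize by the number of pixels per frame
...         self.results.intensity[:] /= np.prod(tuple(self.meta.dataset_shape.sig))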
Exceptions
Running UDFs
Three methods of libertem.api.Context are relevant for running user-defined functions:
- class libertem.api.Context(executor: JobExecutor | None = None, plot_class: Live2DPlot | None = None)[source]
Context is the main entry point of the LiberTEM API. It contains methods for loading datasets, creating analyses on them and running them. In the background, instantiating a Context creates a suitable executor and spins up a local Dask cluster unless the executor is passed to the constructor.
Changed in version 0.7.0: Removed deprecated methods create_mask_job, create_pick_job
- Parameters:
executor (JobExecutor or None) – If None, create a local dask.distributed cluster and client using make_local() with optimal configuration for LiberTEM. It uses all cores and compatible GPUs on the local system, but is not set as default Dask scheduler to not interfere with other uses of Dask.
plot_class (libertem.viz.base.Live2DPlot) – Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.
New in version 0.7.0.
- plot_class
Default plot class for live plotting. Defaults to libertem.viz.mpl.MPLLive2DPlot.
New in version 0.7.0.
Examples
>>> ctx = libertem.api.Context()
>>> # Create a Context using an inline executor for debugging
>>> # See also Context.make_with() for a more convenient interface!
>>> from libertem.executor.inline import InlineJobExecutor
>>> debug_ctx = libertem.api.Context(executor=InlineJobExecutor())
- map(dataset: DataSet, f, roi: ndarray | SparseArray | spmatrix | tuple[int, ...] | Iterable[tuple[tuple[int, ...], bool]] | None = None, progress: bool | ProgressReporter = False, corrections: CorrectionSet = None, backends=None) BufferWrapper [source]
Create an AutoUDF with function f() and run it on dataset.
Changed in version 0.5.0: Added the progress parameter
Changed in version 0.6.0: Added the corrections and backends parameters
- Parameters:
dataset – The dataset to work on
f – Function that accepts a frame as the only parameter. It should return a strongly reduced output compared to the size of a frame.
roi (numpy.ndarray, sparse array or coordinate tuple(s), optional) – Region of interest as bool mask over the navigation axes of the dataset. See Regions of interest.
progress (bool | ProgressReporter) – Show progress bar. Toggle with boolean flag or supply an instance of libertem.common.progress.ProgressReporter for custom handling of progress display.
corrections – Corrections to apply while running the function. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.
backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.
- Returns:
BufferWrapper – The result of the UDF. Access the underlying numpy array using the data property. Shape and dtype are inferred automatically from f.
- run_udf(dataset: DataSet, udf: UDF | Iterable[UDF], roi: ndarray | SparseArray | spmatrix | tuple[int, ...] | Iterable[tuple[tuple[int, ...], bool]] | None = None, corrections: CorrectionSet | None = None, progress: bool | ProgressReporter = False, backends=None, plots=None, sync=True) Mapping[str, BufferWrapper] | list[Mapping[str, BufferWrapper]] | Coroutine[None, None, Mapping[str, BufferWrapper]] | Coroutine[None, None, list[Mapping[str, BufferWrapper]]] [source]
Run udf on dataset, restricted to the region of interest roi.
Changed in version 0.5.0: Added the progress parameter
Changed in version 0.6.0: Added the corrections and backends parameters
Changed in version 0.7.0: Added the plots and sync parameters, and the ability to run multiple UDFs on the same data in a single pass.
- Raises:
UDFRunCancelled – Either the run was cancelled using AsyncJobExecutor.cancel(), or the underlying data source was interrupted.
- Parameters:
dataset – The dataset to work on
udf – UDF instance you want to run, or a list of UDF instances
roi (numpy.ndarray, sparse array or coordinate tuple(s), optional) – Region of interest as bool mask over the navigation axes of the dataset. See Regions of interest.
progress (bool | ProgressReporter) – Show progress bar. Toggle with boolean flag or supply an instance of libertem.common.progress.ProgressReporter for custom handling of progress display.
corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.
backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.
plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –
None: don't plot anything (default)
True: plot all 2D UDF result buffers
List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.
List[LivePlot]: LivePlot instance for each channel you want to plot
New in version 0.7.0.
sync (bool) –
By default, run_udf is a synchronous method. If sync is set to False, it is awaitable instead.
New in version 0.7.0.
- Returns:
Return value of the UDF containing the result buffers of type libertem.common.buffers.BufferWrapper. Note that a BufferWrapper can be used like a numpy.ndarray in many cases because it implements __array__(). You can access the underlying numpy array using the data property.
If a list of UDFs was passed in, the returned type is a Tuple[dict[str, BufferWrapper], …].
Examples
Run the SumUDF on a data set:
>>> from libertem.udf.sum import SumUDF
>>> result = ctx.run_udf(dataset=dataset, udf=SumUDF())
>>> np.array(result["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF
Running a UDF on a subset of data:
>>> from libertem.udf.sumsigudf import SumSigUDF
>>> roi = np.zeros(dataset.shape.nav, dtype=bool)
>>> roi[0, 0] = True
>>> result = ctx.run_udf(dataset=dataset, udf=SumSigUDF(), roi=roi)
>>> # to get the full navigation-shaped results, with NaNs where the `roi` was False:
>>> np.array(result["intensity"]).shape
(16, 16)
>>> # to only get the selected results as a flat array:
>>> result["intensity"].raw_data.shape
(1,)
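With sync=False the same call becomes awaitable, for example from asynchronous code such as a Jupyter notebook (a sketch):
>>> from libertem.udf.sum import SumUDF
>>> async def sum_async():
...     # awaitable variant of the same call
...     return await ctx.run_udf(dataset=dataset, udf=SumUDF(), sync=False)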
- run_udf_iter(dataset: DataSet, udf: UDF | Iterable[UDF], roi: ndarray | SparseArray | spmatrix | tuple[int, ...] | Iterable[tuple[tuple[int, ...], bool]] | None = None, corrections: CorrectionSet = None, progress: bool | ProgressReporter = False, backends=None, plots=None, sync=True) ResultGenerator | ResultAsyncGenerator [source]
Run udf on dataset, restricted to the region of interest roi. Yields partial results after each merge operation.
New in version 0.7.0.
- Parameters:
dataset – The dataset to work on
udf – UDF instance you want to run, or a list of UDF instances
roi (numpy.ndarray, sparse array or coordinate tuple(s), optional) – Region of interest as bool mask over the navigation axes of the dataset. See Regions of interest.
progress (bool | ProgressReporter) – Show progress bar. Toggle with boolean flag or supply an instance of libertem.common.progress.ProgressReporter for custom handling of progress display.
corrections – Corrections to apply while running the UDF. If none are given, the corrections that are part of the DataSet are used, if there are any. See also Corrections.
backends (None or iterable containing 'numpy', 'cupy' and/or 'cuda') – Restrict the back-end to a subset of the capabilities of the UDF. This can be useful for testing hybrid UDFs.
plots (None or True or List[List[Union[str, Tuple[str, Callable]]]] or List[LivePlot]) –
None: don't plot anything (default)
True: plot all 2D UDF result buffers
List[List[...]]: plot the named UDF buffers. Pass a list of names or (name, callable) tuples for each UDF you want to plot. If the callable is specified, it is applied to the UDF buffer before plotting.
List[LivePlot]: LivePlot instance for each channel you want to plot
sync (bool) – By default, run_udf_iter is a synchronous method. If sync is set to False, an async generator will be returned instead.
- Returns:
Generator of UDFResults container objects. Their attribute buffers is the list of result buffer dictionaries for the UDFs. Attribute damage is a BufferWrapper of kind='nav', dtype=bool indicating the positions in nav space that have been processed already.
- Return type:
Generator[UDFResults]
Examples
Run the SumUDF on a data set:
>>> from libertem.udf.sum import SumUDF
>>> for result in ctx.run_udf_iter(dataset=dataset, udf=SumUDF()):
...     assert np.array(result.buffers[0]["intensity"]).shape == (32, 32)
>>> np.array(result.buffers[0]["intensity"]).shape
(32, 32)
>>> # intensity is the name of the result buffer, defined in the SumUDF
Result type for UDF result iterators:
- class libertem.udf.base.UDFResults(buffers: Iterable[Mapping[str, BufferWrapper]], damage: BufferWrapper)[source]
Container class to combine UDF results with additional information.
This class allows returning additional information from UDF execution together with UDF result buffers. This is currently used to pass "damage" information when running UDFs as an iterator using libertem.api.Context.run_udf_iter(). "Damage" is a map of the nav space that is set to True for all positions that have already been processed.
New in version 0.7.0.
- Parameters:
buffers – Iterable containing the result buffer dictionaries for each of the UDFs being executed
damage (BufferWrapper) – libertem.common.buffers.BufferWrapper of kind='nav', dtype=bool. It is set to True for all positions in nav space that have been processed already.
Exception for cancelled UDF run:
- exception libertem.udf.base.UDFRunCancelled[source]
Raised when the UDF run was cancelled, either using AsyncJobExecutor.cancel(), or when the underlying data source was interrupted.
Buffers
BufferWrapper objects are used to manage data in the context of user-defined functions.
- class libertem.common.buffers.ArrayWithMask(arr: T, mask: ndarray | bool)[source]
An opaque type representing an array together with a mask (for example, defining a region in the array where there are valid entries)
This is meant for usage in UDF.get_results(), and you can use the convenience method UDF.with_mask() to instantiate it.
- property arr: T
- class libertem.common.buffers.AuxBufferWrapper(kind: Literal['nav', 'sig', 'single'], extra_shape: tuple[int, ...] = (), dtype='float32', where: None | Literal['device'] = None, use: Literal['private', 'result_only'] | None = None)[source]
- new_for_partition(partition, roi)[source]
Return a new AuxBufferWrapper for a specific partition, slicing the data accordingly and reducing it to the selected roi.
This method assumes it is called on an AuxBufferWrapper that was not created by this method, i.e. one that has global coordinates without the ROI applied.
- class libertem.common.buffers.BufferPool[source]
Allocation pool for explicitly re-using (aligned) allocations
- class libertem.common.buffers.BufferWrapper(kind: Literal['nav', 'sig', 'single'], extra_shape: tuple[int, ...] = (), dtype='float32', where: None | Literal['device'] = None, use: Literal['private', 'result_only'] | None = None)[source]
Helper class to automatically allocate buffers, either for partitions or the whole dataset, and create views for partitions or single frames.
This is used as a helper to allow easy merging of results without needing to manually handle indexing.
Usually, as a user, you only need to instantiate this class, specifying kind, dtype and sometimes extra_shape parameters. Most methods are meant to be called from LiberTEM-internal code, for example the UDF functionality.
This class is array_like, so you can directly use it, for example, as argument for numpy functions.
Changed in version 0.6.0: Add option to specify backend, for example CuPy
- Parameters:
kind ("nav", "sig" or "single") – The abstract shape of the buffer, corresponding either to the navigation or the signal dimensions of the dataset, or a single value.
extra_shape (optional, tuple of int or a Shape object) – You can specify additional dimensions for your data. For example, if you want to store 2D coords, you would specify (2,) here. For a Shape object, sig_dims is discarded and the entire shape is used.
dtype (string or numpy dtype) – The dtype of this buffer
where (string or None) – None means NumPy array, device to use a back-end specified in allocate().
New in version 0.6.0.
use ("private", "result_only" or None) – If you specify "private" here, the result will only be made available to internal functions, like process_frame(), merge() or get_results(). It will not be available to the user of the UDF, which means you can use this to hide implementation details that are likely to change later.
Specify "result_only" here if the buffer is only used in get_results(); this means we don't have to allocate and return it on the workers without actually needing it.
None means the buffer is used both as a final and intermediate result.
New in version 0.7.0.
- property data
Get the buffer contents in shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values have NaN value for floating point types, False for boolean, 0 for integer types and structs, ‘’ for string types and None for objects.
Changed in version 0.7.0: Better initialization values for dtypes other than floating point.
- property dtype
Get the declared dtype of this buffer.
New in version 0.7.0.
- property extra_shape
Get the extra_shape of this buffer.
New in version 0.5.0.
- get_valid_slice_inner(axis: int = 0) tuple[slice, ...] [source]
Return a slice into data across axis that contains only elements marked as valid. This is the "inner" slice, meaning all elements selected by the slice are valid according to valid_mask.
- Parameters:
axis – The axis across which we should cut. Other axes are unrestricted in the returned slice. For example, if you have a (y, x) result, specifying axis=0 will give you a slice like np.s_[y0:y1, :] where for y in y0..y1, data[y, :] is valid according to valid_mask.
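A sketch of using this on a result buffer (here the intensity buffer of SumSigUDF, documented below):
>>> from libertem.udf.sumsigudf import SumSigUDF
>>> buf = ctx.run_udf(dataset=dataset, udf=SumSigUDF())['intensity']
>>> inner = buf.get_valid_slice_inner()
>>> bool(np.all(buf.valid_mask[inner]))  # every element in the slice is valid
True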
- property masked_data: MaskedArray
Same as data, but masked to the valid data, as a numpy masked array.
- property raw_data: ndarray | None
Get the raw data underlying this buffer, which is flattened and may be even filtered to a ROI
- property raw_masked_data: MaskedArray
Same as raw_data, but masked to the valid data, as a numpy masked array.
- property shape
- property valid_mask: ndarray
Returns a boolean array with a mask that indicates which parts of the buffer have valid contents.
- property valid_slice_bounding: tuple[slice, ...]
Returns a slice that bounds the valid elements in data.
Note that this includes all valid elements, but also may contain invalid elements. If you instead want to get a slice that contains no invalid elements (which may cut off some valid elements), use get_valid_slice_inner().
- property where
Get the place where this buffer is to be allocated.
New in version 0.6.0.
- exception libertem.common.buffers.InvalidMaskError[source]
The mask is not compatible, raised for example when the shape of the mask doesn’t match the data.
- class libertem.common.buffers.ManagedBuffer(pool, size, alignment)[source]
Allocate size bytes from pool, and return them to the pool once we are GC’d
- class libertem.common.buffers.PlaceholderBufferWrapper(kind: Literal['nav', 'sig', 'single'], extra_shape: tuple[int, ...] = (), dtype='float32', where: None | Literal['device'] = None, use: Literal['private', 'result_only'] | None = None)[source]
A declaration-only version of BufferWrapper that doesn't actually allocate a buffer. Meant as a placeholder for results that are only materialized in UDF.get_results.
- property data
Get the buffer contents in shape that corresponds to the original dataset shape. If a ROI is set, embed the result into a new array; unset values have NaN value for floating point types, False for boolean, 0 for integer types and structs, ‘’ for string types and None for objects.
Changed in version 0.7.0: Better initialization values for dtypes other than floating point.
- property raw_data
Get the raw data underlying this buffer, which is flattened and may be even filtered to a ROI
- libertem.common.buffers.bytes_aligned(size: int) memoryview [source]
- libertem.common.buffers.empty_aligned(size: int | tuple[int, ...], dtype: nt.DTypeLike) ndarray [source]
- libertem.common.buffers.reshaped_view(a: ndarray, shape)[source]
Like numpy.ndarray.reshape(), just guaranteed to return a view or throw an AttributeError if no view can be created.
New in version 0.5.0.
- Parameters:
a (numpy.ndarray) – Array to create a view of
shape (tuple) – Shape of the view to create
- Returns:
view – View into a with shape shape
Included utility UDFs
Some generally useful UDFs are included with LiberTEM:
Note
See Application-specific APIs for application-specific UDFs and analyses.
Sum of frames
- class libertem.udf.sum.SumUDF(dtype='float32')[source]
Sum up frames, preserving the signal dimension
- Parameters:
dtype (numpy.dtype, optional) – Preferred dtype for computation, default 'float32'. The actual dtype will be determined from this value and the dataset's dtype using numpy.result_type(). See also Preferred input dtype.
Examples
>>> udf = SumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(32, 32)
Sum of log-scaled frames
- class libertem.udf.logsum.LogsumUDF[source]
Sum up log-scaled frames
In comparison to log-scaling the sum, this highlights regions with slightly higher intensity that appear in many frames in relation to very high intensity in a few frames.
Examples
>>> udf = LogsumUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["logsum"]).shape
(32, 32)
- get_backends()[source]
Signal which computation back-ends the UDF can use.
Allowed are values in libertem.udf.base.UDF.BACKEND_ALL. (UDF.BACKEND_NUMPY, ) is returned by default for CPU-based computation.
New in version 0.6.0.
Changed in version 0.11.0: Extended from just NumPy vs. CuPy to include more generic specification of backends, including sparse arrays. See Sparse arrays for details! The backend specifications are backwards-compatible with the previously supported values.
- Returns:
A single value or iterable containing values from the supported backends in
libertem.udf.base.UDF.BACKEND_ALL
- Return type:
backends
Standard deviation
- class libertem.udf.stddev.StdDevUDF(dtype=None, use_numba=True)[source]
Compute sum of variances and sum of pixels from the given dataset
The one-pass algorithm used in this code is taken from the following paper: [SG18].
Changed in version 0.5.0: Result buffers have been renamed
Changed in version 0.7.0: var, mean, and std are now returned directly from the UDF via get_results.
Changed in version 0.11: Added dtype and use_numba parameters, added support for sparse input.
- Parameters:
dtype – Base dtype to determine numerical precision.
use_numba (bool) – Use the Numba-accelerated version if input array format and backend allow.
Examples
>>> udf = StdDevUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["varsum"])  # variance times number of frames
array(...)
>>> np.array(result["num_frames"])  # number of frames for each tile
array(...)
>>> np.array(result["sum"])  # sum of all frames
array(...)
>>> np.array(result["var"])
array(...)
>>> np.array(result["mean"])
array(...)
>>> np.array(result["std"])
array(...)
- get_backends()[source]
Signal which computation back-ends the UDF can use.
Allowed are values in libertem.udf.base.UDF.BACKEND_ALL. (UDF.BACKEND_NUMPY, ) is returned by default for CPU-based computation.
New in version 0.6.0.
Changed in version 0.11.0: Extended from just NumPy vs. CuPy to include more generic specification of backends, including sparse arrays. See Sparse arrays for details! The backend specifications are backwards-compatible with the previously supported values.
- Returns:
A single value or iterable containing values from the supported backends in
libertem.udf.base.UDF.BACKEND_ALL
- Return type:
backends
- get_task_data()[source]
Initialize per-task data.
Per-task data can be mutable. Override this function to allocate temporary buffers, or to initialize system resources.
If you want to distribute static data, use parameters instead.
Data available in this method:
self.params - the input parameters of this UDF
self.meta - relevant metadata, see UDFMeta documentation.
- Returns:
Flat dict with string keys. Keys should be valid python identifiers, which allows access via self.task_data.the_key_here.
- libertem.udf.stddev.run_stddev(ctx, dataset, roi=None, progress=False, use_numba=True)[source]
Compute sum of variances and sum of pixels from the given dataset
One-pass algorithm used in this code is taken from the following paper: [SG18].
Changed in version 0.5.0: Result buffers have been renamed
Changed in version 0.5.0: Added progress parameter for progress bar
- Parameters:
ctx (libertem.api.Context) –
dataset (libertem.io.dataset.base.DataSet) – dataset to work on
roi (numpy.ndarray) – Region of interest, see Regions of interest for more information.
progress (bool, optional) – Show progress bar. Default is False.
use_numba (bool) – Use the Numba backend if available, otherwise use the ndarray-based implementation
- Returns:
pass_results – A dictionary of ndarrays that contains the sum of variances, the sum of pixels, and the number of frames used to compute the above statistics
To retrieve the statistics, use the following keys:
variance (pass_results['var'])
standard deviation (pass_results['std'])
sum of pixels (pass_results['sum'])
mean (pass_results['mean'])
number of frames (pass_results['num_frames'])
- libertem.udf.stddev.consolidate_result(udf_result)[source]
Calculate variance, mean and standard deviation from raw UDF results and consolidate the per-tile frame counter into a single value. Convert all result arrays to ndarray.
Note
This is mostly here for backwards-compatibility - nowadays, 'var', 'std', and 'mean' are already calculated in StdDevUDF.get_results().
- Parameters:
udf_result (Dict[str, BufferWrapper]) – UDF result with keys ‘sum’, ‘varsum’, ‘num_frames’, ‘var’, ‘std’, ‘mean’
- Returns:
pass_results – Result dictionary with keys 'sum', 'varsum', 'var', 'std', 'mean' as numpy.ndarray, and 'num_frames' as int
- Return type:
Dict[str, Union[numpy.ndarray, int]]
Sum per frame
- class libertem.udf.sumsigudf.SumSigUDF(**kwargs: Any | AuxBufferWrapper)[source]
Sum over the signal axes. For each navigation position, the sum of all pixels is calculated.
Examples
>>> udf = SumSigUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> np.array(result["intensity"]).shape
(16, 16)
- get_backends()[source]
Signal which computation back-ends the UDF can use.
Allowed are values in libertem.udf.base.UDF.BACKEND_ALL. (UDF.BACKEND_NUMPY, ) is returned by default for CPU-based computation.
New in version 0.6.0.
Changed in version 0.11.0: Extended from just NumPy vs. CuPy to include more generic specification of backends, including sparse arrays. See Sparse arrays for details! The backend specifications are backwards-compatible with the previously supported values.
- Returns:
A single value or iterable containing values from the supported backends in
libertem.udf.base.UDF.BACKEND_ALL
- Return type:
backends
Centre-of-Mass
- class libertem.udf.com.CoMUDF(com_params: CoMParams = (None, None, inf, 0.0, 0.0, False, RegressionOptions.NO_REGRESSION))[source]
Perform centre-of-mass analysis on the dataset
To parametrise the CoM calculation, use the constructor CoMUDF.with_params().
This replicates the functionality of create_com_analysis() but in UDF form, making it compatible with live processing and more easily sub-classable.
New in version 0.12.0.
The result buffers of this UDF, all of type ‘nav’, are as follows:
- raw_com, coordinate (y, x)
The centre-of-mass in the pixel coordinate system of the frame, subject to the mask defined by the parameters, if any.
- raw_shifts, vector (dy, dx)
The raw shift in centre-of-mass in axis-aligned pixels, relative to the parameters cy and cx, either as-supplied or from the frame centre, by default.
- field, vector (2,)
The transformed shift in the centre-of-mass in pixels, subject to any corrections (scan rotation, detector flip)
- field_y, scalar
y-component of field, as given above.
- field_x, scalar
x-component of field, as given above.
- magnitude, scalar
The magnitude of the field buffer, equivalent to ||raw_shifts||_2
- divergence, scalar
The discrete divergence of the field buffer
- curl, scalar
The discrete curl of the field buffer
- regression, (3, 2) matrix
The regression parameters used to flatten the field: ((y, x), (dy/dy, dx/dy), (dy/dx, dx/dx))
Note
The implementation of the results 'divergence' and 'curl' differs slightly from COMAnalysis, in that here results for pixels outside of an ROI will always be NaN. The COMAnalysis implementation could provide non-NaN values even in ROI == False pixels due to the behaviour of np.gradient.
- Parameters:
com_params (CoMParams) – A CoMParams instance containing the parameters for this UDF. By default, a CoMParams instance is created which performs whole-frame CoM, giving shifts relative to the frame centre with no corrections.
Examples
>>> # CoM in a disk r=4. centred on (7.2, 6.8)
>>> udf = CoMUDF.with_params(cy=7.2, cx=6.8, r=4.)
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> result["raw_shifts"].data.shape
(16, 16, 2)
>>> # Whole-frame CoM corrected to have zero-mean shift
>>> udf = CoMUDF.with_params(regression=0)
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> result["raw_shifts"].data.shape
(16, 16, 2)
- classmethod with_params(*, cy: float | None = None, cx: float | None = None, r: float = inf, ri: float = 0.0, scan_rotation: float = 0.0, flip_y: bool = False, regression: Literal[-1, 0, 1] = RegressionOptions.NO_REGRESSION)[source]
Returns an instantiated CoMUDF with a given set of parameters
- Parameters:
cy (Optional[float], by default None) – Vertical-Centre of the mask applied to the frame, if any, and the reference point from which vertical CoM-shifts are calculated. If None, cy is set to the frame centre at runtime.
cx (Optional[float], by default None) – Horizontal-Centre of the mask applied to the frame, if any, and the reference point from which horizontal CoM-shifts are calculated. If None, cx is set to the frame centre at runtime.
r (float, by default float('inf')) – (Outer) Radius of the disk mask around cy/cx to restrict the CoM calculation. The default value is float('inf'), which is equivalent to performing whole-frame CoM with the given origin cy/cx.
ri (float, by default 0.) – (Inner) Radius of the ring mask around cy/cx to exclude from the CoM calculation. If left as 0., no inner disk is excluded and the CoM is calculated within a complete disk of radius r.
scan_rotation (float, by default 0.) – Scan rotation in degrees. The optics of an electron microscope can rotate the image. Furthermore, scan generators may allow scanning in arbitrary directions. This means that the x and y coordinates of the detector image are usually not parallel to the x and y scan coordinates. For interpretation of center of mass shifts, however, the shift vector in detector coordinates has to be put in relation to the position on the sample. The scan_rotation parameter can be used to rotate the detector coordinates to match the scan coordinate system. A positive value rotates the displacement vector clock-wise. That means if the detector seems rotated to the right relative to the scan, this value should be negative to counteract this rotation.
flip_y (bool, by default False) – Flip the Y coordinate. Some detectors, namely Quantum Detectors Merlin, may have pixel (0, 0) at the lower left corner. This has to be corrected to get the sign of the y shift as well as curl and divergence right.
regression (Union[np.ndarray, Literal[-1, 0, 1]], by default -1) – Regression to a background CoM field used to flatten the results.
Can be an ndarray with shape (3, 2) which specifies a correction to subtract in the form: [[y, x], [dy/dy, dx/dy], [dy/dx, dx/dx]].
Otherwise a member of the RegressionOptions enum or an integer in (-1, 0, 1). This specifies the regression order to calculate and apply to the results, with the following effects:
RegressionOptions.NO_REGRESSION or -1: No regression
RegressionOptions.SUBTRACT_MEAN or 0: Subtract the mean to compensate for center offset.
RegressionOptions.SUBTRACT_LINEAR or 1: Subtract the mean and a linear regression, to compensate for center offset and descan error.
The regression that was used is available in the 'regression' result buffer.
- Returns:
CoMUDF – the UDF with the provided parameters
Note
Using a regression correction calculated on the data itself may distort valid results, such as the effect of long-range fields. Furthermore, it may yield erroneous results due to aliasing on periodic structures or boundary effects. Subtracting a reference measurement or a regression on a reference measurement, instead of a regression on the data, may give more reliable results.
The CoMUDF has two helper classes used to parametrise it:
- namedtuple libertem.udf.com.CoMParams(cy: float | None = None, cx: float | None = None, r: float = inf, ri: float | None = 0.0, scan_rotation: float = 0.0, flip_y: bool = False, regression: Literal[-1, 0, 1] = RegressionOptions.NO_REGRESSION)[source]
Used to instantiate CoMUDF.
See with_params() for descriptions of each parameter. In most cases, use that method rather than instantiating this class directly.
- enum libertem.udf.com.RegressionOptions(value)[source]
Possible values of the regression functionality implemented in CoMUDF
- Member Type:
int
Valid values are as follows:
- NO_REGRESSION = <RegressionOptions.NO_REGRESSION: -1>
- SUBTRACT_MEAN = <RegressionOptions.SUBTRACT_MEAN: 0>
- SUBTRACT_LINEAR = <RegressionOptions.SUBTRACT_LINEAR: 1>
There is also a function available to guess sensible CoM parameters:
- libertem.udf.com.guess_corrections(y_centers: ndarray, x_centers: ndarray, roi: ndarray | tuple[slice, ...] | None = None) GuessResult [source]
Guess values for center offset ((cy, cx)), scan_rotation and flip_y from existing CoM data.
This function can generate a CoM parameter guess for atomic resolution 4D STEM data by using the following assumptions:
The field is purely electrostatic, i.e. the RMS curl should be minimized
There is no net field over the field of view and no descan error, i.e. the mean deflection is zero.
Atomic resolution STEM means that the divergence will be negative at atom columns and consequently the histogram of divergence will have a stronger tail towards negative values than towards positive values.
If any corrections were applied when generating the input data, please note that the new parameters should be applied relative to these previous values. In particular, the values for (cy, cx) returned by this function are the means of the inputs y_centers and x_centers. If these inputs are CoM-shifts relative to a given position, then the guessed correction will also be relative to this same position.
- Parameters:
y_centers (numpy.ndarray) – 2D arrays with y and x component of the center of mass for each scan position
x_centers (numpy.ndarray) – 2D arrays with y and x component of the center of mass for each scan position
roi (Optional[numpy.ndarray]) – Selector for values to consider in the statistics, compatible with indexing an array with the shape of y_centers and x_centers. By default, everything except the last row and last column are used since these contain artefacts.
Examples
>>> # Perform default, whole-frame CoM on a dataset
>>> udf = CoMUDF()
>>> result = ctx.run_udf(dataset=dataset, udf=udf)
>>> # Pass the `raw_com` result to this function
>>> y_centers = result['raw_com'].data[..., 0]
>>> x_centers = result['raw_com'].data[..., 1]
>>> guess_result = guess_corrections(y_centers, x_centers)
>>> # Create a new CoMUDF with the guessed parameters
>>> param_udf = CoMUDF.with_params(**guess_result._asdict())
>>> result_corrected = ctx.run_udf(dataset=dataset, udf=param_udf)
- Returns:
The computed corrections
- Return type:
GuessResult
- namedtuple libertem.udf.com.GuessResult(scan_rotation: int, flip_y: bool, cy: float, cx: float)[source]
Container for CoM parameters inferred from results, which can be passed to with_params() to create a new CoMUDF better-adapted to the data.
Return type of guess_corrections()
Apply masks
- class libertem.udf.masks.ApplyMasksUDF(mask_factories, use_torch=True, use_sparse=None, mask_count=None, mask_dtype=None, preferred_dtype=None, backends=None, shifts=None, **kwargs)[source]
Apply masks to signals/frames in the dataset. This can not only be used to integrate over regions with a binary mask - the integration can be weighted by using float or complex valued masks.
The result will be returned in a single sig-shaped buffer called intensity. Its shape will be (*nav_shape, len(masks)).
- Parameters:
mask_factories (Union[Callable[[], array_like], Iterable[Callable[[], array_like]]]) – Function or list of functions that take no arguments and create masks. The returned masks can be numpy arrays, scipy.sparse or sparse https://sparse.pydata.org/ matrices. The mask factories should not reference large objects because they can create significant overheads when they are pickled and unpickled. Each factory function should, when called, return a numpy array with the same shape as frames in the dataset (so dataset.shape.sig).
use_torch (bool, optional) – Use pytorch back-end if available. Default True
use_sparse (Union[None, False, True, 'scipy.sparse', 'scipy.sparse.csc', 'sparse.pydata'], optional) – Which sparse back-end to use.
* None (default): Where possible use sparse matrix multiplication if all factory functions return a sparse mask, otherwise convert all masks to dense matrices and use dense matrix multiplication
* True: Convert all masks to sparse matrices and use the default sparse back-end.
* False: Convert all masks to dense matrices.
* 'scipy.sparse': Use scipy.sparse.csr_matrix (default sparse)
* 'scipy.sparse.csc': Use scipy.sparse.csc_matrix
* 'sparse.pydata': Use sparse.pydata COO matrix
mask_count (int, optional) – Specify the number of masks if a single factory function is used so that the number of masks can be determined without calling the factory function.
mask_dtype (numpy.dtype, optional) – Specify the dtype of the masks so that mask dtype can be determined without calling the mask factory functions. This can be used to override the mask dtype in the result dtype determination. As an example, setting this to np.float32 means that masks of type float64 will not switch the calculation and result dtype to float64 or complex128.
preferred_dtype (numpy.dtype, optional) – Let get_preferred_input_dtype() return the specified type instead of the default float32. This can perform the calculation with integer types if both input data and mask data are compatible with this.
backends (Iterable containing strings "numpy" and/or "cupy", or None) – Control which back-ends are used. Default is numpy and cupy.
shifts (Union[Tuple[int, int], AuxBufferWrapper], optional) –
(Y/X)-shifts to apply to all masks before multiplying with each frame. Can be either a length-2 array-like for a constant (Y, X) shift, or an AuxBufferWrapper of (kind='nav', extra_shape=(2,), dtype=int) defining a per-frame shift to apply.
A positive y-shift moves the mask 'down' relative to the frame, while a positive x-shift moves the mask 'right' relative to the frame. Elements of the mask and frame which do not overlap after the shift are discarded. A shift resulting in no overlap at all will return a sum of 0. for that frame.
Note
Float shift values are cast to integers internally; round values before passing the shifts argument to better control the exact shifts used.
Note
The shifts parameter requires frame-by-frame processing to function, and so adds a performance penalty compared to unshifted mask application. If applying a constant shift, it may be worthwhile to manually create a new, pre-shifted mask rather than relying on this feature.
Shifting is also currently incompatible with scipy.sparse masks. If sparse processing is required, then where possible scipy.sparse masks are converted to sparse.pydata equivalents. A consequence of this is that sparse processing is not yet supported through CuPy when shifts are enabled, as sparse.pydata has no current CuPy implementation. If sparse masks are supplied on a CuPy backend when use_sparse=None (the default), they will be densified to allow the calculation to take place.
Examples
>>> dataset.shape
(16, 16, 32, 32)
>>> def my_masks():
...     return [np.ones((32, 32)), np.zeros((32, 32))]
>>> udf = ApplyMasksUDF(mask_factories=my_masks)
>>> res = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> res.data.shape
(16, 16, 2)
>>> np.allclose(res.data[..., 1], 0)  # same order as in the mask factory
True
Mask factories can also return all masks as a single array, stacked on the first axis:
>>> def my_masks_2():
...     masks = np.zeros((2, 32, 32))
...     masks[1, ...] = 1
...     return masks
>>> udf = ApplyMasksUDF(mask_factories=my_masks_2)
>>> res_2 = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
>>> np.allclose(res_2.data, res.data)
True
Masks can be shifted relative to the data using the shifts parameter. This can either be a constant shift for all frames:
>>> udf = ApplyMasksUDF(mask_factories=my_masks, shifts=(2, -5))
>>> res_shift_constant = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
or a per-frame shift supplied using an AuxBufferWrapper created using aux_data():
>>> shifts = np.random.randint(-8, 8, size=(16, 16, 2)).ravel()
>>> udf = ApplyMasksUDF(
...     mask_factories=my_masks,
...     shifts=ApplyMasksUDF.aux_data(
...         shifts,
...         kind='nav',
...         extra_shape=(2,),
...         dtype=shifts.dtype,
...     )
... )
>>> res_shift_aux = ctx.run_udf(dataset=dataset, udf=udf)['intensity']
New in version 0.4.0.
Changed in version 0.13.0: Added the shifts parameter
Load data
- class libertem.udf.raw.PickUDF[source]
Load raw data from ROI
This UDF is meant for frame picking with a very small ROI, usually a single frame.
New in version 0.4.0.
Examples
>>> udf = PickUDF()
>>> roi = np.zeros(dataset.shape.nav, dtype=bool)
>>> roi[0] = True
>>> result = ctx.run_udf(dataset=dataset, udf=udf, roi=roi)
>>> result["intensity"].raw_data[0].shape
(32, 32)
Save data
NoOp
- class libertem.udf.base.NoOpUDF(preferred_input_dtype=<class 'bool'>)[source]
A UDF that does nothing and returns nothing.
This is useful for testing.
- Parameters:
preferred_input_dtype (numpy.dtype) – Perform dtype conversion. By default, this is UDF.USE_NATIVE_DTYPE.