LiberTEM UDFs
[1]:
%matplotlib inline
[2]:
import os
import matplotlib.pyplot as plt
import libertem.api as lt
import numpy as np
[3]:
ctx = lt.Context()
Specifying the dataset
Most formats can be loaded using the "auto"
type, but some may need additional parameters.
See the loading data section of the LiberTEM docs for details.
[4]:
data_base_path = os.environ.get("TESTDATA_BASE_PATH", "/home/alex/Data/")
[5]:
ds = ctx.load("auto", path=os.path.join(data_base_path, "01_ms1_3p3gK.hdr"))
After loading, some information is available in the diagnostics
attribute:
[6]:
ds.diagnostics
[6]:
[{'name': 'Bits per pixel', 'value': '12'},
{'name': 'Data kind', 'value': 'u'},
{'name': 'Layout', 'value': '(1, 1)'},
{'name': 'Partition shape', 'value': '(2075, 256, 256)'},
{'name': 'Number of partitions', 'value': '32'},
{'name': 'Number of frames skipped at the beginning', 'value': 0},
{'name': 'Number of frames ignored at the end', 'value': 0},
{'name': 'Number of blank frames inserted at the beginning', 'value': 0},
{'name': 'Number of blank frames inserted at the end', 'value': 0}]
Standard analyses: virtual detector
A standard analysis to run on 4D STEM data is to apply a virtual detector. Here, we define a ring detector, with radii in pixels:
[7]:
ring = ctx.create_ring_analysis(dataset=ds, ri=60, ro=70)
The analysis can be run with the Context.run
method:
[8]:
ring_res = ctx.run(ring, progress=True)
ring_res
[8]:
[<AnalysisResult: intensity>, <AnalysisResult: intensity_log>]
As the analysis mirrors what the web GUI does, we have to access the data using the raw_data
attribute, as we would get a viusalized result otherwise. Here we do the visualization ourselves using matplotlib
:
[9]:
plt.figure()
plt.imshow(ring_res.intensity.raw_data)
[9]:
<matplotlib.image.AxesImage at 0x7f4a2ecc57b0>
Simple UDF definition
User-defined funtions provide a way for you to implement your own data processing functionality. As a very simple example, we define a function that just sums up the pixels of each frame:
[10]:
def sum_of_pixels(frame):
return np.sum(frame)
The easiest way to run this on the data is to use the Context.map
function:
[11]:
res_pixelsum_1 = ctx.map(dataset=ds, f=sum_of_pixels, progress=True)
res_pixelsum_1
[11]:
<BufferWrapper kind=nav dtype=float32 extra_shape=()>
The result is of type BufferWrapper
, but can be used by any function that expects a numpy array, for example for plotting it:
[12]:
plt.figure()
plt.imshow(res_pixelsum_1)
[12]:
<matplotlib.image.AxesImage at 0x7f4a1ff597b0>
The Context.map
function is a shortcut for implementing very easy mapping over data, in a frame-by-frame fashion. The longer way of writing this would be as follows:
[13]:
from libertem.udf import UDF
class SumOfPixels(UDF):
def get_result_buffers(self):
return {
'sum_of_pixels': self.buffer(kind='nav', dtype='float32')
}
def process_frame(self, frame):
self.results.sum_of_pixels[:] = np.sum(frame)
This can now be run using the Context.run_udf
method:
[14]:
res_pixelsum_2 = ctx.run_udf(dataset=ds, udf=SumOfPixels(), progress=True)
res_pixelsum_2
[14]:
{'sum_of_pixels': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}
The result is now a dict
, which maps buffer names, as defined in get_result_buffers
, to the BufferWrapper
result, so we can use the following to plot the results:
[15]:
plt.figure()
plt.imshow(res_pixelsum_2['sum_of_pixels'])
[15]:
<matplotlib.image.AxesImage at 0x7f4a8c7ab070>
extra_shape
: more than one result per scan position
[16]:
class StatsUDF(UDF):
def get_result_buffers(self):
return {
'all_stats': self.buffer(kind='nav', dtype='float32', extra_shape=(4,)),
}
def process_frame(self, frame):
self.results.all_stats[:] = (np.mean(frame), np.min(frame), np.max(frame), np.std(frame))
[17]:
res_stats = ctx.run_udf(dataset=ds, udf=StatsUDF(), progress=True)
Result now has an extra dimension, as specified by extra_shape
above:
[18]:
res_stats['all_stats'].data.shape
[18]:
(186, 357, 4)
Let’s plot the stddev
of each frame:
[19]:
plt.figure()
plt.imshow(res_stats['all_stats'].data[..., 3])
[19]:
<matplotlib.image.AxesImage at 0x7f4a1e4fca00>
kind=”sig” buffers, merge functions
Previously: one result for each scan position
Now: result buffer shaped like the diffraction patterns
We need a merge function to merge the result of one partition into the final result
Different buffer kinds can be combined in a single UDF, so you can combine different operations in a single pass over the data
[20]:
class MaxFrameUDF(UDF):
def get_result_buffers(self):
return {
'maxframe': self.buffer(kind='sig', dtype='float32')
}
def process_frame(self, frame):
# element-wise maximum:
self.results.maxframe[:] = np.maximum(self.results.maxframe, frame)
def merge(self, dest, src):
# src: the maximum observed in the current partition
# dest: the maximum observed in all partitions that were already merged together
dest.maxframe[:] = np.maximum(dest.maxframe, src.maxframe)
[21]:
res_max = ctx.run_udf(dataset=ds, udf=MaxFrameUDF(), progress=True)
[22]:
plt.figure()
plt.imshow(np.log1p(res_max['maxframe']))
[22]:
<matplotlib.image.AxesImage at 0x7f4a2ed1a710>
Region of interest
work on a subset of the navigation axes
can be used with all UDFs
useful for working selectively on data, or just reducing the I/O and computational load when implementing a new UDF
defined as a binary mask
Let’s create a mask based on the previously calculated pixel-sum:
[23]:
from skimage.morphology import opening, closing
[24]:
np.min(res_pixelsum_1), np.max(res_pixelsum_1), np.mean(res_pixelsum_1)
[24]:
(7762.0, 12367.0, 10277.385)
[25]:
mask = res_pixelsum_1.data < np.mean(res_pixelsum_1)
# mask = opening(mask)
mask = closing(opening(mask))
plt.figure()
plt.imshow(mask.astype("float32"))
plt.colorbar()
[25]:
<matplotlib.colorbar.Colorbar at 0x7f4a1d36f100>
[26]:
res_roi = ctx.run_udf(dataset=ds, udf=StatsUDF(), roi=mask, progress=True)
[27]:
plt.figure()
plt.imshow(np.log1p(res_roi['all_stats'].data[..., 2]))
[27]:
<matplotlib.image.AxesImage at 0x7f4a1cf91c90>
Results with ROI applied
One has to take some care when handling results where a roi
was applied; just using the result as a numpy array or accessing the data
attribute will give you a result that keeps the whole dataset shape, where the deselected parts are filled with NaN
values:
[28]:
res_roi['all_stats'].data.shape
[28]:
(186, 357, 4)
There is a second attribute, raw_data
, which will give you a flattenned array of just the results, like numpy would give you for fancy indexing:
[29]:
res_roi['all_stats'].raw_data.shape
[29]:
(45358, 4)
Parameters
Keyword arguments passed to the UDF are made available in self.params
on the UDF. A good convention to document your parameters is to put a docstring into your __init__
method and just pass on the values to super().__init__
. You can also validate the arguments here.
[30]:
class PixelPicker(UDF):
def __init__(self, coords, *args, **kwargs):
"""
Parameters
----------
coords : Tuple[int]
The coordinates to look at in each frame
"""
if len(coords) != 2:
raise ValueError("invalid coordinates")
super().__init__(*args, coords=coords, **kwargs)
def get_result_buffers(self):
return {
'value_of_pixel': self.buffer(kind='nav', dtype=np.float32)
}
def process_frame(self, frame):
self.results.value_of_pixel[:] = frame[self.params.coords]
[31]:
res_pixel_picker = ctx.run_udf(dataset=ds, udf=PixelPicker(coords=(128, 128)), progress=True)
[32]:
plt.figure()
plt.imshow(res_pixel_picker['value_of_pixel'])
[32]:
<matplotlib.image.AxesImage at 0x7f4a1cdf1ed0>
[ ]: