Data Set API

This API allows you to load and handle data efficiently on a distributed system. Note that you should not use most dataset methods directly, but rather the higher-level tools available, for example user-defined functions.

See our documentation on loading data for a high-level introduction.

Formats

Merlin Medipix (MIB)

class libertem.io.dataset.mib.MIBDataSet(path, tileshape=None, scan_size=None, disable_glob=False, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

MIB data sets consist of one or more .mib files, and optionally a .hdr file. The HDR file is used to automatically set the nav_shape parameter from the fields “Frames per Trigger” and “Frames in Acquisition.” When loading a MIB data set, you can either specify the path to the HDR file, or choose one of the MIB files. The MIB files are assumed to follow a naming pattern of a non-numerical prefix followed by a sequential numerical suffix.

Note that in a per-pixel trigger setup, LiberTEM won’t be able to deduce the x scanning dimension; in that case, you will need to specify the nav_shape yourself, as shown in the examples below.

Examples

>>> # both examples look for files matching /path/to/default*.mib:
>>> ds1 = ctx.load("mib", path="/path/to/default.hdr")  
>>> ds2 = ctx.load("mib", path="/path/to/default64.mib")  
Parameters
  • path (str) – Path to either the .hdr file or one of the .mib files

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Common case: (height, width), but it can have any dimensionality

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

Raw binary files

class libertem.io.dataset.raw.RawFileDataSet(path, dtype, scan_size=None, detector_size=None, enable_direct=False, detector_size_raw=None, crop_detector_to=None, tileshape=None, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read data from a single file of raw binary data. This reader assumes the following format:

  • only raw data (no file header)

  • frames are stored in C-order without additional frame headers

  • dtype supported by numpy

Examples

>>> ds = ctx.load("raw", path=path_to_raw, nav_shape=(16, 16), sig_shape=(128, 128),
...               sync_offset=0, dtype="float32",)
Parameters
  • path (str) – Path to the file

  • nav_shape (tuple of int) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int) – Common case: (height, width), but it can have any dimensionality

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

  • dtype (numpy dtype) – The dtype of the data as it is on disk. Can contain endian indicator, for example >u2 for big-endian 16bit data.
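
If the data on disk is big-endian, pass the endian indicator as part of the dtype. A minimal sketch, with hypothetical path and shapes:

>>> ds = ctx.load("raw", path=path_to_raw, nav_shape=(16, 16),
...               sig_shape=(128, 128), dtype=">u2")  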

Digital Micrograph (DM3, DM4) files

class libertem.io.dataset.dm.DMDataSet(files=None, scan_size=None, same_offset=False, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Reader for stacks of DM3/DM4 files.

Note

This DataSet is not supported in the GUI yet, as the file dialog needs to be updated to properly handle opening series.

Note

Single-file 4D DM files are not yet supported. The use-case would be to read DM4 files from the conversion of K2 STEMx data, but those data sets are actually transposed (nav/sig are swapped).

That means the data would have to be transposed back into the usual shape, which is slow, or algorithms would have to be adapted to work directly on transposed data. As an example, applying a mask in the conventional layout corresponds to calculating a weighted sum frame along the navigation dimension in the transposed layout.

Since the transposed layout corresponds to a TEM tilt series, support for transposed 4D STEM data could have more general applications beyond supporting 4D DM4 files. Please contact us if you have a use-case for single-file 4D DM files or other applications that process stacks of TEM files, and we may add support!

Note

You can use the PyPI package natsort to sort the filenames by their numerical components; this is especially useful for filenames without leading zeros.
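
Examples

A sketch of loading a sorted series of files; it assumes the natsort package is installed, and the glob pattern is hypothetical:

>>> import glob
>>> from natsort import natsorted
>>> files = natsorted(glob.glob("/path/to/series_*.dm4"))  
>>> ds = ctx.load("dm", files=files)  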

Parameters
  • files (List[str]) – List of paths to the files that should be loaded. The order is important, as it determines the order in the navigation axis.

  • nav_shape (Tuple[int] or None) – By default, the files are loaded as a 3D stack. You can change this by specifying the nav_shape, which reshapes the navigation dimensions. Raises a DataSetException if the shape is incompatible with the data that is loaded.

  • sig_shape (Tuple[int], optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

  • same_offset (bool) – When reading a stack of dm3/dm4 files, it can be expensive to read in the metadata from all files, which we currently only use for getting the offsets and sizes of the main data in each file. If you know for certain that the offsets and sizes are the same for all files, you can set this parameter and we will only read the metadata of the first file.

EMPAD

class libertem.io.dataset.empad.EMPADDataSet(path, scan_size=None, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read data from the EMPAD detector. EMPAD data sets consist of two files, one .raw and one .xml file. Note that the .xml file contains the file name of the .raw file, so if the raw file was renamed at some point, opening via the .xml file will fail.
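
Examples

A sketch with a hypothetical path; loading via the .xml file lets nav_shape be detected automatically:

>>> ds = ctx.load("empad", path="/path/to/file.xml")  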

Parameters
  • path (str) – Path to either the .xml or the .raw file. If the .xml file is given, the nav_shape parameter can be left out

  • nav_shape (tuple of int, optional) – A tuple (y, x) that specifies the size of the scanned region. It is automatically read from the .xml file if you specify one as path.

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

K2IS

class libertem.io.dataset.k2is.K2ISDataSet(path, nav_shape=None, sig_shape=None, sync_offset=None, io_backend=None)[source]

Read raw K2IS data sets. They consist of 8 .bin files and one .gtg file. Currently, data acquired using the STEMx unit is supported; metadata about the nav_shape is read from the .gtg file.
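
Examples

A sketch with a hypothetical path; any of the .bin files or the .gtg file can be passed:

>>> ds = ctx.load("k2is", path="/path/to/file.gtg")  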

Parameters
  • path (str) – Path to one of the files of the data set (either one of the .bin files or the .gtg file)

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

FRMS6

class libertem.io.dataset.frms6.FRMS6DataSet(path, enable_offset_correction=True, gain_map_path=None, dest_dtype=None, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read PNDetector FRMS6 files. FRMS6 data sets consist of multiple .frms6 files and a .hdr file. The first .frms6 file (matching *_000.frms6) contains dark frames, which are subtracted if enable_offset_correction is True.
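
Examples

A sketch with a hypothetical path:

>>> ds = ctx.load("frms6", path="/path/to/file.hdr")  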

Parameters
  • path (string) – Path to one of the files of the FRMS6 dataset (either .hdr or .frms6)

  • enable_offset_correction (boolean) – Subtract dark frames when reading data

  • gain_map_path (string) – Path to a gain map to apply (.mat format)

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

BLO

class libertem.io.dataset.blo.BloDataSet(path, tileshape=None, endianess='<', nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read Nanomegas .blo files.

Examples

>>> ds = ctx.load("blo", path="/path/to/file.blo")  
Parameters
  • path (str) – Path to the file

  • endianess (str) – either ‘<’ or ‘>’ for little-endian or big-endian data

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

SER

class libertem.io.dataset.ser.SERDataSet(path, emipath=None, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read TIA SER files.

Examples

>>> ds = ctx.load("ser", path="/path/to/file.ser")  
Parameters
  • path (str) – Path to the .ser file

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

HDF5

class libertem.io.dataset.hdf5.H5DataSet(path, ds_path=None, tileshape=None, target_size=None, min_num_partitions=None, sig_dims=2, io_backend=None)[source]

Read data from an HDF5 data set.

Examples

>>> ds = ctx.load("hdf5", path=path_to_hdf5, ds_path="/data")
Parameters
  • path (str) – Path to the file

  • ds_path (str) – Path to the HDF5 data set inside the file

  • sig_dims (int) – Number of dimensions that should be considered part of the signal (for example 2 when dealing with 2D image data)

  • target_size (int) – Target partition size, in bytes. Usually doesn’t need to be changed.

  • min_num_partitions (int) – Minimum number of partitions, set to number of cores if not specified. Usually doesn’t need to be specified.

Norpix SEQ

class libertem.io.dataset.seq.SEQDataSet(path: str, scan_size: Optional[Tuple[int]] = None, nav_shape: Optional[Tuple[int]] = None, sig_shape: Optional[Tuple[int]] = None, sync_offset: int = 0, io_backend=None)[source]

Read data from Norpix SEQ files.

Examples

>>> ds = ctx.load("seq", path="/path/to/file.seq", nav_shape=(1024, 1024))  
Parameters
  • path – Path to the .seq file

  • nav_shape (tuple of int) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

MRC

class libertem.io.dataset.mrc.MRCDataSet(path, nav_shape=None, sig_shape=None, sync_offset=0, io_backend=None)[source]

Read MRC files.

Examples

>>> ds = ctx.load("mrc", path="/path/to/file.mrc")  
Parameters
  • path (str) – Path to the .mrc file

  • nav_shape (tuple of int, optional) – An n-tuple that specifies the size of the navigation region, commonly (y, x), but it can also be of length 1 (for example for a line scan) or of length 3 (for example for a data cube).

  • sig_shape (tuple of int, optional) – Signal/detector size (height, width)

  • sync_offset (int, optional) – If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

Memory data set

class libertem.io.dataset.memory.MemoryDataSet(tileshape=None, num_partitions=None, data=None, sig_dims=2, check_cast=True, tiledelay=None, datashape=None, base_shape=None, force_need_decode=False, io_backend=None, nav_shape=None, sig_shape=None, sync_offset=0)[source]

This dataset is constructed from a NumPy array in memory for testing purposes. It is not recommended for production use since it performs poorly with a distributed executor.

Examples

>>> from libertem.io.dataset.memory import MemoryDataSet
>>>
>>> data = np.zeros((2, 2, 128, 128))
>>> ds = MemoryDataSet(data=data)

Internal DataSet API

class libertem.io.dataset.base.BasePartition(meta: libertem.io.dataset.base.meta.DataSetMeta, partition_slice: libertem.common.slice.Slice, fileset: libertem.io.dataset.base.fileset.FileSet, start_frame: int, num_frames: int, io_backend: libertem.io.dataset.base.backend.IOBackend)[source]

Base class with default implementations

Parameters
  • meta – The DataSet’s DataSetMeta instance

  • partition_slice – The partition slice in non-flattened form

  • fileset – The files that are part of this partition (the FileSet may also contain files from the dataset which are not part of this partition, but that may harm performance)

  • start_frame – The index of the first frame of this partition (global coords)

  • num_frames – How many frames this partition should contain

  • io_backend – The I/O backend to use for accessing this partition

adjust_tileshape(tileshape, roi)[source]

Final veto of the Partition in the tileshape negotiation process; make sure that corrections are taken into account!

get_base_shape(roi)[source]
get_io_backend()[source]
get_locations()[source]
get_macrotile(dest_dtype='float32', roi=None)[source]

Return a single tile for the entire partition.

This is useful to support process_partition() in UDFs and to construct dask arrays from datasets.

get_max_io_size()[source]

Override this method to implement a custom maximum I/O size

get_tiles(tiling_scheme, dest_dtype='float32', roi=None)[source]

Return a generator over all DataTiles contained in this Partition.

Note

The DataSet may reuse the internal buffer of a tile, so you should directly process the tile and not accumulate a number of tiles and then work on them.

Parameters
  • tiling_scheme – According to this scheme the data will be tiled

  • dest_dtype (numpy dtype) – convert data to this dtype when reading

  • roi (numpy.ndarray) – Boolean array that matches the dataset navigation shape, to limit the region to work on. With a ROI, we yield tiles from a “compressed” navigation axis, relative to the beginning of the dataset. Compressed means that only frames with a True value in the ROI are considered, and the resulting tile slices are from a coordinate system that has the shape (np.count_nonzero(roi),). A sketch of constructing such a mask follows below.
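
A minimal sketch of constructing an ROI mask, assuming a dataset with a navigation shape of (16, 16):

>>> import numpy as np
>>> roi = np.zeros((16, 16), dtype=bool)
>>> roi[0, :] = True  # only process the first scan row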

need_decode(read_dtype, roi, corrections)[source]
set_corrections(corrections: libertem.corrections.corrset.CorrectionSet)[source]
class libertem.io.dataset.base.BufferedBackend(max_buffer_size=16777216)[source]

I/O backend using a buffered reading strategy. Useful for slower media like HDDs, where seeks cause performance drops. Used by default on Windows.

This does not perform optimally on SSDs under all circumstances; for better best-case performance, try using MMapBackend instead.

Parameters

max_buffer_size (int) – Maximum buffer size, in bytes. This is passed to the tileshape negotiation to select the right depth.
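
As a sketch, the backend can be selected via the io_backend parameter of any data set; path and shapes here are hypothetical:

>>> from libertem.io.dataset.base import BufferedBackend
>>> ds = ctx.load("raw", path=path_to_raw, nav_shape=(16, 16),
...               sig_shape=(128, 128), dtype="float32",
...               io_backend=BufferedBackend())  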

classmethod from_json(msg)[source]

Construct an instance from the already-decoded msg.

get_impl()[source]
class libertem.io.dataset.base.DataSet(io_backend=None)[source]
check_valid()[source]

Check the validity of the DataSet. This will be executed (after initialize) on a worker node. Should raise DataSetException in case of errors, and return True otherwise.

classmethod detect_params(path, executor)[source]

Guess if path can be opened using this DataSet implementation and detect parameters.

Returns a dict of detected parameters if path matches this dataset type, or False if path is most likely not of a matching type.

property diagnostics

Diagnostics common for all DataSet implementations

property dtype

the destination data type

get_cache_key()[source]
get_correction_data()[source]

Correction parameters that are part of this DataSet. This should only be called after the DataSet is initialized.

Returns

correction parameters that are part of this DataSet

Return type

CorrectionSet

get_diagnostics()[source]

Get relevant diagnostics for this dataset, as a list of dicts with keys name and value, where value may be a string or a list of dicts itself. Subclasses should override this method.

get_io_backend()[source]
classmethod get_msg_converter() → Type[libertem.web.messageconverter.MessageConverter][source]
get_num_partitions()[source]

Returns the number of partitions the dataset should be split into

get_partitions()[source]

Return a generator over all Partitions in this DataSet. Should only be called on the master node.

get_slices()[source]

Return the partition slices for the dataset

classmethod get_supported_extensions() → Set[str][source]

Return supported extensions as a set of strings.

Plain extensions only, no pattern!

get_sync_offset_info()[source]

Check the specified sync_offset and return the number of frames skipped and inserted.

initialize(executor)[source]

Perform possibly expensive initialization, like pre-loading metadata.

This is run on the master node, but can execute parts on workers, for example if they need to access the data stored on worker nodes, using the passed executor instance.

If you need the executor around for later operations, for example when creating the partitioning, save a reference here!

Should return the possibly modified DataSet instance (if a method running on a worker is changing self, these changes won’t automatically be transferred back to the master node).

partition_shape(dtype, target_size, min_num_partitions=None)[source]

Calculate partition shape for the given target_size

Parameters
  • dtype (numpy.dtype or str) – data type of the dataset

  • target_size (int) – target size in bytes - how large should each partition be?

  • min_num_partitions (int) – minimum number of partitions desired. Defaults to the number of workers in the cluster.

Returns

the shape calculated from the given parameters

Return type

Tuple[int]

property raw_dtype

the underlying data type

set_num_cores(cores)[source]
property shape

The shape of the DataSet, as it makes sense for the application domain (for example, 4D for pixelated STEM)

exception libertem.io.dataset.base.DataSetException[source]
class libertem.io.dataset.base.DataSetMeta(shape: libertem.common.shape.Shape, image_count=0, raw_dtype=None, dtype=None, metadata=None, sync_offset=0)[source]
shape

“native” dataset shape, can have any dimensionality

raw_dtype: np.dtype

dtype used internally in the data set for reading

dtype: np.dtype

Best-fitting output dtype. This can be different from raw_dtype, for example if there are post-processing steps done as part of reading, which need a different dtype. Assumed equal to raw_dtype if not given.

sync_offset: int, optional

If positive, number of frames to skip from the start; if negative, number of blank frames to insert at the start

image_count

Total number of frames in the dataset

metadata

Any metadata offered by the DataSet, not specified yet

class libertem.io.dataset.base.DataTile(input_array, tile_slice, scheme_idx)[source]
property flat_data

Flatten the data.

The result is a 2D array where each row contains pixel data from a single frame. It is just a reshape, so it is a view into the original data.
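
A sketch, assuming a hypothetical DataTile tile covering 4 frames of 32x32 pixels:

>>> tile.flat_data.shape  
(4, 1024)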

reshape(shape, order='C')[source]

Returns an array containing the same data with a new shape.

Refer to numpy.reshape for full documentation.

See also

numpy.reshape

equivalent function

Notes

Unlike the free function numpy.reshape, this method on ndarray allows the elements of the shape parameter to be passed in as separate arguments. For example, a.reshape(10, 11) is equivalent to a.reshape((10, 11)).

class libertem.io.dataset.base.Decoder[source]
do_clear()[source]
get_decode(native_dtype, read_dtype)[source]
get_native_dtype(inp_native_dtype, read_dtype)[source]
class libertem.io.dataset.base.DtypeConversionDecoder[source]
get_decode(native_dtype, read_dtype)[source]
get_native_dtype(inp_native_dtype, read_dtype)[source]
class libertem.io.dataset.base.File(path, start_idx, end_idx, native_dtype, sig_shape, frame_footer=0, frame_header=0, file_header=0)[source]
Parameters
  • file_header (int) – Number of bytes to ignore at the beginning of the file

  • frame_header (int) – Number of bytes to ignore before each frame

  • frame_footer (int) – Number of bytes to ignore after each frame

close()[source]
property end_idx
property file_header_bytes
fileno()[source]
property native_dtype
property num_frames
open()[source]
property sig_shape
property start_idx
class libertem.io.dataset.base.FileSet(files: List[libertem.io.dataset.base.file.File], frame_header_bytes: int = 0, frame_footer_bytes: int = 0)[source]
Parameters

files – files that are part of a partition or dataset

files_from(start)[source]
get_as_arr()[source]
get_for_range(start, stop)[source]

Return a new FileSet filtered for files having frames in the [start, stop) range.

get_read_ranges(start_at_frame: int, stop_before_frame: int, dtype, tiling_scheme: libertem.io.dataset.base.tiling.TilingScheme, sync_offset: int = 0, roi: Optional[numpy.ndarray] = None)[source]
class libertem.io.dataset.base.FileTree(low: int, high: int, value: Any, idx: int, left: Union[None, libertem.io.dataset.base.utils.FileTree], right: Union[None, libertem.io.dataset.base.utils.FileTree])[source]

Construct a FileTree node

Parameters
  • low – First frame contained in this file

  • high – First index of the next file

  • value – The corresponding file object

  • idx – The index of the file object in the fileset

  • left – Nodes with a lower low

  • right – Nodes with a higher low

classmethod make(files)[source]

Build a balanced binary tree by bisecting the files list.

search_start(value)[source]

Search for a node that has start_idx <= value and end_idx > value.

to_string(depth=0)[source]
class libertem.io.dataset.base.IOBackend[source]
classmethod from_json(msg)[source]

Construct an instance from the already-decoded msg.

get_impl() → libertem.io.dataset.base.backend.IOBackendImpl[source]
registry = {'buffered': <class 'libertem.io.dataset.base.backend_buffered.BufferedBackend'>, 'mmap': <class 'libertem.io.dataset.base.backend_mmap.MMapBackend'>}
class libertem.io.dataset.base.LocalFile(path, start_idx, end_idx, native_dtype, sig_shape, frame_footer=0, frame_header=0, file_header=0)[source]
close()[source]
fileno()[source]
mmap()[source]

Memory map for this file, with file header, frame header and frame footer cut off

Used for reading tiles straight from the filesystem cache

open()[source]
raw_mmap()[source]

Memory map for this file, with only the file header cut off

Used for reading tiles with a decoding step, using the read ranges

readinto(out)[source]

Fill out by reading from the current file position

seek(pos)[source]
tell()[source]
class libertem.io.dataset.base.MMapBackend(enable_readahead_hints=False)[source]

I/O backend using memory mapped files. Used by default on non-Windows systems.

Parameters

enable_readahead_hints (bool) – Linux only. Try to influence readahead behavior (experimental).

classmethod from_json(msg)[source]

Construct an instance from the already-decoded msg.

get_impl()[source]
class libertem.io.dataset.base.Negotiator[source]

Tile shape negotiator. The main functionality is in get_scheme, which, given a UDF, partition, and read_dtype, will generate a TilingScheme that is compatible with both the UDF and the DataSet, possibly even optimal.

get_scheme(udfs, partition, read_dtype: numpy.dtype, roi: numpy.ndarray, corrections: Optional[libertem.corrections.corrset.CorrectionSet] = None)[source]

Generate a TilingScheme instance that is compatible with both the given udf and the DataSet.

Parameters
  • udfs (List[UDF]) – The concrete UDFs to optimize the tiling scheme for, taking into account the method used (tile, frame, partition) and the preferred total input size and depth.

  • partition (Partition) – The TilingScheme is created specifically for the given Partition, so it can adjust even in the face of different partition sizes/shapes.

  • read_dtype – The dtype in which the data will be fed into the UDF

  • roi (np.ndarray) – Region of interest

  • corrections (CorrectionSet) – Correction set to consider in negotiation

validate(shape, partition, size, io_max_size, itemsize, base_shape, corrections)[source]
class libertem.io.dataset.base.Partition(meta: libertem.io.dataset.base.meta.DataSetMeta, partition_slice: libertem.common.slice.Slice, io_backend: libertem.io.dataset.base.backend.IOBackend)[source]
Parameters
  • meta – The DataSet’s DataSetMeta instance

  • partition_slice – The partition slice in non-flattened form

  • fileset – The files that are part of this partition (the FileSet may also contain files from the dataset which are not part of this partition, but that may harm performance)

  • io_backend – The I/O backend to use for accessing this partition

adjust_tileshape(tileshape, roi)[source]

Final veto of the Partition in the tileshape negotiation process; make sure that corrections are taken into account!

property dtype
get_base_shape(roi)[source]
get_io_backend()[source]
get_locations()[source]
get_macrotile(dest_dtype='float32', roi=None)[source]
get_max_io_size()[source]

Override this method to implement a custom maximum I/O size

get_min_sig_size()[source]

minimum signal size, in number of elements

get_tiles(tiling_scheme, dest_dtype='float32', roi=None)[source]
classmethod make_slices(shape, num_partitions, sync_offset=0)[source]

Partition a 3D dataset (“list of frames”) along the first axis, yielding the partition slice and additionally start and stop frame indices for each partition.

need_decode(read_dtype, roi, corrections)[source]
set_corrections(corrections: libertem.corrections.corrset.CorrectionSet)[source]
set_io_backend(backend)[source]
property shape

the shape of the partition; dimensionality depends on format

validate_tiling_scheme(tiling_scheme)[source]
class libertem.io.dataset.base.PartitionStructure(shape, slices, dtype)[source]

Structure of the dataset.

Assumed to be contiguous on the flattened navigation axis.

Parameters
  • slices (List[Tuple[Int]]) – List of tuples [start_idx, end_idx) that partition the data set by the flattened navigation axis

  • shape (Shape) – shape of the whole dataset

  • dtype (numpy dtype) – The dtype of the data as it is on disk. Can contain endian indicator, for example >u2 for big-endian 16bit data.

SCHEMA = {'$id': 'http://libertem.org/PartitionStructure.schema.json', '$schema': 'http://json-schema.org/draft-07/schema#', 'properties': {'dtype': {'type': 'string'}, 'shape': {'items': {'minimum': 1, 'type': 'number'}, 'minItems': 2, 'type': 'array'}, 'sig_dims': {'type': 'number'}, 'slices': {'items': {'items': {'maxItems': 2, 'minItems': 2, 'type': 'number'}, 'type': 'array'}, 'minItems': 1, 'type': 'array'}, 'version': {'const': 1}}, 'required': ['version', 'slices', 'shape', 'sig_dims', 'dtype'], 'title': 'PartitionStructure', 'type': 'object'}
classmethod from_ds(ds)[source]
classmethod from_json(data)[source]
serialize()[source]
class libertem.io.dataset.base.TilingScheme(slices: List[libertem.common.slice.Slice], tileshape: libertem.common.shape.Shape, dataset_shape: libertem.common.shape.Shape, debug=None)[source]
property dataset_shape
property depth
classmethod make_for_shape(tileshape: libertem.common.shape.Shape, dataset_shape: libertem.common.shape.Shape, debug=None)[source]

Make a TilingScheme from tileshape and dataset_shape.

Note that there are border effects in both the signal and navigation directions, i.e. if the depth doesn’t evenly divide the number of frames in the partition (simplified, a ROI also applies…), or if the signal dimensions of tileshape don’t evenly divide the signal dimensions of the dataset_shape. A usage sketch follows the parameter list below.

Parameters
  • tileshape – Uniform shape of all tiles. Should have flat navigation axis (meaning tileshape.nav.dims == 1) and be contiguous in signal dimensions.

  • dataset_shape – Shape of the whole data set. Only the signal part is used.
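
A minimal sketch of constructing a scheme, assuming a 4D dataset of 16x16 scan positions with 32x32 detector frames and a tile depth of 7:

>>> from libertem.common import Shape
>>> from libertem.io.dataset.base import TilingScheme
>>> dataset_shape = Shape((16, 16, 32, 32), sig_dims=2)
>>> tileshape = Shape((7, 32, 32), sig_dims=2)
>>> scheme = TilingScheme.make_for_shape(tileshape, dataset_shape)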

property shape

The tileshape. Note that some border tiles can be smaller!

property slices

signal-only slices for all possible positions

property slices_array

Returns the slices from the scheme as a numpy ndarray a of shape (n, 2, sig_dims), where a[i, 0] is the origin of slice i and a[i, 1] is the shape of slice i. A usage sketch follows below.
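
A sketch of separating origins and shapes, assuming a hypothetical TilingScheme instance scheme:

>>> a = scheme.slices_array  
>>> origins, shapes = a[:, 0], a[:, 1]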

class libertem.io.dataset.base.WritableDataSet[source]
class libertem.io.dataset.base.WritablePartition[source]
delete()[source]
get_write_handle()[source]
libertem.io.dataset.base.decode_swap_2(inp, out, idx, native_dtype, rr, origin, shape, ds_shape)[source]
libertem.io.dataset.base.decode_swap_4(inp, out, idx, native_dtype, rr, origin, shape, ds_shape)[source]
libertem.io.dataset.base.default_get_read_ranges(start_at_frame, stop_before_frame, roi, depth, slices_arr, fileset_arr, sig_shape, bpp, sync_offset=0, extra=None, frame_header_bytes=0, frame_footer_bytes=0)
libertem.io.dataset.base.get_coordinates(slice_, ds_shape, roi=None)[source]

Returns a numpy.ndarray of coordinates that correspond to the frames in the actual navigation space which are part of the current tile or partition.

Parameters
  • slice (Slice) – Describes the location within the dataset with navigation dimension flattened and reduced to the ROI.

  • ds_shape (Shape) – The original shape of the whole dataset, not influenced by the ROI

  • roi (numpy.ndarray, optional) – Array of type bool, matching the navigation shape of the dataset

libertem.io.dataset.base.make_get_read_ranges(px_to_bytes=CPUDispatcher(<function _default_px_to_bytes>), read_ranges_tile_block=CPUDispatcher(<function _default_read_ranges_tile_block>)) → Tuple[numpy.ndarray, numpy.ndarray][source]

Translate the TilingScheme combined with the roi into (pixel)-read-ranges, together with their tile slices.

Parameters
  • start_at_frame – Dataset-global first frame index to read

  • stop_before_frame – Stop before this frame index

  • tiling_scheme – Description on how the data should be tiled

  • fileset_arr – Array of shape (number_of_files, 3) where the last dimension contains the following values: (start_idx, end_idx, file_idx), where [start_idx, end_idx) defines which frame indices are contained in the file.

  • roi – Region of interest (for the full dataset)

Returns

read_ranges is an ndarray with shape (number_of_tiles, depth, 3) where the last dimension contains: file index, start_byte, stop_byte

Return type

(tile_slice, read_ranges)