Tips and tricks
This is a collection of various helpful tips that don’t fit elsewhere.
Using SSH forwarding
As there is no built-in authentication yet, LiberTEM should not listen on a network
port where untrusted parties have access. Instead, you can use SSH port forwarding
to access a LiberTEM server that only listens on localhost from a different computer.
For example with conda:
$ ssh -L 9000:localhost:9000 <remote-hostname> "conda activate libertem; libertem-server"
Or, with virtualenv:
$ ssh -L 9000:localhost:9000 <remote-hostname> "/path/to/virtualenv/bin/libertem-server"
This makes LiberTEM, which is running on remote-hostname, available on your
local host via http://localhost:9000/.
Alternatively, you can launch and access LiberTEM on remote systems through Jupyter integration.
Activating ipywidgets in Jupyter
Some examples use ipywidgets in notebooks, most notably the fast BQLive2DPlot.
In some cases the corresponding Jupyter Notebook extension has to be activated manually:
$ jupyter nbextension enable --py widgetsnbextension
For BQLive2DPlot to function correctly in JupyterHub and possibly JupyterLab,
the packages bqplot and bqplot-image-gl have to be installed both in the
notebook environment and in the environment of the notebook server.
This might require admin privileges.
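For illustration, a live plot could be attached to a UDF run roughly like this
(a minimal sketch: ctx and ds stand for an existing Context and dataset, and the
exact constructor arguments of BQLive2DPlot may differ between versions):

from libertem.udf.sum import SumUDF
from libertem.viz.bqp import BQLive2DPlot

udf = SumUDF()
# bqplot-based live plot that updates while the UDF is running
plot = BQLive2DPlot(dataset=ds, udf=udf)
ctx.run_udf(dataset=ds, udf=udf, plots=[plot])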
Running in a top-level script
Since LiberTEM uses multiprocessing, the script entry point may have to be protected, most notably on Windows:
if __name__ == '__main__':
    # Here goes starting a LiberTEM Context etc.
    ...
This is not necessary if LiberTEM is used in a Jupyter notebook or IPython.
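A complete stand-alone script might look roughly like this (a sketch; the dataset
path and the "auto" format detection are placeholders for your actual data):

from libertem.api import Context
from libertem.udf.sum import SumUDF

def main():
    ctx = Context()
    ds = ctx.load("auto", path="/path/to/data")  # placeholder path
    res = ctx.run_udf(dataset=ds, udf=SumUDF())
    print(res["intensity"].data)

if __name__ == '__main__':
    # The guard prevents worker processes spawned by multiprocessing
    # from re-executing the script body on import.
    main()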
Gatan Digital Micrograph and other embedded interpreters
If LiberTEM is run from within an embedded interpreter, the following steps should be taken. This is necessary for Python scripting in Gatan Digital Micrograph (GMS), for example.
The variable sys.argv may not be set in embedded interpreters, but it is expected
by the multiprocessing module when spawning new processes. This workaround
guarantees that sys.argv is set until this is fixed upstream:
import sys

if not hasattr(sys, 'argv'):
    sys.argv = []
Furthermore, the correct executable for spawning subprocesses has to be set.
import multiprocessing
import os
import sys

multiprocessing.set_executable(
    os.path.join(sys.exec_prefix, 'pythonw.exe'))  # Windows only
In GMS the script may have to run in an additional thread, since loading SciPy in a GMS background thread doesn’t work. See https://www.gatan.com/python-faq for more information.
import threading

def main():
    # Here goes the actual script
    ...

if __name__ == '__main__':
    # Start the workload "main()" in a thread and wait for it to finish
    th = threading.Thread(target=main)
    th.start()
    th.join()
See our examples folder for a number of scripts that work in GMS!
Show deprecation warnings
Many warning messages via the warnings
built-in module are suppressed by
default, including in interactive shells such as IPython and Jupyter. If you’d
like to be informed early about upcoming backwards-incompatible changes, you
should activate deprecation warnings. This is recommended since LiberTEM is
under active development.
import warnings
warnings.filterwarnings("default", category=DeprecationWarning)
warnings.filterwarnings("default", category=PendingDeprecationWarning)
Profiling long-running tests
Since our code base and test coverage are growing continuously, we should make sure that our test suite stays efficient and finishes within a reasonable time frame.
You can find the five slowest tests in the output of Tox; see Running the tests
for details. If you are using pytest directly, you can use the --durations
parameter:
(libertem) $ pytest --durations=10 tests/
(...)
================= slowest 10 test durations =============================
31.61s call tests/udf/test_blobfinder.py::test_run_refine_affinematch
17.08s call tests/udf/test_blobfinder.py::test_run_refine_sparse
16.89s call tests/test_analysis_masks.py::test_numerics_fail
12.78s call tests/server/test_job.py::test_run_job_delete_ds
10.90s call tests/server/test_cancel.py::test_cancel_udf_job
8.61s call tests/test_local_cluster.py::test_start_local
8.26s call tests/server/test_job.py::test_run_job_1_sum
6.76s call tests/server/test_job.py::test_run_with_all_zeros_roi
6.50s call tests/test_analysis_masks.py::test_numerics_succeed
5.75s call tests/test_analysis_masks.py::test_avoid_calculating_masks_on_client
= 288 passed, 66 skipped, 6 deselected, 2 xfailed, 7 warnings in 260.65 seconds =
Please note that tests which involve starting a local cluster have long lead times that are hard to avoid.
To gain more insight into what slows down a particular test, you can install the pytest-profiling extension and use it to profile individual slow tests that you identified before:
(libertem) $ pytest --profile tests/udf/test_blobfinder.py::test_run_refine_affinematch
(...)
749921 function calls (713493 primitive calls) in 5.346 seconds
Ordered by: cumulative time
List reduced from 1031 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 5.346 5.346 runner.py:76(pytest_runtest_protocol)
44/11 0.000 0.000 5.344 0.486 hooks.py:270(__call__)
44/11 0.000 0.000 5.344 0.486 manager.py:65(_hookexec)
44/11 0.000 0.000 5.344 0.486 manager.py:59(<lambda>)
44/11 0.001 0.000 5.344 0.486 callers.py:157(_multicall)
1 0.000 0.000 5.331 5.331 runner.py:83(runtestprotocol)
3 0.000 0.000 5.331 1.777 runner.py:172(call_and_report)
3 0.000 0.000 5.330 1.777 runner.py:191(call_runtest_hook)
3 0.000 0.000 5.329 1.776 runner.py:219(from_call)
3 0.000 0.000 5.329 1.776 runner.py:198(<lambda>)
1 0.000 0.000 5.138 5.138 runner.py:119(pytest_runtest_call)
1 0.000 0.000 5.138 5.138 python.py:1355(runtest)
1 0.000 0.000 5.138 5.138 python.py:155(pytest_pyfunc_call)
1 0.004 0.004 5.137 5.137 test_blobfinder.py:149(test_run_refine_affinematch)
5 0.159 0.032 3.150 0.630 generate.py:6(cbed_frame)
245 0.001 0.000 2.989 0.012 masks.py:98(circular)
245 0.046 0.000 2.988 0.012 masks.py:8(_make_circular_mask)
245 0.490 0.002 2.941 0.012 masks.py:280(radial_bins)
245 0.152 0.001 2.229 0.009 masks.py:212(polar_map)
25 0.001 0.000 1.968 0.079 blobfinder.py:741(run_refine)
=============================== 1 passed, 1 warnings in 7.81 seconds ============================
Platform-dependent code and remote executor
Platform-dependent code in a lambda function or nested function can lead to
incompatibilities when run on an executor with remote workers, such as the
DaskJobExecutor. Instead, the function should be defined as part of a module,
for example as a stand-alone function or as a method of a class. That way, the
correct remote implementation for platform-dependent code is used on the remote
worker, since only a reference to the function, and not the implementation
itself, is sent over.
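For illustration (a sketch; the run_function calls are commented out since they
presume an existing executor): a module-level function is serialized by
reference, so the platform check runs on each worker, while a lambda is
serialized by value and can carry client-side decisions along.

import platform

def describe_worker():
    # Module-level function: sent to remote workers by reference,
    # so this platform check runs on the worker, not on the client.
    return platform.system()

# Preferred: pass a reference to the module-level function, e.g.
# executor.run_function(describe_worker)

# Avoid: the lambda below is serialized by value together with the
# client-side result of platform.system(), so every worker would
# see the client's platform:
# client_platform = platform.system()
# executor.run_function(lambda: client_platform)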
Benchmark Numba compilation time
One has to capture the very first execution of a jitted function and compare it with subsequent executions to measure its compilation time. By default, pytest-benchmark performs calibration runs and possibly warmup rounds that don’t report the very first run.
The only way to disable this completely is to use pedantic mode, specifying no warmup rounds and two rounds with one iteration each:
import numba
import pytest

@numba.njit
def hello():
    return "world"

@pytest.mark.compilation
@pytest.mark.benchmark(
    group="compilation"
)
def test_numba_compilation(benchmark):
    benchmark.extra_info["mark"] = "compilation"
    benchmark.pedantic(hello, warmup_rounds=0, rounds=2, iterations=1)
That way, the maximum is the first run including compilation, and the minimum is the second one without compilation. Tests are marked as compilation tests in the extra info as well, to aid later data evaluation. Note that compilation tests have poor statistics since the compiling run happens only once. If you have an idea on how to collect better statistics, please let us know!
Simulating slow systems with control groups
Under Linux, it is possible to simulate a slow system using control groups:
sudo cgcreate -g cpu:/slow
sudo cgset -r cpu.cfs_period_us=1000000 slow
sudo cgset -r cpu.cfs_quota_us=200000 slow
sudo chown root:<yourgroup> /sys/fs/cgroup/cpu,cpuacct/slow
sudo chmod 664 /sys/fs/cgroup/cpu,cpuacct/slow
Then, as a user, you can use cgexec to run a command in that control group:
cgexec -g cpu:slow pytest tests/
This is useful, for example, to debug test failures that only seem to happen in CI
or under heavy load. Note that tools like cgcreate only work with cgroups v1;
on newer distributions that use cgroups v2 you may have to adapt these instructions.
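On a cgroups v2 system with systemd, a comparable CPU limit can be set up with a
transient scope, for example (a sketch: CPUQuota=20% matches the
cfs_quota_us/cfs_period_us ratio above, and delegation of the CPU controller may
be required for user scopes):

systemd-run --user --scope -p CPUQuota=20% pytest tests/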
Jupyter
To use the Python API from within a Jupyter notebook, you can install Jupyter into your LiberTEM virtual environment.
(libertem) $ python -m pip install jupyter
You can then run a local notebook server from within the LiberTEM environment, which should open a browser window with Jupyter using your LiberTEM environment.
(libertem) $ jupyter notebook
JupyterHub
If you’d like to use the Python API from a LiberTEM virtual environment on a system that manages logins with JupyterHub, you can easily install a custom kernel definition for your LiberTEM environment.
First, you can launch a terminal on JupyterHub from the “New” drop-down menu in the file browser. Alternatively, you can execute shell commands by prefixing them with “!” in a Python notebook.
In the terminal you can create and activate virtual environments and perform the LiberTEM installation as described above. Within the activated LiberTEM environment you additionally install ipykernel:
(libertem) $ python -m pip install ipykernel
Now you can create a custom IPython kernel definition for your environment:
(libertem) $ python -m ipykernel install --user --name libertem --display-name "Python (libertem)"
After reloading the file browser window, a new Notebook option “Python (libertem)” should be available in the “New” drop-down menu. You can test it by creating a new notebook and running:
In [1]: import libertem
See also Jupyter integration for launching the web GUI from JupyterHub or JupyterLab.