
.. _`loading data`:

Loading data

To efficiently handle files larger than main memory, LiberTEM never loads the
whole data set at once. Calling the :meth:`~libertem.api.Context.load`
method only checks that the dataset exists and is valid before providing Python
with an object which can be used in later computation. Running an analysis
on this object with :meth:`` or
:meth:`~libertem.api.Context.run_udf` then streams the data from mass storage
in suitably sized chunks, so that even very large datasets can be processed without
exhausting system resources.
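
The idea of streaming data in chunks can be illustrated without LiberTEM. The
following is a conceptual sketch using only the Python standard library, not
LiberTEM's actual implementation. The file layout and the constants
:code:`FRAME_SIZE`, :code:`NUM_FRAMES` and :code:`CHUNK_FRAMES` are assumptions
made up for this illustration. It sums all frames of a raw file while holding
only one chunk in memory at a time:

.. code-block:: python

   import array
   import os
   import tempfile

   FRAME_SIZE = 4    # values per frame (hypothetical tiny detector)
   NUM_FRAMES = 10   # frames in the file
   CHUNK_FRAMES = 3  # frames read per chunk

   # Write a small raw file that stands in for a large data set.
   # Frame number i consists of FRAME_SIZE copies of the value i.
   path = os.path.join(tempfile.mkdtemp(), "frames.raw")
   with open(path, "wb") as f:
       for i in range(NUM_FRAMES):
           array.array("I", [i] * FRAME_SIZE).tofile(f)

   # Accumulate the sum over all frames, one chunk at a time.
   item_size = array.array("I").itemsize
   chunk_bytes = CHUNK_FRAMES * FRAME_SIZE * item_size
   total = [0] * FRAME_SIZE
   with open(path, "rb") as f:
       while raw := f.read(chunk_bytes):
           chunk = array.array("I")
           chunk.frombytes(raw)
           # Each chunk starts on a frame boundary, so the position within
           # the chunk modulo FRAME_SIZE is the pixel index.
           for j, value in enumerate(chunk):
               total[j % FRAME_SIZE] += value

The peak memory use is determined by the chunk size rather than by the size of
the file, which is the property the paragraph above describes.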

See :ref:`sample data` for publicly available datasets for testing.

There are two main ways of opening a data set in LiberTEM: using the GUI, or the
Python API.

Loading through the API

In the API, you can use :meth:`libertem.api.Context.load`. The general
pattern is:

.. code-block:: python

   from libertem.api import Context

   ctx = Context()
   ds = ctx.load("typename", path="/path/to/some/file", arg1="val1", arg2=42)

So, you need to specify the data set type, the path, and dataset-specific
arguments. These arguments are documented below.

For most file types, it is possible to automatically detect the type and
parameters, which you can trigger by using :code:`"auto"` as file type:

.. code-block:: python

   ctx.load("auto", path="/path/to/some/file")

For the full list of supported file formats with links to their reference
documentation, see :ref:`supported formats` below.

.. _`Loading using the GUI`:

Loading using the GUI

Using the GUI, mostly the same parameters need to be specified, although some
are only available in the Python API. Tuples (for example for :code:`nav_shape`)
have to be entered as separate values into the fields. You can type a comma to jump to
the next field. We follow the NumPy convention here and specify the "fast-access" dimension
last, so a value of :code:`42`, :code:`21` would mean the same as specifying
:code:`(42, 21)` in the Python API, setting :code:`y=42` and :code:`x=21`.
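
The ordering convention can be made concrete with a short sketch. The helpers
:code:`parse_shape` and :code:`flat_to_nav` are hypothetical names introduced
only for this illustration and are not part of LiberTEM. The sketch assumes
frames are stored in row-major order, so that the last dimension changes
fastest:

.. code-block:: python

   def parse_shape(text):
       """Turn comma-separated input such as "42, 21" into a tuple of ints."""
       return tuple(int(part) for part in text.split(","))


   def flat_to_nav(index, nav_shape):
       """Map a flat frame index to a (y, x) scan position in row-major order."""
       _, width = nav_shape
       return divmod(index, width)


   nav_shape = parse_shape("42, 21")
   y, x = nav_shape  # slow axis first, fast axis last

   # Consecutive frames advance along x; frame 21 starts the second row.
   positions = [flat_to_nav(i, nav_shape) for i in (0, 1, 20, 21)]

With :code:`nav_shape == (42, 21)`, a scan has 42 rows of 21 positions each,
and the flat frame index 21 maps to the position :code:`(1, 0)`.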

See the :ref:`GUI usage page <usage documentation>` for more information on the GUI. 

For more general information about how LiberTEM structures data see :ref:`the concepts section <concepts>`.

Common parameters

There are some common parameters across data set types:

name
  The name of the data set, for display purposes. Only used in the GUI.
nav_shape
  In the GUI, we generally support visualizing data containing rectangular 2D scans. For
  all the dataset types, you can specify a :code:`nav_shape` as a tuple :code:`(y, x)`.
  If the dataset isn't 4D, the GUI can reshape it to 4D. When using the Python API, you
  are free to use an n-dimensional :code:`nav_shape`, if the data set and chosen analysis
  support it.
sig_shape
  In the GUI, you can specify the shape of the detector as :code:`height`, :code:`width`, but
  when using the Python API, it can be of any dimensionality.
sync_offset
  You can specify a :code:`sync_offset` to handle synchronization or acquisition problems.
  If it's positive, :code:`sync_offset` frames will be skipped from the start of the input data.
  If it's negative, the dataset will be padded by :code:`abs(sync_offset)` frames at the beginning.
io_backend
  Different methods for I/O are available in LiberTEM, which can influence performance.
  See :ref:`io backends` for details.

.. note::
  When using a :code:`sync_offset` or a :code:`nav_shape` that exceeds the size of the input data,
  it is currently not well-defined whether zero-filled frames are generated or the missing data is skipped.
  Most dataset implementations seem to skip the data. See :issue:`1384` for discussion, feedback welcome!
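
The effect of :code:`sync_offset` on the start of the data can be sketched with
plain Python lists. The function :code:`apply_sync_offset` and the
:code:`"pad"` placeholder are hypothetical and only illustrate the rule stated
above. The sketch does not model what happens at the end of the data, which the
note describes as not well-defined:

.. code-block:: python

   def apply_sync_offset(frames, sync_offset, fill=None):
       """Shift the start of a frame sequence (hypothetical helper).

       Positive offsets skip frames at the start, negative offsets
       prepend abs(sync_offset) padding frames.
       """
       if sync_offset > 0:
           return frames[sync_offset:]
       if sync_offset < 0:
           return [fill] * abs(sync_offset) + frames
       return list(frames)


   frames = ["f0", "f1", "f2", "f3", "f4"]

   # Positive offset: the first two frames are skipped.
   skipped = apply_sync_offset(frames, 2)

   # Negative offset: two padding frames are inserted at the beginning.
   padded = apply_sync_offset(frames, -2, fill="pad")

Here :code:`skipped` starts at :code:`"f2"`, while :code:`padded` starts with
two placeholder frames followed by the original data.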

.. _`supported formats`:

Supported formats

LiberTEM supports the following file formats out of the box; see the links for details:

* :ref:`mib`
* :ref:`raw binary`
* :ref:`raw csr`
* :ref:`npy format`
* :ref:`dm format`
* :ref:`empad`
* :ref:`k2is`
* :ref:`frms6`
* :ref:`blo`
* :ref:`ser`
* :ref:`hdf5`
* :ref:`seq`
* :ref:`mrc`
* :ref:`tvips`

Furthermore, two alternative mechanisms exist for interfacing LiberTEM with data loaded
elsewhere in Python via other libraries:

- a memory data set can be constructed from a NumPy array for testing
  purposes. See :ref:`memory` for details.
- a Dask data set can be constructed from a Dask array. Depending on the
  method used to construct the source array this can achieve good performance.
  See :ref:`daskds` for details.
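
As an illustration of the first mechanism, the following minimal sketch builds
a memory data set from a synthetic NumPy array and sums all of its frames. It
assumes a reasonably recent LiberTEM version that provides
:code:`Context.make_with("inline")`, and it uses
:class:`~libertem.udf.sum.SumUDF` purely as an example analysis. The array
shape and contents are made up for this example; see :ref:`memory` for the
authoritative parameter list.

.. code-block:: python

   import numpy as np

   from libertem.api import Context
   from libertem.udf.sum import SumUDF

   # The inline executor runs in the current process, which is
   # convenient for small tests (assumed available in your version).
   ctx = Context.make_with("inline")

   # Synthetic 4D data: 4 x 4 scan positions, 8 x 8 detector frames.
   data = np.ones((4, 4, 8, 8), dtype=np.float32)
   ds = ctx.load("memory", data=data, sig_dims=2)

   # Sum over all frames; the result has the shape of a single frame.
   result = ctx.run_udf(dataset=ds, udf=SumUDF())
   total = result["intensity"].data

Since every frame consists of ones and there are 16 scan positions, each pixel
of :code:`total` should equal 16.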

.. _`data conversion`:

Dataset conversion

LiberTEM supports a mechanism to efficiently convert any supported dataset
into a NumPy binary file (:code:`.npy`), which can then be loaded into memory
independently of LiberTEM (or read as a :code:`npy` format dataset as above).

.. versionadded:: 0.12.0

To convert a dataset to npy, use the :meth:`~libertem.api.Context.export_dataset` method:

.. code-block:: python

   import libertem.api as lt

   with lt.Context() as ctx:
       dataset = ctx.load("auto", path="/path/to/some/file")
       ctx.export_dataset(dataset, './output_path.npy')

Currently, only exporting to the :code:`npy` format is supported, but other formats
could be added if the need arises.
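
An :code:`.npy` file can then be opened with NumPy alone. The sketch below
writes a small stand-in array with :code:`numpy.save` instead of performing a
real export, so the file name, shape and contents are assumptions for
illustration only. Passing :code:`mmap_mode="r"` memory-maps the file, which
avoids reading a large file completely into memory:

.. code-block:: python

   import os
   import tempfile

   import numpy as np

   # Stand-in for an exported file: 2 x 3 scan positions, 4 x 4 frames.
   path = os.path.join(tempfile.mkdtemp(), "output_path.npy")
   stand_in = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
   np.save(path, stand_in)

   # Memory-map the file instead of loading it as a whole.
   arr = np.load(path, mmap_mode="r")

   # Copy a single frame into memory; only this part is read from disk.
   frame = np.asarray(arr[1, 2])

Omitting :code:`mmap_mode` makes :code:`numpy.load` read the complete array
into memory, which is fine for files that fit comfortably into RAM.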

Alternatively, you can create Dask arrays from LiberTEM datasets via the :ref:`Dask integration <daskarray>`.
These arrays can then be stored with
`Dask's built-in functions <>`_
or through additional libraries such as `RosettaSciIO <>`_.