Base Module

ScanAnalysis.scan_analysis.base

Base classes and utilities for scan analyzers.

This module provides shared infrastructure for scan-level analyses. The core entry point is ScanAnalyzer, which handles locating scan folders, parsing the .ini metadata, loading auxiliary data (the s-file), and exposing a uniform run_analysis() flow that concrete analyzers implement via _run_analysis_core().

All analyzers must inherit from ScanAnalyzer and implement both _run_analysis_core() and cleanup().

Classes:

Name Description
DataLengthError

Raised when data arrays have inconsistent lengths.

DataUnavailableWarning

Raised when a device's data directory is missing or empty for a scan.

ScanParameter

Lightweight wrapper for scan parameter string with common renderings.

ScanAnalyzer

Base class for performing analysis on scan data.

Functions:

Name Description
testing_routine

Simple dev sanity check.

Attributes:

Name Type Description
logger

Attributes

logger module-attribute

logger = getLogger(__name__)

Classes

DataLengthError

Bases: ValueError

Raised when data arrays have inconsistent lengths.

DataUnavailableWarning

Bases: Exception

Raised when a device's data directory is missing or empty for a scan.

This is an expected condition when an analyzer is configured for a device that was not active during a particular scan. It is caught separately from unexpected errors so that a clean warning is logged without a traceback, and the task queue records the state as no_data rather than failed.
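The separation described above can be sketched as follows. This is a minimal illustration, not the project's task runner: the local DataUnavailableWarning class stands in for the one defined in this module, and run_one, the example callables, and the "done" state label are hypothetical. Only the no_data and failed states come from the description above.

```python
import logging

logger = logging.getLogger(__name__)


class DataUnavailableWarning(Exception):
    """Stand-in for the exception defined in this module."""


def run_one(analyze) -> str:
    """Run one analyzer callable and return a task state (hypothetical helper)."""
    try:
        analyze()
    except DataUnavailableWarning as exc:
        # Expected condition: log a clean warning without a traceback.
        logger.warning("No data: %s", exc)
        return "no_data"
    except Exception:
        # Unexpected error: keep the traceback for debugging.
        logger.exception("Analysis failed")
        return "failed"
    return "done"


def missing_device():
    raise DataUnavailableWarning("device directory is empty")


def broken_analysis():
    raise RuntimeError("unexpected bug")


states = [run_one(missing_device), run_one(broken_analysis), run_one(lambda: None)]
print(states)
```

The more specific except clause has to come first; otherwise the generic handler would swallow the expected condition and record it as a failure.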

ScanParameter

Bases: NamedTuple

Lightweight wrapper for scan parameter string with common renderings.

Methods:

Name Description
with_colon

Return the raw parameter as-is (including colons).

with_space

Return parameter with colons replaced by spaces.

__str__

Return the default string form (same as with_colon()).

Attributes:

Name Type Description
raw_string str
Attributes
raw_string instance-attribute
raw_string: str
Functions
with_colon
with_colon()

Return the raw parameter as-is (including colons).

Source code in ScanAnalysis/scan_analysis/base.py
def with_colon(self):
    """Return the raw parameter as-is (including colons)."""
    return f"{self.raw_string}"
with_space
with_space()

Return parameter with colons replaced by spaces.

Source code in ScanAnalysis/scan_analysis/base.py
def with_space(self):
    """Return parameter with colons replaced by spaces."""
    return f"{self.raw_string.replace(':', ' ')}"
__str__
__str__()

Return the default string form (same as with_colon()).

Source code in ScanAnalysis/scan_analysis/base.py
def __str__(self):
    """Return the default string form (same as `with_colon()`)."""
    # default, used for example in f"{scan_parameter}"
    return self.with_colon()
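
A short usage sketch of the three renderings. The class is copied locally from the listings above so the example runs on its own; the parameter name "U_ExampleDevice:Position" is made up for illustration.

```python
from typing import NamedTuple


class ScanParameter(NamedTuple):
    """Local copy of the class shown above, so the example is self-contained."""

    raw_string: str

    def with_colon(self) -> str:
        return f"{self.raw_string}"

    def with_space(self) -> str:
        return f"{self.raw_string.replace(':', ' ')}"

    def __str__(self) -> str:
        return self.with_colon()


# Hypothetical device:variable name.
param = ScanParameter(raw_string="U_ExampleDevice:Position")
colon_form = param.with_colon()
space_form = param.with_space()
default_form = f"{param}"  # formatting goes through __str__, i.e. with_colon()
print(colon_form, "|", space_form, "|", default_form)
```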

ScanAnalyzer

ScanAnalyzer(skip_plt_show: bool = True, device_name: Optional[str] = None, use_injected_data: bool = False, **kwargs)

Base class for performing analysis on scan data.

Responsibilities
  • Resolve scan paths from a geecs_data_utils.ScanTag.
  • Read the scan .ini file and extract the "Scan Parameter".
  • Load auxiliary s-file data (tab-delimited) into a DataFrame.
  • Provide convenience helpers (e.g., label generation, s-file append).
  • Define the run_analysis() entry point that calls a subclass-provided _run_analysis_core().

Attributes:

Name Type Description
scan_directory Path | None

Path to the scan directory containing data.

auxiliary_file_path Path | None

Path to the auxiliary s-file (s<scan_number>.txt).

ini_file_path Path | None

Path to the .ini file containing scan parameters.

scan_parameter str | None

The cleaned scan parameter label (spaces or colons depending on configuration).

bins ndarray | None

Bin numbers from the auxiliary file.

auxiliary_data DataFrame | None

Loaded auxiliary data (s-file) used by downstream analyses.

Initialize the analyzer and default state.

Parameters:

Name Type Description Default
skip_plt_show bool

If False, figures are shown via plt.show(); otherwise all figures are closed automatically (batch/non-interactive use).

True
device_name str | None

Logical device name the analyzer is associated with. Purely informational here; concrete analyzers may use it to locate files.

None
use_injected_data bool

When False (default, post-scan path), load_auxiliary_data reads the s-file from disk and the scan-parameter column name follows MasterControl's space-separated disk convention.

When True (mid-scan/optimizer path), load_auxiliary_data is a no-op — the caller must set self.auxiliary_data to the in-memory DataLogger DataFrame before calling run_analysis. The scan-parameter column name automatically uses the in-memory device:variable colon convention, and per-shot scalar results stay on self._pending_aux_updates instead of being appended to the s-file.

The contract is intentionally tight: the base class never reaches out to any in-memory data source itself, since that would invert the dependency graph through Scanner-GUI's ScanManager. The caller owns the data injection.

False
**kwargs

Additional analyzer-specific options (ignored by the base class).

{}

Methods:

Name Description
run_analysis

Load inputs and dispatch to the subclass core analysis.

cleanup

Release per-scan memory after analysis completes.

extract_scan_parameter_from_ini

Extract and normalize the scan parameter label from the .ini file.

load_auxiliary_data

Load auxiliary s-file (tab-delimited) and derive bins/scan values.

close_or_show_plot

Show or close figures based on skip_plt_show.

append_to_sfile

Append or overwrite s-file columns (merging on Shotnumber with a lock).

generate_limited_shotnumber_labels

Generate evenly spaced shot-number labels with an upper bound on count.

find_scan_param_column

Locate the column in the auxiliary DataFrame that corresponds to the scan parameter.

find_column_for_key

Locate an auxiliary-data column that matches a user-supplied key string.

Source code in ScanAnalysis/scan_analysis/base.py
def __init__(
    self,
    skip_plt_show: bool = True,
    device_name: Optional[str] = None,
    use_injected_data: bool = False,
    **kwargs,
):
    """Initialize the analyzer and default state.

    Parameters
    ----------
    skip_plt_show : bool, default=True
        If ``False``, figures are shown via `plt.show()`; otherwise all figures
        are closed automatically (batch/non-interactive use).
    device_name : str, optional
        Logical device name the analyzer is associated with. Purely informational
        here; concrete analyzers may use it to locate files.
    use_injected_data : bool, default=False
        When ``False`` (default, post-scan path), ``load_auxiliary_data``
        reads the s-file from disk and the scan-parameter column name
        follows MasterControl's space-separated disk convention.

        When ``True`` (mid-scan/optimizer path), ``load_auxiliary_data``
        is a no-op — the caller must set ``self.auxiliary_data``
        to the in-memory DataLogger DataFrame *before* calling
        ``run_analysis``. The scan-parameter column name automatically
        uses the in-memory ``device:variable`` colon convention,
        and per-shot scalar results stay on
        ``self._pending_aux_updates`` instead of being appended to
        the s-file.

        The contract is intentionally tight: the base class never reaches
        out to any in-memory data source itself (that would invert
        the dependency graph through Scanner-GUI's ``ScanManager``). The
        caller owns the data injection.
    **kwargs
        Additional analyzer-specific options (ignored by the base class).
    """
    self.scan_tag: Optional[ScanTag] = None
    self.scan_data: Optional[ScanData] = None
    self.scan_directory: Optional[Path] = None
    self.experiment_dir: Optional[str] = None
    self.ini_file_path: Optional[Path] = None
    self.scan_path: Optional[Path] = None
    self.auxiliary_file_path: Optional[Path] = None
    self.scan_parameter: Optional[str] = None  # the one you’ll *use*

    # Single switch. Disk vs in-memory data source is the underlying
    # distinction; the colon vs space scan-parameter convention is a
    # derived consequence (in-memory DataLogger uses colons,
    # MasterControl's disk s-file uses spaces).
    self.use_injected_data: bool = use_injected_data
    self.use_colon_scan_param: bool = use_injected_data

    self.noscan = False
    self.device_name = device_name
    self.skip_plt_show = skip_plt_show

    self.bins = None
    self.auxiliary_data: Optional[pd.DataFrame] = None
    self.binned_param_values = None

    self.display_contents = []
Attributes
scan_tag instance-attribute
scan_tag: Optional[ScanTag] = None
scan_data instance-attribute
scan_data: Optional[ScanData] = None
scan_directory instance-attribute
scan_directory: Optional[Path] = None
experiment_dir instance-attribute
experiment_dir: Optional[str] = None
ini_file_path instance-attribute
ini_file_path: Optional[Path] = None
scan_path instance-attribute
scan_path: Optional[Path] = None
auxiliary_file_path instance-attribute
auxiliary_file_path: Optional[Path] = None
scan_parameter instance-attribute
scan_parameter: Optional[str] = None
use_injected_data instance-attribute
use_injected_data: bool = use_injected_data
use_colon_scan_param instance-attribute
use_colon_scan_param: bool = use_injected_data
noscan instance-attribute
noscan = False
device_name instance-attribute
device_name = device_name
skip_plt_show instance-attribute
skip_plt_show = skip_plt_show
bins instance-attribute
bins = None
auxiliary_data instance-attribute
auxiliary_data: Optional[DataFrame] = None
binned_param_values instance-attribute
binned_param_values = None
display_contents instance-attribute
display_contents = []
Functions
run_analysis
run_analysis(scan_tag: ScanTag) -> Optional[list[Union[Path, str]]]

Load inputs and dispatch to the subclass core analysis.

Parameters:

Name Type Description Default
scan_tag ScanTag

Tag identifying the scan to analyze.

required

Returns:

Type Description
list[Path | str] or None

Optional list of notable artifact paths/labels produced by analysis (for experiment logs). Returns None if inputs are missing.

Source code in ScanAnalysis/scan_analysis/base.py
def run_analysis(self, scan_tag: ScanTag) -> Optional[list[Union[Path, str]]]:
    """Load inputs and dispatch to the subclass core analysis.

    Parameters
    ----------
    scan_tag : geecs_data_utils.ScanTag
        Tag identifying the scan to analyze.

    Returns
    -------
    list[Path | str] or None
        Optional list of notable artifact paths/labels produced by analysis
        (for experiment logs). Returns ``None`` if inputs are missing.
    """
    self._handle_scan_tag(scan_tag)  # or inline the logic here
    if self.auxiliary_data is None:
        return None
    return self._run_analysis_core()
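
The dispatch flow and the injected-data contract can be sketched with a simplified stand-in. MiniScanAnalyzer is not the real ScanAnalyzer (it skips path resolution and file loading), and MeanChargeAnalyzer and the "charge" column are hypothetical names chosen for illustration.

```python
from typing import Optional

import pandas as pd


class MiniScanAnalyzer:
    """Simplified stand-in for ScanAnalyzer (not the real class)."""

    def __init__(self, use_injected_data: bool = False):
        self.use_injected_data = use_injected_data
        self.auxiliary_data: Optional[pd.DataFrame] = None

    def run_analysis(self, scan_tag) -> Optional[list]:
        # The real method first resolves paths and loads inputs from scan_tag;
        # that step is omitted in this sketch.
        if self.auxiliary_data is None:
            return None
        return self._run_analysis_core()

    def _run_analysis_core(self) -> Optional[list]:
        raise NotImplementedError


class MeanChargeAnalyzer(MiniScanAnalyzer):
    """Hypothetical analyzer that reports a single scalar label."""

    def _run_analysis_core(self) -> Optional[list]:
        mean_charge = self.auxiliary_data["charge"].mean()
        return [f"mean charge = {mean_charge:.1f}"]


analyzer = MeanChargeAnalyzer(use_injected_data=True)
without_data = analyzer.run_analysis(scan_tag=None)  # nothing injected yet

# The caller owns the injection: set auxiliary_data before run_analysis.
analyzer.auxiliary_data = pd.DataFrame(
    {"Shotnumber": [1, 2, 3], "charge": [10.0, 12.0, 14.0]}
)
with_data = analyzer.run_analysis(scan_tag=None)
print(without_data, with_data)
```

Keeping the guard in the base class means subclasses can assume auxiliary_data is populated inside _run_analysis_core().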
cleanup
cleanup() -> None

Release per-scan memory after analysis completes.

Must be implemented by all subclasses. Called by the task runner after each analyzer finishes so that loaded data and results do not accumulate across scans. Failing to implement this will raise NotImplementedError and halt the runner — intentionally, to prevent unbounded memory growth.

Source code in ScanAnalysis/scan_analysis/base.py
def cleanup(self) -> None:
    """Release per-scan memory after analysis completes.

    Must be implemented by all subclasses. Called by the task runner
    after each analyzer finishes so that loaded data and results do not
    accumulate across scans. Failing to implement this will raise
    NotImplementedError and halt the runner — intentionally, to prevent
    unbounded memory growth.
    """
    raise NotImplementedError(f"{self.__class__.__name__} must implement cleanup()")
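
A minimal sketch of the cleanup contract, using a stand-in base class with the same default behavior. ImageAnalyzer, ForgetfulAnalyzer, and the results attribute are hypothetical; what a real analyzer needs to release depends on the state it holds.

```python
class BaseAnalyzer:
    """Stand-in base with the same cleanup contract as ScanAnalyzer."""

    def cleanup(self) -> None:
        raise NotImplementedError(f"{self.__class__.__name__} must implement cleanup()")


class ImageAnalyzer(BaseAnalyzer):
    """Hypothetical subclass that holds per-scan state."""

    def __init__(self):
        self.auxiliary_data = None
        self.results = []

    def cleanup(self) -> None:
        # Drop references so per-scan data can be garbage-collected.
        self.auxiliary_data = None
        self.results.clear()


class ForgetfulAnalyzer(BaseAnalyzer):
    """Subclass that does not override cleanup()."""


good = ImageAnalyzer()
good.results.append("figure.png")
good.cleanup()

try:
    ForgetfulAnalyzer().cleanup()
    error_message = None
except NotImplementedError as exc:
    error_message = str(exc)
print(good.results, error_message)
```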
extract_scan_parameter_from_ini
extract_scan_parameter_from_ini() -> str

Extract and normalize the scan parameter label from the .ini file.

Returns:

Type Description
str

Cleaned scan parameter. By default, colons are replaced with spaces unless use_colon_scan_param is set to True. Optimization scans with a "Shotnumber" parameter are mapped to "Bin #".

Source code in ScanAnalysis/scan_analysis/base.py
def extract_scan_parameter_from_ini(self) -> str:
    """Extract and normalize the scan parameter label from the `.ini` file.

    Returns
    -------
    str
        Cleaned scan parameter. By default, colons are replaced with spaces
        unless `use_colon_scan_param` is set to True. Optimization scans with
        a "Shotnumber" parameter are mapped to "Bin #".
    """
    ini_contents = self.scan_data.paths.load_scan_info()
    # A MasterControl scan saves the scalar data columns with spaces between device
    # and variable, rather than use the basic device:variable configuration. If
    # dealing with live data, the device:variable convention is preserved
    # Load and sanitize raw scan parameter
    raw_param = ini_contents["Scan Parameter"].strip().replace('"', "")
    scan_parameter = ScanParameter(raw_string=raw_param)

    # Default value is space-separated unless overridden
    cleaned_scan_parameter = (
        scan_parameter.with_colon()
        if self.use_colon_scan_param
        else scan_parameter.with_space()
    )

    scan_mode = ini_contents.get("ScanMode", None)

    # add some special handling in case of optimization scan
    if scan_mode == "optimization" and cleaned_scan_parameter == "Shotnumber":
        cleaned_scan_parameter = "Bin #"

    return cleaned_scan_parameter
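
The normalization steps can be tried in isolation with a plain dictionary in place of the parsed .ini contents. normalize_scan_parameter is a hypothetical standalone helper that mirrors the logic above, and the parameter names are made up.

```python
def normalize_scan_parameter(ini_contents: dict, use_colon: bool = False) -> str:
    """Mirror the normalization steps of extract_scan_parameter_from_ini."""
    # Strip surrounding whitespace and any double quotes.
    raw = ini_contents["Scan Parameter"].strip().replace('"', "")
    # Disk convention uses spaces; the in-memory convention keeps colons.
    cleaned = raw if use_colon else raw.replace(":", " ")
    # Optimization scans over "Shotnumber" are labelled by bin instead.
    if ini_contents.get("ScanMode") == "optimization" and cleaned == "Shotnumber":
        cleaned = "Bin #"
    return cleaned


disk_style = normalize_scan_parameter({"Scan Parameter": ' "U_Example:Position" '})
live_style = normalize_scan_parameter(
    {"Scan Parameter": '"U_Example:Position"'}, use_colon=True
)
optimization = normalize_scan_parameter(
    {"Scan Parameter": "Shotnumber", "ScanMode": "optimization"}
)
print(disk_style, "|", live_style, "|", optimization)
```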
load_auxiliary_data
load_auxiliary_data()

Load auxiliary s-file (tab-delimited) and derive bins/scan values.

Notes
  • When use_injected_data is True, callers are expected to set self.auxiliary_data directly; this method does nothing.
  • For non-noscan cases, the per-bin mean of the scan parameter column is computed into self.binned_param_values.
Source code in ScanAnalysis/scan_analysis/base.py
def load_auxiliary_data(self):
    """Load auxiliary s-file (tab-delimited) and derive bins/scan values.

    Notes
    -----
    - When ``use_injected_data`` is True, callers are expected to set
      ``self.auxiliary_data`` directly; this method does nothing.
    - For non-``noscan`` cases, the per-bin mean of the scan parameter
      column is computed into ``self.binned_param_values``.
    """
    # When use_injected_data is True the caller (e.g. the optimizer's
    # MultiDeviceScanEvaluator) supplies self.auxiliary_data from the
    # in-memory DataLogger before run_analysis; nothing to load here.
    if not self.use_injected_data:
        try:
            self.auxiliary_data = pd.read_csv(
                self.auxiliary_file_path, delimiter="\t"
            )
            self.bins = self.auxiliary_data["Bin #"].values

            if not self.noscan:
                # Find the scan parameter column and calculate the binned values
                scan_param_column = self.find_scan_param_column()[0]
                self.binned_param_values = (
                    self.auxiliary_data.groupby("Bin #")[scan_param_column]
                    .mean()
                    .values
                )

        except (KeyError, FileNotFoundError) as e:
            logger.warning(
                f"{e}. Scan parameter not found in auxiliary data. Possible aborted scan. Skipping"
            )
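
The per-bin mean computed above can be reproduced on a small in-memory table. The s-file text below is invented for illustration (real s-files carry many more columns), and the scan parameter column name is hard-coded rather than looked up via find_scan_param_column().

```python
import io

import pandas as pd

# Made-up tab-delimited s-file contents.
sfile_text = (
    "Shotnumber\tBin #\tU_Example Position\n"
    "1\t1\t0.9\n"
    "2\t1\t1.1\n"
    "3\t2\t1.9\n"
    "4\t2\t2.1\n"
)

auxiliary_data = pd.read_csv(io.StringIO(sfile_text), delimiter="\t")
bins = auxiliary_data["Bin #"].values

# One mean value of the scan parameter per bin, ordered by bin number.
binned_param_values = (
    auxiliary_data.groupby("Bin #")["U_Example Position"].mean().values
)
print(bins, binned_param_values)
```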
close_or_show_plot
close_or_show_plot()

Show or close figures based on skip_plt_show.

Source code in ScanAnalysis/scan_analysis/base.py
def close_or_show_plot(self):
    """Show or close figures based on `skip_plt_show`."""
    if not self.skip_plt_show:
        plt.show()  # Display for interactive use
    else:
        plt.close("all")  # Ensure plots close when not using the GUI
append_to_sfile
append_to_sfile(data: DataFrame) -> None

Append or overwrite s-file columns (merging on Shotnumber with a lock).

Only accepts a DataFrame and requires an explicit Shotnumber column (case-insensitive match is accepted and normalized). Rows without Shotnumber are dropped with a warning.

Source code in ScanAnalysis/scan_analysis/base.py
def append_to_sfile(self, data: pd.DataFrame) -> None:
    """
    Append or overwrite s-file columns (merging on Shotnumber with a lock).

    Only accepts a DataFrame and requires an explicit ``Shotnumber`` column
    (case-insensitive match is accepted and normalized).
    Rows without ``Shotnumber`` are dropped with a warning.
    """
    if self.auxiliary_file_path is None:
        logger.warning("No auxiliary file path set; skipping s-file append.")
        return

    updates = self._prepare_updates_dataframe(data)
    if updates is None:
        return

    key = "Shotnumber"
    if self.auxiliary_data is not None:
        existing_cols = (
            set(self.auxiliary_data.columns) & set(updates.columns)
        ) - {key}
        if existing_cols:
            logger.warning(
                "append_to_sfile: columns already exist in s-file: %s (will overwrite)",
                existing_cols,
            )

    self._merge_auxiliary_data(updates, key=key)
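
The implementation of _merge_auxiliary_data is not shown on this page, so the following is only a sketch of what merging on Shotnumber with overwrite semantics could look like in pandas. merge_on_shotnumber and the column names are hypothetical, and the file lock and the write back to disk are omitted.

```python
import pandas as pd


def merge_on_shotnumber(existing: pd.DataFrame, updates: pd.DataFrame) -> pd.DataFrame:
    """Merge update columns into existing rows, overwriting duplicate columns."""
    key = "Shotnumber"
    overlap = (set(existing.columns) & set(updates.columns)) - {key}
    # Drop the old copies so the merged frame carries only the new values.
    base = existing.drop(columns=sorted(overlap))
    return base.merge(updates, on=key, how="left")


existing = pd.DataFrame(
    {"Shotnumber": [1, 2, 3], "charge": [10.0, 11.0, 12.0], "old_metric": [0, 0, 0]}
)
updates = pd.DataFrame(
    {"Shotnumber": [1, 2, 3], "old_metric": [5, 6, 7], "new_metric": [0.1, 0.2, 0.3]}
)
merged = merge_on_shotnumber(existing, updates)
print(merged)
```

A left merge keeps every existing shot even when the updates cover only a subset; shots without an update would then hold NaN in the new columns.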
generate_limited_shotnumber_labels
generate_limited_shotnumber_labels(max_labels: int = 20) -> np.ndarray

Generate evenly spaced shot-number labels with an upper bound on count.

Parameters:

Name Type Description Default
max_labels int

Maximum number of labels to return.

20

Returns:

Type Description
ndarray

If the total number of shots is <= max_labels, returns [1..N]. Otherwise returns a range with stride chosen to produce <= max_labels labels.

Source code in ScanAnalysis/scan_analysis/base.py
def generate_limited_shotnumber_labels(self, max_labels: int = 20) -> np.ndarray:
    """Generate evenly spaced shot-number labels with an upper bound on count.

    Parameters
    ----------
    max_labels : int, default=20
        Maximum number of labels to return.

    Returns
    -------
    numpy.ndarray
        If the total number of shots is <= `max_labels`, returns ``[1..N]``.
        Otherwise returns a range with stride chosen to produce <= `max_labels`
        labels.
    """
    if self.total_shots <= max_labels:
        # If the number of shots is less than or equal to max_labels, return the full range
        return np.arange(1, self.total_shots + 1)
    else:
        # Otherwise, return a spaced-out array with at most max_labels
        # Ceiling division: a floor-divided stride could yield more than max_labels
        step = -(-self.total_shots // max_labels)
        return np.arange(1, self.total_shots + 1, step)
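
A standalone sketch of the label thinning, assuming the shot count is passed in directly rather than read from self.total_shots. limited_labels is a hypothetical helper. It uses a ceiling-divided stride, because a floor-divided stride can exceed the bound: with 39 shots and max_labels=20, 39 // 20 is 1, which would return all 39 labels.

```python
import numpy as np


def limited_labels(total_shots: int, max_labels: int = 20) -> np.ndarray:
    """Return shot numbers 1..N, thinned so that at most max_labels remain."""
    if total_shots <= max_labels:
        return np.arange(1, total_shots + 1)
    step = -(-total_shots // max_labels)  # ceiling division
    return np.arange(1, total_shots + 1, step)


short = limited_labels(5)
long = limited_labels(39, max_labels=20)
print(short, len(long))
```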
find_scan_param_column
find_scan_param_column() -> tuple[Optional[str], Optional[str]]

Locate the column in the auxiliary DataFrame that corresponds to the scan parameter.

Returns:

Type Description
tuple[str | None, str | None]

(column_name, alias) where alias is the portion after 'Alias:' if present, or the full column name otherwise; both elements are None if noscan or a match is not found.

Notes
  • Matching is performed against the part of the column name preceding ' Alias:' to tolerate aliasing in s-files.
Source code in ScanAnalysis/scan_analysis/base.py
def find_scan_param_column(self) -> tuple[Optional[str], Optional[str]]:
    """Locate the column in the auxiliary DataFrame that corresponds to the scan parameter.

    Returns
    -------
    tuple[str | None, str | None]
        ``(column_name, alias)`` where `alias` is the portion after ``'Alias:'``
        if present, or the full column name otherwise; both elements are ``None``
        if `noscan` or a match is not found.

    Notes
    -----
    - Matching is performed against the part of the column name preceding
      ``' Alias:'`` to tolerate aliasing in s-files.
    """
    # Clean the scan parameter by stripping any quotes or extra spaces
    # cleaned_scan_parameter = self.scan_parameter

    if not self.noscan:
        # Search for the first column that contains the cleaned scan parameter string
        for column in self.auxiliary_data.columns:
            # Match the part of the column before 'Alias:'
            if self.scan_parameter in column.split(" Alias:")[0]:
                # Return the column and the alias if present
                # Fall back to the full column name when no alias is present
                alias = (
                    column.split("Alias:")[-1].strip()
                    if "Alias:" in column
                    else column
                )
                return column, alias

        logger.warning(
            f"Warning: Could not find column containing scan parameter: {self.scan_parameter}"
        )
        return None, None
    else:
        return None, None
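
The alias handling can be illustrated on a plain list of column names. split_alias and find_param_column are hypothetical standalone helpers that follow the matching rule above, and the column names are made up.

```python
from typing import Optional


def split_alias(column: str) -> tuple[str, str]:
    """Return (base_name, alias); alias falls back to the full column name."""
    base = column.split(" Alias:")[0]
    alias = column.split("Alias:")[-1].strip() if "Alias:" in column else column
    return base, alias


def find_param_column(
    columns: list[str], scan_parameter: str
) -> tuple[Optional[str], Optional[str]]:
    """Return the first column whose base name contains scan_parameter."""
    for column in columns:
        base, alias = split_alias(column)
        if scan_parameter in base:
            return column, alias
    return None, None


columns = ["Shotnumber", "U_Example Position Alias:stage_x", "U_Other Current"]
aliased = find_param_column(columns, "U_Example Position")
plain = find_param_column(columns, "U_Other Current")
missing = find_param_column(columns, "U_Missing Value")
print(aliased, plain, missing)
```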
find_column_for_key
find_column_for_key(key: str) -> Optional[str]

Locate an auxiliary-data column that matches a user-supplied key string.

Tries the key as-is, with colons replaced by spaces, and with spaces replaced by colons, performing a substring match against the portion of each column name that precedes any ' Alias:' suffix.

Parameters:

Name Type Description Default
key str

User-supplied string, e.g. 'Device:Variable' or 'Device Variable'.

required

Returns:

Type Description
str or None

The first matching column name (full, including any alias suffix), or None if no match is found.

Source code in ScanAnalysis/scan_analysis/base.py
def find_column_for_key(self, key: str) -> Optional[str]:
    """Locate an auxiliary-data column that matches a user-supplied key string.

    Tries the key as-is, with colons replaced by spaces, and with spaces
    replaced by colons, performing a substring match against the portion of
    each column name that precedes any ``' Alias:'`` suffix.

    Parameters
    ----------
    key : str
        User-supplied string, e.g. ``'Device:Variable'`` or ``'Device Variable'``.

    Returns
    -------
    str or None
        The first matching column name (full, including any alias suffix),
        or ``None`` if no match is found.
    """
    if self.auxiliary_data is None:
        logger.warning(
            "find_column_for_key called but auxiliary_data is not loaded."
        )
        return None

    candidates = {key, key.replace(":", " "), key.replace(" ", ":")}
    for column in self.auxiliary_data.columns:
        col_base = column.split(" Alias:")[0]
        if any(c in col_base for c in candidates):
            return column

    logger.warning(f"Could not find auxiliary_data column matching key: '{key}'")
    return None
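
A standalone sketch of the candidate matching, operating on a list of column names instead of self.auxiliary_data. find_column and the column names are hypothetical. The last lookup shows that the alias suffix itself is not searched.

```python
from typing import Optional


def find_column(columns: list[str], key: str) -> Optional[str]:
    """Return the first column whose base name contains any form of key."""
    # Try the key as given, in space form, and in colon form.
    candidates = {key, key.replace(":", " "), key.replace(" ", ":")}
    for column in columns:
        col_base = column.split(" Alias:")[0]
        if any(c in col_base for c in candidates):
            return column
    return None


columns = ["Shotnumber", "U_Example Position Alias:stage_x", "U_Other:Current"]
by_colon = find_column(columns, "U_Example:Position")
by_space = find_column(columns, "U_Other Current")
by_alias = find_column(columns, "stage_x")
print(by_colon, "|", by_space, "|", by_alias)
```

Because this is a substring match and the first hit wins, a short key such as "Shot" would match "Shotnumber"; more specific keys avoid accidental matches.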

Functions

testing_routine

testing_routine()

Simple dev sanity check.

Source code in ScanAnalysis/scan_analysis/base.py
def testing_routine():
    """Simple dev sanity check."""
    print(ScanData)