
Core Modules

ScanPaths API

geecs_data_utils.scan_paths.ScanPaths

ScanPaths(folder: Optional[SysPath] = None, tag: Optional[ScanTag] = None, base_directory: Union[Path, str, None] = None, read_mode: bool = True)

Represents a GEECS experiment scan.

Attributes:

Name Type Description
scan_info dict[str, str]

Dictionary containing scan configuration information loaded from scan info file

paths_config GeecsPathsConfig

Class-level configuration object for managing GEECS data paths

Initialize ScanPaths object.

Either a folder or a tag must be given to specify the location of the scan data folder. When only a tag is given, base_directory is optional and falls back to the base path from GeecsPathsConfig.

Parameters:

Name Type Description Default
folder Union[str, bytes, PathLike]

Data folder containing the scan data, e.g. "Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002".

None
tag Optional[ScanTag]

NamedTuple with the experiment name, date, and scan number

None
base_directory Optional[Union[Path, str]]

The base path for the data, e.g. "Z:/data/". If not given, or if the given path does not exist, defaults to the base path from GeecsPathsConfig.

None
read_mode bool

If True (the default), raise if the scan folder does not exist. If False, silently create the folder (including any missing parents).

read_mode=False is for scanner-side callers only — the GEECS scanner and BlueskyScanner, which legitimately bring new scan folders into existence. Analysis code (ScanAnalysis, ImageAnalysis, anything that consumes existing scans) must always leave this at the default. Silent creation from the consumer side has caused data loss: a transient SMB/NetApp visibility blip looks like a missing folder, and auto-creating it plants an empty directory over the real one.

True
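The two read_mode behaviors described above can be sketched without the library. In this minimal sketch, `resolve_scan_folder` is a hypothetical stand-in and not part of geecs_data_utils, and the use of `ValueError` is an assumption based on the exception that `get_latest_scan_tag` catches.

```python
import tempfile
from pathlib import Path


def resolve_scan_folder(folder: Path, read_mode: bool = True) -> Path:
    """Hypothetical stand-in for the read_mode check described above."""
    if folder.is_dir():
        return folder
    if read_mode:
        # Consumer side: a missing folder is an error, never something to create.
        # ValueError is an assumption; the real exception type is not shown here.
        raise ValueError(f"Scan folder does not exist: {folder}")
    # Scanner side only: create the folder and any missing parents.
    folder.mkdir(parents=True, exist_ok=True)
    return folder


with tempfile.TemporaryDirectory() as tmp:
    new_scan = Path(tmp) / "scans" / "Scan001"
    try:
        resolve_scan_folder(new_scan)  # default read_mode=True
        raised = False
    except ValueError:
        raised = True
    created = resolve_scan_folder(new_scan, read_mode=False).is_dir()
```

Analysis code would only ever take the first path: it either gets an existing folder or an exception it can report.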

Methods:

Name Description
reload_paths_config

Used by GEECS Scanner to fix scan_data_manager in case the experiment name has changed.

get_scan_tag

Return a ScanTag tuple given the appropriate information, formatted correctly.

get_scan_folder_path

Build the scan folder path for the given tag.

get_daily_scan_folder

Build path to the daily scan folder. If no tag given but experiment name given, uses the current day.

get_scan_analysis_folder_path

Build analysis folder path using the scan folder path as a baseline.

get_device_shot_path

Build the full path to a device's shot file based on the scan tag, device name, and shot number.

get_latest_scan_tag

Locate the last generated scan for the given day, defaulting to today if no date is provided.

get_next_scan_tag

Determine the next available scan tag for the given day or today if no date is provided.

get_next_scan_folder

Build the folder path for the next scan on the given day or today if no date is provided.

build_next_scan_data

Create the ScanPaths object for the next scan and build its folder.

is_background_scan

Check if the given scan tag references a scan that was designated as a background.

get_folder

Get the scan folder path.

get_tag

Get the scan tag.

get_tag_date

Get the scan date.

get_analysis_folder

Get the analysis folder path, creating it if necessary.

get_folders_and_files

Get lists of device folders and files in the scan directory.

load_scan_info

Load scan configuration information from the scan info file.

get_ecs_dump_file

Get the ECS Live Dump file corresponding to this scan.

build_device_file_map

Build a mapping from shot number to file path for a given device.

get_common_shot_dataframe

Generate a DataFrame containing file paths for the shot numbers common to all devices.

list_device_folders

Return device subfolder names from this scan folder.

device_folder

Resolve '/'.

build_asset_filename

Build canonical expected file naming.

build_asset_path

Full expected path for one asset.

infer_device_ext

Peek at up to max_files files to find proper file extension.

Source code in geecs_data_utils/scan_paths.py
def __init__(
    self,
    folder: Optional[SysPath] = None,
    tag: Optional[ScanTag] = None,
    base_directory: Union[Path, str, None] = None,
    read_mode: bool = True,
):
    """
    Initialize ScanPaths object.

    Either a folder or a tag must be given to specify the location of the scan data folder.

    Parameters
    ----------
    folder : Union[str, bytes, PathLike]
        Data folder containing the scan data, e.g. "Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002".
    tag : Optional[ScanTag]
        NamedTuple with the experiment name, date, and scan number
    base_directory : Optional[Union[Path, str]]
        The base path for the data, e.g. "Z:/data/".
        If not given, or if the given path does not exist, defaults to the base path from GeecsPathsConfig.
    read_mode: bool
        If True (the default), raise if the scan folder does not exist.
        If False, silently create the folder (including any missing parents).

        ``read_mode=False`` is for *scanner-side* callers only — the GEECS
        scanner and BlueskyScanner, which legitimately bring new scan folders
        into existence. Analysis code (ScanAnalysis, ImageAnalysis, anything
        that consumes existing scans) must always leave this at the default.
        Silent creation from the consumer side has caused data loss: a
        transient SMB/NetApp visibility blip looks like a missing folder, and
        auto-creating it plants an empty directory over the real one.
    """
    self.scan_info: dict[str, str] = {}

    self._folder: Optional[Path] = None
    self._tag: Optional[ScanTag] = None
    self._tag_date: Optional[date] = None
    self._analysis_folder: Optional[Path] = None

    # Handle folder initialization
    if folder is None and tag is not None:
        if base_directory is None or not Path(base_directory).exists():
            base_directory = ScanPaths.paths_config.base_path
        if not Path(base_directory).exists():
            raise NotADirectoryError(
                f"Error setting base directory: '{base_directory}'"
            )
        folder = self.get_scan_folder_path(tag, base_directory=base_directory)

    self._initialize_folders(folder, read_mode)

reload_paths_config classmethod

reload_paths_config(config_path: Optional[Path] = None, default_experiment: Optional[str] = None, set_base_path: Optional[Union[Path, str]] = None, image_analysis_configs_path: Optional[Union[Path, str]] = None)

Used by GEECS Scanner to fix scan_data_manager in case the experiment name has changed.

Source code in geecs_data_utils/scan_paths.py
@classmethod
def reload_paths_config(
    cls,
    config_path: Optional[Path] = None,
    default_experiment: Optional[str] = None,
    set_base_path: Optional[Union[Path, str]] = None,
    image_analysis_configs_path: Optional[Union[Path, str]] = None,
):
    """Used by GEECS Scanner to fix scan_data_manager in case the experiment name has changed."""
    try:
        if (
            config_path is None
        ):  # Then don't explicitly pass config_path so that it uses the default location
            cls.paths_config = GeecsPathsConfig(
                default_experiment=default_experiment,
                set_base_path=set_base_path,
                image_analysis_configs_path=image_analysis_configs_path,
            )
        else:
            cls.paths_config = GeecsPathsConfig(
                config_path=config_path,
                default_experiment=default_experiment,
                set_base_path=set_base_path,
                image_analysis_configs_path=image_analysis_configs_path,
            )
    except ConfigurationError as e:
        logger.error(f"Configuration Error in ScanData: {e}")
        cls.paths_config = None

get_scan_tag staticmethod

get_scan_tag(year: Union[int, str], month: Union[int, str], day: Union[int, str], number: Union[int, str], experiment: Optional[str] = None, experiment_name: Optional[str] = None) -> ScanTag

Return a ScanTag tuple given the appropriate information, formatted correctly.

Ideally one should only build ScanTag objects using this function.

Parameters:

Name Type Description Default
year Union[int, str]

Target scan year

required
month Union[int, str]

Target scan month

required
day Union[int, str]

Target scan day

required
number Union[int, str]

Target scan number

required
experiment str

Target scan's experiment name

None
experiment_name str

Target scan's experiment name (deprecated)

None

Returns:

Type Description
ScanTag

properly formatted information to describe the target scan
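The normalization that get_scan_tag applies can be illustrated with a standalone sketch. `TagSketch` stands in for ScanTag using the field names shown in the source, and `month_to_int_sketch` is a hypothetical replacement for the library's `month_to_int` helper, whose exact accepted inputs are an assumption here.

```python
import calendar
from typing import NamedTuple, Optional, Union


class TagSketch(NamedTuple):
    """Stand-in for ScanTag, using the field names shown in the source."""
    year: int
    month: int
    day: int
    number: int
    experiment: Optional[str]


def month_to_int_sketch(month: Union[int, str]) -> int:
    """Assumed behavior: accept 5, "5", "05", or a name such as "May"."""
    text = str(month).strip()
    if text.isdigit():
        return int(text)
    abbreviations = [name.lower() for name in calendar.month_abbr]
    return abbreviations.index(text[:3].lower())


def make_tag(year, month, day, number, experiment=None) -> TagSketch:
    year = int(year)
    if 0 <= year <= 99:  # two-digit years are read as 20xx
        year += 2000
    return TagSketch(year, month_to_int_sketch(month), int(day), int(number), experiment)


tag = make_tag("23", "May", "1", "2", experiment="Undulator")
```

Because every field is coerced to `int`, tags built this way compare equal regardless of whether the caller passed strings or integers, which is one reason to prefer the factory over constructing the tuple by hand.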

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_scan_tag(
    year: Union[int, str],
    month: Union[int, str],
    day: Union[int, str],
    number: Union[int, str],
    experiment: Optional[str] = None,
    experiment_name: Optional[str] = None,
) -> ScanTag:
    """
    Return a ScanTag tuple given the appropriate information, formatted correctly.

    Ideally one should only build ScanTag objects using this function.

    Parameters
    ----------
    year : Union[int, str]
        Target scan year
    month : Union[int, str]
        Target scan month
    day : Union[int, str]
        Target scan day
    number : Union[int, str]
        Target scan number
    experiment : str
        Target scan's experiment name
    experiment_name : str
        Target scan's experiment name (deprecated)

    Returns
    -------
    ScanTag
        properly formatted information to describe the target scan
    """
    year = int(year)
    if 0 <= year <= 99:
        year += 2000
    month = month_to_int(month)

    exp = experiment or experiment_name or ScanPaths.paths_config.experiment
    if experiment_name is not None:
        logger.warning(
            "Recommended to use 'experiment' instead of 'experiment_name' for 'get_scan_tag'..."
        )

    return ScanTag(
        year=year, month=month, day=int(day), number=int(number), experiment=exp
    )

get_scan_folder_path staticmethod

get_scan_folder_path(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> Path

Build the scan folder path for the given tag.

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_scan_folder_path(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> Path:
    """Build the scan folder path for the given tag."""
    return (
        ScanPaths.get_daily_scan_folder(tag=tag, base_directory=base_directory)
        / f"Scan{tag.number:03d}"
    )

get_daily_scan_folder staticmethod

get_daily_scan_folder(experiment: str = None, tag: ScanTag = None, base_directory: Optional[Union[Path, str]] = None) -> Path

Build path to the daily scan folder. If no tag given but experiment name given, uses the current day.
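The folder layout this method produces can be reproduced in a few lines. The sketch below is a standalone illustration, not the library function: `daily_scan_folder` is a hypothetical name, and `PurePosixPath` is used only so the result is the same on every platform.

```python
import calendar
from pathlib import PurePosixPath


def daily_scan_folder(base: str, experiment: str, year: int, month: int, day: int) -> PurePosixPath:
    month_folder = f"{month:02d}-{calendar.month_name[month][:3]}"  # e.g. "05-May"
    day_folder = f"{str(year)[-2:]}_{month:02d}{day:02d}"           # e.g. "23_0501"
    return PurePosixPath(base) / experiment / f"Y{year}" / month_folder / day_folder / "scans"


# Appending the zero-padded scan name gives the full scan folder.
scan_folder = daily_scan_folder("Z:/data", "Undulator", 2023, 5, 1) / f"Scan{2:03d}"
```

The result matches the example path given for the `folder` constructor argument, "Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002".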

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_daily_scan_folder(
    experiment: str = None,
    tag: ScanTag = None,
    base_directory: Optional[Union[Path, str]] = None,
) -> Path:
    """Build path to the daily scan folder. If no tag given but experiment name given, uses the current day."""
    base = base_directory or ScanPaths.paths_config.base_path

    if tag is None and experiment is None:
        raise ValueError(
            "Need to give experiment name or Scan Tag to `get_daily_scan_folder`"
        )

    if tag is None:
        today = datetime.today()
        tag = ScanPaths.get_scan_tag(
            today.year,
            month=today.month,
            day=today.day,
            number=0,
            experiment=experiment,
        )

    folder = Path(base) / tag.experiment if tag.experiment else Path(base)
    folder = (
        folder / f"Y{tag.year}" / f"{tag.month:02d}-{cal.month_name[tag.month][:3]}"
    )
    folder /= f"{str(tag.year)[-2:]}_{tag.month:02d}{tag.day:02d}"
    folder = folder / "scans"

    return folder

get_scan_analysis_folder_path staticmethod

get_scan_analysis_folder_path(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> Path

Build analysis folder path using the scan folder path as a baseline.
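The analysis path is derived by swapping one path component. A minimal sketch of that step, with `analysis_folder_for` as a hypothetical name and an illustrative POSIX path:

```python
from pathlib import PurePosixPath


def analysis_folder_for(scan_folder: PurePosixPath) -> PurePosixPath:
    parts = list(scan_folder.parts)
    parts[-2] = "analysis"  # replaces whatever sits above ScanNNN, normally "scans"
    return PurePosixPath(*parts)


analysis = analysis_folder_for(
    PurePosixPath("/data/Undulator/Y2023/05-May/23_0501/scans/Scan002")
)
```

The replacement is positional, so it assumes the scan folder's parent is the "scans" directory; a folder passed in from elsewhere would have its parent component renamed regardless of what it was called.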

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_scan_analysis_folder_path(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> Path:
    """Build analysis folder path using the scan folder path as a baseline."""
    scan_folder_path = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )

    parts = list(scan_folder_path.parts)
    parts[-2] = "analysis"
    return Path(*parts)

get_device_shot_path staticmethod

get_device_shot_path(tag: ScanTag, device_name: str, shot_number: int, file_extension: str = 'png', base_directory: Optional[Union[Path, str]] = None) -> Path

Build the full path to a device's shot file based on the scan tag, device name, and shot number.

Parameters:

Name Type Description Default
tag ScanTag

The scan tag containing year, month, day, and scan number.

required
device_name str

The name of the device.

required
shot_number int

The shot number.

required
file_extension str

File extension for the shot file (default: 'png').

'png'
base_directory Optional[Union[Path, str]]

Base directory for the scan (default: CONFIG.local_base_path). The experiment name is read from tag.experiment; the function takes no separate experiment argument.

None

Returns:

Type Description
Path

The full path to the device's shot file.
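The file name convention and the extension handling can be sketched on their own. `shot_filename` is a hypothetical helper that mirrors the formatting shown in the source for this method.

```python
def shot_filename(scan_number: int, device_name: str, shot_number: int,
                  file_extension: str = "png") -> str:
    # A leading dot is added only when the extension contains no dot at all.
    extension = file_extension if "." in file_extension else "." + file_extension
    return f"Scan{scan_number:03d}_{device_name}_{shot_number:03d}{extension}"


png_name = shot_filename(2, "UC_ALineEBeam3", 7)
tdms_name = shot_filename(2, "UC_ALineEBeam3", 7, ".tdms")
```

One consequence of the "contains a dot" check is that a compound extension passed without its leading dot, such as "tar.gz", would be appended as-is, so it is safest to pass either a bare extension ("png") or a fully dotted one (".tar.gz").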

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_device_shot_path(
    tag: ScanTag,
    device_name: str,
    shot_number: int,
    file_extension: str = "png",
    base_directory: Optional[Union[Path, str]] = None,
) -> Path:
    """
    Build the full path to a device's shot file based on the scan tag, device name, and shot number.

    Parameters
    ----------
    tag : ScanTag
        The scan tag containing year, month, day, and scan number.
    device_name : str
        The name of the device.
    shot_number : int
        The shot number.
    file_extension : str, optional
        File extension for the shot file (default: 'png').
    base_directory : Optional[Union[Path, str]], optional
        Base directory for the scan (default: CONFIG.local_base_path).
        The experiment name is read from ``tag.experiment``; the function
        takes no separate experiment argument.

    Returns
    -------
    Path
        The full path to the device's shot file.
    """
    scan_path = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )
    extension = (
        "." + file_extension if "." not in file_extension else file_extension
    )
    file = (
        scan_path
        / f"{device_name}"
        / f"Scan{tag.number:03d}_{device_name}_{shot_number:03d}{extension}"
    )
    return file

get_latest_scan_tag staticmethod

get_latest_scan_tag(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> Optional[ScanTag]

Locate the last generated scan for the given day, defaulting to today if no date is provided.

Parameters:

Name Type Description Default
experiment Optional[str]

Experiment name (default: CONFIG.experiment).

None
year Optional[int]

Year of the scan (4-digit, default: current year if not provided).

None
month Optional[int]

Month of the scan (1-12, default: current month if not provided).

None
day Optional[int]

Day of the scan (1-31, default: current day if not provided).

None

Returns:

Type Description
Optional[ScanTag]

The ScanTag representing the latest scan folder, or None if no scans exist for the given day.
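The search strategy is a linear probe: try Scan001, Scan002, and so on until one is missing. The sketch below shows the same idea against a temporary directory; `latest_scan_number` is a hypothetical helper that checks folders directly rather than constructing ScanPaths objects as the library does.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_scan_number(daily_folder: Path) -> Optional[int]:
    number = 1
    # Stop at the first scan folder that does not exist.
    while (daily_folder / f"Scan{number:03d}").is_dir():
        number += 1
    return number - 1 if number > 1 else None


with tempfile.TemporaryDirectory() as tmp:
    scans = Path(tmp)
    empty_day = latest_scan_number(scans)
    for n in (1, 2, 4):  # Scan003 is deliberately missing
        (scans / f"Scan{n:03d}").mkdir()
    latest = latest_scan_number(scans)
```

Because the probe stops at the first gap, a day with Scan001, Scan002, and Scan004 reports scan 2 as the latest. Any caller that relies on this for numbering new scans should keep scan folders contiguous.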

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_latest_scan_tag(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> Optional[ScanTag]:
    """
    Locate the last generated scan for the given day, defaulting to today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    Optional[ScanTag]
        The ScanTag representing the latest scan folder, or None if no scans exist for the given day.
    """
    today = datetime.today()
    year = year or today.year
    month = month or today.month
    day = day or today.day

    i = 1
    while True:
        tag = ScanPaths.get_scan_tag(year, month, day, i, experiment=experiment)
        try:
            ScanPaths(tag=tag, read_mode=True, base_directory=base_directory)
        except ValueError:
            break
        i += 1

    if i == 1:
        return None  # No scans exist for the given day
    return ScanPaths.get_scan_tag(year, month, day, i - 1, experiment=experiment)

get_next_scan_tag staticmethod

get_next_scan_tag(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> ScanTag

Determine the next available scan tag for the given day or today if no date is provided.

Parameters:

Name Type Description Default
experiment Optional[str]

Experiment name (default: CONFIG.experiment).

None
year Optional[int]

Year of the scan (4-digit, default: current year if not provided).

None
month Optional[int]

Month of the scan (1-12, default: current month if not provided).

None
day Optional[int]

Day of the scan (1-31, default: current day if not provided).

None

Returns:

Type Description
ScanTag

The ScanTag for the next available scan.

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_next_scan_tag(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> ScanTag:
    """
    Determine the next available scan tag for the given day or today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    ScanTag
        The ScanTag for the next available scan.
    """
    latest_tag = ScanPaths.get_latest_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    if not latest_tag:
        today = datetime.today()
        year = year or today.year
        month = month or today.month
        day = day or today.day
        return ScanPaths.get_scan_tag(year, month, day, 1, experiment=experiment)

    return ScanPaths.get_scan_tag(
        latest_tag.year,
        latest_tag.month,
        latest_tag.day,
        latest_tag.number + 1,
        experiment=experiment,
    )

get_next_scan_folder staticmethod

get_next_scan_folder(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> Path

Build the folder path for the next scan on the given day or today if no date is provided.

Parameters:

Name Type Description Default
experiment Optional[str]

Experiment name (default: CONFIG.experiment).

None
year Optional[int]

Year of the scan (4-digit, default: current year if not provided).

None
month Optional[int]

Month of the scan (1-12, default: current month if not provided).

None
day Optional[int]

Day of the scan (1-31, default: current day if not provided).

None

Returns:

Type Description
Path

The Path to the folder for the next scan.

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def get_next_scan_folder(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> Path:
    """
    Build the folder path for the next scan on the given day or today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    Path
        The Path to the folder for the next scan.
    """
    next_tag = ScanPaths.get_next_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    return ScanPaths.get_scan_folder_path(
        tag=next_tag, base_directory=base_directory
    )

build_next_scan_data staticmethod

build_next_scan_data(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> ScanPaths

Create the ScanPaths object for the next scan and build its folder.

Parameters:

Name Type Description Default
experiment Optional[str]

Experiment name (default: CONFIG.experiment).

None
year Optional[int]

Year of the scan (4-digit, default: current year if not provided).

None
month Optional[int]

Month of the scan (1-12, default: current month if not provided).

None
day Optional[int]

Day of the scan (1-31, default: current day if not provided).

None

Returns:

Type Description
ScanPaths

The ScanPaths object for the next scan.

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def build_next_scan_data(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> ScanPaths:
    """
    Create the ScanPaths object for the next scan and build its folder.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    ScanPaths
        The ScanPaths object for the next scan.
    """
    next_tag = ScanPaths.get_next_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    return ScanPaths(tag=next_tag, read_mode=False, base_directory=base_directory)

is_background_scan staticmethod

is_background_scan(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> bool

Check if the given scan tag references a scan that was designated as a background.

Parameters:

Name Type Description Default
tag ScanTag

The scan tag containing year, month, day, and scan number.

required
base_directory Optional[Union[Path, str]]

Base directory for the scan (default: CONFIG.local_base_path).

None

Returns:

Type Description
bool

True if scan was explicitly set as a Background scan, False otherwise
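The check reads the "Scan Info" section of the scan's .ini file and compares the Background value, including its quotation marks, against `"true"`. A standalone sketch of that comparison, with `is_background` as a hypothetical helper operating on text instead of a file, and the .ini contents made up for illustration:

```python
from configparser import ConfigParser


def is_background(ini_text: str) -> bool:
    config = ConfigParser()
    config.read_string(ini_text)
    if config.has_section("Scan Info") and config.has_option("Scan Info", "Background"):
        # The value is compared with its quotation marks still attached.
        return config.get("Scan Info", "Background").strip().lower() == '"true"'
    return False


quoted = is_background('[Scan Info]\nBackground = "True"\n')
unquoted = is_background("[Scan Info]\nBackground = True\n")
missing = is_background("[Scan Info]\nScan No = 5\n")
```

Only the quoted form counts as a background scan under this comparison; an unquoted `True` or a missing option both yield False.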

Source code in geecs_data_utils/scan_paths.py
@staticmethod
def is_background_scan(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> bool:
    """
    Check if the given scan tag references a scan that was designated as a background.

    Parameters
    ----------
    tag : ScanTag
        The scan tag containing year, month, day, and scan number.
    base_directory : Optional[Union[Path, str]], optional
        Base directory for the scan (default: CONFIG.local_base_path).

    Returns
    -------
    bool
        True if scan was explicitly set as a Background scan, False otherwise
    """
    scan_folder = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )
    config_filename = scan_folder / f"ScanInfoScan{tag.number:03d}.ini"

    config = ConfigParser()
    config.read(config_filename)

    if config.has_section("Scan Info") and config.has_option(
        "Scan Info", "Background"
    ):
        return config.get("Scan Info", "Background").strip().lower() == '"true"'
    return False

get_folder

get_folder() -> Optional[Path]

Get the scan folder path.

Source code in geecs_data_utils/scan_paths.py
def get_folder(self) -> Optional[Path]:
    """Get the scan folder path."""
    return self._folder

get_tag

get_tag() -> Optional[ScanTag]

Get the scan tag.

Source code in geecs_data_utils/scan_paths.py
def get_tag(self) -> Optional[ScanTag]:
    """Get the scan tag."""
    return self._tag

get_tag_date

get_tag_date() -> Optional[date]

Get the scan date.

Source code in geecs_data_utils/scan_paths.py
def get_tag_date(self) -> Optional[date]:
    """Get the scan date."""
    return self._tag_date

get_analysis_folder

get_analysis_folder() -> Optional[Path]

Get the analysis folder path, creating it if necessary.

Source code in geecs_data_utils/scan_paths.py
def get_analysis_folder(self) -> Optional[Path]:
    """Get the analysis folder path, creating it if necessary."""
    if self._analysis_folder is None:
        parts = list(Path(self._folder).parts)
        parts[-2] = "analysis"
        self._analysis_folder = Path(*parts)
        if not self._analysis_folder.is_dir():
            os.makedirs(self._analysis_folder)

    return self._analysis_folder

get_folders_and_files

get_folders_and_files() -> dict[str, list[str]]

Get lists of device folders and files in the scan directory.
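The listing covers only the top level of the scan folder. A small sketch of the underlying `os.walk` idiom, using a temporary directory with made-up device and file names:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp)
    (scan / "UC_ALineEBeam3").mkdir()
    (scan / "UC_ALineEBeam3" / "Scan005_UC_ALineEBeam3_001.png").touch()
    (scan / "ScanInfoScan005.ini").touch()

    # next(os.walk(...)) yields only the top level: (dirpath, dirnames, filenames)
    _, dirnames, filenames = next(os.walk(scan))
    listing = {"devices": dirnames, "files": filenames}
```

Files inside device subfolders do not appear in "files"; every top-level subdirectory is reported under "devices".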

Source code in geecs_data_utils/scan_paths.py
def get_folders_and_files(self) -> dict[str, list[str]]:
    """Get lists of device folders and files in the scan directory."""
    top_content = next(os.walk(self._folder))
    return {"devices": top_content[1], "files": top_content[2]}

load_scan_info

load_scan_info()

Load scan configuration information from the scan info file.
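Two details of the parsing are easy to miss: option names keep their original case, and surrounding quotes are stripped from values. The sketch below shows both with the standard-library `ConfigParser`; the .ini contents are made up for illustration, apart from the "Scan Info" section name.

```python
from configparser import ConfigParser

ini_text = '[Scan Info]\nScan No = "5"\nBackground = "False"\n'

parser = ConfigParser()
parser.optionxform = str  # keep option names case-sensitive instead of lowercasing them
parser.read_string(ini_text)

# Strip single or double quotes from both ends of every value.
scan_info = {key: value.strip("'\"") for key, value in parser.items("Scan Info")}
```

All values remain strings after loading, so numeric or boolean fields need an explicit conversion by the caller.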

Source code in geecs_data_utils/scan_paths.py
def load_scan_info(self):
    """Load scan configuration information from the scan info file."""
    config_parser = ConfigParser()
    config_parser.optionxform = str

    try:
        config_parser.read(self._folder / f"ScanInfoScan{self._tag.number:03d}.ini")
        self.scan_info.update(
            {
                key: value.strip("'\"")
                for key, value in config_parser.items("Scan Info")
            }
        )
    except NoSectionError:
        temp_scan_data = inspect.stack()[0][3]
        logging.warning(
            f'ScanInfo file does not have a "Scan Info" section (in {temp_scan_data})'
        )

    return self.scan_info

get_ecs_dump_file

get_ecs_dump_file() -> Optional[Path]

Get the ECS Live Dump file corresponding to this scan.

Returns:

Type Description
Optional[Path]

Path to the ECS dump file if it exists, else None.
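The dump file lives next to the "scans" directory rather than inside the scan folder. A standalone sketch of the path construction, with `ecs_dump_path` as a hypothetical helper and an illustrative POSIX path (the library additionally checks that the file exists):

```python
from pathlib import PurePosixPath


def ecs_dump_path(scan_folder: PurePosixPath, scan_number: int) -> PurePosixPath:
    # Two levels up from .../scans/ScanNNN is the daily folder.
    # The file name uses the bare scan number, without zero padding.
    return scan_folder.parent.parent / "ECS Live dumps" / f"Scan{scan_number}.txt"


dump = ecs_dump_path(
    PurePosixPath("/data/Undulator/Y2023/05-May/23_0501/scans/Scan002"), 2
)
```

Note the difference from the scan folder naming: the folder is "Scan002" while the dump file is "Scan2.txt".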

Source code in geecs_data_utils/scan_paths.py
def get_ecs_dump_file(self) -> Optional[Path]:
    """
    Get the ECS Live Dump file corresponding to this scan.

    Returns
    -------
    Optional[Path]
        Path to the ECS dump file if it exists, else None.
    """
    if not self._folder:
        return None

    ecs_folder = self._folder.parent.parent / "ECS Live dumps"
    filename = f"Scan{self._tag.number}.txt"
    ecs_file = ecs_folder / filename

    return ecs_file if ecs_file.exists() else None

build_device_file_map

build_device_file_map(device: str, file_tail: str, *, device_file_stem: Optional[str] = None) -> dict[int, Path]

Build a mapping from shot number to file path for a given device.

Parameters:

Name Type Description Default
device str

Device name; also the subfolder of the scan directory containing the files.

required
file_tail str

Suffix and extension, e.g., '.png', '_avg.h5'.

required
device_file_stem str

Token used in the filename between Scan<NNN>_ and _<shot>. Defaults to device. Use this when the folder name and the in-filename stem differ — for example, folder U_BCaveMagSpec-interpSpec containing files named Scan042_U_BCaveMagSpec_001.csv.

None

Returns:

Type Description
dict[int, Path]

Mapping from shot number to file path.
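The effect of device_file_stem is easiest to see on a list of file names. The sketch below applies the same regular expression as the source to plain strings; `map_shots` is a hypothetical helper, and the real method walks the device folder on disk instead.

```python
import re
from typing import Iterable, Optional


def map_shots(names: Iterable[str], device: str, file_tail: str,
              device_file_stem: Optional[str] = None) -> dict[int, str]:
    stem = device_file_stem if device_file_stem is not None else device
    pattern = re.compile(
        rf"Scan\d{{3,}}_{re.escape(stem)}_(\d{{3,}}){re.escape(file_tail)}$"
    )
    shots = {}
    for name in names:
        match = pattern.match(name)
        if match:
            shots[int(match.group(1))] = name  # shot number parsed from the file name
    return shots


names = ["Scan042_U_BCaveMagSpec_001.csv", "Scan042_U_BCaveMagSpec_002.csv", "notes.txt"]
by_device = map_shots(names, "U_BCaveMagSpec-interpSpec", ".csv")
by_stem = map_shots(names, "U_BCaveMagSpec-interpSpec", ".csv",
                    device_file_stem="U_BCaveMagSpec")
```

With the folder name used as the stem nothing matches, while passing the in-filename stem recovers both shots.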

Source code in geecs_data_utils/scan_paths.py
def build_device_file_map(
    self,
    device: str,
    file_tail: str,
    *,
    device_file_stem: Optional[str] = None,
) -> dict[int, Path]:
    """
    Build a mapping from shot number to file path for a given device.

    Parameters
    ----------
    device : str
        Device name; also the subfolder of the scan directory containing
        the files.
    file_tail : str
        Suffix and extension, e.g., '.png', '_avg.h5'.
    device_file_stem : str, optional
        Token used in the filename between ``Scan<NNN>_`` and ``_<shot>``.
        Defaults to ``device``. Use this when the folder name and the
        in-filename stem differ — for example, folder
        ``U_BCaveMagSpec-interpSpec`` containing files named
        ``Scan042_U_BCaveMagSpec_001.csv``.

    Returns
    -------
    dict[int, Path]
        Mapping from shot number to file path.
    """
    base_path = self.get_folder()
    if not base_path:
        raise ValueError("Scan folder is not set.")

    device_folder = base_path / device
    if not device_folder.exists():
        logger.warning(f"Device folder missing: {device_folder}")
        return {}

    stem = device_file_stem if device_file_stem is not None else device
    pattern = re.compile(
        rf"Scan\d{{3,}}_{re.escape(stem)}_(\d{{3,}}){re.escape(file_tail)}$"
    )

    file_map = {}
    for file in device_folder.iterdir():
        if not file.is_file():
            continue
        match = pattern.match(file.name)
        if match:
            shot_number = int(match.group(1))
            file_map[shot_number] = file

    return file_map
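
The matching rule can be tried in isolation. The sketch below is a standalone approximation, not the library code: `map_shots` is a hypothetical helper that applies the same regular expression to a temporary folder, so the effect of `device_file_stem` is visible without a real scan on disk.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def map_shots(folder: Path, device: str, file_tail: str,
              device_file_stem: Optional[str] = None) -> dict[int, Path]:
    """Hypothetical stand-in that mirrors the regex used by build_device_file_map."""
    stem = device_file_stem if device_file_stem is not None else device
    pattern = re.compile(
        rf"Scan\d{{3,}}_{re.escape(stem)}_(\d{{3,}}){re.escape(file_tail)}$"
    )
    found = {}
    for f in folder.iterdir():
        m = pattern.match(f.name)
        if f.is_file() and m:
            found[int(m.group(1))] = f
    return found


with tempfile.TemporaryDirectory() as tmp:
    # Folder name and in-filename stem differ, as in the docstring example.
    dev_dir = Path(tmp) / "U_BCaveMagSpec-interpSpec"
    dev_dir.mkdir()
    for shot in (1, 2, 3):
        (dev_dir / f"Scan042_U_BCaveMagSpec_{shot:03d}.csv").touch()

    without_stem = map_shots(dev_dir, "U_BCaveMagSpec-interpSpec", ".csv")
    with_stem = map_shots(dev_dir, "U_BCaveMagSpec-interpSpec", ".csv",
                          device_file_stem="U_BCaveMagSpec")
    shots_without = sorted(without_stem)
    shots_with = sorted(with_stem)
```

Without the stem override the folder name is used as the filename token and nothing matches; with it, all three shots are found.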

get_common_shot_dataframe

get_common_shot_dataframe(device_file_specs: Sequence[tuple[str, str]]) -> pd.DataFrame

Generate a DataFrame of file paths for the shot numbers common to all devices.

This method identifies shot numbers that are common (present in all specified devices' subfolders) and returns a table where each row corresponds to a shot number, and each column contains the full path to the file for that device.

Parameters:

Name Type Description Default
device_file_specs Sequence[tuple[str, str]]

A sequence of (device_name, file_tail) pairs.

  • device_name is the name of the subdirectory inside the scan folder.
  • file_tail is the suffix used in the filename, including extension, such as '.png', '_avg.h5', or '.tdms'.

required

Returns:

Type Description
DataFrame

A DataFrame with one row per shot number that exists for all devices. Columns:

  • 'shot_number': The shot number (int).
  • One column per device name, with each entry as a Path object to the matching file.

If no common shots are found, an empty DataFrame with appropriate columns is returned.

Examples:

>>> tag = ScanTag(year=2025, month=8, day=7, number=5, experiment='Undulator')
>>> sd = ScanPaths(tag=tag)
>>> dev_list = [
...     ('Z_Test_Scope', '.dat'),
...     ('Z_Test_Scope_2', '.dat'),
...     ('UC_ALineEBeam3', '.png')
... ]
>>> common_shots = sd.get_common_shot_dataframe(dev_list)
Source code in geecs_data_utils/scan_paths.py
def get_common_shot_dataframe(
    self, device_file_specs: Sequence[tuple[str, str]]
) -> pd.DataFrame:
    """
    Generate a DataFrame of file paths for the shot numbers common to all devices.

    This method identifies shot numbers that are common (present in all specified
    devices' subfolders) and returns a table where each row corresponds to a shot
    number, and each column contains the full path to the file for that device.

    Parameters
    ----------
    device_file_specs : Sequence[tuple[str, str]]
        A sequence of (device_name, file_tail) pairs.
        - `device_name` is the name of the subdirectory inside the scan folder.
        - `file_tail` is the suffix used in the filename, including extension,
          such as '.png', '_avg.h5', or '.tdms'.

    Returns
    -------
    pd.DataFrame
        A DataFrame with one row per shot number that exists for all devices.
        Columns:
        - 'shot_number': The shot number (int).
        - One column per device name, with each entry as a `Path` object to the matching file.
        If no common shots are found, an empty DataFrame with appropriate columns is returned.

    Examples
    --------
    >>> tag = ScanTag(year=2025, month=8, day=7, number=5, experiment='Undulator')
    >>> sd = ScanPaths(tag=tag)
    >>> dev_list = [
    ...     ('Z_Test_Scope', '.dat'),
    ...     ('Z_Test_Scope_2', '.dat'),
    ...     ('UC_ALineEBeam3', '.png')
    ... ]
    >>> common_shots = sd.get_common_shot_dataframe(dev_list)
    """
    device_maps = {
        device: self.build_device_file_map(device, file_tail)
        for device, file_tail in device_file_specs
    }

    # Find common shot numbers across all devices
    common_shots = set.intersection(*(set(m.keys()) for m in device_maps.values()))
    if not common_shots:
        logger.warning("No common shots found across specified devices.")
        return pd.DataFrame(
            columns=["shot_number"] + [device for device, _ in device_file_specs]
        )

    # Build rows: one per shot
    rows = []
    for shot in sorted(common_shots):
        row = {"shot_number": shot}
        for device, file_map in device_maps.items():
            row[device] = file_map[shot]
        rows.append(row)

    return pd.DataFrame(rows)
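
The intersection step can be sketched with plain dictionaries. This is an illustrative stand-in, not the library method: `common_shot_rows` and the file maps are made up, and the rows are returned as a list of dicts instead of a DataFrame so the example needs only the standard library. It also guards the empty case, because `set.intersection` called with no arguments (as would happen in the listing above for an empty `device_file_specs`) raises `TypeError`.

```python
from pathlib import Path


def common_shot_rows(device_maps: dict[str, dict[int, Path]]) -> list[dict]:
    """Keep only shots present for every device; one row per shot."""
    if not device_maps:
        # set.intersection() with no arguments raises TypeError, so guard it.
        return []
    common = set.intersection(*(set(m) for m in device_maps.values()))
    return [
        {"shot_number": shot, **{dev: m[shot] for dev, m in device_maps.items()}}
        for shot in sorted(common)
    ]


# Made-up file maps: the scope saved shots 1-3, the camera only shots 2-4.
maps = {
    "Z_Test_Scope": {n: Path(f"Scan005_Z_Test_Scope_{n:03d}.dat") for n in (1, 2, 3)},
    "UC_ALineEBeam3": {n: Path(f"Scan005_UC_ALineEBeam3_{n:03d}.png") for n in (2, 3, 4)},
}
rows = common_shot_rows(maps)
common_numbers = [row["shot_number"] for row in rows]
```

The resulting list of rows has the same shape the method passes to `pd.DataFrame(rows)`.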

list_device_folders

list_device_folders() -> list[str]

Return device subfolder names from this scan folder.

Source code in geecs_data_utils/scan_paths.py
def list_device_folders(self) -> list[str]:
    """Return device subfolder names from this scan folder."""
    try:
        return self.get_folders_and_files().get("devices", [])
    except Exception:
        root = self.get_folder()
        return (
            [p.name for p in root.iterdir() if p.is_dir()]
            if root and root.exists()
            else []
        )

device_folder

device_folder(device: str) -> Path

Resolve '<scan>/<device>'.

Source code in geecs_data_utils/scan_paths.py
def device_folder(self, device: str) -> Path:
    """Resolve '<scan>/<device>'."""
    return self.get_folder() / device

build_asset_filename staticmethod

build_asset_filename(*, scan: int, shot: int, device: str, ext: str, variant: Optional[str] = None, device_file_stem: Optional[str] = None) -> str

Build the canonical expected filename for one asset.

Parameters:

Name Type Description Default
scan int

Scan and shot numbers.

required
shot int

Scan and shot numbers.

required
device str

Device name (also the in-filename stem unless overridden).

required
ext str

File extension (with or without leading dot).

required
variant str

Variant segment appended directly after the shot index, with no separator added (include any leading underscore yourself, e.g. '_avg').

None
device_file_stem str

Token to use in the filename between Scan<NNN>_ and _<shot>. Defaults to device. Use this when the folder name and the in-filename stem differ — for example, folder U_BCaveMagSpec-interpSpec containing files named Scan042_U_BCaveMagSpec_001.csv.

None
Source code in geecs_data_utils/scan_paths.py
@staticmethod
def build_asset_filename(
    *,
    scan: int,
    shot: int,
    device: str,
    ext: str,
    variant: Optional[str] = None,
    device_file_stem: Optional[str] = None,
) -> str:
    """Build canonical expected file naming.

    Parameters
    ----------
    scan, shot
        Scan and shot numbers.
    device : str
        Device name (also the in-filename stem unless overridden).
    ext : str
        File extension (with or without leading dot).
    variant : str, optional
        Variant segment appended after the shot index.
    device_file_stem : str, optional
        Token to use in the filename between ``Scan<NNN>_`` and
        ``_<shot>``. Defaults to ``device``. Use this when the folder
        name and the in-filename stem differ — for example, folder
        ``U_BCaveMagSpec-interpSpec`` containing files named
        ``Scan042_U_BCaveMagSpec_001.csv``.
    """
    ext = ext.lstrip(".").lower()
    shot_str = ScanPaths._shot_str(shot)
    variant_seg = "" if not variant else f"{variant}"
    stem = device_file_stem if device_file_stem is not None else device
    return f"Scan{scan:03d}_{stem}_{shot_str}{variant_seg}.{ext}"
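
A minimal standalone sketch of the naming rule follows. `asset_filename` is a hypothetical re-implementation for illustration; in particular, the three-digit zero padding of the shot number is an assumption about `_shot_str`, which is not shown on this page (it is consistent with the `Scan042_U_BCaveMagSpec_001.csv` example).

```python
from typing import Optional


def asset_filename(scan: int, shot: int, device: str, ext: str,
                   variant: Optional[str] = None,
                   device_file_stem: Optional[str] = None) -> str:
    """Standalone approximation of build_asset_filename."""
    ext = ext.lstrip(".").lower()
    shot_str = f"{shot:03d}"  # assumption: _shot_str zero-pads to three digits
    variant_seg = variant or ""  # appended verbatim, with no separator added
    stem = device_file_stem if device_file_stem is not None else device
    return f"Scan{scan:03d}_{stem}_{shot_str}{variant_seg}.{ext}"


plain = asset_filename(42, 1, "U_BCaveMagSpec-interpSpec", ".CSV",
                       device_file_stem="U_BCaveMagSpec")
averaged = asset_filename(5, 12, "UC_ALineEBeam3", "h5", variant="_avg")
```

Note that the extension is lowercased and that the variant must carry its own leading underscore.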

build_asset_path

build_asset_path(*, shot: int, device: str, ext: str, variant: Optional[str] = None, device_file_stem: Optional[str] = None) -> Path

Full expected path for one asset.

device is the subfolder name. device_file_stem overrides the in-filename token if it differs from the folder name (defaults to device). See :meth:build_asset_filename for details.

Source code in geecs_data_utils/scan_paths.py
def build_asset_path(
    self,
    *,
    shot: int,
    device: str,
    ext: str,
    variant: Optional[str] = None,
    device_file_stem: Optional[str] = None,
) -> Path:
    """Full expected path for one asset.

    ``device`` is the subfolder name. ``device_file_stem`` overrides the
    in-filename token if it differs from the folder name (defaults to
    ``device``). See :meth:`build_asset_filename` for details.
    """
    tag = self.get_tag()
    fname = self.build_asset_filename(
        scan=tag.number,
        shot=shot,
        device=device,
        ext=ext,
        variant=variant,
        device_file_stem=device_file_stem,
    )
    return self.device_folder(device) / fname

infer_device_ext

infer_device_ext(device: str, *, max_files: int = 5) -> str

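Peek at up to max_files files to find the most common accepted file extension. Falls back to 'png' if the device folder is missing or contains no files with an accepted extension.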

Source code in geecs_data_utils/scan_paths.py
def infer_device_ext(self, device: str, *, max_files: int = 5) -> str:
    """Peek at up to `max_files` files to find proper file extension."""
    from collections import Counter

    dpath = self.device_folder(device)
    if not dpath.exists():
        return "png"

    counts = Counter()
    seen = 0
    for f in dpath.iterdir():
        if f.is_file():
            ext = f.suffix.lower().lstrip(".")
            if ext in _ACCEPTABLE_EXTS:
                counts[ext] += 1
                seen += 1
                if seen >= max_files:
                    break
    return counts.most_common(1)[0][0] if counts else "png"
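
The voting logic can be sketched on a temporary folder. `guess_ext` and `ACCEPTABLE_EXTS` are hypothetical names, and the set of accepted extensions here is an assumption, since the contents of `_ACCEPTABLE_EXTS` are not shown on this page.

```python
import tempfile
from collections import Counter
from pathlib import Path

ACCEPTABLE_EXTS = {"png", "tif", "tiff", "h5", "csv", "txt"}  # assumed; real set not shown


def guess_ext(folder: Path, max_files: int = 5, default: str = "png") -> str:
    """Vote on the extension using at most max_files acceptable files."""
    if not folder.exists():
        return default
    counts = Counter()
    for f in folder.iterdir():
        ext = f.suffix.lower().lstrip(".")
        if f.is_file() and ext in ACCEPTABLE_EXTS:
            counts[ext] += 1
            if sum(counts.values()) >= max_files:
                break
    return counts.most_common(1)[0][0] if counts else default


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for shot in (1, 2, 3):
        (folder / f"Scan001_Cam_{shot:03d}.TIF").touch()
    (folder / "notes.md").touch()  # ignored: extension not in the accepted set
    guessed = guess_ext(folder)
    missing = guess_ext(folder / "does_not_exist")
```

Because directory iteration order is not guaranteed, a folder with mixed extensions and a small `max_files` may give different answers on different systems.
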

ScanData API

geecs_data_utils.scan_data.ScanData

ScanData(*, paths: ScanPaths)

Container for a single scan: paths + scalar DataFrame + lazy asset index.

This class composes a :class:ScanPaths (path logic) and provides:

  • Optional scalar DataFrame loading (s-file or TDMS→DataFrame).
  • Lazy, normalized asset indexing (no bytes loaded).
  • Convenience helpers for grouping/averaging images by Bin #.
  • Flexible column resolution (case-insensitive, substring/regex).
  • Per-bin scalar aggregation with configurable center and error.

Parameters:

Name Type Description Default
paths ScanPaths

A pre-constructed :class:ScanPaths instance pointing to the scan.

required
Notes

Use the factories :meth:from_date and :meth:latest for ergonomic creation.

Methods:

Name Description
from_date

Construct a :class:ScanData from date/number.

latest

Construct a :class:ScanData for the latest scan on a date.

load_scalars

Load the scalar DataFrame (s-file or TDMS converted).

set_data_frame

Attach a scalar DataFrame and invalidate dependent caches.

list_columns

List column names as strings (flattens MultiIndex columns if present).

find_cols

Flexible column search.

resolve_col

Resolve a loose column spec to a single best column name.

add_local_alias

Register a user-defined shorthand for a column name.

set_binning_config

Update binning configuration and invalidate cache.

expected_paths_by_bin

Group expected image paths by the current bin definition.

reload_sfile

Re-read the analysis s-file into self.data_frame.

copy_fresh_sfile_to_analysis

Replace the analysis s-file with the fresh copy from the scan folder.

load_ecs_live_dump

Load and parse the ECS Live Dump file for this scan via ScanPaths.

Attributes:

Name Type Description
binned_scalars DataFrame

Aggregate scalar data into bins with configurable center and error metrics.

Source code in geecs_data_utils/scan_data.py
def __init__(self, *, paths: ScanPaths):
    self.paths: ScanPaths = paths
    self.data_frame: Optional[pd.DataFrame] = None

    # Binning state
    self._bin_cfg: BinningConfig = BinningConfig()
    self._binned_cache: Optional[pd.DataFrame] = None
    self._df_version: int = 0
    self._binned_key: Optional[Tuple] = None

    # Local (user) aliases for columns (independent of DAQ "Alias:" strings)
    self.column_aliases: Dict[str, str] = {}

binned_scalars property

binned_scalars: DataFrame

Aggregate scalar data into bins with configurable center and error metrics.

For each bin defined by bin_col in the current :class:BinningConfig, all selected numeric columns (value_cols) are aggregated. The result is a wide DataFrame with a two-level column index: (column_name, {"center", "err_low", "err_high"}).

Notes
  • If value_cols is None, all numeric columns in the scalar DataFrame are included (including the bin source column and Shotnumber).
  • The bin column is treated like any other numeric column: its per-bin center and errors are computed the same way as other variables.
  • Error definitions (err) control how err_low and err_high are computed:
    • "std" : sample standard deviation (symmetric).
    • "stderr" : standard error of the mean (symmetric).
    • "mad" : median absolute deviation (scaled if scale_to_sigma=True; symmetric).
    • "iqr" : interquartile range using percentiles; asymmetric offsets around the chosen center.
    • "percentile": arbitrary quantile range using percentiles; asymmetric offsets around the chosen center.
  • Counts per bin are included under the pseudo-column ("count", "center").

Returns:

Type Description
DataFrame

Binned scalar table with a MultiIndex on columns:

  • Level 0: original column names plus "count".
  • Level 1: one of {"center", "err_low", "err_high"}.

The row index corresponds to unique bin labels, which may be discrete values or numeric bin centers depending on the binning configuration.

Raises:

Type Description
ValueError

If no scalar DataFrame is loaded.

KeyError

If the configured bin column is not found.
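
To make the output layout concrete, here is a small standard-library sketch of per-bin aggregation with a mean center and a standard-error band. It is not the library implementation (which works on a pandas DataFrame): `bin_scalars`, the `charge` column, and the data are made up, and the `(column, statistic)` tuple keys only imitate the two-level column index described above.

```python
import math
import statistics
from collections import defaultdict


def bin_scalars(rows: list[dict], bin_col: str, value_cols: list[str]) -> dict:
    """Group rows by bin_col; report mean as center and standard error as err."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[bin_col]].append(row)

    table = {}
    for label, members in sorted(groups.items()):
        n = len(members)
        entry = {("count", "center"): n}
        for col in value_cols:
            values = [m[col] for m in members]
            center = statistics.mean(values)
            # Sample standard deviation needs at least two points.
            std = statistics.stdev(values) if n > 1 else float("nan")
            err = std / math.sqrt(n)
            entry[(col, "center")] = center
            entry[(col, "err_low")] = err   # symmetric error: same offset both sides
            entry[(col, "err_high")] = err
        table[label] = entry
    return table


# Made-up scalar rows: "Bin #" is the bin column, "charge" an arbitrary scalar.
shots = [
    {"Bin #": 1, "charge": 1.0},
    {"Bin #": 1, "charge": 2.0},
    {"Bin #": 1, "charge": 3.0},
    {"Bin #": 2, "charge": 10.0},
    {"Bin #": 2, "charge": 14.0},
]
binned = bin_scalars(shots, "Bin #", ["charge"])
```

For the asymmetric options ("iqr", "percentile"), `err_low` and `err_high` would differ; this sketch covers only the symmetric case.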

from_date classmethod

from_date(*, year: int, month: int, day: int, number: int, experiment: Optional[str] = None, base_directory: Optional[Path] = None, load_scalars: bool = True, source: Literal['sfile', 'tdms'] = 'sfile', append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> 'ScanData'

Construct a :class:ScanData from date/number.

Parameters:

Name Type Description Default
year int

Identify the scan.

required
month int

Identify the scan.

required
day int

Identify the scan.

required
number int

Identify the scan.

required
experiment Optional[str]

Identify the scan.

None
base_directory Optional[Path]

Base data root if not configured globally.

None
load_scalars bool

If True, load scalar DataFrame immediately.

True
source Literal['sfile', 'tdms']

"sfile" (default) or "tdms" for scalar source.

'sfile'
append_paths bool

If True, add device/shot paths to the DataFrame.

True
stem_override Optional[dict[str, str]]

Optional {device: in_filename_stem} mapping forwarded to :meth:load_scalars. Use when a device's folder name differs from the in-filename token (e.g., folder U_BCaveMagSpec-interpSpec with files named Scan042_U_BCaveMagSpec_001.csv).

None

Returns:

Type Description
ScanData
Source code in geecs_data_utils/scan_data.py
@classmethod
def from_date(
    cls,
    *,
    year: int,
    month: int,
    day: int,
    number: int,
    experiment: Optional[str] = None,
    base_directory: Optional[Path] = None,
    load_scalars: bool = True,
    source: Literal["sfile", "tdms"] = "sfile",
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> "ScanData":
    """
    Construct a :class:`ScanData` from date/number.

    Parameters
    ----------
    year, month, day, number, experiment
        Identify the scan.
    base_directory
        Base data root if not configured globally.
    load_scalars
        If True, load scalar DataFrame immediately.
    source
        ``"sfile"`` (default) or ``"tdms"`` for scalar source.
    append_paths
        If true, add device/shot paths to df.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`load_scalars`. Use when a device's folder name differs
        from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).

    Returns
    -------
    ScanData
    """
    tag = ScanPaths.get_scan_tag(year, month, day, number, experiment=experiment)
    paths = ScanPaths(tag=tag, base_directory=base_directory)
    sd = cls(paths=paths)
    if load_scalars:
        sd.load_scalars(
            source=source,
            append_paths=append_paths,
            stem_override=stem_override,
        )
    return sd

latest classmethod

latest(experiment: Optional[str] = None, *, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Optional[Path] = None, load_scalars: bool = True, source: Literal['sfile', 'tdms'] = 'sfile') -> 'ScanData'

Construct a :class:ScanData for the latest scan on a date.

Parameters:

Name Type Description Default
experiment Optional[str]

Experiment name.

None
year Optional[int]

Optional date components; defaults to today if omitted.

None
month Optional[int]

Optional date components; defaults to today if omitted.

None
day Optional[int]

Optional date components; defaults to today if omitted.

None
base_directory Optional[Path]

Base data root if not configured globally.

None
load_scalars bool

If True, load scalar DataFrame immediately.

True
source Literal['sfile', 'tdms']

"sfile" (default) or "tdms".

'sfile'

Returns:

Type Description
ScanData
Source code in geecs_data_utils/scan_data.py
@classmethod
def latest(
    cls,
    experiment: Optional[str] = None,
    *,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Optional[Path] = None,
    load_scalars: bool = True,
    source: Literal["sfile", "tdms"] = "sfile",
) -> "ScanData":
    """
    Construct a :class:`ScanData` for the latest scan on a date.

    Parameters
    ----------
    experiment
        Experiment name.
    year, month, day
        Optional date components; defaults to today if omitted.
    base_directory
        Base data root if not configured globally.
    load_scalars
        If True, load scalar DataFrame immediately.
    source
        ``"sfile"`` (default) or ``"tdms"``.

    Returns
    -------
    ScanData
    """
    tag = ScanPaths.get_latest_scan_tag(
        experiment=experiment,
        year=year,
        month=month,
        day=day,
        base_directory=base_directory,
    )
    if not tag:
        raise ValueError("No scans found for the specified date/experiment.")
    paths = ScanPaths(tag=tag, base_directory=base_directory)
    sd = cls(paths=paths)
    if load_scalars:
        sd.load_scalars(source=source)
    return sd

load_scalars

load_scalars(*, source: Literal['sfile', 'tdms'] = 'sfile', append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> None

Load the scalar DataFrame (s-file or TDMS converted).

Parameters:

Name Type Description Default
source Literal['sfile', 'tdms']

"sfile" to read s{scan}.txt from the analysis tree, or "tdms" to read ScanNNN.tdms and convert to a DataFrame if possible.

'sfile'
append_paths bool

If true, add device/shot paths to dataframe.

True
stem_override Optional[dict[str, str]]

Optional {device: in_filename_stem} mapping forwarded to :meth:set_data_frame. Use when a device's folder name differs from the in-filename token (e.g., folder U_BCaveMagSpec-interpSpec with files named Scan042_U_BCaveMagSpec_001.csv).

None

Raises:

Type Description
FileNotFoundError

If the s-file (source='sfile') or the TDMS file (source='tdms') is missing.

Source code in geecs_data_utils/scan_data.py
def load_scalars(
    self,
    *,
    source: Literal["sfile", "tdms"] = "sfile",
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> None:
    """
    Load the scalar DataFrame (s-file or TDMS converted).

    Parameters
    ----------
    source
        ``"sfile"`` to read ``s{scan}.txt`` from the analysis tree, or ``"tdms"`` to
        read ``ScanNNN.tdms`` and convert to a DataFrame if possible.
    append_paths
        If true, add device/shot paths to dataframe.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`set_data_frame`. Use when a device's folder name differs
        from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).

    Raises
    ------
    FileNotFoundError
        If the s-file is expected but missing.
    """
    if source == "sfile":
        tag = self.paths.get_tag()
        sfile = self.paths.get_analysis_folder().parent / f"s{tag.number}.txt"
        if not sfile.exists():
            raise FileNotFoundError(f"No sfile for scan {tag}")
        df = pd.read_csv(sfile, delimiter="\t")
        self.set_data_frame(
            df, append_paths=append_paths, stem_override=stem_override
        )

    elif source == "tdms":
        tag = self.paths.get_tag()
        tdms_path = self.paths.get_folder() / f"Scan{tag.number:03d}.tdms"
        if not tdms_path.exists():
            raise FileNotFoundError(f"TDMS file not found: {tdms_path}")
        dct = read_geecs_tdms(tdms_path) or {}
        if not dct:
            raise ValueError(f"TDMS file could not be parsed: {tdms_path}")
        df = geecs_tdms_dict_to_panda(dct)
        self.set_data_frame(
            df, append_paths=append_paths, stem_override=stem_override
        )

    else:
        raise ValueError(f"Unsupported source: {source!r}")

set_data_frame

set_data_frame(df: DataFrame, *, append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> None

Attach a scalar DataFrame and invalidate dependent caches.

Parameters:

Name Type Description Default
df DataFrame

Scalar table for the scan (typically from s-file).

required
append_paths bool

If true, add device shot paths to dataframe.

True
stem_override Optional[dict[str, str]]

Optional {device: in_filename_stem} mapping forwarded to :meth:_append_expected_asset_columns. Use when a device's folder name differs from the in-filename token (e.g., folder U_BCaveMagSpec-interpSpec with files named Scan042_U_BCaveMagSpec_001.csv).

None
Source code in geecs_data_utils/scan_data.py
def set_data_frame(
    self,
    df: pd.DataFrame,
    *,
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> None:
    """Attach a scalar DataFrame and invalidate dependent caches.

    Parameters
    ----------
    df
        Scalar table for the scan (typically from s-file).
    append_paths
        If true, add device shot paths to dataframe.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`_append_expected_asset_columns`. Use when a device's folder
        name differs from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).
    """
    if append_paths:
        df = self._append_expected_asset_columns(df, stem_override=stem_override)
    self.data_frame = df
    self._df_version += 1
    self._binned_cache = None
    self._binned_key = None

list_columns

list_columns() -> List[str]

List column names as strings (flattens MultiIndex columns if present).

Returns:

Type Description
list of str
Source code in geecs_data_utils/scan_data.py
def list_columns(self) -> List[str]:
    """
    List column names as strings (flattens MultiIndex columns if present).

    Returns
    -------
    list of str
    """
    return self._flatten_columns()

find_cols

find_cols(query: Union[str, Sequence[str]], *, mode: ColumnMatchMode = 'contains', case_sensitive: bool = False) -> List[str]

Flexible column search.

Wrapper for find_cols in geecs_data_utils/data/columns.py.

Parameters:

Name Type Description Default
query Union[str, Sequence[str]]

String or list of strings to search for.

required
mode ColumnMatchMode

Search mode: "contains" (default), "startswith", "endswith", "regex", or "exact".

'contains'
case_sensitive bool

If True, match with case sensitivity.

False

Returns:

Type Description
list of str

Matching column names (flattened form). May be empty.

Source code in geecs_data_utils/scan_data.py
def find_cols(
    self,
    query: Union[str, Sequence[str]],
    *,
    mode: ColumnMatchMode = "contains",
    case_sensitive: bool = False,
) -> List[str]:
    """
    Flexible column search.

    Wrapper for find_cols in geecs_data_utils/data/columns.py.

    Parameters
    ----------
    query
        String or list of strings to search for.
    mode
        Search mode: ``"contains"`` (default), ``"startswith"``, ``"endswith"``,
        ``"regex"``, or ``"exact"``.
    case_sensitive
        If True, match with case sensitivity.

    Returns
    -------
    list of str
        Matching column names (flattened form). May be empty.
    """
    if self.data_frame is None:
        return []
    return find_cols(
        self.data_frame, query, mode=mode, case_sensitive=case_sensitive
    )
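
The five documented modes can be illustrated with a small matcher over a list of column names. This is a sketch under assumptions: `match_columns` is hypothetical, the column names are made up, and treating a list of queries as "match any" is a guess, since the underlying `find_cols` in `columns.py` is not shown on this page.

```python
import re
from typing import Sequence, Union


def match_columns(columns: Sequence[str], query: Union[str, Sequence[str]],
                  mode: str = "contains", case_sensitive: bool = False) -> list[str]:
    """Hypothetical matcher illustrating the five documented modes."""
    queries = [query] if isinstance(query, str) else list(query)
    flags = 0 if case_sensitive else re.IGNORECASE

    def hit(col: str, q: str) -> bool:
        if mode == "regex":
            return re.search(q, col, flags) is not None
        c, s = (col, q) if case_sensitive else (col.lower(), q.lower())
        if mode == "contains":
            return s in c
        if mode == "startswith":
            return c.startswith(s)
        if mode == "endswith":
            return c.endswith(s)
        if mode == "exact":
            return c == s
        raise ValueError(f"Unsupported mode: {mode!r}")

    # Keep the original column order; a column matches if any query hits.
    return [col for col in columns if any(hit(col, q) for q in queries)]


cols = ["Shotnumber", "Bin #", "U_Laser Energy", "U_Laser pointing X", "UC_Cam charge"]
by_substring = match_columns(cols, "laser")
by_prefix = match_columns(cols, "uc_", mode="startswith")
by_regex = match_columns(cols, r"pointing [XY]$", mode="regex")
strict = match_columns(cols, "laser", case_sensitive=True)
```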

resolve_col

resolve_col(spec: str, *, mode: ColumnMatchMode = 'contains', case_sensitive: bool = False, prefer_exact_ci: bool = True) -> str

Resolve a loose column spec to a single best column name.

Parameters:

Name Type Description Default
spec str

User-provided spec (may be an alias or partial/regex).

required
mode ColumnMatchMode

Matching strategy used by :meth:find_cols: "contains" (default), "startswith", "endswith", "regex", or "exact".

'contains'
case_sensitive bool

If True, enforce case-sensitive matching for the chosen mode.

False
prefer_exact_ci bool

Prefer exact (case-insensitive) matches over substring/regex matches.

True

Returns:

Type Description
str

Selected column name.

Raises:

Type Description
ValueError

If no scalar DataFrame is loaded, or if no match is found.

Source code in geecs_data_utils/scan_data.py
def resolve_col(
    self,
    spec: str,
    *,
    mode: ColumnMatchMode = "contains",
    case_sensitive: bool = False,
    prefer_exact_ci: bool = True,
) -> str:
    """
    Resolve a loose column spec to a single best column name.

    Parameters
    ----------
    spec
        User-provided spec (may be an alias or partial/regex).
    mode
        Matching strategy used by :meth:`find_cols`: ``"contains"`` (default),
        ``"startswith"``, ``"endswith"``, ``"regex"``, or ``"exact"``.
    case_sensitive
        If True, enforce case-sensitive matching for the chosen mode.
    prefer_exact_ci
        Prefer exact (case-insensitive) matches over substring/regex matches.

    Returns
    -------
    str
        Selected column name.

    Raises
    ------
    ValueError
        If no match is found.
    """
    if self.data_frame is None:
        raise ValueError("No scalar dataframe loaded.")

    if spec in self.column_aliases:
        return self.column_aliases[spec]

    result = resolve_col_detailed(
        self.data_frame,
        spec,
        mode=mode,
        case_sensitive=case_sensitive,
        prefer_exact_ci=prefer_exact_ci,
    )
    if result.ambiguous and result.candidates is not None:
        c = result.candidates
        logging.warning(
            "Spec %r matched multiple columns (%d): %s; using %r",
            spec,
            len(c),
            list(c),
            result.column,
        )
    return result.column
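
The resolution order (local alias first, then an exact case-insensitive match, then a looser match) can be sketched as follows. `resolve` is a hypothetical stand-in; how `resolve_col_detailed` actually ranks candidates is not shown on this page, so the "first candidate wins" tie-break is an assumption.

```python
def resolve(columns: list[str], spec: str, aliases: dict[str, str]) -> str:
    """Hypothetical resolver: alias, then exact case-insensitive, then substring."""
    if spec in aliases:
        return aliases[spec]
    exact = [c for c in columns if c.lower() == spec.lower()]
    if exact:
        return exact[0]
    partial = [c for c in columns if spec.lower() in c.lower()]
    if not partial:
        raise ValueError(f"No column matches {spec!r}")
    # Ambiguous specs fall back to the first candidate in this sketch.
    return partial[0]


columns = ["Pressure", "Chamber Pressure Torr", "Shotnumber"]
aliases = {"p": "Chamber Pressure Torr"}

via_alias = resolve(columns, "p", aliases)
via_exact = resolve(columns, "pressure", aliases)
via_partial = resolve(columns, "shot", aliases)
```

As in the listing above, an alias is returned as registered, without checking that the target column exists in the table.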

add_local_alias

add_local_alias(alias: str, actual_col: str) -> None

Register a user-defined shorthand for a column name.

Parameters:

Name Type Description Default
alias str

Local shorthand (e.g., "pressure").

required
actual_col str

Full column name present in the DataFrame.

required
Source code in geecs_data_utils/scan_data.py
def add_local_alias(self, alias: str, actual_col: str) -> None:
    """
    Register a user-defined shorthand for a column name.

    Parameters
    ----------
    alias
        Local shorthand (e.g., ``"pressure"``).
    actual_col
        Full column name present in the DataFrame.
    """
    self.column_aliases[alias] = actual_col

set_binning_config

set_binning_config(**updates) -> None

Update binning configuration and invalidate cache.

Parameters:

Name Type Description Default
**updates

Fields to replace on the current :class:BinningConfig.

{}
Source code in geecs_data_utils/scan_data.py
def set_binning_config(self, **updates) -> None:
    """
    Update binning configuration and invalidate cache.

    Parameters
    ----------
    **updates
        Fields to replace on the current :class:`BinningConfig`.
    """
    if "value_cols" in updates and updates["value_cols"] is not None:
        updates["value_cols"] = tuple(map(str, updates["value_cols"]))
    self._bin_cfg = replace(self._bin_cfg, **updates)
    self._binned_cache = None
    self._binned_key = None
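
The update step relies on replacing fields of a config object rather than mutating it. Below is a minimal sketch using `dataclasses.replace`; `ToyBinningConfig` and its fields are hypothetical, and treating the config as a frozen dataclass is an assumption, since the definition of `BinningConfig` is not shown on this page.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyBinningConfig:
    """Hypothetical stand-in; the real BinningConfig fields are not shown here."""
    bin_col: str = "Bin #"
    value_cols: Optional[Tuple[str, ...]] = None
    err: str = "std"


def updated(cfg: ToyBinningConfig, **updates) -> ToyBinningConfig:
    # Normalise value_cols to a tuple of strings so the config stays hashable.
    if updates.get("value_cols") is not None:
        updates["value_cols"] = tuple(map(str, updates["value_cols"]))
    return replace(cfg, **updates)


base = ToyBinningConfig()
tuned = updated(base, err="stderr", value_cols=["charge", 7])
config_hash = hash(tuned)  # works because every field is hashable
```

Because `replace` returns a new object, the previous configuration is left untouched, which makes it safe to use the config as part of a cache key.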

expected_paths_by_bin

expected_paths_by_bin(device: str, *, variant: Optional[str] = None, bin_col: Optional[str] = None, dropna_paths: bool = True, exists_only: bool = False) -> Dict[Hashable, List[Path]]

Group expected image paths by the current bin definition.

Parameters:

Name Type Description Default
device str

Device name (subfolder).

required
variant Optional[str]

Optional variant suffix used when creating expected-path columns.

None
bin_col Optional[str]

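Override the configured bin column. In the source shown for this method the override is written into the stored binning configuration, so it stays in effect for later calls as well.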

None
dropna_paths bool

If True, drop rows with missing path strings.

True
exists_only bool

If True, filter out paths that do not currently exist on disk.

False

Returns:

Type Description
dict[Hashable, list[Path]]

Mapping {bin_value -> [image paths]}.

Source code in geecs_data_utils/scan_data.py
def expected_paths_by_bin(
    self,
    device: str,
    *,
    variant: Optional[str] = None,
    bin_col: Optional[str] = None,
    dropna_paths: bool = True,
    exists_only: bool = False,
) -> Dict[Hashable, List[Path]]:
    """
    Group expected image paths by the current bin definition.

    Parameters
    ----------
    device
        Device name (subfolder).
    variant
        Optional variant suffix used when creating expected-path columns.
    bin_col
        Override the configured bin column for this call.
    dropna_paths
        If True, drop rows with missing path strings.
    exists_only
        If True, filter out paths that do not currently exist on disk.

    Returns
    -------
    dict[Hashable, list[pathlib.Path]]
        Mapping {bin_value -> [image paths]}.
    """
    if self.data_frame is None:
        raise ValueError("No scalar dataframe loaded.")

    # Optionally override the bin column for just this call. The previous
    # configuration is restored afterwards so the override does not persist.
    original_cfg = self._bin_cfg
    if bin_col is not None:
        self._bin_cfg = replace(self._bin_cfg, bin_col=str(bin_col))

    try:
        # Ensure the bin source is present; compute the effective bin key
        self._require_bin_col()
        df = self.data_frame.copy()
        bin_key, bin_name = self._compute_bin_key(df)
    finally:
        self._bin_cfg = original_cfg
    df = df.assign(**{bin_name: bin_key})

    col = self._expected_path_col(device, variant=variant)
    series = df[col]

    if dropna_paths:
        mask = series.notna()
        df = df.loc[mask]

    # Convert to Paths and optionally filter to existing files
    df = df.assign(
        _path_obj=df[col].map(lambda s: Path(s) if isinstance(s, str) else None)
    )
    if exists_only:
        df = df.loc[df["_path_obj"].map(lambda p: p is not None and p.exists())]

    out: Dict[Hashable, List[Path]] = {}
    for bval, group in df.groupby(bin_name, dropna=False, observed=True, sort=True):
        paths = [p for p in group["_path_obj"].tolist() if p is not None]
        if paths:
            out[bval] = paths
    return out
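
The grouping step above can be illustrated without pandas or a loaded scan. The following is a minimal standard-library sketch, not the library's implementation: `group_paths_by_bin` and the example file names are hypothetical, and it assumes the bin values are mutually comparable so they can be sorted.

```python
from pathlib import Path
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def group_paths_by_bin(
    rows: Iterable[Tuple[Hashable, Optional[str]]],
    *,
    exists_only: bool = False,
) -> Dict[Hashable, List[Path]]:
    """Group (bin_value, path_string) rows into {bin_value: [Path, ...]}."""
    out: Dict[Hashable, List[Path]] = {}
    for bin_value, raw in rows:
        # Rows without a usable path string never reach the output,
        # mirroring the final `p is not None` filter in the method.
        if not isinstance(raw, str):
            continue
        path = Path(raw)
        # Optionally keep only files that are currently present on disk.
        if exists_only and not path.exists():
            continue
        out.setdefault(bin_value, []).append(path)
    # Emit bins in sorted key order, similar to groupby(sort=True).
    # Assumption: the bin values can be compared with each other.
    return {key: out[key] for key in sorted(out)}


# Hypothetical rows: (bin number, expected image path or None).
rows = [
    (2, "Scan001/UC_Camera/Scan001_UC_Camera_003.png"),
    (1, "Scan001/UC_Camera/Scan001_UC_Camera_001.png"),
    (1, None),
    (1, "Scan001/UC_Camera/Scan001_UC_Camera_002.png"),
]
grouped = group_paths_by_bin(rows)
```

Bins that end up with no paths are simply absent from the result, which matches the `if paths:` guard in the method: callers should use `grouped.get(bin_value, [])` rather than indexing when a bin may be empty.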

reload_sfile

reload_sfile() -> None

Re-read the analysis s-file into self.data_frame.

Notes

This is a thin alias for load_scalars(source='sfile') to make intent explicit.

Source code in geecs_data_utils/scan_data.py
def reload_sfile(self) -> None:
    """
    Re-read the analysis s-file into ``self.data_frame``.

    Notes
    -----
    This is a thin alias for ``load_scalars(source='sfile')`` to make intent explicit.
    """
    self.load_scalars(source="sfile")

copy_fresh_sfile_to_analysis

copy_fresh_sfile_to_analysis() -> None

Replace the analysis s-file with the fresh copy from the scan folder.

Copies: <scan>/scans/ScanDataScanNNN.txt → <scan>/analysis/../sNNN.txt

Raises:

Type Description
FileNotFoundError

If the source s-file in scans/ does not exist.

Source code in geecs_data_utils/scan_data.py
def copy_fresh_sfile_to_analysis(self) -> None:
    """
    Replace the analysis s-file with the fresh copy from the scan folder.

    Copies:
        ``<scan>/scans/ScanDataScanNNN.txt`` → ``<scan>/analysis/../sNNN.txt``

    Raises
    ------
    FileNotFoundError
        If the source s-file in ``scans/`` does not exist.
    """
    tag = self.paths.get_tag()
    scan_txt = self.paths.get_folder() / f"ScanDataScan{tag.number:03d}.txt"
    analysis_txt = self.paths.get_analysis_folder().parent / f"s{tag.number}.txt"

    if not scan_txt.exists():
        raise FileNotFoundError(f"Original s-file '{scan_txt}' not found.")
    if analysis_txt.exists():
        analysis_txt.unlink()

    shutil.copy2(src=scan_txt, dst=analysis_txt)
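
The check-unlink-copy sequence above can be tried in isolation. This is a minimal sketch under stated assumptions: `replace_with_fresh_copy` is a hypothetical helper, and the temporary directory layout only imitates a scan folder and an analysis folder. It also shows the naming used in the source: the scan file number is zero-padded to three digits, while the analysis file name uses the plain number.

```python
import shutil
import tempfile
from pathlib import Path


def replace_with_fresh_copy(src: Path, dst: Path) -> None:
    """Overwrite dst with src, raising if src is missing."""
    if not src.exists():
        raise FileNotFoundError(f"Original s-file '{src}' not found.")
    # Remove any stale copy first so the destination is always a fresh file.
    if dst.exists():
        dst.unlink()
    # copy2 copies the contents and attempts to preserve file metadata.
    shutil.copy2(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scan_dir = root / "scans" / "Scan007"  # hypothetical layout
    analysis_dir = root / "analysis"
    scan_dir.mkdir(parents=True)
    analysis_dir.mkdir()

    number = 7
    src = scan_dir / f"ScanDataScan{number:03d}.txt"  # ScanDataScan007.txt
    dst = analysis_dir / f"s{number}.txt"             # s7.txt (no padding)

    src.write_text("fresh\n")
    dst.write_text("stale\n")

    replace_with_fresh_copy(src, dst)
    result = dst.read_text()
```

Because the existing analysis file is deleted before the copy, any manual edits to it are lost. Back it up first if those edits matter.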

load_ecs_live_dump

load_ecs_live_dump() -> ECSDump

Load and parse the ECS Live Dump file for this scan via ScanPaths.

Returns:

Type Description
ECSDump

Parsed ECS dump structured by device name.

Raises:

Type Description
FileNotFoundError

If no ECS dump file is available for this scan.

Source code in geecs_data_utils/scan_data.py
def load_ecs_live_dump(self) -> ECSDump:
    """
    Load and parse the ECS Live Dump file for this scan via ``ScanPaths``.

    Returns
    -------
    ECSDump
        Parsed ECS dump structured by device name.

    Raises
    ------
    FileNotFoundError
        If no ECS dump file is available for this scan.
    """
    tag = self.paths.get_tag()
    ecs_path = self.paths.get_ecs_dump_file()
    if not ecs_path:
        raise FileNotFoundError(f"No ECS live dump file found for scan {tag}")
    return parse_ecs_dump(ecs_path)
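
Since the method raises rather than returning `None` when no dump exists, callers that treat the ECS dump as optional may want to catch the exception. The sketch below shows one way to do that. `StubScanData` and `try_load_dump` are hypothetical stand-ins that mimic only the raise-or-return contract. The plain dictionary used in place of `ECSDump`, and the device name inside it, are assumptions made purely for illustration.

```python
from typing import Any, Dict, Optional


class StubScanData:
    """Hypothetical stand-in that mimics only the raise-or-return contract."""

    def __init__(self, dump: Optional[Dict[str, Any]]) -> None:
        self._dump = dump

    def load_ecs_live_dump(self) -> Dict[str, Any]:
        # Same shape as the real method: raise when nothing is available.
        if not self._dump:
            raise FileNotFoundError("No ECS live dump file found for scan")
        return self._dump


def try_load_dump(scan_data: StubScanData) -> Optional[Dict[str, Any]]:
    """Return the dump, or None when the scan has no ECS dump file."""
    try:
        return scan_data.load_ecs_live_dump()
    except FileNotFoundError:
        return None


missing = try_load_dump(StubScanData(None))
present = try_load_dump(StubScanData({"HypotheticalDevice": {"Position": "1.0"}}))
```

Catching only `FileNotFoundError` keeps genuine parsing errors visible instead of silently turning them into a missing dump.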