Core Modules

ScanPaths API

geecs_data_utils.scan_paths.ScanPaths

ScanPaths(folder: Optional[SysPath] = None, tag: Optional[ScanTag] = None, base_directory: Union[Path, str, None] = None, read_mode: bool = True)

Represents a GEECS experiment scan.

Attributes:

Name	Type	Description
`scan_info`	`dict[str, str]`	Dictionary containing scan configuration information loaded from scan info file
`paths_config`	`GeecsPathsConfig`	Class-level configuration object for managing GEECS data paths

Initialize ScanPaths object.

Either a folder or a tag must be given to specify the location of the scan data folder; base_directory is optional when a tag is given.

Parameters:

Name	Type	Description	Default
`folder`	`Union[str, bytes, PathLike]`	Data folder containing the scan data, e.g. "Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002".	`None`
`tag`	`Optional[ScanTag]`	NamedTuple with the experiment name, date, and scan number	`None`
`base_directory`	`Optional[Union[Path, str]]`	The base path for the data, e.g. "Z:/data/". If not given, defaults to the path located by GeecsPathsConfig.	`None`
`read_mode`	`bool`	If True (the default), raise if the scan folder does not exist. If False, silently create the folder (including any missing parents). `read_mode=False` is for scanner-side callers only — the GEECS scanner and BlueskyScanner, which legitimately bring new scan folders into existence. Analysis code (ScanAnalysis, ImageAnalysis, anything that consumes existing scans) must always leave this at the default. Silent creation from the consumer side has caused data loss: a transient SMB/NetApp visibility blip looks like a missing folder, and auto-creating it plants an empty directory over the real one.	`True`
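
The two `read_mode` behaviours can be illustrated with a minimal, standalone sketch. `resolve_scan_folder` is a hypothetical helper written for this example, not part of `geecs_data_utils`, and the exception type is an assumption (chosen because `get_latest_scan_tag` catches `ValueError`).

```python
import tempfile
from pathlib import Path


def resolve_scan_folder(folder: Path, read_mode: bool = True) -> Path:
    """Hypothetical stand-in for the folder check described for `read_mode`."""
    if folder.is_dir():
        return folder
    if read_mode:
        # Consumer side: never create anything, only report the missing folder.
        # The exception type is an assumption made for this sketch.
        raise ValueError(f"Scan folder does not exist: {folder}")
    # Scanner side: create the folder together with any missing parents.
    folder.mkdir(parents=True, exist_ok=True)
    return folder


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "scans" / "Scan001"
    try:
        resolve_scan_folder(missing)  # read_mode=True by default
        raised = False
    except ValueError:
        raised = True
    # Only a scanner-side caller would pass read_mode=False.
    created = resolve_scan_folder(missing, read_mode=False).is_dir()
```

Keeping the strict behaviour as the default means analysis code has to opt in explicitly before anything is written to the data share.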

Methods:

Name	Description
`reload_paths_config`	Used by the GEECS Scanner to fix scan_data_manager in case the experiment name has changed.
`get_scan_tag`	Return a ScanTag tuple given the appropriate information, formatted correctly.
`get_scan_folder_path`	Build scan folder paths for local and client directories.
`get_daily_scan_folder`	Build the path to the daily scan folder. If no tag is given but an experiment name is, the current day is used.
`get_scan_analysis_folder_path`	Build analysis folder path using the scan folder path as a baseline.
`get_device_shot_path`	Build the full path to a device's shot file based on the scan tag, device name, and shot number.
`get_latest_scan_tag`	Locates the last generated scan for the given day or defaults to today if no date is provided.
`get_next_scan_tag`	Determine the next available scan tag for the given day or today if no date is provided.
`get_next_scan_folder`	Build the folder path for the next scan on the given day or today if no date is provided.
`build_next_scan_data`	Create the ScanPaths object for the next scan and build its folder.
`is_background_scan`	Check if the given scan tag references a scan that was designated as a background.
`get_folder`	Get the scan folder path.
`get_tag`	Get the scan tag.
`get_tag_date`	Get the scan date.
`get_analysis_folder`	Get the analysis folder path, creating it if necessary.
`get_folders_and_files`	Get lists of device folders and files in the scan directory.
`load_scan_info`	Load scan configuration information from the scan info file.
`get_ecs_dump_file`	Get the ECS Live Dump file corresponding to this scan.
`build_device_file_map`	Build a mapping from shot number to file path for a given device.
`get_common_shot_dataframe`	Generate a DataFrame containing file paths for all devices with common shot number.
`list_device_folders`	Return device subfolder names from this scan folder.
`device_folder`	Resolve '/'.
`build_asset_filename`	Build canonical expected file naming.
`build_asset_path`	Full expected path for one asset.
`infer_device_ext`	Peek at up to `max_files` files to find proper file extension.

Source code in geecs_data_utils/scan_paths.py

def __init__(
    self,
    folder: Optional[SysPath] = None,
    tag: Optional[ScanTag] = None,
    base_directory: Union[Path, str, None] = None,
    read_mode: bool = True,
):
    """
    Initialize ScanPaths object.

    Either a folder or a tag must be given to specify the location of the scan data folder; base_directory is optional when a tag is given.

    Parameters
    ----------
    folder : Union[str, bytes, PathLike]
        Data folder containing the scan data, e.g. "Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002".
    tag : Optional[ScanTag]
        NamedTuple with the experiment name, date, and scan number
    base_directory : Optional[Union[Path, str]]
        The base path for the data, e.g. "Z:/data/".
        If not given, defaults to the path located by GeecsPathsConfig.
    read_mode: bool
        If True (the default), raise if the scan folder does not exist.
        If False, silently create the folder (including any missing parents).

        ``read_mode=False`` is for *scanner-side* callers only — the GEECS
        scanner and BlueskyScanner, which legitimately bring new scan folders
        into existence. Analysis code (ScanAnalysis, ImageAnalysis, anything
        that consumes existing scans) must always leave this at the default.
        Silent creation from the consumer side has caused data loss: a
        transient SMB/NetApp visibility blip looks like a missing folder, and
        auto-creating it plants an empty directory over the real one.
    """
    self.scan_info: dict[str, str] = {}

    self._folder: Optional[Path] = None
    self._tag: Optional[ScanTag] = None
    self._tag_date: Optional[date] = None
    self._analysis_folder: Optional[Path] = None

    # Handle folder initialization
    if folder is None and tag is not None:
        if base_directory is None or not Path(base_directory).exists():
            base_directory = ScanPaths.paths_config.base_path
        if not Path(base_directory).exists():
            raise NotADirectoryError(
                f"Error setting base directory: '{base_directory}'"
            )
        folder = self.get_scan_folder_path(tag, base_directory=base_directory)

    self._initialize_folders(folder, read_mode)

reload_paths_config `classmethod`

reload_paths_config(config_path: Optional[Path] = None, default_experiment: Optional[str] = None, set_base_path: Optional[Union[Path, str]] = None, image_analysis_configs_path: Optional[Union[Path, str]] = None)

Used by the GEECS Scanner to fix scan_data_manager in case the experiment name has changed.

Source code in geecs_data_utils/scan_paths.py

@classmethod
def reload_paths_config(
    cls,
    config_path: Optional[Path] = None,
    default_experiment: Optional[str] = None,
    set_base_path: Optional[Union[Path, str]] = None,
    image_analysis_configs_path: Optional[Union[Path, str]] = None,
):
    """Used by the GEECS Scanner to fix scan_data_manager in case the experiment name has changed."""
    try:
        if (
            config_path is None
        ):  # Then don't explicitly pass config_path so that it uses the default location
            cls.paths_config = GeecsPathsConfig(
                default_experiment=default_experiment,
                set_base_path=set_base_path,
                image_analysis_configs_path=image_analysis_configs_path,
            )
        else:
            cls.paths_config = GeecsPathsConfig(
                config_path=config_path,
                default_experiment=default_experiment,
                set_base_path=set_base_path,
                image_analysis_configs_path=image_analysis_configs_path,
            )
    except ConfigurationError as e:
        logger.error(f"Configuration Error in ScanData: {e}")
        cls.paths_config = None

get_scan_tag `staticmethod`

get_scan_tag(year: Union[int, str], month: Union[int, str], day: Union[int, str], number: Union[int, str], experiment: Optional[str] = None, experiment_name: Optional[str] = None) -> ScanTag

Return a ScanTag tuple given the appropriate information, formatted correctly.

Ideally one should only build ScanTag objects using this function.

Parameters:

Name	Type	Description	Default
`year`	`Union[int, str]`	Target scan year	required
`month`	`Union[int, str]`	Target scan month	required
`day`	`Union[int, str]`	Target scan day	required
`number`	`Union[int, str]`	Target scan number	required
`experiment`	`str`	Target scan's experiment name	`None`
`experiment_name`	`str`	Target scan's experiment name (deprecated)	`None`

Returns:

Type	Description
`ScanTag`	properly formatted information to describe the target scan

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_scan_tag(
    year: Union[int, str],
    month: Union[int, str],
    day: Union[int, str],
    number: Union[int, str],
    experiment: Optional[str] = None,
    experiment_name: Optional[str] = None,
) -> ScanTag:
    """
    Return a ScanTag tuple given the appropriate information, formatted correctly.

    Ideally one should only build ScanTag objects using this function.

    Parameters
    ----------
    year : Union[int, str]
        Target scan year
    month : Union[int, str]
        Target scan month
    day : Union[int, str]
        Target scan day
    number : Union[int, str]
        Target scan number
    experiment : str
        Target scan's experiment name
    experiment_name : str
        Target scan's experiment name (deprecated)

    Returns
    -------
    ScanTag
        properly formatted information to describe the target scan
    """
    year = int(year)
    if 0 <= year <= 99:
        year += 2000
    month = month_to_int(month)

    exp = experiment or experiment_name or ScanPaths.paths_config.experiment
    if experiment_name is not None:
        logger.warning(
            "Recommended to use 'experiment' instead of 'experiment_name' for 'get_scan_tag'..."
        )

    return ScanTag(
        year=year, month=month, day=int(day), number=int(number), experiment=exp
    )
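
A minimal sketch of the normalization performed above, for readers without the package at hand. `TagSketch` and `make_tag` are hypothetical stand-ins for `ScanTag` and `get_scan_tag`. The sketch assumes numeric months only, since the behaviour of `month_to_int` is not shown on this page.

```python
from typing import NamedTuple, Optional, Union


class TagSketch(NamedTuple):
    """Hypothetical stand-in for ScanTag, used only in this sketch."""

    year: int
    month: int
    day: int
    number: int
    experiment: Optional[str]


def make_tag(
    year: Union[int, str],
    month: Union[int, str],
    day: Union[int, str],
    number: Union[int, str],
    experiment: Optional[str] = None,
) -> TagSketch:
    year = int(year)
    if 0 <= year <= 99:
        # Two-digit years are interpreted as 20xx.
        year += 2000
    # Assumption: months arrive as numbers; the real helper may accept more.
    return TagSketch(year, int(month), int(day), int(number), experiment)


tag = make_tag("23", "5", "1", "2", experiment="Undulator")
```

String and integer inputs end up as the same tag, which is why building tags through one function keeps later path formatting consistent.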

get_scan_folder_path `staticmethod`

get_scan_folder_path(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> Path

Build scan folder paths for local and client directories.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_scan_folder_path(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> Path:
    """Build scan folder paths for local and client directories."""
    return (
        ScanPaths.get_daily_scan_folder(tag=tag, base_directory=base_directory)
        / f"Scan{tag.number:03d}"
    )

get_daily_scan_folder `staticmethod`

get_daily_scan_folder(experiment: str = None, tag: ScanTag = None, base_directory: Optional[Union[Path, str]] = None) -> Path

Build the path to the daily scan folder. If no tag is given but an experiment name is, the current day is used.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_daily_scan_folder(
    experiment: str = None,
    tag: ScanTag = None,
    base_directory: Optional[Union[Path, str]] = None,
) -> Path:
    """Build the path to the daily scan folder. If no tag is given but an experiment name is, the current day is used."""
    base = base_directory or ScanPaths.paths_config.base_path

    if tag is None and experiment is None:
        raise ValueError(
            "Need to give experiment name or Scan Tag to `get_daily_scan_folder`"
        )

    if tag is None:
        today = datetime.today()
        tag = ScanPaths.get_scan_tag(
            today.year,
            month=today.month,
            day=today.day,
            number=0,
            experiment=experiment,
        )

    folder = Path(base) / tag.experiment if tag.experiment else Path(base)
    folder = (
        folder / f"Y{tag.year}" / f"{tag.month:02d}-{cal.month_name[tag.month][:3]}"
    )
    folder /= f"{str(tag.year)[-2:]}_{tag.month:02d}{tag.day:02d}"
    folder = folder / "scans"

    return folder
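
The directory layout produced by the code above can be checked with a standalone sketch. `daily_scan_folder` is a hypothetical re-implementation that takes plain values instead of a `ScanTag`, and `PurePosixPath` is used only so the result prints the same on every platform.

```python
import calendar
from pathlib import PurePosixPath


def daily_scan_folder(
    base: str, experiment: str, year: int, month: int, day: int
) -> PurePosixPath:
    """Hypothetical mirror of the path layout shown in the source above."""
    folder = PurePosixPath(base) / experiment if experiment else PurePosixPath(base)
    # e.g. Y2023 / 05-May
    folder = folder / f"Y{year}" / f"{month:02d}-{calendar.month_name[month][:3]}"
    # e.g. 23_0501 / scans
    folder = folder / f"{str(year)[-2:]}_{month:02d}{day:02d}" / "scans"
    return folder


daily = daily_scan_folder("Z:/data", "Undulator", 2023, 5, 1)
# get_scan_folder_path appends the zero-padded scan number to the daily folder.
scan = daily / f"Scan{2:03d}"
```

With these inputs the result matches the example folder given in the constructor documentation. Note that `calendar.month_name` follows the active locale, so the month abbreviation assumes an English or default locale.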

get_scan_analysis_folder_path `staticmethod`

get_scan_analysis_folder_path(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> Path

Build analysis folder path using the scan folder path as a baseline.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_scan_analysis_folder_path(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> Path:
    """Build analysis folder path using the scan folder path as a baseline."""
    scan_folder_path = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )

    parts = list(scan_folder_path.parts)
    parts[-2] = "analysis"
    return Path(*parts)
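
As a small illustration of the path rewrite above, the sketch below swaps the second-to-last path component for `analysis`. `analysis_path_for` is a hypothetical helper, and `PurePosixPath` is used only to keep the output platform-independent.

```python
from pathlib import PurePosixPath


def analysis_path_for(scan_folder: PurePosixPath) -> PurePosixPath:
    """Replace the parent folder name (normally 'scans') with 'analysis'."""
    parts = list(scan_folder.parts)
    parts[-2] = "analysis"
    return PurePosixPath(*parts)


scan_folder = PurePosixPath("Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002")
analysis = analysis_path_for(scan_folder)
```

The replacement is positional rather than name-based, so it relies on the scan folder sitting directly inside the `scans` directory.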

get_device_shot_path `staticmethod`

get_device_shot_path(tag: ScanTag, device_name: str, shot_number: int, file_extension: str = 'png', base_directory: Optional[Union[Path, str]] = None) -> Path

Build the full path to a device's shot file based on the scan tag, device name, and shot number.

Parameters:

Name	Type	Description	Default
`tag`	`ScanTag`	The scan tag containing year, month, day, and scan number.	required
`device_name`	`str`	The name of the device.	required
`shot_number`	`int`	The shot number.	required
`file_extension`	`str`	File extension for the shot file (default: 'png').	`'png'`
`base_directory`	`Optional[Union[Path, str]]`	Base directory for the scan (default: CONFIG.local_base_path).	`None`
`experiment`	`Optional[str]`	Listed in the docstring but absent from the signature above; the experiment name is taken from `tag.experiment`.	n/a

Returns:

Type	Description
`Path`	The full path to the device's shot file.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_device_shot_path(
    tag: ScanTag,
    device_name: str,
    shot_number: int,
    file_extension: str = "png",
    base_directory: Optional[Union[Path, str]] = None,
) -> Path:
    """
    Build the full path to a device's shot file based on the scan tag, device name, and shot number.

    Parameters
    ----------
    tag : ScanTag
        The scan tag containing year, month, day, and scan number.
    device_name : str
        The name of the device.
    shot_number : int
        The shot number.
    file_extension : str, optional
        File extension for the shot file (default: 'png').
    base_directory : Optional[Union[Path, str]], optional
        Base directory for the scan (default: CONFIG.local_base_path).
    experiment : Optional[str], optional
        Not part of the current signature; the experiment name is taken from ``tag.experiment``.

    Returns
    -------
    Path
        The full path to the device's shot file.
    """
    scan_path = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )
    extension = (
        "." + file_extension if "." not in file_extension else file_extension
    )
    file = (
        scan_path
        / f"{device_name}"
        / f"Scan{tag.number:03d}_{device_name}_{shot_number:03d}{extension}"
    )
    return file
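
The file-name convention used above can be tried in isolation. `shot_file_name` is a hypothetical helper that reproduces only the name-building step, and the device name is taken from the example elsewhere on this page.

```python
def shot_file_name(
    scan_number: int, device_name: str, shot_number: int, file_extension: str = "png"
) -> str:
    """Hypothetical mirror of the file-name formatting shown above."""
    # A leading dot is added only when the extension contains no dot at all.
    extension = file_extension if "." in file_extension else "." + file_extension
    return f"Scan{scan_number:03d}_{device_name}_{shot_number:03d}{extension}"


default_name = shot_file_name(2, "UC_ALineEBeam3", 7)
dotted_name = shot_file_name(2, "UC_ALineEBeam3", 7, ".tdms")
long_shot = shot_file_name(2, "UC_ALineEBeam3", 1234)
```

One consequence of the dot check is that a multi-part extension passed without its leading dot (for example `tar.gz`) is appended as-is, so such extensions should be passed with the dot included.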

get_latest_scan_tag `staticmethod`

get_latest_scan_tag(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> Optional[ScanTag]

Locates the last generated scan for the given day or defaults to today if no date is provided.

Parameters:

Name	Type	Description	Default
`experiment`	`Optional[str]`	Experiment name (default: CONFIG.experiment).	`None`
`year`	`Optional[int]`	Year of the scan (4-digit, default: current year if not provided).	`None`
`month`	`Optional[int]`	Month of the scan (1-12, default: current month if not provided).	`None`
`day`	`Optional[int]`	Day of the scan (1-31, default: current day if not provided).	`None`

Returns:

Type	Description
`Optional[ScanTag]`	The ScanTag representing the latest scan folder, or None if no scans exist for the given day.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_latest_scan_tag(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> Optional[ScanTag]:
    """
    Locates the last generated scan for the given day or defaults to today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    Optional[ScanTag]
        The ScanTag representing the latest scan folder, or None if no scans exist for the given day.
    """
    today = datetime.today()
    year = year or today.year
    month = month or today.month
    day = day or today.day

    i = 1
    while True:
        tag = ScanPaths.get_scan_tag(year, month, day, i, experiment=experiment)
        try:
            ScanPaths(tag=tag, read_mode=True, base_directory=base_directory)
        except ValueError:
            break
        i += 1

    if i == 1:
        return None  # No scans exist for the given day
    return ScanPaths.get_scan_tag(year, month, day, i - 1, experiment=experiment)
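
The search above probes scan numbers one at a time, starting at 1, and stops at the first missing folder. The sketch below reproduces that probe against a temporary directory. `latest_scan_number` is a hypothetical helper that checks folders directly instead of constructing `ScanPaths` objects.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_scan_number(daily_folder: Path) -> Optional[int]:
    """Count up from Scan001 until a folder is missing."""
    i = 1
    while (daily_folder / f"Scan{i:03d}").is_dir():
        i += 1
    return None if i == 1 else i - 1


with tempfile.TemporaryDirectory() as tmp:
    daily = Path(tmp)
    empty_day = latest_scan_number(daily)
    # Scan003 is deliberately left out to show how a gap is handled.
    for n in (1, 2, 4):
        (daily / f"Scan{n:03d}").mkdir()
    latest = latest_scan_number(daily)
    next_number = 1 if latest is None else latest + 1
```

Because the probe is sequential, a gap in the numbering ends the search early: here `Scan004` exists but the latest scan is reported as 2, and the next scan number would be 3.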

get_next_scan_tag `staticmethod`

get_next_scan_tag(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> ScanTag

Determine the next available scan tag for the given day or today if no date is provided.

Parameters:

Name	Type	Description	Default
`experiment`	`Optional[str]`	Experiment name (default: CONFIG.experiment).	`None`
`year`	`Optional[int]`	Year of the scan (4-digit, default: current year if not provided).	`None`
`month`	`Optional[int]`	Month of the scan (1-12, default: current month if not provided).	`None`
`day`	`Optional[int]`	Day of the scan (1-31, default: current day if not provided).	`None`

Returns:

Type	Description
`ScanTag`	The ScanTag for the next available scan.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_next_scan_tag(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> ScanTag:
    """
    Determine the next available scan tag for the given day or today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    ScanTag
        The ScanTag for the next available scan.
    """
    latest_tag = ScanPaths.get_latest_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    if not latest_tag:
        today = datetime.today()
        year = year or today.year
        month = month or today.month
        day = day or today.day
        return ScanPaths.get_scan_tag(year, month, day, 1, experiment=experiment)

    return ScanPaths.get_scan_tag(
        latest_tag.year,
        latest_tag.month,
        latest_tag.day,
        latest_tag.number + 1,
        experiment=experiment,
    )

get_next_scan_folder `staticmethod`

get_next_scan_folder(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> Path

Build the folder path for the next scan on the given day or today if no date is provided.

Parameters:

Name	Type	Description	Default
`experiment`	`Optional[str]`	Experiment name (default: CONFIG.experiment).	`None`
`year`	`Optional[int]`	Year of the scan (4-digit, default: current year if not provided).	`None`
`month`	`Optional[int]`	Month of the scan (1-12, default: current month if not provided).	`None`
`day`	`Optional[int]`	Day of the scan (1-31, default: current day if not provided).	`None`

Returns:

Type	Description
`Path`	The Path to the folder for the next scan.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def get_next_scan_folder(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> Path:
    """
    Build the folder path for the next scan on the given day or today if no date is provided.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    Path
        The Path to the folder for the next scan.
    """
    next_tag = ScanPaths.get_next_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    return ScanPaths.get_scan_folder_path(
        tag=next_tag, base_directory=base_directory
    )

build_next_scan_data `staticmethod`

build_next_scan_data(experiment: Optional[str] = None, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Union[str, Path, None] = None) -> ScanPaths

Create the ScanPaths object for the next scan and build its folder.

Parameters:

Name	Type	Description	Default
`experiment`	`Optional[str]`	Experiment name (default: CONFIG.experiment).	`None`
`year`	`Optional[int]`	Year of the scan (4-digit, default: current year if not provided).	`None`
`month`	`Optional[int]`	Month of the scan (1-12, default: current month if not provided).	`None`
`day`	`Optional[int]`	Day of the scan (1-31, default: current day if not provided).	`None`

Returns:

Type	Description
`ScanPaths`	The ScanPaths object for the next scan.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def build_next_scan_data(
    experiment: Optional[str] = None,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Union[str, Path, None] = None,
) -> ScanPaths:
    """
    Create the ScanPaths object for the next scan and build its folder.

    Parameters
    ----------
    experiment : Optional[str], optional
        Experiment name (default: CONFIG.experiment).
    year : Optional[int], optional
        Year of the scan (4-digit, default: current year if not provided).
    month : Optional[int], optional
        Month of the scan (1-12, default: current month if not provided).
    day : Optional[int], optional
        Day of the scan (1-31, default: current day if not provided).

    Returns
    -------
    ScanPaths
        The ScanPaths object for the next scan.
    """
    next_tag = ScanPaths.get_next_scan_tag(
        experiment, year, month, day, base_directory=base_directory
    )
    return ScanPaths(tag=next_tag, read_mode=False, base_directory=base_directory)

is_background_scan `staticmethod`

is_background_scan(tag: ScanTag, base_directory: Optional[Union[Path, str]] = None) -> bool

Check if the given scan tag references a scan that was designated as a background.

Parameters:

Name	Type	Description	Default
`tag`	`ScanTag`	The scan tag containing year, month, day, and scan number.	required
`base_directory`	`Optional[Union[Path, str]]`	Base directory for the scan (default: CONFIG.local_base_path).	`None`

Returns:

Type	Description
`bool`	True if the scan was explicitly set as a Background scan, False otherwise.

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def is_background_scan(
    tag: ScanTag, base_directory: Optional[Union[Path, str]] = None
) -> bool:
    """
    Check if the given scan tag references a scan that was designated as a background.

    Parameters
    ----------
    tag : ScanTag
        The scan tag containing year, month, day, and scan number.
    base_directory : Optional[Union[Path, str]], optional
        Base directory for the scan (default: CONFIG.local_base_path).

    Returns
    -------
    bool
        True if the scan was explicitly set as a Background scan, False otherwise.
    """
    scan_folder = ScanPaths.get_scan_folder_path(
        tag=tag, base_directory=base_directory
    )
    config_filename = scan_folder / f"ScanInfoScan{tag.number:03d}.ini"

    config = ConfigParser()
    config.read(config_filename)

    if config.has_section("Scan Info") and config.has_option(
        "Scan Info", "Background"
    ):
        return config.get("Scan Info", "Background").strip().lower() == '"true"'
    return False
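
The comparison above expects the value to keep its surrounding double quotes, which `ConfigParser` does not strip. A minimal sketch of the same check on in-memory text follows. `is_background` is a hypothetical helper and the INI snippets are made up for illustration.

```python
from configparser import ConfigParser


def is_background(ini_text: str) -> bool:
    """Apply the same section/option test to a string instead of a file."""
    config = ConfigParser()
    config.read_string(ini_text)
    if config.has_section("Scan Info") and config.has_option("Scan Info", "Background"):
        # The stored value includes the quote characters, e.g. '"True"'.
        return config.get("Scan Info", "Background").strip().lower() == '"true"'
    return False


quoted = is_background('[Scan Info]\nBackground = "True"\n')
unquoted = is_background("[Scan Info]\nBackground = True\n")
absent = is_background("[Scan Info]\nScan No = 2\n")
```

Only the quoted form counts as a background scan in this check; an unquoted `True` is treated the same as a missing option.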

get_folder

get_folder() -> Optional[Path]

Get the scan folder path.

Source code in geecs_data_utils/scan_paths.py

def get_folder(self) -> Optional[Path]:
    """Get the scan folder path."""
    return self._folder

get_tag

get_tag() -> Optional[ScanTag]

Get the scan tag.

Source code in geecs_data_utils/scan_paths.py

def get_tag(self) -> Optional[ScanTag]:
    """Get the scan tag."""
    return self._tag

get_tag_date

get_tag_date() -> Optional[date]

Get the scan date.

Source code in geecs_data_utils/scan_paths.py

def get_tag_date(self) -> Optional[date]:
    """Get the scan date."""
    return self._tag_date

get_analysis_folder

get_analysis_folder() -> Optional[Path]

Get the analysis folder path, creating it if necessary.

Source code in geecs_data_utils/scan_paths.py

def get_analysis_folder(self) -> Optional[Path]:
    """Get the analysis folder path, creating it if necessary."""
    if self._analysis_folder is None:
        parts = list(Path(self._folder).parts)
        parts[-2] = "analysis"
        self._analysis_folder = Path(*parts)
        if not self._analysis_folder.is_dir():
            os.makedirs(self._analysis_folder)

    return self._analysis_folder

get_folders_and_files

get_folders_and_files() -> dict[str, list[str]]

Get lists of device folders and files in the scan directory.

Source code in geecs_data_utils/scan_paths.py

def get_folders_and_files(self) -> dict[str, list[str]]:
    """Get lists of device folders and files in the scan directory."""
    top_content = next(os.walk(self._folder))
    return {"devices": top_content[1], "files": top_content[2]}
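
The listing above relies on the first item yielded by `os.walk`, which describes only the top level of the scan folder. The sketch below shows that behaviour on a temporary directory; the folder and file names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp) / "Scan002"
    (scan / "UC_ALineEBeam3").mkdir(parents=True)
    (scan / "UC_ALineEBeam3" / "Scan002_UC_ALineEBeam3_001.png").touch()
    (scan / "ScanInfoScan002.ini").touch()

    # The first tuple from os.walk is (dirpath, dirnames, filenames) for the top level.
    _, folders, files = next(os.walk(scan))
    listing = {"devices": folders, "files": files}

    # os.walk yields nothing for a missing directory, so next() raises StopIteration.
    try:
        next(os.walk(scan / "does_not_exist"))
        missing_raises = False
    except StopIteration:
        missing_raises = True
```

Files inside device subfolders are not included in the result, and a missing scan folder surfaces as `StopIteration` rather than as a file-system error.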

load_scan_info

load_scan_info()

Load scan configuration information from the scan info file.

Source code in geecs_data_utils/scan_paths.py

def load_scan_info(self):
    """Load scan configuration information from the scan info file."""
    config_parser = ConfigParser()
    config_parser.optionxform = str

    try:
        config_parser.read(self._folder / f"ScanInfoScan{self._tag.number:03d}.ini")
        self.scan_info.update(
            {
                key: value.strip("'\"")
                for key, value in config_parser.items("Scan Info")
            }
        )
    except NoSectionError:
        temp_scan_data = inspect.stack()[0][3]
        logging.warning(
            f'ScanInfo file does not have a "Scan Info" section (in {temp_scan_data})'
        )

    return self.scan_info
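
Two details of the loader above are easy to miss: `optionxform = str` keeps option names case-sensitive, and surrounding quotes are stripped from every value. A minimal sketch on in-memory text follows. `read_scan_info` is a hypothetical helper and the option names in the sample are assumptions, not a documented file layout.

```python
from configparser import ConfigParser, NoSectionError


def read_scan_info(ini_text: str) -> dict:
    """Parse a 'Scan Info' section the same way, but from a string."""
    parser = ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    try:
        return {key: value.strip("'\"") for key, value in parser.items("Scan Info")}
    except NoSectionError:
        # The method above logs a warning here; the sketch just returns nothing.
        return {}


sample = '[Scan Info]\nScan No = 2\nScanStartInfo = "test scan"\nBackground = "False"\n'
info = read_scan_info(sample)
no_section = read_scan_info("[Other]\nkey = value\n")
```

All values come back as strings, so callers need to convert numbers and booleans themselves.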

get_ecs_dump_file

get_ecs_dump_file() -> Optional[Path]

Get the ECS Live Dump file corresponding to this scan.

Returns:

Type	Description
`Optional[Path]`	Path to the ECS dump file if it exists, else None.

Source code in geecs_data_utils/scan_paths.py

def get_ecs_dump_file(self) -> Optional[Path]:
    """
    Get the ECS Live Dump file corresponding to this scan.

    Returns
    -------
    Optional[Path]
        Path to the ECS dump file if it exists, else None.
    """
    if not self._folder:
        return None

    ecs_folder = self._folder.parent.parent / "ECS Live dumps"
    filename = f"Scan{self._tag.number}.txt"
    ecs_file = ecs_folder / filename

    return ecs_file if ecs_file.exists() else None
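
To make the expected location concrete, the sketch below builds the dump path for an example scan folder. `ecs_dump_path` is a hypothetical helper that omits the existence check, and `PurePosixPath` is used only for platform-independent output.

```python
from pathlib import PurePosixPath


def ecs_dump_path(scan_folder: PurePosixPath, scan_number: int) -> PurePosixPath:
    """Two levels up from the scan folder, then into 'ECS Live dumps'."""
    return scan_folder.parent.parent / "ECS Live dumps" / f"Scan{scan_number}.txt"


scan_folder = PurePosixPath("Z:/data/Undulator/Y2023/05-May/23_0501/scans/Scan002")
dump = ecs_dump_path(scan_folder, 2)
```

Unlike the scan folder name, the dump file name in the source above uses the scan number without zero padding.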

build_device_file_map

build_device_file_map(device: str, file_tail: str, *, device_file_stem: Optional[str] = None) -> dict[int, Path]

Build a mapping from shot number to file path for a given device.

Parameters:

Name	Type	Description	Default
`device`	`str`	Device name; also the subfolder of the scan directory containing the files.	required
`file_tail`	`str`	Suffix and extension, e.g., '.png', '_avg.h5'.	required
`device_file_stem`	`str`	Token used in the filename between `Scan<NNN>_` and `_<shot>`. Defaults to `device`. Use this when the folder name and the in-filename stem differ — for example, folder `U_BCaveMagSpec-interpSpec` containing files named `Scan042_U_BCaveMagSpec_001.csv`.	`None`

Returns:

Type	Description
`dict[int, Path]`	Mapping from shot number to file path.

Source code in geecs_data_utils/scan_paths.py

def build_device_file_map(
    self,
    device: str,
    file_tail: str,
    *,
    device_file_stem: Optional[str] = None,
) -> dict[int, Path]:
    """
    Build a mapping from shot number to file path for a given device.

    Parameters
    ----------
    device : str
        Device name; also the subfolder of the scan directory containing
        the files.
    file_tail : str
        Suffix and extension, e.g., '.png', '_avg.h5'.
    device_file_stem : str, optional
        Token used in the filename between ``Scan<NNN>_`` and ``_<shot>``.
        Defaults to ``device``. Use this when the folder name and the
        in-filename stem differ — for example, folder
        ``U_BCaveMagSpec-interpSpec`` containing files named
        ``Scan042_U_BCaveMagSpec_001.csv``.

    Returns
    -------
    dict[int, Path]
        Mapping from shot number to file path.
    """
    base_path = self.get_folder()
    if not base_path:
        raise ValueError("Scan folder is not set.")

    device_folder = base_path / device
    if not device_folder.exists():
        logger.warning(f"Device folder missing: {device_folder}")
        return {}

    stem = device_file_stem if device_file_stem is not None else device
    pattern = re.compile(
        rf"Scan\d{{3,}}_{re.escape(stem)}_(\d{{3,}}){re.escape(file_tail)}$"
    )

    file_map = {}
    for file in device_folder.iterdir():
        if not file.is_file():
            continue
        match = pattern.match(file.name)
        if match:
            shot_number = int(match.group(1))
            file_map[shot_number] = file

    return file_map
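
The matching rule above can be exercised without a scan folder by applying the same regular expression to a list of names. `shot_numbers` is a hypothetical helper, and the file names extend the `U_BCaveMagSpec` example from the parameter description.

```python
import re


def shot_numbers(names: list, stem: str, file_tail: str) -> dict:
    """Map shot number -> file name using the pattern shown above."""
    pattern = re.compile(
        rf"Scan\d{{3,}}_{re.escape(stem)}_(\d{{3,}}){re.escape(file_tail)}$"
    )
    found = {}
    for name in names:
        match = pattern.match(name)
        if match:
            found[int(match.group(1))] = name
    return found


names = [
    "Scan042_U_BCaveMagSpec_001.csv",
    "Scan042_U_BCaveMagSpec_002.csv",
    "Scan042_U_BCaveMagSpec_002_avg.csv",  # different tail, ignored for '.csv'
    "notes.txt",
]
by_stem = shot_numbers(names, "U_BCaveMagSpec", ".csv")
by_folder = shot_numbers(names, "U_BCaveMagSpec-interpSpec", ".csv")
```

Matching on the folder name finds nothing here, which is the situation `device_file_stem` is meant to handle.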

get_common_shot_dataframe

get_common_shot_dataframe(device_file_specs: Sequence[tuple[str, str]]) -> pd.DataFrame

Generate a DataFrame containing file paths for all devices with common shot number.

This method identifies shot numbers that are common (present in all specified devices' subfolders) and returns a table where each row corresponds to a shot number, and each column contains the full path to the file for that device.

Parameters:

Name	Type	Description	Default
`device_file_specs`	`Sequence[tuple[str, str]]`	A sequence of (device_name, file_tail) pairs. - `device_name` is the name of the subdirectory inside the scan folder. - `file_tail` is the suffix used in the filename, including extension, such as '.png', '_avg.h5', or '.tdms'.	required

Returns:

Type	Description
`DataFrame`	A DataFrame with one row per shot number that exists for all devices. Columns: - 'shot_number': The shot number (int). - One column per device name, with each entry as a `Path` object to the matching file. If no common shots are found, an empty DataFrame with appropriate columns is returned.
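
The "common shot" selection described above is a set intersection over per-device shot numbers. The sketch below shows the idea with plain dictionaries and lists instead of pandas. `common_shot_rows` is a hypothetical helper, the file names are placeholders, and the row layout is inferred from the documented return value rather than taken from the implementation.

```python
def common_shot_rows(device_maps: dict) -> list:
    """Return one row per shot number present for every device."""
    if not device_maps:
        # Guard added for the sketch: set.intersection needs at least one set.
        return []
    common = set.intersection(*(set(m) for m in device_maps.values()))
    return [
        {"shot_number": shot, **{dev: m[shot] for dev, m in device_maps.items()}}
        for shot in sorted(common)
    ]


device_maps = {
    "Z_Test_Scope": {1: "scope_001.dat", 2: "scope_002.dat", 3: "scope_003.dat"},
    "UC_ALineEBeam3": {2: "cam_002.png", 3: "cam_003.png", 4: "cam_004.png"},
}
rows = common_shot_rows(device_maps)
```

Shots 1 and 4 are dropped because each is missing for one of the devices, so only shots 2 and 3 produce rows.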

Examples:

>>> tag = ScanTag(year=2025, month=8, day=7, number=5, experiment='Undulator')
>>> sd = ScanPaths(tag=tag)
>>> dev_list = [
...     ('Z_Test_Scope', '.dat'),
...     ('Z_Test_Scope_2', '.dat'),
...     ('UC_ALineEBeam3', '.png')
... ]
>>> common_shots = sd.get_common_shot_dataframe(dev_list)

Source code in geecs_data_utils/scan_paths.py

def get_common_shot_dataframe(
    self, device_file_specs: Sequence[tuple[str, str]]
) -> pd.DataFrame:
    """
    Generate a DataFrame containing file paths for all devices with common shot number.

    This method identifies shot numbers that are common (present in all specified
    devices' subfolders) and returns a table where each row corresponds to a shot
    number, and each column contains the full path to the file for that device.

    Parameters
    ----------
    device_file_specs : Sequence[tuple[str, str]]
        A sequence of (device_name, file_tail) pairs.
        - `device_name` is the name of the subdirectory inside the scan folder.
        - `file_tail` is the suffix used in the filename, including extension,
          such as '.png', '_avg.h5', or '.tdms'.

    Returns
    -------
    pd.DataFrame
        A DataFrame with one row per shot number that exists for all devices.
        Columns:
        - 'shot_number': The shot number (int).
        - One column per device name, with each entry as a `Path` object to the matching file.
        If no common shots are found, an empty DataFrame with appropriate columns is returned.

    Examples
    --------
    >>> tag = ScanTag(year=2025, month=8, day=7, number=5, experiment='Undulator')
    >>> sd = ScanPaths(tag=tag)
    >>> dev_list = [
    ...     ('Z_Test_Scope', '.dat'),
    ...     ('Z_Test_Scope_2', '.dat'),
    ...     ('UC_ALineEBeam3', '.png')
    ... ]
    >>> common_shots = sd.get_common_shot_dataframe(dev_list)
    """
    device_maps = {
        device: self.build_device_file_map(device, file_tail)
        for device, file_tail in device_file_specs
    }

    # Find common shot numbers across all devices
    common_shots = set.intersection(*(set(m.keys()) for m in device_maps.values()))
    if not common_shots:
        logger.warning("No common shots found across specified devices.")
        return pd.DataFrame(
            columns=["shot_number"] + [device for device, _ in device_file_specs]
        )

    # Build rows: one per shot
    rows = []
    for shot in sorted(common_shots):
        row = {"shot_number": shot}
        for device, file_map in device_maps.items():
            row[device] = file_map[shot]
        rows.append(row)

    return pd.DataFrame(rows)
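
The intersection step can be illustrated without pandas or a scan folder. This is a sketch under stated assumptions: `common_shot_rows` is a hypothetical helper, and the device names and paths are invented for illustration. It also guards the empty-input case, because `set.intersection()` called with no arguments raises `TypeError`.

```python
from pathlib import Path


def common_shot_rows(device_maps: dict) -> list:
    """One row per shot number present in every device map, sorted by shot."""
    if not device_maps:
        return []  # set.intersection() with no arguments would raise TypeError
    common = set.intersection(*(set(m) for m in device_maps.values()))
    return [
        {"shot_number": shot, **{dev: m[shot] for dev, m in device_maps.items()}}
        for shot in sorted(common)
    ]


# Hypothetical per-device maps of shot number -> file path.
maps = {
    "cam": {1: Path("cam/Scan005_cam_001.png"), 2: Path("cam/Scan005_cam_002.png")},
    "scope": {2: Path("scope/Scan005_scope_002.dat"), 3: Path("scope/Scan005_scope_003.dat")},
}
rows = common_shot_rows(maps)
print([row["shot_number"] for row in rows])
```

Only shot 2 appears in both maps, so a single row is produced. Passing the rows to `pd.DataFrame` would give the table layout described above.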

list_device_folders ¶

list_device_folders() -> list[str]

Return device subfolder names from this scan folder.

Source code in geecs_data_utils/scan_paths.py

def list_device_folders(self) -> list[str]:
    """Return device subfolder names from this scan folder."""
    try:
        return self.get_folders_and_files().get("devices", [])
    except Exception:
        root = self.get_folder()
        return (
            [p.name for p in root.iterdir() if p.is_dir()]
            if root and root.exists()
            else []
        )

device_folder ¶

device_folder(device: str) -> Path

Resolve '<scan>/<device>'.

Source code in geecs_data_utils/scan_paths.py

def device_folder(self, device: str) -> Path:
    """Resolve '<scan>/<device>'."""
    return self.get_folder() / device

build_asset_filename `staticmethod` ¶

build_asset_filename(*, scan: int, shot: int, device: str, ext: str, variant: Optional[str] = None, device_file_stem: Optional[str] = None) -> str

Build canonical expected file naming.

Parameters:

Name	Type	Description	Default
`scan`	`int`	Scan number.	required
`shot`	`int`	Shot number.	required
`device`	`str`	Device name (also the in-filename stem unless overridden).	required
`ext`	`str`	File extension (with or without leading dot).	required
`variant`	`Optional[str]`	Variant segment appended directly after the shot index, with no separator added.	`None`
`device_file_stem`	`Optional[str]`	Token to use in the filename between `Scan<NNN>_` and `_<shot>`. Defaults to `device`. Use this when the folder name and the in-filename stem differ; for example, folder `U_BCaveMagSpec-interpSpec` containing files named `Scan042_U_BCaveMagSpec_001.csv`.	`None`

Source code in geecs_data_utils/scan_paths.py

@staticmethod
def build_asset_filename(
    *,
    scan: int,
    shot: int,
    device: str,
    ext: str,
    variant: Optional[str] = None,
    device_file_stem: Optional[str] = None,
) -> str:
    """Build canonical expected file naming.

    Parameters
    ----------
    scan, shot
        Scan and shot numbers.
    device : str
        Device name (also the in-filename stem unless overridden).
    ext : str
        File extension (with or without leading dot).
    variant : str, optional
        Variant segment appended after the shot index.
    device_file_stem : str, optional
        Token to use in the filename between ``Scan<NNN>_`` and
        ``_<shot>``. Defaults to ``device``. Use this when the folder
        name and the in-filename stem differ — for example, folder
        ``U_BCaveMagSpec-interpSpec`` containing files named
        ``Scan042_U_BCaveMagSpec_001.csv``.
    """
    ext = ext.lstrip(".").lower()
    shot_str = ScanPaths._shot_str(shot)
    variant_seg = "" if not variant else f"{variant}"
    stem = device_file_stem if device_file_stem is not None else device
    return f"Scan{scan:03d}_{stem}_{shot_str}{variant_seg}.{ext}"
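
As a quick illustration of the naming convention, here is a standalone sketch. `asset_filename` is a hypothetical re-implementation, and the three-digit zero padding of the shot number is an assumption standing in for `ScanPaths._shot_str`, inferred from the documented example `Scan042_U_BCaveMagSpec_001.csv`.

```python
from typing import Optional


def asset_filename(
    *,
    scan: int,
    shot: int,
    device: str,
    ext: str,
    variant: Optional[str] = None,
    device_file_stem: Optional[str] = None,
) -> str:
    """Sketch of the Scan<NNN>_<stem>_<shot><variant>.<ext> convention."""
    ext = ext.lstrip(".").lower()  # accept ".CSV" or "csv"
    stem = device_file_stem if device_file_stem is not None else device
    shot_str = f"{shot:03d}"  # assumption: stands in for ScanPaths._shot_str
    return f"Scan{scan:03d}_{stem}_{shot_str}{variant or ''}.{ext}"


print(asset_filename(
    scan=42,
    shot=1,
    device="U_BCaveMagSpec-interpSpec",
    ext=".CSV",
    device_file_stem="U_BCaveMagSpec",
))
```

Note that `variant` is concatenated as given, so a caller who wants `_avg` in the name passes the underscore as part of the variant.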

build_asset_path ¶

build_asset_path(*, shot: int, device: str, ext: str, variant: Optional[str] = None, device_file_stem: Optional[str] = None) -> Path

Full expected path for one asset.

`device` is the subfolder name. `device_file_stem` overrides the in-filename token if it differs from the folder name (defaults to `device`). See `build_asset_filename` for details.

Source code in geecs_data_utils/scan_paths.py

def build_asset_path(
    self,
    *,
    shot: int,
    device: str,
    ext: str,
    variant: Optional[str] = None,
    device_file_stem: Optional[str] = None,
) -> Path:
    """Full expected path for one asset.

    ``device`` is the subfolder name. ``device_file_stem`` overrides the
    in-filename token if it differs from the folder name (defaults to
    ``device``). See :meth:`build_asset_filename` for details.
    """
    tag = self.get_tag()
    fname = self.build_asset_filename(
        scan=tag.number,
        shot=shot,
        device=device,
        ext=ext,
        variant=variant,
        device_file_stem=device_file_stem,
    )
    return self.device_folder(device) / fname

infer_device_ext ¶

infer_device_ext(device: str, *, max_files: int = 5) -> str

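Peek at up to `max_files` files with recognized extensions and return the most common extension (without the leading dot). Returns "png" if the device folder does not exist or contains no recognized files.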

Source code in geecs_data_utils/scan_paths.py

def infer_device_ext(self, device: str, *, max_files: int = 5) -> str:
    """Peek at up to `max_files` files to find proper file extension."""
    from collections import Counter

    dpath = self.device_folder(device)
    if not dpath.exists():
        return "png"

    counts = Counter()
    seen = 0
    for f in dpath.iterdir():
        if f.is_file():
            ext = f.suffix.lower().lstrip(".")
            if ext in _ACCEPTABLE_EXTS:
                counts[ext] += 1
                seen += 1
                if seen >= max_files:
                    break
    return counts.most_common(1)[0][0] if counts else "png"
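
The majority-vote idea can be tried on a temporary folder. This is a sketch only: `infer_ext` is a hypothetical standalone version, and `ACCEPTABLE_EXTS` is an invented stand-in for the module-level `_ACCEPTABLE_EXTS`, whose real contents are not shown here.

```python
import tempfile
from collections import Counter
from pathlib import Path

ACCEPTABLE_EXTS = {"png", "tsv", "h5"}  # hypothetical stand-in for _ACCEPTABLE_EXTS


def infer_ext(folder: Path, max_files: int = 5, default: str = "png") -> str:
    """Return the most common recognized extension among up to `max_files` files."""
    if not folder.exists():
        return default
    counts = Counter()
    for f in folder.iterdir():
        ext = f.suffix.lower().lstrip(".")
        if f.is_file() and ext in ACCEPTABLE_EXTS:
            counts[ext] += 1
            if sum(counts.values()) >= max_files:
                break  # enough recognized files sampled
    return counts.most_common(1)[0][0] if counts else default


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("Scan001_dev_001.h5", "Scan001_dev_002.h5", "notes.txt"):
        (folder / name).touch()
    print(infer_ext(folder))
    print(infer_ext(folder / "missing"))
```

Files with unrecognized extensions (here `notes.txt`) are skipped and do not count toward the `max_files` limit, matching the listing above.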

ScanData API

geecs_data_utils.scan_data.ScanData ¶

ScanData(*, paths: ScanPaths)

Container for a single scan: paths + scalar DataFrame + lazy asset index.

This class composes a `ScanPaths` (path logic) and provides:
- Optional scalar DataFrame loading (s-file or TDMS→DataFrame).
- Lazy, normalized asset indexing (no bytes loaded).
- Convenience helpers for grouping/averaging images by Bin #.
- Flexible column resolution (case-insensitive, substring/regex).
- Per-bin scalar aggregation with configurable center and error.

Parameters:

Name	Type	Description	Default
`paths`	`ScanPaths`	A pre-constructed :class:`ScanPaths` instance pointing to the scan.	required

Notes

Use the factory methods `from_date` and `latest` for convenient construction.

Methods:

Name	Description
`from_date`	Construct a :class:`ScanData` from date/number.
`latest`	Construct a :class:`ScanData` for the latest scan on a date.
`load_scalars`	Load the scalar DataFrame (s-file or TDMS converted).
`set_data_frame`	Attach a scalar DataFrame and invalidate dependent caches.
`list_columns`	List column names as strings (flattens MultiIndex columns if present).
`find_cols`	Flexible column search.
`resolve_col`	Resolve a loose column spec to a single best column name.
`add_local_alias`	Register a user-defined shorthand for a column name.
`set_binning_config`	Update binning configuration and invalidate cache.
`expected_paths_by_bin`	Group expected image paths by the current bin definition.
`reload_sfile`	Re-read the analysis s-file into `self.data_frame`.
`copy_fresh_sfile_to_analysis`	Replace the analysis s-file with the fresh copy from the scan folder.
`load_ecs_live_dump`	Load and parse the ECS Live Dump file for this scan via `ScanPaths`.

Attributes:

Name	Type	Description
`binned_scalars`	`DataFrame`	Aggregate scalar data into bins with configurable center and error metrics.

Source code in geecs_data_utils/scan_data.py

def __init__(self, *, paths: ScanPaths):
    self.paths: ScanPaths = paths
    self.data_frame: Optional[pd.DataFrame] = None

    # Binning state
    self._bin_cfg: BinningConfig = BinningConfig()
    self._binned_cache: Optional[pd.DataFrame] = None
    self._df_version: int = 0
    self._binned_key: Optional[Tuple] = None

    # Local (user) aliases for columns (independent of DAQ "Alias:" strings)
    self.column_aliases: Dict[str, str] = {}

binned_scalars `property` ¶

binned_scalars: DataFrame

Aggregate scalar data into bins with configurable center and error metrics.

For each bin defined by `bin_col` in the current `BinningConfig`, all selected numeric columns (`value_cols`) are aggregated. The result is a wide DataFrame with a two-level column index: `(column_name, {"center", "err_low", "err_high"})`.

Notes

- If `value_cols` is None, all numeric columns in the scalar DataFrame are included (including the bin source column and Shotnumber).
- The bin column is treated like any other numeric column: its per-bin center and errors are computed the same way as other variables.
- Error definitions (`err`) control how `err_low` and `err_high` are computed:
  - "std": sample standard deviation (symmetric).
  - "stderr": standard error of the mean (symmetric).
  - "mad": median absolute deviation (scaled if `scale_to_sigma=True`; symmetric).
  - "iqr": interquartile range using percentiles; asymmetric offsets around the chosen center.
  - "percentile": arbitrary quantile range using percentiles; asymmetric offsets around the chosen center.
- Counts per bin are included under the pseudo-column ("count", "center").

Returns:

Type	Description
`DataFrame`	Binned scalar table with a MultiIndex on columns: Level 0: original column names plus `"count"`. Level 1: one of `{"center", "err_low", "err_high"}`. The row index corresponds to unique bin labels, which may be discrete values or numeric bin centers depending on the binning configuration.

Raises:

Type	Description
`ValueError`	If no scalar DataFrame is loaded.
`KeyError`	If the configured bin column is not found.
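
To make the output layout concrete, the sketch below aggregates one column per bin using only the standard library. It is not the library's implementation: `bin_scalars` is hypothetical, it assumes the center is the mean, and it covers only the symmetric "std" and "stderr" error definitions. The `(column, level)` tuple keys mirror the two-level column index described above.

```python
import math
import statistics
from collections import defaultdict


def bin_scalars(rows, bin_col, value_col, err="std"):
    """Aggregate `value_col` per bin into center / err_low / err_high entries."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[bin_col]].append(row[value_col])

    out = {}
    for label in sorted(groups):
        vals = groups[label]
        center = statistics.fmean(vals)  # assumption: center is the mean
        # Sample standard deviation needs at least two values.
        spread = statistics.stdev(vals) if len(vals) > 1 else float("nan")
        if err == "stderr":
            spread /= math.sqrt(len(vals))
        out[label] = {
            (value_col, "center"): center,
            (value_col, "err_low"): spread,   # symmetric error: low == high
            (value_col, "err_high"): spread,
            ("count", "center"): len(vals),
        }
    return out


# Hypothetical scalar rows; "charge" is an invented column name.
rows = [
    {"Bin #": 1, "charge": 1.0},
    {"Bin #": 1, "charge": 3.0},
    {"Bin #": 2, "charge": 5.0},
    {"Bin #": 2, "charge": 5.0},
]
binned = bin_scalars(rows, "Bin #", "charge")
print(binned[1][("charge", "center")], binned[1][("count", "center")])
```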

from_date `classmethod` ¶

from_date(*, year: int, month: int, day: int, number: int, experiment: Optional[str] = None, base_directory: Optional[Path] = None, load_scalars: bool = True, source: Literal['sfile', 'tdms'] = 'sfile', append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> 'ScanData'

Construct a `ScanData` from date/number.

Parameters:

Name	Type	Description	Default
`year`	`int`	Year of the scan date.	required
`month`	`int`	Month of the scan date.	required
`day`	`int`	Day of the scan date.	required
`number`	`int`	Scan number.	required
`experiment`	`Optional[str]`	Experiment name.	`None`
`base_directory`	`Optional[Path]`	Base data root if not configured globally.	`None`
`load_scalars`	`bool`	If True, load scalar DataFrame immediately.	`True`
`source`	`Literal['sfile', 'tdms']`	`"sfile"` (default) or `"tdms"` for scalar source.	`'sfile'`
`append_paths`	`bool`	If True, add device/shot paths to the DataFrame.	`True`
`stem_override`	`Optional[dict[str, str]]`	Optional `{device: in_filename_stem}` mapping forwarded to :meth:`load_scalars`. Use when a device's folder name differs from the in-filename token (e.g., folder `U_BCaveMagSpec-interpSpec` with files named `Scan042_U_BCaveMagSpec_001.csv`).	`None`

Returns:

Type	Description
`ScanData`

Source code in geecs_data_utils/scan_data.py

@classmethod
def from_date(
    cls,
    *,
    year: int,
    month: int,
    day: int,
    number: int,
    experiment: Optional[str] = None,
    base_directory: Optional[Path] = None,
    load_scalars: bool = True,
    source: Literal["sfile", "tdms"] = "sfile",
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> "ScanData":
    """
    Construct a :class:`ScanData` from date/number.

    Parameters
    ----------
    year, month, day, number, experiment
        Identify the scan.
    base_directory
        Base data root if not configured globally.
    load_scalars
        If True, load scalar DataFrame immediately.
    source
        ``"sfile"`` (default) or ``"tdms"`` for scalar source.
    append_paths
        If true, add device/shot paths to df.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`load_scalars`. Use when a device's folder name differs
        from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).

    Returns
    -------
    ScanData
    """
    tag = ScanPaths.get_scan_tag(year, month, day, number, experiment=experiment)
    paths = ScanPaths(tag=tag, base_directory=base_directory)
    sd = cls(paths=paths)
    if load_scalars:
        sd.load_scalars(
            source=source,
            append_paths=append_paths,
            stem_override=stem_override,
        )
    return sd

latest `classmethod` ¶

latest(experiment: Optional[str] = None, *, year: Optional[int] = None, month: Optional[int] = None, day: Optional[int] = None, base_directory: Optional[Path] = None, load_scalars: bool = True, source: Literal['sfile', 'tdms'] = 'sfile') -> 'ScanData'

Construct a `ScanData` for the latest scan on a date.

Parameters:

Name	Type	Description	Default
`experiment`	`Optional[str]`	Experiment name.	`None`
`year`	`Optional[int]`	Optional date components; defaults to today if omitted.	`None`
`month`	`Optional[int]`	Optional date components; defaults to today if omitted.	`None`
`day`	`Optional[int]`	Optional date components; defaults to today if omitted.	`None`
`base_directory`	`Optional[Path]`	Base data root if not configured globally.	`None`
`load_scalars`	`bool`	If True, load scalar DataFrame immediately.	`True`
`source`	`Literal['sfile', 'tdms']`	`"sfile"` (default) or `"tdms"`.	`'sfile'`

Returns:

Type	Description
`ScanData`

Source code in geecs_data_utils/scan_data.py

@classmethod
def latest(
    cls,
    experiment: Optional[str] = None,
    *,
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    base_directory: Optional[Path] = None,
    load_scalars: bool = True,
    source: Literal["sfile", "tdms"] = "sfile",
) -> "ScanData":
    """
    Construct a :class:`ScanData` for the latest scan on a date.

    Parameters
    ----------
    experiment
        Experiment name.
    year, month, day
        Optional date components; defaults to today if omitted.
    base_directory
        Base data root if not configured globally.
    load_scalars
        If True, load scalar DataFrame immediately.
    source
        ``"sfile"`` (default) or ``"tdms"``.

    Returns
    -------
    ScanData
    """
    tag = ScanPaths.get_latest_scan_tag(
        experiment=experiment,
        year=year,
        month=month,
        day=day,
        base_directory=base_directory,
    )
    if not tag:
        raise ValueError("No scans found for the specified date/experiment.")
    paths = ScanPaths(tag=tag, base_directory=base_directory)
    sd = cls(paths=paths)
    if load_scalars:
        sd.load_scalars(source=source)
    return sd

load_scalars ¶

load_scalars(*, source: Literal['sfile', 'tdms'] = 'sfile', append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> None

Load the scalar DataFrame (s-file or TDMS converted).

Parameters:

Name	Type	Description	Default
`source`	`Literal['sfile', 'tdms']`	`"sfile"` to read `s{scan}.txt` from the analysis tree, or `"tdms"` to read `ScanNNN.tdms` and convert to a DataFrame if possible.	`'sfile'`
`append_paths`	`bool`	If true, add device/shot paths to dataframe.	`True`
`stem_override`	`Optional[dict[str, str]]`	Optional `{device: in_filename_stem}` mapping forwarded to :meth:`set_data_frame`. Use when a device's folder name differs from the in-filename token (e.g., folder `U_BCaveMagSpec-interpSpec` with files named `Scan042_U_BCaveMagSpec_001.csv`).	`None`

Raises:

Type	Description
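`FileNotFoundError`	If the s-file (for `source="sfile"`) or the TDMS file (for `source="tdms"`) does not exist.
`ValueError`	If the TDMS file cannot be parsed, or if `source` is neither `"sfile"` nor `"tdms"`.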

Source code in geecs_data_utils/scan_data.py

def load_scalars(
    self,
    *,
    source: Literal["sfile", "tdms"] = "sfile",
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> None:
    """
    Load the scalar DataFrame (s-file or TDMS converted).

    Parameters
    ----------
    source
        ``"sfile"`` to read ``s{scan}.txt`` from the analysis tree, or ``"tdms"`` to
        read ``ScanNNN.tdms`` and convert to a DataFrame if possible.
    append_paths
        If true, add device/shot paths to dataframe.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`set_data_frame`. Use when a device's folder name differs
        from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).

    Raises
    ------
    FileNotFoundError
        If the s-file is expected but missing.
    """
    if source == "sfile":
        tag = self.paths.get_tag()
        sfile = self.paths.get_analysis_folder().parent / f"s{tag.number}.txt"
        if not sfile.exists():
            raise FileNotFoundError(f"No sfile for scan {tag}")
        df = pd.read_csv(sfile, delimiter="\t")
        self.set_data_frame(
            df, append_paths=append_paths, stem_override=stem_override
        )

    elif source == "tdms":
        tag = self.paths.get_tag()
        tdms_path = self.paths.get_folder() / f"Scan{tag.number:03d}.tdms"
        if not tdms_path.exists():
            raise FileNotFoundError(f"TDMS file not found: {tdms_path}")
        dct = read_geecs_tdms(tdms_path) or {}
        if not dct:
            raise ValueError(f"TDMS file could not be parsed: {tdms_path}")
        df = geecs_tdms_dict_to_panda(dct)
        self.set_data_frame(
            df, append_paths=append_paths, stem_override=stem_override
        )

    else:
        raise ValueError(f"Unsupported source: {source!r}")

set_data_frame ¶

set_data_frame(df: DataFrame, *, append_paths: bool = True, stem_override: Optional[dict[str, str]] = None) -> None

Attach a scalar DataFrame and invalidate dependent caches.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Scalar table for the scan (typically from s-file).	required
`append_paths`	`bool`	If true, add device shot paths to dataframe.	`True`
`stem_override`	`Optional[dict[str, str]]`	Optional `{device: in_filename_stem}` mapping forwarded to :meth:`_append_expected_asset_columns`. Use when a device's folder name differs from the in-filename token (e.g., folder `U_BCaveMagSpec-interpSpec` with files named `Scan042_U_BCaveMagSpec_001.csv`).	`None`

Source code in geecs_data_utils/scan_data.py

def set_data_frame(
    self,
    df: pd.DataFrame,
    *,
    append_paths: bool = True,
    stem_override: Optional[dict[str, str]] = None,
) -> None:
    """Attach a scalar DataFrame and invalidate dependent caches.

    Parameters
    ----------
    df
        Scalar table for the scan (typically from s-file).
    append_paths
        If true, add device shot paths to dataframe.
    stem_override
        Optional ``{device: in_filename_stem}`` mapping forwarded to
        :meth:`_append_expected_asset_columns`. Use when a device's folder
        name differs from the in-filename token (e.g., folder
        ``U_BCaveMagSpec-interpSpec`` with files named
        ``Scan042_U_BCaveMagSpec_001.csv``).
    """
    if append_paths:
        df = self._append_expected_asset_columns(df, stem_override=stem_override)
    self.data_frame = df
    self._df_version += 1
    self._binned_cache = None
    self._binned_key = None

list_columns ¶

list_columns() -> List[str]

List column names as strings (flattens MultiIndex columns if present).

Returns:

Type	Description
`list of str`

Source code in geecs_data_utils/scan_data.py

def list_columns(self) -> List[str]:
    """
    List column names as strings (flattens MultiIndex columns if present).

    Returns
    -------
    list of str
    """
    return self._flatten_columns()

find_cols ¶

find_cols(query: Union[str, Sequence[str]], *, mode: ColumnMatchMode = 'contains', case_sensitive: bool = False) -> List[str]

Flexible column search.

Wrapper for find_cols in geecs_data_utils/data/columns.py.

Parameters:

Name	Type	Description	Default
`query`	`Union[str, Sequence[str]]`	String or list of strings to search for.	required
`mode`	`ColumnMatchMode`	Search mode: `"contains"` (default), `"startswith"`, `"endswith"`, `"regex"`, or `"exact"`.	`'contains'`
`case_sensitive`	`bool`	If True, match with case sensitivity.	`False`

Returns:

Type	Description
`list of str`	Matching column names (flattened form). May be empty.

Source code in geecs_data_utils/scan_data.py

def find_cols(
    self,
    query: Union[str, Sequence[str]],
    *,
    mode: ColumnMatchMode = "contains",
    case_sensitive: bool = False,
) -> List[str]:
    """
    Flexible column search.

    Wrapper for find_cols in geecs_data_utils/data/columns.py.

    Parameters
    ----------
    query
        String or list of strings to search for.
    mode
        Search mode: ``"contains"`` (default), ``"startswith"``, ``"endswith"``,
        ``"regex"``, or ``"exact"``.
    case_sensitive
        If True, match with case sensitivity.

    Returns
    -------
    list of str
        Matching column names (flattened form). May be empty.
    """
    if self.data_frame is None:
        return []
    return find_cols(
        self.data_frame, query, mode=mode, case_sensitive=case_sensitive
    )
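
The matching modes can be sketched on a plain list of column names. This is an illustration only: the real logic lives in `geecs_data_utils/data/columns.py` and is not shown here, `match_columns` is hypothetical, and treating multiple queries as alternatives (a column matches if any query matches) is an assumption.

```python
import re


def match_columns(columns, query, mode="contains", case_sensitive=False):
    """Return the columns matching `query` under the given mode."""
    queries = [query] if isinstance(query, str) else list(query)
    flags = 0 if case_sensitive else re.IGNORECASE

    def norm(s):
        return s if case_sensitive else s.lower()

    tests = {
        "contains": lambda col, q: norm(q) in norm(col),
        "startswith": lambda col, q: norm(col).startswith(norm(q)),
        "endswith": lambda col, q: norm(col).endswith(norm(q)),
        "exact": lambda col, q: norm(col) == norm(q),
        "regex": lambda col, q: re.search(q, col, flags) is not None,
    }
    test = tests[mode]
    # Assumption: several queries are combined with OR.
    return [c for c in columns if any(test(c, q) for q in queries)]


cols = ["Shotnumber", "Bin #", "device_a Charge"]  # invented column names
print(match_columns(cols, "charge"))
print(match_columns(cols, r"^bin", mode="regex"))
```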

resolve_col ¶

resolve_col(spec: str, *, mode: ColumnMatchMode = 'contains', case_sensitive: bool = False, prefer_exact_ci: bool = True) -> str

Resolve a loose column spec to a single best column name.

Parameters:

Name	Type	Description	Default
`spec`	`str`	User-provided spec (may be an alias or partial/regex).	required
`mode`	`ColumnMatchMode`	Matching strategy used by :meth:`find_cols`: `"contains"` (default), `"startswith"`, `"endswith"`, `"regex"`, or `"exact"`.	`'contains'`
`case_sensitive`	`bool`	If True, enforce case-sensitive matching for the chosen mode.	`False`
`prefer_exact_ci`	`bool`	Prefer exact (case-insensitive) matches over substring/regex matches.	`True`

Returns:

Type	Description
`str`	Selected column name.

Raises:

Type	Description
`ValueError`	If no scalar DataFrame is loaded, or if no match is found.

Source code in geecs_data_utils/scan_data.py

def resolve_col(
    self,
    spec: str,
    *,
    mode: ColumnMatchMode = "contains",
    case_sensitive: bool = False,
    prefer_exact_ci: bool = True,
) -> str:
    """
    Resolve a loose column spec to a single best column name.

    Parameters
    ----------
    spec
        User-provided spec (may be an alias or partial/regex).
    mode
        Matching strategy used by :meth:`find_cols`: ``"contains"`` (default),
        ``"startswith"``, ``"endswith"``, ``"regex"``, or ``"exact"``.
    case_sensitive
        If True, enforce case-sensitive matching for the chosen mode.
    prefer_exact_ci
        Prefer exact (case-insensitive) matches over substring/regex matches.

    Returns
    -------
    str
        Selected column name.

    Raises
    ------
    ValueError
        If no match is found.
    """
    if self.data_frame is None:
        raise ValueError("No scalar dataframe loaded.")

    if spec in self.column_aliases:
        return self.column_aliases[spec]

    result = resolve_col_detailed(
        self.data_frame,
        spec,
        mode=mode,
        case_sensitive=case_sensitive,
        prefer_exact_ci=prefer_exact_ci,
    )
    if result.ambiguous and result.candidates is not None:
        c = result.candidates
        logging.warning(
            "Spec %r matched multiple columns (%d): %s; using %r",
            spec,
            len(c),
            list(c),
            result.column,
        )
    return result.column
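
The resolution order in the listing (local alias first, then a search over the DataFrame columns) can be sketched as follows. `resolve` is a hypothetical simplification: the real tie-breaking happens in `resolve_col_detailed`, which is not shown, so picking the first substring candidate is an assumption.

```python
def resolve(columns, spec, aliases=None):
    """Resolve `spec` to one column name: alias, then exact (CI), then substring."""
    aliases = aliases or {}
    if spec in aliases:
        # As in the listing, the alias target is returned without checking
        # that it exists among the columns.
        return aliases[spec]
    exact = [c for c in columns if c.lower() == spec.lower()]
    if exact:
        return exact[0]
    candidates = [c for c in columns if spec.lower() in c.lower()]
    if not candidates:
        raise ValueError(f"No column matches {spec!r}")
    return candidates[0]  # assumption: first candidate wins on ambiguity


cols = ["pressure", "pressure_setpoint"]  # invented column names
print(resolve(cols, "Pressure"))
print(resolve(cols, "p", aliases={"p": "pressure_setpoint"}))
```

Preferring the exact case-insensitive match is what keeps "Pressure" from being ambiguous between the two columns.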

add_local_alias ¶

add_local_alias(alias: str, actual_col: str) -> None

Register a user-defined shorthand for a column name.

Parameters:

Name	Type	Description	Default
`alias`	`str`	Local shorthand (e.g., `"pressure"`).	required
`actual_col`	`str`	Full column name present in the DataFrame.	required

Source code in geecs_data_utils/scan_data.py

def add_local_alias(self, alias: str, actual_col: str) -> None:
    """
    Register a user-defined shorthand for a column name.

    Parameters
    ----------
    alias
        Local shorthand (e.g., ``"pressure"``).
    actual_col
        Full column name present in the DataFrame.
    """
    self.column_aliases[alias] = actual_col

set_binning_config ¶

set_binning_config(**updates) -> None

Update binning configuration and invalidate cache.

Parameters:

Name	Type	Description	Default
`**updates`		Fields to replace on the current :class:`BinningConfig`.	`{}`

Source code in geecs_data_utils/scan_data.py

def set_binning_config(self, **updates) -> None:
    """
    Update binning configuration and invalidate cache.

    Parameters
    ----------
    **updates
        Fields to replace on the current :class:`BinningConfig`.
    """
    if "value_cols" in updates and updates["value_cols"] is not None:
        updates["value_cols"] = tuple(map(str, updates["value_cols"]))
    self._bin_cfg = replace(self._bin_cfg, **updates)
    self._binned_cache = None
    self._binned_key = None
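
The update pattern relies on `dataclasses.replace`, which returns a new config object with the given fields changed. A minimal sketch, assuming a frozen dataclass: `BinCfg` is a hypothetical stand-in for `BinningConfig`, and its fields and defaults are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class BinCfg:  # hypothetical stand-in for BinningConfig
    bin_col: str = "Bin #"
    value_cols: Optional[Tuple[str, ...]] = None
    err: str = "std"


def updated(cfg: BinCfg, **updates) -> BinCfg:
    """Return a copy of `cfg` with `updates` applied, normalizing value_cols."""
    if updates.get("value_cols") is not None:
        # Any iterable of column names becomes a tuple of str, as in the listing.
        updates["value_cols"] = tuple(map(str, updates["value_cols"]))
    return replace(cfg, **updates)


cfg = updated(BinCfg(), value_cols=["charge", 3], err="stderr")
print(cfg)
```

Because `replace` passes the updates to the dataclass constructor, an unknown field name raises `TypeError` rather than being silently ignored.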

expected_paths_by_bin ¶

expected_paths_by_bin(device: str, *, variant: Optional[str] = None, bin_col: Optional[str] = None, dropna_paths: bool = True, exists_only: bool = False) -> Dict[Hashable, List[Path]]

Group expected image paths by the current bin definition.

Parameters:

Name	Type	Description	Default
`device`	`str`	Device name (subfolder).	required
`variant`	`Optional[str]`	Optional variant suffix used when creating expected-path columns.	`None`
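`bin_col`	`Optional[str]`	Override the configured bin column. As the source below shows, the override replaces the stored binning configuration, so it persists after the call.	`None`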
`dropna_paths`	`bool`	If True, drop rows with missing path strings.	`True`
`exists_only`	`bool`	If True, filter out paths that do not currently exist on disk.	`False`

Returns:

Type	Description
`dict[Hashable, list[Path]]`	Mapping {bin_value -> [image paths]}.

Source code in geecs_data_utils/scan_data.py

def expected_paths_by_bin(
    self,
    device: str,
    *,
    variant: Optional[str] = None,
    bin_col: Optional[str] = None,
    dropna_paths: bool = True,
    exists_only: bool = False,
) -> Dict[Hashable, List[Path]]:
    """
    Group expected image paths by the current bin definition.

    Parameters
    ----------
    device
        Device name (subfolder).
    variant
        Optional variant suffix used when creating expected-path columns.
    bin_col
        Override the configured bin column for this call.
    dropna_paths
        If True, drop rows with missing path strings.
    exists_only
        If True, filter out paths that do not currently exist on disk.

    Returns
    -------
    dict[Hashable, list[pathlib.Path]]
        Mapping {bin_value -> [image paths]}.
    """
    if self.data_frame is None:
        raise ValueError("No scalar dataframe loaded.")

    # Optionally override the bin column for just this call
    if bin_col is not None:
        self._bin_cfg = replace(self._bin_cfg, bin_col=str(bin_col))

    # Ensure the bin source is present; compute the effective bin key
    self._require_bin_col()
    df = self.data_frame.copy()
    bin_key, bin_name = self._compute_bin_key(df)
    df = df.assign(**{bin_name: bin_key})

    col = self._expected_path_col(device, variant=variant)
    series = df[col]

    if dropna_paths:
        mask = series.notna()
        df = df.loc[mask]

    # Convert to Paths and optionally filter to existing files
    df = df.assign(
        _path_obj=df[col].map(lambda s: Path(s) if isinstance(s, str) else None)
    )
    if exists_only:
        df = df.loc[df["_path_obj"].map(lambda p: p is not None and p.exists())]

    out: Dict[Hashable, List[Path]] = {}
    for bval, group in df.groupby(bin_name, dropna=False, observed=True, sort=True):
        paths = [p for p in group["_path_obj"].tolist() if p is not None]
        if paths:
            out[bval] = paths
    return out
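
The grouping step reduces to collecting path strings per bin value. The sketch below uses plain dictionaries instead of a DataFrame; `paths_by_bin` is a hypothetical helper and the rows and column names are invented for illustration.

```python
from collections import defaultdict
from pathlib import Path


def paths_by_bin(rows, bin_col, path_col, exists_only=False):
    """Map each bin value to the list of paths found in `path_col`."""
    out = defaultdict(list)
    for row in rows:
        raw = row.get(path_col)
        if not isinstance(raw, str):
            continue  # missing path entries are dropped
        path = Path(raw)
        if exists_only and not path.exists():
            continue
        out[row[bin_col]].append(path)
    return dict(sorted(out.items()))  # bins in sorted order


rows = [
    {"Bin #": 2, "cam": "cam/Scan001_cam_003.png"},
    {"Bin #": 1, "cam": "cam/Scan001_cam_001.png"},
    {"Bin #": 1, "cam": None},
]
grouped = paths_by_bin(rows, "Bin #", "cam")
print(list(grouped))
```

Bins whose rows have no usable path do not appear in the result, which matches the `if paths:` check in the listing.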

reload_sfile ¶

reload_sfile() -> None

Re-read the analysis s-file into self.data_frame.

Notes

This is a thin alias for load_scalars(source='sfile') to make intent explicit.

Source code in geecs_data_utils/scan_data.py

def reload_sfile(self) -> None:
    """
    Re-read the analysis s-file into ``self.data_frame``.

    Notes
    -----
    This is a thin alias for ``load_scalars(source='sfile')`` to make intent explicit.
    """
    self.load_scalars(source="sfile")

copy_fresh_sfile_to_analysis ¶

copy_fresh_sfile_to_analysis() -> None

Replace the analysis s-file with the fresh copy from the scan folder.

Copies: <scan>/scans/ScanDataScanNNN.txt → <scan>/analysis/../sNNN.txt

Raises:

Type	Description
`FileNotFoundError`	If the source s-file in `scans/` does not exist.

Source code in geecs_data_utils/scan_data.py

def copy_fresh_sfile_to_analysis(self) -> None:
    """
    Replace the analysis s-file with the fresh copy from the scan folder.

    Copies:
        ``<scan>/scans/ScanDataScanNNN.txt`` → ``<scan>/analysis/../sNNN.txt``

    Raises
    ------
    FileNotFoundError
        If the source s-file in ``scans/`` does not exist.
    """
    tag = self.paths.get_tag()
    scan_txt = self.paths.get_folder() / f"ScanDataScan{tag.number:03d}.txt"
    analysis_txt = self.paths.get_analysis_folder().parent / f"s{tag.number}.txt"

    if not scan_txt.exists():
        raise FileNotFoundError(f"Original s-file '{scan_txt}' not found.")
    if analysis_txt.exists():
        analysis_txt.unlink()

    shutil.copy2(src=scan_txt, dst=analysis_txt)
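
The replace-then-copy sequence can be tried safely in a temporary directory. This sketch mirrors the listing with a hypothetical `refresh_copy` helper; the file names follow the conventions shown above and the file contents are invented.

```python
import shutil
import tempfile
from pathlib import Path


def refresh_copy(src: Path, dst: Path) -> None:
    """Replace `dst` with a fresh copy of `src`, preserving file metadata."""
    if not src.exists():
        raise FileNotFoundError(f"Original s-file '{src}' not found.")
    if dst.exists():
        dst.unlink()  # drop the stale analysis copy first
    shutil.copy2(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "ScanDataScan005.txt"
    dst = root / "s5.txt"
    src.write_text("Shotnumber\tcharge\n1\t10.5\n")
    dst.write_text("stale contents\n")
    refresh_copy(src, dst)
    print(dst.read_text() == src.read_text())
```

The source is checked before the destination is removed, so a missing source leaves the existing analysis s-file untouched.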

load_ecs_live_dump ¶

load_ecs_live_dump() -> ECSDump

Load and parse the ECS Live Dump file for this scan via ScanPaths.

Returns:

Type	Description
`ECSDump`	Parsed ECS dump structured by device name.

Raises:

Type	Description
`FileNotFoundError`	If no ECS dump file is available for this scan.

Source code in geecs_data_utils/scan_data.py

def load_ecs_live_dump(self) -> ECSDump:
    """
    Load and parse the ECS Live Dump file for this scan via ``ScanPaths``.

    Returns
    -------
    ECSDump
        Parsed ECS dump structured by device name.

    Raises
    ------
    FileNotFoundError
        If no ECS dump file is available for this scan.
    """
    tag = self.paths.get_tag()
    ecs_path = self.paths.get_ecs_dump_file()
    if not ecs_path:
        raise FileNotFoundError(f"No ECS live dump file found for scan {tag}")
    return parse_ecs_dump(ecs_path)
