Basic Usage

Basic creation of a ScanData object

Note:

geecs_data_utils relies on GEECSPathConfig(), which is loaded in the background. It looks for a config file that holds the user's preferred defaults. If that config file doesn't exist, some basic attributes need to be set manually.

In [1]:

from geecs_data_utils import ScanData

# If no config file exists, set the base path to the experiment data as below.
# The assumption is that data is stored as <base_path>/<experiment name>.
# (Uncommenting the next line also requires ScanPaths and Path to be imported.)
# ScanPaths.paths_config.base_path = Path('Z:/data')

# Create a ScanData object directly from year, month, day, scan number and experiment
sd = ScanData.from_date(year=2025, month=8, day=21, number=1, experiment="Undulator")

# inspect the head of the data_frame
sd.data_frame.head()

Out[1]:

	Elapsed Time	Bin #	scan	U_ESP_JetXYZ Position.Axis 3 Alias:Jet_Z (mm)	U_HP_Daq AnalogOutput.Channel 1 Alias:PressureControlVoltage	U_ModeImagerESP Position.Axis 2 Alias:JetBlade	U_BCaveICT acq_timestamp	U_BCaveICT Python Results.ChA Alias:U_BCaveICT Charge pC	UC_HiResMagCam acq_timestamp	Objective PlaceHolder	Shotnumber	UC_HiResMagCam:emittance_proxy	UC_HiResMagCam:total_counts	UC_HiResMagCam_expected_path	U_BCaveICT_expected_path
0	3.0	1	1	4.00051	2.0	-17.04999	3.838643e+09	176.458672	3.838643e+09	-2.657476	1	1000000.000000	19359560.0	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
1	4.0	1	1	4.00051	2.0	-17.04999	3.838643e+09	226.961340	3.838643e+09	-2.657476	2	0.522350	30597828.0	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
2	5.0	1	1	4.00051	2.0	-17.04999	3.838643e+09	185.923882	3.838643e+09	-2.657476	3	0.486896	27113014.0	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
3	6.0	1	1	4.00051	2.0	-17.04999	3.838643e+09	197.577382	3.838643e+09	-2.657476	4	0.527844	25153408.0	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
4	7.0	1	1	4.00051	2.0	-17.04999	3.838643e+09	173.117150	3.838643e+09	-2.657476	5	0.515521	25046332.0	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...

Note that, in the output above, the expected paths of the saved device data (non-scalar data) are appended directly to the data_frame. These are only the expected locations, derived from the configuration; they are not verified files.
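Because these columns hold expected paths, it can be useful to check which files are actually on disk before loading them. The following is a minimal sketch using only the standard library. `split_existing` is a hypothetical helper, not part of geecs_data_utils, and it assumes every entry is a path string.

```python
from pathlib import Path
import tempfile


def split_existing(paths):
    """Split path strings into (existing, missing) lists, preserving order."""
    existing, missing = [], []
    for p in paths:
        (existing if Path(p).is_file() else missing).append(str(p))
    return existing, missing


# Demonstration with a temporary directory standing in for the data drive.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "shot_001.png"
    present.write_bytes(b"")  # create an empty placeholder file
    absent = Path(tmp) / "shot_002.png"  # never created
    found, not_found = split_existing([str(present), str(absent)])
    print(len(found), "found,", len(not_found), "missing")
```

With a ScanData object, the same helper could presumably be applied to one device column, for example `split_existing(sd.data_frame["UC_HiResMagCam_expected_path"])`, provided that column contains no missing values.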

Search the data frame columns

In [2]:

sd.find_cols("charge")

Out[2]:

['U_BCaveICT Python Results.ChA Alias:U_BCaveICT Charge pC']
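The output suggests a case-insensitive substring match: the lowercase keyword "charge" matches the column containing "Charge pC". The sketch below reproduces that behavior on a plain list of column names. It is an assumption based on the output above, not the library's source, and `find_cols_sketch` is a hypothetical name.

```python
def find_cols_sketch(columns, keyword):
    """Return the column names that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [c for c in columns if needle in c.lower()]


# A few column names taken from the data frame shown earlier.
columns = [
    "Elapsed Time",
    "Bin #",
    "U_BCaveICT acq_timestamp",
    "U_BCaveICT Python Results.ChA Alias:U_BCaveICT Charge pC",
]
matches = find_cols_sketch(columns, "charge")
print(matches)
```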

Working with binned data

binned_scalars is a property of ScanData that can be accessed directly or used for plotting. Several attributes are user-definable, such as the aggregate type ('agg'), which can be mean or median. There are also several options for the error bar type. The default is to use the mean with interquartile-range error bars (the same as the GEECS Plotter default). See the API documentation for more details.
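To make the aggregation concrete, here is a minimal standard-library sketch of per-bin statistics: rows are grouped by a bin column, a center value (mean or median) is computed, and the first and third quartiles give the interquartile range. The function name, the row format, and the quartile method are all assumptions for illustration; the library's own computation may differ in detail.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def bin_center_iqr(rows, bin_col, value_col, agg="mean"):
    """Group rows by bin_col; return {bin: (center, q1, q3)} for value_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[bin_col]].append(row[value_col])
    summary = {}
    for key, values in sorted(groups.items()):
        center = mean(values) if agg == "mean" else median(values)
        if len(values) < 2:
            # quantiles() needs at least two points; one shot has no spread.
            summary[key] = (center, values[0], values[0])
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[key] = (center, q1, q3)
    return summary


# Made-up rows: three shots in bin 1, two shots in bin 2.
rows = [
    {"Bin #": 1, "charge": 10.0},
    {"Bin #": 1, "charge": 20.0},
    {"Bin #": 1, "charge": 30.0},
    {"Bin #": 2, "charge": 40.0},
    {"Bin #": 2, "charge": 60.0},
]
summary = bin_center_iqr(rows, "Bin #", "charge")
for bin_id, (center, q1, q3) in summary.items():
    print(bin_id, center, q1, q3)
```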

There is also a basic plot utility for visualizing binned data in geecs_data_utils.plotting_utils

In [3]:

from geecs_data_utils.plotting_utils import plot_binned
import matplotlib.pyplot as plt


# Bin on the "Bin #" column. If your s-file names it "Bin" instead, pass bin_col="Bin".
sd.set_binning_config(bin_col="Bin #")

binned = sd.binned_scalars
charge_col = sd.find_cols("charge")[0]

plot_binned(
    binned,
    x_col="U_ModeImagerESP Position.Axis 2 Alias:JetBlade",
    y_col=charge_col,
    label="example",
)

plt.show()

[Figure: output of plot_binned, showing binned charge versus JetBlade position.]

Reconfigure the binned data: use the charge column as the x axis with a fixed bin_width of 20, and set the aggregation to mean with standard-deviation error bars.
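Binning on a continuous column with a fixed width amounts to mapping each value to the interval that contains it. The sketch below assumes bin edges at integer multiples of the width and labels each bin by its center. How the library aligns its edges is not stated here, so treat this as an illustration only; the charge values are made up.

```python
import math


def assign_bins(values, bin_width):
    """Map each value to the center of its fixed-width bin.

    Assumes bin edges sit at integer multiples of bin_width.
    """
    centers = []
    for v in values:
        index = math.floor(v / bin_width)  # which bin the value falls in
        centers.append((index + 0.5) * bin_width)  # midpoint of that bin
    return centers


charges = [173.1, 176.5, 185.9, 197.6, 227.0]
centers = assign_bins(charges, bin_width=20)
print(centers)
```

Shots that share a bin center are then aggregated together, in the same way as shots that share a "Bin #" value.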

In [6]:

sd.set_binning_config(
    bin_col=charge_col, agg="mean", err="std", min_count=1, dropna="any", bin_width=20
)

bins1 = sd.binned_scalars  # triggers recompute

# Single series
plot_binned(
    bins1,
    x_col=charge_col,
    y_col="U_ModeImagerESP Position.Axis 2 Alias:JetBlade",
    label="example",
)
plt.show()

Gather device data by shot number, verify existence

Create a dataframe of shot numbers and file paths for the listed devices, restricted to the shot numbers for which every device has a saved file.

In [7]:

# Make a list of (<device>, <file_tail>) tuples. File names are typically:
# Scan<scan_number>DeviceName<shotnumber><extra><extension>. Here file_tail is
# everything after <shotnumber>. For example, for a magspec-type device it could be ('magspec', 'interpSpec.txt')

sd = ScanData.from_date(year=2025, month=8, day=7, number=5, experiment="Undulator")

dev_list = [
    ("Z_Test_Scope", ".dat"),
    ("Z_Test_Scope_2", ".dat"),
    ("UC_ALineEBeam3", ".png"),
]
shots = sd.paths.get_common_shot_dataframe(dev_list)
shots.head(5)

Out[7]:

	shot_number	Z_Test_Scope	Z_Test_Scope_2	UC_ALineEBeam3
0	1	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
1	2	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
2	3	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
3	4	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
4	5	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...	/Volumes/hdna2/data/Undulator/Y2025/08-Aug/25_...
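The restriction to shots saved by every device is, conceptually, a set intersection over the shot numbers found for each device. A minimal sketch of that idea follows. The `common_shots` helper and the file listing are hypothetical, and the placeholder paths do not follow the real GEECS naming.

```python
def common_shots(device_files):
    """Return sorted shot numbers present for every device.

    device_files maps device name -> {shot_number: file_path}.
    """
    shot_sets = [set(files) for files in device_files.values()]
    if not shot_sets:
        return []
    return sorted(set.intersection(*shot_sets))


# Hypothetical listing: Z_Test_Scope_2 is missing shot 2.
device_files = {
    "Z_Test_Scope": {1: "scope/1.dat", 2: "scope/2.dat", 3: "scope/3.dat"},
    "Z_Test_Scope_2": {1: "scope2/1.dat", 3: "scope2/3.dat"},
    "UC_ALineEBeam3": {1: "cam/1.png", 2: "cam/2.png", 3: "cam/3.png"},
}
shared = common_shots(device_files)

# One row per shared shot, with one path column per device.
table = [
    {"shot_number": s, **{dev: files[s] for dev, files in device_files.items()}}
    for s in shared
]
print(shared)
```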

Multi-scan table (`DatasetBuilder`)

Prefer DatasetBuilder.from_date_scan_numbers to hand-concatenating frames: it loads each scan, skips failures (recorded in load_report), concatenates the results, and can apply shared row filters, outlier configuration, and dropna. Later cells in this subsection rebuild assembled and df where needed, so the DatasetBuilder plots can be run without executing earlier parts of the notebook.
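The load, skip, and concatenate pattern described above can be sketched without the library. In this illustration, `load_scans` and `fake_loader` are hypothetical names, rows are plain dicts rather than data frames, and the report structure is an assumption that does not mirror the real load_report object.

```python
def load_scans(numbers, loader):
    """Load each scan with `loader`; collect rows and record failures.

    `loader` is any callable that takes a scan number and returns a list of
    row dicts. Failures are skipped and reported instead of raised.
    """
    rows, loaded, skipped = [], [], []
    for number in numbers:
        try:
            scan_rows = loader(number)
        except (FileNotFoundError, ValueError) as exc:
            skipped.append((number, str(exc)))
            continue
        # Tag each row with its scan so shots stay traceable after merging.
        rows.extend({"scan": number, **row} for row in scan_rows)
        loaded.append(number)
    return rows, {"loaded": loaded, "skipped": skipped}


def fake_loader(number):
    """Stand-in loader: scan 2 is missing, every other scan has two shots."""
    if number == 2:
        raise FileNotFoundError(f"no s-file for scan {number}")
    return [{"Shotnumber": 1}, {"Shotnumber": 2}]


rows, report = load_scans(range(1, 4), fake_loader)
print(len(rows), "rows from scans", report["loaded"])
print("Skipped:", report["skipped"])
```

Keeping the skipped scans in a report, rather than failing on the first missing scan, makes it possible to assemble a whole day's data and inspect the gaps afterwards.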

In [ ]:

from geecs_data_utils.data import DatasetBuilder

assembled = DatasetBuilder.from_date_scan_numbers(
    year=2026,
    month=4,
    day=23,
    experiment="Undulator",
    numbers=range(1, 20),
    load_scalars=True,
    source="sfile",
    on_missing="skip",
    dropna=False,
)

df = assembled.frame
print(assembled.scan_info.get("total_scans"), "scans merged")
if assembled.load_report:
    # Show at most the first five skipped scans
    print("Skipped:", assembled.load_report.skipped[:5], "...")
df.head()
df.shape

In [ ]:

from geecs_data_utils.data import DatasetBuilder
import matplotlib.pyplot as plt

assembled = DatasetBuilder.from_date_scan_numbers(
    year=2026,
    month=4,
    day=23,
    experiment="Undulator",
    numbers=range(1, 20),
    load_scalars=True,
    source="sfile",
    on_missing="skip",
    dropna=False,
)

df = assembled.frame


plt.scatter(
    df["U_FROG_Grenouille acq_timestamp"] - df["U_FROG_Grenouille acq_timestamp"].min(),
    df["U_FROG_Grenouille-Temporal_temporal_fwhm"],
)
plt.xlabel("U_FROG_Grenouille acq_timestamp (offset from minimum)")
plt.ylabel("U_FROG_Grenouille-Temporal_temporal_fwhm")
plt.show()

In [ ]:

df["U_FROG_Grenouille acq_timestamp"].min()

In [ ]:

x_col = "U_FROG_Grenouille-Temporal_frog_error"
y_col = "U_FROG_Grenouille-Temporal_temporal_fwhm"

subset = df[[x_col, y_col]].dropna()

plt.hist2d(
    subset[x_col] * 100,
    subset[y_col],
    bins=250,
)

plt.xlabel(f"{x_col} (x100)")  # x values are scaled by 100 above
plt.ylabel(y_col)
plt.colorbar(label="Counts")
plt.show()

In [ ]:

from geecs_data_utils.data import find_cols

result = find_cols(df, "frog")
print(result)

cols = [c for c in df.columns if "frog" in c.lower()]
print(cols)
print(f"Same results: {sorted(result) == sorted(cols)}")
