Module Documentation

Protomix.extract_fids

extract_fids(root_directory: str, acqus_df: DataFrame) → DataFrame

Extract Free Induction Decay (FID) data from binary files and construct a DataFrame.

This function navigates through a specified directory, locates binary files named ‘fid’, and extracts FID data from them. It then assembles a DataFrame where each row represents the complex FID signal from a sample, with columns corresponding to time points calculated using the dwell time from the acqus_df DataFrame.

Parameters:
  • root_directory (str) – The root directory path containing subdirectories with ‘fid’ files.

  • acqus_df (pd.DataFrame) – A DataFrame containing acquisition parameters, specifically the spectral width (‘$SW_h’) used for dwell time calculation.

Returns:

A DataFrame where each row corresponds to a sample’s FID data (as complex values), with columns representing time points.

Return type:

pd.DataFrame

Raises:

AssertionError – If root_directory is not a string, does not exist, is not a directory, contains no ‘fid’ files, or if any ‘fid’ file has an unexpected data length.
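
As a rough illustration of the extraction step, the sketch below decodes an in-memory byte string and builds the time axis from the spectral width. The byte layout (interleaved real and imaginary 32-bit little-endian integers, as commonly found in Bruker ‘fid’ files), the dwell time formula (1 / SW_h per complex point), and the helper name parse_fid_bytes are assumptions made for illustration; they are not taken from the Protomix source.

```python
import numpy as np

def parse_fid_bytes(raw: bytes, sw_h: float):
    """Decode interleaved int32 (real, imag) pairs and build a time axis.

    Hypothetical helper: assumes little-endian int32 data and a dwell
    time of 1 / sw_h seconds per complex point.
    """
    values = np.frombuffer(raw, dtype="<i4")
    assert values.size % 2 == 0, "unexpected data length"
    # Even positions hold the real part, odd positions the imaginary part
    fid = values[0::2] + 1j * values[1::2]
    dwell_time = 1.0 / sw_h
    time_points = np.arange(fid.size) * dwell_time
    return time_points, fid

# Synthetic two-point FID: (1 + 2j) and (3 - 4j)
raw = np.array([1, 2, 3, -4], dtype="<i4").tobytes()
time_points, fid = parse_fid_bytes(raw, sw_h=1000.0)
```

In a DataFrame built this way, time_points would serve as the column labels and each decoded fid as one row.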

Protomix.extract_params

extract_params(root_directory: str) → DataFrame

Extract parameters from ‘acqus’ files located within a directory hierarchy.

Parameters:

root_directory (str) – Root directory where the ‘acqus’ files are located.

Returns:

A dataframe with extracted parameters, indexed by sample names.

Return type:

pd.DataFrame
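
To make the parameter extraction more concrete, here is a minimal sketch that parses the text of a single ‘acqus’ file into a dictionary. It assumes the usual JCAMP-DX style lines of the form "##$NAME= value" and ignores multi-line array parameters; the helper name parse_acqus_text and the sample text are hypothetical, and Protomix may parse the files differently.

```python
import re

def parse_acqus_text(text: str) -> dict:
    """Collect '##$NAME= value' entries from acqus-style text (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        match = re.match(r"^##(\$\w+)=\s*(.*)$", line)
        if not match:
            continue  # skip titles, comments, and continuation lines
        key, value = match.group(1), match.group(2).strip()
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value  # keep non-numeric values as text
    return params

sample = "\n".join([
    "##TITLE= Parameter file",
    "##$SW_h= 12019.23",
    "##$TD= 65536",
    "##$PULPROG= <zgpr>",
    "$$ a comment line",
])
params = parse_acqus_text(sample)
```

One such dictionary per sample, stacked with the sample names as the index, would give a DataFrame of the kind described above.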

Protomix.group_delay_removal

group_delay_removal(fid_df: DataFrame, acqus_df: DataFrame) → DataFrame

Remove the group delay from the input FID (Free Induction Decay) data.

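This function removes the group delay from each FID. The group delay is an artifact introduced during data acquisition (on Bruker instruments, by the digital filter), and removing it yields signals that can be Fourier transformed without a large frequency-dependent phase error.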

Parameters:
  • fid_df (pd.DataFrame) – A DataFrame where each row represents an individual FID signal, and columns represent time points.

  • acqus_df (pd.DataFrame) – A DataFrame containing acquisition parameters used to correct the group delay.

Returns:

A DataFrame with the group delay removed from the FID signals.

Return type:

pd.DataFrame
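
One common way to undo a known delay is to apply a linear phase ramp in the frequency domain, which is equivalent to a circular shift in the time domain. The sketch below shows that idea for a single FID. It is an illustration under stated assumptions: the function name remove_group_delay is hypothetical, the delay is given in points (on Bruker systems it is commonly reported as the $GRPDLY parameter), and Protomix's own method is not documented here.

```python
import numpy as np

def remove_group_delay(fid: np.ndarray, group_delay: float) -> np.ndarray:
    """Undo a circular delay of `group_delay` points (hypothetical helper)."""
    freqs = np.fft.fftfreq(fid.size)  # normalised frequency, cycles per point
    spectrum = np.fft.fft(fid)
    # A delay of d points multiplies the spectrum by exp(-2j*pi*f*d);
    # the opposite phase ramp cancels it.
    corrected = spectrum * np.exp(2j * np.pi * freqs * group_delay)
    return np.fft.ifft(corrected)

points = np.arange(64)
clean = np.exp(-points / 10.0) * np.exp(2j * np.pi * 0.1 * points)
delayed = np.roll(clean, 5)  # simulate a circular delay of 5 points
restored = remove_group_delay(delayed, group_delay=5)
```

The phase-ramp form also works for the non-integer delays that digital filters typically produce, which a plain np.roll cannot express.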

Protomix.solvent_residuals_removal

solvent_residuals_removal(fid_df: DataFrame, lam: float = 1000000.0, returnSolvent: bool = False)

Remove solvent residuals from the input FID (Free Induction Decay) data.

This function processes the input FID data to remove solvent residuals, which are unwanted signals typically present in NMR data. The function can also return the solvent residuals if specified.

Parameters:
  • fid_df (pd.DataFrame) – A DataFrame where each row represents an individual FID signal, and columns correspond to time points.

  • lam (float) – The regularization parameter used during the removal process. Defaults to 1e6.

  • returnSolvent (bool) – If True, the function returns both the corrected FID data and the solvent residuals. Defaults to False.

Returns:

If returnSolvent is False, returns a DataFrame with the corrected FID data. If returnSolvent is True, returns a tuple of two DataFrames: (corrected FID data, solvent residuals).

Return type:

pd.DataFrame or tuple of (pd.DataFrame, pd.DataFrame)
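
The role of a regularization parameter such as lam can be illustrated with a penalized least-squares (Whittaker) smoother: the slowly varying part of the FID is estimated and then subtracted. This is only a sketch of one plausible approach; the documentation above does not state which algorithm Protomix uses, and the names whittaker_smooth and remove_slow_component are hypothetical. A dense solve is used for brevity, so it suits short signals only.

```python
import numpy as np

def whittaker_smooth(y: np.ndarray, lam: float) -> np.ndarray:
    """Minimise |y - z|^2 + lam * |second difference of z|^2 (dense version)."""
    n = y.size
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference operator
    system = np.eye(n) + lam * d2.T @ d2
    return np.linalg.solve(system, y)

def remove_slow_component(fid: np.ndarray, lam: float = 1e6):
    """Return (fid minus its smooth part, the smooth part)."""
    solvent = whittaker_smooth(fid, lam)
    return fid - solvent, solvent

t = np.arange(100, dtype=float)
# A purely linear trend has zero second difference, so the smoother
# reproduces it and the corrected signal is (numerically) zero.
slow = (2.0 + 0.01 * t) * (1 + 1j)
corrected, solvent = remove_slow_component(slow)
```

A larger lam makes the estimated component stiffer, so only very slow variations are treated as solvent; a smaller lam lets the estimate follow faster oscillations.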

Protomix.apodization

apodization(fid_df: DataFrame, LB: float = 1, apodization_type: str = 'gaussian') → DataFrame

Apply an apodization function to each Free Induction Decay (FID) row in a DataFrame.

Apodization, in the context of NMR (Nuclear Magnetic Resonance) spectroscopy, is a windowing technique applied to FID signals to enhance the signal-to-noise ratio and improve line shape in the frequency domain. This function supports two types of apodization: Gaussian and Exponential.

Parameters:
  • fid_df (pd.DataFrame) – A DataFrame where each row represents an FID signal, and columns represent time points.

  • LB (float, optional) – Line broadening parameter, which determines the width of the apodization function. A higher value results in more broadening. Defaults to 1.

  • apodization_type (str, optional) – Type of apodization function to apply. Options are ‘gaussian’ for Gaussian apodization, and ‘exponential’ for Exponential apodization. Defaults to ‘gaussian’.

Returns:

A DataFrame of the same shape as fid_df, containing the apodized FID values. Each FID row in the input DataFrame is multiplied by the apodization function, transforming the data in the time domain.

Return type:

pd.DataFrame

Raises:

AssertionError – If fid_df is not a pandas DataFrame, LB is negative, or apodization_type is not ‘gaussian’ or ‘exponential’.
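
The two window shapes can be sketched as follows. The formulas are common textbook conventions in which lb is the added line width in Hz (exponential: exp(-π·lb·t); Gaussian: exp(-(π·lb·t)² / (4 ln 2))). Whether Protomix uses exactly these expressions is an assumption, and apodization_window is a hypothetical name.

```python
import numpy as np

def apodization_window(time_points, lb=1.0, kind="gaussian"):
    """Return a decaying window evaluated at `time_points` (in seconds)."""
    t = np.asarray(time_points, dtype=float)
    if kind == "exponential":
        return np.exp(-np.pi * lb * t)
    if kind == "gaussian":
        return np.exp(-(np.pi * lb * t) ** 2 / (4.0 * np.log(2.0)))
    raise ValueError("kind must be 'gaussian' or 'exponential'")

t = np.arange(1024) / 1000.0                      # 1 ms dwell time
fid = np.exp(2j * np.pi * 50.0 * t) * np.exp(-t / 0.2)
window = apodization_window(t, lb=1.0, kind="exponential")
apodized = fid * window                           # element-wise multiplication
```

Both windows equal 1 at t = 0 and decay afterwards, so the late, noise-dominated part of the FID is attenuated most.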

Protomix.zero_filling

zero_filling(fid_df: DataFrame, acqus_df: DataFrame, target_points: int = 5000) → DataFrame

Zero-fill FID signals in a DataFrame to achieve the target number of points.

This function appends zeros to the end of each FID signal in the provided DataFrame until the specified target number of points is reached. Zero-filling is commonly used to increase the digital resolution (the number of points per unit frequency) of the Fourier-transformed spectra.

Parameters:
  • fid_df (pd.DataFrame) – A DataFrame containing FID signals, with each row representing an individual FID signal.

  • acqus_df (pd.DataFrame) – A DataFrame containing acquisition parameters relevant to the FID signals.

  • target_points (int) – The target number of points for each FID signal after zero-filling. Default is 5000.

Returns:

A DataFrame with FID signals that have been zero-filled to the specified target number of points.

Return type:

pd.DataFrame
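
For a single FID held in a NumPy array, the padding step can be sketched as below. The helper name zero_fill is hypothetical, and leaving an already long signal untouched is a choice made for this sketch rather than documented Protomix behaviour.

```python
import numpy as np

def zero_fill(fid, target_points):
    """Append zeros until the signal has `target_points` points."""
    fid = np.asarray(fid)
    missing = target_points - fid.size
    if missing <= 0:
        return fid.copy()  # already long enough; this sketch does not truncate
    return np.pad(fid, (0, missing), mode="constant")

fid = np.array([1 + 1j, 0.5 - 0.5j, 0.25 + 0j])
filled = zero_fill(fid, target_points=8)
```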

Protomix.fourier_transform

fourier_transform(fid_df: DataFrame, acqus_df: DataFrame) → DataFrame

Apply Fourier Transform to FID (Free Induction Decay) signals in a DataFrame and convert to chemical shift values in ppm.

This function takes a DataFrame containing rows of FID signals, applies the Fourier Transform to each row, and then scales the frequencies to chemical shift values in ppm using parameters from the acqus_df DataFrame.

Parameters:
  • fid_df (pd.DataFrame) – DataFrame containing FID signals in rows. Each row represents an FID signal from a sample.

  • acqus_df (pd.DataFrame) – DataFrame containing acquisition parameters necessary for the transformation. It should include spectral width ($SW_h), spectral offset ($O1), and NMR frequency ($SFO1).

Returns:

DataFrame containing Fourier-transformed spectra. Columns represent chemical shift values in ppm, and rows correspond to the transformed FID signals.

Return type:

pd.DataFrame

Notes:

The function performs a Fourier Transform on each FID signal in fid_df. The spectral width ($SW_h), spectral offset ($O1), and NMR frequency ($SFO1) from acqus_df are used to calculate the ppm scale for the spectra.
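
The conversion from frequency to ppm described in the notes can be sketched for one FID as follows. The formula ppm = (frequency offset + O1) / SFO1, with O1 in Hz and SFO1 in MHz, is a common convention and is assumed here; sign and axis-direction conventions differ between instruments, and fid_to_spectrum is a hypothetical name.

```python
import numpy as np

def fid_to_spectrum(fid, sw_h, o1_hz, sfo1_mhz):
    """FFT one FID and return (ppm axis, complex spectrum)."""
    n = fid.size
    spectrum = np.fft.fftshift(np.fft.fft(fid))
    # Frequency of each bin relative to the carrier, in Hz
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / sw_h))
    ppm = (freqs_hz + o1_hz) / sfo1_mhz  # Hz divided by MHz gives ppm
    return ppm, spectrum

sw_h, n = 1024.0, 1024
t = np.arange(n) / sw_h
fid = np.exp(2j * np.pi * 100.0 * t)  # single line 100 Hz above the carrier
ppm, spectrum = fid_to_spectrum(fid, sw_h, o1_hz=2000.0, sfo1_mhz=500.0)
peak_ppm = ppm[np.argmax(np.abs(spectrum))]
```

With these synthetic values the line sits at (100 + 2000) / 500 = 4.2 ppm.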

Protomix.internal_referencing

internal_referencing(spectra_df: DataFrame, ppm_min: float = -0.2, ppm_max: float = 0.2) → DataFrame

Reference a DataFrame of NMR spectra by shifting the spectrum values to align the TSP peak to 0 ppm.

This function adjusts the chemical shift values in the provided NMR spectra so that the TSP (trimethylsilyl propionate) peak is set to 0 ppm. The TSP peak is searched for only within the specified ppm range (ppm_min to ppm_max).

Parameters:
  • spectra_df (pd.DataFrame) – A DataFrame where each row represents a complex NMR spectrum, and columns correspond to ppm (parts per million) values.

  • ppm_min (float, optional) – The minimum ppm value for the search range to locate the TSP peak. Default is -0.2.

  • ppm_max (float, optional) – The maximum ppm value for the search range to locate the TSP peak. Default is 0.2.

Returns:

A DataFrame with spectra shifted so that the TSP peak is aligned at 0 ppm, with the original ppm values as column names.

Return type:

pd.DataFrame
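
The referencing idea can be sketched for one spectrum: find the largest peak inside the search window, then shift the intensities so that this peak lands on the point closest to 0 ppm. The name reference_to_zero is hypothetical, and using a circular shift (np.roll, which wraps points around the ends) is a simplification chosen for this sketch.

```python
import numpy as np

def reference_to_zero(ppm, spectrum, ppm_min=-0.2, ppm_max=0.2):
    """Shift `spectrum` so its largest peak in the window sits at 0 ppm."""
    window = np.flatnonzero((ppm >= ppm_min) & (ppm <= ppm_max))
    peak_idx = window[np.argmax(np.abs(spectrum[window]))]
    zero_idx = np.argmin(np.abs(ppm))
    # Circular shift: points pushed past one end reappear at the other
    return np.roll(spectrum, zero_idx - peak_idx)

ppm = np.linspace(-1.0, 1.0, 201)                 # 0.01 ppm per point
spectrum = np.exp(-((ppm - 0.05) / 0.01) ** 2)    # reference peak at 0.05 ppm
shifted = reference_to_zero(ppm, spectrum)
```

The ppm axis itself is unchanged, which matches the description above of keeping the original ppm values as column names.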

Protomix.phase_correction

phase_correction(spectra_df: DataFrame) → DataFrame

Apply phase correction to spectra in a DataFrame.

This function applies a phase correction to each spectrum in the provided DataFrame so that the spectral peaks are properly phased.

Parameters:

spectra_df (pd.DataFrame) – A DataFrame containing spectra, with each row representing a spectrum and columns corresponding to ppm values.

Returns:

A DataFrame containing the phase-corrected spectra.

Return type:

pd.DataFrame
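
As an illustration of what a phase correction does, the sketch below applies a zero-order correction: the whole spectrum is rotated by one angle chosen so that the summed signal becomes real and positive. This is a deliberately simple criterion picked for the example; the documentation does not say which criterion or phase orders Protomix uses, and zero_order_phase is a hypothetical name.

```python
import numpy as np

def zero_order_phase(spectrum):
    """Rotate the spectrum so that its total integral is real and positive."""
    phi = np.angle(np.sum(spectrum))
    return spectrum * np.exp(-1j * phi), phi

x = np.linspace(-10.0, 10.0, 401)
absorptive = 1.0 / (1.0 + x ** 2)          # real-valued line standing in for an in-phase spectrum
dephased = absorptive * np.exp(1j * 0.7)   # apply a known phase error of 0.7 rad
corrected, phi = zero_order_phase(dephased)
```

After the rotation the imaginary part of this synthetic spectrum vanishes and the real part equals the original line.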

Protomix.baseline_correction

baseline_correction(spectra_df: DataFrame, method='als', lambda_=100, porder=1, maxIter=100, lam=10000.0, ratio=0.05)

Apply the selected baseline correction algorithm to each row of a DataFrame.

Parameters:
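  • spectra_df (pd.DataFrame) – DataFrame where each row is a spectrum to be baseline corrected.

  • method (str) – Method for baseline correction. Options are “als” (asymmetric least squares), “arpls” (asymmetrically reweighted penalized least squares), and “airpls” (adaptive iteratively reweighted penalized least squares). Default is “als”.

  • lambda_ (float) – Regularization parameter for ALS and AIRPLS. Default is 100.

  • porder (float) – Asymmetry parameter for ALS. Default is 1.

  • maxIter (int) – Maximum number of iterations for ALS and AIRPLS. Default is 100.

  • lam (float) – Lambda parameter for ARPLS. Default is 10000.0.

  • ratio (float) – Ratio parameter for ARPLS. Default is 0.05.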

Returns:

DataFrame with baseline corrected spectra.

Return type:

pd.DataFrame
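
To show how a regularization parameter and an asymmetry parameter interact, here is a compact sketch of asymmetric least squares in the style of Eilers and Boelens. The function name als_baseline and its arguments (lam, p, n_iter) are hypothetical and are not meant to map one-to-one onto the Protomix parameters listed above; a dense solve keeps the example short but limits it to small spectra.

```python
import numpy as np

def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Estimate a smooth baseline lying under the peaks of `y`."""
    n = y.size
    d2 = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
    penalty = lam * d2.T @ d2              # smoothness penalty
    weights = np.ones(n)
    baseline = y.copy()
    for _ in range(n_iter):
        system = np.diag(weights) + penalty
        baseline = np.linalg.solve(system, weights * y)
        # Points above the baseline (peaks) get the small weight p,
        # points at or below it get the large weight 1 - p
        weights = np.where(y > baseline, p, 1.0 - p)
    return baseline

x = np.linspace(0.0, 10.0, 200)
true_baseline = 0.5 + 0.02 * x
peak = np.exp(-((x - 5.0) / 0.2) ** 2)
spectrum = true_baseline + peak
corrected = spectrum - als_baseline(spectrum)
```

Raising lam stiffens the baseline, while lowering p makes the fit ignore peaks more strongly.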

Protomix.Icoshift class

class Icoshift

Bases: object

A class to perform Icoshift alignment on spectral data.

The Icoshift algorithm aligns signals (e.g., NMR spectra) to a reference signal by shifting them to maximize the correlation with the reference. The alignment can be performed on the entire signal or in defined intervals. The class supports various modes for determining the reference signal, alignment modes, and shift correction strategies.

Properties:
  • name (str): The name of the Icoshift instance.

  • global_pre_align (bool): Flag to enable or disable global pre-alignment.

  • unit_vector (np.ndarray): Vector mapping units to sample points.

  • signal_names (List[str]): List of names corresponding to the signals.

  • input_type (str): Type of input (datapoints or units).

  • result (np.ndarray): The aligned signals after running Icoshift.

  • signals (np.ndarray): The input signals to be aligned.

  • inter (List[Tuple[Union[int, float], Union[int, float]]]): User-defined intervals for alignment.

  • target (Tuple[str, Union[np.ndarray, list]]): The target signal used for alignment.

  • avg2factor (int): Factor used in the ‘average2’ target mode.

  • max_shift (Tuple[str, int]): Maximum shift correction mode and value.

  • fill_mode (str): Mode used to fill missing data after shifting (‘nan’, ‘zero’, ‘adjacent’).

  • loglvl (str): Logging level for debugging and information.

Methods:

run(): Executes the Icoshift alignment process.

The run method performs the main alignment process using the following steps:
  1. Optionally performs a global pre-alignment of the entire dataset.

  2. Splits the signals into intervals based on the selected alignment mode.

  3. Calculates the target signal based on the chosen target mode.

  4. Aligns the signals within each interval to the target signal.

  5. Reconstructs the aligned signals from the intervals.

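Example:
# Assumes Protomix has been imported under the alias px and that nvz_df
# is a DataFrame of spectra (one row per sample).
icoshift = px.Icoshift()
icoshift.signals = nvz_df.values
icoshift.signal_names = list(nvz_df.index.values)
icoshift.inter = ('n_intervals', 100)
icoshift.target = 'median'
icoshift.global_pre_align = True
icoshift.max_shift = 'best'

icoshift.run()
# The aligned signals are then available as icoshift.result.

run()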

Main function to run the actual icoshift process.

The steps involved are:

  1. Optionally co-shift the whole dataset (global pre-alignment).

  2. Split the data into intervals.

  3. Co-shift each interval.

  4. Reconstruct the data from intervals.
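
The co-shift applied to each interval can be sketched as a search for the lag that maximizes the correlation between a signal and the target. The code below is a simplified stand-in, not the Icoshift implementation: the names shift_with_zeros and coshift are hypothetical, a brute-force search replaces any FFT-based correlation, and gaps are filled with zeros (comparable to the ‘zero’ fill mode listed above).

```python
import numpy as np

def shift_with_zeros(signal, lag):
    """Shift right (lag > 0) or left (lag < 0), filling the gap with zeros."""
    out = np.zeros_like(signal)
    if lag > 0:
        out[lag:] = signal[:-lag]
    elif lag < 0:
        out[:lag] = signal[-lag:]
    else:
        out[:] = signal
    return out

def coshift(signal, target, max_shift):
    """Return the shifted signal and the lag with the highest correlation."""
    lags = range(-max_shift, max_shift + 1)
    scores = [np.dot(target, shift_with_zeros(signal, lag)) for lag in lags]
    best_lag = lags[int(np.argmax(scores))]
    return shift_with_zeros(signal, best_lag), best_lag

points = np.arange(100)
target = np.exp(-((points - 50) / 3.0) ** 2)   # reference peak at point 50
signal = np.exp(-((points - 57) / 3.0) ** 2)   # same peak, 7 points to the right
aligned, best_lag = coshift(signal, target, max_shift=10)
```

Running the same search separately on each interval, then concatenating the results, corresponds to steps 2 to 4 above.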

Protomix.negative_values_zeroing

negative_values_zeroing(spectra_df: DataFrame) → DataFrame

Set all negative values in the spectra to zero.

This function processes a DataFrame of spectra, setting any negative values to zero while retaining the original structure and indexes of the DataFrame.

Parameters:

spectra_df (pd.DataFrame) – A DataFrame where each row represents a spectrum, and columns correspond to spectral data points.

Returns:

A DataFrame with the same structure as the input, but with all negative values replaced by zero.

Return type:

pd.DataFrame
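
For real-valued spectra, the same effect can be obtained directly with pandas, as in this small sketch. The sample data are made up, and using DataFrame.clip is an illustration rather than a statement about how Protomix implements the function.

```python
import pandas as pd

spectra_df = pd.DataFrame(
    [[0.5, -0.1, 2.0], [-3.0, 0.0, 1.5]],
    index=["s1", "s2"],
    columns=[0.1, 0.2, 0.3],   # ppm values as column labels
)
# Values below zero are replaced by zero; index and columns are preserved
zeroed = spectra_df.clip(lower=0)
```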

Protomix.window_selection

window_selection(spectra_df: DataFrame, range=(0.2, 10)) → DataFrame

Extract a specified ppm range from the spectra_df DataFrame.

This function extracts the spectral data within a specified ppm range from each spectrum in the DataFrame. The resulting DataFrame retains the original indexes but only includes the selected ppm range.

Parameters:
  • spectra_df (pd.DataFrame) – A DataFrame where each row represents a spectrum and columns correspond to ppm values.

  • range (tuple) – A tuple specifying the start and end ppm values of the region to extract. Default is (0.2, 10).

Returns:

A DataFrame containing only the spectral data within the specified ppm range, retaining the original indexes.

Return type:

pd.DataFrame
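
A selection of this kind can be sketched as a boolean mask over the ppm column labels. The helper name select_window is hypothetical, and treating both ends of the range as inclusive is an assumption made for this example.

```python
import numpy as np
import pandas as pd

def select_window(spectra_df, ppm_range=(0.2, 10)):
    """Keep only the columns whose ppm label lies inside `ppm_range`."""
    low, high = min(ppm_range), max(ppm_range)
    ppm = spectra_df.columns.to_numpy(dtype=float)
    keep = (ppm >= low) & (ppm <= high)
    return spectra_df.loc[:, keep]

spectra_df = pd.DataFrame(
    np.arange(10.0).reshape(2, 5),
    index=["s1", "s2"],
    columns=[0.0, 0.5, 5.0, 9.5, 11.0],
)
window = select_window(spectra_df)
```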

Protomix.region_removal

region_removal(spectra_df: DataFrame, range=(4.5, 6.1)) → DataFrame

Zero out a specified ppm range in each spectrum within the DataFrame.

This function sets the values within a specified ppm range to zero for each spectrum in the DataFrame, effectively removing that region from the spectra.

Parameters:
  • spectra_df (pd.DataFrame) – A DataFrame where each row represents a spectrum and columns correspond to ppm values.

  • range (tuple) – A tuple specifying the start and end ppm values of the region to be zeroed out. Default is (4.5, 6.1).

Returns:

An updated DataFrame with the specified ppm range zeroed out, retaining the original indexes.

Return type:

pd.DataFrame
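
Zeroing a region differs from window selection in that the columns are kept and only their values change. A minimal sketch, with the hypothetical helper name zero_region and inclusive range ends assumed:

```python
import pandas as pd

def zero_region(spectra_df, ppm_range=(4.5, 6.1)):
    """Return a copy with all values inside `ppm_range` set to zero."""
    low, high = min(ppm_range), max(ppm_range)
    ppm = spectra_df.columns.to_numpy(dtype=float)
    inside = (ppm >= low) & (ppm <= high)
    result = spectra_df.copy()   # leave the input untouched
    result.loc[:, inside] = 0.0
    return result

spectra_df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    index=["s1"],
    columns=[4.0, 4.7, 5.5, 6.5],
)
cleaned = zero_region(spectra_df)
```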

Protomix.binning

binning(spectra_df: DataFrame, bin_size: float, method='trapezoidal') → DataFrame

Perform uniform binning on NMR spectra by integrating the area under the curve for each bin.

This function applies uniform binning to the provided NMR spectra, dividing the spectrum into bins of a specified size and integrating the area under the curve within each bin. The integration can be performed using either the trapezoidal or rectangular method.

Parameters:
  • spectra_df (pd.DataFrame) – A DataFrame containing the spectra, where each row represents a sample, and columns represent ppm values.

  • bin_size (float) – The size of each bin in ppm.

  • method (str) – The integration method to use for binning. Options are “trapezoidal” (default) or “rectangular”.

Returns:

A DataFrame containing the binned spectra with integrated intensities for each bin.

Return type:

pd.DataFrame
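
The trapezoidal variant can be sketched for a single spectrum on an evenly spaced ppm axis. The helper name bin_spectrum, the use of bin centers as labels, and the handling of bin edges (neighbouring bins share their boundary point) are assumptions made for illustration.

```python
import numpy as np

def bin_spectrum(ppm, intensities, bin_size):
    """Integrate `intensities` over consecutive bins of width `bin_size` ppm."""
    step = abs(ppm[1] - ppm[0])
    points_per_bin = int(round(bin_size / step))
    n_bins = (ppm.size - 1) // points_per_bin
    centers, areas = [], []
    for i in range(n_bins):
        start = i * points_per_bin
        stop = (i + 1) * points_per_bin + 1   # include the right edge point
        x, y = ppm[start:stop], intensities[start:stop]
        # Trapezoidal rule: mean of neighbouring heights times their spacing
        area = np.sum((y[1:] + y[:-1]) / 2.0 * np.abs(np.diff(x)))
        centers.append(x.mean())
        areas.append(area)
    return np.array(centers), np.array(areas)

ppm = np.linspace(0.0, 10.0, 1001)       # 0.01 ppm per point
intensities = np.ones_like(ppm)          # flat spectrum of height 1
centers, areas = bin_spectrum(ppm, intensities, bin_size=1.0)
```

A rectangular variant would replace the trapezoid expression with the sum of the heights multiplied by the point spacing.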

Protomix.normalize

normalize(spectra_df: DataFrame, method: str = 'PQN') → DataFrame

Apply the selected normalization method to a DataFrame of spectra.

This function normalizes the spectra in the provided DataFrame using the specified method. Each row in the DataFrame represents a sample, and each column corresponds to a data point in the spectrum.

Parameters:
  • spectra_df (pd.DataFrame) – The DataFrame containing the spectra, with samples as rows and data points as columns.

  • method (str) – The normalization method to apply. Options are ‘PQN’ (Probabilistic Quotient Normalization), ‘TotalArea’ (Total Area Normalization), or ‘SNV’ (Standard Normal Variate). Default is ‘PQN’.

Returns:

The normalized spectra as a DataFrame.

Return type:

pd.DataFrame
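
The three methods can be sketched on a plain NumPy array with samples as rows. These are the usual textbook formulations (PQN here uses the median spectrum as reference after a total-area step); the function names are hypothetical, and details such as the choice of reference spectrum may differ in Protomix.

```python
import numpy as np

def total_area(x):
    """Scale each row so that its values sum to 1."""
    return x / x.sum(axis=1, keepdims=True)

def snv(x):
    """Center each row on its mean and scale by its standard deviation."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def pqn(x):
    """Probabilistic quotient normalization against the median spectrum."""
    scaled = total_area(x)
    reference = np.median(scaled, axis=0)
    quotients = scaled / reference           # assumes a non-zero reference
    factors = np.median(quotients, axis=1, keepdims=True)
    return scaled / factors

base = np.array([1.0, 4.0, 2.0, 8.0, 3.0])
spectra = np.vstack([base, 2.0 * base, 5.0 * base])   # same sample at three dilutions
normalized = pqn(spectra)
```

Because the three rows differ only by a dilution factor, they become identical after normalization.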