imspy.simulation.timsim.jobs package

Submodules

imspy.simulation.timsim.jobs.add_noise_from_real_data module

imspy.simulation.timsim.jobs.add_noise_from_real_data.add_real_data_noise_to_frames(acquisition_builder, frames, intensity_max_precursor=30, intensity_max_fragment=30, precursor_sample_fraction=0.2, fragment_sample_fraction=0.2, num_precursor_frames=5, num_fragment_frames=5, acquisition_mode='DIA')

Add noise sampled from real data to the given frames.

Parameters:
  • acquisition_builder (TimsTofAcquisitionBuilderDIA) – Acquisition builder.

  • frames (List[TimsFrame]) – Frames.

  • intensity_max_precursor (float) – Maximum intensity for precursor.

  • intensity_max_fragment (float) – Maximum intensity for fragment.

  • precursor_sample_fraction (float) – Sample fraction for precursor.

  • fragment_sample_fraction (float) – Sample fraction for fragment.

  • num_precursor_frames (int) – Number of precursor frames.

  • num_fragment_frames (int) – Number of fragment frames.

  • acquisition_mode (str) – Acquisition mode.

Returns:

Frames.

Return type:

List[TimsFrame]

imspy.simulation.timsim.jobs.assemble_frames module

imspy.simulation.timsim.jobs.assemble_frames.assemble_frames(acquisition_builder, frames, batch_size, verbose=False, mz_noise_precursor=False, mz_noise_uniform=False, precursor_noise_ppm=5.0, mz_noise_fragment=False, fragment_noise_ppm=5.0, num_threads=4, add_real_data_noise=False, intensity_max_precursor=150, intensity_max_fragment=75, precursor_sample_fraction=0.01, fragment_sample_fraction=0.05, num_precursor_frames=10, num_fragment_frames=10, fragment=True)

Assemble frames from frame ids and write them to the database.

Parameters:
  • acquisition_builder (TimsTofAcquisitionBuilder) – Acquisition builder object.

  • frames (DataFrame) – DataFrame containing frame ids.

  • batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.

  • verbose (bool) – Verbosity.

  • mz_noise_precursor (bool) – Add noise to precursor m/z values.

  • mz_noise_uniform (bool) – Add uniform noise to m/z values.

  • precursor_noise_ppm (float) – PPM value for precursor noise.

  • mz_noise_fragment (bool) – Add noise to fragment m/z values.

  • fragment_noise_ppm (float) – PPM value for fragment noise.

  • num_threads (int) – Number of threads for frame assembly.

  • add_real_data_noise (bool) – Add real data noise to the frames.

  • intensity_max_precursor (float) – Maximum intensity for precursor noise.

  • intensity_max_fragment (float) – Maximum intensity for fragment noise.

  • precursor_sample_fraction (float) – Sample fraction for precursor noise.

  • fragment_sample_fraction (float) – Sample fraction for fragment noise.

  • num_precursor_frames (int) – Number of precursor frames.

  • num_fragment_frames (int) – Number of fragment frames.

  • fragment (bool) – If False, quadrupole isolation is still applied, but no fragmentation is performed.

Return type:

None

Returns:

None, writes frames to disk and metadata to database.
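
The ppm-based m/z noise options can be illustrated with a minimal sketch. It is not the imspy implementation: the helper name add_ppm_noise and the choice of a Gaussian with standard deviation equal to one third of the ppm window are assumptions made for illustration. Only the conversion from ppm to an absolute m/z deviation (mz * ppm / 1e6) is standard.

```python
import random


def add_ppm_noise(mz_values, ppm=5.0, uniform=False, rng=None):
    """Perturb m/z values by noise scaled to a ppm window (hypothetical helper)."""
    rng = rng or random.Random()
    noisy = []
    for mz in mz_values:
        # Absolute half-width of the ppm window at this m/z.
        width = mz * ppm / 1e6
        if uniform:
            # Uniform noise inside the window [-width, +width].
            delta = rng.uniform(-width, width)
        else:
            # Assumption: Gaussian noise with sigma = width / 3.
            delta = rng.gauss(0.0, width / 3.0)
        noisy.append(mz + delta)
    return noisy


if __name__ == "__main__":
    print(add_ppm_noise([500.0, 1000.0], ppm=5.0, uniform=True, rng=random.Random(0)))
```

At 5 ppm, a precursor at m/z 500 is shifted by at most 0.0025 in the uniform case, which is why the noise is scaled per value rather than applied as a fixed offset.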

imspy.simulation.timsim.jobs.build_acquisition module

imspy.simulation.timsim.jobs.build_acquisition.build_acquisition(path, reference_path, exp_name, acquisition_type='dia', verbose=False, gradient_length=None, use_reference_ds_layout=True, reference_in_memory=True, round_collision_energy=True, collision_energy_decimals=0, use_bruker_sdk=True)

Build acquisition object from reference path.

Parameters:
  • path (str) – Path where the acquisition will be saved.

  • reference_path (str) – Path to the reference dataset.

  • exp_name (str) – Experiment name.

  • acquisition_type (str) – Acquisition type, must be ‘dia’, ‘midia’, ‘slice’ or ‘synchro’.

  • verbose (bool) – Verbosity.

  • gradient_length (float) – Gradient length.

  • use_reference_ds_layout (bool) – Use reference dataset layout for synthetic dataset.

  • reference_in_memory (bool) – Load reference dataset into memory.

  • round_collision_energy (bool) – Round collision energy.

  • collision_energy_decimals (int) – Number of decimals for collision energy (controls coarseness).

  • use_bruker_sdk (bool) – Use Bruker SDK for reading reference dataset.

Returns:

Acquisition object.

Return type:

TimsTofAcquisitionBuilderDIA

imspy.simulation.timsim.jobs.digest_fasta module

imspy.simulation.timsim.jobs.digest_fasta.digest_fasta(fasta_file_path, missed_cleavages=2, min_len=6, max_len=30, cleave_at='KR', restrict=None, decoys=False, verbose=False, job_name='digest_fasta', static_mods={'C': '[UNIMOD:4]'}, variable_mods={'M': ['[UNIMOD:35]'], '[': ['[UNIMOD:1]']}, exclude_accumulated_gradient_start=True, min_rt_percent=2.0, gradient_length=3600)

Digest a fasta file.

Parameters:
  • fasta_file_path (str) – Path to the fasta file.

  • missed_cleavages (int) – Number of missed cleavages.

  • min_len (int) – Minimum peptide length.

  • max_len (int) – Maximum peptide length.

  • cleave_at (str) – Cleavage sites.

  • restrict (str) – Restrict to specific proteins.

  • decoys (bool) – Generate decoys.

  • verbose (bool) – Verbosity.

  • job_name (str) – Job name.

  • static_mods (dict[str, str]) – Static modifications.

  • variable_mods (dict[str, list[str]]) – Variable modifications.

  • exclude_accumulated_gradient_start (bool) – Exclude low retention times.

  • min_rt_percent (float) – Minimum retention time in percent.

  • gradient_length (float) – Gradient length in seconds.

Returns:

Peptide digest object.

Return type:

PeptideDigest
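
How cleave_at, missed_cleavages, min_len and max_len interact can be sketched in plain Python. This is an illustration under simplifying assumptions, not the digestion code used by imspy: the function name digest_sequence is hypothetical, cleavage happens after every residue in cleave_at with no restriction rule (matching restrict=None), and modifications and decoys are ignored.

```python
def digest_sequence(sequence, cleave_at="KR", missed_cleavages=2, min_len=6, max_len=30):
    """Enumerate peptides from one protein sequence (hypothetical helper)."""
    # Cut positions: start, the index after each cleavage residue, and the end.
    # The last residue is skipped so the end position is not added twice.
    cuts = [0]
    cuts += [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in cleave_at]
    cuts.append(len(sequence))

    peptides = []
    for i in range(len(cuts) - 1):
        # Spanning j - i - 1 internal cut sites means that many missed cleavages.
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptide = sequence[cuts[i]:cuts[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    print(digest_sequence("AAAKGGGRLLL", missed_cleavages=1, min_len=1))
```

Raising missed_cleavages increases the number of longer peptides, while min_len and max_len filter the result after the fact.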

imspy.simulation.timsim.jobs.simulate_charge_states module

imspy.simulation.timsim.jobs.simulate_charge_states.simulate_charge_states(peptides, mz_lower, mz_upper, p_charge=0.8, max_charge=4, charge_state_one_probability=0.0, min_charge_contrib=0.15, use_binomial=False, normalize=True)

Simulate charge states for peptides.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • mz_lower (float) – Lower m/z value.

  • mz_upper (float) – Upper m/z value.

  • p_charge (float) – Probability of charge.

  • max_charge (int) – Maximum charge that will be simulated. Defaults to 4 because IMS simulations are limited to charge 4.

  • charge_state_one_probability (float) – Probability of charge state one.

  • min_charge_contrib (float) – Minimum charge contribution.

  • use_binomial (bool) – Use binomial distribution model, otherwise use deep learning model.

  • normalize (bool) – Normalize the charge state distribution.

Returns:

Ions DataFrame.

Return type:

pd.DataFrame
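
One possible reading of the binomial option is sketched below. The exact model used by imspy is not stated here, so the distribution (a binomial over the number of protons, evaluated for charges 1 to max_charge), the function names, and the way min_charge_contrib and normalize are applied are all assumptions. The m/z formula for a protonated ion is standard.

```python
from math import comb

PROTON_MASS = 1.007276466  # mass of a proton in Da


def binomial_charge_probs(max_charge=4, p_charge=0.8, min_charge_contrib=0.15, normalize=True):
    """Charge-state probabilities from a binomial model (assumed, illustrative)."""
    probs = {
        z: comb(max_charge, z) * p_charge**z * (1.0 - p_charge) ** (max_charge - z)
        for z in range(1, max_charge + 1)
    }
    # Drop charge states that contribute less than the minimum.
    kept = {z: p for z, p in probs.items() if p >= min_charge_contrib}
    total = sum(kept.values())
    if normalize and total > 0:
        kept = {z: p / total for z, p in kept.items()}
    return kept


def ion_mz(monoisotopic_mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge


def charge_states_in_window(monoisotopic_mass, mz_lower, mz_upper, **kwargs):
    """Keep only charge states whose m/z falls inside [mz_lower, mz_upper]."""
    probs = binomial_charge_probs(**kwargs)
    return {z: p for z, p in probs.items() if mz_lower <= ion_mz(monoisotopic_mass, z) <= mz_upper}


if __name__ == "__main__":
    print(charge_states_in_window(1500.0, mz_lower=400.0, mz_upper=1000.0))
```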

imspy.simulation.timsim.jobs.simulate_fragment_intensities module

imspy.simulation.timsim.jobs.simulate_fragment_intensities.simulate_fragment_intensities(path, name, acquisition_builder, batch_size, verbose, num_threads, down_sample_factor=0.5, dda=False)

Simulate fragment ion intensity distributions.

Parameters:
  • path (str) – Path to the synthetic data.

  • name (str) – Name of the synthetic data.

  • acquisition_builder (TimsTofAcquisitionBuilder) – Acquisition builder object.

  • batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.

  • verbose (bool) – Verbosity.

  • num_threads (int) – Number of threads for frame assembly.

  • down_sample_factor (float) – Down sample factor for fragment ion intensity distributions.

  • dda (bool) – Data dependent acquisition mode.

Return type:

None

Returns:

None, writes frames to disk and metadata to database.

imspy.simulation.timsim.jobs.simulate_frame_distributions module

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg module

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.calculate_rt_defaults(gradient_length)

Calculate default values for ‘sigma_lower_rt’ and ‘sigma_upper_rt’ from the gradient length when the user does not provide them.

Parameters:

gradient_length (float) – Length of the LC gradient in seconds.

Returns:

Parameter dictionary with calculated values.

Return type:

dict

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.erfcxinv(y, n=10)

Calculates the inverse of the scaled complementary error function (erfcx) via the Newton-Raphson method.

Parameters:
  • y (ArrayLike) – The value(s) for which the inverse is to be calculated.

  • n (int, optional) – Number of iterations for the Newton-Raphson method. Default is 10.

Returns:

The inverse of the scaled complementary error function at y.

Return type:

ArrayLike
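
The Newton-Raphson iteration can be sketched with the standard library alone. This is a scalar illustration, not the vectorized imspy code: the function names and the initial guess are assumptions, and the naive erfcx used here overflows for large arguments, so the sketch is only meant for moderate y (roughly y ≥ 0.05). It relies on the identity erfcx'(x) = 2x·erfcx(x) − 2/√π.

```python
import math


def erfcx(x):
    """Scaled complementary error function exp(x^2) * erfc(x), naive form."""
    return math.exp(x * x) * math.erfc(x)


def erfcxinv_newton(y, n=10):
    """Solve erfcx(x) = y for x with n Newton-Raphson steps (scalar sketch)."""
    sqrt_pi = math.sqrt(math.pi)
    # Assumed initial guess: asymptotic form for y < 1, log-based guess otherwise.
    if y < 1.0:
        x = 1.0 / (y * sqrt_pi)
    else:
        x = -math.sqrt(math.log(y))
    for _ in range(n):
        fx = erfcx(x)
        # Derivative of erfcx: 2 * x * erfcx(x) - 2 / sqrt(pi).
        dfx = 2.0 * x * fx - 2.0 / sqrt_pi
        x -= (fx - y) / dfx
    return x


if __name__ == "__main__":
    for y in (0.2, 0.5, 1.0, 2.0, 5.0):
        x = erfcxinv_newton(y)
        print(y, x, erfcx(x))
```

Because erfcx is convex and strictly decreasing, each Newton step after the first approaches the root from the left, which is why a small fixed number of iterations is usually enough.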

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.estimate_mu_from_mode_emg(mode, sigma, lambda_)

Estimate the parameter \(\mu\) of an EMG distribution from the mode (vectorized). The function uses the following formula (adapted from en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution):

\[\mu = x_m + \sqrt{2}\sigma\text{erfcx}^{-1}\left(\frac{1}{\lambda\sigma}\sqrt{\frac{2}{\pi}}\right)-\sigma^2\lambda\]
Parameters:
  • mode (ArrayLike) – The modes of the EMG distributions.

  • sigma (ArrayLike) – EMG parameters \(\sigma\).

  • lambda_ (ArrayLike) – EMG parameters \(\lambda\).

Returns:

The estimated parameters \(\mu\).

Return type:

ArrayLike
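
The rearrangement behind this formula can be written out step by step. The sketch below assumes the mode expression from the referenced Wikipedia article, stated there in terms of the relaxation time \(\tau\), with \(\tau = 1/\lambda > 0\):

\[\begin{aligned} x_m &= \mu - \sqrt{2}\sigma\,\text{erfcx}^{-1}\left(\frac{\tau}{\sigma}\sqrt{\frac{2}{\pi}}\right) + \frac{\sigma^2}{\tau} \\ &= \mu - \sqrt{2}\sigma\,\text{erfcx}^{-1}\left(\frac{1}{\lambda\sigma}\sqrt{\frac{2}{\pi}}\right) + \sigma^2\lambda \\ \Rightarrow\quad \mu &= x_m + \sqrt{2}\sigma\,\text{erfcx}^{-1}\left(\frac{1}{\lambda\sigma}\sqrt{\frac{2}{\pi}}\right) - \sigma^2\lambda \end{aligned}\]

With \(k = \frac{1}{\sigma\lambda}\), the argument of \(\text{erfcx}^{-1}\) becomes \(k\sqrt{2/\pi}\) and the last term becomes \(\sigma/k\).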

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.sample_sigma_k_emg(sigma_lower, sigma_upper, sigma_alpha, sigma_beta, k_lower, k_upper, k_alpha, k_beta, n)

Sample \(\sigma\) and \(k\) from scaled beta distributions:

\[\begin{split}\begin{aligned} \sigma &= \sigma_{\text{lower}} + \hat{\sigma} \cdot (\sigma_{\text{upper}} - \sigma_{\text{lower}}) \\ \hat{\sigma} &\sim \text{Beta}(\alpha_{\sigma}, \beta_{\sigma}) \\ \end{aligned}\end{split}\]

This function samples \(\sigma\) and \(k\) parameters for the exponentially modified Gaussian (EMG) distribution with:

\[k=\frac{1}{\sigma\lambda}\]
Parameters:
  • sigma_lower (ArrayLike) – The lower bound for \(\sigma\).

  • sigma_upper (ArrayLike) – The upper bound for \(\sigma\).

  • sigma_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\sigma}\).

  • sigma_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\sigma}\).

  • k_lower (ArrayLike) – The lower bound for \(k\).

  • k_upper (ArrayLike) – The upper bound for \(k\).

  • k_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{k}\).

  • k_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{k}\).

  • n (int) – Number of samples.

Returns:

The sampled \(\sigma\) and \(k\).

Return type:

Tuple[ArrayLike, ArrayLike]
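
The scaled beta sampling can be sketched as follows. This is a scalar-bounds illustration using only the standard library, not the imspy code (which accepts ArrayLike bounds): the function names and the seed argument are assumptions, and the returned \(\lambda\) values are added only to show the relation \(k = 1/(\sigma\lambda)\).

```python
import random


def sample_scaled_beta(lower, upper, alpha, beta, n, rng):
    """Draw n values: lower + Beta(alpha, beta) * (upper - lower)."""
    return [lower + rng.betavariate(alpha, beta) * (upper - lower) for _ in range(n)]


def sample_sigma_k(sigma_lower, sigma_upper, sigma_alpha, sigma_beta,
                   k_lower, k_upper, k_alpha, k_beta, n, seed=None):
    """Sample sigma and k independently, then derive lambda = 1 / (sigma * k)."""
    rng = random.Random(seed)
    sigmas = sample_scaled_beta(sigma_lower, sigma_upper, sigma_alpha, sigma_beta, n, rng)
    ks = sample_scaled_beta(k_lower, k_upper, k_alpha, k_beta, n, rng)
    # Rearranged from k = 1 / (sigma * lambda).
    lambdas = [1.0 / (s * k) for s, k in zip(sigmas, ks)]
    return sigmas, ks, lambdas


if __name__ == "__main__":
    sigmas, ks, lambdas = sample_sigma_k(1.0, 3.0, 4.0, 4.0, 0.5, 2.0, 1.0, 20.0, n=5, seed=1)
    print(sigmas, ks, lambdas)
```

Scaling a Beta draw keeps every sample inside the requested bounds, while alpha and beta control where in that interval the samples concentrate.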

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.sample_sigma_lambda_emg(sigma_lower, sigma_upper, sigma_alpha, sigma_beta, lambda_lower, lambda_upper, lambda_alpha, lambda_beta, n)

Sample \(\sigma\) and \(\lambda\) from scaled beta distributions:

\[\begin{split}\begin{aligned} \sigma &= \sigma_{\text{lower}} + \hat{\sigma} \cdot (\sigma_{\text{upper}} - \sigma_{\text{lower}}) \\ \hat{\sigma} &\sim \text{Beta}(\alpha_{\sigma}, \beta_{\sigma}) \end{aligned}\end{split}\]

This function is currently not used in the codebase. It is kept in case we want to use the EMG parametrization with \(\sigma\) and \(\lambda\) instead of \(\sigma\) and \(k\).

Parameters:
  • sigma_lower (ArrayLike) – The lower bound for \(\sigma\).

  • sigma_upper (ArrayLike) – The upper bound for \(\sigma\).

  • sigma_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\sigma}\).

  • sigma_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\sigma}\).

  • lambda_lower (ArrayLike) – The lower bound for \(\lambda\).

  • lambda_upper (ArrayLike) – The upper bound for \(\lambda\).

  • lambda_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\lambda}\).

  • lambda_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\lambda}\).

  • n (int) – Number of samples.

Returns:

The sampled \(\sigma\) and \(\lambda\).

Return type:

Tuple[ArrayLike, ArrayLike]

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.simulate_frame_distributions_emg(peptides, frames, sigma_lower_rt, sigma_upper_rt, sigma_alpha_rt, sigma_beta_rt, k_lower_rt, k_upper_rt, k_alpha_rt, k_beta_rt, target_p, step_size, rt_cycle_length, verbose=False, add_noise=False, n_steps=1000, num_threads=4, from_existing=False, sigmas=None, lambdas=None, gradient_length=None)

Simulate frame distributions for peptides.

Parameters:
  • peptides (DataFrame) – Peptide DataFrame.

  • frames (DataFrame) – Frame DataFrame.

  • sigma_lower_rt (Optional[float]) – Lower bound for sigma of an EMG chromatographic peak.

  • sigma_upper_rt (Optional[float]) – Upper bound for sigma of an EMG chromatographic peak.

  • sigma_alpha_rt (float) – Alpha for beta distribution for sigma_hat that is then scaled to sigma in (sigma_lower_rt, sigma_upper_rt).

  • sigma_beta_rt (float) – Beta for beta distribution for sigma_hat that is then scaled to sigma in (sigma_lower_rt, sigma_upper_rt).

  • k_lower_rt (float) – Lower bound for k of an EMG chromatographic peak.

  • k_upper_rt (float) – Upper bound for k of an EMG chromatographic peak.

  • k_alpha_rt (float) – Alpha for beta distribution for k_hat that is then scaled to k in (k_lower_rt, k_upper_rt).

  • k_beta_rt (float) – Beta for beta distribution for k_hat that is then scaled to k in (k_lower_rt, k_upper_rt).

  • target_p (float) – Target p.

  • step_size (float) – Step size.

  • rt_cycle_length (float) – Retention time cycle length in seconds.

  • verbose (bool) – Verbosity.

  • add_noise (bool) – Add noise.

  • normalize – Normalize frame abundance.

  • n_steps (int) – Number of steps.

  • num_threads (int) – Number of threads.

  • from_existing (bool) – Use existing parameters.

  • sigmas (ndarray) – sigmas.

  • lambdas (ndarray) – lambdas.

  • gradient_length (float) – Length of the LC gradient in seconds.

Returns:

Peptide DataFrame with frame distributions.

Return type:

pd.DataFrame
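
To make the role of \(\sigma\), \(k\) and rt_cycle_length more concrete, the sketch below turns one EMG peak into per-frame abundances by integrating the EMG density over each frame interval. It is an illustration only and does not reproduce how imspy computes frame distributions: the function names, the midpoint integration and the n_sub parameter are assumptions, and target_p, step_size and n_steps are not modelled.

```python
import math


def emg_pdf(x, mu, sigma, lam):
    """Density of the exponentially modified Gaussian at x."""
    arg = (mu + lam * sigma**2 - x) / (math.sqrt(2.0) * sigma)
    scale = 0.5 * lam * math.exp(0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x))
    return scale * math.erfc(arg)


def frame_abundances(mu, sigma, k, frame_starts, rt_cycle_length, n_sub=20):
    """Fraction of the peak falling into each frame (midpoint rule, hypothetical helper)."""
    lam = 1.0 / (sigma * k)  # from k = 1 / (sigma * lambda)
    h = rt_cycle_length / n_sub
    abundances = []
    for t0 in frame_starts:
        # Integrate the density over [t0, t0 + rt_cycle_length].
        area = sum(emg_pdf(t0 + (i + 0.5) * h, mu, sigma, lam) for i in range(n_sub)) * h
        abundances.append(area)
    return abundances


if __name__ == "__main__":
    starts = [float(t) for t in range(80, 140)]
    fractions = frame_abundances(mu=100.0, sigma=2.0, k=1.0, frame_starts=starts, rt_cycle_length=1.0)
    print(sum(fractions), starts[fractions.index(max(fractions))])
```

The exponential tail pushes the apex to the right of \(\mu\), so the frame with the largest abundance generally starts at or after \(\mu\) rather than before it.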

imspy.simulation.timsim.jobs.simulate_ion_mobilities module

imspy.simulation.timsim.jobs.simulate_ion_mobilities.simulate_ion_mobilities(ions, im_lower, im_upper, verbose=False)

Simulate ion mobilities.

Parameters:
  • ions (DataFrame) – Ions DataFrame.

  • im_lower (float) – Lower ion mobility.

  • im_upper (float) – Upper ion mobility.

  • verbose (bool) – Verbosity.

Returns:

Ions DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_occurrences module

imspy.simulation.timsim.jobs.simulate_occurrences.simulate_peptide_occurrences(peptides, intensity_mean, intensity_min, intensity_max, verbose=False, sample_occurrences=True, intensity_value=1000000.0, mixture_contribution=1.0)

Simulate peptide occurrences.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • intensity_mean (float) – Intensity mean.

  • intensity_min (float) – Intensity minimum.

  • intensity_max (float) – Intensity maximum.

  • verbose (bool) – Verbosity.

  • sample_occurrences (bool) – Sample occurrences.

  • intensity_value (float) – Intensity value.

  • mixture_contribution (float) – Mixture contribution.

Returns:

Peptides DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_precursor_spectra module

imspy.simulation.timsim.jobs.simulate_precursor_spectra.simulate_precursor_spectra_averagine(ions, isotope_min_intensity, isotope_k, num_threads, verbose=False)
Return type:

DataFrame

imspy.simulation.timsim.jobs.simulate_precursor_spectra.simulate_precursor_spectra_sequence(ions, num_threads=16, verbose=False)

Simulate sequence specific precursor isotopic distributions.

Parameters:
  • ions (DataFrame) – DataFrame containing ions.

  • num_threads (int) – Number of threads.

  • verbose (bool) – Verbosity.

Returns:

DataFrame containing ions with simulated spectra.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_retention_time module

imspy.simulation.timsim.jobs.simulate_retention_time.simulate_retention_times(peptides, verbose=False, gradient_length=3600)

Simulate retention times.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • verbose (bool) – Verbosity.

  • gradient_length (float) – Gradient length in seconds.

Returns:

Peptides DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_scan_distributions module

imspy.simulation.timsim.jobs.utility module

imspy.simulation.timsim.jobs.utility.check_path(p)
Return type:

str

imspy.simulation.timsim.jobs.utility.phosphorylation_sizes(sequence)

Checks if a sequence contains potential phosphorylation sites (S, T, or Y), and returns the count of sites and their indices.

Parameters:

sequence (str) – The input sequence string, e.g., “IC[UNIMOD:4]RQHTK”.

Returns:

A tuple containing:
  • int: The number of phosphorylation sites.

  • list: A list of indices where the sites are found.

Return type:

tuple
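
A minimal sketch of this kind of check is shown below. It is not the imspy implementation: the function name find_sty_sites is hypothetical, and it assumes that the reported indices are positions in the raw annotated string, including any bracketed UNIMOD tags.

```python
import re


def find_sty_sites(sequence):
    """Count S, T and Y residues and return their positions in the raw string."""
    # Assumption: indices refer to the annotated string as given, so
    # characters inside "[UNIMOD:...]" tags shift later positions.
    indices = [match.start() for match in re.finditer(r"[STY]", sequence)]
    return len(indices), indices


if __name__ == "__main__":
    print(find_sty_sites("IC[UNIMOD:4]RQHTK"))
```

For the example sequence from the description, the only candidate site is the threonine near the C-terminus.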

Module contents