imspy.simulation.timsim.jobs package
Submodules
imspy.simulation.timsim.jobs.add_noise_from_real_data module
- imspy.simulation.timsim.jobs.add_noise_from_real_data.add_real_data_noise_to_frames(acquisition_builder, frames, intensity_max_precursor=30, intensity_max_fragment=30, precursor_sample_fraction=0.2, fragment_sample_fraction=0.2, num_precursor_frames=5, num_fragment_frames=5, acquisition_mode='DIA')
Add noise taken from real data to the given frames.
- Parameters:
acquisition_builder (TimsTofAcquisitionBuilderDIA) – Acquisition builder.
frames (List[TimsFrame]) – Frames.
intensity_max_precursor (float) – Maximum intensity for precursor.
intensity_max_fragment (float) – Maximum intensity for fragment.
precursor_sample_fraction (float) – Sample fraction for precursor.
fragment_sample_fraction (float) – Sample fraction for fragment.
num_precursor_frames (int) – Number of precursor frames.
num_fragment_frames (int) – Number of fragment frames.
acquisition_mode (str) – Acquisition mode.
- Returns:
Frames.
- Return type:
List[TimsFrame]
imspy.simulation.timsim.jobs.assemble_frames module
- imspy.simulation.timsim.jobs.assemble_frames.assemble_frames(acquisition_builder, frames, batch_size, verbose=False, mz_noise_precursor=False, mz_noise_uniform=False, precursor_noise_ppm=5.0, mz_noise_fragment=False, fragment_noise_ppm=5.0, num_threads=4, add_real_data_noise=False, intensity_max_precursor=150, intensity_max_fragment=75, precursor_sample_fraction=0.01, fragment_sample_fraction=0.05, num_precursor_frames=10, num_fragment_frames=10, fragment=True)
Assemble frames from frame ids and write them to the database.
- Parameters:
acquisition_builder (TimsTofAcquisitionBuilder) – Acquisition builder object.
frames (DataFrame) – DataFrame containing frame ids.
batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.
verbose (bool) – Verbosity.
mz_noise_precursor (bool) – Add noise to precursor m/z values.
mz_noise_uniform (bool) – Add uniform noise to m/z values.
precursor_noise_ppm (float) – PPM value for precursor noise.
mz_noise_fragment (bool) – Add noise to fragment m/z values.
fragment_noise_ppm (float) – PPM value for fragment noise.
num_threads (int) – Number of threads for frame assembly.
add_real_data_noise (bool) – Add real data noise to the frames.
intensity_max_precursor (float) – Maximum intensity for precursor noise.
intensity_max_fragment (float) – Maximum intensity for fragment noise.
precursor_sample_fraction (float) – Sample fraction for precursor noise.
fragment_sample_fraction (float) – Sample fraction for fragment noise.
num_precursor_frames (int) – Number of precursor frames.
num_fragment_frames (int) – Number of fragment frames.
fragment (bool) – If False, quadrupole isolation will still be used, but no fragmentation will be performed.
- Return type:
None
- Returns:
None, writes frames to disk and metadata to database.
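The `precursor_noise_ppm` and `fragment_noise_ppm` parameters express m/z noise in parts per million, so the absolute jitter grows with the m/z value. The following is a minimal sketch of that idea only. The helper `add_ppm_noise` is hypothetical, and both the Gaussian default and the uniform alternative are assumptions made for illustration; this is not the library's implementation.

```python
import random


def add_ppm_noise(mz_values, noise_ppm=5.0, uniform=False, seed=None):
    """Hypothetical helper: jitter m/z values by a ppm-scaled random offset."""
    rng = random.Random(seed)
    noisy = []
    for mz in mz_values:
        # 1 ppm of an m/z value equals mz * 1e-6.
        scale = mz * noise_ppm * 1e-6
        if uniform:
            # Uniform noise bounded by +/- noise_ppm.
            offset = rng.uniform(-scale, scale)
        else:
            # Assumption: Gaussian noise whose 3-sigma range spans the ppm window.
            offset = rng.gauss(0.0, scale / 3.0)
        noisy.append(mz + offset)
    return noisy


mz = [400.2, 785.8421, 1200.5]
noisy_mz = add_ppm_noise(mz, noise_ppm=5.0, uniform=True, seed=42)
```

With `uniform=True`, every shifted value stays within 5 ppm of its original m/z, which for m/z 1200.5 is about 0.006.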
imspy.simulation.timsim.jobs.build_acquisition module
- imspy.simulation.timsim.jobs.build_acquisition.build_acquisition(path, reference_path, exp_name, acquisition_type='dia', verbose=False, gradient_length=None, use_reference_ds_layout=True, reference_in_memory=True, round_collision_energy=True, collision_energy_decimals=0, use_bruker_sdk=True)
Build acquisition object from reference path.
- Parameters:
path (str) – Path where the acquisition will be saved.
reference_path (str) – Path to the reference dataset.
exp_name (str) – Experiment name.
acquisition_type (str) – Acquisition type, must be ‘dia’, ‘midia’, ‘slice’ or ‘synchro’.
verbose (bool) – Verbosity.
gradient_length (float) – Gradient length.
use_reference_ds_layout (bool) – Use reference dataset layout for synthetic dataset.
reference_in_memory (bool) – Load reference dataset into memory.
round_collision_energy (bool) – Round collision energy.
collision_energy_decimals (int) – Number of decimals for collision energy (controls coarseness).
use_bruker_sdk (bool) – Use Bruker SDK for reading reference dataset.
- Returns:
Acquisition object.
- Return type:
imspy.simulation.timsim.jobs.digest_fasta module
- imspy.simulation.timsim.jobs.digest_fasta.digest_fasta(fasta_file_path, missed_cleavages=2, min_len=6, max_len=30, cleave_at='KR', restrict=None, decoys=False, verbose=False, job_name='digest_fasta', static_mods={'C': '[UNIMOD:4]'}, variable_mods={'M': ['[UNIMOD:35]'], '[': ['[UNIMOD:1]']}, exclude_accumulated_gradient_start=True, min_rt_percent=2.0, gradient_length=3600)
Digest a fasta file.
- Parameters:
fasta_file_path (str) – Path to the fasta file.
missed_cleavages (int) – Number of missed cleavages.
min_len (int) – Minimum peptide length.
max_len (int) – Maximum peptide length.
cleave_at (str) – Cleavage sites.
restrict (str) – Restrict to specific proteins.
decoys (bool) – Generate decoys.
verbose (bool) – Verbosity.
job_name (str) – Job name.
static_mods (dict[str, str]) – Static modifications.
variable_mods (dict[str, list[str]]) – Variable modifications.
exclude_accumulated_gradient_start (bool) – Exclude low retention times.
min_rt_percent (float) – Minimum retention time in percent.
gradient_length (float) – Gradient length in seconds.
- Returns:
Peptide digest object.
- Return type:
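
How `cleave_at`, `missed_cleavages`, `min_len` and `max_len` interact can be illustrated with a small in-silico digest. This is a sketch under stated assumptions: the helper `digest` is hypothetical, it cuts after every residue listed in `cleave_at` without any restriction rule, and it ignores modifications, decoys and retention-time filtering. It is not the routine used by `digest_fasta`.

```python
def digest(sequence, cleave_at="KR", missed_cleavages=2, min_len=6, max_len=30):
    """Hypothetical helper: in-silico digest of a single protein sequence."""
    # Cut after every residue listed in cleave_at (no restriction rule applied).
    cut_points = [0]
    for i, aa in enumerate(sequence):
        if aa in cleave_at and i + 1 < len(sequence):
            cut_points.append(i + 1)
    cut_points.append(len(sequence))

    peptides = []
    for start_idx in range(len(cut_points) - 1):
        # Joining k + 1 consecutive fragments yields a peptide with k missed cleavages.
        for missed in range(missed_cleavages + 1):
            end_idx = start_idx + missed + 1
            if end_idx >= len(cut_points):
                break
            peptide = sequence[cut_points[start_idx]:cut_points[end_idx]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


peptides = digest("MKWVTFISLLLLFSSAYSRGVFRRDTHK", missed_cleavages=1, min_len=6)
```

For this toy sequence only three peptides pass the length filter; short fragments such as `GVFR` or `DTHK` are discarded even when joined with one neighbour.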
imspy.simulation.timsim.jobs.simulate_charge_states module
- imspy.simulation.timsim.jobs.simulate_charge_states.simulate_charge_states(peptides, mz_lower, mz_upper, p_charge=0.8, max_charge=4, charge_state_one_probability=0.0, min_charge_contrib=0.15, use_binomial=False, normalize=True)
Simulate charge states for peptides.
- Parameters:
peptides (DataFrame) – Peptides DataFrame.
mz_lower (float) – Lower m/z value.
mz_upper (float) – Upper m/z value.
p_charge (float) – Probability of charge.
max_charge (int) – Maximum charge state that will be simulated (should stay at the default of 4, since IMS simulations are limited to charge states up to 4).
charge_state_one_probability (float) – Probability of charge state one.
min_charge_contrib (float) – Minimum charge contribution.
use_binomial (bool) – Use binomial distribution model, otherwise use deep learning model.
normalize (bool) – Normalize the charge state distribution.
- Returns:
Ions DataFrame.
- Return type:
pd.DataFrame
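The role of `p_charge`, `max_charge`, `min_charge_contrib`, `normalize` and the m/z window can be sketched with a toy binomial model. Everything below is an assumption for illustration: the helper names are hypothetical, the charge is drawn as Binomial(`max_charge`, `p_charge`) restricted to charges of at least one, and `charge_state_one_probability` is not modeled. The binomial option of `simulate_charge_states` may be parametrized differently.

```python
from math import comb

PROTON_MASS = 1.007276466  # mass of a proton in Da


def binomial_charge_distribution(max_charge=4, p_charge=0.8,
                                 min_charge_contrib=0.15, normalize=True):
    """Hypothetical helper: charge state probabilities from a binomial model."""
    # Assumption: z ~ Binomial(max_charge, p_charge), keeping only z >= 1.
    probs = {
        z: comb(max_charge, z) * p_charge**z * (1.0 - p_charge) ** (max_charge - z)
        for z in range(1, max_charge + 1)
    }
    # Drop charge states that contribute less than the threshold.
    probs = {z: p for z, p in probs.items() if p >= min_charge_contrib}
    if normalize and probs:
        total = sum(probs.values())
        probs = {z: p / total for z, p in probs.items()}
    return probs


def ions_in_window(monoisotopic_mass, charge_probs, mz_lower, mz_upper):
    """Hypothetical helper: keep charge states whose m/z lies in the window."""
    ions = []
    for z, p in charge_probs.items():
        mz = (monoisotopic_mass + z * PROTON_MASS) / z
        if mz_lower <= mz <= mz_upper:
            ions.append((z, mz, p))
    return ions


charge_probs = binomial_charge_distribution()
ions = ions_in_window(1500.0, charge_probs, mz_lower=100.0, mz_upper=1700.0)
```

With the defaults shown, charge state one falls below `min_charge_contrib` in this toy model and is dropped before the remaining probabilities are renormalized.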
imspy.simulation.timsim.jobs.simulate_fragment_intensities module
- imspy.simulation.timsim.jobs.simulate_fragment_intensities.simulate_fragment_intensities(path, name, acquisition_builder, batch_size, verbose, num_threads, down_sample_factor=0.5, dda=False)
Simulate fragment ion intensity distributions.
- Parameters:
path (str) – Path to the synthetic data.
name (str) – Name of the synthetic data.
acquisition_builder (TimsTofAcquisitionBuilder) – Acquisition builder object.
batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.
verbose (bool) – Verbosity.
num_threads (int) – Number of threads for frame assembly.
down_sample_factor (int) – Down sample factor for fragment ion intensity distributions.
dda (bool) – Data dependent acquisition mode.
- Return type:
None
- Returns:
None, writes frames to disk and metadata to database.
imspy.simulation.timsim.jobs.simulate_frame_distributions module
imspy.simulation.timsim.jobs.simulate_frame_distributions_emg module
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.calculate_rt_defaults(gradient_length)
Calculates ‘sigma_lower_rt’ and ‘sigma_upper_rt’, if these are not provided by the user. The calculation is based on the gradient length.
- Parameters:
gradient_length (float) – Length of the LC gradient in seconds.
- Returns:
Parameter dictionary with calculated values.
- Return type:
dict
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.erfcxinv(y, n=10)
Calculates the inverse of the scaled complementary error function (erfcx) via the Newton-Raphson method.
- Parameters:
y (ArrayLike) – The value(s) for which the inverse is to be calculated.
n (int, optional) – Number of iterations for the Newton-Raphson method. Default is 10.
- Returns:
The inverse of the scaled complementary error function at y.
- Return type:
ArrayLike
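A scalar sketch of the Newton-Raphson idea, using only the standard library. The function `erfcxinv_newton` and its starting guesses are assumptions of this sketch, not the library's code; the derivative used is d/dx erfcx(x) = 2x·erfcx(x) − 2/√π. The naive `erfcx` below overflows for x above roughly 26, so the sketch is only suitable for y above roughly 0.022.

```python
import math


def erfcx(x):
    """Scaled complementary error function, exp(x^2) * erfc(x)."""
    return math.exp(x * x) * math.erfc(x)


def erfcxinv_newton(y, n=10):
    """Hypothetical scalar sketch: solve erfcx(x) = y by Newton-Raphson."""
    if y <= 0.0:
        raise ValueError("erfcx is strictly positive, so y must be > 0")
    # Starting guesses from the asymptotic behaviour of erfcx (sketch assumption).
    if y < 1.0:
        # For large positive x, erfcx(x) is close to 1 / (x * sqrt(pi)).
        x = 1.0 / (y * math.sqrt(math.pi))
    else:
        # For x <= 0, exp(x^2) <= erfcx(x) <= 2 * exp(x^2).
        x = -math.sqrt(math.log(y))
    for _ in range(n):
        fx = erfcx(x)
        # Derivative of erfcx: 2 * x * erfcx(x) - 2 / sqrt(pi), always negative.
        dfx = 2.0 * x * fx - 2.0 / math.sqrt(math.pi)
        x -= (fx - y) / dfx
    return x


roots = {y: erfcxinv_newton(y) for y in (0.1, 0.5, 1.0, 2.0, 5.0)}
```

Because erfcx is convex and strictly decreasing, the iteration approaches the root monotonically after the first step, which is why a small fixed number of iterations is usually enough.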
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.estimate_mu_from_mode_emg(mode, sigma, lambda_)
Estimate the parameter \(\mu\) of an EMG distribution from its mode \(x_m\) (vectorized). The function uses the following formula, adapted from en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution:
\[\mu = x_m + \sqrt{2}\,\sigma\,\text{erfcx}^{-1}\left(\frac{1}{\lambda\sigma}\sqrt{\frac{2}{\pi}}\right) - \sigma^2\lambda\]
- Parameters:
mode (ArrayLike) – The modes of the EMG distributions.
sigma (ArrayLike) – EMG parameters \(\sigma\).
lambda_ (ArrayLike) – EMG parameters \(\lambda\).
- Returns:
The estimated parameters \(\mu\).
- Return type:
ArrayLike
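The relation between mode and \(\mu\) can be checked numerically without inverting erfcx: setting the derivative of the EMG density to zero gives erfcx(u) = √(2/π)/(λσ) with u = (μ + σ²λ − x_m)/(√2·σ), which is the formula above solved for erfcx. The sketch below locates the mode by ternary search and evaluates both sides. The helper names and the parameter values are illustrative assumptions.

```python
import math


def emg_pdf(x, mu, sigma, lam):
    """EMG density: Normal(mu, sigma) convolved with Exponential(rate=lam)."""
    u = (mu + lam * sigma**2 - x) / (math.sqrt(2.0) * sigma)
    expo = 0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x)
    return 0.5 * lam * math.exp(expo) * math.erfc(u)


def find_mode(mu, sigma, lam, iterations=200):
    """Locate the maximum of the unimodal EMG density by ternary search."""
    lo, hi = mu - 5.0 * sigma, mu + 5.0 * sigma + 5.0 / lam
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if emg_pdf(m1, mu, sigma, lam) < emg_pdf(m2, mu, sigma, lam):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


mu, sigma, lam = 100.0, 2.0, 0.5  # illustrative values
mode = find_mode(mu, sigma, lam)

# Both sides of the mode condition, erfcx(u) = sqrt(2 / pi) / (lam * sigma).
u = (mu + sigma**2 * lam - mode) / (math.sqrt(2.0) * sigma)
lhs = math.exp(u * u) * math.erfc(u)
rhs = math.sqrt(2.0 / math.pi) / (lam * sigma)
```

The two sides agree up to the precision of the numerical mode search, and the mode lies to the right of \(\mu\) because the exponential component skews the peak towards later times.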
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.sample_sigma_k_emg(sigma_lower, sigma_upper, sigma_alpha, sigma_beta, k_lower, k_upper, k_alpha, k_beta, n)
Sample \(\sigma\) and \(k\) from scaled beta distributions:
\[\begin{aligned} \sigma &= \sigma_{\text{lower}} + \hat{\sigma} \cdot (\sigma_{\text{upper}} - \sigma_{\text{lower}}) \\ \hat{\sigma} &\sim \text{Beta}(\alpha_{\sigma}, \beta_{\sigma}) \end{aligned}\]
This function samples the \(\sigma\) and \(k\) parameters for the exponentially modified Gaussian (EMG) distribution with:
\[k=\frac{1}{\sigma\lambda}\]
- Parameters:
sigma_lower (ArrayLike) – The lower bound for \(\sigma\).
sigma_upper (ArrayLike) – The upper bound for \(\sigma\).
sigma_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\sigma}\).
sigma_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\sigma}\).
k_lower (ArrayLike) – The lower bound for \(k\).
k_upper (ArrayLike) – The upper bound for \(k\).
k_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{k}\).
k_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{k}\).
n (int) – Number of samples.
- Returns:
The sampled \(\sigma\) and \(k\).
- Return type:
Tuple[ArrayLike, ArrayLike]
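The scaled beta sampling can be sketched with the standard library alone. The helper `sample_scaled_beta`, the bounds and the shape parameters below are illustrative assumptions; only the rescaling formula and the relation k = 1/(σλ) come from the description above.

```python
import random


def sample_scaled_beta(lower, upper, alpha, beta, n, rng):
    """Draw n values from Beta(alpha, beta) and rescale them to [lower, upper]."""
    return [lower + rng.betavariate(alpha, beta) * (upper - lower) for _ in range(n)]


rng = random.Random(0)
n = 1000

# Illustrative bounds and shape parameters, not library defaults.
sigmas = sample_scaled_beta(1.0, 3.0, 4.0, 4.0, n, rng)
ks = sample_scaled_beta(0.5, 10.0, 1.0, 20.0, n, rng)

# Convert k back to the EMG rate parameter via k = 1 / (sigma * lambda).
lambdas = [1.0 / (s * k) for s, k in zip(sigmas, ks)]
```

Sampling \(k\) instead of \(\lambda\) ties the amount of tailing to the peak width, since a fixed \(k\) means the exponential time constant is a fixed multiple of \(\sigma\).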
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.sample_sigma_lambda_emg(sigma_lower, sigma_upper, sigma_alpha, sigma_beta, lambda_lower, lambda_upper, lambda_alpha, lambda_beta, n)
Sample \(\sigma\) and \(\lambda\) from scaled beta distributions:
\[\begin{aligned} \sigma &= \sigma_{\text{lower}} + \hat{\sigma} \cdot (\sigma_{\text{upper}} - \sigma_{\text{lower}}) \\ \hat{\sigma} &\sim \text{Beta}(\alpha_{\sigma}, \beta_{\sigma}) \end{aligned}\]
This function is currently not used in the codebase. It is kept in case the EMG parametrization with \(\sigma\) and \(\lambda\) is needed instead of the one with \(\sigma\) and \(k\).
- Parameters:
sigma_lower (ArrayLike) – The lower bound for \(\sigma\).
sigma_upper (ArrayLike) – The upper bound for \(\sigma\).
sigma_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\sigma}\).
sigma_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\sigma}\).
lambda_lower (ArrayLike) – The lower bound for \(\lambda\).
lambda_upper (ArrayLike) – The upper bound for \(\lambda\).
lambda_alpha (ArrayLike) – The \(\alpha\) parameter for the beta distribution for \(\hat{\lambda}\).
lambda_beta (ArrayLike) – The \(\beta\) parameter for the beta distribution for \(\hat{\lambda}\).
n (int) – Number of samples.
- Returns:
The sampled \(\sigma\) and \(\lambda\).
- Return type:
Tuple[ArrayLike, ArrayLike]
- imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.simulate_frame_distributions_emg(peptides, frames, sigma_lower_rt, sigma_upper_rt, sigma_alpha_rt, sigma_beta_rt, k_lower_rt, k_upper_rt, k_alpha_rt, k_beta_rt, target_p, step_size, rt_cycle_length, verbose=False, add_noise=False, n_steps=1000, num_threads=4, from_existing=False, sigmas=None, lambdas=None, gradient_length=None)
Simulate frame distributions for peptides.
- Parameters:
peptides (DataFrame) – Peptide DataFrame.
frames (DataFrame) – Frame DataFrame.
sigma_lower_rt (Optional[float]) – Lower bound for sigma of an EMG chromatographic peak.
sigma_upper_rt (Optional[float]) – Upper bound for sigma of an EMG chromatographic peak.
sigma_alpha_rt (float) – Alpha for beta distribution for sigma_hat that is then scaled to sigma in (sigma_lower_rt, sigma_upper_rt).
sigma_beta_rt (float) – Beta for beta distribution for sigma_hat that is then scaled to sigma in (sigma_lower_rt, sigma_upper_rt).
k_lower_rt (float) – Lower bound for k of an EMG chromatographic peak.
k_upper_rt (float) – Upper bound for k of an EMG chromatographic peak.
k_alpha_rt (float) – Alpha for beta distribution for k_hat that is then scaled to k in (k_lower_rt, k_upper_rt).
k_beta_rt (float) – Beta for beta distribution for k_hat that is then scaled to k in (k_lower_rt, k_upper_rt).
target_p (float) – Target p.
step_size (float) – Step size.
rt_cycle_length (float) – Retention time cycle length in seconds.
verbose (bool) – Verbosity.
add_noise (bool) – Add noise.
normalize – Normalize frame abundance.
n_steps (int) – Number of steps.
num_threads (int) – Number of threads.
from_existing (bool) – Use existing parameters.
sigmas (ndarray) – Sigmas.
lambdas (ndarray) – Lambdas.
gradient_length (float) – Length of the LC gradient in seconds.
- Returns:
Peptide DataFrame with frame distributions.
- Return type:
pd.DataFrame
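One way to picture a frame distribution is as the share of an EMG-shaped chromatographic peak that elutes during each frame. The sketch below computes that share as a difference of the EMG cumulative distribution function over each frame window. This is an illustration under assumptions: the helper names are hypothetical, each frame is taken to span `rt_cycle_length` seconds, and the roles of `target_p`, `step_size` and `n_steps` are not modeled. The actual function may compute the distribution differently.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def emg_cdf(x, mu, sigma, lam):
    """CDF of Normal(mu, sigma) convolved with Exponential(rate=lam)."""
    t = x - mu
    tail = math.exp(-lam * t + 0.5 * (lam * sigma) ** 2)
    return norm_cdf(t / sigma) - tail * norm_cdf((t - lam * sigma**2) / sigma)


def frame_abundances(frame_times, rt_cycle_length, mu, sigma, lam):
    """Hypothetical helper: peak mass falling into each frame's time window."""
    return [
        emg_cdf(t + rt_cycle_length, mu, sigma, lam) - emg_cdf(t, mu, sigma, lam)
        for t in frame_times
    ]


rt_cycle_length = 0.5  # seconds per frame (illustrative)
frame_times = [80.0 + i * rt_cycle_length for i in range(160)]  # 80 s to 160 s
abundances = frame_abundances(frame_times, rt_cycle_length, mu=100.0, sigma=2.0, lam=0.5)

total = sum(abundances)
apex_time = frame_times[abundances.index(max(abundances))]
```

Because the frame windows tile the retention time axis, the abundances sum to the CDF difference between the last and first window edge, which is close to one when the windows cover the whole peak.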
imspy.simulation.timsim.jobs.simulate_ion_mobilities module
- imspy.simulation.timsim.jobs.simulate_ion_mobilities.simulate_ion_mobilities(ions, im_lower, im_upper, verbose=False)
Simulate ion mobilities.
- Parameters:
ions (DataFrame) – Ions DataFrame.
im_lower (float) – Lower ion mobility.
im_upper (float) – Upper ion mobility.
verbose (bool) – Verbosity.
- Returns:
Ions DataFrame.
- Return type:
pd.DataFrame
imspy.simulation.timsim.jobs.simulate_occurrences module
- imspy.simulation.timsim.jobs.simulate_occurrences.simulate_peptide_occurrences(peptides, intensity_mean, intensity_min, intensity_max, verbose=False, sample_occurrences=True, intensity_value=1000000.0, mixture_contribution=1.0)
Simulate peptide occurrences.
- Parameters:
peptides (DataFrame) – Peptides DataFrame.
intensity_mean (float) – Intensity mean.
intensity_min (float) – Intensity minimum.
intensity_max (float) – Intensity maximum.
verbose (bool) – Verbosity.
sample_occurrences (bool) – Sample occurrences.
intensity_value (float) – Intensity value.
mixture_contribution (float) – Mixture contribution.
- Returns:
Peptides DataFrame.
- Return type:
pd.DataFrame
imspy.simulation.timsim.jobs.simulate_precursor_spectra module
- imspy.simulation.timsim.jobs.simulate_precursor_spectra.simulate_precursor_spectra_averagine(ions, isotope_min_intensity, isotope_k, num_threads, verbose=False)
- Return type:
DataFrame
- imspy.simulation.timsim.jobs.simulate_precursor_spectra.simulate_precursor_spectra_sequence(ions, num_threads=16, verbose=False)
Simulate sequence specific precursor isotopic distributions.
- Parameters:
ions (DataFrame) – DataFrame containing ions.
num_threads (int) – Number of threads.
verbose (bool) – Verbosity.
- Returns:
DataFrame containing ions with simulated spectra.
- Return type:
pd.DataFrame
imspy.simulation.timsim.jobs.simulate_retention_time module
- imspy.simulation.timsim.jobs.simulate_retention_time.simulate_retention_times(peptides, verbose=False, gradient_length=3600)
Simulate retention times.
- Parameters:
peptides (DataFrame) – Peptides DataFrame.
verbose (bool) – Verbosity.
gradient_length (float) – Gradient length in seconds.
- Returns:
Peptides DataFrame.
- Return type:
pd.DataFrame
imspy.simulation.timsim.jobs.simulate_scan_distributions module
imspy.simulation.timsim.jobs.utility module
- imspy.simulation.timsim.jobs.utility.check_path(p)
- Return type:
str
- imspy.simulation.timsim.jobs.utility.phosphorylation_sizes(sequence)
Checks if a sequence contains potential phosphorylation sites (S, T, or Y), and returns the count of sites and their indices.
- Parameters:
sequence (str) – The input sequence string, e.g., “IC[UNIMOD:4]RQHTK”.
- Returns:
- A tuple containing:
int: The number of phosphorylation sites.
list: A list of indices where the sites are found.
- Return type:
tuple
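A minimal sketch of counting potential phosphorylation sites, written as a hypothetical stand-in rather than the library function. It assumes that bracketed modification tags such as `[UNIMOD:4]` are removed first, so the reported indices refer to positions in the unmodified amino acid sequence; the real function may index into the raw string instead.

```python
import re


def count_phospho_sites(sequence):
    """Hypothetical stand-in: count S/T/Y residues and report their positions."""
    # Assumption: strip bracketed modification tags before indexing, so that
    # indices refer to the plain amino acid sequence.
    stripped = re.sub(r"\[[^\]]*\]", "", sequence)
    indices = [i for i, aa in enumerate(stripped) if aa in "STY"]
    return len(indices), indices


result = count_phospho_sites("IC[UNIMOD:4]RQHTK")
```

For the example sequence, the stripped form is `ICRQHTK`, which contains a single candidate site (the threonine at zero-based index 5).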