imspy.simulation.timsim.jobs package

Submodules

imspy.simulation.timsim.jobs.add_noise_from_real_data module

imspy.simulation.timsim.jobs.add_noise_from_real_data.add_real_data_noise_to_frames(acquisition_builder, frames, intensity_max_precursor=30, intensity_max_fragment=30, precursor_sample_fraction=0.2, fragment_sample_fraction=0.2, num_precursor_frames=5, num_fragment_frames=5)

Add noise sampled from real data to the given frames.

Parameters:
  • acquisition_builder (TimsTofAcquisitionBuilderDIA) – Acquisition builder.

  • frames (List[TimsFrame]) – Frames.

  • intensity_max_precursor (float) – Maximum intensity for precursor.

  • intensity_max_fragment (float) – Maximum intensity for fragment.

  • precursor_sample_fraction (float) – Sample fraction for precursor.

  • fragment_sample_fraction (float) – Sample fraction for fragment.

  • num_precursor_frames (int) – Number of precursor frames.

  • num_fragment_frames (int) – Number of fragment frames.

Returns:

Frames.

Return type:

List[TimsFrame]

imspy.simulation.timsim.jobs.assemble_frames module

imspy.simulation.timsim.jobs.assemble_frames.assemble_frames(acquisition_builder, frames, batch_size, verbose=False, mz_noise_precursor=False, mz_noise_uniform=False, precursor_noise_ppm=5.0, mz_noise_fragment=False, fragment_noise_ppm=5.0, num_threads=4, add_real_data_noise=False, intensity_max_precursor=150, intensity_max_fragment=75, precursor_sample_fraction=0.01, fragment_sample_fraction=0.05, num_precursor_frames=10, num_fragment_frames=10, fragment=True)

Assemble frames from frame ids and write them to the database.

Parameters:
  • acquisition_builder (TimsTofAcquisitionBuilderDIA) – Acquisition builder object.

  • frames (DataFrame) – DataFrame containing frame ids.

  • batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.

  • verbose (bool) – Verbosity.

  • mz_noise_precursor (bool) – Add noise to precursor m/z values.

  • mz_noise_uniform (bool) – Add uniform noise to m/z values.

  • precursor_noise_ppm (float) – PPM value for precursor noise.

  • mz_noise_fragment (bool) – Add noise to fragment m/z values.

  • fragment_noise_ppm (float) – PPM value for fragment noise.

  • num_threads (int) – Number of threads for frame assembly.

  • add_real_data_noise (bool) – Add real data noise to the frames.

  • intensity_max_precursor (float) – Maximum intensity for precursor noise.

  • intensity_max_fragment (float) – Maximum intensity for fragment noise.

  • precursor_sample_fraction (float) – Sample fraction for precursor noise.

  • fragment_sample_fraction (float) – Sample fraction for fragment noise.

  • num_precursor_frames (int) – Number of precursor frames.

  • num_fragment_frames (int) – Number of fragment frames.

  • fragment (bool) – If False, quadrupole isolation is still applied, but no fragmentation is performed.

Return type:

None

Returns:

None, writes frames to disk and metadata to database.
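
The ppm parameters above express m/z noise relative to the m/z value itself: a deviation of n ppm at m/z m corresponds to m · n / 10^6. The following is a minimal sketch of that conversion, not the imspy implementation. The helper names `ppm_to_delta` and `add_mz_noise` are hypothetical, and using the ppm value as the Gaussian standard deviation (or as the half-width in the uniform case) is an assumption made for illustration.

```python
import random


def ppm_to_delta(mz: float, ppm: float) -> float:
    """Absolute m/z deviation that corresponds to `ppm` at `mz`."""
    return mz * ppm / 1e6


def add_mz_noise(mz_values, ppm=5.0, uniform=False, seed=None):
    """Return a noisy copy of `mz_values` (hypothetical helper).

    Assumption: `ppm` is the half-width of the uniform window, or the
    standard deviation of the Gaussian, depending on `uniform`.
    """
    rng = random.Random(seed)
    noisy = []
    for mz in mz_values:
        delta = ppm_to_delta(mz, ppm)
        if uniform:
            shift = rng.uniform(-delta, delta)
        else:
            shift = rng.gauss(0.0, delta)
        noisy.append(mz + shift)
    return noisy
```

With the default of 5 ppm, a peak at m/z 1000 is shifted on the order of 0.005.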

imspy.simulation.timsim.jobs.build_acquisition module

imspy.simulation.timsim.jobs.build_acquisition.build_acquisition(path, reference_path, exp_name, acquisition_type='dia', verbose=False, gradient_length=None, use_reference_ds_layout=True, reference_in_memory=True, round_collision_energy=True, collision_energy_decimals=0, use_bruker_sdk=True)

Build acquisition object from reference path.

Parameters:
  • path (str) – Path where the acquisition will be saved.

  • reference_path (str) – Path to the reference dataset.

  • exp_name (str) – Experiment name.

  • acquisition_type (str) – Acquisition type, must be ‘dia’, ‘midia’, ‘slice’ or ‘synchro’.

  • verbose (bool) – Verbosity.

  • gradient_length (float) – Gradient length.

  • use_reference_ds_layout (bool) – Use reference dataset layout for synthetic dataset.

  • reference_in_memory (bool) – Load reference dataset into memory.

  • round_collision_energy (bool) – Round collision energy.

  • collision_energy_decimals (int) – Number of decimals for collision energy (controls coarseness).

  • use_bruker_sdk (bool) – Use Bruker SDK for reading reference dataset.

Returns:

Acquisition object.

Return type:

TimsTofAcquisitionBuilderDIA

imspy.simulation.timsim.jobs.digest_fasta module

imspy.simulation.timsim.jobs.digest_fasta.digest_fasta(fasta_file_path, missed_cleavages=2, min_len=6, max_len=30, cleave_at='KR', restrict=None, decoys=False, verbose=False, job_name='digest_fasta', static_mods={'C': '[UNIMOD:4]'}, variable_mods={'M': ['[UNIMOD:35]'], '[': ['[UNIMOD:1]']}, exclude_accumulated_gradient_start=True, min_rt_percent=2.0, gradient_length=3600)

Digest a fasta file.

Parameters:
  • fasta_file_path (str) – Path to the fasta file.

  • missed_cleavages (int) – Number of missed cleavages.

  • min_len (int) – Minimum peptide length.

  • max_len (int) – Maximum peptide length.

  • cleave_at (str) – Cleavage sites.

  • restrict (str) – Restrict to specific proteins.

  • decoys (bool) – Generate decoys.

  • verbose (bool) – Verbosity.

  • job_name (str) – Job name.

  • static_mods (dict[str, str]) – Static modifications.

  • variable_mods (dict[str, list[str]]) – Variable modifications.

  • exclude_accumulated_gradient_start (bool) – Exclude low retention times.

  • min_rt_percent (float) – Minimum retention time in percent.

  • gradient_length (float) – Gradient length in seconds.

Returns:

Peptide digest object.

Return type:

PeptideDigest
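
To illustrate how cleave_at, missed_cleavages, min_len and max_len interact, here is a simplified in-silico digestion in plain Python. It is only a sketch under stated assumptions: the function name `digest` is hypothetical, cleavage happens after every residue listed in `cleave_at` with no further rules (for example, no proline exception), and modifications, decoys and retention time filtering are ignored. It does not reproduce what digest_fasta does internally.

```python
def digest(sequence, cleave_at="KR", missed_cleavages=2, min_len=6, max_len=30):
    """Enumerate peptides of a protein sequence (illustrative only)."""
    # Positions at which the sequence is cut: after each cleavage residue.
    cut_sites = [0]
    for i, residue in enumerate(sequence):
        if residue in cleave_at and i + 1 < len(sequence):
            cut_sites.append(i + 1)
    cut_sites.append(len(sequence))

    n_fragments = len(cut_sites) - 1  # number of fully cleaved fragments
    peptides = set()
    for start in range(n_fragments):
        # Join 1..missed_cleavages+1 consecutive fragments.
        for missed in range(missed_cleavages + 1):
            end = start + missed + 1
            if end > n_fragments:
                break
            peptide = sequence[cut_sites[start]:cut_sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)
```

Raising missed_cleavages only adds longer peptides that span one or more uncut sites; the length bounds then filter the combined set.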

imspy.simulation.timsim.jobs.simulate_charge_states module

imspy.simulation.timsim.jobs.simulate_charge_states.simulate_charge_states(peptides, mz_lower, mz_upper, p_charge=0.5, max_charge=4, charge_state_one_probability=0.0, min_charge_contrib=0.15, use_binomial=False)

Simulate charge states for peptides.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • mz_lower (float) – Lower m/z value.

  • mz_upper (float) – Upper m/z value.

  • p_charge (float) – Probability of charge.

  • max_charge (int) – Maximum charge state to simulate. The default of 4 reflects that IMS simulations are limited to charge 4.

  • charge_state_one_probability (float) – Probability of charge state one.

  • min_charge_contrib (float) – Minimum charge contribution.

  • use_binomial (bool) – Use binomial distribution model, otherwise use deep learning model.

Returns:

Ions DataFrame.

Return type:

pd.DataFrame
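
The sketch below shows one way the binomial option and the m/z bounds could fit together. It is an assumption-laden illustration, not the library's code: the charge z is taken to follow Binomial(max_charge, p_charge), charge states whose probability falls below min_charge_contrib are dropped, and the remaining ions are kept only if (M + z · m_proton) / z lies within [mz_lower, mz_upper]. All function names are hypothetical.

```python
from math import comb

PROTON_MASS = 1.007276466  # Da


def binomial_charge_probabilities(p_charge=0.5, max_charge=4):
    """P(z) for z = 0..max_charge under Binomial(max_charge, p_charge)."""
    return {
        z: comb(max_charge, z) * p_charge**z * (1 - p_charge) ** (max_charge - z)
        for z in range(max_charge + 1)
    }


def mz(mono_mass, charge):
    """m/z of a protonated ion with the given monoisotopic mass."""
    return (mono_mass + charge * PROTON_MASS) / charge


def charge_states_in_window(mono_mass, mz_lower, mz_upper,
                            p_charge=0.5, max_charge=4, min_charge_contrib=0.15):
    """Return (charge, m/z, probability) tuples that pass both filters."""
    probs = binomial_charge_probabilities(p_charge, max_charge)
    ions = []
    for z in range(1, max_charge + 1):
        if probs[z] < min_charge_contrib:
            continue  # assumed meaning of min_charge_contrib
        value = mz(mono_mass, z)
        if mz_lower <= value <= mz_upper:
            ions.append((z, value, probs[z]))
    return ions
```

For a peptide of monoisotopic mass 1500 Da and a window of 400 to 1000 m/z, this keeps the 2+ and 3+ ions; the 1+ ion falls outside the window and the 4+ ion falls below the contribution threshold.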

imspy.simulation.timsim.jobs.simulate_fragment_intensities module

imspy.simulation.timsim.jobs.simulate_fragment_intensities.simulate_fragment_intensities(path, name, acquisition_builder, batch_size, verbose, num_threads, down_sample_factor=0.5)

Simulate fragment ion intensity distributions.

Parameters:
  • path (str) – Path to the synthetic data.

  • name (str) – Name of the synthetic data.

  • acquisition_builder (TimsTofAcquisitionBuilderDIA) – Acquisition builder object.

  • batch_size (int) – Batch size for frame assembly, i.e. how many frames are assembled at once.

  • verbose (bool) – Verbosity.

  • num_threads (int) – Number of threads for frame assembly.

  • down_sample_factor (float) – Down-sampling factor for fragment ion intensity distributions.

Return type:

None

Returns:

None, writes frames to disk and metadata to database.

imspy.simulation.timsim.jobs.simulate_frame_distributions module

imspy.simulation.timsim.jobs.simulate_frame_distributions.simulate_frame_distributions(peptides, frames, z_score, std_rt, rt_cycle_length, verbose=False, add_noise=False, normalize=False)

Simulate frame distributions for peptides.

Parameters:
  • peptides (DataFrame) – Peptide DataFrame.

  • frames (DataFrame) – Frame DataFrame.

  • z_score (float) – Z-score.

  • std_rt (float) – Standard deviation of retention time.

  • rt_cycle_length (float) – Retention time cycle length in seconds.

  • verbose (bool) – Verbosity.

  • add_noise (bool) – Add noise.

  • normalize (bool) – Normalize frame abundance.

Returns:

Peptide DataFrame with frame distributions.

Return type:

pd.DataFrame
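
As a rough picture of how z_score, std_rt and rt_cycle_length can shape a frame distribution, the sketch below models a peptide's elution as a Gaussian around its retention time apex. The assumptions are mine and not taken from the library: each frame covers the interval [t, t + rt_cycle_length), only frames overlapping apex ± z_score · std_rt are kept, and a frame's abundance is the Gaussian probability mass inside its interval. The name `frame_abundances` is hypothetical.

```python
from statistics import NormalDist


def frame_abundances(rt_apex, std_rt, frame_times, rt_cycle_length,
                     z_score=3.0, normalize=False):
    """Map frame id (1-based) to the fraction of signal it receives."""
    dist = NormalDist(mu=rt_apex, sigma=std_rt)
    lower = rt_apex - z_score * std_rt
    upper = rt_apex + z_score * std_rt

    abundances = {}
    for frame_id, t in enumerate(frame_times, start=1):
        # Skip frames that do not overlap the z-score window.
        if t + rt_cycle_length < lower or t > upper:
            continue
        abundances[frame_id] = dist.cdf(t + rt_cycle_length) - dist.cdf(t)

    if normalize and abundances:
        total = sum(abundances.values())
        abundances = {k: v / total for k, v in abundances.items()}
    return abundances
```

Without normalization the abundances sum to slightly less than 1, because the tails outside the z-score window are cut off; normalize=True rescales them to sum to 1.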

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg module

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.sample_parameters_rejection(sigma_mean, sigma_variance, lambda_mean, lambda_variance, n)

imspy.simulation.timsim.jobs.simulate_frame_distributions_emg.simulate_frame_distributions_emg(peptides, frames, mean_std_rt, variance_std_rt, mean_scewness, variance_scewness, target_p, step_size, rt_cycle_length, verbose=False, add_noise=False, n_steps=1000, num_threads=4, from_existing=False, sigmas=None, lambdas=None)

Simulate frame distributions for peptides using an exponentially modified Gaussian (EMG) peak shape.

Parameters:
  • peptides (DataFrame) – Peptide DataFrame.

  • frames (DataFrame) – Frame DataFrame.

  • mean_std_rt (float) – Mean of the retention time standard deviation.

  • variance_std_rt (float) – Variance of the retention time standard deviation.

  • mean_scewness (float) – Mean skewness.

  • variance_scewness (float) – Variance of the skewness.

  • target_p (float) – Target p.

  • step_size (float) – Step size.

  • rt_cycle_length (float) – Retention time cycle length in seconds.

  • verbose (bool) – Verbosity.

  • add_noise (bool) – Add noise.

  • normalize (bool) – Normalize frame abundance. This parameter does not appear in the function signature above.

  • n_steps (int) – Number of steps.

  • num_threads (int) – Number of threads.

  • from_existing (bool) – Use existing parameters.

  • sigmas (ndarray) – Sigma values.

  • lambdas (ndarray) – Lambda values.

Returns:

Peptide DataFrame with frame distributions.

Return type:

pd.DataFrame
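
Assuming "emg" in the module name stands for the exponentially modified Gaussian, and that sigma and lambda are its Gaussian width and exponential rate, the peak shape can be written down directly. The sketch below evaluates the standard EMG density and checks it numerically; `emg_pdf` and `integrate` are hypothetical helper names, and nothing here is taken from the library's implementation.

```python
from math import erfc, exp, sqrt


def emg_pdf(x, mu, sigma, lam):
    """Density of the exponentially modified Gaussian.

    mu and sigma describe the Gaussian part, lam is the rate of the
    exponential part; a smaller lam gives a longer right-hand tail.
    """
    exponent = lam / 2.0 * (2.0 * mu + lam * sigma**2 - 2.0 * x)
    tail = erfc((mu + lam * sigma**2 - x) / (sqrt(2.0) * sigma))
    return lam / 2.0 * exp(exponent) * tail


def integrate(f, a, b, n=10000):
    """Trapezoidal rule, used here only to sanity-check the density."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h
```

The mean of this distribution is mu + 1/lam, so the apex-to-mean offset grows as lam shrinks, which is what gives simulated elution peaks their tailing.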

imspy.simulation.timsim.jobs.simulate_ion_mobilities module

imspy.simulation.timsim.jobs.simulate_ion_mobilities.simulate_ion_mobilities(ions, im_lower, im_upper, verbose=False)

Simulate ion mobilities.

Parameters:
  • ions (DataFrame) – Ions DataFrame.

  • im_lower (float) – Lower ion mobility.

  • im_upper (float) – Upper ion mobility.

  • verbose (bool) – Verbosity.

Returns:

Ions DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_occurrences module

imspy.simulation.timsim.jobs.simulate_occurrences.simulate_peptide_occurrences(peptides, intensity_mean, intensity_min, intensity_max, verbose=False, sample_occurrences=True, intensity_value=1000000.0, mixture_contribution=1.0)

Simulate peptide occurrences.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • intensity_mean (float) – Intensity mean.

  • intensity_min (float) – Intensity minimum.

  • intensity_max (float) – Intensity maximum.

  • verbose (bool) – Verbosity.

  • sample_occurrences (bool) – Sample occurrences.

  • intensity_value (float) – Intensity value.

  • mixture_contribution (float) – Mixture contribution.

Returns:

Peptides DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_precursor_spectra module

imspy.simulation.timsim.jobs.simulate_retention_time module

imspy.simulation.timsim.jobs.simulate_retention_time.simulate_retention_times(peptides, verbose=False, gradient_length=3600)

Simulate retention times.

Parameters:
  • peptides (DataFrame) – Peptides DataFrame.

  • verbose (bool) – Verbosity.

  • gradient_length (float) – Gradient length in seconds.

Returns:

Peptides DataFrame.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.simulate_scan_distributions module

imspy.simulation.timsim.jobs.simulate_scan_distributions.simulate_scan_distributions(ions, scans, z_score, mean_std_im=0.01, variance_std_im=0.0, verbose=False, add_noise=False, normalize=False, from_existing=False, std_means=None)

Simulate scan distributions for ions.

Parameters:
  • ions (DataFrame) – Ions DataFrame.

  • scans (DataFrame) – Scan DataFrame.

  • z_score (float) – Z-score.

  • mean_std_im (float) – Mean of the standard deviation of ion mobility.

  • variance_std_im (float) – Variance of standard deviation of ion mobility.

  • verbose (bool) – Verbosity.

  • add_noise (bool) – Add noise.

  • normalize (bool) – Normalize scan abundance.

  • from_existing (bool) – Use existing parameters.

  • std_means (ndarray) – Standard deviations.

Returns:

Ions DataFrame with scan distributions.

Return type:

pd.DataFrame

imspy.simulation.timsim.jobs.utility module

imspy.simulation.timsim.jobs.utility.check_path(p)

Return type:

str

imspy.simulation.timsim.jobs.utility.phosphorylation_sizes(sequence)

Checks if a sequence contains potential phosphorylation sites (S, T, or Y), and returns the count of sites and their indices.

Parameters:

sequence (str) – The input sequence string, e.g., “IC[UNIMOD:4]RQHTK”.

Returns:

A tuple containing:
  • int: The number of phosphorylation sites.

  • list: A list of indices where the sites are found.

Return type:

tuple
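
The behaviour described above can be approximated in a few lines. This is a sketch, not the library function: the name `phospho_sites` is hypothetical, bracketed modification annotations such as [UNIMOD:4] are stripped before counting, and the returned indices are 0-based positions in the stripped sequence. The index convention used by phosphorylation_sizes itself is not stated here, so treat that part as an assumption.

```python
import re


def phospho_sites(sequence):
    """Count S, T and Y residues, ignoring bracketed modifications."""
    # Remove annotations like "[UNIMOD:4]" so only residues remain.
    stripped = re.sub(r"\[[^\]]*\]", "", sequence)
    indices = [i for i, residue in enumerate(stripped) if residue in "STY"]
    return len(indices), indices
```

For the example sequence “IC[UNIMOD:4]RQHTK”, the stripped sequence is ICRQHTK, which contains a single candidate site (T).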

Module contents