imspy.timstof.dbsearch package

Submodules

imspy.timstof.dbsearch.imspy_dda module

imspy.timstof.dbsearch.imspy_dda.create_database(fasta, static, variab, enzyme_builder, generate_decoys, bucket_size, shuffle_decoys=True, keep_ends=True)
imspy.timstof.dbsearch.imspy_dda.load_config(config_path)
imspy.timstof.dbsearch.imspy_dda.main()

imspy.timstof.dbsearch.imspy_rescore_sage module

imspy.timstof.dbsearch.imspy_rescore_sage.main()

imspy.timstof.dbsearch.sage_output_utility module

class imspy.timstof.dbsearch.sage_output_utility.PatternReplacer(replacements, pattern='\\[.*?\\]')

Bases: object

apply(string)
Return type:

str

imspy.timstof.dbsearch.sage_output_utility.break_into_equal_size_sets(sequence_set, k=10)

Breaks a set of objects into k sets of equal size at random.

Parameters:
  • sequence_set – Set of sequences to be divided

  • k (int) – Number of sets to divide the objects into

Returns:

A list containing k sets, each with equal number of randomly chosen sequences
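
The random split described above can be sketched as follows. This is an illustrative sketch only, not the imspy implementation: the name split_into_k_sets and the seed argument are hypothetical, and the sets are exactly equal in size only when the number of items is divisible by k (otherwise sizes differ by at most one).

```python
import random


def split_into_k_sets(items, k=10, seed=None):
    """Shuffle the items and deal them into k sets (hypothetical helper)."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Slicing with stride k deals the shuffled items round-robin,
    # so set sizes differ by at most one element.
    return [set(shuffled[i::k]) for i in range(k)]


sequences = {f"PEPTIDE{i}" for i in range(20)}
parts = split_into_k_sets(sequences, k=4, seed=0)
```

Each sequence lands in exactly one of the returned sets, which is the property a cross-validation style split typically relies on.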

imspy.timstof.dbsearch.sage_output_utility.cosim_from_dict(observed, predicted)
imspy.timstof.dbsearch.sage_output_utility.fragments_to_dict(fragments)
imspy.timstof.dbsearch.sage_output_utility.generate_training_data(psms, method='psm', q_max=0.01, balance=True)

Generate training data.

Parameters:
  • psms (DataFrame) – List of PeptideSpectrumMatch objects

  • method (str) – Method to use for training data generation

  • q_max (float) – Maximum q-value allowed for positive examples

  • balance (bool) – Whether to balance the dataset

Returns:

X_train and Y_train

Return type:

Tuple[NDArray, NDArray]

imspy.timstof.dbsearch.sage_output_utility.plot_summary(TARGET, DECOY, save_path, dpi=300, file_format='png')
imspy.timstof.dbsearch.sage_output_utility.re_score_psms(psms, num_splits=10, verbose=True, balance=True, score='hyperscore', positive_example_q_max=0.01)

Re-score PSMs using LDA.

Parameters:
  • psms (DataFrame) – List of PeptideSpectrumMatch objects

  • num_splits (int) – Number of splits

  • verbose (bool) – Whether to print progress

  • balance (bool) – Whether to balance the dataset

  • score (str) – Score to use for re-scoring

  • positive_example_q_max (float) – Maximum q-value allowed for positive examples

Returns:

List of PeptideSpectrumMatch objects

Return type:

List[PeptideSpectrumMatch]

imspy.timstof.dbsearch.sage_output_utility.remove_substrings(input_string)
Return type:

str

imspy.timstof.dbsearch.sage_output_utility.row_to_fragment(r)
imspy.timstof.dbsearch.sage_output_utility.split_dataframe_randomly(df, n)
Return type:

list

imspy.timstof.dbsearch.utility module

imspy.timstof.dbsearch.utility.check_memory(limit_in_gb=16, msg='⚠️ Warning: System has only {total_ram_gb:.2f}GB of RAM, which is below the recommended {limit_in_gb}GB.')
imspy.timstof.dbsearch.utility.extract_timstof_dda_data(path, in_memory=False, use_bruker_sdk=False, isolation_window_lower=-3.0, isolation_window_upper=3.0, take_top_n=100, num_threads=16)

Extract timsTOF DDA data from a Bruker timsTOF TDF file.

Parameters:
  • path (str) – Path to timsTOF DDA data

  • in_memory (bool) – Whether to load data in memory

  • use_bruker_sdk (bool) – Whether to use the Bruker SDK for data extraction

  • isolation_window_lower (float) – Lower bound for isolation window (Da)

  • isolation_window_upper (float) – Upper bound for isolation window (Da)

  • take_top_n (int) – Number of top peaks to take

  • num_threads (int) – Number of threads to use

Returns:

DataFrame containing timsTOF DDA data

Return type:

pd.DataFrame

imspy.timstof.dbsearch.utility.generate_balanced_im_dataset(psms)
Return type:

List[Psm]

imspy.timstof.dbsearch.utility.generate_balanced_rt_dataset(psms)
Return type:

List[Psm]

imspy.timstof.dbsearch.utility.generate_training_data(psms, method='psm', q_max=0.01, balance=True)

Generate training data.

Parameters:
  • psms (List[Psm]) – List of PeptideSpectrumMatch objects

  • method (str) – Method to use for training data generation

  • q_max (float) – Maximum q-value allowed for positive examples

  • balance (bool) – Whether to balance the dataset

Returns:

X_train and Y_train

Return type:

Tuple[NDArray, NDArray]

imspy.timstof.dbsearch.utility.get_searchable_spec(precursor, raw_fragment_data, spec_processor, time, spec_id, file_id=0, ms_level=2)

Get SAGE searchable spectrum from raw data.

Parameters:
  • precursor (Precursor) – Precursor object

  • raw_fragment_data (TimsFrame) – TimsFrame object

  • spec_processor (SpectrumProcessor) – SpectrumProcessor object

  • time (float)

  • spec_id (str)

  • file_id (int)

  • ms_level (int)

Returns:

ProcessedSpectrum object

Return type:

ProcessedSpectrum

imspy.timstof.dbsearch.utility.linear_map(value, old_min, old_max, new_min=0.0, new_max=60.0)
imspy.timstof.dbsearch.utility.list_to_semicolon_string(value)

Converts a list of proteins into a semicolon-separated string.
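
As a minimal sketch of this conversion (the helper name join_proteins is hypothetical, and stringifying every entry is an assumption rather than documented behavior):

```python
def join_proteins(proteins):
    """Join protein identifiers with semicolons (hypothetical helper)."""
    # Assumption: entries are converted with str() before joining.
    return ";".join(str(p) for p in proteins)


joined = join_proteins(["P12345", "Q67890"])
```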

imspy.timstof.dbsearch.utility.map_to_domain(data, gradient_length=120.0)

Maps the input data linearly into the domain [0, gradient_length].

Parameters:
  • data – list or numpy array of numerical values

  • gradient_length (float) – the upper limit of the target domain [0, gradient_length]

Returns:

mapped_data: list of values mapped into the domain [0, gradient_length]
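
A linear mapping of this kind is commonly done by min-max scaling. The sketch below assumes that approach; the function name map_to_domain_sketch and the handling of constant input (all values mapped to 0.0) are assumptions, not documented behavior of imspy.

```python
def map_to_domain_sketch(data, gradient_length=120.0):
    """Min-max scale data into [0, gradient_length] (illustrative sketch)."""
    values = [float(v) for v in data]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: constant input carries no spread, so map everything to 0.
        return [0.0 for _ in values]
    scale = gradient_length / (hi - lo)
    # Smallest value maps to 0, largest to gradient_length.
    return [(v - lo) * scale for v in values]


mapped = map_to_domain_sketch([10, 20, 30], gradient_length=120.0)
```

With this scaling the smallest input always lands on 0 and the largest on the upper limit, so relative spacing between values is preserved.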

imspy.timstof.dbsearch.utility.merge_dicts_with_merge_dict(dicts)
imspy.timstof.dbsearch.utility.parse_string_list(input_str)

Takes a string representation of a list and converts it into an actual list of strings.

Parameters:

input_str (str) – A string containing a list representation.

Returns:

A list of strings parsed from the input string.

Return type:

list
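
One way to perform such a conversion safely is with ast.literal_eval from the standard library. The sketch below assumes the input looks like a Python list literal; the name parse_string_list_sketch and the error handling are hypothetical and may differ from what imspy does.

```python
import ast


def parse_string_list_sketch(input_str):
    """Parse a string such as "['A', 'B']" into a list of strings (sketch)."""
    # literal_eval only evaluates Python literals, never arbitrary code.
    parsed = ast.literal_eval(input_str)
    if not isinstance(parsed, (list, tuple)):
        raise ValueError("input does not represent a list")
    return [str(item) for item in parsed]


proteins = parse_string_list_sketch("['P12345', 'Q67890']")
```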

imspy.timstof.dbsearch.utility.parse_to_tims2rescore(TDC, from_mgf=False, file_name=None)
imspy.timstof.dbsearch.utility.peptide_length(peptide)

Takes a peptide sequence as a string and returns its length, excluding [UNIMOD:X] modifications.

Parameters:

peptide (str) – A peptide sequence with possible UNIMOD modifications.

Returns:

The length of the peptide without modifications.

Return type:

int
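
Counting residues while ignoring modification tags can be sketched with a regular expression. This is an illustration under the assumption that modifications appear inline as [UNIMOD:<number>]; the name peptide_length_sketch is hypothetical and the actual implementation may differ.

```python
import re

# Assumption: modifications are written inline as [UNIMOD:<integer>].
UNIMOD_TAG = re.compile(r"\[UNIMOD:\d+\]")


def peptide_length_sketch(peptide):
    """Return the number of residues after removing UNIMOD tags (sketch)."""
    return len(UNIMOD_TAG.sub("", peptide))


n_plain = peptide_length_sketch("PEPTIDEK")
n_modified = peptide_length_sketch("M[UNIMOD:35]PEPTC[UNIMOD:4]IDEK")
```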

imspy.timstof.dbsearch.utility.sanitize_charge(charge)

Sanitize charge value.

Parameters:

charge (Optional[float]) – Charge value as float.

Returns:

Charge value as int.

Return type:

int

imspy.timstof.dbsearch.utility.sanitize_mz(mz, mz_highest)

Sanitize mz value.

Parameters:
  • mz (Optional[float]) – Mz value as float.

  • mz_highest (float) – Highest mz value.

Returns:

Mz value as float.

Return type:

float

imspy.timstof.dbsearch.utility.split_fasta(fasta, num_splits=16, randomize=True)

Split a fasta file into multiple fasta files.

Parameters:
  • fasta (str) – Fasta file as string.

  • num_splits (int) – Number of splits fasta file should be split into.

  • randomize (bool) – Whether to randomize the order of sequences before splitting.

Return type:

List[str]

Returns:

List of fasta files as strings, will contain num_splits fasta files with equal number of sequences.
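
The splitting behavior described above can be sketched as follows. This is not the imspy implementation: the name split_fasta_sketch and the seed argument are hypothetical, records are assumed to start on lines beginning with ">", and the parts hold equal numbers of sequences only when the record count is divisible by num_splits.

```python
import random


def split_fasta_sketch(fasta, num_splits=16, randomize=True, seed=None):
    """Split FASTA text into num_splits FASTA strings (illustrative sketch)."""
    records = []
    for line in fasta.splitlines():
        if line.startswith(">"):
            records.append([line])  # a header line starts a new record
        elif line.strip() and records:
            records[-1].append(line.strip())  # sequence line of current record
    entries = ["\n".join(record) + "\n" for record in records]
    if randomize:
        # seed is an assumption, added so the shuffle can be reproduced
        random.Random(seed).shuffle(entries)
    # Deal records round-robin so part sizes differ by at most one.
    return ["".join(entries[i::num_splits]) for i in range(num_splits)]


example = ">sp|A|ONE\nMKV\nLLA\n>sp|B|TWO\nMEE\n>sp|C|THREE\nMQQ\n>sp|D|FOUR\nMTT\n"
chunks = split_fasta_sketch(example, num_splits=2, seed=1)
```

Splitting at record boundaries keeps each header together with its sequence lines, so every returned string is itself valid FASTA text.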

imspy.timstof.dbsearch.utility.split_psms(psms, num_splits=10)

Split PSMs into multiple splits.

Parameters:
  • psms (List[Psm]) – List of PeptideSpectrumMatch objects

  • num_splits (int) – Number of splits

Returns:

List of splits

Return type:

List[List[PeptideSpectrumMatch]]

imspy.timstof.dbsearch.utility.transform_psm_to_pin(psm_df)
imspy.timstof.dbsearch.utility.write_psms_binary(byte_array, folder_path, file_name, total=False)

Write PSMs to binary file.

Parameters:
  • byte_array – Byte array

  • folder_path (str) – Folder path

  • file_name (str) – File name

  • total (bool) – Whether to write to total folder

Module contents