imspy.timstof.dbsearch package¶
Submodules¶
imspy.timstof.dbsearch.imspy_dda module¶
- imspy.timstof.dbsearch.imspy_dda.create_database(fasta, static, variab, enzyme_builder, generate_decoys, bucket_size, shuffle_decoys=True, keep_ends=True)¶
- imspy.timstof.dbsearch.imspy_dda.load_config(config_path)¶
- imspy.timstof.dbsearch.imspy_dda.main()¶
imspy.timstof.dbsearch.imspy_rescore_sage module¶
- imspy.timstof.dbsearch.imspy_rescore_sage.main()¶
imspy.timstof.dbsearch.sage_output_utility module¶
- class imspy.timstof.dbsearch.sage_output_utility.PatternReplacer(replacements, pattern='\\[.*?\\]')¶
Bases: object
- apply(string)¶
- Return type:
str
- imspy.timstof.dbsearch.sage_output_utility.break_into_equal_size_sets(sequence_set, k=10)¶
Breaks a set of objects into k sets of equal size at random.
- Parameters:
sequence_set – Set of sequences to be divided
k (int) – Number of sets to divide the objects into
- Returns:
A list containing k sets, each with an equal number of randomly chosen sequences
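The random partition described above can be sketched without imspy. This is only an illustration: the helper name `partition_randomly` and its `seed` argument are hypothetical, and since the entry does not say how a remainder is handled when the set size is not divisible by k, the sketch lets set sizes differ by at most one.

```python
import random


def partition_randomly(items, k=10, seed=None):
    """Shuffle items and deal them round-robin into k sets (illustrative only)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # The slice i::k takes every k-th element starting at index i,
    # so the resulting sets are disjoint and differ in size by at most one.
    return [set(shuffled[i::k]) for i in range(k)]


sequences = {"PEPTIDEK", "ACDEFGHK", "LMNPQR", "STVWYK", "AAAK", "GGGR"}
parts = partition_randomly(sequences, k=3, seed=0)
print([len(p) for p in parts])
```

Dealing round-robin after a shuffle keeps the sets disjoint, which is what a cross-validation style split needs.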
- imspy.timstof.dbsearch.sage_output_utility.cosim_from_dict(observed, predicted)¶
- imspy.timstof.dbsearch.sage_output_utility.fragments_to_dict(fragments)¶
- imspy.timstof.dbsearch.sage_output_utility.generate_training_data(psms, method='psm', q_max=0.01, balance=True)¶
Generate training data.
- Parameters:
psms (DataFrame) – PSMs (peptide-spectrum matches) to generate training data from
method (str) – Method to use for training data generation
q_max (float) – Maximum q-value allowed for positive examples
balance (bool) – Whether to balance the dataset
- Returns:
X_train and Y_train
- Return type:
Tuple[NDArray, NDArray]
- imspy.timstof.dbsearch.sage_output_utility.plot_summary(TARGET, DECOY, save_path, dpi=300, file_format='png')¶
- imspy.timstof.dbsearch.sage_output_utility.re_score_psms(psms, num_splits=10, verbose=True, balance=True, score='hyperscore', positive_example_q_max=0.01)¶
Re-score PSMs using linear discriminant analysis (LDA).
- Parameters:
psms (DataFrame) – PSMs (peptide-spectrum matches) to re-score
num_splits (int) – Number of splits
verbose (bool) – Whether to print progress
balance (bool) – Whether to balance the dataset
score (str) – Score to use for re-scoring
positive_example_q_max (float) – Maximum q-value allowed for positive examples
- Returns:
List of PeptideSpectrumMatch objects
- Return type:
List[PeptideSpectrumMatch]
- imspy.timstof.dbsearch.sage_output_utility.remove_substrings(input_string)¶
- Return type:
str
- imspy.timstof.dbsearch.sage_output_utility.row_to_fragment(r)¶
- imspy.timstof.dbsearch.sage_output_utility.split_dataframe_randomly(df, n)¶
- Return type:
list
imspy.timstof.dbsearch.utility module¶
- imspy.timstof.dbsearch.utility.check_memory(limit_in_gb=16, msg='⚠️ Warning: System has only {total_ram_gb:.2f}GB of RAM, which is below the recommended {limit_in_gb}GB.')¶
- imspy.timstof.dbsearch.utility.extract_timstof_dda_data(path, in_memory=False, use_bruker_sdk=False, isolation_window_lower=-3.0, isolation_window_upper=3.0, take_top_n=100, num_threads=16)¶
Extract timsTOF DDA data from a Bruker timsTOF TDF file.
- Parameters:
path (str) – Path to the timsTOF DDA data
in_memory (bool) – Whether to load data in memory
use_bruker_sdk (bool) – Whether to use the Bruker SDK for data extraction
isolation_window_lower (float) – Lower bound for isolation window (Da)
isolation_window_upper (float) – Upper bound for isolation window (Da)
take_top_n (int) – Number of top peaks to take
num_threads (int) – Number of threads to use
- Returns:
DataFrame containing timsTOF DDA data
- Return type:
pd.DataFrame
- imspy.timstof.dbsearch.utility.generate_balanced_im_dataset(psms)¶
- Return type:
List[Psm]
- imspy.timstof.dbsearch.utility.generate_balanced_rt_dataset(psms)¶
- Return type:
List[Psm]
- imspy.timstof.dbsearch.utility.generate_training_data(psms, method='psm', q_max=0.01, balance=True)¶
Generate training data.
- Parameters:
psms (List[Psm]) – List of PeptideSpectrumMatch objects
method (str) – Method to use for training data generation
q_max (float) – Maximum q-value allowed for positive examples
balance (bool) – Whether to balance the dataset
- Returns:
X_train and Y_train
- Return type:
Tuple[NDArray, NDArray]
- imspy.timstof.dbsearch.utility.get_searchable_spec(precursor, raw_fragment_data, spec_processor, time, spec_id, file_id=0, ms_level=2)¶
Get SAGE searchable spectrum from raw data.
- Parameters:
precursor (Precursor) – Precursor object
raw_fragment_data (TimsFrame) – TimsFrame object
spec_processor (SpectrumProcessor) – SpectrumProcessor object
time (float) – Time value as float
spec_id (str) – Spectrum ID
file_id (int) – File ID
ms_level (int) – MS level
- Returns:
ProcessedSpectrum object
- Return type:
ProcessedSpectrum
- imspy.timstof.dbsearch.utility.linear_map(value, old_min, old_max, new_min=0.0, new_max=60.0)¶
- imspy.timstof.dbsearch.utility.list_to_semicolon_string(value)¶
Converts a list of proteins into a semicolon-separated string.
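As a rough illustration of the conversion described above, the following stand-in joins identifiers with semicolons. The name `join_proteins` is hypothetical and this is not the imspy implementation; how the library treats non-list input is not documented here.

```python
def join_proteins(proteins):
    """Join protein identifiers with ';' (illustrative stand-in, not the imspy code)."""
    # str() guards against non-string identifiers in the input list.
    return ";".join(str(p) for p in proteins)


joined = join_proteins(["P12345", "Q67890"])
print(joined)  # P12345;Q67890
```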
- imspy.timstof.dbsearch.utility.map_to_domain(data, gradient_length=120.0)¶
Maps the input data linearly into the domain [0, gradient_length].
- Parameters:
data – List or NumPy array of numerical values
gradient_length (float) – Upper limit of the target domain [0, gradient_length]
- Returns:
List of values mapped into the domain [0, gradient_length]
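One plausible reading of the linear mapping described above is a min-max rescale, where the smallest input lands on 0 and the largest on the upper limit. The sketch below assumes exactly that; the function name `map_to_domain_sketch` and the handling of constant input are assumptions, not the library's documented behavior.

```python
def map_to_domain_sketch(data, gradient_length=120.0):
    """Min-max rescale of data into [0, gradient_length] (illustrative only)."""
    lo, hi = min(data), max(data)
    if hi == lo:
        # Assumption: constant input has no spread to rescale, so map it to 0.
        return [0.0 for _ in data]
    scale = gradient_length / (hi - lo)
    return [(x - lo) * scale for x in data]


mapped = map_to_domain_sketch([10.0, 20.0, 30.0], gradient_length=120.0)
print(mapped)  # [0.0, 60.0, 120.0]
```

In formula form this is x' = (x - min) / (max - min) * gradient_length.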
- imspy.timstof.dbsearch.utility.merge_dicts_with_merge_dict(dicts)¶
- imspy.timstof.dbsearch.utility.parse_string_list(input_str)¶
Takes a string representation of a list and converts it into an actual list of strings.
- Parameters:
input_str (str) – A string containing a list representation.
- Returns:
A list of strings parsed from the input string.
- Return type:
list
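A minimal sketch of this kind of parsing, assuming the input uses Python list-literal syntax such as "['A', 'B']". The name `parse_string_list_sketch` is hypothetical, and the library may accept other formats or parse differently.

```python
import ast


def parse_string_list_sketch(input_str):
    """Parse a string such as "['A', 'B']" into a list of strings (illustrative only)."""
    # ast.literal_eval evaluates literals only, so it is safe on untrusted text.
    parsed = ast.literal_eval(input_str)
    if not isinstance(parsed, (list, tuple)):
        raise ValueError("input does not represent a list")
    return [str(item) for item in parsed]


proteins = parse_string_list_sketch("['sp|P12345|ABC_HUMAN', 'sp|Q67890|DEF_HUMAN']")
print(proteins)
```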
- imspy.timstof.dbsearch.utility.parse_to_tims2rescore(TDC, from_mgf=False, file_name=None)¶
- imspy.timstof.dbsearch.utility.peptide_length(peptide)¶
Takes a peptide sequence as a string and returns its length, excluding [UNIMOD:X] modifications.
- Parameters:
peptide (str) – A peptide sequence with possible UNIMOD modifications.
- Returns:
The length of the peptide without modifications.
- Return type:
int
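The length calculation described above can be sketched with a regular expression that strips the modification tags before counting residues. This assumes X in [UNIMOD:X] is a numeric ID; the name `peptide_length_sketch` is hypothetical and the code is not taken from imspy.

```python
import re

# Assumption: modification tags look like [UNIMOD:<digits>].
UNIMOD_TAG = re.compile(r"\[UNIMOD:\d+\]")


def peptide_length_sketch(peptide):
    """Count residues after removing [UNIMOD:X] tags (illustrative only)."""
    return len(UNIMOD_TAG.sub("", peptide))


print(peptide_length_sketch("PEPT[UNIMOD:21]IDEK"))  # 8
```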
- imspy.timstof.dbsearch.utility.sanitize_charge(charge)¶
Sanitize charge value.
- Parameters:
charge (Optional[float]) – Charge value as float.
- Returns:
Charge value as int.
- Return type:
int
- imspy.timstof.dbsearch.utility.sanitize_mz(mz, mz_highest)¶
Sanitize mz value.
- Parameters:
mz (Optional[float]) – Mz value as float.
mz_highest (float) – Highest mz value.
- Returns:
Mz value as float.
- Return type:
float
- imspy.timstof.dbsearch.utility.split_fasta(fasta, num_splits=16, randomize=True)¶
Split a FASTA file into multiple FASTA files.
- Parameters:
fasta (str) – FASTA file as string.
num_splits (int) – Number of files the FASTA file should be split into.
randomize (bool) – Whether to randomize the order of sequences before splitting.
- Returns:
List of FASTA files as strings; contains num_splits FASTA files with an equal number of sequences.
- Return type:
List[str]
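The splitting described above can be sketched as follows, assuming every record starts at a line beginning with '>'. The function name `split_fasta_sketch` and the `seed` argument are hypothetical, and when the record count is not divisible by num_splits this sketch lets chunk sizes differ by one, which may not match the library.

```python
import random
import re


def split_fasta_sketch(fasta, num_splits=16, randomize=True, seed=None):
    """Split FASTA text into num_splits FASTA strings (illustrative only)."""
    # Split at line starts beginning with '>' and drop the empty piece
    # that precedes the first header.
    bodies = re.split(r"^>", fasta, flags=re.MULTILINE)
    records = [">" + body.strip() + "\n" for body in bodies if body.strip()]
    if randomize:
        random.Random(seed).shuffle(records)
    # Deal records round-robin so the chunks stay balanced.
    return ["".join(records[i::num_splits]) for i in range(num_splits)]


fasta_text = ">prot1\nMKTAYIAK\n>prot2\nGAVLIPFW\n>prot3\nSTCMNQ\n>prot4\nDEKRH\n"
chunks = split_fasta_sketch(fasta_text, num_splits=2, randomize=False)
print(chunks[0])
```

With `randomize=False` the first chunk holds prot1 and prot3 and the second holds prot2 and prot4.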
- imspy.timstof.dbsearch.utility.split_psms(psms, num_splits=10)¶
Split PSMs into multiple splits.
- Parameters:
psms (List[Psm]) – List of PeptideSpectrumMatch objects
num_splits (int) – Number of splits
- Returns:
List of splits
- Return type:
List[List[PeptideSpectrumMatch]]
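A minimal sketch of splitting a list into a fixed number of sublists, using plain integers in place of PSM objects. The name `split_into_chunks` is hypothetical, and whether the library splits contiguously or at random is not stated here; this version keeps the original order.

```python
def split_into_chunks(items, num_splits=10):
    """Split a list into num_splits contiguous sublists (illustrative only)."""
    base, extra = divmod(len(items), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


splits = split_into_chunks(list(range(7)), num_splits=3)
print(splits)  # [[0, 1, 2], [3, 4], [5, 6]]
```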
- imspy.timstof.dbsearch.utility.transform_psm_to_pin(psm_df)¶
- imspy.timstof.dbsearch.utility.write_psms_binary(byte_array, folder_path, file_name, total=False)¶
Write PSMs to binary file.
- Parameters:
byte_array – Byte array
folder_path (str) – Folder path
file_name (str) – File name
total (bool) – Whether to write to total folder