CN112185460B - Heterogeneous data independent proteomics mass spectrometry analysis system and method - Google Patents

Heterogeneous data independent proteomics mass spectrometry analysis system and method

Publication number
CN112185460B
Authority
CN
China
Prior art keywords
data
independent
target detection
pseudo
peptide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011005330.1A
Other languages
Chinese (zh)
Other versions
CN112185460A (en
Inventor
钟传奇
陈希
韩强强
尚骏
黄邵鑫
刘宜子
杜博贾
杨勇
周欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spectral Double Combined Wuhan Life Technology Co ltd
Original Assignee
Spectral Double Combined Wuhan Life Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spectral Double Combined Wuhan Life Technology Co ltd
Priority to CN202011005330.1A
Publication of CN112185460A
Application granted
Publication of CN112185460B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Abstract

The invention discloses a heterogeneous data independent proteomics mass spectrometry analysis system and method. The system comprises a local client and a cloud high-performance server. The method comprises the following steps: (1) the local client reads local heterogeneous data independent proteomics mass spectrum data and calls the cloud high-performance server to obtain a data interpreter; (2) after completing data interpretation, spectrogram extraction and pseudo-peptide fragment generation locally, the local client submits the peptide fragment spectrogram data, the pseudo-peptide fragments and the target detection peptide fragments to the high-performance server; (3) the high-performance server performs data analysis on the peptide fragment spectrogram data, the pseudo-peptide fragments and the target detection peptide fragments provided by the local client, and returns the calculated proteomics analysis result to the local client. The invention thereby keeps the original data independent proteomics mass spectrum data private while still providing strong computing capability.

Description

Heterogeneous data independent proteomics mass spectrometry analysis system and method
Technical Field
The invention belongs to the field of proteomics, and particularly relates to a heterogeneous library file data independent proteomics mass spectrometry analysis system and method.
Background
Traditional proteomics employs a Data Dependent Acquisition (DDA) strategy: protein samples are digested into peptide fragments, ionized, and analyzed by mass spectrometry. In the full-scan mass spectrum, peptide signals above the noise level are selected for fragmentation to produce tandem (MS/MS) mass spectra that can be matched against the spectra in a database. Although this method is powerful, its selection of peptides for fragmentation is stochastic and always biased towards the peaks with the strongest signals. Quantification of low-abundance peptide fragments therefore remains a challenge.
Selected Reaction Monitoring (SRM), a targeted analysis technique developed subsequently, allows mass spectrometers to detect specific peptide fragments with high sensitivity and high quantitative accuracy.
The proteomics research community now focuses on Data Independent Acquisition (DIA), which in theory combines the advantages of DDA and SRM. In DIA analysis, all peptide fragments within a given mass-to-charge ratio (m/z) window are fragmented; the analysis is repeated, window by window, until the mass spectrometer has covered the entire m/z range. This enables accurate peptide quantification without restricting the analysis to predefined peptide fragments.
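The window-by-window cycle described above can be sketched as follows. This is only an illustration: the 400 to 1200 m/z range and the 25 m/z window width are assumed values, and the function names are hypothetical; none of them come from the patent.

```python
def dia_windows(mz_start, mz_end, width):
    """Return consecutive (low, high) isolation windows covering [mz_start, mz_end]."""
    if width <= 0 or mz_end <= mz_start:
        raise ValueError("need width > 0 and mz_end > mz_start")
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)  # the last window is clipped to the range
        windows.append((low, high))
        low = high
    return windows


def window_for(mz, windows):
    """Return the index of the window that fragments a precursor at `mz`, or None."""
    for i, (low, high) in enumerate(windows):
        if low <= mz < high:
            return i
    return None


# Assumed acquisition scheme: 400-1200 m/z in 25 m/z steps (illustrative only).
windows = dia_windows(400.0, 1200.0, 25.0)
print(len(windows), window_for(612.3, windows))
```

Every precursor inside the covered range falls into exactly one window, which is why no peptide has to be selected in advance.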
Because of the extremely large data volume, the analysis of data independent proteomics mass spectrum data has to rely on bioinformatics algorithms for regression fitting. However, as detection methods continue to improve, the form and format of such data are continuously updated, and existing analysis systems cannot be extended to remain compatible with the various data independent proteomics mass spectrum data formats. Meanwhile, a centralized cloud analysis system risks leaking the original detection data, which hinders commercial adoption. A new generation of data independent proteomics mass spectrometry analysis system therefore needs to be developed.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a heterogeneous data independent proteomics mass spectrometry analysis system and method. By combining a local service with a cloud service, it aims to support complex data independent proteomics mass spectrometry data formats in an extensible and compatible way while, on the premise of protecting data privacy, shortening data analysis time and reducing the local computing capability required. It thereby solves the technical problems of prior-art analysis software, which cannot be extended to remain compatible with continuously updated data formats, demands high local computing power, takes a long time to analyze, carries a risk of data privacy disclosure, and is therefore hard to commercialize.
In order to achieve the above object, according to an aspect of the present invention, there is provided a heterogeneous data independent proteomics mass spectrometry system, including a local client and a cloud high performance server;
the local client is used for acquiring data independent original data and target detection data, interpreting the data independent original data into data independent proteomic mass spectrum data in a standard format and interpreting the target detection data into library files in the standard format according to a data interpreter called from a cloud high-performance server, generating peptide fragment spectrogram data, pseudo peptide fragments and target detection peptide fragments according to the data independent proteomic mass spectrum data in the standard format and the library files in the standard format, and submitting the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments to the high-performance server;
and the cloud high-performance server is used for performing data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performing retention time regularization and regression calculation on the data analysis results, obtaining the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returning the proteomics analysis result to the local client.
Preferably, the heterogeneous data independent proteomics mass spectrometry system, wherein the local client is further configured to obtain and display proteomics analysis results from a cloud high-performance server.
Preferably, the heterogeneous data-independent proteomics mass spectrometry system, wherein the local client comprises: a data interpreter, a spectrogram extractor and a pseudopeptide segment generator;
the data interpreter is called from a high-performance server and used for reading the data-independent original data and the target detection data, identifying the data-independent original data and the target detection data of the currently supported types, respectively converting the data-independent original data and the target detection data into data-independent proteomics mass spectrum data in a standard format and library files in a standard format, submitting the data-independent proteomics mass spectrum data in the standard format and the library files in the standard format to a spectrogram extractor, and submitting the library files in the standard format to a pseudo-peptide fragment generator;
the spectrogram extractor is used for merging the data independent proteomics mass spectrum data in the standard format according to the library file in the standard format to obtain peptide fragment spectrogram data and submitting the peptide fragment spectrogram data to the cloud high-performance server;
and the pseudo peptide segment generator is used for applying a generating operation to the library file in the standard format to obtain a pseudo peptide segment, and submitting the pseudo peptide segment and the target detection peptide segment to the cloud high-performance server.
Preferably, the heterogeneous data independent proteomics mass spectrometry system, wherein the merging process comprises cyclic scanning, convolution merging and noise reduction; the convolution merging is a Tophat convolution operation or a Bartlett convolution operation.
Preferably, the heterogeneous data independent proteomic mass spectrometry system, wherein the generating operation is an operation of maintaining the peptide fragment composition unchanged and changing the amino acid sequence.
Preferably, the heterogeneous data independent proteomics mass spectrometry system, the cloud high performance server thereof, comprises a data analyzer, a regularizer, and a quality controller;
the data analyzer is used for scoring based on chromatogram, mass spectrum and/or ion mobility according to the peptide fragment spectrogram data, the pseudo-peptide fragment and the target detection peptide fragment data, predicting signal values of the target detection peptide fragment and the pseudo-peptide fragment according to a scoring result and supplying the signal values to the regularizer;
the regularizer is used for carrying out retention time regularization and regression algorithm according to the signal values of the target detection peptide segment and the pseudo peptide segment to obtain the sub-ion series strength of the target detection peptide segment and the pseudo peptide segment, and submitting the sub-ion series strength to the quality controller;
and the quality controller is used for calculating the false positive rate of the peptide fragment according to the sub-ion series strength of the target detection peptide fragment and the pseudo peptide fragment and returning the false positive rate to the local client.
According to another aspect of the invention, a heterogeneous data independent proteomics mass spectrometry method is provided, which is characterized by applying the heterogeneous data independent proteomics mass spectrometry system provided by the invention.
Preferably, the heterogeneous data independent proteomics mass spectrometry method comprises the following steps:
(1) the local client reads local heterogeneous data independent proteomics mass spectrometry data, and calls a cloud high-performance server to obtain a data interpreter;
(2) after the local client locally finishes data interpretation, spectrogram extraction and pseudo peptide segment generation, the local client submits peptide segment spectrogram data, pseudo peptide segments and target detection peptide segments to a high-performance server;
(3) the high-performance server performs data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performs retention time regularization and regression calculation on the data analysis results, obtains the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returns the proteomics analysis result to the local client.
Preferably, in the heterogeneous data independent proteomic mass spectrometry method, when a high-throughput data set is processed, steps (1) to (2) and step (3) are performed in a distributed or integrated manner.
Preferably, in the heterogeneous data independent proteomics mass spectrometry analysis method, the distributed manner is: while the high-performance server analyzes the current data independent proteomics mass spectrum data, the local client simultaneously processes the next batch of data independent proteomics mass spectrum data;
the integrated manner is: the system has a plurality of high-performance servers and one or more local clients, and tasks are scheduled across the high-performance servers so as to achieve the shortest total processing time or the shortest processing time for specific data independent proteomics mass spectrum data.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
The method and the system use the local client for local preprocessing of heterogeneous data independent proteomics mass spectrometry data and the cloud high-performance server for cloud analysis, and thereby balance the privacy of the original data independent proteomics mass spectrometry data against the need for strong computing capacity. Running the whole analysis locally would incur a large time cost and high computing performance requirements; having the cloud high-performance server process the original data entirely would bring a risk of leaking the original detection data and a large data transmission bandwidth requirement. The invention avoids both.
Meanwhile, the cloud continuously updates the data interpreter, so that data independent proteomics data and library files in different formats and from different detection methods can be supported in an extensible manner.
In the preferred scheme of the method, distributed or integrated task scheduling uses the computing power of both the local client and the cloud high-performance server to shorten the computing time further. This is particularly suitable for high-throughput operation: in the ideal case, the processing speed for high-throughput data is nearly doubled.
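The ideal-case gain stated above can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that every batch takes a constant time t_local on the client and t_server on the server, and that transfer time is negligible; these parameters and function names are hypothetical and not taken from the patent.

```python
def sequential_time(n_batches, t_local, t_server):
    """Total time when each batch is fully finished before the next one starts."""
    return n_batches * (t_local + t_server)


def pipelined_time(n_batches, t_local, t_server):
    """Total time when the client preprocesses batch i+1 while the server analyzes batch i."""
    if n_batches == 0:
        return 0.0
    # The first batch pays both stages; every later batch costs only the slower stage.
    return t_local + t_server + (n_batches - 1) * max(t_local, t_server)


def speedup(n_batches, t_local, t_server):
    """Ratio of sequential time to pipelined time."""
    return sequential_time(n_batches, t_local, t_server) / pipelined_time(
        n_batches, t_local, t_server
    )


# Assumed workload: 100 batches, 10 time units per stage on each side.
print(round(speedup(100, 10.0, 10.0), 3))
```

Under these assumptions the speedup approaches 2 only when the two stages take about the same time and there are many batches; with unequal stage times the gain is smaller.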
Drawings
FIG. 1 is a schematic diagram of the heterogeneous data independent proteomics mass spectrometry system structure provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The heterogeneous data independent proteomics mass spectrometry system provided by the invention comprises a local client and a cloud high-performance server, as shown in fig. 1;
the local client is used for acquiring data independent original data and target detection data, interpreting the data independent original data into data independent proteomic mass spectrum data in a standard format and interpreting the target detection data into library files in the standard format according to a data interpreter called from a cloud high-performance server, generating peptide fragment spectrogram data, pseudo peptide fragments and target detection peptide fragments according to the data independent proteomic mass spectrum data in the standard format and the library files in the standard format, and submitting the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments to the high-performance server; the system is also used for obtaining and displaying a proteomics analysis result from the cloud high-performance server;
the local client includes: a data interpreter, a spectrogram extractor and a pseudopeptide segment generator;
the data interpreter is called from a high-performance server and used for reading the data-independent original data and the target detection data, identifying the data-independent original data and the target detection data of the currently supported types, respectively converting the data-independent original data and the target detection data into data-independent proteomics mass spectrum data in a standard format and library files in a standard format, submitting the data-independent proteomics mass spectrum data in the standard format and the library files in the standard format to a spectrogram extractor, and submitting the library files in the standard format to a pseudo-peptide fragment generator;
the spectrogram extractor is used for merging the data independent proteomics mass spectrum data in the standard format according to the library file in the standard format to obtain peptide fragment spectrogram data and submitting the peptide fragment spectrogram data to the cloud high-performance server; the merging processing comprises cyclic scanning, convolution merging and noise reduction; the convolution merging is preferably a Tophat convolution operation or a Bartlett convolution operation.
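The two convolution options can be illustrated with a minimal sketch. It assumes that "Tophat" means a uniform (boxcar) kernel and "Bartlett" a triangular kernel, both normalized to sum to 1 and applied to a one-dimensional intensity trace; the kernel widths and function names are hypothetical, and the patent does not specify the actual implementation.

```python
def tophat_kernel(width):
    """Uniform (boxcar) weights that sum to 1; `width` is expected to be odd."""
    return [1.0 / width] * width


def bartlett_kernel(width):
    """Triangular (Bartlett-style) weights that sum to 1; `width` is expected to be odd."""
    half = width // 2
    raw = [half + 1 - abs(i - half) for i in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(signal, kernel):
    """Convolve `signal` with a symmetric kernel; samples outside the trace count as 0."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


# A single spike is spread into the shape of the kernel itself.
spike = [0.0, 0.0, 9.0, 0.0, 0.0]
print(smooth(spike, bartlett_kernel(5)))
print(smooth(spike, tophat_kernel(3)))
```

The triangular kernel weights the centre sample most heavily, so it preserves peak apex positions better than the uniform kernel at the same width.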
The pseudo-peptide fragment generator is used for applying a generating operation to the library file in the standard format to obtain a pseudo-peptide fragment, and submitting the pseudo-peptide fragment and the target detection peptide fragment to the cloud high-performance server; the generating operation is any operation that keeps the peptide fragment composition unchanged while changing the amino acid sequence, including random scrambling, inversion, pseudo-inversion, translation, etc.
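The generating operations listed above can be sketched on plain amino acid strings. This is an assumption-laden illustration: "pseudo-inversion" is taken to mean reversing everything except the last residue, "translation" is taken to mean a cyclic shift, and the example sequence and function names are hypothetical.

```python
import random


def reverse(seq):
    """Inversion: read the sequence backwards."""
    return seq[::-1]


def pseudo_reverse(seq):
    """Pseudo-inversion (assumed meaning): reverse all residues except the last one."""
    return seq[:-1][::-1] + seq[-1:]


def shift(seq, offset=1):
    """Translation (assumed meaning): rotate the sequence cyclically by `offset`."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def shuffle(seq, seed=0):
    """Random scrambling with a fixed seed so the result is reproducible."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


target = "PEPTIDEK"  # illustrative target sequence
decoys = [reverse(target), pseudo_reverse(target), shift(target), shuffle(target)]
for decoy in decoys:
    # Every operation keeps the amino acid composition and only changes the order.
    assert sorted(decoy) == sorted(target)
print(decoys[:3])
```

Because the composition is unchanged, each pseudo-peptide fragment has the same precursor mass as its target, which is what makes it a fair null model for scoring.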
The cloud high-performance server is used for performing data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performing retention time regularization and regression calculation on the data analysis results, obtaining the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returning the proteomics analysis result to the local client; it also includes an update module for storing and updating the data interpreter, the update module continually updating the data interpreter according to the types of current data independent proteomic mass spectrometry data.
The cloud high-performance server comprises a data analyzer, a regularizer and a quality controller;
the data analyzer is used for scoring based on chromatogram, mass spectrum and/or ion mobility according to the peptide fragment spectrogram data, the pseudo peptide fragment and the target detection peptide fragment data, predicting signal values of the target detection peptide fragment and the pseudo peptide fragment according to a scoring result and providing the signal values to the regularizer;
the regularizer is used for carrying out retention time regularization and regression algorithm according to the signal values of the target detection peptide segment and the pseudo peptide segment to obtain the sub-ion series strength of the target detection peptide segment and the pseudo peptide segment, and submitting the sub-ion series strength to the quality controller.
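Retention time regularization can be illustrated with an ordinary least-squares line that maps measured retention times onto a library scale. This is a minimal sketch assuming a linear relationship and a handful of reference peptides; the numbers and function names are hypothetical, and the patent does not state which regression model the regularizer actually uses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_rt(measured_rt, slope, intercept):
    """Map a measured retention time onto the library scale."""
    return slope * measured_rt + intercept


# Assumed reference peptides: measured retention time (s) vs. library value.
measured = [600.0, 1200.0, 1800.0, 2400.0]
library = [10.0, 30.0, 50.0, 70.0]
slope, intercept = fit_line(measured, library)
print(normalize_rt(1500.0, slope, intercept))
```

Once all runs are mapped onto the same scale, a peak group can be penalized by how far its normalized retention time lies from the library value.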
The quality controller is used for extracting the sub-ion series strength of the target detection peptide fragment, calculating the false positive rate of the peptide fragment according to the sub-ion series strength of the target detection peptide fragment and the pseudo peptide fragment, and returning the false positive rate to the local client.
The method for performing heterogeneous data independent proteomics mass spectrometry by using the heterogeneous data independent proteomics mass spectrometry system provided by the invention comprises the following steps:
(1) the local client reads local heterogeneous data independent proteomics mass spectrum data, and calls a cloud high-performance server to obtain a data interpreter;
(2) after the local client locally finishes data interpretation, spectrogram extraction and pseudo peptide segment generation, the local client submits peptide segment spectrogram data, pseudo peptide segments and target detection peptide segments to a high-performance server;
(3) the high-performance server performs data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performs retention time regularization and regression calculation on the data analysis results, obtains the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returns the proteomics analysis result to the local client.
When a high-throughput data set is processed, steps (1) to (2) and step (3) are performed in a distributed or integrated manner;
the distributed manner is: while the high-performance server analyzes the current data independent proteomics mass spectrum data, the local client simultaneously processes the next batch of data independent proteomics mass spectrum data;
the integrated manner is: the system has a plurality of high-performance servers and one or more local clients, and tasks are scheduled across the high-performance servers so as to achieve the shortest total processing time or the shortest processing time for specific data independent proteomics mass spectrum data.
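Task scheduling across several servers can be sketched with a simple greedy heuristic. The example below uses longest-processing-time-first assignment, which is a swapped-in textbook heuristic rather than the scheduler of the patent; the job names and estimated durations are hypothetical, and only the "shortest total processing time" goal is addressed.

```python
import heapq


def schedule(job_times, n_servers):
    """Greedy longest-processing-time-first assignment.

    Returns (makespan, assignment), where assignment maps server index to job names.
    """
    if n_servers < 1:
        raise ValueError("need at least one server")
    heap = [(0.0, s) for s in range(n_servers)]  # (current load, server index)
    heapq.heapify(heap)
    assignment = {s: [] for s in range(n_servers)}
    # Longest jobs first; ties are broken by name so the result is deterministic.
    for job, duration in sorted(job_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, server = heapq.heappop(heap)  # least-loaded server so far
        assignment[server].append(job)
        heapq.heappush(heap, (load + duration, server))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


# Assumed analysis jobs with estimated durations in minutes.
jobs = {"run_a": 8, "run_b": 7, "run_c": 6, "run_d": 5, "run_e": 4}
makespan, assignment = schedule(jobs, n_servers=2)
print(makespan, assignment)
```

The heuristic is fast but not always optimal: for this input it finishes after 17 minutes, whereas splitting the jobs as 8 + 7 and 6 + 5 + 4 would finish after 15.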
The following are examples:
the heterogeneous data independent proteomics mass spectrometry system provided by the invention comprises a local client and a cloud high-performance server, as shown in figure 1;
the local client is used for acquiring data independent original data and target detection data, interpreting the data independent original data into data independent proteomic mass spectrum data in a standard format and interpreting the target detection data into a library file in the standard format according to a data interpreter called from a cloud high-performance server, generating peptide fragment spectrogram data, a pseudo peptide fragment and a target detection peptide fragment according to the data independent proteomic mass spectrum data in the standard format and the library file in the standard format, and submitting the peptide fragment spectrogram data, the pseudo peptide fragment and the target detection peptide fragment to the high-performance server; the local client is also used for obtaining and displaying proteomics analysis results from the cloud high-performance server;
the local client includes: a data interpreter, a spectrogram extractor and a pseudopeptide segment generator;
the data interpreter is called from a high-performance server and used for reading the data-independent original data and the target detection data, identifying the data-independent original data and the target detection data of the currently supported types, respectively converting the data-independent original data and the target detection data into data-independent proteomics mass spectrum data in a standard format and library files in a standard format, submitting the data-independent proteomics mass spectrum data in the standard format and the library files in the standard format to a spectrogram extractor, and submitting the library files in the standard format to a pseudo-peptide fragment generator;
the data-independent original data formats supported by the current data interpreter include: raw, etc.; the library file formats supported by the current data interpreter are: sptxt, blib, and csv. The standard format of data independent proteomics mass spectrometry data is mzML; the standard format of the library file is TraML.
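The first step of the data interpreter, recognizing which supported type an input file belongs to, can be sketched as a lookup on the file extension. This is only an illustration: the extension sets mirror the formats named above, the function and label names are hypothetical, and the actual conversion into the two standard formats is out of scope here.

```python
from pathlib import Path

# Extensions taken from the formats named in the text; a real interpreter may support more.
RAW_FORMATS = {".raw"}
LIBRARY_FORMATS = {".sptxt", ".blib", ".csv"}


def classify(path):
    """Return 'raw_data' or 'library' for a supported input file, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix in RAW_FORMATS:
        return "raw_data"
    if suffix in LIBRARY_FORMATS:
        return "library"
    raise ValueError(f"unsupported input format: {suffix or path!r}")


print(classify("run01.RAW"), classify("human_library.sptxt"))
```

Keeping the supported extensions in plain sets means that an updated interpreter fetched from the server only has to extend these sets and add the matching converter.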
The spectrogram extractor is similar to the chromatogram extractor of OpenSWATH (OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nature Biotechnology, 2014/3/10) and is used for merging the data-independent proteomics mass spectrum data in the standard format to obtain peptide fragment spectrogram data and submitting the data to the cloud high-performance server; the merging processing comprises cyclic scanning, convolution merging and noise reduction; the convolution merging is preferably a Tophat convolution operation or a Bartlett convolution operation.
The pseudo peptide segment generator is similar to the decoy generator of OpenSWATH and is used for applying a generating operation to the library file in the standard format to obtain a pseudo peptide segment, and submitting the pseudo peptide segment and the target detection peptide segment to the cloud high-performance server; the generating operation is any operation that keeps the peptide fragment composition unchanged while changing the amino acid sequence, including random scrambling, inversion, pseudo-inversion, and translation, among others.
The cloud high-performance server is used for performing data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performing retention time regularization and regression calculation on the data analysis results, obtaining the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returning the proteomics analysis result to the local client; it further comprises an update module for storing and updating the data interpreter, the update module continuously updating the data interpreter according to the types of current data independent proteomic mass spectrometry data.
The cloud high-performance server comprises a data analyzer, a regularizer and a quality controller;
the data Analyzer is similar to the Analyzer of OpenSWATH, and is used for scoring based on chromatogram, mass spectrum and/or ion mobility according to the peptide fragment spectrogram data, the pseudo peptide fragment and the target detection peptide fragment data, predicting the signal values of the target detection peptide fragment and the pseudo peptide fragment according to the scoring result, and providing the signal values to the regularizer;
the chromatogram-based scoring items include: cross-correlation (Cross-Correlation Score), intensity (Intensity Score), signal-to-noise ratio (Signal-to-Noise Score), EMG (Exponentially Modified Gaussian Score), relative intensity (Relative Intensity Score), and retention time (Retention Time Score); the mass spectrum-based scoring items include: isotope (Isotope Score), mass accuracy (Mass Accuracy Score), and ion series (Ion Series Score); the scoring items based on ion mobility include: ion mobility.
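As an illustration of one chromatogram-based item, the sketch below computes a normalized cross-correlation between two fragment ion traces and finds the lag at which they align best. It is a simplified stand-in, not the OpenSWATH score definition; the traces and function names are hypothetical.

```python
import math


def _standardize(trace):
    """Shift a trace to zero mean and scale it to unit (population) variance."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((v - mean) ** 2 for v in trace) / n
    if var == 0:
        raise ValueError("flat trace has no shape to correlate")
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in trace]


def cross_correlation(a, b, lag=0):
    """Normalized cross-correlation of equal-length traces, pairing a[i] with b[i + lag]."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    za, zb = _standardize(a), _standardize(b)
    n = len(za)
    total = 0.0
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            total += za[i] * zb[j]
    return total / n


def best_lag(a, b, max_lag):
    """Lag in [-max_lag, max_lag] with the highest cross-correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cross_correlation(a, b, lag))


# Two assumed fragment ion traces: same peak shape, the second eluting one scan later.
trace_a = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
trace_b = [0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0]
print(best_lag(trace_a, trace_b, 2))
```

Fragment ions from the same peptide should co-elute with the same peak shape, so a high correlation at zero lag supports the identification, while a non-zero best lag suggests interference.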
The regularizer is similar to an RTNormalizer of OpenSWATH and is used for carrying out retention time regularization and an LDA linear regression algorithm according to the signal values of the target detection peptide segment and the pseudo-peptide segment to obtain the sub-ion series strength of the target detection peptide segment and the pseudo-peptide segment and submitting the sub-ion series strength to a quality controller;
and the quality controller is used for calculating the false positive rate of the peptide fragment according to the sub-ion series strength of the target detection peptide fragment and the pseudo peptide fragment and returning the false positive rate to the local client.
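The quality control step can be illustrated with the common target-decoy estimate, in which the number of pseudo peptide fragments passing a score threshold approximates the number of false target hits. This is a sketch under that assumption; the scores are invented, the function names are hypothetical, and the patent does not disclose the exact formula.

```python
def estimated_false_positive_rate(target_scores, decoy_scores, threshold):
    """Decoys passing the threshold divided by targets passing it, capped at 1."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def threshold_for(target_scores, decoy_scores, max_rate):
    """Lowest target score usable as a threshold with estimated rate <= max_rate, or None."""
    for t in sorted(set(target_scores)):
        if estimated_false_positive_rate(target_scores, decoy_scores, t) <= max_rate:
            return t
    return None


# Invented scores for target detection peptide fragments and pseudo peptide fragments.
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 3.1, 2.5, 1.0]
decoys = [5.5, 3.0, 2.8, 1.2, 0.9, 0.4, 0.3, 0.1]
print(estimated_false_positive_rate(targets, decoys, 5.0))
print(threshold_for(targets, decoys, 0.1))
```

Only aggregate scores are needed for this step, which is consistent with the design in which the raw data never leaves the local client.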
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A heterogeneous data independent proteomics mass spectrometry analysis system is characterized by comprising a local client and a cloud high-performance server;
the local client is used for acquiring data independent original data and target detection data, interpreting the data independent original data into data independent proteomic mass spectrum data in a standard format and interpreting the target detection data into library files in the standard format according to a data interpreter called from a cloud high-performance server, generating peptide fragment spectrogram data, pseudo peptide fragments and target detection peptide fragments according to the data independent proteomic mass spectrum data in the standard format and the library files in the standard format, and submitting the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments to the high-performance server;
and the cloud high-performance server is used for performing data analysis on the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performing retention time regularization and regression calculation on the data analysis results, obtaining the sub-ion series strength and the false positive rate of the target detection peptide fragments as the proteomics analysis result, and returning the proteomics analysis result to the local client.
2. The heterogeneous data-independent proteomic mass spectrometry system of claim 1, wherein the local client is further configured to obtain and display proteomic analysis results from a cloud-based high performance server.
3. The heterogeneous data-independent proteomics mass spectrometry system of claim 1, wherein the local client comprises: a data interpreter, a spectrogram extractor and a pseudopeptide segment generator;
the data interpreter is called from a high-performance server and used for reading the data-independent original data and the target detection data, identifying the data-independent original data and the target detection data of the currently supported types, respectively converting the data-independent original data and the target detection data into data-independent proteomics mass spectrum data in a standard format and library files in a standard format, submitting the data-independent proteomics mass spectrum data in the standard format and the library files in the standard format to a spectrogram extractor, and submitting the library files in the standard format to a pseudo-peptide fragment generator;
the spectrogram extractor is used for merging the data-independent proteomics mass spectrum data in the standard format according to the library file in the standard format to obtain peptide fragment spectrogram data, and submitting the peptide fragment spectrogram data to the cloud high-performance server;
and the pseudo peptide fragment generator is used for performing a generating operation on the library file in the standard format to obtain pseudo peptide fragments, and for submitting the pseudo peptide fragments and the target detection peptide fragments to the cloud high-performance server.
4. The heterogeneous data independent proteomic mass spectrometry system of claim 3, wherein the merging process comprises cyclic scanning, convolution merging and noise reduction; the convolution merging is a Tophat convolution operation or a Bartlett convolution operation.
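By way of non-limiting illustration, the convolution merging and noise reduction of claim 4 can be sketched as follows. The sketch assumes the input is the intensity trace of a single fragment ion across consecutive scan cycles; the function names, the kernel normalisation, the zero padding at the edges and the fixed noise floor are all assumptions made for this example and are not specified by the claims.

```python
def tophat_kernel(width):
    """Flat (top-hat) weights of odd length `width`, normalised to sum to 1."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    return [1.0 / width] * width


def bartlett_kernel(width):
    """Triangular (Bartlett) weights of odd length `width`, normalised to sum to 1.

    Assumption: the end points carry non-zero weight, e.g. width 3 -> [0.25, 0.5, 0.25].
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    raw = [half + 1 - abs(i - half) for i in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def convolve_same(signal, kernel):
    """Convolve `signal` with a symmetric `kernel`, keeping the input length.

    Values outside the trace are treated as zero (an assumption of this sketch).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(signal):
                acc += w * signal[k]
        out.append(acc)
    return out


def denoise(signal, floor):
    """Zero out every point below a hypothetical noise floor."""
    return [x if x >= floor else 0.0 for x in signal]


# Hypothetical intensity trace of one fragment ion over five cycles.
trace = [0.0, 0.0, 4.0, 0.0, 0.0]
smoothed = convolve_same(trace, bartlett_kernel(3))
cleaned = denoise(smoothed, floor=1.5)
```

The two kernels differ only in their weights, so either can be passed to `convolve_same` without changing the rest of the pipeline.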
5. The heterogeneous data independent proteomic mass spectrometry system of claim 3, wherein the generating operation is an operation that maintains the peptide fragment composition and changes the amino acid sequence.
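By way of non-limiting illustration, one operation that keeps the composition of a peptide fragment while changing its amino acid sequence is a random shuffle of its residues. The sketch below assumes peptides are given as plain one-letter amino acid strings; the function name, the seed and the retry limit are hypothetical and are not taken from the claims.

```python
import random


def make_pseudo_peptide(sequence, seed=0, max_tries=100):
    """Return a shuffled copy of `sequence` with the same residues in a new order.

    Returns None when no different ordering is found within `max_tries`
    attempts, for example when every residue is identical.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    residues = list(sequence)
    for _ in range(max_tries):
        rng.shuffle(residues)
        candidate = "".join(residues)
        if candidate != sequence:
            return candidate
    return None


# Hypothetical target detection peptide fragment.
pseudo = make_pseudo_peptide("PEPTIDEK")
```

Because only the order of the residues changes, the pseudo peptide fragment has the same amino acid counts, and hence the same precursor mass, as the target detection peptide fragment from which it was derived.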
6. The heterogeneous data independent proteomics mass spectrometry system of claim 1, wherein the cloud high performance server comprises a data analyzer, a regularizer, and a quality controller;
the data analyzer is used for scoring the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments based on chromatogram, mass spectrum and/or ion mobility, predicting signal values of the target detection peptide fragments and the pseudo peptide fragments according to the scoring results, and supplying the signal values to the regularizer;
the regularizer is used for carrying out retention time regularization and regression calculation according to the signal values of the target detection peptide segment and the pseudo peptide segment to obtain the sub-ion series strength of the target detection peptide segment and the pseudo peptide segment, and submitting the sub-ion series strength to the quality controller;
and the quality controller is used for calculating the false positive rate of the peptide fragment according to the sub-ion series strength of the target detection peptide fragment and the pseudo peptide fragment and returning the false positive rate to the local client.
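By way of non-limiting illustration, the calculation performed by the quality controller of claim 6 can be sketched with the common target-decoy estimate, in which the pseudo peptide fragments that pass a score threshold stand in for the false matches among the target detection peptide fragments that pass the same threshold. The claims do not state the formula; the estimate, the function name and the use of a single per-peptide score derived from the sub-ion series strength are assumptions of this example.

```python
def false_positive_rate(target_scores, pseudo_scores, threshold):
    """Estimate the false positive rate among targets scoring >= threshold.

    Assumed estimate: (pseudo peptides passing) / (target peptides passing),
    capped at 1.0. Returns 0.0 when no target passes the threshold.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_pseudo = sum(1 for s in pseudo_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, n_pseudo / n_target)


# Hypothetical scores for five target and five pseudo peptide fragments.
targets = [9.0, 8.0, 7.0, 6.0, 2.0]
pseudos = [7.5, 3.0, 1.0, 0.5, 0.2]
rate = false_positive_rate(targets, pseudos, threshold=6.0)  # 1 pseudo / 4 targets
```

Raising the threshold generally lowers the estimated rate at the cost of reporting fewer target detection peptide fragments.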
7. A heterogeneous data independent proteomics mass spectrometry method, characterized in that the heterogeneous data independent proteomics mass spectrometry system of any one of claims 1 to 6 is applied.
8. The heterogeneous data independent proteomic mass spectrometry method of claim 7, comprising the steps of:
(1) the local client reads local heterogeneous data independent proteomics mass spectrum data, and calls a cloud high-performance server to obtain a data interpreter;
(2) after completing data interpretation, spectrogram extraction and pseudo peptide fragment generation locally, the local client submits the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments to the high-performance server;
(3) the high-performance server performs data analysis according to the peptide fragment spectrogram data, the pseudo peptide fragments and the target detection peptide fragments provided by the local client, performs retention time regularization and regression calculation on the data analysis results, obtains the sub-ion series strength and the false positive rate of the target detection peptide fragments as proteomics analysis results, and returns the proteomics analysis results to the local client.
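By way of non-limiting illustration, the retention time regularization and regression calculation of step (3) can be sketched as an ordinary least-squares line that maps library retention times onto the retention times observed in the current run. The claims do not specify the regression model; the linear model, the function names and the example values below are assumptions.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_retention_time(library_rt, slope, intercept):
    """Map a library retention time onto the time scale of the current run."""
    return slope * library_rt + intercept


# Hypothetical anchor peptides: library retention time vs. observed retention time.
library_rt = [10.0, 20.0, 30.0, 40.0]
observed_rt = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_line(library_rt, observed_rt)
expected = predict_retention_time(50.0, slope, intercept)
```

The fitted line gives the retention time at which a target detection peptide fragment or pseudo peptide fragment is expected to elute in the current run.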
9. The heterogeneous data independent proteomic mass spectrometry method of claim 8, wherein, when processing high-throughput data sets, steps (1) and (2) and step (3) are performed in a distributed manner or an integrated manner.
10. The heterogeneous data independent proteomic mass spectrometry method of claim 9, wherein the distributed manner is as follows: while the high-performance server performs data analysis on the current data independent proteomics mass spectrum data, the local client simultaneously processes the next batch of data independent proteomics mass spectrum data;
the integrated manner is as follows: the system is provided with a plurality of high-performance servers and one or more local clients, and tasks are scheduled across the high-performance servers so as to achieve the shortest total processing time or the shortest processing time for specific data independent proteomics mass spectrum data.
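By way of non-limiting illustration, the task scheduling of the integrated manner can be sketched with a greedy longest-job-first heuristic that assigns each analysis task to the server with the least accumulated work. The claims do not name a scheduling algorithm; the heuristic, the function name and the assumption that task durations are known in advance are all hypothetical, and the heuristic only approximates the shortest total processing time.

```python
import heapq


def schedule(job_durations, n_servers):
    """Assign jobs to servers, longest job first, always to the least-loaded server.

    `job_durations` maps a job name to its estimated duration.
    Returns (assignment, makespan), where assignment maps a server index to
    its list of jobs and makespan is the largest total load on any server.
    """
    if n_servers < 1:
        raise ValueError("need one or more servers")
    heap = [(0.0, s) for s in range(n_servers)]  # (current load, server index)
    heapq.heapify(heap)
    assignment = {s: [] for s in range(n_servers)}
    ordered = sorted(job_durations.items(), key=lambda kv: kv[1], reverse=True)
    for job, duration in ordered:
        load, server = heapq.heappop(heap)  # least-loaded server so far
        assignment[server].append(job)
        heapq.heappush(heap, (load + duration, server))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical analysis tasks with estimated durations in minutes.
jobs = {"run_a": 5, "run_b": 4, "run_c": 3, "run_d": 2}
assignment, makespan = schedule(jobs, n_servers=2)
```

Prioritising one specific data set instead, as the claim also allows, would amount to placing that task first in the ordering rather than sorting purely by duration.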
CN202011005330.1A 2020-09-23 2020-09-23 Heterogeneous data independent proteomics mass spectrometry analysis system and method Active CN112185460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011005330.1A CN112185460B (en) 2020-09-23 2020-09-23 Heterogeneous data independent proteomics mass spectrometry analysis system and method

Publications (2)

Publication Number Publication Date
CN112185460A CN112185460A (en) 2021-01-05
CN112185460B CN112185460B (en) 2022-07-08

Family

ID=73956788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011005330.1A Active CN112185460B (en) 2020-09-23 2020-09-23 Heterogeneous data independent proteomics mass spectrometry analysis system and method

Country Status (1)

Country Link
CN (1) CN112185460B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248680B (en) * 2023-05-11 2023-08-01 湖南工商大学 De novo peptide sequencing method, de novo peptide sequencing device and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810200A (en) * 2012-11-12 2014-05-21 中国科学院计算技术研究所 Database searching method and database searching system for open type protein identification
CN104871164A (en) * 2012-10-24 2015-08-26 考利达基因组股份有限公司 Genome explorer system to process and present nucleotide variations in genome sequence data
CN105956416A (en) * 2016-05-10 2016-09-21 湖北普罗金科技有限公司 Method for analyzing data of prokaryotic proteogenomics rapidly and automatically
CN106687965A (en) * 2013-11-13 2017-05-17 凡弗3基因组有限公司 Systems and methods for transmission and pre-processing of sequencing data
CN107103205A (en) * 2017-05-27 2017-08-29 湖北普罗金科技有限公司 A bioinformatics method for annotating eukaryotic genomes based on proteomic mass spectrometry data
CN109416926A (en) * 2016-04-11 2019-03-01 迪森德克斯公司 MASS SPECTRAL DATA ANALYSIS workflow
CN111060696A (en) * 2019-12-27 2020-04-24 湖南农业大学 Method for reducing false positive rate of plant small molecule signal peptide

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7158862B2 (en) * 2000-06-12 2007-01-02 The Arizona Board Of Regents On Behalf Of The University Of Arizona Method and system for mining mass spectral data
WO2013097059A1 (en) * 2011-12-31 2013-07-04 深圳华大基因研究院 Method for quantification of proteome
EP3446119A1 (en) * 2016-04-18 2019-02-27 The Broad Institute Inc. Improved hla epitope prediction

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Antonio d’Acierno. IsAProteinDB: an indexed database of trypsinized proteins for fast peptide mass fingerprinting. IEEE/ACM Transactions on Computational Biology and Bioinformatics. IEEE, 2016, vol. 14 (no. 5), full text. *
Harry Shaw, et al. Information theory and signal processing methodology to identify nucleic acid-protein binding sequences in RNA protein interactions. 2019 53rd Annual Conference on Information Sciences and Systems. IEEE, 2019, full text. *
John C. Hawkins, et al. Reduced False Positives in PDZ Binding Prediction Using Sequence and Structural Descriptors. IEEE/ACM Transactions on Computational Biology and Bioinformatics. IEEE, 2012, vol. 9 (no. 5), 1492-1503. *
Yosi Shibberu, et al. A Spectral Approach to Protein Structure Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics. IEEE, 2011, vol. 8 (no. 4), 867-875. *
Zhang Jian. Analysis and prediction of protein residue solvent accessibility and function based on intelligent computing. China Doctoral and Master's Dissertations Full-text Database (Doctoral), Basic Sciences. 2018, (no. 01), A006-47. *
Ma Jie. Research and application of quality control methods for proteome peptide identification. China Doctoral and Master's Dissertations Full-text Database (Doctoral), Basic Sciences. 2011, (no. 01), A006-5. *

Also Published As

Publication number Publication date
CN112185460A (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant