CN115206422A - Mass spectrum spectrogram analyzing method and device and intelligent terminal - Google Patents

Mass spectrum spectrogram analyzing method and device and intelligent terminal

Info

Publication number
CN115206422A
Authority
CN
China
Prior art keywords
spectrogram
spectrum
mass
target
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210784151.5A
Other languages
Chinese (zh)
Inventor
魏千洲
张弓
余卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Chi Biotech Co ltd
Original Assignee
Shenzhen Chi Biotech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Chi Biotech Co ltd
Priority to CN202210784151.5A
Publication of CN115206422A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/04 Recognition of patterns in DNA microarrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biochemistry (AREA)
  • Urology & Nephrology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pathology (AREA)
  • Hematology (AREA)
  • Public Health (AREA)
  • Food Science & Technology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Cell Biology (AREA)
  • Epidemiology (AREA)
  • Microbiology (AREA)
  • Bioethics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The application relates to the field of protein spectrum-resolving analysis, and in particular to a mass spectrum spectrogram analyzing method, a mass spectrum spectrogram analyzing device and an intelligent terminal. The mass spectrum spectrogram analyzing method comprises the following steps: acquiring a target spectrogram, wherein the target spectrogram reflects mass spectrum data of a protein peptide fragment; determining spectrogram characteristics of the target spectrogram based on the mass-to-charge ratios of the target spectrogram; and inputting the spectrogram characteristics of the target spectrogram into a preset spectrum resolving model to obtain the analysis category of the target spectrogram, wherein the analysis category reflects the peptide fragment sequence corresponding to the target spectrogram. Because the spectrogram characteristics are obtained from the mass-to-charge ratios of the target spectrogram and are classified by the spectrum resolving model, the method can effectively, accurately and quickly resolve the peptide fragment sequence of the protein corresponding to the target spectrogram.

Description

Mass spectrum spectrogram analyzing method and device and intelligent terminal
Technical Field
The application relates to the field of protein spectrum-resolving analysis, and in particular to a mass spectrum spectrogram analyzing method and device and an intelligent terminal.
Background
The human genome contains all the information about human development and evolution and reveals the basic principle by which genes encode proteins. In the post-genomic era, biological research is gradually shifting from genomics to proteomics. Proteins are involved in almost every aspect of cellular function, and the characterization of proteins has become an important component of modern biology, giving rise to a new discipline: proteomics. Mass spectrometry is currently the main technology for studying proteins and offers high precision and high speed. Advances in mass spectrometry have brought unprecedented speed, sensitivity and accuracy to protein identification and have facilitated research and development in proteomics.
At present, almost all widely applied proteome mass spectrometry technologies rely on computer software algorithms to analyze mass spectrograms, but they generally suffer from high computational cost and overly long analysis time.
Disclosure of Invention
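In order to shorten the analysis time of mass spectrograms, the application provides a mass spectrum spectrogram analyzing method, which has the characteristic of short analysis time.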
The above object of the present invention is achieved by the following technical solutions:
a method of mass spectrometry spectrum analysis comprising:
acquiring a target spectrogram, wherein the target spectrogram is used for reflecting mass spectrum data of the protein peptide fragment;
determining spectrogram characteristics of the target spectrogram based on the mass-to-charge ratio of the target spectrogram;
inputting the spectrogram characteristics of the target spectrogram into a preset spectrum resolving model to obtain the analysis category of the target spectrogram, wherein the analysis category is used for reflecting the peptide fragment sequence corresponding to the target spectrogram.
By adopting the technical scheme, the spectrogram characteristics are obtained based on the mass-to-charge ratios of the target spectrogram and are input into the spectrum resolving model, which outputs the analysis category, so that the peptide fragment sequence of the protein corresponding to the target spectrogram can be resolved effectively, accurately and quickly.
Optionally, the determining spectral features of the target spectrogram based on the mass-to-charge ratio of the target spectrogram includes:
screening all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain a characteristic number of characteristic mass-to-charge ratios;
and obtaining the spectrogram characteristics of the target spectrogram based on all the characteristic mass-to-charge ratios of the target spectrogram.
By adopting the technical scheme, because the target spectrograms corresponding to different protein peptide fragments differ considerably in their mass-to-charge ratios, obtaining the spectrogram characteristics from a specified number of mass-to-charge ratios makes the spectrogram characteristics more representative of the target spectrogram, which improves the accuracy of detection and analysis.
Optionally, the output of the spectrum solution model includes a feature matrix obtained by compressing the spectrogram features, and the analysis category corresponding to the feature matrix; the length of the feature matrix is 7.
By adopting the technical scheme, the feature loss after compression of the spectrogram features is reduced, and the utilization rate of the protein spectrogram is improved.
Optionally, the spectrum resolving model includes 4 average value down-sampling layers for compressing the spectrogram features to obtain the feature matrix;
the screening all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain a characteristic number of characteristic mass-to-charge ratios comprises the following step:
and screening all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain 112 characteristic mass-to-charge ratios.
By adopting the technical scheme, the spectrum solution model has a wider application range to the target spectrogram, and the utilization rate of mass spectrum data is improved.
Optionally, the spectrum resolving model is trained by using the following method:
obtaining a model training data set, wherein the model training data set comprises training spectrograms, the spectrogram features corresponding to the training spectrograms, and the peptide fragment labels with which the training spectrograms are labeled;
inputting the spectrogram features of a training spectrogram into the spectrum resolving model to obtain a training category of the training spectrogram;
and training the spectrum resolving model based on the peptide fragment label of the training spectrogram and the training category of the training spectrogram.
By adopting the technical scheme, the peptide fragment label marked in advance is used as the ground truth, the training category output by the spectrum resolving model is used as the predicted result, and the spectrum resolving model is trained by using the peptide fragment label and the training category, so that the analysis result of the spectrum resolving model is more accurate and closer to the true value.
Optionally, the obtaining a model training data set includes:
acquiring a mass spectrum original data set and a spectrum resolving data set, wherein the mass spectrum original data set comprises an original spectrogram, and the spectrum resolving data set comprises a spectrum resolving result corresponding to the original spectrogram;
screening the original spectrograms in the mass spectrum original data set to identify filtered spectrograms, that is, the spectrograms to be excluded;
removing the filtered spectrograms from the mass spectrum original data set to obtain a mass spectrum training data set;
acquiring the spectrogram features of the mass spectrum training data set;
and labeling all original spectrograms in the mass spectrum training data set based on the spectrum resolving data set to obtain the peptide fragment labels of the mass spectrum training data set.
By adopting the technical scheme, the training data of the spectrum resolving model are screened, and data that carry little effective information or are not sufficiently credible are excluded in advance, which improves the effect of the trained model and makes the analysis result of the trained spectrum resolving model more accurate.
Optionally, the screening the original spectrograms in the mass spectrum original data set to obtain filtered spectrograms includes:
based on the spectrum resolving results of the original spectrograms, selecting the original spectrograms whose spectrum resolving result is a reverse (decoy) library match as filtered spectrograms;
and/or, based on the spectrum resolving results of the original spectrograms, selecting the original spectrograms whose spectrum resolving result is blank as filtered spectrograms;
and/or, based on the spectrum resolving results of the original spectrograms, selecting the original spectrograms whose score is smaller than a score threshold as filtered spectrograms;
and/or, based on the mass-to-charge ratios of the original spectrograms, selecting the original spectrograms whose number of mass-to-charge ratios is smaller than a characteristic threshold as filtered spectrograms, wherein the characteristic threshold is smaller than or equal to the number of mass-to-charge ratios contained in the spectrogram characteristics;
and/or grouping the original spectrograms in the mass spectrum original data set based on the parent ion valence of the original spectrograms to obtain valence classification groups,
and selecting the spectrograms of the valence classification groups whose number of spectrograms is smaller than a valence spectrogram number threshold as filtered spectrograms;
and/or grouping the original spectrograms in the mass spectrum original data set based on the peptide fragment sequence in the spectrum resolving result to obtain peptide fragment classification groups,
and selecting the spectrograms of the peptide fragment classification groups whose number of spectrograms is smaller than a peptide fragment spectrogram number threshold as filtered spectrograms.
By adopting the technical scheme, original spectrograms whose spectrum resolving result is a reverse (decoy) library match, whose spectrum resolving result is blank, or whose score is smaller than the score threshold are all original spectrograms of low reliability; treating them as filtered spectrograms and excluding them afterwards optimizes the training of the model. If the number of mass-to-charge ratios of an original spectrogram is smaller than the characteristic threshold, enough mass-to-charge ratios cannot be extracted or screened from it to serve as spectrogram characteristics in the subsequent calculation, so such original spectrograms also need to be treated as filtered spectrograms and excluded. If the number of original spectrograms in a valence classification group is smaller than the valence spectrogram number threshold, the amount of data in that valence classification group is too small; likewise, if the number of original spectrograms in a peptide fragment classification group is smaller than the peptide fragment spectrogram number threshold, the amount of data in that peptide fragment classification group is too small. When a valence classification group or a peptide fragment classification group contains too little data, the model has difficulty converging during training, so these data need to be filtered out for the model training to converge quickly.
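The screening rules above can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation: the `Record` fields, the function name `build_training_set` and all threshold values are hypothetical, and the sketch applies all six rules together although the text allows any combination of them. Group sizes are counted on the original data set, as the text describes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One original spectrogram with its spectrum resolving result (hypothetical layout)."""
    peak_count: int          # number of mass-to-charge ratios in the spectrogram
    charge: int              # parent ion valence
    peptide: Optional[str]   # resolved peptide sequence, None if the result is blank
    score: float             # score of the spectrum resolving result
    is_decoy: bool           # True if the result matched the reverse (decoy) library


def build_training_set(records: List[Record], score_threshold: float,
                       feature_threshold: int, min_per_charge: int,
                       min_per_peptide: int) -> List[Record]:
    """Remove every spectrogram that any screening rule marks as filtered."""
    # Group sizes are computed on the original data set.
    charge_counts = Counter(r.charge for r in records)
    peptide_counts = Counter(r.peptide for r in records if r.peptide is not None)

    def is_filtered(r: Record) -> bool:
        return (
            r.is_decoy                                       # reverse-library result
            or r.peptide is None                             # blank result
            or r.score < score_threshold                     # low score
            or r.peak_count < feature_threshold              # too few m/z values
            or charge_counts[r.charge] < min_per_charge      # small valence group
            or peptide_counts[r.peptide] < min_per_peptide   # small peptide group
        )

    return [r for r in records if not is_filtered(r)]
```

The thresholds are left as parameters because the text does not give numeric values for them.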
The invention also provides a mass spectrum spectrogram analyzing device.
A mass spectrometry spectrum analysis apparatus comprising:
a spectrogram acquisition module for acquiring a target spectrogram, wherein the target spectrogram corresponds to a peptide fragment of a protein;
the characteristic extraction module is used for determining spectrogram characteristics of the target spectrogram based on the mass-to-charge ratio of the target spectrogram;
and the model analysis module is used for inputting the spectrogram characteristics of the target spectrogram into a preset spectrum solution model to obtain the analysis category of the target spectrogram, wherein the analysis category is used for reflecting the peptide fragment sequence corresponding to the target spectrogram.
The invention also provides an intelligent terminal.
An intelligent terminal comprises a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the mass spectrum spectrogram analyzing method according to any one of the above technical schemes.
The invention also provides a computer-readable storage medium.
A computer-readable storage medium stores a computer program that can be loaded by a processor to execute the mass spectrum spectrogram analyzing method according to any one of the above technical schemes.
Drawings
Fig. 1 is a schematic flow chart of a mass spectrometry spectrum analysis method according to the present application.
Fig. 2 is a schematic subflow diagram of step S2 of the mass spectrometry spectrum analysis method of the present application.
Fig. 3 is a schematic structural diagram of the spectrum solution model of the present application.
FIG. 4 is a schematic diagram of the mean down-sampling layer of the present application compressing spectral features.
Fig. 5 is a schematic flow chart of a training method of a spectrum solution model in the mass spectrum analysis method of the present application.
Fig. 6 is a sub-flow diagram of step E1 of the method for training a solution spectrum model in the mass spectrometry spectrogram analysis method of the present application.
FIG. 7 is a schematic diagram of the optimization process of the solution spectrum model of the present application.
Fig. 8 is a schematic diagram of a screening process of a training profile of the present application.
Fig. 9 is a block diagram schematically illustrating a mass spectrometry device according to the present invention.
In the figures: 1, spectrogram acquisition module; 2, feature extraction module; 3, model analysis module.
Detailed Description
At present, proteomic mass spectrometry data are mainly analyzed either with dedicated software or with deep learning algorithms.
Among dedicated software, widely used protein spectrum-resolving tools include pFind, Mascot and MaxQuant, but they generally suffer from high computational cost and long analysis time. For example, on a 24-core server pFind needs about 8 hours on average to resolve each 10 GB of spectra, and longer if the peptide sequences of the protein to be analyzed are complex. In other words, the result of a single protein mass spectrometry experiment becomes available only after about 8 hours, so the mass spectrometry technology suffers from high computational cost and overly long analysis time.
In research on identifying the peptide sequence of a protein by deep learning, the related art includes a technical scheme that processes spectrogram data by combining the CNN and LSTM algorithms: the CNN algorithm extracts the image features of the spectrogram, and the LSTM algorithm predicts the peptide sequence corresponding to the image features. The amino acids in the protein are predicted one after another, each from the amino acid predicted before it, and the highest accuracy reaches 75%. However, when any earlier amino acid is predicted incorrectly, the prediction error rate of the amino acids that follow rises sharply.
The problems of these two approaches not only greatly limit the efficiency with which scientific research projects deliver results, but in clinical application are also likely to cause doctors to miss the optimal treatment time for patients, which limits the clinical application of proteome mass spectrometry.
To address these technical problems, the application provides a mass spectrum spectrogram analyzing method with short analysis time and high accuracy.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the reference numbers of the steps in this embodiment are only for convenience of description and do not limit the order in which the steps are executed. In practical application, the execution order of the steps may be adjusted, or steps may be performed simultaneously, as needed, and these adjustments or substitutions all belong to the protection scope of the present invention.
Embodiments of the present application are described in further detail below with reference to figures 1-9 of the specification.
The embodiment of the application provides a mass spectrum spectrogram analyzing method, and the main flow of the mass spectrum spectrogram analyzing method is described as follows.
Referring to Fig. 1, S1, acquiring a target mass spectrum data set.
The target mass spectrum data set comprises a group of proteome mass spectrum data, and a plurality of target spectrograms are arranged in the target mass spectrum data set. The target spectrogram is a spectrogram of the protein to be analyzed, and one protein spectrogram represents one peptide segment in the protein. The target mass spectrum data set comprises mass spectrum data of the protein peptide fragment, the mass spectrum data reflects basic attributes of the peptide fragment corresponding to a target spectrogram, and the mass spectrum data comprises the valence of a parent ion of the target spectrogram, the mass-to-charge ratio of the target spectrogram and the intensity value of the mass-to-charge ratio on the target spectrogram. Wherein one target spectrum corresponds to one parent ion valence, one target spectrum corresponds to a plurality of mass-to-charge ratios, and one mass-to-charge ratio corresponds to one intensity value.
Specifically, the mass-to-charge ratio, also written m/z, is the value obtained by dividing the mass of an ion by its charge. The intensity value, also called intensity, indicates the ion concentration corresponding to a mass-to-charge ratio: the larger the intensity value of a mass-to-charge ratio, the higher the corresponding ion concentration. For protein spectrograms with the same peptide fragment sequence, the positions of the higher-intensity mass-to-charge ratios are relatively close to one another.
It can be understood that the mass spectrum data are all known information in the target spectrogram, whereas the protein peptide fragment corresponding to the target spectrogram is unknown. The purpose of the mass spectrum spectrogram analyzing method provided by the application is to quickly and accurately resolve the peptide fragment sequence of the protein peptide fragment corresponding to the target spectrogram based on the known information in the target spectrogram.
In one embodiment, the target spectrograms in the target mass spectrum data set are stored in an MGF file.
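As an illustration of how such a file could be read, the following Python sketch parses MGF-style text into records holding the parent ion valence and the (mass-to-charge ratio, intensity) pairs. It is a minimal sketch under assumptions: the `Spectrum` class and `parse_mgf` function are hypothetical names, only the `TITLE` and `CHARGE` parameters are interpreted, a single charge value such as `2+` is expected, and every peak line is assumed to hold two whitespace-separated numbers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SAMPLE_MGF = """BEGIN IONS
TITLE=scan_1
PEPMASS=500.25
CHARGE=2+
100.1 10.0
200.2 50.0
END IONS
"""


@dataclass
class Spectrum:
    title: str = ""
    charge: int = 0  # parent ion valence
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def parse_mgf(text: str) -> List[Spectrum]:
    """Parse MGF-formatted text into a list of Spectrum records."""
    spectra: List[Spectrum] = []
    current: Optional[Spectrum] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue                      # ignore parameters outside a spectrum block
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            if key == "TITLE":
                current.title = value
            elif key == "CHARGE":
                current.charge = int(value.split()[0].rstrip("+-"))  # "2+" -> 2
        else:
            mz, intensity = line.split()[:2]
            current.peaks.append((float(mz), float(intensity)))
    return spectra
```

Calling `parse_mgf(SAMPLE_MGF)` yields one record with valence 2 and two peaks, which matches the one-valence, many-peaks structure described for a target spectrogram.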
S2, determining spectrogram characteristics of the target spectrogram based on the mass-to-charge ratio of the target spectrogram.
Each target spectrogram has a spectrogram feature that reflects the characteristics of that target spectrogram and can represent it. In this embodiment, the spectrogram feature is composed of a plurality of mass-to-charge ratios. Because the target spectrograms corresponding to different peptide fragments differ considerably in their mass-to-charge ratios, obtaining the spectrogram feature from a specified number of mass-to-charge ratios makes the spectrogram feature strongly representative of the target spectrogram.
Referring to fig. 1 and 2, specifically, step S2 includes:
S21, screening all mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain a characteristic number of characteristic mass-to-charge ratios.
The characteristic mass-to-charge ratios are the mass-to-charge ratios, screened from all mass-to-charge ratios of the target spectrogram, that best represent the target spectrogram. The characteristic number is a value preset by the system. In this embodiment, the intensity values of all mass-to-charge ratios in the target spectrogram are read, and the characteristic number of mass-to-charge ratios with the largest intensity values, that is, those with the highest ion concentration, are retained as the characteristic mass-to-charge ratios of the target spectrogram.
S22, obtaining the spectrogram characteristics of the target spectrogram based on all characteristic mass-to-charge ratios of the target spectrogram.
The spectrogram feature is a one-dimensional feature array composed of the characteristic number of characteristic mass-to-charge ratios.
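Steps S21 and S22 can be sketched in Python as follows. This is an illustrative sketch only: the function name `spectrogram_feature` is hypothetical, the default of 112 is the characteristic number given in the optional scheme of this application, and ordering the retained values by ascending mass-to-charge ratio is an assumption, since the text does not state how the array is ordered.

```python
from typing import List, Sequence, Tuple

FEATURE_COUNT = 112  # characteristic number used in the optional scheme


def spectrogram_feature(peaks: Sequence[Tuple[float, float]],
                        feature_count: int = FEATURE_COUNT) -> List[float]:
    """Keep the m/z values of the `feature_count` most intense peaks.

    `peaks` holds (m/z, intensity) pairs of one target spectrogram.
    """
    if len(peaks) < feature_count:
        raise ValueError("spectrogram has too few mass-to-charge ratios")
    # S21: retain the peaks with the largest intensity values.
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:feature_count]
    # S22: build the one-dimensional feature array (ascending m/z is an assumption).
    return sorted(mz for mz, _ in strongest)
```

For example, with peaks `[(100.0, 5.0), (200.0, 50.0), (150.0, 20.0), (300.0, 1.0)]` and a characteristic number of 2, the two most intense peaks are retained and the feature array is `[150.0, 200.0]`.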
S3, inputting the spectrogram characteristics of the target spectrogram into a preset spectrum resolving model to obtain the analysis category of the target spectrogram.
The spectrum resolving model is a mathematical model obtained by training on the spectrogram characteristics of protein spectrograms and the peptide fragment sequences corresponding to those protein spectrograms. The analysis category reflects the peptide fragment sequence corresponding to the target spectrogram. The spectrum resolving model analyzes the spectrogram characteristics, which consist of a plurality of mass-to-charge ratios, to obtain the analysis category.
In this embodiment, the spectrum resolving model is a mathematical model adapted from the network structure of GoogLeNet according to the characteristics of protein spectrograms. GoogLeNet is a deep learning architecture. Compared with other deep learning architectures such as AlexNet and VGG, it does not need to increase the depth (number of layers) of the network to obtain a better training effect, reduces negative effects such as overfitting, vanishing gradients and exploding gradients, uses computing resources more efficiently, and extracts more features for the same amount of computation, thereby improving the training result.
Referring to Fig. 3 and Fig. 4, specifically, the structure of the spectrum resolving model includes a convolution layer, an average value down-sampling layer, an Inception layer and a fully connected layer.
Convolution layers extract features from the input or from the output of the previous layer of the neural network. In this embodiment, the spectrogram feature input to the spectrum resolving model is in effect a feature array composed of a plurality of mass-to-charge ratios and arranged in matrix form, and the data processed by the convolution layers is likewise a one-dimensional array.
The average value down-sampling layer works as follows: starting from the first value of the input array, it takes the average of three consecutive values and advances by a step of 2 each time. The output array is therefore half the length of the input array, which compresses the features.
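The down-sampling rule described above can be sketched roughly as follows. This is only an illustration: the function name `average_downsample` is hypothetical, and the way the tail of the array is padded (repeating the last value so that the output is exactly half the input length) is an assumption, since the embodiment does not state how the boundary is handled.

```python
import numpy as np

def average_downsample(values, window=3, stride=2):
    """Mean of `window` consecutive values, advancing `stride` positions per step."""
    x = np.asarray(values, dtype=float)
    out_len = -(-len(x) // stride)  # ceil(len / stride): half the length for stride 2
    # Assumption: pad the tail by repeating the last value so every window is full.
    pad = max((out_len - 1) * stride + window - len(x), 0)
    padded = np.pad(x, (0, pad), mode="edge")
    return np.array([padded[i * stride:i * stride + window].mean()
                     for i in range(out_len)])

# A length-112 array passed through four such layers: 112 -> 56 -> 28 -> 14 -> 7.
stage = np.arange(112, dtype=float)
lengths = [len(stage)]
for _ in range(4):
    stage = average_downsample(stage)
    lengths.append(len(stage))
```

With this padding choice, each layer halves an even-length input exactly, which matches the halving behaviour stated in the text.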
The Inception layer extracts features through 4 different calculations and then fuses them. The 4 calculations differ in the size of the convolution kernel involved, so convolution results obtained with different kernel sizes can be concatenated, and the final results are combined into one array.
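As a rough illustration of this multi-branch idea, the sketch below runs four one-dimensional convolutions with different kernel sizes over the same input and stacks the results. The kernel sizes (1, 3, 5, 7) and the uniform placeholder weights are assumptions made for the example; in the actual model the kernel sizes are fixed by the network design and the weights are learned.

```python
import numpy as np

def inception_like(features, kernel_sizes=(1, 3, 5, 7)):
    """Apply several 1-D convolutions of different widths and stack the outputs."""
    x = np.asarray(features, dtype=float)
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k  # placeholder weights, not learned parameters
        # mode="same" keeps every branch at the input length so they can be fused
        branches.append(np.convolve(x, kernel, mode="same"))
    return np.stack(branches)  # shape: (number of branches, input length)

fused = inception_like(np.arange(112, dtype=float))
```

Keeping every branch at the same length is what makes the fusion step a simple concatenation.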
The fully connected layer continuously learns from the data: it computes the error, feeds the error back and revises the parameters, repeating this until convergence. In this embodiment, the fully connected layer is a hidden layer composed of 1 × 2048 neuron nodes, and its Dropout value is set to 0.8, which means that only 80% of the neurons in the hidden layer are activated in each training pass.
The training data of the spectrum resolving model comprise the spectrogram features of protein spectrograms and the peptide fragment sequences corresponding to those spectrograms. In practical application, after the feature array of the spectrogram feature is input into the spectrum resolving model, the convolution layer performs multiple convolution operations on the feature array based on a preset convolution kernel structure and step value, and the average value down-sampling layer reduces the length of the feature array. The final output of the spectrum resolving model is a feature matrix together with the probability that the feature matrix belongs to each peptide fragment sequence. The feature matrix can be understood as the compressed spectrogram feature, which expresses the features of the target spectrogram. The probabilities of the feature matrix for the individual peptide fragment sequences are compared, and the peptide fragment sequence with the highest probability is taken as the sequence to which the feature matrix most likely belongs, thereby resolving the peptide fragment sequence to which the spectrogram feature belongs.
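The final selection step can be sketched as below. The text only states that the model outputs a probability per peptide fragment sequence; turning raw scores into probabilities with a softmax is an assumption here, and the function name and input values are hypothetical.

```python
import numpy as np

def pick_analysis_category(raw_scores):
    """Convert per-class scores to probabilities and return the most likely class."""
    z = np.asarray(raw_scores, dtype=float)
    z = z - z.max()                      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax (assumed, not stated in the text)
    return int(np.argmax(probs)), probs

# Hypothetical scores for three peptide fragment classes 0, 1 and 2.
category, probabilities = pick_analysis_category([0.2, 2.5, -1.0])
```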
Specifically, the compressed feature length in this embodiment is 7, that is, the spectrogram feature is compressed to a length of 7 after passing through multiple average value down-sampling layers, which yields high analysis accuracy.
The beneficial effects are analyzed as follows. For a mathematical model built on GoogleNet, the length of the output feature matrix is typically 3, 5, 7 or 9. If the length of the feature matrix after feature compression is set to 3, the feature matrix can hardly represent a spectrogram feature and there is serious feature loss, resulting in poor accuracy of model analysis.
If the length of the feature matrix after feature compression is set to 5, only part of the feature loss is avoided compared with a length of 3. A better verification effect can be obtained when verifying within the same data set, but the effect is poor when verifying across different data sets, so the accuracy of model analysis is still poor. Here, a data set refers to proteome mass spectrum data extracted in the same batch. For example, the spectrogram data generated after the proteins extracted from one batch of 97H cells pass through a mass spectrometer form one data set. A data set is generally split in an 8:2 ratio, with 80% of the data used for training and 20% used for predictive validation.
The smaller the length of the feature matrix after feature compression, the higher the utilization rate of the protein spectrograms. If the length of the feature matrix is set to 9, the utilization rate of the protein spectrograms is low and the effect in practical application is poor. Setting the length of the feature matrix after feature compression to 7 allows the feature matrix to represent the protein spectrogram while keeping the utilization rate acceptable, and balancing the two gives a better model effect.
Further, the length of the spectrogram feature in this embodiment is set to 112, that is, the spectrogram feature includes 112 mass-to-charge ratios, and the number of average value down-sampling layers is set to 4. During the calculation of the spectrum resolving model, the original spectrogram feature of length 112 is compressed by the 4 average value down-sampling layers, that is, the feature array of the spectrogram feature is halved four times, and a feature matrix of length 7 is finally obtained.
Protein spectrograms belonging to different protein peptide fragments contain different numbers of mass-to-charge ratios, and the number of actually effective mass-to-charge ratios is smaller still after all the mass-to-charge ratios are screened or cleaned. If the length of the spectrogram feature is too large, for example if it is set to 224 with 5 average value down-sampling layers, too many mass-to-charge ratios are required for the calculation, many protein spectrograms can hardly obtain a good prediction from the spectrum resolving model, and the application range of the model is narrow.
In this embodiment, the length of the spectrogram feature is set to 112, that is, the input of the spectrum resolving model has a length of 112, and the spectrum resolving model is designed with 4 average value down-sampling layers, so that spectrum resolving achieves higher accuracy, a wider application range and a faster analysis speed.
It can be understood that, in other embodiments, if in an actual application scenario only a single data set is analyzed with the spectrum resolving model, so that the poor verification effect across different data sets need not be considered, the length of the feature matrix can be reduced. For example, if a feature matrix of length 5 is used with 4 average value down-sampling layers, the spectrogram feature needs to include 80 mass-to-charge ratios. Similarly, if in an actual application scenario only peptide sequences with a small or large number of mass-to-charge ratios are analyzed, fewer or more average value down-sampling layers may be used. For example, if peptide sequences with a large number of mass-to-charge ratios are to be analyzed with 5 average value down-sampling layers and a feature matrix of length 7, the spectrogram feature needs to include 224 mass-to-charge ratios.
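The relation between these figures is that each average value down-sampling layer halves the length, so the required spectrogram feature length equals the feature matrix length multiplied by 2 raised to the number of layers. A minimal sketch, with hypothetical function names, reproduces the three combinations mentioned above:

```python
def required_feature_length(matrix_length, pooling_layers):
    """Input length needed so that `pooling_layers` halvings end at `matrix_length`."""
    return matrix_length * 2 ** pooling_layers

def compressed_length(feature_length, pooling_layers):
    """Length left after halving `feature_length` once per down-sampling layer."""
    length = feature_length
    for _ in range(pooling_layers):
        if length % 2:
            raise ValueError("length must stay even to be halved exactly")
        length //= 2
    return length

embodiment = required_feature_length(7, 4)      # 7 * 2**4
single_dataset = required_feature_length(5, 4)  # 5 * 2**4
long_peptides = required_feature_length(7, 5)   # 7 * 2**5
```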
The implementation principle of the mass spectrum spectrogram analysis method provided by the application is as follows: spectrogram features are obtained based on the mass-to-charge ratios of the target spectrogram, the spectrogram features are input into the spectrum resolving model, and the analysis category is obtained from the spectrum resolving model. The mass spectrum spectrogram analysis method has high accuracy and a short spectrum resolving time, and real-time spectrum resolving can be achieved at the associated computing cost. That is, a protein spectrogram can be analyzed as soon as it is generated and the analysis result obtained quickly, which greatly shortens the analysis time, reduces the limitation on the output efficiency of scientific research projects, facilitates rapid clinical examination, and favors the clinical application of proteome mass spectrometry technology.
The embodiment of the application also provides a method for training the spectrum resolving model; the main flow of the training method is described below.
Referring to fig. 5, E1: obtain a model training data set.
The model training data set comprises a plurality of training spectrograms, mass spectrum data corresponding to each training spectrogram and peptide segment labels correspondingly marked on each training spectrogram.
The training spectrogram is a protein spectrogram and is the counterpart of the target spectrogram; like the target spectrogram, it represents one of the peptide fragments in a protein. The mass spectrum data reflect the basic attributes of the peptide fragment corresponding to the training spectrogram, and comprise the parent ion valence of the training spectrogram, the mass-to-charge ratios of the training spectrogram and the intensity values of those mass-to-charge ratios. The peptide fragment label reflects the peptide fragment sequence of the peptide fragment corresponding to the training spectrogram.
Referring to fig. 5 and 6, specifically, step E1 includes:
E11: Acquire a mass spectrum original data set and a spectrum resolving data set.
In an embodiment, the spectrum resolving data set is obtained by analyzing the original spectrograms with dedicated computer software.
In this embodiment, every original spectrogram in the mass spectrum original data set is converted into a dictionary-type record that contains the mass-to-charge ratios, the peptide sequence, the score of the spectrum resolving result and the post-translational modification.
E12: Screen the original spectrograms in the mass spectrum original data set to obtain filtered spectrograms.
To improve the effect of model training, the data in the mass spectrum original data set need to be screened and cleaned. A filtered spectrogram is an original spectrogram that the screening has identified as carrying little effective information and that therefore needs to be cleaned out.
E13: Remove the filtered spectrograms from the mass spectrum original data set to obtain a mass spectrum training data set.
E14: Acquire the spectrogram features of all training spectrograms in the mass spectrum training data set.
After all the filtered spectrograms are removed from the mass spectrum original data set, all the remaining original spectrograms form the mass spectrum training data set and take part in the subsequent model training as training spectrograms.
E15: Label all original spectrograms in the mass spectrum training data set based on the spectrum resolving data set to obtain the peptide fragment labels of the mass spectrum training data set.
Referring to fig. 5 and 7, in the present embodiment, each of the currently known peptide sequences is defined as one class, denoted by a class number such as 0, 1, 2, and so on, and each peptide fragment label corresponds to one peptide fragment class.
It can be understood that the training spectrogram corresponds to a protein spectrogram before mass spectrum spectrogram analysis, the mass spectrum data are its known basic attributes, and the peptide fragment label is the true analysis result obtained after the training spectrogram has undergone mass spectrum spectrogram analysis. In this example, the peptide fragment label is generated as follows: the training spectrogram is analyzed with dedicated computer software such as pFind, the peptide sequence corresponding to the training spectrogram is determined from the spectrum resolving result, and the peptide fragment label of the training spectrogram is generated from the peptide fragment class corresponding to that peptide sequence.
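The mapping from peptide sequences to class numbers can be sketched as follows. Numbering the classes in order of first appearance is an assumption, since the embodiment does not say how the numbers are assigned, and the sequences used here are made-up placeholders rather than data from the application.

```python
def build_peptide_classes(sequences):
    """Assign each distinct peptide sequence a class number 0, 1, 2, ..."""
    classes = {}
    for sequence in sequences:
        if sequence not in classes:
            classes[sequence] = len(classes)  # next unused class number
    return classes

# Hypothetical spectrum resolving results for three training spectrograms.
resolved = ["PEPTIDEK", "SAMPLER", "PEPTIDEK"]
classes = build_peptide_classes(resolved)
labels = [classes[sequence] for sequence in resolved]
```

Spectrograms that resolve to the same sequence receive the same label, which is what allows the problem to be treated as classification.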
E2: Input the spectrogram features of the training spectrogram into the spectrum resolving model to obtain the training category of the training spectrogram.
After the feature array of the spectrogram feature is input into the spectrum resolving model, the convolution layer performs multiple convolution operations on the feature array based on a preset convolution kernel structure and step value, and the 4 average value down-sampling layers reduce the length of the feature array. The spectrum resolving model finally outputs a feature matrix of length 7 and the probability that the feature matrix belongs to each peptide fragment class. The peptide fragment class with the highest probability is taken as the training category of the training spectrogram.
E3: Train the spectrum resolving model based on the peptide fragment label of the training spectrogram and the training category of the training spectrogram.
The model parameters of the spectrum resolving model are adjusted according to the peptide fragment label of the training spectrogram and the analysis result of the training spectrogram until the difference between the two is within an allowable range, thereby obtaining the trained spectrum resolving model.
Referring to fig. 8, in order to improve the effect of model training, in step E12 in the present embodiment, deep data screening and cleaning are required. In this embodiment, step E12 includes:
E121: Based on the spectrum resolving results of the original spectrograms, screen out the original spectrograms whose spectrum resolving result is a reverse library hit to obtain filtered spectrograms.
Specifically, the spectrum resolving results of all original spectrograms are extracted from the spectrum resolving data set, and the original spectrograms whose spectrum resolving result is a reverse library hit are taken as filtered spectrograms.
E122: Based on the spectrum resolving results of the original spectrograms, screen out the original spectrograms whose spectrum resolving result is empty to obtain filtered spectrograms.
Specifically, the spectrum resolving results of all original spectrograms are extracted from the spectrum resolving data set, and the original spectrograms whose spectrum resolving result is empty are taken as filtered spectrograms.
E123: Based on the spectrum resolving results of the original spectrograms, screen out the original spectrograms whose score is smaller than the score threshold to obtain filtered spectrograms.
Specifically, the scores of the spectrum resolving results of all original spectrograms are extracted from the spectrum resolving data set, and the original spectrograms whose score is smaller than the score threshold are taken as filtered spectrograms. The score of the spectrum resolving result is also called Raw_Score. The higher the score, the higher the credibility of the spectrum resolving result; conversely, the lower the score, the lower the credibility.
The score threshold is a value preset in the system. If the score threshold is too high, too few training spectrograms remain after screening; if the score threshold is too low, more unreliable spectrum resolving results take part in model training. To improve the utilization rate of the training data while maintaining its validity, the score threshold is preferably 10 in this embodiment.
Taking steps E121 to E123 together, original spectrograms whose spectrum resolving result is a reverse library hit, whose spectrum resolving result is empty, or whose score is smaller than the score threshold are all original spectrograms of low reliability. The purpose of steps E121 to E123 is to mark these original spectrograms as filtered spectrograms so that they are subsequently excluded, which optimizes the training of the model.
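Steps E121 to E123 can be sketched as a single predicate over dictionary-type records. The field names (`is_reverse`, `sequence`, `score`) and the sample records are hypothetical placeholders chosen for this illustration; they are not the output format of pFind or of any other software.

```python
SCORE_THRESHOLD = 10  # preferred score threshold in this embodiment

def is_low_reliability(record, score_threshold=SCORE_THRESHOLD):
    """True if the record should become a filtered spectrogram (E121 to E123)."""
    if record.get("is_reverse"):      # E121: reverse library hit
        return True
    if not record.get("sequence"):    # E122: empty spectrum resolving result
        return True
    return record["score"] < score_threshold  # E123: score below the threshold

# Made-up records, one per case.
records = [
    {"id": "a", "is_reverse": True, "sequence": "SAMPLER", "score": 25.0},
    {"id": "b", "is_reverse": False, "sequence": "", "score": 0.0},
    {"id": "c", "is_reverse": False, "sequence": "SAMPLER", "score": 4.2},
    {"id": "d", "is_reverse": False, "sequence": "SAMPLER", "score": 18.7},
]
kept_ids = [r["id"] for r in records if not is_low_reliability(r)]
```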
E124: Based on the number of mass-to-charge ratios in the original spectrograms, screen out the original spectrograms whose number of mass-to-charge ratios is smaller than the characteristic threshold to obtain filtered spectrograms.
Specifically, the number of mass-to-charge ratios in every original spectrogram is obtained, and the original spectrograms whose number of mass-to-charge ratios is smaller than the characteristic threshold are taken as filtered spectrograms. The characteristic threshold is greater than or equal to the number of mass-to-charge ratios in the spectrogram feature. If the number of mass-to-charge ratios in an original spectrogram is smaller than the characteristic threshold, not enough mass-to-charge ratios can be extracted or screened from it to serve as spectrogram features in the subsequent calculation, so such original spectrograms need to be taken as filtered spectrograms and subsequently excluded.
In the present embodiment, the characteristic threshold is preferably 120.
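A minimal sketch of the E124 check, with hypothetical names, also makes explicit the constraint that the characteristic threshold must not be smaller than the feature length:

```python
FEATURE_COUNT = 112      # mass-to-charge ratios in the spectrogram feature
FEATURE_THRESHOLD = 120  # preferred characteristic threshold in this embodiment

# The threshold must be greater than or equal to the feature length.
assert FEATURE_THRESHOLD >= FEATURE_COUNT

def is_filtered_by_peak_count(mz_values, feature_threshold=FEATURE_THRESHOLD):
    """True when the spectrogram has too few mass-to-charge ratios (E124)."""
    return len(mz_values) < feature_threshold

too_short = is_filtered_by_peak_count([0.0] * 119)    # placeholder peak lists
long_enough = is_filtered_by_peak_count([0.0] * 120)
```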
E125: Group the original spectrograms in the mass spectrum original data set by parent ion valence to obtain valence classification groups, and then screen out the valence classification groups whose number of spectrograms is smaller than the valence spectrogram number threshold to obtain filtered spectrograms.
Specifically, all original spectrograms are divided into a plurality of valence classification groups according to their parent ion valence, and the number of original spectrograms in each valence classification group is counted. If the number of original spectrograms in a valence classification group is smaller than the valence spectrogram number threshold, the amount of data in that group is too small. In the present embodiment, the valence spectrogram number threshold is preferably 1000.
In deep learning model training, the more data belonging to the same parent ion valence class, the better in theory. Because protein spectrograms with different parent ion valences need to be handled separately, a sufficient number of protein spectrograms must be ensured for each parent ion valence; otherwise the model can hardly converge during training. Valence classification groups with too little data therefore need to be removed, that is, all original spectrograms contained in those groups are taken as filtered spectrograms.
In one embodiment, step E125 further comprises:
calculating the difference in the number of original spectrograms between different valence classification groups, comparing the difference with a difference threshold, and, when the difference is greater than the difference threshold, taking all original spectrograms in the valence classification group with the smaller number of original spectrograms as filtered spectrograms. The difference threshold is the number of original spectrograms in the valence classification group with the smaller number of original spectrograms.
It can be understood that when the data set contains several valence classification groups and each group has more than 1000 original spectrograms, but the numbers of original spectrograms differ too much between groups, the groups with clearly too few original spectrograms need to be removed. For example, if the data set contains parent ion valences of +2, +3, +4, +5 and +6, where the +2 and +3 groups contain more than 100,000 original spectrograms on average and the +4, +5 and +6 groups contain about 20,000, then the original spectrograms of the +4, +5 and +6 valence classification groups are taken as filtered spectrograms. If another data set contains a large number of original spectrograms with parent ion valences of +4, +5 and +6, it can be treated as a separate case and used to train the spectrum resolving model.
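The two valence checks can be sketched together as below. Comparing every group against the largest group is a simplification adopted for this example: under the stated rule (difference greater than the size of the smaller group) it removes the same groups as comparing all pairs. The function name and the counts are hypothetical.

```python
from collections import Counter

def valences_to_keep(valences, min_count=1000):
    """Return the parent ion valences whose groups survive step E125."""
    counts = Counter(valences)
    # First check: drop groups below the valence spectrogram number threshold.
    survivors = {v: n for v, n in counts.items() if n >= min_count}
    if not survivors:
        return set()
    largest = max(survivors.values())
    # Second check: drop a group when the gap to the largest group exceeds
    # the group's own size (the difference threshold from the text).
    return {v for v, n in survivors.items() if largest - n <= n}

# Made-up group sizes: +2 -> 5000, +3 -> 4000, +4 -> 1500, +5 -> 200.
sample = [2] * 5000 + [3] * 4000 + [4] * 1500 + [5] * 200
kept_valences = valences_to_keep(sample)
```

Here the +5 group fails the count threshold, and the +4 group is dropped because 5000 - 1500 exceeds 1500.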
E126: Group the original spectrograms in the mass spectrum original data set by the peptide sequence in the spectrum resolving result to obtain peptide fragment classification groups, and then screen out the peptide fragment classification groups whose number of spectrograms is smaller than the peptide fragment spectrogram number threshold to obtain filtered spectrograms.
Specifically, all original spectrograms are divided into a plurality of peptide fragment classification groups according to their peptide fragment class, and the number of original spectrograms in each peptide fragment classification group is counted. If the number of original spectrograms in a peptide fragment classification group is smaller than the peptide fragment spectrogram number threshold, the amount of data in that group is too small. In this example, the peptide fragment spectrogram number threshold is 120.
In deep learning model training, the more data belonging to the same peptide fragment class, the better in theory. Because protein spectrograms of different peptide fragment classes need to be handled separately, a sufficient number of protein spectrograms must be ensured for each peptide fragment class; otherwise the model can hardly converge during training. Peptide fragment classes with too little data therefore need to be removed, that is, all original spectrograms contained in those classes are taken as filtered spectrograms.
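Step E126 can be sketched in the same style as the earlier filters. The record layout (a dictionary with a `sequence` key) and the sequences are hypothetical placeholders.

```python
from collections import Counter

def drop_small_peptide_groups(records, min_count=120):
    """Keep only records whose peptide sequence occurs at least `min_count` times."""
    counts = Counter(record["sequence"] for record in records)
    return [record for record in records if counts[record["sequence"]] >= min_count]

# Made-up data: one class with exactly 120 spectrograms, one with 119.
records = ([{"sequence": "PEPTIDEK"}] * 120) + ([{"sequence": "SAMPLER"}] * 119)
kept = drop_small_peptide_groups(records)
kept_sequences = {record["sequence"] for record in kept}
```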
Steps E125 and E126 ensure that every retained peptide sequence class and every retained parent ion valence class keeps a sufficiently large amount of data, so that the spectrum resolving model converges quickly during training and the model effect is better.
The training process of the spectrum resolving model is illustrated below with a set of 97H cell whole proteome mass spectrum data.
The 97H cell whole proteome mass spectrum data contain 2108428 original spectrograms in total. According to the spectrum resolving results of all the original spectrograms, 570722 original spectrograms have a spectrum resolving result that is a reverse library hit, 82335 have an empty spectrum resolving result, and 1151436 have a spectrum resolving result score of less than 10. After these original spectrograms are taken as filtered spectrograms and removed, 303935 original spectrograms remain.
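The remaining count follows from subtracting the three filtered categories from the total, which implies that the three categories are counted without overlap. A short check using only the figures given above:

```python
total = 2_108_428          # original spectrograms in the 97H data set
reverse_hits = 570_722     # spectrum resolving result is a reverse library hit
empty_results = 82_335     # spectrum resolving result is empty
low_scores = 1_151_436     # spectrum resolving result score below 10

filtered = reverse_hits + empty_results + low_scores
remaining = total - filtered
```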
The parent ion valences of the remaining 303935 original spectrograms include +2, +3, +4, +5 and +6, so the remaining 303935 original spectrograms are divided into 5 valence classification groups accordingly. The +2 and +3 valence classification groups contain more than 100,000 original spectrograms on average, whereas the +4, +5 and +6 valence classification groups together contain fewer than 100,000, averaging only about 30,000 to 40,000 each. The original spectrograms of the +4, +5 and +6 valence classification groups are therefore taken as filtered spectrograms and removed.
The remaining original spectrograms are then grouped by the peptide fragment class to which they belong to obtain a plurality of peptide fragment classification groups. The peptide fragment classification groups with fewer than 120 spectrograms are screened out, and the original spectrograms in those groups are taken as filtered spectrograms and removed.
After these filtering operations, the remaining original spectrograms serve as training spectrograms and comprise:
6356 training spectrograms with a parent ion valence of +2, covering 44 peptide fragment classes;
7856 training spectrograms with a parent ion valence of +3, covering 55 peptide fragment classes.
The spectrogram features of each training spectrogram are obtained from the 112 mass-to-charge ratios with the highest intensity values in that training spectrogram, and the peptide fragment label of the training spectrogram is obtained from the peptide fragment class to which the training spectrogram belongs.
For example, for one training spectrogram with a parent ion valence of +2, the spectrogram feature is a feature array of 112 mass-to-charge ratios, specifically: [169.13239, 198.08661, 199.07051, 216.09682, 217.1366, … 1746.78259]. The sequence of the peptide fragment to which the training spectrogram belongs is PVSSAASVYAGAGGSGSR, the peptide fragment class is 17, and the peptide fragment label is 17.
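The feature extraction just described can be sketched as follows. Returning the selected mass-to-charge ratios in ascending order is an assumption inferred from the example array above, which increases from 169.13239 to 1746.78259; the function name and the synthetic peak list are hypothetical.

```python
import numpy as np

def spectrogram_features(mz, intensity, feature_count=112):
    """Select the `feature_count` most intense peaks and return their m/z values."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if len(mz) < feature_count:
        raise ValueError("spectrogram has fewer peaks than the feature length")
    strongest = np.argsort(intensity)[::-1][:feature_count]  # highest intensity first
    return np.sort(mz[strongest])  # ascending m/z order (assumed)

# Synthetic spectrogram with 300 peaks; values are placeholders, not real data.
rng = np.random.default_rng(0)
example_mz = np.linspace(100.0, 2000.0, 300)
example_intensity = rng.random(300)
features = spectrogram_features(example_mz, example_intensity)
```

A spectrogram with fewer than 112 peaks raises an error here; in the training flow such spectrograms are already excluded by the characteristic threshold of step E124.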
The spectrogram features of the training spectrogram are input into the spectrum resolving model to obtain the training category of the training spectrogram, and the spectrum resolving model is then trained based on the peptide fragment label and the training category of the training spectrogram.
The application also provides a mass spectrum spectrogram analyzing device corresponding to the mass spectrum spectrogram analyzing method.
Referring to fig. 9, the mass spectrometry spectrum analyzing apparatus includes:
a spectrogram acquiring module 1, configured to acquire a target spectrogram, where the target spectrogram corresponds to a peptide fragment of a protein;
the feature extraction module 2 is used for determining spectrogram features of the target spectrogram based on the mass-to-charge ratio of the target spectrogram;
and the model analysis module 3 is used for inputting the spectrogram characteristics of the target spectrogram into a preset spectrum solution model to obtain the analysis type of the target spectrogram, wherein the analysis type is used for reflecting the peptide fragment sequence corresponding to the target spectrogram.
The feature extraction module 2 includes:
a spectrogram screening submodule, configured to screen all mass-to-charge ratios of the target spectrogram based on their intensity values to obtain a feature quantity of characteristic mass-to-charge ratios;
a feature combination submodule, configured to obtain the spectrogram features of the target spectrogram based on all the characteristic mass-to-charge ratios of the target spectrogram.
The mass spectrum analysis device provided in this embodiment can achieve the same technical effects as the method because of the functions of the modules and the logical connection between the modules, and the principle analysis can refer to the related description of the steps of the mass spectrum analysis method, which will not be described herein again.
The application also provides an intelligent terminal.
An intelligent terminal comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the memory stores training data, an algorithm formula, a filtering mechanism and the like in a training model. The processor is used for providing calculation and control capability, and the processor realizes the mass spectrum spectrogram analysis method when executing the computer program.
In the intelligent terminal provided in this embodiment, after the computer program in the memory of the intelligent terminal is run on the processor, the steps of the method are implemented, so that the same technical effect as that of the method can be achieved, and for principle analysis, reference may be made to the related description of the steps of the method, which will not be described herein again.
The present application also provides a computer-readable storage medium.
A computer-readable storage medium comprising a memory and a processor, the memory having stored thereon a computer program which can be loaded by the processor and which performs the method of mass spectrometry spectrogram analysis as described above, the computer program, when executed by the processor, implementing the method of mass spectrometry spectrogram analysis.
The readable storage medium provided by this embodiment may achieve the same technical effects as the foregoing method because the computer program in the readable storage medium implements the steps of the foregoing method after being loaded and executed on the processor, and for principle analysis, reference may be made to the related description of the steps of the foregoing method, which will not be described herein again.
The computer-readable storage medium includes, for example: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and such modifications and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for mass spectrometry spectrum analysis, comprising:
acquiring a target spectrogram, wherein the target spectrogram is used for reflecting mass spectrum data of the protein peptide fragment;
determining spectrogram features of the target spectrogram based on the mass-to-charge ratio of the target spectrogram;
inputting the spectrogram characteristics of the target spectrogram into a preset spectrum resolving model to obtain the analysis category of the target spectrogram, wherein the analysis category is used for reflecting the peptide fragment sequence corresponding to the target spectrogram.
2. The method for mass spectrometry spectrum analysis according to claim 1, wherein said determining the spectrum characteristics of the target spectrum based on the mass-to-charge ratio of the target spectrum comprises:
screening all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain a feature quantity of characteristic mass-to-charge ratios;
and obtaining the spectrogram characteristics of the target spectrogram based on all the characteristic mass-to-charge ratios of the target spectrogram.
3. The method for mass spectrometry spectrogram analysis according to claim 2, wherein the output of said spectrum resolving model comprises a feature matrix obtained by compressing said spectrogram features, and said analysis category corresponding to said feature matrix; the length of the feature matrix is 7.
4. The method of mass spectrometry spectrogram analysis according to claim 3, wherein said spectrum resolving model comprises 4 down-sampling layers for compressing said spectrogram features to obtain said feature matrix;
the screening of all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain a feature quantity of characteristic mass-to-charge ratios comprises:
screening all the mass-to-charge ratios of the target spectrogram based on the intensity values of the mass-to-charge ratios to obtain 112 characteristic mass-to-charge ratios.
5. The method of mass spectrometry spectrogram analysis according to claim 1, wherein said spectrum resolving model is trained by:
obtaining a model training data set, wherein the model training data set comprises a training spectrogram, spectrogram features corresponding to the training spectrogram, and peptide fragment labels correspondingly labeled to the training spectrogram;
inputting spectrogram characteristics of the training spectrogram into the spectrum solution model to obtain a training category of the training spectrogram;
and training the spectrum solving model based on the peptide fragment label of the training spectrogram and the training category of the training spectrogram.
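The training procedure of claim 5 (predict a category, compare it with the peptide fragment label, update the model) can be illustrated with a deliberately small stand-in. The sketch below uses plain softmax regression rather than the down-sampling network described in the claims; the functions, the toy features, and the hyperparameters are all hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights: List[List[float]], features: List[float]) -> List[float]:
    """Class probabilities for one feature vector (linear scores, no bias term)."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def train(samples: List[List[float]], labels: List[int], num_classes: int,
          epochs: int = 50, lr: float = 0.5) -> List[List[float]]:
    """Stochastic gradient descent on the cross-entropy between the predicted
    category distribution and the peptide fragment label."""
    weights = [[0.0] * len(samples[0]) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = predict_proba(weights, x)
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j, xj in enumerate(x):
                    weights[c][j] -= lr * grad * xj
    return weights


# Toy data: two well-separated "peptide classes" (made-up feature vectors).
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [0, 0, 1, 1]
weights = train(train_x, train_y, num_classes=2)
predicted = [max(range(2), key=lambda c, x=x: predict_proba(weights, x)[c])
             for x in train_x]
print(predicted)
```

The point of the sketch is the loop structure; a real spectrum resolution model would replace `predict_proba` with the network forward pass and the manual gradient with a framework optimizer.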
6. The mass spectrum spectrogram analysis method according to claim 5, wherein said obtaining a model training data set comprises:
acquiring a mass spectrum original data set and a spectrum resolving data set, wherein the mass spectrum original data set comprises an original spectrogram, and the spectrum resolving data set comprises a spectrum resolving result corresponding to the original spectrogram;
screening the original spectrogram in the mass spectrum original data set to obtain a filtered spectrogram;
removing the filtered spectrogram in the mass spectrum original data set to obtain a mass spectrum training data set;
acquiring spectrogram features of the mass spectrum training data set;
and labeling all original spectrograms in the mass spectrum training data set based on the spectrum resolving data set to obtain peptide fragment labels of the mass spectrum training data set.
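The data set construction in claim 6 amounts to removing the filtered spectrograms and then turning each remaining spectrogram's resolved peptide sequence into a class label. A minimal sketch, assuming spectrograms are referenced by string identifiers and labels are integer class indices (both hypothetical choices, as are the peptide strings):

```python
from typing import Dict, List, Set, Tuple


def build_training_set(
    raw_ids: List[str],
    filtered_ids: Set[str],
    peptide_by_id: Dict[str, str],
) -> Tuple[List[str], Dict[str, int]]:
    """Drop filtered spectrograms, then label the rest by peptide sequence.

    `peptide_by_id` plays the role of the spectrum resolving data set:
    it maps a spectrogram id to its resolved peptide sequence.
    """
    kept = [sid for sid in raw_ids if sid not in filtered_ids]
    # One class per distinct peptide sequence, indexed in sorted order.
    classes = sorted({peptide_by_id[sid] for sid in kept})
    class_index = {peptide: i for i, peptide in enumerate(classes)}
    labels = {sid: class_index[peptide_by_id[sid]] for sid in kept}
    return kept, labels


raw_ids = ["s1", "s2", "s3", "s4"]
filtered_ids = {"s3"}  # e.g. a spectrogram with a blank resolving result
peptide_by_id = {"s1": "PEPTIDEK", "s2": "ELVISK", "s3": "", "s4": "PEPTIDEK"}
kept, labels = build_training_set(raw_ids, filtered_ids, peptide_by_id)
print(kept, labels)
```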
7. The mass spectrum spectrogram analysis method according to claim 6, wherein said screening the original spectrograms in the mass spectrum original data set to obtain filtered spectrograms comprises:
screening out, based on the spectrum resolving results of the original spectrograms, the original spectrograms whose spectrum resolving result is a reverse (decoy) library match, to obtain filtered spectrograms;
and/or screening out, based on the spectrum resolving results of the original spectrograms, the original spectrograms whose spectrum resolving result is blank, to obtain filtered spectrograms;
and/or screening out, based on the spectrum resolving results of the original spectrograms, the original spectrograms whose score is smaller than a score threshold, to obtain filtered spectrograms;
and/or screening out, based on the mass-to-charge ratios of the original spectrograms, the original spectrograms whose number of mass-to-charge ratios is smaller than a feature threshold, to obtain filtered spectrograms, wherein the feature threshold is smaller than or equal to the number of mass-to-charge ratios contained in the spectrogram features;
and/or grouping the original spectrograms in the mass spectrum original data set based on the parent ion valences of the original spectrograms to obtain valence classification groups,
and screening out the valence classification groups whose number of spectrograms is smaller than a valence spectrogram-number threshold, to obtain filtered spectrograms;
and/or grouping the original spectrograms in the mass spectrum original data set based on the peptide fragment sequences in the spectrum resolving results to obtain peptide fragment classification groups,
and screening out the peptide fragment classification groups whose number of spectrograms is smaller than a peptide fragment spectrogram-number threshold, to obtain filtered spectrograms.
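The filtering criteria of claim 7 can be combined in one pass, as sketched below. The claim joins the criteria with "and/or", so applying all six together is only one of the permitted combinations. The record fields, threshold names, and example values are hypothetical, and the group sizes are counted on the unfiltered data set, which is an assumption.

```python
from collections import Counter
from typing import Dict, List


def select_filtered(records: List[Dict], score_threshold: float,
                    feature_threshold: int, min_per_charge: int,
                    min_per_peptide: int) -> List[str]:
    """Return ids of spectrograms to remove from the mass spectrum original data set."""
    # Group sizes by parent ion valence (charge) and by peptide sequence.
    charge_counts = Counter(r["charge"] for r in records)
    peptide_counts = Counter(r["peptide"] for r in records if r["peptide"])
    filtered = []
    for r in records:
        if (
            r["is_decoy"]                                        # reverse (decoy) library match
            or not r["peptide"]                                  # blank resolving result
            or r["score"] < score_threshold                      # low score
            or r["num_peaks"] < feature_threshold                # too few m/z values
            or charge_counts[r["charge"]] < min_per_charge       # small valence group
            or peptide_counts[r["peptide"]] < min_per_peptide    # small peptide group
        ):
            filtered.append(r["id"])
    return filtered


def rec(sid, peptide, is_decoy, score, charge, num_peaks):
    """Helper that builds one toy resolving-result record."""
    return {"id": sid, "peptide": peptide, "is_decoy": is_decoy,
            "score": score, "charge": charge, "num_peaks": num_peaks}


records = [
    rec("a", "PEPTIDEK", False, 0.90, 2, 150),
    rec("b", "PEPTIDEK", False, 0.80, 2, 130),
    rec("c", "KEDITPEP", True, 0.95, 2, 140),   # decoy match
    rec("d", None, False, 0.00, 2, 120),        # blank result
    rec("e", "PEPTIDEK", False, 0.20, 2, 125),  # score below threshold
    rec("f", "PEPTIDEK", False, 0.90, 2, 60),   # fewer than 112 m/z values
    rec("g", "PEPTIDEK", False, 0.90, 5, 150),  # only spectrogram with charge 5
    rec("h", "ELVISK", False, 0.90, 2, 150),    # only spectrogram of this peptide
]
filtered = select_filtered(records, score_threshold=0.5, feature_threshold=112,
                           min_per_charge=2, min_per_peptide=2)
print(filtered)
```

Setting `feature_threshold` to 112 or less keeps every retained spectrogram long enough to yield the 112 feature mass-to-charge ratios of claim 4.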
8. A mass spectrum spectrogram analysis device, comprising:
a spectrogram acquisition module (1) for acquiring a target spectrogram, wherein the target spectrogram corresponds to a peptide fragment of a protein;
a feature extraction module (2) for determining spectrogram features of the target spectrogram based on the mass-to-charge ratio of the target spectrogram;
and a model analysis module (3) for inputting the spectrogram features of the target spectrogram into a preset spectrum resolution model to obtain an analysis category of the target spectrogram, wherein the analysis category reflects the peptide fragment sequence corresponding to the target spectrogram.
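The three modules of claim 8 form a simple pipeline: acquire, extract features, classify. The sketch below wires them together as interchangeable callables. The class name `SpectrogramAnalyzer`, the in-memory store, and the toy model are hypothetical placeholders, not the claimed device.

```python
from typing import Callable, Dict, List, Tuple

Peak = Tuple[float, float]  # hypothetical (m/z, intensity) pair


class SpectrogramAnalyzer:
    """Chains acquisition (1), feature extraction (2), and model analysis (3)."""

    def __init__(self,
                 acquire: Callable[[str], List[Peak]],
                 extract: Callable[[List[Peak]], List[float]],
                 model: Callable[[List[float]], int]) -> None:
        self.acquire = acquire
        self.extract = extract
        self.model = model

    def analyze(self, source: str) -> int:
        spectrogram = self.acquire(source)     # module (1)
        features = self.extract(spectrogram)   # module (2)
        return self.model(features)            # module (3): analysis category


# Placeholder implementations with made-up data.
store: Dict[str, List[Peak]] = {
    "scan-1": [(175.1, 80.0), (262.2, 55.0), (101.1, 5.0)],
}


def acquire(source: str) -> List[Peak]:
    return store[source]


def extract(peaks: List[Peak]) -> List[float]:
    # Keep the two most intense peaks and return their m/z values in ascending order.
    top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:2]
    return sorted(mz for mz, _ in top)


def toy_model(features: List[float]) -> int:
    # Stand-in for a trained spectrum resolution model.
    return int(features[0] > 200.0)


analyzer = SpectrogramAnalyzer(acquire, extract, toy_model)
category = analyzer.analyze("scan-1")
print(category)
```

Passing the modules in as callables keeps the trained model replaceable without touching acquisition or feature extraction.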
9. An intelligent terminal, comprising a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the mass spectrum spectrogram analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the mass spectrum spectrogram analysis method according to any one of claims 1 to 7.
CN202210784151.5A 2022-07-05 2022-07-05 Mass spectrum spectrogram analyzing method and device and intelligent terminal Pending CN115206422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210784151.5A CN115206422A (en) 2022-07-05 2022-07-05 Mass spectrum spectrogram analyzing method and device and intelligent terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210784151.5A CN115206422A (en) 2022-07-05 2022-07-05 Mass spectrum spectrogram analyzing method and device and intelligent terminal

Publications (1)

Publication Number Publication Date
CN115206422A true CN115206422A (en) 2022-10-18

Family

ID=83578804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210784151.5A Pending CN115206422A (en) 2022-07-05 2022-07-05 Mass spectrum spectrogram analyzing method and device and intelligent terminal

Country Status (1)

Country Link
CN (1) CN115206422A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597227A (en) * 2023-05-29 2023-08-15 广东省麦思科学仪器创新研究院 Mass spectrogram analysis method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111312329B (en) Transcription factor binding site prediction method based on deep convolution automatic encoder
Springenberg et al. Improving deep neural networks with probabilistic maxout units
CN111798921A (en) RNA binding protein prediction method and device based on multi-scale attention convolution neural network
US20240233313A1 (en) Model training method, image processing method, computing and processing device and non-transient computer-readable medium
WO2011034596A1 (en) High-throughput biomarker segmentation utilizing hierarchical normalized cuts
CN112308825B (en) SqueezeNet-based crop leaf disease identification method
CN110930378B (en) Emphysema image processing method and system based on low data demand
CN115588467B (en) Intracranial aneurysm rupture key gene screening method based on multilayer perceptron
CN111598844B (en) Image segmentation method and device, electronic equipment and readable storage medium
Cadow et al. On the feasibility of deep learning applications using raw mass spectrometry data
CN115206422A (en) Mass spectrum spectrogram analyzing method and device and intelligent terminal
CN113160886B (en) Cell type prediction system based on single cell Hi-C data
CN113838018B (en) Cnn-former-based liver fibrosis lesion detection model training method and system
CN113284563B (en) Screening method and system for protein mass spectrum quantitative analysis result
Barrera et al. Automatic normalized digital color staining in the recognition of abnormal blood cells using generative adversarial networks
Bouilhol et al. DeepSpot: A deep neural network for RNA spot enhancement in single-molecule fluorescence in-situ hybridization microscopy images
Yildiz et al. Nuclei segmentation in colon histology images by using the deep CNNs: a U-net based multi-class segmentation analysis
CN114664391A (en) Molecular feature determination method, related device and equipment
CN114973245B (en) Extracellular vesicle classification method, device, equipment and medium based on machine learning
CN116842996A (en) Space transcriptome method and device based on depth compressed sensing
CN115527193A (en) Chinese medicinal material type identification method
CN116091763A (en) Apple leaf disease image semantic segmentation system, segmentation method, device and medium
CN113392916A (en) Method and system for detecting nutritional ingredients of bamboo shoots based on hyperspectral image and storage medium
CN117912591B (en) Kinase-drug interaction prediction method based on deep contrast learning
CN113851195A (en) Compound-target protein binding prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination