CN116821768A - System for classifying and predicting mass spectrum data based on deep learning - Google Patents


Publication number
CN116821768A
Authority
CN
China
Prior art keywords
mass spectrum
data
spectrum data
deep learning
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310854608.XA
Other languages
Chinese (zh)
Inventor
赵雪龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310854608.XA
Publication of CN116821768A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2131 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on a transform domain processing, e.g. wavelet transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention belongs to the technical field of classification and prediction of mass spectrum data, and in particular relates to a system for classifying and predicting mass spectrum data based on deep learning. A user inputs mass spectrum data through a mass spectrum data input module, and the data are automatically transmitted to a deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to a mass spectrum data output module; when the mass spectrum data output module receives the data result, the result is automatically displayed on a terminal database visualization screen; the user gives scoring feedback in a mass spectrum analysis feedback module according to the data result; a mass spectrum record database records all data generated during the mass spectrum data analysis process. On the one hand, features need not be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved; on the other hand, the scheme provides fine adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.

Description

System for classifying and predicting mass spectrum data based on deep learning
Technical Field
The invention belongs to the technical field of classification and prediction of mass spectrum data, and particularly relates to a system for classifying and predicting mass spectrum data based on deep learning.
Background
Mass spectrometry is a technique for analyzing the chemical composition of the molecules in a sample and is widely applied in medicine, biology, environmental science, food science and other fields; with the development of mass spectrometry technology, mass spectrum data have become one of the important means of analyzing and studying molecules; however, in the analysis of mass spectrum data, traditional methods generally require features to be extracted from the data manually, which demands the participation of professionals and often captures only part of the features of the data; deep learning techniques based on neural networks have therefore been introduced to extract features quickly;
existing mass spectrum data need to be classified and predicted, yet, on the one hand, manual feature extraction or existing extraction methods are inaccurate and incomplete, cannot handle complex mass spectrum data, increase labor cost and often fail to achieve good results; on the other hand, fine adjustment parameters suited to the deep learning model are lacking, so the feedback mechanism cannot be deeply integrated with model adjustment.
Disclosure of Invention
In view of the above technical problems, the invention provides a system for classifying and predicting mass spectrum data based on deep learning; complex mass spectrum data can be processed without manual feature extraction, labor cost is reduced, and good results are often achieved; the system also provides fine adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
The invention is realized in the following way:
the invention provides a system for classifying and predicting mass spectrum data based on deep learning, which employs a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: the user inputs mass spectrum data through the mass spectrum data input module, and the data are automatically transmitted to the deep learning mass spectrum data analysis platform;
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
According to one implementation of this aspect of the invention, in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data are automatically transmitted to the deep learning mass spectrum data analysis platform; the specific operation method is as follows:
the mass spectrum data take two forms, a mass spectrogram and a mass spectrum peak table; the parameters in the mass spectrogram include the mass-to-charge ratio, the ion signal intensity, the mass spectrum signal, etc.; the parameters in the mass spectrogram are labeled with a symbol indexed by β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value β can take; the parameters in the mass spectrum peak table include the peak height, the peak width, the peak area, etc.; the parameters in the mass spectrum peak table are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value s can take.
According to one implementation of this aspect of the invention, in Step 2 the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module; the operation method is as follows:
the deep learning mass spectrum data analysis platform analyzes the mass spectrum data as follows:
step1: data preprocessing, comprising denoising, normalization, alignment and similar operations; new mass spectrum data must be preprocessed to ensure data quality and reliability while preserving the effective signal features; the mass spectrum signal in the mass spectrum data is selected for preprocessing in preparation for feature extraction; in mass spectrum data processing, instrument noise and experimental conditions often introduce noise, background signals and similar problems into the data; the data are therefore preprocessed before deep learning to remove noise and background signals;
step2: feature extraction, in which the preprocessed data are converted into feature vectors for training the deep learning model; in mass spectrum data processing, a wavelet transform is selected for feature extraction so as to better support the classification and prediction of mass spectrum data;
the mass spectrum signal is processed by the wavelet transform as follows:
given a mass spectrum signal x(n) of length N, the discrete wavelet transform decomposes the signal into M levels of wavelet coefficients, each level representing the signal features at a different scale;
according to the mathematical formula:
wherein h(k) and g(k) are the high-pass and low-pass wavelet basis functions, respectively; $d_{ij}$ and $S_{ij}$ respectively denote the low-frequency and high-frequency coefficients of the i-th level of decomposition, j being the position of the coefficient within the level; i = 1, 2, ..., M; j = 0, 1, ..., 2 i *j;
after the wavelet coefficients are obtained, their statistical features are calculated; to better bring out the characteristics of the mass spectrum signal, the mean, variance, skewness and kurtosis are used as statistical features reflecting different properties of the signal; the specific formulas are as follows:
wherein y(n) denotes the low-frequency and high-frequency coefficients $d_{ij}$ and $S_{ij}$, for which the mean $u$, the variance $\sigma^2$, the skewness $s$ and the kurtosis $k$ are calculated respectively;
step3: model training, in which the statistical features of the wavelet coefficients are taken as the feature vector and input into a fully connected neural network, whose neurons map the feature vector onto class labels, and training uses a cross-entropy loss function; the training data are historical mass spectrum signals whose classes are known, on which classification and prediction are based; common classification or clustering methods include support vector machines, K-means clustering, and the like; prediction compares unknown mass spectrum data against a known mass spectrum library to determine the compounds present in a sample, and the prediction model outputs a specific predicted value in the range 0% to 100%; the comparison is made on the difference ratio between mass spectrum signals: the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more they differ;
step4: model evaluation, in which indexes such as the accuracy, precision and recall of the model are calculated to evaluate its performance.
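The feature extraction described in step2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not name a wavelet, so the Haar wavelet is assumed; the moments are computed in their population form; and the names `haar_step`, `moments` and `wavelet_features` are hypothetical. The approximation band plays the role of the low-frequency coefficients and the detail band the role of the high-frequency coefficients.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One DWT level with the Haar wavelet (assumed): (approximation, detail)."""
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd lengths by repeating the last sample
    root2 = math.sqrt(2.0)
    approx = [(samples[k] + samples[k + 1]) / root2 for k in range(0, len(samples), 2)]
    detail = [(samples[k] - samples[k + 1]) / root2 for k in range(0, len(samples), 2)]
    return approx, detail


def moments(y: Sequence[float]) -> List[float]:
    """Mean, variance, skewness and kurtosis of y (population forms, assumed)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    if var == 0.0:
        return [mean, 0.0, 0.0, 0.0]  # constant band: report zero for higher moments
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in y) / n
    kurt = sum(((v - mean) / std) ** 4 for v in y) / n
    return [mean, var, skew, kurt]


def wavelet_features(signal: Sequence[float], levels: int) -> List[float]:
    """Concatenate the four moments of both bands at each decomposition level."""
    features: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features += moments(approx) + moments(detail)
    return features


spectrum = [0.0, 0.1, 0.9, 5.0, 0.8, 0.1, 0.0, 0.2]  # toy intensities, not real data
vector = wavelet_features(spectrum, levels=2)
print(len(vector))  # 2 levels * 2 bands * 4 statistics = 16
```

The resulting vector is what a classifier such as the fully connected network of step3 would take as input; a different wavelet or a different normalization of the moments would change the values but not the overall shape of the pipeline.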
According to one implementation of this aspect of the invention, in Step 3, when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the operation method is as follows:
when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line chart, a bar chart, a scatter plot and the like; the data information in the visualization of the data result is labeled $\eta_\omega$, where ω = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value ω can take.
According to one implementation of this aspect of the invention, in Step 4 the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the specific operation method is as follows:
the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, the score lying in the interval [1, 10]; the larger the score, the more accurate the data result, and the smaller the score, the less accurate the data result; the score is converted into a corresponding weight, which is used for model adjustment during training with the cross-entropy loss function;
let x be the score and w the weight used for deep learning adjustment;
according to the mathematical formula:
wherein the value of w(x) lies in the range (0, 1.773), which is suitable for fine adjustment of the model.
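How a score-derived weight might enter the cross-entropy loss can be sketched as follows. The formula for w(x) is not reproduced in this text, so `score_to_weight` below is a hypothetical linear map chosen only so that its outputs fall inside the stated range (0, 1.773); scaling the per-sample loss by the weight is likewise an assumption about how the weight is applied.

```python
import math
from typing import Sequence

W_MAX = 1.773  # upper end of the weight range stated in the text


def score_to_weight(score: float) -> float:
    """Hypothetical linear map from a score in [1, 10] to a weight in (0, W_MAX).

    This is NOT the patent's w(x), which is not reproduced in the text.
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError("score must lie in [1, 10]")
    return W_MAX * score / 11.0


def weighted_cross_entropy(probs: Sequence[float], true_class: int, weight: float) -> float:
    """Cross-entropy of one sample, scaled by a feedback-derived weight (assumed usage)."""
    eps = 1e-12  # guards against log(0)
    return -weight * math.log(max(probs[true_class], eps))


# A sample the model classified with probability 0.7, rated 8 by the user.
loss = weighted_cross_entropy([0.2, 0.7, 0.1], true_class=1, weight=score_to_weight(8))
print(round(loss, 4))
```

Under this sketch a weight near zero nearly removes a sample's influence on the gradient, while a weight above 1 amplifies it; whether high scores should amplify or dampen the update is a design choice the text does not settle.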
According to one implementation of this aspect of the invention, in Step 5 the mass spectrum record database records all data generated during the mass spectrum data analysis process; the operation method is as follows:
the mass spectrum record database records all data generated during the mass spectrum data analysis process, including the uploaded mass spectrum data, the mass spectrum score data results, the visualization charts of the data results, and the like; all data in the mass spectrum data analysis process are labeled $L_M$, where M = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value M can take.
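A record database of the kind described above could be sketched with SQLite from the Python standard library. The table name, the column names and the JSON payload encoding are all assumptions made for illustration; the text does not specify a storage engine or schema. Note that SQLite numbers rows from 1, whereas the text indexes records from 0.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_record_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical record database with one table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS analysis_record (
               m INTEGER PRIMARY KEY,   -- record index, assigned by SQLite from 1
               kind TEXT NOT NULL,      -- e.g. 'upload', 'result', 'chart', 'score'
               payload TEXT NOT NULL    -- JSON-encoded content of the record
           )"""
    )
    return conn


def add_record(conn: sqlite3.Connection, kind: str, payload: Dict[str, Any]) -> int:
    """Store one record and return its index."""
    cur = conn.execute(
        "INSERT INTO analysis_record (kind, payload) VALUES (?, ?)",
        (kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def fetch_records(conn: sqlite3.Connection) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return all records in insertion order with decoded payloads."""
    rows = conn.execute("SELECT m, kind, payload FROM analysis_record ORDER BY m")
    return [(m, kind, json.loads(payload)) for m, kind, payload in rows]


db = open_record_db()
add_record(db, "upload", {"mz": [100.1, 150.2], "intensity": [5.0, 12.5]})
add_record(db, "score", {"value": 8})
print(len(fetch_records(db)))  # 2
```

Keeping uploads, results, charts and scores in one table keyed by a running index mirrors the single label $L_M$ used in the text; a production schema would more likely split them into separate tables.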
The cloud system analyzes and processes the data on the deep learning mass spectrum data analysis platform and sends the data result to the mass spectrum data output module; the mass spectrum data are classified and predicted through cloud computing and analysis;
based on any one of the above aspects, the invention has the following beneficial effects:
1. In the invention, the user inputs mass spectrum data through the mass spectrum data input module, and the data are automatically transmitted to the deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to the mass spectrum data output module; on the one hand, features need not be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved.
2. When the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the mass spectrum record database records all data generated during the mass spectrum data analysis process; on the other hand, the scheme provides fine adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
Drawings
The invention is further described with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way, and a person of ordinary skill in the art can derive other drawings from the following drawings without inventive effort.
FIG. 1 is a flow chart of the steps of the method of the present invention.
Detailed Description
The foregoing is merely illustrative of the principles of the invention; those skilled in the art may make various modifications, additions and substitutions to the described embodiments without departing from the principles of the invention or exceeding the scope defined in the accompanying claims.
Referring to FIG. 1, a system for classifying and predicting mass spectrum data based on deep learning employs a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: the user inputs mass spectrum data through the mass spectrum data input module, and the data are automatically transmitted to the deep learning mass spectrum data analysis platform;
in a specific embodiment of the invention, in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data are automatically transmitted to the deep learning mass spectrum data analysis platform; the specific operation method is as follows:
the mass spectrum data take two forms, a mass spectrogram and a mass spectrum peak table; the parameters in the mass spectrogram include the mass-to-charge ratio, the ion signal intensity, the mass spectrum signal, etc.; the parameters in the mass spectrogram are labeled with a symbol indexed by β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value β can take; the parameters in the mass spectrum peak table include the peak height, the peak width, the peak area, etc.; the parameters in the mass spectrum peak table are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value s can take.
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
in a specific embodiment of the invention, in Step 2 the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module; the operation method is as follows:
the deep learning mass spectrum data analysis platform analyzes the mass spectrum data as follows:
step1: data preprocessing, comprising denoising, normalization, alignment and similar operations; new mass spectrum data must be preprocessed to ensure data quality and reliability while preserving the effective signal features; the mass spectrum signal in the mass spectrum data is selected for preprocessing in preparation for feature extraction; in mass spectrum data processing, instrument noise and experimental conditions often introduce noise, background signals and similar problems into the data; the data are therefore preprocessed before deep learning to remove noise and background signals;
step2: feature extraction, in which the preprocessed data are converted into feature vectors for training the deep learning model; in mass spectrum data processing, a wavelet transform is selected for feature extraction so as to better support the classification and prediction of mass spectrum data;
the mass spectrum signal is processed by the wavelet transform as follows:
given a mass spectrum signal x(n) of length N, the discrete wavelet transform decomposes the signal into M levels of wavelet coefficients, each level representing the signal features at a different scale;
according to the mathematical formula:
wherein h(k) and g(k) are the high-pass and low-pass wavelet basis functions, respectively; $d_{ij}$ and $S_{ij}$ respectively denote the low-frequency and high-frequency coefficients of the i-th level of decomposition, j being the position of the coefficient within the level; i = 1, 2, ..., M; j = 0, 1, ..., 2 i *j;
after the wavelet coefficients are obtained, their statistical features are calculated; to better bring out the characteristics of the mass spectrum signal, the mean, variance, skewness and kurtosis are used as statistical features reflecting different properties of the signal; the specific formulas are as follows:
wherein y(n) denotes the low-frequency and high-frequency coefficients $d_{ij}$ and $S_{ij}$, for which the mean $u$, the variance $\sigma^2$, the skewness $s$ and the kurtosis $k$ are calculated respectively;
step3: model training, in which the statistical features of the wavelet coefficients are taken as the feature vector and input into a fully connected neural network, whose neurons map the feature vector onto class labels, and training uses a cross-entropy loss function; the training data are historical mass spectrum signals whose classes are known, on which classification and prediction are based; common classification or clustering methods include support vector machines, K-means clustering, and the like; prediction compares unknown mass spectrum data against a known mass spectrum library to determine the compounds present in a sample, and the prediction model outputs a specific predicted value in the range 0% to 100%; the comparison is made on the difference ratio between mass spectrum signals: the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more they differ;
step4: model evaluation, in which indexes such as the accuracy, precision and recall of the model are calculated to evaluate its performance.
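The evaluation indexes named in step4 above can be computed from true and predicted labels as in the following sketch. It assumes a binary setting with one class treated as positive; the function name `binary_metrics` and the toy labels are hypothetical, and a multi-class model would need the precision and recall computed per class.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels: 1 = compound present, 0 = compound absent.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics["accuracy"])  # 0.6
```

Accuracy alone can mislead when one class dominates the spectra library, which is why precision and recall are reported alongside it.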
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
in a specific embodiment of the invention, in Step 3, when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the operation method is as follows:
when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line chart, a bar chart, a scatter plot and the like; the data information in the visualization of the data result is labeled $\eta_\omega$, where ω = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value ω can take.
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
in a specific embodiment of the invention, in Step 4 the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the specific operation method is as follows:
the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, the score lying in the interval [1, 10]; the larger the score, the more accurate the data result, and the smaller the score, the less accurate the data result; the score is converted into a corresponding weight, which is used for model adjustment during training with the cross-entropy loss function;
let x be the score and w the weight used for deep learning adjustment;
according to the mathematical formula:
wherein the value of w(x) lies in the range (0, 1.773), which is suitable for fine adjustment of the model.
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
In a specific embodiment of the invention, in Step 5 the mass spectrum record database records all data generated during the mass spectrum data analysis process; the operation method is as follows:
the mass spectrum record database records all data generated during the mass spectrum data analysis process, including the uploaded mass spectrum data, the mass spectrum score data results, the visualization charts of the data results, and the like; all data in the mass spectrum data analysis process are labeled $L_M$, where M = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value M can take.
The cloud system analyzes and processes the data on the deep learning mass spectrum data analysis platform and sends the data result to the mass spectrum data output module; the mass spectrum data are classified and predicted through cloud computing and analysis;
in the invention, the user inputs mass spectrum data through the mass spectrum data input module, and the data are automatically transmitted to the deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to the mass spectrum data output module; when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the mass spectrum record database records all data generated during the mass spectrum data analysis process; on the one hand, features need not be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved; on the other hand, the scheme provides fine adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
The foregoing merely illustrates the structure of the invention; those skilled in the art may make various modifications, additions and substitutions to the described embodiments without departing from the structure of the invention or exceeding the scope defined in the accompanying claims.

Claims (7)

1. A system for classifying and predicting mass spectrum data based on deep learning, employing a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: the user inputs mass spectrum data through the mass spectrum data input module, and the data are automatically transmitted to the deep learning mass spectrum data analysis platform;
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
2. The deep learning based classification and prediction system according to claim 1, wherein: in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data are automatically transmitted to the deep learning mass spectrum data analysis platform; the specific operation method is as follows:
the mass spectrum data take two forms, a mass spectrogram and a mass spectrum peak table; the parameters in the mass spectrogram include the mass-to-charge ratio, the ion signal intensity, the mass spectrum signal, etc.; the parameters in the mass spectrogram are labeled with a symbol indexed by β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value β can take; the parameters in the mass spectrum peak table include the peak height, the peak width, the peak area, etc.; the parameters in the mass spectrum peak table are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value s can take.
3. The deep learning based classification and prediction system as claimed in claim 1, wherein: in the step2, analysis and data processing are carried out according to a deep learning mass spectrum data analysis platform, and a data result is sent to a mass spectrum data output module; the operation method comprises the following steps:
the deep learning mass spectrum data analysis platform analyzes mass spectrum data:
step1: the data preprocessing comprises the steps of denoising, normalizing, aligning and the like, wherein the steps of preprocessing new mass spectrum data are needed to ensure the quality and the reliability of the data and keep effective signal characteristics; selecting mass spectrum signals in mass spectrum data as pretreatment and preparing for extracting characteristic values;
step2: feature extraction, namely converting the preprocessed data into feature vectors for training a deep learning model when the feature extraction is carried out; in the mass spectrum data processing, in order to better show the classification and prediction of mass spectrum data, selecting a wavelet transformation method for characteristic value selection;
processing mass spectrum signals based on a wavelet transformation method:
given a mass spectrum signal x(n) of length N, the discrete wavelet transform is used to decompose the signal into M layers of wavelet coefficients, which represent the signal features at different scales;
according to the mathematical formula:
wherein h(k) and g(k) are the high-pass and low-pass wavelet basis functions, respectively; d_ij and S_ij respectively represent the low-frequency and high-frequency coefficients of the i-th layer decomposition; j is the position of the coefficient within the layer; i = 1, 2, ..., M; j = 0, 1, ..., 2^i*j;
after the wavelet coefficients are obtained, their statistical features are calculated; to better highlight the feature values of the mass spectrum signal, the mean, variance, skewness, and kurtosis are used as statistical features reflecting different properties of the signal, and the specific formula is as follows:
wherein y(n) denotes the low-frequency and high-frequency coefficients d_ij and S_ij, for each of which the mean u, the variance σ^2, the skewness s, and the kurtosis k are calculated;
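The wavelet feature extraction can be sketched as follows. The claim does not say which wavelet is used, so the Haar wavelet here is an assumption, as are the 1/N (population) normalization of the moments and the use of non-excess kurtosis. All function names are hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Return [(low_1, high_1), ..., (low_M, high_M)] for a Haar DWT.

    The Haar wavelet is an assumption; the source does not name a wavelet.
    """
    low = list(signal)
    layers = []
    for _ in range(levels):
        if len(low) < 2:
            break
        if len(low) % 2:  # pad odd lengths by repeating the last sample
            low = low + [low[-1]]
        pairs = list(zip(low[0::2], low[1::2]))
        high = [(a - b) / math.sqrt(2) for a, b in pairs]
        low = [(a + b) / math.sqrt(2) for a, b in pairs]
        layers.append((low, high))
    return layers


def moments(y):
    """Mean, population variance, skewness, and kurtosis of a coefficient list."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in y) / n
    kurt = sum(((v - mean) / std) ** 4 for v in y) / n
    return mean, var, skew, kurt


def wavelet_features(signal, levels):
    """Concatenate the four statistics of every low/high band of every layer."""
    features = []
    for low, high in haar_dwt(signal, levels):
        features.extend(moments(low))
        features.extend(moments(high))
    return features
```

With M layers the feature vector has 8 × M entries (four statistics for the low-frequency band and four for the high-frequency band of each layer), so its length does not depend on the signal length N.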
step3: model training, in which the statistical features of the wavelet coefficients are extracted as feature vectors and input into a fully connected neural network, whose neurons map the feature vectors onto class labels, and training uses a cross-entropy loss function; in model training, classification and prediction are based on historically known mass spectrum signals; common classification or clustering methods include support vector machines, K-means clustering, and the like; the prediction method compares unknown mass spectrum data with a library of known mass spectra to determine the compounds present in a sample, and the prediction model outputs a specific prediction value in the range of 0% to 100%; the comparison is made on the difference ratio of the mass spectrum signals, wherein the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more different the signals;
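A minimal sketch of training with a cross-entropy loss is given below. For brevity it uses a single fully connected layer followed by softmax, with no hidden layers, which is a simplification of the network described in the claim. The learning rate, epoch count, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, x):
    """One fully connected layer followed by softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[label], 1e-12))


def train(features, labels, n_classes, epochs=200, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = forward(weights, biases, x)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                grad = probs[c] - (1.0 if c == y else 0.0)
                for k in range(n_features):
                    weights[c][k] -= lr * grad * x[k]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return (predicted class, confidence in percent)."""
    probs = forward(weights, biases, x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, 100.0 * probs[best]
```

The percentage returned by `predict` is one possible reading of the 0% to 100% prediction value mentioned in the claim; the claim itself does not define how that value is computed.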
step4: model evaluation, in which indexes such as the accuracy, precision, and recall of the model are calculated to evaluate its performance.
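The evaluation indexes named in step4 can be computed as in the sketch below. It treats one class as the positive class when computing precision and recall; the function name and the `positive` parameter are hypothetical.

```python
def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

For a multi-class problem, one common option is to call `evaluate` once per class and average the per-class precision and recall.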
4. The deep learning-based system for classifying and predicting mass spectrum data as claimed in claim 1, wherein: in step 3, when the mass spectrum data output module receives the data result, the data result is automatically displayed on the terminal database visualization screen, and the operation method comprises the following steps:
when the mass spectrum data output module receives the data result, the data result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line graph, a bar graph, a scatter graph, and the like; the data information in the data result visualization modes is labeled η_ω, wherein ω = 0, 1, 2, ..., σ; σ is a positive integer and represents the maximum value of ω for the data information in the data result visualization modes.
5. The deep learning-based system for classifying and predicting mass spectrum data as claimed in claim 1, wherein: in step 4, the specific operation method for scoring feedback in the mass spectrometry feedback module according to the data result comprises the following steps:
the user gives scoring feedback in the mass spectrometry feedback module according to the data result, wherein the scoring value lies in the range [1, 10]; the larger the value, the more accurate the data result, and the smaller the value, the less accurate the data result; the specific scoring value is converted into a corresponding weight value, which is used to adjust the model during training with the cross-entropy loss function;
let x be the scoring value and w be the weight value used for the deep learning adjustment;
according to the mathematical formula:
wherein the value range of w(x) is (0, 1.773), which is suitable for fine-tuning the model.
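One way a feedback weight could enter cross-entropy training is as a per-sample multiplier on the loss, sketched below. This is an assumption about how the adjustment is applied. The mapping w(x) from score to weight is not reproduced in this text, so the weight is taken as an input and only checked against the stated range (0, 1.773). All names are hypothetical.

```python
import math

W_UPPER = 1.773  # upper end of the open interval stated in the claim


def weighted_cross_entropy(probs, label, w):
    """Cross-entropy of one sample scaled by its feedback weight w."""
    if not 0.0 < w < W_UPPER:
        raise ValueError("weight must lie in the open interval (0, 1.773)")
    return -w * math.log(max(probs[label], 1e-12))


def batch_loss(batch):
    """Weighted mean loss over a batch of (probs, label, w) triples.

    Dividing by the total weight is a design choice of this sketch.
    """
    total_weight = sum(w for _, _, w in batch)
    total_loss = sum(weighted_cross_entropy(p, y, w) for p, y, w in batch)
    return total_loss / total_weight
```

Under this reading, samples whose results received a weight above 1 contribute more to the gradient than samples with a weight below 1, which nudges the model without retraining it from scratch.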
6. The deep learning-based system for classifying and predicting mass spectrum data as claimed in claim 1, wherein: in step 5, the mass spectrum record database records all data in the mass spectrum data analysis process, and the operation method comprises the following steps:
the mass spectrum record database records all data in the mass spectrum data analysis process; all data in the mass spectrum data analysis process comprise mass spectrum uploading data, mass spectrum fraction data results, data result visualization charts, and the like; all data in the mass spectrum data analysis process are labeled L_M, wherein M = 0, 1, 2, ..., σ; σ is a positive integer and represents the maximum value of M among all data in the mass spectrum data analysis process.
7. A cloud system, characterized in that: the deep learning mass spectrum data analysis platform analyzes and processes data and sends the data result to the mass spectrum data output module; mass spectrum data are classified and predicted through cloud computing and analysis, so as to implement the deep learning-based system for classifying and predicting mass spectrum data according to any one of claims 1-6.
CN202310854608.XA 2023-07-12 2023-07-12 System for classifying and predicting mass spectrum data based on deep learning Withdrawn CN116821768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310854608.XA CN116821768A (en) 2023-07-12 2023-07-12 System for classifying and predicting mass spectrum data based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310854608.XA CN116821768A (en) 2023-07-12 2023-07-12 System for classifying and predicting mass spectrum data based on deep learning

Publications (1)

Publication Number Publication Date
CN116821768A true CN116821768A (en) 2023-09-29

Family

ID=88120194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310854608.XA Withdrawn CN116821768A (en) 2023-07-12 2023-07-12 System for classifying and predicting mass spectrum data based on deep learning

Country Status (1)

Country Link
CN (1) CN116821768A (en)

Similar Documents

Publication Publication Date Title
CN108629365B (en) Analysis data analysis device and analysis data analysis method
EP3839942A1 (en) Quality inspection method, apparatus, device and computer storage medium for insurance recording
WO2018121121A1 (en) Method for use in subtracting spectrogram background, method for identifying substance via raman spectrum, and electronic device
CN111370067B (en) LC/GC-MS-oriented metabonomics data quality control method and system
Lee et al. Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method
CN109036466B (en) Emotion dimension PAD prediction method for emotion voice recognition
CN105158200B (en) A kind of modeling method for improving the Qualitative Analysis of Near Infrared Spectroscopy degree of accuracy
CN111428631A (en) Visual identification and sorting method for flight control signals of unmanned aerial vehicle
Liu et al. Function-on-scalar quantile regression with application to mass spectrometry proteomics data
CN113109780B (en) High-resolution range profile target identification method based on complex number dense connection neural network
CN116821768A (en) System for classifying and predicting mass spectrum data based on deep learning
TW201321739A (en) Signal analysis device, signal analysis method and computer program product
CN114858958B (en) Method and device for analyzing mass spectrum data in quality evaluation and storage medium
CN115392375A (en) Intelligent evaluation method and system for multi-source data fusion degree
CN109063767B (en) Near infrared spectrum modeling method based on sample and variable consensus
CN102880861A (en) High-spectrum image classification method based on linear prediction cepstrum coefficient
CN109408498A (en) The identification of time series feature and decomposition method based on eigenmatrix decision tree
CN114694771A (en) Sample classification method, training method of classifier, device and medium
CN112258472B (en) Automatic scoring method for automobile exterior shape
CN112804650A (en) Channel state information data dimension reduction method and intelligent indoor positioning method
CN114742091A (en) Method, system and medium for identifying radar individual radiation based on convolution block attention
CN109829513B (en) Sequential wavelength dispersion X-ray fluorescence spectrum intelligent analysis method
CN112666094A (en) Common toxin recognition system and method
CN111220565A (en) CPLS-based infrared spectrum measuring instrument calibration migration method
CN115015162A (en) Near infrared spectrum model matching method

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20230929
