CN116821768A - System for classifying and predicting mass spectrum data based on deep learning - Google Patents
- Publication number
- CN116821768A (application number CN202310854608.XA)
- Authority
- CN
- China
- Prior art keywords
- mass spectrum
- data
- spectrum data
- deep learning
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2131—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on a transform domain processing, e.g. wavelet transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention belongs to the technical field of mass spectrum data classification and prediction, and in particular relates to a system for classifying and predicting mass spectrum data based on deep learning. A user inputs mass spectrum data through a mass spectrum data input module, and the data is automatically transmitted to a deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to a mass spectrum data output module; when the output module receives the data result, the result is automatically displayed on a terminal database visualization screen; the user gives scoring feedback in a mass spectrum analysis feedback module according to the data result; a mass spectrum record database records all data generated during the analysis process. On the one hand, features do not need to be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved; on the other hand, the scoring feedback provides fine-adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
Description
Technical Field
The invention belongs to the technical field of classification and prediction of mass spectrum data, and particularly relates to a system for classifying and predicting mass spectrum data based on deep learning.
Background
Mass spectrometry is a technique for analyzing the chemical composition of molecules in a sample and is widely applied in medicine, biology, environmental science, food science, and other fields; with the development of mass spectrometry technology, mass spectrum data has become an important means of analyzing and studying molecules; however, traditional methods of mass spectrum data analysis generally require features to be extracted manually, which requires the participation of professionals and often captures only part of the features of the data; deep learning techniques based on neural networks have therefore been introduced to extract features quickly;
on the one hand, existing mass spectrum data needs to be classified and predicted, but manual feature extraction is inaccurate and incomplete, cannot handle complex mass spectrum data, increases labor cost, and often fails to achieve good results; on the other hand, there is a lack of fine-adjustment parameters suited to the deep learning model, so the feedback mechanism cannot be deeply integrated with model adjustment.
Disclosure of Invention
In view of the above technical problems, the invention provides a system for classifying and predicting mass spectrum data based on deep learning; it can process complex mass spectrum data without manual feature extraction, reduces labor cost, and often achieves good results; it also provides fine-adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
The invention is realized in the following way:
the invention provides a system for classifying and predicting mass spectrum data based on deep learning, which employs a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end, and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: a user inputs mass spectrum data through the mass spectrum data input module, and the data is automatically transmitted to the deep learning mass spectrum data analysis platform;
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
According to one implementation of this aspect of the invention, in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data is automatically transmitted to the deep learning mass spectrum data analysis platform; the specific operation method is as follows:
the mass spectrum data takes two forms: a mass spectrum and a mass spectrum peak table; the parameters in the mass spectrum include mass-to-charge ratio, ion signal intensity, mass spectrum signal, etc.; these parameters are labeled with an index β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value of β; the parameters in the mass spectrum peak table include peak height, peak width, peak area, etc.; these parameters are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value of s.
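The two input forms described above could be represented as simple containers. The following is a minimal sketch for illustration only; the class names `MassSpectrum`, `Peak`, and `PeakTable` and their fields are assumptions and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MassSpectrum:
    """Profile-form input: parallel lists of m/z values and intensities."""
    mz: List[float]
    intensity: List[float]

    def __post_init__(self) -> None:
        if len(self.mz) != len(self.intensity):
            raise ValueError("mz and intensity must have the same length")


@dataclass
class Peak:
    """One row of a peak table (hypothetical field set)."""
    mz: float
    height: float
    width: float
    area: float


@dataclass
class PeakTable:
    peaks: List[Peak]

    def base_peak(self) -> Peak:
        """Return the tallest peak in the table."""
        return max(self.peaks, key=lambda p: p.height)


spectrum = MassSpectrum(mz=[100.0, 100.5, 101.0], intensity=[5.0, 42.0, 7.0])
table = PeakTable(peaks=[Peak(100.5, 42.0, 0.2, 8.4), Peak(150.1, 17.0, 0.3, 5.1)])
```

Keeping the two forms as separate types makes it explicit which parameters (signal samples versus per-peak height, width, and area) a later processing step can rely on.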
According to one implementation of this aspect of the invention, in Step 2 the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module; the operation method is as follows:
the deep learning mass spectrum data analysis platform analyzes the mass spectrum data as follows:
step 1: data preprocessing, which includes denoising, normalization, alignment, etc.; new mass spectrum data must be preprocessed to ensure data quality and reliability while retaining effective signal features; the mass spectrum signal in the mass spectrum data is selected for preprocessing in preparation for feature extraction; in mass spectrum data processing, instrument noise and experimental conditions often introduce noise and background signals into the data; the data therefore needs to be preprocessed before deep learning to remove noise and background signals;
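The preprocessing described above can be sketched as follows. The patent does not specify the denoising or normalization algorithms, so the moving-average filter, the constant-baseline subtraction, and the function name `preprocess` are assumptions chosen for brevity; alignment is not covered.

```python
import numpy as np


def preprocess(intensity, window=5):
    """Smooth, baseline-correct, and normalize a 1-D intensity array."""
    x = np.asarray(intensity, dtype=float)
    # Denoise: centered moving average (assumed filter; window should be odd).
    kernel = np.ones(window) / window
    smoothed = np.convolve(x, kernel, mode="same")
    # Background removal: subtract the minimum as a crude constant baseline.
    corrected = smoothed - smoothed.min()
    # Normalize so the largest value is 1 (skip if the signal is flat).
    peak = corrected.max()
    return corrected / peak if peak > 0 else corrected


rng = np.random.default_rng(0)
raw = 10.0 + rng.normal(0.0, 0.5, 200)   # noisy constant background
raw[100] += 50.0                          # one synthetic peak
clean = preprocess(raw)
```

A real pipeline would likely replace the constant baseline with a fitted one and add m/z alignment across spectra; the sketch only shows the order of operations.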
step 2: feature extraction, in which the preprocessed data is converted into feature vectors for training the deep learning model; in mass spectrum data processing, a wavelet transform method is selected for feature extraction so as to better support classification and prediction of the mass spectrum data;
the mass spectrum signal is processed by the wavelet transform method as follows:
given a mass spectrum signal $x(n)$ of length $N$, a discrete wavelet transform decomposes the signal into $M$ layers of wavelet coefficients, which represent the signal features at different scales;
according to the mathematical formula:
where $h(k)$ and $g(k)$ are the high-pass and low-pass wavelet basis functions, respectively; $d_{ij}$ and $S_{ij}$ denote the low-frequency and high-frequency coefficients of the $i$-th decomposition layer, respectively; $j$ is the position of the coefficient within the layer; $i = 1, 2, ..., M$; $j = 0, 1, ..., 2^i*j$;
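A multi-level discrete wavelet decomposition of this kind can be sketched with the Haar filter pair. The choice of the Haar wavelet and the function name `haar_dwt` are assumptions, since the patent does not name a wavelet; each level splits the current low-pass output into a low-pass (approximation) half and a high-pass (detail) half.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Return (approximation, [detail_1, ..., detail_levels]) for a Haar DWT."""
    approx = np.asarray(signal, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        approx = (even + odd) / np.sqrt(2.0)         # low-pass branch
    return approx, details


x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, details = haar_dwt(x, levels=2)
```

Because the Haar pair is orthonormal, the squared coefficients sum to the squared signal, which is a convenient sanity check on any implementation.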
after the wavelet coefficients are obtained, their statistical features are calculated; to better highlight the characteristics of the mass spectrum signal, the mean, variance, skewness, and kurtosis are used as statistical features reflecting different properties of the signal; the specific formula is as follows:
where $y(n)$ denotes the coefficients $d_{ij}$ or $S_{ij}$ (low-frequency and high-frequency, respectively); the mean $u$, variance $\sigma^2$, skewness $s$, and kurtosis $k$ are calculated separately for the low-frequency and the high-frequency coefficients;
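The four statistics can be computed per coefficient array and concatenated into one feature vector. This sketch assumes the population (divide-by-N) definitions and non-excess kurtosis; the patent's exact normalization is not given here, and the function names are hypothetical.

```python
import numpy as np


def coefficient_stats(y):
    """Mean, variance, skewness, and kurtosis of a 1-D coefficient array."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    var = y.var()                      # population variance (divide by N)
    std = np.sqrt(var)
    if std == 0:
        return np.array([mean, 0.0, 0.0, 0.0])
    z = (y - mean) / std
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)             # non-excess kurtosis (normal = 3)
    return np.array([mean, var, skew, kurt])


def feature_vector(coefficient_sets):
    """Concatenate the four statistics of every coefficient array."""
    return np.concatenate([coefficient_stats(c) for c in coefficient_sets])


features = feature_vector([[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]])
```

With M decomposition layers and two coefficient arrays per layer, this yields a fixed-length vector of 8M values regardless of the signal length.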
step 3: model training: the statistical features of the wavelet coefficients are extracted as feature vectors and input into a fully connected neural network, whose neurons map the feature vectors onto class labels; the network is trained with a cross-entropy loss function; in model training, classification and prediction are based on historically known mass spectrum signals; common classification or clustering methods include support vector machines, K-means clustering, etc.; the prediction method compares unknown mass spectrum data against a library of known mass spectra to determine the compounds present in a sample, and the prediction model outputs a specific prediction value in the range 0% to 100%; the comparison is made on the difference ratio of the mass spectrum signals: the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more different they are;
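Training a fully connected network with a cross-entropy loss can be sketched in plain NumPy. The layer size, tanh activation, learning rate, and toy data below are all assumptions made for illustration; the patent does not specify the architecture.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp(X, y, n_classes, hidden=8, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer network by gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot labels
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden layer
        P = softmax(H @ W2 + b2)              # class probabilities
        losses.append(-np.mean(np.log(P[np.arange(n), y] + 1e-12)))
        dZ2 = (P - Y) / n                     # gradient w.r.t. output logits
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)

    def predict(A):
        return softmax(np.tanh(A @ W1 + b1) @ W2 + b2).argmax(axis=1)

    return predict, losses


# Toy stand-in for wavelet feature vectors: two well-separated clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 4)), rng.normal(2.0, 0.5, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
predict, losses = train_mlp(X, y, n_classes=2)
```

The softmax output for a sample can be read as a per-class value between 0 and 1, which corresponds to the 0% to 100% prediction range mentioned in the text.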
step 4: model evaluation: to evaluate the performance of the model, metrics such as accuracy, precision, and recall are calculated.
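The three metrics named in step 4 can be computed directly from predicted and true labels. This is a minimal sketch for the binary case; the function name `binary_metrics` and the example labels are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

For more than two classes, the same counts would be taken per class and averaged.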
According to one implementation of this aspect of the invention, in Step 3, when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the operation method is as follows:
when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line graph, bar graph, scatter plot, etc.; the data items in the visualization are labeled $\eta_\omega$, where ω = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value of ω.
According to one implementation of this aspect of the invention, in Step 4 the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the specific operation method is as follows:
the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, with the score in the range [1, 10]; the larger the score, the more accurate the data result, and the smaller the score, the less accurate the data result; the score is converted into a corresponding weight value, which is used to adjust the model during training with the cross-entropy loss function;
let $x$ be the score and $w$ the weight value used for the deep learning adjustment;
according to the mathematical formula:
where the value range of $w(x)$ is (0, 1.773), which is suitable for fine adjustment of the model.
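One way a per-sample weight can enter cross-entropy training is as a multiplier on each sample's loss term. The sketch below illustrates that idea only; the mapping `score_to_weight` is a hypothetical linear placeholder and is not the patent's formula $w(x)$, which is not reproduced in this text.

```python
import numpy as np


def score_to_weight(score):
    """HYPOTHETICAL placeholder: map a 1-10 score to a weight in (0, 1].

    This is NOT the patent's w(x); it only lets the example run.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be in [1, 10]")
    return score / 10.0


def weighted_cross_entropy(probs, labels, weights):
    """Mean of per-sample cross-entropy terms, each scaled by its weight."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    picked = probs[np.arange(len(labels)), labels]   # probability of true class
    return float(np.mean(-weights * np.log(picked + 1e-12)))


probs = [[0.9, 0.1], [0.2, 0.8]]        # model outputs for two samples
labels = [0, 1]                          # true classes
weights = [score_to_weight(s) for s in (10, 5)]
loss = weighted_cross_entropy(probs, labels, weights)
```

Whatever mapping is used, samples with larger weights contribute more to the gradient, so user feedback shifts which results the model is fitted to most closely.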
According to one implementation of this aspect of the invention, in Step 5 the mass spectrum record database records all data generated during the mass spectrum data analysis process; the operation method is as follows:
the mass spectrum record database records all data generated during the mass spectrum data analysis process, including the uploaded mass spectrum data, the mass spectrum score data results, the visualization charts of the data results, etc.; these data are labeled $L_M$, where M = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value of M.
The cloud system analyzes and processes the data by means of the deep learning mass spectrum data analysis platform and sends the data result to the mass spectrum data output module; the mass spectrum data is classified and predicted through cloud computing and analysis;
based on any one of the above aspects, the invention has the following beneficial effects:
1. According to the invention, a user inputs mass spectrum data through the mass spectrum data input module, and the data is automatically transmitted to the deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to the mass spectrum data output module; features therefore do not need to be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved.
2. When the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the mass spectrum record database records all data generated during the analysis process; the scoring feedback provides fine-adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
Drawings
The invention will be further described with reference to the accompanying drawings; the embodiments in the drawings do not constitute any limitation of the invention, and one of ordinary skill in the art can obtain other drawings from the following drawings without inventive effort.
FIG. 1 is a flow chart of the steps of the method of the present invention.
Detailed Description
The foregoing is merely illustrative of the principles of the invention; various modifications, additions, and substitutions will be apparent to those of ordinary skill in the art without departing from the principles of the invention or from the scope of the invention as defined in the accompanying claims.
Referring to FIG. 1, a system for classifying and predicting mass spectrum data based on deep learning employs a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end, and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: a user inputs mass spectrum data through the mass spectrum data input module, and the data is automatically transmitted to the deep learning mass spectrum data analysis platform;
in a specific embodiment of the invention, in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data is automatically transmitted to the deep learning mass spectrum data analysis platform; the specific operation method is as follows:
the mass spectrum data takes two forms: a mass spectrum and a mass spectrum peak table; the parameters in the mass spectrum include mass-to-charge ratio, ion signal intensity, mass spectrum signal, etc.; these parameters are labeled with an index β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value of β; the parameters in the mass spectrum peak table include peak height, peak width, peak area, etc.; these parameters are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value of s.
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
in a specific embodiment of the invention, in Step 2 the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module; the operation method is as follows:
the deep learning mass spectrum data analysis platform analyzes the mass spectrum data as follows:
step 1: data preprocessing, which includes denoising, normalization, alignment, etc.; new mass spectrum data must be preprocessed to ensure data quality and reliability while retaining effective signal features; the mass spectrum signal in the mass spectrum data is selected for preprocessing in preparation for feature extraction; in mass spectrum data processing, instrument noise and experimental conditions often introduce noise and background signals into the data; the data therefore needs to be preprocessed before deep learning to remove noise and background signals;
step 2: feature extraction, in which the preprocessed data is converted into feature vectors for training the deep learning model; in mass spectrum data processing, a wavelet transform method is selected for feature extraction so as to better support classification and prediction of the mass spectrum data;
the mass spectrum signal is processed by the wavelet transform method as follows:
given a mass spectrum signal $x(n)$ of length $N$, a discrete wavelet transform decomposes the signal into $M$ layers of wavelet coefficients, which represent the signal features at different scales;
according to the mathematical formula:
where $h(k)$ and $g(k)$ are the high-pass and low-pass wavelet basis functions, respectively; $d_{ij}$ and $S_{ij}$ denote the low-frequency and high-frequency coefficients of the $i$-th decomposition layer, respectively; $j$ is the position of the coefficient within the layer; $i = 1, 2, ..., M$; $j = 0, 1, ..., 2^i*j$;
after the wavelet coefficients are obtained, their statistical features are calculated; to better highlight the characteristics of the mass spectrum signal, the mean, variance, skewness, and kurtosis are used as statistical features reflecting different properties of the signal; the specific formula is as follows:
where $y(n)$ denotes the coefficients $d_{ij}$ or $S_{ij}$ (low-frequency and high-frequency, respectively); the mean $u$, variance $\sigma^2$, skewness $s$, and kurtosis $k$ are calculated separately for the low-frequency and the high-frequency coefficients;
step 3: model training: the statistical features of the wavelet coefficients are extracted as feature vectors and input into a fully connected neural network, whose neurons map the feature vectors onto class labels; the network is trained with a cross-entropy loss function; in model training, classification and prediction are based on historically known mass spectrum signals; common classification or clustering methods include support vector machines, K-means clustering, etc.; the prediction method compares unknown mass spectrum data against a library of known mass spectra to determine the compounds present in a sample, and the prediction model outputs a specific prediction value in the range 0% to 100%; the comparison is made on the difference ratio of the mass spectrum signals: the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more different they are;
step 4: model evaluation: to evaluate the performance of the model, metrics such as accuracy, precision, and recall are calculated.
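The library comparison mentioned in step 3 can be illustrated with a simple difference ratio between two intensity vectors. The patent does not define the ratio, so the normalized absolute difference used here, the function names, and the example library are assumptions; both spectra are assumed to be sampled on the same m/z grid.

```python
import numpy as np


def difference_ratio(a, b):
    """Normalized absolute difference in [0, 1]; 0 means identical signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.abs(a).sum() + np.abs(b).sum()
    return float(np.abs(a - b).sum() / denom) if denom > 0 else 0.0


def best_match(query, library):
    """Return the library entry with the smallest difference ratio."""
    ratios = {name: difference_ratio(query, ref) for name, ref in library.items()}
    name = min(ratios, key=ratios.get)
    return name, 100.0 * (1.0 - ratios[name])   # similarity as a percentage


library = {
    "compound_A": [0.0, 1.0, 0.5, 0.0],
    "compound_B": [1.0, 0.0, 0.0, 0.5],
}
name, similarity = best_match([0.0, 0.9, 0.6, 0.0], library)
```

Expressing the result as 100 × (1 − ratio) gives a value between 0% and 100%, in line with the prediction range stated in the text.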
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
in a specific embodiment of the invention, in Step 3, when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the operation method is as follows:
when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line graph, bar graph, scatter plot, etc.; the data items in the visualization are labeled $\eta_\omega$, where ω = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value of ω.
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
in a specific embodiment of the invention, in Step 4 the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the specific operation method is as follows:
the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, with the score in the range [1, 10]; the larger the score, the more accurate the data result, and the smaller the score, the less accurate the data result; the score is converted into a corresponding weight value, which is used to adjust the model during training with the cross-entropy loss function;
let $x$ be the score and $w$ the weight value used for the deep learning adjustment;
according to the mathematical formula:
where the value range of $w(x)$ is (0, 1.773), which is suitable for fine adjustment of the model.
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
In a specific embodiment of the invention, in Step 5 the mass spectrum record database records all data generated during the mass spectrum data analysis process; the operation method is as follows:
the mass spectrum record database records all data generated during the mass spectrum data analysis process, including the uploaded mass spectrum data, the mass spectrum score data results, the visualization charts of the data results, etc.; these data are labeled $L_M$, where M = 0, 1, 2, ..., σ, and σ is a positive integer denoting the maximum value of M.
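Recording each analysis run could look like the following sketch, which uses Python's built-in `sqlite3` module. The table name `records`, its columns, and the helper names are assumptions made for illustration; the patent does not describe a schema.

```python
import json
import sqlite3


def open_record_db(path=":memory:"):
    """Open (or create) the record database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "uploaded_data TEXT NOT NULL, "
        "result TEXT NOT NULL, "
        "chart_path TEXT, "
        "score INTEGER)"
    )
    return conn


def log_record(conn, uploaded, result, chart_path=None, score=None):
    """Store one analysis run: input data, result, chart reference, user score."""
    cur = conn.execute(
        "INSERT INTO records (uploaded_data, result, chart_path, score) "
        "VALUES (?, ?, ?, ?)",
        (json.dumps(uploaded), json.dumps(result), chart_path, score),
    )
    conn.commit()
    return cur.lastrowid


conn = open_record_db()
row_id = log_record(
    conn,
    {"mz": [100.5], "intensity": [42.0]},
    {"label": "compound_A", "confidence": 0.93},
    chart_path="charts/run1.png",
    score=8,
)
```

Storing the user score alongside the input and result keeps everything needed for later model adjustment in one row.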
The cloud system analyzes and processes the data by means of the deep learning mass spectrum data analysis platform and sends the data result to the mass spectrum data output module; the mass spectrum data is classified and predicted through cloud computing and analysis;
according to the invention, a user inputs mass spectrum data through the mass spectrum data input module, and the data is automatically transmitted to the deep learning mass spectrum data analysis platform; the platform analyzes and processes the data and sends the data result to the mass spectrum data output module; when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen; the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result; the mass spectrum record database records all data generated during the analysis process; on the one hand, features do not need to be extracted manually, complex mass spectrum data can be processed, labor cost is reduced, and good results are often achieved; on the other hand, the scoring feedback provides fine-adjustment parameters suited to the deep learning model, so that the feedback mechanism can be deeply integrated with model adjustment.
The foregoing is merely illustrative of the structure of the invention; those skilled in the art can make various modifications, additions, and substitutions to the described embodiments without departing from the scope of the invention as defined in the accompanying claims.
Claims (7)
1. A system for classifying and predicting mass spectrum data based on deep learning, which employs a mass spectrum data input module, a deep learning mass spectrum data analysis platform, a mass spectrum data output module, a terminal database visualization screen, a mass spectrum analysis feedback module, a terminal information receiving end, and a mass spectrum record database; characterized in that the method comprises the following steps:
Step 1: a user inputs mass spectrum data through the mass spectrum data input module, and the data is automatically transmitted to the deep learning mass spectrum data analysis platform;
Step 2: the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module;
Step 3: when the mass spectrum data output module receives the data result, the result is automatically displayed on the terminal database visualization screen;
Step 4: the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result;
Step 5: the mass spectrum record database records all data generated during the mass spectrum data analysis process.
2. The system for classifying and predicting mass spectrum data based on deep learning as claimed in claim 1, wherein in Step 1 the user inputs mass spectrum data through the mass spectrum data input module and the data is automatically transmitted to the deep learning mass spectrum data analysis platform, and the specific operation method is as follows:
the mass spectrum data takes two forms: a mass spectrum and a mass spectrum peak table; the parameters in the mass spectrum include mass-to-charge ratio, ion signal intensity, mass spectrum signal, etc.; these parameters are labeled with an index β, where β = 0, 1, 2, ..., p, and p is a positive integer denoting the maximum value of β; the parameters in the mass spectrum peak table include peak height, peak width, peak area, etc.; these parameters are labeled $U_s$, where s = 0, 1, 2, ..., l, and l is a positive integer denoting the maximum value of s.
3. The system for classifying and predicting mass spectrum data based on deep learning as claimed in claim 1, wherein in step2 the deep learning mass spectrum data analysis platform analyzes and processes the data and sends the data result to the mass spectrum data output module, the operation method being as follows:
the deep learning mass spectrum data analysis platform analyzes the mass spectrum data as follows:
step1: data preprocessing, which comprises denoising, normalization, alignment, and similar steps; new mass spectrum data must be preprocessed to ensure the quality and reliability of the data and to retain effective signal features; the mass spectrum signal in the mass spectrum data is selected for preprocessing, in preparation for extracting characteristic values;
step2: feature extraction, in which the preprocessed data is converted into feature vectors for training the deep learning model; in mass spectrum data processing, to better support the classification and prediction of mass spectrum data, a wavelet transform method is selected for characteristic value selection;
the mass spectrum signal is processed with the wavelet transform method as follows:
assume a mass spectrum signal x(n) of length N; a discrete wavelet transform decomposes the signal into M layers of wavelet coefficients, which represent the signal features at different scales;
according to the mathematical formula:
wherein h(k) and g(k) are the high-pass and low-pass wavelet basis functions, respectively; d_ij and S_ij represent the low-frequency and high-frequency coefficients of the i-th layer decomposition, respectively; j is the position of the coefficient within the layer; i = 1, 2, ..., M; j = 0, 1, ..., 2^i*j;
after the wavelet coefficients are obtained, their statistical features are calculated; to better highlight the characteristic values of the mass spectrum signal, the mean, variance, skewness, and kurtosis are taken as statistical features that reflect different characteristics of the mass spectrum signal, with the specific formulas as follows:
wherein y(n) denotes the low-frequency and high-frequency coefficients d_ij and S_ij, for each of which the mean u, the variance σ^2, the skewness s, and the kurtosis k are calculated;
step3: model training, in which the statistical features of the wavelet coefficients are extracted as feature vectors and input into a fully connected neural network, whose neurons map the feature vectors to class labels, and training uses a cross-entropy loss function; the model is trained for classification and prediction on historically known mass spectrum signals; common classification or clustering methods include support vector machines, K-means clustering, and the like; the prediction method compares unknown mass spectrum data with a known mass spectrum library to determine the compounds present in a sample, and the prediction model can output a specific prediction value in the range of 0% to 100%; the comparison is based on the difference ratio of the mass spectrum signals: the smaller the difference ratio, the more similar the signals, and the larger the difference ratio, the more different they are;
step4: model evaluation, in which metrics such as accuracy, precision, and recall are calculated to evaluate the performance of the model.
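The feature-extraction step of claim 3 (wavelet decomposition followed by mean, variance, skewness, and kurtosis of each coefficient band) can be sketched as below. This is only an illustration under stated assumptions: the claim does not name a wavelet basis, so a Haar wavelet is assumed here; the statistics use population (divide-by-N) definitions and non-excess kurtosis, which the text does not specify; and the function names `haar_dwt`, `stat_features`, and `wavelet_feature_vector` are hypothetical.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Haar discrete wavelet transform (assumed basis, not named in the claim).

    Returns one (approximation, detail) pair of coefficient arrays per level.
    """
    approx = np.asarray(signal, dtype=float)
    layers = []
    for _ in range(levels):
        if approx.size < 2:
            break  # nothing left to decompose
        if approx.size % 2:  # pad odd-length input by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
        approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
        layers.append((approx, detail))
    return layers


def stat_features(y):
    """Mean, variance, skewness and kurtosis of a coefficient sequence y(n)."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    var = y.var()  # population variance (divides by N), an assumption
    if var == 0.0:
        return np.array([mean, 0.0, 0.0, 0.0])  # constant sequence
    z = (y - mean) / np.sqrt(var)
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)  # non-excess kurtosis (3 for a Gaussian)
    return np.array([mean, var, skew, kurt])


def wavelet_feature_vector(signal, levels):
    """Concatenate the four statistics of every approximation and detail band."""
    feats = []
    for approx, detail in haar_dwt(signal, levels):
        feats.append(stat_features(approx))
        feats.append(stat_features(detail))
    return np.concatenate(feats) if feats else np.empty(0)


# Toy "mass spectrum signal" of length N = 8, decomposed into M = 2 layers.
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
features = wavelet_feature_vector(x, levels=2)
print(features.shape)
```

With M layers the sketch yields 8·M features (four statistics for each of the two bands per layer), which could then serve as the input vector of the fully connected network described in step3.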
4. The system for classifying and predicting mass spectrum data based on deep learning as claimed in claim 1, wherein in step3, when the mass spectrum data output module receives the data result, the data result is automatically displayed on the terminal database visualization screen, the operation method being as follows:
when the mass spectrum data output module receives the data result, the data result is automatically displayed on the terminal database visualization screen; the data result is visualized as a line graph, a bar graph, a scatter graph, and the like; the data information in the visualization modes of the data result is labeled η_ω, where ω = 0, 1, 2, ..., σ, and σ is a positive integer representing the maximum value of ω for the data information in the visualization modes of the data result.
5. The system for classifying and predicting mass spectrum data based on deep learning as claimed in claim 1, wherein in step4 the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, the specific operation method being as follows:
the user gives scoring feedback in the mass spectrum analysis feedback module according to the data result, with the scoring value in the range [1, 10]; the larger the value, the more accurate the data result, and the smaller the value, the less accurate the data result; the scoring value is converted into a corresponding weight value, which is used to adjust the model during training with the cross-entropy loss function;
assume x is the scoring value and w is the weight value used for the deep learning adjustment;
according to the mathematical formula:
wherein the value range of w(x) is (0, 1.773), which is suitable for fine-tuning the model.
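One way a feedback weight could enter cross-entropy training is as a per-sample multiplier on the loss, sketched below. The formula that maps the score x to w(x) is not reproduced in the text above, so the sketch does not guess it: the weights are passed in directly and only checked against the stated range (0, 1.773). The per-sample weighting scheme and the function names are assumptions for illustration, not the claimed method.

```python
import numpy as np

W_MIN, W_MAX = 0.0, 1.773  # open interval stated in the claim for w(x)


def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def weighted_cross_entropy(logits, labels, weights):
    """Mean cross-entropy with each sample's loss scaled by its feedback weight."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= W_MIN) or np.any(weights >= W_MAX):
        raise ValueError("weights must lie in the open interval (0, 1.773)")
    probs = softmax(logits)
    picked = probs[np.arange(labels.size), labels]  # probability of the true class
    return float(np.mean(-weights * np.log(picked)))


logits = np.array([[2.0, 0.5], [0.2, 1.5]])  # toy network outputs for two samples
labels = np.array([0, 1])                    # true class of each sample
plain = weighted_cross_entropy(logits, labels, weights=[1.0, 1.0])
boosted = weighted_cross_entropy(logits, labels, weights=[1.5, 1.0])
print(plain, boosted)
```

A weight above 1 makes a sample contribute more to the gradient and a weight below 1 makes it contribute less; whether a high score should raise or lower the weight depends on the w(x) formula, which this sketch leaves open.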
6. The system for classifying and predicting mass spectrum data based on deep learning as claimed in claim 1, wherein in step5 the mass spectrum record database records all data in the mass spectrum data analysis process, the operation method being as follows:
the mass spectrum record database records all data in the mass spectrum data analysis process; all data in the mass spectrum data analysis process comprise the uploaded mass spectrum data, the mass spectrum score data results, the data result visualization charts, and the like; all data in the mass spectrum data analysis process are labeled L_M, where M = 0, 1, 2, ..., σ, and σ is a positive integer representing the maximum value of M among all data in the mass spectrum data analysis process.
7. A cloud system, characterized in that the deep learning mass spectrum data analysis platform analyzes and processes data and sends the data result to the mass spectrum data output module, and the mass spectrum data is classified and predicted through cloud computing and analysis, so as to implement the system for classifying and predicting mass spectrum data based on deep learning according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310854608.XA CN116821768A (en) | 2023-07-12 | 2023-07-12 | System for classifying and predicting mass spectrum data based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116821768A (en) | 2023-09-29 |
Family ID: 88120194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310854608.XA Withdrawn CN116821768A (en) | 2023-07-12 | 2023-07-12 | System for classifying and predicting mass spectrum data based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116821768A (en) |
2023-07-12: CN application CN202310854608.XA, published as CN116821768A (en), not active (withdrawn)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20230929 |