CN113378680B - Intelligent database building method for Raman spectrum data - Google Patents

Intelligent database building method for Raman spectrum data

Info

Publication number
CN113378680B
Authority
CN
China
Prior art keywords
spectrum
dimensional
raman
original
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110610390.4A
Other languages
Chinese (zh)
Other versions
CN113378680A (en
Inventor
吴德文
韩李翔
陈嘉祥
王思伟
李超然
刘国坤
罗思恒
曾勇明
谢怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Perser Nano Tech Co ltd
Xiamen University
Original Assignee
Xiamen Perser Nano Tech Co ltd
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Perser Nano Tech Co ltd, Xiamen University filed Critical Xiamen Perser Nano Tech Co ltd
Priority to CN202110610390.4A priority Critical patent/CN113378680B/en
Publication of CN113378680A publication Critical patent/CN113378680A/en
Application granted granted Critical
Publication of CN113378680B publication Critical patent/CN113378680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G06F2218/06 Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Abstract

The invention provides an intelligent database building method for Raman spectrum data. The method first uses a wavelet transform to convert the one-dimensional sequence signal of an original Raman spectrum into a two-dimensional signal in wavelet space, and then inputs the two-dimensional signals into a generative adversarial network (GAN) for training. The generative adversarial network comprises a generation model and a discrimination model, which are trained against each other: the former takes a randomly generated vector as input and outputs a generated spectrum (two-dimensional format); the latter takes original spectra and generated spectra (two-dimensional format) as input and judges whether each input is an original spectrum. After training is completed, the generation model is used to produce a large number of generated spectra (two-dimensional format) similar to the original spectra, and a spectrum database is established by combining the generated spectra with the original spectra. The database stores spectral data in the two-dimensional signal format. The method addresses the difficulty, high cost and long time of acquiring spectral data when deep learning is applied to Raman spectrum analysis, and promotes the practical adoption of deep learning methods in spectral analysis.

Description

Intelligent database building method for Raman spectrum data
Technical Field
The invention relates to the field of machine learning, in particular to an intelligent database building method for Raman spectrum data.
Background
Field detection technology based on Raman spectroscopy is widely applied in agricultural production, food safety, public safety and other fields. For example, Z.F. Zhou, J.L. Lu, J.Y. Wang, Y.S. Zou, T. Liu, Y.L. Zhang, G.K. Liu, Z.Q. Tian, Trace detection of polycyclic aromatic hydrocarbons in environmental waters by SERS, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2020(234): 118250, uses surface-enhanced Raman spectroscopy to detect the organic pollutants polycyclic aromatic hydrocarbons (PAHs) in water; and a study in the Journal of Nuclear Agricultural Sciences, 2021(1), uses Raman spectroscopy to detect aflatoxin B1 and zearalenone in maize. The conventional method for detecting a substance from its Raman spectrum performs template matching against a standard spectrum of the target substance and sets a similarity threshold to decide whether the sample under test contains the target substance. For example, Zhang Z M, Chen X Q, Lu H M, et al., Chemometrics & Intelligent Laboratory Systems, 2014, 137: 10-20, uses a search algorithm to qualitatively analyze the Raman spectra of mixtures based on a similarity threshold. This approach lacks universality: its performance is very sensitive to algorithm parameters, and the similarity threshold must be adjusted manually to cope with interference and instrument noise in the complex environments of field detection.
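As a minimal sketch of the template-matching idea described above, the following Python snippet compares a measured spectrum with a standard spectrum and applies a similarity threshold. The use of cosine similarity, the threshold value 0.9, and all function names and sample values are assumptions chosen for illustration; the cited works may use different similarity measures.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def matches_template(sample, template, threshold=0.9):
    """True when the sample is at least `threshold` similar to the template.

    The threshold is the manually tuned parameter the text refers to.
    """
    return cosine_similarity(sample, template) >= threshold

# Hypothetical intensities on a shared Raman-shift grid
template = [0.0, 1.0, 4.0, 1.0, 0.0, 0.5, 2.0, 0.5]
clean_sample = [0.1, 1.1, 3.9, 0.9, 0.0, 0.6, 2.1, 0.4]
noisy_sample = [1.5, 0.2, 2.0, 2.5, 1.8, 0.1, 0.3, 2.2]

print(matches_template(clean_sample, template))
print(matches_template(noisy_sample, template))
```

Changing `threshold` changes which samples are accepted, which is why such methods need re-tuning when interference or instrument noise changes.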
In recent years, machine learning has been widely applied to Raman-spectrum-based substance detection. For example, the study "Detection method for olive oil adulteration based on Raman spectroscopy and least squares support vector machine" [J]. Spectroscopy and Spectral Analysis, 2012, 32(6): 1554-1558, detects the adulteration of olive oil with an algorithm based on the support vector machine. However, traditional machine learning methods depend on manual selection and extraction of spectral features, lack universality, and are difficult to transfer between applications. Deep learning, thanks to the strong expressive power of neural networks, can extract features automatically and has achieved great success in computer vision, natural language processing and other fields without relying on feature engineering. Researchers have therefore designed deep-learning-based Raman spectrum analysis algorithms to improve the accuracy and universality of Raman detection, an approach with great development potential and market prospects. For example, Fan X, Ming W, Zeng H, et al. use neural networks to analyze the composition of the Raman spectra of mixtures.
Unlike traditional machine learning (such as support vector machines and random forests), deep-learning-based Raman spectrum analysis needs massive amounts of Raman spectrum data to train a neural network. However, acquiring and labeling Raman spectra requires specialized spectroscopic instruments, consumes large amounts of materials and the labor of specialized personnel, and is costly in both time and money, which severely restricts the application and development of deep learning methods. Some researchers therefore start from a small amount of real spectral data and computationally generate a large number of simulated spectra with real spectral characteristics, to make the construction of Raman spectrum databases more efficient and to lay a data foundation for applying deep learning to Raman spectrum analysis. For example, the real Raman spectra of single substances are linearly superposed in given proportions to obtain the spectral data of mixtures, which are then used to train a neural network model for analyzing mixture components; or, as in Conlin A K, et al., Data augmentation: an alternative approach to the analysis of spectroscopic data [J]. Chemometrics & Intelligent Laboratory Systems, 1998, large numbers of simulated spectra are generated by adding different levels of Gaussian noise to the actual spectra.
Meanwhile, data enhancement methods that emerged in the field of computer vision, such as Generative Adversarial Networks (GAN), have also been introduced into the field of Raman spectroscopy (Yu S, Li H, Li X, et al.). A generative adversarial network is composed of a generation model (Generative Model) and a discrimination model (Discriminative Model): a large amount of data is generated by inputting noise vectors into the generation model, and the discrimination model judges the generated data against the real data, so that the generation model learns to produce data whose distribution is basically consistent with that of the real data. When a generative adversarial network is applied to Raman spectra, the original Raman spectrum is taken as the learning target, and generated spectra with the same dimensionality as the original spectrum are produced directly. When the network is trained directly on the spectrum signals, however, its convolution structure cannot make good use of the local characteristics of the spectrum and the training process is unstable, so the generation model cannot simulate the distribution of real spectral data well. Experiments show that most generated spectra resemble the original spectrum with Gaussian noise added, and adding such signals directly to a database for training reduces the accuracy of substance classification, so that qualitative detection of substances cannot be completed accurately. Preliminary analysis attributes this to the lack of spatial correlation in Raman spectral data in sequence form: the convolution kernels in the generative adversarial network cannot extract sufficient local features, so the generated data contain noise-like glitches.
Adding a certain degree of Gaussian noise to actually acquired spectra can simulate a large amount of data with real spectral characteristics, but it inevitably changes the signal-to-noise ratio of the spectra, so the data distribution of the simulated spectra becomes inconsistent with that of the real spectra. This violates the machine-learning assumption that training and test data are identically distributed: a model trained on simulated spectra with added Gaussian noise is likely to overfit the simulated spectra and cannot be reliably applied to substance detection on real spectra. Moreover, adding Gaussian noise requires manual adjustment of parameters such as the noise intensity: if the noise is too strong, the original spectral signal is submerged; if it is too weak, the simulated spectrum is nearly identical to the real signal, and the goal of data enhancement is not achieved.
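The trade-off described in this paragraph can be illustrated with a short Python sketch that adds Gaussian noise of two intensities to a synthetic spectrum and measures the resulting signal-to-noise ratio. The synthetic single-peak spectrum, the noise levels 0.01 and 0.5, and the function names are assumptions made for illustration only.

```python
import math
import random

def add_gaussian_noise(spectrum, sigma, rng):
    """Return a copy of the spectrum with zero-mean Gaussian noise of std sigma."""
    return [x + rng.gauss(0.0, sigma) for x in spectrum]

def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Synthetic spectrum with a single Gaussian peak (not measured data)
clean = [math.exp(-((i - 50) / 4.0) ** 2) for i in range(100)]

weak = add_gaussian_noise(clean, 0.01, rng)   # nearly identical to the original
strong = add_gaussian_noise(clean, 0.5, rng)  # peak largely submerged by noise

print(round(snr_db(clean, weak), 1), round(snr_db(clean, strong), 1))
```

The noise level `sigma` is the hand-tuned parameter: every choice shifts the signal-to-noise ratio of the simulated spectra away from that of the measured ones.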
Linearly superimposing the spectra of pure substances can generate simulated spectra for a large number of mixtures, but a simple weighted linear sum of several pure-substance spectra neglects the interactions between molecules in an actual mixture. In the spectrum of a real mixture, some peaks are likely to be suppressed or enhanced by intermolecular interactions, and such nonlinear changes cannot be simulated by linear addition. The simulated Raman spectra generated by superposition therefore have low reliability and cannot be used for training or for constructing a database.
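For reference, the linear superposition criticized here amounts to a weighted sum of pure-substance spectra, as in the following sketch. The function name, the toy spectra and the weights are hypothetical; the point is that each output intensity depends only on the weights, so interaction-induced peak changes cannot appear.

```python
def mix_spectra(pure_spectra, weights):
    """Weighted linear sum of pure-substance spectra on a shared Raman-shift grid."""
    if len(pure_spectra) != len(weights):
        raise ValueError("one weight is required per pure spectrum")
    n = len(pure_spectra[0])
    if any(len(s) != n for s in pure_spectra):
        raise ValueError("all spectra must share the same length")
    return [sum(w * s[i] for w, s in zip(weights, pure_spectra)) for i in range(n)]

# Two toy pure-substance spectra, each with one peak at a different position
substance_a = [0.0, 2.0, 0.0, 0.0]
substance_b = [0.0, 0.0, 4.0, 0.0]

mixture = mix_spectra([substance_a, substance_b], [0.5, 0.25])
print(mixture)
```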
Disclosure of Invention
The invention mainly aims to overcome the difficulty in establishing a Raman spectrum database in the prior art, and provides an intelligent database establishing method for Raman spectrum data, which can quickly and efficiently establish rich Raman spectrum databases, can be used for training and testing deep learning models, and lays a data foundation for finally realizing accurate field material detection.
The invention adopts the following technical scheme:
An intelligent database building method for Raman spectrum data, characterized by comprising the following steps:
1) performing feature transformation on all original Raman spectra of the c-th target substance (c = 1, ..., C) in a database by using a continuous wavelet transform to obtain two-dimensional data signals of the original Raman spectra, wherein C is the number of types of target substances;
2) generating a random vector z and inputting it into a trained generation model for the c-th target substance to obtain the corresponding two-dimensional signal of a generated spectrum; repeating this step M times to obtain M two-dimensional signals of generated spectra, each labeled as the c-th target substance;
3) repeating steps 1)-2) for the other target substances to generate C × M two-dimensional signals of generated spectra forming a two-dimensional data set, and combining them with the two-dimensional signals of the original Raman spectra obtained in step 1) to establish a large Raman spectrum database covering a large number of labeled samples of the C types of target substances.
The step 1) specifically comprises the following steps: let the original Raman spectra be $S = \{s_j \mid j = 1, 2, \ldots, N_c\}$, where $N_c$ represents the number of original Raman spectra labeled with the c-th target substance; each Raman spectrum is denoted as $s_j(t)$, where $t = [t_1, t_2, \ldots, t_n]$ is the Raman shift sequence, n represents the length of each Raman shift sequence, and $s_j(t_i)$ represents the Raman spectral signal intensity at position $t_i$, $i = 1, 2, \ldots, n$; each Raman spectrum $s_j(t)$ is feature-transformed using the continuous wavelet transform to obtain its two-dimensional signal in the time-frequency domain:
$$WT_j(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s_j(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \qquad (1)$$
wherein $\psi(t)$ is the wavelet mother function, a is the scaling factor vector of length l, and b is the translation factor vector, whose values are the same as t; the time-frequency domain two-dimensional data set of the c-th target substance is then:
$$WT_c = \{WT_j \mid j = 1, 2, \ldots, N_c\}$$
where each $WT_j$ is a two-dimensional matrix of l rows and n columns.
Step 2) further comprises training a generative adversarial network for the c-th target substance: first a generative adversarial network for the c-th target substance is established, and then the two-dimensional signals of the feature-transformed original Raman spectra of the c-th target substance are input into it as a training set, yielding a trained generation model for the c-th target substance. Specifically, a generative adversarial network for the c-th target substance is constructed in advance, comprising two neural network models: a generation model $G_c$ and a discrimination model $D_c$. The two-dimensional data set $WT_c$ of the original Raman spectra of the c-th target substance is divided into batches, each batch $WT_{batch}$ containing batchSize Raman spectral data:
$$WT_{batch} = \{WT_{batch,k} \mid k = 1, \ldots, batchSize\}$$
The step of training the generative adversarial network specifically comprises the following steps:
2.1) inputting the batch $WT_{batch}$ of two-dimensional original Raman spectra into the discrimination model $D_c$, using its output $D_c(WT_{batch})$ to calculate the first part of the loss of the discrimination model $D_c$, and back-propagating the loss; the first part of the loss is as follows:
$$L_{D,1} = -\frac{1}{batchSize} \sum_{k=1}^{batchSize} \log D_c(WT_{batch,k})$$
2.2) generating a set of random vectors $Z_{batch} = \{z_{batch,k} \mid k = 1, \ldots, batchSize\}$, each noise vector having length d; inputting each $z_{batch,k}$ into the generation model $G_c$ to obtain a two-dimensional signal $G_c(z_{batch,k})$, then inputting $G_c(z_{batch,k})$ into the discrimination model $D_c$ to calculate the second part of the loss, and finally performing back propagation and gradient descent on the loss; the second part of the loss is as follows:
$$L_{D,2} = -\frac{1}{batchSize} \sum_{k=1}^{batchSize} \log\bigl(1 - D_c(G_c(z_{batch,k}))\bigr)$$
2.3) using the intermediate result $D_c(G_c(z_{batch,k}))$ of 2.2) to calculate the loss of the generation model $G_c$, on which back propagation and gradient descent are likewise performed; the loss of the generation model $G_c$ is as follows:
$$L_{G} = \frac{1}{batchSize} \sum_{k=1}^{batchSize} \log\bigl(1 - D_c(G_c(z_{batch,k}))\bigr)$$
2.4) repeating steps 2.1)-2.3) for each batch of the two-dimensional data set of original Raman spectra, which completes one round of training; repeating the training for Y rounds completes the training of the generative adversarial network for the c-th target substance, and the trained generation model $G_c$ can be used for building the database.
The generation model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format; the discrimination model is a binary-classification neural network that takes the two-dimensional data of a spectrum as input, judges whether the input is an original spectrum or a generated spectrum, and outputs a confidence; the optimization goal of the generative adversarial network is to minimize the difference between the generated spectrum and the original spectrum, as follows:
$$\min_{G_c} \max_{D_c} V(D_c, G_c) = E_{WT}\bigl[\log D_c(WT)\bigr] + E_{z}\bigl[\log\bigl(1 - D_c(G_c(z))\bigr)\bigr]$$
wherein $E_{WT}$ and $E_z$ represent mathematical expectations.
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
1. The method of the invention innovatively uses the multi-resolution characteristic of the multi-scale wavelet transform to convert the Raman spectrum from sequence data into an image-like two-dimensional signal, extracting the fine-grained features of the Raman spectrum signal.
2. The method of the invention trains the generative adversarial network on the two-dimensional original Raman spectra to produce a large number of generated spectra that retain the substance fingerprint characteristics of the original Raman spectra, thereby solving the poor generation quality and unstable training encountered when a generative adversarial network is applied directly to spectra.
3. The method of the invention combines a small number of labeled original spectra with a large number of generated spectra to establish a Raman spectrum database in two-dimensional format, addresses the difficulty, high cost and long time of acquiring spectral data when deep learning is applied to spectral analysis, and promotes the practical adoption of deep learning methods in spectral analysis.
4. The method of the invention uses artificial intelligence technology to establish the spectral database quickly; the database can be used for training and testing deep learning models and lays a data foundation for accurate on-site substance detection.
5. The method can be used for the database construction of Raman spectra, and can also be expanded and applied to other spectrum detection, such as the database construction of infrared spectra, X-ray diffraction spectra and chromatograms.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2a is an original spectrum of basic bright yellow O;
FIG. 2b is an original two-dimensional spectrum of basic bright yellow O;
FIG. 2c is an example of a two-dimensional spectrogram generated from basic bright yellow O;
FIG. 3a is a raw spectrum of lemon yellow;
FIG. 3b is an original two-dimensional spectrum of lemon yellow;
FIG. 3c is an example of a generated two-dimensional spectrogram of lemon yellow;
FIG. 4 is the confusion matrix of the substance identification results of the VGG16 classifier.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
The invention is further described below by means of specific embodiments.
The invention provides an intelligent database building method for Raman spectrum data based on the wavelet transform and a generative adversarial network. As shown in FIG. 1, it comprises feature transformation, training of the generation model, and building of the spectrum database. For feature transformation and training of the generation model: first, the one-dimensional sequence signal of an original Raman spectrum is transformed into a two-dimensional signal in wavelet space by a wavelet transform and is then input into the generative adversarial network for training, as shown by the dotted arrows. The generative adversarial network consists of a generation model (Generator, G) and a discrimination model (Discriminator, D), which are trained against each other; the former takes the random vector z as input, and the latter takes the original spectra and the generated spectra (two-dimensional format) as input.
For building the Raman spectrum database: the generation model (G) of the trained generative adversarial network is used to produce a large number of generated spectra (two-dimensional format), which are combined with the original spectra to build the Raman spectrum database. The database stores Raman spectral data in the two-dimensional signal format. Therefore, before training or detection, any newly added spectrum must first be converted to the two-dimensional format by the multi-scale wavelet transform, after which the subsequent qualitative analysis of the spectrum is performed.
Specifically, the method of the invention comprises the following steps:
1) Assume that an application supports the detection of C target substances and that each target substance has only a small number of labeled spectra. All original Raman spectra of the c-th target substance (c = 1, ..., C) in the database are feature-transformed using the continuous wavelet transform to obtain two-dimensional data signals of the original Raman spectra, where C is the number of types of target substances.
In the feature transformation step, all original Raman spectra of the c-th target substance in the database are transformed using the Continuous Wavelet Transform (CWT), converting them from one-dimensional time-domain sequence signals into two-dimensional time-frequency-domain signals.
Let the original spectra be $S = \{s_j \mid j = 1, 2, \ldots, N_c\}$, where $N_c$ represents the number of original Raman spectra labeled with the c-th target substance; each spectrum is denoted as $s_j(t)$, where $t = [t_1, t_2, \ldots, t_n]$ is the Raman shift sequence, n represents the length of each Raman shift sequence, and $s_j(t_i)$ represents the Raman spectral signal intensity at position $t_i$, $i = 1, 2, \ldots, n$. Each spectrum $s_j(t)$ is feature-transformed using the continuous wavelet transform to obtain its two-dimensional signal in the time-frequency domain:
$$WT_j(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s_j(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \qquad (1)$$
where $\psi(t)$ is the wavelet mother function, a is the scaling factor vector of length l, and b is the translation factor vector, whose values are the same as t. Note that the integration interval is $(-\infty, +\infty)$, whereas the spectrum acquisition range of a Raman spectrometer is limited; the calculation of formula (1) is therefore ensured by zero-filling the signal outside the acquisition range or by another reasonable method. This yields the time-frequency domain two-dimensional data set of the c-th target substance:
$$WT_c = \{WT_j \mid j = 1, 2, \ldots, N_c\}$$
where each $WT_j$ is a two-dimensional matrix of l rows and n columns.
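The feature transformation of step 1) can be sketched in plain Python as a discretized version of the continuous wavelet transform. This is only an illustration under stated assumptions: the Mexican-hat mother wavelet, the uniform 10 cm⁻¹ grid, the four scales and the synthetic single-peak spectrum are all hypothetical choices, since the text does not fix a wavelet or a scale vector. Samples outside the acquired range simply contribute nothing to the sum, which corresponds to the zero-filling mentioned above.

```python
import math

def mexican_hat(x):
    """Mexican-hat (Ricker) mother wavelet, chosen here only for illustration."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def cwt(signal, shifts, scales, wavelet=mexican_hat):
    """Discretized continuous wavelet transform of one spectrum.

    Returns a matrix with len(scales) rows (l) and len(shifts) columns (n).
    The translation factor b takes the same values as the Raman shifts t.
    """
    dt = shifts[1] - shifts[0]  # assumes a uniform Raman-shift grid
    matrix = []
    for a in scales:
        row = []
        for b in shifts:
            acc = sum(s * wavelet((t - b) / a) for s, t in zip(signal, shifts))
            row.append(acc * dt / math.sqrt(a))
        matrix.append(row)
    return matrix

# Synthetic spectrum: one Gaussian peak at 1000 cm^-1 on a 200..2500 cm^-1 grid
shifts = [200.0 + 10.0 * i for i in range(231)]
spectrum = [math.exp(-((t - 1000.0) / 20.0) ** 2) for t in shifts]
scales = [10.0, 20.0, 40.0, 80.0]

wt = cwt(spectrum, shifts, scales)
print(len(wt), len(wt[0]))
```

Each row of `wt` shows the spectrum at one resolution, so a narrow peak and a broad baseline feature end up in different rows of the image-like matrix.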
2) Generate a random vector z and input it into the trained generation model $G_c$ for the c-th target substance to obtain the corresponding two-dimensional signal $G_c(z)$ of a generated spectrum; repeat this step M times to obtain M two-dimensional signals of generated Raman spectra, each labeled as the c-th target substance, where M can be set to a large constant according to application requirements.
This step also includes training the generative adversarial network for the c-th target substance: first a generative adversarial network for the c-th target substance is established, and then the two-dimensional signals of the feature-transformed original Raman spectra of the c-th target substance are input into it as a training set, yielding the trained generation model $G_c$ for the c-th target substance.
A generative adversarial network for the c-th target substance is constructed in advance, comprising two neural network models: a generation model $G_c$ and a discrimination model $D_c$. The generation model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format; it can be a conventional neural network with deconvolution (transposed convolution) layers, activation layers and batch normalization, which is not limited herein. The discrimination model takes Raman spectrum two-dimensional data as input, uses a binary-classification neural network to judge whether the input is an original spectrum or a generated spectrum, and outputs a confidence. The optimization target V of the generative adversarial network is to minimize the difference between the generated spectrum and the original spectrum, where $E_{WT}$ and $E_z$ represent mathematical expectations:
$$\min_{G_c} \max_{D_c} V(D_c, G_c) = E_{WT}\bigl[\log D_c(WT)\bigr] + E_{z}\bigl[\log\bigl(1 - D_c(G_c(z))\bigr)\bigr]$$
Before training, the two-dimensional data set $WT_c$ of the original Raman spectra of the c-th target substance is divided into batches, each batch $WT_{batch}$ containing batchSize Raman spectral data:
$$WT_{batch} = \{WT_{batch,k} \mid k = 1, \ldots, batchSize\}$$
The training of the generative adversarial network specifically comprises the following steps:
2.1) Input the batch $WT_{batch}$ of two-dimensional original spectra into the discrimination model $D_c$, use its output $D_c(WT_{batch})$ to calculate the first part of the loss of the discrimination model $D_c$, and back-propagate the loss, i.e. propagate the error of the loss function to the parameters of the neural network. The first part of the loss is as follows:
$$L_{D,1} = -\frac{1}{batchSize} \sum_{k=1}^{batchSize} \log D_c(WT_{batch,k})$$
2.2) Generate a set of random vectors $Z_{batch} = \{z_{batch,k} \mid k = 1, \ldots, batchSize\}$, each noise vector having length d; input each $z_{batch,k}$ into the generation model $G_c$ to obtain a two-dimensional signal $G_c(z_{batch,k})$, then input $G_c(z_{batch,k})$ into the discrimination model $D_c$ to calculate the second part of the loss, and finally perform back propagation and gradient descent on the loss, i.e. update the parameters of the neural network according to the back-propagated error. The second part of the loss is as follows:
$$L_{D,2} = -\frac{1}{batchSize} \sum_{k=1}^{batchSize} \log\bigl(1 - D_c(G_c(z_{batch,k}))\bigr)$$
2.3) Use the intermediate result $D_c(G_c(z_{batch,k}))$ of 2.2) to calculate the loss of the generation model $G_c$; back propagation and gradient descent are likewise performed on this loss, i.e. the parameters of the neural network are updated according to the back-propagated error. The loss of the generation model $G_c$ is as follows:
$$L_{G} = \frac{1}{batchSize} \sum_{k=1}^{batchSize} \log\bigl(1 - D_c(G_c(z_{batch,k}))\bigr)$$
2.4) Repeat steps 2.1)-2.3) for each batch of the two-dimensional data set of original Raman spectra, which completes one round of training; repeat the training for Y rounds, where Y can be set according to application requirements, to complete the training of the generative adversarial network for the c-th target substance. The trained generation model is denoted $G_c$.
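The three loss terms used in steps 2.1) to 2.3) can be sketched as plain Python functions that operate on the confidences output by the discrimination model. This sketch assumes the standard binary cross-entropy form of the GAN losses, consistent with the minimax objective V; the function names, the constant `EPS` and the example confidences are hypothetical, and the network forward passes and parameter updates are omitted. In practice the generator loss is often replaced by the non-saturating variant, the mean of -log D(G(z)).

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail assumed here

def discriminator_loss_real(d_real):
    """Step 2.1: loss on original spectra; d_real holds D_c(WT_batch,k)."""
    return -sum(math.log(p + EPS) for p in d_real) / len(d_real)

def discriminator_loss_fake(d_fake):
    """Step 2.2: loss on generated spectra; d_fake holds D_c(G_c(z_batch,k))."""
    return -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)

def generator_loss(d_fake):
    """Step 2.3: minimax generator loss computed from the same D_c(G_c(z))."""
    return sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)

# Hypothetical discriminator confidences for one batch of size 2
d_real = [0.9, 0.8]  # originals judged "original" with high confidence
d_fake = [0.2, 0.1]  # generated spectra mostly recognised as generated

loss_d = discriminator_loss_real(d_real) + discriminator_loss_fake(d_fake)
loss_g = generator_loss(d_fake)
print(loss_d, loss_g)
```

The discrimination model lowers `loss_d` by separating the two kinds of input, while the generation model lowers `loss_g` by pushing the confidences in `d_fake` towards 1.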
3) Repeat steps 1)-2) for the other target substances to generate C × M two-dimensional signals of generated spectra forming a two-dimensional data set, and combine them with the two-dimensional signals of the original Raman spectra obtained in step 1) to establish a large Raman spectrum database covering a large number of labeled samples of the C types of target substances.
The method first performs feature transformation on the real original Raman spectra, innovatively applying the multi-scale wavelet transform to Raman spectrum data analysis. Because the multi-scale wavelet transform is multi-resolution, it converts the Raman spectrum from sequence data into an image-like two-dimensional signal and fully extracts the signal features of the Raman spectrum at different scales. The transformed two-dimensional spectral data are then used as the learning target, and a GAN is used to generate a large amount of two-dimensional generated Raman spectral data. Finally, a large Raman spectrum database with a two-dimensional storage format is established by combining the original spectra and the generated spectra. The method of the invention may also be applied to databases of other spectral data, such as infrared spectra, X-ray diffraction spectra or chromatograms, as desired.
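Step 3) can be sketched as follows: for every target substance the labeled original two-dimensional signals are kept and M generated signals are appended under the same label. The function `build_database`, the stand-in `toy_generator`, the Gaussian distribution of z and the tiny 2 × 3 matrices are all assumptions for illustration; a real generation model would be the trained neural network $G_c$.

```python
import random

def build_database(original_signals, generators, m, d, rng):
    """Combine original and generated two-dimensional signals into labeled samples.

    original_signals: dict mapping a substance label to its list of 2D signals.
    generators: dict mapping a substance label to a callable z -> 2D signal.
    """
    database = []
    for label, signals in original_signals.items():
        # Keep every labeled original spectrum (already in two-dimensional format).
        database.extend((signal, label) for signal in signals)
        # Add M generated spectra from the model trained for this substance.
        for _ in range(m):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # Gaussian z is an assumption
            database.append((generators[label](z), label))
    return database

def toy_generator(z):
    """Hypothetical stand-in for a trained G_c: reshapes six entries of z to 2 x 3."""
    return [z[0:3], z[3:6]]

originals = {
    "lemon yellow": [[[0.0] * 3 for _ in range(2)] for _ in range(2)],
    "carmine": [[[1.0] * 3 for _ in range(2)] for _ in range(2)],
}
generators = {label: toy_generator for label in originals}

database = build_database(originals, generators, m=5, d=6, rng=random.Random(0))
print(len(database))
```

With C substances, N_c originals each and M generated spectra per substance, the database holds C × M generated samples plus all originals; here that is 2 × 5 + 4 = 14.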
Experimental verification
The Raman spectrum data used for training and testing in the experiments were collected with a high-grade PT2000 Raman spectrum instrument (spectral range 200-2500 cm⁻¹) from samples of 9 target substances (pigments: brilliant blue, sunset yellow, lemon yellow, basic bright yellow O, basic orange 2, rhodamine B, carmine, amaranth, allura red), as shown in Table 1.
Table 1: Raman spectroscopy data (number of substances C = 9)
Figure BDA0003095575520000081
To verify the effectiveness of the method in applications with few training samples, the experiment randomly took only 20 original Raman spectra of each substance class for training the generative adversarial network, i.e. $N_c = 20$, $c = 1, \ldots, 9$. The experiment used the network structure of a deep convolutional generative adversarial network (DCGAN), with the number of training rounds Y set to 1000, the batch size batchSize set to 10, and the random vector length d set to 100. The learning rates of the generation model $G_c$ and the discrimination model $D_c$ were both set to 0.0005, and gradient descent used the Adam optimizer (beta1 = 0.5, beta2 = 0.9). After the generation model $G_c$ of each target substance was trained, 10000 generated spectra (two-dimensional format) were generated and labeled. Finally, a Raman spectrum database of the 9 pigments was established, containing 90180 labeled samples (9 × 10000 generated plus 9 × 20 original) that can be used for classifier training.
Experiment one generated a large number of generated spectra (two-dimensional format) from 20 original Raman spectra of basic bright yellow O; 3 examples are shown in FIG. 2c. Experiment two generated spectra (two-dimensional format) from 20 original Raman spectra of lemon yellow; 3 examples are shown in FIG. 3c. Comparing FIG. 2b with FIG. 2c shows that the original spectrum (two-dimensional format) obtained after feature transformation is similar to the generated spectra (two-dimensional format). Comparing FIG. 2c with FIG. 3c shows that the generated spectra (two-dimensional format) of different target substances differ significantly. The generated spectra therefore also carry the fingerprint characteristics of the substance and can be used for effective training.
Experiment three verifies the material identification accuracy of the deep learning classifier VGG16 when the spectral database established by the method is used as a training set. The VGG16 neural network was used as a classifier in this experiment, with the number of training rounds set to 4, the batch size set to 50, and the learning rate set to 0.0001. The trained VGG16 classifier performs classification and identification on 1070 test spectrum samples, and uses a confusion matrix and accuracy:
Figure BDA0003095575520000091
as performance evaluation indices. As shown in fig. 4, the test samples of seven of the pigments were all correctly identified, i.e. the number of correctly classified spectra on the diagonal equals the number of test samples of the corresponding pigment. However, 10 test spectra of lemon yellow and 7 test spectra of carmine were misidentified as other pigments, so the overall accuracy is (175+111+90+69+76+96+204+122+111)/1070 = 98.41%, which can meet the application requirements of most substance detection tasks. If a VGG16 neural network with the same parameters is trained directly on the 180 one-dimensional original Raman spectra and then used to classify the 1070 test spectrum samples, the pigments contained in only 18.97% of the spectrum samples are accurately identified. It follows that the scarcity of labeled samples hinders the use of deep learning classifiers in Raman spectroscopy substance identification. The method effectively overcomes the practical difficulty of having few labeled samples and facilitates subsequent deep learning and in-depth analysis of Raman spectra.
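The overall accuracy above is the sum of the diagonal of the confusion matrix divided by the number of test samples. A minimal sketch of that calculation follows; the 3-class matrix is hypothetical and is not the data of fig. 4, and the function name is an illustrative choice.

```python
def accuracy_from_confusion(matrix):
    """Fraction of samples that lie on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
example = [
    [50, 0, 0],
    [2, 45, 3],
    [0, 0, 40],
]
acc = accuracy_from_confusion(example)
print(f"{acc:.2%}")  # 96.43%
```

Rows whose off-diagonal entries are all zero correspond to classes whose test samples were all correctly identified.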
The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto, and any insubstantial modifications made by using this concept shall fall within the scope of the present invention.

Claims (6)

1. An intelligent database building method for Raman spectrum data is characterized by comprising the following steps:
1) performing feature transformation on all original Raman spectra of the c-th target substance in a database by using continuous wavelet transformation to obtain two-dimensional data signals of the original Raman spectra, wherein c = 1, …, C, and C is the number of types of target substance; the step 1) is specifically as follows: let the original Raman spectra be S = {s_j | j = 1, 2, …, N_c}, wherein N_c represents the number of original Raman spectra labeled with the c-th target substance; each Raman spectrum is represented as s_j(t), wherein t = [t_1, t_2, …, t_n] is the Raman shift sequence, n represents the length of each Raman shift sequence, and s_j(t_i) represents the Raman spectral signal intensity at position t_i, i = 1, 2, …, n; for each Raman spectrum s_j(t), a feature transformation is performed using continuous wavelet transformation to obtain its two-dimensional signal in the time-frequency domain:
Figure FDA0003608058090000011
wherein ψ(t) is the mother wavelet function, a is the scaling factor vector of length l, and b is the translation factor vector, whose values are the same as t; the time-frequency domain two-dimensional data set of the c-th target substance is then:
Figure FDA0003608058090000012
Figure FDA0003608058090000013
is a two-dimensional matrix with l rows and n columns;
2) randomly generating a vector z and inputting it into a trained generation model for the c-th target substance to obtain a corresponding two-dimensional signal of a generated spectrum; repeating this step M times to obtain M two-dimensional signals of generated spectra, and labeling them as the c-th target substance; wherein the trained generation model is obtained by training a generative adversarial network for the c-th target substance, comprising the following steps: first establishing a generative adversarial network for the c-th target substance, and inputting the two-dimensional signals of the original Raman spectra of the c-th target substance obtained after the feature transformation into the generative adversarial network as a training set for training, to obtain the trained generation model for the c-th target substance;
3) repeating the steps 1)-2) for the other target substances to generate C × M two-dimensional signals of generated spectra forming a two-dimensional data set, and, in combination with the two-dimensional signals of the original Raman spectra obtained in the step 1), establishing a large Raman spectrum database covering a large number of labeled samples of the C types of target substances.
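The feature transformation of step 1) can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the patent's implementation: the Ricker (Mexican hat) mother wavelet, the scale range 1 to 32, the support of 10 samples per unit of scale, and the synthetic single-peak spectrum are all assumptions, since the claim does not fix the mother wavelet or the scale values. The function names `ricker` and `cwt_2d` are hypothetical.

```python
import numpy as np


def ricker(points, a):
    """Ricker (Mexican hat) wavelet sampled at `points` positions for scale a."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_2d(spectrum, scales):
    """Return an (l, n) matrix: one row per scale, one column per shift position.

    Scales are assumed to be >= 1 so that the wavelet support is non-empty.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n = spectrum.size
    out = np.empty((len(scales), n))
    for row, a in enumerate(scales):
        # The wavelet support grows with the scale but is capped at the signal length.
        width = min(10 * int(a), n)
        out[row] = np.convolve(spectrum, ricker(width, a), mode="same")
    return out


# Synthetic stand-in for one Raman spectrum: a single Gaussian peak (not real data).
n = 512
shift_index = np.arange(n)
spectrum = np.exp(-((shift_index - 200) ** 2) / (2 * 5.0 ** 2))

scales = np.arange(1, 33)      # scaling factor vector a, length l = 32
wt = cwt_2d(spectrum, scales)  # two-dimensional signal with l rows and n columns

# The data set of one substance is a stack of such matrices, one per original spectrum.
dataset = np.stack([cwt_2d(spectrum, scales) for _ in range(20)])
print(wt.shape, dataset.shape)  # (32, 512) (20, 32, 512)
```

Because the Ricker wavelet is symmetric, convolving with it is equivalent to correlating the spectrum with the scaled, shifted wavelet, which is what the continuous wavelet transform computes at each (a, b).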
2. The intelligent Raman spectral data library building method of claim 1, wherein a generative adversarial network for the c-th target substance is pre-constructed and comprises two neural network models: a generation model G_c and a discrimination model D_c; the two-dimensional data set WT_c of the original Raman spectra of the c-th target substance is divided into batches WT_batch, each including batchSize Raman spectral data,
Figure FDA0003608058090000014
the step of training the generative adversarial network specifically comprises the following steps:
2.1) inputting one batch WT_batch of the two-dimensional data set of the original Raman spectra into the discrimination model D_c, then using the output D_c(WT_batch) of the discrimination model to calculate the first-part loss of the discrimination model D_c, and back-propagating the loss;
2.2) generating a set of random vectors Z_batch = {z_batch,k | k = 1, …, batchSize}, each random vector having length d; inputting the vectors z_batch,k one by one into the generation model G_c to obtain the two-dimensional signals G_c(z_batch,k); then inputting the two-dimensional signals G_c(z_batch,k) into the discrimination model D_c to calculate the second-part loss, and finally performing back-propagation and gradient descent on the loss;
2.3) using the intermediate result D_c(G_c(z_batch,k)) of 2.2) to calculate the loss of the generation model G_c, and likewise performing back-propagation and gradient descent on this loss;
2.4) repeating the steps 2.1)-2.3) for each batch of the two-dimensional data set of the original Raman spectra to complete one round of training; repeating the training for Y rounds to complete the training of the generative adversarial network for the c-th target substance, whereupon the trained generation model G_c can be used for building the library.
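The batch and round structure of steps 2.1) to 2.4) can be sketched without any deep learning framework. The sketch below only enumerates the order in which batches are visited; the network updates themselves are left as comments, and the helper names are hypothetical. The values N_c = 20, batchSize = 10 and Y = 1000 are the ones given in the description of the experiments.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def training_schedule(num_spectra, batch_size, rounds):
    """Yield (round, batch index, sample indices) in the order the batches are visited."""
    batches = make_batches(list(range(num_spectra)), batch_size)
    for y in range(rounds):
        for b, batch in enumerate(batches):
            # For this batch a real implementation would perform, in order:
            # 2.1) discrimination model on the original spectra,
            # 2.2) discrimination model on the generated spectra,
            # 2.3) update of the generation model.
            yield y, b, batch


steps = list(training_schedule(num_spectra=20, batch_size=10, rounds=1000))
print(len(steps))  # 2000
```

With these settings each round contains two batches, so each model receives 2000 updates over the whole training run.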
3. The intelligent Raman spectrum data library building method according to claim 2, wherein the generation model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format; the discrimination model is a binary-classification neural network that takes two-dimensional spectrum data as input, determines whether the input is an original spectrum or a generated spectrum, and outputs a confidence; the optimization objective V of the generative adversarial network is to minimize the difference between the generated spectrum and the original spectrum, as follows:
Figure FDA0003608058090000021
wherein
Figure FDA0003608058090000022
and E_z represent mathematical expectations.
4. The intelligent Raman spectral data library building method according to claim 2, wherein in step 2.1) the first-part loss is as follows:
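For orientation, the widely used minimax objective of a generative adversarial network can be written in the notation of this claim as below. This is the standard textbook form and is offered only as a sketch; the symbols x and p_data (the distribution of the original two-dimensional signals in WT_c) are assumptions introduced here, and the patent's own expression may differ in notation or detail.

```latex
\min_{G_c}\,\max_{D_c}\; V(D_c, G_c)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D_c(x)\bigr]
  + \mathbb{E}_{z}\bigl[\log\bigl(1 - D_c(G_c(z))\bigr)\bigr]
```

The first term rewards the discrimination model for assigning high confidence to original spectra, and the second rewards it for assigning low confidence to generated spectra, while the generation model is trained to counteract the second term.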
Figure FDA0003608058090000023
5. The intelligent Raman spectral data library building method according to claim 2, wherein in step 2.2) the second-part loss is as follows:
Figure FDA0003608058090000024
6. The intelligent Raman spectral data library building method according to claim 2, wherein in step 2.3) the loss of the generation model G_c is as follows:
Figure FDA0003608058090000025
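As a rough illustration of the three losses named in claims 4 to 6, the sketch below uses the binary cross-entropy formulation that is common in DCGAN training. It is an assumption, not the patent's definition: the exact expressions may differ, and in particular the non-saturating form of the generator loss shown here is only one common choice. The function names and the confidence values are hypothetical.

```python
import math


def discriminator_loss_real(d_real):
    """First part: penalize low confidence on original spectra, -mean(log D(x))."""
    return -sum(math.log(p) for p in d_real) / len(d_real)


def discriminator_loss_fake(d_fake):
    """Second part: penalize high confidence on generated spectra, -mean(log(1 - D(G(z))))."""
    return -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


def generator_loss(d_fake):
    """Non-saturating generator loss, -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator confidences for one batch.
d_real = [0.9, 0.8, 0.95]  # outputs on original spectra
d_fake = [0.1, 0.3, 0.2]   # outputs on generated spectra

print(discriminator_loss_real(d_real))
print(discriminator_loss_fake(d_fake))
print(generator_loss(d_fake))
```

With these example values the discrimination model is doing well, so both of its loss parts are small while the generator loss is large; training pushes the two models in opposite directions.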
CN202110610390.4A 2021-06-01 2021-06-01 Intelligent database building method for Raman spectrum data Active CN113378680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110610390.4A CN113378680B (en) 2021-06-01 2021-06-01 Intelligent database building method for Raman spectrum data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110610390.4A CN113378680B (en) 2021-06-01 2021-06-01 Intelligent database building method for Raman spectrum data

Publications (2)

Publication Number Publication Date
CN113378680A CN113378680A (en) 2021-09-10
CN113378680B true CN113378680B (en) 2022-06-28

Family

ID=77575258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110610390.4A Active CN113378680B (en) 2021-06-01 2021-06-01 Intelligent database building method for Raman spectrum data

Country Status (1)

Country Link
CN (1) CN113378680B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023198592A1 (en) 2022-04-14 2023-10-19 Covestro Deutschland Ag Method of determining a composition of molecule fragments via a combined experimental – machine learning approach, corresponding data processing circuit and computer program
CN114858782B (en) * 2022-07-05 2022-09-27 中国民航大学 Milk powder doping non-directional detection method based on Raman hyperspectral countermeasure discriminant model
CN115326783B (en) * 2022-10-13 2023-01-17 南方科技大学 Raman spectrum preprocessing model generation method, system, terminal and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104215623A (en) * 2013-05-31 2014-12-17 欧普图斯(苏州)光学纳米科技有限公司 Multi-industry detection-oriented laser Raman spectrum intelligent identification method and system
CN107727634A (en) * 2017-09-26 2018-02-23 上海化工研究院有限公司 A kind of laser Raman spectroscopy solution spectrum processing method
CN109508647A (en) * 2018-10-22 2019-03-22 北京理工大学 A kind of spectra database extended method based on generation confrontation network
CN109520999A (en) * 2019-01-17 2019-03-26 云南中烟工业有限责任公司 A kind of sage clary oil method for estimating stability based on two-dimensional correlation spectra
CN111860124A (en) * 2020-06-04 2020-10-30 西安电子科技大学 Remote sensing image classification method based on space spectrum capsule generation countermeasure network
CN112699899A (en) * 2020-12-31 2021-04-23 杭州电子科技大学 Hyperspectral image feature extraction method based on generation countermeasure network
CN112712857A (en) * 2020-12-08 2021-04-27 北京信息科技大学 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An auto-adaptive background subtraction method for Raman spectra;Xie Yi et al.;《Spectrochimica Acta Part A: Molecular & Biomolecular Spectroscopy》;20160515;Vol. 161;full text *
Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis;K.S. Riedel;《Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis》;20020806;full text *
Discrimination of early Sclerotinia sclerotiorum infection on oilseed rape leaves using laser Raman spectroscopy;Zhao Yanru et al.;《Transactions of the Chinese Society of Agricultural Engineering》;20170108;Vol. 33 (No. 1);full text *

Also Published As

Publication number Publication date
CN113378680A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113378680B (en) Intelligent database building method for Raman spectrum data
Xie et al. A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks
CN109858477A (en) The Raman spectrum analysis method of object is identified in complex environment with depth forest
CN108960330A (en) Remote sensing images semanteme generation method based on fast area convolutional neural networks
CN108345860A (en) Personnel based on deep learning and learning distance metric recognition methods again
US20190265319A1 (en) System and method for small molecule accurate recognition technology ("smart")
Cheng et al. An overview of infrared spectroscopy based on continuous wavelet transform combined with machine learning algorithms: application to chinese medicines, plant classification, and cancer diagnosis
CN102645649A (en) Radar target recognition method based on radar target range profile time-frequency feature extraction
CN102663447B (en) Cross-media searching method based on discrimination correlation analysis
CN103886336A (en) Polarized SAR image classifying method based on sparse automatic encoder
Liu Adversarial nets for baseline correction in spectra processing
Zhao et al. Vision transformer for quality identification of sesame oil with stereoscopic fluorescence spectrum image
CN108734115A (en) A kind of radar target identification method based on the consistent dictionary learning of label
CN111428585A (en) Metamaterial terahertz spectroscopy identification method based on deep learning
Bogdal et al. Recognition of gasoline in fire debris using machine learning: Part II, application of a neural network
CN107203779A (en) The EO-1 hyperion dimension reduction method kept based on empty spectrum information
Shen et al. Single convolutional neural network model for multiple preprocessing of Raman spectra
CN110378374A (en) A kind of tealeaves near infrared light profile classification method that fuzzy authentication information extracts
CN106778802B (en) Hyperspectral image classification multi-core learning method for maximizing category separability
CN109871907A (en) Radar target high resolution range profile recognition methods based on SAE-HMM model
Képeš et al. Interpreting convolutional neural network classifiers applied to laser-induced breakdown optical emission spectra
Zhang et al. A novel spectral-spatial multi-scale network for hyperspectral image classification with the Res2Net block
CN109447009B (en) Hyperspectral image classification method based on subspace nuclear norm regularization regression model
Tan et al. Near-infrared spectroscopy analysis of compound fertilizer based on GAF and quaternion convolution neural network
Li et al. Application of laser-induced breakdown spectroscopy coupled with spectral matrix and convolutional neural network for identifying geographical origins of Gentiana rigescens Franch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant