CN113378680B

CN113378680B - Intelligent database building method for Raman spectrum data

Info

Publication number: CN113378680B
Application number: CN202110610390.4A
Authority: CN
Inventors: 吴德文; 韩李翔; 陈嘉祥; 王思伟; 李超然; 刘国坤; 罗思恒; 曾勇明; 谢怡
Original assignee: Xiamen Perser Nano Tech Co ltd; Xiamen University
Current assignee: Xiamen Perser Nano Tech Co ltd; Xiamen University
Priority date: 2021-06-01
Filing date: 2021-06-01
Publication date: 2022-06-28
Anticipated expiration: 2041-06-01
Also published as: CN113378680A

Abstract

The invention provides an intelligent database building method for Raman spectrum data. The one-dimensional sequence signal of each original Raman spectrum is first transformed into a two-dimensional signal in wavelet space by wavelet transformation, and the two-dimensional signals are then input into a generative adversarial network (GAN) for training. The GAN comprises a generative model and a discriminative model, which are trained against each other: the former takes a randomly generated vector as input and outputs a generated spectrum (two-dimensional format), while the latter takes original spectra and generated spectra (two-dimensional format) as input and judges whether each input is an original spectrum. After the GAN has been trained, the generative model is used to produce a large number of generated spectra (two-dimensional format) similar to the original spectra, and a spectrum database is established by combining the generated spectra with the original spectra. The database stores spectral data in a two-dimensional signal format. The method addresses the difficulty, high cost and long acquisition time of spectral data that arise when deep learning is applied to Raman spectrum analysis, and promotes the practical adoption of deep learning methods in spectral analysis.

Description

Intelligent database building method for Raman spectrum data

Technical Field

The invention relates to the field of machine learning, in particular to an intelligent database building method for Raman spectrum data.

Background

Field detection technology based on Raman spectroscopy is widely applied in agricultural production, food safety, public safety and other fields. For example, Z. F. Zhou, J. L. Lu, J. Y. Wang, Y. S. Zou, T. Liu, Y. L. Zhang, G. K. Liu, Z. Q. Tian, Trace detection of polycyclic aromatic hydrocarbons in environmental waters by SERS, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2020 (234) 118250, uses surface-enhanced Raman spectroscopy to detect the organic pollutants polycyclic aromatic hydrocarbons (PAHs) in water, and a study published in the Journal of Nuclear Agricultural Sciences, 2021(1), uses Raman spectroscopy to detect aflatoxin B₁ and zearalenone in maize. The conventional Raman-based detection method performs template matching against a standard spectrogram of the target substance and sets a similarity threshold to decide whether the sample under test contains the target substance; for example, Zhang Z M, Chen X Q, Lu H M, et al., Chemometrics and Intelligent Laboratory Systems, 2014, 137: 10-20, use a search algorithm to qualitatively analyze the Raman spectra of mixtures based on a similarity threshold. This approach lacks universality: its performance is very sensitive to the algorithm parameters, and the similarity threshold must be adjusted manually to cope with interference and instrument noise in the complex environments encountered in field detection.

In recent years, machine learning has been widely applied to Raman-based substance detection. For example, the study "Detection of olive oil adulteration based on Raman spectroscopy and least-squares support vector machine" [J]. Spectroscopy and Spectral Analysis, 2012, 32(6): 1554-1558, detects the adulteration of olive oil with an algorithm based on the support vector machine. However, traditional machine learning methods depend on manual selection and extraction of spectral features, lack universality, and are difficult to transfer between applications. Deep learning, owing to the strong expressive power of neural networks, can extract features automatically and has achieved great success in computer vision, natural language processing and other fields without relying on feature engineering. Researchers have therefore designed deep-learning-based Raman spectrum analysis algorithms to improve the accuracy and universality of Raman detection, an approach with great development potential and market prospects. For example, Fan X, Ming W, Zeng H, et al. use neural networks to analyze the components of mixtures from their Raman spectra.

Unlike traditional machine learning (such as support vector machines and random forests), deep-learning-based Raman spectrum analysis needs massive amounts of Raman spectrum data to train a neural network. However, acquiring and labeling Raman spectra requires specialized spectroscopic instruments and consumes large amounts of material and expert labor, so the time and economic costs are high, which severely restricts the application and development of deep learning methods. Some researchers therefore start from a small amount of real spectral data and computationally generate a large number of simulated spectra with real spectral characteristics, in order to build Raman spectrum databases more efficiently and lay a data foundation for applying deep learning to Raman spectrum analysis. For example, the real Raman spectra of single substances are linearly superposed in given proportions to obtain spectral data of mixtures, which are used to train a neural network model for analyzing mixture components; or, as in A. K. Conlin et al., Data augmentation: an alternative approach to the analysis of spectroscopic data [J]. Chemometrics and Intelligent Laboratory Systems, 1998, large numbers of simulated spectra are generated by adding different levels of Gaussian noise to the real spectra.

Meanwhile, data augmentation methods that emerged in computer vision, such as Generative Adversarial Networks (GAN), have also been introduced into Raman spectroscopy (Yu S, Li H, Li X, et al.). A generative adversarial network consists of a generative model and a discriminative model: a large amount of data is generated by feeding noise vectors into the generative model, and the discriminative model distinguishes the generated data from real data, which drives the generative model to produce data whose distribution is basically consistent with that of the real data. Applied to Raman spectra, the network takes the original Raman spectra as the learning target and directly generates spectra with the same dimensionality as the originals. However, when a GAN is trained directly on the one-dimensional spectral signal, its convolutional structure cannot make good use of the local features of the spectrum and training is unstable, so the generative model cannot simulate the distribution of real spectral data well. Experiments show that most generated spectra resemble the original spectrum with Gaussian noise added; adding them directly to a training database reduces the accuracy of substance classification, so qualitative detection of substances cannot be completed accurately. Preliminary analysis attributes this to the lack of spatial correlation in Raman spectral data in sequence form: the convolution kernels in the GAN cannot extract sufficient local features, so the generated data contains noise-like glitches.

A certain degree of Gaussian noise is added to the actually acquired spectrum to simulate a large amount of data with real spectrum characteristics, but the signal-to-noise ratio of the spectrum is also inevitably changed, so that the data distribution of the simulated spectrum and the data distribution of the real spectrum are inconsistent. This violates the assumption that training and test data are identically distributed in machine learning, and if a simulated spectrum added with gaussian noise is used for training a machine learning model, a model overfitting the simulated spectrum is likely to be learned, and cannot be reliably applied to material detection of a real spectrum. Meanwhile, when the Gaussian noise is added, relevant parameters such as Gaussian noise intensity and the like need to be adjusted manually, if the noise is too strong, the original spectrum signal is submerged, and if the noise is too weak, the simulated spectrum is highly similar to the real signal, so that the goal of data enhancement cannot be realized.

Linearly superposing the spectra of pure substances can generate simulated spectra for a large number of mixtures, but a weighted linear sum of several pure-substance spectra neglects the interactions between molecules in the actual mixture. In the spectrum of a real mixture, some peaks are likely to be suppressed or enhanced by intermolecular interactions, and such nonlinear changes cannot be simulated by linear addition. The simulated Raman spectra generated by superposition therefore have low reliability and cannot be used for training or for constructing a database.
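For illustration only, the two prior-art augmentations criticized above can be sketched in a few lines of Python. The toy spectra, the noise level `sigma` and the mixing weights are assumed values chosen for the example; none of them come from the cited works.

```python
import random

def add_gaussian_noise(spectrum, sigma, rng):
    """Return a copy of the spectrum with zero-mean Gaussian noise of std sigma."""
    return [x + rng.gauss(0.0, sigma) for x in spectrum]

def linear_mixture(spectra, weights):
    """Weighted sum of pure-substance spectra sampled on the same shift axis."""
    n = len(spectra[0])
    return [sum(w * s[i] for s, w in zip(spectra, weights)) for i in range(n)]

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
pure_a = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]   # toy spectrum with a peak at index 2
pure_b = [0.0, 0.0, 0.0, 2.0, 6.0, 2.0]   # toy spectrum with a peak at index 4

noisy_a = add_gaussian_noise(pure_a, sigma=0.1, rng=rng)
mixed = linear_mixture([pure_a, pure_b], weights=[0.7, 0.3])
```

Both functions are purely additive, which is the limitation described above: the noise level has to be tuned by hand, and the mixture can never show a peak that is suppressed or enhanced by molecular interaction.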

Disclosure of Invention

The invention mainly aims to overcome the difficulty in establishing a Raman spectrum database in the prior art, and provides an intelligent database establishing method for Raman spectrum data, which can quickly and efficiently establish rich Raman spectrum databases, can be used for training and testing deep learning models, and lays a data foundation for finally realizing accurate field material detection.

The invention adopts the following technical scheme:

an intelligent database building method for Raman spectrum data is characterized by comprising the following steps:

1) performing feature transformation on all original Raman spectra of the c-th target substance (c = 1, ..., C) in a database by using continuous wavelet transformation to obtain two-dimensional data signals of the original Raman spectra, wherein C is the number of types of target substances;

2) generating a random vector z, inputting a trained generation model aiming at the c-th target substance to obtain corresponding two-dimensional signals for generating the spectrum, repeating the step for M times to obtain M two-dimensional signals for generating the spectrum, and marking the signals as the c-th target substance;

3) repeating the steps 1) -2) for other target substances, generating C multiplied by M two-dimensional signals of generated spectrums to form a two-dimensional data set, and establishing a large Raman spectrum database which covers a large amount of labeled samples of the C-type target substances by combining the two-dimensional signals of the original Raman spectrum obtained in the step 1).

Step 1) is specifically as follows: let the set of original Raman spectra be S = {s_j | j = 1, 2, ..., N_c}, where N_c is the number of original Raman spectra labeled with the c-th target substance. Each Raman spectrum is denoted s_j(t), where t = [t₁, t₂, ..., t_n] is the Raman shift sequence, n is the length of each Raman shift sequence, and s_j(t_i) is the Raman signal intensity at position t_i, i = 1, 2, ..., n. Each Raman spectrum s_j(t) is feature-transformed using the continuous wavelet transform to obtain its two-dimensional signal in the time-frequency domain:

$$WT_j(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s_j(t)\, \psi\!\left(\frac{t - b}{a}\right) dt$$

where ψ(t) is the mother wavelet function, a is a scaling factor vector of length l, and b is a translation factor vector whose values are the same as t. The time-frequency-domain two-dimensional data set of the c-th target substance is then

$$WT_c = \{\, WT_j \mid j = 1, 2, \ldots, N_c \,\},$$

where each WT_j is a two-dimensional matrix of l rows and n columns.

Step 2) further includes training a generative adversarial network for the c-th target substance: a generative adversarial network for the c-th target substance is first established, and the two-dimensional signals of the feature-transformed original Raman spectra of the c-th target substance are then input into it as a training set, yielding a trained generative model for the c-th target substance. Specifically, the generative adversarial network constructed in advance for the c-th target substance includes two neural network models, a generative model G_c and a discriminative model D_c. The two-dimensional data set WT_c of the original Raman spectra of the c-th target substance is divided into batches, each batch WT_batch containing batchSize Raman spectra,

and training the generative adversarial network specifically comprises the following steps:

2.1) input the batch WT_batch of two-dimensional original Raman spectra into the discriminative model D_c, then use the output D_c(WT_batch) to compute the first part of the loss of D_c and back-propagate it; the first part loss is as follows:

2.2) generate a set of random vectors Z_batch = {z_batch,k | k = 1, ..., batchSize}, each noise vector having length d; input each z_batch,k into the generative model G_c to obtain a two-dimensional signal G_c(z_batch,k), then input G_c(z_batch,k) into the discriminative model D_c to compute the second part of the loss, and finally perform back-propagation and gradient descent on the loss; the second part loss is as follows:

2.3) use the intermediate result D_c(G_c(z_batch,k)) of 2.2) to compute the loss of the generative model G_c, which is likewise back-propagated and used for gradient descent; the loss of the generative model G_c is as follows:

2.4) repeat steps 2.1)-2.3) on each batch of the two-dimensional data set of original Raman spectra to complete one round of training, and repeat for Y rounds to complete the training of the generative adversarial network for the c-th target substance; the trained generative model G_c can then be used for building the database.

The generative model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format. The discriminative model is a binary-classification neural network that takes two-dimensional spectrum data as input, judges whether the input is an original spectrum or a generated spectrum, and outputs a confidence. The optimization goal of the generative adversarial network is to minimize the difference between the generated spectra and the original spectra, as follows:

$$\min_{G_c} \max_{D_c} V(D_c, G_c) = E_x\left[\log D_c(x)\right] + E_z\left[\log\left(1 - D_c(G_c(z))\right)\right]$$

wherein x denotes a two-dimensional original spectrum, and E_x and E_z represent mathematical expectations.

As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:

1. the method of the invention innovatively uses the multi-resolution characteristic of multi-scale wavelet transform to transform the Raman spectrum from the sequence data into a two-dimensional signal similar to an image, and extracts the fine-grained characteristic of the Raman spectrum signal.

2. The method of the invention trains a generative adversarial network on the two-dimensional form of the original Raman spectra and uses it to produce a large number of generated spectra that retain the substance fingerprint features of the original Raman spectra, thereby solving the poor generation quality and unstable training encountered when a generative adversarial network is applied directly to the spectra.

3. The method of the invention combines a small number of labeled original spectra with a large number of generated spectra to establish a Raman spectrum database in two-dimensional format, which addresses the difficulty, high cost and long acquisition time of spectral data when deep learning is applied to spectral analysis, and promotes the practical adoption of deep learning methods in spectral analysis.

4. The method of the invention uses artificial intelligence technology to quickly establish the spectral database, can be used for training and testing a deep learning model, and lays a data foundation for finally realizing accurate field material detection.

5. The method can be used for the database construction of Raman spectra, and can also be expanded and applied to other spectrum detection, such as the database construction of infrared spectra, X-ray diffraction spectra and chromatograms.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2a is an original spectrum of basic bright yellow O;

FIG. 2b is an original two-dimensional spectrum of basic bright yellow O;

FIG. 2c is an example of a two-dimensional spectrogram generated from basic bright yellow O;

FIG. 3a is a raw spectrum of lemon yellow;

FIG. 3b is an original two-dimensional spectrum of lemon yellow;

FIG. 3c is an example of a generated two-dimensional spectrogram of lemon yellow;

fig. 4 is a confusion matrix: substance identification results of the VGG16 classifier.

The invention is described in further detail below with reference to the figures and specific examples.

Detailed Description

The invention is further described below by means of specific embodiments.

The invention provides an intelligent database building method for Raman spectrum data based on wavelet transformation and a generative adversarial network. As shown in FIG. 1, it comprises feature transformation, model training and generation, and spectrum database building. For feature transformation and training of the generative model, the one-dimensional sequence signal of each original Raman spectrum is first transformed into a two-dimensional signal in wavelet space by wavelet transformation and then input into the generative adversarial network for training, as indicated by the dotted arrows. The generative adversarial network consists of a generative model (Generator, G) and a discriminative model (Discriminator, D), which are trained against each other; the former takes the random vector z as input, and the latter takes the original spectra (two-dimensional format) as input.

For building the Raman spectrum database, the generative model (G) obtained from training the generative adversarial network is used to produce a large number of generated spectra (two-dimensional format), and the database is established by combining the generated spectra with the original spectra. The database stores Raman spectral data in a two-dimensional signal format. Therefore, before training or detection, any newly added spectrum must first undergo the two-dimensional feature transformation by multi-scale wavelet transformation, after which the subsequent qualitative analysis of the spectrum is performed.

Specifically, the method of the invention comprises the following steps:

1) Assuming that an application supports the detection of C target substances and that each target substance has only a small number of labeled spectra, all original Raman spectra of the c-th target substance (c = 1, ..., C) in the database are feature-transformed using continuous wavelet transformation to obtain two-dimensional data signals of the original Raman spectra, where C is the number of types of target substances.

In the feature transformation step, all original Raman spectra of the c-th target substance in the database are transformed using the Continuous Wavelet Transform (CWT) from one-dimensional time-domain sequence signals into two-dimensional signals in the time-frequency domain.

Let the set of original spectra be S = {s_j | j = 1, 2, ..., N_c}, where N_c is the number of original Raman spectra labeled with the c-th target substance. Each spectrum is denoted s_j(t), where t = [t₁, t₂, ..., t_n] is the Raman shift sequence, n is the length of each Raman shift sequence, and s_j(t_i) is the Raman signal intensity at position t_i, i = 1, 2, ..., n. Each spectrum s_j(t) is feature-transformed using the continuous wavelet transform to obtain a two-dimensional signal in the time-frequency domain:

$$WT_j(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s_j(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \qquad (1)$$

where ψ(t) is the mother wavelet function, a is a scaling factor vector of length l, and b is a translation factor vector whose values are the same as t. Note that the integration interval is (-∞, +∞), whereas the acquisition region of a Raman spectrometer is limited; the signal outside the acquisition region is therefore zero-filled, or handled in another reasonable way, so that formula (1) can be calculated. This yields the time-frequency-domain two-dimensional data set of the c-th target substance:

$$WT_c = \{\, WT_j \mid j = 1, 2, \ldots, N_c \,\},$$

where each WT_j is a two-dimensional matrix of l rows and n columns.
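As a rough illustration of this feature transformation, the following sketch discretizes a continuous wavelet transform in plain Python. It is written under several assumptions that the text does not fix: the Ricker (Mexican-hat) mother wavelet, the 1/sqrt(a) normalization, a uniformly sampled shift axis, and the particular scale values. Zero filling outside the acquisition region is realized simply by summing over the measured points only.

```python
import math

def ricker(x):
    """Mexican-hat (Ricker) mother wavelet, used here only as an example psi."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def cwt(signal, t, scales, psi=ricker):
    """Discretized CWT: rows index the scale a, columns index the shift b = t.

    Samples outside the acquisition region are treated as zero, so the sum
    runs over the measured points only.
    """
    dt = t[1] - t[0]  # assumes a uniformly sampled shift axis
    out = []
    for a in scales:
        row = []
        for b in t:
            acc = sum(s * psi((ti - b) / a) for s, ti in zip(signal, t))
            row.append(acc * dt / math.sqrt(a))
        out.append(row)
    return out

# Toy spectrum: a single Gaussian peak centered at t = 32 on a 64-point axis.
t = [float(i) for i in range(64)]
spectrum = [math.exp(-0.5 * ((ti - 32.0) / 2.0) ** 2) for ti in t]
scales = [1.0, 2.0, 4.0, 8.0]   # l = 4 scaling factors (assumed values)
wt = cwt(spectrum, t, scales)   # matrix with l rows and n columns
```

The result `wt` has one row per scaling factor and one column per Raman shift position, which matches the l-by-n layout described above. A production implementation would more likely rely on an established wavelet library than on this double loop.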

2) A random vector z is generated and input into the trained generative model G_c for the c-th target substance to obtain the corresponding two-dimensional signal G_c(z) of a generated spectrum. This step is repeated M times to obtain M two-dimensional signals of generated Raman spectra, which are labeled as the c-th target substance; M can be set to a large constant according to application requirements.

This step also includes training the generative adversarial network for the c-th target substance: a generative adversarial network for the c-th target substance is first established, and the two-dimensional signals of the feature-transformed original Raman spectra of the c-th target substance are input into it as a training set, yielding the trained generative model G_c for the c-th target substance.

A generative adversarial network for the c-th target substance is constructed in advance and comprises two neural network models: a generative model G_c and a discriminative model D_c. The generative model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format; it can be a conventional neural network with deconvolution layers, activation layers and batch normalization, which is not limited herein. The discriminative model is a binary-classification neural network that takes two-dimensional Raman spectrum data as input, judges whether the input is an original spectrum or a generated spectrum, and outputs a confidence. The optimization target V of the generative adversarial network is to minimize the difference between the generated spectra and the original spectra, where x denotes a two-dimensional original spectrum and E_x and E_z represent mathematical expectations:

$$\min_{G_c} \max_{D_c} V(D_c, G_c) = E_x\left[\log D_c(x)\right] + E_z\left[\log\left(1 - D_c(G_c(z))\right)\right]$$

Before training, the two-dimensional data set WT_c of the original Raman spectra of the c-th target substance is divided into batches, each batch WT_batch containing batchSize Raman spectra,

the steps of training to generate the confrontation network specifically include the following:

2.1) Input the batch WT_batch of two-dimensional original spectra into the discriminative model D_c, then use the output D_c(WT_batch) to compute the first part of the loss of D_c and back-propagate it, i.e., propagate the error of the loss function to the parameters of the neural network. The first part loss is as follows:

2.2) Generate a set of random vectors Z_batch = {z_batch,k | k = 1, ..., batchSize}, each noise vector having length d. Input each z_batch,k into the generative model G_c to obtain a two-dimensional signal G_c(z_batch,k), then input G_c(z_batch,k) into the discriminative model D_c to compute the second part of the loss, and finally perform back-propagation and gradient descent on the loss, i.e., update the parameters of the neural network according to the back-propagated error. The second part loss is as follows:

2.3) Use the intermediate result D_c(G_c(z_batch,k)) of 2.2) to compute the loss of the generative model G_c, which is likewise back-propagated and used for gradient descent, i.e., the parameters of the neural network are updated according to the back-propagated error. The loss of the generative model G_c is as follows:

2.4) Repeat steps 2.1)-2.3) on each batch of the two-dimensional data set of original Raman spectra to complete one round of training, and repeat for Y rounds, where Y can be set according to application requirements, to complete the training of the generative adversarial network for the c-th target substance. The trained generative model is denoted G_c.
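The loss formulas for steps 2.1)-2.3) do not survive in this text. As a hedged sketch, one conventional choice that is consistent with a standard GAN objective is binary cross-entropy on the discriminator outputs. The function names, the `EPS` guard and the non-saturating form of the generator loss below are assumptions made for illustration, not the patent's own definitions.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail, not from the text

def d_loss_real(d_real):
    """Assumed first-part loss: pushes D_c(WT_batch) towards 1."""
    return -sum(math.log(p + EPS) for p in d_real) / len(d_real)

def d_loss_fake(d_fake):
    """Assumed second-part loss: pushes D_c(G_c(z_batch,k)) towards 0."""
    return -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)

def g_loss(d_fake):
    """Assumed generator loss (non-saturating form): reuses the step 2.2 outputs
    and pushes D_c(G_c(z_batch,k)) towards 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)

# Example discriminator confidences for one batch (made-up numbers).
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
losses = (d_loss_real(d_real), d_loss_fake(d_fake), g_loss(d_fake))
```

In an actual training loop these three values would be computed on framework tensors so that back-propagation and the gradient-descent updates of steps 2.1)-2.3) can be applied to the parameters of D_c and G_c.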

3) Repeating the steps 1) -2) for other target substances, generating C multiplied by M two-dimensional signals of generated spectrums to form a two-dimensional data set, and establishing a large-scale Raman spectrum database by combining the two-dimensional signals of the original Raman spectrum obtained in the step 1), wherein the large-scale Raman spectrum database covers a large amount of labeled samples of C-class target substances.
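The assembly of the database in step 3) can be sketched as follows. The record layout, the Gaussian sampling of z, and `toy_generator` (a stand-in for a trained G_c) are all hypothetical; the text specifies only that C×M labeled generated signals are combined with the two-dimensional originals.

```python
import random

def build_database(originals_2d, generators, m, d, rng):
    """Combine original 2-D spectra with m generated ones per substance.

    originals_2d: {label: [matrix, ...]}
    generators:   {label: callable(z) -> matrix}
    Returns a list of (matrix, label, source) records.
    """
    records = []
    for label, matrices in originals_2d.items():
        records.extend((mat, label, "original") for mat in matrices)
        g_c = generators[label]
        for _ in range(m):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random vector z
            records.append((g_c(z), label, "generated"))
    return records

def toy_generator(z):
    """Stand-in for a trained G_c: reshapes z into a 2 x 2 matrix."""
    return [z[0:2], z[2:4]]

rng = random.Random(0)
blank = [[0.0, 0.0], [0.0, 0.0]]
originals = {c: [blank, blank, blank] for c in ("substance_1", "substance_2")}
generators = {c: toy_generator for c in originals}
db = build_database(originals, generators, m=5, d=4, rng=rng)
```

With C = 2 substances, 3 originals each and M = 5, the sketch yields 2 × (3 + 5) = 16 labeled records, each carrying its substance label so that it can be used directly for classifier training.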

The method firstly performs characteristic transformation on a real original Raman spectrum, innovatively applies multi-scale wavelet transformation to Raman spectrum data analysis, and transforms the Raman spectrum data into a two-dimensional signal similar to an image from sequence data. The multi-scale wavelet transform has the characteristic of multi-resolution, and after the multi-scale wavelet transform is carried out on the Raman spectrum, the Raman spectrum is transformed into a two-dimensional signal similar to an image from sequence data, and the signal characteristics of the Raman spectrum with different scales are fully extracted. The transformed two-dimensional spectral data is then used as a learning target to generate a large amount of two-dimensional generated raman spectral data using GAN. And finally, establishing a large Raman spectrum database with a two-dimensional storage format by combining the original spectrum and the generated spectrum. The method of the invention may also be applied to other spectral data libraries, such as infrared spectroscopy or X-ray diffraction spectroscopy or chromatography, as desired.

Experimental verification

The Raman spectrum data used for training and testing in the experiments of the invention are samples of 9 target substances (pigments: brilliant blue, sunset yellow, lemon yellow, basic bright yellow O, basic orange 2, rhodamine B, carmine, amaranth, allura red) collected with a high-grade PT2000 Raman spectrometer (spectral range 200-2500 cm^-1), as shown in Table 1.

Table 1: Raman spectroscopy data (number of substances C = 9)

To verify the effectiveness of the method in applications with few training samples, this experiment randomly takes only 20 original Raman spectra of each substance for training the generative adversarial network, i.e., N_c = 20, c = 1, …, 9. The network structure of a deep convolutional generative adversarial network (DCGAN) is used, with the number of training rounds Y set to 1000, the batch size batchSize set to 10, and the random vector length d set to 100. The learning rates of the generative model G_c and the discriminative model D_c are both set to 0.0005, and gradient descent is performed with the Adam optimizer (beta1 = 0.5, beta2 = 0.9). After the generative model G_c for each target substance has been trained, 10000 generated spectra (two-dimensional format) are generated and labeled. Finally, a Raman spectrum database of the 9 pigments is established, containing 90180 labeled samples that can be used for classifier training.
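The experimental settings above can be collected into a small configuration object, which also makes the database size easy to check. The class and field names are hypothetical; only the numeric values are taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GanConfig:
    num_classes: int = 9              # C
    originals_per_class: int = 20     # N_c
    epochs: int = 1000                # Y
    batch_size: int = 10              # batchSize
    latent_dim: int = 100             # d
    learning_rate: float = 0.0005     # same for G_c and D_c
    adam_betas: tuple = (0.5, 0.9)    # (beta1, beta2)
    generated_per_class: int = 10000  # M

    def batches_per_epoch(self) -> int:
        """Number of batches each training round iterates over."""
        return self.originals_per_class // self.batch_size

    def database_size(self) -> int:
        """Generated plus original samples over all classes."""
        return self.num_classes * (self.generated_per_class + self.originals_per_class)

cfg = GanConfig()
```

With these values, `cfg.database_size()` gives 9 × (10000 + 20) = 90180, matching the number of labeled samples stated above, and each training round covers two batches of 10 spectra.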

Experiment one generated a large number of generated spectra (two-dimensional format) from 20 original Raman spectra of basic bright yellow O, 3 examples of which are shown in FIG. 2c. Experiment two generated spectra (two-dimensional format) from 20 original Raman spectra of lemon yellow, 3 examples of which are shown in FIG. 3c. Comparing FIG. 2b with FIG. 2c shows that the original spectrum (two-dimensional format) obtained after feature transformation is similar to the generated spectra (two-dimensional format). Comparing FIG. 2c with FIG. 3c shows that the generated spectra (two-dimensional format) of different target substances differ significantly. The generated spectrograms therefore also carry the substance fingerprint characteristics and can be used for effective training.

Experiment three verifies the substance identification accuracy of the deep learning classifier VGG16 when the spectral database established by the method is used as the training set. The VGG16 neural network is used as the classifier, with the number of training rounds set to 4, the batch size set to 50, and the learning rate set to 0.0001. The trained VGG16 classifier classifies 1070 test spectrum samples, and the confusion matrix and the accuracy, i.e., the number of correctly classified test spectra divided by the total number of test spectra, are used as performance evaluation indices. As shown in FIG. 4, the test samples of seven of the pigments were all correctly identified, i.e., the number of correctly classified spectra on the diagonal equals the number of test samples of the corresponding pigment. However, 10 test spectra of lemon yellow and 7 test spectra of carmine were misidentified as other pigments, so the overall accuracy is (175+111+90+69+76+96+204+122+111)/1070 = 98.41%, which can meet the application requirements of most substance detection tasks. If a VGG16 neural network with the same parameters is trained directly on the 180 one-dimensional original Raman spectra and used to classify the 1070 test spectrum samples, the pigments are correctly identified in only 18.97% of the spectrum samples. It follows that the scarcity of labeled samples hinders the use of deep learning classifiers in Raman-based substance identification. The method effectively overcomes this practical difficulty and facilitates subsequent in-depth analysis of Raman spectra with deep learning.
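The accuracy figure can be checked with a few lines of arithmetic. This sketch uses only numbers quoted in the text; the function name is illustrative.

```python
def accuracy(correct, total):
    """Fraction of test spectra assigned to the right class."""
    return correct / total

total = 1070
misclassified = 10 + 7   # lemon yellow + carmine, as stated in the text
acc_from_errors = accuracy(total - misclassified, total)

diagonal = [175, 111, 90, 69, 76, 96, 204, 122, 111]  # terms as printed in the text
acc_from_diagonal = accuracy(sum(diagonal), total)
```

The stated 98.41% corresponds to (1070 − 17)/1070. The nine diagonal terms as printed sum to 1054 rather than 1053, which would give about 98.50%, so one of the printed terms may be off by one; FIG. 4 would be the place to confirm the exact counts.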

The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto, and any insubstantial modifications made by using this concept shall fall within the scope of the present invention.

Claims

1. An intelligent database building method for Raman spectrum data is characterized by comprising the following steps:

1) performing feature transformation on all original Raman spectra of the c-th target substance in a database by using continuous wavelet transformation to obtain two-dimensional data signals of the original Raman spectra, wherein c = 1, …, C, and C is the number of types of target substances; step 1) is specifically as follows: let the set of original Raman spectra be S = {s_j | j = 1, 2, …, N_c}, wherein N_c represents the number of original Raman spectra labeled with the c-th target substance; each Raman spectrum is denoted s_j(t), wherein t = [t₁, t₂, …, t_n] is the Raman shift sequence, n represents the length of each Raman shift sequence, and s_j(t_i) represents the Raman signal intensity at position t_i, i = 1, 2, ..., n; each Raman spectrum s_j(t) is feature-transformed using a continuous wavelet transform to obtain its two-dimensional signal in the time-frequency domain:

A two-dimensional matrix with l rows and n columns;

2) randomly generating a vector z and inputting it into a trained generative model for the c-th target substance to obtain the corresponding two-dimensional signal of a generated spectrum, repeating this step M times to obtain M two-dimensional signals of generated spectra, and labeling them as the c-th target substance; wherein training a generative adversarial network for the c-th target substance comprises: first establishing a generative adversarial network for the c-th target substance, and inputting the two-dimensional signals of the feature-transformed original Raman spectra of the c-th target substance into the generative adversarial network as a training set for training, to obtain the trained generative model for the c-th target substance;

2. The intelligent database building method for Raman spectrum data according to claim 1, wherein a generative adversarial network for the c-th target substance is constructed in advance and comprises two neural network models, a generative model G_c and a discriminative model D_c; the two-dimensional data set WT_c of the original Raman spectra of the c-th target substance is divided into batches, each batch WT_batch containing batchSize Raman spectra,

and training the generative adversarial network specifically comprises the following steps:

2.1) inputting the batch WT_batch of two-dimensional original Raman spectra into the discriminative model D_c, then using the output D_c(WT_batch) to compute the first part of the loss of D_c and back-propagating the loss;

2.2) generating a set of random vectors Z_batch = {z_batch,k | k = 1, …, batchSize}, each random vector having length d; inputting each z_batch,k into the generative model G_c to obtain a two-dimensional signal G_c(z_batch,k), then inputting G_c(z_batch,k) into the discriminative model D_c to compute the second part of the loss, and finally performing back-propagation and gradient descent on the loss;

2.3) using the intermediate result D_c(G_c(z_batch,k)) of 2.2) to compute the loss of the generative model G_c, which is likewise back-propagated and used for gradient descent;

2.4) repeating steps 2.1)-2.3) on each batch of the two-dimensional data set of original Raman spectra to complete one round of training, and repeating for Y rounds to complete the training of the generative adversarial network for the c-th target substance, whereupon the trained generative model G_c can be used for building the database.

3. The intelligent database building method for Raman spectrum data according to claim 2, wherein the generative model is a neural network whose input is a random vector z and whose output is a spectrum in two-dimensional format; the discriminative model is a binary-classification neural network that takes two-dimensional spectrum data as input, judges whether the input is an original spectrum or a generated spectrum, and outputs a confidence; the optimization target V of the generative adversarial network is to minimize the difference between the generated spectra and the original spectra, as follows:

$$\min_{G_c} \max_{D_c} V(D_c, G_c) = E_x\left[\log D_c(x)\right] + E_z\left[\log\left(1 - D_c(G_c(z))\right)\right]$$

wherein x denotes a two-dimensional original spectrum, and E_x and E_z represent mathematical expectations.

4. An intelligent raman spectral data library building method according to claim 2 wherein in step 2.1) said first portion loss is as follows:

5. an intelligent raman spectral data library building method according to claim 2 wherein in step 2.2) said second portion loss is as follows:

6. an intelligent raman spectral data library building method according to claim 2, wherein: in step 2.3), the generative model G is_cThe losses are as follows: