CN113390848A - DCGAN spectral data expansion method - Google Patents

DCGAN spectral data expansion method

Info

Publication number
CN113390848A
Authority
CN
China
Prior art keywords
network
spectrum
data
dcgan
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010179231.9A
Other languages
Chinese (zh)
Inventor
李彦晖
吴鹏飞
刘勇飞
殷琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010179231.9A
Publication of CN113390848A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention discloses a DCGAN spectral data expansion method that introduces convolution into the original GAN, uses the feature-extraction capability of convolutional layers to extract deep features of Raman spectra, and generates highly similar spectra. Compared with infrared spectroscopy, Raman spectroscopy provides non-destructive qualitative and quantitative analysis, places no special requirements on the sample, and is simple, fast, and highly sensitive, which avoids errors caused by sample damage or sample defects. The distortion of the generated spectra differs only slightly from that of the original spectra, so the generated spectra retain the information of the original spectra well.

Description

DCGAN spectral data expansion method
Technical Field
The invention relates to the field of spectral metrology, in particular to a DCGAN spectral data expansion method.
Background
Food and drug safety has always been a major concern. Commonly used detection methods for food and drugs include the absorption coefficient method, chemical methods, and HPLC. These methods are cumbersome and confined to the laboratory, so a means of rapid detection is needed. In recent years, near-infrared and Raman spectroscopic detection have developed well. Raman spectroscopic detection is based on the photon "fingerprint" of the Raman spectrum. When light strikes the molecules of an object, elastic scattering occurs, and a small fraction of the photons are scattered inelastically; these are the Raman photons. Raman photons exchange energy with the molecules and produce frequency-shifted scattered light, and the size of the shift carries information about the molecule. Different shifts correspond to different molecular structures, which gives rise to the Raman spectrum. The chemical and molecular composition and content of a sample can be determined from the spectrum.
Thanks to improvements in instruments and methods, Raman analysis has been widely used to identify and classify food and drug products. Applying deep learning to spectroscopy is a natural trend. However, acquiring Raman spectra currently demands considerable labor and time, the number of collected samples is small, and interference factors are present, so the large training sets that deep learning requires are not available. As a result, few methods apply deep learning to Raman spectra.
Disclosure of Invention
In view of the above, the present invention provides a DCGAN spectral data expansion method.
The DCGAN spectral data expansion method introduces convolution into the original GAN and uses the feature-extraction capability of convolutional layers to extract deep features of Raman spectra and generate highly similar spectra. The method comprises the following steps:
(1) using a deep convolutional generative adversarial network (DCGAN) to generate new spectra, and inputting them into a CNN for classification;
(2) using the generative adversarial network to generate a picture from input random noise and to judge the authenticity of the picture;
(3) training the generator network: with the parameters of the discriminator network fixed, the generator network is optimized so that the discriminator network cannot identify the "fake" samples and outputs a large probability value for every sample, as if all were real. Expressed as a function, this means maximizing D(G(z)), i.e. minimizing 1 - D(G(z));
(4) training the discriminator network: with the parameters of the generator network fixed, the discriminator network is optimized so that its accuracy improves as far as possible. For a real image x, the output probability should be large, i.e. D(x) is maximized; for a generated sample G(z), D(G(z)) is minimized. This gives the optimization objective for network training: ln D(x) + ln(1 - D(G(z)));
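Finally, the objective function is obtained:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\ln D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\ln\left(1 - D(G(z))\right)\right] \tag{1}$$
The training criterion for the discriminator network is to maximize V(D, G) for a given generator network:
$$\max_D V(D,G) = \int_x P_{data}(x)\ln(D(x))\,dx + \int_z P_z(z)\ln(1 - D(G(z)))\,dz = \int_x \left[P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x))\right]dx \tag{2}$$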
To maximize equation (2), the integrand
$$P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x)) \tag{3}$$
must take its maximum value for every x. Here x, Pdata(x), and Pg(x) are fixed values. For any nonzero Pdata(x) and Pg(x) and any real value D(x) ∈ [0, 1], function (3) takes its maximum at D(x) = Pdata(x)/(Pdata(x) + Pg(x)), which gives the optimal discriminator network D for a given generator network:
$$D^*_G(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} \tag{4}$$
When the generator network is optimized, it reaches its optimal solution when Pdata = Pg, so that the generator network best reproduces the distribution of the real samples;
(5) modifying the convolution kernels of the convolutional layers in the DCGAN into one-dimensional vector convolution kernels so that Raman spectral data can be processed.
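The pointwise maximization used in step (4) can be checked numerically. The sketch below is only an illustration, not part of the claimed method: it picks hypothetical density values `p_data` and `p_g` at a single x, searches D(x) on a grid in (0, 1), and compares the result with the closed form Pdata(x)/(Pdata(x) + Pg(x)). The function names and the grid resolution are assumptions made for this example.

```python
import math


def pointwise_objective(d, p_data, p_g):
    """Function (3) at one fixed x: p_data * ln(d) + p_g * ln(1 - d)."""
    return p_data * math.log(d) + p_g * math.log(1.0 - d)


def grid_argmax(p_data, p_g, steps=9999):
    """Return the grid point d in (0, 1) that maximizes function (3)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda d: pointwise_objective(d, p_data, p_g))


# Hypothetical density values at a single x (not taken from the patent).
p_data, p_g = 0.6, 0.2
d_numeric = grid_argmax(p_data, p_g)
d_closed = p_data / (p_data + p_g)
print(d_numeric, d_closed)
```

When `p_data` equals `p_g`, the same search lands on 0.5, which matches the statement that the discriminator can no longer separate real from generated samples once the generator reproduces the data distribution.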
The invention has the advantages that: compared with an infrared spectrum method, the Raman spectrum provides nondestructive qualitative and quantitative analysis, has no special requirements on a sample, is simple and convenient in short time and high in sensitivity, avoids errors caused by the damage of the sample or the defects of the sample, has smaller difference of the distortion degree of the generated spectrum compared with the original spectrum, and well retains the information of the original spectrum of the generated spectrum.
Drawings
FIG. 1 is a schematic diagram of DCGAN network structure and Raman spectrum classification according to an embodiment of the present invention;
FIG. 2 is a comparison between the original spectrum and the generated spectrum of DCGAN in the example of the present invention ((a) original spectrum (b) generated spectrum).
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Example:
The network structure of the DCGAN is shown in FIG. 1, and its operation is explained here. It consists of two networks, G (Generator) and D (Discriminator), whose functions match their names:
G is the network that generates pictures. It takes random noise z as input and produces a picture, denoted G(z).
D is the network that judges how real a picture is. Given an input picture x, it outputs D(x), the probability that x is a real picture. A value of 1 means the picture is certainly real; a value of 0 means the picture is fake.
During training, G tries to generate pictures that are as realistic as possible in order to fool D, while D tries to distinguish the "fake" pictures generated by G from the real pictures as well as possible. G and D thus play an adversarial game against each other.
The outcome of this game is that G generates pictures G(z) realistic enough to pass for real ones. At that point D can hardly determine whether G(z) is real, and D(G(z)) = 0.5. Convolution is introduced because a CNN does not process each pixel in isolation but computes over whole regions, so the resulting DCGAN can better extract the feature information of Raman spectra. Because down-sampling in the pooling layers of a convolutional network loses part of the image information, transposed convolution (deconvolution) and strided convolution are introduced to replace the pooling layers of the generator network and the discriminator network, respectively, which reduces the loss of image information. Batch Normalization (BN) is then introduced to build a more stable network.
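To illustrate how strided and transposed convolutions can stand in for pooling, the sketch below computes the output length of each operation for a one-dimensional input, using the standard length formulas for dilation 1 and no output padding. The layer sizes (input length 900, kernel 4, stride 2, padding 1) are hypothetical values chosen for this example; the patent's own layer parameters are in Tables 1 and 2, which are not reproduced in this text.

```python
def strided_conv_length(length, kernel, stride, padding=0):
    """Output length of a 1D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def transposed_conv_length(length, kernel, stride, padding=0):
    """Output length of a 1D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel


# Hypothetical layer settings, not the values from Tables 1 and 2.
down = strided_conv_length(900, kernel=4, stride=2, padding=1)
up = transposed_conv_length(down, kernel=4, stride=2, padding=1)
print(down, up)
```

With these settings a stride of 2 halves the length in the discriminator direction and the transposed convolution doubles it back in the generator direction, so resolution changes are learned by the kernels rather than fixed by a pooling window.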
The convolutional layers of a conventional DCGAN are designed mainly for images. The default input to a network layer is usually a two-dimensional image, so the convolution kernels and pooling windows of the layer are two-dimensional n × n matrices. Such a structure is not suited to spectral data, so the convolutional layers of the conventional DCGAN must be modified: the convolution kernels of the convolutional layers are changed to one-dimensional vector kernels so that Raman spectral data can be processed.
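A one-dimensional kernel slides along the wavenumber axis of a spectrum instead of across the rows and columns of an image. The following minimal sketch shows that operation in plain Python, under the assumption of an unpadded ("valid") cross-correlation as used in CNN layers. The intensity values and the kernel weights are made up for illustration; in a trained DCGAN the kernel weights would be learned.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (unpadded) 1D cross-correlation of a signal with a vector kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


# Made-up intensities standing in for a short stretch of a Raman spectrum.
spectrum = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
kernel = [0.25, 0.5, 0.25]  # hypothetical fixed weights

smooth = conv1d(spectrum, kernel)            # stride 1 keeps every position
halved = conv1d(spectrum, kernel, stride=2)  # stride 2 down-samples
print(smooth)
print(halved)
```

The stride-2 call shows how the same vector kernel also performs the down-sampling that a pooling layer would otherwise provide.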
The structures of the generator network and the discriminator network designed for Raman spectral data are shown in Tables 1 and 2.
Table 1 Generator network
Figure BDA0002410190060000041
Table 2 Discriminator network
Figure BDA0002410190060000042
The training is divided into two parts:
(1) Training the generator network: with the parameters of the discriminator network fixed, the generator network is optimized so that the discriminator network cannot identify the "fake" samples and outputs a large probability value for every sample, as if all were real. Expressed as a function, this means maximizing D(G(z)), i.e. minimizing 1 - D(G(z)).
(2) Training the discriminator network: with the parameters of the generator network fixed, the discriminator network is optimized so that its accuracy improves as far as possible. For a real image x, the output probability should be large, i.e. D(x) is maximized; for a generated sample G(z), D(G(z)) is minimized. This gives the optimization objective for network training: ln D(x) + ln(1 - D(G(z))).
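Finally, the objective function is obtained:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\ln D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\ln\left(1 - D(G(z))\right)\right] \tag{1}$$
The training criterion for the discriminator network is to maximize V(D, G) for a given generator network:
$$\max_D V(D,G) = \int_x P_{data}(x)\ln(D(x))\,dx + \int_z P_z(z)\ln(1 - D(G(z)))\,dz = \int_x \left[P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x))\right]dx \tag{2}$$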
To maximize equation (2), the integrand
$$P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x)) \tag{3}$$
must take its maximum value for every x. Here x, Pdata(x), and Pg(x) are fixed values. For any nonzero Pdata(x) and Pg(x) and any real value D(x) ∈ [0, 1], function (3) takes its maximum at D(x) = Pdata(x)/(Pdata(x) + Pg(x)), which gives the optimal discriminator network D for a given generator network:
$$D^*_G(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} \tag{4}$$
When the generator network is optimized, it reaches its optimal solution when Pdata = Pg, so that the generator network best reproduces the distribution of the real samples.
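The objective V(D, G) can be estimated from batches of discriminator outputs by replacing the expectations with sample means. The sketch below is a plain-Python illustration under that assumption; the probability values are invented, and `value_estimate` is a hypothetical helper name, not something defined in the patent.

```python
import math


def value_estimate(d_real, d_fake):
    """Sample-mean estimate of V(D, G): mean ln D(x) + mean ln(1 - D(G(z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Invented discriminator outputs on real and generated spectra.
confident = value_estimate([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])

# At the optimum Pdata = Pg the discriminator outputs 0.5 everywhere.
equilibrium = value_estimate([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(confident, equilibrium, -math.log(4))
```

A discriminator that separates the two batches well yields a value close to 0, while the equilibrium case evaluates to -ln 4, the value of the objective once the generated distribution matches the real one.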
Example:
The Raman spectral data used in the invention come from a published Raman spectral data set. The study samples were 16 pork samples taken from a slaughterhouse's daily production stock. Cylinders of fat were drilled out with a 12 mm rotary biopsy drill and then cut into discs (height 1.8 mm) to create a depth profile. In total, 105 disc measurements were obtained from the loin of the 16 pork samples.
LeakyReLU is chosen as the activation function of the convolutional network in the DCGAN, with the leak slope set to 0.2, and the value for the whole network is set to 2. The learning rate of the network must not be too large, otherwise training takes too long; it is set to 0.0002. The network also needs an optimizer with a momentum parameter: Adam is used, with the parameter set to 0.5. The number of spectra for each drug is expanded to 100 by means of the DCGAN; 70% of the spectra are selected to train the CNN, and the remaining 30% are used for testing. To avoid unsatisfactory results caused by inconsistent sample bands, the 100-1000 band of the spectrum of each drug was selected in the following experiments.
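The settings in this paragraph can be collected in a small sketch. It records the stated learning rate and momentum value, implements LeakyReLU with slope 0.2, and splits 100 spectrum indices 70/30. Several points are assumptions made for this example: the dictionary keys are hypothetical names, the momentum value 0.5 is interpreted as Adam's first-moment coefficient, and the split is a seeded random shuffle, since the text does not say how the 70% were chosen.

```python
import random

# Values stated in the text; the key names are hypothetical.
HYPERPARAMS = {
    "leaky_relu_slope": 0.2,
    "learning_rate": 0.0002,
    "adam_beta1": 0.5,  # assumption: the "momentum parameter" is Adam's beta1
}


def leaky_relu(x, negative_slope=HYPERPARAMS["leaky_relu_slope"]):
    """LeakyReLU: identity for positive inputs, scaled by the slope otherwise."""
    return x if x > 0 else negative_slope * x


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle n indices reproducibly and split them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))
print(leaky_relu(2.0), leaky_relu(-1.0))
```

Fixing the seed keeps the partition reproducible, which matters when classification accuracy on the 30% test portion is compared across runs.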
The number of spectra per drug was likewise expanded to 100 using the DCGAN, and the training and test sets were partitioned as in Table 3. FIG. 2 shows 10 sample spectra randomly selected from the original data (FIG. 2(a)) and 10 new spectra generated by the DCGAN (FIG. 2(b)). Visually, the generated spectra are smoother and clearer than the original spectra. The partitioned training and test sets were input into the CNN for training and classification, giving the discrimination results shown in Table 4. The experimental results show that the generated spectra achieve high classification accuracy. Because a spectrum generated by the DCGAN is a regenerated version of the original spectrum, the degree of distortion of the generated spectrum relative to the original spectrum must be evaluated. The Local Variance Estimation (LVE) method is a good way to estimate the degree of image distortion. Its principle is to first calculate the local variance at each pixel of the picture, take the largest local variance as the signal variance and the smallest local variance as the noise variance, and then express the ratio of the signal variance to the noise variance in dB. Table 5 shows that the distortion of the generated spectra differs only slightly from that of the original spectra, so the generated spectra retain the information of the original spectra well.
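The LVE principle described above can be sketched for a one-dimensional spectrum as follows. This is an illustration under stated assumptions, not the patent's implementation: local variance is computed over a sliding window whose width (`window`) is a hypothetical choice, the population variance is used, and the ratio is converted with 10·log10, the usual convention for a ratio of variances. The two test signals are made up.

```python
import math
from statistics import pvariance


def lve_snr_db(signal, window=5):
    """Signal-to-noise ratio in dB from the largest and smallest local variance."""
    variances = [
        pvariance(signal[i:i + window])
        for i in range(len(signal) - window + 1)
    ]
    noise_var = min(variances)   # smallest local variance taken as noise
    signal_var = max(variances)  # largest local variance taken as signal
    if noise_var == 0:
        raise ValueError("a flat window gives zero noise variance")
    return 10.0 * math.log10(signal_var / noise_var)


# Made-up spectra: one peak on a baseline with large or small ripple.
noisy = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 5.0, 0.1, 0.0, 0.1]
clean = [0.0, 0.01, 0.0, 0.01, 0.0, 0.01, 5.0, 0.01, 0.0, 0.01]
print(lve_snr_db(noisy, window=3), lve_snr_db(clean, window=3))
```

The spectrum with the smaller baseline ripple gets the higher value, which is the sense in which a smoother generated spectrum would score well under this measure.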
Table 3 Training and test set partitioning of the pork samples
Figure BDA0002410190060000061
Table 4 Detailed results, DCGAN + CNN (%)
Figure BDA0002410190060000062
Table 5 Signal-to-noise ratio by the LVE method for FIGS. 2(a) and 2(b)
Figure BDA0002410190060000063
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (1)

1. A DCGAN spectral data expansion method, characterized in that convolution is introduced into the original GAN, and the feature-extraction capability of convolutional layers is used to extract deep features of Raman spectra and generate highly similar spectra, the method comprising the following steps:
(1) using a deep convolutional generative adversarial network (DCGAN) to generate new spectra, and inputting them into a CNN for classification;
(2) using the generative adversarial network to generate a picture from input random noise and to judge the authenticity of the picture;
(3) training the generator network: with the parameters of the discriminator network fixed, the generator network is optimized so that the discriminator network cannot identify the "fake" samples and outputs a large probability value for every sample, as if all were real; expressed as a function, this means maximizing D(G(z)), i.e. minimizing 1 - D(G(z));
(4) training the discriminator network: with the parameters of the generator network fixed, the discriminator network is optimized so that its accuracy improves as far as possible; for a real image x, the output probability should be large, i.e. D(x) is maximized; for a generated sample G(z), D(G(z)) is minimized; this gives the optimization objective for network training: ln D(x) + ln(1 - D(G(z)));
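finally, the objective function is obtained:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\ln D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\ln\left(1 - D(G(z))\right)\right] \tag{1}$$
the training criterion for the discriminator network is to maximize V(D, G) for a given generator network:
$$\max_D V(D,G) = \int_x P_{data}(x)\ln(D(x))\,dx + \int_z P_z(z)\ln(1 - D(G(z)))\,dz = \int_x \left[P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x))\right]dx \tag{2}$$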
to maximize equation (2), the integrand
$$P_{data}(x)\ln(D(x)) + P_g(x)\ln(1 - D(x)) \tag{3}$$
must take its maximum value for every x, where x, Pdata(x), and Pg(x) are fixed values; for any nonzero Pdata(x) and Pg(x) and any real value D(x) ∈ [0, 1], function (3) takes its maximum at D(x) = Pdata(x)/(Pdata(x) + Pg(x)), which gives the optimal discriminator network D for a given generator network:
$$D^*_G(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} \tag{4}$$
when the generator network is optimized, it reaches its optimal solution when Pdata = Pg, so that the generator network best reproduces the distribution of the real samples;
(5) modifying the convolution kernels of the convolutional layers in the DCGAN into one-dimensional vector convolution kernels so that Raman spectral data can be processed.
CN202010179231.9A 2020-03-13 2020-03-13 DCGAN spectral data expansion method Pending CN113390848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010179231.9A CN113390848A (en) 2020-03-13 2020-03-13 DCGAN spectral data expansion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010179231.9A CN113390848A (en) 2020-03-13 2020-03-13 DCGAN spectral data expansion method

Publications (1)

Publication Number Publication Date
CN113390848A true CN113390848A (en) 2021-09-14

Family

ID=77616310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010179231.9A Pending CN113390848A (en) 2020-03-13 2020-03-13 DCGAN spectral data expansion method

Country Status (1)

Country Link
CN (1) CN113390848A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115326783A (en) * 2022-10-13 2022-11-11 南方科技大学 Raman spectrum preprocessing model generation method, system, terminal and storage medium

Similar Documents

Publication Publication Date Title
Bhargava Towards a practical Fourier transform infrared chemical imaging protocol for cancer histopathology
US8280140B2 (en) Classifying image features
CN110879980B (en) Nuclear magnetic resonance spectrum denoising method based on neural network algorithm
Alam et al. Local masking in natural images: A database and analysis
CN108088834B (en) Echinococcosis serum Raman spectrum diagnostic apparatus based on optimized back propagation neural network
US20060013464A1 (en) System and method for identifying objects of interest in image data
CN110674835B (en) Terahertz imaging method and system and nondestructive testing method and system
CN106709967B (en) Endoscopic imaging algorithm and control system
CN111325748A (en) Infrared thermal image nondestructive testing method based on convolutional neural network
CN106653032A (en) Animal sound detecting method based on multiband energy distribution in low signal-to-noise-ratio environment
CN106157232B (en) A kind of general steganalysis method of digital picture characteristic perception
CN107346541B (en) Tissue characterization method based on ultrasonic radio frequency time series wavelet analysis
CN109993155A (en) For the characteristic peak extracting method of low signal-to-noise ratio uv raman spectroscopy
CN110428364B (en) Method and device for expanding Parkinson voiceprint spectrogram sample and computer storage medium
CN115326783B (en) Raman spectrum preprocessing model generation method, system, terminal and storage medium
CN107169497A (en) A kind of tumor imaging label extracting method based on gene iconography
CN107679569A (en) Raman spectrum substance automatic identifying method based on adaptive hypergraph algorithm
Chen et al. Feasibility study on automated recognition of allergenic pollen: grass, birch and mugwort
JP2015135318A (en) Data processing apparatus, data display system, sample data acquisition system, and data processing method
CN115499092B (en) Astronomical radio transient signal searching method, system, device and readable storage medium
CN102122356A (en) Computer-aided method for distinguishing ultrasound endoscope image of pancreatic cancer
CN113390848A (en) DCGAN spectral data expansion method
Aarthy et al. An approach for detecting breast cancer using wavelet transforms
CN108535214A (en) A method of Trichoderma is identified based on hyperspectral technique
CN114781484A (en) Cancer serum SERS spectrum classification method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210914