CN112712857A - Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network - Google Patents

Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network

Info

Publication number
CN112712857A
Authority
CN
China
Prior art keywords
network
data
raman spectrum
generating
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011442769.0A
Other languages
Chinese (zh)
Inventor
祝连庆
丁静雅
于明鑫
孙广开
何彦霖
董明利
庄炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN202011442769.0A
Publication of CN112712857A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Biotechnology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention provides a method for generating biological Raman spectrum data based on a WGAN generative adversarial network, which comprises the following steps: step a, extracting part of the Raman spectrum data from a Raman spectrum database to serve as real samples; step b, creating a normal distribution function to generate random data Z; step c, creating a generation network G, and inputting the random data Z into the generation network G; step d, creating a discrimination network D, and inputting the Raman spectrum data and the generated samples into the discrimination network D; step e, calculating the objective function of the generation network G and the objective function of the discrimination network D; and step f, optimizing the objective functions, and iteratively training the generation network G and the discrimination network D. The beneficial effect of the invention is that, compared with existing deep learning techniques, the loss function uses the Wasserstein distance instead of the KL divergence, so the data distribution of the generated samples can be moved continuously toward the data distribution of the real samples.

Description

Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method for generating biological Raman spectrum data based on a Wasserstein generative adversarial network (WGAN).
Background
In the biomedical field, Raman spectroscopy plays an important role in understanding the behavior of cellular macromolecules. Over the past 20 years, it has shown broad application prospects in this field: it can be used to evaluate the morphological structure of a sample, identify tissue components, and determine pathological changes within cells, tissues, and organs.
The successful application of Raman spectroscopy in the medical field rests on an understanding of the vibrational properties of the underlying biomolecules and on an appropriate assessment of the sensitivity, reproducibility, and efficiency of this non-destructive spectroscopic method.
In the biomedical field, Raman spectroscopy has significant advantages over other diagnostic techniques in many respects. Biomedical samples are typically mixtures of body fluids, soft tissues, and minerals, and Raman spectroscopy can be applied to a wide variety of sample forms, which is a very attractive advantage in biomedical assays. Raman spectroscopy can also provide information that other biomedical detection methods cannot; for example, it enables identification of chemical components and analysis of molecular structures. Raman spectroscopy is a non-destructive method in biomedical applications: proper selection of the laser wavelength and power avoids damage to the sample.
A plot of Raman scattering intensity as a function of wavelength is commonly referred to as a Raman spectrum. The customary unit of the x-axis of a Raman spectrum is the wavenumber shift relative to the excitation light, abbreviated as the Raman shift. The relationship between wavenumber and energy E is shown below:
E=hν=hc/λ=hcω
where h is the Planck constant; ν is the frequency of the light; c is the speed of light; λ is the wavelength of the light; and ω is the wavenumber of the light. Thus, the x-axis of the Raman spectrum is exactly the difference between the wavenumber of the laser light and the wavenumber of the Raman-scattered light.
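The relationship above can be illustrated with a minimal sketch. It assumes wavelengths are given in nanometres and wavenumbers are expressed in cm^-1, as is customary for Raman shifts; the function names and the example wavelengths are hypothetical and do not come from the patent.

```python
# Physical constants in SI units.
PLANCK_H = 6.62607015e-34   # Planck constant, J*s
LIGHT_C = 2.99792458e8      # speed of light, m/s


def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def raman_shift_cm1(laser_nm: float, scattered_nm: float) -> float:
    """Raman shift: wavenumber of the laser light minus that of the scattered light."""
    return wavenumber_cm1(laser_nm) - wavenumber_cm1(scattered_nm)


def photon_energy_joule(wavelength_nm: float) -> float:
    """E = h*c/lambda = h*c*omega, with omega converted from cm^-1 to m^-1."""
    return PLANCK_H * LIGHT_C * wavenumber_cm1(wavelength_nm) * 100.0


if __name__ == "__main__":
    # Hypothetical example: 785 nm excitation, scattered light observed at 850 nm.
    shift = raman_shift_cm1(laser_nm=785.0, scattered_nm=850.0)
    print(f"Raman shift: {shift:.1f} cm^-1")
    print(f"Photon energy at 785 nm: {photon_energy_joule(785.0):.3e} J")
```

Scattered light with a longer wavelength than the laser (Stokes scattering) gives a positive shift under this sign convention.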
Applying deep learning to Raman spectra greatly simplifies the development of Raman spectrum classification systems, because the object under test in biomedical detection can be identified directly from the raw Raman wavenumber information.
Raman spectrum data are one-dimensional, but Raman spectrum data for biomedical detection are difficult to obtain, so far less of it is available than in fields such as computer vision and natural language processing. The Raman spectrum data currently available for biomedical detection are too scarce to improve the accuracy of Raman-based medical detection, so a technical means is needed to expand the Raman spectrum database and thereby improve the accuracy of biomedical detection.
The present method generates Raman spectrum data with a Wasserstein GAN neural network so as to expand the Raman spectrum database.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for generating biological Raman spectrum data based on a Wasserstein generative adversarial network (WGAN), in which the generation network G and the discrimination network D are built from a deconvolution network and a convolution network. Compared with a fully connected network, this greatly reduces the number of parameters, saves considerable running time, and increases the applicability of the method.
In order to solve the technical problems, the invention adopts the following technical scheme: a method of generating biological Raman spectrum data based on a WGAN generative adversarial network, the method comprising the steps of: step a, extracting part of the Raman spectrum data from a Raman spectrum database to serve as real samples, and preprocessing the Raman spectrum data; step b, creating a normal distribution function to generate random data Z; step c, creating a generation network G, inputting the random data Z into the generation network G, and generating data whose distribution is similar to that of the Raman spectrum data, namely the generated samples; step d, creating a discrimination network D, and inputting the Raman spectrum data and the generated samples into the discrimination network D; step e, calculating the objective functions of the generation network G and the discrimination network D; and step f, optimizing the objective functions, and iteratively training the generation network G and the discrimination network D to obtain generated Raman spectrum data realistic enough to pass for real data.
Preferably, the Raman spectral data is one-dimensional data.
Preferably, the generation network G is created using a deconvolution operation; the discrimination network D is created using a convolutional neural network.
Preferably, the Raman spectral data is preprocessed, including denoising, smoothing, and normalization.
Preferably, the objective function is optimized by combining two optimization algorithms, namely AdaGrad and RMSProp.
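As an illustration of the preprocessing named above, the following minimal sketch assumes a centred moving average as a stand-in for both denoising and smoothing, and min-max scaling for normalization. The patent does not specify which algorithms are used, so the function names, the window size, and the choice of techniques are all hypothetical.

```python
import numpy as np


def moving_average(spectrum: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth a 1-D spectrum with a centred moving average (window must be odd)."""
    kernel = np.ones(window) / window
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(spectrum, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def min_max_normalize(spectrum: np.ndarray) -> np.ndarray:
    """Rescale intensities to [0, 1]; a flat spectrum maps to all zeros."""
    lo, hi = spectrum.min(), spectrum.max()
    if hi == lo:
        return np.zeros_like(spectrum, dtype=float)
    return (spectrum - lo) / (hi - lo)


def preprocess(spectrum, window: int = 5) -> np.ndarray:
    """Smooth, then normalize, a single one-dimensional spectrum."""
    smoothed = moving_average(np.asarray(spectrum, dtype=float), window)
    return min_max_normalize(smoothed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholder spectrum: one Gaussian peak plus noise.
    axis = np.linspace(0.0, 1.0, 500)
    raw = np.exp(-((axis - 0.4) ** 2) / 0.002) + 0.05 * rng.normal(size=axis.size)
    clean = preprocess(raw, window=7)
    print(clean.shape, clean.min(), clean.max())
```

Normalizing after smoothing keeps the final intensities in a fixed range, which is convenient if the generator ends in a bounded activation.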
Compared with the prior art, the invention has the beneficial effects that:
1. Compared with existing deep learning techniques, the loss function uses the Wasserstein distance instead of the KL divergence, so the data distribution of the generated samples can be moved continuously toward the data distribution of the real samples.
2. The generation network G and the discrimination network D adopt a deconvolution network and a convolution network; compared with a fully connected network, the number of parameters is greatly reduced and considerable running time is saved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
Further objects, features and advantages of the present invention will become apparent from the following description of embodiments of the invention, with reference to the accompanying drawings, in which:
FIG. 1 schematically shows the overall flow diagram of the process of the invention.
Detailed Description
The objects and functions of the present invention and methods for accomplishing the same will be apparent by reference to the exemplary embodiments. However, the present invention is not limited to the exemplary embodiments disclosed below; it can be implemented in different forms. The nature of the description is merely to assist those skilled in the relevant art in a comprehensive understanding of the specific details of the invention.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar parts, or the same or similar steps.
The invention aims to solve the problem of insufficient Raman spectrum data samples in a database. A method for generating oral cancer Raman spectrum data based on Wasserstein GAN is provided, which is used to expand an oral cancer Raman spectrum database.
The first step: read the oral cancer Raman spectrum data from the database as the real sample X.
The second step: create a normal distribution function xavier_init(size) that outputs random values used as the initial values of the parameters.
The third step: initialize the input and parameters of the discriminant model, where the input uses the tf.placeholder(dtype) function and the parameters are initialized with the normal distribution function xavier_init(size).
The fourth step: initialize the input and parameters of the generative model in the same way.
The fifth step: create a random noise sampling function sample_Z(m, n) to generate the random noise input Z of the generative model.
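The two helper functions named in the steps above are not listed in the patent. The following is a minimal NumPy sketch of what they might look like, assuming the scaling commonly used in tutorial GAN code (standard deviation sqrt(2 / fan_in)) and a standard normal noise prior; both choices are assumptions, and only the names xavier_init and sample_Z come from the text.

```python
import numpy as np

# Fixed seed so the sketch is reproducible; an assumption for illustration only.
rng = np.random.default_rng(0)


def xavier_init(size):
    """Draw a weight array from N(0, sigma^2) with sigma = sqrt(2 / fan_in).

    `size` is a (fan_in, fan_out) tuple; the scaling rule is an assumption.
    """
    fan_in = size[0]
    stddev = 1.0 / np.sqrt(fan_in / 2.0)
    return rng.normal(loc=0.0, scale=stddev, size=size)


def sample_Z(m, n):
    """Return m random noise vectors of length n from a standard normal prior."""
    return rng.normal(size=(m, n))


if __name__ == "__main__":
    W = xavier_init((100, 128))   # hypothetical layer: 100 inputs, 128 outputs
    Z = sample_Z(16, 100)         # hypothetical batch of 16 noise vectors
    print(W.shape, Z.shape)
```

Scaling the initial weights by the fan-in keeps the variance of the layer outputs roughly independent of the layer width, which helps the first training iterations behave sensibly.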
The sixth step: create the generative model using a fully connected neural network.
The seventh step: create the discriminant model using a fully connected neural network, where the final activation function is tf.nn.sigmoid(D(y)) so that the discriminant value D(y) is kept between 0 and 1.
The eighth step: feed the random noise input Z into the generation network G to obtain a generated sample G(Z) of the oral cancer Raman spectrum data. Feed a real sample of the oral cancer Raman spectrum data to the discrimination network D to obtain the discrimination value D_real of the discrimination network D for the real sample; feed a generated sample of the oral cancer Raman spectrum data to the discrimination network D to obtain the discrimination value D_fake of the discrimination network D for the generated sample.
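The forward pass described in the sixth to eighth steps can be sketched as follows. This is a NumPy illustration rather than the TensorFlow graph the patent refers to; the layer sizes, the single hidden layer, the ReLU activation, and the random stand-in for real spectra are all assumptions. Only the sigmoid output of the discriminant model and the names D_real and D_fake come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, HIDDEN, SPECTRUM_LEN = 100, 128, 1024   # hypothetical sizes


def init(size):
    """Normal initialization scaled by fan-in (an assumed scheme)."""
    return rng.normal(scale=np.sqrt(2.0 / size[0]), size=size)


# Parameters of the fully connected generator and discriminator.
G_W1, G_b1 = init((Z_DIM, HIDDEN)), np.zeros(HIDDEN)
G_W2, G_b2 = init((HIDDEN, SPECTRUM_LEN)), np.zeros(SPECTRUM_LEN)
D_W1, D_b1 = init((SPECTRUM_LEN, HIDDEN)), np.zeros(HIDDEN)
D_W2, D_b2 = init((HIDDEN, 1)), np.zeros(1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def generator(z):
    """Map noise vectors to spectra with intensities in (0, 1)."""
    h = np.maximum(0.0, z @ G_W1 + G_b1)      # ReLU hidden layer
    return sigmoid(h @ G_W2 + G_b2)


def discriminator(x):
    """Map spectra to a discriminant value between 0 and 1."""
    h = np.maximum(0.0, x @ D_W1 + D_b1)      # ReLU hidden layer
    return sigmoid(h @ D_W2 + D_b2)


# Random values stand in for a batch of real, normalized spectra.
real_batch = rng.random((8, SPECTRUM_LEN))
Z = rng.normal(size=(8, Z_DIM))

D_real = discriminator(real_batch)     # discriminant values for real samples
D_fake = discriminator(generator(Z))   # discriminant values for generated samples
print(D_real.shape, D_fake.shape)
```

Note that the original WGAN formulation usually leaves the critic output unbounded; the sigmoid here follows the seventh step as written.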
The ninth step: calculate the objective functions, i.e., the loss functions, of the generation network G and the discrimination network D. The loss function of the generation network G is based on the Wasserstein distance formula. In contrast to the KL divergence of the original GAN, the objective function (loss function) of the enhanced version, Wasserstein GAN, uses the Wasserstein distance formula.
The tenth step: optimize the loss function G_wassertein of the generation network G and the loss function D_wassertein of the discrimination network D using the Adam optimizer.
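For reference, a single Adam update can be sketched in NumPy as below. Adam is described by its authors as combining the advantages of AdaGrad and RMSProp, which matches the optimizer combination named elsewhere in this document. The function name adam_step and the toy quadratic objective are hypothetical; the default hyperparameters are the commonly published ones, not values stated in the patent.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates.

    m and v are running estimates of the first and second moments of the
    gradient; t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)          # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy usage: minimise f(x) = (x - 2)^2, whose gradient is 2 * (x - 2).
x = np.array([0.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 5001):
    grad = 2.0 * (x - 2.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)
print(x)
```

In a GAN setting, the generation network and the discrimination network would each keep their own m, v, and step counter.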
The eleventh step: start training. First fix the generation network G and optimize the discrimination network D; then fix the discrimination network D and optimize the generation network G; iterate in this loop until the optimal solution of the generation network G is obtained. The generated samples finally produced by the generation network G are realistic enough to pass for real data and are used to expand the Raman spectrum database.
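The alternating scheme in the eleventh step can be illustrated with a deliberately tiny toy problem. The sketch below assumes scalar data drawn from N(3, 1), a linear generator G(z) = a*z + b, a linear critic D(x) = w*x, plain gradient steps instead of Adam, and weight clipping as in the original WGAN paper. None of these settings come from the patent; with a linear critic only the mean of the generated data is matched.

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN, CLIP, BATCH = 3.0, 0.1, 64   # hypothetical toy settings
LR_D, LR_G, N_CRITIC = 1.0, 0.05, 5     # hypothetical step sizes and critic steps

a, b = 1.0, 0.0   # generator parameters: G(z) = a * z + b
w = 0.0           # critic parameter:    D(x) = w * x

for step in range(2000):
    # Fix G and optimise D: gradient ascent on mean(D(x)) - mean(D(G(z))).
    for _ in range(N_CRITIC):
        x = rng.normal(REAL_MEAN, 1.0, BATCH)      # real samples
        g = a * rng.normal(size=BATCH) + b         # generated samples
        w += LR_D * (x.mean() - g.mean())          # d/dw of the critic objective
        w = float(np.clip(w, -CLIP, CLIP))         # weight clipping

    # Fix D and optimise G: gradient descent on -mean(D(G(z))).
    z = rng.normal(size=BATCH)
    a -= LR_G * (-w * z.mean())                    # d/da of the generator loss
    b -= LR_G * (-w)                               # d/db of the generator loss

print(f"generator offset b = {b:.2f} (real mean = {REAL_MEAN})")
```

The critic is updated several times per generator update so that its estimate of the distance between the two distributions stays reasonably current before the generator moves.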
In each training iteration, learning the discrimination network D includes the following steps: extract real samples {x_1, x_2, …, x_m} from the oral cancer Raman spectrum database, which follow the data distribution P_data(x); obtain random noise inputs {z_1, z_2, …, z_m} from the prior noise distribution P_prior(z); feed Z into the generation network G to obtain the generated samples of the oral cancer Raman spectrum data
Figure BDA0002823042030000051
Update the parameter θ_D of the discrimination network D to maximize the objective function of D:
Figure BDA0002823042030000052
Figure BDA0002823042030000053
Learning the generation network G includes the following steps: obtain additional random noise samples {z_1, z_2, …, z_m} from the prior noise distribution P_prior(z); update the parameter θ_G of the generation network G to minimize the objective function of G:
Figure BDA0002823042030000054
Figure BDA0002823042030000055
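The equation images referenced above are not reproduced in this text. As a hedged reconstruction, and assuming they follow the standard WGAN formulation of Arjovsky et al. (listed under the non-patent citations), the generated samples, objectives, and parameter updates would read as follows. The learning rate symbol η and the names of the objectives are assumptions, not taken from the patent.

```latex
\begin{align}
  \tilde{x}_i &= G(z_i), \qquad i = 1, \dots, m \\
  \tilde{V}_D &= \frac{1}{m}\sum_{i=1}^{m} D(x_i) - \frac{1}{m}\sum_{i=1}^{m} D(\tilde{x}_i),
  &\theta_D &\leftarrow \theta_D + \eta \,\nabla_{\theta_D} \tilde{V}_D \\
  \tilde{V}_G &= -\frac{1}{m}\sum_{i=1}^{m} D\bigl(G(z_i)\bigr),
  &\theta_G &\leftarrow \theta_G - \eta \,\nabla_{\theta_G} \tilde{V}_G
\end{align}
```

The plus sign in the update of θ_D reflects maximization of the objective of D, and the minus sign in the update of θ_G reflects minimization of the objective of G, consistent with the surrounding text.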
The beneficial effects of the invention are as follows: compared with existing deep learning techniques, the loss function uses the Wasserstein distance instead of the KL divergence, so the data distribution of the generated samples can be moved continuously toward the data distribution of the real samples; and the generation network G and the discrimination network D adopt a deconvolution network and a convolution network, which, compared with a fully connected network, greatly reduces the number of parameters and saves considerable running time.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (5)

1. A method of generating biological Raman spectrum data based on a WGAN generative adversarial network, the method comprising the steps of:
step a, extracting part of the Raman spectrum data from a Raman spectrum database to serve as real samples, and preprocessing the Raman spectrum data;
step b, creating a normal distribution function to generate random data Z;
step c, creating a generation network G, inputting the random data Z into the generation network G, and generating data whose distribution is similar to that of the Raman spectrum data, namely the generated samples;
step d, creating a discrimination network D, and inputting the Raman spectrum data and the generated samples into the discrimination network D;
step e, calculating the objective functions of the generation network G and the discrimination network D;
and step f, optimizing the objective functions, and iteratively training the generation network G and the discrimination network D to obtain generated Raman spectrum data realistic enough to pass for real data.
2. The method of claim 1, wherein the Raman spectral data is one-dimensional data.
3. The method of claim 1, wherein the generation network G is created using a deconvolution operation; the discrimination network D is created using a convolutional neural network.
4. The method of claim 1, wherein the Raman spectral data is preprocessed, including denoising, smoothing, and normalizing.
5. The method of claim 1, wherein the objective function is optimized by combining two optimization algorithms, AdaGrad and RMSProp.
CN202011442769.0A 2020-12-08 2020-12-08 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network Pending CN112712857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011442769.0A CN112712857A (en) 2020-12-08 2020-12-08 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011442769.0A CN112712857A (en) 2020-12-08 2020-12-08 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network

Publications (1)

Publication Number Publication Date
CN112712857A true CN112712857A (en) 2021-04-27

Family

ID=75542993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011442769.0A Pending CN112712857A (en) 2020-12-08 2020-12-08 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network

Country Status (1)

Country Link
CN (1) CN112712857A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508647A (en) * 2018-10-22 2019-03-22 北京理工大学 A kind of spectra database extended method based on generation confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARJOVSKY M等: "Wasserstein Gan", 《ARXIV PREPRINT ARXIV: 1701.07875》 *
刘田丰等: "一种基于GAN的手势图像生成方法", 《计算机与数字工程》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378680A (en) * 2021-06-01 2021-09-10 厦门大学 Intelligent database building method for Raman spectrum data
CN113378680B (en) * 2021-06-01 2022-06-28 厦门大学 Intelligent database building method for Raman spectrum data
CN114858782A (en) * 2022-07-05 2022-08-05 中国民航大学 Milk powder doping non-directional detection method based on Raman hyperspectral countermeasure discrimination model

Similar Documents

Publication Publication Date Title
US11493447B2 (en) Method for removing background from spectrogram, method of identifying substances through Raman spectrogram, and electronic apparatus
Archer et al. A geometric morphometric relationship predicts stone flake shape and size variability
CN112712857A (en) Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network
Wu et al. Deep learning data augmentation for Raman spectroscopy cancer tissue classification
CN1975856A (en) Speech emotion identifying method based on supporting vector machine
CN109858477A (en) The Raman spectrum analysis method of object is identified in complex environment with depth forest
Li et al. Probabilistic partial least square regression: A robust model for quantitative analysis of raman spectroscopy data
CN112200770A (en) Tumor detection method based on Raman spectrum and convolutional neural network
CN113095188A (en) Deep learning-based Raman spectrum data analysis method and device
CN109508647A (en) A kind of spectra database extended method based on generation confrontation network
Magnussen et al. Deep convolutional neural network recovers pure absorbance spectra from highly scatter‐distorted spectra of cells
CN113516097B (en) Plant leaf disease identification method based on improved EfficentNet-V2
Sugiharti et al. Integration of convolutional neural network and extreme gradient boosting for breast cancer detection
Hu et al. Recognition method of coal and gangue based on multispectral spectral characteristics combined with one-dimensional convolutional neural network
Sun et al. Feature optimization method for the localization technology on loose particles inside sealed electronic equipment
Hu et al. PCANet: A common solution for laser-induced fluorescence spectral classification
Suganya et al. Ultrasound ovary cyst image classification with deep learning neural network with Support vector machine
CN116519661A (en) Rice identification detection method based on convolutional neural network
CN112716447A (en) Oral cancer classification system based on deep learning of Raman detection spectral data
Wang et al. Raman spectrum model transfer method based on Cycle-GAN
CN115326783B (en) Raman spectrum preprocessing model generation method, system, terminal and storage medium
Babayomi et al. Convolutional xgboost (c-xgboost) model for brain tumor detection
CN113138181B (en) Method for grading quality of fen-flavor wine base
Wan et al. BACNN: Multi-scale feature fusion-based bilinear attention convolutional neural network for wood NIR classification
CN115565004A (en) Raman spectrum analysis method based on two-dimensional Raman map combined with deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination