CN115326783A - Raman spectrum preprocessing model generation method, system, terminal and storage medium - Google Patents

Publication number
CN115326783A
Authority
CN
China
Prior art keywords
spectrum
raman spectrum
raman
library
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211256339.9A
Other languages
Chinese (zh)
Other versions
CN115326783B (en)
Inventor
沈平
胡嘉祺
陈金娜
薛陈龙
党竑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202211256339.9A (CN115326783B)
Publication of CN115326783A
Application granted
Publication of CN115326783B
Priority to PCT/CN2023/121358 (WO2024078321A1)
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering
    • G01N21/658 Raman scattering enhancement Raman, e.g. surface plasmons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Theoretical Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
  • Spectrometry And Color Measurement (AREA)

Abstract

The invention relates to a method, a system, a terminal and a storage medium for generating a Raman spectrum preprocessing model. Noise, baseline background signals and Raman peaks are extracted from a real Raman spectrum library and stored in separate libraries. Raman characteristic peaks in the Raman peak library are freely combined to generate an ideal spectrum library free of noise and baseline background signals, and the extracted noise and baseline background signals are superimposed on the ideal spectrum library to generate a reference spectrum library. The ideal spectrum library and random Gaussian noise are input to a generator, which produces a simulated spectrum library; a discriminator and the generator are trained adversarially, and after training a high-simulation Raman spectrum library that conforms to the characteristics of real Raman spectra is generated. This library is used to train a spectrum preprocessing model based on a self-supervised algorithm, so that parameters are set automatically, with the ideal spectrum library serving as the label for model training. After training, the model can be applied directly to actually acquired spectra; it is simple and fast to use, removes noise and baseline background effectively, and preserves the spectrum with high fidelity.

Description

Raman spectrum preprocessing model generation method, system, terminal and storage medium
Technical Field
The invention relates to the technical field of Raman spectrum preprocessing, and in particular to a Raman spectrum preprocessing model generation method, system, terminal and storage medium.
Background
Raman spectroscopy is a spectroscopic technique based on Raman scattering, which arises from the interaction of light with matter: the frequency difference between the scattered light and the incident light corresponds to the vibrational or pure rotational energy level spacing of the molecules of the scattering medium. As a molecular vibration spectrum, the Raman spectrum is label-free and is therefore widely used for detection in chemistry, biology and other fields. However, instrument noise, sample fluorescence and environmental noise degrade quantitative precision and qualitative accuracy. The interference is particularly pronounced for biological samples, where characteristic peaks may not be identified effectively. Although the effect of noise can be reduced by upgrading the instrument, optimizing the detection environment and pretreating the sample, these measures are limited by time and economic cost, and at the current level of technology noise cannot be eliminated completely in hardware.
Raman spectrum preprocessing removes the baseline background signal and the noise signal from an acquired spectrum. In the prior art, preprocessing methods fall into mathematical analysis methods and machine learning methods. Mathematical analysis methods analyze the spectrum with physical and mathematical equations. Because actually sampled Raman spectra do not fully satisfy these equations, and because background and noise signals are diverse, such methods require manual parameter adjustment for different spectra, cannot be fully automated, and have difficulty with spectra that have a low signal-to-noise ratio or a complex background signal. Preprocessing models based on machine learning mostly use fully supervised learning at the present stage. However, a standard spectrum free of noise and baseline background signal cannot be obtained as a label through real spectrum acquisition, so a simulated spectrum built from Gaussian or Lorentzian peaks is generally used as the label, with random Gaussian noise as the noise and a random fluorescence signal as the baseline. Such a model can fit the simulated Raman spectrum data set to a certain extent, but in use it tends to overfit noise peaks, which distorts the preprocessed spectrum, so the approach has limitations.
Examples of existing methods:
1. Background signal removal based on polynomial or least-squares fitting: the parameters must be designed manually for Raman spectra with different signals and different signal-to-noise ratios, so the method is somewhat subjective and can hardly process spectra with a low signal-to-noise ratio.
2. Baseline background signal and noise removal based on the wavelet transform: the wavelet transform fits the spectrum according to its distribution characteristics in the time domain. Automation can be achieved to a certain extent by designing an automatic iteration, but different algorithms must be selected for Raman spectra with different signal-to-noise ratios, and for low signal-to-noise ratio spectra the noise suppression, baseline background removal and signal fidelity remain unsatisfactory.
3. Noise and baseline background signal removal based on fully supervised deep learning: deep learning achieves automatic spectrum preprocessing without human intervention and removes noise and baseline background well. For low signal-to-noise ratio Raman spectra, however, a fully supervised algorithm overfits the Gaussian and Lorentzian peaks, and the resulting Raman spectrum is deformed, so the processed spectrum is distorted.
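To make the manual-parameter problem of existing method 1 concrete, the following is a minimal sketch of iterative polynomial baseline fitting. It is not taken from the patent: the function name `polynomial_baseline`, the synthetic spectrum, and the default values of `degree` and `n_iter` are all assumptions chosen for illustration, and these two parameters are exactly the kind of setting that has to be tuned by hand for each type of spectrum.

```python
import numpy as np

def polynomial_baseline(x, y, degree=3, n_iter=50):
    """Estimate a baseline by repeated polynomial fitting with peak clipping.

    `degree` and `n_iter` are hand-chosen (hypothetical defaults).
    """
    xs = (x - x.mean()) / x.std()        # standardize x for a well-conditioned fit
    work = y.copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(xs, work, degree), xs)
        work = np.minimum(work, fit)     # pull points above the fit down onto it
    return fit

# Synthetic test spectrum (assumed values, not measured data).
x = np.linspace(400.0, 1800.0, 701)              # Raman shift axis in cm^-1
peak = 25.0 / ((x - 1100.0) ** 2 + 25.0)         # Lorentzian peak, height 1, FWHM 10
baseline = 0.5 + 3e-4 * (x - 400.0)              # slowly varying background
y = peak + baseline
corrected = y - polynomial_baseline(x, y)
```

On this noise-free toy spectrum the corrected signal keeps its maximum at the peak position. With real low signal-to-noise spectra a suitable `degree` differs from sample to sample, which is the subjectivity the text describes.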
A more cost-effective, more automated and higher-fidelity approach to these problems is therefore needed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the above defects in the prior art, a Raman spectrum preprocessing model generation method, a Raman spectrum preprocessing model generation system, a Raman spectrum preprocessing terminal and a computer-readable storage medium.
The technical solution adopted by the invention to solve this technical problem is as follows:
A Raman spectrum preprocessing model generation method is constructed, comprising the following steps:
the first step: a real Raman/surface-enhanced Raman spectrum library is processed by a Raman spectrum generative adversarial network to generate an ideal spectrum library and a high-simulation Raman spectrum library;
the second step: the high-simulation Raman spectrum library is input into a Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated for the preprocessed Raman spectrum, and the calculated loss value is fed back to the Raman spectrum preprocessing model to optimize its parameters automatically.
In the Raman spectrum preprocessing model generation method of the invention, processing the real Raman/surface-enhanced Raman spectrum library by the Raman spectrum generative adversarial network to generate the ideal spectrum library and the high-simulation Raman spectrum library comprises:
preparing a training data set, and generating an ideal spectrum library and a reference spectrum library;
training the Raman spectrum generative adversarial network, and finally generating the high-simulation Raman spectrum library.
In the Raman spectrum preprocessing model generation method of the invention, preparing the training data set and generating the ideal spectrum library and the reference spectrum library comprises the following steps:
collecting real Raman spectrum/surface-enhanced Raman spectrum data;
extracting the noise information, the baseline background signal and the Raman characteristic peaks from the Raman spectrum/surface-enhanced Raman spectrum data, and generating a noise information library, a baseline background signal library and a Raman peak library, respectively;
randomly combining characteristic peaks from the Raman peak library to generate ideal spectra free of noise and baseline background signal, and building a library of them;
superimposing the extracted noise and baseline background signals on the ideal spectrum data to generate reference spectra, and building a library of them.
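The library-building steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented extraction procedure: the three libraries are replaced by hypothetical synthetic stand-ins (Lorentzian peak parameters, Gaussian noise traces, broad Gaussian baselines), and the names `lorentzian`, `make_ideal_spectrum` and `make_reference_spectrum` are invented for this example.

```python
import numpy as np

def lorentzian(x, center, width, height):
    """Lorentzian line shape with full width at half maximum `width`."""
    half = width / 2.0
    return height * half**2 / ((x - center) ** 2 + half**2)

def make_ideal_spectrum(x, peak_library, n_peaks, rng):
    """Randomly combine peaks from the peak library into a clean spectrum."""
    idx = rng.choice(len(peak_library), size=n_peaks, replace=False)
    return sum(lorentzian(x, *peak_library[i]) for i in idx)

def make_reference_spectrum(ideal, noise_library, baseline_library, rng):
    """Superimpose one noise trace and one baseline on the ideal spectrum."""
    noise = noise_library[rng.integers(len(noise_library))]
    baseline = baseline_library[rng.integers(len(baseline_library))]
    return ideal + noise + baseline

rng = np.random.default_rng(0)
x = np.linspace(400.0, 1800.0, 701)  # Raman shift axis in cm^-1 (assumed range)

# Hypothetical stand-ins for the three extracted libraries.
peak_library = [(600.0, 12.0, 1.0), (1003.0, 8.0, 0.7),
                (1450.0, 20.0, 0.5), (1660.0, 25.0, 0.4)]  # (center, FWHM, height)
noise_library = [rng.normal(0.0, 0.02, x.size) for _ in range(5)]
baseline_library = [0.3 * np.exp(-((x - c) / 600.0) ** 2)
                    for c in (700.0, 1100.0, 1500.0)]

ideal = make_ideal_spectrum(x, peak_library, n_peaks=3, rng=rng)
reference = make_reference_spectrum(ideal, noise_library, baseline_library, rng)
```

Each call yields one (ideal, reference) pair; repeating the calls fills the ideal spectrum library and the reference spectrum library with matching entries.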
In the Raman spectrum preprocessing model generation method of the invention, training the Raman spectrum generative adversarial network and finally generating the high-simulation Raman spectrum library comprises:
the Raman spectrum generative adversarial network comprises a generator and a discriminator;
the generator takes the ideal spectrum library and random Gaussian noise as input and generates simulated Raman spectra;
the discriminator takes the simulated Raman spectrum library and the reference spectrum library as input, and judges the simulated Raman spectra to be false (or true) and the reference spectra to be true (or false);
when the generator is trained, the generator loss function aims to minimize the error with which the discriminator judges the simulated spectra to be true (or false);
when the discriminator is trained, the discriminator loss function aims to maximize the classification accuracy of the discriminator;
the generator and the discriminator are trained alternately, and the generator finally generates a high-simulation Raman spectrum library close to the reference spectra.
In the Raman spectrum preprocessing model generation method of the invention, the high-simulation Raman spectrum is generated according to the formula:
$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x_{\mathrm{ref}}}\big[\log D(x_{\mathrm{ref}})\big]+\mathbb{E}_{z}\big[\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)\big]$$
where $G(z,x_{\mathrm{ideal}})$ denotes the simulated Raman spectrum that the generator (G) produces from the input random Gaussian noise ($z$) and the ideal spectrum ($x_{\mathrm{ideal}}$); $D(G(z,x_{\mathrm{ideal}}))$ and $D(x_{\mathrm{ref}})$ denote, respectively, the probabilities with which the discriminator (D) judges the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ to be false or true and the reference spectrum $x_{\mathrm{ref}}$ to be true or false;
the formula comprises two processes, $\max_{D}$ and $\min_{G}$:
$\max_{D}$ denotes that, when the discriminator is trained, the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ is judged false and the reference spectrum $x_{\mathrm{ref}}$ is judged true, so that $D(G(z,x_{\mathrm{ideal}}))$ is close to 0 and thus $\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)$ is close to 0, which maximizes the accuracy of the discriminator;
$\min_{G}$ denotes that, when the generator is trained, the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ is judged true, so that $D(G(z,x_{\mathrm{ideal}}))$ is close to 1, which minimizes the term $\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)$;
through the alternation of the two processes, the generator comes to generate high-simulation spectra.
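The opposing goals of the two processes can be checked numerically. The sketch below assumes the standard logarithmic form of the adversarial objective, which is an assumption on our part rather than a confirmed transcription of the patent formula; the function names and the example discriminator outputs are hypothetical.

```python
import numpy as np

def discriminator_loss(d_ref, d_sim, eps=1e-12):
    """Negative value function: small when d_ref -> 1 and d_sim -> 0."""
    return -np.mean(np.log(d_ref + eps) + np.log(1.0 - d_sim + eps))

def generator_loss(d_sim, eps=1e-12):
    """Generator term log(1 - D(G(z, x_ideal))): decreases as d_sim -> 1."""
    return np.mean(np.log(1.0 - d_sim + eps))

# d_ref: discriminator outputs on reference spectra (probability of "true").
# d_sim: discriminator outputs on simulated spectra.
good_d = discriminator_loss(np.array([0.99, 0.98]), np.array([0.01, 0.02]))
bad_d = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))

fooled = generator_loss(np.array([0.90, 0.95]))   # discriminator accepts the fakes
caught = generator_loss(np.array([0.05, 0.10]))   # discriminator rejects the fakes
```

A discriminator that separates the two libraries well has a lower loss than one that outputs 0.5 everywhere, while the generator's loss falls when its spectra are accepted as true. Alternating updates on these two losses is what drives the simulated spectra toward the reference spectra.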
In the Raman spectrum preprocessing model generation method of the invention, inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model and optimizing the parameters automatically comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module, and preliminarily fitting the noise and baseline background signal to be removed with a Raman spectrum background signal evaluation module;
extracting features from the spectrum with a Raman spectrum encoder module, retaining the features that belong to Raman peaks;
restoring the highly compressed Raman peak features to a Raman spectrum with a Raman spectrum decoder module;
using the ideal spectrum library as the label for model training, comparing it with the set of restored Raman spectra, and calculating a loss value through a loss function;
feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters through the chain rule.
In the Raman spectrum preprocessing model generation method of the invention, inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model and optimizing the parameters automatically comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module;
extracting features from the spectrum with a Raman spectrum encoder module, retaining the features that belong to Raman peaks;
restoring the highly compressed Raman peak features to a Raman spectrum with a Raman spectrum decoder module;
preliminarily fitting the noise and baseline background signal to be removed with a Raman spectrum background signal evaluation module;
using the ideal spectrum library as the label for model training, comparing it with the set of restored Raman spectra, and calculating a loss value through a loss function;
feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters through the chain rule.
In the Raman spectrum preprocessing model generation method of the invention, inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model and optimizing the parameters automatically comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module;
extracting features from the spectrum with a Raman spectrum encoder module, retaining the features that belong to Raman peaks;
restoring the highly compressed Raman peak features to a Raman spectrum with a Raman spectrum decoder module;
using the ideal spectrum library as the label for model training, comparing it with the set of restored Raman spectra, and calculating a loss value through a loss function;
feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters through the chain rule.
In the Raman spectrum preprocessing model generation method of the invention, the Raman spectrum position encoder module adopts a sliding window method, a sine/cosine encoding method or a batch-like encoding method:
the sliding window method segments the Raman spectrum sequentially along the Raman shift axis to obtain multi-channel data;
the sine/cosine encoding method encodes the relative position of each spectral intensity, and the resulting code is periodic;
the batch-like encoding method performs correlation analysis on multiple spectra and encodes the spectra with the correlation coefficients.
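Two of the position-encoding options above can be sketched in a few lines. This is an illustrative sketch only: the window size, stride and encoding dimension are hypothetical, and the frequency base of 10000 in `sinusoidal_encoding` follows the common Transformer convention, which the patent text does not specify. The batch-like method is not shown.

```python
import numpy as np

def sliding_window(spectrum, window, stride):
    """Split a 1-D spectrum into overlapping segments, one channel per segment."""
    starts = range(0, len(spectrum) - window + 1, stride)
    return np.stack([spectrum[s:s + window] for s in starts])

def sinusoidal_encoding(n_points, dim):
    """Periodic sine/cosine code for each position along the Raman shift axis.

    `dim` must be even; the base 10000 is an assumed convention.
    """
    pos = np.arange(n_points)[:, None]
    freq = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    enc = np.zeros((n_points, dim))
    enc[:, 0::2] = np.sin(pos * freq)
    enc[:, 1::2] = np.cos(pos * freq)
    return enc

spectrum = np.arange(10, dtype=float)                    # toy intensities
channels = sliding_window(spectrum, window=4, stride=2)  # shape (4, 4)
encoding = sinusoidal_encoding(n_points=10, dim=8)       # shape (10, 8)
```

The sliding window turns one spectrum into multi-channel data in which each channel covers a contiguous range of Raman shifts, while the sinusoidal code gives every spectral point a position vector that can be added to or concatenated with its intensity.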
In the Raman spectrum preprocessing model generation method of the invention, the loss function adopts the formulas:
$$L_{1}=\ell\big(y,\hat{y}_{1}\big)$$
$$L_{2}=\ell\big(y,\hat{y}_{2}\big)+\lambda\,\ell\big(y,\hat{y}_{1}\big)$$
where $L_{1}$ is used to train the background signal evaluation module and $L_{2}$ is used to train the entire preprocessing model; $y$ and $\hat{y}$ refer to the ideal reference spectrum and the spectrum after preprocessing, respectively; the subscripts 1 and 2 correspond to the output of the background signal evaluation module and the output of the preprocessing model;
training comprises two processes, training the background signal evaluation module and training the Raman spectrum preprocessing model:
first step, $L_{1}$ is used to train the background signal evaluation module;
second step, $L_{2}$ is used to train the Raman spectrum preprocessing model;
$\ell$ may be computed as the root mean square error or the mean absolute error, and in the second step the weight coefficient $\lambda$ is added to balance the feedback of the two terms;
the root mean square error (RMSE) adopts the formula:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_{i}-\hat{y}_{i}\big)^{2}}$$
the mean absolute error (MAE) adopts the formula:
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\big|y_{i}-\hat{y}_{i}\big|$$
where $n$ is the number of spectral points.
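The two error measures and the weighted second-stage loss can be sketched as below. RMSE and MAE follow their standard definitions. The way the two terms are combined in `model_loss`, the name `lam` for the weight coefficient and its default value are assumptions made for illustration, since the combined formula is not legible in this text.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between label and prediction."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error between label and prediction."""
    return float(np.mean(np.abs(y - y_hat)))

def model_loss(y, y_model, y_bg, lam=0.5, error=rmse):
    """Assumed second-stage loss: model error plus weighted background-module error."""
    return error(y, y_model) + lam * error(y, y_bg)

y = np.array([0.0, 1.0, 0.0, 2.0])        # ideal spectrum (label)
y_bg = np.array([0.0, 1.0, 1.0, 2.0])     # hypothetical background-module output
y_model = np.array([0.0, 1.0, 0.0, 0.0])  # hypothetical preprocessing-model output

stage1 = rmse(y, y_bg)                     # first step: train the background module
stage2 = model_loss(y, y_model, y_bg)      # second step: train the whole model
```

Passing `error=mae` switches both terms to the mean absolute error. A larger `lam` makes the second stage weigh the background module's feedback more heavily relative to the final output.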
A Raman spectrum preprocessing model, generated by the Raman spectrum preprocessing model generation method described above.
A Raman spectrum preprocessing model application method, which applies the Raman spectrum preprocessing model described above and is implemented as follows:
the original Raman/surface-enhanced Raman spectrum to be processed is input into the Raman spectrum preprocessing model, which outputs the processed Raman/surface-enhanced Raman spectrum with noise and baseline background signals removed.
A Raman spectrum preprocessing model generation system, applied to the Raman spectrum preprocessing model generation method described above, comprising: a parameter-setting training set unit for executing the first step, and a model training unit for executing the second step.
A Raman spectrum preprocessing apparatus, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method described above when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method described above.
The beneficial effects of the invention are as follows: the invention is the first to apply a self-supervised algorithm to Raman spectrum preprocessing, which solves the problem that no labeled data are available for training on Raman spectra; once trained, the model is simple and fast to use, and its results retain high fidelity while noise and background baseline signals are removed effectively; the method can be used for real-time spectral analysis and improves application accuracy. The scope of application includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and imaging applications such as improving the signal-to-noise ratio and resolution of spectral imaging.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the invention is further described below with reference to the accompanying drawings and embodiments. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort:
Fig. 1 is a flowchart of the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 2 is a schematic diagram of the extraction of noise information, the baseline background signal and Raman peaks in the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 3 is a flowchart of generating the high-simulation Raman spectrum library and the ideal spectrum library in the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 4 is a schematic diagram of the Raman spectrum position encoding module (sliding window method) in the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 5 is a flowchart of the Raman spectrum preprocessing algorithm training in the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 6 is a comparison of the spectrum processing results in verification 1 of the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 7 is a schematic diagram of the effect of different preprocessing algorithms on the accuracy of cancer diagnosis based on serum surface-enhanced Raman spectroscopy, for the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 8 is a comparison, in verification 3, of the effect of the preprocessing algorithms on improving Raman hyperspectral cell imaging quality, for the Raman spectrum preprocessing model generation method according to the first embodiment of the invention;
Fig. 9 is a flowchart of the Raman spectrum preprocessing algorithm training in the Raman spectrum preprocessing model generation method according to the second embodiment of the invention;
Fig. 10 is a flowchart of the training algorithm of the Raman spectrum preprocessing model generation method according to the third embodiment of the invention;
Fig. 11 is a schematic block diagram of the Raman spectrum preprocessing model application method according to the fifth embodiment of the invention;
Fig. 12 is a schematic block diagram of the Raman spectrum preprocessing model generation system according to the sixth embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Embodiment one
A Raman spectrum preprocessing model generation method, as shown in Fig. 1 and with reference to Figs. 2-8, comprises the following steps:
S01: a real Raman/surface-enhanced Raman spectrum library is processed by a Raman spectrum generative adversarial network to generate an ideal spectrum library and a high-simulation Raman spectrum library:
a training data set is prepared, and an ideal spectrum library and a reference spectrum library are generated;
the Raman spectrum generative adversarial network is trained, and the high-simulation Raman spectrum library is finally generated;
as shown in Fig. 3, this comprises the following steps:
a1. collecting real Raman spectrum/surface-enhanced Raman spectrum data;
a2. extracting the noise information, the baseline background signal and the Raman characteristic peaks from the Raman spectrum/surface-enhanced Raman spectrum data, and generating a noise information library, a baseline background signal library and a Raman peak library, respectively;
a3. randomly combining characteristic peaks from the Raman peak library to generate ideal spectra free of noise and baseline background signal, and building a library of them;
a4. superimposing the extracted noise and baseline background signals on the ideal spectrum data to generate reference spectra, and building a library of them;
a5. the generator takes the ideal spectrum library and random Gaussian noise as input, and its function is to generate simulated Raman spectra;
a6. the discriminator takes the simulated Raman spectrum library and the reference spectrum library as input, and its function is to judge the simulated Raman spectra to be false (or true) and the reference spectra to be true (or false);
a7. when the generator is trained, the generator loss function aims to minimize the error with which the discriminator judges the simulated spectra to be true (or false); when the discriminator is trained, the discriminator loss function aims to maximize the classification accuracy of the discriminator, which raises the value of the generator loss function, whereas training the generator counteracts this by lowering it; by training the generator and the discriminator alternately, the generator comes to generate a high-simulation Raman spectrum library that is ever closer to the reference spectra;
a8. in the process of training the generator and the discriminator, the following loss function is adopted:
$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x_{\mathrm{ref}}}\big[\log D(x_{\mathrm{ref}})\big]+\mathbb{E}_{z}\big[\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)\big]$$
where $G(z,x_{\mathrm{ideal}})$ denotes the simulated Raman spectrum that the generator (G) produces from the input random Gaussian noise ($z$) and the ideal spectrum ($x_{\mathrm{ideal}}$); $D(G(z,x_{\mathrm{ideal}}))$ and $D(x_{\mathrm{ref}})$ denote, respectively, the probabilities with which the discriminator (D) judges the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ to be false or true and the reference spectrum $x_{\mathrm{ref}}$ to be true or false;
the formula comprises two processes, $\max_{D}$ and $\min_{G}$:
$\max_{D}$ denotes that, when the discriminator is trained, the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ is judged false and the reference spectrum $x_{\mathrm{ref}}$ is judged true, so that $D(G(z,x_{\mathrm{ideal}}))$ is close to 0 and thus $\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)$ is close to 0, which maximizes the accuracy of the discriminator;
likewise, $\min_{G}$ denotes that, when the generator is trained, the simulated Raman spectrum $G(z,x_{\mathrm{ideal}})$ is judged true, so that $D(G(z,x_{\mathrm{ideal}}))$ is close to 1, which minimizes the term $\log\big(1-D(G(z,x_{\mathrm{ideal}}))\big)$. Through the alternation of these steps, the generator acquires the ability to generate high-simulation spectra.
S02: inputting the high-simulation Raman spectrum library into a Raman spectrum preprocessing model for training; the ideal spectrum library is used as a label of a Raman spectrum pretreatment model, loss values of the pretreated Raman spectrum are calculated and fed back to the Raman spectrum pretreatment model, and parameters are automatically optimized;
baseline background signal: the baseline background signal of a Raman spectrum is a slowly varying signal that is inevitably introduced during spectrum acquisition and usually originates from fluorescence. During laser irradiation, in addition to Raman photon transitions, the interaction between light and matter also gives rise to fluorescence transitions, most of them at the same frequency, which generate a slowly varying signal. When the full width at half maximum of this signal is greater than that of a Raman peak, the useful Raman peaks are superimposed on the fluorescence background and become difficult to identify; this phenomenon is called baseline drift, i.e. a baseline background signal is produced;
the generator consists mainly of a neural network; it takes the ideal spectrum library and random Gaussian noise as input, and its function is to generate simulated Raman spectra. The random Gaussian noise acts as a random variable that provides different background signals for the ideal reference spectrum. The network may be, for example, a convolutional neural network, an attention neural network, a graph neural network or a perceptron neural network;
the discriminator consists mainly of a neural network; it takes the simulated Raman spectrum library and the reference spectrum library as input, and its function is to judge the simulated Raman spectrum to be false (or true) and the reference spectrum to be true (or false). The network may be, for example, a convolutional neural network, an attention neural network, a graph neural network or a perceptron neural network;
specifically, the second step adopts the following method:
b1. the Raman spectrum position encoder module encodes the intensity and the Raman shift of the spectrum, so that during model training each intensity can be associated more easily and automatically with its Raman shift, which enhances the fitting and generalization capability of the model; at the same time, the Raman spectrum background signal evaluation module preliminarily fits the signals to be removed;
b2. the Raman spectrum encoder module extracts features of the spectrum, specifically the features belonging to Raman peaks; a neural network is mainly used to perform classification training on the spectrum, and during this training the model learns the key features (Raman peaks) of the spectrum. The encoder may use a convolutional neural network, an attention network, a perceptron network or the like;
b3. the Raman spectrum decoder module restores the highly compressed Raman peak features into a Raman spectrum; that is, the highly compressed features from the encoder are restored to a spectrum with baseline and noise removed, and this decoding capability is learned automatically by a neural network during model training. The decoder may use a convolutional neural network, an attention network, a perceptron network or the like;
b4. the ideal spectrum library is used as the label for model training; it is compared with the restored Raman spectrum set and the loss functions are calculated:
[formula images: loss function of the first training step and loss function of the second training step]
where the first loss function is used to train the background signal evaluation module and the second is used to train the entire preprocessing model. The spectrum symbols in the formulas denote the ideal reference spectrum and the preprocessed spectrum, respectively, and the two output symbols correspond to the outputs of the background signal evaluation module and of the preprocessing model. The whole training process has two steps: in the first step, the first loss function trains only the background signal evaluation module; in the second step, the second loss function trains the entire preprocessing model. The error term in the loss functions may be calculated as either the root mean square error or the mean absolute error. In the second step, a weight coefficient is added to balance the feedback of the two.
The root mean square error (RMSE) is calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
The mean absolute error (MAE) is calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $n$ is the total number of points of the spectrum, $y_i$ is the $i$-th point of the ideal reference spectrum and $\hat{y}_i$ is the $i$-th point of the preprocessed spectrum.
b5. the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule of derivatives (backpropagation);
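The two-step loss of b4 can be sketched as follows. This is only an illustration under stated assumptions: the exact loss formulas are not recoverable from this text, so the additive combination in `stage2_loss`, the target `background_true` (taken here as the known baseline plus noise of a simulated spectrum), the default `weight` and all function names are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between two spectra."""
    diff = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y, y_hat):
    """Mean absolute error between two spectra."""
    diff = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(diff)))

def stage1_loss(background_true, background_est, err=rmse):
    """Step 1 (assumed form): train only the background evaluation module."""
    return err(background_true, background_est)

def stage2_loss(ideal, preprocessed, background_true, background_est,
                weight=0.5, err=rmse):
    """Step 2 (assumed form): train the whole model; `weight` balances the
    feedback of the background estimate against the final output."""
    return err(ideal, preprocessed) + weight * err(background_true, background_est)

# Hypothetical four-point spectra.
ideal = np.array([0.0, 1.0, 0.0, 2.0])
background_true = np.array([0.5, 0.5, 0.5, 0.5])
background_est = np.array([0.4, 0.5, 0.6, 0.5])
preprocessed = np.array([0.1, 0.9, 0.0, 2.0])

l1 = stage1_loss(background_true, background_est)
l2 = stage2_loss(ideal, preprocessed, background_true, background_est, weight=0.5)
```

Passing `err=mae` switches both steps to the mean absolute error, mirroring the choice between the two error measures described in b4.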
specifically, in the first step:
the noise signal can be extracted by a wavelet transform method, a Fourier transform method, a sliding window method or the like, described as follows:
the wavelet transform is a signal processing method that is localized in both time and frequency; it suits signals that vary slowly at low frequencies and rapidly at high frequencies, and is well suited to local transient events. In a Raman spectrum, noise corresponds to high-frequency components, whereas the baseline and the Raman peaks correspond to lower-frequency components, so the wavelet transform can effectively extract the noise signal of a Raman spectrum. Specifically, the spectrum signal is decomposed into wavelet coefficients, the coefficients dominated by the noise signal are retained, and the noise signal is obtained by reconstruction from the retained coefficients;
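As a minimal sketch of the decompose, retain and reconstruct procedure, the example below uses a single-level Haar wavelet written directly in NumPy. The choice of the Haar wavelet, the single decomposition level and the decision to treat all detail coefficients as noise-dominated are simplifying assumptions made for illustration; the function name `haar_noise` and the synthetic spectrum are hypothetical.

```python
import numpy as np

def haar_noise(spectrum):
    """Reconstruct the part of the signal carried by the single-level Haar
    detail (high-frequency) coefficients, used here as the noise estimate."""
    x = np.asarray(spectrum, dtype=float)
    if x.size % 2:
        raise ValueError("spectrum length must be even for this sketch")
    # Detail coefficients: scaled differences of neighbouring points.
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    # Inverse transform with the approximation coefficients set to zero.
    noise = np.empty_like(x)
    noise[0::2] = detail / np.sqrt(2.0)
    noise[1::2] = -detail / np.sqrt(2.0)
    return noise

# Synthetic example: one Gaussian "Raman peak" plus white noise.
rng = np.random.default_rng(0)
shift = np.linspace(400.0, 1800.0, 512)
clean = np.exp(-0.5 * ((shift - 1000.0) / 40.0) ** 2)
noisy = clean + 0.05 * rng.standard_normal(shift.size)

noise_est = haar_noise(noisy)
smooth = noisy - noise_est  # spectrum with the estimated noise removed
```

A practical implementation would more likely use several decomposition levels and retain only coefficients below a threshold, but the flow of decomposition, coefficient selection and reconstruction is the same.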
the purpose of the Fourier transform is to convert a signal from the time domain into the frequency domain. With the Fourier transform, noise can likewise be regarded as a high-frequency signal. Specifically, a discrete Fourier transform is applied to the spectrum, the high-frequency components are extracted in the Fourier domain, and the noise signal of the spectrum is finally obtained;
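The Fourier-domain extraction can be sketched with NumPy's real FFT. The cutoff, expressed here as the hypothetical parameter `cutoff_fraction`, is an assumption; this text does not state how the boundary between noise and signal frequencies is chosen.

```python
import numpy as np

def fourier_noise(spectrum, cutoff_fraction=0.2):
    """Keep only the Fourier components above an assumed cutoff and
    transform back; the result is the high-frequency (noise) estimate."""
    x = np.asarray(spectrum, dtype=float)
    coeffs = np.fft.rfft(x)
    cutoff = int(cutoff_fraction * coeffs.size)
    high = coeffs.copy()
    high[:cutoff] = 0.0  # discard the low-frequency content
    return np.fft.irfft(high, n=x.size)

# Synthetic example: a slow component plus a fast, small-amplitude ripple.
n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 2 * t / n)            # 2 cycles over the window
fast = 0.1 * np.sin(2 * np.pi * 100 * t / n)    # 100 cycles over the window
noise_est = fourier_noise(slow + fast, cutoff_fraction=0.2)
```

In this synthetic case the slow component lies below the cutoff and the ripple above it, so `noise_est` recovers the ripple.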
the sliding window algorithm performs a given operation on the data inside a window of fixed size that moves along an array. For denoising, the current value is replaced by a statistic (such as the mean or the median) of the data in the window; to obtain the noise, the spectrum smoothed in this way is subtracted from the original spectrum;
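The sliding window procedure above can be sketched as follows. The edge padding, the odd window length and the function name `sliding_window_noise` are assumptions made for this illustration.

```python
import numpy as np

def sliding_window_noise(spectrum, window=5, statistic=np.mean):
    """Replace each point by a window statistic, then subtract the smoothed
    spectrum from the original to obtain the noise estimate."""
    if window % 2 == 0:
        raise ValueError("window must be odd for this sketch")
    x = np.asarray(spectrum, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")  # repeat edge values (assumption)
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    smoothed = statistic(windows, axis=1)
    return x - smoothed, smoothed

# A flat spectrum with a single spike, smoothed with a median window.
spike = np.array([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])
noise, smoothed = sliding_window_noise(spike, window=3, statistic=np.median)
```

With the median statistic the isolated spike is attributed entirely to the noise estimate, while a mean statistic would spread it over neighbouring points.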
the Raman characteristic peaks of the Raman spectrum are extracted by a gradient descent method, a differential peak-searching and fitting method or the like, described as follows:
gradient descent is a commonly used method for solving unconstrained optimization problems. In simple terms, the extrema of a function are found using the direction and magnitude of its derivative; for the extraction of Raman characteristic peaks, the extrema (the maximum of a Raman peak and its two peak edges) can be used to separate the Raman peaks;
the differential method computes local derivatives of the spectrum. In regions of the spectrum where the rate of change is small the derivative is small, whereas near a Raman peak the derivative is large. The individual Raman peaks can therefore be separated one by one according to the derivative values;
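A differential peak search can be sketched with first differences: a peak is taken where the slope changes from positive to non-positive, and neighbouring peaks are separated at the lowest point between them. This is an illustrative simplification without the fitting stage; the function names and the toy spectrum are hypothetical.

```python
import numpy as np

def find_peaks_by_derivative(intensity):
    """Indices where the first difference changes from positive to
    non-positive, i.e. local maxima of the spectrum."""
    y = np.asarray(intensity, dtype=float)
    dy = np.diff(y)
    return [i for i in range(1, y.size - 1) if dy[i - 1] > 0 and dy[i] <= 0]

def split_at_valleys(intensity, peaks):
    """Cut the spectrum at the lowest point between consecutive peaks and
    return (start, stop) index ranges, one per peak."""
    y = np.asarray(intensity, dtype=float)
    cuts = [0]
    for a, b in zip(peaks[:-1], peaks[1:]):
        cuts.append(a + int(np.argmin(y[a:b + 1])))
    cuts.append(y.size)
    return list(zip(cuts[:-1], cuts[1:]))

# Toy spectrum with two peaks.
y = np.array([0, 0, 1, 3, 1, 0, 0, 2, 5, 2, 0], dtype=float)
peaks = find_peaks_by_derivative(y)
segments = split_at_valleys(y, peaks)
```

On noisy data the spectrum would normally be smoothed first, since every noise fluctuation also produces a sign change in the first difference.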
the baseline background signal is extracted by a wavelet transform method, a Fourier transform method, a least squares method or the like, described as follows:
the wavelet transform is a signal processing method that is localized in both time and frequency; it suits signals that vary slowly at low frequencies and rapidly at high frequencies, and is well suited to local transient events. In a Raman spectrum, noise corresponds to a high-frequency signal, the baseline to a low-frequency signal and the Raman peaks to an intermediate-frequency signal, so the wavelet transform can effectively extract the baseline signal. Specifically, the spectrum signal is decomposed into wavelet coefficients, the coefficients dominated by the baseline signal are retained, and the baseline background signal is obtained by reconstruction from the retained coefficients;
the least squares method is a mathematical optimization technique that finds the best-fitting function for the data by minimizing the sum of the squared errors (the differences between the true target values and the fitted values). In a Raman spectrum, the overall trend line belongs to the baseline signal, so the baseline background signal can be obtained by fitting the trend of the spectrum with the least squares method;
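A least squares trend fit can be sketched with a low-order polynomial. The polynomial model, its degree and the function name `polynomial_baseline` are assumptions; this text states only that the spectral trend is fitted by least squares.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def polynomial_baseline(shift, intensity, degree=3):
    """Least squares polynomial fit of the overall spectral trend."""
    x = np.asarray(shift, dtype=float)
    # Rescale the Raman shift axis to [-1, 1] for numerical stability.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    coeffs = P.polyfit(xs, np.asarray(intensity, dtype=float), degree)
    return P.polyval(xs, coeffs)

# Synthetic check: a purely quadratic background is recovered by the fit.
shift = np.linspace(400.0, 1800.0, 200)
true_baseline = 0.2 + 1e-6 * (shift - 400.0) ** 2
baseline_est = polynomial_baseline(shift, true_baseline, degree=3)
```

When Raman peaks are present, a plain fit is pulled upward by the peaks, so practical variants typically iterate the fit while excluding or clipping points that lie above the current estimate.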
the purpose of the Fourier transform is to convert a signal from the time domain into the frequency domain. With the Fourier transform, the baseline can likewise be regarded as an intermediate-frequency signal. Specifically, a discrete Fourier transform is applied to the spectrum, the intermediate-frequency components are extracted in the Fourier domain, and the baseline background signal is finally obtained;
in the second step: the Raman spectrum position encoder module segments the spectral peaks using a method such as the sliding window method, the sine/cosine encoding method or the batch-like encoding method;
specifically, the Raman spectrum position encoder module may encode the Raman shift by different methods, including but not limited to the sliding window method, the sine/cosine encoding method and the batch-like encoding method; each method has a different emphasis.
The 'sliding window method' splits the spectrum into multiple channels; each segment undergoes its own feature extraction, while the weights can still be computed jointly. This encoder emphasizes retaining the signal of weak characteristic peaks in segments at different Raman shifts.
As one alternative, the sine/cosine encoding method encodes the Raman shifts of the spectrum, assigning an encoded value to each Raman shift so that a strong correspondence between shift and intensity is obtained; this method emphasizes the periodic characteristics of the spectrum.
As another alternative, the batch-like encoding method derives the encoding from the relationship among the batches analyzed, for example by assigning a fixed encoded value to all spectral intensities at the same Raman shift; this encoding reflects the strong correspondence among identical Raman shifts and emphasizes the analysis of differences between spectra. The purpose of all these encoders is that, when a spectrum is input into the model, each intensity can be matched to its Raman shift more easily and automatically, which enhances the fitting and generalization capability of the model;
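The sine/cosine encoding of Raman shifts can be sketched in the style of the positional encoding commonly used in attention networks. This particular form, the encoding dimension `dim` and the constant `max_period` are assumptions; the text says only that each Raman shift receives an encoded value.

```python
import numpy as np

def sincos_encode(raman_shift, dim=8, max_period=10000.0):
    """Map each Raman shift to a vector of sines and cosines at
    geometrically spaced frequencies (assumed form)."""
    if dim % 2:
        raise ValueError("dim must be even")
    pos = np.asarray(raman_shift, dtype=float)[:, None]       # (n, 1)
    freqs = max_period ** (-np.arange(0, dim, 2) / dim)        # (dim / 2,)
    angles = pos * freqs[None, :]                              # (n, dim / 2)
    enc = np.empty((pos.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Encode a hypothetical Raman shift axis (in cm^-1).
shifts = np.linspace(400.0, 1800.0, 5)
encoding = sincos_encode(shifts, dim=8)
origin = sincos_encode([0.0], dim=8)
```

Each row of `encoding` could then be concatenated with, or added to, the intensity feature at the same shift before it enters the encoder module.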
the invention is the first to apply a self-supervised algorithm to Raman spectrum preprocessing, which solves the problem that no labeled Raman spectrum data are available for training; once training is complete, the model is simple and fast to use, and the results retain higher fidelity while noise and the baseline background signal are effectively removed; the method can be used for real-time spectral analysis and improves application accuracy. The range of applications includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as detection of substance concentration or content, and imaging applications such as improvement of the signal-to-noise ratio and resolution of spectral imaging.
FIG. 2 is a schematic diagram of the extraction of the noise information library, the baseline background signal library and the Raman peak library in the Raman spectrum generative adversarial network algorithm;
the specific implementation steps are: the auxiliary-task part, i.e. the Raman spectrum generative adversarial network algorithm, performs characteristic peak searching and window segmentation on the input real Raman spectrum library (a) and extracts the corresponding features. The feature extraction methods include, but are not limited to, least squares, wavelet transform and Wiener transform. A noise information library (b), a baseline background signal library (c) and a Raman peak library (d) are then established;
FIG. 4 is a schematic diagram of the Raman spectrum position encoding module (sliding window) in the RSBPDL algorithm of the Raman spectrum preprocessing model;
the specific implementation steps are: a sliding window strategy divides the spectrum into several regions, which are fed into the Raman spectrum preprocessing model (RSBPDL) for processing. The sliding window strategy has two key parameters, the stride and the window size; during model training, the most suitable values of both can be selected automatically according to the output feedback. The module mainly serves to split the spectrum so that characteristic peaks in different wavebands are extracted during spectrum preprocessing; after preprocessing, Raman peaks with weak signals can be restored better;
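The segmentation controlled by stride and window size can be sketched as below. In this illustration both parameters are fixed by hand, whereas the text describes them as selected from output feedback during training; the function name `segment_spectrum` is hypothetical.

```python
import numpy as np

def segment_spectrum(intensity, size, stride):
    """Split a spectrum into windows of `size` points whose start positions
    are `stride` points apart; returns an array (n_windows, size)."""
    y = np.asarray(intensity, dtype=float)
    if size > y.size:
        raise ValueError("window size exceeds spectrum length")
    starts = range(0, y.size - size + 1, stride)
    return np.stack([y[s:s + size] for s in starts])

# Ten-point toy spectrum, windows of 4 points with 50 % overlap.
segments = segment_spectrum(np.arange(10), size=4, stride=2)
```

A stride smaller than the window size gives overlapping regions, so a weak peak near a window boundary still appears whole in at least one segment.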
the technical parameters of the spectrum preprocessing effect of the present invention are compared with those of existing methods below:
TABLE 1 Comparison of root mean square error (RMSE) values between the ideal spectrum and the spectra processed by the various preprocessing algorithms
[table image: Table 1]
TABLE 2 Comparison of infinity norm ($L_\infty$) values between the ideal spectrum and the spectra processed by the various preprocessing algorithms
[table image: Table 2]
Tables 1 and 2 compare the present invention with existing methods on original Raman spectra of different signal-to-noise ratios, in terms of noise removal and background suppression capability (Table 1, RMSE) and signal fidelity (Table 2, infinity norm). The root mean square error (RMSE) measures the difference between two spectra; a smaller value indicates stronger noise removal and background suppression capability:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $n$ is the total number of points of the spectrum, $i$ denotes the $i$-th point of the spectrum, $y$ is the ideal reference spectrum and $\hat{y}$ is the preprocessed spectrum. The infinity norm ($L_\infty$) measures the maximum difference between the peaks of the two spectra; a smaller value indicates higher spectral fidelity:
$$L_\infty = \max_{1 \le i \le n}\left|y_i - \hat{y}_i\right|$$
where $n$ is the total number of points of the spectrum, $i$ denotes the $i$-th point of the spectrum, $y$ is the ideal reference spectrum and $\hat{y}$ is the preprocessed spectrum. The results show that, in terms of both the root mean square error (RMSE) and the infinity norm ($L_\infty$), the spectra processed by the RSBPDL of the present invention are better than those of the other existing methods, i.e. the preprocessing effect on Raman spectra is better than that of the existing methods;
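The two evaluation metrics can be computed as in the sketch below. The spectra are made-up five-point arrays, not data from Tables 1 and 2, and the function names are hypothetical.

```python
import numpy as np

def rmse(ideal, processed):
    """Root mean square error between ideal and preprocessed spectra."""
    diff = np.asarray(ideal, dtype=float) - np.asarray(processed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def linf(ideal, processed):
    """Infinity norm: the largest point-wise absolute difference."""
    diff = np.asarray(ideal, dtype=float) - np.asarray(processed, dtype=float)
    return float(np.max(np.abs(diff)))

# Made-up example: the preprocessed spectrum clips the peak by one unit.
ideal = np.array([0.0, 1.0, 4.0, 1.0, 0.0])
processed = np.array([0.0, 1.0, 3.0, 1.0, 0.0])

error_rmse = rmse(ideal, processed)
error_linf = linf(ideal, processed)
```

The example shows why both metrics are reported: a single clipped peak barely moves the RMSE, which averages over all points, but it fully determines the infinity norm.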
FIG. 6 compares the degree of overlap between a single spectrum (spectrum 4 in Tables 1 and 2) and the ideal spectrum after preprocessing by different algorithms;
(a) shows spectral data generated by the Raman spectrum generative adversarial network algorithm, where the original spectrum is the spectrum before preprocessing, with noise and baseline, and the ideal spectrum is the corresponding spectrum with noise and baseline background signal removed. (b-f) compare with the ideal spectrum the spectra obtained by processing the original spectrum with five different preprocessing algorithms (polynomial fitting, wavelet transform, ResidualCNN, Unet-1D and RSBPDL), where RSBPDL is the model of the present invention. The more closely the preprocessed spectrum resembles the ideal spectrum, the stronger the noise removal and background suppression capability of the algorithm and the higher the fidelity of the spectrum. The results show that the spectrum processed by RSBPDL overlaps most closely with the ideal spectrum and has the smallest root mean square error (RMSE) and infinity norm ($L_\infty$) values, which shows that the preprocessing effect of the present invention on Raman spectra is better than that of the existing methods;
FIG. 7 shows that pre-processing Raman spectra according to the present invention improves the accuracy of disease diagnosis;
the diagnostic test included 28 cancer patients and 27 normal persons; the data set was divided into a training set (22 cancer patients / 21 normal persons) and a validation set (6 / 6), and cross-validation was performed 100 times. The diagnostic model is a residual neural network, and the data used are surface-enhanced Raman spectra collected from the serum of the subjects. To improve diagnostic accuracy, the acquired Raman spectra need to be preprocessed to remove noise and baseline background signals.
The collected Raman spectra are preprocessed with the Raman spectrum preprocessing model (RSBPDL) and with four existing preprocessing algorithms (polynomial fitting, wavelet transform, ResidualCNN and Unet-1D); the resulting spectra are input into the disease diagnosis model for training, and the diagnostic accuracy is evaluated on the validation set. The receiver operating characteristic (ROC) curve is a tool used in clinical diagnostics to evaluate comprehensively the specificity and sensitivity of a diagnostic method. The AUC is the area under the ROC curve; a higher AUC value indicates higher diagnostic accuracy. In the left graph, the ROC curve obtained with Raman spectra preprocessed by RSBPDL lies at the top, which means that its area under the curve is the largest. The right graph is an error bar chart, in which the upper and lower horizontal lines represent the upper and lower limits of the error and the diamonds represent the mean value; each line represents one preprocessing method. The right graph shows that, among all the preprocessing algorithms, classification after preprocessing with the Raman spectrum preprocessing algorithm RSBPDL gives the largest AUC value and the greatest improvement in diagnostic accuracy.
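The AUC used in this comparison can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses made-up labels and scores, not data from FIG. 7, and the function name `auc_score` is hypothetical.

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of the scores of
    positive and negative cases (ties count as one half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Made-up example: one positive case scores below one negative case.
auc = auc_score(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
perfect = auc_score(labels=[1, 1, 0, 0], scores=[0.9, 0.8, 0.2, 0.1])
```

An AUC of 1 corresponds to a model that ranks every positive case above every negative case, and 0.5 to random ranking.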
FIG. 8 is a graph showing the resolution enhancement effect of the present invention on Raman spectroscopy cell imaging;
the left image is the original Raman image acquired directly by a confocal Raman microscope, formed by imaging the 2930 cm⁻¹ band of the Raman spectrum at each pixel; its signal-to-noise ratio and resolution are low. The right image is formed from the acquired Raman spectra after preprocessing with the RSBPDL algorithm of the present invention.
Compared with the left image, the quality of the right image is greatly improved and more cellular details are visible, i.e. the signal-to-noise ratio and the resolution are higher.
Image quality can be evaluated by calculating the entropy of the two images; the larger the entropy, the more information the image contains.
The entropy is calculated as:
[formula image: entropy]
where P_i denotes the value of each pixel. The entropy of the left image is 2.43 and that of the right image is 3.09, which shows that the Raman image formed from RSBPDL-preprocessed spectra contains more detailed information. RSBPDL preprocessing can therefore effectively improve the quality of Raman images;
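An entropy-based image quality check can be sketched as follows. The sketch assumes the common Shannon form, in which the probabilities are the normalized histogram frequencies of the pixel values and the logarithm is taken to base 2; the entropy formula of this document is not recoverable from the text, so this form, the bin count and the function name `image_entropy` are assumptions.

```python
import numpy as np

def image_entropy(image, bins=256):
    """Shannon entropy (in bits) of the pixel-value histogram."""
    values = np.asarray(image, dtype=float).ravel()
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()   # drop empty bins, normalize
    return float(-np.sum(p * np.log2(p)))

# Made-up images: a uniform image and a two-level image.
flat = np.full((4, 4), 7.0)
two_level = np.array([[0.0, 0.0], [1.0, 1.0]])

entropy_flat = image_entropy(flat)
entropy_two = image_entropy(two_level, bins=2)
```

A uniform image carries no information under this measure, while an image whose pixels are split evenly between two levels carries one bit per pixel.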
Example two
This embodiment is substantially the same as the first embodiment, and the identical parts are not described again. The difference, as shown in FIG. 9, is that the second step adopts the following method:
the Raman spectrum position encoder module encodes the intensity and the Raman shift of the spectrum, so that during model training each intensity can be associated more easily and automatically with its Raman shift, which enhances the fitting and generalization capability of the model;
the Raman spectrum encoder module extracts features of the spectrum, specifically the features belonging to Raman peaks;
the Raman spectrum decoder module restores the highly compressed Raman peak features into a Raman spectrum;
the Raman spectrum background signal evaluation module preliminarily fits the signals to be removed, which reduces the difficulty of the subsequent fitting;
the ideal spectrum library is used as the label for model training and is compared with the restored Raman spectrum set, and a loss value is calculated;
the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule;
as shown in FIG. 9, this scheme differs from the first embodiment in that, in the second step, the 'Raman spectrum background signal evaluation module' is moved to a later position. Spectrum preprocessing can still be achieved, but placing the module later tends to degrade its output, so the effect is reduced. The reason is that, when the 'Raman spectrum background signal evaluation module' is placed later, its features can no longer be extracted by the encoder, and its output can only be processed by the decoder. The trainability of the module is therefore reduced, and the noise after preprocessing is larger than in the first embodiment.
Example three
This embodiment is substantially the same as the first embodiment, and the identical parts are not described again. The difference, as shown in FIG. 10, is that the second step adopts the following method:
the Raman spectrum position encoder module encodes the intensity and the Raman shift of the spectrum, so that during model training each intensity can be associated more easily and automatically with its Raman shift, which enhances the fitting and generalization capability of the model;
the Raman spectrum encoder module extracts features of the spectrum, specifically the features belonging to Raman peaks;
the Raman spectrum decoder module restores the highly compressed Raman peak features into a Raman spectrum;
the ideal spectrum library is used as the label for model training and is compared with the restored Raman spectrum set, and a loss value is calculated;
the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule;
as shown in FIG. 10, this scheme differs from the first embodiment in that, in the second step, only the 'encoder module' and the 'decoder module' are used to extract spectral features and remove the baseline and noise. Spectrum preprocessing can still be achieved, but because the 'spectrum background signal evaluation module' is removed, the effect is reduced.
Example four
A Raman spectrum preprocessing model, generated by the Raman spectrum preprocessing model generation method described above.
Example five
A method for applying a Raman spectrum preprocessing model, as shown in FIG. 11, which applies the Raman spectrum preprocessing model described above and is implemented as follows:
inputting the original Raman/surface-enhanced Raman spectrum to be processed into the Raman spectrum preprocessing model for processing, and outputting a processed Raman/surface-enhanced Raman spectrum with the noise signal and the baseline background signal removed;
the method can be directly used for removing the noise and the baseline background signal of any Raman spectrum to be processed. The output processed spectrum can be directly used for downstream application, and the performance of the application is improved. Applications include, but are not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and applications in the field of images such as spectral imaging resolution enhancement.
Example six
A Raman spectrum preprocessing model generation system, as shown in FIG. 12, for implementing the Raman spectrum preprocessing model generation method described above, comprises a parameter setting training set unit 1, which executes the first step to generate the highly simulated Raman spectrum library and the ideal spectrum library, and a model training unit 2, which executes the second step to train and generate a parameter-optimized self-supervised Raman spectrum preprocessing model;
the invention is the first to apply a self-supervised algorithm to Raman spectrum preprocessing, which solves the problem that no labeled Raman spectrum data are available for training; once training is complete, the model is simple and fast to use, and the results retain higher fidelity while noise and the baseline background signal are effectively removed; the method can be used for real-time spectral analysis and improves application accuracy. The range of applications includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as detection of substance concentration or content, and imaging applications such as improvement of the signal-to-noise ratio and resolution of spectral imaging.
Example seven
A Raman spectrum preprocessing apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method described above are implemented when the computer program is executed by the processor.
Example eight
A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method as set forth above.
The computer readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, a recording medium, such as a ROM/RAM, a magnetic disk, an optical disk, a flash memory, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (15)

1. A Raman spectrum preprocessing model generation method, characterized by comprising the following steps:
the first step: the real Raman/surface-enhanced Raman spectrum library is processed by a Raman spectrum generative adversarial network to generate an ideal spectrum library and a highly simulated Raman spectrum library;
the second step: the highly simulated Raman spectrum library is input into a Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated for the preprocessed Raman spectrum, the calculated loss value is fed back to the Raman spectrum preprocessing model, and the parameters are optimized automatically.
2. The Raman spectrum preprocessing model generation method of claim 1, wherein processing the real Raman/surface-enhanced Raman spectrum library by the Raman spectrum generative adversarial network comprises:
preparing a training data set, and generating an ideal spectrum library and a reference spectrum library;
training the Raman spectrum generative adversarial network, and finally generating a highly simulated Raman spectrum library.
3. The method for generating a Raman spectrum preprocessing model according to claim 2, wherein preparing the training data set and generating the ideal spectrum library and the reference spectrum library comprises:
collecting real Raman spectrum/surface-enhanced Raman spectrum data;
extracting the noise information, the baseline background signal and the Raman characteristic peaks of the Raman spectra in the Raman spectrum/surface-enhanced Raman spectrum data, and generating a noise information library, a baseline background signal library and a Raman peak library, respectively;
randomly combining characteristic peaks from the Raman peak library to generate ideal spectra free of noise and baseline background signal, and establishing a library;
superimposing the extracted noise and baseline background signal on the ideal spectral data to generate reference spectra, and establishing a library.
4. The method for generating a Raman spectrum preprocessing model according to claim 2, wherein training the Raman spectrum generative adversarial network and finally generating the highly simulated Raman spectrum library comprises:
the Raman spectrum generative adversarial network comprises a generator and a discriminator;
the generator: takes the ideal spectrum library and random Gaussian noise as input and generates simulated Raman spectra;
the discriminator: takes the simulated Raman spectrum library and the reference spectrum library as input, judges the simulated Raman spectrum to be false or true, and judges the reference spectrum to be true or false;
when the generator is trained, the loss function of the generator aims to minimize the error with which the discriminator judges the simulated spectrum to be true or false;
when the discriminator is trained, the loss function of the discriminator aims to maximize the classification accuracy of the discriminator;
the generator and the discriminator are trained alternately, and the generator finally generates a highly simulated Raman spectrum library close to the reference spectrum.
5. The method for generating a Raman spectrum preprocessing model according to claim 4, wherein the high-simulation Raman spectrum is generated using the formula:
[formula image 002]
wherein [formula image 003] denotes that the generator (G) uses the input random Gaussian noise (z) and the ideal spectrum ([formula image 004]) to generate a simulated Raman spectrum; [formula image 005] and [formula image 006] respectively denote the probability that the input spectrum, being a simulated Raman spectrum [formula image 007], is judged false or true, and the probability that the input spectrum, being a reference spectrum [formula image 008], is judged true or false;
the formula includes two processes, [formula image 009] and [formula image 010]:
[formula image 009] denotes that, when training the discriminator, the simulated Raman spectrum [formula image 011] is judged false and the reference spectrum [formula image 008] is judged true, so that [formula image 012] is close to 0 and thus [formula image 013] is close to 0, which maximizes the accuracy of the discriminator;
[formula image 014] denotes that, when training the generator, the simulated Raman spectrum [formula image 007] is judged true, so that [formula image 015] is close to 1 and, further, [formula image 016] is close to 1;
by alternating the two processes, the generator achieves generation of the high-simulation spectrum.
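The formula images of claim 5 did not survive extraction, so the exact expressions cannot be quoted here. For orientation only, a standard conditional GAN objective that matches the verbal definitions above (generator G, discriminator D, Gaussian noise z) is sketched below. The symbols x_i for the ideal spectrum and x_r for the reference spectrum are assumed notation, and the patent's own formula may differ in form.

```latex
% Assumed notation: x_r = reference spectrum, x_i = ideal spectrum, z = Gaussian noise.
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x_r}\!\left[\log D(x_r)\right]
  + \mathbb{E}_{z}\!\left[\log\!\left(1 - D\!\left(G(z \mid x_i)\right)\right)\right]
```

Under this assumed form, the max over D corresponds to training the discriminator, where D(G(z | x_i)) is driven toward 0 and the second logarithm approaches 0. The min over G corresponds to training the generator, where D(G(z | x_i)) is driven toward 1.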
6. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model, and automatically optimizing the parameters comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module, and preliminarily fitting the noise and baseline background signal to be removed with a Raman spectrum background signal evaluation module;
extracting features of the spectrum with a Raman spectrum encoder module, so as to extract the features belonging to Raman peaks;
restoring the highly compressed Raman peak features into a Raman spectrum with a Raman spectrum decoder module;
using the ideal spectrum library as the label for model training, comparing it with the restored Raman spectrum set, and calculating a loss value through a loss function;
and feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters via the chain rule.
7. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model, and automatically optimizing the parameters comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module;
extracting features of the spectrum with a Raman spectrum encoder module, so as to extract the features belonging to Raman peaks;
restoring the highly compressed Raman peak features into a Raman spectrum with a Raman spectrum decoder module;
preliminarily fitting the noise and baseline background signal to be removed with a Raman spectrum background signal evaluation module;
using the ideal spectrum library as the label for model training, comparing it with the restored Raman spectrum set, and calculating a loss value through a loss function;
and feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters via the chain rule.
8. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein inputting the high-simulation Raman spectrum library into the Raman spectrum preprocessing model for training, using the ideal spectrum library as the label of the Raman spectrum preprocessing model, calculating a loss value for the preprocessed Raman spectrum, feeding it back to the Raman spectrum preprocessing model, and automatically optimizing the parameters comprises:
encoding the intensity and the Raman shift of the spectrum with a Raman spectrum position encoder module;
extracting features of the spectrum with a Raman spectrum encoder module, so as to extract the features belonging to Raman peaks;
restoring the highly compressed Raman peak features into a Raman spectrum with a Raman spectrum decoder module;
using the ideal spectrum library as the label for model training, comparing it with the restored Raman spectrum set, and calculating a loss value through a loss function;
and feeding the loss value back to the Raman spectrum preprocessing model, and updating the model parameters via the chain rule.
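The encode, restore, compare, and update loop shared by claims 6 to 8 can be sketched on toy data. This is an illustration under strong assumptions, not the patented model: the encoder and decoder are single linear maps, the loss is a mean squared error, and the helper names `make_toy_libraries` and `train` are hypothetical. The position encoder and the background signal evaluation module are omitted.

```python
import numpy as np

def make_toy_libraries(n=32, d=64, seed=0):
    """Hypothetical stand-ins: one Gaussian peak per ideal spectrum,
    plus a sloped baseline and noise for the simulated input."""
    rng = np.random.default_rng(seed)
    shift = np.arange(d)
    centers = rng.integers(5, d - 5, size=n)
    ideal = np.exp(-0.5 * ((shift[None, :] - centers[:, None]) / 2.0) ** 2)
    baseline = 0.3 + 0.005 * shift[None, :]
    noisy = ideal + baseline + 0.05 * rng.standard_normal((n, d))
    return noisy, ideal

def train(noisy, ideal, k=8, steps=200, lr=0.05, seed=1):
    """Gradient descent on a linear encoder/decoder pair; returns the loss history."""
    rng = np.random.default_rng(seed)
    d = noisy.shape[1]
    w_enc = 0.1 * rng.standard_normal((d, k))   # encoder: spectrum -> compressed features
    w_dec = 0.1 * rng.standard_normal((k, d))   # decoder: features -> restored spectrum
    losses = []
    for _ in range(steps):
        features = noisy @ w_enc
        restored = features @ w_dec
        diff = restored - ideal                 # compare with the label (ideal library)
        losses.append(float(np.mean(diff ** 2)))
        # Chain rule: propagate the loss gradient back through decoder, then encoder.
        grad_restored = 2.0 * diff / diff.size
        grad_w_dec = features.T @ grad_restored
        grad_features = grad_restored @ w_dec.T
        grad_w_enc = noisy.T @ grad_features
        w_enc -= lr * grad_w_enc
        w_dec -= lr * grad_w_dec
    return losses

noisy, ideal = make_toy_libraries()
losses = train(noisy, ideal)
```

The returned loss history should fall over the training steps, which is the feedback behaviour the last two steps of each claim describe.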
9. The method for generating a Raman spectrum preprocessing model according to any one of claims 6 to 8, wherein the Raman spectrum position encoder module adopts a sliding window method, a sine/cosine encoding method, or a batch-like encoding method;
with the sliding window method, the Raman spectrum is segmented sequentially along its Raman shifts to obtain multi-channel data;
with the sine/cosine encoding method, the relative positions of the spectral intensities are encoded, and the code is periodic;
with the batch-like encoding method, correlation analysis is carried out on multiple spectra, and the spectra are encoded using the correlation coefficients.
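The three encoding options of claim 9 can each be sketched in a few lines. The claim gives no parameters, so the window size, stride, code dimension, and the base of 10000 (borrowed from the common Transformer positional encoding) are assumptions, and the function names are hypothetical.

```python
import numpy as np

def sliding_window_channels(spectrum, window, stride):
    """Cut the spectrum into consecutive windows along the Raman shift axis;
    each window becomes one channel."""
    spectrum = np.asarray(spectrum, dtype=float)
    count = (len(spectrum) - window) // stride + 1
    return np.stack([spectrum[i * stride:i * stride + window] for i in range(count)])

def sincos_position_code(length, dim, base=10000.0):
    """Periodic code for each position: sines in even columns, cosines in odd ones."""
    if dim % 2:
        raise ValueError("dim must be even")
    position = np.arange(length)[:, None]
    pair = np.arange(dim // 2)[None, :]
    angle = position / base ** (2 * pair / dim)
    code = np.empty((length, dim))
    code[:, 0::2] = np.sin(angle)
    code[:, 1::2] = np.cos(angle)
    return code

def correlation_code(spectra):
    """Pearson correlation coefficients between every pair of spectra in a batch."""
    return np.corrcoef(np.asarray(spectra, dtype=float))

channels = sliding_window_channels(np.arange(10.0), window=4, stride=2)
code = sincos_position_code(length=5, dim=8)
corr = correlation_code([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
```

Here `channels` holds four overlapping windows of four points each, `code` has one row per spectral position, and `corr` is a 3 x 3 matrix whose entries could serve as the batch-level code.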
10. The method for generating a Raman spectrum preprocessing model according to any one of claims 6 to 8, wherein the loss function uses the formulas:
[formula image 017]
[formula image 018]
wherein [formula image 019] is used to train the background signal evaluation module, and [formula image 020] is used to train the entire preprocessing model; [formula image 021] and [formula image 022] refer to the ideal reference spectrum and the preprocessed spectrum, respectively; [formula image 023] and [formula image 024] correspond to the outputs of the background signal evaluation module and the preprocessing model;
training comprises two processes, training the background signal evaluation module and training the Raman spectrum preprocessing model:
in the first step, [formula image 025] is used to train the background signal evaluation module;
in the second step, [formula image 026] is used to train the Raman spectrum preprocessing model;
[formula image 027] may be calculated as either the root mean square error or the mean absolute error, and in the second step a weight coefficient [formula image 028] is added to balance the feedback of the two;
the root mean square error (RMSE) uses the formula:
[formula image 029]
the mean absolute error (MAE) uses the formula:
[formula image 030]
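The loss formulas of claim 10 are only available as images, so the sketch below uses the textbook definitions of RMSE and MAE and an assumed way of combining the two training steps. In particular, the form of `stage_two_loss` (model error plus a weighted background-module error) and the default weight are assumptions made for illustration; all function names are hypothetical.

```python
import numpy as np

def rmse(target, output):
    """Root mean square error: sqrt(mean((target - output)^2))."""
    target, output = np.asarray(target, float), np.asarray(output, float)
    return float(np.sqrt(np.mean((target - output) ** 2)))

def mae(target, output):
    """Mean absolute error: mean(|target - output|)."""
    target, output = np.asarray(target, float), np.asarray(output, float)
    return float(np.mean(np.abs(target - output)))

def stage_one_loss(ideal, background_output, error=rmse):
    """First step (assumed form): error of the background module output against the ideal spectrum."""
    return error(ideal, background_output)

def stage_two_loss(ideal, background_output, model_output, weight=0.5, error=rmse):
    """Second step (assumed form): model error plus the weighted background-module error."""
    return error(ideal, model_output) + weight * error(ideal, background_output)

ideal = np.array([0.0, 0.0])
background_output = np.array([3.0, 4.0])
model_output = np.array([0.0, 0.0])

first = stage_one_loss(ideal, background_output, error=mae)
second = stage_two_loss(ideal, background_output, model_output, weight=0.5, error=mae)
```

Passing `error=rmse` or `error=mae` mirrors the claim's statement that either error measure may be chosen.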
11. A Raman spectrum preprocessing model, generated by the method for generating a Raman spectrum preprocessing model according to any one of claims 1 to 10.
12. A method for applying the Raman spectrum preprocessing model according to claim 11, comprising the following step:
inputting the raw Raman/surface-enhanced Raman spectrum to be processed into the Raman spectrum preprocessing model for processing, and outputting the processed Raman/surface-enhanced Raman spectrum with the noise and baseline background signal removed.
13. A Raman spectrum preprocessing model generation system, applied to the method for generating a Raman spectrum preprocessing model according to any one of claims 1 to 10, comprising: a parameter setting training set unit for executing the first step and a model training unit for executing the second step.
14. A Raman spectroscopy preprocessing apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 10 when executing the computer program.
15. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN202211256339.9A 2022-10-13 2022-10-13 Raman spectrum preprocessing model generation method, system, terminal and storage medium Active CN115326783B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211256339.9A CN115326783B (en) 2022-10-13 2022-10-13 Raman spectrum preprocessing model generation method, system, terminal and storage medium
PCT/CN2023/121358 WO2024078321A1 (en) 2022-10-13 2023-09-26 Raman spectrum preprocessing model generation method and system, and terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211256339.9A CN115326783B (en) 2022-10-13 2022-10-13 Raman spectrum preprocessing model generation method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN115326783A (en) 2022-11-11
CN115326783B (en) 2023-01-17

Family

ID=83913179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211256339.9A Active CN115326783B (en) 2022-10-13 2022-10-13 Raman spectrum preprocessing model generation method, system, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN115326783B (en)
WO (1) WO2024078321A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152578A (en) * 2023-04-25 2023-05-23 深圳湾实验室 Training method and device for noise reduction generation model, noise reduction method and medium
CN117783088A (en) * 2024-02-23 2024-03-29 广州贝拓科学技术有限公司 Control model training method, device and equipment of laser micro-Raman spectrometer
WO2024078321A1 (en) * 2022-10-13 2024-04-18 南方科技大学 Raman spectrum preprocessing model generation method and system, and terminal and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210270742A1 (en) * 2020-02-28 2021-09-02 Virginia Tech Intellectual Properties, Inc. Peak-preserving and enhancing baseline correction methods for raman spectroscopy
CN113378680A (en) * 2021-06-01 2021-09-10 厦门大学 Intelligent database building method for Raman spectrum data
CN113390848A (en) * 2020-03-13 2021-09-14 桂林电子科技大学 DCGAN spectral data expansion method
CN113791037A (en) * 2021-08-19 2021-12-14 南京航空航天大学 Silicon-based Fourier transform spectrum measurement method based on generation countermeasure network
CN114417937A (en) * 2022-01-26 2022-04-29 山东捷讯通信技术有限公司 Deep learning-based Raman spectrum denoising method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103217409B (en) * 2013-03-22 2015-02-18 中国科学院重庆绿色智能技术研究院 Raman spectral preprocessing method
JP2021528861A (en) * 2018-06-28 2021-10-21 アプライド マテリアルズ インコーポレイテッドApplied Materials,Incorporated Generation of training spectra for machine learning systems for spectroscopic image monitoring
JP7424595B2 (en) * 2020-06-23 2024-01-30 株式会社島津製作所 Discriminator generation method and device
KR102238300B1 (en) * 2020-09-24 2021-04-12 국방과학연구소 Method and apparatus for producing infrared spectrum
CN112712857A (en) * 2020-12-08 2021-04-27 北京信息科技大学 Method for generating biological Raman spectrum data based on WGAN (WGAN) antagonistic generation network
CN115326783B (en) * 2022-10-13 2023-01-17 南方科技大学 Raman spectrum preprocessing model generation method, system, terminal and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210270742A1 (en) * 2020-02-28 2021-09-02 Virginia Tech Intellectual Properties, Inc. Peak-preserving and enhancing baseline correction methods for raman spectroscopy
CN113390848A (en) * 2020-03-13 2021-09-14 桂林电子科技大学 DCGAN spectral data expansion method
CN113378680A (en) * 2021-06-01 2021-09-10 厦门大学 Intelligent database building method for Raman spectrum data
CN113791037A (en) * 2021-08-19 2021-12-14 南京航空航天大学 Silicon-based Fourier transform spectrum measurement method based on generation countermeasure network
CN114417937A (en) * 2022-01-26 2022-04-29 山东捷讯通信技术有限公司 Deep learning-based Raman spectrum denoising method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
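Li Lingqiao (李灵巧) et al.: "Research on Raman spectrum sample augmentation based on DCGAN and its application", Spectroscopy and Spectral Analysis *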

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024078321A1 (en) * 2022-10-13 2024-04-18 南方科技大学 Raman spectrum preprocessing model generation method and system, and terminal and storage medium
CN116152578A (en) * 2023-04-25 2023-05-23 深圳湾实验室 Training method and device for noise reduction generation model, noise reduction method and medium
CN116152578B (en) * 2023-04-25 2023-07-18 深圳湾实验室 Training method and device for noise reduction generation model, noise reduction method and medium
CN117783088A (en) * 2024-02-23 2024-03-29 广州贝拓科学技术有限公司 Control model training method, device and equipment of laser micro-Raman spectrometer
CN117783088B (en) * 2024-02-23 2024-05-14 广州贝拓科学技术有限公司 Control model training method, device and equipment of laser micro-Raman spectrometer

Also Published As

Publication number Publication date
WO2024078321A1 (en) 2024-04-18
CN115326783B (en) 2023-01-17

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant