CN115326783A - Raman spectrum preprocessing model generation method, system, terminal and storage medium

Info

Publication number: CN115326783A
Application number: CN202211256339.9A
Authority: CN
Inventors: 沈平; 胡嘉祺; 陈金娜; 薛陈龙; 党竑
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2022-10-13
Filing date: 2022-10-13
Publication date: 2022-11-11
Anticipated expiration: 2042-10-13
Also published as: WO2024078321A1; CN115326783B

Abstract

The invention relates to a method, system, terminal and storage medium for generating a Raman spectrum preprocessing model. Noise, baseline background signals and Raman peaks are extracted from a real Raman spectrum library and stored in separate libraries. Raman characteristic peaks from the Raman peak library are freely combined to generate an ideal spectrum library free of noise and baseline background signals, and the extracted noise and baseline background signals are superimposed on the ideal spectrum library to generate a reference spectrum library. A generator takes the ideal spectrum library and random Gaussian noise as input and produces a simulated spectrum library; a discriminator and the generator are trained adversarially, and after training the generator yields a high-simulation Raman spectrum library that conforms to the characteristics of real Raman spectra. This library is used to train a spectrum preprocessing model with a self-supervised algorithm, so that parameters are set automatically, with the ideal spectrum library serving as the training label. After training, the model can be applied directly to actually acquired spectra; it is simple and fast to use, removes noise and baseline background effectively, and preserves spectral fidelity.

Description

Raman spectrum preprocessing model generation method, system, terminal and storage medium

Technical Field

The invention relates to the technical field of Raman spectrum preprocessing, and in particular to a Raman spectrum preprocessing model generation method, system, terminal and storage medium.

Background

Raman spectroscopy is a spectroscopic technique based on Raman scattering, which arises from the interaction of light with matter: the frequency difference between the scattered light and the incident light corresponds to the vibrational or pure rotational energy level spacing of the molecules in the scattering medium. As a molecular vibrational spectrum that requires no labeling, the Raman spectrum is widely used for detection in chemistry, biology and related fields. However, instrument noise, sample fluorescence and environmental noise degrade quantitative precision and qualitative accuracy. The interference is particularly pronounced for biological samples, where characteristic peaks may not be identified effectively. Although noise can be reduced by upgrading the instrument, optimizing the detection environment and pretreating the sample, these measures are limited by time and economic cost, and at the current level of technology noise cannot be eliminated completely in hardware.

Raman spectrum preprocessing removes the baseline background signal and the noise signal from an acquired spectrum. In the prior art, preprocessing methods are divided into mathematical analysis methods and machine learning methods. Mathematical analysis methods analyze the spectrum with physical and mathematical equations. Because an actually sampled Raman spectrum does not fully satisfy these equations, and because background and noise signals are diverse, such methods require manual parameter adjustment for different spectra, cannot be fully automated, and have difficulty processing Raman spectra with a low signal-to-noise ratio or complex background signals. Machine-learning-based preprocessing models at the present stage mostly use fully supervised learning. However, a standard spectrum free of noise and baseline background cannot be obtained by real spectrum acquisition to serve as a label. Instead, a simulated spectrum built from simulated Gaussian or Lorentzian peaks is generally used as the label, with random Gaussian noise as the noise and a random fluorescence signal as the baseline. This approach can fit a simulated Raman spectrum data set to a certain extent, but in use it easily overfits noise peaks, so the preprocessed spectrum is distorted, which limits the method.

Examples of existing methods:

1. Background signal removal based on polynomial or least squares fitting: parameters must be designed manually for Raman spectra with different signals and different signal-to-noise ratios, so the method is subjective to a degree and can hardly process spectra with a low signal-to-noise ratio.

2. Baseline background and noise removal based on the wavelet transform: the wavelet transform fits the signal according to its distribution characteristics in the spectral (time) domain. Automation can be achieved to some extent by designing automatic iterations, but different algorithms must be selected for Raman spectra with different signal-to-noise ratios, and noise suppression, baseline background removal and signal fidelity remain unsatisfactory for Raman spectra with a low signal-to-noise ratio.

3. Noise and baseline background removal based on fully supervised deep learning: deep learning enables automatic spectrum preprocessing without human intervention and removes noise and baseline background well. For Raman spectra with a low signal-to-noise ratio, however, a fully supervised algorithm can overfit the Gaussian and Lorentzian peaks, deforming the resulting Raman spectrum so that the processed spectrum is distorted.

A more cost-effective, more automated and higher-fidelity approach is therefore needed to address the above problems.

Disclosure of Invention

In view of the above-mentioned defects in the prior art, the technical problem to be solved by the present invention is to provide a Raman spectrum preprocessing model generation method, a Raman spectrum preprocessing model generation system, a Raman spectrum preprocessing terminal, and a computer-readable storage medium.

The technical solution adopted by the invention to solve this technical problem is as follows:

A Raman spectrum preprocessing model generation method is provided, comprising the following steps:

First step: a real Raman/surface-enhanced Raman spectrum library is processed by a Raman spectrum generative adversarial network to generate an ideal spectrum library and a high-simulation Raman spectrum library;

Second step: the high-simulation Raman spectrum library is input into a Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated for the preprocessed Raman spectrum, and the calculated loss value is fed back to the Raman spectrum preprocessing model for automatic parameter optimization.

In the Raman spectrum preprocessing model generation method of the invention, processing the real Raman/surface-enhanced Raman spectrum library through the Raman spectrum generative adversarial network to generate the ideal spectrum library and the high-simulation Raman spectrum library comprises the following steps:

preparing a training data set, and generating an ideal spectrum library and a reference spectrum library;

training the Raman spectrum generative adversarial network, and finally generating the high-simulation Raman spectrum library.

In the Raman spectrum preprocessing model generation method of the invention, preparing the training data set and generating the ideal spectrum library and the reference spectrum library comprises the following steps:

collecting real Raman spectrum/surface-enhanced Raman spectrum data;

extracting the noise information, the baseline background signal and the Raman characteristic peaks from the Raman spectrum/surface-enhanced Raman spectrum data, and generating a noise information library, a baseline background signal library and a Raman peak library, respectively;

randomly combining characteristic peaks from the Raman peak library to generate ideal spectra free of noise and baseline background signals, and building a library of them;

superimposing the extracted noise and baseline background signals on the ideal spectrum data to generate reference spectra, and building a library of them.
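As an illustration only, the last two data preparation steps could be sketched in Python as follows. The toy libraries, the 6-point Raman shift axis and the function names `combine_peaks` and `make_reference` are assumptions made for this sketch; the source does not specify how the libraries are stored.

```python
import random


def combine_peaks(peak_library, n_peaks, rng):
    """Sum n_peaks randomly chosen library peaks into one ideal spectrum."""
    chosen = rng.sample(peak_library, n_peaks)
    length = len(peak_library[0])
    return [sum(peak[i] for peak in chosen) for i in range(length)]


def make_reference(ideal, noise_library, baseline_library, rng):
    """Superimpose one extracted noise trace and one baseline on an ideal spectrum."""
    noise = rng.choice(noise_library)
    baseline = rng.choice(baseline_library)
    return [s + n + b for s, n, b in zip(ideal, noise, baseline)]


# Hypothetical toy libraries sampled on a shared 6-point Raman shift axis.
peak_library = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.5],
]
noise_library = [[0.01, -0.02, 0.00, 0.02, -0.01, 0.01]]
baseline_library = [[0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]

rng = random.Random(0)
ideal = combine_peaks(peak_library, n_peaks=2, rng=rng)          # label spectrum
reference = make_reference(ideal, noise_library, baseline_library, rng)
```

Because each reference spectrum is built from a known ideal spectrum, the pair shares a noise-free label by construction, which is what allows the ideal spectrum library to serve as the training label later.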

In the Raman spectrum preprocessing model generation method of the invention, training the Raman spectrum generative adversarial network and finally generating the high-simulation Raman spectrum library comprises the following:

the Raman spectrum generative adversarial network comprises a generator and a discriminator;

the generator takes the ideal spectrum library and random Gaussian noise as input and generates simulated Raman spectra;

the discriminator takes the simulated Raman spectrum library and the reference spectrum library as input, and judges the simulated Raman spectra to be false (or true) and the reference spectra to be true (or false);

when the generator is trained, the objective of the generator loss function is to minimize the error with which the discriminator judges the simulated spectrum to be true (or false);

when the discriminator is trained, the objective of the discriminator loss function is to maximize the classification accuracy of the discriminator;

the generator and the discriminator are trained alternately, and the generator finally generates a high-simulation Raman spectrum library close to the reference spectra.

In the Raman spectrum preprocessing model generation method of the invention, the high-simulation Raman spectrum is generated using the formula:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z,\,y}\left[\log\left(1 - D(G(z, y))\right)\right]$$

wherein $G(z, y)$ denotes the simulated Raman spectrum that the generator ($G$) generates from the input random Gaussian noise ($z$) and the ideal spectrum ($y$);

$D(G(z, y))$ and $D(x)$ denote the probabilities with which the discriminator ($D$) judges the input simulated Raman spectrum $G(z, y)$ and the reference spectrum $x$, respectively, to be true; a value close to 0 means the input is judged false;

the formula includes two processes, $\max_D$ and $\min_G$:

$\max_D$ denotes training the discriminator: the simulated Raman spectrum $G(z, y)$ is judged false and the reference spectrum $x$ is judged true, so that $D(G(z, y))$ is close to 0, and thus $\log(1 - D(G(z, y)))$ is close to 0, its largest possible value, which maximizes the accuracy of the discriminator;

$\min_G$ denotes training the generator: the simulated Raman spectrum $G(z, y)$ is judged true, so that $D(G(z, y))$ is close to 1, and the term $\log(1 - D(G(z, y)))$ is thereby minimized;

through the alternation of the two processes, the generator comes to generate high-simulation spectra.
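The two alternating objectives can be made concrete with a small numerical sketch. It assumes the standard minimax form of a conditional generative adversarial objective, in which the discriminator outputs the probability that its input is true; the function names and the probability values are hypothetical and chosen only for illustration.

```python
import math


def discriminator_objective(d_ref, d_sim):
    """Value V(D, G) that the discriminator tries to maximize.

    d_ref: discriminator outputs for reference spectra (ideally near 1).
    d_sim: discriminator outputs for simulated spectra (ideally near 0).
    """
    real_term = sum(math.log(p) for p in d_ref) / len(d_ref)
    fake_term = sum(math.log(1.0 - p) for p in d_sim) / len(d_sim)
    return real_term + fake_term


def generator_objective(d_sim):
    """Term of V(D, G) that the generator tries to minimize."""
    return sum(math.log(1.0 - p) for p in d_sim) / len(d_sim)


# An accurate discriminator scores close to the maximum value of 0.
good_d = discriminator_objective(d_ref=[0.99, 0.98], d_sim=[0.01, 0.02])
poor_d = discriminator_objective(d_ref=[0.6, 0.5], d_sim=[0.5, 0.4])

# A generator that fools the discriminator drives its term strongly negative.
fooled = generator_objective([0.9, 0.95])
caught = generator_objective([0.05, 0.1])
```

In an actual training loop these values would be computed from network outputs and differentiated; the sketch only shows in which direction each player pushes the objective.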

In the Raman spectrum preprocessing model generation method of the invention, the high-simulation Raman spectrum library is input into the Raman spectrum preprocessing model for training, the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, and a loss value is calculated for the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model for automatic parameter optimization, as follows:

the intensity and Raman shift of the spectrum are encoded by a Raman spectrum position encoder module, and the noise and baseline background signal to be removed are preliminarily fitted by a Raman spectrum background signal evaluation module;

features are extracted from the spectrum by a Raman spectrum encoder module, retaining the features that belong to Raman peaks;

the highly compressed Raman peak features are restored to a Raman spectrum by a Raman spectrum decoder module;

the ideal spectrum library, used as the label for model training, is compared with the set of restored Raman spectra, and a loss value is calculated through a loss function;

the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule.

In the Raman spectrum preprocessing model generation method of the invention, the high-simulation Raman spectrum library is input into the Raman spectrum preprocessing model for training, the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, and a loss value is calculated for the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model for automatic parameter optimization, as follows:

the intensity and Raman shift of the spectrum are encoded by a Raman spectrum position encoder module;

the highly compressed Raman peak features are restored to a Raman spectrum by a Raman spectrum decoder module;

the noise and baseline background signal to be removed are preliminarily fitted by a Raman spectrum background signal evaluation module;

In the Raman spectrum preprocessing model generation method of the invention, the high-simulation Raman spectrum library is input into the Raman spectrum preprocessing model for training, the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, and a loss value is calculated for the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model for automatic parameter optimization, as follows:

In the Raman spectrum preprocessing model generation method of the invention, the Raman spectrum position encoder module adopts a sliding window method, a sine/cosine encoding method or a batch-like encoding method:

with the sliding window method, the Raman spectrum is segmented sequentially along the Raman shift axis to obtain multi-channel data;

with the sine/cosine encoding method, the relative position of each spectral intensity is encoded, and the code is periodic;

with the batch-like encoding method, correlation analysis is performed on multiple spectra, and the spectra are encoded using the correlation coefficients.
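A minimal sketch of the first two encoding options is given below. The window and stride values, the function names, and the base constant 10000 in the sinusoidal code (borrowed from the common Transformer convention) are assumptions of this sketch; the source does not give these details.

```python
import math


def sliding_windows(intensities, window, stride):
    """Cut a spectrum into consecutive Raman shift segments, one channel each."""
    last_start = len(intensities) - window
    return [intensities[i:i + window] for i in range(0, last_start + 1, stride)]


def sinusoidal_encoding(n_points, dim):
    """Periodic sine/cosine code for each index along the spectrum."""
    table = []
    for pos in range(n_points):
        row = []
        for k in range(0, dim, 2):
            # Assumed frequency schedule: geometric progression with base 10000.
            angle = pos / (10000.0 ** (k / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row[:dim])
    return table


channels = sliding_windows(list(range(10)), window=4, stride=2)
codes = sinusoidal_encoding(n_points=3, dim=4)
```

With overlapping windows each Raman shift appears in more than one channel, so the model sees every peak in several local contexts.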

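In the Raman spectrum preprocessing model generation method of the invention, the loss function adopts the formulas:

$$L_1 = \mathrm{Loss}\left(y, \hat{y}_b\right)$$

$$L_2 = \mathrm{Loss}\left(y, \hat{y}_p\right) + \lambda\,\mathrm{Loss}\left(y, \hat{y}_b\right)$$

wherein $L_1$ is used for training the background signal evaluation module and $L_2$ is used for training the entire preprocessing model;

$y$ and $\hat{y}$ refer to the ideal reference spectrum and the spectrum after preprocessing, respectively;

the subscripts $b$ and $p$ correspond to the background signal evaluation module output and the preprocessing model output;

training comprises two processes, training the background signal evaluation module and training the Raman spectrum preprocessing model:

first step: $L_1$ is used to train the background signal evaluation module;

second step: $L_2$ is used to train the Raman spectrum preprocessing model;

$\mathrm{Loss}$ may be calculated as the root mean square error or the mean absolute error, and a weight coefficient $\lambda$ is added in the second step to balance the feedback of the two;

the root mean square error (RMSE) adopts the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

the mean absolute error (MAE) adopts the formula:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $n$ is the number of points in the spectrum.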
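The two error measures and the two training stages can be sketched as follows. The way the two terms are combined in `stage_two_loss` (preprocessing model error plus a weighted background module error) is an assumption of this sketch based on the description of the weight coefficient; the function names are hypothetical.

```python
import math


def rmse(target, output):
    """Root mean square error between a label spectrum and a model output."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(target, output)) / len(target))


def mae(target, output):
    """Mean absolute error between a label spectrum and a model output."""
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)


def stage_one_loss(ideal, background_out, error=rmse):
    """First stage: only the background signal evaluation module is trained."""
    return error(ideal, background_out)


def stage_two_loss(ideal, model_out, background_out, weight, error=rmse):
    """Second stage (assumed form): model error plus weighted background error."""
    return error(ideal, model_out) + weight * error(ideal, background_out)


ideal = [0.0, 0.0, 0.0, 0.0]
first = stage_one_loss(ideal, [2.0, 2.0, 2.0, 2.0])
second = stage_two_loss(ideal, [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], weight=0.5)
```

Passing `error=mae` switches both stages to the mean absolute error, which penalizes large deviations less heavily than the root mean square error.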

A Raman spectrum preprocessing model, wherein the Raman spectrum preprocessing model is generated by the Raman spectrum preprocessing model generation method described above.

A method for applying the Raman spectrum preprocessing model described above, implemented as follows:

the original Raman/surface-enhanced Raman spectrum to be processed is input into the Raman spectrum preprocessing model for processing, and the processed Raman/surface-enhanced Raman spectrum, with noise and baseline background signals removed, is output.

A Raman spectrum preprocessing model generation system, which applies the Raman spectrum preprocessing model generation method described above, comprises: a parameter-setting training set unit for executing the first step and a model training unit for executing the second step.

A raman spectroscopy preprocessing apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method as described above are implemented when the computer program is executed by the processor.

A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method as set forth above.

The beneficial effects of the invention are as follows: the invention is the first to use a self-supervised algorithm for Raman spectrum preprocessing, which solves the problem that no labeled Raman spectrum data are available for training; once trained, the model is simple and fast to use, and its results retain higher fidelity while noise and baseline background signals are effectively removed; the method can be used for real-time spectral analysis and improves application accuracy. The range of applications includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and imaging applications such as improving the signal-to-noise ratio and resolution of spectral imaging.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the invention is further described below with reference to the accompanying drawings and embodiments. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort:

Fig. 1 is a flowchart of the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 2 is a schematic diagram of the extraction of noise information, baseline background signals and Raman peaks in the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 3 is a flowchart of generating the high-simulation Raman spectrum library and the ideal spectrum library in the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 4 is a schematic diagram of the Raman spectrum position encoding module (sliding window method) in the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 5 is a flowchart of the Raman spectrum preprocessing algorithm training in the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 6 is a comparison of the spectrum processing results in verification 1 of the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 7 is a schematic diagram of the influence of different preprocessing algorithms on the accuracy of cancer diagnosis based on serum surface-enhanced Raman spectroscopy, for the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 8 is a schematic comparison, in verification 3, of the effect of the preprocessing algorithm on improving the quality of Raman hyperspectral cell imaging, for the Raman spectrum preprocessing model generation method according to the first embodiment of the present invention;

Fig. 9 is a flowchart of the Raman spectrum preprocessing algorithm training in the Raman spectrum preprocessing model generation method according to the second embodiment of the present invention;

Fig. 10 is a flowchart of the training algorithm in the Raman spectrum preprocessing model generation method according to the third embodiment of the present invention;

Fig. 11 is a schematic block diagram of the Raman spectrum preprocessing model application method according to the fifth embodiment of the present invention;

Fig. 12 is a schematic block diagram of the Raman spectrum preprocessing model generation system according to the sixth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the following will clearly and completely describe the technical solutions in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.

Example one

A Raman spectrum preprocessing model generation method, as shown in Fig. 1 and with reference to Figs. 2-8, comprises the following steps:

S01: a real Raman/surface-enhanced Raman spectrum library is processed by a Raman spectrum generative adversarial network to generate an ideal spectrum library and a high-simulation Raman spectrum library:

the Raman spectrum generative adversarial network is trained, and the high-simulation Raman spectrum library is finally generated;

as shown in Fig. 3, this comprises the following steps:

a1. collecting real Raman spectrum/surface-enhanced Raman spectrum data;

a2. extracting the noise information, the baseline background signal and the Raman characteristic peaks from the Raman spectrum/surface-enhanced Raman spectrum data, and generating a noise information library, a baseline background signal library and a Raman peak library, respectively;

a3. randomly combining characteristic peaks from the Raman peak library to generate ideal spectra free of noise and baseline background signals, and building a library of them;

a4. superimposing the extracted noise and baseline background signals on the ideal spectrum data to generate reference spectra, and building a library of them;

a5. the generator takes the ideal spectrum library and random Gaussian noise as input, and its function is to generate simulated Raman spectra;

a6. the discriminator takes the simulated Raman spectrum library and the reference spectrum library as input, and its function is to judge the simulated Raman spectra to be false (or true) and the reference spectra to be true (or false);

a7. when the generator is trained, the objective of its loss function is to minimize the error with which the discriminator judges the simulated spectrum to be true (or false); when the discriminator is trained, the objective of its loss function is to maximize the classification accuracy of the discriminator, which increases the loss value of the generator, and training the generator in turn counteracts this increase; by alternately training the generator and the discriminator, the generator comes to generate a high-simulation Raman spectrum library ever closer to the reference spectra;

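a8. in the process of training the generator and the discriminator, the following loss function is adopted:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z,\,y}\left[\log\left(1 - D(G(z, y))\right)\right]$$

wherein $G(z, y)$ denotes the simulated Raman spectrum that the generator ($G$) generates from the input random Gaussian noise ($z$) and the ideal spectrum ($y$);

$D(G(z, y))$ and $D(x)$ denote the probabilities with which the discriminator ($D$) judges the input simulated Raman spectrum $G(z, y)$ and the reference spectrum $x$, respectively, to be true; a value close to 0 means the input is judged false;

the formula includes two processes, $\max_D$ and $\min_G$:

$\max_D$ denotes training the discriminator: the simulated Raman spectrum $G(z, y)$ is judged false and the reference spectrum $x$ is judged true, so that $D(G(z, y))$ is close to 0, and thus $\log(1 - D(G(z, y)))$ is close to 0, its largest possible value, which maximizes the accuracy of the discriminator.

Similarly, $\min_G$ denotes training the generator: the simulated Raman spectrum $G(z, y)$ is judged true, so that $D(G(z, y))$ is close to 1, and the term $\log(1 - D(G(z, y)))$ is thereby minimized. Through the alternation of these steps, the generator achieves the function of generating high-simulation spectra.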

S02: the high-simulation Raman spectrum library is input into the Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated for the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model, and the parameters are optimized automatically;

baseline background signal: the Raman spectrum baseline background signal is a slowly varying signal that is inevitably introduced during spectrum acquisition and usually originates from fluorescence. During laser irradiation, the interaction between light and matter produces not only Raman photon transitions but also, for the most part, fluorescence transitions, which generate a slowly varying signal. When the full width at half maximum of this signal is greater than that of the Raman peaks, the effective Raman peaks are superimposed on the fluorescence background and become difficult to identify. This phenomenon is called baseline drift, that is, a baseline background signal is produced;

the generator consists mainly of a neural network, takes the ideal spectrum library and random Gaussian noise as input, and has the function of generating simulated Raman spectra. The random Gaussian noise serves as a random variable that provides different background signals for an ideal reference spectrum. A convolutional neural network, an attention neural network, a graph neural network, a perceptron neural network or the like may be selected;

the discriminator consists mainly of a neural network, takes the simulated Raman spectrum library and the reference spectrum library as input, and has the function of judging the simulated Raman spectra to be false (or true) and the reference spectra to be true (or false). A convolutional neural network, an attention neural network, a graph neural network, a perceptron neural network or the like may be selected;

specifically, the second step adopts the following method:

b1. the intensity and Raman shift of the spectrum are encoded by the Raman spectrum position encoder module, so that during model training each intensity is more easily and automatically associated with its Raman shift, which strengthens the fitting and generalization ability of the model; at the same time, the Raman spectrum background signal evaluation module preliminarily fits the signals to be removed;

b2. features are extracted from the spectrum by the Raman spectrum encoder module, retaining the features that belong to Raman peaks; a neural network is mainly used to perform classification training on the spectrum, and during this training the model learns the key features (Raman peaks) of the spectrum. Methods available for the encoder include convolutional neural networks, attention networks, perceptron networks and the like;

b3. the highly compressed Raman peak features are restored to a Raman spectrum by the Raman spectrum decoder module; the highly compressed features from the encoder are restored to a spectrum with baseline and noise removed, and the decoding capability is learned automatically by a neural network during model training. Methods available for the decoder include convolutional neural networks, attention networks, perceptron networks and the like;

b4. the ideal spectrum library is used as the label for model training and compared with the set of restored Raman spectra, and the loss function is calculated:

$$L_1 = \mathrm{Loss}\left(y, \hat{y}_b\right)$$

$$L_2 = \mathrm{Loss}\left(y, \hat{y}_p\right) + \lambda\,\mathrm{Loss}\left(y, \hat{y}_b\right)$$

wherein $L_1$ is used for training the background signal evaluation module and $L_2$ is used for training the entire preprocessing model. $y$ and $\hat{y}$ are the ideal reference spectrum and the spectrum after preprocessing, respectively. The subscripts $b$ and $p$ correspond to the background signal evaluation module output and the preprocessing model output. In the first step of the training process, $L_1$ trains only the background signal evaluation module. In the second step, $L_2$ trains the entire preprocessing model.

$\mathrm{Loss}$ may be calculated as the root mean square error or the mean absolute error. In the second step, a weight coefficient $\lambda$ is added to balance the feedback of the two.

The root mean square error (RMSE) adopts the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

The mean absolute error (MAE) adopts the formula:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $n$ is the number of points in the spectrum;

b5. the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule of derivatives;

specifically, in the first step:

the noise signal can be extracted using a wavelet transform method, a Fourier transform method, a sliding window method or the like, described as follows:

the wavelet transform is a time-frequency localized signal processing method. It suits signals that vary slowly at low frequencies and rapidly at high frequencies, and is well suited to local transient events. In a Raman spectrum, noise corresponds to the high-frequency signal while the baseline and Raman peaks correspond to lower-frequency signals, so the wavelet transform can effectively extract the noise signal. Specifically, the spectrum signal is decomposed into wavelet coefficients, the coefficients dominated by the noise signal are retained, and the noise signal is obtained by reconstruction from the retained coefficients;
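The decompose, retain and reconstruct sequence can be illustrated with a single-level Haar transform. The choice of the Haar wavelet, the single decomposition level and the function names are assumptions of this sketch; the source does not name a wavelet family, and the sketch requires an even number of points.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal):
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def wavelet_noise(signal):
    """Keep only the high-frequency detail coefficients and reconstruct from them."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct([0.0] * len(approx), detail)


spectrum = [4.0, 2.0, 6.0, 6.0]
noise = wavelet_noise(spectrum)                       # high-frequency part
smooth = [s - n for s, n in zip(spectrum, noise)]     # what remains after removal
```

A practical pipeline would typically use several decomposition levels and a smoother wavelet, but the retained-coefficient logic is the same.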

the Fourier transform converts a signal from the time domain into the frequency domain. Under the Fourier transform, noise can likewise be regarded as a high-frequency signal. Specifically, a discrete Fourier transform is applied to the spectrum, the high-frequency components are extracted in the Fourier domain, and the noise signal in the spectrum is obtained from them;
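A minimal sketch of this high-frequency extraction, using a direct discrete Fourier transform from the Python standard library, is shown below. The cutoff index, the synthetic two-component test signal and the function names are assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse transform; the real part is returned for real-valued spectra."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def high_frequency_part(signal, cutoff):
    """Zero every bin whose frequency index is below cutoff, then invert."""
    n = len(signal)
    coeffs = dft(signal)
    # Bins k and n - k carry the same frequency, so both are tested together.
    kept = [c if min(k, n - k) >= cutoff else 0j for k, c in enumerate(coeffs)]
    return idft(kept)


n = 16
slow = [math.cos(2 * math.pi * t / n) for t in range(n)]            # stands in for signal
fast = [0.1 * math.cos(2 * math.pi * 6 * t / n) for t in range(n)]  # stands in for noise
spectrum = [s + f for s, f in zip(slow, fast)]
noise = high_frequency_part(spectrum, cutoff=4)
```

Real noise is broadband rather than a single tone, so the cutoff in practice trades off retained peak detail against residual noise.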

the sliding window algorithm performs a required operation on the array elements inside a window of given size. For denoising, the current value is replaced with a statistic (such as the mean or median) of the data in the window; to obtain the noise, the smoothed spectrum is subtracted from the original spectrum;
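The sliding window variant can be sketched in a few lines. The mean is used as the window statistic and the window is shortened at the edges; both choices, like the function names, are assumptions of this sketch.

```python
def moving_average(signal, window):
    """Replace each point by the mean of a centered window (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def extract_noise(signal, window=3):
    """Noise estimate: original spectrum minus its smoothed version."""
    smoothed = moving_average(signal, window)
    return [s - m for s, m in zip(signal, smoothed)]


spectrum = [1.0, 1.0, 4.0, 1.0, 1.0]
smoothed = moving_average(spectrum, window=3)
noise = extract_noise(spectrum, window=3)
```

Note that a narrow Raman peak is also flattened by the window, so part of it ends up in the noise estimate; the window size controls this trade-off.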

the Raman characteristic peaks of the Raman spectrum are extracted using a gradient descent method, a differential peak-searching and fitting method or the like, described as follows:

gradient descent is a commonly used method for solving unconstrained optimization problems. Put simply, the extreme values of a function are found using the direction and magnitude of the derivative; for Raman characteristic peak extraction, the extreme values (one Raman peak and its two peak edges) can be used to separate the Raman peaks;

the differential method computes local derivatives of the spectrum. In a spectrum, the derivative is small in regions where the rate of change is small and large near a Raman peak. Individual Raman peaks can therefore be separated one by one according to the derivative values;
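A simple sketch of derivative-based peak separation follows. It locates a peak where the first difference changes sign and finds the two peak edges by walking downhill to the nearest local minima. The function names, the `min_height` parameter and the toy spectrum are assumptions; a real spectrum would be smoothed first.

```python
def first_difference(spectrum):
    """Discrete derivative between neighboring points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def find_peaks(spectrum, min_height=0.0):
    """Indices where the slope changes from positive to non-positive."""
    diff = first_difference(spectrum)
    peaks = []
    for i in range(1, len(diff)):
        if diff[i - 1] > 0 and diff[i] <= 0 and spectrum[i] >= min_height:
            peaks.append(i)
    return peaks


def peak_bounds(spectrum, peak):
    """Walk downhill from the peak to the nearest local minimum on each side."""
    left = peak
    while left > 0 and spectrum[left - 1] < spectrum[left]:
        left -= 1
    right = peak
    while right < len(spectrum) - 1 and spectrum[right + 1] < spectrum[right]:
        right += 1
    return left, right


spectrum = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0]
peaks = find_peaks(spectrum)
segments = [peak_bounds(spectrum, p) for p in peaks]
```

Each `(left, right)` pair delimits one peak segment that could be stored as an entry of the Raman peak library.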

extracting the baseline background signal by using a wavelet transform method, a Fourier transform method or a least square method and the like; the description is as follows:

The wavelet transform is a time-frequency localized signal processing method. It suits signals that vary slowly at low frequencies and more quickly at high frequencies, and it is well suited to local transient events. In a Raman spectrum, noise corresponds to the high-frequency component, the baseline to the low-frequency component and the Raman peaks to the intermediate-frequency component, so the baseline signal can be effectively extracted with a wavelet transform. Specifically, the spectrum is decomposed into wavelet coefficients, the coefficients dominated by the baseline signal are retained, and the baseline background signal is reconstructed from the retained coefficients;

The least squares method is a mathematical optimization technique. It finds the best-fitting function for the data by minimizing the sum of squared errors (the differences between the true values and the fitted values). In a Raman spectrum, the overall trend line belongs to the baseline signal, so the baseline background signal can be obtained by fitting the spectral trend with the least squares method;
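As a sketch of the trend fit, the following fits a low-order polynomial by solving the least-squares normal equations. The polynomial model, the default degree of 2 and the function names are assumptions; the text only states that the spectral trend is fitted by least squares.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]

def fit_baseline(shifts, spectrum, degree=2):
    """Evaluate the fitted polynomial trend at every Raman shift."""
    coeffs = polyfit(shifts, spectrum, degree)
    return [sum(c * x ** k for k, c in enumerate(coeffs)) for x in shifts]
```

Raman shifts of hundreds to thousands of cm⁻¹ should be rescaled (for instance to the interval [0, 1]) before fitting, because the normal equations become poorly conditioned for large inputs. A plain fit is also pulled upward by strong peaks, so the fitted curve is only a rough trend estimate.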

The Fourier transform converts a signal from the time domain into the frequency domain. In the Fourier domain, the baseline can be regarded as an intermediate-frequency signal. Specifically, a discrete Fourier transform is applied to the spectrum, the intermediate-frequency components are extracted in the Fourier domain, and the baseline background signal is obtained from them;

In the second step, the Raman spectrum position encoder module divides the spectrum peaks by a method such as the sliding window method, the sine/cosine encoding method or the batch-like encoding method;

Specifically, the Raman spectrum position encoder module may encode the Raman shift by different methods, including but not limited to the sliding window method, the sine/cosine encoding method and the batch-like encoding method; each encoding method has a different emphasis.

The sliding window method splits the spectrum into multiple channels. Each segment undergoes its own feature extraction, and the weights can be retained in the joint computation. The sliding window encoder focuses on preserving the signal of weak characteristic peaks in the different Raman-shift segments of the spectrum.

An alternative scheme adopts the sine/cosine encoding method, which encodes the Raman shifts of the spectrum by assigning an encoding value to each Raman shift, so that a strong correspondence between shift and intensity is obtained. This method focuses on the periodic characteristics of the spectrum.

Another alternative scheme adopts the batch-like encoding method, which derives the encoding from the relationship among the analysed batches; for example, a fixed encoding value is assigned to all spectral intensities at the same Raman shift. This encoding reflects the strong correspondence among intensities at the same Raman shift and emphasizes the analysis of differences between spectra. The function of all these encoders is that, when a spectrum is input to the model, each intensity can be matched to its Raman shift more easily and automatically, which enhances the fitting and generalization capability of the model;
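The text does not give the sine/cosine encoding formula. As one plausible instance, the sketch below uses the sinusoidal positional encoding familiar from Transformer models, indexing each Raman shift by its position in the spectrum. The base of 10000, the encoding width `dim` and the function name are assumptions.

```python
import math

def sincos_encoding(num_points, dim):
    """Return a num_points x dim table; even columns use sine, odd columns cosine."""
    table = []
    for pos in range(num_points):
        row = []
        for k in range(dim):
            # Columns 2i and 2i+1 share the wavelength 10000 ** (2i / dim).
            angle = pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

Each row can then be added to, or concatenated with, the intensity at the corresponding Raman shift before the spectrum enters the model.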

The invention is the first to use a self-supervised algorithm for Raman spectrum preprocessing, which solves the problem that no labelled Raman spectrum data is available for training. Once training is finished, the model is simple and fast to use, and the results have high fidelity while noise and the baseline background signal are effectively removed. The method can be used for real-time spectral analysis and improves application accuracy. The application range includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and image applications such as improving the signal-to-noise ratio and resolution of spectral imaging.

FIG. 2 is a schematic diagram of the extraction of the noise information library, the baseline background signal library and the Raman peak library in the Raman spectrum generative adversarial network algorithm;

The specific implementation steps are as follows: the auxiliary-task part, the Raman spectrum generative adversarial network algorithm, performs characteristic peak searching and window segmentation on the input real Raman spectrum library (a) and extracts the corresponding features. The feature extraction methods include, but are not limited to, least squares, wavelet transform and Wiener transform. A noise information library (b), a baseline background signal library (c) and a Raman peak library (d) are then established;

FIG. 4 is a schematic diagram of a Raman spectrum position coding module (sliding window) in a Raman spectrum preprocessing model RSBPDL algorithm;

The specific implementation steps are as follows: the spectrum is divided into several regions using a sliding window strategy, and the regions are fed into the Raman spectrum preprocessing model RSBPDL for processing. The two key parameters of the sliding window strategy are the stride and the window size; the most appropriate values of both can be selected automatically from the output feedback during model training. The module mainly serves to split the spectrum, so that spectral characteristic peaks in different wavebands are extracted during preprocessing; after preprocessing, weak-signal Raman peaks can be restored better;
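The splitting step alone can be sketched as follows, with a fixed stride and window size. The function name is hypothetical, and the automatic selection of the two parameters during training is not modelled here.

```python
def split_into_windows(spectrum, size, stride):
    """Return the length-`size` segments that start every `stride` points."""
    return [spectrum[start:start + size]
            for start in range(0, len(spectrum) - size + 1, stride)]
```

For example, `split_into_windows(list(range(10)), size=4, stride=3)` returns `[[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]`. A stride smaller than the window size makes neighbouring segments overlap, so a peak lying on a segment boundary still appears whole in at least one segment.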

The following compares the technical parameters of the spectrum preprocessing effect of the invention with those of existing methods:

TABLE 1 comparison of Root Mean Square Error (RMSE) values of spectra after various pretreatment algorithms with ideal spectra

TABLE 2 comparison of infinity norm ($L_\infty$) values of spectra after various pretreatment algorithms with ideal spectra

Tables 1 and 2 compare the invention with existing methods on original Raman spectra of different signal-to-noise ratios: Table 1 (RMSE) shows the noise removal and background suppression capability, and Table 2 (infinity norm) shows the signal fidelity. The root mean square error (RMSE) measures the difference between two spectra; a smaller value indicates stronger noise removal and background suppression:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $N$ is the total number of points of the spectrum, $i$ denotes the $i$-th point of the spectrum, $y$ is the ideal reference spectrum and $\hat{y}$ is the spectrum after preprocessing. The infinity norm ($L_\infty$) measures the maximum difference between peaks in the spectra; a smaller value indicates higher fidelity of the spectrum:

$$L_\infty = \max_{1 \le i \le N}\left|y_i - \hat{y}_i\right|$$

where $N$ is the total number of points of the spectrum, $i$ denotes the $i$-th point of the spectrum, $y$ is the ideal reference spectrum and $\hat{y}$ is the spectrum after preprocessing. Judged by the RMSE and infinity norm ($L_\infty$) values, the RSBPDL of the invention preprocesses Raman spectra better than the other existing methods;
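The two metrics can be computed as in the sketch below, taking the RMSE as the square root of the mean squared point-wise difference between the ideal and the preprocessed spectrum, and the infinity norm as the largest absolute point-wise difference. The function and argument names are illustrative.

```python
import math

def rmse(ideal, processed):
    """Root mean square error between two spectra of equal length."""
    n = len(ideal)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ideal, processed)) / n)

def infinity_norm(ideal, processed):
    """Largest absolute point-wise difference between two spectra."""
    return max(abs(y - p) for y, p in zip(ideal, processed))
```

The RMSE averages the error over the whole spectrum, whereas the infinity norm is set by the single worst point, which is why the latter is sensitive to distortion of an individual peak.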

FIG. 6 compares the overlap between a single spectrum and the ideal spectrum after preprocessing by different algorithms (spectrum 4 in Tables 1 and 2);

(a) shows spectral data generated by the Raman spectrum generative adversarial network algorithm, where the original spectrum is the spectrum before preprocessing, containing noise and a baseline, and the ideal spectrum is the corresponding spectrum with the noise and baseline background signals removed. (b)-(f) compare the ideal spectrum with the spectra obtained by processing the original spectrum with five different preprocessing algorithms (polynomial fitting, wavelet transform, residualCNN, Unet-1D and RSBPDL), where RSBPDL is the model of the invention. The more closely the preprocessed spectrum resembles the ideal spectrum, the stronger the noise removal and background suppression capability of the algorithm and the higher the fidelity of the spectrum. From the results, the RSBPDL-processed spectrum overlaps most closely with the ideal spectrum, and its root mean square error (RMSE) and infinity norm ($L_\infty$) values are the smallest, which shows that the preprocessing effect of the invention on Raman spectra is better than that of the existing methods;

FIG. 7 shows that pre-processing Raman spectra according to the present invention improves the accuracy of disease diagnosis;

The diagnostic test included 28 cancer patients and 27 normal subjects. The data set was divided into a training set (22 cancer patients and 21 normal subjects) and a validation set (the remaining 6 of each group), and the cross-validation was performed 100 times. The diagnostic model is a residual neural network, and the data used are surface-enhanced Raman spectra collected from the subjects' serum. To improve the diagnostic accuracy, the acquired Raman spectra must be preprocessed to remove noise and baseline background signals.

The collected Raman spectra are preprocessed with the Raman spectrum preprocessing model (RSBPDL) and with four existing preprocessing algorithms (polynomial fitting, wavelet transform, residualCNN and Unet-1D); the resulting spectra are input into the disease diagnosis model for training, and the diagnostic accuracy is evaluated on the validation set. The receiver operating characteristic (ROC) curve is a tool used in clinical diagnostics to evaluate the specificity and sensitivity of a diagnostic method comprehensively. AUC is the area under the ROC curve; a higher AUC value indicates higher diagnostic accuracy. As the left graph shows, the ROC curve for diagnosis on Raman spectra preprocessed by RSBPDL lies at the top, meaning that its area under the curve is the largest. The right graph is an error bar graph in which the upper and lower horizontal lines mark the upper and lower limits of the error, the diamond markers mark the mean value, and each line represents one preprocessing method. The right graph shows that, among all the preprocessing algorithms, classification diagnosis after preprocessing with RSBPDL gives the largest AUC value and therefore the greatest improvement in diagnostic accuracy.
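For reference, the AUC can be computed directly from classifier scores without drawing the curve, using its equivalent definition as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half). The function name and the toy scores in the test are illustrative and are not data from the experiment.

```python
def auc(scores_positive, scores_negative):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

A value of 1.0 means the two groups are perfectly separated by the score, and 0.5 corresponds to chance-level ranking.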

FIG. 8 is a graph showing the resolution enhancement effect of the present invention on Raman spectroscopy cell imaging;

The left image is the original Raman image acquired directly by a confocal Raman microscope, formed from the 2930 cm⁻¹ band in the Raman spectrum of each selected pixel; its signal-to-noise ratio and resolution are low. The right image is formed from the acquired Raman spectra after preprocessing by the RSBPDL algorithm of the invention.

By comparison, the quality of the right image is greatly improved and more cellular detail is visible, so the signal-to-noise ratio and the resolution are higher.

Image quality can be evaluated by calculating the entropy of the two images; the larger the entropy, the more information the image contains.

Entropy value (Entropy) calculation formula:

$$H = -\sum_{i} P_i \log P_i$$

where $P_i$ is the probability of pixel value $i$. The entropy of the left image is 2.43 and that of the right image is 3.09, which shows that the Raman image formed from RSBPDL-preprocessed spectra contains more detailed information. RSBPDL preprocessing can therefore effectively improve the quality of the Raman image;
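A minimal sketch of an image entropy calculation follows, assuming the Shannon entropy of the pixel-value histogram with a base-2 logarithm. The logarithm base, the histogram interpretation and the function name are assumptions, so the sketch is not guaranteed to reproduce the values 2.43 and 3.09 quoted above.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of the value histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

An image in which every pixel has the same value has zero entropy, and an image whose pixels are spread evenly over four values has an entropy of 2 bits.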

Example two

This embodiment is substantially the same as the first embodiment, and the common parts are not described again. The differences, as shown in FIG. 9, are that the second step adopts the following method:

the Raman spectrum position encoder module is used for encoding the intensity and the Raman shift of the spectrum, so that the Raman shift corresponding to each intensity can be more easily and automatically associated during model training, and the fitting and generalization capability of the model is enhanced;

a Raman spectrum background signal evaluation module is adopted to preliminarily fit the signals to be removed, so that the subsequent fitting difficulty is reduced;

the ideal spectrum library is used as a label for model training and is compared with the restored Raman spectrum set to calculate a loss value;

the loss value is fed back to the Raman spectrum preprocessing model, and the model parameters are updated through the chain rule;

As shown in FIG. 9, this scheme differs from the first embodiment in that, in the second step, the Raman spectrum background signal evaluation module is placed later in the model. Spectrum preprocessing can still be realized, but the effect is reduced, for the following reason: when the Raman spectrum background signal evaluation module is placed later, its features cannot be extracted by the encoder, and its output can only be processed by the decoder. The trainability of the Raman spectrum background signal evaluation module is therefore reduced, and the noise remaining after preprocessing is larger than in the first embodiment.

Example three

This embodiment is substantially the same as the first embodiment, and the common parts are not described again. The differences, as shown in FIG. 10, are that the second step adopts the following method:

the ideal spectrum library is used as a label for model training and is compared with the restored Raman spectrum set to calculate the loss function;

As shown in FIG. 10, this scheme differs from the first embodiment in that, in the second step, only the encoder module and the decoder module are used to extract the spectral features and remove the baseline and noise. Because the spectral background signal evaluation module is removed, the effect is reduced, although spectrum preprocessing can still be realized.

Example four

Example five

A method for applying a Raman spectrum preprocessing model, as shown in FIG. 11, applies the Raman spectrum preprocessing model described above and is implemented as follows:

inputting the original Raman/surface-enhanced Raman spectrum to be processed into the Raman spectrum preprocessing model for processing, and outputting a processed Raman/surface-enhanced Raman spectrum with the noise signal and the baseline background signal removed;

the method can be directly used for removing the noise and the baseline background signal of any Raman spectrum to be processed. The output processed spectrum can be directly used for downstream application, and the performance of the application is improved. Applications include, but are not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and applications in the field of images such as spectral imaging resolution enhancement.

Example six

A Raman spectrum preprocessing model generation system, as shown in FIG. 12, for implementing the Raman spectrum preprocessing model generation method described above, includes a parameter setting training set unit 1 for executing the first step to generate a highly simulated Raman spectrum library and an ideal spectrum library, and a model training unit 2 for executing the second step to train and generate a parameter-optimized self-supervised Raman spectrum preprocessing model;

The invention is the first to use a self-supervised algorithm for Raman spectrum preprocessing, which solves the problem that no labelled Raman spectrum data is available for training. Once training is finished, the model is simple and fast to use, and the results have high fidelity while noise and the baseline background signal are effectively removed. The method can be used for real-time spectral analysis and improves application accuracy. The application range includes, but is not limited to, classification applications such as substance identification and disease diagnosis, quantitative applications such as substance concentration/content detection, and image applications such as improving the signal-to-noise ratio and resolution of spectral imaging.

Example seven

Example eight

The computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium such as a ROM/RAM, a magnetic disk, an optical disk or a flash memory.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A Raman spectrum preprocessing model generation method is characterized by comprising the following steps:

2. The Raman spectrum preprocessing model generation method of claim 1, wherein subjecting the real Raman/surface-enhanced Raman spectrum library to Raman spectrum generative adversarial network processing comprises:

3. The method for generating a Raman spectrum preprocessing model according to claim 2, wherein preparing a training data set and generating an ideal spectrum library and a reference spectrum library comprises:

collecting real Raman spectrum/surface enhanced Raman spectrum data;

4. The method for generating a Raman spectrum preprocessing model according to claim 2, wherein training the Raman spectrum generative adversarial network and finally generating a highly simulated Raman spectrum library comprises the steps of:

training the generator and the discriminator alternately, wherein the generator finally generates a high-simulation Raman spectrum library close to the reference spectrum.

5. The method for generating a Raman spectrum preprocessing model according to claim 4, wherein the high-simulation Raman spectrum is generated using a formula [formula image not reproduced in this text], in which the generator output is the generated simulated Raman spectrum, and the two discriminator outputs respectively represent the probability that an input simulated Raman spectrum is judged false or true and the probability that an input reference spectrum is judged true or false;

the formula includes two processes [symbols not reproduced in this text]:

the first represents training the discriminator, in which the simulated Raman spectrum is judged false and the reference spectrum is judged true, so that the discriminator output for the simulated spectrum is close to 0 and, in turn, the corresponding term of the formula is close to 0, which maximizes the accuracy of the discriminator;

the second represents training the generator, in which the simulated Raman spectrum is judged true, so that the discriminator output for the simulated spectrum is close to 1 and, in turn, the corresponding term of the formula is close to 1;

6. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein the highly simulated Raman spectrum library is input into the Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated against the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model, and the following method for automatically optimizing parameters is adopted:

7. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein the highly simulated Raman spectrum library is input into the Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated against the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model, and the following method for automatically optimizing parameters is adopted:

8. The method for generating a Raman spectrum preprocessing model according to claim 1, wherein the highly simulated Raman spectrum library is input into the Raman spectrum preprocessing model for training; the ideal spectrum library is used as the label of the Raman spectrum preprocessing model, a loss value is calculated against the preprocessed Raman spectrum and fed back to the Raman spectrum preprocessing model, and the following method for automatically optimizing parameters is adopted:

9. A Raman spectrum preprocessing model generation method according to any one of claims 6-8, wherein the Raman spectrum position encoder module adopts a method of a sliding window method, a sine/cosine encoding method or a batch-like encoding method;

10. The method of generating a Raman spectrum preprocessing model according to any one of claims 6 to 8, wherein the loss function employs two formulas [formula images not reproduced in this text], the first loss being used to train the background signal evaluation module and the second loss being used to train the entire preprocessing model;

in the first step, the first loss is used to train the background signal evaluation module;

in the second step, the second loss is used to train the Raman spectrum preprocessing model;

a balancing term [symbol not reproduced in this text] balances the feedback of the two;

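the root mean square error RMSE adopts the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2};$$

the mean absolute error MAE adopts the formula:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$

where $N$ is the total number of points of the spectrum, $y_i$ is the ideal (label) spectrum at the $i$-th point and $\hat{y}_i$ is the preprocessed spectrum at the $i$-th point.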

11. A Raman spectrum preprocessing model generated by the method for generating a Raman spectrum preprocessing model according to any one of claims 1 to 10.

12. A method for applying the Raman spectrum preprocessing model according to claim 11, wherein the method comprises the following steps:

13. A Raman spectrum preprocessing model generation system, which is applied to the Raman spectrum preprocessing model generation method according to any one of claims 1 to 10, comprising: a parameter setting training set unit for executing the first step and a model training unit for executing the second step.

14. A Raman spectrum preprocessing apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 10 when executing the computer program.

15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.