CN114781484A - Cancer serum SERS spectrum classification method based on convolutional neural network - Google Patents

Publication number
CN114781484A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
cancer
sers spectrum
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210275709.7A
Other languages
Chinese (zh)
Inventor
李睿
宋泽江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Langrui Ningbo Technology Co ltd
Dalian University of Technology
Original Assignee
Langrui Ningbo Technology Co ltd
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langrui Ningbo Technology Co ltd, Dalian University of Technology
Priority to CN202210275709.7A
Publication of CN114781484A
Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the technical field of cancer screening and provides a cancer serum SERS spectrum classification method based on a convolutional neural network, comprising the following steps: step 100, acquiring a data set of cancer serum SERS spectra and dividing it into a training set, a validation set and a test set; step 200, building a convolutional neural network structure for cancer serum SERS spectrum classification, the structure comprising an input layer, a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer, a first fully connected layer, a second fully connected layer and a normalized exponential function (SoftMax) output layer; step 300, training the constructed convolutional neural network structure to obtain a cancer serum SERS spectrum classification model; and step 400, classifying cancer serum SERS spectra with the cancer serum SERS spectrum classification model. The invention realizes SERS spectrum classification and improves classification efficiency and effect.

Description

Cancer serum SERS spectrum classification method based on convolutional neural network
Technical Field
The invention relates to the technical field of cancer screening, in particular to a cancer serum SERS spectrum classification method based on a convolutional neural network.
Background
Cancer is a malignant disease that threatens human survival and places a heavy economic burden on society. Global cancer data published by the World Health Organization in 2020 show that breast cancer has the highest incidence worldwide and that lung cancer is the leading cause of cancer death. In recent years, growing awareness of physical examinations has widened the time window for early diagnosis, and early screening and early diagnosis are now central to cancer treatment. Chest CT is the most common method for detecting early lung cancer, early screening of breast cancer relies mainly on B-mode ultrasound, and cancer marker assays such as fluorescence immunoassay can also be used to judge whether a person has cancer. Although many cancer detection methods exist, each cancer requires its own screening method, so a physical examination costs people both considerable time and high fees. It is therefore necessary to find a universal, low-cost and high-sensitivity screening method.
Surface-enhanced Raman scattering (SERS) uses the local electromagnetic field formed at rough noble-metal surfaces to enhance the Raman signal; it can raise the intensity of ordinary Raman scattering by several orders of magnitude, even reaching the level of single-molecule detection. SERS offers high accuracy, fast detection, low cost and strong universality, and is widely applied in biomedicine and other fields. The types and amounts of substances in blood change as cells become cancerous. Human serum consists mainly of water, carbohydrates, proteins and other components, and the Raman signal of the other substances is much stronger than that of water. Clinical symptoms in early-stage cancer patients may not be obvious enough for a definite diagnosis, but features reflecting canceration can be found from spectral differences by measuring the SERS spectrum of serum, providing a reference for cancer diagnosis.
Once a large amount of human serum SERS spectral data has been obtained, judging from it whether a person has cancer, and which type, is also an active problem. Existing serum SERS spectrum classification methods include principal component analysis (PCA) and hierarchical clustering analysis (HCA), both of which can classify serum SERS spectra well. However, both methods are cumbersome to operate: characteristic peaks must be selected manually after comparing several serum SERS spectra, and the choice of peaks strongly affects the result, so the procedure depends on experience and the results are not uniform. A classification method that extracts spectral features automatically therefore needs to be developed.
Disclosure of Invention
The invention mainly solves the technical problems that the SERS spectrum classification operation in the prior art is complicated, the operation process depends on experience and the operation result is not uniform, and provides a cancer serum SERS spectrum classification method based on a convolutional neural network, so that the SERS spectrum classification is realized, and the classification efficiency and effect are improved.
The invention provides a cancer serum SERS spectrum classification method based on a convolutional neural network, which comprises the following processes of:
step 100, acquiring a data set of cancer serum SERS spectra, and dividing the data set into a training set, a validation set and a test set;
step 200, building a convolutional neural network structure for cancer serum SERS spectrum classification; wherein the convolutional neural network structure comprises: an input layer, a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer, a first fully connected layer, a second fully connected layer and a normalized exponential function output layer;
step 300, training the constructed convolutional neural network structure to obtain a cancer serum SERS spectrum classification model;
step 400, classifying cancer serum SERS spectra using the cancer serum SERS spectrum classification model.
Further, the first convolution layer and the second convolution layer respectively perform the following operations:
performing convolution processing on the input of the previous layer; the convolution processing formula is as follows:
$$(f * g)(n) = \sum_{i=1}^{N} f(i)\, g(n - i)$$
wherein (f * g) denotes the convolution of the input data f with the convolution kernel g, with g taken as zero outside its m entries; n indexes the n-th result of the convolution operation and ranges from 1 to the sum of the lengths of f and g; N is the length of the input data f; and m is the length of the convolution kernel g;
carrying out batch normalization processing on the local features to enable the feature output of each layer to be close to standard normal distribution; the working principle of batch normalization treatment is as shown in formulas (1) to (4):
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{1}$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{2}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{3}$$
$$y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i) \tag{4}$$
wherein $x_i$ represents the i-th sample of the input batch, $\hat{x}_i$ denotes the normalized sample, $\mu_B$ denotes the mean of the samples x over the batch, $\sigma_B$ represents the standard deviation of the samples x over the batch, m represents the size of the input batch, and $\epsilon$ is a small constant set to avoid a denominator of 0; $\mathrm{BN}_{\gamma,\beta}(x_i)$ denotes the batch normalization operation, $\gamma$ represents the standard deviation of the output result $y_i$, $\beta$ represents the mean of the output result $y_i$, $\gamma$ and $\beta$ are learnable parameters, and $y_i$ is the batch normalization result after translation and scaling;
activating with a LeakyReLU activation function and outputting a spectral feature matrix; wherein the LeakyReLU activation function is given by formula (5):
$$f(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases} \tag{5}$$
where x is the input to the neuron, f(x) represents the activation result, and a is a small constant set to avoid the vanishing gradient of the conventional ReLU activation function on the negative axis.
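The three operations of a convolution layer (convolution, batch normalization per formulas (1) to (4), and LeakyReLU per formula (5)) can be sketched in NumPy as below. This is a minimal illustration, not the patented implementation: the zero padding that keeps the output length equal to the input length, the choice to compute batch statistics per kernel over both batch and position, the slope a = 0.01 and the random inputs are all assumptions.

```python
import numpy as np

def conv1d_same(x, kernels):
    """1-D convolution. x: (batch, length); kernels: (n_kernels, k), k odd.
    Returns (batch, length, n_kernels); zero padding keeps the length (assumption)."""
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows[b, n] holds the k input values centred on position n
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    # the kernel is flipped so this is a true convolution, not a cross-correlation
    return windows @ kernels[:, ::-1].T

def batch_norm(x, gamma, beta, eps=1e-5):
    """Formulas (1)-(4); statistics taken per kernel over batch and position (assumption)."""
    mu = x.mean(axis=(0, 1), keepdims=True)      # (1) batch mean
    var = x.var(axis=(0, 1), keepdims=True)      # (2) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)        # (3) normalization
    return gamma * x_hat + beta                  # (4) scale and shift

def leaky_relu(x, a=0.01):
    """Formula (5); a = 0.01 is an assumed value."""
    return np.where(x > 0, x, a * x)

rng = np.random.default_rng(0)
spectra = rng.random((4, 1609))          # hypothetical batch of 4 spectra
kernels = rng.normal(size=(16, 3))       # 16 kernels of length 3
conv_out = conv1d_same(spectra, kernels)
features = leaky_relu(batch_norm(conv_out, gamma=1.0, beta=0.0))
print(features.shape)  # (4, 1609, 16)
```

With gamma = 1 and beta = 0 the normalized features have approximately zero mean and unit variance per kernel before activation, which is the behaviour the text describes as bringing each layer's output close to a standard normal distribution.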
Further, the normalized exponential function output layer uses the normalized exponential function as its activation function to obtain the probabilities of healthy people and of cancer patients with each cancer type, wherein the normalized exponential function is given by formula (6):
$$\mathrm{softmax}(x_j) = \frac{e^{x_j}}{\sum_{k=1}^{c} e^{x_k}} \tag{6}$$
where c is the total number of categories, j is the category index, and $x_j$ is the output of the j-th unit preceding the normalized exponential function.
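As a small worked example of formula (6), the sketch below converts three output values into class probabilities and picks the most probable class. The input values are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that is not stated in the source and does not change the result.

```python
import numpy as np

def softmax(x):
    """Normalized exponential function, formula (6), along the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)  # stability shift; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 0.5, -1.0])   # hypothetical outputs of the preceding layer, c = 3
probs = softmax(logits)
predicted = int(np.argmax(probs))     # index of the class with the largest probability
print(probs, predicted)
```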
Further, the cancer types comprise at least one of breast cancer and lung cancer.
Further, step 300 includes steps 301 to 302:
step 301, training the convolutional neural network structure using the training set;
step 302, optimizing the convolutional neural network with an Adam optimizer during training to obtain the cancer serum SERS spectrum classification model.
Further, after step 300, the method further includes:
step A1, validating the obtained cancer serum SERS spectrum classification model using the validation set.
Further, after step 300, the method further includes:
step A2, verifying the performance of the cancer serum SERS spectrum classification model by using a test set;
calculating the recognition accuracy (precision) for healthy people and each cancer type with the following formula, to measure the classification performance of the convolutional neural network:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$
calculating the recall rate for healthy people and each cancer type with the following formula, to measure the classification performance of the convolutional neural network:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$
calculating the F1 value for healthy people and each cancer type with the following formula, to measure the classification performance of the convolutional neural network:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10}$$
wherein TP, FN and FP represent the numbers of true positives, false negatives and false positives, respectively.
The invention provides a cancer serum SERS spectrum classification method based on a convolutional neural network, which classifies cancer serum SERS spectra using convolutional neural network technology. Because the experimentally measured SERS spectrum is one-dimensional data, the two-dimensional operations commonly used in conventional neural networks cannot be applied; the method therefore replaces the operations in the network with one-dimensional operations, obtaining an effect similar to the two-dimensional operations while suiting one-dimensional SERS spectral data. Compared with traditional SERS spectrum classification methods, the convolution operation in the convolutional neural network extracts the information around each data point and so extracts the peak features of a spectrum effectively. The deep learning technique also handles the preprocessing of the spectral data, so SERS spectra can be classified directly. The complete spectral data are used, reducing the loss of spectral features caused by manual selection. The method overcomes the experience dependence and uncertainty of manual feature selection, provides a reference for early cancer screening, and has prospects for further development and application.
Drawings
FIG. 1 is a flowchart of an implementation of the SERS spectrum classification method for cancer serum based on convolutional neural network provided in the present invention;
FIG. 2 is a block diagram of a convolutional neural network employed by the present invention;
FIG. 3 is a comparison graph of spectra before and after baseline removal of SERS spectral data;
FIG. 4 is a graph illustrating a trend of a loss function;
FIG. 5 is a graph illustrating the trend of the accuracy.
Detailed Description
In order to make the technical problems solved, the technical solutions adopted and the technical effects achieved by the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant elements of the present invention are shown in the drawings.
As shown in fig. 1, the method for SERS spectrum classification of cancer serum based on convolutional neural network according to the embodiment of the present invention includes:
Step 100, acquiring a data set of cancer serum SERS spectra, and dividing the data set into a training set, a validation set and a test set.
Step 200, building a convolutional neural network structure for cancer serum SERS spectrum classification.
The Convolutional Neural Network (CNN) is an algorithm for deep learning, can automatically extract features of input data through convolution and nonlinear transformation operations, has excellent feature learning capability, and can be used for fitting complex models. The CNN combines preprocessing, feature extraction, and classification in the same framework, allowing end-to-end training without human adjustment.
The convolutional neural network structure of the present invention comprises: an input layer, a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer, a first fully connected layer, a second fully connected layer and a normalized exponential function output layer. It should be noted that more convolution layers, max pooling layers and fully connected layers may be provided, as determined by the actual situation.
The convolution layers and pooling layers are arranged alternately: the convolution layers extract data features with convolution kernels, the activation function introduces nonlinearity, and the pooling layers reduce the dimension and the number of parameters. The fully connected layers act as a classifier that identifies different substances from the features extracted by the convolution layers.
The method combines the characteristics of serum SERS spectral data and builds a convolutional neural network structure based on the idea of the AlexNet model; the network structure is shown in FIG. 2. Since the input Raman spectrum is one-dimensional data, one-dimensional convolution kernels of size 3 × 1 are used in the convolution layers. To improve the generalization ability of the model and prevent overfitting, a batch normalization (BN) layer is added between the convolution operation and the activation function, and dropout is added between the fully connected layers. The network is implemented as follows, with the parameters shown in Table 1:
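TABLE 1 convolutional neural network parameters
Layer | Kernel size or neurons | Number of kernels | Stride | Output size
Input layer | - | - | - | 1609 × 1
First convolution layer | 3 × 1 | 16 | 1 | 1609 × 16
First max pooling layer | 4 × 1 | - | 4 | 402 × 16
Second convolution layer | 3 × 1 | 32 | 1 | 402 × 32
Second max pooling layer | 2 × 1 | - | 2 | 201 × 32
First fully connected layer | 512 neurons | - | - | 512
Second fully connected layer | 256 neurons | - | - | 256
Normalized exponential function output layer | 3 neurons | - | - | 3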
(1) Input layer: the SERS spectral data after dispersion (min-max) normalization are used as the input. The entire spectrum, i.e., the intensity sampled at fixed wavenumber intervals, is input, and the input layer dimensions written in two-dimensional form are 1609 × 1.
(2) First convolution layer: 16 convolution kernels of size 3 × 1, with a stride of 1.
1) Performing convolution processing on the input of the previous layer;
the first convolution layer performs one-dimensional convolution on the output of the input layer according to the following formula:
$$(f * g)(n) = \sum_{i=1}^{N} f(i)\, g(n - i)$$
wherein (f * g) denotes the convolution of the input data f with the convolution kernel g, with g taken as zero outside its m entries; n indexes the n-th result of the convolution operation and ranges from 1 to the sum of the lengths of f and g; N is the length of the input data f; and m is the length of the convolution kernel g. The convolution operation extracts information from the m data points around each data point of the input data f, and the larger the fluctuation of the data, the larger the result, so peaks with large changes, which are the characteristic features of the SERS spectrum, are extracted effectively.
2) Carrying out batch normalization on the local features so that the feature output of each layer is close to a standard normal distribution; batch normalization works as shown in formulas (1) to (4):
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{1}$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{2}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{3}$$
$$y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i) \tag{4}$$
wherein $x_i$ represents the i-th sample of the input batch, $\hat{x}_i$ denotes the normalized sample, $\mu_B$ denotes the mean of the samples x over the batch, $\sigma_B$ represents the standard deviation of the samples x over the batch, m represents the input batch size, and $\epsilon$ is a small constant set to avoid a denominator of 0; $\mathrm{BN}_{\gamma,\beta}(x_i)$ represents the batch normalization operation; $\gamma$ represents the standard deviation of the output result $y_i$ and is also the scaling applied to the normalized result $\hat{x}_i$; $\beta$ represents the mean of the output result $y_i$ and is also the translation applied to the normalized result $\hat{x}_i$. $\gamma$ and $\beta$ are learnable parameters, and $y_i$ is the batch normalization result after translation and scaling.
Formulas (1) and (2) compute the mean and standard deviation of the input feature values, formula (3) normalizes the samples, and formula (4) translates and scales the normalized data.
3) Finally, activating with the LeakyReLU activation function and outputting a 1609 × 16 spectral feature matrix.
The LeakyReLU activation function accelerates model convergence and reduces the number of silent neurons; it is given by formula (5):
$$f(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases} \tag{5}$$
where x is the input to the neuron, f(x) represents the activation result, and a is a small constant set to avoid the vanishing gradient of the conventional ReLU activation function on the negative axis.
(3) First pooling layer: a max pooling layer with a pooling kernel of size 4 × 1 and a stride of 4; pooling the feature matrix output by the first convolution layer gives an output matrix of size 402 × 16.
(4) Second convolution layer: 32 convolution kernels of size 3 × 1 with a stride of 1. The output of the first pooling layer undergoes the same processing as in the first convolution layer (convolution, batch normalization and LeakyReLU activation), and the output matrix size is 402 × 32.
(5) Second pooling layer: a max pooling layer with a pooling kernel of size 2 × 1 and a stride of 2; pooling the output of the second convolution layer gives a 201 × 32 matrix.
(6) First fully connected layer: 512 neurons with the LeakyReLU activation function. The spectral feature matrix output by the second pooling layer is flattened into a vector of length 6432 and passed to the 512 fully connected neurons, which output a feature vector of length 512. To prevent the model from overfitting, a dropout layer with parameter 0.5 is set, so that half of the neurons are randomly and temporarily deactivated in each iteration.
(7) Second fully connected layer: 256 neurons with the LeakyReLU activation function, outputting a feature vector of length 256; dropout is again used to prevent overfitting.
(8) Normalized exponential function output layer: because serum SERS spectra are to be divided into three classes (healthy people, breast cancer patients and lung cancer patients), the number of neurons in this layer is set to 3. The normalized exponential (SoftMax) function is used as the activation function to obtain the probabilities of healthy people and of cancer patients with each cancer type, and the prediction of the neural network is the category with the largest probability. The normalized exponential function is given by formula (6):
$$\mathrm{softmax}(x_j) = \frac{e^{x_j}}{\sum_{k=1}^{c} e^{x_k}} \tag{6}$$
wherein c is the total number of categories, j is the category index, and $x_j$ is the output of the j-th unit preceding the normalized exponential function.
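The output sizes quoted in items (1) to (8) can be checked with the usual output-length arithmetic for one-dimensional convolution and pooling, sketched below. The helper names are hypothetical, and the padding of 1 in the convolution layers is an assumption inferred from the fact that the stated convolution outputs keep the input length (1609 and 402).

```python
def conv1d_length(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_length(length, kernel, stride):
    """Output length of a 1-D max pooling layer without padding."""
    return (length - kernel) // stride + 1

shapes = []
n = 1609
shapes.append(("input", (n, 1)))
n = conv1d_length(n, kernel=3, stride=1, padding=1)   # padding=1 is an assumption
shapes.append(("conv1", (n, 16)))
n = pool1d_length(n, kernel=4, stride=4)
shapes.append(("pool1", (n, 16)))
n = conv1d_length(n, kernel=3, stride=1, padding=1)   # padding=1 is an assumption
shapes.append(("conv2", (n, 32)))
n = pool1d_length(n, kernel=2, stride=2)
shapes.append(("pool2", (n, 32)))
flat = n * 32                                         # flattened input to the first fully connected layer
shapes += [("flatten", (flat,)), ("fc1", (512,)), ("fc2", (256,)), ("softmax", (3,))]

for name, shape in shapes:
    print(f"{name:8s} {shape}")
```

The trace reproduces the sizes given in the text: 402 × 16 after the first pooling layer, 201 × 32 after the second, and a flattened vector of length 6432.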
Step 300, training the constructed convolutional neural network structure to obtain a cancer serum SERS spectrum classification model. Step 300 includes steps 301 to 302:
Step 301, training the convolutional neural network structure using the training set.
After the convolutional neural network aiming at serum SERS spectral information is built, training is carried out on the convolutional neural network by using a training set, and each parameter of the convolutional neural network structure is updated in each round of training. In this embodiment, the number of data in the training set is 700.
During training, after each round, the loss value is calculated with the multi-class cross-entropy loss function to estimate the difference between the network's predicted value and the true value; the multi-class cross-entropy loss function is given by formula (7):
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{M} y_{ik}\,\log p_{ik} \tag{7}$$
In the formula, N is the total number of samples; M is the number of categories; $y_{ik}$ is an indicator variable equal to 1 if class k is the class of sample i and 0 otherwise; $p_{ik}$ represents the predicted probability that sample i belongs to class k. The smaller the cross-entropy loss, the closer the predicted value is to the true value and the better the classification effect of the model. Parameters of the neural network structure may be adjusted with reference to the loss values.
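A minimal NumPy sketch of the multi-class cross-entropy loss of formula (7) follows. Because the indicator variable is 1 only for the true class, the double sum reduces to averaging the negative log-probability assigned to each sample's true class. The probabilities and labels are hypothetical, and the small constant added inside the logarithm is an assumption to guard against log(0).

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Formula (7). probs: (N, M) predicted probabilities; labels: (N,) integer classes."""
    n = probs.shape[0]
    picked = probs[np.arange(n), labels]      # p_ik for the class k where y_ik = 1
    return -np.mean(np.log(picked + eps))     # eps guards against log(0) (assumption)

# Hypothetical predictions for 3 samples over the 3 classes
probs = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.80, 0.10],
                  [0.20, 0.20, 0.60]])
labels = np.array([0, 1, 2])
loss = cross_entropy(probs, labels)
print(loss)
```

A confident correct prediction contributes a value near 0, while a uniform prediction over three classes contributes ln 3, which is why a falling loss indicates that predictions are moving toward the true labels.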
Step 302, optimizing the convolutional neural network with an Adam optimizer during training to obtain the cancer serum SERS spectrum classification model.
The learning rate in this step is set to $10^{-5}$. The Adam optimizer is an optimization algorithm that improves on gradient descent: once the current network loss value is obtained, it updates the weights of the network's neurons by back-propagating the loss, reducing the loss value of the network. Through continued iteration the loss approaches its minimum, so that the network's predictions approach the true, manually judged results. Compared with standard gradient descent, the Adam optimizer is more efficient and alleviates gradient oscillation.
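For reference, a single Adam update can be sketched as below. Only the learning rate of 10^-5 comes from the text; the decay rates beta1 = 0.9 and beta2 = 0.999 and the constant eps = 1e-8 are the commonly used defaults and are assumptions here, as is the toy quadratic loss used to produce a gradient.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running averages of the gradient and its square."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy example: minimise (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 4):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```

Because the update divides the averaged gradient by the square root of the averaged squared gradient, each early step has a size close to the learning rate regardless of the gradient's scale, which damps the oscillation that plain gradient descent can show.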
Specifically, in the present embodiment, the training of the model is started after the input batch size is set to 16 and the number of iterations is set to 150. The trend of the loss function of the training set and the validation set during the training process is shown in fig. 4, and the accuracy rate trend graph is shown in fig. 5. It can be found that the loss value is in a fast-decreasing state all the time in the first 20 times of training, and is kept around the 0 value all the time after the iteration is performed to 100 times, which indicates that the model reaches convergence. The accuracy of the training set was 99.43%.
Step 400, classifying cancer serum SERS spectra using the cancer serum SERS spectrum classification model.
The collected SERS spectral data are preprocessed by straight-line baseline removal. As shown in FIG. 3, the points on the spectrum at 725 cm⁻¹ and 1825 cm⁻¹ are selected, and the straight line through them is used as the baseline for the removal operation.
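The straight-line baseline removal described above, together with the dispersion (min-max) normalization applied before the input layer, can be sketched as follows. This is an illustration under stated assumptions: the synthetic spectrum, the wavenumber axis spanning exactly 725 to 1825 cm⁻¹ with 1609 points, and applying normalization after baseline removal are all hypothetical choices, not details confirmed by the source.

```python
import numpy as np

def remove_linear_baseline(wavenumbers, intensity, lo=725.0, hi=1825.0):
    """Subtract the straight line through the spectrum points nearest lo and hi."""
    i_lo = np.argmin(np.abs(wavenumbers - lo))
    i_hi = np.argmin(np.abs(wavenumbers - hi))
    x0, x1 = wavenumbers[i_lo], wavenumbers[i_hi]
    y0, y1 = intensity[i_lo], intensity[i_hi]
    baseline = y0 + (y1 - y0) * (wavenumbers - x0) / (x1 - x0)
    return intensity - baseline

def min_max_normalize(intensity):
    """Dispersion (min-max) normalization to the range [0, 1]."""
    lo, hi = intensity.min(), intensity.max()
    return (intensity - lo) / (hi - lo)

# Synthetic spectrum (hypothetical): one Gaussian peak on a sloping background
wavenumbers = np.linspace(725.0, 1825.0, 1609)
peak = np.exp(-0.5 * ((wavenumbers - 1200.0) / 20.0) ** 2)
raw = peak + 0.002 * wavenumbers + 5.0
corrected = remove_linear_baseline(wavenumbers, raw)
normalized = min_max_normalize(corrected)
```

After the subtraction the two anchor points sit at zero and the sloping background is gone, leaving the peak itself as the dominant feature.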
In this step, the serum SERS spectral data to be classified are collected and input into the neural network with the trained weights for classification.
The cancer types comprise at least one of breast cancer and lung cancer. This embodiment takes healthy people, breast cancer and lung cancer as examples; the cancer serum SERS classification method of this embodiment is also applicable to classifying other types of cancer.
After obtaining the cancer serum SERS spectrum classification model (after step 300), the present embodiment may use step a1 to validate the model, and may use step a2 to evaluate the performance of the model.
Step A1, validating the obtained cancer serum SERS spectrum classification model using the validation set.
During training, the validation set data are fed into the neural network for prediction, and the validation results are compared with the training results to observe whether the network is overfitting, i.e., whether the prediction accuracy is high on the training set but low on other data.
In this embodiment, the validation set contains 100 samples. The accuracy on the validation set was 100%.
Step A2, evaluating the performance of the cancer serum SERS spectrum classification model using the test set.
The recognition accuracy (precision) for healthy people and the different cancer types (breast cancer and lung cancer) is calculated with the following formula, to measure the classification performance of the convolutional neural network:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$
The recall rate for healthy people and the different cancer types (breast cancer and lung cancer) is calculated with the following formula, to measure the classification performance of the convolutional neural network:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$
The F1 value for healthy people and the different cancer types (breast cancer and lung cancer) is calculated with the following formula, to measure the classification performance of the convolutional neural network:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10}$$
wherein TP, FN and FP represent the numbers of true positives, false negatives and false positives, respectively.
The invention is illustrated below by way of example:
In the experiment, 136 human serum SERS spectra were collected as samples. However, a CNN needs a large amount of data for training. To reduce the required sample size and improve the generalization ability and robustness of the model, the invention adopts data augmentation: spectra are randomly shifted up and down, Gaussian white noise is added, and linear combinations of all spectra belonging to the same class of people are used as augmented data. This enlarges the data available for CNN training to 1000 groups, which is sufficient for the three-class classification of spectral information.
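The three augmentation operations named above (vertical shift, Gaussian white noise, and linear combination of same-class spectra) might be combined as in the sketch below. The source does not give the shift range, the noise level or how the combination weights are chosen, so the parameter values and the use of random non-negative weights that sum to 1 are assumptions for illustration only.

```python
import numpy as np

def augment(spectra, n_new, rng, shift_scale=0.05, noise_std=0.01):
    """Generate n_new spectra from same-class spectra of shape (n, length).
    shift_scale and noise_std are hypothetical values."""
    n = spectra.shape[0]
    # random weights over all spectra of the class; each row sums to 1 (assumption)
    weights = rng.dirichlet(np.ones(n), size=n_new)
    mixed = weights @ spectra                                   # linear combination
    shift = rng.uniform(-shift_scale, shift_scale, (n_new, 1))  # one vertical offset per spectrum
    noise = rng.normal(0.0, noise_std, mixed.shape)             # Gaussian white noise
    return mixed + shift + noise

rng = np.random.default_rng(0)
healthy = rng.random((5, 1609))          # hypothetical measured spectra of one class
augmented = augment(healthy, 20, rng)
print(augmented.shape)  # (20, 1609)
```

Augmentation is applied per class so that every generated spectrum keeps the label of the spectra it was built from.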
To avoid an uneven distribution of samples and to better verify the generalization ability of the model, the experiment uses uniform random sampling to divide the 1000 groups of data in a 7:1:2 ratio. The training set therefore contains 700 samples (189 healthy people, 301 breast cancer and 210 lung cancer), the validation set contains 100 samples (27 healthy people, 43 breast cancer and 30 lung cancer), and the test set contains 200 samples (54 healthy people, 86 breast cancer and 60 lung cancer). The training set, the validation set and the test set each contain 3 categories: healthy people are labeled '0', breast cancer is labeled '1', and lung cancer is labeled '2'.
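
The reported per-class counts are exactly what a 7:1:2 split applied within each class would produce (270 healthy, 430 breast cancer and 300 lung cancer samples in total). The sketch below assumes such a per-class (stratified) random split; the function name, the seed and the use of integer division are illustrative choices, not details given in the patent.

```python
import random


def stratified_split(labels, ratios=(7, 1, 2), seed=0):
    """Split sample indices into train/validation/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    total = sum(ratios)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # uniform random order within the class
        n_train = len(idxs) * ratios[0] // total
        n_val = len(idxs) * ratios[1] // total
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Class totals implied by the text: label 0 healthy, 1 breast cancer, 2 lung cancer.
labels = [0] * 270 + [1] * 430 + [2] * 300
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 700 100 200
```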
The performance of the cancer serum SERS spectrum classification model is verified using the test set.
Test results: 1 of the 54 healthy people was misclassified as breast cancer, while all 86 breast cancer samples and all 60 lung cancer samples were classified correctly, giving a test set accuracy of 99.5%. The test results and the trend of the validation loss function show that the convolutional neural network classifies well and that neither over-fitting nor under-fitting occurs. The test results are expressed as a confusion matrix, as shown in Table 2.
TABLE 2 Confusion matrix of test results
Actual class \ Predicted class    Healthy    Breast cancer    Lung cancer
Healthy                           53         1                0
Breast cancer                     0          86               0
Lung cancer                       0          0                60
The precision, recall and F1 value for healthy people, breast cancer patients and lung cancer patients were calculated according to formulas (8), (9) and (10), respectively, and are shown in Table 3. Table 3 shows that the precision, recall and F1 value for healthy people, breast cancer patients and lung cancer patients are all above 98%, which indicates that the convolutional neural network model classifies well. Furthermore, in cancer identification a false positive can be ruled out by subsequent, more careful medical examination, whereas a false negative is likely to delay treatment, with serious consequences. The only error in the test results is 1 healthy person misjudged as a breast cancer patient, which is a false positive for cancer and not a false negative, so the CNN model classifies serum SERS spectra well.
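TABLE 3 Classification model evaluation indices (precision, recall and F1 value for healthy people, breast cancer patients and lung cancer patients; the table is an image in the source and its values are not reproduced here)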
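
The per-class indices can be recomputed from the test counts stated in the text (53 of 54 healthy samples correct, 1 healthy sample predicted as breast cancer, all 86 breast cancer and 60 lung cancer samples correct). The sketch below assumes rows are actual classes and columns are predicted classes, and assumes the standard definitions of precision, recall and F1. The numbers it prints are derived from those counts and are not copied from Table 3.

```python
# Rows: actual class; columns: predicted class.
# Order: healthy (label 0), breast cancer (label 1), lung cancer (label 2).
confusion = [
    [53, 1, 0],
    [0, 86, 0],
    [0, 0, 60],
]


def per_class_metrics(cm):
    """Return (precision, recall, F1) for each class of a square confusion matrix."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as something else
        fp = sum(cm[r][k] for r in range(n)) - tp   # other classes predicted as k
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        results.append((p, r, f1))
    return results


accuracy = sum(confusion[k][k] for k in range(3)) / sum(map(sum, confusion))
names = ("healthy", "breast cancer", "lung cancer")
for name, (p, r, f1) in zip(names, per_class_metrics(confusion)):
    print(f"{name}: precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
print(f"accuracy={accuracy:.3f}")
```

Under these assumptions the overall accuracy is 199/200 = 99.5%, and every per-class index stays above 98%, in line with the statement in the text.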
Comparison of the method of the invention with other classification methods:
Table 4 compares the test accuracy of the present method with that of serum SERS spectrum classification methods based on PCA and HCA. The experimental data show that, for serum SERS spectrum classification, the convolutional neural network model is markedly more accurate than the other two methods. The convolutional neural network also extracts spectral features automatically and efficiently, which overcomes the need for manual feature extraction in traditional methods.
TABLE 4 Results of the different classification methods (the table is an image in the source and its values are not reproduced here)
In the experiment, SERS detection was performed on three types of serum (35 healthy people, 43 lung cancer patients and 58 breast cancer patients). The resulting spectral data were preprocessed, the amount of data was then increased by data augmentation, and the augmented data were divided into a training set, a validation set and a test set in a 7:1:2 ratio. A 5-layer one-dimensional convolutional neural network was built in the TensorFlow framework to extract features from the serum SERS spectral data and classify them, thereby providing a highly accurate serum SERS spectrum classification method that extracts spectral features automatically.
The invention uses a convolutional neural network, a deep learning method, to classify serum SERS spectra and thereby identify different types of cancer. The experiment uses a convolutional neural network structure consisting of 2 convolutional layers, 2 fully connected layers and 1 SoftMax classification layer to classify the serum SERS spectra of healthy people, lung cancer patients and breast cancer patients. The accuracies on the training set, the validation set and the test set are 99.43%, 100% and 99.5%, respectively, and the test-set precision, recall and F1 value for healthy people, breast cancer patients and lung cancer patients are all above 98%, so healthy people, early-stage breast cancer and lung cancer are identified efficiently. The method requires only simple preprocessing of the spectral data, removes the need to select features manually, provides a reference for early cancer screening with convolutional neural networks, and has prospects for further development and application.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, without departing from the spirit of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A cancer serum SERS spectrum classification method based on a convolutional neural network, characterized by comprising the following steps:
step 100, acquiring a data set of cancer serum SERS spectra, and dividing the data set into a training set, a validation set and a test set;
step 200, building a convolutional neural network structure for cancer serum SERS spectrum classification, wherein the convolutional neural network structure comprises: an input layer, a first convolutional layer, a first max pooling layer, a second convolutional layer, a second max pooling layer, a first fully connected layer, a second fully connected layer and a normalized exponential function output layer;
step 300, training the constructed convolutional neural network structure to obtain a cancer serum SERS spectrum classification model;
step 400, classifying cancer serum SERS spectra using the cancer serum SERS spectrum classification model.
2. The convolutional neural network-based cancer serum SERS spectrum classification method of claim 1, wherein the first convolutional layer and the second convolutional layer each operate as follows:
performing convolution processing on the input from the previous layer; the convolution formula is as follows:
$$(f * g)(n) = \sum_{i=1}^{N} f(i)\, g(n - i)$$
wherein (f * g) denotes the convolution of the input data f with the convolution kernel g, n is the index of the n-th result of the convolution operation and ranges from 1 to the sum of the lengths of f and g, N is the length of the input data f, and m is the length of the convolution kernel g;
performing batch normalization on the local features so that the feature output of each layer is close to a standard normal distribution; batch normalization works as shown in formulas (1) to (4):
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{1}$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{2}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{3}$$
$$y_i = \gamma \hat{x}_i + \beta \equiv BN_{\gamma,\beta}(x_i) \tag{4}$$
wherein $x_i$ denotes the i-th input sample of the batch, $\hat{x}_i$ denotes the normalized sample, $\mu_B$ denotes the mean of the samples x over the batch, $\sigma_B$ denotes the standard deviation of the samples x over the batch, m denotes the size of the input batch, and $\epsilon$ is a small constant set to keep the denominator from being 0; $BN_{\gamma,\beta}(x_i)$ denotes the batch normalization operation, $\gamma$ is the scaling parameter of the output result $y_i$, $\beta$ is the translation parameter of the output result $y_i$, $\gamma$ and $\beta$ are learnable parameters, and $y_i$ is the batch normalization result after translation and scaling;
activating with the LeakyReLU activation function and outputting a spectral feature matrix; wherein the LeakyReLU activation function is implemented as follows:
$$f(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases}$$
where x is the input to the neuron, f(x) is the activation result, and a is a small constant set to avoid the vanishing gradient on the negative axis of the conventional ReLU activation function.
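
The three operations of a convolutional layer recited in claim 2 (convolution, batch normalization, LeakyReLU activation) can be illustrated with plain Python. This is a simplified sketch under stated assumptions, not the claimed implementation: it uses zero-based indexing and a full convolution returning N + m - 1 values, it normalizes a single list of values for one feature, and the defaults `eps=1e-5` and `a=0.01` are illustrative because the claim only calls them small constants. The toy spectrum and kernel are hypothetical.

```python
import math


def conv1d_full(f, g):
    """Full discrete convolution: out[n] = sum over i of f[i] * g[n - i]."""
    N, m = len(f), len(g)
    out = [0.0] * (N + m - 1)
    for i in range(N):
        for j in range(m):
            out[i + j] += f[i] * g[j]
    return out


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize values to zero mean and unit variance, then scale and shift."""
    m = len(xs)
    mu = sum(xs) / m                           # batch mean
    var = sum((x - mu) ** 2 for x in xs) / m   # batch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]


def leaky_relu(x, a=0.01):
    """Pass positive inputs unchanged; scale non-positive inputs by the small slope a."""
    return x if x > 0 else a * x


spectrum = [1.0, 2.0, 3.0]   # toy input f (hypothetical values)
kernel = [0.0, 1.0, 0.5]     # toy kernel g (hypothetical values)
features = conv1d_full(spectrum, kernel)
activated = [leaky_relu(v) for v in batch_norm(features)]
print(features)
print(activated)
```

Deep learning frameworks usually implement the convolutional layer as a cross-correlation with a chosen padding mode and learn the kernel values during training, so this sketch shows the arithmetic of the formulas rather than a framework layer.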
3. The cancer serum SERS spectrum classification method based on the convolutional neural network of claim 2, wherein the normalized exponential function output layer uses the normalized exponential function as its activation function to obtain the respective probabilities of healthy people and of patients with different cancer types, wherein the normalized exponential function is implemented as formula (6):
$$\mathrm{softmax}(x_j) = \frac{e^{x_j}}{\sum_{k=1}^{c} e^{x_k}} \tag{6}$$
where c is the total number of categories, j is the category index, and $x_j$ is the output of the j-th output unit of the stage preceding the normalized exponential function.
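
The normalized exponential (softmax) function can be sketched as follows. The subtraction of the maximum input is a common numerical safeguard added in this sketch and is not part of the claim; the input values are hypothetical, and the label order 0, 1, 2 (healthy, breast cancer, lung cancer) follows the labeling used in the experiment.

```python
import math


def softmax(xs):
    """Map c real-valued outputs to c probabilities that sum to 1."""
    shift = max(xs)  # subtracting the maximum avoids overflow and leaves the result unchanged
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last fully connected layer for one spectrum.
outputs = [2.0, 0.5, -1.0]
probs = softmax(outputs)
predicted_label = probs.index(max(probs))  # 0 healthy, 1 breast cancer, 2 lung cancer
print(probs, predicted_label)
```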
4. The convolutional neural network-based cancer serum SERS spectrum classification method according to claim 3, wherein the cancer type comprises at least one of breast cancer and lung cancer.
5. The convolutional neural network-based cancer serum SERS spectrum classification method as claimed in claim 1 or 3, wherein step 300 comprises steps 301 to 302:
step 301, training the convolutional neural network structure using the training set;
step 302, during training, optimizing the convolutional neural network with an Adam optimizer to obtain the cancer serum SERS spectrum classification model.
6. The convolutional neural network-based cancer serum SERS spectrum classification method according to claim 5, further comprising after step 300:
step A1, verifying the obtained cancer serum SERS spectrum classification model using the validation set.
7. The convolutional neural network-based cancer serum SERS spectrum classification method according to claim 6, further comprising after step 300:
step A2, verifying the performance of the cancer serum SERS spectrum classification model by using a test set;
calculating the recognition accuracy (precision) of healthy people and different cancer types with the following formula to measure the classification performance of the convolutional neural network:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
calculating the recall of healthy people and different cancer types with the following formula to measure the classification performance of the convolutional neural network:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
calculating the F1 value of healthy people and different cancer types with the following formula to measure the classification performance of the convolutional neural network:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
wherein TP, FN and FP denote the numbers of true positives, false negatives and false positives, respectively.
CN202210275709.7A 2022-03-21 2022-03-21 Cancer serum SERS spectrum classification method based on convolutional neural network Pending CN114781484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210275709.7A CN114781484A (en) 2022-03-21 2022-03-21 Cancer serum SERS spectrum classification method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210275709.7A CN114781484A (en) 2022-03-21 2022-03-21 Cancer serum SERS spectrum classification method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN114781484A true CN114781484A (en) 2022-07-22

Family

ID=82425648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210275709.7A Pending CN114781484A (en) 2022-03-21 2022-03-21 Cancer serum SERS spectrum classification method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114781484A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789972A (en) * 2024-02-23 2024-03-29 北京大学人民医院 Construction method of breast cancer recurrence prediction model and prediction system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination