CN113702349A - Method for constructing salivary gland tumor diagnosis model based on Raman spectrum - Google Patents
- Publication number
- CN113702349A (application number CN202110783992.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- diagnosis model
- constructing
- salivary gland
- raman spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/65—Raman scattering
- G01N21/658—Raman scattering enhancement Raman, e.g. surface plasmons
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
Abstract
The invention discloses a method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy, comprising the following steps. S1, feature extraction: S101, collecting saliva that meets the sampling requirements from a patient and performing surface-enhanced Raman spectroscopy detection on the saliva sample to obtain sample spectral data; S102, selecting a subset of the samples, mining the data of the selected samples with the OneR algorithm, and evaluating the importance of each feature by its prediction error; S103, performing feature selection on the selected samples. S2, constructing a prediction model: S201, building diagnosis models with the random forest and logistic regression algorithms and obtaining the final diagnosis model through 5-fold cross-validation; S202, testing the predictive ability of the model on the samples remaining after the selection in S102, and correcting the final diagnosis model according to the test results. In use, a body fluid sample is rapidly examined by surface-enhanced Raman spectroscopy to obtain its characteristic Raman spectrum, which is then classified with the diagnosis model.
Description
Technical Field
The invention relates to body-fluid-based diagnosis of salivary gland tumors, and in particular to a method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy.
Background
Salivary gland tumors arise in the parotid, submandibular and sublingual glands and in the minor salivary glands of the oral and maxillofacial region. They are a common type of oral and maxillofacial tumor, and their incidence has been increasing year by year. Their histopathological classification is complex and the tumor types are numerous: in the World Health Organization classification of tumors (2005 edition), Pathology and Genetics of Head and Neck Tumours, salivary gland tumors are divided into five categories comprising 41 types (benign tumors, malignant tumors, soft tissue tumors, lymphohematopoietic tumors and secondary tumors). This makes diagnosis and differential diagnosis of salivary gland tumors difficult for clinicians and pathologists, and because routine preoperative biopsy is contraindicated for salivary gland tumors, preoperative diagnosis is even harder. Clinically, preoperative diagnosis currently relies mainly on fine-needle aspiration biopsy, but its accuracy often depends on the skill of the operator and the experience of the pathologist, so it cannot provide a stable and objective basis for preoperative diagnosis; moreover, the puncture is an invasive procedure that may disturb the tumor and cause complications such as tumor seeding and infection of the puncture site.
Therefore, intraoperative frozen-section diagnosis of benign versus malignant salivary gland tumors has become the decisive technique for determining the extent of surgical resection. However, the sensitivity and specificity of intraoperative frozen-section diagnosis are unreliable: studies have shown that about 30% of malignant tumors are diagnosed as benign, and frozen sections cannot accurately distinguish between malignant tumor subtypes. These shortcomings make it difficult to decide the extent of resection intraoperatively and to judge prognosis.
Raman spectroscopy is based on inelastic light scattering: the frequency shift of the scattered light caused by molecular vibrations reflects information such as molecular structure and the composition of functional groups. Raman spectroscopy offers high sensitivity and specificity, requires no special sample preparation, has a wide range of applications and is fast to perform, and it is widely used in biomedicine. It has clear advantages over other spectroscopic techniques; compared with infrared spectroscopy, for example, it offers higher spectral peak resolution, freedom from water interference, and real-time, rapid imaging. It is therefore particularly suitable for studying biological samples with a high water content, whose infrared spectra suffer strong water interference. It can provide a large amount of information about molecular composition, molecular structure and intermolecular interactions in cells, tissues or body fluids, reflecting changes in components such as proteins, nucleic acids and lipids, and is therefore known as a molecular fingerprinting technique. Earlier systematic experiments and calculations showed that the Raman scattering signal of pyridine molecules adsorbed on a roughened silver surface is enhanced by a factor of about 10^6 relative to that of pyridine in solution; this enhancement, associated with the rough metal surface, is known as the surface-enhanced Raman scattering effect. Because surface-enhanced Raman spectroscopy is non-invasive, highly specific and highly sensitive for detection on the human body, it has become a research hotspot in clinical disease diagnosis.
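The "frequency change" described above is conventionally reported as a Raman shift in wavenumbers (cm^-1), the difference between the reciprocal wavelengths of the excitation and scattered light. The minimal sketch below, using the standard spectroscopic conversion and the 785 nm excitation from Example 2, shows where a given band lands on the detector; the function names are illustrative only.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift (cm^-1) between excitation and scattered wavelengths given in nm.

    Since 1 nm = 1e-7 cm, a wavelength of L nm corresponds to 1e7 / L cm^-1.
    A positive result is a Stokes shift (scattered light at a longer wavelength).
    """
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Inverse of raman_shift_cm1: wavelength (nm) at which a Stokes shift appears."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# With 785 nm excitation, a band shifted by 1000 cm^-1 is detected near 852 nm.
print(f"{scattered_wavelength_nm(785.0, 1000.0):.1f} nm")
```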
The support vector machine (SVM) is a machine learning algorithm that is well suited to pattern recognition problems involving small samples, nonlinearity and high dimensionality, and it shows a number of distinctive advantages. It is built on the Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle of statistical learning theory, seeking the best trade-off between model complexity and learning capacity from limited sample information in order to obtain the best generalization ability. When used as a binary classifier, the algorithm seeks a hyperplane that separates the samples by maximizing the margin between the two classes, and finding this separating hyperplane is ultimately converted into a convex quadratic programming problem. When the training samples are linearly separable, maximizing the hard margin yields a linearly separable support vector machine; when they are not linearly separable, a nonlinear kernel function is used and the soft margin is maximized, yielding a nonlinear support vector machine.
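To make the margin idea concrete, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the primal hinge-loss objective. It is an illustrative swap-in for the quadratic-programming solver mentioned above (no kernel is shown); the function names, learning rate and epoch count are assumptions, and C plays the role of the penalty coefficient C used in Example 2.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimises 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))
    for labels y_i in {-1, +1}; C is the penalty coefficient on margin violations.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for k in range(d):
                    grad_w[k] -= C * yi * xi[k]
                grad_b -= C * yi
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, X):
    """Sign of the decision function w . x + b for each sample."""
    return [1 if sum(wk * xk for wk, xk in zip(w, xi)) + b >= 0 else -1 for xi in X]
```

A larger C penalises margin violations more heavily and approaches the hard-margin case; a smaller C tolerates violations in exchange for a wider margin.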
Disclosure of Invention
To solve the above technical problems, the invention provides a method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy. Real-time surface-enhanced Raman spectroscopy detection is performed on serum, saliva and other body fluid samples from salivary gland tumor patients to obtain characteristic spectral data of the body fluid samples, a process that takes about 5 minutes. The body fluid spectral data of salivary gland tumor patients are collected to build a database; the parotid gland tumor data in the database are analyzed with a support vector machine to establish a differential diagnosis model; and the Raman spectral data of body fluids from clinical salivary gland tumor patients are classified with this model to determine the pathological type of the tumor. This provides a new, rapid, non-invasive, accurate, simple and convenient method for the diagnosis and screening of salivary gland tumor patients.
The technical purpose of the invention is realized by the following technical scheme:
A method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy comprises the following steps:
S1: feature extraction, comprising the following steps:
S101: collecting saliva that meets the sampling requirements from a patient, and performing surface-enhanced Raman spectroscopy detection on the collected saliva sample to obtain sample spectral data;
S102: selecting a subset of the samples, mining the data of the selected samples with the OneR algorithm, and evaluating the importance of each feature by its prediction error;
S103: performing feature selection on the selected samples;
S2: constructing a prediction model:
S201: building a diagnosis model with the random forest and logistic regression algorithms, and obtaining the final diagnosis model through 5-fold cross-validation;
S202: testing the predictive ability of the model on the samples remaining after the subset was selected in S102, and correcting the final diagnosis model according to the test results.
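Step S102 ranks features by the prediction error of OneR, which builds a one-feature rule per feature and scores it by how many samples that rule misclassifies. The text does not say how continuous spectral intensities are discretised, so this minimal sketch assumes equal-width binning into 5 bins (Holte's original OneR uses a minimum-bucket-size discretisation instead); the function names are hypothetical.

```python
from collections import Counter


def one_r_error(values, labels, n_bins=5):
    """Training error of the OneR rule built on a single feature.

    The feature is cut into equal-width bins and each bin predicts its majority class.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: put every sample in one bin
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    errors = 0
    for b in set(bins):
        counts = Counter(lab for bb, lab in zip(bins, labels) if bb == b)
        errors += sum(counts.values()) - max(counts.values())  # minority samples in the bin
    return errors / len(values)


def rank_features_one_r(X, y, n_bins=5):
    """Return feature indices sorted from lowest to highest OneR error, plus the errors."""
    errs = [one_r_error([row[j] for row in X], y, n_bins) for j in range(len(X[0]))]
    return sorted(range(len(errs)), key=lambda j: errs[j]), errs
```

Because finer bins always lower the training error, the bin count is a tuning choice; scoring the rule on held-out samples would give a less optimistic importance estimate.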
Preferably, in S101, saliva is collected as follows: the patient fasts in the morning and rinses the mouth with normal saline; a saliva-collecting material is placed in the mouth, then transferred to a saliva collection tube, and the tube is centrifuged; the collected body fluid specimen is then mixed uniformly with nano-silver sol.
Preferably, in S103, the feature selection method comprises the following steps:
wherein: $H_j$ is the j-th nearest neighbour selected from the same class, where j = 1, 2, …, k; $M_j$ is the j-th nearest neighbour from a different class; the initial weight $W(A)_0$ of every feature is set to 0; the function $\mathrm{diff}(s_{i,a_l}, H_{j,a_l})$ computes the difference in feature $a_l$ between the i-th sample and its j-th nearest neighbour of the same class; the function $\mathrm{diff}(s_{i,a_l}, M_{j,a_l})$ computes the difference in feature $a_l$ between the i-th sample and its j-th nearest neighbour of a different class; wherein:
wherein: max(A) is the maximum value of the feature, min(A) is the minimum value of the feature, $s_{i,a_l}$ is the value of the feature for the i-th sample, $H_{j,a_l}$ is the value of the feature for the j-th same-class neighbour of the i-th sample, and $M_{j,a_l}$ is the value of the feature for the j-th neighbour of the i-th sample from a different class; the whole procedure is repeated m times.
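The definitions above (k same-class neighbours $H_j$, k different-class neighbours $M_j$, weights starting at 0, differences scaled by max(A) - min(A), m repetitions) match the ReliefF family of feature-weighting algorithms. The weight-update formula itself is not given in the text, so this minimal sketch assumes the standard two-class ReliefF update, $W(A) \leftarrow W(A) - \sum_j \mathrm{diff}(A, s_i, H_j)/(mk) + \sum_j \mathrm{diff}(A, s_i, M_j)/(mk)$, with Manhattan distance on the scaled features; the function name and defaults are hypothetical.

```python
import random


def relieff(X, y, k=3, m=None, seed=0):
    """Two-class ReliefF feature weights (higher = more discriminative)."""
    n, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    span = [(max(row[a] for row in X) - lo[a]) or 1.0 for a in range(d)]

    def diff(a, u, v):
        # |value of feature a in sample u - value in sample v| / (max(A) - min(A))
        return abs(X[u][a] - X[v][a]) / span[a]

    def distance(u, v):
        return sum(diff(a, u, v) for a in range(d))

    m = m or n                 # number of sampling iterations
    rng = random.Random(seed)
    w = [0.0] * d              # W(A)_0 = 0 for every feature
    for _ in range(m):
        i = rng.randrange(n)
        others = sorted((v for v in range(n) if v != i), key=lambda v: distance(i, v))
        hits = [v for v in others if y[v] == y[i]][:k]      # H_1 ... H_k
        misses = [v for v in others if y[v] != y[i]][:k]    # M_1 ... M_k
        for a in range(d):
            w[a] -= sum(diff(a, i, h) for h in hits) / (m * k)
            w[a] += sum(diff(a, i, mm) for mm in misses) / (m * k)
    return w
```

A feature gains weight when it differs between a sample and its nearest other-class neighbours and loses weight when it differs between near neighbours of the same class; k should stay below the size of the smallest class.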
Preferably, in S2, constructing the prediction model comprises the following steps:
wherein: prediction accuracy (ACC), sensitivity (SEN), specificity (SPE) and the Matthews correlation coefficient (MCC) are used as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively.
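The formulas for these indicators are not reproduced in the text; the sketch below assumes their standard definitions: ACC = (TP + TN) / (TP + FP + TN + FN), SEN = TP / (TP + FN), SPE = TN / (TN + FP), and MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The function name and the convention of returning MCC = 0 when its denominator vanishes are assumptions.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary diagnostic test."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total                                   # ACC
    sen = tp / (tp + fn) if tp + fn else float("nan")         # SEN, true positive rate
    spe = tn / (tn + fp) if tn + fp else float("nan")         # SPE, true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0       # MCC, 0 by convention if undefined
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


# Hypothetical test set: 50 malignant and 50 benign cases.
print(diagnostic_metrics(tp=40, fp=5, tn=45, fn=10))
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it stays informative when malignant and benign cases are unbalanced.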
Preferably, in S201, establishing the diagnosis model with the logistic regression algorithm comprises the following steps:
$$f(t) = P(Y = 1 \mid x) = \frac{1}{1 + e^{-t}} = \frac{e^{t}}{1 + e^{t}} \tag{8}$$
$$t = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{9}$$
wherein f(t) is the probability that the event occurs and ranges from 0 to 1; Y is the class label, equal to 1 for a positive sample and 0 for a negative sample; t is a linear combination of the features; $b_0$ is the intercept of the model; $\{b_1, \dots, b_n\}$ are the partial regression coefficients; and $\{x_1, \dots, x_n\}$ are the independent spectral features.
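Equations (8) and (9) can be written directly as code. This minimal sketch evaluates them and, as an assumption (the text does not state how the coefficients are estimated), fits $b_0, \dots, b_n$ by plain batch gradient descent on the mean log-loss without regularisation; all function names and hyperparameters are hypothetical.

```python
import math


def logistic(t):
    """Equation (8): f(t) = 1 / (1 + e^-t) = e^t / (1 + e^t).

    The two algebraically equal forms are used on opposite signs of t so that
    math.exp never overflows.
    """
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear_score(b0, b, x):
    """Equation (9): t = b0 + b1*x1 + ... + bn*xn."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))


def predict_proba(b0, b, x):
    """P(Y = 1 | x) for one spectral feature vector x."""
    return logistic(linear_score(b0, b, x))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit b0 and b by batch gradient descent on the mean log-loss (no regularisation)."""
    n, d = len(X), len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_proba(b0, b, xi) - yi   # derivative of the log-loss w.r.t. t
            g0 += err
            for k in range(d):
                g[k] += err * xi[k]
        b0 -= lr * g0 / n
        b = [bk - lr * gk / n for bk, gk in zip(b, g)]
    return b0, b
```

A sample is assigned to the positive class when f(t) exceeds a chosen threshold, typically 0.5, which corresponds to t > 0.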
In conclusion, the invention has the following beneficial effects:
(1) rapidly detecting a body fluid sample by using a surface enhanced Raman spectroscopy technology, obtaining a characteristic Raman spectrum, and diagnosing by using a differential diagnosis model;
(2) the stability and accuracy of the support vector machine model built on the body fluid database of salivary gland tumor patients improve continuously as the database is expanded;
(3) the method is objective, rapid, convenient and accurate, and can form a diagnosis and screening system for salivary gland tumor patients.
Detailed Description
This specification and the claims do not distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended sense and should therefore be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the technical problem and substantially achieve the technical result.
Directional terms such as upper, lower, left and right in the specification and the claims are used only for explanation, to make the application easier to understand, and do not limit the application.
The present invention will be described in further detail below.
Example 1:
The system mainly consists of two parts. The first part is the hardware, comprising a sample-carrying component, a Raman spectrometer and a computer system; the sample-carrying component is connected to the Raman spectrometer, and the Raman spectrometer is connected to the computer. The sample-carrying component delivers the excitation light to the tissue to be identified and returns the Raman signal to the Raman spectrometer; the Raman spectrometer generates the excitation light and filters and outputs the Raman signal; and the computer system converts the Raman signal into digital information and then analyzes it to reach a diagnosis. The second part is the computer software, comprising Raman spectrum analysis software, a spectrum database and a machine learning module. The Raman spectrum analysis software is connected to the spectrum database (for storage), and the machine learning module is connected to the spectrum database (to extract data for model building). The analysis software converts the optical signals collected by the Raman spectrometer into digital signals and stores them in the spectrum database; the machine learning module analyzes the characteristic Raman spectral data of body fluid specimens from salivary gland tumor patients, extracts existing data from the spectrum database to build a differential diagnosis model, and outputs the final differential diagnosis result, giving a rapid and accurate diagnosis.
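The storage link between the analysis software and the machine learning module can be sketched as follows. Example 2 builds the spectrum database with MySQL; this minimal sketch uses Python's built-in sqlite3 module as a self-contained stand-in, and the table layout, column names and JSON encoding of spectra are hypothetical choices that mirror "one sample, one spectrum, plus its clinical information".

```python
import json
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) a spectrum database; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spectra (
               sample_id        TEXT PRIMARY KEY,
               label            TEXT NOT NULL,  -- diagnosis used as the training target
               clinical_json    TEXT,           -- clinical indices linked to the sample
               wavenumbers_json TEXT NOT NULL,  -- Raman shift axis, cm^-1
               intensities_json TEXT NOT NULL   -- preprocessed intensities
           )"""
    )
    return conn


def store_spectrum(conn, sample_id, label, wavenumbers, intensities, clinical=None):
    """Analysis-software side: write one spectrum plus its clinical record."""
    conn.execute(
        "INSERT INTO spectra VALUES (?, ?, ?, ?, ?)",
        (sample_id, label, json.dumps(clinical or {}),
         json.dumps(list(wavenumbers)), json.dumps(list(intensities))),
    )
    conn.commit()


def load_dataset(conn):
    """Machine-learning side: read feature vectors X and labels y, ordered by sample_id."""
    rows = conn.execute(
        "SELECT label, intensities_json FROM spectra ORDER BY sample_id"
    ).fetchall()
    return [json.loads(r[1]) for r in rows], [r[0] for r in rows]
```

Storing each spectrum as a single serialized row keeps the sketch short; a production schema would more likely store the wavenumber axis once and add indexed clinical columns.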
Example 2:
The procedure is as follows:
1. Collection of saliva samples: with the patient fasting in the morning, the mouth is rinsed three times with physiological saline, a saliva-collecting material is placed in the mouth for 5 minutes and then transferred to a saliva collection tube, the tube is centrifuged at 3000 rpm for 2 minutes, and 1 ml of the saliva sample in the tube is kept for later use;
2. Collection of serum samples: 5 ml of the patient's fasting blood is drawn into an additive-free blood collection tube, left to stand for 30 minutes and centrifuged at 3400 rpm for 6 minutes, and 1 ml of the serum in the tube is kept for later use;
3. The body fluid specimen is mixed uniformly with 1 ml of nano-silver sol;
4. Surface-enhanced Raman spectroscopy detection is performed on the processed body fluid sample with a Raman spectrometer: the sample mixed with the nano-silver sol is placed in the sample cell, the spectrometer probe is fixed in position, and the sample is measured with 785 nm excitation light, 200 mW power, a 9 mm working distance and a 60 s acquisition time; the average spectrum of 5 acquisitions is taken.
5. The spectral data are converted to txt format, processed by fluorescence background removal, noise reduction, smoothing and normalization, and exported in xls format.
6. The resulting Raman spectra are stored in a Raman spectrum database built with MySQL, in which each tissue sample corresponds to one Raman spectrum together with the clinical information related to that sample.
7. The machine learning module builds models with the random forest and logistic regression algorithms: 80% of the samples are randomly drawn from the spectral database for model training, 5-fold cross-validation is used during training, and a grid search is used to select the optimal penalty coefficient C and kernel coefficient gamma, establishing a differential diagnosis model; the remaining 20% of the samples are used to test the predictive ability of the model. The process is repeated many times (generally 50 times or more), and the best model (generally with an accuracy of 90% or more) is retained for differential diagnosis of body fluid samples from salivary gland tumor patients.
8. An unknown sample is scanned with the Raman spectrometer to obtain its Raman spectrum signal, which is input into the optimized random forest and logistic regression models for discriminant analysis, and the tissue type of the unknown sample is determined from the output.
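Steps 4 and 5 average 5 acquisitions and then remove the fluorescence background, reduce noise, smooth and normalize, but the text does not name the algorithms. This minimal sketch assumes common choices: an iterative polynomial baseline fit (refit, clip the spectrum down to the fit, repeat) for the fluorescence background, a centred moving average as a simple stand-in for smoothing and noise reduction (Savitzky-Golay filtering is a frequent alternative), and min-max normalization to [0, 1]. All names and defaults are hypothetical.

```python
def average_scans(scans):
    """Point-wise mean of repeated acquisitions (Example 2 averages 5)."""
    return [sum(values) / len(scans) for values in zip(*scans)]


def polyfit(t, y, deg):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    n = deg + 1
    A = [[sum(ti ** (r + c) for ti in t) for c in range(n)] for r in range(n)]
    rhs = [sum(yi * ti ** r for ti, yi in zip(t, y)) for r in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, n))) / A[r][r]
    return coef


def remove_baseline(x, y, deg=3, iterations=50):
    """Iterative polynomial baseline: peaks are clipped to the fit so it settles under them."""
    lo, hi = min(x), max(x)
    t = [2.0 * (xi - lo) / (hi - lo) - 1.0 for xi in x]  # rescale axis to [-1, 1]
    work, fit = list(y), list(y)
    for _ in range(iterations):
        coef = polyfit(t, work, deg)
        fit = [sum(c * ti ** k for k, c in enumerate(coef)) for ti in t]
        work = [min(w, f) for w, f in zip(work, fit)]
    return [yi - f for yi, f in zip(y, fit)]


def moving_average(y, window=5):
    """Centred moving average; the window shrinks at the spectrum edges."""
    half = window // 2
    out = []
    for i in range(len(y)):
        segment = y[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def min_max_normalise(y):
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y] if hi > lo else [0.0] * len(y)


def preprocess(wavenumbers, scans, deg=3, window=5):
    """Average -> background removal -> smoothing -> normalization."""
    corrected = remove_baseline(wavenumbers, average_scans(scans), deg)
    return min_max_normalise(moving_average(corrected, window))
```

The polynomial degree and smoothing window trade off how much broad fluorescence is removed against distortion of genuine Raman bands, so in practice they would be fixed once and applied identically to every spectrum in the database.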
Example 3:
A method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy comprises the following steps:
S1: feature extraction, comprising the following steps:
S101: collecting saliva that meets the sampling requirements from a patient, and performing surface-enhanced Raman spectroscopy detection on the collected saliva sample to obtain sample spectral data;
S102: selecting a subset of the samples, mining the data of the selected samples with the OneR algorithm, and evaluating the importance of each feature by its prediction error;
S103: performing feature selection on the selected samples;
S2: constructing a prediction model:
S201: building a diagnosis model with the random forest and logistic regression algorithms, and obtaining the final diagnosis model through 5-fold cross-validation;
S202: testing the predictive ability of the model on the samples remaining after the subset was selected in S102, and correcting the final diagnosis model according to the test results.
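The validation scheme of S201 and S202, detailed in step 7 of Example 2 (random 80/20 split, 5-fold cross-validated grid search on the 80%, test on the 20%, repeated about 50 times, best model kept), can be sketched independently of the classifier. A random forest is too long to include here, so this minimal sketch plugs in a k-nearest-neighbour classifier as a hypothetical stand-in and tunes k instead of C and gamma; every function name is an assumption.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(fit, predict, X, y, param, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once while the rest trains."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train], param)
        scores.append(accuracy([y[i] for i in fold], predict(model, [X[i] for i in fold])))
    return sum(scores) / len(scores)


def repeated_holdout(fit, predict, X, y, grid, repeats=50, test_frac=0.2, seed=0):
    """Repeat: random 80/20 split, grid search by 5-fold CV on the 80%, test on the 20%."""
    rng = random.Random(seed)
    n_test = max(1, round(test_frac * len(X)))
    best, test_accs = None, []
    for r in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        param = max(grid, key=lambda p: cross_val_accuracy(fit, predict, X_tr, y_tr, p, seed=r))
        model = fit(X_tr, y_tr, param)
        acc = accuracy([y[i] for i in test], predict(model, [X[i] for i in test]))
        test_accs.append(acc)
        if best is None or acc > best[0]:
            best = (acc, param, model)
    return best + (test_accs,)  # (best accuracy, its parameter, its model, all accuracies)


# Hypothetical stand-in classifier: k-nearest neighbours, with k as the tuned parameter.
def fit_knn(X, y, k):
    return (X, y, k)


def predict_knn(model, X):
    X_tr, y_tr, k = model
    out = []
    for x in X:
        nearest = sorted(range(len(X_tr)),
                         key=lambda i: sum((a - b) ** 2 for a, b in zip(X_tr[i], x)))[:k]
        votes = [y_tr[i] for i in nearest]
        out.append(max(set(votes), key=votes.count))
    return out
```

The mean of the returned test accuracies is a less optimistic estimate of performance than the maximum, because the maximum is itself selected using the test splits.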
Preferably, in S101, saliva is collected as follows: the patient fasts in the morning and rinses the mouth with normal saline; a saliva-collecting material is placed in the mouth, then transferred to a saliva collection tube, and the tube is centrifuged; the collected body fluid specimen is then mixed uniformly with nano-silver sol.
Preferably, in S103, the feature selection method comprises the following steps:
wherein: $H_j$ is the j-th nearest neighbour selected from the same class, where j = 1, 2, …, k; $M_j$ is the j-th nearest neighbour from a different class; the initial weight $W(A)_0$ of every feature is set to 0; the function $\mathrm{diff}(s_{i,a_l}, H_{j,a_l})$ computes the difference in feature $a_l$ between the i-th sample and its j-th nearest neighbour of the same class; the function $\mathrm{diff}(s_{i,a_l}, M_{j,a_l})$ computes the difference in feature $a_l$ between the i-th sample and its j-th nearest neighbour of a different class; wherein:
wherein: max(A) is the maximum value of the feature, min(A) is the minimum value of the feature, $s_{i,a_l}$ is the value of the feature for the i-th sample, $H_{j,a_l}$ is the value of the feature for the j-th same-class neighbour of the i-th sample, and $M_{j,a_l}$ is the value of the feature for the j-th neighbour of the i-th sample from a different class; the whole procedure is repeated m times.
Preferably, in S2, constructing the prediction model comprises the following steps:
wherein: prediction accuracy (ACC), sensitivity (SEN), specificity (SPE) and the Matthews correlation coefficient (MCC) are used as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively.
Preferably, in S201, establishing the diagnosis model with the logistic regression algorithm comprises the following steps:
$$f(t) = P(Y = 1 \mid x) = \frac{1}{1 + e^{-t}} = \frac{e^{t}}{1 + e^{t}} \tag{8}$$
$$t = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{9}$$
wherein f(t) is the probability that the event occurs and ranges from 0 to 1; Y is the class label, equal to 1 for a positive sample and 0 for a negative sample; t is a linear combination of the features; $b_0$ is the intercept of the model; $\{b_1, \dots, b_n\}$ are the partial regression coefficients; and $\{x_1, \dots, x_n\}$ are the independent spectral features.
The working principle is as follows: a Raman spectrometer is used to examine the body fluids of salivary gland tumor patients to obtain characteristic Raman spectral data for different patients; the data are processed by background subtraction, noise reduction, smoothing and normalization; and the characteristic spectral data are analyzed with the random forest and logistic regression algorithms to establish a salivary gland tumor diagnosis model. The model is then applied to the Raman spectral data of a patient's body fluid to produce a diagnosis report rapidly, accurately and conveniently.
This embodiment is intended only to explain the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications within the scope of the claims of the invention are protected by patent law.
Claims (5)
1. A method for constructing a salivary gland tumor diagnosis model based on Raman spectroscopy, characterized by comprising the following steps:
S1: feature extraction, comprising the following steps:
S101: collecting saliva that meets the sampling requirements from a patient, and performing surface-enhanced Raman spectroscopy detection on the collected saliva sample to obtain sample spectral data;
S102: selecting a subset of the samples, mining the data of the selected samples with the OneR algorithm, and evaluating the importance of each feature by its prediction error;
S103: performing feature selection on the selected samples;
S2: constructing a prediction model:
S201: building a diagnosis model with the random forest and logistic regression algorithms, and obtaining the final diagnosis model through 5-fold cross-validation;
S202: testing the predictive ability of the model on the samples remaining after the subset was selected in S102, and correcting the final diagnosis model according to the test results.
2. The method for constructing a salivary gland tumor diagnosis model according to claim 1, wherein in S101 the saliva is collected as follows: the patient fasts in the morning and rinses the mouth with normal saline; a saliva-collecting material is placed in the mouth, then transferred to a saliva collection tube, and the tube is centrifuged; the collected body fluid specimen is then mixed uniformly with nano-silver sol.
3. The method for constructing a salivary gland tumor diagnosis model based on Raman spectrum according to claim 1, wherein the feature selection in step S103 specifically comprises the following steps:
wherein $H_j$ ($j = 1, 2, \dots, k$) is the $j$-th nearest neighbor of the selected sample within the same class; $M_j$ is the $j$-th nearest neighbor in a different class; the initial weight $W(A)_0$ of every feature is set to 0; the function $\mathrm{diff}(s_{i,a_l}, H_{j,a_l})$ computes the difference in feature $a_l$ between the $i$-th sample and its $j$-th same-class neighbor; and the function $\mathrm{diff}(s_{i,a_l}, M_{j,a_l})$ computes the difference in feature $a_l$ between the $i$-th sample and its $j$-th neighbor from a different class; wherein:
where $\max(A)$ and $\min(A)$ are the maximum and minimum values of the feature; $s_{i,a_l}$ is the value of the feature for the $i$-th sample; $H_{j,a_l}$ is the value of the feature for the $j$-th same-class neighbor of the $i$-th sample; and $M_{j,a_l}$ is the value of the feature for the $j$-th neighbor in a different class. The entire procedure is repeated $m$ times.
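The weight-update equations of claim 3 did not survive extraction. The symbols that remain (the $k$ nearest same-class neighbors $H_j$, different-class neighbors $M_j$, weights $W(A)$ initialized to 0, differences scaled by $\max(A) - \min(A)$, and $m$ repetitions) match the standard ReliefF algorithm, so the sketch below assumes that formulation rather than reproducing the patent's own equations:

$$W(A) \leftarrow W(A) - \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, H_j)}{m k} + \sum_{C \neq \mathrm{class}(R_i)} \frac{P(C)}{1 - P(\mathrm{class}(R_i))} \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, M_j(C))}{m k}, \qquad \mathrm{diff}(A, I_1, I_2) = \frac{|\mathrm{value}(A, I_1) - \mathrm{value}(A, I_2)|}{\max(A) - \min(A)}$$

For determinism the code uses every sample once as $R_i$ (so $m$ equals the number of samples) instead of drawing $m$ random samples, and measures neighbor distance as the sum of per-feature `diff` values; both choices are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def relieff(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 1) -> List[float]:
    """Return one ReliefF weight per feature (higher means more discriminative)."""
    n, d = len(X), len(X[0])
    mins = [min(row[a] for row in X) for a in range(d)]
    maxs = [max(row[a] for row in X) for a in range(d)]

    def diff(a: int, r1: Sequence[float], r2: Sequence[float]) -> float:
        # |value(A, I1) - value(A, I2)| / (max(A) - min(A))
        span = maxs[a] - mins[a]
        return 0.0 if span == 0 else abs(r1[a] - r2[a]) / span

    def dist(r1: Sequence[float], r2: Sequence[float]) -> float:
        return sum(diff(a, r1, r2) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    weights = [0.0] * d  # W(A) starts at 0 for every feature
    m = n  # deterministic variant: each sample serves once as R_i
    for i in range(n):
        r_i, c_i = X[i], y[i]
        for c in priors:
            # k nearest neighbors of R_i in class c: hits if c == c_i, otherwise misses
            neigh = sorted((j for j in range(n) if j != i and y[j] == c),
                           key=lambda j: dist(r_i, X[j]))[:k]
            if c == c_i:
                sign, scale = -1.0, 1.0
            else:
                sign, scale = 1.0, priors[c] / (1.0 - priors[c_i])
            for a in range(d):
                total = sum(diff(a, r_i, X[j]) for j in neigh)
                weights[a] += sign * scale * total / (m * k)
    return weights
```

Features would then be kept or discarded by sorting on these weights; the cut-off is not specified in the claim.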
4. The method for constructing a salivary gland tumor diagnosis model according to claim 1, wherein constructing the prediction model in step S2 comprises the following steps:
wherein prediction accuracy (ACC), sensitivity (SEN), specificity (SPE) and the Matthews correlation coefficient (MCC) are used as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively.
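Claim 4 names the indicators, but its formulas were lost in extraction. The sketch below assumes the standard definitions: $\mathrm{ACC} = \frac{TP+TN}{TP+FP+TN+FN}$, $\mathrm{SEN} = \frac{TP}{TP+FN}$, $\mathrm{SPE} = \frac{TN}{TN+FP}$, and $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. It also includes a hypothetical `k_fold_indices` helper for the 5-fold cross-validation of step S201. The helper assigns samples to folds round-robin without shuffling or stratification, both of which a real study would normally add.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) with 1 = positive and 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """ACC, SEN, SPE and MCC; a zero denominator yields 0.0 by convention."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k round-robin folds; return (train, test) index pairs."""
    folds = [list(range(f, n, k)) for f in range(k)]
    return [([i for g, fold in enumerate(folds) if g != f for i in fold], folds[f])
            for f in range(k)]
```

In a cross-validation loop, the model would be trained on each `train` split and scored with `performance(*confusion_counts(...))` on the matching `test` split.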
5. The method for constructing a diagnosis model of salivary gland tumor according to claim 4, wherein the step of establishing a diagnosis model by using a logistic regression algorithm in step S201 comprises the following steps:
$$f(t) = P(Y=1 \mid x) = \frac{1}{1+e^{-t}} = \frac{e^{t}}{1+e^{t}} \tag{8}$$
$$t = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{9}$$
wherein $f(t)$ is the probability that the event occurs and ranges from 0 to 1 as $t$ varies; $Y$ denotes a positive sample (defined as 1) or a negative sample (defined as 0); $t$ is a linear combination of the features; $b_0$ is the intercept of the model; $\{b_1, \dots, b_n\}$ are the partial regression coefficients; and $\{x_1, \dots, x_n\}$ are the independent spectral features.
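Equations (8) and (9) can be sketched directly in Python. The claim does not state how the coefficients $b_0, \dots, b_n$ are estimated, so the hypothetical `fit_logistic` assumes plain batch gradient ascent on the log-likelihood with a fixed learning rate. It is an illustration, not the patent's fitting procedure; statistical packages normally use iteratively reweighted least squares or quasi-Newton solvers instead.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    """Eq. (8): 1/(1+e^-t) == e^t/(1+e^t); branch on sign to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear_term(b: Sequence[float], x: Sequence[float]) -> float:
    """Eq. (9): t = b0 + b1*x1 + ... + bn*xn, with b[0] as the intercept."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def predict_proba(b: Sequence[float], x: Sequence[float]) -> float:
    """P(Y = 1 | x) for one spectral feature vector x."""
    return sigmoid(linear_term(b, x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient ascent on the log-likelihood (assumed fitting method)."""
    n_feat = len(X[0])
    b = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(b, xi)  # gradient of the log-likelihood
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        b = [bj + lr * g / len(X) for bj, g in zip(b, grad)]
    return b
```

A sample would be labelled positive when `predict_proba` exceeds a chosen threshold such as 0.5; the claim does not fix that threshold.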
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110783992.XA CN113702349A (en) | 2021-07-12 | 2021-07-12 | Method for constructing salivary gland tumor diagnosis model based on Raman spectrum |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113702349A (en) | 2021-11-26 |
Family ID: 78648443
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110783992.XA Pending CN113702349A (en) | 2021-07-12 | 2021-07-12 | Method for constructing salivary gland tumor diagnosis model based on Raman spectrum |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113702349A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103217409A (en) * | 2013-03-22 | 2013-07-24 | Chongqing Institute of Green and Intelligent Technology | Raman spectral preprocessing method |
CN104142320A (en) * | 2013-06-08 | 2014-11-12 | Li Longjiang | Parotid tumor diagnosis technology based on serum surface-enhanced Raman spectroscopy |
CN104515797A (en) * | 2014-05-29 | 2015-04-15 | Shenzhen Second People's Hospital | Breast cancer early stage diagnosis sialoprotein fingerprint model and construction method thereof |
CN106897566A (en) * | 2017-02-28 | 2017-06-27 | Beijing Jishuitan Hospital | Construction method and device for a risk prediction model |
CN108088834A (en) * | 2017-09-13 | 2018-05-29 | Xinjiang University | Echinococcosis serum Raman spectrum diagnostic device based on an optimized back-propagation neural network |
CN109781701A (en) * | 2019-01-18 | 2019-05-21 | Raman Brothers (Shenzhen) Technology Development Co., Ltd. | Real-time detection method during parathyroidectomy based on Raman spectroscopy |
CN109781706A (en) * | 2019-02-11 | 2019-05-21 | Shanghai Institute of Technology | Training method for a food-borne pathogen Raman spectrum identification model built with PCA-Stacking |
CN109781699A (en) * | 2019-01-18 | 2019-05-21 | Raman Brothers (Shenzhen) Technology Development Co., Ltd. | Method for real-time detection of parotid tumors based on Raman spectrum |
CN111274874A (en) * | 2020-01-08 | 2020-06-12 | Shanghai Institute of Technology | Food-borne pathogenic bacteria Raman spectrum classification model training method based on AdaBoost |
CN111707656A (en) * | 2020-06-29 | 2020-09-25 | Shaanxi Future Health Technology Co., Ltd. | Cerebrospinal fluid cell detection method and system based on Raman scattering spectrum |
CN112331270A (en) * | 2021-01-04 | 2021-02-05 | Laser Fusion Research Center, China Academy of Engineering Physics | Construction method of novel coronavirus Raman spectrum data center |
Non-Patent Citations (2)
Title |
---|
HUANG LIQIU et al.: "Characteristic wavenumbers of Raman spectra reveal the molecular mechanisms of oral leukoplakia and can help to improve the performance of diagnostic models", ANALYTICAL METHODS * |
Zu Endong, China University of Geosciences Press * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114384057A (en) * | 2021-12-28 | 2022-04-22 | Sichuan University | Tumor early diagnosis system based on Raman spectrum |
CN114384057B (en) * | 2021-12-28 | 2023-09-19 | Sichuan University | Tumor early diagnosis system based on Raman spectrum |
CN116077016A (en) * | 2022-12-22 | 2023-05-09 | Sichuan University | Portable information diagnosis device based on Raman spectrum and infrared spectrum |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211126 |