CN115346663A - Method for screening digestive system tumor - Google Patents

Publication number
CN115346663A
Authority
CN
China
Prior art keywords
screening
group
tumor
digestive system
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038137.7A
Other languages
Chinese (zh)
Inventor
王正
季凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202211038137.7A
Publication of CN115346663A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method for screening digestive system tumors. Based on peripheral blood test indices, including complete blood count data, serological indices, CEA and AFP, the method builds a machine learning model with the help of a quantum genetic algorithm and uses the model to screen for digestive system tumors. The method comprises the following steps: obtaining the age, sex, complete blood count, blood biochemistry, AFP and CEA test results and the disease diagnosis of each learning sample; screening and constructing an optimal learning sample based on a quantum genetic algorithm; training a machine learning model with the optimal learning sample; and screening for digestive system tumors with the model. The innovation of the invention is that the optimal learning sample is selected by a quantum genetic algorithm from multi-omics data, and a machine learning model is built on it to screen for digestive system tumors.

Description

Method for screening digestive system tumor
Technical Field
The invention belongs to the field of medical data processing, and particularly relates to a method for screening digestive system tumors.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The digestive system consists of two major parts, the digestive tract and the digestive glands. The digestive tract comprises the oral cavity, pharynx, esophagus, stomach, small intestine, colon and rectum. The digestive glands are divided into small and large digestive glands. The small digestive glands are scattered in the walls of each part of the digestive tract; the large digestive glands comprise three pairs of salivary glands (parotid, submandibular and sublingual), the liver and the pancreas.
The high-incidence digestive system tumors are of four types: gastric cancer, colorectal cancer, esophageal cancer and liver cancer. Current screening for these four cancers relies heavily on B-mode ultrasound and digestive endoscopy. These examinations require specialized physicians, technicians and nursing staff, and endoscopy often also requires an anesthesiologist to provide sedation (painless endoscopy) to secure patient compliance. This makes screening labor- and resource-intensive and limits the efficiency of digestive tract tumor screening.
In addition, tumor markers such as CEA and AFP are widely used in the health check-up market to screen for digestive system tumors. Most existing tumor markers have insufficient sensitivity and specificity; they are mainly used for auxiliary diagnosis, prognosis, prediction of radiotherapy sensitivity and monitoring of treatment response, and are of limited value for tumor screening.
In recent years, using machine learning to screen for malignant tumors from multi-omics data in peripheral blood, such as blood cells, tumor markers, genes and proteins, has become a research focus. However, previous studies generally considered only healthy people and cancer patients, or relied on methods such as controlling the proportion of positive samples. When such models are used to screen real patients, the false positive rate is very high because of interference from bacterial and viral infections and other complex diseases. If the real-world accuracy of machine-learning-based multi-omics screening can be further improved, the method will have high practical value.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for screening digestive system tumors, comprising:
acquiring clinical test data comprising age, sex, complete blood count data, serological test data, AFP and CEA test results, and disease diagnoses;
processing the clinical data so that the complete blood count data, biochemical test data, and AFP and CEA test results obtained from the same person within 72 hours are treated as one set of clinical test data.
The clinical test data are divided into five groups: a healthy group, a digestive tract lumen tumor group, a liver cancer group, a control malignant tumor group and a high-incidence disease group.
The healthy group does not include critically ill patients; the digestive tract lumen tumor group comprises gastric cancer, colorectal cancer and esophageal cancer; the control malignant tumor group comprises malignant tumors other than those of the digestive tract lumen tumor group and liver cancer; the high-incidence disease group comprises common diseases other than malignant tumors.
The clinical test data are divided into a training data set and a test data set, wherein the test data set is created according to the true disease incidence.
The training data set is built by a quantum genetic algorithm, which calculates the optimal selection proportion of samples from the five groups to generate an optimal training data set. This comprises the following steps:
The qubit-encoded chromosomes are randomly initialized, and each individual chromosome is measured to obtain a definite solution. A training data set is generated according to the selection proportions of the five groups that correspond to the definite solution. A corresponding screening model is generated with a machine learning algorithm, and the test data are checked with the screening model. In the screening result, the average F1 value of the digestive tract lumen tumor group and the liver cancer group is defined as the fitness function value.
The fitness of each definite solution is evaluated, the chromosomes are adjusted with a quantum rotation gate to obtain a new population, and the calculation is iterated to obtain the optimal chromosome and its fitness, from which the optimal learning sample is generated.
A machine learning model is trained with the optimal learning sample and its hyper-parameters are optimized; the parameters giving the highest average F1 value for the digestive tract lumen tumor group and the liver cancer group on the test data set are selected, and the digestive system tumor screening model is established. Gastric cancer, colorectal cancer, esophageal cancer and liver cancer are then screened with this model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic structural diagram of a method for screening digestive system tumors according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Fig. 1 shows the method for screening digestive system tumors of this embodiment, which comprises:
(1) Acquiring clinical test data comprising age, sex, complete blood count data, serological test data, AFP and CEA test results, and disease diagnoses. The clinical data are processed so that the complete blood count data, biochemical test data, and AFP and CEA test results obtained from the same person within 72 hours are treated as one set of clinical test data.
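The 72-hour grouping rule can be illustrated with a minimal Python sketch. The panel names (`cbc`, `biochem`, `AFP`, `CEA`), the tuple layout of the input and the greedy windowing strategy are assumptions made for illustration; the specification does not state how the window is anchored.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)
# Hypothetical panel names; the specification only lists the test types.
REQUIRED = {"cbc", "biochem", "AFP", "CEA"}

def build_record_sets(results):
    """Combine one person's results into sets whose tests all fall within 72 hours.

    `results` is a list of (person_id, panel, timestamp, values) tuples.
    Returns one dict per complete set of the four required panels.
    """
    by_person = {}
    for person_id, panel, ts, values in results:
        by_person.setdefault(person_id, []).append((ts, panel, values))

    record_sets = []
    for person_id, items in by_person.items():
        items.sort(key=lambda item: item[0])
        current, start = {}, None
        for ts, panel, values in items:
            # Open a new window when none is open or the open one has expired.
            if start is None or ts - start > WINDOW:
                current, start = {}, ts
            current[panel] = values
            if REQUIRED.issubset(current):
                record_sets.append({"person_id": person_id, **current})
                current, start = {}, None
    return record_sets
```

A person whose last required test arrives more than 72 hours after the first one yields no set under this sketch, so incomplete panels never reach the model.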
The data are then cleaned: records with missing or erroneous content are removed, the data are clustered with the K-Means algorithm, and outliers are removed.
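The specification does not say how outliers are identified once the data are clustered. The sketch below assumes one common rule: a record is an outlier when its distance to its cluster centroid exceeds a multiple of the median distance in that cluster. The function names, the `init` and `factor` parameters and the rule itself are illustrative assumptions; in practice the features would also be scaled first.

```python
import math
import statistics

def kmeans(points, init, iterations=20):
    """Plain Lloyd's algorithm; `init` is the list of starting centroids."""
    centroids = [tuple(c) for c in init]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids

def remove_outliers(points, init, factor=3.0):
    """Drop points lying farther than factor * median distance from their centroid."""
    centroids = kmeans(points, init)
    labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    limits = {
        lab: factor * statistics.median(d for d, l in zip(dists, labels) if l == lab)
        for lab in set(labels)
    }
    return [p for p, d, lab in zip(points, dists, labels) if d <= limits[lab]]
```

The median is used instead of the mean so that a single extreme record does not inflate its own cluster's threshold.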
(2) The clinical test data are divided into five groups: a healthy group, a digestive tract lumen tumor group, a liver cancer group, a control malignant tumor group and a high-incidence disease group.
The healthy group does not include critically ill patients; the digestive tract lumen tumor group comprises gastric cancer, colorectal cancer and esophageal cancer; the control malignant tumor group comprises malignant tumors other than those of the digestive tract lumen tumor group and liver cancer; the high-incidence disease group comprises common diseases other than malignant tumors.
(3) The clinical test data are divided into a training data set and a test data set, wherein the test data set is created according to the true disease incidence.
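One way to build a test set that mirrors real-world incidence is to sample each group in proportion to a given rate. The sketch below is only an illustration: the group keys and the rates passed in by the caller are placeholders, since the specification gives no incidence figures.

```python
import random

def sample_test_set(records_by_group, incidence, size, seed=0):
    """Draw a test set whose group mix follows the given incidence rates.

    `records_by_group` maps a group name to a list of records;
    `incidence` maps the same names to fractions that sum to about 1.
    """
    rng = random.Random(seed)
    test_set = []
    for group, rate in incidence.items():
        n = round(size * rate)
        pool = records_by_group[group]
        if n > len(pool):
            raise ValueError(f"not enough records in group {group!r}")
        # Sample without replacement so no record appears twice.
        test_set.extend((group, record) for record in rng.sample(pool, n))
    rng.shuffle(test_set)
    return test_set
```

Records drawn into the test set would then be excluded from the pool used to build training samples.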
(4) In the course of research it was found that, when training data sets composed of the grouped data in different proportions are used with the same machine learning algorithm, the resulting models perform very differently in actual tests.
Previous studies generally considered only training data sets consisting of healthy people and cancer patients, or relied on methods such as controlling the proportion of positive samples. When such models are used to screen real patients, the false positive rate is very high because of the low tumor incidence and interference from bacterial and viral infections and other complex diseases.
The research further found that screening accuracy can be greatly improved by precisely controlling the proportions of the grouped data that make up the training data set.
The traditional genetic algorithm (GA) tends to need many iterations, converge slowly and fall into local extrema when selection, crossover or mutation are poorly chosen. The quantum genetic algorithm (QGA) combines quantum computation with the genetic algorithm and is a more recently developed probabilistic evolutionary method. It is built on quantum state vector representation: the probability amplitudes of qubits are used to encode chromosomes, so that one chromosome can express a superposition of multiple states, and chromosomes are updated with quantum logic gates, which achieves better results than a conventional genetic algorithm.
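The qubit encoding can be illustrated with a short Python sketch. Each qubit is stored as an amplitude pair (alpha, beta) with alpha² + beta² = 1, and measurement collapses it to 1 with probability beta². The mapping from the measured bit string to five group proportions (`decode_proportions`) is an assumption made for illustration; the specification does not describe the decoding.

```python
import math
import random

def init_chromosome(n_qubits, rng):
    """Random qubit chromosome: each qubit is (alpha, beta) = (cos t, sin t)."""
    angles = [rng.uniform(0, math.pi / 2) for _ in range(n_qubits)]
    return [(math.cos(t), math.sin(t)) for t in angles]

def measure(chromosome, rng):
    """Collapse every qubit to a bit; P(bit = 1) equals beta squared."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in chromosome]

def decode_proportions(bits, n_groups=5):
    """Hypothetical decoding: equal-width bit fields become normalized group weights."""
    if len(bits) % n_groups:
        raise ValueError("bit string length must be a multiple of n_groups")
    width = len(bits) // n_groups
    weights = []
    for g in range(n_groups):
        field = bits[g * width:(g + 1) * width]
        # Adding 1 keeps every group represented in the training sample.
        weights.append(int("".join(map(str, field)), 2) + 1)
    total = sum(weights)
    return [w / total for w in weights]
```

Because one chromosome holds probabilities rather than fixed bits, repeated measurement of the same chromosome can yield different candidate proportions, which is how a small population covers many states.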
The optimal learning sample is screened and constructed by the quantum genetic algorithm as follows:
a) initialize the population Q(t0) by randomly generating 100 qubit-encoded chromosomes;
b) measure each individual in the initial population Q(t0) once to obtain the corresponding definite solution P(t0);
c) generate a training sample with the group proportions corresponding to each definite solution in P(t0), and train a corresponding screening model by machine learning;
d) test the test data with the screening model; in the test result, the average F1 value of the digestive tract lumen tumor group and the liver cancer group is defined as the fitness function value;
e) evaluate the fitness of each definite solution;
f) record the best individual and its fitness;
g) judge whether the calculation can end: if the fitness condition is met, obtain the optimal chromosome and its fitness, generate the optimal learning sample and exit; otherwise continue;
h) measure each individual in the population Q(t) once to obtain the corresponding definite solution;
i) evaluate the fitness of each definite solution;
j) adjust the individuals with the quantum rotation gate U(t) to obtain the new population Q(t+1);
k) record the best individual and its fitness;
l) add 1 to the iteration number t and return to step g.
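The loop above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: each qubit is stored as an angle θ with amplitudes (cos θ, sin θ), so applying a rotation gate by δ is the same as adding δ to θ; the fixed rotation step, the clamping margin and the caller-supplied `fitness` function (which in the method would train a model and return the average F1 value) are assumptions.

```python
import math
import random

def measure(thetas, rng):
    # P(bit = 1) = sin(theta)^2 for amplitudes (cos theta, sin theta).
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]

def rotate_toward(thetas, bits, best_bits, delta=0.05 * math.pi, eps=0.01):
    """Rotation-gate update: turn each differing qubit toward the best bit."""
    updated = []
    for t, bit, best in zip(thetas, bits, best_bits):
        if bit != best:
            t += delta if best == 1 else -delta
        # Clamp so no qubit becomes fully deterministic (an assumed safeguard).
        updated.append(min(max(t, eps), math.pi / 2 - eps))
    return updated

def qga(fitness, n_qubits, pop_size=100, generations=50, target=None, seed=0):
    rng = random.Random(seed)
    # Step a: random qubit-encoded population.
    population = [[rng.uniform(0, math.pi / 2) for _ in range(n_qubits)]
                  for _ in range(pop_size)]
    best_bits, best_fit = None, -math.inf
    for _ in range(generations):
        solutions = [measure(ind, rng) for ind in population]   # steps b, h
        scores = [fitness(s) for s in solutions]                # steps c-e, i
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_fit:                              # steps f, k
            best_bits, best_fit = solutions[top], scores[top]
        if target is not None and best_fit >= target:           # step g
            break
        population = [rotate_toward(ind, s, best_bits)          # step j
                      for ind, s in zip(population, solutions)]
    return best_bits, best_fit
```

With a toy fitness such as `lambda bits: sum(bits) / len(bits)` the loop can be exercised without any clinical data; in the method itself, each fitness call is a full train-and-test cycle, so the population size and iteration count dominate the running time.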
(5) A machine learning model is trained with the optimal learning sample and its hyper-parameters are optimized; the parameters giving the highest average F1 value for the digestive tract lumen tumor group and the liver cancer group on the test data set are selected, and the digestive system tumor screening model is established. Gastric cancer, colorectal cancer, esophageal cancer and liver cancer are then screened with this model.
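The selection criterion used here and in the fitness function, the average F1 value of the two target groups, can be computed as in the following sketch. The group labels are hypothetical strings, and each group is scored one-versus-rest, which is an assumption about how the per-group F1 is taken.

```python
def f1_for_group(y_true, y_pred, group):
    """One-versus-rest F1 score for a single group label."""
    tp = sum(t == group and p == group for t, p in zip(y_true, y_pred))
    fp = sum(t != group and p == group for t, p in zip(y_true, y_pred))
    fn = sum(t == group and p != group for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_f1(y_true, y_pred, groups=("gi_lumen_tumor", "liver_cancer")):
    """Mean F1 over the two target groups, used as fitness and selection score."""
    return sum(f1_for_group(y_true, y_pred, g) for g in groups) / len(groups)
```

Averaging only the two tumor groups means that errors among the healthy, control malignant and high-incidence groups affect the score only when they are confused with a target group.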
Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for screening digestive system tumors, comprising:
acquiring clinical test data comprising age, sex, complete blood count data, serological test data, AFP and CEA test results, and disease diagnoses;
screening and constructing an optimal learning sample by a quantum genetic algorithm;
training a machine learning model with the optimal learning sample to obtain a digestive system tumor screening model;
and screening for digestive system tumors with the digestive system tumor screening model.
2. The method for screening digestive system tumors according to claim 1, wherein the clinical data are processed so that the complete blood count data, biochemical test data, and AFP and CEA test results obtained from the same person within 72 hours are treated as one set of clinical test data.
3. The method for screening digestive system tumors according to claim 1, wherein the clinical test data are divided into five groups, namely a healthy group, a digestive tract lumen tumor group, a liver cancer group, a control malignant tumor group and a high-incidence disease group, wherein the healthy group does not include critically ill patients; the digestive tract lumen tumor group comprises gastric cancer, colorectal cancer and esophageal cancer; the control malignant tumor group comprises malignant tumors other than those of the digestive tract lumen tumor group and liver cancer; and the high-incidence disease group comprises common diseases other than malignant tumors.
4. The method for screening digestive system tumors according to claim 3, wherein the clinical test data are divided into a training data set and a test data set, and the test data set is created according to the true disease incidence.
5. The method for screening digestive system tumors according to claim 4, wherein the training data set is built by a quantum genetic algorithm, and the optimal selection proportion of samples from the five groups is calculated to generate an optimal training data set, comprising:
randomly initializing qubit-encoded chromosomes, and measuring each individual chromosome to obtain a definite solution;
generating a training data set according to the selection proportions of the five groups corresponding to the definite solution;
generating a corresponding screening model with a machine learning algorithm, and checking the test data with the screening model;
defining the average F1 value of the digestive tract lumen tumor group and the liver cancer group in the test result as the fitness function value;
and evaluating the fitness of each definite solution, adjusting the chromosomes with a quantum rotation gate to obtain a new population, iterating the calculation to obtain the optimal chromosome and its fitness, and generating the optimal learning sample.
6. The method for screening digestive system tumors according to claim 5, wherein a machine learning model is trained with the optimal learning sample, the machine learning hyper-parameters are optimized, the parameters giving the highest average F1 value for the digestive tract lumen tumor group and the liver cancer group on the test data set are selected, and a digestive system tumor screening model is established; and gastric cancer, colorectal cancer, esophageal cancer and liver cancer are screened with the digestive system tumor screening model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038137.7A CN115346663A (en) 2022-08-29 2022-08-29 Method for screening digestive system tumor

Publications (1)

Publication Number Publication Date
CN115346663A true CN115346663A (en) 2022-11-15

Family

ID=83954425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038137.7A Pending CN115346663A (en) 2022-08-29 2022-08-29 Method for screening digestive system tumor

Country Status (1)

Country Link
CN (1) CN115346663A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689011A (en) * 2023-12-28 2024-03-12 杭州汇健科技有限公司 Model adjustment method, device, equipment and storage medium
CN117689011B (en) * 2023-12-28 2024-05-03 杭州汇健科技有限公司 Model adjustment method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20221115)