CN112289455A

CN112289455A - Artificial intelligence neural network learning model construction system and construction method

Info

Publication number: CN112289455A
Application number: CN202011129091.0A
Authority: CN
Inventors: 王智; 武艳飞
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-01-29

Abstract

The invention belongs to the technical field of computer model construction and discloses an artificial intelligence neural network learning model construction system and construction method. The method comprises: obtaining relevant data of valid lung cancer cases; performing feature extraction on the acquired data to obtain a feature sample set; preprocessing the feature sample set to obtain normalized sample data; dividing the normalized sample data into a training data set and a verification data set; constructing an artificial intelligence algorithm model and training it with the training data set; verifying the model with the verification data set and optimizing it based on the verification result; and obtaining the optimized artificial intelligence algorithm model. By constructing an artificial intelligence algorithm model, the invention can improve the efficiency of lung cancer diagnosis and the accuracy of disease assessment during lung cancer treatment, optimize the corresponding treatment scheme, and improve the survival rate of lung cancer patients.

Description

Artificial intelligence neural network learning model construction system and construction method

Technical Field

The invention belongs to the technical field of computer model construction, and particularly relates to an artificial intelligence neural network learning model construction system and method.

Background

Lung nodule artificial intelligence screening systems are now mature, and many artificial intelligence technology companies have achieved considerable success; among them, the InferRead™ system has been approved by the FDA. Artificial intelligence has achieved good results in judging whether lung nodules are benign or malignant. Because artificial intelligence algorithms for lung nodule screening are open source, extracting artificial intelligence imaging features for lung adenocarcinoma patients is entirely feasible. Systemic therapy currently plays an active role in most stages of non-small cell lung cancer (NSCLC). The recommended treatment for stage II NSCLC remains surgical resection plus adjuvant chemotherapy. Treatment strategies for patients with advanced disease include radiation therapy, chemotherapy, targeted therapy, immunotherapy and surgical resection. In future applications, artificial intelligence deep learning algorithms may provide physicians with patient-specific predictors that help them predict treatment outcomes. Prior art 1 uses an artificial neural network (ANN) to model NSCLC survival: it identifies genetic features relevant to lung adenocarcinoma and runs several types of ANN algorithm to construct an optimal ANN architecture for classifying adjuvant chemotherapy benefit in lung adenocarcinoma. The reliability of the method was then evaluated by cross-dataset validation; 10-fold cross-validation showed a classification accuracy of 65.71%. These results indicate that it is feasible to use gene profiles obtained from microarray analysis to predict adjuvant chemotherapy benefit in NSCLC, which may avoid overtreatment and waste of medical resources.

Development and verification of a deep learning survival model for non-small cell lung cancer: a deep learning network can learn highly complex and/or non-linear correlations between prognostic clinical features and the risk of lung cancer-specific (LCS) death. In applications, such networks even show the potential to provide individual recommendations based on the calculated risk. For example, by analyzing clinical data from the Surveillance, Epidemiology, and End Results (SEER) cancer registries, the ensemble of computational methods used by Bergquist et al., including random forests, lasso regression and neural networks, predicted lung cancer staging with 93% accuracy. In another study, Corey et al. developed a machine learning model-based software package (Pythia) that incorporates the patient's age, gender, clinical baseline, race/ethnicity and medical history to determine the risk of postoperative complications or death. Matsuo et al. also developed a deep learning network model for progression-free survival analysis with a higher C statistic than the traditional proportional hazards regression model (0.795 vs 0.784). In addition, Katzman et al. developed a new deep learning method for survival analysis, DeepSurv, which uses a deep learning network to integrate the Cox proportional hazards model. The authors demonstrated that the survival model implemented in DeepSurv could be used to provide treatment recommendations for better survival outcomes. In China, DeepSurv-based software was released by Shanghai Chest Hospital in June 2020. Clinical information and treatment details of non-small cell lung cancer patients were compiled from existing retrospective data and the SEER database. Patients were divided into 2 groups according to whether the treatment received was consistent with the treatment recommended. For survival analysis, the Kaplan-Meier method was used to analyze LCS survival in the different groups, and the log-rank test was used to compare the survival curves. Finally, an additional Cox proportional hazards regression model was fitted as a non-neural-network method, using simple backward stepwise selection. The approach showed better results for patient prognosis prediction and treatment recommendation.

Prior art 2 tested the role of various factors in predicting the tumor response of EGFR-TKI treatment (erlotinib or gefitinib) in advanced NSCLC patients. The predictive factors include clinical history, environmental risk factors, EGFR mutations, and the like. The highest prediction accuracy of the data-driven decision support model reaches 76%. Their method makes it possible to apply the test results to clinical practice and to provide individualized treatment for the patient.

The use of immunotherapy still presents challenges, such as the considerable difficulty of patient selection and of predicting treatment outcome. Testing for predictive biomarkers currently focuses on biopsy markers. Artificial intelligence deep learning algorithms and radiomics can characterize tumors and their microenvironment non-invasively and have shown good results in improving patient selection and outcome prediction. However, the response to immunotherapy is dynamic. Once a patient begins immunotherapy, subsequent tumor responses are often difficult to assess with standardized tools (e.g., RECIST or the Choi criteria). Incorporating the imRECIST criteria, created through modeling, to evaluate the efficacy of immunotherapy would bring a major change to the evaluation of immunotherapy.

Most predictive studies use small patient data sets of limited heterogeneity. At present, models built for postoperative adjuvant chemotherapy do not fit data sets of patients who were not treated surgically. The scope of research needs to be expanded further, and methods that incorporate these data need to be found for clinical application and wider adoption.

In metastatic lung adenocarcinoma, the availability of targeted drugs for genes such as EGFR, ALK, ROS1, MET and HER-2 has greatly improved patients' survival time and quality of life, and with PD-L1, PD-1 and CTLA-4 inhibitors many lung adenocarcinoma patients now survive for more than 5 years. However, the large number of specific targeted drugs, their scope of use and indications, the patient's own condition and specific gene expression, and the combination of different mutated genes all have a great impact on treatment decisions.

Through the above analysis, the problems and defects of the prior art are as follows: the prior art lacks an artificial intelligence neural network model that comprehensively processes multiple kinds of lung cancer data (lung cancer imaging, genetics, genomics and overall physical condition) in a way similar to a clinician's reasoning.

The difficulties in solving the above problems and defects are: 1, case data are scattered. 2, later-line treatment is uncertain and not standardized. 3, understanding of lung cancer driver genes is incomplete. 4, the factors influencing treatment and the range of treatment indications are difficult to define and highly variable. 5, research capability differs greatly between doctors and hospitals of different levels, while patients are difficult to concentrate in large hospitals and follow-up is difficult. 6, clinicians work under heavy pressure and lack real-time tools, so they have little time to complete comprehensive evaluation and tracking of patient information. 7, clinicians lack interest in statistically recording patient information.

The significance of solving the above problems and defects is as follows: 1, an online network platform is built within the local area; on the premise that patient privacy is well protected, it provides clinicians with a MySQL database platform on which patient information can be recorded at any time, so that clinicians can use fragmented time to record and track patient information and treatment. 2, clinicians can also query, in the desensitized patient database, how other doctors treated their patients and the expert group's comments on each group of patients whose treatment has ended, which improves the standardization of treatment. 3, attention is paid to the Human Genome Project and to ongoing clinical genomic studies worldwide in order to screen lung cancer driver genes. A statistical calculation module based on the R language is established, and driver gene calculation is performed on the continuously growing cases in the database using Lasso and multivariate logistic regression, so as to identify the driving factors of lung cancer in our region and even nationwide and to support prevention. 4, similarly, influencing factors are analyzed from the treatment outcomes of patients in the database, deep learning is performed with artificial intelligence to calculate an optimal treatment scheme, and the scheme is manually evaluated by experts, which improves the accuracy of treatment. 5, simple single-purpose R language modules are designed, such as statistical script modules for ROC curves, K-M survival analysis and logistic regression. Using the visual Shiny package, clinicians can enter their own clinical data and examine clinical research questions; this lowers the threshold for research, raises the interest of primary-level doctors in participating in research, improves their medical level, increases their enthusiasm for participating in information collection, and at the same time helps ensure the accuracy of patient information. 6, where conditions permit, a mobile phone follow-up system for patients can be developed to solve the problem of incomplete patient information. 7, a cross-validation module for small samples is provided so that statistical calculation can be performed on small-sample data, which raises clinicians' enthusiasm for research; an expert comment query function is also provided, which raises the participation of primary-level clinicians.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an artificial intelligence neural network learning model construction system and construction method.

The invention is realized as follows: the method for constructing the artificial intelligence neural network learning model comprises the following steps:

constructing an artificial intelligence algorithm model from collected valid case data, and training and optimizing the constructed model to obtain an optimized artificial intelligence algorithm model.

The model is a machine learning model built with big-data statistics in the R language. It aims to use existing data to mine risk factors for disease onset and factors influencing disease outcome, to summarize and quantify these factors, and ultimately to prevent disease and treat it accurately.

The model can also be applied more broadly, combining tumor sequencing data with clinical outcomes to predict the survival of cancer patients, and even to other diseases such as coronary heart disease and diabetes. It relies on ensemble learning, using outcome prediction from multiple types of data to generate a composite risk score. More specifically, the model applies penalized Cox regression methods (LASSO, ridge and elastic net) and generates cross-validated genomic risk scores. The score can then be used to stratify patients into different risk subgroups. The program also generates visualizations of feature importance so that biomarkers predictive of clinical outcome can be identified. A verification module is also included for calculating risk scores for the verification data set.
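The risk score and stratification described above can be illustrated with a minimal Python sketch. It assumes the penalized Cox coefficients have already been fitted elsewhere; the coefficient values, feature names, patient records and the median cutoff are all hypothetical choices for illustration and are not taken from the patent.

```python
from statistics import median

def risk_score(features, coefficients):
    """Linear predictor: sum of coefficient * feature value for one patient."""
    return sum(coef * features.get(name, 0.0) for name, coef in coefficients.items())

def stratify(patients, coefficients):
    """Score every patient and split them into 'high' and 'low' risk at the median."""
    scores = {pid: risk_score(f, coefficients) for pid, f in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups

# Hypothetical coefficients, standing in for the output of a penalized Cox fit.
coefficients = {"gene_a": 0.8, "gene_b": -0.5, "smoker": 0.3}

# Hypothetical patients with binary (present/absent) features.
patients = {
    "p1": {"gene_a": 1, "gene_b": 0, "smoker": 1},
    "p2": {"gene_a": 0, "gene_b": 1, "smoker": 0},
    "p3": {"gene_a": 1, "gene_b": 1, "smoker": 0},
    "p4": {"gene_a": 0, "gene_b": 0, "smoker": 0},
}

scores, cutoff, groups = stratify(patients, coefficients)
for pid in sorted(groups):
    print(pid, round(scores[pid], 2), groups[pid])
```

A median split is only one possible cutoff; tertiles or a cutoff chosen on the training cohort would follow the same pattern.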

Here, a data set is simulated to illustrate its use. A total of 300 subjects were simulated. Their survival times were simulated using an exponential model with uniform censoring. A total of 15 features were simulated, 5 of which were related to the outcome and were named "usefrugene"; all other features were named "junkgGene". The following is code for generating data:
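The original listing does not appear in this text. As a stand-in, the simulation can be sketched in Python using only the standard library. The sketch assumes standard normal features, a common effect size of 0.5 for the five outcome-related features, a baseline hazard of 0.1 and a uniform censoring window of 20 time units; none of these values, nor the names `useful_*`, `junk_*` and `simulate_survival_data`, come from the patent.

```python
import math
import random

def simulate_survival_data(n_subjects=300, n_features=15, n_useful=5,
                           effect=0.5, baseline_rate=0.1,
                           max_followup=20.0, seed=1):
    """Simulate features plus (time, status) pairs under an exponential model."""
    rng = random.Random(seed)
    names = ([f"useful_{i + 1}" for i in range(n_useful)]
             + [f"junk_{i + 1}" for i in range(n_features - n_useful)])
    features, surv_data = [], []
    for _ in range(n_subjects):
        x = [rng.gauss(0.0, 1.0) for _ in names]
        # Only the first n_useful features influence the hazard.
        linear_predictor = effect * sum(x[:n_useful])
        rate = baseline_rate * math.exp(linear_predictor)
        event_time = rng.expovariate(rate)             # exponential survival time
        censor_time = rng.uniform(0.0, max_followup)   # uniform censoring time
        observed_time = min(event_time, censor_time)
        status = 1 if event_time <= censor_time else 0  # 1 = event, 0 = censored
        features.append(dict(zip(names, x)))
        surv_data.append({"time": observed_time, "status": status})
    return names, features, surv_data

names, features, surv_data = simulate_survival_data()
n_events = sum(row["status"] for row in surv_data)
print(f"{len(surv_data)} subjects, {len(names)} features, {n_events} events")
```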

The first data set (named "surfdata") contains only the time and the censoring variable, that is, time and status. The time may be PFS, OS, duration of drug response, age of onset, etc., and the status may be the presence or absence of an extracted feature value. Similarly, pixel locations in image data can also serve as features after processing.

In the second step, penalized Cox regression is performed on the presence or absence of each feature site against time, and relevant features are screened using Lasso, ridge, elastic net and the like. The Lasso procedure can be repeated 100 to 200 times to determine the risk factors that affect time. A risk coefficient score is then assigned to each risk factor.

In the third step, the extracted risk factors are entered into a logistic regression analysis to produce a prediction result.

Further, the artificial intelligence neural network learning model construction method comprises the following steps:

step one, acquiring valid basic information, disease stage, lung cancer related driver gene test data, tumor marker indexes, important biochemical indexes, miRNA and chromosome status (assessed where conditions permit), and physician-assessed PS score data of lung cancer patients;

step two, performing feature extraction on the acquired data to obtain a feature sample set; preprocessing the feature sample set to obtain normalized sample data; dividing the normalized sample data into a training data set and a verification data set;

step three, constructing an artificial intelligence algorithm model, and training the constructed model with the training data set; verifying the model with the verification data set, and optimizing it based on the verification result; and obtaining the optimized artificial intelligence algorithm model.
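The normalization and division in step two can be sketched as follows. This is a minimal Python illustration that assumes min-max scaling and a random split; the text specifies neither the normalization method nor the split ratio, and the function names and sample values are hypothetical.

```python
import random

def min_max_normalize(rows):
    """Scale each column to [0, 1]; a constant column is mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def split_train_validation(rows, validation_fraction=0.3, seed=0):
    """Shuffle row indices and split them into training and verification sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_validation = int(round(len(rows) * validation_fraction))
    validation = [rows[i] for i in indices[:n_validation]]
    train = [rows[i] for i in indices[n_validation:]]
    return train, validation

# Hypothetical feature rows: [age in years, tumor marker level].
samples = [[60, 1.0], [40, 3.0], [50, 2.0], [70, 2.0]]
normalized = min_max_normalize(samples)
train_set, validation_set = split_train_validation(normalized, validation_fraction=0.25)
print(normalized)
print(len(train_set), len(validation_set))
```

Normalizing before splitting follows the order given in the text; a common alternative is to compute the scaling parameters on the training set only and reuse them for the verification set.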

Another object of the present invention is to provide an artificial intelligence neural network learning model building system for implementing the artificial intelligence neural network learning model building method, the artificial intelligence neural network learning model building system including:

the data acquisition module is used for acquiring patients' basic information, disease stage, lung cancer related driver gene test data, tumor marker indexes, important biochemical indexes, miRNA and chromosome status (assessed where conditions permit), and physician-assessed PS score data;

the data screening module is used for screening the acquired data and eliminating the relevant data of the lung cancer patients which do not meet the standard;

the characteristic extraction module is used for extracting the characteristics of the screened data to obtain a characteristic sample set;

the data processing module is used for preprocessing the characteristic sample set data to obtain a normalized sample set;

the data dividing module is used for dividing the obtained normalized sample set into a training data set and a verification data set;

the model construction module is used for constructing an artificial intelligence algorithm model;

the model training module is used for training the constructed artificial intelligence algorithm model by utilizing a training data set;

the model verification module is used for verifying the trained artificial intelligence algorithm model by utilizing a verification data set;

and the optimization module is used for optimizing the verified artificial intelligence algorithm model based on the verification result.

Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: the invention performs prognostic stratification of lung adenocarcinoma patients using basic clinical information together with additional information such as gene mutation detection, TMB and PD-L1 status evaluation; finally, through deep learning, and drawing on the latest international trials and meta-analyses, a consensus of lung cancer experts in the autonomous region is formed. The invention provides an optimal treatment for patients after stratification by the different effective variables. It effectively addresses the mismatch whereby oncologists' workload has risen sharply while the yearly growth in medical resources and personnel falls far short of social demand; in the context of tiered diagnosis and treatment, it prolongs the survival of tumor patients. Combined with a cloud platform, it can rapidly raise the level of lung cancer diagnosis and treatment in primary hospitals in all areas, facilitates tiered diagnosis and treatment, brings high-quality medical resources down to the primary level in technical terms, and optimizes the overall allocation of medical resources. It improves the survival chances of lung cancer patients and saves a large amount of medical insurance funds and medical resources.

The invention uses artificial intelligence to assist doctors in image diagnosis, freeing them to some extent from tedious, repetitive and inefficient work and improving diagnostic efficiency and accuracy. Doctors can devote more effort to selecting and optimizing treatment schemes and have more time to see patients, which raises the rate of diagnosis. Valid case data are collected and imported into an artificial intelligence model development system; an artificial intelligence diagnosis algorithm model is developed step by step and continuously trained and optimized to form the hospital's own algorithm model. Through cross-field, cross-discipline research that closely combines basic science, clinical practice, imaging and computer algorithms, a breakthrough is achieved in key technical methods for lung cancer screening and treatment, providing a tool for clinical detection of high-risk patients, improving diagnostic efficiency and standardizing the use of targeted drugs. In addition, combined with cloud platform technology, artificial intelligence is extended to technically weaker medical units such as primary-level institutions, which facilitates tiered diagnosis and treatment, brings high-quality medical resources down to the primary level, and optimizes the overall allocation of medical resources. Early identification, early diagnosis and early treatment of lung cancer are realized, the level of lung cancer diagnosis and treatment is improved, the existing platform can be used for public education on lung cancer prevention, and the prevalence of lung cancer and unnecessary social burden are reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.

Fig. 1 is a schematic diagram of a method for constructing an artificial intelligence neural network learning model according to an embodiment of the present invention.

Fig. 2 is a development flow of an intelligent lung nodule screening algorithm provided in an embodiment of the present invention.

Fig. 3 is a flowchart of a method for constructing an artificial intelligence neural network learning model according to an embodiment of the present invention.

FIG. 4 is a schematic structural diagram of an artificial intelligence neural network learning model construction system provided by an embodiment of the present invention;

In the figure: 1. data acquisition module; 2. data screening module; 3. feature extraction module; 4. data processing module; 5. data dividing module; 6. model construction module; 7. model training module; 8. model verification module; 9. optimization module.

Fig. 5 is a schematic diagram of an application and optimization flow of an intelligent lung nodule screening algorithm according to an embodiment of the present invention.

Fig. 6 is a subsequent flow chart of a lung nodule entering a screening system provided by an embodiment of the present invention.

FIG. 7 is a flowchart illustrating risk factors and efficacy prediction for candidate medical records according to an embodiment of the present invention.

Fig. 8 is a flow chart of an operation mode of an artificial intelligence neural network learning system for accurate adenocarcinoma treatment according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of a lung nodule contour segmentation module according to an embodiment of the present invention.

Fig. 10 is a schematic diagram of a lung nodule benign and malignant prediction module according to an embodiment of the present invention.

FIG. 11 is a schematic diagram of the analysis of the risk classification of diseases and the affected survival time of different driver genes according to the embodiment of the present invention.

Fig. 12 is a graphical illustration of the impact of analyzing the patient's different risk scores on survival provided by an embodiment of the present invention.

FIG. 13 is a schematic diagram of survival prediction by a scoring system using data statistics of bad prognosis indexes of different genes according to an embodiment of the present invention.

Fig. 14 is a schematic diagram of risk level prediction according to an embodiment of the present invention.

Fig. 15 is a flowchart of a method for intelligent treatment of lung adenocarcinoma according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Application example: taking the monitoring of genetic loci of coronary heart disease susceptibility genes as an example, a Cox survival model is established with the time to onset in the coronary heart disease group and the non-coronary heart disease group as the survival time, and with all genetic loci and the patients' living habits, biochemical indexes and the like as risk factors. First, the risk factors for coronary heart disease are calculated using modules such as LASSO and elastic net. Cross-validation is then performed, risk factor scores are established through Cox regression, patients are stratified, and the validity of the stratification is checked with K-M survival analysis. Finally, a coronary heart disease risk factor scoring model is established, and logistic regression is used to independently predict the risk coefficient of new patients for coronary heart disease. The model can be progressively improved as the volume of data grows.
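The K-M survival analysis used above to check the stratification can be illustrated with a minimal product-limit estimator in Python. This is a sketch for a single group, assuming `events` uses 1 for an observed event and 0 for a censored observation; the function name and the sample data are hypothetical. In practice one curve would be computed per risk stratum and the curves compared.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical follow-up times (months) and event indicators for one stratum.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 3))
```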

Aiming at the problems in the prior art, the invention provides an artificial intelligence neural network learning model construction system and a construction method, and the invention is described in detail below with reference to the accompanying drawings.

The method for constructing the artificial intelligence neural network learning model provided by the embodiment of the invention comprises: constructing an artificial intelligence algorithm model from collected valid case data, and training and optimizing the constructed model to obtain an optimized artificial intelligence algorithm model.

As shown in fig. 1 to fig. 3, the method for constructing an artificial intelligence neural network learning model according to an embodiment of the present invention includes the following steps:

S101, obtaining valid basic information, disease stage, lung cancer related driver gene test data, tumor marker indexes, important biochemical indexes, miRNA and chromosome status (assessed where conditions permit), and physician-assessed PS score data of lung cancer patients;

s102, performing feature extraction on the acquired data to obtain a feature sample set; preprocessing the acquired characteristic sample set to obtain normalized sample data; dividing the obtained normalized sample data into a training data set and a verification data set;

s103, constructing an artificial intelligence algorithm model, and training the constructed artificial intelligence algorithm model by using the obtained training data set; verifying the constructed artificial intelligence algorithm model by using a verification data set, and optimizing the artificial intelligence algorithm model based on a verification result; and obtaining the optimized artificial intelligence algorithm model.

Those skilled in the art can also implement the artificial intelligence neural network learning model construction method provided by the present invention by adopting other steps, and the artificial intelligence neural network learning model construction method provided by the present invention in fig. 1 is only a specific embodiment.

As shown in fig. 4, the artificial intelligence neural network learning model construction system provided by the embodiment of the present invention includes:

the data acquisition module 1 is used for acquiring patients' basic information, disease stage, lung cancer related driver gene test data, tumor marker indexes, important biochemical indexes, miRNA and chromosome status (assessed where conditions permit), and physician-assessed PS score data;

the data screening module 2 is used for screening the acquired data and eliminating the relevant data of the lung cancer patients which do not meet the standard;

the characteristic extraction module 3 is used for extracting the characteristics of the screened data to obtain a characteristic sample set;

the data processing module 4 is used for preprocessing the characteristic sample set data to obtain a normalized sample set;

the data dividing module 5 is used for dividing the obtained normalized sample set into a training data set and a verification data set;

the model building module 6 is used for building an artificial intelligence algorithm model;

the model training module 7 is used for training the constructed artificial intelligence algorithm model by utilizing a training data set;

the model verification module 8 is used for verifying the trained artificial intelligence algorithm model by utilizing a verification data set;

and the optimization module 9 is used for optimizing the verified artificial intelligence algorithm model based on the verification result.

The technical effects of the present invention will be further described with reference to specific embodiments.

Example 1:

The invention uses artificial intelligence to assist doctors in image diagnosis, freeing them to some extent from tedious, repetitive and inefficient work and improving diagnostic efficiency and accuracy. Doctors can devote more effort to selecting and optimizing treatment schemes and have more time to see patients, which raises the rate of diagnosis. Valid case data are collected and imported into an artificial intelligence model development system; an artificial intelligence diagnosis algorithm model is developed step by step and continuously trained and optimized to form the hospital's own algorithm model. Through cross-field, cross-discipline research that closely combines basic science, clinical practice, imaging and computer algorithms, a breakthrough is achieved in key technical methods for lung cancer screening and treatment, providing a tool for clinical detection of high-risk patients, improving diagnostic efficiency and standardizing the use of targeted drugs. In addition, combined with cloud platform technology, artificial intelligence is extended to technically weaker medical units such as primary-level institutions, which facilitates tiered diagnosis and treatment, brings high-quality medical resources down to the primary level, and optimizes the overall allocation of medical resources. Early identification, early diagnosis and early treatment of lung cancer in our city and the surrounding areas are realized, the level of lung cancer diagnosis and treatment is improved, the existing platform can be used for public education on lung cancer prevention, and the incidence of lung cancer and unnecessary social burden are reduced.

The main test contents comprise:

1. Application and development of an artificial intelligence pulmonary nodule screening system: the study applies a fully automatic lung nodule detection method, analyzes multiple attributes of each detected nodule, and provides a probability analysis of benignity and malignancy as an important reference for the doctor's diagnosis. This enables early diagnosis and early treatment of lung cancer, reduces the burden on patients and society, and allows the system algorithm to be optimized.

2. Integrating risk factor data of all patients diagnosed with lung adenocarcinoma in the last 5 years in two hospitals, such as (smoking, family history, radiation exposure history, tuberculosis infection history, lung cancer driver gene expression and the like) to perform hierarchical classification, following diseases PFS and ORR duration of the patients for 24 months, performing algorithm modeling by OS, adjusting according to a non-small cell lung cancer guideline of NCCN2019 and combining with not less than 5 positive high-grade experts in the autonomous region, and deducing an optimal treatment scheme of each hierarchically classified patient. And manufacturing a corresponding intelligent program, and continuously bringing the program into a new patient for optimization through machine deep learning.

Example 2:

1. The test method of the lung adenocarcinoma risk factor hierarchical management intelligent prediction system comprises the following steps:

1.1 test object

The case inclusion criteria of the lung adenocarcinoma risk factor hierarchical management intelligent prediction system are:

(1) age of 18 years or older;

(2) lung adenocarcinoma confirmed by surgery, lung puncture biopsy or lymph node biopsy;

(3) complete basic clinical information, with follow-up for 24 months feasible;

(4) at least two high-resolution CT examinations of the lung (initial examination and re-examination);

(5) efficacy evaluation carried out by more than 4 major physicians.

Exclusion criteria:

(1) age under 18 years;

(2) incomplete basic information data;

(3) no related imaging data;

(4) no pathological diagnosis, or a pathological diagnosis of adenosquamous carcinoma.

The data are randomly divided into two cohorts: A, the derivation cohort, and B, the validation cohort.
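The random division into two cohorts can be illustrated with a minimal Python sketch. The function name `split_cohorts`, the equal split fraction and the fixed seed are assumptions made for illustration only; the description does not state the proportion assigned to each cohort.

```python
import random

def split_cohorts(patient_ids, derivation_fraction=0.5, seed=0):
    """Randomly assign patients to cohort A (derivation) or B (validation).

    derivation_fraction and seed are illustrative assumptions, not values
    taken from the description.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    cut = round(len(ids) * derivation_fraction)
    return {"A_derivation": ids[:cut], "B_validation": ids[cut:]}

# Hypothetical patient identifiers 1..100
cohorts = split_cohorts(range(1, 101))
```

Every patient ends up in exactly one cohort, so no case is used both to derive and to validate the model.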

1.2 inspection method

The patient is seen in the outpatient clinic, where the basic information (sex, age, family tumor history, smoking history, radiation exposure) is completed. The following are then recorded: disease staging (TNM staging, filled in by the patient or physician through a simple program), lung cancer-related driver gene screening (EGFR, ALK, ROS1, RET, MET, K-RAS, BRAF V600E, etc.), the patient's tumor marker indicators, important biochemical indicators, miRNA and chromosomal status where conditions permit, and the physician's assessment of the patient's PS score. Each item is entered in its own list.

2.2 test methods

Public health awareness is gradually increasing and the use of high-resolution CT for lung cancer screening is expanding, so it is necessary to prioritize who is screened. About ten thousand people undergo lung CT screening in the two hospitals every year, and the lung nodule positive rate is about 10 percent. Non-invasive urinary biomarkers associated with lung cancer risk, i.e., tumor-derived metabolites, can help determine an individual's priority. Combining such biomarkers with high-resolution CT screening in high-risk populations can increase the effectiveness of screening and reduce cost and morbidity (risk biomarkers, fig. 6).

The situation is further complicated by the high rate of positive findings on high-resolution CT scans: many nodules prompt further invasive testing but are ultimately not diagnosed as lung cancer. On further testing, 96.4% of the lung nodules initially found at screening were considered non-cancerous. A recent retrospective study of the clinical management of U.S. patients found large differences in how lung nodules were managed, including whether a nodule was tuberculous, which resulted in a large number of unnecessary invasive surgeries. For historical reasons, the proportion of patients with tuberculosis is larger in China. Non-invasive biomarkers are therefore needed to help distinguish malignant nodules from benign or indolent lesions (diagnostic biomarkers, fig. 6).

Given these unmet needs in early management, non-invasive blood- and urine-based biomarkers as well as tissue-based biomarkers need to be studied (table 4). These data include various types of tumor marker data for lung tumors. Combined with lung CT nodule identification, they can help stratify patients into risk categories, support cancer diagnosis and treatment, and generate hypotheses about tumor biology. For example, elevated levels of IL-6, CRP and IL-8 were found to be relevant in the NCI-MD case-control study and confirmed in NCI's Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial. Risk prediction models may, however, need to account for differences in ethnicity and other biomarker characteristics.

Up to 60% of lung cancers are detected by high-resolution CT. The earlier the diagnosis is confirmed, the greater the chance of surgery and of cure. Combining advanced intelligent learning with lung cancer tumor marker screening to determine whether lung nodules are benign or malignant allows patients to be diagnosed and treated earlier, avoids a large amount of unnecessary further examination and surgical injury, and reduces the economic burden on patients and society. The patient can be informed of his or her condition through a visual web program, which dispels the fear of cancer and eases tension between doctors and patients.

In the future, combination therapy for early, operable stage I non-small cell lung cancer will be taken a step further. Although early lung cancer is amenable to surgery followed by chemotherapy, 30% of patients who receive surgical treatment will die from tumor recurrence. Once sufficient data have been accumulated on the present data platform and combined with foreign study data, non-invasive or tissue-based biomarkers will be used for the molecular classification of stage I patients, so that the population at high risk of relapse can be predicted and clinical management improved (prognostic biomarkers, fig. 6). High-risk patients may receive adjuvant chemotherapy or innovative checkpoint immunotherapy, while low-risk patients may safely avoid further treatment and instead be monitored by high-resolution CT. In summary, the present invention addresses three key issues.

The needs in early-stage lung cancer are: (1) determining the priority of high-risk individuals for high-resolution CT screening (screening); (2) assessing the malignancy of lung nodules to reduce over-diagnosis and unnecessary surgery (diagnosis); (3) identifying stage I patients at high risk of relapse (prognosis).

2. The discovery and validation of biomarkers is a main component of precision medicine strategies, is a major trend in future medical care, and is the goal to be achieved by the software.

Four basic preconditions are involved.

1. The center of disease information is the integrated measurement of various molecules from individual patients (collectively referred to as "omics" data). This multi-tiered molecular database may include comprehensive analysis of chromosomes, genomes, epigenomes, transcriptomes, metabolomes, proteomes and microbiomes, as well as clinical analysis. It also includes epidemiological data and whether the patient has previously had specific diseases, such as tuberculosis.

2. These data are integrated into a knowledge network that examines the interconnectivity between the data layers of the information center.

3. After the knowledge data are generated, experts need to further optimize the network. The NCI-MATCH (Molecular Analysis for Therapy Choice) trial initiated in the United States is an example of a precision oncology clinical trial; its purpose is to assess whether treating cancers according to their molecular abnormalities improves patient prognosis. The Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trial applies this concept to the treatment of patients with early-stage non-squamous non-small cell lung cancer and, through comprehensive genomic analysis, feeds the results back to an information-sharing zone. Correct treatment was achieved in 76% of cases.

4. Because of the large number of features measured in precision medicine, the analysis strategy must evolve to avoid overfitting when the number of features greatly exceeds the number of samples.

After the above data entry is completed, the biomarkers and clinical data are modeled to obtain their specific weights for individual treatment outcomes, and care is taken that the model remains applicable beyond the set of samples used to derive the biomarkers. A series of recommendations has been proposed in medical research to guide the use of omics-based biomarkers in clinical trials (NCCN guidelines). In the discovery phase, biomarkers should be confirmed in a set of samples independent of the original findings, the primary data and computational procedures should be disclosed, and the derived algorithms should be precisely defined.

Briefly, a cohort of sufficient size (nested case-control or case series) with well-planned epidemiological and clinical data is required. Candidate biomarkers are selected using indices associated with the disease or with the outcome of its treatment. Rigorous assessment requires evidence of statistically significant risk separation and of improved predictive value over known risk factors, including age and smoking. For candidate biomarkers that pass this threshold, the established model is then validated within the same patient cohort using a second, targeted assay, such as quantitative RT-PCR (qRT-PCR) or pyrosequencing, to avoid platform-specific bias. This step also allows an easier assay to be developed that can accommodate a larger number of samples than the initial integrated analysis platform.

Where possible, the present invention focuses on assays that can be developed as recommended in the NCCN guidelines and performed in standard laboratories, in preparation for full clinical use in the future. However, this level of technology has not yet been reached in our region, and many late-stage patients refuse the invasive procedures needed to confirm the diagnosis. A missing-data group without pathological molecular testing is therefore deliberately set up among the key indexes of the cohort, and the treatment of patients without molecular biological indexes is discussed briefly.

Likewise, to demonstrate robustness, the biomarkers are further evaluated in at least one completely independent cohort. The choice of this cohort also makes it possible to introduce other lung cancer-related factors that may affect the broad applicability of the biomarkers, such as patient ethnicity and smoking history. A fully specified analysis and the related computational procedures are generated from this validation and prepared for further evaluation in other data sets, such as publicly available microarray data sets in the case of gene expression-based biomarkers. Samples are collected for risk biomarkers. Finally, clinical utility should be examined in prospective clinical trials.

Recently, several studies have investigated the survival benefit of adjuvant chemotherapy in NSCLC together with other influencing factors such as age, gender, stage and genetic profile. However, the benefit of adjuvant chemotherapy has not been properly assessed. The aim of the present invention in this study is to develop a predictive model that distinguishes NSCLC patients who are suitable for adjuvant chemotherapy from those who should avoid it.

The study of the present invention develops a classifier for detecting or diagnosing disease by using a machine learning technique known as the artificial neural network (ANN). It is a machine learning method modeled on the way humans apply knowledge obtained from past experience to new problems. The ANN uses previously solved examples to build a system of "neurons" that makes new decisions, classifications and predictions. A basic ANN receives a number of inputs (which may come from raw data or from the outputs of other ANNs). Each input is connected to a neuron with a different weight, and the output of the neuron is generated by an activation function. In the field of ANN research, multi-layer perceptrons (MLPs) and radial basis function (RBF) networks are common types of ANN model with strong classification capability. Both can solve very complex pattern classification problems, but they differ somewhat in structure and function. Of the two, the MLP is the type of ANN model most widely used for classification and regression in medical research. The MLP consists of three parts: an input layer, a hidden layer and an output layer. Input layer neurons accept a large number of nonlinear inputs. The output layer delivers the result after the signal has been weighted, analyzed and transmitted by the neurons. The hidden layer is composed of many neurons and comprises all layers linking the input layer and the output layer. There may be multiple hidden layers, although an MLP network typically uses one. An MLP can distinguish between data that are not linearly separable.
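The structure described above (weighted inputs, an activation function, and one hidden layer between input and output) can be sketched in a few lines of Python. This is an illustrative forward pass only, not the network of the invention: the function names, the tanh and sigmoid activations and the hand-chosen weights are all assumptions. The weights are picked so that the network computes XOR, a standard example of data that are not linearly separable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron sums its weighted inputs,
    adds a bias, and passes the result through the activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer (tanh) -> output layer (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    return dense(hidden, out_w, out_b, sigmoid)

# Hand-chosen (not trained) weights, assumed for illustration only.
HIDDEN_W = [[10.0, 10.0], [10.0, 10.0]]
HIDDEN_B = [-5.0, -15.0]
OUT_W = [[10.0, -10.0]]
OUT_B = [-10.0]

xor_outputs = {
    (a, b): mlp_forward([a, b], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)[0]
    for a in (0, 1) for b in (0, 1)
}
```

The first hidden neuron switches on when at least one input is 1 and the second only when both are 1; the output neuron fires when the first is on and the second is off. No single-layer linear classifier can reproduce this pattern, which is why the hidden layer matters.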

In recent years, a large amount of genetic data has been generated owing to progress in high-throughput gene expression measurement techniques. Many studies report survival analyses that combine gene expression data and other high-dimensional genomic data with complete clinical data (including age, race, sex, survival time, adjuvant chemotherapy, adjuvant radiation therapy and staging).

2 establishing lung cancer related gene subset

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes, containing information on all known lung cancer diseases and over 12,000 genes. Based on the results of previous studies and the practical situation in the region, the invention collects the list of lung cancer-related genes from the database and maps it to the corresponding microarray probes. This reduces the number of target genes and simplifies computation. A further advantage is that the biological meaning of the results is easy to interpret.

2.2 quintile hazard factor conversion

Each gene's expression value is converted, according to its quintile, to a level from 0 to 4 (very low, low, normal, high and very high). The goal is to reduce individual differences in gene expression and to ease the transition to other detection techniques in future use.
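A minimal sketch of the quintile conversion, assuming that the cut points are computed per gene across all patients in the data set. The function name `to_quintile_levels` and the example expression values are hypothetical, and the description does not specify how values falling exactly on a cut point are handled.

```python
import bisect
import statistics

LEVEL_NAMES = ["very low", "low", "normal", "high", "very high"]

def to_quintile_levels(values):
    """Map one gene's expression values to levels 0-4 by quintile."""
    # Four cut points separate the five quintiles.
    cuts = statistics.quantiles(values, n=5, method="inclusive")
    # Assumption: a value equal to a cut point goes to the higher level.
    return [bisect.bisect_right(cuts, v) for v in values]

# Hypothetical expression values of one gene in ten patients
expression = [7.0, 1.0, 10.0, 4.0, 2.0, 9.0, 5.0, 3.0, 8.0, 6.0]
levels = to_quintile_levels(expression)
level_names = [LEVEL_NAMES[level] for level in levels]
```

Because only the rank of a value within its gene matters, the resulting levels do not depend on the measurement scale of the platform that produced them.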

2.3 adjuvant chemotherapy benefit Classification

In order to build a predictive model that assesses which patients are suited to adjuvant chemotherapy, the present invention classifies all patients into an adjuvant chemotherapy-ineffective group and an adjuvant chemotherapy-benefit group. Since about 50% of these patients do not survive beyond 24 months, the present invention sets the classification threshold to 24 months. All patients are first divided into an adjuvant chemotherapy group and an observation (OBS) group according to the treatment received. In the adjuvant chemotherapy group, survival of more than 24 months indicates that the patient was significantly helped by adjuvant chemotherapy, whereas survival of less than 24 months indicates that the patient was not; such patients should be spared the damage caused by chemotherapy. In the OBS group, survival of less than 24 months indicates that the patient might have needed adjuvant chemotherapy, whereas survival of more than 24 months indicates a good prognosis without the need for adjuvant chemotherapy. As shown in figure 2, all patients are thus divided into two groups based on their survival time and adjuvant chemotherapy information. Patients who live less than 24 months with adjuvant chemotherapy or more than 24 months without adjuvant chemotherapy belong to the adjuvant chemotherapy-ineffective group. Patients who live more than 24 months with adjuvant chemotherapy or less than 24 months without adjuvant chemotherapy belong to the adjuvant chemotherapy-benefit group.
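The grouping rule can be written as a short Python function. This is a sketch of the labeling logic only; the function name is hypothetical, and treating survival of exactly 24 months as not exceeding the threshold is an assumption, since the description does not address that boundary case.

```python
def chemo_benefit_group(survival_months, received_adjuvant_chemo, threshold=24):
    """Label a patient as 'benefit' or 'ineffective' for adjuvant chemotherapy.

    benefit:     survived > threshold with chemotherapy, or
                 survived <= threshold without chemotherapy
    ineffective: survived <= threshold with chemotherapy, or
                 survived > threshold without chemotherapy
    """
    survived_long = survival_months > threshold  # assumption: exactly 24 counts as short
    return "benefit" if survived_long == received_adjuvant_chemo else "ineffective"

# Hypothetical patients: (survival in months, received adjuvant chemotherapy)
examples = [(36, True), (10, True), (36, False), (10, False)]
labels = [chemo_benefit_group(months, chemo) for months, chemo in examples]
```

The rule reduces to a single comparison: a patient is in the benefit group exactly when long survival and chemotherapy either both occurred or both did not.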

To select gene signatures for constructing the prediction model, the present invention calculates the chi-square statistic between each variable (gene) and the adjuvant chemotherapy-benefit/adjuvant chemotherapy-ineffective grouping for each data set. The top 10 genes most strongly associated with adjuvant chemotherapy benefit are identified for each data set and used as the target genes for machine learning, i.e., as variables of the survival benefit prediction model. Clinical data such as gender, age, T stage and N stage are also used as ANN variables. The ten genes identified in this way are used to train the ANN, with different combinations of them tried as ANN inputs. The final input signature genes of the best ANN model are evaluated by 10-fold cross-validation and Kaplan-Meier survival analysis, and the best ANN result over all combinations is retained. In 10-fold cross-validation, the raw data are randomly divided into 10 equally sized subsets. One subset is held out as validation data for testing the ANN model, and the remaining 9 subsets are used as training data. The cross-validation process is repeated 10 times (once per fold), so that each of the 10 subsets is used exactly once as test data. The 10 sets of validation results are then combined into a single data set, from which the KM curve estimate is obtained using every case.
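The two procedures described above, chi-square ranking of genes and 10-fold splitting, can be sketched in plain Python as follows. The sketch assumes that gene expression has already been converted to quintile levels and that the benefit grouping is coded as 0/1; the function names and the two example genes are hypothetical.

```python
import random
from collections import Counter

def chi_square(levels, labels):
    """Pearson chi-square statistic of the levels-by-labels contingency table."""
    n = len(levels)
    observed = Counter(zip(levels, labels))
    row_totals, col_totals = Counter(levels), Counter(labels)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def top_genes(expression, labels, k=10):
    """expression maps gene name -> list of levels, one entry per patient."""
    scores = {gene: chi_square(levels, labels) for gene, levels in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, held_out) index lists; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, held_out

# Hypothetical data: 1 = benefit group, 0 = ineffective group
group = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {"GENE_A": [0, 0, 0, 0, 4, 4, 4, 4],   # tracks the grouping
         "GENE_B": [0, 4, 0, 4, 0, 4, 0, 4]}   # unrelated to the grouping
selected = top_genes(genes, group, k=1)
splits = list(k_fold_indices(50, k=10))
```

Pooling the held-out predictions from all ten folds gives one prediction per patient, which is what allows a single KM curve to be estimated over the whole data set.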

2.4 building a predictive model

The present invention uses a popular machine learning approach, the artificial neural network (ANN), to model survival in NSCLC. ANN approaches have been shown to improve the accuracy of cancer survival outcome prediction. The present invention implements several types of ANN algorithm in order to identify the optimal ANN architecture. Reliability is assessed by cross-data-set validation: the data sets are randomly selected so that one source provides the training set and the other sources provide the test set. All networks are trained using commercial software (STATISTICA version 25.0). In the supervised training phase, the data set is presented to the ANN together with the correct output, which is the adjuvant chemotherapy class derived from the median overall survival (24 months) and the adjuvant chemotherapy information. Patients cannot be assigned directly to the adjuvant chemotherapy-benefit or adjuvant chemotherapy-ineffective group from their gene expression values. Instead, patients who live less than 24 months with adjuvant chemotherapy or more than 24 months without it are assigned to the adjuvant chemotherapy-ineffective group, and all other patients are assigned to the adjuvant chemotherapy-benefit group. The trained ANN is then used for prediction. Because there is no perfect way to design an ideal ANN, and the optimal number of hidden nodes and iterations is unknown, the best design is usually determined by trial and error. To identify the optimal model, the present invention builds and trains many ANN architectures on the training data, using combinations of different activation functions and different numbers of hidden-layer neurons. All models are tested on the test data to determine the predictive accuracy of their risk classification, and the network with the most accurate classification is retained.
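The trial-and-error search over architectures can be sketched as a simple loop. The sketch is independent of any particular training software: `train_and_score` is a hypothetical callable that would train one network and return its accuracy on the test data, and `placeholder_score` is a stand-in with invented values used only to make the example runnable.

```python
from itertools import product

def select_best_architecture(activations, hidden_sizes, train_and_score):
    """Try every (activation, hidden size) pair and keep the most accurate one."""
    best = None
    for activation, n_hidden in product(activations, hidden_sizes):
        accuracy = train_and_score(activation, n_hidden)
        if best is None or accuracy > best[0]:
            best = (accuracy, activation, n_hidden)
    return best

def placeholder_score(activation, n_hidden):
    """Stand-in for real training; the returned numbers are not results."""
    return (1 if activation == "tanh" else 0) - abs(n_hidden - 8)

best_score, best_activation, best_hidden = select_best_architecture(
    ["logistic", "tanh", "exponential"], [2, 4, 8, 16], placeholder_score
)
```

In practice the scoring function would wrap the training run and the evaluation on held-out test data, so that the retained network is the one with the best out-of-sample classification.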

2.5 evaluation by Kaplan-Meier survival analysis and log rank test

Many studies use machine learning methods to predict patient survival. In this study, the prediction results contain censored data. Validating predictions that include censored data with a general machine learning validation method is unfair, and the result may not be meaningful. Kaplan-Meier survival analysis (KM plot) is a good method for evaluating prediction results with censored data. The KM plot is an estimate of the survival function; in medical research it is used to measure survival time after treatment. An important advantage of the KM plot is that the method can take censored data into account. The log-rank test is used to determine whether survival differs significantly between groups, treatments, etc. The present invention validates the predicted results with a hybrid of K-fold cross-validation and KM plots.
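How the Kaplan-Meier estimate accounts for censored cases can be seen in a minimal Python sketch of the product-limit estimator. The function name and the six example patients are hypothetical; a censored patient leaves the risk set without counting as a death.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times[i]  : follow-up time of patient i (e.g. months)
    events[i] : True if death was observed, False if the case is censored
    Returns [(time, survival probability)] at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set here
    return curve

# Hypothetical follow-up data for six patients
months = [6, 12, 12, 18, 24, 30]
died = [True, True, False, True, False, True]
km_curve = kaplan_meier(months, died)
```

Applying the same function to the predicted benefit group and the predicted ineffective group gives the two curves that a log-rank test would then compare.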

2.6 example of the concrete Process

The lung adenocarcinoma risk factor hierarchical management intelligent prediction system is a computational tool developed by the invention for combining extensive tumor sequencing data with clinical outcomes to predict cancer patient survival. It relies on ensemble learning, using outcome predictions from multiple types of data to generate a composite risk score. More specifically, the system applies penalized Cox regression (LASSO, ridge and elastic net) and generates cross-validated genomic risk scores. The score can then be used to stratify each patient into a risk subset. The program also generates visualizations of feature importance so that biomarkers predictive of clinical outcome can be identified. A validation module is included for calculating risk scores on the validation data set.

After the cohorts are grouped by driver risk factors, the best-matching combinations of treatment scheme and survival time are derived for each cohort to establish a treatment model. Five or more experts in the field are asked to evaluate and improve the model.
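For reference, the usual textbook form of penalized Cox regression is given below. The notation is an assumption introduced here and is not taken from the description: x_i is the feature vector of patient i, delta_i = 1 if death was observed, R(t_i) is the set of patients still at risk at time t_i, lambda is the penalty strength and alpha is the mixing parameter. Tied event times are ignored for simplicity.

```latex
\hat{\beta} = \arg\max_{\beta}
\Bigg[ \sum_{i:\,\delta_i = 1}
  \Big( x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\big(x_j^{\top}\beta\big) \Big)
  - \lambda \Big( \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big) \Bigg],
\qquad
r_i = x_i^{\top}\hat{\beta}
```

Setting alpha = 1 gives the LASSO penalty, alpha = 0 gives the ridge penalty, and intermediate values give the elastic net. The linear predictor r_i can serve as the risk score of patient i, with higher values indicating higher predicted hazard.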

The innovation of the invention is that artificial intelligence is introduced to evaluate medical images automatically and assist the early screening of lung nodules, so that the risk level can be assessed and individualized treatment decisions offered to patients clinically, while individual patients are stratified at multiple levels by their clinical data and the expression of lung cancer-specific driver genes. The treatment data of each stratum are combined with the latest study data from various countries, and the optimal treatment scheme for lung adenocarcinoma patients is determined after review and optimization by several clinical experts. The method is advanced in that it helps doctors diagnose quickly and accurately and continuously trains and optimizes the algorithm system with imported data, forming an efficient human-machine collaboration with positive feedback. New diagnostic algorithms are developed through radiomics, and a breakthrough in precision treatment methods is achieved through cross-field, cross-discipline research that closely combines basic science, clinical practice, imaging and computer science, which provides a tool for treating patients clinically and improves the efficiency of diagnosis and treatment. Deep learning also provides a powerful platform for integrating clinical and imaging data, which favors in-depth mining of combined imaging and clinical data in order to discover the pathogenesis of lung adenocarcinoma and verify its patterns.

Innovation points of the artificial intelligence pulmonary nodule detection:

(1) in the lung nodule contour segmentation module, a recurrent neural network, an attention mechanism and a recurrent sequencing loss function are introduced into the system, so that the network can iteratively correct the result of a single segmentation network, and the lung nodule segmentation precision is greatly improved, as shown in fig. 6.

(2) In the lung nodule benign and malignant prediction module, besides the result of the previous module, the system integrates the advantages of the traditional image classification network and the fine-grained classification network, extracts multi-scale local and overall features, and adopts a loss function and a regular term which are designed in a targeted manner, so that an index with great clinical value is achieved, as shown in fig. 7.

The innovation point of the lung adenocarcinoma risk factor hierarchical management intelligent prediction system is as follows:

At the 2019 ASCO conference, Dr. Thierry Colin proposed integrating imaging, genomic and clinical data for the precision treatment of lung cancer patients, and software was developed for the precision treatment of metastatic non-small cell lung cancer. The present invention is similar in concept. In particular, it introduces data from local lung adenocarcinoma patients in the Inner Mongolia Autonomous Region and is better suited to Chinese characteristics and the Chinese system. In China, knowledge levels vary widely among doctors and primary-level oncology is underdeveloped, so an intelligent assistance system can greatly raise the primary-level standard of care, accelerate the implementation of tiered diagnosis and treatment, and save limited medical resources.

Gene mutation information is introduced to stratify and group the disease. Early diagnosis of lung cancer is optimized; for operable patients, the likelihood of postoperative recurrence is predicted in order to determine whether immune-targeted therapy, radiotherapy and chemotherapy should follow surgery. For late-stage patients, clinical data, gene mutation data and the like are evaluated comprehensively, and a new risk grading of late-stage lung adenocarcinoma patients is produced. Personalized precision treatments are formulated through statistical learning and summarization of the data, and a gene expression scoring system is established to evaluate prognosis.

When side effects occur, patients require additional medication. The present invention therefore expects this study to help reduce ineffective medical practice and avoid wasting medical resources (e.g., by avoiding unnecessary chemotherapy and reducing ineffective drug use).

The improved precision treatment software will create a knowledge network of biomedical research and a new taxonomy of disease. Precision medicine creates an information-sharing zone that interactively holds a variety of "omics" data types together with the exposure history and lifestyle information of individual patients. Integrating biological information into these data will lead to knowledge networks that can be used to improve disease classification, clinical applications and the study of molecular mechanisms. Information obtained about patients or patient cohorts is used iteratively to improve classification. Each expert group draws on its own experience, the latest treatment results published at home and abroad, and the basic data as a template to design new individualized schemes for patients at each level. As the data continue to grow, the system can share more information, continuously refine the molecular classification, and improve clinical diagnosis and treatment.

The expected target is:

1. technical index

The invention provides a medical database based on the R language and MySQL. Data for any required disease type can be entered at any time, records can be created, deleted, modified and queried, and experts can add comments, which helps primary-level doctors carry out scientific research and study based on their actual work.

Image processing based on the Python language is used to extract the imaging features of the disease and combine them with the clinical features. Once the amount of data is sufficient, modeling is carried out to stratify disease risk factors and to perform predictive analysis of survival and treatment.

The invention benefits the nation and the people: it raises the technical level of doctors, improves patients' experience of medical care, saves medical expenditure, enables prevention, early diagnosis and early treatment of disease, and supports the study of disease etiology and development. Ultimately, it can help address the current problems in China of insufficient and unevenly distributed medical resources and of medical care that is difficult to obtain and expensive, raise the public's health knowledge and awareness, promote early cancer prevention, and foster a sound view of life and death and of the value of life. Its social significance is very great.

In the preliminary stage of the invention, 742 asymptomatic people who underwent low-dose lung CT examination in our hospital between June 2016 and January 2018 were collected: 292 men (39.4%) and 450 women (60.6%), aged 40-76 years (mean 49 ± 6). All scans were analyzed by two senior physicians, who detected a total of 150 non-calcified lung nodules; 83 people had positive results (83/742, 11.2%).

The system also provides comprehensive diagnosis and treatment opinions and raises the overall medical level of primary hospitals.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and is not to be construed as limiting its scope, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for constructing an artificial intelligence neural network learning model, characterized by comprising the following steps: constructing an artificial intelligence algorithm model by collecting effective case data, and training and optimizing the constructed artificial intelligence algorithm model to obtain an optimized artificial intelligence algorithm model.

2. The artificial intelligence neural network learning model building method of claim 1, wherein the artificial intelligence neural network learning model building method comprises the steps of:

step one, using the R language to establish and connect to a MySQL database, and establishing a local database shared by the team within the department, in which doctors use fragmented time to record and store the valid basic information of lung cancer patients, disease stage, lung cancer related driver gene test data, patient tumor marker indexes, important biochemical indexes and, where conditions permit, miRNA and chromosome status; the doctor evaluates the patient's PS score data and progressively tracks PFS, OS, medication use and toxic side effects; meanwhile, valid medical record data of the team's existing patients are collected retrospectively, desensitized and stored for later use, and the corresponding lung CT images of the patients are collected, cleaned, desensitized and stored for later use;

step two, for text data: performing feature extraction on the acquired data and cleaning it to obtain a feature sample set; preprocessing the feature sample set to obtain normalized sample data; dividing the normalized sample data into a training data set and a verification data set; for image data: using a lung nodule 3D CNN algorithm model based on the open-source Python language, and introducing a recurrent neural network, an attention mechanism and a recurrent sequencing loss function, so that the network can iteratively correct the result labels of a single segmentation network; the model combines the advantages of a traditional image classification network and a fine-grained network to extract multi-scale local and global features, has clinical value for evaluating the change and presentation of lung cancer masses, and extracts the image features as a matrix for later use;

step three, cleaning the text data on a single-risk-factor basis, and preliminarily planning to collect data from 300 patients to construct an artificial intelligence algorithm model; generating a cross-validated score for each risk factor using a penalized Cox regression method, and using the scores to stratify each patient into different risk subsets; training the constructed artificial intelligence algorithm model using the obtained training data set, with a training set to test set ratio of 8:2; performing cross-validation 10 times, verifying the constructed artificial intelligence algorithm model using the verification data set, and optimizing the artificial intelligence algorithm model based on the verification result; and obtaining the optimized artificial intelligence algorithm model.
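The data-handling part of steps two and three can be illustrated with a short Python sketch. It is not the implementation of the invention. It uses synthetic placeholder data, and it makes several assumptions: normalization is min-max scaling, "cross validation for 10 times" means 10-fold cross-validation, and patients are stratified into three equal-sized subsets. All function names are hypothetical. The risk score here is a stand-in (the sum of normalized features). Fitting the penalized Cox regression itself would need a survival-analysis library and is outside this sketch.

```python
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def split_8_2(n_samples, seed=0):
    """Shuffle sample indices and split them 8:2 into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = n_samples * 8 // 10  # integer arithmetic avoids float rounding
    return idx[:cut], idx[cut:]

def k_folds(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, val

def stratify_by_score(scores, n_groups=3):
    """Assign each patient a risk subset by score rank (0 = lowest risk)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups

# Synthetic placeholder data: 300 records with 4 numeric features each.
rng = random.Random(42)
features = [[rng.uniform(0, 100) for _ in range(4)] for _ in range(300)]

normalized = min_max_normalize(features)
train_idx, test_idx = split_8_2(len(normalized))
fold_sizes = [(len(tr), len(va)) for tr, va in k_folds(train_idx, k=10)]

# Placeholder risk score (assumption): NOT a penalized Cox regression score.
risk_groups = stratify_by_score([sum(row) for row in normalized])

print(len(train_idx), len(test_idx))  # 240 60
print(fold_sizes[0])                  # (216, 24)
```

With 300 records, the 8:2 split leaves 240 records for training and 60 for testing, and each of the 10 folds holds out 24 of the 240 training records for validation.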

3. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of: constructing an artificial intelligence algorithm model by collecting effective case data, and training and optimizing the constructed artificial intelligence algorithm model to obtain an optimized artificial intelligence algorithm model.

4. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of: constructing an artificial intelligence algorithm model by collecting effective case data, and training and optimizing the constructed artificial intelligence algorithm model to obtain an optimized artificial intelligence algorithm model.

5. An artificial intelligence neural network learning model construction system for implementing the artificial intelligence neural network learning model construction method according to any one of claims 1 to 2, wherein the artificial intelligence neural network learning model construction system comprises: