CN115346665B - Method, system and equipment for constructing retinopathy incidence risk prediction model - Google Patents

Publication number
CN115346665B
Authority
CN
China
Prior art keywords
retinopathy
model
modeling
training
factors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211276129.6A
Other languages
Chinese (zh)
Other versions
CN115346665A (en
Inventor
邓燕
许源
刘琛
黄针
黄丹
李姝蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Affiliated Hospital to Nanchang University
Original Assignee
Second Affiliated Hospital to Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Second Affiliated Hospital to Nanchang University
Priority to CN202211276129.6A
Publication of CN115346665A
Application granted
Publication of CN115346665B
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a system and equipment for constructing a retinopathy incidence risk prediction model. The method comprises the following steps: acquiring medical index data of a plurality of premature infants and their mothers; cleaning the enrolled modeling factors with a data cleaning method to obtain the relevant modeling factors that finally participate in modeling, and constructing a training sample set from the relevant modeling factors; training a plurality of initial prediction models of different types on the training sample set to obtain a plurality of trained prediction models; and evaluating the trained prediction models with a preset model evaluation index, and taking the best trained prediction model as the retinopathy incidence risk prediction model. Training in this way yields a prediction model that can accurately predict the risk of onset of retinopathy of prematurity (ROP) at an early stage, so that ROP prediction no longer depends solely on fundus pictures and can be carried out effectively and accurately in underserved and remote areas.

Description

Method, system and equipment for constructing retinopathy incidence risk prediction model
Technical Field
The invention relates to the technical field of model training, and in particular to a method, a system and equipment for constructing a retinopathy incidence risk prediction model.
Background
Retinopathy of prematurity (ROP) is a proliferative retinal vascular disease that occurs in premature infants and is the most common blinding and low-vision eye disease in infants. It threatens the visual quality of affected infants and places a heavy burden on families and society. Timely screening is important for reducing the rates of blindness and visual impairment caused by ROP.
At present, doctors mainly screen, diagnose and predict the risk of onset from fundus photographs captured by fundus photography equipment. However, taking a fundus photograph involves scleral depression, and the whole process requires repeated indirect ophthalmoscopy examinations, which is harmful to the infant and laborious for the ophthalmologist. In addition, screening, diagnosis and risk prediction require observing subtle features of the fundus photograph, so they depend on experienced ophthalmologists and often cannot be carried out in underserved and remote areas.
Disclosure of Invention
Based on this, the invention aims to provide a method, a system and equipment for constructing a retinopathy onset risk prediction model, and aims to solve at least one technical problem in the background art.
According to the embodiment of the invention, the construction method of the retinopathy incidence risk prediction model comprises the following steps:
acquiring medical index data of a plurality of premature infants and their mothers, wherein the premature infants comprise premature infants suffering from retinopathy and premature infants not suffering from retinopathy, the medical index data comprises medical indexes and disease labels, and each medical index forms one enrolled modeling factor;
cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling, and constructing a training sample set from the relevant modeling factors of each premature infant and its mother together with the corresponding disease label;
training a plurality of initial prediction models of different types on the training sample set to obtain a corresponding plurality of trained prediction models;
and evaluating the plurality of trained prediction models by adopting a preset model evaluation index, and determining the optimal trained prediction model after evaluation as a retinopathy incidence risk prediction model.
In addition, the method for constructing the retinopathy onset risk prediction model according to the above embodiment of the present invention may further have the following additional technical features:
Further, the step of cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling comprises the following steps:
performing correlation analysis on all the enrolled modeling factors, and screening out, from the correlation analysis results, candidate modeling factors whose P values are smaller than a first threshold;
and ranking the candidate modeling factors by importance, and determining the relevant modeling factors according to the importance ranking result.
Further, the step of ranking the candidate modeling factors by importance and determining the relevant modeling factors according to the importance ranking result includes:
ranking the candidate modeling factors by importance with an extreme gradient boosting tree (XGBoost) algorithm, a random forest algorithm and a complement naive Bayes classification algorithm respectively, to obtain three importance ranking results;
selecting a preset number of top-ranked candidate modeling factors from each of the three importance ranking results to obtain three candidate modeling factor sets;
and intersecting the three candidate modeling factor sets to determine the candidate modeling factors common to all three sets, so as to determine the relevant modeling factors.
Further, the step of intersecting the three candidate modeling factor sets to determine the common candidate modeling factors, so as to determine the relevant modeling factors, includes:
calculating, for each medical index, the average difference between premature infants without retinopathy and premature infants with retinopathy, to obtain the average difference value of each medical index;
drawing a plurality of concentric circles, one per medical index, each with a radius equal to the sum of that index's average difference value and a base threshold, and repeatedly performing minimum-interval clustering on the concentric circles until they are separated into two clusters, thereby dividing the medical indexes into two clusters;
and selecting the medical indexes in the cluster with the larger average difference values as modeling factors, and intersecting them with the common candidate modeling factors to obtain the relevant modeling factors.
Further, before the step of performing correlation analysis on all the enrolled modeling factors, the method further includes:
calculating the missing ratio of each enrolled modeling factor, wherein the missing ratio is the ratio of the number of samples in which the factor is missing to the total number of samples, and the data of each premature infant and its mother corresponds to one sample;
and removing the enrolled modeling factors whose missing ratio is larger than a second threshold.
Further, the step of training an initial prediction model on the training sample set includes:
making an odd number of copies of the initial prediction model, and pairing the copies to form a plurality of groups of initial prediction models arranged in a training order, plus one single remaining initial prediction model;
training the initial prediction models of each group in turn on the training sample set according to the training order, and, after the initial prediction models of the current group are trained, averaging the parameters of the two trained prediction models of the current group to obtain a model parameter mean, which is used as the initial value of the two initial prediction models of the next group;
and assigning the model parameter mean obtained after training the last group to the single initial prediction model as its initial value, and training the single initial prediction model on the training sample set to obtain the trained prediction model.
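The paired training scheme above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a one-feature logistic model trained by stochastic gradient descent, and it assumes the two models of a group differ only in the shuffling seed, since the text does not say how they differ. The names `train_logistic` and `chained_training` and the toy data are hypothetical.

```python
import math
import random

def train_logistic(params, samples, seed, epochs=20, lr=0.1):
    """Train a 1-feature logistic model with SGD, starting from params = (w, b)."""
    w, b = params
    rng = random.Random(seed)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return (w, b)

def chained_training(samples, n_copies=5, init=(0.0, 0.0)):
    """Train pairs of copies in order, then the single remaining copy."""
    if n_copies % 2 == 0:
        raise ValueError("n_copies must be odd")
    start = init
    n_groups = n_copies // 2
    for g in range(n_groups):
        # Both models of the group start from the same initial values
        # (assumption: they differ only in their shuffling seed).
        model_a = train_logistic(start, samples, seed=2 * g)
        model_b = train_logistic(start, samples, seed=2 * g + 1)
        # The element-wise parameter mean becomes the next group's initial value.
        start = tuple((pa + pb) / 2 for pa, pb in zip(model_a, model_b))
    # The last mean initialises the single remaining model.
    return train_logistic(start, samples, seed=2 * n_groups)

# Toy data: (feature, disease label); purely illustrative.
samples = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
           (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
final_w, final_b = chained_training(samples)
```

With five copies this trains two pairs in sequence and then the fifth model, each stage starting from the previous stage's averaged parameters.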
Further, after the step of determining the best evaluated trained prediction model as the retinopathy incidence risk prediction model, the method further includes:
inputting medical index data corresponding to the relevant modeling factors of an infant to be tested into the retinopathy incidence risk prediction model, and constructing a SHAP (SHapley Additive exPlanations) plot from the prediction result;
and summing the predicted probability contributions of the relevant modeling factors of the infant to be tested in the SHAP plot to obtain the retinopathy incidence risk probability of the infant to be tested.
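As a rough illustration of the summation step, the sketch below relies on the additivity property of SHAP values (model output = base value + sum of per-feature contributions). It assumes the contributions are already expressed in probability units and that the base value is included in the sum; the text does not state either point. All numbers and the name `risk_from_shap` are hypothetical.

```python
# Hypothetical SHAP output for one infant (illustrative values, not from the patent).
base_value = 0.20  # assumed average predicted risk over the training set
shap_contributions = {
    "severe_preeclampsia_history": 0.05,
    "apgar_1min": 0.10,
    "gestational_age": 0.15,
    "very_low_birth_weight": 0.08,
    "blood_transfusion_history": -0.05,
    "neonatal_hyperglycemia_history": -0.03,
}

def risk_from_shap(base, contributions):
    """Add the per-factor contributions to the base value, clamped to [0, 1]."""
    total = base + sum(contributions.values())
    return min(1.0, max(0.0, total))

risk = risk_from_shap(base_value, shap_contributions)
```

For tree models, SHAP values are often produced in log-odds rather than probability units, in which case the sum would need to be passed through a sigmoid first.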
Further, the relevant modeling factors include history of severe preeclampsia, 1-minute birth (Apgar) score, gestational age at birth, history of very low birth weight, history of blood transfusion, and history of neonatal hyperglycemia;
the initial prediction models comprise an extreme gradient boosting tree (XGBoost) algorithm model, a random forest algorithm model, a light gradient boosting machine (LightGBM) algorithm model, an adaptive boosting (AdaBoost) algorithm model, a complement naive Bayes classification algorithm model and a support vector machine algorithm model.
According to the embodiment of the invention, the system for constructing the retinopathy incidence risk prediction model comprises:
the data acquisition module, used for acquiring medical index data of a plurality of premature infants and their mothers, wherein the premature infants comprise premature infants suffering from retinopathy and premature infants not suffering from retinopathy, the medical index data comprises medical indexes and disease labels, and each medical index forms one enrolled modeling factor;
the data cleaning module, used for cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling, and constructing a training sample set from the relevant modeling factors of each premature infant and its mother together with the corresponding disease label;
the model training module is used for respectively training a plurality of initial prediction models of different types through the training sample set to correspondingly obtain a plurality of trained prediction models;
and the model evaluation module is used for evaluating the plurality of trained prediction models by adopting a preset model evaluation index and determining the optimal trained prediction model after evaluation as a retinopathy incidence risk prediction model.
The invention also provides a computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the above method for constructing a retinopathy incidence risk prediction model.
The invention also provides a device for constructing the retinopathy incidence risk prediction model, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the method for constructing the retinopathy incidence risk prediction model when executing the program.
Compared with the prior art, the invention collects a large amount of medical index data of premature infants and their mothers to form a large number of candidate modeling factors, and cleans these factors with a preset data cleaning method to find the relevant modeling factors most strongly associated with the onset of retinopathy. A plurality of initial prediction models of different types are then trained on the relevant modeling factors, and the resulting prediction models are evaluated with preset model evaluation indexes to select the one best suited to the retinopathy of prematurity scenario. The prediction model finally obtained can accurately predict the risk of onset of retinopathy of prematurity at an early stage, so that ROP prediction no longer depends solely on fundus pictures. The infant's eyes do not need to be examined; only the corresponding medical index data needs to be entered, which does not harm the infant, reduces the workload of and demands on ophthalmologists, and allows effective and accurate ROP prediction in underserved and remote areas.
Drawings
Fig. 1 is a flowchart of a method for constructing a retinopathy incidence risk prediction model according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of minimum-interval clustering provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a system for constructing a retinopathy incidence risk prediction model according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for constructing a retinopathy incidence risk prediction model according to a fourth embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Embodiment One
Referring to Fig. 1, a method for constructing a retinopathy incidence risk prediction model according to a first embodiment of the present invention is shown. The method specifically includes steps S01 to S04.
Step S01: acquiring medical index data of a plurality of premature infants and their mothers, wherein the premature infants comprise premature infants with retinopathy and premature infants without retinopathy, the medical index data comprises medical indexes and disease labels, and each medical index forms one enrolled modeling factor.
To ensure the reliability of model prediction, and considering that the mother's medical index data may affect retinopathy of prematurity, this embodiment collects the medical index data of both the premature infant and the infant's mother; the data of an infant and its mother belong to the same sample, i.e. the same study subject. The disease label identifies whether the premature infant has retinopathy of prematurity, for example 1 for an infant with retinopathy of prematurity and 0 for an infant without it.
Specifically, premature infants who were admitted to the hospital and screened for retinopathy of prematurity can be retrieved from the hospital clinical research data platform, using a time period as the search scope. The inclusion criteria for study subjects in this example were: (1) premature infants with 23 weeks < gestational age < 37 weeks (i.e., a gestational age of 23+1 to 36+6 weeks); (2) a definite ROP screening diagnosis, i.e. a definite determination of whether the infant has the disease. A total of 644 premature infants meeting these criteria were included in this study. The medical index data of the included study subjects is then retrieved.
In some optional implementations of this embodiment, the medical indexes collected for the premature infant include: birth weight, gestational age at birth, 1-minute birth (Apgar) score, 5-minute birth (Apgar) score, 10-minute birth (Apgar) score, whether the infant is from a multiple birth, whether the infant is a precious child, whether there is neonatal hyperglycemia, neonatal hypoxic-ischemic encephalopathy, neonatal asphyxia, neonatal meconium aspiration syndrome, neonatal respiratory failure, neonatal respiratory distress syndrome, patent ductus arteriosus, neonatal hyperbilirubinemia, cerebral hemorrhage, congenital heart disease, neonatal septicemia, neonatal pneumothorax or neonatal hypocalcemia, whether blood was transfused and the number of transfusions, whether oxygen was inhaled and for how many days, whether intratracheal instillation of porcine lung phospholipid (pulmonary surfactant) therapy was received, radiation rescue, etc.
The medical indexes collected for the mother of the premature infant include: age, gravidity, parity, amniotic fluid grading, premature rupture of membranes, intrauterine distress, placental abruption, pregnancy-induced hypertension, severe preeclampsia, gestational diabetes, hyperthyroidism in pregnancy, systemic lupus erythematosus, thalassemia, etc.
Here, the 1-minute, 5-minute and 10-minute birth (Apgar) scores generally take values of 1 to 9 (the Apgar scale itself ranges from 0 to 10).
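The gestational-age inclusion criterion can be checked mechanically. The sketch below assumes gestational age is recorded as a "weeks+days" string such as "36+6", matching the notation in the inclusion criteria; the function names are hypothetical.

```python
def gestational_age_days(text):
    """Convert a 'weeks+days' string such as '36+6' into total days."""
    weeks, _, days = text.partition("+")
    return int(weeks) * 7 + int(days or 0)

def meets_gestational_age_criterion(text):
    """True when 23 weeks < gestational age < 37 weeks (strict on both sides)."""
    return 23 * 7 < gestational_age_days(text) < 37 * 7
```

Under this reading, "23+1" and "36+6" are the first and last eligible values, while "23+0" and "37+0" are excluded.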
Step S02: cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling, and constructing a training sample set from the relevant modeling factors of each premature infant and its mother together with the corresponding disease label.
It should be noted that the purpose of data cleaning is to screen out the medical indexes most relevant to retinopathy of prematurity, so as to determine the optimal combination of modeling factors and improve the training speed and accuracy of the model. In a specific implementation, correlation analysis may be performed between the enrolled modeling factors and retinopathy of prematurity (i.e., the disease label). The correlation analysis may be, but is not limited to, Pearson correlation analysis or baseline analysis: for example, the Pearson correlation coefficient may be used to measure the correlation between each medical index and retinopathy of prematurity, or that correlation may be assessed by baseline analysis. Medical indexes whose P value is smaller than a threshold (e.g., 0.05) are then selected from the correlation analysis results as significant, so that the medical indexes significantly correlated with retinopathy of prematurity are found among the many medical indexes and the relevant modeling factors that finally participate in modeling are determined.
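The P-value screening can be sketched with the standard library alone. Note the substitution: instead of the analytic P value of a Pearson test, this sketch estimates the P value with a permutation test on the Pearson coefficient, which is an assumption made to keep the example dependency-free. The data, the factor names and the function names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, labels, n_perm=2000, seed=0):
    """Two-sided permutation P value for the correlation between xs and labels."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def screen_factors(table, labels, threshold=0.05):
    """Keep the factors whose P value is below the threshold."""
    return [name for name, xs in table.items()
            if permutation_p_value(xs, labels) < threshold]

# Illustrative data: 10 infants without ROP (label 0) and 10 with ROP (label 1).
labels = [0] * 10 + [1] * 10
table = {
    "gestational_age_weeks": [34, 35, 33, 36, 34, 35, 36, 33, 34, 35,
                              27, 28, 26, 29, 27, 28, 26, 27, 29, 28],
    "unrelated_index": [1, 2] * 10,
}
candidates = screen_factors(table, labels)
```

In this toy table the gestational age separates the two groups and passes the screen, while the alternating index has zero correlation with the label and is dropped.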
Then, a training sample is constructed from the relevant modeling factors of each premature infant and its mother together with the corresponding disease label, so that each training sample contains the relevant modeling factors and the disease label of that sample; the training samples together form the training sample set.
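A training sample set of this shape could be assembled as in the sketch below. The record layout, the key names and the `has_rop` field are hypothetical; the six factor names follow the relevant modeling factors listed in the disclosure.

```python
# Hypothetical key names for the six relevant modeling factors.
RELEVANT_FACTORS = [
    "severe_preeclampsia", "apgar_1min", "gestational_age_weeks",
    "very_low_birth_weight", "blood_transfusion", "neonatal_hyperglycemia",
]

def build_training_set(records, factors=RELEVANT_FACTORS, label_key="has_rop"):
    """Return (features, labels): one feature row and one 0/1 label per sample."""
    features, labels = [], []
    for record in records:
        # Keep only the relevant modeling factors, in a fixed order.
        features.append([record[f] for f in factors])
        labels.append(1 if record[label_key] else 0)
    return features, labels

# Two illustrative infant-and-mother records; maternal_age is not a relevant factor.
records = [
    {"severe_preeclampsia": 1, "apgar_1min": 5, "gestational_age_weeks": 28.0,
     "very_low_birth_weight": 1, "blood_transfusion": 1,
     "neonatal_hyperglycemia": 0, "maternal_age": 31, "has_rop": True},
    {"severe_preeclampsia": 0, "apgar_1min": 9, "gestational_age_weeks": 35.0,
     "very_low_birth_weight": 0, "blood_transfusion": 0,
     "neonatal_hyperglycemia": 0, "maternal_age": 27, "has_rop": False},
]
features, labels = build_training_set(records)
```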
Step S03: training a plurality of initial prediction models of different types on the training sample set to obtain a corresponding plurality of trained prediction models.
In some optional embodiments, the initial prediction models may specifically include an extreme gradient boosting tree algorithm model (XGBoost), a random forest algorithm model (RF), a light gradient boosting machine algorithm model (LightGBM), an adaptive boosting algorithm model (AdaBoost), a naive Bayes classification algorithm model (GNB), and a support vector machine algorithm model (SVM).
That is, this embodiment studies the applicability of different machine learning algorithms to predicting retinopathy of prematurity. In the prior art, one machine learning algorithm is usually selected directly based on human experience and then trained, which makes it hard to guarantee that the selected algorithm suits the field under study or that the finally trained model is highly reliable. In contrast, this embodiment trains different machine learning algorithms on the same training sample set to obtain multiple trained prediction models, evaluates them with preset model evaluation indexes, and determines the machine learning algorithm model best suited to predicting retinopathy of prematurity, thereby ensuring the reliability of the finally trained retinopathy incidence risk prediction model and meeting the specific requirements of retinopathy of prematurity.
Step S04: evaluating the trained prediction models with a preset model evaluation index, and determining the best trained prediction model as the retinopathy incidence risk prediction model.
In a preferred embodiment, the trained prediction models may be evaluated using one or more of the following seven evaluation indexes: area under the curve (AUC), accuracy, sensitivity (TPR), specificity (TNR), positive predictive value (PPV), negative predictive value (NPV), and F1 score.
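The seven evaluation indexes can be computed from true labels and predicted scores as in the sketch below. The 0.5 decision threshold, the toy labels and scores, and the function name `evaluate` are assumptions for illustration; the sketch also assumes every denominator is non-zero, which holds for the toy data.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Compute accuracy, TPR, TNR, PPV, NPV, F1 and AUC for binary labels."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tpr, "tnr": tnr, "ppv": ppv, "npv": npv,
        "f1": 2 * ppv * tpr / (ppv + tpr),
        "auc": wins / (len(pos) * len(neg)),
    }

# Illustrative labels and predicted risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
metrics = evaluate(y_true, scores)
```

Computing the same dictionary for each trained model and comparing, for example, the AUC values is one straightforward way to pick the best model.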
In summary, the method for constructing a retinopathy incidence risk prediction model in the above embodiment collects medical index data of premature infants and their mothers to form a large number of candidate modeling factors, and cleans these factors with a preset data cleaning method to find the relevant modeling factors most strongly associated with the onset of retinopathy. A plurality of initial prediction models of different types are trained on the relevant modeling factors, and the resulting prediction models are evaluated with preset model evaluation indexes to select the one best suited to the retinopathy of prematurity scenario. The prediction model obtained in this way can accurately predict the risk of onset of retinopathy of prematurity at an early stage, so that ROP prediction no longer depends solely on fundus pictures.
Embodiment Two
The second embodiment of the present invention also provides a method for constructing a retinopathy incidence risk prediction model. It should be noted that, before the study, it is not clear which specific factors affect retinopathy, so as many factors as possible need to be included in the study, and the modeling factors related to retinopathy then need to be found with a reliable and rapid data cleaning method. In the subsequent prediction process (i.e., in actual use of the model), the user needs to input these relevant modeling factors to complete the incidence risk prediction. To this end, this embodiment provides a new data cleaning method.
The method for constructing the retinopathy incidence risk prediction model in this embodiment differs from that of the first embodiment in the following respects.
The step of cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling may specifically include:
performing correlation analysis on all the enrolled modeling factors, and screening out, from the correlation analysis results, candidate modeling factors whose P values are smaller than a first threshold, wherein the correlation analysis may be Pearson correlation analysis or baseline analysis;
and ranking the candidate modeling factors by importance, and determining the relevant modeling factors according to the importance ranking result.
Specifically, the step of ranking the candidate modeling factors by importance and determining the relevant modeling factors according to the importance ranking result includes:
ranking the candidate modeling factors by importance with an extreme gradient boosting tree (XGBoost) algorithm, a random forest algorithm and a complement naive Bayes classification algorithm respectively, to obtain three importance ranking results;
selecting a preset number of top-ranked candidate modeling factors from each of the three importance ranking results to obtain three candidate modeling factor sets;
and intersecting the three candidate modeling factor sets to determine the candidate modeling factors common to all three sets, so as to determine the relevant modeling factors.
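The top-ranked selection and intersection can be sketched as follows. The importance scores are made up for illustration and do not come from the patent; in practice they would be produced by the three ranking algorithms. The function names and the choice of k = 3 are assumptions.

```python
def top_k(importance, k):
    """Names of the k factors with the highest importance score."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return {name for name, _ in ranked[:k]}

def common_candidates(rankings, k):
    """Factors that appear in the top k of every ranking."""
    return set.intersection(*(top_k(r, k) for r in rankings))

# Hypothetical importance scores from the three ranking algorithms.
xgboost_scores = {"gestational_age": 0.40, "apgar_1min": 0.25,
                  "blood_transfusion": 0.20, "maternal_age": 0.15}
random_forest_scores = {"gestational_age": 0.35, "blood_transfusion": 0.30,
                        "apgar_1min": 0.20, "maternal_age": 0.15}
complement_nb_scores = {"apgar_1min": 0.30, "gestational_age": 0.28,
                        "maternal_age": 0.22, "blood_transfusion": 0.20}

common = common_candidates(
    [xgboost_scores, random_forest_scores, complement_nb_scores], k=3)
```

Requiring agreement among three different algorithms keeps only the factors whose importance does not depend on a single model family.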
In addition, before the step of performing the correlation analysis on all the enrolled modeling factors, the method may further include:
calculating the missing ratio of each enrolled modeling factor, wherein the missing ratio is the ratio of the number of samples in which the factor is missing to the total number of samples, and the data of each premature infant and its mother corresponds to one sample;
and removing the enrolled modeling factors whose missing ratio is larger than a second threshold.
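The missing-ratio filter is simple to express in code. In the sketch below a missing value is represented by `None` or an absent key, and the second threshold is set to 0.3; both choices are assumptions, since the text does not give a value. The sample data is illustrative.

```python
def missing_ratio(samples, factor):
    """Share of samples in which the factor has no recorded value."""
    missing = sum(1 for s in samples if s.get(factor) is None)
    return missing / len(samples)

def drop_sparse_factors(samples, factors, second_threshold=0.3):
    """Keep the factors whose missing ratio does not exceed the threshold."""
    return [f for f in factors if missing_ratio(samples, f) <= second_threshold]

# Four illustrative samples (one per infant-and-mother pair).
samples = [
    {"birth_weight": 1200, "apgar_10min": None},
    {"birth_weight": 1500, "apgar_10min": 9},
    {"birth_weight": None, "apgar_10min": None},
    {"birth_weight": 1100},
]
kept = drop_sparse_factors(samples, ["birth_weight", "apgar_10min"])
```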
Further, in some optional cases of this embodiment, intersecting the three candidate modeling factor sets to determine the common candidate modeling factors, so as to determine the relevant modeling factors, may specifically include:
calculating, for each medical index, the average difference between premature infants without retinopathy and premature infants with retinopathy, to obtain the average difference value of each medical index;
drawing a plurality of concentric circles, one per medical index, each with a radius equal to the sum of that index's average difference value and a base threshold, and repeatedly performing minimum-interval clustering on the concentric circles until they are separated into two clusters, thereby dividing the medical indexes into two clusters;
and selecting the medical indexes in the cluster with the larger average difference values as modeling factors, and intersecting them with the common candidate modeling factors to obtain the relevant modeling factors.
The medical indexes include numerical medical indexes and option-type medical indexes. A numerical medical index is one whose index value is a number, such as the birth score at 1 minute, the number of blood transfusions, or the birth gestational age. An option-type medical index is one whose index value is an option, such as whether the infant is a very low-weight infant or whether neonatal meconium aspiration syndrome is present; its index value is generally "yes" or "no". For a numerical medical index, the difference value is calculated directly from the index values: for example, if the birth score at 1 minute of the sick premature infant is 5 and that of the non-sick premature infant is 9, the difference value of the birth score at 1 minute is 4. For an option-type medical index, the difference value is determined by whether the index values agree: if the index values of the sick and non-sick premature infants are the same (both "yes" or both "no"), the index difference value is a first threshold value; if they differ (one "yes" and one "no"), the index difference value is a second threshold value. In the specific research, the first threshold value was set to 0 and the second threshold value to 6, but the method is not limited to these values, which can be adjusted to the specific situation.
The second threshold value can be set close to the average index difference value of the numerical medical indexes, and the first threshold value should be as far below the second threshold value as possible, so that the medical indexes separate more clearly into two clusters.
In specific implementation, sick premature infants and non-sick premature infants are randomly paired one to one, and each pair yields one difference value per index. The difference values of each index are then averaged over all pairs to obtain the average difference value of that medical index. For example, 100 pairs yield 100 difference values of the birth score at 1 minute, and averaging these 100 values gives the average difference value of the birth score at 1 minute; the other indexes are handled in the same way.
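A minimal sketch of the pairing and averaging step is given below. It assumes that numerical indexes are stored as numbers and option-type indexes as booleans, and that the numerical difference is taken as an absolute value (consistent with the 5 versus 9 example giving 4). The function names and the toy records are hypothetical; the values 0 and 6 are the thresholds reported for the specific research.

```python
import random

FIRST_THRESHOLD = 0   # option values agree (value reported in the description)
SECOND_THRESHOLD = 6  # option values disagree (value reported in the description)


def index_difference(sick_value, healthy_value):
    """Difference value of one medical index for one sick/non-sick pair."""
    if isinstance(sick_value, bool):  # option-type index ("yes"/"no")
        return FIRST_THRESHOLD if sick_value == healthy_value else SECOND_THRESHOLD
    return abs(sick_value - healthy_value)  # numerical index (assumed absolute)


def average_differences(sick, healthy, seed=0):
    """Randomly pair sick and non-sick infants one to one, then average per index."""
    rng = random.Random(seed)
    shuffled = healthy[:]
    rng.shuffle(shuffled)
    pairs = list(zip(sick, shuffled))
    return {
        name: sum(index_difference(s[name], h[name]) for s, h in pairs) / len(pairs)
        for name in sick[0]
    }


# Hypothetical toy records, not data from the study.
sick = [
    {"birth_score_1min": 5, "very_low_weight": True},
    {"birth_score_1min": 6, "very_low_weight": True},
]
healthy = [
    {"birth_score_1min": 9, "very_low_weight": False},
    {"birth_score_1min": 8, "very_low_weight": False},
]
avg_diff = average_differences(sick, healthy)
print(avg_diff)
```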
Then, according to the average difference value of each medical index, a plurality of concentric circles are drawn, each with a radius equal to the sum of the average difference value and the basic threshold and each corresponding to one medical index, as shown by way of example in fig. 2. The basic threshold (for example, 1) is added so that a medical index whose average difference value is 0 can still be drawn as a concentric circle. Minimum-spacing clustering is then applied repeatedly to the concentric circles. In each round, the distance between each concentric circle and each of its adjacent concentric circles is calculated, and the circle is grouped with the nearer neighbor. For example, in fig. 2 the distance between concentric circle B and concentric circle A is smaller than the distance between concentric circle B and concentric circle C, so concentric circles B and A are grouped into one class, and so on for the others. After one round of clustering, multiple classes are formed: for example, concentric circles A and B form class I, concentric circles C and D form class II, and concentric circles E and F form class III. A second round of clustering is then performed, in which the distance between each class and its adjacent classes is calculated, the distance between two classes being the distance between their two closest concentric circles; for example, the distance between class I and class II is the distance between concentric circle B and concentric circle C. The rounds are repeated in this manner until only two clusters remain. In this way the concentric circles, and hence the medical indexes, are differentiated into two clusters.
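The clustering of the concentric circles can be sketched on the radii alone, since concentric circles are ordered by radius. The sketch below is a simplified variant and not necessarily the procedure used in the study: instead of merging round by round, it merges the single closest pair of adjacent clusters at each step (a one-dimensional single-linkage clustering) until two clusters remain, using the same class-to-class distance as the description (the gap between the two closest circles). The function name and the toy average difference values are hypothetical, and at least two indexes are assumed.

```python
BASIC_THRESHOLD = 1  # example value given in the description


def two_cluster_split(avg_diff):
    """Split medical indexes into (homogeneous, differential) clusters by radius."""
    # One circle per index: radius = average difference value + basic threshold.
    circles = sorted((d + BASIC_THRESHOLD, name) for name, d in avg_diff.items())
    clusters = [[c] for c in circles]  # every circle starts as its own cluster
    while len(clusters) > 2:
        # Gap between adjacent clusters = gap between their two closest circles.
        gaps = [
            clusters[i + 1][0][0] - clusters[i][-1][0]
            for i in range(len(clusters) - 1)
        ]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge closest pair
    homogeneous, differential = clusters
    return [n for _, n in homogeneous], [n for _, n in differential]


# Hypothetical average difference values for six indexes A to F.
toy_avg_diff = {"A": 0.2, "B": 0.5, "C": 1.5, "D": 4.0, "E": 4.4, "F": 5.5}
homogeneous_cluster, differential_cluster_toy = two_cluster_split(toy_avg_diff)
print(homogeneous_cluster, differential_cluster_toy)
```

The second list holds the indexes with the larger radii, which corresponds to the cluster with the larger average difference values.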
It should be understood that one of the two final clusters contains the medical indexes that differ more between diseased and non-diseased infants and can be defined as the differential cluster, while the other contains the medical indexes that differ less and can be defined as the homogeneous cluster. Medical indexes with larger differences are considered more closely related to retinopathy, so the medical indexes in the differential cluster are taken as modeling factors and intersected with the common candidate modeling factors obtained by the importance ranking to finally obtain the related modeling factors. This embodiment therefore adopts a data cleaning method in which importance ranking and differentiation analysis work together, so that the key modeling factors most relevant to retinopathy can be screened out and the retinopathy onset risk can be predicted accurately from a small number of key modeling factors.
In the specific research, the missing ratio of each selected modeling factor was calculated and the selected modeling factors with a missing ratio greater than 0.4 were removed. Pearson correlation analysis was then performed on all the remaining selected modeling factors, and the candidate modeling factors with a P value smaller than the first threshold (0.05) were screened out of the correlation analysis results, namely: birth score 1 minute, birth score 5 minutes, radiation rescue, blood transfusion, number of transfusions, neonatal asphyxia, neonatal bronchopulmonary dysplasia, neonatal apnea, neonatal pneumonia, congenital patent foramen ovale, very low body weight, congenital atrial septal defect, cerebral hemorrhage, neonatal transient neutropenia, neonatal meconium aspiration syndrome, neonatal hyperbilirubinemia, neonatal hyperglycemia, neonatal respiratory failure, neonatal pulmonary hyaline membrane disease, hemorrhagic shock, patent ductus arteriosus, neonatal disseminated intravascular coagulation, neonatal ischemic-hypoxic encephalopathy, gestational diabetes, severe preeclampsia, prenatal fetus protection, gestational hyperthyroidism, nephrotic syndrome, hepatitis B minor three positive, maternal medication at delivery, ROP treatment time, birth gestational age, and incubator.
Then, an extreme gradient boosting tree algorithm, a random forest algorithm and a complement naive Bayes classification algorithm were each used to rank the 33 candidate modeling factors by importance. With the extreme gradient boosting tree (XGB), the 15 variables with the highest importance (from high to low) were: birth score 1 minute, birth gestational age, neonatal hyperglycemia, severe preeclampsia, blood transfusion, neonatal hyperbilirubinemia, very low body weight, neonatal asphyxia, neonatal ischemic-hypoxic encephalopathy, birth score 5 minutes, congenital heart disease, neonatal sepsis, neonatal hypocalcemia, gestational diabetes, and gestational hyperthyroidism. With the random forest algorithm (RF), the 15 variables with the highest importance (from high to low) were: birth score 1 minute, birth gestational age, neonatal hyperglycemia, blood transfusion, neonatal asphyxia, neonatal ischemic-hypoxic encephalopathy, severe preeclampsia, very low body weight, neonatal hyperbilirubinemia, neonatal meconium aspiration syndrome, radiation rescue, cerebral hemorrhage, neonatal respiratory failure, hepatitis B minor three positive, and patent ductus arteriosus. With the complement naive Bayes classification algorithm (CNB), the 15 variables with the highest importance (from high to low) were: birth score 1 minute, blood transfusion, neonatal asphyxia, severe preeclampsia, neonatal expiration, very low body weight, neonatal meconium aspiration syndrome, neonatal ischemic-hypoxic encephalopathy, hepatitis B minor three positive, neonatal hyperglycemia, gestational hyperthyroidism, cerebral hemorrhage, birth gestational age, hemorrhagic shock, and patent ductus arteriosus.
The top 15 variables by importance from the 3 models were taken and a Venn diagram was drawn to obtain the variables common to the three methods, namely: severe preeclampsia, birth score 1 minute, birth gestational age, very low body weight, blood transfusion, neonatal hyperglycemia, neonatal asphyxia, and neonatal ischemic-hypoxic encephalopathy.
At the same time, the differentiation analysis described above was performed to find the differential cluster, which contained the following medical indexes: birth score 5 minutes, severe preeclampsia, birth score 1 minute, birth gestational age, very low body weight, blood transfusion, neonatal hyperglycemia, neonatal respiratory distress syndrome, congenital heart disease, radiation rescue, and oxygen inhalation. These were then intersected with the above common variables (severe preeclampsia, birth score 1 minute, birth gestational age, very low body weight, blood transfusion, neonatal hyperglycemia, neonatal asphyxia, and neonatal ischemic-hypoxic encephalopathy), and 6 characteristic variables were finally selected as the final related modeling factors: history of severe preeclampsia, birth score 1 minute, birth gestational age, history of very low-weight infant, history of blood transfusion, and neonatal hyperglycemia.
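The two intersections reported above can be reproduced with plain set operations. The sketch below is illustrative only: the set names are hypothetical, and the factor names are normalized spellings of the three top-15 lists and the differential cluster given in the description.

```python
# Top-15 lists as reported in the description (spellings normalized).
xgb_top15 = {
    "birth score 1 minute", "birth gestational age", "neonatal hyperglycemia",
    "severe preeclampsia", "blood transfusion", "neonatal hyperbilirubinemia",
    "very low body weight", "neonatal asphyxia",
    "neonatal ischemic-hypoxic encephalopathy", "birth score 5 minutes",
    "congenital heart disease", "neonatal sepsis", "neonatal hypocalcemia",
    "gestational diabetes", "gestational hyperthyroidism",
}
rf_top15 = {
    "birth score 1 minute", "birth gestational age", "neonatal hyperglycemia",
    "blood transfusion", "neonatal asphyxia",
    "neonatal ischemic-hypoxic encephalopathy", "severe preeclampsia",
    "very low body weight", "neonatal hyperbilirubinemia",
    "neonatal meconium aspiration syndrome", "radiation rescue",
    "cerebral hemorrhage", "neonatal respiratory failure",
    "hepatitis B minor three positive", "patent ductus arteriosus",
}
cnb_top15 = {
    "birth score 1 minute", "blood transfusion", "neonatal asphyxia",
    "severe preeclampsia", "neonatal expiration", "very low body weight",
    "neonatal meconium aspiration syndrome",
    "neonatal ischemic-hypoxic encephalopathy",
    "hepatitis B minor three positive", "neonatal hyperglycemia",
    "gestational hyperthyroidism", "cerebral hemorrhage",
    "birth gestational age", "hemorrhagic shock", "patent ductus arteriosus",
}
# Differential cluster as reported in the description.
differential_cluster = {
    "birth score 5 minutes", "severe preeclampsia", "birth score 1 minute",
    "birth gestational age", "very low body weight", "blood transfusion",
    "neonatal hyperglycemia", "neonatal respiratory distress syndrome",
    "congenital heart disease", "radiation rescue", "oxygen inhalation",
}

# Factors common to all three importance rankings (the Venn diagram center).
common = xgb_top15 & rf_top15 & cnb_top15
# Final related modeling factors: common factors that are also differential.
related = common & differential_cluster
print(sorted(common))
print(sorted(related))
```

The first intersection yields the 8 common variables and the second yields the 6 final related modeling factors, matching the lists in the text.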
In addition, in this embodiment, an extreme gradient boosting tree (XGBoost) model, a random forest (RF) model, a light gradient boosting machine (LightGBM) model, an adaptive boosting (AdaBoost) model, a complement naive Bayes (CNB) model and a support vector machine model were each trained on the training sample set constructed from the 6 related modeling factors, and the trained models were then evaluated by the area under the curve (AUC). The AUC of the extreme gradient boosting tree model was the best on both the training set (0.96) and the validation set (0.949), and its stability was relatively good; the AdaBoost model came next (0.956 and 0.942); the AUCs of the RF, LightGBM and CNB models were 0.948, 0.945 and 0.940, respectively; and the support vector machine model had the smallest AUC of all the models (0.912). The trained extreme gradient boosting tree model was therefore determined as the final retinopathy onset risk prediction model. In a specific implementation, the training set and the validation set may be allocated in a ratio of 8.
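The evaluation step can be illustrated with a small AUC computation and a selection of the best model. This is a sketch under stated assumptions: the AUC is computed from its pairwise definition (the share of diseased/non-diseased pairs that the model ranks correctly, with ties counted as half), and the labels, scores and model names are hypothetical toy values, not results from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = retinopathy) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
validation_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.4, 0.7, 0.2],
    "model_b": [0.9, 0.3, 0.4, 0.2, 0.7, 0.8],
}

aucs = {name: auc(labels, s) for name, s in validation_scores.items()}
best_model = max(aucs, key=aucs.get)  # model with the highest validation AUC
print(aucs, best_model)
```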
Further, in some optional embodiments, the training the initial prediction model through the training sample set may specifically include:
copying an odd number of initial prediction models, and combining them in pairs to form a plurality of groups of initial prediction models having a training sequence, together with a single remaining initial prediction model;
sequentially training the initial prediction models in each group according to the training sequence through the training sample set, after the initial prediction models in the current group are trained, averaging the parameters of the two trained prediction models in the current group to obtain a model parameter average value, and taking the model parameter average value as the initial values of the two initial prediction models in the next group;
and assigning the model parameter average value obtained after the last group is trained to the single initial prediction model as its initial values, and training the single initial prediction model through the training sample set to obtain the post-training prediction model.
Because several different models need to be trained, the amount of training data is relatively large and the modeling factors are relatively numerous, the key question is how to increase the training speed of the models while preserving their training precision. For this reason, the present embodiment adopts the following new model training method: first, an odd number of initial prediction models are copied and combined in pairs to form a plurality of groups of initial prediction models having a training sequence, together with a single remaining initial prediction model; the initial prediction models in each group are then trained in sequence, according to the training sequence, through the training sample set; after the initial prediction models in the current group are trained, the parameters of the two trained prediction models in the current group are averaged to obtain a model parameter average value, which is used as the initial values of the two initial prediction models in the next group. In this way, the model is effectively trained at least twice within the time of one conventional training iteration, so that the model can converge quickly and the final training time can be shortened by at least half.
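The grouped training scheme can be sketched for a model whose parameters form a fixed-length numeric vector. The example below is an illustration only and makes several assumptions that the description does not state: a small logistic regression trained by stochastic gradient descent stands in for the prediction model, the two copies in a group differ only in the random order in which they visit the samples, and the copies are trained one after the other rather than in parallel. All names and the toy data are hypothetical. Parameter averaging of this kind does not carry over directly to tree-based models.

```python
import math
import random


def predict(w, x):
    """Logistic regression probability for feature vector x."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))


def log_loss(w, data):
    """Average negative log-likelihood over the data set."""
    return -sum(
        math.log(predict(w, x)) if y == 1 else math.log(1.0 - predict(w, x))
        for x, y in data
    ) / len(data)


def train(init, data, rng, epochs=20, lr=0.1):
    """Stochastic gradient descent starting from the given initial values."""
    w = list(init)
    for _ in range(epochs):
        batch = data[:]
        rng.shuffle(batch)  # each copy sees the samples in a different order
        for x, y in batch:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def paired_training(init, data, n_copies=5, seed=0):
    """Train pairs in sequence, average each pair, then train the single copy."""
    if n_copies % 2 != 1:
        raise ValueError("an odd number of copies is required")
    rng = random.Random(seed)
    current = list(init)
    for _ in range(n_copies // 2):  # one iteration per group of two copies
        a = train(current, data, rng)
        b = train(current, data, rng)
        current = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # parameter average
    return train(current, data, rng)  # the single remaining copy


# Hypothetical toy data: features are (bias term, one index value).
data = [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, -0.5), 1),
        ((1.0, 0.5), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)]
initial = [0.0, 0.0]
final = paired_training(initial, data)
print(log_loss(initial, data), log_loss(final, data))
```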
Further, in some alternative embodiments, the step of determining the post-evaluation optimal trained prediction model as the retinopathy onset risk prediction model may further include:
inputting medical index data corresponding to the relevant modeling factors of the infant to be tested into the retinopathy onset risk prediction model, and constructing a SHAP graph according to the prediction result;
and calculating the sum of the prediction probabilities of the relevant modeling factors of the infant to be tested in the SHAP graph to obtain the incidence risk probability of the retinopathy of the infant to be tested.
That is, the present embodiment visualizes the prediction through a SHAP graph, which combines feature importance with feature effects. Each point on the graph is the Shapley value of one feature for one instance; its position on the ordinate axis is determined by the feature and its position on the abscissa axis by the Shapley value. Based on the SHAP graph, a score is obtained for each predictor, the scores of all points are added to form a total score for the patient, and the prediction probability corresponding to the total score is the predicted probability that the patient develops retinopathy of prematurity. SHAP graph analysis shows that the model has better predictive efficacy for the occurrence of retinopathy of prematurity when the threshold probability is greater than 16.971%. In other embodiments, the output of the model may also be a binary result, such as the presence or absence of an onset risk.
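The additive property behind this scoring, namely that the per-factor Shapley values plus a base value add up to the model output, can be checked with a brute-force computation. The sketch below is not the SHAP library and not the trained model of this embodiment: it computes exact Shapley values against a single baseline instance by enumerating all factor orderings, and the risk model, its coefficients, the baseline and the infant record are all hypothetical.

```python
import math
from itertools import permutations


def risk_model(x):
    """Hypothetical risk model: a logistic function of three factors."""
    z = (-2.0 + 1.5 * x["very_low_weight"] + 1.0 * x["transfusion"]
         - 0.3 * (x["birth_score_1min"] - 7))
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, instance, baseline):
    """Exact Shapley values relative to one baseline, by enumerating orderings."""
    names = list(instance)
    orders = list(permutations(names))
    phi = dict.fromkeys(names, 0.0)
    for order in orders:
        x = dict(baseline)
        previous = model(x)
        for name in order:
            x[name] = instance[name]  # switch this factor to the infant's value
            current = model(x)
            phi[name] += (current - previous) / len(orders)
            previous = current
    return phi


baseline = {"very_low_weight": 0, "transfusion": 0, "birth_score_1min": 9}
infant = {"very_low_weight": 1, "transfusion": 1, "birth_score_1min": 5}

phi = shapley_values(risk_model, infant, baseline)
base_value = risk_model(baseline)
total = base_value + sum(phi.values())  # equals the model output for the infant
print(phi, total, risk_model(infant))
```

Enumerating orderings is only practical for a handful of factors, which is the case for the 6 related modeling factors; library implementations typically rely on a background data set and model-specific algorithms instead.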
In specific implementation, when the prediction probability is higher than the threshold probability (for example, 16.971%), the patient is advised to see a doctor for examination as soon as possible so that screening and treatment can be carried out in time. After the 6 characteristic variables are input for the patient, namely history of severe preeclampsia, birth score 1 minute, birth gestational age, history of very low-weight infant, history of blood transfusion and neonatal hyperglycemia, the retinopathy onset risk prediction model trained by this method achieves very high prediction accuracy.
Example three
Another aspect of the present invention further provides a system for constructing a model for predicting onset risk of retinopathy, referring to fig. 3, which shows a system for constructing a model for predicting onset risk of retinopathy according to a third embodiment of the present invention, where the system for constructing a model for predicting onset risk of retinopathy includes:
a data obtaining module 11, configured to obtain medical index data of a plurality of premature infants and their mothers, where the plurality of premature infants include premature infants with retinopathy and premature infants without retinopathy, the medical index data includes medical indexes and diseased identifiers, and each medical index correspondingly forms a selected modeling factor;
a data cleaning module 12, configured to clean all the selected modeling factors by using a preset data cleaning method, so as to clean out, from the selected modeling factors, the relevant modeling factors that finally participate in modeling, and to construct a training sample set from the relevant modeling factors of each premature infant and its mother and the corresponding diseased identifier;
the model training module 13 is configured to train a plurality of initial prediction models of different types through the training sample set, and correspondingly obtain a plurality of trained prediction models;
and the model evaluation module 14 is configured to evaluate the plurality of trained prediction models by using preset model evaluation indexes, and determine an optimal trained prediction model after evaluation as a retinopathy incidence risk prediction model.
Further, in some optional embodiments of the present invention, the data cleaning module 12 is further configured to perform correlation analysis on all the selected modeling factors, and screen out candidate modeling factors having a P value smaller than the first threshold from the correlation analysis result; and rank the candidate modeling factors by importance, and determine the relevant modeling factors according to the importance ranking result.
Further, in some optional embodiments of the present invention, the data cleaning module 12 is further configured to rank the candidate modeling factors by importance using an extreme gradient boosting tree algorithm, a random forest algorithm, and a complement naive Bayes classification algorithm respectively, so as to obtain three importance ranking results correspondingly; respectively select a preset number of top-ranked candidate modeling factors from the three importance ranking results to obtain three candidate modeling factor sets; and intersect the three candidate modeling factor sets to determine the common candidate modeling factors in the three candidate modeling factor sets, so as to determine the related modeling factors.
Further, in some optional embodiments of the present invention, the data cleaning module 12 is further configured to calculate the average difference value of each medical index between the premature infants without retinopathy and the premature infants with retinopathy, so as to obtain the average difference value of each medical index; draw a plurality of concentric circles, each with a radius equal to the sum of the average difference value and a basic threshold value, according to the average difference value of each medical index, wherein each concentric circle corresponds to one medical index, and repeatedly perform minimum-spacing clustering on the plurality of concentric circles until the plurality of concentric circles are differentiated into two clusters, so that the medical indexes are clustered into the two clusters; and select the medical indexes in the cluster with the larger average difference values as modeling factors and intersect them with the common candidate modeling factors to obtain the related modeling factors.
Further, in some optional embodiments of the present invention, the data cleaning module 12 is further configured to calculate a missing ratio of each candidate modeling factor, where the missing ratio is the ratio of the number of samples in which the candidate modeling factor is missing to the total number of samples, and the data of each premature infant and its mother corresponds to one sample; and remove the candidate modeling factors whose missing ratio is larger than the second threshold value.
Further, in some optional embodiments of the present invention, the model training module 13 is further configured to copy an odd number of initial prediction models, and combine them in pairs to form a plurality of groups of initial prediction models having a training sequence, together with a single remaining initial prediction model; sequentially train the initial prediction models in each group according to the training sequence through the training sample set, and after the initial prediction models in the current group are trained, average the parameters of the two trained prediction models in the current group to obtain a model parameter average value, and take the model parameter average value as the initial values of the two initial prediction models in the next group; and assign the model parameter average value obtained after the last group is trained to the single initial prediction model as its initial values, and train the single initial prediction model through the training sample set to obtain the post-training prediction model.
Further, in some optional embodiments of the present invention, the system for constructing the retinopathy incidence risk prediction model further includes:
the probability output module is used for inputting medical index data corresponding to the relevant modeling factors of the infant to be tested into the retinopathy incidence risk prediction model and constructing a SHAP graph according to the prediction result; and calculating the sum of the prediction probabilities of the relevant modeling factors of the infant to be tested in the SHAP graph to obtain the retinopathy incidence risk probability of the infant to be tested.
The functions or operation steps implemented by the modules and units when executed are substantially the same as those of the method embodiments, and are not described herein again.
Example four
Referring to fig. 4, the apparatus for constructing a model for predicting risk of onset of retinopathy according to a fourth embodiment of the present invention includes a memory 20, a processor 10, and a computer program 30 stored in the memory and executable on the processor, where the processor 10 implements the method for constructing the model for predicting risk of onset of retinopathy when executing the computer program 30.
The constructing device of the retinopathy onset risk prediction model may specifically be a computer, a server, an upper computer, and the like, and the processor 10 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data Processing chip in some embodiments, and is configured to run a program code stored in the memory 20 or process data, for example, execute an access restriction program.
The memory 20 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 20 may be an internal storage unit of a construction device of the retinopathy onset risk prediction model in some embodiments, for example, a hard disk of the construction device of the retinopathy onset risk prediction model. The memory 20 may also be an external storage device of the device for constructing the retinopathy onset risk prediction model in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the device for constructing the retinopathy onset risk prediction model. Further, the memory 20 may also include both an internal storage unit and an external storage device of the construction apparatus of the retinopathy onset risk prediction model. The memory 20 may be used not only to store application software installed in a construction device of a retinopathy onset risk prediction model and various kinds of data, but also to temporarily store data that has been output or will be output.
It should be noted that the structure shown in fig. 4 does not constitute a limitation of the construction apparatus for the retinopathy onset risk prediction model, and in other embodiments, the construction apparatus for the retinopathy onset risk prediction model may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components may be used.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for constructing the retinopathy onset risk prediction model as described above.
Those of skill in the art will understand that the logic and/or steps illustrated in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above embodiments express only several implementations of the present invention, and although their description is specific and detailed, it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A method for constructing a retinopathy onset risk prediction model, the method comprising:
acquiring medical index data of a plurality of premature infants and mothers thereof, wherein the premature infants comprise premature infants suffering from retinopathy and premature infants not suffering from retinopathy, the medical index data comprise medical indexes and diseased identifiers, and each medical index correspondingly forms a selected modeling factor;
cleaning all the selected modeling factors by adopting a preset data cleaning method so as to clean out relevant modeling factors which finally participate in modeling, and constructing a training sample set from the relevant modeling factors of each premature infant and its mother and the corresponding diseased identifier;
respectively training a plurality of initial prediction models of different types through the training sample set to correspondingly obtain a plurality of post-training prediction models;
and evaluating the plurality of trained prediction models by adopting a preset model evaluation index, and determining the optimal trained prediction model after evaluation as a retinopathy incidence risk prediction model.
2. The method for constructing a retinopathy onset risk prediction model according to claim 1, wherein the step of cleaning all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling comprises:
performing correlation analysis on all the enrolled modeling factors, and screening out, from the correlation analysis results, candidate modeling factors whose P values are smaller than a first threshold;
and ranking the candidate modeling factors by importance, and determining the relevant modeling factors according to the importance ranking result.
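The P-value screening in claim 2 can be illustrated with a small sketch. The claim does not say which statistical test produces the P values, so the exact two-sided permutation test on the difference in group means used here is an assumption, as are the factor names, the data and the 0.05 threshold.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def permutation_p_value(values: Sequence[float], labels: Sequence[int]) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    n_pos = sum(labels)
    if n_pos == 0 or n_pos == len(labels):
        raise ValueError("both groups are required")

    def abs_mean_gap(pos_idx) -> float:
        chosen = set(pos_idx)
        pos = [v for i, v in enumerate(values) if i in chosen]
        neg = [v for i, v in enumerate(values) if i not in chosen]
        return abs(mean(pos) - mean(neg))

    observed = abs_mean_gap(i for i, y in enumerate(labels) if y == 1)
    # Enumerate every way of relabelling n_pos samples as "diseased".
    gaps = [abs_mean_gap(idx) for idx in combinations(range(len(values)), n_pos)]
    extreme = sum(1 for g in gaps if g >= observed - 1e-12)
    return extreme / len(gaps)


def screen_candidates(
    table: Dict[str, List[float]], labels: Sequence[int], first_threshold: float = 0.05
) -> List[str]:
    """Keep the factors whose P value is below the first threshold."""
    return [
        name
        for name, values in table.items()
        if permutation_p_value(values, labels) < first_threshold
    ]


# Made-up data: six infants with retinopathy (1) and six without (0).
labels = [1] * 6 + [0] * 6
table = {
    "factor_separated": [25, 26, 26, 27, 27, 28, 32, 33, 33, 34, 35, 36],
    "factor_noise": [28, 31, 25, 34, 30, 27, 28, 31, 25, 34, 30, 27],
}
candidates = screen_candidates(table, labels)
print(candidates)
```

Exhaustive enumeration is only practical for very small samples; with realistic cohort sizes a standard library test would take its place.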
3. The method of claim 2, wherein the step of ranking the candidate modeling factors by importance and determining the relevant modeling factors according to the importance ranking result comprises:
ranking the candidate modeling factors by importance using an extreme gradient boosting tree (XGBoost) algorithm, a random forest algorithm and a complement naive Bayes classification algorithm respectively, to correspondingly obtain three importance ranking results;
selecting the top preset number of candidate modeling factors from each of the three importance ranking results to obtain three candidate modeling factor sets;
and intersecting the three candidate modeling factor sets to determine the candidate modeling factors common to the three sets, so as to determine the relevant modeling factors.
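The top-ranked selection and intersection in claim 3 reduce to a few set operations. The sketch below assumes the three importance rankings have already been computed; the importance scores, factor names and the preset number `k` are made up for illustration.

```python
from typing import Dict, List, Set


def top_k(importance: Dict[str, float], k: int) -> Set[str]:
    """Return the k factors with the highest importance score."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return set(ranked[:k])


def common_factors(rankings: List[Dict[str, float]], k: int) -> Set[str]:
    """Intersect the top-k sets obtained from every ranking."""
    return set.intersection(*(top_k(r, k) for r in rankings))


# Hypothetical importance scores from the three algorithms named in the claim.
xgboost_rank = {"index_a": 0.40, "index_b": 0.30, "index_c": 0.20, "index_d": 0.10}
forest_rank = {"index_a": 0.35, "index_c": 0.30, "index_d": 0.25, "index_b": 0.10}
bayes_rank = {"index_c": 0.50, "index_a": 0.30, "index_b": 0.15, "index_d": 0.05}

rankings = [xgboost_rank, forest_rank, bayes_rank]
common_top2 = common_factors(rankings, k=2)
common_top3 = common_factors(rankings, k=3)
print(common_top2, common_top3)
```

Requiring agreement among three differently built rankers makes the result less sensitive to the bias of any single algorithm, at the cost of discarding factors that only one ranker favours.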
4. The method of claim 3, wherein the step of intersecting the three candidate modeling factor sets to determine the candidate modeling factors common to the three sets, so as to determine the relevant modeling factors, comprises:
calculating, for each medical index, the difference between its mean value in premature infants without retinopathy and its mean value in premature infants with retinopathy, to obtain a mean difference value for each medical index;
drawing a plurality of concentric circles, each corresponding to one medical index and having a radius equal to the sum of that index's mean difference value and a base threshold, and repeatedly performing minimum-spacing clustering on the concentric circles until they are separated into two clusters, thereby clustering the medical indexes into the two clusters;
and selecting the medical indexes in the cluster with the larger mean difference values as modeling factors, and intersecting them with the common candidate modeling factors to obtain the relevant modeling factors.
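One possible reading of the concentric-circle clustering in claim 4 is sketched below. It assumes that "minimum-spacing clustering" means repeatedly merging the two neighbouring clusters whose radii are closest, that the mean difference is taken as an absolute value, and that all indexes are on comparable scales. The data, the base threshold of 0.5 and every function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def mean_gap(healthy: Sequence[float], diseased: Sequence[float]) -> float:
    """Absolute difference between the two group means of one medical index."""
    return abs(mean(healthy) - mean(diseased))


def split_into_two_clusters(radii: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Merge the two neighbouring clusters with the smallest spacing until two remain."""
    if len(radii) < 2:
        raise ValueError("at least two medical indexes are required")
    ordered = sorted(radii, key=lambda name: radii[name])
    clusters = [[name] for name in ordered]
    while len(clusters) > 2:
        spacing = [
            radii[clusters[i + 1][0]] - radii[clusters[i][-1]]
            for i in range(len(clusters) - 1)
        ]
        i = spacing.index(min(spacing))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters[0], clusters[1]  # (smaller radii, larger radii)


# Made-up values per index: (infants without retinopathy, infants with retinopathy).
data = {
    "index_a": ([5.0, 5.2, 4.8], [3.0, 3.1, 2.9]),
    "index_b": ([1.0, 1.2, 0.8], [2.8, 2.9, 2.7]),
    "index_c": ([4.0, 4.1, 3.9], [4.2, 4.3, 4.1]),
    "index_d": ([2.0, 2.1, 1.9], [2.1, 2.2, 2.0]),
}
base_threshold = 0.5  # assumed value
radii = {name: mean_gap(h, d) + base_threshold for name, (h, d) in data.items()}

small_gap, large_gap = split_into_two_clusters(radii)
common = {"index_a", "index_c"}  # e.g. the intersection obtained in claim 3
relevant = set(large_gap) & common
print(large_gap, relevant)
```

Because all circles share one centre, the clustering is effectively one-dimensional, so only the spacing between neighbouring radii has to be compared.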
5. The method of claim 2, wherein before the step of performing correlation analysis on all the enrolled modeling factors, the method further comprises:
calculating a missing ratio for each enrolled modeling factor, wherein the missing ratio is the ratio of the number of samples in which the factor is missing to the total number of samples, and the data of each premature infant and its mother correspond to one sample;
and removing the enrolled modeling factors whose missing ratio is larger than a second threshold.
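The missing-value filter of claim 5 can be sketched in a few lines. The sample records, factor names and the second threshold of 0.3 are assumptions for illustration; a missing measurement is represented here by `None`.

```python
from typing import Dict, List, Optional

Sample = Dict[str, Optional[float]]  # one premature infant and mother per sample


def missing_ratio(samples: List[Sample], factor: str) -> float:
    """Share of samples in which the given factor has no recorded value."""
    if not samples:
        raise ValueError("at least one sample is required")
    missing = sum(1 for sample in samples if sample.get(factor) is None)
    return missing / len(samples)


def drop_sparse_factors(
    samples: List[Sample], factors: List[str], second_threshold: float = 0.3
) -> List[str]:
    """Remove factors whose missing ratio is larger than the second threshold."""
    return [f for f in factors if missing_ratio(samples, f) <= second_threshold]


# Made-up records with gaps.
samples = [
    {"factor_x": 1.0, "factor_y": None, "factor_z": 3.0},
    {"factor_x": 2.0, "factor_y": None, "factor_z": None},
    {"factor_x": 1.5, "factor_y": 0.2, "factor_z": 2.0},
    {"factor_x": 1.1, "factor_y": 0.4, "factor_z": 2.5},
]
kept = drop_sparse_factors(samples, ["factor_x", "factor_y", "factor_z"])
print(kept)
```

Here `factor_y` is missing in half of the samples and is dropped, while `factor_z` is missing in a quarter of them and is kept.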
6. The method of claim 1, wherein the step of training an initial prediction model on the training sample set comprises:
copying the initial prediction model to obtain an odd number of initial prediction models, and combining them pairwise to form a plurality of groups of initial prediction models having a training order, plus one single initial prediction model;
training the initial prediction models in each group on the training sample set in turn according to the training order, and, after the initial prediction models in the current group have been trained, averaging the parameters of the two trained prediction models in the current group to obtain a model parameter mean, which is used as the initial values of the two initial prediction models to be trained in the next group;
and assigning the model parameter mean obtained after training the last group to the single initial prediction model as its initial values, and training the single initial prediction model on the training sample set to obtain the trained prediction model.
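The grouped training with parameter averaging in claim 6 can be sketched with a small logistic regression. The model type, learning rate, number of epochs and data are assumptions, and so is the use of a different shuffling seed for each copy, which is introduced here only so that the two copies in a group end up with different parameters.

```python
import math
import random
from typing import List, Sequence


def predict_risk(params: Sequence[float], x: Sequence[float]) -> float:
    """Logistic model: params[0] is the bias, params[1:] are the weights."""
    z = params[0] + sum(w * v for w, v in zip(params[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, init: Sequence[float], seed: int, epochs: int = 50, lr: float = 0.1):
    """Stochastic gradient descent starting from the given initial values."""
    params = list(init)
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = predict_risk(params, X[i]) - y[i]
            params[0] -= lr * err
            for j, v in enumerate(X[i]):
                params[j + 1] -= lr * err * v
    return params


def chained_training(X, y, n_copies: int = 5) -> List[float]:
    """Train pairs in order, pass each pair's parameter mean on, finish with the single copy."""
    if n_copies < 3 or n_copies % 2 == 0:
        raise ValueError("n_copies must be an odd number of at least 3")
    init = [0.0] * (len(X[0]) + 1)
    seeds = iter(range(n_copies))
    for _ in range(n_copies // 2):  # one pass per group of two copies
        first = train(X, y, init, next(seeds))
        second = train(X, y, init, next(seeds))
        init = [(a + b) / 2 for a, b in zip(first, second)]
    return train(X, y, init, next(seeds))  # the remaining single copy


# Made-up, linearly separable data: low feature values carry label 1.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [1, 1, 1, 0, 0, 0]
final_params = chained_training(X_train, y_train)
print(final_params)
```

With five copies this yields two groups and one single model, so the final model starts from parameters that have already been averaged twice.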
7. The method of claim 1, wherein after the step of determining the trained prediction model that performs best in the evaluation as the retinopathy onset risk prediction model, the method further comprises:
inputting the medical index data corresponding to the relevant modeling factors of an infant to be tested into the retinopathy onset risk prediction model, and constructing a SHAP plot from the prediction result;
and calculating the sum of the prediction probabilities of the relevant modeling factors of the infant to be tested in the SHAP plot to obtain the retinopathy onset risk probability of the infant to be tested.
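The summation in claim 7 relies on the additivity of Shapley values: the per-factor contributions, added to a base value, reproduce the model output. The sketch below computes exact Shapley values by enumerating factor subsets instead of calling the SHAP library. The toy risk function, the baseline and the input are hypothetical, and whether the patent's sum includes the base value is not stated in the claim.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values; factors outside a subset take their baseline value."""
    n = len(x)

    def value(subset: set) -> float:
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical risk model with an interaction between two factors."""
    a, b = features
    return 0.1 + 0.2 * a + 0.3 * b + 0.1 * a * b


infant = [1.0, 1.0]    # made-up factor values of the infant to be tested
baseline = [0.0, 0.0]  # assumed reference values
phi = shapley_values(toy_risk, infant, baseline)
base_value = toy_risk(baseline)
risk = base_value + sum(phi)
print(phi, risk)
```

In this toy case the contributions are 0.25 and 0.35, and adding the base value of 0.1 recovers the model output of 0.7 for the infant.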
8. A system for constructing a retinopathy onset risk prediction model, the system comprising:
a data acquisition module, configured to acquire medical index data of a plurality of premature infants and their mothers, wherein the premature infants comprise premature infants with retinopathy and premature infants without retinopathy, the medical index data comprise medical indexes and disease labels, and each medical index correspondingly forms an enrolled modeling factor;
a data cleaning module, configured to clean all the enrolled modeling factors by a preset data cleaning method to obtain the relevant modeling factors that finally participate in modeling, and to construct a training sample set from the relevant modeling factors of each premature infant and mother and the corresponding disease label;
a model training module, configured to train a plurality of initial prediction models of different types respectively on the training sample set to correspondingly obtain a plurality of trained prediction models;
and a model evaluation module, configured to evaluate the plurality of trained prediction models by a preset model evaluation index and to determine the trained prediction model that performs best in the evaluation as the retinopathy onset risk prediction model.
9. An apparatus for constructing a retinopathy onset risk prediction model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for constructing a retinopathy onset risk prediction model according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211276129.6A CN115346665B (en) 2022-10-19 2022-10-19 Method, system and equipment for constructing retinopathy incidence risk prediction model

Publications (2)

Publication Number Publication Date
CN115346665A CN115346665A (en) 2022-11-15
CN115346665B true CN115346665B (en) 2023-03-10

Family

ID=83957576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211276129.6A Active CN115346665B (en) 2022-10-19 2022-10-19 Method, system and equipment for constructing retinopathy incidence risk prediction model

Country Status (1)

Country Link
CN (1) CN115346665B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452584B (en) * 2023-06-14 2023-08-22 天津医科大学第二医院 Neonatal retinopathy prediction method and system
CN117275726A (en) * 2023-09-21 2023-12-22 复旦大学 OSA (OSA) incidence risk prediction method and device based on multiple groups of biological biomarkers
CN117476183B (en) * 2023-12-27 2024-03-19 深圳市一五零生命科技有限公司 Construction system of autism children rehabilitation effect AI evaluation model

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403072A (en) * 2017-08-07 2017-11-28 北京工业大学 A kind of diabetes B prediction and warning method based on machine learning
WO2019184119A1 (en) * 2018-03-26 2019-10-03 平安科技(深圳)有限公司 Risk model training method and apparatus, risk identification method and apparatus, device, and medium
WO2020037942A1 (en) * 2018-08-20 2020-02-27 平安科技(深圳)有限公司 Risk prediction processing method and apparatus, computer device and medium
CN111563549A (en) * 2020-04-30 2020-08-21 广东工业大学 Medical image clustering method based on multitask evolutionary algorithm
AU2020103938A4 (en) * 2020-12-07 2021-02-11 Capital Medical University A classification method of diabetic retinopathy grade based on deep learning
CN112786203A (en) * 2021-03-03 2021-05-11 天津医科大学 Machine learning diabetic retinopathy morbidity risk prediction method and application
CN113611421A (en) * 2021-08-20 2021-11-05 温州医科大学附属第二医院(温州医科大学附属育英儿童医院) Chinese southern premature infant retinopathy prediction model and construction method thereof
CN113808747A (en) * 2021-10-11 2021-12-17 南昌大学第二附属医院 Ischemic stroke recurrence prediction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
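Nursing care in fundus screening of premature infants with the RetCam 3 pediatric wide-angle fundus imaging system; Yang Yulan et al.; "Practical Clinical Medicine"; 20170820 (No. 08); full text *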

Also Published As

Publication number Publication date
CN115346665A (en) 2022-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant