CN115565681A - IgA nephropathy prediction analysis system for unbalanced data - Google Patents
- Publication number
- CN115565681A CN202211294731.2A
- Authority
- CN
- China
- Prior art keywords
- data
- iga nephropathy
- sample
- module
- clinical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention provides an IgA nephropathy prediction analysis system for unbalanced data, and relates to the technical field of data processing and analysis. The system comprises a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module; the data preprocessing module is connected to the data collection module and is used for preprocessing clinical examination data and pathological examination data to form clinical data F; the data normalization module is connected to the data preprocessing module and is used for normalizing the obtained clinical data F of the IgA nephropathy patient; the model training module is connected to the data normalization module and is used for training an IgA nephropathy prediction model oriented to unbalanced data; the model prediction module is connected to the model training module and is used for predicting the IgA nephropathy deterioration probability of a clinical sample. The beneficial effect of the invention is that the efficiency of predicting the deterioration probability of IgA nephropathy patients is improved.
Description
Technical Field
The invention relates to the technical field of data processing and analysis, in particular to an IgA nephropathy prediction analysis system for unbalanced data.
Background
IgA refers to immunoglobulin A. IgA nephropathy is the most common immune glomerulonephritis worldwide and occurs in all age groups. However, its pathogenesis is still not well understood, and prediction of IgA nephropathy deterioration still relies on renal biopsy, an invasive procedure. Although medical treatment can achieve a certain positive effect, up to 20% to 30% of patients may deteriorate to end-stage renal disease (uremia). Predicting the deterioration of a patient's IgA nephropathy with a neural-network deep learning algorithm is therefore of both scientific and practical significance.
In actual IgA nephropathy data analysis, most clinical data sets present an unbalanced distribution: only a small fraction of the samples have worsened to end-stage renal disease (uremia), while most patient samples have not deteriorated. This unbalanced distribution makes training a neural network for IgA nephropathy very difficult. On the one hand, the excessive number of non-deteriorated samples causes the trained network to over-fit to them, so that predictions of IgA nephropathy deterioration are biased towards the majority of non-deteriorated patients; on the other hand, the small number of samples from deteriorated patients leaves the prediction model insufficiently trained and under-fitted on that class, making the results for patients who have actually deteriorated less accurate.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a prediction analysis system for IgA nephropathy based on unbalanced data.
The technical scheme adopted by the invention for solving the technical problem is as follows: the improvement of the system is that the system comprises a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module;
the data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and corresponding deterioration labels of the IgA nephropathy patients;
the data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and the pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data that can be used for model training and prediction, and concatenating the two to form clinical data F;
the data normalization module is connected to the data preprocessing module and is used for carrying out data normalization operation on the obtained clinical data F of the IgA nephropathy patient to obtain a data set which can be used for model training and testing;
the model training module is connected to the data normalization module and is used for training an IgA nephropathy prediction model oriented to unbalanced data, the unbalanced data being data whose sample labels are unevenly distributed;
and the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
In the above structure, the clinical examination data is laboratory report data obtained by using medical instruments to perform a blood examination on a blood sample and a urine examination on a urine sample collected from an IgA nephropathy patient, and includes serum creatinine, glomerular filtration rate, blood pressure, and uric acid.
In the above structure, the pathological examination data is data on the renal lesions obtained by biopsy of the kidney of an IgA nephropathy patient.
In the above structure, the deterioration label indicates whether the IgA nephropathy has worsened; the criterion is whether end-stage renal disease has been reached or the eGFR has decreased by more than 50%, where eGFR is the estimated glomerular filtration rate and end-stage renal disease means an eGFR below 15 ml/min/1.73 m² or renal replacement therapy initiated and continued for more than 3 months.
In the above structure, the clinical data is expressed as $F = [f_1, f_2, \ldots, f_n]$, where n is the total number of indices and $f_i$ ($1 \le i \le n$) is the i-th index;
the deterioration label is converted into a binary label Y taking the values 1 and 0 and is used as the label for model training and testing, where 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates that it has not.
In the above structure, each data sample in the data set includes clinical data F of the patient and a deterioration label corresponding to the patient;
the data set consists of a training set consisting of 70% of the data set of all patients and a test set consisting of 30% of the data set of all patients.
In the above structure, the clinical data F is mapped to the range 0 to 1 by the following formula, so that large differences in value ranges do not make model training difficult:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ is the value of the i-th clinical data index in the clinical data F of the corresponding patient; $f_{\min}$ and $f_{\max}$ are the minimum and maximum of the i-th clinical data index over all patients; $x_i$ is the normalized value of the i-th clinical data index; and the normalized clinical data is expressed as $X = [x_1, x_2, \ldots, x_n]$.
In the above structure, the model training module trains the IgA nephropathy prediction model with a learning method oriented to unbalanced data; this learning method uses resampling, that is, sampling again according to the sample distribution, to adjust the model's bias with respect to the tail samples.
In the above structure, the IgA nephropathy prediction model oriented to unbalanced data is trained with a progressive sampling method, which combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is a uniform sampling method not designed for unbalanced distributions: one sample is selected at random from all samples according to a uniform distribution and used as a training sample, so that the probability of drawing from a class is proportional to its size, expressed as:
$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i$ denotes the probability that a sample of the i-th class is drawn, C denotes the total number of classes, and $n_i$ denotes the number of samples in the i-th class;
in class-balanced sampling, a class is first selected from the class set according to a uniform distribution, and a sample instance is then selected from that class according to a uniform distribution for subsequent model training, expressed as:
$$p_i = \frac{1}{C}$$
where $p_i$ denotes the probability that a sample of the i-th class is drawn and C denotes the total number of classes;
the function of the progressive sampling method is expressed as:
$$p_i(t) = \left(1 - \frac{t}{T}\right)\frac{n_i}{\sum_{j=1}^{C} n_j} + \frac{t}{T}\cdot\frac{1}{C}$$
where $p_i(t)$ denotes the probability that a sample of the i-th class is drawn in round t, t denotes the current training round, T denotes the total number of training rounds, the first fraction is the sample-based sampling probability, and $1/C$ is the class-balanced sampling probability;
In the above structure, the clinical data obtained by progressive sampling is fed to the IgA nephropathy classifier for classification, thereby predicting the IgA nephropathy deterioration probability;
the IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the result, where, consistent with the label Y, 1 represents deterioration and 0 represents no deterioration;
the model is trained using a cross-entropy function as the loss function, expressed as:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right) \right]$$
where N denotes the number of training samples, $Y_i$ denotes the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ denotes the probability of nephropathy deterioration predicted by the model for the i-th IgA nephropathy patient sample.
In the above structure, when the trained IgA nephropathy prediction model oriented to unbalanced data is used for prediction, for a test set sample, the preprocessed and normalized clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested is input directly to the IgA nephropathy classifier, and the trained model outputs the IgA nephropathy deterioration probability of the patient through the classifier.
In the above structure, the IgA nephropathy prediction analysis system for unbalanced data further includes a report generation module connected to the model prediction module, and the report generation module is configured to output an analysis report on the deterioration of the nephropathy of a given IgA nephropathy patient to be tested.
The invention has the beneficial effects that: the prediction efficiency of the IgA nephropathy patient deterioration probability is improved, and doctors are helped to master the disease development rule.
Drawings
FIG. 1 is a schematic diagram showing a framework configuration of an IgA nephropathy prediction analysis system for unbalanced data according to the present invention.
FIG. 2 is a schematic flow chart showing the method of the present invention for the predictive analysis of IgA nephropathy based on unbalanced data.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
The concept, specific structure and technical effects of the present invention are described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that its objects, features and effects can be fully understood. The described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art from these embodiments without inventive effort all fall within the protection scope of the present invention. In addition, the connection relations referred to in this patent do not necessarily mean that components are directly connected; a better connection structure may be formed by adding or removing auxiliary connecting components according to the specific implementation. The technical features of the invention may be combined with one another provided they do not conflict.
The invention provides an IgA nephropathy prediction analysis system for unbalanced data, which comprises a data collection module, a data preprocessing module, a data normalization module, a model training module based on a learning algorithm for unbalanced sample data, a model prediction module, and a report generation and display module. Clinical sample data is gathered by the data collection module and preprocessed in the data preprocessing module. The preprocessed data is then normalized in the data normalization module for subsequent training. In the model training module, an IgA nephropathy deterioration probability prediction model is trained with a learning algorithm designed for unbalanced sample data. The trained model is then used in the model prediction module to predict the IgA nephropathy deterioration probability of a clinical patient. Finally, the report generation and display module generates a deterioration probability prediction report for the clinical sample.
Referring to fig. 1, in the present embodiment, the system for predictive analysis of IgA nephropathy based on unbalanced data includes a data collection module, a data preprocessing module, a data normalization module, a model training module, a model prediction module, and a report generation module.
The data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and the corresponding deterioration labels of the IgA nephropathy patients. In this embodiment, the clinical examination data is laboratory report data obtained by using medical instruments to perform a blood examination on a blood sample and a urine examination on a urine sample collected from an IgA nephropathy patient, and includes serum creatinine, glomerular filtration rate, blood pressure, and uric acid. The pathological examination data is data on the renal lesions obtained by biopsy of the kidney of an IgA nephropathy patient. In a specific embodiment, the pathological examination data includes five types of indicators, M, E, S, T, and C, wherein:
M (mesangial hypercellularity) indicates mesangial cell proliferation: M1 if more than 50% of glomeruli show mesangial cell proliferation, otherwise M0;
E (endocapillary hypercellularity) indicates capillary endothelial cell proliferation: E1 if capillary endothelial cell proliferation is present, otherwise E0;
S (segmental glomerulosclerosis) indicates segmental sclerosis of the glomeruli: S1 if segmental sclerosis or adhesion is present, otherwise S0;
T (tubular atrophy/interstitial fibrosis) indicates renal tubular atrophy or renal interstitial fibrosis: T0 means the proportion of tubular atrophy or interstitial fibrosis is less than 25%, T1 means it is more than 25% and less than 50%, and T2 means it is more than 50%;
C (cellular/fibrocellular crescents) indicates cellular or fibrocellular crescents: C0 means no cellular or fibrocellular crescents, C1 means cellular or fibrocellular crescents in fewer than 25% of glomeruli, and C2 means cellular or fibrocellular crescents in more than 25% of glomeruli.
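For model input, the five pathological indicators have to become numbers. The following is a minimal sketch of one possible encoding, not something the patent specifies: the function name `encode_mest_c`, the string format `M1E0S1T2C0`, and the feature order M, E, S, T, C are all assumptions made for illustration.

```python
import re

# Highest grade allowed per lesion, following the definitions in the text:
# M, E and S are graded 0/1; T and C are graded 0/1/2.
MAX_GRADE = {"M": 1, "E": 1, "S": 1, "T": 2, "C": 2}


def encode_mest_c(score: str) -> list[int]:
    """Turn a score string such as 'M1E0S1T2C0' into [1, 0, 1, 2, 0].

    The string format and the output order (M, E, S, T, C) are hypothetical.
    """
    found = dict(re.findall(r"([MESTC])(\d)", score.upper()))
    features = []
    for lesion, max_grade in MAX_GRADE.items():
        if lesion not in found:
            raise ValueError(f"missing {lesion} grade in {score!r}")
        grade = int(found[lesion])
        if grade > max_grade:
            raise ValueError(f"{lesion}{grade} is out of range")
        features.append(grade)
    return features


print(encode_mest_c("M1E0S1T2C0"))  # [1, 0, 1, 2, 0]
```

Keeping T and C as ordinal integers is only one option; a one-hot encoding would be an equally reasonable choice.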
In addition, the deterioration label indicates whether the IgA nephropathy has deteriorated; the criterion is whether end-stage renal disease has been reached or the eGFR has decreased by more than 50%, where eGFR is the estimated glomerular filtration rate and end-stage renal disease means an eGFR below 15 ml/min/1.73 m² or renal replacement therapy initiated and continued for more than 3 months.
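The labelling criterion can be written as a small rule. The sketch below is illustrative only: the function name, the argument names, and the assumption that the 50% decline is measured against a baseline eGFR are not stated in the source.

```python
def deterioration_label(baseline_egfr: float,
                        followup_egfr: float,
                        rrt_months: float = 0.0) -> int:
    """Return 1 if the deterioration criterion is met, else 0.

    Assumptions (not from the source): eGFR values are in ml/min/1.73 m^2,
    the 50% decline is relative to `baseline_egfr`, and `rrt_months` is the
    duration of renal replacement therapy in months.
    """
    reached_esrd = followup_egfr < 15.0 or rrt_months > 3.0
    egfr_halved = followup_egfr < 0.5 * baseline_egfr
    return int(reached_esrd or egfr_halved)


print(deterioration_label(90.0, 40.0))  # 1: eGFR fell by more than 50%
print(deterioration_label(90.0, 60.0))  # 0: neither criterion is met
```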
Note that the clinical examination data, the pathological examination data and the deterioration labels described above are given only as an example in this embodiment. The data collection module of the invention does not itself acquire these data from IgA nephropathy patients; it only gathers data that have already been obtained. In a specific embodiment, the data collection module is a data input port.
The data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and the pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data that can be used for model training and prediction, and concatenating the two to form clinical data F, which serves as the input data for subsequent model training and testing. In this embodiment, the clinical data is expressed as $F = [f_1, f_2, \ldots, f_n]$, where n is the total number of indices and $f_i$ ($1 \le i \le n$) is the i-th index; the deterioration label is converted into a binary label Y taking the values 1 and 0 and is used as the label for model training and testing, where 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates that it has not.
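The preprocessing step (drop incomplete samples, then concatenate the two data sources into F) might look like the sketch below. The record layout with the keys `clinical`, `pathology` and `label` is a hypothetical input format chosen for the example, and missing values are assumed to appear as `None` or NaN.

```python
import math


def is_missing(value) -> bool:
    """Treat None and NaN as missing (an assumption about the raw data)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_clinical_data(records):
    """Return (F_rows, labels) for all records without missing values.

    Each record is a dict with hypothetical keys:
      'clinical'  : list of laboratory values
      'pathology' : list of pathological indicators
      'label'     : 1 (deteriorated) or 0 (not deteriorated)
    """
    rows, labels = [], []
    for rec in records:
        # Concatenate clinical and pathological data into one vector F.
        row = list(rec["clinical"]) + list(rec["pathology"])
        if is_missing(rec["label"]) or any(is_missing(v) for v in row):
            continue  # remove samples with missing data
        rows.append([float(v) for v in row])
        labels.append(int(rec["label"]))
    return rows, labels


records = [
    {"clinical": [88.0, 95.0], "pathology": [1, 0, 1, 0, 0], "label": 0},
    {"clinical": [140.0, None], "pathology": [1, 1, 1, 2, 1], "label": 1},
]
F_rows, Y = build_clinical_data(records)
print(len(F_rows), Y)  # 1 [0]
```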
The data normalization module is connected to the data preprocessing module and is used for normalizing the obtained clinical data F of the IgA nephropathy patient to obtain a data set that can be used for model training and testing; in this embodiment, each data sample in the data set includes the clinical data of a patient and the corresponding deterioration label; the final data set of the IgA nephropathy prediction model oriented to unbalanced data consists of a training set and a test set, where the training set contains 70% of all patient samples and the test set contains the remaining 30%.
The data normalization refers to mapping the clinical data F to the range 0 to 1 by the following formula, so that large differences in value ranges do not make model training difficult:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ is the value of the i-th clinical data index in the clinical data F of the corresponding patient; $f_{\min}$ and $f_{\max}$ are the minimum and maximum of the i-th clinical data index over all patients; $x_i$ is the normalized value of the i-th clinical data index; and the normalized clinical data is expressed as $X = [x_1, x_2, \ldots, x_n]$.
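A minimal sketch of this min-max normalization, applied column by column, is given below. The function names are illustrative, and mapping a constant column (where the maximum equals the minimum) to 0.0 is an assumption added to avoid division by zero; the source does not address that case.

```python
def fit_min_max(rows):
    """Return per-index minima and maxima over all patients (rows)."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Map every index f_i to x_i = (f_i - f_min) / (f_max - f_min)."""
    normalized = []
    for row in rows:
        normalized.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column: assumed 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


F_rows = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]
mins, maxs = fit_min_max(F_rows)
print(apply_min_max(F_rows, mins, maxs))
# [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

Separating the fit from the application makes it possible to reuse the same minima and maxima on a patient who arrives later, which is one way to keep prediction inputs on the same scale as the training data.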
The model training module is connected to the data normalization module and is used for training an IgA nephropathy prediction model oriented to unbalanced data, the unbalanced data being data whose sample labels are unevenly distributed.
In this embodiment, the module trains the IgA nephropathy prediction model with a learning method oriented to unbalanced data, for subsequent prediction of the deterioration probability of IgA nephropathy patients.
In the present invention, most of the clinical data processed exhibits an unbalanced distribution: only a small fraction of the samples are patients who have worsened to end-stage renal disease (uremia), called tail samples, while most patient samples have not worsened and are called head samples. Because of this unbalanced distribution, the over-represented non-deteriorated samples are drawn far more often during training, so the trained neural network over-fits to them and its predictions of IgA nephropathy deterioration are biased towards the large number of non-deteriorated head samples; for the small number of deteriorated tail samples, the prediction model is insufficiently trained and under-fits, so the predictions for patients who have actually deteriorated are not accurate enough.
The learning method oriented to unbalanced data uses resampling to adjust the model's bias with respect to the tail samples. Resampling means sampling again according to the sample distribution, typically by under-sampling the head class and over-sampling the tail class. Increasing the number of tail-class samples seen during training avoids under-fitting the tail class.
In the invention, the training method of the IgA nephropathy prediction model facing the unbalanced data adopts a progressive sampling method for training, and the progressive sampling method integrates uniform sampling based on samples and sampling based on class balance;
sample-based uniform sampling is a uniform sampling method not designed for unbalanced distributions: one sample is selected at random from all samples according to a uniform distribution and used as a training sample, so that the probability of drawing from a class is proportional to its size, expressed as:
$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i$ denotes the probability that a sample of the i-th class is drawn, C denotes the total number of classes, $n_i$ denotes the number of samples in the i-th class, and $n_j$ denotes the number of samples in the j-th class;
in class-balanced sampling, a class is first selected from the class set according to a uniform distribution, and a sample instance is then selected from that class according to a uniform distribution for subsequent model training, expressed as:
$$p_i = \frac{1}{C}$$
where $p_i$ denotes the probability that a sample of the i-th class is drawn and C denotes the total number of classes.
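To make the difference between the two schemes concrete, the sketch below computes the per-class sampling probabilities for a hypothetical data set with 900 non-deteriorated and 100 deteriorated samples. The counts and function names are invented for illustration.

```python
def sample_based_probs(class_counts):
    """p_i = n_i / sum_j n_j: classes are drawn in proportion to their size."""
    total = sum(class_counts)
    return [n / total for n in class_counts]


def class_balanced_probs(class_counts):
    """p_i = 1 / C: every class is drawn with the same probability."""
    num_classes = len(class_counts)
    return [1.0 / num_classes] * num_classes


counts = [900, 100]  # hypothetical head (label 0) and tail (label 1) sizes
print(sample_based_probs(counts))    # [0.9, 0.1]
print(class_balanced_probs(counts))  # [0.5, 0.5]
```

Under class-balanced sampling each individual tail sample in this example is drawn nine times as often as each head sample (0.5/100 versus 0.5/900), which is the over-sampling effect described in the text.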
The progressive sampling method combines the two: it is a gradually balanced sampling scheme that interpolates step by step between sample-based sampling and class-balanced sampling as model learning progresses. In the early training stage it favors sample-based sampling, with the aim of learning a better feature representation; in the later training stage it shifts towards class balance, preventing the bias towards the head class from causing under-fitting of the tail class and over-fitting of the head class.
The function of the progressive sampling method is expressed as:
$$p_i(t) = \left(1 - \frac{t}{T}\right)\frac{n_i}{\sum_{j=1}^{C} n_j} + \frac{t}{T}\cdot\frac{1}{C}$$
where $p_i(t)$ denotes the probability that a sample of the i-th class is drawn in round t, t denotes the current training round, T denotes the total number of training rounds, the first fraction is the sample-based sampling probability, and $1/C$ is the class-balanced sampling probability.
In this embodiment, for the given training set, samples are drawn according to the progressive sampling function, and the sampled clinical data is used for subsequent classification by the IgA nephropathy classifier; after the feature representation is obtained, it is input to the IgA nephropathy classifier to predict the IgA nephropathy deterioration probability.
The IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the result, where, consistent with the label Y, 1 represents deterioration and 0 represents no deterioration;
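Progressive sampling can be sketched as a per-round class distribution followed by a two-stage draw. The code below assumes a linear interpolation in t/T between sample-based and class-balanced probabilities, which matches the description of a step-by-step interpolation but is an assumption about the exact form; the function names, the `indices_by_class` layout and the batch size are likewise hypothetical.

```python
import random


def progressive_probs(class_counts, t, T):
    """Class probabilities in round t of T (assumed linear interpolation).

    t = 0 gives sample-based sampling, t = T gives class-balanced sampling.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    w = t / T
    return [(1.0 - w) * n / total + w / num_classes for n in class_counts]


def sample_batch(indices_by_class, t, T, batch_size, rng):
    """Draw sample indices: pick a class by p_i(t), then a sample uniformly."""
    counts = [len(ix) for ix in indices_by_class]
    probs = progressive_probs(counts, t, T)
    classes = rng.choices(range(len(counts)), weights=probs, k=batch_size)
    return [rng.choice(indices_by_class[c]) for c in classes]


counts = [900, 100]  # hypothetical head and tail class sizes
for t in (0, 5, 10):
    print(t, [round(p, 3) for p in progressive_probs(counts, t, 10)])
# 0 [0.9, 0.1]
# 5 [0.7, 0.3]
# 10 [0.5, 0.5]
```

A call such as `sample_batch([[0, 1, 2], [3]], t=10, T=10, batch_size=8, rng=random.Random(0))` returns eight training indices, with the single tail sample (index 3) drawn about half of the time in the final round.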
the model is trained using a cross-entropy function as the loss function, expressed as:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right) \right]$$
where N denotes the number of training samples, $Y_i$ denotes the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ denotes the probability of nephropathy deterioration predicted by the model for the i-th IgA nephropathy patient sample.
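The binary cross-entropy between true labels and predicted deterioration probabilities can be computed as in the sketch below. Averaging over the samples and clipping probabilities with a small `eps` are conventional choices assumed here; the source does not state either.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over all samples.

    y_true: labels Y_i in {0, 1} (1 = deteriorated)
    y_prob: predicted deterioration probabilities in [0, 1]
    eps:    clipping constant that avoids log(0) (an added assumption)
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# An uninformative prediction of 0.5 for every sample gives log(2).
print(round(binary_cross_entropy([1, 0], [0.5, 0.5]), 4))  # 0.6931
```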
The precision of the model refers to the accuracy of the model, that is, the proportion of the number of correctly classified samples in the test set to the total number of samples in the test set.
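Accuracy as defined above is straightforward to compute. The sketch below also reports recall on the deteriorated class, which is not mentioned in the source and is included only as an assumption-labelled companion metric, since on unbalanced data a model that predicts "no deterioration" for everyone can still score a high accuracy.

```python
def accuracy(y_true, y_pred):
    """Share of test samples whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(y_true, y_pred) if y == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive samples that were predicted positive.

    Not part of the source text; shown only for comparison with accuracy.
    """
    predictions_for_positives = [p for y, p in zip(y_true, y_pred) if y == positive]
    if not predictions_for_positives:
        return float("nan")
    hits = sum(1 for p in predictions_for_positives if p == positive)
    return hits / len(predictions_for_positives)


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]  # a model that never predicts deterioration
print(accuracy(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))    # 0.0
```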
And the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
In this embodiment, when the trained IgA nephropathy prediction model oriented to unbalanced data is used for prediction, for a test set sample, the preprocessed and normalized clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested is input directly to the IgA nephropathy classifier, and the trained model outputs the IgA nephropathy deterioration probability of the patient through the classifier.
In the above structure, the report generation module is connected to the model prediction module and outputs an analysis report on the deterioration of the nephropathy of a given IgA nephropathy patient to be tested. The report is uploaded to the platform of the IgA nephropathy prediction analysis system for unbalanced data, and the patient can view it on a mobile phone, a tablet or another terminal.
The invention provides an IgA nephropathy prediction analysis system for unbalanced data, which takes patient clinical data as input and outputs the probability that the patient will deteriorate. The system explicitly accounts for the unbalanced data problem in IgA nephropathy prediction and is designed to be robust so that results are more accurate; it performs the comparison and analysis automatically with an artificial intelligence algorithm, improves the efficiency of predicting the deterioration probability of IgA nephropathy patients, and helps the doctor understand how the disease develops when intervening in the patient's treatment, which benefits subsequent treatment and prognosis.
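The prediction step amounts to one forward pass of the classifier on the normalized vector X. The source does not describe the network architecture, so the sketch below assumes a single hidden layer with ReLU activation and a sigmoid output; the function names and all weights are hypothetical placeholders, not trained values.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_deterioration_probability(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an assumed one-hidden-layer binary classifier.

    x        : normalized clinical data X = [x_1, ..., x_n]
    w_hidden : one weight row per hidden unit
    b_hidden : one bias per hidden unit
    w_out    : one output weight per hidden unit
    b_out    : output bias
    """
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)


# With all-zero placeholder weights the output is the uninformative 0.5.
x = [0.2, 0.8, 0.5]
p = predict_deterioration_probability(x, [[0.0] * 3] * 2, [0.0] * 2, [0.0] * 2, 0.0)
print(p)  # 0.5
```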
As shown in fig. 2, the IgA nephropathy prediction analysis method for unbalanced data according to the present invention specifically includes the following steps:
S1, data collection: collecting clinical examination data and pathological examination data of IgA nephropathy patients and the corresponding deterioration labels of the IgA nephropathy patients through the data collection module;
in this example, the clinical examination data, the pathological examination data, and the deterioration label corresponding to the IgA nephropathy patient are the same as those in the above example, and therefore, detailed description thereof will be omitted in this example.
S2, preprocessing patient data, namely preprocessing clinical examination data and pathological examination data through a data preprocessing module, removing samples with data loss to obtain clinical examination data and case examination data which can be used for model training and prediction, and splicing and combining the two data to form clinical data F; in addition, the method also comprises the following steps: pre-processing the patient deterioration label into 1 and 0 deterioration labels;
in this example, the clinical data is expressed as F = [ F = [ ] 1 ,f 2 ,...,f n ]Wherein n represents a total of n indices, f i I is more than or equal to 1 and less than or equal to n; the deterioration label was treated as a binary label Y of 1 and 0 as a label for the model training set test, where 1 indicates that the patient has deteriorated IgA nephropathy and 0 indicates that there is no deterioration in IgA nephropathy.
S3, patient clinical data normalization: the clinical data of the patient are normalized;
the data normalization maps the clinical data F to the range 0 to 1 by the following formula, so that large differences between the value ranges of the indices do not make model training more difficult:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ is the value of the i-th clinical data index in the clinical data F for the corresponding patient; $f_{\min}$ is the minimum of the i-th clinical data index over all patients and $f_{\max}$ is its maximum over all patients; $x_i$ is the normalized value of the i-th clinical data index, and the normalized clinical data is expressed as $X = [x_1, x_2, \ldots, x_n]$.
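A minimal sketch of the normalization in step S3, assuming it is ordinary min-max scaling applied per index over all patients. The function name and the handling of a constant column (mapped to 0.0 to avoid division by zero) are assumptions of this sketch and are not stated in the text.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(
    F: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Scale every index (column) of F to [0, 1] using its min and max over all patients."""
    n = len(F[0])
    mins = [min(row[i] for row in F) for i in range(n)]
    maxs = [max(row[i] for row in F) for i in range(n)]
    X = [
        [
            # Assumption: a column with no spread is mapped to 0.0.
            (row[i] - mins[i]) / (maxs[i] - mins[i]) if maxs[i] > mins[i] else 0.0
            for i in range(n)
        ]
        for row in F
    ]
    return X, mins, maxs


# Made-up values: the second index is constant across patients.
F_demo = [[50.0, 10.0], [100.0, 10.0], [150.0, 10.0]]
X_demo, f_min, f_max = min_max_normalize(F_demo)
```

Returning the per-index minima and maxima makes it possible to apply the same scaling to a new patient at prediction time.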
S4, training and test set division: the data set consisting of all patient samples is divided into a training set and a test set;
in step S4, 70% of all patient samples form the training set used for model training, and the remaining 30% form the test set used for model testing;
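The 70/30 division can be illustrated as below. The text does not say how samples are assigned to each set, so the random shuffle, the fixed seed, and the function name `split_train_test` are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def split_train_test(
    samples: Sequence[S], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[S], List[S]]:
    """Shuffle the samples and cut them into a training part and a test part."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


all_samples = list(range(10))  # stand-ins for ten patient samples
train_set, test_set = split_train_test(all_samples)
```

With strongly unbalanced labels, a split that preserves the class ratio in both parts may be preferable, but that choice is not described in the source.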
S5, training of the IgA nephropathy diagnosis model for unbalanced data: samples are drawn from the given training set according to a progressive sampling function, and the sampled clinical data are passed to the IgA nephropathy classifier for classification;
in step S5, the progressive sampling method combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is the ordinary uniform sampling method that is not designed for unbalanced distributions: one sample is selected at random from the whole training set with equal probability, so the probability of drawing from a class is proportional to its size, expressed as:
$$p_i^{IB} = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i^{IB}$ denotes the probability that a sample is drawn from the i-th class, C denotes the total number of classes, and $n_i$ denotes the number of samples contained in the i-th class;
class-balanced sampling first selects a class from the class set with uniform probability and then selects a sample instance from that class with uniform probability for subsequent model training, expressed as:
$$p_i^{CB} = \frac{1}{C}$$
where $p_i^{CB}$ denotes the probability that a sample is drawn from the i-th class, and C denotes the total number of classes;
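the progressive sampling function is expressed as:
$$p_i^{PB}(t) = \left(1 - \frac{t}{T}\right) p_i^{IB} + \frac{t}{T}\, p_i^{CB}$$
where $p_i^{PB}(t)$ denotes the probability that a sample is drawn from the i-th class in the t-th training round, T denotes the total number of training rounds, $p_i^{IB} = n_i / \sum_{j=1}^{C} n_j$ is the sampling probability of sample-based uniform sampling, and $p_i^{CB} = 1/C$ is the sampling probability of class-balanced sampling;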
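The progressive sampling described above can be sketched as follows, assuming the progressive probability is a linear interpolation that moves from sample-based uniform sampling at round 0 to class-balanced sampling at round T. All function names here are illustrative, and the interpolation form is an assumption where the original formula is not legible.

```python
import random
from collections import Counter
from typing import Dict, Sequence


def instance_balanced(labels: Sequence[int]) -> Dict[int, float]:
    """Class probability under uniform sampling of samples: n_i / sum_j n_j."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}


def class_balanced(labels: Sequence[int]) -> Dict[int, float]:
    """Class probability under class-balanced sampling: 1 / C."""
    classes = set(labels)
    return {c: 1.0 / len(classes) for c in classes}


def progressive(labels: Sequence[int], t: int, T: int) -> Dict[int, float]:
    """Assumed interpolation: (1 - t/T) * instance-balanced + (t/T) * class-balanced."""
    ib, cb = instance_balanced(labels), class_balanced(labels)
    w = t / T
    return {c: (1 - w) * ib[c] + w * cb[c] for c in ib}


def sample_index(labels: Sequence[int], t: int, T: int, rng: random.Random) -> int:
    """Pick a class with the progressive probability, then a sample uniformly within it."""
    probs = progressive(labels, t, T)
    classes = sorted(probs)
    chosen = rng.choices(classes, weights=[probs[c] for c in classes], k=1)[0]
    members = [i for i, y in enumerate(labels) if y == chosen]
    return rng.choice(members)


demo_labels = [0] * 9 + [1]                 # 9 non-deteriorated, 1 deteriorated
p_start = progressive(demo_labels, 0, 10)   # equals instance-balanced sampling
p_mid = progressive(demo_labels, 5, 10)
p_end = progressive(demo_labels, 10, 10)    # equals class-balanced sampling
picked = sample_index(demo_labels, 10, 10, random.Random(0))
```

In this made-up example the minority class is drawn with probability 0.1 in the first round, about 0.3 halfway through, and 0.5 in the last round, so the deteriorated samples are seen more often as training proceeds.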
the progressively sampled clinical data are passed to the IgA nephropathy classifier for classification, and the IgA nephropathy deterioration probability is predicted;
the IgA nephropathy classifier is a binary-classification neural network that judges whether an input patient sample has deteriorated and outputs the judgment result, where 1 represents deterioration and 0 represents no deterioration, consistent with the binary label Y;
the model is trained using the cross-entropy function as the loss function; the cross-entropy function $L_{CE}$ is expressed as:
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[ Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right) \right]$$
where N is the number of training samples, $Y_i$ is the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ is the deterioration probability predicted by the model for the i-th IgA nephropathy patient sample.
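As an illustration of the loss, the sketch below computes a binary cross-entropy averaged over the samples. The averaging over N, the clipping constant `eps`, and the function name are assumptions of this sketch; the source only states that a cross-entropy function is used.

```python
import math
from typing import Sequence


def cross_entropy(
    y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy between true labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite (assumed safeguard)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Made-up predictions for one deteriorated (1) and one non-deteriorated (0) sample.
loss_uninformative = cross_entropy([1, 0], [0.5, 0.5])
loss_good = cross_entropy([1, 0], [0.9, 0.1])
loss_bad = cross_entropy([1, 0], [0.1, 0.9])
```

An uninformative prediction of 0.5 for every sample gives a loss of log 2, and the loss falls as the predicted probabilities move toward the true labels.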
S6, prediction of the patient's IgA nephropathy deterioration probability: the IgA nephropathy deterioration probability of a clinical sample is predicted using the IgA nephropathy prediction model for unbalanced data;
when the trained IgA nephropathy prediction model for unbalanced data is used for prediction, the input for a test-set sample is the clinical data of the IgA nephropathy patient sample to be tested, as obtained by the data preprocessing module; the clinical data is input directly to the IgA nephropathy classifier, and the trained model outputs the patient's IgA nephropathy deterioration probability through the IgA nephropathy classifier.
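The prediction step can be pictured with a toy forward pass. The source only says that the classifier is a binary-classification neural network, so the single hidden layer, the ReLU and sigmoid activations, and every weight value below are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_deterioration(
    x: Sequence[float],
    W1: Sequence[Sequence[float]],
    b1: Sequence[float],
    w2: Sequence[float],
    b2: float,
) -> float:
    """Forward pass of a one-hidden-layer network; returns a probability in (0, 1)."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)  # ReLU hidden units
        for row, b in zip(W1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)


# Hypothetical weights for a patient described by two normalized indices.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.0
prob_zero = predict_deterioration([0.0, 0.0], W1, b1, w2, b2)
prob_other = predict_deterioration([1.0, 0.0], W1, b1, w2, b2)
```

The returned value plays the role of the deterioration probability; a trained model would obtain its weights from step S5 rather than from hand-picked numbers.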
S7, IgA nephropathy diagnosis and treatment report generation: an IgA nephropathy deterioration assessment and an IgA nephropathy analysis report are output for the patient to be predicted.
The invention addresses the unbalanced data distribution of IgA nephropathy clinical samples by adopting a resampling algorithm based on class frequency together with a decoupled two-stage training mode, which improves data efficiency and IgA nephropathy prediction performance. The resulting IgA nephropathy prediction analysis system for unbalanced data produces prediction results that are comparatively more accurate, more robust, and better at generalizing.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (12)
1. The IgA nephropathy prediction analysis system for unbalanced data is characterized by comprising a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module;
the data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and corresponding deterioration labels of the IgA nephropathy patients;
the data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and the pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data that can be used for model training and prediction, and concatenating the two to form the clinical data F;
the data normalization module is connected to the data preprocessing module and is used for carrying out data normalization operation on the obtained clinical data F of the IgA nephropathy patient to obtain a data set which can be used for model training and testing;
the model training module is connected to the data normalization module and is used for training an IgA nephropathy prediction model for unbalanced data, the unbalanced data being a sample distribution in which the sample labels are unbalanced;
and the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
2. The system for the predictive analysis of IgA nephropathy based on unbalanced data according to claim 1, wherein the clinical examination data is laboratory sheet data obtained by performing a blood examination by collecting a blood sample and a urine examination by collecting a urine sample from the IgA nephropathy patient by using a medical instrument, and includes blood creatinine, glomerular filtration rate, blood pressure, and uric acid.
3. The system for predictive analysis of IgA nephropathy based on unbalanced data as claimed in claim 1, wherein the pathological examination data is data relating to the renal disease of the IgA nephropathy patient obtained by biopsy of a kidney of the patient.
4. The system for predictive analysis of IgA nephropathy for unbalanced data as claimed in claim 1, wherein the deterioration label indicates whether the IgA nephropathy has worsened, judged by the criterion of reaching end-stage nephropathy or a reduction in eGFR of more than 50%, wherein eGFR is the estimated glomerular filtration rate and end-stage nephropathy is defined as eGFR < 15 ml/min/1.73 m² or renal replacement therapy having been started and lasting more than 3 months.
5. The system for the predictive analysis of IgA nephropathy based on unbalanced data according to claim 1, wherein the clinical data is represented by $F = [f_1, f_2, \ldots, f_n]$, where n is the total number of indices and $f_i$ ($1 \le i \le n$) is the i-th index;
the deterioration label is processed into a binary label Y of 1 or 0 and used as the label for model training and testing, where 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates that it has not.
6. The system of claim 5, wherein each data sample in the data set comprises clinical data F of the patient and a deterioration label associated with the patient;
the data set consists of a training set consisting of 70% of the data set of all patients and a test set consisting of 30% of the data set of all patients.
7. The system for predictive analysis of IgA nephropathy for unbalanced data as claimed in claim 5, wherein the clinical data F is mapped to the range 0 to 1 by the following formula, so that large differences between the value ranges of the indices do not make model training more difficult:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ is the value of the i-th clinical data index in the clinical data F for the corresponding patient; $f_{\min}$ is the minimum of the i-th clinical data index over all patients and $f_{\max}$ is its maximum over all patients; $x_i$ is the normalized value of the i-th clinical data index, and the normalized clinical data is represented by $X = [x_1, x_2, \ldots, x_n]$.
8. The system for predictive analysis of IgA nephropathy based on unbalanced data of claim 7, wherein the model training module trains the IgA nephropathy prediction model for unbalanced data by using a learning method for unbalanced data; the learning method adopts resampling, that is, resampling according to the sample distribution, to adjust the model's bias with respect to tail-class samples.
9. The system for predictive analysis of IgA nephropathy based on unbalanced data according to claim 8, wherein the IgA nephropathy prediction model for unbalanced data is trained by a progressive sampling method, and the progressive sampling method combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is the ordinary uniform sampling method that is not designed for unbalanced distributions: one sample is selected at random from the whole training set with equal probability for model training, so the probability of drawing from a class is proportional to its size, expressed as:
$$p_i^{IB} = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i^{IB}$ denotes the probability that a sample is drawn from the i-th class, C denotes the total number of classes, and $n_i$ denotes the number of samples contained in the i-th class;
class-balanced sampling first selects a class from the class set with uniform probability and then selects a sample instance from that class with uniform probability for subsequent model training, expressed as:
$$p_i^{CB} = \frac{1}{C}$$
where $p_i^{CB}$ denotes the probability that a sample is drawn from the i-th class, and C denotes the total number of classes;
the progressive sampling function is expressed as:
$$p_i^{PB}(t) = \left(1 - \frac{t}{T}\right) p_i^{IB} + \frac{t}{T}\, p_i^{CB}$$
where $p_i^{PB}(t)$ denotes the probability that a sample is drawn from the i-th class in the t-th training round, T denotes the total number of training rounds, and $p_i^{IB}$ and $p_i^{CB}$ are the sampling probabilities of sample-based uniform sampling and class-balanced sampling, respectively.
10. The system for predictive analysis of IgA nephropathy based on unbalanced data according to claim 9, wherein the progressively sampled clinical data are passed to an IgA nephropathy classifier for classification to predict the IgA nephropathy deterioration probability;
the IgA nephropathy classifier is a binary-classification neural network that judges whether an input patient sample has deteriorated and outputs the judgment result, where 1 represents deterioration and 0 represents no deterioration, consistent with the binary label Y;
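the model is trained using a cross-entropy function as the loss function, the cross-entropy function $L_{CE}$ being expressed as:
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[ Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right) \right]$$
where N is the number of training samples, $Y_i$ is the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ is the deterioration probability predicted by the model for that sample.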
11. The system for the predictive analysis of IgA nephropathy based on unbalanced data according to claim 10, wherein, when the trained IgA nephropathy prediction model for unbalanced data is used for prediction, the input for a test-set sample is the clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested, as obtained by the data preprocessing module; the clinical data X is input directly to the IgA nephropathy classifier, and the trained IgA nephropathy prediction model for unbalanced data outputs the patient's IgA nephropathy deterioration probability through the IgA nephropathy classifier.
12. The system for predictive analysis of IgA nephropathy oriented towards unbalanced data of claim 1, further comprising a report generation module connected to the model prediction module for outputting an analysis report of the deterioration of nephropathy for a given IgA nephropathy patient to be tested.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211294731.2A CN115565681A (en) | 2022-10-21 | 2022-10-21 | IgA nephropathy prediction analysis system for unbalanced data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115565681A true CN115565681A (en) | 2023-01-03 |
Family
ID=84746447
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211294731.2A Pending CN115565681A (en) | 2022-10-21 | 2022-10-21 | IgA nephropathy prediction analysis system for unbalanced data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115565681A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||