CN115565681A - IgA nephropathy prediction analysis system for unbalanced data

Publication number
CN115565681A
Authority
CN
China
Prior art keywords
data
iga nephropathy
sample
module
clinical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211294731.2A
Other languages
Chinese (zh)
Inventor
段立新
刘丹蕾
魏凡越
李文
徐博润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Original Assignee
Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority to CN202211294731.2A
Publication of CN115565681A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides an IgA nephropathy prediction analysis system for unbalanced data and relates to the technical field of data processing and analysis. The system comprises a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module. The data preprocessing module is connected to the data collection module and preprocesses clinical examination data and pathological examination data to form clinical data F; the data normalization module is connected to the data preprocessing module and normalizes the obtained clinical data F of IgA nephropathy patients; the model training module is connected to the data normalization module and trains an IgA nephropathy prediction model oriented to unbalanced data; the model prediction module is connected to the model training module and predicts the IgA nephropathy deterioration probability of a clinical sample. The beneficial effect of the invention is that the efficiency of predicting the deterioration probability of IgA nephropathy patients is improved.

Description

IgA nephropathy prediction analysis system for unbalanced data
Technical Field
The invention relates to the technical field of data processing and analysis, in particular to an IgA nephropathy prediction analysis system for unbalanced data.
Background
IgA refers to immunoglobulin A. IgA nephropathy is the most common immune glomerulonephritis worldwide and occurs in all age groups. However, the pathogenesis of IgA nephropathy has not yet been effectively elucidated, and prediction of its deterioration still relies on renal biopsy, an invasive procedure. Although medical treatment can achieve a certain positive effect, up to 20% to 30% of patients may deteriorate to end-stage renal disease (uremia). Predicting the deterioration of a patient's IgA nephropathy with a neural-network-based deep learning algorithm is therefore of both scientific and practical significance.
In actual IgA nephropathy data analysis, most clinical samples present an unbalanced data distribution: only a small fraction of the samples deteriorate to end-stage renal disease (uremia), while most patient samples show no deterioration. This unbalanced distribution makes training a neural network for IgA nephropathy very difficult. On the one hand, the excessive number of non-deteriorated patient samples causes the trained neural network to overfit, so that its predictions of IgA nephropathy deterioration are biased toward the large number of non-deteriorated samples; on the other hand, the limited number of deteriorated patient samples leaves the prediction model insufficiently trained and under-fitted, making the analysis results for patients who have actually deteriorated less accurate.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a prediction analysis system for IgA nephropathy based on unbalanced data.
The technical scheme adopted by the invention for solving the technical problem is as follows: the improvement of the system is that the system comprises a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module;
the data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and corresponding deterioration labels of the IgA nephropathy patients;
the data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data that can be used for model training and prediction, and concatenating the two to form clinical data F;
the data normalization module is connected to the data preprocessing module and is used for carrying out data normalization operation on the obtained clinical data F of the IgA nephropathy patient to obtain a data set which can be used for model training and testing;
the model training module is connected with the data normalization module and used for training an IgA nephropathy prediction model facing unbalanced data, and the unbalanced data is sample distribution with unbalanced sample labels;
and the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
In the above structure, the clinical examination data are laboratory report data obtained with medical instruments by performing a blood examination on a blood sample and a urine examination on a urine sample collected from an IgA nephropathy patient, and include blood creatinine, glomerular filtration rate, blood pressure and uric acid.
In the above structure, the pathological examination data are data relating to the kidney disease obtained by renal biopsy of an IgA nephropathy patient.
In the above structure, the deterioration label indicates whether the IgA nephropathy has deteriorated; the criterion is whether end-stage renal disease has been reached or whether the eGFR has decreased by more than 50%, wherein eGFR is the estimated glomerular filtration rate, and end-stage renal disease means an eGFR of less than 15 ml/min/1.73 m² or renal replacement therapy initiated for more than 3 months.
In the above structure, the clinical data are represented as $F = [f_1, f_2, \ldots, f_n]$, wherein $n$ is the total number of indices and $f_i$ ($1 \le i \le n$) denotes the i-th index;
the deterioration label is processed into a binary label Y taking the values 1 and 0 and used as the label for model training and testing, wherein 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates no deterioration.
In the above structure, each data sample in the data set includes clinical data F of the patient and a deterioration label corresponding to the patient;
the data set consists of a training set consisting of 70% of the data set of all patients and a test set consisting of 30% of the data set of all patients.
In the above structure, the clinical data F are mapped to between 0 and 1 by the following formula, to avoid the model training difficulty caused by excessively different data ranges:

$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$

wherein $f_i$ denotes the i-th clinical data index in the clinical data F of the corresponding patient; $f_{\min}$ denotes the minimum of the i-th clinical data index over all patients and $f_{\max}$ denotes its maximum over all patients; $x_i$ denotes the normalized value of the i-th clinical data index, and the normalized clinical data are represented as $X = [x_1, x_2, \ldots, x_n]$.
In the above structure, the model training module trains the IgA nephropathy prediction model for unbalanced data by using a learning method oriented to unbalanced data; this learning method uses resampling, that is, re-drawing samples according to the sample distribution, to adjust the model's bias with respect to the tail samples.
In the above structure, the IgA nephropathy prediction model for unbalanced data is trained with a progressive sampling method, which combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is a sampling method not designed for unbalanced distributions: one sample is selected at random from all samples according to a uniform distribution and used as a training sample, so that each class is drawn in proportion to its size, expressed as:

$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$

wherein $p_i$ denotes the probability that a sample of the i-th class is sampled, $C$ denotes the total number of classes, and $n_i$ and $n_j$ denote the total number of samples contained in the i-th and j-th classes, respectively;
in class-balanced sampling, a class is first selected from the class set according to a uniform distribution, and a sample instance is then selected from that class according to a uniform distribution for subsequent model training, expressed as:

$$p_i = \frac{1}{C}$$

wherein $p_i$ denotes the probability that a sample of the i-th class is sampled, and $C$ denotes the total number of classes;
the progressive sampling function is expressed as:

$$p_i(t) = \left(1 - \frac{t}{T}\right) p_i^{IB} + \frac{t}{T}\, p_i^{CB}$$

wherein $p_i(t)$ denotes the probability that a sample of the i-th class is sampled, $t$ denotes the t-th training round, and $T$ denotes the total number of training rounds; $p_i^{IB}$ denotes the sampling probability of the sample-based sampling method, expressed as:

$$p_i^{IB} = \frac{n_i}{\sum_{j=1}^{C} n_j}$$

and $p_i^{CB}$ denotes the sampling probability of the class-balanced sampling method, expressed as:

$$p_i^{CB} = \frac{1}{C}$$
In the above configuration, the clinical data obtained by progressive sampling are passed to the IgA nephropathy classifier for classification, thereby predicting the IgA nephropathy deterioration probability;
the IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the judgment result, wherein 1 represents deterioration and 0 represents no deterioration, consistent with the binary label Y;
the model is trained using a cross-entropy function as the loss function, and the cross-entropy function $\mathcal{L}_{CE}$ is expressed as:

$$\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right)\right]$$

wherein $N$ denotes the number of training samples, $Y_i$ denotes the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ denotes the probability of kidney disease deterioration predicted by the model for the i-th IgA nephropathy patient sample.
In the above structure, when the trained IgA nephropathy prediction model for unbalanced data is used for prediction, for a test set sample, the clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested, obtained through the data preprocessing module, are input directly to the IgA nephropathy classifier, and the trained prediction model outputs the IgA nephropathy deterioration probability of the patient through the classifier.
In the above configuration, the system for predictive analysis of IgA nephropathy based on unbalanced data further includes a report generation module connected to the model prediction module, and the report generation module is configured to output a report for analyzing a deterioration condition of a nephropathy of a given IgA nephropathy patient to be tested.
The invention has the beneficial effects that: the prediction efficiency of the IgA nephropathy patient deterioration probability is improved, and doctors are helped to master the disease development rule.
Drawings
FIG. 1 is a schematic diagram showing a framework configuration of an IgA nephropathy prediction analysis system for unbalanced data according to the present invention.
FIG. 2 is a schematic flow chart showing the method of the present invention for the predictive analysis of IgA nephropathy based on unbalanced data.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
The conception, specific structure and technical effects of the present invention are described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that its objects, features and effects can be fully understood. The described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort all fall within the protection scope of the present invention. In addition, the coupling/connection relations referred to in this patent do not mean only that components are directly connected; a better connection structure may be formed by adding or removing auxiliary connecting components according to the specific implementation. The technical features of the invention may be combined with one another provided they do not conflict.
The invention provides an IgA nephropathy prediction analysis system for unbalanced data, which comprises a data collection module, a data preprocessing module, a data normalization module, a model training module based on a learning algorithm for unbalanced sample data, a model prediction module and a report generation module. Clinical sample data are collected by the data collection module and preprocessed in the data preprocessing module. The preprocessed sample data are then normalized in the data normalization module for subsequent training. In the model training module, an IgA nephropathy deterioration probability prediction model is trained using a learning algorithm for unbalanced sample data. The trained model is then used in the model prediction module to predict the IgA nephropathy deterioration probability of a clinical patient. Finally, the report generation module generates a deterioration probability prediction report for the clinical sample.
Referring to fig. 1, in the present embodiment, the system for predictive analysis of IgA nephropathy based on unbalanced data includes a data collection module, a data preprocessing module, a data normalization module, a model training module, a model prediction module, and a report generation module.
The data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and the corresponding deterioration labels. In this embodiment, the clinical examination data are laboratory report data obtained with medical instruments by performing a blood examination on a blood sample and a urine examination on a urine sample collected from an IgA nephropathy patient, and include blood creatinine, glomerular filtration rate, blood pressure and uric acid. The pathological examination data are data relating to the kidney disease obtained by renal biopsy of an IgA nephropathy patient. In a specific embodiment, the pathological examination data include five types of indicators, M, E, S, T and C, wherein M (mesangial hypercellularity) represents mesangial cell proliferation: M1 if more than 50% of glomeruli show mesangial cell proliferation, otherwise M0; E (endocapillary hypercellularity) represents capillary endothelial cell proliferation: E1 if capillary endothelial cell proliferation is present, otherwise E0; S (segmental glomerulosclerosis) represents segmental sclerosis of the glomerulus: S1 if segmental glomerular sclerosis or adhesion is present, otherwise S0; T (tubular atrophy/interstitial fibrosis) represents renal tubular atrophy or renal interstitial fibrosis: T0 indicates a proportion of tubular atrophy or interstitial fibrosis of less than 25%, T1 a proportion of more than 25% and less than 50%, and T2 a proportion of more than 50%; C (cellular/fibrocellular crescents) represents cellular or fibrocellular crescents: C0 indicates the absence of cellular or fibrocellular crescents, C1 indicates cellular or fibrocellular crescents in less than 25% of glomeruli, and C2 indicates cellular or fibrocellular crescents in more than 25% of glomeruli.
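The five pathological indicators can be turned into numeric features in many ways; the following is a minimal sketch, not the encoding used by the invention. The function name `mest_c_scores`, its parameters (fractions in the range 0 to 1 and two booleans) and the handling of values exactly at the 25% and 50% boundaries are assumptions made for illustration, since the text does not specify them.

```python
from typing import List


def mest_c_scores(
    mesangial_fraction: float,   # fraction of glomeruli with mesangial cell proliferation
    endocapillary: bool,         # capillary endothelial cell proliferation present?
    segmental_sclerosis: bool,   # segmental sclerosis or adhesion present?
    tubular_fraction: float,     # proportion of tubular atrophy / interstitial fibrosis
    crescent_fraction: float,    # fraction of glomeruli with cellular/fibrocellular crescents
) -> List[int]:
    """Return [M, E, S, T, C] scores following the thresholds quoted in the text."""
    m = 1 if mesangial_fraction > 0.50 else 0
    e = 1 if endocapillary else 0
    s = 1 if segmental_sclerosis else 0

    # Assumption: a value exactly at a boundary falls into the lower grade.
    if tubular_fraction > 0.50:
        t = 2
    elif tubular_fraction > 0.25:
        t = 1
    else:
        t = 0

    # Assumption: exactly 25% of glomeruli with crescents is graded C2.
    if crescent_fraction == 0:
        c = 0
    elif crescent_fraction < 0.25:
        c = 1
    else:
        c = 2
    return [m, e, s, t, c]
```

The resulting five integers could then be appended to the clinical examination values when the clinical data F are assembled.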
In addition, the deterioration label indicates whether the IgA nephropathy has deteriorated; the criterion is whether end-stage renal disease has been reached or whether the eGFR has decreased by more than 50%, wherein eGFR is the estimated glomerular filtration rate, and end-stage renal disease means an eGFR of less than 15 ml/min/1.73 m² or renal replacement therapy initiated for more than 3 months.
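The labelling criterion above can be written as a small rule. The sketch below is illustrative only: the function name `deterioration_label`, the use of a baseline and a follow-up eGFR to measure the decline, and the `rrt_months` parameter are assumptions, because the text does not say how the 50% reduction is measured in practice.

```python
ESRD_EGFR_THRESHOLD = 15.0    # ml/min/1.73 m^2, threshold quoted in the text
EGFR_DECLINE_FRACTION = 0.5   # "decreased by more than 50%"
RRT_MIN_MONTHS = 3.0          # renal replacement therapy for more than 3 months


def deterioration_label(
    baseline_egfr: float,
    followup_egfr: float,
    rrt_months: float = 0.0,
) -> int:
    """Return 1 if the patient counts as deteriorated, otherwise 0.

    Assumes baseline_egfr > 0 and that the decline is measured relative
    to the baseline value (an assumption, not stated in the text).
    """
    reached_esrd = (
        followup_egfr < ESRD_EGFR_THRESHOLD or rrt_months > RRT_MIN_MONTHS
    )
    decline = (baseline_egfr - followup_egfr) / baseline_egfr
    return int(reached_esrd or decline > EGFR_DECLINE_FRACTION)
```

For example, a fall from 90 to 40 ml/min/1.73 m² is a decline of about 56% and yields label 1, while a fall from 90 to 80 yields label 0.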
Note that the clinical examination data and pathological examination data of the IgA nephropathy patient and the corresponding deterioration label are provided only as an example in the present embodiment. The data collection module of the invention does not itself acquire data directly from IgA nephropathy patients; it only gathers the various data that have already been obtained. In a specific embodiment, the data collection module is a port for data input.
The data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and the pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data that can be used for model training and prediction, and concatenating the two to form clinical data F, which serve as the input data for subsequent model training and testing. In this embodiment, the clinical data are expressed as $F = [f_1, f_2, \ldots, f_n]$, wherein $n$ is the total number of indices and $f_i$ ($1 \le i \le n$) denotes the i-th index; the deterioration label is processed into a binary label Y taking the values 1 and 0 and used as the label for model training and testing, wherein 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates no deterioration.
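The preprocessing step (drop incomplete samples, then concatenate the two data sources into F) might look like the following sketch. The record layout (dictionaries keyed by index name, with `None` marking a missing value), the function names and the example index names are all hypothetical; the text does not describe the storage format.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def build_clinical_data(
    clinical: Record, pathological: Record, feature_order: List[str]
) -> Optional[List[float]]:
    """Concatenate both sources into F, or return None if any index is missing."""
    merged = {**clinical, **pathological}
    values = [merged.get(name) for name in feature_order]
    if any(v is None for v in values):
        return None  # sample with missing data is removed
    return [float(v) for v in values]


def preprocess(
    samples: List[Tuple[Record, Record, int]], feature_order: List[str]
) -> Tuple[List[List[float]], List[int]]:
    """Return the feature vectors F and binary labels Y of all complete samples."""
    features, labels = [], []
    for clinical, pathological, label in samples:
        f = build_clinical_data(clinical, pathological, feature_order)
        if f is not None:
            features.append(f)
            labels.append(label)
    return features, labels
```

A short usage example with made-up values:

```python
order = ["creatinine", "egfr", "M", "T"]
raw = [
    ({"creatinine": 80.0, "egfr": 95.0}, {"M": 0, "T": 1}, 0),
    ({"creatinine": 210.0, "egfr": None}, {"M": 1, "T": 2}, 1),  # dropped
]
F, Y = preprocess(raw, order)
```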
The data normalization module is connected to the data preprocessing module and is used for normalizing the obtained clinical data F of the IgA nephropathy patients to obtain a data set that can be used for model training and testing. In this embodiment, each data sample in the data set includes the clinical data of a patient and the corresponding deterioration label. The final data set of the IgA nephropathy prediction model for unbalanced data consists of a training set and a test set, where the training set consists of 70% of the data of all patients and the test set consists of the remaining 30%.
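A 70%/30% partition can be sketched as follows. The text does not say how patients are assigned to the two sets; random shuffling with a fixed seed, and the function name `split_dataset`, are assumptions made here for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into a training set and a test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

With strongly unbalanced labels, a plain random split can leave very few deteriorated samples in the test set; splitting each class separately is one possible alternative, though the text does not mention it.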
The data normalization refers to mapping the clinical data F to between 0 and 1 by the following formula, to avoid the model training difficulty caused by excessively different data ranges:

$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$

wherein $f_i$ denotes the i-th clinical data index in the clinical data F of the corresponding patient; $f_{\min}$ denotes the minimum of the i-th clinical data index over all patients and $f_{\max}$ denotes its maximum over all patients; $x_i$ denotes the normalized value of the i-th clinical data index, and the normalized clinical data are represented as $X = [x_1, x_2, \ldots, x_n]$.
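The min-max mapping described here, x_i = (f_i - f_min) / (f_max - f_min) computed per index over all patients, can be sketched in a few lines. The function names are illustrative, and mapping a constant index (where f_max equals f_min) to 0.0 is an assumption added to avoid division by zero; the text does not cover that case.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-index minimum and maximum over all patient rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(
    row: Sequence[float], mins: Sequence[float], maxs: Sequence[float]
) -> List[float]:
    """Map one patient's clinical data F to X with every value in [0, 1]."""
    out = []
    for f, lo, hi in zip(row, mins, maxs):
        # Assumption: a constant index carries no information and maps to 0.0.
        out.append(0.0 if hi == lo else (f - lo) / (hi - lo))
    return out
```

Usage on three patients with two indices each:

```python
rows = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]
mins, maxs = fit_min_max(rows)
X = [normalize(r, mins, maxs) for r in rows]
```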
The model training module is connected with the data normalization module and used for training an IgA nephropathy prediction model facing unbalanced data, and the unbalanced data are sample distribution with unbalanced sample labels.
In this embodiment, an imbalance data-oriented IgA nephropathy prediction model is trained in this module by an imbalance data-oriented learning method for subsequent IgA patient deterioration probability prediction.
In the data processed by the present invention, most clinical samples exhibit an unbalanced distribution: only a small fraction of the samples deteriorate to end-stage renal disease (uremia) and are called tail samples, while most patient samples do not deteriorate and are called head samples. With such a distribution, the numerous non-deteriorated samples are drawn far more often during training, so the trained neural network overfits to them and its predictions of IgA nephropathy deterioration are biased toward the non-deteriorated head samples; meanwhile, the limited number of deteriorated tail samples leaves the prediction model insufficiently trained and under-fitted, so predictions for patients who have actually deteriorated are not accurate enough.
The learning method oriented to unbalanced data uses resampling to adjust the model's bias with respect to the tail samples. Resampling means re-drawing training samples according to the sample distribution, typically under-sampling the head class and over-sampling the tail class. Increasing the number of tail-class samples seen during training avoids under-fitting to the tail class.
In the invention, the IgA nephropathy prediction model for unbalanced data is trained with a progressive sampling method, which combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is a sampling method not designed for unbalanced distributions: one sample is selected at random from all samples according to a uniform distribution and used as a training sample, so that each class is drawn in proportion to its size, expressed as:

$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$

wherein $p_i$ denotes the probability that a sample of the i-th class is sampled, $C$ denotes the total number of classes, $n_i$ denotes the total number of samples contained in the i-th class, and $n_j$ denotes the total number of samples contained in the j-th class;
in class-balanced sampling, a class is first selected from the class set according to a uniform distribution, and a sample instance is then selected from that class according to a uniform distribution for subsequent model training, expressed as:

$$p_i = \frac{1}{C}$$

wherein $p_i$ denotes the probability that a sample of the i-th class is sampled and $C$ denotes the total number of classes.
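The two sampling schemes differ only in the class probabilities they induce: proportional to class size for sample-based uniform sampling, and 1/C for class-balanced sampling. A minimal sketch follows, with illustrative function names and a made-up 90/10 class split.

```python
from typing import List, Sequence


def instance_balanced_probs(class_counts: Sequence[int]) -> List[float]:
    """Sample-based uniform sampling: p_i = n_i / sum_j n_j."""
    total = sum(class_counts)
    return [n / total for n in class_counts]


def class_balanced_probs(class_counts: Sequence[int]) -> List[float]:
    """Class-balanced sampling: p_i = 1 / C for each of the C classes."""
    num_classes = len(class_counts)
    return [1.0 / num_classes] * num_classes
```

For a hypothetical data set with 90 non-deteriorated and 10 deteriorated samples, the first scheme draws a deteriorated sample with probability 0.1 and the second with probability 0.5.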
The progressive sampling method combines the two methods: it is a gradually balancing sampling scheme that interpolates step by step between sample-based sampling and class-balanced sampling as model learning progresses. In the early training stage, sample-based sampling is favored in order to obtain a better feature representation; in the later training stage, class balancing is introduced to counter the imbalance, preventing the under-fitting to the tail class and over-fitting to the head class that a bias toward the head class would cause.
The progressive sampling function is expressed as:

$$p_i(t) = \left(1 - \frac{t}{T}\right) p_i^{IB} + \frac{t}{T}\, p_i^{CB}$$

wherein $p_i(t)$ denotes the probability that a sample of the i-th class is sampled, $t$ denotes the t-th training round, and $T$ denotes the total number of training rounds; $p_i^{IB}$ denotes the sampling probability of the sample-based sampling method, expressed as:

$$p_i^{IB} = \frac{n_i}{\sum_{j=1}^{C} n_j}$$

and $p_i^{CB}$ denotes the sampling probability of the class-balanced sampling method, expressed as:

$$p_i^{CB} = \frac{1}{C}$$
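Progressive sampling can be sketched as a linear interpolation, over training rounds, between the size-proportional class probabilities and the uniform class probabilities. This is an illustrative sketch under stated assumptions: the linear weight t/T, the convention that t runs from 0 to T, and the names `progressive_probs` and `sample_batch` are not confirmed by the text.

```python
import random
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def progressive_probs(
    class_counts: Sequence[int], t: int, total_rounds: int
) -> List[float]:
    """Class probabilities in round t: (1 - t/T) * n_i / sum(n) + (t/T) / C."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    alpha = t / total_rounds  # 0 -> sample-based sampling, 1 -> class-balanced
    return [(1 - alpha) * n / total + alpha / num_classes for n in class_counts]


def sample_batch(
    samples_by_class: Sequence[Sequence[S]],
    t: int,
    total_rounds: int,
    batch_size: int,
    rng: random.Random,
) -> List[S]:
    """Pick a class with the round-t probabilities, then a sample uniformly within it."""
    counts = [len(s) for s in samples_by_class]
    probs = progressive_probs(counts, t, total_rounds)
    classes = rng.choices(range(len(counts)), weights=probs, k=batch_size)
    return [rng.choice(samples_by_class[c]) for c in classes]
```

With 90 head samples and 10 tail samples and T = 10, the probability of drawing a tail sample rises from 0.1 in round 0 to 0.3 in round 5 and 0.5 in round 10.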
In this embodiment, for a given input training set, samples are drawn according to the progressive sampling function, and the sampled clinical data are used for subsequent classification by the IgA nephropathy classifier; after the feature representation is obtained, it is input to the IgA nephropathy classifier to predict the IgA nephropathy deterioration probability.
The IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the judgment result, wherein 1 represents deterioration and 0 represents no deterioration, consistent with the binary label Y;
the model is trained using a cross-entropy function as the loss function, and the cross-entropy function $\mathcal{L}_{CE}$ is expressed as:

$$\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,Y_i \log \hat{Y}_i + (1 - Y_i)\log\left(1 - \hat{Y}_i\right)\right]$$

wherein $N$ denotes the number of training samples, $Y_i$ denotes the true deterioration label of the i-th IgA nephropathy patient sample, and $\hat{Y}_i$ denotes the probability of kidney disease deterioration predicted by the model for the i-th IgA nephropathy patient sample.
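For a binary label Y_i and a predicted deterioration probability, the cross-entropy loss can be computed as in the sketch below. Averaging over the samples and clipping the probabilities with a small `eps` to avoid log(0) are assumptions made here; the text only states that a cross-entropy function is used.

```python
import math
from typing import Sequence


def binary_cross_entropy(
    y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean of -[Y log(p) + (1 - Y) log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A prediction of 0.5 for every sample gives a loss of log 2 (about 0.693); confident correct predictions push the loss toward 0, and confident wrong predictions make it large.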
The precision of the model refers to the accuracy of the model, that is, the proportion of the number of correctly classified samples in the test set to the total number of samples in the test set.
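The accuracy defined above, the share of correctly classified test samples, can be computed as follows. Turning the predicted probability into a class with a 0.5 threshold is an assumption; the text does not state a decision threshold.

```python
from typing import Sequence


def accuracy(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of test samples whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(pred == y) for pred, y in zip(predictions, y_true))
    return correct / len(y_true)
```

For labels [1, 0, 0, 0] and predicted probabilities [0.9, 0.2, 0.6, 0.1], three of the four samples are classified correctly, giving an accuracy of 0.75.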
And the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
In this embodiment, when the trained IgA nephropathy prediction model for unbalanced data is used for prediction, for a test set sample, the clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested, obtained through the data preprocessing module, are input directly to the IgA nephropathy classifier, and the trained prediction model outputs the IgA nephropathy deterioration probability of the patient through the classifier.
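At prediction time the flow is: normalize the patient's indices with the stored minima and maxima, then pass X to the trained classifier. The sketch below only illustrates that flow. The network architecture is not given in the text, so `toy_classifier` is a hypothetical logistic stand-in with made-up weights, and `predict_deterioration` is an illustrative name.

```python
import math
from typing import Callable, List, Sequence


def predict_deterioration(
    raw: Sequence[float],
    mins: Sequence[float],
    maxs: Sequence[float],
    classifier: Callable[[List[float]], float],
) -> float:
    """Normalize one patient's indices and return the predicted deterioration probability."""
    x = [
        0.0 if hi == lo else (f - lo) / (hi - lo)
        for f, lo, hi in zip(raw, mins, maxs)
    ]
    return classifier(x)


def toy_classifier(x: List[float]) -> float:
    """Hypothetical stand-in for the trained network: a single logistic unit."""
    weights = [2.0, -1.0]  # made-up values, for illustration only
    bias = -0.5
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Any callable that maps the normalized vector X to a probability could replace `toy_classifier`, for example a trained neural network's forward pass.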
In the above configuration, a report generation module is connected to the model prediction module and outputs a kidney disease deterioration analysis report for a given IgA nephropathy patient to be tested. The report is uploaded to the platform of the IgA nephropathy prediction analysis system for unbalanced data, and the patient can view it on a mobile phone, tablet or other terminal.
The invention provides an IgA nephropathy prediction analysis system for unbalanced data, which takes patient clinical data as input and outputs the probability that the patient may deteriorate. The system takes the unbalanced data problem in IgA nephropathy prediction fully into account and provides a robust prediction system that makes the prediction more accurate. By performing the comparison and analysis automatically with an artificial intelligence algorithm, it improves the efficiency of predicting the deterioration probability of IgA nephropathy patients, helps doctors grasp the course of the disease when intervening in a patient's treatment, and benefits subsequent treatment and prognosis.
As shown in fig. 2, the IgA nephropathy prediction analysis for unbalanced data according to the present invention specifically includes the following steps:
S1, data collection: collecting, through the data collection module, clinical examination data and pathological examination data of IgA nephropathy patients, together with the corresponding deterioration labels;
In this example, the clinical examination data, the pathological examination data, and the deterioration labels of the IgA nephropathy patients are the same as in the above embodiment, so a detailed description is omitted here.
S2, patient data preprocessing: preprocessing the clinical examination data and the pathological examination data through the data preprocessing module, removing samples with missing data to obtain clinical examination data and pathological examination data usable for model training and prediction, and concatenating the two into the clinical data F; in addition, the patient deterioration labels are preprocessed into binary labels of 1 and 0;
In this example, the clinical data is expressed as $F = [f_1, f_2, \ldots, f_n]$, where n is the total number of indices and $f_i$ denotes the ith index, with $1 \le i \le n$; the deterioration label is processed into a binary label Y taking the values 1 and 0 and is used as the label for model training and testing, where 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates that it has not.
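The preprocessing in step S2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess`, the use of `None` to mark a missing value, and the list-of-lists data layout are hypothetical and not taken from the patent.

```python
def preprocess(clinical, pathological, labels):
    """Drop incomplete samples, concatenate both data sources, binarize labels.

    Assumption: a missing measurement is represented by None.
    """
    features, targets = [], []
    for clin, path, label in zip(clinical, pathological, labels):
        row = list(clin) + list(path)        # concatenate the two sources into F
        if any(v is None for v in row):      # remove samples with missing data
            continue
        features.append([float(v) for v in row])
        targets.append(1 if label else 0)    # 1 = deteriorated, 0 = not deteriorated
    return features, targets


clinical = [[1.0, 80.0], [None, 60.0], [2.0, 45.0]]
pathological = [[0], [1], [1]]
labels = [False, True, True]
F, Y = preprocess(clinical, pathological, labels)
```

In this toy input the second patient has a missing clinical value and is therefore dropped before training.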
S3, clinical data normalization: normalizing the clinical data of the patients;
Data normalization refers to mapping the clinical data F to the range 0 to 1 by the following formula, so that widely differing data ranges do not increase the difficulty of model training:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ denotes the value of the ith clinical data index in the clinical data F for the corresponding patient; $f_{\min}$ denotes the minimum of the ith clinical data index over all patients, and $f_{\max}$ denotes the maximum of the ith clinical data index over all patients; $x_i$ denotes the normalized value of the ith clinical data index, and the normalized clinical data is expressed as $X = [x_1, x_2, \ldots, x_n]$.
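The mapping to the range 0 to 1 described in step S3 is a per-index min-max normalization, which can be sketched as follows. The function name and the handling of a constant column (mapped to 0.0 to avoid division by zero) are assumptions not stated in the source.

```python
def min_max_normalize(samples):
    """Scale every clinical index (column) to [0, 1] using its min and max over all patients."""
    columns = list(zip(*samples))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        normalized.append([
            # Assumption: a column with max == min carries no information; map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


X = min_max_normalize([[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]])
```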
S4, dividing the training set and the test set: dividing the data set consisting of all patient samples into a training set and a test set;
In step S4, 70% of the data set of all patient samples is assigned to the training set for model training, and the remaining 30% is assigned to the test set for model testing;
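A 70/30 split as in step S4 might look like the sketch below. The source does not say whether the samples are shuffled or stratified before splitting; the seeded shuffle here is an assumption, as are the function and parameter names.

```python
import random


def split_dataset(samples, labels, train_fraction=0.7, seed=0):
    """Split samples and labels into a training part and a test part."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)   # assumption: random split with a fixed seed
    cut = int(round(train_fraction * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


samples = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
(train_x, train_y), (test_x, test_y) = split_dataset(samples, labels)
```

With heavily unbalanced labels, a stratified split would keep the class ratio equal in both parts; that variant is not described in the source.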
S5, training the IgA nephropathy prediction model for unbalanced data: for the given input training-set data samples, samples are drawn according to a progressive sampling function, and the sampled clinical data is passed to the IgA nephropathy classifier for classification;
In step S5, the progressive sampling method combines sample-based uniform sampling and class-balanced sampling;
Sample-based uniform sampling is a uniform sampling method not designed for unbalanced distributions: a sample is selected at random according to a uniform distribution and used as a training sample for model training, which is expressed as:
$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i$ denotes the probability of sampling a sample from the ith class, C denotes the total number of classes, and $n_i$ denotes the number of samples contained in the ith class;
Class-balanced sampling first selects a class from the class set according to a uniform distribution and then selects a sample instance from that class according to a uniform distribution for subsequent model training, which is expressed as:
$$p_i = \frac{1}{C}$$
where $p_i$ denotes the probability of sampling a sample from the ith class, and C denotes the total number of classes;
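To make the difference between the two base schemes concrete, the sketch below computes the per-class sampling probability under each of them. It assumes that p_i is read per class (consistent with the definitions of C and n_i); the function names are hypothetical.

```python
from collections import Counter


def instance_balanced_probs(labels):
    """Sample-based uniform sampling: a class is hit in proportion to its size."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def class_balanced_probs(labels):
    """Class-balanced sampling: every class is equally likely, whatever its size."""
    classes = set(labels)
    return {cls: 1.0 / len(classes) for cls in classes}


labels = [0] * 9 + [1]          # 9 non-deteriorated samples, 1 deteriorated sample
uniform = instance_balanced_probs(labels)
balanced = class_balanced_probs(labels)
```

On this toy label list, uniform sampling draws the minority class about one time in ten, while class-balanced sampling draws it half of the time.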
The progressive sampling function is expressed as:
Figure BDA0003902649350000121
where $p_i$ denotes the sampling probability, t denotes the current (tth) training round, and T denotes the total number of training rounds;
the symbol shown in Figure BDA0003902649350000122 denotes the sampling probability of the sample-based sampling method, expressed as follows:
Figure BDA0003902649350000123
and the symbol shown in Figure BDA0003902649350000124 denotes the sampling probability of the class-balanced sampling method, expressed as follows:
Figure BDA0003902649350000125
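One way to combine the two sampling schemes over the training rounds is a linear interpolation that starts at sample-based uniform sampling (t = 0) and ends at class-balanced sampling (t = T). The sketch below assumes this linear schedule; the exact formula is given in the patent figures and is not recoverable from the text here, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def progressive_probs(labels, epoch, total_epochs):
    """Per-class sampling probability at a given epoch (assumed linear schedule)."""
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    alpha = epoch / total_epochs                       # 0 -> uniform, 1 -> class-balanced
    return {cls: (1 - alpha) * n / total + alpha / num_classes
            for cls, n in counts.items()}


def sample_indices(labels, epoch, total_epochs, k, seed=0):
    """Draw k training indices: pick a class by probability, then a sample within it."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    probs = progressive_probs(labels, epoch, total_epochs)
    classes = list(probs)
    weights = [probs[cls] for cls in classes]
    picked = rng.choices(classes, weights=weights, k=k)
    return [rng.choice(by_class[cls]) for cls in picked]


labels = [0] * 9 + [1]
start = progressive_probs(labels, 0, 10)
middle = progressive_probs(labels, 5, 10)
end = progressive_probs(labels, 10, 10)
batch = sample_indices(labels, 10, 10, k=20)
```

Under this schedule the minority class is sampled more and more often as training proceeds, so early rounds see the natural data distribution and late rounds see a balanced one.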
The progressively sampled clinical data is passed to the IgA nephropathy classifier for classification, and the IgA nephropathy deterioration probability is predicted;
The IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the classifier's judgment, where 1 indicates deterioration and 0 indicates no deterioration, consistent with the binary label Y;
Model training uses the cross-entropy function as the loss function. The cross-entropy function, denoted by the symbol shown in Figure BDA0003902649350000126, is expressed as follows:
Figure BDA0003902649350000127
where $Y_i$ denotes the true deterioration label of the ith IgA nephropathy patient sample, and the symbol shown in Figure BDA0003902649350000128 denotes the model-predicted probability of nephropathy deterioration for the ith IgA nephropathy patient sample.
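For a binary label and a predicted deterioration probability, the usual form of this loss is the binary cross-entropy. The sketch below averages it over the samples and clips probabilities to avoid log(0); both choices, and the function name, are assumptions, since the patent's exact formula is only given as a figure.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between true labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)          # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
```

A prediction of 0.5 for every patient gives a loss of ln 2, and the loss shrinks as the predicted probabilities move toward the true labels.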
S6, predicting the patient's IgA nephropathy deterioration probability: the IgA nephropathy deterioration probability of a clinical sample is predicted using the IgA nephropathy prediction model for unbalanced data;
When the trained IgA nephropathy prediction model for unbalanced data is used for prediction, the clinical data of the IgA nephropathy patient sample to be tested, obtained from the data preprocessing module, is taken as the input for a test-set sample and fed directly to the IgA nephropathy classifier, through which the trained model outputs the patient's IgA nephropathy deterioration probability.
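The prediction step can be illustrated with a forward pass through a small binary classifier. The source only states that the classifier is a binary classification neural network; the single hidden layer with ReLU, the sigmoid output, the parameter names, and the weight values below are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_deterioration_probability(x, hidden_weights, hidden_biases,
                                      output_weights, output_bias):
    """Forward pass: normalized clinical vector x -> probability of deterioration."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)   # ReLU hidden units
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    logit = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(logit)


# Hypothetical trained parameters for a 2-feature input and one hidden unit.
p = predict_deterioration_probability(
    x=[1.0, 0.0],
    hidden_weights=[[1.0, 0.0]],
    hidden_biases=[0.0],
    output_weights=[2.0],
    output_bias=0.0,
)
```

The returned value lies strictly between 0 and 1 and can be reported directly as the deterioration probability or thresholded to obtain a 0/1 decision.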
S7, generating the IgA nephropathy diagnosis and treatment report: outputting an IgA nephropathy deterioration assessment and an IgA nephropathy analysis report for the patient to be predicted.
The invention addresses the unbalanced data distribution of IgA nephropathy clinical samples by adopting a resampling algorithm based on class frequency together with a decoupled two-stage training mode, which improves IgA nephropathy prediction performance and data efficiency. It thereby provides an IgA nephropathy prediction analysis system for unbalanced data whose prediction results are more accurate, more robust, and generalize better.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. The IgA nephropathy prediction analysis system for unbalanced data is characterized by comprising a data collection module, a data preprocessing module, a data normalization module, a model training module and a model prediction module;
the data collection module is used for collecting clinical examination data and pathological examination data of IgA nephropathy patients and corresponding deterioration labels of the IgA nephropathy patients;
the data preprocessing module is connected to the data collection module and is used for preprocessing the clinical examination data and the pathological examination data, removing samples with missing data to obtain clinical examination data and pathological examination data usable for model training and prediction, and concatenating the two into clinical data F;
the data normalization module is connected to the data preprocessing module and is used for carrying out data normalization operation on the obtained clinical data F of the IgA nephropathy patient to obtain a data set which can be used for model training and testing;
the model training module is connected to the data normalization module and is used for training an IgA nephropathy prediction model for unbalanced data, the unbalanced data being a sample distribution in which the sample labels are unbalanced;
and the model prediction module is connected with the model training module and used for predicting the IgA nephropathy deterioration probability of the clinical sample by using the IgA nephropathy prediction model facing the unbalanced data.
2. The IgA nephropathy prediction analysis system for unbalanced data according to claim 1, wherein the clinical examination data is laboratory report data obtained with medical instruments by blood tests on blood samples and urine tests on urine samples collected from the IgA nephropathy patient, and includes serum creatinine, glomerular filtration rate, blood pressure, and uric acid.
3. The system for predictive analysis of IgA nephropathy based on unbalanced data as claimed in claim 1, wherein the pathological examination data is data relating to the renal disease of the IgA nephropathy patient obtained by biopsy of a kidney of the patient.
4. The IgA nephropathy prediction analysis system for unbalanced data according to claim 1, wherein the deterioration label indicates whether the IgA nephropathy has deteriorated, the criterion for deterioration being that end-stage renal disease is reached or that eGFR declines by more than 50%, wherein eGFR is the estimated glomerular filtration rate, and end-stage renal disease is defined as eGFR < 15 ml/min/1.73 m² or renal replacement therapy that has lasted for more than 3 months.
5. The IgA nephropathy prediction analysis system for unbalanced data according to claim 1, wherein the clinical data is expressed as $F = [f_1, f_2, \ldots, f_n]$, where n is the total number of indices and $f_i$ denotes the ith index, with $1 \le i \le n$;
the deterioration label is processed into a binary label Y taking the values 1 and 0 and is used as the label for model training and testing, where 1 indicates that the patient's IgA nephropathy has deteriorated and 0 indicates that it has not.
6. The system of claim 5, wherein each data sample in the data set comprises clinical data F of the patient and a deterioration label associated with the patient;
the data set consists of a training set consisting of 70% of the data set of all patients and a test set consisting of 30% of the data set of all patients.
7. The IgA nephropathy prediction analysis system for unbalanced data according to claim 5, wherein the clinical data F is mapped to the range 0 to 1 by the following formula, so that widely differing data ranges do not increase the difficulty of model training:
$$x_i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $f_i$ denotes the value of the ith clinical data index in the clinical data F for the corresponding patient; $f_{\min}$ denotes the minimum of the ith clinical data index over all patients, and $f_{\max}$ denotes the maximum of the ith clinical data index over all patients; $x_i$ denotes the normalized value of the ith clinical data index, and the normalized clinical data is expressed as $X = [x_1, x_2, \ldots, x_n]$.
8. The IgA nephropathy prediction analysis system for unbalanced data according to claim 7, wherein the model training module trains the IgA nephropathy prediction model for unbalanced data by using a learning method for unbalanced data; the learning method for unbalanced data adopts a resampling method that adjusts the model's bias with respect to tail-class samples, resampling meaning that samples are redrawn according to the sample distribution.
9. The IgA nephropathy prediction analysis system for unbalanced data according to claim 8, wherein the IgA nephropathy prediction model for unbalanced data is trained by a progressive sampling method that combines sample-based uniform sampling and class-balanced sampling;
sample-based uniform sampling is a uniform sampling method not designed for unbalanced distributions: a sample is selected at random according to a uniform distribution and used as a training sample for model training, which is expressed as:
$$p_i = \frac{n_i}{\sum_{j=1}^{C} n_j}$$
where $p_i$ denotes the probability of sampling a sample from the ith class, C denotes the total number of classes, and $n_i$ denotes the number of samples contained in the ith class;
class-balanced sampling first selects a class from the class set according to a uniform distribution and then selects a sample instance from that class according to a uniform distribution for subsequent model training, which is expressed as:
$$p_i = \frac{1}{C}$$
where $p_i$ denotes the probability of sampling a sample from the ith class, and C denotes the total number of classes;
the progressive sampling function is expressed as:
Figure FDA0003902649340000032
where $p_i$ denotes the sampling probability, t denotes the current (tth) training round, and T denotes the total number of training rounds;
the symbol shown in Figure FDA0003902649340000033 denotes the sampling probability of the sample-based sampling method, expressed as follows:
Figure FDA0003902649340000034
and the symbol shown in Figure FDA0003902649340000035 denotes the sampling probability of the class-balanced sampling method, expressed as follows:
Figure FDA0003902649340000036
10. The IgA nephropathy prediction analysis system for unbalanced data according to claim 9, wherein the progressively sampled clinical data is passed to an IgA nephropathy classifier for classification, and the IgA nephropathy deterioration probability is predicted;
the IgA nephropathy classifier is a binary classification neural network that judges whether an input patient sample has deteriorated and outputs the classifier's judgment, where 1 indicates deterioration and 0 indicates no deterioration, consistent with the binary label Y;
model training uses the cross-entropy function as the loss function, and the cross-entropy function, denoted by the symbol shown in Figure FDA0003902649340000041, is expressed as follows:
Figure FDA0003902649340000042
where $Y_i$ denotes the true deterioration label of the ith IgA nephropathy patient sample, and the symbol shown in Figure FDA0003902649340000043 denotes the model-predicted probability of nephropathy deterioration for the ith IgA nephropathy patient sample.
11. The IgA nephropathy prediction analysis system for unbalanced data according to claim 10, wherein, when the trained IgA nephropathy prediction model for unbalanced data is used for prediction, the input for a test-set sample is the clinical data $X = [x_1, x_2, \ldots, x_n]$ of the IgA nephropathy patient sample to be tested, obtained from the data preprocessing module; the clinical data X is input directly to the IgA nephropathy classifier, through which the trained IgA nephropathy prediction model for unbalanced data outputs the patient's IgA nephropathy deterioration probability.
12. The system for predictive analysis of IgA nephropathy oriented towards unbalanced data of claim 1, further comprising a report generation module connected to the model prediction module for outputting an analysis report of the deterioration of nephropathy for a given IgA nephropathy patient to be tested.
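The deterioration criterion recited in claim 4 (end-stage renal disease reached, or an eGFR decline of more than 50%) can be sketched as a labeling rule. This is an illustrative reading only: the function and parameter names are hypothetical, and measuring the decline relative to a baseline eGFR is an assumption.

```python
def deterioration_label(baseline_egfr, current_egfr, rrt_months=0.0):
    """Return 1 if the patient counts as deteriorated, else 0.

    eGFR values are in ml/min/1.73 m^2; rrt_months is the duration of
    renal replacement therapy in months (0 if none).
    """
    decline = (baseline_egfr - current_egfr) / baseline_egfr   # relative eGFR decline
    end_stage = current_egfr < 15.0 or rrt_months > 3.0        # end-stage renal disease
    return 1 if (end_stage or decline > 0.5) else 0


labels = [
    deterioration_label(90.0, 40.0),                   # decline of about 56%
    deterioration_label(90.0, 60.0),                   # decline of about 33%
    deterioration_label(20.0, 14.0),                   # eGFR below 15
    deterioration_label(60.0, 50.0, rrt_months=4.0),   # replacement therapy > 3 months
]
```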
CN202211294731.2A 2022-10-21 2022-10-21 IgA nephropathy prediction analysis system for unbalanced data Pending CN115565681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211294731.2A CN115565681A (en) 2022-10-21 2022-10-21 IgA nephropathy prediction analysis system for unbalanced data

Publications (1)

Publication Number Publication Date
CN115565681A true CN115565681A (en) 2023-01-03

Family

ID=84746447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211294731.2A Pending CN115565681A (en) 2022-10-21 2022-10-21 IgA nephropathy prediction analysis system for unbalanced data

Country Status (1)

Country Link
CN (1) CN115565681A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200303075A1 (en) * 2019-03-18 2020-09-24 Kundan Krishna System and a method to predict occurrence of a chronic diseases
WO2021190300A1 (en) * 2020-03-26 2021-09-30 肾泰网健康科技(南京)有限公司 Method for constructing ai chronic kidney disease risk screening model, and chronic kidney disease risk screening method and system
CN113990521A (en) * 2021-10-22 2022-01-28 北京大学人民医院 IgA nephropathy pathological analysis, prognosis prediction and pathological index mining system
CN114283307A (en) * 2021-12-24 2022-04-05 中国科学技术大学 Network training method based on resampling strategy
US20220122739A1 (en) * 2020-03-07 2022-04-21 Huazhong University Of Science And Technology Ai-based condition classification system for patients with novel coronavirus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾彩虹: "IgA肾病牛津分类的理论依据及临床病理相关性分析", 《肾脏病与透析肾移植杂志》, no. 05 *
邓晓蔚; 熊有明; 丁世永; 钟远斌: "IgA肾病进展至终末期肾病的研究新进展", 《健康之路》, no. 02 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination