CN115719647A - Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning - Google Patents
- Publication number
- CN115719647A (application CN202310029096.3A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning, comprising: a hemodialysis data preparation module, which extracts structured data of patient samples from the hospital electronic information system and daily monitoring equipment and processes the structured data into amplified structured data; and a hemodialysis-associated cardiovascular disease risk prediction module, which constructs a risk evaluation model, trains it on the amplified structured data to obtain patient representations and scores, and uses these representations and scores to predict the risk of cardiovascular disease complicating hemodialysis. The system solves the problem of matching positive and negative samples; it iteratively updates the parameters of the contrastive learning model with the true labels of hemodialysis-associated cardiovascular disease, so that the real complication outcome labels improve model performance; and it addresses too few samples or an imbalance between positive and negative samples while reducing the difference between the amplified data and the original data.
Description
Technical Field
The invention relates to the technical field of medical health information, and in particular to a hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning.
Background
Maintenance hemodialysis is one of the main treatments for end-stage renal disease, and ensuring that hemodialysis patients are treated effectively is an urgent need in clinical medicine. Hemodialysis is a long-term treatment that continues throughout the course of the disease. Various cardiovascular complications can arise during long-term hemodialysis and seriously affect patient survival. Risk prediction and early intervention for the cardiovascular complications of maintenance hemodialysis are therefore crucial to improving the quality of life of patients with end-stage renal disease.
Contrastive learning is a self-supervised learning approach that is widely used in fields such as computer vision and natural language processing, and in recent years it has even outperformed supervised learning on several mainstream tasks. Applying contrastive learning, which was designed for self-supervised tasks, to the supervised task of predicting cardiovascular disease in hemodialysis patients is nevertheless difficult. First, cardiovascular complication prediction is a supervised task: unlike an unsupervised task, it provides additional label information, so a key problem is how to use the real complication outcome labels effectively to improve model performance. Second, contrastive learning depends on matching suitable positive and negative samples; improper matching seriously degrades model performance, so another key problem is how to match the most suitable and valuable positive and negative samples.
To address these problems, this patent constructs, for the scenario of predicting cardiovascular disease complicating hemodialysis, a prediction system integrating active learning and contrastive learning that provides accurate and effective support for clinical decision-making.
Disclosure of Invention
To solve the above technical problems, the invention provides a hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning.
The technical scheme adopted by the invention is as follows:
A hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning, comprising:
a hemodialysis data preparation module, configured to extract structured data of patient samples from the hospital electronic information system and daily monitoring equipment and to process the structured data into amplified structured data;
and a hemodialysis-associated cardiovascular disease risk prediction module, configured to construct a risk evaluation model, train it on the amplified structured data to obtain patient representations and scores, and predict the risk of cardiovascular disease complicating hemodialysis from those representations and scores.
Further, the structured data includes demographic data, clinical event data, medication data, and daily monitoring data.
Further, the hemodialysis data preparation module specifically includes:
a data acquisition unit, configured to extract the structured data of patient samples from the hospital electronic information system and wearable devices;
a data cleaning unit, configured to perform missing-value handling, error-value detection, duplicate removal and/or inconsistency removal on the structured data to obtain static data and time-series data;
a data fusion unit, configured to apply a convolution to the time-series data to obtain one-dimensional compressed data and to concatenate it with the static data to obtain the original fusion features;
and a data amplification unit, configured to apply a single-feature randomization method to the original fusion features to obtain the amplified structured data.
Further, the amplification process of the data amplification unit is as follows:
Step S1: patients with cardiovascular complications are taken as original positive samples and patients without cardiovascular complications as original negative samples; all original positive samples form the original positive sample set, and all original negative samples form the original negative sample set;
Step S2: when there are fewer original positive samples than original negative samples, the original positive sample set is amplified until the number of positive samples equals the number of original negative samples; when there are more original positive samples than original negative samples, the original negative sample set is amplified until the number of negative samples equals the number of original positive samples;
Step S3: the original positive sample set and the amplified positive samples form the positive sample amplification set, and the original negative sample set and the amplified negative samples form the negative sample amplification set;
Step S4: the positive sample amplification set and the negative sample amplification set together constitute the amplified structured data.
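A minimal sketch of steps S1 and S2, assuming the true labels are held as a plain list of integers (1 = cardiovascular complication, 0 = none). The function name `balancing_plan` is hypothetical; it only decides which class to amplify and by how many samples, and leaves the amplification itself to the single-feature randomization method described in this section.

```python
from collections import Counter


def balancing_plan(labels):
    """Decide which class to amplify (step S2) and how many samples to add.

    labels: list of true labels, 1 = cardiovascular complication (positive),
    0 = no complication (negative). Returns (label_to_amplify, count); if the
    classes are already balanced, returns (None, 0).
    """
    counts = Counter(labels)
    n_pos, n_neg = counts.get(1, 0), counts.get(0, 0)
    if n_pos < n_neg:
        return 1, n_neg - n_pos  # amplify the original positive sample set
    if n_pos > n_neg:
        return 0, n_pos - n_neg  # amplify the original negative sample set
    return None, 0


labels = [1, 0, 0, 0, 1, 0, 0]
label, count = balancing_plan(labels)
print(label, count)  # 1 3
```

With 2 positives and 5 negatives, 3 amplified positive samples are needed before steps S3 and S4 merge the sets.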
Further, the amplified positive samples in step S2 are obtained as follows:
the original fusion features are combined with the original positive sample set to obtain a combined positive sample set, each element of which (a single combined positive sample set) pairs one original fusion feature with the single positive sample set corresponding to it;
within each single combined positive sample set, its original fusion feature is taken as the intervention feature, the remaining original fusion features are taken as the fixed feature set, and the positive samples in its single positive sample set are taken as amplification objects; each amplification yields one amplified positive sample, and the process ends when the number of amplifications equals the difference between the numbers of original negative and original positive samples, giving the final amplified positive samples;
the amplified negative samples are obtained as follows:
the original fusion features are combined with the original negative sample set to obtain a combined negative sample set, each element of which (a single combined negative sample set) pairs one original fusion feature with the single negative sample set corresponding to it;
within each single combined negative sample set, its original fusion feature is taken as the intervention feature, the remaining original fusion features are taken as the fixed feature set, and the negative samples in its single negative sample set are taken as amplification objects; each amplification yields one amplified negative sample, and the process ends when the number of amplifications equals the difference between the numbers of original positive and original negative samples, giving the final amplified negative samples.
Further, the hemodialysis-associated cardiovascular disease risk prediction module specifically includes:
a risk evaluation unit, configured to construct the risk evaluation model and train it on the amplified structured data to obtain scores and patient phenotypes;
an active learning unit, configured to select positive and negative samples from the amplified structured data through a positive/negative sample selector, using the scores and the patient phenotypes;
a contrastive learning unit, configured to perform contrastive learning with the positive and negative samples and to update the network parameters of the encoder shared with the risk evaluation unit.
Further, the risk evaluation unit is specifically configured for:
constructing the risk evaluation model from an encoder and a risk evaluation network and optimizing it through a loss function;
extracting a patient phenotype with the encoder of the risk evaluation model, from which the risk evaluation network calculates a score for hemodialysis-associated cardiovascular disease;
assigning each patient a true label, which is 1 when the patient has cardiovascular complications and 0 otherwise;
and optimizing the loss function using the scores and the true labels.
Further, the active learning unit is specifically configured for:
normalizing the patient phenotype output by the risk evaluation model, which maps the resulting patient representation into the 0-1 space;
using the positive/negative sample selector to calculate, for each sample representation in the amplified structured data, the angle between its direction and that of every other sample representation in the 0-1 space;
splitting the calculated angles for each sample into a first group and a second group according to whether the other sample's true label equals that of the current sample, and sorting each group from small to large;
and selecting the upper quartile of the sorted first group as the positive sample set and the lower quartile of the sorted second group as the negative sample set.
Further, the contrastive learning unit specifically performs contrastive learning with the positive and negative samples: a positive sample has the same true label as the patient sample, and a negative sample has a different true label. The cosine distances from the patient sample to its positive samples and to its negative samples are calculated, the loss function of the contrastive learning unit is constructed from these positive and negative cosine distances, and the network parameters of the encoder shared with the risk evaluation unit are updated.
Further, the risk evaluation unit, the active learning unit and the contrastive learning unit share the encoder, which is a 5-layer fully connected network with 1024, 512, 256, 128 and 64 nodes per layer and ReLU activation.
The invention has the beneficial effects that:
1. The invention provides a positive/negative sample matching method based on active learning, which selects high-value contrast samples to improve model performance and solves the problem of matching positive and negative samples.
2. The invention provides a training method integrating active learning and contrastive learning, which iteratively updates the parameters of the contrastive learning model with the true labels of hemodialysis-associated cardiovascular disease, solving the problem of how to use real complication outcome labels effectively to improve model performance in a supervised setting.
3. The invention provides a single-feature randomization method for amplifying the original data, which addresses too few collected samples or an imbalance between positive and negative samples while reducing the difference between the amplified data and the original data.
Drawings
FIG. 1 is a block diagram of the hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning according to the invention;
FIG. 2 is a block diagram of the hemodialysis data preparation module of the invention;
FIG. 3 is a block diagram of the hemodialysis-associated cardiovascular disease risk prediction module of the invention.
Detailed Description
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Referring to fig. 1, a hemodialysis-associated cardiovascular disease prediction system integrating active learning and contrastive learning includes:
the hemodialysis data preparation module, which extracts structured data of patient samples from the hospital electronic information system and daily monitoring equipment and processes the structured data into amplified structured data;
the structured data include demographic data, clinical event data, medication data and daily monitoring data:
(1) Demographic data: age, sex, region, etc.; (2) clinical event data: hemodialysis events, diagnostic events, etc.; (3) medication data: drug name, dosage, etc.; (4) daily monitoring data: blood pressure, heart rate, body weight, etc.
Referring to fig. 2, the hemodialysis data preparation module specifically includes:
the data acquisition unit, which extracts the structured data of patient samples from the hospital electronic information system and wearable devices;
the data cleaning unit, which performs missing-value handling, error-value detection, duplicate removal and/or inconsistency removal on the structured data to obtain static data and time-series data;
taking complication diagnosis events as an example: a missing diagnosis time can be filled with the date on which the complication medication was first used; a missing complication name can be inferred from the complication medication the patient received; and if the name cannot be inferred from the medication information, the incomplete diagnosis record is actively screened out.
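The filling rules above can be sketched as follows. This is illustrative only: the record layout (dicts with `name`, `time`, `drug`, `date` keys), the function `clean_diagnosis` and the `DRUG_TO_COMPLICATION` lookup table are all hypothetical, and "screened out" is modelled as returning `None` so the caller can drop the record.

```python
from datetime import date

# Hypothetical drug-to-complication lookup; a real system would draw this
# from a clinical knowledge base rather than a hard-coded dictionary.
DRUG_TO_COMPLICATION = {"drug_a": "heart_failure", "drug_b": "arrhythmia"}


def clean_diagnosis(event, medications):
    """Repair one complication-diagnosis event using the patient's medications.

    event: dict with keys "name" and "time" (either may be None).
    medications: list of dicts with keys "drug" and "date".
    Returns the repaired event, or None when the complication name cannot be
    inferred from the medication information (the record is screened out).
    """
    event = dict(event)  # do not mutate the caller's record
    if event.get("name") is None:
        inferred = {DRUG_TO_COMPLICATION[m["drug"]] for m in medications
                    if m["drug"] in DRUG_TO_COMPLICATION}
        if len(inferred) != 1:  # no match, or ambiguous: cannot judge the name
            return None
        event["name"] = inferred.pop()
    if event.get("time") is None:
        dates = [m["date"] for m in medications
                 if DRUG_TO_COMPLICATION.get(m["drug"]) == event["name"]]
        if dates:
            event["time"] = min(dates)  # first use of a drug for this complication
    return event


meds = [{"drug": "drug_a", "date": date(2021, 3, 5)},
        {"drug": "drug_a", "date": date(2021, 1, 2)}]
print(clean_diagnosis({"name": None, "time": None}, meds))
```

Treating an ambiguous match (drugs for two different complications) as "cannot be judged" is a conservative design choice of this sketch, not something the patent specifies.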
the data fusion unit, which applies a convolution to the time-series data to obtain one-dimensional compressed data and concatenates it with the static data to obtain the original fusion features;
basic patient information such as age and sex is static data, whereas hemodialysis information and daily monitoring information are one-dimensional time-series data. Applying a convolution to the one-dimensional time series compresses it so that it can be concatenated with the static data, which simplifies subsequent data processing and model training.
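A minimal plain-Python sketch of the fusion step. The patent does not give the convolution's kernel size, stride or weights, so the fixed averaging kernel `[0.5, 0.5]`, the stride of 2 and the sample values below are assumptions for illustration (in practice the kernel would typically be learned); `conv1d` and `fuse` are hypothetical names.

```python
def conv1d(series, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation, as deep-learning libraries compute it)."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, series[start:start + k]))
            for start in range(0, len(series) - k + 1, stride)]


def fuse(static, series_list, kernel, stride):
    """Concatenate static features with the convolved (compressed) time series."""
    fused = list(static)
    for series in series_list:
        fused.extend(conv1d(series, kernel, stride))
    return fused


static = [63, 1]                       # e.g. age and encoded sex
blood_pressure = [140, 135, 150, 145]  # four dialysis sessions
heart_rate = [80, 82, 78, 90]
print(fuse(static, [blood_pressure, heart_rate], kernel=[0.5, 0.5], stride=2))
# [63, 1, 137.5, 147.5, 81.0, 84.0]
```

Each four-point series is compressed to two values, so every patient ends up with one flat vector of original fusion features regardless of how many time series are attached.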
the data amplification unit, which applies a single-feature randomization method to the original fusion features to obtain the amplified structured data.
Too few collected samples, or an imbalance between the numbers of positive and negative samples, can degrade model training. To reduce this effect, the invention amplifies the original data with a single-feature randomization method. To keep the amplified data as close as possible to the real data, each amplification step selects only one feature as the intervention feature and keeps the remaining features as the fixed feature set.
The amplification process of the data amplification unit comprises the following steps:
Step S1: patients with cardiovascular complications are taken as original positive samples and patients without cardiovascular complications as original negative samples; all original positive samples form the original positive sample set, and all original negative samples form the original negative sample set;
Step S2: when there are fewer original positive samples than original negative samples, the original positive sample set is amplified until the number of positive samples equals the number of original negative samples; when there are more original positive samples than original negative samples, the original negative sample set is amplified until the number of negative samples equals the number of original positive samples;
the amplified positive samples are obtained as follows:
the original fusion features are combined with the original positive sample set to obtain a combined positive sample set, each element of which (a single combined positive sample set) pairs one original fusion feature with the single positive sample set corresponding to it;
within each single combined positive sample set, its original fusion feature is taken as the intervention feature, the remaining original fusion features are taken as the fixed feature set, and the positive samples in its single positive sample set are taken as amplification objects; each amplification yields one amplified positive sample, and the process ends when the number of amplifications equals the difference between the numbers of original negative and original positive samples, giving the final amplified positive samples;
suppose there are $M$ original positive samples and $N$ original negative samples, with $M < N$. Because the two classes are unbalanced, $Q = N - M$ additional positive samples must be generated. The full amplification of these $Q$ samples proceeds as follows:
the original fusion features are denoted $V = \{v_1, v_2, \dots, v_K\}$, where $v_i$ is the $i$-th single original fusion feature and $K$ is the number of features of an original positive sample. All original positive samples are then randomly and evenly divided into $K$ groups, and the original positive sample set is written $X = \{X_1, X_2, \dots, X_K\}$, where $X_i = \{x_{i1}, x_{i2}, \dots, x_{in_i}\}$ is the single positive sample set of group $i$, $n_i$ is the number of positive samples assigned to group $i$, and $x_{ij}$ is the $j$-th positive sample of group $i$.
Next, the original fusion features $V$ are combined with the original positive sample set $X$ to obtain the combined positive sample set $VX = \{(v_1, X_1), (v_2, X_2), \dots, (v_K, X_K)\}$, in which each single combined positive sample set $(v_i, X_i)$ pairs the $i$-th single original fusion feature $v_i$ with the single positive sample set $X_i$ of group $i$. For each single combined positive sample set $(v_i, X_i)$, the feature $v_i$ serves as the intervention feature, $V \setminus \{v_i\}$ (all features in $V$ except $v_i$) serves as the fixed feature set, and the positive samples in $X_i$ serve as amplification objects. Group $i$ of $VX$ contributes $Q_i$ amplified samples, with $\sum_{i=1}^{K} Q_i = Q$. The amplified data are written $A = \{A_1, A_2, \dots, A_K\}$, where $A_i = \{a_{i1}, a_{i2}, \dots, a_{iQ_i}\}$ is the set of samples amplified from group $i$ of $VX$, $Q_i$ is the number of amplifications for group $i$, and the single amplified positive sample $a_{ij}$ is the $j$-th sample amplified from group $i$.
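The grouping step can be sketched as below: minority-class samples are shuffled and dealt into $K$ groups (one per original fusion feature, so group $i$ is implicitly paired with feature $v_i$ by index), and the total $Q = N - M$ is split into per-group quotas $Q_i$. Splitting $Q$ as evenly as possible is an assumption of this sketch; the patent's per-group count did not survive extraction. `partition_groups` is a hypothetical name.

```python
import random


def partition_groups(samples, n_features, q, seed=0):
    """Split minority-class samples into K = n_features random groups X_1..X_K
    of near-equal size, and split the amplification total q into per-group
    quotas Q_1..Q_K with sum(Q_i) == q (an even split is assumed).
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Group i takes every K-th sample, so group sizes differ by at most one.
    groups = [shuffled[i::n_features] for i in range(n_features)]
    base, extra = divmod(q, n_features)
    quotas = [base + (1 if i < extra else 0) for i in range(n_features)]
    return groups, quotas


groups, quotas = partition_groups(list(range(10)), n_features=3, q=7)
print([len(g) for g in groups], quotas)  # [4, 3, 3] [3, 2, 2]
```

Note that a group needs at least two samples for the pairwise interpolation used in the next step.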
To generate a single amplified positive sample $a_{ij}$, two samples $x_{ip}$ and $x_{iq}$ are first selected at random from the single positive sample set $X_i$ of the single combined positive sample set $(v_i, X_i)$. The features of $x_{ip}$ are written $(x_{ip}^{1}, x_{ip}^{2}, \dots, x_{ip}^{K})$ and those of $x_{iq}$ are written $(x_{iq}^{1}, x_{iq}^{2}, \dots, x_{iq}^{K})$. The features of $a_{ij}$ are then:
$$a_{ij}^{k} = \begin{cases} x_{ip}^{i} + \lambda\,(x_{iq}^{i} - x_{ip}^{i}), & k = i \\ x_{ip}^{k}, & k \neq i \end{cases}$$
where $\lambda$ is a random number in $(0, 1)$, $x_{ip}^{i}$ and $x_{iq}^{i}$ are the values of the $i$-th feature of $x_{ip}$ and $x_{iq}$, and $a_{ij}^{i}$ is the $i$-th feature of the amplified sample. Only the intervention feature is randomized: $a_{ij}^{i}$ is a random value lying between the $i$-th feature values of $x_{ip}$ and $x_{iq}$, while every fixed feature is copied from an existing sample, which reduces the difference between the amplified data and the original data.
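A plain-Python sketch of the single-feature randomization rule: two distinct samples $x_p$ and $x_q$ are drawn from a group, only the intervention feature $i$ is interpolated with a random $\lambda \in (0, 1)$, and all other (fixed) features are copied. Copying the fixed features from $x_p$ is an assumption of this sketch, and `amplify_one` / `amplify_group` are hypothetical names.

```python
import random


def amplify_one(group, i, rng):
    """Generate one amplified sample a_ij from group X_i, intervening on feature i.

    group: list of feature vectors (the samples in X_i); needs at least two.
    i: index of the intervention feature v_i; all other features stay fixed.
    """
    x_p, x_q = rng.sample(group, 2)   # two distinct samples drawn at random
    lam = 0.0
    while lam == 0.0:                 # random() is in [0, 1); exclude 0 so lam is in (0, 1)
        lam = rng.random()
    amplified = list(x_p)             # fixed features copied from x_p (assumption)
    amplified[i] = x_p[i] + lam * (x_q[i] - x_p[i])
    return amplified


def amplify_group(group, i, quota, seed=0):
    """Produce Q_i amplified samples for one single combined sample set (v_i, X_i)."""
    rng = random.Random(seed)
    return [amplify_one(group, i, rng) for _ in range(quota)]


group_1 = [[1.0, 10.0, 5.0], [3.0, 20.0, 7.0], [2.0, 30.0, 6.0]]
for sample in amplify_group(group_1, i=1, quota=2):
    print(sample)
```

Every generated vector differs from a real sample in exactly one coordinate, and that coordinate stays inside the range spanned by two real samples.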
The amplified negative samples are obtained as follows:
the original fusion features are combined with the original negative sample set to obtain a combined negative sample set, each element of which (a single combined negative sample set) pairs one original fusion feature with the single negative sample set corresponding to it;
within each single combined negative sample set, its original fusion feature is taken as the intervention feature, the remaining original fusion features are taken as the fixed feature set, and the negative samples in its single negative sample set are taken as amplification objects; each amplification yields one amplified negative sample, and the process ends when the number of amplifications equals the difference between the numbers of original positive and original negative samples, giving the final amplified negative samples.
Step S3: the original positive sample set and the amplified positive samples form the positive sample amplification set, and the original negative sample set and the amplified negative samples form the negative sample amplification set;
Step S4: the positive sample amplification set and the negative sample amplification set together constitute the amplified structured data.
The hemodialysis-associated cardiovascular disease risk prediction module constructs a risk evaluation model, trains it on the amplified structured data to obtain patient representations and scores, and predicts the risk of cardiovascular disease complicating hemodialysis from those representations and scores.
The hemodialysis-associated cardiovascular disease risk prediction module comprises three parts, the risk evaluation unit, the active learning unit and the contrastive learning unit, as shown in fig. 3. First, the amplified structured data are fed into the system and the risk evaluation unit trains a preliminary risk evaluation model. Next, the active learning unit uses the score p output by the risk evaluation unit and the patient phenotypes s1 and s2 to select high-value contrast samples from the amplified structured data through the positive/negative sample selector R, for the contrastive learning unit to learn from. Finally, the contrastive learning unit learns from these high-quality contrast samples, pulling samples with the same label closer together and pushing samples with different labels farther apart, while updating the parameters of the encoder f that it shares with the risk evaluation unit; this makes the risk evaluation model more accurate.
The hemodialysis-associated cardiovascular disease risk prediction module specifically comprises:
a risk evaluation unit, configured to construct the risk evaluation model and train it on the amplified structured data to obtain scores and patient phenotypes;
the risk evaluation unit is specifically configured for:
constructing the risk evaluation model from an encoder and a risk evaluation network and optimizing it through a loss function;
extracting a patient phenotype with the encoder of the risk evaluation model, from which the risk evaluation network calculates a score for hemodialysis-associated cardiovascular disease;
assigning each patient a true label, which is 1 when the patient has cardiovascular complications and 0 otherwise;
and optimizing the loss function using the scores and the true labels.
The risk evaluation unit, the active learning unit and the contrastive learning unit share the encoder f, which is a 5-layer fully connected network with 1024, 512, 256, 128 and 64 nodes per layer and ReLU activation.
After the encoder $f$ extracts the patient phenotype $s$ (a 64-dimensional vector) from the patient's original fusion features, the risk evaluation network $g$ computes the patient's score for cardiovascular complications, $p = g(s)$. The risk evaluation network $g$ is a 4-layer fully connected network with 128, 32, 8 and 2 nodes per layer. The first three layers use ReLU activation, the activation function of the final output layer is …, and the whole network is optimized with stochastic gradient descent (SGD). The prediction loss of the risk evaluation unit is:
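A forward-pass sketch of the shared encoder $f$ and the risk evaluation network, in plain Python so it runs without dependencies. The layer sizes and ReLU placement follow the description; everything else is assumed: the output activation (lost from the source) is taken to be softmax over the 2 output nodes with $p$ read from the second node, the uniform weight initialization is hypothetical, the 6-feature input is illustrative, and no training (SGD) is performed.

```python
import math
import random

ENCODER_SIZES = [1024, 512, 256, 128, 64]  # encoder f, from the description
HEAD_SIZES = [128, 32, 8, 2]               # risk evaluation network, from the description


def init_mlp(in_dim, sizes, rng):
    """Hypothetical initialization: uniform weights in +/- 1/sqrt(fan_in), zero biases."""
    layers = []
    for out_dim in sizes:
        bound = 1.0 / math.sqrt(in_dim)
        weights = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
                   for _ in range(out_dim)]
        layers.append((weights, [0.0] * out_dim))
        in_dim = out_dim
    return layers


def dense(x, weights, biases):
    """Fully connected layer: one dot product per output node."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(vec):
    return [max(0.0, t) for t in vec]


def softmax(vec):
    top = max(vec)                          # subtract the max for numerical stability
    exps = [math.exp(t - top) for t in vec]
    total = sum(exps)
    return [e / total for e in exps]


def encode(x, encoder):
    """Encoder f: ReLU after every layer; returns the 64-dimensional phenotype s."""
    for weights, biases in encoder:
        x = relu(dense(x, weights, biases))
    return x


def risk_score(s, head):
    """Risk network: ReLU on the first three layers, softmax output (assumed)."""
    h = s
    for weights, biases in head[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = head[-1]
    return softmax(dense(h, weights, biases))[1]  # p = probability of class 1


rng = random.Random(0)
encoder = init_mlp(6, ENCODER_SIZES, rng)  # 6 fused input features (illustrative)
head = init_mlp(ENCODER_SIZES[-1], HEAD_SIZES, rng)
s = encode([0.5] * 6, encoder)
p = risk_score(s, head)
print(len(s), round(p, 3))
```

A real implementation would use a deep-learning framework so that gradients from both the risk loss and the contrastive loss can flow into the shared encoder.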
$$\mathcal{L}_{risk} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where $N$ is the number of samples in the amplified structured data, $p_i$ is the risk evaluation unit's predicted score that input patient sample $i$ has a given cardiovascular disease, and $y_i$ is the true label of patient $i$: $y_i = 1$ when patient $i$ has the cardiovascular disease and $y_i = 0$ otherwise. When patient $i$ has the disease, only the term $y_i \log p_i$ is active, so the larger the predicted score $p_i$, the smaller the overall loss; similarly, when patient $i$ does not have the disease, only the term $(1 - y_i)\log(1 - p_i)$ is active, so the smaller $p_i$, the smaller the overall loss.
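A worked example of the risk evaluation unit's prediction loss, assuming it is the standard binary cross-entropy (which matches the behaviour the text describes: the loss falls as $p_i$ rises for $y_i = 1$ and as $p_i$ falls for $y_i = 0$). The clipping constant `eps` is an added numerical safeguard, not part of the patent.

```python
import math


def bce_loss(scores, labels, eps=1e-12):
    """Binary cross-entropy averaged over N samples.

    scores: predicted probabilities p_i; labels: true labels y_i in {0, 1}.
    """
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(scores)


print(round(bce_loss([0.9, 0.2], [1, 0]), 4))  # 0.1643
```

Here $-(\ln 0.9 + \ln 0.8)/2 \approx 0.1643$; pushing the scores toward the labels (e.g. 0.99 and 0.01) lowers it further.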
an active learning unit, configured to select positive and negative samples from the amplified structured data through the positive/negative sample selector, using the scores and the patient phenotypes;
the active learning unit works together with the risk evaluation unit to select high-value contrast samples for the contrastive learning unit, so that representations of patients with the same label move closer together and representations of patients with different labels move farther apart.
The active learning unit is specifically configured for:
normalizing the patient phenotype output by the risk evaluation model, which maps the resulting patient representation into the 0-1 space;
first, the patient phenotype $s$ produced by the risk evaluation unit is normalized as $\hat{s} = s / \lVert s \rVert_1$, where $s$ is a patient representation vector of length 64 and $\lVert s \rVert_1$ is the L1 norm of $s$. Because the encoder's ReLU activations make every component of $s$ non-negative, each component of $\hat{s}$ lies in $[0, 1]$; this mapping into the 0-1 space simplifies the subsequent selection calculations.
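The L1 normalization is a one-liner; the sketch below also rejects an all-zero phenotype, an edge case (possible after ReLU) that the description does not address. `l1_normalize` is a hypothetical name.

```python
def l1_normalize(s):
    """Divide a phenotype vector by its L1 norm; non-negative inputs land in [0, 1]."""
    norm = sum(abs(t) for t in s)
    if norm == 0.0:
        raise ValueError("an all-zero phenotype vector cannot be normalized")
    return [t / norm for t in s]


print(l1_normalize([1.0, 3.0, 0.0, 4.0]))  # [0.125, 0.375, 0.0, 0.5]
```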
The positive/negative sample selector R uses a selection rule to choose positive and negative samples from the original input sample set.
The selector R applies the following rule: patient phenotype vectors with the same label should be close in cosine distance, and those with different labels should be far apart. A positive sample j for sample i is chosen so that j has the same true label as i but lies relatively far from i in cosine distance; contrastive learning is then meant to pull i and j closer. A negative sample k for sample i is chosen so that k has a different true label from i but lies close to i in cosine distance; contrastive learning is then meant to push i and k apart;
using the positive/negative sample selector to calculate, for each sample representation in the amplified structured data, the angle between its direction and that of every other sample representation in the 0-1 space;
the angle between the representations of sample $i$ and sample $j$ in the amplified structured data is computed as
$$\theta_{ij} = \arccos\frac{\hat{s}_i \cdot \hat{s}_j}{\lVert \hat{s}_i \rVert_2\,\lVert \hat{s}_j \rVert_2}$$
where $\hat{s}_i$ and $\hat{s}_j$ are the normalized representation vectors of samples $i$ and $j$. In general, two samples with the same label should point in the same or a similar direction in space, so the angle between them is small; two samples with different labels point in different directions, so the angle between them is large.
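A plain-Python sketch of the angle computation: cosine similarity of two representation vectors followed by `math.acos`, with the cosine clamped to $[-1, 1]$ so floating-point rounding never makes `acos` raise. `angle` and `angle_matrix` are hypothetical names.

```python
import math


def angle(u, v):
    """Angle (radians) between two representation vectors via the cosine formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error


def angle_matrix(reps):
    """Pairwise angles theta_ij for a list of normalized representations."""
    return [[angle(u, v) for v in reps] for u in reps]


reps = [[0.5, 0.5, 0.0], [0.25, 0.25, 0.5], [0.5, 0.5, 0.0]]
for row in angle_matrix(reps):
    print([round(t, 3) for t in row])
```

Because the angle is a monotone function of cosine distance, ranking by $\theta_{ij}$ gives the same order as ranking by cosine distance.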
The calculated angles of each sample are divided into a first group and a second group according to whether the true label of the other sample matches that of the current sample, and each group is sorted in ascending order.
Specifically, the included angles $\theta_{ij}$ between sample $i$ and every other sample $j$ are split into two groups according to the true labels: $A_i = \{\theta_{ij} \mid y_j = y_i,\ j \neq i\}$, containing the samples whose true label equals $y_i$, the true label of sample $i$; and $B_i = \{\theta_{ik} \mid y_k \neq y_i\}$, containing the samples whose true label differs from $y_i$. Both groups are sorted in ascending order, giving $A_i = \{a_1 \le a_2 \le \dots \le a_m\}$ with $m = |A_i|$ and $B_i = \{b_1 \le b_2 \le \dots \le b_n\}$ with $n = |B_i|$.
The upper quartile of the sorted first group is selected as the positive sample set, and the lower quartile of the sorted second group is selected as the negative sample set.
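From the sorted same-label group $A_i$, the upper quartile, i.e. the quarter of samples with the largest angles to sample $i$, is taken as the positive sample set $P_i = \{ j \mid \theta_{ij} \in A_i,\ \theta_{ij} \ge Q_3(A_i) \}$. From the sorted different-label group $B_i$, the lower quartile, i.e. the quarter of samples with the smallest angles, is taken as the negative sample set $N_i = \{ k \mid \theta_{ik} \in B_i,\ \theta_{ik} \le Q_1(B_i) \}$, where $Q_1$ and $Q_3$ denote the lower and upper quartiles of a group.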
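A sketch of the grouping and quartile selection for one anchor sample $i$, given its angles to all other samples. How the quartile cut is rounded is not stated; as an assumption, each selected set holds the extreme quarter of its group (at least one element when the group is non-empty), and ties are broken by sample index. The function name is hypothetical.

```python
def select_pos_neg(angles_i, labels, i):
    """Hard positive/negative selection for sample i.

    angles_i[j] is the included angle between sample i and sample j.
    Positives: same label as i, largest angles (top quarter of the sorted group).
    Negatives: different label, smallest angles (bottom quarter of the sorted group).
    """
    same = sorted((angles_i[j], j) for j in range(len(labels))
                  if j != i and labels[j] == labels[i])
    diff = sorted((angles_i[k], k) for k in range(len(labels))
                  if labels[k] != labels[i])
    # Size of each quarter; keep at least one element if the group is non-empty.
    n_pos = max(1, len(same) // 4) if same else 0
    n_neg = max(1, len(diff) // 4) if diff else 0
    positives = [j for _, j in same[len(same) - n_pos:]]
    negatives = [k for _, k in diff[:n_neg]]
    return positives, negatives

angles = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(select_pos_neg(angles, labels, 0))  # ([4], [5])
```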
Contrastive learning unit: performs contrastive learning with the positive and negative samples and updates the network parameters of the encoder shared with the risk assessment unit.
Specifically, the contrastive learning unit uses positive samples, which have the same true label as the patient sample, and negative samples, which have a different true label. It computes the cosine distances from the patient sample to its positive samples and to its negative samples, constructs the loss function of the contrastive learning unit from these distances, and updates the network parameters of the encoder shared with the risk assessment unit.
In the contrastive learning unit, the active learning unit first selects the positive and negative samples of each original sample based on the true labels and the patient characterizations. The positive and negative samples are passed through the encoder $f$ to obtain their patient characterizations $s$, which a projector $h$ then maps to contrastive characterizations $t$. The projector $h$ is a 3-layer fully connected network with 512, 256 and 128 nodes, uses ReLU activations, and is trained with SGD as the optimizer. Each mapped representation is normalized as

$$\hat{t} = \frac{t - \mu}{\sigma},$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $t$ over its feature dimension.

Let $\hat{t}_i$ be the representation of patient $i$, $\hat{t}_j^{+}$ that of a positive sample $j$ of patient $i$, and $\hat{t}_k^{-}$ that of a negative sample $k$ of patient $i$, both selected by the active learning unit. $d(\hat{t}_i, \hat{t}_j^{+})$ denotes the cosine distance between sample $i$ and positive sample $j$, and $d(\hat{t}_i, \hat{t}_k^{-})$ the cosine distance between sample $i$ and negative sample $k$. Because positive sample $j$ has the same true label as sample $i$, a smaller $d(\hat{t}_i, \hat{t}_j^{+})$ should give a smaller loss; because negative sample $k$ has a different true label, a larger $d(\hat{t}_i, \hat{t}_k^{-})$ should give a smaller loss. The loss function of the contrastive learning unit is constructed from these two distances accordingly.
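The normalization and distance computations can be sketched in plain Python. The projector $h$ is omitted and its outputs $t$ are taken as given. Two points are assumptions: $\mu$ and $\sigma$ are computed over the components of each vector $t$, and a triplet-style margin loss stands in for the patent's loss function, since it has the stated behavior (small positive distance and large negative distance lower the loss). `margin` is a hypothetical hyperparameter.

```python
import math
import statistics

def zscore(t):
    """t_hat = (t - mu) / sigma, with mu and sigma taken over the components of t.

    Assumes t is not constant (sigma > 0).
    """
    mu = statistics.fmean(t)
    sigma = statistics.pstdev(t)
    return [(x - mu) / sigma for x in t]

def cosine_distance(u, v):
    """1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def contrastive_loss(t_i, positives, negatives, margin=0.5):
    """Stand-in triplet margin loss for anchor t_i (not the patent's formula).

    Averages max(0, d(i, j+) - d(i, k-) + margin) over all positive/negative pairs,
    so the loss falls as positives move closer and negatives move further away.
    """
    anchor = zscore(t_i)
    d_pos = [cosine_distance(anchor, zscore(p)) for p in positives]
    d_neg = [cosine_distance(anchor, zscore(n)) for n in negatives]
    terms = [max(0.0, dp - dn + margin) for dp in d_pos for dn in d_neg]
    return sum(terms) / len(terms)

# Positive points the same way as the anchor, negative points the opposite way.
print(contrastive_loss([1.0, 2.0, 3.0], [[2.0, 4.0, 6.0]], [[3.0, 2.0, 1.0]]))
```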
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. A prediction system for cardiovascular disease complicating hemodialysis, integrating active learning and contrastive learning, comprising:
a hemodialysis data preparation module, configured to extract structured data of patient samples from the hospital electronic information system and daily monitoring equipment and to process the structured data into amplified structured data; and
a risk prediction module for cardiovascular disease complicating hemodialysis, configured to construct a risk assessment model, train it on the amplified structured data to obtain patient characterizations and scores, and predict the risk of cardiovascular disease complicating hemodialysis from those characterizations and scores.
2. The system of claim 1, wherein the structured data comprises demographic data, clinical event data, medication data, and daily monitoring data.
3. The system according to claim 1, wherein the hemodialysis data preparation module comprises:
a data acquisition unit, configured to extract the structured data of the patient samples from the hospital electronic information system and wearable devices;
a data cleaning unit, configured to perform missing-value handling, erroneous-value detection, duplicate removal and/or inconsistency removal on the structured data to obtain static data and time-series data;
a data fusion unit, configured to apply a convolution operation to the time-series data to obtain one-dimensional compressed data and to concatenate it with the static data to obtain the original fusion features; and
a data amplification unit, configured to apply a single-feature randomization method to the original fusion features to obtain the amplified structured data.
4. The system according to claim 3, wherein the data amplification unit is configured to perform the following steps:
step S1: taking patients with cardiovascular complications as original positive samples and patients without cardiovascular complications as original negative samples, all original positive samples forming an original positive sample set and all original negative samples forming an original negative sample set;
step S2: when there are fewer original positive samples than original negative samples, amplifying the original positive sample set to obtain amplified positive samples until the number of positive samples equals the number of original negative samples; when there are more original positive samples than original negative samples, amplifying the original negative sample set to obtain amplified negative samples until the number of negative samples equals the number of original positive samples;
step S3: the original positive sample set and the amplified positive samples forming a positive sample amplification set, and the original negative sample set and the amplified negative samples forming a negative sample amplification set; and
step S4: the positive sample amplification set and the negative sample amplification set together forming the amplified structured data.
5. The system according to claim 4, wherein obtaining the amplified positive samples in step S2 comprises:
combining the original fusion features with the original positive sample set to obtain combined positive sample sets, each comprising a single original fusion feature and the positive sample set corresponding to that feature;
taking the single original fusion feature of a combined positive sample set as the intervention feature and the remaining original fusion features as a fixed feature set, and amplifying a positive sample of the corresponding positive sample set to obtain one amplified positive sample; and repeating the amplification until the number of amplifications equals the difference between the numbers of original negative and original positive samples, thereby obtaining the final amplified positive samples;
and obtaining the amplified negative samples comprises:
combining the original fusion features with the original negative sample set to obtain combined negative sample sets, each comprising a single original fusion feature and the negative sample set corresponding to that feature;
taking the single original fusion feature of a combined negative sample set as the intervention feature and the remaining original fusion features as a fixed feature set, and amplifying a negative sample of the corresponding negative sample set to obtain one amplified negative sample; and repeating the amplification until the number of amplifications equals the difference between the numbers of original positive and original negative samples, thereby obtaining the final amplified negative samples.
6. The system according to claim 1, wherein the risk prediction module for cardiovascular disease complicating hemodialysis comprises:
a risk assessment unit, configured to construct the risk assessment model and train it on the amplified structured data to obtain scores and patient phenotypes;
an active learning unit, configured to select positive and negative samples from the amplified structured data with a positive and negative sample rule selector, using the scores and the patient phenotypes; and
a contrastive learning unit, configured to perform contrastive learning with the positive and negative samples and to update the network parameters of the encoder shared with the risk assessment unit.
7. The system according to claim 6, wherein the risk assessment unit is configured to:
construct the risk assessment model from an encoder and a risk assessment network, and optimize the model through a loss function;
extract a patient phenotype with the encoder of the risk assessment model, and compute from the patient phenotype, through the risk assessment network, a score for cardiovascular disease complicating hemodialysis;
set a true label for each patient, the true label being 1 when the patient has cardiovascular complications and 0 otherwise; and
optimize the loss function using the score and the true label.
8. The system according to claim 6, wherein the active learning unit is configured to:
normalize the patient phenotype output by the risk assessment model, so that the resulting patient characterization is mapped into the 0-1 space;
compute, with the positive and negative sample rule selector, the included angle between the direction of each sample characterization in the amplified structured data and the direction of every other sample characterization in the 0-1 space;
divide the calculated angles of each sample into a first group and a second group according to whether the true label of the other sample matches that of the current sample, and sort each group in ascending order; and
select the upper quartile of the sorted first group as the positive sample set and the lower quartile of the sorted second group as the negative sample set.
9. The system according to claim 6, wherein the contrastive learning unit is configured to perform contrastive learning with positive samples, which have the same true label as the patient sample, and negative samples, which have a different true label; compute the cosine distances from the patient sample to its positive samples and to its negative samples; construct the loss function of the contrastive learning unit from these distances; and update the network parameters of the encoder shared with the risk assessment unit.
10. The system according to claim 6, wherein the risk assessment unit, the active learning unit and the contrastive learning unit share the encoder, the encoder being a 5-layer fully connected network with 1024, 512, 256, 128 and 64 nodes respectively and ReLU activation.
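The data preparation of claims 3 to 5 can be illustrated with a small Python sketch. It assumes a single convolution kernel of shape K x C slid along time without padding, fusion by plain concatenation, and, for single-feature randomization, that the intervention feature's new value is taken from the same feature of a randomly chosen original sample of the same class while all other features stay fixed. Kernel values, the cycling order over features and all function names are hypothetical.

```python
import random

def conv1d_valid(series, kernel, bias=0.0):
    """Slide a K x C kernel along a T x C multichannel series (stride 1, no padding).

    Returns a one-dimensional sequence of length T - K + 1.
    """
    T, K = len(series), len(kernel)
    out = []
    for start in range(T - K + 1):
        acc = bias
        for k in range(K):
            acc += sum(x * w for x, w in zip(series[start + k], kernel[k]))
        out.append(acc)
    return out

def fuse(static, series, kernel):
    """Original fusion feature: static data followed by the compressed series."""
    return list(static) + conv1d_valid(series, kernel)

def amplify_minority(X, y, seed=0):
    """Balance a binary data set by single-feature randomization (steps S1-S4).

    Each new sample copies an original minority-class sample, keeps all other
    features fixed and replaces one intervention feature with that feature's
    value in a randomly chosen original sample of the same class.
    Assumes both classes are present.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]   # S1: original positives
    neg = [x for x, label in zip(X, y) if label == 0]   # S1: original negatives
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    deficit = abs(len(pos) - len(neg))                  # S2: number of amplifications
    n_features = len(X[0])
    new_X = []
    for step in range(deficit):
        sample = list(minority[step % len(minority)])   # amplification object
        f = step % n_features                           # intervention feature
        sample[f] = rng.choice(minority)[f]             # other features unchanged
        new_X.append(sample)
    # S3/S4: originals plus amplified samples form the amplified structured data.
    return X + new_X, y + [label] * deficit

series = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]  # T=4 steps, C=2 channels
kernel = [[1.0, 0.0], [1.0, 0.0]]                         # K=2, hypothetical weights
print(fuse([65.0, 1.0], series, kernel))  # [65.0, 1.0, 3.0, 5.0, 7.0]
```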
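The shared encoder of claim 10 and the risk assessment unit of claim 7 can be sketched as a forward pass in plain Python. The input dimension, the He-style random initialization, the single logistic output layer used as the risk assessment network and the binary cross-entropy loss are all assumptions; the claims fix only the encoder widths (1024, 512, 256, 128, 64) with ReLU and the 0/1 true labels.

```python
import math
import random

ENCODER_SIZES = [1024, 512, 256, 128, 64]  # layer widths recited in claim 10

def init_layers(input_dim, sizes, seed=0):
    """Random weights with He-style scaling (an assumption) and zero biases."""
    rng = random.Random(seed)
    layers, fan_in = [], input_dim
    for width in sizes:
        scale = math.sqrt(2.0 / fan_in)
        W = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(width)]
        layers.append((W, [0.0] * width))
        fan_in = width
    return layers

def encode(x, layers):
    """Encoder f: each fully connected layer is followed by ReLU."""
    h = x
    for W, b in layers:
        h = [max(0.0, sum(w * v for w, v in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return h

def risk_score(s, w, b):
    """Hypothetical risk head: logistic regression on the 64-d phenotype s."""
    z = sum(si * wi for si, wi in zip(s, w)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative logits
    return e / (1.0 + e)

def bce_loss(scores, labels, eps=1e-12):
    """Binary cross-entropy; label 1 = cardiovascular complication, 0 = none."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

# Usage with a hypothetical 16-dimensional fused feature vector.
layers = init_layers(16, ENCODER_SIZES)
s = encode([0.1] * 16, layers)
score = risk_score(s, [0.0] * 64, 0.0)  # untrained all-zero head -> 0.5
print(len(s), score, bce_loss([score], [1]))
```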
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310029096.3A (CN115719647B) | 2023-01-09 | 2023-01-09 | Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115719647A | 2023-02-28 |
CN115719647B | 2023-04-11 |
Family ID: 85257907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310029096.3A (CN115719647B, Active) | Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning | 2023-01-09 | 2023-01-09 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |