CN115719647A - Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning - Google Patents

Info

Publication number
CN115719647A
Authority
CN
China
Prior art keywords
sample
original
positive
amplification
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310029096.3A
Other languages
Chinese (zh)
Other versions
CN115719647B (en
Inventor
李劲松
王丰
池胜强
朱伟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310029096.3A
Publication of CN115719647A
Application granted
Publication of CN115719647B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning, comprising: a hemodialysis data preparation module, which extracts structured data of patient samples from a hospital electronic information system and daily monitoring equipment and processes the structured data to obtain amplified structured data; and a hemodialysis-concurrent cardiovascular disease risk prediction module, which constructs a risk evaluation model, trains the model on the amplified structured data to obtain patient representations and scores, and uses the patient representations and scores to predict the risk of hemodialysis-concurrent cardiovascular disease. The system addresses the matching of positive and negative samples; it iteratively updates the parameters of the contrastive learning model using true label data of hemodialysis-concurrent cardiovascular disease, so that the true complication outcome labels improve model performance; and it addresses the problems of too few samples or imbalanced numbers of positive and negative samples while reducing the difference between the amplified data and the original data.

Description

Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning
Technical Field
The invention relates to the technical field of medical and health information, and in particular to a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning.
Background
Maintenance hemodialysis is one of the main treatments for end-stage renal disease, and ensuring that hemodialysis patients are treated effectively is an urgent clinical need. Hemodialysis is a long-term treatment that continues throughout the course of the disease. Various cardiovascular complications can occur during long-term hemodialysis and seriously affect patient survival. Risk prediction and early intervention for cardiovascular complications of maintenance hemodialysis are therefore crucial to improving the quality of life of patients with end-stage renal disease.
Contrastive learning is a self-supervised method widely applied in fields such as computer vision and natural language processing; in recent years it has even exceeded supervised learning on a number of mainstream tasks. Difficulties remain, however, in applying a contrastive learning method designed for self-supervised tasks to the supervised task of predicting hemodialysis-concurrent cardiovascular disease. On the one hand, cardiovascular complication prediction is a supervised task that, unlike an unsupervised task, provides additional label information, so a key problem is how to use the true complication outcome labels effectively to improve model performance. On the other hand, contrastive learning depends on matching suitable positive and negative samples; an unsuitable matching method seriously degrades model performance, so a second key problem is how to match the most suitable and most valuable positive and negative samples.
To address these problems, this patent constructs a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning for the hemodialysis-concurrent cardiovascular disease prediction scenario, providing accurate and effective support for clinical decision-making.
Disclosure of Invention
To solve the above technical problems, the invention provides a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning.
The technical scheme adopted by the invention is as follows:
a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning, comprising:
a hemodialysis data preparation module, used for extracting structured data of patient samples from a hospital electronic information system and daily monitoring equipment and processing the structured data to obtain amplified structured data;
and a hemodialysis-concurrent cardiovascular disease risk prediction module, used for constructing a risk evaluation model, training the risk evaluation model on the amplified structured data to obtain patient representations and scores, and predicting the risk of hemodialysis-concurrent cardiovascular disease using the patient representations and scores.
Further, the structured data includes demographic data, clinical event data, medication data, and daily monitoring data.
Further, the hemodialysis data preparation module specifically includes:
the data acquisition unit is used for extracting the structured data of the patient sample by utilizing the hospital electronic information system and the wearable equipment;
the data cleaning unit is used for carrying out missing value processing, error value detection, repeated data elimination and/or inconsistency elimination on the structured data to obtain static data and time sequence data;
the data fusion unit is used for applying a convolution operation to the time-series data to obtain one-dimensional compressed data and concatenating the compressed data with the static data to obtain the original fusion features;
and the data amplification unit is used for obtaining the amplified structured data by applying a single-feature randomization method to the original fusion features.
Further, the amplification process of the data amplification unit is as follows:
step S1: taking patients with cardiovascular complications as original positive samples, taking patients without cardiovascular complications as original negative samples, wherein all the original positive samples form an original positive sample set, and all the original negative samples form an original negative sample set;
step S2: when the number of the original positive samples is smaller than that of the original negative samples, amplifying the original positive sample set to obtain amplified positive samples until the number of the positive samples is equal to that of the original negative samples; when the number of the original positive samples is larger than that of the original negative samples, amplifying the original negative sample set to obtain amplified negative samples until the number of the negative samples is equal to that of the original positive samples;
step S3: the original positive sample set and the amplified positive samples form a positive sample amplification set, and the original negative sample set and the amplified negative samples form a negative sample amplification set;
step S4: the positive sample amplification set and the negative sample amplification set together constitute the amplified structured data.
Further, the amplified positive samples in step S2 are obtained as follows:
combining the original fusion features with the original positive sample set to obtain a combined positive sample set, wherein each single combined positive sample set in it pairs a single original fusion feature with the single positive sample set corresponding to that feature;
for each single combined positive sample set, taking its single original fusion feature as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the positive samples in its single positive sample set to obtain single amplified positive samples; the amplification is complete when the number of amplified samples equals the difference between the number of original negative samples and the number of original positive samples, which yields the final amplified positive samples;
the amplified negative samples are obtained as follows:
combining the original fusion features with the original negative sample set to obtain a combined negative sample set, wherein each single combined negative sample set in it pairs a single original fusion feature with the single negative sample set corresponding to that feature;
for each single combined negative sample set, taking its single original fusion feature as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the negative samples in its single negative sample set to obtain single amplified negative samples; the amplification is complete when the number of amplified samples equals the difference between the number of original positive samples and the number of original negative samples, which yields the final amplified negative samples.
Further, the hemodialysis-concurrent cardiovascular disease risk prediction module specifically comprises:
a risk evaluation unit, used for constructing the risk evaluation model and training it on the amplified structured data to obtain scores and patient phenotypes;
an active learning unit, used for selecting positive and negative samples from the amplified structured data through a positive and negative sample selector, using the scores and the patient phenotypes;
a contrastive learning unit, used for performing contrastive learning with the positive and negative samples and updating the network parameters of the encoder shared with the risk evaluation unit.
Further, the risk evaluation unit is specifically used for:
constructing the risk evaluation model from an encoder and a risk evaluation network, and optimizing the model through a loss function;
extracting the patient phenotype with the encoder of the risk evaluation model, and computing from the patient phenotype, through the risk evaluation network, the score for hemodialysis-concurrent cardiovascular disease;
setting a true label for each patient, the true label being 1 when the patient has cardiovascular complications and 0 otherwise;
optimizing the loss function using the score and the true label.
Further, the active learning unit is specifically used for:
normalizing the patient phenotype output by the risk evaluation model, the normalization mapping the resulting patient representation into the 0-1 space;
computing, with the positive and negative sample selector, the angle in the 0-1 space between each sample representation in the amplified structured data and every other sample representation;
dividing the angles computed for each sample into a first group and a second group according to whether the true label of the other sample is the same as (first group) or different from (second group) the true label of the current sample, and sorting each group in ascending order;
selecting the upper quartile of the sorted first group (same label, largest angles) as the positive sample set and the lower quartile of the sorted second group (different label, smallest angles) as the negative sample set.
Further, the contrastive learning unit is specifically used for performing contrastive learning with the positive samples and the negative samples, where a positive sample has the same true label as the patient sample and a negative sample has a different true label; the cosine distance between the patient sample and its positive samples and the cosine distance between the patient sample and its negative samples are computed, the loss function of the contrastive learning unit is constructed from these two cosine distances, and the network parameters of the encoder shared with the risk evaluation unit are updated.
Further, the risk evaluation unit, the active learning unit and the contrastive learning unit share the encoder; the encoder is a 5-layer fully connected network with 1024, 512, 256, 128 and 64 nodes in its successive layers, and its activation function is ReLU.
The invention has the beneficial effects that:
1. The invention provides an active-learning-based method for matching positive and negative samples, which selects high-value contrast samples to improve model performance and solves the positive and negative sample matching problem.
2. The invention provides a training method integrating active learning and contrastive learning, which iteratively updates the contrastive learning model parameters using true label data of hemodialysis-concurrent cardiovascular disease, solving the problem of how to use true complication outcome labels effectively to improve model performance in a supervised setting.
3. The invention provides a single-feature randomization method for amplifying the original data, which addresses the problems of too few collected samples or imbalanced numbers of positive and negative samples and reduces the difference between the amplified data and the original data.
Drawings
FIG. 1 is a block diagram of the hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning according to the present invention;
FIG. 2 is a block diagram of the hemodialysis data preparation module of the present invention;
FIG. 3 is a block diagram of the hemodialysis-concurrent cardiovascular disease risk prediction module of the present invention.
Detailed Description
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Referring to FIG. 1, a hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrastive learning includes:
the hemodialysis data preparation module is used for extracting structured data of a patient sample by utilizing a hospital electronic information system and daily monitoring equipment and processing the structured data to obtain amplified structured data;
the structured data includes demographic data, clinical event data, medication data, and daily monitoring data;
(1) Demographic data: age, sex, region, etc.; (2) clinical event data: hemodialysis events, diagnostic events, etc.; (3) medication data: drug name, dosage, etc.; (4) daily monitoring data: blood pressure, heart rate, body weight, etc.
Referring to fig. 2, the hemodialysis data preparation module specifically includes:
the data acquisition unit is used for extracting the structured data of the patient sample by utilizing the hospital electronic information system and the wearable equipment;
the data cleaning unit is used for carrying out missing value processing, error value detection, repeated data elimination and/or inconsistency elimination on the structured data to obtain static data and time sequence data;
taking complication diagnosis events as an example, the time at which a complication-related drug was first used can be used to fill in a missing complication diagnosis time; a missing complication name can be inferred from the medication prescribed for the complication; if the complication name cannot be determined from the medication information, the missing complication diagnosis information is actively screened.
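The first filling rule above can be sketched as a small helper. This is only an illustration under assumptions: the function name `fill_diagnosis_time` and the use of plain `date` values are hypothetical, and the patent does not specify how records are stored.

```python
from datetime import date
from typing import List, Optional


def fill_diagnosis_time(diagnosis_time: Optional[date],
                        medication_times: List[date]) -> Optional[date]:
    """Return the recorded diagnosis time, or fall back to the first medication time."""
    if diagnosis_time is not None:
        return diagnosis_time
    if medication_times:
        # The earliest use of a complication-related drug stands in for the diagnosis time.
        return min(medication_times)
    # Nothing to infer from: leave the value missing so the record can be screened.
    return None
```

A record with no diagnosis time and no medication history stays missing, which corresponds to the case that the text hands over to active screening.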
The data fusion unit is used for applying a convolution operation to the time-series data to obtain one-dimensional compressed data and concatenating the compressed data with the static data to obtain the original fusion features;
basic patient information that is collected, such as age and sex, is static data, while hemodialysis information and daily monitoring information are one-dimensional time-series data. Applying a convolution operation to the one-dimensional time-series data allows it to be fused with the static data, which facilitates subsequent data processing and model training.
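A minimal sketch of this fusion step follows. The kernel values, the stride and the function names `conv1d` and `fuse` are assumptions made for illustration; the patent does not state the kernel size, the number of kernels, or whether the kernels are learned.

```python
from typing import List


def conv1d(series: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """Valid (unpadded) 1-D convolution in the cross-correlation form common in deep learning."""
    k = len(kernel)
    return [
        sum(series[start + t] * kernel[t] for t in range(k))
        for start in range(0, len(series) - k + 1, stride)
    ]


def fuse(static: List[float], series: List[float],
         kernel: List[float], stride: int = 1) -> List[float]:
    """Concatenate static features with the compressed time series."""
    return static + conv1d(series, kernel, stride)


# Hypothetical patient: two static features and six blood-pressure-like readings.
fused = fuse([63.0, 1.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], kernel=[0.5, 0.5], stride=2)
```

With a stride of 2 the six readings are compressed to three values, so the fused vector has five entries: the two static features followed by the three convolution outputs.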
The data amplification unit is used for obtaining the amplified structured data by applying a single-feature randomization method to the original fusion features.
When too few samples are collected or the numbers of positive and negative samples are imbalanced, model training suffers. To reduce this effect, the invention amplifies the original data with a single-feature randomization method. To keep the difference between the amplified data and the real data as small as possible, each amplification step selects only one feature as the intervention feature and keeps the remaining features as the fixed feature set.
The amplification process of the data amplification unit comprises the following steps:
step S1: taking patients with cardiovascular complications as original positive samples, taking patients without cardiovascular complications as original negative samples, wherein all the original positive samples form an original positive sample set, and all the original negative samples form an original negative sample set;
step S2: when the number of the original positive samples is smaller than that of the original negative samples, amplifying the original positive sample set to obtain amplified positive samples until the number of the positive samples is equal to that of the original negative samples; when the number of the original positive samples is larger than that of the original negative samples, amplifying the original negative sample set to obtain amplified negative samples until the number of the negative samples is equal to that of the original positive samples;
the amplified positive samples are obtained as follows:
combining the original fusion features with the original positive sample set to obtain a combined positive sample set, wherein each single combined positive sample set in it pairs a single original fusion feature with the single positive sample set corresponding to that feature;
for each single combined positive sample set, taking its single original fusion feature as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the positive samples in its single positive sample set to obtain single amplified positive samples; the amplification is complete when the number of amplified samples equals the difference between the number of original negative samples and the number of original positive samples, which yields the final amplified positive samples;
Suppose there are $M$ original positive samples and $N$ original negative samples, with $M < N$. Because the original positive and negative samples are imbalanced, $Q$ positive samples need to be amplified, where $Q = N - M$. The amplification of these $Q$ samples is described in detail below.
The original fusion features are denoted $V = \{v_1, v_2, \dots, v_n\}$, where $v_i$ is the $i$-th single original fusion feature and $n$ is the number of features of an original positive sample. All original positive samples are then randomly and evenly divided into $n$ groups, and the original positive sample set is denoted $X = \{X_1, X_2, \dots, X_n\}$, where $X_i$ is the single positive sample set of group $i$, with $X_i = \{x_{i1}, x_{i2}, \dots, x_{i m_i}\}$, where $m_i$ is the total number of positive samples assigned to group $i$ and $x_{ij}$ is the $j$-th positive sample of group $i$.
The original fusion features $V$ are then combined with the original positive sample set $X$ to obtain the combined positive sample set, denoted $VX = \{VX_1, VX_2, \dots, VX_n\}$, where the single combined positive sample set is $VX_i = (v_i, X_i)$, $v_i$ is the $i$-th single original fusion feature and $X_i$ is the single positive sample set of group $i$. After combination, for each single combined positive sample set $VX_i$ of $VX$, the single original fusion feature $v_i$ is taken as the intervention feature, $V \setminus \{v_i\}$ (the features of $V$ other than $v_i$) is taken as the fixed feature set, and the positive samples in the single positive sample set $X_i$ are taken as the amplification objects. The number of samples amplified from each $VX_i$ is $q_i$, and the $q_i$ together total $Q$. The amplified data are denoted $X' = \{X'_1, X'_2, \dots, X'_n\}$, where $X'_i$ is the sample set amplified from the $i$-th group of the combined positive sample set $VX$. Let $X'_i = \{x'_{i1}, x'_{i2}, \dots, x'_{i q_i}\}$, where $q_i$ is the number of amplifications of group $i$ and the single amplified positive sample $x'_{ij}$ is the $j$-th sample amplified from the single combined positive sample set $VX_i$.
For a single amplified positive sample $x'_{ij}$, two samples $x_{ia}$ and $x_{ib}$ are first selected at random from the single positive sample set $X_i$ of $VX_i$. The features of sample $x_{ia}$ are written $(x_{ia}^{v_1}, x_{ia}^{v_2}, \dots, x_{ia}^{v_n})$ and the features of sample $x_{ib}$ are written $(x_{ib}^{v_1}, x_{ib}^{v_2}, \dots, x_{ib}^{v_n})$. The features of the single amplified positive sample $x'_{ij}$ are expressed as follows:
$$x'_{ij} = \left({x'}_{ij}^{v_1}, \dots, {x'}_{ij}^{v_i}, \dots, {x'}_{ij}^{v_n}\right)$$
$${x'}_{ij}^{v_i} = \lambda\, x_{ia}^{v_i} + (1 - \lambda)\, x_{ib}^{v_i}$$
where $\lambda$ is a random number in the range $(0, 1)$; $x_{ia}^{v_i}$ is the value of sample $x_{ia}$ on feature $v_i$; $x_{ib}^{v_i}$ is the value of sample $x_{ib}$ on feature $v_i$; and ${x'}_{ij}^{v_i}$ is the value of the amplified sample $x'_{ij}$ on feature $v_i$. The values on the fixed feature set $V \setminus \{v_i\}$ are left unchanged. The value of the amplified sample $x'_{ij}$ on feature $v_i$ is thus a random number between the values of samples $x_{ia}$ and $x_{ib}$ on feature $v_i$, which reduces the difference between the amplified data and the original data.
The amplified negative samples are obtained as follows:
combining the original fusion features with the original negative sample set to obtain a combined negative sample set, wherein each single combined negative sample set in it pairs a single original fusion feature with the single negative sample set corresponding to that feature;
for each single combined negative sample set, taking its single original fusion feature as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the negative samples in its single negative sample set to obtain single amplified negative samples; the amplification is complete when the number of amplified samples equals the difference between the number of original positive samples and the number of original negative samples, which yields the final amplified negative samples.
Step S3: the original positive sample set and the amplified positive samples form a positive sample amplification set, and the original negative sample set and the amplified negative samples form a negative sample amplification set;
step S4: the positive sample amplification set and the negative sample amplification set together constitute the amplified structured data.
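Steps S1 to S4 can be sketched in code for whichever class is in the minority. The sketch below rests on several assumptions that the text does not settle: the intervention feature is set to a random value between the values of two randomly chosen samples of the same group, the fixed features are copied from the first of those two samples, and the amplified samples are spread over the groups in round-robin order. The name `amplify_minority` is hypothetical.

```python
import random
from typing import List

Sample = List[float]


def amplify_minority(minority: List[Sample], target_count: int,
                     rng: random.Random) -> List[Sample]:
    """Create target_count - len(minority) samples by single-feature randomization."""
    n_features = len(minority[0])
    # Randomly split the samples into one near-equal group per feature;
    # group i uses feature i as its intervention feature.
    shuffled = list(minority)
    rng.shuffle(shuffled)
    groups = [shuffled[i::n_features] for i in range(n_features)]
    if not any(len(group) >= 2 for group in groups):
        raise ValueError("need at least one group with two samples")

    amplified: List[Sample] = []
    step = 0
    while len(amplified) < target_count - len(minority):
        i = step % n_features  # ASSUMPTION: round-robin over the groups
        step += 1
        if len(groups[i]) < 2:
            continue  # two distinct samples are needed to interpolate
        a, b = rng.sample(groups[i], 2)
        lam = rng.random()  # drawn from [0, 1); the text specifies (0, 1)
        new_sample = list(a)  # ASSUMPTION: fixed features are copied from sample a
        new_sample[i] = lam * a[i] + (1.0 - lam) * b[i]
        amplified.append(new_sample)
    return amplified


# Hypothetical data: 6 positive samples with 3 features, to be balanced against 10 negatives.
positives = [[float(i), float(2 * i), float(i + 10)] for i in range(6)]
new_positives = amplify_minority(positives, 10, random.Random(0))
positive_amplification_set = positives + new_positives
```

Because only one feature per new sample is randomized, and only within the range spanned by two real samples, each amplified sample differs from a real sample in at most one feature.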
The hemodialysis-concurrent cardiovascular disease risk prediction module is used for constructing a risk evaluation model, training the model on the amplified structured data to obtain patient representations and scores, and predicting the risk of hemodialysis-concurrent cardiovascular disease using the patient representations and scores.
The hemodialysis-concurrent cardiovascular disease risk prediction module comprises three parts: the risk evaluation unit, the active learning unit and the contrastive learning unit, as shown in FIG. 3. First, the amplified structured data are used as the input of the system, and a preliminary risk evaluation model is trained by the risk evaluation unit. Then the active learning unit, using the output score p of the risk evaluation unit and the patient phenotypes s1 and s2, selects high-value contrast samples from the amplified structured data through the positive and negative sample selector R for the contrastive learning unit to learn from. Finally, the contrastive learning unit learns from the high-quality contrast samples selected by the active learning unit, so that samples with the same label move closer together and samples with different labels move farther apart; at the same time the parameters of the encoder f shared by the contrastive learning unit and the risk evaluation unit are updated, making the risk evaluation model more accurate.
The hemodialysis-concurrent cardiovascular disease risk prediction module specifically comprises:
a risk evaluation unit: used for constructing the risk evaluation model and training it on the amplified structured data to obtain scores and patient phenotypes;
the risk evaluation unit is specifically used for:
constructing the risk evaluation model from an encoder and a risk evaluation network, and optimizing the model through a loss function;
extracting the patient phenotype with the encoder of the risk evaluation model, and computing from the patient phenotype, through the risk evaluation network, the score for hemodialysis-concurrent cardiovascular disease;
setting a true label for each patient, the true label being 1 when the patient has cardiovascular complications and 0 otherwise;
optimizing the loss function using the score and the true label.
The risk evaluation unit, the active learning unit and the contrastive learning unit share the encoder f; the encoder f is a 5-layer fully connected network with 1024, 512, 256, 128 and 64 nodes in its successive layers, and its activation function is ReLU.
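The shape of the shared encoder can be illustrated with a plain forward pass. This sketch only mirrors the layer widths and the ReLU activation given above; the random initialization, the zero biases, the input dimension of 8 and the names `init_encoder` and `encode` are assumptions, and a real system would train the weights with a deep learning framework.

```python
import random
from typing import List, Tuple

LAYER_SIZES = [1024, 512, 256, 128, 64]  # node counts stated in the text

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_encoder(input_dim: int, rng: random.Random) -> List[Layer]:
    """Build untrained weights for the 5 fully connected layers."""
    layers: List[Layer] = []
    fan_in = input_dim
    for fan_out in LAYER_SIZES:
        scale = (2.0 / fan_in) ** 0.5  # ASSUMPTION: He-style scale for ReLU layers
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
        fan_in = fan_out
    return layers


def encode(x: List[float], layers: List[Layer]) -> List[float]:
    """Forward pass: every layer is an affine map followed by ReLU."""
    h = x
    for weights, biases in layers:
        h = [max(0.0, sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


encoder = init_encoder(input_dim=8, rng=random.Random(0))  # 8 fused features, hypothetical
phenotype = encode([0.1] * 8, encoder)  # 64 non-negative values
```

Since the last layer also ends in ReLU, every entry of the 64-dimensional output is non-negative.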
After the patient phenotype S (which is a 64-bit vector) is extracted from the patient raw fusion features by using the encoder f, the patient phenotype S is evaluated by a risk evaluation network
Figure DEST_PATH_IMAGE035
Calculating the score p =suffering from cardiovascular complications of the patient
Figure DEST_PATH_IMAGE036
. Risk assessment network
Figure 488973DEST_PATH_IMAGE035
Is a network consisting of a 4-layer full connection. Each layer of nodes is 128, 32, 8 and 2 respectively. The activation function of the first three layers is ReLU, and the activation function of the last output layer is
Figure DEST_PATH_IMAGE037
The entire network uses the SGD function as an optimizer. The predicted loss function of the risk assessment unit is as follows:
$$\mathcal{L}_{\mathrm{pred}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right]$$
where N is the number of samples in the amplified structured data, $p_i$ is the risk assessment unit's predicted score that the input patient sample i has a certain cardiovascular disease, and $y_i$ is the true label of patient i: $y_i = 1$ when patient i has cardiovascular disease, and $y_i = 0$ when patient i does not. When patient i has cardiovascular disease, the term $y_i \log p_i$ in the loss function grows as the predicted score $p_i$ grows, so the overall loss becomes smaller; similarly, when patient i does not have the cardiovascular disease, the smaller the predicted score $p_i$, the smaller the overall loss.
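The behaviour described above (a larger score lowers the loss for a patient with the disease, a smaller score lowers it for a patient without) matches a binary cross-entropy loss. The following is a minimal sketch under that assumption; the function name `prediction_loss` and the clamping constant `eps` are illustrative and not taken from the text.

```python
import math


def prediction_loss(scores, labels, eps=1e-12):
    """Mean binary cross-entropy over N samples.

    scores: predicted scores p_i in (0, 1); labels: true labels y_i in {0, 1}.
    eps clamps the scores so log() never receives 0 (an implementation
    detail added here, not taken from the text).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(scores)


# A patient with the disease: a higher score gives a lower loss.
print(prediction_loss([0.9], [1]) < prediction_loss([0.6], [1]))  # True
# A patient without the disease: a lower score gives a lower loss.
print(prediction_loss([0.1], [0]) < prediction_loss([0.4], [0]))  # True
```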
An active learning unit: for selecting positive and negative samples from the amplified structured data by a positive and negative sample rule selector, using the score and the patient phenotype;
The active learning unit works together with the risk evaluation unit to select high-value contrast samples for the comparison learning unit to learn from, so that patient characterizations with the same label are drawn closer and patient characterizations with different labels are pushed farther apart.
The active learning unit specifically includes:
the patient phenotype output by the risk evaluation model is normalized, which maps the patient characterization into the 0-1 space;
First, the patient phenotype s generated by the risk assessment unit is normalized and recorded as $\hat{s} = s / \lVert s \rVert_1$, where s is the patient characterization vector of length 64 and $\lVert s \rVert_1$ is the L1 norm of s. After normalization, the patient characterization is mapped into the 0-1 space, which facilitates the subsequent selection calculations.
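As a small worked illustration of the L1 normalization, assuming the characterization is non-negative (as it would be after a ReLU output layer), dividing by the L1 norm yields components in [0, 1] that sum to 1. The function name `l1_normalize` and the 4-dimensional toy vector are illustrative assumptions.

```python
def l1_normalize(vector):
    """Divide a vector by its L1 norm (the sum of absolute values)."""
    norm = sum(abs(v) for v in vector)
    if norm == 0.0:
        # An all-zero vector has no direction; return it unchanged (assumption).
        return list(vector)
    return [v / norm for v in vector]


s = [0.0, 2.0, 6.0, 0.0]  # toy 4-dimensional stand-in for the 64-dim phenotype
s_hat = l1_normalize(s)
print(s_hat)  # [0.0, 0.25, 0.75, 0.0]
```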
The positive and negative sample rule selector R selects positive and negative samples from the original input sample set using a selection rule.
The rule selector R uses the following rule: the cosine distances between the patient phenotype vectors of samples with the same label should be small, and the cosine distances between the patient phenotype vectors of samples with different labels should be large. A positive sample j of sample i is chosen such that sample j has the same true label as sample i but lies at a large cosine distance from sample i; contrast learning is then expected to bring sample i and its positive sample closer together. A negative sample k of sample i is chosen such that sample k has a different true label from sample i but lies at a small cosine distance from sample i; contrast learning is then expected to push sample i and the negative sample k farther apart;
the positive and negative sample rule selector calculates, for each sample characterization in the amplified structured data, its included angle to every other sample characterization in the 0-1 space;
The included angle between a sample characterization in the amplified structured data and each other sample characterization is calculated using the formula [Figure DEST_PATH_IMAGE047], where [Figure DEST_PATH_IMAGE048], $\hat{s}_i$ is the normalized characterization vector of sample i, and $\hat{s}_j$ is the normalized characterization vector of sample j. In general, if two samples have the same label, they should point in the same or a similar direction in the space, and the included angle between them is small; if the labels of the two samples differ, they should point in different directions, and the included angle between them is large.
the included angles calculated for each sample are divided into a first group and a second group according to whether the true labels of the other samples are the same as the true label of the current sample, and each group is sorted internally from small to large;
The included angles between sample i and the other samples are divided into two groups, [Figure DEST_PATH_IMAGE051] and [Figure DEST_PATH_IMAGE052], according to whether the true labels of the other samples are the same as the true label of sample i. The first group contains the samples whose true labels are the same as that of sample i, i.e. those with $y_j = y_i$, where $y_i$ is the true label of sample i and $y_j$ is the true label of sample j; the second group contains the samples whose true labels differ from that of sample i, i.e. those with $y_j \neq y_i$. Both groups are sorted internally from small to large and recorded as [Figures DEST_PATH_IMAGE056 to DEST_PATH_IMAGE061].
the upper quartile of the sorted first group is selected as the positive sample set, and the lower quartile of the sorted second group is selected as the negative sample set.
From the sorted first group, the upper quartile is selected as the positive sample set of sample i [Figures DEST_PATH_IMAGE062 and DEST_PATH_IMAGE063]; from the sorted second group, the lower quartile is selected as the negative sample set of sample i [Figures DEST_PATH_IMAGE064 and DEST_PATH_IMAGE065].
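The selection procedure of the active learning unit can be sketched as follows. This is an illustration under stated assumptions, not the patented formulas: the included angle is computed with `acos` of the cosine similarity, "upper quartile" is read as the quarter of same-label samples with the largest angles, "lower quartile" as the quarter of different-label samples with the smallest angles, and the quartile size is rounded up. The names `included_angle` and `select_pos_neg` are hypothetical.

```python
import math


def included_angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def select_pos_neg(index, reps, labels):
    """Pick hard positives and hard negatives for sample `index`.

    Returns (positives, negatives) as lists of sample indices.
    """
    same, diff = [], []
    for j, (rep, label) in enumerate(zip(reps, labels)):
        if j == index:
            continue
        angle = included_angle(reps[index], rep)
        (same if label == labels[index] else diff).append((angle, j))
    same.sort()  # first group: same true label, small to large
    diff.sort()  # second group: different true label, small to large
    # Quartile size rounded up so small groups still yield one sample
    # (a choice made for this sketch).
    n_pos = math.ceil(len(same) / 4)
    n_neg = math.ceil(len(diff) / 4)
    positives = [j for _, j in same[len(same) - n_pos:]]  # largest angles
    negatives = [j for _, j in diff[:n_neg]]              # smallest angles
    return positives, negatives


reps = [[1.0, 0.0],                                       # sample 0 (anchor)
        [0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7],   # same label as anchor
        [0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.0, 1.0]]   # different label
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
print(select_pos_neg(0, reps, labels))  # ([4], [5])
```

In the toy data, sample 4 shares the anchor's label but points in the most different direction, and sample 5 has a different label but points in the most similar direction, so they are the samples from which contrast learning has the most to gain.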
A comparison learning unit: for performing contrast learning using the positive and negative samples and updating the network parameters of the encoder shared with the risk evaluation unit.
The comparison learning unit specifically comprises: performing contrast learning with the positive samples and the negative samples, where a positive sample has the same true label as the patient sample and a negative sample has a different true label from the patient sample; calculating the cosine distance between the patient sample and its positive sample and the cosine distance between the patient sample and its negative sample; constructing the loss function of the comparison learning unit from these two cosine distances; and updating the network parameters of the encoder shared with the risk evaluation unit.
In the comparison learning unit, the active learning unit selects the positive and negative samples of the original sample based on the true label and the patient characterization. The encoder f produces the patient characterization s of each positive and negative sample, and a projector h maps s to a contrast characterization t. The projector h is a 3-layer fully-connected network with 512, 256 and 128 nodes in its layers, respectively; its activation function is ReLU, and SGD is used as the optimizer. The mapped characterization is normalized and recorded as $\hat{t} = (t - \mu)/\sigma$, where $\mu$ is the mean of the contrast characterization t over its feature dimension and $\sigma$ is the standard deviation of t over its feature dimension.
[Figure DEST_PATH_IMAGE070] is the characterization vector of a positive sample j of patient i screened by the active learning unit, and [Figure DEST_PATH_IMAGE071] is the characterization vector of a negative sample k of patient i screened by the active learning unit. [Figure DEST_PATH_IMAGE072] represents the cosine distance between sample i and positive sample j, and [Figure DEST_PATH_IMAGE073] represents the cosine distance between sample i and negative sample k. As described for the active learning unit, the positive sample j has the same true label as sample i, so for the loss, the smaller the cosine distance between positive sample j and sample i, the better; likewise, the negative sample k has a different true label from sample i, so the larger the cosine distance between negative sample k and sample i, the better. The loss function of the comparison learning unit is therefore constructed as follows:
[Figures DEST_PATH_IMAGE074 to DEST_PATH_IMAGE076: loss function of the comparison learning unit]
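The exact loss formula is not recoverable from the extracted text. As a stand-in that has the stated property (a small cosine distance to the positive sample and a large cosine distance to the negative sample are both rewarded), the sketch below uses a hinge-style triplet loss on cosine distances after standardizing each contrast characterization. The triplet form, the `margin` value, and all function names are assumptions and should not be read as the patented formula.

```python
import math


def standardize(t):
    """Z-score a vector over its feature dimension: (t - mean) / std."""
    mean = sum(t) / len(t)
    std = math.sqrt(sum((x - mean) ** 2 for x in t) / len(t))
    if std == 0.0:
        return [0.0] * len(t)  # constant vector; avoid division by zero
    return [(x - mean) / std for x in t]


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: zero once the negative is `margin` farther than the positive."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = standardize([1.0, 2.0, 3.0, 4.0])
positive = standardize([2.0, 4.0, 6.0, 8.0])  # same direction after z-scoring
negative = standardize([4.0, 3.0, 2.0, 1.0])  # opposite direction after z-scoring
print(triplet_loss(anchor, positive, negative))  # 0.0
```

Swapping the roles of the positive and negative samples in the call gives a loss close to 2.5, which is the configuration the gradient update would push away from.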
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A hemodialysis-complicated cardiovascular disease prediction system that incorporates active learning and contrast learning, comprising:
a hemodialysis data preparation module for extracting structured data of patient samples using a hospital electronic information system and daily monitoring equipment, and processing the structured data to obtain amplified structured data; and
a hemodialysis-complicated cardiovascular disease risk prediction module for constructing a risk evaluation model, training the risk evaluation model on the amplified structured data to obtain patient characterizations and scores, and predicting the risk of hemodialysis-complicated cardiovascular disease using the patient characterizations and scores.
2. The system of claim 1, wherein the structured data comprises demographic data, clinical event data, medication data, and daily monitoring data.
3. The hemodialysis-complicated cardiovascular disease prediction system combining active learning and contrast learning according to claim 1, wherein the hemodialysis data preparation module specifically comprises:
a data acquisition unit for extracting the structured data of the patient samples using the hospital electronic information system and wearable equipment;
a data cleaning unit for performing missing-value processing, error-value detection, duplicate-data elimination and/or inconsistency elimination on the structured data to obtain static data and time-series data;
a data fusion unit for concatenating the static data with the one-dimensional compressed data obtained by applying a convolution operation to the time-series data, to obtain original fusion features; and
a data amplification unit for applying a single-feature randomization method to the original fusion features to obtain the amplified structured data.
4. The system for predicting hemodialysis-complicated cardiovascular diseases by combining active learning and contrast learning according to claim 3, wherein the data amplification unit comprises:
step S1: taking patients with cardiovascular complications as original positive samples, taking patients without cardiovascular complications as original negative samples, wherein all the original positive samples form an original positive sample set, and all the original negative samples form an original negative sample set;
step S2: when the number of the original positive samples is smaller than that of the original negative samples, amplifying the original positive sample set to obtain amplified positive samples until the number of the positive samples is equal to that of the original negative samples; when the number of the original positive samples is larger than that of the original negative samples, amplifying the original negative sample set to obtain amplified negative samples until the number of the negative samples is equal to that of the original positive samples;
step S3: the original positive sample set and the amplified positive samples form a positive sample amplification set, and the original negative sample set and the amplified negative samples form a negative sample amplification set;
step S4: the positive sample amplification set and the negative sample amplification set together constitute the amplified structured data.
5. The system for predicting hemodialysis-complicated cardiovascular disease through active learning and contrast learning according to claim 4, wherein the process of obtaining the amplification positive sample in step S2 comprises:
combining the original fusion features with the original positive sample set to obtain a combined positive sample set, wherein the combined positive sample set comprises a single original fusion feature and a single positive sample set corresponding to the single original fusion feature;
taking the single original fusion feature in a single combined positive sample set as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the positive samples in the corresponding single positive sample set to obtain a single amplified positive sample; the amplification is repeated until the number of amplifications equals the difference between the number of original negative samples and the number of original positive samples, yielding the final amplified positive samples;
the process of obtaining the amplification negative sample comprises the following steps:
combining the original fusion features with the original negative sample set to obtain a combined negative sample set, wherein the combined negative sample set comprises a single original fusion feature and a single negative sample set corresponding to the single original fusion feature;
and taking the single original fusion feature in a single combined negative sample set as the intervention feature and the remaining original fusion features as the fixed feature set, and amplifying the negative samples in the corresponding single negative sample set to obtain a single amplified negative sample; the amplification is repeated until the number of amplifications equals the difference between the number of original negative samples and the number of original positive samples, yielding the final amplified negative samples.
6. The system for predicting hemodialysis-complicated cardiovascular disease fused with active learning and comparative learning according to claim 1, wherein the module for predicting risk of hemodialysis-complicated cardiovascular disease comprises:
a risk evaluation unit for constructing the risk evaluation model and using the amplified structured data as training data of the model to obtain scores and patient phenotypes;
an active learning unit for selecting positive and negative samples from the amplified structured data by a positive and negative sample rule selector, using the scores and the patient phenotypes; and
a comparison learning unit for performing contrast learning using the positive and negative samples and updating the network parameters of the encoder shared with the risk evaluation unit.
7. The system for predicting hemodialysis-complicated cardiovascular diseases by combining active learning and comparative learning according to claim 6, wherein the risk evaluation unit specifically comprises:
constructing the risk evaluation model from an encoder and a risk evaluation network, the model being optimized through a loss function;
extracting a patient phenotype with the encoder in the risk evaluation model, and calculating from the patient phenotype a score for hemodialysis-complicated cardiovascular disease through the risk evaluation network;
setting a true label for each patient, the true label being 1 when the patient has cardiovascular complications and 0 otherwise; and
optimizing the loss function using the score and the true label.
8. The system for predicting hemodialysis-complicated cardiovascular diseases by combining active learning and contrast learning according to claim 6, wherein the active learning unit comprises:
normalizing the patient phenotype output by the risk evaluation model, whereby the patient characterization is mapped into the 0-1 space;
calculating, by a positive and negative sample rule selector, the included angle in the 0-1 space between each sample characterization in the amplified structured data and every other sample characterization;
dividing the included angles calculated for each sample into a first group and a second group according to whether the true labels of the other samples are the same as the true label of the current sample, and sorting each group internally from small to large; and
selecting the upper quartile of the sorted first group as the positive sample set and the lower quartile of the sorted second group as the negative sample set.
9. The system for predicting hemodialysis-complicated cardiovascular diseases by combining active learning and comparative learning according to claim 6, wherein the comparative learning unit is configured for: performing contrast learning with the positive samples and the negative samples, where a positive sample has the same true label as the patient sample and a negative sample has a different true label from the patient sample; calculating the cosine distance between the patient sample and its positive sample and the cosine distance between the patient sample and its negative sample; constructing the loss function of the comparative learning unit from these two cosine distances; and updating the network parameters of the encoder shared with the risk evaluation unit.
10. The hemodialysis-complicated cardiovascular disease prediction system combining active learning and comparative learning according to claim 6, wherein the risk evaluation unit, the active learning unit and the comparative learning unit share the encoder, the encoder is a 5-layer fully-connected network whose layers have 1024, 512, 256, 128 and 64 nodes, respectively, and the activation function is ReLU.
CN202310029096.3A 2023-01-09 2023-01-09 Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning Active CN115719647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310029096.3A CN115719647B (en) 2023-01-09 2023-01-09 Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310029096.3A CN115719647B (en) 2023-01-09 2023-01-09 Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning

Publications (2)

Publication Number Publication Date
CN115719647A (en) 2023-02-28
CN115719647B (en) 2023-04-11

Family

ID=85257907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310029096.3A Active CN115719647B (en) 2023-01-09 2023-01-09 Hemodialysis-concurrent cardiovascular disease prediction system integrating active learning and contrast learning

Country Status (1)

Country Link
CN (1) CN115719647B (en)

Also Published As

Publication number Publication date
CN115719647B (en) 2023-04-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant