CN117577333A

CN117577333A - Multi-center clinical prognosis prediction system based on causal feature learning

Info

Publication number: CN117577333A
Application number: CN202410067682.1A
Authority: CN
Inventors: 田雨; 秦园炳; 余华玉; 李劲松; 周天舒
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2024-01-17
Filing date: 2024-01-17
Publication date: 2024-02-20
Anticipated expiration: 2044-01-17
Also published as: CN117577333B

Abstract

The invention discloses a multi-center clinical prognosis prediction system based on causal feature learning. By introducing a pair of complementary maskers, the invention realizes adversarial learning between causal features and non-causal features so that the two promote each other, and it uses the multi-branch prognosis prediction networks to balance the amount of information between the causal and non-causal features, which minimizes information loss and ensures that the downstream prognosis prediction task has sufficient information. The invention also introduces a dual-task network: a multi-branch prognosis prediction task is performed on the separated causal features and a medical institution identification task is performed on the separated non-causal features, which further improves the feature separation effect. In the prediction stage, the medical institution identification network generates probability weights from the non-causal features and uses them to aggregate the prediction results of the multi-branch prognosis prediction network. This makes effective use of the non-causal features, improves the utilization of the information in patient data, and preserves the generalization capability of the model in different application environments while maintaining good prediction capability.

Description

Multi-center clinical prognosis prediction system based on causal feature learning

Technical Field

The invention belongs to the technical field of medical information, and particularly relates to a multi-center clinical prognosis prediction system based on causal feature learning.

Background

Predicting the risk of cancer recurrence and treating it in time are of great significance for improving patient prognosis. However, most studies construct clinical prognosis prediction models from the data of a single medical institution. When such a model is evaluated in large-scale, multi-center clinical applications, its performance is generally difficult to keep stable, and it degrades significantly when the distribution of the test data differs greatly from that of the training data.

Multi-center clinical prognosis prediction models offer a new direction for addressing these problems. First, such a model is built from the electronic medical record data of multiple medical institutions and integrates clinical data from different institutions, regions and countries, which diversifies the data sources and effectively mitigates biases, such as patient selection bias and the subjective judgment of doctors, that arise within a single institution. Second, a multi-center model has a larger sample size: the data of a single institution are limited and cannot cover enough patient samples and clinical situations, which limits the prediction capability of the model, whereas a multi-center model significantly increases the sample size by integrating the data of multiple medical institutions.

However, even models built from large-scale patient data of multiple hospitals may be biased toward the medical institutions or the classes with larger data volumes, because the patient sample sizes and the ratios of positive to negative samples are imbalanced across medical institutions. In addition, differences in population characteristics and disease distribution among regions further aggravate the insufficient generalization capability of multi-center prognosis prediction models. The underlying reason is that machine learning algorithms are optimized by empirical risk minimization, which greedily exploits the correlations in the training data rather than the underlying causal mechanism. These correlations are unstable under distribution shift, so model performance degrades significantly when the data distributions of the training and test sets differ substantially.

To solve this problem, many causality-based methods have been proposed to obtain an invariant causal mechanism or to recover causal features. They build the prediction model from features that have a causal relationship with the target variable, so that model predictions remain stable across environments. The technical schemes similar to the present application are as follows:

(1) Heterogeneity identification and invariant prediction without environment labels. In this scheme, when the data carry no explicit environment labels, a pair of complementary masks divides the features into heterogeneous features and causal features by feature selection; a clustering algorithm identifies the heterogeneity of the data, and a multi-environment partition of the dataset is generated according to the heterogeneous features; the parameters of an invariant classifier are then updated by minimizing the prediction loss of the causal features under the multi-environment partition, which realizes joint optimization of heterogeneity identification and invariant prediction. Prior art of this kind is suited to merged datasets from multiple source institutions that lack explicit source labels. This differs from the practical setting of clinical prognosis prediction, because the data of a multi-center clinical prediction model generally have a clear source, such as the clinical hospital to which a patient's data belong. Second, the method clusters the data centers in an unsupervised manner to generate the multi-environment partition and ignores the information carried by the data center labels, such as regional population information, geographic information and hospital information, which introduces extra noise and loses information. In addition, the method uses complementary masks to separate causal and heterogeneous features by feature selection and uses an invariant classifier to predict from the separated causal features; because information is lost during separation, the separated causal features cannot guarantee information sufficiency and may not be enough to complete the downstream classification task. Finally, the invariant classifier does not fully consider the difference between the application environment and the training environment: the prediction stage relies only on causal features, and the information in the heterogeneous features is not fully used.

(2) Balanced representation learning. This scheme introduces a two-head structure into a neural network, where the two heads are two classifiers corresponding to two environment groups. A shared feature layer learns a representation shared across the environment groups; the shared representation is used to learn the prediction results under each environment group and to update the corresponding classifier, so that the shared layer learns a balanced feature representation across the groups. The distance between the groups in the representation space is further balanced by adding a loss that measures the difference between distributions, and the causal effect estimate of the inter-group variable is obtained by passing the data through the different classifiers and computing the difference. Prior art of this kind applies only to two groups, a control group and an experimental group, whereas existing multi-center collaborative networks typically need to include patient data from many clinical hospitals in a study, so the method is not suitable for prognosis prediction problems whose dataset combines multiple clinical hospitals. Second, the method does not consider the imbalance between positive and negative patient samples or the imbalance in sample sizes among medical centers in clinical prognosis prediction, and its use of average aggregation affects the prediction performance in actual clinical use.

Disclosure of Invention

The invention aims at overcoming the defects of the prior art and providing a multi-center clinical prognosis prediction system based on causal feature learning.

The aim of the invention is realized by the following technical scheme: a multi-center clinical prognosis prediction system based on causal feature learning, the system comprising:

the data preprocessing module is used for collecting patient data of different medical institutions and preprocessing the patient data to obtain patient characteristics;

a feature separation module for constructing a pair of complementary maskers in a soft-masked manner to separate patient features into stable causal features and unstable non-causal features;

the multi-branch prognosis prediction module is used for respectively constructing multi-branch prognosis prediction networks based on causal features and non-causal features, each multi-branch prognosis prediction network comprises a shared characterization layer and multi-branch classifiers corresponding to medical institutions, the shared characterization layer is used for learning shared characterizations among different medical institutions, and the multi-branch classifiers are used for training sub-classifiers of the corresponding medical institutions based on the learned shared characterizations;

the medical institution identification module is used for constructing a medical institution identification network based on non-causal characteristics, outputting probability weights of the patient data belonging to each medical institution, and aggregating prediction results of the multi-branch prognosis prediction network based on the causal characteristics to obtain final prediction results.

Further, in the data preprocessing module, the discrete variable features in the patient data are one-hot encoded, the continuous variable features are standardized, and the results are then concatenated to obtain the patient features.

Further, in the feature separation module, a pair of complementary maskers is constructed based on the neural network, wherein one of the maskers is specifically implemented as follows: calculating a contribution value vector of each characteristic dimension to a downstream prognosis prediction task based on the characteristic vector of each patient, sampling the contribution value vector by adopting Gumbel-Softmax sampling technology, and outputting a dimension contribution value; the patient features are multiplied by the dimension contribution values of the complementary masker outputs, respectively, to obtain causal and non-causal features.

Further, each sub-classifier of the multi-branch classifier is trained based on patient data of a corresponding medical facility; the shared characterization layer is trained based on patient data of all medical institutions.

Further, the update loss of the causal-feature-based multi-branch prognosis prediction network $f_C$ includes: the classification loss of $f_C$, a loss calculated from the distribution distance of the causal features between different medical institutions, and a regularization term for each sub-classifier of $f_C$;

the update loss of the non-causal-feature-based multi-branch prognosis prediction network $f_L$ includes: the classification loss of $f_L$, a loss calculated from the distribution distance of the non-causal features between different medical institutions, and a regularization term for each sub-classifier of $f_L$.

Further, the classification loss of $f_C$ is calculated with the Focal Loss function from the classification results obtained by feeding the causal features of each medical institution into the corresponding sub-classifier;

the classification loss of $f_L$ is calculated with the Focal Loss function from the classification results obtained by feeding the non-causal features of each medical institution into the corresponding sub-classifier.

Further, the loss calculated from the distribution distance of the causal features between different medical institutions and the loss calculated from the distribution distance of the non-causal features between different medical institutions are both calculated with the maximum mean discrepancy (MMD) method.

Further, the medical institution identification network outputs probability weights of different medical institutions in the training set, wherein the probability weights are used for weighting prediction results of all sub-classifiers of the multi-branch prognosis prediction network based on causal features to obtain final prediction results.
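The weighted aggregation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `aggregate_predictions` and all numbers are hypothetical, and the defensive renormalization of the weights is an added assumption.

```python
from typing import Sequence


def aggregate_predictions(branch_probs: Sequence[float],
                          institution_weights: Sequence[float]) -> float:
    """Weight each sub-classifier's predicted risk by the institution probability."""
    if len(branch_probs) != len(institution_weights):
        raise ValueError("one weight is needed per sub-classifier")
    total = sum(institution_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Renormalize defensively in case the weights are not an exact probability vector.
    weighted = sum(p * w for p, w in zip(branch_probs, institution_weights))
    return weighted / total


# Hypothetical values for one new patient and three training institutions.
branch_probs = [0.80, 0.40, 0.10]         # risk predicted by each sub-classifier of f_C
institution_weights = [0.70, 0.20, 0.10]  # output of the institution identification network
final_risk = aggregate_predictions(branch_probs, institution_weights)
```

A patient whose non-causal features resemble the first institution is therefore scored mostly by that institution's sub-classifier, while the other branches still contribute in proportion to their weights.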

Further, a multi-center clinical prognosis prediction model is formed by a complementary mask, a causal feature-based multi-branch prognosis prediction network, a non-causal feature-based multi-branch prognosis prediction network, and a medical institution identification network, wherein the training process of the multi-center clinical prognosis prediction model comprises two stages:

a first stage of fixing the masker parameters, updating parameters other than the masker by minimizing the update penalty of the causal-based multi-branch prognosis prediction network, the update penalty of the non-causal-based multi-branch prognosis prediction network, and the loss of the healthcare facility identification network;

in the second stage, parameters other than the masker are fixed, and the masker parameters are updated by minimizing the update loss of the causal-based multi-branch prognosis prediction network and maximizing the update loss of the non-causal-based multi-branch prognosis prediction network.
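The two-stage schedule can be illustrated with a toy sketch. Everything in it is a simplifying assumption made for illustration: linear predictors with squared-error loss stand in for the multi-branch networks, a sigmoid gate stands in for the Gumbel-Softmax masker, gradients are taken by finite differences, and the medical institution identification loss is omitted. All names (`descend`, `stage1_objective`, `stage2_objective`) are hypothetical.

```python
import math
from typing import Callable, Dict, List

Params = Dict[str, List[float]]

# Toy data: the label depends only on the first feature.
DATA = [([1.0, 0.5], 1.0), ([-1.0, 0.3], -1.0), ([0.5, -0.8], 0.5), ([-0.5, 0.9], -0.5)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def branch_loss(weights: List[float], mask: List[float], complementary: bool) -> float:
    """Mean squared error of a linear predictor on masked features."""
    gate = [(1.0 - m) if complementary else m for m in mask]
    total = 0.0
    for x, y in DATA:
        pred = sum(w * g * xi for w, g, xi in zip(weights, gate, x))
        total += (pred - y) ** 2
    return total / len(DATA)


def losses(params: Params):
    mask = [sigmoid(z) for z in params["mask"]]
    loss_c = branch_loss(params["f_c"], mask, complementary=False)
    loss_l = branch_loss(params["f_l"], mask, complementary=True)
    return loss_c, loss_l


def stage1_objective(params: Params) -> float:
    loss_c, loss_l = losses(params)
    return loss_c + loss_l  # institution identification loss omitted in this toy


def stage2_objective(params: Params) -> float:
    loss_c, loss_l = losses(params)
    return loss_c - loss_l  # minimize the causal loss, maximize the non-causal loss


def descend(params: Params, objective: Callable[[Params], float],
            groups: List[str], lr: float = 0.1, eps: float = 1e-5) -> Params:
    """One finite-difference gradient step on the named parameter groups only."""
    new = {k: list(v) for k, v in params.items()}
    for name in groups:
        for i in range(len(params[name])):
            up = {k: list(v) for k, v in params.items()}
            down = {k: list(v) for k, v in params.items()}
            up[name][i] += eps
            down[name][i] -= eps
            grad = (objective(up) - objective(down)) / (2 * eps)
            new[name][i] = params[name][i] - lr * grad
    return new


params = {"mask": [0.0, 0.0], "f_c": [0.1, 0.1], "f_l": [0.1, 0.1]}
# Stage 1: masker fixed, network parameters updated.
after_stage1 = descend(params, stage1_objective, groups=["f_c", "f_l"])
# Stage 2: network parameters fixed, masker updated.
after_stage2 = descend(after_stage1, stage2_objective, groups=["mask"])
```

In practice the two steps would alternate over many iterations; the sketch shows a single pass so that the freezing of each parameter group is easy to see.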

Further, in the first training stage of the multi-center clinical prognosis prediction model, the sum of the update loss of the causal-feature-based multi-branch prognosis prediction network and the update loss of the non-causal-feature-based multi-branch prognosis prediction network is taken as the loss of the prognosis prediction task, the loss of the medical institution identification network is taken as the loss of the medical institution identification task, and the loss weights of the two tasks are balanced by a homoscedastic uncertainty method.
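The source does not give the weighting formula. One commonly used simplified form of homoscedastic uncertainty weighting attaches a learnable log-variance $s_i = \log \sigma_i^2$ to each task and combines the losses as $\sum_i \left(e^{-s_i} L_i + s_i\right)$. The sketch below assumes that form; the function name and the numbers are hypothetical.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float],
                              log_variances: Sequence[float]) -> float:
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2).

    A large s_i down-weights a noisy task, while the additive s_i term keeps
    the model from inflating every variance to ignore all tasks.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is needed per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_variances))


prognosis_loss = 0.9    # hypothetical loss of the prognosis prediction task
institution_loss = 0.3  # hypothetical loss of the institution identification task
total = uncertainty_weighted_loss([prognosis_loss, institution_loss], [0.0, 0.0])
```

With both log-variances at zero the result is the plain sum of the two losses; during training the log-variances would be optimized together with the network parameters.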

The beneficial effects of the invention are as follows:

1. By assigning explicit medical institution labels to patient data, the invention avoids the extra noise and information loss introduced by generating environment labels through heterogeneity identification.

2. By introducing a pair of complementary maskers and a dual-task network, the invention realizes joint learning of the latent heterogeneity and the causal features of patient data in a multi-center environment. The complementary maskers realize adversarial learning between non-causal and causal features so that the two promote each other, and the multi-branch prognosis prediction networks balance the amount of information between the causal and non-causal features, which minimizes information loss and ensures information sufficiency for the downstream prognosis prediction task.

3. The invention introduces a dual-task network: a multi-branch prognosis prediction task is performed on the separated causal features and a medical institution identification task on the separated non-causal features, and the two tasks promote each other, which further improves feature separation. In the prediction stage, the medical institution identification network generates probability weights from the non-causal features of the patient data, reflecting their similarity to the data distributions of the medical institutions in the training set, and uses these weights to aggregate the prediction results of the multi-branch prognosis prediction network. This makes effective use of the non-causal features of patient data, improves the utilization of the information in patient data, and preserves the generalization capability of the model in different application environments while maintaining good prediction capability.

4. The invention extends the two-head structure of the neural network to a multi-branch structure, which suits multi-center clinical prognosis prediction studies that include patient data from many medical institutions, and balances the differences in data distribution among the institutions by introducing the maximum mean discrepancy (MMD) method.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a multi-center clinical prognosis prediction system based on causal feature learning, as shown in an exemplary embodiment;

FIG. 2 is a schematic diagram of the overall implementation of a causal feature learning based multi-center clinical prognosis prediction system, as shown in an exemplary embodiment;

FIG. 3 is a diagram of a multi-branch prognosis prediction network, as shown in an exemplary embodiment;

FIG. 4 is a block diagram of a multi-center clinical prognosis prediction device based on causal feature learning, as shown in an exemplary embodiment.

Detailed Description

For a better understanding of the technical solutions of the present application, embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be understood that the described embodiments are merely some, but not all, of the embodiments of the present application. All other embodiments, based on the embodiments herein, which would be apparent to one of ordinary skill in the art without making any inventive effort, are intended to be within the scope of the present application.

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The embodiment of the invention provides a multi-center clinical prognosis prediction system based on causal feature learning which, as shown in FIG. 1 and FIG. 2, mainly comprises a data preprocessing module, a feature separation module, a multi-branch prognosis prediction module and a medical institution identification module. The data preprocessing module collects patient data from multiple centers, that is, multiple medical institutions, and preprocesses the patient data to obtain patient features. The feature separation module constructs a pair of complementary maskers in a soft-mask manner to separate the patient features into stable causal features and unstable non-causal features. The multi-branch prognosis prediction module constructs one multi-branch prognosis prediction network on the causal features and another on the non-causal features; each network comprises a shared characterization layer and a multi-branch classifier with one branch per medical institution, where the shared characterization layer learns representations shared among the medical institutions and the multi-branch classifier trains the sub-classifier of each medical institution on the learned shared representation. The medical institution identification module constructs a medical institution identification network on the non-causal features, outputs the probability weights that the patient data belong to each medical institution, and uses them to aggregate the prediction results of the causal-feature-based multi-branch prognosis prediction network into the final prediction result.

The following description further presents some examples of the implementation of the modules of a causal feature learning based multi-center clinical prognosis prediction system consistent with the requirements of the present application.

1. The data preprocessing module is used for completing the following steps:

According to specific requirements, an independent code T is first assigned to the patient data from each medical institution as an identifier. Then, according to the patients' admission codes, three kinds of information are selected for patients in the different departments of the medical institutions: demographic information, cancer grade and stage information, and pathological observation information. These are arranged as matrices to form the original dataset $D = \{X^{c}, X^{d}, Y\}$, where $X^{c} = [x^{c}_{1}, \ldots, x^{c}_{p}]$ is the $n \times p$ continuous-variable input feature matrix, $n$ is the sample size, $p$ is the continuous-variable feature dimension, and $x^{c}_{1}$ to $x^{c}_{p}$ are the continuous-variable features of each dimension; $X^{d} = [x^{d}_{1}, \ldots, x^{d}_{q}]$ is the $n \times q$ discrete-variable input feature matrix, $q$ is the discrete-variable feature dimension, and $x^{d}_{1}$ to $x^{d}_{q}$ are the discrete-variable features of each dimension; and $Y = (y_1, \ldots, y_n)$ is the true prognosis outcome label of the $n$ samples. Taking colorectal cancer as an example, the five-year survival rate after recurrence is less than 5% for patients who receive no treatment, whereas it can rise to 30%-50% for patients who undergo radical surgery again; accordingly, the binary label $y_i$ records whether the five-year prognosis outcome of the $i$-th sample is death or survival.

Each feature component $x^{d}_{j}$ of $X^{d}$ is processed by one-hot encoding. If $x^{d}_{j}$ has $m_j$ possible values $\{v_1, \ldots, v_{m_j}\}$, a sample whose value is $v$ is encoded as:

$$\tilde{x}^{d}_{j} = \big[\mathbb{1}(v = v_1), \ldots, \mathbb{1}(v = v_{m_j})\big]$$

where $\tilde{x}^{d}_{j}$ is the feature representation of the feature component $x^{d}_{j}$ after one-hot encoding; the feature matrix after one-hot encoding is denoted $\tilde{X}^{d}$.

Each feature component $x^{c}_{j}$ of $X^{c}$ is processed by z-score standardization, so that all data on that feature component have a mean of 0 and a standard deviation of 1:

$$\tilde{x}^{c}_{j} = \frac{x^{c}_{j} - \mu_j}{\sigma_j}$$

where $\tilde{x}^{c}_{j}$ is the feature representation of the feature component $x^{c}_{j}$ after z-score standardization, $\mu_j$ is the mean of the feature component $x^{c}_{j}$ over the $n$ data, and $\sigma_j$ is the standard deviation of the feature component $x^{c}_{j}$ over the $n$ data; the feature matrix after standardization is denoted $\tilde{X}^{c}$.

The one-hot encoded discrete feature representation and the z-score standardized continuous feature representation are concatenated to obtain the patient features $X$, an $n \times s$ matrix where $n$ is the sample size and $s$ is the feature dimension after concatenation:

$$X = [\tilde{X}^{d}, \tilde{X}^{c}]$$
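The preprocessing steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, categories are ordered alphabetically, the population standard deviation is used for the z-score, and the column order of the concatenation (discrete first) is chosen arbitrarily.

```python
import statistics
from typing import List, Sequence


def one_hot(column: Sequence[str]) -> List[List[float]]:
    """Encode one discrete feature; categories are ordered alphabetically."""
    categories = sorted(set(column))
    return [[1.0 if value == c else 0.0 for c in categories] for value in column]


def z_score(column: Sequence[float]) -> List[float]:
    """Standardize one continuous feature to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation over the n samples
    if std == 0.0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(value - mean) / std for value in column]


def build_patient_features(discrete: Sequence[Sequence[str]],
                           continuous: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate encoded discrete columns and standardized continuous columns row-wise."""
    n = len(discrete[0]) if discrete else len(continuous[0])
    rows: List[List[float]] = [[] for _ in range(n)]
    for column in discrete:
        for row, encoded in zip(rows, one_hot(column)):
            row.extend(encoded)
    for column in continuous:
        for row, value in zip(rows, z_score(column)):
            row.append(value)
    return rows


# Hypothetical toy cohort: cancer stage (discrete) and age (continuous).
stages = ["II", "III", "II"]
ages = [50.0, 60.0, 70.0]
X = build_patient_features(discrete=[stages], continuous=[ages])
```

Each row of `X` holds two one-hot columns for the stage followed by the standardized age, so the feature dimension after concatenation is 3 in this toy case.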

2. Feature separation module

Stable causal features maintain good predictive performance under different data distributions, whereas unstable non-causal features shift significantly as the data distribution changes, which degrades model performance. The core idea of the feature separation module is to separate the original patient features $X$ into complementary causal features $\Phi(X)$ and non-causal features $\Psi(X)$ by constructing a pair of complementary maskers. By minimizing the update loss of the causal-feature-based multi-branch prognosis prediction network $f_C$ and maximizing the update loss of the non-causal-feature-based multi-branch prognosis prediction network $f_L$, causal information is absorbed by the causal features $\Phi(X)$ and non-causal information is absorbed by the non-causal features $\Psi(X)$. Because the maskers are complementary, learning the causal features $\Phi(X)$ further promotes learning the non-causal features $\Psi(X)$ and vice versa, so the two promote each other and jointly improve the feature separation effect.

Based on the above knowledge, the feature separation module completes the following steps:

A pair of complementary maskers, namely a masker $M$ and a complementary masker $M_{ad}$, is built on a neural network, where $M_{ad}$ is the difference between an all-ones matrix and $M$:

$$M(x_i) = \text{Gumbel-Softmax}\big(g(x_i),\, k\big), \qquad M_{ad}(x_i) = \mathbf{1} - M(x_i)$$

where $x_i$ is the feature vector of the $i$-th patient obtained by the data preprocessing module; $g$ is the neural network that learns the contribution value of each feature dimension to the downstream prognosis prediction task; $g(x_i)$ is the contribution value vector of each feature dimension to the downstream prognosis prediction task, calculated from the feature vector $x_i$; $s$ is the number of feature dimensions; $\kappa$ is the sampling rate; $k$ is the number of sampling passes, used to screen out the top $k$ feature dimensions when the contribution values are sorted from high to low; $\mathbf{1}$ is an all-ones matrix with the same dimensions as $M$; Gumbel-Softmax is a technique for sampling from discrete distributions that makes the sampling process differentiable, so that the model parameters can be updated with gradients during back-propagation; and $\text{Gumbel-Softmax}(g(x_i), k)$ denotes sampling the contribution value vector $g(x_i)$ $k$ times to obtain the contribution value of each feature dimension to the downstream prognosis prediction task.

The invention adopts the Gumbel-Softmax technique to sample the contribution value vector $k$ times. The Gumbel-Softmax technique is formulated as follows:

$$m_j = \max_{l = 1, \ldots, k} \frac{\exp\big((\log \pi_j + \epsilon_j^{l}) / \tau\big)}{\sum_{j'=1}^{s} \exp\big((\log \pi_{j'} + \epsilon_{j'}^{l}) / \tau\big)}$$

where $m_j$ is the final contribution value of the $j$-th feature dimension after $k$ sampling passes; $\pi$ is a probability vector representing the contribution of each feature dimension, in which every contribution value is greater than or equal to 0 and the contribution values of all feature dimensions sum to 1; $\pi_j$ and $\pi_{j'}$ are the contribution values of the $j$-th and $j'$-th feature dimensions; $j$ indexes the feature dimension and $l$ indexes the sampling pass; $\epsilon_j^{l}$ and $\epsilon_{j'}^{l}$ are the noise terms of the $j$-th and $j'$-th feature dimensions, randomly sampled from the standard normal distribution; and $\tau$ is a hyperparameter with a default setting.
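A minimal sketch of repeated Gumbel-Softmax sampling follows. Several points are assumptions rather than details confirmed by the text: the noise is drawn from a Gumbel(0, 1) distribution, as in the standard formulation of the technique; the samples are combined by an element-wise maximum, as is common in relaxed top-k subset sampling; and the temperature default `tau=0.5`, the seed and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax_sample(probs: Sequence[float], tau: float,
                          rng: random.Random) -> List[float]:
    """One relaxed sample: softmax((log pi_j + g_j) / tau) with Gumbel(0, 1) noise g_j."""
    logits = []
    for p in probs:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
        gumbel_noise = -math.log(-math.log(u))
        logits.append((math.log(p + 1e-12) + gumbel_noise) / tau)
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mask(probs: Sequence[float], k: int, tau: float = 0.5,
              seed: int = 0) -> List[float]:
    """Draw k relaxed samples and keep the element-wise maximum per feature dimension."""
    rng = random.Random(seed)
    mask = [0.0] * len(probs)
    for _ in range(k):
        sample = gumbel_softmax_sample(probs, tau, rng)
        mask = [max(m, s) for m, s in zip(mask, sample)]
    return mask


# Hypothetical contribution vector over four feature dimensions.
contribution = [0.6, 0.3, 0.05, 0.05]
mask = soft_mask(contribution, k=2)
complementary_mask = [1.0 - m for m in mask]
```

Because each relaxed sample sums to 1, the mask entries lie in [0, 1] and their total cannot exceed `k`, which is what makes `k` act as a soft budget on the number of selected dimensions.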

Because different feature dimensions carry different amounts of causal information, and some dimensions carry more than others, the complementary maskers detect the causal information content of each feature dimension and force the feature dimensions with sufficient causal information to be used for stable prognosis prediction. Specifically, the masker $M$ and the complementary masker $M_{ad}$ are connected to the multi-branch prognosis prediction networks $f_C$ and $f_L$, respectively. The patient features $X$ are multiplied by the dimension contribution values produced by $M$ and $M_{ad}$ to obtain the feature representations $\Phi(X)$ and $\Psi(X)$; $\Phi(X)$ is then fed into $f_C$ and $\Psi(X)$ into $f_L$. Minimizing the update loss of $f_C$ while maximizing the update loss of $f_L$ forces stable causal information to be absorbed as far as possible by the causal features $\Phi(X)$ and unstable non-causal information to be absorbed as far as possible by the non-causal features $\Psi(X)$, which achieves feature separation. This is expressed as follows:

$$\Phi(X) = X \odot M(X), \qquad \Psi(X) = X \odot M_{ad}(X), \qquad \mathcal{L}_{M} = \mathcal{L}_{f_C} - \mathcal{L}_{f_L}$$

where $M(X)$ is the contribution of each feature dimension calculated from the patient features $X$ by the masker $M$, $M_{ad}(X)$ is the contribution of each feature dimension calculated from the patient features $X$ by the complementary masker $M_{ad}$, $\Phi(X)$ is the causal feature obtained by multiplying the patient features $X$ by the dimension contribution $M(X)$, $\Psi(X)$ is the non-causal feature obtained by multiplying the patient features $X$ by the dimension contribution $M_{ad}(X)$, $\mathcal{L}_{f_C}$ is the update loss of the multi-branch prognosis prediction network $f_C$, $\mathcal{L}_{f_L}$ is the update loss of the multi-branch prognosis prediction network $f_L$, and $\mathcal{L}_{M}$ is the loss function used to update the masker $M$. The parameters of the masker $M$ are updated by minimizing $\mathcal{L}_{f_C}$ and maximizing $\mathcal{L}_{f_L}$ to achieve feature separation.
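The separation step itself is a pair of element-wise products. The sketch below assumes the masker loss is the plain difference between the two update losses, which matches the minimize/maximize description but may omit weighting factors; the function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def separate_features(x: Sequence[float],
                      mask: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one patient vector into causal and non-causal parts with complementary masks."""
    causal = [xi * m for xi, m in zip(x, mask)]
    non_causal = [xi * (1.0 - m) for xi, m in zip(x, mask)]
    return causal, non_causal


def mask_update_loss(loss_causal: float, loss_non_causal: float) -> float:
    """Masker objective: minimize the causal branch loss, maximize the non-causal one."""
    return loss_causal - loss_non_causal


# Hypothetical patient vector and soft mask values.
x = [0.8, -1.2, 0.5]
mask = [0.9, 0.2, 0.5]
phi, psi = separate_features(x, mask)
```

Since the two masks sum to one in every dimension, `phi` and `psi` always add back up to `x`: the soft mask redistributes information between the two branches instead of discarding it.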

3. Multi-branch prognosis prediction module

After the feature separation module, the patient features are preliminarily separated. However, the resulting causal features $\Phi(X)$ and non-causal features $\Psi(X)$ have lost some information and may be insufficient to complete the downstream prognosis prediction task. For this reason, a multi-branch prognosis prediction module is introduced. The module as a whole adopts a network structure in which a shared characterization layer is combined with multi-branch classifiers: each sub-classifier is trained on the patient data of its corresponding medical institution, and the shared characterization layer is trained on the patient data of all medical institutions. The shared characterization layer learns a general representation of the data across the medical institutions, and this shared representation is then used to learn the result of each branch of the multi-branch classifier. In this way the shared characterization layer balances information among the medical institutions, maximizing the amount of information available to the multi-branch prognosis prediction network and satisfying the information sufficiency required by the downstream prognosis prediction task.
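The shared-layer-plus-branches structure can be sketched as a forward pass. The layer sizes, the ReLU and sigmoid activations, the fixed weights and the class name `MultiBranchNetwork` are all assumptions made for illustration; the text does not specify them.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MultiBranchNetwork:
    """Shared characterization layer followed by one sub-classifier per institution."""

    def __init__(self, shared_w: List[List[float]], shared_b: List[float],
                 branch_w: List[List[float]], branch_b: List[float]) -> None:
        self.shared_w = shared_w  # hidden_dim x input_dim, used for every institution
        self.shared_b = shared_b
        self.branch_w = branch_w  # K x hidden_dim, one row per institution
        self.branch_b = branch_b

    def shared(self, x: Sequence[float]) -> List[float]:
        # Linear map followed by ReLU, shared by the patients of all institutions.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.shared_w, self.shared_b)]

    def predict_branch(self, x: Sequence[float], institution: int) -> float:
        # During training a patient is routed to the branch of their own institution.
        h = self.shared(x)
        z = sum(w * hi for w, hi in zip(self.branch_w[institution], h))
        return sigmoid(z + self.branch_b[institution])

    def predict_all(self, x: Sequence[float]) -> List[float]:
        # At prediction time every branch is evaluated so the outputs can be aggregated.
        return [self.predict_branch(x, t) for t in range(len(self.branch_w))]


net = MultiBranchNetwork(
    shared_w=[[0.5, -0.2], [0.1, 0.3]], shared_b=[0.0, 0.1],
    branch_w=[[1.0, -1.0], [0.5, 0.5], [-0.3, 0.8]], branch_b=[0.0, 0.0, 0.1],
)
x = [1.0, 2.0]
training_time = net.predict_branch(x, institution=1)
all_branches = net.predict_all(x)
```

Gradients from every institution would flow through the shared weights, while each row of `branch_w` would only receive gradients from its own institution's patients.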

Based on the above knowledge, the multi-branch prognosis prediction module completes the following steps:

The data distributions of the medical institutions corresponding to the branches differ. To further balance the distances between these distributions, the maximum mean discrepancy (MMD) method is used to calculate the distance between different data distributions:

$$\mathrm{MMD}(P_i, P_j) = \Big\| \mathbb{E}_{x \sim P_i}[\varphi(x)] - \mathbb{E}_{y \sim P_j}[\varphi(y)] \Big\|$$

$$\overline{\mathrm{MMD}} = \frac{2}{K(K-1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \mathrm{MMD}(P_i, P_j)$$

where $P_i$ and $P_j$ are two different data distributions; $\mathbb{E}_{x \sim P_i}[\varphi(x)]$ is the expected value obtained by drawing samples $x$ from the distribution $P_i$ and mapping them with the kernel function $\varphi$; $\mathbb{E}_{y \sim P_j}[\varphi(y)]$ is the expected value obtained by drawing samples $y$ from the distribution $P_j$ and mapping them with the kernel function $\varphi$; $\mathrm{MMD}(P_i, P_j)$ is the calculated maximum mean discrepancy; $K$ is the number of medical institutions; and $\overline{\mathrm{MMD}}$ is the mean MMD distance over all pairs of medical institutions, which measures the average distance between the data distributions of all input medical institutions.

Patient data exhibit class imbalance between positive and negative samples (for example, far fewer patients die within five years than survive five years), and samples differ in how easy they are to classify. A Focal Loss function is introduced to alleviate these problems:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $\mathrm{FL}$ denotes the Focal Loss function, $p_t$ is the probability the model predicts for the correct class, $\alpha_t$ is the balance factor used to adjust the importance of different classes, and $\gamma$ is the adjustment factor used to reduce the loss contribution of easily classified samples, which increases the relative loss contribution of hard-to-classify or minority-class samples.
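A minimal sketch of a binary Focal Loss for a single sample is shown below, to make the roles of the balance factor and the adjustment factor concrete. The default values `alpha=0.25` and `gamma=2.0` are commonly used defaults and are assumptions here; the patent does not give its values.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class
    y: true label (1 = positive, 0 = negative)
    """
    p_t = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class balance factor
    p_t = min(max(p_t, eps), 1.0)                # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confidently correct sample
hard = focal_loss(0.30, 1)   # poorly classified sample
plain_ce = -math.log(0.95)   # cross-entropy of the easy sample, for comparison
```

With `gamma=0` and `alpha=1` for the positive class the expression reduces to ordinary cross-entropy, which makes the down-weighting of easy samples easy to see by comparison.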

As shown in Fig. 3, the update losses of the multi-branch prognosis prediction networks $f_C$ and $f_L$ are as follows:

$$\mathcal{L}_{f_C} = \mathcal{L}_{cls}^{C} + \mathcal{L}_{MMD}(\Phi(\beta)) + \lambda\, \Omega(f_C)$$

$$\mathcal{L}_{f_L} = \mathcal{L}_{cls}^{L} + \mathcal{L}_{MMD}(\Psi(\beta)) + \lambda\, \Omega(f_L)$$

$$\mathcal{L}_{cls}^{C} = \frac{1}{N} \sum_{k=1}^{K} \sum_{j=1}^{n_k} \mathrm{FL}\big(f_C^{k}(\Phi(x_j^{k})),\, y_j^{k}\big)$$

$$\mathcal{L}_{cls}^{L} = \frac{1}{N} \sum_{k=1}^{K} \sum_{j=1}^{n_k} \mathrm{FL}\big(f_L^{k}(\Psi(x_j^{k})),\, y_j^{k}\big)$$

where $\mathcal{L}_{f_C}$ is the update loss function of the multi-branch prognosis prediction network $f_C$ connected to the mask $M$, $\mathcal{L}_{f_L}$ is the update loss function of the multi-branch prognosis prediction network $f_L$ connected to the complementary mask $M_{ad}$, $\mathcal{L}_{cls}^{C}$ is the classification loss function of $f_C$, and $\mathcal{L}_{cls}^{L}$ is the classification loss function of $f_L$. $N$ denotes the patient sample size of all medical institutions, $K$ denotes the number of medical institutions in the training set, $k$ denotes the index of a medical institution, $n_k$ denotes the number of samples of medical institution $k$, and $j$ denotes the sample index within medical institution $k$. $f_C^{k}$ denotes the classifier corresponding to medical institution $k$ in the multi-branch prognosis prediction network $f_C$, and $f_L^{k}$ denotes the classifier corresponding to medical institution $k$ in the multi-branch prognosis prediction network $f_L$. $x_j^{k}$ denotes the patient features of the $j$-th sample of medical institution $k$, and $y_j^{k}$ denotes the prognosis outcome label of the $j$-th sample of medical institution $k$.

$\mathcal{L}_{MMD}(\Phi(\beta))$ denotes the loss calculated from the distribution distance of the causal features $\Phi(\beta)$ between different medical institutions, and $\mathcal{L}_{MMD}(\Psi(\beta))$ denotes the loss calculated from the distribution distance of the non-causal features $\Psi(\beta)$ between different medical institutions. $\Omega(f_C)$ denotes the regularization term of each sub-classifier of the multi-branch prognosis prediction network $f_C$, and $\Omega(f_L)$ denotes the regularization term of each sub-classifier of the multi-branch prognosis prediction network $f_L$; both are used to control model complexity. $\lambda$ is the coefficient of the regularization term; in this embodiment, $\lambda$ = ….

4. Medical institution identification module

Differences in patient data distribution between medical institutions can be used to identify the medical institution to which a patient belongs. To make maximal use of the non-causal features of patients, the invention connects a medical institution identification network $f_T$ after the complementary mask $M_{ad}$. The medical institution identification network identifies the medical institution to which patient data belong and serves two main functions: 1) the additional medical institution identification task and the main prognosis prediction task together form a dual-task network; 2) in the prediction phase, the medical institution identification network generates corresponding probability weights $w$ for the patient data, based on the similarity between the non-causal features of the patient data and the data distributions of the different medical institutions in the training set. The prediction results of the sub-classifiers of the causal feature-based multi-branch prognosis prediction network $f_C$ are weighted by the probability weights $w$ to obtain the final prediction result $\hat{y}$, so that the classifiers of all medical institutions participate in the decision together.

Based on the above knowledge, the medical institution identification module completes the following steps:

Because the medical institution identification network is based on non-causal features, improving its performance lets the non-causal feature $\Psi(\beta)$ absorb non-causal information better, which in turn drives causal information into the causal feature $\Phi(\beta)$ and further separates the causal and non-causal features. The medical institution identification task and the prognosis prediction task differ in importance and difficulty, so assigning the same weight to the loss function of each task may lead to poor model performance. In addition, manually set task weights cannot be adjusted dynamically during training. A homoscedastic uncertainty method is therefore introduced to weight the loss of each task, with the following formula:

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^{2}} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^{2}} \mathcal{L}_2(W) + \log \sigma_1 + \log \sigma_2$$

where $\mathcal{L}_1$ and $\mathcal{L}_2$ denote the losses of the first task and the second task, respectively, $W$ denotes the model parameters, and $\sigma_1$ and $\sigma_2$ are two noise parameters that are optimized by minimizing the loss so as to balance the losses of the different tasks. When a noise parameter $\sigma$ increases, the weight of the corresponding task decreases, which reduces that task's influence on the update of the model weights and thus achieves dynamic adjustment.
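The weighting behaviour can be checked numerically with a small sketch. It assumes the common formulation in which each task loss is scaled by 1/(2σ²) and a log σ penalty is added, and it parameterizes each noise parameter by its logarithm so that σ stays positive; these choices and the function names are illustrative assumptions, not taken from the patent. The task losses are held constant here purely to isolate the effect of the noise parameters.

```python
import math

def task_weight(log_sigma):
    """Weight 1 / (2 * sigma^2) applied to a task loss."""
    return 1.0 / (2.0 * math.exp(2.0 * log_sigma))

def combined_loss(losses, log_sigmas):
    """Sum over tasks of loss / (2 sigma^2) + log(sigma)."""
    return sum(task_weight(s) * l + s for l, s in zip(losses, log_sigmas))

def fit_log_sigma(loss_value, steps=200, lr=0.1):
    """Gradient descent on log(sigma) for a fixed task loss value."""
    s = 0.0
    for _ in range(steps):
        # d/ds [ loss * exp(-2s) / 2 + s ] = 1 - loss * exp(-2s)
        grad = 1.0 - loss_value * math.exp(-2.0 * s)
        s -= lr * grad
    return s

s_big = fit_log_sigma(4.0)     # task with a large loss
s_small = fit_log_sigma(0.25)  # task with a small loss
total = combined_loss([4.0, 0.25], [s_big, s_small])
```

For a fixed loss value the optimum is σ² equal to that loss, so the task with the larger loss ends up with the larger noise parameter and the smaller weight, which is the dynamic adjustment described above.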

During the prediction phase, patient data are passed through the complementary mask $M_{ad}$ to obtain the non-causal feature $\Psi(\beta)$ and through the mask $M$ to obtain the causal feature $\Phi(\beta)$. The causal feature $\Phi(\beta)$ is input into the sub-classifier $f_C^{k}$ corresponding to each medical institution in the multi-branch prognosis prediction network $f_C$ to obtain the corresponding prediction result $\hat{y}_k$, and $\Psi(\beta)$ is input into the medical institution identification network $f_T$ to obtain the corresponding probability weights $w$:

$$w = f_T(\Psi(\beta)) = (w_1, \dots, w_K)$$

$$\hat{y}_k = f_C^{k}(\Phi(\beta)), \quad k = 1, \dots, K$$

$$\hat{y} = \sum_{k=1}^{K} w_k\, \hat{y}_k$$

where $w$ denotes the probability weights for each medical institution generated by the medical institution identification network, $w_k$ denotes the probability weight corresponding to medical institution $k$, $\hat{y}_k$ denotes the prediction made from the causal feature $\Phi(\beta)$ by the sub-classifier $f_C^{k}$ corresponding to medical institution $k$ in the multi-branch prognosis prediction network $f_C$, $(\hat{y}_1, \dots, \hat{y}_K)$ denotes the prediction results of the multi-branch prognosis prediction network for each medical institution, and $\hat{y}$ denotes the final prediction result obtained by weighting the prediction results of the multi-branch prognosis prediction network $f_C$ with $w$. When the input patient data are close to the data distribution of one medical institution in the training set, the weight of the classifier corresponding to that medical institution increases; when the input patient data are not close to the data distribution of any medical institution in the training set, the aggregation approaches average aggregation. This improves the prediction performance of the model.
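The aggregation step can be sketched in a few lines. The sketch assumes that the medical institution identification network emits one raw score per institution and that a softmax turns these scores into probability weights; the softmax and the names `aggregate`, `branch_probs` and `site_scores` are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over institution scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(branch_probs, site_scores):
    """Weight per-institution predictions by institution probabilities."""
    weights = softmax(site_scores)
    final = sum(w * p for w, p in zip(weights, branch_probs))
    return final, weights

# Predictions of three hypothetical sub-classifiers for one patient.
branch_probs = [0.9, 0.2, 0.4]

# Patient resembles institution 0: its classifier dominates.
confident, w_confident = aggregate(branch_probs, [8.0, 0.0, 0.0])

# Patient resembles no institution in particular: near-uniform weights.
uncertain, w_uncertain = aggregate(branch_probs, [0.0, 0.0, 0.0])
```

The second call shows the fallback behaviour: with equal scores the result is the plain average of the branch predictions.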

5. Training of multicenter clinical prognosis prediction model

The multi-center clinical prognosis prediction model consists of a pair of complementary masks (the mask $M$ and the complementary mask $M_{ad}$), the multi-branch prognosis prediction network $f_C$ connected after the mask $M$, the multi-branch prognosis prediction network $f_L$ connected after the complementary mask $M_{ad}$, and the medical institution identification network $f_T$.

The training process of the multi-center clinical prognosis prediction model is divided into two stages: 1) fixing the parameters of the mask and updating the parameters of the model other than the mask; 2) fixing the parameters of the model other than the mask and updating the parameters of the mask.

In each training round, the mask parameters are first fixed, and the parts of the model other than the mask are trained by minimizing $\mathcal{L}_{stage1}$:

$$\mathcal{L}_{T} = \frac{1}{N} \sum_{k=1}^{K} \sum_{j=1}^{n_k} \mathrm{FL}\big(f_T(\Psi(x_j^{k})),\, T\big)$$

$$\mathcal{L}_{stage1} = \frac{1}{2\sigma_1^{2}} \mathcal{L}_{T} + \frac{1}{2\sigma_2^{2}} \big(\mathcal{L}_{f_C} + \mathcal{L}_{f_L}\big) + \log \sigma_1 + \log \sigma_2$$

where $T$ denotes the label of the medical institution to which the patient data belong, $\Psi(x_j^{k})$ denotes the non-causal features of a patient under medical institution $k$, $f_T(\Psi(x_j^{k}))$ denotes the medical institution predicted by the medical institution identification network $f_T$ from the non-causal features of a patient under medical institution $k$, $\mathrm{FL}$ denotes the Focal Loss function, $\mathcal{L}_{T}$ denotes the loss function of the medical institution identification network, $\mathcal{L}_{f_C}$ denotes the update loss function of the multi-branch prognosis prediction network $f_C$ connected to the mask $M$, $\mathcal{L}_{f_L}$ denotes the update loss function of the multi-branch prognosis prediction network $f_L$ connected to the complementary mask $M_{ad}$, $\sigma_1$ and $\sigma_2$ are two noise parameters used to balance the loss weights of the medical institution identification task and the prognosis prediction task, and $\mathcal{L}_{stage1}$ denotes the total loss function of the model in the first stage.

The parameters of the model other than the mask are then fixed, and the mask parameters are updated by minimizing the second-stage loss function.

These two steps are alternated until the multi-center clinical prognosis prediction model converges.
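The alternating schedule itself can be illustrated with a toy example. In the sketch below the mask and the rest of the model are each reduced to a single scalar parameter, and both stages descend the same placeholder quadratic, (mask - 1)^2 + (model + 2)^2 + 0.5 * mask * model. This objective, the learning rate, and the function names are purely hypothetical; in the system described here each stage minimizes its own loss function, and only the freeze-and-update pattern is being illustrated.

```python
def grad_model(mask, model):
    """Placeholder stage-1 gradient with respect to the non-mask parameter."""
    return 2.0 * (model + 2.0) + 0.5 * mask

def grad_mask(mask, model):
    """Placeholder stage-2 gradient with respect to the mask parameter."""
    return 2.0 * (mask - 1.0) + 0.5 * model

def train_alternating(mask, model, rounds=500, lr=0.1):
    for _ in range(rounds):
        # Stage 1: mask parameters frozen, update the rest of the model.
        model -= lr * grad_model(mask, model)
        # Stage 2: rest of the model frozen, update the mask parameters.
        mask -= lr * grad_mask(mask, model)
    return mask, model

mask, model = train_alternating(mask=0.0, model=0.0)
```

On this convex toy objective the alternation settles at the joint stationary point (mask = 1.6, model = -2.4); a real implementation would replace the two gradient functions with backpropagation through the respective stage losses and stop on a convergence criterion.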

Corresponding to the embodiments of the aforementioned multi-center clinical prognosis prediction system based on causal feature learning, the present invention also provides embodiments of a multi-center clinical prognosis prediction device based on causal feature learning.

Referring to fig. 4, the causal feature learning-based multi-center clinical prognosis prediction apparatus provided in an embodiment of the present invention includes one or more processors for implementing the causal feature learning-based multi-center clinical prognosis prediction system in the above embodiment.

The embodiment of the multi-center clinical prognosis prediction device based on causal feature learning can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, Fig. 4 shows a hardware structure diagram of a device with data processing capability on which the multi-center clinical prognosis prediction device based on causal feature learning of the present invention is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in Fig. 4, such a device may generally include other hardware according to its actual function, which is not described here.

The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.

Since the device embodiments essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the invention without creative effort.

The embodiment of the present invention also provides a computer readable storage medium having a program stored thereon, which when executed by a processor, implements the causal feature learning-based multi-center clinical prognosis prediction system in the above embodiment.

The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. The computer readable storage medium may also be an external storage device of such a device, for example a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The specification and examples are to be regarded in an illustrative manner only.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims

1. A causal feature learning-based multicenter clinical prognosis system, comprising:

2. The causal feature learning-based multi-center clinical prognosis prediction system according to claim 1, wherein the data preprocessing module performs one-hot encoding on the discrete variable features in the patient data, normalizes the continuous variable features, and then concatenates them to obtain the patient features.

3. The causal feature learning-based multi-center clinical prognosis prediction system according to claim 1, wherein in the feature separation module, a pair of complementary maskers is constructed based on a neural network, and one of the maskers is specifically implemented as: calculating a contribution value vector of each characteristic dimension to a downstream prognosis prediction task based on the characteristic vector of each patient, sampling the contribution value vector by adopting Gumbel-Softmax sampling technology, and outputting a dimension contribution value; the patient features are multiplied by the dimension contribution values of the complementary masker outputs, respectively, to obtain causal and non-causal features.

4. The causal feature learning based multi-central clinical prognosis prediction system according to claim 1, wherein each sub-classifier of the multi-branch classifier is trained based on patient data of a corresponding medical institution; the shared characterization layer is trained based on patient data of all medical institutions.

5. The causal feature learning-based multi-center clinical prognosis prediction system according to claim 1, wherein the update loss of the causal feature-based multi-branch prognosis prediction network $f_C$ includes: the classification loss of $f_C$, the loss calculated from the distribution distance of causal features between different medical institutions, and the regularization term of each sub-classifier of $f_C$;

the update loss of the non-causal feature-based multi-branch prognosis prediction network $f_L$ includes: the classification loss of $f_L$, the loss calculated from the distribution distance of non-causal features between different medical institutions, and the regularization term of each sub-classifier of $f_L$.

6. The causal feature learning-based multi-center clinical prognosis prediction system according to claim 5, wherein the classification loss of $f_C$ is calculated with the Focal Loss function from the classification results obtained by inputting the causal features of each medical institution into the corresponding sub-classifiers;

7. The causal feature learning based multicenter clinical prognosis system according to claim 5, wherein the losses calculated from the distribution distances of causal features between different medical institutions and the losses calculated from the distribution distances of non-causal features between different medical institutions are calculated using a maximum mean difference method.

8. The causal feature learning based multi-central clinical prognosis prediction system of claim 1, wherein the healthcare facility identification network outputs probability weights for patient data belonging to different healthcare facilities in the training set based on non-causal features, and wherein the probability weights are used to weight the prediction results of each sub-classifier of the causal feature based multi-branch prognosis prediction network to obtain a final prediction result.

9. The causal feature learning-based multi-central clinical prognosis prediction system of claim 1, wherein the multi-central clinical prognosis prediction model is jointly formed by a complementary mask, a causal feature-based multi-branch prognosis prediction network, a non-causal feature-based multi-branch prognosis prediction network, and a medical institution identification network, and wherein the training process of the multi-central clinical prognosis prediction model comprises two phases:

10. The causal feature learning-based multi-center clinical prognosis prediction system according to claim 9, wherein the sum of the update loss of the causal feature-based multi-branch prognosis prediction network and the update loss of the non-causal feature-based multi-branch prognosis prediction network is taken as the loss of the prognosis prediction task, the loss of the medical institution identification network is taken as the loss of the medical institution identification task, and the loss weights of the prognosis prediction task and the medical institution identification task are balanced by a homoscedastic uncertainty method during the first stage of training of the multi-center clinical prognosis prediction model.