CN117408745A - Gain prediction method based on causal effect estimation, model training method and device - Google Patents

Gain prediction method based on causal effect estimation, model training method and device Download PDF

Info

Publication number
CN117408745A
CN117408745A CN202311370240.6A
Authority
CN
China
Prior art keywords
variable
current
model
sample
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311370240.6A
Other languages
Chinese (zh)
Inventor
安志成
涂珂
吴郑伟
张志强
周俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311370240.6A priority Critical patent/CN117408745A/en
Publication of CN117408745A publication Critical patent/CN117408745A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0211Determining the effectiveness of discounts or incentives
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Finance (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a gain prediction method and device based on causal effect estimation. In the gain prediction method, the acquired initial feature representation of the sample to be predicted and the initial feature representations of its associated samples are each decoupled to obtain corresponding adjustment variable feature representations and confounding variable feature representations. Then, the adjustment variable feature representations and confounding variable feature representations of the associated samples are aggregated to obtain, for the sample to be predicted, an aggregate adjustment variable feature representation and an aggregate confounding variable feature representation that fuse the information of the associated samples. A corresponding aggregate prediction feature representation is determined by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation, and the gain for the sample to be predicted is determined accordingly.

Description

Gain prediction method based on causal effect estimation, model training method and device
Technical Field
Embodiments of the present disclosure relate generally to the field of computer technology, and more particularly, to a gain prediction method based on causal effect estimation, and a training method and apparatus for a causal effect estimation model.
Background
Causal inference is the process of determining whether one event is the cause of another by analyzing the relationships between events. In causal inference, the interactions between variables are considered to determine whether a change in one variable results in a change in another. This typically involves controlling for the effects of other variables so as to isolate the effect of the variable under test on the outcome.
Causal effect estimation evaluates the influence of a specific factor on an outcome by comparing, after controlling for other factors that may affect the outcome, the difference between an intervention group (which receives the intervention) and a control group (which does not). It is widely applied in fields such as social science, medicine, and finance. How to more accurately describe the causal relationships between data and make gain (uplift) predictions based on causal effect estimates is therefore a problem to be solved.
Disclosure of Invention
In view of the foregoing, embodiments of this specification provide a gain prediction method and apparatus based on causal effect estimation, and a training method and apparatus for a causal effect estimation model. With this method and apparatus, the causal relationships between data can be described more accurately, and gain (uplift) prediction based on causal effect estimation can be realized.
According to an aspect of embodiments of this specification, there is provided a gain prediction method based on causal effect estimation, comprising: decoupling the acquired initial feature representation of the sample to be predicted and the initial feature representations of the corresponding associated samples, respectively, to obtain corresponding adjustment variable feature representations and confounding variable feature representations; aggregating the obtained adjustment variable feature representations and confounding variable feature representations of the associated samples to obtain an aggregate adjustment variable feature representation and an aggregate confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples; obtaining a corresponding aggregate prediction feature representation based on the fusion of the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation; and determining the gain corresponding to the sample to be predicted according to the aggregate prediction feature representation.
According to another aspect of embodiments of this specification, there is provided a training method for a causal effect estimation model, where the causal effect estimation model includes a feature decoupling model, an adjustment variable aggregation model, a confounding variable aggregation model, and a prediction model. The training method comprises cyclically executing the following model training process using a training sample set until a training end condition is met, the training sample set including, for each training sample having an association relationship, a sample initial feature representation, an intervention label characterizing whether the intervention was received, and an output ground-truth value: providing the sample initial feature representation of each current training sample in the current training sample set to the current feature decoupling model to obtain the adjustment variable feature representation and confounding variable feature representation of each current training sample; for each current training sample, providing the adjustment variable feature representations of its associated current training samples to the current adjustment variable aggregation model to obtain the aggregate adjustment variable feature representation of the current training sample; according to whether the intervention labels of the associated current training samples are consistent with the intervention label of the current training sample, providing the confounding variable feature representations of the corresponding current training samples to the current confounding variable aggregation model to obtain the aggregate confounding variable feature representation of the current training sample; obtaining a corresponding aggregate prediction feature representation based on the fusion of the confounding variable feature representation of the current training sample with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation; providing the obtained aggregate prediction feature representation to the current prediction model to obtain a corresponding predicted value; determining an adjustment variable consistency loss value according to the differences between the obtained aggregate adjustment variable feature representations of current training samples with different intervention labels; determining a prediction loss value according to the differences between the obtained predicted values and the corresponding output ground-truth values; and, in response to the training end condition not being met, adjusting the model parameters of the current feature decoupling model, current adjustment variable aggregation model, current confounding variable aggregation model, and current prediction model according to the adjustment variable consistency loss value and the prediction loss value.
According to still another aspect of embodiments of this specification, there is provided a gain prediction apparatus based on causal effect estimation, comprising: a decoupling unit configured to decouple the acquired initial feature representation of the sample to be predicted and the initial feature representations of the corresponding associated samples, respectively, to obtain corresponding adjustment variable feature representations and confounding variable feature representations; an aggregation unit configured to aggregate the obtained adjustment variable feature representations and confounding variable feature representations of the associated samples to obtain an aggregate adjustment variable feature representation and an aggregate confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples; a fusion unit configured to obtain a corresponding aggregate prediction feature representation based on the fusion of the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation; and a prediction unit configured to determine the gain corresponding to the sample to be predicted according to the aggregate prediction feature representation.
According to a further aspect of embodiments of this specification, there is provided a training apparatus for a causal effect estimation model, where the causal effect estimation model includes a feature decoupling model, an adjustment variable aggregation model, a confounding variable aggregation model, and a prediction model. The training apparatus is configured to cyclically perform, by a training unit, a model training process using a training sample set until a training end condition is met, the training sample set including, for each training sample having an association relationship, a sample initial feature representation, an intervention label characterizing whether the intervention was received, and an output ground-truth value. The training unit comprises: a decoupling module configured to provide the sample initial feature representation of each current training sample in the current training sample set to the current feature decoupling model to obtain the adjustment variable feature representation and confounding variable feature representation of each current training sample; a prediction module configured to: for each current training sample, provide the adjustment variable feature representations of its associated current training samples to the current adjustment variable aggregation model to obtain the aggregate adjustment variable feature representation of the current training sample; according to whether the intervention labels of the associated current training samples are consistent with the intervention label of the current training sample, provide the confounding variable feature representations of the corresponding current training samples to the current confounding variable aggregation model to obtain the aggregate confounding variable feature representation of the current training sample; obtain a corresponding aggregate prediction feature representation based on the fusion of the confounding variable feature representation of the current training sample with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation; and provide the obtained aggregate prediction feature representation to the current prediction model to obtain a corresponding predicted value; and a loss value determination module configured to determine an adjustment variable consistency loss value according to the differences between the obtained aggregate adjustment variable feature representations of current training samples with different intervention labels, and to determine a prediction loss value according to the differences between the obtained predicted values and the corresponding output ground-truth values. The training apparatus further includes a parameter adjustment unit configured to, in response to the training end condition not being met, adjust the model parameters of the current feature decoupling model, current adjustment variable aggregation model, current confounding variable aggregation model, and current prediction model according to the adjustment variable consistency loss value and the prediction loss value.
According to another aspect of embodiments of the present specification, there is provided a gain prediction apparatus based on causal effect estimation, comprising: at least one processor, and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a causal effect estimation based gain prediction method as described above.
According to another aspect of embodiments of the present specification, there is provided a training apparatus of a causal effect estimation model, comprising: at least one processor, and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of training a causal effect estimation model as described above.
According to another aspect of embodiments of the present specification, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements a causal effect estimation based gain prediction method and/or a causal effect estimation model training method as described above.
According to another aspect of embodiments of the present specification, there is provided a computer program product comprising a computer program to be executed by a processor for implementing a causal effect estimation based gain prediction method and/or a causal effect estimation model training method as described above.
Drawings
A further understanding of the nature and advantages of the present description may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 illustrates an exemplary architecture of a causal effect estimation based gain prediction method and apparatus, a causal effect estimation model training method and apparatus, according to an embodiment of the present description.
Fig. 2 shows a flowchart of one example of a causal effect estimation based gain prediction method according to an embodiment of the present description.
Fig. 3 shows a flowchart of one example of a process of determining a gain size corresponding to a sample to be predicted according to an embodiment of the present specification.
FIG. 4 illustrates a flowchart of one example of a method of training a causal effect estimation model according to an embodiment of the present description.
FIG. 5 shows a schematic diagram of yet another example of a training method of a causal effect estimation model according to an embodiment of the present description.
Fig. 6 shows a block diagram of one example of a causal effect estimation based gain prediction device according to an embodiment of the present description.
FIG. 7 illustrates a block diagram of one example of a training apparatus of a causal effect estimation model according to an embodiment of the present description.
Fig. 8 shows a block diagram of one example of a causal effect estimation based gain prediction device according to an embodiment of the present description.
FIG. 9 shows a block diagram of one example of a training apparatus of a causal effect estimation model according to an embodiment of the present description.
Detailed Description
The subject matter described herein will be discussed below with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable a person skilled in the art to better understand and thereby practice the subject matter described herein, and are not limiting of the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the embodiments herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
As used herein, the term "comprising" and its variations are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
A gain prediction method and apparatus based on causal effect estimation, and a training method and apparatus of a causal effect estimation model according to embodiments of the present specification will be described in detail with reference to the accompanying drawings.
FIG. 1 illustrates an exemplary architecture 100 of a causal effect estimation based gain prediction method and apparatus, a causal effect estimation model training method and apparatus, according to an embodiment of the present description.
In fig. 1, a network 110 is employed to interconnect between a terminal device 120 and an application server 130.
Network 110 may be any type of network capable of interconnecting network entities. The network 110 may be a single network or a combination of networks. In terms of coverage, network 110 may be a Local Area Network (LAN), a Wide Area Network (WAN), or the like. In terms of carrier medium, the network 110 may be a wired network, a wireless network, or the like. In terms of data switching technology, the network 110 may be a circuit-switched network, a packet-switched network, or the like.
Terminal device 120 may be any type of electronic computing device capable of connecting to network 110, accessing servers or websites on network 110, processing data or signals, and the like. For example, the terminal device 120 may be a desktop computer, a notebook computer, a tablet computer, a smart phone, or the like. Although only one terminal device is shown in fig. 1, it should be understood that there may be a different number of terminal devices connected to the network 110.
In one embodiment, the terminal device 120 may be used by a user. Terminal device 120 may include an application client (e.g., application client 121) that may provide various services to a user. In some cases, application client 121 may interact with application server 130. For example, the application client 121 may transmit a message input by a user to the application server 130 and receive a response associated with the message from the application server 130. Herein, a "message" may refer to any input information, such as an initial characteristic representation of a sample to be predicted provided by a user, an initial characteristic representation of a corresponding associated sample, and so on.
The application server 130 may be connected to a training sample database 131. The training sample database 131 includes, for each training sample having an association relationship, a sample initial feature representation, an intervention label characterizing whether the intervention was received, and an output ground-truth value. The application server 130 may train a causal effect estimation model using a training sample set in the training sample database 131. However, it should be understood that in other cases, another server may train the causal effect estimation model using the training sample set in the training sample database 131 and transmit the obtained model parameters to the application server 130.
It should be appreciated that all network entities shown in fig. 1 are exemplary and that any other network entity may be involved in architecture 100, depending on the particular application requirements.
Fig. 2 shows a flow chart of a causal effect estimation based gain prediction method 200 according to an embodiment of the present description.
As shown in fig. 2, at 210, the obtained initial feature representation of the sample to be predicted and the initial feature representation of the corresponding associated sample are decoupled, respectively, to obtain a corresponding adjustment variable feature representation and a confounding variable feature representation.
In one example, a feature mask (i.e., each feature multiplied by a corresponding weight value) may be used to decouple the acquired initial feature representation of the sample to be predicted and the initial feature representations of the corresponding associated samples, yielding the corresponding adjustment variable feature representations and confounding variable feature representations. In one example, the initial feature representations of the sample to be predicted and the corresponding associated samples may be denoted X = [X_1, …, X_N], where N is the total number of samples (the sample to be predicted together with its associated samples) and the dimension of each initial feature representation X_i may be, for example, K. The adjustment variable feature representations and confounding variable feature representations of these samples may then be denoted X_a = V_mask ⊙ X and X_c = V'_mask ⊙ X, respectively, where V_mask and V'_mask denote the feature mask used to derive the adjustment variable feature representation and the feature mask used to derive the confounding variable feature representation, and ⊙ denotes element-wise multiplication. X_{a,i} and X_{c,i} denote the i-th rows of X_a and X_c, respectively.
In one example, the feature mask described above may be obtained by training a machine learning model. For example, V_mask and V'_mask may be determined by the following formulas: Z_mask = (ReLU(X·W_1 + b_1))·W_2 + b_2, V_mask = σ(Z_mask), V'_mask = σ(−Z_mask), where W_1 ∈ R^{K×D}, b_1 and W_2 ∈ R^{D×K}, b_2 denote the weights and biases of two trained fully connected layers, σ(·) denotes the sigmoid function, and D denotes the dimension of the hidden-layer vector. Since σ(−Z_{mask,i}) = 1 − σ(Z_{mask,i}) holds for any i ∈ N, it follows that X_c + X_a = X. In this way, the initial feature representation can be hard-separated (hard separation) to help ensure the independence of the separated factors (e.g., the adjustment variable feature representation and the confounding variable feature representation).
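As a minimal sketch of the mask-based decoupling above (not the patent's implementation; the sizes and randomly initialized parameters W1, b1, W2, b2 are hypothetical stand-ins for trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N samples, K input features, D hidden units.
N, K, D = 4, 8, 16

X = rng.normal(size=(N, K))                      # initial feature representations
W1, b1 = rng.normal(size=(K, D)), np.zeros(D)    # first fully connected layer
W2, b2 = rng.normal(size=(D, K)), np.zeros(K)    # second fully connected layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Z_mask = ReLU(X·W1 + b1)·W2 + b2
Z_mask = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
V_mask = sigmoid(Z_mask)       # mask yielding the adjustment-variable part
V_mask_c = sigmoid(-Z_mask)    # mask yielding the confounding-variable part

X_a = V_mask * X               # adjustment variable feature representations
X_c = V_mask_c * X             # confounding variable feature representations

# Because sigmoid(-z) = 1 - sigmoid(z), the two parts sum back to X exactly.
assert np.allclose(X_a + X_c, X)
```

The closing assertion mirrors the hard-separation property X_c + X_a = X stated in the text.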
In one example, the sample to be predicted and the corresponding associated samples may be represented as networked observational data. The individuals (e.g., users, organizations, products, etc.) in networked observational data and their interactions may be represented as a network structure, which comprises two components: node features and the network topology. In one example, the networked observational data may be represented as G = (X, A, T, Y), where A ∈ R^{N×N} denotes the adjacency matrix between the N samples (the sample to be predicted and the corresponding associated samples). For example, if there is an adjacency between the i-th sample v_i and the j-th sample v_j, then A_{ij} = 1; otherwise A_{ij} = 0. T = [t_1, …, t_N] denotes the values of the intervention variables of the samples, where t_i ∈ {0, 1}; for example, 0 indicates that the intervention was not received and 1 indicates that it was. Y = [y_1, …, y_N] denotes the output result of each sample under its corresponding intervention variable value.
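A small illustrative encoding of such networked observational data (the concrete adjacency matrix, interventions, and outcomes below are made up for illustration):

```python
import numpy as np

# Hypothetical networked observational data with N = 4 individuals.
N = 4

# Adjacency matrix A: A[i, j] = 1 iff samples v_i and v_j are linked.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])

# Intervention variables: 1 = received the intervention, 0 = did not.
T = np.array([1, 0, 1, 0])

# Observed outcomes under the corresponding intervention values.
Y = np.array([0.9, 0.2, 0.7, 0.1])

# Associated-sample set N(i) of sample i, read off the adjacency matrix.
def neighbors(i):
    return np.flatnonzero(A[i])

print(neighbors(0))  # prints [1 2]
```

The `neighbors` helper is the set N(i) used by the aggregation steps that follow.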
At 220, aggregation is performed according to the obtained adjustment variable feature representations and confounding variable feature representations of the associated samples, so as to obtain an aggregate adjustment variable feature representation and an aggregate confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples.
In one example, aggregation may be performed on the adjustment variable feature representations of the sample to be predicted and its associated samples to obtain an aggregate adjustment variable feature representation that corresponds to the sample to be predicted and fuses the information of the associated samples. In one example, this aggregate adjustment variable feature representation may be E_{a,i} = Σ_{j ∈ N(i)} W_a · X_{a,j}, where N(i) denotes the associated sample set of the sample to be predicted i, X_{a,j} denotes the adjustment variable feature representation of the j-th associated sample, and W_a denotes a trained weight matrix. Alternatively, the aggregate adjustment variable feature representation may be E_{a,i} = Σ_{j ∈ N(i)} α_{ij} · W_a · X_{a,j}, where α_{ij} denotes an attention coefficient derived from a self-attention mechanism. In one example, α_{ij} may be obtained as a softmax over the neighbors of i of LeakyReLU(W_att · [X_{a,i} ∥ X_{a,j}]), where W_att denotes a parameter obtained by training, ∥ denotes the concatenation operation, and LeakyReLU denotes a nonlinear activation function. On this basis, aggregation between the adjustment variable feature representations (if any) can be achieved via the trained weight matrix and the parameter W_att, yielding the aggregate adjustment variable feature representation of the sample to be predicted.
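The attention-weighted variant can be sketched as follows (a simplified, single-head illustration under the formulas above; the random W_a and W_att stand in for trained parameters, and the adjacency matrix is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 8

X_a = rng.normal(size=(N, K))     # adjustment variable feature representations
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
W_a = rng.normal(size=(K, K))     # trained weight matrix (random stand-in)
W_att = rng.normal(size=(2 * K,)) # trained attention parameters (random stand-in)

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def aggregate(i):
    nbrs = np.flatnonzero(A[i])
    # e_ij = LeakyReLU(W_att · [X_a,i || X_a,j]); alpha_ij = softmax over neighbors
    e = np.array([leaky_relu(W_att @ np.concatenate([X_a[i], X_a[j]]))
                  for j in nbrs])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # E_a,i = sum_j alpha_ij · W_a · X_a,j
    return sum(a * (W_a @ X_a[j]) for a, j in zip(alpha, nbrs))

E_a0 = aggregate(0)  # aggregate adjustment variable representation for sample 0
assert E_a0.shape == (K,)
```

Dropping the `alpha` weights (summing `W_a @ X_a[j]` directly) gives the unweighted first variant.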
Alternatively, the aggregate confounding variable feature representation may comprise an aggregate factual (positive-fact) confounding variable feature representation. The factual confounding variable feature representations of the associated samples of the sample to be predicted may be provided to a factual confounding variable aggregation model to obtain the aggregate factual confounding variable feature representation of the sample to be predicted. Here, "factual" means that the value of the intervention variable agrees with that of the sample to be predicted. For example, when the sample to be predicted is under no intervention, a factual confounding variable feature representation refers to the confounding variable feature representation of an associated sample whose intervention variable value is 0. The aggregate factual confounding variable feature representation of the sample to be predicted may be E_{c,i} = Σ_{j ∈ N(i), t_j = t_i} W_c · X_{c,j}; alternatively, it may be E_{c,i} = Σ_{j ∈ N(i), t_j = t_i} α_{ij} · W_c · X_{c,j}, where X_{c,j} denotes the confounding variable feature representation of the j-th factual associated sample and W_c denotes a trained weight matrix. In one example, the model parameters of the factual confounding variable aggregation model may include W_c. On this basis, aggregation between the factual confounding variable feature representations can be achieved via the trained factual confounding variable aggregation model, yielding the aggregate factual confounding variable feature representation of the sample to be predicted.
Alternatively, the aggregate confounding variable feature representation may comprise an aggregate counterfactual confounding variable feature representation. The counterfactual confounding variable feature representations corresponding to the associated samples of the sample to be predicted may be provided to the counterfactual confounding variable aggregation model to obtain the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted. Here, a counterfactual may refer to the value of the intervention variable being opposite to that of the sample to be predicted. For example, for the case where the sample to be predicted is not intervened, a counterfactual confounding variable feature representation may refer to the confounding variable feature representation corresponding to an associated sample whose intervention variable has a value of 1. The aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted may be E_{cf,i} = Σ_{j∈N(i): t_j ≠ t_i} W_cf X_{c,j}. Alternatively, the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted may also be obtained with attention coefficients, i.e., E_{cf,i} = Σ_{j∈N(i): t_j ≠ t_i} α_{ij} W_cf X_{c,j}, where X_{c,j} (t_i ≠ t_j) may be used to represent the confounding variable feature representation of the j-th counterfactual associated sample and W_cf may be used to represent the trained weight matrix. In one example, the model parameters of the counterfactual confounding variable aggregation model may include the W_cf. The meaning of the remaining symbols may be referred to in the foregoing. Based on this, aggregation between the counterfactual confounding variable feature representations can be achieved by means of the trained counterfactual confounding variable aggregation model, so as to obtain the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted.
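The split of a sample's neighbors into positive-fact and counterfactual groups can be sketched as follows — a simplified numpy illustration using the unweighted-sum variants; function and variable names are illustrative:

```python
import numpy as np

def aggregate_confounders(X_c, t, neighbors, W_c, W_cf, i):
    """Split sample i's neighbors by intervention value and aggregate
    each group separately: neighbors with t_j == t_i feed the
    positive-fact representation E_c, the rest feed the counterfactual
    representation E_cf."""
    same = [j for j in neighbors[i] if t[j] == t[i]]
    diff = [j for j in neighbors[i] if t[j] != t[i]]
    d = X_c.shape[1]
    E_c = sum((W_c @ X_c[j] for j in same), np.zeros(d))
    E_cf = sum((W_cf @ X_c[j] for j in diff), np.zeros(d))
    return E_c, E_cf
```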
Alternatively, the aggregate confounding variable feature representation may comprise an aggregate counterfactual confounding variable feature representation. The confounding variable feature representation of the sample to be predicted may be provided to the counterfactual confounding variable mapping model, and the mapping result may be obtained as the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted. In one example, the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted may be E_{cf,i} = g(X_{c,i}), where g(·) may be used to represent the counterfactual confounding variable mapping model. In one example, the counterfactual confounding variable mapping model may be any model, obtained through machine learning training, that characterizes a mapping between vectors. For example, the counterfactual confounding variable mapping model may be a graph attention network (Graph Attention Networks, GAT). Based on this, the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted can be obtained even when initial feature representations of associated samples with the opposite intervention variable value are difficult to obtain, which is particularly suitable for scenarios where the association relations between samples are sparse.
At 230, a corresponding aggregate prediction feature representation is obtained based on the fusion of the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation.
In this embodiment, the fusion may be performed in various manners based on the confounding variable feature representation of the sample to be predicted and the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation, so as to obtain an aggregate prediction feature representation fused with the above information.
Alternatively, the aggregate prediction feature representation may comprise an aggregate positive fact prediction feature representation. The aggregate positive fact prediction feature representation may be obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate positive fact confounding variable feature representation. In one example, the aggregate positive fact prediction feature representation may be H_{f,i} = E_{a,i} + E_{c,i}. In one example, the aggregate positive fact prediction feature representation may also be H_{f,i} = E_{a,i} + E_{c,i} + X_{c,i}. Based on this, the original confounding variable feature representation X_{c,i} can be added as a residual, reinforcing its importance.
Alternatively, the aggregate prediction feature representation may comprise an aggregate counterfactual prediction feature representation. The aggregate counterfactual prediction feature representation may be obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregate adjustment variable feature representation and aggregate counterfactual confounding variable feature representation. In one example, the aggregate counterfactual prediction feature representation may be H_{cf,i} = E_{a,i} + E_{cf,i}. Similarly, the aggregate counterfactual prediction feature representation may also be H_{cf,i} = E_{a,i} + E_{cf,i} + X_{c,i}. The meaning of the relevant symbols may be referred to in the foregoing.
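The fusion by summation, with the optional residual term X_{c,i}, is simple enough to sketch directly (an illustrative helper, not the patent's implementation):

```python
import numpy as np

def fuse(E_a, E_conf, X_c=None):
    """Fuse the aggregate adjustment representation E_a with an aggregate
    confounding representation E_conf (positive-fact E_c or counterfactual
    E_cf) by summation; optionally add the sample's own confounding
    representation X_c as a residual that reinforces its importance."""
    H = E_a + E_conf
    if X_c is not None:
        H = H + X_c
    return H
```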
At 240, a gain size corresponding to the sample to be predicted is determined based on the aggregate prediction feature representation.
In this embodiment, the gain size corresponding to the sample to be predicted may be determined according to the aggregate prediction feature representation by means of the trained prediction model. In one example, the prediction model may include an uplift model (Uplift Model).
Optionally, with continued reference to fig. 3, fig. 3 shows a flowchart of one example of a process 300 of determining a gain magnitude corresponding to a sample to be predicted according to an embodiment of the present disclosure.
At 310, the aggregate positive fact prediction feature representation is provided to a process variable prediction model and a control variable prediction model, respectively, to obtain corresponding process and control prediction values.
In one example, the process variable prediction model and the control variable prediction model may be represented as f_1(H_{f,i}|t_i = 1) and f_0(H_{f,i}|t_i = 0), respectively. The aggregate positive fact prediction feature representation H_{f,i|t_i=1} corresponding to t_i = 1 may be provided to the process variable prediction model to obtain the corresponding process prediction value f_1(H_{f,i|t_i=1}). The aggregate positive fact prediction feature representation H_{f,i|t_i=0} corresponding to t_i = 0 may be provided to the control variable prediction model to obtain the corresponding control prediction value f_0(H_{f,i|t_i=0}). The parameters of the process variable prediction model and the control variable prediction model can be obtained through training in a machine learning manner.
At 320, a gain size corresponding to the sample to be predicted is determined based on the difference between the obtained process prediction value and the control prediction value.
In one example, the resulting difference between the processed and control predictions may be determined as the gain magnitude corresponding to the sample to be predicted.
Based on the method, the gain corresponding to the sample to be predicted can be determined through the processing variable prediction model and the control variable prediction model according to the corresponding aggregate prediction feature representation.
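A minimal sketch of the two-head gain computation described above, assuming linear scoring heads with a sigmoid output (the patent leaves the concrete model family open; theta1 and theta0 are hypothetical trained parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def uplift(H_f, theta1, theta0):
    """Two-head prediction: f_1 scores the treated (process) outcome,
    f_0 the control outcome; their difference is the predicted gain."""
    y1 = sigmoid(H_f @ theta1)   # process prediction value f_1(H_f)
    y0 = sigmoid(H_f @ theta0)   # control prediction value f_0(H_f)
    return y1 - y0
```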
Optionally, the aggregate counterfactual prediction feature representation may be further provided to a counterfactual prediction model to obtain a corresponding counterfactual prediction value, where the counterfactual prediction model selects one of the process variable prediction model and the control variable prediction model according to the counterfactual to be predicted. In one example, the counterfactual prediction model may be f_{1-t_i}(H_{cf,i}). For example, the aggregate positive fact prediction feature representation H_{f,i|t_i=1} corresponding to t_i = 1 may be provided to the process variable prediction model to obtain the corresponding process prediction value f_1(H_{f,i|t_i=1}), and the aggregate counterfactual prediction feature representation H_{cf,i|t_i=1} corresponding to t_i = 1 may be provided to the counterfactual prediction model to obtain the corresponding control prediction value f_0(H_{cf,i|t_i=1}). The difference between the obtained process prediction value and control prediction value may then be determined as the gain size corresponding to the sample to be predicted. The parameters of the counterfactual prediction model can be obtained through training in a machine learning manner.
By using the causal effect estimation-based gain prediction method disclosed in figs. 1-3, the initial feature representation of a sample can be decoupled into an adjustment variable feature representation and a confounding variable feature representation by means of feature decoupling, and the aggregate prediction feature representation used for prediction is further obtained by neighbor aggregation over associated samples under different interventions, so as to determine the corresponding gain. In this way, the bias caused by the confounding variables is controlled while their predictive power for the outcome is preserved, improving prediction accuracy.
Referring now to FIG. 4, FIG. 4 illustrates a flowchart of one example of a method 400 of training a causal effect estimation model according to an embodiment of the present description. The causal effect estimation model may include a feature decoupling model, an adjustment variable aggregation model, a confounding variable aggregation model, and a prediction model.
As shown in FIG. 4, at 410, the following model training processes 420-490 are performed with a training sample set loop until the training end condition is met.
In this embodiment, the training sample set may include a sample initial feature representation, an intervention label characterizing whether an intervention has occurred, and an output truth value corresponding to each training sample having an association relationship. In one example, the association relationship between training samples may be embodied by a network structure in the network observation data as described above; in particular, reference may be made to the foregoing description of the adjacency matrix A. The intervention label characterizing whether an intervention has occurred and the output truth value may be referred to the foregoing descriptions of T and Y.
At 420, sample initial feature representations corresponding to each current training sample in the current training sample set are provided to the current feature decoupling model, resulting in an adjusted variable feature representation and a confounding variable feature representation corresponding to each current training sample.
In this embodiment, the current training sample set may refer to a training sample set having an association relationship in a batch (batch) selected from the training sample set in the current iteration process (iteration). The structure and related parameters of the current feature decoupling model may be referred to as the corresponding description of step 210 in the embodiment of fig. 2, which is not repeated here.
The following steps 430-460 may be performed for each current training sample.
At 430, the adjustment variable feature representations corresponding to the associated current training samples are provided to the current adjustment variable aggregation model to obtain the aggregate adjustment variable feature representation corresponding to the current training sample.
In this embodiment, the structure and related parameters of the current adjustment variable aggregation model may refer to the corresponding description of step 220 in the embodiment of fig. 2, which is not repeated here.
At 440, according to whether the intervention label corresponding to each associated current training sample is consistent with the intervention label of the current training sample, the confounding variable feature representation corresponding to the corresponding associated current training sample is provided to the current confounding variable aggregation model, so as to obtain the aggregate confounding variable feature representation corresponding to the current training sample.
In this embodiment, the structure and related parameters of the above-mentioned current confusion variable aggregation model may refer to corresponding descriptions in the alternative implementation manner of step 220 in the embodiment of fig. 2, which are not described herein.
Optionally, the current confounding variable aggregation model comprises a current positive fact confounding variable aggregation model. The confounding variable feature representations corresponding to the associated current training samples whose intervention labels are consistent with that of the current training sample may be provided to the current positive fact confounding variable aggregation model to obtain the aggregate positive fact confounding variable feature representation corresponding to the current training sample.
Optionally, the current confounding variable aggregation model comprises a current counterfactual confounding variable aggregation model. The confounding variable feature representations corresponding to the associated current training samples whose intervention labels are opposite to that of the current training sample may be provided to the current counterfactual confounding variable aggregation model to obtain the aggregate counterfactual confounding variable feature representation corresponding to the current training sample.
At 450, a corresponding aggregate prediction feature representation is obtained based on the fusion of the confounding variable feature representation of the current training sample with the corresponding aggregate adjustment variable feature representation and aggregate confounding variable feature representation.
In this embodiment, the above-mentioned fusing manner may refer to the corresponding description of step 230 in the embodiment of fig. 2, which is not repeated here.
At 460, the resulting aggregate prediction feature representation is provided to the current prediction model to obtain a corresponding prediction value.
In this embodiment, the structure and related parameters of the current prediction model may refer to the corresponding description of step 240 in the embodiment of fig. 2, which is not described herein.
Optionally, for each current training sample, the aggregate positive fact confounding variable feature representation corresponding to the current training sample may also be provided to the current intervention prediction model to obtain an intervention probability corresponding to the current training sample. In one example, the intervention probability corresponding to the current training sample may be expressed as Pr(T = 1|E_{c,i}), where Pr(·) may be used to represent the current intervention prediction model. In one example, the current intervention prediction model may be any model, obtained through machine learning training, that can characterize the mapping relationship between vectors, such as a classifier based on a deep neural network.
Optionally, for each current training sample, the confounding variable feature representation corresponding to the current training sample may also be provided to the current counterfactual confounding variable mapping model to obtain a mapping result corresponding to the current training sample. The structure and relevant parameters of the current counterfactual confounding variable mapping model may refer to the corresponding description in the alternative implementation of step 220 in the embodiment of fig. 2, and are not repeated here.
At 470, an adjustment variable consistency loss value is determined based on the differences between the resulting aggregate adjustment variable feature representations corresponding to the current training samples with different intervention labels.
In this embodiment, the difference between the distributions of the aggregate adjustment variable feature representations corresponding to the intervened and non-intervened current training samples may be determined, and the adjustment variable consistency loss value may be determined using various integral probability metric (IPM) methods. In one example, a representative vector (e.g., an element-wise average) of the aggregate adjustment variable feature representations corresponding to the intervened current training samples and a representative vector (e.g., an element-wise average) of the aggregate adjustment variable feature representations corresponding to the non-intervened current training samples may be determined separately, and the adjustment variable consistency loss value may be determined according to the difference between the two representative vectors. For example, the maximum mean discrepancy (Maximum Mean Discrepancy, MMD), the KL (Kullback-Leibler) divergence, the Wasserstein distance, etc. may be employed. In one example, the adjustment variable consistency loss value may be expressed as L_adj = Wass({E_a}_{i:t_i=0}, {E_a}_{j:t_j=1}), where Wass(·,·) may be used to represent the Wasserstein distance, and {E_a}_{i:t_i=0} and {E_a}_{j:t_j=1} may be used to represent the set of aggregate adjustment variable feature representations corresponding to the non-intervened current training samples and the set corresponding to the intervened current training samples, respectively.
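As a simplified stand-in for the IPM computation above, the representative-vector variant (element-wise group means compared by Euclidean distance, rather than a full Wasserstein distance) can be sketched as:

```python
import numpy as np

def adjustment_consistency_loss(E_a, t):
    """IPM-style balance penalty: distance between the representative
    (element-wise mean) vectors of the treated and control groups.
    E_a: (n, d) aggregate adjustment representations; t: (n,) labels."""
    mu1 = E_a[t == 1].mean(axis=0)   # intervened group representative
    mu0 = E_a[t == 0].mean(axis=0)   # non-intervened group representative
    return float(np.linalg.norm(mu1 - mu0))
```

Minimizing this penalty pushes the adjustment-variable representations of the two intervention groups toward the same distribution.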
At 480, a predictive loss value is determined based on the differences between the respective predicted values and the corresponding output truth values.
In this embodiment, the prediction loss value may be obtained according to a predetermined prediction loss function (e.g., an L2 norm). In one example, the prediction loss value may be expressed as L_pred = Σ_i (f_{t_i}(H_{f,i}) − y_i)², where y_i may be used to represent the output truth value corresponding to the i-th current training sample. The meaning of the remaining symbols may be referred to in the foregoing.
Optionally, a decoupling loss value may also be determined based on the differences between the resulting intervention probabilities and the corresponding intervention labels. In one example, the decoupling loss value may be expressed as L_decouple = Σ_i CE(Pr(T = 1|E_{c,i}), t_i), where CE(·,·) may be used to represent the cross entropy. The meaning of the remaining symbols may be referred to in the foregoing. Based on this, decoupling is assisted by introducing a task that maximizes the predictive capability of the confounding variable feature representation for whether an intervention occurs, so that the decoupling effect is further improved.
Optionally, a mapping loss value may also be determined based on the differences between the resulting mapping results and the corresponding aggregate counterfactual confounding variable feature representations. In one example, the mapping loss value may be expressed as L_map = Σ_i ||g(X_{c,i}) − E_{cf,i}||², where E_{cf,i} may be used to represent the aggregate counterfactual confounding variable feature representation derived from the counterfactual confounding variable aggregation model described above. The meaning of the remaining symbols may be referred to in the foregoing.
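The two auxiliary losses can be sketched as follows, assuming the intervention probabilities and mapping outputs are already computed; the cross-entropy and squared-error forms follow the text, while the mean reduction is an assumption:

```python
import numpy as np

def decoupling_loss(p_intervene, t):
    """Cross entropy between the predicted intervention probability
    Pr(T=1|E_c) and the intervention label t."""
    eps = 1e-12  # numerical guard for log(0)
    return float(-np.mean(t * np.log(p_intervene + eps)
                          + (1 - t) * np.log(1 - p_intervene + eps)))

def mapping_loss(g_out, E_cf):
    """Squared error between the mapping model output g(X_c) and the
    aggregated counterfactual representation E_cf."""
    return float(np.mean((g_out - E_cf) ** 2))
```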
At 490, a determination is made as to whether the training end condition is met.
In one example, whether the training end condition is satisfied may be determined by determining whether the number of iterations reaches a preset number of times, whether the training duration reaches a preset duration, whether the loss value converges, and the like.
If not, at 4100, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model and the current prediction model are adjusted according to the adjustment variable consistency loss value and the prediction loss value. In one example, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model and the current prediction model may be adjusted simultaneously based on L = L_pred + w_1 · L_adj, where w_1 may be used to represent a preset weighting factor.
Alternatively, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, the current intervention prediction model and the current prediction model may be adjusted according to the adjustment variable consistency loss value, the decoupling loss value and the prediction loss value. In one example, these model parameters may be adjusted simultaneously based on L = L_pred + w_1 · L_adj + w_2 · L_decouple, where w_2 may be used to represent a preset weighting factor.
Alternatively, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, the current counterfactual confounding variable mapping model and the current prediction model may be adjusted according to the adjustment variable consistency loss value, the mapping loss value and the prediction loss value. In one example, these model parameters (together with those of the current intervention prediction model, if used) may be adjusted simultaneously based on L = L_pred + w_1 · L_adj + w_3 · L_map, where w_3 may be used to represent a preset weighting factor.
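The weighted combinations of losses used in the three adjustment variants above can be sketched as a single helper (the default weights are illustrative, not from the text):

```python
def total_loss(l_pred, l_adj, l_decouple=0.0, l_map=0.0,
               w1=1.0, w2=0.0, w3=0.0):
    """Weighted training objective:
    L = L_pred + w1*L_adj (+ w2*L_decouple + w3*L_map).
    Setting w2 or w3 to zero recovers the simpler variants."""
    return l_pred + w1 * l_adj + w2 * l_decouple + w3 * l_map
```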
It will be appreciated that the feature decoupling model, the adjustment variable aggregation model, the confounding variable aggregation model, the counterfactual confounding variable mapping model, the intervention prediction model and the prediction model after model parameter adjustment may serve as the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, the current counterfactual confounding variable mapping model, the current intervention prediction model and the current prediction model for the next model training process. Thereafter, the current training sample set may be re-determined using the training sample set described above, and the model training processes 420-490 may continue until the training end condition is met.
If yes, the current causal effect estimation model is determined as the trained causal effect estimation model. The gain corresponding to a sample to be predicted can be obtained by using the feature decoupling model, the adjustment variable aggregation model, the confounding variable aggregation model, the prediction model, etc. included in the trained causal effect estimation model.
Referring now to FIG. 5, FIG. 5 illustrates a schematic diagram of yet another example of a training method 500 of a causal effect estimation model according to an embodiment of the present disclosure.
As shown in fig. 5, the current training sample set may be as shown at 510 in fig. 5, where each current training sample may be represented as a node in 510. In one example, each node may represent a user, whether an intervention has occurred may refer to whether a coupon is issued to the user, and the output truth value may refer to a consumption amount. The current feature decoupling model may be utilized to obtain the adjustment variable feature representation and the confounding variable feature representation corresponding to each current training sample. Then, the aggregate adjustment variable feature representation corresponding to each current training sample may be obtained from the adjustment variable feature representations of its associated current training samples using the current adjustment variable aggregation model. And, according to whether the intervention labels of the neighbor nodes corresponding to each current training sample are consistent with its own, the confounding variable feature representations of the corresponding current training samples are provided to the current confounding variable aggregation model to obtain the aggregate confounding variable feature representation corresponding to the current training sample. In one example, for node 520, since this training sample is intervened, the aggregate positive fact confounding variable feature representation corresponding to node 520 may be obtained from the confounding variable feature representations corresponding to the current training samples that are also intervened, using the current positive fact confounding variable aggregation model.
In one example, for node 520, since this training sample is intervened, the aggregate counterfactual confounding variable feature representation corresponding to node 520 may be obtained from the confounding variable feature representations corresponding to the current training samples that are not intervened, using the current counterfactual confounding variable aggregation model. Reference may be made to the foregoing description of step 220 of fig. 2 and its alternative implementations. Similarly, the aggregate adjustment variable feature representation and the aggregate confounding variable feature representation corresponding to each current training sample can be obtained.
Next, an adjustment variable consistency loss value may be determined (as shown at 530 in fig. 5) based on the differences between the aggregate adjustment variable feature representations obtained for the intervened and non-intervened current training samples. The aggregate prediction feature representation corresponding to each current training sample may also be obtained based on the fusion result of the obtained aggregate adjustment variable feature representation and aggregate confounding variable feature representation, and a corresponding prediction value may be obtained accordingly (as shown at 540 in fig. 5), so as to determine a prediction loss value. Thus, when the training end condition is not met, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model and the current prediction model can be adjusted according to the adjustment variable consistency loss value and the prediction loss value.
Optionally, the current intervention prediction model may be further used to obtain the corresponding intervention probability (as shown at 550 in fig. 5) according to the aggregate positive fact confounding variable feature representation corresponding to each current training sample, so as to determine a decoupling loss value. Alternatively, the mapping result may be obtained by using the current counterfactual confounding variable mapping model according to the confounding variable feature representation corresponding to each current training sample, and the mapping loss value may be determined by comparing the difference between the mapping result and the aggregate counterfactual confounding variable feature representation obtained by the current counterfactual confounding variable aggregation model (as shown at 560 in fig. 5). Thus, when the training end condition is not met, the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model and the current prediction model (as well as the current intervention prediction model and the current counterfactual confounding variable mapping model) can be further adjusted in combination with the determined decoupling loss value and/or mapping loss value.
By using the training method of the causal effect estimation model disclosed in figs. 4-5, training of the feature decoupling model, the adjustment variable aggregation model, the confounding variable aggregation model, the prediction model and other models is realized by selecting different supervision signals. By decoupling node features and adopting different graph aggregation manners for the different decoupled parts, the confounding-factor bias is controlled without impairing prediction of the outcome, thereby further improving the accuracy of gain prediction.
Referring now to fig. 6, fig. 6 shows a block diagram of one example of a causal effect estimation based gain prediction apparatus 600 according to an embodiment of the present disclosure. The apparatus embodiment may correspond to the method embodiments shown in fig. 2-3, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 6, the causal effect estimation based gain prediction apparatus 600 may include a decoupling unit 610, an aggregation unit 620, a fusion unit 630, and a prediction unit 640.
The decoupling unit 610 is configured to decouple the obtained initial feature representation of the sample to be predicted and the initial feature representation of the corresponding associated sample, respectively, so as to obtain a corresponding adjustment variable feature representation and a confounding variable feature representation.
And the aggregation unit 620 is configured to aggregate according to the obtained adjustment variable feature representation and the confusion variable feature representation corresponding to the associated sample, so as to obtain an aggregate adjustment variable feature representation and an aggregate confusion variable feature representation corresponding to the sample to be predicted and fused with the corresponding information of the associated sample.
In one example, the aggregate confounding variable feature representation includes an aggregate positive fact confounding variable feature representation. The aggregation unit 620 is further configured to provide the positive fact confounding variable feature representations corresponding to the associated samples of the sample to be predicted to the positive fact confounding variable aggregation model, so as to obtain the aggregate positive fact confounding variable feature representation corresponding to the sample to be predicted.
In one example, the aggregate confounding variable feature representation includes an aggregate counterfactual confounding variable feature representation. The aggregation unit 620 is further configured to provide the counterfactual confounding variable feature representations corresponding to the associated samples of the sample to be predicted to the counterfactual confounding variable aggregation model, so as to obtain the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted.
In one example, the aggregate confounding variable feature representation includes an aggregate counterfactual confounding variable feature representation. The aggregation unit 620 is further configured to provide the confounding variable feature representation of the sample to be predicted to the counterfactual confounding variable mapping model, so as to obtain a mapping result as the aggregate counterfactual confounding variable feature representation corresponding to the sample to be predicted.
The fusion unit 630 is configured to obtain a corresponding aggregated prediction feature representation by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated confounding variable feature representation.
The prediction unit 640 is configured to determine the gain corresponding to the sample to be predicted according to the aggregated prediction feature representation.
In one example, the aggregated prediction feature representation includes an aggregated factual prediction feature representation obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated factual confounding variable feature representation. The prediction unit 640 is further configured to provide the aggregated factual prediction feature representation to a treatment variable prediction model and a control variable prediction model, respectively, to obtain a corresponding treatment prediction value and control prediction value, and to determine the gain corresponding to the sample to be predicted according to the difference between the treatment prediction value and the control prediction value.
In one example, the aggregated prediction feature representation includes an aggregated counterfactual prediction feature representation obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated counterfactual confounding variable feature representation. The prediction unit 640 is further configured to provide the aggregated counterfactual prediction feature representation to a counterfactual prediction model to obtain a corresponding counterfactual prediction value, where the counterfactual prediction model is the one of the treatment variable prediction model and the control variable prediction model selected according to the counterfactual to be predicted.
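One way to realize the two-head gain computation and the counterfactual predictor selection described above is sketched below; the linear heads and their parameters are purely hypothetical stand-ins for the treatment variable prediction model and the control variable prediction model.

```python
# Hypothetical sketch: the gain (uplift) is the difference between the
# treatment-head and control-head predictions on the same aggregated
# factual prediction feature representation.

def linear_head(weights, bias):
    """A toy linear prediction head (stand-in for a learned model)."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

treatment_head = linear_head([0.5, 0.3], 0.2)  # hypothetical parameters
control_head = linear_head([0.4, 0.1], 0.1)

def predict_gain(factual_repr):
    """Gain = treatment prediction minus control prediction."""
    return treatment_head(factual_repr) - control_head(factual_repr)

def counterfactual_predict(counterfactual_repr, target_label):
    """The counterfactual predictor reuses whichever head corresponds
    to the intervention the sample did NOT receive."""
    head = control_head if target_label == 1 else treatment_head
    return head(counterfactual_repr)

gain = predict_gain([1.0, 2.0])  # ≈ 1.3 - 0.7 = 0.6
```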
The specific operations of the decoupling unit 610, the aggregation unit 620, the fusion unit 630 and the prediction unit 640 may refer to the specific descriptions of the corresponding steps in the embodiments of fig. 2-3, which are not repeated herein.
Fig. 7 shows a block diagram of one example of a training apparatus 700 for a causal effect estimation model according to an embodiment of the present disclosure. This apparatus embodiment may correspond to the method embodiments shown in fig. 4-5, and the apparatus may be applied to various electronic devices. The causal effect estimation model includes a feature decoupling model, an adjustment variable aggregation model, a confounding variable aggregation model, and a prediction model.
As shown in fig. 7, the training apparatus 700 of the causal effect estimation model may be configured to cyclically perform, via a training unit 710, a model training process using a training sample set until a training end condition is met. The training sample set includes, for each of a set of training samples having association relations, a sample initial feature representation, an intervention label indicating whether an intervention was applied, and an output ground-truth value. The training unit may include a decoupling module 711, a prediction module 712, and a loss value determination module 713.
The decoupling module 711 is configured to provide the sample initial feature representation corresponding to each current training sample in a current training sample set to a current feature decoupling model, to obtain an adjustment variable feature representation and a confounding variable feature representation corresponding to each current training sample.
The prediction module 712 is configured to, for each current training sample: provide the adjustment variable feature representations corresponding to the associated current training samples to a current adjustment variable aggregation model, to obtain an aggregated adjustment variable feature representation corresponding to the current training sample; provide, according to whether the intervention labels of the respective current training samples are consistent with the intervention label of the current training sample, the confounding variable feature representations of the corresponding current training samples to a current confounding variable aggregation model, to obtain an aggregated confounding variable feature representation corresponding to the current training sample; obtain a corresponding aggregated prediction feature representation by fusing the confounding variable feature representation of the current training sample with the corresponding aggregated adjustment variable feature representation and aggregated confounding variable feature representation; and provide the obtained aggregated prediction feature representation to a current prediction model, to obtain a corresponding prediction value.
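The per-sample forward pass handled by the prediction module can be sketched end to end. All component models below are toy stand-ins: decoupling is a feature split, aggregation is mean pooling, fusion is concatenation, and the prediction head is an average; none of these concrete choices is prescribed by the disclosure.

```python
# Toy end-to-end forward pass for one current training sample:
# decouple -> aggregate over associated samples -> fuse -> predict.

def decouple(initial_repr):
    """Toy feature decoupling: first half as the adjustment variable
    features, second half as the confounding variable features."""
    mid = len(initial_repr) // 2
    return initial_repr[:mid], initial_repr[mid:]

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def forward(sample, neighbors, neighbor_labels, label):
    _, conf = decouple(sample)
    neigh = [decouple(x) for x in neighbors]
    # adjustment variables are aggregated over ALL associated samples
    agg_adj = mean_pool([a for a, _ in neigh])
    # confounding variables: aggregate neighbors sharing the sample's label
    agg_conf = mean_pool([c for (_, c), l in zip(neigh, neighbor_labels)
                          if l == label])
    fused = conf + agg_adj + agg_conf  # concatenation as the fusion step
    return sum(fused) / len(fused)     # trivial prediction head

pred = forward([1.0, 2.0, 3.0, 4.0],
               [[0.0, 2.0, 1.0, 3.0], [2.0, 2.0, 2.0, 2.0]],
               [1, 0], label=1)
```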
The loss value determination module 713 is configured to determine an adjustment variable consistency loss value according to the differences between the aggregated adjustment variable feature representations obtained for current training samples with different intervention labels, and to determine a prediction loss value according to the differences between the obtained prediction values and the corresponding output ground-truth values.
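A minimal sketch of the two loss terms follows, assuming mean squared error for the prediction loss and a squared distance for the adjustment variable consistency loss; the disclosure does not fix the concrete loss functions.

```python
# Hypothetical loss terms; MSE and squared distance are assumed choices.

def mse(preds, truths):
    """Prediction loss: mean squared error against ground-truth outputs."""
    return sum((p - y) ** 2 for p, y in zip(preds, truths)) / len(preds)

def consistency_loss(agg_adj_treated, agg_adj_control):
    """Adjustment variable consistency loss: the aggregated adjustment
    variable representations of treated and control samples should
    match, since the adjustment variable is intervention-invariant."""
    return sum((a - b) ** 2 for a, b in zip(agg_adj_treated, agg_adj_control))

prediction_loss = mse([0.8, 0.2], [1.0, 0.0])               # ≈ 0.04
adj_consistency = consistency_loss([0.5, 0.5], [0.6, 0.4])  # ≈ 0.02
total_loss = prediction_loss + adj_consistency  # drives the parameter update
```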
The training apparatus 700 of the causal effect estimation model may further include a parameter adjustment unit 720 configured to, in response to the training end condition not being met, adjust the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, and the current prediction model according to the adjustment variable consistency loss value and the prediction loss value.
The specific operations of the decoupling module 711, the prediction module 712, the loss value determination module 713, and the parameter adjustment unit 720 included in the training unit 710 may refer to the specific descriptions of the corresponding steps in the embodiments of fig. 4-5, which are not repeated herein.
Embodiments of a causal effect estimation based gain prediction method and apparatus, and a causal effect estimation model training method and apparatus according to embodiments of the present specification are described above with reference to fig. 1 to 7.
The gain prediction apparatus based on causal effect estimation and the training apparatus of the causal effect estimation model in the embodiments of the present disclosure may be implemented by hardware, by software, or by a combination of hardware and software. Taking software implementation as an example, an apparatus in the logical sense is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile storage into memory and running them. In the embodiments of the present description, both apparatuses may be implemented, for example, using electronic devices.
Fig. 8 shows a block diagram of one example of a causal effect estimation based gain prediction apparatus 800 of an embodiment of the present description.
As shown in fig. 8, the causal effect estimation based gain prediction apparatus 800 may include at least one processor 810, a storage (e.g., a non-volatile storage) 820, a memory 830, and a communication interface 840, where the at least one processor 810, the storage 820, the memory 830, and the communication interface 840 are coupled together via a bus 850. The at least one processor 810 executes at least one computer-readable instruction (i.e., the above-described elements implemented in software) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage that, when executed, cause the at least one processor 810 to perform the causal effect estimation based gain prediction method described above.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 810 to perform the various operations and functions described above in connection with fig. 1-3 in various embodiments of the present specification.
FIG. 9 illustrates a block diagram of one example of a training apparatus 900 of a causal effect estimation model of an embodiment of the present description.
As shown in fig. 9, the training apparatus 900 of the causal effect estimation model may include at least one processor 910, a storage (e.g., a non-volatile storage) 920, a memory 930, and a communication interface 940, where the at least one processor 910, the storage 920, the memory 930, and the communication interface 940 are coupled together via a bus 950. The at least one processor 910 executes at least one computer-readable instruction (i.e., the above-described elements implemented in software) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage that, when executed, cause the at least one processor 910 to perform the training method of the causal effect estimation model described above.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 910 to perform the various operations and functions described above in connection with fig. 4-5 in various embodiments of the present description.
According to one embodiment, a program product, such as a computer readable medium, is provided. The computer-readable medium may have instructions (i.e., elements implemented in software as described above) that, when executed by a computer, cause the computer to perform the various operations and functions described above in connection with fig. 1-5 in various embodiments of the present specification.
Specifically, a system or apparatus may be provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and the computer or processor of the system or apparatus may be caused to read out and execute the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium may implement the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Computer program code required for the operation of portions of the present description may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python, conventional procedural programming languages such as C, Visual Basic 2003, Perl, COBOL 2002, PHP and ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), made in a cloud computing environment, or offered as a service, such as software as a service (SaaS).
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or the cloud via a communication network.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Not all steps or units in the above-mentioned flowcharts and system configuration diagrams are necessary, and some steps or units may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as desired. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by multiple physical entities, or may be implemented jointly by some components in multiple independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Alternative implementations of the embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the embodiments of the present disclosure are not limited to the specific details of the foregoing implementations. Various simple modifications may be made to the technical solutions of the embodiments of the present disclosure within the scope of their technical concept, and all such simple modifications fall within the protection scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A causal effect estimation based gain prediction method, comprising:
decoupling an acquired initial feature representation of a sample to be predicted and initial feature representations of corresponding associated samples, respectively, to obtain corresponding adjustment variable feature representations and confounding variable feature representations;
aggregating the obtained adjustment variable feature representations and confounding variable feature representations corresponding to the associated samples, to obtain an aggregated adjustment variable feature representation and an aggregated confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples;
obtaining a corresponding aggregated prediction feature representation by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated confounding variable feature representation; and
determining the gain corresponding to the sample to be predicted according to the aggregated prediction feature representation.
2. The gain prediction method of claim 1, wherein the aggregated confounding variable feature representation comprises an aggregated factual confounding variable feature representation,
and wherein aggregating the obtained adjustment variable feature representations and confounding variable feature representations corresponding to the associated samples to obtain the aggregated adjustment variable feature representation and the aggregated confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples comprises:
providing the factual confounding variable feature representations corresponding to the associated samples of the sample to be predicted to a factual confounding variable aggregation model, to obtain the aggregated factual confounding variable feature representation corresponding to the sample to be predicted.
3. The gain prediction method of claim 1 or 2, wherein the aggregated confounding variable feature representation comprises an aggregated counterfactual confounding variable feature representation,
and wherein aggregating the obtained adjustment variable feature representations and confounding variable feature representations corresponding to the associated samples to obtain the aggregated adjustment variable feature representation and the aggregated confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples comprises:
providing the counterfactual confounding variable feature representations corresponding to the associated samples of the sample to be predicted to a counterfactual confounding variable aggregation model, to obtain the aggregated counterfactual confounding variable feature representation corresponding to the sample to be predicted.
4. The gain prediction method of claim 1 or 2, wherein the aggregated confounding variable feature representation comprises an aggregated counterfactual confounding variable feature representation,
and wherein aggregating the obtained adjustment variable feature representations and confounding variable feature representations corresponding to the associated samples to obtain the aggregated adjustment variable feature representation and the aggregated confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples comprises:
providing the confounding variable feature representation of the sample to be predicted to a counterfactual confounding variable mapping model, and using the resulting mapping result as the aggregated counterfactual confounding variable feature representation corresponding to the sample to be predicted.
5. The gain prediction method of claim 2, wherein the aggregated prediction feature representation comprises an aggregated factual prediction feature representation obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated factual confounding variable feature representation,
and wherein determining the gain corresponding to the sample to be predicted according to the aggregated prediction feature representation comprises:
providing the aggregated factual prediction feature representation to a treatment variable prediction model and a control variable prediction model, respectively, to obtain a corresponding treatment prediction value and control prediction value; and
determining the gain corresponding to the sample to be predicted according to the difference between the obtained treatment prediction value and control prediction value.
6. The gain prediction method of claim 5, wherein the aggregated prediction feature representation comprises an aggregated counterfactual prediction feature representation obtained by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated counterfactual confounding variable feature representation,
the gain prediction method further comprising:
providing the aggregated counterfactual prediction feature representation to a counterfactual prediction model, to obtain a corresponding counterfactual prediction value, wherein the counterfactual prediction model is the one of the treatment variable prediction model and the control variable prediction model selected according to the counterfactual to be predicted.
7. A method for training a causal effect estimation model, wherein the causal effect estimation model comprises a feature decoupling model, an adjustment variable aggregation model, a confounding variable aggregation model, and a prediction model, the method comprising:
cyclically performing the following model training process using a training sample set until a training end condition is met, wherein the training sample set comprises, for each of a set of training samples having association relations, a sample initial feature representation, an intervention label indicating whether an intervention was applied, and an output ground-truth value:
providing the sample initial feature representation corresponding to each current training sample in a current training sample set to a current feature decoupling model, to obtain an adjustment variable feature representation and a confounding variable feature representation corresponding to each current training sample;
for each current training sample:
providing the adjustment variable feature representations corresponding to the associated current training samples to a current adjustment variable aggregation model, to obtain an aggregated adjustment variable feature representation corresponding to the current training sample;
providing, according to whether the intervention labels of the respective current training samples are consistent with the intervention label of the current training sample, the confounding variable feature representations of the corresponding current training samples to a current confounding variable aggregation model, to obtain an aggregated confounding variable feature representation corresponding to the current training sample;
obtaining a corresponding aggregated prediction feature representation by fusing the confounding variable feature representation of the current training sample with the corresponding aggregated adjustment variable feature representation and aggregated confounding variable feature representation; and
providing the obtained aggregated prediction feature representation to a current prediction model, to obtain a corresponding prediction value;
determining an adjustment variable consistency loss value according to the differences between the aggregated adjustment variable feature representations obtained for current training samples with different intervention labels;
determining a prediction loss value according to the differences between the obtained prediction values and the corresponding output ground-truth values; and
in response to the training end condition not being met, adjusting model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, and the current prediction model according to the adjustment variable consistency loss value and the prediction loss value.
8. The method of claim 7, wherein the current confounding variable aggregation model comprises a current factual confounding variable aggregation model,
and wherein providing, according to whether the intervention labels of the respective current training samples are consistent with the intervention label of the current training sample, the confounding variable feature representations of the corresponding current training samples to the current confounding variable aggregation model to obtain the aggregated confounding variable feature representation corresponding to the current training sample comprises:
providing the confounding variable feature representations corresponding to the current training samples whose intervention labels are consistent with the intervention label of the current training sample to the current factual confounding variable aggregation model, to obtain an aggregated factual confounding variable feature representation corresponding to the current training sample,
the model training process further comprising:
for each current training sample, providing the aggregated factual confounding variable feature representation corresponding to the current training sample to a current intervention prediction model, to obtain an intervention probability corresponding to the current training sample; and
determining a decoupling loss value according to the differences between the obtained intervention probabilities and the corresponding intervention labels,
wherein adjusting the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, and the current prediction model according to the adjustment variable consistency loss value and the prediction loss value comprises:
adjusting model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, the current intervention prediction model, and the current prediction model according to the adjustment variable consistency loss value, the decoupling loss value, and the prediction loss value.
9. The method of claim 7 or 8, wherein the current confounding variable aggregation model comprises a current counterfactual confounding variable aggregation model,
and wherein providing, according to whether the intervention labels of the respective current training samples are consistent with the intervention label of the current training sample, the confounding variable feature representations of the corresponding current training samples to the current confounding variable aggregation model to obtain the aggregated confounding variable feature representation corresponding to the current training sample comprises:
providing the confounding variable feature representations corresponding to the current training samples whose intervention labels are opposite to the intervention label of the current training sample to the current counterfactual confounding variable aggregation model, to obtain an aggregated counterfactual confounding variable feature representation corresponding to the current training sample,
the model training process further comprising:
for each current training sample, providing the confounding variable feature representation corresponding to the current training sample to a current counterfactual confounding variable mapping model, to obtain a mapping result corresponding to the current training sample; and
determining a mapping loss value according to the differences between the obtained mapping results and the corresponding aggregated counterfactual confounding variable feature representations,
wherein adjusting the model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, and the current prediction model according to the adjustment variable consistency loss value and the prediction loss value comprises:
adjusting model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model, the current counterfactual confounding variable mapping model, and the current prediction model according to the adjustment variable consistency loss value, the mapping loss value, and the prediction loss value.
10. A causal effect estimation based gain prediction apparatus, comprising:
a decoupling unit configured to decouple an acquired initial feature representation of a sample to be predicted and initial feature representations of corresponding associated samples, respectively, to obtain corresponding adjustment variable feature representations and confounding variable feature representations;
an aggregation unit configured to aggregate the obtained adjustment variable feature representations and confounding variable feature representations corresponding to the associated samples, to obtain an aggregated adjustment variable feature representation and an aggregated confounding variable feature representation that correspond to the sample to be predicted and fuse the information of the associated samples;
a fusion unit configured to obtain a corresponding aggregated prediction feature representation by fusing the confounding variable feature representation of the sample to be predicted with the corresponding aggregated adjustment variable feature representation and aggregated confounding variable feature representation; and
a prediction unit configured to determine the gain corresponding to the sample to be predicted according to the aggregated prediction feature representation.
11. A training apparatus of a causal effect estimation model, wherein the causal effect estimation model comprises a feature decoupling model, an adjustment variable aggregation model, a confusion variable aggregation model and a prediction model, the training apparatus being configured to perform a model training process by a training unit in a loop using a training sample set, until a training end condition is met, the training sample set comprising a sample initial feature representation corresponding to each training sample having an association, an intervention label representing whether or not an intervention is received, and an output truth value, the training unit comprising:
the decoupling module is configured to provide sample initial feature representations corresponding to the current training samples in the current training sample set for the current feature decoupling model to obtain adjustment variable feature representations and confusion variable feature representations corresponding to the current training samples;
The prediction module is configured to provide the current adjustment variable aggregation model with adjustment variable characteristic representations corresponding to the current training samples according to the current training samples, so as to obtain aggregation adjustment variable characteristic representations corresponding to the current training samples; according to whether the intervention labels corresponding to the current training samples are consistent with the intervention labels of the current training samples, providing the confusion variable feature representation corresponding to the corresponding current training samples for a current confusion variable aggregation model to obtain an aggregation confusion variable feature representation corresponding to the current training samples; obtaining a corresponding aggregate prediction feature representation based on the confounding variable feature representation of the current training sample and the fusion of the corresponding aggregate adjustment variable feature representation and the aggregate confounding variable feature representation; providing the obtained aggregate prediction feature representation for a current prediction model to obtain a corresponding prediction value;
a loss value determination module configured to determine an adjustment variable consistency loss value according to the differences between the obtained aggregate adjustment variable feature representations corresponding to current training samples with different intervention labels, and to determine a prediction loss value according to the differences between each prediction value and the corresponding output ground-truth value.
The training apparatus further comprises:
a parameter adjustment unit configured to, in response to the training end condition not being met, adjust model parameters of the current feature decoupling model, the current adjustment variable aggregation model, the current confounding variable aggregation model and the current prediction model according to the adjustment variable consistency loss value and the prediction loss value.
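The training step of claim 11 can be sketched as follows. The patent does not specify model architectures, so this is a minimal illustration under assumptions: the feature decoupling model is stood in by two hypothetical linear projections, both aggregation models by mean pooling over an intervention group, and the prediction model by a linear head; all names, dimensions and data are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_var = 8, 4  # assumed feature dimensions

W_adj = rng.normal(size=(d_in, d_var))   # decoupling: adjustment-variable branch
W_conf = rng.normal(size=(d_in, d_var))  # decoupling: confounding-variable branch
w_pred = rng.normal(size=3 * d_var)      # prediction head over the fused representation

# Toy batch of associated training samples: initial feature representations,
# intervention labels (1 = intervened, 0 = not), and output ground truth.
X = rng.normal(size=(6, d_in))
t = np.array([1, 1, 1, 0, 0, 0])
y = rng.normal(size=6)

# Feature decoupling: adjustment and confounding variable feature representations.
adj, conf = X @ W_adj, X @ W_conf

# Aggregation (simplified to per-group pooling): confounding representations are
# pooled over the samples sharing a sample's intervention label, and adjustment
# representations are pooled per group so the two groups can be compared.
agg_adj = {lab: adj[t == lab].mean(axis=0) for lab in (0, 1)}
agg_conf = {lab: conf[t == lab].mean(axis=0) for lab in (0, 1)}

# Fuse each sample's own confounding representation with its aggregate
# adjustment and aggregate confounding representations, then predict.
fused = np.stack([
    np.concatenate([conf[i], agg_adj[t[i]], agg_conf[t[i]]]) for i in range(len(X))
])
preds = fused @ w_pred

# Adjustment variable consistency loss: the aggregate adjustment
# representations of the two intervention groups should agree.
loss_consistency = float(np.mean((agg_adj[1] - agg_adj[0]) ** 2))
# Prediction loss: mean squared error against the output ground truth.
loss_pred = float(np.mean((preds - y) ** 2))
total_loss = loss_pred + loss_consistency
```

In a full implementation, `total_loss` would be backpropagated through all four models by the parameter adjustment unit whenever the training end condition is not met.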
12. A gain prediction apparatus based on causal effect estimation, comprising: at least one processor, a memory coupled to the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement the gain prediction method of any one of claims 1 to 6.
13. A training apparatus for a causal effect estimation model, comprising: at least one processor, a memory coupled to the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement the training method of any one of claims 7 to 9.
CN202311370240.6A 2023-10-20 2023-10-20 Gain prediction method based on causal effect estimation, model training method and device Pending CN117408745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311370240.6A CN117408745A (en) 2023-10-20 2023-10-20 Gain prediction method based on causal effect estimation, model training method and device

Publications (1)

Publication Number Publication Date
CN117408745A true CN117408745A (en) 2024-01-16

Family

ID=89490234

Country Status (1)

Country Link
CN (1) CN117408745A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination