CN113408730A

CN113408730A - Causal relationship generation method and device and electronic equipment

Info

Publication number: CN113408730A
Application number: CN202010188570.3A
Authority: CN
Inventors: 李翱
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2021-09-17
Anticipated expiration: 2040-03-17
Also published as: CN113408730B

Abstract

The invention relates to a causal relationship generation method, a causal relationship generation device and electronic equipment. Each object to be grouped is divided into one of a plurality of groups according to its tendency probability. Each group is adjusted so that the number of first control subjects in the group is the same as the number of first experimental subjects, and whether the first event has an effect on the second event is determined based on difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups. Because the experimental subjects and control subjects contained in each group are guaranteed to have the same or similar occurrence probability for the second event, it is no longer necessary, as in the prior art, to compare experimental subjects and control subjects one by one to determine whether they are the same or similar research objects. This shortens the calculation process and reduces the time and computing-resource cost of causal inference.

Description

Causal relationship generation method and device and electronic equipment

Technical Field

The present disclosure relates to the field of causal inference technologies, and in particular, to a causal relationship generation method and apparatus, and an electronic device.

Background

Causal inference is increasingly applied in various fields as a method for analyzing causal relationships between objects of study. For example, in the public health field, the causal relationship between smoking and lung cancer can be studied through causal inference, and in the internet social field, the causal relationship between liking a page and sharing the page can be studied through causal inference.

In order to analyze and study the causal relationship between the first event and the second event, the causal relationship between the first event and the second event can be determined by setting an experimental group and a control group, and comparing the difference between the first event and the second event between the experimental group and the control group.

In order to quickly determine the causal relationship, each experimental object in the experimental group is usually an object in which the first event has occurred, and each control object in the control group is an object in which the first event has not occurred. The causal relationship between the first event and the second event can then be determined quickly by comparing how the experimental objects and the control objects differ with respect to the second event. For example, to analyze the causal relationship between liking a page (the first event) and sharing the page (the second event), the experimental objects are the objects that liked the page, the control objects are the research objects that did not like the page, and whether liking the page affects page sharing can be determined by comparing page sharing between the objects that liked the page and those that did not.

However, in the actual analysis process, it is difficult to ensure that the experimental object and the control object have the same characteristics, and therefore, the experimental object and the control object are often required to be matched in the actual analysis process, so that the matched experimental object and the matched control object are the research objects with the same or similar characteristics, and finally, the matched experimental object and the matched control object are analyzed.

In the related art, the k-nearest neighbor (KNN) algorithm is generally used to match experimental objects with control objects. It first establishes a feature space from the feature data of the control objects; when matching is needed, it calculates the feature distance between an experimental object and each control object, and matches the experimental object with the control object whose feature distance is the smallest or is smaller than a certain distance.

However, in the process of using the KNN algorithm, each experimental subject in the experimental group needs to perform feature calculation with each control subject in the control group, and when the number of the experimental subjects and the control subjects is large, the time and calculation resource cost required for causal inference are high.
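To make this cost concrete, the following is a minimal sketch of brute-force nearest-neighbor matching. It assumes Euclidean distance over numeric feature vectors; the function name `match_nearest_control` and the sample data are hypothetical and are not taken from the related art. Every experimental object is compared with every control object, so the number of distance computations is the product of the two group sizes.

```python
import math


def match_nearest_control(experimental, controls):
    """Match each experimental object to its nearest control object.

    Both arguments are lists of equal-length numeric feature vectors.
    Returns (matches, comparisons), where matches[i] is the index of the
    control object closest to experimental[i].
    """
    matches = []
    comparisons = 0
    for e in experimental:
        best_index, best_distance = None, math.inf
        for j, c in enumerate(controls):
            comparisons += 1  # one feature-distance computation per pair
            distance = math.dist(e, c)
            if distance < best_distance:
                best_index, best_distance = j, distance
        matches.append(best_index)
    return matches, comparisons


# Made-up feature vectors: (age, some binary feature).
experimental = [(25.0, 1.0), (40.0, 0.0)]
controls = [(41.0, 0.0), (24.0, 1.0), (60.0, 1.0)]
matches, comparisons = match_nearest_control(experimental, controls)
print(matches, comparisons)  # [1, 0] 6
```

With 2 experimental objects and 3 control objects the sketch performs 6 comparisons; the count grows multiplicatively as either group grows.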

Disclosure of Invention

The present disclosure provides a causal relationship generation method, device and electronic device, so as to at least solve the problem in the related art that the cost of time and computing resources required for causal inference is high. The technical scheme of the disclosure is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a causal relationship generation method, including:

obtaining the tendency probability of each object to be grouped, wherein the objects to be grouped comprise first experimental objects and first control objects, a first experimental object is an object in which a first event has occurred, a first control object is an object in which the first event has not occurred, and the tendency probability of each object to be grouped is the probability, estimated according to the feature data of the object to be grouped, that a second event occurs in that object;

dividing each object to be grouped according to the tendency probability of each object to be grouped to obtain a plurality of groups;

for each group in which the number of first experimental subjects differs from the number of first control subjects, adjusting the number of first control subjects in the group so that the number of first control subjects in the group is the same as the number of first experimental subjects;

obtaining difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups according to the number of the second experimental subjects and the number of the second control subjects in each group, wherein the second experimental subjects are first experimental subjects having the second event, and the second control subjects are first control subjects having the second event;

when the difference information indicates that the total number of the second experimental subjects is larger than the total number of the second control subjects, generating a causal relationship indicating that the first event has an influence on the second event.

Further, the method further comprises:

when the difference information indicates that the total number of the second experimental subjects is not greater than the total number of the second control subjects, generating a causal relationship indicating that the first event has no influence on the second event.

Further, the step of obtaining the tendency probability of each object to be grouped includes:

acquiring characteristic data of each object to be grouped;

acquiring a probability numerical value corresponding to the characteristic data of each object to be grouped based on a corresponding relation between the pre-established characteristic data and the probability numerical value representing the occurrence probability of the second event;

and obtaining the tendency probability of each object to be grouped according to the obtained probability numerical value.

Furthermore, each object to be grouped has multiple types of characteristic data;

the step of obtaining the tendency probability of each object to be grouped according to the obtained probability numerical value comprises the following steps:

and adding probability numerical values corresponding to various types of feature data of each object to be grouped to obtain a probability numerical value sum as the tendency probability of each object to be grouped.
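The lookup-and-sum step described above can be sketched as follows. This is only an illustration under stated assumptions: the correspondence table `PROBABILITY_TABLE`, its feature types, and all of its numbers are made up, and the function name `tendency_probability` is hypothetical.

```python
# Hypothetical pre-established correspondence between feature data and
# probability values for the second event (all numbers are made up).
PROBABILITY_TABLE = {
    "age": {"18-30": 0.20, "31-50": 0.10},
    "gender": {"female": 0.15, "male": 0.10},
    "preference_match": {"high": 0.30, "low": 0.05},
}


def tendency_probability(feature_data, table=PROBABILITY_TABLE):
    """Sum the probability values looked up for each type of feature data."""
    return sum(table[kind][value] for kind, value in feature_data.items())


user = {"age": "18-30", "gender": "male", "preference_match": "high"}
score = tendency_probability(user)  # 0.20 + 0.10 + 0.30, about 0.6
```

The table values here were chosen so that any combination sums to at most 1; how the correspondence is established in practice is not covered by this sketch.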

Further, the step of dividing the objects to be grouped according to the tendency probability of the objects to be grouped to obtain a plurality of groups includes:

dividing the objects to be grouped into a plurality of groups each containing the same number of objects to be grouped, according to a grouping sequence determined according to the tendency probability of each object to be grouped; or,

and dividing the objects to be grouped into groups corresponding to different probability intervals according to the tendency probability of the objects to be grouped.

Further, the step of adjusting, for each group in which the number of first experimental subjects differs from the number of first control subjects, the number of first control subjects in the group so that the number of first control subjects in the group is the same as the number of first experimental subjects comprises:

adjusting the number of first control subjects in each group in which the number of first experimental subjects differs from the number of first control subjects as follows:

determining the number of first experimental subjects contained in the group as a first number and the number of first control subjects contained in the group as a second number;

when the first number is smaller than the second number, calculating a first number difference between the second number and the first number; deleting, from the first control subjects contained in the group, as many first control subjects as the first number difference;

when the first number is larger than the second number, calculating a second number difference between the first number and the second number; copying, from the first control subjects contained in the group itself, as many first control subjects as the second number difference; and adding the copied first control subjects to the group.

Further, the step of obtaining difference information representing a difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups according to the number of the second experimental subjects and the number of the second control subjects in each group includes:

calculating a first total number of the second experimental subjects and a second total number of the second control subjects in all the groups according to the number of the second experimental subjects and the number of the second control subjects in each group; calculating a third number difference between the first total number and the second total number as the difference information representing the difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups; or,

calculating a first number ratio of the second experimental subjects in all the groups to the objects to be grouped, and calculating a second number ratio of the second control subjects in all the groups to the objects to be grouped; calculating a ratio difference between the first number ratio and the second number ratio as the difference information representing the difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups.

According to a second aspect of the embodiments of the present disclosure, there is provided a causal relationship generation apparatus, including:

a tendency probability obtaining module configured to obtain the tendency probability of each object to be grouped, wherein the objects to be grouped include first experimental objects and first control objects, a first experimental object is an object in which a first event has occurred, a first control object is an object in which the first event has not occurred, and the tendency probability of each object to be grouped is the probability, estimated according to the feature data of the object to be grouped, that a second event occurs in that object;

a grouping module configured to divide the objects to be grouped according to the tendency probability of each object to be grouped to obtain a plurality of groups;

a number adjustment module configured to adjust, for each group in which the number of first experimental subjects differs from the number of first control subjects, the number of first control subjects in the group so that the number of first control subjects in the group is the same as the number of first experimental subjects;

a difference information obtaining module configured to perform obtaining difference information representing a difference between the total number of second experimental subjects in all the groups and the total number of second control subjects according to the number of the second experimental subjects in each group and the number of the second control subjects, wherein the second experimental subjects are first experimental subjects in which the second event has occurred, and the second control subjects are first control subjects in which the second event has occurred;

a causal relationship generation module configured to perform generating a causal relationship indicating that the first event has an effect on the second event when the difference information indicates that the total number of the second experimental subjects is greater than the total number of the second control subjects.

Further, the causal relationship generation module is further configured to perform, when the difference information indicates that the total number of the second experimental subjects is not greater than the total number of the second control subjects, generating a causal relationship indicating that the first event has no influence on the second event.

Further, the tendency probability obtaining module is specifically configured to perform obtaining of feature data of each object to be grouped, obtain a probability numerical value corresponding to the feature data of each object to be grouped based on a correspondence between the pre-established feature data and a probability numerical value representing an occurrence probability of a second event, and obtain the tendency probability of each object to be grouped according to the obtained probability numerical value.

Further, each object to be grouped has multiple types of feature data, and the tendency probability obtaining module is specifically configured to add the probability numerical values corresponding to each type of feature data of each object to be grouped to obtain a probability numerical value sum as the tendency probability of that object.

Further, the grouping module is specifically configured to divide the objects to be grouped into a plurality of groups with the same number of objects to be grouped according to a grouping sequence determined according to the tendency probability of the objects to be grouped; or dividing the objects to be grouped into groups corresponding to different probability intervals according to the tendency probability of the objects to be grouped.

Further, the number adjusting module is specifically configured to perform the following steps of adjusting the number of the first control subjects in each group in which the number of the first experiment subjects is different from the number of the first control subjects:

Further, the difference information obtaining module is specifically configured to calculate a first total number of the second experimental subjects and a second total number of the second control subjects in all the groups according to the number of the second experimental subjects and the number of the second control subjects in each group, and to calculate a third number difference between the first total number and the second total number as the difference information representing the difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups; or, to calculate a first number ratio of the second experimental subjects in all the groups to the objects to be grouped and a second number ratio of the second control subjects in all the groups to the objects to be grouped, and to calculate a ratio difference between the first number ratio and the second number ratio as the difference information representing the difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups.

According to a third aspect of the embodiments of the present disclosure, there is provided a causal relationship generation electronic device, including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement any of the causal relationship generation methods described above.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform any one of the causal relationship generation methods described above.

According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product comprising instructions that, when executed by a processor of an electronic device, enable the electronic device to perform any one of the causal relationship generation methods described above.

The technical scheme provided by the embodiments of the present disclosure brings at least the following beneficial effects. The tendency probability of each object to be grouped is obtained, wherein the objects to be grouped comprise first experimental objects and first control objects, a first experimental object is an object in which a first event has occurred, a first control object is an object in which the first event has not occurred, and the tendency probability of each object to be grouped is the probability, estimated according to the feature data of the object to be grouped, that a second event occurs in that object. The objects to be grouped are divided according to their tendency probabilities to obtain a plurality of groups. For each group in which the number of first experimental objects differs from the number of first control objects, the number of first control objects in the group is adjusted so that it is the same as the number of first experimental objects. Difference information representing the difference between the total number of second experimental objects and the total number of second control objects in all the groups is obtained according to the number of second experimental objects and the number of second control objects in each group, wherein the second experimental objects are the first experimental objects in which the second event has occurred and the second control objects are the first control objects in which the second event has occurred. When the difference information indicates that the total number of second experimental objects is larger than the total number of second control objects, a causal relationship indicating that the first event has an influence on the second event is generated. Because the tendency probability of each object to be grouped is the probability of the second event estimated from the feature data of that object, and the objects to be grouped are divided into a plurality of groups according to the tendency probability, the experimental objects and control objects contained in each group all have the same or similar occurrence probability for the second event. It is therefore no longer necessary, as in the prior art, to compare experimental objects and control objects one by one to determine whether they are the same or similar research objects, which shortens the calculation process and reduces the time and computing-resource cost of causal inference.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a flow chart illustrating a causal relationship generation method according to an exemplary embodiment.

FIG. 2 is a flow chart illustrating another causal relationship generation method according to an example embodiment.

FIG. 3 is a flow chart illustrating a tendency probability obtaining method according to an exemplary embodiment.

FIG. 4 is a block diagram illustrating a causal relationship generation apparatus according to an exemplary embodiment.

FIG. 5 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

FIG. 1 is a flow chart illustrating a causal relationship generation method according to an exemplary embodiment, as shown in FIG. 1, including the following steps.

S101: obtaining the tendency probability of each object to be grouped.

In this step, the objects to be grouped include first experimental objects and first control objects, a first experimental object is an object in which a first event has occurred, a first control object is an object in which the first event has not occurred, and the tendency probability of each object to be grouped is the probability, estimated according to the feature data of the object to be grouped, that a second event occurs in that object.

The object to be grouped may be a research object in causal inference, and may include a first experimental object and a first control object, where the first experimental object is an object that satisfies a first preset condition, and the first control object is an object that does not satisfy the first preset condition.

As known to those skilled in the art, causal inference is commonly used to analyze the causal relationship between two events, for example, the effect of a first event on a second event, such as the effect of smoking on lung cancer, or the effect of a user liking a page on the sharing of that page.

Therefore, the first preset condition may be the "cause" studied by the causal inference. For example, when studying whether the first event affects the second event, the first preset condition may be that the first event has occurred in the object to be grouped. When the causal inference studies whether smoking (the first event) affects lung cancer (the second event), the first preset condition may be whether the subject smokes; in this case, the first experimental object is a subject who smokes and the first control object is a subject who does not smoke. When the causal inference studies whether a user liking a page affects page sharing, the first preset condition may be whether the user has liked the page; in this case, the first experimental object is a subject who has liked the page and the first control object is a subject who has not.

In an example embodiment, the objects to be grouped may be any type of things, such as people, animals, buildings, etc., it should be noted that the above description of the objects to be grouped is only for the purpose of clearly explaining the present disclosure, and those skilled in the art can know that the objects to be grouped are abstract data, symbols, etc. for a machine applying the above method.

The above feature data may be data representing features of the objects to be grouped. For example, when the objects to be grouped are humans, the feature data may be data such as age, sex, education level, income, weekly exercise duration, or daily sleep time. The feature data of the same research object may differ between causal inferences. For example, when the influence of smoking on lung cancer is studied and the objects to be grouped are people, the feature data may be data such as age, gender, or weekly exercise duration; when the influence of a user liking a page on page sharing is studied, the feature data may be data such as age, gender, or the degree of matching between the user's browsing preference and the content of the currently browsed interface.

S102: dividing the objects to be grouped according to the tendency probability of each object to be grouped to obtain a plurality of groups.

In this step, each object to be grouped may be divided in the following two ways to obtain a plurality of groups.

The first mode is as follows: dividing the objects to be grouped into a plurality of groups each containing the same number of objects, according to a grouping sequence determined according to the tendency probability of each object to be grouped.

For example, the number of the objects to be grouped is 2000, and every 100 objects to be grouped can be divided into one group according to the order from small to large of the tendency probability.
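The equal-size grouping in this example can be sketched as below. The dictionary layout (a `"p"` key holding the tendency probability), the function name `split_into_equal_groups`, and the generated sample data are assumptions made for illustration only.

```python
def split_into_equal_groups(objects, group_size):
    """Sort objects by tendency probability and cut them into fixed-size groups.

    Each object is a dict with a "p" key (its tendency probability). If the
    number of objects is not a multiple of group_size, the last group is
    smaller; the text does not say how that case should be handled.
    """
    ordered = sorted(objects, key=lambda obj: obj["p"])
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


# 2000 objects with made-up, evenly spread tendency probabilities.
objects = [{"id": i, "p": (i * 7919 % 2000) / 2000} for i in range(2000)]
groups = split_into_equal_groups(objects, 100)
print(len(groups), len(groups[0]))  # 20 100
```

Because the objects are sorted before slicing, every object in one group has a tendency probability no larger than any object in the next group.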

The second mode is as follows: dividing the objects to be grouped into groups corresponding to different probability intervals according to the tendency probability of each object to be grouped.

Since the tendency probability of an object to be grouped is the estimated probability that the second event occurs in that object, its value range is [0, 1].

Illustratively, the value range can be divided into 10 value intervals: [0, 0.1], [0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5], [0.5, 0.6], [0.6, 0.7], [0.7, 0.8], [0.8, 0.9], [0.9, 1]. After the tendency probability of each object to be grouped is determined, the objects to be grouped whose tendency probability lies between 0 and 0.1 can be placed in the same group, and the remaining objects can be divided into groups in the same way according to their tendency probabilities.

Through this step, objects to be grouped with the same or similar tendency probabilities are divided into the same group, which means that the characteristics of the experimental objects and control objects contained in the same group have the same or similar influence on the research result.
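Interval-based grouping can be sketched as follows. The text lists intervals that share their end points, so the handling of boundary values here is an assumption: a boundary value such as 0.1 is placed in the upper interval, and 1.0 is placed in the last interval. The names `interval_index` and `group_by_interval` and the sample objects are hypothetical.

```python
from collections import defaultdict


def interval_index(p, n_intervals=10):
    """Return the index of the equal-width interval of [0, 1] containing p.

    Assumption: a boundary value such as 0.1 goes to the upper interval,
    and p == 1.0 goes to the last interval.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("tendency probability must lie in [0, 1]")
    return min(int(p * n_intervals), n_intervals - 1)


def group_by_interval(objects, n_intervals=10):
    """Group objects (dicts with a "p" key) by probability interval index."""
    groups = defaultdict(list)
    for obj in objects:
        groups[interval_index(obj["p"], n_intervals)].append(obj)
    return dict(groups)


objects = [
    {"id": "a", "p": 0.05},
    {"id": "b", "p": 0.08},
    {"id": "c", "p": 0.42},
    {"id": "d", "p": 1.0},
]
groups = group_by_interval(objects)
print(sorted(groups))  # [0, 4, 9]
```

Unlike the equal-size mode, groups produced this way can differ in size, and intervals that receive no objects simply do not appear in the result.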

S103: for each group in which the number of first experimental objects differs from the number of first control objects, adjusting the number of first control objects in the group so that the number of first control objects in the group is the same as the number of first experimental objects.

In this step, in order to avoid the problem of a low matching coverage rate between the first experimental objects and the first control objects, and to ensure the fairness of the causal inference result, the number of first experimental objects and the number of first control objects with the same or similar characteristics need to be made the same, whereas the numbers of first experimental objects and first control objects in a given group may be the same or different.

In one embodiment, the number of first control objects in each group in which the number of first experimental objects differs from the number of first control objects may be adjusted in the following manner:

First, the number of first experimental objects contained in the group is determined as a first number, and the number of first control objects contained in the group is determined as a second number. When the first number is smaller than the second number, a first number difference between the second number and the first number is calculated, and as many first control objects as the first number difference are deleted from the first control objects contained in the group. When the first number is greater than the second number, a second number difference between the first number and the second number is calculated, as many first control objects as the second number difference are copied from the first control objects contained in the group itself, and the copied first control objects are added to the group.

Illustratively, suppose the group contains 15 first control objects and 20 first experimental objects. To make the number of first control objects the same as the number of first experimental objects, 5 first control objects may be randomly selected from the 15 first control objects contained in the group, and their data copied and added to the group, so that the number of first control objects in the group becomes 20, consistent with the number of first experimental objects.
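A minimal sketch of this adjustment is given below. The function name `balance_controls` and the dictionary representation of objects are hypothetical. Random selection of the controls to copy follows the example above; choosing the controls to delete at random, and sampling with replacement when the shortfall exceeds the number of available controls, are assumptions that the text does not specify.

```python
import random


def balance_controls(experimental, controls, rng=None):
    """Return a control list whose length equals len(experimental)."""
    if rng is None:
        rng = random.Random()
    first_number, second_number = len(experimental), len(controls)
    if first_number < second_number:
        # Delete (second_number - first_number) controls by keeping a
        # random subset of size first_number.
        return rng.sample(controls, first_number)
    if first_number > second_number:
        if not controls:
            raise ValueError("cannot copy controls from a group that has none")
        shortfall = first_number - second_number
        if shortfall <= second_number:
            picked = rng.sample(controls, shortfall)  # distinct controls
        else:
            picked = rng.choices(controls, k=shortfall)  # with replacement
        # Copy the data of the picked controls and add the copies to the group.
        return controls + [dict(c) for c in picked]
    return list(controls)


experimental = [{"id": f"e{i}"} for i in range(20)]
controls = [{"id": f"c{i}"} for i in range(15)]
balanced = balance_controls(experimental, controls, random.Random(0))
print(len(balanced))  # 20
```

The original 15 controls are kept unchanged at the front of the returned list, followed by the 5 copies.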

S104: obtaining difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups according to the number of second experimental subjects and the number of second control subjects in each group.

In this step, the second experimental subject is the first experimental subject having occurred with the second event, and the second control subject is the first control subject having occurred with the second event.

In one embodiment, a first total number of the second experimental subjects and a second total number of the second control subjects in all the groups may be calculated according to the number of the second experimental subjects and the number of the second control subjects in each group, and a third number difference between the first total number and the second total number may be calculated as the difference information indicating the difference between the total number of the second experimental subjects and the total number of the second control subjects in all the groups.

Illustratively, group A contains 10 second experimental subjects and 5 second control subjects, group B contains 20 second experimental subjects and 10 second control subjects, and group C contains 30 second experimental subjects and 5 second control subjects. The first total number of second experimental subjects in all the groups is then 60, the second total number of second control subjects is 20, and the third number difference between the first total number and the second total number is 40. The difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups is therefore that the total number of second experimental subjects is 40 more than the total number of second control subjects.
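The count-based calculation with groups A, B and C can be reproduced with the short sketch below; the function name `count_difference` and the tuple layout are hypothetical choices made for illustration.

```python
def count_difference(groups):
    """Compute totals and their difference across all groups.

    groups maps a group name to a pair
    (number of second experimental subjects, number of second control subjects).
    """
    first_total = sum(e for e, _ in groups.values())
    second_total = sum(c for _, c in groups.values())
    return first_total, second_total, first_total - second_total


groups = {"A": (10, 5), "B": (20, 10), "C": (30, 5)}
first_total, second_total, third_difference = count_difference(groups)
print(first_total, second_total, third_difference)  # 60 20 40
```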

In one embodiment, a first quantity ratio of the number of second experimental subjects in all the groups to the number of objects to be grouped, and a second quantity ratio of the number of second control subjects in all the groups to the number of objects to be grouped, may be calculated. The ratio difference between the first quantity ratio and the second quantity ratio is then calculated and used as the difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups.

Illustratively, suppose the number of objects to be grouped is 200, the number of second experimental subjects in all the groups is 100, and the number of second control subjects in all the groups is 50. The first quantity ratio of second experimental subjects to objects to be grouped is then 100/200 = 0.5, and the second quantity ratio of second control subjects to objects to be grouped is 50/200 = 0.25. The ratio difference between the first quantity ratio and the second quantity ratio is 0.5 - 0.25 = 0.25. The difference information therefore indicates that the proportion of second experimental subjects is 25 percentage points higher than the proportion of second control subjects.
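The ratio-based variant can be sketched in the same way, using the ratios 100/200 and 50/200 from the example above. The function name and argument names are assumptions made for illustration only.

```python
def ratio_difference(n_second_experimental, n_second_control, n_objects):
    """Return (first_ratio, second_ratio, ratio_difference).

    Each ratio divides a count of subjects for which the second event
    occurred by the number of objects to be grouped.
    """
    if n_objects <= 0:
        raise ValueError("the number of objects to be grouped must be positive")
    first_ratio = n_second_experimental / n_objects
    second_ratio = n_second_control / n_objects
    return first_ratio, second_ratio, first_ratio - second_ratio


print(ratio_difference(100, 50, 200))  # (0.5, 0.25, 0.25)
```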

S105: and when the difference information indicates that the total number of the second experimental objects is larger than that of the second control objects, generating a causal relationship indicating that the first event has influence on the second event.

In this step, when the difference information indicates that the total number of the second experimental subjects is greater than the total number of the second control subjects, a causal relationship indicating that the first event has an influence on the second event may be generated. Optionally, this condition is met when the first total number is greater than the second total number, or when the first quantity ratio is greater than the second quantity ratio.

Further, in order to ensure the accuracy of the generated causal relationship, a threshold may be set, and the causal relationship indicating that the first event has an influence on the second event is generated only when the difference between the first total number and the second total number is greater than the threshold, or the difference between the first number ratio and the second number ratio is greater than the threshold.
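A minimal sketch of this decision rule follows. The function name `has_influence` and the default threshold of 0 are assumptions; with the default, the check reduces to the plain "greater than" condition of step S105.

```python
def has_influence(difference, threshold=0.0):
    """Decide whether to generate 'the first event influences the second event'.

    `difference` is either the count difference (first total number minus
    second total number) or the ratio difference; `threshold` must be
    expressed in the same unit. Both names are hypothetical.
    """
    return difference > threshold


print(has_influence(40))                # True: 60 vs 20 subjects
print(has_influence(40, threshold=50))  # False: difference does not exceed the threshold
print(has_influence(0.25, threshold=0.1))  # True for the ratio-based variant
```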

In the causal relationship generation method provided by the embodiment of the present disclosure, the tendency probability of each object to be grouped is obtained. The objects to be grouped include first experimental objects, for which a first event has occurred, and first control objects, for which the first event has not occurred. The tendency probability of an object to be grouped is the probability, estimated from the feature data of that object, that the second event occurs for it. The objects to be grouped are divided into a plurality of groups according to their tendency probabilities. For each group in which the number of first experimental objects differs from the number of first control objects, the number of first control objects in the group is adjusted so that it equals the number of first experimental objects. Difference information representing the difference between the total number of second experimental objects and the total number of second control objects in all the groups is then obtained from the number of second experimental objects and the number of second control objects in each group, where a second experimental object is a first experimental object for which the second event has occurred and a second control object is a first control object for which the second event has occurred. When the difference information indicates that the total number of second experimental objects is greater than the total number of second control objects, a causal relationship indicating that the first event has an influence on the second event is generated. Because the tendency probability is estimated from the feature data of each object to be grouped, and the objects are divided into groups according to that probability, the experimental objects and control objects contained in each group have the same or a similar probability of the second event occurring. This avoids the prior-art need to compare experimental objects and control objects one by one to determine whether they are the same or similar research objects, which reduces the amount of calculation and the time and computing resource cost of causal inference.

In another embodiment of the present invention, as shown in fig. 2, on the basis of the causal relationship generation method shown in fig. 1, after step S104, the following steps may be further included:

S106: when the difference information indicates that the total number of the second experimental subjects is not greater than the total number of the second control subjects, generating a causal relationship indicating that the first event has no influence on the second event.

In this step, when the difference information indicates that the total number of the second experimental subjects is not greater than the total number of the second control subjects, this indicates that, among subjects with the same or similar characteristics, the second event is no more likely to occur when the first event has occurred than when it has not; that is, the first event has no influence on the second event.

In an embodiment, step S101 may be specifically implemented by the tendency probability obtaining method shown in fig. 3, which includes the following steps:

S301: acquiring feature data of each object to be grouped.

In this step, the feature data of each object to be grouped may be collected in advance; optionally, the feature data of each object to be grouped may be read from a pre-established feature database.

S302: and acquiring a probability numerical value corresponding to the characteristic data of each object to be grouped based on the corresponding relation between the pre-established characteristic data and the probability numerical value representing the occurrence probability of the second event.

In this step, as described above, various characteristics of the objects to be grouped may affect the occurrence of the second event. For example, when studying the effect of smoking on lung cancer, the age, weekly exercise duration, or daily sleep duration of an object to be grouped may each affect whether that object develops lung cancer. Generally speaking, objects that are older, exercise less per week, or sleep less per day are more likely to develop lung cancer.

Therefore, a correspondence between the feature data of the objects to be grouped and a probability value representing the occurrence probability of the second event may be preset. For example, when studying the influence of smoking on lung cancer, the second event is developing lung cancer, the feature data is the age of the object, and the following correspondence is established in advance: the probability of lung cancer is 0.01% at age 20, 0.02% at age 40, 0.03% at age 60, and 0.05% at age 80. Thus, when the feature data of an object to be grouped is age 40, the probability value for that object is 0.02%.
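As a rough illustration, such a pre-established correspondence could be held in a simple lookup table. The names `AGE_TO_PROBABILITY` and `probability_for_age` are hypothetical, and only the four ages listed above are covered, because the text does not say how other ages are handled.

```python
# Correspondence from the example above, with percentages written as fractions.
AGE_TO_PROBABILITY = {20: 0.0001, 40: 0.0002, 60: 0.0003, 80: 0.0005}


def probability_for_age(age):
    """Look up the pre-established probability value for an age.

    Ages that are not in the table raise KeyError; interpolating or
    bucketing other ages would be an additional assumption.
    """
    if age not in AGE_TO_PROBABILITY:
        raise KeyError(f"no pre-established probability value for age {age}")
    return AGE_TO_PROBABILITY[age]


print(probability_for_age(40))  # 0.0002, i.e. 0.02%
```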

In an exemplary embodiment, the preset correspondence between the feature data of the object to be grouped and the probability value representing the occurrence probability of the second event may be obtained through statistics of big data, or may be empirically determined according to existing research results.

S303: and obtaining the tendency probability of each object to be grouped according to the obtained probability numerical value.

In this step, after the probability value is determined, the estimated probability of the second event occurring to the object to be grouped may be determined, and further, the tendency probability of each object to be grouped may be obtained according to the obtained probability value.

In one example embodiment, each object to be grouped has multiple types of feature data, e.g., age, gender, etc. In that case, the tendency probability of each object to be grouped may be determined by adding the probability values corresponding to each type of feature data of that object and taking the sum as its tendency probability.

Optionally, a weight may further be set for each type of feature data. For example, if age has a larger influence than gender, the weight of age is higher than that of gender. Suppose the feature data of an object to be grouped includes age: 20 and sex: male, the corresponding probability values are 0.01% and 0.05%, respectively, and the corresponding weights are 0.2 and 0.1, respectively. By weighted summation, the tendency probability of the object to be grouped is 0.01% × 0.2 + 0.05% × 0.1 = 0.007%.
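The plain and weighted summations can be sketched as follows. The function name `tendency_probability` and the dictionary keys are illustrative assumptions; percentages are written as fractions.

```python
def tendency_probability(probabilities, weights=None):
    """Combine per-feature probability values into a tendency probability.

    `probabilities` maps a feature type to its probability value.
    Without `weights` the values are simply added; with `weights`
    (same keys) a weighted sum is returned.
    """
    if weights is None:
        return sum(probabilities.values())
    return sum(value * weights[name] for name, value in probabilities.items())


# Example from the text: age 20 -> 0.01%, male -> 0.05%, weights 0.2 and 0.1.
probs = {"age": 0.0001, "sex": 0.0005}
weights = {"age": 0.2, "sex": 0.1}
print(f"{tendency_probability(probs, weights):.5%}")  # 0.00700%
```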

In an exemplary embodiment, steps S302 and S303 may be performed by a pre-established XGBoost (Extreme Gradient Boosting) model. In this case, the feature data of each object to be grouped may be input to the XGBoost model to obtain the tendency probability of each object to be grouped. The XGBoost model can improve the accuracy of the output result while maintaining model training efficiency.

FIG. 4 is a block diagram illustrating a causal relationship generation apparatus according to an exemplary embodiment. Referring to fig. 4, the apparatus includes a tendency probability obtaining module 401, a grouping module 402, a number adjusting module 403, a difference information obtaining module 404, and a causal relationship generating module 405.

A tendency probability obtaining module 401 configured to obtain the tendency probability of each object to be grouped, where the objects to be grouped include first experimental objects and first control objects, a first experimental object is an object for which a first event has occurred, a first control object is an object for which the first event has not occurred, and the tendency probability of each object to be grouped is the probability, estimated from the feature data of the object to be grouped, that a second event occurs for that object;

a grouping module 402 configured to perform dividing of each object to be grouped according to the tendency probability of each object to be grouped to obtain a plurality of groups;

a number adjustment module 403 configured to perform, for each of the groups in which the number of the first subjects is different from the number of the first control subjects, adjusting the number of the first control subjects in the group so that the number of the first control subjects in the group is the same as the number of the first subjects;

a difference information obtaining module 404 configured to obtain difference information representing a difference between the total number of the second experimental subjects in all the groups and the total number of the second control subjects according to the number of the second experimental subjects in each group and the number of the second control subjects, where the second experimental subjects are the first experimental subjects having the second event, and the second control subjects are the first control subjects having the second event;

a causal relationship generation module 405 configured to perform generating a causal relationship indicating that the first event has an influence on the second event when the difference information indicates that the total number of the second experimental subjects is greater than the total number of the second control subjects.

Further, the causal relationship generation module 405 is further configured to perform generating a causal relationship indicating that the first event has no influence on the second event when the difference information indicates that the total number of the second experimental subjects is not greater than the total number of the second control subjects.

Further, the tendency probability obtaining module 401 is specifically configured to perform obtaining of feature data of each object to be grouped, obtain a probability numerical value corresponding to the feature data of each object to be grouped based on a correspondence between the pre-established feature data and a probability numerical value representing an occurrence probability of the second event, and obtain the tendency probability of each object to be grouped according to the obtained probability numerical value.

Further, the tendency probability obtaining module 401 is specifically configured to add the probability values corresponding to each type of feature data of each object to be grouped and to take the sum of the probability values as the tendency probability of that object.

Further, the grouping module 402 is specifically configured to perform dividing each object to be grouped into a plurality of groups with the same number of objects to be grouped according to a grouping sequence determined according to the tendency probability of each object to be grouped; or dividing each object to be grouped into groups corresponding to different probability intervals according to the tendency probability of each object to be grouped.
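The two grouping strategies named above can be sketched as follows. The function names, the `(object_id, tendency_probability)` tuple layout, the half-open interval convention, and the requirement that the object count be divisible by the number of groups are all assumptions made for illustration.

```python
import bisect


def group_by_rank(objects, n_groups):
    """Sort objects by tendency probability and cut into equal-size groups.

    `objects` is a list of (object_id, tendency_probability) tuples.
    """
    if n_groups <= 0 or not objects or len(objects) % n_groups != 0:
        raise ValueError("object count must be a positive multiple of n_groups")
    ordered = sorted(objects, key=lambda item: item[1])
    size = len(ordered) // n_groups
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def group_by_interval(objects, boundaries):
    """Assign objects to probability intervals [b[i], b[i+1]).

    `boundaries` is a sorted list of interval edges, e.g. [0.0, 0.1, 0.2].
    """
    groups = {i: [] for i in range(len(boundaries) - 1)}
    for item in objects:
        index = bisect.bisect_right(boundaries, item[1]) - 1
        if index not in groups:
            raise ValueError(f"probability {item[1]} lies outside all intervals")
        groups[index].append(item)
    return groups


objs = [("a", 0.05), ("b", 0.15), ("c", 0.02), ("d", 0.12)]
print(group_by_rank(objs, 2))
print(group_by_interval(objs, [0.0, 0.1, 0.2]))
```

Rank-based grouping keeps group sizes equal, whereas interval-based grouping keeps the probability range of each group fixed; which is preferable depends on how the tendency probabilities are distributed.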

Further, the number adjusting module 403 is specifically configured to perform the following steps for each group in which the number of first experimental objects is different from the number of first control objects:

when the first number is smaller than the second number, calculating a first number difference between the second number and the first number, and deleting, from the first control objects contained in the group itself, a number of first control objects equal to the first number difference;

when the first number is larger than the second number, calculating a second number difference between the first number and the second number, copying, from the first control objects contained in the group itself, a number of first control objects equal to the second number difference, and adding the copied first control objects to the group.
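The two adjustment branches can be sketched as below, reading the first number as the count of first experimental objects in the group and the second number as the count of first control objects, which is consistent with control objects being deleted when the first number is smaller. The function name `balance_controls`, random selection of the objects to delete, and sampling with replacement when copying are assumptions.

```python
import random


def balance_controls(experimental, controls, rng=None):
    """Return a control list whose length equals len(experimental)."""
    rng = rng or random.Random()
    first_number = len(experimental)
    second_number = len(controls)
    if first_number < second_number:
        # Deleting (second_number - first_number) controls is the same as
        # keeping a random subset of size first_number.
        return rng.sample(controls, first_number)
    if first_number > second_number:
        if not controls:
            raise ValueError("cannot copy from an empty control set")
        difference = first_number - second_number
        # Copies are drawn with replacement (an assumption), so the same
        # control object may be duplicated more than once.
        return list(controls) + rng.choices(controls, k=difference)
    return list(controls)


balanced = balance_controls(list(range(20)), list(range(100, 115)), random.Random(0))
print(len(balanced))  # 20
```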

Further, the difference information obtaining module 404 is specifically configured to: calculate a first total number of second experimental subjects and a second total number of second control subjects in all the groups according to the number of second experimental subjects and the number of second control subjects in each group, and calculate a third number difference between the first total number and the second total number as the difference information representing the difference between the total number of second experimental subjects and the total number of second control subjects in all the groups; or calculate a first quantity ratio of the number of second experimental subjects in all the groups to the number of objects to be grouped and a second quantity ratio of the number of second control subjects in all the groups to the number of objects to be grouped, and calculate the ratio difference between the first quantity ratio and the second quantity ratio as that difference information.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 5 is a block diagram illustrating an electronic device 500 for causal relationship generation, according to an example embodiment. For example, the electronic device 500 may be provided as a server. Referring to fig. 5, the apparatus 500 includes a processing component 522 that further includes one or more processors and memory resources, represented by memory 532, for storing instructions, such as applications, that are executable by the processing component 522. The application programs stored in memory 532 may include one or more modules that each correspond to a set of instructions. Further, the processing component 522 is configured to execute instructions to perform the causal relationship generation method described above.

The apparatus 500 may also include a power component 526 configured to perform power management of the apparatus 500, a wired or wireless network interface 550 configured to connect the apparatus 500 to a network, and an input/output (I/O) interface 558. The apparatus 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or a similar operating system.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of generating a causal relationship, comprising:

2. The causal relationship generation method of claim 1, further comprising:

3. A causal relationship generation method according to any of claims 1-2, wherein said step of obtaining a tendency probability of each object to be grouped comprises:

acquiring characteristic data of each object to be grouped;

4. A causal relationship generation method according to claim 3, wherein each object to be grouped has a plurality of types of feature data;

5. The causal relationship generation method of any one of claims 1 to 2 or 4, wherein the step of dividing the objects to be grouped into a plurality of groups according to the tendency probability of the objects to be grouped comprises:

6. A causal relationship generation method as claimed in any one of claims 1-2 or 4, wherein said step of adjusting the number of first control objects in a group such that the number of first control objects in a group is the same as the number of first test objects for each group where the number of first test objects is different from the number of first control objects comprises:

7. The causal relationship generation method of any one of claims 1-2 or 4, wherein the step of obtaining difference information representing a difference between the total number of second subjects and the total number of second control subjects in all the groups according to the number of second subjects and the number of second control subjects in each group comprises:

8. A causal relationship generation apparatus, comprising:

9. A causal relationship generation electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the causal relationship generation method of any of claims 1 to 7.

10. A storage medium having instructions that, when executed by a processor of a causal relationship generation electronic device, enable the causal relationship generation electronic device to perform the causal relationship generation method of any one of claims 1 to 7.