WO2020215237A1 - Method, device and medium for data processing - Google Patents

Method, device and medium for data processing

Info

Publication number
WO2020215237A1
WO2020215237A1
Authority
WO
WIPO (PCT)
Prior art keywords
factor
factors
causality
sequence
candidate
Prior art date
Application number
PCT/CN2019/084049
Other languages
English (en)
French (fr)
Inventor
卫文娟
刘春辰
崔绿叶
冯璐
Original Assignee
日本电气株式会社 (NEC Corporation)
卫文娟
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本电气株式会社 (NEC Corporation) and 卫文娟
Priority to US17/605,731 (published as US20220215291A1)
Priority to JP2021563019A (published as JP2022537009A)
Priority to PCT/CN2019/084049 (published as WO2020215237A1)
Publication of WO2020215237A1
Priority to JP2023190772A (published as JP2024016198A)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The embodiments of the present disclosure relate to the field of machine learning, and more specifically, to methods, devices, and computer-readable storage media for data processing.
  • Causal discovery has a wide range of real-world applications, for example in the supply chain, healthcare, and retail fields.
  • Causal discovery, as used here, refers to discovering the causal relationships among multiple factors from sample data about those factors.
  • For example, the results of causal discovery can be used to help formulate sales strategies; in the healthcare field, they can be used to help formulate treatment plans for patients.
  • The embodiments of the present disclosure provide methods, devices, and computer-readable storage media for data processing.
  • In a first aspect, a method for data processing is provided. The method includes: obtaining an observation sample set about multiple factors, each observation sample in the set including corresponding observed values of the multiple factors; determining, based on the observation sample set, a set of dependency relationships existing among the multiple factors, where a dependency relationship in the set indicates a pair of mutually related factors among the multiple factors; and determining, based on the dependency relationship set, a causal relationship sequence of the multiple factors, the causal relationship sequence indicating that, in a pair of mutually related factors, one factor is the cause of the other.
  • A second aspect of the present disclosure provides a device for data processing.
  • The device includes at least one processing unit and at least one memory.
  • The at least one memory is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit.
  • The instructions, when executed by the at least one processing unit, cause the device to perform actions.
  • The actions include: obtaining an observation sample set about multiple factors, each observation sample in the set including corresponding observed values of the multiple factors; determining, based on the observation sample set, a set of dependency relationships among the multiple factors, where one dependency relationship in the set indicates a pair of related factors among the multiple factors; and determining, based on the dependency relationship set, a causal relationship sequence of the multiple factors, which indicates that, in a pair of mutually related factors, one factor is the cause of the other.
  • A third aspect of the present disclosure provides a computer-readable storage medium having machine-executable instructions stored thereon; the machine-executable instructions, when executed by a device, cause the device to perform the method described in the first aspect of the present disclosure.
  • FIGS. 1A and 1B show block diagrams of example systems for data processing according to embodiments of the present disclosure.
  • FIG. 2 shows a schematic diagram of determining the causal relationships among multiple factors according to an embodiment of the present disclosure.
  • FIG. 3 shows a flowchart of an example method according to an embodiment of the present disclosure.
  • FIG. 4 shows a flowchart of an example method according to an embodiment of the present disclosure.
  • FIG. 5 shows a flowchart of an example method according to an embodiment of the present disclosure.
  • FIG. 6 shows a flowchart of an example method according to an embodiment of the present disclosure.
  • FIG. 7 shows a schematic block diagram of an example device that can be used to implement embodiments of the present disclosure.
  • The term "causal structure" generally refers to a structure that describes the causal relationships among the factors in a system; herein it is also referred to as a "causal relationship sequence".
  • The term "factor" is also referred to as "variable".
  • The term "observation sample" refers to a set of observed values of multiple factors that can be directly observed; factors that can be directly observed are also called "observed variables".
  • In the healthcare example, a series of physiological indicators of patients (i.e., observed values of a series of factors) is collected.
  • By discovering the causal relationships among these physiological indicators (i.e., factors), the physiological indicators that affect the patient's blood pressure can be determined.
  • The patient's blood pressure can then be kept stable by influencing those physiological indicators or formulating corresponding strategies for them.
  • In the retail example, the collected data may include external factor data (such as weather, season, temperature, date, and store size),
  • sales data of the target product (such as its sales volume and price),
  • and sales data of one or more related products (for example, ice cream).
  • Each type of collected data serves as the observed values of one factor.
  • The sales of the target commodity can be increased by changing the observed values of the one or more factors that affect it, or by formulating corresponding strategies for those factors.
  • In the software development example, information about various factors of software development can be collected, including but not limited to overall information about the development (such as the development cycle and development resources) and information about each stage of software development.
  • Information about each stage may include, for example, information about the architecture stage (such as the software architecture method and the number of architecture levels), the coding stage (such as code length, number of functions, programming language, and number of modules), the testing stage (such as the accuracy or failure rate of unit tests, black-box tests, and white-box tests), and the operating stage after the software is released (such as the accuracy or failure rate in operation).
  • Each type of collected data serves as the observed values of one factor. By discovering the causal relationships among these factors, one or more factors that affect the software development cycle and/or failure rate can be determined. The development cycle and/or failure rate can then be reduced by changing the observed values of those factors or formulating corresponding strategies for them.
  • Some traditional schemes mainly target systems with few factors (for example, no more than 100) and use constraint-based or score-based methods that search the entire variable space for possible causal relationships.
  • In constraint-based methods, the causal structure is usually discovered by applying conditional independence tests to the factors.
  • As the number of factors grows, however, the results of the conditional independence tests become unreliable.
  • Moreover, to test the dependency between any two factors, such methods often need conditioning sets ranging from the empty set to the set of all other factors, which results in huge computational overhead.
  • In score-based methods, the causal structure is usually found by optimizing a score that measures how well the causal structure matches the sample data.
  • These schemes are often difficult to apply to discovering causal structures over many factors (for example, hundreds or thousands), also called "high-dimensional causal structures".
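A quick count illustrates the overhead described above. Under the simplifying assumption that every factor pair is tested against every subset of the remaining factors (practical constraint-based algorithms prune many of these tests), the number of conditioning sets is $\binom{D}{2}2^{D-2}$, whereas allowing at most one conditioning factor gives $\binom{D}{2}(D-1)$. The function names are illustrative.

```python
from math import comb


def full_ci_test_count(d: int) -> int:
    """Conditioning sets summed over all pairs when every subset of the other factors is used."""
    # Each of the C(d, 2) pairs can be conditioned on any subset of the remaining d - 2 factors.
    return comb(d, 2) * 2 ** (d - 2)


def order1_ci_test_count(d: int) -> int:
    """Conditioning sets summed over all pairs when at most one other factor is conditioned on."""
    # The empty conditioning set plus each single remaining factor.
    return comb(d, 2) * (1 + (d - 2))


for d in (5, 20, 100):
    print(d, full_ci_test_count(d), order1_ci_test_count(d))
```

For D = 100 the exhaustive count exceeds $10^{32}$, while the restricted count stays below half a million.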
  • According to embodiments of the present disclosure, a solution for data processing is proposed.
  • This solution can discover high-dimensional causal structures quickly and accurately, thereby solving the problems described above and/or other potential problems.
  • The embodiments of the present disclosure are described in detail below in combination with the example scenarios above. It should be understood that this is for illustrative purposes only and is not intended to limit the scope of the present invention in any way.
  • FIG. 1A shows an example block diagram of a system 100 for data processing according to an embodiment of the present disclosure.
  • The system 100 can, for example, discover causal relationships among multiple factors. It should be understood that the system 100 shown in FIG. 1A is only an example in which the embodiments of the present disclosure can be implemented, and is not intended to limit the scope of the present disclosure. The embodiments of the present disclosure are also applicable to other systems or architectures.
  • The system 100 may include a causality determination device 120.
  • The causality determination device 120 may receive the observation sample set 110 about multiple factors and determine from it the causal relationship sequence 130, which indicates the causal relationships among the multiple factors.
  • The system 100 may further include an observation sample collection device (not shown in FIG. 1A) for collecting the observation sample set 110 related to the multiple factors.
  • The observation sample collection device can collect observed values of the multiple factors in real time, periodically, or aperiodically to obtain the observation sample set 110.
  • The observation sample collection device may include one or more collection units for collecting observed values of different types of factors.
  • The observation sample set 110 may include observation samples of multiple factors related to one or more target factors.
  • The vector $x_i \in \mathbb{R}^N$ (where $1 \le i \le D$) represents the N observed values of the i-th factor. For example, $x_{i,n}$ (where $1 \le i \le D$ and $1 \le n \le N$) represents the n-th observed value of the i-th factor.
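A minimal sketch of this notation, assuming the observation sample set is stored as a D × N NumPy array whose rows are the vectors $x_i$ and whose columns are individual observation samples. The array layout, the toy values, and the helper `x` are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

# Hypothetical toy data: D = 3 factors, N = 4 observation samples.
# Row i - 1 holds x_i, the N observed values of the i-th factor (the text counts from 1).
X = np.array([
    [1.0, 2.0, 3.0, 4.0],      # x_1
    [0.5, 0.1, 0.9, 0.3],      # x_2
    [10.0, 20.0, 30.0, 40.0],  # x_3
])
D, N = X.shape


def x(i: int, n: int) -> float:
    """Return x_{i,n}, the n-th observed value of the i-th factor (both indices 1-based)."""
    if not (1 <= i <= D and 1 <= n <= N):
        raise IndexError("require 1 <= i <= D and 1 <= n <= N")
    return float(X[i - 1, n - 1])


# Column n - 1 is one observation sample: the observed values of all D factors.
sample_2 = X[:, 1]
print(D, N, x(3, 2), sample_2)
```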
  • For example, suppose the target factor is "customer satisfaction".
  • The factor set V may include one or more of: factors related to customer attributes (for example, customer level and customer number), factors related to customer behavior (for example, monthly Internet traffic consumption, ratio of free traffic, and total monthly cost of Internet traffic), factors related to customer feedback (for example, number of complaints and customer satisfaction), and customer-specific strategy factors (for example, the number and timing of reminders about exceeding the plan allowance).
  • Taking the factor "customer level" (i.e., $v_i$) as an example, the vector $x_i$ may consist of the customer levels of N customers, and $x_{i,n}$ may represent the customer level of the n-th of the N customers.
  • Taking the factor "monthly Internet traffic consumption" (i.e., $v_i$) as an example, the vector $x_i$ may consist of the monthly Internet traffic consumption of N customers, and $x_{i,n}$ may represent the monthly Internet traffic consumed by the n-th of the N customers.
  • The causal relationship sequence 130 may indicate, for example, the causal relationships among factors such as customer level, monthly Internet traffic consumption, ratio of free traffic, total monthly cost of Internet traffic, and customer satisfaction, for example, which factors are the causes of the target factor "customer satisfaction".
  • As another example, suppose the target factor is "blood pressure".
  • The factor set V may include heart rate, cardiac output, allergy index, total peripheral vascular resistance, catecholamine release, blood pressure, and so on.
  • Taking the factor "heart rate" (i.e., $v_i$) as an example, the vector $x_i$ may consist of the heart rates of N patients, and $x_{i,n}$ may represent the heart rate of the n-th of the N patients.
  • Similarly, for the factor "cardiac output", the vector $x_i$ may consist of the cardiac outputs of N patients, and $x_{i,n}$ may represent the cardiac output of the n-th of the N patients.
  • The causal relationship sequence 130 may, for example, indicate the causal relationships among factors such as heart rate, cardiac output, allergy index, total peripheral vascular resistance, catecholamine release, and blood pressure, for example, which factors are the causes of the target factor "blood pressure".
  • As a further example, suppose the target factor is "target merchandise sales".
  • The factor set V may include one or more of: external factors (such as weather, season, temperature, date, and store size), factors related to the sales behavior of the target merchandise (for example, umbrellas), such as its sales volume and price, factors related to the sales behavior of one or more related products (for example, ice cream), such as their sales volume and price, and sales strategy factors for the target product (such as the number and frequency of promotions).
  • Taking the factor "temperature" (i.e., $v_i$) as an example, the vector $x_i$ may consist of the temperatures on N days, and $x_{i,n}$ may represent the temperature on the n-th day.
  • Taking the factor "target merchandise sales" (i.e., $v_i$) as an example, the vector $x_i$ may consist of umbrella sales on N days, and $x_{i,n}$ may represent umbrella sales on the n-th day.
  • The causal relationship sequence 130 may, for example, indicate the causal relationships among factors such as weather, season, temperature, date, store size, target product sales, target product price, related product sales, and related product price, for example, which factors are the causes of the target factor "target product sales".
  • In the software development example, the factor set V may include one or more of: overall factors of software development (such as the development cycle and the resources invested in development) and factors of each stage of software development.
  • The factors of each stage may include, for example, factors of the architecture stage (such as the software architecture method and the number of architecture levels), the coding stage (such as code length, number of functions, programming language, and number of modules), the testing stage (such as the accuracy or failure rate of unit tests, black-box tests, and white-box tests), and the operating stage after release (such as the accuracy and failure rate in operation).
  • Taking the factor "development cycle" (i.e., $v_i$) as an example, the vector $x_i$ may consist of the development cycles of N software products, and $x_{i,n}$ may represent the development cycle of the n-th software product.
  • Taking the factor "code length" as an example, the vector $x_i$ may consist of the code lengths of N software products, and $x_{i,n}$ may represent the code length of the n-th software product.
  • The causal relationship sequence 130 may, for example, indicate the causal relationships among the software development cycle, resources invested in development, architecture method, number of architecture levels, code length, number of functions, number of programming languages, number of modules, accuracy or failure rate of unit tests, accuracy or failure rate of black-box tests, accuracy or failure rate of white-box tests, accuracy in the operating stage, and failure rate in the operating stage; for example, which factors are the causes of the target factor "development cycle", which are the causes of the target factor "failure rate in the operating stage", and so on.
  • The causality determination device 120 may include, for example, a dependency relationship determination unit 121 and a causal relationship determination unit 122.
  • The dependency relationship determination unit 121 may determine, based on the observation sample set 110, the set of dependency relationships existing among the multiple factors. Each dependency relationship in the set indicates a corresponding pair of mutually associated factors.
  • The causal relationship determination unit 122 may determine the causal relationship sequence 130 based on the dependency relationship set determined by the dependency relationship determination unit 121.
  • The causal relationship sequence 130 may indicate the causal relationship (i.e., that one factor is the cause of the other) between each pair of factors that have a dependency relationship.
  • The system 100 may also include additional devices and/or units not shown.
  • For example, the system 100 may further include a causality presentation device (not shown) for presenting a representation of the causal relationship sequence 130.
  • The causality presentation device may present the representation of the causal relationship sequence 130 in different ways, such as visually or audibly.
  • For example, it may present the causal relationship sequence 130 in the form of graphs, charts, text, and the like.
  • The causality presentation device may present a representation of the entire causal relationship sequence 130, that is, the causal relationships among all factors.
  • Alternatively, it may present a representation of only part of the causal relationship sequence 130, for example, the causal relationships associated with one or more target factors.
  • The causality presentation device may further present the corresponding importance levels of the multiple factors, for example, by displaying the factors in different colors and/or with numerical values that represent their relative importance.
  • The embodiments of the present disclosure are not limited in this respect.
  • FIG. 1B shows an example block diagram of a system 105 for data processing according to an embodiment of the present disclosure.
  • The system 105 can, for example, apply and optimize the causal relationship sequence 130 shown in FIG. 1A. It should be understood that the system 105 shown in FIG. 1B is only an example in which the embodiments of the present disclosure can be implemented, and is not intended to limit the scope of the present disclosure. The embodiments of the present disclosure are also applicable to other systems or architectures.
  • The system 105 may include an observation sample influencing device 140.
  • The observation sample influencing device 140 may determine, from the multiple factors and based on the causal relationship sequence 130, at least one factor that is a cause of a target factor.
  • The observation sample influencing device 140 can influence the observed value of the target factor by changing the observed value of the at least one factor, thereby obtaining the changed observation sample set 150.
  • At least one observation sample in the changed observation sample set 150 includes a changed observed value of the at least one factor.
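A minimal sketch of how the factors that cause a target factor might be read off the causal relationship sequence 130, assuming the sequence is available as a list of directed (cause, effect) edges. Whether to use only direct causes (parents) or all upstream causes (ancestors) is a design choice the text leaves open, so both are offered; the function name `causes_of` is hypothetical.

```python
from collections import deque


def causes_of(target, edges, direct_only=False):
    """Return the factors that are causes of `target` in a DAG given as (cause, effect) edges."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    if direct_only:
        return set(parents.get(target, ()))
    # Breadth-first walk up the parent links to collect every ancestor of the target.
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


# The directed edges of the FIG. 2 example.
fig2_edges = [("v2", "v1"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]
print(sorted(causes_of("v4", fig2_edges)))
```

Restricting to direct causes targets the factors whose change acts on the target without passing through another modeled factor; using ancestors widens the set of candidate factors to change.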
  • For example, suppose the target factor is "customer satisfaction".
  • The causal relationship sequence 130 may indicate which factors are the causes of the target factor "customer satisfaction" (for example, reminders before the plan allowance is used up, discounted plans, etc.).
  • The observation sample influencing device 140 can then, for example, influence and change the observed values of these factors and/or formulate corresponding strategies for them (for example, sending customers more reminders before their plan allowance is used up and offering them more discounted plans), so as to improve customer satisfaction with the telecom operator.
  • As another example, suppose the target factor is "blood pressure".
  • The causal relationship sequence 130 may indicate which physiological indicators are the causes of the target factor "blood pressure".
  • The observation sample influencing device 140 can then, for example, keep the patient's blood pressure stable by influencing and changing these physiological indicators and/or formulating corresponding strategies for them.
  • As a further example, suppose the target factor is "umbrella sales".
  • The causal relationship sequence 130 may indicate which factors are the causes of the target factor "umbrella sales" (for example, the weather and the number of umbrellas available for sale).
  • The observation sample influencing device 140 can then, for example, increase sales of the target product (umbrellas) by influencing and changing these factors and/or formulating corresponding strategies for them (for example, increasing the number of umbrellas available for sale when it rains).
  • As yet another example, suppose the target factor is the "development cycle".
  • The causal relationship sequence 130 may indicate which factors are the causes of the target factor "development cycle" (for example, the number of architecture levels and the programming language).
  • The observation sample influencing device 140 can then, for example, shorten the software development cycle by influencing and changing these factors and/or formulating corresponding strategies for them (for example, reducing the complexity of the software architecture or using a more developer-friendly programming language).
  • Alternatively, the target factor may be the "software failure rate in the operating stage".
  • The causal relationship sequence 130 may indicate which factors (for example, code length and number of modules) are the causes of the target factor "software failure rate in the operating stage".
  • The observation sample influencing device 140 can then, for example, reduce the software failure rate in the operating stage by influencing and changing these factors and/or formulating corresponding strategies for them (for example, reducing code length or the number of modules).
  • The system 105 may include a causality optimization device 160.
  • The causality optimization device 160 can optimize the causal relationship sequence 130 based on the changed observation sample set 150, thereby improving its accuracy.
  • For example, the causality optimization device 160 may rediscover the causal relationships among the multiple factors based on the changed observation sample set 150 (for example, in a process similar to that performed by the causality determination device 120), thereby obtaining an optimized causal relationship sequence. In this way, the embodiments of the present disclosure can further improve the accuracy and robustness of causal discovery.
  • Although the causality determination device 120 shown in FIG. 1A and the observation sample influencing device 140 and causality optimization device 160 shown in FIG. 1B are shown as separate from each other, it should be understood that this is for illustrative purposes only and is not intended to limit the scope of the present disclosure.
  • For example, the causality determination device 120 shown in FIG. 1A and the observation sample influencing device 140 and causality optimization device 160 shown in FIG. 1B may be implemented in the same physical device or in multiple different physical devices.
  • For example, the causality determination device 120 shown in FIG. 1A and the causality optimization device 160 shown in FIG. 1B may be implemented as the same device. The embodiments of the present disclosure are not limited in this respect.
  • FIG. 2 shows a schematic diagram of determining the causal relationships among multiple factors according to an embodiment of the present disclosure.
  • In this example, the number of factors (i.e., observed variables) involved in the observation sample set 110 (i.e., D) is 5.
  • The observation sample set 110 includes a plurality of observation samples about the factors $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$.
  • The dependency relationship determination unit 121 may determine, based on the observation sample set 110, the set of dependency relationships existing among the factors $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$, represented as a skeleton diagram 210.
  • The dependency relationship set 210 indicates that factors $v_1$ and $v_2$ are associated with each other, factors $v_2$ and $v_4$ are associated with each other, factors $v_3$ and $v_4$ are associated with each other, and factors $v_3$ and $v_5$ are associated with each other.
  • The causal relationship determination unit 122 may determine, based on the dependency relationship set 210, the causal relationship sequence of the factors $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$, represented, for example, as a directed acyclic graph 130.
  • The causal relationship sequence 130 indicates that factor $v_2$ is the cause of factor $v_1$ (edge $v_2 \to v_1$) and that factor $v_2$ is the cause of factor $v_4$ (edge $v_2 \to v_4$).
  • It also indicates that factor $v_3$ is the cause of factor $v_4$ (edge $v_3 \to v_4$) and that factor $v_3$ is the cause of factor $v_5$ (edge $v_3 \to v_5$).
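The FIG. 2 example can be encoded directly. This sketch checks the two properties that the later constraints require of a causal relationship sequence: every directed edge lies on the skeleton, and the directed graph has no cycle. It uses Python's standard `graphlib` module (Python 3.9+); representing the skeleton as a set of unordered pairs is an illustrative choice.

```python
from graphlib import CycleError, TopologicalSorter

# Skeleton diagram 210 of FIG. 2: undirected dependency relationships.
skeleton = {frozenset(p) for p in [("v1", "v2"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]}
# Causal relationship sequence 130 of FIG. 2: directed edges (cause, effect).
dag_edges = [("v2", "v1"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]


def is_consistent(skeleton, edges):
    """True if every directed edge lies on the skeleton and the directed graph is acyclic."""
    if any(frozenset(e) not in skeleton for e in edges):
        return False
    # TopologicalSorter expects a mapping node -> set of predecessors (its causes).
    preds = {}
    for cause, effect in edges:
        preds.setdefault(effect, set()).add(cause)
        preds.setdefault(cause, set())
    try:
        list(TopologicalSorter(preds).static_order())
    except CycleError:
        return False
    return True


print(is_consistent(skeleton, dag_edges))
```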
  • FIG. 3 shows a flowchart of a method 300 for determining the causal relationships among multiple factors according to an embodiment of the present disclosure.
  • The method 300 may be executed by the causality determination device 120 shown in FIG. 1A. It should be understood that the method 300 may also include additional actions not shown and/or certain actions shown may be omitted. The scope of the present disclosure is not limited in this respect.
  • The causality determination device 120 obtains an observation sample set about multiple factors (for example, the observation sample set 110 shown in FIGS. 1A and 2).
  • Each observation sample in the observation sample set includes the corresponding observed values of the multiple factors.
  • The causality determination device 120 determines, based on the observation sample set, a set of dependency relationships existing among the multiple factors (for example, the skeleton diagram 210 shown in FIG. 2).
  • A dependency relationship in the set indicates a pair of mutually related factors among the multiple factors.
  • The causality determination device 120 may estimate the correlation coefficient between any two of the multiple factors based on the corresponding observed values of these two factors.
  • The correlation coefficient may be, for example, the Spearman correlation coefficient or the Kendall correlation coefficient.
  • The causality determination device 120 may establish a correlation coefficient matrix S. For example, if the total number of factors is D, then S is a D × D matrix. Denoting the element in the j-th row and k-th column of S as $S_{jk}$, $S_{jk}$ can be determined as follows:
  • where $\rho_{jk}$ represents the Spearman correlation coefficient between the j-th and k-th of the D factors,
  • and $\tau_{jk}$ represents the Kendall correlation coefficient between the j-th and k-th of the D factors.
  • The calculation of the Spearman and Kendall correlation coefficients is known to those skilled in the art and is not repeated here.
  • Any method or means, known or developed in the future, can be used to calculate the correlation coefficient between two factors; it is not limited to the Spearman and Kendall correlation coefficients. It should be understood that these two coefficients are merely examples of correlation coefficients and are not intended to limit the scope of the present disclosure.
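The formula for $S_{jk}$ is not reproduced in this text. One widely used rank-based choice, assumed here rather than taken from the disclosure, is the nonparanormal ("SKEPTIC") transform $S_{jk} = 2\sin(\pi\rho_{jk}/6)$ for Spearman's $\rho$ or $S_{jk} = \sin(\pi\tau_{jk}/2)$ for Kendall's $\tau$, with $S_{jj} = 1$. The NumPy sketch below computes both, assuming continuous data without ties (Kendall's $\tau$ is computed as tau-a); the function names are hypothetical.

```python
import numpy as np


def spearman_matrix(X):
    """Spearman's rho between the rows (factors) of X, a D x N array, assuming no ties."""
    # With no ties, argsort of argsort yields 0-based ranks along each row.
    ranks = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    return np.corrcoef(ranks)


def kendall_matrix(X):
    """Kendall's tau-a between the rows (factors) of X, a D x N array."""
    _, n = X.shape
    # signs[i, a, b] = sign(x_{i,a} - x_{i,b}) for every ordered pair of samples.
    signs = np.sign(X[:, :, None] - X[:, None, :])
    # Summing over ordered pairs counts each unordered pair twice, hence n * (n - 1).
    return np.einsum("iab,jab->ij", signs, signs) / (n * (n - 1))


def rank_correlation_matrix(X, method="spearman"):
    """Correlation matrix S built from rank correlations (assumed SKEPTIC-style transform)."""
    if method == "spearman":
        S = 2.0 * np.sin(np.pi / 6.0 * spearman_matrix(X))
    elif method == "kendall":
        S = np.sin(np.pi / 2.0 * kendall_matrix(X))
    else:
        raise ValueError("method must be 'spearman' or 'kendall'")
    np.fill_diagonal(S, 1.0)
    return S
```

Because both coefficients depend only on ranks, S is unchanged by strictly monotone transformations of any factor, which is the usual motivation for using them instead of Pearson correlation.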
  • The causality determination device 120 may establish, based on the estimated correlation coefficient matrix S, an objective function (also referred to herein as the "first objective function") for determining the dependency relationship set (i.e., the skeleton diagram 210).
  • The causality determination device 120 may determine the dependency relationship set by minimizing the first objective function.
  • For example, the causality determination device 120 may learn a precision matrix Θ based on the graphical Lasso algorithm; the precision matrix represents the dependency relationships among the factors. For example, Θ can be determined as follows:
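The objective is not reproduced in this text. The standard graphical Lasso problem, assumed here, is to minimize $\mathrm{tr}(S\Theta) - \log\det\Theta + \lambda\sum_{j \ne k}|\Theta_{jk}|$ over positive-definite $\Theta$; the nonzero off-diagonal entries of the solution then give the skeleton. The sketch below solves it with a plain ADMM loop in NumPy, which is our solver choice (scikit-learn's `sklearn.covariance.graphical_lasso` is a ready-made coordinate-descent alternative). The penalty `lam`, step size `rho`, and iteration limits are illustrative defaults, and the function names are hypothetical.

```python
import numpy as np


def graphical_lasso_admm(S, lam=0.1, rho=1.0, n_iter=1000, tol=1e-7):
    """Sparse precision matrix for a correlation matrix S via ADMM (off-diagonal L1 penalty)."""
    d = S.shape[0]
    off_diag = ~np.eye(d, dtype=bool)
    Z = np.eye(d)
    U = np.zeros((d, d))
    for _ in range(n_iter):
        # Theta-update: solve rho * Theta - Theta^{-1} = rho * (Z - U) - S in closed form.
        eigval, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta) @ Q.T
        # Z-update: soft-threshold the off-diagonal entries; the diagonal is not penalized.
        A = Theta + U
        Z_new = A.copy()
        Z_new[off_diag] = np.sign(A[off_diag]) * np.maximum(np.abs(A[off_diag]) - lam / rho, 0.0)
        # Scaled dual update.
        U = U + Theta - Z_new
        converged = np.linalg.norm(Theta - Z_new) < tol and np.linalg.norm(Z_new - Z) < tol
        Z = Z_new
        if converged:
            break
    return Z


def skeleton_from_precision(Theta, eps=1e-10):
    """Dependency set M: pairs (j, k) with j < k whose precision entry is nonzero."""
    d = Theta.shape[0]
    return {(j, k) for j in range(d) for k in range(j + 1, d) if abs(Theta[j, k]) > eps}
```

A zero entry $\Theta_{jk}$ means factors j and k are conditionally independent given all other factors under a Gaussian model, so such pairs are left out of the dependency relationship set.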
  • Alternatively, the causality determination device 120 may determine the dependency relationship set M by applying conditional independence tests to the multiple factors.
  • A conditional independence test can be used to determine whether two factors are independent of each other given a set of conditioning factors.
  • To test the dependency between two factors exhaustively, every combination of the other factors needs to be used as the conditioning set.
  • When the number of factors (i.e., D) is large, the computational overhead becomes very large. Moreover, if the conditioning set includes many other factors, the two factors are easily determined to be independent of each other.
  • The causality determination device 120 may therefore limit the number of other factors in the conditioning set to 1. In this way, in addition to reducing the computational overhead of the conditional independence tests, the number of factor pairs determined to be independent of each other can be reduced, which facilitates the subsequent discovery of the causal relationship sequence.
  • The embodiments of the present disclosure can thus reduce the size of the variable space to be searched, so that the causal relationships among a large number of factors can be discovered quickly.
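A minimal sketch of the restricted test, assuming Gaussian data and Fisher's z-test of partial correlation (the text does not name a specific test). Also running the unconditional (order-0) test before the single-factor conditioning sets is our PC-style assumption; an edge is dropped as soon as any test accepts independence at level `alpha`. Function names and the default `alpha` are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation r is zero, given k conditioning factors."""
    r = min(max(r, -0.999999), 0.999999)  # keep the log argument finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    # 2 * (1 - Phi(stat)) written with the complementary error function.
    return math.erfc(stat / math.sqrt(2.0))


def skeleton_order1(X, alpha=0.01):
    """Dependency set from Fisher-z tests with conditioning sets of size 0 or 1."""
    d, n = X.shape
    R = np.corrcoef(X)
    edges = set()
    for j, k in combinations(range(d), 2):
        # Order 0: marginal test (an assumption, PC-style).
        independent = fisher_z_pvalue(R[j, k], n, 0) > alpha
        # Order 1: condition on each single other factor until independence is accepted.
        for l in range(d):
            if independent:
                break
            if l == j or l == k:
                continue
            num = R[j, k] - R[j, l] * R[k, l]
            den = math.sqrt((1 - R[j, l] ** 2) * (1 - R[k, l] ** 2))
            independent = fisher_z_pvalue(num / den, n, 1) > alpha
        if not independent:
            edges.add((j, k))
    return edges
```

For a chain $v_1 \to v_2 \to v_3$, the pair $(v_1, v_3)$ is marginally dependent but becomes independent once $v_2$ alone is conditioned on, so a single conditioning factor already suffices to remove that edge.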
  • The causality determination device 120 then determines the causal relationship sequence of the multiple factors based on the dependency relationship set.
  • The causal relationship sequence can indicate that, in a pair of mutually related factors, one factor is the cause of the other.
  • For the multiple mutually related factor pairs indicated by the dependency relationship set, the causality determination device 120 may determine, for each factor pair, the influence of one factor on the other.
  • The causality determination device 120 may establish a second objective function based on a predetermined distribution (for example, a Gaussian distribution or another distribution), and determine the influence of one factor on the other in each factor pair by minimizing the second objective function.
  • The second objective function may be established based on two considerations: first, the discovered causal structure should fit the observed data samples well; second, the discovered causal structure should remain sparse.
  • B the influence determined for each of the multiple factor pairs is represented by matrix B, then B can be determined as follows:
  • N represents the total number of observation samples in X
  • D represents the total number of factors.
  • the vector $x_i \in \mathbb{R}^N$ (where $1 \le i \le D$) denotes the N observed values of the i-th factor (i.e., factor $v_i$).
  • $x_{i,n}$ (where $1 \le i \le D$ and $1 \le n \le N$) denotes the n-th observed value of the i-th factor (i.e., factor $v_i$).
  • the vector $\beta_i \in \mathbb{R}^{D-1}$ represents the respective influences of the other factors on factor $v_i$.
  • $|B|_0$ represents the total number of non-zero elements in matrix B, which estimates the density (number of edges) of the causal structure.
  • the constraint $G(\{\beta_1, \dots, \beta_D\}) \in \mathrm{DAG}$ indicates that the causal structure to be determined is a directed acyclic graph.
  • the constraint involving M indicates that the causal structure to be determined is a subset of the previously determined skeleton graph M (for example, the skeleton graph 210 shown in FIG. 2).
  • the causality determination device 120 may determine the causality sequence 130 based on the influences determined for the multiple factor pairs and the observation sample set.
  • FIG. 4 shows a flowchart of a method 400 for determining a causal relationship sequence according to an embodiment of the present disclosure. The method 400 may be executed by the causality determination device 120 as shown in FIG. 1A. It should be understood that the method 400 may also include additional actions not shown and/or certain actions shown may be omitted. The scope of the present disclosure is not limited in this respect.
  • the causality determination device 120 may obtain the historical causality sequence and the historical causality score. Here, the historical causality sequence is denoted $Q_S$, and the historical causality score is denoted $f(Q_S)$.
  • the causality determination device 120 may determine the initial causality score corresponding to the empty sequence as the historical causality score, that is:
  • V represents all nodes in the graph G (for example, all nodes with edges in the skeleton graph 210)
  • E represents the set of all edges in the graph G.
  • $V = \{v_1, v_2, v_3, v_4, v_5\}$.
  • $V \setminus U$ represents all nodes other than those in the node set U (for example, $V \setminus v_i$ represents all nodes other than $v_i$).
  • $S(\beta_i)$ represents the support set of $\beta_i$, that is, the set of parent nodes of node $v_i$ (the nodes representing potential causes of factor $v_i$).
  • the restriction $S(\beta_i) \subseteq (V \setminus U) \cap S(m_i)$ indicates that the set $S(\beta_i)$ is a subset of the intersection of $V \setminus U$ and $S(m_i)$, where $S(m_i)$ denotes the set of nodes that share an edge with node $v_i$ in the skeleton graph M (for example, the skeleton graph 210).
  • the causality determination device 120 determines one or more candidate factors that may be added to the causality sequence, based on the historical causality sequence $Q_S$ and the multiple factor pairs indicated by the dependency relationship set (for example, the skeleton graph 210).
  • the one or more candidate factors may include all factors corresponding to the elements of the candidate node set $V \setminus Q_S$, which represents all nodes in the node set V other than those already included in $Q_S$.
  • the causality determination device 120 may output the historical causality sequence $Q_S$ as the determined causality sequence 130.
  • the causality determination device 120 may select candidate factors to be added to the causality sequence from the one or more candidate factors based on the determined one or more candidate causality scores.
  • the causality determination device 120 may determine the smallest of the one or more candidate causality scores, and select the candidate factor associated with the smallest candidate causality score to be added to the causality sequence 130.
  • the causality determination device 120 may obtain the constraint associated with the causality sequence to be determined.
  • the causality determination device 120 may obtain expert information indicating the constraint, and determine the constraint based on the obtained expert information.
  • the expert information may indicate that node $v_3$ precedes node $v_4$, that is, the factor corresponding to node $v_3$ may be a cause of the factor corresponding to node $v_4$, but the factor corresponding to node $v_4$ cannot be a cause of the factor corresponding to node $v_3$.
  • the causality determination device 120 may determine the constraint based on the historical causality sequence and the multiple associated factor pairs indicated by the skeleton graph 210. For example, in the example shown in FIG. 2, suppose that the current $Q_S$ indicates that the factor corresponding to node $v_3$ is a cause of the factor corresponding to node $v_4$ (that is, there is an edge $v_3 \to v_4$ in the causality sequence 130).
  • the causality determination device 120 may then determine that the node set $\{v_3, v_5\}$ precedes the node set $\{v_1, v_2, v_4\}$.
  • that is, a node in $\{v_3, v_5\}$ may be a cause of a node in $\{v_1, v_2, v_4\}$, but no node in $\{v_1, v_2, v_4\}$ can be a cause of a node in $\{v_3, v_5\}$.
  • the causality determination device 120 may select, from the one or more candidate factors, a candidate factor to be added to the causality sequence such that adding the selected candidate factor satisfies the obtained constraint. For example, when adding the candidate factor associated with the smallest candidate causality score would violate the constraint, the causality determination device 120 may instead select another candidate factor (for example, the candidate factor associated with the second smallest candidate causality score) to be added to the causality sequence 130.
  • by using the constraint, the number of candidate factors considered when determining the causality sequence can be limited, so that the causality sequence can be determined faster.
  • the causality determination device 120 may update the historical causality sequence $Q_S$ and the historical causality score $f(Q_S)$.
  • the causality determination device 120 may replace the historical causality sequence $Q_S$ with the candidate causality sequence $Q_S'$ corresponding to the selected candidate factor, and replace the historical causality score $f(Q_S)$ with the score $f(Q_S')$ corresponding to $Q_S'$.
  • the causality determination device 120 may iteratively execute blocks 410-460 of the method 400 until all possible candidate factors have been searched (i.e., until block 470 is reached).
  • FIG. 5 shows a flowchart of a method 500 for influencing the observed value of a target factor according to an embodiment of the present disclosure.
  • the method 500 may be performed by the observation sample influencing device 140 as shown in FIG. 1B.
  • the method 500 may be performed after the method 300.
  • the method 500 may also include additional actions not shown and/or certain actions shown may be omitted. The scope of the present disclosure is not limited in this respect.
  • the observation sample influencing device 140 determines at least one factor that is the cause of the target factor from a plurality of factors based on the causality sequence. Then, at block 520, the observation sample influencing device 140 affects the observation value of the target factor by changing the observation value of at least one factor. In some embodiments, for example, the observation sample influencing device 140 can affect the observed value of the target factor by influencing and changing at least one factor and/or formulating a corresponding strategy for at least one factor.
  • the target factor is, for example, "customer satisfaction".
  • the observation sample influencing device 140 may determine the causes of the target factor "customer satisfaction" based on the causality sequence 130 (for example, reminders before the plan is used up, discounted plans, and so on).
  • the observation sample influencing device 140 can further improve customers' satisfaction with the telecom operator by influencing and changing these factors and/or formulating corresponding strategies for them (for example, providing customers with more reminders before their plans are used up and offering them more discounted plans).
  • the target factor is, for example, "blood pressure".
  • the observation sample influencing device 140 may determine which physiological indicators are the causes of the target factor "blood pressure" based on the causality sequence 130.
  • the observation sample influencing device 140 can further stabilize the patient's blood pressure by influencing and changing these physiological indicators, and/or formulating corresponding strategies for these physiological indicators.
  • the target factor is, for example, "umbrella sales".
  • the observation sample influencing device 140 can determine, based on the causality sequence 130, which factors (for example, weather, the number of umbrellas available for sale, and so on) are the causes of the target factor "umbrella sales".
  • the observation sample influencing device 140 may further influence and change these factors and/or formulate corresponding strategies for them (for example, increasing the number of umbrellas available for sale when it rains) to increase sales of the target product, umbrellas.
  • the target factor is, for example, the "development cycle".
  • the observation sample influencing device 140 may determine, based on the causality sequence 130, which factors (for example, the number of architecture levels, the programming language, and so on) are the causes of the target factor "development cycle".
  • the observation sample influencing device 140 can further reduce the software development cycle by influencing and changing these factors, and/or formulating corresponding strategies for these factors (for example, reducing the complexity of the software architecture, using a more friendly programming language, etc.).
  • the target factor may be "software failure rate in the operating phase".
  • the observation sample influencing device 140 can determine which factors (for example, code length, number of modules, etc.) are responsible for the target factor "software failure rate in the operating phase" based on the causality sequence 130.
  • the observation sample influencing device 140 can further reduce the software failure rate in the operating phase by influencing and changing these factors, and/or formulating corresponding strategies for these factors (for example, reducing code length, reducing the number of modules, etc.).
  • FIG. 6 shows a flowchart of a method 600 for optimizing causality according to an embodiment of the present disclosure.
  • the method 600 may be executed by the causality optimization device 160 as shown in FIG. 1B.
  • the method 600 may be performed after the method 500. It should be understood that the method 600 may also include additional actions not shown and/or certain actions shown may be omitted. The scope of the present disclosure is not limited in this respect.
  • the causality optimization device 160 obtains a set of changed observation samples for a plurality of factors.
  • at least one observation sample in the changed observation sample set may include a changed observation value of at least one factor (eg, at least one factor is the cause of the target factor).
  • the causality optimization device 160 may optimize the causality sequence based on the changed observation sample set.
  • the causality optimization device 160 may rediscover the causal relationships among the multiple factors based on the changed observation sample set 150 (for example, similar to the process performed by the causality determination device 120), so as to obtain an optimized causality sequence. In this way, the embodiments of the present disclosure can further improve the accuracy and robustness of causal discovery.
  • Figure 7 shows a schematic block diagram of an example device 700 that can be used to implement embodiments of the present disclosure.
  • the causality determination device 120 shown in FIG. 1A, the observation sample influencing device 140 and/or the causality optimization device 160 shown in FIG. 1B may be implemented by the device 700.
  • the device 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 702 or computer program instructions loaded from a storage unit 708 into a random access memory (RAM) 703.
  • various programs and data required for the operation of the device 700 can also be stored.
  • the CPU 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • multiple components of the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disk; and a communication unit 709, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 709 allows the device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the processing unit 701 may be configured to perform the various procedures and processing described above, for example, the methods 300, 400, 500, and/or 600.
  • the methods 300, 400, 500, and/or 600 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 708.
  • part or all of the computer program may be loaded and/or installed on the device 700 via the ROM 702 and/or the communication unit 709.
  • when the computer program is loaded into the RAM 703 and executed by the CPU 701, one or more steps of the methods 300, 400, 500, and/or 600 described above can be performed.
  • the present disclosure may be a system, method, and/or computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for executing various aspects of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as punched cards with instructions stored thereon
  • the computer-readable storage medium used here is not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or electrical signals transmitted through wires.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • in the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
  • These computer-readable program instructions can be provided to the processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, produce means for implementing the functions/actions specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium; they cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order from that noted in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

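Embodiments of the present disclosure relate to a method, a device, and a computer-readable storage medium for data processing. A method for data processing includes obtaining a set of observation samples regarding a plurality of factors, where an observation sample in the set includes corresponding observed values of the plurality of factors. The method further includes determining, based on the observation sample set, a set of dependency relationships existing among the plurality of factors, where a dependency relationship in the set indicates a pair of factors, among the plurality of factors, that are associated with each other. The method further includes determining, based on the dependency relationship set, a causality sequence of the plurality of factors, the causality sequence indicating that one factor of the associated factor pair is a cause of the other factor. Embodiments of the present disclosure also provide a device and a computer-readable storage medium capable of implementing the above method. Embodiments of the present disclosure can quickly and accurately discover causal relationships among a large number of factors, and influence the observed value of a target factor based on those causal relationships.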

Description

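Method, Apparatus and Medium for Data Processing
Technical Field
Embodiments of the present disclosure relate to the field of machine learning and, more specifically, to a method, an apparatus, and a computer-readable storage medium for data processing.
Background
With the rapid development of information technology, the scale of data is growing rapidly. Against this background, machine learning is receiving increasingly widespread attention. Among its applications, causal discovery is widely used in real life, for example in supply chains, healthcare, and retail. Causal discovery, as used herein, refers to discovering, from sample data regarding multiple factors, the causal relationships that exist among those factors. For example, in retail, the results of causal discovery can help formulate various sales strategies; in healthcare, they can help formulate treatment plans for patients.
However, as technology develops, the number of factors within a single system that may be causally related has increased significantly. In addition, people often care about the interaction between different systems. As a result, the number of factors for which causal relationships need to be discovered may reach hundreds or even thousands. In this case, quickly and accurately discovering the causal relationships among a large number of factors is becoming increasingly important.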
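Summary
Embodiments of the present disclosure provide a method, an apparatus, and a computer-readable storage medium for data processing.
In a first aspect of the present disclosure, a method for data processing is provided. The method includes: obtaining a set of observation samples regarding a plurality of factors, each observation sample in the set including corresponding observed values of the plurality of factors; determining, based on the observation sample set, a set of dependency relationships existing among the plurality of factors, a dependency relationship in the set indicating a pair of factors, among the plurality of factors, that are associated with each other; and determining, based on the dependency relationship set, a causality sequence of the plurality of factors, the causality sequence indicating that one factor of an associated factor pair is the cause of the other factor.
In a second aspect of the present disclosure, an apparatus for data processing is provided. The apparatus includes at least one processing unit and at least one memory. The at least one memory is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the apparatus to perform actions including: obtaining a set of observation samples regarding a plurality of factors, each observation sample in the set including corresponding observed values of the plurality of factors; determining, based on the observation sample set, a set of dependency relationships existing among the plurality of factors, a dependency relationship in the set indicating a pair of factors, among the plurality of factors, that are associated with each other; and determining, based on the dependency relationship set, a causality sequence of the plurality of factors, the causality sequence indicating that one factor of an associated factor pair is the cause of the other factor.
In a third aspect of the present disclosure, a computer-readable storage medium is provided, having machine-executable instructions stored thereon which, when executed by a device, cause the device to perform the method according to the first aspect of the present disclosure.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.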
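Brief Description of the Drawings
The objects, advantages, and other features of the present invention will become more apparent from the following disclosure and claims. For illustrative purposes only, a non-limiting description of preferred embodiments is given here with reference to the accompanying drawings, in which:
FIG. 1A and FIG. 1B show block diagrams of example systems for data processing according to embodiments of the present disclosure;
FIG. 2 shows a schematic diagram of determining causal relationships among multiple factors according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of an example method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of an example method according to an embodiment of the present disclosure;
FIG. 5 shows a flowchart of an example method according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of an example method according to an embodiment of the present disclosure; and
FIG. 7 shows a schematic block diagram of an example device that can be used to implement embodiments of the present disclosure.
Throughout the drawings, the same or corresponding reference numerals denote the same or corresponding parts.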
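Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
In the description of the embodiments of the present disclosure, the term "include" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least partially on". The terms "one embodiment" and "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and so on may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
In embodiments of the present disclosure, the term "causal structure" generally refers to a structure describing the causal relationships among the factors in a system, and is also referred to herein as a "causality sequence". A "factor" is also referred to as a "variable". The term "observation sample" refers to a set of observed values of multiple directly observable factors, where a directly observable factor is also referred to as an "observed variable".
As mentioned above, in real life it is desirable to quickly and accurately discover the causal relationships among a large number of observed variables.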
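In the field of customer service, in order to determine which factors affect customer satisfaction with a telecom operator, consumption behavior data of a large number of customers (such as customer level, monthly mobile data consumed, ratio of free data, total monthly cost of mobile data, and so on), satisfaction survey data, and operator strategy data can be collected. Each type of collected data is also referred to as the observed values of one factor (or variable). By discovering the causal relationships among these factors, one or more factors that affect customer satisfaction can be determined. Further, customer satisfaction with the telecom operator can be improved by changing the observed values of the one or more factors or by formulating corresponding strategies for them.
In the health field, in order to determine the factors that affect a patient's blood pressure, a series of physiological indicators (that is, observed values of a series of factors) of a large number of patients can be collected, such as heart rate, cardiac output, allergy indicators, total peripheral vascular resistance, catecholamine release, and blood pressure. By discovering the causal relationships among these physiological indicators, the physiological indicators (that is, factors) that affect the patient's blood pressure can be determined. Further, the patient's blood pressure can be kept stable by influencing these physiological indicators or formulating corresponding strategies for them.
In the field of merchandise sales, in order to determine the factors that affect sales of a target product (for example, umbrellas), data on external factors (such as weather, season, temperature, date, and store size), sales data of the product (such as its sales volume and price), and sales data of one or more related products (for example, ice cream) can be collected. Each type of collected data serves as the observed values of one factor. By discovering the causal relationships among these factors, one or more factors that affect sales of the target product can be determined. Further, sales of the target product can be increased by changing the observed values of the one or more factors or by formulating corresponding strategies for them.
In the field of software development, in order to determine the factors that affect the failure rate and/or the software development cycle, information on various factors of software development can be collected, including but not limited to overall information (such as the development cycle and the resources invested in development) and information about each stage of software development. Stage information may include, for example, architecture-stage information (such as the software architecture method and the number of architecture levels), coding-stage information (such as code length, number of functions, programming language, and number of modules), testing-stage information (such as the accuracy or failure rate of unit tests, black-box tests, and white-box tests), and information about the operating stage after release (such as the accuracy or failure rate in the operating stage). Each type of collected data serves as the observed values of one factor. By discovering the causal relationships among these factors, one or more factors that affect the software development cycle and/or failure rate can be determined. Further, the development cycle and/or failure rate can be reduced by changing the observed values of the one or more factors or by formulating corresponding strategies for them.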
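Some conventional solutions mainly target systems with relatively few factors (for example, no more than 100) and use constraint-based or score-based methods to search the entire variable space for possible causal relationships. Constraint-based methods, for example, usually discover the causal structure by applying conditional independence tests to the factors. However, when the number of factors to be searched is large, the results of conditional independence tests become unreliable. In addition, discovering the causal structure often requires testing the dependency between any two factors under conditioning sets ranging from the empty set to the set of all other factors, which leads to enormous computational overhead. Score-based methods usually discover the causal structure by optimizing a score that measures how well a causal structure matches the sample data. However, because the search space grows super-exponentially, these solutions are often difficult to apply to causal structures over many factors (for example, hundreds or thousands), also referred to as "high-dimensional causal structures".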
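To make the super-exponential growth concrete, the number of distinct DAGs over D labeled factors can be counted with Robinson's recurrence, a standard combinatorial result that is not part of this disclosure. The Python sketch below (the function name `count_dags` is illustrative) shows that five factors already admit 29,281 candidate DAGs.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        # Inclusion-exclusion over a set of k nodes with no incoming edges (sources).
        total += (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


for d in range(1, 8):
    print(d, count_dags(d))
```

Each extra factor multiplies the count by a rapidly growing factor, which is why exhaustive score-based search is infeasible for hundreds of factors.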
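According to embodiments of the present disclosure, a solution for data processing is proposed. The solution can discover high-dimensional causal structures quickly and accurately, thereby solving the above problems and/or other potential problems. Embodiments of the present disclosure are described in detail below with reference to the above example scenarios. It should be understood that this is for illustrative purposes only and is not intended to limit the scope of the invention in any way.
FIG. 1A shows an example block diagram of a system 100 for data processing according to an embodiment of the present disclosure. The system 100 can, for example, discover causal relationships among multiple factors. It should be understood that the system 100 shown in FIG. 1A is merely one example in which embodiments of the present disclosure may be implemented and is not intended to limit the scope of the present disclosure. Embodiments of the present disclosure are equally applicable to other systems or architectures.
As shown in FIG. 1A, the system 100 may include a causality determination device 120. The causality determination device 120 may receive an observation sample set 110 regarding multiple factors and determine from it a causality sequence 130 indicating the causal relationships among the factors. Optionally, in some embodiments, the system 100 may further include an observation sample collection device (not shown in FIG. 1A) for collecting the observation sample set 110. The observation sample collection device may collect observed values of the factors in real time, periodically, or aperiodically to obtain the observation sample set 110. In some embodiments, the observation sample collection device may include one or more collection units, each for collecting observed values of a different type of factor.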
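The observation sample set 110 may include observation samples of multiple factors related to one or more target factors. The observation sample set 110 may be represented, for example, as $X = \{x_1, x_2, \dots, x_D\} \in \mathbb{R}^{N \times D}$, where N is the total number of observation samples in the set 110, D is the total number of factors, and each observation sample in the set 110 includes one set of observed values of the D factors. Herein, the set of D factors is also denoted $V = \{v_1, v_2, \dots, v_D\}$, where $v_i$ ($1 \le i \le D$) denotes the i-th of the D factors. The vector $x_i \in \mathbb{R}^N$ ($1 \le i \le D$) denotes the N observed values of the i-th factor. For example, $x_{i,n}$ ($1 \le i \le D$, $1 \le n \le N$) denotes the n-th observed value of the i-th factor.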
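Taking the above scenario of customer satisfaction with a telecom operator as an example, with the target factor being "customer satisfaction", the factor set V may include one or more of: factors related to customer attributes (for example, customer level and customer number), factors related to customer behavior (for example, monthly mobile data consumed, ratio of free data, and total monthly cost of mobile data), factors related to customer feedback (for example, number of complaints and customer satisfaction), and strategy factors formulated for customers (for example, the number and timing of over-plan reminders). Taking the factor "customer level" (that is, $v_i$) as an example, the vector $x_i$ may consist of the levels of N customers, and $x_{i,n}$ may denote the level of the n-th of the N customers. Taking the factor "monthly mobile data consumed" (that is, $v_i$) as an example, the vector $x_i$ may consist of the monthly mobile data consumed by N customers, and $x_{i,n}$ may denote the monthly mobile data consumed by the n-th customer. The causality sequence 130 may, for example, indicate the causal relationships among factors such as customer level, monthly mobile data consumed, ratio of free data, total monthly cost of mobile data, and customer satisfaction, for example which factors are the causes of the target factor "customer satisfaction".
Taking the above patient blood pressure scenario as an example, with the target factor being "blood pressure", the factor set V may include heart rate, cardiac output, allergy indicators, total peripheral vascular resistance, catecholamine release, blood pressure, and so on. Taking the factor "heart rate" (that is, $v_i$) as an example, the vector $x_i$ may consist of the heart rates of N patients, and $x_{i,n}$ may denote the heart rate of the n-th of the N patients. Taking the factor "cardiac output" (that is, $v_i$) as an example, the vector $x_i$ may consist of the cardiac outputs of N patients, and $x_{i,n}$ may denote the cardiac output of the n-th patient. The causality sequence 130 may, for example, indicate the causal relationships among heart rate, cardiac output, allergy indicators, total peripheral vascular resistance, catecholamine release, blood pressure, and so on, for example which factors are the causes of the target factor "blood pressure".
Taking the above merchandise sales scenario as an example, with the target factor being "target product sales", the factor set V may include one or more of: external factors (such as weather, season, temperature, date, and store size), factors related to the sales of the target product (for example, umbrellas) (such as its sales volume and price), factors related to the sales of one or more related products (for example, ice cream) (such as their sales volume and price), and sales strategy factors for the target product (such as the number and frequency of promotions). Taking the factor "temperature" (that is, $v_i$) as an example, the vector $x_i$ may consist of the temperatures on N days, and $x_{i,n}$ may denote the temperature on the n-th day. Taking the factor "target product sales" (that is, $v_i$) as an example, the vector $x_i$ may consist of umbrella sales on N days, and $x_{i,n}$ may denote umbrella sales on the n-th day. The causality sequence 130 may, for example, indicate the causal relationships among weather, season, temperature, date, store size, target product sales, target product price, related product sales, related product price, and so on, for example which factors are the causes of the target factor "target product sales".
Taking the above software development scenario as an example, with the target factor being "software development cycle" or "software failure rate in the operating stage", the factor set V may include one or more of the overall factors of software development (such as the development cycle and the resources invested in development) and the factors of each stage of software development. Stage factors may include, for example, architecture-stage factors (such as the software architecture method and the number of architecture levels), coding-stage factors (such as code length, number of functions, programming language, and number of modules), testing-stage factors (such as the accuracy or failure rate of unit tests, black-box tests, and white-box tests), and factors of the operating stage after release (such as the accuracy and failure rate in the operating stage). Taking the factor "development cycle" (that is, $v_i$) as an example, the vector $x_i$ may consist of the development cycles of N software products, and $x_{i,n}$ may denote the development cycle of the n-th software product. Taking the factor "code length" (that is, $v_i$) as an example, the vector $x_i$ may consist of the code lengths of N software products, and $x_{i,n}$ may denote the code length of the n-th software product. The causality sequence 130 may, for example, indicate the causal relationships among the software development cycle, resources invested in development, architecture method, number of architecture levels, code length, number of functions, programming language, number of modules, accuracy or failure rate of unit tests, black-box tests, and white-box tests, operating-stage accuracy, operating-stage failure rate, and so on; for example, which factors are the causes of the target factor "development cycle", and which factors are the causes of the target factor "operating-stage failure rate".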
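As shown in FIG. 1A, the causality determination device 120 may include, for example, a dependency determination unit 121 and a causality determination unit 122. In some embodiments, the dependency determination unit 121 may determine, based on the observation sample set 110, a set of dependency relationships existing among the multiple factors. Each dependency relationship in the set indicates a respective pair of associated factors among the multiple factors. In some embodiments, the causality determination unit 122 may determine the causality sequence 130 based on the dependency relationship set determined by the dependency determination unit 121. The causality sequence 130 may indicate the causal relationship between the factors of each dependent pair (that is, that one factor is the cause of the other).
It should be understood that the devices included in the system 100 and/or the units in those devices are merely exemplary and are not intended to limit the scope of the present disclosure. The system 100 may also include additional devices and/or units not shown. For example, in some embodiments, the system 100 may further include a causality presentation device (not shown) for presenting a representation of the causality sequence 130.
In some embodiments, the causality presentation device may present the representation of the causality sequence 130 in different ways, such as visually or audibly. For example, it may present the causality sequence 130 as graphs, charts, or text. In some embodiments, it may present a representation of the entire causality sequence 130, that is, the causal relationships among all factors. Alternatively, in some embodiments, it may present a representation of only part of the causality sequence 130, for example the causal relationships associated with one or more target factors. In some embodiments, when the causes of a target factor include multiple factors, the causality presentation device may further present the respective importance of those factors, for example using different colors and/or numerical values indicating different degrees of importance. Embodiments of the present disclosure are not limited in this respect.
FIG. 1B shows an example block diagram of a system 105 for data processing according to an embodiment of the present disclosure. The system 105 may, for example, apply and optimize the causality sequence 130 shown in FIG. 1A. It should be understood that the system 105 shown in FIG. 1B is merely one example in which embodiments of the present disclosure may be implemented and is not intended to limit the scope of the present disclosure. Embodiments of the present disclosure are equally applicable to other systems or architectures.
As shown in FIG. 1B, the system 105 may include an observation sample influencing device 140. Based on the causality sequence 130, the observation sample influencing device 140 may determine, from the multiple factors, at least one factor that is a cause of a target factor. The observation sample influencing device 140 may influence the observed value of the target factor by changing the observed value of the at least one factor, thereby obtaining a changed observation sample set 150. At least one observation sample in the changed observation sample set 150 includes a changed observed value of the at least one factor.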
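The following minimal sketch illustrates the two steps just described: finding the factors that are causes of a target factor, and seeing how changing one of them shifts the target's observed value. It assumes, purely for illustration, a linear structural equation model over the DAG of FIG. 2 with hypothetical weights and noise values; the disclosure does not prescribe this model, and the names `ancestors`, `topological_order`, and `simulate` are hypothetical.

```python
def ancestors(target, parents):
    """Direct and indirect causes of `target`: every factor with a directed path into it."""
    seen, stack = set(), list(parents.get(target, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(parents.get(u, ()))
    return seen


def topological_order(parents):
    """Parents before children; assumes the graph is acyclic."""
    order, done = [], set()

    def visit(v):
        if v in done:
            return
        for p in parents.get(v, ()):
            visit(p)
        done.add(v)
        order.append(v)

    for v in parents:
        visit(v)
    return order


def simulate(weights, noise, do=None):
    """Evaluate a linear SEM v = sum_p weights[v][p] * p + noise[v].

    `do` maps factors to fixed values that replace their equations (interventions).
    """
    do = do or {}
    values = {}
    for v in topological_order({u: list(w) for u, w in weights.items()}):
        if v in do:
            values[v] = do[v]
        else:
            values[v] = sum(w * values[p] for p, w in weights[v].items()) + noise[v]
    return values


# Hypothetical weights on the DAG of FIG. 2 (v2->v1, v2->v4, v3->v4, v3->v5).
weights = {"v1": {"v2": 0.5}, "v2": {}, "v3": {}, "v4": {"v2": 1.0, "v3": -2.0}, "v5": {"v3": 0.3}}
noise = {"v1": 0.0, "v2": 1.0, "v3": 1.0, "v4": 0.0, "v5": 0.0}
parents = {v: set(w) for v, w in weights.items()}

print(sorted(ancestors("v4", parents)))  # ['v2', 'v3']: the causes of target factor v4
baseline = simulate(weights, noise)
changed = simulate(weights, noise, do={"v3": 0.0})
print(baseline["v4"], changed["v4"])  # -1.0 1.0
```

Setting $v_3$ changes the target $v_4$ but leaves $v_1$ untouched, because $v_1$ is not a descendant of $v_3$.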
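Taking the above scenario of customer satisfaction with a telecom operator as an example, the target factor is, for example, "customer satisfaction", and the causality sequence 130 may, for example, indicate which factors are the causes of the target factor "customer satisfaction" (for example, reminders before the plan is used up, discounted plans, and so on). The observation sample influencing device 140 may, for example, improve customer satisfaction with the telecom operator by influencing and changing the observed values of these factors and/or formulating corresponding strategies for them (for example, providing customers with more reminders before their plans are used up and offering them more discounted plans).
Taking the above patient blood pressure scenario as an example, the target factor is, for example, "blood pressure", and the causality sequence 130 may, for example, indicate which physiological indicators are the causes of the target factor "blood pressure". The observation sample influencing device 140 may, for example, keep the patient's blood pressure stable by influencing and changing these physiological indicators and/or formulating corresponding strategies for them.
Taking the above merchandise sales scenario as an example, the target factor is, for example, "umbrella sales", and the causality sequence 130 may, for example, indicate which factors are the causes of the target factor "umbrella sales" (for example, weather, the number of umbrellas available for sale, and so on). The observation sample influencing device 140 may, for example, increase sales of the target product, umbrellas, by influencing and changing these factors and/or formulating corresponding strategies for them (for example, increasing the number of umbrellas available for sale when it rains).
Taking the above software development scenario as an example, the target factor is, for example, "development cycle", and the causality sequence 130 may, for example, indicate which factors are the causes of the target factor "development cycle" (for example, the number of architecture levels, the programming language, and so on). The observation sample influencing device 140 may, for example, shorten the software development cycle by influencing and changing these factors and/or formulating corresponding strategies for them (for example, reducing the complexity of the software architecture or using a more user-friendly programming language). As another example, the target factor may be "software failure rate in the operating stage", and the causality sequence 130 may indicate which factors are its causes (for example, code length, number of modules, and so on). The observation sample influencing device 140 may, for example, reduce the software failure rate in the operating stage by influencing and changing these factors and/or formulating corresponding strategies for them (for example, reducing code length or the number of modules).
As shown in FIG. 1B, the system 105 may include a causality optimization device 160. The causality optimization device 160 may optimize the causality sequence 130 based on the changed observation sample set 150, thereby improving the accuracy of the causality sequence 130. In some embodiments, the causality optimization device 160 may rediscover the causal relationships among the multiple factors based on the changed observation sample set 150 (for example, similar to the process performed by the causality determination device 120) to obtain an optimized causality sequence. In this way, embodiments of the present disclosure can further improve the accuracy and robustness of causal discovery.
Although the causality determination device 120 shown in FIG. 1A and the observation sample influencing device 140 and causality optimization device 160 shown in FIG. 1B are shown as separate from each other, it should be understood that this is for illustrative purposes only and is not intended to limit the scope of the present disclosure. In some embodiments, these devices may be implemented in the same physical device or in multiple different physical devices. In some embodiments, the causality determination device 120 shown in FIG. 1A and the causality optimization device 160 shown in FIG. 1B may be implemented as the same device. Embodiments of the present disclosure are not limited in this respect.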
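FIG. 2 shows a schematic diagram of determining causal relationships among multiple factors according to an embodiment of the present disclosure. For simplicity of illustration, FIG. 2 assumes that the number D of factors (that is, observed variables) involved in the observation sample set 110 is 5. As shown in FIG. 2, the observation sample set 110 includes multiple observation samples regarding factors $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$. The dependency determination unit 121 may determine, based on the observation sample set 110, the set of dependency relationships existing among $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$, represented as a skeleton graph 210. For example, the dependency relationship set 210 indicates that factors $v_1$ and $v_2$ are associated, $v_2$ and $v_4$ are associated, $v_3$ and $v_4$ are associated, and $v_3$ and $v_5$ are associated. The causality determination unit 122 may determine, based on the dependency relationship set 210, the causality sequence of $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$, represented for example as a directed acyclic graph 130. For example, the causality sequence 130 indicates that $v_2$ is a cause of $v_1$ (edge $v_2 \to v_1$), $v_2$ is a cause of $v_4$ (edge $v_2 \to v_4$), $v_3$ is a cause of $v_4$ (edge $v_3 \to v_4$), and $v_3$ is a cause of $v_5$ (edge $v_3 \to v_5$).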
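The two structural requirements on the causality sequence in this example (it must be acyclic, and each directed edge must orient an undirected edge of skeleton graph 210) can be checked as in the illustrative Python sketch below. The edge lists encode FIG. 2 as described above; the helper names `is_dag` and `respects_skeleton` are hypothetical.

```python
from collections import deque


def is_dag(nodes, edges):
    """Kahn's algorithm: True if the directed edges contain no cycle."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every node is removed only if no cycle blocks the peeling process.
    return visited == len(nodes)


def respects_skeleton(edges, skeleton):
    """Every directed edge u->v must correspond to an undirected skeleton edge {u, v}."""
    undirected = {frozenset(e) for e in skeleton}
    return all(frozenset(e) in undirected for e in edges)


nodes = ["v1", "v2", "v3", "v4", "v5"]
skeleton_210 = [("v1", "v2"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]
dag_130 = [("v2", "v1"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]
print(is_dag(nodes, dag_130), respects_skeleton(dag_130, skeleton_210))  # True True
```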
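FIG. 3 shows a flowchart of a method 300 for determining causal relationships among multiple factors according to an embodiment of the present disclosure. For example, the method 300 may be performed by the causality determination device 120 shown in FIG. 1A. It should be understood that the method 300 may also include additional actions not shown and/or may omit some of the actions shown. The scope of the present disclosure is not limited in this respect.
At block 310, the causality determination device 120 obtains an observation sample set regarding multiple factors (for example, the observation sample set 110 shown in FIG. 1A and FIG. 2). An observation sample in the observation sample set includes corresponding observed values of the multiple factors.
At block 320, the causality determination device 120 (for example, the dependency determination unit 121) determines, based on the observation sample set, a set of dependency relationships existing among the multiple factors (for example, the skeleton graph 210 shown in FIG. 2). A dependency relationship in the set indicates a pair of associated factors among the multiple factors.
In some embodiments, to determine the dependency relationship set, the causality determination device 120 may estimate the correlation coefficient between any two of the factors based on their corresponding observed values. For example, the correlation coefficient may be either a Spearman correlation coefficient or a Kendall correlation coefficient. Based on the estimated pairwise correlation coefficients, the causality determination device 120 may build a correlation coefficient matrix S. For example, if the total number of factors is D, then S is a D×D matrix. Letting $S_{jk}$ denote the element in row j and column k of S, $S_{jk}$ may be determined as follows: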
Figure PCTCN2019084049-appb-000001
or
Figure PCTCN2019084049-appb-000002
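where $\rho_{jk}$ denotes the Spearman correlation coefficient between the j-th and k-th of the D factors, and $\tau_{jk}$ denotes the Kendall correlation coefficient between them. The calculation of Spearman and Kendall correlation coefficients is known to those skilled in the art and is not described further here. In addition, the correlation coefficient between two factors may be computed by any method or means known now or developed in the future and is not limited to the Spearman and Kendall correlation coefficients. It should be understood that these are merely examples of correlation coefficients and are not intended to limit the scope of the present disclosure.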
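In some embodiments, the causality determining device 120 may build, based on the estimated correlation coefficient matrix S, an objective function for determining the dependency set (i.e., the skeleton graph 210), also referred to herein as the "first objective function". The causality determining device 120 may determine the dependency set by minimizing the first objective function. In some embodiments, for example, the causality determining device 120 may learn, based on the graphical Lasso algorithm, a precision matrix Ω that represents the respective dependencies among the factors. For example, Ω may be determined as follows: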
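$$\Omega = \arg\min_{\Omega \succeq 0} \Big\{ \operatorname{tr}(S\Omega) - \log\lvert\Omega\rvert + \lambda \sum_{j \neq k} \lvert\Omega_{jk}\rvert \Big\} \qquad (2)$$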
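where λ is a predefined coefficient. If the element Ω_jk in row j and column k of the determined matrix Ω is 0, the j-th and k-th of the D factors are regarded as not associated with each other. A zero entry of the precision matrix means zero partial correlation given the remaining factors. If Ω_jk is not 0, the j-th and k-th factors are associated with each other (but not necessarily causally related). If the dependency set is represented by a matrix M, its elements are M_jk = 1 if Ω_jk ≠ 0 and M_jk = 0 otherwise. In this way, the causality determining device 120 can determine the dependency set among the plurality of factors based on the observation sample set, as shown by the skeleton graph 210 in FIG. 2. Hereinafter, "dependency set" and "skeleton graph" are used interchangeably.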
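Once a precision matrix Ω has been estimated, turning it into the skeleton M is a thresholding step. One option for estimating Ω is a graphical Lasso solver such as scikit-learn's `sklearn.covariance.graphical_lasso`, though the embodiments do not necessarily use it. Iterative solvers rarely return exact zeros, so the sketch assumes a small hypothetical tolerance `tol` to absorb numerical noise. The example Ω is made-up data with the FIG. 2 sparsity pattern.

```python
def skeleton_from_precision(omega, names, tol=1e-8):
    """Return (M, pairs): M[j][k] = 1 iff j != k and |Omega_jk| > tol."""
    D = len(omega)
    M = [[int(j != k and abs(omega[j][k]) > tol) for k in range(D)] for j in range(D)]
    # Report each unordered associated factor pair once (j < k).
    pairs = [(names[j], names[k]) for j in range(D) for k in range(j + 1, D) if M[j][k]]
    return M, pairs


# Made-up precision matrix with the sparsity pattern of skeleton graph 210.
omega = [
    [1.0, 0.4, 0.0, 0.0, 0.0],
    [0.4, 1.0, 0.0, -0.3, 0.0],
    [0.0, 0.0, 1.0, 0.5, 0.2],
    [0.0, -0.3, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 1.0],
]
M, pairs = skeleton_from_precision(omega, ["v1", "v2", "v3", "v4", "v5"])
print(pairs)  # [('v1', 'v2'), ('v2', 'v4'), ('v3', 'v4'), ('v3', 'v5')]
```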
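Alternatively, in some embodiments, the causality determining device 120 may determine the dependency set M by applying conditional independence tests to the plurality of factors. A conditional independence test determines whether two factors are independent of each other given a conditioning set. In a traditional conditional independence test, judging whether two factors are independent requires using every combination of the other factors as the conditioning set. When the number of factors D is large, the computational cost becomes very high. Moreover, if the conditioning set contains many other factors, the two factors are easily judged to be independent of each other. To discover high-dimensional causal structures quickly and accurately, in some embodiments, the causality determining device 120 may limit the number of other factors in the conditioning set to 1 when determining the dependency set through conditional independence tests. This reduces the computational cost of the tests. It also reduces the number of factor pairs judged to be independent of each other, which facilitates the subsequent discovery of the causality sequence.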
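The restricted test can be sketched as follows. The patent does not name a specific test, so the sketch assumes a Fisher z-test on Pearson partial correlations, which is appropriate for roughly Gaussian data. An edge j-k is kept unless the pair tests as independent either marginally or given any single other factor. The conditioning sets therefore have size at most 1. The names `skeleton_order1` and `simulate_chain`, and the significance level `alpha`, are hypothetical.

```python
import math
import random


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def partial_corr(r_jk, r_jc, r_kc):
    """Correlation of factors j and k after removing the linear effect of one factor c."""
    return (r_jk - r_jc * r_kc) / math.sqrt((1 - r_jc ** 2) * (1 - r_kc ** 2))


def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: (partial) correlation == 0, via Fisher's z-transform."""
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - cond_size - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def skeleton_order1(columns, alpha=0.01):
    """Edges (j, k), j < k, that survive tests with conditioning sets of size 0 or 1."""
    D, n = len(columns), len(columns[0])
    R = [[pearson(columns[j], columns[k]) for k in range(D)] for j in range(D)]
    edges = set()
    for j in range(D):
        for k in range(j + 1, D):
            if fisher_z_pvalue(R[j][k], n, 0) > alpha:
                continue  # marginally independent
            if any(fisher_z_pvalue(partial_corr(R[j][k], R[j][c], R[k][c]), n, 1) > alpha
                   for c in range(D) if c not in (j, k)):
                continue  # independent given a single other factor
            edges.add((j, k))
    return edges


def simulate_chain(n=2000, seed=0):
    """Hypothetical demo data: x -> y -> z with unit Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + rng.gauss(0, 1) for a in x]
    z = [b + rng.gauss(0, 1) for b in y]
    return [x, y, z]
```

On the chain x → y → z, the pair x-z is correlated marginally but independent given y. As a result, `skeleton_order1(simulate_chain(), alpha=1e-6)` keeps only the edges (0, 1) and (1, 2).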
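In this way, by determining the dependency set, embodiments of the present disclosure can reduce the size of the variable space to be searched, so that causal relations among a large number of factors can be discovered quickly.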
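At block 330, the causality determining device 120 (e.g., the causality determining unit 122) determines the causality sequence of the plurality of factors based on the dependency set. The causality sequence may indicate that one factor of an associated factor pair is a cause of the other factor.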
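In some embodiments, to determine the causality sequence, the causality determining device 120 may determine, for the plurality of associated factor pairs indicated by the dependency set, the influence of one factor in each pair on the other factor. In some embodiments, to determine this influence, the causality determining device 120 may build a second objective function based on a predetermined distribution (e.g., a Gaussian distribution or another distribution). The device then determines the influence of one factor in each pair on the other by minimizing the second objective function. The second objective function may, for example, be built around two goals. First, the discovered causal structure should fit the observed data samples well. Second, the discovered causal structure should remain sparse.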
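In some embodiments, assume that the total number of factors is D and that the influences determined for the factor pairs are represented by a matrix B. Then B may be determined as follows: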
[Equation (3): formula image PCTCN2019084049-appb-000003, not reproduced]
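where X = {x_1, x_2, ..., x_D} ∈ R^{N×D} denotes the observation sample set, N denotes the total number of observation samples in X, and D denotes the total number of factors. The vector x_i ∈ R^N (1 ≤ i ≤ D) denotes the N observed values of the i-th factor (i.e., factor v_i). x_{i,n} (1 ≤ i ≤ D and 1 ≤ n ≤ N) denotes the n-th observed value of the i-th factor. The vector β_i ∈ R^{D-1} (1 ≤ i ≤ D) denotes the respective influences of the other factors on factor v_i. For example, β_ij ≠ 0 indicates that factor v_j may be a direct cause of factor v_i. Conversely, β_ij = 0 indicates that v_j has no influence on v_i and therefore cannot be a direct cause of v_i. |B|_0 denotes the total number of nonzero elements in B and serves as an estimate of how dense (i.e., how far from sparse) the causal structure is, and
[formula image PCTCN2019084049-appb-000004, not reproduced]
The constraint G{β_1, ..., β_D} ∈ DAG requires the causal structure to be determined to be a directed acyclic graph, and the constraint
[formula image PCTCN2019084049-appb-000005, not reproduced]
M requires the causal structure to be determined to be a subset of the previously determined skeleton graph M (e.g., the skeleton graph 210 shown in FIG. 2).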
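Equation (3) itself is available only as an image, so the sketch below does not reproduce it. Instead, it illustrates the two stated goals, data fit and sparsity measured by |β_i|_0, together with the skeleton constraint for a single factor v_i. The code searches every subset of v_i's skeleton neighbours S(m_i) and fits ordinary least squares on each. It keeps the subset that minimizes the BIC-style score N·log(RSS/N) + log(N)·|β_i|_0. The BIC form, the exhaustive subset search and the function name `fit_parents` are all assumptions; the exhaustive search is only feasible for small neighbourhoods. Acyclicity is not handled here because it is a global property of the graph. NumPy is assumed to be available.

```python
import itertools

import numpy as np


def fit_parents(X, i, allowed):
    """Best-subset fit of column i of X on subsets of `allowed` (skeleton neighbours).

    Returns (score, parents, coef), minimizing N*log(RSS/N) + log(N)*|beta_i|_0.
    """
    N = X.shape[0]
    y = X[:, i] - X[:, i].mean()  # centre so no intercept column is needed
    best = None
    for size in range(len(allowed) + 1):
        for subset in itertools.combinations(allowed, size):
            if subset:
                A = X[:, list(subset)] - X[:, list(subset)].mean(axis=0)
                coef, *_ = np.linalg.lstsq(A, y, rcond=None)
                rss = float(np.sum((y - A @ coef) ** 2))
            else:
                coef, rss = np.zeros(0), float(np.sum(y ** 2))
            score = N * np.log(rss / N) + np.log(N) * size  # fit term + sparsity penalty
            if best is None or score < best[0]:
                best = (score, subset, coef)
    return best
```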
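In some embodiments, the causality determining device 120 may determine the causality sequence 130 based on the respective influences determined for the plurality of factor pairs and the observation sample set. For example, FIG. 4 shows a flowchart of a method 400 for determining a causality sequence according to an embodiment of the present disclosure. The method 400 may be performed by the causality determining device 120 shown in FIG. 1A. It should be understood that the method 400 may also include additional actions not shown and/or may omit some of the actions shown. The scope of the present disclosure is not limited in this respect.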
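At block 410, the causality determining device 120 may obtain a historical causality sequence and a historical causality score. Here, the historical causality sequence is denoted Q_S and the historical causality score is denoted f(Q_S).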
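In some embodiments, initially, the causality determining device 120 may initialize the historical causality sequence to an empty sequence, i.e., Q_S = {}. The causality determining device 120 may then determine the initial causality score corresponding to the empty sequence and use it as the historical causality score, namely: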
[Equation (4): formula image PCTCN2019084049-appb-000006, not reproduced]
where
[formula image PCTCN2019084049-appb-000007, not reproduced]
and
[formula image PCTCN2019084049-appb-000008, not reproduced]
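In formula (4) above, the causality sequence is assumed to be represented by a directed acyclic graph G = {V, E}. V denotes the set of all nodes in G (e.g., all nodes that have edges in the skeleton graph 210), and E denotes the set of all edges in G. For example, in the example shown in FIG. 2, V = {v_1, v_2, v_3, v_4, v_5}. V\U denotes all nodes in V other than those in the node set U; for example, V\v_i denotes all nodes in V other than v_i. S(β_i) denotes the support set of β_i, i.e., the set of parent nodes of node v_i (the nodes representing potential causes of factor v_i). The constraint
[formula image PCTCN2019084049-appb-000009, not reproduced]
means that S(β_i) is a subset of the intersection of V\U and S(m_i), where S(m_i) denotes the set of nodes that share an edge with node v_i in the skeleton graph M (e.g., the skeleton graph 210). For example, in the example shown in FIG. 2, f(Q_S) = f({}) = SBIC(v_1|v_2) + SBIC(v_2|(v_1, v_4)) + SBIC(v_3|(v_4, v_5)) + SBIC(v_4|(v_2, v_3)) + SBIC(v_5|v_3).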
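The structure of the initial score f({}) in the FIG. 2 example can be reproduced mechanically. Each node contributes one SBIC term, whose conditioning set is that node's skeleton neighbourhood. The sketch treats SBIC as an arbitrary caller-supplied function, because its exact definition is in formula images that are not reproduced. `initial_score` and `describe_terms` are hypothetical helpers.

```python
# Skeleton graph 210 of FIG. 2 as adjacency sets: S(m_v) for each node v.
skeleton = {
    "v1": {"v2"},
    "v2": {"v1", "v4"},
    "v3": {"v4", "v5"},
    "v4": {"v2", "v3"},
    "v5": {"v3"},
}


def initial_score(skeleton, sbic):
    """f({}): one term per node v, conditioned on its skeleton neighbours S(m_v)."""
    return sum(sbic(v, frozenset(skeleton[v] - {v})) for v in sorted(skeleton))


def describe_terms(skeleton):
    """Human-readable form of the same sum, for checking against the text."""
    return " + ".join(f"SBIC({v}|{','.join(sorted(skeleton[v]))})" for v in sorted(skeleton))


print(describe_terms(skeleton))
# SBIC(v1|v2) + SBIC(v2|v1,v4) + SBIC(v3|v4,v5) + SBIC(v4|v2,v3) + SBIC(v5|v3)
```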
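At block 420, the causality determining device 120 determines one or more candidate factors that may be added to the causality sequence. This determination is based on the historical causality sequence Q_S and the plurality of factor pairs indicated by the dependency set (e.g., the skeleton graph 210). In some embodiments, the one or more candidate factors may include all factors corresponding to the candidate node set V\Q_S, which denotes all nodes in the node set V other than those included in Q_S.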
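If no candidate factor exists (i.e., the set V\Q_S is empty), then at block 470 the causality determining device 120 may output the historical causality sequence Q_S as the determined causality sequence 130.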
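If one or more candidate factors exist, then at block 440 the causality determining device 120 may determine one or more candidate causality scores corresponding to the one or more candidate factors. Consider each candidate node v_i in the candidate node set V\Q_S (i.e., v_i ∈ V\Q_S). The corresponding candidate causality sequence is Q_S′ = Q_S ∪ v_i, and its score is f(Q_S′) = f(Q_S) + SBIC(v_i|Q_S) − SBIC(v_i|V\v_i).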
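The update rule of block 440 changes only the newly placed node's term, which keeps each step cheap. In the sketch below, the `sbic(v, parents)` callable is a placeholder for the SBIC term. As an assumption consistent with the FIG. 2 example, SBIC(v_i|Q_S) is evaluated with parents restricted to the skeleton neighbours of v_i that are already in Q_S. SBIC(v_i|V\v_i) is evaluated with all of the skeleton neighbours of v_i. The function name `candidate_scores` is hypothetical.

```python
def candidate_scores(order, f_prev, nodes, skeleton, sbic):
    """Score every candidate v not yet in the sequence Q_S (= `order`).

    f(Q_S + v) = f(Q_S) + SBIC(v | placed neighbours) - SBIC(v | all neighbours)
    """
    placed = set(order)
    scores = {}
    for v in nodes:
        if v in placed:
            continue  # only nodes in the candidate set are scored
        neighbours = skeleton[v]
        restricted = frozenset(neighbours & placed)  # parents must precede v
        unrestricted = frozenset(neighbours)         # term v had before being placed
        scores[v] = f_prev + sbic(v, restricted) - sbic(v, unrestricted)
    return scores
```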
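At block 450, the causality determining device 120 may select, from the one or more candidate factors, a candidate factor to be added to the causality sequence, based on the determined candidate causality scores.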
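In some embodiments, the causality determining device 120 may determine the minimum among the one or more candidate causality scores. It may then select the candidate factor associated with the minimum candidate causality score for addition to the causality sequence 130.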
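Additionally or alternatively, in some embodiments, to determine the causality sequence more quickly, the causality determining device 120 may obtain a constraint associated with the causality sequence to be determined.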
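In some embodiments, the causality determining device 120 may obtain expert information indicating the constraint and determine the constraint based on the obtained expert information. In the example shown in FIG. 2, the expert information may indicate that node v_3 precedes node v_4. This means that the factor corresponding to v_3 may be a cause of the factor corresponding to v_4, but the factor corresponding to v_4 cannot be a cause of the factor corresponding to v_3.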
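Additionally or alternatively, in some embodiments, the causality determining device 120 may determine the constraint based on the historical causality sequence and the plurality of associated factor pairs indicated by the skeleton graph 210. For example, in the example shown in FIG. 2, assume that the current Q_S indicates that the factor corresponding to node v_3 is a cause of the factor corresponding to node v_4 (i.e., the edge v_3→v_4 exists in the causality sequence 130). In addition, the skeleton graph M shows that nodes v_1 and v_2 are associated with each other, nodes v_2 and v_4 are associated with each other, nodes v_3 and v_4 are associated with each other, and nodes v_3 and v_5 are associated with each other. Therefore, nodes v_1, v_2 and v_4 form one strongly connected node set, and nodes v_3 and v_5 form another. In this case, the causality determining device 120 may, for example, determine that the node set {v_3, v_5} precedes the node set {v_1, v_2, v_4}. That is, a node in {v_3, v_5} may be a cause of a node in {v_1, v_2, v_4}, but no node in {v_1, v_2, v_4} can be a cause of any node in {v_3, v_5}.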
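In some embodiments, in response to obtaining the constraint associated with the causality sequence to be determined, the causality determining device 120 may select a candidate factor to be added to the causality sequence from the one or more candidate factors. The selection is made so that adding the selected candidate factor satisfies the obtained constraint. For example, adding the candidate factor associated with the minimum candidate causality score may violate the constraint. In that case, the causality determining device 120 may select another candidate factor (e.g., the one associated with the second-smallest candidate causality score) for addition to the causality sequence 130.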
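Constraint-aware selection can be sketched as scanning the candidates in increasing score order and taking the first one whose placement does not break a precedence rule. A constraint is assumed here to be a set of pairs (a, b) meaning "a must be placed before b". This covers the expert-knowledge example above (v_3 before v_4). Representing constraints this way, and the name `select_candidate`, are illustrative assumptions.

```python
def select_candidate(scores, order, precedence):
    """Return the lowest-scoring candidate whose placement respects `precedence`.

    scores:     {candidate factor: candidate causality score}
    order:      factors already placed in the sequence (Q_S)
    precedence: set of (a, b) pairs meaning a must be placed before b
    """
    placed = set(order)
    for v in sorted(scores, key=scores.get):  # ascending score
        # v may be placed only once every factor required to precede it is placed
        if all(a in placed for a, b in precedence if b == v):
            return v
    return None  # every candidate would violate a constraint
```

With scores {v4: 0.5, v3: 1.0, v1: 2.0} and the rule (v3, v4), the function skips the best-scoring v4 and returns v3. Once v3 has been placed, v4 becomes eligible.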
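In this way, the constraint limits the number of candidate factors considered while the causality sequence is being determined, so the causality sequence can be determined more quickly.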
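At block 460, in response to a candidate factor being selected, the causality determining device 120 may update the historical causality sequence Q_S and the historical causality score f(Q_S). For example, the causality determining device 120 may replace Q_S with the candidate causality sequence Q_S′ corresponding to the selected candidate factor. It may likewise replace f(Q_S) with the score f(Q_S′) corresponding to Q_S′.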
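In some embodiments, the causality determining device 120 may iteratively perform blocks 410-460 of the method 400 until all possible candidate factors have been searched (i.e., until block 470 is reached).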
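Putting blocks 410-470 together gives a greedy search over factor orderings. The sketch below is one possible reading under several assumptions:

- `sbic` is a caller-supplied placeholder.
- The parents of v_i are restricted to its skeleton neighbours.
- Ties are broken by node name.
- The final edges are simply the skeleton neighbours that precede each node. In the embodiments, the parent set would instead be the support S(β_i) chosen by the sparse fit.

The names `greedy_order` and `parents_from_order` are hypothetical.

```python
def greedy_order(nodes, skeleton, sbic, precedence=frozenset()):
    """Greedy ordering search; returns (causality sequence, final score)."""
    # Score of the empty sequence: every node conditioned on all its skeleton neighbours.
    f = sum(sbic(v, frozenset(skeleton[v])) for v in nodes)
    order = []
    while True:
        placed = set(order)
        candidates = [v for v in nodes if v not in placed]  # candidate set V \ Q_S
        if not candidates:
            return order, f  # block 470: output the sequence
        scored = []
        for v in candidates:
            if any(b == v and a not in placed for a, b in precedence):
                continue  # placing v now would violate a constraint
            new_f = (f + sbic(v, frozenset(skeleton[v] & placed))
                     - sbic(v, frozenset(skeleton[v])))
            scored.append((new_f, v))
        if not scored:
            raise ValueError("constraints cannot be satisfied")
        f, v = min(scored)  # block 450: minimum score, ties broken by name
        order.append(v)     # block 460: update Q_S and f(Q_S)


def parents_from_order(order, skeleton):
    """Directed edges (u, v): u is a skeleton neighbour of v placed before v."""
    pos = {v: i for i, v in enumerate(order)}
    return {(u, v) for v in order for u in skeleton[v] if pos[u] < pos[v]}
```

Consider a toy score `lambda v, P: 10 * len(true_parents[v] - P) + len(P)`, where `true_parents` encodes the FIG. 2 edges and is invented for this check. With it, the search places v2, v3, v1, v4, v5 in that order with final score 4. `parents_from_order` then recovers exactly the four edges of the directed acyclic graph 130.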
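FIG. 5 shows a flowchart of a method 500 for influencing the observed value of a target factor according to an embodiment of the present disclosure. For example, the method 500 may be performed by the observation sample influencing device 140 shown in FIG. 1B. In some embodiments, the method 500 may be performed after the method 300. It should be understood that the method 500 may also include additional actions not shown and/or may omit some of the actions shown. The scope of the present disclosure is not limited in this respect.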
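At block 510, the observation sample influencing device 140 determines, based on the causality sequence, at least one factor among the plurality of factors that is a cause of the target factor. Then, at block 520, the observation sample influencing device 140 influences the observed value of the target factor by changing the observed value of the at least one factor. In some embodiments, for example, the observation sample influencing device 140 may influence and change the at least one factor and/or formulate corresponding strategies for it in order to influence the observed value of the target factor.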
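Block 510 can be sketched as a walk backwards over the edges of the causality sequence. The text does not specify whether "cause" means only direct causes (parents) or all ancestors, so the hypothetical helper `causes_of` supports both. This is an interpretation, not the embodiments' confirmed rule.

```python
def causes_of(target, edges, direct_only=False):
    """Factors that are causes of `target` in a DAG given as (cause, effect) edges."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    if direct_only:
        return set(parents.get(target, set()))
    # Depth-first walk over parents collects all ancestors (direct and indirect causes).
    seen, stack = set(), [target]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


dag = [("v2", "v1"), ("v2", "v4"), ("v3", "v4"), ("v3", "v5")]  # FIG. 2
print(sorted(causes_of("v4", dag)))  # ['v2', 'v3']
```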
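Taking the scenario described above concerning customer satisfaction with a telecom operator as an example, the target factor may be "customer satisfaction". Based on the causality sequence 130, the observation sample influencing device 140 may determine which factors are causes of the target factor "customer satisfaction" (e.g., reminders sent before a plan's allowance is used up, discounted plans). The observation sample influencing device 140 may further improve customer satisfaction with the telecom operator by influencing and changing these factors and/or by formulating corresponding strategies for them. Examples include sending customers more reminders before their plan allowance is used up and offering customers more discounted plans.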
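Taking the scenario described above concerning a patient's blood pressure as an example, the target factor may be "blood pressure". Based on the causality sequence 130, the observation sample influencing device 140 may determine which physiological indicators are causes of the target factor "blood pressure". The observation sample influencing device 140 may further keep the patient's blood pressure stable by influencing and changing these physiological indicators and/or by formulating corresponding strategies for them.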
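Taking the product sales scenario described above as an example, the target factor may be "umbrella sales". Based on the causality sequence 130, the observation sample influencing device 140 may determine which factors are causes of the target factor "umbrella sales" (e.g., the weather, the number of umbrellas available for sale). The observation sample influencing device 140 may further increase the sales of the target product, umbrellas, by influencing and changing these factors and/or by formulating corresponding strategies for them (e.g., increasing the number of umbrellas available for sale when it rains).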
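Taking the software development scenario described above as an example, the target factor may be "development cycle". Based on the causality sequence 130, the observation sample influencing device 140 may determine which factors are causes of the target factor "development cycle" (e.g., the number of architecture layers, the programming language). The device may further shorten the software development cycle by influencing and changing these factors and/or by formulating corresponding strategies for them (e.g., reducing the complexity of the software architecture, using a more user-friendly programming language). As another example, the target factor may be "software failure rate in the operation phase". Based on the causality sequence 130, the observation sample influencing device 140 may determine which factors are causes of this target factor (e.g., code length, number of modules). The device may further reduce the software failure rate in the operation phase by influencing and changing these factors and/or by formulating corresponding strategies for them (e.g., reducing code length, reducing the number of modules).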
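FIG. 6 shows a flowchart of a method 600 for optimizing causal relations according to an embodiment of the present disclosure. For example, the method 600 may be performed by the causality optimizing device 160 shown in FIG. 1B. In some embodiments, the method 600 may be performed after the method 500. It should be understood that the method 600 may also include additional actions not shown and/or may omit some of the actions shown. The scope of the present disclosure is not limited in this respect.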
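At block 610, the causality optimizing device 160 obtains a changed observation sample set for the plurality of factors. In some embodiments, at least one observation sample in the changed observation sample set may include a changed observed value of the at least one factor (e.g., the at least one factor that is a cause of the target factor). Then, at block 620, the causality optimizing device 160 may optimize the causality sequence based on the changed observation sample set. In some embodiments, for example, the causality optimizing device 160 may rediscover the causal relations among the plurality of factors based on the changed observation sample set 150 (e.g., through a process similar to that performed by the causality determining device 120) to obtain an optimized causality sequence. In this way, embodiments of the present disclosure can further improve the accuracy and robustness of causal discovery.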
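FIG. 7 shows a schematic block diagram of an example device 700 that can be used to implement embodiments of the present disclosure. For example, the causality determining device 120 shown in FIG. 1A and/or the observation sample influencing device 140 and/or the causality optimizing device 160 shown in FIG. 1B may be implemented by the device 700. As shown, the device 700 includes a central processing unit (CPU) 701. The CPU 701 can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.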
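Multiple components of the device 700 are connected to the I/O interface 705. These include an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem or a wireless communication transceiver. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks.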
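The processing unit 701 may be configured to perform the various procedures and processes described above, such as the methods 300, 400, 500 and/or 600. For example, in some embodiments, the methods 300, 400, 500 and/or 600 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the CPU 701, one or more steps of the methods 300, 400, 500 and/or 600 described above may be performed.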
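The present disclosure may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for carrying out various aspects of the present disclosure.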
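A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include the following:

- a portable computer diskette
- a hard disk
- a random access memory (RAM)
- a read-only memory (ROM)
- an erasable programmable read-only memory (EPROM or flash memory)
- a static random access memory (SRAM)
- a portable compact disc read-only memory (CD-ROM)
- a digital versatile disc (DVD)
- a memory stick
- a floppy disk
- a mechanically encoded device, such as a punch card or raised structures in a groove with instructions stored thereon
- any suitable combination of the foregoing

A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se. Such signals include radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses through a fiber-optic cable), and electrical signals transmitted through a wire.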
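The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices. They can also be downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network. It then forwards the instructions for storage in a computer-readable storage medium within that computing/processing device.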
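Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions or state-setting data. They may also be source code or object code written in any combination of one or more programming languages. These include object-oriented programming languages, such as Smalltalk and C++, and conventional procedural programming languages, such as the "C" language or similar programming languages. The computer-readable program instructions may execute:

- entirely on the user's computer;
- partly on the user's computer;
- as a stand-alone software package;
- partly on the user's computer and partly on a remote computer; or
- entirely on the remote computer or server.

Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry is personalized using state information of the computer-readable program instructions. Such circuitry includes programmable logic circuits, field programmable gate arrays (FPGA) and programmable logic arrays (PLA). The electronic circuitry can then execute the computer-readable program instructions to implement various aspects of the present disclosure.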
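Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.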
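These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine. When executed by that processing unit, the instructions create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where they direct a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner. The computer-readable medium storing the instructions then comprises an article of manufacture. That article includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.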
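The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other device, causing a series of operational steps to be performed on it and producing a computer-implemented process. The instructions executed on the computer, other programmable data processing apparatus or other device thereby implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.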
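The flowcharts and block diagrams in the drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions. That module, segment or portion contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in two ways. One is a dedicated hardware-based system that performs the specified functions or actions. The other is a combination of dedicated hardware and computer instructions.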
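Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications or their technical improvements over technologies in the market. It was also chosen to enable others of ordinary skill in the art to understand the embodiments disclosed herein.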

Claims (33)

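  1. A method for data processing, comprising:
    obtaining an observation sample set for a plurality of factors, an observation sample in the observation sample set comprising respective observed values of the plurality of factors;
    determining, based on the observation sample set, a dependency set existing among the plurality of factors, a dependency in the dependency set indicating a pair of factors, among the plurality of factors, that are associated with each other; and
    determining, based on the dependency set, a causality sequence of the plurality of factors, the causality sequence indicating that one factor of the associated pair of factors is a cause of the other factor.
  2. The method according to claim 1, wherein the plurality of factors comprises a target factor, and the method further comprises:
    determining, based on the causality sequence, at least one factor among the plurality of factors that is a cause of the target factor; and
    influencing an observed value of the target factor by changing an observed value of the at least one factor.
  3. The method according to claim 2, further comprising:
    obtaining a changed observation sample set for the plurality of factors, at least one observation sample in the changed observation sample set comprising a changed observed value of the at least one factor; and
    optimizing the causality sequence based on the changed observation sample set.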
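  4. The method according to claim 1, wherein determining the dependency set comprises:
    for any two factors among the plurality of factors, estimating a correlation coefficient between the two factors based on respective observed values of the two factors in the observation sample set;
    building, based on a result of the estimating, a first objective function for determining the dependency set; and
    determining the dependency set by minimizing the first objective function.
  5. The method according to claim 4, wherein the correlation coefficient comprises any one of:
    a Spearman correlation coefficient; or
    a Kendall correlation coefficient.
  6. The method according to claim 1, wherein determining the dependency set comprises:
    determining the dependency set by applying conditional independence tests to the plurality of factors.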
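  7. The method according to claim 1, wherein determining the causality sequence comprises:
    determining, for a plurality of associated factor pairs indicated by the dependency set, an influence of one factor in each factor pair on the other factor; and
    determining the causality sequence based on the respective influences determined for the plurality of factor pairs and the observation sample set.
  8. The method according to claim 7, wherein determining the influence of one factor in each factor pair on the other factor comprises:
    building, based on a predetermined distribution, a second objective function for determining the respective influences for the plurality of factor pairs; and
    determining the influence of one factor in each factor pair on the other factor by minimizing the second objective function.
  9. The method according to claim 8, wherein the predetermined distribution is a Gaussian distribution.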
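  10. The method according to claim 7, wherein determining the causality sequence comprises iteratively performing the following operations at least once:
    obtaining a historical causality sequence and a historical causality score corresponding to the historical causality sequence;
    determining, based on the historical causality sequence and the plurality of factor pairs, one or more candidate factors that may be added to the causality sequence;
    in response to the one or more candidate factors existing, determining one or more candidate causality scores corresponding to the one or more candidate factors based on the historical causality score, the respective influences determined for the plurality of factor pairs, and the observation sample set;
    selecting, based on the one or more candidate causality scores, a candidate factor to be added to the causality sequence from the one or more candidate factors; and
    updating the historical causality sequence and the historical causality score based on the selected candidate factor.
  11. The method according to claim 10, further comprising:
    in response to the one or more candidate factors not existing, determining the historical causality sequence as the causality sequence.
  12. The method according to claim 10, wherein obtaining the historical causality sequence and the historical causality score comprises:
    initializing the historical causality sequence to an empty sequence; and
    determining an initial causality score corresponding to the empty sequence as the historical causality score.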
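  13. The method according to claim 10, wherein selecting the candidate factor from the one or more candidate factors comprises:
    determining a minimum candidate causality score among the one or more candidate causality scores; and
    selecting, from the one or more candidate factors, the candidate factor associated with the minimum candidate causality score.
  14. The method according to claim 10, wherein selecting the candidate factor from the one or more candidate factors comprises:
    obtaining a constraint associated with the causality sequence to be determined; and
    selecting, from the one or more candidate factors, the candidate factor to be added to the causality sequence such that the adding of the candidate factor satisfies the constraint.
  15. The method according to claim 14, wherein obtaining the constraint comprises:
    obtaining information indicating the constraint; and
    determining the constraint based on the information.
  16. The method according to claim 14, wherein obtaining the constraint comprises:
    determining the constraint based on the historical causality sequence and the plurality of factor pairs.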
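  17. An apparatus for data processing, comprising:
    at least one processing unit; and
    at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the apparatus to perform actions comprising:
    obtaining an observation sample set for a plurality of factors, an observation sample in the observation sample set comprising respective observed values of the plurality of factors;
    determining, based on the observation sample set, a dependency set existing among the plurality of factors, a dependency in the dependency set indicating a pair of factors, among the plurality of factors, that are associated with each other; and
    determining, based on the dependency set, a causality sequence of the plurality of factors, the causality sequence indicating that one factor of the associated pair of factors is a cause of the other factor.
  18. The apparatus according to claim 17, wherein the plurality of factors comprises a target factor, and the actions further comprise:
    determining, based on the causality sequence, at least one factor among the plurality of factors that is a cause of the target factor; and
    influencing an observed value of the target factor by changing an observed value of the at least one factor.
  19. The apparatus according to claim 18, wherein the actions further comprise:
    obtaining a changed observation sample set for the plurality of factors, at least one observation sample in the changed observation sample set comprising a changed observed value of the at least one factor; and
    optimizing the causality sequence based on the changed observation sample set.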
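  20. The apparatus according to claim 17, wherein determining the dependency set comprises:
    for any two factors among the plurality of factors, estimating a correlation coefficient between the two factors based on respective observed values of the two factors in the observation sample set;
    building, based on a result of the estimating, a first objective function for determining the dependency set; and
    determining the dependency set by minimizing the first objective function.
  21. The apparatus according to claim 20, wherein the correlation coefficient comprises any one of:
    a Spearman correlation coefficient; or
    a Kendall correlation coefficient.
  22. The apparatus according to claim 17, wherein determining the dependency set comprises:
    determining the dependency set by applying conditional independence tests to the plurality of factors.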
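  23. The apparatus according to claim 17, wherein determining the causality sequence comprises:
    determining, for a plurality of associated factor pairs indicated by the dependency set, an influence of one factor in each factor pair on the other factor; and
    determining the causality sequence based on the respective influences determined for the plurality of factor pairs and the observation sample set.
  24. The apparatus according to claim 23, wherein determining the influence of one factor in each factor pair on the other factor comprises:
    building, based on a predetermined distribution, a second objective function for determining the respective influences for the plurality of factor pairs; and
    determining the influence of one factor in each factor pair on the other factor by minimizing the second objective function.
  25. The apparatus according to claim 24, wherein the predetermined distribution is a Gaussian distribution.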
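  26. The apparatus according to claim 23, wherein determining the causality sequence comprises iteratively performing the following operations at least once:
    obtaining a historical causality sequence and a historical causality score corresponding to the historical causality sequence;
    determining, based on the historical causality sequence and the plurality of factor pairs, one or more candidate factors that may be added to the causality sequence;
    in response to the one or more candidate factors existing, determining one or more candidate causality scores corresponding to the one or more candidate factors based on the historical causality score, the respective influences determined for the plurality of factor pairs, and the observation sample set;
    selecting, based on the one or more candidate causality scores, a candidate factor to be added to the causality sequence from the one or more candidate factors; and
    updating the historical causality sequence and the historical causality score based on the selected candidate factor.
  27. The apparatus according to claim 26, wherein the actions further comprise:
    in response to the one or more candidate factors not existing, determining the historical causality sequence as the causality sequence.
  28. The apparatus according to claim 26, wherein obtaining the historical causality sequence and the historical causality score comprises:
    initializing the historical causality sequence to an empty sequence; and
    determining an initial causality score corresponding to the empty sequence as the historical causality score.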
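  29. The apparatus according to claim 26, wherein selecting the candidate factor from the one or more candidate factors comprises:
    determining a minimum candidate causality score among the one or more candidate causality scores; and
    selecting, from the one or more candidate factors, the candidate factor associated with the minimum candidate causality score.
  30. The apparatus according to claim 26, wherein selecting the candidate factor from the one or more candidate factors comprises:
    obtaining a constraint associated with the causality sequence to be determined; and
    selecting, from the one or more candidate factors, the candidate factor to be added to the causality sequence such that the adding of the candidate factor satisfies the constraint.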
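  31. The apparatus according to claim 30, wherein obtaining the constraint comprises:
    obtaining information indicating the constraint; and
    determining the constraint based on the information.
  32. The apparatus according to claim 30, wherein obtaining the constraint comprises:
    determining the constraint based on the historical causality sequence and the plurality of factor pairs.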
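  33. A computer-readable storage medium having machine-executable instructions stored thereon, the machine-executable instructions, when executed by a device, causing the device to perform the method according to any one of claims 1-16.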
PCT/CN2019/084049 2019-04-24 2019-04-24 Method, apparatus and medium for data processing WO2020215237A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US17/605,731 US20220215291A1 (en) 2019-04-24 2019-04-24 Method and device for use in data processing, and medium
JP2021563019A JP2022537009A (ja) 2019-04-24 2019-04-24 Data processing method, apparatus and program
PCT/CN2019/084049 WO2020215237A1 (zh) 2019-04-24 2019-04-24 Method, apparatus and medium for data processing
JP2023190772A JP2024016198A (ja) 2019-04-24 2023-11-08 Data processing method, apparatus and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/084049 WO2020215237A1 (zh) 2019-04-24 2019-04-24 Method, apparatus and medium for data processing

Publications (1)

Publication Number Publication Date
WO2020215237A1 true WO2020215237A1 (zh) 2020-10-29

Family

ID=72941072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/084049 WO2020215237A1 (zh) 2019-04-24 2019-04-24 Method, apparatus and medium for data processing

Country Status (3)

Country Link
US (1) US20220215291A1 (zh)
JP (2) JP2022537009A (zh)
WO (1) WO2020215237A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022122249A (ja) * 2021-02-09 2022-08-22 NEC Corporation Data processing method, apparatus and program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
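CN101436057A (zh) * 2008-12-18 2009-05-20 Zhejiang University Bayesian network compensation method for thermal errors of CNC machine tools
CN103020423A (zh) * 2012-11-21 2013-04-03 Huazhong University of Science and Technology Method for obtaining correlation characteristics of wind farm power output based on copula functions
CN107491898A (zh) * 2017-09-01 2017-12-19 PowerChina Chengdu Engineering Corporation Limited Bayesian network model for risk analysis of cascade hydropower stations and construction method thereof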

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
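JP5082511B2 (ja) * 2007-03-07 2012-11-28 OMRON Corporation Causal structure determination device, control method for causal structure determination device, and control program for causal structure determination device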
US10289751B2 (en) * 2013-03-15 2019-05-14 Konstantinos (Constantin) F. Aliferis Data analysis computer system and method employing local to global causal discovery

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022122249A (ja) * 2021-02-09 2022-08-22 NEC Corporation Data processing method, apparatus and program
JP7396344B2 (ja) 2021-02-09 2023-12-12 NEC Corporation Data processing method, apparatus and program

Also Published As

Publication number Publication date
JP2022537009A (ja) 2022-08-23
JP2024016198A (ja) 2024-02-06
US20220215291A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
Zhou et al. Offline multi-action policy learning: Generalization and optimization
Moeyersoms et al. Comprehensible software fault and effort prediction: A data mining approach
JP6822509B2 (ja) Data processing method and electronic device
US11113705B2 (en) Business forecasting using predictive metadata
Carrasco et al. The concordance correlation coefficient for repeated measures estimated by variance components
US20210133612A1 (en) Graph data structure for using inter-feature dependencies in machine-learning
JP7294369B2 (ja) Method, apparatus, electronic device and program used for information processing
JP2024016198A (ja) Data processing method, apparatus and medium
Veraverbeke et al. Preadjusted non-parametric estimation of a conditional distribution function
Binning et al. Sigma point filters for dynamic nonlinear regime switching models
CN113159934A (zh) Method, system, electronic device and storage medium for predicting customer flow at service outlets
US11531836B2 (en) Method, device, and medium for data processing
Azeroual et al. Quality of research information in RIS databases: A multidimensional approach
Kořenek et al. Causal network discovery by iterative conditioning: Comparison of algorithms
US20220114607A1 (en) Method, apparatus and computer readable storage medium for data processing
JP6531820B2 (ja) Estimator visualization system
Saadi et al. Investigating scalability in population synthesis: a comparative approach
WO2022204540A1 (en) Computationally efficient system and method for observational causal inferencing
CN111767290B (zh) Method and apparatus for updating user profiles
JP2022028611A (ja) Method, apparatus, device and storage medium used for information processing
Lima Batista et al. Bayesian estimation of term structure models: An application of the Hamiltonian Monte Carlo method
CN112133420A (zh) Data processing method, apparatus and computer-readable storage medium
JP7491333B2 (ja) Electronic device and computer program
Rashidghalam et al. Parametric and Non-parametric Models for Efficiency Measurement
JP2022017945A (ja) Information processing apparatus, processing pattern candidate presentation method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925752

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021563019

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925752

Country of ref document: EP

Kind code of ref document: A1