WO2011016281A2 - Information processing device and program for learning a Bayesian network structure - Google Patents

Information processing device and program for learning a Bayesian network structure

Info

Publication number
WO2011016281A2
WO2011016281A2 (application PCT/JP2010/058963, JP2010058963W)
Authority
WO
WIPO (PCT)
Prior art keywords
random variable
edge
random
mutual information
pair
Prior art date
Application number
PCT/JP2010/058963
Other languages
English (en)
Japanese (ja)
Inventor
民平 森下
真臣 植野
Original Assignee
株式会社シーエーシー (CAC Corporation)
国立大学法人 電気通信大学 (The University of Electro-Communications)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社シーエーシー (CAC Corporation) and 国立大学法人 電気通信大学 (The University of Electro-Communications)
Priority to JP2011525822A priority Critical patent/JP5555238B2/ja
Publication of WO2011016281A2 publication Critical patent/WO2011016281A2/fr

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • The present invention relates to an information processing apparatus and program for Bayesian network structure learning, and more particularly to an information processing apparatus and program that can perform Bayesian network structure learning at high speed and with stable calculation time even when there are a large number of random variables and a large amount of data.
  • A Bayesian network, which is a representation of probabilistic causal relationships, is known for its high inference accuracy.
  • Bayesian network structure learning is known to be NP-complete with respect to the number of nodes, and learning a large number of random variables from a large amount of data is a very difficult problem.
  • Bayesian network structure learning algorithms are roughly classified into scoring-based methods, which extract causal relations by aiming to maximize the agreement between the true joint probability distribution and the joint probability distribution expressed by the Bayesian network, and constraint-based methods (also called CI (Conditional Independence)-based methods), which aim at judging independence between random variables.
  • A constraint-based learning algorithm is a generic name for algorithms that perform conditional independence tests between random variables using the given data and, based on the results, determine whether edges (also called arcs) are required between random variables. When dealing with large-scale sets of random variables, it is known that inference accuracy is higher when a method that judges independence between variables is used.
  • A representative constraint-based learning algorithm is TPDA (Three Phase Dependency Analysis).
  • TPDA assumes that the data is MDF (monotone DAG faithful).
  • A random variable set U and its corresponding joint probability distribution P(U) are given as input.
  • In TPDA, three phases called the tree structure preparation (Drafting) phase, the edge increase (Thickening) phase, and the edge reduction (Thinning) phase are executed, and the structure is learned by finally orienting the edges.
  • The main input is tabular data expressed in CSV format or as relations in a relational database, together with a description of which random variables and which states (realized values) are included in the data (hereinafter referred to as "data specification description information").
  • The conditional mutual information is calculated with the nodes adjacent to the variable nodes as the initial condition set.
  • The conditional mutual information is calculated by the following equation (3): I(X, Y | C) = Σ_{x,y,c} P(x, y, c) log( P(x, y | c) / ( P(x | c) P(y | c) ) ).
  • Here the bold capital letter C represents the set of conditioning random variables, and the bold lowercase letter c represents the corresponding set of state values. If the calculated conditional mutual information is equal to or greater than the predetermined threshold ε, the random variable set C (bold) is reduced and the calculation of Equation (3) is repeated. When a condition set for which the conditional mutual information is less than ε is found, a record is registered in the global cut set and no undirected edge is added between the random variable pair. On the other hand, if no condition set whose conditional mutual information is less than ε is found before the condition set becomes empty, an undirected edge is added between the random variable pair.
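  • As a concrete illustration of this conditional independence test, the following minimal Python sketch (not one of the patent's own routines; the function names and the record layout are assumptions) estimates I(X, Y | C) from data records and compares it against the threshold ε.

```python
import math
from collections import Counter

def conditional_mutual_information(records, x, y, cond):
    """Estimate I(X, Y | C) from records (dicts mapping variable name -> state)."""
    n = len(records)
    joint, xc, yc, c_only = Counter(), Counter(), Counter(), Counter()
    for r in records:
        c = tuple(r[v] for v in cond)
        joint[(r[x], r[y], c)] += 1
        xc[(r[x], c)] += 1
        yc[(r[y], c)] += 1
        c_only[c] += 1
    mi = 0.0
    for (xv, yv, c), n_xyc in joint.items():
        # P(x,y|c) / (P(x|c) P(y|c)) rewritten with raw counts
        ratio = n_xyc * c_only[c] / (xc[(xv, c)] * yc[(yv, c)])
        mi += (n_xyc / n) * math.log2(ratio)
    return mi

def conditionally_independent(records, x, y, cond, epsilon=0.01):
    """Thickening-phase style test: X and Y are judged independent given C if I < epsilon."""
    return conditional_mutual_information(records, x, y, cond) < epsilon
```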
  • In the edge reduction (Thinning) phase, each undirected edge is temporarily deleted.
  • Then, while gradually reducing the condition set, the conditional mutual information is calculated until a condition set whose value is less than the threshold ε is found. If such a condition set is found, the record is registered in the global cut set and the undirected edge remains deleted. If no such condition set is found, the temporarily deleted undirected edge is added back and restored.
  • If either of the random variables of a pair has three or more adjacent nodes other than the counterpart of the pair, a more precise edge necessity determination is performed.
  • After the edge increase and edge reduction phases, the work of determining the direction of the undirected edges added by these phases is performed. Specifically, for every set of three nodes X, Y, and Z in which X and Y, and Z and Y, are directly connected by undirected edges but X and Z are not directly connected, the direction is determined for each edge whose direction can be determined based on whether these nodes are included in the global cut set.
  • In TPDA, the Bayesian network structure is learned as described above.
  • The TPDA described above is the fastest currently known constraint-based learning algorithm and is well suited to processing large numbers of variables and large amounts of data.
  • However, when the number of random variables becomes still larger, it becomes difficult to compute the conditional independence tests. That is, the number of variables corresponding to c (bold) in the joint probability distribution P(x, y, c (bold)) on the right side of the conditional mutual information of Equation (3) increases, so the amount of calculation grows and the computation becomes difficult.
  • Moreover, as the number of patterns of the joint probability distribution increases, more missing values are generated that do not contribute to the calculation result.
  • The inventors of the present application discovered that Bayesian network structure learning can be performed faster, and with less variation in processing time, than with the conventional TPDA described above.
  • Specifically, the inventors found that, when searching for a cut set, the conditional mutual information should be calculated in ascending order of the size of the subsets of the condition set so that the MDF assumption can be exploited as early as possible; by comparing the conditional mutual information given two conditioning variables with the conditional mutual information given each single conditioning variable, and deleting from the search, at the second level, the variables that cannot be included in the cut set, the cut set can be found in at most three steps. As a result, Bayesian network structure learning can be performed at higher speed than with conventional TPDA.
  • the present invention provides an information processing apparatus and program for performing Bayesian network structure learning based on these new algorithms.
  • the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data that includes information about a plurality of random variables and the states that each random variable takes.
  • The information processing apparatus includes means for generating a tree-structured graph for the input data, which, for each random variable pair whose mutual information is equal to or greater than a first threshold value, adds an edge between the random variable pair.
  • The information processing apparatus further includes means for adding an edge, when an edge is necessary, for each random variable pair whose mutual information is not less than the first threshold but to which no edge was added by the means for generating the tree-structured graph.
  • This means uses, as condition sets, sets of the random variables that are included in the set of nodes lying on a path between the two random variable nodes constituting the random variable pair to which no edge was added and that are adjacent to one of the two random variable nodes; if a condition set for which the conditional mutual information is less than the first threshold is found, no edge is added between the two random variables, and otherwise an edge is added.
  • In calculating the conditional mutual information, this means omits the calculation of the related component when the joint probability distribution over the states of the two random variables and the state set corresponding to each conditioning random variable is less than a second threshold value that is equal to or less than the first threshold value.
  • The means for generating the tree-structured graph may calculate the marginal probability of each random variable included in the input data, and may calculate the mutual information of each random variable pair while omitting the calculation of the related mutual information component when the marginal probability of a state taken by either random variable constituting the pair is less than the second threshold.
  • The means for adding an edge may use, as a final condition set, the set of nodes that lie on a path between the two random variable nodes constituting the random variable pair to which no edge was added and that are adjacent to one of the two random variable nodes, and may calculate the conditional mutual information about the two random variable nodes and the condition set C while increasing the size of C from 1 up to the size of the final condition set.
  • The information processing apparatus further includes means for determining, for each random variable pair having an edge after processing by the means for adding an edge, whether the edge is necessary, deleting the edge if it is unnecessary, and orienting each edge.
  • The information processing apparatus of the present invention is also an information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes.
  • The information processing apparatus determines whether an edge should be added between a certain random variable pair by obtaining conditional mutual information; at the time of the determination, for the two random variable nodes constituting the random variable pair, it calculates the conditional mutual information using, as condition sets, sets of the random variables included in the set of nodes that lie on a path between the two nodes and are adjacent to one of them, and judges whether the value is less than the first threshold value.
  • When the joint probability distribution over the states of the two random variables and the state set corresponding to each conditioning random variable is less than a second threshold value that is equal to or less than the first threshold value, the information processing apparatus omits the calculation of the related components.
  • The information processing apparatus uses, as a final condition set, the set of nodes that lie on the path between the two random variable nodes and are adjacent to either of them.
  • The conditional mutual information about the two random variable nodes and the condition set C is calculated while increasing the size of C from 1 up to the size of the final condition set.
  • The present invention is also a program that causes a computer to operate as the information processing apparatus described above.
  • In another configuration, the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes.
  • The information processing apparatus includes means for generating a tree-structured graph for the input data, which, for each random variable pair whose mutual information is equal to or greater than a first threshold value, adds an edge between the random variable pair.
  • The information processing apparatus further includes means for adding an edge, when an edge is necessary, for each random variable pair whose mutual information is not less than the first threshold but to which no edge was added by the means for generating the tree-structured graph.
  • The means for adding the edge uses, as a candidate condition set, a condition set containing the random variables included in the set of nodes that lie on a path between the two random variable nodes constituting the random variable pair to which no edge was added and that are adjacent to one of the two random variable nodes.
  • It calculates the conditional mutual information using each single random variable in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing for the random variable pair is terminated without adding an edge between the two random variables.
  • If the above processing has not terminated, the means for adding the edge calculates the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing ends without adding an edge between the two random variables.
  • If a calculated conditional mutual information value is larger than the conditional mutual information already calculated using only one of the two random variables as the condition set, that random variable is deleted from the candidate condition set;
  • likewise, if the calculated value is larger than the conditional mutual information already calculated using only the other random variable as the condition set, the other random variable is deleted from the candidate condition set.
  • If the above processing has not terminated, the means for adding the edge calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and if the calculated conditional mutual information is less than the first threshold, the processing ends without adding an edge between the two random variables.
  • The information processing apparatus further includes means for adding an edge between the two random variables when the above processing has not terminated.
  • The information processing apparatus further includes means for determining, for each random variable pair having an edge after the above processing, whether the edge is necessary, deleting the edge if it is not necessary, and orienting each edge.
  • In a further configuration, the information processing apparatus of the present invention is an information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes. The information processing apparatus determines whether an edge should be added between a certain random variable pair by obtaining conditional mutual information.
  • For the two random variable nodes constituting the random variable pair, the information processing apparatus uses, as a candidate condition set, a condition set containing the random variables included in the set of nodes that lie on a path between the two nodes and are adjacent to either of them.
  • The conditional mutual information is calculated using each single random variable in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing for the random variable pair is terminated without adding an edge between the two random variables.
  • If the processing has not terminated, the information processing apparatus calculates the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, and if at least one of the calculated values is less than the first threshold, the processing is terminated without adding an edge between the two random variables.
  • If a calculated conditional mutual information value is larger than the conditional mutual information already calculated using only one of the two random variables as the condition set, that random variable is deleted from the candidate condition set; likewise, if it is larger than the conditional mutual information already calculated using only the other random variable as the condition set, the other random variable is deleted from the candidate condition set.
  • If the processing has not terminated, the information processing apparatus calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and if the calculated value is less than the first threshold, the processing ends without adding an edge between the two random variables.
  • The information processing apparatus further includes means for adding an edge between the two random variables when the above processing does not terminate.
  • The present invention is also a program that causes a computer to operate as the information processing apparatus described above.
  • According to the present invention, Bayesian network structure learning can be performed at high speed and with stable calculation time. Therefore, according to this invention, the industrial application range of Bayesian networks can be expanded.
  • FIG. 1 is a block diagram of an information processing apparatus for performing Bayesian network structure learning according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of the information contained in the input data. FIG. 3 is a flowchart showing the processing by the information processing apparatus of the present invention.
  • FIG. 4 is a flowchart showing the process of step 310 of FIG. 3 in more detail. FIG. 5 is a diagram showing the pseudo code of Apriori.
  • A further figure illustrates an example of the genFreqItemSet1 routine. FIG. 7 is a diagram showing an example of the calcMutualInformation routine for the mutual information calculation of the present invention when Apriori is adopted as an example of the frequent item set extraction algorithm and incorporated.
  • FIG. 11 is a diagram illustrating an example of the edgeNeeded_H routine that is called in the thickening routine of FIG. 8 and is used for the processes of FIGS. 9 and 10. FIGS. 12A and 12B are diagrams showing examples of the edgeNeededBody routine called in the edgeNeeded_H routine.
  • A further flowchart shows, for the second embodiment of the present invention, the process of step 310 of FIG. 3 in more detail. FIG. 24 is a diagram showing an example of the thickening routine in the second embodiment of the present invention, and FIG. 25 is a flowchart showing details of the main processing executed in the thickening routine in the second embodiment of the present invention.
  • FIG. 26 is a diagram showing an example of the edgeNeeded_H routine that is called in the thickening routine of FIG. 24 and used for the processing of FIG. 25 in the second embodiment of the present invention.
  • A further figure illustrates an example of the SearchCutSet routine called in the edgeNeeded_H routine and the edgeNeeded routine in the second embodiment of the present invention. A flowchart shows the details of the edge reduction processing in the second embodiment of the present invention, followed by a diagram showing an example of the Thinning routine in the second embodiment of the present invention and a diagram showing an example of the edgeNeeded routine called in the Thinning routine in the second embodiment.
  • A further figure shows an example of the orientationEdge routine used for edge orientation in the second embodiment of the present invention.
  • FIG. 1 shows a block diagram of an information processing apparatus 100 for executing Bayesian network structure learning according to an embodiment of the present invention.
  • the control unit 102 is a part that controls the flow of processing of the information processing apparatus 100 as a whole. As preprocessing, the control unit 102 checks whether arguments and parameters specified at the start of structure learning are normal. When these are normal, the control unit 102 causes the data specification analysis unit 104 to analyze the data specification. Thereafter, the control unit 102 causes the structure learning unit 110 to execute the main processing of the algorithm.
  • the data specification analysis unit 104 reads the data specification description file 108 and prepares to analyze data that is the main input.
  • the main input data is, for example, data in a tabular format expressed in a CSV format, a relation in a relational database, or the like.
  • the data is input to the information processing apparatus 100 by a user and stored in the database 106 in the information processing apparatus 100, for example.
  • Alternatively, the data may be stored in a database on a communication network connected to the information processing apparatus 100 in a wired or wireless manner, and the information processing apparatus 100 may access that database and receive the data in response to receiving a command requesting Bayesian network structure learning.
  • the data store for storing data may be a file, a relational database, or a two-dimensional array on a memory. In the present embodiment, the following description will be given assuming that it is a relational database.
  • Each column consists of an "ID" and the name of a random variable, and each row consists of the corresponding "ID" and the state (realized value) of each random variable.
  • In this example, the types of coupon used by a customer are the two types T1 and T2 (which cannot be used together), and the case where no coupon is used is represented by n.
  • For each product, y represents that the customer purchased the product and n represents that the customer did not purchase it, and the table contains purchase data for six customers.
  • the data specification description file 108 is a file including information on the random variables included in the data and the state (realized value) of each random variable.
  • the data specification description file 108 is, for example, a CSV format file.
  • When the number of states of a random variable is n, each row of the file is written as: random variable name, state 1, state 2, ..., state n.
  • For example, the random variables of the customer purchase behavior history data above and their realized values are described in the data specification description file 108 as follows:
    Coupon, T1, T2, n
    A, y, n
    B, y, n
    C, y, n
    D, y, n
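  • For illustration only (the following rows are hypothetical and are not the data of FIG. 2), the corresponding main input table in CSV format might look like this:

```
ID,Coupon,A,B,C,D
1,T1,y,y,n,n
2,T2,n,n,y,y
3,n,n,n,n,y
4,T1,y,y,y,n
5,n,y,y,n,n
6,T2,y,n,y,y
```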
  • The data specification analysis unit 104 reads the data specification description file 108, holds information about the name of each random variable, the number of random variables, the state names of each random variable, the number of states of each random variable, and the total number of data records, and provides this information to the other components.
  • the structure learning unit 110 is a part that executes the Bayesian network structure learning algorithm proposed in the present application, and a specific operation thereof will be described in detail below.
  • The query management unit 112 calculates mutual information and conditional mutual information from the input data. To obtain the probability distributions required for these calculations, it must issue database queries to the database 106 that count the number of data records matching given conditions. As will be described in detail below, in one embodiment the present invention speeds up the whole process by using a frequent item set extraction algorithm to avoid less necessary calculations when computing mutual information and conditional mutual information. Furthermore, while the algorithm is being executed, the number of data records matching the same condition is referred to multiple times; since obtaining a query result is a relatively time-consuming process, it is inefficient to issue a query each time.
  • Therefore, once a query result has been acquired, the query management unit 112 passes the corresponding condition and result to the query result cache unit 114 to be held.
  • When it needs the number of matching data records, the query management unit 112 first asks the query result cache unit 114, uses the result if it has already been acquired, and otherwise issues the query and acquires the number of data records from the database 106.
  • The query result cache unit 114 holds query results in a hash table whose keys are the query search conditions and whose values are the numbers of matching data records.
  • The query result cache unit 114 has a function of answering inquiries from the query management unit 112 as to whether a query result corresponding to a search condition is already held, and a function of holding a new key/value pair.
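  • As a minimal illustration of this cache (not the patent's actual implementation; class and function names are assumptions), the search condition, a set of (variable, state) pairs, can serve as a hashable key with the matching record count as the value:

```python
class QueryResultCache:
    """Maps a search condition (frozenset of (variable, state) pairs) to a record count."""

    def __init__(self):
        self._table = {}

    @staticmethod
    def key(condition):
        # condition: dict of variable name -> state, e.g. {"Coupon": "T1", "A": "y"}
        return frozenset(condition.items())

    def has(self, condition):
        return self.key(condition) in self._table

    def get(self, condition):
        return self._table[self.key(condition)]

    def put(self, condition, count):
        self._table[self.key(condition)] = count


def count_records(records, condition, cache):
    """Return the number of records matching the condition, consulting the cache first."""
    if cache.has(condition):
        return cache.get(condition)
    count = sum(all(r.get(v) == s for v, s in condition.items()) for r in records)
    cache.put(condition, count)
    return count
```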
  • When the conditional mutual information between random variable pairs is calculated, the cut set holding unit 116 has the function of holding the global cut set, whose elements are records each consisting of a random variable pair and the variable set of the condition part for which the conditional mutual information was less than the threshold ε. The cut set is required for orienting the undirected edges.
  • the graph structure construction unit 118 is a part having a function of constructing the graph structure of the Bayesian network estimated by the structure learning unit 110.
  • The Bayesian network structure description file 120 is a file containing information on the structure of the Bayesian network estimated by the information processing apparatus 100. For example, when an estimated edge is detected and its direction is determined (a directed edge), it is expressed as "parent variable name → child variable name"; when an edge is detected but its direction cannot be determined (an undirected edge), it is expressed as "variable name 1 - variable name 2".
  • For example, if Coupon is a parent variable of A and D,
  • A is a parent variable of B,
  • and B and C are connected by an undirected edge, the output Bayesian network structure description file 120 includes the following contents: Coupon → A, Coupon → D, A → B, B - C
  • According to the present embodiment, Bayesian network structure learning can be performed at higher speed and with less variation in processing time than with conventional TPDA. This will be described in detail below.
  • FIG. 3 is a flowchart showing processing by the information processing apparatus 100 according to the present embodiment.
  • the information processing apparatus 100 starts processing (step 302).
  • The instruction includes predetermined operation parameters, such as the connection information for accessing the database 106 that stores the data on which structure learning is based and the data specification description file name.
  • The operation parameters also include the threshold ε used for the mutual information and conditional mutual information in structure learning (as an example, 0.01) and the minimum support σ used in frequent item set extraction (0 < σ ≤ ε, for example 0.0005).
  • The file name of the output Bayesian network structure description file may also be included.
  • the information processing apparatus 100 performs initial processing.
  • First, the control unit 102 checks whether the operation parameters are normal (step 304). If there is an error, the control unit 102 terminates the processing (step 320). If they are normal, it causes the data specification analysis unit 104 to analyze the data specification (step 306).
  • the data specification analysis unit 104 reads the data specification description file 108 and stores the names of the random variables, the number of random variables, the names of all the states that can be taken by the random variables, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, acquires the number of all data, and holds it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
  • the structure learning unit 110 performs a tree structure preparation process and generates a tree structure for the given data (step 310).
  • the process of step 310 is shown in more detail in FIG.
  • The frequent item set extraction algorithm used here is a general term for data mining algorithms that rapidly extract, from the item sets appearing in the data, those item sets whose support (that is, the joint probability with which the item set appears) is equal to or higher than the minimum support σ.
  • Such algorithms exploit the inverse monotonicity (anti-monotonicity) of the support of item sets, namely that if A ⊆ B for item sets A and B, then (support of A) ≥ (support of B), so that if A is not a frequent item set, no set B containing A is a frequent item set; they use this property to prune the search and extract frequent item sets efficiently.
  • As an example of a frequent item set extraction algorithm, Apriori (Agrawal, R. and Srikant, R.: Fast Algorithms for Mining Association Rules, in Proc. of the 20th Int'l Conference on Very Large Databases, pp. 487-499, Santiago, Chile (1994)) can be used. The Apriori pseudo code is shown in FIG. 5.
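  • The following is a minimal Python sketch of the Apriori idea (it is not the pseudo code of FIG. 5; the item and transaction representations here are illustrative), showing how the anti-monotonicity of support prunes candidate item sets:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets (frozensets) whose relative support is >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    levels = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

    k = 1
    while levels[-1]:
        candidates = set()
        for a, b in combinations(levels[-1], 2):
            c = a | b
            # anti-monotonicity: a (k+1)-itemset can be frequent only if every
            # k-subset of it is frequent, so all other candidates are pruned
            if len(c) == k + 1 and all(frozenset(s) in levels[-1] for s in combinations(c, k)):
                candidates.add(c)
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return set().union(*levels)

# In the present embodiment an "item" is a (random variable, state) pair.
transactions = [frozenset({("Coupon", "T1"), ("A", "y")}),
                frozenset({("Coupon", "T1"), ("A", "y"), ("B", "y")}),
                frozenset({("Coupon", "n"), ("A", "n")})]
print(apriori(transactions, min_support=0.5))
```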
  • In the present embodiment, a pair of a random variable and its value is regarded as an item, and among the components constituting the right side of Equation (2) expressing the mutual information, the components related to pairs of random variables and values whose support is less than the minimum support σ are excluded from the calculation.
  • Here, the mutual information I(X, Y) of two different random variables X and Y is expressed by the following equation: I(X, Y) = Σ_{x,y} P(x, y) log( P(x, y) / ( P(x) P(y) ) ).
  • First, the query management unit 112 calculates the marginal probability of every possible state of each random variable.
  • Each pair of a random variable and a state (represented as <X, x>, <Y, y>, and so on) whose marginal probability is greater than or equal to the minimum support σ is added to the frequent item set F1 of size 1.
  • At that time, the random variable and state are used as the search condition key, and the number of data records matching the condition is stored as the value in the query result cache unit 114.
  • An example of the genFreqItemSet1 routine that performs this procedure in the query management unit 112 is shown in the corresponding figure.
  • Next, the query management unit 112 calculates the mutual information for all random variable pairs and adds each pair of random variables whose mutual information is ε or more to the random variable pair array. At this time, the query management unit 112 uses the above-described frequent item set extraction algorithm to speed up the calculation.
  • Specifically, for each combination of states of the pair (for example, <X, x1>, <Y, y1>), the query management unit 112 determines whether both elements of the combination (here, <X, x1> and <Y, y1>) are included in the frequent item set F1, that is, whether their marginal probabilities are both equal to or greater than the minimum support σ (step 404). When at least one of them is not included in F1 (is less than the minimum support σ) ("No" in step 404), the corresponding component of the mutual information on the right side of Equation (4) is not calculated.
  • When both are included ("Yes" in step 404), it is determined whether the joint probability P(x, y) for this combination is equal to or greater than σ (step 406). If the joint probability is equal to or greater than σ ("Yes" in step 406), the mutual information component for this combination (Formula (6)) is calculated (step 408). Further, the current combination of random variables and states (for example, <X, x1>, <Y, y1>) is used as a search condition key, and the number of data records matching the condition is stored as the value in the query result cache unit 114.
  • The current combination of random variables and states is also added to the frequent item set F2 of size 2.
  • The query management unit 112 repeats steps 404 to 408 for all state combinations (step 410), and then adds up the mutual information components calculated so far to obtain the mutual information of the random variable pair currently focused on (step 412).
  • If the obtained mutual information is ε or more, the structure learning unit 110 adds the random variable pair to the random variable pair array.
  • Steps 404 to 412 are repeated for all random variable pairs, so that the mutual information is calculated for all random variable pairs (step 414).
  • Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 416). Then, in order from the random variable pair with the largest mutual information, it asks the graph structure construction unit 118 whether the graph structure remains a tree even if an edge is added between the random variable pair (step 418). When the graph structure construction unit 118 notifies the structure learning unit 110 that adding the edge would not keep a tree structure ("No" in step 418), no edge is added. On the other hand, when the graph structure construction unit 118 notifies that the tree structure would still be maintained ("Yes" in step 418), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the random variable pair of current interest and deletes the random variable pair from the random variable pair array (step 420). Steps 418 and 420 are repeated for all random variable pairs in the random variable pair array (step 422).
  • As described above, in step 310, for each random variable pair whose mutual information is ε or more, an edge is added between the pair as long as the graph structure remains a tree,
  • and a tree-structured graph is thereby generated.
  • At that time, the marginal probability of each random variable included in the input data is calculated, and when the marginal probability of a state taken by either random variable constituting a pair is less than σ, the calculation of the corresponding mutual information component is omitted; the mutual information of each random variable pair is calculated in this way.
  • FIG. 7 shows an example of a calcMutualInformation routine for calculating mutual information in this embodiment when Apriori is adopted as an example of the frequent item set extraction algorithm and incorporated.
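  • Since the genFreqItemSet1 and calcMutualInformation routines themselves are not reproduced in this text, the following Python sketch (function names and the record layout are assumptions, not the patent's code) illustrates the same idea: mutual information is computed while skipping low-support components, and edges are then added in descending order of mutual information as long as the graph stays a forest.

```python
import math
from collections import Counter

def marginals(records, var):
    """P(X = x) for every state x of the variable var."""
    n = len(records)
    counts = Counter(r[var] for r in records)
    return {x: c / n for x, c in counts.items()}

def mutual_information_with_support(records, x, y, sigma):
    """I(X, Y), omitting components whose marginal or joint probability is below sigma."""
    n = len(records)
    px, py = marginals(records, x), marginals(records, y)
    pxy = Counter((r[x], r[y]) for r in records)
    mi = 0.0
    for (xv, yv), c in pxy.items():
        # frequent-itemset-style pruning: require both marginals and the joint
        # probability to reach the minimum support sigma
        if px[xv] < sigma or py[yv] < sigma:
            continue
        p = c / n
        if p < sigma:
            continue
        mi += p * math.log2(p / (px[xv] * py[yv]))
    return mi

def drafting(records, variables, epsilon, sigma):
    """Tree structure preparation: add edges in descending MI order while the graph stays a forest."""
    pairs = []
    for i, x in enumerate(variables):
        for y in variables[i + 1:]:
            mi = mutual_information_with_support(records, x, y, sigma)
            if mi >= epsilon:
                pairs.append((mi, x, y))
    pairs.sort(reverse=True)

    parent = {v: v for v in variables}          # union-find keeps the graph acyclic
    def root(v):
        while parent[v] != v:
            v = parent[v]
        return v

    edges, remaining = [], []
    for mi, x, y in pairs:
        rx, ry = root(x), root(y)
        if rx != ry:
            parent[rx] = ry
            edges.append((x, y))
        else:
            remaining.append((x, y))            # handed over to the Thickening phase
    return edges, remaining
```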
  • Next, in step 312, the structure learning unit 110 executes the edge increase processing. For each random variable pair whose mutual information is ε or more but for which an undirected edge was not added because the graph would no longer be a tree structure (that is, each random variable pair remaining in the random variable pair array), the structure learning unit 110 uses the conditional mutual information to determine whether an edge is actually needed, and adds an edge if it is needed.
  • An example of the thickening (edge increase) routine at this time is shown in FIG.
  • FIG. 9 shows details of main processes executed in the thickening routine.
  • Specifically, for each random variable pair whose mutual information is ε or more but which does not yet have an undirected edge (that is, each random variable pair remaining in the random variable pair array), the structure learning unit 110 takes the set of nodes that exist on a path having one of the two random variable nodes (for example, X and Y) constituting the pair as a start point and the other as an end point, and that are adjacent to one of these two random variable nodes,
  • and sets it as the final condition set (ConditionSet, C (bold)) (step 902).
  • In addition, a candidate condition set C' (bold) having the same random variables as the final condition set is generated.
  • Next, the structure learning unit 110 focuses on one random variable (for example, C1) included in the final condition set (for example, {C1, C2, C3, C4, ...}), that is, one of the random variables adjacent to X or Y on a path between X and Y,
  • and has the query management unit 112 calculate the conditional mutual information for it.
  • That is, the query management unit 112 first calculates the conditional mutual information I(X, Y | C (bold)) for the case where the condition set contains only this one random variable (for example, C1) among the random variables included in the final condition set.
  • Next, the query management unit 112 calculates the conditional mutual information for the case where the condition set contains this one random variable (here, C1) together with one other random variable (one of C2, C3, C4, ...) (step 906).
  • The structure learning unit 110 determines whether the calculated conditional mutual information is less than ε (step 908). If it is less than ε ("Yes" in step 908), the pair consisting of the random variable pair ({X, Y}) and the condition set (for example, {C1, C2}) is stored in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (step 910). If it is ε or more ("No" in step 908), it is determined whether the calculated conditional mutual information is larger than the current minimum conditional mutual information (step 912).
  • If it is larger ("Yes" in step 912), the structure learning unit 110 deletes the other random variable from the candidate condition set C' (bold) (step 914). If it is smaller ("No" in step 912), the structure learning unit 110 stores the conditional mutual information at this time as the minimum conditional mutual information and stores the condition set at this time as the condition set that minimizes the conditional mutual information (step 916).
  • Next, the query management unit 112 calculates the conditional mutual information for the cases where the condition set contains only one of the random variables remaining in the candidate condition set other than the random variable of current interest (C1 in the above example) (step 918). If the calculated conditional mutual information is smaller than the current minimum conditional mutual information, the conditional mutual information at this time is stored as the minimum conditional mutual information, and the condition set at this time is stored as the condition set that minimizes the conditional mutual information (step 920).
  • Next, the query management unit 112 calculates the conditional mutual information for the cases where the condition set contains the one random variable of current interest and one of the other random variables remaining in the candidate condition set (step 922).
  • If the calculated conditional mutual information is less than ε ("Yes" in step 924), the structure learning unit 110 stores the combination of the random variable pair and the condition set at this time in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (step 910). If the calculated conditional mutual information is ε or more ("No" in step 924), it is determined whether the value is larger than the value calculated in step 918 (step 926).
  • If it is larger ("Yes" in step 926), the other random variable is deleted from the candidate condition set (step 928). If it is smaller ("No" in step 926), the conditional mutual information at this time is stored as the minimum conditional mutual information, and the condition set at this time is stored as the condition set that minimizes the conditional mutual information (step 930).
  • If no conditional mutual information less than ε has been found by the above processing, the query management unit 112 calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set (step 1004). It is then determined whether the calculated conditional mutual information is less than ε (step 1006). If it is less than ε ("Yes" in step 1006), the structure learning unit 110 stores the pair consisting of the random variable pair and the condition set at this time in the global cut set in the cut set holding unit 116 (step 1008), and it is determined that no edge is required between the random variable pair. If it is ε or more ("No" in step 1006), the structure learning unit 110 determines that an edge is necessary between the random variable pair (step 1010).
  • In the above description, the conditional mutual information is calculated while increasing the size of the condition set from 1 to 2, and when no conditional mutual information less than ε has been obtained, the conditional mutual information is calculated using all the random variables remaining in the candidate condition set as the condition set and it is determined whether this is less than ε.
  • However, the size of the condition set may be increased further to 3, 4, and so on, and the processing may be performed up to a size equal to or smaller than the size of the final condition set, as in the routine of FIG. 12B described below.
  • As described above, in step 312, for each random variable pair whose mutual information is ε or more but to which no edge was added in step 310, an edge is added if it is necessary.
  • At that time, the condition sets are sets of random variables included in the set of nodes that lie on a path between the two random variable nodes and are adjacent to either of them; if a condition set whose conditional mutual information is less than ε is found, no edge is added between the two random variables, and otherwise an edge is added.
  • More specifically, the set of nodes that lie on a path between the two random variable nodes and are adjacent to one of them is used as the final condition set, and the conditional mutual information about the two random variable nodes and the condition set
  • C (bold) is calculated while increasing the size of C (bold) from 1 up to the size of the final condition set. If a conditional mutual information less than ε is obtained in the process, no edge is added between the two random variables; an edge is added only if it is determined to be necessary.
  • Further, when the joint probability distribution over the states of the two random variables and the state set corresponding to each conditioning random variable is less than σ, the calculation of the related components is omitted.
  • FIG. 11 shows an example of the edgeNeeded_H routine that is called in the thickening routine of FIG. 8 and used in the processing of FIGS. 9 and 10,
  • and FIG. 12A and FIG. 12B show examples of the edgeNeededBody routine called in the edgeNeeded_H routine.
  • The routine in FIG. 12A is configured to calculate the conditional mutual information while increasing the size of the condition set C (bold) in order from 1 to 2.
  • The routine in FIG. 12B is configured to perform the calculation while increasing the size of the condition set from 1 up to the size of the final condition set.
  • As described above, in the present embodiment, the conditional mutual information is calculated using various sets of random variables as condition sets.
  • Specifically, the conditional mutual information is first calculated when the size of the condition set is small (size 1), and then calculated while the size of the condition set is increased. The calculation is repeated until a condition set whose conditional mutual information is less than the threshold ε is found; if such a condition set is found, it is determined that no edge is necessary between the random variable pair of interest, and if none is found, it is determined that an edge is necessary.
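  • The edgeNeeded_H and edgeNeededBody routines themselves are not reproduced in this text; the following Python sketch (all names are illustrative, and the cmi argument stands for any estimator of the conditional mutual information, such as the one sketched earlier) shows the staged search described above: condition sets of size 1, then size 2 with pruning of candidate variables following the wording of the summary, and finally all remaining candidates.

```python
def edge_needed(x, y, final_condition_set, epsilon, cmi):
    """Return (False, cut_set) if some condition set C gives I(X, Y | C) < epsilon,
    otherwise (True, None), meaning an edge between X and Y is needed.

    cmi: callable (x, y, condition_list) -> conditional mutual information.
    """
    candidates = list(final_condition_set)

    # Stage 1: condition sets of size 1
    single = {}
    for c in candidates:
        single[c] = cmi(x, y, [c])
        if single[c] < epsilon:
            return False, [c]          # register {x, y} with [c] in the global cut set

    # Stage 2: condition sets of size 2; a candidate variable is dropped when the
    # pair-conditioned value exceeds the value conditioned on that variable alone
    pruned = set()
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if a in pruned or b in pruned:
                continue
            v = cmi(x, y, [a, b])
            if v < epsilon:
                return False, [a, b]
            if v > single[a]:
                pruned.add(a)
            if v > single[b]:
                pruned.add(b)

    # Stage 3: all remaining candidates at once
    remaining = [c for c in candidates if c not in pruned]
    if remaining and cmi(x, y, remaining) < epsilon:
        return False, remaining
    return True, None
```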
  • The frequent item set extraction algorithm is also used in the calculation of the conditional mutual information; here too, a combination of a random variable and its value is used as an item.
  • The conditional mutual information I(X, Y | C (bold)) is expressed by the following equation (7): I(X, Y | C) = Σ_{x,y,c} P(x, y, c) log( P(x, y | c) / ( P(x | c) P(y | c) ) ).
  • In Equation (7), if P(x, y, c (bold)) is less than the minimum support σ, the corresponding component constituting the right side of Equation (7) is also less than σ. As already described, 0 < σ ≤ ε, so the component is less than ε at this time. Therefore, it can be seen that the component is below ε without directly calculating each component on the right side of Equation (7).
  • FIG. 13 shows the processing flow when the number of random variables included in the condition set is one.
  • First, the query management unit 112 calculates P(c (bold)) (c (bold) is the state set corresponding to the conditioning random variable), and it is determined whether this value is less than σ (step 1102). If it is less than σ ("Yes" in step 1102), the query management unit 112 does not calculate the conditional mutual information component (the right side of Equation (7)). If it is σ or more ("No" in step 1102), it is checked whether P(x | c (bold)) permits the component to be calculated (step 1104),
  • and if so (step 1106), it is likewise checked whether P(y | c (bold)) permits the component to be calculated,
  • and if so (step 1110), it is determined whether P(x, y, c (bold)) is less than σ (step 1112). If it is less than σ ("Yes" in step 1112), the conditional mutual information component is not calculated. If it is σ or more ("No" in step 1112), the query management unit 112 calculates the component of the conditional mutual information for the current combination of random variables and states (step 1114).
  • The state of one random variable (for example, X) of the current random variable pair is fixed, and steps 1106 to 1114 are repeated for all possible states of the other random variable (for example, Y) (step 1116). Further, steps 1104 to 1116 are repeated for all possible states of the fixed random variable (X) (step 1118), and the components calculated so far are then added up to obtain the conditional mutual information (step 1120).
  • FIG. 14 shows the processing flow when the number of random variables included in the condition set is two or more.
  • First, all subsets one variable smaller than the condition set are generated from the condition set (step 1402).
  • Then, the query management unit 112 inquires of the query result cache unit 114 whether all of these combinations are stored as search condition keys (step 1404). If any combination is not stored ("No" in step 1404), the following processing is not performed and the process is terminated (step 1406).
  • Otherwise, it is determined whether P(c (bold)) (c (bold) is the state value set corresponding to the conditioning random variables) is less than σ (step 1408). If it is less than σ ("Yes" in step 1408), the query management unit 112 does not calculate the conditional mutual information component. If it is σ or more ("No" in step 1408), the random variables and states at this time are used as a search condition key, and the number of data records matching the condition is stored as the value in the query result cache unit 114 (step 1410). The set of random variables and states is also added to the frequent item set F_n whose size is equal to the number n of random variables.
  • Next, it is determined whether P(x | c (bold)) is 0 (step 1412).
  • If it is not 0 ("No" in step 1414), all subsets that are one element smaller than the set consisting of the two random variables constituting the current random variable pair and the current condition set are considered (step 1416), and it is determined whether all of these subsets are included in the frequent item sets (F1, F2, ...) (step 1418). If not all are included ("No" in step 1418), the conditional mutual information component is not calculated. If they are all included ("Yes" in step 1418), it is determined whether P(x, y, c (bold)) is less than σ (step 1420). If it is less than σ ("Yes" in step 1420), the conditional mutual information component is not calculated.
  • If it is σ or more, the random variables and states at this time are used as a search condition key, and the number of data records matching the condition is stored as the value in the query result cache unit 114 (step 1422). The set of random variables and states is also added to the frequent item set F_n whose size is equal to the number n of random variables.
  • Then, the component of the conditional mutual information is calculated for the current combination of random variables and states (step 1424). Further, the state of one random variable (X) of the current random variable pair is fixed, and steps 1414 to 1424 are repeated for all possible states of the other random variable (Y) (step 1426). Steps 1412 to 1426 are repeated for all possible states of the fixed random variable (X) (step 1428). Finally, all the components of the conditional mutual information calculated so far are added up to obtain the conditional mutual information (step 1430).
  • Examples of the calcConditionalMI routine and the haveValidCandidate routine used in the processing of FIGS. 13 and 14 are shown in FIGS. 15 and 16, respectively.
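  • The calcConditionalMI and haveValidCandidate routines are not reproduced in this text; the Python sketch below (names and the pruning checks are simplifications of the flows of FIGS. 13 and 14, not the patent's code) shows the essential structure: a component of Equation (7) is computed only when none of the probabilities involved falls below the minimum support σ.

```python
import math
from itertools import product

def conditional_mi_with_support(records, x, y, cond, sigma):
    """I(X, Y | C), omitting components whose P(c), P(x,c), P(y,c) or P(x,y,c) < sigma."""
    n = len(records)

    def prob(assign):
        # empirical probability of a partial assignment {variable: state}
        return sum(all(r[v] == s for v, s in assign.items()) for r in records) / n

    states = {v: sorted({r[v] for r in records}) for v in [x, y] + list(cond)}
    mi = 0.0
    for cvals in product(*(states[c] for c in cond)):
        c_assign = dict(zip(cond, cvals))
        p_c = prob(c_assign)
        if p_c < sigma:
            continue                      # prune this condition value entirely
        for xv in states[x]:
            p_xc = prob({x: xv, **c_assign})
            if p_xc < sigma:
                continue
            for yv in states[y]:
                p_yc = prob({y: yv, **c_assign})
                p_xyc = prob({x: xv, y: yv, **c_assign})
                if p_yc < sigma or p_xyc < sigma:
                    continue              # the component is omitted, as discussed for Eq. (7)
                # P(x,y,c) * log( P(x,y|c) / (P(x|c) P(y|c)) ) rewritten with joint probabilities
                mi += p_xyc * math.log2(p_xyc * p_c / (p_xc * p_yc))
    return mi
```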
  • the information processing apparatus 100 performs edge reduction processing in step 314. Detailed processing is shown in FIG.
  • For each random variable pair having an undirected edge, the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is another path between the pair (step 1702). When a notification that there is no other path is received from the graph structure construction unit 118 ("No" in step 1702), the process proceeds to step 1702 for another random variable pair. When a notification that there is another path is received from the graph structure construction unit 118 ("Yes" in step 1702), the structure learning unit 110 instructs the graph structure construction unit 118 to temporarily delete the undirected edge between the random variable pair (step 1704). Then, the processing shown in FIGS. 9 and 10 is executed for the random variable pair, and it is determined whether an edge is necessary between the random variable pair (step 1706).
  • When an edge is necessary ("Yes" in step 1706), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added again between the random variable pair (step 1708). When the edge is not necessary ("No" in step 1706), the structure learning unit 110 gives no such instruction, and the undirected edge remains deleted. Steps 1702 to 1708 are repeated for all random variable pairs that had undirected edges when the edge increase processing 312 was completed (step 1710).
  • Next, for each random variable pair that has an undirected edge at this point, the structure learning unit 110 inquires of the graph structure construction unit 118 whether at least one of the random variable nodes constituting the pair has three or more adjacent nodes other than the other node of the pair (step 1712).
  • When it receives from the graph structure construction unit 118 a notification that neither of the random variable nodes constituting the pair has three or more adjacent nodes other than the pair counterpart ("No" in step 1712), the structure learning unit 110 proceeds to step 1712 for another random variable pair.
  • When it receives from the graph structure construction unit 118 a notification that at least one random variable node constituting the pair has three or more adjacent nodes other than the other node ("Yes" in step 1712), the structure learning unit 110 instructs the graph structure construction unit 118 to temporarily delete the undirected edge between the random variable pair (step 1714). Then, a set consisting of the nodes adjacent to either of the two random variable nodes on the paths between the two random variable nodes constituting the random variable pair, together with the nodes further adjacent to those adjacent nodes, is set as the final condition set (step 1716). Then, the processing from step 904 onward in FIG. 9 and the processing in FIG. 10 are executed for the random variable pair and this final condition set, and it is determined whether an edge is necessary between the random variable pair (step 1718).
  • When an edge is necessary ("Yes" in step 1718), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added again between the random variable pair (step 1720). If the edge is not necessary ("No" in step 1718), the structure learning unit 110 gives no such instruction, and the undirected edge remains deleted.
  • After the processing of steps 1702 to 1710 is completed, the processing of steps 1712 to 1720 is repeated for all random variable pairs having undirected edges (step 1722).
  • As described above, in step 314, for each random variable pair having an edge after the processing of step 312, it is determined whether the edge is necessary, and the edge is deleted if it is unnecessary. Specifically, for a random variable pair having an undirected edge, if there is a path other than the undirected edge, the undirected edge is temporarily deleted, it is determined in the same manner as in step 312 whether an edge is necessary between the random variable pair, and the temporarily deleted undirected edge is added back if it is necessary. Furthermore, when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node, the undirected edge is temporarily deleted,
  • a set consisting of the nodes adjacent to either of the two random variable nodes on the paths between the two random variable nodes, together with the nodes further adjacent to those adjacent nodes, is used as the final condition set,
  • and the temporarily deleted undirected edge is added back if it is determined to be necessary, as in the case of step 312.
  • An example of the Thinning (edge reduction) routine is shown in FIG. 18, and an example of the edgeNeeded routine called in that routine is shown in the following figure.
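  • The Thinning and edgeNeeded routines themselves are not reproduced in this text; the following Python sketch (the graph helpers passed as arguments are assumptions standing in for the corresponding units of FIG. 1) illustrates only the first part of the edge reduction step: every undirected edge whose endpoints are also connected by another path is temporarily removed and restored only if the edge necessity test still requires it.

```python
def thinning(graph, edge_needed, has_other_path, condition_set_for):
    """Edge reduction sketch.

    graph:             object with edges(), remove_edge(x, y), add_edge(x, y)
    edge_needed:       callable (x, y, condition_set) -> bool (the test of FIGS. 9 and 10)
    has_other_path:    callable (x, y) -> bool, ignoring the edge between x and y
    condition_set_for: callable (x, y) -> final condition set for the pair
    """
    for x, y in list(graph.edges()):
        if not has_other_path(x, y):
            continue
        graph.remove_edge(x, y)                    # temporary deletion
        if edge_needed(x, y, condition_set_for(x, y)):
            graph.add_edge(x, y)                   # the edge is still required: restore it
    # A second, more precise pass over pairs whose endpoints have three or more
    # other neighbours (steps 1712 to 1722) is omitted from this sketch.
```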
  • Next, the information processing apparatus 100 executes orientation processing for the undirected edges included in the graph structure generated in the graph structure construction unit 118 through the operations up to the edge reduction processing 314 (step 316). The detailed processing is shown in the corresponding flowchart.
  • First, for each set of three random variables X, Y, and Z such that X and Y are adjacent and Y and Z are adjacent but X and Z are not adjacent, the structure learning unit 110 inquires of the cut set holding unit 116 whether 1) an element having {X, Z} as its record (for example, <{X, Z}, C>) is in the global cut set and 2) Y is not included in the condition set C of that element (step 2002).
  • If so ("Yes" in step 2002), the structure learning unit 110 instructs the graph structure construction unit 118 to orient the undirected edges so that X and Z become parents of Y (step 2004). If not ("No" in step 2002), the processing of step 2002 is executed for another set of three random variables satisfying the above relationship.
  • Next, the structure learning unit 110 focuses on sets of three random variables (here again denoted X, Y, and Z) included in the vertex set.
  • the structure learning unit 110 inquires of the graph structure construction unit 118, 1) X is a parent of Y, 2) Y and Z are adjacent, and 3) X and Z are not adjacent. And 4) An inquiry is made as to whether all the conditions that the edge between Y and Z is an undirected edge are satisfied (step 2008). If the condition is satisfied (“Yes” in step 2008), the structure learning unit 110 instructs the graph structure construction unit 118 to direct the edge so that Y becomes the parent of Z (step 2010). If the condition is not satisfied (“No” in step 2008), the process in step 2008 is executed for another set of three random variables included in the vertex set. Steps 2008 and 2010 are performed for all of the three random variable sets included in the vertex set (step 2012).
  • Next, for each random variable pair X, Y that still has an undirected edge at this point, the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is a valid path between the random variable pair (step 2014). If such a path exists ("Yes" in step 2014), the structure learning unit 110 instructs the graph structure construction unit 118 to orient the edge so that X becomes the parent of Y (step 2016). Steps 2014 and 2016 are performed for all undirected edges (step 2018).
  • An example of the orientEdge routine used for edge orientation is shown in the corresponding figure.
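  • The orientEdge routine itself is not reproduced in this text; the Python sketch below (illustrative names; it implements only the two orientation rules described above, not the full routine) orients colliders using the global cut set and then propagates directions along the pattern of steps 2008 and 2010.

```python
def orient_edges(undirected, cut_set):
    """Orient the undirected edges learned by the preceding phases.

    undirected: set of frozensets {X, Y} (undirected edges)
    cut_set:    dict mapping frozenset({X, Z}) -> iterable of cut-set variables
    Returns a set of directed edges (parent, child); edges not returned stay undirected.
    """
    nodes = {v for e in undirected for v in e}
    adj = {v: {w for e in undirected if v in e for w in e if w != v} for v in nodes}
    directed = set()

    # Steps 2002/2004: for X - Y - Z with X and Z not adjacent, orient X -> Y <- Z
    # when {X, Z} has a record in the global cut set and Y is not in its condition set.
    for y in nodes:
        for x in adj[y]:
            for z in adj[y]:
                if x >= z or z in adj[x]:
                    continue
                key = frozenset({x, z})
                if key in cut_set and y not in cut_set[key]:
                    directed.add((x, y))
                    directed.add((z, y))

    # Steps 2008/2010: if X -> Y, Y and Z are adjacent, X and Z are not adjacent,
    # and the edge between Y and Z is still undirected, orient Y -> Z.
    changed = True
    while changed:
        changed = False
        for (x, y) in list(directed):
            for z in adj[y]:
                if z == x or z in adj[x]:
                    continue
                if (y, z) not in directed and (z, y) not in directed:
                    directed.add((y, z))
                    changed = True
    return directed
```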
  • the graph structure held by the graph structure construction unit 118 after the execution of step 316 in FIG. 3 represents a Bayesian network structure learned through a series of processes of the present embodiment.
  • the information processing apparatus 100 outputs this as the Bayesian network structure description file 120 (step 318) and completes the process (step 320).
  • The information processing apparatus 100 described here includes the plurality of components shown in FIG. 1, but such a configuration is merely an example. That is, a plurality of the functions of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116, and the graph structure construction unit 118 may be configured to execute on a single component. Also, all of these functions may be performed on a single component (for example, a computer processor).
  • the information processing apparatus 100 may include a database 106.
  • The present invention has been described above as being implemented as the information processing apparatus 100.
  • However, the present invention can also be realized as a program that causes a computer to operate as the components shown in FIG. 1.
  • Likewise, the present invention can be realized as a program that causes a computer to execute the steps illustrated in FIG. 3.
  • This embodiment is characterized in that Bayesian network structure learning is performed using a new algorithm in which a frequent item set extraction algorithm is combined with a constraint-based learning algorithm.
  • In the above, the technical idea of the present invention has been described using a specific constraint-based learning algorithm and a specific frequent item set extraction algorithm for concreteness of explanation.
  • However, the technical idea of the present invention can also be realized using other constraint-based learning algorithms and frequent item set extraction algorithms.
  • Note that the information processing apparatus and program according to the present embodiment do not simply use the above algorithms in combination. In other words, they do not operate as a simple cascade in which the output obtained by the frequent item set extraction algorithm, namely a joint probability distribution extracted by that algorithm, is merely passed as input to the constraint-based learning algorithm.
  • Rather, the present invention uses a unique algorithm in which a frequent item set extraction algorithm is incorporated inside a constraint-based learning algorithm, and it operates on the premise of that incorporation.
  • In a constraint-based learning algorithm, the number of random variables given as the condition of a conditional mutual information does not always increase monotonically as the algorithm progresses.
  • A frequent item set extraction algorithm, on the other hand, assumes that the number of items it handles (in the present invention, a set of random variables and their values) increases monotonically.
  • The present invention therefore realizes an effective incorporation of the frequent item set extraction algorithm by changing the constraint-based learning algorithm so that the number of random variables to be handled keeps increasing monotonically, at least locally.
  • The Bayesian network used as the basis of the experiment was the 37-node, 46-edge network called Alarm from the Bayesian network repository (http://compbio.cs.huji.ac.il/Repository), which is frequently used as an example of Bayesian network learning.
  • The Bayesian network used for the experiment is shown in FIG. For this network, 5000, 15000, 30000, and 50000 data records were generated, and Bayesian network structure learning was performed on each data set, both by the conventional TPDA algorithm and by operating a computer as the information processing apparatus of the present embodiment.
  • Tables 1 to 4 show the experimental results representing the average values and the like when five data sets are executed for each of the above data numbers. Missing Edge and Extra Edge indicate the number of edges lost in the estimated Bayesian network and the number of extra edges added, respectively, when compared to the correct Bayesian network.
  • In addition, for each algorithm, the tables show the execution time normalized so that the execution time of the conventional TPDA algorithm is 1, the execution time normalized so that the execution time with 5000 data items is 1, and the standard deviation of the execution time.
  • From the tables it can be seen that, compared with the conventional TPDA algorithm, Bayesian network structure learning can be greatly speeded up and the variation in execution time can be greatly reduced. It can also be seen that the error from the correct network, indicated by Missing Edge and Extra Edge, is kept at a level comparable to that of the prior art. As described above, the present embodiment has the excellent effect of speeding up structure learning and stabilizing the execution time without sacrificing the accuracy of the estimated Bayesian network compared with the conventional technique.
  • The configuration of the information processing apparatus 100 of the present embodiment is as shown in FIG. 1. Further, the basic processing by the information processing apparatus 100 in this embodiment is as shown in FIG. 3.
  • the information processing apparatus 100 starts processing (step 302).
  • the instruction is configured to include connection information and a data specification description file name for accessing the database 106 that stores data serving as a basis for Bayesian network structure learning.
  • The operation parameters include a threshold value ε (for example, 0.01) for the mutual information and the conditional mutual information used in Bayesian network structure learning.
  • the file name of the output Bayesian network structure description file may be included.
  • the information processing apparatus 100 performs initial processing.
  • the control unit 102 checks whether or not the operation parameter is normal (step 304). If there is an error, the control unit 102 terminates the processing (step 320). If normal, the data specification is sent to the data specification analysis unit 104. Analysis is performed (step 306).
  • the data specification analysis unit 104 reads the data specification description file 108 and holds the names of the random variables, the number of random variables, the names of all the states that can be taken by the random variables, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, acquires the number of all data, and holds it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
  • the structure learning unit 110 performs a tree structure preparation process and generates a tree structure for the given data (step 310).
  • the process of step 310 is shown in more detail in FIG.
  • the query management unit 112 calculates mutual information for all random variable pairs (step 2302).
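  • As a purely illustrative aid (not part of the embodiment), the following Python sketch shows one way the mutual information of step 2302 could be estimated from empirical counts over a table of discrete data; the function names and the use of a pandas DataFrame are assumptions made only for this example.

    import numpy as np
    import pandas as pd

    def mutual_information(df: pd.DataFrame, x: str, y: str) -> float:
        """Estimate I(X; Y) from the empirical joint distribution of two discrete columns."""
        n = len(df)
        joint = df.groupby([x, y]).size() / n      # P(x, y)
        px = df[x].value_counts() / n              # P(x)
        py = df[y].value_counts() / n              # P(y)
        return float(sum(pxy * np.log2(pxy / (px[xv] * py[yv]))
                         for (xv, yv), pxy in joint.items()))

    def all_pair_mutual_information(df: pd.DataFrame) -> dict:
        """Mutual information for every unordered pair of random variables (step 2302)."""
        cols = list(df.columns)
        return {(a, b): mutual_information(df, a, b)
                for i, a in enumerate(cols) for b in cols[i + 1:]}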
  • If the mutual information of a pair is equal to or greater than ε, the structure learning unit 110 adds the random variable pair to a random variable pair array held in a storage unit (not shown) in the information processing apparatus 100 or the like.
  • Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 2304). Then, in order from the random variable pair with the largest mutual information, the graph structure construction unit 118 is asked whether the graph structure would remain a tree structure if an edge were added between the pair (step 2306). When the graph structure construction unit 118 notifies the structure learning unit 110 that a tree structure would not be maintained if the edge were added ("No" in step 2306), no edge is added and the pair remains in the random variable pair array. On the other hand, when the graph structure construction unit 118 notifies that the tree structure is maintained even if the edge is added ("Yes" in step 2306), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the random variable pair currently under consideration, and deletes the pair from the random variable pair array (step 2308). Steps 2306 and 2308 are repeated for all random variable pairs in the random variable pair array (step 2310).
  • In this way, edges are added only when the graph structure remains a tree structure, so that a tree-structured graph is generated.
  • the generated graph structure may be stored in the graph structure construction unit 118, a storage unit (not shown) in the information processing apparatus 100, the structure learning unit 110, or the like.
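  • Continuing the illustration, a minimal sketch of the tree structure preparation of steps 2304 to 2310 might look as follows; the union-find helper is an assumed stand-in for the graph structure construction unit's check that adding an edge keeps a tree structure, and pair_mi is the dictionary returned by all_pair_mutual_information above.

    def draft_tree(pair_mi: dict, epsilon: float = 0.01):
        """Add edges in descending mutual-information order while the graph stays a tree (steps 2304-2310)."""
        pairs = sorted((p for p, mi in pair_mi.items() if mi >= epsilon),
                       key=lambda p: pair_mi[p], reverse=True)
        parent = {}

        def find(v):                        # union-find root: stands in for the graph structure
            parent.setdefault(v, v)         # construction unit's "does this stay a tree?" test
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        edges, remaining = [], []
        for x, y in pairs:
            rx, ry = find(x), find(y)
            if rx != ry:                    # adding the edge keeps the graph acyclic
                parent[rx] = ry
                edges.append((x, y))
            else:                           # edge would create a cycle: keep the pair for thickening
                remaining.append((x, y))
        return edges, remaining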
  • Next, the structure learning unit 110 executes the edge increasing process in step 312. For each random variable pair whose mutual information is equal to or greater than ε but to which no undirected edge was added in the tree structure preparation process because adding the edge would not keep a tree structure (that is, each random variable pair remaining in the random variable pair array), the structure learning unit 110 determines whether an edge is actually needed by using conditional mutual information, and adds an undirected edge between the pair if it is determined to be necessary.
  • An example of the thickening (edge increase) routine at this time is shown in FIG.
  • FIG. 25 shows details of main processes executed in the thickening routine in this embodiment.
  • In the edge increasing process, for each random variable pair (for example, X and Y) whose mutual information is ε or more but which does not yet have an undirected edge (that is, each pair remaining in the random variable pair array), the structure learning unit 110 sets, as the final condition set (ConditionSet, expressed as Z in bold in this embodiment), the set of nodes that lie on a path having one node of the pair as a start point and the other node as an end point and that are adjacent to one of these two random variable nodes (step 2402).
  • In addition, a candidate condition set Z c (bold) containing the same random variables as the final condition set is generated.
  • Next, the query management unit 112 calculates the conditional mutual information between X and Y using, as the condition set, only one random variable (for example, Z 1 ) of the random variables included in the final condition set Z (bold) (for example, {Z 1 , Z 2 , Z 3 , Z 4 , ...}), that is, one of the random variables adjacent to X or Y on a path between X and Y (step 2404).
  • The calculated conditional mutual information may be stored in the query management unit 112 or in a storage unit (not shown) of the information processing apparatus 100, in association with the condition set used in the calculation (here including only Z 1 ).
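  • For illustration only, the conditional mutual information of step 2404 could be estimated in the same empirical-count style as the mutual_information sketch above; the function below and its signature are assumptions made for the example, not the embodiment's query management unit.

    import numpy as np
    import pandas as pd

    def conditional_mutual_information(df: pd.DataFrame, x: str, y: str, cond: list) -> float:
        """Estimate I(X; Y | Z) for a discrete condition set Z (the columns in `cond`)."""
        n = len(df)
        cmi = 0.0
        for _, part in df.groupby(cond):                        # one block per configuration z
            pz = len(part) / n                                   # P(z)
            joint = part.groupby([x, y]).size() / len(part)      # P(x, y | z)
            px = part[x].value_counts() / len(part)              # P(x | z)
            py = part[y].value_counts() / len(part)              # P(y | z)
            cmi += pz * sum(pxy * np.log2(pxy / (px[xv] * py[yv]))
                            for (xv, yv), pxy in joint.items())
        return float(cmi)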
  • Next, the structure learning unit 110 determines whether or not the conditional mutual information calculated in step 2404 is less than ε (step 2406). If it is less than ε ("Yes" in step 2406), the pair consisting of the random variable pair ({X, Y}) and the condition set at this time (for example, {Z 1 }) is stored in the global cut set in the cut set holding unit 116. Then, it is determined that no edge is required between the random variable pair (step 2408).
  • On the other hand, if the conditional mutual information calculated in step 2404 using one random variable as the condition set is equal to or greater than ε ("No" in step 2406), the process returns to step 2404, and the conditional mutual information is calculated with only another random variable (for example, Z 2 ) included in the condition set. Thereafter, steps 2404 and 2406 are repeated in the same manner; that is, the conditional mutual information is calculated in turn with each single random variable included in the final condition set Z (bold) as the condition set. Each calculated conditional mutual information may be stored in the query management unit 112 or in a storage unit (not shown) of the information processing apparatus 100, in association with the condition set used for the calculation. If a random variable whose conditional mutual information is less than ε is found in this process ("Yes" in step 2406), the process proceeds to step 2408.
  • If all of the conditional mutual information amounts calculated by the execution of steps 2404 and 2406 are greater than or equal to ε, the process proceeds to step 2410.
  • Next, the query management unit 112 calculates the conditional mutual information between X and Y using, as the condition set, two random variables (for example, Z 1 and Z 2 ) of the random variables included in the final condition set Z (bold) (step 2410). The calculated conditional mutual information may be stored in the query management unit 112 or in a storage unit (not shown) of the information processing apparatus 100, in association with the condition set used in the calculation (here, {Z 1 , Z 2 }).
  • Next, the structure learning unit 110 determines whether the conditional mutual information calculated in step 2410 is less than ε (step 2412). When it is less than ε ("Yes" in step 2412), the pair consisting of the random variable pair ({X, Y}) and the condition set at this time (for example, {Z 1 , Z 2 }) is stored in the global cut set in the cut set holding unit 116. Then, it is determined that no edge is required between the random variable pair (here, X and Y) (step 2408).
  • Otherwise, the structure learning unit 110 determines whether the conditional mutual information calculated in step 2410 is larger than the conditional mutual information already calculated in step 2404 using, as the condition set, one of the two random variables included in the condition set of step 2410 (for example, Z 1 ) (step 2414). If it is larger ("Yes" in step 2414), the structure learning unit 110 deletes that random variable (here, Z 1 ) from the candidate condition set Z c (bold) stored in the storage unit (not shown) or the like. The structure learning unit 110 performs the same processing for the other of the two random variables included in the condition set used in the calculation of step 2410 (for example, Z 2 ) (steps 2418 and 2420).
  • The structure learning unit 110 repeats steps 2410 to 2420 for all random variables remaining in the candidate condition set Z c (bold) (step 2422). For example, when the initial candidate condition set Z c (bold) includes six random variables Z 1 to Z 6 and Z 1 and Z 2 have been deleted from the candidate condition set Z c (bold) by the above processing, the processes of steps 2410 to 2420 are executed again for the remaining Z 3 to Z 6 .
  • the query management unit 112 calculates a conditional mutual information amount using all the random variables remaining in the candidate condition set Z c (bold) as a result of the above-described processing (step 2424).
  • the structure learning unit 110 determines whether the conditional mutual information calculated in step 2424 is less than ⁇ (step 2426). When this is less than ⁇ (“Yes” in step 2426), the pair of the random variable pair ( ⁇ X, Y ⁇ ) and the candidate condition set at this time is stored in the global cut set in the cut set holding unit 116. . Then, it is determined that no edge is required between the random variable pair (here, X and Y) (step 2408).
  • If the conditional mutual information calculated in step 2424 is equal to or greater than ε ("No" in step 2426), the structure learning unit 110 determines that an edge is necessary between the random variable pair and instructs the graph structure construction unit 118 to that effect (step 2428). In this way, an edge is added between a random variable pair when it is determined that an edge is necessary.
  • the generated graph structure may be stored in the graph structure construction unit 118, a storage unit (not shown) in the information processing apparatus 100, the structure learning unit 110, or the like.
  • As described above, in this embodiment, the determination of whether an edge is needed requires at most three stages: processing in which the size of the condition variable set is 1 (first stage, steps 2404, 2406 and 2408), processing in which the size of the condition variable set is 2 (second stage, steps 2410 to 2422 and 2408), and processing for all the random variables remaining in the candidate condition set after the first and second stages (third stage, steps 2424 to 2428 and 2408).
  • In contrast, the conventional TPDA searches for a cut set starting from the larger subset sizes of a given condition variable set (that is, in descending order), and as a result requires a search of up to N - 2 stages (where N is the number of random variables). Therefore, in Bayesian network structure learning in a situation with a large number of random variables, this embodiment can process significantly faster than the conventional TPDA.
  • FIG. 26 shows an example of the edgeNeeded_H routine called in the thickening routine of FIG. 24 and used for the processing of FIG. 25, and
  • FIG. 27 shows an example of the SearchCutSet routine called in the edgeNeeded_H routine. Details of the routine of FIG. 27 have already been described in relation to FIG.
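  • The following Python sketch is one possible way to organize the three-stage determination of steps 2402 to 2428 described above; it reuses the conditional_mutual_information sketch given earlier, represents the global cut set as a plain dictionary, and simplifies the iteration over candidate pairs, so it should be read as an illustration of the idea rather than as the edgeNeeded_H routine itself.

    def edge_needed_three_stage(df, x, y, condition_set, cut_sets, epsilon=0.01):
        """Three-stage cut set search (steps 2402-2428); returns True if the edge X - Y is needed."""
        # first stage: condition sets of size 1 (steps 2404-2408)
        single_cmi = {}
        for z in condition_set:
            cmi = conditional_mutual_information(df, x, y, [z])
            if cmi < epsilon:
                cut_sets[frozenset({x, y})] = {z}              # record in the global cut set
                return False                                   # X and Y are separated: no edge
            single_cmi[z] = cmi

        # second stage: condition sets of size 2, shrinking the candidate condition set (steps 2410-2422)
        candidates = list(condition_set)
        for z1, z2 in [(a, b) for i, a in enumerate(list(candidates)) for b in list(candidates)[i + 1:]]:
            if z1 not in candidates or z2 not in candidates:
                continue                                       # already removed from the candidate set
            cmi = conditional_mutual_information(df, x, y, [z1, z2])
            if cmi < epsilon:
                cut_sets[frozenset({x, y})] = {z1, z2}
                return False
            for z in (z1, z2):                                 # drop a variable when conditioning on the
                if cmi > single_cmi[z] and z in candidates:    # pair only increased the dependence
                    candidates.remove(z)

        # third stage: all variables left in the candidate condition set (steps 2424-2428)
        if candidates:
            cmi = conditional_mutual_information(df, x, y, candidates)
            if cmi < epsilon:
                cut_sets[frozenset({x, y})] = set(candidates)
                return False
        return True                                            # no cut set found: the edge is needed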
  • the information processing apparatus 100 performs edge reduction processing in step 314. Detailed processing is shown in FIG.
  • the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is another path between the random variable pairs for each random variable pair having an undirected edge (step 2702). When a notification that there is no other path is received from the graph structure construction unit 118 (“No” in step 2702), the processing proceeds to step 2702 for another random variable pair. When a notification that there is another path is received from the graph structure construction unit 118 (“Yes” in Step 2702), the structure learning unit 110 temporarily deletes the undirected edge between the random variable pairs. The construction unit 118 is instructed (step 2704). Then, the process shown in FIG. 25 is executed for the random variable pair, and it is determined whether or not an edge is necessary between the random variable pairs (step 2706).
  • the processing in step 2706 can be executed in only three stages at the maximum, so that the processing speed can also be increased here as compared with the conventional TPDA. If an edge is necessary (“Yes” in step 2706), the structure learning unit 110 instructs the graph structure construction unit 118 to add the undirected edge temporarily deleted again between the random variable pairs (step). 2708). When the edge is unnecessary (“No” in Step 2706), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted. Steps 2702 to 2708 are repeated for all random variable pairs having undirected edges when the edge increasing process 312 is completed (step 2710).
  • Next, for each random variable pair that has an undirected edge at this point, the structure learning unit 110 inquires of the graph structure construction unit 118 whether at least one of the random variable nodes constituting the pair has three or more adjacent nodes other than the other node of the pair (step 2712).
  • Upon receiving notification from the graph structure construction unit 118 that neither of the random variable nodes constituting the pair has three or more adjacent nodes other than the other node of the pair ("No" in step 2712), the structure learning unit 110 proceeds to step 2712 for another random variable pair.
  • When the notification that at least one random variable node constituting the pair has three or more adjacent nodes other than the other node is received from the graph structure construction unit 118 ("Yes" in step 2712), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the random variable pair (step 2714). Then, a set including the nodes adjacent to one of the two random variable nodes on the paths between the two random variable nodes constituting the pair, and the nodes further adjacent to those adjacent nodes, is set as the final condition set (step 2716). Then, the processing from step 2404 onward in FIG. 25 is executed for the random variable pair and this final condition set, and it is determined whether an edge is necessary between the random variable pair (step 2718).
  • If an edge is necessary ("Yes" in step 2718), the structure learning unit 110 instructs the graph structure construction unit 118 to add again the temporarily deleted undirected edge between the random variable pair (step 2720). If the edge is not necessary ("No" in step 2718), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted.
  • As in steps 2702 to 2710, the processing of steps 2712 to 2720 is repeated for all random variable pairs having undirected edges (step 2722).
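  • A compact sketch of the first half of the edge reduction processing (steps 2702 to 2710) is shown below; has_other_path and neighbors_on_paths are assumed helper functions standing in for the inquiries to the graph structure construction unit (the former checking for a path between the pair other than the edge itself), and edge_needed_three_stage is the sketch given earlier.

    def thin_edges(df, edges, has_other_path, neighbors_on_paths, cut_sets, epsilon=0.01):
        """Sketch of steps 2702-2710: re-test every undirected edge that has an alternative path."""
        for (x, y) in list(edges):
            if not has_other_path(x, y):              # no path besides the edge itself: keep the edge
                continue
            edges.remove((x, y))                      # temporarily delete the undirected edge
            condition_set = neighbors_on_paths(x, y)  # nodes adjacent to X or Y on paths between them
            if edge_needed_three_stage(df, x, y, condition_set, cut_sets, epsilon):
                edges.append((x, y))                  # the edge is still needed: add it back
        return edges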
  • As described above, in step 314, it is determined whether or not an edge is really necessary for each random variable pair that has an edge after the processing in step 312. At that time, if a random variable pair having an undirected edge is connected by a path other than that undirected edge, the undirected edge is temporarily deleted, whether an edge is needed for the random variable pair is determined in the same manner as in step 312, and the temporarily deleted undirected edge is added again only if it is determined to be necessary. Furthermore, when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node, the undirected edge is temporarily deleted, a set including the nodes adjacent to one of the two random variable nodes on the paths between the two random variable nodes and the nodes further adjacent to those adjacent nodes is set as the final condition set, and the temporarily deleted undirected edge is added again when it is determined to be necessary, as in step 312.
  • FIG. 29 shows an example of a thinning (edge reduction) routine in this embodiment.
  • FIG. 30 shows an example of the edgeNeeded routine that is called in this routine in this embodiment.
  • the information processing apparatus 100 executes an orientation process for the undirected edge included in the graph structure generated by the graph structure construction unit 118 through the operations up to the edge reduction process 314 (step 316).
  • FIG. 31 shows an example of a routine used for edge orientation.
  • The graph structure held by the graph structure construction unit 118 after the execution of step 316 in FIG. 3 represents the Bayesian network structure learned through the series of processes of this embodiment.
  • the information processing apparatus 100 outputs this as the Bayesian network structure description file 120 (step 318) and completes the process (step 320).
  • The information processing apparatus 100 described in connection with the present embodiment includes the plurality of components shown in FIG. 1, but such a configuration is merely an example. That is, in the information processing apparatus according to the present embodiment, a plurality of the functions of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116, and the graph structure construction unit 118 may be configured to execute on a single component. Also, all of these functions may be performed on a single component (for example, a computer processor).
  • the information processing apparatus 100 may include a database 106.
  • the maximum amount of calculation in the cutting set search test using the method of this embodiment will be considered.
  • In the first stage, the size of the condition variable set is 1, so the number of variables handled in the CI test (the determination of whether the conditional mutual information is less than ε) is 3, including X and Y. Therefore, the number of CI test patterns in the first stage is r^3, where r is the maximum number of states a variable can take.
  • In the second stage, the size of the condition variable set is 2, so the number of patterns becomes r^4.
  • In the third stage, corresponding to steps 2424 to 2428 in FIG. 25, up to all of the remaining variables may be included in the condition set, so the number of patterns is r^N.
  • Multiplying the number of CI tests at each stage by the number of patterns and taking the sum, the maximum calculation amount of this embodiment is O(r^N + N^2 r^4).
  • On the other hand, the number of CI test patterns of TPDA is r^N, r^(N-1), ..., r^3 in the first, second, and subsequent stages, so the maximum calculation amount of TPDA is likewise obtained by multiplying the number of CI tests at each stage by the number of patterns and taking the sum.
  • The difference between the two methods can be analyzed by removing from the respective calculation amounts the term r^N that is common to both. Excluding this common term, the remaining calculation amount of TPDA is O(r^(N-1)), which is of exponential order in N, whereas that of this embodiment is O(N^2 r^4), which is of polynomial order in both N and r. Therefore, according to the present embodiment, the maximum calculation amount in the cut set search test can be reduced compared with the case of TPDA.
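  • As a purely illustrative calculation (the figures below are not from the embodiment), taking for example N = 37 and r = 4, roughly the scale of the Alarm network used in the experiments:

    r^{N-1} = 4^{36} \approx 4.7 \times 10^{21}, \qquad N^{2} r^{4} = 37^{2} \cdot 4^{4} \approx 3.5 \times 10^{5}

  • so the polynomial-order amount is smaller than the exponential-order amount by many orders of magnitude.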
  • Next, consider the minimum calculation amount. According to this embodiment, the minimum calculation amount of the cut set search test is O(r^3). This is because, if a cut set is found in the first CI test of the first stage, the processing of the second and third stages is unnecessary, and only the r^3 patterns of that single CI test need to be calculated.
  • In TPDA, on the other hand, when searching for a cut set of variables X and Y from a condition variable set Z (bold), subsets Z (bold)' ⊆ Z (bold) of the condition variable set are examined in descending order of subset size, and a CI test is performed on each to check whether the target subset is a cut set. Since the calculation amount of a CI test is of exponential order in the size of the given variable set, where r is the maximum number of states taken by a variable, even the first CI test in TPDA is performed on the largest subset, so the minimum calculation amount is of exponential order in the size of the given condition variable set.
  • the method according to the present embodiment has a remarkable effect that both the maximum calculation amount and the minimum calculation amount can be reduced as compared with TPDA.
  • the present invention has been described as being implemented as the information processing apparatus 100. However, it will be apparent to those skilled in the art that the present invention can be implemented as a program that causes a computer to operate as some or all of the components shown in FIG. It will also be apparent to those skilled in the art that the present invention can be implemented as a program that causes a computer to execute some or all of the steps described in FIG.
  • The Bayesian network shown in FIG. 22 was used as the basis of the experiment. For this network, 5000, 15000, 30000, and 50000 data records were generated, and Bayesian network structure learning was performed on each data set, both by the conventional TPDA algorithm and by operating a computer as the information processing apparatus of the present embodiment.
  • Tables 1 to 4 show the experimental results showing the average values when 10 data sets are executed for each of the above data numbers.
  • the result of using the conventional TPDA is shown in the “TPDA” line, and the result of this example is shown in the “TS (Three-Staged) -TPDA” line.
  • Missing Edge and Extra Edge indicate the number of edges lost in the estimated Bayesian network and the number of extra edges added, respectively, when compared to the correct Bayesian network.
  • From the tables it can be seen that, compared with the conventional TPDA algorithm, Bayesian network structure learning can be significantly speeded up and the variation in execution time can be greatly reduced. It can also be seen that the error from the correct network, indicated by Missing Edge and Extra Edge, is kept at a level comparable to that of the prior art. As described above, the present embodiment has the excellent effect of significantly speeding up structure learning and stabilizing the execution time without sacrificing the accuracy of the estimated Bayesian network compared with the prior art.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/JP2010/058963 2009-08-06 2010-05-27 Dispositif de traitement d'informations et programme pour apprendre une structure de réseau de bayes WO2011016281A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011525822A JP5555238B2 (ja) 2009-08-06 2010-05-27 ベイジアンネットワーク構造学習のための情報処理装置及びプログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009-182993 2009-08-06
JP2009182993 2009-08-06

Publications (1)

Publication Number Publication Date
WO2011016281A2 true WO2011016281A2 (fr) 2011-02-10

Family

ID=43544742

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/058963 WO2011016281A2 (fr) 2009-08-06 2010-05-27 Dispositif de traitement d'informations et programme pour apprendre une structure de réseau de bayes

Country Status (2)

Country Link
JP (1) JP5555238B2 (fr)
WO (1) WO2011016281A2 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015045091A1 (fr) * 2013-09-27 2015-04-02 株式会社シーエーシー Procédé et programme d'extraction de superstructure en apprentissage structural de réseau bayésien
WO2018076916A1 (fr) * 2016-10-27 2018-05-03 中兴通讯股份有限公司 Procédé et dispositif de publication de données, et terminal
CN108009437A (zh) * 2016-10-27 2018-05-08 中兴通讯股份有限公司 数据发布方法和装置及终端
CN108009437B (zh) * 2016-10-27 2022-11-22 中兴通讯股份有限公司 数据发布方法和装置及终端
JP2020529777A (ja) * 2017-08-01 2020-10-08 エルゼビア インコーポレイテッド 大規模、高密度、高ノイズネットワークから構造を抽出するためのシステム及び方法

Also Published As

Publication number Publication date
JP5555238B2 (ja) 2014-07-23
JPWO2011016281A1 (ja) 2013-01-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10806281

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2011525822

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10806281

Country of ref document: EP

Kind code of ref document: A2