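WO2015186278A1 - Attribute enumeration system, attribute enumeration method, and attribute enumeration program - Google Patents
Attribute enumeration system, attribute enumeration method, and attribute enumeration program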
- Publication number
- WO2015186278A1 (PCT/JP2015/000682)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attribute
- enumeration
- logical expression
- generated
- plan
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- the present invention relates to an attribute enumeration system, an attribute enumeration method, and an attribute enumeration program for enumerating new attributes obtained by combining attributes of learning data.
- Data mining is a technology for finding useful knowledge that has been unknown so far from a large amount of information.
- in data mining, a process is performed in which the attributes (features) used for mining are processed to generate new attributes.
- each attribute is represented by a binary attribute, and a logical expression obtained by connecting the binary attributes with an AND / OR operator is generated as a new attribute.
- for example, the day of the week can be represented using seven binary attributes (IS_Sunday, IS_Monday, IS_Tuesday, IS_Wednesday, IS_Thursday, IS_Friday, IS_Saturday). Further, when the time of day is expressed as morning or afternoon, it can be represented using two binary attributes (IS_AM, IS_PM).
- a new attribute “weekend afternoon” can be generated.
- a logical expression “(IS_Saturday AND IS_PM) OR (IS_Sunday AND IS_PM)” obtained by connecting these binary attributes with an AND / OR operator represents an attribute of weekend afternoon.
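The weekend-afternoon example can be made concrete with a small sketch. It assumes each sample is a (day, half-of-day) pair, one-hot encodes it into the IS_* binary attributes named above, and evaluates the logical expression directly; the helper names `binarize` and `weekend_afternoon` are hypothetical.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def binarize(day, half):
    """One-hot encode a (day, half-of-day) pair into IS_* binary attributes."""
    attrs = {f"IS_{d}": int(day == d) for d in DAYS}
    attrs["IS_AM"] = int(half == "AM")
    attrs["IS_PM"] = int(half == "PM")
    return attrs


def weekend_afternoon(a):
    """New attribute: (IS_Saturday AND IS_PM) OR (IS_Sunday AND IS_PM)."""
    return int((a["IS_Saturday"] and a["IS_PM"]) or (a["IS_Sunday"] and a["IS_PM"]))


samples = [("Saturday", "PM"), ("Sunday", "AM"), ("Monday", "PM"), ("Sunday", "PM")]
print([weekend_afternoon(binarize(d, h)) for d, h in samples])  # [1, 0, 0, 1]
```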
- Non-Patent Document 1 and Non-Patent Document 2 describe methods for enumerating attributes.
- disjunctive normal form (DNF) of attributes: a logical expression in which AND terms of attributes are connected by OR operators
- Non-Patent Document 3 describes a method for extracting a frequently used DNF pattern.
- Non-Patent Document 4 describes an example of a method for evaluating attributes.
- in the methods described in Non-Patent Document 1 and Non-Patent Document 2, DNF is enumerated by first enumerating only the attributes concatenated with the AND operator and then connecting these one by one with the OR operator. However, this approach requires a large amount of memory. For example, suppose the method described in Non-Patent Document 1 is used to enumerate attributes obtained by connecting five of 100 original attributes with AND/OR operators. In this case, there are on the order of 100^4 attributes in which four attributes are connected by AND/OR operators, and all of them must be held in memory, which requires a large amount of memory space.
- to avoid consuming a large amount of memory and calculation time, one could randomly sample attributes using the method described in Non-Patent Document 3. However, because the combinations extracted by that method are not exhaustive, it is difficult to generate better attributes.
- an object of the present invention is to provide an attribute enumeration system, an attribute enumeration method, and an attribute enumeration program that can enumerate new attributes at high speed while ensuring the completeness of attributes and suppressing memory consumption.
- the attribute enumeration system according to the present invention includes an enumeration plan generation unit that generates, from the attributes of learning data and the maximum number of attributes to be combined, a set of logical expression structures expressing how to combine logical expressions that represent combinations of attributes, generates partial logical expression structures by dividing each logical expression structure in the set into two, and generates an enumeration plan associating them with the logical expression structure from which they were divided.
- the system further includes an attribute generation unit that generates new attributes by combining attributes according to the generated partial logical expression structures.
- the enumeration plan generation unit divides each logical expression structure into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- another attribute enumeration system according to the present invention includes an enumeration plan generation unit that generates, from the attributes of learning data and the maximum number of attributes to be combined, a set of logical expression structures expressing how to combine logical expressions that represent combinations of attributes, and generates an enumeration plan that expresses, as a graph structure, the relationships between each generated logical expression structure and the partial logical expression structures representing parts of it; and an attribute generation unit that generates new attributes by combining attributes according to the partial logical expression structures.
- the enumeration plan generation unit selects from the enumeration plan partial logical expression structures that can express more of the logical expression structures while reducing the space required to store the new attributes generated by the attribute generation unit.
- the attribute enumeration method according to the present invention generates, from the attributes of learning data and the maximum number of attributes to be combined, a set of logical expression structures expressing how to combine logical expressions that represent combinations of attributes; generates partial logical expression structures by dividing each logical expression structure in the set into two; generates an enumeration plan associating them with the logical expression structure from which they were divided; and generates new attributes by combining attributes according to the generated partial logical expression structures.
- the method is characterized in that each logical expression structure is divided into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- Another attribute enumeration method generates a set of logical expression structures that express how to combine logical expressions representing attribute combinations from the attributes of learning data and the maximum number of combinations of the attributes.
- the attribute enumeration program according to the present invention causes a computer to execute an enumeration plan generation process that generates, from the attributes of learning data and the maximum number of attributes to be combined, a set of logical expression structures expressing how to combine logical expressions that represent combinations of attributes, generates partial logical expression structures by dividing each of them into two, and generates an enumeration plan associating them with the logical expression structure from which they were divided; and an attribute generation process that generates new attributes by combining attributes according to the partial logical expression structures.
- in the enumeration plan generation process, each logical expression structure is divided into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- another attribute enumeration program causes a computer to execute an enumeration plan generation process that generates, from the attributes of learning data and the maximum number of attributes to be combined, a set of logical expression structures expressing how to combine logical expressions that represent combinations of attributes, and generates an enumeration plan that expresses, as a graph structure, the relationships between each generated logical expression structure and the partial logical expression structures representing parts of it; and an attribute generation process that generates new attributes by combining attributes according to the partial logical expression structures.
- in the enumeration plan generation process, partial logical expression structures that can express more of the logical expression structures while reducing the space required to store the new attributes generated by the attribute generation process are selected from the enumeration plan.
- according to the present invention, new attributes can be enumerated at high speed while ensuring the completeness of attributes and suppressing memory consumption. That is, the technical means described in “Means for solving the problem” solve the technical problem described in “Problem to be solved by the invention” and yield the technical effect described in “Effect of the invention”.
- FIG. 1 is a block diagram showing an embodiment of an attribute enumeration system according to the present invention.
- the DNF enumeration problem will be described.
- the present invention can also be applied to a CNF (conjunctive normal form) enumeration problem, in which the expression is formed by connecting terms consisting only of logical sums (OR) with logical products (AND).
- FIG. 2 is an explanatory diagram illustrating examples of attributes indicated by the learning data.
- the table (matrix) illustrated in FIG. 2 expresses, with 1/0, whether or not each of the samples s1 to s5 of the learning data has each of the attributes f1 to f5.
- the attribute enumeration system of this embodiment illustrated in FIG. 1 includes an enumeration plan generation unit 11, a DNF search unit 12, an intermediate data storage unit 13, a sequential attribute evaluation unit 14, and an output data storage unit 15.
- the binary matrix X indicating whether or not the learning data has the specified attribute and the maximum number MaxLen of attributes to be combined are input.
- when the binary matrix X and MaxLen are input, the enumeration plan generation unit 11 generates, from the learning data attributes and MaxLen, logical expressions representing combinations of at most MaxLen attributes. Further, in the present embodiment, the enumeration plan generation unit 11 generates a set of logical expression structures that express how to combine the generated logical expressions. Since the logical expressions in this embodiment are expressed in DNF, this logical expression structure is referred to as a DNF label.
- a DNF label expresses a logical expression as a comma-separated list of numbers, where each number is the number of attributes in one AND term and each comma represents an OR operator.
- the DNF label expressed as [3] expresses “A and B and C”.
- a DNF label expressed as [1,1] expresses “A or B”.
- a DNF label expressed as [1,3] expresses “A or (B and C and D)”.
- A, B, C, and D indicate attributes.
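To illustrate the notation, the following hypothetical helper `label_to_expr` renders a DNF label as a formula, naming attributes A, B, C, ... in order of appearance as in the examples above. It is a reading aid under that naming assumption, not part of the described system.

```python
from string import ascii_uppercase


def label_to_expr(label):
    """Render a DNF label such as [1, 3] as a readable formula.

    Each number is the size of one AND term; commas between numbers are ORs.
    Attributes are named A, B, C, ... in order of appearance (illustrative only).
    """
    names = iter(ascii_uppercase)
    terms = []
    for size in label:
        term = " and ".join(next(names) for _ in range(size))
        # Parenthesize a multi-attribute AND term only when it is OR-ed with others.
        terms.append(f"({term})" if size > 1 and len(label) > 1 else term)
    return " or ".join(terms)


print(label_to_expr([1, 3]))  # A or (B and C and D)
```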
- the enumeration plan generation unit 11 divides each generated logical expression structure into two partial logical expression structures.
- the enumeration plan generation unit 11 expresses a relationship with a partial logical expression structure that represents a part of the generated logical expression structure in a graph structure.
- Each node in the graph structure is a logical expression structure or a partial logical expression structure.
- the graph structure expressed in this way is hereinafter referred to as an enumeration plan.
- the logical expression structure of the division source is associated with the partial logical expression structure divided into two.
- the graph structure is expressed by, for example, a DAG (directed acyclic graph).
- FIG. 3 is a flowchart illustrating an example of processing performed by the enumeration plan generation unit 11.
- the enumeration plan generation unit 11 generates all combinations of DNF labels up to the length MaxLen (step S11 in FIG. 3).
- for example, when MaxLen is 4, the generated DNF labels are [4], [3, 1], [3], [2, 2], [2, 1, 1], [2, 1], [2], [1, 1, 1, 1], [1, 1, 1], [1, 1], [1].
- the set of DNF labels generated here can be said to be a set of logical formula structures expressing how to combine logical formulas representing combinations of attributes.
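The labels generated in step S11 are exactly the integer partitions of 1 through MaxLen, each written in non-increasing order. The sketch below (hypothetical function `dnf_labels`, assuming labels are kept in non-increasing order) reproduces the eleven labels of length at most 4 listed above, in the same descending order.

```python
def dnf_labels(max_len):
    """All DNF labels (non-increasing lists of AND-term sizes) with total length <= max_len."""
    def partitions(n, largest):
        # Integer partitions of n whose parts are non-increasing and <= largest.
        if n == 0:
            yield []
            return
        for first in range(min(n, largest), 0, -1):
            for rest in partitions(n - first, first):
                yield [first] + rest

    labels = [p for n in range(1, max_len + 1) for p in partitions(n, n)]
    return sorted(labels, reverse=True)  # lexicographic descending order


print(dnf_labels(4))
# [[4], [3, 1], [3], [2, 2], [2, 1, 1], [2, 1], [2], [1, 1, 1, 1], [1, 1, 1], [1, 1], [1]]
```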
- the enumeration plan generation unit 11 performs structure division (step S12 in FIG. 3). Specifically, the enumeration plan generation unit 11 identifies the parent node by dividing the generated DNF label, and generates an edge connecting the nodes.
- Enumeration plan generation unit 11 specifies a parent node based on the following procedure, for example.
- for a DNF label consisting of a single AND term of N attributes, the enumeration plan generation unit 11 divides it into a partial DNF label of length ceiling(N/2) and a partial DNF label of length N - ceiling(N/2).
- here, ceiling() is a function that rounds its argument up to the nearest integer.
- for a DNF label consisting of a comma-delimited sequence of AND terms, the enumeration plan generation unit 11 divides the sequence into two partial DNF labels. At this time, the enumeration plan generation unit 11 divides the DNF label so that the difference in the number of attributes included in the two partial DNF labels is minimized. That is, the enumeration plan generation unit 11 divides the DNF label into two so that the two partial DNF labels generated from each DNF label contain equal numbers of attributes.
- specifically, the enumeration plan generation unit 11 sorts the numbers in the DNF label in descending order.
- for example, for a DNF label containing the numbers 4, 3, 2, 1, and 1, sorted_list = [4, 3, 2, 1, 1].
- starting from two empty sets S1 and S2, the enumeration plan generation unit 11 repeatedly calculates the sum of the numbers in S1 and in S2, adds the first number of sorted_list to the set with the smaller sum, and deletes that number and the following comma from sorted_list, until sorted_list is empty.
- as a result, the DNF label is divided into two partial DNF labels, [4, 1, 1] and [3, 2]. The enumeration plan generation unit 11 then treats the two partial DNF labels as parent nodes and the division source DNF label as their child node, and generates an edge from each parent node to the child node.
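The division rule can be sketched as follows, under our reading of the text: a single AND term [N] is split by ceiling, and a multi-term label is split greedily, largest number first, into whichever side currently has the smaller sum. The function name `split_label` and the tie-breaking rule (a tie goes to the first side, which reproduces the [4, 1, 1] / [3, 2] example) are assumptions.

```python
from math import ceil


def split_label(label):
    """Split a DNF label (total length >= 2) into two parent labels of near-equal size."""
    if len(label) == 1:
        # Single AND term [N]: lengths ceiling(N/2) and N - ceiling(N/2).
        half = ceil(label[0] / 2)
        return [half], [label[0] - half]
    # Several AND terms: place numbers largest-first into the side with the
    # smaller running sum (assumption: on a tie, the first side S1 wins).
    s1, s2 = [], []
    for x in sorted(label, reverse=True):
        (s1 if sum(s1) <= sum(s2) else s2).append(x)
    return s1, s2


print(split_label([4, 3, 2, 1, 1]))  # ([4, 1, 1], [3, 2])
print(split_label([4]))              # ([2], [2])
```

This greedy largest-first assignment is a common heuristic for balanced two-way partitioning; it keeps the two parents close in total length without searching all splits.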
- FIG. 4 is an explanatory diagram showing an example of the graph structure.
- the enumeration plan generation unit 11 orders each node (DNF label) (step S13 in FIG. 3).
- the enumeration plan generation unit 11 orders the nodes by topological sorting.
- a DAG is known to be topologically sortable, and topological sorting orders the nodes while maintaining the parent-child relationships (the direction of the arrows).
- FIG. 5 and 6 are explanatory diagrams showing an example of topological sort operation.
- first, the enumeration plan generation unit 11 sorts the set S of DNF labels in descending order, so that the DNF label [4] is the first element. The enumeration plan generation unit 11 therefore marks the node with the DNF label [4] as visited (FIG. 5(a)).
- next, the enumeration plan generation unit 11 follows the edges from the node with the DNF label [4] to its parent node with the DNF label [2], and marks that node as visited (FIG. 5(b)). Similarly, it follows the edges from the node [2] to the node [1] and marks the node [1] as visited. Since the node [1] has no parent node, the enumeration plan generation unit 11 assigns it order 1 (FIG. 5(c)).
- the enumeration plan generation unit 11 then assigns order 2 to the node [2]. Since orders have now been assigned to all parent nodes of the node [4], it assigns order 3 to the node [4] (FIG. 5(d)).
- the enumeration plan generation unit 11 then selects the DNF label [3, 1], the second element from the top of the set S, and marks the node [3, 1] as visited (FIG. 6(e)).
- the enumeration plan generation unit 11 follows the edges from the node [3, 1] and marks the node [3] as visited.
- since orders have been assigned to all parent nodes of the node [3], the enumeration plan generation unit 11 assigns order 4 to the node [3]. Similarly, since orders have been assigned to all parent nodes of the node [3, 1], it assigns order 5 to the node [3, 1] (FIG. 6(f)). The enumeration plan generation unit 11 repeats the same operation until an order has been assigned to every node (FIG. 6(g)).
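The ordering walkthrough is a depth-first topological sort over parent links. The sketch below rebuilds the parent table with the balanced split rule as we read it (ties go to the first side, an assumption), tries labels in descending order, and appends each label only after both of its parents. Under these assumptions it reproduces the first five orders above: [1], [2], [4], [3], [3, 1]. All function names are hypothetical.

```python
from math import ceil


def dnf_labels(max_len):
    """All DNF labels (non-increasing AND-term sizes) with total length <= max_len."""
    def parts(n, largest):
        if n == 0:
            yield []
            return
        for first in range(min(n, largest), 0, -1):
            for rest in parts(n - first, first):
                yield [first] + rest
    return [p for n in range(1, max_len + 1) for p in parts(n, n)]


def split_label(label):
    """Balanced two-way split (assumed tie rule: ties go to the first side)."""
    if len(label) == 1:
        half = ceil(label[0] / 2)
        return [half], [label[0] - half]
    s1, s2 = [], []
    for x in sorted(label, reverse=True):
        (s1 if sum(s1) <= sum(s2) else s2).append(x)
    return s1, s2


def enumeration_order(max_len):
    """Depth-first topological sort: each label is placed after both of its parents."""
    parents = {}
    for lab in dnf_labels(max_len):
        key = tuple(lab)
        # [1] is an original attribute and has no parents.
        parents[key] = () if key == (1,) else tuple(tuple(p) for p in split_label(lab))
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in parents[node]:
            visit(parent)
        order.append(node)  # every parent already has an order at this point

    for node in sorted(parents, reverse=True):  # roots tried in descending order
        visit(node)
    return order


print(enumeration_order(4)[:5])  # [(1,), (2,), (4,), (3,), (3, 1)]
```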
- FIG. 7 is an explanatory diagram illustrating an example of an enumeration plan expressed in a table format.
- the enumeration plan associates a DNF label with two DNF labels that are parents of the DNF label.
- the enumeration plan may also include a flag (cacheFlag) indicating whether or not the attributes for each DNF label are to be cached.
- next, the enumeration plan generation unit 11 specifies cache targets (step S14 in FIG. 3). Specifically, the enumeration plan generation unit 11 specifies the attributes to be stored in the intermediate data storage unit 13. At this time, the enumeration plan generation unit 11 selects, from the enumeration plan, partial logical expression structures that can express more of the other logical expression structures while reducing the space needed to store, in the intermediate data storage unit 13, the new attributes generated from the logical expression structures (logical expressions) specified by the DNF labels. Being able to express more of the logical expression structures means that the partial logical expression structure is highly reusable. The new attributes are generated by the DNF search unit 12, described later.
- the enumeration plan generation unit 11 specifies a cache target based on the calculation cost and the memory cost.
- the calculation cost is the number of references on the enumeration plan. Specifically, the calculation cost indicates the number of times referred to as a parent node.
- the memory cost is the size of the memory space necessary for storing the attribute, and is simply expressed as the sum of the numbers included in the DNF label.
- FIG. 8 is an explanatory diagram illustrating an example in which the calculation cost and the memory cost are calculated based on the enumeration plan illustrated in FIG.
- the calculation cost indicates the number of references on the enumeration plan
- the memory cost indicates the total number included in the DNF label.
- as illustrated in FIG. 8A, when no cache target has been specified, the cacheFlag column is blank.
- the enumeration plan generation unit 11 sorts the nodes that are referred to as parent nodes at least once (that is, whose calculation cost is 1 or more) in descending order of calculation cost, and specifies the top K nodes as cache targets. K is chosen so that the memory size M of the attributes generated for the selected nodes stays within the specified cache size.
- the cache size of the node set S is calculated by Equation 1 shown below.
- for example, the DNF indicated by the DNF label [1] has on the order of p entries (one per original attribute); that is, its cache size is p × 10^4 bytes (p entries of 10^4 bytes each). Meanwhile, the DNF indicated by the DNF label [2] has on the order of p^2 entries; that is, its cache size is p^2 × 10^4 bytes. The DNF indicated by the DNF label [1, 1] also has on the order of p^2 entries, so its cache size is the same as that of the DNF indicated by the DNF label [2].
- the enumeration plan generation unit 11 sets, for example, “TRUE” in the cacheFlag column of each DNF label that is a cache target and “FALSE” in the cacheFlag column of each DNF label that is not.
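The cache-target selection can be sketched as below. Calculation cost is counted as the number of times a label appears as a parent, memory cost as the sum of the label's numbers, and, following the order-of-magnitude estimate above, a label with memory cost c is assumed to need about p^c × vector-size bytes. The balanced split rule and its tie-breaking are our reading of the text; the parameter values in the usage line (p = 100, 10^4-byte vectors, a 10^9-byte cache) are illustrative assumptions, not figures from the document. With them the sketch selects [1], [2], and [1, 1].

```python
from collections import Counter
from math import ceil


def dnf_labels(max_len):
    """All DNF labels (non-increasing AND-term sizes) with total length <= max_len."""
    def parts(n, largest):
        if n == 0:
            yield []
            return
        for first in range(min(n, largest), 0, -1):
            for rest in parts(n - first, first):
                yield [first] + rest
    return [p for n in range(1, max_len + 1) for p in parts(n, n)]


def split_label(label):
    """Balanced two-way split (assumed tie rule: ties go to the first side)."""
    if len(label) == 1:
        half = ceil(label[0] / 2)
        return [half], [label[0] - half]
    s1, s2 = [], []
    for x in sorted(label, reverse=True):
        (s1 if sum(s1) <= sum(s2) else s2).append(x)
    return s1, s2


def select_cache_targets(max_len, p, vec_bytes, cache_bytes):
    """Return the labels to cache: top K by calculation cost that fit in the cache."""
    # Calculation cost: how many times each label is referenced as a parent.
    refs = Counter()
    for lab in dnf_labels(max_len):
        if lab != [1]:  # original attributes are not split
            for parent in split_label(lab):
                refs[tuple(parent)] += 1
    chosen, used = [], 0
    for lab in sorted(refs, key=refs.get, reverse=True):
        # Memory cost = sum of the label's numbers; assumed size ~ p**cost entries.
        size = p ** sum(lab) * vec_bytes
        if used + size > cache_bytes:
            break  # top-K prefix: stop at the first label that no longer fits
        chosen.append(lab)
        used += size
    return chosen


print(select_cache_targets(4, p=100, vec_bytes=10**4, cache_bytes=10**9))
# [(1,), (2,), (1, 1)]
```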
- FIG. 9 is an explanatory diagram showing an example of an enumeration plan.
- the table illustrated in FIG. 9 and the DAG correspond to each other: the DNF labels with “TRUE” in the cacheFlag column of the table, shown as black nodes in the DAG, are the cache targets.
- in this way, the enumeration plan generation unit 11 divides each logical expression structure into two so that the difference in the number of attributes included in the two partial logical expression structures generated from it is minimized. In other words, the enumeration plan generation unit 11 divides each logical expression structure (DNF structure) equally when creating the parent-child relationships of the nodes, which reduces the memory cost.
- DNF structure: logical expression structure
- for example, for a DNF of length 4, the enumeration plan generation unit 11 does not divide it into a DNF of length 3 and a DNF of length 1, but into two DNFs of length 2.
- holding DNFs of length 3 in memory requires space on the order of the cube (p^3).
- in contrast, holding DNFs of length 2 in memory requires space only on the order of the square (p^2).
- the DNF search unit 12 generates new attributes by combining the attributes according to the DNF labels in the order of the DNF labels specified by the enumeration plan generation unit 11.
- the DNF search unit 12 registers the generated attribute in the intermediate data storage unit 13.
- the DNF search unit 12 first registers the first node (that is, the original node) in the intermediate data storage unit 13.
- if the attributes generated for a parent DNF label are cached in the intermediate data storage unit 13, the DNF search unit 12 generates new attributes using those cached attributes.
- the enumeration plan generation unit 11 selects a logical expression structure (DNF label) with high reusability as a cache target. As a result, the amount of calculation can be reduced.
- each time the DNF search unit 12 generates a new attribute corresponding to a DNF label, it notifies the sequential attribute evaluation unit 14 of the generated attribute.
- the intermediate data storage unit 13 stores a new attribute generated by the DNF search unit 12. Specifically, the intermediate data storage unit 13 stores a DNF list and a vector in association with each other for each logical expression structure (DNF label).
- the intermediate data storage unit 13 is realized by, for example, a magnetic disk.
- FIG. 10 is an explanatory diagram illustrating an example of data stored in the intermediate data storage unit 13.
- Each number in the DNF column illustrated in FIG. 10 indicates an attribute type (attribute ID number), and a structure label indicates a logical expression structure.
- the information in the DNF column holds a permutation of attribute ID numbers and can be encoded arbitrarily.
- the DNF search unit 12 may store vectors in the intermediate data storage unit 13 after applying arbitrary compression, such as replacing identical vectors with a single symbol.
- the sequential attribute evaluation unit 14 evaluates the attribute generated by the DNF search unit 12.
- the sequential attribute evaluation unit 14 may evaluate attributes using, for example, the method described in Non-Patent Document 4.
- the method for evaluating the attribute is not limited to the method described in Non-Patent Document 4, and the sequential attribute evaluation unit 14 may evaluate the attribute using an arbitrary method.
- the sequential attribute evaluation unit 14 of this embodiment receives each new attribute as soon as the DNF search unit 12 generates it and sends the notification, and evaluates the received attributes one at a time. Performing the evaluation sequentially in this way reduces the cost of holding newly generated attributes.
- the sequential attribute evaluation unit 14 stores the evaluation result in the output data storage unit 15.
- the output data storage unit 15 is a storage device that stores evaluation results.
- the sequential attribute evaluation unit 14 may, for example, select the top-ranked attributes (for example, the top 100) based on an arbitrary score calculated using HSIC (Hilbert-Schmidt Independence Criterion) or Pearson correlation, and store those attributes in the output data storage unit 15.
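Sequential evaluation means each attribute can be scored on arrival and discarded unless it is among the best seen so far. The sketch below keeps the top k attributes in a min-heap and scores each column by the absolute Pearson correlation with a target vector. The target vector, the use of the absolute value, and the class name `SequentialEvaluator` are assumptions; the document allows any score, such as HSIC.

```python
import heapq
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


class SequentialEvaluator:
    """Keep only the top-k attributes seen so far, scored as they arrive."""

    def __init__(self, target, k=100):
        self.target, self.k = target, k
        self.heap = []  # min-heap of (score, name); the weakest kept attribute is at heap[0]

    def receive(self, name, column):
        score = abs(pearson(column, self.target))
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, name))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, name))  # evict the current weakest

    def top(self):
        return [name for _, name in sorted(self.heap, reverse=True)]


ev = SequentialEvaluator(target=[1, 0, 1, 0], k=2)
for name, column in [("A", [1, 0, 1, 0]), ("C", [1, 1, 0, 0]), ("D", [1, 0, 1, 1])]:
    ev.receive(name, column)  # each attribute is scored once, on arrival
print(ev.top())  # ['A', 'D']
```

Memory for evaluation stays at k entries no matter how many attributes are enumerated, which is the point of evaluating sequentially.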
- the sequential attribute evaluation unit 14 may also send the evaluation result to another device (not shown) via a communication line.
- the enumeration plan generation unit 11, the DNF search unit 12, and the sequential attribute evaluation unit 14 are realized by a CPU of a computer that operates according to a program (attribute enumeration program).
- the program is stored in a storage unit (not shown) in the attribute enumeration system, and the CPU reads the program, and as the enumeration plan generation unit 11, the DNF search unit 12, and the sequential attribute evaluation unit 14 according to the program. It may work.
- each of the enumeration plan generation unit 11, the DNF search unit 12, and the sequential attribute evaluation unit 14 may be realized by dedicated hardware.
- the attribute enumeration system of this embodiment may be realized by two or more physically separate devices connected by wire or wirelessly, or by a single device.
- the enumeration plan generation unit 11 generates a set of DNF labels that expresses how to combine logical expressions representing combinations of attributes from the learning data attributes and MaxLen.
- the enumeration plan generation unit 11 generates a partial DNF label obtained by dividing the logical expression included in each generated DNF label into two, and generates an enumeration plan associated with the original DNF label.
- the DNF search unit 12 generates new attributes by combining attributes according to the partial DNF labels.
- the enumeration plan generation unit 11 divides the DNF label into two so that the number of attributes included in the two partial DNF labels generated from each DNF label becomes equal.
- the enumeration plan generation unit 11 generates an enumeration plan that expresses, as a graph structure, the relationships between each generated DNF label and the partial DNF labels representing parts of it. At this time, the enumeration plan generation unit 11 selects from the enumeration plan partial DNF labels that can express more of the other DNF labels (that is, highly reusable ones) while reducing the space needed to store the new attributes generated by the DNF search unit 12.
- in this way, the graph structure specifies, for each DNF label, the relationships (parent-child relationships) to the components from which it is generated, and a subset of nodes is selected in view of both the memory cost of caching and the calculation cost saved by reuse. As a result, attributes can be enumerated at high speed with reduced calculation cost and reduced memory consumption for storing attributes, while new attributes are still enumerated exhaustively.
- that is, the enumeration plan generation unit 11 first automatically determines how to combine DNFs of length up to MaxLen, and then generates an enumeration plan that balances memory usage against calculation amount. New attributes can therefore be enumerated at high speed with suppressed memory consumption while the completeness of attributes is ensured.
- for example, the new attributes corresponding to the DNF labels [1], [2], and [1, 1], that is, attributes of at most square order, may be cached.
- in this case, the attribute corresponding to the DNF label [2] can be used in place of the attributes corresponding to the DNF labels [3] and [1].
- new attributes can therefore be enumerated at high speed while memory consumption is reduced.
- the efficiency of the cache can also be improved. For example, in the enumeration plan illustrated in FIG. 9, when the attributes of the DNF labels [1, 1, 1, 1], [1, 1, 1], and [2, 1, 1] are evaluated, the attribute corresponding to the DNF label [1, 1] does not need to be generated anew.
- FIG. 11 is an explanatory diagram illustrating a specific example of processing in which the DNF search unit 12 creates an attribute and stores the cache target attribute in the intermediate data storage unit 13.
- FIG. 11A shows an example of processing for generating an attribute for each DNF label.
- the DNF search unit 12 generates attributes in order from the top row of the table illustrated in FIG. 9, and outputs each generated attribute to the sequential attribute evaluation unit 14 as soon as it is generated.
- the DNF search unit 12 stores the generated attribute in the intermediate data storage unit 13.
- FIG. 11B shows an example of processing for outputting a combination of attributes.
- when the attributes corresponding to a DNF label are stored (cached) in the intermediate data storage unit 13, the DNF search unit 12 outputs those attributes.
- when the attributes corresponding to a DNF label are not stored (not cached) in the intermediate data storage unit 13, the DNF search unit 12 generates the combinations of attributes. In this case, the DNF search unit 12 also generates the attributes of the parent DNF labels.
- when the label consists of a single AND term, the DNF search unit 12 generates AND combinations; otherwise, it generates OR combinations.
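The AND/OR rule for building a label's attributes from its two parents can be sketched with attribute columns stored as integer bitmasks over samples. Each generated attribute is keyed by its set of AND terms. The bitmask representation, the rule that an attribute appears at most once per formula, the function name `combine`, and the toy data are assumptions for illustration.

```python
from itertools import product


def combine(label, left, right):
    """Build the attributes of `label` from the attributes of its two parent labels.

    `left` and `right` map a key (frozenset of AND terms, each a frozenset of
    attribute IDs) to a bitmask column over samples. Single-term labels are
    built with AND; all other labels with OR.
    """
    is_and = len(label) == 1
    out = {}
    for (k1, v1), (k2, v2) in product(left.items(), right.items()):
        attrs1, attrs2 = frozenset().union(*k1), frozenset().union(*k2)
        if attrs1 & attrs2:
            continue  # assumption: each attribute appears at most once per formula
        if is_and:
            (t1,), (t2,) = k1, k2  # each parent is a single AND term
            key, value = frozenset([t1 | t2]), v1 & v2
        else:
            key, value = k1 | k2, v1 | v2
        if sorted((len(t) for t in key), reverse=True) == list(label):
            out.setdefault(key, value)  # (a, b) and (b, a) yield the same key
    return out


# Hypothetical toy data: 3 attributes over 4 samples, one bit per sample.
columns = [0b1100, 0b1010, 0b0110]
base = {frozenset([frozenset([i])]): col for i, col in enumerate(columns)}  # label [1]
two = combine((2,), base, base)        # AND of two attributes
one_one = combine((1, 1), base, base)  # OR of two attributes
two_one = combine((2, 1), two, base)   # OR of an AND pair and one more attribute
print(len(two), len(one_one), len(two_one))  # 3 3 3
```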
- FIG. 12 is a block diagram showing an outline of the attribute enumeration system according to the present invention.
- the attribute enumeration system according to the present invention uses logical expressions (for example, DNF, CNF) that represent combinations of attributes, generated from the attributes of the learning data (for example, binary matrix X) and the maximum number of attributes to be combined (for example, MaxLen).
- a set of logical expression structures (for example, DNF labels)
- a partial logical expression structure (for example, partial DNF labels)
- an enumeration plan generation unit 81 (for example, the enumeration plan generation unit 11)
- an enumeration plan (for example, the enumeration plan in table format or graph structure illustrated in FIG. 9)
- an attribute generation unit 82 (for example, the DNF search unit 12)
- the enumeration plan generation unit 81 divides each logical expression structure into two so that the two partial logical expression structures generated from it contain equal numbers of attributes (for example, so that the difference is minimized).
- the enumeration plan generation unit 81 generates an enumeration plan in which a relationship with a partial logical expression structure that represents a part of the generated logical expression structure is expressed in a graph structure (for example, DAG).
- a graph structure (for example, a DAG)
- the structure may be selected from an enumeration plan.
- The attribute generation unit 82 may store the new attributes generated according to the partial logical expression structures selected by the enumeration plan generation unit 81 in a storage device (for example, the intermediate data storage unit 13). It may then generate new attributes corresponding to other logical expression structures from the attributes stored in the storage device.
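The store-and-reuse step resembles memoisation. This sketch caches the row bitmask of every AND term it computes and builds larger terms from cached halves. Attribute columns are assumed to be integer bitmasks (bit i = row i), and an AND term is assumed to be a frozenset of attribute indices. `IntermediateStore` is a hypothetical stand-in for the intermediate data storage unit 13, not its actual design.

```python
from typing import Dict, FrozenSet, List

Term = FrozenSet[int]  # hypothetical: an AND term as a set of attribute indices
Dnf = FrozenSet[Term]  # an OR of AND terms


class IntermediateStore:
    """Hypothetical stand-in for the intermediate data storage unit.

    Caches the bitmask of each AND term so that later expressions containing
    the same sub-term reuse it instead of recomputing it.
    """

    def __init__(self, columns: List[int]) -> None:
        self.columns = columns  # one bitmask per original attribute
        self.cache: Dict[Term, int] = {}
        self.hits = 0  # number of times a stored term was reused

    def term_mask(self, term: Term) -> int:
        if not term:
            raise ValueError("empty AND term")
        if term in self.cache:
            self.hits += 1
            return self.cache[term]
        attrs = sorted(term)
        if len(attrs) == 1:
            mask = self.columns[attrs[0]]
        else:
            # Balanced split: the first half gets the extra attribute.
            half = len(attrs) - len(attrs) // 2
            mask = self.term_mask(frozenset(attrs[:half])) & self.term_mask(frozenset(attrs[half:]))
        self.cache[term] = mask
        return mask

    def dnf_mask(self, dnf: Dnf) -> int:
        """Evaluate an OR of AND terms, reusing cached terms where possible."""
        mask = 0
        for term in dnf:
            mask |= self.term_mask(term)
        return mask
```

After the term {0, 1} has been computed, evaluating {0, 1, 2} reads {0, 1} from the cache and only computes the missing half.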
- The new attributes stored in the storage device in this way are generated from logical expression structures that have been appropriately divided in two, so the space size needed to store them can be kept small. In addition, the partial logical expression structures selected in this way are highly reusable, so new attributes can be enumerated at high speed.
- the attribute enumeration system may include an attribute evaluation unit (for example, the sequential attribute evaluation unit 14) that evaluates the attribute generated by the attribute generation unit 82.
- the attribute generation unit 82 may transmit the generated attribute to the attribute evaluation unit every time a new attribute is generated according to each partial logical expression structure.
- This reduces the memory needed to hold new attributes awaiting evaluation, which improves memory efficiency.
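Sending each attribute to the evaluator as soon as it is generated can be sketched with a Python generator. The evaluator keeps only the best k attributes in a heap, so memory grows with k rather than with the total number of attributes. The pairwise-AND generator, the bitmask encoding and the agreement-with-target score are all illustrative assumptions. They are not the evaluation criterion of the sequential attribute evaluation unit 14.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def generate_attributes(columns: List[int]) -> Iterator[Tuple[str, int]]:
    """Hypothetical generator: yield pairwise AND attributes one at a time."""
    n = len(columns)
    for i in range(n):
        for j in range(i + 1, n):
            yield f"x{i} AND x{j}", columns[i] & columns[j]


def evaluate_stream(attrs: Iterable[Tuple[str, int]], target: int,
                    n_rows: int, k: int) -> List[Tuple[int, str]]:
    """Score each attribute as it arrives and keep only the best k.

    The score counts the rows where the attribute agrees with the target.
    """
    full = (1 << n_rows) - 1
    best: List[Tuple[int, str]] = []  # min-heap of (score, name)
    for name, mask in attrs:
        score = n_rows - bin((mask ^ target) & full).count("1")
        if len(best) < k:
            heapq.heappush(best, (score, name))
        else:
            heapq.heappushpop(best, (score, name))
    return sorted(best, reverse=True)
```

Because `generate_attributes` is lazy, no attribute is materialised before the evaluator asks for it.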
- The enumeration plan generation unit 81 may use disjunctive normal form (DNF) or conjunctive normal form (CNF) for the logical expressions that represent combinations of attributes.
- Any logical expression can be converted into an equivalent expression in disjunctive or conjunctive normal form, so using either form guarantees completeness.
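The completeness argument rests on a standard fact: every Boolean function has a DNF with one AND term (minterm) for each satisfying assignment. This sketch builds that canonical DNF from a truth table. A CNF can be built dually, with one OR clause per falsifying assignment. The literal encoding `(index, positive)` and the names `to_dnf` and `eval_dnf` are illustrative.

```python
from itertools import product
from typing import Callable, List, Tuple

Literal = Tuple[int, bool]  # (variable index, True if the literal is positive)


def to_dnf(f: Callable[..., bool], n: int) -> List[List[Literal]]:
    """Return the canonical DNF of f: one minterm per satisfying assignment."""
    terms: List[List[Literal]] = []
    for values in product([False, True], repeat=n):
        if f(*values):
            terms.append([(i, v) for i, v in enumerate(values)])
    return terms


def eval_dnf(terms: List[List[Literal]], values: Tuple[bool, ...]) -> bool:
    """Evaluate an OR of AND terms on one assignment."""
    return any(all(values[i] == positive for i, positive in term) for term in terms)
```

For XOR on two variables, this gives (NOT x0 AND x1) OR (x0 AND NOT x1), which agrees with XOR on all four assignments.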
- FIG. 13 is a block diagram showing another outline of the attribute enumeration system according to the present invention.
- In this outline, the attribute enumeration system includes an enumeration plan generation unit 91 (for example, the enumeration plan generation unit 11) and an attribute generation unit 92 (for example, the DNF search unit 12). From the attributes of the learning data (for example, binary matrix X) and the maximum number of attributes to be combined (for example, MaxLen), the enumeration plan generation unit 91 generates a set of logical expression structures (for example, DNF labels). Each structure represents a way of combining the logical expressions (for example, DNF, CNF) that represent combinations of attributes. The unit also generates an enumeration plan that expresses, as a graph structure (for example, a DAG), the relationships between the generated logical expression structures and the partial logical expression structures (for example, partial DNF labels) that represent parts of them. The attribute generation unit 92 generates new attributes by combining the attributes according to the partial logical expression structures.
- The enumeration plan generation unit 91 selects from the enumeration plan the partial logical expression structures that can represent more parts of the logical expression structures (for example, highly reusable ones). It does so while keeping small the space size (for example, memory cost) needed to store the new attributes generated by the attribute generation unit 92.
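One possible heuristic for this trade-off is sketched below, as an assumption rather than the selection rule used in the patent. Each candidate partial label carries a reuse count (how many labels it helps build) and a memory cost. The heuristic greedily keeps the candidates with the highest reuse per unit of memory that still fit within a budget, as in a greedy knapsack. `select_partials` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, ...]  # hypothetical: AND-term sizes of a DNF label


def select_partials(candidates: Dict[Label, Tuple[int, int]], budget: int) -> List[Label]:
    """Greedily choose partial labels to keep in storage.

    `candidates` maps a partial label to (reuse, cost). Candidates with the
    highest reuse per unit cost are taken first while they fit in `budget`.
    """
    for label, (_, cost) in candidates.items():
        if cost <= 0:
            raise ValueError(f"cost of {label} must be positive")
    ranked = sorted(candidates.items(), key=lambda item: (-item[1][0] / item[1][1], item[0]))
    chosen: List[Label] = []
    used = 0
    for label, (_, cost) in ranked:
        if used + cost <= budget:
            chosen.append(label)
            used += cost
    return chosen
```

A greedy ratio rule is not optimal in general. It is shown only because it makes the two competing goals, reuse and memory, explicit in a few lines.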
- The attribute generation unit 92 may store the new attributes generated according to the partial logical expression structures selected by the enumeration plan generation unit 91 in a storage device (for example, the intermediate data storage unit 13). It may then generate new attributes corresponding to other logical expression structures from the attributes stored in the storage device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biotechnology (AREA)
- Epidemiology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Physiology (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
12 DNF search unit
13 Intermediate data storage unit
14 Sequential attribute evaluation unit
15 Output data storage unit
Claims (15)
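- 1. An attribute enumeration system comprising:
an enumeration plan generation unit that generates, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes, generates partial logical expression structures by dividing the logical expression included in each generated logical expression structure into two, and generates an enumeration plan in which the partial logical expression structures are associated with the logical expression structure from which they were divided; and
an attribute generation unit that generates new attributes by combining the attributes according to the generated partial logical expression structures,
wherein the enumeration plan generation unit divides each logical expression structure into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- 2. The attribute enumeration system according to claim 1, wherein the enumeration plan generation unit generates an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure, and selects from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the new attributes generated by the attribute generation unit.
- 3. The attribute enumeration system according to claim 2, wherein the attribute generation unit stores, in a storage device, new attributes generated according to the partial logical expression structures selected by the enumeration plan generation unit, and generates, based on the attributes stored in the storage device, new attributes corresponding to other logical expression structures.
- 4. The attribute enumeration system according to any one of claims 1 to 3, further comprising an attribute evaluation unit that evaluates the attributes generated by the attribute generation unit,
wherein the attribute generation unit transmits each generated attribute to the attribute evaluation unit every time it generates a new attribute according to a partial logical expression structure.
- 5. The attribute enumeration system according to any one of claims 1 to 4, wherein the enumeration plan generation unit uses disjunctive normal form or conjunctive normal form for the logical expressions representing combinations of attributes.
- 6. An attribute enumeration system comprising:
an enumeration plan generation unit that generates, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes, and generates an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure; and
an attribute generation unit that generates new attributes by combining the attributes according to the partial logical expression structures,
wherein the enumeration plan generation unit selects from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the new attributes generated by the attribute generation unit.
- 7. The attribute enumeration system according to claim 6, wherein the attribute generation unit stores, in a storage device, new attributes generated according to the partial logical expression structures selected by the enumeration plan generation unit, and generates, based on the attributes stored in the storage device, new attributes corresponding to other logical expression structures.
- 8. An attribute enumeration method comprising:
generating, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes;
generating partial logical expression structures by dividing the logical expression included in each generated logical expression structure into two, and generating an enumeration plan in which the partial logical expression structures are associated with the logical expression structure from which they were divided; and
generating new attributes by combining the attributes according to the generated partial logical expression structures,
wherein, when the enumeration plan is generated, each logical expression structure is divided into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- 9. The attribute enumeration method according to claim 8, comprising:
generating an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure; and
selecting from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the generated new attributes.
- 10. An attribute enumeration method comprising:
generating, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes;
generating an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure;
selecting from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the new attributes generated according to the partial logical expression structures; and
generating new attributes by combining the attributes according to the selected partial logical expression structures.
- 11. The attribute enumeration method according to claim 10, comprising storing, in a storage device, new attributes generated according to the selected partial logical expression structures, and generating, based on the attributes stored in the storage device, new attributes corresponding to other logical expression structures.
- 12. An attribute enumeration program for causing a computer to execute:
an enumeration plan generation process of generating, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes, generating partial logical expression structures by dividing the logical expression included in each generated logical expression structure into two, and generating an enumeration plan in which the partial logical expression structures are associated with the logical expression structure from which they were divided; and
an attribute generation process of generating new attributes by combining the attributes according to the generated partial logical expression structures,
the program causing the computer, in the enumeration plan generation process, to divide each logical expression structure into two so that the two partial logical expression structures generated from it contain equal numbers of attributes.
- 13. The attribute enumeration program according to claim 12, causing the computer, in the enumeration plan generation process, to generate an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure, and to select from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the new attributes generated in the attribute generation process.
- 14. An attribute enumeration program for causing a computer to execute:
an enumeration plan generation process of generating, from attributes of learning data and a maximum number of the attributes to be combined, a set of logical expression structures each representing a way of combining logical expressions that represent combinations of attributes, and generating an enumeration plan that expresses, as a graph structure, relationships between the generated logical expression structures and partial logical expression structures that each represent part of a logical expression structure; and
an attribute generation process of generating new attributes by combining the attributes according to the partial logical expression structures,
the program causing the computer, in the enumeration plan generation process, to select from the enumeration plan partial logical expression structures that can represent more parts of the logical expression structures while reducing the space size required to store the new attributes generated in the attribute generation process.
- 15. The attribute enumeration program according to claim 14, causing the computer, in the attribute generation process, to store, in a storage device, new attributes generated according to the partial logical expression structures selected in the enumeration plan generation process, and to generate, based on the attributes stored in the storage device, new attributes corresponding to other logical expression structures.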
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/316,075 US10740677B2 (en) | 2014-06-03 | 2015-02-13 | Feature enumeration system, feature enumeration method and feature enumeration program |
JP2016525668A JP6500896B2 (ja) | 2014-06-03 | 2015-02-13 | Attribute enumeration system, attribute enumeration method and attribute enumeration program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-114923 | 2014-06-03 | ||
JP2014114923 | 2014-06-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015186278A1 (ja) | 2015-12-10 |
Family
ID=54766365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/000682 WO2015186278A1 (ja) | Attribute enumeration system, attribute enumeration method and attribute enumeration program | 2014-06-03 | 2015-02-13 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10740677B2 (ja) |
JP (1) | JP6500896B2 (ja) |
WO (1) | WO2015186278A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
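WO2018180970A1 (ja) | 2017-03-30 | 2018-10-04 | NEC Corporation | Information processing system, feature description method and feature description program |
WO2018180971A1 (ja) | 2017-03-30 | 2018-10-04 | NEC Corporation | Information processing system, feature description method and feature description program |
JP2019049975A (ja) * | 2017-09-07 | 2019-03-28 | Fujitsu Limited | Training device and method for a deep learning classification model |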
US10885011B2 (en) | 2015-11-25 | 2021-01-05 | Dotdata, Inc. | Information processing system, descriptor creation method, and descriptor creation program |
US11514062B2 (en) | 2017-10-05 | 2022-11-29 | Dotdata, Inc. | Feature value generation device, feature value generation method, and feature value generation program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6438741B1 (en) * | 1998-09-28 | 2002-08-20 | Compaq Computer Corporation | System and method for eliminating compile time explosion in a top down rule based system using selective sampling |
US7188091B2 (en) * | 2001-03-21 | 2007-03-06 | Resolutionebs, Inc. | Rule processing system |
US8909584B2 (en) * | 2011-09-29 | 2014-12-09 | International Business Machines Corporation | Minimizing rule sets in a rule management system |
-
2015
- 2015-02-13 JP JP2016525668A patent/JP6500896B2/ja active Active
- 2015-02-13 US US15/316,075 patent/US10740677B2/en active Active
- 2015-02-13 WO PCT/JP2015/000682 patent/WO2015186278A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
KAZUHISA MAKINO: "Logical Analysis of Data and Boolean Functions", PROCEEDINGS OF THE 1998 IEICE GENERAL CONFERENCE, vol. 1, no. TD-1-3, 6 March 1998 (1998-03-06), pages 476 - 477, XP003016182 * |
KEN SADOHARA: "Feature selection using Boolean kernels for the learning of Boolean functions", IPSJ SIG NOTES, vol. 2004, no. 29, 17 March 2004 (2004-03-17), pages 187 - 192 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10885011B2 (en) | 2015-11-25 | 2021-01-05 | Dotdata, Inc. | Information processing system, descriptor creation method, and descriptor creation program |
WO2018180970A1 (ja) | 2017-03-30 | 2018-10-04 | NEC Corporation | Information processing system, feature description method and feature description program |
WO2018180971A1 (ja) | 2017-03-30 | 2018-10-04 | NEC Corporation | Information processing system, feature description method and feature description program |
US11727203B2 (en) | 2017-03-30 | 2023-08-15 | Dotdata, Inc. | Information processing system, feature description method and feature description program |
JP2019049975A (ja) * | 2017-09-07 | 2019-03-28 | Fujitsu Limited | Training device and method for a deep learning classification model |
JP7225614B2 (ja) | 2017-09-07 | 2023-02-21 | Fujitsu Limited | Training device and method for a deep learning classification model |
US11514062B2 (en) | 2017-10-05 | 2022-11-29 | Dotdata, Inc. | Feature value generation device, feature value generation method, and feature value generation program |
Also Published As
Publication number | Publication date |
---|---|
US20170109629A1 (en) | 2017-04-20 |
US10740677B2 (en) | 2020-08-11 |
JP6500896B2 (ja) | 2019-04-17 |
JPWO2015186278A1 (ja) | 2017-04-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2015186278A1 (ja) | Attribute enumeration system, attribute enumeration method and attribute enumeration program | |
US11481440B2 (en) | System and method for processing metadata to determine an object sequence | |
US20120066667A1 (en) | Simulation environment for distributed programs | |
CN114282076B (zh) | Sorting method and system based on secret sharing | |
CN103500185B (zh) | Method and system for generating a data table based on multi-platform data | |
Gupta | Process mining a comparative study | |
Gavrilets | Dynamics of clade diversification on the morphological hypercube | |
US10540360B2 (en) | Identifying relationship instances between entities | |
Zhang et al. | A multi-level self-adaptation approach for microservice systems | |
Seol et al. | Design process modularization: concept and algorithm | |
Gupta | Process mining algorithms | |
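CN112306452A (zh) | Method, device and system for processing business data with a merge sort algorithm | |
JPWO2012115007A1 (ja) | Fault tree analysis system, fault tree analysis method and program | |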
Dasari et al. | Maximal clique enumeration for large graphs on hadoop framework | |
CN105787013B (zh) | Method and system for assigning type names to heterogeneous data | |
Rusinaite et al. | A systematic literature review on dynamic business processes | |
CN112070487A (zh) | AI-based RPA process generation method, apparatus, device and medium | |
Varley | Information theory for complex systems scientists | |
Steffen et al. | Generating hard benchmark problems for weak bisimulation | |
WO2015045091A1 (ja) | Method and program for superstructure extraction in structure learning of Bayesian networks | |
Weigert et al. | Simulation-based scheduling of assembly operations | |
Lai et al. | Exploiting and evaluating MapReduce for large-scale graph mining | |
Eloe et al. | Efficient determination of spatial relations using composition tables and decision trees | |
Alomari et al. | Anti-synchronisation of non-identical fractional order hyperchaotic systems | |
Ningombam et al. | A knowledge interchange format (KIF) for robots in cloud |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15802679; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016525668; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 15316075; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 15802679; Country of ref document: EP; Kind code of ref document: A1 |