AU2021105123A4 - Redundancy rule screening method for association rule mining and device thereof - Google Patents
- Publication number
- AU2021105123A4
- Authority
- AU
- Australia
- Prior art keywords
- rule
- mining
- target
- case
- association
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure discloses a redundancy rule screening method for
association rule mining and a device thereof, relates to the technical field of data mining,
and can solve the technical problem that a currently adopted redundancy rule screening
method is easily influenced by subjective factors, so that a screening result is inaccurate.
The method comprises the steps of performing association rule mining on a target
transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an
initial mining result; matching an association rule in the initial mining result with each
case in the target transaction database in sequence; and according to the matching result
of each case, eliminating a redundancy rule in the initial mining result to obtain a target
mining result. The present disclosure is suitable for accurately screening out the
redundancy rules during association rule mining.
ABSTRACT DRAWING - Fig 1
17950820_1 (GHMatters) P116960.AU
FIG. 1 (flow diagram):
101. Association rule mining is performed on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
102. An association rule in the initial mining result is matched with each case in the target transaction database in sequence.
103. According to the matching result of each case, a redundancy rule in the initial mining result is eliminated to obtain a target mining result.
17950816_1 (GHMatters) P116960.AU
Description
[01] The present disclosure relates to the technical field of data mining, in particular to a redundancy rule screening method for association rule mining and a device thereof.
[02] Data mining comprises four parts: clustering mining, classification mining, anomaly analysis and association rule mining. The main task of association rule mining is to extract valuable association patterns from transaction databases. For example, the earliest application of association rule mining was to extract information on associated sales between goods from the shopping basket database of supermarket customers, so that supermarket operators could increase revenue through bundled sales. Rules with low frequency, low reliability or no value are often defined as worthless rules, that is, redundancy rules. There are two classical methods for mining association rules: Apriori and FP-Tree. Three important parameters are usually used to measure whether a mining result is redundant: support, confidence and lift. The thresholds of the three parameters are set in advance, every potential association rule is filtered against these thresholds while Apriori or FP-Tree is executed, and the rules that pass are kept as the final mining result.
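As a minimal sketch of how the three threshold parameters are conventionally computed (they are commonly called support, confidence and lift), the following uses the standard textbook definitions. The function names and the shopping-basket data are hypothetical and are not taken from the disclosure.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline frequency of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical baskets, invented for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
a, c = frozenset({"bread"}), frozenset({"milk"})
print(support(a | c, baskets), confidence(a, c, baskets), lift(a, c, baskets))
```

A threshold-based screen would keep the rule bread -> milk only if all three values exceed preset limits, which is where the subjectivity discussed next comes in.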
[03] However, screening mining results with thresholds on the three parameters has two problems: 1) the setting of the parameters is highly subjective, and 2) too high a threshold leads to the loss of important mining results, while too low a threshold leads to the discovery of a large number of redundancy rules. Therefore, the current redundancy rule screening method is easily affected by subjective factors, resulting in inaccurate screening results.
[04] In view of this, the present disclosure provides a redundancy rule screening method for association rule mining and a device thereof, which mainly aims to solve the technical problem that a currently adopted redundancy rule screening method is easily influenced by subjective factors, so that a screening result is inaccurate.
[05] According to one aspect of the present disclosure, there is provided a redundancy rule screening method for association rule mining, comprising:
[06] performing association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result;
[07] matching an association rule in the initial mining result with each case in the target transaction database in sequence; and
[08] according to the matching result of each case, eliminating a redundancy rule in the initial mining result to obtain a target mining result.
[09] According to another aspect of the present disclosure, there is provided a redundancy rule screening device for association rule mining, comprising:
[10] a mining module, which is configured to perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result;
[11] a matching module, which is configured to match an association rule in the initial mining result with each case in the target transaction database in sequence; and
[12] an eliminating module, which is configured to eliminate, according to the matching result of each case, a redundancy rule in the initial mining result to obtain a target mining result.
[13] According to another aspect of the present disclosure, there is provided a non-volatile readable storage medium on which a computer program is stored, wherein when being executed by a processor, the program implements the redundancy rule screening method for association rule mining described above.
[14] According to another aspect of the present disclosure, there is provided a computer device comprising a nonvolatile readable storage medium, a processor, and a computer program stored on the nonvolatile readable storage medium and operable on the processor, wherein the processor implements the redundancy rule screening method for association rule mining described above when executing the program.
[15] With the above technical scheme, compared with the current redundancy rule screening method, a redundancy rule screening method for association rule mining and a device thereof provided by the present disclosure can first perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result of association rules, then match an association rule in the initial mining result with each case in the target transaction database in sequence; and according to the matching result of each case, eliminate a redundancy rule in the initial mining result to obtain a target mining result. According to the technical scheme in the present disclosure, the best association rules with the largest amount of information can be screened out for each case, and the association rules other than the best association rules can be eliminated as redundancy rules. Furthermore, the final mining results can be obtained by fusing the best association rules corresponding to all cases and performing deduplication processing. The screening error caused by subjective factors when screening redundancy rules by setting parameter thresholds is avoided. Through the technical scheme in the present disclosure, the accuracy of screening redundancy rules can be effectively enhanced, and then the reliability of mining association rules corresponding to each case can be guaranteed.
[01] The drawings described here are used to provide a further understanding of the present disclosure, and constitute a part of the present disclosure. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure, and do not constitute improper restrictions on the present application. In the drawings:
[02] FIG. 1 shows a flow diagram of a redundancy rule screening method for association rule mining according to an embodiment of the present application;
[03] FIG. 2 shows a flow diagram of another redundancy rule screening method for association rule mining according to an embodiment of the present application;
[04] FIG. 3 shows an example schematic diagram of a redundancy rule screening process for association rule mining according to an embodiment of the present application;
[05] FIG. 4 shows a structural schematic diagram of a redundancy rule screening device for association rule mining according to an embodiment of the present application;
[06] FIG. 5 shows a structural schematic diagram of another redundancy rule screening device for association rule mining according to an embodiment of the present application.
[07] Hereinafter, the present disclosure will be described in detail with reference to the drawings and embodiments. It should be noted that the embodiments in the present disclosure and the features in the embodiments can be combined with each other without conflict.
[08] Aiming at the technical problem that a currently adopted redundancy rule screening method is easily influenced by subjective factors, so that a screening result is inaccurate, an embodiment of the present disclosure provides a redundancy rule screening method for association rule mining. As shown in FIG. 1, the method comprises the following steps.
[09] 101. Association rule mining is performed on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
[10] The idea of the Frequent Pattern Tree (FP-Tree) algorithm is to construct a frequent pattern tree, compress the whole transaction database into it, and find all frequent item sets from the FP-Tree. The FP-Tree of the whole transaction database can be built with only two scans of the transaction database: one to count item frequencies and one to insert the transactions into the tree. According to the present disclosure, the association rule mining of the target transaction database can be realized directly based on the frequent pattern tree FP-Tree algorithm, and the initial mining result can be further obtained. Although the initial mining results contain valuable association rules, they also contain redundancy rules with low frequency, low reliability or no value, so it is necessary to carry out subsequent processing on the initial mining results in order to accurately eliminate redundancy rules and further obtain target mining results with great reference significance.
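The compression step can be sketched as follows. This is a simplified illustration of the textbook FP-Tree construction, not the implementation of the disclosure: it counts item frequencies and then inserts each transaction's frequent items in descending frequency order so that shared prefixes collapse into shared paths. The names `FPNode`, `build_fp_tree` and the `min_count` cut-off are assumptions made for this example, and the header table and node links used during mining are omitted.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, its count, and child nodes by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Count how many transactions contain each item.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in freq.items() if n >= min_count}

    root = FPNode(None)
    # Insert each transaction's frequent items, most frequent first
    # (ties broken by item name so the ordering is deterministic).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical transactions, invented for illustration only.
tree, counts = build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]],
                             min_count=2)
```

In this toy input every transaction contains `b`, so all four transactions share the single branch that starts at `b`.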
[11] The executing entity of the present disclosure can be a data processing system for association rule mining. After obtaining the initial mining results, the association rules in the initial mining results can be matched with each case in the target transaction database in sequence, and then redundancy rules can be determined and eliminated according to the matching results, thus obtaining more accurate target mining results.
[12] 102. An association rule in the initial mining result is matched with each case in the target transaction database in sequence.
[13] In this embodiment, in a specific application scenario, each association rule in the initial mining result can be matched with each case in the target transaction database one by one, so that each association rule is assigned to the specific cases it matches. The target association rule with the most information coverage can then be screened out for each case from the association rules matching that case, thus determining and eliminating redundancy rules with less information coverage.
[14] 103. According to the matching result of each case, a redundancy rule in the initial mining result is eliminated to obtain a target mining result.
[15] In this embodiment, the association rules that match the same case are obtained and their rule information amounts are compared, and the association rule with the largest rule information amount can be directly determined as the best target association rule for that case. Because the other matching association rules cover a smaller amount of information, the association rules other than the target association rule can be directly determined as redundancy rules with low reliability and eliminated, so that the target mining results with great reference significance can be obtained.
[16] The redundancy rule screening method for association rule mining in this embodiment can first perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result of association rules, then match an association rule in the initial mining result with each case in the target transaction database in sequence; and according to the matching result of each case, eliminate a redundancy rule in the initial mining result to obtain a target mining result. According to the technical scheme in the present disclosure, the best association rules with the largest amount of information can be screened out for each case, and the association rules other than the best association rules can be eliminated as redundancy rules. Furthermore, the final mining results can be obtained by fusing the best association rules corresponding to all cases and performing deduplication processing. The screening error caused by subjective factors when screening redundancy rules by setting parameter thresholds is avoided. Through the technical scheme in the present disclosure, the accuracy of screening redundancy rules can be effectively enhanced, and then the reliability of mining association rules corresponding to each case can be guaranteed.
[17] Furthermore, as a refinement and extension of the specific implementation of the above embodiment, in order to fully explain the specific implementation process in this embodiment, another redundancy rule screening method for association rule mining is provided. As shown in FIG. 2, the method comprises the following steps.
[18] 201. Association rule mining is performed on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
[19] For this embodiment, in a specific application scenario, step 201 of the embodiment may specifically comprise determining a target transaction database, and scanning the target transaction database based on the frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
[20] The target transaction database is the database on which the mining task is to be performed. The target transaction database can contain multiple cases, each case can correspond to multiple identical or different transaction items, and the transaction item types differ according to the type of the target transaction database. For example, for the target transaction database of a construction land type, the transaction items corresponding to a certain case A can comprise: the distance from the urban center ic, the distance from the district and county center ix, the distance from the main road ir1, the distance from the secondary road ir2, the distance from the railway it, the distance from the expressway iro, the distance from the river ir, the distance from the lake il, the distance from the slope ip, the construction land intensity ib, the undeveloped land intensity iu, and the expansion intensity ie, etc. For the target transaction database of academic archives, the transaction items corresponding to a case B can comprise: name, age, nationality, native place, political status, educational background, study abroad experience, award-winning experience, academic paper publishing experience, etc.
[21] 202. Each case in the target transaction database is traversed to obtain a first transaction item set of each case.
[22] The target transaction database contains a preset number of case records related to the mining task, and each case record can contain multiple transaction items (the number of transaction items is not limited and may range from several to thousands). For example, the target transaction database of academic archives can contain case record A: XX, male, 40 years old, Han nationality, native of Liaoning, doctor, member of the communist party, 1 year of study abroad experience, and 10 published articles; it can further comprise case record B: XX, female, 38 years old, Han nationality, native of Liaoning, doctor, member of the masses, 10 published articles, 5 published patents, hosting over 3 provincial and ministerial projects, etc. Different cases can have the same transaction items (such as gender, age, etc.) or different transaction items (such as whether to publish articles, whether to host projects, whether to study abroad, etc.).
[23] For this embodiment, the transaction item set of each case can be extracted first, so as to verify the information amount contained in each association rule in the association mining result. The more of a case's transaction items an association rule contains, the larger the information amount of the association rule and the more representative the rule is. Therefore, for this embodiment, it is necessary to traverse each case record to obtain the first transaction item set corresponding to each case. For example, based on the example of step 201 of the embodiment, the first transaction item set corresponding to case A can be extracted from the target transaction database of the construction land type: set_a = {ic, ix, ir1, ir2, it, iro, ir, il, ip, ib, iu, ie}.
[24] 203. Each first association rule in the initial mining result is traversed to obtain a second transaction item set of each first association rule.
[25] In the specific application scenario, the initial mining results are comprehensively extracted according to each case record, and the first association rule can be either a valuable association rule or a redundancy rule with less information. In order to further screen out redundancy rules, it is necessary to determine the corresponding matching target cases of each first association rule, and screen out redundancy rules according to the number of the matching association rules and the amount of information corresponding to the target cases. In this embodiment, in order to determine the target case matching with each first association rule, it is necessary to traverse each first association rule in the initial mining result and determine the second transaction item set corresponding to each first association rule, so as to establish the matching relationship between each first association rule and each case by comparing and analyzing the second transaction item set with the first transaction item set.
[26] 204. The matching result of each case is determined according to the first transaction item set and the second transaction item set.
[27] In a specific application scenario, when matching association rules with cases, the second transaction item set corresponding to an association rule can be compared with the first transaction item set corresponding to a case to determine whether the second transaction item set is completely contained in the first transaction item set. That is, when all transaction items in the second transaction item set are contained in the first transaction item set, it can be determined that the first association rule corresponding to the second transaction item set matches the case corresponding to the first transaction item set, and the first association rule can then be associated with the case. If the first association rule does not match the case, the first association rule is not associated with the case, and matching continues until the first association rule has been compared with all cases. Accordingly, for this embodiment, step 204 may specifically comprise: extracting, from the second transaction item sets, a target transaction item set completely contained in the first transaction item set; and determining, according to the second association rule corresponding to the target transaction item set, the matching relation list of the case corresponding to the first transaction item set. The matching relation list can contain multiple matching second association rules, that is, multiple association rules can be extracted from the same case. When the number of association rules is greater than 1, redundancy rules can be further determined and eliminated.
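The containment test of step 204 can be sketched with set operations. This is an illustrative sketch only: the function name `build_match_lists`, the identifiers, and the choice to represent each rule by a single frozenset of its transaction items are assumptions, and the case and rule contents are invented.

```python
def build_match_lists(cases, rules):
    """Map each case id to the ids of the rules it fully contains.

    `cases` maps a case id to its first transaction item set;
    `rules` maps a rule id to its second transaction item set.
    """
    match_lists = {case_id: [] for case_id in cases}
    for rule_id, rule_items in rules.items():
        for case_id, case_items in cases.items():
            # A rule matches only if every one of its items is in the case.
            if rule_items <= case_items:
                match_lists[case_id].append(rule_id)
    return match_lists


# Hypothetical cases and rules, invented for illustration only.
cases = {
    "A": frozenset({"ic", "ix", "it", "ib"}),
    "B": frozenset({"ic", "ib", "iu"}),
}
rules = {
    "r1": frozenset({"ic", "ib"}),
    "r2": frozenset({"ic", "ix", "it"}),
    "r3": frozenset({"iu", "ie"}),
}
matches = build_match_lists(cases, rules)
print(matches)
```

Here `r3` matches neither case because each case lacks at least one of its items, so it never enters a matching relation list.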
[28] 205. A redundancy rule is determined according to the rule information amount of the second association rule in the matching relation list corresponding to each case.
[29] For this embodiment, in a specific application scenario, after all the first association rules in the initial mining result are matched with all the cases in the target transaction database, the matching relation list of each case can be obtained. In this embodiment, the redundancy rules in each case are determined on the principle that only the association rule with the largest amount of information is retained for each case. First, it is necessary to determine the rule information amount of each association rule according to the number of transaction items in the transaction item set corresponding to the association rule. When it is determined that the matching relation list corresponding to a case contains only one association rule, or contains a plurality of association rules with the same rule information amount, these association rules can be directly determined as the best target association rules corresponding to the case, and it is then determined that there is no redundancy rule in the case. When it is determined that the matching relation list corresponding to a case contains a plurality of association rules with different rule information amounts, the association rule with the largest rule information amount can be determined as the best target association rule corresponding to the case by comparing the rule information amounts, and all association rules except the target association rule can be determined as redundancy rules to be eliminated. Accordingly, step 205 of the embodiment may specifically comprise: if it is determined that there are at least two matched second association rules in the case according to the matching relation list, extracting the rule information amount of each second association rule; determining the second association rule with the largest rule information amount as the target association rule of the case; and determining a second association rule other than the target association rule as a redundancy rule.
[30] The rule information amount of the second association rule can be determined according to the number of transaction items in the second transaction item set corresponding to the second association rule. Since a smaller number of transaction items in an association rule means a smaller amount of information, in order to determine the amount of information contained in each second association rule, in this embodiment, it is necessary to traverse each second association rule in the target case, record the number of transaction items contained in the transaction item set corresponding to the second association rule, and take the number of transaction items as the rule information amount. In the specific application scenario, because the number of transaction items extracted by association rules is bounded, the rule information amount also has a corresponding numerical interval. For example, if the maximum number of transaction items allowed to be extracted by association rules is 12 and the minimum number is 1, the rule information amount should be greater than or equal to 1 and less than or equal to 12. The closer the rule information amount is to the maximum value, the greater the information amount of the association rule. Conversely, the smaller the information amount of an association rule, the more likely it is to be a redundancy rule.
[31] In this embodiment, when there are multiple matching second association rules in a target case, only the second association rule with the largest rule information amount is retained as the final target association rule of the target case by comparing the rule information amounts of the second association rules, and the second association rules other than the target association rule can be eliminated as redundancy rules. For example, for the target case A, it is determined that the matching second association rules comprise a, b, c and d, and the rule information amounts corresponding to the second association rules are determined as 9, 8, 9 and 3, respectively. Since the rule information amounts corresponding to the second association rule a and the second association rule c are the largest, the second association rule a and the second association rule c can be determined as the target association rules corresponding to the target case A. Since the second association rule b and the second association rule d contain a relatively small amount of rule information, the second association rule b and the second association rule d can be determined as redundancy rules and eliminated.
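The per-case selection of step 205 can be sketched as below, taking the rule information amount to be the number of items in the rule's transaction item set and keeping every rule that ties for the largest amount. The function name `split_targets_and_redundant` and the rules `p`, `q` and `s` are hypothetical and serve only as an illustration.

```python
def split_targets_and_redundant(match_list, rule_items):
    """Split one case's matching rules into (target rules, redundancy rules)."""
    if not match_list:
        return [], []
    # Rule information amount = number of transaction items in the rule.
    amounts = {r: len(rule_items[r]) for r in match_list}
    best = max(amounts.values())
    targets = [r for r in match_list if amounts[r] == best]
    redundant = [r for r in match_list if amounts[r] < best]
    return targets, redundant


# Hypothetical rules matched to one case, invented for illustration only.
rule_items = {
    "p": frozenset({"ic", "ix", "it"}),
    "q": frozenset({"ic", "ib", "iu"}),
    "s": frozenset({"ic"}),
}
targets, redundant = split_targets_and_redundant(["p", "q", "s"], rule_items)
print(targets, redundant)
```

A case with a single matching rule, or with rules that all have the same amount, yields an empty redundancy list, which corresponds to the situation in which the case has no redundancy rule.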
[32] 206. The redundancy rule is eliminated from the initial mining result to obtain a target mining result.
[33] In this embodiment, after determining the redundancy rules in each case, the redundancy rules can be further eliminated from the initial mining results. After eliminating the redundancy rules corresponding to each case, each case retains only the target association rule with the largest information amount among its matching association rules. However, because different target cases can contain the same transaction items, when there are enough transaction items, the target association rules extracted from different target cases may be the same. Therefore, in order to keep the finally presented association rule mining results concise, it is necessary to de-duplicate the target association rules. Accordingly, for this embodiment, in a specific application scenario, step 206 of this embodiment may specifically comprise traversing the target association rules, merging identical association rules, and constructing a target mining result according to the merged target association rules.
[34] With the above technical scheme, compared with the current redundancy rule screening method, the present disclosure can first perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result of association rules, then match an association rule in the initial mining result with each case in the target transaction database in sequence; and according to the matching result of each case, eliminate a redundancy rule in the initial mining result to obtain a target mining result. According to the technical scheme in the present disclosure, the best association rules with the largest amount of information can be screened out for each case, and the association rules other than the best association rules can be eliminated as redundancy rules. Furthermore, the final mining results can be obtained by fusing the best association rules corresponding to all cases and performing deduplication processing. The screening error caused by subjective factors when screening redundancy rules by setting parameter thresholds is avoided. Through the technical scheme in the present disclosure, the accuracy of screening redundancy rules can be effectively enhanced, and then the reliability of mining association rules corresponding to each case can be guaranteed.
[35] For this embodiment, in a specific application scenario, when screening out redundancy rules in the association rule mining process, refer to the example schematic diagram of the redundancy rule screening process for association rule mining shown in FIG. 3. After determining the target transaction database A, the FP-Tree method can be used to mine the association rules for the target transaction database A to obtain the initial mining result B. Then, the rules in the initial mining result B are matched to each case in the target transaction database A one by one, and the target transaction database A after matching processing is denoted as A_match. Each case traversed in the database A_match is denoted as a_match, and the rules b_match matched with a_match are traversed. The number of items contained in the transaction item set set_b of b_match is denoted as the rule information amount i of b_match. The rule b_match with the maximum rule information amount i is taken as the final corresponding rule of the case a_match, and every rule b_match with a smaller rule information amount i is eliminated as a redundancy rule. The rule b_match retained by each case a_match in the database A_match is exported to the matching rule result set C. The matching rule result set C is traversed, and identical rules are merged to obtain the final target mining result D. The merging and deduplication processing can specifically comprise: s1, creating a target mining result set D; s2, denoting the rule traversed in the matching rule result set C as c; s3, denoting the rule traversed in the target mining result set D as d, and creating a flag with the value False; s4, checking whether the item set set_c of rule c exactly matches the item set set_d of rule d; if so, setting flag to True; otherwise, repeating step s3 until the traversal of the result set D is completed; s5, checking the value of flag: if it is False, adding c to the result set D; otherwise, discarding c; then returning to step s2 until the traversal of the result set C is completed; s6, outputting the result set D, that is, the final target mining result.
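The merging steps s1 to s6 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes that a rule is represented only by its item set (a `frozenset` of item names), that `flag` is reset once for each rule c before D is traversed, and the function name `merge_identical_rules` is hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a rule is represented only by its item set.
Rule = FrozenSet[str]


def merge_identical_rules(result_set_c: List[Rule]) -> List[Rule]:
    """Copy each rule from C into D unless D already holds an equal item set."""
    result_set_d: List[Rule] = []      # s1: create the target mining result set D
    for c in result_set_c:             # s2: traverse the matching rule result set C
        flag = False                   # s3: flag starts as False for each rule c
        for d in result_set_d:         # s3: traverse the result set D
            if c == d:                 # s4: set_c exactly matches set_d
                flag = True
                break
        if not flag:                   # s5: keep c only if no equal rule is in D
            result_set_d.append(c)
    return result_set_d                # s6: D is the final target mining result


if __name__ == "__main__":
    c_set = [frozenset({"a", "b"}), frozenset({"b", "a"}), frozenset({"c"})]
    print(merge_identical_rules(c_set))
```

The nested traversal mirrors the wording of steps s3 and s4; because item sets are hashable, a set-based lookup would give the same result with less work on large result sets.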
[36] Furthermore, as a specific implementation of the method shown in FIG. 1 and FIG. 2, an embodiment of the present disclosure provides a redundancy rule screening device for association rule mining. As shown in FIG. 4, the device comprises: a mining module 31, a matching module 32 and an eliminating module 33.
[37] The mining module 31 is configured to perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
[38] The matching module 32 is configured to match an association rule in the initial mining result with each case in the target transaction database in sequence.
[39] The eliminating module 33 is configured to eliminate, according to the matching result of each case, a redundancy rule in the initial mining result to obtain a target mining result.
[40] In a specific application scenario, in order to preliminarily determine the initial mining results, the mining module 31 can be specifically configured to determine a target transaction database, and scan the target transaction database based on the frequent pattern tree FP-Tree algorithm to obtain an initial mining result.
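To make the input of the later matching step concrete, the sketch below produces frequent item sets from a small transaction database. It is explicitly not an FP-Tree implementation: it is a brute-force stand-in that counts every item combination, which is only practical for tiny examples. The function name `frequent_item_sets` and the `min_count` threshold are assumptions introduced for illustration, and the item sets stand in for the rules of the initial mining result.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_item_sets(database: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Brute-force stand-in for FP-Tree mining: count every item combination."""
    counts: Dict[FrozenSet[str], int] = {}
    for case in database:
        items = sorted(case)
        # Enumerate every non-empty subset of the case (exponential in case size).
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    # Keep only the item sets that occur in at least min_count cases.
    return {item_set: n for item_set, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    database_a = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
    for item_set, n in frequent_item_sets(database_a, min_count=2).items():
        print(sorted(item_set), n)
```

An FP-Tree based miner would return the same frequent item sets without enumerating all subsets, which is why the disclosure relies on it for real databases.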
[41] Accordingly, as shown in FIG. 5, the matching module 32 may specifically comprise a traversing unit 321 and a determining unit 322.
[42] The traversing unit 321 is configured to traverse each case in the target transaction database to obtain a first transaction item set of each case.
[43] The traversing unit 321 is further configured to traverse each first association rule in the initial mining result to obtain a second transaction item set of each first association rule.
[44] The determining unit 322 is configured to determine the matching result of each case according to the first transaction item set and the second transaction item set.
[45] In a specific application scenario, the determining unit 322 is specifically configured to extract, from the second transaction item sets, a target transaction item set completely contained in the first transaction item set; and to determine, according to a second association rule corresponding to the target transaction item set, a matching relation list of the case corresponding to the first transaction item set.
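The containment test described here can be sketched as a subset check. This is a minimal sketch under stated assumptions: cases and rules are both reduced to `frozenset` item sets, cases are identified by their position in the list, and the function name `build_matching_lists` is hypothetical.

```python
from typing import Dict, FrozenSet, List

ItemSet = FrozenSet[str]


def build_matching_lists(cases: List[ItemSet], rules: List[ItemSet]) -> Dict[int, List[ItemSet]]:
    """Map each case index to the rules whose item set the case fully contains."""
    matching: Dict[int, List[ItemSet]] = {}
    for case_index, first_item_set in enumerate(cases):
        # A rule matches when its (second) item set is a subset of the case's
        # (first) item set.
        matching[case_index] = [
            second_item_set for second_item_set in rules
            if second_item_set <= first_item_set
        ]
    return matching


if __name__ == "__main__":
    cases = [frozenset({"a", "b", "c"}), frozenset({"a", "b"})]
    rules = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]
    print(build_matching_lists(cases, rules))
```

In this example the second case does not contain item c, so the rule with item set {b, c} is left out of its matching relation list.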
[46] Accordingly, in order to eliminate redundancy rules in association rule mining results, as shown in FIG. 5, the eliminating module 33 may specifically comprise a determining unit 331 and an eliminating unit 332.
[47] The determining unit 331 is configured to determine a redundancy rule according to the rule information amount of the second association rule in the matching relation list corresponding to each case.
[48] The eliminating unit 332 is configured to eliminate the redundancy rule from the initial mining result to obtain a target mining result.
[49] In a specific application scenario, the determining unit 331 is specifically configured to, if it is determined according to the matching relation list that at least two second association rules match the case, extract the rule information amount of each second association rule; determine the second association rule with the largest rule information amount as the target association rule of the case; and determine every second association rule other than the target association rule as a redundancy rule.
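The per-case selection can be sketched as follows, taking the rule information amount to be the number of items in the rule's item set, as in the FIG. 3 example. The function name `split_target_and_redundant` is hypothetical, and the handling of ties (the first rule with the largest amount wins) is an assumption, since the text does not say how ties are resolved.

```python
from typing import FrozenSet, List, Optional, Tuple

Rule = FrozenSet[str]


def split_target_and_redundant(matched: List[Rule]) -> Tuple[Optional[Rule], List[Rule]]:
    """Return the rule with the most items and the remaining rules for one case."""
    if not matched:
        return None, []
    # Rule information amount = number of items in the rule's item set.
    # Assumption: on ties, the first rule with the largest amount is kept.
    best = max(range(len(matched)), key=lambda i: len(matched[i]))
    redundant = [rule for i, rule in enumerate(matched) if i != best]
    return matched[best], redundant


if __name__ == "__main__":
    matched_rules = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
    target, redundant = split_target_and_redundant(matched_rules)
    print(sorted(target), [sorted(r) for r in redundant])
```

A case with a single matched rule simply keeps that rule, and the redundant list is empty.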
[50] Accordingly, the eliminating unit 332 can be specifically configured to eliminate the redundancy rule from the initial mining result to determine the target mining result constructed by the target association rule corresponding to each case.
[51] In a specific application scenario, in order to determine the final association rule mining result according to the target association rules corresponding to each target case, the eliminating unit 332 can be specifically configured to traverse the target association rules, merge identical association rules, and construct a target mining result according to the merged target association rules.
[52] It should be noted that other corresponding descriptions of the functional units involved in the redundancy rule screening device for association rule mining provided in this embodiment can refer to the corresponding descriptions in FIG. 1 to FIG. 2, which will not be described in detail here.
[53] On the basis of the above methods shown in FIG. 1 to FIG. 2, correspondingly, this embodiment further provides a nonvolatile storage medium on which a computer program is stored, wherein when being executed by a processor, the program implements the redundancy rule screening method for association rule mining as shown in FIG. 1 to FIG. 2.
[54] Based on this understanding, the technical scheme of the present disclosure can be embodied in the form of a software product, which can be stored in a nonvolatile storage medium (which can be CD-ROM, a USB flash disk, a mobile hard disk, etc.) and comprises several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute the method of each implementation scenario of the present disclosure.
[55] Based on the above embodiments of the method shown in FIG. 1 to FIG. 2 and the virtual device shown in FIG. 4 and FIG. 5, in order to achieve the above purpose, this embodiment further provides a computer device, which comprises a nonvolatile storage medium and a processor. The nonvolatile storage medium is used to store a computer program; the processor is used to execute the computer program to implement the redundancy rule screening method for association rule mining as shown in FIG. 1 to FIG. 2.
[56] Optionally, the computer device may further comprise a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, etc. The user interface may comprise a display, an input unit such as a keyboard, etc. The optional user interface may further comprise a USB interface, a card reader interface, etc. The network interface may optionally comprise a standard wired interface, a wireless interface (such as a WI-FI interface), and the like.
[57] Those skilled in the art can understand that the structure of the computer device provided by this embodiment does not constitute a limitation on the physical device, which may comprise more or fewer components, combine some components, or arrange the components differently.
[58] The nonvolatile storage medium may further comprise an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above computer device, and supports the operation of information processing programs and other software and/or programs. The network communication module is used to realize the communication among the components in the nonvolatile storage medium and the communication with other hardware and software in the information processing entity device.
[59] Through the description of the above embodiments, those skilled in the art can clearly understand that the present disclosure can be realized by means of software plus a necessary general-purpose hardware platform, and can also be realized by hardware.
[60] By applying the technical scheme of the present disclosure, compared with the prior art, the present disclosure can first perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result of association rules, then match an association rule in the initial mining result with each case in the target transaction database in sequence; and according to the matching result of each case, eliminate a redundancy rule in the initial mining result to obtain a target mining result. According to the technical scheme in the present disclosure, the best association rules with the largest amount of information can be screened out for each case, and the association rules other than the best association rules can be eliminated as redundancy rules. Furthermore, the final mining results can be obtained by fusing the best association rules corresponding to all cases and performing deduplication processing. The screening error caused by subjective factors when screening redundancy rules by setting parameter thresholds is avoided. Through the technical scheme in the present disclosure, the accuracy of screening redundancy rules can be effectively enhanced, and then the reliability of mining association rules corresponding to each case can be guaranteed.
[61] Those skilled in the art can understand that the drawing is only a schematic diagram of a preferred implementation scenario, and the modules or processes in the drawing are not necessary for the implementation of the present disclosure. Those skilled in the art can understand that the modules in the devices in the implementation scenario can be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or they can be located in one or more devices different from this implementation scenario with corresponding changes. The modules in the above implementation scenarios can be merged into one module or further split into multiple sub-modules.
[62] The above serial numbers are for description only and do not represent the relative merits of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present disclosure. However, the present disclosure is not limited thereto, and any changes conceivable to those skilled in the art should fall into the protection scope of the present disclosure.
[63] It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art, in Australia or any other country.
[64] In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
Claims (5)
1. A redundancy rule screening method for association rule mining, comprising: performing association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result; matching an association rule in the initial mining result with each case in the target transaction database in sequence; and according to the matching result of each case, eliminating a redundancy rule in the initial mining result to obtain a target mining result.
2. The method according to claim 1, wherein performing association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result specifically comprises: determining a target transaction database, and scanning the target transaction database based on the frequent pattern tree FP-Tree algorithm to obtain an initial mining result; wherein matching an association rule in the initial mining result with each case in the target transaction database in sequence specifically comprises: traversing each case in the target transaction database to obtain a first transaction item set of each case; traversing each first association rule in the initial mining result to obtain a second transaction item set of each first association rule; determining the matching result of each case according to the first transaction item set and the second transaction item set; wherein determining the matching result of each case according to the first transaction item set and the second transaction item set specifically comprises: extracting a target transaction item set completely contained in the first transaction item set from the second transaction item set; according to a second association rule corresponding to the target transaction item set, determining a matching relation list of cases corresponding to the first transaction item set; wherein according to the matching result of each case, eliminating a redundancy rule in the initial mining result to obtain a target mining result specifically comprises: determining a redundancy rule according to the rule information amount of the second association rule in the matching relation list corresponding to each case; eliminating the redundancy rule from the initial mining result to obtain a target mining result; wherein determining a redundancy rule according to the rule information amount of the second association rule in the matching relation list corresponding to each case 
specifically comprises: if it is determined that there are at least two matched second association rules in the case according to the matching relation list, extracting the rule information amount of each second association rule; determining the second association rule with the largest rule information amount as the target association rule of the case;
determining a second association rule other than the target association rule as a redundancy rule; eliminating the redundancy rule from the initial mining result to obtain the target mining result specifically comprises: eliminating the redundancy rule from the initial mining result to determine the target mining result constructed by the target association rule corresponding to each case; wherein eliminating the redundancy rule from the initial mining result to determine the target mining result constructed by the target association rule corresponding to each case specifically comprises: traversing the target association rule, merging the same association rule, and constructing a target mining result according to the merged target association rules.
3. A redundancy rule screening device for association rule mining, comprising: a mining module, which is configured to perform association rule mining on a target transaction database based on a frequent pattern tree FP-Tree algorithm to obtain an initial mining result; a matching module, which is configured to match an association rule in the initial mining result with each case in the target transaction database in sequence; and an eliminating module, which is configured to eliminate, according to the matching result of each case, a redundancy rule in the initial mining result to obtain a target mining result.
4. A non-volatile readable storage medium on which a computer program is stored, wherein when being executed by a processor, the program implements the redundancy rule screening method for association rule mining according to any one of claims 1 to 2.
5. A computer device comprising a nonvolatile readable storage medium, a processor, and a computer program stored on the nonvolatile readable storage medium and operable on the processor, wherein the processor implements the redundancy rule screening method for association rule mining according to any one of claims 1 to 2 when executing the program.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011399622.8 | 2020-12-04 | ||
CN202011399622.8A CN112434104B (en) | 2020-12-04 | 2020-12-04 | Redundant rule screening method and device for association rule mining |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021105123A4 (en) | 2021-10-07 |
Family
ID=74692551
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021105123A Ceased AU2021105123A4 (en) | 2020-12-04 | 2021-08-09 | Redundancy rule screening method for association rule mining and device thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112434104B (en) |
AU (1) | AU2021105123A4 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113407603B (en) * | 2021-05-13 | 2022-10-04 | 北京鼎轩科技有限责任公司 | Data export method and system |
CN115292388B (en) * | 2022-09-29 | 2023-01-24 | 广州天维信息技术股份有限公司 | Automatic scheme mining system based on historical data |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7337230B2 (en) * | 2002-08-06 | 2008-02-26 | International Business Machines Corporation | Method and system for eliminating redundant rules from a rule set |
CN101799810B (en) * | 2009-02-06 | 2012-09-26 | 中国移动通信集团公司 | Association rule mining method and system thereof |
CN102142992A (en) * | 2011-01-11 | 2011-08-03 | 浪潮通信信息系统有限公司 | Communication alarm frequent itemset mining engine and redundancy processing method |
CN104915683A (en) * | 2015-06-09 | 2015-09-16 | 西北工业大学 | Generalized non-redundancy sequence rule mining method based on progressive-increase projection rule |
KR20170088469A (en) * | 2016-01-22 | 2017-08-02 | 서울대학교산학협력단 | A stepwise method for mining association rules based on a Boolean expression for dynamic datasets |
CN106407999A (en) * | 2016-08-25 | 2017-02-15 | 北京物思创想科技有限公司 | Rule combined machine learning method and system |
US10489363B2 (en) * | 2016-10-19 | 2019-11-26 | Futurewei Technologies, Inc. | Distributed FP-growth with node table for large-scale association rule mining |
CN106570128A (en) * | 2016-11-03 | 2017-04-19 | 南京邮电大学 | Mining algorithm based on association rule analysis |
CN106874491A (en) * | 2017-02-22 | 2017-06-20 | 北京科技大学 | A kind of device fault information method for digging based on dynamic association rules |
CN109344150A (en) * | 2018-08-03 | 2019-02-15 | 昆明理工大学 | A kind of spatiotemporal data structure analysis method based on FP- tree |
CN110474929B (en) * | 2019-09-27 | 2021-06-22 | 新华三信息安全技术有限公司 | Redundancy rule detection method and device |
- 2020-12-04: CN application CN202011399622.8A, patent CN112434104B, status: Active
- 2021-08-09: AU application AU2021105123A, patent AU2021105123A4, status: Not active (Ceased)
Also Published As
Publication number | Publication date |
---|---|
CN112434104A (en) | 2021-03-02 |
CN112434104B (en) | 2023-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |