CN111309777A - Report data mining method for improving association rule based on mutual exclusion expression - Google Patents
Info
- Publication number
- CN111309777A (application number CN202010050602.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- item
- item set
- frequent
- sets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A report data mining method with improved association rules based on mutual exclusion expression relates to knowledge discovery and data mining in the field of data science, and solves the problems of large memory consumption and low efficiency when a traditional association rule algorithm processes massive data. The method comprises the following steps: first, convert the data into transaction data based on a data threshold range and obtain a binary sparse matrix with grouping labels based on the data logic; second, obtain the collection of all frequent 1-item sets and remove the non-frequent item sets to obtain a new grouping result; third, perform a self-join iterative search on the frequent item sets, prune the candidate item sets, and iterate until no new frequent item set can be generated, thereby obtaining the association rule mining result. The basic idea of the invention is to convert structured data into transaction data, generate groups based on the mutual exclusion relationship, and then perform rule mining, thereby reducing the computation memory and improving the computation efficiency. The method has wide application scenarios and high social and economic value.
Description
Technical Field
The invention relates to a knowledge discovery and data mining method in the field of data science, in particular to an improved association rule report data mining method based on mutual exclusion expression.
Background
With the explosive growth of data volume in the information age, people have found that enormous value hides behind massive data. For structured data, traditional or modern data mining methods can already obtain good results. However, in association rule mining the structured data report must first be converted into transaction data (a data set that only indicates whether an event occurs, represented by the Boolean values True and False); as the data scale keeps growing, data mining methods that can extract valuable information face huge challenges on massive data. At present, dividing a structured data report into transaction data based on thresholds greatly increases the data dimensionality, the memory and computing power of a computer cannot meet the computing requirement, the computing efficiency is low, and the time cost of data mining rises sharply. The method provided by this patent reasonably reduces, based on mutual exclusion expression, the high-dimensional data obtained after the structured data report is converted into transaction data, so that the computation cost of the traditional association rule algorithm is reduced and the efficiency of association rule data mining is improved.
Data mining and analysis with traditional association rule algorithms has already produced research results at home and abroad, but for structured data reports converted into transaction data by threshold partitioning, the partitioning generates many data items with mutual exclusion relationships; the traditional algorithms are then inefficient and easily run out of computation memory, and improvement methods for this situation are still at an exploratory stage. Association rules are rules that reflect the interdependency and association between one thing and another, and are used to mine correlations between valuable data items from a large amount of data. The main existing association rule mining methods can be divided into three categories: methods based on Apriori, methods based on FP-growth, and methods based on Eclat. The Apriori algorithm uses two properties of frequent item sets to filter out a large number of irrelevant sets, thereby improving computation efficiency. The FP-growth algorithm compresses the data records by constructing a tree structure, so the records only need to be scanned twice when mining frequent item sets and no candidate item sets need to be generated, which improves computation efficiency. The Eclat algorithm introduces an inverted-index idea to convert the original horizontal database structure (items per transaction) into a vertical database structure, in which the items in the data are used as keys and the transaction IDs corresponding to each item are used as values; this generates an inverted table, speeds up the generation of frequent item sets, and, with depth-first search as the strategy, computes the support by intersection counting, improving computation efficiency.
The existing association rule algorithms are mining methods designed for small-scale data; when the data to be processed is converted from structured data into transaction data with mutual exclusion relationships, low computation efficiency and insufficient memory often result. In addition, improving the existing Apriori algorithm can optimize the efficiency of mining frequent item sets with association rules and thus reduce the time cost of the algorithm. Such improved methods can help people analyze association rules on big data, improve computation efficiency, reduce time cost, and obtain association rule analysis results more quickly.
Disclosure of Invention
Because the scale of the data mined with association rules keeps growing, the traditional association rule analysis and mining methods struggle with the insufficient computation memory and excessive computation time caused by structured data reports converted into transaction data by threshold partitioning. The purpose of the invention is to solve the low computation efficiency caused by too many mutually exclusive data items when the existing association rule mining methods process a structured data report converted into transaction data by threshold partitioning, and to provide an improved association rule mining method based on mutual exclusion expression, which can improve the computation efficiency and mine the implicit association relations among data indexes.
The purpose of the invention is achieved by the following technical scheme: first, converting the data to be processed into transaction data based on the logical relationship among the data and the data threshold range, analyzing the data information to obtain the data scale and matrix density, and grouping the data based on mathematical logic; then, scanning the data set for the first time, calculating the support value of each item, obtaining all frequent 1-item sets, and grouping the frequent item sets according to the grouping result of the previous step; finally, performing an iterative search on each group of data to obtain the frequent item sets, pruning the candidate sets obtained by the search, and iterating until no new frequent item set can be generated, thereby obtaining the association rule mining result.
The flow chart of the invention is shown in figure 1, and the specific steps are as follows:
Step one: converting the data to be processed into transaction data based on the logical relationship among the data and the data threshold range, analyzing the data information to obtain the data scale and matrix density, and grouping the item sets with mutual exclusion relationships in the data based on mathematical logic to obtain a binary sparse matrix with grouping labels, wherein the specific steps are as follows:
1) converting the data set to be mined into transaction data based on thresholds on the data values and recording it as the data set D; let I_n = {r_{n1}, r_{n2}, ..., r_{nm}} be a collection of m different items, where each r_{nκ} is called an item and the collection I_n is called an item set; the number of elements is called the length of the item set, and an item set of length k is called a "k-item set"; the data set D, containing n item sets, can then be expressed as:
D = {I_1, I_2, ..., I_n}
after conversion into transaction data, the value of any item r_{nκ} exists in only two cases: r_{nκ} = 1 if the corresponding event occurs (True), and r_{nκ} = 0 if it does not (False);
2) by transforming the data set D, a binary sparse matrix U can be obtained, where U can be expressed as:
U = [r_{11} r_{12} ... r_{1m}; r_{21} r_{22} ... r_{2m}; ...; r_{n1} r_{n2} ... r_{nm}]
where the first subscript n of r_{nm} denotes the corresponding n-th item set I_n and the second subscript m denotes the m-th element of the item set I_n; the binary sparse matrix U contains only the Boolean values 1 and 0, corresponding respectively to True and False in the transaction data;
3) performing a logical analysis on the item sets I_j in the data set D; if there are item sets with a mutual exclusion relationship, dividing them into one group, denoted Q_t:
Q_t = {I_a, I_b, ..., I_n}
note that the number of groups Q_t is determined by the number of mutually exclusive item sets in the data set D, and each item set exists in only one group Q_t, i.e., Q_i ∩ Q_j = ∅ for i ≠ j;
4) further, a binary sparse matrix U with a grouping label is obtained, which can be represented by Q as:
U = [Q_1 Q_2 ... Q_t]
wherein, U is a binary sparse matrix, i.e. a matrix only containing 0 and 1, and is composed of t grouping matrices Q.
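As a minimal illustration of step one, the following R sketch discretizes each numeric index into low/medium/high by tertiles, expands it into three mutually exclusive Boolean items, and records each triple as one group Q_t. The function name, the tertile thresholds and the column layout are assumptions made for illustration and are not part of the patent text:

```r
# Sketch only: names and tertile thresholds are illustrative assumptions.
to_boolean_groups <- function(df) {
  items  <- list()   # Boolean columns of the binary sparse matrix U
  groups <- list()   # mutually exclusive groups Q_1, ..., Q_t
  for (col in names(df)) {
    q   <- quantile(df[[col]], probs = c(1/3, 2/3), na.rm = TRUE)
    lev <- cut(df[[col]], breaks = c(-Inf, q[1], q[2], Inf),
               labels = c("low", "medium", "high"))
    for (lab in levels(lev)) {
      items[[paste(col, lab, sep = "_")]] <- (lev == lab)   # TRUE / FALSE item
    }
    groups[[col]] <- paste(col, levels(lev), sep = "_")     # one group Q_t per original index
  }
  list(U = as.data.frame(items, check.names = FALSE), Q = groups)
}
# Example call on the numeric indexes only (assumed layout: first column holds the school name):
# prep <- to_boolean_groups(report[ , -1]); U <- prep$U; Q <- prep$Q
```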
Step two: scanning the data set for the first time, calculating the support, confidence and lift of each item, obtaining the collection of all frequent 1-item sets, and removing the non-frequent item sets from the groups obtained in step one based on the mutual exclusion relationship so as to obtain a new grouping result, wherein the specific steps are as follows:
1) traversing the data set D and calculating, for each item set I_n, the number of records in which r_{nκ} = 1; based on a given minimum support S_0, if S ≥ S_0, the item set I_n is a frequent item set;
2) respectively calculating the support (Support), confidence (Confidence) and lift (Lift) of the generated frequent item sets:
Support(I_a → I_b) = Support(I_a ∪ I_b) = P(I_a ∪ I_b)
Confidence(I_a → I_b) = P(I_b | I_a) = Support(I_a ∪ I_b) / Support(I_a)
Lift(I_a → I_b) = Confidence(I_a → I_b) / Support(I_b)
where (I_a → I_b) represents only the correlation between the item set I_a and the item set I_b and has no meaning of mathematical calculation; the support is the number of occurrences of the corresponding item set divided by the total number of records; Support(I_a → I_b) is the percentage of transactions in the database D that contain I_a ∪ I_b, denoted Support(I_a ∪ I_b), I_a, I_b ∈ I; the significance of the confidence is the ratio of the number of simultaneous occurrences of the item sets I_a, I_b to the number of occurrences of the item set I_a, i.e., the probability that I_b occurs under the condition that I_a occurs; the significance of the lift is to measure the independence of the item set {I_a} and the item set {I_b}, reflecting how much the occurrence of the item set {I_a} changes the probability of occurrence of the item set {I_b}; in general, if the value is 1, the two are not correlated; if the value is less than 1, the two are mutually repulsive; and when the lift is greater than 1, the greater the lift, the higher the value of the association rule;
3) after the traversal, based on the minimum support threshold S_0, obtaining the collection of frequent 1-item sets, i.e., L_1 = {I_n | Support(I_n) ≥ S_0};
4) for the remaining non-frequent item sets L_1′, removing them from the groups Q obtained in step one based on the mutual exclusion relationship, so as to obtain a new grouping Q′ that excludes the non-frequent item sets:
Q′ = Q − L_1′
Q′ is the grouping, based on the mutual exclusion relationship, obtained after the frequent item sets are found for the first time, and it is used for extracting the candidate set of "2-item sets".
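A minimal R sketch of this first scan is given below, continuing the U (Boolean data frame) and Q (list of mutually exclusive groups) assumed in the sketch above; S0 and the helper names are illustrative, not fixed by the patent:

```r
# Sketch only: U and Q come from the preprocessing sketch above.
first_scan <- function(U, Q, S0 = 0.1) {
  support <- colMeans(U)                              # support = occurrences / total records
  L1      <- names(support)[support >= S0]            # frequent 1-item sets
  Qp      <- lapply(Q, function(g) intersect(g, L1))  # Q' = Q - L1' (drop non-frequent items)
  Qp      <- Qp[lengths(Qp) > 0]
  list(L1 = L1, Q_prime = Qp, support = support)
}

# Support, confidence and lift of a single rule Ia -> Ib over U:
rule_metrics <- function(U, a, b) {
  sAB <- mean(U[[a]] & U[[b]])                        # Support(Ia -> Ib) = P(Ia ∪ Ib)
  sA  <- mean(U[[a]])
  sB  <- mean(U[[b]])
  c(support = sAB, confidence = sAB / sA, lift = sAB / (sA * sB))
}
```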
Step three: obtaining the association rule mining result: extracting item sets from Q′ to generate the candidate set C_2 of "2-item sets", pruning it to obtain the frequent item set L_2, then performing a self-join iterative search on the frequent item sets starting from L_2, pruning the candidate sets obtained by the search, and iterating until no new frequent item set can be generated; the confidence and lift of all frequent item sets are then calculated to obtain the data index item sets with implicit association relations. The specific steps are as follows:
1) by permutation and combination, extracting two item sets in turn from Q′ to generate the candidate set C_2 of "2-item sets";
2) to obtain L_2, pruning the candidate item set C_2, i.e., computing the support of each candidate in C_2 and retaining those meeting the minimum support, thereby obtaining the frequent item set L_2;
3) to obtain L_κ (κ ≥ 3), joining L_{κ-1} with L_{κ-1} to produce the candidate set C_κ of "κ-item sets", written as:
L_κ = {l_1, l_2, ..., l_n}, l_i, l_j (1 ≤ i, j ≤ n), l_i, l_j ∈ L_κ
where l_i = {l_i(1), l_i(2), ..., l_i(m)}, and l_i(κ) (1 ≤ κ ≤ m) ∈ l_i is the κ-th item of l_i; the join L_{κ-1} ∞ L_{κ-1} is performed, where the symbol "∞" indicates that the two item sets are self-joined, i.e., different items are extracted from each item set in turn to generate a new item set; L_{κ-1} is joinable if their first (κ-2) items are the same, and joining l_1 and l_2 yields the item set {l_1(1), l_1(2), ..., l_1(κ-1), l_2(κ-1)};
4) C_κ is a superset of L_κ; in the same way, the support of each candidate item set in C_κ is calculated to obtain the frequent item set L_κ; if any (κ-1)-item subset of a candidate is not in L_{κ-1}, the new candidate cannot be a frequent item set either, so that item can be removed from C_κ;
5) the iteration is repeated to obtain the frequent item sets L_κ until L_κ = ∅ (no new frequent item set can be generated); the iteration then ends, the confidence and lift results of the frequent item sets from 2 to κ are obtained, and the data index item sets with implicit association relations are obtained.
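Under the same assumptions (U and the pruned grouping Q′ from the sketches above), the iterative search of step three can be sketched in R as follows; the helpers are illustrative, the self-join is the plain lexicographic join described above, and candidates whose items fall in one mutually exclusive group are skipped because such items can never co-occur:

```r
# Sketch only: item sets are sorted character vectors of column names of U;
# every item is assumed to belong to exactly one mutually exclusive group.
group_of <- function(item, Q) {
  for (g in names(Q)) if (item %in% Q[[g]]) return(g)
  NA_character_
}

support_of <- function(U, itemset) mean(Reduce(`&`, U[itemset]))

gen_candidates <- function(Lk, Q) {              # self-join of the current frequent item sets
  out <- list()
  n <- length(Lk)
  if (n < 2) return(out)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    a <- Lk[[i]]; b <- Lk[[j]]; k <- length(a)
    if (identical(a[-k], b[-k])) {               # join only if all but the last item agree
      cand <- sort(union(a, b))
      grps <- sapply(cand, group_of, Q = Q)
      if (!anyDuplicated(grps)) out[[length(out) + 1]] <- cand  # skip mutually exclusive combinations
    }
  }
  unique(out)
}

mine_frequent <- function(U, Q, S0 = 0.1) {
  L <- as.list(names(U)[colMeans(U) >= S0])      # L1 as single-item sets
  all_frequent <- L
  repeat {
    C <- gen_candidates(L, Q)
    L <- Filter(function(s) support_of(U, s) >= S0, C)   # keep candidates meeting S0
    if (length(L) == 0) break                    # L_k empty: no new frequent item set
    all_frequent <- c(all_frequent, L)
  }
  all_frequent
}
```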
Compared with the prior art, the invention has the following advantages:
the invention adopts an association rule mining method, and converts numerical data into transaction data by dividing a data set based on a threshold value, thereby facilitating the development of association rule analysis and mining work and expanding the data range which can be analyzed by the association rule; the characteristic of the structured data with the mutually exclusive items is used for grouping the mutually exclusive item sets in the data preprocessing process, so that the data storage structure in the database is optimized, and the traversal efficiency is improved.
Meanwhile, the mining of association rules in structured data with a large number of mutual exclusion relationships is improved on the basis of the Apriori algorithm. In the first traversal of the frequent item sets, the method groups the data by mutually exclusive item sets, which reduces the computation needed to delete non-frequent item sets and effectively reduces the computation storage space; in the repeated iterations that search for frequent item sets and eliminate non-frequent item sets, the grouping by mutually exclusive item sets greatly reduces the generation of non-frequent item sets, so the tedious step of deleting non-frequent item sets is avoided, the computation efficiency of the algorithm is effectively improved, and a large amount of time cost is saved.
At present, some existing association rule mining methods are only suitable for analyzing and mining a small amount of transaction data; for massive transaction data, their low computing efficiency requires a large amount of time. For massive structured data with mutually exclusive item sets, the present method performs association rule mining and analysis with an improved algorithm that iteratively searches within groups of mutually exclusive item sets, thereby improving iterative computation efficiency, optimizing the memory space required for computation, greatly reducing the time cost, and quickly and efficiently meeting big data researchers' needs for association rule mining and analysis.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is an exemplary diagram of essential information of an embodiment data set.
FIG. 3 is an example diagram of the top 20 items by support.
FIG. 4 is an example scatter plot of support, confidence and lift.
FIG. 5 is an example parallel coordinates (paracoord) diagram of the association rules.
Detailed Description
The following describes the specific implementation of the invention with reference to college teaching quality data:
the college teaching quality data are structured data, and multi-dimensional data of college teaching quality are included. The implicit relevance exists among the multidimensional indexes such as the number scale of students, the quality of teacher teams, school handling conditions, school fund prizes and the like, and the association rule mining and analysis are needed to be carried out on the teaching quality data.
Execute step one: preprocess the structured data to be processed, shown in Table 1, by uniformly converting the data into transaction data in Boolean form and grouping the data based on the mutual exclusion relationships among the data indexes, as shown in Table 2, to obtain a binary sparse matrix with grouping labels.
For each index, an appropriate threshold is selected for the original structured data in Table 1 and the data are converted into the corresponding transaction data. The original data include 25 core indexes for 997 colleges, and each core index is divided into three levels (low, medium and high) according to thresholds, so the original structured data are expanded into a 997 × 75 binary sparse matrix. After conversion into the transaction data of Table 2, the basic information of the data set is checked by calling the summary function in the R language; the matrix density is 0.33, as shown in FIG. 2. The item support frequency plot can be viewed with the itemFrequencyPlot function in R, and the top-20 items by support are shown in FIG. 3.
Table 1 example table of data to be processed
School | Regular undergraduate students | Equivalent enrolled students | Full-time teachers | Physical fitness pass rate (%) |
Anhui University of Science and Technology | 24242 | 36520 | 1853 | 94.15 |
ZHEJIANG INTERNATIONAL STUDIES University | 24430 | 28531 | 1635 | 91.56 |
Heilongjiang foreign language | 7474 | 7988 | 752 | 69.90 |
Zhejiang Ocean University | 9576 | 10235 | 564 | 76.65 |
QILU NORMAL University | 11144 | 13564 | 1304 | 92.52 |
QINGDAO HUANGHAI University | 7400 | 7954 | 856 | 87.49 |
QILU INSTITUTE OF TECHNOLOGY | 5422 | 6523 | 785 | 88.48 |
JOURNAL OF JIANGXI PUBLIC SECURITY College | 29850 | 31256 | 2365 | 100.00 |
GUANGXI MEDICAL University | 8876 | 9520 | 981 | 65.22 |
GANNAN NORMAL University | 16778 | 18463 | 2028 | 99.90 |
BAOSHAN University | 22173 | 42979 | 1876 | 88.90 |
ZheJiang Chinese Medical University | 7047 | 8456 | 626 | 67.36 |
XINYU University | 4337 | 5236 | 666 | 98.20 |
Jilin University | 41344 | 91261 | 2869 | 77.63 |
CANGZHOU NORMAL University | 14533 | 15623 | 1562 | 82.56 |
NANCHANG INSTITUTE OF SCIENCE & TECHNOLOGY | 13340 | 15354 | 1111 | 98.25 |
GUANGDONG UNIVERSITY OF SCIENCE & TECHNOLOGY | 6283 | 7652 | 628 | 77.36 |
Table 2 transaction data example table
School | Equivalent enrolled students_low | Equivalent enrolled students_medium | Equivalent enrolled students_high | Regular undergraduate students_low |
Anhui University of Science and Technology | FALSE | FALSE | TRUE | FALSE |
ZHEJIANG INTERNATIONAL STUDIES University | FALSE | FALSE | TRUE | FALSE |
Heilongjiang foreign language | TRUE | FALSE | FALSE | TRUE |
Zhejiang Ocean University | FALSE | TRUE | FALSE | FALSE |
QILU NORMAL University | FALSE | TRUE | FALSE | FALSE |
QINGDAO HUANGHAI University | TRUE | FALSE | FALSE | TRUE |
QILU INSTITUTE OF TECHNOLOGY | TRUE | FALSE | FALSE | TRUE |
JOURNAL OF JIANGXI PUBLIC SECURITY College | FALSE | FALSE | TRUE | FALSE |
GUANGXI MEDICAL University | TRUE | FALSE | FALSE | TRUE |
GANNAN NORMAL University | FALSE | TRUE | FALSE | FALSE |
BAOSHAN University | FALSE | FALSE | TRUE | FALSE |
ZheJiang Chinese Medical University | TRUE | FALSE | FALSE | TRUE |
XINYU University | TRUE | FALSE | FALSE | TRUE |
Jilin University | FALSE | FALSE | TRUE | FALSE |
CANGZHOU NORMAL University | FALSE | TRUE | FALSE | FALSE |
NANCHANG INSTITUTE OF SCIENCE & TECHNOLOGY | FALSE | TRUE | FALSE | FALSE |
GUANGDONG UNIVERSITY OF SCIENCE & TECHNOLOGY | TRUE | FALSE | FALSE | TRUE |
The transaction data in Table 2, represented by the Boolean values TRUE and FALSE (i.e., 1 and 0), yield a binary sparse matrix. Because each index of the original structured data is expanded into three indexes (low, medium and high) that have an obvious mutual exclusion relationship, the low, medium and high indexes derived from each original index are placed in one group of the mutually exclusive item set Q. For example, equivalent enrolled students_low, equivalent enrolled students_medium and equivalent enrolled students_high have an obvious mutual exclusion relationship and are therefore placed in one group. Because there are 25 core indexes, the mutually exclusive item set Q contains 25 groups, each with 3 indexes. A binary sparse matrix with grouping labels is thus obtained, finishing the data preprocessing of step one.
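The checks described above (data-set summary and the top-20 item frequency plot) can be reproduced with the arules package in R; the following minimal sketch assumes the 997 × 75 Boolean table from step one is held in the data frame U, and the object names are illustrative:

```r
# Sketch only: U is assumed to be the 997 x 75 Boolean table from step one.
library(arules)
trans <- as(as.matrix(U), "transactions")   # Boolean matrix -> transaction data
summary(trans)                              # data scale and matrix density (about 0.33 here)
itemFrequencyPlot(trans, topN = 20)         # top-20 items by support (FIG. 3)
```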
Execute step two: scan the data set for the first time, calculate the support value of each item, obtain all frequent 1-item sets, and group the frequent item sets according to the grouping labels.
By scanning the obtained binary sparse matrix data set for the first time, the record count of each index can be obtained. Given the minimum support, the supports of the 75 indexes are calculated, and the indexes meeting the minimum support threshold form the frequent 1-item sets, denoted L_1. The support, confidence and lift of each item set can then be calculated. The support is the number of occurrences of the corresponding item set divided by the total number of records; Support(I_a → I_b) is the percentage of transactions in the database D that contain I_a ∪ I_b; the significance of the confidence is the ratio of the number of simultaneous occurrences of the item sets I_a, I_b to the number of occurrences of the item set I_a, i.e., the probability that I_b occurs under the condition that I_a occurs; the significance of the lift is to measure the independence of the item set {I_a} and the item set {I_b}, reflecting how much the occurrence of {I_a} changes the probability of occurrence of {I_b}. In general, if the value is 1, the two are not correlated; if less than 1, the two are mutually repulsive; and when the lift is greater than 1, the greater the lift, the higher the value of the association rule.
For the remaining non-frequent item sets L_1′, according to the grouping Q obtained in step one based on the mutual exclusion relationship, the non-frequent item sets L_1′ are removed from Q to obtain a new grouping Q′. The significance of this is to remove from the grouping labels the indexes that do not meet the minimum support threshold; because any item set generated later by joining such an index with other indexes still cannot meet the minimum support threshold, removing these indexes at this step effectively improves computation efficiency and releases part of the memory space, thereby optimizing the computer's computation process and saving time cost.
Execute step three: extract item sets from Q′ to generate the candidate set C_2 of "2-item sets", prune it to obtain the frequent item set L_2, then perform a self-join iterative search on the frequent item sets starting from L_2, prune the candidate sets obtained by the search, and iterate until no new frequent item set can be generated, thereby obtaining the association rule mining result.
To generate the candidate set C_2 of "2-item sets", two indexes are extracted, by permutation and combination, from any two of the groups obtained in step two, giving the candidate set C_2 of "2-item sets". For the candidate set C_2, the support is calculated and compared with the minimum support threshold; the item sets that meet the minimum support threshold are added to the frequent item set L_2, and the item sets that do not meet the minimum support threshold are deleted, thereby completing the pruning process.
Then, to obtain L_κ (κ ≥ 3), L_{κ-1} is joined with L_{κ-1} to produce the candidate set C_κ of "κ-item sets", where l_i = {l_i(1), l_i(2), ..., l_i(m)} and l_i(κ) (1 ≤ κ ≤ m) ∈ l_i is the κ-th item of l_i. The join L_{κ-1} ∞ L_{κ-1} is performed; if their first (κ-2) items are the same, L_{κ-1} is joinable, and joining l_1 and l_2 yields the item set {l_1(1), l_1(2), ..., l_1(κ-1), l_2(κ-1)}. C_κ is a superset of L_κ; in the same way, the support of each candidate item set in C_κ is calculated to obtain the frequent item set L_κ. If any (κ-1)-item subset of a candidate is not in L_{κ-1}, the new candidate cannot be a frequent item set either, so that item can be removed from C_κ. The iteration is repeated to obtain the frequent item sets L_κ until L_κ = ∅ (no new frequent item set can be generated); the iteration then ends, and the support, confidence and lift results of each frequent item set are obtained.
Association rules are mined from the data using the improved Apriori function, with the support parameter set to 0.1, the confidence set to 0.8, and the minimum number of items contained in an association rule set to 2; 738679 rules are obtained by mining. When the support is increased to 0.2, mining obtains 31869 rules, as shown in FIG. 2, and when the support is increased to 0.3, mining obtains 3519 rules.
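For reference, the corresponding call with the standard apriori function of the arules package, using the same parameters, is sketched below; the improved Apriori function of this patent is not part of arules, so this is only a baseline illustration with object names assumed from the sketch above:

```r
# Sketch only: baseline arules call with the parameters stated above.
rules <- apriori(trans, parameter = list(support = 0.1, confidence = 0.8, minlen = 2))
length(rules)                                  # number of rules mined
inspect(head(sort(rules, by = "lift"), 10))    # ten strongest rules by lift
```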
The mining result with support 0.2 and confidence 0.8 is selected for analysis. A scatter plot of support, confidence and lift for the mining result is shown in FIG. 4; with support 0.4 and confidence 0.8, the paracoord (parallel coordinates) diagram drawn is shown in FIG. 5, where each polyline represents an association rule and a darker color indicates a higher lift. With support 0.1 and confidence 0.8, a total of 4135 rules have a lift greater than 3; with support 0.2 and confidence 0.8, a total of 53 rules have a lift greater than 3.
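The scatter and paracoord views described above can be drawn with the arulesViz package; a minimal sketch follows, with the rules object assumed from the previous sketch and the filtering chosen only for illustration:

```r
# Sketch only: visualizations of the mined rules with arulesViz.
library(arulesViz)
plot(rules, method = "scatterplot", shading = "lift")             # support/confidence/lift scatter (FIG. 4)
plot(head(sort(rules, by = "lift"), 50), method = "paracoord")    # parallel-coordinates view (FIG. 5)
length(subset(rules, subset = lift > 3))                          # count of rules with lift > 3
```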
The association rule analysis figures generated in this embodiment and the improved association rule report data mining method based on mutual exclusion expression are implemented in the R language, version 3.4.3, but the implementation of this patent is not limited to R; programming implementations of the patented method in other languages and development environments are all within the protection scope of this patent.
Claims (4)
1. A report data mining method based on the improved association rule of mutual exclusion expression is characterized in that the method comprises the following steps:
step one: converting the data to be processed into transaction data based on the logical relationship among the data and the data threshold range, analyzing the data information to obtain the data scale and matrix density, and grouping the item sets with mutual exclusion relationships in the data based on mathematical logic to obtain a binary sparse matrix with grouping labels;
step two: scanning the data set, calculating the support, confidence and lift of each item to obtain the collection of all frequent 1-item sets, and removing the non-frequent item sets from the groups obtained in step one based on the mutual exclusion relationship, so as to obtain a new grouping result;
step three: extracting item sets from the mutually exclusive groups to generate the candidate set C_2 of "2-item sets", pruning it to obtain the frequent item set L_2, then performing a self-join iterative search on the frequent item sets starting from L_2, pruning the candidate sets obtained by the search, and iterating until no new frequent item set can be generated, thereby obtaining the association rule mining result.
2. The method for mining report data based on mutual exclusion expression improved association rules according to claim 1, wherein the step one specifically comprises:
converting data to be processed into transaction data based on a logic relationship between the data and a data threshold range, analyzing data information to obtain data scale and matrix density, and grouping item sets with a mutual exclusion relationship in the data based on mathematical logic to obtain a binary sparse matrix with a grouping label, wherein the specific steps are as follows:
1) converting the data set to be mined into transaction data based on thresholds on the data values and recording it as the data set D; let I_n = {r_{n1}, r_{n2}, ..., r_{nm}} be a collection of m different items, where each r_{nκ} is called an item and the collection I_n is called an item set; the number of elements is called the length of the item set, and an item set of length k is called a "k-item set"; the data set D, containing n item sets, can then be expressed as:
D = {I_1, I_2, ..., I_n}
after conversion into transaction data, the value of any item r_{nκ} exists in only two cases: r_{nκ} = 1 if the corresponding event occurs (True), and r_{nκ} = 0 if it does not (False);
2) by transforming the data set D, a binary sparse matrix U can be obtained, where U can be expressed as:
U = [r_{11} r_{12} ... r_{1m}; r_{21} r_{22} ... r_{2m}; ...; r_{n1} r_{n2} ... r_{nm}]
where the first subscript n of r_{nm} denotes the corresponding n-th item set I_n and the second subscript m denotes the m-th element of the item set I_n; the binary sparse matrix U contains only the Boolean values 1 and 0, corresponding respectively to True and False in the transaction data;
3) performing a logical analysis on the item sets I_j in the data set D; if item sets with a mutual exclusion relationship exist, dividing them into one group, denoted Q_t:
Q_t = {I_a, I_b, ..., I_n}
note that the number of groups Q_t is determined by the number of mutually exclusive item sets in the data set D, and each item set exists in only one group Q_t, i.e., Q_i ∩ Q_j = ∅ for i ≠ j;
4) further, a binary sparse matrix U with a grouping label is obtained, which can be represented by Q as:
U = [Q_1 Q_2 ... Q_t]
wherein, U is a binary sparse matrix, i.e. a matrix only containing 0 and 1, and is composed of t grouping matrices Q.
3. The method for mining report data based on mutual exclusion expression improved association rules according to claim 1, wherein the second step specifically comprises:
scanning the data set for the first time, calculating the support, confidence and lift of each item, obtaining the collection of all frequent 1-item sets, and removing the non-frequent item sets from the groups obtained in step one based on the mutual exclusion relationship to obtain a new grouping result, wherein the specific steps are as follows:
1) traversing the data set D and calculating, for each item set I_n, the number of records in which r_{nκ} = 1; based on a given minimum support S_0, if S ≥ S_0, the item set I_n is a frequent item set;
2) respectively calculating the support (Support), confidence (Confidence) and lift (Lift) of the generated frequent item sets:
Support(I_a → I_b) = Support(I_a ∪ I_b) = P(I_a ∪ I_b)
Confidence(I_a → I_b) = P(I_b | I_a) = Support(I_a ∪ I_b) / Support(I_a)
Lift(I_a → I_b) = Confidence(I_a → I_b) / Support(I_b)
where (I_a → I_b) represents only the correlation between the item set I_a and the item set I_b and has no meaning of mathematical calculation; the support is the number of occurrences of the corresponding item set divided by the total number of records; Support(I_a → I_b) is the percentage of transactions in the database D that contain I_a ∪ I_b, denoted Support(I_a ∪ I_b), I_a, I_b ∈ I; the significance of the confidence is the ratio of the number of simultaneous occurrences of the item sets I_a, I_b to the number of occurrences of the item set I_a, i.e., the probability that I_b occurs under the condition that I_a occurs; the significance of the lift is to measure the independence of the item set {I_a} and the item set {I_b}, reflecting how much the occurrence of the item set {I_a} changes the probability of occurrence of the item set {I_b}; in general, if the value is 1, the two are not correlated; if the value is less than 1, the two are mutually repulsive; and when the lift is greater than 1, the greater the lift, the higher the value of the association rule;
3) after the traversal, based on the minimum support threshold S_0, obtaining the collection of frequent 1-item sets, i.e., L_1 = {I_n | Support(I_n) ≥ S_0};
4) for the remaining non-frequent item sets L_1′, removing them from the groups Q obtained in step one based on the mutual exclusion relationship, so as to obtain a new grouping Q′ that excludes the non-frequent item sets:
Q′ = Q − L_1′
Q′ is the grouping, based on the mutual exclusion relationship, obtained after the frequent item sets are found for the first time, and it is used for extracting the candidate set of "2-item sets".
4. The method for mining report data based on mutual exclusion expression improved association rules according to claim 1, wherein the third step specifically comprises:
obtaining the association rule mining result: extracting item sets from Q′ to generate the candidate set C_2 of "2-item sets", pruning it to obtain the frequent item set L_2, then performing a self-join iterative search on the frequent item sets starting from L_2, pruning the candidate sets obtained by the search, and iterating until no new frequent item set can be generated; the confidence and lift of all frequent item sets are then calculated to obtain the data index item sets with implicit association relations, wherein the specific steps are as follows:
1) by permutation and combination, extracting two item sets in turn from Q′ to generate the candidate set C_2 of "2-item sets";
2) to obtain L_2, pruning the candidate item set C_2, i.e., computing the support of each candidate in C_2 and retaining those meeting the minimum support, thereby obtaining the frequent item set L_2;
3) to obtain L_κ (κ ≥ 3), joining L_{κ-1} with L_{κ-1} to produce the candidate set C_κ of "κ-item sets", written as:
L_κ = {l_1, l_2, ..., l_n}, l_i, l_j (1 ≤ i, j ≤ n), l_i, l_j ∈ L_κ
where l_i = {l_i(1), l_i(2), ..., l_i(m)}, and l_i(κ) (1 ≤ κ ≤ m) ∈ l_i is the κ-th item of l_i; the join L_{κ-1} ∞ L_{κ-1} is performed, where the symbol "∞" indicates that the two item sets are self-joined, i.e., different items are extracted from each item set in turn to generate a new item set; L_{κ-1} is joinable if their first (κ-2) items are the same, and joining l_1 and l_2 yields the item set {l_1(1), l_1(2), ..., l_1(κ-1), l_2(κ-1)};
4) C_κ is a superset of L_κ; in the same way, the support of each candidate item set in C_κ is calculated to obtain the frequent item set L_κ; if any (κ-1)-item subset of a candidate is not in L_{κ-1}, the new candidate cannot be a frequent item set either, so that item can be removed from C_κ;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050602.3A CN111309777A (en) | 2020-01-14 | 2020-01-14 | Report data mining method for improving association rule based on mutual exclusion expression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050602.3A CN111309777A (en) | 2020-01-14 | 2020-01-14 | Report data mining method for improving association rule based on mutual exclusion expression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111309777A true CN111309777A (en) | 2020-06-19 |
Family
ID=71148851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010050602.3A Pending CN111309777A (en) | 2020-01-14 | 2020-01-14 | Report data mining method for improving association rule based on mutual exclusion expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309777A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664642A (en) * | 2018-05-16 | 2018-10-16 | 句容市茂润苗木有限公司 | Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm |
Non-Patent Citations (3)
Title |
---|
ZHANG XIAO: "Data-mining-based analysis of water use in the iron and steel industry of Hebei Province", China Master's Theses Full-text Database, Information Science and Technology Series *
WANG WEI: "Research and improvement of the Apriori algorithm in association rules", Wanfang *
MA DONGLAI: "An improved Apriori algorithm method based on data attributes", Journal of Chinese Agricultural Mechanization *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114116532A (en) * | 2020-08-31 | 2022-03-01 | 南京邮电大学 | Access mode self-learning based cache optimization method and access method |
CN112837148A (en) * | 2021-03-03 | 2021-05-25 | 中央财经大学 | Risk logical relationship quantitative analysis method fusing domain knowledge |
CN113282686A (en) * | 2021-06-03 | 2021-08-20 | 光大科技有限公司 | Method and device for determining association rule of unbalanced sample |
CN113282686B (en) * | 2021-06-03 | 2023-11-07 | 光大科技有限公司 | Association rule determining method and device for unbalanced sample |
CN114265886B (en) * | 2021-12-28 | 2024-04-30 | 航天科工智能运筹与信息安全研究院(武汉)有限公司 | Similarity model retrieval system based on improved Apriori algorithm |
CN114265886A (en) * | 2021-12-28 | 2022-04-01 | 航天科工智能运筹与信息安全研究院(武汉)有限公司 | Similar model retrieval system based on improved Apriori algorithm |
CN114839601A (en) * | 2022-07-04 | 2022-08-02 | 中国人民解放军国防科技大学 | Radar signal high-dimensional time sequence feature extraction method and device based on frequent item analysis |
CN114839601B (en) * | 2022-07-04 | 2022-09-16 | 中国人民解放军国防科技大学 | Radar signal high-dimensional time sequence feature extraction method and device based on frequent item analysis |
CN115543667A (en) * | 2022-09-19 | 2022-12-30 | 成都飞机工业(集团)有限责任公司 | Parameter relevance analysis method, device, equipment and medium for PIU subsystem |
CN115543667B (en) * | 2022-09-19 | 2024-04-16 | 成都飞机工业(集团)有限责任公司 | Parameter relevance analysis method, device, equipment and medium of PIU subsystem |
CN117272398B (en) * | 2023-11-23 | 2024-01-26 | 聊城金恒智慧城市运营有限公司 | Data mining safety protection method and system based on artificial intelligence |
CN117272398A (en) * | 2023-11-23 | 2023-12-22 | 聊城金恒智慧城市运营有限公司 | Data mining safety protection method and system based on artificial intelligence |
CN118245561A (en) * | 2024-05-21 | 2024-06-25 | 天工(天津)传媒科技有限公司 | Urban and rural planning and mapping result generation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111309777A (en) | Report data mining method for improving association rule based on mutual exclusion expression | |
Bansal et al. | Improved k-mean clustering algorithm for prediction analysis using classification technique in data mining | |
CN111914558B (en) | Course knowledge relation extraction method and system based on sentence bag attention remote supervision | |
CN107766324B (en) | Text consistency analysis method based on deep neural network | |
CN103226554A (en) | Automatic stock matching and classifying method and system based on news data | |
CN112231477A (en) | Text classification method based on improved capsule network | |
CN114281809B (en) | Multi-source heterogeneous data cleaning method and device | |
CN113742396B (en) | Mining method and device for object learning behavior mode | |
CN106295690A (en) | Time series data clustering method based on Non-negative Matrix Factorization and system | |
CN116245107B (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
CN113972010B (en) | Auxiliary disease reasoning system based on knowledge graph and self-adaptive mechanism | |
CN112257386B (en) | Method for generating scene space relation information layout in text-to-scene conversion | |
CN113011161A (en) | Method for extracting human and pattern association relation based on deep learning and pattern matching | |
CN116204673A (en) | Large-scale image retrieval hash method focusing on relationship among image blocks | |
CN103473308A (en) | High-dimensional multimedia data classifying method based on maximum margin tensor study | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure | |
CN109582743A (en) | A kind of data digging method for the attack of terrorism | |
CN111553442B (en) | Optimization method and system for classifier chain tag sequence | |
CN112434145A (en) | Picture-viewing poetry method based on image recognition and natural language processing | |
CN107633259A (en) | A kind of cross-module state learning method represented based on sparse dictionary | |
CN116805010A (en) | Multi-data chain integration and fusion knowledge graph construction method oriented to equipment manufacturing | |
CN115795037A (en) | Multi-label text classification method based on label perception | |
CN111275081A (en) | Method for realizing multi-source data link processing based on Bayesian probability model | |
Fazili et al. | Recent trends in dimension reduction methods | |
CN116578611B (en) | Knowledge management method and system for inoculated knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200619 |