CN104216874B - Correlation coefficient-based method and system for mining weighted positive and negative patterns between Chinese words - Google Patents

Info

Publication number
CN104216874B
Authority
CN
China
Prior art keywords
negative
chinese
item set
association rule
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410483377.7A
Other languages
Chinese (zh)
Other versions
CN104216874A (en
Inventor
黄名选
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University of Finance and Economics
Original Assignee
Guangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Finance and Economics filed Critical Guangxi University of Finance and Economics
Priority to CN201410483377.7A priority Critical patent/CN104216874B/en
Publication of CN104216874A publication Critical patent/CN104216874A/en
Application granted granted Critical
Publication of CN104216874B publication Critical patent/CN104216874B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A correlation coefficient-based method and system for mining weighted positive and negative patterns between Chinese words. A Chinese text information preprocessing module preprocesses the Chinese text. A Chinese feature word candidate item set generation module generates the candidate 1-item sets of feature words; for i-item sets (i ≥ 2), candidate i-item sets are generated from the (i-1)-item sets, their support is calculated to obtain frequent item sets and negative item sets, and the item sets are pruned according to their relevance to obtain interesting feature word frequent item sets and negative item sets. A Chinese feature word positive and negative association rule generation and result display module calculates the interestingness and confidence of the association rules, mines interesting feature word positive and negative association rule patterns from the frequent item sets and negative item sets, and displays them to the user. The invention avoids invalid and uninteresting Chinese feature word association patterns and greatly improves mining efficiency. Its association rule patterns can be applied to query expansion in Chinese text information retrieval to improve query performance.

Description

Chinese word weighting positive and negative mode mining method and system based on correlation coefficient
Technical Field
The invention belongs to the field of text mining, in particular to a correlation coefficient-based Chinese word weighting positive and negative pattern mining method and a mining system thereof, which are suitable for the fields of feature word association pattern discovery, Chinese text information retrieval query expansion, cross-language information retrieval and the like in Chinese text mining. The positive and negative correlation mode of the feature words is applied to web search engines such as Baidu and Google to realize query expansion, is favorable for improving the query performance, and meets the information query requirements of users.
Background
In recent 20 years, the association pattern mining research has achieved remarkable results, and the results can be summarized into three major categories, namely, unweighted positive and negative association pattern mining technology, weighted positive and negative association pattern mining technology, matrix weighted (also called fully weighted) positive and negative association pattern mining technology and the like.
Association pattern mining research began with the Apriori method, an unweighted association pattern mining method proposed by Agrawal et al. (AGRAWAL R, IMIELINSKI T, SWAMI A. Mining association rules between sets of items in large databases [C]// Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington D.C.: ACM Press, 1993: 207-216). On this basis, scholars have proposed improved unweighted association pattern mining methods from different angles. The drawback of unweighted positive and negative association pattern mining is that it takes into account neither the differing importance of items nor the differing weights of items in the transaction database, which produces a large number of invalid, redundant and uninteresting association patterns.
Item-weighted association pattern mining overcomes some of these defects by introducing item weights that reflect the differing importance of items. Research on item-weighted association pattern mining began in 1998, typified by the weighted association rule mining method proposed by Cai et al. (CAI C H, DA A, FU W C, et al. Mining association rules with weighted items [C]// Proceedings of the IEEE International Database Engineering and Applications Symposium. Washington D.C.: IEEE Computer Society, 1998: 68-77). Improved methods followed, such as the WIT-tree-based weighted frequent item set mining method proposed by Vo et al. (VO B, COENEN F, LE B. A new method for mining Frequent Weighted Itemsets based on WIT-trees [J]. Expert Systems with Applications, 2013(40): 1256-1264). The drawback of item-weighted positive and negative association pattern mining is that it ignores the case in which an item has different weights in different transaction records of the database.
Matrix-weighted association pattern mining takes into account the inherent characteristics of matrix-weighted data, namely that an item has a different weight in each transaction record of the database, and thereby overcomes this defect of item-weighted association pattern mining. Data in which item weights are objectively distributed in the transaction records and vary from record to record is generally called matrix-weighted data, also known as fully weighted data. Research on matrix-weighted association pattern mining began in 2003, typified by the fully weighted association rule mining method proposed by Tan Yihong and Lin Yaping. These methods effectively mine matrix-weighted association rules but cannot mine matrix-weighted negative association patterns. With the rapid growth of matrix-weighted data (such as web text data), matrix-weighted positive and negative association pattern mining has increasing application value in fields such as text information retrieval and text mining, since the antecedent or the consequent of an association rule can serve as a source of query expansion terms for information retrieval. To address these problems, the invention provides a correlation coefficient-based method and system for mining weighted positive and negative patterns between Chinese words. Experimental results show that the proposed method effectively reduces the number of feature word candidate item sets and the mining time, that its mining performance is superior to that of existing unweighted positive and negative association pattern mining methods, and that the feature word association patterns can provide a reliable source of query expansion terms for retrieval systems such as web search engines, improving their query performance.
Disclosure of Invention
The invention aims to explore Chinese text feature word association patterns in depth and provides a correlation coefficient-based method and system for mining weighted positive and negative patterns between Chinese words, improving the efficiency of Chinese text mining. Applied to a web search engine for query expansion, it can improve retrieval performance; applied to Chinese text mining, it can find more practical and reasonable Chinese feature word association patterns and thereby improve the accuracy of text clustering and classification.
The technical scheme adopted by the invention is as follows: a correlation coefficient-based method for mining weighted positive and negative patterns between Chinese words, comprising the following steps:
(1) Chinese text preprocessing: the Chinese text data to be processed is segmented into words, stop words are removed, feature words are extracted and their weights are calculated, and a text information database and a feature word item library based on the vector space model are constructed.
The feature word weight is calculated as:
$$w_{ij} = (1 + \ln(tf_{ij})) \times idf_i$$
where $w_{ij}$ is the weight of the i-th feature word in the j-th document, $idf_i = \log(N / df_i)$ is the inverse document frequency of the i-th feature word, $N$ is the total number of documents in the document set, $df_i$ is the number of documents containing the i-th feature word, and $tf_{ij}$ is the term frequency of the i-th feature word in the j-th document;
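The weight formula in step (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `feature_word_weights` and the sample documents are hypothetical, word segmentation and stop-word removal are assumed to have been done already, and the natural logarithm is assumed for $idf_i$ because the source writes "log" without a base.

```python
import math

def feature_word_weights(docs):
    """Return {doc_index: {word: w_ij}} with w_ij = (1 + ln(tf_ij)) * idf_i."""
    n_docs = len(docs)
    # df_i: number of documents that contain each feature word
    df = {}
    for words in docs:
        for word in set(words):
            df[word] = df.get(word, 0) + 1
    weights = {}
    for j, words in enumerate(docs):
        # tf_ij: occurrences of each feature word in document j
        tf = {}
        for word in words:
            tf[word] = tf.get(word, 0) + 1
        weights[j] = {
            word: (1 + math.log(count)) * math.log(n_docs / df[word])  # natural log assumed
            for word, count in tf.items()
        }
    return weights

# Hypothetical pre-segmented documents (stop words already removed).
docs = [["program", "function", "program"], ["queue", "function"], ["program", "member"]]
w = feature_word_weights(docs)
```

The description later assumes weights between 0 and 1; any normalization step that achieves this is not described in the text and is therefore left out of the sketch.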
(2) Mining the matrix-weighted frequent 1-item sets $L_1$ of Chinese feature words: take the candidate 1-item sets $C_1$ from the item library, accumulate the item set weight $w(C_1)$, calculate the support $mwS(C_1)$, compare it with ms, and add the matrix-weighted frequent 1-item sets $L_1$ mined from $C_1$ to mwPIS. The support of a candidate 1-item set $C_1$ is calculated as:
$$mwS(C_1) = \frac{w(C_1)}{n}$$
where $n$ is the total number of records in the text information database.
(3) Mining the interesting matrix-weighted frequent i-item sets $L_i$ and negative i-item sets $N_i$ of Chinese feature words ($i \geq 2$), comprising the following steps (3.1) to (3.3):
(3.1) Apriori-join the frequent (i-1)-item sets $L_{i-1}$ to generate the candidate i-item sets $C_i$, accumulate the weight $w(C_i)$, and calculate the support $mwS(C_i)$:
$$mwS(C_i) = \frac{w(C_i)}{n \times i}$$
(3.2) If the support of a candidate i-item set $C_i$ is greater than or equal to the minimum support threshold ms, i.e. $mwS(C_i) \geq ms$, calculate the frequent item set relevance $mwFIR(C_i)$, and add the frequent i-item sets $L_i$ whose relevance is greater than or equal to the minimum frequent item set relevance threshold mFr, i.e. $mwFIR(C_i) \geq mFr$, to the frequent item set collection mwPIS. $mwFIR(C_i)$ is calculated as:
$$mwFIR(C_i) = \frac{mwS(C_i)}{\max_{I_{sub}} mwS(I_{sub})}$$
where $I_{sub}$ ranges over the sub-item sets of $C_i$.
(3.3) If the support of a candidate i-item set $C_i$ is less than the minimum support threshold ms, i.e. $mwS(C_i) < ms$, calculate the negative item set relevance $mwNIR(C_i)$, and add the negative i-item sets $N_i$ whose relevance is greater than or equal to the minimum negative item set relevance threshold mNr, i.e. $mwNIR(C_i) \geq mNr$, to the negative item set collection mwNIS. $mwNIR(C_i)$ is calculated as:
$$mwNIR(C_i) = \frac{mwS(C_i)}{1 - \max_{I_{sub}} mwS(I_{sub})}$$
where $I_{sub}$ ranges over the sub-item sets of $C_i$.
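Candidate generation and support calculation in steps (2) and (3.1) might look like the sketch below. Several points are assumptions rather than statements from the source: the database layout (one dictionary of feature word weights per record), the helper names, and the reading of the accumulated weight w(C) as the sum of the weights of the items of C over the records that contain every item of C. The join is a simplified union-based variant of the Apriori join and does no subset pruning. The support formula w(C) / (n × i) agrees with the worked example in the description (1.44 / (5 × 3) = 0.096).

```python
from itertools import combinations

def itemset_weight(db, itemset):
    """w(C): sum the item weights of C over records containing every item of C (assumed reading)."""
    return sum(
        sum(record[item] for item in itemset)
        for record in db
        if all(record.get(item, 0) > 0 for item in itemset)
    )

def mw_support(db, itemset):
    """mwS(C) = w(C) / (n * i), with n records and i items in C."""
    return itemset_weight(db, itemset) / (len(db) * len(itemset))

def apriori_join(frequent, size):
    """Union pairs of frequent (size-1)-item sets and keep unions with exactly `size` items."""
    candidates = set()
    for a, b in combinations(sorted(frequent), 2):
        union = tuple(sorted(set(a) | set(b)))
        if len(union) == size:
            candidates.add(union)
    return sorted(candidates)

# Hypothetical 3-record database: each record maps feature words to weights.
db = [
    {"i1": 0.8, "i3": 0.6},
    {"i1": 0.4, "i3": 0.2, "i5": 0.3},
    {"i5": 0.9},
]
candidates_3 = apriori_join([("i1", "i3"), ("i1", "i5")], 3)
```

Joining the two frequent 2-item sets (i1, i3) and (i1, i5) yields the single candidate (i1, i3, i5), the same candidate 3-item set that appears in the worked example of the description.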
(4) Mining valid matrix-weighted positive and negative association rule patterns of Chinese feature words from the frequent item set collection mwPIS, comprising the following steps (4.1) to (4.6):
(4.1) Take a feature word frequent item set $L_i$ from mwPIS and find all proper subsets of $L_i$.
(4.2) Take any two proper subsets $I_1$ and $I_2$ from the proper subset collection of $L_i$. When the supports of $I_1$ and $I_2$ are both greater than or equal to ms, i.e. $mwS(I_1) \geq ms$ and $mwS(I_2) \geq ms$, and $I_1 \cap I_2 = \emptyset$, $I_1 \cup I_2 = L_i$, calculate the correlation coefficient $mwPCC(I_1, I_2)$ of the matrix-weighted frequent item set $(I_1, I_2)$. The supports are calculated as:
$$mwS(I_1) = \frac{w(I_1)}{n \times i_1}, \qquad mwS(I_2) = \frac{w(I_2)}{n \times i_2}$$
where $i_1$ and $i_2$ are the numbers of items (dimensions) of $I_1$ and $I_2$.
[The formula for $mwPCC(I_1, I_2)$ is not recoverable from the source text.] It requires mwS(*) > 0 and mwS(*) ≠ 1.
(4.3) If the correlation coefficient of the matrix-weighted frequent item set $(I_1, I_2)$ is greater than or equal to the correlation threshold β, i.e. $mwPCC(I_1, I_2) \geq \beta$, calculate $VMWAR(I_1, I_2, mc, mi)$; if its value equals 1, the matrix-weighted strong positive association rule $I_1 \rightarrow I_2$ of Chinese feature words is obtained and added to the matrix-weighted positive association rule set mwPAR. Also calculate the evaluation value $VMWAR(\neg I_1, \neg I_2, mc, mi)$ of the matrix-weighted negative association rule $\neg I_1 \rightarrow \neg I_2$; if its value equals 1, the matrix-weighted strong negative association rule $\neg I_1 \rightarrow \neg I_2$ is obtained and added to the matrix-weighted negative association rule set mwNAR.
[The formulas for $VMWAR(I_1, I_2, mc, mi)$ and $VMWAR(\neg I_1, \neg I_2, mc, mi)$ are not recoverable from the source text.]
(4.4) If the correlation coefficient of the matrix-weighted item set $(I_1, I_2)$ is less than or equal to -β, i.e. $mwPCC(I_1, I_2) \leq -\beta$, calculate $VMWAR(I_1, \neg I_2, mc, mi)$; if its value equals 1, the matrix-weighted strong negative association rule $I_1 \rightarrow \neg I_2$ is obtained and added to mwNAR. Also calculate $VMWAR(\neg I_1, I_2, mc, mi)$; if its value equals 1, the matrix-weighted strong negative association rule $\neg I_1 \rightarrow I_2$ is obtained and added to mwNAR.
[The formulas for $VMWAR(I_1, \neg I_2, mc, mi)$ and $VMWAR(\neg I_1, I_2, mc, mi)$ are not recoverable from the source text.]
(4.5) Return to step (4.2); once every proper subset in the proper subset collection of $L_i$ has been taken out exactly once, go to step (4.6);
(4.6) Return to step (4.1); once every frequent item set $L_i$ in the feature word frequent item set collection has been taken out exactly once, step (4) ends and the procedure goes to step (5);
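The control flow of steps (4.2) to (4.4) can be sketched without the lost formulas: enumerate the ways of splitting an item set into two disjoint proper subsets, then use the sign and size of the correlation coefficient to decide which rule forms are evaluated. The function names are hypothetical, the correlation value is passed in rather than computed, and enumerating each unordered pair once is an assumption about how "taken out exactly once" is meant. The support check mwS(I1) ≥ ms, mwS(I2) ≥ ms is left to the caller.

```python
from itertools import combinations

def split_pairs(itemset):
    """Yield each unordered pair (I1, I2) of disjoint proper subsets whose union is the item set."""
    items = sorted(itemset)
    first, rest = items[0], items[1:]
    # Fixing the smallest item in I1 avoids yielding both (I1, I2) and (I2, I1).
    for r in range(len(rest)):  # r = number of further items placed in I1
        for extra in combinations(rest, r):
            i1 = (first, *extra)
            i2 = tuple(x for x in rest if x not in extra)
            yield i1, i2

def rule_forms(pcc, beta):
    """Return the rule forms that steps (4.3) and (4.4) evaluate for a correlation value."""
    if pcc >= beta:
        return ["I1 -> I2", "not I1 -> not I2"]
    if pcc <= -beta:
        return ["I1 -> not I2", "not I1 -> I2"]
    return []  # weakly correlated pairs produce no rules

pairs = list(split_pairs(("i1", "i3", "i5")))
```

For a 3-item set the sketch produces three splits; each rule form returned by `rule_forms` would then be accepted only if the corresponding VMWAR evaluation equals 1.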
(5) Mining valid matrix-weighted negative association rule patterns of Chinese feature words from the negative item set collection mwNIS, comprising the following steps (5.1) to (5.6):
(5.1) Take a feature word negative item set $N_i$ from mwNIS and find all proper subsets of $N_i$.
(5.2) Take any two proper subsets $I_1$ and $I_2$ from the proper subset collection of $N_i$. When the supports $mwS(I_1)$ and $mwS(I_2)$ are both greater than or equal to ms, and $I_1 \cap I_2 = \emptyset$, $I_1 \cup I_2 = N_i$, calculate the correlation coefficient $mwPCC(I_1, I_2)$ of the matrix-weighted negative item set $(I_1, I_2)$, using the same formula as in step (4.2).
(5.3) If the correlation coefficient of the matrix-weighted negative item set $(I_1, I_2)$ is greater than or equal to the correlation threshold β, i.e. $mwPCC(I_1, I_2) \geq \beta$, calculate $VMWAR(\neg I_1, \neg I_2, mc, mi)$; if its value equals 1, the matrix-weighted strong negative association rule $\neg I_1 \rightarrow \neg I_2$ of Chinese feature words is obtained and added to mwNAR. $VMWAR(\neg I_1, \neg I_2, mc, mi)$ is calculated by the formula of step (4.3).
(5.4) If the correlation coefficient of the matrix-weighted negative item set $(I_1, I_2)$ is less than or equal to -β, i.e. $mwPCC(I_1, I_2) \leq -\beta$, calculate the evaluation value $VMWAR(I_1, \neg I_2, mc, mi)$ of the matrix-weighted negative association rule $I_1 \rightarrow \neg I_2$; if its value equals 1, the matrix-weighted strong negative association rule $I_1 \rightarrow \neg I_2$ is obtained and added to the matrix-weighted negative association rule set mwNAR. Also calculate the evaluation value $VMWAR(\neg I_1, I_2, mc, mi)$ of the matrix-weighted negative association rule $\neg I_1 \rightarrow I_2$; if its value equals 1, the matrix-weighted strong negative association rule $\neg I_1 \rightarrow I_2$ is obtained and added to mwNAR. $VMWAR(I_1, \neg I_2, mc, mi)$ and $VMWAR(\neg I_1, I_2, mc, mi)$ are calculated by the formulas of step (4.4).
(5.5) Return to step (5.2); once every proper subset in the proper subset collection of $N_i$ has been taken out exactly once, go to step (5.6);
(5.6) Return to step (5.1); once every negative item set $N_i$ in the feature word negative item set collection has been taken out exactly once, step (5) ends.
This ends the matrix-weighted positive and negative pattern mining of Chinese feature words. Here ms is the minimum support threshold, mc is the minimum confidence threshold, mi is the minimum interestingness threshold, and β is the correlation coefficient threshold.
A mining system suitable for the correlation coefficient-based Chinese inter-word weighted positive-negative pattern mining method is characterized by comprising the following 4 modules:
The Chinese text information preprocessing module: this module performs word segmentation and stop-word removal on the Chinese text information to be processed, extracts the feature words and calculates their weights, and constructs a Chinese text information library and a feature word item library.
The Chinese feature word candidate item set generation module: this module first mines the matrix-weighted candidate 1-item sets of Chinese feature words from the feature word item library and the Chinese text information library and calculates their support; for i-item sets (i ≥ 2), it generates the candidate i-item sets of Chinese feature words by Apriori-joining the frequent (i-1)-item sets.
The Chinese characteristic word frequent item set and negative item set generating module: the module calculates the support degree of a candidate i-item set of the Chinese characteristic words, and compares the support degree with a minimum support degree threshold value to obtain a frequent i-item set and a negative i-item set of the Chinese characteristic words; calculating the association degree of the frequent item set, and comparing the association degree with a threshold value of the association degree of the frequent item set to obtain an interesting Chinese feature word frequent item set; and calculating the relevance of the negative term set, and comparing the relevance with a threshold value of the relevance of the negative term set to obtain an interesting Chinese feature word negative term set.
The Chinese feature word positive and negative association rule generation and result display module: this module first generates the proper subsets of the Chinese feature word frequent item sets, calculates the correlation coefficient, interestingness and confidence of the Chinese feature word association rule patterns, compares them with the correlation coefficient threshold, the interestingness threshold and the confidence threshold, and mines valid matrix-weighted strong positive and negative association rule patterns of Chinese feature words from the frequent item sets. It then generates the proper subsets of the Chinese feature word negative item sets, calculates the correlation coefficient, interestingness and confidence of the negative association rule patterns, compares them with the same three thresholds, and mines valid matrix-weighted strong negative association rule patterns of Chinese feature words from the negative item sets. Finally, it displays the valid matrix-weighted positive and negative association rule patterns of Chinese feature words to the user in the required form for analysis and use.
The Chinese characteristic word frequent item set and negative item set generating module comprises the following 2 modules:
a characteristic word frequent item set generation module: the module calculates the support degree of the candidate item set of the Chinese characteristic words, compares the support degree with a support degree threshold value to obtain a frequent item set, calculates the association degree of the frequent item set, and compares the association degree with a correlation degree threshold value to obtain an interesting matrix weighted Chinese characteristic word frequent item set.
The feature word negative item set generation module: the module calculates the support degree of the candidate item set of the Chinese characteristic words, compares the support degree with a support degree threshold value to obtain a negative item set, calculates the relevance degree of the negative item set, and compares the relevance degree with a relevance degree threshold value to obtain an interesting matrix weighted negative item set of the Chinese characteristic words.
The Chinese feature word positive-negative association rule generation and result display module comprises the following 3 modules:
a strong positive and negative association rule generation module from a frequent item set: the module generates a proper subset of a Chinese feature word frequent item set, calculates a correlation coefficient, an interest degree and a confidence degree of the Chinese feature word association rule mode, compares the correlation coefficient, the interest degree threshold and the confidence degree threshold, and mines an effective matrix weighting Chinese feature word strong positive and negative association rule mode from the frequent item set.
A strong negative association rule generation module from the negative set of terms: the module generates a proper subset of the negative term set of the Chinese characteristic words, calculates the correlation coefficient, the interestingness and the confidence coefficient of the negative association rule mode of the Chinese characteristic words, compares the correlation coefficient threshold, the interestingness threshold and the confidence coefficient threshold, and mines an effective matrix weighting strong negative association rule mode of the Chinese characteristic words from the negative term set.
The strong positive and negative association rule display module of the feature words: the module displays the effective matrix weighted Chinese feature word positive and negative association rule mode to the user in a required form for analysis and use by the user.
The support degree threshold ms, the confidence degree threshold mc, the interest degree threshold mi and the correlation coefficient threshold beta in the mining system are input by a user.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a correlation coefficient-based method and system for mining matrix-weighted positive and negative association patterns of Chinese feature words. The invention adopts a new item set pruning technique that avoids generating many invalid, false and uninteresting association patterns, greatly improves mining efficiency, and brings the mined association patterns closer to actual conditions. Compared with existing mining methods, the numbers of mined candidate item sets, frequent item sets, negative item sets and positive and negative association rule patterns are greatly reduced, as is the mining time. Experimental results show that the proposed double-threshold pruning strategy is effective and its pruning effect is pronounced, that more practical Chinese feature word association patterns can be obtained, and that the method has high application value and broad prospects in fields such as text mining and information retrieval. Applying the positive and negative association patterns of feature words to web search engines such as Baidu and Google for query expansion improves query performance and meets users' information query needs.
(2) Using the Chinese standard data set CWT200g as experimental data, the method was compared with a classical unweighted association pattern mining method. The experimental results show that, whether the support threshold or the confidence threshold is varied, the method mines fewer candidate item sets and takes less mining time than the compared method, and that the reduction is large, so mining efficiency is greatly improved.
Drawings
FIG. 1 is a block diagram of a correlation coefficient-based mining method for weighted positive and negative patterns of Chinese words.
FIG. 2 is an overall flow chart of the correlation coefficient-based Chinese inter-word weighted positive-negative pattern mining method of the present invention.
FIG. 3 is a block diagram of the system for mining weighted positive and negative patterns between Chinese words based on correlation coefficients according to the present invention.
FIG. 4 is a block diagram of a module for generating a frequent term set and a negative term set of Chinese feature words according to the present invention.
FIG. 5 is a block diagram of the structure of the Chinese feature word positive-negative association rule generation and result display module according to the present invention.
Detailed Description
To better illustrate the technical solution of the invention, the Chinese text data model and the related concepts are introduced as follows:
First, basic concepts
Let $MWD = \{r_1, r_2, \ldots, r_n\}$ be the transaction database, where $n$ is the number of transaction records, and let $Is = \{i_1, i_2, \ldots, i_m\}$ denote the set of all items (Itemset, Is) in MWD, where $m$ is the number of items. $i_j$ ($1 \leq j \leq m$) denotes the j-th item in MWD, and its weight in transaction record $r_i$ is $w[r_i][i_j]$ ($0 \leq w[r_i][i_j] \leq 1$). Let $I_1$ and $I_2$ be sub-item sets of item set $I$ with $I_1 \cup I_2 = I$ and $I_1 \cap I_2 = \emptyset$. The following basic definitions are given.
Definition 1. Matrix-weighted support (Matrix-weighted Support, mwS): the matrix-weighted support mwS(I) is calculated as in Tan Yihong and Lin Yaping (Mining of the fully weighted association rule in the vector space model. Computer Engineering and Applications, 2003(13): 208-), as shown in formula (1):
$$mwS(I) = \frac{w(I)}{n \times k} \qquad (1)$$
where $w(I)$ is the accumulated weight of item set $I$ in MWD, $n$ is the number of transaction records, and $k$ is the number of items in $I$.
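The supports of matrix-weighted negative item sets and negative association rules are given by formulas (2) to (5):
$$mwS(\neg I) = 1 - mwS(I) \qquad (2)$$
$$mwS(I_1 \rightarrow \neg I_2) = mwS(I_1, \neg I_2) = mwS(I_1) - mwS(I_1, I_2) \qquad (3)$$
$$mwS(\neg I_1 \rightarrow I_2) = mwS(\neg I_1, I_2) = mwS(I_2) - mwS(I_1, I_2) \qquad (4)$$
$$mwS(\neg I_1 \rightarrow \neg I_2) = mwS(\neg I_1, \neg I_2) = 1 - mwS(I_1) - mwS(I_2) + mwS(I_1, I_2) \qquad (5)$$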
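Formulas (2) to (5) translate directly into code. In the sketch below the function names and dictionary keys are hypothetical; the supports 0.56 and 0.52 are those of (i1) and (i3) in Table 1 of the worked example, while the joint support 0.3 is an invented value, since the corresponding table is not available in the text.

```python
def negated(s):
    """Formula (2): support of the negation of a single item set."""
    return 1 - s

def negative_supports(s1, s2, s12):
    """Supports of the negated rule forms from mwS(I1), mwS(I2) and mwS(I1, I2)."""
    return {
        "I1,notI2": s1 - s12,              # formula (3)
        "notI1,I2": s2 - s12,              # formula (4)
        "notI1,notI2": 1 - s1 - s2 + s12,  # formula (5)
    }

# s1, s2 from Table 1; s12 = 0.3 is a made-up joint support for illustration.
neg = negative_supports(0.56, 0.52, 0.3)
```

A quick consistency check is that the positive joint support and the three negated forms add up to 1, which follows algebraically from formulas (3) to (5).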
Definition 2. Matrix-weighted frequent item set and negative item set: for a matrix-weighted item set $I$, if $mwS(I) \geq ms$, then $I$ is called a matrix-weighted frequent item set. When $I_1$ and $I_2$ are both matrix-weighted frequent item sets and $mwS(I_1, I_2) < ms$, the item set $(I_1, I_2)$ is called a matrix-weighted negative item set, where ms is the minimum support threshold.
Definition 3. Matrix-weighted confidence (Matrix-weighted Confidence, mwC): the confidences of matrix-weighted positive and negative association rules are given by formulas (6) to (9). [The formulas are not recoverable from the source text.]
Definition 4. Matrix-weighted pattern correlation coefficient (mwPCC): the correlation coefficient $mwPCC(I_1, I_2)$ of the matrix-weighted pattern $(I_1, I_2)$ is given by formula (10). [The formula is not recoverable from the source text.]
It requires mwS(*) > 0 and mwS(*) ≠ 1.
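Formula (10) itself is not available in this text. As an illustration only, the sketch below uses the standard Pearson (phi) correlation between two binary events, expressed through supports. That this is the form used by the invention is an assumption; it is chosen because its denominator is undefined exactly when a support is 0 or 1, which matches the stated conditions mwS(*) > 0 and mwS(*) ≠ 1. The function name `mw_pcc` is hypothetical.

```python
import math

def mw_pcc(s1, s2, s12):
    """Phi-style correlation of I1 and I2 from their supports (assumed form, not from the source)."""
    if not (0 < s1 < 1 and 0 < s2 < 1):
        raise ValueError("supports must lie strictly between 0 and 1")
    return (s12 - s1 * s2) / math.sqrt(s1 * (1 - s1) * s2 * (1 - s2))

independent = mw_pcc(0.5, 0.5, 0.25)   # joint support equals the product of the supports
always_together = mw_pcc(0.5, 0.5, 0.5)
never_together = mw_pcc(0.5, 0.5, 0.0)
```

Under this assumed form a positive value means the two item sets co-occur more often than independence would predict, and a negative value means less often, which is how the thresholds β and -β are used in the mining steps.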
Definition 5. Matrix-weighted frequent item set relevance (Matrix-weighted Frequent Itemset Relevance, mwFIR): for a matrix-weighted frequent item set $FI = (i_1, i_2, \ldots, i_m)$ ($m > 1$) with sub-item sets $I_{sub}$, the conditional probability of $FI$ given that its sub-item set with the maximum support occurs is taken as the relevance of $FI$. The relevance mwFIR(FI) among the sub-item sets of the matrix-weighted frequent item set $FI$ is given by formula (11):
$$mwFIR(FI) = \frac{mwS(FI)}{\max_{I_{sub}} mwS(I_{sub})} \qquad (11)$$
Definition 6. Matrix-weighted negative item set relevance (Matrix-weighted Negative Itemset Relevance, mwNIR): for a matrix-weighted negative item set $NI = (i_1, i_2, \ldots, i_r)$ ($r > 1$) with sub-item sets $I_{sub}$, the conditional probability of $NI$ given that its sub-item set with the maximum support does not occur is taken as the relevance of $NI$. The relevance mwNIR(NI) among the sub-item sets of the matrix-weighted negative item set $NI$ is given by formula (12):
$$mwNIR(NI) = \frac{mwS(NI)}{1 - \max_{I_{sub}} mwS(I_{sub})} \qquad (12)$$
Definition 7. Matrix-weighted positive and negative association rule interestingness: the interestingness of matrix-weighted positive and negative association rules (mwARI) is given by formulas (13) to (16). [The formulas are not recoverable from the source text.]
Second, the idea of mining valid matrix-weighted positive and negative association rules
Let mc be the minimum confidence threshold, mi the minimum interestingness threshold, and β the correlation coefficient threshold (β ∈ (0, 1)). The basic idea of mining valid matrix-weighted association rules is as follows:
(1) For an interesting matrix-weighted frequent item set $(I_1, I_2)$ in which $I_1$ and $I_2$ are both frequent item sets: if $mwPCC(I_1, I_2) \geq \beta$, $VMWAR(I_1, I_2, mc, mi) = 1$ and $VMWAR(\neg I_1, \neg I_2, mc, mi) = 1$, then $I_1 \rightarrow I_2$ and $\neg I_1 \rightarrow \neg I_2$ are valid matrix-weighted positive and negative association rules; if $mwPCC(I_1, I_2) \leq -\beta$, $VMWAR(I_1, \neg I_2, mc, mi) = 1$ and $VMWAR(\neg I_1, I_2, mc, mi) = 1$, then $I_1 \rightarrow \neg I_2$ and $\neg I_1 \rightarrow I_2$ are valid matrix-weighted negative association rules.
Here $VMWAR(I_1, I_2, mc, mi)$, $VMWAR(\neg I_1, \neg I_2, mc, mi)$, $VMWAR(I_1, \neg I_2, mc, mi)$ and $VMWAR(\neg I_1, I_2, mc, mi)$ are given by formulas (17) to (20). [The formulas are not recoverable from the source text.]
(2) For an interesting matrix-weighted negative item set $(I_1, I_2)$ in which $I_1$ and $I_2$ are both frequent item sets: if $mwPCC(I_1, I_2) \geq \beta$ and $VMWAR(\neg I_1, \neg I_2, mc, mi) = 1$, then $\neg I_1 \rightarrow \neg I_2$ is a valid matrix-weighted negative association rule; if $mwPCC(I_1, I_2) \leq -\beta$, $VMWAR(I_1, \neg I_2, mc, mi) = 1$ and $VMWAR(\neg I_1, I_2, mc, mi) = 1$, then $I_1 \rightarrow \neg I_2$ and $\neg I_1 \rightarrow I_2$ are valid matrix-weighted negative association rules.
Third, pruning strategies for interesting matrix-weighted item sets
Let mFr be the minimum frequent item set relevance threshold and mNr the minimum negative item set relevance threshold.
Pruning strategy for an interesting matrix-weighted frequent item set $I$: when $mwS(I) \geq ms$, if $mwFIR(I) \geq mFr$, item set $I$ is an interesting matrix-weighted frequent item set and is retained; otherwise, if $mwFIR(I) < mFr$, item set $I$ is pruned.
Pruning strategy for an interesting matrix-weighted negative item set $I$: when $mwS(I) < ms$, if $mwNIR(I) \geq mNr$, item set $I$ is an interesting matrix-weighted negative item set and is retained; otherwise, if $mwNIR(I) < mNr$, item set $I$ is pruned.
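The two pruning strategies amount to a single decision per candidate item set, which might be sketched as follows. The function names and the return labels are hypothetical. The mwNIR expression follows the arithmetic shown in the worked example of the description (0.096 / (1 - 0.56)); the mwFIR expression is a reading of the verbal definition (support of the item set divided by the largest support among its sub-item sets) and should be treated as an assumption.

```python
def mw_fir(support, sub_supports):
    """Relevance of a frequent item set: support / largest sub-item-set support (assumed form)."""
    return support / max(sub_supports)

def mw_nir(support, sub_supports):
    """Relevance of a negative item set: support / (1 - largest sub-item-set support)."""
    return support / (1 - max(sub_supports))

def classify(support, sub_supports, ms, mfr, mnr):
    """Return 'frequent', 'negative' or 'pruned' for one candidate item set."""
    if support >= ms:
        return "frequent" if mw_fir(support, sub_supports) >= mfr else "pruned"
    return "negative" if mw_nir(support, sub_supports) >= mnr else "pruned"

# Candidate (i1, i3, i5) from the worked example: support 0.096, largest sub-support 0.56.
label = classify(0.096, [0.56, 0.52, 0.168], ms=0.15, mfr=0.3, mnr=0.12)
```

With the example thresholds (ms = 0.15, mNr = 0.12) the candidate falls below the support threshold but its negative relevance of about 0.218 exceeds mNr, so it is kept as an interesting negative item set, in line with the example.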
The technical solution of the invention is further illustrated by the following specific example.
The mining method and system adopted by the invention in this specific embodiment are shown in FIGS. 1 to 5.
Example: consider a Chinese text database with 5 Chinese document records and 5 feature word items with their weights, i.e. the document set is $\{d_1, d_2, d_3, d_4, d_5\}$ and the feature word set is $\{i_1, i_2, i_3, i_4, i_5\}$ = {program, queue, function, environment, member}. [The weight matrix is not recoverable from the source text.]
The mining method of the invention is applied to this Chinese document data example to mine the matrix-weighted positive and negative association patterns of Chinese feature words. The mining process is as follows (ms = 0.15, mc = 0.3, mFr = 0.3, mNr = 0.12, mi = 0.26, β = 0.1):
1. Mine the matrix-weighted frequent 1-item sets $L_1$ of feature words, as shown in Table 1, where n = 5.
Table 1:
C1 w(C1) mwS(C1)
(i1) 2.8 0.56
(i2) 0.55 0.11
(i3) 2.6 0.52
(i4) 0.92 0.184
(i5) 0.84 0.168
As can be seen from Table 1, L1 = {(i1),(i3),(i4),(i5)},
so the feature word frequent item set collection is mwPIS = {(i1),(i3),(i4),(i5)}.
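The values in Table 1 can be reproduced with a few lines of code. This is a sketch under one assumption: that the support of a 1-item set is its accumulated weight divided by the number of documents n, which is consistent with every row of Table 1. The variable and function names are illustrative only.

```python
n = 5        # number of documents in the example
ms = 0.15    # minimum support threshold

# Accumulated item weights w(C1), copied from Table 1.
weights = {"i1": 2.8, "i2": 0.55, "i3": 2.6, "i4": 0.92, "i5": 0.84}


def support_1(weight, n_docs):
    """mwS for a 1-item set, assumed to be w(C1) / n as Table 1 suggests."""
    return weight / n_docs


supports = {item: support_1(w, n) for item, w in weights.items()}
L1 = [item for item, s in supports.items() if s >= ms]
print(L1)  # ['i1', 'i3', 'i4', 'i5']
```

Item i2 is dropped because its support 0.11 is below ms = 0.15.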
2. Mine the matrix-weighted feature word frequent k-item sets Lk and negative k-item sets Nk, where k ≥ 2.
k=2:
(1) Apply Apriori join to the feature word frequent 1-item set L1 to generate the feature word candidate 2-item sets C2, and compute w(C2) and mwS(C2), as shown in Table 2.
Table 2:
For Table 2, the following operations are performed:
* If mwS(C2) ≥ ms, compute mwFIR(C2), and add the interesting matrix-weighted frequent 2-item sets L2 with mwFIR(C2) ≥ mFr to the frequent item set collection mwPIS, i.e. L2 = {(i1,i3),(i1,i5)}, mwPIS = {(i1),(i3),(i4),(i5),(i1,i3),(i1,i5)}
* If mwS(C2) < ms, compute mwNIR(C2), and add the interesting matrix-weighted negative 2-item sets N2 with mwNIR(C2) ≥ mNr to the negative item set collection mwNIS, i.e. N2 = {(i1,i4),(i3,i5)}, mwNIS = {(i1,i4),(i3,i5)}
k=3:
* Apply Apriori join to L2 to generate the Chinese feature word candidate 3-item set C3, accumulate w(C3) and compute mwS(C3), as shown in Table 3.
Table 3:
C3 w(C3) mwS(C3) mwNIR(C3) (mwS(C3) < ms)
(i1,i3,i5) 1.44 0.096 =0.096/(1-0.56)=0.218
For Table 3, the following operations are performed:
* The proper subsets of (i1,i3,i5) are {(i1),(i3),(i5),(i1,i3),(i1,i5),(i3,i5)}. Among these subsets, the one with the largest support is (i1), with value 0.56. Since mwS(C3) < ms, mwNIR(C3) = 0.096/(1-0.56) = 0.218 ≥ mNr,
that is, N3 = {(i1,i3,i5)}, mwNIS = {(i1,i4),(i3,i5),(i1,i3,i5)}
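The Table 3 arithmetic can be checked numerically. The sketch below rests on two assumptions read off the worked numbers rather than from a stated definition: that the support of a k-item set is w/(n·k), which reproduces 1.44/(5·3) = 0.096, and that mwNIR divides the support by one minus the largest proper-subset support. Function names are hypothetical.

```python
def support_k(weight, n_docs, k):
    """mwS of a k-item set, assumed to be w / (n * k) as Table 3 suggests."""
    return weight / (n_docs * k)


def mw_nir(support, max_subset_support):
    """mwNIR as used in the worked example: mwS(I) / (1 - max subset support)."""
    return support / (1.0 - max_subset_support)


s = support_k(1.44, 5, 3)   # candidate (i1,i3,i5)
nir = mw_nir(s, 0.56)       # largest subset support is mwS(i1) = 0.56
print(round(s, 3), round(nir, 3))  # 0.096 0.218
```

Since 0.218 ≥ mNr = 0.12, the item set (i1,i3,i5) is kept as an interesting negative item set, in agreement with N3 above.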
When k = 4, since L3 is empty, mining of the matrix-weighted feature word frequent k-item sets Lk and negative k-item sets Nk ends, and the procedure moves on to step 3 below. The final mined item sets are: mwPIS = {(i1),(i3),(i4),(i5),(i1,i3),(i1,i5)}, mwNIS = {(i1,i4),(i3,i5),(i1,i3,i5)}
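The candidate generation used in step 2 can be sketched as a plain Apriori join. This is an illustrative sketch: only the join is shown, without the classical subset-pruning step, because the worked example keeps (i1,i3,i5) as a candidate even though its subset (i3,i5) is not frequent. The function name is hypothetical.

```python
from itertools import combinations


def apriori_join(frequent_prev):
    """Join (k-1)-item sets that share their first k-2 items into k-item candidates."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:            # same prefix, different last item
            candidates.append(a + (b[-1],))
    return candidates


L1 = [("i1",), ("i3",), ("i4",), ("i5",)]
L2 = [("i1", "i3"), ("i1", "i5")]
print(len(apriori_join(L1)))  # 6 candidate 2-item sets
print(apriori_join(L2))       # [('i1', 'i3', 'i5')]
print(apriori_join([]))       # [] : with L3 empty, no 4-item candidates exist
```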
3. Mine valid matrix-weighted Chinese feature word positive and negative association rule patterns from the frequent item set collection mwPIS.
Taking the feature word frequent item set (i1,i5) in mwPIS as an example, the mining process for valid matrix-weighted positive and negative association rule patterns is as follows:
The proper subsets of the frequent item set (i1,i5) are {(i1),(i5)}; let I1 = (i1), I2 = (i5).
mwS(I1)=0.56≥ms,mwS(I2)=0.168≥ms,mwS(I1,I2)=0.214
Compute the correlation coefficient mwPCC(I1,I2).
Because mwPCC(I1,I2) > β = 0.1,
(1)
because VMWAR(I1,I2,mc,mi) = 1, the valid matrix-weighted Chinese feature word association rule I1→I2 is obtained, i.e. (i1)→(i5), or (program)→(member).
(2)mwS(﹁I1,﹁I2)=1–0.56–0.168+0.214=0.486,mwS(﹁I1)=1–0.56=0.44
mwS(﹁I2)=1–0.168=0.832
Because the corresponding evaluation (formula not recoverable from the source text) does not hold, the rule ﹁I1→﹁I2 cannot be mined.
In summary, for the Chinese feature word frequent item set (i1,i5), one valid matrix-weighted Chinese feature word association rule pattern can be mined: (i1)→(i5), or (program)→(member) (ms = 0.15, mc = 0.3, mFr = 0.3, mNr = 0.12, mi = 0.26, β = 0.1).
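The supports of negated item sets used in this example follow simple identities on mwS(I1), mwS(I2) and mwS(I1,I2). The sketch below collects them in one hypothetical helper; the dictionary keys are illustrative names, and the identities are those that appear in the worked arithmetic of the text.

```python
def negated_supports(s1, s2, s12):
    """Supports involving negations, derived from mwS(I1), mwS(I2), mwS(I1,I2)."""
    return {
        "not_I1": 1 - s1,                      # mwS(not I1)
        "not_I2": 1 - s2,                      # mwS(not I2)
        "I1_not_I2": s1 - s12,                 # mwS(I1, not I2)
        "not_I1_I2": s2 - s12,                 # mwS(not I1, I2)
        "not_I1_not_I2": 1 - s1 - s2 + s12,    # mwS(not I1, not I2)
    }


# Values for I1 = (i1), I2 = (i5) from the example.
r = negated_supports(0.56, 0.168, 0.214)
print(round(r["not_I1_not_I2"], 3), round(r["not_I1"], 3), round(r["not_I2"], 3))
# 0.486 0.44 0.832
```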
4. Mine valid matrix-weighted Chinese feature word negative association rule patterns from the negative item set collection mwNIS.
Taking the feature word negative item sets (i3,i5) and (i1,i4) in mwNIS as examples, the mining process for valid matrix-weighted Chinese feature word strong negative association rule patterns is as follows:
The proper subsets of the negative item set (i3,i5) are {(i3),(i5)}; let I1 = (i3), I2 = (i5).
mwS(I1)=0.52≥ms,mwS(I2)=0.168≥ms,mwS(I1,I2)=0.084
Compute the correlation coefficient mwPCC(I1,I2).
Because mwPCC(I1,I2) > -β = -0.1, no association rule is mined.
The proper subsets of the negative item set (i1,i4) are {(i1),(i4)}; let I1 = (i1), I2 = (i4).
mwS(I1)=0.56≥ms,mwS(I2)=0.184≥ms,mwS(I1,I2)=0.072
Compute the correlation coefficient mwPCC(I1,I2).
Because mwPCC(I1,I2) = -0.1614 < -β = -0.1,
(1)mwS(I1,﹁I2)=0.56–0.072=0.488,mwS(﹁I2)=1–0.184=0.816,
Because the corresponding evaluation (formula not recoverable from the source text) does not hold, the negative association rule I1→﹁I2 cannot be mined.
(2)mwS(﹁I1,I2)=0.184–0.072=0.112,
Because VMWAR(﹁I1,I2,mc,mi) = 1, the valid negative association rule ﹁I1→I2 is obtained, i.e. ﹁(i1)→(i4), or ﹁(program)→(environment).
In summary, for the Chinese feature word negative item set (i1,i4), the valid matrix-weighted Chinese feature word negative association rule ﹁I1→I2 can be mined, i.e. ﹁(i1)→(i4), or ﹁(program)→(environment) (ms = 0.15, mc = 0.3, mFr = 0.3, mNr = 0.12, mi = 0.26, β = 0.1).
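The correlation values in steps 3 and 4 can be checked with a short script. The formula below is an assumption: it is the usual Pearson (phi) correlation between two item sets expressed through supports, and it is used here because it reproduces the stated value mwPCC = -0.1614 for (i1,i4). The definition given earlier in the patent remains authoritative, and the function name is hypothetical.

```python
import math


def mw_pcc(s1, s2, s12):
    """Assumed form: (s12 - s1*s2) / sqrt(s1*(1-s1)*s2*(1-s2))."""
    return (s12 - s1 * s2) / math.sqrt(s1 * (1 - s1) * s2 * (1 - s2))


beta = 0.1
pairs = {
    "(i1,i5)": (0.56, 0.168, 0.214),   # frequent item set from step 3
    "(i3,i5)": (0.52, 0.168, 0.084),   # negative item set from step 4
    "(i1,i4)": (0.56, 0.184, 0.072),   # negative item set from step 4
}
for name, (s1, s2, s12) in pairs.items():
    pcc = mw_pcc(s1, s2, s12)
    print(name, round(pcc, 4), pcc >= beta, pcc <= -beta)
```

Under this assumed formula, (i1,i5) falls in the positive branch, (i1,i4) in the negative branch, and (i3,i5) in neither, which agrees with the outcomes described in the text.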
The beneficial effects of the present invention are further illustrated by experiments below.
To verify the effectiveness and correctness of the invention, an experimental program was written, and the classical unweighted positive and negative association rule mining algorithm (WU Xin-dong, ZHANG Cheng-qi, ZHANG Shi-chao. Efficient mining of both positive and negative association rules [J]. ACM Transactions on Information Systems, 2004, 22(3): 381-405), referred to here as the PNARMiner algorithm, was chosen as the comparison algorithm. Mining performance was compared and analyzed from the aspects of support threshold variation, combined parameter variation, rule interestingness variation and correlation coefficient variation. In the following tables, the association rules (AR) A→B, A→﹁B, ﹁A→B and ﹁A→﹁B are denoted AR1, AR2, AR3 and AR4 respectively.
The experimental data are 12024 pure Chinese text documents from part of the corpus of the Chinese Web Test collection CWT200g (Chinese Web Test collection with 200GB Web Pages) provided by the Network Laboratory of Peking University. A text database and a feature word item library are obtained through document preprocessing: word segmentation, stop-word removal, feature word extraction and weight calculation. After preprocessing, the experimental document set yields 8751 feature words, whose document frequency df (i.e. the number of documents containing the feature word) ranges from 51 to 11258. Feature words with df values from 1500 to 5838 are extracted to construct the feature word item library used for mining (400 feature words in total). The experimental parameters are: n = 12024, the number of mined items (ItemNumber, ItemNum) is 50, together with ms, mFr, mNr, mc, mi and β; the maximum length of the item sets mined during the experiments is 4.
Experiment one: mining performance with varying support degree threshold
The numbers of candidate item sets (Candidate Itemset, CI), frequent item sets (Frequent Itemset, FI), negative item sets (Negative Itemset, NI) and association rules mined by the two algorithms as the support threshold varies are shown in Tables 1 and 2,
where the experimental parameters are: mc = 0.07, mFr = 0.06, mNr = 0.001, mi = 0.01, β = 0.05.
As Tables 1 and 2 show, the numbers of item sets of each type and of association rules gradually decrease as the support threshold increases. The MWARM-SRCCCI algorithm of this document mines fewer item sets and association rules than the comparison algorithm PNARMiner: the number of item sets is reduced by up to 94.9% and the number of rules by up to 99.7%.
Experiment two: mining performance under combined parameter threshold variation
Because a valid matrix-weighted association rule results from a combined evaluation of the rule's support, confidence, interestingness and correlation coefficient, a combined parameter GP = {ms, mc, mi, β} is used with 7 groups of values: GP1 = {0.03, 0.01, 0.01, 0.05}, GP2 = {0.035, 0.015, 0.015, 0.055}, GP3 = {0.038, 0.02, 0.018, 0.055}, GP4 = {0.04, 0.035, 0.02, 0.065}, GP5 = {0.05, 0.04, 0.03, 0.07}, GP6 = {0.06, 0.07, 0.04, 0.08} and GP7 = {…}. The numbers of association rules mined under these combined parameter settings are shown in the following table.
TABLE 3 comparison of the number of positive and negative association rules mined under combined parameter changes
The experimental results in Table 3 show that the number of association rules of each type gradually decreases as the combined parameter values increase, and that the algorithm of this document mines fewer rules than the comparison algorithm: the number of positive association rule patterns A→B is reduced by 95.36%, and among the negative association rule patterns the largest reduction reaches 93.99% and the smallest 82.85%.
Experiment three: comparison of mining time efficiency
The time taken by the 2 algorithms to mine item sets and association rules under varying support thresholds and varying combined parameters is shown in Tables 4 and 5.
The results in Tables 4 and 5 show that the MWARM-SRCCCI algorithm takes less time to mine item sets and association rules than the comparison algorithm PNARMiner: the mining time is reduced by 51.92% under varying support threshold (mc = 0.07, mFr = 0.06, mNr = 0.001, mi = 0.01, β = 0.05) and by 74.74% under varying combined parameter values, indicating that the mining efficiency of the algorithm is indeed improved.
Experiment four: mining performance under rule interestingness threshold variation
This experiment mainly verifies the effectiveness of the association rule interestingness threshold mi. The numbers of matrix-weighted association rules mined by the algorithm as mi varies are shown in Table 6.
TABLE 6 number of rules mined by the algorithm herein when mi varies
mi AR1 AR2 AR3 AR4
0.01 1320 247 247 10838
0.03 1320 153 153 1096
0.05 1320 70 70 108
0.09 1320 0 0 0
0.80 1130 0 0 0
0.90 468 0 0 0
0.95 34 0 0 0
As can be seen from Table 6, the number of association rules decreases as the interestingness threshold mi increases. The threshold mi therefore has a large effect on the matrix-weighted positive and negative association rule patterns: negative association rules appear only at low values of mi, while positive association rules are affected only at high values of mi.
Experiment five: item set pruning Performance analysis
To verify the effectiveness of the item set pruning strategies proposed here, the pruning performance of the MWARM-SRCCCI algorithm is analyzed experimentally in two cases, variation of the frequent item set relevance threshold mFr and variation of the negative item set relevance threshold mNr. The results are shown in Tables 7 and 8, where mFr = 0 and mNr = 0 correspond to the case without pruning.
Tables 7 and 8 show that more frequent item sets and negative item sets are pruned as mFr and mNr increase, so the pruning effect becomes more pronounced. At the same time, the mNr values are much smaller than the mFr values, which indicates the advantage of setting two separate relevance thresholds.
The above experimental results show that the invention has good mining performance: compared with the existing mining algorithm, the numbers of mined candidate item sets, frequent item sets, negative item sets and positive and negative association rule patterns are greatly reduced, the pruning effect is pronounced, the mining time is greatly reduced, and the mining efficiency is greatly improved.

Claims (4)

1. A correlation-coefficient-based method for mining weighted positive and negative patterns between Chinese words, characterized by comprising the following steps:
(1) Chinese text preprocessing: preprocess the Chinese text information data to be processed by performing word segmentation, removing stop words, extracting feature words and calculating their weights, and construct a text information database and a feature word item library based on the vector space model;
(2) mine the Chinese feature word matrix-weighted frequent 1-item set L1: take the candidate 1-item sets C1 from the item library, accumulate the item set weight of C1, calculate its support mwS(C1), compare it with the minimum support threshold ms, mine the matrix-weighted frequent 1-item set L1 from C1, and add it to the frequent item set collection mwPIS;
(3) mine the interesting Chinese feature word matrix-weighted frequent i-item sets Li and negative i-item sets Ni, where i ≥ 2, comprising the following steps (3.1) to (3.3):
(3.1) apply Apriori join to the frequent (i-1)-item sets Li-1 to generate the candidate i-item sets Ci, accumulate the item set weight of Ci, and calculate its support mwS(Ci);
(3.2) if the support mwS(Ci) of a candidate i-item set Ci is greater than or equal to the minimum support threshold ms, calculate its frequent item set relevance mwFIR(Ci), and add the interesting matrix-weighted frequent i-item sets Li whose relevance mwFIR(Ci) is greater than or equal to the minimum frequent item set relevance threshold mFr to the frequent item set collection mwPIS;
(3.3) if the support mwS(Ci) of a candidate i-item set Ci is less than the minimum support threshold ms, calculate its negative item set relevance mwNIR(Ci), and add the interesting matrix-weighted negative i-item sets Ni whose relevance mwNIR(Ci) is greater than or equal to the minimum negative item set relevance threshold mNr to the negative item set collection mwNIS;
(4) mine valid Chinese feature word matrix-weighted positive and negative association rule patterns from the Chinese feature word frequent item set collection mwPIS, comprising the following steps (4.1) to (4.6):
(4.1) take a feature word frequent item set Li from the Chinese feature word frequent item set collection mwPIS and find all proper subsets of Li;
(4.2) take any two proper subsets I1 and I2 from the proper subset collection of Li; when the supports mwS(I1) and mwS(I2) of I1 and I2 are both greater than or equal to ms, and I1∪I2 = Li, calculate the correlation coefficient mwPCC(I1,I2) of the matrix-weighted frequent item set (I1,I2);
(4.3) when the correlation coefficient mwPCC(I1,I2) of the matrix-weighted frequent item set (I1,I2) is greater than or equal to the correlation coefficient threshold β, i.e. mwPCC(I1,I2) ≥ β: calculate the evaluation value VMWAR(I1,I2,mc,mi) of the valid matrix-weighted association rule I1→I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong association rule I1→I2 and add it to the matrix-weighted positive association rule set mwPAR; calculate the evaluation value VMWAR(﹁I1,﹁I2,mc,mi) of the valid matrix-weighted negative association rule ﹁I1→﹁I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule ﹁I1→﹁I2 and add it to the matrix-weighted negative association rule set mwNAR;
(4.4) when the correlation coefficient mwPCC(I1,I2) of the matrix-weighted item set (I1,I2) is less than or equal to -β, i.e. mwPCC(I1,I2) ≤ -β: calculate the evaluation value VMWAR(I1,﹁I2,mc,mi) of the valid matrix-weighted negative association rule I1→﹁I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule I1→﹁I2 and add it to the matrix-weighted negative association rule set mwNAR; calculate the evaluation value VMWAR(﹁I1,I2,mc,mi) of the valid matrix-weighted negative association rule ﹁I1→I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule ﹁I1→I2 and add it to the matrix-weighted negative association rule set mwNAR;
(4.5) continue step (4.2) until every proper subset in the proper subset collection of the feature word frequent item set Li has been taken out once and only once, then go to step (4.6);
(4.6) continue step (4.1) until every frequent item set Li in the feature word frequent item set collection has been taken out once and only once, at which point step (4) ends and the procedure goes to step (5);
(5) mine valid Chinese feature word matrix-weighted negative association rule patterns from the negative item set collection mwNIS, comprising the following steps (5.1) to (5.6):
(5.1) take a feature word negative item set Ni from the Chinese feature word negative item set collection mwNIS and find all proper subsets of Ni;
(5.2) take any two proper subsets I1 and I2 from the proper subset collection of Ni; when the supports mwS(I1) and mwS(I2) of I1 and I2 are both greater than or equal to the minimum support threshold ms, and I1∪I2 = Ni, calculate the correlation coefficient mwPCC(I1,I2) of the matrix-weighted negative item set (I1,I2);
(5.3) when the correlation coefficient mwPCC(I1,I2) of the matrix-weighted negative item set (I1,I2) is greater than or equal to the correlation coefficient threshold β, i.e. mwPCC(I1,I2) ≥ β: calculate the evaluation value VMWAR(﹁I1,﹁I2,mc,mi) of the valid matrix-weighted negative association rule ﹁I1→﹁I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule ﹁I1→﹁I2 and add it to the matrix-weighted negative association rule set mwNAR;
(5.4) when the correlation coefficient mwPCC(I1,I2) of the matrix-weighted negative item set (I1,I2) is less than or equal to -β, i.e. mwPCC(I1,I2) ≤ -β: calculate the evaluation value VMWAR(I1,﹁I2,mc,mi) of the valid matrix-weighted negative association rule I1→﹁I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule I1→﹁I2 and add it to the matrix-weighted negative association rule set mwNAR; calculate the evaluation value VMWAR(﹁I1,I2,mc,mi) of the valid matrix-weighted negative association rule ﹁I1→I2, and if its value equals 1, obtain the matrix-weighted Chinese feature word strong negative association rule ﹁I1→I2 and add it to the matrix-weighted negative association rule set mwNAR;
(5.5) continue step (5.2) until every proper subset in the proper subset collection of the feature word negative item set Ni has been taken out once and only once, then go to step (5.6);
(5.6) continue step (5.1) until every negative item set Ni in the feature word negative item set collection has been taken out once and only once, at which point step (5) ends;
at this point, mining of the matrix-weighted Chinese feature word positive and negative patterns is finished; ms is the minimum support threshold, mc is the minimum confidence threshold, mi is the minimum interestingness threshold, and β is the correlation coefficient threshold.
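The subset-pair enumeration of steps (4.2) and (5.2) and the branching of steps (4.3), (4.4), (5.3) and (5.4) can be sketched as follows. This is an illustration, not the claimed procedure itself: it assumes that I1 and I2 are disjoint as well as covering the item set, a condition that is not legible in the text; the function names and rule labels are hypothetical; and each listed candidate would still have to pass the VMWAR evaluation, which is not modelled here.

```python
from itertools import combinations


def split_pairs(item_set):
    """Ordered pairs (I1, I2) of non-empty, disjoint subsets with I1 ∪ I2 = item_set."""
    items = tuple(item_set)
    pairs = []
    for r in range(1, len(items)):
        for left in combinations(items, r):
            right = tuple(i for i in items if i not in left)
            pairs.append((left, right))
    return pairs


def candidate_rules(pcc, beta, from_negative_set=False):
    """Rule forms to evaluate for a pair, chosen by the correlation coefficient."""
    if pcc >= beta:
        if from_negative_set:
            return ["not I1 -> not I2"]              # step (5.3)
        return ["I1 -> I2", "not I1 -> not I2"]      # step (4.3)
    if pcc <= -beta:
        return ["I1 -> not I2", "not I1 -> I2"]      # steps (4.4) and (5.4)
    return []                                        # no rule form is evaluated


print(split_pairs(("i1", "i5")))
print(candidate_rules(-0.1614, 0.1, from_negative_set=True))
```

For the negative item set (i1,i4) of the worked example, the second call lists the two negative rule forms that the text then checks one by one.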
2. A mining system suitable for the correlation-coefficient-based method for mining weighted positive and negative patterns between Chinese words of claim 1, comprising the following 4 modules:
the Chinese text information preprocessing module: performs word segmentation and stop-word removal on the Chinese text information to be processed, extracts feature words and calculates their weights, and constructs a Chinese text information base and a feature word item library;
the Chinese feature word candidate item set generation module: this module first mines the matrix-weighted Chinese feature word candidate 1-item sets C1 from the feature word item library and the Chinese text information base, and calculates their support mwS(C1) to obtain the Chinese feature word frequent 1-item set L1; the Chinese feature word candidate i-item sets Ci, where i ≥ 2, are generated from the frequent (i-1)-item sets Li-1 by Apriori join;
the Chinese feature word frequent item set and negative item set generation module: this module calculates the support mwS(Ci) of the Chinese feature word candidate i-item sets Ci and compares it with the minimum support threshold ms to obtain the Chinese feature word frequent i-item sets Li and negative i-item sets Ni; it calculates the relevance of the frequent item sets and compares it with the frequent item set relevance threshold to obtain the interesting Chinese feature word frequent item sets; it calculates the relevance of the negative item sets and compares it with the negative item set relevance threshold to obtain the interesting Chinese feature word negative item sets;
the Chinese feature word positive and negative association rule generation and result display module: this module first generates the proper subsets of the Chinese feature word frequent item sets, calculates the correlation coefficient, interestingness and confidence of the Chinese feature word association rule patterns, compares them with the correlation coefficient threshold, interestingness threshold and confidence threshold, and mines valid matrix-weighted Chinese feature word strong positive and negative association rule patterns from the frequent item sets; it then generates the proper subsets of the Chinese feature word negative item sets, calculates the correlation coefficient, interestingness and confidence of the Chinese feature word negative association rule patterns, compares them with the correlation coefficient threshold, interestingness threshold and confidence threshold, and mines valid matrix-weighted Chinese feature word strong negative association rule patterns from the negative item sets; finally, it displays the valid matrix-weighted Chinese feature word positive and negative association rule patterns to the user in the required form for analysis and use.
3. The mining system of claim 2, wherein the Chinese feature word frequent item set and negative item set generation module comprises the following 2 modules:
a characteristic word frequent item set generation module: the module calculates the support degree of the candidate item set of the Chinese characteristic words, compares the support degree with a support degree threshold value to obtain a frequent item set, calculates the association degree of the frequent item set, and compares the association degree with the association degree threshold value to obtain an interesting matrix weighted Chinese characteristic word frequent item set;
the feature word negative item set generation module: the module calculates the support degree of the candidate item set of the Chinese characteristic words, compares the support degree with a support degree threshold value to obtain a negative item set, calculates the association degree of the negative item set, and compares the association degree with an association degree threshold value to obtain an interesting matrix weighted negative item set of the Chinese characteristic words.
4. The mining system of claim 2, wherein the Chinese feature word positive-negative association rule generation and result display module comprises the following 3 modules:
a strong positive and negative association rule generation module from a frequent item set: the module generates a proper subset of a Chinese feature word frequent item set, calculates a correlation coefficient, an interest degree and a confidence degree of the Chinese feature word association rule mode, compares the correlation coefficient, the interest degree threshold and the confidence degree threshold, and mines an effective matrix weighting Chinese feature word strong positive and negative association rule mode from the frequent item set;
a strong negative association rule generation module from the negative set of terms: the module generates a proper subset of a negative term set of the Chinese characteristic words, calculates correlation coefficients, interestingness and confidence degrees of the negative association rule mode of the Chinese characteristic words, compares the correlation coefficients, interestingness thresholds and confidence degree thresholds with the confidence degree threshold, and mines an effective matrix weighting strong negative association rule mode of the Chinese characteristic words from the negative term set;
the strong positive and negative association rule display module of the feature words: the module displays the effective matrix weighted Chinese feature word positive and negative association rule mode to the user in a required form for analysis and use by the user.
CN201410483377.7A 2014-09-22 2014-09-22 Positive and negative mode excavation method and system are weighted between the Chinese word based on coefficient correlation Expired - Fee Related CN104216874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410483377.7A CN104216874B (en) 2014-09-22 2014-09-22 Positive and negative mode excavation method and system are weighted between the Chinese word based on coefficient correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410483377.7A CN104216874B (en) 2014-09-22 2014-09-22 Positive and negative mode excavation method and system are weighted between the Chinese word based on coefficient correlation

Publications (2)

Publication Number Publication Date
CN104216874A CN104216874A (en) 2014-12-17
CN104216874B true CN104216874B (en) 2017-03-29

Family

ID=52098380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410483377.7A Expired - Fee Related CN104216874B (en) 2014-09-22 2014-09-22 Positive and negative mode excavation method and system are weighted between the Chinese word based on coefficient correlation

Country Status (1)

Country Link
CN (1) CN104216874B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760266B (en) * 2016-02-24 2019-09-03 深圳芯邦科技股份有限公司 A kind of mobile device capacity check method based on Nand Flash memory
CN105930358B (en) * 2016-04-08 2019-06-04 南方电网科学研究院有限责任公司 Case retrieval method and system based on relevance
CN105955950A (en) * 2016-04-29 2016-09-21 乐视控股(北京)有限公司 New word discovery method and device
CN106484781B (en) * 2016-09-18 2019-03-15 广西财经学院 Merge the Indonesia's Chinese cross-language retrieval method and system of association mode and user feedback
CN107562904B (en) * 2017-09-08 2019-07-09 广西财经学院 Positive and negative association mode method for digging is weighted between fusion item weight and the English words of frequency
CN107526839B (en) * 2017-09-08 2019-09-10 广西财经学院 Consequent extended method is translated across language inquiry based on weight positive negative mode completely
CN107609095B (en) * 2017-09-08 2019-07-09 广西财经学院 Based on across the language inquiry extended method for weighting positive and negative regular former piece and relevant feedback
CN108416442B (en) * 2017-12-26 2021-10-29 广西财经学院 Chinese word matrix weighting association rule mining method based on item frequency and weight
CN109446410A (en) * 2018-09-19 2019-03-08 平安科技(深圳)有限公司 Knowledge point method for pushing, device and computer readable storage medium
CN111680501B (en) * 2020-08-12 2020-11-20 腾讯科技(深圳)有限公司 Query information identification method and device based on deep learning and storage medium

Also Published As

Publication number Publication date
CN104216874A (en) 2014-12-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160325

Address after: Nanning City, 530003 West Road Mingxiu the Guangxi Zhuang Autonomous Region No. 100

Applicant after: Guangxi Finance and Economics Institute

Address before: Nanning City, the Guangxi Zhuang Autonomous Region Qingxiu District JianZheng Road No. 37 530023

Applicant before: Guangxi College of Education

CB03 Change of inventor or designer information

Inventor after: Huang Mingxuan

Inventor before: Huang Mingxuan

Inventor before: Lan Huihong

COR Change of bibliographic data
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170329

Termination date: 20170922