CN104182527A - Partial-sequence itemset based Chinese-English text word association rule mining method and system - Google Patents
- Publication number
- CN104182527A CN104182527A CN201410427491.8A CN201410427491A CN104182527A CN 104182527 A CN104182527 A CN 104182527A CN 201410427491 A CN201410427491 A CN 201410427491A CN 104182527 A CN104182527 A CN 104182527A
- Authority
- CN
- China
- Prior art keywords
- itemset
- item
- partial order
- candidate
- feature words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a partial-order-itemset-based method and system for mining association rules among Chinese-English text feature words. A text information preprocessing module preprocesses the texts to build a text information database and a feature word item base. A feature word frequent partial-order itemset module mines feature word candidate itemsets and derives their partial-order itemsets; the candidate partial-order itemsets are pruned by a new itemset pruning method, their weights are calculated, and their supports are computed by a new calculation method so as to obtain the frequent partial-order itemsets.
Description
Technical field
The invention belongs to the field of data mining, and specifically relates to a method and system for mining association rules among Chinese-English text feature words based on partial-order itemsets. It is applicable to the discovery of feature word association patterns in Chinese-English text mining, and to fields such as query expansion in Chinese-English document information retrieval and Chinese-English cross-language information retrieval.
Background technology
Over the past twenty-odd years, research on association rule mining has produced significant technical achievements, concentrated in two areas: mining based on item frequency and mining based on item weights.
Mining based on item frequency, also called unweighted association rule mining, treats all items equally and uses the probability of an itemset occurring in the transactions, and the corresponding conditional probability, as the itemset support and the rule confidence, respectively. The most representative classical approach is the Apriori method (R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large databases // Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington D.C., 1993, (5): 207-216). On this basis, scholars have adopted diverse approaches and improved the Apriori method from many different angles.
Although frequency-based mining has been widely studied, it has the following defect: it considers only item frequency and ignores item weights, which causes many barren, redundant, and invalid association patterns to be produced. To address this problem, weighted association pattern mining based on item weights has been extensively discussed and studied; its characteristic is the introduction of weights to express that different items have different importance within a transaction record. According to the source of the item weights, weight-based mining divides into two classes: mining with fixed item weights and all-weighted mining with item weights that vary across transactions.
Mining with fixed item weights is the earlier of the two weight-based approaches; since 1998 it has attracted the attention and deep study of numerous scholars. Its characteristic is that item weights are set by users or domain experts and remain constant throughout the mining process. Typical algorithms are the weighted association rule mining algorithms MINWAL(O) and MINWAL(W) proposed by Cai et al. (C. H. Cai, A. W. C. Fu, et al. Mining Association Rules with Weighted Items // Proceedings of the IEEE International Database Engineering and Applications Symposium, 1998: 68-77). Improved weighted pattern mining methods built on this basis achieve good results in both mining efficiency and mining performance.
The limitation of fixed-weight mining is that it does not consider item weights that change from one transaction record to another; it ignores the situation of varying weights and cannot solve mining problems over data with this characteristic. Data whose item weights vary are conventionally called all-weighted data, also known as matrix-weighted data. Text information is typical all-weighted data: in massive text collections, feature word weights depend on each document and change from document to document. All-weighted association mining overcomes the defect of fixed-weight mining; it targets the various association patterns of data whose item weights vary, belongs to the class of mining with varying item weights, and its principal feature is that item weights depend on the transaction and change dynamically. Typical all-weighted association mining methods are the algorithm KWEstimate for mining all-weighted association rules in the vector space model proposed by Tan et al. in 2003 (Tan Yihong, Lin Yaping. Mining all-weighted association rules in the vector space model. Computer Engineering and Applications, 2003 (13): 208-211) and the query-expansion-oriented matrix-weighted association rule mining algorithm MWARM (Huang Mingxuan, Yan Xiaowei, Zhang Shichao. Pseudo-relevance feedback query expansion based on matrix-weighted association rule mining. Journal of Software, 2009, 20 (7): 1854-1865). These methods achieve good results in mining association patterns from all-weighted data, and have been successfully applied to query expansion in information retrieval (Huang Mingxuan, Yan Xiaowei, Zhang Shichao. All-weighted association rule mining and its application in query expansion. Application Research of Computers, 2008, 25 (6): 1724-1727), with significant effect. The defect of the existing methods based on varying weights is that the number of association patterns they mine is still very large, which increases the difficulty for users of selecting the patterns they need; many barren, false, and invalid association patterns remain, making the techniques difficult to put into practical application.
With the development of network and information technology, all-weighted data (such as network text information) are growing rapidly into massive data. How to mine useful association patterns that better approximate reality from such massive all-weighted data is a problem demanding prompt solution. Mining algorithms based on fixed item weights are not applicable to all-weighted data, and at present most work still applies frequency-based mining methods to these data, causing many barren, redundant, and invalid association patterns to be produced. In view of the above problems, and according to the characteristics of Chinese and English document data, the present invention proposes a new method and system for mining association rules among Chinese-English text feature words based on partial-order itemsets. The invention adopts a new partial-order itemset support computation method and pruning technique, avoiding the generation of many invalid, false, and barren association patterns, greatly improving Chinese-English text mining efficiency, and producing feature word association rule patterns that better approximate reality. Experimental results show that both the number of feature word association patterns mined by the proposed text mining method and the mining time decrease obviously; its mining performance is better than existing all-weighted pattern mining methods and frequency-based pattern mining methods; and its feature word association patterns can provide a reliable source of query expansion words for information retrieval. The method therefore has important application value and broad application prospects in fields such as text mining and information retrieval.
Summary of the invention
The technical problem to be solved by this invention is, through deep study of Chinese-English text feature word association pattern mining, to propose a method and system for mining association rules among Chinese-English text feature words based on partial-order itemsets, improving Chinese-English text mining efficiency. Applied to query expansion in Chinese-English document information retrieval, it can improve retrieval performance; applied to Chinese-English text mining, it can discover more realistic and reasonable Chinese-English feature word association patterns, thereby improving the precision of text clustering and classification. For example, in search engines (Baidu, Google, etc.), the inventive method can obtain high-quality expansion words to realize query expansion for users, improving recall and precision.
The technical scheme adopted by the present invention to solve the above technical problems is a method for mining association rules among Chinese-English text feature words based on partial-order itemsets, comprising the following steps:
(1) Chinese-English text data preprocessing: preprocess the pending Chinese and English text data, i.e. Chinese word segmentation, English stemming, stop word removal, and feature word extraction and weight computation, and build the text information database and the feature word item base on the vector space model.
The Porter stemmer (see http://tartarus.org/~martin/PorterStemmer) is adopted as the English document stemming program; the Chinese word segmentation program is the ICTCLAS Chinese word segmentation system developed by the Institute of Computing Technology, Chinese Academy of Sciences (see http://www.ictclas.org/).
The text feature word weight formula is:

w_ij = (1 + ln(tf_ij)) × idf_i,

where w_ij is the weight of the i-th feature word in the j-th document; idf_i is the inverse document frequency of the i-th feature word, with idf_i = log(n / df_i); n is the total number of documents in the document set; df_i is the number of documents containing the i-th feature word; and tf_ij is the frequency of the i-th feature word in the j-th document.
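The weight formula above can be sketched in Python. This is a minimal sketch, assuming (since the patent leaves it unspecified) that the idf logarithm is base 10; the function name and parameters are illustrative, not from the patent:

```python
import math

def feature_word_weight(tf_ij: int, df_i: int, n_docs: int) -> float:
    """w_ij = (1 + ln(tf_ij)) * idf_i, with idf_i = log(n_docs / df_i).

    tf_ij:  frequency of feature word i in document j (assumed > 0)
    df_i:   number of documents containing feature word i
    n_docs: total number of documents in the document set
    """
    idf_i = math.log10(n_docs / df_i)   # base-10 log assumed for idf
    return (1 + math.log(tf_ij)) * idf_i

# e.g. a term occurring 3 times in a document, appearing in 10 of 1000 documents:
w = feature_word_weight(3, 10, 1000)
```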
(2) Mine the all-weighted feature word frequent partial-order itemsets, comprising the following steps 2.1 and 2.2:

2.1. Mine the all-weighted feature word frequent 1-itemsets L1, proceeding through steps 2.1.1 to 2.1.3:

2.1.1. Extract the feature word candidate 1-itemsets C1 from the feature word item base; accumulate the weights of all items in the text information database to obtain the total item weight sum W; accumulate the weight total w(C1) of each C1 over the text information database; and compute the support poisup(C1) of C1. The formula for poisup(C1) is:

poisup(C1) = w(C1) / W.

2.1.2. Add each feature word candidate 1-itemset C1 whose support satisfies poisup(C1) ≥ ms to the frequent 1-itemsets L1, joining them to the feature word frequent itemset set FIS, where ms is the minimum support threshold.

2.1.3. Accumulate the occurrence frequency n_C1 of each candidate 1-itemset C1 over the text information database, extract w_r(C1), and compute the partial-order itemset weight expectation POIWB(C1, 2) of C1. The formula for POIWB(C1, 2) is:

POIWB(C1, 2) = 2 × W × ms − n_C1 × w_r(C1),

where w_r(C1) is the largest item weight among the items not belonging to the item set of C1.
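The weight expectation POIWB just defined is a single arithmetic expression; the following minimal sketch (with illustrative names, not from the patent) computes the general bound k_next × W × ms − n × w_r that steps 2.1.3, 2.2.1 and 2.2.5 all instantiate:

```python
def poiwb(k_next: int, W: float, ms: float, n_c: int, w_r: float) -> float:
    """Weight expectation POIWB(C, k_next) = k_next * W * ms - n_c * w_r.

    k_next: size of the extended itemsets to be generated from candidate C
    W:      sum of all item weights in the text information database
    ms:     minimum support threshold
    n_c:    occurrence frequency of C in the database
    w_r:    largest item weight among items NOT in C

    A candidate C can only grow into a frequent k_next-itemset when
    w(C) >= poiwb(...), which is what the pruning steps exploit.
    """
    return k_next * W * ms - n_c * w_r

bound = poiwb(2, 100.0, 0.1, 5, 0.9)   # 2*100*0.1 - 5*0.9 = 15.5
```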
2.2. Mine the all-weighted feature word frequent k-itemsets Lk, where k ≥ 2, proceeding through steps 2.2.1 to 2.2.12:

2.2.1. For the candidate (k−1)-itemsets C_{k−1}, prune every C_{k−1} with w(C_{k−1}) < POIWB(C_{k−1}, k), which cannot grow into a frequent k-itemset, obtaining a new candidate C_{k−1} set. (Pruning 1)

Here w(C_{k−1}) is the weight total of C_{k−1} over the text information database, and POIWB(C_{k−1}, k) is the k-itemset weight expectation for itemsets containing the all-weighted candidate (k−1)-itemset C_{k−1}, computed as:

POIWB(C_{k−1}, k) = k × W × ms − n_{k−1} × w_r,

where n_{k−1} is the occurrence frequency of the candidate C_{k−1} in the text information database, and w_r is the largest item weight among the items not belonging to the item set of C_{k−1}.
2.2.2. Perform the Apriori join on the feature word candidate (k−1)-itemsets C_{k−1} whose itemset frequency is not 0, generating the feature word candidate k-itemsets C_k.

2.2.3. If C_k is empty, exit step 2.2 and proceed to step (3); otherwise, if C_k is not empty, proceed to step 2.2.4.

2.2.4. For each candidate k-itemset C_k, examine every (k−1)-subset of C_k. If there is a (k−1)-subset whose itemset weight is less than its corresponding partial-order itemset weight expectation (w_{k−1} < POIWB(C_{k−1}, k)), then C_k must be a non-frequent itemset; delete it from the set, obtaining a new candidate partial-order itemset poC_k set. (Pruning 2)

2.2.5. Accumulate, over the text information database, the occurrence frequency n_Ck of each candidate k-itemset C_k and its item weights w_1(C_k), w_2(C_k), ..., w_k(C_k); extract w_r(C_k); and compute the weight expectation POIWB(C_k, k+1) of C_k. The formula for POIWB(C_k, k+1) is:

POIWB(C_k, k+1) = (k+1) × W × ms − n_Ck × w_r(C_k).

2.2.6. Delete the candidate k-itemsets C_k whose itemset frequency is 0, obtaining a new C_k set. (Pruning 3)

2.2.7. Derive the partial-order itemset poC_k of each C_k.

2.2.8. Examine the high-order proper subsets of each partial-order itemset poC_k. If a high-order proper subset of poC_k is non-frequent, then poC_k must be non-frequent; delete it from the set, obtaining a new candidate partial-order itemset poC_k set. (Pruning 4)

2.2.9. Examine the item weight of the high-weight item of each partial-order itemset poC_k. If the weight of the high-weight item of poC_k is less than the minimum weight threshold minw of 1-itemsets, then poC_k must be non-frequent; delete it from the set, obtaining a new candidate partial-order itemset poC_k set. The formula for minw is minw = W × ms. (Pruning 5)

2.2.10. Examine the low-weight item of each partial-order itemset poC_k. If the item weight of the low-weight item of poC_k is not less than minw, then poC_k must be frequent; join it to the feature word frequent itemset set FIS.

2.2.11. For each remaining partial-order itemset poC_k, compute its support poisup(poC_k). If poisup(poC_k) ≥ ms, the partial-order itemset poC_k is frequent; join it to the feature word frequent itemset set FIS. The formula for poisup(poC_k) is:

poisup(poC_k) = w(poC_k) / (k × W),

where w(poC_k) is the weight total of the partial-order itemset poC_k over the text information database, and k is the number of items of the feature word partial-order itemset poC_k.

2.2.12. Increase the value of k by 1 and repeat steps 2.2.1 to 2.2.12 until C_k is empty; then exit step 2.2 and proceed to step (3) below.
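The support computation of step 2.2.11 can be sketched as follows, representing each document as a map from feature words to weights. This is a minimal sketch: the two-document database and all names are illustrative, not from the patent.

```python
def itemset_weight(itemset, records):
    """Weight total w(poC_k): the itemset's item weights summed over every
    record (document) in which all of its items occur (weight > 0)."""
    total = 0.0
    for rec in records:                      # rec maps item -> weight in that record
        if all(rec.get(i, 0.0) > 0 for i in itemset):
            total += sum(rec[i] for i in itemset)
    return total

def poisup(itemset, records, W):
    """Support poisup(poC_k) = w(poC_k) / (k * W), with k the item count."""
    return itemset_weight(itemset, records) / (len(itemset) * W)

# Hypothetical two-document database; W is the sum of all weights (0.5+0.5+1.0).
docs = [{"a": 0.5, "b": 0.5}, {"a": 1.0}]
s = poisup(("a", "b"), docs, W=2.0)   # w(a,b) = 1.0, so s = 1.0 / (2 * 2.0) = 0.25
```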
(3) Mine the effective all-weighted feature word strong association rule patterns from the feature word frequent itemset set FIS, comprising the following steps:

3.1. Take a feature word frequent itemset L_i from the feature word frequent itemset set FIS and generate all proper subsets of L_i.

3.2. Take any two proper subsets I_1 and I_2 from the proper subset set of L_i such that I_1 ∩ I_2 = ∅ and I_1 ∪ I_2 = L_i. If w_12 ≥ (k_12 / k_1) × w_1 × mc, mine the feature word strong association rule I_1 → I_2; if w_12 ≥ (k_12 / k_2) × w_2 × mc, mine the feature word strong association rule I_2 → I_1. Here k_1, k_2, and k_12 are the item counts of the itemsets I_1, I_2, and (I_1, I_2), respectively; w_1, w_2, and w_12 are the itemset weights of I_1, I_2, and (I_1, I_2), respectively; and mc is the minimum confidence threshold.

3.3. Repeat step 3.2 until every proper subset in the proper subset set of the feature word frequent itemset L_i has been taken out once, and only once; then proceed to step 3.4.

3.4. Repeat step 3.1 until every frequent itemset L_i in the feature word frequent itemset set has been taken out once, and only once; then step (3) ends.

At this point, all-weighted feature word association rule pattern mining ends.
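The confidence test of step 3.2 reduces to one comparison per candidate rule; a minimal sketch with illustrative names and made-up numbers (not from the patent):

```python
def strong_rule(w_union: float, k_union: int, k_ante: int,
                w_ante: float, mc: float) -> bool:
    """Step-3.2 test: the rule I1 -> I2 is strong when
    w_12 >= (k_12 / k_1) * w_1 * mc,
    where w_union/w_ante are itemset weights, k_union/k_ante item counts,
    and mc is the minimum confidence threshold."""
    return w_union >= (k_union / k_ante) * w_ante * mc

# e.g. a frequent 3-itemset split into a 1-item antecedent and 2-item consequent:
ok = strong_rule(w_union=4.0, k_union=3, k_ante=1, w_ante=2.0, mc=0.5)  # 4.0 >= 3.0
```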
A mining system suitable for the above method for mining association rules among Chinese-English text feature words based on partial-order itemsets is characterized by comprising the following four modules:

Text information preprocessing module: preprocesses the pending Chinese and English text data, i.e. Chinese word segmentation, English stemming, stop word removal, and feature word extraction and weight computation, and builds the text information database and the feature word item base on the vector space model.

Feature word frequent partial-order itemset generation module: mines all-weighted feature word candidate partial-order itemsets from the text information database, adopts the new pruning method to prune the candidate partial-order itemsets to obtain the final candidate partial-order itemsets, and derives the all-weighted feature word frequent partial-order itemset patterns from the candidate partial-order itemsets by the new partial-order itemset support computation method.

All-weighted feature word association rule generation module: through simple computation and comparison of itemset weights and dimensions, mines the effective all-weighted feature word association rule patterns I_1 → I_2 from the all-weighted feature word frequent partial-order itemsets (I_1, I_2).

Association rule pattern result display module: displays the effective all-weighted feature word association rule patterns to the user in the form the user prefers, for the user's analysis, selection, and use.

The text information preprocessing module comprises the following two modules:

Chinese-English text preprocessing module: responsible for segmenting Chinese text information and removing Chinese stop words, and for stemming English text information and removing English stop words, among other Chinese-English corpus preprocessing.

Text database and item base building module: mainly extracts Chinese and English feature words and computes their weights, and builds the text information database and the Chinese-English feature word item base on the vector space model.
The feature word frequent partial-order itemset generation module comprises the following three modules:

Feature word candidate partial-order itemset generation module: mainly mines feature word candidate partial-order itemsets from the text information database, as follows: extract the candidate 1-itemsets from the feature word item base, accumulate the weight totals of the candidate 1-itemsets over the text information database, compute their supports, and derive the all-weighted feature word frequent 1-itemsets; then generate the feature word candidate k-itemsets from the all-weighted feature word frequent (k−1)-itemsets by the Apriori join, where k ≥ 2; accumulate the item weights of each item of the feature word candidate k-itemsets over the text information database, and derive the all-weighted feature word candidate partial-order k-itemsets.

Feature word candidate partial-order itemset pruning module: uses the pruning method of the present invention to prune the all-weighted feature word candidate partial-order k-itemsets, deleting the candidate partial-order k-itemsets that cannot be frequent and obtaining the final set of possibly frequent candidate partial-order k-itemsets.

Feature word frequent partial-order itemset generation module: mainly mines the final candidate partial-order k-itemsets obtained after pruning by the above module, computes the supports of the candidate partial-order k-itemsets using the support computation method of the present invention, compares them with the minimum support threshold, and derives the all-weighted feature word frequent partial-order k-itemsets.

The all-weighted feature word association rule generation module comprises the following two modules:

Proper subset generation module of the feature word frequent partial-order itemsets: mainly finds all proper subsets of the feature word frequent partial-order itemsets, and obtains the itemset weight and dimension of each proper subset.

All-weighted feature word association rule generation module: through simple computation and comparison of itemset weights, mines the effective all-weighted feature word strong association rule patterns from the feature word frequent partial-order itemsets.

The minimum support threshold ms and the minimum confidence threshold mc in the mining system are input by the user.
Compared with the prior art, the present invention has the following beneficial effects:

(1) The present invention first proposes the concept of a Chinese-English all-weighted feature word partial-order itemset, a new support computation method for all-weighted feature word partial-order itemsets, and a partial-order itemset pruning method, and on this basis proposes a method and system for mining association rules among Chinese-English text feature words based on partial-order itemsets. The new partial-order itemset support computation method and pruning technique avoid the generation of many invalid, false, and barren association patterns, greatly improve mining efficiency, and yield association patterns that better approximate reality. Compared with existing mining methods, the present invention has a good pruning effect: its association pattern count and mining time both decrease obviously, and its mining performance is better than existing all-weighted pattern mining and frequency-based pattern mining methods. It improves the efficiency of mining Chinese-English feature word association rule patterns and obtains realistic association patterns among text words, with high application value and broad application prospects in fields such as text mining and information retrieval. For example, in search engines (Baidu, Google, etc.), the inventive method can obtain high-quality expansion words to realize query expansion for users, improving recall and precision.

(2) Taking the domestic Chinese standard data set CWT200g and the international standard English data set NTCIR-5 as experimental data, the present invention was tested, compared, and analyzed against the traditional frequency-based pattern mining method and the all-weighted pattern mining method. The experimental results show that, whether the support threshold or the confidence threshold varies, the number of candidates mined by the present invention is smaller than that of the comparison methods, and its mining time is shorter than that of the comparison methods, with a large reduction; mining efficiency is greatly improved.
Brief description of the drawings
Fig. 1 is the block diagram of the method for mining association rules among Chinese-English text feature words based on partial-order itemsets of the present invention.
Fig. 2 is the overall flow chart of the method for mining association rules among Chinese-English text feature words based on partial-order itemsets of the present invention.
Fig. 3 is the structural block diagram of the system for mining association rules among Chinese-English text feature words based on partial-order itemsets of the present invention.
Fig. 4 is the structural block diagram of the text information preprocessing module of the present invention.
Fig. 5 is the structural block diagram of the feature word frequent partial-order itemset generation module of the present invention.
Fig. 6 is the structural block diagram of the all-weighted feature word association rule generation module of the present invention.
Embodiment
To better explain the technical scheme of the present invention, the Chinese-English text data model and the related concepts involved in the present invention are described below.

1. Basic concepts

Definition 1 (Chinese-English text data model): Chinese-English text data belong to the all-weighted data whose item weights vary. Its data model DWDM (Dynamic Weighted Data Model) consists of the set of transaction records TR (Transaction Record), the set of feature word items IS (Item Set), and the correspondence set IW (Item Weight) among feature word items, transaction records, and weights, and can be formalized as in formula (1):

DWDM = (TR, IS, IW)    (1)

where TR = {r_1, r_2, ..., r_n}, with r_i (1 ≤ i ≤ n) the i-th transaction record in DWDM, and IS = {i_1, i_2, ..., i_m}, with i_j (1 ≤ j ≤ m) the j-th feature word item in DWDM.
IW = {<i_1, r_1, w[r_1][i_1]>, <i_2, r_1, w[r_1][i_2]>, ..., <i_m, r_1, w[r_1][i_m]>, <i_1, r_2, w[r_2][i_1]>, <i_2, r_2, w[r_2][i_2]>, ..., <i_m, r_2, w[r_2][i_m]>, ..., <i_1, r_n, w[r_n][i_1]>, <i_2, r_n, w[r_n][i_2]>, ..., <i_m, r_n, w[r_n][i_m]>}.
In the set IW, w[r_i][i_j] (1 ≤ i ≤ n, 1 ≤ j ≤ m) is the weight of item i_j in transaction record r_i; if feature word item i_j does not appear in transaction record r_i, then w[r_i][i_j] = 0.
Example: an all-weighted data example of Chinese text data (Text data) is as follows: Text = (TR, IS, IW), where TR = {r_1, r_2, r_3, r_4, r_5} are 5 records, IS = {i_1, i_2, i_3, i_4, i_5} are 5 feature word items, and

IW = {<i_1, r_1, 0>, <i_2, r_1, 0.83>, <i_3, r_1, 0.81>, <i_4, r_1, 0>, <i_5, r_1, 0.01>, <i_1, r_2, 0>, <i_2, r_2, 0.94>, <i_3, r_2, 0.7>, <i_4, r_2, 0.23>, <i_5, r_2, 0>, <i_1, r_3, 0>, <i_2, r_3, 0.35>, <i_3, r_3, 0.5>, <i_4, r_3, 0.63>, <i_5, r_3, 0>, <i_1, r_4, 0.95>, <i_2, r_4, 0>, <i_3, r_4, 0.85>, <i_4, r_4, 0>, <i_5, r_4, 0>, <i_1, r_5, 0.73>, <i_2, r_5, 0.02>, <i_3, r_5, 0>, <i_4, r_5, 0.06>, <i_5, r_5, 0.9>}.
The set IW can also be represented as in Fig. 1 below (the all-weighted data example of Fig. 1).
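The DWDM triple of Definition 1 and the example just given can be represented directly in code, e.g. as nested maps with the zero entries omitted. A minimal sketch; the variable and function names are illustrative, not from the patent:

```python
# The example DWDM Text = (TR, IS, IW), zero weights omitted from IW:
TR = ["r1", "r2", "r3", "r4", "r5"]
IS = ["i1", "i2", "i3", "i4", "i5"]
IW = {
    "r1": {"i2": 0.83, "i3": 0.81, "i5": 0.01},
    "r2": {"i2": 0.94, "i3": 0.70, "i4": 0.23},
    "r3": {"i2": 0.35, "i3": 0.50, "i4": 0.63},
    "r4": {"i1": 0.95, "i3": 0.85},
    "r5": {"i1": 0.73, "i2": 0.02, "i4": 0.06, "i5": 0.90},
}

def weight(r: str, i: str) -> float:
    """w[r][i], taken as 0 when feature word i does not occur in record r."""
    return IW[r].get(i, 0.0)
```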
Definition 2 (itemset weight and item weights): An all-weighted itemset I is a set composed of distinct items i_1, i_2, ..., i_p, I = (i_1, i_2, ..., i_p) (1 ≤ p ≤ m), I ⊆ IS. The itemset weight of I is the accumulated weight of i_1, i_2, ..., i_p over each transaction record in which all items of I appear simultaneously, denoted w_I, that is, w_I = w_1 + w_2 + ... + w_p, where w_1, w_2, ..., w_p are the weights corresponding to the items i_1, i_2, ..., i_p of I, called the item weights of the itemset I. The value of each item weight is the cumulative sum of the weights of that single item over the different transaction records in TR in which all items (i_1, i_2, ..., i_p) of the itemset I appear simultaneously, that is, w_j = Σ w[r][i_j], summed over the transaction records r that contain all items of I.

In particular, the accumulated weight sum, over the transaction records in which all items (i_1, i_2, ..., i_p) of the itemset I appear simultaneously, of the items of a subset of I is called the subset item weight, denoted w_sub; when this subset is taken as an itemset by itself, its itemset weight over the transaction record set TR is denoted w_(sub). For example, the subset (i_1, i_3) of the itemset I has subset item weight w_sub(i_1,i_3) = w_1 + w_3, while the itemset weight of this subset taken as an itemset by itself is w_(i_1,i_3).
Example: In the text example of Fig. 1, the itemset weight of the 3_itemset (i2, i3, i4) is the accumulated weight of i2, i3, i4 over the transaction records in which all three items appear simultaneously (the qualifying records are r2 and r3): w(i2, i3, i4) = (0.94+0.7+0.23)+(0.35+0.5+0.63) = 3.35. The item weights of the 3_itemset (i2, i3, i4) are the per-item weight sums over the qualifying records (r2 and r3): wi2 = 0.94+0.35 = 1.29, wi3 = 0.7+0.5 = 1.2, wi4 = 0.23+0.63 = 0.86. The subset (i2, i3) of the itemset (i2, i3, i4) has subset item weight wsub(i2, i3) = wi2 + wi3 = 1.29+1.2 = 2.49, while the itemset weight of this subset taken separately as an itemset is w(i2, i3) = (0.83+0.81)+(0.94+0.7)+(0.35+0.5) = 4.13.
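The computations in this example can be checked with a short sketch. The weights for records r1 and r2 below are reconstructed from the worked sums in the text (they are not listed in this excerpt), and all names are illustrative, not the patent's implementation:

```python
# Itemset weight and item weights (Definition 2), computed on the Fig. 1 data.
# r1 and r2 are reconstructed from the worked computations in the text.
W_DATA = {
    "r1": {"i1": 0, "i2": 0.83, "i3": 0.81, "i4": 0, "i5": 0.01},
    "r2": {"i1": 0, "i2": 0.94, "i3": 0.70, "i4": 0.23, "i5": 0},
    "r3": {"i1": 0, "i2": 0.35, "i3": 0.50, "i4": 0.63, "i5": 0},
    "r4": {"i1": 0.95, "i2": 0, "i3": 0.85, "i4": 0, "i5": 0},
    "r5": {"i1": 0.73, "i2": 0.02, "i3": 0, "i4": 0.06, "i5": 0.90},
}

def item_weights(itemset, data=W_DATA):
    """Per-item weight sums over the records that contain every item of `itemset`
    (an item is contained when its weight in the record is non-zero)."""
    hits = [r for r in data.values() if all(r[i] > 0 for i in itemset)]
    return {i: round(sum(r[i] for r in hits), 2) for i in itemset}

def itemset_weight(itemset, data=W_DATA):
    """Itemset weight w_I: total weight of all items of I over the qualifying records."""
    return round(sum(item_weights(itemset, data).values()), 2)

print(itemset_weight(("i2", "i3", "i4")))  # 3.35
print(itemset_weight(("i2", "i3")))        # 4.13
```

The two printed values match w(i2, i3, i4) = 3.35 and w(i2, i3) = 4.13 from the example.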
Definition 3 (all-weighted partial order itemset): For an all-weighted itemset I = (i1, i2, ..., ip) (1 ≤ p ≤ m) with item weights w1, w2, ..., wp, sort the items by item weight. If w1 ≤ w2 ≤ ... ≤ wp, the corresponding item arrangement is written i1 i2 ... ip, and the sorted itemset (i1, i2, ..., ip) is called an all-weighted partial order itemset (Partial Order Itemset, POI). Here i1, the item of minimum weight, is called the low-weight item for short, and ip, the item of maximum weight, is called the high-weight item for short.
Example: In the text example of Fig. 1, the item weights of the 3_itemset (i2, i3, i4) are 1.29, 1.2 and 0.86 respectively, so its all-weighted partial order itemset is (i4, i3, i2), with i4 the low-weight item and i2 the high-weight item.
Definition 4 (all-weighted partial order itemset support): Treating item weight as a kind of measure and the total weight in the all-weighted transaction database as the sample space, by the geometric probability scheme of probability theory, a new support computation formula for an all-weighted partial order itemset I = (i1, i2, ..., ip) (1 ≤ p ≤ m) (all-weighted partial order itemset support, poisup) is given as formula (7):

poisup(I) = wI / (p × W)    (7)

where wI is the itemset weight of the all-weighted partial order itemset I, W is the sum of all item weights over the all-weighted transaction record set TR, and 1/p is called the all-weighted partial order itemset support normalization coefficient. The normalization coefficient is introduced because, in all-weighted data mining, the partial order itemset weight grows with the itemset length, which inflates the itemset support and rule confidence; dividing by p keeps the support and confidence values in a reasonable range without affecting the mining of all-weighted association patterns.
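Formula (7) is a one-liner; the worked value below is the support of the 2_itemset (i3, i1) computed later in the embodiment (function names are illustrative):

```python
# Support of an all-weighted partial order itemset (formula (7)):
# poisup(I) = w_I / (p * W), with p = |I| and W the total weight over TR.
def poisup(itemset_w, p, total_w):
    return itemset_w / (p * total_w)

# Worked value from the embodiment: poisup(i3, i1) = (0.85 + 0.95) / (8.51 * 2)
print(round(poisup(0.85 + 0.95, 2, 8.51), 3))  # 0.106
```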
Definition 5 (all-weighted frequent partial order itemset): Let the minimum support threshold be ms. An all-weighted partial order itemset I is called an all-weighted frequent partial order itemset if poisup(I) ≥ ms, i.e., wI ≥ W × p × ms. In particular, when the itemset I is a 1_itemset, p = 1, giving the 1_itemset minimum weight threshold minw = W × ms; clearly, a 1_itemset is frequent whenever its weight is not less than minw.
Definition 6 (partial order itemset weight expectation): The partial order itemset weight expectation (Partial Order Itemset Weight Bound, POIWB) is the predicted critical weight of a k_itemset containing the all-weighted (k-1)_partial order itemset I_{k-1}, denoted POIWB(I_{k-1}, k). The weight expectation has important theoretical significance: from the weight of an all-weighted (k-1)_itemset one can predict whether the k_itemsets generated from it can be frequent.

Let the weight of the all-weighted (k-1)_partial order itemset I_{k-1} (k < m) be w_{(k-1)}, I_{k-1} ⊆ IS. Among the items not belonging to the item set of I_{k-1}, denote the item of maximum weight as i_r (i_r ∈ IS, i_r ∉ I_{k-1}, 1 ≤ r ≤ m), with item weight w_r, and let n_{(k-1)} be the occurrence frequency of the itemset I_{k-1} in the transaction record set TR. Then the maximum possible weight of a k_itemset containing I_{k-1} is w_{(k-1)} + n_{(k-1)} × w_r.
If a k_itemset containing I_{k-1} is frequent, then from Definition 4:

w_{(k-1)} ≥ k × W × ms − n_{(k-1)} × w_r    (8)

The right-hand side of formula (8) is called the weight expectation of a k_partial order itemset containing the all-weighted (k-1)_partial order itemset I_{k-1}, denoted POIWB(I_{k-1}, k), that is,

POIWB(I_{k-1}, k) = k × W × ms − n_{(k-1)} × w_r    (9)

Formula (9) shows that only if w_{(k-1)} ≥ POIWB(I_{k-1}, k) can an all-weighted k_partial order itemset containing I_{k-1} possibly be a frequent itemset.
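Formula (9) can be checked against the POIWB column of Table 1 below (function names are illustrative):

```python
# Partial order itemset weight expectation (formula (9)):
# POIWB(I_{k-1}, k) = k * W * ms - n_{(k-1)} * w_r
def poiwb(k, total_w, ms, n_prev, w_r):
    return k * total_w * ms - n_prev * w_r

# Row (i1) of Table 1: 2 x 8.51 x 0.1 - 2 x 0.94 = -0.178
print(round(poiwb(2, 8.51, 0.1, 2, 0.94), 3))  # -0.178
# Row (i2) of Table 1: 2 x 8.51 x 0.1 - 4 x 0.95 = -2.098
print(round(poiwb(2, 8.51, 0.1, 4, 0.95), 3))  # -2.098
```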
Definition 7 (low-order proper subset and high-order proper subset): Let the all-weighted partial order itemset be Z = (X, Y), where X and Y are 2 sub-partial-order itemsets of Z, X = (i1, i2, ..., ir) (1 ≤ r < m) and Y = (i_{r+1}, i_{r+2}, ..., i_{r+q}) (1 ≤ q < m, 2 ≤ (r+q) ≤ m), with corresponding item weights w1, w2, ..., wr (where w1 ≤ w2 ≤ ... ≤ wr) and w_{r+1}, w_{r+2}, ..., w_{r+q} (where w_{r+1} ≤ w_{r+2} ≤ ... ≤ w_{r+q}). If the high-weight item of X weighs no more than the low-weight item of Y, i.e., wr ≤ w_{r+1}, then the sub-itemset X is called the low-order proper subset of the partial order itemset Z, and the sub-itemset Y the high-order proper subset of Z.
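Assuming the itemset is already weight-sorted, the low-order and high-order proper subsets of Definition 7 are simply its prefixes and suffixes; a sketch with illustrative names:

```python
# Low-order / high-order proper subsets (Definition 7) of a weight-sorted
# partial order itemset: prefixes and suffixes of the sorted arrangement.
# These are what pruning steps 3 and 6 of the method examine.
def low_order_proper_subsets(poi):
    return [tuple(poi[:r]) for r in range(1, len(poi))]

def high_order_proper_subsets(poi):
    return [tuple(poi[r:]) for r in range(1, len(poi))]

z = ("i4", "i3", "i2")   # the partial order itemset from the example above
print(low_order_proper_subsets(z))   # [('i4',), ('i4', 'i3')]
print(high_order_proper_subsets(z))  # [('i3', 'i2'), ('i2',)]
```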
The pruning method for all-weighted feature word itemsets of the present invention is:

1. Before the feature word candidate (i-1)_itemsets C_{i-1} generate the feature word candidate i_itemsets C_i (i ≥ 2), compute the feature word itemset weight expectation POIWB(C_{i-1}, i) of C_{i-1}. If the itemset weight of an all-weighted feature word candidate (i-1)_itemset C_{i-1} satisfies w_{(i-1)} < POIWB(C_{i-1}, i), then every feature word i_itemset C_i generated from this feature word (i-1)_itemset C_{i-1} must be a non-frequent itemset, so this feature word (i-1)_itemset is pruned from the C_{i-1} set.

2. After the feature word candidates C_i are generated, for every (i-1)_sub-itemset of each candidate C_i, compute the feature word itemset weight expectation of each candidate subset. If some (i-1)_subset has an itemset weight smaller than its corresponding feature word itemset weight expectation (w_{(i-1)} < POIWB(C_{i-1}, i)), this feature word candidate i_itemset C_i must be a non-frequent itemset and is pruned from the C_i set.

3. For the high-order proper subsets of the partial order itemset of a feature word candidate C_i: if some high-order proper subset is a non-frequent itemset, then the feature word candidate C_i is a non-frequent partial order itemset and is pruned from the C_i set.

4. For the high-weight item of the partial order itemset of a feature word candidate C_i: if the item weight of the high-weight item is smaller than the 1_itemset minimum weight threshold minw, this feature word candidate must be a non-frequent itemset and is pruned from the C_i set.

5. If a feature word (i-1)_itemset C_{i-1} has feature word itemset frequency 0, i.e., n_{(i-1)} = 0, every feature word i_itemset generated from this feature word (i-1)_itemset must be a non-frequent itemset, so this feature word (i-1)_itemset is pruned from the C_{i-1} set.

6. For the low-weight item of the partial order itemset of a candidate C_i: if its item weight is not less than the 1_itemset minimum weight threshold minw, the candidate C_i is frequent, and C_i is added to the frequent itemset set.
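The six pruning rules can be sketched as a single decision predicate over one candidate; the state passed in (previous-level weight, POIWB, frequency, frequent-set registry) is an illustrative simplification, not the patent's data structures:

```python
# Minimal sketch of pruning rules 1-6 above for one weight-sorted candidate.
def prune_decisions(cand, w_items, w_prev, poiwb_prev, n_prev, minw, frequent):
    """Return (drop, keep_frequent): prune the candidate, or accept it outright.

    w_items: item weight of each item of `cand`; w_prev/poiwb_prev/n_prev:
    itemset weight, POIWB and frequency of the (k-1)_subset being extended;
    frequent: registry of partial order itemsets already known to be frequent.
    """
    drop = (
        w_prev < poiwb_prev                     # rules 1-2: weight expectation
        or n_prev == 0                          # rule 5: subset never occurs
        or any(tuple(cand[r:]) not in frequent  # rule 3: a high-order proper
               for r in range(1, len(cand)))    #         subset is non-frequent
        or w_items[cand[-1]] < minw             # rule 4: high-weight item < minw
    )
    keep_frequent = (not drop) and w_items[cand[0]] >= minw  # rule 6
    return drop, keep_frequent

# (i4, i3, i2) from the embodiment: survives all prunes, frequent by rule 6.
demo = prune_decisions(("i4", "i3", "i2"), {"i4": 0.86, "i3": 1.2, "i2": 1.29},
                       4.13, -0.297, 2, 0.851,
                       {("i3", "i2"), ("i2",), ("i3",), ("i4",)})
print(demo)  # (False, True)
```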
Below, the technical scheme of the present invention is further described through a specific embodiment.

The mining method and system adopted by the present invention in the specific embodiment are as shown in Fig. 1 to Fig. 6.

The process by which the present invention mines all-weighted feature word association rules from the data example of Fig. 1 is as follows (ms = 0.1, mc = 0.6):

1. Obtain the sum of all item weights in the database, W = 8.51, hence minw = W × ms = 0.851.
2. Mine the all-weighted feature word frequent 1_itemsets L1, as shown in Table 1.

Table 1:

C1 | w(C1) | poisup(C1) | n_C1 | w_r(C1) | POIWB(C1, 2)
(i1) | 1.68 | 0.197 | 2 | 0.94 | 2×8.51×0.1-2×0.94=-0.178
(i2) | 2.14 | 0.25 | 4 | 0.95 | 2×8.51×0.1-4×0.95=-2.098
(i3) | 2.86 | 0.33 | 4 | 0.95 | 2×8.51×0.1-4×0.95=-2.098
(i4) | 0.92 | 0.108 | 3 | 0.95 | 2×8.51×0.1-3×0.95=-1.148
(i5) | 0.91 | 0.107 | 2 | 0.95 | 2×8.51×0.1-2×0.95=-0.198
From Table 1, L1 = {(i1), (i2), (i3), (i4), (i5)}, and the feature word frequent itemset set FIS = {(i1), (i2), (i3), (i4), (i5)}.
3. Mine the all-weighted feature word frequent k_itemsets Lk, k ≥ 2.

k = 2:

(1) (Pruning 1) For the candidate 1_itemsets C1, no case of w(C1) < POIWB(C1, 2) occurs, so the candidate C1 set is unchanged.

(2) Join the feature word candidate 1_itemsets C1 whose itemset frequency is not 0 by the Apriori join to generate the feature word candidate 2_itemsets C2, and compute w1(C2), w2(C2), poC2, w(poC2), n_C2, w_r(C2) and POIWB(C2, 3), as shown in Table 2.
Table 2:

C2 | w1(C2) | w2(C2) | poC2 | w(poC2) | n_C2 | w_r(C2) | POIWB(C2, 3)
(i1, i2) | 0.73 | 0.02 | (i2, i1) | (0.02, 0.73) | 1 | 0.9 | 3×8.51×0.1-1×0.9=1.653
(i1, i3) | 0.95 | 0.85 | (i3, i1) | (0.85, 0.95) | 1 | 0.94 | 3×8.51×0.1-1×0.94=1.613
(i1, i4) | 0.73 | 0.06 | (i4, i1) | (0.06, 0.73) | 1 | 0.94 | 3×8.51×0.1-1×0.94=1.613
(i1, i5) | 0.73 | 0.9 | (i1, i5) | (0.73, 0.9) | 1 | 0.94 | 3×8.51×0.1-1×0.94=1.613
(i2, i3) | 2.12 | 2.01 | (i3, i2) | (2.01, 2.12) | 3 | 0.95 | 3×8.51×0.1-3×0.95=-0.297
(i2, i4) | 1.31 | 0.92 | (i4, i2) | (0.92, 1.31) | 3 | 0.95 | 3×8.51×0.1-3×0.95=-0.297
(i2, i5) | 0.85 | 0.91 | (i2, i5) | (0.85, 0.91) | 2 | 0.95 | 3×8.51×0.1-2×0.95=0.653
(i3, i4) | 1.2 | 0.86 | (i4, i3) | (0.86, 1.2) | 2 | 0.95 | 3×8.51×0.1-2×0.95=0.653
(i3, i5) | 0.81 | 0.01 | (i5, i3) | (0.01, 0.81) | 1 | 0.95 | 3×8.51×0.1-1×0.95=1.603
(i4, i5) | 0.06 | 0.9 | (i4, i5) | (0.06, 0.9) | 1 | 0.95 | 3×8.51×0.1-1×0.95=1.603
For Table 2, proceed as follows:

﹡ Examine the high-order proper subsets of the partial order itemsets poC2: (i1), (i2), (i3), (i5). These proper subsets are all frequent; no non-frequent proper subset itemset exists, so the partial order itemset poC2 set is unchanged.

﹡ Examine the item weights of the high-weight items of the partial order itemsets poC2. The itemsets whose high-weight item weight is < minw = 0.851 are (i1, i2), (i1, i4) and (i3, i5); they are non-frequent and are deleted from the poC2 set.

﹡ Examine the low-weight items of the partial order itemsets poC2. The itemsets whose low-weight item weight is ≥ minw are (i2, i3), (i2, i4) and (i3, i4); they are frequent and are added to the feature word frequent itemset set FIS, i.e., FIS = {(i1), (i2), (i3), (i4), (i5), (i2, i3), (i2, i4), (i3, i4)}.
﹡ For the remaining partial order itemsets poC2, namely (i3, i1), (i1, i5), (i2, i5) and (i4, i5), compute their supports: poisup(i3, i1) = (0.85+0.95)/(8.51×2) = 0.106 > ms, poisup(i1, i5) = 0.096 < ms, poisup(i2, i5) = 0.103 > ms, poisup(i4, i5) = 0.056 < ms. Therefore (i3, i1) and (i2, i5) are frequent partial order itemsets and are added to the feature word frequent itemset set FIS, i.e., FIS = {(i1), (i2), (i3), (i4), (i5), (i2, i3), (i2, i4), (i3, i4), (i3, i1), (i2, i5)}.
k = 3:

﹡ From Table 2, for the candidate 2_itemsets C2, w(C2) = w1(C2) + w2(C2). The partial order itemsets with w(C2) < POIWB(C2, 3) are (i2, i1), (i4, i1), (i5, i3) and (i4, i5); these partial order itemsets cannot yield frequent 3_itemsets and are pruned from the C2 set, giving the new candidate set C2 = {(i1, i3), (i1, i5), (i2, i3), (i2, i4), (i2, i5), (i3, i4)}.
﹡ Join the feature word candidate 2_itemsets C2 whose itemset frequency is not 0 by the Apriori join to generate the feature word candidate 3_itemsets C3: C3 = {(i1, i3, i5), (i2, i3, i4), (i2, i3, i5), (i2, i4, i5)}.
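The Apriori join step used here merges two k_itemsets that share their first k-1 items; a minimal sketch (helper name illustrative) reproduces the C3 candidates above:

```python
# Apriori join: two k-itemsets sharing their first k-1 items are merged
# into one (k+1)-candidate. Items are kept in their index order here.
from itertools import combinations

def apriori_join(itemsets):
    out = []
    for a, b in combinations(sorted(itemsets), 2):
        if a[:-1] == b[:-1]:              # same (k-1)-prefix
            out.append(a + (b[-1],))
    return out

c2 = [("i1", "i3"), ("i1", "i5"), ("i2", "i3"),
      ("i2", "i4"), ("i2", "i5"), ("i3", "i4")]
print(apriori_join(c2))
# [('i1', 'i3', 'i5'), ('i2', 'i3', 'i4'), ('i2', 'i3', 'i5'), ('i2', 'i4', 'i5')]
```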
﹡ For the candidate 3_itemsets C3, examine every (3-1)_sub-itemset of C3, i.e., the 2_sub-itemsets of C3. For (i1, i3, i5) and (i2, i3, i5): the sub-itemset (i5, i3) exists with w(i5, i3) < POIWB((i5, i3), 3); for (i2, i4, i5): the sub-itemset (i4, i5) exists with w(i4, i5) < POIWB((i4, i5), 3). Therefore the feature word candidate 3_itemsets (i1, i3, i5), (i2, i3, i5) and (i2, i4, i5) are non-frequent itemsets and are deleted from C3; the new C3 = {(i2, i3, i4)}.
﹡ Compute w1(C3), w2(C3), w3(C3), poC3, w(poC3), n_C3, w_r(C3) and POIWB(C3, 4), as shown in Table 3.
Table 3:
For Table 3, proceed as follows:

﹡ Examine the high-order proper subsets of the partial order itemset poC3: (i2) and (i3, i2). These proper subsets are all frequent; no non-frequent proper subset itemset exists, so the partial order itemset poC3 set is unchanged.

﹡ Examine the item weight of the high-weight item of the partial order itemset poC3; the high-weight item weight of poC3 is greater than minw, so the partial order itemset poC3 set is unchanged.
﹡ Examine the low-weight item of the partial order itemset poC3. The itemset whose low-weight item weight is ≥ minw is (i4, i3, i2); this itemset is frequent and is added to the feature word frequent itemset set FIS, i.e., FIS = {(i1), (i2), (i3), (i4), (i5), (i2, i3), (i2, i4), (i3, i4), (i3, i1), (i2, i5), (i4, i3, i2)}.
﹡ Join the feature word candidate 3_itemsets C3 whose itemset frequency is not 0 by the Apriori join to generate the feature word candidate 4_itemsets C4: C4 = ∅. Since C4 is empty, the mining of step 3 ends; proceed to step 4 below.
4. Mine valid all-weighted feature word association rule patterns from the feature word frequent itemset set FIS.

Taking the feature word frequent itemset (i4, i3, i2) in FIS as an example, the mining process for valid all-weighted feature word association rule patterns is as follows:

The proper subset set of the frequent itemset (i4, i3, i2) is {(i4), (i3), (i2), (i4, i3), (i4, i2), (i3, i2)}.
(1) For (i4) and (i3, i2): I1 = (i4), I2 = (i3, i2), I1 ∩ I2 = ∅, I1 ∪ I2 = (i4, i3, i2), so k1 = 1, k2 = 2, k12 = 3.

From Table 1, w1 = 0.92; from Table 2, w2 = 2.01+2.12 = 4.13; from Table 3, w12 = 0.86+1.2+1.29 = 3.35.

(k12/k1) × w1 × mc = (3/1) × 0.92 × 0.6 = 1.656; since w12 = 3.35 ≥ (k12/k1) × w1 × mc = 1.656, the feature word association rule I1 → I2, i.e., (i4) → (i3, i2), is mined.

(k12/k2) × w2 × mc = (3/2) × 4.13 × 0.6 = 3.717; since w12 = 3.35 < (k12/k2) × w2 × mc = 3.717, no rule is mined.
(2) For (i3) and (i4, i2): I1 = (i3), I2 = (i4, i2), I1 ∩ I2 = ∅, I1 ∪ I2 = (i4, i3, i2), so k1 = 1, k2 = 2, k12 = 3.

From Table 1, w1 = 2.86; from Table 2, w2 = 0.92+1.31 = 2.23; from Table 3, w12 = 0.86+1.2+1.29 = 3.35.

(k12/k1) × w1 × mc = (3/1) × 2.86 × 0.6 = 5.148; since w12 = 3.35 < (k12/k1) × w1 × mc = 5.148, no rule is mined.

(k12/k2) × w2 × mc = (3/2) × 2.23 × 0.6 = 2.007; since w12 = 3.35 ≥ (k12/k2) × w2 × mc = 2.007, the feature word association rule I2 → I1, i.e., (i4, i2) → (i3), is mined.
(3) For (i2) and (i4, i3): I1 = (i2), I2 = (i4, i3), I1 ∩ I2 = ∅, I1 ∪ I2 = (i4, i3, i2), so k1 = 1, k2 = 2, k12 = 3.

From Table 1, w1 = 2.14; from Table 2, w2 = 0.86+1.2 = 2.06; from Table 3, w12 = 0.86+1.2+1.29 = 3.35.

(k12/k1) × w1 × mc = (3/1) × 2.14 × 0.6 = 3.852; since w12 = 3.35 < (k12/k1) × w1 × mc = 3.852, no rule is mined.

(k12/k2) × w2 × mc = (3/2) × 2.06 × 0.6 = 1.854; since w12 = 3.35 ≥ (k12/k2) × w2 × mc = 1.854, the feature word association rule I2 → I1, i.e., (i4, i3) → (i2), is mined.
In summary, for the feature word frequent itemset (i4, i3, i2), the valid all-weighted feature word association rule patterns that can be mined (ms = 0.1, mc = 0.6) are: (i4) → (i3, i2), (i4, i2) → (i3) and (i4, i3) → (i2).
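The rule test applied three times above can be sketched as one predicate (names illustrative); the two calls below reproduce case (1):

```python
# Rule extraction test from the embodiment: for a frequent itemset L = I1 U I2
# (I1 and I2 disjoint), mine I1 -> I2 when w12 >= (k12 / k1) * w1 * mc, where
# k* are item counts and w* itemset weights of each side and of L.
def mine_rule(w_antecedent, k_antecedent, w12, k12, mc):
    return w12 >= (k12 / k_antecedent) * w_antecedent * mc

# Case (1): (i4) -> (i3, i2) is mined (3.35 >= 1.656),
#           (i3, i2) -> (i4) is not (3.35 < 3.717).
print(mine_rule(0.92, 1, 3.35, 3, 0.6))  # True
print(mine_rule(4.13, 2, 3.35, 3, 0.6))  # False
```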
Below, the beneficial effects of the present invention are further described through experiments.

To verify the validity and correctness of the present invention, the classical unweighted association rule mining method Apriori (R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large database[C] // Proceeding of 1993 ACM SIGMOD International Conference on Management of Data, Washington D.C., 1993: 207-216.) and the query-expansion-oriented matrix-weighted association rule mining method MWARM (Huang Mingxuan, Yan Xiaowei, Zhang Shichao. Pseudo-relevance feedback query expansion based on matrix-weighted association rule mining [J]. Journal of Software, 2009, 20(7): 1854-1865; in the experiments, the number of expansion terms is set to 0) were selected as comparison methods. Experimental source programs were written, and the mining performance of the present invention and the comparison methods was tested, contrasted and analyzed under varying support thresholds and varying confidence thresholds. Besides ms and mc, the experimental parameters include: IN, the number of items mined, and n, the total number of document records. The experiments mine up to 4-itemsets.

The experimental data comprises 4936 English documents (serial number range KT2001_00000--KT2001_05066) extracted from the Korea_Times2001 English document corpus of the NTCIR-5 CLIR test collection of the Japanese national scientific information system central information retrieval test set, and 12024 Chinese text documents extracted from part of the Chinese corpus of the Chinese Web test collection CWT200g provided by the network laboratory of Peking University, used as the test document sets. After document preprocessing such as word segmentation (Chinese documents), stemming (English documents), stop-word removal, and extraction of feature words and computation of their weights, a text database and a feature word item library based on the vector space model were built. After preprocessing, English feature words with document frequency df (the number of document records containing the feature word) in the range 1028 to 2593 (50 in total) and Chinese feature words with df in the range [1500, 5838] were extracted into the feature dictionary (the number of Chinese feature words being 400).
Experiment 1: Mining performance comparison under varying support thresholds.

As the support threshold varies, the numbers of candidate itemsets (Candidate Itemset, CI), frequent itemsets (Frequent Itemset, FI) and association rules (Association Rule, AR) mined by the present invention and the 2 comparison methods (Apriori and MWARM) on the 2 Chinese and English document test sets are shown in Tables 1 to 4.

Experiment 2: Mining performance comparison under varying confidence thresholds.

As the confidence threshold varies, the numbers of association rules mined by the present invention and the 2 comparison methods on the 2 Chinese and English document test sets are shown in Tables 5 and 6.

Experiment 3: Mining time efficiency comparison.

The times (in seconds) taken by the present invention and the comparison methods to mine candidate itemsets, frequent itemsets and association rules as the support threshold varies are shown in Tables 7 and 8. The times (in seconds) taken by the 3 algorithms to mine association rules as the confidence threshold varies are shown in Tables 9 and 10.
Experiment 4: Example analysis of experimental results.

In the Chinese text test set CWT200g, 28 feature word items were selected as the item set for mining, as shown in Table 11. The present invention and the 2 comparison methods mined the Chinese test set (mining up to 4-itemsets) under the conditions mc = 0.1 and ms = 0.1; the association rule examples extracted from the results with the feature word item "participate" as antecedent were analyzed, with the results shown in Table 12.

Table 11: Feature word examples in CWT200g

Table 12: Association rule examples with "participate" as antecedent mined by the three methods
Table 12 shows that, among the association rule examples with "participate" as antecedent, the number of association rules mined by the present invention is smaller than that mined by the 2 comparison methods, and its association rule patterns are closer to the actual situation, avoiding the generation of invalid and spurious association patterns. For example, the two Chinese feature words both rendered here as "participate" are near-synonyms that should seldom occur together in one sentence or one paragraph, so the association rule "participate → participate" should not be a strong association rule. The mining results of the algorithm of this paper, MAWAR-POI, contain no invalid and spurious patterns of this kind, whereas the comparison algorithms not only mine more association rule patterns but also mine the strong association rule "participate → participate", an association pattern that is spurious, barren and invalid.
The above experimental results show that, compared with the comparison methods, the present invention has good mining performance, specifically as follows:

(1) Whether the support threshold or the confidence threshold varies, the numbers of candidate itemsets, frequent itemsets and association rules mined by the present invention are far smaller than those of the existing unweighted and all-weighted mining methods. For example, the number of candidate itemsets mined by the invention on the NTCIR-5 English data set is reduced by 90.60% compared with the Apriori method and by 90.49% compared with the MWARM method (Table 1), and the number mined on the Chinese data set CWT200g is reduced by 94.37% compared with the Apriori method and by 87.29% compared with the MWARM method (Table 2), showing that the present invention can avoid and reduce the generation of many invalid association patterns.

(2) The mining time of the present invention is less than that of the comparison algorithms, with a large reduction. For example, the average time for the present invention to mine itemsets and association rules on the NTCIR-5 English data set is reduced by 87.58% compared with the Apriori method and by 83.56% compared with the MWARM method (Table 7), and the mining time on the Chinese data set CWT200g is reduced by 85.98% compared with the Apriori method and by 67.60% compared with the MWARM method (Table 8), showing that the mining efficiency of the present invention is greatly improved.

(3) The experimental results of Table 12 show that the feature word association rule patterns mined by the present invention are closer to reality.
Claims (6)

1. A method for mining association rules between Chinese and English text words based on partial order itemsets, characterized by comprising the steps of:

(1) Chinese and English text information data preprocessing: preprocessing the Chinese and English text information data to be processed, namely Chinese text word segmentation, English text stemming, stop-word removal, and feature word extraction and weight computation, and building a text information database and a feature word item library based on the vector space model;

(2) mining the all-weighted feature word frequent partial order itemsets, comprising the following steps 2.1 and 2.2:

(2.1) mining the all-weighted feature word frequent 1_itemsets L1, with the concrete steps carried out according to 2.1.1 to 2.1.3:

(2.1.1) extracting the feature word candidate 1_itemsets C1 from the feature word item library; accumulating the weights of all items in the text information database to obtain the total item weight sum W; accumulating the weight total of C1 over the text information database; and computing the support poisup(C1) of C1;

(2.1.2) adding those feature word candidate 1_itemsets C1 whose support satisfies poisup(C1) ≥ ms to the feature word frequent itemset set FIS as frequent 1_itemsets L1, ms being the minimum support threshold;

(2.1.3) accumulating the occurrence frequency n_C1 of the candidate 1_itemsets C1 in the text information database, extracting w_r(C1), and computing the partial order itemset weight expectation POIWB(C1, 2) of C1;
(2.2) mining the all-weighted feature word frequent k_itemsets Lk, k ≥ 2, operating according to steps 2.2.1 to 2.2.12:

(2.2.1) for the candidate (k-1)_itemsets C_{k-1}, pruning those C_{k-1} with w(C_{k-1}) < POIWB(C_{k-1}, k), which cannot yield frequent k_itemsets, to obtain the new candidate C_{k-1} set; here, w(C_{k-1}) is the accumulated weight of C_{k-1} in the text information database, and POIWB(C_{k-1}, k) is the weight expectation of a k_itemset containing the all-weighted candidate (k-1)_itemset C_{k-1};

(2.2.2) joining the feature word candidate (k-1)_itemsets C_{k-1} whose itemset frequency is not 0 by the Apriori join to generate the feature word candidate k_itemsets Ck;

(2.2.3) if Ck is empty, exiting step 2.2 and proceeding to step (3); otherwise, if Ck is not empty, proceeding to step 2.2.4;

(2.2.4) for the candidate k_itemsets Ck, examining every (k-1)_sub-itemset of Ck; if some (k-1)_subset has an itemset weight smaller than its corresponding partial order itemset weight expectation (w_{(k-1)} < POIWB(C_{k-1}, k)), the itemset Ck must be a non-frequent itemset and is deleted from its set, giving the new candidate Ck set;
(2.2.5) accumulating in the text information database the occurrence frequency n_Ck of the candidate k_itemsets Ck and the item weights w1(Ck), w2(Ck), ..., wk(Ck), extracting w_r(Ck), and computing the weight expectation POIWB(Ck, k+1) of Ck;

(2.2.6) deleting the candidate k_itemsets Ck whose itemset frequency is 0 to obtain the new Ck set;

(2.2.7) obtaining the partial order itemset poCk of each Ck;

(2.2.8) examining the high-order proper subsets of the partial order itemsets poCk; if a high-order proper subset of poCk is non-frequent, the partial order itemset poCk is certainly non-frequent and is deleted from its set, giving the new candidate partial order itemset poCk set;

(2.2.9) examining the item weight of the high-weight item of each partial order itemset poCk; if the high-weight item weight of poCk is smaller than the 1_itemset minimum weight threshold minw, the partial order itemset poCk is certainly non-frequent and is deleted from its set, giving the new candidate partial order itemset poCk set; the formula of minw is: minw = W × ms;

(2.2.10) examining the low-weight item of each partial order itemset poCk; if the low-weight item weight of poCk is not less than minw, the partial order itemset poCk must be frequent and is added to the feature word frequent itemset set FIS;

(2.2.11) for the remaining partial order itemsets poCk, computing their support poisup(poCk); if poisup(poCk) ≥ ms, the partial order itemset poCk is frequent and is added to the feature word frequent itemset set FIS;

(2.2.12) adding 1 to the value of k and looping over steps 2.2.1 to 2.2.12 until Ck is empty, then exiting step 2.2 and proceeding to step (3) below;
(3) mining valid all-weighted feature word strong association rule patterns from the feature word frequent itemset set FIS, comprising the steps of:

(3.1) taking a feature word frequent itemset Li out of the feature word frequent itemset set FIS and finding all proper subsets of Li;

(3.2) taking any two proper subsets I1 and I2 out of the proper subset set of Li; when I1 ∩ I2 = ∅ and I1 ∪ I2 = Li: if w12 ≥ (k12/k1) × w1 × mc, mining the feature word strong association rule I1 → I2; if w12 ≥ (k12/k2) × w2 × mc, mining the feature word strong association rule I2 → I1; said k1, k2 and k12 being the item counts of the itemsets I1, I2 and (I1, I2) respectively, w1, w2 and w12 being the itemset weights of I1, I2 and (I1, I2) respectively, and mc being the minimum confidence threshold;

(3.3) continuing step 3.2 until every proper subset in the proper subset set of the feature word frequent itemset Li has been taken out once, and only once, then proceeding to step 3.4;

(3.4) continuing step 3.1 until every frequent itemset Li in the feature word frequent itemset set has been taken out once, and only once, whereupon step (3) ends;

At this point, the mining of all-weighted feature word association rule patterns ends.
2. A mining system for implementing the partial-order-itemset-based method of mining association rules between Chinese and English text words according to claim 1, characterized in that it comprises the following four modules:
A text information preprocessing module, which preprocesses the Chinese and English text data to be mined (Chinese word segmentation, English stemming, stop-word removal, feature-word extraction and feature-word weight calculation) and builds a text information database and a feature-word item library based on the vector space model;
A feature-word frequent partial-order itemset generation module, which mines all-weighted feature-word candidate partial-order itemsets from the text information database, prunes them with the new pruning method to obtain the final candidate partial-order itemsets, and then derives the all-weighted feature-word frequent partial-order itemset patterns from those candidates using the new partial-order itemset support computation method;
An all-weighted feature-word association rule generation module, which, by simple computation and comparison of itemset weights and dimensions, mines valid all-weighted feature-word strong association rule patterns I1 → I2 from the all-weighted feature-word frequent partial-order itemsets (I1, I2);
An association rule pattern result display module, which presents the valid all-weighted feature-word strong association rule patterns to the user in the form the user prefers, for the user to analyze, select and use.
3. The mining system according to claim 2, characterized in that the text information preprocessing module comprises the following two modules:
A Chinese and English text preprocessing module, which segments Chinese text and removes Chinese stop words, and performs stemming on English text and removes English stop words, among other Chinese/English corpus preprocessing work;
A text database and item library construction module, which mainly extracts the Chinese and English feature words and calculates their weights, and builds the text information database and the Chinese/English feature-word item library based on the vector space model.
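The preprocessing and weighting stages of these two modules can be sketched as follows. This is an illustrative stub, not the patent's implementation: the stop list and suffix-stripping "stemmer" are placeholders (a real system would use a dedicated Chinese segmenter and a proper English stemmer such as Porter's), and the feature-word weights use the standard tf-idf form of the vector space model.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to"}  # illustrative stop list only

def preprocess_english(text):
    """Tokenize, drop stop words, and crudely 'stem' by stripping suffixes.

    Placeholder for the real stemming stage; shown only to illustrate
    the pipeline order: tokenize -> stop-word removal -> stemming.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

def tfidf_weights(docs):
    """Feature-word weights for a vector-space model: w = tf * idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]
```

The resulting per-document weight dictionaries are what the later itemset-mining modules accumulate over the text information database.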
4. The mining system according to claim 2, characterized in that the feature-word frequent partial-order itemset generation module comprises the following three modules:
A feature-word candidate partial-order itemset generation module, which mines feature-word candidate partial-order itemsets from the text information database as follows: extract the candidate 1-itemsets from the feature-word item library, accumulate the weights of each candidate 1-itemset over the text information database, compute its support, and obtain the all-weighted feature-word frequent 1-itemsets; then generate the feature-word candidate k-itemsets from the all-weighted feature-word frequent (k-1)-itemsets by the Apriori join, where k ≥ 2; finally, accumulate the item weights of every item of each feature-word candidate k-itemset over the text information database, obtaining the all-weighted feature-word candidate partial-order k-itemsets;
A feature-word candidate partial-order itemset pruning module, which prunes the all-weighted feature-word candidate partial-order k-itemsets with the pruning method of the present invention, deleting the candidate partial-order k-itemsets that cannot be frequent, so as to obtain the final set of possibly frequent candidate partial-order k-itemsets;
A feature-word frequent partial-order itemset generation module, which mines the final candidate partial-order k-itemsets obtained after pruning by the above module: it computes the support of each candidate partial-order k-itemset with the support computation method of the present invention and compares it with the minimum support threshold, obtaining the all-weighted feature-word frequent partial-order k-itemsets.
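The three submodules above follow the classic weighted-Apriori loop: join, prune, then support test. A minimal sketch, assuming (since the patent's "new" support formula is not recoverable from this translation) that support is the accumulated itemset weight normalised by (number of documents × itemset dimension):

```python
from itertools import combinations

def apriori_join(prev_frequent, k):
    """Form candidate k-itemsets from frequent (k-1)-itemsets (Apriori join)."""
    items = sorted({i for s in prev_frequent for i in s})
    prev = set(prev_frequent)
    cands = set()
    for c in combinations(items, k):
        c = frozenset(c)
        # Apriori property: every (k-1)-subset must itself be frequent
        if all(frozenset(s) in prev for s in combinations(c, k - 1)):
            cands.add(c)
    return cands

def weighted_support(itemset, doc_weights):
    """Accumulate the itemset's item weights over documents containing all
    of its items, then normalise by (document count x itemset dimension).
    This normalisation is an assumption, not the patent's exact method."""
    w = sum(
        sum(dw[i] for i in itemset)
        for dw in doc_weights
        if all(i in dw for i in itemset)
    )
    return w / (len(doc_weights) * len(itemset))
```

Candidates whose `weighted_support` falls below the minimum support threshold ms are discarded; the survivors are the frequent partial-order k-itemsets.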
5. The mining system according to claim 2, characterized in that the all-weighted feature-word association rule generation module comprises the following two modules:
A proper-subset generation module for feature-word frequent partial-order itemsets, which mainly generates all proper subsets of the feature-word frequent partial-order itemsets and obtains the itemset weight and dimension of each proper subset;
An all-weighted feature-word association rule generation module, which, by simple computation and comparison of itemset weights, mines valid all-weighted feature-word strong association rule patterns from the feature-word frequent partial-order itemsets.
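Enumerating the proper subsets of a frequent partial-order itemset, together with their dimensions, can be done directly with `itertools`; a minimal sketch (the function name is hypothetical):

```python
from itertools import chain, combinations

def proper_subsets(itemset):
    """All non-empty proper subsets of `itemset`, paired with their dimension.

    Ranges r = 1 .. len(itemset)-1 exclude the empty set and the
    itemset itself, matching the claim's 'proper subset' wording.
    """
    items = sorted(itemset)
    subs = chain.from_iterable(
        combinations(items, r) for r in range(1, len(items))
    )
    return [(frozenset(s), len(s)) for s in subs]
```

Each pair (subset, dimension) feeds the weight/dimension comparison performed by the rule generation module.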
6. The mining system according to any one of claims 2-5, characterized in that the minimum support threshold ms and the minimum confidence threshold mc in the mining system are input by the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410427491.8A CN104182527B (en) | 2014-08-27 | 2014-08-27 | Association rule mining method and its system between Sino-British text word based on partial order item collection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104182527A true CN104182527A (en) | 2014-12-03 |
CN104182527B CN104182527B (en) | 2017-07-18 |
Family
ID=51963566
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410427491.8A Expired - Fee Related CN104182527B (en) | 2014-08-27 | 2014-08-27 | Association rule mining method and its system between Sino-British text word based on partial order item collection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104182527B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715073A (en) * | 2015-04-03 | 2015-06-17 | 江苏物联网研究发展中心 | Association rule mining system based on improved Apriori algorithm |
CN106383883A (en) * | 2016-09-18 | 2017-02-08 | 广西财经学院 | Matrix weighted association mode-based Indonesian and Chinese cross-language retrieval method and system |
CN106484781A (en) * | 2016-09-18 | 2017-03-08 | 广西财经学院 | Indonesia's Chinese cross-language retrieval method of fusion association mode and user feedback and system |
CN107562904A (en) * | 2017-09-08 | 2018-01-09 | 广西财经学院 | Positive and negative association mode method for digging is weighted between the English words of fusion item weights and frequency |
CN108563735A (en) * | 2018-04-10 | 2018-09-21 | 国网浙江省电力有限公司 | One kind being based on the associated data sectioning search method of word |
CN109684464A (en) * | 2018-12-30 | 2019-04-26 | 广西财经学院 | Compare across the language inquiry extended method of implementation rule consequent excavation by weight |
CN109783628A (en) * | 2019-01-16 | 2019-05-21 | 福州大学 | The keyword search KSAARM algorithm of binding time window and association rule mining |
CN110619073A (en) * | 2019-08-30 | 2019-12-27 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN112527953A (en) * | 2020-11-20 | 2021-03-19 | 出门问问(武汉)信息科技有限公司 | Rule matching method and device |
CN113254755A (en) * | 2021-07-19 | 2021-08-13 | 南京烽火星空通信发展有限公司 | Public opinion parallel association mining method based on distributed framework |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080147688A1 (en) * | 2001-09-04 | 2008-06-19 | Frank Beekmann | Sampling approach for data mining of association rules |
CN103279570A (en) * | 2013-06-19 | 2013-09-04 | 广西教育学院 | Text database oriented matrix weighting negative pattern mining method |
CN103838854A (en) * | 2014-03-14 | 2014-06-04 | 广西教育学院 | Completely-weighted mode mining method for discovering association rules among texts |
CN103955542A (en) * | 2014-05-20 | 2014-07-30 | 广西教育学院 | Method of item-all-weighted positive or negative association model mining between text terms and mining system applied to method |
Non-Patent Citations (2)
Title |
---|
黄名选 et al.: "An all-weighted association rule mining algorithm based on double pruning", Information Studies: Theory & Application * |
黄名选 et al.: "An all-weighted inter-word association rule mining algorithm based on a text database", Journal of Guangxi Normal University (Natural Science Edition) * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715073B (en) * | 2015-04-03 | 2017-11-24 | 江苏物联网研究发展中心 | Based on the association rule mining system for improving Apriori algorithm |
CN104715073A (en) * | 2015-04-03 | 2015-06-17 | 江苏物联网研究发展中心 | Association rule mining system based on improved Apriori algorithm |
CN106383883A (en) * | 2016-09-18 | 2017-02-08 | 广西财经学院 | Matrix weighted association mode-based Indonesian and Chinese cross-language retrieval method and system |
CN106484781A (en) * | 2016-09-18 | 2017-03-08 | 广西财经学院 | Indonesia's Chinese cross-language retrieval method of fusion association mode and user feedback and system |
CN106484781B (en) * | 2016-09-18 | 2019-03-15 | 广西财经学院 | Merge the Indonesia's Chinese cross-language retrieval method and system of association mode and user feedback |
CN106383883B (en) * | 2016-09-18 | 2019-04-16 | 广西财经学院 | Indonesia's Chinese cross-language retrieval method and system based on matrix weights association mode |
CN107562904B (en) * | 2017-09-08 | 2019-07-09 | 广西财经学院 | Positive and negative association mode method for digging is weighted between fusion item weight and the English words of frequency |
CN107562904A (en) * | 2017-09-08 | 2018-01-09 | 广西财经学院 | Positive and negative association mode method for digging is weighted between the English words of fusion item weights and frequency |
CN108563735A (en) * | 2018-04-10 | 2018-09-21 | 国网浙江省电力有限公司 | One kind being based on the associated data sectioning search method of word |
CN109684464A (en) * | 2018-12-30 | 2019-04-26 | 广西财经学院 | Compare across the language inquiry extended method of implementation rule consequent excavation by weight |
CN109783628A (en) * | 2019-01-16 | 2019-05-21 | 福州大学 | The keyword search KSAARM algorithm of binding time window and association rule mining |
CN109783628B (en) * | 2019-01-16 | 2022-06-21 | 福州大学 | Method for searching KSAARM by combining time window and association rule mining |
CN110619073A (en) * | 2019-08-30 | 2019-12-27 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN110619073B (en) * | 2019-08-30 | 2022-04-22 | 北京影谱科技股份有限公司 | Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm |
CN112527953A (en) * | 2020-11-20 | 2021-03-19 | 出门问问(武汉)信息科技有限公司 | Rule matching method and device |
CN112527953B (en) * | 2020-11-20 | 2023-06-20 | 出门问问创新科技有限公司 | Rule matching method and device |
CN113254755A (en) * | 2021-07-19 | 2021-08-13 | 南京烽火星空通信发展有限公司 | Public opinion parallel association mining method based on distributed framework |
CN113254755B (en) * | 2021-07-19 | 2021-10-08 | 南京烽火星空通信发展有限公司 | Public opinion parallel association mining method based on distributed framework |
Also Published As
Publication number | Publication date |
---|---|
CN104182527B (en) | 2017-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104182527A (en) | Partial-sequence itemset based Chinese-English test word association rule mining method and system | |
CN103514183B (en) | Information search method and system based on interactive document clustering | |
CN103955542B (en) | Method of item-all-weighted positive or negative association model mining between text terms and mining system applied to method | |
CN104216874B (en) | Positive and negative mode excavation method and system are weighted between the Chinese word based on coefficient correlation | |
Luo et al. | A parallel dbscan algorithm based on spark | |
CN104317794A (en) | Chinese feature word association pattern mining method based on dynamic project weight and system thereof | |
CN107832467A (en) | A kind of microblog topic detecting method based on improved Single pass clustering algorithms | |
CN103440308B (en) | A kind of digital thesis search method based on form concept analysis | |
Wenli | Application research on latent semantic analysis for information retrieval | |
CN111897926A (en) | Chinese query expansion method integrating deep learning and expansion word mining intersection | |
CN103678642A (en) | Concept semantic similarity measurement method based on search engine | |
CN109739952A (en) | Merge the mode excavation of the degree of association and chi-square value and the cross-language retrieval method of extension | |
CN104239430A (en) | Item weight change based method and system for mining education data association rules | |
Du et al. | An overview of dynamic data mining | |
CN111259117B (en) | Short text batch matching method and device | |
Lu et al. | Research on text classification based on TextRank | |
CN109684465B (en) | Text retrieval method based on pattern mining and mixed expansion of item set weight value comparison | |
Duan et al. | Error correction for search engine by mining bad case | |
CN111897928A (en) | Chinese query expansion method for embedding expansion words into query words and counting expansion word union | |
Xu | An Apriori algorithm to improve teaching effectiveness | |
CN108170778A (en) | Rear extended method is translated across language inquiry by China and Britain based on complete Weighted Rule consequent | |
Gui et al. | Topic modeling of news based on spark Mllib | |
Xiaohu et al. | A Fast Search Algorithm Based on Agent Association Rules | |
Hu et al. | Graphsdh: a general graph sampling framework with distribution and hierarchy | |
He et al. | Enterprise human resources information mining based on improved Apriori algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2016-03-25
Address after: No. 100 Mingxiu West Road, Nanning, the Guangxi Zhuang Autonomous Region, 530003
Applicant after: Guangxi Finance and Economics Institute
Address before: No. 37 Building Road, Nanning, the Guangxi Zhuang Autonomous Region, 530023
Applicant before: Guangxi College of Education
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170718 Termination date: 20180827 |