CN102663142B - Knowledge extraction method - Google Patents


Info

Publication number
CN102663142B
Authority
CN
China
Prior art keywords
search
individual
pos
dimension
centerdot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210157204.7A
Other languages
Chinese (zh)
Other versions
CN102663142A (en)
Inventor
刘洪波
冯士刚
陈荣
张维石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN201210157204.7A
Publication of CN102663142A
Application granted
Publication of CN102663142B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a knowledge extraction method comprising the following steps: computing the initial values for reduction; enabling a dual-matrix coding strategy; initializing the search; evaluating the termination criterion; computing the fitness values of the search individuals; preserving the best states; and performing the joint state-transition operation. Under the dual-matrix coding strategy, the position of each search individual is encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes. When the number of dimensions exceeds 23, the time needed to complete a reduction does not grow markedly exponentially, which saves both space dimensions and time. Fitness is determined by a rough-set positive-region test: if POS'_E equals U'_POS, the fitness value is the number of condition attributes in the candidate reduct; if POS'_E is not equal to U'_POS, the fitness value is penalized to the total number of condition attributes. This simple and reasonable strategy safeguards the quality of the extracted knowledge. The search is performed dynamically by exploiting the collective advantage of the group formed by the search individuals, and attribute combinations are compared through their positive regions to obtain multiple knowledge.

Description

A knowledge extraction method
Technical field
The present invention relates to knowledge discovery technology, and in particular to a knowledge extraction method.
Background
Rough set theory is a mathematical tool for handling imprecise, inconsistent and incomplete data. It was proposed by the Polish scientist Pawlak in 1982 and can derive classification rules through knowledge reduction while keeping the classification ability unchanged. Compared with decision trees, Bayesian methods and similar techniques, the rough set method needs no prior knowledge: it discovers knowledge using only the information that the data itself provides. In the real world, the knowledge in many information systems is usually not unique. Knowledge exists from several points of view, each possibly a different combination of attributes of the information system, with comparable classification performance. Such multi-knowledge may play different roles in specific environments. For example, in real-time path selection for multiple robots, when memory capacity is sufficient, multi-knowledge offers more path choices and gives stronger obstacle-avoidance ability. For knowledge extraction, each reduct can be expressed as a separate piece of single knowledge, and multiple reducts together form a multi-knowledge system, which has very important value in practical applications.
It has been proven that finding all reducts of a decision table, and finding a minimum reduct, are NP-hard problems. For this reason, heuristic methods are usually used for attribute reduction. Common heuristic algorithms include attribute reduction algorithms based on information entropy, on the discernibility matrix, and on the positive region. Most heuristic reduction algorithms start from the core attributes and then, according to some measure of attribute significance, repeatedly select the most significant attribute not yet in the reduct and add it to the reduct until the stopping condition is met, which yields one reduct of the decision table. Such a reduct can express only single knowledge in a knowledge system. At present, multi-knowledge extraction is an important problem facing knowledge discovery technology.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a knowledge extraction method that obtains multi-knowledge from an existing information system.
To achieve this goal, the technical scheme of the present invention is as follows: a knowledge extraction method comprising the following steps:
A. Compute the initial values for reduction
According to formulas (1), (2) and (3), compute the reduced positive region $POS'_E$, the reduced universe $U'$, and the reduced positive region $U'_{POS}$.
[Formula (1): image BDA00001657206500021 in the original publication]
Let $U/C = \{[u'_1]_C, [u'_2]_C, \ldots, [u'_m]_C\}$; then
$$U' = \{u'_1, u'_2, \ldots, u'_m\} \tag{2}$$
$$U'_{POS} = \{u'_{i_1}, u'_{i_2}, \ldots, u'_{i_t}\} \tag{3}$$
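Since formula (1) survives only as an image reference, the following is a minimal sketch of how the quantities in step A could be computed under the standard rough-set definitions (an equivalence class belongs to the positive region when all its objects share one decision value). The function names `partition` and `reduced_universe` and the toy rows are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

# Hypothetical toy decision table: each row is (condition values, decision).
ROWS = [
    ((1, 1), 0),
    ((1, 1), 0),
    ((2, 1), 1),
    ((2, 1), 0),  # same conditions as the previous row, different decision
    ((3, 2), 1),
]

def partition(rows, attrs):
    """Group object indices into equivalence classes on the chosen attributes (U/C)."""
    classes = defaultdict(list)
    for idx, (cond, _) in enumerate(rows):
        classes[tuple(cond[a] for a in attrs)].append(idx)
    return list(classes.values())

def reduced_universe(rows, attrs):
    """Return (U', U'_POS): one representative per class, and the representatives
    of the classes whose objects all share a single decision value."""
    u_prime, u_pos = [], []
    for cls in partition(rows, attrs):
        rep = cls[0]  # assumption: the first object represents its class
        u_prime.append(rep)
        if len({rows[i][1] for i in cls}) == 1:
            u_pos.append(rep)
    return u_prime, u_pos

u_prime, u_pos = reduced_universe(ROWS, [0, 1])
print(u_prime, u_pos)  # [0, 2, 4] [0, 4]
```

Working on one representative per class rather than on every object is what makes the later positive-region comparisons cheaper when the table contains many duplicate rows.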
B. Enable the dual-matrix coding strategy
When the search individuals search the solution space, their positions must be encoded according to the dimension of the solution space. The encoding maps the condition attributes directly onto the position dimensions of a search individual. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes correspond to one coding unit, so that they appear as 1 dimension whose position value is an integer in the range 0 to 7;
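The compressed coding in step B can be illustrated with a short sketch that packs a 0/1 attribute mask into units of three attributes, each stored as an integer from 0 to 7. The bit order inside a unit (first attribute as the most significant bit), the zero padding of the last unit, and the names `pack` and `unpack` are assumptions made for illustration; the patent does not specify them.

```python
def pack(bits):
    """Pack a 0/1 attribute mask into coding units of 3 attributes (integers 0..7)."""
    padded = list(bits) + [0] * (-len(bits) % 3)  # assumption: pad the last unit with 0s
    return [padded[i] * 4 + padded[i + 1] * 2 + padded[i + 2]
            for i in range(0, len(padded), 3)]

def unpack(units, n_attrs):
    """Expand coding units back into a 0/1 mask over n_attrs condition attributes."""
    bits = []
    for u in units:
        bits.extend([(u >> 2) & 1, (u >> 1) & 1, u & 1])
    return bits[:n_attrs]

mask = [1, 0, 1, 1, 0, 0, 1]      # 7 condition attributes, 4 of them selected
units = pack(mask)
print(units)                      # [5, 4, 4]
print(unpack(units, len(mask)))   # [1, 0, 1, 1, 0, 0, 1]
```

With this packing, a table with 24 condition attributes needs only 8 position dimensions instead of 24.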
C. Initialize the search
Without loss of generality, let the domain of definition of the reduction be [0, r]: the maximum value in the solution space is r, the minimum value is 0, and the solution space has d dimensions. If the one-bit coding representation is used in step B, then r=1; if the compressed coding representation is used in step B, then r=7;
A population of n search individuals searches the solution space in parallel, with the maximum velocity of a search individual set to $v_{\max} = r$. At time step t=0, the n search individuals are initialized randomly: the position of the j-th dimension of the i-th search individual is $p_{ij} = \mathrm{Rand}(0, r)$ and the velocity of the j-th dimension of the i-th search individual is $v_{ij} = \mathrm{Rand}(-v_{\max}, v_{\max})$. Here r is the upper bound of the domain of definition and t is the time step;
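The initialization in step C can be sketched as follows. This is an illustrative reading, not the patent's implementation: positions are drawn as integers in [0, r], velocities are drawn from the symmetric interval [-v_max, v_max] (an assumption, since the lower bound is not legible in the source), and the name `init_swarm` and the `seed` parameter are hypothetical.

```python
import random

def init_swarm(n, d, r=1, seed=None):
    """Randomly initialize n search individuals in a d-dimensional solution space.

    r = 1 corresponds to the one-bit coding, r = 7 to the compressed coding.
    """
    rng = random.Random(seed)
    v_max = r  # step C sets the maximum velocity equal to r
    # Assumption: positions are integers in [0, r].
    positions = [[rng.randint(0, r) for _ in range(d)] for _ in range(n)]
    # Assumption: velocities are uniform on [-v_max, v_max].
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(d)] for _ in range(n)]
    return positions, velocities

positions, velocities = init_swarm(n=5, d=4, r=1, seed=0)
```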
D. Evaluate the termination criterion
If the predetermined maximum number of iterations has been reached, or the result has not improved for 10 iterations, output the results $p^*$ and $f(p^*)$ and stop; otherwise, go to step E;
Here $p^*$ is the best individual state in the group formed by the search individuals, $p_i^{\#}$ is the best state that the i-th search individual has reached from t=0 up to the current iteration, and $f(p^*)$ is the fitness value determined by the best individual state in the group.
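One possible way to express the termination test of step D is sketched below. The function name `should_stop`, the `history` list of best fitness values per iteration, and the exact interpretation of "no improvement" (the best value of the last 10 iterations is no better than the best value before them) are assumptions for illustration.

```python
def should_stop(history, max_iter, patience=10):
    """Decide whether to stop, given the best fitness after each iteration.

    history  -- list of best fitness values f(p*), one per iteration (lower is better)
    max_iter -- predetermined maximum number of iterations
    patience -- number of iterations without improvement that triggers a stop
    """
    if len(history) >= max_iter:
        return True
    if len(history) > patience:
        # No improvement: the recent window did not beat the earlier best.
        return min(history[-patience:]) >= min(history[:-patience])
    return False

print(should_stop([5, 4, 4], max_iter=100))        # False
print(should_stop([5, 4] + [4] * 10, max_iter=100))  # True
```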
E. Compute the fitness values of the search individuals
Apply the rough-set positive-region test: if $POS'_E = U'_{POS}$, the fitness value is the number of condition attributes in the candidate reduct; if $POS'_E \neq U'_{POS}$, the fitness value is penalized to the total number of condition attributes;
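The fitness rule of step E can be sketched as below. For simplicity the sketch compares positive regions over all objects rather than over the reduced universe U'; treating an empty attribute selection as a penalized candidate is also an assumption. The names `positive_region` and `fitness` and the toy rows are hypothetical.

```python
from collections import defaultdict

def positive_region(rows, attrs):
    """Indices of objects whose equivalence class on attrs has a single decision value."""
    groups = defaultdict(list)
    for idx, (cond, _) in enumerate(rows):
        groups[tuple(cond[a] for a in attrs)].append(idx)
    pos = set()
    for members in groups.values():
        if len({rows[i][1] for i in members}) == 1:
            pos.update(members)
    return pos

def fitness(mask, rows):
    """Number of selected attributes if the positive region is preserved,
    otherwise the penalty value: the total number of condition attributes."""
    n_attrs = len(mask)
    selected = [j for j, bit in enumerate(mask) if bit]
    target = positive_region(rows, range(n_attrs))
    # Assumption: an empty selection is never accepted as a reduct.
    if selected and positive_region(rows, selected) == target:
        return len(selected)
    return n_attrs

# Toy table: decision = XOR of attributes 0 and 1; attribute 2 duplicates attribute 0.
ROWS = [((0, 0, 0), 0), ((0, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
print(fitness([1, 1, 0], ROWS))  # 2
print(fitness([1, 0, 0], ROWS))  # 3 (penalized)
```

Because the penalty equals the fitness of the full attribute set, a candidate that loses part of the positive region can never rank ahead of a valid reduct with fewer attributes.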
F. Preserve the best states
Let t=t+1 and apply the best-preservation strategy, that is:
$$p_i^{\#}(t) = \arg\min_{1 \le i \le n} \bigl( f(p_i^{\#}(t-1)),\; f(p_i(t)) \bigr)$$
$$p^{*}(t) = \arg\min_{1 \le i \le n} \bigl( f(p^{*}(t-1)),\; f(p_1(t)),\; \ldots,\; f(p_n(t)) \bigr)$$
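The two arg-min updates of step F amount to keeping, for each search individual and for the whole group, whichever state has the lower fitness. A minimal sketch, with the hypothetical name `update_bests` and list-based state, might look like this:

```python
def update_bests(positions, fitnesses, pbest, pbest_fit, gbest, gbest_fit):
    """Update personal bests in place and return the new group best.

    positions, fitnesses -- current states p_i(t) and their fitness values
    pbest, pbest_fit     -- personal best states p_i#(t-1) and their fitness values
    gbest, gbest_fit     -- group best state p*(t-1) and its fitness value
    """
    for i, (p, f) in enumerate(zip(positions, fitnesses)):
        if f < pbest_fit[i]:           # strictly better than the personal best
            pbest[i], pbest_fit[i] = list(p), f
        if f < gbest_fit:              # strictly better than the group best
            gbest, gbest_fit = list(p), f
    return gbest, gbest_fit

pbest = [[1, 1, 1], [1, 1, 1]]
pbest_fit = [3, 3]
gbest, gbest_fit = update_bests([[1, 1, 0], [1, 0, 0]], [2, 3],
                                pbest, pbest_fit, [1, 1, 1], 3)
print(gbest, gbest_fit)  # [1, 1, 0] 2
```

Using a strict comparison keeps the earlier state on ties; that tie-breaking choice is an assumption, as the formulas do not state one.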
G. Joint state-transition operation
The collective advantage of the group formed by the search individuals is used to search dynamically. For each dimension of each search individual, the joint state-transition operation is performed according to formulas (4), (5) and (6):
$$v_{ij}(t) = w\,v_{ij}(t-1) + c_1 r_1 \bigl( p_{ij}^{\#}(t-1) - p_{ij}(t-1) \bigr) + c_2 r_2 \bigl( p_j^{*}(t-1) - p_{ij}(t-1) \bigr) \tag{4}$$
$$p_{ij}(t) = \begin{cases} 1 & \text{if } \rho < \mathrm{sig}(v_{ij}(t)) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
where
$$\mathrm{sig}(v_{ij}(t)) = \frac{1}{1 + e^{-v_{ij}(t)}} \tag{6}$$
Return to step D.
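Formulas (4) to (6) can be sketched for a single search individual under the one-bit coding as follows. The parameter values for w, c1 and c2, the choice of r1, r2 and ρ as uniform random numbers in [0, 1), and the clamping of the velocity to [-v_max, v_max] are assumptions, since the text does not state them; the name `transition` is hypothetical.

```python
import math
import random

def sig(v):
    """Sigmoid function of formula (6)."""
    return 1.0 / (1.0 + math.exp(-v))

def transition(p, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """Apply formulas (4) and (5) to every dimension of one search individual."""
    new_p, new_v = [], []
    for j in range(len(p)):
        r1, r2 = rng.random(), rng.random()      # assumption: uniform in [0, 1)
        vj = (w * v[j]
              + c1 * r1 * (pbest[j] - p[j])      # pull toward the personal best
              + c2 * r2 * (gbest[j] - p[j]))     # pull toward the group best
        vj = max(-v_max, min(v_max, vj))         # assumption: clamp to the maximum velocity
        rho = rng.random()                       # assumption: uniform in [0, 1)
        new_v.append(vj)
        new_p.append(1 if rho < sig(vj) else 0)  # formula (5)
    return new_p, new_v

rng = random.Random(1)
new_p, new_v = transition([0, 1, 0], [0.0, 0.0, 0.0], [1, 1, 0], [1, 0, 0], rng=rng)
```

The sigmoid turns each velocity into the probability that the corresponding attribute is included in the candidate reduct, so a large positive velocity makes inclusion likely and a large negative velocity makes it unlikely.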
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention adopts a dual-matrix coding strategy. The condition attributes in knowledge reduction are mapped directly onto the position dimensions of a search individual, and each position dimension takes a value in {0, 1}: '0' means the corresponding attribute is not included in the reduct, and '1' means the corresponding attribute is included in the reduct. The position of a search individual is thus encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes. When the number of dimensions exceeds 23, the time needed to complete a reduction does not grow markedly exponentially, which saves both space dimensions and time.
2. The present invention adopts the rough-set positive-region test: if $POS'_E = U'_{POS}$, the fitness value is the number of condition attributes in the candidate reduct; if $POS'_E \neq U'_{POS}$, the fitness value is penalized to the total number of condition attributes. This simple and reasonable strategy safeguards the quality of the extracted knowledge.
3. The present invention uses the collective advantage of the group formed by the search individuals to search dynamically and performs the joint state-transition operation according to formulas (4), (5) and (6). It obtains a reasonable distribution of multi-reduct knowledge and effectively solves the prior-art difficulty of obtaining multi-knowledge from an existing information system.
4. The present invention proposes a dual-matrix-coded swarm-intelligence rough-set multi-reduct algorithm that facilitates multi-knowledge extraction. It searches dynamically using the collective advantage of the group formed by the search individuals and obtains multi-knowledge by combining attributes through an effective positive-region comparison.
Brief description of the drawings
The present invention has 4 accompanying drawings, in which:
Fig. 1 shows the dual-matrix 1-bit independent coding representation.
Fig. 2 shows the dual-matrix multi-bit compressed coding representation.
Fig. 3 shows the performance comparison curves on the soybean-large-test data set.
Fig. 4 is the flow chart of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the one-bit coding representation in the dual-matrix coding strategy of the present invention. The condition attributes in knowledge reduction are mapped directly onto the position dimensions of a search individual, and each position dimension takes a value in {0, 1}: '0' means the corresponding attribute is not included in the reduct, and '1' means the corresponding attribute is included in the reduct. The position of a search individual is thus encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes.
Fig. 2 is a schematic diagram of the compressed coding representation in the dual-matrix coding strategy of the present invention. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes are combined into one unit, which appears as 1 dimension whose position value is an integer in the range 0 to 7.
Fig. 3 shows the performance comparison curves of two methods reducing the soybean-large-test data set. The curves show that the present invention obtains better results than the genetic algorithm (GA) in a shorter time. For the soybean-large-test data set, the three GA reductions give 12 attributes: {1,3,4,5,6,7,13,15,16,22,32,35}; the three PSO reductions give, respectively, 10 attributes: {1,3,5,6,7,12,15,18,22,31}, 10 attributes: {1,3,5,6,7,15,23,26,28,30} and 10 attributes: {1,2,3,6,7,9,15,21,22,30}. Compared with other methods, the present invention tends to provide multiple reducts, and the resulting reducts contain fewer condition attributes.
Table 1 is an example raw data set, that is, the data before the reduction of the present invention is applied. c1, c2, c3 and c4 are four condition attributes, d is the decision attribute determined by the four condition attributes, and x_i (i=1, ..., 15) are the 15 instances of this information system after discretization. Table 2 is the result obtained after reduction according to the flow shown in Fig. 4. It contains two minimum reducts: {1,4} and {2,3} (note: the numbers in braces are condition attribute numbers). The instances in Table 2, obtained by reducing Table 1, can be converted into the two-fold knowledge {1,4} and {2,3}.
Table 1. Sample data set
        c1   c2   c3   c4   d
x1      1    1    1    1    0
x2      2    2    2    1    1
x3      1    1    1    1    0
x4      2    3    2    3    0
x5      2    2    2    1    1
x6      3    1    2    1    0
x7      1    2    3    2    2
x8      2    3    1    2    3
x9      3    1    2    1    1
x10     1    2    3    2    2
x11     3    1    2    1    1
x12     2    3    1    2    3
x13     4    3    4    2    1
x14     1    2    3    2    1
x15     4    3    4    2    2
Table 2. Result set
Reduct {1,4}
        c1   c4   d
x1      1    1    0
x2      2    1    1
x4      2    3    0
x6      3    1    0
x7      1    2    2
x8      2    2    3
x9      3    1    1
x13     4    2    1
x14     1    2    1
x15     4    2    2
Reduct {2,3}
        c2   c3   d
x1      1    1    0
x2      2    2    1
x4      3    2    0
x6      1    2    0
x7      2    3    2
x8      3    1    3
x9      1    2    1
x13     3    4    1
x14     2    3    1
x15     3    4    2
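The two minimum reducts reported for Table 1 can be checked independently with a small script. Note that this sketch uses exhaustive enumeration of attribute subsets, not the swarm search of the invention, and that the helper names `positive_region` and `minimal_reducts` are hypothetical; it only confirms that {1,4} and {2,3} are the smallest attribute sets that preserve the positive region of the full table.

```python
from collections import defaultdict
from itertools import combinations

# Table 1: (c1, c2, c3, c4) -> d for instances x1..x15.
TABLE1 = [
    ((1, 1, 1, 1), 0), ((2, 2, 2, 1), 1), ((1, 1, 1, 1), 0),
    ((2, 3, 2, 3), 0), ((2, 2, 2, 1), 1), ((3, 1, 2, 1), 0),
    ((1, 2, 3, 2), 2), ((2, 3, 1, 2), 3), ((3, 1, 2, 1), 1),
    ((1, 2, 3, 2), 2), ((3, 1, 2, 1), 1), ((2, 3, 1, 2), 3),
    ((4, 3, 4, 2), 1), ((1, 2, 3, 2), 1), ((4, 3, 4, 2), 2),
]

def positive_region(rows, attrs):
    """Indices of objects whose equivalence class on attrs has a single decision value."""
    groups = defaultdict(list)
    for idx, (cond, _) in enumerate(rows):
        groups[tuple(cond[a] for a in attrs)].append(idx)
    pos = set()
    for members in groups.values():
        if len({rows[i][1] for i in members}) == 1:
            pos.update(members)
    return pos

def minimal_reducts(rows, n_attrs):
    """Smallest attribute subsets (0-based) that preserve the full positive region."""
    target = positive_region(rows, range(n_attrs))
    for size in range(1, n_attrs + 1):
        found = [c for c in combinations(range(n_attrs), size)
                 if positive_region(rows, c) == target]
        if found:
            return found
    return []

# Convert 0-based indices to the 1-based attribute numbers used in the text.
reducts = [tuple(a + 1 for a in c) for c in minimal_reducts(TABLE1, 4)]
print(reducts)  # [(1, 4), (2, 3)]
```

Exhaustive enumeration is feasible here only because there are four condition attributes; the number of subsets doubles with every added attribute, which is the motivation for a heuristic search on larger tables.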

Claims (1)

1. A knowledge extraction method, characterized in that it comprises the following steps:
A. Compute the initial values for reduction
According to formulas (1), (2) and (3), compute the reduced positive region $POS'_E$, the reduced universe $U'$, and the reduced positive region $U'_{POS}$.
Let $U/C = \{[u'_1]_C, [u'_2]_C, \ldots, [u'_m]_C\}$; then
$$U' = \{u'_1, u'_2, \ldots, u'_m\} \tag{2}$$
$$U'_{POS} = \{u'_{i_1}, u'_{i_2}, \ldots, u'_{i_t}\} \tag{3}$$
B. Enable the dual-matrix coding strategy
When the search individuals search the solution space, their positions must be encoded according to the dimension of the solution space. The encoding maps the condition attributes directly onto the position dimensions of a search individual. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes correspond to one coding unit, so that they appear as 1 dimension whose position value is an integer in the range 0 to 7;
C. Initialize the search
Without loss of generality, let the domain of definition of the reduction be [0, r]: the maximum value in the solution space is r, the minimum value is 0, and the solution space has d dimensions. If the one-bit coding representation is used in step B, then r=1; if the compressed coding representation is used in step B, then r=7;
A population of n search individuals searches the solution space in parallel, with the maximum velocity of a search individual set to $v_{\max} = r$. At time step t=0, the n search individuals are initialized randomly: the position of the j-th dimension of the i-th search individual is $p_{ij} = \mathrm{Rand}(0, r)$ and the velocity of the j-th dimension of the i-th search individual is $v_{ij} = \mathrm{Rand}(-v_{\max}, v_{\max})$. Here r is the upper bound of the domain of definition and t is the time step;
D. Evaluate the termination criterion
If the predetermined maximum number of iterations has been reached, or the result has not improved for 10 iterations, output the results $p^*$ and $f(p^*)$ and stop; otherwise, go to step E;
Here $p^*$ is the best individual state in the group formed by the search individuals, $p_i^{\#}$ is the best state that the i-th search individual has reached from t=0 up to the current iteration, and $f(p^*)$ is the fitness value determined by the best individual state in the group;
E. Compute the fitness values of the search individuals
Apply the rough-set positive-region test: if $POS'_E = U'_{POS}$, the fitness value is the number of condition attributes in the candidate reduct; if $POS'_E \neq U'_{POS}$, the fitness value is penalized to the total number of condition attributes;
F. Preserve the best states
Let t=t+1 and apply the best-preservation strategy, that is:
$$p_i^{\#}(t) = \arg\min_{1 \le i \le n} \bigl( f(p_i^{\#}(t-1)),\; f(p_i(t)) \bigr)$$
$$p^{*}(t) = \arg\min_{1 \le i \le n} \bigl( f(p^{*}(t-1)),\; f(p_1(t)),\; \ldots,\; f(p_n(t)) \bigr)$$
G. Joint state-transition operation
The collective advantage of the group formed by the search individuals is used to search dynamically. For each dimension of each search individual, the joint state-transition operation is performed according to formulas (4), (5) and (6):
$$v_{ij}(t) = w\,v_{ij}(t-1) + c_1 r_1 \bigl( p_{ij}^{\#}(t-1) - p_{ij}(t-1) \bigr) + c_2 r_2 \bigl( p_j^{*}(t-1) - p_{ij}(t-1) \bigr) \tag{4}$$
$$p_{ij}(t) = \begin{cases} 1 & \text{if } \rho < \mathrm{sig}(v_{ij}(t)) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
where
$$\mathrm{sig}(v_{ij}(t)) = \frac{1}{1 + e^{-v_{ij}(t)}} \tag{6}$$
Return to step D.
CN201210157204.7A 2012-05-18 2012-05-18 Knowledge extraction method Expired - Fee Related CN102663142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210157204.7A CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210157204.7A CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Publications (2)

Publication Number Publication Date
CN102663142A CN102663142A (en) 2012-09-12
CN102663142B true CN102663142B (en) 2014-02-26

Family

ID=46772633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210157204.7A Expired - Fee Related CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Country Status (1)

Country Link
CN (1) CN102663142B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530505B (en) * 2013-09-29 2017-02-08 大连海事大学 Human brain language cognition modeling method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7461064B2 (en) * 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
CN101187927B (en) * 2007-12-17 2010-12-15 电子科技大学 Criminal case joint investigation intelligent analysis method

Also Published As

Publication number Publication date
CN102663142A (en) 2012-09-12

Similar Documents

Publication Publication Date Title
CN105069173A (en) Rapid image retrieval method based on supervised topology keeping hash
CN105512289A (en) Image retrieval method based on deep learning and Hash
CN110309343B (en) Voiceprint retrieval method based on deep hash
CN103617157A (en) Text similarity calculation method based on semantics
CN104933624A (en) Community discovery method of complex network and important node discovery method of community
CN104268629B (en) Complex network community detecting method based on prior information and network inherent information
CN106708947B (en) Web article forwarding and identifying method based on big data
CN105976048A (en) Power transmission network extension planning method based on improved artificial bee colony algorithm
CN101324926B (en) Method for selecting characteristic facing to complicated mode classification
CN107301513A (en) Bloom prealarming method and apparatus based on CART decision trees
CN101763529A (en) Rough set attribute reduction method based on genetic algorithm
CN103440275A (en) Prim-based K-means clustering method
CN106452452A (en) Full-pulse data lossless compression method based on K-means clustering
CN111027574A (en) Building mode identification method based on graph convolution
CN103034869A (en) Part maintaining projection method of adjacent field self-adaption
CN104657472A (en) EA (Evolutionary Algorithm)-based English text clustering method
CN102663142B (en) Knowledge extraction method
CN109033746B (en) Protein compound identification method based on node vector
CN103761308B (en) Materialized view selection method based on self-adaption genetic algorithm
CN104462503A (en) Method for determining similarity between data points
CN104572868A (en) Method and device for information matching based on questioning and answering system
CN108460147A (en) The recommendation method of information core is built based on how sub- population coevolution
CN104200222A (en) Picture object identifying method based on factor graph model
Tao et al. Assembly model retrieval based on optimal matching
CN105279489A (en) Video fingerprint extraction method based on sparse coding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140226

Termination date: 20150518

EXPY Termination of patent right or utility model