CN1428696A

CN1428696A - KDD* system based on double-library synergistic mechanism

Info

Publication number: CN1428696A
Application number: CN 01145080
Authority: CN
Inventors: 杨炳儒
Original assignee: Individual
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2001-12-29
Filing date: 2001-12-29
Publication date: 2003-07-09
Anticipated expiration: 2021-12-29
Also published as: CN1138206C

Abstract

The present invention discloses a new KDD* system based on a double-library synergistic mechanism. Building on KDD technology, it adds a double-library synergistic mechanism: a "channel" is established between the database and the basic knowledge base, so that the basic knowledge base can constrain and drive the KDD mining process, and the knowledge base can be maintained in real time during mining. This changes the original operating mechanism of KDD and forms an open, optimized extension of KDD in both structure and function. The system improves the functions of KDD and can give rise to new structural models and mining methods.

Description

A new KDD* system based on a double-library synergistic mechanism

Technical field

The present invention relates to a KDD system, in particular to a new KDD* system based on a double-library synergistic mechanism. The new system changes the inherent operating mechanism of existing KDD systems and forms an open, optimized extension of KDD in both structure and function.

Background of invention

KDD (Knowledge Discovery in Databases) is descriptively defined as the advanced process of extracting credible, novel, effective, and human-understandable patterns from large volumes of data. Through this process, knowledge of interest or high-level information can be extracted from the relevant data sets of a database and studied from different perspectives. KDD is also called data mining. The term KDD is more common in artificial intelligence and machine learning, while the term data mining is more common in engineering applications; the two are generally used interchangeably.

As the above definition shows, KDD is a multi-step process consisting mainly of the following steps, as shown in Fig. 1.

1. Data selection: according to the user's requirements, the data relevant to KDD are extracted from the database; KDD will mainly extract knowledge from these data. During this step, database operations may be used to process the data, forming the real database.

2. Data preprocessing: the data produced in step 1 are processed further. Their integrity and consistency are checked, noisy data are handled, and missing data may be filled in by statistical methods. This forms the mining database.

3. Determining the KDD goal: according to the user's requirements, determine what type of knowledge KDD is to discover, because different requirements call for different knowledge discovery methods in the actual discovery process.

4. Determining the knowledge discovery method: according to the task determined in step 3, select a suitable knowledge discovery method, including an appropriate model and parameters, and make the method consistent with the evaluation criteria of the whole KDD process.

5. Focusing: select data in the mining database. Focusing is guided by the user, who enters the knowledge of interest through human-computer interaction to direct the mining.

6. Generating hypothesis rules: using the selected knowledge discovery method, extract the knowledge the user needs from the data. This knowledge may be represented in a specific form or in a commonly used one, such as production rules.

7. Knowledge evaluation: the rules obtained are evaluated for value to decide whether they are stored in the basic knowledge base. Evaluation is done mainly through the human-computer interface.

As the above shows, data mining is one step within KDD: it uses specific knowledge discovery methods to find valuable knowledge in the data within given limits on computational efficiency. The steps of the overall KDD process can be further grouped into three parts: data mining preprocessing (preparation before mining), data mining, and data mining postprocessing (processing after mining).

Although KDD is a very new field, it has produced many research results in a short time. It has been, and will continue to be, an intersection of research in machine learning, pattern recognition, databases, mathematical statistics, artificial intelligence, expert systems, knowledge acquisition, data visualization, and high-performance computing, with the common goal of extracting high-level knowledge from the raw, coarse data of large databases. In a specific research field, a specific KDD technique will draw on many disciplines and open up applications of specific research methods.

Although KDD research has produced some results, the KDD mining process still has inherent contradictions and difficulties, as follows:

1) Mining objects: larger databases, higher dimensionality, and more complex relationships among attributes. The volume of data to be mined is usually huge: hundreds of tables, millions of records, and database sizes of gigabytes (10⁹ bytes) or even terabytes (10¹² bytes). More attributes mean a higher-dimensional search space and hence combinatorial explosion. Relationships among attribute values also become more complex, for example hierarchical. These factors make searching for knowledge very expensive, so systematic, directed search becomes a logical necessity.

2) Form of the input data: the data forms that current data mining tools can handle are limited. They can generally handle structured numeric data, but mostly cannot mine semi-structured or unstructured data such as text, graphics, mathematical formulas, images, or WWW resources. A further challenge is that the data themselves are incomplete or noisy, particularly in business databases.

3) User participation and domain knowledge: effective decision-making often requires repeated interaction and iteration. Current data mining systems and tools rarely let the user genuinely take part in the mining process. The user's background knowledge and guidance can speed up mining and ensure the validity of the knowledge discovered. Incorporating knowledge of the relevant domain into a data mining system is an important problem that has not been solved well. Presenting knowledge to the user in a "plug-type" manner, to improve the efficiency and practicality of knowledge discovery, therefore also becomes a logical necessity.

4) Maintenance and updating: newly accumulated data may invalidate previously discovered knowledge, which therefore needs dynamic maintenance and timely updating. Current research uses incremental updating to maintain existing knowledge; for example, D.W. Cheung et al. proposed an incremental algorithm for maintaining association rules.

5) Limitations of knowledge and integration with other systems: current data mining systems cannot yet support multiple platforms. Some products are PC-based, some target mainframe systems, and others target client/server environments. Some systems restrict the fields or records a database may contain, for example requiring data files of a specific size or conversion into a format recognized by a specific database management system (DBMS); redefining the data, however, can be very expensive. A further challenge is the organic integration of data mining systems with other decision knowledge systems, particularly with systems that users already know, which is important for the system to be fully effective.

These limitations of data mining stem largely from a lack of research into, and understanding of, the architecture of the knowledge discovery system itself. Confining KDD to a closed model inevitably brings a series of problems; the new KDD* system based on a double-library synergistic mechanism is proposed precisely to solve these problems better.

In addition, most current KDD algorithms do not treat KDD as a complex cognitive system and study its inherent regularities, and they do not take the knowledge base into account. Many of the hypothesis rules they mine repeat, are redundant with, or even contradict the knowledge already in the knowledge base. They also rely solely on human-computer interaction for focusing and do not reflect the system's own cognitive autonomy, so they cannot deliver the novelty and validity required by the definition of KDD.

Summary of the invention

In view of the shortcomings of the background art, the invention provides a new KDD* system based on a double-library synergistic mechanism.

Technical scheme 1: a new KDD* system based on a double-library synergistic mechanism, comprising a digital computer composed of a central processing unit and memory, the memory of the digital computer storing a real database and a basic knowledge base, characterized in that: the basic knowledge base is divided into several related knowledge sub-bases according to each specific domain; each knowledge sub-base is attribute-based and represents its knowledge with a linguistic field and a language value structure; and the digital computer performs the following steps:

1) Data preprocessing: the data in the real database are processed further to form the mining database, and, under the attribute-based structure, a correspondence with the basic knowledge base is established;

2) Focusing: the direction of data mining is guided by the content entered through human-computer interaction;

3) Directed mining: the heuristic coordinator searches the basic knowledge base for knowledge shortages, forms a new focus from them, and selects data from the mining database in a directed manner;

4) Generating hypothesis rules: using the selected knowledge mining method, the knowledge required by the user and by the new focus is extracted from the mining database and expressed in a specific pattern;

5) Evaluation: the rules obtained in step 4) are evaluated for value, and accepted rules are stored in the derived knowledge base.

Technical scheme 2: a new KDD* system based on a double-library synergistic mechanism, comprising a digital computer composed of a central processing unit and memory, the memory of the digital computer storing a real database and a basic knowledge base, characterized in that: the basic knowledge base is divided into several related knowledge sub-bases according to each specific domain; each knowledge sub-base is attribute-based and represents its knowledge with a linguistic field and a language value structure; and the digital computer performs the following steps:

3) Generating hypothesis rules: using the selected knowledge discovery method, the knowledge required by the user is extracted from the mining database and expressed in a specific pattern;

4) Real-time maintenance: the interrupt-type coordinator performs a directed search of the basic knowledge base to determine whether each hypothesis rule obtained in step 3) repeats, is redundant with, or contradicts the existing knowledge in the basic knowledge base, and handles it according to the result;

5) Evaluation: the rules retained after the processing of step 4) are evaluated for value, and accepted rules are stored in the derived knowledge base.

Technical scheme 3: a new KDD* system based on a double-library synergistic mechanism, comprising a digital computer composed of a central processing unit and memory, the memory of the digital computer storing a real database and a basic knowledge base, characterized in that: the basic knowledge base is divided into several related knowledge sub-bases according to each specific domain; each knowledge sub-base is attribute-based and represents its knowledge with a linguistic field and a language value structure; and the digital computer performs the following steps:

4) Generating hypothesis rules: using the selected knowledge discovery method, the knowledge required by the user and by the new focus is extracted from the mining database and expressed in a specific pattern;

5) Real-time maintenance: the interrupt-type coordinator performs a directed search of the basic knowledge base to determine whether each hypothesis rule obtained in step 4) repeats, is redundant with, or contradicts the existing knowledge in the basic knowledge base, and handles it according to the result;

6) Evaluation: the rules retained after the processing of step 5) are evaluated for value, and accepted rules are stored in the derived knowledge base.

The memory described in technical schemes 1, 2 and 3 is a mass storage device, or a very-large-capacity storage system composed of several mass storage devices.

The digital computer described in technical schemes 1, 2 and 3 may be a digital computing system composed of several computers.

The further data processing in step 1) of technical schemes 1, 2 and 3 includes checking the integrity and consistency of the data, handling noisy data, and filling in missing data by statistical methods.

The correspondence in step 1) of technical schemes 1, 2 and 3 is a one-to-one correspondence between the knowledge nodes of a knowledge sub-base and the layers of the data subclass structures of the data sub-base.

The heuristic coordinator of step 2) of technical schemes 1 and 3 comprises the following steps:

1) Search for the linguistic variables whose rule strength exceeds a given threshold, forming a node set;

2) Combine the nodes in the node set to form a set of tuples;

3) Search the basic knowledge base and remove from the tuple set those tuples that already exist in the basic knowledge base;

4) Sort the remaining tuples by association strength to give the priority of the directed search;

5) Scan the tuples one by one in priority order, and focus on the corresponding entries in the database for directed mining.
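The five steps of the heuristic (inspiration-type) coordinator listed above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function name `heuristic_coordinator`, the restriction to pairs of nodes, and the caller-supplied `assoc_strength` function are all assumptions made for the example.

```python
from itertools import combinations


def heuristic_coordinator(strengths, known_rules, assoc_strength, threshold):
    """Return knowledge-shortage tuples in directed-search priority order.

    strengths      -- dict: linguistic variable -> rule strength (hypothetical input)
    known_rules    -- set of sorted tuples already in the basic knowledge base
    assoc_strength -- callable giving the association strength of a tuple (assumed)
    """
    # Step 1: keep the linguistic variables whose strength exceeds the threshold.
    nodes = sorted(v for v, s in strengths.items() if s > threshold)
    # Step 2: combine nodes into tuples (pairs only, a simplifying assumption).
    candidates = combinations(nodes, 2)
    # Step 3: drop tuples that already exist in the basic knowledge base.
    shortage = [t for t in candidates if t not in known_rules]
    # Step 4: sort by association strength, strongest first (ties by name).
    shortage.sort(key=lambda t: (-assoc_strength(t), t))
    # Step 5 is left to the caller: scan the list in order and mine each tuple.
    return shortage
```

The caller would iterate over the returned list and run directed mining only on the database entries that correspond to each tuple.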

The interrupt-type coordinator of step 4) of technical scheme 2 and step 5) of technical scheme 3 comprises the following steps:

1) Read a rule;

2) Search the knowledge base for this rule; if the rule strength is greater than a given value, proceed to the next step; otherwise go to step 4);

3) Judge whether the rule is repeated, redundant, or contradictory. If any of these holds, go to step 4); otherwise store the rule in the knowledge base and proceed to the next step;

4) Judge whether all rules have been read. If so, end the process; otherwise read the next rule and go to step 2).
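The loop described by the interrupt-type coordinator steps above can be sketched as follows. The text does not define how repetition, redundancy, and contradiction are tested, so the three predicates below are assumptions chosen for illustration, as are the `Rule` class and all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premise: frozenset      # set of (attribute, degree word) pairs
    conclusion: tuple       # one (attribute, degree word) pair
    strength: float


def is_repeat(new, old):
    # Assumed test: same premise and same conclusion.
    return new.premise == old.premise and new.conclusion == old.conclusion


def is_redundant(new, old):
    # Assumed test: an existing rule reaches the same conclusion from fewer conditions.
    return new.conclusion == old.conclusion and old.premise < new.premise


def is_contradiction(new, old):
    # Assumed test: same premise, same attribute, different degree word.
    return (new.premise == old.premise
            and new.conclusion[0] == old.conclusion[0]
            and new.conclusion[1] != old.conclusion[1])


def interrupt_coordinator(hypotheses, knowledge_base, min_strength):
    """Filter hypothesis rules against the knowledge base; return the stored rules."""
    stored = []
    for rule in hypotheses:                     # steps 1 and 4: read rules one by one
        if rule.strength <= min_strength:       # step 2: strength check
            continue
        if any(is_repeat(rule, k) or is_redundant(rule, k) or is_contradiction(rule, k)
               for k in knowledge_base):        # step 3: repeat, redundancy, contradiction
            continue
        knowledge_base.append(rule)             # step 3: store the surviving rule
        stored.append(rule)
    return stored
```

Because each accepted rule is appended immediately, later hypotheses are also checked against rules accepted earlier in the same pass.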

The rule value evaluation in step 5) of technical schemes 1 and 2 and step 6) of technical scheme 3 is performed by the user through the human-computer interface, using the various graphs and data analyses provided by visualization tools.

The rule value evaluation in step 5) of technical schemes 1 and 2 and step 6) of technical scheme 3 uses an automatic evaluation method for causal association rules based on autoepistemic logic, that is, it is carried out automatically by the digital computer according to the association strength of the rule and a preset threshold. The automatic evaluation method is as follows:

Take the data of cause A and result S to form a set of ordered pairs P = {<t_w, s_w>} (w = 1, 2, ..., N), where t_w is a datum in the cause state (change) space (the cause sample value), s_w is the corresponding datum in the result state (change) space (the effect sample value), N is the number of samples in the set, SUP is the support strength of the rule, CR is the association strength of the rule, and SUP1 is the running support count of the rule, with initial value 0. Perform the following steps:

1) Take a cause sample value t_w (w = 1, 2, ..., N), which belongs to the general sample space, and obtain the cause state (change) input vector a_tw;

2) Determine the cause state (change) type A_k (k = 1, 2, 3, 4, 5) to which the input vector a_tw belongs: compute the distance d_H between a_tw and each cause state (change) standard vector A_i by formula (2), and take the type with the smallest distance. A randomly drawn sample set can be regarded as a set of ordered pairs P = {<t_w, s_w>};

3) With the rule as the local major premise and the cause state (change) standard vector A_k to which the input vector belongs as the minor premise, the unique matching knowledge matrix M_ijk can be found in the evaluation knowledge base by self-organization, and the result state (change) vector S_w1 is obtained according to automatic reasoning pattern (3);

4) Clustering: determine the effect state (change) standard vector β to which S_w1 belongs by computing its distance to each effect state (change) standard vector (formula below) and taking the smallest;

d_{H}(S_{w1}, S_{j}) = Σ_{i = 1}^{10} | μ_{S_{w1}}^{(i)} - μ_{S_{j}}^{(i)} |

where μ_{S_{w1}}^{(i)} and μ_{S_{j}}^{(i)} are the i-th coordinates of S_{w1} and S_{j} respectively;

5) For the ordered pair <t_w, s_w> in P, take the corresponding result sample value s_w and obtain, by fuzzy clustering, the effect state (change) standard vector γ of the interval to which it belongs; if β = γ then SUP1 = SUP1 + 1, otherwise SUP1 is unchanged;

6) Repeat the above process N times to obtain SUP1, and let SUP = SUP1/N. Compare SUP with the causal association strength CR of the rule:

If SUP > CR, the rule is accepted;

If SUP ≤ CR, the rule is rejected.
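The counting and acceptance logic of the automatic evaluation above can be sketched as follows. Formulas (2) and (3) and the fuzzy clustering step are not given in this text, so they are represented by the caller-supplied functions `infer` and `classify_result`; these two parameters, and all other names, are assumptions for illustration only.

```python
def hamming_distance(u, v):
    """d_H: sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_standard(vector, standards):
    """Return the label of the standard vector closest to `vector` under d_H."""
    return min(standards, key=lambda label: hamming_distance(vector, standards[label]))


def evaluate_rule(pairs, infer, classify_result, result_standards, cr):
    """Return (SUP, accepted) for a rule, given ordered pairs (t_w, s_w).

    infer           -- assumed stand-in for steps 1-3: t_w -> inferred vector S_w1
    classify_result -- assumed stand-in for step 5: s_w -> label gamma
    """
    sup1 = 0
    for t_w, s_w in pairs:
        s_w1 = infer(t_w)                                 # steps 1-3
        beta = nearest_standard(s_w1, result_standards)   # step 4: clustering
        gamma = classify_result(s_w)                      # step 5
        if beta == gamma:
            sup1 += 1
    sup = sup1 / len(pairs)                               # step 6: SUP = SUP1 / N
    return sup, sup > cr                                  # accept only if SUP > CR
```

The source formula sums over ten coordinates; the sketch accepts vectors of any equal length.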

A new method for mining association rules based on the double-library synergistic mechanism, characterized by comprising the following steps:

1) Data preprocessing: the user selects the real database, and the continuous attributes in the real database are discretized, forming a mining database composed of several data tables;

2) Search the basic knowledge base for "knowledge shortages" and generate the knowledge shortage set;

3) Compute the rule strength of each rule in the knowledge shortage set, accept or reject the rules according to a threshold, and then sort them by rule strength;

4) Perform directed mining on the mining database to form hypothesis rules;

5) Process the qualifying rules with the interrupt-type coordinator;

6) Evaluate the rules that pass the interrupt-type coordinator; if a rule passes the evaluation it is stored, otherwise it is deleted.
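Step 1) of the method above, discretizing a continuous attribute into data tables, might look like the following sketch. The text does not say how the intervals are chosen; equal-width intervals are an assumption made here, and the function name `discretize` and the record layout are hypothetical.

```python
def discretize(records, attribute, labels):
    """Split a continuous attribute into equal-width intervals, one per degree word.

    records -- dict: record ID -> dict of attribute values (hypothetical layout)
    labels  -- degree words in increasing order, e.g. ["low", "medium", "high"]
    Returns a dict: degree word -> set of record IDs (one small table per word).
    """
    values = [rec[attribute] for rec in records.values()]
    lo, hi = min(values), max(values)
    # Equal-width binning is an assumption; fall back to 1.0 if all values are equal.
    width = (hi - lo) / len(labels) or 1.0
    tables = {label: set() for label in labels}
    for rec_id, rec in records.items():
        # The maximum value would land one past the last bin, so clamp the index.
        idx = min(int((rec[attribute] - lo) / width), len(labels) - 1)
        tables[labels[idx]].add(rec_id)
    return tables
```

Each returned set plays the role of one data table of the mining database, holding only record IDs.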

The double-library (database and knowledge base) synergistic mechanism proposed by the new KDD* system fundamentally addresses the deficiencies of KDD and at the same time further improves KDD functionally, mainly in two respects: (1) in data mining, the mechanism lets the knowledge base take part dynamically in mining the database; the user's prior knowledge and the existing knowledge in the knowledge base can produce "directed mining" through this mechanism, improving cognitive autonomy and avoiding exhaustive search; (2) in knowledge base maintenance, the mechanism allows the contents of the knowledge base to be revised and maintained in real time during data mining, including checks for repetition and redundancy and the handling of contradictions.

The significance of the invention is as follows: 1) besides mining knowledge according to user demand and human interest, it proposes a directed mining approach that is triggered automatically by "knowledge shortages" in the basic knowledge base, thereby improving "cognitive autonomy" and more effectively overcoming the limitations of domain users; 2) it significantly reduces the amount of evaluation needed after hypothesis rules are mined; 3) through the "structural correspondence" between the two libraries, it greatly narrows the search space and improves mining efficiency; 4) it deals more effectively with the redundancy and consistency of the knowledge base after new and old knowledge are combined, ensuring real-time maintenance of the knowledge base; 5) overall, it treats KDD as an open system and, through the extensive connection between the KDD process and the basic knowledge base, improves and optimizes the structure, process, and operating mechanism of KDD.

The invention embeds two coordinators into KDD, thereby fundamentally changing the inherent operating mechanism of KDD and forming an open, optimized extension of KDD in structure and function; on this basis, new structural models of knowledge discovery can also be derived.

Description of drawings

Fig. 1 is a flow diagram of a prior-art KDD system;

Fig. 2 is a schematic diagram of the knowledge representation of the invention;

Fig. 3A is a flow diagram of technical scheme 1 of the invention;

Fig. 3B is a flow diagram of technical scheme 2 of the invention;

Fig. 3C is a flow diagram of technical scheme 3 of the invention;

Fig. 4 is the general structural model of the invention;

Fig. 5 is the correspondence diagram between the knowledge sub-base and the data sub-base of the invention;

Fig. 6 is the flowchart of the heuristic coordinator of the invention;

Fig. 7 is the flowchart of the interrupt-type coordinator of the invention;

Fig. 8 is the flowchart of the new association rule mining method based on the double-library synergistic mechanism of the invention;

Fig. 9 is the flowchart of the KDD process of the invention;

Fig. 10 shows the running result of the QAR_SQL method;

Fig. 11 shows the unexpected rules produced by running the Famer method;

Fig. 12 shows the rules produced by applying the new association rule mining method based on the double-library synergistic mechanism of the invention to the pestering database.

Embodiment

1. Theoretical foundation of the new KDD* system:

According to the relations listed in Fig. 2, the following definitions are given:

1.1 Knowledge representation: linguistic field and language value structure:

Definition 1: C = <D, I, N, ≤_N> is called a linguistic field if the following conditions hold:

(1) D is a set of intersecting closed intervals on the domain R of the base variable, and D+ is the corresponding open set;

(2) N ≠ Φ is a finite set of language values;

(3) ≤_N is an ordering relation on N;

(4) I: N → D is the standard value mapping and is isotone, that is: ∀ n1, n2 ∈ N (n1 ≠ n2 ∧ n1 ≤_N n2 → I(n1) ≤ I(n2)), where ≤ is a partial order.
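As an illustration of condition (4) of Definition 1, the following sketch checks that a standard value mapping is isotone. The text does not say how two closed intervals are compared, so comparing both endpoints is an assumption; the names `interval_leq` and `is_isotone` and the example temperature field are hypothetical.

```python
def interval_leq(a, b):
    """Assumed partial order on closed intervals: compare both endpoints."""
    return a[0] <= b[0] and a[1] <= b[1]


def is_isotone(ordered_values, standard_map):
    """Check n1 <=_N n2 implies I(n1) <= I(n2) for all distinct pairs.

    ordered_values -- language values listed in increasing <=_N order
    standard_map   -- dict: language value -> (low, high) closed interval
    """
    return all(
        interval_leq(standard_map[n1], standard_map[n2])
        for i, n1 in enumerate(ordered_values)
        for n2 in ordered_values[i + 1:]
    )


# Hypothetical temperature field with overlapping (intersecting) intervals.
temperature = {"low": (0, 40), "medium": (30, 70), "high": (60, 100)}
print(is_isotone(["low", "medium", "high"], temperature))
```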

Definition 2: For a linguistic field C = <D, I, N, ≤_N>, F = <D, W, K> is called the language value structure of C if:

(1) C satisfies Definition 1;

(2) K is a natural number;

(3) W: N → R^K satisfies:

∀ n1, n2 ∈ N (n1 ≤_N n2 → W(n1) ≤_dic W(n2)),

∀ n1, n2 ∈ N (n1 ≠ n2 → W(n1) ≠ W(n2)),

where ≤_dic is the lexicographic order on [0, 1]^K, that is, (a1, ..., ak) ≤_dic (b1, ..., bk) if and only if there exists h such that aj = bj for 0 ≤ j < h and ah ≤ bh.

1.2 Establishing the generalized homotopy relation between the mining database and the knowledge base:

1) Knowledge nodes:

Definition 3: In the knowledge sub-base associated with domain X, knowledge expressed in the following forms is called uncertain rule-type knowledge:

(1) P(X) ⇒ Q(X)

(2) P(X) ⇒ ∧_{j = 1}^{n} Q_{j}(X)

(3) ∧_{i = 1}^{n} P_{i}(X) ⇒ Q_{j}(X)

(4) ∧_{i = 1}^{n} P_{i}(X) ⇒ ∧_{j = 1}^{m} Q_{j}(X)

where P(X), P_i(X), Q(X), and Q_j(X) each have the form "attribute word" (or "descriptive word") + "degree word".

Definition 4: In Definition 3, P(X) and P_i(X) are called knowledge start nodes, and Q(X) and Q_j(X) are called knowledge end nodes; both are called elementary knowledge nodes. ∧_{i = 1}^{n} P_{i}(X) and ∧_{j = 1}^{m} Q_{j}(X) are called compound knowledge nodes. Elementary and compound nodes are collectively called knowledge nodes.

Clearly, the attribute indicated by each knowledge node constitutes a linguistic field, such as a temperature field or a pressure field, and each state or degree of change constitutes a language value structure, such as very high, high, medium, low, and very low in the temperature field.

Theorem 1: In the knowledge sub-base associated with domain X (which contains several linguistic fields), denote the set of all knowledge nodes by E (a finite set) and its power set by ρ(E); then <E, ρ(E)> constitutes a topological space (with the maximal topology).

2) Data subclass structure:

Definition 5: For domain X, in the data sub-base corresponding to the knowledge sub-base, the structure S = <U, N, I, W> corresponding to each elementary knowledge node is called a data subclass structure, where:

U ≠ Φ, U = {u1, u2, ...} (each ui is a data set, formed by the mapping I below), is the class of data sets (called the data subclass) that, under a specific linguistic field and language value structure, corresponds to the "attribute word" or "descriptive word" of the elementary knowledge node;

N ≠ Φ is a finite set of language values, namely the set of language values describing the "degree word" of the elementary knowledge node;

I: N → U is the mapping that partitions the class U of data sets by language value. When the data are continuously distributed, U is usually divided into overlapping intervals (that is, ∃ i, j (u_i ∩ u_j ≠ Φ));

W: N → [0, 1]^K (K is a positive integer) satisfies:

∀ n1, n2 ∈ N (n1 ≤_N n2 → W(n1) ≤_dic W(n2)),

∀ n1, n2 ∈ N (n1 ≠ n2 → W(n1) ≠ W(n2)),

where ≤_N is the ordering relation on N, ≤_dic is the lexicographic order on [0, 1]^K, and W(n) (n ∈ N) is the standard vector of the language value n, taken at the midpoint of the corresponding language value interval and its neighborhood (that is, the vector corresponding to the standard sample).
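The two conditions on the mapping W above (order-preserving under the lexicographic order, and injective) can be checked with a short sketch. Python compares tuples lexicographically, which is used here in place of ≤_dic; the function name and the example vectors are hypothetical.

```python
def is_valid_standard_vector_map(ordered_values, w):
    """Check the conditions on W: N -> [0, 1]^K from the definition.

    ordered_values -- language values listed in increasing <=_N order
    w              -- dict: language value -> standard vector (sequence of floats)
    """
    vectors = [tuple(w[n]) for n in ordered_values]
    # Every coordinate must lie in [0, 1].
    in_unit_cube = all(0.0 <= x <= 1.0 for v in vectors for x in v)
    # Tuple comparison in Python is lexicographic; checking neighbours is enough
    # because the lexicographic order is transitive.
    order_preserving = all(a <= b for a, b in zip(vectors, vectors[1:]))
    # Distinct language values must map to distinct vectors.
    injective = len(set(vectors)) == len(vectors)
    return in_unit_cube and order_preserving and injective


# Hypothetical standard vectors for three degree words (K = 2).
w = {"low": (0.0, 0.2), "medium": (0.5, 0.5), "high": (1.0, 0.0)}
print(is_valid_standard_vector_map(["low", "medium", "high"], w))
```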

Definition 6: In a data subclass structure S = <U, N, I, W>, a triple <u_i, n_i, r_i> satisfying the following conditions is called a layer of S:

(1) u_i ∈ U, where u_i (i = 1, 2, 3, ..., v) is the sample data set in the i-th preliminarily delimited interval;

(2) n_i ∈ N, where n_i (i = 1, 2, 3, ..., v) is the language value of the interval to which the sample data set belongs;

(3) r_i (i = 1, 2, 3, ..., v) is determined as follows: (i) when the sample data in u_i fall in a non-overlapping interval, r_i is taken as the standard vector, so that r_i ∈ W(n); (ii) when the sample data in u_i fall in an overlapping interval, r_i^* is obtained by an interpolation formula (the formula itself is missing from this text; its terms are the standard sample data of the i-th interval, the length l_i of the i-th interval, the standard vector A_i of the i-th interval, and the standard vector A_adj of the adjacent interval determined by where u_i falls).

Then, according to the distance between r_i^* and r_i, r_{i+1}, or between r_i^* and r_i, r_{i-1}, one of r_i, r_{i+1}, or r_{i-1} is chosen, and this part of the data is accordingly kept in layer i or moved to layer i+1 or layer i-1. Clearly, data subclasses and data subclass structures are in one-to-one correspondence.

Theorem 2: For domain X, in the data sub-base corresponding to the knowledge sub-base, denote the set of all data subclasses (structures) by F (a finite set) and its power set by ρ(F); then <F, ρ(F)> constitutes a topological space (with the maximal topology).

3) Relation between "knowledge nodes" and "data subclasses (structures)":

Definition 7: Let X and Y be arbitrary topological spaces. A continuous mapping F: X × [0, 1]ⁿ → Y is called a generalized homotopy of mappings from X to Y. (This extends the ordinary notion of homotopy.)

Definition 8: Let f and g be continuous mappings from topological space X to Y. If there exists a generalized homotopy F(x, t) = f_t(x) such that, for every point x ∈ X, f(x) = F(x, (0, ..., 0)) and g(x) = F(x, (1, ..., 1)), then g is said to be generalized-homotopic to f, F is called a generalized homotopy between the continuous mappings f and g, and this is written f ~ g.

Definition 9: A continuous mapping f from topological space X to topological space Y is called a generalized homotopy equivalence if there exists a continuous mapping g from Y to X such that the composite mappings g∘f and f∘g, from X and Y to themselves respectively, are generalized-homotopic to the identity mappings I_X and I_Y of the corresponding spaces, written g∘f ~ I_X and f∘g ~ I_Y. The mapping g is then also a generalized homotopy equivalence and is called the inverse equivalence of the equivalence f.

Definition 10: Given two topological spaces, if there exists at least one generalized homotopy equivalence from one space to the other, the two spaces are said to be of the same generalized homotopy type.

Theorem 3 (structure correspondence theorem): For domain X, in the corresponding knowledge sub-base and data sub-base, the topological space <E, ρ(E)> of knowledge nodes and the topological space <F, ρ(F)> of data subclasses (structures) are spaces of the same generalized homotopy type.

From the above analysis: when a space is replaced by a space of the same generalized homotopy type, the structure of the set of generalized homotopy classes does not change, so in generalized homotopy theory, spaces of the same generalized homotopy type can be regarded as identical. Theorem 3 therefore gives the one-to-one correspondence between the "knowledge nodes" in a knowledge sub-base and the layers of the "data subclass structures" in the corresponding data sub-base, as shown in Fig. 5.

2. Realization of the double-library synergistic mechanism:

2.1 Fig. 3A shows the first scheme of the invention, whose main steps include:

3) Directed mining: the heuristic coordinator searches the basic knowledge base for knowledge shortages and, based on them, selects data from the mining database in a directed manner;

4) Generating hypothesis rules: using the selected knowledge mining method, the knowledge required by the user is extracted from the mining database and expressed in a specific pattern;

Fig. 3B shows the second scheme of the invention, whose main steps include:

3) Generating hypothesis rules: using the selected knowledge discovery method, the knowledge required by the user is extracted from the mining database and expressed in a specific pattern.

5) Evaluation: the rules retained after the processing of step 4) are evaluated for value, and accepted rules are stored in the derived knowledge base.

Fig. 3C shows the third scheme of the invention, whose main steps include:

4) Generating hypothesis rules: using the selected knowledge discovery method, the knowledge required by the user is extracted from the mining database and expressed in a specific pattern.

Technical scheme 1, corresponding to Fig. 3A, has no real-time maintenance step; technical scheme 2, corresponding to Fig. 3B, has no directed mining step; technical scheme 3, corresponding to Fig. 3C, includes both the directed mining step and the real-time maintenance step. This embodiment therefore mainly describes in detail the technical scheme corresponding to Fig. 3C; the implementation principle of the other two schemes is the same.

Fig. 4 further illustrates the structure of the present invention. According to the theoretical foundation described above and the structure correspondence theorem, in the present invention a knowledge element node in the knowledge base corresponds to a layer in the database, that is, to the attribute degree word associated with that element node. For this purpose the real database is divided by preprocessing into n tables, table1, table2, ..., tablen, where n is the number of attribute degree words and the k in tablek is the ID number of the corresponding attribute degree word. Each table has only one field, which stores the ID numbers of the records in the real database whose data are in the state described by attribute degree word k. The mining database consists of these n tables, so for each piece of missing knowledge there is no need to search the entire database; only the few tables corresponding to the knowledge nodes need to be scanned. This is particularly important for large databases: these small tables can be loaded into memory for computation, whereas the entire database cannot (which is where the Apriori method is affected).
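The table layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `toy_degree_word` discretizer, and the sample records are all hypothetical. Each "tablek" is modelled as a set of record IDs, and the support count of a combination of knowledge element nodes is obtained by intersecting only the tables involved.

```python
from collections import defaultdict


def build_mining_tables(records, degree_word_of):
    """Split the real database into one ID table per attribute degree word.

    records: dict mapping record ID -> dict of attribute -> raw value
    degree_word_of: hypothetical discretizer, (attribute, value) -> degree-word ID k
    Returns a dict mapping k -> set of record IDs (the single-field table "tablek").
    """
    tables = defaultdict(set)
    for record_id, attributes in records.items():
        for attribute, value in attributes.items():
            tables[degree_word_of(attribute, value)].add(record_id)
    return dict(tables)


def support_count(tables, degree_word_ids):
    """Count records present in every listed table, touching only those tables."""
    id_sets = [tables.get(k, set()) for k in degree_word_ids]
    if not id_sets:
        return 0
    return len(set.intersection(*id_sets))


def toy_degree_word(attribute, value):
    # Hypothetical two-level discretization used only for this example.
    return f"{attribute}_high" if value >= 50 else f"{attribute}_low"


records = {
    1: {"temperature": 80, "humidity": 70},
    2: {"temperature": 20, "humidity": 90},
    3: {"temperature": 75, "humidity": 10},
}
tables = build_mining_tables(records, toy_degree_word)
print(support_count(tables, ["temperature_high", "humidity_high"]))
```

Only two of the four small tables are read to answer the query, which is the point of the layout: the cost depends on the tables tied to the knowledge nodes in question, not on the size of the whole database.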

The knowledge sub-base is attribute-based; its characteristic is that it makes it easy to form the correspondence between knowledge nodes and data subclasses, thereby laying the foundation for directed data mining. Logical structure: within the corresponding domain, the rule base is divided by attribute into several rule sub-bases, each of which corresponds to the mining database.

2.2 The double-library synergistic mechanism is realized mainly by the inspiration-type coordinator and the interrupt-type coordinator.

The function of the inspiration-type coordinator is to search the non-associated states of the "knowledge nodes" in the knowledge base in order to discover "knowledge shortages" and produce "creative images", thereby inspiring and activating the corresponding "data classes" in the real database and producing a "directed mining process"; in other words, the computer focuses automatically.

The function of the interrupt-type coordinator is as follows: once a rule (knowledge) has been generated by focusing on the mass data of the real database, it makes the KDD process "interrupt" and searches the corresponding position in the knowledge base for repetition, redundancy, contradiction, subordination, or circularity with respect to the generated rule. If any is found, the generated rule is cancelled, or handled accordingly, and control returns to the "top" of KDD; if none is found, the KDD process continues with knowledge evaluation.

2.3 The software implementation of KDD* mainly comprises the implementation of the functions of the inspiration-type coordinator, the KDD process, and the interrupt-type coordinator.

The inspiration-type coordinator discovers "knowledge shortages" mainly by computing the reachability matrix of a directed hypergraph, and then prunes by the rule intensity threshold to form the focus; the KDD process is realized mainly through the confidence threshold (taking association rule mining as the example); the interrupt-type coordinator uses SQL statements or the computed reachability matrix of the directed hypergraph to judge whether knowledge is repeated, contradictory, redundant, circular, or subordinate, and handles it accordingly.

Several related concepts

Support and confidence of a rule: defined as for ordinary association rules.

Definition 1, interest (Interest): the degree of the user's interest in each language variable or language value in the database, that is, in each knowledge element node in the knowledge base. During preprocessing, the user first assigns an interest to each language value, that is, to the corresponding knowledge element node; it is written Interest(e_k) and takes values in [0, 1]. The larger the value, the more interested the user is in that knowledge element node. For a knowledge compound node F = e_1 ∧ e_2 ∧ ... ∧ e_m, the interest is defined as the mean of the interests of its knowledge element nodes, that is:

Interest(F) = \frac{1}{m} \sum_{i=1}^{m} Interest(e_i)

Define the length of a rule as the number of knowledge element nodes it contains, written Len(r_i). Then for a rule r_i = F → h, the interest is

Interest(r_i) = \frac{\sum_{j=1}^{m} Interest(e_j) + Interest(h)}{Len(r_i)}

where Len(r_i) = m + 1. The interest of a rule is a combined measure of the number of knowledge element nodes appearing in the rule and of their interests. In general, the more high-interest knowledge element nodes and the fewer low-interest knowledge element nodes a rule contains, the more interested the user is considered to be in the rule.
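The two interest formulas can be illustrated with a short Python sketch. The function names and the sample interest values are assumptions made for this example; only the arithmetic comes from the definitions above.

```python
def interest_of_compound(interest, antecedent):
    """Mean interest of the knowledge element nodes e_1..e_m forming F."""
    return sum(interest[e] for e in antecedent) / len(antecedent)


def interest_of_rule(interest, antecedent, consequent):
    """Interest of the rule F -> h, with Len(r) = m + 1 element nodes."""
    rule_length = len(antecedent) + 1
    total = sum(interest[e] for e in antecedent) + interest[consequent]
    return total / rule_length


# Hypothetical user-assigned interests in [0, 1].
interest = {"e1": 0.9, "e2": 0.6, "h": 0.3}

print(interest_of_compound(interest, ["e1", "e2"]))   # about 0.75
print(interest_of_rule(interest, ["e1", "e2"], "h"))  # about 0.6
```

In this example the low-interest consequent pulls the rule's interest below that of its antecedent, which matches the remark that low-interest element nodes reduce the interest of a rule.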

Definition 2, rule intensity (Intensity): it covers two aspects, the objective degree of support for the rule and the subjective degree of interest in the rule. The objective degree of support is called the support, and the subjective degree of interest is called the interest (see Definition 6). For a rule r_i = F → h, the rule intensity is Intensity(r_i) = (Interest(r_i) + sup(r_i)) / 2.

The definition of rule intensity in this embodiment is made from a practical standpoint: it is a convention adopted for ease of measurement that does not lose the essential characteristics.

Earlier mining methods, such as the Apriori algorithm, mine rules by objective measures only, so they rarely yield the rules the user is really interested in, and a large amount of manual screening for interesting rules is required. Rule intensity takes the objective and the subjective aspects into account together. With rule intensity, as defined above, used as the index that drives rule mining, the two aspects balance each other reasonably. On the one hand, even if the support is fairly small, as long as the user is very interested in a piece of missing knowledge the rule intensity will not be too small, so the hypothesis rule can still be focused on and trigger the mining process. On the other hand, if the user is not very interested in a piece of missing knowledge, it can be focused on and trigger the mining process only when it has very high support. In addition, the definition of rule intensity still uses the notion of support, but the support threshold can now be set lower than in the Apriori algorithm; that is, missing knowledge is pruned very cautiously.
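The trade-off between interest and support can be made concrete with a small Python sketch. The function names and the threshold 0.45 are chosen for illustration (0.45 is the rule intensity threshold used later in the mushroom example); the three cases mirror the two situations described above.

```python
def rule_intensity(interest, support):
    """Intensity(r) = (Interest(r) + sup(r)) / 2."""
    return (interest + support) / 2.0


def is_focused(interest, support, min_intensity):
    """A piece of missing knowledge is focused on when its intensity exceeds the threshold."""
    return rule_intensity(interest, support) > min_intensity


MIN_INTENSITY = 0.45  # illustrative threshold

# Low support, high interest: still focused on.
print(is_focused(interest=0.9, support=0.1, min_intensity=MIN_INTENSITY))
# Low support, low interest: pruned.
print(is_focused(interest=0.2, support=0.1, min_intensity=MIN_INTENSITY))
# Low interest, very high support: focused on.
print(is_focused(interest=0.2, support=0.8, min_intensity=MIN_INTENSITY))
```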

From the above introduction to the overall structural model and theoretical foundation of the KDD* new system, it can be seen that the technical realization of the double-library synergistic mechanism consists in constructing the interrupt (R) type coordinator and the inspiration (S) type coordinator. The main function of the interrupt-type coordinator is: after a hypothesis rule (knowledge) has been generated by focusing on the mass data of the real database, make the KDD process "interrupt" and search the corresponding position in the knowledge base for repetition, redundancy, or contradiction with respect to the generated rule (the directed search process). If any is found, the generated rule is cancelled, or handled accordingly, and control returns to the "top" of KDD; if none is found, the KDD process continues, that is, the result is evaluated and stored. The main function of the inspiration-type coordinator is: under the principle of building the knowledge base on attributes, search the non-associated states of the "knowledge nodes" in the knowledge base in order to discover "knowledge shortages" and produce "creative images", thereby inspiring and activating the corresponding "data classes" in the real database and producing a "directed mining process".

The key of the present invention is the adoption of the double-library synergistic mechanism: the interrupt-type coordinator processes the obtained hypothesis rules to maintain the knowledge base in real time, and the inspiration-type coordinator uses rule intensity to trigger data focusing for data mining.

Therefore, the most critical problem in realizing the double-library synergistic mechanism is to realize the "directed search process" (reducing the search space) and the "directed mining process" (reducing the mining space). The necessary condition for realizing this function is to construct the correspondence between the "knowledge nodes" in the knowledge base and the "data subclasses (structures)" in the real database.

Inspiration-type coordinator:

The main purpose of the inspiration-type coordinator is to provide another way for the system to focus. In the classical KDD process, the system's focus is usually a direction of interest given by the user, and KDD mines along that direction. But if mining proceeds only along that direction, information in the mass data that is potentially useful to the user may well be overlooked by the user. The inspiration-type coordinator helps KDD find as much information useful to the user as possible, compensating for the user's own limitations and improving the cognitive autonomy of the machine.

The steps of the inspiration-type coordinator are shown in Fig. 6:

When the inspiration-type coordinator is called, the program goes to step 101, which searches for the language variables whose rule intensity is greater than a given threshold and forms a node set; step 102 combines the nodes in the node set to form a set of tuples; step 103 searches the basic knowledge base and removes from the tuple set the tuples that already exist in the basic knowledge base; step 104 sorts the remaining tuples by association strength to give the priority of the directed search; step 105 scans the tuples one by one in priority order and focuses on the corresponding entry in the database for directed mining; step 106 goes to the KDD process.
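Steps 101 to 105 can be sketched in Python as below. This is only an outline under assumptions: tuples are modelled as frozensets of node names, the `tuple_strength` callable standing in for "association strength" is hypothetical, and the returned list represents the order in which step 105 would hand tuples to directed mining.

```python
from itertools import combinations


def heuristic_focus(node_intensity, known_tuples, tuple_strength, threshold, size=2):
    """Return missing-knowledge tuples in priority order (steps 101 to 105)."""
    # Step 101: nodes whose rule intensity exceeds the threshold.
    node_set = sorted(n for n, s in node_intensity.items() if s > threshold)
    # Step 102: combine the nodes into candidate tuples.
    candidates = [frozenset(c) for c in combinations(node_set, size)]
    # Step 103: drop tuples already present in the basic knowledge base.
    shortage = [t for t in candidates if t not in known_tuples]
    # Step 104: order the rest by association strength, strongest first.
    shortage.sort(key=tuple_strength, reverse=True)
    # Step 105: the caller scans this list in order and mines each tuple.
    return shortage


# Hypothetical inputs for illustration.
node_intensity = {"a": 0.6, "b": 0.5, "c": 0.2, "d": 0.7}
known = {frozenset({"a", "b"})}


def summed_intensity(t):
    # Assumed stand-in for association strength.
    return sum(node_intensity[n] for n in t)


for t in heuristic_focus(node_intensity, known, summed_intensity, threshold=0.45):
    print(sorted(t))
```

Node "c" is filtered out at step 101, the pair {a, b} is removed at step 103 because the knowledge base already contains it, and the remaining pairs are visited strongest first.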

Interrupt-type coordinator:

In a traditional knowledge discovery system, the hypotheses produced by the KDD process are evaluated directly; when accepted knowledge is merged into the knowledge base, the knowledge base management system is responsible for checking the consistency and redundancy of the knowledge base, handling contradictory and redundant knowledge, and forming the new knowledge base. The drawbacks of this approach are that many meaningless hypotheses are evaluated, and that the accumulation of problems increases the burden of the consistency and redundancy checks.

Because the interrupt-type coordinator intervenes in the KDD process, repeated, contradictory, and redundant knowledge can be eliminated in real time and as early as possible, so that only those hypotheses that may become new knowledge are evaluated, which minimizes the evaluation workload. The steps are shown in Fig. 7:

When the interrupt-type coordinator is called, the program goes to step 201, which initializes the rule counter I and points it at the first rule; step 202 judges whether the end of the knowledge base has been reached: if so, step 203 is executed to close the knowledge base and end the call, and if not, step 204 is executed; step 204 looks up the I-th rule in the knowledge base, then step 205 is executed; step 205 judges whether the rule intensity is greater than 0.5: if not, step 206 is executed, which adds 1 to I and goes to step 202, and if so, step 207 is executed; step 207 judges whether the generated rule repeats a rule in the knowledge base: if so, step 208 is executed, which adds 1 to I and goes to step 202, and if not, step 209 is executed; step 209 judges whether the generated rule is redundant with respect to a rule in the knowledge base: if so, step 210 is executed, which adds 1 to I and goes to step 202, and if not, step 211 is executed; step 211 judges whether the generated rule contradicts a rule in the knowledge base: if so, step 212 is executed, which adds 1 to I and goes to step 202, and if not, step 213 is executed; step 213 stores the I-th rule in the knowledge base, then step 214 is executed, which adds 1 to I and goes to step 202.
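The filtering loop of Fig. 7 can be sketched in Python as follows. The text does not spell out how repetition, redundancy, and contradiction are tested, so the three predicates below are assumptions made for illustration: a repeat has the same antecedent and consequent, a redundant rule has the same consequent as a stored rule with a strictly smaller antecedent, and a contradictory rule has the same antecedent but a different value for the same consequent attribute. The `Rule` type and all names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: Tuple[str, str]  # (attribute, value)
    intensity: float


def is_repeat(new: Rule, kb: List[Rule]) -> bool:
    return any(r.antecedent == new.antecedent and r.consequent == new.consequent
               for r in kb)


def is_redundant(new: Rule, kb: List[Rule]) -> bool:
    # Assumed test: a stored rule with fewer conditions already gives this consequent.
    return any(r.consequent == new.consequent and r.antecedent < new.antecedent
               for r in kb)


def is_contradictory(new: Rule, kb: List[Rule]) -> bool:
    # Assumed test: same conditions, same attribute, different value.
    return any(r.antecedent == new.antecedent
               and r.consequent[0] == new.consequent[0]
               and r.consequent[1] != new.consequent[1]
               for r in kb)


def interrupt_coordinator(candidates: List[Rule], kb: List[Rule],
                          min_intensity: float = 0.5) -> List[Rule]:
    """Filter generated rules and store the survivors in kb (steps 201 to 214)."""
    accepted = []
    for rule in candidates:                      # steps 201, 202, 204
        if rule.intensity <= min_intensity:      # step 205
            continue
        if is_repeat(rule, kb):                  # step 207
            continue
        if is_redundant(rule, kb):               # step 209
            continue
        if is_contradictory(rule, kb):           # step 211
            continue
        kb.append(rule)                          # step 213
        accepted.append(rule)
    return accepted
```

Because each accepted rule is appended to `kb` immediately, later candidates in the same call are also checked against it, which is one way to read "real-time maintenance" of the knowledge base.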

3. A new association rule mining method based on the double-library synergistic mechanism: the Maradbcm method

Current international KDD research mainly follows the thread of the task description, knowledge evaluation, and knowledge representation of knowledge discovery, centers on effective knowledge discovery algorithms, and seldom studies KDD as a cognitive complex system with its own inherent regularities. Most current KDD algorithms do not take the knowledge base into account: many of the mined hypothesis rules repeat, are redundant with, or even contradict the knowledge already in the knowledge base, so the novelty required by the definition of KDD cannot be achieved. Nor is there any follow-up processing of the generated rules, that is, no handling of repetition, redundancy, contradiction, and so on among these rules or between them and the basic knowledge base.

The new association rule mining method based on the double-library synergistic mechanism, called the Maradbcm method for short (mining association rules algorithms based on double-bases cooperating mechanism), can effectively solve the problems mentioned above.

The concrete techniques of the association rule mining method based on the double-library synergistic mechanism are mainly the classical discretization methods of data mining, together with the inspiration-type coordinator and the interrupt-type coordinator of the KDD* system, which are used to mine association rules.

Let the rule intensity threshold be MinIntensity, the support threshold MinSup, the confidence threshold MinCon, and the sufficiency factor threshold minLS; let m be the number of knowledge element nodes p_i with support(p_i) > MinSup, let n be the number of knowledge compound nodes in the reachability matrix plus m, and let attr(p_i) be the attribute corresponding to knowledge element node p_i.

1) Data preprocessing: the user selects the real database, and the continuous attributes in the real database are discretized to form the mining database (n tables: table1, table2, ..., tablen);

2) Discover "knowledge shortages": the knowledge in the knowledge base is represented by a directed hypergraph H, with the adjacency matrix A(H) of the directed hypergraph as its representation; on this basis a new algorithm computes the reachability matrix P(H) of the directed hypergraph, and the 0 elements of P(H) are the missing knowledge;

3) Generate K_2: let K denote the set of missing knowledge and K_m the set of missing knowledge of rule length m, i.e. K_m = {r | Len(r) = m}. Because K has a great many elements, K_2 is pruned using the rule intensity Intensity(r_i) introduced above, and the rules r_i with Intensity(r_i) > MinIntensity are focused on. That is, for missing knowledge r_i: e_p → e_q (r_i ∈ K_2), the supports must satisfy sup(e_p), sup(e_q) > MinSup, and the support used in Intensity(r_i) is sup(r_i) = min(sup(e_p), sup(e_q));

4) m = 2;

5) Generate hypothesis rules from K_m: for missing knowledge r_i: e_1 ∧ e_2 ∧ ... ∧ e_p → e_q (r_i ∈ K_m), carry out directed mining, i.e. mine the data tables table_1, table_2, ..., table_p, table_q, and compute Con(r_i) and Intensity(r_i); if Con(r_i) > MinCon and Intensity(r_i) > MinIntensity, go to 6); otherwise, delete the rule;

6) Process rule r_i with the interrupt-type coordinator: search the corresponding position in the basic knowledge base for repetition, redundancy, contradiction, subordination, circularity, and so on with respect to the generated rule. If any is found, cancel the generated rule or handle it accordingly, and go to 8); if none is found, go to 7);

7) Evaluate rule r_i: if it passes evaluation, store it in the knowledge base, recompute the corresponding reachability matrix of the directed hypergraph, and adjust K_m; if it does not pass evaluation, delete the rule;

8) Check whether K_m has been exhausted: if so, go to 9); if not, go to 5) to process the next rule;

9) m = m + 1; if K_m = ∅, go to 10); otherwise, go to 5);

10) Display the newly generated rules;

11) End.
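Step 2) above, finding missing knowledge as the 0 elements of a reachability matrix, can be illustrated with a simplified Python sketch. The patent's own algorithm for directed hypergraphs is not reproduced in this section, so the sketch substitutes Warshall's transitive closure on an ordinary directed graph, where node i reaching node j stands for "i → j is already derivable from the knowledge base". The function names and the sample matrix are assumptions.

```python
def reachability(adjacency):
    """Warshall's algorithm: reach[i][j] == 1 iff j is reachable from i."""
    n = len(adjacency)
    reach = [row[:] for row in adjacency]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def knowledge_shortage(adjacency):
    """Ordered pairs (i, j), i != j, whose reachability entry is 0."""
    reach = reachability(adjacency)
    n = len(reach)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and reach[i][j] == 0]


# Hypothetical knowledge base with two rules: node 0 -> node 1, node 1 -> node 2.
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(reachability(adjacency))
print(knowledge_shortage(adjacency))
```

The pair (0, 2) is not reported as missing because it follows from the two stored rules; only the pairs with no derivation at all become candidates for K_2, which are then pruned by rule intensity in step 3).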

Fig. 8 gives the program flow chart:

Step 302 preprocesses the real database to form the mining database; step 303 sets the counter i to 1; step 304 generates from the mining database the set of all items whose support is greater than the minimum support, i.e. the large itemsets L_i; step 305 generates the candidate set C_{i+1} from the knowledge base; step 306 judges whether the candidate set is empty: if so, go to step 314, otherwise execute step 307; step 307 computes the rule intensity Intensity(c_m); step 308 judges whether the rule intensity is less than the rule intensity threshold MinIntensity: if so, step 309 is executed to delete c_m, and if not, step 310 is executed; step 310 generates the knowledge shortage set K_{i+1}; step 311 judges whether the knowledge shortage set K_{i+1} is empty: if so, go to step 314, otherwise execute step 312; step 312 calls the KDD process to mine the data; step 313 adds 1 to the counter and goes to step 305; step 314 displays the newly generated rules; step 315 ends the run.

Refer to the program flow chart of the KDD process shown in Fig. 9:

Step 401 carries out directed mining on the mining database; step 402 computes the support, confidence, and sufficiency factor of the rule; step 403 compares the values obtained in step 402 with their respective thresholds: if the support is greater than the support threshold, the confidence is greater than the confidence threshold, and the sufficiency factor is greater than the sufficiency factor threshold, step 404 is executed, otherwise step 405 is executed; step 404 calls the interrupt-type coordinator to process the resulting rule; step 405 ends the procedure.
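Steps 402 and 403 can be sketched in Python as below. The sketch assumes that the antecedent and the consequent are each given as a set of record IDs (for example, obtained by intersecting the mining tables), that P(H) is estimated by the relative frequency of the consequent and P(H/E) by the confidence, and that the sufficiency factor is computed with formula (4-3) of this description. The function names and the handling of degenerate priors are assumptions.

```python
import math


def rule_measures(antecedent_ids, consequent_ids, n_records):
    """Return (support, confidence, LS) of the rule antecedent -> consequent."""
    both = len(antecedent_ids & consequent_ids)
    support = both / n_records
    confidence = both / len(antecedent_ids) if antecedent_ids else 0.0
    prior = len(consequent_ids) / n_records          # estimate of P(H)
    if prior <= 0.0 or prior >= 1.0:
        ls = 1.0          # assumption: degenerate prior, evidence treated as irrelevant
    elif confidence >= 1.0:
        ls = math.inf     # P(H/E) = 1 corresponds to LS -> infinity
    else:
        # Formula (4-3): LS = P(H/E)(1 - P(H)) / (P(H)(1 - P(H/E)))
        ls = confidence * (1.0 - prior) / (prior * (1.0 - confidence))
    return support, confidence, ls


def passes_thresholds(measures, min_sup, min_con, min_ls):
    """Step 403: all three values must exceed their thresholds."""
    support, confidence, ls = measures
    return support > min_sup and confidence > min_con and ls > min_ls


# Hypothetical ID sets over 10 records.
measures = rule_measures({1, 2, 3, 4}, {1, 2, 3, 5, 6}, n_records=10)
print(measures)
print(passes_thresholds(measures, min_sup=0.14, min_con=0.6, min_ls=1.2))
```

A rule that passes would then be handed to the interrupt-type coordinator (step 404); one that fails ends the procedure (step 405).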

Running example:

The mushroom database:

For comparability, the algorithm was tested on the mushroom database, a classic database made available online for experiments. The algorithm was implemented in Delphi 5.0, with Microsoft SQL Server 7.0 as the database system and a client-server architecture.

Because the knowledge sub-base corresponding to the mushroom database contains no domain expert knowledge, we first ran the QAR_SQL algorithm, which mines multi-valued association rules, and the Famer algorithm, which mines unexpected association rules, and used the mined rules as the knowledge in the basic knowledge base. We are interested only in whether a mushroom is poisonous, so the consequent of every rule is the poisonous-or-not attribute (with the values 'edible' and 'poisonous'). The QAR_SQL algorithm was run first, with support threshold minSup = 0.4, confidence threshold minCon = 0.6, and sufficiency factor threshold minLS = 1.2; it produced 19 rules, shown in Fig. 10.

The Famer algorithm was then run with support threshold minSup = 0.14, confidence threshold minCon = 0.8, and sufficiency factor threshold minLS = 1.2; it produced a further 10 unexpected rules corresponding to the conventional rules above, namely rules 20 to 29 in Fig. 11.

With the above 29 rules as the basic knowledge base, the inspiration-type coordinator was run with support threshold minSup = 0.14, confidence threshold minCon = 0.6, and rule intensity threshold minIntensity = 0.45. It produced a further 45 rules, shown in Fig. 12 (only 12 of them are displayed).

This example shows that the Maradbcm method is effective and can find new association rules beyond those found by the QAR_SQL and Famer methods.

4. Knowledge evaluation method: an automatic evaluation method for causal association rules based on autoepistemic logic

4.1 Principle 1 (consistency principle): in the objective world, under an uncertain reasoning mechanism and statistics over a large number of samples, the inferential characterization of a causal association rule is consistent with its statistical characterization.

Principle 2 (applicability principle): the confirmation reasoning pattern applies to reasoning involving causal association rules. That is:

HE

\frac{E}{H}

where H is the hypothesis to be confirmed, which can be regarded as the mined causal association rule R that needs to be evaluated, and E is an assertion that can be derived from H, which can be regarded as the outcome of a test. In the evaluation process the test is based on uncertain causal induction and checks whether the cause and effect data satisfy the consistency principle: if the state change of the data equals the result obtained from the data by reasoning, the consistency principle is satisfied; otherwise it is not.

4.2 According to the positive correlation criterion:

E confirms H if and only if Pr(H/E) > Pr(H)

where Pr(H) is the prior confidence and Pr(H/E) is the posterior confidence. In other words, E confirms H if and only if the posterior confidence of H relative to E is greater than its prior confidence.

4.3 The basis of the evaluation method is analyzed as follows:

The discovered causal association rule is denoted R( ). Evaluating a rule means judging whether to accept it, so rule evaluation belongs to the scope of confirmation logic.

Definition 10: For a causal association rule R( ), the ratio of the probability that A_i and S_j occur together to the probability of their disjunction, Pr(A_i ∧ S_j) / Pr(A_i ∨ S_j), is called the causal association strength, written CR. (It corresponds to Pr(H) and can serve as the prior confidence.)
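The ratio Pr(A_i ∧ S_j) / Pr(A_i ∨ S_j) can be estimated from sample data as in the Python sketch below. It assumes each sample is reduced to two Boolean flags (whether the cause state A_i and the effect state S_j are present) and that probabilities are estimated by relative frequencies, so the common denominator N cancels. The function name and the flag encoding are hypothetical.

```python
def causal_association_strength(cause_flags, effect_flags):
    """Estimate CR = Pr(A and S) / Pr(A or S) from paired Boolean observations."""
    pairs = list(zip(cause_flags, effect_flags))
    both = sum(1 for a, s in pairs if a and s)
    either = sum(1 for a, s in pairs if a or s)
    return both / either if either else 0.0


# Hypothetical observations for five samples.
cause = [1, 1, 0, 0, 1]
effect = [1, 0, 1, 0, 1]
print(causal_association_strength(cause, effect))
```

Samples in which neither state occurs do not affect the ratio, which is one way CR differs from the ordinary support of an association rule.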

Note: the causal association strength expresses not only the degree of association of the rule but, more importantly, the causal relationship between antecedent and consequent; it emphasizes the degree of causal association between the two. It clearly differs from confidence and support in the ordinary sense, and from confidence in a more general sense.

Definition 11: Pr(E2) / (Pr(E1) + Pr(E2)) is called the support strength, written SUP. (It corresponds to Pr(H/E) and can serve as the posterior confidence.)

Note: in the evaluation process, the test performed checks whether the rule satisfies the consistency principle of Principle 1. E is the resulting test outcome, so the data fall into two parts: the part that satisfies the consistency principle (denoted E1) and the part that does not (denoted E2). The satisfying part represents the degree to which the causal association rule holds, that is, a degree of support for the rule built on an inference mechanism; this differs from the usual support, which is built purely on statistics.

4.4 From Principle 2 and the related definitions, the following conclusion is obtained:

For a causal association rule R( ), if SUP > CR the rule is confirmed; if SUP ≤ CR the rule is falsified.

4.5 Evaluating association rules with the sufficiency factor LS:

In the subjective Bayes method, every rule is represented as

IF E THEN (LS, LN) H (P(H))

where P(H) is the prior probability of H; LS ∈ [0, +∞) is called the sufficiency factor and reflects how strongly the truth of evidence E affects conclusion H; LN ∈ [0, +∞) is called the necessity factor and reflects how strongly ¬E affects H, that is, how necessary E is for H to be true.

The relation between LS and P(H/E) is given by:

P(H/E) = \frac{LS \times P(H)}{1 + (LS - 1) \times P(H)} \qquad (4-2)

where P(H/E) is the conditional probability and P(H) is the prior probability of H. LS can be derived from this:

LS = \frac{P(H/E) \times (1 - P(H))}{P(H) \times (1 - P(H/E))} \qquad (4-3)
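Formulas (4-2) and (4-3) are inverses of each other, which can be checked numerically with the Python sketch below. The function names are assumptions, and the prior P(H) is assumed to lie strictly between 0 and 1.

```python
import math


def posterior_from_ls(ls, prior):
    """Formula (4-2): P(H/E) from LS and the prior P(H), with 0 < prior < 1."""
    if math.isinf(ls):
        return 1.0
    return ls * prior / (1.0 + (ls - 1.0) * prior)


def ls_from_posterior(posterior, prior):
    """Formula (4-3): LS from P(H/E) and the prior P(H), with 0 < prior < 1."""
    if posterior >= 1.0:
        return math.inf
    return posterior * (1.0 - prior) / (prior * (1.0 - posterior))


prior = 0.2
for ls in (0.0, 0.5, 1.0, 4.0):
    posterior = posterior_from_ls(ls, prior)
    # LS = 1 leaves the prior unchanged, LS > 1 raises it, LS < 1 lowers it.
    print(ls, posterior, ls_from_posterior(posterior, prior))
```

Running the loop shows the behaviour on both sides of LS = 1: the posterior equals the prior at LS = 1, exceeds it for LS > 1, falls below it for LS < 1, and reaches 0 at LS = 0.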

The LS value is usually provided by a domain expert, but in the association rule mining method it can be computed. The meaning of LS can be seen from formula (4-3) above:

(1) When LS = 1, formula (4-2) gives P(H/E) = P(H), which shows that E is irrelevant to H;

(2) When LS > 1, formula (4-2) gives P(H/E) > P(H), which shows that the presence of the evidence E increases the possibility that H is true, and the larger LS is, the larger P(H/E) is, that is, the more strongly E supports H being true. When LS → ∞, P(H/E) → 1, which shows that the presence of E makes H true. The presence of E is thus sufficient for H to be true, which is why LS is called the sufficiency factor;

(3) When LS < 1, formula (4-2) gives P(H/E) < P(H), which shows that the presence of the evidence E decreases the possibility that H is true;

(4) When LS = 0, formula (4-2) gives P(H/E) = 0, which shows that the presence of the evidence E makes H false.

4.6 Automatic evaluation method for causal association rules based on autoepistemic logic:

The automatic evaluation method (for evaluating a rule) is as follows:

Take the data of the cause A and the effect S to form a set of ordered pairs P = {<t_w, s_w>} (w = 1, 2, ..., N), where t_w is a datum in the cause state (state-change) space (the cause sample value) and s_w is the corresponding datum in the effect state (state-change) space (the effect sample value). N is the number of samples in the set. Let SUP1 = 0.

STEP1: Take the cause sample value t_w (w = 1, 2, ..., N), which belongs to the general sample space; the cause state (state-change) input vector a_tw is obtained by formula (1).

STEP2: Determine the cause state (state-change) type A_k (k = 1, 2, 3, 4, 5) to which the input vector a_tw belongs: compute by formula (2) the distance measure d_H between a_tw and each cause state (state-change) standard vector A_i, and take the type with the smallest distance as the type to which a_tw belongs. A randomly drawn sample set can be regarded as a set of ordered pairs P = {<t_w, s_w>}.

STEP3: Taking the rule as the local major premise and the cause state (state-change) standard vector A_k to which the input vector a_tw belongs as the minor premise, the unique matching knowledge matrix M_ijk can be found in the evaluation knowledge base by self-organization, and the effect state (state-change) vector S_w1 is obtained by the automatic reasoning pattern (3).

STEP4: Clustering. Compute the effect state (state-change) standard vector β to which S_w1 belongs: compute the distance measure between S_w1 and each effect state (state-change) standard vector (by the formula below) and take the one with the smallest distance to obtain the cluster.

d_H(S_{w1}, S_j) = \sum_{i=1}^{10} | \mu_{S_{w1}}^{(i)} - \mu_{S_j}^{(i)} |

where \mu_{S_{w1}}^{(i)} and \mu_{S_j}^{(i)} are the respective i-th coordinates of the two vectors.

STEP5: For the set of ordered pairs P = {<t_w, s_w>}, take the corresponding effect sample value s_w and obtain, by fuzzy clustering, the effect state (state-change) standard vector γ of the interval to which it belongs. If β = γ, then SUP1 = SUP1 + 1; otherwise SUP1 is unchanged.

STEP6: Repeat the above process N times to obtain SUP. Let

SUP = SUP1/N

and compare it with the causal association strength CR of the rule:

If SUP > CR, the rule is accepted;

if SUP ≤ CR, the rule is rejected.
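The counting loop of STEP1 to STEP6 can be sketched in Python as below. Formulas (1) to (3) and the knowledge matrix are not reproduced in this section, so the sketch replaces them with two hypothetical callables: `infer_effect_vector` stands for STEP1 to STEP3 (from a cause sample to the inferred effect vector S_w1), and `observed_effect_index` stands for the fuzzy clustering of STEP5. Only the nearest-standard-vector step with d_H, the counter SUP1, and the final comparison with CR follow the text directly. The toy example uses 2-coordinate vectors instead of 10.

```python
def d_h(u, v):
    """Distance d_H: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_standard(vector, standards):
    """STEP4: index of the standard vector with the smallest d_H."""
    return min(range(len(standards)), key=lambda j: d_h(vector, standards[j]))


def evaluate_rule(pairs, infer_effect_vector, observed_effect_index, standards, cr):
    """Return (SUP, accepted) for the ordered pairs <t_w, s_w>."""
    sup1 = 0
    for t_w, s_w in pairs:
        beta = nearest_standard(infer_effect_vector(t_w), standards)  # STEP1 to STEP4
        gamma = observed_effect_index(s_w)                            # STEP5
        if beta == gamma:
            sup1 += 1
    sup = sup1 / len(pairs)                                           # STEP6
    return sup, sup > cr


# Hypothetical stand-ins for the inference and clustering steps.
standards = [[1.0, 0.0], [0.0, 1.0]]


def toy_infer(t):
    return [0.9, 0.1] if t < 5 else [0.2, 0.8]


def toy_observed(s):
    return 0 if s < 50 else 1


pairs = [(1, 10), (2, 20), (7, 90), (8, 30)]
print(evaluate_rule(pairs, toy_infer, toy_observed, standards, cr=0.5))
```

In the toy data three of the four samples agree between the inferred and the observed effect state, so SUP is 0.75 and the rule is accepted against CR = 0.5.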

The preferred embodiment of the present invention has been described above; all changes made by those of ordinary skill in the art without departing from its spirit shall fall within the scope of protection of the present invention.

Claims

1, a kind of KDD based on double-library synergistic mechanism ^*New system, comprise the digital machine that central processing unit, storer are formed, the memory stores of described digital machine has True Data storehouse and primary knowledge base, it is characterized in that: described primary knowledge base is divided into several relevant knowledge word banks according to each concrete domain, described knowledge word bank is based on attribute, represents wherein knowledge with linguistic field and language value structure; Described digital machine is carried out following step:

2, the KDD based on double-library synergistic mechanism according to claim 1 ^*New system is characterized in that described storer is the vast capacity storage system that some mass storages are formed.

3, the KDD based on double-library synergistic mechanism according to claim 1 ^*New system is characterized in that described digital machine is the digital computing system that some computing machines are formed.

4, the KDD based on double-library synergistic mechanism according to claim 1 ^*New system is characterized in that the described data reprocessing of step 1) comprises that integrality, consistance to data check, to the processing of noise data, utilizes statistical method to fill up etc. to the data of losing.

5, the KDD based on double-library synergistic mechanism according to claim 1 ^*New system is characterized in that the described corresponding relation of step 1) is the knowledge node of knowledge word bank and the one-to-one relationship of the interlayer foundation of data word bank data subclass structure.

6, the KDD based on double-library synergistic mechanism according to claim 1 ^*New system is characterized in that the described inspiration type of step 3) telegon may further comprise the steps:

2), the node in the set of node is made up the formation tuple-set;

7. The KDD* system based on a double-library synergistic mechanism according to claim 1, characterized in that the rule value evaluation of step 5) is performed by the user through a human-computer interaction interface, using the various graphics and data analyses provided by visualization tools.

8. The KDD* system based on a double-library synergistic mechanism according to claim 1, characterized in that the rule value evaluation of step 5) adopts an automatic evaluation method for causal association rules based on autoepistemic logic, the automatic evaluation method being:

$$d_H(S_w, S_j) = \sum_{i=1}^{10} \left| \mu_{S_w}^{(i)} - \mu_{S_j}^{(i)} \right|$$

6) repeating the above process N times to obtain SUP; letting SUP = SUP1/N, the causal association strength CR of the rule is taken and compared with SUP:

if SUP > CR, the rule is accepted;

if SUP ≤ CR, the rule is rejected.
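The distance formula and the acceptance test of the automatic evaluation method can be sketched in Python as below. The steps that accumulate SUP1 are not present in the text shown, so SUP1 is simply taken as an input; the function names and the representation of each state as a list of ten membership values are assumptions made for illustration.

```python
def hamming_distance(mu_w, mu_j):
    """Sum of absolute differences between two lists of membership values.

    The claim sums over i = 1..10, so both lists are expected to hold
    ten values; the length check enforces only that they match.
    """
    if len(mu_w) != len(mu_j):
        raise ValueError("membership vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(mu_w, mu_j))


def accept_rule(sup1, n, cr):
    """Accept the rule when SUP = SUP1 / N is strictly greater than CR."""
    sup = sup1 / n
    return sup > cr
```

A rule whose SUP equals CR exactly is rejected, matching the SUP ≤ CR branch of the claim.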

9. A KDD* system based on a double-library synergistic mechanism, comprising a digital computer composed of a central processing unit and a memory, the memory of the digital computer storing a real database and a basic knowledge base, characterized in that: the basic knowledge base is divided into several related knowledge sub-bases according to each specific domain; each knowledge sub-base is attribute-based and represents its knowledge with a linguistic-field and linguistic-value structure; and the digital computer performs the following steps:

10. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the memory is a very-large-capacity storage system composed of several mass storage devices.

11. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the digital computer is a digital computer system composed of several computers.

12. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the data preprocessing of step 1) comprises checking the integrity and consistency of the data, handling noisy data, filling in missing data by statistical methods, and the like.

13. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the correspondence of step 1) is a one-to-one correspondence established between the knowledge nodes of the knowledge sub-base and the layers of the data-subclass structure of the data sub-base.

14. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the interrupt-type coordinator of step 4) comprises the following steps:

1) reading a rule;

3) judging whether the rule is duplicated, redundant, or contradictory; if any of these holds, going to step 4); otherwise storing the rule in the knowledge base and then proceeding to the next step;

15. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the rule value evaluation of step 5) is performed by the user through a human-computer interaction interface, using the various graphics and data analyses provided by visualization tools.

16. The KDD* system based on a double-library synergistic mechanism according to claim 9, characterized in that the rule value evaluation of step 5) adopts an automatic evaluation method for causal association rules based on autoepistemic logic, that is, the evaluation is carried out automatically by the digital computer according to the association strength of the rule and a preset threshold; the automatic evaluation method being:

$$d_H(S_w, S_j) = \sum_{i=1}^{10} \left| \mu_{S_w}^{(i)} - \mu_{S_j}^{(i)} \right|$$

6) repeating the above process N times to obtain SUP; letting SUP = SUP1/N, the causal association strength CR of the rule is taken and compared with SUP:

if SUP > CR, the rule is accepted;

if SUP ≤ CR, the rule is rejected.

17. A KDD* system based on a double-library synergistic mechanism, comprising a digital computer composed of a central processing unit and a memory, the memory of the digital computer storing a real database and a basic knowledge base, characterized in that: the basic knowledge base is divided into several related knowledge sub-bases according to each specific domain; each knowledge sub-base is attribute-based and represents its knowledge with a linguistic-field and linguistic-value structure; and the digital computer performs the following steps:

4) deriving hypothesis rules: using the selected knowledge discovery method, extracting the knowledge required by the user from the mining database and expressing the extracted knowledge in a specific pattern;

18. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the memory is a very-large-capacity storage system composed of several mass storage devices.

19. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the digital computer is a digital computer system composed of several computers.

20. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the data preprocessing of step 1) comprises checking the integrity and consistency of the data, handling noisy data, filling in missing data by statistical methods, and the like.

21. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the correspondence of step 1) is a one-to-one correspondence established between the knowledge nodes of the knowledge sub-base and the layers of the data-subclass structure of the data sub-base.

22. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the heuristic coordinator of step 3) comprises the following steps:

2) combining the nodes in the node set to form a set of tuples;

23. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the interrupt-type coordinator of step 5) comprises the following steps:

1) reading a rule;

2) searching for the rule in the knowledge base; if the rule strength is greater than a given value, proceeding to the next step; otherwise going to step 5);

4) judging whether the rule is duplicated, redundant, or contradictory; if any of these holds, going to step 5); otherwise storing the rule in the knowledge base and then proceeding to the next step;

5) judging whether all rules have been read; if all rules have been read, ending the process; otherwise reading the next rule and going to step 2).
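The control flow of the interrupt-type coordinator in steps 1), 2), 4) and 5) above can be sketched in Python as follows. Step 3) does not appear in the text shown and is therefore left out rather than guessed. The `Rule` representation and the three tests for duplication, redundancy, and contradiction are hypothetical definitions chosen only to make the sketch runnable; the claim does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # set of (attribute, linguistic value) pairs
    consequent: tuple      # single (attribute, linguistic value) pair
    strength: float


def is_duplicate(rule, kb):
    # Assumed test: same antecedent and consequent already stored.
    return any(r.antecedent == rule.antecedent and
               r.consequent == rule.consequent for r in kb)


def is_redundant(rule, kb):
    # Assumed test: a stored rule reaches the same consequent
    # from a strictly smaller antecedent.
    return any(r.consequent == rule.consequent and
               r.antecedent < rule.antecedent for r in kb)


def is_contradictory(rule, kb):
    # Assumed test: same antecedent, same attribute, different value.
    return any(r.antecedent == rule.antecedent and
               r.consequent[0] == rule.consequent[0] and
               r.consequent[1] != rule.consequent[1] for r in kb)


def interrupt_coordinator(candidates, kb, min_strength):
    """Filter candidate rules into the knowledge base kb (modified in place)."""
    for rule in candidates:                    # steps 1) and 5): read each rule
        if rule.strength <= min_strength:      # step 2): strength test
            continue
        # Step 3) is not given in the text shown and is omitted here.
        if (is_duplicate(rule, kb) or is_redundant(rule, kb)
                or is_contradictory(rule, kb)):  # step 4)
            continue
        kb.append(rule)                        # step 4): store the rule
    return kb
```

Because accepted rules are appended immediately, later candidates are checked against rules stored earlier in the same pass, which is one reading of "storing the rule in the knowledge base and then proceeding".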

24. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the rule value evaluation of step 6) is performed by the user through a human-computer interaction interface, using the various graphics and data analyses provided by visualization tools.

25. The KDD* system based on a double-library synergistic mechanism according to claim 17, characterized in that the rule value evaluation of step 6) adopts an automatic evaluation method, that is, the evaluation is carried out automatically by the digital computer according to the association strength of the rule and a preset threshold; the automatic evaluation method being:

$$d_H(S_w, S_j) = \sum_{i=1}^{10} \left| \mu_{S_w}^{(i)} - \mu_{S_j}^{(i)} \right|$$

6) repeating the above process N times to obtain SUP; letting SUP = SUP1/N, the causal association strength CR of the rule is taken and compared with SUP:

if SUP > CR, the rule is accepted;

if SUP ≤ CR, the rule is rejected.

26. A new method for mining association rules based on the double-library synergistic mechanism, characterized by comprising the following steps:

5) processing the qualifying rules with the interrupt-type coordinator;