Method and device for detecting confusion of same-name instances in a knowledge base
Technical Field
Embodiments of the present invention relate to the technical field of knowledge bases and knowledge graphs, and in particular to a method and device for detecting confusion of same-name instances in a knowledge base.
Background
A knowledge base is a database that stores knowledge in the structured form of triples, and is used to store large amounts of knowledge for a particular field or industry in a structured way. For example, a historical knowledge base can store a large body of knowledge about the field of history, including historical figures, historical events and so on. A knowledge base takes instances as its main objects of description and represents knowledge in an object-oriented manner. An instance is a reference to a concrete or abstract thing in reality. For example, an instance can represent a person, a city, an object and so on.
A knowledge base generally includes multiple instances, and the attributes of each instance and the relations between instances are stored as triples. The triple is the basic structure used to represent knowledge in a knowledge base, and can be expressed as <instance ID, predicate, instance ID/attribute value>. The first element of a triple is an instance ID, which identifies the instance 1 to which the triple belongs. The second element is a predicate, which describes a relation or an attribute of the instance. The third element is either the ID of another instance 2 or an attribute value of instance 1. When the third element is the ID of instance 2, the triple is a relation triple that describes the relation between instance 1 and instance 2, and the predicate names that relation. When the third element is an attribute value, the triple is an attribute triple that describes one attribute of instance 1, and the predicate names that attribute. For example, suppose instance 1 represents the poet Li Po and includes the triple <id1, name, Li Po>, and instance 2 represents the poet Tu Fu and includes the two triples <id2, name, Tu Fu> and <id2, friend, id1>. Then <id2, name, Tu Fu> is an attribute triple of instance 2, indicating that the name of instance 2 is Tu Fu, and <id2, friend, id1> is a relation triple of instance 2, indicating that Tu Fu and Li Po are friends.
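The triple structure described above can be sketched in a few lines of Python. This is only an illustration: the names `Triple`, `is_relation` and `known_ids` are hypothetical, and the rule "the third element is a known instance ID" is one assumed way of telling relation triples from attribute triples.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject_id: str  # ID of the instance the triple belongs to
    predicate: str   # names a relation or an attribute
    obj: str         # another instance ID (relation) or an attribute value


def is_relation(triple: Triple, known_ids: Set[str]) -> bool:
    """Treat a triple as a relation triple when its third element is a known instance ID."""
    return triple.obj in known_ids


known_ids = {"id1", "id2"}
triples = [
    Triple("id1", "name", "Li Po"),
    Triple("id2", "name", "Tu Fu"),
    Triple("id2", "friend", "id1"),
]
kinds = ["relation" if is_relation(t, known_ids) else "attribute" for t in triples]
print(kinds)
```

Only the third triple is classified as a relation triple, because its last element is the ID of another instance.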
Each instance in the knowledge base has a unique name attribute, which stores the name of the instance. The predicate of the name attribute is "name"; if the values of the name attributes of two instances are identical, the two instances are same-name instances. While a knowledge base is being built, attributes are easily mixed up between two or more same-name instances. For example, consider two people named Li Po. The first person: name: Li Po; occupation: poet; era: Tang Dynasty; gender: male. The second person: name: Li Po; occupation: student; date of birth: 1996; gender: female; major: artificial intelligence. In the finished knowledge base, the attributes of the two instances corresponding to these two people may well be mixed up. For example, the knowledge base may store an instance that includes the following triples: <id, name, Li Po>, <id, occupation, student>, <id, era, Tang Dynasty>, <id, gender, male>, <id, major, artificial intelligence>. This instance mixes attributes of the two instances that correspond to the two people named "Li Po", that is, the attributes of two same-name instances have been confused. If the attributes of two same-name instances are confused, the two same-name instances are considered to be confused.
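As a small illustration of the definition of same-name instances, the following sketch groups instance IDs by the value of their "name" attribute. The function name `group_same_name` and the sample data are assumptions made for this example and are not part of the described method.

```python
from collections import defaultdict


def group_same_name(triples):
    """Map each name shared by two or more instances to the IDs of those instances."""
    ids_by_name = defaultdict(list)
    for subject_id, predicate, value in triples:
        if predicate == "name":
            ids_by_name[value].append(subject_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


kb = [
    ("id1", "name", "Li Po"), ("id1", "occupation", "poet"),
    ("id2", "name", "Tu Fu"),
    ("id3", "name", "Li Po"), ("id3", "occupation", "student"),
]
same_name = group_same_name(kb)
print(same_name)
```

Note that this only finds instances that share a name. It cannot tell whether their attributes have already been merged into a single instance, which is the situation the method is meant to detect.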
Existing knowledge bases generally contain multiple groups of same-name instances, and confusion among same-name instances does occur. At present, technicians manually check the triples of each instance in the knowledge base against the context of each text in a related text library, to determine whether triples of other same-name instances have been mixed into the instance. Because a knowledge base typically stores a large amount of knowledge, the number of instances is very large and the number of triples they contain is larger still. This manual checking therefore consumes a great deal of manpower and time, and is very inefficient.
Summary of the Invention
Embodiments of the present invention provide a method and device for detecting confusion of same-name instances in a knowledge base, to solve the problem in the prior art that determining by manual checking whether same-name instance confusion exists in a knowledge base consumes a great deal of manpower and time and is very inefficient.
One aspect of the embodiments of the present invention provides a method for detecting confusion of same-name instances in a knowledge base, including:
obtaining a text library, where the content of the text library is related to the content of a knowledge base, the text library includes at least one text, each text includes at least one sentence, the knowledge base includes multiple instances, each instance includes multiple ordered sets each composed of N statements, and N is a positive integer greater than or equal to 3;
obtaining a first target and, according to the first target and the text library, constructing a set of target vectors corresponding to the first target, where the dimension of each target vector is equal to the number of texts in the text library, and the first target is any instance in the knowledge base; and
performing cluster analysis on the target vectors, and determining, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base.
Another aspect of the embodiments of the present invention provides a device for detecting confusion of same-name instances in a knowledge base, including:
an acquisition module, configured to obtain a text library, where the content of the text library is related to the content of a knowledge base, the text library includes at least one text, each text includes at least one sentence, the knowledge base includes multiple instances, each instance includes multiple ordered sets each composed of N statements, and N is a positive integer greater than or equal to 3;
a construction module, configured to obtain a first target and, according to the first target and the text library, construct a set of target vectors corresponding to the first target, where the dimension of each target vector is equal to the number of texts in the text library, and the first target is any instance in the knowledge base; and
a cluster analysis module, configured to perform cluster analysis on the target vectors and determine, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base.
The method and device provided by the embodiments of the present invention obtain a text library whose content is related to the knowledge base; for a first target in the knowledge base, which can be any instance in the knowledge base, construct the set of target vectors corresponding to the first target according to the text library; perform cluster analysis on the target vectors; and determine, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base. Confusion of same-name instances is thus detected automatically, without manually checking each first target, which saves a great deal of manpower and greatly improves detection efficiency.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the method for detecting confusion of same-name instances in a knowledge base provided by Embodiment One of the present invention;
Fig. 2 is a flow diagram of the method for detecting confusion of same-name instances in a knowledge base provided by Embodiment Two of the present invention;
Fig. 3 is a structural diagram of the device for detecting confusion of same-name instances in a knowledge base provided by Embodiment Three of the present invention;
Fig. 4 is a structural diagram of the device for detecting confusion of same-name instances in a knowledge base provided by Embodiment Four of the present invention.
Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the description of the present application, it should be understood that a knowledge base includes multiple instances, each instance includes multiple ordered sets each composed of N statements, and N is a positive integer greater than or equal to 3. An ordered set can describe an attribute of an instance or a relation between instances, and the statements in an ordered set are arranged in a predetermined order. For example, when N = 3, an ordered set can be represented as a triple, and each element of the triple can be a statement. A statement consists of one word or of a group of syntactically related words.
Embodiment One
Fig. 1 is a flowchart of the method for detecting confusion of same-name instances in a knowledge base provided by an embodiment of the present invention. This embodiment addresses the problem in the prior art that determining by manual checking whether same-name instance confusion exists in a knowledge base consumes a great deal of manpower and time and is very inefficient, and provides a method for detecting confusion of same-name instances in a knowledge base. The specific steps of the method are as follows:
Step S101: obtain a text library, where the content of the text library is related to the content of the knowledge base, the text library includes at least one text, and each text includes at least one sentence.
In this embodiment, the text library is a collection of natural-language texts. A text library includes at least one text, and each text includes at least one sentence. Because the content of the text library is related to the content of the knowledge base, it can serve as a reference for judging whether same-name instance confusion exists in the knowledge base. For example, if the knowledge base covers the field of history, the texts in the text library can include the text of an e-book edition of a history textbook and other history-related texts.
Optionally, texts related to the content of the knowledge base to be detected can be selected from an existing text library, or texts can be obtained directly from computer-readable files such as e-books, web pages and electronic documents, to form the text library of the present invention.
Step S102: obtain a first target and, according to the first target and the text library, construct the set of target vectors corresponding to the first target.
The dimension of each target vector is equal to the number of texts in the text library.
In this embodiment, the first target can be any instance in the knowledge base. Each first target includes multiple ordered sets each composed of N statements, where N is a positive integer greater than or equal to 3.
Further, each ordered set in the first target corresponds to one target vector. Each dimension of the target vector corresponds to one text in the text library, and its value can be determined by how the statements of the ordered set occur in that text. That is, the number of vectors in the set of target vectors corresponding to the first target equals the number of ordered sets that the first target includes, and the ordered sets of the first target correspond one-to-one to the target vectors in the set.
It should be noted that one or more first targets can be obtained from the knowledge base in this embodiment. This embodiment describes the process of detecting whether same-name instance confusion has occurred in a first target using a single first target as an example. When multiple first targets are obtained, the detection method is the same for each of them. This embodiment does not specifically limit the number of first targets obtained.
Step S103: perform cluster analysis on the target vectors, and determine, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base.
Specifically, cluster analysis is performed on the target vectors in the target vector set of the first target according to the similarity between them, merging target vectors with high similarity. If the number of vectors in the set obtained from the cluster analysis is greater than 1, it is determined that same-name instance confusion has occurred in the knowledge base.
The cluster analysis can use any prior-art clustering method that does not require the number of clusters to be specified in advance, such as hierarchical clustering, which is not described in detail in this embodiment.
This embodiment of the present invention obtains a text library whose content is related to the knowledge base; for a first target in the knowledge base, which can be any instance in the knowledge base, constructs the set of target vectors corresponding to the first target according to the text library; performs cluster analysis on the target vectors; and determines, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base. Confusion of same-name instances is thus detected automatically, without manually checking each first target, which saves a great deal of manpower and greatly improves detection efficiency.
Embodiment Two
Fig. 2 is a flowchart of the method for detecting confusion of same-name instances in a knowledge base provided by Embodiment Two of the present invention. On the basis of Embodiment One above, this embodiment describes the method in detail. The method specifically includes the following steps:
Step S201: obtain a text library, where the content of the text library is related to the content of the knowledge base, the text library includes at least one text, and each text includes at least one sentence.
Step S201 is similar to step S101 and is not described again here.
Step S202: obtain a first target, and replace each instance ID in the ordered sets of the first target with the name of the instance to which that ID corresponds, to obtain the second target corresponding to the first target.
In this embodiment, in each instance of the knowledge base, the last statement of at least one ordered set is the name of the instance, and the first statement of every ordered set is the instance ID of the instance to which the ordered set belongs. An instance ID uniquely identifies an instance.
The first target can be any instance in the knowledge base. In practice, a technician can designate one or more instances in the knowledge base as first targets. By detecting whether ordered sets of other same-name instances are mixed into the ordered sets of a first target, it is then determined whether same-name instance confusion has occurred in the knowledge base.
For example, let N = 3, so that ordered sets are represented as triples and each instance includes multiple triples. Suppose there are two instances, A and B. Instance A includes the following 2 ordered sets: <id1, name, Li Po> and <id1, occupation, poet>, where the first statement, id1, is the instance ID of instance A. Instance A contains exactly one ordered set <id1, name, Li Po>, whose last statement, "Li Po", is the name of instance A. Instance B includes the following ordered sets: <id2, name, Tu Fu>, <id2, occupation, poet>, <id2, works, five-character regulated verse "Spring View">, <id2, hobby, online games>, <id2, friend, id1>, <id2, position, company chief executive>. Here the first statement, id2, is the instance ID of instance B. Instance B contains exactly one ordered set <id2, name, Tu Fu>, whose last statement, "Tu Fu", is the name of instance B.
Suppose the first target obtained in this step is instance B. The instance IDs that occur in the ordered sets of instance B are "id1" and "id2". In this step, "id2" in the ordered sets of instance B is replaced with the name of instance B, to which id2 corresponds, that is, "id2" is replaced with "Tu Fu"; and "id1" in the ordered sets of instance B is replaced with the name of instance A, to which id1 corresponds, that is, "id1" is replaced with "Li Po". The second target corresponding to the first target therefore includes the following ordered sets: <Tu Fu, name, Tu Fu>, <Tu Fu, occupation, poet>, <Tu Fu, works, five-character regulated verse "Spring View">, <Tu Fu, hobby, online games>, <Tu Fu, friend, Li Po>, <Tu Fu, position, company chief executive>.
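The ID replacement in step S202 can be sketched as follows. This is a minimal illustration in which ordered sets are plain tuples; `build_second_target` and `id_to_name` are hypothetical names, and the sketch assumes that no attribute value happens to be spelled like an instance ID.

```python
def build_second_target(first_target, id_to_name):
    """Replace every instance ID in every ordered set with that instance's name."""
    return [
        tuple(id_to_name.get(statement, statement) for statement in ordered_set)
        for ordered_set in first_target
    ]


id_to_name = {"id1": "Li Po", "id2": "Tu Fu"}  # built from the "name" ordered sets
instance_b = [
    ("id2", "name", "Tu Fu"),
    ("id2", "occupation", "poet"),
    ("id2", "friend", "id1"),
]
second_target = build_second_target(instance_b, id_to_name)
for ordered_set in second_target:
    print(ordered_set)
```

Replacing IDs with names matters because the texts in the text library mention names, not internal IDs.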
Optionally, ordered sets in the first target that are irrelevant to detecting confusion of instance attributes can be removed, and instance IDs are then replaced with instance names only in the ordered sets that are retained. The second target obtained in this way contains fewer ordered sets than the original first target, which reduces the number of ordered sets to be examined and thus improves running efficiency. For example, if the first target represents a person, the ordered sets that represent the instance name and the person's gender can be removed from the first target. In this embodiment, which ordered sets are irrelevant to detecting confusion of instance attributes can be specified by a technician according to actual needs, and this embodiment does not specifically limit this.
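A possible way to drop such irrelevant ordered sets before the ID replacement is sketched below. It assumes N = 3 with the predicate in the second position; `drop_irrelevant` and `ignored_predicates` are hypothetical names, and the choice of predicates to ignore is left to the technician, as stated above.

```python
def drop_irrelevant(ordered_sets, ignored_predicates):
    """Keep only the ordered sets whose predicate is not in the ignored list."""
    return [s for s in ordered_sets if s[1] not in ignored_predicates]


person = [
    ("id", "name", "Li Po"),
    ("id", "gender", "male"),
    ("id", "occupation", "student"),
]
kept = drop_irrelevant(person, {"name", "gender"})
print(kept)
```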
It should be noted that when there are two or more first targets, steps S202-S209 can be performed for each first target separately, to determine whether ordered sets of other same-name instances are mixed into each first target. When ordered sets of other same-name instances are mixed into any one first target, it is determined that same-name instance confusion has occurred in that first target, and further that same-name instance confusion has occurred in the knowledge base. When no first target exhibits same-name instance confusion, it is further determined that same-name instance confusion has not occurred in the knowledge base.
Step S203: according to the second target and the text library, construct the set of target vectors corresponding to the second target.
The dimension of each target vector is equal to the number of texts in the text library.
Specifically, this step can be implemented as follows:
Obtain a temporary vector for each ordered set of the second target, where the dimension of each temporary vector is equal to the number of texts in the text library. For each ordered set in the second target, determine whether each text includes all the target statements of the ordered set. If so, set the value of the dimension that corresponds to that text in the temporary vector of the ordered set to a first target value; if not, set the value of that dimension to a second target value. The temporary vector filled in this way is the target vector corresponding to the ordered set. The target vectors of all the ordered sets form the set of target vectors corresponding to the second target.
For example, the first target value can be set to 1 and the second target value can be set to 0. The target statements are the statements at at least two preset positions in the ordered set, for example the statements at the first position and the last position of the ordered set.
In this embodiment, the text library can be expressed as $WB = \{W_1, W_2, \dots, W_l, \dots, W_t\}$, where $t$ is the number of texts in the text library and $W_l$ is the $l$-th text in the text library, $l = 1, 2, \dots, t$. The larger the value of $t$, the better the effect of the cluster analysis. In practice, a text library containing from hundreds to tens of thousands of texts is usually chosen, so that the clustering effect is good and the amount of computation in the cluster analysis is not too large.
The second target is denoted $E$, and the ordered sets that $E$ includes can be expressed as $\{V_1, V_2, \dots, V_i, \dots, V_n\}$, where $n$ is the number of ordered sets in $E$ and $V_i$ is any ordered set in $E$, $i = 1, 2, \dots, n$. Let $C_i$ denote the target vector corresponding to ordered set $V_i$. The dimension of the target vector is the same as the number of texts in the text library: $C_i = \{C_{i1}, C_{i2}, \dots, C_{il}, \dots, C_{it}\}$, where $C_{il}$ is the value of the $l$-th dimension of the target vector and corresponds to text $W_l$ in the text library. The texts $W_1, W_2, \dots, W_l, \dots, W_t$ correspond one-to-one to the dimension values $C_{i1}, C_{i2}, \dots, C_{il}, \dots, C_{it}$ of the target vector.
In this step, if text $W_l$ contains a sentence that simultaneously includes the statements at the at least two preset positions of ordered set $V_i$, the value $C_{il}$ of the dimension corresponding to text $W_l$ in the target vector is set to 1; otherwise, $C_{il}$ is set to 0.
Continuing the example from step S202, the second target corresponding to the first target includes the following 6 ordered sets: <Tu Fu, name, Tu Fu>, <Tu Fu, occupation, poet>, <Tu Fu, works, five-character regulated verse "Spring View">, <Tu Fu, hobby, online games>, <Tu Fu, friend, Li Po>, <Tu Fu, position, company chief executive>. In the order listed, the temporary vectors corresponding to these 6 ordered sets are denoted C1, C2, C3, C4, C5 and C6. Suppose the statements at the at least two preset positions are the statements at the first and last positions of the ordered set, that is, the first and third statements of the triple. Suppose the text library includes 4 texts: text 1, text 2, text 3 and text 4. Then, in step S203, the dimension of the temporary vector corresponding to each ordered set of the second target is 4. Take the ordered set <Tu Fu, works, five-character regulated verse "Spring View"> as an example. Its temporary vector C3 can be expressed as C3 = {C31, C32, C33, C34}, where C31, C32, C33 and C34 are the values of dimensions 1, 2, 3 and 4 of C3 and correspond to text 1, text 2, text 3 and text 4 of the text library respectively. If text 1 contains a sentence in which both "Tu Fu" and the five-character regulated verse "Spring View" occur, the value of the dimension corresponding to text 1 is set to 1, that is, C31 = 1. If no sentence in text 1 contains both "Tu Fu" and the five-character regulated verse "Spring View", the value of the dimension corresponding to text 1 is set to 0, that is, C31 = 0. Similarly, the values of C32, C33 and C34 can be determined from text 2, text 3 and text 4 respectively.
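The construction of one target vector can be sketched as below. Several details are assumptions made for illustration only: the four sample texts are invented, sentences are split naively on end-of-sentence punctuation, a statement "occurs" in a sentence when it appears as a substring, and the names `split_sentences`, `target_vector` and `positions` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on common end-of-sentence punctuation (an assumption)."""
    return [part.strip() for part in re.split(r"[.!?。！？]+", text) if part.strip()]


def target_vector(ordered_set, text_library, positions=(0, -1)):
    """One 0/1 entry per text: 1 if a single sentence contains every target statement."""
    targets = [ordered_set[p] for p in positions]  # first and last statements by default
    vector = []
    for text in text_library:
        hit = any(all(t in sentence for t in targets) for sentence in split_sentences(text))
        vector.append(1 if hit else 0)
    return vector


text_library = [
    "Tu Fu wrote Spring View. It is a famous poem.",
    "Tu Fu was a poet. Spring View is widely read.",
    "The weather is fine today.",
    "Many readers admire Spring View by Tu Fu!",
]
c3 = target_vector(("Tu Fu", "works", "Spring View"), text_library)
print(c3)
```

In the second sample text both statements occur, but in different sentences, so the corresponding entry stays 0. This reflects the requirement that a single sentence contain the target statements together.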
Steps S202-S203 above are the process of constructing, according to the first target and the text library, the set of target vectors corresponding to the first target.
Step S204: determine the similarity of every pair of target vectors in the set of target vectors.
In this embodiment, for any two target vectors $C_i = \{C_{i1}, C_{i2}, \dots, C_{it}\}$ and $C_j = \{C_{j1}, C_{j2}, \dots, C_{jt}\}$ in the set of target vectors, the similarity of $C_i$ and $C_j$ is denoted $\mathrm{SimilarityLengthRatio}(C_i, C_j)$ and can be calculated as follows.
First, calculate the similarity numerator of the two target vectors: $\mathrm{Similarity}(C_i, C_j) = |C_i \,\&\, C_j|$, where $C_i \,\&\, C_j = \{C_{i1} \,\&\, C_{j1}, C_{i2} \,\&\, C_{j2}, \dots, C_{it} \,\&\, C_{jt}\}$ is a $t$-dimensional vector and $|C_i \,\&\, C_j|$ is the number of 1s among the values of the dimensions of $C_i \,\&\, C_j$. Here

$$C_{il} \,\&\, C_{jl} = \begin{cases} 1, & C_{il} = 1 \text{ and } C_{jl} = 1 \\ 0, & \text{otherwise} \end{cases}$$

Then, calculate the similarity of target vectors $C_i$ and $C_j$:

$$\mathrm{SimilarityLengthRatio}(C_i, C_j) = \frac{\mathrm{Similarity}(C_i, C_j)}{\min(\mathrm{length}(C_i), \mathrm{length}(C_j))}$$

where $\mathrm{Similarity}(C_i, C_j)$ is the similarity numerator of the two target vectors, $\mathrm{length}(C_i)$ is the number of 1s among the dimension values of $C_i$, $\mathrm{length}(C_j)$ is the number of 1s among the dimension values of $C_j$, and $\min(\mathrm{length}(C_i), \mathrm{length}(C_j))$ is the smaller of $\mathrm{length}(C_i)$ and $\mathrm{length}(C_j)$.
For example, suppose there are two target vectors $C_1 = \{1,0,1,1,0,1\}$ and $C_2 = \{1,1,0,0,1,0\}$. By the formulas above, $C_1 \,\&\, C_2 = \{1\&1, 0\&1, 1\&0, 1\&0, 0\&1, 1\&0\} = \{1,0,0,0,0,0\}$. The number of 1s in $C_1 \,\&\, C_2$ is 1, the number of 1s in $C_1$ is 4, and the number of 1s in $C_2$ is 3. That is, $|C_1 \,\&\, C_2| = 1$, $\mathrm{length}(C_1) = 4$ and $\mathrm{length}(C_2) = 3$, so $\mathrm{Similarity}(C_1, C_2) = 1$ and $\min(\mathrm{length}(C_1), \mathrm{length}(C_2)) = 3$. The similarity of the two target vectors is therefore $\mathrm{SimilarityLengthRatio}(C_1, C_2) = 1/3$.
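The similarity calculation of step S204 can be written compactly for 0/1 vectors. The function name below is hypothetical, and returning 0.0 when one vector contains no 1s is an assumption, since the text does not say how that case is handled.

```python
def similarity_length_ratio(ci, cj):
    """Number of shared 1s divided by the smaller count of 1s in either vector."""
    overlap = sum(a & b for a, b in zip(ci, cj))  # |Ci & Cj|
    shorter = min(sum(ci), sum(cj))               # min(length(Ci), length(Cj))
    if shorter == 0:
        return 0.0  # assumption: a vector with no 1s is similar to nothing
    return overlap / shorter


c1 = [1, 0, 1, 1, 0, 1]
c2 = [1, 1, 0, 0, 1, 0]
print(similarity_length_ratio(c1, c2))
```

Dividing by the smaller count means that a sparse vector fully contained in a denser one scores 1, so an attribute mentioned in few texts can still be grouped with one mentioned in many.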
Step S205: determine whether all the similarities are less than a first threshold.
If the result is no, perform steps S206-S207; if the result is yes, perform step S208.
For example, the first threshold can be 1/4. Alternatively, the first threshold can be 1/2, 1/6 or 1/8. It can be set by a technician according to the actual situation, and the embodiments of the present invention do not specifically limit its value.
In this step, the similarity of every pair of target vectors determined in step S204 is compared with the first threshold, to determine whether the similarity of every pair of target vectors is less than the first threshold. If the result is no, target vectors with relatively high similarity still exist and the cluster analysis needs to continue, so steps S206-S207 are performed. If the result is yes, the cluster analysis ends, and the current set of target vectors is the result of the cluster analysis. Because the number of target vectors in the cluster analysis result is greater than 1, it can be determined that ordered sets of other same-name instances are mixed into the first target corresponding to the set of target vectors, and hence that same-name instance confusion has occurred in the knowledge base; step S208 is performed.
Step S206: merge the two target vectors with the greatest similarity, and update the set of target vectors with the merged target vector.
In this step, the two target vectors with the greatest similarity are first determined and merged, and the merged target vector is then added to the set of target vectors. In the updated set, the new merged target vector replaces the two original target vectors that had the greatest similarity. That is, after the merge operation, those two original target vectors no longer exist in the set of target vectors.
In this embodiment, for any two vectors $C_i = \{C_{i1}, C_{i2}, \dots, C_{it}\}$ and $C_j = \{C_{j1}, C_{j2}, \dots, C_{jt}\}$ in the set of target vectors, let $C_{ij}$ denote the new target vector obtained after merging. The merge operation on $C_i$ and $C_j$ can be implemented as follows:

$$C_{ij} = \{C_{i1} \mid C_{j1},\; C_{i2} \mid C_{j2},\; \dots,\; C_{it} \mid C_{jt}\}$$

where

$$C_{il} \mid C_{jl} = \begin{cases} 0, & C_{il} = 0 \text{ and } C_{jl} = 0 \\ 1, & \text{otherwise} \end{cases}$$

For example, suppose there are two target vectors $C_1 = \{1,0,1,0\}$ and $C_2 = \{1,1,0,0\}$. In the merged target vector, the value of the first dimension is $1 \mid 1 = 1$, the value of the second dimension is $0 \mid 1 = 1$, the value of the third dimension is $1 \mid 0 = 1$, and the value of the fourth dimension is $0 \mid 0 = 0$. The target vector obtained by merging $C_1$ and $C_2$ is therefore $\{1,1,1,0\}$.
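The merge operation of step S206 is an element-wise logical OR, which can be sketched as follows (the function name `merge` is an illustrative choice):

```python
def merge(ci, cj):
    """Element-wise OR of two 0/1 vectors of equal length."""
    return [a | b for a, b in zip(ci, cj)]


merged = merge([1, 0, 1, 0], [1, 1, 0, 0])
print(merged)
```

The merged vector marks every text in which either of the two original ordered sets was found, so it stands for the whole cluster in later similarity calculations.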
Step S207: determine whether the number of target vectors in the set of target vectors is 1.
If the result is yes, that is, the number of target vectors in the set is 1, the cluster analysis ends and the current set of target vectors is the result of the cluster analysis. Because the set contains only one target vector, it can be determined that no ordered sets of other same-name instances are mixed into the first target corresponding to the set of target vectors, and hence that same-name instance confusion has not occurred in the knowledge base; step S209 is performed.
If the result is no, that is, the number of target vectors in the set is not 1, the cluster analysis has not yet ended and needs to continue on the target vectors in the set. The method returns to step S204 and again determines the similarity of every pair of target vectors in the set, until either all similarities are less than the first threshold or the number of target vectors in the set is 1.
Step S208: determine that same-name instance confusion has occurred in the knowledge base.
In this embodiment, when the number of target vectors in the cluster analysis result is greater than 1, it can be determined that ordered sets of other same-name instances are mixed into the first target corresponding to the set of target vectors, and hence that same-name instance confusion has occurred in the knowledge base.
Preferably, the number of same-name instances whose ordered sets are mixed into the first target can also be determined from the number of target vectors in the cluster analysis result, which gives the number of same-name instances that have been confused.
Step S209: determine that same-name instance confusion has not occurred in the knowledge base.
Steps S204-S209 above are the process of performing cluster analysis on the target vectors and determining, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base.
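Putting steps S204-S209 together, the clustering loop can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the names `sim`, `cluster` and `is_confused` are hypothetical, the default threshold of 0.25 is just the example value 1/4, ties between equally similar pairs are broken arbitrarily, and a vector with no 1s is treated as similar to nothing.

```python
from itertools import combinations


def sim(ci, cj):
    """SimilarityLengthRatio for 0/1 vectors; 0.0 if either vector has no 1s (assumption)."""
    shorter = min(sum(ci), sum(cj))
    return sum(a & b for a, b in zip(ci, cj)) / shorter if shorter else 0.0


def cluster(vectors, threshold=0.25):
    """Repeatedly merge the most similar pair until all pairs fall below the threshold."""
    vectors = [list(v) for v in vectors]
    while len(vectors) > 1:                                    # S207: stop at one vector
        scored = [(sim(vectors[i], vectors[j]), i, j)
                  for i, j in combinations(range(len(vectors)), 2)]  # S204
        best, i, j = max(scored)
        if best < threshold:                                   # S205: all below threshold
            break
        merged = [a | b for a, b in zip(vectors[i], vectors[j])]     # S206
        vectors = [v for k, v in enumerate(vectors) if k not in (i, j)] + [merged]
    return vectors


def is_confused(vectors, threshold=0.25):
    """S208/S209: more than one remaining cluster indicates mixed-in ordered sets."""
    return len(cluster(vectors, threshold)) > 1


mixed = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
clean = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
print(is_confused(mixed), is_confused(clean))
```

In the `mixed` input, two ordered sets are found only in the first two texts and two only in the last two, so two clusters remain. One practical caveat of this sketch is that an ordered set that never appears in the text library yields an all-zero vector, which can never be merged and would therefore be reported as confusion.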
This embodiment of the present invention describes in detail the method for detecting confusion of same-name instances in a knowledge base. It sets out the specific process of constructing the set of target vectors corresponding to the first target, performing cluster analysis on the target vectors, and determining, according to the result of the cluster analysis, whether same-name instance confusion has occurred in the knowledge base. Confusion of same-name instances is thus detected automatically, without manually checking each first target, which saves a great deal of manpower and greatly improves detection efficiency.
Embodiment Three
Fig. 3 is a structural diagram of the device for detecting confusion of same-name instances in a knowledge base provided by Embodiment Three of the present invention. The device provided by this embodiment can be used to perform the processing flow provided by method Embodiment One above. As shown in Fig. 3, the device includes an acquisition module 301, a construction module 302 and a cluster analysis module 303.
The acquisition module 301 is used to obtain a text library whose content is related to the content of the
knowledge base. The text library includes at least one text, and each text includes at least one sentence.
The knowledge base includes multiple examples, each example includes multiple ordered sets composed of N
sentences, and N is a positive integer greater than or equal to 3. The constructing module 302 is used to
obtain a first object and to construct, according to the first object and the text library, the set of
object vectors corresponding to the first object, where the dimension of each object vector is equal to the
number of texts in the text library and the first object is any one example in the knowledge base. The
cluster analysis module 303 is used to perform cluster analysis on the object vectors and to determine,
according to the result of the cluster analysis, whether same-name example confusion has occurred in the
knowledge base.
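The cooperation of the three modules can be pictured with the following minimal sketch. It only illustrates
the data flow between modules 301, 302 and 303; the class name `ConfusionDetectionDevice`, its field names
and the use of plain callables for each module are hypothetical choices made for this illustration and are
not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

OrderedSet = Tuple[str, ...]   # an ordered set of N sentences, N >= 3
Vector = List[int]             # one object vector, one dimension per text


@dataclass
class ConfusionDetectionDevice:
    """Hypothetical wiring of the three modules described above."""

    # Acquisition module 301: returns the text library (a sequence of texts).
    acquire_text_library: Callable[[], Sequence[str]]
    # Constructing module 302: builds the object vectors of a first object.
    construct_vectors: Callable[[Sequence[OrderedSet], Sequence[str]], List[Vector]]
    # Cluster analysis module 303: decides whether confusion has occurred.
    is_confused: Callable[[List[Vector]], bool]

    def detect(self, first_object: Sequence[OrderedSet]) -> bool:
        """Run the three modules in order for one first object."""
        text_library = self.acquire_text_library()
        vectors = self.construct_vectors(first_object, text_library)
        return self.is_confused(vectors)
```

Passing the modules in as callables mirrors the statement later in the description that the division into
units is only a logical one, so each module can be replaced without touching the others.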
The device provided by this embodiment of the present invention can specifically be used to perform the
process flow of method embodiment one above; its specific functions are not described again here.
This embodiment of the present invention obtains a text library whose content is related to the knowledge
base and, for a first object in the knowledge base, where the first object may be any one example in the
knowledge base, constructs the set of object vectors corresponding to each first object according to the
text library. By performing cluster analysis on the object vectors, it determines according to the result
of the cluster analysis whether same-name example confusion has occurred in the knowledge base. This
realizes automatic detection of same-name example confusion in the knowledge base without manually checking
each first object, which saves a large amount of manpower and greatly improves detection efficiency.
Embodiment four
Fig. 4 is a structural diagram of the device for detecting same-name example confusion in a knowledge base
provided by embodiment four of the present invention. On the basis of embodiment three above, in this
embodiment the last sentence of at least one ordered set in each example of the knowledge base is the name
of the example, and the first sentence of each ordered set is the example ID of the example to which the
ordered set belongs. An example ID is used to uniquely identify one example.
As shown in Fig. 4, the constructing module 302 includes an acquisition submodule 3021 and a construction
submodule 3022. The acquisition submodule 3021 is used to replace the example IDs in the ordered sets of the
first object with the names of the examples corresponding to those example IDs, obtaining the second target
corresponding to the first object. The construction submodule 3022 is used to construct, according to the
second target and the text library, the set of object vectors corresponding to the second target.
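The replacement performed by the acquisition submodule 3021 can be sketched as follows. This is a minimal
illustration under stated assumptions: the function name `replace_ids_with_names` and the lookup table
`id_to_name` are hypothetical, and the sketch assumes that example IDs occur only at the first and last
positions of an ordered set, as in the triple structure described in the background.

```python
from typing import Dict, List, Sequence, Tuple

OrderedSet = Tuple[str, ...]


def replace_ids_with_names(
    first_object: Sequence[OrderedSet],
    id_to_name: Dict[str, str],
) -> List[OrderedSet]:
    """Build the second target from the ordered sets of a first object.

    Assumption: only the first and last elements can be example IDs.
    Elements that are not known IDs (predicates, attribute values) are kept.
    """
    second_target = []
    for ordered_set in first_object:
        head = id_to_name.get(ordered_set[0], ordered_set[0])
        tail = id_to_name.get(ordered_set[-1], ordered_set[-1])
        second_target.append((head,) + tuple(ordered_set[1:-1]) + (tail,))
    return second_target
```

The table `id_to_name` could, for instance, be collected from the ordered sets whose last sentence is the
name of the example; how it is built is left open here.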
The construction submodule 3022 is specifically used to: obtain the temporary vector corresponding to each
ordered set of the second target, where the dimension of each temporary vector is equal to the number of
texts in the text library; for each ordered set in the second target, determine whether each text includes
all object statements of the ordered set; if the determination result is yes, set the value of the dimension
corresponding to that text in the temporary vector of the ordered set to the first target value; if the
determination result is no, set the value of the dimension corresponding to that text in the temporary
vector of the ordered set to the second target value, thereby obtaining the object vector corresponding to
the ordered set; and obtain the set of object vectors corresponding to the second target.
Here, the object statements are the sentences at at least two preset positions in the ordered set.
Optionally, the sentences at the at least two preset positions are the sentences at the first position and
the last position in the ordered set.
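The vector construction described above can be sketched as follows. This is an illustration under
assumptions, not the embodiment's implementation: the first and second target values are assumed to be 1 and
0, "a text includes an object statement" is assumed to mean plain substring containment, and the object
statements are taken to be the first and last sentences of the ordered set, which is the optional choice
named above. The function name `build_object_vectors` is hypothetical.

```python
from typing import List, Sequence, Tuple

FIRST_TARGET_VALUE = 1    # assumed value: the text contains all object statements
SECOND_TARGET_VALUE = 0   # assumed value: the text does not contain all of them


def build_object_vectors(
    second_target: Sequence[Tuple[str, ...]],
    text_library: Sequence[str],
) -> List[List[int]]:
    """Return one object vector per ordered set, one dimension per text."""
    vectors = []
    for ordered_set in second_target:
        # Object statements: the first and last sentences of the ordered set.
        object_statements = (ordered_set[0], ordered_set[-1])
        # Temporary vector whose dimension equals the number of texts.
        vector = [SECOND_TARGET_VALUE] * len(text_library)
        for index, text in enumerate(text_library):
            # Assumption: containment is tested by substring search.
            if all(statement in text for statement in object_statements):
                vector[index] = FIRST_TARGET_VALUE
        vectors.append(vector)
    return vectors
```

With this encoding, ordered sets that belong to the same real-world example tend to be supported by
overlapping texts, which is what the subsequent cluster analysis relies on.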
The cluster analysis module 303 is specifically used to: determine the similarity of every pair of object
vectors in the set of object vectors; judge whether all of the similarities are less than a first threshold;
if the judgment result is no, merge the two object vectors with the greatest similarity and replace them in
the set of object vectors with the merged object vector; and return to the operation of determining the
similarity of every pair of object vectors in the set, until the number of object vectors in the set is 1,
in which case it is determined that same-name example confusion has not occurred in the knowledge base.
The cluster analysis module 303 is also used to determine, if the judgment result is yes, that same-name
example confusion has occurred in the knowledge base.
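The merge loop performed by the cluster analysis module 303 can be sketched as follows. Two points in the
sketch are assumptions rather than statements of this section: cosine similarity is used as the similarity
measure, and two object vectors are merged by taking the element-wise maximum. The function names
`cosine_similarity`, `cluster_object_vectors` and `confusion_detected` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_object_vectors(
    vectors: Sequence[Sequence[float]], first_threshold: float
) -> List[List[float]]:
    """Merge the most similar pair until all similarities fall below the threshold."""
    clusters = [list(v) for v in vectors]
    while len(clusters) > 1:
        # Similarity of every pair, keeping the indices of the best pair.
        best, i, j = max(
            (cosine_similarity(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best < first_threshold:
            break  # every similarity is below the first threshold
        # Assumed merge rule: element-wise maximum of the two vectors.
        merged = [max(x, y) for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def confusion_detected(
    vectors: Sequence[Sequence[float]], first_threshold: float
) -> bool:
    """Confusion is reported when more than one object vector remains."""
    return len(cluster_object_vectors(vectors, first_threshold)) > 1
```

If the loop stops with more than one object vector left, the number of remaining vectors can be read as the
number of same-name examples whose ordered sets are mixed in the first object.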
The device provided by this embodiment of the present invention can specifically be used to perform the
process flow of method embodiment two above; its specific functions are not described again here.
This embodiment of the present invention obtains a text library whose content is related to the knowledge
base and, for a first object in the knowledge base, where the first object may be any one example in the
knowledge base, constructs the set of object vectors corresponding to each first object according to the
text library. By performing cluster analysis on the object vectors, it determines according to the result
of the cluster analysis whether same-name example confusion has occurred in the knowledge base. This
realizes automatic detection of same-name example confusion in the knowledge base without manually checking
each first object, which saves a large amount of manpower and greatly improves detection efficiency.
In the several embodiments provided by the present invention, it should be understood that the disclosed
apparatus and method may be implemented in other ways. For example, the device embodiments described above
are merely illustrative. The division of the units is only a division by logical function, and other
divisions are possible in actual implementation; for example, multiple units or components may be combined
or integrated into another system, or some features may be omitted or not performed. In addition, the
mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling
or communication connection through some interfaces, devices or units, and may be electrical, mechanical or
of another form.
The units described as separate components may or may not be physically separate, and the components shown
as units may or may not be physical units; that is, they may be located in one place or distributed over
multiple network units. Some or all of the units may be selected according to actual needs to achieve the
purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one
processing unit, each unit may exist physically on its own, or two or more units may be integrated in one
unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus
software functional units.
The above integrated unit, when implemented in the form of a software functional unit, may be stored in a
computer-readable storage medium. The software functional unit is stored in a storage medium and includes
several instructions that cause a computer device (which may be a personal computer, a server, a network
device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the
present invention. The aforementioned storage medium includes various media that can store program code,
such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a
magnetic disk or an optical disc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the
division into the above functional modules is used as an example. In practical applications, the above
functions may be assigned to different functional modules as needed; that is, the internal structure of the
device may be divided into different functional modules to complete all or part of the functions described
above. For the specific working process of the device described above, reference may be made to the
corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution
of the present invention, not to limit it. Although the present invention has been described in detail with
reference to the foregoing embodiments, those of ordinary skill in the art should understand that the
technical solutions described in the foregoing embodiments may still be modified, or some or all of their
technical features may be equivalently replaced, and that such modifications or replacements do not cause
the essence of the corresponding technical solutions to depart from the scope of the technical solutions of
the embodiments of the present invention.