A relation extraction method based on convolutional neural networks and distance supervision
Technical field
The present invention relates to neural networks, natural language processing, information extraction and relation extraction, and more particularly to a relation extraction method based on convolutional neural networks and distance supervision.
Background art
In recent years, with the rapid development of the Internet, the amount of content and knowledge on the Internet has grown continuously, even exponentially, including news, blogs, e-mails, public documents, chat records and so on. However, these data are unstructured electronic text. A good way to enable people to easily make sense of all these data is to convert the unstructured data into structured semantic information. Given the huge volume of data, annotating this information manually is extremely difficult, or even impossible. It is therefore desirable to use computer technology to convert these data into a structured form that is easy for humans to understand and read. This is the motivation for relation extraction methods.
Existing relation extraction methods mainly fall into the following categories:
First, supervised methods. These methods first manually label the sentences in a corpus, marking the entities and the relations between them. For example, the ACE 2004 data set contains more than 1000 documents, in which 16,771 entity pairs are labeled as relation instances. Using the relation instances labeled for the ACE evaluation as a training set, a relation classifier is obtained by supervised learning over the lexical, syntactic and semantic features extracted from these instances. The classifier is then used to judge whether an entity pair in the test data holds some relation. Since supervised methods require a manually labeled training set in advance, and this labeling is labor-intensive, they are not suitable for large-scale, open-domain information extraction tasks.
Second, unsupervised methods. These methods extract the character strings between two entities and cluster and simplify these strings to obtain string representations of relations. They are suitable for large-scale data and can produce a large number of relation instances, but the relation instances obtained in this way are difficult to map directly to a specific knowledge base.
Third, semi-supervised methods. These methods use a small amount of labeled data as initial seeds, then iteratively learn a labeling model, use the model to label unlabeled data, and add the most confident labeled instances to the labeled data. However, after a large number of iterations the accuracy usually drops considerably, because labeling errors accumulate; this phenomenon is known as the "semantic drift" problem. To reduce such errors, researchers have carried out in-depth studies. Co-training methods use two conditionally independent feature sets to provide different and complementary information, thereby reducing labeling errors. Type checking methods use a named entity recognizer to check the relation instances.
Relation extraction based on distance supervision (DS, Distant Supervision), compared with supervised methods, can make use of much larger amounts of data, including more text content, more relations and more instances. Because a considerable number of features are combined, many problems caused by feature differences are avoided. Since DS is data-driven rather than dependent on labeled text, it overcomes the over-fitting and domain-dependence problems encountered by supervised methods. Compared with unsupervised methods, the classification results of DS are specific and meaningful relations; the extracted relations have more practical significance and can serve people directly. Compared with earlier methods, DS not only uses part-of-speech features but also adds many syntactic features. Therefore, DS has replaced the previously widely used kernel-based methods and has become the basis of today's mainstream methods.
Deep learning models have achieved remarkable results in computer vision and speech recognition. In recent years, deep learning models have also been applied to natural language processing and have shown considerable improvements over previous methods. The convolutional neural network (Convolutional Neural Network) is one such method. Convolutional neural networks originated when Hubel and Wiesel, studying neurons in the cat visual cortex that respond to local regions and are orientation-selective, discovered their unique network structure; the neural network later proposed on this basis can effectively reduce the complexity of feedback neural networks. Since this network structure was proposed, many researchers have improved it, and it has become a research hotspot in numerous fields. The characteristic of convolutional neural networks is that feature extraction and pattern classification are carried out simultaneously and are produced during training, and the weights can be shared, which reduces the number of network parameters; the network structure is therefore simple, adaptable and fast.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a relation extraction method based on convolutional neural networks and distance supervision.
A relation extraction method based on convolutional neural networks and distance supervision comprises the following steps:
1) mapping existing relations to target relations;
2) extending the entity aliases in the existing relations, finding multiple different forms of the entity aliases through query expansion;
3) obtaining unstructured text related to the entities from the Internet, and building an index;
4) retrieving sentences related to the entity aliases through the index, and separating them into positive and negative samples;
5) converting the positive and negative samples into feature vectors based on a convolutional neural network;
6) classifying the unstructured text with a multi-instance multi-label model using the obtained feature vectors, and obtaining new relation pairs.
On the basis of the above scheme, each step can further adopt the following preferred implementations:
Step 1) is specifically as follows: the relation expressions existing in different fields and different places of the existing knowledge base are mapped to the required target relations.
The step 2) is specifically:
1) finding the entity aliases corresponding to the redirect links of the entities of the existing relations on Wikipedia;
2) extending entity aliases that are not full names: converting abbreviations into full names, or appending suffixes to entity aliases that lack a suffix;
3) reducing entity aliases that are not abbreviated: turning the full name into an acronym or a partial expression;
4) iterating steps 1)~3) until entity aliases that meet the target requirements are found;
5) filtering the entity aliases using entity linking and disambiguation.
The step 3) is specifically:
1) building a dictionary from the entity aliases in the already existing relations and the entity aliases obtained by the alias extension;
2) using the words in the constructed dictionary as keywords, crawling web pages related to the corresponding entities from the Internet with a crawler;
3) extracting the text from the crawled web pages, splitting the text content into sentences, collecting the unstructured text, and storing it in files;
4) building a full-text index over the obtained unstructured text with a full-text search tool.
The step 4) is specifically:
1) an already existing relation is expressed as r(e1, e2), where r is the relation name, and e1 and e2 are the names of entity 1 and entity 2 respectively;
2) using the name e1 of entity 1 as a keyword, retrieving the sentences related to the name of entity 1 with the full-text search tool; if a retrieved sentence also contains the name e2 of entity 2, the sentence is labeled as a positive sample; otherwise, the sentence is labeled as a negative sample.
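Purely for illustration, the following is a minimal Python sketch of this positive/negative separation, assuming the retrieved sentences are available as plain strings; the function and variable names are hypothetical and not part of the invention:

```python
def label_samples(relation, e1_aliases, e2_aliases, retrieve):
    """Split retrieved sentences into positive and negative samples.

    relation    -- the relation name r of the known pair r(e1, e2)
    e1_aliases  -- alias strings of entity 1 (from the alias-extension step)
    e2_aliases  -- alias strings of entity 2
    retrieve    -- callable: keyword -> list of sentences from the full-text index
    """
    positives, negatives = [], []
    for alias1 in e1_aliases:
        for sentence in retrieve(alias1):
            # A sentence mentioning both entities is kept as a (noisy)
            # positive example of the relation; otherwise it is negative.
            if any(alias2 in sentence for alias2 in e2_aliases):
                positives.append((sentence, relation))
            else:
                negatives.append((sentence, relation))
    return positives, negatives
```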
The step 5) is specifically:
1) converting each word in the positive and negative samples into a word vector with word2vec;
2) passing the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
3) pooling the convolved sequences with an aggregation function to obtain the final features.
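As a non-limiting sketch, the convolution-and-pooling feature extractor could take the following form in numpy; the window size, the number of filters and the random initialization are assumptions made only to keep the example self-contained, and in the trained system the filter weights would be learned jointly with the classifiers of step 6):

```python
import numpy as np

def sentence_features(word_vectors, window=3, n_filters=230, seed=0):
    """Convolve a sentence and max-pool it into a fixed-length feature vector.

    word_vectors -- array of shape (n_words, dim), e.g. the word2vec vectors
                    of the words in one positive or negative sample sentence
    """
    n_words, dim = word_vectors.shape
    rng = np.random.default_rng(seed)
    # Convolution filters over a sliding window of `window` consecutive words
    # (randomly initialized here; learned in practice).
    W = rng.normal(scale=0.1, size=(n_filters, window * dim))
    b = np.zeros(n_filters)

    # Pad so that short sentences still yield at least one window.
    padded = np.vstack([np.zeros((window - 1, dim)), word_vectors,
                        np.zeros((window - 1, dim))])
    windows = [padded[i:i + window].ravel()
               for i in range(padded.shape[0] - window + 1)]
    conv = np.tanh(np.dot(np.array(windows), W.T) + b)  # (n_windows, n_filters)

    # Max pooling over all window positions (the aggregation function).
    return conv.max(axis=0)                              # (n_filters,)
```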
The step 6) is specifically:
1) defining the document collection as C, the set of entity mentions (descriptions) extracted from C as E, the set of known relation labels as R, and the database of relations as D, where every relation in D is instantiated at least once by a sentence in C;
2) carrying out relation extraction based on distance supervision with a multi-instance multi-label model, the model being trained with a hard-assignment expectation-maximization algorithm; the training of the model is divided into two steps:
In the first step, the E-step is executed: the optimal relation labels are found by maximizing the maximum-likelihood estimate of the joint probability p given by the following formula,
where Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair, zi denotes the relation labels of the i-th entity pair, yi denotes whether the corresponding relation holds (if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, xi denotes the i-th sentence, r denotes the label corresponding to a relation, m denotes the m-th mention, z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained when the joint probability was last computed, i = 1, ..., n indexes the joint probability computations, n is the number of entity pairs in D, and Mi is the set of mentions corresponding to the i-th entity pair; for each m ∈ Mi the following formula is computed,
where P(·) denotes the resulting joint probability and the superscript * denotes the final value of the parameter;
In the second step, the M-step is executed: the parameters of the y classifier and the z classifier are optimized separately to obtain new wy and wz, and the parameters of the two layers of classifiers are optimized respectively; the optimization formulas are as follows, where w denotes the parameters of each function:
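The formula images are not reproduced in this text. As a reference sketch only, consistent with the definitions above and with the standard hard-EM multi-instance multi-label formulation, the E-step and M-step can be written as:

```latex
% E-step (sketch): update each mention label z_i^m given the current parameters
z_i^{m*} \;=\; \arg\max_{z \in R \cup \{\mathrm{none}\}}\;
    p\bigl(z \mid x_i^{m}, w_z\bigr)\;
    \prod_{r \in P_i} p\bigl(y_i^{r}=1 \mid z'_i[m \to z], w_y\bigr)\;
    \prod_{r \in N_i} p\bigl(y_i^{r}=0 \mid z'_i[m \to z], w_y\bigr)

% M-step (sketch): retrain the two classifier layers independently
w_z^{*} \;=\; \arg\max_{w} \sum_{i=1}^{n}\,\sum_{m \in M_i}
              \log p\bigl(z_i^{m} \mid x_i^{m}, w\bigr), \qquad
w_y^{*} \;=\; \arg\max_{w} \sum_{i=1}^{n}\,\sum_{r \in P_i \cup N_i}
              \log p\bigl(y_i^{r} \mid z_i, w\bigr)
```

Here z'_i[m → z] denotes the current mention labels of the i-th entity pair with the m-th label replaced by z.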
Compared with the prior art, the present invention has the following beneficial effects:
1. In the relation extraction method based on distance supervision proposed by the present invention, compared with supervised training on a very small amount of labeled corpus, a large amount of data can be used, including more texts, more relations and more instances. And because the data volume is relatively large, a vast number of features can be combined and supplied to the classifier, thereby avoiding many problems caused by feature differences.
2. Compared with unsupervised methods, the method proposed by the present invention avoids their main problem: it is difficult to map the results of unsupervised model training into a known knowledge base, and the relations obtained by such training are hard to express in a form that humans can easily understand.
3. The present invention applies a multi-instance multi-label (MIML) model rather than the basic distance supervision model. Since MIML uses the "at least once" (At Least Once) assumption for instances, it avoids many results that are biased because of missing instances. The multi-instance multi-label model also uses two layers, so it can classify at the mention level of each entity pair and allows an entity pair to have multiple relation categories, which simulates real situations more faithfully. For example, Jordan is both a player of the Bulls and the owner of the Hornets; one entity pair may hold multiple relations.
4. Compared with the basic multi-instance multi-label (MIML) model, the present invention adds a convolutional neural network layer. Since the latest deep learning model is applied, the representation of the text is stronger and the features are more representative than the original general natural language features. Therefore, the performance and accuracy are considerably improved.
Brief description of the drawings
Fig. 1 shows the natural language model used in the present invention. The first layer is the convolutional layer: the original samples are converted into word-vector representations and then convolved to obtain convolved sequences. The second layer is the pooling layer, which pools the convolved sequences. The last layer is the multi-instance multi-label layer.
Fig. 2 is the overall flow chart of the present invention.
Specific embodiment
The present invention is further elaborated and illustrated below with reference to the accompanying drawings and specific embodiments. The technical features of the embodiments of the present invention can be combined correspondingly as long as they do not conflict with each other.
As shown in Figs. 1~2, a relation extraction method based on convolutional neural networks and distance supervision comprises the following steps:
1) Mapping the existing small number of relations to target relations. Specifically: the relation expressions existing in different fields and different places of the existing knowledge base are mapped to the required target relations, because different fields and different places express relations differently. For example, the information boxes (Info Box) of Wikipedia contain many relation attributes, but these differ from the target relations we need. For example, University:established in the information box corresponds to Org:founded.
2) Extending the entity aliases (the different expressions of an entity) in the existing relations, finding multiple different forms of the entity aliases through query expansion (Query Expansion). Specifically:
2.1) finding the entity aliases corresponding to the redirect links of the entities of the existing relations on Wikipedia; the anchor texts (Anchor Text) of Wikipedia links contain various differently named variants of an entity, and these variants all occur in real sentences, which is very useful for extracting sentences related to the entity;
2.2) extending entity aliases that are not full names: converting abbreviations into full names, or appending suffixes (for example: limited company (Ltd), corporation (Corp)) to entity aliases that lack a suffix;
2.3) reducing entity aliases that are not abbreviated: turning the full name into an acronym or a partial expression;
2.4) iterating steps 2.1)~2.3) until entity aliases that meet the target requirements are found; the target requirements can be determined according to the actual situation, i.e. the entity names are suitable and their quantity is sufficient;
2.5) filtering the entity aliases using entity linking (Entity Linking) and disambiguation (Disambiguation).
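For illustration, a minimal Python sketch of the suffix-extension and acronym-reduction operations might look as follows; the suffix list and helper names are assumptions made for the example only:

```python
import re

# Hypothetical suffixes used when extending aliases that lack one.
ORG_SUFFIXES = ["Ltd", "Corp", "Inc", "Co."]

def extend_alias(alias):
    """Generate longer variants of an alias, e.g. by appending suffixes."""
    variants = {alias}
    if not any(alias.endswith(suffix) for suffix in ORG_SUFFIXES):
        variants.update(f"{alias} {suffix}" for suffix in ORG_SUFFIXES)
    return variants

def reduce_alias(alias):
    """Generate shorter variants of a full name: acronym and partial forms."""
    words = re.findall(r"[A-Za-z]+", alias)
    variants = set()
    if len(words) > 1:
        variants.add("".join(word[0].upper() for word in words))  # acronym
        variants.add(words[0])                                    # partial form
    return variants

# Example: reduce_alias("International Business Machines")
# -> {"IBM", "International"}
```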
3) Obtaining a large amount of unstructured text related to the entities from the Internet, which can be news, blogs, e-mail information, public documents, chat records and so on, and building an index. Specifically:
3.1) building a dictionary from the entity aliases in the already existing relations and the entity aliases obtained by the alias extension;
3.2) using the words in the constructed dictionary as keywords, crawling web pages related to the corresponding entities from the Internet with a crawler;
3.3) extracting the text from the crawled web pages, splitting the text content into sentences, collecting a large amount of unstructured text, and storing the texts in files;
3.4) building a full-text index over the obtained unstructured text with a full-text search tool such as Lucene or Solr.
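As an illustrative sketch only, the sentence-level full-text index could also be built with the pure-Python Whoosh library in place of Lucene or Solr; the directory and field names are assumptions for the example:

```python
import os
from whoosh.index import create_in
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser

def build_index(sentences, index_dir="sentence_index"):
    """Index one document per sentence so hits are returned at sentence level."""
    os.makedirs(index_dir, exist_ok=True)
    schema = Schema(sent_id=ID(stored=True, unique=True), text=TEXT(stored=True))
    ix = create_in(index_dir, schema)
    writer = ix.writer()
    for i, sentence in enumerate(sentences):
        writer.add_document(sent_id=str(i), text=sentence)
    writer.commit()
    return ix

def search(ix, keyword, limit=100):
    """Return the stored sentences that match an entity-alias keyword."""
    with ix.searcher() as searcher:
        query = QueryParser("text", ix.schema).parse(keyword)
        return [hit["text"] for hit in searcher.search(query, limit=limit)]
```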
4) Retrieving sentences related to the entity aliases through the index, and separating them into positive and negative samples. Specifically:
4.1) an already existing relation is expressed as r(e1, e2), where r is the relation name, and e1 and e2 are the names of the two entities corresponding to the relation, defined as the names of entity 1 and entity 2 respectively;
4.2) using the name e1 of entity 1 as a keyword, retrieving the sentences related to the name of entity 1 with the full-text search tool; if a retrieved sentence contains the name e2 of entity 2, the sentence is labeled as a positive sample; otherwise, the sentence is labeled as a negative sample.
5) Converting the positive and negative samples into feature vectors based on a convolutional neural network. Specifically:
5.1) converting each word in the positive and negative samples into a word vector with word2vec;
5.2) passing the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
5.3) pooling the convolved sequences obtained after sentence convolution with an aggregation function (here the max function) to obtain the final features.
6) Classifying the unstructured text with the multi-instance multi-label (MIML) model using the obtained feature vectors, and obtaining new relation pairs. Specifically:
6.1) defining the document collection as C, the set of entity mentions (descriptions) extracted from C as E, the set of known relation labels as R, and the database of relations as D, where every relation in D is instantiated at least once by a sentence in C;
6.2) carrying out relation extraction based on distance supervision with the multi-instance multi-label model, the model being trained with a hard-assignment expectation-maximization algorithm (EM, Expectation Maximization); the training of the model is divided into two steps:
In the first step, the E-step is executed: the optimal relation labels are found by maximizing the maximum-likelihood estimate of the joint probability p given by the following formula,
where Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair (Entity Tuple), zi denotes the relation labels of the i-th entity pair, yi denotes whether the corresponding relation holds (that is, if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, xi denotes the i-th sentence, r denotes the label corresponding to a relation, m denotes the m-th mention, z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained when the joint probability was last computed, i = 1, ..., n indexes the joint probability computations, n is the number of entity pairs in D, and Mi is the set of mentions corresponding to the i-th entity pair; for each m ∈ Mi the following formula is computed,
where P(·) denotes the resulting joint probability and the superscript * denotes the final value of the parameter;
In the second step, the M-step is executed: the parameters of the y classifier and the z classifier are optimized separately to obtain new wy and wz, and the parameters of the two layers of classifiers are optimized respectively; the optimization formulas are as follows, where w denotes the parameters of each function:
Embodiment
Taking as an example the completion of the KBP2010 relation extraction task with a corpus of about 820,000 Wikipedia entries and a large number of New York Times articles, the implementation steps of the present invention are as follows:
Explanation: each entry on Wikipedia corresponds to an entity; its relevant attributes are in the information box (Info Box) of the entry, and there is also an article related to the entry, i.e. its text content. The New York Times corpus is a large collection of news texts from the New York Times, which contains a large amount of unstructured information.
1. Map the information of the information boxes (Info Box) on Wikipedia to the attribute types defined by KBP. For example, map the relation University:established to the target attribute Org:founded. Wiki attributes that do not appear in the task are simply ignored during mapping; one-to-many cases are mapped correspondingly;
2. Find the entity aliases corresponding to the redirect links of the entities on Wikipedia;
3. Use the anchor texts (Anchor Text) of Wikipedia links: the anchor text contains various differently named variants of an entity, and these variants all occur in real sentences, which is very useful for extracting sentences related to the entity;
4. Extend names: convert abbreviations into full names, convert surnames into full names, and append some suffixes to names (for example: limited company (Ltd), corporation (Corp));
5. Reduce entity names: in contrast to name extension, find all possible abbreviations: acronyms, partial expressions, etc.;
6. After steps 4 and 5, jump back to steps 1 and 2 and iterate until suitable and sufficient entity names are found;
7. Extract all the texts corresponding to the Wikipedia entries and put them together; the relevant New York Times articles are also extracted and put together;
8. Split the obtained text into sentences with a sentence-splitting tool, storing one sentence per line in a new file;
9. Build an index over the sentence-split text with a full-text search tool such as Lucene or Solr;
10. Using the entry name of each Wiki entry and its aliases as keywords, query the sentences related to the entry in all texts with a full-text search tool such as Lucene, and extract these sentences;
11. Process the extracted sentences simply: if an extracted sentence contains an attribute involved in the entry's information box (Info Box), the sentence is labeled as a positive sample; otherwise the sentence is labeled as a negative sample, for use in the subsequent classification;
12. Convert each word in the samples into a word vector with word2vec;
13. Pass the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
14. Pool the convolved sequences obtained after sentence convolution with an aggregation function (here the max function) to obtain the final features;
15. Let Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair (Entity Tuple), zi denote the relation labels of the i-th entity pair, yi denote whether the corresponding relation holds (that is, if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, x denote a sentence, r denote the label corresponding to a relation, and m denote the m-th mention; z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained in the previous inference. By maximizing the joint probability described by the formula, the new relation labels of the entity pairs are obtained;
16. Optimize the parameters of the y classifier and the z classifier separately to obtain new wy and wz; since the optimization processes of the two layers of classifiers are independent of each other, the two groups of parameters are optimized independently;
17. Iterate steps 15 and 16 until the final model is obtained.
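As an illustrative sketch only, the alternation of steps 15~17 could be organized as the following training loop. It is deliberately simplified: it keeps only the mention-level (z) classifier plus an explicit "at least once" constraint and omits the pair-level (y) classifier of the full two-layer model; the scikit-learn classifier and all function and variable names are assumptions, not the invention's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_miml(pair_features, pair_pos_labels, relations, n_iterations=5):
    """Hard-EM sketch: alternately infer mention labels (E-step, step 15)
    and retrain the mention-level classifier (M-step, step 16).

    pair_features   -- list over entity pairs; each item is an array of shape
                       (n_mentions, n_features) of CNN sentence features
    pair_pos_labels -- list over entity pairs; each item is the set P_i of
                       positive relation labels for that pair
    relations       -- list of all relation labels, with "none" at index 0
    """
    rel_index = {r: k for k, r in enumerate(relations)}
    z_clf = LogisticRegression(max_iter=1000)

    # Initialization: assign every mention of a pair one of its known relations.
    z = [np.full(len(X), rel_index[next(iter(P))] if P else 0)
         for X, P in zip(pair_features, pair_pos_labels)]

    for _ in range(n_iterations):
        # M-step (simplified): retrain the z classifier on the hard labels.
        z_clf.fit(np.vstack(pair_features), np.concatenate(z))

        # E-step (simplified): relabel each mention, then enforce that each
        # known positive relation of the pair is expressed at least once.
        for i, (X, P) in enumerate(zip(pair_features, pair_pos_labels)):
            proba = z_clf.predict_proba(X)              # (n_mentions, |classes|)
            z[i] = z_clf.classes_[proba.argmax(axis=1)]
            for r in P:
                k = rel_index[r]
                if k in z_clf.classes_ and not np.any(z[i] == k):
                    col = int(np.where(z_clf.classes_ == k)[0][0])
                    z[i][proba[:, col].argmax()] = k
    return z_clf, z
```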