A relation extraction method based on convolutional neural networks and distance supervision
Technical field
The present invention relates to neural networks, natural language processing, information extraction and relation extraction, and more particularly to a relation extraction method based on convolutional neural networks and distance supervision.
Background art
In recent years, with the rapid development of the Internet, the amount of content and knowledge on the Internet has grown continuously, even exponentially, including news, blogs, e-mails, public documents, chat records and so on. However, these data are unstructured electronic text. A good way to enable people to easily make sense of all these data is to convert the unstructured data into structured semantic information. Given the huge volume of data, annotating this information manually is extremely difficult, or even impossible. It is therefore desirable to use computer technology to convert these data into a structured form that is easy for humans to understand and read. This is the motivation for relation extraction methods.
Existing relation extraction methods mainly fall into the following categories:
First, supervised methods. These methods first manually label the sentences in a corpus, marking the entities and the relations between them. For example, the ACE 2004 data set contains more than 1000 documents, in which 16,771 entity pairs are labeled as relation instances. Using the relation instances labeled for the ACE evaluation as a training set, a relation classifier is obtained by supervised learning over the lexical, syntactic and semantic features extracted from these instances. The classifier is then used to judge whether an entity pair in the test data holds some relation. Since supervised methods require a manually labeled training set in advance, and this labeling is labor-intensive, they are not suitable for large-scale, open-domain information extraction tasks.
Second, unsupervised methods. These methods extract the character strings between two entities and cluster and simplify these strings to obtain string representations of relations. They are suitable for large-scale data and can produce a large number of relation instances, but the relation instances obtained in this way are difficult to map directly to a specific knowledge base.
Third, semi-supervised methods. These methods use a small amount of labeled data as initial seeds, then iteratively learn a labeling model, use the model to label unlabeled data, and add the most confident labeled instances to the labeled data. However, after a large number of iterations the accuracy usually drops considerably, because labeling errors accumulate; this phenomenon is known as the "semantic drift" problem. To reduce such errors, researchers have carried out in-depth studies. Co-training methods use two conditionally independent feature sets to provide different and complementary information, thereby reducing labeling errors. Type checking methods use a named entity recognizer to check the relation instances.
Relation extraction based on distance supervision (DS, Distant Supervision), compared with supervised methods, can make use of much larger amounts of data, including more text content, more relations and more instances. Because a considerable number of features are combined, many problems caused by feature differences are avoided. Since DS is data-driven rather than dependent on labeled text, it overcomes the over-fitting and domain-dependence problems encountered by supervised methods. Compared with unsupervised methods, the classification results of DS are specific and meaningful relations; the extracted relations have more practical significance and can serve people directly. Compared with earlier methods, DS not only uses part-of-speech features but also adds many syntactic features. Therefore, DS has replaced the previously widely used kernel-based methods and has become the basis of today's mainstream methods.
Deep learning models have achieved remarkable results in computer vision and speech recognition. In recent years, deep learning models have also been applied to natural language processing and have shown considerable improvements over previous methods. The convolutional neural network (Convolutional Neural Network) is one such method. Convolutional neural networks originated when Hubel and Wiesel, studying neurons in the cat visual cortex that respond to local regions and are orientation-selective, discovered their unique network structure; the neural network later proposed on this basis can effectively reduce the complexity of feedback neural networks. Since this network structure was proposed, many researchers have improved it, and it has become a research hotspot in numerous fields. The characteristic of convolutional neural networks is that feature extraction and pattern classification are carried out simultaneously and are produced during training, and the weights can be shared, which reduces the number of network parameters; the network structure is therefore simple, adaptable and fast.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a relation extraction method based on convolutional neural networks and distance supervision.
A relation extraction method based on convolutional neural networks and distance supervision comprises the following steps:
1) mapping existing relations to target relations;
2) extending the entity aliases in the existing relations, finding multiple different forms of the entity aliases through query expansion;
3) obtaining unstructured text related to the entities from the Internet, and building an index;
4) retrieving sentences related to the entity aliases through the index, and separating them into positive and negative samples;
5) converting the positive and negative samples into feature vectors based on a convolutional neural network;
6) classifying the unstructured text with a multi-instance multi-label model using the obtained feature vectors, and obtaining new relation pairs.
On the basis of the above scheme, each step can further adopt the following preferred implementations:
Step 1) is specifically as follows: the relation expressions existing in different fields and different places of the existing knowledge base are mapped to the required target relations.
The step 2) is specifically:
1) finding the entity aliases corresponding to the redirect links of the entities of the existing relations on Wikipedia;
2) extending entity aliases that are not full names: converting abbreviations into full names, or appending suffixes to entity aliases that lack a suffix;
3) reducing entity aliases that are not abbreviated: turning the full name into an acronym or a partial expression;
4) iterating steps 1)~3) until entity aliases that meet the target requirements are found;
5) filtering the entity aliases using entity linking and disambiguation.
The step 3) is specifically:
1) building a dictionary from the entity aliases in the already existing relations and the entity aliases obtained by the alias extension;
2) using the words in the constructed dictionary as keywords, crawling web pages related to the corresponding entities from the Internet with a crawler;
3) extracting the text from the crawled web pages, splitting the text content into sentences, collecting the unstructured text, and storing it in files;
4) building a full-text index over the obtained unstructured text with a full-text search tool.
The step 4) is specifically:
1) an already existing relation is expressed as r(e1, e2), where r is the relation name, and e1 and e2 are the names of entity 1 and entity 2 respectively;
2) using the name e1 of entity 1 as a keyword, retrieving the sentences related to the name of entity 1 with the full-text search tool; if a retrieved sentence also contains the name e2 of entity 2, the sentence is labeled as a positive sample; otherwise, the sentence is labeled as a negative sample.
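Purely for illustration, the following is a minimal Python sketch of this positive/negative separation, assuming the retrieved sentences are available as plain strings; the function and variable names are hypothetical and not part of the invention:

```python
def label_samples(relation, e1_aliases, e2_aliases, retrieve):
    """Split retrieved sentences into positive and negative samples.

    relation    -- the relation name r of the known pair r(e1, e2)
    e1_aliases  -- alias strings of entity 1 (from the alias-extension step)
    e2_aliases  -- alias strings of entity 2
    retrieve    -- callable: keyword -> list of sentences from the full-text index
    """
    positives, negatives = [], []
    for alias1 in e1_aliases:
        for sentence in retrieve(alias1):
            # A sentence mentioning both entities is kept as a (noisy)
            # positive example of the relation; otherwise it is negative.
            if any(alias2 in sentence for alias2 in e2_aliases):
                positives.append((sentence, relation))
            else:
                negatives.append((sentence, relation))
    return positives, negatives
```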
The step 5) is specifically:
1) converting each word in the positive and negative samples into a word vector with word2vec;
2) passing the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
3) pooling the convolved sequences with an aggregation function to obtain the final features.
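As a non-limiting sketch, the convolution-and-pooling feature extractor could take the following form in numpy; the window size, the number of filters and the random initialization are assumptions made only to keep the example self-contained, and in the trained system the filter weights would be learned jointly with the classifiers of step 6):

```python
import numpy as np

def sentence_features(word_vectors, window=3, n_filters=230, seed=0):
    """Convolve a sentence and max-pool it into a fixed-length feature vector.

    word_vectors -- array of shape (n_words, dim), e.g. the word2vec vectors
                    of the words in one positive or negative sample sentence
    """
    n_words, dim = word_vectors.shape
    rng = np.random.default_rng(seed)
    # Convolution filters over a sliding window of `window` consecutive words
    # (randomly initialized here; learned in practice).
    W = rng.normal(scale=0.1, size=(n_filters, window * dim))
    b = np.zeros(n_filters)

    # Pad so that short sentences still yield at least one window.
    padded = np.vstack([np.zeros((window - 1, dim)), word_vectors,
                        np.zeros((window - 1, dim))])
    windows = [padded[i:i + window].ravel()
               for i in range(padded.shape[0] - window + 1)]
    conv = np.tanh(np.dot(np.array(windows), W.T) + b)  # (n_windows, n_filters)

    # Max pooling over all window positions (the aggregation function).
    return conv.max(axis=0)                              # (n_filters,)
```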
The step 6) is specifically:
1) defining the document collection as C, the set of entity mentions (descriptions) extracted from C as E, the set of known relation labels as R, and the database of relations as D, where every relation in D is instantiated at least once by a sentence in C;
2) carrying out relation extraction based on distance supervision with a multi-instance multi-label model, the model being trained with a hard-assignment expectation-maximization algorithm; the training of the model is divided into two steps:
In the first step, the E-step is executed: the optimal relation labels are found by maximizing the maximum-likelihood estimate of the joint probability p given by the following formula,
where Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair, zi denotes the relation labels of the i-th entity pair, yi denotes whether the corresponding relation holds (if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, xi denotes the i-th sentence, r denotes the label corresponding to a relation, m denotes the m-th mention, z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained when the joint probability was last computed, i = 1, ..., n indexes the joint probability computations, n is the number of entity pairs in D, and Mi is the set of mentions corresponding to the i-th entity pair; for each m ∈ Mi the following formula is computed,
where P(·) denotes the resulting joint probability and the superscript * denotes the final value of the parameter;
In the second step, the M-step is executed: the parameters of the y classifier and the z classifier are optimized separately to obtain new wy and wz, and the parameters of the two layers of classifiers are optimized respectively; the optimization formulas are as follows, where w denotes the parameters of each function:
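The formula images are not reproduced in this text. As a reference sketch only, consistent with the definitions above and with the standard hard-EM multi-instance multi-label formulation, the E-step and M-step can be written as:

```latex
% E-step (sketch): update each mention label z_i^m given the current parameters
z_i^{m*} \;=\; \arg\max_{z \in R \cup \{\mathrm{none}\}}\;
    p\bigl(z \mid x_i^{m}, w_z\bigr)\;
    \prod_{r \in P_i} p\bigl(y_i^{r}=1 \mid z'_i[m \to z], w_y\bigr)\;
    \prod_{r \in N_i} p\bigl(y_i^{r}=0 \mid z'_i[m \to z], w_y\bigr)

% M-step (sketch): retrain the two classifier layers independently
w_z^{*} \;=\; \arg\max_{w} \sum_{i=1}^{n}\,\sum_{m \in M_i}
              \log p\bigl(z_i^{m} \mid x_i^{m}, w\bigr), \qquad
w_y^{*} \;=\; \arg\max_{w} \sum_{i=1}^{n}\,\sum_{r \in P_i \cup N_i}
              \log p\bigl(y_i^{r} \mid z_i, w\bigr)
```

Here z'_i[m → z] denotes the current mention labels of the i-th entity pair with the m-th label replaced by z.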
Compared with the prior art, the present invention has the following beneficial effects:
1. In the relation extraction method based on distance supervision proposed by the present invention, compared with supervised training on a very small amount of labeled corpus, a large amount of data can be used, including more texts, more relations and more instances. And because the data volume is relatively large, a vast number of features can be combined and supplied to the classifier, thereby avoiding many problems caused by feature differences.
2. Compared with unsupervised methods, the method proposed by the present invention avoids their main problem: it is difficult to map the results of unsupervised model training into a known knowledge base, and the relations obtained by such training are hard to express in a form that humans can easily understand.
3. The present invention applies a multi-instance multi-label (MIML) model rather than the basic distance supervision model. Since MIML uses the "at least once" (At Least Once) assumption for instances, it avoids many results that are biased because of missing instances. The multi-instance multi-label model also uses two layers, so it can classify at the mention level of each entity pair and allows an entity pair to have multiple relation categories, which simulates real situations more faithfully. For example, Jordan is both a player of the Bulls and the owner of the Hornets; one entity pair may hold multiple relations.
4. Compared with the basic multi-instance multi-label (MIML) model, the present invention adds a convolutional neural network layer. Since the latest deep learning model is applied, the representation of the text is stronger and the features are more representative than the original general natural language features. Therefore, the performance and accuracy are considerably improved.
Brief description of the drawings
Fig. 1 shows the natural language model used in the present invention. The first layer is the convolutional layer: the original samples are converted into word-vector representations and then convolved to obtain convolved sequences. The second layer is the pooling layer, which pools the convolved sequences. The last layer is the multi-instance multi-label layer.
Fig. 2 is the overall flow chart of the present invention.
Specific embodiment
The present invention is further elaborated and illustrated below with reference to the accompanying drawings and specific embodiments. The technical features of the embodiments of the present invention can be combined correspondingly as long as they do not conflict with each other.
As shown in Figs. 1~2, a relation extraction method based on convolutional neural networks and distance supervision comprises the following steps:
1) Mapping the existing small number of relations to target relations. Specifically: the relation expressions existing in different fields and different places of the existing knowledge base are mapped to the required target relations, because different fields and different places express relations differently. For example, the information boxes (Info Box) of Wikipedia contain many relation attributes, but these differ from the target relations we need. For example, University:established in the information box corresponds to Org:founded.
2) Extending the entity aliases (the different expressions of an entity) in the existing relations, finding multiple different forms of the entity aliases through query expansion (Query Expansion). Specifically:
2.1) finding the entity aliases corresponding to the redirect links of the entities of the existing relations on Wikipedia; the anchor texts (Anchor Text) of Wikipedia links contain various differently named variants of an entity, and these variants all occur in real sentences, which is very useful for extracting sentences related to the entity;
2.2) extending entity aliases that are not full names: converting abbreviations into full names, or appending suffixes (for example: limited company (Ltd), corporation (Corp)) to entity aliases that lack a suffix;
2.3) reducing entity aliases that are not abbreviated: turning the full name into an acronym or a partial expression;
2.4) iterating steps 2.1)~2.3) until entity aliases that meet the target requirements are found; the target requirements can be determined according to the actual situation, i.e. the entity names are suitable and their quantity is sufficient;
2.5) filtering the entity aliases using entity linking (Entity Linking) and disambiguation (Disambiguation).
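For illustration, a minimal Python sketch of the suffix-extension and acronym-reduction operations might look as follows; the suffix list and helper names are assumptions made for the example only:

```python
import re

# Hypothetical suffixes used when extending aliases that lack one.
ORG_SUFFIXES = ["Ltd", "Corp", "Inc", "Co."]

def extend_alias(alias):
    """Generate longer variants of an alias, e.g. by appending suffixes."""
    variants = {alias}
    if not any(alias.endswith(suffix) for suffix in ORG_SUFFIXES):
        variants.update(f"{alias} {suffix}" for suffix in ORG_SUFFIXES)
    return variants

def reduce_alias(alias):
    """Generate shorter variants of a full name: acronym and partial forms."""
    words = re.findall(r"[A-Za-z]+", alias)
    variants = set()
    if len(words) > 1:
        variants.add("".join(word[0].upper() for word in words))  # acronym
        variants.add(words[0])                                    # partial form
    return variants

# Example: reduce_alias("International Business Machines")
# -> {"IBM", "International"}
```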
3) Obtaining a large amount of unstructured text related to the entities from the Internet, which can be news, blogs, e-mail information, public documents, chat records and so on, and building an index. Specifically:
3.1) building a dictionary from the entity aliases in the already existing relations and the entity aliases obtained by the alias extension;
3.2) using the words in the constructed dictionary as keywords, crawling web pages related to the corresponding entities from the Internet with a crawler;
3.3) extracting the text from the crawled web pages, splitting the text content into sentences, collecting a large amount of unstructured text, and storing the texts in files;
3.4) building a full-text index over the obtained unstructured text with a full-text search tool such as Lucene or Solr.
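As an illustrative sketch only, the sentence-level full-text index could also be built with the pure-Python Whoosh library in place of Lucene or Solr; the directory and field names are assumptions for the example:

```python
import os
from whoosh.index import create_in
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser

def build_index(sentences, index_dir="sentence_index"):
    """Index one document per sentence so hits are returned at sentence level."""
    os.makedirs(index_dir, exist_ok=True)
    schema = Schema(sent_id=ID(stored=True, unique=True), text=TEXT(stored=True))
    ix = create_in(index_dir, schema)
    writer = ix.writer()
    for i, sentence in enumerate(sentences):
        writer.add_document(sent_id=str(i), text=sentence)
    writer.commit()
    return ix

def search(ix, keyword, limit=100):
    """Return the stored sentences that match an entity-alias keyword."""
    with ix.searcher() as searcher:
        query = QueryParser("text", ix.schema).parse(keyword)
        return [hit["text"] for hit in searcher.search(query, limit=limit)]
```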
4) Retrieving sentences related to the entity aliases through the index, and separating them into positive and negative samples. Specifically:
4.1) an already existing relation is expressed as r(e1, e2), where r is the relation name, and e1 and e2 are the names of the two entities corresponding to the relation, defined as the names of entity 1 and entity 2 respectively;
4.2) using the name e1 of entity 1 as a keyword, retrieving the sentences related to the name of entity 1 with the full-text search tool; if a retrieved sentence contains the name e2 of entity 2, the sentence is labeled as a positive sample; otherwise, the sentence is labeled as a negative sample.
5) Converting the positive and negative samples into feature vectors based on a convolutional neural network. Specifically:
5.1) converting each word in the positive and negative samples into a word vector with word2vec;
5.2) passing the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
5.3) pooling the convolved sequences obtained after sentence convolution with an aggregation function (here the max function) to obtain the final features.
6) Classifying the unstructured text with the multi-instance multi-label (MIML) model using the obtained feature vectors, and obtaining new relation pairs. Specifically:
6.1) defining the document collection as C, the set of entity mentions (descriptions) extracted from C as E, the set of known relation labels as R, and the database of relations as D, where every relation in D is instantiated at least once by a sentence in C;
6.2) carrying out relation extraction based on distance supervision with the multi-instance multi-label model, the model being trained with a hard-assignment expectation-maximization algorithm (EM, Expectation Maximization); the training of the model is divided into two steps:
In the first step, the E-step is executed: the optimal relation labels are found by maximizing the maximum-likelihood estimate of the joint probability p given by the following formula,
where Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair (Entity Tuple), zi denotes the relation labels of the i-th entity pair, yi denotes whether the corresponding relation holds (that is, if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, xi denotes the i-th sentence, r denotes the label corresponding to a relation, m denotes the m-th mention, z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained when the joint probability was last computed, i = 1, ..., n indexes the joint probability computations, n is the number of entity pairs in D, and Mi is the set of mentions corresponding to the i-th entity pair; for each m ∈ Mi the following formula is computed,
where P(·) denotes the resulting joint probability and the superscript * denotes the final value of the parameter;
In the second step, the M-step is executed: the parameters of the y classifier and the z classifier are optimized separately to obtain new wy and wz, and the parameters of the two layers of classifiers are optimized respectively; the optimization formulas are as follows, where w denotes the parameters of each function:
Embodiment
Taking as an example the completion of the KBP2010 relation extraction task with a corpus of about 820,000 Wikipedia entries and a large number of New York Times articles, the implementation steps of the present invention are as follows:
Explanation: each entry on Wikipedia corresponds to an entity; its relevant attributes are in the information box (Info Box) of the entry, and there is also an article related to the entry, i.e. its text content. The New York Times corpus is a large collection of news texts from the New York Times, which contains a large amount of unstructured information.
1. Map the information of the information boxes (Info Box) on Wikipedia to the attribute types defined by KBP. For example, map the relation University:established to the target attribute Org:founded. Wiki attributes that do not appear in the task are simply ignored during mapping; one-to-many cases are mapped correspondingly;
2. Find the entity aliases corresponding to the redirect links of the entities on Wikipedia;
3. Use the anchor texts (Anchor Text) of Wikipedia links: the anchor text contains various differently named variants of an entity, and these variants all occur in real sentences, which is very useful for extracting sentences related to the entity;
4. Extend names: convert abbreviations into full names, convert surnames into full names, and append some suffixes to names (for example: limited company (Ltd), corporation (Corp));
5. Reduce entity names: in contrast to name extension, find all possible abbreviations: acronyms, partial expressions, etc.;
6. After steps 4 and 5, jump back to steps 1 and 2 and iterate until suitable and sufficient entity names are found;
7. Extract all the texts corresponding to the Wikipedia entries and put them together; the relevant New York Times articles are also extracted and put together;
8. Split the obtained text into sentences with a sentence-splitting tool, storing one sentence per line in a new file;
9. Build an index over the sentence-split text with a full-text search tool such as Lucene or Solr;
10. Using the entry name of each Wiki entry and its aliases as keywords, query the sentences related to the entry in all texts with a full-text search tool such as Lucene, and extract these sentences;
11. Process the extracted sentences simply: if an extracted sentence contains an attribute involved in the entry's information box (Info Box), the sentence is labeled as a positive sample; otherwise the sentence is labeled as a negative sample, for use in the subsequent classification;
12. Convert each word in the samples into a word vector with word2vec;
13. Pass the sentences converted into word vectors through convolution, so that all samples are converted into convolved sequences;
14. Pool the convolved sequences obtained after sentence convolution with an aggregation function (here the max function) to obtain the final features;
15. Let Pi and Ni respectively denote the sets of positive and negative relation labels corresponding to the i-th entity pair (Entity Tuple), zi denote the relation labels of the i-th entity pair, yi denote whether the corresponding relation holds (that is, if r ∈ Pi then yi^r = 1; if r ∈ Ni then yi^r = 0), wy and wz respectively denote the parameters of the y classifier and the z classifier, x denote a sentence, r denote the label corresponding to a relation, and m denote the m-th mention; z'i denotes the mention labels of the group corresponding to the i-th entity pair obtained in the previous inference. By maximizing the joint probability described by the formula, the new relation labels of the entity pairs are obtained;
16. Optimize the parameters of the y classifier and the z classifier separately to obtain new wy and wz; since the optimization processes of the two layers of classifiers are independent of each other, the two groups of parameters are optimized independently;
17. Iterate steps 15 and 16 until the final model is obtained.
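As an illustrative sketch only, the alternation of steps 15~17 could be organized as the following training loop. It is deliberately simplified: it keeps only the mention-level (z) classifier plus an explicit "at least once" constraint and omits the pair-level (y) classifier of the full two-layer model; the scikit-learn classifier and all function and variable names are assumptions, not the invention's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_miml(pair_features, pair_pos_labels, relations, n_iterations=5):
    """Hard-EM sketch: alternately infer mention labels (E-step, step 15)
    and retrain the mention-level classifier (M-step, step 16).

    pair_features   -- list over entity pairs; each item is an array of shape
                       (n_mentions, n_features) of CNN sentence features
    pair_pos_labels -- list over entity pairs; each item is the set P_i of
                       positive relation labels for that pair
    relations       -- list of all relation labels, with "none" at index 0
    """
    rel_index = {r: k for k, r in enumerate(relations)}
    z_clf = LogisticRegression(max_iter=1000)

    # Initialization: assign every mention of a pair one of its known relations.
    z = [np.full(len(X), rel_index[next(iter(P))] if P else 0)
         for X, P in zip(pair_features, pair_pos_labels)]

    for _ in range(n_iterations):
        # M-step (simplified): retrain the z classifier on the hard labels.
        z_clf.fit(np.vstack(pair_features), np.concatenate(z))

        # E-step (simplified): relabel each mention, then enforce that each
        # known positive relation of the pair is expressed at least once.
        for i, (X, P) in enumerate(zip(pair_features, pair_pos_labels)):
            proba = z_clf.predict_proba(X)              # (n_mentions, |classes|)
            z[i] = z_clf.classes_[proba.argmax(axis=1)]
            for r in P:
                k = rel_index[r]
                if k in z_clf.classes_ and not np.any(z[i] == k):
                    col = int(np.where(z_clf.classes_ == k)[0][0])
                    z[i][proba[:, col].argmax()] = k
    return z_clf, z
```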