CN111090724A - Entity extraction method capable of judging relevance between text content and entity based on deep learning

Info

Publication number: CN111090724A
Application number: CN201911148302.2A
Authority: CN
Inventors: 李举; 刘方然; 李金波; 徐常亮
Original assignee: Xinhua Zhiyun Technology Co ltd
Current assignee: Xinhua Zhiyun Technology Co ltd
Priority date: 2019-11-21
Filing date: 2019-11-21
Publication date: 2020-05-01
Anticipated expiration: 2039-11-21
Also published as: CN111090724B

Abstract

The invention relates to the technical field of entity extraction, and in particular to a deep-learning-based entity extraction method capable of judging the relevance between text content and an entity. The method comprises the following steps: representing the text as a sequence of n words containing m entities; using an LSTM to represent the literal semantics of the text; averaging the representations of all entities other than the entity whose relevance is to be determined, and splicing the result with the literal semantic representation of the text to generate the text semantic context; applying an attention mechanism from the entity to be determined over the text semantic context to obtain an attention vector; and computing a further representation of the text semantics from the attention vector. The invention uses an end-to-end deep learning model, which avoids writing a large number of complicated rules and improves the generality of the model. The method also avoids the extensive feature engineering required in machine learning, speeds up model iteration, and is easy to put into practical use.

Description

Entity extraction method capable of judging relevance between text content and entity based on deep learning

Technical Field

The invention relates to the technical field of entity extraction, in particular to an entity extraction method capable of judging the relevance between text content and an entity based on deep learning.

Background

An entity, also called a "proper name", is an item with a specific meaning in a text, and mainly includes people, places, organizations and the like. Named entity recognition aims to identify the entities present in a text together with their types, and the technology is now mature. Named entity recognition does not, however, indicate how relevant each recognized entity is to the article in which it appears. Entity relevance refers to how strongly or weakly an entity is related to an article; in general, several entities appear in one article, but not all of them are strongly related to it. In practice, only the entities that are strongly related to an article are of interest, so finding and judging the relevance between entities and articles is very important. At present there is little research on the relevance of entities to articles, and the existing work is based on rules and on machine learning. The deep learning network structure provided by the invention solves the problem of strong versus weak relevance between an entity and an article end to end, avoids the poor generality caused by rules, and performs feature selection automatically, thereby reducing the extensive feature-engineering work of machine learning and speeding up model iteration.

Disclosure of Invention

The invention aims to provide an entity extraction method capable of judging the relevance between text content and an entity based on deep learning, so as to solve the problems described in the Background section.

In order to achieve the above object, the present invention provides an entity extraction method based on deep learning and capable of judging the relevance between text content and an entity, wherein the method comprises the following steps:

Step one: represent the text as a sequence of n words, each with a word vector w, containing m entities [E₁, E₂, E₃, …, E_m];

Step two: use an LSTM to represent the literal semantics of the text from step one;

Step three: average the representations of the entities other than the entity whose relevance is to be determined, and splice the result with the literal semantic representation of the text to generate the text semantic context;

Step four: apply an attention mechanism from the entity to be determined over the text semantic context to obtain an attention vector;

Step five: compute a further representation of the text semantics from the attention vector of step four, in which each element of the context is multiplied by its attention weight and the products are summed, yielding the text semantic representation based on entity attention:

Step six: splice the entity-attention-based text semantic representation C_r from step five and the representation E_m of the entity whose relevance is to be determined into a vector d, and feed d into a classifier to obtain the probabilities of strong and weak relevance between the entity and the text.

Preferably, in step one, w is the word2vec vector of the corresponding word, and E is the TransH representation of the corresponding entity.

Preferably, in step two, the LSTM is defined as follows: given the word vector w^k, the previous cell state c^{k-1} and the previous hidden state h^{k-1}, the current cell state is c^k and the current hidden state is h^k, and the LSTM network is as follows:

h^k = o^k ⊙ tanh(c^k) (6)

where i, f and o are the input gate, the forget gate and the output gate respectively, and σ is the activation function, so that the literal semantic representation of the text is obtained:

Preferably, in step three, averaging the representations of the entities other than the entity whose relevance is to be determined is defined as follows: let E_m be the entity whose relevance is to be determined; the semantics of the other entities are then expressed as:

The spliced text is represented as:

Preferably, in step four, the attention vector of the entity over the text is:

where γ is the attention score function, defined as:

W_a and b_a are the weight matrix and the offset, respectively.

Preferably, in step six, d = C_r + E_m, and the classifier is:

x = tanh(W_l · d + b_l),

where W_l and b_l are the weight matrix and the offset, respectively.

Compared with the prior art, the invention has the beneficial effects that:

1. The method uses an end-to-end deep learning model, which avoids writing a large number of complicated rules and improves the generality of the model. It also avoids the extensive feature engineering required in machine learning, speeds up model iteration, and is easy to put into practical use.

2. The method represents entity information with TransH, so that implicit relations between entities can be captured.

3. The attention mechanism from the entity over the text extracts the text information related to the entity more efficiently.

Drawings

FIG. 1 is a diagram of the algorithm architecture of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, the present invention provides a technical solution:

the invention provides an entity extraction method capable of judging the relevance between text content and an entity based on deep learning, which comprises the following steps:

Step one: represent the text as a sequence of n words, each with a word vector w, containing m entities [E₁, E₂, E₃, …, E_m], where w is the word2vec vector of the corresponding word and E is the TransH representation of the corresponding entity. word2vec is a tool released by Google in 2013 for converting words into vectors; it is used in this patent to convert the words of the text into the corresponding word vectors. TransH is a distributed vector representation of entities and relations that yields dense representations of entities and relations in a low-dimensional space; it is used here to capture semantic relations between the entities and the text.
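The text does not describe how the TransH vectors are trained. As a rough illustration only, the sketch below follows the commonly described TransH scoring function, in which head and tail entity vectors are projected onto a relation-specific hyperplane before a translation is applied. The function names and the toy vectors are assumptions made for this example and do not come from the patent.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_to_hyperplane(e, w_r):
    """Project entity vector e onto the hyperplane whose unit normal is w_r."""
    scale = dot(w_r, e)
    return [ei - scale * wi for ei, wi in zip(e, w_r)]


def transh_score(head, tail, w_r, d_r):
    """Squared distance ||h_perp + d_r - t_perp||^2; lower means more plausible."""
    h_p = project_to_hyperplane(head, w_r)
    t_p = project_to_hyperplane(tail, w_r)
    return sum((hp + dr - tp) ** 2 for hp, dr, tp in zip(h_p, d_r, t_p))


# Toy values (assumed): the relation hyperplane is the x-y plane.
w_r = [0.0, 0.0, 1.0]    # unit normal of the relation hyperplane
d_r = [1.0, 0.0, 0.0]    # translation vector inside the hyperplane
head = [0.0, 1.0, 5.0]
tail = [1.0, 1.0, -2.0]
score = transh_score(head, tail, w_r, d_r)
```

In this toy case the components along the normal are discarded by the projection, so the two entities match the relation exactly even though their raw vectors differ. The entity vectors E used in step one would be the learned entity embeddings, not the scores.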

Step two: use an LSTM to represent the literal semantics of the text from step one. LSTM stands for Long Short-Term Memory; it was originally proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 and is a specific form of RNN (recurrent neural network) designed mainly to address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. Given the word vector w^k, the previous cell state c^{k-1} and the previous hidden state h^{k-1}, the current cell state is c^k and the current hidden state is h^k, and the LSTM network is as follows:

h^k＝o^k⊙tanh(c^k) (6)
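Only equation (6) is legible in this text. The sketch below is a minimal pure-Python LSTM step that assumes the standard gate formulation for the remaining equations: sigmoid input, forget and output gates, and a tanh candidate state. The parameter names (W_i, b_i and so on), the helper `make_params` and the toy inputs are hypothetical; the last line of `lstm_step` is equation (6).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def lstm_step(w_k, h_prev, c_prev, p):
    """One LSTM step (standard formulation, assumed) for word vector w_k."""
    z = w_k + h_prev  # list concatenation [w^k ; h^{k-1}]
    i = [sigmoid(v) for v in affine(p["W_i"], z, p["b_i"])]    # input gate
    f = [sigmoid(v) for v in affine(p["W_f"], z, p["b_f"])]    # forget gate
    o = [sigmoid(v) for v in affine(p["W_o"], z, p["b_o"])]    # output gate
    g = [math.tanh(v) for v in affine(p["W_c"], z, p["b_c"])]  # candidate state
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]           # equation (6)
    return h, c


def run_lstm(words, p, hidden_size):
    """Run the LSTM over all word vectors and collect the hidden states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for w_k in words:
        h, c = lstm_step(w_k, h, c, p)
        states.append(h)
    return states


def make_params(input_size, hidden_size, value):
    """Hypothetical helper: constant weights and zero offsets for the demo."""
    row = [value] * (input_size + hidden_size)
    p = {}
    for gate in ("i", "f", "o", "c"):
        p["W_" + gate] = [list(row) for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p


words = [[0.5, -0.1], [0.2, 0.3], [-0.4, 0.1]]  # toy word vectors (assumed)
states = run_lstm(words, make_params(2, 2, 0.1), hidden_size=2)
```

Here `states` holds one hidden vector per word, which is one plausible reading of the literal semantic representation of the text.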

Step three: average the representations of the entities other than the entity whose relevance is to be determined, and splice the result with the literal semantic representation of the text to generate the text semantic context. Let E_m be the entity whose relevance is to be determined; the semantics of the other entities are then expressed as:

The spliced text is represented as:
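A minimal sketch of step three follows. It assumes that the average of the other entity vectors is appended to every hidden state produced in step two; the exact splicing layout is not legible in this text, so this layout is an assumption. The function names and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_context(hidden_states, entities, target_index):
    """Splice each hidden state with the mean of the non-target entity vectors.

    Assumes at least one entity other than the target is present.
    """
    others = [e for j, e in enumerate(entities) if j != target_index]
    e_bar = mean_vector(others)
    return [h + e_bar for h in hidden_states]  # list concatenation


hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy LSTM outputs
entities = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]       # last one plays the role of E_m
context = build_context(hidden_states, entities, target_index=2)
```

With these toy values the two non-target entities average to [0.5, 0.5], so each context element is a hidden state followed by those two numbers.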

Step four: use the entity E_m whose relevance is to be determined in step three to compute attention over the text semantic context. Let the attention vector of the entity over the text be:

where γ is the attention score function, defined as:

W_a and b_a are the weight matrix and the offset, respectively.
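The sketch below illustrates steps four and five under stated assumptions. The score function γ is taken to be tanh(W_a · [c_i ; E_m] + b_a) with W_a reduced to a single weight row, and the scores are normalized with a softmax; neither detail is legible in this text, so both are assumptions. The weighted sum that produces C_r follows the description of step five. All names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_score(c_i, e_m, W_a, b_a):
    """Assumed form of gamma: tanh(W_a . [c_i ; E_m] + b_a)."""
    z = c_i + e_m  # list concatenation
    return math.tanh(sum(w * x for w, x in zip(W_a, z)) + b_a)


def attend(context, e_m, W_a, b_a):
    """Return the attention weights and the attended text representation C_r."""
    scores = [attention_score(c_i, e_m, W_a, b_a) for c_i in context]
    alpha = softmax(scores)
    dim = len(context[0])
    # Step five: multiply each context element by its weight, then sum.
    c_r = [sum(a * c_i[d] for a, c_i in zip(alpha, context)) for d in range(dim)]
    return alpha, c_r


context = [[1.0, 0.0], [0.0, 1.0]]  # toy context elements
e_m = [1.0, 0.0]                    # toy entity vector
W_a = [1.0, -1.0, 0.5, 0.0]         # toy weights
b_a = 0.0
alpha, c_r = attend(context, e_m, W_a, b_a)
```

The attention weights sum to 1, and with the toy weights above the first context element receives the larger share.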

Step six: splice the entity-attention-based text semantic representation C_r from step five and the representation E_m of the entity whose relevance is to be determined into a vector d, and feed d into a classifier to obtain the probabilities of strong and weak relevance between the entity and the text. Here d = C_r + E_m, where + denotes the splicing of the two vectors, and the classifier is:

x = tanh(W_l · d + b_l),

where W_l and b_l are the weight matrix and the offset, respectively. The probabilities of strong and weak relevance between the entity and the text are finally obtained through a softmax function:

where C = 2 is the number of classes, corresponding to strong and weak relevance. The softmax function, also called the normalized exponential function, is a generalization of the logistic function. It "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector y(z) such that each element lies in the interval (0, 1) and all elements sum to 1.
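A minimal sketch of step six, assuming that d is the concatenation of C_r and E_m, that W_l has one row per class (C = 2), and that the softmax takes the usual form exp(x_i) / Σ_j exp(x_j). Which output index stands for "strong" is an arbitrary choice made for this example, and the weights are toy values.

```python
import math


def softmax(x):
    """Numerically stable softmax: exp(x_i) / sum_j exp(x_j)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def classify(c_r, e_m, W_l, b_l):
    """Return [p_strong, p_weak] for an entity given the attended text vector."""
    d = c_r + e_m  # splice (concatenate) C_r and E_m
    x = [math.tanh(sum(w * v for w, v in zip(row, d)) + b)
         for row, b in zip(W_l, b_l)]
    return softmax(x)


c_r = [0.5, 0.5]                   # toy attended text representation
e_m = [1.0, 0.0]                   # toy entity vector
W_l = [[0.2, 0.1, 0.4, 0.0],       # row for class "strong" (assumed order)
       [-0.2, -0.1, -0.4, 0.0]]    # row for class "weak"
b_l = [0.0, 0.0]
probs = classify(c_r, e_m, W_l, b_l)
```

Because tanh bounds each logit to (-1, 1), the two probabilities here can never be more extreme than roughly 0.88 and 0.12.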

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An entity extraction method based on deep learning and capable of judging the relevance between text content and an entity comprises the following steps:

step one: represent the text as a sequence of n words, each with a word vector w, containing m entities [E₁, E₂, E₃, …, E_m];

2. The entity extraction method for judging the relevance of text content and entity based on deep learning according to claim 1, wherein: in the first step, w is a word2vec vector of a corresponding word, and E represents a TransH representation of a corresponding entity.

3. The entity extraction method for judging the relevance of text content and entity based on deep learning according to claim 1, wherein: in the second step, the LSTM is defined as follows: given the word vector w^k, the previous cell state c^{k-1} and the previous hidden state h^{k-1}, the current cell state is c^k and the current hidden state is h^k, and the LSTM network is as follows:

h^k＝o^k⊙tanh(c^k) (6)

4. The entity extraction method for judging the relevance of text content and entity based on deep learning according to claim 1, wherein: in the third step, averaging the representations of the entities other than the entity whose relevance is to be determined is defined as follows: let E_m be the entity whose relevance is to be determined; the semantics of the other entities are then expressed as:

The spliced text is represented as:

5. The entity extraction method for judging the relevance of text content and entity based on deep learning according to claim 1, wherein: in the fourth step, the attention vector of the entity over the text is:

where γ is the attention score function, defined as:

W_a and b_a are the weight matrix and the offset, respectively.

6. The entity extraction method for judging the relevance of text content and entity based on deep learning according to claim 1, wherein: in the sixth step, d = C_r + E_m, and the classifier is:

x = tanh(W_l · d + b_l),

where W_l and b_l are the weight matrix and the offset, respectively.