CN112633014B - Neural network-based long text reference resolution method and device - Google Patents
Neural network-based long text reference resolution method and device
- Publication number
- CN112633014B (application CN202011437239.7A)
- Authority
- CN
- China
- Prior art keywords
- information
- vector
- head attention
- attention model
- indicated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the field of natural language processing and discloses a neural-network-based method and device for reference resolution in long text. The method supports directly judging, in the current context, whether a referring part and a referred part stand in a reference relationship, and performs well on both explicit reference and zero anaphora.
Description
Technical Field
The invention relates to the field of natural language processing, and in particular to a neural-network-based long text reference resolution method and device.
Background
Natural language understanding means using a computer to process information such as the form, pronunciation, and semantics of natural language expressed as text, speech, and the like.
Reference and omission are language phenomena widespread in natural language. They have positive effects such as simplifying expression and connecting context, but they can also make sentences ambiguous and greatly complicate natural language understanding, so the content a pronoun refers to, or the omitted part, needs to be recovered and supplemented.
Reference resolution, broadly speaking, is the problem of determining which noun phrase in a discourse a pronoun points to. By direction, references divide into anaphora and cataphora: in anaphora the antecedent of the pronoun precedes the pronoun, and in cataphora the antecedent follows it. By the type of the referring expression, references include: personal pronouns, demonstrative pronouns, definite descriptions, omissions (zero anaphora), part-whole references, and common noun phrases.
Reference resolution has a long research history, and the performance of resolution systems has improved continuously: from early theoretical methods based on hand-crafted rules, to automatic processing over large-scale corpora, to the various machine learning methods introduced today. However, methods for understanding and representing semantics in natural language are still not mature. Deep linguistic knowledge and semantic features are used only in simple ways, the characteristics of the word, sentence, and discourse levels are not mined deeply, and context information is not used effectively. Reference resolution is a key difficulty in natural language processing and is important for information extraction in this field.
Disclosure of Invention
In view of the foregoing drawbacks of the prior art, an object of the present invention is to provide a neural-network-based long text reference resolution method and apparatus that introduces deep learning techniques into the pronoun resolution task to recover and supplement the referring and omitted portions of a long text.
In order to achieve the above purpose, the invention provides a neural-network-based long text reference resolution method, which introduces deep learning into the pronoun resolution task to perform Chinese pronoun resolution and omission recovery. In particular, the invention employs an attention network, essentially a multi-layer feed-forward neural network, which computes a probability value between target and source as the attention weight, increases or decreases the network's attention to certain words accordingly, and adjusts the weights through error feedback. The time complexity of an attention network is much smaller than that of a recurrent neural network (RNN) and similar models, making it suitable for this application.
The invention provides a neural-network-based long text reference resolution method, which comprises the following steps:
step S1: inputting a long text;
step S2: extracting from the long text the referring part information, the words surrounding the referring part, the original text information, the referred part information, and the words surrounding the referred part, and representing each as vectors; embedding position information into each of these vectorized representations, and correspondingly outputting a referring part vector, a referring part surrounding-information vector, an original text vector, a referred part vector, and a referred part surrounding-information vector;
step S3: performing multi-head attention calculation on the referring part vector and the referring part surrounding-information vector, then performing multi-head attention calculation on the result and the original text vector to obtain a first feature result;
step S4: performing multi-head attention calculation on the referred part vector and the referred part surrounding-information vector, then performing multi-head attention calculation on the result and the original text vector to obtain a second feature result;
step S5: concatenating the first feature result and the second feature result obtained in steps S3 and S4 into a combined result, and mapping the combined result into a decision space with a Softmax layer to judge whether a reference relationship holds.
Further, step S3 specifically comprises: using the referring part vector as the query of a first multi-head attention model and the referring part surrounding-information vector as its keys and values; using the output of the first multi-head attention model as the query of a second multi-head attention model, with the original text vector as the keys and values of the second model; the second multi-head attention model outputs the first feature result.
Further, step S4 specifically comprises: using the referred part vector as the query of a third multi-head attention model and the referred part surrounding-information vector as its keys and values; using the output of the third multi-head attention model as the query of a fourth multi-head attention model, with the original text vector as the keys and values of the fourth model; the fourth multi-head attention model outputs the second feature result.
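The cascaded wiring of steps S3 and S4 can be sketched as follows. This is a minimal single-head simplification in NumPy: random vectors stand in for the real embeddings, and each of the patent's four multi-head attention models is reduced to one unweighted attention function, so it illustrates only the data flow, not the patented implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    # single-head stand-in for one multi-head attention model
    scores = query @ key.T / np.sqrt(key.shape[-1])
    return softmax(scores, axis=-1) @ value

rng = np.random.default_rng(0)
d = 16
referring_vec = rng.normal(size=(1, d))   # referring part vector
referring_ctx = rng.normal(size=(6, d))   # referring part surrounding-information vectors
referred_vec  = rng.normal(size=(1, d))   # referred part vector
referred_ctx  = rng.normal(size=(6, d))   # referred part surrounding-information vectors
original_text = rng.normal(size=(30, d))  # original text vectors

# Step S3: first model (query = referring part, key/value = its context),
# then second model (query = first output, key/value = original text)
h1 = attention(referring_vec, referring_ctx, referring_ctx)
first_feature = attention(h1, original_text, original_text)

# Step S4: the same cascade through the third and fourth models for the referred part
h2 = attention(referred_vec, referred_ctx, referred_ctx)
second_feature = attention(h2, original_text, original_text)

assert first_feature.shape == second_feature.shape == (1, d)
```

Both feature results keep the model dimension, so the connection layer of step S5 can simply concatenate them.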
The invention also discloses a neural-network-based long text reference resolution device, which comprises a long text extraction module, a connection layer, a Softmax layer, and four multi-head attention models;
the long text extraction module extracts from the long text the referring part information, the words surrounding the referring part, the original text information, the referred part information, and the words surrounding the referred part, and represents each as vectors; it embeds position information into each of these vectorized representations and outputs a referring part vector, a referring part surrounding-information vector, an original text vector, a referred part vector, and a referred part surrounding-information vector;
the inputs and outputs of the four multi-head attention models are connected as follows:
the query input of the first multi-head attention model is the referring part vector, and its key and value inputs are the referring part surrounding-information vector; the output of the first multi-head attention model is the query input of the second multi-head attention model; the key and value inputs of the second multi-head attention model are the original text vector; the second multi-head attention model outputs the first feature result;
the query input of the third multi-head attention model is the referred part vector, and its key and value inputs are the referred part surrounding-information vector; the output of the third multi-head attention model is the query input of the fourth multi-head attention model; the key and value inputs of the fourth multi-head attention model are the original text vector; the fourth multi-head attention model outputs the second feature result;
the connection layer concatenates the first feature result and the second feature result into a combined result;
the Softmax layer maps the combined result into a decision space to judge whether a reference relationship holds.
The technical effects are as follows:
the long text reference digestion method realizes the processing of different layers of information in natural language by constructing a multi-layer attention model. Vector representations of the indicated part and the indicated part under the surrounding information and original conditions are calculated according to the attention mechanism, and whether the indicated relation exists or not is further calculated. The method is helpful for directly judging the reference relationship of the reference part and the referred part in the current context, and has better effect on both dominant reference and zero reference.
Drawings
FIG. 1 is a schematic diagram of the algorithm architecture of the reference resolution method of the invention;
FIG. 2 is a diagram of the multi-head attention mechanism model;
FIG. 3 is a schematic diagram of the resolution device according to the present invention.
Detailed Description
For further illustration of the various embodiments, accompanying drawings are provided. The drawings, which form a part of this disclosure, illustrate embodiments and, together with the description, explain their principles. With reference to them, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention. The components in the figures are not drawn to scale, and like reference numerals generally designate like components.
The invention will now be further described with reference to the drawings and detailed description.
Example 1
As shown in fig. 1 and fig. 2, the present invention provides a neural-network-based long text reference resolution method, which introduces deep learning into the pronoun resolution task, thereby improving Chinese pronoun resolution and accomplishing the omission recovery task.
The invention involves several specialized terms of reference resolution, described as follows:
(1) Anaphora. The term "anaphora" denotes pointing upward or backward: a term refers back to a unit or meaning mentioned earlier (Crystal, 1985; cited from Hu Zhuanglin, 1994: 48). The earlier term being referred to is called the antecedent, and the referring term is called the anaphor. Generally speaking, when something already mentioned in an utterance must be mentioned a second time, an anaphoric form is used so that the two contexts correspond to each other (Chen Ping, 1987).
In many cases the antecedent and the anaphor denote the same entity, i.e., they stand in a coreferential relation. The main test is substitution: check whether the antecedent can be placed in the position of the anaphor without changing the meaning; if the expression of the original sentence meaning is unaffected, the case can be judged explicit anaphora.
In explicit anaphora, the antecedent has clear trackability when it appears in the discourse, and the anaphor can be replaced by the antecedent without changing the sentence meaning. Implicit anaphora, by contrast, refers back to an implicit antecedent. The antecedent of an implicit anaphor is not necessarily a specific word or syntactic constituent in the utterance, but it must be an entity made salient in the mental representation the interlocutors build from the utterance. In this case the anaphor shares only identity of sense with the preceding expression, and the two do not denote the same entity. Therefore, in implicit anaphora, replacing the anaphor with the antecedent changes the meaning.
The referring part information refers to a pronoun. Pronouns are words that replace nouns, verbs, adjectives, numerals, or adverbs, including: a) personal pronouns such as "I, you, he, we, oneself, people"; b) interrogative pronouns such as "who, what, where, how many"; c) demonstrative pronouns such as "this, here, so, that, there, then".
(2) Multi-head attention mechanism (Multi-head attention)
Multi-head attention uses multiple queries to compute, in parallel, multiple selections of information from the input, with each head attending to a different part of the input.
The structure of the multi-head attention model is shown in fig. 2. The query Q, key K, and value V are first linearly transformed (Linear) and then fed into scaled dot-product attention. This is done h times, hence "multi-head", with each head computed independently, and the linear transformation parameters W for Q, K, and V differ for each head. The h scaled dot-product attention results are then concatenated (Concat), and a final linear transformation (Linear) yields the multi-head attention result.
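The mechanism just described (per-head linear maps of Q, K, and V, scaled dot-product attention computed h times, concatenation, and a final linear map) can be sketched in NumPy as follows; the dimensions and random weights here are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head_attention(query, key, value, weights, num_heads):
    # each head h uses its own linear maps Wq[h], Wk[h], Wv[h]
    heads = []
    for h in range(num_heads):
        Qh = query @ weights["Wq"][h]
        Kh = key   @ weights["Wk"][h]
        Vh = value @ weights["Wv"][h]
        heads.append(scaled_dot_product_attention(Qh, Kh, Vh))
    concat = np.concatenate(heads, axis=-1)  # Concat the h head outputs
    return concat @ weights["Wo"]            # final linear transformation

rng = np.random.default_rng(0)
d_model, d_head, num_heads = 8, 4, 2
weights = {
    "Wq": [rng.normal(size=(d_model, d_head)) for _ in range(num_heads)],
    "Wk": [rng.normal(size=(d_model, d_head)) for _ in range(num_heads)],
    "Wv": [rng.normal(size=(d_model, d_head)) for _ in range(num_heads)],
    "Wo": rng.normal(size=(num_heads * d_head, d_model)),
}
q = rng.normal(size=(3, d_model))   # 3 query positions
kv = rng.normal(size=(5, d_model))  # 5 key/value positions
out = multi_head_attention(q, kv, kv, weights, num_heads)
assert out.shape == (3, d_model)
```

Because the output dimension equals `d_model`, the output of one such model can directly serve as the query of the next model, which is exactly how the four models in this invention are chained.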
The neural-network-based long text reference resolution method of the invention specifically comprises the following steps:
(1) Input a long text, which can be a sentence or a passage of characters;
(2) Extract from the long text the referring part information, the words surrounding the referring part, the original text information, the referred part information, and the words surrounding the referred part, and represent each as vectors; embed position information into each of these vectorized representations, and correspondingly output a referring part vector, a referring part surrounding-information vector, an original text vector, a referred part vector, and a referred part surrounding-information vector.
(3) Perform multi-head attention calculation on the referring part vector and the referring part surrounding-information vector, then perform multi-head attention calculation on the result and the original text vector to obtain a first feature result representing the referring part. Under the multi-layer attention mechanism, this result reflects the effect of the original text and of the words surrounding the referring part on the referring part. More specifically, the referring part vector serves as the query (Q) of a first multi-head attention model and the referring part surrounding-information vector as its key (K) and value (V); the output of the first model serves as the query Q of a second multi-head attention model, whose key K and value V are the original text vector; the second model outputs the first feature result.
(4) In the same way, perform multi-head attention calculation on the referred part vector, the referred part surrounding-information vector, and the original text vector to obtain a second feature result representing the referred part. More specifically, the referred part vector serves as the query Q of a third multi-head attention model and the referred part surrounding-information vector as its key K and value V; the output of the third model serves as the query Q of a fourth multi-head attention model, whose key K and value V are the original text vector; the fourth model outputs the second feature result.
(5) Finally, concatenate the first feature result and the second feature result into a combined result, and map it into a decision space with a Softmax layer to judge whether a reference relationship holds.
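Step (5), the connection layer followed by the Softmax decision, can be sketched as follows. The weight matrix `W` and bias `b` here are hypothetical stand-ins for the trained parameters of the Softmax layer, and the feature vectors are random placeholders for the outputs of steps (3) and (4).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d = 16
first_feature  = rng.normal(size=(1, d))  # feature of the referring part, step (3)
second_feature = rng.normal(size=(1, d))  # feature of the referred part, step (4)

# Connection layer: concatenate the two feature results
combined = np.concatenate([first_feature, second_feature], axis=-1)  # shape (1, 2d)

# Softmax layer: map the combined result into a two-class decision space
W = rng.normal(size=(2 * d, 2)) * 0.1  # hypothetical trained weights
b = np.zeros(2)
probs = softmax(combined @ W + b)

is_coreferent = bool(probs[0, 1] > probs[0, 0])  # class 1 = "reference holds"
assert probs.shape == (1, 2)
```

The two output probabilities sum to one, and the higher one determines whether the referring part and the referred part are judged to corefer.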
Example 2
The first step: before training the model, collect training data and preprocess it.
The second step: feed the training data through steps (2) to (5) described in Embodiment 1. Because the data are passed to the model as arrays, the output of each layer cannot be interpreted visually. All of the stages above (data processing of the long text, vectorization, array conversion, and computation by the neural network layers) are traversed both when training the model and when running inference, so steps (2) to (5) can be divided, according to the model's requirements, into a model training module and a model inference module. The training module generates a model file; the inference module generates an inference result.
The third step: after the training data of the first step are processed and the model of the second step is trained and saved, enter the inference stage.
The fourth step: output the inference result.
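The module split of Embodiment 2 (shared preprocessing feeding either a training module that writes a model file or an inference module that reads it) can be sketched as follows. Every function body here is a trivial illustrative stand-in: the names, the statistic stored in the model file, and the decision rule are assumptions, not the patent's neural model.

```python
import json
import os
import tempfile

def preprocess(long_text):
    # placeholder for the long-text extraction and vectorization of steps (1)-(2)
    return [float(len(tok)) for tok in long_text.split()]

def train_module(samples, model_path):
    # stand-in "training": persist a trivial statistic as the model file
    means = [sum(f) / len(f) for f in (preprocess(t) for t, _label in samples)]
    with open(model_path, "w") as fh:
        json.dump({"mean_len": sum(means) / len(means)}, fh)

def inference_module(long_text, model_path):
    # load the model file and produce an inference result
    with open(model_path) as fh:
        model = json.load(fh)
    feats = preprocess(long_text)
    # trivial decision rule standing in for the neural model's Softmax output
    return sum(feats) / len(feats) > model["mean_len"]

path = os.path.join(tempfile.mkdtemp(), "model.json")
train_module([("he saw the dog", 1), ("it barked loudly", 0)], path)
result = inference_module("she answered immediately afterwards", path)
assert isinstance(result, bool)
```

The point of the split is that the same preprocessing code runs in both modules, while only the training module writes the model file and only the inference module consumes it.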
The neural-network-based long text reference resolution method is an anaphora resolution method that processes different levels of information in natural language by constructing a multi-layer attention model. Vector representations of the referring part and the referred part, conditioned on their surrounding information and on the original text, are computed with the attention mechanism, and from these it is further computed whether a reference relationship exists. The method supports directly judging, in the current context, whether the referring part and the referred part stand in a reference relationship, and performs well on both explicit reference and zero anaphora.
As shown in fig. 3, the invention also discloses a neural-network-based long text reference resolution device, which comprises a long text extraction module, a connection layer, a Softmax layer, and four multi-head attention models;
the long text extraction module extracts from the long text the referring part information, the words surrounding the referring part, the original text information, the referred part information, and the words surrounding the referred part, and represents each as vectors; it embeds position information into each of these vectorized representations and outputs a referring part vector, a referring part surrounding-information vector, an original text vector, a referred part vector, and a referred part surrounding-information vector;
the inputs and outputs of the four multi-head attention models are connected as follows:
the query input of the first multi-head attention model is the referring part vector, and its key and value inputs are the referring part surrounding-information vector; the output of the first multi-head attention model is the query input of the second multi-head attention model; the key and value inputs of the second multi-head attention model are the original text vector; the second multi-head attention model outputs the first feature result;
the query input of the third multi-head attention model is the referred part vector, and its key and value inputs are the referred part surrounding-information vector; the output of the third multi-head attention model is the query input of the fourth multi-head attention model; the key and value inputs of the fourth multi-head attention model are the original text vector; the fourth multi-head attention model outputs the second feature result;
the connection layer concatenates the first feature result and the second feature result into a combined result;
the Softmax layer maps the combined result into a decision space to judge whether a reference relationship holds.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (2)
1. A neural-network-based long text reference resolution method, characterized in that the method comprises the following steps:
step S1: inputting a long text;
step S2: extracting from the long text the referring part information, the words surrounding the referring part, the original text information, the referred part information, and the words surrounding the referred part, and representing each as vectors; embedding position information into each of these vectorized representations, and correspondingly outputting a referring part vector, a referring part surrounding-information vector, an original text vector, a referred part vector, and a referred part surrounding-information vector;
step S3: performing multi-head attention calculation on the referring part vector and the referring part surrounding-information vector, then performing multi-head attention calculation on the result and the original text vector to obtain a first feature result;
step S4: performing multi-head attention calculation on the referred part vector and the referred part surrounding-information vector, then performing multi-head attention calculation on the result and the original text vector to obtain a second feature result;
step S5: concatenating the first feature result and the second feature result obtained in steps S3 and S4 into a combined result, and mapping the combined result into a decision space with a Softmax layer to judge whether a reference relationship holds;
wherein step S3 specifically comprises: using the referring part vector as the query of a first multi-head attention model and the referring part surrounding-information vector as its keys and values; using the output of the first multi-head attention model as the query of a second multi-head attention model, with the original text vector as the keys and values of the second model; the second multi-head attention model outputting the first feature result;
and step S4 specifically comprises: using the referred part vector as the query of a third multi-head attention model and the referred part surrounding-information vector as its keys and values; using the output of the third multi-head attention model as the query of a fourth multi-head attention model, with the original text vector as the keys and values of the fourth model; the fourth multi-head attention model outputting the second feature result.
2. A neural network-based long text reference resolution device, characterized in that it comprises a long text extraction module, a connection layer, a Softmax layer, and four multi-head attention models;
the long text extraction module is used for extracting, from the long text, the referring-part information, the words surrounding the referring part, the original text information, the referred-part information, and the words surrounding the referred part, and representing each as a vector; position information is then embedded into each of these vectorized representations, and the module outputs a referring-part vector, a referring-part surrounding-information vector, an original-text vector, a referred-part vector, and a referred-part surrounding-information vector;
the input-output connections of the four multi-head attention models are as follows:
the query input of the first multi-head attention model is the referring-part vector, and its key and value inputs are the referring-part surrounding-information vector; the output of the first multi-head attention model is the query input of the second multi-head attention model; the key and value inputs of the second multi-head attention model are the original-text vector; the second multi-head attention model outputs a first feature result;
the query input of the third multi-head attention model is the referred-part vector, and its key and value inputs are the referred-part surrounding-information vector; the output of the third multi-head attention model is the query input of the fourth multi-head attention model; the key and value inputs of the fourth multi-head attention model are the original-text vector; the fourth multi-head attention model outputs a second feature result;
the connection layer is used for concatenating the first feature result and the second feature result into a combined result;
the Softmax layer is used for mapping the combined result into a decision space to judge whether the referring part and the referred part corefer.
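The claimed device can be sketched in PyTorch as follows. This is a minimal illustrative sketch only: the model dimension, number of heads, mean-pooling step, and the two-class output head are assumptions not specified in the claim; the patent claims only the connection pattern of the four multi-head attention models, the concatenation layer, and the Softmax decision.

```python
import torch
import torch.nn as nn

class LongTextCoreferenceScorer(nn.Module):
    """Sketch of claim 2: four chained multi-head attention models,
    a connection (concatenation) layer, and a Softmax layer deciding
    whether the referring part and the referred part corefer."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        # First chain: referring part -> its surrounding words -> original text.
        self.attn1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Second chain: referred part -> its surrounding words -> original text.
        self.attn3 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn4 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Connection layer output mapped to a 2-way decision (assumed head).
        self.classifier = nn.Linear(2 * d_model, 2)

    def forward(self, referring, referring_ctx, referred, referred_ctx, doc):
        # Query = referring-part vector; key/value = its surrounding-information vector.
        h1, _ = self.attn1(query=referring, key=referring_ctx, value=referring_ctx)
        # Second model: query = output of the first; key/value = original-text vector.
        f1, _ = self.attn2(query=h1, key=doc, value=doc)
        # Same pattern for the referred part (third and fourth models).
        h2, _ = self.attn3(query=referred, key=referred_ctx, value=referred_ctx)
        f2, _ = self.attn4(query=h2, key=doc, value=doc)
        # Connection layer: concatenate the two feature results (mean-pooled, an assumption).
        combined = torch.cat([f1.mean(dim=1), f2.mean(dim=1)], dim=-1)
        # Softmax layer maps the combined result into the decision space.
        return torch.softmax(self.classifier(combined), dim=-1)

scorer = LongTextCoreferenceScorer()
B, d = 1, 256
probs = scorer(torch.randn(B, 1, d), torch.randn(B, 6, d),
               torch.randn(B, 1, d), torch.randn(B, 6, d),
               torch.randn(B, 40, d))
print(probs.shape)  # torch.Size([1, 2])
```

The two attention chains are symmetric: each mention is first contextualized against its local window, then against the whole document, before the pairwise coreference decision is made.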
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011437239.7A CN112633014B (en) | 2020-12-11 | 2020-12-11 | Neural network-based long text reference digestion method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633014A CN112633014A (en) | 2021-04-09 |
CN112633014B true CN112633014B (en) | 2024-04-05 |
Family
ID=75309601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011437239.7A Active CN112633014B (en) | 2020-12-11 | 2020-12-11 | Neural network-based long text reference digestion method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633014B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020073664A1 (en) * | 2018-10-08 | 2020-04-16 | 平安科技(深圳)有限公司 | Anaphora resolution method and electronic device and computer-readable storage medium |
CN111401035A (en) * | 2020-02-18 | 2020-07-10 | 平安科技(深圳)有限公司 | Zero-reference resolution method, device, equipment and medium based on big data |
CN111428490A (en) * | 2020-01-17 | 2020-07-17 | 北京理工大学 | Reference resolution weak supervised learning method using language model |
CN111611361A (en) * | 2020-04-01 | 2020-09-01 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent reading, understanding, question answering system of extraction type machine |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11586829B2 (en) | Natural language text generation from a set of keywords using machine learning and templates | |
Adelia et al. | Indonesian abstractive text summarization using bidirectional gated recurrent unit | |
CN111930942B (en) | Text classification method, language model training method, device and equipment | |
Pala et al. | DESAM—annotated corpus for Czech | |
Xie et al. | Topic enhanced deep structured semantic models for knowledge base question answering | |
CN111414481A (en) | Chinese semantic matching method based on pinyin and BERT embedding | |
CN112232053A (en) | Text similarity calculation system, method and storage medium based on multi-keyword pair matching | |
Vinnarasu et al. | Speech to text conversion and summarization for effective understanding and documentation | |
Nguyen et al. | Language-oriented Sentiment Analysis based on the Grammar Structure and Improved Self-attention Network. | |
Yu et al. | Make it directly: event extraction based on tree-LSTM and Bi-GRU | |
Wei et al. | Enhance understanding and reasoning ability for image captioning | |
CN110781666A (en) | Natural language processing text modeling based on generative countermeasure networks | |
CN112633014B (en) | Neural network-based long text reference digestion method and device | |
CN112579739A (en) | Reading understanding method based on ELMo embedding and gating self-attention mechanism | |
CN116956925A (en) | Electronic medical record named entity identification method and device, electronic equipment and storage medium | |
Hu et al. | Nlire: A natural language inference method for relation extraction | |
CN116414988A (en) | Graph convolution aspect emotion classification method and system based on dependency relation enhancement | |
Keezhatta | Understanding EFL Linguistic Models through Relationship between Natural Language Processing and Artificial Intelligence Applications. | |
Duong et al. | Leveraging semantic representations combined with contextual word representations for recognizing textual entailment in vietnamese | |
Lv et al. | StyleBERT: Chinese pretraining by font style information | |
Pisat et al. | Synonym Suggestion System using Word Embeddings | |
CN113569124A (en) | Medical title matching method, device, equipment and storage medium | |
Jeyasheeli et al. | Sentence Generation for Indian Sign Language Using NLP. | |
Chandarana et al. | Natural Language Sentence to SQL Query Converter | |
Kaur et al. | Text Generator using Natural Language Processing Methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||