CN115455985A - Natural language system processing method based on machine reading understanding - Google Patents

Natural language system processing method based on machine reading understanding

Info

Publication number
CN115455985A
CN115455985A (Application No. CN202211135769.5A)
Authority
CN
China
Prior art keywords
mha
attention
option
natural language
tensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211135769.5A
Other languages
Chinese (zh)
Inventor
戴慧珺
方强
郭小雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Huijun Tao Intelligent Technology Co ltd
Original Assignee
Suzhou Huijun Tao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Huijun Tao Intelligent Technology Co ltd filed Critical Suzhou Huijun Tao Intelligent Technology Co ltd
Priority to CN202211135769.5A priority Critical patent/CN115455985A/en
Publication of CN115455985A publication Critical patent/CN115455985A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a processing method for a natural language system based on machine reading comprehension. Building on an existing pre-trained ALBERT model, the method analyses the relations among article, question and candidate answers in the multiple-choice machine reading comprehension task, proposes a bidirectional multi-head co-attention model at the interaction layer that encodes the input article and the question-answer pair as two query directions, and combines the two directions with a self-attention mechanism. Following the Transformer structure, a residual network, a feed-forward network and layer normalization are added to the encoding layer. The output is a probability value for each candidate answer option, and the answer is determined from these probabilities. Based on the large-scale pre-trained language model ALBERT, the method simulates the perspective-taking that humans apply when solving machine reading comprehension problems, makes better use of the natural relations among article, question and answers, effectively improves prediction accuracy for the correct option, and addresses multiple-choice questions in natural language processing. The method applies to natural language processing tasks such as masking, masked language modeling and sentence prediction; it fully captures the information in the article, the question and the answer options, and effectively improves performance on tasks that require inter-sentence comprehension and reasoning, such as machine question answering, natural language inference and text matching.

Description

Natural language system processing method based on machine reading understanding
Technical Field
The invention belongs to the field of multiple-choice machine reading comprehension with neural network models, and particularly relates to a processing method for a natural language system based on machine reading comprehension.
Background
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence that studies theories and methods for effective communication between humans and computers in natural language. Machine Reading Comprehension (MRC) is one of the most challenging tasks in NLP: an algorithm enables a computer to grasp the semantics of an article and answer related questions, giving the computer the ability to read, analyze and summarize text. MRC can essentially be formalized as a supervised learning problem: given a triple (P, Q, A) of text, question and answer, where P denotes the article, Q the question and A the answer, the MRC task is to learn a predictor f that takes P and Q as input and outputs the correct answer A, i.e., f: (P, Q) → A.
Approaches to the MRC problem include traditional feature-matching methods and neural reading comprehension models. Traditional feature matching manually constructs rules or extracts features to match sentences or words, then picks the answer with the highest matching score; it is heavily constrained by the chosen features, vocabulary and corpora, generalizes poorly, and is inefficient. Reading comprehension models based on deep learning have become mainstream. In particular, large-scale pre-trained encoders are widely used in natural language processing and neural reading comprehension: a pre-trained model learns task-independent representations from large-scale data through self-supervised learning and represents the semantics of a word in a specific context. Pre-trained models represented by BERT encode each token appropriately according to its context; ALBERT improves on BERT by reducing model parameters while strengthening semantic understanding, and it uses embedding-layer parameter factorization and cross-layer parameter sharing to reduce memory overhead and speed up training. Attention mechanisms are widely used in machine reading comprehension. A multi-head attention mechanism concatenates h different self-attention heads, fully fusing the pre-encoded context; the final attention values are computed with multiple scaled dot products in parallel, and the output dimension of multi-head attention matches that of single-head attention.
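By way of illustration, a minimal sketch of the scaled dot-product attention on which the multi-head mechanism builds; the use of PyTorch and the tensor shapes are assumptions of this illustration, not part of the original disclosure:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # attention distribution over the key positions
    return torch.matmul(weights, V)          # weighted sum of the value vectors

# A multi-head layer runs h such attentions in parallel on projected Q/K/V and
# concatenates the results, e.g. torch.nn.MultiheadAttention(embed_dim, num_heads).
```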
Although neural reading comprehension models and attention mechanisms are widely used in natural language processing and machine reading comprehension, most models fuse context after pre-encoding without fully considering human reverse thinking, ignoring the other query direction from the question-answer pair to the article. As a result, the predictor cannot fully learn the latent, natural logical relations among article, question and answer, and the logical relations between article sentences and question-answer pairs, or the associations between words, are overlooked.
Disclosure of Invention
The invention aims to provide an accurate and effective natural language processing method for fields such as multiple-choice machine reading comprehension and inference-based prediction. On top of an existing pre-trained model, and starting from the natural relations among article, question and answers, co-attention and self-attention are combined to construct a suitable multi-head bidirectional co-attention mechanism; a residual network, a feed-forward network and layer normalization are added following the Transformer-encoder structure; the network parameters are fine-tuned for the specific task, a fully connected layer is used as the decoder to output the probability of each option being selected, and the option with the highest probability value is chosen. The method is trained on the DREAM dataset and, compared with using a pre-trained model directly, significantly improves answer accuracy.
The invention adopts the following technical scheme:
a processing method for a natural language system based on machine reading comprehension, the system comprising:
a pre-training encoding module, used to pre-train the original dataset to obtain encodings of the pre-trained tokens, and, for each option in the machine reading comprehension task, to splice the question-answer pair with the article into one sequence, constructing different inputs;
a neural network module, used to construct a deep neural network model for machine reading comprehension, the model comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer;
a judgment output module, used to output, based on the probability value of each option produced by the machine reading comprehension natural language processing system, the option with the highest probability as the correct answer;
the method comprises the following steps:
S1, encoding each token of the input article, question and options with a pre-trained model commonly used in natural language processing;
S2, constructing the article and question-option tensors and, through the co-attention and self-attention mechanisms, adding a residual network, layer normalization and a feed-forward neural network following the Transformer-encoder structure to obtain three groups of output tensors; the whole model is trained with mini-batch stochastic gradient descent;
S3, fusing the three groups of output tensors from S2 to obtain the prediction tensor corresponding to each option;
and S4, feeding the prediction tensor of each option into the decoder to obtain the probability of the option being selected, normalizing with a Softmax function, and selecting the option with the highest probability as the output.
Specifically, in step S1 each option X_i is spliced with the article and the question into one sequence, and each token in the sequence is mapped to a fixed-length vector.
Let P = [p_1, p_2, ..., p_m], Q = [q_1, q_2, ..., q_n] and A = [a_1, a_2, ..., a_k] denote the article, question and answer sequences respectively, where p_i, q_i and a_i are the tokens in each sequence.
At the encoding layer, P, Q and A are spliced and fed into the pre-trained encoder:
E = Enc([P; Q; A])
yielding the output tensor E = [e_1, e_2, ..., e_{m+n+k}], where e_i is the encoding vector of each token.
Specifically, step S2 includes:
S201, dividing the tensor encoded by the pre-trained model into an article tensor and a question-option tensor, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences;
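By way of illustration, a minimal sketch of the split in step S201, assuming the encoder output is a PyTorch tensor of shape (batch, m+n+k, d):

```python
import torch

def split_encoding(E: torch.Tensor, l_p: int):
    """Split the encoder output E of shape (batch, m+n+k, d) at the article length l_p."""
    E_P = E[:, :l_p, :]    # article token representations  E^P
    E_QA = E[:, l_p:, :]   # question-option token representations E^QA
    return E_P, E_QA
```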
s202, bidirectional co-attention processing is carried out on the article and the question-option tensor, namely E is respectively processed P And E QA Taking the attention mechanism as a Query vector to obtain two different attention scores;
s203, performing layer normalization processing on the two acquired attention score vectors and connecting the two acquired attention score vectors through a residual error network;
s204, inputting the two updated attention scores into a feedforward neural network, using Relu as an activation function, and continuing to perform layer normalization after connecting through a residual error network;
s205, problem-option tensor E QA Performing self-attention processing, and performing layer normalization and feedforward neural network;
and S206, repeating the bidirectional co-attention process and the self-attention process k-1 times, namely repeating the updated three attention score calculation processes k-1 times.
Further, in step S202, Attention denotes the attention value, softmax is the normalized exponential function, T denotes vector transposition, and d_k is the dimension of E_QA, which is also the encoding length of each token produced by the pre-trained model:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
W^O is a parameter trained with the model, head_i is the i-th attention value, MultiHeadAtten is the overall multi-head attention score, Concat denotes the concatenation of the individual head scores, and MHA_1 and MHA_2 are the multi-head attention scores in the two directions, from article to question-option and from question-option to article; their query and key values differ, which realizes the bidirectional mutual attention mechanism:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P)
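As a sketch only, one possible realization of MHA_1 and MHA_2 with PyTorch's built-in multi-head attention; the 768-dimensional encoding matches the text, while the number of heads and the dummy sequence lengths are assumptions:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 768, 12   # 768 matches the token encoding size in the text; 12 heads is an assumption
co_attn_p2qa = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
co_attn_qa2p = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

E_P = torch.randn(1, 120, embed_dim)    # dummy article encoding (batch, l_p, d)
E_QA = torch.randn(1, 40, embed_dim)    # dummy question-option encoding (batch, l_qa, d)

MHA1, _ = co_attn_p2qa(query=E_P, key=E_QA, value=E_QA)   # article -> question-option
MHA2, _ = co_attn_qa2p(query=E_QA, key=E_P, value=E_P)    # question-option -> article
```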
Further, in step S203, the multi-head attention scores MHA_1 and MHA_2 are layer-normalized and connected through a residual network:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2)
Further, in step S204, MHA_1 and MHA_2 are fed into a feed-forward neural network with ReLU as the activation function, connected through a residual network, and layer-normalized:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2)
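A sketch of this residual / layer-normalization / feed-forward post-processing applied to each attention score; the hidden sizes are assumptions:

```python
import torch
import torch.nn as nn

class AddNormFFN(nn.Module):
    """Residual + LayerNorm, then a ReLU feed-forward block with a second residual + LayerNorm."""
    def __init__(self, d_model=768, d_ff=2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x, attn_out):
        h = self.ln1(x + attn_out)         # MHA_i = LN(E + MHA_i)
        return self.ln2(h + self.ffn(h))   # M_i  = LN(MHA'_i + MHA_i)

# e.g. M1 = AddNormFFN()(E_P, MHA1); M2 = AddNormFFN()(E_QA, MHA2)
```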
Further, in step S205, self-attention is applied to the question-option tensor E_QA, followed by layer normalization with a residual connection and a feed-forward network:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3)
Further, in step S206, steps S201, S202, S203, S204 and S205 are repeated k-1 times, i.e. M_1, M_2 and M_3 are iterated k-1 times.
Specifically, step S3 fuses the tensors processed by the attention mechanism:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
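A sketch of this fusion, assuming M_1, M_2 and M_3 are PyTorch tensors of shape (batch, seq_len, d):

```python
import torch

def fuse(M1, M2, M3):
    """Mean-pool each tensor over the sequence dimension and concatenate,
    giving one prediction vector O_i per option."""
    pooled = [M.mean(dim=1) for M in (M1, M2, M3)]   # each becomes (batch, d)
    return torch.cat(pooled, dim=-1)                 # (batch, 3*d)
```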
Further, in step S401, the output tensor O_i is fed into a fully connected layer to obtain the probability of each option being selected; the fully connected layer maps the n-dimensional feature vector to one dimension.
Further, in step S402, the probabilities of the options are normalized with Softmax, and the option with the highest probability is selected as the output answer. Softmax is computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
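A sketch of the decoder in steps S401 and S402; the fused-vector size and the number of options are assumptions for illustration:

```python
import torch
import torch.nn as nn

d = 768
scorer = nn.Linear(3 * d, 1)              # maps each option's fused vector to a scalar score

O = torch.randn(3, 3 * d)                 # dummy fused vectors O_i for three candidate options
probs = torch.softmax(scorer(O).squeeze(-1), dim=-1)   # normalized selection probabilities
answer = int(torch.argmax(probs))         # index of the highest-probability option
```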
compared with the prior art, the invention at least has the following beneficial effects:
the invention relates to a natural language processing method based on machine reading understanding, which combines a bidirectional multi-head common attention and self-attention mechanism, respectively fuses information from an article to a question-answer pair and a question-answer pair to the article in two directions, wherein the information in the two directions is mutually supplemented, and the transposition thinking and the reverse thinking experience of a human in solving the machine reading understanding problem are simulated. And adding a residual error network, a feedforward network and a layer normalization network into the coding layer by referring to the transform structure, outputting a probability value of each answer option selected, and determining an answer according to the probability value. The natural relation among articles, questions and answers is captured better, and the prediction precision of the correct options is effectively improved. The method is based on a large-scale pre-training language model ALBERT, adopts a multi-head attention mechanism, can accurately pay attention to keywords related to correct answers in an article sequence, and solves the problem of multiple choices in natural language processing.
In summary, the invention is mainly optimized for multiple choices understood by machine reading, and an attention mechanism structure is designed according to the natural relationship among articles, questions and answers. The algorithmic experiments provided by the present invention are based on the Dream dataset, which is the first dialog-based multiple-choice reading understanding dataset. Each piece of data contains an article, a question, and three options, only one of which is correct. The experimental result shows that the method has better adaptability and certain accuracy, and can improve the prediction performance.
Drawings
FIG. 1 is a diagram of a pre-training-fine-tuning framework;
fig. 2 is a schematic diagram of a network model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of the various regions, layers and their relative sizes, positional relationships are shown in the drawings as examples only, and in practice deviations due to manufacturing tolerances or technical limitations are possible, and a person skilled in the art may additionally design regions/layers with different shapes, sizes, relative positions, according to the actual needs.
This scheme provides a natural language processing method for multiple-choice machine reading comprehension that applies a bidirectional multi-head co-attention mechanism to natural language processing tasks, including masked-token processing and sentence prediction. The method can fully capture the information in the article, the question and the answer options, and improves performance on natural language processing tasks that require comprehension and reasoning, such as machine question answering, natural language inference and text matching:
referring to fig. 2, the processing method of a natural language system based on machine reading understanding of the present invention includes the following steps:
s1, encoding each lemma of an input data set by using a pre-training model (the ALBERT-large model is used in the patent) commonly used in the field of natural language processing. The Dream data set was collected from an examination of english designed by a human expert to evaluate the english comprehension level of a chinese learner as a foreign language, and comprised 10197 multiple choice questions and 6444 dialogues. Dream is the first reading understanding data set focused on deep multi-turn multi-party conversational understanding, in contrast to existing reading understanding data sets. The 84% of the answers in Dream are non-extractable, 85% of the questions need to be inferred outside of a single sentence, and 34% of the questions also involve common sense knowledge. Let P = [ P ] 1 ,p 2 ,...p m ]Q=[q 1 ,q 2 ,...q n ]A=[a 1 ,a 2 ,...a k ]Respectively representing articles, questions and answer sequences, wherein pi, qi and ai respectively represent the lemmas in the sequences; the tensor P, Q, A is input into a pre-training model E and spliced
Figure BDA0003852011360000071
Using Enc () process, the output tensor E = [ E ] 1 ,e 2 ,...e m+n+k ]Where ei is the encoded vector for each token, outputting a 768-dimensional token encoded vector.
S2, using the encoding tensor obtained in S1, co-attention is applied to the article and question-option tensors, and self-attention is then applied to the question-option tensor:
S201, the tensor E = [e_1, e_2, ..., e_{m+n+k}] is divided into an article tensor E_P and a question-option tensor E_QA, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences, with l_p + l_qa = m + n + k.
S202, E_P and E_QA are taken in turn as the Query vector of a bidirectional co-attention mechanism to obtain two different attention scores:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P)
where W^O is a parameter trained with the model and d_k is the encoding length of each token produced by the pre-trained model.
S203, the two attention score vectors MHA_1 and MHA_2 are layer-normalized and connected to E_P and E_QA through a residual network:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2)
S204, the two updated attention scores MHA_1 and MHA_2 are fed into a feed-forward neural network to obtain MHA'_1 and MHA'_2, with ReLU as the activation function; they are connected to MHA_1 and MHA_2 through a residual network and layer-normalized to obtain M_1 and M_2:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2)
S205, self-attention is applied to the question-option tensor E_QA, followed by layer normalization and a feed-forward neural network to obtain M_3:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3)
S206, the bidirectional co-attention and self-attention processes are repeated k-1 times, i.e. the calculation of the three attention scores M_1, M_2 and M_3 is repeated k-1 times and their values are updated.
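As a sketch only, one way to stack k interaction layers as in step S206; the module sizes and the wiring of the updated tensors between repetitions are assumptions:

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    """One repetition of bidirectional co-attention (S202-S204) and question-option
    self-attention (S205); sizes and inter-layer wiring are assumptions."""
    def __init__(self, d=768, heads=12, d_ff=2048):
        super().__init__()
        self.attn = nn.ModuleList(nn.MultiheadAttention(d, heads, batch_first=True) for _ in range(3))
        self.ln1 = nn.ModuleList(nn.LayerNorm(d) for _ in range(3))
        self.ln2 = nn.ModuleList(nn.LayerNorm(d) for _ in range(3))
        self.ffn = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d)) for _ in range(3))

    def _branch(self, i, residual, q, k, v):
        h = self.ln1[i](residual + self.attn[i](q, k, v)[0])
        return self.ln2[i](h + self.ffn[i](h))

    def forward(self, E_P, E_QA):
        M1 = self._branch(0, E_P, E_P, E_QA, E_QA)    # article -> question-option (MHA_1)
        M2 = self._branch(1, E_QA, E_QA, E_P, E_P)    # question-option -> article (MHA_2)
        M3 = self._branch(2, E_QA, E_QA, E_QA, E_QA)  # self-attention on question-option (MHA_3)
        return M1, M2, M3

k = 2                                                  # two layers worked best in the ablation reported below
layers = nn.ModuleList(InteractionLayer() for _ in range(k))
E_P, E_QA = torch.randn(1, 120, 768), torch.randn(1, 40, 768)   # dummy encodings
for layer in layers:
    M1, M2, M3 = layer(E_P, E_QA)
    E_P, E_QA = M1, M2                                 # assumed wiring between repetitions
```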
S3, the tensors processed by the attention mechanism are fused to obtain the prediction tensor:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
S4, the prediction tensor corresponding to each option is fed into the decoder to obtain the probability of the option being selected:
S401, the output tensor O_i is fed into a fully connected layer to obtain the probability of each option being selected; the fully connected layer maps the n-dimensional feature vector to one dimension.
S402, the probabilities of the options are normalized with Softmax, and the option with the highest probability is selected as the output answer. Softmax is computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
the natural language processing method based on machine reading understanding has practical significance in the fields of equal intelligent question answering, text generation matching, natural language reasoning, sentence prediction and the like. Letting the machine understand the human questions described by the user and give reasonable answers, which may exist in context, may be simply "if", or may not be directly answered, requiring the machine to generate answers according to its own understanding.
In another embodiment of the present invention, a natural language processing system based on machine reading comprehension is provided that can be used to implement the above natural language processing method. Specifically, the system includes a pre-training encoding module, a neural network module and a judgment output module. The pre-training encoding module pre-trains the original dataset to obtain encodings of the pre-trained tokens and, for each option in the machine reading comprehension task, splices the question-answer pair with the article into one sequence to construct different inputs. The neural network module constructs a deep neural network model for machine reading comprehension comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer. The judgment output module outputs the option with the highest probability as the correct answer, based on the probability value of each option produced by the machine reading comprehension natural language processing system.
In one embodiment, a terminal device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions and the processor executing the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor according to this embodiment of the invention may be used to run the natural language processing method based on machine reading comprehension, which includes:
splicing and encoding the original dataset to be pre-trained and feeding it into the pre-trained model to obtain the encoding tensor; dividing the tensor into article, question and answer sequences according to the multiple-choice machine reading comprehension setting to construct the article tensor E_P and the question-option tensor E_QA; constructing a deep neural network model for machine reading comprehension, further processing the encoding tensor with an attention mechanism built from the natural relations among article, question and answers, combining co-attention and self-attention, and adding a residual network, a feed-forward network and layer normalization following the Transformer-encoder structure; and outputting the probability value of each option from the output layer of the deep neural network model, with the highest-probability option output as the correct answer.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To verify the performance of the natural language processing method based on machine reading comprehension, it is evaluated on the public DREAM dataset; compared with using a pre-trained model directly, the proposed structure significantly improves answer accuracy. The experimental platform configuration is shown in Table 1.
For model training, the learning rate is set to 1e-5, the batch size to 8, and the number of training epochs to 3.
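A sketch of this training setup, assuming `model` is the full network and `train_loader` iterates over DREAM examples (both are placeholders, not defined in the original text):

```python
import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)       # learning rate 1e-5 as reported
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):                               # 3 training epochs as reported
    for batch in train_loader:                       # batches of 8 (article, question, options, label)
        logits = model(batch)                        # (batch, num_options) option scores
        loss = loss_fn(logits, batch["label"])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```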
Let k be the number of attention layers in step S2. The value of k is varied and the corresponding accuracy computed (averaged over five runs); to save training time, ALBERT-base is used as the pre-trained model. The results are shown in Table 2:
TABLE 2 model accuracy at different k values
The results show that accuracy fluctuates as the number of layers increases, and too many layers cause a slight drop in performance. This resembles humans, who may misinterpret some information when working through a machine reading comprehension task. For a network with the present number of parameters, two interactions are enough to capture the key information; additional layers disturb an already good representation and make the model harder to train.
In step S2, the article tensor and the question-answer tensor are each used as the query tensor in the attention mechanism, which is an important feature for fully modeling the relation between article and question. To study this effect, the bidirectional attention was changed to one-way attention, i.e. experiments were run keeping only the article as the query tensor and, separately, only the question-answer as the query tensor. The results are shown in Table 3:
TABLE 3 comparison of two-way and one-way attention accuracy
The results show that bidirectional attention effectively improves performance, reflecting its efficiency in modeling the relation between article and question; using bidirectional attention is also the more intuitive choice.
In step S2, the Transformer-encoder structure is followed, with a residual structure, layer normalization and a feed-forward neural network. To investigate the effectiveness of this structure, these components were removed and only the underlying co-attention and self-attention structures were retained. The results are shown in Table 4:
TABLE 4 Comparison of accuracy with and without the Transformer-encoder structure
The experimental results show that adding the residual connections, feed-forward neural network and layer normalization of the Transformer-encoder structure effectively improves model accuracy with only a slight increase in the number of parameters. Training on a Google Colab TPU takes about 30 minutes, and the trained model can be called at any time. To use the model, a user only needs to provide the article, question and options in JSON format and call the model's prediction function to obtain the predicted answer.
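A hypothetical usage example; the JSON field names and the predict() entry point are assumptions, not specified in the original text:

```python
import json

sample = json.loads("""
{
  "article": "W: Tom, look at the sky. It may rain soon. M: Then let's take an umbrella.",
  "question": "What will the speakers probably do?",
  "options": ["Stay at home.", "Take an umbrella.", "Wash the car."]
}
""")

# predict() is a hypothetical wrapper around the trained model's forward pass.
answer_index = predict(sample["article"], sample["question"], sample["options"])
print(sample["options"][answer_index])
```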
In conclusion, the invention simulates the perspective-taking that humans apply when solving multiple-choice reading comprehension and proposes a new bidirectional attention model to capture the relations among passage, question and answers in the multiple-choice machine reading comprehension task. The method can be combined with popular large-scale pre-trained language models to bring effective performance gains.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A method for processing a natural language system based on machine reading comprehension, the system comprising:
a pre-training encoding module, used to pre-train the original dataset to obtain encodings of the pre-trained tokens, and, for each option in the machine reading comprehension task, to splice the question-answer pair with the article into one sequence, constructing different inputs;
a neural network module, used to construct a deep neural network model for machine reading comprehension, the model comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer;
a judgment output module, used to output, based on the probability value of each option produced by the machine reading comprehension natural language processing system, the option with the highest probability as the correct answer;
the method comprising the following steps:
S1, inputting the article content, the question and the answer options, and encoding each token of the input article, question and options with a pre-trained model commonly used in natural language processing;
S2, using the tensor obtained from the encoding layer, processing the interaction layer with a multi-head bidirectional co-attention mechanism, performing co-attention processing and self-attention processing on the article tensor and the question-option tensor;
S3, fusing the output tensors at the decoding layer to obtain a prediction tensor;
and S4, feeding the prediction tensor corresponding to each option into the decoder to obtain the probability of the option being selected, thereby determining the correct answer.
2. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein step S1 specifically comprises:
splicing each option X_i with the article and the question into one input sequence, and mapping each token in the sequence to a fixed-length vector;
letting P = [p_1, p_2, ..., p_m], Q = [q_1, q_2, ..., q_n] and A = [a_1, a_2, ..., a_k] denote the article, question and answer sequences respectively, where p_i, q_i and a_i are the tokens in each sequence;
splicing the tensors P, Q and A and feeding them into the pre-trained encoder at the encoding layer,
E = Enc([P; Q; A])
to obtain the output tensor E = [e_1, e_2, ..., e_{m+n+k}], where e_i is the encoding vector of each token.
3. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein step S2 specifically comprises:
S201, dividing the tensor encoded by the pre-trained model into an article tensor and a question-option tensor (adding the corresponding masks before the pre-trained model output) as the input of the interaction layer, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences;
S202, performing bidirectional co-attention on the article and question-option tensors, i.e. taking E_P and E_QA in turn as the Query vector of the attention mechanism to obtain two different attention scores;
S203, applying layer normalization to the two attention score vectors and connecting them through a residual network;
S204, feeding the two updated attention scores into a feed-forward neural network with ReLU as the activation function, connecting through a residual network, and applying layer normalization again;
S205, applying self-attention to the question-option tensor E_QA, followed by layer normalization and a feed-forward neural network;
and S206, repeating the bidirectional co-attention and self-attention processes k-1 times, i.e. repeating the calculation of the three updated attention scores k-1 times.
4. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S202 comprises: Attention denotes the attention value, softmax is the normalized exponential function, T denotes vector transposition, and d_k is the dimension of E_QA, which is also the encoding length of each token produced by the pre-trained model,
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
W^O is a parameter to be trained with the model, head_i is the i-th attention value, MultiHeadAtten is the overall multi-head attention score, Concat denotes the concatenation of the individual head scores, and MHA_1 and MHA_2 are the multi-head attention scores in the two directions, from article to question-option and from question-option to article, whose query and key values differ, realizing the bidirectional mutual attention mechanism:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P).
5. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S203 comprises: applying layer normalization and residual connections to the obtained multi-head attention scores MHA_1 and MHA_2:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2).
6. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S204 comprises: feeding MHA_1 and MHA_2 into a feed-forward neural network with ReLU as the activation function, connecting through a residual network, and applying layer normalization:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2).
7. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S205 comprises: applying self-attention to the question-option tensor E_QA, followed by layer normalization and residual connection:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3).
8. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S206 comprises: repeating steps S201, S202, S203, S204 and S205 k-1 times, i.e. iterating M_1, M_2 and M_3 k-1 times.
9. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein the calculation in step S3 specifically comprises: fusing the attention-processed tensors M_1, M_2 and M_3 to obtain O_i:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
10. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein decoding the prediction vector in step S4 to obtain the option probabilities comprises:
S401, feeding the tensor O_i obtained in step S3 into a fully connected layer to obtain the probability of each option being selected, the fully connected layer mapping the n-dimensional feature vector to one dimension;
and S402, normalizing the probabilities of the options with Softmax and selecting the option with the highest probability as the output answer, Softmax being computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j).
CN202211135769.5A 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding Pending CN115455985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211135769.5A CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211135769.5A CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Publications (1)

Publication Number Publication Date
CN115455985A true CN115455985A (en) 2022-12-09

Family

ID=84305636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211135769.5A Pending CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Country Status (1)

Country Link
CN (1) CN115455985A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561286A (en) * 2023-07-06 2023-08-08 杭州华鲤智能科技有限公司 Dialogue method and device
CN116561286B (en) * 2023-07-06 2023-10-27 杭州华鲤智能科技有限公司 Dialogue method and device

Similar Documents

Publication Publication Date Title
CN111444709B (en) Text classification method, device, storage medium and equipment
CN109657041B (en) Deep learning-based automatic problem generation method
CN110188176B (en) Deep learning neural network, and training and predicting method, system, device and medium
US20180144234A1 (en) Sentence Embedding for Sequence-To-Sequence Matching in a Question-Answer System
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
CN114565104A (en) Language model pre-training method, result recommendation method and related device
Mainzer Artificial intelligence-When do machines take over?
CN109739995B (en) Information processing method and device
CN108595436A (en) The generation method and system of emotion conversation content, storage medium
CN113408430B (en) Image Chinese description system and method based on multi-level strategy and deep reinforcement learning framework
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
Tang et al. Modelling student behavior using granular large scale action data from a MOOC
CN111274375A (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN113779220A (en) Mongolian multi-hop question-answering method based on three-channel cognitive map and graph attention network
CN112905772B (en) Semantic correlation analysis method and device and related products
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
KR20200023664A (en) Response inference method and apparatus
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
CN115455985A (en) Natural language system processing method based on machine reading understanding
Huang et al. TeFNA: Text-centered fusion network with crossmodal attention for multimodal sentiment analysis
CN114492451A (en) Text matching method and device, electronic equipment and computer readable storage medium
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
CN114048301B (en) Satisfaction-based user simulation method and system
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN113010662B (en) Hierarchical conversational machine reading understanding system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination