CN115455985A - Natural language system processing method based on machine reading understanding - Google Patents

Natural language system processing method based on machine reading understanding

Info

Publication number
CN115455985A
CN115455985A (Application No. CN202211135769.5A)
Authority
CN
China
Prior art keywords
mha
attention
option
natural language
tensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211135769.5A
Other languages
Chinese (zh)
Inventor
戴慧珺
方强
郭小雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Huijun Tao Intelligent Technology Co ltd
Original Assignee
Suzhou Huijun Tao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Huijun Tao Intelligent Technology Co ltd filed Critical Suzhou Huijun Tao Intelligent Technology Co ltd
Priority to CN202211135769.5A priority Critical patent/CN115455985A/en
Publication of CN115455985A publication Critical patent/CN115455985A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a processing method for a natural language system based on machine reading comprehension. Building on an existing pre-trained ALBERT model, the method analyses the relations among article, question and candidate answers in the multiple-choice machine reading comprehension task, proposes a bidirectional multi-head co-attention model at the interaction layer that encodes the input article and the question-answer pair as two query directions, and combines the two directions with a self-attention mechanism. Following the Transformer structure, a residual network, a feed-forward network and layer normalization are added to the encoding layer. The output is a probability value for each candidate answer option, and the answer is determined from these probabilities. Based on the large-scale pre-trained language model ALBERT, the method simulates the perspective-taking that humans apply when solving machine reading comprehension problems, makes better use of the natural relations among article, question and answers, effectively improves prediction accuracy for the correct option, and addresses multiple-choice questions in natural language processing. The method applies to natural language processing tasks such as masking, masked language modeling and sentence prediction; it fully captures the information in the article, the question and the answer options, and effectively improves performance on tasks that require inter-sentence comprehension and reasoning, such as machine question answering, natural language inference and text matching.

Description

Natural language system processing method based on machine reading understanding
Technical Field
The invention belongs to the field of multiple-choice machine reading comprehension with neural network models, and particularly relates to a processing method for a natural language system based on machine reading comprehension.
Background
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence that studies theories and methods for effective communication between humans and computers in natural language. Machine Reading Comprehension (MRC) is one of the most challenging tasks in NLP: an algorithm enables a computer to grasp the semantics of an article and answer related questions, giving the computer the ability to read, analyze and summarize text. MRC can essentially be formalized as a supervised learning problem: given a triple (P, Q, A) of text, question and answer, where P denotes the article, Q the question and A the answer, the MRC task is to learn a predictor f that takes P and Q as input and outputs the correct answer A, i.e., f: (P, Q) → A.
Approaches to the MRC problem include traditional feature-matching methods and neural reading comprehension models. Traditional feature matching manually constructs rules or extracts features to match sentences or words, then picks the answer with the highest matching score; it is heavily constrained by the chosen features, vocabulary and corpora, generalizes poorly, and is inefficient. Reading comprehension models based on deep learning have become mainstream. In particular, large-scale pre-trained encoders are widely used in natural language processing and neural reading comprehension: a pre-trained model learns task-independent representations from large-scale data through self-supervised learning and represents the semantics of a word in a specific context. Pre-trained models represented by BERT encode each token appropriately according to its context; ALBERT improves on BERT by reducing model parameters while strengthening semantic understanding, and it uses embedding-layer parameter factorization and cross-layer parameter sharing to reduce memory overhead and speed up training. Attention mechanisms are widely used in machine reading comprehension. A multi-head attention mechanism concatenates h different self-attention heads, fully fusing the pre-encoded context; the final attention values are computed with multiple scaled dot products in parallel, and the output dimension of multi-head attention matches that of single-head attention.
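By way of illustration, a minimal sketch of the scaled dot-product attention on which the multi-head mechanism builds; the use of PyTorch and the tensor shapes are assumptions of this illustration, not part of the original disclosure:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # attention distribution over the key positions
    return torch.matmul(weights, V)          # weighted sum of the value vectors

# A multi-head layer runs h such attentions in parallel on projected Q/K/V and
# concatenates the results, e.g. torch.nn.MultiheadAttention(embed_dim, num_heads).
```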
Although neural reading comprehension models and attention mechanisms are widely used in natural language processing and machine reading comprehension, most models fuse context after pre-encoding without fully considering human reverse thinking, ignoring the other query direction from the question-answer pair to the article. As a result, the predictor cannot fully learn the latent, natural logical relations among article, question and answer, and the logical relations between article sentences and question-answer pairs, or the associations between words, are overlooked.
Disclosure of Invention
The invention aims to provide an accurate and effective natural language processing method for fields such as multiple-choice machine reading comprehension and inference-based prediction. On top of an existing pre-trained model, and starting from the natural relations among article, question and answers, co-attention and self-attention are combined to construct a suitable multi-head bidirectional co-attention mechanism; a residual network, a feed-forward network and layer normalization are added following the Transformer-encoder structure; the network parameters are fine-tuned for the specific task, a fully connected layer is used as the decoder to output the probability of each option being selected, and the option with the highest probability value is chosen. The method is trained on the DREAM dataset and, compared with using a pre-trained model directly, significantly improves answer accuracy.
The invention adopts the following technical scheme:
a processing method for a natural language system based on machine reading comprehension, the system comprising:
a pre-training encoding module, used to pre-train the original dataset to obtain encodings of the pre-trained tokens, and, for each option in the machine reading comprehension task, to splice the question-answer pair with the article into one sequence, constructing different inputs;
a neural network module, used to construct a deep neural network model for machine reading comprehension, the model comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer;
a judgment output module, used to output, based on the probability value of each option produced by the machine reading comprehension natural language processing system, the option with the highest probability as the correct answer;
the method comprises the following steps:
S1, encoding each token of the input article, question and options with a pre-trained model commonly used in natural language processing;
S2, constructing the article and question-option tensors and, through the co-attention and self-attention mechanisms, adding a residual network, layer normalization and a feed-forward neural network following the Transformer-encoder structure to obtain three groups of output tensors; the whole model is trained with mini-batch stochastic gradient descent;
S3, fusing the three groups of output tensors from S2 to obtain the prediction tensor corresponding to each option;
and S4, feeding the prediction tensor of each option into the decoder to obtain the probability of the option being selected, normalizing with a Softmax function, and selecting the option with the highest probability as the output.
Specifically, in step S1 each option X_i is spliced with the article and the question into one sequence, and each token in the sequence is mapped to a fixed-length vector.
Let P = [p_1, p_2, ..., p_m], Q = [q_1, q_2, ..., q_n] and A = [a_1, a_2, ..., a_k] denote the article, question and answer sequences respectively, where p_i, q_i and a_i are the tokens in each sequence.
At the encoding layer, P, Q and A are spliced and fed into the pre-trained encoder:
E = Enc([P; Q; A])
yielding the output tensor E = [e_1, e_2, ..., e_{m+n+k}], where e_i is the encoding vector of each token.
Specifically, step S2 includes:
S201, dividing the tensor encoded by the pre-trained model into an article tensor and a question-option tensor, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences;
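By way of illustration, a minimal sketch of the split in step S201, assuming the encoder output is a PyTorch tensor of shape (batch, m+n+k, d):

```python
import torch

def split_encoding(E: torch.Tensor, l_p: int):
    """Split the encoder output E of shape (batch, m+n+k, d) at the article length l_p."""
    E_P = E[:, :l_p, :]    # article token representations  E^P
    E_QA = E[:, l_p:, :]   # question-option token representations E^QA
    return E_P, E_QA
```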
s202, bidirectional co-attention processing is carried out on the article and the question-option tensor, namely E is respectively processed P And E QA Taking the attention mechanism as a Query vector to obtain two different attention scores;
s203, performing layer normalization processing on the two acquired attention score vectors and connecting the two acquired attention score vectors through a residual error network;
s204, inputting the two updated attention scores into a feedforward neural network, using Relu as an activation function, and continuing to perform layer normalization after connecting through a residual error network;
s205, problem-option tensor E QA Performing self-attention processing, and performing layer normalization and feedforward neural network;
and S206, repeating the bidirectional co-attention process and the self-attention process k-1 times, namely repeating the updated three attention score calculation processes k-1 times.
Further, in step S202, Attention denotes the attention value, softmax is the normalized exponential function, T denotes vector transposition, and d_k is the dimension of E_QA, which is also the encoding length of each token produced by the pre-trained model:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
W^O is a parameter trained with the model, head_i is the i-th attention value, MultiHeadAtten is the overall multi-head attention score, Concat denotes the concatenation of the individual head scores, and MHA_1 and MHA_2 are the multi-head attention scores in the two directions, from article to question-option and from question-option to article; their query and key values differ, which realizes the bidirectional mutual attention mechanism:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P)
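As a sketch only, one possible realization of MHA_1 and MHA_2 with PyTorch's built-in multi-head attention; the 768-dimensional encoding matches the text, while the number of heads and the dummy sequence lengths are assumptions:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 768, 12   # 768 matches the token encoding size in the text; 12 heads is an assumption
co_attn_p2qa = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
co_attn_qa2p = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

E_P = torch.randn(1, 120, embed_dim)    # dummy article encoding (batch, l_p, d)
E_QA = torch.randn(1, 40, embed_dim)    # dummy question-option encoding (batch, l_qa, d)

MHA1, _ = co_attn_p2qa(query=E_P, key=E_QA, value=E_QA)   # article -> question-option
MHA2, _ = co_attn_qa2p(query=E_QA, key=E_P, value=E_P)    # question-option -> article
```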
Further, in step S203, the multi-head attention scores MHA_1 and MHA_2 are layer-normalized and connected through a residual network:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2)
Further, in step S204, MHA_1 and MHA_2 are fed into a feed-forward neural network with ReLU as the activation function, connected through a residual network, and layer-normalized:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2)
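A sketch of this residual / layer-normalization / feed-forward post-processing applied to each attention score; the hidden sizes are assumptions:

```python
import torch
import torch.nn as nn

class AddNormFFN(nn.Module):
    """Residual + LayerNorm, then a ReLU feed-forward block with a second residual + LayerNorm."""
    def __init__(self, d_model=768, d_ff=2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x, attn_out):
        h = self.ln1(x + attn_out)         # MHA_i = LN(E + MHA_i)
        return self.ln2(h + self.ffn(h))   # M_i  = LN(MHA'_i + MHA_i)

# e.g. M1 = AddNormFFN()(E_P, MHA1); M2 = AddNormFFN()(E_QA, MHA2)
```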
Further, in step S205, self-attention is applied to the question-option tensor E_QA, followed by layer normalization with a residual connection and a feed-forward network:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3)
Further, in step S206, steps S201, S202, S203, S204 and S205 are repeated k-1 times, i.e. M_1, M_2 and M_3 are iterated k-1 times.
Specifically, step S3 fuses the tensors processed by the attention mechanism:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
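A sketch of this fusion, assuming M_1, M_2 and M_3 are PyTorch tensors of shape (batch, seq_len, d):

```python
import torch

def fuse(M1, M2, M3):
    """Mean-pool each tensor over the sequence dimension and concatenate,
    giving one prediction vector O_i per option."""
    pooled = [M.mean(dim=1) for M in (M1, M2, M3)]   # each becomes (batch, d)
    return torch.cat(pooled, dim=-1)                 # (batch, 3*d)
```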
Further, in step S401, the output tensor O_i is fed into a fully connected layer to obtain the probability of each option being selected; the fully connected layer maps the n-dimensional feature vector to one dimension.
Further, in step S402, the probabilities of the options are normalized with Softmax, and the option with the highest probability is selected as the output answer. Softmax is computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
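A sketch of the decoder in steps S401 and S402; the fused-vector size and the number of options are assumptions for illustration:

```python
import torch
import torch.nn as nn

d = 768
scorer = nn.Linear(3 * d, 1)              # maps each option's fused vector to a scalar score

O = torch.randn(3, 3 * d)                 # dummy fused vectors O_i for three candidate options
probs = torch.softmax(scorer(O).squeeze(-1), dim=-1)   # normalized selection probabilities
answer = int(torch.argmax(probs))         # index of the highest-probability option
```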
compared with the prior art, the invention at least has the following beneficial effects:
the invention relates to a natural language processing method based on machine reading understanding, which combines a bidirectional multi-head common attention and self-attention mechanism, respectively fuses information from an article to a question-answer pair and a question-answer pair to the article in two directions, wherein the information in the two directions is mutually supplemented, and the transposition thinking and the reverse thinking experience of a human in solving the machine reading understanding problem are simulated. And adding a residual error network, a feedforward network and a layer normalization network into the coding layer by referring to the transform structure, outputting a probability value of each answer option selected, and determining an answer according to the probability value. The natural relation among articles, questions and answers is captured better, and the prediction precision of the correct options is effectively improved. The method is based on a large-scale pre-training language model ALBERT, adopts a multi-head attention mechanism, can accurately pay attention to keywords related to correct answers in an article sequence, and solves the problem of multiple choices in natural language processing.
In summary, the invention is mainly optimized for multiple choices understood by machine reading, and an attention mechanism structure is designed according to the natural relationship among articles, questions and answers. The algorithmic experiments provided by the present invention are based on the Dream dataset, which is the first dialog-based multiple-choice reading understanding dataset. Each piece of data contains an article, a question, and three options, only one of which is correct. The experimental result shows that the method has better adaptability and certain accuracy, and can improve the prediction performance.
Drawings
FIG. 1 is a diagram of a pre-training-fine-tuning framework;
fig. 2 is a schematic diagram of a network model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of the various regions, layers and their relative sizes, positional relationships are shown in the drawings as examples only, and in practice deviations due to manufacturing tolerances or technical limitations are possible, and a person skilled in the art may additionally design regions/layers with different shapes, sizes, relative positions, according to the actual needs.
This scheme provides a natural language processing method for multiple-choice machine reading comprehension that applies a bidirectional multi-head co-attention mechanism to natural language processing tasks, including masked-token processing and sentence prediction. The method can fully capture the information in the article, the question and the answer options, and improves performance on natural language processing tasks that require comprehension and reasoning, such as machine question answering, natural language inference and text matching:
referring to fig. 2, the processing method of a natural language system based on machine reading understanding of the present invention includes the following steps:
s1, encoding each lemma of an input data set by using a pre-training model (the ALBERT-large model is used in the patent) commonly used in the field of natural language processing. The Dream data set was collected from an examination of english designed by a human expert to evaluate the english comprehension level of a chinese learner as a foreign language, and comprised 10197 multiple choice questions and 6444 dialogues. Dream is the first reading understanding data set focused on deep multi-turn multi-party conversational understanding, in contrast to existing reading understanding data sets. The 84% of the answers in Dream are non-extractable, 85% of the questions need to be inferred outside of a single sentence, and 34% of the questions also involve common sense knowledge. Let P = [ P ] 1 ,p 2 ,...p m ]Q=[q 1 ,q 2 ,...q n ]A=[a 1 ,a 2 ,...a k ]Respectively representing articles, questions and answer sequences, wherein pi, qi and ai respectively represent the lemmas in the sequences; the tensor P, Q, A is input into a pre-training model E and spliced
Figure BDA0003852011360000071
Using Enc () process, the output tensor E = [ E ] 1 ,e 2 ,...e m+n+k ]Where ei is the encoded vector for each token, outputting a 768-dimensional token encoded vector.
S2, using the encoding tensor obtained in S1, co-attention is applied to the article and question-option tensors, and self-attention is then applied to the question-option tensor:
S201, the tensor E = [e_1, e_2, ..., e_{m+n+k}] is divided into an article tensor E_P and a question-option tensor E_QA, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences, with l_p + l_qa = m + n + k.
S202, E_P and E_QA are taken in turn as the Query vector of a bidirectional co-attention mechanism to obtain two different attention scores:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P)
where W^O is a parameter trained with the model and d_k is the encoding length of each token produced by the pre-trained model.
S203, the two attention score vectors MHA_1 and MHA_2 are layer-normalized and connected to E_P and E_QA through a residual network:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2)
S204, the two updated attention scores MHA_1 and MHA_2 are fed into a feed-forward neural network to obtain MHA'_1 and MHA'_2, with ReLU as the activation function; they are connected to MHA_1 and MHA_2 through a residual network and layer-normalized to obtain M_1 and M_2:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2)
S205, self-attention is applied to the question-option tensor E_QA, followed by layer normalization and a feed-forward neural network to obtain M_3:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3)
S206, the bidirectional co-attention and self-attention processes are repeated k-1 times, i.e. the calculation of the three attention scores M_1, M_2 and M_3 is repeated k-1 times and their values are updated.
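As a sketch only, one way to stack k interaction layers as in step S206; the module sizes and the wiring of the updated tensors between repetitions are assumptions:

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    """One repetition of bidirectional co-attention (S202-S204) and question-option
    self-attention (S205); sizes and inter-layer wiring are assumptions."""
    def __init__(self, d=768, heads=12, d_ff=2048):
        super().__init__()
        self.attn = nn.ModuleList(nn.MultiheadAttention(d, heads, batch_first=True) for _ in range(3))
        self.ln1 = nn.ModuleList(nn.LayerNorm(d) for _ in range(3))
        self.ln2 = nn.ModuleList(nn.LayerNorm(d) for _ in range(3))
        self.ffn = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d)) for _ in range(3))

    def _branch(self, i, residual, q, k, v):
        h = self.ln1[i](residual + self.attn[i](q, k, v)[0])
        return self.ln2[i](h + self.ffn[i](h))

    def forward(self, E_P, E_QA):
        M1 = self._branch(0, E_P, E_P, E_QA, E_QA)    # article -> question-option (MHA_1)
        M2 = self._branch(1, E_QA, E_QA, E_P, E_P)    # question-option -> article (MHA_2)
        M3 = self._branch(2, E_QA, E_QA, E_QA, E_QA)  # self-attention on question-option (MHA_3)
        return M1, M2, M3

k = 2                                                  # two layers worked best in the ablation reported below
layers = nn.ModuleList(InteractionLayer() for _ in range(k))
E_P, E_QA = torch.randn(1, 120, 768), torch.randn(1, 40, 768)   # dummy encodings
for layer in layers:
    M1, M2, M3 = layer(E_P, E_QA)
    E_P, E_QA = M1, M2                                 # assumed wiring between repetitions
```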
S3, the tensors processed by the attention mechanism are fused to obtain the prediction tensor:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
S4, the prediction tensor corresponding to each option is fed into the decoder to obtain the probability of the option being selected:
S401, the output tensor O_i is fed into a fully connected layer to obtain the probability of each option being selected; the fully connected layer maps the n-dimensional feature vector to one dimension.
S402, the probabilities of the options are normalized with Softmax, and the option with the highest probability is selected as the output answer. Softmax is computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
the natural language processing method based on machine reading understanding has practical significance in the fields of equal intelligent question answering, text generation matching, natural language reasoning, sentence prediction and the like. Letting the machine understand the human questions described by the user and give reasonable answers, which may exist in context, may be simply "if", or may not be directly answered, requiring the machine to generate answers according to its own understanding.
In another embodiment of the present invention, a natural language processing system based on machine reading comprehension is provided that can be used to implement the above natural language processing method. Specifically, the system includes a pre-training encoding module, a neural network module and a judgment output module. The pre-training encoding module pre-trains the original dataset to obtain encodings of the pre-trained tokens and, for each option in the machine reading comprehension task, splices the question-answer pair with the article into one sequence to construct different inputs. The neural network module constructs a deep neural network model for machine reading comprehension comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer. The judgment output module outputs the option with the highest probability as the correct answer, based on the probability value of each option produced by the machine reading comprehension natural language processing system.
In one embodiment, a terminal device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions and the processor executing the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor according to this embodiment of the invention may be used to run the natural language processing method based on machine reading comprehension, which includes:
splicing and encoding the original dataset to be pre-trained and feeding it into the pre-trained model to obtain the encoding tensor; dividing the tensor into article, question and answer sequences according to the multiple-choice machine reading comprehension setting to construct the article tensor E_P and the question-option tensor E_QA; constructing a deep neural network model for machine reading comprehension, further processing the encoding tensor with an attention mechanism built from the natural relations among article, question and answers, combining co-attention and self-attention, and adding a residual network, a feed-forward network and layer normalization following the Transformer-encoder structure; and outputting the probability value of each option from the output layer of the deep neural network model, with the highest-probability option output as the correct answer.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To verify the performance of the natural language processing method based on machine reading comprehension, it is evaluated on the public DREAM dataset; compared with using a pre-trained model directly, the proposed structure significantly improves answer accuracy. The experimental platform configuration is shown in Table 1.
For model training, the learning rate is set to 1e-5, the batch size to 8, and the number of training epochs to 3.
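A sketch of this training setup, assuming `model` is the full network and `train_loader` iterates over DREAM examples (both are placeholders, not defined in the original text):

```python
import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)       # learning rate 1e-5 as reported
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):                               # 3 training epochs as reported
    for batch in train_loader:                       # batches of 8 (article, question, options, label)
        logits = model(batch)                        # (batch, num_options) option scores
        loss = loss_fn(logits, batch["label"])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```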
Let k be the number of attention layers in step S2. The value of k is varied and the corresponding accuracy computed (averaged over five runs); to save training time, ALBERT-base is used as the pre-trained model. The results are shown in Table 2:
TABLE 2 model accuracy at different k values
The results show that accuracy fluctuates as the number of layers increases, and too many layers cause a slight drop in performance. This resembles humans, who may misinterpret some information when working through a machine reading comprehension task. For a network with the present number of parameters, two interactions are enough to capture the key information; additional layers disturb an already good representation and make the model harder to train.
In step S2, the article tensor and the question-answer tensor are each used as the query tensor in the attention mechanism, which is an important feature for fully modeling the relation between article and question. To study this effect, the bidirectional attention was changed to one-way attention, i.e. experiments were run keeping only the article as the query tensor and, separately, only the question-answer as the query tensor. The results are shown in Table 3:
TABLE 3 comparison of two-way and one-way attention accuracy
The results show that bidirectional attention effectively improves performance, reflecting its efficiency in modeling the relation between article and question; using bidirectional attention is also the more intuitive choice.
In step S2, the Transformer-encoder structure is followed, with a residual structure, layer normalization and a feed-forward neural network. To investigate the effectiveness of this structure, these components were removed and only the underlying co-attention and self-attention structures were retained. The results are shown in Table 4:
TABLE 4 Comparison of accuracy with and without the Transformer-encoder structure
The experimental results show that adding the residual connections, feed-forward neural network and layer normalization of the Transformer-encoder structure effectively improves model accuracy with only a slight increase in the number of parameters. Training on a Google Colab TPU takes about 30 minutes, and the trained model can be called at any time. To use the model, a user only needs to provide the article, question and options in JSON format and call the model's prediction function to obtain the predicted answer.
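A hypothetical usage example; the JSON field names and the predict() entry point are assumptions, not specified in the original text:

```python
import json

sample = json.loads("""
{
  "article": "W: Tom, look at the sky. It may rain soon. M: Then let's take an umbrella.",
  "question": "What will the speakers probably do?",
  "options": ["Stay at home.", "Take an umbrella.", "Wash the car."]
}
""")

# predict() is a hypothetical wrapper around the trained model's forward pass.
answer_index = predict(sample["article"], sample["question"], sample["options"])
print(sample["options"][answer_index])
```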
In conclusion, the invention simulates the perspective-taking that humans apply when solving multiple-choice reading comprehension and proposes a new bidirectional attention model to capture the relations among passage, question and answers in the multiple-choice machine reading comprehension task. The method can be combined with popular large-scale pre-trained language models to bring effective performance gains.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A method for processing a natural language system based on machine reading comprehension, the system comprising:
a pre-training encoding module, used to pre-train the original dataset to obtain encodings of the pre-trained tokens, and, for each option in the machine reading comprehension task, to splice the question-answer pair with the article into one sequence, constructing different inputs;
a neural network module, used to construct a deep neural network model for machine reading comprehension, the model comprising a fusion layer of self-attention, bidirectional co-attention and multi-head attention, a residual layer and a Softmax output layer;
a judgment output module, used to output, based on the probability value of each option produced by the machine reading comprehension natural language processing system, the option with the highest probability as the correct answer;
the method comprising the following steps:
S1, inputting the article content, the question and the answer options, and encoding each token of the input article, question and options with a pre-trained model commonly used in natural language processing;
S2, using the tensor obtained from the encoding layer, processing the interaction layer with a multi-head bidirectional co-attention mechanism, performing co-attention processing and self-attention processing on the article tensor and the question-option tensor;
S3, fusing the output tensors at the decoding layer to obtain a prediction tensor;
and S4, feeding the prediction tensor corresponding to each option into the decoder to obtain the probability of the option being selected, thereby determining the correct answer.
2. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein step S1 specifically comprises:
splicing each option X_i with the article and the question into one input sequence, and mapping each token in the sequence to a fixed-length vector;
letting P = [p_1, p_2, ..., p_m], Q = [q_1, q_2, ..., q_n] and A = [a_1, a_2, ..., a_k] denote the article, question and answer sequences respectively, where p_i, q_i and a_i are the tokens in each sequence;
splicing the tensors P, Q and A and feeding them into the pre-trained encoder at the encoding layer,
E = Enc([P; Q; A])
to obtain the output tensor E = [e_1, e_2, ..., e_{m+n+k}], where e_i is the encoding vector of each token.
3. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein step S2 specifically comprises:
S201, dividing the tensor encoded by the pre-trained model into an article tensor and a question-option tensor (adding the corresponding masks before the pre-trained model output) as the input of the interaction layer, expressed as:
E_P = [e^P_1, e^P_2, ..., e^P_{l_p}] and E_QA = [e^QA_1, e^QA_2, ..., e^QA_{l_qa}]
where e^P_i is the token representation of the i-th article token, e^QA_j is the token representation of the j-th question-option token, and l_p and l_qa are the lengths of the article and question-option sequences;
S202, performing bidirectional co-attention on the article and question-option tensors, i.e. taking E_P and E_QA in turn as the Query vector of the attention mechanism to obtain two different attention scores;
S203, applying layer normalization to the two attention score vectors and connecting them through a residual network;
S204, feeding the two updated attention scores into a feed-forward neural network with ReLU as the activation function, connecting through a residual network, and applying layer normalization again;
S205, applying self-attention to the question-option tensor E_QA, followed by layer normalization and a feed-forward neural network;
and S206, repeating the bidirectional co-attention and self-attention processes k-1 times, i.e. repeating the calculation of the three updated attention scores k-1 times.
4. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S202 comprises: Attention denotes the attention value, softmax is the normalized exponential function, T denotes vector transposition, and d_k is the dimension of E_QA, which is also the encoding length of each token produced by the pre-trained model,
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
W^O is a parameter to be trained with the model, head_i is the i-th attention value, MultiHeadAtten is the overall multi-head attention score, Concat denotes the concatenation of the individual head scores, and MHA_1 and MHA_2 are the multi-head attention scores in the two directions, from article to question-option and from question-option to article, whose query and key values differ, realizing the bidirectional mutual attention mechanism:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHeadAtten(E_P, E_QA, E_QA) = Concat(head_1, ..., head_h) W^O
MHA_1 = MultiHeadAtten(E_P, E_QA, E_QA)
MHA_2 = MultiHeadAtten(E_QA, E_P, E_P).
5. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S203 comprises: applying layer normalization and residual connections to the obtained multi-head attention scores MHA_1 and MHA_2:
MHA_1 = LN(E_P + MHA_1)
MHA_2 = LN(E_QA + MHA_2).
6. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S204 comprises: feeding MHA_1 and MHA_2 into a feed-forward neural network with ReLU as the activation function, connecting through a residual network, and applying layer normalization:
MHA'_1 = Linear(ReLU(Linear(MHA_1)))
M_1 = LN(MHA'_1 + MHA_1)
MHA'_2 = Linear(ReLU(Linear(MHA_2)))
M_2 = LN(MHA'_2 + MHA_2).
7. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S205 comprises: applying self-attention to the question-option tensor E_QA, followed by layer normalization and residual connection:
MHA_3 = MHA(E_QA, E_QA, E_QA)
MHA_3 = LN(E_QA + MHA_3)
MHA'_3 = Linear(ReLU(Linear(MHA_3)))
M_3 = LN(MHA'_3 + MHA_3).
8. The processing method of the natural language system based on machine reading comprehension according to claim 3, wherein step S206 comprises: repeating steps S201, S202, S203, S204 and S205 k-1 times, i.e. iterating M_1, M_2 and M_3 k-1 times.
9. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein the calculation in step S3 specifically comprises: fusing the attention-processed tensors M_1, M_2 and M_3 to obtain O_i:
O_i = pooling(M_1) ⊕ pooling(M_2) ⊕ pooling(M_3)
where pooling denotes averaging the tensor over dimension 1 (for example, an m x n tensor becomes of size 1 x n after averaging over dimension 1), and ⊕ denotes the concatenation operation between tensors.
10. The processing method of the natural language system based on machine reading comprehension according to claim 1, wherein decoding the prediction vector in step S4 to obtain the option probabilities comprises:
S401, feeding the tensor O_i obtained in step S3 into a fully connected layer to obtain the probability of each option being selected, the fully connected layer mapping the n-dimensional feature vector to one dimension;
and S402, normalizing the probabilities of the options with Softmax and selecting the option with the highest probability as the output answer, Softmax being computed as:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j).
CN202211135769.5A 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding Pending CN115455985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211135769.5A CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211135769.5A CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Publications (1)

Publication Number Publication Date
CN115455985A true CN115455985A (en) 2022-12-09

Family

ID=84305636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211135769.5A Pending CN115455985A (en) 2022-09-19 2022-09-19 Natural language system processing method based on machine reading understanding

Country Status (1)

Country Link
CN (1) CN115455985A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561286A (en) * 2023-07-06 2023-08-08 杭州华鲤智能科技有限公司 Dialogue method and device
CN116561286B (en) * 2023-07-06 2023-10-27 杭州华鲤智能科技有限公司 Dialogue method and device

Similar Documents

Publication Publication Date Title
CN111444709B (en) Text classification method, device, storage medium and equipment
CN109657041B (en) Deep learning-based automatic problem generation method
CN110188176B (en) Deep learning neural network, and training and predicting method, system, device and medium
US20180144234A1 (en) Sentence Embedding for Sequence-To-Sequence Matching in a Question-Answer System
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
CN114565104A (en) Language model pre-training method, result recommendation method and related device
Mainzer Artificial intelligence-When do machines take over?
CN109739995B (en) Information processing method and device
CN108595436A (en) The generation method and system of emotion conversation content, storage medium
CN113408430B (en) Image Chinese description system and method based on multi-level strategy and deep reinforcement learning framework
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
Tang et al. Modelling student behavior using granular large scale action data from a MOOC
CN111274375A (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN113779220A (en) Mongolian multi-hop question-answering method based on three-channel cognitive map and graph attention network
CN112905772B (en) Semantic correlation analysis method and device and related products
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
KR20200023664A (en) Response inference method and apparatus
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
CN115455985A (en) Natural language system processing method based on machine reading understanding
Huang et al. TeFNA: Text-centered fusion network with crossmodal attention for multimodal sentiment analysis
CN114492451A (en) Text matching method and device, electronic equipment and computer readable storage medium
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
CN114048301B (en) Satisfaction-based user simulation method and system
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN113010662B (en) Hierarchical conversational machine reading understanding system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination