CN114741485A - English choice question answer prediction method, device, equipment and storage medium - Google Patents

English choice question answer prediction method, device, equipment and storage medium

Info

Publication number
CN114741485A
CN114741485A (application CN202110025511.9A)
Authority
CN
China
Prior art keywords
processed
sentence
matrix
english
choice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110025511.9A
Other languages
Chinese (zh)
Inventor
林鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shikun Electronic Technology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shikun Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shikun Electronic Technology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202110025511.9A priority Critical patent/CN114741485A/en
Publication of CN114741485A publication Critical patent/CN114741485A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for predicting the answers of English choice questions. After the question stem and the several options of an English choice question are obtained, each option is substituted into the question stem to obtain several sentences to be processed; whether the sentences to be processed contain a dialogue is judged; and different option prediction strategies are executed depending on whether they contain a dialogue, so that a target option is determined from the options as the answer to the English choice question. English choice questions are divided into a dialogue class and a non-dialogue class, and a different option prediction strategy is executed for each class. Because the answer is predicted directly, in the way a person solves the question, no huge exercise bank is needed: data storage space is saved, the problem of failing to obtain a correct answer because the exercise bank does not cover the question is avoided, and answer prediction accuracy is improved.

Description

English choice question answer prediction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of natural language processing, in particular to a method, a device, equipment and a storage medium for predicting answers of English choice questions.
Background
In recent years, with the vigorous development of online education, more and more online question-bank websites, apps and the like have appeared. Among their functions, photo-based question search is one that students use very frequently and welcome greatly. By photographing a problem they cannot solve, students can look up the answer and its analysis in the app and thereby understand the corresponding problem and knowledge points. However, besides the accuracy of the algorithm, the photo-search scenario depends on a very large question bank. If the exercise a student photographs is not in the bank, then no matter how accurate the algorithm is, no satisfactory answer can be returned. Yet no matter how large the question bank is, it cannot cover every exercise in the world; there will always be questions that cannot be retrieved, and this situation is generally difficult to handle.
Among the many subjects, English exercises depend on the question bank in a particularly prominent way, mainly because many English exercises come from newspapers, magazines and other everyday sources, and teachers may even compose sentences themselves. For a grammar-class choice question, merely blanking out a few words at random produces a new exercise. In particular, for the same sentence, different blanked-out positions examine different knowledge points and thus correspond to different exercises. New English questions are therefore generated far faster than in other subjects, and in the photo-search scenario English is the subject for which a satisfactory match most often cannot be found.
Disclosure of Invention
The invention provides a method, an apparatus, a device and a storage medium for predicting answers of English choice questions, which do not rely on a huge exercise bank, thereby both saving data storage space and improving the accuracy of answer prediction.
In a first aspect, an embodiment of the present invention provides a method for predicting answers to english selection questions, including:
acquiring a question stem and a plurality of options of an English choice question;
substituting each option into the question stem respectively to obtain a plurality of sentences to be processed;
judging whether the sentence to be processed contains a dialogue or not;
and executing different option prediction strategies based on whether the to-be-processed sentence contains a dialogue or not, and determining a target option from the options as an answer of the English choice question.
Optionally, the determining whether the to-be-processed statement contains a dialog includes:
and respectively inputting each sentence to be processed into a preset conversation judgment model for processing, and judging whether the sentence to be processed contains a conversation.
Optionally, the step of inputting each to-be-processed statement into a preset dialogue judgment model for processing, and judging whether the to-be-processed statement contains a dialogue includes:
performing word embedding processing on characters in the sentence to be processed to obtain a characterization vector of the sentence to be processed;
extracting, from the characterization vector, a first feature vector for representing whether the statement to be processed contains a dialogue;
mapping the first feature vector into a probability value of the sentence to be processed containing a dialog;
and judging whether the sentence to be processed contains a dialog or not based on the probability value.
Optionally, based on whether the to-be-processed sentence contains a dialog, different option prediction strategies are executed, and a target option is determined from the options, where the method includes:
when the sentence to be processed contains a dialogue, calculating the text matching degree of an upper sentence and a lower sentence in the sentence to be processed;
and determining a target option based on the text matching degree corresponding to each sentence to be processed.
Optionally, calculating the text matching degree between the upper sentence and the lower sentence in the sentence to be processed includes:
inputting the statement to be processed into an input layer of a Roberta model for embedding processing to obtain an embedded matrix;
inputting the embedded matrix into a coding layer of a Roberta model for processing to obtain a coding matrix;
performing linear transformation on the coding matrix to obtain a second feature vector;
and mapping the second feature vector to the text matching degree of the upper sentence and the lower sentence in the sentence to be processed.
Optionally, the input layer of the Roberta model includes a word embedding layer, a position embedding layer and a segment embedding layer, and inputting the to-be-processed sentence into the input layer of the Roberta model for embedding processing to obtain an embedded matrix includes:
performing word embedding operation on the characters of the sentence to be processed in the word embedding layer to obtain a word embedding matrix;
performing position embedding operation on the characters of the sentence to be processed in the position embedding layer to obtain a position embedding matrix;
performing segmented embedding operation on the statements to be processed in the segmented embedding layer to obtain a segmented embedding matrix;
and adding the word embedded matrix, the position embedded matrix and the segment embedded matrix to obtain an embedded matrix.
Optionally, the encoding layer of the Roberta model includes M multi-head attention layers stacked in sequence, each multi-head attention layer having an input matrix and an output matrix, where M is a positive integer greater than or equal to 2, and inputting the embedded matrix into the encoding layer of the Roberta model for processing to obtain an encoding matrix includes:
inputting the embedded matrix into a first multi-head attention layer as an input matrix of the first multi-head attention layer for processing;
and taking the output matrix of the previous multi-head attention layer as the input matrix of the next multi-head attention layer until the output matrix of the last multi-head attention layer is obtained as the coding matrix.
Optionally, the processing procedure of each multi-head attention layer includes:
processing the input matrix of the multi-head attention layer based on a multi-head attention mechanism to obtain an attention matrix;
adding the attention matrix and the input matrix of the multi-head attention layer to obtain a fusion matrix;
inputting the fusion matrix into a full-connection feedforward layer for processing to obtain a full-connection matrix;
and summing the full-connection matrix and the fusion matrix to obtain an output matrix of the multi-head attention layer.
Optionally, based on whether the to-be-processed sentence contains a dialog, different option prediction strategies are executed, and a target option is determined from the options, where the method includes:
when the statement to be processed does not contain a dialogue, inputting the statement to be processed into a preset grammar error correction model for processing to obtain an error-corrected target statement;
taking an option corresponding to the statement to be processed which is the same as the corrected target statement as an intermediate option;
when the number of the intermediate options is 1, taking the intermediate options as target options;
when the number of the intermediate options is larger than 1, calculating the confusion degree of the to-be-processed statement corresponding to the intermediate options;
and taking the intermediate option corresponding to the sentence to be processed with the minimum confusion degree as a target option.
Optionally, calculating the confusion of the to-be-processed sentence corresponding to the intermediate option includes:
respectively inputting the sentences to be processed corresponding to the intermediate options into a preset confusion degree calculation model for processing to obtain the probability distribution of each character in the sentences to be generated under the context semantic environment;
calculating a confusion degree of the sentence to be processed based on the probability distribution.
Optionally, calculating the confusion of the to-be-processed sentence based on the probability distribution includes:
calculating the product of all probability values in the probability distribution to obtain a first numerical value;
calculating the reciprocal of the first numerical value to obtain a second numerical value;
and calculating the N-th root of the second numerical value as the confusion degree, wherein N is the number of characters in the sentence to be processed.
In a second aspect, an embodiment of the present invention further provides an apparatus for predicting answers to english choice questions, including:
the question acquisition module is used for acquiring a question stem and a plurality of options of the English choice question;
a sentence to be processed determining module, configured to substitute each option into the question stem respectively to obtain a plurality of sentences to be processed;
the dialogue judging module is used for judging whether the sentence to be processed contains dialogue or not;
and the target option determining module is used for executing different option prediction strategies based on whether the sentence to be processed contains a conversation or not, and determining a target option from a plurality of options as an answer of the English choice question.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the English choice question answer prediction method provided by the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the English choice question answer prediction method according to the first aspect of the present invention.
According to the English choice question answer prediction method provided by the embodiment of the invention, after the question stem and the several options of an English choice question are obtained, each option is substituted into the stem to obtain several sentences to be processed, whether the sentences contain a dialogue is judged, and different option prediction strategies are executed accordingly, determining a target option from the options as the answer to the English choice question. English choice questions are divided into a dialogue class and a non-dialogue class, a different option prediction strategy is executed for each class, and the answer is predicted directly, in the way a person solves the question, without a huge exercise bank: data storage space is saved, the problem of failing to obtain a correct answer because the exercise bank does not cover the question is avoided, and answer prediction accuracy is improved.
Drawings
Fig. 1 is a flowchart of a method for predicting answers to english choice questions according to an embodiment of the present invention;
FIG. 2A is a flowchart of a method for predicting English choice question answers according to a second embodiment of the present invention;
FIG. 2B is a schematic structural diagram of an input layer of the Roberta model according to an embodiment of the present invention;
FIG. 2C is a schematic structural diagram of an encoding layer of a Roberta model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an english choice answer prediction apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an English choice question answer prediction method according to an embodiment of the present invention. The present embodiment is applicable to answer prediction for English choice questions. The method may be executed by the English choice question answer prediction apparatus provided by the embodiment of the present invention, and the apparatus may be implemented in software and/or hardware and integrated into the computer device provided by the embodiment of the present invention. As shown in fig. 1, the method may specifically include the following steps:
s101, obtaining a question stem of an English choice question and a plurality of options.
English choice questions in embodiments of the present invention may include English choice questions from the preschool education stage to the high school education stage (the K12 education stage). Of course, other embodiments of the present invention may also cover English choice questions of the higher education stage, and the embodiments of the present invention are not limited herein.
In the embodiment of the invention, when encountering an English choice question that cannot be answered, the user (e.g. a student) can photograph the area where the question stem and the several options of the English choice question are located by means of a user terminal (e.g. a smartphone, a tablet computer or a personal computer), so as to obtain a picture including the question stem and the options. The text content in the picture is recognized by Optical Character Recognition (OCR), either locally on the user terminal or on a server, yielding the question stem and the several options of the English choice question. In other embodiments of the present invention, the user may input the question stem and the options on the user terminal according to a preset format, for example inputting the stem in a stem column and the options in an option column, to likewise obtain the question stem and the several options of the English choice question.
It should be noted that the above manner of obtaining the question stem and the plurality of options of the english choice question is an exemplary illustration and is not a limitation of the embodiment of the present invention, and in other embodiments of the present invention, the question stem and the plurality of options of the english choice question may also be obtained in other manners, and the embodiment of the present invention is not limited herein.
And S102, substituting each option into the question stem respectively to obtain a plurality of sentences to be processed.
Specifically, each option is respectively substituted into the question stem to obtain the sentence to be processed corresponding to that option. Illustratively, an English choice question typically has 4 options: option A, option B, option C and option D. Substituting the 4 options into the question stem respectively yields 4 sentences to be processed.
S103, judging whether the sentence to be processed contains a dialogue or not.
Specifically, whether each sentence to be processed contains a dialogue is judged, thereby determining whether the choice question is of the dialogue type. In the embodiment of the invention, English choice questions are divided into a dialogue class and a non-dialogue class according to the content they examine.
A dialogue-class choice question examines whether, after the option is substituted into the question stem, the upper sentence and the lower sentence join smoothly and fit the context. The stem of a dialogue-class choice question usually comprises 2 sentences, called the upper sentence and the lower sentence, with an obvious separation between them to distinguish the different dialogue characters. A dialogue-class choice question has 2 dialogue characters, who utter the upper sentence and the lower sentence respectively. The stem of a non-dialogue-class choice question usually comprises 1 sentence or several sentences, but there is no separation between two adjacent sentences to distinguish dialogue characters. Based on the above characteristics of dialogue-class choice questions, whether a sentence to be processed contains a dialogue can be judged.
And S104, executing different option prediction strategies based on whether the sentence to be processed contains a conversation or not, and determining a target option from a plurality of options as an answer of the English choice question.
Different option prediction strategies are executed according to whether the sentences to be processed contain a dialogue, and a target option is thereby determined from the options as the answer to the English choice question. Illustratively, in some embodiments of the invention, a first prediction strategy may be executed when the sentences to be processed contain a dialogue, and a second prediction strategy when they do not. The first prediction strategy may be: calculate the text matching degree of the upper sentence and the lower sentence in each sentence to be processed, then take the option corresponding to the sentence with the highest text matching degree as the target option. The text matching degree characterizes whether the upper and lower sentences join smoothly and fit the context. The second prediction strategy may be: first judge whether the sentences to be processed contain grammar errors, then calculate the confusion degree of the sentences without grammar errors, and take the option corresponding to the sentence with the smallest confusion degree as the target option. The confusion degree (perplexity) measures how well a probability distribution or probability model predicts a sample; informally, it describes how likely the sentence to be processed is a normal sentence, i.e., whether the sentence obtained after substituting the option into the stem conforms to people's normal expression habits. It should be noted that the first prediction strategy and the second prediction strategy in the above embodiments are exemplary illustrations of embodiments of the present invention, and are not limiting. A minimal sketch of this overall flow is given below.
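The following is a minimal sketch of the two-strategy dispatch, assuming the blank in the stem is marked "___". The helpers contains_dialogue, dialogue_match_score, correct_grammar and perplexity are hypothetical stand-ins for the models detailed in the second embodiment.

```python
# A minimal sketch of the dispatch described above; all four helper
# functions are hypothetical stand-ins for the models of this disclosure
# (dialogue judgment, matching, grammar correction, confusion degree).

def predict_answer(stem: str, options: list[str]) -> str:
    # S102: substitute each option into the blank of the question stem.
    candidates = [stem.replace("___", opt) for opt in options]

    # S103: all candidates share one stem, so one dialogue check suffices.
    if contains_dialogue(candidates[0]):
        # First strategy: highest text matching degree of upper/lower sentence.
        scores = [dialogue_match_score(c) for c in candidates]
        return options[scores.index(max(scores))]

    # Second strategy: keep options whose sentence is returned unchanged by
    # the grammar error correction model, then break ties by lowest confusion.
    kept = [i for i, c in enumerate(candidates) if correct_grammar(c) == c]
    if len(kept) == 1:
        return options[kept[0]]
    kept = kept or list(range(len(candidates)))  # fallback: nothing survived
    return options[min(kept, key=lambda i: perplexity(candidates[i]))]
```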
According to the English choice question answer prediction method provided by the embodiment of the invention, after the question stem and the several options of an English choice question are obtained, each option is substituted into the stem to obtain several sentences to be processed; whether the sentences contain a dialogue is judged; and different option prediction strategies are executed accordingly, determining a target option from the options as the answer to the English choice question. Because English choice questions are divided into a dialogue class and a non-dialogue class, with a different option prediction strategy for each, the answer is predicted directly, in the way a person solves the question, without relying on a huge question bank: data storage space is saved, the problem of failing to obtain a correct answer because the question bank does not cover the question is avoided, and answer prediction accuracy is improved.
Example two
Fig. 2A is a flowchart of an English choice question answer prediction method according to a second embodiment of the present invention, which refines the first embodiment and describes the specific processes of the different prediction strategies in detail. As shown in fig. 2A, the method includes:
s201, obtaining a question stem of an English choice question and a plurality of options.
As described in the foregoing embodiment, English choice questions in the embodiments of the present invention may include English choice questions from the preschool education stage to the high school education stage (the K12 education stage), and in other embodiments may also include English choice questions of the higher education stage; the embodiments of the present invention are not limited herein. The question stem and the several options of the English choice question may be obtained by photographing or by manual input by the user, and the embodiment of the present invention is not limited herein.
And S202, substituting each option into the question stem respectively to obtain a plurality of sentences to be processed.
Specifically, each option is respectively substituted into the question stem to obtain a plurality of sentences to be processed corresponding to each option.
And S203, judging whether the sentence to be processed contains a dialogue or not.
Specifically, in the embodiment of the invention, English choice questions are divided into a dialogue class and a non-dialogue class according to the content they examine. The non-dialogue class is further divided into a grammar error correction class and an expression class: grammar-error-correction choice questions examine English grammar, and expression-class choice questions examine normal expression habits.
Judging whether a sentence to be processed contains a dialogue is essentially a binary classification problem: a dialogue judgment model is pre-trained, and each sentence to be processed is then input into the trained dialogue judgment model for processing, whereby whether it contains a dialogue can be judged. Specifically, the dialogue judgment model may be a common convolutional neural network model, and it processes the sentence to be processed as follows:
1. and performing word embedding processing on characters in the sentence to be processed to obtain a characterization vector of the sentence to be processed.
A character can be understood as a placeholder in the sentence to be processed; each word or punctuation mark corresponds to one character. In the embodiment of the invention, word embedding is applied to the characters of the sentence to be processed to obtain its characterization vector. Word Embedding refers to converting a word into a vector representation; specifically, it may be implemented with one-hot encoding or the Word2Vec algorithm, which is not described again here.
2. And extracting a first feature vector for characterizing whether the sentence to be processed contains the dialogue or not from the characterization vectors.
For example, the characterization vector may be input into a convolutional network to perform convolution, pooling, and the like, so as to obtain a first feature vector that characterizes whether the to-be-processed sentence includes a dialog. Specifically, the convolutional network may include one or more convolutional layers, and the embodiments of the present invention are not limited herein.
3. And mapping the first feature vector into a probability value of the dialog included in the statement to be processed.
Specifically, the first feature vector may be mapped into [0, 1] by a classification function, so as to obtain the probability value that the sentence to be processed contains a dialogue. In particular, the classification function may be a sigmoid function.
4. And judging whether the sentence to be processed contains the dialogue or not based on the probability value.
Specifically, the obtained probability value is compared with a preset value (which can be set, e.g., to 0.8): when the probability value is greater than or equal to the preset value, it is determined that the sentence to be processed contains a dialogue; when it is smaller than the preset value, that it does not. A sketch of this model is given below.
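A minimal sketch of such a dialogue judgment model, assuming a PyTorch implementation with a single convolutional layer (the embodiment allows one or more) and illustrative layer sizes:

```python
import torch
import torch.nn as nn

# Steps 1-4 above: word embedding -> convolution and pooling ->
# sigmoid probability -> comparison with the preset value.
class DialogueClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                # step 1
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)  # step 2
        self.fc = nn.Linear(64, 1)                                      # step 3

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids).transpose(1, 2)       # (batch, embed_dim, len)
        x = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over sequence
        return torch.sigmoid(self.fc(x)).squeeze(-1)    # probability of a dialogue

# Step 4: compare the probability of a single sentence with the preset
# value (0.8 in the embodiment).
def contains_dialogue(model: DialogueClassifier, token_ids: torch.Tensor,
                      preset: float = 0.8) -> bool:
    with torch.no_grad():
        return bool(model(token_ids).item() >= preset)
```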
When the sentence to be processed contains a dialogue, step S204 to step S205 are executed, and when the sentence to be processed does not contain a dialogue, step S206 to step S211 are executed.
And S204, calculating the text matching degree of the upper sentence and the lower sentence in the sentence to be processed.
Specifically, when the sentences to be processed contain a dialogue, the English choice question is a dialogue-class choice question, which examines whether, after the option is substituted into the question stem, the upper sentence and the lower sentence join smoothly and fit the context. Therefore, the text matching degree of the upper and lower sentences in each sentence to be processed can be calculated, and the target option determined according to the text matching degree.
In some embodiments of the present invention, the process of calculating the text matching degree between the upper sentence and the lower sentence in the sentence to be processed is as follows:
1. and inputting the statement to be processed into an input layer of the Roberta model for embedding processing to obtain an embedded matrix.
Specifically, the input layer of the Roberta (Robustly Optimized BERT Pretraining Approach) model performs a word embedding (Token Embedding) operation on the characters of the sentence to be processed to obtain a word embedding matrix, a position embedding (Position Embedding) operation on the characters to obtain a position embedding matrix, and a segment embedding (Segment Embedding) operation on the sentence to obtain a segment embedding matrix. The word embedding operation represents each character by its code in a dictionary and converts it into a word embedding vector of fixed dimension; the matrix formed by all word embedding vectors is the word embedding matrix. The position embedding operation numbers the position of each character in the sentence; each number corresponds to a vector, and combining the position vector with the word vector introduces position information for each word. The segment embedding operation encodes the sentence to be processed so as to distinguish its different sentences: all words of one of two adjacent sentences are assigned 0 and all words of the other are assigned 1, the starting position of each sentence is marked by a CLS identifier, and the ending position by an SEP identifier.
Fig. 2B is a schematic structural diagram of input layers of the Roberta model according to an embodiment of the present invention, where exemplary input layers of the Roberta model include a word embedding layer, a position embedding layer, and a segment embedding layer, as shown in fig. 2B. Inputting the statement to be processed into an input layer of a Roberta model for embedding processing to obtain an embedded matrix, wherein the embedding process comprises the following steps:
and performing word embedding operation on the characters of the sentence to be processed in the word embedding layer to obtain a word embedding matrix. And performing position embedding operation on the characters of the sentence to be processed in the position embedding layer to obtain a position embedding matrix. And carrying out segmentation embedding operation on the sentences to be processed in the segmentation embedding layer to obtain a segmentation embedding matrix. And adding the word embedded matrix, the position embedded matrix and the segmentation embedded matrix to obtain an embedded matrix E.
2. And inputting the embedded matrix into an encoding layer of the Roberta model for processing to obtain an encoding matrix.
The encoding layer of the Roberta model processes the embedded matrix E based on a multi-head self-attention mechanism to obtain an encoding matrix T. Fig. 2C is a schematic structural diagram of the encoding layer of the Roberta model according to an embodiment of the present invention. As shown in fig. 2C, the encoding layer of the Roberta model includes M multi-head attention layers stacked in sequence, each with an input matrix and an output matrix, where M is a positive integer greater than or equal to 2. The multi-head attention layer is commonly referred to as a Transformer layer; thus the encoding layer of the Roberta model consists of M stacked Transformer layers. Illustratively, in the embodiment of the present invention, M = 12. Inputting the embedded matrix E into the encoding layer of the Roberta model for processing to obtain the encoding matrix T includes:
and inputting the embedded matrix E as an input matrix of the first-layer multi-headed attention layer into the first-layer multi-headed attention layer for processing.
And taking the output matrix of the previous multi-head attention layer as the input matrix of the next multi-head attention layer, and so on until the output matrix of the last multi-head attention layer is obtained as the coding matrix T.
Each multi-head attention layer processes the input matrix based on a multi-head attention mechanism, and the processing process of the multi-head attention layer is as follows:
1) and processing the input matrix of the multi-head attention layer based on the multi-head attention mechanism to obtain an attention matrix.
Specifically, taking the first Multi-Head Attention layer as an example, the Multi-Head Attention mechanism (Multi-Head Attention) is processed as follows:
First, three linear transformations with three different linear transformation coefficients are applied to the embedded matrix E, yielding the matrix Q, the matrix K and the matrix V respectively:

Q = E·W_i^Q
K = E·W_i^K
V = E·W_i^V

where W_i^Q, W_i^K and W_i^V are the linear transformation coefficients of the matrices Q, K and V for the i-th multi-head attention layer, i = 1, ..., M.

Then, the matrix Q, the matrix K and the matrix V are each linearly transformed m times, yielding matrices Q_i, K_i and V_i, where i = 1, ..., m and m is the number of attention heads of the multi-head attention layer. Illustratively, the processing of the multi-head attention layer is described taking m = 2 as an example.

Next, the dot product of the matrix Q_i and the transpose of the matrix K_i is computed to obtain a first sub-matrix; the quotient of the first sub-matrix and the square root of the dimension of the matrix K_i gives a second sub-matrix; the second sub-matrix is normalized to obtain a third sub-matrix; and the dot product of the third sub-matrix and the matrix V_i gives a fourth sub-matrix (i.e., head_i):

head_i = Attention(Q_i, K_i, V_i) = softmax(Q_i·K_i^T / √d_1)·V_i

where Q_i·K_i^T is the dot product of the matrix Q_i and K_i^T, K_i^T is the transposed matrix of K_i, d_1 is the dimension of the matrix K_i, and softmax is the normalization.

Finally, the m fourth sub-matrices head_i are concatenated to obtain a concatenation matrix, and a linear transformation is applied to the concatenation matrix to obtain the attention matrix M:

M = MultiHead(Q, K, V) = concat(head_1, ..., head_m)·W^0

where concat denotes matrix concatenation and W^0 is the linear transformation coefficient for the concatenation matrix. A sketch of this computation follows.
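A sketch of the multi-head attention computation above, assuming PyTorch. The nn.Linear modules stand in for the linear transformation coefficients W^Q, W^K, W^V and W^0, and the per-head dimension d_1 = hidden / m follows the usual convention; both are assumptions of this sketch.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, hidden: int, m: int):
        super().__init__()
        assert hidden % m == 0
        self.m, self.d1 = m, hidden // m
        self.wq, self.wk, self.wv = (nn.Linear(hidden, hidden) for _ in range(3))
        self.w0 = nn.Linear(hidden, hidden)  # applied to the concatenation

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        b, n, _ = e.shape
        # Q, K, V, each split into m heads: (batch, m, n, d_1).
        q, k, v = (w(e).view(b, n, self.m, self.d1).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d1)  # Q_i K_i^T / sqrt(d_1)
        heads = torch.softmax(scores, dim=-1) @ v              # softmax(...) V_i
        # concat(head_1, ..., head_m) followed by W^0 gives the attention matrix M.
        return self.w0(heads.transpose(1, 2).reshape(b, n, self.m * self.d1))
```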
2) And adding the attention matrix and the input matrix of the multi-head attention layer to obtain a fusion matrix.
And adding an Attention matrix output by a Multi-Head Attention mechanism (Multi-Head Attention) and an input matrix of a Multi-Head Attention layer to obtain a fusion matrix. As shown in fig. 2C, taking the first multi-head attention layer as an example, the attention matrix M output by the multi-head attention mechanism is summed with the input matrix (embedded matrix E) of the multi-head attention layer to obtain a fusion matrix.
In an embodiment of the present invention, in order to accelerate the convergence of the network, the attention matrix M may first be normalized (denoted "norm" in the figure). In order to reduce overfitting, the normalized matrix is then input into a dropout layer for a random-drop operation, giving a matrix M_1. The output of the dropout layer is then residual-connected with the input of the multi-head attention layer, i.e., the matrix M_1 is added to the embedded matrix E, to obtain a fusion matrix M_2.
3) And inputting the fusion matrix into the full-connection feedforward layer for processing to obtain the full-connection matrix.
Specifically, in the embodiment of the present invention, in order to accelerate the convergence of the network, layer normalization may first be applied to the fusion matrix M_2, giving a matrix M_3. The layer normalization is:

t_i = α ⊙ (x_i − μ_L) / √(σ_L² + ε) + β

where t_i denotes the result of normalizing the i-th row x_i of the fusion matrix M_2, μ_L and σ_L² denote the mean and variance of each sample respectively, α and β denote the parameter vectors for scaling and translation, ⊙ denotes element-wise multiplication, and ε is a bias parameter that prevents the denominator from being zero. The matrix M_3 is obtained after every row has been normalized.
Then, the matrix M_3 is input into a Fully Connected Feed-Forward Network (FFN) for processing, giving a fully connected matrix M_4:

M_4 = FFN(M_3) = Max(0, M_3·W_1 + b_1)·W_2 + b_2

Specifically, the fully connected feed-forward layer first applies to the matrix M_3 a linear transformation with parameters (W_1, b_1), giving M_3·W_1 + b_1; the nonlinear activation function Max(0, a), which replaces the negative elements of M_3·W_1 + b_1 with 0, is then applied; and the activated matrix is linearly transformed once more with parameters (W_2, b_2).
4) And summing the full-connection matrix and the fusion matrix to obtain an output matrix of the multi-head attention layer.
Specifically, in an embodiment of the present invention, in order to accelerate the convergence of the network and reduce overfitting, normalization and a random-drop operation may first be applied to the fully connected matrix M_4; the dropout output is then residual-connected with the input of the fully connected feed-forward layer, i.e., added to the fusion matrix M_2, to obtain the output matrix of the multi-head attention layer.
By analogy, the output matrix of each multi-head attention layer serves as the input matrix of the next multi-head attention layer, until the output matrix of the last multi-head attention layer is obtained as the encoding matrix T. A sketch of one such layer, and of the stack of M layers, is given below.
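The following sketch assembles one complete multi-head attention layer from steps 1) to 4), reusing the MultiHeadAttention sketch above. The norm and dropout placement follows the embodiment's description; the hidden sizes and dropout rate are assumptions.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, hidden: int = 768, m: int = 2, ffn_dim: int = 3072,
                 p: float = 0.1):
        super().__init__()
        self.attn = MultiHeadAttention(hidden, m)
        self.norm1 = nn.LayerNorm(hidden)  # normalization of the attention matrix M
        self.norm2 = nn.LayerNorm(hidden)  # layer normalization giving M_3
        self.norm3 = nn.LayerNorm(hidden)  # normalization of the matrix M_4
        self.drop1, self.drop2 = nn.Dropout(p), nn.Dropout(p)
        self.ffn = nn.Sequential(          # Max(0, x W_1 + b_1) W_2 + b_2
            nn.Linear(hidden, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, hidden))

    def forward(self, x):
        m2 = x + self.drop1(self.norm1(self.attn(x)))  # steps 1)-2): fusion matrix M_2
        m4 = self.ffn(self.norm2(m2))                  # step 3): M_3 -> FFN -> M_4
        return m2 + self.drop2(self.norm3(m4))         # step 4): output matrix

# The encoding layer stacks M such layers (M = 12 in the embodiment);
# the output matrix of the last layer is the encoding matrix T.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(12)])
```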
3. And performing linear transformation on the coding matrix to obtain a second feature vector.
Specifically, the coding matrix T is linearly transformed once more, converting it into the second feature vector.
4. And mapping the second feature vector into the text matching degree of the upper sentence and the lower sentence in the sentence to be processed.
Specifically, the second feature vector is mapped into [0, 1] by a classification function, giving the text matching degree of the upper and lower sentences of the sentence to be processed; the text matching degree characterizes whether the upper and lower sentences join smoothly and fit the context. In particular, the classification function may be a sigmoid function or a softmax function, as sketched below.
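A minimal sketch of steps 3 and 4; reading the sentence-pair representation from the first position of the encoding matrix T (the CLS identifier) is an assumption of this sketch.

```python
import torch
import torch.nn as nn

# Linearly transform the encoding matrix T into the second feature vector
# and map it by sigmoid to a matching degree in [0, 1].
class MatchingHead(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.proj = nn.Linear(hidden, 1)  # linear transformation of T

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        cls = t[:, 0, :]                                   # (batch, hidden)
        return torch.sigmoid(self.proj(cls)).squeeze(-1)   # text matching degree
```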
S205, determining target options based on the text matching degree corresponding to each sentence to be processed.
The higher the text matching degree, the more smoothly the upper and lower sentences of the sentence to be processed join and the better they fit the context. In the embodiment of the invention, the option corresponding to the sentence to be processed with the highest text matching degree is taken as the target option, which is the predicted answer of the English choice question.
And S206, inputting the statement to be processed into a preset grammar error correction model for processing to obtain an error-corrected target statement.
Specifically, when the to-be-processed statement does not contain a dialogue, the to-be-processed statement is input into a preset grammar error correction model for processing, and an error-corrected target statement is obtained.
A grammar-error-correction choice question is essentially a seq2seq problem, and the grammar error correction model is a seq2seq model: its input is a sentence that may contain grammar errors, and its output is the grammatically corrected sentence. The seq2seq model is a common instance of the encoder-decoder structure. The encoder compresses the input sequence into a vector of specified length, which can be regarded as the semantics of the sequence; this process is called encoding, and the simplest way to obtain the semantic vector is to directly use the hidden state of the last input as the semantic vector C. Alternatively, the last hidden state can be transformed to obtain the semantic vector, or all hidden states of the input sequence can be transformed to obtain it.
The decoder generates the specified sequence from the semantic vector; this process is called decoding. The simplest mode is to feed the semantic vector obtained by the encoder into the decoder RNN as its initial state: the output of the previous moment serves as the input of the current moment, and the semantic vector C participates only as the initial state, the subsequent operations being unrelated to C. In another decoding mode, the semantic vector C participates in the operations at all time steps of the sequence: the output of the previous time step is still the input of the current one, but C takes part at every step. A minimal sketch of this structure follows.
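A minimal sketch of the first decoding mode, assuming GRU-based recurrent networks (the embodiment only requires RNNs); all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class Seq2SeqCorrector(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor):
        # Encoding: the last hidden state serves as the semantic vector C.
        _, c = self.encoder(self.embed(src_ids))
        # Decoding: previous-step tokens as input, C as the initial state only.
        dec_out, _ = self.decoder(self.embed(tgt_ids), c)
        return self.out(dec_out)  # logits over the vocabulary per time step
```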
And S207, taking an option corresponding to the to-be-processed statement which is the same as the corrected target statement as an intermediate option.
Specifically, the error-corrected target sentence is compared with the input sentence to be processed to see whether they are the same. If they are the same, the sentence obtained after substituting the option into the question stem has no grammar error; if they are not the same, it has a grammar error. The option corresponding to a sentence to be processed that is the same as its error-corrected target sentence is taken as an intermediate option; that is, the options whose sentences have no grammar error are screened out as intermediate options.
And S208, judging whether the number of the intermediate options is more than 1.
Specifically, it is determined whether the number of intermediate options is greater than 1, if the number of intermediate options is equal to 1, step S209 is executed, and if the number of intermediate options is greater than 1, step S210-step S211 are executed.
And S209, taking the intermediate option as a target option.
Specifically, when the number of the screened intermediate options is 1, that is, only one option is substituted into the question stem to obtain a sentence to be processed without grammar errors, and the rest sentences all have grammar errors, the only intermediate option can be regarded as the target option, that is, the answer of the english choice question.
And S210, calculating the confusion degree of the to-be-processed statement corresponding to the intermediate option.
Specifically, a to-be-processed sentence without grammar errors may still conform to grammar rules while not conforming to people's normal expression habits. For example, with the stem "I ___ Natural Language Processing" and the option "hit", the sentence obtained by substituting the option into the stem has no grammatical error, yet obviously does not accord with normal expression habits. When the number of intermediate options is greater than 1, i.e., the sentences obtained by substituting several options into the stem all lack grammar errors, the confusion degree of the sentence to be processed corresponding to each intermediate option must be calculated, and the target option determined from these confusion degrees. Specifically, the confusion degree of a sentence to be processed is calculated as follows:
1. and respectively inputting the sentences to be processed corresponding to the intermediate options into a preset confusion degree calculation model for processing to obtain the probability distribution of each character in the sentences to be generated under the context semantic environment.
For expression-class choice questions, what is examined is the semantic content, namely whether an option is unsuitable for the blank because the resulting sentence does not accord with people's expression habits. Whether an expression is appropriate is often judged by how frequently it appears; therefore a model can be trained on a large quantity of everyday English corpora to measure whether the word filled into the blank is appropriate and, if so, how well it fits.
In the embodiment of the invention, a Generative Pre-Training model (GPT) is introduced as the confusion degree calculation model, specifically the GPT2.0 model. GPT2.0 is a pre-trained language model from the OpenAI team trained on about 10 million articles, close to 40 GB of text, so it has learned a great number of associations between texts and has very strong semantic expression ability. It is also one of the few unidirectional language models among pre-trained language models; the computation of a unidirectional language model is closer to that of a normal language model, which improves the precision of the confusion degree calculation.
The GPT2.0 model handles NLP tasks by pre-training followed by downstream fine-tuning, which addresses the problem of dynamic semantics. After GPT2.0 processing, the probability distribution of each character of the sentence to be generated, given the contextual semantic environment, is obtained for the sentence to be processed corresponding to each intermediate option.
2. And calculating the confusion degree of the sentence to be processed based on the probability distribution.
Next, the confusion degree PP(S) of each corresponding sentence to be processed is calculated from the probability distribution:

PP(S) = P(w_1 w_2 ... w_N)^(−1/N) = (1 / P(w_1 w_2 ... w_N))^(1/N)

Specifically, the product of all probability values in the probability distribution is first calculated, giving a first value P(w_1 w_2 ... w_N), where w_1 w_2 ... w_N are the characters of the sentence to be processed, N is the number of characters, and S is the input sentence of length N. The reciprocal of the first value is then calculated, giving a second value, and the N-th root of the second value is taken as the confusion degree. To simplify the computation, the above product can be converted into a sum of logarithms, where P(w_i | w_1:i−1) denotes the probability value of the i-th character given the contextual semantic environment. A sketch of this computation follows.
And S211, taking the middle option corresponding to the to-be-processed sentence with the minimum confusion as a target option.
And after calculating the confusion degree of the to-be-processed sentence corresponding to each intermediate option, taking the intermediate option corresponding to the to-be-processed sentence with the minimum confusion degree as a target option, wherein the target option is the answer of the predicted English choice question.
According to the English choice question answer prediction method provided by the embodiment of the invention, English choice questions are divided into a dialogue class and a non-dialogue class, different option prediction strategies are executed for the two classes, and a target option is determined from the options as the answer, so the answer is predicted directly in the way a person solves the question: no huge question bank is required, data storage space is saved, the problem of failing to obtain a correct answer because the question bank does not cover the question is avoided, and answer prediction accuracy is improved. Moreover, answers are predicted from several perspectives at once, such as context, semantics, grammar and expression habits, which improves the accuracy of the model. The confusion degree calculation model introduces external data, drawing strong semantic information from massive data to assist the prediction of the final answer and to improve the reliability, accuracy and generalization ability of the model.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an English choice question answer prediction apparatus according to a third embodiment of the present invention. As shown in Fig. 3, the apparatus includes:
the question acquisition module 301 is used for acquiring a question stem and a plurality of options of an English choice question;
a to-be-processed sentence determining module 302, configured to substitute each option into the question stem respectively to obtain a plurality of to-be-processed sentences;
a dialogue judging module 303, configured to judge whether the to-be-processed sentence includes a dialogue;
and a target option determining module 304, configured to execute different option prediction strategies based on whether the to-be-processed sentence contains a dialog, and determine a target option from the multiple options as an answer to the english choice question.
In some embodiments of the present invention, the dialog determination module 303 is further configured to:
and respectively inputting each sentence to be processed into a preset conversation judgment model for processing, and judging whether the sentence to be processed contains a conversation.
In some embodiments of the present invention, the dialog determination module 303 comprises:
the word embedding submodule is used for carrying out word embedding processing on characters in the statement to be processed to obtain a representation vector of the statement to be processed;
the first feature vector extraction submodule is used for extracting a first feature vector used for representing whether the statement to be processed contains a dialogue or not from the representation vector;
a probability value obtaining submodule, configured to map the first feature vector to a probability value that the to-be-processed sentence contains a dialog;
and the dialogue judgment submodule is used for judging whether the sentence to be processed contains dialogue or not based on the probability value.
In some embodiments of the present invention, the target option determination module 304 comprises:
the text matching degree calculation sub-module is used for calculating the text matching degree of the upper sentence and the lower sentence in the sentence to be processed when the sentence to be processed contains a dialogue;
and the first target option determining sub-module is used for determining a target option based on the text matching degree corresponding to each sentence to be processed.
In some embodiments of the invention, the text matching degree calculation sub-module comprises:
the embedding processing unit is used for inputting the statement to be processed into an input layer of a Roberta model for embedding processing to obtain an embedding matrix;
the encoding matrix acquisition unit is used for inputting the embedded matrix into an encoding layer of a Roberta model for processing to obtain an encoding matrix;
the linear transformation unit is used for carrying out linear transformation on the coding matrix to obtain a second characteristic vector;
and the text matching degree acquisition unit is used for mapping the second feature vector into the text matching degrees of the upper sentence and the lower sentence in the sentence to be processed.
In some embodiments of the present invention, the input layers of the Roberta model include a word embedding layer, a position embedding layer, and a segment embedding layer, and the embedding processing unit includes:
the word embedding subunit is used for carrying out word embedding operation on the characters of the sentence to be processed in the word embedding layer to obtain a word embedding matrix;
the position embedding subunit is used for carrying out position embedding operation on the characters of the sentence to be processed in the position embedding layer to obtain a position embedding matrix;
the segment embedding subunit is used for carrying out segment embedding operation on the statement to be processed in the segment embedding layer to obtain a segment embedding matrix;
and the adding subunit is used for adding the word embedded matrix, the position embedded matrix and the segmented embedded matrix to obtain an embedded matrix.
In some embodiments of the present invention, the encoding layer of the Roberta model includes M multi-headed attention layers stacked in sequence, the multi-headed attention layer having an input matrix and an output matrix, M being a positive integer greater than or equal to 2, the encoding matrix obtaining unit includes:
the first processing subunit is used for inputting the embedded matrix into a first-layer multi-head attention layer for processing as an input matrix of the first-layer multi-head attention layer;
and the second processing subunit is used for taking the output matrix of the previous multi-head attention layer as the input matrix of the next multi-head attention layer until the output matrix of the last multi-head attention layer is obtained as the coding matrix.
In some embodiments of the invention, the second processing subunit is to:
processing the input matrix of the multi-head attention layer based on a multi-head attention mechanism to obtain an attention matrix;
adding the attention matrix and the input matrix of the multi-head attention layer to obtain a fusion matrix;
inputting the fusion matrix into a full-connection feedforward layer for processing to obtain a full-connection matrix;
and summing the full-connection matrix and the fusion matrix to obtain an output matrix of the multi-head attention layer.
In some embodiments of the present invention, the target option determination module 304 further comprises:
the error correction processing submodule is used for inputting the statement to be processed into a preset grammar error correction model for processing when the statement to be processed does not contain a dialogue, so as to obtain an error-corrected target statement;
an intermediate option obtaining sub-module, configured to use an option corresponding to the to-be-processed statement that is the same as the target statement after error correction as an intermediate option;
a second target option determining sub-module, configured to, when the number of the intermediate options is 1, take the intermediate option as the target option;
the confusion degree calculation sub-module is used for calculating the confusion degree of the to-be-processed sentence corresponding to the intermediate options when the number of the intermediate options is larger than 1;
and the third target option determining sub-module is used for taking the intermediate option corresponding to the to-be-processed sentence with the minimum confusion degree as the target option. This branch is sketched below.
In some embodiments of the invention, the confusion degree calculation sub-module comprises:
a probability distribution obtaining unit, configured to input the to-be-processed sentences corresponding to the intermediate options into a preset confusion degree calculation model respectively for processing, so as to obtain probability distribution of each character in the to-be-generated sentences in a context semantic environment;
a confusion degree calculation unit for calculating a confusion degree of the sentence to be processed based on the probability distribution.
In some embodiments of the invention, the confusion calculation unit comprises:
the first numerical value calculating subunit is used for calculating the product of all probability values in the probability distribution to obtain a first numerical value;
the second numerical value calculation subunit is used for calculating the reciprocal of the first numerical value to obtain a second numerical value;
and the confusion degree calculation subunit is used for calculating the N-th root of the second numerical value as the confusion degree, wherein N is the number of characters in the sentence to be processed.
The English choice question answer prediction device can execute the method provided by any embodiment of the invention, and has functional modules and beneficial effects corresponding to the executed method.
EXAMPLE IV
A fourth embodiment of the present invention provides a computer device. Fig. 4 is a schematic structural diagram of the computer device provided in the fourth embodiment of the present invention. As shown in Fig. 4, the computer device includes a processor 401, a memory 402, a communication module 403, an input device 404, and an output device 405. The number of processors 401 in the computer device may be one or more; one processor 401 is taken as an example in Fig. 4. The processor 401, the memory 402, the communication module 403, the input device 404, and the output device 405 in the computer device may be connected by a bus or other means; connection by a bus is taken as an example in Fig. 4. These components may be integrated on a control board of the computer device.
The memory 402, as a computer-readable storage medium, can be used for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the English choice question answer prediction method in this embodiment. The processor 401 executes various functional applications and data processing of the computer device by running the software programs, instructions, and modules stored in the memory 402, that is, implements the English choice question answer prediction method provided by the above embodiments.
The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 402 may further include memory located remotely from the processor 401, which may be connected to a computer device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 403 is configured to establish a connection with an external device (e.g., an intelligent terminal) and to implement data interaction with the external device. The input device 404 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device.
The computer device provided by this embodiment may execute the English choice question answer prediction method provided by any of the above embodiments of the present invention, and has the corresponding functions and beneficial effects.
EXAMPLE V
An embodiment of the present invention provides a storage medium containing computer-executable instructions. A computer program is stored on the storage medium, and when the computer program is executed by a processor, the English choice question answer prediction method provided in any of the above embodiments of the present invention is implemented, the method comprising:
acquiring a question stem and a plurality of options of an English choice question;
substituting each option into the question stem respectively to obtain a plurality of sentences to be processed;
judging whether the statement to be processed contains a dialogue or not;
and executing different option prediction strategies based on whether the sentence to be processed contains a dialogue or not, and determining a target option from a plurality of options as an answer of the English choice question.
Of course, the storage medium containing computer-executable instructions provided by the embodiment of the present invention is not limited to the method operations described above; the instructions may also perform related operations in the English choice question answer prediction method provided by any embodiment of the present invention.
It should be noted that, as for the apparatus, the device and the storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and in relevant places, reference may be made to the partial description of the method embodiments.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the english choice question answer prediction method according to any embodiment of the present invention.
It should be noted that, in the above apparatus, each of the modules, sub-modules, units and sub-units included in the apparatus is merely divided according to functional logic, but is not limited to the above division as long as the corresponding function can be achieved; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. A method for predicting English choice question answers is characterized by comprising the following steps:
acquiring a question stem and a plurality of options of an English choice question;
respectively substituting each option into the question stem to obtain a plurality of sentences to be processed;
judging whether the statement to be processed contains a dialogue or not;
and executing different option prediction strategies based on whether the sentence to be processed contains a dialogue or not, and determining a target option from a plurality of options as an answer of the English choice question.
2. The method of claim 1, wherein judging whether the sentence to be processed contains a dialogue comprises:
and respectively inputting each sentence to be processed into a preset conversation judgment model for processing, and judging whether the sentence to be processed contains a conversation.
3. The method of claim 2, wherein the step of inputting each sentence to be processed into a preset dialogue judgment model for processing to judge whether the sentence to be processed contains a dialogue comprises:
performing word embedding processing on characters in the sentence to be processed to obtain a characterization vector of the sentence to be processed;
extracting a first feature vector for representing whether the statement to be processed contains a dialogue or not from the representation vector;
mapping the first feature vector into a probability value of the sentence to be processed containing a dialog;
and judging whether the sentence to be processed contains a dialog or not based on the probability value.
4. The English choice question answer prediction method of any one of claims 1 to 3, wherein executing different option prediction strategies based on whether the sentence to be processed contains a dialogue, and determining the target option from the plurality of options, comprises:
when the sentence to be processed contains a dialogue, calculating the text matching degree of an upper sentence and a lower sentence in the sentence to be processed;
and determining a target option based on the text matching degree corresponding to each sentence to be processed.
5. The English choice question answer prediction method of claim 4, wherein calculating the text matching degree of the upper sentence and the lower sentence in the sentence to be processed comprises:
inputting the statement to be processed into an input layer of a Roberta model for embedding processing to obtain an embedded matrix;
inputting the embedded matrix into a coding layer of a Roberta model for processing to obtain a coding matrix;
performing linear transformation on the coding matrix to obtain a second eigenvector;
and mapping the second feature vector to the text matching degree of the upper sentence and the lower sentence in the sentence to be processed.
6. The method of claim 5, wherein the input layer of the Roberta model includes a word embedding layer, a position embedding layer, and a segment embedding layer, and inputting the sentence to be processed into the input layer of the Roberta model for embedding processing to obtain an embedding matrix comprises:
performing word embedding operation on the characters of the sentence to be processed in the word embedding layer to obtain a word embedding matrix;
performing position embedding operation on the characters of the sentence to be processed in the position embedding layer to obtain a position embedding matrix;
performing a segment embedding operation on the sentence to be processed in the segment embedding layer to obtain a segment embedding matrix;
and adding the word embedding matrix, the position embedding matrix, and the segment embedding matrix to obtain the embedding matrix.
7. The method of claim 5, wherein the coding layer of the Roberta model includes M multi-head attention layers stacked in sequence, each multi-head attention layer has an input matrix and an output matrix, M is a positive integer greater than or equal to 2, and inputting the embedding matrix into the coding layer of the Roberta model for processing to obtain a coding matrix comprises:
inputting the embedded matrix as an input matrix of a first multi-head attention layer into the first multi-head attention layer for processing;
and taking the output matrix of the previous multi-head attention layer as the input matrix of the next multi-head attention layer until the output matrix of the last multi-head attention layer is obtained as the coding matrix.
8. The method of claim 7, wherein the processing procedure of each multi-head attention layer comprises:
processing the input matrix of the multi-head attention layer based on a multi-head attention mechanism to obtain an attention matrix;
adding the attention matrix and the input matrix of the multi-head attention layer to obtain a fusion matrix;
inputting the fusion matrix into a full-connection feedforward layer for processing to obtain a full-connection matrix;
and summing the full-connection matrix and the fusion matrix to obtain an output matrix of the multi-head attention layer.
9. The English choice question answer prediction method of any one of claims 1-3 and 5-8, wherein executing different option prediction strategies based on whether the sentence to be processed contains a dialogue, and determining the target option from the plurality of options, comprises:
when the statement to be processed does not contain a dialogue, inputting the statement to be processed into a preset grammar error correction model for processing to obtain an error-corrected target statement;
taking an option corresponding to the statement to be processed which is the same as the corrected target statement as an intermediate option;
when the number of the intermediate options is 1, taking the intermediate option as the target option;
when the number of the intermediate options is larger than 1, calculating the confusion degree of the to-be-processed statement corresponding to the intermediate options;
and taking the intermediate option corresponding to the sentence to be processed with the minimum confusion degree as a target option.
10. The English choice question answer prediction method of claim 9, wherein calculating the confusion degree of the sentence to be processed corresponding to the intermediate option comprises:
respectively inputting the sentences to be processed corresponding to the intermediate options into a preset confusion degree calculation model for processing to obtain the probability distribution of each character in the sentences to be generated under the context semantic environment;
calculating a confusion degree of the sentence to be processed based on the probability distribution.
11. The English choice question answer prediction method of claim 10, wherein calculating the confusion degree of the sentence to be processed based on the probability distribution comprises:
calculating the product of all probability values in the probability distribution to obtain a first numerical value;
calculating the reciprocal of the first numerical value to obtain a second numerical value;
and calculating the N power root of the second numerical value as the confusion degree, wherein N is the number of characters in the sentence to be processed.
12. An English choice question answer prediction apparatus, comprising:
the question acquisition module is used for acquiring a question stem and a plurality of options of the English choice question;
a sentence to be processed determining module, configured to substitute each option into the question stem respectively to obtain a plurality of sentences to be processed;
the dialogue judging module is used for judging whether the sentence to be processed contains dialogue or not;
and the target option determining module is used for executing different option prediction strategies based on whether the to-be-processed sentence contains a dialogue or not, and determining a target option from a plurality of options as an answer of the English choice question.
13. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the English choice question answer prediction method of any one of claims 1-11.
14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the English choice question answer prediction method according to any one of claims 1 to 11.
CN202110025511.9A 2021-01-08 2021-01-08 English choice question answer prediction method, device, equipment and storage medium Pending CN114741485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110025511.9A CN114741485A (en) 2021-01-08 2021-01-08 English choice question answer prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110025511.9A CN114741485A (en) 2021-01-08 2021-01-08 English choice question answer prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114741485A true CN114741485A (en) 2022-07-12

Family

ID=82274291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110025511.9A Pending CN114741485A (en) 2021-01-08 2021-01-08 English choice question answer prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114741485A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination