CN114357964A - Subjective question scoring method, model training method, computer device, and storage medium - Google Patents



Publication number
CN114357964A
Authority
CN
China
Prior art keywords
text
answer
examinee
matching
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111632605.9A
Other languages
Chinese (zh)
Inventor
汪意发
巩捷甫
盛志超
王士进
胡国平
秦兵
刘挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Xunfei Institute Of Artificial Intelligence
Zhongke Xunfei Internet Beijing Information Technology Co ltd
iFlytek Co Ltd
Original Assignee
Hebei Xunfei Institute Of Artificial Intelligence
Zhongke Xunfei Internet Beijing Information Technology Co ltd
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Xunfei Institute Of Artificial Intelligence, Zhongke Xunfei Internet Beijing Information Technology Co ltd, iFlytek Co Ltd filed Critical Hebei Xunfei Institute Of Artificial Intelligence
Priority to CN202111632605.9A
Publication of CN114357964A
Pending legal-status Critical Current

Abstract

The embodiments of the present application provide a subjective question scoring method, a model training method, a computer device, and a storage medium. The scoring method includes: acquiring a plurality of key point texts in a standard answer and the full point score of each key point text; acquiring an examinee answer text; matching each key point text against the examinee answer text to obtain the matching degree between the examinee answer text and that key point text; and determining the answer score corresponding to the examinee answer text according to the full point score of each key point text and the matching degree between the examinee answer text and each key point text. By matching each key point text against the examinee answer text, determining the matching degrees, and deriving the answer score from them, the scoring process can exploit the effective information carried by the score assigned to each key point in the standard answer, so scoring accuracy is higher and the error between machine scoring and manual scoring is reduced.

Description

Subjective question scoring method, model training method, computer device, and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a subjective question scoring method, a model training method, a computer device, and a storage medium.
Background
Automatic scoring of examinations is one of the most popular directions in academia and industry: a machine processes and analyzes examinees' answer texts and produces scoring results. In large-scale examinations, machine scores that closely track manual scores can effectively relieve teachers' burden and reduce labor cost. The present scheme is oriented to key-point subjective questions.
Subjective questions (also called short-answer questions) are a question type contrasted with objective questions (such as multiple-choice questions): examinees must analyze and understand the question, refine and summarize based on learned knowledge, the given material, general life knowledge, and so on, and independently write out an answer text. Such questions examine examinees' accumulated knowledge, their ability to read, summarize, analyze, and distill the material, and their understanding of everyday life. They place strict demands on examinees' analysis and summarization ability, language expression ability, and innovative thinking ability, but compared with objective questions they carry relatively high scores and are easily influenced by the graders' subjective factors.
For example, key-point subjective questions are one type of subjective question, characterized by a relatively unique standard answer consisting of several key points with corresponding scores. An example is as follows. The question stem is: "QQ.", and the standard answer is: "A (6 min); B (7 min); C (7 min)." The standard answer is divided into three definite parts with definite scores, so the examinee's space for free expression is smaller and the direction of the answer is clearer. For such questions examinees generally do not expand freely on the stem or relate it to practical opinions about life. The grader strictly checks whether the examinee's answer hits the key points in the standard answer and scores point by point; in scoring, whether the examinee's expression is fluent, whether examples are rich, whether the writing is standard, the length of the answer, and so on are not important considerations, yet the grader's subjective factors still have a large influence.
The existing machine learning scoring system directly models the answer of an examinee and a standard answer as a whole during automatic scoring, key information of key points is difficult to capture, the coverage condition of the answer of the examinee on the key points is difficult to correctly calculate, and the semantic modeling and scoring process is often influenced by some irrelevant key words.
Disclosure of Invention
The embodiment of the application provides a subjective question scoring method, a model training method, computer equipment and a storage medium, and can improve the accuracy of subjective question scoring.
In a first aspect, the present application provides a subjective question scoring method, including:
acquiring a plurality of key point texts in a standard answer and a key point full score of each key point text;
obtaining an examinee answering text;
matching the key point text and the examinee answering text to obtain the matching degree of the examinee answering text and the key point text;
and determining the response score corresponding to the answer text of the examinee according to the full point score of the key point text and the matching degree of the answer text of the examinee and each key point text.
In a second aspect, the present application provides a training method for a subjective question scoring model, where the subjective question scoring model includes a principal point matching model, and the training method includes:
acquiring a plurality of key texts in the standard answers;
obtaining an examinee answering text and corresponding labeled data;
inputting the key point texts and the examinee answer text into the key point matching model to obtain the matching degree between the examinee answer text and each key point text, wherein the matching degree is used for determining the answer score corresponding to the examinee answer text;
determining a loss value of the key point matching model according to the matching degree obtained by the key point matching model and the matching degree of the examinee answer text and the key point text in the labeled data;
and adjusting parameters of the main point matching model according to the loss value of the main point matching model.
In a third aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and to implement the steps of the above-mentioned method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
The application discloses a subjective question scoring method, a model training method, a computer device, and a storage medium. The scoring method includes: acquiring a plurality of key point texts in a standard answer and the full point score of each key point text; acquiring an examinee answer text; matching each key point text against the examinee answer text to obtain the matching degree between the examinee answer text and that key point text; and determining the answer score corresponding to the examinee answer text according to the full point score of each key point text and the matching degree between the examinee answer text and each key point text. By matching each key point text against the examinee answer text, determining the matching degrees, and deriving the answer score from them, the scoring process can exploit the effective information carried by the score of each key point in the standard answer, so scoring accuracy is higher and the error between machine scoring and manual scoring is reduced.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 to 3 are schematic diagrams of a conventional subjective question scoring system;
FIG. 4 is a schematic flow chart illustrating a subjective question scoring method according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an application scenario of a subjective question scoring method according to an embodiment;
FIG. 6 is a diagram illustrating scoring based on a subjective question scoring model, according to one embodiment;
FIG. 7 is a diagram of a multitasking point matching model in one embodiment;
FIG. 8 is a diagram illustrating scoring based on a subjective question scoring model according to another embodiment;
FIG. 9 is a diagram of training a scoring model, according to one embodiment;
FIG. 10 is a schematic flow chart illustrating a method for training a subjective topic scoring model according to another embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the embodiments herein without creative effort shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The embodiment of the application provides a subjective question scoring method, a model training method, computer equipment and a storage medium, which are used for scoring answer texts of examinees according to standard answers and can improve the accuracy of subjective question scoring. For example, for an automatic scoring project for an examination, whether answers of examinees hit each key point in standard answers or not is analyzed according to the key points of the subjective questions, and further scoring results are given.
For example, if the subjective question scoring task is treated as a regression task: compared with essay scoring the full score is small, manual scores are all integers or accurate to 0.5, and the score distribution is therefore discontinuous. Meanwhile, constrained by the standard answer, examinee answers show less diversity than essays. The keys to the scoring task are correctly capturing the key words in an examinee answer, understanding the degree of semantic similarity to the standard answer on each key point, and fully exploiting the proportion that each key point's score contributes to the whole.
Fig. 1 is a schematic diagram of a feature-based machine learning scoring system. Using feature engineering, features such as the number of answer words, the coverage percentage of standard-answer words, the length of the longest common substring, the edit distance, key-word distribution, part-of-speech ratios, and word-vector cosine similarity are extracted from the student answer and the whole standard-answer text. The machine learning model (e.g., SVM, GBDT, or XGBoost) fuses the extracted features for scoring. The scoring process is a calibration scoring process: before each scoring run, a certain number of student answers are selected and manually scored to form a calibration set, the model is trained on this set on the spot, and the trained model then predicts scores for the remaining answers; this process must be repeated for each new test question.
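As a rough illustration, two of the hand-crafted features mentioned above — the longest common substring and word coverage — can be sketched as follows. The function names and interfaces are our own; this is an illustrative sketch, not the actual feature extractor of any of the cited systems.

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest common substring, by character-level DP."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def answer_word_coverage(answer_words, reference_words):
    """Fraction of answer words that also appear in the reference answer."""
    ref = set(reference_words)
    return sum(w in ref for w in answer_words) / max(len(answer_words), 1)
```

Features like these would then be concatenated into a vector and fed to the fusion model (SVM, GBDT, etc.).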
With the rapid development of artificial intelligence technology, end-to-end schemes based on conventional neural networks, such as the recurrent neural network (RNN) and the convolutional neural network (CNN), which do not depend on hand-crafted features, are widely used. Figs. 2 and 3 show end-to-end scoring systems based on neural network models: fig. 2 shows a neural network scoring system that jointly models student answers and standard answers, and fig. 3 shows one that models them separately. The system in fig. 2 concatenates the student answer and the standard answer and uses a self-attention mechanism to raise the weight of key answer words during the neural network's semantic modeling, performing regression scoring after the hidden layer representation is obtained. The system in fig. 3 models student answers and standard answers separately, obtains their respective hidden layer representations, computes similarity via semantic matching and an attention mechanism, and outputs the similarity as the score. When computing the semantic matching degree between a student answer and the standard answer, such end-to-end schemes generally treat all contents of the standard answer uniformly, unlike manual scoring.
Similar to the feature-based machine learning model scoring system, the scoring process of the end-to-end scoring system based on the neural network model is also a calibration scoring process.
Calibration scoring requires a certain number of calibration samples for each test question and has the following defects. First, the size of the calibration set, the selection method, and the random seed all cause differences in the manual score distribution, and a biased distribution seriously affects model training, so calibration scoring has poor stability. Second, calibration scoring greatly limits the application scenarios; for example, in small-scale classroom examinations, the small number of answers may make calibration-set selection difficult.
The inventor of the application finds that most feature-based machine learning scoring systems use low-order linguistic features or basic vector features and lack sufficient modeling capacity for complex semantic relationships. End-to-end neural scoring systems have better semantic modeling capability than traditional features, but they do not exploit the key point information in the standard answer and instead model it directly as a whole. Limited by model capacity, key information about the key points is hard to capture, the coverage of the key points by student answers is hard to compute correctly, and the semantic modeling and scoring process is often influenced by irrelevant key words. Meanwhile, different key points in the standard answer carry different scores; the end-to-end model ignores this effective information and computes all semantic matching degrees uniformly, so machine scores deviate considerably from manual scores.
Based on this, the inventor of the application improves the subjective question scoring method, and the subjective question scoring method can make full use of the information of the main point text in the standard answer, so that the accuracy of the subjective question scoring is higher.
The subjective question scoring method provided by the embodiments of the application can be applied to a terminal or a server. The terminal may be an electronic device such as a mobile phone, tablet computer, notebook computer, desktop computer, or personal digital assistant; the server may be an independent server or a server cluster. For ease of understanding, the following embodiments are described taking application of the method to a server as an example.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating a subjective topic scoring method according to an embodiment of the present application.
In some embodiments, as shown in fig. 5, the server obtains the standard answers and the examinee response texts from the terminal, generates response scores corresponding to the examinee response texts according to a subjective question scoring method, and sends the response scores corresponding to the examinee response texts to the terminal. Of course, the standard answers and the test taker answer texts are, for example, texts stored locally by the apparatus for implementing the subjective question scoring method, texts acquired by the apparatus from a network, texts acquired by the apparatus from an input device connected thereto, texts acquired by the apparatus from other electronic devices, texts converted by the apparatus from voice information, and the like.
As shown in fig. 4, the subjective question scoring method includes the following steps S110 to S140.
Step S110, a plurality of key point texts in the standard answers and the key point full score of each key point text are obtained.
Here, the full point score is the score of the corresponding key point in the standard answer. For example, if the standard answer is "A (6 min); B (7 min); C (7 min)", then the full point score of point A is 6, of point B is 7, and of point C is 7.
For example, as shown in FIG. 6, a number of point texts in the standard answer, and a point full score for each of the point texts may constitute a point text set.
For example, for a key-point subjective question, denote the standard answer as G and its full score as S. The standard answer may be split into a plurality of parts, e.g., n key point texts G1, G2, …, Gn, with word counts l1, l2, …, ln; the score corresponding to each key point text, i.e., its full point score, is S1, S2, …, Sn.
Illustratively, the key point texts may be obtained by a user manually splitting the standard answer, or automatically from the standard answer: for example, based on preset splitting rules (e.g., splitting on punctuation and on connectives such as "first" and "second"), or based on a machine learning model.
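A minimal rule-based splitter in the spirit of the preset splitting rules above can be sketched as follows. The semicolon delimiter and the "(N min)" score pattern are assumptions taken from the "A (6 min); B (7 min); C (7 min)." example; a real system would need rules matched to its answer formats.

```python
import re


def split_standard_answer(answer: str):
    """Split a standard answer such as "A (6 min); B (7 min); C (7 min)."
    into (key_point_text, full_point_score) pairs. Illustrative sketch only."""
    parts = [p.strip() for p in re.split(r"[;；]", answer) if p.strip()]
    points = []
    for part in parts:
        # Extract a trailing "(N min)" / "(N 分)" score annotation, if any.
        m = re.search(r"[（(](\d+(?:\.\d)?)\s*(?:min|分)?[)）]", part)
        score = float(m.group(1)) if m else None
        # Remove the score annotation and trailing punctuation from the text.
        text = re.sub(r"[（(][^)）]*[)）]\s*\.?$", "", part).strip().rstrip(".。")
        points.append((text, score))
    return points
```

For the running example, this yields the three key point texts with full point scores 6, 7, and 7.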
And step S120, obtaining a test taker answering text.
Illustratively, the examinee answer text is denoted as A = {w1, w2, …, wm}, containing m words.
And S130, matching the key point text and the examinee answer text to obtain the matching degree of the examinee answer text and the key point text.
Illustratively, the point text and the test taker answer text may be matched based on semantic matching or other text matching methods.
In some embodiments, the key point text and the answer text of the examinee are input into a preset key point matching model, and the matching degree of the answer text of the examinee and the key point text is obtained. Referring to fig. 6, each point text and the test taker response text are input into a preset point matching model, and a matching degree corresponding to each point text is obtained. Illustratively, the current gist text has a word count of l.
Illustratively, the matching degree is used for indicating whether the semantics of the examinee response text and the point text are matched. Optionally, the matching degree includes 3 categories: mismatch, partial match, full match, although not limited thereto; for example, the degree of match may be expressed in terms of a match score, with higher match scores indicating greater degrees of match.
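One plausible way to turn the three matching categories and the full point scores into an answer score is sketched below. The particular credit mapping (full match = 100% of the point's score, partial match = 50%, no match = 0) is our assumption for illustration; the text above does not fix these weights.

```python
# Assumed credit per matching category; not specified by the scheme itself.
MATCH_CREDIT = {"full": 1.0, "partial": 0.5, "none": 0.0}


def response_score(point_full_scores, match_classes):
    """Sum, over all key points, the full point score weighted by the
    credit of the predicted matching category."""
    return sum(s * MATCH_CREDIT[c]
               for s, c in zip(point_full_scores, match_classes))
```

For example, with full point scores [6, 7, 7] and predicted categories [full, partial, none], the answer score would be 6 + 3.5 + 0 = 9.5 under this mapping.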
In some embodiments, the step S130 of matching the gist text and the test taker response text to obtain the matching degree between the test taker response text and the gist text includes steps S131 to S133.
And S131, performing word embedding processing on the key point text and the examinee answer text based on the word embedding sub-network of the key point matching model to obtain a text vector.
Illustratively, the l words of the key point text are denoted g1 to gl, and the m words of the examinee answer text are denoted a1 to am. Referring to FIG. 7, the key point text serves as the first part, with a beginning character such as [CLS] prepended at the start of the sentence; the examinee answer text serves as the second part, with a separator character such as [SEP] inserted between the first part and the second part.
Word embedding processing is performed on the beginning character, the key point text, the separator character, and the examinee answer text through the word embedding sub-network, converting each character into an embedding vector to obtain the text vector H0 = [e1, e2, …, e(l+m+2)], where ei is the embedding vector of the i-th character: the 1st is the embedding vector e1 of the beginning character, and the (l+3)-th is the embedding vector e(l+3) of the first word in the examinee answer text.
And S132, processing the text vector based on the multi-head self-attention subnetwork of the main point matching model to obtain the hidden layer representation of the text vector.
In some embodiments, the multi-head self-attention sub-network is the self-attention network in the BERT model (Bidirectional Encoder Representations from Transformers), and the word embedding sub-network is the word embedding layer of the BERT model. Optionally, the multi-head self-attention sub-network is a pre-trained BERT model; of course, it is not limited to the BERT model and may also be, for example, a recurrent neural network model, a convolutional neural network model, or a combination of multiple models/networks.
The BERT model is a bidirectional language representation model. It uses the Transformer network (a self-attention-based neural network) as its unit module and is pre-trained on large-scale corpora with two tasks, masked language modeling (MLM) and next sentence prediction (NSP). Compared with recurrent and convolutional neural networks it has stronger semantic modeling capability, and good results on downstream tasks can be achieved with simple fine-tuning. Using a pre-trained BERT model or a similar multi-head self-attention sub-network allows deeper semantic modeling and adapts better to the multi-label classification task of knowledge-point labeling.
In a common QA (question answering) task, the question Q and the answer A are often concatenated as the input of a BERT model, which predicts whether answer A can answer question Q. In the embodiments of the present application, each key point text Gi is concatenated with the examinee answer text A, with a [CLS] beginning character prepended and a [SEP] separator placed between the two sentences, and the result is input into the BERT model; the BERT model performs semantic modeling on the key point text and the examinee answer text to obtain the corresponding text content representation, i.e., the hidden layer representation.
The BERT model is a deep language model with 12 Transformer (multi-head self-attention network) layers whose parameters are not shared; the output of each Transformer layer is the input of the next, expressed as:

H(i+1) = Transformer(Hi)

The input of the first Transformer layer is the text vector H0 = [e1, e2, …, e(l+m+2)], and Hi is the output of layer i. The last layer's output H = [h1, h2, …, h(l+m+2)] contains the hidden layer representations of all characters in the beginning character, the key point text, the separator character, and the examinee answer text, where hi represents the contextual semantic information of the i-th character.
The Transformer processes vectors with a multi-head attention mechanism: attention is computed once in each head, and the heads are finally concatenated, which can be expressed as:

MultiHead(Q, K, V) = Concat(head1, head2, …, headp) W

headi = Attention(Q Wi^Q, K Wi^K, V Wi^V)

Attention(Q, K, V) = softmax(Q K^T / sqrt(dk)) V

where Q = K = V = [e1, e2, …, e(l+m+2)], W denotes model parameters, dk is the per-head dimension, and Concat(·) denotes vector concatenation.
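The multi-head attention formulas above can be sketched in NumPy as follows. The convention of slicing full-width Q/K/V projections into p heads follows the standard Transformer formulation; all names and shapes are illustrative, not the patent's implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model).
    Scaled dot-product attention per head, heads concatenated, then
    projected with Wo — i.e., MultiHead(Q, K, V) with Q = K = V = X."""
    seq_len, d_model = X.shape
    d_k = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_k, (h + 1) * d_k)
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(d_k)   # (seq_len, seq_len)
        heads.append(softmax(scores) @ V[:, sl])        # (seq_len, d_k)
    return np.concatenate(heads, axis=-1) @ Wo          # (seq_len, d_model)
```

Stacking 12 such layers (each followed by the usual residual connections and feed-forward blocks, omitted here) gives the Transformer encoder described above.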
And S133, based on the first matching sub-network of the main point matching model, obtaining the matching degree of the examinee answering text and the main point text according to the hidden layer representation of the text vector.
Referring to FIG. 7, the key point text Gi is concatenated with the examinee answer text A, and semantic modeling with the BERT model yields the text vector. From the last layer's output H = [h1, h2, …, h(l+m+2)], the hidden layer representation h1 of the beginning character [CLS] may be selected, together with the text content representations (which may be called semantic representations) of all characters in the beginning character, the key point text, the separator character, and the examinee answer text, as the hidden layer representation of the text vector.
Referring to fig. 7, the hidden layer representation of the text vector is input into the first matching sub-network, so as to obtain the matching degree between the answer text of the examinee and the main point text.
Illustratively, the first matching sub-network includes a fully connected layer and a softmax activation function, through which the hidden layer representation of the text vector is transformed into a probability distribution over, e.g., the three categories no match, partial match, and full match:

C = softmax(W h1 + b)

where W and b are model parameters.
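The classification head C = softmax(W h1 + b) can be sketched as follows; the shapes (3 categories, hidden size d) are taken from the surrounding description, while the function name is ours.

```python
import numpy as np


def match_probabilities(h_cls, W, b):
    """h_cls: hidden representation of [CLS], shape (d,); W: (3, d); b: (3,).
    Returns the probability distribution over {no match, partial, full}."""
    z = W @ h_cls + b
    z = z - z.max()              # stabilize before exponentiation
    p = np.exp(z)
    return p / p.sum()
```

The predicted category is then the argmax of this distribution, and its probability can serve as the semantic matching confidence mentioned below.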
In some embodiments, referring to fig. 8, the matching degree includes a semantic matching confidence, which may be determined from the probability distribution over the categories predicted by the first matching sub-network. The corresponding classification loss is

Lc = -log(p(y | C))

where y denotes the true matching degree between the examinee answer text and the key point text.
In some embodiments, a cross-entropy loss function may be used to train the key point matching model. Illustratively, the first loss value Lc of the matching model is computed from the matching degree obtained by the first matching sub-network and the annotated matching degree between the examinee answer text and the key point text, and the parameters of the key point matching model are adjusted according to the first loss value.
In some embodiments, the accuracy of the model is improved by multitask training.
In some embodiments, referring to fig. 7, after the multi-head self-attention sub-network of the key point matching model processes the text vector to obtain the hidden layer representation of the text vector, the method further includes: based on a second matching sub-network of the key point matching model, obtaining, from the hidden layer representation of the text vector, the text segment in the examinee answer text that matches the key point text; and adjusting the parameters of the key point matching model according to the annotated matching degree between the examinee answer text and the key point text and the annotated matched text segment, based on the matching degree obtained by the first matching sub-network and the text segment obtained by the second matching sub-network.
Illustratively, the text segment in the examinee answer text that matches the key point text is the segment of the examinee answer text that covers the key point text; such a segment carries, for example, the same semantic information as the key point text and may also be called a key point coverage segment.
The grader checks whether the examinee has answered the key points in the standard answer and scores point by point against the key point texts. The embodiments of the application abstract two subtasks. First, matching degree classification: predict whether the examinee answer text semantically matches the key point text of the standard answer, over 3 categories: no match, partial match, full match. Second, key point coverage prediction: for an examinee answer text that semantically matches a key point text of the standard answer, find the text segment that specifically matches the key point text, e.g., by predicting the segment's start position and end position.
The two subtasks are consistent in nature and complement each other. Adopting multi-task learning to perform key point matching degree classification and key point coverage segment prediction simultaneously can improve the accuracy with which the model predicts whether an answer matches the standard answer.
Illustratively, as shown in FIG. 7, the point coverage segment prediction subtask shares the hidden layer representation of the text vector with the point matching degree classification subtask. The last layer of the BERT model outputs the hidden layer representation of the text vector H = [h_1, h_2, ..., h_{l+m+2}]. The hidden layer representation h_1 of the start character [CLS] and the hidden layer representation of the examinee answer text, H_A = [h_1, h_{l+2}, h_{l+3}, ..., h_{l+m+2}] with H_A ∈ R^{768×(m+1)}, may be selected and input into the second matching sub-network to obtain the starting position start and the ending position end of the text segment in the examinee answer text that matches the point text.
Illustratively, the second matching subnetwork comprises two fully-connected layers and a softmax activation function, as follows:
C_start = softmax(W_s · H_A + b_s)

C_end = softmax(W_e · H_A + b_e)
where W_s, W_e ∈ R^{1×768}, and C_start and C_end represent the probability distributions of the starting position and the ending position of the matched text segment over the [CLS] character and the whole examinee answer text. When the predicted starting position and ending position both fall on the [CLS] character, it indicates that the examinee did not answer the point and no segment covering the point text exists.
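For illustration only (dimensions, weights, and names are assumptions, not the patent's implementation), the two fully connected layers with softmax described above can be sketched as follows, where index 0 corresponds to the [CLS] position and therefore to "no covering segment":

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over all entries of x."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_span(H_A, W_s, b_s, W_e, b_e):
    """Sketch of the second matching sub-network.

    H_A: hidden states of [CLS] plus the examinee answer tokens,
         shape (d, m + 1) (d = 768 in the text).
    W_s, W_e: fully connected layers, shape (1, d).
    Returns (start, end) indices over the m + 1 positions; index 0,
    the [CLS] position, means no point-covering segment exists.
    """
    C_start = softmax(W_s @ H_A + b_s)  # distribution over positions
    C_end = softmax(W_e @ H_A + b_e)
    return int(np.argmax(C_start)), int(np.argmax(C_end))
```

In practice the argmax would be constrained so that end ≥ start; that detail is omitted here.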
In some embodiments, a cross entropy loss function may be used to optimally train the point matching model. Illustratively, a second loss value of the point matching model is determined according to the text segment obtained by the second matching sub-network and the labeled text segment of the examinee answer text that matches the point text; and parameters of the point matching model are adjusted according to the second loss value.
Illustratively, the second loss value includes a starting position loss value L_start and an ending position loss value L_end:
L_start = -log(p(y_start | C_start))

L_end = -log(p(y_end | C_end))
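The two position losses above can be sketched as plain cross-entropy terms; whether the patent sums or averages them is not stated, so the sum below is an assumption:

```python
import math

def position_loss(C, y):
    """Cross-entropy loss for one position: -log p(y | C).
    C: predicted probability distribution over [CLS] + answer positions.
    y: index of the labeled (gold) position."""
    return -math.log(C[y])

def second_loss(C_start, y_start, C_end, y_end):
    """Second loss value: sum of start- and end-position losses
    (the combination rule is an assumption)."""
    return position_loss(C_start, y_start) + position_loss(C_end, y_end)
```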
Illustratively, the parameters of the point matching model are adjusted according to the labeled matching degree between the examinee answer text and the point text and the labeled matched text segment, together with the matching degree obtained by the first matching sub-network and the text segment obtained by the second matching sub-network, thereby realizing multi-task learning.
In some embodiments, the point matching model of the embodiments of the present application uses 3 AdamW optimizers with different learning rates and without shared parameters to optimally train the BERT model, the fully connected layer of the first matching sub-network, and the fully connected layer of the second matching sub-network, respectively. Illustratively, when the two subtasks are trained, the overall loss value Loss of the point matching model is calculated as follows:
[Formula given as an image in the original: the overall loss value Loss combines the first loss value from the matching degree classification subtask with the position loss values L_start and L_end from the coverage segment prediction subtask, e.g., by weighted summation.]
The parameters of the point matching model are adjusted according to the overall loss value Loss of the point matching model, thereby realizing multi-task learning.
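Since the exact combination of the subtask losses is not reproduced here, a plausible overall loss can be sketched as a weighted sum; the weights w_match and w_span are entirely assumptions:

```python
def overall_loss(l_match, l_start, l_end, w_match=1.0, w_span=0.5):
    """Hedged sketch of the overall multi-task loss: a weighted sum of the
    matching-degree classification loss (first loss value) and the two
    position losses (second loss value). Weights are illustrative only."""
    return w_match * l_match + w_span * (l_start + l_end)
```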
Step S140, determining the answer score corresponding to the examinee answer text according to the point full score of each point text and the matching degree between the examinee answer text and each point text.
The matching degree obtained by the point matching model includes, for example, the probability P_i that the examinee answer text matches each (e.g., the i-th) point text in the standard answer.
Illustratively, determining the answer score corresponding to the examinee answer text according to the point full score of each point text and the matching degree between the examinee answer text and each point text includes: performing weighted summation on the point full score of each point text according to the matching degree, such as the matching probability P_i, between the examinee answer text and each point text; and determining the answer score corresponding to the examinee answer text according to the weighted summation result and the standard score of the standard answer.
For example, by performing weighted summation on the point full score of each point text according to the matching degree of each point text, an overall matching confidence between the whole examinee answer text and the standard answer can be obtained, which may be called the point matching score P of the examinee answer text and is expressed as follows:
P = (Σ_i P_i · s_i) / S, where s_i denotes the point full score of the i-th point text and S denotes the standard score (full score) of the standard answer.
The point matching score can represent the objective score rate that a reader would give strictly according to whether the points are matched, and it can be multiplied by the full score S to obtain the score corresponding to the examinee answer text, namely the answer score.
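Concretely (a sketch with illustrative names), the weighted summation and the resulting answer score can be computed as:

```python
def point_matching_score(match_probs, point_scores):
    """Score rate P = sum_i(P_i * s_i) / S, where P_i is the matching
    probability for the i-th point, s_i is its point full score, and
    S is the full score of the question."""
    S = sum(point_scores)
    return sum(p * s for p, s in zip(match_probs, point_scores)) / S

def answer_score(match_probs, point_scores):
    """Answer score = P * S."""
    return point_matching_score(match_probs, point_scores) * sum(point_scores)
```

For instance, with three points worth 2, 2, and 1 marks and matching probabilities 1.0, 0.5, and 0.0, the score rate is 0.6 and the answer score is 3 out of 5.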
In some embodiments, considering the actual scoring scenario, although subjective questions are scored by points, the reader is inevitably affected by some subjective factors such as word count and topic similarity. Further, for an answer in which no point is answered, although the result should be a zero score based on the points, the reader will in practice give a "bottom" score such as 1 or 2. Based on this, the embodiment of the application can utilize features extracted from the examinee answer text and the standard answer based on a preset feature extraction rule, so as to enhance the robustness of the scoring system.
Illustratively, referring to fig. 8, determining the answer score corresponding to the examinee answer text according to the point full score of each point text and the matching degree between the examinee answer text and each point text includes: performing weighted summation on the point full score of each point text according to the matching degree between the examinee answer text and each point text, to obtain the point matching score corresponding to the examinee answer text; determining, based on a preset feature extraction rule, preset features corresponding to the examinee answer text according to the standard answer and the examinee answer text; and inputting the point matching score and the preset features into a preset scoring model to obtain the answer score corresponding to the examinee answer text. As can be appreciated, the scoring model outputs a result based on model fusion as the final scoring result.
Illustratively, determining, based on a preset feature extraction rule, the preset features corresponding to the examinee answer text according to the standard answer and the examinee answer text includes at least one of: determining a length feature corresponding to the examinee answer text according to the length of the standard answer and the length of the examinee answer text; determining a word coverage corresponding to the examinee answer text according to whether the characters/words in the standard answer appear in the examinee answer text; determining a word vector similarity corresponding to the examinee answer text according to the word vectors of the characters/words in the standard answer and the word vectors of the characters/words in the examinee answer text; determining a longest common substring length ratio corresponding to the examinee answer text according to the substrings in the examinee answer text that are the same as substrings in the standard answer; and determining an editing distance between the examinee answer text and the standard answer according to the examinee answer text and the standard answer based on a preset editing rule. The editing distance may be determined according to the minimum number of edits required to convert the examinee answer text into the standard answer.
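A minimal sketch of several of these handcrafted features (the patent does not specify its exact rules; the normalizations below are assumptions):

```python
def length_ratio(answer: str, standard: str) -> float:
    """Length feature: answer length relative to standard-answer length."""
    return len(answer) / max(len(standard), 1)

def char_coverage(answer: str, standard: str) -> float:
    """Fraction of distinct characters of the standard answer that
    also appear in the examinee answer."""
    chars = set(standard)
    return sum(c in answer for c in chars) / max(len(chars), 1)

def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[len(b)]
```

The word vector similarity feature would additionally require pretrained embeddings and is omitted here.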
By way of example, the scoring model may include, but is not limited to, machine learning models such as GBDT (Gradient Boosting Decision Tree), XGBoost (eXtreme Gradient Boosting), Gaussian regression, and the like. The scoring model can be obtained by training on a large-scale examinee answer corpus that is provided with scoring labels and covers point-based subjective questions of different subjects.
In some embodiments, referring to fig. 9, large-scale examinee practice texts and corresponding annotation data are obtained, where the annotation data at least includes the answer score, the full score, and the point matching confidence or the point matching score; the extracted preset features may also be included.
Illustratively, the training process of the scoring model includes the following steps:
acquiring a main point full score of each main point text in the standard answer;
according to the matching degree of the answer text of the examinee and each point text, carrying out weighted summation on the point full score of each point text to obtain the point matching score corresponding to the answer text of the examinee;
determining preset characteristics corresponding to the answer text of the examinee according to the standard answer and the answer text of the examinee based on a preset characteristic extraction rule;
inputting the point matching score and the preset features into the scoring model to obtain a response score corresponding to the examinee response text;
determining a loss value of the scoring model according to the answer score obtained by the scoring model and the answer score of the examinee answer text in the annotation data;

and adjusting parameters of the scoring model according to the loss value of the scoring model.
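For illustration only, a least-squares linear scorer can stand in for the gradient-boosted regressors described in the text; the feature layout (point matching score followed by the preset features) and all names are assumptions:

```python
import numpy as np

def fit_scoring_model(X, y):
    """Fit a least-squares linear scorer as a simple stand-in for the
    GBDT/XGBoost-style regressors in the text.
    X rows: [point matching score, preset features...]; y: labeled scores."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_answer_score(w, features):
    """Apply the fitted scorer to one feature vector."""
    return float(np.append(features, 1.0) @ w)
```

In the pipeline described above, several such regressors would be trained and their outputs fused into the final scoring result.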
The subjective question scoring method provided by the embodiment of the application comprises the following steps: acquiring a plurality of key point texts in the standard answer and the key point full score of each key point text; obtaining an examinee answering text; matching the key point text and the examinee answering text to obtain the matching degree of the examinee answering text and the key point text; and determining the response score corresponding to the answer text of the examinee according to the full point score of each point text and the matching degree of the answer text of the examinee and each point text. Matching the key point text and the examinee answering text, determining the matching degree of the examinee answering text and the key point text, and determining the answering score corresponding to the examinee answering text according to the matching degree; the effective information of the score relation (such as the different scores of each point) between the scores of each point in the standard answer can be utilized in the scoring process, so that the scoring accuracy is higher, and the error between the scoring and manual scoring can be reduced.
Illustratively, capturing semantic matching conditions of the answer text of the examinee and each point text in the standard answer through a point matching model such as a neural network, and determining a point matching score rate corresponding to the answer text of the examinee according to the semantic matching conditions and the point full score values of all the point texts; and determining preset features corresponding to the examinee answer texts based on feature extraction rules, and combining the preset features with the main point matching score to input a scoring model to obtain a more accurate scoring result.
By way of example, the embodiment of the application provides a point-based subjective question scoring pipeline system based on multi-task learning that combines a neural network with machine learning models. First, a deep neural network pre-trained on a large-scale corpus is further trained on a point-based subjective question answer corpus, using a multi-task learning method that simultaneously performs the point matching classification and point coverage segment prediction subtasks, which improves the accuracy with which the model predicts whether an answer matches the standard answer. The confidence of whether each point is matched, as predicted by the neural network, is then weighted and summed with the corresponding score of that point, so as to obtain a confidence representing the overall point matching. Finally, traditional features are extracted from the large-scale subjective question scoring corpus and spliced, together with the weighted confidence obtained in the previous step as a one-dimensional feature; several regression scoring models are trained on these features and fused to obtain the final scoring result.
For example, the scoring method in the embodiment of the present application no longer depends on a calibration set, and a general scoring may be implemented based on a point matching model and a scoring model that are trained in advance on a large-scale corpus. Different models do not need to be used for different main points and subjective questions, but the models which are used for large-scale corpus training and cover different subjects and different test questions are used.
Referring to fig. 10 in conjunction with the foregoing embodiments, an embodiment of the present application further provides a training method of a subjective topic scoring model. As shown in fig. 6 and 7, the subjective question scoring model at least includes a point matching model.
Referring to fig. 10, the training method includes steps S210 to S250.
Step S210, obtaining a plurality of point texts in the standard answer;

Step S220, obtaining an examinee answer text and corresponding annotation data;

Step S230, inputting the plurality of point texts and the examinee answer text into the point matching model to obtain the matching degree between the examinee answer text and the point text, where the matching degree is used for determining the answer score corresponding to the examinee answer text;

Step S240, determining a loss value of the point matching model according to the matching degree obtained by the point matching model and the matching degree between the examinee answer text and the point text in the annotation data;

Step S250, adjusting parameters of the point matching model according to the loss value of the point matching model.
In some embodiments, referring to fig. 7, inputting the plurality of point texts and the examinee answer text into the point matching model to obtain the matching degree between the examinee answer text and the point text includes:
performing word embedding processing on the key point text and the answer text of the examinee based on the word embedding sub-network of the key point matching model to obtain a text vector;
processing the text vector based on the multi-head self-attention subnetwork of the key point matching model to obtain a hidden layer representation of the text vector;
and based on the first matching sub-network of the main point matching model, obtaining the matching degree of the answer text of the examinee and the main point text according to the hidden layer representation of the text vector.
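As a sketch of the first matching sub-network (a single linear layer with softmax is an assumption; the patent only names the sub-network), the [CLS] hidden state can be mapped to the 3 matching degree categories as follows:

```python
import numpy as np

LABELS = ("mismatch", "partial match", "complete match")

def classify_match(h_cls, W, b):
    """Illustrative first matching sub-network: map the [CLS] hidden
    state to a 3-way distribution over matching degrees.
    h_cls: hidden state vector, shape (d,); W: (3, d); b: (3,)."""
    logits = W @ h_cls + b
    e = np.exp(logits - np.max(logits))      # stable softmax
    probs = e / e.sum()
    return LABELS[int(np.argmax(probs))], probs
```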
In some embodiments, referring to fig. 7, after the multi-headed self-attention subnetwork based on the point matching model processes the text vector to obtain a hidden layer representation of the text vector, the method further includes:
and based on a second matching sub-network of the key point matching model, obtaining a text segment matched with the key point text in the answer text of the examinee according to the hidden layer representation of the text vector.
Illustratively, the determining the loss value of the point matching model according to the matching degree obtained by the point matching model and the matching degree of the answer text of the examinee and the point text in the annotation data includes:
determining a first loss value of the point matching model according to the matching degree obtained by the first matching sub-network and the matching degree, in the annotation data, between the examinee answer text and the point text;

determining a second loss value of the point matching model according to the text segment obtained by the second matching sub-network and the text segment, in the annotation data, of the examinee answer text that matches the point text;
and determining a loss value of the point matching model according to the first loss value and the second loss value based on a preset function.
In some embodiments, referring to fig. 8 and 9, the subjective topic scoring model further includes a scoring model, and the training method further includes:
acquiring a main point full score of each main point text in the standard answer;
according to the matching degree of the answer text of the examinee and each point text, carrying out weighted summation on the point full score of each point text to obtain the point matching score corresponding to the answer text of the examinee;
determining preset characteristics corresponding to the answer text of the examinee according to the standard answer and the answer text of the examinee based on a preset characteristic extraction rule;
inputting the point matching score and the preset features into the scoring model to obtain a response score corresponding to the examinee response text;
determining a loss value of the scoring model according to the answer score obtained by the scoring model and the answer score of the examinee answer text in the annotation data;

and adjusting parameters of the scoring model according to the loss value of the scoring model.
The specific principle and implementation manner of the training method of the subjective question scoring model provided in the embodiment of the present application are similar to those of the subjective question scoring method in the foregoing embodiment, and are not described here again.
The methods of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Illustratively, the above-described method may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 11.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 11, the computer device includes a processor, a memory, and a network interface connected through a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform the steps of any of the methods described above.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by a processor causes the processor to perform the steps of any of the methods described above.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the illustrated configuration is merely a block diagram of a portion of the configuration associated with aspects of the present application and does not limit the computer apparatus to which aspects of the present application may be applied; a particular computer apparatus may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring a plurality of key point texts in a standard answer and a key point full score of each key point text;
obtaining an examinee answering text;
matching the key point text and the examinee answering text to obtain the matching degree of the examinee answering text and the key point text;
and determining the answer score corresponding to the examinee answer text according to the key point full score of each key point text and the matching degree between the examinee answer text and each key point text.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring a plurality of key texts in the standard answers;
obtaining an examinee answering text and corresponding labeled data;
inputting the plurality of key point texts and the examinee answer text into the key point matching model to obtain the matching degree between the examinee answer text and the key point text;
determining a loss value of the key point matching model according to the matching degree obtained by the key point matching model and the matching degree of the examinee answer text and the key point text in the labeled data;
and adjusting parameters of the main point matching model according to the loss value of the main point matching model.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application, such as:
a computer-readable storage medium storing a computer program, the computer program including program instructions, and a processor executing the program instructions to implement any of the steps of the subjective question scoring method provided in the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method for scoring subjective questions, comprising:
acquiring a plurality of key point texts in a standard answer and a key point full score of each key point text;
obtaining an examinee answering text;
matching the key point text and the examinee answering text to obtain the matching degree of the examinee answering text and the key point text;
and determining the answer score corresponding to the examinee answer text according to the key point full score of each key point text and the matching degree between the examinee answer text and each key point text.
2. The method for scoring subjective questions according to claim 1, wherein the matching of the gist text and the test taker response text to obtain the degree of matching between the test taker response text and the gist text comprises:
performing word embedding processing on the key point text and the answer text of the examinee based on a word embedding sub-network of the key point matching model to obtain a text vector;
processing the text vector based on the multi-head self-attention subnetwork of the key point matching model to obtain a hidden layer representation of the text vector;
and based on the first matching sub-network of the main point matching model, obtaining the matching degree of the answer text of the examinee and the main point text according to the hidden layer representation of the text vector.
3. The method of subjective question scoring according to claim 2, wherein after said multi-headed self-attentional subnetwork based on said gist matching model processes said text vector to obtain a hidden representation of said text vector, said method further comprises:
based on a second matching sub-network of the key point matching model, obtaining a text segment matched with the key point text in the answer text of the examinee according to the hidden layer representation of the text vector;
and adjusting parameters of the main point matching model according to the labeled matching degree between the examinee answer text and the main point text and the labeled matched text segment, together with the matching degree obtained by the first matching sub-network and the text segment obtained by the second matching sub-network.
4. The subjective question scoring method of any one of claims 1 to 3, wherein determining the answer score corresponding to the examinee's answer text based on the full point score of each of the point texts and the matching degree of the examinee's answer text with each of the point texts comprises:
according to the matching degree of the answer text of the examinee and each point text, carrying out weighted summation on the point full score of each point text to obtain the point matching score corresponding to the answer text of the examinee;
determining preset characteristics corresponding to the answer text of the examinee according to the standard answer and the answer text of the examinee based on a preset characteristic extraction rule;
and inputting the point matching score and the preset features into a preset scoring model to obtain the answer score corresponding to the examinee answer text.
5. The subjective question scoring method of claim 4, wherein the determining of the preset features corresponding to the test taker answering text according to the standard answers and the test taker answering text based on the preset feature extraction rules comprises at least one of the following:
determining the length characteristics corresponding to the answer texts of the examinees according to the length of the standard answers and the length of the answer texts of the examinees;
determining word coverage corresponding to the answer text of the examinee according to whether the characters/words in the standard answers appear in the answer text of the examinee;
determining word vector similarity corresponding to the answer text of the examinee according to the word vectors of the characters/words in the standard answers and the word vectors of the characters/words in the answer text of the examinee;
determining a longest common substring length ratio corresponding to the examinee answer text according to the substrings in the examinee answer text that are the same as substrings in the standard answer;
and determining the editing distance between the answer text of the examinee and the standard answer according to the answer text of the examinee and the standard answer based on a preset editing rule.
6. The subjective question scoring method of any one of claims 1 to 3, wherein determining the answer score corresponding to the examinee's answer text based on the full point score of each of the point texts and the matching degree of the examinee's answer text with each of the point texts comprises:
according to the matching degree of the answer text of the examinee and each point text, carrying out weighted summation on the point full score of each point text;
and determining the answering score corresponding to the examinee answering text according to the weighted summation result and the standard score of the standard answer.
7. A training method of a subjective question scoring model is characterized in that the subjective question scoring model comprises a principal point matching model, and the training method comprises the following steps:
acquiring a plurality of key texts in the standard answers;
obtaining an examinee answering text and corresponding labeled data;
inputting the plurality of key point texts and the examinee answer text into the key point matching model to obtain the matching degree between the examinee answer text and the key point text, wherein the matching degree is used for determining the answer score corresponding to the examinee answer text;
determining a loss value of the key point matching model according to the matching degree obtained by the key point matching model and the matching degree of the examinee answer text and the key point text in the labeled data;
and adjusting parameters of the main point matching model according to the loss value of the main point matching model.
8. The training method of claim 7, wherein the step of inputting the plurality of gist texts and the test taker answer text into the gist matching model to obtain the matching degree of the test taker answer text and the gist text comprises the steps of:
performing word embedding processing on the key point text and the answer text of the examinee based on the word embedding sub-network of the key point matching model to obtain a text vector;
processing the text vector based on the multi-head self-attention subnetwork of the key point matching model to obtain a hidden layer representation of the text vector;
and based on the first matching sub-network of the main point matching model, obtaining the matching degree of the answer text of the examinee and the main point text according to the hidden layer representation of the text vector.
9. The training method of claim 8, wherein after the multi-headed self-attention subnetwork based on the gist matching model processes the text vector resulting in a hidden representation of the text vector, the method further comprises:
based on a second matching sub-network of the key point matching model, obtaining a text segment matched with the key point text in the answer text of the examinee according to the hidden layer representation of the text vector;
according to the matching degree obtained by the key point matching model and the matching degree of the examinee answer text and the key point text in the labeled data, determining the loss value of the key point matching model, including:
determining a first loss value of the key point matching model according to the matching degree obtained by the first matching sub-network and the matching degree, in the labeled data, between the examinee answer text and the key point text;

determining a second loss value of the key point matching model according to the text segment obtained by the second matching sub-network and the text segment, in the labeled data, of the examinee answer text that matches the key point text;
and determining a loss value of the point matching model according to the first loss value and the second loss value based on a preset function.
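The "preset function" of claim 9 combines the matching-degree loss and the text-segment loss into one training signal. A common choice is a weighted sum; the weight below is an assumption, since the patent leaves the combination function open.

```python
# Minimal sketch of combining the two losses from claim 9.
# alpha balances the matching-degree loss (first) against the
# text-segment loss (second); its value is illustrative.
def combined_loss(first_loss, second_loss, alpha=0.5):
    return alpha * first_loss + (1.0 - alpha) * second_loss

total = combined_loss(0.4, 0.2)  # 0.5*0.4 + 0.5*0.2 = 0.3
```

Joint training against both targets lets the segment-extraction objective act as an auxiliary task that sharpens the matching-degree prediction.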
10. The training method according to any one of claims 7-9, wherein the subjective question scoring model further comprises a scoring model, and the training method further comprises:
acquiring a key point full score of each key point text in the standard answer;
carrying out weighted summation on the key point full score of each key point text according to the matching degree of the examinee answer text and each key point text, to obtain a point matching score corresponding to the examinee answer text;
determining preset features corresponding to the examinee answer text according to the standard answer and the examinee answer text based on a preset feature extraction rule;
inputting the point matching score and the preset features into the scoring model to obtain a response score corresponding to the examinee answer text;
determining a loss value of the scoring model according to the response score obtained by the scoring model and the response score of the examinee answer text in the labeled data;
and adjusting parameters of the scoring model according to the loss value of the scoring model.
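The weighted summation at the heart of claim 10 — each key point's full score weighted by the matching degree of the examinee answer against that key point — is simple to state concretely. The scores and degrees below are made-up example values.

```python
# Sketch of the point matching score from claim 10: weight each key
# point's full score by its matching degree, then sum.
full_scores = [2.0, 3.0, 1.0]    # full score assigned to each key point
match_degrees = [0.9, 0.5, 1.0]  # matching degree of the answer per key point

point_matching_score = sum(s * d for s, d in zip(full_scores, match_degrees))
print(point_matching_score)  # 1.8 + 1.5 + 1.0 = 4.3
```

This scalar is then fed, together with the preset features, into the scoring model; it carries exactly the score-relation information between key points that the abstract says plain text matching would lose.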
11. A computer device, wherein the computer device comprises a memory and a processor;
the memory is used for storing a computer program;
the processor is used for executing the computer program and, when the computer program is executed, implementing:
the steps of the subjective question scoring method according to any one of claims 1 to 6; and/or
the steps of the training method of the subjective question scoring model according to any one of claims 7 to 10.
12. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the computer program implements:
the steps of the subjective question scoring method according to any one of claims 1 to 6; and/or
the steps of the training method of the subjective question scoring model according to any one of claims 7 to 10.
CN202111632605.9A 2021-12-28 2021-12-28 Subjective question scoring method, model training method, computer device, and storage medium Pending CN114357964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632605.9A CN114357964A (en) 2021-12-28 2021-12-28 Subjective question scoring method, model training method, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111632605.9A CN114357964A (en) 2021-12-28 2021-12-28 Subjective question scoring method, model training method, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN114357964A true CN114357964A (en) 2022-04-15

Family

ID=81103312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632605.9A Pending CN114357964A (en) 2021-12-28 2021-12-28 Subjective question scoring method, model training method, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN114357964A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252739A (en) * 2023-11-17 2023-12-19 山东山大鸥玛软件股份有限公司 Method, system, electronic equipment and storage medium for evaluating paper
CN117252739B (en) * 2023-11-17 2024-03-12 山东山大鸥玛软件股份有限公司 Method, system, electronic equipment and storage medium for evaluating paper

Similar Documents

Publication Publication Date Title
CN111444709B (en) Text classification method, device, storage medium and equipment
KR102222451B1 (en) An apparatus for predicting the status of user's psychology and a method thereof
CN110188202B (en) Training method and device of semantic relation recognition model and terminal
US20200012953A1 (en) Method and apparatus for generating model
US11409964B2 (en) Method, apparatus, device and storage medium for evaluating quality of answer
CN111738016B (en) Multi-intention recognition method and related equipment
CN110825867B (en) Similar text recommendation method and device, electronic equipment and storage medium
CN110704586A (en) Information processing method and system
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN110377733A (en) A kind of text based Emotion identification method, terminal device and medium
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN111242710A (en) Business classification processing method and device, service platform and storage medium
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
CN114417785A (en) Knowledge point annotation method, model training method, computer device, and storage medium
CN113705207A (en) Grammar error recognition method and device
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN114357964A (en) Subjective question scoring method, model training method, computer device, and storage medium
CN113704393A (en) Keyword extraction method, device, equipment and medium
Chen et al. Difficulty-controllable visual question generation
CN115081452B (en) Method for extracting entity relationship
CN115713082A (en) Named entity identification method, device, equipment and storage medium
CN114358579A (en) Evaluation method, evaluation device, electronic device, and computer-readable storage medium
CN113569112A (en) Tutoring strategy providing method, system, device and medium based on question
Newnham Machine Learning with Core ML: An iOS developer's guide to implementing machine learning in mobile apps
Pandey et al. Interview bot with automatic question generation and answer evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 230000 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Applicant after: IFLYTEK Co.,Ltd.

Applicant after: Hebei Xunfei Institute of Artificial Intelligence

Applicant after: iFLYTEK (Beijing) Co.,Ltd.

Address before: 230000 No. 666, Wangjiang West Road, Hefei hi tech Development Zone, Hefei, Anhui

Applicant before: IFLYTEK Co.,Ltd.

Applicant before: Hebei Xunfei Institute of Artificial Intelligence

Applicant before: Zhongke Xunfei Internet (Beijing) Information Technology Co.,Ltd.