CN115982351A - Test question evaluation method and related device, electronic equipment and storage medium

Info

Publication number: CN115982351A
Application number: CN202211589007.2A
Authority: CN (China)
Legal status: Pending
Language: Chinese (zh)
Inventors: 陈子恒, 沙晶, 刘丹, 王士进, 魏思
Assignee (current and original): iFlytek Co Ltd

Landscapes

  • Electrically Operated Instructional Devices (AREA)
Abstract

The application discloses a test question review method, a related device, an electronic device, and a storage medium. The test question review method includes: acquiring a test question to be reviewed and an answer text of the test question to be reviewed, and acquiring a first review model, wherein the first review model is trained on sample test question texts and their sample answer texts, and each sample test question text is labeled with a sample evaluation result for its sample answer text; in response to the test question to be reviewed differing from every sample test question text, screening out sample test question texts similar to the test question to be reviewed as first test question texts, and adjusting network parameters of the first review model based on the first test question texts and their sample answer texts to obtain a second review model; and processing the test question to be reviewed and its answer text based on the second review model to obtain an answer evaluation result for the answer text. This scheme can improve the accuracy of test question review while reducing labeling cost.

Description

Test question evaluation method and related device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a test question review method, a related apparatus, an electronic device, and a storage medium.
Background
With the rapid development of online education, the number of users is increasing dramatically, and how to customize a suitable learning plan for each user has become a crucial problem in the field of online education. When modeling user profiles and learning outcomes, a user's answering performance is a key basis. Therefore, automatic review of test question answers is of great significance to online education.
Currently, automatic review is performed either with preset matching rules or with deep learning neural networks. However, because test questions and answers usually exhibit great diversity, it is difficult to design complete matching rules, which reduces the accuracy of test question review; the latter approach depends heavily on labeled data, which is often unavailable for unseen test questions, and labeling data to cover all test questions as fully as possible would require enormous labor cost. In view of this, how to improve the accuracy of test question review while reducing labeling cost has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a test question review method, a related device, an electronic device, and a storage medium, which can improve the accuracy of test question review while reducing labeling cost.
In order to solve the above technical problem, a first aspect of the present application provides a test question review method, including: acquiring a test question to be reviewed and an answer text of the test question to be reviewed, and acquiring a first review model, wherein the first review model is trained on sample test question texts and their sample answer texts, and each sample test question text is labeled with a sample evaluation result for its sample answer text; in response to the test question to be reviewed differing from every sample test question text, screening out sample test question texts similar to the test question to be reviewed as first test question texts, and adjusting network parameters of the first review model based on the first test question texts and their sample answer texts to obtain a second review model; and processing the test question to be reviewed and its answer text based on the second review model to obtain an answer evaluation result for the answer text.
In order to solve the above technical problem, a second aspect of the present application provides a test question review device, including an acquisition module, a training module, and a review module. The acquisition module is used for acquiring a test question to be reviewed and an answer text of the test question to be reviewed, and acquiring a first review model, wherein the first review model is trained on sample test question texts and their sample answer texts, and each sample test question text is labeled with a sample evaluation result for its sample answer text. The training module is used for, in response to the test question to be reviewed differing from every sample test question text, screening out sample test question texts similar to the test question to be reviewed as first test question texts, and adjusting network parameters of the first review model based on the first test question texts and their sample answer texts to obtain a second review model. The review module is used for processing the test question to be reviewed and its answer text based on the second review model to obtain an answer evaluation result for the answer text.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions, and the processor is configured to execute the program instructions to implement the test question review method of the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being for implementing the test question review method of the first aspect.
According to the above scheme, a test question to be reviewed and its answer text are acquired, and a first review model is acquired, the first review model having been trained on sample test question texts and their sample answer texts, with each sample test question text labeled with a sample evaluation result for its sample answer text. In response to the test question to be reviewed differing from every sample test question text, sample test question texts similar to the test question to be reviewed are screened out as first test question texts, network parameters of the first review model are adjusted based on the first test question texts and their sample answer texts to obtain a second review model, and the test question to be reviewed and its answer text are processed based on the second review model to obtain an answer evaluation result for the answer text. In this way, a universal first review model is trained in advance; when the test question to be reviewed is detected to differ from the sample test questions used in earlier training, sample test question texts similar to it are screened out as first test question texts, and the first review model is fine-tuned with these texts and their sample answer texts to obtain the second review model, so that the second review model is applicable to the test question to be reviewed as far as possible, without additionally labeling training samples for test questions that have not appeared before, which helps alleviate the high labor cost brought by continuous data labeling. Therefore, the accuracy of test question review can be improved while reducing labeling cost.
Drawings
FIG. 1 is a schematic flow chart diagram of an embodiment of the method for reviewing test questions of the present application;
FIG. 2 is a schematic diagram of a process for screening an embodiment of a first test question text;
FIG. 3 is a schematic diagram of a process for screening an embodiment of a second test question text;
FIG. 4 is a schematic diagram of a process for obtaining an embodiment of a second review model;
FIG. 5 is a schematic diagram of a frame of an embodiment of the test question review device of the present application;
FIG. 6 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 7 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a method for reviewing test questions according to the present application.
Specifically, the method may include the following steps:
Step S11: obtaining a test question to be reviewed and an answer text of the test question to be reviewed, and obtaining a first review model.
In one implementation scenario, the question type of the test question to be reviewed can be set according to actual application needs. Illustratively, the test question to be reviewed may be a solution question; alternatively, it may be a discussion question, which is not limited herein. In addition, unless otherwise specified, test questions such as the "test question to be reviewed" and the "sample test question text" in the embodiments disclosed in the present application may each include a question stem and an answer.
In one implementation scenario, when a subject answers electronically, the test question to be reviewed and its answer text can be obtained directly. Alternatively, when the subject answers by handwriting, an image of the paper test sheet may first be captured and detected to locate the test question region and the answer region; the test question region is then recognized by a character recognition method such as OCR to obtain the test question to be reviewed, and the answer region is recognized in the same way to obtain the answer text.
In the embodiments of the disclosure, the first review model is trained on sample test question texts and their sample answer texts, and each sample test question text is labeled with a sample evaluation result for its sample answer text. It should be noted that, unless otherwise specified, the network structure of the review models in the embodiments disclosed in the present application, such as the "first review model" and the "second review model", is not limited and may include, but is not limited to: convolutional neural networks, recurrent neural networks, Transformer, BERT, and the like. In addition, the review models such as the "first review model" and the "second review model" in the embodiments disclosed in the present application share the same network structure, but their network parameters may differ.
In one implementation scenario, the sample evaluation result may include the sample score rate of the sample answer text. It should be noted that the sample score rate can be obtained by dividing the sample score of the answer text (which can be obtained by manual review) by the total score of the sample test question text. Illustratively, if the total score of sample test question text A is 10 points and its sample answer text is given 5 points by manual review, the corresponding sample score rate is 0.5. Other cases may be deduced by analogy and are not illustrated here one by one.
In one implementation scenario, the sample evaluation result may include the score-rate interval in which the sample score rate falls. Illustratively, several intervals may be demarcated in advance: 0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, and 0.8 to 1.0; with the sample score rate of 0.5 above, the score-rate interval corresponding to the sample answer text of sample test question text A is the third interval (i.e., 0.4 to 0.6). Other cases may be deduced by analogy and are not illustrated here one by one.
In an implementation scenario, when the first review model is trained, the sample test question text and its sample answer text may be processed by an initial review model to obtain a predicted evaluation result, and the network parameters of the initial review model are then adjusted based on the difference between the sample evaluation result and the predicted evaluation result; these steps are repeated until training converges, at which point the converged initial review model can be used as the first review model. As previously mentioned, the initial review model may include, but is not limited to: convolutional neural networks, recurrent neural networks, Transformer, BERT, and the like.
In a specific implementation scenario, the network layers for feature extraction in the initial review model, such as a convolutional neural network, a recurrent neural network, a Transformer, or BERT, may perform feature extraction on the sample test question text and its sample answer text to obtain a feature vector of the input text, and the feature vector is then processed by the output layer (e.g., a fully-connected layer) of the initial review model to obtain the predicted evaluation result (e.g., a predicted score rate or score-rate interval). For the specific processing procedure, reference may be made to the technical details of network models such as convolutional neural networks, recurrent neural networks, Transformer, and BERT, which are not repeated here.
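As an illustration only, the following minimal sketch shows one way such a review model could be assembled, assuming a BERT-style encoder from the Hugging Face transformers library; the class name, the `bert-base-chinese` checkpoint, and the [CLS]-pooling choice are assumptions, not details from the patent:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ReviewModel(nn.Module):
    """Encoder plus output layer. num_outputs is 1 when regressing a score
    rate, or the number of preset score-rate intervals when classifying."""
    def __init__(self, encoder_name="bert-base-chinese", num_outputs=1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.output_layer = nn.Linear(self.encoder.config.hidden_size, num_outputs)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]              # [CLS] vector as the text feature
        return self.output_layer(pooled)   # predicted score rate / interval logits

# The test question text and the answer text are fed in as a sentence pair.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = ReviewModel()
batch = tokenizer("question stem and answer", "student answer text",
                  return_tensors="pt", truncation=True)
prediction = model(batch["input_ids"], batch["attention_mask"])
```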
In a specific implementation scenario, in the case that model training is constrained by the score rate, the sample evaluation result may include the sample score rate, and the predicted evaluation result output by the initial review model may include a predicted score rate, so that a training loss of the initial review model is obtained by processing the sample score rate and the predicted score rate with a loss function such as the mean squared error; the network parameters of the initial review model can then be adjusted based on the training loss through an optimization method such as gradient descent. For the specific adjustment process of the network parameters, reference may be made to the technical details of optimization methods such as gradient descent, which are not repeated here.
In a specific implementation scenario, different from the foregoing implementation, in the case that model training is constrained by the score-rate interval, the sample evaluation result may include the labeled score-rate interval, and the predicted evaluation result output by the initial review model may include a predicted probability for each preset score-rate interval, so that a training loss of the initial review model is obtained by processing the labeled interval and the predicted probabilities with a loss function such as cross entropy; the network parameters of the initial review model can then be adjusted based on the training loss through an optimization method such as gradient descent, as above.
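A hedged sketch of the two training constraints just described; stand-in random tensors replace real model outputs, and the batch size and interval count are illustrative:

```python
import torch
import torch.nn.functional as F

batch_size, num_intervals = 4, 5

# Score-rate constraint: regress the score rate, train with mean squared error.
pred_rate = torch.sigmoid(torch.randn(batch_size, 1))  # stand-in for model output
sample_rate = torch.rand(batch_size, 1)                # labeled sample score rates
loss_mse = F.mse_loss(pred_rate, sample_rate)

# Score-rate-interval constraint: classify into the preset intervals
# (0-0.2, 0.2-0.4, ...) and train with cross entropy.
interval_logits = torch.randn(batch_size, num_intervals)          # stand-in logits
labeled_interval = torch.randint(0, num_intervals, (batch_size,))
loss_ce = F.cross_entropy(interval_logits, labeled_interval)

# In training, either loss would then drive a gradient-descent step via
# optimizer.zero_grad(); loss.backward(); optimizer.step().
```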
Step S12: and responding to the fact that the to-be-evaluated test questions are different from any sample test question text, screening sample test question texts similar to the to-be-evaluated test questions to serve as first test question texts, and adjusting network parameters of the first evaluation model based on the first test question texts and sample answering texts thereof to obtain a second evaluation model.
In an implementation scenario, if a word-by-word comparison shows that the test question to be reviewed is identical to a sample test question text, the two can be determined to be the same. Alternatively, semantic features of the test question to be reviewed and of the sample test question text can be extracted and compared in the semantic dimension: if the semantic similarity between them is higher than a preset threshold, the test question to be reviewed can be considered the same as the sample test question text; otherwise, they can be considered different. It should be noted that the preset threshold may be set according to actual application needs; for example, where the comparison accuracy requirement is relatively strict, the preset threshold may be set relatively high, and where the requirement is relatively loose, the preset threshold may be lowered appropriately.
In one implementation scenario, in order to screen the first test question texts, the feature similarity between the test question to be reviewed and each sample test question text may be obtained, and sample test question texts are selected as first test question texts based on their feature similarity to the test question to be reviewed. For example, the sample test question texts may be sorted from high to low feature similarity, and those ranked within a preset top range (e.g., top 10, top 20) may be selected as first test question texts. In this way, sample test question texts as similar as possible to the test question to be reviewed are selected in the feature dimension, which helps improve the adaptability of the second review model to the test question to be reviewed.
In a specific implementation scenario, in order to obtain the feature similarity, feature extraction may be performed on the test question to be reviewed and the sample test question texts through a traditional sparse vector such as bag-of-words or TF-IDF, a classical dense vector such as word2vec or GloVe, or a pre-trained model such as BERT, XLNet, or GPT, to obtain a feature vector of the test question to be reviewed and a feature vector of each sample test question text. On this basis, a similarity measure such as the inner product, Euclidean distance, or cosine similarity can be applied to the feature vector of the test question to be reviewed and that of the sample test question text to obtain the feature similarity between them.
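For illustration, a minimal sketch of this screening step using one of the options named above (TF-IDF sparse vectors with cosine similarity, via scikit-learn); the function name and the top-10 cutoff are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def feature_similarities(question_to_review, sample_questions):
    """Cosine similarity between the test question to be reviewed and every
    sample test question text, computed on TF-IDF vectors."""
    vectors = TfidfVectorizer().fit_transform([question_to_review] + sample_questions)
    return cosine_similarity(vectors[0], vectors[1:])[0]

sims = feature_similarities("question to review ...",
                            ["sample question 1 ...", "sample question 2 ..."])
first_texts = sims.argsort()[::-1][:10]  # indices of the 10 most similar samples
```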
In a specific implementation scenario, different from the foregoing feature extraction manner, the hidden-layer features produced by the converged first review model when processing the test question to be reviewed can be taken as the text features of the test question to be reviewed, and the hidden-layer features produced by the same model when processing a sample test question text can be taken as the text features of that sample test question text. On this basis, a similarity measure (such as the inner product, Euclidean distance, or cosine similarity mentioned above) can be applied to the text features of the test question to be reviewed and those of the sample test question text to obtain the feature similarity. In this manner, text features are extracted by the converged first review model, which facilitates extracting feature information relevant to test question review and can further improve the accuracy of screening similar test questions during review.
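Continuing the illustrative ReviewModel sketch above (an assumption, not the patent's implementation), the hidden-layer text feature could be read out like this:

```python
import torch
import torch.nn.functional as F

def text_feature(model, tokenizer, text):
    """Hidden-layer ([CLS]) feature of the converged first review model,
    used as the text feature for similarity screening."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model.encoder(input_ids=batch["input_ids"],
                               attention_mask=batch["attention_mask"]).last_hidden_state
    return hidden[:, 0]

feat_q = text_feature(model, tokenizer, "test question to be reviewed ...")
feat_s = text_feature(model, tokenizer, "sample test question text ...")
similarity = F.cosine_similarity(feat_q, feat_s, dim=-1)
```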
In an implementation scenario, different from the foregoing implementations, in order to further enhance the adaptability of the second review model to the test question to be reviewed, please refer to fig. 2, which is a schematic diagram of an embodiment of screening the first test question text. As shown in fig. 2, while the feature similarity between the test question to be reviewed and a sample test question text is obtained, the knowledge-point relevance between them may also be obtained. On this basis, the association degree between the test question to be reviewed and the sample test question text can be obtained by fusing the feature similarity and the knowledge-point relevance, so that sample test question texts are selected as first test question texts based on the association degree between each sample test question text and the test question to be reviewed. It should be noted that the specific process of screening sample test question texts by association degree may refer to the foregoing description of screening by feature similarity, which is not repeated here. In this manner, the association degree combines the feature similarity and the knowledge-point relevance, so test questions can be screened along both the semantic-feature dimension and the knowledge-point dimension, further improving the accuracy of screening similar test questions during review.
In a specific implementation scenario, the knowledge-point relevance may be the degree of overlap between the knowledge points involved in the test question to be reviewed and the knowledge points involved in the sample test question text. Illustratively, a knowledge-point list of the test question to be reviewed may be obtained, recording the knowledge points it involves, and a knowledge-point list of the sample test question text may be obtained likewise; the number of knowledge points common to the two lists, i.e., the knowledge points involved in both the test question to be reviewed and the sample test question text, is then counted. Measuring the knowledge-point relevance by this degree of overlap helps simplify its computation.
In a specific implementation scenario, the association degree may be obtained by weighting the feature similarity and the knowledge-point relevance. A first weight may be preset for the feature similarity and a second weight for the knowledge-point relevance, with their values set according to actual application needs. For example, if the feature dimension is to be emphasized when screening similar test questions, the first weight may be set greater than the second weight, e.g., 0.7 and 0.3 respectively; conversely, if the knowledge-point dimension is to be emphasized, the second weight may be set greater than the first, e.g., 0.3 and 0.7 respectively. Other cases may be deduced by analogy and are not illustrated here one by one. Obtaining the association degree by weighting in this way helps simplify its computation.
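A small sketch of the overlap measure and the weighted fusion; normalizing the overlap to [0, 1] (a Jaccard-style ratio) is an added assumption so that it is commensurable with the feature similarity, since the text only specifies counting common knowledge points:

```python
def knowledge_point_relevance(points_a, points_b):
    """Degree of overlap between two knowledge-point lists, normalized
    here to [0, 1]; the raw common-point count would also work."""
    common = set(points_a) & set(points_b)
    union = set(points_a) | set(points_b)
    return len(common) / len(union) if union else 0.0

def association_degree(feature_sim, kp_relevance, w_feature=0.7, w_kp=0.3):
    """Weighted fusion; 0.7/0.3 is the illustrative split from the text."""
    return w_feature * feature_sim + w_kp * kp_relevance

rel = knowledge_point_relevance(["quadratic equations", "factoring"],
                                ["quadratic equations", "graphing"])
score = association_degree(feature_sim=0.82, kp_relevance=rel)
```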
In an implementation scenario, after the first test question texts are obtained through screening, the network parameters of the first review model can be adjusted based on the first test question texts and their sample answer texts to obtain the second review model. Specifically, the first test question text and its sample answer text may be processed by the first review model to obtain a predicted evaluation result, and the difference between the predicted evaluation result and the sample evaluation result is measured with a loss function such as the aforementioned mean squared error or cross entropy to obtain a training loss; the network parameters of the first review model can then be adjusted based on the training loss through an optimization method such as gradient descent, and so on until training converges, at which point the first review model converged on the first test question texts and their sample answer texts can be used as the second review model. It should be noted that when different indicators (e.g., score rate versus score-rate interval) are used to constrain model training, different loss functions are used to measure the training loss, as described above and not repeated here.
In a specific implementation scenario, when adjusting the network parameters of the first review model, only the network parameters of the output layer may be adjusted; that is, the network parameters of all layers other than the output layer are fixed. In this case, the second review model is obtained by assembling the layers of the first review model other than the output layer with the output layer after training converges. By adjusting only the output layer, the fine-tuned second review model is made applicable to the test question to be reviewed as far as possible while the model performance is preserved, which improves review accuracy and reduces the complexity of the fine-tuning operation.
In a specific implementation scenario, when adjusting the network parameters of the first review model, the learning rate of the output layer may instead be set higher than that of every other network layer; that is, the layers other than the output layer adjust their parameters with a smaller step size than the output layer and are thus only slightly fine-tuned by comparison. This likewise makes the fine-tuned second review model applicable to the test question to be reviewed as far as possible while preserving model performance, improving review accuracy and reducing the complexity of the fine-tuning operation.
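Both options, sketched with PyTorch parameter freezing and optimizer parameter groups and reusing the illustrative ReviewModel above; the learning-rate values are assumptions:

```python
import torch

# Option 1: only the output layer learns -- freeze every other layer.
for name, param in model.named_parameters():
    if not name.startswith("output_layer"):
        param.requires_grad = False

# Option 2: the output layer learns with a larger step size than the
# encoder layers, via per-group learning rates.
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(),      "lr": 1e-5},
    {"params": model.output_layer.parameters(), "lr": 1e-3},
])
```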
In an implementation scenario, different from the foregoing implementations, in order to further ensure that the second review model still retains the model performance of the universal first review model while improving its adaptability to the test question to be reviewed, sample test question texts may additionally be selected as second test question texts based on the score-rate interval before the network parameters of the first review model are adjusted; the network parameters of the first review model are then adjusted based on both the first test question texts and their sample answer texts and the second test question texts and their sample answer texts to obtain the second review model. In this manner, sample test question texts similar to the test question to be reviewed are screened as first test question texts, second test question texts are screened based on the score-rate interval, and both are referenced when adjusting the network parameters of the first review model: referencing the former drives the second review model to be applicable to the test question to be reviewed as far as possible, while referencing the latter drives it to retain the model performance of the universal first review model as far as possible. The second review model is thus further ensured to retain the performance of the universal first review model while its adaptability to the test question to be reviewed is improved.
In a specific implementation scenario, please refer to fig. 3, which is a schematic diagram of an embodiment of screening the second test question text. As shown in fig. 3, clustering may be performed based on the text features of the sample test question texts (e.g., sample test question text 1 to sample test question text N in fig. 3) to obtain a plurality of test question groups (e.g., test question group 1 to test question group M in fig. 3). The text features of the sample test question texts may be obtained as described above, which is not repeated here. On this basis, sample test question texts within a preset distance from the center of each test question group can be selected from that group as candidate test question texts. The group center may be taken as the average of the text features of the sample test question texts in the group, so that the feature distance between each text feature and this average can be calculated; feature distances may be measured, for example, by the Euclidean distance. Sample test question texts whose feature distance is within the preset distance are selected as candidate test question texts. On this basis, at least one candidate test question text can further be selected from each score-rate interval as a second test question text; that is, the candidate test question texts selected from the test question groups are grouped according to their corresponding score-rate intervals. Illustratively, candidates in the score-rate interval 0 to 0.2 form one group, those in 0.2 to 0.4 another group, those in 0.4 to 0.6 another, and so on up to 0.8 to 1.0; the rest may be deduced by analogy and are not enumerated one by one. Referring to fig. 3, sample test question texts 1 to 3 form one group, from which sample test question text 3 is selected as a second test question text; sample test question texts 4 and 5 form one group, from which sample test question text 4 is selected; and sample test question texts 6 and 7 form one group, from which sample test question text 7 is selected. Of course, fig. 3 shows only one possible implementation and does not limit other possibilities in practice.
In this manner, clustering based on the text features of the sample test question texts yields a plurality of test question groups; sample test question texts within a preset distance from each group center are selected from each group as candidate test question texts, and at least one candidate is then selected from each score-rate interval as a second test question text. Selecting samples close to the group centers through clustering lets the finally selected second test question texts represent the average condition of the sample test questions as far as possible, while selecting at least one candidate per score-rate interval lets them cover the various answering conditions as far as possible. The second test question texts are thus selected by integrating the two dimensions of test questions and answers, which helps the second review model retain the knowledge already learned, maintaining model performance while its adaptability to the test question to be reviewed is improved.
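An illustrative sketch of this selection with scikit-learn KMeans; the group count, distance threshold, first-seen-per-interval rule, and stand-in data are all assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_second_texts(features, intervals, n_groups=8, preset_distance=1.0):
    """Cluster sample test question texts by text feature, keep those near
    each group center, then take one candidate per score-rate interval."""
    km = KMeans(n_clusters=n_groups, n_init=10).fit(features)
    centers = km.cluster_centers_[km.labels_]     # each row's group center
    dists = np.linalg.norm(features - centers, axis=1)
    candidates = np.where(dists <= preset_distance)[0]

    selected = {}
    for idx in candidates:
        selected.setdefault(intervals[idx], idx)  # at least one per interval
    return sorted(selected.values())

features = np.random.rand(100, 32)                # stand-in text features
intervals = np.random.randint(0, 5, size=100)     # stand-in interval labels
second_texts = select_second_texts(features, intervals)
```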
In a specific implementation scenario, after the second test question texts are selected, the first review model can be fine-tuned based on the first test question texts and their sample answer texts together with the second test question texts and their sample answer texts. Referring to fig. 4, a schematic diagram of a process of obtaining the second review model, the predicted evaluation results produced by the converged first review model on the second test question texts and their sample answer texts may first be obtained as reference evaluation results. The network parameters of the first review model are then adjusted based on both the difference between the predicted evaluation results it produces on the first test question texts and their sample answer texts and the sample evaluation results, and the difference between the predicted evaluation results it produces on the second test question texts and their sample answer texts and the reference evaluation results, thereby obtaining the second review model. Illustratively, a first loss (e.g., Loss1 in fig. 4) may be obtained from the former difference and a second loss (e.g., Loss2 in fig. 4) from the latter; the two are then fused (e.g., by weighted summation) into a training loss, based on which the network parameters of the first review model are adjusted. It should be noted that when different indicators (e.g., score rate versus score-rate interval) constrain training, the losses are measured differently, as described above. In addition, during this adjustment, only the network parameters of the output layer may be adjusted, or the learning rate of the output layer may be set higher than that of the other layers, as described above. In this manner, taking the converged first review model's predictions on the second test question texts as reference evaluation results and adjusting the network parameters against both differences further ensures that the second review model retains the model performance of the universal first review model while its adaptability to the test question to be reviewed is improved.
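A self-contained sketch of this dual-loss fine-tuning under the score-rate constraint; a plain linear layer stands in for the real encoder, and all shapes, data, and the unweighted Loss1 + Loss2 sum are assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

first_model = nn.Linear(16, 1)          # stand-in converged first review model
model = copy.deepcopy(first_model)      # copy to fine-tune into the second model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_first, y_first = torch.randn(8, 16), torch.rand(8, 1)  # first texts + labels
x_second = torch.randn(8, 16)                            # second texts

# Reference evaluation results: the frozen first model's predictions on the
# second test question texts, computed once before fine-tuning.
with torch.no_grad():
    reference = first_model(x_second)

for _ in range(100):
    loss1 = F.mse_loss(model(x_first), y_first)     # Loss1: fit labels of similar texts
    loss2 = F.mse_loss(model(x_second), reference)  # Loss2: stay close to the reference
    loss = loss1 + loss2                            # fused training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```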
Step S13: and processing the test questions to be evaluated and the answer texts thereof based on the second evaluation model to obtain the answer evaluation results of the answer texts.
Specifically, after the second review model is obtained through training, the test question to be reviewed and its answer text can be processed by the second review model to obtain the answer evaluation result of the answer text. As described above, the review model may be trained under different indicator constraints (e.g., score rate or score-rate interval), and the content of the answer evaluation result corresponds to the indicator used in training: under the score-rate constraint, the answer evaluation result includes the score rate of the answer text; under the score-rate-interval constraint, it includes the score-rate interval of the answer text. On this basis, if the answer evaluation result includes a score rate, multiplying it by the total score of the test question to be reviewed yields the score of the answer text; if it includes a score-rate interval, multiplying the interval's endpoint values by the total score yields a score interval for the answer text, and the midpoint of that score interval may further be taken as the score of the answer text.
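A tiny sketch of this conversion; the score-rate multiplication and the interval-midpoint rule follow the text, while the function name and values are illustrative:

```python
def answer_score(total_score, score_rate=None, rate_interval=None):
    """Map a review-model output to a score on the question's own scale."""
    if score_rate is not None:
        return score_rate * total_score            # e.g. 0.73 * 10 -> 7.3
    low, high = rate_interval                      # e.g. (0.6, 0.8) * 10 -> 6..8
    return (low * total_score + high * total_score) / 2

print(answer_score(10, score_rate=0.73))           # 7.3
print(answer_score(10, rate_interval=(0.6, 0.8)))  # 7.0, midpoint of 6 to 8
```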
In one embodiment, the test questions to be reviewed may be known in advance; for example, once a test paper has been set and before the examination is administered, all of its test questions to be reviewed are known. In this case, a second review model suited to each test question to be reviewed can be trained in advance through the foregoing embodiments, so that when the answer text of a test question needs to be reviewed, the second review model suited to that test question is simply selected, and the test question and its answer text are input to it to obtain the answer evaluation result.
In one implementation scenario, as another possible implementation, the test questions to be reviewed may not be known in advance; for example, a teacher may set questions on the spot in a classroom. In this case, after the test question to be reviewed becomes known, a second review model suited to it can be obtained by training on the spot through the foregoing embodiments, and the test question to be reviewed and its answer text are input to this second review model to obtain the answer evaluation result.
In an implementation scenario, unlike the foregoing case where the test question to be reviewed differs from every sample test question text, when the test question to be reviewed is the same as a sample test question text, the test question and its answer text can be processed directly by the first review model to obtain the answer evaluation result, which further speeds up test question review. It should be noted that as long as a sample test question text identical to the test question to be reviewed exists, the first review model can be considered suited to that test question, and the test question to be reviewed and its answer text can be processed directly by the first review model to obtain the answer evaluation result.
In one implementation scenario, after the answer evaluation result is obtained based on the second review model, the test question to be reviewed can be recorded as a reviewed test question. Then, in response to a new test question to be reviewed being the same as a reviewed test question, the new test question and its answer text can be processed directly by the second review model previously trained for that reviewed test question to obtain a new answer evaluation result, which further speeds up test question review.
According to the above scheme, a test question to be reviewed and its answer text are acquired, and a first review model trained on sample test question texts and their sample answer texts is acquired, each sample test question text being labeled with a sample evaluation result for its sample answer text. In response to the test question to be reviewed differing from every sample test question text, sample test question texts similar to it are screened out as first test question texts, the network parameters of the first review model are adjusted based on the first test question texts and their sample answer texts to obtain a second review model, and the test question to be reviewed and its answer text are processed by the second review model to obtain the answer evaluation result. A universal first review model is thus trained in advance; when the test question to be reviewed is detected to differ from the sample test questions used in earlier training, similar sample test question texts are screened out and used to fine-tune the first review model, so that the resulting second review model is applicable to the test question to be reviewed as far as possible, without additionally labeling training data for test questions that have not appeared before, which helps alleviate the high labor cost of continuous data labeling. Therefore, the accuracy of test question review can be improved while reducing labeling cost.
Referring to fig. 5, fig. 5 is a schematic diagram of a framework of an embodiment of the test question review device 50 of the present application. The test question review device 50 includes an acquisition module 51, a training module 52, and a review module 53. The acquisition module 51 is used for acquiring a test question to be reviewed and an answer text of the test question to be reviewed, and acquiring a first review model, the first review model being trained on sample test question texts and their sample answer texts, with each sample test question text labeled with a sample evaluation result for its sample answer text. The training module 52 is configured to, in response to the test question to be reviewed differing from every sample test question text, screen sample test question texts similar to the test question to be reviewed as first test question texts, and adjust network parameters of the first review model based on the first test question texts and their sample answer texts to obtain a second review model. The review module 53 is configured to process the test question to be reviewed and its answer text based on the second review model to obtain an answer evaluation result for the answer text.
In the above scheme, because the test question review device 50 trains a universal first review model in advance, when the test question to be reviewed is detected to differ from the sample test questions used in earlier training, sample test question texts similar to the test question to be reviewed are screened out as first test question texts, and the first review model is fine-tuned with the first test question texts and their sample answer texts to obtain a second review model. The second review model is thereby made applicable to the test question to be reviewed as far as possible, without additionally labeling training samples for test questions that have not appeared before, which helps alleviate the high labor cost brought by continuous data labeling. Therefore, the accuracy of test question review can be improved while reducing labeling cost.
In some disclosed embodiments, the training module 52 includes: a similarity obtaining sub-module for obtaining the feature similarity between the test question to be reviewed and the sample test question text; a correlation obtaining sub-module for obtaining the knowledge-point relevance between the test question to be reviewed and the sample test question text; an association obtaining sub-module for fusing the feature similarity and the knowledge-point relevance to obtain the association degree between the test question to be reviewed and the sample test question text; and a test question selecting sub-module for selecting sample test question texts as first test question texts based on the association degree between each sample test question text and the test question to be reviewed.
Therefore, by obtaining the feature similarity between the test question to be reviewed and the sample test question texts and selecting sample test question texts as first test question texts based on each one's feature similarity to the test question to be reviewed, sample test question texts as similar as possible to the test question to be reviewed are selected in the feature dimension, which helps improve the adaptability of the second review model to the test question to be reviewed.
In some disclosed embodiments, the knowledge-point relevance is the degree of overlap between the knowledge points involved in the test question to be reviewed and the knowledge points involved in the sample test question.
Therefore, measuring the knowledge-point relevance by the degree of overlap between the knowledge points involved in the test question to be reviewed and those involved in the sample test question helps simplify the computation of the knowledge-point relevance.
In some disclosed embodiments, the association degree is obtained by weighting the feature similarity and the knowledge-point relevance.
Therefore, obtaining the association degree by weighting the feature similarity and the knowledge-point relevance helps simplify the computation of the association degree.
In some disclosed embodiments, the similarity obtaining sub-module includes a feature obtaining unit for obtaining the hidden-layer features produced by the converged first review model when processing the test question to be reviewed, as the text features of the test question to be reviewed, and the hidden-layer features produced by the same model when processing the sample test question text, as the text features of the sample test question text; and a similarity measuring unit for performing similarity measurement based on the text features of the test question to be reviewed and those of the sample test question text to obtain the feature similarity.
Therefore, extracting text features with the converged first review model facilitates extracting feature information relevant to test question review and can further improve the accuracy of screening similar test questions during review.
In some disclosed embodiments, the sample evaluation result includes a score ratio interval where the sample score ratio of the sample answer text is located, and the test question review device 50 further includes a screening module for selecting the sample test question text as the second test question text based on the score ratio interval; the training module 52 is specifically configured to adjust the network parameters of the first review model based on the first test question text and the sample answer text thereof, and the second test question text and the sample answer text thereof, so as to obtain a second review model.
Therefore, by screening a sample test question text similar to the test question to be evaluated as a first test question text, screening a second test question text based on the score rate interval, and simultaneously referring to the first test question text and a sample answering text thereof as well as the second test question text and a sample answering text thereof in the process of adjusting the network parameters of the first evaluation model, on one hand, the second evaluation model can be forced to be applied to the test question to be evaluated as much as possible by referring to the former, and on the other hand, the second evaluation model can be forced to keep the model performance of the universal first evaluation model as much as possible by referring to the latter. Therefore, the second evaluation model can be further ensured to still keep the model performance of the universal first evaluation model on the premise of improving the adaptability of the second evaluation model to the evaluation questions.
In some disclosed embodiments, the screening module includes a clustering sub-module for clustering based on the text features of the sample test question texts to obtain a plurality of test question groups; a candidate sub-module for selecting, from each test question group, the sample test question texts within a preset distance of that group's center as candidate test question texts; and a selection sub-module for selecting at least one candidate test question text from each score-rate interval as a second test question text.
Therefore, clustering by text features yields a plurality of test question groups; selecting sample test question texts close to each group center lets the finally selected second test question texts represent the average condition of the sample test questions as far as possible, and further selecting at least one candidate test question text per score-rate interval lets them cover the various answering conditions as far as possible. The second test question texts are thus selected by integrating the two dimensions of test questions and answers, which helps the second review model retain the knowledge already learned, maintaining model performance while its adaptability to the test question to be reviewed is improved.
In some disclosed embodiments, the training module 52 includes a reference evaluation obtaining sub-module, configured to obtain a predicted evaluation result obtained by processing the second test question text and the sample answer text thereof by the training converged first review model as a reference evaluation result; the training module 52 includes a network parameter adjusting sub-module, configured to adjust a network parameter of the first review model based on a difference between a predicted evaluation result obtained by processing the first test question text and the sample answer text thereof by the first review model and a difference between a predicted evaluation result obtained by processing the second test question text and the sample answer text thereof by the first review model and a reference evaluation result, so as to obtain the second review model.
Therefore, the predicted evaluation result produced by the converged first review model on the second test question text and its sample answer text is taken as the reference evaluation result, and the network parameters of the first review model are adjusted based on both the difference between its prediction on the first test question text and its sample answer text and the sample evaluation result, and the difference between its prediction on the second test question text and its sample answer text and the reference evaluation result, so as to obtain the second review model. This further ensures that the second review model retains the performance of the general-purpose first review model while its adaptability to the test question to be evaluated is improved.
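As a non-limiting illustration, one possible form of this two-part adjustment is sketched below in PyTorch, assuming the review model outputs logits over score-rate intervals; KL divergence is used here as one possible measure of the difference to the reference evaluation result, and the loss weights and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def fine_tune_step(model, optimizer, first_batch, second_batch, ref_logits,
                   alpha=1.0, beta=1.0):
    """One adjustment step over the first review model's parameters.

    first_batch  : (inputs, labels) for first test question texts; the labels
                   are the annotated sample evaluation results
    second_batch : inputs for second test question texts
    ref_logits   : reference evaluation results, i.e. the converged first
                   model's predictions on second_batch (precomputed, frozen)
    """
    inputs_1, labels_1 = first_batch
    logits_1 = model(inputs_1)
    # Difference to the annotated sample evaluation results drives
    # adaptation to the test question to be evaluated.
    loss_first = F.cross_entropy(logits_1, labels_1)

    logits_2 = model(second_batch)
    # Difference to the frozen reference evaluation results keeps the tuned
    # model close to the general-purpose converged model.
    loss_second = F.kl_div(F.log_softmax(logits_2, dim=-1),
                           F.softmax(ref_logits, dim=-1),
                           reduction="batchmean")

    loss = alpha * loss_first + beta * loss_second
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the reference logits stay fixed while the parameters move, the second term plays the role of the converged first review model in the embodiment above.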
In some disclosed embodiments, the review module 53 is further configured to, in response to the test question to be reviewed being the same as a sample test question text, process the test question to be reviewed and its answer text based on the first review model to obtain the answer evaluation result of the answer text.
Therefore, when the test question to be reviewed is the same as a sample test question text, the answer evaluation result is obtained by processing it directly with the first review model, which further improves the speed of test question review.
In some disclosed embodiments, the test question review device 50 further includes a module configured to take the test question to be reviewed as a reviewed test question; the review module 53 is further configured to, in response to a new test question to be reviewed being the same as a reviewed test question, process the new test question and its answer text based on the second review model trained for that reviewed test question to obtain a new answer evaluation result.
Therefore, when a new test question to be reviewed is the same as a reviewed test question, the new test question and its answer text can be processed directly with the second review model already trained for that reviewed test question, which further improves the speed of test question review.
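As a non-limiting illustration, the dispatch logic of the last two embodiments could be combined in a wrapper along the following lines; the caching scheme and all names here are hypothetical.

```python
# A hypothetical dispatch wrapper around the two review models; the cache of
# per-question fine-tuned models and all names are illustrative only.
class ReviewDispatcher:
    def __init__(self, first_model, fine_tune_fn, sample_questions):
        self.first_model = first_model        # general-purpose first model
        self.fine_tune_fn = fine_tune_fn      # builds a second model per question
        self.sample_questions = set(sample_questions)
        self.second_models = {}               # reviewed question -> second model

    def review(self, question, answer_text):
        if question in self.sample_questions:
            # Same as a sample test question: the first model applies directly.
            return self.first_model(question, answer_text)
        if question not in self.second_models:
            # Unseen question: fine-tune once, then reuse for later answers.
            self.second_models[question] = self.fine_tune_fn(question)
        return self.second_models[question](question, answer_text)
```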
Referring to fig. 6, fig. 6 is a schematic frame diagram of an embodiment of an electronic device 60 according to the present application. The electronic device 60 includes a memory 61 and a processor 62 coupled to each other; the memory 61 stores program instructions, and the processor 62 is configured to execute the program instructions to implement the steps in any of the above test question review method embodiments. Specifically, the electronic device 60 may include, but is not limited to, a desktop computer, a notebook computer, a server, a mobile phone, a tablet computer, and the like.
Specifically, the processor 62 is configured to control itself and the memory 61 to implement the steps in any of the above test question review method embodiments. The processor 62 may also be referred to as a CPU (Central Processing Unit) and may be an integrated circuit chip with signal processing capability. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 62 may be implemented jointly by a plurality of integrated circuit chips.
According to the above scheme, the electronic device 60 can implement the steps in any of the test question review method embodiments: a general-purpose first review model is trained in advance, and when a test question to be reviewed is detected to differ from every sample test question used in previous training, sample test question texts similar to it are screened out as first test question texts, and the first review model is fine-tuned with the first test question texts and their sample answer texts to obtain a second review model. The second review model is thereby made to fit the test question to be reviewed as closely as possible, and no additional training samples need to be annotated for previously unseen test questions, which helps alleviate the high labor cost of continuous data annotation. Therefore, the accuracy of test question review can be improved while the annotation cost is reduced.
Referring to fig. 7, fig. 7 is a block diagram of an embodiment of a computer-readable storage medium 70 according to the present application. The computer-readable storage medium 70 stores program instructions 71 that can be executed by a processor, the program instructions 71 being configured to implement the steps in any of the above test question review method embodiments.
In the above scheme, the computer-readable storage medium 70 can likewise implement the steps in any of the test question review method embodiments and obtains the same benefits: the second review model is adapted to the test question to be reviewed without annotating additional training samples for previously unseen test questions, alleviating the high labor cost of continuous data annotation. Therefore, the accuracy of test question review can be improved while the annotation cost is reduced.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not described here again.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical functional division, and other divisions are possible in actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence (that is, the part contributing beyond the prior art, or all or part of the technical solution), may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
If the technical solution of the present application involves personal information, a product applying this technical solution shall clearly state the personal information processing rules and obtain the individual's separate consent before processing that personal information. If the technical solution involves sensitive personal information, a product applying it shall obtain the individual's separate consent before processing that information and shall additionally satisfy the requirement of "express consent". For example, a personal information collection device such as a camera may display a clear and prominent notice that the bearer is entering a personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, he or she is deemed to consent to the collection. Alternatively, on a device that processes personal information, where the personal information processing rules are communicated by clear notices or prompts, personal authorization may be obtained through a pop-up message or by asking the individual to upload his or her own personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.

Claims (12)

1. A test question review method, comprising:
acquiring a test question to be evaluated and the answer text of the test question to be evaluated, and acquiring a first review model; the first review model is obtained by training based on a sample test question text and the sample answer text thereof, and the sample test question text is annotated with the sample evaluation result of the sample answer text;
in response to the test question to be evaluated being different from every sample test question text, screening the sample test question text similar to the test question to be evaluated as a first test question text, and adjusting the network parameters of the first review model based on the first test question text and the sample answer text thereof to obtain a second review model;
and processing the test question to be evaluated and the answer text thereof based on the second review model to obtain the answer evaluation result of the answer text.
2. The method of claim 1, wherein the screening the sample test question text similar to the test question to be evaluated as a first test question text comprises:
acquiring the feature similarity between the test question to be evaluated and the sample test question text, and acquiring the knowledge point correlation between the test question to be evaluated and the sample test question text;
fusing the feature similarity and the knowledge point correlation to obtain the association degree between the test question to be evaluated and the sample test question text;
and selecting the sample test question text as the first test question text based on the association degree between each sample test question text and the test question to be evaluated.
3. The method of claim 2, wherein the knowledge point correlation is the degree of overlap between the knowledge points involved in the test question to be evaluated and the knowledge points involved in the sample test question text;
and/or, the association degree is obtained by weighting the feature similarity and the knowledge point correlation.
4. The method of claim 2, wherein the acquiring the feature similarity between the test question to be evaluated and the sample test question text comprises:
acquiring the hidden layer features of the converged first review model when processing the test question to be evaluated as the text features of the test question to be evaluated, and acquiring the hidden layer features of the converged first review model when processing the sample test question text as the text features of the sample test question text;
and performing similarity measurement based on the text features of the test question to be evaluated and the text features of the sample test question text to obtain the feature similarity.
5. The method of claim 1, wherein the sample evaluation result comprises the score-rate interval in which the sample score rate of the sample answer text falls, and before the adjusting the network parameters of the first review model based on the first test question text and the sample answer text thereof to obtain a second review model, the method further comprises:
selecting the sample test question text as a second test question text based on the score-rate interval;
and the adjusting the network parameters of the first review model based on the first test question text and the sample answer text thereof to obtain a second review model comprises:
adjusting the network parameters of the first review model based on the first test question text and the sample answer text thereof, and the second test question text and the sample answer text thereof, to obtain the second review model.
6. The method of claim 5, wherein the selecting the sample test question text as a second test question text based on the score-rate interval comprises:
performing clustering based on the text features of the sample test question texts to obtain a plurality of test question clusters;
selecting, from each test question cluster, the sample test question texts within a preset distance of the cluster center as candidate test question texts;
and selecting, from each score-rate interval, at least one candidate test question text as the second test question text.
7. The method of claim 5, wherein the adjusting the network parameters of the first review model based on the first test question text and the sample answer text thereof, and the second test question text and the sample answer text thereof, to obtain the second review model comprises:
acquiring, as a reference evaluation result, the predicted evaluation result obtained by the converged first review model processing the second test question text and the sample answer text thereof;
and adjusting the network parameters of the first review model based on the difference between the predicted evaluation result obtained by the first review model processing the first test question text and the sample answer text thereof and the sample evaluation result, and the difference between the predicted evaluation result obtained by the first review model processing the second test question text and the sample answer text thereof and the reference evaluation result, to obtain the second review model.
8. The method of claim 1, further comprising:
in response to the test question to be evaluated being the same as a sample test question text, processing the test question to be evaluated and the answer text thereof based on the first review model to obtain the answer evaluation result of the answer text;
and/or, after the processing the test question to be evaluated and the answer text thereof based on the second review model to obtain the answer evaluation result of the answer text, the method further comprises:
taking the test question to be evaluated as a reviewed test question;
and in response to a new test question to be evaluated being the same as the reviewed test question, processing the new test question to be evaluated and the answer text thereof based on the second review model obtained by training for the reviewed test question, to obtain a new answer evaluation result.
9. The method of any one of claims 1 to 8, wherein, when the adjusting of the network parameters of the first review model is performed, only the network parameters of an output layer in the first review model are adjusted;
or, when the adjusting of the network parameters of the first review model is performed, the learning rate of the output layer in the first review model is higher than the learning rate of each network layer other than the output layer in the first review model.
10. A test question review device, comprising:
an acquisition module configured to acquire a test question to be evaluated and the answer text of the test question to be evaluated, and to acquire a first review model; the first review model is obtained by training based on a sample test question text and the sample answer text thereof, and the sample test question text is annotated with the sample evaluation result of the sample answer text;
a training module configured to, in response to the test question to be evaluated being different from every sample test question text, screen the sample test question text similar to the test question to be evaluated as a first test question text, and adjust the network parameters of the first review model based on the first test question text and the sample answer text thereof to obtain a second review model;
and a review module configured to process the test question to be evaluated and the answer text thereof based on the second review model to obtain the answer evaluation result of the answer text.
11. An electronic device, comprising a memory and a processor, wherein the memory stores program instructions and the processor is configured to execute the program instructions to implement the test question review method of any one of claims 1 to 9.
12. A computer-readable storage medium, storing program instructions executable by a processor to implement the test question review method of any one of claims 1 to 9.
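
By way of illustration only and not as part of the claims, the similarity fusion of claims 2 to 4 could be realized along the lines of the following Python sketch, assuming cosine similarity over the hidden layer features and a Jaccard-style overlap between knowledge point sets as the degree of coincidence; the fusion weights and all names here are hypothetical.

```python
import numpy as np

def association_degree(q_feat, s_feat, q_kps, s_kps, w_feat=0.7, w_kp=0.3):
    """Fuse feature similarity with knowledge point correlation.

    q_feat, s_feat : hidden layer features of the converged first review model
                     for the question to be evaluated / the sample question
    q_kps, s_kps   : sets of knowledge point identifiers for each question
    """
    # Cosine similarity between the two text feature vectors.
    feat_sim = float(np.dot(q_feat, s_feat) /
                     (np.linalg.norm(q_feat) * np.linalg.norm(s_feat) + 1e-12))
    # Degree of overlap between the knowledge point sets (Jaccard here, as
    # one possible measure of coincidence).
    kp_corr = len(q_kps & s_kps) / max(len(q_kps | s_kps), 1)
    # Weighted fusion, as in claim 3; the weights are illustrative.
    return w_feat * feat_sim + w_kp * kp_corr
```

Likewise for illustration only, the two alternatives of claim 9 could be realized in PyTorch as in the following sketch; the toy model and the output_layer attribute name are assumptions, not part of the disclosed solution.

```python
import torch
import torch.nn as nn

# Illustrative only: a toy review model with a distinct output layer.
class ToyReviewModel(nn.Module):
    def __init__(self, dim=128, n_intervals=5):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.output_layer = nn.Linear(dim, n_intervals)

    def forward(self, x):
        return self.output_layer(torch.relu(self.encoder(x)))

model = ToyReviewModel()

# Alternative A: adjust only the output layer's network parameters.
for p in model.parameters():
    p.requires_grad = False
for p in model.output_layer.parameters():
    p.requires_grad = True
opt_a = torch.optim.Adam(model.output_layer.parameters(), lr=1e-3)

# Alternative B: a higher learning rate for the output layer than for the
# remaining network layers.
output_params = set(model.output_layer.parameters())
base_params = [p for p in model.parameters() if p not in output_params]
opt_b = torch.optim.Adam([
    {"params": base_params, "lr": 1e-5},
    {"params": model.output_layer.parameters(), "lr": 1e-3},
])
```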
CN202211589007.2A 2022-12-09 2022-12-09 Test question evaluation method and related device, electronic equipment and storage medium Pending CN115982351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211589007.2A CN115982351A (en) 2022-12-09 2022-12-09 Test question evaluation method and related device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211589007.2A CN115982351A (en) 2022-12-09 2022-12-09 Test question evaluation method and related device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115982351A true CN115982351A (en) 2023-04-18

Family

ID=85957162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211589007.2A Pending CN115982351A (en) 2022-12-09 2022-12-09 Test question evaluation method and related device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115982351A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150151A (en) * 2023-11-01 2023-12-01 之江实验室 Wrong question analysis and test question recommendation system and method based on large language model
CN117150151B (en) * 2023-11-01 2024-02-20 之江实验室 Wrong question analysis and test question recommendation system and method based on large language model

Similar Documents

Publication Publication Date Title
CN109582793B (en) Model training method, customer service system, data labeling system and readable storage medium
CN110267119B (en) Video precision and chroma evaluation method and related equipment
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CA3066029A1 (en) Image feature acquisition
CN104537252B (en) User Status list disaggregated model training method and device
CN111310846A (en) Method, device, storage medium and server for selecting sample image
CN111950528B (en) Graph recognition model training method and device
KR102265573B1 (en) Method and system for reconstructing mathematics learning curriculum based on artificial intelligence
CN109145245A (en) Predict method, apparatus, computer equipment and the storage medium of clicking rate
CN110765254A (en) Multi-document question-answering system model integrating multi-view answer reordering
CN111460101A (en) Knowledge point type identification method and device and processor
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN115982351A (en) Test question evaluation method and related device, electronic equipment and storage medium
CN115310520A (en) Multi-feature-fused depth knowledge tracking method and exercise recommendation method
CN112711676B (en) Video recall method and device, electronic equipment and storage medium
CN112995690B (en) Live content category identification method, device, electronic equipment and readable storage medium
CN113987161A (en) Text sorting method and device
CN113705159A (en) Merchant name labeling method, device, equipment and storage medium
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
CN111144575A (en) Public opinion early warning model training method, early warning method, device, equipment and medium
CN109657710A (en) Data screening method, apparatus, server and storage medium
CN112732908B (en) Test question novelty evaluation method and device, electronic equipment and storage medium
CN114022698A (en) Multi-tag behavior identification method and device based on binary tree structure
CN113569091A (en) Video data processing method and device
CN110334353A (en) Analysis method, device, equipment and the storage medium of word order recognition performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination