CN112214579A - Machine intelligent evaluation method and system for short answer questions - Google Patents


Info

Publication number
CN112214579A
Authority
CN
China
Prior art keywords
subject
keyword
sentence
score
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011078190.0A
Other languages
Chinese (zh)
Other versions
CN112214579B (en)
Inventor
张新华
王朝选
刘喜军
徐佳健
彭军
赖日毅
江琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lancoo Technology Co ltd
Original Assignee
Zhejiang Lancoo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lancoo Technology Co ltd filed Critical Zhejiang Lancoo Technology Co ltd
Priority to CN202011078190.0A, granted as CN112214579B
Publication of CN112214579A
Application granted
Publication of CN112214579B
Legal status: Active

Classifications

    • G06F16/3344: Information retrieval; query execution using natural language analysis
    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F40/279: Handling natural language data; recognition of textual entities
    • G06F40/30: Handling natural language data; semantic analysis
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
    • G06Q50/205: Education administration or guidance

Abstract

The application relates to examination paper evaluation and discloses a machine intelligent evaluation method and system for short answer questions, which avoid misjudging answers that obtain high scores merely by stacking keywords and achieve more objective and accurate evaluation results. The method comprises the following steps: acquiring subject keywords and common keywords of a subject in advance based on a subject corpus, acquiring the related word set of each subject keyword, and constructing a keyword library of the subject; acquiring the answering information and standard answer of a target test question; extracting the subject keyword set and common keyword set in the standard answer based on the keyword library, and determining the related word set of each subject keyword to expand the subject keyword set; identifying subject keywords and associated words in the answering information based on the expanded subject keyword set, and identifying common keywords in the answering information based on the common keyword set; calculating the sentence reasonableness of the answering information; and calculating the score of the answering information according to a scoring formula.

Description

Machine intelligent evaluation method and system for short answer questions
Technical Field
The application relates to examination paper evaluation, in particular to a machine intelligent evaluation technology of short answer questions.
Background
The examination is an indispensable important part in teaching activities, and is used for checking the ordinary learning condition of students and checking the teaching level of teachers.
With the development of computer technology, examination papers are now generally evaluated with computer assistance. The automatic evaluation of objective questions is relatively mature, but the automatic evaluation of subjective questions (for example, short answer questions) remains a difficulty. Although some automatic evaluation methods for subjective questions have been proposed, many problems remain. For example, Chinese application publication No. CN108959261A discloses a natural-language-based device and method for judging subjective test questions: the device first extracts keywords from the examinee's answer, then calculates the word similarity between the extracted keywords and the scoring-point keywords, calculates the sentence similarity between the sentences of the examinee's answer and the sentences of the reference answer, and finally obtains the judged score of the examinee's answer from the word similarity and the sentence similarity. The problem with this device is that the word similarity of keywords is calculated only from word-sense similarity, and the semantic similarity of two sentences is calculated only from the word similarity of the words they contain. This judges only the surface information of the sentences and lacks any judgment of sentence logic, so it easily misjudges answers that obtain high scores by stacking keywords.
Disclosure of Invention
The application aims to provide a machine intelligent evaluation method and system for short answer questions that avoids misjudging answers which obtain high scores by stacking keywords, making the evaluation result more objective and accurate.
The application discloses a machine intelligent evaluation method for short answer questions, which comprises the following steps:
acquiring subject keywords and common keywords of the subject in advance based on subject linguistic data, generating a word vector table of each keyword, clustering the keywords based on the word vector table, acquiring a related word set of the keywords of the subject, and constructing a keyword library of the subject;
acquiring answering information and standard answers of the target test questions;
extracting a subject keyword set and a common keyword set in the standard answers based on the keyword library, and determining a related word set of each subject keyword to expand the subject keyword set;
identifying subject keywords and associated words in the answering information based on the expanded subject keyword set, and identifying common keywords in the answering information based on the common keyword set;
calculating the sentence reasonableness of the answering information, wherein the sentence reasonableness refers to the reasonable degree of the logical sequence and relationship between words in the sentence;
calculating a score F of the answering information according to a scoring formula (the formula is rendered only as images in the published text), wherein s1, s2, s3 and s4 respectively represent the weight coefficients of the subject keyword information, the associated word information, the common keyword information and the sentence reasonableness in the answering information, with s1 > s2 > s3, and F0 represents the total score of the target test question.
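The published text renders the scoring formula only as images. As an illustration of how such a weighted combination might look, here is a minimal Python sketch under the assumption of a coverage-ratio weighted sum scaled by the total score F0; the function, its arguments and the default weights are illustrative, not the patent's actual formula:

```python
def score(n_subject, n_assoc, n_common, total_subject, total_assoc,
          total_common, reasonableness, f0,
          s1=0.5, s2=0.3, s3=0.2, s4=1.0):
    """Hypothetical weighted-coverage scoring sketch.

    NOT the patent's actual formula (which is published only as images);
    it merely respects the stated constraint s1 > s2 > s3 and the stated
    inputs: keyword counts, sentence reasonableness, and total score F0.
    """
    def ratio(found, total):
        return found / total if total else 0.0

    coverage = (s1 * ratio(n_subject, total_subject)
                + s2 * ratio(n_assoc, total_assoc)
                + s3 * ratio(n_common, total_common))
    # Sentence reasonableness (weighted by s4) scales the coverage score.
    return f0 * coverage * (s4 * reasonableness)
```

For example, an answer hitting all subject keywords and half of the associated and common keywords, with perfect reasonableness, would receive 7.5 of 10 points under these illustrative weights.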
In a preferred embodiment, the calculating of the sentence reasonableness of the answering information, where the sentence reasonableness refers to the degree to which the logical order and relationships between words in a sentence are reasonable, further includes:
respectively extracting word sequences of each sentence in the answering information and the standard answers;
calculating the probability value of the position of each word in each word sequence in the sentence by adopting an N-gram language model according to the Markov assumption;
calculating a word reasonable probability value of each sentence according to the probability value of the position of each word in the sentence based on a Bayesian conditional probability model;
and calculating the sentence reasonableness of the answering information according to the answering information and the word reasonable probability value of each sentence in the standard answers.
In a preferred embodiment, the calculating of the sentence reasonableness of the answering information according to the word reasonable probability values of each sentence in the answering information and the standard answer further includes:
respectively calculating word reasonable probability mean values of sentences of the answering information and the standard answers;
if the mean value of the reasonable probabilities of the words of the sentence of the answering information is smaller than the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonability of the answering information is the quotient of the mean value of the reasonable probabilities of the words of the sentence of the answering information and the mean value of the reasonable probabilities of the words of the sentence of the standard answer;
and if the mean value of the reasonable probability of the words of the sentence of the answering information is greater than or equal to the mean value of the reasonable probability of the words of the sentence of the standard answer, the sentence reasonableness of the answering information is 1.
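The capped-quotient rule above can be sketched as a small function (a minimal illustration; the function and variable names are not from the patent):

```python
def sentence_reasonableness(answer_probs, standard_probs):
    """Capped quotient of mean word-reasonable probability values.

    answer_probs / standard_probs: the word reasonable probability value
    of each sentence in the answering information / standard answer.
    """
    answer_mean = sum(answer_probs) / len(answer_probs)
    standard_mean = sum(standard_probs) / len(standard_probs)
    # Below the standard answer's mean: reasonableness is the quotient;
    # at or above it: reasonableness is capped at 1.
    if answer_mean < standard_mean:
        return answer_mean / standard_mean
    return 1.0
```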
In a preferred embodiment, the obtaining, based on the subject corpus, subject keywords and common keywords of the subject, generating a word vector table of each keyword, clustering each keyword based on the word vector table, and obtaining a relevant word set of each subject keyword, further includes:
acquiring subject keywords and common keywords of the subject based on the subject corpus;
generating word vectors of the keywords by using a text depth language model to obtain a word vector table;
calculating the distance between each keyword in the word vector table;
and acquiring words with the distance from each subject keyword to each subject keyword less than a preset threshold value to form a related word set of each subject keyword.
In a preferred example, the text deep language model is a word2vec-based deep learning model;
and calculating the distance between the keywords in the word vector table by adopting a cosine similarity calculation method.
In a preferred embodiment, after the calculating the score of the response information, the method further includes:
calculating the average value of the scores of the answering information of all examinees in a preset examination group;
obtaining expected scores of the answering information of all examinees in the preset examination group, and calculating an average value of the expected scores;
and adjusting the scores of the answering information of each examinee according to the average value of the scores and the average value of the expected scores.
In a preferred embodiment, the adjusting the score of the answering information of each test taker according to the average value of the scores and the average value of the expected scores further comprises:
determining an expected average score range according to the average value of the expected scores, determining to adjust the scores upwards when the average value of the scores is smaller than the lower limit value of the expected average score range, and determining to adjust the scores downwards when the average value of the scores is larger than or equal to the upper limit value of the expected average score range;
after the determining to up-regulate the score or the determining to down-regulate the score, further comprising:
calculating a down-regulation or up-regulation base score as an absolute value of a difference between the average of the scores and the average of the expected scores;
according to the ranking of all the examinees in the preset examination group from high to low according to the scores of the answering information, dividing the examinees and the scores thereof into an excellent examinee set, a common examinee set and a poor examinee set according to the ranking result;
calculating an up-or down-regulation score for each test in the set of common test takers equal to the up-or down-regulation base score;
calculating, according to a formula (rendered only as an image in the published text), an up-regulation or down-regulation value for the score of each examinee in the excellent examinee set, wherein TR represents the up-regulation or down-regulation base score, F0 represents the total score of the target test question, Fi represents the score of the i-th examinee in the excellent examinee set, and Si represents the up-regulation or down-regulation value for the i-th examinee in the excellent examinee set;
calculating, according to a formula (rendered only as an image in the published text), an up-regulation or down-regulation value for the score of each examinee in the poor examinee set, wherein f represents the number of examinees in the excellent examinee set, h represents the number of examinees in the poor examinee set, S represents the up-regulation or down-regulation value for each examinee in the poor examinee set, TR represents the up-regulation or down-regulation base score, and Si represents the up-regulation or down-regulation value for the i-th examinee in the excellent examinee set;
and adjusting the scores of the answering information of each examinee up or down according to the calculated up-adjustment or down-adjustment values of the scores of each examinee.
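The parts of the adjustment procedure that the text states explicitly, namely the expected-range decision and the base adjustment applied to the common examinee set, can be sketched as follows (the half-width of the expected range is an illustrative assumption, since the text does not specify how the range is derived; the formulas for the excellent and poor sets appear only as images and are not reproduced):

```python
def adjustment_direction(score_avg, expected_avg, band=1.0):
    """Decide whether scores are adjusted up, down, or not at all.

    The patent derives an expected average-score range from the mean of
    the expected scores; the half-width `band` is an illustrative
    assumption, since the text does not specify how the range is set.
    """
    lower, upper = expected_avg - band, expected_avg + band
    if score_avg < lower:
        return "up"      # actual average below the expected range
    if score_avg >= upper:
        return "down"    # actual average at or above the upper limit
    return None          # within the expected range: no adjustment

def base_adjustment(score_avg, expected_avg):
    # Base score TR = |mean(actual scores) - mean(expected scores)|;
    # each examinee in the "common" set is adjusted by exactly TR.
    return abs(score_avg - expected_avg)
```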
The application also discloses a machine intelligent evaluation system for short answer questions, comprising:
the keyword library construction module is used for acquiring subject keywords and common keywords of the subject based on subject linguistic data in advance, generating a word vector table of each keyword, clustering each keyword based on the word vector table, and acquiring a related word set of each subject keyword so as to construct a keyword library of the subject;
the acquisition module is used for acquiring answering information and standard answers of the target test questions;
a keyword identification module, configured to extract a subject keyword set and a common keyword set in the standard answer based on the keyword library, determine a related word set of each subject keyword to expand the subject keyword set, identify subject keywords and related words in the response information based on the expanded subject keyword set, and identify common keywords in the response information based on the common keyword set;
the reasonableness calculation module is used for calculating the sentence reasonableness of the answering information, wherein the sentence reasonableness refers to the reasonable degree of the logical sequence and relationship among the words in the sentence;
a scoring module, configured to calculate a score F of the answering information according to a scoring formula (rendered only as images in the published text), wherein s1, s2, s3 and s4 respectively represent the weight coefficients of the subject keyword information, the associated word information, the common keyword information and the sentence reasonableness in the answering information, with s1 > s2 > s3, and F0 represents the total score of the target test question.
The application also discloses a machine intelligent evaluation system for short answer questions, comprising:
a memory for storing computer-executable instructions; and
a processor for implementing the steps in the method as described hereinbefore when executing the computer-executable instructions.
The present application also discloses a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the steps in the method as described hereinbefore.
Compared with the prior art, the embodiment of the application at least comprises the following advantages:
the keyword information contained in the answering information is identified by taking the standard answer as a reference, and the score of the answering information of the examinee is calculated by combining the keyword information with the sentence reasonability (namely the reasonability degree of the logical sequence and the relation between words in the sentence), so that the misjudgment of the behavior of obtaining high score by stacking the words is avoided, and the scoring result is more objective and accurate.
In addition, when expanding keywords, only the subject keyword set is expanded, not the common keyword set, which effectively filters out words with low relevance and helps speed up subsequent keyword identification. Moreover, the subject keyword set is expanded with words that are identical, similar or highly related in meaning to the subject keywords, so the expanded set covers a wider range of keywords; answers with a low proportion of subject keywords but a high proportion of related words can thus still be scored reasonably, improving the flexibility and accuracy of the scoring result to a certain extent.
In addition, when calculating the score, the subject keywords carry the highest weight, the associated words a lower weight, and the common keywords the lowest weight, so that answers with a low proportion of common keywords but a high proportion of subject keywords and associated words can still obtain a reasonable score, improving the accuracy of the scoring result to a certain extent.
Furthermore, the sentence reasonableness of the answering information is calculated from the word sequences of each sentence in the answering information and the standard answer, based on the Markov assumption and a Bayesian conditional probability model, so that the reasonableness result reflects how closely the answer approximates natural language expression habits. Using the sentence reasonableness of the standard answer as the judgment reference unifies the judgment standard, avoids scoring differences, and further improves the objectivity and accuracy of the scoring result.
In addition, the expected scores of the examinees are used as an adjustment standard: whether adjusting up or down, high-score examinees are adjusted by less and low-score examinees by more, so that the purpose of the examination is achieved while avoiding the psychological health problems caused by too large a gap between an examinee's actual score and expected score.
This specification describes a number of technical features distributed among its various technical solutions; listing all possible combinations of these features (i.e. all technical solutions) would make the description excessively long. To avoid this, the technical features disclosed in the summary above, in the embodiments and examples below, and in the drawings may be freely combined with one another to form new technical solutions (all of which are considered described in this specification), unless such a combination is technically infeasible. For example, if one example discloses feature A + B + C, another example discloses feature A + B + D + E, features C and D are equivalent means for the same purpose such that only one of them would be used, and feature E can technically be combined with feature C, then the solution A + B + C + D should not be considered described (because it is technically infeasible), while the solution A + B + C + E should be considered described.
Drawings
Fig. 1 is a flow chart of a method for machine-intelligent review of short-response questions according to a first embodiment of the present application.
FIG. 2 is a schematic structural diagram of a machine intelligent review system for short-response questions according to a second embodiment of the present application.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application may be implemented without these technical details and with various changes and modifications based on the following embodiments.
Description of partial concepts:
Subject keywords: subject-specific nouns or phrases, and all words appearing in the subject corpus that reflect subject characteristics and carry informational meaning in the subject field.
Common keywords: words other than subject keywords that frequently appear in the subject corpus and carry some informational meaning.
Associated words: words whose meaning is the same as or similar to a corresponding subject keyword, and words with high subject relevance to it. A subject keyword may have 0, 1 or more associated words, and the associated words of a subject keyword are not limited to other subject keywords and/or common keywords.
Semantically associated words: words with the same or similar meaning as the corresponding subject keyword.
Subject-associated words: words with high subject relevance to the corresponding subject keyword.
Some innovation points of the application are illustrated below with a short answer question from the subject of history:
simple answering: to illustrate the role of the people in independent war?
Standard answers: the united states people, who have been tightly integrated with the formation of the american nationality, have ultimately developed a tremendous revolutionary force, their role in the united states' independent war is not negligible, and we can see the following aspects: (1) after the seven-year war is finished, the English starts to tighten the compression and the peeling of the English colonial land.
According to the embodiment of the application, the keywords are divided and expanded:
1) and identifying a subject keyword set and a common keyword set in the standard answer based on a keyword library corresponding to the historical subject:
subject keyword set: { [ 1-the American ethnic group ] [ 1-the revolutionary force ] [ 1-the independent war ] … … };
common keyword set: { [ 3-polymerization ] [ 3-bulk ] [ 3-formation ] [ 3-Large ] … … }.
2) The method comprises the following steps of (1) expanding a subject keyword set based on a keyword library of a historical subject:
determining [ 1-the American public ] corresponding associated word set: { [ 2-Community ] [ 2-people ] [ 2-America ] [ 2-Union ] [ 2-Med. Atlantic ] [ 2-Med. No. ] ] [ 2-Med. … … };
determining [ 1-independent war ] corresponding associated word set: { [ 2-USA ] [ 2-oppress ] [ 2-peel ] [ 2-FREE ] [ 2-REJECT ] [ 2-revolute ] [ 2-GROUND ] [ 2-ARRAY ] [ 2-EY force ] [ 2-NYY ] … … }; … … are provided.
The expanded subject keyword set comprises: { [ 1-the American people ] [ 1-the national beauty ] [ 1-revolutionary force ] [ 1-independent war ] [ 2-Community ] [ 2-the civil public ] [ 2-the American national beauty ] [ 2-the conglomerate ] [ 2-the Large ] [ 2-the United states ] [ 2-the oppression ] [ 2-the decortication ] [ 2-the liberty ] [ 2-the resistance ] [ 2-the revolutionary ] [ 2-the army ] [ 2-the armed force ] [ 2-the New York ] … … };
common keyword set: { [ 3-polymerization ] [ 3-bulk ] [ 3-formation ] … … }.
It can be seen that:
1) when the keywords are expanded, only the subject keywords are expanded, words with low relevance are effectively filtered (namely common keywords are not expanded), and the identification speed of subsequent keyword identification is improved.
2) The expanded subject keywords cover wider keywords, so that response information containing more relevant words can be reasonably scored during subsequent scoring calculation.
3) Because the keywords are reasonably divided and selectively expanded, and the subsequent score calculation gives subject keywords and associated words higher weight than common keywords, the scoring result is more reasonable and accurate. For example, for the answer "the formation of the American nation made the people group together into a revolutionary power": although it lacks the common keywords "aggregate" and "whole" and the associated word "huge", it contains the subject keywords "American nation" and "revolutionary power" and the associated words "people" and "group", and since common keywords carry less scoring weight than subject keywords and associated words, this answer can still obtain a relatively high score. The embodiments of the present application achieve this effect.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The first embodiment of the present application relates to a machine intelligent review method of a short-answer question, the flow of which is shown in fig. 1, and the method comprises the following steps:
and constructing a corresponding keyword library for each subject in advance. Specifically, the construction method of the keyword library of each subject comprises the following steps: the method comprises the steps of obtaining subject keywords and common keywords of a subject based on large-scale subject linguistic data (including subject textbooks, test questions, documents and the like, for example), generating a word vector table of each keyword, clustering the keywords based on the word vector table, obtaining a relevant word set of each subject keyword, and constructing a keyword library of the subject, wherein the keyword library comprises subject keywords and common keywords of the subject and the relevant word set of each subject keyword.
Optionally, the "acquiring subject keywords and common keywords of the subject based on the subject corpus, generating a word vector table of each keyword, clustering each keyword based on the word vector table, and acquiring a related word set of each subject keyword" may further include the following steps:
① acquiring subject keywords and common keywords of the subject based on the subject corpus;
② generating word vectors of the keywords using a text deep language model to obtain a word vector table;
③ calculating the distance between the keywords in the word vector table;
④ acquiring, for each subject keyword, the words whose distance to it is less than a preset threshold, to form the related word set of that subject keyword.
The text deep language model in the second step may be, for example, a word2vec-based deep learning model, an NNLM neural network language model, an ELMo model, a BERT model, or the like, without limitation. In the third step, the distance between keywords in the word vector table may be calculated by, for example, cosine similarity, the Pearson correlation coefficient, or a Euclidean-distance-based similarity measure, without limitation.
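The distance calculation and related-word-set construction described above can be sketched with cosine similarity over word vectors (a minimal illustration using toy two-dimensional vectors and an illustrative threshold; a real system would use vectors from a trained word2vec or similar model):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_word_set(subject_keyword, word_vectors, threshold=0.7):
    """Words whose cosine similarity to the subject keyword is at least
    `threshold` (i.e. whose cosine distance is below a preset bound)."""
    target = word_vectors[subject_keyword]
    return {word for word, vec in word_vectors.items()
            if word != subject_keyword
            and cosine_similarity(target, vec) >= threshold}
```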
In step 101, response information and standard answers of the target test question are acquired.
Then, step 102 is entered, based on the keyword library, a subject keyword set and a common keyword set in the standard answer are extracted, and a related word set of each subject keyword is determined to expand the subject keyword set.
Optionally, step 102 is preceded by the step "clause and word segmentation on the standard answer", and step 102 is further implemented as: and extracting a subject keyword set and a common keyword set in the standard answer based on the keyword library and the word segmentation result of the standard answer, and determining a related word set of each subject keyword to expand the subject keyword set.
Optionally, after step 102, the following steps are further included: and removing the repeated associated words in the expanded subject keyword set from the subject keyword set, and removing the repeated common keywords in the common keyword set from the associated words in the expanded subject keyword set.
Optionally, after step 102, the following steps are further included: and labeling the category identification of each keyword in the expanded subject keyword set, and providing a basis for the subsequent step 103 of identifying different types of keywords and step 105 of determining the number of the different types of keywords. For example and without limitation, the following notations apply: the subject keyword may be labeled 1 and the associated word may be labeled 2. Optionally, the related words include semantic related words and disciplinary related words. For example and without limitation, the following notations apply: the semantic related word may be denoted by 21 and the disciplinary related word may be denoted by 22.
Then, step 103 is performed, subject keywords and associated words in the response information are identified based on the expanded subject keyword set, and common keywords in the response information are identified based on the common keyword set.
Optionally, step 103 is preceded by the step of segmenting the answering information into sentences and words, and step 103 is further implemented as: identifying subject keywords and associated words in the answering information based on the expanded subject keyword set and the word segmentation result of the answering information, and identifying common keywords in the answering information based on the common keyword set.
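After word segmentation, identifying the keywords in the answering information reduces to membership tests against the three keyword sets (a minimal sketch; the token list stands in for the output of a Chinese word segmenter):

```python
def identify_keywords(tokens, subject_set, associated_set, common_set):
    """Partition an answer's segmented tokens into subject keywords,
    associated words and common keywords (unmatched tokens are ignored)."""
    found = {"subject": [], "associated": [], "common": []}
    for tok in tokens:
        if tok in subject_set:
            found["subject"].append(tok)
        elif tok in associated_set:
            found["associated"].append(tok)
        elif tok in common_set:
            found["common"].append(tok)
    return found
```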
Then, step 104 is entered to calculate the sentence reasonableness of the answering information, i.e. the degree to which the logical order and relationships between words in the sentences are reasonable. Specifically, the sentence reasonableness measures how reasonable the word order (the logical sequence of and relations between words) of a sentence is, and is calculated based on a language model trained on a large amount of language data.
Optionally, step 104 may further comprise the following sub-steps 104 a-104 d:
step 104 a: respectively extracting the word sequence of each sentence in the answering information and the standard answer;
step 104 b: calculating the probability value of the position of each word in each word sequence in the sentence by adopting an N-gram language model according to the Markov assumption;
step 104 c: calculating a word reasonable probability value of each sentence according to the probability value of the position of each word in the sentence based on a Bayesian conditional probability model;
step 104 d: and calculating the sentence reasonableness of the answering information according to the answering information and the word reasonable probability value of each sentence in the standard answer.
In one embodiment, the N-gram language model is a trigram language model. In other embodiments, the N-gram language model may be a bigram, 4-gram, or 5-gram language model, etc. Taking the trigram model as an example: under the Markov assumption, the probability that a word appears at a given position is related only to one or a limited number of the words before it; with a trigram model, the word at the current position is related only to the two words before it, i.e., P(w_n | w_1, w_2, ..., w_{n-1}) ≈ P(w_n | w_{n-2}, w_{n-1}). The trigram language model is obtained by training on the large-scale subject corpus. Based on the sentence word sequences of the standard answer and the examinee's answering information, the probability of each word appearing at its position in the current sentence is obtained, and the word reasonable probability of each sentence in the standard answer and the answering information is calculated according to the Bayesian conditional probability model. The specific calculation is as follows: let S denote the word sequence of a sentence; then the sentence probability is P(S) = P(w_1, w_2, ..., w_n) = P(w_1) × P(w_2 | w_1) × ... × P(w_n | w_1, w_2, ..., w_{n-1}), where P(w_1) is the probability that the first word occurs, P(w_2 | w_1) is the probability that w_2 occurs immediately after w_1, and so on; each factor P(w_2 | w_1), P(w_3 | w_1, w_2), ..., P(w_n | w_1, w_2, ..., w_{n-1}) is obtained from the constructed trigram language model.
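A minimal runnable sketch of the trigram calculation described above, assuming a toy two-sentence corpus in place of the large-scale subject corpus (all names are illustrative):

```python
from collections import defaultdict

# Toy corpus standing in for the large-scale subject corpus.
corpus = [
    ["the", "cell", "divides"],
    ["the", "cell", "grows"],
]

tri = defaultdict(int)  # counts of (w_{n-2}, w_{n-1}, w_n)
bi = defaultdict(int)   # counts of (w_{n-2}, w_{n-1})

for sent in corpus:
    padded = ["<s>", "<s>"] + sent  # pad so the first words also have two predecessors
    for i in range(2, len(padded)):
        tri[(padded[i - 2], padded[i - 1], padded[i])] += 1
        bi[(padded[i - 2], padded[i - 1])] += 1

def p(w, u, v):
    """P(w | u, v) by maximum likelihood: the trigram approximation above."""
    return tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0

def sentence_prob(sent):
    """P(S) = product over the sentence of P(w_n | w_{n-2}, w_{n-1})."""
    padded = ["<s>", "<s>"] + sent
    prob = 1.0
    for i in range(2, len(padded)):
        prob *= p(padded[i], padded[i - 2], padded[i - 1])
    return prob

print(sentence_prob(["the", "cell", "divides"]))  # 0.5
```

Here P("divides" | "the", "cell") = 1/2 because "the cell" continues with "divides" in one of two occurrences, so the whole sentence scores 0.5; a real system would also smooth unseen trigrams rather than assign them probability zero.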
Optionally, step 104d may be further implemented as: respectively calculating the word reasonable probability mean values of the sentences of the answering information and the standard answers; if the mean value of the reasonable probabilities of the words of the sentence of the answering information is smaller than the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonability of the answering information is the quotient of the mean value of the reasonable probabilities of the words of the sentence of the answering information and the mean value of the reasonable probabilities of the words of the sentence of the standard answer; if the mean value of the reasonable probabilities of the words of the sentence of the response information is greater than or equal to the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonableness of the response information is 1.
Then, step 105 is entered, according to the formula
[Formula images not reproduced]
Calculating a score F of the answering information, wherein s1, s2, s3, and s4 are the weight coefficients of the subject keyword information, related word information, common keyword information, and sentence reasonableness in the answering information, respectively, with s1 > s2 > s3, and F0 is the total score of the target test question.
Note that in step 105, the number of related words in the answering information excludes related words identical to subject keywords, and the number of common keywords in the answering information excludes common keywords identical to related words, so that no word is counted under two categories.
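The exclusion rule can be sketched as set arithmetic. The function name `count_keywords` and the single-letter example words are illustrative assumptions:

```python
def count_keywords(answer_words, subject_kws, related_words, common_kws):
    """Keyword counts for step 105 (assumed helper): related words identical
    to subject keywords, and common keywords identical to related words, are
    excluded so no answer word is counted under two categories."""
    found = set(answer_words)
    n_subject = len(found & subject_kws)
    n_related = len(found & (related_words - subject_kws))
    n_common = len(found & (common_kws - related_words))
    return n_subject, n_related, n_common

print(count_keywords({"a", "b", "c", "d"}, {"a"}, {"a", "b"}, {"b", "c"}))  # (1, 1, 1)
```

In the example, "a" counts only as a subject keyword and "b" only as a related word, even though both also appear in lower-priority sets.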
Optionally, the related words comprise semantic related words and disciplinary related words, and the step 105 further comprises the steps of: according to
[Formula images not reproduced]
Calculating the score F of the answering information, wherein s1, s2, s3, and s4 are the weight coefficients of the subject keyword information, related word information, common keyword information, and sentence reasonableness in the answering information, respectively; a is the weight coefficient of the semantic related word information within the related word information, and b is the weight coefficient of the disciplinary related word information within the related word information; s1 > s2 > s3 and a > b; and F0 is the total score of the target test question.
Here, s4 may lie between s1 and s3; preferably, s1 > s4 > s2 > s3.
Optionally, after step 105, the following steps a to C may be further included:
a: calculating the average value of the scores of the answering information of all examinees in the preset examination group;
b: obtaining expected scores of the answering information of all examinees in the preset examination group, and calculating the average value of the expected scores;
c: and adjusting the scores of the answering information of each examinee according to the average value of the scores and the average value of the expected scores.
Optionally, the step C may be further implemented as: and determining an expected average score range according to the average value of the expected scores, determining to adjust the score upwards when the average value of the score is smaller than the lower limit value of the expected average score range, and determining to adjust the score downwards when the average value of the score is larger than or equal to the upper limit value of the expected average score range.
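A small sketch of this decision rule, assuming, as the text later suggests, an expected range of the expected average ± a preset margin (the function name and margin default are illustrative):

```python
def adjustment_direction(score_mean, expected_mean, margin=3):
    """Decide whether scores are adjusted up, down, or left alone,
    against the range [expected_mean - margin, expected_mean + margin)."""
    low, high = expected_mean - margin, expected_mean + margin
    if score_mean < low:
        return "up"    # machine scoring came out below the expected range
    if score_mean >= high:
        return "down"  # machine scoring came out at or above the upper limit
    return "none"

print(adjustment_direction(58, 65))  # up
```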
Optionally, after the determining to adjust the score up or the determining to adjust the score down, the following steps a to f may be further included:
a: calculating a down-regulation or up-regulation base score as an absolute value of a difference between the average of the scores and the average of the expected scores;
b: according to the ranking of all the examinees in the preset examination group from high to low according to the scores of the answering information, dividing each examinee and the scores thereof into an excellent examinee set, a common examinee set and a poor examinee set according to the ranking result;
c: calculating the up-regulation or down-regulation score of each test in the common test set to be equal to the up-regulation or down-regulation benchmark score;
d: according to the formula
Figure BDA0002717370310000151
Computing each test in the set of excellent testsWherein TR represents the up or down baseline score, F0Indicates the total score of the target test question, FiRepresents the score, S, of the ith test taker in the set of excellent test takersiRepresenting the upregulation or downregulation value of the ith test taker in the set of excellent test takers;
e: according to the formula
Figure BDA0002717370310000152
Calculating the up-regulation or down-regulation score of each examinee in the poor examinee set, wherein f represents the number of examinees in the excellent examinee set, h represents the number of examinees in the poor examinee set, S represents the up-regulation or down-regulation score of each examinee in the poor examinee set, TR represents the up-regulation or down-regulation reference score, and S represents the up-regulation or down-regulation reference scoreiRepresenting the upregulation or downregulation value of the ith test taker in the set of excellent test takers;
f: and adjusting the score of the answering information of each examinee up or down according to the calculated up-adjustment or down-adjustment value of the score of each examinee.
Alternatively, the above-mentioned "determining the expected average score range according to the average of the expected scores" may be set by manual input of the reviewer or by default by the system to the average of the expected scores ± a preset score, which may be, for example, but not limited to, 2, 3, 4, 5, etc.
The second embodiment of the present application relates to a machine intelligent review system for a short-response question, which has a structure shown in fig. 2 and includes a keyword library construction module, a keyword library, an acquisition module, a keyword recognition module, a reasonableness calculation module, and a scoring module.
Specifically, the keyword library construction module is configured to obtain subject keywords and common keywords of the subject based on the subject corpus, generate a word vector table of each keyword, perform clustering on each keyword based on the word vector table, and obtain a related word set of each subject keyword, so as to construct a keyword library of the subject. The keyword library construction module is used for constructing a keyword library for each subject, and the keyword library of each subject comprises subject keywords of the subject, common keywords and associated word sets corresponding to the subject keywords.
Optionally, the keyword library building module is further configured to obtain subject keywords and common keywords of the subject based on the subject corpus; generating word vectors of the keywords by using a text depth language model to obtain a word vector table; calculating the distance between each keyword in the word vector table; and acquiring words with the distance from each subject keyword to each subject keyword less than a preset threshold value to form a related word set of each subject keyword. The text deep language model may be, for example, a word2 vec-based deep learning model, an NNLM neural network language model, an ELMo model, a BERT model, or the like, and is not limited thereto. For example, a cosine similarity calculation method, a pearson correlation coefficient, or a similarity calculation method based on euclidean distance may be used to calculate the distance between the keywords in the word vector table, but the method is not limited thereto.
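A toy sketch of forming a related word set from word vectors, using cosine similarity (so a smaller distance corresponds to a larger similarity, compared against a preset threshold); the 2-dimensional vectors and vocabulary are illustrative stand-ins for real word2vec-style embeddings:

```python
import math

# Toy word-vector table; a real system would use trained embeddings.
vectors = {
    "force":    [0.9, 0.1],
    "pressure": [0.8, 0.2],
    "poem":     [0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def related_words(keyword, threshold=0.9):
    """Words whose cosine similarity to the keyword meets the preset threshold."""
    kv = vectors[keyword]
    return {w for w, v in vectors.items()
            if w != keyword and cosine(kv, v) >= threshold}

print(related_words("force"))  # {'pressure'}
```

With this toy table, "pressure" points in nearly the same direction as "force" and clears the threshold, while "poem" does not.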
The acquisition module is used for acquiring the answering information and the standard answers of the target test questions.
The keyword identification module is used for extracting a subject keyword set and a common keyword set in the standard answer based on the keyword library, and determining a related word set of each subject keyword to expand the subject keyword set; and identifying subject keywords and associated words in the response information based on the expanded subject keyword set, and identifying common keywords in the response information based on the common keyword set.
Optionally, the keyword recognition module is further configured to perform sentence segmentation on the standard answer, extract a subject keyword set and a common keyword set in the standard answer based on the keyword library and the segmentation result of the standard answer, and determine a related word set of each subject keyword to expand the subject keyword set. Optionally, the keyword recognition module is further configured to perform word segmentation and sentence segmentation on the response information, recognize subject keywords and associated words in the response information based on the expanded subject keyword set and the word segmentation result of the response information, and recognize common keywords in the response information based on the common keyword set.
Optionally, the keyword recognition module is further configured to perform deduplication on associated words in the expanded subject keyword set, the associated words being repeated with the subject keywords, and eliminate common keywords in the common keyword set, the common keywords being repeated with the associated words in the expanded subject keyword set.
Optionally, the keyword recognition module is further configured to label each keyword in the expanded subject keyword set with a category identifier thereof, so as to provide a basis for subsequent recognition of different types of keywords and determination of the number of different types of keywords, for example, but not limited to, the following labeling manner is adopted: the subject keyword may be labeled 1 and the associated word may be labeled 2. Optionally, the related words include semantic related words and disciplinary related words, for example and without limitation, the following notation is adopted: the semantic related word may be denoted by 21 and the disciplinary related word may be denoted by 22.
The reasonableness calculation module is used for calculating the sentence reasonableness of the answering information, i.e., how reasonable the logical order of, and the relationships between, the words in each sentence are. Sentence reasonableness measures the plausibility of a sentence's word order and is calculated with a language model trained on a large corpus.
Optionally, the reasonableness calculation module is further configured to extract word sequences of each sentence in the answer information and the standard answer respectively; calculating the probability value of the position of each word in each word sequence in the sentence by adopting an N-gram language model according to the Markov assumption; calculating a word reasonable probability value of each sentence according to the probability value of the position of each word in the sentence based on a Bayesian conditional probability model; and calculating the sentence reasonableness of the answering information according to the answering information and the word reasonable probability value of each sentence in the standard answer.
In one embodiment, the N-gram language model is a trigram language model. In other embodiments, the N-gram language model may also be a bigram, 4-gram, or 5-gram language model, etc. Taking the trigram model as an example: under the Markov assumption, the probability that a word appears at a given position is related only to one or a limited number of the words before it; with a trigram model, the word at the current position is related only to the two words before it, i.e., P(w_n | w_1, w_2, ..., w_{n-1}) ≈ P(w_n | w_{n-2}, w_{n-1}). The trigram language model is obtained by training on the large-scale subject corpus. Based on the sentence word sequences of the standard answer and the examinee's answering information, the probability of each word appearing at its position in the current sentence is obtained, and the word reasonable probability of each sentence in the standard answer and the answering information is calculated according to the Bayesian conditional probability model. The specific calculation is as follows: let S denote the word sequence of a sentence; then the sentence probability is P(S) = P(w_1, w_2, ..., w_n) = P(w_1) × P(w_2 | w_1) × ... × P(w_n | w_1, w_2, ..., w_{n-1}), where P(w_1) is the probability that the first word occurs, P(w_2 | w_1) is the probability that w_2 occurs immediately after w_1, and so on; each factor P(w_2 | w_1), P(w_3 | w_1, w_2), ..., P(w_n | w_1, w_2, ..., w_{n-1}) is obtained from the constructed trigram language model.
Optionally, the reasonableness calculation module is further configured to calculate word reasonable probability averages of sentences of the answer information and the standard answers, respectively; if the mean value of the reasonable probabilities of the words of the sentence of the answering information is smaller than the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonability of the answering information is the quotient of the mean value of the reasonable probabilities of the words of the sentence of the answering information and the mean value of the reasonable probabilities of the words of the sentence of the standard answer; if the mean value of the reasonable probabilities of the words of the sentence of the response information is greater than or equal to the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonableness of the response information is 1.
The scoring module is used for scoring according to a formula
[Formula images not reproduced]
Calculating a score F of the answering information, wherein s1, s2, s3, and s4 are the weight coefficients of the subject keyword information, related word information, common keyword information, and sentence reasonableness in the answering information, respectively, with s1 > s2 > s3, and F0 is the total score of the target test question.
Optionally, the related words comprise semantic related words and disciplinary related words, and the scoring module is further configured to score according to
[Formula images not reproduced]
Calculating a score F of the answering information, wherein s1, s2, s3, and s4 are the weight coefficients of the subject keyword information, related word information, common keyword information, and sentence reasonableness in the answering information, respectively; a is the weight coefficient of the semantic related word information within the related word information, and b is the weight coefficient of the disciplinary related word information within the related word information; s1 > s2 > s3 and a > b; and F0 is the total score of the target test question.
Here, s4 may lie between s1 and s3; preferably, s1 > s4 > s2 > s3.
Optionally, the scoring module is further configured to calculate an average value of the scores of the answering information of all the examinees in the preset examination group; obtaining expected scores of the answering information of all examinees in the preset examination group, and calculating the average value of the expected scores; and adjusting the scores of the answering information of each examinee according to the average value of the scores and the average value of the expected scores.
Optionally, the scoring module is further configured to determine an expected average score range according to the average value of the expected scores, determine to adjust the score upward when the average value of the score is smaller than a lower limit value of the expected average score range, and determine to adjust the score downward when the average value of the score is greater than or equal to an upper limit value of the expected average score range.
Optionally, the scoring module is further configured to: calculate a down-regulation or up-regulation benchmark score as the absolute value of the difference between the average value of the scores and the average value of the expected scores; rank all the examinees in the preset examination group from high to low according to the scores of their answering information, and divide the examinees and their scores into an excellent examinee set, a common examinee set, and a poor examinee set according to the ranking result; calculate the up-regulation or down-regulation value of each examinee in the common examinee set as equal to the benchmark score; according to the formula [formula image not reproduced], calculate the up-regulation or down-regulation value of the score of each examinee in the excellent examinee set, wherein TR represents the up-regulation or down-regulation benchmark score, F0 represents the total score of the target test question, Fi represents the score of the i-th examinee in the excellent examinee set, and Si represents the up-regulation or down-regulation value of the i-th examinee in the excellent examinee set; according to the formula [formula image not reproduced], calculate the up-regulation or down-regulation value of the score of each examinee in the poor examinee set, wherein f represents the number of examinees in the excellent examinee set, h represents the number of examinees in the poor examinee set, S represents the up-regulation or down-regulation value of each examinee in the poor examinee set, TR represents the up-regulation or down-regulation benchmark score, and Si represents the up-regulation or down-regulation value of the i-th examinee in the excellent examinee set; and adjust the score of the answering information of each examinee up or down according to the calculated up-regulation or down-regulation value of each examinee's score.
Alternatively, the above-mentioned "determining the expected average score range according to the average of the expected scores" may be set by manual input of the reviewer or by default by the system to the average of the expected scores ± a preset score, which may be, for example, but not limited to, 2, 3, 4, 5, etc.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
It should be noted that, those skilled in the art should understand that the implementation functions of the modules shown in the embodiment of the machine intelligent review system for the short-answer question can be understood by referring to the related description of the machine intelligent review method for the short-answer question. The functions of the modules shown in the embodiment of the machine-intelligent review system for short-answer questions can be realized by a program (executable instructions) running on a processor, and can also be realized by specific logic circuits. The machine intelligent review system for the short answer questions in the embodiment of the application can be stored in a computer readable storage medium if the system is realized in the form of a software functional module and is sold or used as an independent product. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, the present application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-executable instructions implement the method embodiments of the present application. Computer-readable storage media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable storage medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
In addition, the embodiment of the application also provides a machine intelligent review system of the short answer question, which comprises a memory for storing computer executable instructions and a processor; the processor is configured to implement the steps of the method embodiments described above when executing the computer-executable instructions in the memory. The Processor may be a Central Processing Unit (CPU), other general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. The aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a Flash memory (Flash), a hard disk, or a solid state disk. The steps of the method disclosed in the embodiments of the present invention may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
It is noted that, in the present patent application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the use of the verb "comprise a" to define an element does not exclude the presence of another, same element in a process, method, article, or apparatus that comprises the element. In the present patent application, if it is mentioned that a certain action is executed according to a certain element, it means that the action is executed according to at least the element, and two cases are included: performing the action based only on the element, and performing the action based on the element and other elements. The expression of a plurality of, a plurality of and the like includes 2, 2 and more than 2, more than 2 and more than 2.
All documents mentioned in this application are to be considered as being incorporated in their entirety into the disclosure of this application so as to be subject to modification as necessary. It should be understood that the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.

Claims (10)

1. A machine intelligent review method of short answer questions is characterized by comprising the following steps:
acquiring subject keywords and common keywords of the subject in advance based on subject linguistic data, generating a word vector table of each keyword, clustering the keywords based on the word vector table, acquiring a related word set of the keywords of the subject, and constructing a keyword library of the subject;
acquiring answering information and standard answers of the target test questions;
extracting a subject keyword set and a common keyword set in the standard answers based on the keyword library, and determining a related word set of each subject keyword to expand the subject keyword set;
identifying subject keywords and associated words in the answering information based on the expanded subject keyword set, and identifying common keywords in the answering information based on the common keyword set;
calculating the sentence reasonableness of the answering information, wherein the sentence reasonableness refers to the reasonable degree of the logical sequence and relationship between words in the sentence;
according to the formula
[Formula images not reproduced]
calculating a score F of the answering information, wherein s1, s2, s3, and s4 are weight coefficients respectively representing the subject keyword information, associated word information, common keyword information, and sentence reasonableness in the answering information, s1 > s2 > s3, and F0 is the total score of the target test question.
2. The machine intelligent review method of short answer questions as recited in claim 1, wherein said calculating the sentence reasonableness of said answering information, which is the reasonable degree of the logical order of and relationships between words in a sentence, further comprises:
respectively extracting word sequences of each sentence in the answering information and the standard answers;
calculating the probability value of the position of each word in each word sequence in the sentence by adopting an N-gram language model according to the Markov assumption;
calculating a word reasonable probability value of each sentence according to the probability value of the position of each word in the sentence based on a Bayesian conditional probability model;
and calculating the sentence reasonableness of the answering information according to the answering information and the word reasonable probability value of each sentence in the standard answers.
3. The machine intelligent review method of short answer questions as set forth in claim 2, wherein said calculating the sentence reasonableness of said answering information based on said answering information and the word reasonable probability value of each sentence of said standard answers further comprises:
respectively calculating word reasonable probability mean values of sentences of the answering information and the standard answers;
if the mean value of the reasonable probabilities of the words of the sentence of the answering information is smaller than the mean value of the reasonable probabilities of the words of the sentence of the standard answer, the sentence reasonability of the answering information is the quotient of the mean value of the reasonable probabilities of the words of the sentence of the answering information and the mean value of the reasonable probabilities of the words of the sentence of the standard answer;
and if the mean value of the reasonable probability of the words of the sentence of the answering information is greater than or equal to the mean value of the reasonable probability of the words of the sentence of the standard answer, the sentence reasonableness of the answering information is 1.
4. The machine intelligent review method of short answer questions as recited in claim 1, wherein the obtaining of the subject keywords and the common keywords of the subject based on the subject corpus, the generating of the word vector table for each keyword, and the clustering of each keyword based on the word vector table to obtain the associated word set for each subject keyword further comprises:
acquiring subject keywords and common keywords of the subject based on the subject corpus;
generating word vectors of the keywords by using a text depth language model to obtain a word vector table;
calculating the distance between each keyword in the word vector table;
and acquiring words with the distance from each subject keyword to each subject keyword less than a preset threshold value to form a related word set of each subject keyword.
5. The machine intelligent review method for short answer questions of claim 4, wherein the text deep language model is a deep learning model based on word2vec;
and the distances between the keywords in the word vector table are calculated by cosine similarity.
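Claims 4 and 5 together describe building each associated-word set by thresholding cosine similarity over word vectors. A minimal sketch follows; the names and the threshold value are illustrative, and the patent obtains the vectors from a word2vec-based model rather than the toy vectors shown here:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def associated_words(subject_keyword, word_vectors, threshold=0.9):
    """Collect the words whose vectors lie within the preset similarity
    threshold of the subject keyword's vector (claim 4's clustering step)."""
    center = word_vectors[subject_keyword]
    return {word for word, vec in word_vectors.items()
            if word != subject_keyword
            and cosine_similarity(center, vec) >= threshold}
```

With real word2vec vectors, near-synonyms and subject-specific variants of a keyword land inside the threshold and become its associated words, so an examinee who writes a close variant still earns keyword credit.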
6. The machine intelligent review method for short answer questions of any one of claims 1-5, further comprising, after the calculating of the score of the answering information:
calculating the average of the scores of the answering information of all examinees in a preset examination group;
obtaining the expected scores of the answering information of all examinees in the preset examination group, and calculating the average of the expected scores;
and adjusting the score of the answering information of each examinee according to the average of the scores and the average of the expected scores.
7. The machine intelligent review method for short answer questions of claim 6, wherein the adjusting of the score of the answering information of each examinee according to the average of the scores and the average of the expected scores further comprises:
determining an expected average score range according to the average of the expected scores, determining to adjust the scores upward when the average of the scores is smaller than the lower limit of the expected average score range, and determining to adjust the scores downward when the average of the scores is greater than or equal to the upper limit of the expected average score range;
after the determining to adjust the scores upward or the determining to adjust the scores downward, further comprising:
calculating an upward or downward base adjustment score as the absolute value of the difference between the average of the scores and the average of the expected scores;
ranking all examinees in the preset examination group from high to low by the score of their answering information, and dividing the examinees and their scores into an excellent examinee set, a common examinee set and a poor examinee set according to the ranking result;
setting the upward or downward adjustment score of each examinee in the common examinee set equal to the base adjustment score;
calculating the upward or downward adjustment score of each examinee in the excellent examinee set according to the formula shown in figure FDA0002717370300000031 (published only as an image and not reproduced here), wherein TR represents the base adjustment score, F0 represents the total score of the target test question, Fi represents the score of the ith examinee in the excellent examinee set, and Si represents the upward or downward adjustment score of the ith examinee in the excellent examinee set;
calculating the upward or downward adjustment score of each examinee in the poor examinee set according to the formula shown in figure FDA0002717370300000032 (published only as an image and not reproduced here), wherein f represents the number of examinees in the excellent examinee set, h represents the number of examinees in the poor examinee set, S represents the upward or downward adjustment score of each examinee in the poor examinee set, TR represents the base adjustment score, and Si represents the upward or downward adjustment score of the ith examinee in the excellent examinee set;
and adjusting the score of the answering information of each examinee upward or downward according to the calculated adjustment score of each examinee.
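The steps that claim 7 states in prose can be sketched as follows. The per-examinee formulas for the excellent and poor sets appear only as figure references in the published claim, so this sketch covers just what the text specifies: the base adjustment score, the tiering (boundaries assumed at the top and bottom quarter, which the claim does not fix), and the common-set adjustment:

```python
def split_and_base(scores, expected_scores):
    """Base adjustment score and tiering per claim 7. Returns the base score,
    the index lists of the excellent/common/poor sets (ranked high to low),
    and the adjustment applied to each common-set examinee."""
    base = abs(sum(scores) / len(scores)
               - sum(expected_scores) / len(expected_scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    cut = max(1, len(ranked) // 4)  # tier boundary: an illustrative choice
    excellent, poor = ranked[:cut], ranked[-cut:]
    common = ranked[cut:len(ranked) - cut]
    # Per the claim text, each common-set examinee is adjusted by the base score.
    common_adjustments = {i: base for i in common}
    return base, excellent, common, poor, common_adjustments
```

The excellent- and poor-set adjustments would then be computed from the base score via the claim's figure formulas, which taper the adjustment rather than applying it uniformly.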
8. A machine intelligent review system for short answer questions, characterized by comprising:
a keyword library construction module, configured to acquire in advance the subject keywords and common keywords of a subject based on a subject corpus, generate a word vector table of the keywords, and cluster the keywords based on the word vector table to obtain the associated word set of each subject keyword, so as to construct the keyword library of the subject;
an acquisition module, configured to acquire the answering information and the standard answer of a target test question;
a keyword identification module, configured to extract the subject keyword set and the common keyword set in the standard answer based on the keyword library, determine the associated word set of each subject keyword to expand the subject keyword set, identify the subject keywords and associated words in the answering information based on the expanded subject keyword set, and identify the common keywords in the answering information based on the common keyword set;
a reasonableness calculation module, configured to calculate the sentence reasonableness of the answering information, wherein the sentence reasonableness refers to the degree to which the logical order of and the relationships among the words in a sentence are reasonable;
a scoring module, configured to calculate the score F of the answering information according to the formulas shown in figures FDA0002717370300000041 and FDA0002717370300000042 (published only as images and not reproduced here), wherein s1, s2, s3 and s4 represent the weight coefficients of the subject keyword information, the associated word information, the common keyword information and the sentence reasonableness in the answering information, with s1 > s2 > s3, and F0 represents the total score of the target test question.
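The claim-8 scoring formula itself is published only as an image; a plausible reading, assumed here, is a weighted sum of the four match ratios scaled by the total score F0. The weight values below are invented and merely satisfy the stated constraint s1 > s2 > s3:

```python
def score_response(subject_ratio, associated_ratio, common_ratio, reasonableness,
                   s1=0.4, s2=0.3, s3=0.2, s4=0.1, total_score=10.0):
    """Hypothetical weighted-sum scorer: each ratio argument in [0, 1] measures
    how well the answering information covers subject keywords, associated
    words, common keywords, and sentence reasonableness respectively."""
    assert s1 > s2 > s3  # weight ordering required by claim 8
    weighted = (s1 * subject_ratio + s2 * associated_ratio
                + s3 * common_ratio + s4 * reasonableness)
    return total_score * weighted
```

The ordering constraint encodes the patent's priority: matching subject keywords counts most, associated words next, and common keywords least among the keyword terms.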
9. A machine intelligent review system for short answer questions, characterized by comprising:
a memory for storing computer-executable instructions; and
a processor for implementing the steps of the method of any one of claims 1 to 7 when executing the computer-executable instructions.
10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the steps in the method of any one of claims 1 to 7.
CN202011078190.0A 2020-10-10 2020-10-10 Machine intelligent review method and system for short answer questions Active CN112214579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011078190.0A CN112214579B (en) 2020-10-10 2020-10-10 Machine intelligent review method and system for short answer questions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011078190.0A CN112214579B (en) 2020-10-10 2020-10-10 Machine intelligent review method and system for short answer questions

Publications (2)

Publication Number Publication Date
CN112214579A true CN112214579A (en) 2021-01-12
CN112214579B CN112214579B (en) 2022-08-23

Family

ID=74053028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011078190.0A Active CN112214579B (en) 2020-10-10 2020-10-10 Machine intelligent review method and system for short answer questions

Country Status (1)

Country Link
CN (1) CN112214579B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254618A (en) * 2021-06-15 2021-08-13 明品云(北京)数据科技有限公司 Data acquisition processing method, system, electronic equipment and medium
CN114743421A (en) * 2022-04-27 2022-07-12 广东亚外国际文化产业有限公司 Comprehensive evaluation system and method for foreign language learning intelligent teaching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282651A1 (en) * 2010-05-11 2011-11-17 Microsoft Corporation Generating snippets based on content features
CN108959261A (en) * 2018-07-06 2018-12-07 京工博创(北京)科技有限公司 Natural-language-based device and method for judging subjective questions in test papers
CN110175585A (en) * 2019-05-30 2019-08-27 北京林业大学 Automatic correction system and method for short answer questions
CN110196893A (en) * 2019-05-05 2019-09-03 平安科技(深圳)有限公司 Text-similarity-based review method, device and storage medium for non-objective questions
CN110705278A (en) * 2018-07-09 2020-01-17 北大方正集团有限公司 Subjective question marking method and subjective question marking device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Xianwu et al., "Research on an automatic marking model for subjective test questions based on sentence similarity", Engineering Journal of Wuhan University *


Also Published As

Publication number Publication date
CN112214579B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN106570708B (en) Management method and system of intelligent customer service knowledge base
CN110309305B (en) Machine reading understanding method based on multi-task joint training and computer storage medium
CN110795543A (en) Unstructured data extraction method and device based on deep learning and storage medium
CN108681574B (en) Text abstract-based non-fact question-answer selection method and system
WO2019165678A1 (en) Keyword extraction method for mooc
KR20160026892A (en) Non-factoid question-and-answer system and method
CN105975454A (en) Chinese word segmentation method and device of webpage text
CN103810218A (en) Problem cluster-based automatic asking and answering method and device
CN111078943A (en) Video text abstract generation method and device
CN105551485B (en) Voice file retrieval method and system
CN109949799B (en) Semantic parsing method and system
CN112214579B (en) Machine intelligent review method and system for short answer questions
CN105760363B (en) Word sense disambiguation method and device for text file
CN110569405A (en) method for extracting government affair official document ontology concept based on BERT
CN113342958B (en) Question-answer matching method, text matching model training method and related equipment
Santhanavijayan et al. Automatic generation of multiple choice questions for e-assessment
CN110781681A (en) Translation model-based elementary mathematic application problem automatic solving method and system
CN110659352A (en) Test question and test point identification method and system
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN110969005B (en) Method and device for determining similarity between entity corpora
CN107562907B (en) Intelligent lawyer expert case response device
JP6942759B2 (en) Information processing equipment, programs and information processing methods
CN111767743B (en) Machine intelligent evaluation method and system for translation test questions
US20210390454A1 (en) Method and apparatus for training machine reading comprehension model and non-transitory computer-readable medium
CN113836296A (en) Method, device, equipment and storage medium for generating Buddhist question-answer abstract

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant