CN107845047B - Dynamic scoring system, method and computer readable storage medium

Info

Publication number: CN107845047B
Application number: CN201711079672.6A
Authority: CN
Inventors: 郑丽华; 何征宇
Original assignee: Iol Wuhan Information Technology Co ltd
Current assignee: Iol Wuhan Information Technology Co ltd
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2021-09-17
Anticipated expiration: 2037-11-07
Also published as: CN107845047A

Abstract

The invention provides a dynamic scoring system and method based on global test data, and a computer-readable storage medium. With the scoring method and system of the invention, an objective computer judgment mechanism is introduced into subjective-question testing, so that subjective questions can also be marked with computer assistance. Further, the ability level of each testee is not judged solely from the answer that testee submits; it changes dynamically with the overall level of all testees. Specifically, it is measured according to the consistency, or weight, of the answers given by all testees to the same test item, so that the levels of the testees can be distinguished more accurately.

Description

Dynamic scoring system, method and computer readable storage medium

Technical Field

The invention belongs to the technical field of evaluation, and particularly relates to a dynamic scoring system and method based on global test data and a computer readable storage medium.

Background

Currently, a written test is an indispensable step in evaluating a person's relative ability or in screening people of outstanding ability out of a candidate group. The answers submitted by the testees are reviewed and scored quantitatively, so that candidates whose scores exceed a given standard can be selected.

However, when the candidate group is large, manually reviewing the test papers one by one becomes difficult. For this reason, tests usually consist largely or even entirely of objective questions, because objective questions can be marked online and scored in real time fully automatically by computer. However, objective questions can be answered partly by guessing or rote memory, so a test made up entirely or mostly of objective questions cannot give a comprehensive and accurate quantitative evaluation of a candidate's ability. In practice, therefore, the more important the evaluation, the more indispensable subjective questions are. As noted above, current computers can reliably mark only objective questions online, and subjective questions are easily misjudged. Manual scoring must therefore be introduced, which reduces evaluation efficiency; and even manual scoring can go wrong because of the marker's subjective bias or carelessness.

Furthermore, the existing evaluation methods are all static scoring mechanisms: a testee's score is obtained from that testee's answer sheet alone, without considering the other testees in the test. In a subjective-question test, however, answers are not uniform, so a test question usually has several reference answers, and a testee may also give a correct answer that is not among them. Answers differ widely in quality; if they are scored only as right or wrong, different testees cannot be further distinguished. Similarly, if each testee is scored independently, without considering how the whole tested group answered the same subjective question, the score cannot reflect the testee's real level.

Disclosure of Invention

In order to solve the problems of the prior art, namely that manual marking is required and inefficient and that a static scoring mechanism cannot accurately evaluate a testee's ability, the invention provides a dynamic scoring system and method based on global test data, and a computer-readable storage medium.

With the scoring method and system of the invention, an objective computer judgment mechanism is introduced into subjective-question testing, so that subjective questions can also be marked with computer assistance. Further, the ability level of each testee is not judged solely from the answer that testee submits; it changes dynamically with the overall level of all testees. Specifically, it is measured according to the consistency, or weight, of the answers given by all testees to the same test item, so that the levels of the testees can be distinguished more accurately.

In other words, in existing testing methods a testee's score is fixed as soon as that testee completes all tests. In the present invention, by contrast, the score remains pending when the testee completes the tests, and the score of each individual testee can be finally determined only after the answers of all other testees have been collected and summarized. That is, the method of the present invention is a dynamic scoring mechanism based on global test data, not a traditional static mechanism.

In a first aspect of the present invention, a dynamic scoring system based on global test data is provided, which includes a test item display, a first test system and a second test system; the first test system comprises a first test item scoring system and a first test result display; the second test system includes at least one database storing a plurality of second test items and a plurality of benchmark comparison entries corresponding to each of the plurality of second test items; the dynamic scoring system further comprises: at least one input component, which is used for inputting corresponding content when the current testee takes a test item;

the content analysis component is used for acquiring corresponding content input by the current testee;

the first test item scoring system obtains a first test item score of the current testee according to corresponding content which is acquired by the content analysis component and input when the current testee tests the first test item through the at least one input component, and the first test item score is displayed on the first test result display;

the dynamic scoring system also comprises a second test item scoring system, and when the score of the first test item of the current testee is greater than a first standard threshold, the dynamic scoring system prompts the current testee to enter the second test system through the first test result display to test the second test item;

The scoring mechanism of the present invention first applies a two-step screening process. A candidate who is ultimately to stand out should first meet the basic requirements in every respect; for example, the candidate should pass the objective-question test, which is the premise for any further assessment. Since objective questions can be scored quickly by computer, they serve as the first test item.
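The first screening step can be sketched as follows. This is a minimal illustration only: the function names, the one-point-per-item scoring, and the threshold value are assumptions made for the example and are not specified in the text; the only rule taken from the text is that the first test item score must be greater than the first standard threshold.

```python
# Hypothetical threshold; the text does not fix a value.
FIRST_STANDARD_THRESHOLD = 60


def score_objective(answers: dict, key: dict) -> int:
    """Count objective items whose answer equals the single reference answer.

    One point per correct item is an assumption made for this sketch.
    """
    return sum(1 for item, ref in key.items() if answers.get(item) == ref)


def may_enter_second_stage(first_score: float,
                           threshold: float = FIRST_STANDARD_THRESHOLD) -> bool:
    """Gate to the second test system: strictly greater than the threshold."""
    return first_score > threshold
```

A testee whose score merely equals the threshold is not admitted under this reading, because the text says "greater than".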

In the subsequent second test item, the content analysis component acquires the corresponding content input by the current testee, through the at least one input component, when taking the second test item;

as an important difference from the prior art, the dynamic scoring system further includes a global test data summarization module, configured to summarize the corresponding content input by all testees who enter the second test system and take the second test item within a predetermined time period and/or in a predetermined number; and the second test item scoring system obtains the second test item scores of all the testees according to the data summarized by the global test data summarization module.

This difference embodies the dynamic scoring mechanism based on global data. After the current testee completes the test, the system cannot immediately obtain that testee's score, because in the present invention the score depends not only on the answers the testee submits but also changes dynamically with the answers submitted by all other testees. A global test data summarization module is therefore needed.
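One way to picture the pending state is a small collector that only reports readiness once the predetermined number of answers has arrived or the predetermined time period has ended. This is a sketch under assumptions: the class name, field names, and the choice to reject late submissions are all hypothetical and not taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalTestDataSummary:
    """Hypothetical stand-in for the global test data summarization module."""
    expected_count: int   # assumed "predetermined number" of testees
    deadline: float       # assumed end of the "predetermined time period"
    submissions: dict = field(default_factory=dict)

    def submit(self, testee_id: str, answer: str, now: float) -> bool:
        """Store an answer; the score stays pending. Late answers are rejected
        (an assumption of this sketch)."""
        if now > self.deadline:
            return False
        self.submissions[testee_id] = answer
        return True

    def ready(self, now: float) -> bool:
        """Scoring may begin once the count is reached or the period is over."""
        return len(self.submissions) >= self.expected_count or now > self.deadline
```

Passing `now` explicitly, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.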

Wherein, the content analysis component obtains the corresponding content input when the current testee tests the second test item through the at least one input component, and further comprises: and extracting keywords from the corresponding content, and comparing the extracted keywords with a plurality of reference comparison items corresponding to the second test item.

Preferably, the second test item scoring system obtains the scores of the second test items of all the testees according to the data summarized by the global test data summarizing module, and specifically includes: summarizing the corresponding content input by all the testees aiming at each second test item, extracting keywords from the corresponding content, comparing the extracted keywords with a plurality of reference comparison items corresponding to each second test item, and obtaining the second test item score of each testee according to the comparison result.

At this stage, the invention uses computer analysis to mark subjective questions. Specifically, the testee's answer to a subjective question is segmented and its keywords are extracted; the keywords are then compared with the reference answers of the subjective question, and the testee's score is obtained from the degree of matching. Although answers to subjective questions vary, their keywords should be substantially the same.

As an optimization, the scoring system may further include a testee database system for storing historical data of the testee, the historical data including the testee's historical first test item scores and second test item scores. After the testee logs in to the dynamic scoring system and before entering the first test system, the historical first test item score of the current testee is obtained; if the historical first test item score meets a predetermined passing condition, the current testee is prompted to enter the second test system directly to take the second test item.

The predetermined passing condition may be, for example, that all of the testee's historical first test item scores exceed the first standard threshold, that more than half of them exceed it, or that the most recent one exceeds it. The present invention is not limited in this respect, and those skilled in the art can set the condition according to the actual situation. The purpose of this setting is to avoid repeated testing.
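A minimal sketch of keyword extraction and matching is given below. The text does not say how keywords are extracted, so the approach here (lower-casing, splitting into alphabetic words, and dropping a small stop list) is purely an assumption for illustration, as are the function names and the stop list itself.

```python
import re

# Hypothetical stop list; a real system would use its own segmentation rules.
STOPWORDS = {"a", "an", "the", "of", "to", "is", "and"}


def extract_keywords(text: str) -> set:
    """Split text into lower-case alphabetic words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def match_count(answer: str, reference: str) -> int:
    """Number of keywords shared by a submitted answer and one reference answer."""
    return len(extract_keywords(answer) & extract_keywords(reference))
```

Using sets means a keyword repeated within one answer is counted once, which is one possible reading of "matching degree".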
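The three example passing conditions can be sketched as a single check. The function name, the rule labels, and the decision to treat an empty history as "not passed" are assumptions of this sketch.

```python
def passes_history(scores: list, threshold: float, rule: str = "all") -> bool:
    """Check historical first test item scores against one of three example rules.

    rule is a hypothetical label: "all", "majority", or "last".
    """
    if not scores:
        return False  # assumption: no history means the first test is required
    above = [s > threshold for s in scores]
    if rule == "all":
        return all(above)
    if rule == "majority":
        return sum(above) > len(scores) / 2
    if rule == "last":
        return above[-1]
    raise ValueError(f"unknown rule: {rule}")
```

For a history of 70, 50, 80 and a threshold of 60, the "all" rule fails while the "majority" and "last" rules pass, which shows how much the choice of condition matters.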

In a second aspect of the present invention, a dynamic scoring method based on global test data is provided, wherein the scoring method is used for testing testees within a predetermined time period and/or a predetermined number of testees; the test comprises a first test stage and a second test stage; after the first test stage is finished, the score of the current testee in the first test stage is displayed in real time; if that score is greater than the first predetermined standard score, the current testee is prompted to enter the second test stage; otherwise, the test is finished;

as a core technical scheme of the invention, the method comprises the following steps after entering the second test stage:

S1: acquiring the corresponding answer input by the current testee for each test item in the second test stage;

S2: analyzing the corresponding answer to obtain at least one keyword;

S3: matching the at least one keyword with at least one reference answer of the test item;

S4: obtaining a current second score of the current testee according to the matching result, and updating the weight of the reference answer;

S5: processing the other testees within the predetermined time period and/or in the predetermined number by the same method as steps S1-S4, to obtain the current second scores of the other testees and update the weights of the reference answers;

S6: obtaining the final second scores of all the testees according to the updated weights of the reference answers.

In practical implementation, each test item in the first test stage has only one reference answer; these are the common objective questions, including single-choice questions, multiple-choice questions, indefinite-choice questions, and true/false questions. Each test item in the second test stage has a plurality of reference answers; these are non-objective questions, for example subjective translation questions.

As a feedback mechanism, if the final second score of a testee is lower than the second standard threshold while the testee's first-stage score is higher than the first standard threshold, the second-stage test items of that testee are graded again manually.

Although the method can mark subjective questions, keyword extraction, analysis, and matching are not as accurate as comparing objective answers. To avoid misjudgment, a recheck mechanism is introduced: if a testee's objective score is high but the subjective score is low, the testee's answer may have been misjudged. This mechanism does not noticeably increase the manual scoring workload, since such cases should be few: the test designer typically gives enough reference answers for each subjective question, so most testees' answers should fall within the keyword range of the reference answers.
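The recheck condition reduces to a two-sided comparison. The sketch below uses a hypothetical function name and parameter names; the comparison directions follow the text (first-stage score higher than the first standard threshold, final second score lower than the second standard threshold).

```python
def needs_manual_regrade(first_score: float, final_second_score: float,
                         first_threshold: float, second_threshold: float) -> bool:
    """Flag a testee whose objective score is high but whose final
    subjective score is low, so the second-stage items are regraded by hand."""
    return first_score > first_threshold and final_second_score < second_threshold
```

Only testees who satisfy both conditions are sent for manual review, which is why the extra workload is expected to stay small.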

As a third aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon computer-executable instructions; the executable instructions are executed by a processor and a memory for implementing the aforementioned dynamic scoring method.

Drawings

FIG. 1 is a diagram of a scoring system framework in accordance with the present invention.

FIG. 2 is a flow chart of the scoring method of the present invention at two stages.

FIG. 3 is a diagram showing the implementation of the second stage of the scoring method according to the present invention.

FIGS. 4(a)-4(c) are specific examples of the dynamic scoring method of the present invention.

FIG. 5 is a diagram illustrating final results of an embodiment of the present invention.

Detailed Description of Embodiments

In the present embodiment, the testees are translators, and the purpose of the test is to screen out candidates with a higher translation level. For translation work, the evaluation criteria are, in order from low to high, "faithfulness" (信), "expressiveness" (达), and "elegance" (雅). "Faithfulness" means that the translation is accurate; "expressiveness" requires that the meaning be conveyed fully and fluently; "elegance" requires literary refinement.

Fig. 1 schematically shows a system framework for implementing the method of the invention.

As shown in fig. 2, in the present embodiment, the dynamic scoring process of the present invention is divided into two stages:

(1) Objective question testing stage

First, the testee should pass the objective-question test and reach a certain score. On the one hand, a qualified translator must first be able to ensure the accuracy of the translated text, that is, reach the most basic "faithfulness" (信) standard. On the other hand, an excellent translator's work must reach the "expressiveness" (达) or "elegance" (雅) standard, which presupposes that the "faithfulness" stage has already been passed. The testees who reach the "expressiveness" or "elegance" standard should therefore be selected from among those who pass the objective-question test with a certain score.

Unlike the traditional test mode, in which objective and subjective questions are presented together in one test paper, in the invention the objective-question test is an independent module that is presented first. Only a testee who completes the objective-question module and reaches a certain score can enter the next stage; otherwise the testee is eliminated.

(2) Subjective question testing stage

The test at this stage mainly selects, from the testees who passed the objective-question module, the few translators who reach the "expressiveness" (达) and "elegance" (雅) standards.

The selection process given in this embodiment is performed automatically on the basis of the overall level of the testees; the process is relatively objective and also conforms to the relevant statistical rules.

Specifically, for each item to be translated, a plurality of different translation result sets are given in advance, and the translation result sets serve as reference scoring items of the item to be translated; each translation result in the translation result set comprises a plurality of keywords.

For example, for an item to be translated $T_1 = \{t_1, t_2, t_3, \ldots, t_n\}$, $k$ translation results $A_1, A_2, \ldots, A_k$ corresponding to $T_1$ are given in advance.

For each given translation result $A_1, A_2, \ldots, A_k$, a first flag value is defined, and each first flag value is initialized to 0;

for each item to be translated, a second flag value is defined for each testee, and the second flag value is initialized to 0;

after all the testees finish the subjective question test, each testee is scored according to the following steps:

(1) obtaining the translation result submitted by the current testee for the item to be translated, and comparing the submitted translation result with each translation result in the translation result set to obtain a reference entry and a reference score for the submitted translation result;

specifically, the relevant keywords of the submitted translation result are extracted and matched with the keywords of each translation result in the translation result set, so as to obtain the reference entry; the number of successfully matched keywords is used as the reference score;

for example, for the item to be translated $T_1 = \{t_1, t_2, t_3, \ldots, t_n\}$, the current testee gives the translation result $E_1 = \{e_1, e_2, e_3, \ldots, e_r\}$, where $e_1, e_2, e_3, \ldots, e_r$ are the relevant keywords that make up the sentence $E_1$;

then $e_1, e_2, e_3, \ldots, e_r$ are compared with the keywords of each of the $k$ translation results $A_1, A_2, \ldots, A_k$;

the number of keywords matched between $E_1$ and each of the $k$ translation results $A_1, A_2, \ldots, A_k$ is counted, and the counts are sorted;

(2) if at least one keyword is successfully matched, the translation result in the translation result set that has the highest matching degree with the translation result submitted by the current testee is taken as the reference entry of the submitted translation result, and the number of successfully matched keywords is taken as the reference score of the current testee for the item to be translated;

meanwhile, the first flag value of that best-matching translation result is updated;

for example, if $A_j$ has the largest number of keywords matched with $E_1$, and that number is $Y$, then 1 is added to the first flag value of $A_j$, and the reference score obtained by the current testee for the item to be translated is recorded as $Y$;

if none of the relevant keywords of the translation result given by the current testee matches any keyword of any translation result in the translation result set, the second flag value is updated; specifically, 1 is added to the second flag value of the current testee for the item to be translated;

(3) obtaining the translation result submitted by the next testee for the item to be translated, and repeating the steps (1) - (2) until all the translation results submitted by the testees for the item to be translated have executed the steps (1) - (2);

(4) acquiring the current first flag value of each translation result in the translation result set, the second flag values of all testees for the current item to be translated, and the reference entries obtained by all testees for the item to be translated;

(5) calculating the final scores of all the testees for the item to be translated.

Specifically, after the above steps have been performed on the translation results submitted by the $m$ testees for the item to be translated $T_1$, the following are obtained: the first flag values $X_1, X_2, \ldots, X_k$ of the translation results $A_1, A_2, \ldots, A_k$ given for the item; the second flag values $Z_1, Z_2, \ldots, Z_m$ of the $m$ testees for $T_1$; and, for each testee, the reference score and reference entry $(Y_r, A_s)$ obtained for $T_1$, where $Y_r$ is the reference score determined in step (2) and $A_s$ is the reference entry determined in step (2);

at this time, each of the $m$ testees has a final score of $Y_r \cdot X_s$ for the item to be translated $T_1$, where $X_s$ is the first flag value of that testee's reference entry $A_s$.

In this way, the score of each testee for all the items to be translated, that is, the test score of the second stage, can be obtained.

FIG. 3 is a flow chart generally illustrating the second testing stage described above.
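The scoring steps above for a single item to be translated can be sketched as a two-pass computation: one pass to find each testee's best-matching reference and accumulate the first flag values, and a second pass to multiply. This is a sketch under assumptions: answers and references are taken as already-extracted keyword sets, ties are broken in favour of the first-listed reference, and testees with no match are simply left out of the final scores because the text does not spell out their score. All function and variable names are hypothetical.

```python
def dynamic_scores(submissions: dict, references: dict):
    """submissions: testee id -> set of keywords from the submitted translation.
    references: reference id -> set of keywords of a given translation result.
    Returns (final scores, first flag values, second flag values)."""
    first_flags = {ref_id: 0 for ref_id in references}   # X_1 .. X_k
    second_flags = {t: 0 for t in submissions}           # Z_1 .. Z_m
    reference_of = {}                                     # testee -> (Y_r, A_s)

    # Pass 1, steps (1)-(3): pick the best-matching reference for each testee.
    for testee, keywords in submissions.items():
        best_ref, best_count = None, 0
        for ref_id, ref_keywords in references.items():
            count = len(keywords & ref_keywords)
            if count > best_count:          # strict '>' keeps the first on ties
                best_ref, best_count = ref_id, count
        if best_ref is None:
            second_flags[testee] += 1       # no keyword matched any reference
        else:
            first_flags[best_ref] += 1
            reference_of[testee] = (best_count, best_ref)

    # Pass 2, steps (4)-(5): final score = reference score * first flag value.
    final = {t: y * first_flags[ref] for t, (y, ref) in reference_of.items()}
    return final, first_flags, second_flags
```

The second pass cannot start until every submission has been processed, since the first flag values keep changing during the first pass; this is what makes the score "pending" until all answers are in.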

To better illustrate the above scoring process, it is shown schematically through the specific examples in FIGS. 4(a)-4(c).

As just one illustrative example, in the present embodiment there are 10 testees, and the translation result set given for a certain item to be translated contains four translation results as reference answers. All four are correct translations of the item. To make the reference answers broadly representative, the four translation results usually use different sentence patterns, different renderings of the same source word, appropriate expansion, and so on, so that they express the same meaning in different wording.

Taking FIGS. 4(a)-4(c) as an example, four translation results A1-A4 are given for a certain item to be translated.

Referring to FIGS. 4(a)-4(c), the dynamic scoring mechanism for each testee in the second stage is as follows:

(1) Acquire the translation result of the first testee. Matching shows that the answer given by the first testee has the highest matching degree with A4, with 6 matched keywords. A4 is therefore selected as the reference entry of the first testee, whose reference score is 6. The first flag value is then updated: since A4 is the reference entry, only the first flag value of A4 is updated to 1, and the first flag values of the other translation results A1-A3 remain 0. The second flag value is not updated;

(2) Acquire the translation result of the second testee. Matching shows that the answer given by the second testee again has the highest matching degree with A4, with 6 matched keywords. A4 is therefore selected as the reference entry of the second testee, whose reference score is 6. The first flag value is then updated: since A4 is selected as the reference entry again, the first flag value of A4 is updated to 2, and the first flag values of the other translation results A1-A3 remain 0. The second flag value is not updated;

(3) Acquire the translation result of the third testee. Matching shows that the answer of the third testee matches none of the keywords of A1-A4, that is, the number of matched keywords is zero. The second flag value of the third testee is therefore updated to 1, which also means that the third testee has no reference entry and no reference score;

(4) Acquire the translation result of the fourth testee. Matching shows that the answer given by the fourth testee has the highest matching degree with A1, with 5 matched keywords. A1 is therefore selected as the reference entry of the fourth testee, whose reference score is 5. The first flag value is then updated: since A1 is the reference entry, the first flag value of A1 is updated to 1. The first flag value of A4 is still 2, and the first flag values of the other translation results A2-A3 are still 0. The second flag value is not updated;

By analogy, the translation results of the remaining fifth to tenth testees are obtained one by one; the number of matched keywords, the reference entries, the reference scores, and the updates to the first and second flag values are shown in the remaining parts of FIGS. 4(a)-4(c).

After all the above steps are completed, at this time, as can be seen from the last row of FIG. 4(c), the first flag values corresponding to the four translation results A1-A4 are 2, 4, 0, and 3, respectively;

meanwhile, the benchmark item, the benchmark score, the second mark value and the final total score of 10 testees are shown in fig. 5.
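As a worked check of the final-score rule stated earlier (reference score multiplied by the final first flag value of the testee's reference entry), the values described above for the first, second, and fourth testees give the following. This is computed from the text alone, assuming the rule applies unchanged; it is not read from FIG. 5, and the symbol $S_i$ for the final score of testee $i$ is introduced here only for illustration.

```latex
\begin{align*}
S_1 &= Y \cdot X_4 = 6 \times 3 = 18 \\
S_2 &= Y \cdot X_4 = 6 \times 3 = 18 \\
S_4 &= Y \cdot X_1 = 5 \times 2 = 10
\end{align*}
```

The first two testees end at 18 rather than 6 because A4 was ultimately chosen as the reference entry by three testees. The third testee has no reference entry, and the final score for that case is not spelled out in the text.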

It can be seen from the above process that, unlike conventional evaluation methods, in the scoring scheme of the present invention the result of each testee is not determined solely by the answer that testee submits, but is known only after the answers of all testees have been counted.

In an actual scenario, the sample size is of course larger than this, which is where the statistical validity of the invention shows. The larger the sample size, the more reasonable the above scheme: according to the law of large numbers, once the number of testees exceeds a certain level, the distribution of the submitted answers becomes more random, and the proportion of answers reaching the "expressiveness" (达) and "elegance" (雅) standards stabilizes.

In addition, as the sample size grows, the statistical result increasingly approaches a normal distribution, that is, small at both ends and large in the middle. In the invention, this means that testees whose second flag value is 1 should account for only a small part.

Also, the above embodiment merely illustrates the case where 10 testees submit answers to one question (in fact, the translation of one sentence). In practice, the subjective test need not consist of a single question or a single sentence, so the volume of keyword data in the submitted answers grows, which increases the sample size. The larger the sample size, the more evident the statistical regularity of the invention and the greater its advantage, because with the traditional testing method a larger sample means a larger manual marking workload. In the invention, all the steps are completed automatically by a computer statistical module, and a large sample size is exactly what is needed to ensure accuracy.

Claims

1. A dynamic scoring system based on global test data comprises a test item display, a first test system and a second test system; the first test system comprises a first test item scoring system and a first test result display; the second test system includes at least one database storing a plurality of second test items and a plurality of benchmark comparison entries corresponding to each of the plurality of second test items;

the dynamic scoring system further comprises:

at least one input component, which is used for inputting corresponding content when the current testee takes a test item;

the content analysis component is used for acquiring and analyzing corresponding content input by the current testee;

the first test item scoring system obtains a first test item score of the current testee according to corresponding content which is acquired by the content analysis component and input when the current testee performs a first test item test through the at least one input component, and the first test item score is displayed on the first test result display;

the dynamic scoring system further comprises a second test item scoring system,

when the score of the first test item of the current testee is larger than a first standard threshold, the dynamic scoring system prompts the current testee to enter a second test system through the first test result display to test the second test item;

the content analysis component acquires corresponding content input by the current testee during the second test item test through the at least one input component;

characterized in that:

the dynamic scoring system further comprises a global test data summarizing module which is used for summarizing the corresponding content input by all testees who enter the second test system to take the second test item within a preset time period and/or in a preset number;

the second test item scoring system obtains second test item scores of all testees according to the data summarized by the global test data summarizing module, and specifically comprises the following steps:

and updating the weight of the reference answer according to a matching result of matching the keyword of the answer input by each testee with at least one reference answer, and obtaining the final score of all testees according to the updated weight of the reference answer.

2. The dynamic scoring system of claim 1, wherein:

the content analysis component obtains corresponding content input by the current testee when the second test item test is performed through the at least one input component, and further includes: and extracting keywords from the corresponding content, and comparing the extracted keywords with a plurality of reference comparison items corresponding to the second test item.

3. The dynamic scoring system according to claim 1 or 2, wherein:

the second test item scoring system obtains second test item scores of all testees according to the data summarized by the global test data summarizing module, and specifically comprises the following steps: summarizing the corresponding content input by all the testees aiming at each second test item, extracting keywords from the corresponding content, comparing the extracted keywords with a plurality of reference comparison items corresponding to each second test item, and obtaining the second test item score of each testee according to the comparison result.

4. The dynamic scoring system of claim 1, wherein:

the system also comprises a testee database system which is used for storing historical data of the testee, wherein the historical data comprises historical first test item scores and historical second test item scores of the testee.

5. The dynamic scoring system of claim 4, wherein:

after a tested person logs in a dynamic scoring system and before entering a first testing system, acquiring historical first testing item scores of the current tested person; and if the historical first test item score meets a preset passing condition, prompting the current testee to directly enter a second test system to test the second test item.

6. A dynamic scoring method based on global test data, wherein the scoring method is used for testing testees within a predetermined time period and/or a predetermined number of testees; the test comprises a first test stage and a second test stage; after the first test stage is finished, the score of the current testee in the first test stage is displayed in real time; if that score is greater than the first preset standard score, the current testee is prompted to enter the second test stage; otherwise, the test is finished;

the method is characterized in that after entering a second test stage, the scoring method comprises the following steps:

S2: analyzing the corresponding answer to obtain at least one keyword;

S5: correspondingly processing other testees in the preset time period and/or in the preset number by adopting the same method of the steps S1-S4 to obtain the current second scores of the other testees and update the weight of the reference answer;

S6: and obtaining the final second scores of all the testees according to the updated weights of the reference answers.

7. The dynamic scoring method based on global test data according to claim 6, wherein:

each test item of the first test stage only has one benchmark answer; there are a plurality of reference answers for each test item of the second test stage.

8. The dynamic scoring method based on global test data according to claim 6, wherein:

the test items of the first test stage are selection item questions which comprise single item selection questions, multiple item selection questions, indefinite item selection questions and judgment questions; the test items of the second test stage are subjective translation questions.

9. A dynamic scoring method based on global test data according to any one of claims 6-8, characterized by:

and if the final second score of a tested person is lower than the second standard threshold value and the score of the first test stage is higher than the first standard threshold value, manually scoring the test items of the second test stage of the tested person again.

10. A computer-readable storage medium having stored thereon computer-executable instructions; the executable instructions are executed by a processor and a memory to implement the dynamic scoring method of any one of claims 6-9.