WO2024095819A1 - Proficiency level determination device, proficiency level determination method, and program

Info

Publication number: WO2024095819A1
Application number: PCT/JP2023/038290
Authority: WO
Inventors: 淳渡辺; 倫也上田
Original assignee: 株式会社Ｚ会
Priority date: 2022-11-04
Filing date: 2023-10-24
Publication date: 2024-05-10
Also published as: JP7339414B1; JP2024067159A

Abstract

Provided is a proficiency level determination device with which it is possible to know the accuracy of a determined level, which is the result of a proficiency level determination by a determinator. This proficiency level determination device includes: a determinator training unit that trains determinators which determine a proficiency level exclusive to each problem using, as training data, answer data from a plurality of respondents with proficiency levels attached thereto as labels; a label distribution per problem/determination level generation unit that generates a label distribution for each determination level determined on the basis of test data by means of the determinators exclusive to each problem; a proficiency level determination unit that determines a proficiency level by means of the exclusive determinator on the basis of answer data by a prescribed respondent for a prescribed problem, the answer data not having a label attached thereto; and a termination determination unit that executes a termination determination for proficiency level determination for the prescribed respondent on the basis of an indicator indicating the degree of variation in the label distribution corresponding to the determined level for the prescribed problem.

Description

Proficiency level determination device, proficiency level determination method, and program

The present invention relates to a proficiency level assessment device, a proficiency level assessment method, and a program for assessing the proficiency level of an answerer.

For example, Patent Document 1 discloses an academic ability estimation model generation device that can generate an academic ability estimation model that accurately estimates current academic ability without requiring comprehensive learning data. The academic ability estimation model generation device of Patent Document 1 includes a decision tree generation unit that generates a decision tree using correct/incorrect information as teacher data indicating whether multiple solvers who answered a group of predetermined problems answered each question correctly or incorrectly, a pruning unit that deletes the leaf node that is the end of the generated decision tree when the entropy of the classification result indicated by the leaf node is equal to or less than a predetermined value, and a category generation unit that sets each of the new ends of the decision tree after the leaf node deletion as a category to which one of the solvers belongs.

Patent Document 1: Japanese Patent No. 7065927

Until now, there has been an issue with the reliability of academic ability estimates made using models that estimate academic ability.

The present invention aims to provide a proficiency level assessment device that allows users to know the accuracy of the assessment level, which is the result of the proficiency level assessment made by a classifier.

The proficiency level determination device of the present invention includes a classifier learning unit, a label distribution generation unit for each question and judgment level, a proficiency level determination unit, and an end determination unit.

The classifier learning unit uses, as learning data, answer data by multiple solvers to which proficiency levels have been assigned as labels, and trains classifiers, each dedicated to one problem, that judge the proficiency level. The label distribution generation unit for each question and judgment level uses, as test data, answer data that is not included in the learning data from among the answer data by multiple solvers to which proficiency levels have been assigned as labels, and generates a distribution of the labels assigned to the test data (hereinafter, label distribution) for each proficiency level judged from the test data by each classifier dedicated to each problem (hereinafter, judgment level). The proficiency level determination unit determines the proficiency level using the dedicated classifier, based on unlabeled answer data by a specific solver for a specific problem. The end determination unit executes an end determination of the proficiency level determination for the specific solver based on an index indicating the degree of variation in the label distribution corresponding to the judgment level for the specific problem.

The proficiency level determination device of the present invention makes it possible to know the accuracy of the judgment level, which is the result of the proficiency level determination made by the classifier.

FIG. 1 is a block diagram showing the functional configuration of the proficiency level determination device according to the first embodiment.
FIG. 2 is a flowchart showing the classifier learning operation of the proficiency level determination device according to the first embodiment.
FIG. 3 is a flowchart showing the label distribution generation operation of the proficiency level determination device according to the first embodiment.
FIG. 4 is a diagram showing an example of a label distribution generated for each question and for each judgment level.
FIG. 5 is a flowchart showing the end determination operation of the proficiency level determination device according to the first embodiment.
FIG. 6 is a diagram explaining an example in which the cumulative variance is used for end determination.
FIG. 7 is a diagram explaining an example in which a confidence interval is used for end determination.
FIG. 8 is a flowchart showing the question order optimization operation of the proficiency level determination device according to the first embodiment.
FIG. 9 is a diagram showing an example of the question order determination operation.
FIG. 10 is a diagram showing an example of the functional configuration of a computer.

Below, an embodiment of the present invention will be described in detail. Components having the same functions will be given the same numbers, and duplicate explanations will be omitted.

The functional configuration of the proficiency level determination device of the first embodiment will be described below with reference to FIG. 1. As shown in the figure, the proficiency level determination device 1 of the present embodiment includes a classifier learning unit 100, a classifier storage unit 105, a label distribution generation unit 110 for each question and judgment level, a label distribution storage unit 115, a proficiency level determination unit 120, an end determination unit 125, and a question order optimization unit 130. The classifier learning operation of the proficiency level determination device 1 will be described below with reference to FIG. 2.

<Classifier learning operation>
The classifier learning unit 100 learns each classifier that is dedicated to each question and judges the proficiency level using answer data by multiple solvers to which proficiency levels are assigned as labels (S100). The classifier storage unit 105 stores each learned classifier (S105).

<Answer data>
The answer data may be, for example, answers to English composition questions, English reading comprehension questions, and the like. It may also be answers to questions in subjects other than English, such as mathematics, Japanese, social studies, or science.

<Proficiency level>
The mastery level can be, for example, five levels from level 1 (low evaluation) to level 5 (high evaluation). Alternatively, it may be 13 levels from A+ to A-, B+ to B-, C+ to C-, D+ to D-, and F, or it may be 10 levels from 1 to 10, or it may be a system with a full score of 0 to 100 points. In the following embodiment, an example of five levels from level 1 (low evaluation) to level 5 (high evaluation) will be described.

<Evaluation of proficiency level>
The evaluation of the proficiency level may be an evaluation of the entire subject, or an evaluation of a specific field of the subject. For example, when an English composition question is asked, the evaluation may be made on the overall English composition ability as the proficiency level, or, if the question range is determined (e.g., the use of specific grammar rules is determined), only the English composition ability in the corresponding question range may be evaluated.

<Label assignment>
Labeling may be performed, for example, by a teacher or a corrector judging the proficiency level for each piece of answer sheet data and assigning the label.

<Learning data>
The learning data is a large amount of answer data by multiple solvers, labeled with proficiency levels. It has the same form as the test data described below, but because the two serve different purposes, the learning data must not overlap with the test data. In practice, it is sufficient to divide the large amount of labeled answer data at a specified ratio and use one portion as learning data and the other as test data.
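
The division of labeled answer data into non-overlapping learning and test portions can be sketched as follows. This is a minimal illustration only: the record layout `(answer_id, question_id, label)`, the function name `split_labeled_answers`, and the 0.8 ratio are assumptions made for the example and do not appear in the source.

```python
import random


def split_labeled_answers(records, train_ratio=0.8, seed=0):
    """Shuffle labeled answer records and split them into two disjoint lists.

    The ratio and the seed are illustrative assumptions; the source only says
    that the data is divided "at a specified ratio".
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (answer_id, question_id, proficiency label 1..5)
records = [(i, 1, 1 + i % 5) for i in range(10)]
learning_data, test_data = split_labeled_answers(records)
print(len(learning_data), len(test_data))  # 8 2
```

Because each record lands in exactly one of the two lists, no answer is used both for training a classifier and for generating its label distributions.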

<Classifier learning>
The classifier is a model trained by supervised learning using the above-mentioned labeled answer data (proficiency level) as training data, and is trained as a classifier dedicated to each problem as described above. For example, if a total of X English composition questions, from problem 1 to X, are given, classifier 1 dedicated to problem 1, classifier 2 dedicated to problem 2, ..., classifier X dedicated to problem X are trained.

<Judgment Level>
The proficiency level judged by a classifier is also called the "judgment level."

<Example of judgment operation of the classifier>
For example, if N free English composition questions are given, classifier n (n = 1, ..., N) takes as input the answer of respondent P to English composition question n, judges respondent P's overall proficiency level in English composition, and outputs the judgment level (for example, 3 on a 5-point scale).

For example, if short English composition questions using participial constructions are given, classifier m (m = 1, ..., M) takes as input the answer of respondent Q to English composition question m, judges respondent Q's proficiency level in English composition involving participial constructions, and outputs the judgment level (for example, 4 on a 5-point scale).

Below, the label distribution generation operation of the proficiency level assessment device 1 will be explained with reference to Figures 3 and 4.

<Label distribution generation operation>
The label distribution generation unit 110 for each question and judgment level takes, as test data, answer data that is not included in the learning data from among the answer data by multiple solvers to which proficiency levels are assigned as labels. For each question, and for each proficiency level judged from the test data by the classifier dedicated to that question (hereinafter, judgment level), it generates the distribution of the labels assigned to the test data (hereinafter, label distribution) (S110). The label distribution storage unit 115 stores the generated label distributions (S115).

<Test Data>
The test data is a large amount of answer data by multiple solvers, labeled with proficiency levels. It has the same form as the learning data described above, but because the two serve different purposes, the test data must not overlap with the learning data. The accuracy of each classifier can be evaluated from the difference between the judgment level output by the classifier trained on the learning data and the label assigned to the test data.

<Label distribution>
As described above, a label distribution is generated for each question and each judgment level. An example is shown in FIG. 4. For example, when classifier 1 judges the proficiency level from the answer data of multiple solvers for question 1 in the test data, the label distribution for judgment level 3 consists of 4 answers labeled level 1, 18 answers labeled level 2, 100 answers labeled level 3, 17 answers labeled level 4, and 4 answers labeled level 5 (143 answers in total). In this example, classifier 1 overrates the 22 answers labeled levels 1 and 2, judges the 100 answers labeled level 3 appropriately, and underrates the 21 answers labeled levels 4 and 5. The label distribution thus has some spread, and the smaller the spread, the higher the performance of the classifier. For example, in the label distribution for judgment level 3 of classifier 2 for question 2, 135 of the 160 answers are labeled level 3, so its spread is smaller than that of the label distribution for judgment level 3 of classifier 1.
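
The grouping of test-data labels by question and judgment level, and the mean and variance of one such distribution, can be sketched as below. The tuple layout `(question_id, judgment_level, label)` and the function names are assumptions made for illustration, and the use of the population variance is likewise an assumption. The counts are the ones quoted above for question 1, judgment level 3.

```python
from collections import Counter, defaultdict


def build_label_distributions(test_results):
    """Group test-data labels by (question, judgment level).

    test_results: iterable of (question_id, judgment_level, label) tuples,
    a hypothetical layout chosen for this sketch.
    """
    distributions = defaultdict(Counter)
    for question_id, judgment_level, label in test_results:
        distributions[(question_id, judgment_level)][label] += 1
    return distributions


def mean_and_variance(counts):
    """Mean and population variance of a distribution given as {label: count}."""
    total = sum(counts.values())
    mean = sum(label * n for label, n in counts.items()) / total
    variance = sum(n * (label - mean) ** 2 for label, n in counts.items()) / total
    return mean, variance


# Counts quoted in the text for question 1, judgment level 3 (143 answers).
example_counts = {1: 4, 2: 18, 3: 100, 4: 17, 5: 4}
test_results = [(1, 3, label) for label, n in example_counts.items() for _ in range(n)]

distributions = build_label_distributions(test_results)
mean, variance = mean_and_variance(distributions[(1, 3)])
print(round(mean, 2), round(variance, 2))  # 2.99 0.47
```

With these counts, the population variance reproduces the mean of 2.99 and variance of 0.47 that the description quotes for this distribution.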

The following describes the end determination operation of the proficiency level determination device 1 with reference to Figures 5, 6, and 7.

<End determination operation>
The proficiency level determination unit 120 determines the proficiency level using the dedicated classifier, based on unlabeled answer data by a predetermined solver for a predetermined problem (S120).

The end determination unit 125 performs an end determination of the proficiency level determination for the predetermined solver based on an index indicating the degree of variation in the label distribution corresponding to the judgment level for the predetermined problem (S125). The degree of variation in the label distribution is, for example, the variance of the label distribution.

<Example 1 of termination judgment>
For example, the end determination unit 125 can determine that the proficiency level determination has ended when the variance of the label distribution is lower than a predetermined threshold.

For example, suppose that classifier 1 and classifier 2 judge the proficiency level of a given solver R from unlabeled answer data in which solver R solves problems 1 and 2 in that order, and that the judgment level for problem 1 is 3 and the judgment level for problem 2 is 2. The variances of the corresponding label distributions are 0.47 and 0.28, respectively (see the shaded areas in FIG. 4). If the threshold set for the variance is, for example, 0.3, classifier 2 satisfies the condition, so judgment level 2 output by classifier 2 is output as solver R's proficiency level determination result and the proficiency level determination ends. In this case, solver R does not need to solve problem 3 to have his or her proficiency level determined, which reduces the burden on solver R.
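
The single-question threshold check in this example can be sketched as follows. The function name and the `(question_id, judgment_level, variance)` tuple layout are assumptions for illustration; the numbers are the ones given above for solver R.

```python
def first_confident_judgment(judgments, threshold):
    """Return (question_id, judgment_level) for the first judgment whose
    label-distribution variance is below the threshold, or None if none is.

    judgments: sequence of (question_id, judgment_level, variance) tuples
    in the order the questions were answered (hypothetical layout).
    """
    for question_id, judgment_level, variance in judgments:
        if variance < threshold:
            return question_id, judgment_level
    return None


# Solver R: problem 1 judged level 3 (variance 0.47), problem 2 judged level 2 (0.28).
judgments_for_r = [(1, 3, 0.47), (2, 2, 0.28)]
print(first_confident_judgment(judgments_for_r, threshold=0.3))  # (2, 2)
```

A return value of `None` would mean that no judgment so far is reliable enough and a further question should be given.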

<Example 2 of termination judgment>
Hereinafter, when solver R answers questions 1, ..., K, and the label distribution corresponding to the judgment level of classifier k for the answer data of question k (k = 1, ..., K) is denoted X_k, the mean E((X_1 + ... + X_K)/K) and the variance V((X_1 + ... + X_K)/K) of the average (X_1 + ... + X_K)/K of X_1, ..., X_K are called the cumulative average value and the cumulative variance, respectively.

For example, the end determination unit 125 can determine that the proficiency level determination has ended when the cumulative variance of the label distributions corresponding to the judgment levels for multiple questions falls below a predetermined threshold. For example, as shown in FIG. 6, suppose that the judgment level of classifier 1 for solver R's answer data for problem 1 is 3 (mean E(X_1) = 2.99 and variance V(X_1) = 0.47 for the corresponding label distribution X_1), the judgment level of classifier 2 for problem 2 is 2 (E(X_2) = 2.16, V(X_2) = 0.28), and the judgment level of classifier 3 for problem 3 is 3 (E(X_3) = 2.95, V(X_3) = 0.39). Treating X_1, ..., X_K as independent, the cumulative variance up to problem 2 is

V((X_1 + X_2)/2) = (V(X_1) + V(X_2))/2^2 = (0.47 + 0.28)/4 = 0.1875

The cumulative variance up to problem 3 is

V((X_1 + X_2 + X_3)/3) = (V(X_1) + V(X_2) + V(X_3))/3^2 = (0.47 + 0.28 + 0.39)/9 ≈ 0.127

Therefore, if the threshold is set to 0.20, the proficiency level determination for solver R ends at question 2. If the threshold is set to 0.15, it ends at question 3. If the threshold is set to 0.10, it continues after question 3. The cumulative average value can be used as the proficiency level determination result for solver R. For example, if the proficiency level determination ends at question 2, the result is

E((X_1 + X_2)/2) = (E(X_1) + E(X_2))/2 = (2.99 + 2.16)/2 = 2.575

If the proficiency level determination ends at question 3, the result is

E((X_1 + X_2 + X_3)/3) = (E(X_1) + E(X_2) + E(X_3))/3 = (2.99 + 2.16 + 2.95)/3 = 2.70
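
The cumulative average value and cumulative variance can be computed as in the sketch below. It assumes that the label distributions for the individual questions are independent, so that the variance of their average is the sum of the variances divided by K squared. That assumption, and the function names, are not stated in the source, but the resulting values are consistent with the three thresholds discussed above.

```python
def cumulative_statistics(means, variances):
    """Mean and variance of (X_1 + ... + X_K) / K.

    Assumes the X_k are independent, so V(sum / K) = sum(V(X_k)) / K**2.
    """
    k = len(means)
    if k == 0 or k != len(variances):
        raise ValueError("means and variances must be non-empty and equally long")
    return sum(means) / k, sum(variances) / k ** 2


def questions_needed(means, variances, threshold):
    """Smallest K whose cumulative variance is below the threshold, else None."""
    for k in range(1, len(means) + 1):
        _, cumulative_variance = cumulative_statistics(means[:k], variances[:k])
        if cumulative_variance < threshold:
            return k
    return None


# Per-question means and variances quoted in the text for solver R.
means = [2.99, 2.16, 2.95]
variances = [0.47, 0.28, 0.39]

for threshold in (0.20, 0.15, 0.10):
    print(threshold, questions_needed(means, variances, threshold))
```

Under this assumption the loop reports 2 questions for a threshold of 0.20, 3 questions for 0.15, and `None` (continue asking) for 0.10.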

<Example 3 of termination judgment>
In addition, for example, the end determination unit 125 can determine that the proficiency level determination has ended when exactly one integer value is included in a predetermined confidence interval (a T% confidence interval, for example T = 95) calculated from the cumulative average value and cumulative variance obtained from the label distributions corresponding to the judgment levels for multiple questions. For example, as shown in FIG. 7, the 95% confidence interval calculated from the cumulative average value and cumulative variance for judgment level 3 of question 1 and judgment level 2 of question 2 is [1.98, 3.18]. Because this interval contains two integer values (level 2 and level 3), the judgment level cannot yet be settled as either 2 or 3, and the end determination unit 125 does not determine that the proficiency level determination has ended. In contrast, the 95% confidence interval calculated from the cumulative average value and cumulative variance for the judgment levels of questions 1 to 3 is [2.30, 3.10]. Because this interval contains only one integer value (level 3), the judgment level can be regarded as settled at 3, and the end determination unit 125 may determine that the proficiency level determination has ended.
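
The decision rule of this example, ending only when the interval contains exactly one integer level, can be sketched as follows. The sketch takes the interval bounds as given and does not compute the confidence interval itself, since the source does not spell out that formula. The function names are assumptions for illustration.

```python
import math


def integers_in_interval(lower, upper):
    """All integers n with lower <= n <= upper."""
    return list(range(math.ceil(lower), math.floor(upper) + 1))


def settled_level(lower, upper):
    """Return the single level inside the interval, or None when the interval
    contains no integer or more than one integer."""
    candidates = integers_in_interval(lower, upper)
    return candidates[0] if len(candidates) == 1 else None


print(settled_level(1.98, 3.18))  # None: levels 2 and 3 both fall inside
print(settled_level(2.30, 3.10))  # 3
```

Returning `None` corresponds to continuing the determination with a further question.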

<Example 4 of termination judgment>
The end determination unit 125 may correct the mean of the label distribution corresponding to the minimum or maximum judgment level for a given problem, and the index indicating its degree of variation, so that the distribution approximates the label distributions corresponding to the other judgment levels. The minimum and maximum judgment levels are, for example, judgment levels 1 and 5 when the judgment levels run from 1 to 5. The label distributions of judgment levels 1 and 5 extend to one side only and do not have a so-called bell-curve shape. For example, the label distribution of judgment level 1 has entries for labels 2 and 3, but labels 0 and -1 do not exist, so the distribution has no left side. Likewise, for the label distribution of judgment level 5, labels 6 and 7 do not exist, so the distribution has no right side.

For example, the termination judgment unit 125 may generate a pseudo distribution on one side that does not exist and synthesize it to correct the average value of the relevant label distribution and an index indicating the degree of variation so that they approximate the label distribution corresponding to the other judgment level.

Also, for example, labels with integer values smaller than the minimum judgment level and labels with integer values larger than the maximum judgment level can be prepared as dummy labels and assigned manually.
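
One possible way to generate and synthesize the missing side is to reflect the existing side about the edge judgment level, as sketched below. The source does not specify how the pseudo distribution is constructed, so the mirroring rule, the function names, and the counts are all hypothetical and serve only to illustrate the idea.

```python
def mirror_edge_distribution(counts, edge_level):
    """Reflect a one-sided {label: count} distribution about edge_level.

    Each label on the existing side contributes the same count to its mirror
    image 2 * edge_level - label. Mirroring is an assumed construction, not
    one specified by the source.
    """
    mirrored = dict(counts)
    for label, n in counts.items():
        reflected = 2 * edge_level - label
        if reflected != label:
            mirrored[reflected] = mirrored.get(reflected, 0) + n
    return mirrored


def mean_and_variance(counts):
    """Mean and population variance of a distribution given as {label: count}."""
    total = sum(counts.values())
    mean = sum(label * n for label, n in counts.items()) / total
    variance = sum(n * (label - mean) ** 2 for label, n in counts.items()) / total
    return mean, variance


# Hypothetical one-sided distribution for judgment level 1.
level1_counts = {1: 100, 2: 20, 3: 5}
mirrored_counts = mirror_edge_distribution(level1_counts, edge_level=1)
corrected_mean, corrected_variance = mean_and_variance(mirrored_counts)
print(mirrored_counts)
print(corrected_mean, round(corrected_variance, 3))
```

After mirroring, the distribution is symmetric about level 1, so its mean equals the judgment level and its variance is comparable in kind to those of the interior levels.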

Below, the question order optimization operation of the proficiency level assessment device 1 will be explained with reference to Figures 8 and 9.

<Question order optimization>
The question order optimization unit 130 sets the question with the smallest mean square error of the label distribution corresponding to the judgment level of a given question for a given solver as the next question to be asked to the given solver (S130).

For example, if the judgment level of classifier 1 for problem 1 of solver R is 3, it is highly likely that the proficiency level of solver R will ultimately be concluded to be 3. In this case, as shown in FIG. 9, the mean square errors of the label distributions at judgment level 3 for the remaining problems (problems 2 to 4) are, for example, 0.19, 0.39, and 0.57, respectively (see the shaded areas in the figure). Since the mean square error for problem 2 is the smallest, problem 2 can be said to be the best suited to judging solvers who belong to judgment level 3.

Therefore, in this case, the question order optimization unit 130 sets question 2, which has the smallest mean square error of the label distribution corresponding to judgment level 3 (the level output by classifier 1 for solver R's answer to question 1), as the next question to be given to solver R after question 1. In the example shown in the figure, the questions are given in the order question 1 → question 2 → question 3 → question 4.

From the second question onwards, the next question to be asked may be the question with the smallest mean square error of the label distribution of the judgment level closest to the cumulative average value of previously answered questions.
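
The question selection rule can be sketched as follows. The source does not define how the mean square error of a label distribution is computed, so `mean_square_error` below assumes it is the mean squared deviation of the labels from the judgment level. That definition and all function names are assumptions for illustration. The error values 0.19, 0.39, and 0.57 are the ones quoted for problems 2 to 4.

```python
def mean_square_error(counts, judgment_level):
    """Mean squared deviation of labels from judgment_level for a
    {label: count} distribution (assumed definition)."""
    total = sum(counts.values())
    return sum(n * (label - judgment_level) ** 2 for label, n in counts.items()) / total


def next_question(mse_by_question, already_asked):
    """Pick the not-yet-asked question with the smallest mean square error."""
    remaining = {q: e for q, e in mse_by_question.items() if q not in already_asked}
    return min(remaining, key=remaining.get) if remaining else None


def nearest_judgment_level(cumulative_mean, levels=(1, 2, 3, 4, 5)):
    """Judgment level closest to the cumulative average value."""
    return min(levels, key=lambda level: abs(level - cumulative_mean))


# Mean square errors at judgment level 3 quoted for problems 2 to 4.
mse_at_level_3 = {2: 0.19, 3: 0.39, 4: 0.57}

order = [1]  # problem 1 has already been answered
while True:
    question = next_question(mse_at_level_3, order)
    if question is None:
        break
    order.append(question)
print(order)  # [1, 2, 3, 4]
```

From the second question onward, `nearest_judgment_level` would be applied to the cumulative average value first, and the error table for that level would then be passed to `next_question`.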

<Modification 1>
The end determination unit 125 of the first embodiment may be omitted, and the proficiency level determination unit 120 may produce the final output (in this modification, this unit is referred to as the proficiency level determination unit 120A). In this case, the proficiency level determination unit 120A determines the proficiency level using the dedicated classifier, based on unlabeled answer data by a predetermined solver for a predetermined problem, and generates and outputs a statistical index of the label distribution corresponding to the judgment level.

The statistical indicators of the label distribution are, for example, the cumulative average value and cumulative variance described above. Although the end determination unit 125 is omitted in this modified example, the proficiency level determination unit 120A outputs, for example, the cumulative average value and cumulative variance as the statistical indicators of the label distribution, so the evaluator can refer to the output cumulative variance value and decide whether to end or continue the proficiency level determination. If the evaluator decides to end the proficiency level determination, he or she can determine the proficiency level of the answerer based on the cumulative average value.

<Additional Notes>
The device of the present invention has, as a single hardware entity, an input section to which a keyboard or the like can be connected, an output section to which a liquid crystal display or the like can be connected, a communication section to which a communication device (e.g., a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, etc.), memories such as RAM and ROM, an external storage device such as a hard disk, and a bus connecting the input section, output section, communication section, CPU, RAM, ROM, and external storage device so that data can be exchanged between them. If necessary, the hardware entity may also be provided with a device (drive) capable of reading and writing recording media such as a CD-ROM. A physical entity equipped with such hardware resources is, for example, a general-purpose computer.

The external storage device of the hardware entity stores the programs required to realize the above-mentioned functions and the data required in the processing of these programs (not limited to an external storage device, for example the programs may be stored in a ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored appropriately in the RAM or the external storage device.

In the hardware entity, each program stored in the external storage device (or ROM, etc.) and the data required to process each program are loaded into memory as needed, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the specified functions (the components referred to above as "... unit", "... means", and the like).

The present invention is not limited to the above-described embodiment, and appropriate modifications can be made without departing from the spirit of the present invention. Furthermore, the processes described in the above embodiment are not limited to being executed chronologically in the order described, but may be executed in parallel or individually depending on the processing capacity of the device executing the processes or as necessary.

As mentioned above, when the processing functions of the hardware entities (the devices of the present invention) described in the above embodiments are realized by a computer, the processing contents of the functions that the hardware entities should have are described by a program. Then, by executing this program on a computer, the processing functions of the hardware entities are realized on the computer.

The various processes described above can be implemented by loading a program that executes each step of the above method into the recording unit 10020 of the computer shown in FIG. 10, and operating the control unit 10010, input unit 10030, output unit 10040, etc.

The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memories. Specifically, for example, hard disk drives, flexible disks, magnetic tapes, etc. can be used as magnetic recording devices; DVDs (Digital Versatile Discs), DVD-RAMs (Random Access Memory), CD-ROMs (Compact Disc Read Only Memory), and CD-Rs (Recordable)/RWs (ReWritable) can be used as optical disks; MOs (Magneto-Optical discs) can be used as magneto-optical recording media; and EEP-ROMs (Electrically Erasable and Programmable-Read Only Memory) can be used as semiconductor memories.

The program may be distributed, for example, by selling, transferring, or lending portable recording media such as DVDs or CD-ROMs on which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of a server computer and transferring the program from the server computer to other computers via a network.

A computer that executes such a program, for example, first stores in its own storage device the program recorded on a portable recording medium or the program transferred from a server computer. Then, when executing a process, the computer reads the program stored on its own recording medium and executes the process according to the read program. As another execution form of the program, the computer may read the program directly from the portable recording medium and execute the process according to the program, or may execute the process according to the received program each time a program is transferred from the server computer to the computer. The above-mentioned process may also be executed by a so-called ASP (Application Service Provider) type service that does not transfer the program from the server computer to the computer, but realizes the processing function only by issuing an execution instruction and obtaining the results. Note that the program in this form includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to the computer but has properties that specify the processing of the computer).

In addition, in this embodiment, a hardware entity is configured by executing a specific program on a computer, but at least a portion of the processing content may be realized by hardware.

Claims

1. A proficiency level determination device comprising:
a classifier learning unit that uses, as learning data, answer data by a plurality of solvers to which proficiency levels are assigned as labels, and trains classifiers, each dedicated to one problem, that judge the proficiency level;
a label distribution generation unit for each problem and judgment level that generates a distribution of labels assigned to test data (hereinafter, label distribution) for each problem and for each proficiency level (hereinafter, judgment level) judged based on the test data by each classifier dedicated to each problem;
a proficiency level determination unit that determines a proficiency level using the dedicated classifier, based on answer data by a predetermined solver for a predetermined problem to which no label has been assigned; and
an end determination unit that executes an end determination of the proficiency level determination for the predetermined solver based on an index indicating a degree of variation in the label distribution corresponding to the judgment level for the predetermined problem.
2. The proficiency level determination device according to claim 1, wherein
the degree of variation in the label distribution is the variance of the label distribution.
3. The proficiency level determination device according to claim 1, wherein,
when solver R answers questions 1, ..., K, and the label distribution corresponding to the judgment level of classifier k for the answer data of question k (k = 1, ..., K) is Xk, the mean E((X1 + ... + XK)/K) and the variance V((X1 + ... + XK)/K) of the average (X1 + ... + XK)/K of X1, ..., XK are called the cumulative average value and the cumulative variance, respectively, and
the end determination unit determines that the proficiency level determination has ended when the cumulative variance of a plurality of label distributions corresponding to the judgment levels for a plurality of questions is lower than a predetermined threshold.
4. The proficiency level determination device according to claim 1, wherein,
when solver R answers questions 1, ..., K, and the label distribution corresponding to the judgment level of classifier k for the answer data of question k (k = 1, ..., K) is Xk, the mean E((X1 + ... + XK)/K) and the variance V((X1 + ... + XK)/K) of the average (X1 + ... + XK)/K of X1, ..., XK are called the cumulative average value and the cumulative variance, respectively, and
the end determination unit determines that the proficiency level determination has ended when only one integer value is included in a predetermined confidence interval calculated using the cumulative average value and the cumulative variance obtained from a plurality of label distributions corresponding to the judgment levels for a plurality of questions.
5. The proficiency level determination device according to claim 1, wherein,
when solver R answers questions 1, ..., K, and the label distribution corresponding to the judgment level of classifier k for the answer data of question k (k = 1, ..., K) is Xk, the mean E((X1 + ... + XK)/K) and the variance V((X1 + ... + XK)/K) of the average (X1 + ... + XK)/K of X1, ..., XK are called the cumulative average value and the cumulative variance, respectively,
the device further comprising a question order optimization unit that sets, as the next question to be given to a predetermined solver, the question with the smallest mean square error of the label distribution corresponding to the judgment level of the predetermined solver, or the question with the smallest mean square error of the label distribution of the judgment level closest to the cumulative average value.
6. The proficiency level determination device according to claim 1, wherein
the end determination unit corrects the mean of the label distribution corresponding to the minimum or maximum judgment level for a predetermined problem, and the index indicating its degree of variation, so as to approximate the label distributions corresponding to the other judgment levels.
7. A proficiency level determination device comprising:
a classifier learning unit that uses, as learning data, answer data by a plurality of solvers to which proficiency levels are assigned as labels, and trains classifiers, each dedicated to one problem, that judge the proficiency level;
a label distribution generation unit for each problem and judgment level that generates a distribution of labels assigned to test data (hereinafter, label distribution) for each problem and for each proficiency level (hereinafter, judgment level) judged based on the test data by each classifier dedicated to each problem; and
a proficiency level determination unit that determines the proficiency level using the dedicated classifier, based on answer data by a predetermined solver for a predetermined problem to which no label has been assigned, and generates and outputs a statistical index of the label distribution corresponding to the judgment level.
8. The proficiency level determination device according to claim 7, wherein,
when solver R answers questions 1, ..., K, and the label distribution corresponding to the judgment level of classifier k for the answer data of question k (k = 1, ..., K) is Xk, the mean E((X1 + ... + XK)/K) and the variance V((X1 + ... + XK)/K) of the average (X1 + ... + XK)/K of X1, ..., XK are called the cumulative average value and the cumulative variance, respectively, and
the proficiency level determination unit generates and outputs the cumulative average value and the cumulative variance as the statistical indices of the label distribution corresponding to the judgment level.
9. A proficiency level determination method executed by a proficiency level determination device, comprising:
a step of using, as learning data, answer data by a plurality of solvers to which proficiency levels are assigned as labels, and training classifiers, each dedicated to one problem, that judge the proficiency level;
a step of using, as test data, answer data that is not included in the learning data from among answer data by a plurality of solvers to which proficiency levels are assigned as labels, and generating a distribution of labels assigned to the test data (hereinafter, label distribution) for each problem and for each proficiency level (hereinafter, judgment level) judged based on the test data by each classifier dedicated to each problem;
a step of determining a proficiency level using the dedicated classifier, based on answer data by a predetermined solver for a predetermined problem to which no label has been assigned; and
a step of executing an end determination of the proficiency level determination for the predetermined solver based on an index indicating a degree of variation in the label distribution corresponding to the judgment level for the predetermined problem.
10. A program that causes a computer to function as the proficiency level determination device according to any one of claims 1 to 8.