RU2726700C1

RU2726700C1 - Computer-aided automated method of creating test tasks for testing depth of knowledge and ability of students and specialists to reason

Info

Publication number: RU2726700C1
Application number: RU2019127875A
Authority: RU
Inventors: Денис Станиславович Тарасов
Original assignee: Денис Станиславович Тарасов
Priority date: 2019-09-04
Filing date: 2019-09-04
Publication date: 2020-07-15

Abstract

FIELD: information technology; education. SUBSTANCE: the invention relates to computer-based educational systems, in particular to methods for assessing the depth of knowledge and the logical reasoning ability of students and specialists. The computer-aided automated method of creating test items for testing the depth of knowledge and the reasoning ability of students and specialists comprises: obtaining, using an electronic computing device, a sample of texts on a given topic; building from it a language model in the form of a conditional probability distribution function; modifying the distribution using the numerical value of a complexity parameter; generating texts automatically from the modified distribution by obtaining each next token given the values of the previous tokens; obtaining texts from real information sources that are similar, under an established similarity function, to said automatically generated texts; and forming test items containing said automatically generated texts and the similar texts from real information sources. EFFECT: automation of the process of creating test items that make it possible to test the reasoning ability of students and specialists. 6 cl

Description

The invention relates to computer-based educational systems, in particular to methods for assessing the depth of knowledge and the logical reasoning ability of students and specialists.

The most common way to test knowledge is multiple-choice testing, in which students evaluate the proposed answers to a test item and choose the correct one (or several). If the option chosen by the student matches the one the test author marked as correct, the student receives a set number of points; if the option is wrong or the question is skipped, the test taker receives no points. The main advantage of this method is that assessment can be standardized and automated, removing the human factor from grading.

This method has known disadvantages:

1. Writing test items is laborious, and for the assessment to stay objective new items must be written each time, since earlier items may become known to students and lose their value. New items also have to be written when existing knowledge becomes obsolete as science develops.

2. The test author can make mistakes or write tests that check inessential aspects of the curriculum. The test author may be incompetent, and his or her knowledge cannot be objectively verified other than by using other tests.

3. Ambiguous wording of items and answers can distort the results.

4. Choosing among answer options tests passive knowledge (the ability to recognize the correct answer) and does not test the ability to reason logically and apply the knowledge acquired.

5. Such testing is not robust to copying from textbooks or looking up answers on the Internet, so it requires strict anti-cheating controls. Yet in practice what often needs to be tested is not the memorization of facts but the learner's ability to find the relevant facts in the literature and apply them.

To reduce the labor of creating tests, approaches to automating test creation exist. Known approaches include computer generation of tests from a fact base (ontology) [1], as well as more recent solutions that rephrase sentences of a text into questions using semantic analysis [2, 3] or deep neural networks [4] (for example, "The Battle of Kulikovo took place in 1380" → "What happened in 1380?") and generate deliberately incorrect answers to complete the test. These solutions do not eliminate the problem of errors in tests: because the algorithms fall short of human intelligence, test quality drops and subsequent human review is required. Nor do they solve the problem of testing in depth the ability to reason and apply knowledge. Most likely, automating the writing of tests of this kind requires an artificial intelligence that matches a human specialist in capability, which is impossible at the current level of science. Tests in which a missing word must be filled into a sentence suffer from the same problems, aggravated by the difficulty of grading (several different words may fit the meaning).

Another known direction is automating the assessment of free-form assignments (compositions, essays), for example assessing uniqueness [5] and relevance to the topic [6]. However, none of the known solutions can automatically check the substantive content of an essay, or even of a report, at a sufficient level, for the reason given above (it would require an artificial intelligence that matches a human specialist in capability).

The technical problem addressed by the claimed invention is to provide a method of creating test items for testing knowledge that makes it possible to test the reasoning ability of students and specialists.

The technical result is the automation of the process of creating test items that make it possible to test the reasoning ability of students and specialists.

The technical result is achieved in that the method of creating test items for testing the depth of knowledge and the reasoning ability of students and specialists comprises: obtaining, using an electronic computing device, a sample of texts on a given topic; building from it a language model in the form of a conditional probability distribution function; modifying the distribution using the numerical value of a complexity parameter; then generating texts automatically from the modified distribution by obtaining each next token in the text given the values of the previous tokens in accordance with the distribution function; obtaining texts from real information sources that are similar, under an established similarity function, to said automatically generated texts; and forming test items containing said automatically generated texts and the similar texts from real information sources.

The claimed method is based on including in the test items automatically generated texts that contain factual and logical errors at a specified level and frequency. During testing, the test taker must identify which of the presented texts were generated automatically, i.e. contain errors. Recognizing an automatically generated text requires the knowledge and skills to find the logical errors in it. The main obstacle to automating knowledge testing, the absence of an ideal artificial intelligence, is thereby bypassed, because here the errors of computer algorithms are the very basis of the method.

Implementation of the invention.

The method of creating test items for testing the knowledge and reasoning ability of students and specialists according to the claimed invention comprises forming, using a computer and a language model, test items that combine automatically generated texts with texts from real information sources (for example, scientific sources, textbooks, encyclopedias). The resulting test items are presented to learners, whose task is to pick out the automatically generated texts; each learner's correct answers are then counted.

The automatically generated texts are produced as follows:

Using a computer, a sample of texts on the given topic is assembled from scientific articles, textbooks and encyclopedias that are openly available on the Internet, or from databases that are not publicly accessible (for example, when the test is being created to check employees' knowledge of a company's internal documents).

The language model is then trained on this sample using the computer.

A language model is a model that predicts token number t in a sequence given the t-1 preceding tokens, where tokens may be words, subwords or characters.

The purpose of the language model is to estimate the distribution $P(x_{0:T})$ over token sequences $(x_0, x_1, \ldots, x_T) := x_{0:T}$ [8]. Training a language model is the process of constructing a discrete conditional probability distribution function $P(x_t \mid x_0, \ldots, x_{t-1})$, approximating the real distribution, for the next token given the values of the previous tokens. The joint distribution over long text fragments can be represented as the product of the per-token distributions conditioned on the preceding tokens [9]:

$$P(x_{0:T}) = \prod_{t=0}^{T} P(x_t \mid x_0, \ldots, x_{t-1})$$

In this way, a function is obtained that computes the probability of the next token on the basis of example texts.
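The idea of a conditional next-token distribution and its product over a sequence can be illustrated with a toy model. The sketch below is not the neural network described in the patent: it assumes a character-level bigram model (each token conditioned only on the single preceding character), and the names `train_bigram_model`, `sequence_log_prob` and the start marker `^` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_bigram_model(texts):
    """Estimate P(x_t | x_{t-1}) from character counts (toy stand-in for a neural LM)."""
    counts = defaultdict(Counter)
    for text in texts:
        padded = "^" + text  # "^" is a hypothetical start-of-text marker
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        model[prev] = {ch: c / total for ch, c in nxt_counts.items()}
    return model

def sequence_log_prob(model, text):
    """Chain rule in log space: log P(x_0..x_T) = sum over t of log P(x_t | previous token)."""
    padded = "^" + text
    log_p = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        p = model.get(prev, {}).get(nxt, 0.0)
        if p == 0.0:
            return float("-inf")  # the model has never seen this transition
        log_p += math.log(p)
    return log_p

model = train_bigram_model(["abab", "abba"])
print(model["b"])                       # conditional distribution after "b"
print(sequence_log_prob(model, "abb"))  # log-probability of the sequence "abb"
```

Summing logarithms rather than multiplying probabilities avoids numerical underflow on long fragments; a real implementation would replace the count table with the trained network's output.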

The language model may be a recurrent neural network or a neural network of another architecture, and may be built in various ways.

To check the result, several short texts are generated using the computer and the language model by sequentially predicting the next token from the distribution $P(x_{0:T})$. The prediction can be made either by choosing the most probable token or by sampling a token at random from the distribution $P(x_{0:T})$, or from the distribution $p_{\tau,i}$ computed from $P(x_{0:T})$ according to the formula:

$$p_{\tau,i} = \frac{p_i^{1/\tau}}{\sum_j p_j^{1/\tau}}$$

where $p_i$ is the probability of token i under the original distribution, τ is the temperature (a conventional parameter), i is the index of the token for which the value is computed, and j is the summation index, which runs from 0 to the number of tokens.

As the formula shows, at τ = 1 the text is generated from the original distribution; at τ > 1 the element of randomness increases, and at very high values of τ the text becomes simply a random set of symbols (tokens). For the purposes of this method, τ is used as one way of controlling the difficulty of the test. Test difficulty can also be controlled in other ways that change the output distribution $P(x_{0:T})$, for example by adding random noise of a given amplitude A to the activations of the hidden layers of the neural network.
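Temperature rescaling of a next-token distribution can be sketched as follows. The sketch assumes the common form in which each probability is raised to the power 1/τ and the result is renormalized, which reproduces the original distribution at τ = 1 and flattens it as τ grows; the function names and the example probabilities are hypothetical.

```python
import math
import random

def apply_temperature(probs, tau):
    """Return p_i^(1/tau) / sum_j p_j^(1/tau), computed in log space for stability."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if not any(p > 0 for p in probs):
        raise ValueError("at least one probability must be positive")
    logits = [math.log(p) / tau if p > 0 else float("-inf") for p in probs]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [w / total for w in weights]

def sample_token(tokens, probs, tau=1.0, rng=random):
    """Draw one token at random from the temperature-adjusted distribution."""
    return rng.choices(tokens, weights=apply_temperature(probs, tau), k=1)[0]

base = [0.5, 0.25, 0.25]              # made-up next-token probabilities
print(apply_temperature(base, 1.0))   # unchanged at tau = 1
print(apply_temperature(base, 2.0))   # flatter: more randomness
print(sample_token(["a", "b", "c"], base, tau=2.0, rng=random.Random(0)))
```

Choosing the most probable token instead of sampling corresponds to taking the index of the largest probability, which temperature rescaling does not change.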

The values of τ and A can be selected by various methods. If no parameter selection procedure can be carried out, τ is set to 1 and A to zero, and the text is generated from the original distribution $P(x_{0:T})$.

The parameter values can be chosen empirically: texts are generated with various parameter values, after which a subject-matter specialist reviews the resulting fragments and chooses the difficulty level he or she considers acceptable for testing the students' knowledge.

In another embodiment of the invention, the parameter values can be chosen by statistical selection: texts are generated with various values of the parameter A or τ; these texts, together with texts from real information sources, are used to form test items, which are given to two groups of subjects, one made up of experts in the test topic and the other serving as a control (the control group may be recruited from learners not yet familiar with the test topic). The parameter value that yields the greatest difference in mean score between the expert group and the control group is selected.
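The statistical selection step reduces to picking the parameter value with the largest gap in mean score between the two groups. A minimal sketch, assuming the scores have already been collected; the function name `select_parameter` and all numbers are made up for illustration and are not results from the patent.

```python
from statistics import mean

def select_parameter(results):
    """results maps a parameter value to (expert_scores, control_scores).

    Returns the parameter value with the largest difference in mean score
    between the expert group and the control group.
    """
    def gap(item):
        _, (expert_scores, control_scores) = item
        return mean(expert_scores) - mean(control_scores)

    best_value, _ = max(results.items(), key=gap)
    return best_value

# Hypothetical scores out of 10 for three candidate values of tau.
trial_results = {
    0.8: ([9, 8, 9], [8, 7, 8]),
    1.0: ([9, 8, 8], [5, 6, 5]),
    1.5: ([10, 10, 9], [9, 9, 10]),
}
print(select_parameter(trial_results))
```

The same function applies unchanged if the keys are values of the noise amplitude A rather than τ.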

To create a test on the chosen topic, a number of automatically generated texts are shuffled in random order with similar fragments from real information sources (textbooks, scientific articles and others).
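Assembling a test amounts to tagging each fragment with its origin, shuffling, and keeping the tags aside as an answer key. A minimal sketch, with `build_test` and the placeholder fragments being hypothetical:

```python
import random

def build_test(generated, real, seed=None):
    """Shuffle generated and real fragments together.

    Returns (fragments, answer_key), where answer_key[i] is True
    if fragments[i] was generated automatically.
    """
    rng = random.Random(seed)
    items = [(text, True) for text in generated] + [(text, False) for text in real]
    rng.shuffle(items)
    fragments = [text for text, _ in items]
    answer_key = [is_generated for _, is_generated in items]
    return fragments, answer_key

fragments, answer_key = build_test(
    generated=["generated fragment A", "generated fragment B"],
    real=["real fragment C", "real fragment D"],
    seed=42,
)
for number, text in enumerate(fragments, start=1):
    print(number, text)
```

Passing a seed makes a particular test reproducible; omitting it gives a fresh ordering each time.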

Similar fragments can be obtained by an automated computer search of real information sources for fragments (paragraphs) that are similar under some similarity function. The similarity function can be defined in various ways. One implementation computes the percentage of matching unique words (or word sequences of a given length) between the generated fragment and each paragraph of the real sources, and then finds the paragraph for which this value is highest. The fragments selected in this way are those that share the most words with the generated example. Another way to implement the similarity function is to use functions designed for information retrieval, such as TF/IDF or BM25 [10]. Text similarity metrics such as BLEU [11], trainable neural network models for computing the similarity of text fragments [12-14], or other functions with fixed, fitted or trainable parameters can also be used. The texts can also be selected manually or with Internet search engines.
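The unique-word overlap variant of the similarity function might look like the sketch below. It assumes the percentage is taken relative to the unique words of the generated fragment (the text does not fix the denominator) and uses a simple regular-expression tokenizer; the function names and sample sentences are hypothetical.

```python
import re

def unique_words(text):
    """Lower-cased set of word tokens (letters, digits, underscore)."""
    return set(re.findall(r"\w+", text.lower()))

def word_overlap(generated, paragraph):
    """Share of the generated fragment's unique words that also occur in the paragraph."""
    generated_words = unique_words(generated)
    if not generated_words:
        return 0.0
    return len(generated_words & unique_words(paragraph)) / len(generated_words)

def most_similar(generated, paragraphs):
    """Return the real paragraph with the highest overlap score."""
    return max(paragraphs, key=lambda paragraph: word_overlap(generated, paragraph))

generated = "Colonel Kornilov was born in 1891"
paragraphs = [
    "Colonel Kornilov was born in 1870 in the family of a cornet",
    "The battle of Mukden took place in 1905",
]
print(most_similar(generated, paragraphs))
```

Replacing `word_overlap` with a BM25 or neural similarity score leaves `most_similar` unchanged, since it only needs a function that scores a pair of texts.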

This process can be repeated as many times as needed to obtain any required number of tests.

The resulting test is printed on paper or recorded on another medium and presented to the test taker. The test taker's task is to identify the texts that were generated automatically. After the test is completed, the correct answers are counted manually or by computer; the resulting number is the score for the test.
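Counting correct answers by computer can be sketched as a comparison of the test taker's marks against an answer key. The representation (one boolean per fragment, True meaning "generated automatically") and the name `score_test` are assumptions made for illustration.

```python
def score_test(answer_key, responses):
    """Count the fragments the test taker classified correctly.

    Both arguments are lists of booleans, one per fragment;
    True means "this fragment was generated automatically".
    """
    if len(answer_key) != len(responses):
        raise ValueError("answer key and responses must have the same length")
    return sum(key == response for key, response in zip(answer_key, responses))

key = [True, False, True, False]
responses = [True, True, True, False]  # the second fragment is misclassified
print(score_test(key, responses))
```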

To convert the score into a grade, existing methods for calibrating multiple-choice tests are used, for example [7]; these make it possible to estimate the probability of passing the test by chance and to map scores to levels of knowledge.
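As a rough illustration of the chance-passing estimate, the sketch below assumes the simplest possible model: each fragment is classified independently by pure guessing with success probability 0.5, so the score follows a binomial distribution. This is not the calibration method of reference [7], and the function name and the threshold of 15 out of 20 are hypothetical.

```python
from math import comb

def prob_score_at_least(n_items, threshold, p_guess=0.5):
    """Probability that pure guessing yields at least `threshold` correct answers."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(threshold, n_items + 1)
    )

# Chance of scoring 15 or more out of 20 fragments by guessing alone.
print(prob_score_at_least(20, 15))
```

A test designer could use such a figure to set a passing threshold at which a lucky guess is acceptably unlikely.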

Examples of implementation of the invention:

Example 1.

A language model based on a Long Short-Term Memory (LSTM) neural network [15] was trained to predict the next character (a character-level model, in which the tokens are individual characters) on a sample of fiction texts (5 GB of data) and was then fine-tuned on a sample of history texts. The model comprised 4 layers of LSTM cells with 3192 cells in each layer. The output token set consisted of the characters of the Russian alphabet. During text generation, the parameter τ was set to 1. The following text was obtained:

1. "Colonel Kornilov was born in 1891 into the family of an old colonel. In 1918 he was drafted into the army and received an order from the command of the NKVD troops to prepare the ground for preparations for war. In 1921 he took part in an operation to conduct military action in the Stalingrad area. In 1941 he was appointed commander of the 1st Tank Division and received an order from the command of the 1st Tank Army, dated June 15, 1941, to take part in the fighting near Moscow. In 1942 he was awarded the Order of the Red Banner. In 1942 the corpse of Major Vasily Sergeevich Stalin was found in the Stalingrad area. In 1942 he was found guilty of the murder of Stalin. In 1942 he was sentenced to death for murder"

A second text was then selected from real information sources as the one most similar to the generated text under a keyword-based similarity metric:

2. "Colonel Kornilov was born on August 18 (30), 1870, into the family of a cornet. He graduated from the Siberian Cadet Corps, the Mikhailovskoye Artillery School and the Nikolaev Academy of the General Staff (with a gold medal). In 1904-1905 he held the post of staff officer and in effect performed the duties of chief of staff. He was decorated for valor in the Battle of Mukden and promoted to colonel. L.G. Kornilov was taken prisoner in 1915 during the retreat of the armies; his division suffered heavy losses"

Both texts set out biographical facts about a historical figure, but text 1 contains a number of factual errors (military operations near Stalingrad in 1921, non-existent historical figures, other violations of chronology), which allows a person with a basic knowledge of history to recognize it as erroneous.

Example 2.

A language model based on the Transformer neural network architecture [16, 17] was trained on 10 GB of text from scientific articles in medical journals. The text was split into tokens using Byte-Pair Encoding [18] with a vocabulary of 64,000 tokens. A test of 20 texts was assembled, 10 of them generated with the language model and 10 found in real scientific articles by similarity of the beginning of the first sentence of each text. The test taker was asked to determine which fragments had been generated automatically.

An example of a generated fragment:

"The present study was designed to investigate the effect of a single intraperitoneal injection of the non-selective 5-HT receptor antagonist methysergide (10 mg/kg) on the antinociceptive action of morphine in the rat. The antinociceptive effect of methysergide was assessed in the forced swim test."

In this case, the subject being tested is experimental biochemistry. The text above was generated automatically and closely resembles a real one. However, it contains a logical error: the forced swim test is used to assess the efficacy of antidepressants and cannot serve the purpose described in the fragment, because it involves no pain stimulus. Here the student must not only know the facts of a specific subject area but also picture the experiment described in the fragment in order to understand that it will not yield the desired result. Having reference material at hand does not help to identify the error unambiguously if the person lacks sufficient understanding and the ability to reason logically in the given field.

The claimed method has the following advantages:

- it reduces the labor required to create test items;

- it eliminates accidental errors in creating and checking test items;

- it makes it possible to create tasks of a specified level of difficulty;

- it minimizes the probability that a test taker passes the test by chance;

- it takes into account not only knowledge of particular facts in the discipline studied, but also the test taker's ability to reason logically and draw conclusions.
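As a rough illustration of the chance-passing point above, the following sketch estimates how likely a test taker is to pass by guessing alone. It assumes a test of 20 fragments like the one in Example 2, independent guesses with a 50% chance of being right on each fragment, and a hypothetical pass mark of 16 correct answers; the pass mark and the function name are not taken from the source.

```python
from math import comb


def prob_pass_by_guessing(n_items: int, min_correct: int, p: float = 0.5) -> float:
    """Binomial tail: probability of at least `min_correct` right answers
    out of `n_items` when each answer is an independent guess with success
    probability `p`."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical pass mark of 16 out of 20 (an assumption, not from the source).
chance = prob_pass_by_guessing(n_items=20, min_correct=16)
```

Under these assumptions `chance` is 6196/1048576, or about 0.6%, so a passing score is unlikely to be reached by guessing.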

1. Papasalouros, Andreas, Konstantinos Kanaris, and Konstantinos Kotis. "Automatic Generation Of Multiple Choice Questions From Domain Ontologies." e-Learning. 2008.

2. Kantor, Arthur, Jan Kleindienst, and Martin Schmid. "Automatic question generation from natural text." U.S. Patent No. 9,904,675. 27 Feb. 2018.

3. International application WO2018165579, published 13.09.2018.

4. Subramanian, Sandeep, et al. "Neural models for key phrase detection and question generation." arXiv preprint arXiv:1706.04560 (2017).

5. Zubarev D.V., Sochenkov I.V. (2019). Cross-Language Text Alignment for Plagiarism Detection Based on Contextual and Context-Free Models. Computational Linguistics and Intellectual Technologies: Proceedings of Annual International Conference “Dialogue”, Issue 18.

6. Tikhomirov M. M., Loukachevitch N. V., Dobrov B. V. (2019). Assessing Theme Adherence in Student Thesis. Computational Linguistics and Intellectual Technologies: Proceedings of Annual International Conference “Dialogue”, Issue 18.

7. Ercikan, Kadriye, et al. "Calibration and scoring of tests with multiple-choice and constructed-response item types." Journal of Educational Measurement 35.2 (1998): 137-154.

8. Ronald Rosenfeld. 2000. Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 88(8):1270-1278.

9. Akbik, Alan, Duncan Blythe, and Roland Vollgraf. "Contextual string embeddings for sequence labeling." Proceedings of the 27th International Conference on Computational Linguistics. 2018.

10. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. An Introduction to Information Retrieval, Cambridge University Press, 2009, p. 233.

11. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. ACL-2002: 40th Annual meeting of the Association for Computational Linguistics. pp. 311-318.

12. He H., Gimpel K., Lin J. Multi-perspective sentence similarity modeling with convolutional neural networks // Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. pp. 1576-1586.

13. Neculoiu, Paul, Maarten Versteegh, and Mihai Rotaru. "Learning text similarity with siamese recurrent networks." Proceedings of the 1st Workshop on Representation Learning for NLP. 2016.

14. Amiri, Hadi, et al. "Learning text pair similarity with context-sensitive autoencoders." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.

15. Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735-1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.

16. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

17. Radford, Alec, et al. "Improving language understanding by generative pre-training." URL: https://s3-us-west-2.amazonaws.com/openai-assets/researchcovers/languageunsupervised/language understanding paper. Pdf (2018).

18. Sennrich, Rico, Barry Haddow, and Alexandra Birch. "Neural machine translation of rare words with subword units." arXiv preprint arXiv:1508.07909 (2015).

Claims

1. A computer-automated method for creating test items to test the depth of knowledge and the reasoning ability of students and specialists, comprising: obtaining, using an electronic computing device, a sample of texts on a given topic; creating on its basis a language model in the form of a conditional probability distribution function; modifying the distribution using a numerical value of a complexity parameter; subsequently forming automatically generated texts on its basis by obtaining the next token in the text taking into account the values of the previous tokens; obtaining texts from real information sources that are similar, according to an established similarity function, to said automatically generated texts; and forming test items containing said automatically generated texts and the similar texts from real information sources.

2. The method according to claim 1, characterized in that a word, part of a word or a symbol is used as a token.

3. The method according to claim 1, characterized in that the determination of the distribution is carried out using neural networks.

4. The method according to claim 1, characterized in that the numerical value of the complexity parameter is determined empirically by a specialist in the subject area choosing an acceptable level of complexity based on a number of automatically generated texts.

5. The method according to claim 1, characterized in that the numerical value of the complexity parameter is determined statistically by administering test tasks of different levels of difficulty to control and expert groups and choosing the value of the complexity parameter that showed the greatest difference in average scores between the groups.

6. The method according to claim 1, characterized in that the maximum coincidence of words or word combinations of the generated text and the text of real information sources is used as the similarity function.
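The operations named in claims 1, 5 and 6 can be illustrated with a minimal sketch. It rests on assumptions that the claims themselves do not fix: the complexity parameter is treated here like a sampling temperature (each probability is raised to the power 1/complexity and renormalized), the similarity function is a plain count of shared words, and all function names are hypothetical.

```python
import random
from statistics import mean


def apply_complexity(probs, complexity):
    """Reshape a next-token distribution (assumed temperature-style scaling).

    Values below 1 sharpen the distribution; values above 1 flatten it.
    """
    if complexity <= 0:
        raise ValueError("complexity must be positive")
    weights = [p ** (1.0 / complexity) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_next_token(tokens, probs, complexity, rng):
    """Draw the next token from the modified conditional distribution."""
    modified = apply_complexity(probs, complexity)
    return rng.choices(tokens, weights=modified, k=1)[0]


def pick_complexity(scores_by_value):
    """Claim 5 sketch: choose the parameter value with the largest gap in
    mean score between the groups.

    `scores_by_value` maps a complexity value to a pair
    (control_group_scores, expert_group_scores).
    """
    def gap(value):
        control, expert = scores_by_value[value]
        return abs(mean(expert) - mean(control))

    return max(scores_by_value, key=gap)


def word_overlap(generated, real):
    """Claim 6 sketch: number of distinct words the two texts share."""
    return len(set(generated.lower().split()) & set(real.lower().split()))


rng = random.Random(0)
token = sample_next_token(["rats", "mice", "dogs"], [0.5, 0.25, 0.25], 0.5, rng)
```

In practice the conditional distribution would come from the trained language model at each generation step, and a richer similarity function could replace `word_overlap`.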