RU2809595C1

RU2809595C1 - Method and system for automatic polygraph testing using three ensembles of machine learning models

Info

Publication number: RU2809595C1
Application number: RU2023102442A
Authority: RU
Inventors: Дмитрий Валерьевич Асонов; Максим Андреевич Крылов; Анастасия Евгеньевна Рябикина; Евгений Вячеславович Литвинов; Максим Алексеевич Митрофанов; Максим Алексеевич Михайлов
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Filing date: 2023-02-03
Publication date: 2023-12-13

Abstract

FIELD: polygraph testing.

SUBSTANCE: the invention relates to a method and system for automatic polygraph testing of a subject using machine learning (ML) methods. In the method, records of polygraph tests are obtained, containing sensor signals with timelines on which the beginning and end of each question are marked; additional data are received, containing at least the age, gender and job information of the person being tested;

the received signals are processed using a first ensemble of ML models trained on one topic, and during this processing the following is carried out: signal processing by the 1st ML model, during which the following is performed: determination of time intervals for extracting variables, based on the timestamps of the beginning and end of the question and the answer timestamp, and on the type and topic of the question; extraction of variables from each signal over the determined time intervals; processing of the variables obtained from the signals, which involves normalization and concatenation of the processed variables and construction of a vector from them; feeding said vector to the 1st ML model to obtain the output value of the 1st ML model; transferring the output value of the 1st ML model to the input of the 2nd ML model; processing of the output value of the 1st ML model and the additional data by the 2nd ML model, during which the following is carried out: division of the additional data into categorical and numerical variables; processing of the variables obtained from the additional data, which involves vectorization of the categorical variables and normalization of the numerical variables; concatenation of the processed additional variables and the output value of the 1st ML model, and construction of a vector from them; feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model; feeding the output value of the 2nd ML model to the third ML model to form the output value of the first ensemble;

the received signals are processed using a second ensemble of ML models trained on a combination of topics, and during this processing the following is carried out: signal processing by the 1st ML model, during which the following is performed: determination of time intervals for extracting variables, based on the timestamps of the beginning and end of the question and the answer timestamp, and on the type and topic of the question; extraction of variables from each signal over the determined time intervals; processing of the variables obtained from the signals, which involves normalization and concatenation of the processed variables and construction of a vector from them; feeding said vector to the 1st ML model to obtain the output value of the 1st ML model; transferring the output value of the 1st ML model to the input of the 2nd ML model; processing of the output value of the 1st ML model and the additional data by the 2nd ML model, during which the following is carried out: division of the additional data into categorical and numerical variables; processing of the variables obtained from the additional data, which involves vectorization of the categorical variables and normalization of the numerical variables; concatenation of the processed additional variables and the output value of the 1st ML model, and construction of a vector from them; feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model; feeding the output value of the 2nd ML model to the third ML model to form the output value of the second ensemble;

the received signals are processed using a third ensemble of ML models trained on a combination of topics, and during this processing the following is carried out: signal processing by the 1st ML model, during which the following is performed: determination of time intervals for extracting variables, based on the timestamps of the beginning and end of the question and the answer timestamp, and on the type and topic of the question; extraction of variables from each signal over the determined time intervals; processing of the variables obtained from the signals, which involves normalization and concatenation of the processed variables and construction of a vector from them; feeding said vector to the 1st ML model to obtain the output value of the 1st ML model; transferring the output value of the 1st ML model to the input of the 2nd ML model; processing of the output value of the 1st ML model and the additional data by the 2nd ML model, during which the following is carried out: division of the additional data into categorical and numerical variables; processing of the variables obtained from the additional data, which involves vectorization of the categorical variables and normalization of the numerical variables; concatenation of the processed additional variables and the output value of the 1st ML model, and construction of a vector from them; feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model; feeding the output value of the 2nd ML model to the third ML model to form the output value of the third ensemble;

using the third ML model, the output values of the first, second and third ensembles are processed, and during this processing the following is carried out: concatenation of the processed output values of the first, second and third ensembles and construction of a vector from them; feeding said vector to the 3rd ML model to obtain the output value of the 3rd ML model; comparison of the output value of the 3rd ML model with a given threshold value; and determining that the answer is a lie if the output value is greater than or equal to the threshold value, or that the answer is truthful if the output value is below the threshold value.

EFFECT: increased accuracy of polygraph testing.

7 cl, 11 dwg, 11 tbl

Description

TECHNICAL FIELD

[0001] The claimed technical solution relates to a method and system for automatic polygraph testing using machine learning algorithms.

BACKGROUND ART

[0002] Classic polygraph screenings are regularly used by major organizations such as banks, law enforcement agencies and federal authorities. The main concern of the scientific community is that these screenings are prone to error. However, such errors may stem not only from the method but also from the person conducting it (the polygraph examiner).

[0003] The security of customer money and data (for example, transactions) lies at the core of banking culture and reputation. As one of the tools for protecting customers, the bank uses polygraph screenings (PS), only with the consent of candidates and employees and in accordance with the law. They are applied when hiring candidates for high-risk roles, to prevent the hiring of an unreliable person. To detect violations, employees in especially high-risk positions are screened regularly. A PS covers the following topics: narcotic substances, gambling addiction, insider trading, disclosure of confidential information, bribery, corruption, misappropriation of funds and fraud. The financial industry is not the only one that uses PS; other examples include such important sectors as aviation, industry and law enforcement agencies around the world [1, 2].

[0004] A classic polygraph is a device that records cardiovascular activity (such as heart rate), thoracic and abdominal respiration, galvanic skin response (electrodermal activity, EDA) and tremor. The polygraph examiner asks the subject questions that are answered “yes” or “no”. Reviews of the classic polygraph and of the methodology for constructing questions are presented in [3, 4, 5].

[0005] Non-traditional lie detection studies use video and audio analysis [6] (including facial expressions [7, 8], pupil response [9] and the delay between question and answer [10]), electromyography (EMG) [11], electroencephalography (EEG) [12], magnetic resonance imaging [13, 14] or typing sequences (keystroke dynamics) [15] in addition to classic polygraph data.

[0006] Some of these studies have even found application in new areas: the iBorderCtrl lie detector is being tested at European airports [16, 17], and VeriPol is used by the Spanish police in insurance claim cases [18, 19]. The classic polygraph remains the tool of choice for traditional tasks such as pre-employment screening and criminal or internal investigations.

[0007] The polygraph has a long history of criticism from scholars in psychology and law, as well as from society and government [1, 23]. The main concern is that the technique does not reliably distinguish lies from truth. And yet, “paradoxically, while Congress expresses deep concern about the effectiveness of this technology, the EPPA permits the use of lie detectors in cases where the accuracy of the result is of paramount importance: national defense, security and legitimate ongoing investigations” [22].

[0008] Critics of the technique give many reasons why a polygraph screening may fail to detect a lie or may flag the truth as a lie. For example, “Polygraph tests do not assess deception, but rather situations that are constructed to elicit and assess fear” [24].

[0009] A truthful junior manager may fear being labeled corrupt more than a cold-blooded senior manager fears being caught lying by the polygraph examiner. Another example of constructive criticism is the call to standardize the polygraph screening procedure and the training of polygraph examiners [25]. Examiner errors can occur, for example, when the examiner is inexperienced, tired, distracted or biased [26].

[0010] There is a simple solution for quality control: always have the screening reviewed by another polygraph examiner, who confirms or refutes the conclusion of the first examiner [27]. To review a polygraph report, the second examiner has to go through the screening record, which includes the polygram (a graphical representation of the sensor data linked to the examiner's questions and the subject's answers) and sometimes audio and video recordings, and compare their own conclusion with the original one. This review takes at least half as long as the screening itself, and a standard screening lasts at least two hours. A repeat review therefore costs both time and money. For this reason, internal security departments conduct repeat reviews rarely or not at all. Another reason why a review by a second examiner may be ineffective is that the second examiner may make the same mistake as the original examiner.

[0011] A common disadvantage of existing solutions in this field is the human factor in polygraph testing, which negatively affects the accuracy and speed of testing, together with the lack of an automated process for repeat review.

DISCLOSURE OF THE INVENTION

[0012] The claimed technical solution proposes a new approach to automatic polygraph testing using machine learning (ML) models.

[0013] The effectiveness of this solution is confirmed by a significant increase in the accuracy and speed of automatic polygraph testing.

[0014] Thus, the technical problem of accurate and fast automatic polygraph testing is solved.

[0015] The technical result achieved by solving this problem is an increase in the accuracy of polygraph testing.

[0016] An additional technical result achieved by solving this problem is an increase in the speed of polygraph testing.

[0017] A further additional technical result achieved by solving this problem is the automation of the polygraph testing process.

[0018] These technical results are achieved by a computer-implemented method for automatic polygraph testing, performed by a computing system containing at least three ensembles of machine learning models, the method comprising the steps of:

- obtaining records of polygraph tests containing at least sensor signals with timelines on which the beginning and end of each question are marked;

- receiving additional data containing at least the age, gender and job information of the person being tested;

- processing the received signals using a first ensemble of ML models trained on one topic, wherein during this processing the following is carried out:

signal processing by the first ML model, during which the following is performed:

determination of time intervals for extracting variables, based on the timestamps of the beginning and end of the question and the answer timestamp, and on the type and topic of the question;

extraction of variables from each signal over the determined time intervals;

processing of the variables obtained from the signals, which involves normalization and concatenation of the processed variables and construction of a vector from them;

feeding said vector to the 1st ML model to obtain the output value of the 1st ML model;

transferring the output value of the 1st ML model to the input of the 2nd ML model;

processing of the output value of the 1st ML model and the additional data by the second ML model, during which the following is carried out:

division of the additional data into categorical and numerical variables;

processing of the variables obtained from the additional data, which involves vectorization of the categorical variables and normalization of the numerical variables;

concatenation of the processed additional variables and the output value of the 1st ML model, and construction of a vector from them;

feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model;

feeding the output value of the 2nd ML model to the third ML model to form the output value of the first ensemble;

- processing the received signals using a second ensemble of ML models trained on a combination of topics, wherein during this processing the following is carried out:

feeding the output value of the 2nd ML model to the third ML model to form the output value of the second ensemble;

- processing the received signals using a third ensemble of ML models trained on a combination of topics, wherein during this processing the following is carried out:

signal processing by the first ML model, during which the following is performed:

transferring the output value of the 1st ML model to the input of the 2nd ML model;

feeding the output value of the 2nd ML model to the third ML model to form the output value of the third ensemble;

- processing the output values of the first, second and third ensembles using the third ML model, wherein during this processing the following is carried out:

concatenation of the processed output values of the first, second and third ensembles, and construction of a vector from them;

feeding said vector to the 3rd ML model to obtain the output value of the 3rd ML model;

comparison of the output value of the 3rd ML model with a given threshold value; and

- determining that the answer is a lie if the output value is greater than or equal to the threshold value, or that the answer is truthful if the output value is below the threshold value.
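The signal-processing steps for the first ML model (interval determination, variable extraction, normalization and concatenation) can be sketched as follows. This is a minimal illustration, not the patented implementation: the window rule (`tail_s` seconds after the answer), the three variables (mean, peak, range) and the per-test column normalization are all assumptions, since the text only states that the intervals depend on the timestamps and on the question type and topic.

```python
from statistics import mean, pstdev


def extraction_window(q_start, q_end, answer_ts, tail_s=5.0):
    """Return (start, end) in seconds for variable extraction.

    Assumption: the window opens at question onset and closes a fixed
    tail after the later of question end and answer timestamp.
    """
    return q_start, max(q_end, answer_ts) + tail_s


def extract_variables(samples, rate_hz, window):
    """Illustrative variables for one signal: mean, peak and range.

    Raises on an empty segment, so the window must lie inside the record.
    """
    start, end = window
    seg = samples[int(start * rate_hz):int(end * rate_hz)]
    return [mean(seg), max(seg), max(seg) - min(seg)]


def question_vector(signals, rate_hz, window):
    """Concatenate the variables of every signal (sorted by channel name)."""
    vec = []
    for name in sorted(signals):
        vec.extend(extract_variables(signals[name], rate_hz, window))
    return vec


def normalize_columns(rows):
    """Z-score each variable across all questions of one test.

    Assumption: normalization is done per test; constant columns map to 0.
    """
    stats = [(mean(c), pstdev(c)) for c in zip(*rows)]
    return [[0.0 if s == 0 else (v - m) / s for v, (m, s) in zip(row, stats)]
            for row in rows]


# Hypothetical two-channel record sampled at 1 Hz.
signals = {"gsr": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], "hr": [60] * 10}
window = extraction_window(1.0, 2.0, 3.0, tail_s=2.0)
vector = question_vector(signals, rate_hz=1, window=window)
```

Each normalized row would then be fed to the 1st ML model, whose output is passed on to the 2nd ML model.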

[0019] In one particular embodiment of the method, the ML models are trained on one of the test topics or a combination thereof, where the test topics are: narcotic substances, receiving additional remuneration, disclosure of confidential information, debt obligations, outside income, criminal offenses, administrative offenses, and violations of internal regulatory documents (IRD).

[0020] In another particular embodiment of the method, the ML models of the first, second and third ensembles are of a type selected from the group: gradient boosting, random forest or neural network.

[0021] In another particular embodiment of the method, the third ML model is of a type selected from the group: logistic regression, random forest, gradient boosting or averaging.
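Two of the options listed for the third ML model, averaging and logistic regression, together with the threshold comparison from the method, can be sketched as below. The weights, bias and threshold are hypothetical placeholders; the source does not give their values, and in practice the logistic parameters would be fitted on training data.

```python
import math


def combine_by_averaging(ensemble_outputs):
    """Third model as a plain average of the three ensemble outputs."""
    return sum(ensemble_outputs) / len(ensemble_outputs)


def combine_by_logistic(ensemble_outputs, weights, bias):
    """Third model as logistic regression over the concatenated outputs.

    The weights and bias are assumed to have been fitted elsewhere.
    """
    z = bias + sum(w * x for w, x in zip(weights, ensemble_outputs))
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold):
    """Decision rule from the method: score >= threshold means a lie."""
    return "lie" if score >= threshold else "truth"


# Hypothetical outputs of the first, second and third ensembles.
outputs = [0.2, 0.4, 0.9]
verdict = classify(combine_by_averaging(outputs), threshold=0.45)
```

Averaging needs no training data, whereas a logistic combiner can learn to weight the single-topic ensemble differently from the multi-topic ones.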

[0022] In another particular embodiment of the method, the polygraph test records contain sensor signals including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal signals, pupil movement, or combinations thereof.

[0023] In another particular embodiment of the method, the additional data contain at least one of: identification numbers of polygraph examiners, identification numbers of polygraph devices, information about weather conditions, results of electroencephalography, magnetic resonance imaging or functional near-infrared spectroscopy, information about geomagnetic storms, or combinations thereof.
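The preparation of the second ML model's input (splitting the additional data into categorical and numerical variables, vectorizing the former, normalizing the latter and appending the output of the 1st ML model) might look like the sketch below. One-hot encoding and min-max scaling are assumed choices; the source only says "vectorization" and "normalization". The field names, category lists and ranges are hypothetical.

```python
def one_hot(value, categories):
    """Vectorize a categorical value against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(value, lo, hi):
    """Scale a numerical value to [0, 1]; a degenerate range maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def second_model_input(first_model_output, extra, categories, ranges):
    """Build the vector fed to the 2nd ML model.

    Strings are treated as categorical, everything else as numerical
    (an assumption made for this sketch).
    """
    vec = []
    for key in sorted(extra):
        value = extra[key]
        if isinstance(value, str):
            vec.extend(one_hot(value, categories[key]))
        else:
            lo, hi = ranges[key]
            vec.append(min_max(value, lo, hi))
    vec.append(first_model_output)  # output of the 1st ML model goes last
    return vec


# Hypothetical additional data for one subject.
extra = {"age": 40, "gender": "f", "position": "teller"}
categories = {"gender": ["f", "m"], "position": ["manager", "teller"]}
ranges = {"age": (20, 60)}
vector = second_model_input(0.7, extra, categories, ranges)
```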

[0024] In addition, the claimed technical result is achieved by an automatic polygraph testing system containing:

- at least one processor;

- at least one memory connected to the processor and containing machine-readable instructions which, when executed by the at least one processor, cause the automatic polygraph testing method to be performed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] Features and advantages of the present invention will become apparent from the following detailed description of the invention and the accompanying drawings.

[0026] FIG. 1 illustrates an example implementation of the automatic polygraph testing method using one ML model.

[0027] FIG. 2 illustrates an example implementation of the automatic polygraph testing method using two ML models.

[0028] FIG. 3 illustrates an example implementation of the automatic polygraph testing method using two ensembles of ML models.

[0029] FIG. 4 illustrates an example implementation of the automatic polygraph testing method using three ensembles of ML models.

[0030] FIG. 5 is a graph showing where the model and the polygraph examiner agreed and disagreed, depending on the model's score.

[0031] FIG. 6 illustrates the distribution of scores for two relevant topics.

[0032] FIG. 7 illustrates an example implementation of the baseline model.

[0033] FIG. 8 illustrates an example implementation of a model trained on one topic.

[0034] FIG. 9 illustrates an example implementation of the universal model.

[0035] FIG. 10 illustrates an example of signal convolution.

[0036] FIG. 11 illustrates a general view of the automatic polygraph testing system.

IMPLEMENTATION OF THE INVENTION

[0037] This technical solution can be implemented on a computer, as an automated information system (AIS) or as a machine-readable medium containing instructions for performing the above method.

[0038] The technical solution can be implemented as a distributed computer system.

[0039] In this solution, a system means a computer system, an electronic computer, a CNC (computer numerical control) system, a PLC (programmable logic controller), a computerized control system or any other device capable of performing a given, well-defined sequence of computing operations (actions, instructions).

[0040] A command processing device means an electronic unit or an integrated circuit (microprocessor) that executes machine instructions (programs).

[0041] The command processing device reads and executes machine instructions (programs) from one or more data storage devices, such as random access memory (RAM) and/or read-only memory (ROM). Examples of such persistent storage include, but are not limited to, hard disk drives (HDD), flash memory, solid-state drives (SSD) and optical storage media (CD, DVD, BD, MD, etc.).

[0042] A program is a sequence of instructions intended for execution by the control device of a computer or by a command processing device.

[0043] For this technical solution, a base second-opinion model was built. It was trained on historical data from real responses in 2094 polygraph screenings, including the “Lie detected” field filled in by the polygraph examiners who conducted the screenings.

[0044] The quality metrics of the base model are shown in Table 1.a, in the “all topics” column. The main metrics are ROC AUC (area under the receiver operating characteristic curve) and TPR (true positive rate) at FPR (false positive rate) = 0.05.
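The two metrics named above can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration and do not come from the source; the source does not state how the metrics were computed.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), one per distinct score threshold.

    labels: 1 = "detected", 0 = "not detected"; scores: model confidence.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Items with equal scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def tpr_at_fpr(labels, scores, max_fpr=0.05):
    """Highest TPR reachable while keeping FPR at or below max_fpr."""
    return max(tpr for fpr, tpr in roc_points(labels, scores) if fpr <= max_fpr)


# Toy data (illustrative only, not from the source).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
tpr = tpr_at_fpr(labels, scores, max_fpr=0.05)
```

TPR at a fixed low FPR describes how many examiner-confirmed detections the model recovers when only a small share of false alarms is tolerated, which is the regime relevant for a second-opinion tool.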

[0045] Tables 2 and 3 show the importance of features based on 10 physiological signals and of the subject's gender and age; the construction of second-level features is described below. FIG. 5 shows where the model and the polygraph examiner agreed and disagreed, depending on the model's score.

[0046] The quality of the model was also measured by applying it to each of the seven screening topics (Table 1.a).

[0047] Evaluating the effectiveness of the model takes approximately 100 person-hours, because it requires a practicing polygraph examiner to review many polygraph screenings selected on the basis of the model's score.

[0048] It was also hypothesized that information about geomagnetic storms on Earth [28] and weather conditions in the city on the day of screening could improve the model's prediction quality. The basis for this hypothesis was that during geomagnetic storms and under different weather conditions people may behave differently, which may show up as a slight change in physiological signals, or the sensor data may be slightly shifted, or both.

[0049] The polygraph examiner's ID was also collected on the assumption that it could help the model, since different examiners may evoke slightly different physiological reactions in subjects, or the polygraph equipment assigned to each examiner may have a different measurement error. Data on the subjects' job positions were also collected, because people with different education and training may tell the truth and lie differently.

[0050] Table 1.b shows the quality of the base model trained with alternative data, and Table 4 shows the importance of these features. The improvement from each source relative to the model built only on physiological signal data is shown in Table 5.

[0051] As the experiment shows, all alternative data improve the quality of the model; however, only age, gender, job position, and geomagnetic storms were retained for production use (Table 1.c).

[0052] Weather shows a high quality gain and high importance, and there was a concern that this might have a technical cause: the data are strongly imbalanced across cities in the share of screenings with detected deception. To rule out the bias caused by the city imbalance, data from only one city were taken, but weather still had a high importance value.

[0053] Thus, it can be concluded that weather is meaningful alternative data. However, on the full sample weather may carry information about the city, and the model's estimate may be biased because of the imbalance of detected cases across cities. Although the polygraph examiner ID feature shows a moderate quality gain, the nature of this feature requires deeper study before it is used in production.

[0054] For example, if the quality improvement actually comes not from the polygraph examiner ID but from the polygraph device ID, then the model will produce an incorrect score when an examiner changes devices.

[0056] If a separate model is trained for each topic, the results are better than those of the base model with geomagnetic storms. Table 6 shows a gain of +2% ROC AUC (a 6% relative gain) when the model was trained only on the narcotic substances topic. Thus, splitting by topic helps the model learn and reach conclusions better.

[0058] The model trained on narcotic substances, discussed in the previous paragraph, was applied to the other six topics (Table 6). Compared with the universal model, the quality of the narcotic substances model varies from topic to topic. This shows that a model trained on one topic can be used on the other topics, albeit with minor drops in quality for some of them.

[0059] Table 1 shows that the model performs better on some topics (such as narcotic substances and criminal offenses) than on others (such as outside income and violation of internal regulatory documents). This observation matches a long-standing problem with screenings: people cannot answer a question confidently when they are not fully sure of the answer.

[0060] For example, a bank has hundreds of internal regulatory documents, each with many pages, not to mention their different versions, so people may not be sure whether they have violated at least one of them. A similar situation arises with outside income: some people start asking themselves questions such as “if I received money from relatives, does that count as outside income?” and so on.

[0061] This observation can also be used to improve the quality of the base model. If there are topics on which people are unsure, then a model trained on those topics should be unsure as well.

[0062] Vague topics were removed from the whole sample; however, Table 7 shows that this did not significantly improve the quality of the model, and the scoring quality on the “vague” topics dropped or stayed the same. This is because the share of detected cases in screenings on “vague” topics in the training sample is small.

[0063] The universal model described above does not use the topic name as a feature for training and scoring. The reason is that a model was needed that can score any topic in a screening, not only the 7 topics present in the training sample.

[0064] Table 11 shows how using the topic name as an additional feature affects the quality of the model. Knowledge of the topic helps the model score the “vague” topic of violation of internal regulatory documents, while the quality on other topics remains unchanged.

[0065] Before the “topic name” feature was added, the sample was balanced by the number of risk factors detected in screenings per topic. The balancing consists of reducing the number of detected cases on the narcotic substances topic by a factor of 4. Table 11 shows that this reduction lowers the quality on the narcotic substances topic, which had previously always been the unexplained leader.
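The balancing step can be sketched as follows. The source states only that the number of detected cases on one topic was reduced by a factor of 4; random sampling, the fixed seed, and the field names `topic` and `label` are assumptions made for illustration.

```python
import random


def downsample_detected(rows, topic, factor=4, seed=0):
    """Keep 1/factor of the detected (label == 1) rows of one topic.

    All other rows are returned unchanged and in their original order.
    Random selection is an assumption; the source does not say how rows were chosen.
    """
    rng = random.Random(seed)
    idx = [i for i, r in enumerate(rows) if r["topic"] == topic and r["label"] == 1]
    keep = set(rng.sample(idx, len(idx) // factor))
    drop = set(idx) - keep
    return [r for i, r in enumerate(rows) if i not in drop]


# Toy sample (illustrative only): 8 detected and 2 not detected on one topic,
# plus 3 detected on another topic.
rows = (
    [{"topic": "narcotic substances", "label": 1} for _ in range(8)]
    + [{"topic": "narcotic substances", "label": 0} for _ in range(2)]
    + [{"topic": "monetary reward", "label": 1} for _ in range(3)]
)
balanced = downsample_detected(rows, "narcotic substances", factor=4)
```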

[0066] The quality on the narcotic substances topic became equal to that of the topics closest in quality, such as receipt of monetary reward and criminal and administrative offenses. This observation explains the earlier lead of the narcotic substances topic: it had the largest number of minority-class examples (concealment of information detected) compared with the other topics.

[0067] The quality improvement from adding new, additional screenings with detected risk factors was measured, and experiments were conducted with various ensemble variants. The results are presented in Table 8.

[0068] In total, ensembling and additional data increased AUC by 5% across all topics and by 11% on selected topics.

[0069] The performance of two advanced models can now be compared: the universal model (an ensemble with alternative data) and the model trained on the narcotic substances topic (a single-topic model with alternative data). The search for polygraph examiner errors was tested on the 2094 archived screenings. The search is carried out in the two topics that are most common in screenings. Screenings were selected in which the polygraph examiner concluded that nothing was detected on the topic, but the model confidently indicates that a lie was detected.

[0070] Based on the 15 highest scores of the model trained on the Narcotic Substances topic, 15 examiner conclusions of “Not detected” were selected as candidates for examiner error on that topic. In the same way, based on the scores of the universal model, 15/5/5 “Not detected” conclusions were selected for the topics “Receipt of monetary reward” / “Disclosure of confidential information” / “Criminal and administrative offenses”. This yielded 40 conclusions across 36 screenings.
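The candidate selection described in the two paragraphs above can be sketched as a filter followed by a top-k sort. The record fields (`id`, `topic`, `examiner_label`, `score`) and the toy values are hypothetical and are used only for illustration.

```python
def select_candidates(records, topic, k):
    """Return the k highest-scored records on a topic whose examiner label is "Not detected".

    These are the conclusions where the model most strongly disagrees with the examiner.
    """
    pool = [
        r for r in records
        if r["topic"] == topic and r["examiner_label"] == "Not detected"
    ]
    return sorted(pool, key=lambda r: r["score"], reverse=True)[:k]


# Toy records (illustrative only, not from the source).
records = [
    {"id": 1, "topic": "narcotic substances", "examiner_label": "Not detected", "score": 0.91},
    {"id": 2, "topic": "narcotic substances", "examiner_label": "Detected", "score": 0.99},
    {"id": 3, "topic": "narcotic substances", "examiner_label": "Not detected", "score": 0.40},
    {"id": 4, "topic": "monetary reward", "examiner_label": "Not detected", "score": 0.95},
    {"id": 5, "topic": "narcotic substances", "examiner_label": "Not detected", "score": 0.75},
]
top = select_candidates(records, "narcotic substances", k=2)
top_ids = [r["id"] for r in top]
```

In the experiment described above, k would be 15 for the Narcotic Substances topic and 15, 5, and 5 for the other three topics.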

[0071] These 36 screenings were checked through a blind re-examination by two polygraph examiners. The examiners did not know the results of the screenings and did not share their results with each other. Two re-examinations were needed because, if the original conclusion disagreed with a single re-examination, it would be the word of one examiner against another.

[0072] Extremely experienced polygraph examiners took part in the experiment, and it is assumed that the share of examiner errors among all screenings ranges from 0.0 to 1.0%.

[0073] An overview of the two re-examinations is presented in Table 9. The score distributions for the two relevant topics are presented in FIG. 6.

[0074] By checking 39 conclusions in 35 screenings (out of 2094 screenings), 30 problematic conclusions were found: in 13 of them the examiner's error was confirmed by both re-examinations, and in 17 one of the re-examinations disagreed with the original conclusion. The remaining 9 conclusions are model errors, i.e. the original conclusion was correct, as both re-examinations confirmed.

[0075] Cases in which the two re-examinations disagree were expected, because there are difficult cases in which the decision is not obvious. For such cases a panel is convened, at which the polygraph examiners discuss their conflicting conclusions and come to a consensus.

[0076] It was noticed that in some problematic screenings (a “Detected” conclusion in one or both re-examinations), the examiners who performed the re-examination noted that the examinee had used countermeasures. Missing a countermeasure is, by definition, a polygraph examiner error.

Description of the data.

[0077] The dataset consists of 2094 historical screening-type polygraph examinations, each with the label “Concealment of information detected” or “Concealment of information not detected” assigned by the polygraph examiners who conducted the screenings. These screenings were conducted on candidates and bank staff in risk-sensitive areas, with their consent and in accordance with the law, before hiring, before promotion, or annually, depending on the position. A polygraph screening covers a subset of 14 topics, including narcotic substances and receipt of monetary reward.

[0078] A screening record includes the subject's physiological signals, audio, and the questions in text form. Each question has three timestamps for each presentation: the start of the question asked by the polygraph examiner, the end of the question, and the moment of the answer. Each question belongs to one of four types.

[0079] Question types:

[0080] Examples of questions:

[0081] In addition to the physiological signals (Table 10), the subject's gender and age are recorded.

[0082] Screening was carried out on a Polyconius polygraph, model 7.

Data processing

[0083] The main task is to evaluate the polygraph examiner's conclusion (Detected or Not detected) for a given topic in a screening.

[0084] To build the model, the data were presented in the following format: each row in the dataset is a screening record for a specific topic in a specific test for one examinee. The target variable is labeled “Detected” or “Not detected”. This labeling may introduce bias, since the examinee may not lie in every test on the topic during the screening.

[0085] Physiological signals were extracted from time windows according to the question timestamps. Thus, initially a row is a time series of a physiological signal for a specific presentation on a specific topic.
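The window extraction can be sketched as below. The window boundaries (from the start of the question to a fixed number of seconds after the answer), the sampling rate, and the names `Presentation` and `extract_window` are assumptions; the source says only that windows are derived from the question timestamps.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Timestamps of one presentation of a question, in seconds from the start of the record."""
    question_start: float
    question_end: float
    answer_time: float


def extract_window(signal, sample_rate_hz, presentation, post_answer_s=5.0):
    """Slice one physiological signal around a single presentation.

    Assumed window: [question_start, answer_time + post_answer_s], clipped to the record.
    """
    start = max(0, int(round(presentation.question_start * sample_rate_hz)))
    stop = int(round((presentation.answer_time + post_answer_s) * sample_rate_hz))
    stop = min(len(signal), stop)
    return signal[start:stop]


# Toy signal sampled at 10 Hz (illustrative only).
signal = list(range(100))
p = Presentation(question_start=1.0, question_end=2.0, answer_time=2.5)
window = extract_window(signal, sample_rate_hz=10, presentation=p, post_answer_s=1.0)
```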

[0086] For each presentation of the relevant and control questions, basic statistics were generated: minimum, maximum, mean, amplitude, and standard deviation. Minimum, maximum, mean, and standard deviation were then used as aggregate functions at each step. The data for each presentation were grouped by question. An additional feature was created that characterizes the difference between the first and subsequent presentations. Questions were grouped by topic within each test in a similar way. In the end, each row contains 600 features extracted from the polygraph screening record for a specific topic in a specific test (FIG. 10).
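The two-step aggregation can be sketched for a single signal as follows. This is an illustration under assumptions, not the procedure that yields the 600 features: the population standard deviation, the feature naming scheme, and the exact form of the first-versus-subsequent feature are all hypothetical.

```python
from statistics import mean, pstdev


def base_stats(window):
    """Basic statistics of one signal window (one presentation of one question)."""
    lo, hi = min(window), max(window)
    return {"min": lo, "max": hi, "mean": mean(window), "amp": hi - lo, "std": pstdev(window)}


# Aggregate functions applied at every grouping step.
AGGS = {"min": min, "max": max, "mean": mean, "std": pstdev}


def aggregate(rows):
    """Collapse a list of feature dicts into one dict, one value per (feature, aggregate) pair."""
    out = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        for agg_name, fn in AGGS.items():
            out[f"{name}_{agg_name}"] = fn(values)
    return out


def topic_features(questions):
    """questions: list of questions, each a list of signal windows (one per presentation)."""
    per_question = []
    for presentations in questions:
        stats = [base_stats(w) for w in presentations]
        feats = aggregate(stats)  # step 1: presentations -> question
        later = [s["mean"] for s in stats[1:]]
        # Assumed form of the "first vs. subsequent presentations" feature.
        feats["first_vs_rest_mean"] = stats[0]["mean"] - mean(later) if later else 0.0
        per_question.append(feats)
    return aggregate(per_question)  # step 2: questions -> topic


# Toy topic with two questions, two presentations each (illustrative only).
features = topic_features([
    [[1, 2, 3], [2, 3, 4]],
    [[0, 0, 0], [1, 1, 1]],
])
```

With these assumptions one signal yields 84 features per topic; the 600 features in the source come from several signals and a feature set that is not fully specified here.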

Models.

[0087] This technical solution uses a stacking ensemble consisting of two levels of models based on the gradient boosting algorithm. The two-level structure was needed to avoid the “curse of dimensionality”. The first-level model was trained on 600 features built from physiological signals for each topic and produces a Detected/Not detected decision for each test within a screening. The outputs of the first model, aggregated per topic, were fed to the input of the second model.

[0088] The second-level model had the following features:

pred_proba_max - the maximum probability of detecting a lie among the tests;

pred_proba_mean - the mean probability of detecting a lie;

pred_proba_min - the minimum probability of detecting a lie among the tests;

pred_proba_diff - the difference between the maximum and the mean probability.

[0089] These probabilities are combined with the alternative data (biographical data, data on geomagnetic storms, etc.). The resulting dataset is fed to the input of the second model, which outputs the probability of a lie in the screening on a given topic.
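The construction of one input row for the second-level model can be sketched as follows. The four `pred_proba_*` names come from the text above; the function name, the keys of the alternative data, and the toy values are hypothetical.

```python
def second_level_row(test_probas, alt_features):
    """Build one second-level input row for a single topic in a single screening.

    test_probas: first-level probabilities, one per test on this topic.
    alt_features: alternative data for the screening (keys are illustrative).
    """
    p_max = max(test_probas)
    p_min = min(test_probas)
    p_mean = sum(test_probas) / len(test_probas)
    row = {
        "pred_proba_max": p_max,
        "pred_proba_mean": p_mean,
        "pred_proba_min": p_min,
        "pred_proba_diff": p_max - p_mean,
    }
    row.update(alt_features)  # concatenate with the alternative data
    return row


# Toy values (illustrative only).
row = second_level_row(
    [0.2, 0.5, 0.8],
    {"age": 34, "gender": "F", "geomagnetic_storm": 0},
)
```

Feeding only four aggregated probabilities plus the alternative data to the second model keeps its input small, which is consistent with the stated goal of avoiding the curse of dimensionality.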

Base model.

[0090] This model (FIG. 7) receives no information about topics during training and prediction. Topic information is used only for the subsequent aggregation within a screening. For example, in the first screening, aggregation is performed over all tests for the topic “narcotic substances”; the same is done, for example, for the topic “insider trading” in the second screening.

Model trained on one topic

[0091] The logic of building the model and creating features is the same as in the base model. The difference is that only the features for one topic are used for training: the data for that single topic are selected before the first-level model is applied. FIG. 8 shows the training procedure for the narcotic substances topic, with the remaining topics removed from the training dataset.

Universal model.

[0092] It was decided to combine the strengths of the models described above. To that end, an ensemble of the existing architectures was built (FIG. 9). After a series of experiments, the best result was shown by an architecture that averages the confidence of the following models:

base model - algorithm: gradient boosting; with alternative data;

single-topic model - algorithm: gradient boosting; with alternative data;

base model - algorithm: random forest.
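The averaging of the three members can be sketched as below. The member models are represented by placeholder callables that return fixed confidences; the type alias `Model`, the function `universal_score`, and the toy values are assumptions, and no real trained models are involved.

```python
from typing import Callable, Mapping, Sequence

# A member model maps a feature row to a confidence in [0, 1] that a lie was detected.
Model = Callable[[Mapping[str, float]], float]


def universal_score(models: Sequence[Model], features: Mapping[str, float]) -> float:
    """Unweighted mean of the members' confidences (equal weights are assumed)."""
    scores = [model(features) for model in models]
    return sum(scores) / len(scores)


# Placeholder members standing in for the three trained models (illustrative only).
def base_gb(features):          # base model, gradient boosting, with alternative data
    return 0.9


def single_topic_gb(features):  # single-topic model, gradient boosting, with alternative data
    return 0.6


def base_rf(features):          # base model, random forest
    return 0.3


score = universal_score([base_gb, single_topic_gb, base_rf], {"pred_proba_max": 0.8})
```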

[0093] This ensemble was applied to all topics except Narcotic Substances. Beyond the usual advantages of ensembles, it was reasonable to use different architectures (gradient boosting and random forest) in order to rule out pure model error, so that the screenings flagged are those that contain a polygraph examiner error.

[0094] As shown in FIG. 1, the method of automatic polygraph testing (100) is performed by a computing system containing a machine learning model and consists of several interrelated steps.

[0095] На этапе 101 получают записи полиграфных проверок, содержащие по меньшей мере сигналы датчиков с временными шкалами, на которых промаркированы начало и конец вопроса.[0095] At step 101, polygraph test records are obtained containing at least sensor signals with time scales on which the beginning and end of the question are marked.

[0096] Далее на этапе 102 получают дополнительные данные, содержащие по меньшей мере возраст проверяемого, пол, должностную информацию.[0096] Next, at step 102, additional data is obtained containing at least the age of the person being checked, gender, job information.

[0097] Далее на этапе 103 осуществляют обработку полученных сигналов и дополнительных данных с помощью модели машинного обучения (МО), причем в ходе указанной обработки осуществляется:[0097] Next, at step 103, the received signals and additional data are processed using a machine learning (ML) model, and during the said processing:

• determining the time intervals for extracting variables based on the timestamps of the beginning and end of the question and the timestamp of the answer, and based on the type and topic of the question. At this step, depending on the logic of the model design, timestamps are selected only for those questions (and answers) whose physiological data will be used in training and testing the model. For example, the universal model uses questions on all topics, so all timestamps are selected;

• extracting variables from each signal over the determined time intervals. At this step, the physiological signals over the previously selected intervals are retrieved from the database;

• processing the variables obtained from the signals, which comprises normalizing and concatenating the processed variables and constructing a vector from them. At this step, each value of a numerical variable is mapped to a value between 0 and 1;

• dividing the additional data into categorical and numerical variables;

• processing the variables obtained from the additional data, which comprises vectorizing the categorical variables and normalizing the numerical variables. At this step, each value of a numerical variable is mapped to a value between 0 and 1;

• concatenating the processed variables extracted from each signal with the processed additional variables and constructing a vector from them. At this step, all the selected data relevant to a specific conclusion of the polygraph examiner are combined into a single vector that is equally applicable to training the model (when the polygraph examiner's conclusion is available) and to the model's decision (detected / not detected);

• feeding said vector into the ML model to obtain the output value of the ML model. At this step, a model score in the range from 0 to 1 is obtained;

• comparing the output value of the model with a specified threshold value.

[0098] Finally, at step 104, the answer is determined to be a lie if the output value is greater than or equal to the threshold value, or to be the truth if the output value is below the threshold value.
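Steps 103 and 104 can be illustrated with a minimal sketch. Several things here are assumptions that the description does not specify: min-max scaling as the normalization, one-hot encoding as the vectorization of categorical variables, the value bounds, the function names, and `toy_model`, which is only a stand-in for a trained ML model.

```python
def min_max_normalize(value, lower, upper):
    """Scale a value into [0, 1]; the bounds are assumed to be known in advance."""
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))


def one_hot(value, categories):
    """Vectorize a categorical value as a one-hot list (an assumed encoding)."""
    return [1.0 if value == category else 0.0 for category in categories]


def build_vector(signal_variables, age, gender):
    """Concatenate normalized signal variables with processed additional data."""
    # Each signal variable is a (value, lower_bound, upper_bound) triple.
    signal_part = [min_max_normalize(v, lo, hi) for v, lo, hi in signal_variables]
    numeric_part = [min_max_normalize(age, 18, 62)]  # hypothetical age bounds
    categorical_part = one_hot(gender, ("f", "m"))
    return signal_part + numeric_part + categorical_part


def toy_model(vector):
    """Stand-in for a trained model: returns the mean of the vector as a score."""
    return sum(vector) / len(vector)


def decide(score, threshold):
    """Step 104: a score at or above the threshold is classified as a lie."""
    return "lie" if score >= threshold else "truth"


vector = build_vector([(2.0, 0.0, 4.0), (90.0, 40.0, 140.0)], age=40, gender="f")
score = toy_model(vector)
verdict = decide(score, threshold=0.5)
```

Note that the comparison is inclusive: a score exactly equal to the threshold is classified as a lie, as stated for step 104.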

[0099] In one particular embodiment of the method, the ML model classifies an answer as truthful or false not from a single received answer but from the set of questions and answers related to the topic of the polygraph test; in this case, the variables obtained from the signals are either combined into a single vector or averaged before being passed to the ML model.
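The two aggregation options named in this embodiment, combining into a single vector or averaging, can be sketched as below. The function names and the example values are hypothetical, and element-wise averaging is an assumed reading of "averaged".

```python
def concatenate_questions(per_question_variables):
    """Combine the variables of all questions on a topic into one long vector."""
    return [value for question in per_question_variables for value in question]


def average_questions(per_question_variables):
    """Average the variables element-wise across all questions on a topic."""
    if not per_question_variables:
        raise ValueError("at least one question is required")
    length = len(per_question_variables[0])
    if any(len(question) != length for question in per_question_variables):
        raise ValueError("all questions must yield the same number of variables")
    count = len(per_question_variables)
    return [sum(column) / count for column in zip(*per_question_variables)]


# Hypothetical normalized variables for two questions on the same topic.
topic_variables = [[0.25, 0.75], [0.75, 0.25]]
combined = concatenate_questions(topic_variables)
averaged_vector = average_questions(topic_variables)
```

Concatenation preserves per-question detail but makes the vector length depend on the number of questions, while averaging keeps the length fixed.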

[0100] In another particular embodiment of the method, the ML model is of the gradient boosting, random forest, or neural network type.

[0101] In another particular embodiment of the method, the ML model is trained on one of the test topics or a combination thereof, where the test topics are: narcotic substances, receipt of additional remuneration, disclosure of confidential information, debt obligations, third-party income, criminal offenses, administrative offenses, and violations of internal regulatory documents (IRD).

[0102] In another particular embodiment of the method, the polygraph test records contain sensor signals including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal signals, pupil movement, or combinations thereof.

[0103] In another particular embodiment of the method, the additional data contain at least one of: polygraph examiner identification numbers, polygraph identification numbers, information about weather conditions, results of electroencephalography, magnetic resonance imaging, or functional near-infrared spectroscopy, information about geomagnetic storms, or combinations thereof.

[0104] Implementing this technical solution on the basis of a single ML model makes it possible to automate polygraph testing and to detect concealment of information with high accuracy.

[0105] As shown in Fig. 2, in another particular embodiment, the automatic polygraph testing method (200) is performed using a computing system containing at least two machine learning models and consists of several interrelated steps.

[0106] At step 201, polygraph test records are obtained containing at least sensor signals with time scales on which the beginning and end of the question are marked.

[0107] Next, at step 202, additional data are obtained containing at least the age, gender, and job information of the person being tested.

[0108] Next, at step 203, the received signals are processed using the first machine learning (ML) model, and this processing comprises:

• feeding said vector into the 1st ML model to obtain the output value of the 1st ML model. At this step, a model score in the range from 0 to 1 is obtained;

• passing the output value of the 1st ML model to the input of the 2nd ML model.

[0109] Next, at step 204, the output value of the 1st ML model and the additional data are processed using the second machine learning (ML) model, and this processing comprises:

• concatenating the processed additional variables with the output value of the 1st ML model and constructing a vector from them. At this step, all the selected data relevant to a specific conclusion of the polygraph examiner are combined into a single vector that is equally applicable to training the model (when the polygraph examiner's conclusion is available) and to the model's decision (detected / not detected);

• feeding the vector into the 2nd ML model to obtain the output value of the 2nd ML model. At this step, a model score in the range from 0 to 1 is obtained;

• comparing the output value of the 2nd model with a specified threshold value.

[0110] Finally, at step 205, the answer is determined to be a lie if the output value is greater than or equal to the threshold value, or to be the truth if the output value is below the threshold value.
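The two-level arrangement of steps 203 to 205 can be sketched as follows. The models are passed in as plain callables, and `mean_model` is only a stand-in for a trained model; the function names and the order in which the first model's output is appended to the additional variables are assumptions.

```python
def mean_model(vector):
    """Stand-in for a trained model: returns the mean of its inputs as a score."""
    return sum(vector) / len(vector)


def two_level_score(signal_vector, additional_vector, first_model, second_model):
    """Score the signals with the 1st model, then feed that score to the 2nd model."""
    first_output = first_model(signal_vector)
    # The 2nd model sees the processed additional variables plus the 1st model's score.
    second_input = list(additional_vector) + [first_output]
    return second_model(second_input)


def decide(score, threshold):
    """Step 205: a score at or above the threshold is classified as a lie."""
    return "lie" if score >= threshold else "truth"


# Hypothetical, already normalized inputs.
signal_vector = [0.5, 1.0]
additional_vector = [0.25, 1.0, 0.0]
final_score = two_level_score(signal_vector, additional_vector, mean_model, mean_model)
verdict = decide(final_score, threshold=0.5)
```

Keeping the physiological signals in the first model and the non-physiological data in the second means the second model works with a compact summary of the signals rather than the raw variables.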

[0111] In one particular embodiment of the method, the ML models classify an answer as truthful or false not from a single received answer but from the set of questions and answers related to the topic of the polygraph test; in this case, the variables obtained from the signals are either combined into a single vector or averaged before being passed to the ML models.

[0112] In another particular embodiment of the method, the 1st ML model is of the gradient boosting, random forest, or neural network type.

[0113] In another particular embodiment of the method, the 2nd ML model is of the gradient boosting, random forest, or neural network type.

[0114] In another particular embodiment of the method, the ML models are trained on one of the test topics or a combination thereof, where the test topics are: narcotic substances, receipt of additional remuneration, disclosure of confidential information, debt obligations, third-party income, criminal offenses, administrative offenses, and violations of internal regulatory documents (IRD).

[0115] In another particular embodiment of the method, the polygraph test records contain sensor signals including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal signals, pupil movement, or combinations thereof.

[0116] In another particular embodiment of the method, the additional data contain at least one of: polygraph examiner identification numbers, polygraph identification numbers, information about weather conditions, results of electroencephalography, magnetic resonance imaging, or functional near-infrared spectroscopy, information about geomagnetic storms, or combinations thereof.

[0117] Implementing this technical solution on the basis of two ML models makes it possible to automate polygraph testing and to detect concealment of information with high accuracy. This implementation improves the accuracy of detecting concealment of information because the physiological data are first processed by the first-level model, and the second-level model then uses the aggregates produced by the first model (together with the additional non-physiological data).

[0118] As shown in Fig. 3, in another particular embodiment, the automatic polygraph testing method (300) is performed using a computing system containing at least two ensembles of machine learning models and consists of several interrelated steps.

[0119] At step 301, polygraph test records are obtained containing at least sensor signals with time scales on which the beginning and end of the question are marked.

[0120] Next, at step 302, additional data are obtained containing at least the age, gender, and job information of the person being tested.

[0121] Next, at step 303, the received signals are processed using the first ensemble of ML models, trained on one topic, and this processing comprises:

• determining the time intervals for extracting variables based on the timestamps of the beginning and end of the question and the timestamp of the answer, and based on the type and topic of the question. At this step, depending on the logic of the model design, timestamps are selected only for those questions (and answers) whose physiological data will be used in training and testing the model. For example, the universal model uses questions on all topics, so all timestamps are selected;

• extracting variables from each signal over the determined time intervals. At this step, the physiological signals over the previously selected intervals are retrieved from the database;

• processing the variables obtained from the signals, which comprises normalizing and concatenating the processed variables and constructing a vector from them. At this step, each value of a numerical variable is mapped to a value between 0 and 1;

• feeding said vector into the 1st ML model to obtain the output value of the 1st ML model. At this step, a model score in the range from 0 to 1 is obtained;

• passing the output value of the 1st ML model to the input of the 2nd ML model;

• dividing the additional data into categorical and numerical variables;

• processing the variables obtained from the additional data, which comprises vectorizing the categorical variables and normalizing the numerical variables. At this step, each value of a numerical variable is mapped to a value between 0 and 1;

• concatenating the processed additional variables with the output value of the 1st ML model and constructing a vector from them. At this step, all the selected data relevant to a specific conclusion of the polygraph examiner are combined into a single vector that is equally applicable to training the model (when the polygraph examiner's conclusion is available) and to the model's decision (detected / not detected);

• feeding said vector into the 2nd ML model to obtain the output value of the 2nd ML model. At this step, a model score in the range from 0 to 1 is obtained;

• feeding the output value of the 2nd ML model into the third ML model to form the output value of the first ensemble.

[0122] Next, at step 304, the received signals are processed using the second ensemble of ML models, trained on a combination of topics, and this processing comprises:

• passing the output value of the 1st ML model to the input of the 2nd ML model;

• feeding the output value of the 2nd ML model into the third ML model to form the output value of the second ensemble.

[0123] Next, at step 305, the output values of the first and second ML ensembles are processed using the third ML model, and this processing comprises:

• concatenating the processed output values of the first and second ensembles and constructing a vector from them;

• feeding said vector into the 3rd ML model to obtain the output value of the 3rd ML model. At this step, a model score in the range from 0 to 1 is obtained;

• comparing the output value of the 3rd model with a specified threshold value.

[0124] Finally, at step 306, the answer is determined to be a lie if the output value is greater than or equal to the threshold value, or to be the truth if the output value is below the threshold value.
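Steps 305 and 306 can be sketched as below, using averaging as the top-level model, which is one of the types the description allows for the third ML model. The function names, the example ensemble outputs, and the threshold of 0.6 are hypothetical.

```python
def averaging_model(vector):
    """Top-level model implemented as a plain average of the ensemble outputs."""
    return sum(vector) / len(vector)


def top_level_decision(first_ensemble_output, second_ensemble_output,
                       threshold, third_model=averaging_model):
    """Concatenate both ensemble outputs, score them, and apply the threshold."""
    vector = [first_ensemble_output, second_ensemble_output]
    score = third_model(vector)
    verdict = "lie" if score >= threshold else "truth"
    return score, verdict


# Hypothetical outputs of the single-topic and combined-topic ensembles.
score_a, verdict_a = top_level_decision(0.75, 0.5, threshold=0.6)
score_b, verdict_b = top_level_decision(0.25, 0.5, threshold=0.6)
```

Because `third_model` is a parameter, a trained logistic regression or tree-based model exposed as a callable could be substituted without changing the decision step.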

[0125] In one particular embodiment of the method, the ML models are trained on one of the test topics or a combination thereof, where the test topics are: narcotic substances, receipt of additional remuneration, disclosure of confidential information, debt obligations, third-party income, criminal offenses, administrative offenses, and violations of internal regulatory documents (IRD).

[0126] In another particular embodiment of the method, the ML models of the first and second ensembles are of a type selected from the group: gradient boosting, random forest, or neural network.

[0127] In another particular embodiment of the method, the third ML model is of a type selected from the group: logistic regression, random forest, gradient boosting, or averaging.

[0128] In another particular embodiment of the method, the polygraph test records contain sensor signals including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal signals, pupil movement, or combinations thereof.

[0129] In another particular embodiment of the method, the additional data contain at least one of: polygraph examiner identification numbers, polygraph identification numbers, information about weather conditions, results of electroencephalography, magnetic resonance imaging, or functional near-infrared spectroscopy, information about geomagnetic storms, or combinations thereof.

[0130] Implementing this technical solution on the basis of two ensembles of ML models makes it possible to automate polygraph testing and to detect concealment of information with high accuracy. This implementation improves the accuracy of detecting concealment of information through the architectural differences between the two ensembles and through the addition of a top-level model.

[0131] As shown in Fig. 3, in another particular embodiment, the automatic polygraph testing method (400) is performed using a computing system containing at least three ensembles of machine learning models and consists of several interrelated steps.

[0132] At step 401, polygraph test records are obtained containing at least sensor signals with time scales on which the beginning and end of the question are marked.

[0133] Next, at step 402, additional data are obtained containing at least the age, gender, and job information of the person being tested.

[0134] Next, at step 403, the received signals are processed using the first ensemble of ML models, trained on one topic, and this processing comprises:

• определение временных интервалов для извлечения переменных на основе временных меток начала и конца вопроса и временной метки ответа, и на основе типа и темы вопроса. На данном этапе, в зависимости от логики построения модели, выделяются временные метки только тех вопросов (и ответов), физиологические данные которых будут участвовать в обучении и тестировании модели. Например, в универсальной модели используются вопросы по всем темам, поэтому будут выбраны все метки.• defining time intervals for retrieving variables based on the start and end timestamps of the question and the answer timestamp, and based on the question type and topic. At this stage, depending on the logic of building the model, time stamps are allocated only for those questions (and answers) whose physiological data will be involved in training and testing the model. For example, the generic model uses questions on all topics, so all labels will be selected.

• извлечение переменных из каждого сигнала на определенных временных интервалах. На данном этапе из базы данных извлекаются физиологические сигналы на выбранных ранее интервалах.• extracting variables from each signal at certain time intervals. At this stage, physiological signals at previously selected intervals are extracted from the database.

• обработка полученных переменных из сигналов, при которой выполняется нормализация и конкатенация обработанных переменных и построение на их основе вектора. На данном этапе, каждое значение численной переменной становится равным значению от 0 до 1.• processing of obtained variables from signals, which involves normalization and concatenation of processed variables and construction of a vector based on them. At this stage, each value of a numerical variable becomes equal to a value between 0 and 1.

• feeding said vector to the 1st ML model to obtain the output value of the 1st ML model. At this stage, a model score in the range from 0 to 1 is obtained.

• passing the output value of the 1st ML model to the input of the 2nd ML model.

• dividing the additional data into categorical and numerical variables.

• processing the variables obtained from the additional data, which involves vectorizing the categorical variables and normalizing the numerical variables. At this stage, each value of a numerical variable is mapped to a value between 0 and 1.

• concatenating the processed additional variables together with the output value of the 1st ML model, and constructing a vector from them. At this stage, all the data selected as relevant to a specific conclusion of the polygraph examiner are combined into a single vector, which is equally suitable for training the model (when the polygraph examiner's conclusion is available) and for the model's decision (detected / not detected).

• feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model. At this stage, a model score in the range from 0 to 1 is obtained.
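The chain of operations listed for step 403 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the window rule in `extraction_window`, the two extracted variables (mean and range), the normalization bounds, and `toy_model` (a logistic function standing in for a trained gradient boosting, random forest or neural network model) are all hypothetical, and the signal samples are synthetic.

```python
from dataclasses import dataclass
from math import exp
from statistics import mean


@dataclass
class Question:
    start: float   # timestamp of the start of the question, seconds
    end: float     # timestamp of the end of the question, seconds
    answer: float  # timestamp of the answer, seconds
    topic: str


def extraction_window(q, tail=10.0):
    """Hypothetical rule: from question start to `tail` seconds after the answer."""
    return q.start, q.answer + tail


def extract_variables(samples, window):
    """Mean and range of one sensor channel inside the window."""
    lo, hi = window
    values = [v for t, v in samples if lo <= t <= hi]
    return [mean(values), max(values) - min(values)]


def min_max(x, lo, hi):
    """Scale x into [0, 1]; lo and hi are assumed to be fitted on training data."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def one_hot(value, categories):
    """Vectorize a categorical variable."""
    return [1.0 if value == c else 0.0 for c in categories]


def toy_model(vector, weights, bias=0.0):
    """Stand-in for a trained ML model: returns a score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + exp(-z))


# Stage 1: signal variables -> normalized vector -> score of the 1st model.
q = Question(start=2.0, end=4.0, answer=5.0, topic="debt obligations")
gsr = [(float(t), 0.1 * t) for t in range(20)]   # synthetic GSR samples
hr = [(float(t), 60.0 + t) for t in range(20)]   # synthetic heart-rate samples
window = extraction_window(q)
raw = extract_variables(gsr, window) + extract_variables(hr, window)
bounds = [(0.0, 2.0), (0.0, 2.0), (40.0, 120.0), (0.0, 40.0)]  # assumed ranges
vector_1 = [min_max(x, lo, hi) for x, (lo, hi) in zip(raw, bounds)]
score_1 = toy_model(vector_1, weights=[1.0, 0.5, -0.5, 1.0])

# Stage 2: additional data + score of the 1st model -> score of the 2nd model.
age = min_max(35, 18, 70)
gender = one_hot("f", ["f", "m"])
vector_2 = [age] + gender + [score_1]
score_2 = toy_model(vector_2, weights=[0.2, 0.1, -0.1, 2.0], bias=-1.0)
```

The same vector layout serves both training (when the examiner's conclusion is available as a label) and inference, which is why the normalization bounds and category lists are fixed ahead of time in this sketch.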

[0135] Next, at step 404, the received signals are processed using the second ensemble of ML models, trained on a combination of topics. This processing includes:

• processing the variables obtained from the signals, which involves normalizing them, concatenating the processed variables, and constructing a vector from them. At this stage, each value of a numerical variable is mapped to a value between 0 and 1.

• feeding said vector to the 1st ML model to obtain the output value of the 1st ML model. At this stage, a model score in the range from 0 to 1 is obtained.

[0136] Next, at step 405, the received signals are processed using the third ensemble of ML models, trained on a combination of topics. This processing includes:

signal processing by the first ML model, during which the following is performed:

• feeding said vector to the 1st ML model to obtain the output value of the 1st ML model. At this stage, a model score in the range from 0 to 1 is obtained.

• concatenating the processed additional variables together with the output value of the 1st ML model, and constructing a vector from them. At this stage, all the data selected as relevant to a specific conclusion of the polygraph examiner are combined into a single vector, which is equally suitable for training the model (when the polygraph examiner's conclusion is available) and for the model's decision (detected / not detected).

• feeding the output value of the 2nd ML model to the third ML model to form the output value of the third ensemble.

[0137] Next, at step 406, the third ML model is used to process the output values of the first, second and third ensembles of ML models. This processing includes:

concatenating the processed output values of the first, second and third ensembles and constructing a vector from them. At this stage, each value of a numerical variable is mapped to a value between 0 and 1.

feeding said vector to the 3rd ML model to obtain the output value of the 3rd ML model. At this stage, a model score in the range from 0 to 1 is obtained.
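As a rough illustration of step 406, the three ensemble scores can be concatenated into one vector and passed to a top-level model. The sketch below assumes simple averaging as that model (one of the types the description lists for the third ML model); the function names and the example scores are hypothetical.

```python
def clip01(x):
    """Keep a score inside [0, 1] before it enters the top-level vector."""
    return min(1.0, max(0.0, x))


def top_level_vector(score_1, score_2, score_3):
    """Concatenate the outputs of the three ensembles into one vector."""
    return [clip01(score_1), clip01(score_2), clip01(score_3)]


def averaging_model(vector):
    """Top-level model, assumed here to be a plain average of the scores."""
    return sum(vector) / len(vector)


vector = top_level_vector(0.82, 0.64, 0.70)  # illustrative ensemble scores
final_score = averaging_model(vector)
```

A trained logistic regression, random forest or gradient boosting model could take the place of `averaging_model` without changing the vector construction.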

[0138] Finally, at step 407, the answer is determined to be a lie if the output value is greater than or equal to the threshold value, or to be the truth if the output value is below the threshold value.
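The decision rule of step 407 amounts to a single comparison. In the sketch below the function name and the default threshold of 0.5 are assumptions made for illustration; the description does not state a specific threshold value.

```python
def classify_answer(score: float, threshold: float = 0.5) -> str:
    """Return "lie" when the score reaches the threshold, otherwise "truth"."""
    return "lie" if score >= threshold else "truth"
```

Note that a score exactly equal to the threshold is classified as a lie, matching the "greater than or equal to" wording.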

[0139] In one particular embodiment of the method, the ML models are trained on one of the test topics or a combination thereof, where the test topics are: narcotic substances, receipt of additional remuneration, disclosure of confidential information, debt obligations, outside income, criminal offenses, administrative offenses, and violations of internal regulatory documents (IRD).

[0140] In another particular embodiment of the method, the ML models of the first, second and third ensembles are of a type selected from the group: gradient boosting, random forest, or neural network.

[0141] In another particular embodiment of the method, the third ML model is of a type selected from the group: logistic regression, random forest, gradient boosting, or averaging.

[0142] In another particular embodiment of the method, the polygraph test records contain sensor signals including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal signals, pupil movement, or combinations thereof.

[0143] In another particular embodiment of the method, the additional data contain at least one of: the identification number of the polygraph examiner, the identification number of the polygraph, information about weather conditions, results of electroencephalography, magnetic resonance imaging, or functional near-infrared spectroscopy, information about geomagnetic storms, or combinations thereof.

[0144] Implementing this technical solution on the basis of three ensembles of ML models makes it possible to automate polygraph testing and to detect concealment of information with high accuracy. The accuracy of detecting concealed information is improved by the architectural differences between the three ensembles and by the addition of a top-level model.

[0145] FIG. 11 shows an example of a general view of a computing system (500) that implements the claimed method or is part of a computer system, for example a server, a personal computer, or part of a computing cluster, that processes the data required to implement the claimed technical solution.

[0146] In general, the system (500) comprises, connected by a common data exchange bus, one or more processors (501), memory such as RAM (502) and ROM (503), input/output interfaces (504), input/output devices (505), and a network communication device (506).

[0147] The processor (501) (or multiple processors, a multi-core processor, etc.) may be selected from the range of devices widely used today, for example from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTEK™, Qualcomm Snapdragon™, etc. The processor, or one of the processors used in the system (500), should also be understood to include a graphics processor, for example an NVIDIA or Graphcore GPU, which is likewise suitable for fully or partially performing the method and can be used to train and apply machine learning models in various information systems.

[0148] The RAM (502) is random access memory intended to store machine-readable instructions executed by the processor (501) to perform the required logical data processing operations. The RAM (502) typically contains executable instructions of the operating system and of the relevant software components (applications, program modules, etc.). The available memory of a graphics card or graphics processor may also serve as the RAM (502).

[0149] The ROM (503) is one or more permanent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0150] Various types of I/O interfaces (504) are used to organize the operation of the components of the system (500) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/Audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0151] Various I/O devices (505) are used to enable user interaction with the computing system (500), for example a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality devices, optical sensors, a tablet, light indicators, a projector, a camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

[0152] The network communication device (506) provides data transmission over an internal or external computer network, for example an intranet, the Internet, a LAN, etc. One or more devices (506) may be, without limitation: an Ethernet card, a GSM modem, a GPRS modem, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Bluetooth and/or BLE module, a Wi-Fi module, etc.

[0153] The application materials presented here disclose preferred embodiments of the technical solution and should not be construed as limiting other, particular embodiments that do not go beyond the scope of the legal protection sought and that are obvious to those skilled in the relevant art.

Information sources:

[1] M. Harris, "The Lie Generator: Inside the Black Mirror World of Polygraph Job Screenings," in Wired.com, 2018.

[2] B. Banerjee and G. Chatterjee, "The world of lie detection: a study into state of lie detection usage by state and society in Asia, Africa and Europe," 2021.

[3] S. E. Fienberg, J. J. Blascovich, J. T. Cacioppo, R. J. Davidson, P. Ekman, D. L. Faigman et al., "The polygraph and lie detection," in National Research Council, The National Academies Press, Washington, DC, 2003.

[4] A. Slavkovic, "Evaluating polygraph data," in Carnegie Mellon University, 2002.

[5] J. Synnott, D. Dietzel and M. Ioannou, "The Polygraph: History, Methodology and Current Status," in Reviewing Crime Psychology, 2020.

[6] G. Krishnamurthy, N. Majumder, S. Poria and E. Cambria, "A Deep Learning Approach for Multimodal Deception Detection," in 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), Hanoi, Vietnam, 2018.

[7] D. Avola, L. Cinque, G. L. Foresti and D. Pannone, "Automatic Deception Detection in RGB videos using Facial Action Units," in ICDSC 2019: Proceedings of the 13th International Conference on Distributed Smart Cameras, Trento, 2019.

[8] N. Samadiani, G. Huang, B. Cai, W. Luo, C.-H. Chi, Y. Xiang and J. He, "A Review on Automatic Facial Expression Recognition Systems Assisted by Multimodal Sensor Data," in Sensors, Vol. 19, Issue 8, Sp. Issue Sensor Applications on Face Analysis, 2019.

[9] A.K. Webb, C.R. Honts, J.C. Kircher, P. Bernhardt and A.E. Cook, "Effectiveness of pupil diameter in a probable-lie comparison question test for deception," in Legal and Criminological Psychology, Vol. 14, Issue 2, 2010.

[10] J.J. Walczyk, K.T. Mahoney, D. Doverspike and D.A. Griffith-Ross, "Cognitive Lie Detection: Response Time and Consistency of Answers as Cues to Deception," in Journal of Business and Psychology, Vol. 24, 2009.

[11] A. Shuster, L. Inzelberg, O. Ossmy, L. Izakson, Y. Hanein and D. Levy, "Lie to my face: An electromyography approach to the study of deceptive behavior," in Brain and Behavior, Vol. 11, Issue 12, 2021.

[12] V. Abootalebi, M.H. Moradi and M.A. Khalilzadeh, "A new approach for EEG feature extraction in P300-based lie detection," in Computer Methods and Programs in Biomedicine, Vol. 94, Issue 1, 2009.

[13] A. Kozel, K. Johnson, Q. Mu, E. Grenesko, S. Laken and M. George, "Detecting Deception Using Functional Magnetic Resonance Imaging," in Biological Psychiatry, Vol. 58, Issue 8, 2005.

[14] M. J. Farah, J. B. Hutchinson, E. A. Phelps and A. D. Wagner, "Functional MRI-based lie detection: scientific and societal challenges," in Nature Reviews Neuroscience, Vol. 14, 2014.

[15] M. Monaro, C. Galante, R. Spolaor, Q.Q. Li, L. Gamberini, M. Conti and G. Sartori, "Covert lie detection using keyboard dynamics," in Nature, Scientific Reports, Vol. 8, 2018.

[16] L. Sousedikova, M. Hromada and M. Adamek, "Analysis of Artificial Intelligence Lie Detector Developed for Airport Security," in Tomas Bata University in Zlin, 2021.

[17] and L. Dencik, "The politics of deceptive borders: 'biomarkers of deceit' and the case of iBorderCtrl," in Information, Communication & Society, Vol. 25, Issue 3, 2022.

[18] F. Liberatore, J. Camacho-Collados and M. Camacho-Collados, "Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police," in Knowledge-Based Systems, Vol. 749, 2018.

[19] "Police use a computer to expose false testimony. A lie-detection system being used by Spanish police highlights concerns about algorithms," in Nature, Editorial, 2018.

[20] G. Ben-Shakhar and W. Iacono, "Fallacies in the estimation of the validity of the Comparison Question Polygraph Test: A reply to Ginton (2020)," in Investigative Psychology and Offender Profiling, Vol. 18, Issue 3, 2021.

[21] D. Grubin and L. Madsen, "Lie detection and the polygraph: A historical review," in The Journal of Forensic Psychiatry & Psychology, 2005.

[22] C. Hinkle, "The Modern Lie Detector: AI-Powered Affect Screening and the Employee Polygraph Protection Act (EPPA)," in The Georgetown Law Journal, Vol. 109, Georgetown, 2021.

[23] J. Bittle, "Lie detectors have always been suspect. AI has made the problem worse," in MIT Technology Review, 2020.

[24] L. Saxe, "Science and the CQT polygraph - A theoretical critique," in Integrative Physiological and Behavioral Science, 1991.

[25] A. M. Perkey, "Recommendations for uniform polygraph examinations for preemployment screening of law enforcement applicants," in University of Wisconsin-Platteville, 2021.

[26] W. Egerton, "Use of the Polygraph to Screen Police Candidates," in Law Enforcement Management Institute of Texas (LEMIT), North Richland Hills, Texas, 2020.

[27] D. Baur, "Federal Psychophysiological Detection of Deception Examiner Handbook," in Counterintelligence Field Activity Technical Manual, 2006.

[28] J. Matzka, O. Bronkalla, K. Tornow, K. Elger and C. Stolle, "Geomagnetic Kp index V. 1.0," in GFZ Data Services, https://doi.org/10.5880/Kp.0001, Potsdam, Germany, 2021.

[29] C.R. Honts and S. Amato, "Automation of a screening polygraph test increases accuracy," in Psychology, Crime & Law, Vol. 13, Issue 2, 2007.

[30] A. Mambreyan, E. Punskaya and H. Gunes, "Dataset Bias in Deception Detection," in 26th International Conference on Pattern Recognition, Montreal Quebec, 2022.

[31] M. Abouelenien, R. Mihalcea and M. Burzo, "Detecting Deceptive Behavior via Integration of Discriminative Features from Multiple Modalities," in IEEE Transactions on Information Forensics and Security, Vol. 12, Issue 5, 2016.

[32] Interfax, "Bill on possible ban on transfer abroad of Russians' personal data being submitted to State Duma," in https://interfax.com/newsroom/top-stories/77833/, 2022.

[33] M. Handler and N. Hernandez, "Introduction to the NCCA ASCII Standard," in Polygraph & Forensic Credibility Assessment: A Journal of Science and Field Practice, 2019.

Claims

1. A computer-implemented method for automatic polygraph testing of a subject using machine learning methods, comprising steps in which:

- obtain records of polygraph tests containing at least sensor signals with time scales on which the beginning and end of the question are marked;

- receive additional data containing at least the age of the person being checked, gender, job information;

- process the received signals using the first ensemble of ML models trained on one topic, and during this processing the following is carried out:

signal processing by the first ML model, during which the following is performed:

• defining time intervals for retrieving variables based on the start and end timestamps of the question and the answer timestamp and based on the question type and topic;

• extracting variables from each signal at certain time intervals;

• processing of obtained variables from signals, which involves normalization and concatenation of processed variables and construction of a vector based on them;

• feeding said vector to the 1st ML model to obtain the output value of the 1st ML model;

• passing the output value of the 1st ML model to the input of the 2nd ML model;

processing, by the second ML model, of the output value of the 1st ML model and the additional data, during which the following is performed:

• dividing additional data into categorical and numerical variables;

• processing of obtained variables from additional data, which involves vectorization of categorical variables and normalization of numerical variables;

• concatenation of the processed additional variables together with the output value of the 1st ML model, and construction of a vector based on them;

• feeding said vector to the 2nd ML model to obtain the output value of the 2nd ML model;

• feeding the output value of the 2nd ML model to the third ML model to form the output value of the first ensemble;

- process the received signals using a second ensemble of ML models trained on a combination of topics, and during this processing the following is carried out:

• extracting variables from each signal at the determined time intervals;

• dividing additional data into categorical and numerical variables;

• feeding the output value of the 2nd ML model to the third ML model to form the output value of the second ensemble;

- process the received signals using a third ensemble of machine learning models trained on a combination of topics, and during this processing the following is carried out:

• extracting variables from each signal at the determined time intervals;

• dividing additional data into categorical and numerical variables;

• feeding the output value of the 2nd ML model to the third ML model to form the output value of the third ensemble;

- using the third ML model, the output values of the first, second and third ML ensembles are processed, and during this processing the following is carried out:

concatenation of the processed output values of the first, second and third ensembles and construction of a vector based on them;

feeding said vector to the 3rd ML model to obtain the output value of the 3rd ML model;

comparison of the output value of the 3rd ML model with a given threshold value; and

- determine that the response is false if the output value is greater than or equal to the threshold value, or the response is true if the output value is below the threshold value.
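
For illustration only, the signal-side steps of claim 1 (determining a time interval from the question and answer timestamps, extracting variables, normalizing and concatenating them into a vector) might be sketched as follows. The function names, the fixed tail after the answer, the two variables (mean and range), and the reference statistics are assumptions made for this sketch; the claim does not specify them.

```python
from statistics import mean


def extraction_window(question_start, question_end, answer_time, tail_s=10.0):
    """Return (start, end) in seconds, measured from the start of the record.

    Assumption: the window runs from question onset to a fixed tail after
    the later of the question end and the answer. The claim also allows the
    interval to depend on question type and topic, which is omitted here.
    """
    return question_start, max(question_end, answer_time) + tail_s


def extract_variables(samples, rate_hz, window):
    """Compute two illustrative variables (mean, range) inside the window."""
    start, end = window
    lo = max(0, int(start * rate_hz))
    hi = min(len(samples), int(end * rate_hz))
    segment = samples[lo:hi]
    if not segment:
        raise ValueError("window does not overlap the signal")
    return [mean(segment), max(segment) - min(segment)]


def build_signal_vector(signals, rate_hz, window, ref_stats):
    """Normalize the variables of every signal and concatenate them.

    ref_stats maps a signal name to (means, stds) for its variables; these
    are hypothetical statistics assumed to come from training data, with
    nonzero standard deviations.
    """
    vector = []
    for name in sorted(signals):  # fixed order keeps the vector layout stable
        raw = extract_variables(signals[name], rate_hz, window)
        means, stds = ref_stats[name]
        vector.extend((v - m) / s for v, m, s in zip(raw, means, stds))
    return vector
```

The resulting list would then be passed to whatever model plays the role of the 1st ML model; the model itself is deliberately left out of the sketch.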

2. The method according to claim 1, characterized in that the ML models are trained on one of the topics for checks or a combination thereof, where the topics for checks are: narcotic substances, receiving additional remuneration, disclosure of confidential information, debt obligations, third-party income, criminal offenses, administrative offenses, violations of internal regulatory documents (INR).

3. The method according to claim 1, characterized in that the ML models of the first, second and third ensembles have a type selected from the group: gradient boosting, random forest or neural network.

4. The method according to claim 1, characterized in that the third ML model has a type selected from the group: logistic regression, random forest, gradient boosting or averaging.
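
As a minimal sketch, assuming the third ML model is the "averaging" option named in claim 4, the final combination of the three ensemble outputs and the threshold rule of claim 1 could look like this. The function names and the default threshold of 0.5 are assumptions for illustration; the claims only require "a given threshold value".

```python
def combine_by_averaging(ensemble_outputs):
    """Average the output values of the ensembles (claim 4, 'averaging')."""
    if not ensemble_outputs:
        raise ValueError("at least one ensemble output is required")
    return sum(ensemble_outputs) / len(ensemble_outputs)


def classify_response(ensemble_outputs, threshold=0.5):
    """Apply the decision rule of claim 1 to the combined output value.

    The response is labeled "false" when the combined value is greater than
    or equal to the threshold, and "true" when it is below the threshold.
    """
    score = combine_by_averaging(ensemble_outputs)
    return "false" if score >= threshold else "true"
```

With a logistic regression, random forest or gradient boosting combiner, only `combine_by_averaging` would change; the threshold comparison stays the same.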

5. The method according to claim 1, characterized in that the polygraph test records contain signals from sensors including at least one of: heart rate (HR), galvanic skin response (GSR), blood pressure, upper and lower respiration, piezoplethysmogram, photoplethysmogram, thermal, pupil movements or combinations thereof.

6. The method according to claim 1, characterized in that the additional data contains at least one of: identification number of polygraph examiners, identification number of polygraphs, information about weather conditions, results of an electroencephalogram, magnetic resonance imaging, functional near-infrared spectroscopy, information about geomagnetic storms or combinations thereof.
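
The handling of additional data described in claim 1 (splitting into categorical and numerical variables, vectorizing the former, normalizing the latter, and concatenating the result with the output value of the 1st ML model) can be sketched as below. One-hot encoding, z-score normalization, the type-based split, and all names are assumptions chosen for the example; the claims do not prescribe a particular encoding.

```python
def split_additional_data(record):
    """Split a record into categorical and numerical variables by value type.

    Assumption: identifiers such as examiner or polygraph numbers are passed
    as strings so that they are treated as categorical.
    """
    categorical, numerical = {}, {}
    for key, value in record.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numerical[key] = float(value)
        else:
            categorical[key] = str(value)
    return categorical, numerical


def one_hot(value, vocabulary):
    """Vectorize one categorical value against a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_second_model_vector(first_model_output, record, vocabularies, num_stats):
    """Build the input vector for the 2nd ML model.

    vocabularies maps a categorical key to its list of known values, and
    num_stats maps a numerical key to (mean, std); both are hypothetical
    and assumed to be derived from training data.
    """
    categorical, numerical = split_additional_data(record)
    vector = []
    for key in sorted(categorical):
        vector.extend(one_hot(categorical[key], vocabularies[key]))
    for key in sorted(numerical):
        mu, sigma = num_stats[key]
        vector.append((numerical[key] - mu) / sigma)
    vector.append(first_model_output)  # output value of the 1st ML model
    return vector
```

Sorting the keys keeps the vector layout identical between training and inference, which any downstream model would depend on.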

7. A system for automatically checking a subject using a polygraph using machine learning methods, containing:

- at least one processor;

- at least one memory connected to the processor, which contains machine-readable instructions that, when executed by the at least one processor, enable execution of the method according to any one of claims 1-6.