EA043568B1

EA043568B1 - METHOD AND SYSTEM FOR DETECTING SYNTHETICALLY ALTERED IMAGES OF FACES IN VIDEO

Info

Publication number: EA043568B1
Application number: EA202192996
Authority: EA
Inventors: Кирилл Евгеньевич Вышегородцев; Александр Викторович Балашов; Григорий Алексеевич Вельможин; Валентин Валерьевич Сысоев
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2021-10-19
Filing date: 2021-11-30
Publication date: 2023-06-01

Description

Technical field

The invention relates to computer technologies for data processing, in particular to a method and system for detecting synthetically altered images of faces in video.

State of the art

Today, technologies that generate synthetic images and superimpose them on the faces of real people generally rely on machine learning algorithms, for example artificial neural networks (ANNs). Such approaches apply digital masks that imitate human faces. One example is the DeepFake technique, which is based on artificial intelligence and is used for image synthesis (see https://ru.wikipedia.org/wiki/Deepfake).

A method is known for recognizing synthetically altered images of human faces, in particular DeepFake images (Tolosana et al. DeepFakes Evolution: Analysis of Facial Regions and Fake Detection Performance // Biometrics and Data Pattern Analytics - BiDA Lab, Universidad Autonoma de Madrid. 2020), which is based on analyzing the segments that make up a face image. The analysis is performed by an ANN trained on real and synthetic images of human faces, in particular those of celebrities, and can be used to identify fake videos. The method analyzes facial segments and, on that basis, classifies the corresponding image as containing synthetic alterations or not.

The disadvantage of this approach is its low efficiency, because it does not use an integral score formed both from the geometric parameters of the face image and from the spatio-temporal characteristics of the person's face in the video. Another drawback is that some solutions cannot handle several people when more than one person is present in the video. Other known open solutions (https://www.kaggle.com/c/deepfakedetection-challenge, https://ai.facebook.com/datasets/dfdc/) handle this case by independently scoring each face image of each person in each analyzed video frame and then averaging all such scores. All such solutions show low efficiency when processing videos with several people.

Summary of the invention

The claimed method and system address the technical problem of efficiently and accurately detecting synthetic alterations of face images in video.

The technical result is an increase in the accuracy and efficiency of detecting synthetic alterations of images of human faces in video.

In a first preferred implementation of the invention, a computer-implemented method for detecting synthetically altered images of faces in video is provided, performed by a processor, in which:

a) at least one image is obtained from the video;

b) images of faces are detected in said image;

c) a vector representation of the geometric characteristics of the detected face images is calculated, using at least an algorithm that compares facial reference points, in order to determine the face images of at least one person;

d) using frame-by-frame analysis of the video, the spatio-temporal significance of each face image of each person in said image is calculated, defined as a vector representation of the spatial characteristic of the face, which characterizes the size of the face region relative to the frame, and a vector representation of the temporal characteristic of the face image, which characterizes the time during which the analyzed face image is displayed in the video frames;

e) a vector of probability estimates of synthetic alteration is calculated for the face images of a person, characterizing the presence of synthetic alterations in that person's face images in each frame;

f) an overall estimate of the probability of synthetic alteration is calculated from the vector representations of the spatial and temporal distributions and the vector of synthetic alteration estimates for the face images of each person in the video;

g) a final score is formed for the presence in the video of a synthetic alteration of the image of at least one face;

h) an integral score for the presence of a synthetically altered face image in the video is formed from at least one final model score, and a notification of the presence of a synthetically altered face in the video is generated.

In one particular implementation of the method, steps c) to h) are performed by a machine learning model or an ensemble of models, the model or ensemble being trained on a dataset containing synthesized images of human faces.
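Steps d) to g) can be pictured with a small hand-written aggregation. The sketch below is illustrative only: the text leaves these calculations to a trained model, so the weighting rule (face area times visibility), the `max` rule over people, and the function names `overall_score` and `video_score` are all assumptions made for the example.

```python
from typing import List


def overall_score(frame_scores: List[float],
                  spatial: List[float],
                  temporal: List[float]) -> float:
    """Weighted mean of per-frame alteration probabilities for one person.

    frame_scores[i]: estimated probability that the face in frame i is altered.
    spatial[i]: face area divided by frame area in frame i (0..1).
    temporal[i]: 1.0 if the face is visible in frame i, otherwise 0.0.
    The weighting rule is an assumption for illustration, not the trained model.
    """
    if not (len(frame_scores) == len(spatial) == len(temporal)):
        raise ValueError("all vectors must have the same length")
    weights = [s * t for s, t in zip(spatial, temporal)]
    total = sum(weights)
    if total == 0:
        return 0.0  # the face never appears, so there is nothing to score
    return sum(w * p for w, p in zip(weights, frame_scores)) / total


def video_score(per_person_scores: List[float]) -> float:
    """Final score for the video: the most suspicious person decides (assumption)."""
    return max(per_person_scores, default=0.0)
```

Weighting by face size and visibility means that a large face shown for many frames influences the result more than a small face that appears briefly, which is the intuition behind the spatio-temporal significance in step d).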

In another particular implementation of the method, the machine learning model uses an automatic label correction function, which corrects incorrect labeling of each face in the frames by comparing the face images in the synthesized video with their images in the source video.

In another particular implementation of the method, faces are compared on the basis of the vector proximity of the reference points that form the geometric characteristics of the original face image and of the synthesized image based on it.
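One possible reading of "vector proximity of reference points" is the cosine similarity between normalized landmark vectors. The sketch below assumes 2D landmarks given as `(x, y)` pairs in matching order; the normalization step and the names `normalize` and `landmark_similarity` are hypothetical and are not taken from the source.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize(landmarks: List[Point]) -> List[float]:
    """Center the landmarks on their mean and scale the result to unit length."""
    if not landmarks:
        raise ValueError("at least one landmark is required")
    cx = sum(x for x, _ in landmarks) / len(landmarks)
    cy = sum(y for _, y in landmarks) / len(landmarks)
    flat = [v for x, y in landmarks for v in (x - cx, y - cy)]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm else flat


def landmark_similarity(a: List[Point], b: List[Point]) -> float:
    """Cosine similarity of two landmark sets; values near 1.0 mean similar geometry."""
    if len(a) != len(b):
        raise ValueError("landmark sets must have the same number of points")
    va, vb = normalize(a), normalize(b)
    return sum(p * q for p, q in zip(va, vb))
```

Centering and scaling make the comparison insensitive to where the face sits in the frame and how large it is, so the same face geometry at a different position or scale still gives a similarity close to 1.0.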

In another particular implementation of the method, faces are compared by analyzing the coordinates of the regions of the original face image and of the synthesized face image.
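A common way to compare the coordinates of two face regions is intersection over union (IoU) of their bounding boxes. The source does not name a specific measure, so the use of IoU and the `(x1, y1, x2, y2)` box layout below are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in the range 0..1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A face in the synthesized frame could then be matched to the face in the source frame whose box gives the highest IoU, with a threshold chosen by the implementer.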

In another particular implementation of the method, the spatio-temporal significance is calculated as a common matrix built from the values of the vector representations, and the score for the presence of synthetic alterations in the face images of an individual person is produced by the machine learning model from that common matrix.

In another particular implementation of the method, the ensemble of machine learning models consists of a group of models, each of which is trained to detect a specific algorithm for generating synthetic images.

In another particular implementation of the method, the ensemble includes an integral classifier that receives as input the scores produced by the models in the ensemble.

In another particular implementation of the method, the overall score is calculated by the integral classifier.

In another particular implementation of the method, the algorithm used to generate the synthetic face image in the analyzed video stream is additionally determined.
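The ensemble, the integral classifier, and the identification of the probable generation algorithm can be sketched together. This is a minimal illustration under stated assumptions: the logistic combination, the weights, and the model names used as dictionary keys are hypothetical, and the source does not specify the form of the integral classifier.

```python
import math
from typing import Dict


def integral_classifier(scores: Dict[str, float],
                        weights: Dict[str, float],
                        bias: float = 0.0) -> float:
    """Combine per-model scores into one probability with a logistic function.

    scores: one score per ensemble model, keyed by a (hypothetical) model name.
    weights and bias would normally be learned; here they are supplied by the caller.
    """
    z = bias + sum(weights[name] * s for name, s in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def probable_algorithm(scores: Dict[str, float]) -> str:
    """Name of the model with the highest score.

    If each model is trained on one generation algorithm, this points to the
    algorithm that most likely produced the alteration.
    """
    return max(scores, key=scores.get)
```

For example, with scores `{"model_a": 0.3, "model_b": 0.9}`, `probable_algorithm` returns `"model_b"`, so the notification could name the generation algorithm that model B was trained on.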

In another particular implementation of the method, the video is an online video conference.

In another particular implementation of the method, when a synthetically altered face image is detected, a notification is generated in the area where that face is displayed.

In another particular implementation of the method, when a synthetically altered face image is detected, the connection with the corresponding user is blocked.

In another particular implementation of the method, the analyzed image is obtained from a biometric identification or biometric authentication system.

In another particular implementation of the method, when a synthetically altered face image is detected, access or the action requested by the user is blocked.

In another particular implementation of the method, when a synthetically altered face image is detected, user authentication data is additionally requested, selected from the group: login, code, password, two-factor authentication, or combinations thereof.

In another particular implementation of the method, a signal is generated in the form of a quantitative estimate of the probability that a synthetically altered face image is present.

In another particular implementation of the method, the images are obtained from the video of a system for media monitoring and analysis of social media and mass media, which checks content in social media and mass media.

In another particular implementation of the method, when a synthetically altered face image is detected, a notification is generated to inform the person whose face image was altered.

In a second preferred implementation of the invention, a system for detecting synthetically altered images of faces in video is provided, comprising at least one processor and at least one memory storing machine-readable instructions that, when executed by the processor, implement the above method.

Brief description of the figures

Fig. 1 illustrates a flowchart of the implementation of the claimed method.

Fig. 2 illustrates an example of forming a vector representation of face images in a video.

Fig. 3A-B illustrate an example of forming vector representations of spatio-temporal characteristics.

Fig. 4 illustrates a flowchart for forming the vector of synthetic face image scores, the vector of the spatial characteristic of face images, and the vector of the temporal characteristic of face images for the face images of each person in the video.

Fig. 5 illustrates a flowchart for the independent formation of the final spatial and temporal characteristics and of the overall score of synthetic face images.

Fig. 6 illustrates a flowchart for processing the final spatial and temporal characteristics, when they are formed independently of the overall score of synthetic face images, in order to exclude people's faces from the calculation of the synthetic alteration score for the video.

Fig. 7 illustrates a flowchart for generating a notification with an integral score for the presence of synthetic images of human faces in a video, and a notification about the probable algorithm that generated these synthetic alterations, when a set of ensembles of trained machine learning models is used, where the models of each ensemble are trained on a dataset with one specific algorithm for generating synthetic face alterations, and the models of at least one ensemble are trained on a dataset with several such algorithms.

Fig. 8 illustrates a flowchart for the case where the notification is generated by an integral classifier on the basis of the scores of several trained machine learning models or their ensembles, and several people are present in the video.

Fig. 9 illustrates a general view of a computing device.

Implementation of the invention

In this solution, the term "synthetically altered face image" is understood, here and below, as any type of digital image generation that imitates the face or part of the face of another person, including by applying digital masks, distorting or changing parts of the face, and the like. A synthetically altered face image covers both fully generated images, for example masks created with DeepFake technology and superimposed on the face of a real person in the frame while preserving the facial expressions of the image, and partial alterations of individual parts of the face (eyes, nose, lips, ears, and so on).

As shown in Fig. 1, the claimed method (100) for detecting synthetically altered images of faces in video is implemented by a computing device, in particular by one or more processors that execute, in automated mode, a software algorithm presented as a sequence of steps (101)-(107). These steps carry out material actions in the form of processing the electronic signals generated when the processor of the computing device performs its functions in order to process data within the method (100).

At the first step (101), one or more images obtained from the video are received and stored in the memory of the computing device performing the method (100). In these application materials, the term "video" means a video image, a video stream (for example, from an IP camera, the camera of an electronic device, a virtual camera, or an Internet application), an ordered sequence of frames (images), or a subsample of frames, down to a single image.

At step (102), the obtained images are analyzed for the presence of face images, so that synthetic alteration of those faces can then be determined. Subsequent analysis of the obtained images can be performed by one or more (an ensemble of) machine learning models trained to detect and classify face images.

Various machine learning models can be used to detect synthetic alteration of face images in video, for example neural network architectures such as fully connected neural networks, CNNs (convolutional networks), RNNs (recurrent networks), Transformers, CapsNets (capsule networks), and combinations of these.

During training, the networks can learn to detect one or more features of synthetically altered face images, in particular: the anatomical proportions of the face and head; the anatomical arrangement of the parts of the face; the proportions of the parts of the face; the plasticity and relief of facial expressions; the plasticity of facial details (eyebrows, eyes, nose, ears, lips, skin); the general characteristics of the muscles of the face and neck; the structure, location, and grouping of the muscles (mimic, masticatory, suboccipital, and others); unnatural shadows, light, highlights, penumbrae, and reflections of lighting and surroundings on facial details and the surrounding space; the temperature distribution across the elements of the face; blurring and smoothing in the rendering of elements of the face, head, and other image elements; oversharpening and artificial enhancement of features in the rendering of elements of the face, head, and other image elements; graphical artifacts left by generation algorithms and/or their specific software implementations when creating synthetic images.

Pre-trained neural networks can also be used, with or without further training. For architectures with convolutional networks, pre-trained models such as the following can be used: AlexNet, VGG, NASNet-A, DenseNet, DenseNet-B, DenseNet-BC, Inception, Xception, GoogleNet, PReLU-net, BN-inception, AmoebaNet, SENet, ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, XResNet, Squeeze-and-Excitation ResNet (SE-ResNet), EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, EfficientNet-B3, EfficientNet-B4, EfficientNet-B5, EfficientNet-B6, EfficientNet-B7, YOLO, and models derived from them.

The machine learning model was trained with at least one of the following steps:

obtaining classified (labeled, with assigned classes) data in one or more formats: video stream, video file, video frames (frame);

extracting frames when a video stream or video file is received;

detecting the face (faces) in the frames, cutting each face out of the frame together with some neighborhood around it, and obtaining face data arrays;

for data of the class "Synthetically altered image", when the source image frame (the frame from which the altered image was generated) is available, checking that the assigned class is correct;

for each face, transforming its data array (pixel values, bitmap) with a preprocessing algorithm (data standardization, image scaling, and others);

data augmentation;

forming a data batch and feeding it to the neural network for training;

computing the value of the objective function and backpropagating the error of the data batch to train the network.

The following quality metrics can be used: LogLoss, accuracy, precision, recall, F-measure, AUC-ROC, AUC-PR, Gini coefficient (Gini index), confusion matrix.
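Several of the listed quality metrics follow directly from the confusion matrix. The sketch below computes precision, recall, and the F-measure (here F1) for a binary classifier in which class 1 means "synthetically altered"; the function names and the 0/1 label encoding are assumptions for the example.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = altered)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts


def precision_recall_f1(c: Dict[str, int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 computed from confusion-matrix counts."""
    tp, fp, fn = c["tp"], c["fp"], c["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Precision answers how many of the faces flagged as altered really were altered, while recall answers how many of the altered faces were found; F1 is their harmonic mean.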

One or more of the following algorithms can be used to train the machine learning model: Adagrad (Adaptive gradient algorithm), RMS (Root mean square), RMSProp (Root mean square propagation), Rprop (Resilient backpropagation algorithm), SGD (Stochastic Gradient Descent), BGD (Batch Gradient Descent), MBGD (Mini-batch Gradient Descent), Momentum, Nesterov Momentum, NAG (Nesterov Accelerated Gradient), FussySGD, SGDNesterov (SGD + Nesterov Momentum), AdaDelta, Adam (Adaptive Moment Estimation), AMSGrad, AdamW, ASGD (Averaged Stochastic Gradient Descent), LBFGS (L-BFGS algorithm, the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm), as well as second-order optimizers such as Newton's method, the quasi-Newton method, the Gauss-Newton algorithm, the conjugate gradient method, and the Levenberg-Marquardt algorithm.
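As an illustration of one entry in this list, the sketch below performs a single update of SGD with classical momentum on a flat list of parameters. The function name and the default learning rate and momentum values are assumptions; real training would normally use a framework optimizer rather than this hand-written step.

```python
from typing import List, Tuple


def sgd_momentum_step(params: List[float],
                      grads: List[float],
                      velocity: List[float],
                      lr: float = 0.01,
                      momentum: float = 0.9) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```

The velocity term accumulates past gradients, which smooths the updates compared with plain SGD, where each step depends only on the current gradient.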

At least one of the following functions is used as the objective function when training the machine learning model: L1Loss, MSELoss, CrossEntropyLoss, CTCLoss, NLLLoss, PoissonNLLLoss, GaussianNLLLoss, KLDivLoss, BCELoss, BCEWithLogitsLoss, MarginRankingLoss, HingeEmbeddingLoss, MultiLabelMarginLoss, HuberLoss, SmoothL1Loss, SoftMarginLoss, MultiLabelSoftMarginLoss, CosineEmbeddingLoss, MultiMarginLoss, TripletMarginLoss, TripletMarginWithDistanceLoss.
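The names in this list match loss classes of the PyTorch library. As a hedged illustration of what one of them computes, the sketch below evaluates binary cross-entropy on raw logits (the quantity behind BCEWithLogitsLoss) in plain Python. It assumes a single-logit "altered / real" classifier; the function names are hypothetical, and the patent does not say which loss is actually used.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one raw logit.

    Equivalent to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit),
    rewritten as max(x, 0) - x*y + log(1 + exp(-|x|)) to avoid overflow.
    """
    return (max(logit, 0.0) - logit * target
            + math.log1p(math.exp(-abs(logit))))


def mean_bce_with_logits(logits, targets):
    """Average the per-sample loss over a batch."""
    losses = [bce_with_logits(x, y) for x, y in zip(logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical batch: raw model outputs and labels (1.0 = altered face).
batch_loss = mean_bce_with_logits([2.0, -1.5, 0.0], [1.0, 0.0, 1.0])
```

A logit of 0 corresponds to a predicted probability of 0.5, for which the loss equals ln 2 regardless of the label.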

When training the machine learning model, a label self-check stage (automatic label correction) may be applied, in which each face in an image (frame) labeled as containing a synthetic change is checked to confirm that it actually shows signs of such a change.

This check is performed when the source video (frames, images) is available. The source video (frames, images) is the real video, unaltered by any synthetic change, from which the synthetically modified videos of the training set (and, additionally, the test set) were produced. The check is implemented as follows and may include the following steps:

A face detection algorithm finds a face in a frame of the synthetically modified video. The part of the image containing the face and some neighborhood around it is cropped. The size of the neighborhood may vary.

All faces are detected in the corresponding frame of the source video. The face whose characteristics are closest to the face from the previous step is selected. Depending on the face detection algorithm used, the measure of proximity is the closeness of one or more points (a set of points) of the face: nose; nostrils; hairline; facial hair lines (beard, mustache); mouth; lips (upper and lower); forehead; eyes; pupils; ears; eyebrows; eyelids; head; cheekbones; chin; nasolabial triangle; coordinates of the face bounding rectangle.

The following approaches may be used as the face detection algorithm: adaptive boosting and the Viola-Jones method based on it, MTCNN, elastic graph matching, Facebook DeepFace, hidden Markov models (HMM), principal component analysis and algorithms based on data matrix decomposition (PCA, SVD, LDA), Active Appearance Models (AAM), Active Shape Models (ASM), FERET (Face Recognition Technology), SURF, NeoFace, SHORE, ROI, template matching methods, DPM (deformable part model), artificial neural networks (multilayer perceptrons), factor analysis (FA), linear discriminant analysis, support vector machines (SVM), the naive Bayes classifier, hidden Markov models, distribution-based methods, mixtures of PCA and mixtures of factor analyzers, and the sparse network of winnows (SNoW).

Proximity is understood as the minimum distance between numerical data under a metric such as Bray-Curtis, Canberra, Ruzicka, Kulczynski, Jaccard, Euclidean distance, Manhattan distance, Penrose size distance, Penrose shape distance, Lorentzian distance, Hellinger distance, Minkowski distance of order p, Mahalanobis distance, statistical distance, correlation similarities and distances (Pearson correlation, Orchini similarity, normalized dot product), or another. To compute proximity, the coordinates of the points in the frame of the synthetically modified video and the coordinates of the same points in the frame of the source video are taken; the closest face image is then selected as the face with the minimum distances between the points used.
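The matching step described above can be sketched as follows. This is a minimal illustration, assuming each face is given as a dictionary that maps a landmark name to (x, y) pixel coordinates and that every candidate contains all landmarks of the target face. The helper names (`to_vector`, `closest_face`) and the coordinates are hypothetical; only four of the listed metrics are shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    # Terms with a zero denominator are skipped by convention.
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def bray_curtis(a, b):
    denom = sum(abs(x + y) for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / denom if denom else 0.0


def to_vector(landmarks, names):
    """Flatten {name: (x, y)} into [x1, y1, x2, y2, ...] in a fixed order."""
    return [coord for name in names for coord in landmarks[name]]


def closest_face(target, candidates, metric=euclidean):
    """Index of the candidate whose landmarks are nearest to the target."""
    names = sorted(target)
    ref = to_vector(target, names)
    distances = [metric(ref, to_vector(c, names)) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


# Face found in the altered frame and two faces found in the source frame.
altered_face = {"nose": (150, 100), "left_eye": (130, 80), "right_eye": (170, 80)}
source_faces = [
    {"nose": (400, 120), "left_eye": (380, 100), "right_eye": (420, 100)},
    {"nose": (152, 101), "left_eye": (131, 82), "right_eye": (171, 79)},
]
match_index = closest_face(altered_face, source_faces)
```

Passing a different `metric` swaps the distance without changing the selection logic, which mirrors the way the description leaves the choice of metric open.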

In one particular embodiment, the face area may instead be located (its coordinates obtained) in a frame of the synthetically modified video, after which the area with the same coordinates is cropped from the frame of the source video. In another particular embodiment, the reverse processing may be performed: a face is detected in the frame of the source video, and the area with the same coordinates is cropped from the frames of the synthetically modified video. These operations yield two images: the face area from the frame of the source video and the face area from the frame of the synthetically modified video.

The resulting pair of images is compared using a specified metric to assess the level of image distortion. The following may be used as such a metric:

peak signal-to-noise ratio (PSNR). https://ru.wikipedia.org/wiki/Пиковое_отношение_сигнала_к_шуму;

mean squared error (MSE). https://ru.wikipedia.org/wiki/Среднеквадратическое_отклонение;

root-mean-square error (RMSE), the square root of the mean squared error. https://ru.wikipedia.org/wiki/Пиковое_отношение_сигнала_к_шуму;

relative mean deviation (RMD - Root mean squared deviation);

root mean square deviation (RMS - Root Mean Squared);

structural similarity index (SSIM). https://ru.wikipedia.org/wiki/SSIM;

structural dissimilarity (DSSIM). https://ru.wikipedia.org/wiki/SSIM;

signal-to-noise ratio (SNR). https://ru.wikipedia.org/wiki/Отношение_сигнал/шум/;

the absolute difference between pixels and measures derived from it (mean, relative, and others).

If color images (with several components per pixel) are analyzed, the same metrics are applied, followed by weighted averaging over the components. For example, for an RGB image, PSNR or MSE is computed over all three components (and divided by three times the image size). For a good-quality synthetic image and good-quality video (without interference or noise), PSNR is preferable. If only part of the face is overlaid with a synthetic image, PSNR is also preferable. If the video is noisy or highly grainy, DSSIM or SSIM is preferable. With heavy interference, SNR is preferable. If the video quality is extremely low, for example with a high compression ratio, MSE or RMD is preferable. If the face is small relative to the frame, the absolute difference between pixels is used. A threshold value is selected for the chosen metric: if the metric value between the two images exceeds the threshold, the face in the frame is accepted as synthetically modified. If the value is less than or equal to the threshold, the face image is treated as real, even though it was labeled as synthetically modified.
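The label self-check can be illustrated with a minimal sketch that computes MSE over all three RGB components (dividing by three times the number of pixels, as described above), derives PSNR from it, and applies the threshold rule. Images are assumed to be nested lists of (R, G, B) tuples of equal size; the function names and the threshold of 50 are hypothetical.

```python
import math


def mse_rgb(img_a, img_b):
    """Mean squared error over every component of every pixel."""
    total, count = 0, 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for comp_a, comp_b in zip(pixel_a, pixel_b):
                total += (comp_a - comp_b) ** 2
                count += 1  # ends up as 3 * width * height
    return total / count


def psnr_rgb(img_a, img_b, max_value=255):
    """PSNR in decibels; infinite for identical images."""
    err = mse_rgb(img_a, img_b)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


def label_confirmed(source_crop, altered_crop, threshold):
    """True if the distortion exceeds the threshold, i.e. the 'altered' label stands."""
    return mse_rgb(source_crop, altered_crop) > threshold


gray = (100, 100, 100)
source_crop = [[gray, gray], [gray, gray]]
altered_crop = [[(130, 100, 100), gray], [gray, gray]]

error = mse_rgb(source_crop, altered_crop)          # 900 / 12
keeps_label = label_confirmed(source_crop, altered_crop, threshold=50)
```

MSE is used for the threshold here because it grows with distortion, which matches the "greater than the threshold" rule as stated. PSNR grows as the images become more similar, so a PSNR-based check would presumably need the comparison reversed; the description does not spell this out.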

Transformation of the face data arrays may involve data normalization, data standardization, resizing to a specified size, and image scaling algorithms.

Data augmentation for training one or more machine learning models may be performed using at least one of the following approaches: image scaling (enlargement, reduction); image cropping; darkening of the whole image or of individual image channels; lightening of the whole image or of individual image channels; contrast enhancement; color transformations: swapping (shuffling) color channels, amplifying or attenuating one or more color channels, conversion to grayscale, conversion to a monochrome image, removal of a color channel; image shifts and decentering; rotation of the image by various angles in various directions, rotation of the image or a part of it; tilting and skewing of the image; mirroring about an arbitrary axis or line; additional lines or geometric objects in the image: with or without transparency, colored objects, gray objects (from white to black), including removal of part of the image (placing a black object on the image) at geometric or semantic positions in the image; adding an arbitrary background to the image; glare and darkening of parts of the image; defocusing (blurring) of the image or its parts; increasing the graininess or sharpness of the image; compression and stretching along axes or lines; adding noise to the whole image or a part of it, including white or other noise; adding one or more elements of Gaussian noise (Blur) or speckle noise; blending (overlaying) two or more images (or parts of images) from the training set with different weights; elastic transformation of the image (Elastic Transform); grid distortion of the image (GridDistortion); compression of the image data by various image processing algorithms at some quality level (for example, compressing an original BMP image to JPEG at some quality and then converting it back to BMP); isotropic, affine, and other transformations (https://github.com/albumentations-team/albumentations).
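A few of the listed augmentations can be sketched in plain Python to show how they may be chained in random order, each with its own probability of being applied. This is only an illustration on a single-channel image stored as a list of rows; it does not use the albumentations library, and the function names, the noise level, and the probability `p` are assumptions.

```python
import random


def hflip(img):
    """Mirror the image about the vertical axis."""
    return [list(reversed(row)) for row in img]


def brighten(img, delta=40):
    """Lighten every pixel, clamping to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def add_gaussian_noise(img, rng, sigma=10.0):
    """Add zero-mean Gaussian noise to every pixel, clamping to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def augment(img, rng, p=0.5):
    """Apply the transforms in random order, each with probability p."""
    steps = [hflip, brighten, lambda im: add_gaussian_noise(im, rng)]
    rng.shuffle(steps)
    out = [list(row) for row in img]
    for step in steps:
        if rng.random() < p:
            out = step(out)
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
image = [[10, 20, 30], [40, 50, 60]]
augmented = augment(image, rng, p=0.5)
```

Setting `p=1.0` applies every transform, and `p=0.0` returns an unchanged copy, which corresponds to applying the augmentations with or without a probability.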

All of the above are applicable in any color representation or its channels: RGB, sRGB, RGBA, ProPhoto, CMYK, XYZ, LMS, HKS, HSV, HSB, HSL, AHSL, RYB, LAB, NCS, RAL, YUV, YCbCr, YPbPr, YDbDr, YIQ, PMS (Pantone), Munsell. These augmentation methods may be applied to a single image, in any order, with or without a probability of application.

Using a trained model or a face detection algorithm, faces are detected at step (102). At step (103), the face images detected at step (102) are processed to determine which face images belong to the same person. To this end, step (103) calculates a vector representation of the geometric characteristics of the face images. In general, this is done with an algorithm that compares facial reference points. The geometric characteristics are used to identify the face images that belong to one and the same person. Forming this vector makes it possible to estimate the probability that a real person's face is present. The algorithm may operate as follows. The j-th face is detected in the i-th frame. This j-th face is then searched for in subsequent frames.

In one particular embodiment of the invention, the search selects the face image that is spatially closest among all faces detected in frame i+1. Depending on the face detection algorithm used, the measure of proximity is the closeness (of numerical data under the Bray-Curtis, Canberra, Ruzicka, Kulczynski, or Jaccard metric, Euclidean distance, Manhattan distance, Penrose size distance, Penrose shape distance, Lorentzian distance, Hellinger distance, Minkowski distance of order p, Mahalanobis distance, statistical distance, correlation similarities and distances such as Pearson correlation, Orchini similarity, and normalized dot product, or another) of one or more points of the face (facial reference points): nose, nostrils, hairline, facial hair lines (beard, mustache), mouth, lips (upper and lower), forehead, eyes, pupils, ears, eyebrows, eyelids, head, cheekbones, chin, nasolabial triangle, coordinates of the face bounding rectangle.

The distance between the corresponding reference points of the j-th face in the i-th frame and the points of each face in frame i+1 is calculated. The face in frame i+1 with the smallest reference-point distances is then selected. In another particular embodiment of the invention, frame i+1 is searched for the face whose reference points have the most similar relative arrangement. In this case, the geometric characteristics (dimensions) of the arrangement of the reference points of the j-th face image are computed, and frame i+1 is searched for the face image with the most similar geometric characteristics. In yet another embodiment, a spatial neighborhood (location area) in the frame is defined for each face, and it is checked whether any face image lies in that area in frame i+1. These approaches to implementing the claimed method (100) do not exclude other possible ways of searching for a face image across frames.
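The second variant above, matching by the relative arrangement of reference points rather than by their absolute positions, might look like the sketch below. It assumes one possible choice of "geometric characteristics": all pairwise distances between landmarks, scaled by the largest one so that the descriptor ignores translation and uniform scaling. The patent does not specify the descriptor; the names and coordinates are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def shape_descriptor(points):
    """Pairwise landmark distances, normalized by the largest distance.

    `points` is a list of (x, y) tuples in a fixed landmark order.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    largest = max(dists)
    return [d / largest for d in dists] if largest > 0 else dists


def most_similar_shape(face, candidates):
    """Index of the candidate whose landmark geometry best matches `face`."""
    ref = shape_descriptor(face)
    return min(
        range(len(candidates)),
        key=lambda k: math.dist(ref, shape_descriptor(candidates[k])),
    )


# j-th face in frame i: left eye, right eye, nose.
face_i = [(0, 0), (4, 0), (2, 3)]
# Faces detected in frame i+1.
faces_next = [
    [(10, 10), (20, 10), (15, 11)],      # different proportions
    [(100, 50), (108, 50), (104, 56)],   # same shape, shifted and twice as large
]
tracked_index = most_similar_shape(face_i, faces_next)
```

Because the descriptor is normalized, the second candidate matches even though the face has moved and changed size between frames, which is the situation a purely positional comparison handles less well.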

Next, at step (104), for each face image detected in the frames, an estimate of the probability that it has been synthetically altered is calculated using the trained machine learning model for detecting and classifying synthetic changes. This estimate is appended to the vector of estimates for the face images of the j-th person. If the image of the j-th face is not detected in the next frame (or series of frames) of the ordered frame sequence, the formation of the vector of estimates may end. An example of forming the vector of estimates for a person's face image in video frames is shown in Fig. 2. In another embodiment, the vector of estimates for the face images of the j-th person is formed over the entire video and does not end when the face image is not detected in a subsequent frame.
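The two ways of forming the vector of estimates can be contrasted in a short sketch. It assumes the per-frame estimates for one tracked face are already available, starting from the frame where the face first appears, with `None` marking frames where the face was not found. The function name and the placeholder value 0.5 written for missing frames in the whole-video variant are assumptions, not something the description fixes.

```python
def score_vector(frame_scores, stop_on_missing=True, fill=0.5):
    """Build the vector of estimates for one person.

    stop_on_missing=True  -> stop at the first frame without the face.
    stop_on_missing=False -> cover the whole video, writing `fill`
                             for frames where the face is absent.
    """
    vector = []
    for score in frame_scores:
        if score is None:
            if stop_on_missing:
                break
            vector.append(fill)
        else:
            vector.append(score)
    return vector


# Hypothetical per-frame model outputs; the face is missing in the fifth frame.
per_frame = [0.04, 0.03, 0.01, 0.02, None, 0.07]

first_series_only = score_vector(per_frame, stop_on_missing=True)
whole_video = score_vector(per_frame, stop_on_missing=False)
```

The first variant yields a vector for one consecutive series of frames; the second keeps one entry per frame, so it stays aligned with per-frame spatial and temporal vectors.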

Next, at step (104), the spatial and temporal significance of each identified face image is determined. It is defined as a vector representation of the spatial characteristic of the person's face, which describes the size of the face area relative to the frame, and a vector representation of the temporal characteristic of the face image, which describes how long the analyzed face image is displayed in the video frames. Fig. 4 shows a diagram of step 104. The vector of estimates of synthetic changes in the face images of the j-th person in the video, which consists of the estimates of changes in the face image in each analyzed frame, and the vectors of spatial and temporal characteristics (the spatial vector and the temporal vector) may be calculated sequentially, as shown in Fig. 4, or in parallel, independently of each other. The description of the invention does not restrict the order or method of calculating these vectors; it describes their use to improve the quality of detecting synthetic changes in face images in video.

Figs. 3A-B show an example of calculating the vectors of spatial and temporal significance and the vector of estimates of synthetic changes. In this example, for each frame (K1)-(K6) of the video obtained at step (101), the vector of estimates of the synthetically modified face, the vector of the spatial distribution of faces in the frames, and the temporal characteristic of the face in the frames are calculated. The spatial characteristic may be calculated from the share of the frame area occupied by the face. For example, the rectangle enclosing the face in the frame has the coordinates X1=100, Y1=50 (upper left corner) and X2=300, Y2=150 (lower right corner). The area of this rectangle is 200x100=20000. The video has a resolution of 1280x1920 pixels, so the frame area is 2457600. The share of the frame occupied by the face is 20000/2457600=0.8%. The temporal characteristic of each face may be calculated as a scalar, for example the time for which it is displayed in the video. In another implementation, a vector may be formed in which 1 is assigned if the person is present in the frame and 0 if not. The spatial and temporal significance may be represented, in particular, as a common matrix built from the values of the generated vector representations.
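The worked example above (a 200x100 face rectangle in a 1280x1920 frame) can be reproduced with a minimal sketch that also builds the per-frame spatial and presence vectors. Bounding boxes are assumed to be (x1, y1, x2, y2) tuples, with `None` for frames where the person is absent; the function names and the six-frame sequence are hypothetical.

```python
def face_area_share(box, frame_width, frame_height):
    """Fraction of the frame area covered by the face bounding box."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (frame_width * frame_height)


def spatial_vector(boxes, frame_width, frame_height):
    """Per-frame area share; 0.0 where the face is absent."""
    return [face_area_share(b, frame_width, frame_height) if b else 0.0
            for b in boxes]


def presence_vector(boxes):
    """Per-frame 1/0 flag: 1 if the person is in the frame, else 0."""
    return [1 if b else 0 for b in boxes]


# The rectangle and resolution from the example in the text.
share = face_area_share((100, 50, 300, 150), 1280, 1920)
share_percent = round(share * 100, 1)  # about 0.8 %

# Hypothetical track: the person is absent in the first and last frames.
box = (100, 50, 300, 150)
boxes = [None, box, box, box, box, None]
spatial = spatial_vector(boxes, 1280, 1920)
presence = presence_vector(boxes)
```

The scalar variant of the temporal characteristic (display time) could be derived from the presence vector as the number of ones divided by the frame rate, assuming the frame rate is known.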

At step (105), an overall estimate of synthetic changes in the person's face images in the video is formed from the vectors obtained at steps (103)-(104). That is, the probability of a synthetic change is estimated for each person's face in the video from the temporal distribution vector, the spatial distribution vector, and the vector of per-frame probability estimates that the face image was synthetically altered.

A separate machine learning model may be used to form the overall estimate of synthetic changes in the face images of the j-th person. To form this overall estimate, the spatial and temporal distribution vectors and the vector of estimates of synthetic changes in the face image are combined into a common two-dimensional matrix, shown in Table 1 for the example of Fig. 3A. The resulting matrix is fed to the input of the machine learning model, which produces the overall estimate of the synthetic change of the j-th person's face in the video. This model may be a recurrent neural network, a convolutional neural network, or a fully connected neural network. This combination is recommended when the person appears in different time segments of the video rather than in a single consecutive series of frames.

Table 1

Two-dimensional matrix combining the vector representations of the spatio-temporal distribution with the vector of estimates of synthetic changes to faces.

0.5    0.04   0.03   0.01   0.02   0.5
0      0.45   0.45   0.45   0.45   0
0      1      1      1      1      0
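As an illustration only, a matrix like Table 1 could be assembled as sketched below. The function name, the input layout (a dict of per-frame detections), and the reading of the rows as estimates, spatial vector, and temporal vector are assumptions made for this sketch; the filler value 0.5 for frames without the face simply mirrors the values in Table 1.

```python
def build_face_matrix(num_frames, detections, fill_score=0.5):
    """Combine estimate, spatial and temporal vectors into a 3 x num_frames matrix.

    detections: hypothetical dict {frame_index: (score, face_area_ratio)}
    covering only the frames in which the person's face was found.
    """
    scores, spatial, temporal = [], [], []
    for frame in range(num_frames):
        if frame in detections:
            score, area_ratio = detections[frame]
            scores.append(score)
            spatial.append(area_ratio)
            temporal.append(1)  # face present in this frame
        else:
            scores.append(fill_score)  # neutral filler: no face to score
            spatial.append(0)
            temporal.append(0)
    return [scores, spatial, temporal]


# Illustrative input: the face appears in frames 1-4 of a 6-frame clip.
detections = {1: (0.04, 0.45), 2: (0.03, 0.45), 3: (0.01, 0.45), 4: (0.02, 0.45)}
matrix = build_face_matrix(6, detections)
```

With these illustrative inputs the three rows reproduce the values of Table 1.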

In another particular embodiment, the vector of the spatial distribution of the person's face image and the vector of estimates of the presence of synthetic changes are combined into a single two-dimensional matrix, but only over the frames that contain this person's face. An example is given in Table 2 for Fig. 3A. This combination is recommended when the person appears in a single consecutive series of frames.

Table 2

Two-dimensional matrix of the vector representation of the spatio-temporal distribution.

0.04   0.03   0.01   0.02
0.45   0.45   0.45   0.45

Fig. 8 shows one way of generating a notification that the video contains changes when, at step (105), the overall estimate of synthetic changes to the face images of the j-th person is computed by a trained model that takes as input the matrix combining the spatio-temporal representation vectors and the vector of estimates.

In another particular embodiment, at step (105) the vector of estimates, which characterizes whether the face image in each frame was synthetically altered, is analyzed separately from the temporal and spatial distribution vectors. An example of this scheme is shown in Fig. 5. In this case the overall estimate of synthetic change to the face of the j-th person is built from the vector of synthetic-change estimates alone. A separate machine learning model or a separate algorithm may be used to form the overall estimate. In one particular embodiment, shown in Fig. 5, the vector of estimates is fed to a separately trained model, which may be a recurrent neural network, a convolutional neural network, or a fully connected neural network. In such cases a vector of fixed length may be used. If the vector of face image estimates is shorter than the specified length, it is padded at a chosen end with a value such as 0.5. If it is longer than the specified length, it is truncated at a chosen end.
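The padding and truncation rule can be sketched as follows. This is a minimal illustration: the function name and the `at_end` flag are hypothetical, and using the same end for both padding and truncation is an assumption, since the description only says "a chosen end".

```python
def fit_to_length(scores, target_len, pad_value=0.5, at_end=True):
    """Pad or truncate a vector of estimates to exactly target_len entries."""
    if len(scores) >= target_len:
        # Too long (or already exact): drop entries at the chosen end.
        if at_end:
            return scores[:target_len]
        return scores[len(scores) - target_len:]
    # Too short: add the neutral value at the chosen end.
    padding = [pad_value] * (target_len - len(scores))
    return scores + padding if at_end else padding + scores


padded = fit_to_length([0.1, 0.9], 4)
trimmed = fit_to_length([0.1, 0.2, 0.3], 2)
```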

In another particular embodiment, the number of estimates falling into given intervals, or the frequency of estimates per interval, is counted. For example, intervals with a step of 0.1 are taken: [0-0.1; 0.1-0.2; 0.2-0.3; 0.3-0.4; 0.4-0.5; 0.5-0.6; 0.6-0.7; 0.7-0.8; 0.8-0.9; 0.9-1], and the frequency of the vector's estimates in these intervals is computed. The resulting values are fed to a machine learning model, for example a support vector machine (SVM), K-nearest neighbors, linear (or nonlinear) regression, or a classification tree model. The description does not restrict the type of machine learning model; it describes only its application to the resulting vector of estimates. In yet another particular embodiment, the overall estimate of synthetic changes to a person's face images is obtained by averaging the vector of estimates or by taking its maximum value, over the entire vector or over a part of it.
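A minimal sketch of the interval counting and of the mean/maximum alternatives is given below. The function names are hypothetical. The listed intervals share their endpoints, so assigning a boundary value to the upper interval (with 1.0 kept in the last one) is an assumption of this sketch.

```python
def score_histogram(scores, num_bins=10):
    """Count how many estimates fall into each of num_bins equal intervals on [0, 1]."""
    counts = [0] * num_bins
    for s in scores:
        # A score of exactly 1.0 is kept in the last interval.
        index = min(int(s * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


def aggregate(scores, how="mean"):
    """Collapse a vector of estimates into one value by mean or maximum."""
    if how == "max":
        return max(scores)
    return sum(scores) / len(scores)


scores = [0.04, 0.03, 0.01, 0.02, 0.95, 1.0, 0.55]
histogram = score_histogram(scores)
```

The counts (or the counts divided by the number of estimates, if frequencies are preferred) would then serve as the input vector for whichever model is chosen.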

In one particular embodiment, shown in Fig. 5, overall spatial and temporal characteristics of the person's face images are computed for further analysis.


The overall spatial characteristic is computed as the mean of the spatial vector for the given face. The overall temporal characteristic is the ratio of the length of the temporal characteristic vector to the length of the video, that is, the fraction of the video's total duration during which the person is present. In another particular variant, the maximum or minimum value is used as the overall spatial characteristic.
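These two characteristics can be sketched as below. The function names are hypothetical, and measuring both lengths in frames is an assumption of this sketch.

```python
def overall_spatial(spatial_vector, how="mean"):
    """Summarize per-frame relative face sizes as a mean, maximum or minimum."""
    if how == "max":
        return max(spatial_vector)
    if how == "min":
        return min(spatial_vector)
    return sum(spatial_vector) / len(spatial_vector)


def overall_temporal(frames_with_face, total_frames):
    """Fraction of the video during which the person's face is present."""
    return frames_with_face / total_frames


mean_size = overall_spatial([0.45, 0.45, 0.45, 0.45])
time_share = overall_temporal(4, 6)
```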

At step (106), the final estimate of the presence of synthetic face changes is computed for the entire video. This estimate is built from the overall estimates of synthetic changes for each person's face images. In other words, steps (104)-(105) yield synthetic-change estimates for each person in the video (individual people are distinguished at step 103), and step (106) derives the estimate for the video from the per-person estimates. This joint analysis of the estimates for all people in the video improves the quality of the invention compared with existing solutions. For example, if the video contains many people and all of them receive a high synthetic-change estimate, the video is most likely heavily compressed and the face-image models are producing false positives. In that case the joint analysis at step (106) makes it possible to score the video as one without synthetic changes.

In one particular embodiment, shown in Fig. 1, the overall estimates for all people's faces may be used to form the final estimate for the video in the following ways:

A weighted average of the estimates for the faces in use is computed. In one particular variant, the weight of each estimate may be the product of the mean size of that person's face images and the fraction of time the person is present in the video.

A simple average of the synthetic-change estimates for the faces in use is computed.

The maximum among the estimates for the faces in use is taken.
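The three options above can be sketched in a few lines. The per-face record layout (keys `score`, `mean_size`, `time_share`) and the function name are assumptions made for illustration; the weight follows the variant described above, mean face size multiplied by the fraction of time present.

```python
def video_score(faces, method="weighted"):
    """Combine per-person estimates into one estimate for the video.

    faces: hypothetical list of dicts with keys
    'score', 'mean_size' and 'time_share'.
    """
    scores = [f["score"] for f in faces]
    if method == "max":
        return max(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    # Weighted average: weight = mean face size * fraction of time present.
    weights = [f["mean_size"] * f["time_share"] for f in faces]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


faces = [
    {"score": 0.9, "mean_size": 0.4, "time_share": 0.5},
    {"score": 0.1, "mean_size": 0.1, "time_share": 1.0},
]
weighted = video_score(faces)
```

In this illustrative input the large face carries twice the weight of the small one, so the weighted result sits closer to 0.9 than the simple average does.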

In another particular embodiment, a trained model may be used for the example above. The model may be a support vector machine (SVM), K-nearest neighbors, linear (or nonlinear) regression, a classification tree model, or one or more neural networks. Such a model may take as input a vector (a vector representation of the data) that characterizes how many of the per-face synthetic-change estimates fall into each interval.

In another particular embodiment, example steps of which are shown in Fig. 5, overall spatial and temporal characteristics of the person's face images are formed. At step (106) these characteristics are compared with the corresponding threshold values. If the overall characteristic shows that the size of the face image or the time of its presence is below the threshold, the estimate for that person's face is excluded from the calculation of the probability of synthetic change to the video (estimate None). The scheme of this example is shown in Fig. 6. The remaining estimates of synthetic changes to people's faces are then analyzed by the methods described above.
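The threshold check can be sketched as follows. The threshold values, the record layout, and the function name are illustrative assumptions; the description does not give concrete thresholds.

```python
def apply_thresholds(faces, min_size, min_time_share):
    """Replace the estimate with None for faces that are too small or too brief."""
    result = []
    for f in faces:
        too_small = f["mean_size"] < min_size
        too_brief = f["time_share"] < min_time_share
        result.append(None if (too_small or too_brief) else f["score"])
    return result


faces = [
    {"score": 0.8, "mean_size": 0.30, "time_share": 0.6},
    {"score": 0.9, "mean_size": 0.01, "time_share": 0.9},  # face too small
    {"score": 0.2, "mean_size": 0.25, "time_share": 0.02},  # present too briefly
]
estimates = apply_thresholds(faces, min_size=0.05, min_time_share=0.1)
# Only the estimates that were not set to None go on to the video-level step.
remaining = [e for e in estimates if e is not None]
```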

In another particular embodiment, the two-dimensional matrices of vector representations of the spatio-temporal distribution and the synthetic-change estimates for the faces of different people, whose formation is described above, are fed to step (106), where the final estimate of the presence of synthetic changes to people's faces in the video is formed. This step may also be performed by a separate machine learning model or by an ensemble of trained models.

At step (107), an integral estimate of the presence of a synthetically altered face image in the video is formed from the final estimates of the presence of synthetic changes to people's faces in the video. At least one final estimate is used for this, each produced by a separate model for classifying and detecting synthetic face changes. On completion of step (107), a notification of the presence of a synthetically altered face in the video is generated.

The notification may be displayed directly in the graphical user interface, for example during an online conference (Zoom, Skype, MS Teams). It may also be displayed directly in the region where the synthetic change was detected, for example in the region showing the person's face. An additional benefit of the invention is its use in biometric control systems, for example when obtaining services (such as banking services) or access (an access control system, a turnstile with a biometric sensor). When a synthetically altered face image is detected, access or the action requested by the user is blocked. In this case, user authentication data may additionally be requested, selected from the group: login, code, password, two-factor authentication, or combinations thereof.

The claimed solution can be used in systems for monitoring the media space and analyzing social media and mass media, to identify well-known public figures (heads of state, media personalities, celebrities and the like) who may be targeted by attempts to compromise them. Such systems serve as the source of the video for subsequent analysis, and if synthetic changes to the face images of such people are detected, a notification about the falsified information can be sent to them or to the relevant service. For this kind of notification, information about the time and source of the detected event may also be stored.

In one particular embodiment, several models for detecting synthetic changes in face images are used, with at least one model trained for each algorithm for generating synthetic changes.

In another particular variant, an ensemble of models is trained for each algorithm for generating synthetic changes. The estimates of the models within a given ensemble are averaged.
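The per-algorithm averaging can be sketched as below. The algorithm names and scores are placeholders invented for this example, and the function name is hypothetical.

```python
def average_ensembles(model_scores):
    """Average the estimates of the models trained for each generation algorithm.

    model_scores: hypothetical dict {algorithm_name: [estimate, ...]}.
    """
    return {
        algorithm: sum(scores) / len(scores)
        for algorithm, scores in model_scores.items()
    }


model_scores = {
    "algorithm_a": [0.2, 0.4],
    "algorithm_b": [0.9, 0.7, 0.8],
}
ensemble_scores = average_ensembles(model_scores)
```

The resulting per-algorithm averages are the kind of values that a downstream classifier could take as its input features.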

For the final classification, the resulting estimates are processed by an integral classifier, which makes it possible to uncover hidden relationships between the predictions of the models for the different algorithms for generating synthetic changes. This makes it possible to achieve a superadditive (synergistic) effect and improves the detection of videos containing synthetic changes to face images. The general scheme is shown in Fig. 7, and a more detailed scheme in Fig. 8.

In another particular embodiment, the integral classifier produces not only the integral estimate of the presence of synthetic changes to people's faces in the video but also the most probable algorithm by which those changes were created. This example is shown in Fig. 7.

Fig. 9 shows a general view of a computing device (600) suitable for implementing the claimed solution. The device (600) may be, for example, a computer, a server, or another type of computing device that can be used to implement the claimed technical solution, including as part of a cloud computing platform.

In general, the computing device (600) contains, connected by a common data bus, one or more processors (601), memory such as RAM (602) and ROM (603), input/output interfaces (604), input/output devices (605), and a network communication device (606).

The processor (601) (or several processors, or a multi-core processor) may be selected from the range of devices in wide use today, for example those of Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™ and the like. A graphics processor, for example from Nvidia, AMD, or Graphcore, may also be used as the processor (601).

The RAM (602) is random access memory that stores the machine-readable instructions executed by the processor (601) to perform the required logical data processing operations. The RAM (602) typically holds the executable instructions of the operating system and of the relevant software components (applications, program modules and the like).

The ROM (603) is one or more persistent storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND and the like), or optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD) and others.

Various types of I/O interfaces (604) are used to connect the components of the device (600) and external peripheral devices. The choice of interfaces depends on the specific design of the computing device, and they may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232 and the like.

Various I/O devices (605) are used for user interaction with the computing device (600), for example a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality devices, optical sensors, a tablet, indicator lights, a projector, a camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module) and the like.

The network communication device (606) enables the device (600) to transfer data over an internal or external computer network, for example an intranet, the Internet, or a LAN. One or more devices (606) may include, without limitation: an Ethernet card, a GSM modem, a GPRS modem, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Bluetooth and/or BLE module, a Wi-Fi module and others.

Satellite navigation devices, for example GPS, GLONASS, BeiDou, or Galileo, may additionally be included in the device (600).

The application materials presented here disclose preferred embodiments of the technical solution and should not be construed as limiting other, particular embodiments that remain within the scope of the legal protection sought and are obvious to those skilled in the relevant art.


Claims


1. A computer-implemented method for determining synthetically modified images of faces on video, performed using a processor and containing the steps of:

a) obtaining at least one image from the video;

b) identifying images of faces in said image;

c) calculating a vector representation of the geometric characteristics of the identified facial images, using at least a facial reference point comparison algorithm, to determine images of at least one person's face;

d) using frame-by-frame video analysis, calculating the spatio-temporal significance of each face image of each person in said image, defined as a vector representation of the spatial characteristic of the face, characterizing the size of the face region relative to the frame, and a vector representation of the temporal characteristic of the face image, characterizing the time during which the analyzed face image is displayed in the video frames;

e) calculating a vector of estimates of the probability of synthetic changes for the images of a person's face, characterizing the presence of synthetic changes in the images of this person's face in each frame;

f) calculating an overall estimate of the probability of synthetic changes based on the vector representations of the spatial and temporal distribution and the vector of estimates of synthetic changes for the face images of each person in the video;

g) forming a final assessment of the presence of a synthetic change in the image of at least one face in the video;

h) forming an integral assessment of the presence of a synthetically altered face image in the video based on at least one final assessment of the model and generating a notification about the presence of a synthetically altered face in the video.

2. The method according to claim 1, characterized in that steps c)-h) are performed by a machine learning model or an ensemble of models, wherein the machine learning model or ensemble of models is trained on a data set containing synthesized images of people's faces.

3. The method according to claim 2, characterized in that the machine learning model uses the function of automatic marking correction, which ensures correction of incorrect marking of each face in the frames by comparing the images of faces on the synthesized video with their images on the original video.

4. The method according to claim 3, characterized in that the comparison of faces is carried out based on the value of the vector proximity of reference points that form the geometric characteristics of the original face image and the synthesized image based on it.

5. The method according to claim 3, characterized in that the comparison of faces is carried out by analyzing the coordinates of the areas of the original face image and the synthesized face image.

6. The method according to claim 1, characterized in that the spatio-temporal significance is calculated as a general matrix based on the values of vector representations, and the assessment of the presence of synthetic changes in the images of an individual’s faces is formed by a machine learning model based on the resulting general matrix.

7. The method according to claim 2, characterized in that the ensemble of machine learning models consists of a group of models, each of which is trained to identify a specific algorithm for generating synthetic images.

8. The method according to claim 7, characterized in that it contains an integral classifier that receives as input estimates generated using models included in the ensemble.

9. The method according to claim 8, characterized in that the final score is calculated using an integral classifier.

10. The method according to claim 9, characterized in that the algorithm for generating a synthetic face image in the analyzed video stream is additionally determined.

11. The method according to claim 1, characterized in that the video is an online video conference.

12. The method according to claim 11, characterized in that when a synthetically modified face image is detected, a notification is generated in the area where that face image is displayed.

13. The method according to claim 11, characterized in that when a synthetically modified face image is determined, the connection with this user is blocked.

14. The method according to claim 1, characterized in that the analyzed image is obtained from a biometric identification or biometric authentication system.

15. The method according to claim 14, characterized in that when a synthetically modified image of a face is determined, access or the requested action on the part of the user is blocked.
