RU2815689C1

RU2815689C1 - Method, terminal and system for biometric identification

Info

Publication number: RU2815689C1
Application number: RU2023115537A
Authority: RU
Inventors: Тимур Ринатович Абдуллин (Timur Rinatovich Abdullin); Евгений Васильевич Васильченко (Evgeny Vasilyevich Vasilchenko); Тимур Вячеславович Шипунов (Timur Vyacheslavovich Shipunov)
Original assignee: Limited Liability Company "МЕТРИКА Б" ("METRIKA B")
Filing date: 2023-06-14
Publication date: 2024-03-20

Abstract

FIELD: data processing.

SUBSTANCE: the invention relates to a method, a terminal and a system for biometric identification. The method comprises the steps of: activating, in a biometric identification terminal, a human face search mode on color images obtained from the video stream of one camera of a stereo camera; detecting and tracking a person's face in an image from said camera of the stereo camera and determining its dimensions and coordinates; searching for the person's face in an image from an infrared or thermal camera that is synchronized with said color image, and determining its dimensions and coordinates; comparing the dimensions and coordinates of the face determined in the color image and in the infrared or thermal image, and concluding from this comparison whether a person is present in front of said cameras; sending the images from the stereo camera and the infrared or thermal camera to a recognition server; on the recognition server, checking for the presence of a living person in the captured image by analyzing the color image from one camera of the stereo camera and, once the presence of a living person is confirmed, recognizing the person's face by comparing the face images received from the terminal with templates stored in a database and concluding whether or not they match a template; and sending the person recognition results from the recognition server to the terminal.

EFFECT: high reliability of identification.

33 cl, 2 dwg

Description

Technical field

The present invention relates to the field of biometric identification and, in particular, to a method, terminal and system for use in systems that rely on biometrics as a means of identification and authentication.

State of the art

Well-known methods of biometric identification include identification by fingerprint, face, iris, hand geometry, vein pattern, voice, handwriting, etc.

Currently, the biometric features most often used for identification are voice and human facial metrics in the visible and/or infrared (IR) ranges. Other types of features either do not provide sufficient accuracy or speed of user identification, or require contact with a reader (fingerprint scanner, vein pattern scanner, etc.).

A minimal face identification system consists of a CCTV camera, a capture device, and software that performs image analysis. Face recognition software is based on complex mathematical algorithms that are computationally intensive.

Most devices and algorithms are 2D (two-dimensional) face recognition systems, a consequence of the widespread use of video surveillance. In such solutions, a continuous video stream from the camera is split into frames, from which, after some processing, the image regions containing a person's face are extracted. These regions are processed by a program that looks for the highest possible degree of similarity between the presented image and a set of pre-stored face images registered under unique identifiers. The greatest advances in this area are associated with neural networks. The search parameters that form a multidimensional model must be computed in advance by training the neural network on specially prepared datasets. Network training is a non-trivial and the most labor-intensive preparatory stage; together with the size and quality of the dataset, it determines the model's future ability to identify people. At the same time, such systems process 2D models of the user's face and are therefore vulnerable: an attacker can gain unauthorized access by presenting a photograph of a registered user to the camera.
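The matching step described above can be sketched as a nearest-template search over feature vectors. This is a minimal illustration, assuming that a trained neural network (not shown) has already turned each face crop into a fixed-length embedding; the cosine similarity measure, the hypothetical `identify` function and the acceptance threshold are illustrative choices, not the method of this patent or of any particular product.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def identify(probe: List[float],
             gallery: Dict[str, List[float]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Return (best_id, best_score); best_id is None if no template reaches the threshold."""
    best_id, best_score = None, -1.0
    # Compare the probe embedding with every enrolled template (1:N search).
    for user_id, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = user_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

With a toy gallery such as `{"emp-001": [1.0, 0.0, 0.0], "emp-002": [0.0, 1.0, 0.0]}`, the probe `[0.9, 0.1, 0.0]` is assigned to `emp-001`, while a probe whose best score falls below the threshold is reported as unknown rather than forced onto the closest template.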

3D (three-dimensional) recognition technology can use the same mathematical apparatus as 2D, but differs in the larger number of parameters that can be analyzed.

Known 3D recognition solutions include scanners with structured laser illumination and photogrammetric stereo scanners. The advantage of 3D technology is an order-of-magnitude lower rate of type 1 and type 2 errors. Realizing this advantage requires suitable scanners and training the neural network on suitable datasets.

All neural networks require training, but high-quality 2D datasets are far more readily available. An indirect indication that 2D solutions are more attractive is that 3D scanning is used primarily in verification systems.

Face recognition always has a non-zero error rate. This is because shooting conditions, lighting, face rotation in the frame and other factors differ from those under which the template was enrolled, not to mention that people often change their appearance, clothes, hairstyles, and so on. Lighting has the greatest impact. It is not always possible to provide uniform, shadow-free lighting (as specified in the Russian standard GOST R ISO/IEC 19794-5-2013). In practice, cameras installed in access control systems almost always end up facing strong backlight.

A type 1 error, FAR (False Acceptance Rate), occurs when the system matches a probe to an incorrect template; a type 2 error, FRR (False Rejection Rate), occurs when the system fails to find a match in the database even though a template for that person is enrolled there.
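The two error rates can be made concrete by counting errors over labelled comparison scores at a given decision threshold. A minimal sketch, assuming similarity scores where higher means more alike and a simple "accept if score is at least the threshold" rule; the function name and sample scores are hypothetical, and the labelling follows the text above (FAR as type 1, FRR as type 2).

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Compute (FAR, FRR) for similarity scores at a given decision threshold.

    genuine  -- scores of comparisons where probe and template are the same person
    impostor -- scores of comparisons between different people
    A comparison is accepted when score >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    # Impostor comparisons that were accepted are false acceptances.
    false_accepts = sum(1 for s in impostor if s >= threshold)
    # Genuine comparisons that were rejected are false rejections.
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


genuine = [0.91, 0.85, 0.78, 0.95]
impostor = [0.30, 0.55, 0.82, 0.40]
print(far_frr(genuine, impostor, 0.8))  # (0.25, 0.25)
print(far_frr(genuine, impostor, 0.9))  # (0.0, 0.5)
```

Raising the threshold trades a lower FAR for a higher FRR, which is why the operating point is chosen per application, as the following paragraphs discuss.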

The rapid development of biometric identification systems has in turn spurred the development of systems and methods for falsifying biometric parameters, known as spoofing.

Work on systems and methods for protecting biometric identification against falsification has produced many lines of defense against attacks. These are mainly sophisticated algorithms and various combinations of static and dynamic methods, including interactive recognition and multimodal biometrics, as well as combinations of biometric and non-biometric methods, which strictly speaking amount to verification rather than identification. All these methods significantly improve resistance to attacks. The price is greater system complexity and cost, as well as a significant increase in recognition time. As a result, the range of applications for such systems narrows.

As an example, consider a hypothetical automated time-and-attendance system, or an access control system for facilities without a special security regime. The priority requirements for such a system are high recognition speed, a moderate error rate, and protection against basic spoofing with photographs and images on a mobile device screen. Since staff have no interest in sabotage, a certain percentage of failed or false recognitions, up to a limit, does not affect their subjective assessment of the system's quality. Recognition speed matters much more: it is an intuitive criterion, since anyone can judge, without instruments, how quickly they pass through a turnstile using biometrics compared with a regular RFID card or even a paper pass checked visually.

The opposite case is remote access to government or banking services with biometric authentication. Here the priority property of the system is resistance to spoofing, which is often highly sophisticated. From the user's perspective, additional biometric checks in multimodal systems should not be tedious or inconvenient. Speed is of secondary importance, but only up to a point. Therefore, the systems that have found real commercial application are those that claim a high level of security and convenience, the latter being a subjective characteristic. The speed of a multimodal automatic identification and verification system is likewise judged in simple “fast” or “slow” terms.

Single-shot face recognition systems, which require no additional actions during identification (such as positioning oneself in a particular way in front of the camera, saying certain phrases, or manipulating objects), have become the most widespread. Such systems prioritize recognition speed, which gives the user a sense of comfort. They are mainly non-payment terminals for access control systems, loyalty programs, etc.

The most widespread implementations of single-shot face recognition use two-dimensional color images of the face. The limitations of existing methods include the impact of:

- the position and tilt of the face in the frame,

- emotions and masking factors (hairstyle, clothes, masks),

- the intensity and direction of lighting,

- the need for protection against spoofing.

The last two factors can be addressed by capturing images in the near-IR range (0.7-0.9 µm).

Near-infrared images are not distorted by ambient light or the shadows it casts, which is why non-payment hardware terminals typically use infrared cameras for anti-spoofing. Compared with RGB cameras, infrared cameras provide higher anti-spoofing accuracy. At the same time, compared with structured-light or time-of-flight (TOF) depth-sensing cameras used in 3D vision, infrared cameras are cheaper and simpler.

Compared with RGB cameras, infrared cameras are less affected by lighting and can capture high-quality face images in the dark, under strong backlight and under strong direct light (see, for example, patent document CN112364842A). Thermal infrared images show only real faces, so they can solve the spoofing problem, but the low resolution of thermal infrared images seriously degrades recognition performance (see, for example, patent documents US2020311238A1, CN107169483A).

The problem of detecting a live user (liveness detection) during single-shot recognition can be solved by stereo triangulation (see patent RU 2316051 C2). In this approach, a depth map of the object is built and used to decide whether the object has relief. This is an example of a solution in which face recognition and liveness detection are independent processes, and the final recognition decision is made by comparing weighted metrics against a threshold. It should be noted that this solution additionally analyzes behavioral features and the user's interactive responses (e.g., visual, auditory or kinesthetic) to a specific set of system commands, which requires additional computing power and processing time.
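As a rough illustration of how a stereo depth map can reveal relief, the sketch below converts disparity to depth with the standard pinhole relation Z = f·B/d and then measures how far the face points deviate from their best-fit plane: a flat photo or screen, even when tilted, stays close to a plane, while a real face does not. It assumes rectified cameras, a focal length in pixels and a baseline in metres; the plane-fit metric, the function names and the 5 mm threshold are illustrative choices, not the method of RU 2316051 C2.

```python
import math
from typing import List, Optional, Sequence, Tuple


def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Pinhole stereo triangulation Z = f * B / d; None for invalid disparity."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def relief_rms(points: Sequence[Tuple[float, float, float]]) -> float:
    """RMS depth residual of (u, v, z) points around their least-squares plane z = a*u + b*v + c."""
    n = len(points)
    su = sv = sz = suu = svv = suv = suz = svz = 0.0
    for u, v, z in points:
        su += u; sv += v; sz += z
        suu += u * u; svv += v * v; suv += u * v
        suz += u * z; svz += v * z
    # Normal equations of the plane fit, unknowns (a, b, c).
    a, b, c = _solve3([[suu, suv, su], [suv, svv, sv], [su, sv, n]],
                      [suz, svz, sz])
    sq = sum((z - (a * u + b * v + c)) ** 2 for u, v, z in points)
    return math.sqrt(sq / n)


def has_relief(points: Sequence[Tuple[float, float, float]],
               min_rms_m: float = 0.005) -> bool:
    """Hypothetical liveness rule: enough out-of-plane relief means a 3D object."""
    return relief_rms(points) >= min_rms_m
```

A tilted flat surface such as `z = 0.5 + 0.001*u + 0.002*v` gives a residual of essentially zero, whereas a bump of a few centimetres (a crude stand-in for a nose) gives a residual of several millimetres.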

In addition, installing and operating biometric identification terminals is complicated by the need to position and mount them so that they do not obstruct users or cause discomfort, while still giving the cameras a viewing angle suitable for capturing face images usable for identification and reducing the negative impact of backlight and/or direct light on the captured image.

Thus, biometric identification terminals currently have to meet the following requirements:

- high speed for the intended application;

- high accuracy in terms of type 1 and type 2 errors;

- resistance to interference (strong direct light and backlight, etc.) and to deliberate attacks (spoofing, etc.);

- good ergonomics during installation and operation.

Brief summary of the invention

The present invention is directed to solving at least some of the above problems.

In accordance with one aspect, the present invention provides a biometric identification method comprising the steps of:

- activating, in a biometric identification terminal, a human face search mode on color images obtained from the video stream of one camera of a stereo camera;

- pre-processing the images from the stereo camera and from the infrared or thermal camera of said terminal, the pre-processing including at least one of linear correction, white balance, adaptive exposure and noise removal;

- detecting and tracking a person's face in the image from said camera of the stereo camera and determining its dimensions and coordinates;

- searching for the person's face in an image from the infrared or thermal camera that is synchronized with said color image, and determining its dimensions and coordinates;

- comparing the dimensions and coordinates of the person's face determined in the color image and in the infrared or thermal image, and concluding, based on said comparison, whether a person is present in front of said cameras;

- normalizing the images from the stereo camera and the infrared or thermal camera and sending them to a recognition server;

- on the recognition server, recognizing the person's face by comparing the face images in said images received from the terminal with templates stored in a database, and concluding whether or not there is a match with a template;

- sending the person recognition results from the recognition server to the terminal.
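The terminal-side presence check (comparing face size and position in the color frame and in the infrared or thermal frame) can be sketched as a bounding-box consistency test. It assumes the two cameras have been calibrated so that an IR box maps into color-image pixels by a uniform scale plus offset (a simplification; a real rig may need a full homography), and the `Box` type, function names, IoU limit and area-ratio limit are hypothetical. A missing IR face returns `False`, consistent with the embodiment below in which the frame is discarded and the color search restarts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float  # width, pixels
    h: float  # height, pixels


def map_ir_to_color(b: Box, scale: float, dx: float, dy: float) -> Box:
    """Map an IR-frame box into color-frame pixels (assumed scale + offset calibration)."""
    return Box(b.x * scale + dx, b.y * scale + dy, b.w * scale, b.h * scale)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def person_present(color_face: Box, ir_face: Optional[Box],
                   scale: float, dx: float, dy: float,
                   min_iou: float = 0.5, max_area_ratio: float = 1.3) -> bool:
    """Both detections must agree in position (IoU) and size (area ratio)."""
    if ir_face is None:
        # No face in the IR/thermal frame: discard it and go back to color search.
        return False
    mapped = map_ir_to_color(ir_face, scale, dx, dy)
    area_c = color_face.w * color_face.h
    area_i = mapped.w * mapped.h
    if area_c <= 0 or area_i <= 0:
        return False
    ratio = max(area_c, area_i) / min(area_c, area_i)
    return ratio <= max_area_ratio and iou(color_face, mapped) >= min_iou
```

A photo on a phone screen typically yields a color detection but no matching thermal face, so it fails this check before any image is sent to the server.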

According to one embodiment of the method, the human face search mode on images from one color camera of the stereo camera is activated in the biometric identification terminal in response to detecting motion in the images captured by that camera, accompanied by a change in the overall illumination in the frame.
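The wake-up condition can be sketched with frame differencing on grayscale frames plus a check on mean brightness. This assumes frames are 2D lists of 0-255 intensities; the function names and thresholds (25 grey levels per pixel, 2 % of pixels changed, a 3-level change in mean brightness) are hypothetical tuning values, not figures from the patent.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame, intensities 0..255


def mean_brightness(f: Frame) -> float:
    """Overall illumination level of the frame."""
    total = sum(sum(row) for row in f)
    count = sum(len(row) for row in f)
    return total / count if count else 0.0


def changed_fraction(prev: Frame, cur: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_delta."""
    changed = count = 0
    for r_prev, r_cur in zip(prev, cur):
        for p, c in zip(r_prev, r_cur):
            count += 1
            if abs(c - p) > pixel_delta:
                changed += 1
    return changed / count if count else 0.0


def should_wake(prev: Frame, cur: Frame,
                min_motion: float = 0.02, min_light_delta: float = 3.0) -> bool:
    """Activate face search only when motion is accompanied by an illumination change."""
    motion = changed_fraction(prev, cur) >= min_motion
    light = abs(mean_brightness(cur) - mean_brightness(prev)) >= min_light_delta
    return motion and light
```

Requiring both conditions filters out sensor noise (no illumination change) and slow global lighting drift (no localized motion).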

According to another embodiment of the method, before the human face search mode is activated, the illumination of the area of interest in which a human face may appear, provided by the backlight unit, is dimmed, no face search is performed, and images from one camera of the stereo camera are analyzed only for the overall illumination level; once the face search mode is activated, the backlight switches to its operating level.
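The standby/active behaviour of the backlight can be expressed as a two-state controller. A minimal sketch: the class and its brightness levels (arbitrary fractions of full power) are hypothetical, and the rule for returning to standby after a run of frames without a face is an assumption added for completeness, since the text above only describes the transition into the active mode.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"  # backlight dimmed, only overall brightness is analysed
    ACTIVE = "active"    # backlight at operating level, face search running


class BacklightController:
    """Hypothetical two-state controller for the terminal's backlight unit."""

    def __init__(self, dim_level: float = 0.1, work_level: float = 1.0,
                 idle_frames_to_standby: int = 50) -> None:
        self.dim_level = dim_level
        self.work_level = work_level
        # Assumption: return to standby after this many frames without a face.
        self.idle_frames_to_standby = idle_frames_to_standby
        self.mode = Mode.STANDBY
        self.level = dim_level
        self._idle = 0

    def on_frame(self, wake_trigger: bool, face_found: bool) -> Mode:
        if self.mode is Mode.STANDBY:
            # No face search in standby; only the wake trigger is evaluated.
            if wake_trigger:
                self.mode = Mode.ACTIVE
                self.level = self.work_level
                self._idle = 0
        else:
            self._idle = 0 if face_found else self._idle + 1
            if self._idle >= self.idle_frames_to_standby:
                self.mode = Mode.STANDBY
                self.level = self.dim_level
        return self.mode
```

The `wake_trigger` argument would come from a motion-plus-illumination test such as the one described in the previous embodiment.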

According to another embodiment of the method, at the pre-processing step, images from the stereo camera and the infrared or thermal camera are written to a ring buffer and subjected to said pre-processing while their synchronization is preserved.
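Keeping the three streams synchronized in a ring buffer can be illustrated with bounded per-stream queues and timestamp matching. A minimal sketch, assuming each frame carries a capture timestamp in milliseconds; the class name, stream names, capacity and tolerance are hypothetical. Python's `collections.deque` with `maxlen` discards the oldest entry on overflow, which gives ring-buffer behaviour.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class SyncRingBuffer:
    """Fixed-size ring buffers for three streams; old frames are overwritten."""

    def __init__(self, capacity: int = 8, tolerance_ms: float = 10.0) -> None:
        self.streams: Dict[str, Deque[Tuple[float, Any]]] = {
            name: deque(maxlen=capacity) for name in ("left", "right", "ir")
        }
        self.tolerance_ms = tolerance_ms

    def push(self, stream: str, ts_ms: float, frame: Any) -> None:
        self.streams[stream].append((ts_ms, frame))

    def _nearest(self, stream: str, ts_ms: float) -> Optional[Tuple[float, Any]]:
        """Buffered frame of `stream` whose timestamp is closest to ts_ms."""
        best = None
        for t, f in self.streams[stream]:
            if best is None or abs(t - ts_ms) < abs(best[0] - ts_ms):
                best = (t, f)
        return best

    def latest_synced(self) -> Optional[Tuple[Any, Any, Any]]:
        """(left, right, ir) frames matching the newest left frame, or None."""
        if not self.streams["left"]:
            return None
        ts, left = self.streams["left"][-1]
        right = self._nearest("right", ts)
        ir = self._nearest("ir", ts)
        if right is None or ir is None:
            return None
        if abs(right[0] - ts) > self.tolerance_ms or abs(ir[0] - ts) > self.tolerance_ms:
            return None  # partner frames too far apart in time
        return left, right[1], ir[1]
```

Pre-processing would then be applied to the returned triple, so the three images stay paired throughout.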

According to another embodiment of the method, the search for the person's face in the image from the infrared or thermal camera is performed over the entire field of the captured image.

According to another embodiment of the method, if no face is detected at the step of searching the image from the infrared or thermal camera, that image is discarded, tracking of the face in the color image stops, and the method returns to the step of searching for a human face in the images from one color camera of the stereo camera.

According to another embodiment of the method, before recognizing the person's face, the recognition server checks for the presence of a living person in the captured image by analyzing the color image from one camera of the stereo camera, analyzing both the person's face and the background objects in the image.

According to another embodiment of the method, before recognizing the person's face, the recognition server checks for the presence of a living person in the captured image by analyzing a depth map built from two synchronous color images from the stereo camera.

According to another embodiment of the method, the depth map is analyzed by comparing the depth map currently computed from two synchronous color images from the stereo camera with a depth map template representing the relief of an average human face.
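The template comparison can be sketched as a root-mean-square difference between the measured depth map and the average-face template after removing each map's mean depth, so that only the relief, not the distance to the camera, is compared. This assumes both maps are already cropped and resampled to the same grid around the face (pose alignment is not shown); the function names, the 60 % valid-pixel requirement and the 5 mm RMSE limit are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Depth values in metres; None marks pixels with no stereo match.
DepthMap = Sequence[Sequence[Optional[float]]]


def _valid_pairs(depth: DepthMap, template: DepthMap) -> List[Tuple[float, float]]:
    """(measured, template) pairs where both maps have a valid value."""
    return [(d, t)
            for row_d, row_t in zip(depth, template)
            for d, t in zip(row_d, row_t)
            if d is not None and t is not None]


def template_rmse(depth: DepthMap, template: DepthMap,
                  min_valid: float = 0.6) -> Optional[float]:
    """Relief RMSE against the template, or None if too few pixels are valid."""
    pairs = _valid_pairs(depth, template)
    total = sum(len(row) for row in template)
    if total == 0 or len(pairs) < min_valid * total:
        return None
    mean_d = sum(d for d, _ in pairs) / len(pairs)
    mean_t = sum(t for _, t in pairs) / len(pairs)
    # Subtract each map's mean so the overall distance to the camera cancels out.
    sq = sum(((d - mean_d) - (t - mean_t)) ** 2 for d, t in pairs)
    return math.sqrt(sq / len(pairs))


def matches_face_template(depth: DepthMap, template: DepthMap,
                          max_rmse: float = 0.005) -> bool:
    rmse = template_rmse(depth, template)
    return rmse is not None and rmse <= max_rmse
```

A real face at a different distance from the camera has the same relief as the template and passes, whereas a flat photo has no relief at all, so its error equals the template's own depth spread and exceeds the limit.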

According to another embodiment of the method, before recognizing the person's face, the recognition server checks for the presence of a living person in the captured image by analyzing synchronized color images from both cameras of the stereo camera.

According to another embodiment of the method, before recognizing the person's face, the recognition server checks for the presence of a living person in the captured image by analyzing synchronized color images from both cameras of the stereo camera and an image from the infrared camera.

According to another embodiment of the method, said analysis to determine the presence of a living person in the captured image is performed by a neural network.

According to another embodiment, the method further comprises additionally confirming the identification by recording an audio signal from the terminal's microphone(s) in synchronization with capturing video from the cameras, and identifying the person by the voice extracted from the recorded audio signal.

According to another embodiment, the method further comprises additionally verifying the presence of a living person by matching the movement of the person's lips in the video images against a phrase spoken by the person and captured by recording an audio signal from the terminal's microphone(s).
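The lip-movement check can be illustrated with an audio-visual synchrony test. The source does not say how lip movement is matched to the phrase, so the sketch below swaps in a simple technique: it correlates a per-frame mouth-opening measure with the per-frame audio energy over a few frame shifts. The inputs mouth_open and audio_energy, the lag range, and the 0.6 threshold are hypothetical, and verifying which phrase was spoken would additionally require speech recognition.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lips_match_audio(mouth_open: Sequence[float], audio_energy: Sequence[float],
                     max_lag: int = 3, threshold: float = 0.6) -> Tuple[bool, float]:
    """Hypothetical audio-visual synchrony check.

    mouth_open: per-video-frame mouth opening (e.g. lip distance / face height).
    audio_energy: per-video-frame RMS energy of the synchronized audio.
    The best correlation over shifts of up to max_lag frames is compared with threshold.
    """
    if len(mouth_open) != len(audio_energy):
        raise ValueError("sequences must be aligned frame by frame")
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            v, a = mouth_open[lag:], audio_energy[:len(audio_energy) - lag]
        else:
            v, a = mouth_open[:lag], audio_energy[-lag:]
        if len(v) < 2:
            continue  # too little overlap at this shift
        best = max(best, pearson(v, a))
    return best >= threshold, best
```

Allowing a small lag tolerates a fixed offset between the audio and video capture paths; a recording played back with no visible mouth movement yields a correlation near zero.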

According to another embodiment, the method further comprises additionally confirming the identification by reading, with a contactless reader, a tag or card confirming the person's identity.

In accordance with another aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to carry out the above-described biometric identification method.

In accordance with yet another aspect, the present invention provides a biometric identification terminal comprising a camera unit, which contains an illumination unit, a stereo camera, and an infrared or thermal camera, and a processing unit connected to the camera unit, wherein the processing unit is configured to:

- activate a human face search mode on images obtained from the video stream of one color camera of the stereo camera;

- pre-process the images from the stereo camera and from the infrared or thermal camera of said terminal, the pre-processing including at least one of linear correction, white balance, adaptive exposure, and noise removal;

- detect and track a person's face in an image from said one color camera of the stereo camera and determine its size and coordinates;

- search for the person's face in an image from the infrared or thermal camera that is synchronized with said color camera image, and determine its size and coordinates;

- compare the size and coordinates of the person's face determined in the color camera image with those determined in the infrared or thermal camera image, and conclude, based on this comparison, whether a person is present in front of said cameras;

- normalize the images from the stereo camera and from the infrared or thermal camera and send them to the recognition server;

- receive person recognition results from the recognition server.
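The comparison of face size and coordinates between the color image and the IR/thermal image can be sketched as follows. The sketch assumes, as the hardware description suggests (identical sensors, matched fields of view, parallel axes), that boxes expressed in coordinates normalized to the frame size are directly comparable, with the small parallax between the cameras absorbed by a tolerance. The Box type, the function names, and both tolerances are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Face bounding box in coordinates normalized to the frame size (0..1)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self) -> tuple:
        return (self.x + self.w / 2, self.y + self.h / 2)


def _ratio(a: float, b: float) -> float:
    """Symmetric size ratio, always >= 1."""
    return max(a, b) / min(a, b)


def person_present(color: Box, ir: Optional[Box],
                   max_size_ratio: float = 1.25,
                   max_center_shift: float = 0.5) -> bool:
    """Decide whether the color and IR/thermal detections describe the same real face.

    max_size_ratio and max_center_shift (measured in face widths) are hypothetical.
    """
    if ir is None:
        return False  # no face in the IR/thermal frame: no person confirmed
    if min(color.w, color.h, ir.w, ir.h) <= 0:
        return False  # degenerate detection
    if _ratio(color.w, ir.w) > max_size_ratio or _ratio(color.h, ir.h) > max_size_ratio:
        return False  # face sizes disagree too much
    (cx1, cy1), (cx2, cy2) = color.center, ir.center
    shift = math.hypot(cx1 - cx2, cy1 - cy2) / color.w
    return shift <= max_center_shift
```

A face shown on a phone screen typically produces a color detection but no matching IR detection, so the function returns False and the frame can be dropped before anything is sent to the server.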

According to one embodiment of the terminal, the processing unit is configured to activate the human face search mode on images obtained from the video stream of one color camera of the stereo camera in response to detecting motion in the images captured by said color camera that is accompanied by a change in the overall illumination in the frame.
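A minimal sketch of this activation condition, assuming grayscale frames given as flat lists of 0-255 values: motion is measured as the share of pixels that changed noticeably between consecutive frames, and the illumination change as the shift in mean frame brightness. The function name and all three thresholds are hypothetical.

```python
from typing import Sequence


def should_activate_face_search(prev: Sequence[int], curr: Sequence[int],
                                pixel_thresh: int = 25,
                                motion_fraction: float = 0.02,
                                brightness_delta: float = 8.0) -> bool:
    """True when two consecutive frames show motion AND a change in overall brightness."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    # Motion: share of pixels whose value changed by more than pixel_thresh.
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_thresh)
    motion = changed / len(curr) >= motion_fraction
    # Illumination: change in the mean brightness of the whole frame.
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    lighting = abs(mean_curr - mean_prev) >= brightness_delta
    return motion and lighting
```

Requiring both conditions means that sensor noise (small per-pixel changes) and a slow, uniform lighting drift (no localized motion) do not wake the terminal, whereas a person stepping into the frame usually produces both.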

According to another embodiment of the terminal, until the human face search mode is activated, the illumination that the illumination unit directs at the area of interest, where a human face may appear, is dimmed; no face search is performed, and the images from one camera of the stereo camera are analyzed only for the overall illumination level. Once the human face search mode is activated, the illumination is brought up to its operating level.

According to another embodiment of the terminal, the processing unit is configured to write the images from the stereo camera and from the infrared or thermal camera into a ring buffer and subject them to said pre-processing while keeping them synchronized.
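One way to keep the frames from the three cameras synchronized in a ring buffer is to group them by a shared frame index and store only complete sets, as in the sketch below. It assumes that each camera delivers a common frame counter (plausible given that the cameras operate synchronously, but not stated in the source); the class and camera names are hypothetical, and collections.deque with maxlen supplies the ring-buffer behavior.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, Optional, Tuple

FrameSet = Tuple[int, Dict[str, Any]]


class SyncRingBuffer:
    """Ring buffer of synchronized frame sets, one frame per camera per frame index."""

    def __init__(self, cameras: Iterable[str], capacity: int = 8) -> None:
        self.cameras = tuple(cameras)
        # deque(maxlen=...) discards the oldest set when full: ring-buffer behavior.
        self.sets: Deque[FrameSet] = deque(maxlen=capacity)
        self._pending: Dict[int, Dict[str, Any]] = {}

    def push(self, camera: str, frame_id: int, frame: Any) -> None:
        if camera not in self.cameras:
            raise KeyError(camera)
        group = self._pending.setdefault(frame_id, {})
        group[camera] = frame
        if len(group) == len(self.cameras):
            # All cameras delivered this frame index: store it as one synchronized set.
            del self._pending[frame_id]
            self.sets.append((frame_id, group))
            # Older incomplete sets are treated as lost (a camera dropped a frame).
            for stale in [fid for fid in self._pending if fid < frame_id]:
                del self._pending[stale]

    def latest(self) -> Optional[FrameSet]:
        return self.sets[-1] if self.sets else None
```

Pre-processing can then be applied to a whole set at once, for example {cam: preprocess(f) for cam, f in frames.items()} with a hypothetical preprocess function, so that the color and IR frames stay paired throughout.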

According to another embodiment of the terminal, the processing unit is configured to search for the person's face in the image from the infrared or thermal camera over the entire field of the captured image.

According to another embodiment of the terminal, if no human face is detected in the image from the infrared or thermal camera during the face search step, the processing unit is configured to discard that image, stop tracking the person's face in the color camera image, and return to the step of searching for a human face in the images from the one color camera of the stereo camera.
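The interplay of the dimmed idle state, the face search, and the fallback when no face is found in the IR/thermal image can be summarized as a small state machine. This is a sketch, not the terminal's firmware: the mode names, the string return values, and the idle_after timeout for dimming the illumination again are hypothetical (the source does not say when the terminal returns to the dimmed state).

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    IDLE = auto()    # illumination dimmed, frames checked only for overall brightness
    SEARCH = auto()  # illumination at operating level, face search on one color camera
    TRACK = auto()   # face tracked in the color image, confirmed in the IR/thermal image


class TerminalController:
    def __init__(self, idle_after: int = 50) -> None:
        self.mode = Mode.IDLE
        self.illumination = "dimmed"
        # ASSUMPTION (not from the source): consecutive frames without a color-camera
        # face after which the terminal dims the illumination again.
        self.idle_after = idle_after
        self._empty_frames = 0

    def step(self, trigger: bool, color_face: bool, ir_face: bool) -> Optional[str]:
        """Advance by one synchronized frame set.

        trigger: motion + illumination-change detector fired (used only in IDLE).
        color_face / ir_face: a face was found in the color / IR-or-thermal image.
        Returns "send", "discard", or None.
        """
        if self.mode is Mode.IDLE:
            if trigger:
                self.mode, self.illumination = Mode.SEARCH, "operating"
            return None
        if not color_face:
            self._empty_frames += 1
            self.mode = Mode.SEARCH
            if self._empty_frames >= self.idle_after:
                self.mode, self.illumination = Mode.IDLE, "dimmed"
                self._empty_frames = 0
            return None
        self._empty_frames = 0
        if not ir_face:
            # No face in the IR/thermal image: discard it, stop tracking, search again.
            self.mode = Mode.SEARCH
            return "discard"
        self.mode = Mode.TRACK
        return "send"
```

A "send" result here only means the frame set proceeds to the size and coordinate comparison and, if that succeeds, to normalization and transmission to the recognition server.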

According to another embodiment of the terminal, the camera unit is connected by a rotating bracket to a housing that contains the processing unit.

In accordance with yet another aspect, the present invention provides a biometric identification system comprising the above-described terminal and a recognition server configured to:

- receive normalized images from the terminal;

- perform face recognition, which includes comparing the images of the person's face in the normalized images received from the terminal with templates stored in a database and deciding whether or not there is a match with a template;

- send the person recognition results to the terminal.
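Template comparison on a recognition server is commonly implemented by comparing feature vectors (embeddings) produced by a face recognition network; the source does not specify the representation. Assuming such vectors, a minimal sketch of the match/no-match decision using cosine similarity follows. The function names, the dictionary layout of the template database, and the 0.6 threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_template(probe: Sequence[float],
                   templates: Dict[str, Sequence[float]],
                   threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (person_id, score) for the best template, or (None, score) if below threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for person_id, template in templates.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None, best_score
```

Returning the score alongside the decision lets the terminal log borderline cases or request an additional factor (voice, card) when the best match is close to the threshold.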

According to one embodiment of the system, before recognizing the person's face, the recognition server is configured to check for the presence of a living person in the captured image by analyzing a color image from one camera of the stereo camera, wherein both the person's face and the background objects in the image are analyzed.

According to one embodiment of the system, before recognizing the person's face, the recognition server is configured to check for the presence of a living person in the captured image by analyzing a depth map constructed from two synchronized color images from the stereo camera.

According to one embodiment of the system, the recognition server is configured to analyze the depth map by comparing the depth map currently computed from two synchronized color images from the stereo camera with a depth map template representing the relief of an averaged human face.
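For a rectified stereo pair, depth follows from disparity as Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity in pixels. The sketch below converts disparity to depth and then compares the face relief with an averaged-face template by the RMS difference after removing the mean depth, so that the distance of the face from the camera does not matter. Alignment of the two maps is assumed to be done already, and the RMS criterion, the function names, and the 5 mm threshold are hypothetical choices rather than the method of the source.

```python
import math
from typing import List, Optional, Sequence


def disparity_to_depth(disparity: Sequence[float], focal_px: float,
                       baseline_m: float) -> List[Optional[float]]:
    """Depth Z = f * B / d for a rectified stereo pair; d <= 0 means no match (None)."""
    return [focal_px * baseline_m / d if d > 0 else None for d in disparity]


def relief_rmse(depth: Sequence[Optional[float]], template: Sequence[float]) -> float:
    """RMS difference between the face relief and an averaged-face template.

    Both maps are assumed to be aligned and resampled to the same grid. The mean
    depth over the valid pixels is subtracted from each map, so only the shape is compared.
    """
    if len(depth) != len(template):
        raise ValueError("maps must have the same size")
    pairs = [(z, t) for z, t in zip(depth, template) if z is not None]
    if not pairs:
        raise ValueError("no valid depth values")
    mz = sum(z for z, _ in pairs) / len(pairs)
    mt = sum(t for _, t in pairs) / len(pairs)
    return math.sqrt(sum(((z - mz) - (t - mt)) ** 2 for z, t in pairs) / len(pairs))


def looks_like_face(depth: Sequence[Optional[float]], template: Sequence[float],
                    max_rmse_m: float = 0.005) -> bool:
    return relief_rmse(depth, template) <= max_rmse_m
```

A flat photograph has almost no relief, so its RMS difference from the template is roughly the depth variation of the template itself, which is what makes it fail the test.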

According to one embodiment of the system, before recognizing the person's face, the recognition server is configured to check for the presence of a living person in the captured image by analyzing synchronized color images from both cameras of the stereo camera.

According to one embodiment of the system, before recognizing the person's face, the recognition server is configured to check for the presence of a living person in the captured image by analyzing synchronized color images from both cameras of the stereo camera together with an image from the infrared camera.

According to one embodiment of the system, said analysis for determining the presence of a living person in the captured image is performed in the recognition server by a neural network.

According to one embodiment of the system, the terminal further comprises one or more microphones for capturing an audio signal, and the system is configured to additionally confirm the identification by recording the audio signal from the terminal's microphone(s) in synchronization with the capture of video images from the cameras and identifying the person by the voice extracted from the recorded audio signal.

According to one embodiment, the system is configured to additionally verify the presence of a living person by matching the movement of the person's lips in the video images against a phrase spoken by the person and captured by recording an audio signal from the terminal's microphone(s).

According to one embodiment of the system, the terminal further comprises a contactless reader, and the system is configured to additionally confirm the identification by reading, with the contactless reader, a tag or card confirming the person's identity.

Thus, the present invention provides a simple and inexpensive biometric identification solution that offers high speed, accuracy, and reliability, together with a high level of protection against unauthorized access.

Brief Description of the Drawings

The invention is further explained below by a description of preferred embodiments with reference to the accompanying drawings, in which:

Fig. 1 shows an exemplary embodiment of a biometric identification terminal in accordance with the present invention;

Fig. 2 shows an exemplary flowchart of the biometric identification process in accordance with the present invention.

Description of Preferred Embodiments of the Invention

In accordance with one aspect, the present invention discloses a biometric identification terminal.

An exemplary biometric identification terminal (see Fig. 1) comprises a camera unit (1) containing an illumination unit (2), a stereo camera (3), and an infrared camera (4), as well as a processing unit in a housing (7) that is connected to the camera unit (1) by a rotating bracket (5). Optionally, the biometric identification terminal in accordance with the present invention may comprise a display (6) and a contactless reader (8).

The terminal is part of a distributed information system. Its operation is determined by software, both its own and network-based. The terminal according to the exemplary embodiment is equipped with all the peripherals needed to implement most of the functions required of a universal face recognition terminal. The peripherals are connected to the processing unit through standard interfaces, so the software implementing a particular operating algorithm can be changed without changing the hardware.

It should be noted that the terminal in accordance with the present invention can be used, for example, in e-commerce, electronic banking, electronic document management, and access control systems. In such cases, in addition to biometric identification itself, the terminal is further configured to perform functions specific to the application area. For example, the terminal can be configured to process payments (transactions) for goods or services in e-commerce systems, or to control user access to premises (opening a door or turnstile, arming or disarming an alarm, etc.) in access control systems.

The camera unit (1) on the rotating bracket (5) comprises the stereo camera (3), the infrared camera (4), and the illumination unit (2). The illumination unit (2) provides white and infrared illumination. All three cameras have identical high-resolution sensors mounted in portrait orientation and identical optics, and they operate synchronously. The camera axes are parallel. The middle camera in Fig. 1 has an IR filter that passes wavelengths of 700 nm and above, which makes it the IR camera. The two outer cameras form an RGB stereo pair (the stereo camera); they are fitted with filters that block IR radiation longer than 650 nm to avoid color distortion under strong IR illumination. Mounting the sensors in portrait orientation is advantageous when processing video of people of different heights because it increases the vertical coverage of the service area; combined with wide-angle, high-aperture optics, it covers the entire range of user heights.
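The advantage of portrait orientation can be quantified with a pinhole model: the height covered at distance d is d * h / f, where h is the sensor dimension along the vertical axis and f is the focal length. Turning a 4:3 sensor to portrait therefore increases vertical coverage by a factor of 4/3 at the same distance. The sensor, lens, and distance values below are assumed for illustration only.

```python
def vertical_coverage_m(distance_m: float, sensor_vertical_mm: float, focal_mm: float) -> float:
    """Height of the scene covered at a given distance (pinhole model).

    coverage = 2 * d * tan(FOV / 2) and tan(FOV / 2) = (sensor / 2) / f,
    so coverage = d * sensor / f.
    """
    return distance_m * sensor_vertical_mm / focal_mm


# ASSUMED example values, not taken from the source:
SENSOR_LONG_MM, SENSOR_SHORT_MM = 6.4, 4.8  # a 4:3 sensor
FOCAL_MM = 3.6
DISTANCE_M = 0.8

landscape = vertical_coverage_m(DISTANCE_M, SENSOR_SHORT_MM, FOCAL_MM)
portrait = vertical_coverage_m(DISTANCE_M, SENSOR_LONG_MM, FOCAL_MM)
print(f"landscape: {landscape:.2f} m, portrait: {portrait:.2f} m")
# prints "landscape: 1.07 m, portrait: 1.42 m"
```

The horizontal coverage shrinks by the same factor, which matters less at a terminal where users stand roughly in front of the cameras.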

The adjustable illumination unit (2) contains IR emitters (for example, 940 nm) and white LEDs. The illumination unit (2) serves to bring the image into the required illumination range. Placing the illumination assembly in the camera unit makes efficient use of the luminous flux. Adaptive control of the illumination brightness uses the brightness level of the face found by the search algorithm as its control signal.
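A minimal sketch of the adaptive brightness control, assuming a proportional controller that nudges the emitter level toward a target face brightness on each frame. The source states only that the face brightness is the control signal; the controller type, the target value, the gain, and the level limits here are hypothetical.

```python
from typing import Sequence, Tuple


def face_mean_brightness(gray: Sequence[Sequence[float]],
                         box: Tuple[int, int, int, int]) -> float:
    """Mean gray level inside the face box (x, y, w, h in pixels) found by the face search."""
    x, y, w, h = box
    values = [v for row in gray[y:y + h] for v in row[x:x + w]]
    if not values:
        raise ValueError("empty face box")
    return sum(values) / len(values)


def update_illumination(level: float, face_brightness: float, target: float = 128.0,
                        gain: float = 0.002, lo: float = 0.05, hi: float = 1.0) -> float:
    """One proportional-control step; level is the emitter output as a fraction of maximum."""
    new_level = level + gain * (target - face_brightness)
    return min(hi, max(lo, new_level))
```

Measuring only the face region, rather than the whole frame, keeps a bright background or a dark corridor from pulling the illumination away from the level that suits recognition.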

The stereo camera (3) is used to construct a depth map of the image. This map is used to detect flat images and photographs, preventing unauthorized access by presenting a photograph or flat image of a registered user to the camera. One of the cameras of the stereo pair can serve as the image source for digital video communication and for further processing of the image frames that have passed the anti-spoofing check. In other words, the images from the stereo pair serve several purposes at once: 3D detection as well as the main task, 2D recognition.
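How a depth map exposes a flat photograph can be illustrated by fitting a plane to the depth points of the face region: a printed photo or a screen, even when tilted, is explained almost perfectly by a plane, whereas a real face leaves a residual of several millimeters. This plane-fit test is an illustrative technique chosen here, not necessarily the one used in the terminal; the point format and the 5 mm threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x_px, y_px, depth_m)


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares plane z = a*x + b*y + c (normal equations solved by Cramer's rule)."""
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    d = _det3(a)
    if abs(d) < 1e-9:
        raise ValueError("points are collinear or too few")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def is_flat(points: Sequence[Point], max_rms_m: float = 0.005) -> bool:
    """True if the face region is well explained by a plane (photo or screen suspected)."""
    a, b, c = fit_plane(points)
    rms = math.sqrt(sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points) / len(points))
    return rms <= max_rms_m
```

Fitting a tilted plane, rather than just checking that all depths are equal, is what catches a photo held at an angle to the cameras.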

A high-resolution infrared camera (4) is used to detect attacks that use images displayed on mobile device screens. The IR camera (4) operates in the near-IR range (for example, 800-960 nm) and produces a monochrome image. Its sensor is likewise mounted in portrait orientation. The field of view of the IR camera (4) is matched to that of the main cameras of the stereo pair, so the processing software can register the IR image with the color image across the entire available field of view. Device screens have low brightness and contrast in this spectral region, so if no recognition target is present in the IR frame, no further processing of the images from the stereo camera (3) is needed. Using the IR camera (4) in addition to the main color camera therefore substantially reduces the processor load and heat generation. A monochrome image has a smaller data volume and, because the algorithm applied to it is simpler, can also have a lower resolution. Because processing in the IR channel runs in parallel with processing in the visible range, the computational complexity and the volume of processed data are reduced and the processing speed increases, since the main processing channel does not need to analyze this attack vector.
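The IR gating described above can be sketched as follows. The source gates further processing on the presence of a recognition target in the IR frame; as a simple stand-in, this sketch treats a face region that is too dark or too uniform in the NIR image as a screen, since device screens show low brightness and contrast there. The thresholds and function names are hypothetical.

```python
import statistics
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def ir_face_plausible(ir_face_pixels: Sequence[float],
                      min_mean: float = 40.0, min_std: float = 12.0) -> bool:
    """Heuristic: a real face under NIR illumination is reasonably bright and textured,
    while a phone or tablet screen appears dark and flat. Thresholds are hypothetical."""
    if len(ir_face_pixels) < 2:
        return False
    return (statistics.fmean(ir_face_pixels) >= min_mean
            and statistics.pstdev(ir_face_pixels) >= min_std)


def gated_stereo(ir_face_pixels: Sequence[float],
                 run_stereo: Callable[[], T]) -> Optional[T]:
    """Run the costly stereo/color pipeline only if the IR channel sees a plausible face."""
    if not ir_face_plausible(ir_face_pixels):
        return None
    return run_stereo()
```

Passing the stereo pipeline as a callable makes the saving explicit: for a screen attack the expensive function is never invoked.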

In an alternative embodiment, a low-resolution thermal camera, for example of the FLIR Lepton class, is used instead of the IR camera (4). It operates in the mid-wave and long-wave infrared regions of the spectrum (2-20 µm). The low resolution (for example, 160×120 pixels) of inexpensive mass-market thermal cameras does not allow them to be used for face recognition, but such a camera effectively screens out non-living objects and is used in addition to the main cameras to detect latex masks, facial overlays, and other imitation methods, all of which inevitably alter the thermal signature of the face image. In addition, the thermal camera can be used to estimate a person's body temperature and, in access control systems, can additionally serve to identify people with an elevated temperature.
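A minimal sketch of the two thermal checks, assuming the thermal camera provides per-pixel temperatures in degrees Celsius for the face region (a radiometric camera). The 30 °C liveness floor and the 37.5 °C alert level are hypothetical; skin-surface temperature differs from core body temperature, so a real system would need calibration.

```python
from typing import Sequence


def thermal_check(face_temps_c: Sequence[float],
                  live_min_c: float = 30.0, fever_c: float = 37.5) -> str:
    """Classify the face region of a radiometric thermal frame.

    Returns "no_face", "not_live" (too cold for live skin), "elevated", or "normal".
    Thresholds are hypothetical.
    """
    if not face_temps_c:
        return "no_face"
    peak = max(face_temps_c)  # the warmest facial point is used as the estimate
    if peak < live_min_c:
        return "not_live"
    if peak >= fever_c:
        return "elevated"
    return "normal"
```

Because the liveness decision and the temperature alert read the same face region, the thermal camera can serve both purposes at no extra capture cost.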

The rotating bracket (5) allows the tilt and rotation of the camera unit (1) to be adjusted relative to the terminal housing (7) and, consequently, relative to the display (6). The rotating bracket (5) has a cavity through which the cables from the cameras and the illumination unit pass into the terminal housing (7). The rotating bracket (5) substantially improves the usability of the terminals, for example when the terminal housing (7) is mounted on a wall next to a passageway. The ability to rotate the camera unit (1) solves the problem of the cameras' field of view: covering a larger space would otherwise require a wide field of view, and short-focus lenses suffer from greater distortion and other geometric aberrations. The rotating bracket (5) makes it possible to use narrower-angle optics and to match the required coverage with the cameras' field of view more effectively. The permissible viewing angle of a good display reaches 170-175 degrees, whereas the usable field of view of the cameras is limited by geometric distortion. Thus, the rotating bracket offers advantages both in the placement of the terminal and in the field of view of its cameras, which broadens the range of applications of the terminal and makes it easier to install, configure, and use.

The display (6) is a liquid crystal display (LCD) with a touch screen that allows the user to interact with the terminal. The touch screen may be, for example, resistive, projected capacitive, surface capacitive, touch-scanning, or the like. The display (6) can show the images from the cameras for the user's comfort ("digital mirror" mode), while text instructions for the user are displayed on the screen at the same time. The touch screen is used for navigation and for text input while the user interacts with the terminal. The display (6) of the terminal is replaceable: it can be undocked, its cables disconnected, and the display swapped. This interchangeability results from the design of the module, which includes docking assemblies, and from the high degree of standardization of displays at the level of standard HDMI, DP, and USB interfaces. This design improves the maintainability of the terminal and makes it easier to upgrade its hardware.

The processing unit in an exemplary embodiment of the present invention is a single-board computer equipped with all the interfaces required to operate the units described above, as well as the storage media used in operation.
The processing unit controls all units of the terminal, including input/output, capture and processing of video from the cameras and of sound from the microphone, and entry and processing of information from cards and the touch screen. It is equipped with a power and cooling subsystem and has the necessary wired and wireless communication interfaces, such as Ethernet, USB, etc. The processing unit includes a set of CUDA graphics cores on which the neural networks process images. Standalone operation of the terminal (that is, isolated from the network) is possible but impractical, because the limited computing power available in the terminal makes data processing slow. The terminal is therefore designed to work with a recognition server over TCP/IP, with cryptographic protection of the networks and of the transmitted data, and with restricted physical access. The recognition server, which in an exemplary embodiment of the present invention is a remote server, has significantly greater computing capability than the terminal. This distributed recognition processing lowers the hardware requirements for the terminals, of which there may be many, while increasing processing speed by moving the most resource-intensive part of the work to a powerful remote server.

The contactless reader (8) performs both auxiliary identification functions and primary functions, such as secure bank card transactions. Depending on the implementation, the contactless reader (8) can also read various other cards and tags (NFC, RFID, etc.) in addition to bank cards.

Additionally, the terminal may include an audio subsystem for voice communication over digital channels (VoIP), consisting of several microphones (including some used for adaptive noise reduction), loudspeakers, and a digital interface for connection to the processing unit. Microphones installed at several places on the housing pick up the voice and the ambient noise. This makes it possible to apply adaptive noise reduction algorithms, improving the quality of sound transmission in noisy environments and eliminating echo. A significant extension of the terminal's functionality is the potential to use the audio subsystem as a voice channel to remote support. The terminal has everything this requires: sufficient hardware and computing resources in the processing unit, and communication channels.
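The adaptive noise reduction mentioned above is not specified further. One classic technique that fits a multi-microphone layout is the LMS adaptive noise canceller (Widrow-Hoff): a microphone that mostly hears ambient noise serves as the reference, an adaptive FIR filter learns how that noise reaches the voice microphone, and the filtered estimate is subtracted. The sketch below assumes exactly that arrangement; the function name, tap count, step size and demo signal are illustrative choices, not values from the terminal.

```python
import random


def lms_noise_cancel(primary, reference, taps=4, mu=0.05):
    """Adaptive noise canceller using the LMS algorithm.

    primary:   samples from the voice microphone (voice + noise that leaked in)
    reference: samples from a microphone that picks up mostly ambient noise
    Returns the cleaned signal and the final filter weights.
    """
    weights = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, newest first; zeros before the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        # Estimate of the noise component present in the primary microphone.
        noise_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = primary[n] - noise_estimate  # this residual is the output signal
        # LMS update: step the weights along the negative gradient of error**2.
        weights = [w + 2.0 * mu * error * xi for w, xi in zip(weights, x)]
        cleaned.append(error)
    return cleaned, weights


# Demo: the primary microphone hears only noise that reached it through a short
# acoustic path (0.8 * current sample + 0.3 * previous sample); no voice.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
primary = [0.8 * noise[n] + (0.3 * noise[n - 1] if n else 0.0) for n in range(len(noise))]
cleaned, weights = lms_noise_cancel(primary, noise)
residual_power = sum(e * e for e in cleaned[-500:]) / 500
```

Because the user's voice is not correlated with the reference microphone, the filter cannot learn to cancel it, so in real use the voice survives in the residual while the correlated noise is removed.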

The terminal housing is modular. The modular housing makes it possible to change the configuration and purpose of the terminal without redesign, by replacing the display module or attaching a block of readers for various purposes. The main housing module contains the processing unit and the audio subsystem, and its design allows a replaceable display module to be docked to it. The housing has mounting points for VESA-standard mounting brackets.

According to a further aspect, the present invention provides a system for biometric identification that includes the biometric identification terminal described above and a remote server.

A biometric identification method using the above-described system in accordance with the present invention is described below with reference to FIG. 2.

The biometric identification process according to the present invention is a sequence of processing operations on images from several cameras in two spectral bands (e.g., the visible and near-infrared ranges). The algorithm may vary depending on the required recognition reliability and speed.

The biometric identification process according to the present invention includes the following steps:

- step (S1) of activating the human face search mode;

- step (S2) of image pre-processing;

- step (S3) of detecting a human face and determining its location;

- step (S4) of image normalization;

- step (S5) of matching the face image against templates;

- step (S6) of receiving and using the biometric identification results.

At step S1, the human face search mode is activated on images from the video stream of a color camera (one of the stereo pair). The illumination unit (2) continuously illuminates the area of interest in which a human face may appear. The mode is activated, for example, when motion accompanied by a change in the overall illumination of the frame is detected. Using images from only one of the color cameras at this step reduces the computational load.

In standby mode, when there are no people, motion, or lighting changes in the captured image, the illumination can be dimmed to reduce heating, no face search is performed, the image of the area of interest from one color camera is analyzed only for its overall illumination level, and the display can be turned off. When a change in the overall illumination of the frame is detected, the human face search mode is activated: the illumination is brought to its operating level and the image processing of step S2 is started.
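A minimal sketch of the standby-mode trigger described above, assuming the terminal compares the mean brightness of each frame from one color camera with a slowly adapting baseline. The class name, threshold and smoothing factor are hypothetical; the document only states that a change in overall illumination activates the face search.

```python
class IlluminationWatcher:
    """Standby-mode trigger that looks only at the mean brightness of one color camera."""

    def __init__(self, threshold=12.0, alpha=0.05):
        self.threshold = threshold  # hypothetical: change in mean 8-bit luma that wakes the terminal
        self.alpha = alpha          # hypothetical: how fast the baseline follows slow drift
        self.baseline = None

    def update(self, luma_frame):
        """luma_frame: 2-D list of 8-bit brightness values. Returns True to start face search."""
        total = sum(sum(row) for row in luma_frame)
        count = sum(len(row) for row in luma_frame)
        mean = total / count
        if self.baseline is None:
            self.baseline = mean  # first frame only initializes the baseline
            return False
        if abs(mean - self.baseline) > self.threshold:
            return True  # abrupt change: raise the illumination and start step S2
        # Small changes (e.g. daylight drifting) only move the baseline.
        self.baseline += self.alpha * (mean - self.baseline)
        return False
```

Letting the baseline follow small changes keeps slow daylight drift from waking the terminal, while a person stepping into the frame produces a jump large enough to cross the threshold.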

At step S2, the images from all cameras are pre-processed; this includes at least one of linear correction, white balance, adaptive exposure, and noise removal.

Even though no face has yet been detected in the image, the captured frame is pre-processed so that it can be passed on for further processing once a face is found in it. Cameras rarely get white balance right, because the spectrum of artificial lighting can be very uneven. Under incandescent light, faces may appear red. Low-quality LED lighting (with a CRI below 80-90) can produce unpredictable tints. For recognition purposes, the color should not differ greatly from the reference image. Linear correction is needed to eliminate pincushion or barrel distortion.
This is especially important for wide-angle cameras, which distort faces near the corners of the frame beyond recognition. When images are captured against the light, for example against a window or a glass door, the simple auto-exposure algorithms built into the cameras only ruin the images: seeing a lot of light in the frame, they reduce the exposure and the face comes out dark. A correct algorithm should take into account the brightness of the object of interest rather than the overall illumination, and set the exposure for that object. In the case described, the exposure should be increased to pull the face, which is dark relative to the background, out of the shadows. This makes the genuinely dark parts of the frame noisy, and that noise must be smoothed out.
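Two of the pre-processing ideas above can be sketched in a few lines. White balance here uses the gray-world assumption (the average scene color is neutral), a common textbook method swapped in for illustration; the document does not say which white-balance algorithm the terminal uses. The exposure function follows the document's point directly: the gain is computed from the brightness of the region of interest rather than the whole frame. Function names, the target brightness and the gain limit are hypothetical.

```python
def gray_world_gains(pixels):
    """pixels: list of (r, g, b). Returns per-channel gains that equalize the
    channel means (gray-world assumption)."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    return [gray / m if m > 0 else 1.0 for m in means]


def roi_exposure_gain(luma, roi, target=118.0, max_gain=4.0):
    """luma: 2-D list of 8-bit brightness values; roi: (x, y, w, h) of the subject.
    Returns the exposure multiplier that brings the ROI mean to `target`."""
    x, y, w, h = roi
    values = [luma[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    mean = sum(values) / len(values)
    return min(max_gain, target / max(mean, 1.0))  # clamp to avoid amplifying pure noise
```

For a backlit frame (a dark face against a bright window), metering over the whole frame yields a gain below 1 and darkens the face further, whereas metering over the face region yields a gain above 1, which is what the text asks for.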

In all practical cases, processing is organized as a pipeline. Once started, the pipeline captures the camera streams, processes them, and stores the processed data in a ring buffer. The buffer size usually corresponds to a few seconds of video. An image becomes available for analysis by freely programmable mathematical algorithms only after the GPU has written it to the buffer. A second consequence of pipelined processing is that the pre-processed images from all cameras remain synchronized and stay available for a relatively long time (long from the point of view of the algorithms). Admittedly, a pipeline for three streams consumes more energy than one for a single stream, but synchronization and retrospective access to video frames are provided only in the operating mode described above.
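The ring buffer and the synchronized look-up of frames by capture time can be sketched with a bounded deque, assuming each pre-processed frame is stored together with its capture timestamp. `FrameRing` and its methods are hypothetical names; a real GPU pipeline would keep frames in device memory rather than as Python objects.

```python
from collections import deque


class FrameRing:
    """Fixed-size ring of (timestamp, frame) pairs; the oldest entry is dropped when full."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, timestamp, frame):
        self.frames.append((timestamp, frame))

    def nearest(self, timestamp, tolerance):
        """Return the stored frame closest in time to `timestamp`, or None if
        no frame lies within `tolerance` (same time unit as the timestamps)."""
        best = None
        for ts, frame in self.frames:
            delta = abs(ts - timestamp)
            if delta <= tolerance and (best is None or delta < best[0]):
                best = (delta, frame)
        return None if best is None else best[1]
```

Keeping one ring per camera, each filled by its own stream, lets step S3 fetch the IR frame whose timestamp is closest to that of a given color frame, even after a few frames of processing delay.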

At step S3, faces are detected and tracked in frames from one of the color cameras of the stereo pair, for example using the well-known Single Shot Detector (SSD) algorithm, which provides high processing speed and a minimal number of false positives. The computing power of the terminal is sufficient to run it.

When a face is detected in a color frame, a search is started in the IR frame, taken from the buffer, that corresponds to the capture time of that color image. The search covers the entire area of the frame. Searching the whole IR frame, rather than only the expected region, is one of the security measures, and a fairly effective one. Since a grayscale image is considerably smaller than a color image, this processing does not require significant computing resources. Once a face is found in the IR frame, its size and coordinates are compared with the expected values obtained from the color camera.
The image offset between the spatially separated cameras caused by parallax is accounted for during system setup. After the face has been found in the IR frame and its size and coordinates compared, a decision is made as to whether a person is present in front of the cameras and whether further spoofing checks are needed, or whether to begin formatting the image for recognition proper. If there is no face in the IR frame, the frame is discarded, tracking of the face in the color image (during which the detected face is marked in the frame) stops, and the system returns to its initial state, again searching for a human face in images from one color camera of the stereo camera, and so on.
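The consistency check between the color and IR detections can be sketched as follows, assuming the parallax calibration reduces to a fixed pixel offset (dx, dy) and scale, and that agreement is measured by intersection over union (IoU). The IoU criterion, the 0.5 threshold and all names are illustrative assumptions; the document only says that sizes and coordinates are compared with expected values. The IR detections passed in are those found by searching the whole IR frame.

```python
def expected_ir_box(color_box, dx, dy, scale=1.0):
    """Map a color-frame box (x, y, w, h) into IR-frame coordinates using a
    calibrated offset and scale (a simplifying assumption about the calibration)."""
    x, y, w, h = color_box
    return (x * scale + dx, y * scale + dy, w * scale, h * scale)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def person_present(color_box, ir_boxes, dx, dy, scale=1.0, min_iou=0.5):
    """ir_boxes: faces found anywhere in the IR frame. True if one of them sits
    where the color detection says it should, with a similar size."""
    if not ir_boxes:
        return False  # no face in the IR frame: discard and stop tracking
    expected = expected_ir_box(color_box, dx, dy, scale)
    return any(iou(expected, box) >= min_iou for box in ir_boxes)
```

A face shown on a phone screen typically produces no usable IR detection, or one that does not overlap the expected location, so the function returns False and the system goes back to searching.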

Step S3 implements the first level of protection against unauthorized access attempts, such as presenting to the terminal the screen of a smartphone (tablet computer, graphics tablet, etc.) showing an image of a person's face. Smartphone screens do not provide sufficient brightness and contrast in the IR range. In addition, the terminal's IR illumination produces a bright glare on the smartphone screen. The smartphone itself is also a source of IR radiation in the processed range (800-1000 nm), because its proximity sensor, which detects when the phone is held to the user's head, contains a bright IR LED.

In an alternative embodiment, in which the terminal uses a thermal camera instead of an IR camera, step S3 is performed somewhat differently. In a thermal image, the face appears as a brighter spot on a generally gray background, and only at sufficiently close range do darker spots (a few pixels in size) become visible at the eyes, nose, and mouth. A search algorithm designed for conventional cameras may therefore not give correct results.
However, the correspondence between pixel coordinates in the color and thermal camera frames is also established in advance. After a face is detected in the color image, an object of matching area with a temperature of about 36 °C, corresponding to the person's face, is expected in the corresponding region of the thermal image. Thus, at step S3 in this alternative embodiment, the face is first detected and tracked in the color camera images. After a face is detected in the color frames, an object corresponding to a human face is searched for in the thermal camera images. The coordinates and dimensions of the thermal image are then brought into correspondence with those of the color image, the detection results in the two images are compared, and a conclusion is drawn as to whether a person's face is present in the captured images.
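For the thermal-camera variant, here is a minimal sketch assuming the pre-established coordinate correspondence is a scale-plus-offset mapping, and that a face counts as present when most pixels of the mapped region fall into a skin-temperature window. The 33-38 °C window and the 50 % fraction are hypothetical values chosen around the roughly 36 °C mentioned above, and the function name is illustrative.

```python
def thermal_face_check(temps, color_box, sx, sy, ox, oy,
                       t_min=33.0, t_max=38.0, min_fraction=0.5):
    """temps: 2-D list of temperatures (deg C) from the thermal camera.
    color_box: (x, y, w, h) of the face found in the color frame.
    sx, sy, ox, oy: calibration mapping color pixels to thermal pixels,
    assumed here to be a simple scale plus offset."""
    x, y, w, h = color_box
    c0, r0 = int(x * sx + ox), int(y * sy + oy)
    c1, r1 = int((x + w) * sx + ox), int((y + h) * sy + oy)
    rows, cols = len(temps), len(temps[0])
    values = [temps[r][c]
              for r in range(max(r0, 0), min(r1, rows))
              for c in range(max(c0, 0), min(c1, cols))]
    if not values:
        return False  # the mapped box falls outside the thermal frame
    warm = sum(1 for t in values if t_min <= t <= t_max)
    return warm / len(values) >= min_fraction
```

A printed photo or a screen held in front of the cameras sits near room temperature, so the mapped region contains few pixels in the window and the check fails.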

Using images from one of the color cameras for face detection at this step, and then checking for a face in the infrared or thermal images, which are considerably smaller, reduces the computational load.

At step S4, the images from all three cameras of the terminal are normalized, i.e., brought to a standard form, and sent to the recognition server over the available network interfaces. At this step, the illumination of the user's face is also measured or estimated. This parameter is used for statistical adjustment of the white illumination brightness and of the exposure, so as to follow slow changes in lighting conditions.
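The statistical adjustment of the illumination to slow lighting changes might look like the sketch below: the measured face brightness is smoothed with an exponential moving average, so brief fluctuations are ignored, and the white illumination level is nudged toward a target. The moving average, the proportional update, the class name and every constant are assumptions; the exposure could be adjusted by the same scheme.

```python
class IlluminationTracker:
    """Slowly steers the white illumination level from measured face brightness."""

    def __init__(self, target=120.0, alpha=0.1, gain=0.05, level=0.5):
        self.target = target  # hypothetical desired mean face brightness (8-bit)
        self.alpha = alpha    # smoothing factor of the exponential moving average
        self.gain = gain      # how far one update can move the illumination level
        self.level = level    # illumination level, 0.0 (off) .. 1.0 (full)
        self.smoothed = None

    def update(self, face_brightness):
        if self.smoothed is None:
            self.smoothed = face_brightness
        else:
            self.smoothed += self.alpha * (face_brightness - self.smoothed)
        # Relative error: positive when the face is too dark, negative when too bright.
        error = (self.target - self.smoothed) / self.target
        self.level = min(1.0, max(0.0, self.level + self.gain * error))
        return self.level
```

The small smoothing factor and gain make the controller deliberately sluggish, matching the goal of tracking slow changes rather than reacting to every frame.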

At step S5, the recognition server analyzes the images received from the terminal and matches the face images against the templates stored in the database, using a pre-trained neural network. The images are analyzed in accordance with the required level of reliability and processing speed.

In the color video, the face is searched for in both frames from the stereo camera, and the region containing the face present in both frames is the source for building a depth map. This can be done, for example, by the photogrammetric method well known from geodesy: the coordinates of a common characteristic point are measured in the image planes, and the distance to the point is computed from them with triangulation formulas. The depth map is then analyzed to assess the liveness of the object in front of the camera.
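For a rectified stereo pair under the pinhole camera model, triangulation reduces to Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d = x_left - x_right is the disparity of the same point in the two images. The sketch assumes rectified images; the function name is illustrative, and the focal length and baseline in the usage note are made-up values, not the terminal's.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Distance (m) to a point seen at column x_left in the left image and at
    column x_right in the right image of a rectified stereo pair."""
    disparity = x_left - x_right  # pixels
    if disparity <= 0:
        # Zero disparity means a point at infinity; negative means a bad match.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity
```

For example, with f = 800 px, B = 0.06 m and a disparity of 40 px, the point lies about 1.2 m from the cameras. Because depth is inversely proportional to disparity, a narrow baseline loses depth resolution quickly as the user steps back.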

The recognition algorithm at step S5 may follow one of several scenarios described below. The scenario must be chosen in advance according to the goals and constraints of the specific application of the biometric identification terminal; switching between scenarios on the fly is difficult and impractical:

Scenario 1. Maximum processing speed. A frame from only one color camera is sent for recognition, and no liveness check is performed. Recognition is performed by a convolutional neural network. The result is either the identifier assigned to one of the template images together with a numerical degree of match, or a finding that there is no match with any template, indicated by a match level that is too low.
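Scenario 1 returns either a template identifier with a degree of match, or no match. A common way to realize this, assumed here because the document does not describe the network's output, is for the convolutional network to map each face to a feature vector and for the server to compare that vector with stored template vectors by cosine similarity. The function names and the 0.6 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(probe, templates, threshold=0.6):
    """probe: feature vector of the captured face; templates: {id: vector}.
    Returns (id, score) for the best match, or (None, score) when even the
    best score is below the (hypothetical) threshold."""
    best_id, best_score = None, -1.0
    for template_id, vector in templates.items():
        score = cosine_similarity(probe, vector)
        if score > best_score:
            best_id, best_score = template_id, score
    if best_id is None or best_score < threshold:
        return None, best_score
    return best_id, best_score
```

Returning the score even when there is no match lets the caller log how close the nearest template was, which helps when tuning the threshold.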

Additional scenarios 2-5 described below perform liveness detection (determining whether a living person is present in the image) before image recognition (matching the image against a template in the database).

Scenario 2. A more elaborate option additionally processes the color image from one camera to determine the liveness of the user. The uncropped image is fed to a convolutional neural network, so that not only the face but also background objects are analyzed. In most existing solutions the face is cropped tightly before analysis to reduce processing time, so an attacker can simply hold a photograph in their hands, knowing that only the detected "face" will be processed and not the edges of the photograph or the hands.
The network outputs the computed probability of an attack on the biometric identification system as a number in the range 0-1. Setting confidence boundaries divides this range into three zones: "green", high confidence that the face is live; "yellow", additional verification is required if possible; and "red", a high probability of an unauthorized access attempt, in which case the face is discarded and, to save computing resources, the image is not sent to the subsequent stages. In the green zone, the image is sent for recognition as in scenario 1 described above. Additional verification in the yellow zone means that the result for the current frame can be taken into account when setting the boundaries for the next frame (lowering or raising the threshold).
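The three-zone logic of scenario 2 can be sketched as below. The boundary values and the specific yellow-zone policy (lowering the green boundary for the next frame after an uncertain one, and restoring it after a confident one) are hypothetical; the document says only that the current frame's result can shift the thresholds for the next.

```python
class LivenessGate:
    """Maps the network's attack probability (0..1) to green/yellow/red zones."""

    def __init__(self, green_max=0.2, red_min=0.7, step=0.05, floor=0.05):
        # All boundary values are hypothetical placeholders.
        self.base_green_max = green_max
        self.green_max = green_max
        self.red_min = red_min
        self.step = step
        self.floor = floor

    def classify(self, attack_prob):
        if attack_prob >= self.red_min:
            return "red"  # discard the face; nothing is sent to later stages
        if attack_prob < self.green_max:
            self.green_max = self.base_green_max  # confident frame: restore default
            return "green"  # send for recognition as in scenario 1
        # Yellow: one possible policy is to demand more confidence on the next frame.
        self.green_max = max(self.floor, self.green_max - self.step)
        return "yellow"
```

With this policy, a run of borderline frames makes the gate progressively stricter, so an attacker cannot pass by producing many frames that sit just inside the yellow zone.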

Scenario 3. A more resource-intensive option computes a depth map from two synchronized color images from the stereo camera to reject attacks using a flat color image. Algorithms for computing depth maps are well known. For each point in one image, the corresponding point in the second image must be found, by computing a function (for example, the correlation of the points' neighborhoods) and searching for its extremum. The distance to the point is then computed geometrically, i.e., by triangulation. The depth map contains the distance from the cameras to each point of the image. A depth map can also be built by means other than triangulating matched image points: in particular, a DenseNet-type convolutional neural network trained on difference (disparity) maps of image pairs can be used.
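The point-matching step can be illustrated on a single image row: for a pixel in the left image, slide a small window along the same row of the right image and keep the shift at which the normalized cross-correlation (NCC) of the two neighborhoods is highest. This assumes rectified images, so matches lie on the same row; the function names, window size, search range and demo data are illustrative. Feeding the resulting disparity d into Z = f * B / d gives the depth of that pixel.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches (-1..1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def disparity_at(left_row, right_row, x, half_window=2, max_disp=16):
    """Find the disparity d for which right_row around x - d best matches
    left_row around x. Assumes a rectified pair: the match lies on the same
    row and is shifted to the left in the right image."""
    patch = left_row[x - half_window : x + half_window + 1]
    best_d, best_score = 0, -2.0
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # candidate window would leave the image
        candidate = right_row[xr - half_window : xr + half_window + 1]
        score = ncc(patch, candidate)
        if score > best_score:  # keep the extremum (maximum) of the correlation
            best_d, best_score = d, score
    return best_d, best_score


right_row = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
left_row = [0, 0, 0] + right_row[:-3]  # same scene shifted 3 px to the right
d, score = disparity_at(left_row, right_row, x=10, half_window=2, max_disp=5)
```

NCC is used rather than a raw sum of products because it is insensitive to brightness and contrast differences between the two cameras, which is exactly the kind of mismatch a real stereo pair shows.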

The depth map can be analyzed in one of two ways:

a) A “depth” template representing the relief of an averaged face is stored, and the depth map computed at a given moment is compared with it. If the root-mean-square deviation between the template and the current depth map falls within preset ranges, the image is considered to show a living person and is passed on for further processing.
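A minimal sketch of option a), assuming the depth map and the template are already aligned and resampled to the same grid and that missing depth points are NaN. Centering each map on its own mean (so the check measures facial relief rather than the person's distance from the camera) and the minimum-valid-fraction guard are assumptions, not details from the source; the acceptance interval [lo, hi] stands in for the preset ranges mentioned above.

```python
import numpy as np


def rms_deviation(depth, template):
    """Root-mean-square deviation of the relief over pixels valid in both maps."""
    valid = ~np.isnan(depth) & ~np.isnan(template)
    if not valid.any():
        return float("inf")
    d = depth[valid] - depth[valid].mean()         # remove the distance offset
    t = template[valid] - template[valid].mean()
    return float(np.sqrt(np.mean((d - t) ** 2)))


def is_live_by_template(depth, template, lo, hi, min_valid_fraction=0.5):
    """Accept the face as live if the RMS deviation falls inside [lo, hi]."""
    valid_fraction = np.mean(~np.isnan(depth))
    if valid_fraction < min_valid_fraction:   # too many missing points to decide
        return False
    return lo <= rms_deviation(depth, template) <= hi
```

A flat photograph has no relief, so after centering its deviation from the face template equals the template's own relief and falls outside a tight acceptance range.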

b) A trained neural network takes as input the depth map resized to the input dimensions of its first convolutional layer. A forward pass through this network classifies the image into the “green”, “yellow” or “red” zone, according to the classification described above.
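A sketch of the preprocessing for option b) and of reading the network's answer, assuming a single-channel 64×64 input in (batch, channel, height, width) layout, nearest-neighbour resizing, filling of missing depth points with the median depth, min-max scaling, and three output scores in the order green, yellow, red. All of these choices are illustrative assumptions; the trained network itself is not shown.

```python
import numpy as np

ZONES = ("green", "yellow", "red")  # assumed order of the network's output classes


def prepare_depth_input(depth, size=(64, 64)):
    """Resize a depth map to the first layer's input size and scale it to [0, 1]."""
    h, w = depth.shape
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h          # nearest-neighbour row indices
    cols = np.arange(out_w) * w // out_w
    resized = depth[np.ix_(rows, cols)].astype(np.float32)
    valid = ~np.isnan(resized)
    if not valid.any():
        raise ValueError("depth map has no valid points")
    resized[~valid] = np.median(resized[valid])   # assumption: fill holes with the median
    lo, hi = resized.min(), resized.max()
    scaled = (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)
    return scaled[np.newaxis, np.newaxis, :, :]   # (batch, channel, H, W)


def zone_from_logits(logits):
    """Softmax over the three class scores and pick the most likely zone."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return ZONES[int(np.argmax(probs))], probs
```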

The neural network method is more robust with respect to the position of the person in the frame and other optical factors, but requires more computing resources. The system operator chooses between the options based on the required balance between speed and security.

Thus, depth map analysis provides an additional stage of protection against unauthorized access attempts, for example, presenting a flat color image to the terminal.

Scenario 4. An even more computationally demanding liveness-detection algorithm processes two photographs simultaneously in a neural network. The approach is similar to scenario 3 in that it uses two photographs from the stereo pair, except that at this stage a specially trained convolutional neural network takes as input the images captured at the same moment by both optical cameras of the stereo pair.

This neural network estimates how closely the image from the stereo pair resembles an image of a three-dimensional object. This scenario may be needed for the following reasons:

It is clear a priori that a photograph, even cut out and folded, has less relief than a face. In scenario 3, this a priori information is used through an intermediate construction, the depth map, which contains several times less information than a color stereo image. A pixel in each color image carries 15 to 32 bits of color information, whereas a depth-map point has a bit depth of 8 to 12 bits, so the map is 3 to 6 times smaller. The main problem in constructing a depth map is the highly uneven distribution of the matched point pairs: in many parts of the image they cannot be found reliably, even for static images. Therefore, in some cases the maps cannot be compared pixel by pixel, because many pixels may simply be missing. The second problem is the quantization and rounding noise introduced by the calculations.
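A worked example of this arithmetic, assuming 24-bit color (a value inside the 15 to 32 bit range given above): one pixel position of the color stereo pair carries 2 × 24 = 48 bits, against 8 to 12 bits for the corresponding depth-map point.

$$\frac{2 \times 24\ \text{bit}}{12\ \text{bit}} = 4, \qquad \frac{2 \times 24\ \text{bit}}{8\ \text{bit}} = 6$$

In this case the depth map holds 4 to 6 times less data than the stereo pair, consistent with the 3 to 6 times stated above.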

Scenario 4 also exploits this a priori knowledge of the attack method, but unlike scenario 3, the information is neither discarded nor distorted by artificial transformations and calculations. More information and less noise directly reduce the probability of errors.

Scenario 4 likewise provides an additional stage of protection against unauthorized access attempts, for example, presenting a flat color image to the terminal.

Scenario 5. The most accurate and resource-intensive algorithm takes as input time-synchronized images from both optical cameras of the stereo pair and from the IR camera. This algorithm is also a neural network, trained on a given dataset covering various positions of the subject, various positions, directions and intensities of the light sources, and various possible attack types (with and without headwear, a flat framed photo, photos on digital displays, a paper photo cut out along the face contour, a cut-out photo with holes for the eyes, a photo with holes for the eyes and nose, paper photos curved around the face, etc.).

In the preferred embodiment, background objects in the images are not analyzed in scenarios 4 and 5, to reduce the amount of computation.
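A sketch of how the network input for scenarios 4 and 5 might be assembled, assuming the face bounding box has already been located and mapped into each camera's coordinates (boxes are hypothetical (x, y, width, height) tuples of equal size), so that only the face region, and no background, reaches the network. Stacking the crops as channels is one common way to feed synchronized views to a single CNN; it is an assumption here, not the architecture stated in the source.

```python
import numpy as np


def crop(image, box):
    """Cut the (x, y, width, height) face box out of an image."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]


def build_liveness_input(left_rgb, right_rgb, box_left, box_right,
                         ir=None, box_ir=None):
    """Stack face crops from synchronized frames along the channel axis.

    Scenario 4: left + right color crops -> H x W x 6.
    Scenario 5: plus the IR crop -> H x W x 7.
    Background outside the boxes is never passed on.
    """
    parts = [crop(left_rgb, box_left), crop(right_rgb, box_right)]
    if ir is not None:
        if box_ir is None:
            raise ValueError("an IR frame needs its own face box")
        parts.append(crop(ir, box_ir)[..., np.newaxis])   # single IR channel
    shapes = {p.shape[:2] for p in parts}
    if len(shapes) != 1:
        raise ValueError(f"face crops differ in size: {shapes}")
    return np.concatenate([p.astype(np.float32) for p in parts], axis=-1)
```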

Thus, step S5 can implement a second stage of protection against unauthorized access attempts in which a color photograph of the user is presented to the terminal.

The actual matching of the face image against the templates at step S5 (described in scenario 1) can be implemented by various known methods. For example, a normalized photo (i.e., with the face image aligned and cropped, the exposure corrected, etc.) can be passed through a pre-trained convolutional neural network, which outputs a numerical vector generally containing 128 to 4096 elements. Fewer or more elements are also possible; this depends on the specific network architecture and is outside the scope of this description. One example of such a network is the widely used ResNet50. The resulting vector is then compared, using some metric, with the reference vectors of the users in the database. If the distance computed by the metric (linear, cosine, etc.) between the obtained vector and a vector stored in the database is less than a predetermined threshold, the person in front of the biometric terminal is considered to be the person corresponding to that reference vector.
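A minimal sketch of the matching step, assuming the embeddings (for example, 128- to 4096-element vectors from a ResNet50-style network) are already computed; the network itself is not shown. The gallery dictionary, the nearest-reference search and the threshold value are illustrative assumptions; the source specifies only a comparison with reference vectors by a distance metric against a predetermined threshold.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for identical directions."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))


def identify(probe, gallery, threshold, metric=cosine_distance):
    """Return (user_id, distance) of the closest reference vector if it is
    below the threshold, otherwise (None, distance)."""
    best_id, best_dist = None, float("inf")
    for user_id, reference in gallery.items():
        dist = metric(probe, reference)
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    if best_dist < threshold:
        return best_id, best_dist
    return None, best_dist
```

Cosine and Euclidean distances live on different scales, so the threshold has to be tuned separately for whichever metric is chosen.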

At step S6, the terminal receives the biometric identification result from the recognition server (the obtained identifier, or the fact that no match was found in the database) and uses it as required by the task at hand.

Further actions of the terminal implement the logic of the application into which the described process is integrated, for example, granting access or confirming rights.

In an alternative embodiment, one of the possible actions is multi-factor identification/authentication, i.e., additional confirmation of identity based on other principles.

An additional capability of this device is simultaneous, synchronized recording of the video and audio channels. Recording starts when a face is detected in the frame and runs in parallel with, and independently of, the identification process. The recording can be stored in the terminal's internal memory for use by the terminal software or streamed to the recognition server over standard TCP/IP-based protocols, for example, SIP. Synchronized recording makes it possible to solve the following problems:

- a second factor of biometric identification/authentication: after successful face identification, the person can be authenticated by voice biometrics using any currently available technology, either text-dependent or text-independent.

- a liveness check that compares lip movements in a video segment with the audio track while the person says a passphrase. Depending on the business context, the check can be static or dynamic. The comparison works as follows:

1. A pre-trained neural network (a 3D convolutional network or an RCNN) analyzes the incoming frame stream and recognizes the words and letters being spoken.

2. A pre-trained speech-to-text algorithm (for example, Kaldi) converts the audio track into text.

3. The text strings obtained in steps 1 and 2 are compared. If the percentage of matches is high, a living person is considered to be in front of the screen.
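A minimal sketch of step 3, assuming the transcripts from steps 1 and 2 are already available as strings. Python's standard `difflib.SequenceMatcher.ratio()` serves as the match percentage and 0.8 as the acceptance threshold; both are assumptions, since the source says only that a high percentage of matches is required.

```python
import difflib
import re


def normalize(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def lip_audio_agree(lip_text, asr_text, min_ratio=0.8):
    """Return (agree, ratio) for the lip-reading and speech-to-text transcripts."""
    ratio = difflib.SequenceMatcher(None, normalize(lip_text),
                                    normalize(asr_text)).ratio()
    return ratio >= min_ratio, ratio
```

A character-level ratio tolerates small recognition errors in either channel (a dropped letter, a plural ending) while still rejecting a different phrase.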

The contactless reader also enables two-factor authentication by matching records from different sections of the database. That is, a person can additionally confirm their identity by presenting a tag (NFC, RFID, etc.) or a card (bank card, identification card, etc.) to the contactless reader.
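A minimal sketch of the two-factor check, assuming a hypothetical database section that maps the UID read from an NFC/RFID tag or card to a user ID, and a face-recognition result that is either a user ID or None. The person passes only if both factors resolve to the same record.

```python
from typing import Mapping, Optional


def two_factor_ok(face_user_id: Optional[str], card_uid: str,
                  card_owners: Mapping[str, str]) -> bool:
    """True only if the face match and the presented tag/card point to the same user."""
    if face_user_id is None:          # face not recognized: first factor failed
        return False
    return card_owners.get(card_uid) == face_user_id   # unknown card -> None -> False
```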

In accordance with yet another aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the biometric identification method described above.

Thus, the present invention provides high resistance to unauthorized access (spoofing), because it has several levels of protection against attacks (it can detect device screens, color photographs and masks). Additionally, the present invention provides high performance, because the terminal only searches for a face in the frame, checks for spoofing and sends data to the recognition server, which has greater computing power. Performance is further increased, and the computing power required of the terminal reduced, because initially only the image from one camera is processed. In addition, the present invention is secure in terms of data storage, because it does not store comparison samples in the terminal. Multi-factor (multimodal) biometric identification is also supported (that is, 2D + 3D face recognition plus voice recognition), as well as additional non-biometric identifiers. The terminal according to the present invention also offers advantages in camera placement and viewing angle thanks to a rotating bracket, which widens the range of applications and makes the terminal easier to install, configure and use.

The present invention can be used for the initial collection of biometric information, since it includes all the necessary functional components.

The present invention can be used in biometric identification solutions, for example, in e-commerce, electronic banking and electronic document management systems with biometric user authorization, as well as in access control systems. A second application is a VoIP communication terminal with additional face and voice authentication.

The processor may include one or more processors. The one or more processors may be a general-purpose processor, such as a central processing unit (CPU) or an application processor (AP), a graphics-only processing unit, such as a graphics processing unit (GPU) or a visual processing unit (VPU), and/or a dedicated AI processor, such as a neural processing unit (NPU).

Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GANs) and deep Q-networks.

A learning algorithm is a method of training a predetermined target device (for example, a GPU-based neural network) using a set of training data to cause, allow or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.

The various illustrative blocks and modules described in connection with the disclosure herein may be implemented or performed by a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices (for example, a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors together with a DSP core, or any other such configuration).

The above memory may be volatile or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory. The volatile memory may be random access memory (RAM). The memory in embodiments of the present disclosure may also be static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink DRAM (SLDRAM), direct Rambus random access memory (DR RAM), etc. That is, the memory in embodiments of the present disclosure includes, but is not limited to, these and any other suitable types of memory.

The information and signals described herein may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. When implemented in software executed by a processor, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Other examples and implementations are within the scope of the present disclosure. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwiring, or combinations of any of these. Features implementing the functions may also be physically located at various positions, including being distributed such that portions of the functions are implemented at different physical locations.

Computer-readable media include both non-transitory computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A non-transitory storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, non-transitory computer-readable media may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc ROM (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to carry or store desired program code means in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or a general-purpose or special-purpose processor.

It should be understood that although terms such as "first", "second", "third" and the like may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer or section from another. Thus, a first element, component, region, layer or section may be termed a second element, component, region, layer or section without departing from the scope of the present invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Elements referred to in the singular do not exclude a plurality of such elements unless specifically stated otherwise.

The functionality of an element specified in the description or claims as a single element may in practice be implemented by several components of the device, and conversely, the functionality of elements specified in the description or claims as several separate elements may in practice be implemented by a single component.

In one embodiment, the elements/blocks of the proposed device are located in a common housing, may be placed on the same frame/structure/printed circuit board, and are connected to each other structurally by installation (assembly) operations and functionally by communication lines. Unless otherwise indicated, these communication lines or channels are standard lines known to those skilled in the art, and their physical implementation requires no inventive effort. A communication line may be a wire, a set of wires, a bus, a circuit board trace, or a wireless link (inductive, radio-frequency, infrared, ultrasonic, etc.). Communication protocols over such lines are known to those skilled in the art and are not disclosed separately.

A functional connection between elements should be understood as a connection that ensures the correct interaction of these elements with each other and the implementation of their respective functionality. Particular examples of a functional connection include a connection capable of exchanging information, transmitting electric current, transmitting mechanical motion, or transmitting light, sound, or electromagnetic or mechanical vibrations. The specific type of functional connection is determined by the nature of the interaction between the elements and, unless otherwise indicated, is provided by well-known means using principles well known in the art.

An electrical connection of one element/circuit/port/terminal to another element/circuit/port/terminal means that these elements/circuits/ports/terminals may be connected to each other either directly or indirectly through other elements or circuits.

The design of the elements of the proposed device is known to those skilled in the art and is not described separately herein unless otherwise indicated. The elements of the device may be made of any suitable material. These components may be manufactured using known methods including, by way of example only, machining, investment casting, and crystal growth. Assembly, connection, and other operations in accordance with the above description are also within the knowledge of a person skilled in the art and are therefore not explained in more detail here.

Although exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and are not intended to limit the present invention. The invention is not to be limited to the specific arrangements and structures shown and described, since various other modifications and embodiments may be apparent to a person skilled in the art, based on the information set forth in this description and knowledge of the prior art, without departing from the spirit and scope of the present invention.

Claims

1. A method of biometric identification, comprising the steps of:

- activating, in a terminal for biometric identification, a search mode for a human face in color images obtained from a video stream from one camera of a stereo camera;

- detecting and tracking a person's face in an image from said one camera of the stereo camera and determining its dimensions and coordinates;

- searching for the person's face in an image from an infrared or thermal camera, synchronized with said image from the color camera, and determining its dimensions and coordinates;

- comparing the dimensions and coordinates of the person's face determined for the image from the color camera and for the image from the infrared or thermal camera, and drawing a conclusion about the presence of a person in front of said cameras based on said comparison;

- sending images from the stereo camera and the infrared or thermal camera to a recognition server;

- checking, on the recognition server, the presence of a living person in the captured image by analyzing at least one color image from at least one camera of the stereo camera, and, upon confirming the presence of a living person in the captured image, recognizing the person's face, which includes comparing images of the person's face in said images received from the terminal with templates stored in a database and inferring whether or not there is a match with a template; and

- sending the results of person recognition from the recognition server to the terminal.

2. The method according to claim 1, wherein the biometric identification terminal activates the human face search mode in images obtained from a video stream from a single color camera of the stereo camera in response to detecting motion in the captured images from said single color camera of the stereo camera, accompanied by a change in the overall illumination in the frame.

3. The method according to claim 2, wherein, before the human face search mode is activated, the illumination of the area of interest in which a human face may appear, provided by a backlight unit, is dimmed, no face search is performed, and images from one camera of the stereo camera are analyzed only for the overall illumination level; and after the human face search mode is activated, the backlight unit switches to its operating mode.

4. The method according to claim 1, wherein, at a pre-processing stage, images from the stereo camera and the infrared or thermal camera are recorded in a ring buffer and subjected to said pre-processing while maintaining their synchronicity.

5. The method according to claim 1, wherein the search for the person's face in the image from the infrared or thermal camera is carried out over the entire field of the captured image.

6. The method according to claim 1, wherein, if no face is detected at the stage of searching for a person's face in the image from the infrared or thermal camera, this image is discarded, tracking of the person's face in the image from the color camera is stopped, and the method returns to the stage of searching for a human face in images from the single color camera of the stereo camera.

7. The method according to claim 1, wherein, when checking the presence of a living person in the captured image, both the detected face and background objects in the image are analyzed.

8. The method according to claim 1, wherein the presence of a living person in the captured image is checked by analyzing a depth map built from two synchronous color images from the stereo camera.

9. The method according to claim 8, wherein the depth map is analyzed by comparing the depth map currently calculated from two synchronous color images from the stereo camera with a depth map template representing the relief of an average human face.

10. The method according to claim 1, wherein the presence of a living person in the captured image is checked by analyzing synchronized color images from the two cameras of the stereo camera.

11. The method according to claim 1, wherein the presence of a living person in the captured image is checked by analyzing synchronized color images from the two cameras of the stereo camera and an image from the infrared camera.

12. The method according to any one of claims 1, 7, 8, 10 and 11, wherein said check for the presence of a living person in the captured image is performed by a neural network.

13. The method according to claim 1, further comprising additionally confirming the identification by recording an audio signal from the microphone(s) of the terminal in synchronization with capturing video images from the cameras, and identifying the person by voice extracted from the recorded audio signal.

14. The method according to claim 13, further comprising additionally verifying the presence of a living person by comparing the movement of the person's lips in the video images with a phrase spoken by the person and captured by recording audio from the terminal microphone(s).

15. The method according to claim 1, further comprising additionally confirming the identification by reading, with a contactless reader, a tag or card confirming the person's identity.

16. A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement the biometric identification method according to any one of claims 1 to 15.

17. A terminal for biometric identification, comprising a camera unit containing a backlight unit, a stereo camera, and an infrared or thermal camera, and a processing unit connected to the camera unit, wherein the processing unit is configured to:

- activate a search mode for a human face in images obtained from a video stream from one color camera of the stereo camera;

- detect and track a person's face in an image from said single color camera of the stereo camera and determine its dimensions and coordinates;

- compare the dimensions and coordinates of the person's face determined for the image from the color camera and for the image from the infrared or thermal camera, and draw a conclusion about the presence of a person in front of said cameras based on said comparison;

- receive person recognition results from a recognition server.

18. The terminal according to claim 17, wherein the processing unit is configured to activate the human face search mode in images obtained from a video stream from a single color camera of the stereo camera in response to detecting motion in the captured images from said single color camera of the stereo camera, accompanied by a change in the overall illumination in the frame.

19. The terminal according to claim 18, wherein, before the human face search mode is activated, the illumination of the area of interest in which a human face may appear, provided by the backlight unit, is dimmed, no face search is performed, and images from one camera of the stereo camera are analyzed only for the overall illumination level; and after the human face search mode is activated, the backlight unit switches to its operating mode.

20. The terminal according to claim 17, wherein the processing unit is configured to record images from the stereo camera and the infrared or thermal camera into a ring buffer and to subject them to pre-processing while maintaining their synchronicity.

21. The terminal according to claim 17, wherein the processing unit is configured to search for the person's face in an image from the infrared or thermal camera across the entire field of the captured image.

22. The terminal according to claim 17, wherein the processing unit is configured, if no face is detected at the stage of searching for a person's face in the image from the infrared or thermal camera, to discard this image, stop tracking the person's face in the image from the color camera, and return to the stage of searching for a human face in images from the single color camera of the stereo camera.

23. The terminal according to claim 17, wherein the camera unit is connected via a rotating bracket to a housing containing the processing unit.

24. A system for biometric identification, comprising a terminal according to any one of claims 17 to 23 and a recognition server configured to:

- receive images from the terminal;

- check the presence of a living person in the captured image by analyzing at least one color image from at least one camera of the stereo camera;

- upon confirming the presence of a living person in the captured image, perform human face recognition, which includes comparing images of the person's face in the images received from the terminal with templates stored in a database and making a decision regarding the presence or absence of a match with a template;

- send person recognition results to the terminal.

25. The system according to claim 24, wherein, when checking the presence of a living person in the captured image, both the detected face and background objects in the image are analyzed.

26. The system according to claim 24, wherein the recognition server is configured to check the presence of a living person in the captured image by analyzing a depth map built from two synchronous color images from the stereo camera.

27. The system according to claim 26, wherein the recognition server is configured to analyze the depth map by comparing the depth map currently calculated from two synchronous color images from the stereo camera with a depth map template representing the relief of an average human face.

28. The system according to claim 24, wherein the recognition server is configured to verify the presence of a living person in the captured image by analyzing synchronized color images from the two cameras of the stereo camera.

29. The system according to claim 24, wherein the recognition server is configured to verify the presence of a living person in the captured image by analyzing synchronized color images from the two cameras of the stereo camera and an image from the infrared camera.

30. The system according to any one of claims 24 to 26, 28 and 29, wherein said check for the presence of a living person in the captured image is performed in the recognition server by a neural network.

31. The system according to claim 24, wherein the terminal further comprises one or more microphones for capturing an audio signal, and the system is configured to provide additional confirmation of identification by recording an audio signal from the microphone(s) of the terminal in synchronization with capturing video images from the cameras and identifying the person by voice extracted from the recorded audio signal.

32. The system according to claim 31, wherein the system is configured to further verify the presence of a living person by matching the movement of the person's lips in the video images with a phrase spoken by the person and captured by recording audio from the terminal microphone(s).

33. The system according to claim 24, wherein the terminal further comprises a contactless reader, and the system is configured to provide additional confirmation of identification by reading, with the contactless reader, a tag or card confirming the person's identity.
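Claims 1 and 17 state only that the "dimensions and coordinates" of the face found in the color image are compared with those found in the infrared or thermal image. The following is a minimal sketch of one possible rule. It assumes both face boxes have already been mapped into a common pixel frame; the claims do not say how the cameras are registered. It combines intersection-over-union with a width ratio. The `Box` type, the function names, and the thresholds `min_iou` and `max_size_ratio` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned face box in a shared pixel frame: top-left (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def person_present(color_face: Box, ir_face: Optional[Box],
                   min_iou: float = 0.5, max_size_ratio: float = 1.3) -> bool:
    """Hypothetical rule: the IR face must overlap the color face and have a similar size."""
    if ir_face is None:  # no face found in the IR/thermal frame
        return False
    smaller = min(color_face.w, ir_face.w)
    if smaller <= 0:  # degenerate box, treat as no match
        return False
    size_ratio = max(color_face.w, ir_face.w) / smaller
    return iou(color_face, ir_face) >= min_iou and size_ratio <= max_size_ratio


color = Box(100, 100, 80, 100)
print(person_present(color, Box(105, 102, 78, 98)))   # True: same face in both frames
print(person_present(color, Box(400, 100, 80, 100)))  # False: boxes do not overlap
```

A printed photo or a screen held up to the terminal shows a face to the color camera but usually not to a thermal camera. In that case `ir_face` is `None` or lands elsewhere, and the check fails before any images are sent to the server.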
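Claims 2-3 and 18-19 activate face search only when motion in the color stream coincides with a change in overall frame illumination. Until that happens, the backlight stays dimmed and frames are checked only coarsely. The sketch below shows one way to express that trigger. It assumes grayscale frames stored as nested lists of 0-255 values and uses simple frame differencing as the motion detector; the claims do not name a detector. All names and thresholds are illustrative.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, values 0..255


def mean_brightness(frame: Frame) -> float:
    """Overall illumination level: the mean pixel value of the frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Frame differencing: fraction of pixels whose value changed by more than pixel_delta."""
    changed = 0
    count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            if abs(c - p) > pixel_delta:
                changed += 1
            count += 1
    return changed / count


def should_activate_face_search(prev: Frame, curr: Frame,
                                min_motion: float = 0.05,
                                min_light_change: float = 10.0) -> bool:
    """Wake up only when motion coincides with a change in overall illumination."""
    moved = changed_fraction(prev, curr) >= min_motion
    light_changed = abs(mean_brightness(curr) - mean_brightness(prev)) >= min_light_change
    return moved and light_changed


dark = [[50] * 4 for _ in range(4)]
lit = [[90] * 4 for _ in range(4)]
print(should_activate_face_search(dark, lit))   # True: pixels changed and the frame got brighter
print(should_activate_face_search(dark, dark))  # False: nothing changed
```

When the function returns `True`, the terminal would, per claim 3, switch the backlight unit to its operating mode and start searching for a face. Requiring both conditions filters out local flicker that leaves the mean brightness unchanged.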
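Claims 4 and 20 keep the stereo and infrared/thermal frames in a ring buffer without losing their synchronicity. One way to get that property is to store each capture as a single bundle in a bounded `collections.deque`. The oldest bundle is then overwritten as a unit, so frames from different moments never become mismatched. This sketch assumes that the frames of one capture share a timestamp. `FrameSet`, `SyncRingBuffer`, and the 20 ms tolerance are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Optional


@dataclass
class FrameSet:
    """One synchronized capture: left and right color frames plus the IR/thermal frame."""
    timestamp_ms: int
    left: Any
    right: Any
    ir: Any


class SyncRingBuffer:
    """Bounded ring buffer that stores and evicts whole frame sets, never single frames."""

    def __init__(self, capacity: int) -> None:
        self._buf: Deque[FrameSet] = deque(maxlen=capacity)

    def push(self, frames: FrameSet) -> None:
        # When full, the deque drops the oldest set automatically.
        self._buf.append(frames)

    def nearest(self, timestamp_ms: int, tolerance_ms: int = 20) -> Optional[FrameSet]:
        """Return the stored set closest in time, or None if none lies within tolerance."""
        if not self._buf:
            return None
        best = min(self._buf, key=lambda fs: abs(fs.timestamp_ms - timestamp_ms))
        return best if abs(best.timestamp_ms - timestamp_ms) <= tolerance_ms else None

    def __len__(self) -> int:
        return len(self._buf)


buf = SyncRingBuffer(capacity=3)
for t in (0, 33, 66, 100, 133):
    buf.push(FrameSet(t, left=f"L{t}", right=f"R{t}", ir=f"IR{t}"))
print(len(buf))            # 3: the two oldest sets were overwritten
print(buf.nearest(70).ir)  # IR66: the IR frame stays paired with its color frames
```

The key design choice is to store the bundle rather than three independent queues. Eviction and lookup then act on all three frames at once, so pre-processing never pairs a color frame with a stale IR frame.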
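Claims 1, 6 and 22 together describe a small control loop on the terminal. The terminal searches for a face in the color stream and tracks it. If the synchronized IR/thermal frame does not confirm the face, it discards the frame and goes back to searching. A hypothetical state-machine sketch of that loop follows. The stage names and the boolean inputs are assumptions; in practice those inputs would come from the face detector and the color/IR comparison.

```python
from enum import Enum, auto


class Stage(Enum):
    SEARCH = auto()  # looking for a face in the color stream
    TRACK = auto()   # face found; tracking it and checking the synchronized IR frame
    SEND = auto()    # IR check passed; images may be sent to the recognition server


def next_stage(stage: Stage, color_face: bool, ir_confirms: bool) -> Stage:
    """Hypothetical terminal loop: an IR miss discards the frame and restarts the search."""
    if stage is Stage.SEARCH:
        return Stage.TRACK if color_face else Stage.SEARCH
    if stage is Stage.TRACK:
        if color_face and ir_confirms:
            return Stage.SEND
        return Stage.SEARCH  # drop the frame, stop tracking, search again
    return Stage.SEARCH  # after sending, start a new cycle


stage = Stage.SEARCH
for color_seen, ir_seen in [(True, False), (True, False), (True, False), (True, True)]:
    stage = next_stage(stage, color_seen, ir_seen)
    print(stage.name)  # TRACK, SEARCH, TRACK, SEND
```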
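Claims 8-9 and 26-27 check liveness by comparing a depth map from the stereo pair with the relief of an average face. The sketch rests on three assumptions. First, the stereo pair is rectified, so depth follows the standard pinhole relation Z = f·B/d, with focal length f in pixels, baseline B in metres, and disparity d in pixels. Second, a disparity map has already been computed. Third, the face region has been cropped and aligned to the template's size. Subtracting the median depth removes the subject's distance from the camera, so a flat photograph yields a near-zero relief that differs from the template. The mean-absolute-difference metric, the threshold, and all names are assumptions, not the method of this patent.

```python
from statistics import median
from typing import List

Grid = List[List[float]]


def disparity_to_depth(disparity: Grid, focal_px: float, baseline_m: float) -> Grid:
    """Rectified stereo: Z = f * B / d. Pixels with d <= 0 (no match) get 0.0 = invalid."""
    return [[focal_px * baseline_m / d if d > 0 else 0.0 for d in row] for row in disparity]


def relief(depth: Grid) -> Grid:
    """Depth relative to the median valid depth, so the distance to the camera cancels out."""
    valid = [z for row in depth for z in row if z > 0]
    if not valid:
        raise ValueError("depth map has no valid pixels")
    ref = median(valid)
    return [[z - ref if z > 0 else 0.0 for z in row] for row in depth]


def relief_distance(depth: Grid, template: Grid) -> float:
    """Mean absolute difference between the measured relief and the average-face template."""
    measured = relief(depth)
    diffs = [abs(m - t) for mr, tr in zip(measured, template) for m, t in zip(mr, tr)]
    return sum(diffs) / len(diffs)


def looks_like_live_face(depth: Grid, template: Grid, max_distance: float = 0.002) -> bool:
    """Hypothetical decision rule: accept if the relief is close enough to the template."""
    return relief_distance(depth, template) <= max_distance


# Toy 3x3 face crop at about 0.5 m (metres): the nose at the centre is nearest the camera.
face = [[0.52, 0.51, 0.52], [0.51, 0.49, 0.51], [0.52, 0.51, 0.52]]
template = relief(face)                 # stands in for an "average face" template
photo = [[0.50] * 3 for _ in range(3)]  # a flat printed photo has no relief
print(looks_like_live_face(face, template))   # True
print(looks_like_live_face(photo, template))  # False
```

A real template would be a larger, normalized depth grid. The 3x3 grid only keeps the arithmetic easy to follow: the flat photo scores (4 × 0.01 + 0.02) / 9 ≈ 0.0067, which is above the 0.002 threshold.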
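Claims 14 and 32 compare lip movement with the spoken phrase but do not say how. This example uses a much simpler proxy, swapped in purely for illustration. It correlates a per-frame mouth-opening measure with the RMS energy of the synchronized audio over each video-frame interval. The mouth-opening measure is assumed to come from some face-landmark detector, which is not shown. A replayed video with an unrelated soundtrack should correlate poorly. The function names and the `min_corr` threshold are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def frame_energy(samples: Sequence[float], samples_per_frame: int) -> List[float]:
    """RMS energy of the audio for each video-frame interval (trailing partial frame dropped)."""
    n_frames = len(samples) // samples_per_frame
    energies = []
    for i in range(n_frames):
        chunk = samples[i * samples_per_frame:(i + 1) * samples_per_frame]
        energies.append(math.sqrt(sum(s * s for s in chunk) / samples_per_frame))
    return energies


def lips_match_audio(mouth_opening: Sequence[float], audio: Sequence[float],
                     samples_per_frame: int, min_corr: float = 0.5) -> bool:
    """True if mouth opening and audio loudness rise and fall together."""
    energy = frame_energy(audio, samples_per_frame)
    n = min(len(mouth_opening), len(energy))
    return pearson(mouth_opening[:n], energy[:n]) >= min_corr


mouth = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
audio = []
for m in mouth:
    amp = 1.0 if m else 0.1  # loud while the mouth is open
    audio += [amp, -amp, amp, -amp]
print(lips_match_audio(mouth, audio, samples_per_frame=4))        # True
print(lips_match_audio(mouth[::-1], audio, samples_per_frame=4))  # False
```

Matching lip movement to the actual words of a phrase would need a visual speech recognizer. The energy correlation checks only that the speaker's mouth moves when the audio is loud, which is weaker but easy to compute on a terminal.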