RU2370817C2

RU2370817C2 - System and method for object tracking

Info

Publication number: RU2370817C2
Application number: RU2004123248/09A
Authority: RU
Inventors: Ванг Джин МУН (KR); Александр Борисович МУРЫНИН (RU); Петр Валерьевич БАЗАНОВ (RU); Виктор Дмитриевич Кузнецов (RU); Светлана Юрьевна Фаткина (RU); Юнг Джин ЛИ (KR); Хе Куан ЯНГ (KR)
Original assignee: Самсунг Электроникс Ко., Лтд. (Samsung Electronics Co., Ltd.); Корпорация С1
Priority date: 2004-07-29
Filing date: 2004-07-29
Publication date: 2009-10-20
Also published as: RU2004123248A

Abstract

FIELD: electrical engineering.

SUBSTANCE: the invention relates to video surveillance technology, namely to systems and methods for automatically detecting and tracking a human face for biometric identification of a person. In the method, the presence of a person in the surveillance zone is detected with two or more spatially separated video cameras whose positions are known in advance; the head position in the surveillance zone is determined using a priori data on its geometric dimensions; the face region is extracted together with the positions of facial elements such as the eyebrows, eyes, nose, and mouth; three types of objects (points, regions, and a graph) are tracked simultaneously on the face; a 3D face model is reconstructed from the a priori and detected data; if the informative criteria of the resulting 3D face model are sufficiently complete and consistent, the angles that define the orientation of the head in space are calculated; if the detected angle is sufficiently representative and differs from the angles in previous frames, the face is recognized from the most representative image frames. The system implements this method.
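The final step of the abstract (recognizing the face only from frames whose head orientation is representative and differs from orientations already seen) can be illustrated with a minimal key-frame selector. This is a sketch under stated assumptions, not the patented procedure: the function name `select_key_frames`, the (yaw, pitch, roll) tuples in degrees, the 15-degree separation, and the 60-degree yaw limit used as the "representativeness" test are all hypothetical.

```python
def select_key_frames(poses, min_sep_deg=15.0, max_abs_yaw_deg=60.0):
    """Return indices of frames worth passing to recognition (hypothetical rule).

    poses: sequence of (yaw, pitch, roll) head angles in degrees, one per frame.
    A frame is kept if its yaw is within max_abs_yaw_deg and its orientation
    differs by at least min_sep_deg (in some angle) from every frame kept so far.
    """
    kept = []  # list of (frame_index, (yaw, pitch, roll))
    for idx, pose in enumerate(poses):
        yaw, pitch, roll = pose
        if abs(yaw) > max_abs_yaw_deg:
            continue  # too far from frontal to be considered representative
        is_new_view = all(
            max(abs(yaw - ky), abs(pitch - kp), abs(roll - kr)) >= min_sep_deg
            for _, (ky, kp, kr) in kept
        )
        if is_new_view:
            kept.append((idx, pose))
    return [idx for idx, _ in kept]


poses = [(0, 0, 0), (5, 0, 0), (20, 0, 0), (80, 0, 0), (-20, 0, 0), (22, 0, 0)]
print(select_key_frames(poses))  # [0, 2, 4]
```

Frame 1 is rejected as a near-duplicate of frame 0, frame 3 as too far from frontal, and frame 5 as a near-duplicate of frame 2.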

EFFECT: higher accuracy and speed of human face detection and tracking, and a wider field of application.

11 cl, 9 dwg

Description

The invention relates to video surveillance technology, and in particular to systems and methods for automatically detecting and tracking a person's face in a surveillance zone for biometric identification.

With advances in real-time signal and image processing, sophisticated computer systems that provide video monitoring of a field of view with automatic recognition of the observed object are becoming increasingly widespread. In most cases the field of view is a specific space controlled by an intelligent expert security system that makes decisions much as a human security guard would.

To make such decisions, the system needs certain data about the object, which can be obtained by tracking the object's key features. When the object is a person, the most characteristic features can be obtained by tracking the person's face. The system therefore has to deal with non-static images, in other words, with a video sequence.

One of the main elements of the tracking process is determining (predicting) the position of an object, for example a face, in the next frame based on assumptions made from the object's position in the previous frame. Tracking is a very fast operation compared with detection and recognition, which is why it is a key element of any real-time recognition system in applications such as face recognition, video conferencing, surveillance, human-machine interfaces, virtual reality, and robot vision.
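The prediction step described above can be sketched with the simplest possible motion model. The sketch assumes a constant-velocity model in image coordinates and a hypothetical `Track` class; the patent does not specify which predictor is used, and practical trackers often use a Kalman filter instead.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical image-plane track with a constant-velocity motion model."""
    x: float
    y: float
    vx: float = 0.0  # pixels per frame
    vy: float = 0.0

    def predict(self, dt: float = 1.0) -> tuple:
        """Predict the position in the next frame from the current state."""
        return (self.x + self.vx * dt, self.y + self.vy * dt)

    def search_window(self, radius: float, dt: float = 1.0) -> tuple:
        """Box (x0, y0, x1, y1) around the prediction in which to look for the object."""
        px, py = self.predict(dt)
        return (px - radius, py - radius, px + radius, py + radius)

    def update(self, mx: float, my: float, dt: float = 1.0) -> None:
        """Correct the state with the position actually measured in the new frame."""
        self.vx = (mx - self.x) / dt
        self.vy = (my - self.y) / dt
        self.x, self.y = mx, my


t = Track(100.0, 50.0)
t.update(104.0, 52.0)          # object moved by (4, 2) pixels
print(t.predict())             # (108.0, 54.0)
print(t.search_window(10.0))   # (98.0, 44.0, 118.0, 64.0)
```

Searching only inside the predicted window, rather than over the whole frame, is what makes tracking cheaper than repeating full detection on every frame.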

One attempt to solve the problem of automatic object tracking and recognition is the detection and tracking system described in published international application PCT/US 98/15323 (see WIPO publication WO 99/06940) [1]. This system is based on passive stereo vision. It uses three known methods for extracting an image region such as a face, based on the following image analysis techniques: estimation of the image depth map, color segmentation of the image, and pattern classification on the grayscale image. The main idea of the system is to combine three independent methods of image analysis and decision making. However, the best system performance was achieved by applying these methods in a sequential cascade.

The main drawback of this method is that it belongs to the category of color-based image analysis methods, in which image hue is the dominant feature when deciding whether a human face has been detected. As a result, color-based analysis imposes severe constraints on the illumination of the observation zone and on the skin color of the person's face. The grayscale analysis performed during pattern classification is secondary and depends substantially on the template obtained from color segmentation and stereo reconstruction. Another drawback is that a predominantly holistic rather than local approach was used to detect and track the face: features of the whole image were analyzed rather than its local features.

Thus, this method has significant limitations in accuracy and efficiency, because it:

1) is holistic, that is, it considers the entire image at high resolution and operates on global image characteristics;

2) is largely sequential;

3) has limited applicability, since it imposes significant constraints on the illumination of the observation zone and on the subject.

Another method and device for detecting and tracking the face/head is presented in published Russian Federation patent No. 2093890 [2]. This method and device verify a person using volumetric (3D) reconstruction. The invention has serious drawbacks in terms of robustness and a very narrow range of applications because, as in the analogue described above, the result of detection and tracking depends substantially on color information obtained from the image.

Closest to the claimed invention is the method described in international application PCT/SE 02/01234 (see WIPO publication WO 03/003910) [3]. This method of accurately detecting and tracking eyes and faces is based on analysis of the object boundaries and contours obtained by filtering a grayscale image. The key idea of that invention is to integrate two detectors into one: the face detection and eye detection procedures are interdependent. Consequently, face detection depends strongly on the appearance of the eyes in the grayscale image. This coupling increases the speed and robustness of face and eye detection in some cases, but in others (for example, when the eyes are closed) it prevents the face from being detected at all.

Another drawback of that solution is that it suits only a very narrow range of biometric tasks and does not handle more complex situations, such as several faces in the frame or a face occluded by other objects. In addition, the method is tied to a specific background, lighting, and observation-zone geometry.

Thus, the method cannot perform real-time detection and tracking in complex situations with a noisy background, overlapping faces, and rapidly changing facial expressions. Despite these drawbacks, this image analysis method is the closest to the claimed invention and has therefore been selected as the prototype.

The objective of the claimed system and method is to increase the accuracy and speed of detecting and tracking faces and facial features by using stereo reconstruction of the observation zone together with contour-based analysis of grayscale images. A further objective is to broaden the conditions of use, namely detecting and tracking objects in near-darkness, against a noisy background, with overlapping faces, and under rapidly changing facial expressions.

The technical result is achieved through an improved object recognition and tracking procedure, in particular because color information is used only to supply heuristics that accelerate the convergence of the object-tracking methods operating on the grayscale image.

The proposed system comprises the following main elements:

- a detector consisting of a sensor (for example, a stereo camera), a processor (for example, a digital signal processor), and software for face detection and tracking;

- a sensor (for example, a stereo camera) that acquires a stream of data about the observation zone and passes it to the modules and units implemented on one or more processors (for example, digital signal processors) that process the incoming data stream;

- an illuminator (for example, a diode array) for operation in low light.

One of the main distinguishing features of the system is that it operates in real time and is optimized for any resolution; recognition can be performed without color information, which allows inexpensive low-resolution cameras to be used successfully. In addition, the proposed method combines depth-map estimation with contour-based analysis of grayscale images, which significantly expands the video surveillance area.

Because the proposed 3D stereo reconstruction is combined with 2D contour-based analysis of grayscale images, the system can effectively detect faces that are partially occluded by other objects. This integration of two detection methods makes the system versatile and suitable for real applications with a very noisy background or a complex shooting scenario.

To represent the biometric detection/tracking system explicitly and efficiently, an architecture for the biometric recognition system was developed that allows the tracking process to be parallelized effectively. Moreover, the main distinguishing feature is that the tracking module is parallelized by running three independent types of tracking (of points, regions, and the face graph) together with a module that coordinates them.

On the one hand, this way of parallelizing tracking improves performance and enables real-time detection of fast-moving objects, for example when a person quickly changes head orientation. On the other hand, it provides robustness to block noise (occlusion by other objects) and point noise in the video sequence. If a person covers the face with the hands, reliable point tracking is impossible, and the system switches to robust tracking of the regions and of the structure of the face graph. If system performance degrades significantly or the regions in the face graph become inconsistent, a new stereo reconstruction is performed.
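The switching behaviour just described can be expressed as a small decision rule in the coordination module. The sketch is hypothetical: the `Mode` names, the two input scores, and the thresholds 0.5 and 0.6 are illustrative assumptions, since the text does not say how "reliable point tracking" or "graph consistency" are measured.

```python
from enum import Enum


class Mode(Enum):
    POINTS = "track points"                              # normal, fastest mode
    REGIONS_AND_GRAPH = "track regions and face graph"   # fallback under occlusion
    RECONSTRUCT = "new stereo reconstruction"            # restart from 3D detection


def choose_mode(point_success_ratio: float, graph_consistency: float,
                min_point_ratio: float = 0.5, min_consistency: float = 0.6) -> Mode:
    """Hypothetical coordination rule for the three parallel trackers.

    point_success_ratio: fraction of facial points tracked reliably in this frame.
    graph_consistency: score in [0, 1] of how well the tracked regions still
        fit the face graph.
    """
    if graph_consistency < min_consistency:
        return Mode.RECONSTRUCT          # regions no longer form a plausible face
    if point_success_ratio < min_point_ratio:
        return Mode.REGIONS_AND_GRAPH    # e.g. hands covering the face
    return Mode.POINTS


print(choose_mode(0.9, 0.9))  # Mode.POINTS
print(choose_mode(0.2, 0.9))  # Mode.REGIONS_AND_GRAPH
print(choose_mode(0.9, 0.3))  # Mode.RECONSTRUCT
```

Checking graph consistency first reflects the text's ordering: a broken face graph calls for a full stereo reconstruction regardless of how many points are still being followed.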

The invention uses approaches based on image data such as depth, region intensity, and edge and boundary properties, and identifies an optimal hybrid method (a compromise between accuracy and performance) that uses all of these image properties in a particular order and configuration. The method combines three types of two-dimensional tracking with depth-map estimation and, moreover, uses illumination-invariant local feature extraction, which makes it possible to compute the spatial angles between local points and to estimate the orientation of the person's face.

As noted above, most known systems for detecting and tracking a human face have strict restrictions on use, low accuracy and performance, and a narrow spectral range of signal acquisition under varying illumination. Unlike these systems, the proposed stereo face-tracking system has the following properties:

(a) operates in detection mode for one or more faces;

(b) separates faces from a complex background;

(c) remains operational in large crowds;

(d) provides a high level of robustness and integrity through an intelligent face-tracking module;

(e) delivers high real-time processing performance;

(f) supports a wide range of signal frequencies for operation under varying illumination.

The essence of the proposed method for detecting and tracking multiple human faces in the observation zone is the use of a hybrid parallel architecture for face detection, tracking, and recognition, together with the collection of information on depth, color, grayscale brightness, edges and contours, and the structure of the face graph in relation to face orientation, motion, and symmetry.

The proposed method uses at least two identical sensors, for example video cameras (hereinafter, the stereo camera), which are spatially separated and have a predetermined orientation. The two-dimensional spatial distributions of light intensity from the stereo camera (hereinafter, stereo images) are transmitted to processors, for example digital signal processors operating at different frequencies, which analyze and process the stereo image data.
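For two identical cameras with a known relative placement, the 3D position of a point follows from triangulation. The sketch below assumes the simplest textbook configuration, which the patent does not state explicitly: rectified cameras with parallel optical axes, focal length f in pixels, baseline B in metres, and principal point (cx, cy). A point seen at column u_L in the left image and u_R in the right image has disparity d = u_L - u_R and depth Z = f·B/d; X and Y then follow from the pinhole model. The function name and the camera parameters in the example are hypothetical.

```python
def triangulate(u_left: float, v: float, u_right: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> tuple:
    """Return (X, Y, Z) in metres in the left camera frame for a rectified pair."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    z = focal_px * baseline_m / disparity   # Z = f * B / d
    x = (u_left - cx) * z / focal_px        # pinhole back-projection
    y = (v - cy) * z / focal_px
    return (x, y, z)


# Hypothetical camera: f = 800 px, 12 cm baseline, principal point (320, 240).
print(triangulate(400.0, 240.0, 352.0, 800.0, 0.12, 320.0, 240.0))
# approximately (0.2, 0.0, 2.0): 20 cm right of the optical axis, 2 m away
```

Because Z is inversely proportional to d, a one-pixel disparity error costs far more depth accuracy for distant points than for near ones, which is one reason the baseline is fixed and known in advance.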

The architecture of a system implementing the proposed method consists of three main modules:

1) a module for detecting three-dimensional objects / the human head;

2) a module for fast face detection and tracking;

3) a face recognition module.

This architecture can be implemented, for example, with three parallel processors, each solving one of the following computer vision tasks: initial object acquisition, tracking, and recognition. However, the most significant difference between the proposed method and the prototype [3] is that the structure of the second module is parallelized and different types of objects are tracked independently at different tracking frequencies:

F₁ - tracking frequency for points;

F₂ - tracking frequency for regions;

F₃ - tracking frequency for the face graph.

In most cases the strategy for prioritizing the tracking frequencies is fixed; alternatively, it is set dynamically according to objective functions. For example, the dynamic choice of tracking frequencies may be driven by the biometric requirements of the application (the trade-off between robustness and performance, or between type I and type II errors) or may depend on a quality criterion for the integrity of the face model. A fixed frequency priority allows, for example, the following strategy:

- at a high frequency, for fast detection of reference points and their tracking on the face;

- at a medium frequency, for detection of facial regions and their tracking within the face area;

- at a low frequency, for detection and tracking of the face graph structure.
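A fixed priority of this kind can be realized by running each tracker on every k-th frame, so that with a camera frame rate F the effective tracking frequency is F_i = F / k_i, giving F₁ ≥ F₂ ≥ F₃. The sketch is hypothetical: the periods (1, 3, and 10 frames), the tracker names, and the rule that doubles the graph-tracking frequency when a face-model integrity score drops are illustrative assumptions standing in for the dynamic strategy that the text mentions without specifying.

```python
def trackers_due(frame_idx: int, periods: dict) -> list:
    """Names of the trackers that should run on this frame (period in frames)."""
    return [name for name, period in periods.items() if frame_idx % period == 0]


def adapt_periods(periods: dict, integrity: float, threshold: float = 0.5) -> dict:
    """Hypothetical dynamic rule: if the face model looks unreliable,
    run the (slow) graph tracker twice as often."""
    adapted = dict(periods)
    if integrity < threshold:
        adapted["graph"] = max(1, periods["graph"] // 2)
    return adapted


PERIODS = {"points": 1, "regions": 3, "graph": 10}   # F1 > F2 > F3
frame_rate_hz = 30.0
print({name: frame_rate_hz / p for name, p in PERIODS.items()})
# {'points': 30.0, 'regions': 10.0, 'graph': 3.0}
print(trackers_due(0, PERIODS))                # ['points', 'regions', 'graph']
print(trackers_due(3, PERIODS))                # ['points', 'regions']
print(adapt_periods(PERIODS, integrity=0.3))   # {'points': 1, 'regions': 3, 'graph': 5}
```

Using integer frame periods keeps all three trackers synchronized to the same frame clock, so the coordination module always compares results that refer to the same frame.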

Main stages and processes of stereo image processing:

First, the images are preprocessed with a local median filter to reduce point noise. The three main modules then start operating. This organization of the biometric system architecture provides a very high level of robustness and performance.
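A local median filter replaces each pixel with the median of its neighbourhood, which removes isolated impulse ("point") noise while preserving edges better than averaging would. The 3 × 3 window, the edge-replication border rule, and the list-of-rows image format below are assumptions; the patent does not state the window size.

```python
def median_filter_3x3(img):
    """3x3 median filter on a grayscale image given as a list of rows.

    Border pixels are handled by replicating the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = sorted(window)[4]  # middle of the 9 values
    return out


noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],   # isolated "salt" pixel
         [10, 10, 10, 10]]
print(median_filter_3x3(noisy))
# [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
```

A single outlier can never be the middle of nine sorted values, so it disappears completely, whereas a mean filter would smear it over the whole neighbourhood.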

The three-dimensional object detection module performs two main processes:

- three-dimensional reconstruction of the observation zone;

- head detection.

To make the system robust to unfamiliar backgrounds and occluded faces, an initial three-dimensional reconstruction of the observation zone is performed in combination with color and grayscale segmentation. If the system does not identify a three-dimensional object in the scene with a high degree of confidence, the three-dimensional distribution of background elements is stored for later comparison. Head detection is performed by localizing the elements of a three-dimensional object in space: corresponding points are found, and the measured three-dimensional distribution of elements is compared with the previously stored three-dimensional distribution of the background. If the difference exceeds a predetermined value, the geometric size and possible outline of a person are determined in the zone of change.
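The background comparison and the subsequent size check can be sketched on depth maps. This is a simplified illustration under assumptions not taken from the patent: depth maps are given in metres as lists of rows, the "difference" is a per-pixel absolute depth change compared with a hypothetical threshold `tau_m`, the candidate region is simply the bounding box of all changed pixels, and the a priori head size is a hypothetical width range of 0.12 to 0.25 m. A real system would segment the changed blob and examine its upper part.

```python
def changed_mask(depth, background, tau_m):
    """True where the measured depth differs from the stored background by more than tau_m."""
    return [[abs(d - b) > tau_m for d, b in zip(d_row, b_row)]
            for d_row, b_row in zip(depth, background)]


def bounding_box(mask):
    """(x0, y0, x1, y1) of all True pixels, or None if nothing changed."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def plausible_head_width(width_px, depth_m, focal_px, min_m=0.12, max_m=0.25):
    """Convert an image width to metres with the pinhole model (W = w * Z / f)
    and check it against hypothetical a priori head-width limits."""
    width_m = width_px * depth_m / focal_px
    return min_m <= width_m <= max_m


background = [[5.0] * 6 for _ in range(4)]    # stored depth of the empty scene
depth = [row[:] for row in background]
for y, x in [(1, 2), (1, 3), (2, 2), (2, 3)]:
    depth[y][x] = 2.0                         # something appeared 2 m away
print(bounding_box(changed_mask(depth, background, tau_m=0.3)))  # (2, 1, 3, 2)
print(plausible_head_width(80, 2.0, 800.0))    # True: 0.2 m wide
print(plausible_head_width(400, 2.0, 800.0))   # False: 1.0 m wide
```

Working in metres rather than pixels is what lets a priori head dimensions be used at all: the same head spans very different pixel widths at different distances, but its metric width stays roughly constant.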

The fast face detection and tracking module performs:

- motion estimation;

- detection of the main facial features and normalization (initial detection of the face region, local detection of points in the face region, detection of facial feature regions, and detection of the face-graph orientation);

- image segmentation (image pyramid, region of interest);

- tracking of points, of facial regions (eyes, eyebrows, nose, mouth, chin), and of the graph;

- estimation of the angles between points, regions, and graphs;

- estimation of the person's head orientation from the computed angles;

- semantic analysis of the face, for which the face must be sufficiently complete and consistent with the face metric.

During operation, the first module compares the current stereo image with previous stereo images in the video stream and determines how they differ, for example by forming a two-dimensional velocity field with the second module.
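One common way to form a coarse two-dimensional velocity field between consecutive frames is block matching. The sketch below is an illustration under that assumption (the patent does not name the method); `block_motion`, the block size and the search range are hypothetical choices, and images are plain lists of grayscale rows.

```python
def block_motion(prev, curr, block=2, search=1):
    """Coarse 2D velocity field by block matching.

    For each block of `curr`, search displacements within +/-`search` pixels in
    `prev` for the block with the lowest sum of absolute differences (SAD).
    Zero displacement is tried first and kept on ties, so uniform areas report
    no motion. Returns {(row, col): (vy, vx)} in pixels per frame.
    """
    h, w = len(curr), len(curr[0])

    def sad(by, bx, y0, x0):
        return sum(abs(curr[by + i][bx + j] - prev[y0 + i][x0 + j])
                   for i in range(block) for j in range(block))

    velocities = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_cost, best_dy, best_dx = sad(by, bx, by, bx), 0, 0
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    cost = sad(by, bx, y0, x0)
                    if cost < best_cost:
                        best_cost, best_dy, best_dx = cost, dy, dx
            # Content moved from (by+dy, bx+dx) in prev to (by, bx) in curr.
            velocities[(by, bx)] = (-best_dy, -best_dx)
    return velocities


# Usage: a bright 2x2 square moves one pixel to the right between frames.
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (1, 2):
        prev[r][c] = 9
    for c in (2, 3):
        curr[r][c] = 9
motion = block_motion(prev, curr)
```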

If a human body is positively detected, the first module makes a more detailed estimate of the spatial location of the object's elements by searching for corresponding points in two or more stereo images.

The face recognition module performs:

- normalization of the person's face according to the metric of the three-dimensional face model;

- construction of the three-dimensional face model, and identification and verification of the face.

The proposed method and system are illustrated by the drawings.

Fig. 1 shows how the elements of the whole system and the individual detection, tracking and recognition units interact.

Fig. 2 shows two camera positions for capturing the video signal and the result of converting the signal into a stereo image.

Fig. 3 shows an example of stereo image preprocessing (the result of local intensity normalization).

Fig. 4 shows an example of 3D reconstruction from a stereo pair of images and extraction of the head.

Fig. 5 shows the result of holistic feature detection on a stereo pair of images: local facial features and face regions, namely the eyes, eyebrows, nose and mouth.

Fig. 6 shows a segmented facial-feature map that makes local feature detection on a stereo pair of images possible: local facial features and face regions, namely the eyes, eyebrows, nose and mouth.

Fig. 7 shows an example of point tracking on a stereo pair of images.

Fig. 8 shows an example of tracking local facial features on a stereo pair of images.

Fig. 9 shows an example of face graph tracking on a stereo pair of images.

The method is implemented by the system described below as follows.

Video signals from two video cameras (a stereo camera) or from multiple cameras are fed to the input of signal conversion unit 1. Unit 1 performs all operations involved in receiving the signals transmitted by the stereo camera (Fig. 2) and converting them into images and, when several cameras are used, performs the necessary signal synchronization. Two different camera positions, vertical and horizontal, are suitable for capturing images (Fig. 2). The choice of camera position depends on the specific application of the device and is determined by the geometric and lighting constraints of the shooting conditions. The horizontal position increases the camera's horizontal field of view, while the vertical position makes it more convenient for the user to position the face in height (Fig. 2).

The images are then fed to preprocessing unit 2, which performs local illumination normalization (Fig. 3) and removes noise from the image. When an object is detected (step 3 in Fig. 1), the high-quality normalized image pairs are fed to the input of the fast face detection and tracking module (hereinafter module 7). The proposed method assumes that motion in the detection zone is first detected by fast 2D optical flow analysis methods implemented in unit 8 (segmentation and coordination of face/position detection and tracking) and in unit 11 (detection and tracking of points of interest); both units belong to module 7.
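Local illumination normalization is often done by subtracting a local mean and dividing by a local standard deviation. The patent does not give its formula, so the following is a minimal sketch under that assumption; the window `radius` and the `eps` guard are hypothetical parameters.

```python
import math


def local_normalize(img, radius=1, eps=1e-6):
    """Local intensity normalization: subtract the mean of each pixel's
    (2*radius+1)^2 neighbourhood (clipped at the borders) and divide by that
    neighbourhood's standard deviation. `eps` avoids division by zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[y][x] = (img[y][x] - mean) / (std + eps)
    return out


# Usage: the same scene under a higher gain and offset gives nearly the same output.
img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
bright = [[2 * v + 10 for v in row] for row in img]
a, b = local_normalize(img), local_normalize(bright)
```

This invariance to global gain and offset is what makes such a step useful before matching the left and right images.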

Once changes are detected in the observation zone, if the detection consistency is low (step 10 in Fig. 1), a signal is generated for the 3D reconstruction unit (hereinafter unit 5) of module 4, the 3D object detection module. Then, taking into account the camera lens distortion and the offset of the optical axes, stereo reconstruction is performed on the stereo pair of images; this restores the relief of the observation zone and produces a depth map describing how far objects are from the camera.
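The text does not specify the lens distortion model. Assuming the common one-coefficient radial model on normalized image coordinates, x_d = x_u (1 + k1 r_u^2), distortion can be removed before matching by fixed-point iteration, as sketched here; `k1` and the iteration count are hypothetical calibration values.

```python
def distort(x, y, k1):
    """Apply one-term radial distortion to normalized coordinates (x, y)."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return x * s, y * s


def undistort(xd, yd, k1, iterations=20):
    """Invert `distort` by fixed-point iteration: x_u = x_d / (1 + k1 * r_u^2)."""
    xu, yu = xd, yd
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        s = 1.0 + k1 * r2
        xu, yu = xd / s, yd / s
    return xu, yu


# Usage: a point distorted with k1 = -0.1 is recovered by undistortion.
xd, yd = distort(0.3, -0.2, -0.1)
xu, yu = undistort(xd, yd, -0.1)
```

The iteration converges quickly for moderate distortion near the image center; strongly distorted lenses need more terms or a different solver.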

The depth map is recovered by computing the disparity. Here, disparity is the horizontal distance between the positions of the same object point in the left and right images (Fig. 4). In the resulting depth map, head-object extraction unit 6 identifies the objects whose outline matches the shape of a head. The data on these candidate objects from module 4 are entered into the 3D face model (see 18 in Fig. 1), and control is then passed entirely to fast face detection and tracking module 7.
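For rectified stereo images, disparity can be found by comparing small windows along the same image row, and depth then follows from the pinhole relation Z = f * B / d (focal length f in pixels, baseline B). The sketch below assumes rectification has already been done; `match_disparity`, the window size and `max_disp` are hypothetical, and the patent does not state which matching cost it uses.

```python
def match_disparity(left, right, y, x, half=1, max_disp=5):
    """Disparity of left-image pixel (y, x): the shift d for which the window
    around (y, x - d) in the right image best matches, by lowest sum of
    absolute differences (SAD). Assumes rectified images."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break
        cost = sum(abs(left[y + j][x + i] - right[y + j][x - d + i])
                   for j in range(-half, half + 1) for i in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo depth Z = f * B / d; zero disparity means infinitely far."""
    if d <= 0:
        return float("inf")
    return focal_px * baseline_m / d


# Usage: the right image is the left image shifted three pixels to the left.
W, H = 12, 3
left = [[(7 * x + 3 * y) % 11 for x in range(W)] for y in range(H)]
right = [[left[y][x + 3] if x + 3 < W else 0 for x in range(W)] for y in range(H)]
d = match_disparity(left, right, 1, 8)
```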

Face model 18 is a universal set of two-dimensional face images (feature maps) together with 3D face metrics (a set of points and the distances between them). It is implemented as a resource shared by all parallel modules and by the three parallel tracking units (units 11, 12 and 13).
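Because face model 18 is read and written by several parallel units, it needs some form of synchronized access. The following is a hypothetical sketch of such a shared structure in Python, using a lock around 3D landmark updates; the class name, fields and landmark names are illustrative assumptions, not the patent's data layout.

```python
import math
import threading
from dataclasses import dataclass, field


@dataclass
class FaceModel:
    """Hypothetical stand-in for face model 18: 2D feature maps plus 3D
    landmark points, guarded by a lock so parallel trackers can share it."""
    feature_maps: dict = field(default_factory=dict)   # name -> 2D map
    points_3d: dict = field(default_factory=dict)      # landmark -> (x, y, z)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def update_points(self, new_points):
        with self._lock:
            self.points_3d.update(new_points)

    def snapshot_points(self):
        with self._lock:
            return dict(self.points_3d)

    def distance(self, a, b):
        """Euclidean distance between two stored landmarks (part of the 3D metric)."""
        pts = self.snapshot_points()
        return math.dist(pts[a], pts[b])


# Usage: three "tracker" threads write their results into the shared model.
model = FaceModel()
updates = [{"left_eye": (0.0, 0.0, 1.0)},
           {"right_eye": (3.0, 4.0, 1.0)},
           {"nose": (1.5, 2.0, 0.9)}]
threads = [threading.Thread(target=model.update_points, args=(u,)) for u in updates]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Readers take a snapshot copy so that a long computation does not hold the lock while other units are writing.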

After the head image has been initially assessed by three-dimensional reconstruction, only fast two-dimensional holistic detection methods run; they populate face model 18 with candidate objects for subsequent tracking.

Using the head position information stored in face model 18, unit 8 builds the image pyramid, performs segmentation and selects regions of interest for the various detection and tracking algorithms. The image pyramid is a chosen sequence of resolutions for subsequent image analysis. Image segmentation is the clustering of the image according to various features such as color, depth and pixel intensity; as a result, only the informative parts of the images are retained.
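A simple image pyramid halves the resolution at each level, and a region of interest is a sub-image chosen for a given algorithm. This sketch assumes 2x2 block averaging for downsampling (a Gaussian blur before subsampling is a common alternative); the function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks
    (an odd last row or column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)]
            for y in range(h)]


def build_pyramid(img, levels):
    """Return [full, 1/2, 1/4, ...] resolutions, stopping early below 2x2."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break
        pyramid.append(downsample(top))
    return pyramid


def crop_roi(img, top, left, height, width):
    """Region of interest as a sub-image."""
    return [row[left:left + width] for row in img[top:top + height]]


# Usage: an 8x8 horizontal ramp reduced to four levels.
img = [[x for x in range(8)] for _ in range(8)]
pyr = build_pyramid(img, 4)
```

Coarse levels let detectors search quickly for the head, after which finer levels are used only inside the selected region of interest.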

Unit 8 then controls the detection and tracking of regions, normalizes the data arriving from the various detection and tracking units to a uniform form, and updates the parameters of face model 18 with the new information.

At each iteration, unit 8 evaluates the consistency level. The consistency level of face model 18 characterizes how well the local facial features agree with anthropometric placement and with each other, and how accurately they were detected. If consistency is high, a signal is generated for unit 9, which determines the pose of face model 18. If consistency (the integrity of face model 18) is low, a signal is sent to module 4 to perform a new stereo reconstruction. If unit 9 successfully estimates the face pose, unit 8 evaluates the representativeness of face model 18. The representativeness level of face model 18 characterizes how suitable the found pose is for subsequent tracking or recognition.
If the representativeness level of face model 18 is high (for example, a quick change of face pose has produced a new representative view that is important for identification), a signal is generated (step 14) for face recognition module 15, and control then passes to module 15.
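The decision flow of unit 8 can be summarized as a small function. This is a sketch only: the patent does not define how consistency and representativeness are measured, so the [0, 1] scores, the thresholds and the action names below are hypothetical.

```python
def control_step(consistency, estimate_pose, representativeness_of,
                 consistency_threshold=0.7, representativeness_threshold=0.8):
    """One iteration of the decision logic described for unit 8.

    consistency: score in [0, 1] for the current face model (hypothetical scale).
    estimate_pose: callable returning a pose, or None if pose estimation fails.
    representativeness_of: callable scoring a pose in [0, 1].
    Returns (next_action, pose).
    """
    if consistency < consistency_threshold:
        return "stereo_reconstruction", None      # signal to module 4
    pose = estimate_pose()                        # unit 9
    if pose is None:
        return "continue_tracking", None
    if representativeness_of(pose) >= representativeness_threshold:
        return "recognition", pose                # step 14, module 15
    return "continue_tracking", pose
```

Passing the pose estimator and the representativeness measure in as callables keeps the control logic independent of how units 9 and 8 actually compute them.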

After the images have been segmented into regions of interest and a resolution has been chosen for each region, three object detection processes are started (points, regions and graph). Depending on the initial system parameters and on the integrity of the current face model 18, a predetermined strategy is applied that sets the priority and frequency (frames per second) of each tracking process. High efficiency is achieved when each tracking unit (units 11, 12, 13) runs independently on its own processor, for example a digital signal processor. A detection or tracking mode is also selected for each process.
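A priority-and-frequency strategy can be expressed as a per-frame schedule: each tracker processes every n-th frame, and trackers due on the same frame are served in priority order. The sketch assumes frequencies are given as frame periods; the specific periods (fast, medium, slow, matching the three rates claim 5 refers to) and priorities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackerSpec:
    name: str
    priority: int  # lower value is served first
    period: int    # process every `period`-th frame (1 = every frame)


def due_trackers(frame_index, specs):
    """Names of the trackers that should process this frame, highest priority first."""
    due = [s for s in specs if frame_index % s.period == 0]
    return [s.name for s in sorted(due, key=lambda s: s.priority)]


# Fast point tracking, medium-rate region tracking, slow graph tracking.
specs = [TrackerSpec("graph", 2, 4),
         TrackerSpec("points", 0, 1),
         TrackerSpec("regions", 1, 2)]
```

When each tracker runs on its own processor, the same table would simply decide which frames are forwarded to which processor.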

To achieve more accurate feature detection and a higher level of feature integrity, two approaches to feature detection are combined: holistic and local.

In detection mode, unit 11 for detecting/tracking points of interest performs holistic feature detection on the stereo pair of images, finding local facial features and face regions, namely the eyes, eyebrows, nose and mouth. The method is based on horizontal and vertical projections and on the anthropometric characteristics of the face (see Fig. 5).
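Integral projections sum image intensities along rows (horizontal projection) and columns (vertical projection). A common heuristic, assumed here rather than taken from the patent, is that dark features such as the eyes produce minima: the eye line is the darkest row in the upper part of the face, and the eyes are the two darkest, well-separated columns of that band. The helper names and `min_gap` are hypothetical.

```python
def horizontal_projection(img):
    """Sum of intensities along each row."""
    return [sum(row) for row in img]


def vertical_projection(img):
    """Sum of intensities along each column."""
    return [sum(col) for col in zip(*img)]


def darkest_row(img, start, stop):
    """Row in [start, stop) with the smallest horizontal projection."""
    proj = horizontal_projection(img)
    return min(range(start, stop), key=lambda r: proj[r])


def darkest_columns(band, count=2, min_gap=2):
    """Columns of the `count` lowest vertical-projection values of a band,
    at least `min_gap` columns apart (e.g. the two eyes)."""
    proj = vertical_projection(band)
    chosen = []
    for c in sorted(range(len(proj)), key=lambda c: proj[c]):
        if all(abs(c - k) >= min_gap for k in chosen):
            chosen.append(c)
        if len(chosen) == count:
            break
    return sorted(chosen)


# Usage: a bright 6x6 patch with two dark "eye" pixels on row 2.
img = [[200] * 6 for _ in range(6)]
img[2][1] = img[2][4] = 50
eye_row = darkest_row(img, 0, 3)
eye_cols = darkest_columns(img[eye_row:eye_row + 1])
```

In practice the search range for each feature would be restricted using anthropometric proportions of the face, as the text describes.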

In tracking mode, unit 11 performs only fast point tracking, using a fast optical flow analysis method (see Fig. 7).
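The patent does not name its optical flow method. A standard choice for tracking individual points is Lucas-Kanade, which solves the brightness-constancy equation Ix*u + Iy*v + It = 0 by least squares over a small window. The single-window, single-level sketch below is an illustration under that assumption; the window size is a hypothetical parameter and no image pyramid is used.

```python
def lucas_kanade(prev, curr, cx, cy, half=2):
    """Single-window Lucas-Kanade: displacement (u, v) of the patch centred at
    (cx, cy), or None if the window has too little texture (aperture problem)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0   # central differences
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]                   # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Usage: a smooth bowl-shaped image shifted by (u, v) = (0.5, -0.25).
c = 5
prev = [[(x - c) ** 2 + (y - c) ** 2 for x in range(11)] for y in range(11)]
curr = [[(x - c - 0.5) ** 2 + (y - c + 0.25) ** 2 for x in range(11)] for y in range(11)]
flow = lucas_kanade(prev, curr, c, c)
```

For larger motions this is usually run coarse-to-fine over the image pyramid described earlier.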

In detection mode, unit 12 for detecting/tracking regions of interest performs primary detection using a Gabor wavelet transform of the image, carries out segmentation, and sets detection rules based on the geometric shape of each region, the location of the region on the face, and holistic image information obtained by the projection method (Fig. 6). In tracking mode, unit 12 switches the detection parameters to a faster mode. The adaptive segmentation method is then no longer needed, so the region of interest can be set arbitrarily, an index table can be provided, and only the main informative facial-feature maps are selected (Fig. 8).
Thus, to detect the face region quickly, two independent types of analysis are applied to the stereo pair images: analysis of the images as a whole using horizontal and vertical projections of the brightness histogram, and analysis of the local frequency properties of the image obtained from its wavelet transform.
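The local-frequency analysis relies on Gabor filters: a Gaussian envelope multiplied by an oriented cosine carrier, which responds strongly to structure of a given orientation and spatial frequency. The sketch below builds only the real part of the kernel and evaluates one response; the kernel size, wavelength and `sigma` are hypothetical values, and a real Gabor bank would use several orientations and scales.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor filter (size must be odd); theta is the
    carrier orientation in radians, wavelength its period in pixels."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def response_at(img, kernel, cy, cx):
    """Filter response (correlation) centred at pixel (cy, cx)."""
    half = len(kernel) // 2
    return sum(kernel[j + half][i + half] * img[cy + j][cx + i]
               for j in range(-half, half + 1) for i in range(-half, half + 1))


# Usage: vertical stripes with period 4 excite the 0-rad filter far more than the 90-degree one.
img = [[math.cos(2 * math.pi * x / 4) for x in range(16)] for _ in range(16)]
k0 = gabor_kernel(7, 4.0, 0.0, 2.0)
k90 = gabor_kernel(7, 4.0, math.pi / 2, 2.0)
r0, r90 = response_at(img, k0, 8, 8), response_at(img, k90, 8, 8)
```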

In detection mode, face graph detection/tracking unit 13 performs initial detection by applying a dynamic link architecture (rigid and elastic graph matching) to the preprocessed facial features. In tracking mode, unit 13 performs the more efficient three-dimensional tracking of the facial-feature structure defined by the graph, whereas units 11 and 12 perform only two-dimensional tracking of image contours. Each facial feature in a predetermined pose is represented by a graph, and the task of this unit is to predict the deformation of the graph in the next frame. This is done with pattern classification methods, for example neural networks, or with the dynamic link architecture (Fig. 9).
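Elastic graph matching typically scores a candidate placement of the face graph by combining how similar the feature vectors ("jets") at each node are with how much the graph's edges are stretched. The cost below is one simple form of that idea, assumed for illustration; the node names, the weighting `lam` and the jet values are hypothetical.

```python
import math


def jet_similarity(a, b):
    """Normalized dot product of two feature vectors ("jets"), e.g. Gabor
    magnitudes sampled at one graph node."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graph_cost(model_pos, model_jets, cand_pos, cand_jets, edges, lam=0.1):
    """Elastic graph matching cost (lower is better): lam times the mean squared
    change of the edge vectors, minus the mean jet similarity over the nodes."""
    nodes = list(model_pos)
    similarity = sum(jet_similarity(model_jets[n], cand_jets[n]) for n in nodes) / len(nodes)
    distortion = 0.0
    for a, b in edges:
        mdx, mdy = model_pos[b][0] - model_pos[a][0], model_pos[b][1] - model_pos[a][1]
        cdx, cdy = cand_pos[b][0] - cand_pos[a][0], cand_pos[b][1] - cand_pos[a][1]
        distortion += (mdx - cdx) ** 2 + (mdy - cdy) ** 2
    distortion /= len(edges)
    return lam * distortion - similarity


# Usage: the undeformed graph scores better than a horizontally stretched one.
model_pos = {"l_eye": (0, 0), "r_eye": (4, 0), "mouth": (2, 4)}
jets = {"l_eye": [1, 2, 2], "r_eye": [2, 1, 2], "mouth": [2, 2, 1]}
edges = [("l_eye", "r_eye"), ("l_eye", "mouth"), ("r_eye", "mouth")]
rigid = graph_cost(model_pos, jets, model_pos, jets, edges)
stretched_pos = {"l_eye": (0, 0), "r_eye": (6, 0), "mouth": (3, 4)}
stretched = graph_cost(model_pos, jets, stretched_pos, jets, edges)
```

Predicting the graph's deformation in the next frame then amounts to choosing, among candidate deformations, the one with the lowest cost.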

If face model 18 is sufficiently consistent, unit 9 computes the head orientation angles, namely tilt, nod and turn. For this it uses the coordinates of the found geometric center of the head and the coordinates of the tracked facial features obtained from the current and previous frames. The turn, nod and tilt angles are computed in space by comparing the obtained data with a template. Once the feature positions have been found in the left and right images, their two-dimensional coordinates are converted into three-dimensional coordinates, which makes it possible to estimate the pose of the person's face. The pose estimate also draws on the information stored in face model 18. The refined pose parameters are written to face model 18, and the face graph is refined.
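The 2D-to-3D step and an angle estimate can be sketched as follows. Triangulation uses the rectified pinhole relations Z = f * B / d, X = (x - cx) * Z / f, Y = (y - cy) * Z / f. For the angles, this sketch swaps in a direct geometric construction from three landmarks (both eyes and the mouth) instead of the template comparison the text describes; the axis convention (X right, Y down, Z away from the camera) and all names are assumptions.

```python
import math


def triangulate(x_left, x_right, y, focal_px, baseline, cx, cy):
    """3D point in the left-camera frame from a rectified stereo match."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline / d
    return ((x_left - cx) * z / focal_px, (y - cy) * z / focal_px, z)


def head_angles(left_eye, right_eye, mouth):
    """Turn (yaw), nod (pitch) and tilt (roll) in degrees from three 3D
    landmarks. A frontal, upright face gives zeros."""
    ex, ey, ez = (right_eye[i] - left_eye[i] for i in range(3))
    mid = [(left_eye[i] + right_eye[i]) / 2 for i in range(3)]
    mx, my, mz = (mouth[i] - mid[i] for i in range(3))
    yaw = math.degrees(math.atan2(ez, ex))                   # eye line turning in depth
    roll = math.degrees(math.atan2(ey, math.hypot(ex, ez)))  # eye line tilting sideways
    pitch = math.degrees(math.atan2(mz, my))                 # eye-to-mouth axis tipping
    return yaw, pitch, roll


# Usage: a face turned 30 degrees about the vertical axis, 1 m from the camera.
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
yaw, pitch, roll = head_angles((-0.03 * c30, 0.0, 1.0 - 0.03 * s30),
                               (0.03 * c30, 0.0, 1.0 + 0.03 * s30),
                               (0.0, 0.07, 1.0))
point = triangulate(370.0, 320.0, 240.0, 500.0, 0.1, 320.0, 240.0)
```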

When a pair of images that is representative and unique for recognition purposes is found, control is passed to recognition module 15, which geometrically normalizes model 18 to the uniform requirements of the recognition algorithms and compares it with templates describing the biometric characteristics of the user being verified or identified.

Unit 17 performs geometric normalization: it brings the size, pose and lighting variations of the face image texture into line with uniform requirements.
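The size and in-plane rotation part of geometric normalization is often done by mapping the two detected eye centers onto fixed target positions with a similarity transform. Treating points as complex numbers makes this a single multiply-and-add, z -> a*z + b. The target coordinates below are hypothetical, and the sketch leaves out lighting normalization and out-of-plane pose correction.

```python
import cmath
import math


def eye_alignment(left_eye, right_eye,
                  target_left=(30.0, 40.0), target_right=(70.0, 40.0)):
    """Similarity transform (rotation, uniform scale, translation) that maps the
    detected eye centres onto fixed target positions. Returns the transform,
    its scale factor and its rotation in degrees."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    t1, t2 = complex(*target_left), complex(*target_right)
    a = (t2 - t1) / (p2 - p1)   # rotation and scale
    b = t1 - a * p1             # translation

    def apply(point):
        z = a * complex(*point) + b
        return z.real, z.imag

    return apply, abs(a), math.degrees(cmath.phase(a))


# Usage: level eyes 20 px apart are scaled by 2; tilted eyes are rotated by -45 degrees.
apply, scale, angle = eye_alignment((10, 10), (30, 10))
_, _, tilt_fix = eye_alignment((0, 0), (10, 10))
```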

Unit 16 performs face recognition for application tasks. It compares the normalized face model 18 with stored face model templates, which may differ from one another in pose and in distance from the camera. Each set of face templates belongs to a specific user and is associated with that user's identifier. Recognition follows either a 1:1 comparison model (verification) or a 1:N comparison model (identification).
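The difference between 1:1 verification and 1:N identification can be shown with any similarity score between a probe and stored templates. This sketch assumes the normalized face model has been reduced to a feature vector and uses cosine similarity with a hypothetical threshold; the patent does not specify the comparison measure. Each user may have several templates (for example, different poses), and the best-matching one counts.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(probe, templates, threshold=0.9):
    """1:1 verification: accept if the probe matches any template of the claimed user."""
    return max(cosine(probe, t) for t in templates) >= threshold


def identify(probe, gallery, threshold=0.9):
    """1:N identification: best-matching user ID, or None if no user's best
    template reaches the threshold. gallery maps user_id -> list of templates."""
    best_id, best_score = None, -1.0
    for user_id, templates in gallery.items():
        score = max(cosine(probe, t) for t in templates)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None


# Usage with toy three-dimensional feature vectors.
gallery = {"user_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
           "user_b": [[0.0, 1.0, 0.0]]}
probe = [0.95, 0.05, 0.0]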

Industrial applications of the device may vary depending on its properties and design. Given the growing demand for intelligent security systems, this stereo detection and surveillance system can serve as an intelligent face surveillance system in offices, banks, shops, private homes, industrial buildings and other enclosed spaces, providing intelligent video surveillance services.

Claims

1. A method for detecting and tracking a person's face in an observation zone and determining its pose, comprising: determining the presence of a person in the observation zone using a stereoscopic system of at least two sensors with a predetermined arrangement; determining the position of the head in the observation zone using a priori data on geometric dimensions and a specific way of positioning the person's face; extracting the face region and the positions of elements such as the eyebrows, eyes, nose and mouth on the found face; simultaneously tracking three types of objects on the face, namely points, regions, and a set of ordered connected regions or points that together form a face graph; reconstructing a three-dimensional face model from a priori and found data; if the informative features of the resulting three-dimensional face model are sufficiently complete and consistent, calculating the angles that determine the orientation of the head in space; and, if the found angle is sufficiently representative and differs from the angles in previous frames, performing face recognition on the most representative image frames.

2. The method according to claim 1, wherein the image of the human head and its distance from the camera are detected on the basis of three-dimensional stereo reconstruction of the observation zone and the formation of candidate objects whose shape corresponds to a human head.

3. The method according to claim 1, wherein the face region is detected quickly by performing two independent types of analysis of the stereo pair images, namely analysis of the images as a whole using horizontal and vertical projections of the brightness histogram, and analysis of the local frequency properties of the image obtained from its wavelet transform.

4. The method according to claim 1, wherein robust detection and tracking are ensured by segmenting the face region according to a combination of features and by precisely localizing the points and regions of interest, the structure of the face graph being determined from a priori data about the face, the poses and the positioning method.

5. The method according to claim 1, wherein the tracking streams for three types of geometric objects, namely points, regions and graphs, are coordinated and controlled, image frames being fed to each tracking stream at a different rate, namely slow, medium and fast.

6. The method according to claim 1, wherein a task distribution unit is used to control the three types of tracking, the unit being configured to set the priority and frequency of the tracking streams and a specific target order for performing tracking tasks, the order and scheduling strategy of the task distribution unit being selected according to the target settings of the biometric system, such as accuracy, speed and reliability, and according to video capture constraints, such as the size of the detection zone, lighting constraints, the presence of complex backgrounds, the presence of occluding objects, and the way the face is positioned in front of the camera.

7. The method according to claim 1, wherein the geometric center of the three-dimensional face model is estimated from the stereo reconstruction together with information about the tracked facial features, namely points, regions and the graph.

8. The method according to claim 1, wherein the coordinates of the found geometric center of the head and the coordinates of the tracked facial features obtained from the current and previous frames are used to determine the pose and calculate the head orientation angles.

9. The method according to claim 1, wherein, to ensure object tracking under poor lighting, the method automatically adapts the parameters of a method that does not use color information and is based on gradient filtering and wavelet transform of grayscale images, followed by recovery of the image depth map.

10. The method according to claim 1, wherein face recognition is performed by selecting the most representative poses, as determined by constraints on the face orientation angles.

11. A system for tracking an object in an observation zone, comprising a three-dimensional object detection module containing at least two identical sensors spaced apart in space with a predetermined orientation and connected to a signal conversion unit, whose output is connected to the input of a unit for preprocessing and normalizing the converted signal, whose output is connected to the input of a fast face detection and tracking module; the fast face detection and tracking module includes a detection and coordination unit connected by fast channels to parallel independent tracking units, namely a unit for tracking points of interest, a unit for tracking regions of interest, and a unit for tracking the structure of the face graph, each of which is configured to read from and write to a shared data unit holding the three-dimensional face model, and to a pose estimation unit; and the fast face detection and tracking module is connected to the input of the face recognition module.