KR20170007070A

KR20170007070A - Method for visitor access statistics analysis and apparatus for the same

Info

Publication number: KR20170007070A
Application number: KR1020150129820A
Authority: KR
Inventors: 김병민; 권재철
Original assignee: 주식회사 케이티
Priority date: 2015-07-08
Filing date: 2015-09-14
Publication date: 2017-01-18
Also published as: KR102550673B1

Abstract

The present invention relates to a human detection method using a depth image analysis. A method of detecting a person using a depth image according to an aspect of the present invention includes: obtaining one or more depth images from one or more depth cameras; generating one analysis target height map using the one or more depth images; determining one or more candidate regions in the one analysis target height map; extracting feature information from each of the one or more candidate regions; and determining if a human object is detected in each of the one or more candidate regions based on the feature information. The feature information may include information on an area according to a height level and an area variation according to the height level in one candidate region. Further, an apparatus and a method for statistically analyzing an access of visitors can be provided based on the person detection method.

Description

METHOD FOR VISITOR ACCESS STATISTICS ANALYSIS AND APPARATUS FOR THE SAME

The present invention relates to human detection, and more particularly to a method, an apparatus, and software for visitor access statistics analysis using depth image analysis, and to a recording medium storing such software.

One conventional visitor counting method uses a device that issues waiting-number tickets. Its main purpose is efficient customer service in order of arrival, but by recording the count and time on a separate server each time a ticket is issued, it could also be used to count visiting customers and analyze the resulting statistics. However, because a person must take the ticket manually, the issuance record can be manipulated, and because the time at which a visitor leaves is unknown, the average dwell time of visitors cannot be determined.

Another conventional visitor counting method uses an infrared sensor installed at a doorway. This is relatively simple and inexpensive, but the accuracy of entry detection and counting drops sharply in crowded situations, for example when several people cross the sensor's detection line at once or the sensor is blocked by another object. In addition, it cannot be used for purposes beyond simple counting, such as acquiring additional information or monitoring.

To solve these problems, methods that detect people by analyzing images captured by a camera are increasingly needed in fields such as customer analysis (counting people entering and leaving, movement path analysis, etc.). Conventional human detection is performed by analyzing camera images of a specific spatial region, such as the vicinity of a door. When a general two-dimensional camera (e.g., a CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) camera) is used, detection accuracy can drop significantly depending on the lighting environment or the complexity of the background. Moreover, when many people move together in a crowd, occlusion of the objects to be detected makes accurate counting difficult. Accordingly, methods that detect people more accurately using a depth camera have recently been developed.

A depth camera can acquire three-dimensional spatial information about the captured scene, so compared with human detection based on two-dimensional images it is relatively less affected by occlusion and by the ambient lighting environment.

However, in conventional human detection using a depth camera, the camera technology used to acquire depth information has a fixed angle of view and a limited effective range (depending on the distance measuring method, a depth camera may be implemented with structured light, TOF (time of flight), stereo, etc.), which restricts the environments in which it can be installed. When the angle of view is fixed, the area that can be captured varies with the distance from the camera to the subject. In addition, a typical depth camera may have limits on the minimum and maximum distances at which it can capture.

Accordingly, in view of these constraints, a configuration has been proposed in which the camera is installed at a distance from the central vertical plane of the entrance area, as in FIG. 1, so that the capture area covers the entire entrance area. In this case, however, the angle between the camera's viewing direction and the floor becomes small, so an object in front may occlude part of an object behind it, which lowers the accuracy of human detection.

In another prior art, a stereo camera is installed on the ceiling facing perpendicular to the floor so that problems such as occlusion do not arise while a depth camera is used. However, in an environment with a high ceiling it is very difficult to install the camera facing perpendicular to the floor, and in an environment with a low ceiling the capture area becomes very narrow because of the limited angle of view, so purposes such as counting people entering and leaving cannot be properly achieved.

The technical object of the present invention is to provide a method and an apparatus that detect people more accurately and efficiently from images acquired from a depth camera without constraints on where the depth camera is installed, detect visitor entry and exit in real time on that basis, and compute and analyze visitor access statistics.

The technical objects to be achieved by the present invention are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A method of detecting a person using a depth image according to one aspect of the present invention may include: acquiring one or more depth images from one or more depth cameras; generating one analysis target height map using the one or more depth images; determining one or more candidate regions in the one analysis target height map; extracting feature information from each of the one or more candidate regions; and determining, based on the feature information, whether a human object is detected in each of the one or more candidate regions. The feature information may include information on the area at each height level within one candidate region and on the change in that area across height levels.

An apparatus for detecting a person using a depth image according to another aspect of the present invention may include: an image receiving unit that acquires one or more depth images from one or more depth cameras; an analysis target height map generation unit that generates one analysis target height map using the one or more depth images; a candidate region determination unit that determines one or more candidate regions in the one analysis target height map; a feature information extraction unit that extracts feature information from each of the one or more candidate regions; and a person determination unit that determines, based on the feature information, whether a human object is detected in each of the one or more candidate regions. The feature information may include information on the area at each height level within one candidate region and on the change in that area across height levels.

According to yet another aspect of the present invention, a computer-readable medium may be provided that stores software having instructions executable by an apparatus for detecting a person using a depth image. The executable instructions may cause the apparatus to: acquire one or more depth images from one or more depth cameras; generate one analysis target height map using the one or more depth images; determine one or more candidate regions in the one analysis target height map; extract feature information from each of the one or more candidate regions; and determine, based on the feature information, whether a human object is detected in each of the one or more candidate regions. The feature information may include information on the area at each height level within one candidate region and on the change in that area across height levels.

According to yet another aspect of the present invention, a method of determining a person's entry or exit using a depth image may be provided. The method includes: determining whether a human detection result exists in the n-th frame of the depth image; if a human detection result exists in the n-th frame, determining whether an existing path exists; if the existing path exists, determining, based on a time threshold and a spatial threshold, whether an appendable path exists; if an appendable path exists, updating the existing path; and determining the person's entry or exit based on the start position and end position of the updated path. The appendable path may be determined to exist when the distance between the last detected position of the existing path and the position of the human detection result in the n-th frame is at most the spatial threshold, and the interval between the last detection time of the existing path and the human detection time in the n-th frame is at most the time threshold.
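The path-update rule above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Path`, `update_paths`, and `classify_path` are hypothetical, choosing the nearest compatible path is an assumption, and the rule in `classify_path` (a path counts as an entry if it starts on one side of an imaginary door line `door_y` and ends on the other) is only one possible way to use the start and end positions.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Path:
    # Each entry is (x, y, t): a detected position and its detection time.
    points: list = field(default_factory=list)


def update_paths(paths, detections, t, space_thr, time_thr):
    """Attach each detection of the frame at time t to an existing path, or start a new one.

    A path is "appendable" when its last point lies within space_thr of the
    detection and was recorded no more than time_thr before t.
    """
    for x, y in detections:
        best, best_dist = None, None
        for path in paths:
            lx, ly, lt = path.points[-1]
            if lt >= t:
                continue  # this path was already extended in the current frame
            dist = hypot(x - lx, y - ly)
            if dist <= space_thr and t - lt <= time_thr:
                # Assumption: prefer the nearest appendable path.
                if best is None or dist < best_dist:
                    best, best_dist = path, dist
        if best is None:
            best = Path()
            paths.append(best)
        best.points.append((x, y, t))
    return paths


def classify_path(path, door_y):
    """Hypothetical entry/exit rule based on the start and end positions only."""
    y_start, y_end = path.points[0][1], path.points[-1][1]
    if y_start < door_y <= y_end:
        return "in"
    if y_end < door_y <= y_start:
        return "out"
    return None
```

A path that is never extended within the time threshold simply stops growing, at which point its start and end positions can be evaluated.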

In various aspects of the present invention, the feature information may include a feature vector whose elements represent the area occupied by the coordinates belonging to each of a plurality of coordinate groups classified by height level.

In various aspects of the present invention, one element of the feature vector may represent the area of the convex hull of the coordinates belonging to the one coordinate group that corresponds to one height level.
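As a rough illustration of such a feature vector, the sketch below groups the points of one candidate region by height level and takes the convex hull area of each group. The function names and the half-open binning by `level_edges` are assumptions made for this example; the text does not specify how height levels are delimited.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; degenerate polygons (fewer than 3 vertices) have area 0."""
    n = len(poly)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_feature_vector(points, level_edges):
    """points: iterable of (x, y, h); level_edges: ascending height boundaries.

    Level i collects points with level_edges[i] <= h < level_edges[i + 1]
    (an assumed binning). Element i is the convex hull area of that group.
    """
    groups = [[] for _ in range(len(level_edges) - 1)]
    for x, y, h in points:
        for i in range(len(groups)):
            if level_edges[i] <= h < level_edges[i + 1]:
                groups[i].append((x, y))
                break
    return [polygon_area(convex_hull(g)) for g in groups]
```

Using the hull rather than a raw pixel count makes each element less sensitive to holes in the depth data, which is one plausible reason for choosing it.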

In various aspects of the present invention, a final feature vector x, in which weights are applied to the change in area between adjacent height levels, is defined by the following equation,

where s_i denotes the i-th element of the feature vector, and α_k denotes the weight for the k-th element of the final feature vector.

In various aspects of the present invention, a final feature vector x, in which weights are applied to the change in area between adjacent height levels and to the aspect ratio of each area, is defined by the following equation,

where s_i denotes the i-th element of the feature vector,

α_k denotes the weight for the k-th element of the final feature vector, and

r_i is the length of the minor axis divided by the length of the major axis of the area corresponding to the i-th height level, and may have a value greater than 0 and less than 1.

In various aspects of the present invention, the feature information may include information on the area at one or more height levels corresponding to a person's head, and on the area at one or more height levels corresponding to the part of a person from the shoulders through the upper body.

In various aspects of the present invention, the one analysis target height map may be generated based on a height map generated for each of the one or more depth images.

In various aspects of the present invention, the one analysis target height map may be generated by combining a plurality of height maps.

In various aspects of the present invention, a coordinate transformation may be applied to each of the one or more depth images, and a height map may be generated from the depth image to which the coordinate transformation has been applied.

In various aspects of the present invention, the coordinate transformation may include converting the depth information of the pixels of one depth image acquired from one depth camera into three-dimensional coordinates of a camera reference coordinate system, and converting the three-dimensional coordinates of the camera reference coordinate system into three-dimensional coordinates of a real-world coordinate system.

In various aspects of the present invention, the conversion from the depth information of each pixel of the one depth image to the three-dimensional coordinates of the camera reference coordinate system is defined by the following equation,

where i denotes the row index and the column index, respectively, of a pixel of the one depth image; d denotes the value of the depth information; x, y, and z denote the values on the X, Y, and Z axes of the camera reference coordinate system, respectively; ρ_hor and ρ_ver denote the horizontal and vertical resolution of the one depth image, respectively; and θ_hor and θ_ver denote the horizontal and vertical FOV (field of view) of the one depth camera, respectively.
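Because the equation itself is not reproduced in this text, the following is only a sketch of one common way to back-project a depth pixel using the quantities listed above (resolution and FOV). It assumes a pinhole-style model in which the depth value lies along the Z axis and the optical axis passes through the image center. The function name, the use of `j` for the column index, and the sign conventions are all assumptions, and the result may differ from the patent's actual formula.

```python
import math


def pixel_to_camera(i, j, d, res_hor, res_ver, fov_hor, fov_ver):
    """Map pixel (row i, column j) with depth d to camera-frame (x, y, z).

    res_hor, res_ver: image resolution in pixels (rho_hor, rho_ver).
    fov_hor, fov_ver: field of view in radians (theta_hor, theta_ver).
    Assumed model: z equals the depth, and the image edges lie at
    +/- d * tan(fov / 2) from the optical axis.
    """
    x = d * (2.0 * j / res_hor - 1.0) * math.tan(fov_hor / 2.0)
    y = d * (2.0 * i / res_ver - 1.0) * math.tan(fov_ver / 2.0)
    z = d
    return x, y, z
```

Under this assumed model, a pixel at the image center maps onto the optical axis, and the lateral offset grows linearly with depth.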

In various aspects of the present invention, the conversion from the three-dimensional coordinates of the camera reference coordinate system to the three-dimensional coordinates of the real-world coordinate system may include rotating the X, Y, and Z axes of the camera reference coordinate system by φ, θ, and ψ, respectively, translating by H, and reflecting about the X-Y plane, where H denotes the height at which the camera is installed, and φ, θ, and ψ denote the angles at which the camera is installed about the X, Y, and Z axes of the camera reference coordinate system, respectively.
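The rotate, translate, reflect sequence can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the rotations are applied to the point in the order X, Y, Z; the translation by H is applied along the Z axis with a negative sign; and the function names are hypothetical. With these choices, a camera looking straight down from height H maps a point at depth d to a height of H - d above the floor.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def camera_to_world(p, phi, theta, psi, H):
    """Convert a camera-frame point p = (x, y, z) to real-world coordinates."""
    v = matvec(rot_x(phi), list(p))   # rotation about X by phi
    v = matvec(rot_y(theta), v)       # rotation about Y by theta
    v = matvec(rot_z(psi), v)         # rotation about Z by psi
    v[2] -= H                         # translation by the mounting height (assumed sign and axis)
    v[2] = -v[2]                      # reflection about the X-Y plane
    return tuple(v)
```

After this step the third coordinate can be read directly as height above the floor, which is the quantity a height map stores.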

In various aspects of the present invention, each of the one or more candidate regions may include one local maximum.

In various aspects of the present invention, at least one of the installation position and the installation angle of each of the one or more depth cameras may be adjustable.

In various aspects of the present invention, at least one of the installation position and the installation angle may differ among a plurality of depth cameras.

In various aspects of the present invention, the determination of whether the human object is detected may be based on whether the feature information matches feature information of a human object determined using a pre-trained classifier.

The features briefly summarized above are merely exemplary aspects of the detailed description of the invention that follows, and do not limit the scope of the invention.

According to the present invention, a method and an apparatus can be provided that detect people more accurately and efficiently from images acquired from a depth camera without constraints on where the depth camera is installed, detect visitor entry and exit in real time on that basis, and compute and analyze visitor access statistics.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are included to provide an understanding of the invention. They illustrate various embodiments of the invention and, together with the description, serve to explain its principles.
FIGS. 1 and 2 are diagrams illustrating examples of depth camera installation positions for human detection.
FIG. 3 is a diagram illustrating the configuration and operation of a human detection apparatus according to the present invention.
FIGS. 4 and 5 are diagrams illustrating the acquisition of one or more depth images according to the present invention.
FIG. 6 is a diagram illustrating the configuration and operation of a human detection apparatus using one or more depth cameras according to the present invention.
FIG. 7 is a diagram illustrating coordinate transformation according to the present invention.
FIG. 8 is a diagram illustrating the relationship between the camera reference coordinate system and the real-world coordinate system.
FIG. 9 is a diagram illustrating the operation of converting coordinates of the camera reference coordinate system into coordinates of the real-world coordinate system according to the present invention.
FIG. 10 is a diagram illustrating a method of extracting feature information from each candidate region according to the present invention.
FIG. 11 is a diagram showing an example of the feature information of a candidate region according to the present invention.
FIG. 12 is a diagram showing a visitor access statistics analysis system according to the present invention.
FIG. 13 is a block diagram showing an example of an entry/exit detector.
FIG. 14 is a diagram showing an example of height map generation according to the present invention.
FIG. 15 is a diagram showing an example of analysis target height map generation according to the present invention.
FIG. 16 is a diagram illustrating movement tracking and entry/exit determination according to the present invention.
FIG. 17 is a diagram showing an example of an entry/exit detector control screen.
FIGS. 18 and 19 are diagrams showing examples of visitor access statistics analysis information.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Depth image-based human detection method and apparatus

FIG. 3 is a diagram illustrating the configuration and operation of a human detection apparatus according to the present invention.

According to the present invention, a depth image captured by each of one or more depth cameras 110 may be delivered to the image receiving unit 120. The image receiving unit 120 may acquire the depth images captured by the one or more depth cameras 110 (i.e., one or more depth images) and pass the corresponding information to the analysis target height map generation unit 130 (S10). The analysis target height map generation unit 130 may generate, from the information on the acquired depth images, one analysis target height map used for human detection (S20). The candidate region determination unit 140 may analyze the analysis target height map and determine one or more candidate regions for object detection using a local maxima method (S30). The feature information extraction unit 150 may extract feature information, such as the heights of a person's head and shoulders, from each of the one or more candidate regions (S40). The person determination unit 160 may determine, based on the extracted feature information, whether a detected object is a human object (S50).
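To make step S30 concrete, the sketch below finds local maxima in a height map stored as a list of rows. It is an illustration under stated assumptions rather than the method of the invention: the function name, the square search window of size `radius`, the `min_height` cutoff, and the requirement that a peak be strictly higher than all of its neighbors are all choices made for this example.

```python
def find_local_maxima(height_map, radius=1, min_height=0.0):
    """Return (row, col) cells that are strictly higher than every neighbor.

    height_map: list of equal-length rows of heights.
    radius: half-size of the square neighborhood that is searched.
    min_height: cells below this height are never reported as peaks.
    Note: a flat plateau produces no peak because the comparison is strict.
    """
    rows, cols = len(height_map), len(height_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            h = height_map[r][c]
            if h < min_height:
                continue
            is_peak = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and height_map[rr][cc] >= h:
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks
```

Each returned cell could then seed one candidate region, for example a fixed window around the peak, from which the feature information of step S40 is extracted.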

The specific configuration of the depth image-based human detection method and apparatus according to the present invention is described below.

FIGS. 4 and 5 are diagrams illustrating the acquisition of one or more depth images according to the present invention.

In the present invention, no constraint is placed on the installation position, height, angle, or number of depth cameras used to capture a specific spatial region. That is, one or more depth cameras are installed as appropriate so that together they cover the entire specific spatial region, and from the one or more depth images acquired from them (one depth image per depth camera), a single image to be analyzed for human detection (i.e., one analysis target image) can be generated.

The examples of FIGS. 4 and 5 show a specific spatial region. The specific spatial region is the region that must be analyzed for human detection, and may be set, for example, as a region that covers every path a person can take when passing through a doorway.

도 4의 예시와 같이 하나의 깊이 카메라(110)만으로는 특정 공간 영역의 전부를 커버할 수 없는 경우가 발생할 수 있다. 즉, 종래 기술과 같이 하나의 깊이 카메라(또는 한 쌍의 스테레오 카메라)가 바닥에 수직인 방향으로 천장에 설치되는 경우에는, 유효 촬영 거리 내에서 최대한 높이 설치한다고 하더라도 전체 출입 영역을 커버하지 못할 수도 있다. As in the example of FIG. 4, it may happen that only one depth camera 110 can not cover the entire specific space area. That is, when one depth camera (or a pair of stereo cameras) is installed on the ceiling in a direction perpendicular to the floor as in the prior art, it may not cover the entire entrance area have.

도 5의 예시와 같이 복수개의 깊이 카메라를 설치하여 특정 공간 영역의 전부를 커버할 수도 있다. 본 발명에서는 단순히 복수개의 깊이 카메라를 설치 및 이용하는 것이 아니라, 복수개의 깊이 카메라를 이용하여 특정 공간 영역의 전체를 커버하면서도 복수개의 깊이 카메라를 이용하여 획득되는 복수개의 깊이 영상 간의 중첩으로 인한 문제점을 오히려 사람 검출을 위한 정확도를 높이는 정보로서 이용하는 방안을 제안한다. As in the example of FIG. 5, a plurality of depth cameras may be installed to cover all the specific space areas. In the present invention, rather than merely installing and using a plurality of depth cameras, the problem of overlapping a plurality of depth images obtained by using a plurality of depth cameras while covering a specific space area by using a plurality of depth cameras is rather And to use it as information for increasing the accuracy for human detection.

구체적으로, 도 5의 예시에서와 같이 제 1 깊이 카메라(110-1)에 의해서 촬영되는 공간과 제 2 깊이 카메라(110-2)에 의해서 촬영되는 공간이 중복되는 영역이 발생한다. 만약 각각의 깊이 카메라에 의해 획득된 깊이 영상을 이용하여 개별적으로 사람 검출이 수행되는 경우에는, 한 명의 사람이 중복하여 두 번 검출되는 문제가 발생할 수도 있다. 본 발명에서와 같이 복수개의 깊이 카메라로부터 획득된 복수개의 깊이 영상을 이용하는 경우, 하나의 합성된 깊이 영상을 생성할 수 있다. 또한, 이와 같이 합성된 깊이 영상을 이용함으로써, 후술하는 바와 같이 어느 하나의 깊이 카메라에서 오인식 또는 미인식되는 정보가 다른 깊이 카메라에서 획득된 정보에 의해서 보정될 수 있으므로, 사람 검출의 정확도가 높아질 수 있다. Specifically, as in the example of FIG. 5, an area where a space taken by the first depth camera 110-1 overlaps with a space taken by the second depth camera 110-2 occurs. If human detection is performed individually using the depth images obtained by the respective depth cameras, there may be a problem that one person is detected twice in duplicate. When a plurality of depth images obtained from a plurality of depth cameras are used as in the present invention, a combined depth image can be generated. Further, by using the thus-synthesized depth image, the information that is erroneously recognized or unrecognized by any one of the depth cameras can be corrected by the information obtained by the other depth camera as described later, so that the accuracy of human detection can be enhanced have.

만약 하나의 카메라만으로 특정 공간 영역의 전부를 커버할 수 있는 경우에는 하나의 깊이 영상으로부터 분석 대상 높이맵이 생성되고, 이에 기반하여 후보 영역 결정, 특징 정보 추출, 사람 객체 판별을 적용될 수 있으며, 본 발명의 범위에서 하나의 깊이 영상 기반 동작을 배제하는 것은 아니다. 즉, 본 발명에서 하나의 깊이 카메라에 의한 하나의 깊이 영상만을 이용하더라도, 종래 기술과 같이 사람 객체 판별을 단순히 일정 높이에 해당하는 영역이나 촬영되는 면적이나 화소(pixel) 개수를 카운팅하는 방식을 이용하는 것이 아니라, 객체 검출을 위한 하나 이상의 후보 영역을 결정하고 각각의 후보 영역에서 특징 정보를 추출하여 추출된 특징 정보에 기반하여 사람 객체를 검출하는 새로운 방식에 따르기 때문에, 종래 기술에 비하여 사람 검출의 정확도를 높일 수 있다. If only one camera can cover all the specific spatial regions, the analysis object height map is generated from one depth image, and candidate region determination, feature information extraction, and human object determination can be applied based on the generated height map. But does not exclude a depth image based operation within the scope of the invention. That is, in the present invention, even if only one depth image is used by one depth camera, a method of counting the area corresponding to a predetermined height or the number of pixels or pixels to be photographed is used However, since one or more candidate regions for object detection are determined, feature information is extracted from each candidate region, and a new method of detecting a human object is performed based on the extracted feature information, the accuracy of human detection .

Even if a single camera can cover the entire target space, a more accurate human detection result can be expected when a plurality of depth images are acquired and used with a plurality of cameras, as in the examples of the present invention.

In addition, the method of using one or more depth images obtained from one or more depth cameras according to the present invention is applicable when the installation position, height, or angle of a depth camera is adjusted, when the number of installed depth cameras is increased or decreased, or in any combination of these cases.

FIG. 6 is a diagram for explaining the configuration and operation of a human detection apparatus using one or more depth cameras according to the present invention.

The example of FIG. 6 shows a human detection apparatus having N (N = 1, 2, 3, ...) depth cameras.

The first image receiving unit 120-1 may acquire a depth image from the first depth camera 110-1 and pass the depth image information to the first coordinate conversion unit 121-1. The first depth camera 110-1 may be connected to the first image receiving unit 120-1 in various ways (for example, a wired connection such as USB or IEEE 1394, or a wireless connection such as Bluetooth or WiFi). The first coordinate conversion unit 121-1 may generate real-world three-dimensional coordinates from the depth image information. The first height map generation unit 122-1 may generate a height map from the converted three-dimensional coordinates, using the coordinate values that correspond to valid values at or above a predetermined reference value. The first image receiving unit 120-1, the first coordinate conversion unit 121-1, and the first height map generation unit 122-1 may together be referred to as a first terminal preprocessing unit. The first terminal preprocessing unit may be configured as an embedded device corresponding to the first depth camera 110-1.

For the depth images obtained from each of the second, ..., N-th depth cameras, the same coordinate conversion and height map generation process may be performed in the respective terminal preprocessing unit. That is, one terminal preprocessing unit may be provided per depth camera, and the depth images from the individual depth cameras may be processed in parallel or simultaneously.

The analysis target height map generation unit 130 may generate one height map (that is, the analysis target height map) from the first, second, ..., N-th height maps that the respective terminal preprocessing units produce for the depth images captured by the N depth cameras 110-1, 110-2, ..., 110-N.

The candidate region determination unit 140 may use local maximum detection on the analysis target height map to determine positions where a human object may exist (that is, candidate regions). A single analysis target height map may contain no candidate region (no position where a human object is likely to exist) or one or more candidate regions (positions where one or more human objects are likely to exist).

If candidate regions exist, the feature information extraction unit 150 may extract feature information of a human object (for example, a feature vector) from each candidate region.

The person determination unit 160 may determine, based on the extracted feature vector, whether the object detected in each candidate region is a human object.

The object estimation and counting unit 170 may use the human object detection results to process information for applications such as entry and exit counting, movement path prediction, and security, and deliver it to an output interface (for example, a display or an audio/voice output unit) or to another device.

The analysis target height map generation unit 130, the candidate region determination unit 140, the feature information extraction unit 150, and the person determination unit 160 may together be referred to as a human detection unit. The human detection unit may be implemented as a separate device physically separated from the terminal preprocessing unit, or the terminal preprocessing unit and the human detection unit may be implemented as distinct functional modules within one device. In addition, the functional units included in the human detection unit and the terminal preprocessing unit need not be implemented separately; one or more of them may be integrated.

FIG. 7 is a diagram for explaining coordinate conversion according to the present invention.

A terminal preprocessing unit (for example, its image receiving unit) may receive a depth image captured by a depth camera. That is, one depth image may be received from each depth camera, and coordinate conversion may be performed on the N depth images acquired by the N depth cameras. The following describes the processing of one depth image received from one depth camera (for example, the first depth camera 110-1); the same description applies to the depth images received from the other depth cameras.

The specific examples of coordinate conversion described with reference to FIG. 7 may be performed by the terminal preprocessing unit corresponding to the first depth camera 110-1 of FIG. 6, or by its coordinate conversion unit 121-1.

In step S710, the coordinate conversion unit 121-1 may receive information on the depth image that the image receiving unit 120-1 acquired from the depth camera 110-1. The depth image may have a fixed number of pixels on a two-dimensional plane defined by horizontal and vertical axes. Each pixel may carry depth information (for example, the distance from the camera plane, in which the depth camera lies, to the target object).

In step S720, the coordinate conversion unit 121-1 may convert each pixel in the depth image into a three-dimensional coordinate value in the camera reference coordinate system, based on the pixel's depth information. For example, using the camera's angle of view and its horizontal and vertical resolution, the depth information (for example, the distance value) of each pixel can be converted into a three-dimensional coordinate value in the camera reference coordinate system.

If the value of each pixel delivered by the depth camera is already a three-dimensional coordinate value, this conversion may be omitted. Whether calculated from the depth value or provided directly by the camera, the three-dimensional coordinate value is a coordinate in the three-dimensional coordinate system referenced to the camera (that is, the camera reference coordinate system).

The camera reference coordinate system is described in detail with reference to FIG. 8.

FIG. 8 is a diagram for explaining the relationship between the camera reference coordinate system and the real-world coordinate system.

As in the example of FIG. 8, in the coordinate system referenced to the camera, the X axis and the Y axis may correspond to the horizontal and vertical axes of a plane parallel to the camera plane. The Z axis may be orthogonal to the camera plane and point in the direction the camera faces. The origin O of the camera reference coordinate system may correspond to the center point of the camera position.

The depth information (for example, the distance value) of each pixel of the depth image may be converted into a three-dimensional coordinate value in the camera reference coordinate system by Equation 1 below.

In Equation 1, i and j denote the row and column indices of a pixel in the depth image. d denotes the value of the depth information (for example, the distance value). x, y, and z are the values on the X, Y, and Z axes of the camera reference coordinate system, respectively. ρ_hor and ρ_ver denote the horizontal and vertical resolution of the depth image. θ_hor and θ_ver denote the horizontal and vertical field of view (FOV) of the camera.
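As a non-limiting illustration, the conversion from a pixel's depth value to camera reference coordinates can be sketched in Python as follows. Equation 1 itself is not reproduced in this text, so the sketch assumes a common pinhole-style model in which d is the distance along the Z axis and each field of view spans the full image width or height. The function name pixel_to_camera and the exact formula are assumptions and may differ from Equation 1.

```python
import math


def pixel_to_camera(i, j, d, res_hor, res_ver, fov_hor_deg, fov_ver_deg):
    """Convert pixel (row i, column j) with depth d to camera-frame (x, y, z).

    Assumed model (not taken from Equation 1): the optical axis passes
    through the image center and d is measured along the Z axis.
    """
    # Half-extent of the view at unit distance, from the field of view.
    half_w = math.tan(math.radians(fov_hor_deg) / 2.0)
    half_h = math.tan(math.radians(fov_ver_deg) / 2.0)
    # Pixel offset from the image center, normalized to [-1, 1].
    u = (j - res_hor / 2.0) / (res_hor / 2.0)
    v = (i - res_ver / 2.0) / (res_ver / 2.0)
    x = d * half_w * u
    y = d * half_h * v
    z = d
    return (x, y, z)
```

Under these assumptions, the center pixel maps onto the Z axis, and a pixel at the right edge of a 90-degree horizontal field of view maps to x approximately equal to d.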

Referring again to FIG. 7, in step S730 the coordinate conversion unit 121-1 may convert the three-dimensional coordinate values in the camera reference coordinate system (for example, the x, y, and z values derived from Equation 1) into three-dimensional coordinate values in the real-world coordinate system.

Converting the three-dimensional coordinate values in the camera reference coordinate system into three-dimensional coordinate values in the real-world coordinate system makes it possible to express the position of an object in real space. This requires information on the position, height, and angle of the camera, which may be values fixed in advance when the camera is installed or values derived through a calibration process after installation.

The conversion of three-dimensional coordinate values in the camera reference coordinate system into three-dimensional coordinate values in the real-world coordinate system may be performed through a three-dimensional transformation composed of a combination of rotation, translation, and reflection transformations in three-dimensional space.

For example, the three-dimensional space of the real-world coordinate system is defined by the X', Y', and Z' axes. The X' and Y' axes may correspond to the horizontal and vertical axes of a plane parallel to the floor in real space. The Z' axis may be orthogonal to the floor plane and point toward the ceiling. The origin of the real-world coordinate system may correspond to the point on the floor directly below the center point of the camera position, that is, the foot of the perpendicular from the camera to the floor.

Let H be the height at which the camera is installed, and assume that the installation angles about the X, Y, and Z axes of the camera reference coordinate system are (φ, θ, ψ). If the installation angles are (0, 0, 0), the X', Y', and Z' axes of the real-world coordinate system correspond to the X, Y, and -Z axes of the camera reference coordinate system (that is, assuming the ceiling and the floor are parallel, the camera plane is parallel to the ceiling and the camera faces straight down, perpendicular to the floor). Accordingly, the camera reference coordinate system can be converted into the real-world coordinate system by rotating about the X, Y, and Z axes by φ, θ, and ψ, respectively, translating by the camera height H, and reflecting with respect to the X-Y plane.

These transformations can be expressed together as a matrix multiplication of a single three-dimensional transformation matrix, parameterized by H, φ, θ, and ψ, with a three-dimensional coordinate matrix.
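As a non-limiting illustration, the combination of rotation, reflection, and translation into one matrix can be sketched in Python as follows. The description does not fix the order in which the three rotations are applied, so the sketch assumes R = Rz(ψ)·Ry(θ)·Rx(φ), followed by reflection of the Z axis and translation by H, using a 4x4 homogeneous matrix. The names build_transform and apply_transform are hypothetical.

```python
import math


def build_transform(height, phi, theta, psi):
    """Return a 4x4 matrix mapping camera coordinates to real-world coordinates.

    Angles are in radians. Rotation order Rz * Ry * Rx is an assumption.
    """
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    cz, sz = math.cos(psi), math.sin(psi)
    r = [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]
    return [
        r[0] + [0.0],
        r[1] + [0.0],
        # Reflection about the X-Y plane (z -> -z), then shift up by H.
        [-v for v in r[2]] + [height],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(matrix, point):
    """Multiply the 4x4 matrix by the homogeneous point (x, y, z, 1)."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(row[k] * vec[k] for k in range(4)) for row in matrix[:3])
```

With installation angles (0, 0, 0) this reduces to X' = X, Y' = Y, Z' = H - Z, so a point on the floor directly below a camera mounted at height H maps to height 0.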

According to the examples of the present invention described above, a height map can readily be generated through coordinate conversion even if the depth camera is not installed perpendicular to the floor (for example, even if it is installed at an oblique angle). When the depth camera is installed at an oblique angle and no coordinate conversion is applied, the depth information (that is, the distance value) of the depth image corresponds to the distance from the object to the camera plane rather than to the actual height of the object, so the actual height may be distorted. With the coordinate conversion according to the present invention, a more accurate height map reflecting the actual height of the object can be generated.

In addition, because coordinate conversion allows images captured by a plurality of depth cameras to be combined readily in the same coordinate system, one analysis target height map can readily be generated from the captured depth images even if the positions and angles of the depth cameras differ from one another.

FIG. 9 is a diagram for explaining the operation of converting coordinates in the camera reference coordinate system into coordinates in the real-world coordinate system according to the present invention.

In step S910, a three-dimensional transformation matrix may be calculated (or determined) based on the camera installation height H and the camera installation angles (φ, θ, ψ). This matrix may be determined in advance when the camera is installed, or it may be updated following camera calibration or an angle adjustment after installation. If the position or angle of the camera can be adjusted in real time, the three-dimensional transformation matrix may also be determined in real time by updating the position and angle parameters in real time.

In step S920, the three-dimensional coordinate value of one pixel in the camera reference coordinate system may be input. For example, this value may be calculated according to Equation 1 from the depth information of one pixel in the depth image obtained from the depth camera.

In step S930, the three-dimensional coordinate value in the real-world coordinate system may be calculated by transforming the pixel's three-dimensional coordinate value in the camera reference coordinate system with the three-dimensional transformation matrix. This transformation may be performed by matrix multiplication.

In step S940, it is checked whether the conversion from camera reference coordinates to real-world coordinates has been performed for all pixels in the depth image; if unconverted pixels remain, the process may return to step S920. In this way, three-dimensional coordinate values in the real-world coordinate system can be obtained for all pixels in the depth image (that is, for all coordinates in the camera reference coordinate system). Alternatively, the three-dimensional coordinate conversion may be performed only for a specific region of the depth image (for example, a region of interest excluding the background). In that case, "all coordinates" in step S940 means all coordinates within the specific region.
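As a non-limiting illustration, the loop of steps S920 to S940 can be sketched in Python as follows. The sketch assumes the transformation matrix of step S910 is already available as a 4x4 homogeneous matrix and that camera-frame coordinates are stored in a dictionary keyed by (row, column); the name convert_depth_image and the optional in_roi predicate are hypothetical.

```python
def convert_depth_image(points_cam, matrix, in_roi=None):
    """Convert camera-frame points to real-world points, pixel by pixel.

    points_cam: dict mapping (i, j) -> (x, y, z) in the camera frame.
    matrix:     4x4 homogeneous transformation matrix (from step S910).
    in_roi:     optional predicate (i, j) -> bool; pixels outside the
                region of interest are skipped.
    """
    world = {}
    for (i, j), (x, y, z) in points_cam.items():   # S920: take next pixel
        if in_roi is not None and not in_roi(i, j):
            continue
        vec = (x, y, z, 1.0)
        # S930: matrix multiplication with the homogeneous coordinate.
        world[(i, j)] = tuple(
            sum(row[k] * vec[k] for k in range(4)) for row in matrix[:3]
        )
    return world                                    # S940: all pixels done


# Example: camera at height 3 facing straight down (angles all zero).
down_facing = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, -1, 3],
    [0, 0, 0, 1],
]
```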

Using the real-world coordinate values obtained in this way, the depth of an object captured by the camera, measured from the ceiling toward the floor (equivalently, its height measured from the floor toward the ceiling), can be determined. A height map can then be generated from the depth or height of every pixel in the depth image (or every pixel in the specific region). As described with reference to FIG. 6, a height map can be generated for the depth image captured by each depth camera.

Orthographic projection, which projects a three-dimensional object onto a two-dimensional plane, may be used to generate the height map. Rather than generating the height map for the entire area of the depth image, one or more specific regions (or regions of interest) in which people are to be detected may be set in advance, and a real-world three-dimensional coordinate may be projected onto the two-dimensional plane only if it falls within such a region, which reduces the load of height map generation.

The specific region may be defined by six thresholds: left, right, front, back, top, and bottom. The two-dimensional plane on which the height map is represented may be expressed as an image with a predetermined number of pixels horizontally and vertically. The range defined by the left, right, front, and back thresholds of the specific region may correspond to this two-dimensional plane. The top and bottom thresholds of the specific region may correspond to the maximum and minimum height values of the pixels (or coordinates) of the two-dimensional image, respectively.

In addition, a plurality of three-dimensional coordinates may be projected onto one and the same coordinate on the two-dimensional plane. For example, there may be a plurality of three-dimensional coordinates that differ only in their Z-axis value and share the same position in the X-Y plane. In this case, the coordinate with the highest Z-axis value (that is, height value) replaces the others. That is, the height values of the three-dimensional coordinates are compared, only the coordinate with the largest height value is projected onto the two-dimensional plane, and the remaining three-dimensional coordinates may be discarded.
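As a non-limiting illustration, height map generation by orthographic projection with the six thresholds can be sketched in Python as follows. The grid cell size, the use of 0.0 for empty cells, and the name build_height_map are assumptions made for the sketch.

```python
import math


def build_height_map(points, bounds, cell_size):
    """Project real-world points onto a 2D grid, keeping the highest per cell.

    points: iterable of (x, y, z) real-world coordinates (z is height).
    bounds: (left, right, front, back, bottom, top) thresholds.
    cell_size: side length of one grid cell (assumed parameter).
    """
    left, right, front, back, bottom, top = bounds
    cols = max(1, math.ceil((right - left) / cell_size))
    rows = max(1, math.ceil((back - front) / cell_size))
    hmap = [[0.0] * cols for _ in range(rows)]  # 0.0 marks an empty cell
    for x, y, z in points:
        # Keep only points inside the region of interest.
        if not (left <= x < right and front <= y < back and bottom <= z <= top):
            continue
        c = min(cols - 1, int((x - left) / cell_size))
        r = min(rows - 1, int((y - front) / cell_size))
        # Several 3D points may land in one cell: the highest one wins.
        if z > hmap[r][c]:
            hmap[r][c] = z
    return hmap
```

For example, two points above the same cell at heights 1.0 and 1.7 leave 1.7 in that cell, and a point above the top threshold is ignored.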

Next, one analysis target height map may be generated from the plurality of height maps.

One or more height maps are generated for each of the N depth images, and these height maps may be combined into one analysis target height map. The analysis target height map may include all of the height maps for the specific regions (for example, regions of interest) set in the depth images.

In addition, the analysis target height map may encompass the thresholds of all the height maps being combined. For example, the height maps are generated for depth images that capture ranges which partially overlap or do not overlap one another, and every pixel or coordinate of the specific region (for example, the region of interest) defined by the thresholds (for example, left, right, front, back, top, and bottom) set for each depth image's height map must correspond to a pixel or coordinate of the analysis target height map, with none omitted. In this case, one pixel (or coordinate) of the analysis target height map may correspond to different pixels (or coordinates) of different height maps, or it may correspond to a pixel (or coordinate) of only one height map.

The analysis target height map may be generated by assigning the height value of each pixel (or coordinate) of each height map to the corresponding pixel (or coordinate) of the analysis target height map, according to the correspondence between them. If pixels (or coordinates) from several of the height maps being combined correspond to one and the same pixel (or coordinate) of the analysis target height map (for example, if one pixel of the first height map and one pixel of the second height map both correspond to one pixel of the analysis target height map), the pixel (or coordinate) with the highest height value replaces the others. That is, the height values of the pixels (or coordinates) from the several height maps are compared, only the one with the largest height value is assigned to the pixel (or coordinate) of the analysis target height map, and the remaining ones may be discarded.
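As a non-limiting illustration, combining several height maps by keeping the largest height value per cell can be sketched in Python as follows. The sketch assumes the height maps have already been resampled onto one common grid of equal size, with 0.0 marking cells a camera did not observe; the name merge_height_maps is hypothetical.

```python
def merge_height_maps(height_maps):
    """Merge same-sized height maps cell by cell, keeping the maximum height."""
    rows = len(height_maps[0])
    cols = len(height_maps[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for hmap in height_maps:
        for r in range(rows):
            for c in range(cols):
                # When several maps cover the same cell, the highest value wins.
                if hmap[r][c] > merged[r][c]:
                    merged[r][c] = hmap[r][c]
    return merged
```

Taking the maximum means that a head seen clearly by one camera is not lowered by a second camera that only saw the shoulder at the same location.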

Next, one or more candidate regions for detection may be determined based on the analysis target height map.

In the analysis target height map, the pixel (or coordinate) with the highest height value within a given area may be referred to as a local maximum. The position of a local maximum is a region where a human object is likely to exist. Accordingly, for each pixel (or coordinate) in the analysis target height map, its height value is compared with those of the other pixels (or coordinates) within a predetermined radius around it, and if its height value is higher than those of the surrounding pixels (or coordinates), its position may be determined as a candidate region. One or more candidate regions may be determined within the analysis target height map.
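As a non-limiting illustration, local maximum detection over the analysis target height map can be sketched in Python as follows. The radius is measured in grid cells, and the min_height cutoff used to skip empty cells is an assumption; the name find_local_maxima is hypothetical.

```python
def find_local_maxima(hmap, radius, min_height=0.0):
    """Return (row, col) cells strictly higher than all cells within radius."""
    rows, cols = len(hmap), len(hmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            h = hmap[r][c]
            if h <= min_height:
                continue  # skip empty or too-low cells (assumed cutoff)
            is_peak = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) == (r, c):
                        continue
                    inside = (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
                    # A neighbor at equal or greater height disqualifies the cell.
                    if inside and hmap[rr][cc] >= h:
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks
```

Because the comparison is strict, a flat plateau of equal heights yields no peak in this sketch; a practical implementation would need a tie-breaking rule.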

If a human object were judged to be detected at a position merely because its height value is larger than that of other pixels (or coordinates), an object that is not a person but is about as tall as a person could be mistaken for a human object. Therefore, in one example of the present invention, feature information is extracted for each candidate region, and whether the region contains a human object is determined based on the feature information.

When one or more candidate regions are determined, feature information may be extracted for each candidate region. The feature information may include a feature vector representing the cross-sectional area of the human body at each height and the change in that area. For example, the feature vector may be calculated using Equation 2 below.

In Equation 2, the vector s denotes a feature vector. α_k denotes the weighting factor for each element of the final feature vector. That is, the final feature vector x is a feature vector in which weights are applied to the change in area between adjacent height levels. s_i denotes the i-th element of the feature vector s. For example, the feature vector s may represent the area (for example, the convex hull area) at each height level of the candidate region; thus s_1 is the area corresponding to the first height level. The vector d corresponds to the difference (delta) between the i-th and (i+1)-th elements of the feature vector s. For example, d_1 is the difference between s_2 and s_1.
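As a non-limiting illustration, the relationship between the area vector s, the difference vector d, and the weights α can be sketched in Python as follows. Equation 2 itself is not reproduced in this text, so the sketch assumes the simplest form consistent with the description, x_k = α_k · d_k with d_k = s_(k+1) - s_k; the actual equation may combine these terms differently. The name area_change_features is hypothetical.

```python
def area_change_features(areas, weights):
    """Weighted area changes between adjacent height levels.

    areas:   s, the area at each height level (first level first).
    weights: alpha, one weight per adjacent pair of levels (assumed layout).
    """
    # d_k = s_(k+1) - s_k for each pair of adjacent height levels.
    deltas = [areas[k + 1] - areas[k] for k in range(len(areas) - 1)]
    if len(weights) != len(deltas):
        raise ValueError("expected one weight per adjacent level pair")
    # x_k = alpha_k * d_k (assumed form of the final feature vector).
    return [w * d for w, d in zip(weights, deltas)]
```

For a person seen from above, the area typically grows sharply from the head level to the shoulder level, which shows up as a large early element of d.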

FIG. 10 is a diagram for explaining a method of extracting feature information from each candidate region according to the present invention.

Extracting feature information for a candidate region may include generating a feature vector by classifying the coordinates adjacent to the local maximum in the candidate region (that is, coordinates in the horizontal-vertical, or X-Y, plane of the analysis target height map) into a number of coordinate groups according to height level.

For example, assume that the maximum and minimum height values of the pixels (or coordinates) in the analysis target height map are H_max and H_min, and that this range is divided into K height levels (K is a natural number). The first height level may cover heights at or below H_max and above H_1, the second height level heights at or below H_1 and above H_2, ..., and the K-th height level heights at or below H_(K-1) and above H_min. Alternatively, the first height level may cover heights below H_max and at or above H_1, the second height level heights below H_1 and at or above H_2, ..., and the K-th height level heights below H_(K-1) and at or above H_min.

Here, the size of the range covered by each height level (or the number of height values belonging to each height level) need not be the same. That is, the ranges of the first, second, and third height levels may be smaller than the range of the K-th height level. In this case, for example, the pixels (or coordinates) at heights between a person's head and shoulders can be classified and grouped more finely.

In addition, the height levels may be corrected by multiplying them by a weight that depends on the height value of the local peak, so that each height level falls at a similar position on each person's body regardless of how tall the person is. That is, the range of a height level applied to a candidate region with a high local peak may be set larger than the range of a height level applied to a candidate region with a low local peak.
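The division into height levels can be sketched as below. The sketch uses the first boundary convention given above (each level includes its upper bound and excludes its lower bound); the function names, the relative-width list, and the sample heights in centimetres are assumptions for illustration.

```python
def level_boundaries(h_max, h_min, rel_widths):
    """Return [H_max, H_1, ..., H_(K-1), H_min] for K = len(rel_widths) levels.

    rel_widths gives the relative size of each level's range, top level first,
    so smaller entries produce finer levels (for example around the head).
    """
    total = sum(rel_widths)
    span = h_max - h_min
    bounds = [h_max]
    acc = 0.0
    for w in rel_widths[:-1]:
        acc += w
        bounds.append(h_max - span * acc / total)
    bounds.append(h_min)
    return bounds


def level_of(height, bounds):
    """1-based level k with bounds[k] < height <= bounds[k - 1], or None if outside."""
    for k in range(1, len(bounds)):
        if bounds[k] < height <= bounds[k - 1]:
            return k
    return None


# Four levels between 180 and 100, with the top levels narrower than the last.
bounds = level_boundaries(180, 100, [1, 1, 2, 4])  # [180, 170.0, 160.0, 140.0, 100]
```

One hedged way to realize the height-dependent correction is to derive both ends of the range from the local peak, for example h_max equal to the peak height and h_min equal to a fixed fraction of it, so that every level grows in proportion to the person's height. The choice of fraction is not specified in the text.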

Starting from the coordinate of the local peak, the height values of adjacent coordinates are checked, and coordinates falling within a predetermined height-difference range (that is, belonging to a given height level) may be included in the same coordinate group. When no more coordinates belong to that height level, a coordinate group is generated for the next height level, and this is repeated to determine the coordinate group for each height level. A convex hull containing all the coordinates of one coordinate group is then obtained, and the area of that convex hull may be taken as the area for the corresponding height level. In this way, the area for the corresponding height level can be determined for each coordinate group.

Hereinafter, a method of extracting feature information for one candidate region will be described in detail with reference to FIG. 10.

In step S1010, information on the local peak in the candidate region (that is, the coordinates of the local peak and the height value at that position) may be input.

In step S1020, the coordinates of the local peak may be added to the first coordinate group, which corresponds to the first height level. The first height level may be defined as a predetermined range of height values that includes the height value of the local peak, and the first coordinate group may be the group that includes the coordinates of the local peak.

In step S1030, if a coordinate corresponding to the k-th height level exists in the analysis target height map, that coordinate may be added to the k-th coordinate group. Here, k may take the values 1, 2, ..., K. When k = 1, the height values of the coordinates adjacent to the local peak are checked, and any coordinate whose height value falls within the first height level may be added to the first coordinate group.

In step S1040, it is determined whether a valid adjacent coordinate exists at the k-th height level. If so, the process proceeds to step S1030 and the coordinate is added to the k-th coordinate group corresponding to the k-th height level. Whether a valid adjacent coordinate exists at the k-th height level may be judged by whether the height value of a coordinate falls within the range of the k-th height level and whether the coordinate lies within a predetermined distance of the center (or of the coordinates of the local peak). For example, a coordinate within the range of the k-th height level may be judged invalid if it lies farther from the local peak than the radius of a human body. If no valid adjacent coordinate exists at the k-th height level, the process proceeds to step S1050.

In step S1050, it may be determined whether k equals its maximum value K. If not (that is, if k < K), the process proceeds to step S1060, where k is incremented by 1, and then to step S1030, where any coordinate belonging to the next height level (that is, the height level corresponding to the incremented k) is added to a new coordinate group (that is, the coordinate group corresponding to the incremented k). If k = K, the grouping of valid coordinates is complete for every height level from height level 1 to height level K.

In step S1070, a convex hull may be computed for each of the K coordinate groups. A convex hull is the smallest convex polygon that contains all of the given coordinates in a two-dimensional plane.

In step S1080, the area of each convex hull may be calculated. For example, the area of a convex hull may be calculated by counting the number of pixels (or coordinates) contained in the convex hull.

With the method described with reference to FIG. 10, the feature vector for one candidate region (that is, the area at each height level) can be determined. When a plurality of candidate regions exist, the feature vector for each of the remaining candidate regions may be determined in a similar manner.
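Steps S1010 to S1080 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the 4-neighbour connectivity, growing each level outward from every coordinate grouped so far, the monotone-chain hull algorithm, and all function names are assumptions. The area is obtained by counting grid points inside the hull, as the text suggests.

```python
from collections import deque


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def group_by_level(hmap, peak, bounds, max_radius):
    """Group coordinates around the peak by height level (S1020 to S1060).

    bounds = [H_max, H_1, ..., H_min]; level k covers bounds[k] < h <= bounds[k - 1].
    """
    rows, cols = len(hmap), len(hmap[0])
    num_levels = len(bounds) - 1
    groups = [[] for _ in range(num_levels)]
    groups[0].append(peak)                      # S1020: the peak starts group 1
    seen = {peak}
    for k in range(1, num_levels + 1):          # S1050/S1060: next level
        upper, lower = bounds[k - 1], bounds[k]
        queue = deque(seen)                     # grow from everything grouped so far
        while queue:                            # S1030/S1040: add valid neighbours
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in seen:
                    continue
                in_level = lower < hmap[nr][nc] <= upper
                dist2 = (nr - peak[0]) ** 2 + (nc - peak[1]) ** 2
                if in_level and dist2 <= max_radius ** 2:
                    seen.add((nr, nc))
                    groups[k - 1].append((nr, nc))
                    queue.append((nr, nc))
    return groups


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (S1070, monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_pixel_area(hull):
    """Count grid points inside or on the hull (S1080)."""
    if not hull:
        return 0
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    n = len(hull)
    return sum(
        1
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
        if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n))
    )


# Toy height map: a peak of 170 surrounded by rings at 168 and 160.
hmap = [
    [0, 0, 160, 0, 0],
    [0, 160, 168, 160, 0],
    [160, 168, 170, 168, 160],
    [0, 160, 168, 160, 0],
    [0, 0, 160, 0, 0],
]
groups = group_by_level(hmap, (2, 2), [170, 165, 150], max_radius=3)
areas = [hull_pixel_area(convex_hull(g)) for g in groups]  # [5, 13]
```

The resulting list of areas plays the role of the feature vector s for this candidate region.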

FIG. 11 is a diagram showing an example of feature information of a candidate region according to the present invention.

In FIG. 11, the left drawing illustrates a front view and a height map of a human body, and the right drawing illustrates the convex hull area at each height level. As in the example of FIG. 11, many closely spaced height levels may be set for the head in order to increase the accuracy of human detection. From the shoulders down through the upper body, the height levels may be spaced less closely than for the head. The feature information (that is, the feature vector) of the candidate region may be determined from the convex hull areas at the respective height levels.

That is, the feature vector according to the present invention may include information on the cross-sectional area at each height level, measured relative to the local peak, and on the change in that area. Using such a feature vector, the shape of a person's head, shoulders, and upper body can be modeled in detail. Because a feature vector capable of expressing a detailed shape model of the head, shoulders, and upper body is used for human detection, whether the object in the candidate region is a person can be determined more easily and accurately. Accordingly, the accuracy of human detection can be greatly improved.

As described above, the feature information for determining whether an object is a person may include modeling information on the three-dimensional shape from the top of the head down to the shoulders and upper body. To this end, the area of the cross section (that is, the convex hull) at each height level of the region containing the head and upper body in the candidate region, the change in area between adjacent height levels, and the aspect ratio of each cross section may be used. The aspect ratio of a cross section may correspond to the ratio of the short axis to the long axis of the smallest rectangle containing that cross section.

That is, in addition to the change in cross section between adjacent height levels as in Equation (2), a final feature vector x that also takes the aspect ratio of each cross section into account may be used. The final feature vector x may be calculated as in Equation (3) below.

In Equation (3), the feature vector s represents the cross-sectional area at each height level of the candidate region. The vector d represents the change in area between each pair of adjacent height levels. The vector r represents the aspect ratio of each cross section.

α_k denotes the weight for each element of the final feature vector. s_i denotes the i-th element of the feature vector. For example, the feature vector s may represent the area (for example, the convex hull area) at each height level of the candidate region; that is, s₁ is the area corresponding to the first height level (the height level that includes the highest point of the candidate region). The vector d corresponds to the difference (delta) between the i-th element and the (i+1)-th element of the feature vector s. For example, d₁ is the difference between s₂ and s₁. r_i is the length of the short axis of each cross section divided by the length of its long axis, and may take a value greater than 0 and less than 1.
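The aspect ratio r_i can be illustrated with a simplified sketch. It assumes an axis-aligned bounding rectangle measured in whole pixels, whereas the text refers to the smallest rectangle containing the cross section, which in general may be rotated. The function name and sample coordinates are hypothetical.

```python
def aspect_ratio(coords):
    """Short side divided by long side of the axis-aligned bounding rectangle.

    Assumption: side lengths are counted in pixels, so a single pixel has
    width 1 and height 1.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return min(width, height) / max(width, height)


# A compact, head-like cross section and an elongated, shoulder-like one.
r_head = aspect_ratio([(0, 0), (2, 3)])       # 3 x 4 pixels  -> 0.75
r_shoulder = aspect_ratio([(0, 0), (9, 1)])   # 10 x 2 pixels -> 0.2
```

A compact cross section such as a head yields a ratio near 1, while an elongated one such as the shoulders yields a much smaller ratio, which is why the ratio adds information beyond the area alone.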

Next, a pre-trained classifier may be used to determine whether the object having the feature vector of the candidate region is a person. That is, a reference feature vector is determined by cumulatively learning many samples of feature vectors of objects that are actually people, and the feature vector of the candidate region derived from the depth image acquired by the depth camera is compared against it. The object may be determined to be a person when the two vectors match (or the more similar they are).

In addition, classifier training may be implemented with a machine learning algorithm such as a support vector machine (SVM) or boosting, using a database that stores many samples of depth images that contain human objects and depth images that do not. An appropriate machine learning algorithm may be chosen according to the type of depth camera, the image characteristics, the characteristics of the feature vector, and so on.
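The matching idea can be illustrated with a deliberately simple stand-in. The sketch below is not an SVM or a boosting classifier: it averages positive samples into a reference vector and thresholds the cosine similarity. The similarity measure, the threshold value, the function names, and the sample vectors are all assumptions.

```python
import math


def mean_vector(samples):
    """Element-wise mean of equally long feature vectors (the reference vector)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_person(feature, reference, threshold=0.95):
    """Judge a candidate as a person when it is similar enough to the reference."""
    return cosine_similarity(feature, reference) >= threshold


# Hypothetical area-delta vectors taken from known people.
reference = mean_vector([[30, 15, 100], [34, 17, 110]])  # [32.0, 16.0, 105.0]
```

A trained SVM or boosted classifier would replace `is_person` in practice; the point here is only the flow from accumulated samples to a decision on a new candidate.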

Method and apparatus for visitor access statistics analysis

Visitor access statistics are very useful information for operating a store or facility. For example, a company with many stores across the country can use statistics on customer visits at each store to establish an operating strategy suited to that store's circumstances. A store whose sales are low relative to its number of visitors can focus on converting visitors into sales, while a store with few visitors relative to its sales can focus on attracting more visitors. In large venues such as shopping malls or department stores, statistics on the number of customers passing key transit points and the times at which they pass can be used to analyze the main directions of customer flow and movement paths, and to place products strategically. In addition, access data for visiting customers under various conditions, such as the average number of visitors per day, the number of visitors by day of the week, time of day, season, and whether an event is being held, together with statistics and analysis results based on that data, can be used to set marketing or operating strategies. Such visitor statistics and analysis information can be used for congestion prediction and improved operating efficiency not only by businesses that own or operate many stores, such as hypermarkets, department stores, or chains with many franchisees, but also at heavily visited tourist sites and landmark facilities such as museums and theme parks, and demand for it has grown considerably in recent years.

To generate and use customer analysis information based on visitors' access records, it is important to detect people entering and leaving accurately, without error. For this purpose, the depth-image-based human detection method and apparatus described in the foregoing examples of the present invention may be used. Furthermore, by detecting visitor entries and exits on the basis of accurate human detection information and using them to compute visitor access statistics, the customer analysis described above can be performed.

Hereinafter, various examples of the present invention relating to a method and apparatus for analyzing visitor access statistics will be described in detail.

FIG. 12 is a diagram illustrating a visitor access statistics analysis system according to the present invention.

The plurality of access detectors 1210-1, 1210-2, ..., 1210-N are devices that detect people entering and leaving an access area. Of course, a single access detector may instead be installed and used for one access area. For example, each access detector may include the depth-image-based human detection apparatus using a three-dimensional camera described above.

Each of the access detectors 1210-1, 1210-2, ..., 1210-N may be connected to the access statistics server 1220 and the access detector control server 1240 through a network. When an entry or exit is detected, each access detector may transmit access information to the access statistics server 1220. The access information may include, for example, the detection time, the number of people who entered or left, and identification information of the detector (for example, information on the store and the location of the access area).

The access statistics server 1220 may collect the access information received from the access detectors 1210-1, 1210-2, ..., 1210-N by group of access detectors (for example, the detectors installed at the several entrances of one store), compute various statistics such as dwell time, entry and exit counts, and cumulative totals or averages by time of day, day, month, and day of the week, and store them in the database 1230.
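The access information and one of the aggregations mentioned above can be sketched as follows. The record layout, the field names, the identifiers, and the choice of hourly totals per store are assumptions for illustration; the text does not prescribe a data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """One report from an access detector (hypothetical field names)."""
    detected_at: datetime
    detector_id: str
    store_id: str
    entered: int
    exited: int


def hourly_counts(events):
    """Sum entries and exits per (store, date, hour) across all detectors of a store."""
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        key = (event.store_id, event.detected_at.date(), event.detected_at.hour)
        totals[key][0] += event.entered
        totals[key][1] += event.exited
    return {key: tuple(value) for key, value in totals.items()}


# Two detectors at different doors of the same store report within the same hour.
events = [
    AccessEvent(datetime(2015, 9, 14, 10, 5), "door-A", "store-1", 2, 0),
    AccessEvent(datetime(2015, 9, 14, 10, 40), "door-B", "store-1", 1, 1),
    AccessEvent(datetime(2015, 9, 14, 11, 2), "door-A", "store-1", 0, 2),
]
stats = hourly_counts(events)
```

Grouping by store rather than by detector reflects the idea of collecting reports from all entrances of one store; daily, monthly, or day-of-week statistics would follow the same pattern with a different key.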

The access detector control server 1240 may remotely monitor and control the access detectors 1210-1, 1210-2, ..., 1210-N for their operation and maintenance. In addition, the access detector control server 1240 may monitor the state of each access detector, store the operating state in the database 1230, and process requests from the user terminal 1270 via the web server 1250 and requests from an administrator via the access detector control terminal 1260.

The access detector control terminal 1260 may be connected to the access detector control server 1240 through a network, and may monitor and control the access detectors 1210-1, 1210-2, ..., 1210-N through relaying by the access detector control server.

The web server 1250 may provide the access information stored in the database 1230 to a user terminal 1270 connected through the network.

The access statistics server 1220, the database 1230, the access detector control server 1240, and the web server 1250 may all be implemented as a single physical server, or each may be implemented as a separate physical server, with the servers connected to one another through a network.

FIG. 13 is a block diagram showing an example of an access detector.

In FIG. 13, the depth cameras 110-1, 110-2, ..., 110-N, the terminal preprocessing units (image receiving units 120-1, 120-2, ..., 120-N, coordinate transformation units 121-1, 121-2, ..., 121-N, and height map generation units 122-1, 122-2, ..., 122-N), and the human detection unit (analysis target height map generation unit 130, candidate region determination unit 140, feature information extraction unit 150, and person determination unit 160) are configured as in the example of FIG. 6, so redundant description is omitted.

In FIG. 13, the access information processing unit 1300 may include the human detection unit (analysis target height map generation unit 130, candidate region determination unit 140, feature information extraction unit 150, and person determination unit 160), a movement tracking and access determination unit 1310, and an access information transmission processing unit 1320. Here, the movement tracking and access determination unit 1310 and the access information transmission processing unit 1320 may be included in the object tracking and counting unit 170 in the example of FIG. 6.

FIG. 14 is a diagram illustrating height map generation according to the present invention.

As described with reference to FIG. 6, when an access area cannot be covered by a single depth camera, a plurality of depth cameras 110-1, 110-2, ..., 110-N may be used for that one access area, and the plurality of terminal preprocessing units corresponding to the depth cameras may perform image reception, coordinate transformation, and height map generation in parallel.

The left drawing of FIG. 14 illustrates an image acquired from one depth camera, that is, a depth image before coordinate transformation is applied. The right drawing of FIG. 14 illustrates the height map generated by applying coordinate transformation to the acquired depth image.

FIG. 15 is a diagram illustrating generation of an analysis target height map according to the present invention.

For example, height map 1 may be generated from depth image 1, acquired by depth camera 1, through coordinate transformation by the terminal preprocessing unit. Likewise, height map 2 may be generated from depth image 2, acquired by depth camera 2, through coordinate transformation by the terminal preprocessing unit. In this way, N height maps may be generated from the N depth images acquired by the N depth cameras.

The analysis target height map generation unit 130 may generate the analysis target height map on the basis of the plurality of height maps (for example, height map 1, ..., height map N). A candidate region may then be determined on the basis of the analysis target height map, feature information may be extracted, and it may be determined whether the candidate region is a person.
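How the N height maps are combined is not detailed in this passage. The sketch below shows one plausible scheme under stated assumptions: every height map is already expressed on a shared floor-plane grid, each comes with a known (row, column) offset into that grid, and overlapping cells keep the larger height. The function name and the offsets are hypothetical.

```python
def merge_height_maps(maps_with_offsets, rows, cols):
    """Combine height maps on a shared grid, keeping the maximum where they overlap.

    maps_with_offsets: list of (height_map, (row_offset, col_offset)) pairs.
    Cells not covered by any camera stay at 0.0.
    """
    merged = [[0.0] * cols for _ in range(rows)]
    for hmap, (row0, col0) in maps_with_offsets:
        for r, line in enumerate(hmap):
            for c, height in enumerate(line):
                rr, cc = row0 + r, col0 + c
                if 0 <= rr < rows and 0 <= cc < cols:
                    merged[rr][cc] = max(merged[rr][cc], height)
    return merged


# Two 2 x 2 height maps that overlap in the middle column of a 2 x 3 grid.
height_map_1 = [[1, 2], [3, 4]]
height_map_2 = [[5, 1], [1, 9]]
target = merge_height_maps([(height_map_1, (0, 0)), (height_map_2, (0, 1))], 2, 3)
# target == [[1, 5, 1], [3, 4, 9]]
```

Taking the maximum favors the camera with the clearest view of the top of an object; averaging or a camera-priority rule would be equally valid alternatives under the same assumptions.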

The movement tracking and access determination unit 1310 may receive, from the human detection unit (for example, the person determination unit 160), information on a candidate region determined to be a person and on the position of that candidate region, determine through movement trajectory analysis whether the person is entering or leaving, and generate access information (for example, the detection time, the number of people who entered or left, and identification information of the detector). The specific operation of the movement tracking and access determination unit 1310 is described later with reference to FIG. 16.

The access information transmission processing unit 1320 may receive the access information from the movement tracking and access determination unit 1310 and deliver it to a server (for example, one or more of the access statistics server 1220, the access detector control server 1240, and the web server 1250).

The analysis target height map generation unit 130, candidate region determination unit 140, feature information extraction unit 150, person determination unit 160, movement tracking and access determination unit 1310, and access information transmission processing unit 1320 may be implemented as an access information processing unit 1300 that is a device physically separate from the terminal preprocessing unit, or the access information processing unit 1300 and the terminal preprocessing unit may be implemented as a single physical device.

FIG. 16 is a diagram for explaining movement tracking and access determination operations according to the present invention.

For movement tracking and access determination, the movement trajectory may be analyzed for every frame in a sequence of consecutive depth image frames to determine whether an entry or exit has occurred. Specifically, it may be determined for each of the consecutive depth image frames whether a human detection result exists. If a human detection result exists in the n-th depth image frame, the person's movement path may be generated from the three-dimensional positions at which the person is detected in each of one or more depth image frames that follow the n-th frame (that is, the (n+1)-th, (n+2)-th, ... frames). In other words, the movement path may be generated by connecting the three-dimensional positions at which the person is detected, from the depth image frame in which a human detection result first occurs up to the frame immediately preceding the depth image frame in which no human detection result exists. Once a movement path has been determined (that is, once the path has ended), it can be determined whether the person was entering or leaving.

In step S1610, it may be determined whether a human detection result exists in the n-th depth image frame. If a human detection result exists, the process proceeds to step S1620, where it may be determined whether one or more existing paths exist. If no human detection result exists, the process proceeds to step S1670, where it may be determined whether a terminated path exists.

If one or more existing paths exist in step S1620, the process proceeds to step S1630, where it may be determined whether there is an existing path to which the new detection can be added. If no existing path exists in step S1620, the process proceeds to step S1660, where a new path may be created. That is, the person position newly detected in the n-th frame cannot be judged to be a previously detected person who has moved, and may instead be judged to be a new person who has started moving. After the new path is created, the process proceeds to step S1650, where n is incremented by 1 in order to process the next depth image frame. The process then returns to step S1610 to determine whether a human detection result exists in the next frame.

In step S1630, it may be determined whether there is an existing path to which the new detection can be added. For such a path to exist, the last point of an existing path and the newly detected position must be close in both space and time. To judge this, a predetermined spatial threshold and a predetermined temporal threshold are set. If a person is detected at a position within the spatial threshold of the last position on an existing path, and at a time within the temporal threshold of the last time a person was detected on that path, such a path may be judged to exist. If it is determined in step S1630 that such a path exists, the process proceeds to step S1640, where the path is updated by adding the newly detected person position to the matched existing path. After the path is updated, the process proceeds to step S1650, where n is incremented by 1 in order to process the next depth image frame. The process then returns to step S1610 to determine whether a human detection result exists in the next frame.

If a person is detected at a position that exceeds the spatial threshold relative to the last position at which a person was detected on the existing path, or at a time that exceeds the time threshold relative to the last time at which a person was detected on the existing path, it may be determined that the newly detected person position cannot be added to the existing path. When it is thus determined in step S1630 that no addable path exists, the process may proceed to step S1660 to create a new path. That is, the person position newly detected in the n-th frame cannot be attributed to the movement of a previously detected person, so it may be determined that a new person has started moving. After the new path is created, the process may proceed to step S1650 and increase the value of n by 1 in order to process the subsequent frame of the depth image. The process then returns to step S1610 to determine whether a person detection result exists in the subsequent frame.
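The path association of steps S1620 to S1660 can be illustrated with a minimal Python sketch. The names `Path` and `assign_detection`, the use of Euclidean distance, and the rule of choosing the nearest path when several qualify are assumptions made for illustration; the description only requires that both the spatial threshold and the time threshold be satisfied.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Path:
    # Each point is (x, y, t): a detected position and its detection time.
    points: list = field(default_factory=list)

    @property
    def last(self):
        return self.points[-1]


def assign_detection(paths, x, y, t, space_thr, time_thr):
    """Add detection (x, y, t) to an addable path, or create a new path.

    A path is treated as addable when its last point lies within space_thr
    of (x, y) and within time_thr of t. Picking the nearest addable path
    is an illustrative assumption.
    """
    best, best_dist = None, None
    for path in paths:
        lx, ly, lt = path.last
        dist = hypot(x - lx, y - ly)
        if dist <= space_thr and (t - lt) <= time_thr:
            if best is None or dist < best_dist:
                best, best_dist = path, dist
    if best is None:
        # No addable path: a new person is assumed to have started moving.
        best = Path()
        paths.append(best)
    best.points.append((x, y, t))
    return best
```

Calling `assign_detection` once per detection in each frame, in frame order, grows the list `paths`; a detection that fails either threshold for every existing path starts a new one.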

In step S1670, it may be determined whether a terminated path exists. That is, when it is determined in step S1610 that no person detection result exists in the depth image frame, it may be determined whether a person's movement has ended. To identify a terminated path, a predetermined threshold for the termination determination (i.e., a termination threshold) may be set. If the time elapsed from the last time at which a person was detected on an existing path until the current frame is input is within the termination threshold, it may be determined that the path has not yet terminated. When it is determined in step S1670 that no terminated path exists, the process may proceed to step S1650 and increase the value of n by 1 in order to process the subsequent frame of the depth image. The process then returns to step S1610 to determine whether a person detection result exists in the subsequent frame.

If, in step S1670, the time elapsed from the last time at which a person was detected on an existing path until the current frame is input exceeds the termination threshold, it may be determined that the path has terminated. When a path has terminated, the process may proceed to step S1680 to perform the entry/exit determination.
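The termination test of step S1670 can be sketched as follows. Representing a path as a list of `(x, y, t)` tuples and the function name `split_finished` are assumptions for illustration only.

```python
def split_finished(paths, now, end_thr):
    """Split paths into (active, finished) lists.

    Each path is a non-empty list of (x, y, t) tuples. A path counts as
    finished when the time elapsed since its last detection exceeds
    end_thr; otherwise it is still active.
    """
    active, finished = [], []
    for points in paths:
        last_t = points[-1][2]
        if now - last_t > end_thr:
            finished.append(points)
        else:
            active.append(points)
    return active, finished
```

Only the paths returned in `finished` would then be passed on to the entry/exit determination.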

In step S1680, whether an entry or exit has occurred may be determined by comparing the start position and the end position of the terminated path against a preset entry/exit reference line. For example, if the start position and the end position of the terminated path correspond to an outside position and an inside position, respectively, relative to the reference line, it may be determined that a visitor has entered. Alternatively, if the start position and the end position correspond to an inside position and an outside position, respectively, it may be determined that a visitor has left. If the start position and the end position are both outside or both inside relative to the reference line, the path may be excluded from the visitor entry/exit count.
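The comparison in step S1680 can be sketched as a side-of-line test. Modelling the reference line as two 2D points and treating the left side of the direction A to B as the inside are assumptions made for this example; the description does not fix a representation.

```python
def side(line, p):
    """Signed cross product: > 0 if p lies to the left of line A->B."""
    (ax, ay), (bx, by) = line
    return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)


def classify(line, start, end):
    """Return "enter", "exit", or None for a terminated path.

    Assumption: the left side of the reference line is the inside.
    """
    start_inside = side(line, start) > 0
    end_inside = side(line, end) > 0
    if not start_inside and end_inside:
        return "enter"
    if start_inside and not end_inside:
        return "exit"
    # Both ends on the same side: not counted as an entry or exit.
    return None
```

A counter would then increment its entry count for "enter", its exit count for "exit", and ignore `None`.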

Examples of the present invention regarding the operation of the access detectors (1210-1, 1210-2, ..., 1210-N), the access statistics server 1220, the database 1230, the access detector control server 1240, the web server 1250, the access detector control terminal 1260, and the user terminal 1270, described with reference to FIG. 12, are described below.

Assume that the access statistics server 1220 is located at the headquarters of a company that operates multiple stores, and that each store has as many access detectors (1210-1, 1210-2, ..., 1210-N) as it has entry/exit zones. In this case, the access detectors (1210-1, 1210-2, ..., 1210-N) may transmit access information to the access statistics server 1220 each time an entry or exit event occurs, or may transmit accumulated access information at every set time unit. For example, access information may be transmitted whenever even one person enters or leaves, or the access information accumulated within each specified time interval, such as 10 seconds or 1 minute, may be transmitted to the access statistics server 1220.

The access information transmitted from the access detectors (1210-1, 1210-2, ..., 1210-N) to the access statistics server 1220 may include, for example, an access detector ID, an event time, an entry count, and an exit count. The ID of an access detector (1210-1, 1210-2, ..., 1210-N) may be identification information that identifies the store and the entry/exit zone in which the detector is installed. The event time is the time at which the entry or exit event occurred, and the entry count and the exit count are the numbers of people who entered or left simultaneously or within the set time interval.

The access statistics server 1220 may record the access information transmitted from the access detectors in the database 1230 and, at a set time period, record statistical information calculated from the access information in the database 1230. The statistical information that the access statistics server 1220 records in the database 1230 may include, for example, the number of entries, the number of exits, the number of people staying, and the average stay time for each store by time slot.
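As an illustration of how such access information might be turned into per-store, per-hour statistics, the following is a minimal sketch. The `AccessRecord` fields mirror the items listed above, but the class itself, the `store_of` mapping from detector ID to store, the hourly granularity, and the definition of the number staying as cumulative entries minus cumulative exits are all assumptions. Average stay time is omitted because it cannot be derived from counts alone without further assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    detector_id: str       # identifies the store and entry/exit zone
    occurred_at: datetime  # time of the entry/exit event
    entries: int           # people who entered in the interval
    exits: int             # people who left in the interval


def hourly_stats(records, store_of):
    """Return {(store, hour_start): {"entries", "exits", "occupancy"}}.

    store_of maps a detector ID to its store (hypothetical lookup).
    Occupancy is the running total of entries minus exits per store.
    """
    totals = defaultdict(lambda: {"entries": 0, "exits": 0})
    for r in records:
        hour = r.occurred_at.replace(minute=0, second=0, microsecond=0)
        key = (store_of[r.detector_id], hour)
        totals[key]["entries"] += r.entries
        totals[key]["exits"] += r.exits

    stats, running = {}, defaultdict(int)
    for store, hour in sorted(totals):
        t = totals[(store, hour)]
        running[store] += t["entries"] - t["exits"]
        stats[(store, hour)] = {**t, "occupancy": running[store]}
    return stats
```

Records from several detectors in the same store are merged under one store key, which matches the per-store statistics described above.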

In addition, the access statistics server 1220 may calculate additional information derivable from this statistical information and record it in the database 1230. The additional information may include, for example, entry/exit statistics for each entry/exit zone, each store, or a specific store group, by time and period such as monthly, weekly, or yearly.

The access detector control server 1240 may support the following operations. For example, a user who is a system administrator responsible for installing, deploying, or maintaining the access detectors (1210-1, 1210-2, ..., 1210-N) may use a dedicated access detector control terminal 1260 to connect to the access detector control server 1240, check the status of each access detector installed across all stores, and control or configure the operation of the access detectors. Alternatively, without using the access detector control terminal 1260, the system administrator may use a general web browser to access the administrator web page of the visitor access statistics analysis service and control the access detectors (1210-1, 1210-2, ..., 1210-N) through the web server 1250, which interworks with the access detector control server 1240.

FIG. 17 shows an example of an access detector control screen using the access detector control terminal 1260 or a web browser. The access detector control terminal 1260 and the access detector control server 1240 may exchange information such as a list of the access detectors (1210-1, 1210-2, ..., 1210-N), the image, height map, and analysis target height map acquired by each access detector, and the operation setting values of each access detector. For example, a connection to a specific access detector may be established by setting its address and port and selecting a device ID. The camera parameter settings of the connected access detector may include settings for color, mirroring, vertical flip, and Outside, and the camera height and angle may also be set. The boundary covered by the access detector may be set using Top, Bottom, Front, and Left values. The entry/exit reference line (In/Out Line) may also be set separately as an entry reference line and an exit reference line.

In addition, the access detector control terminal 1260 may connect to the access detector control server 1240 and transmit its client ID to the access detector control server 1240, and the access detector control server 1240 may perform client ID authentication and transmit the authentication result to the access detector control terminal 1260. The access detector control terminal 1260 may request a list of access detectors from the access detector control server 1240, and the access detector control server 1240 may transmit the list to the access detector control terminal 1260. The access detector control terminal 1260 may request control authority over a specific access detector from the access detector control server 1240, and the access detector control server 1240 may determine whether to grant the authority and notify the access detector control terminal 1260 of the result. The access detector control terminal 1260 may also request the access detector control server 1240 to stream video from a specific access detector, to report the operation state and setting values of a specific access detector, or to change the operation state and setting values of a specific access detector. The operations of the access detector control terminal 1260 described above may equally apply to a management client operating through the administrator web page of the visitor access statistics analysis service in a general web browser.

FIGS. 18 and 19 show examples of visitor access statistics analysis information.

The web server 1250 may obtain information needed by a user or an administrator from other entities (e.g., the access statistics server 1220, the database 1230, the access detector control server 1240), or generate it by processing the obtained information, and may provide a web page such as that shown in FIG. 18 to the user or administrator, thereby providing functions for checking and managing access statistics information, access detector information, store information, and the like.

For example, the web server 1250 may generate and provide information on the number of entering visitors, the change in the number of entering visitors relative to a reference value, the number of exiting visitors, the change in the number of exiting visitors relative to the reference value, the current number of people staying, the average stay time, the time slot of maximum stay, the time slot of minimum stay, and the like.

In addition, the web server 1250 may generate and provide information on the proportion of entering visitors and the proportion of exiting visitors for each of a plurality of cameras corresponding to a plurality of access detectors.

In addition, based on information on the gender and age of visitors (e.g., information obtained by linking with face detection information or with pre-stored visitor identification information), the web server 1250 may generate and provide information such as the proportion of entering visitors by gender and the proportion of entering visitors by age.

In addition, the web server 1250 may provide, in real time, the images captured by each of the plurality of cameras corresponding to the plurality of access detectors.

In addition, the web server 1250 may generate and provide detailed statistics by time slot on the number of entering visitors, the number of exiting visitors, the number of people staying, the cumulative number of entering visitors, the cumulative number of exiting visitors, and the average stay time.

The matters described in the various embodiments of the present invention above may be applied independently, or two or more embodiments may be applied together.

The exemplary methods described in the various embodiments of the present invention above are expressed as a series of operations for simplicity of description, but this is not intended to limit the order in which the steps are performed; where necessary, the steps may be performed simultaneously or in a different order. In addition, not all of the illustrated steps are necessarily required to implement the method proposed by the present invention.

The scope of the present invention includes an apparatus that processes or implements operations according to the schemes proposed by the present invention.

The scope of the present invention includes software (or an operating system, application, firmware, program, etc.) that causes operations according to the schemes proposed by the present invention to be executed on a device or computer, and a medium on which such software is stored and from which it is executable on a device or computer.

110 Depth camera 120 Image receiving unit
121 Coordinate transformation unit 122 Height map generation unit
130 Analysis target height map generation unit 140 Candidate region determination unit
150 Feature information extraction unit 160 Person determination unit
170 Object tracking and counting unit

Claims

A method for detecting a person using a depth image, the method comprising:
obtaining one or more depth images from one or more depth cameras;
generating one analysis target height map using the one or more depth images;
determining one or more candidate regions in the one analysis target height map;
extracting feature information from each of the one or more candidate regions; and
determining whether a human object is detected in each of the one or more candidate regions based on the feature information,
wherein the feature information includes information on an area according to a height level in one candidate region and an amount of change in the area according to the height level.

The method according to claim 1,
Wherein the feature information includes a feature vector including elements representing the area occupied by coordinates belonging to each of a plurality of coordinate groups classified according to a height level.

3. The method of claim 2,
Wherein one element of the feature vector represents an area of a convex hull with respect to the coordinates belonging to one coordinate group corresponding to one height level.
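A feature vector of this kind can be sketched in Python as follows: the points of a candidate region are grouped by height level, and each element is the area of the convex hull of one group. The binning by `level_edges`, the function names, and the choice of Andrew's monotone chain algorithm for the hull are assumptions for illustration; the claim does not prescribe them.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2D points (monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(poly)
    twice = sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def hull_area_features(points, level_edges):
    """One hull area per height level.

    points: iterable of (x, y, h); level_edges: ascending height
    boundaries, so level k covers level_edges[k] <= h < level_edges[k+1].
    """
    points = list(points)
    feats = []
    for lo, hi in zip(level_edges[:-1], level_edges[1:]):
        group = [(x, y) for x, y, h in points if lo <= h < hi]
        feats.append(polygon_area(convex_hull(group)))
    return feats
```

Differences between consecutive elements of the returned list would then give the amount of area change between adjacent height levels.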

4. The method of claim 2,
Wherein a final feature vector x, in which weights are applied to the amount of area change between adjacent height levels, is defined by the following equation,

where s_i denotes the i-th element of the feature vector,
and α_k denotes the weight for the k-th element of the final feature vector.

5. The method of claim 2,
Wherein a final feature vector x, in which weights are applied to the area ratio between adjacent height levels and to the aspect ratio of each area, is defined by the following equation,

where s_i denotes the i-th element of the feature vector,
α_k denotes the weight for the k-th element of the final feature vector,
and r_i is the value obtained by dividing the length of the minor axis by the length of the major axis of the area corresponding to the i-th level, and has a value of more than 0 and less than 1.

The method according to claim 1,
Wherein the feature information comprises information on an area according to one or more height levels corresponding to the head of a person and an area according to one or more height levels corresponding to a portion of the person's upper body starting from the shoulders.

The method according to claim 1,
Wherein the one analysis target height map is generated based on a height map generated for each of the one or more depth images.

8. The method of claim 7,
Wherein the one analysis target height map is generated by combining a plurality of height maps.

The method according to claim 1,
Wherein a coordinate transformation is applied to each of the one or more depth images, and a height map is generated based on the depth image on which the coordinate transformation is performed.

10. The method of claim 9,
Wherein the coordinate transformation comprises:
converting the depth information of one depth image obtained from one depth camera into three-dimensional coordinates of a camera reference coordinate system; and
converting the three-dimensional coordinates of the camera reference coordinate system into three-dimensional coordinates on a real world coordinate system.

11. The method of claim 10,
Wherein the conversion of the depth information of each pixel of the one depth image to the three-dimensional coordinates of the camera reference coordinate system is defined according to the following equation,

i represents a row index and a column index of a pixel of the depth image,
d represents a value of the depth information,
x, y, and z represent values on the X-axis, Y-axis, and Z-axis of the camera reference coordinate system,
ρ_hor and ρ_ver represent the horizontal resolution and the vertical resolution of the one depth image, respectively,
and θ_hor and θ_ver represent the horizontal and vertical fields of view (FOV) of the one depth camera, respectively.
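The claimed equation is not reproduced in this text, so the following Python sketch shows only one common pinhole-style way to map a pixel and its depth to camera coordinates from the resolution and the field of view. It should not be read as the claimed formula. It assumes that d is the distance along the optical axis, that the row index is paired with a column index (called `j` here), and that pixel centres sit at half-integer offsets; the function name is hypothetical.

```python
from math import radians, tan


def pixel_to_camera(i, j, d, res_hor, res_ver, fov_hor_deg, fov_ver_deg):
    """Map pixel (row i, column j) with depth d to camera coordinates.

    Assumes d is measured along the optical (Z) axis and the optical
    centre is at the image centre. Returns (x, y, z).
    """
    # Half-extent of the image plane at unit depth along each axis.
    half_w = tan(radians(fov_hor_deg) / 2.0)
    half_h = tan(radians(fov_ver_deg) / 2.0)
    # Normalised offsets of the pixel centre from the image centre.
    u = (j + 0.5 - res_hor / 2.0) / (res_hor / 2.0)
    v = (i + 0.5 - res_ver / 2.0) / (res_ver / 2.0)
    return (d * u * half_w, d * v * half_h, d)
```

Applying the function to every pixel of a depth image yields a point cloud in the camera reference coordinate system, which is the input to the subsequent conversion into real world coordinates.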

12. The method of claim 10,
Wherein, in the conversion from the three-dimensional coordinates of the camera reference coordinate system to the three-dimensional coordinates on the real world coordinate system,
the X-axis, the Y-axis, and the Z-axis of the camera reference coordinate system are each rotated by a respective angle,
a translation by H is applied,
and a reflection about the XY plane is performed,
where H denotes the height at which the one camera is installed,
and the respective angles represent the angles at which the one camera is installed about the X-axis, Y-axis, and Z-axis of the camera reference coordinate system.

The method according to claim 1,
Wherein each of the one or more candidate regions comprises one local peak.
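One simple way to locate such local peaks in a height map is sketched below. The 8-neighbour comparison, the strict inequality, the `min_height` cut-off, and the function name are assumptions for illustration and are not taken from the claims.

```python
def local_peaks(height_map, min_height=0.0):
    """Return (row, col) cells strictly higher than all 8 neighbours.

    height_map is a non-empty list of equal-length rows of heights.
    Cells below min_height are ignored.
    """
    rows, cols = len(height_map), len(height_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            h = height_map[r][c]
            if h < min_height:
                continue
            neighbours = [
                height_map[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

Each returned cell could seed one candidate region, for example by growing a region around the peak, so that every candidate region contains exactly one local peak.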

The method according to claim 1,
Wherein at least one of the installation position or the installation angle of each of the one or more depth cameras is adjustable.

The method according to claim 1,
Wherein at least one of installation positions or installation angles of the plurality of depth cameras is different.

The method according to claim 1,
Wherein whether the human object is detected is determined based on whether the feature information matches feature information of a human object, using a classifier trained in advance.

An apparatus for detecting a person using a depth image, the apparatus comprising:
an image receiving unit that obtains one or more depth images from one or more depth cameras;
an analysis target height map generation unit that generates one analysis target height map using the one or more depth images;
a candidate region determination unit that determines one or more candidate regions in the one analysis target height map;
a feature information extraction unit that extracts feature information from each of the one or more candidate regions; and
a person determination unit that determines whether a human object is detected in each of the one or more candidate regions based on the feature information,
wherein the feature information includes information on an area according to a height level in one candidate region and an amount of change in the area according to the height level.

A computer-readable medium having stored thereon software executable by an apparatus for detecting a person using a depth image,
the executable software causing the apparatus to: obtain one or more depth images from one or more depth cameras; generate one analysis target height map using the one or more depth images; determine one or more candidate regions in the one analysis target height map; extract feature information from each of the one or more candidate regions; and determine, based on the feature information, whether a human object is detected in each of the one or more candidate regions,
wherein the feature information includes information on an area according to a height level in one candidate region and an amount of change in the area according to the height level.

A method for determining entry and exit of a person using a depth image, the method comprising:
determining whether a person detection result exists in an n-th frame of the depth image;
determining whether an existing path exists if a person detection result exists in the n-th frame;
if the existing path exists, determining whether an addable path exists based on a time threshold and a spatial threshold;
updating the existing path if an addable path exists; and
determining the entry or exit of the person based on the start position and the end position of the updated path,
wherein it is determined that the addable path exists when the difference between the last detection position of the existing path and the position of the person detection result in the n-th frame is less than or equal to the spatial threshold, and the difference between the last detection time of the existing path and the person detection time in the n-th frame is less than or equal to the time threshold.