KR102460899B1

KR102460899B1 - Method and System for People Count based on Deep Learning

Info

Publication number: KR102460899B1
Application number: KR1020180079358A
Authority: KR
Inventors: 황영배; 김성흠; 김정호
Original assignee: 한국전자기술연구원
Priority date: 2018-07-09
Filing date: 2018-07-09
Publication date: 2022-10-31
Also published as: KR20200005853A

Abstract

A person counting method and system based on deep structure learning are provided. In the person counting method according to an embodiment of the present invention, a person criterion is set by combining human body parts, and people in an input image are counted according to the set person criterion. Because the user can thereby specify a subjective criterion for counting people, more effective and reasonable person counting becomes possible.

Description

{Method and System for People Count based on Deep Learning}

The present invention relates to a method of counting people, and more particularly to a method and system for counting the people present in an input image based on deep structure learning.

In the prior art, various assumptions were made in order to model the appearance of a person captured with an RGB camera, and parametric modules that carry out these assumptions step by step had to be implemented.

However, in a general test environment that differs from the experimental environment in which the algorithm was implemented, the stability of the detection module drops sharply as the appearance of the foreground and background changes, and performance varies widely with algorithm parameter tuning depending on the application environment. For example, a counting method that makes strong assumptions about the viewpoint is difficult to regard as a general methodology.

For more stable performance, detection accuracy can be improved with additional devices such as a depth sensor. However, such sensors also have a limited operating environment owing to several physical issues and, above all, are at a disadvantage to low-cost image sensors in terms of price competitiveness.

It is well known that, for learning-based methodologies, the amount of data and the quality of the recorded information are fundamentally important, but data utilization has been limited for the person counting problem.

The present invention has been devised to solve the above problems, and an object of the present invention is to provide a person counting method and system in which deep structure learning is applied to person detection and to segmentation by body part in order to handle the large and varied data collected from the Internet.

To achieve the above object, a person counting method according to an embodiment of the present invention includes: setting a person criterion that combines human body parts; and counting people in an input image according to the set person criterion.

In the person criterion, a plurality of body parts may be combined according to logical operations.

The person criterion may be set by a user.

The set person criterion may be changed by the user.

The person counting method according to an embodiment of the present invention may further include detecting people in the input image based on deep learning, and the counting step may count, among the people detected in the detecting step, those who meet the set person criterion.

The person counting method according to an embodiment of the present invention may further include: setting a person region by separating the foreground and the background in the input image; and segmenting the set person region into body parts, and the counting step may count people with reference to the segmentation result of the segmenting step.

The person region setting step may be performed based on a statistical technique, and the segmenting step may be performed based on deep learning.

Meanwhile, a person counting system according to another embodiment of the present invention includes: an acquisition unit that acquires an input image; and a processor that sets a person criterion combining human body parts and counts people in the input image according to the set person criterion.

As described above, according to embodiments of the present invention, person detection and segmentation by body part allow a person to be defined more generally, so that the user can specify a subjective criterion for counting people, making more effective and reasonable person counting possible.

FIG. 1 is a flowchart provided to explain a deep structure learning-based person counting method according to an embodiment of the present invention;
FIG. 2 is an example of the use of a Yolo-based person region detector;
FIG. 3 is a diagram illustrating person region segmentation using the RefineNet structure;
FIG. 4 is a diagram presenting examples of person region segmentation;
FIG. 5 is a diagram illustrating person counting results;
FIG. 6 is a diagram showing the software implementation of the person counting method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating person detection and counting results from the implemented software; and
FIG. 8 is a block diagram of a deep structure learning-based person counting system according to another embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a flowchart provided to explain a deep structure learning-based person counting method according to an embodiment of the present invention. The method counts the people present in an input image according to a criterion desired by the user.

To this end, in the person counting method according to an embodiment of the present invention, an input image is first acquired (S110). The input image may be an image generated by a camera, an image stored on a storage medium, an image received over a communication network, or the like.

Next, as preprocessing, the foreground and the background of the input image are separated (S120). A foreground/background separation algorithm that learns the background based on a statistical technique can set the person region, which is the region of interest for person counting, as the foreground region.

This algorithm, which requires real-time processing, extracts the region of interest by measuring the per-pixel Mahalanobis metric of the input image in real time using the RGB mean and covariance values of a Gaussian model obtained in advance, and it can be optimized with CUDA.
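The per-pixel test described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes a single Gaussian background model shared by all pixels, a hypothetical threshold on the squared Mahalanobis distance, and plain Python lists instead of a CUDA kernel. The function names are chosen only for illustration.

```python
def inverse_3x3(m):
    """Invert a 3x3 matrix (nested lists) using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def mahalanobis_sq(pixel, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)^T * cov_inv * (x - mean)."""
    diff = [p - m for p, m in zip(pixel, mean)]
    tmp = [sum(cov_inv[r][c] * diff[c] for c in range(3)) for r in range(3)]
    return sum(diff[r] * tmp[r] for r in range(3))


def foreground_mask(image, mean, cov, threshold_sq=9.0):
    """Mark a pixel as foreground when it lies far from the background model.

    image: rows of (R, G, B) tuples; mean, cov: background RGB statistics.
    threshold_sq is an assumed value, not one taken from the patent.
    """
    cov_inv = inverse_3x3(cov)
    return [
        [mahalanobis_sq(px, mean, cov_inv) > threshold_sq for px in row]
        for row in image
    ]


# Hypothetical background model: grey background with small colour variance.
background_mean = (100.0, 100.0, 100.0)
background_cov = [[25.0, 0.0, 0.0], [0.0, 25.0, 0.0], [0.0, 0.0, 25.0]]
frame = [
    [(100, 100, 100), (130, 100, 100)],
    [(105, 100, 100), (100, 100, 160)],
]
mask = foreground_mask(frame, background_mean, background_cov)
```

Because each pixel is tested independently of its neighbours, the same computation maps naturally onto one GPU thread per pixel, which is consistent with the CUDA optimization mentioned above.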

Meanwhile, people are detected in the input image using a deep learning-based algorithm (S130). To this end, a variety of input images are learned for each category of interest, and the meaningful visual patterns of each category are recognized in a data-driven manner.

Among deep learning-based algorithms, the Yolo network is modified to create a person region detector applicable to the method according to an embodiment of the present invention. The detector learns various kinds of objects from images that can easily be collected from the Internet and detects people by automatically detecting the learned patterns. An example of the use of the Yolo-based person region detector is presented in FIG. 2.

Unlike conventional detectors, this approach computes the scores for all candidate regions of an object at once. It is a global review method that detects the target object by post-processing the final object regions and prediction scores using the IoU with each candidate region regressed through the deep structure.

The conventional approach preferred to extract candidate regions in preprocessing and classify each local region repeatedly. In contrast, the method according to an embodiment of the present invention takes the color pixel values of an NxNx3 input and computes SxSx{Bx(5+C)} output values, where N is the input image resolution, S is the grid size (the image is divided into SxS cells), B is the number of candidate regions predicted per cell, and C is the number of classes defined in the training data. The target object is then detected by post-processing the final object regions and prediction scores using the IoU with each candidate region regressed by the deep structure.
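The output size SxSx{Bx(5+C)} can be checked with a short sketch. The function name is hypothetical, and the values S = 15, B = 5, and C = 80 are simply those of the example given in this description.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Return the (S, S, B * (5 + C)) shape of the detector output."""
    # Each box carries 4 coordinates, 1 confidence, and C class probabilities.
    values_per_box = 5 + num_classes
    return (grid_size, grid_size, boxes_per_cell * values_per_box)


shape = yolo_output_shape(grid_size=15, boxes_per_cell=5, num_classes=80)
values_per_cell = shape[2]                      # 5 * (80 + 5) = 425
total_values = shape[0] * shape[1] * shape[2]   # 225 cells * 425 values
```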

For example, suppose the entire image is represented as a 15x15x(5x(80+5)) probability map, that is, the image is divided into 15x15 (= 225) grid cells. The 425 values in each grid cell are 5 [predictions] x (80 [classes] + 5 [box coordinates + confidence]): each grid cell produces 5 object detections, and each detection has 4 values giving the coordinates of the bounding box. The position coordinates are predicted relative to the starting coordinates of each grid cell, and the height and width of the bounding box are predicted as ratios to a reference bounding box. Each detection also has a confidence value, which indicates how likely the detection is to be an object. To determine the class, each detection additionally has probability values for the 80 classes, from which its class is determined. The loss function for learning is as follows.

Deep structure learning for person detection uses a cost for the position coordinates, a cost for the bounding box size, costs for the confidence, and a cost for the class. A grid cell may or may not contain a ground truth, and even when it does, 5 detections are made. To measure the loss, the detection whose bounding box overlaps the ground truth most among the 5 detections is used in the cost, and a cost measuring the confidence error for the remaining detections, which are not objects, is added. In addition, batch normalization and skip connections are applied to accelerate deep learning training and improve performance.

For the datasets, ImageNet and Microsoft COCO were selected from the recognized data publicly available on the Internet. After pre-training on the 1000 ILSVRC classes, a model fine-tuned on COCO data at 608x608 resolution can be used.

In post-processing, non-max suppression (NMS) merges detections that overlap to a certain degree into one, so that only the most relevant detections are output as the final result.
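A minimal sketch of this post-processing is given below. It uses the common greedy form of NMS, which is an assumption because the description does not specify the variant. The box format (x1, y1, x2, y2), the IoU threshold of 0.5, and the function names are likewise illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs.

    Detections are visited from highest to lowest score; a detection is kept
    only if it does not overlap an already kept one by more than the threshold.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Two heavily overlapping detections of one person and one separate detection.
candidates = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((20, 20, 30, 30), 0.7),
]
final = non_max_suppression(candidates)
```

In this example the second candidate overlaps the first with an IoU of 81/119, so only the first and third candidates remain.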

A deep structure for which real-time operation is important can greatly reduce the amount of computation (FLOPs) by inserting several 1x1xN channel reduction layers, as in GoogleNet, and can therefore process input images at high speed. In addition, the conventional fully connected layer is removed and new modules such as route (skip connection) and reorg are included in the network configuration, which can support future development and commercialization of hardware-friendly deep structures.
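The saving from a 1x1 channel reduction layer can be illustrated with simple arithmetic. The layer sizes below (a 15x15 feature map, 512 channels reduced to 256) are hypothetical and are not taken from the disclosed network; the sketch only counts multiply-accumulate operations for stride-1 convolutions.

```python
def conv_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-accumulate count of one stride-1 convolution layer."""
    return height * width * in_channels * out_channels * kernel_size * kernel_size


H = W = 15  # assumed feature map size

# A 3x3 convolution applied directly to 512 input channels.
direct = conv_macs(H, W, 512, 512, 3)

# A 1x1 reduction to 256 channels followed by the same 3x3 convolution.
reduced = conv_macs(H, W, 512, 256, 1) + conv_macs(H, W, 256, 512, 3)

saving_ratio = direct / reduced
```

Under these assumed sizes the reduced pair of layers needs a little over half the operations of the direct convolution.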

The description now returns to FIG. 1.

After the region of interest is set by foreground/background separation (S120) and deep learning-based person detection (S130) is performed, the person regions are segmented based on deep learning (S140).

In this step, the person regions, which are the regions of interest set in step S120, are segmented within a predetermined detection range into body parts (face, torso, pelvis, arms, and legs). The arms may be further divided into upper arms and forearms, and the legs into thighs and calves.

This can be performed using RefineNet, as illustrated in FIG. 3. In scenarios that process very high-resolution input, RefineNet progressively refines the result through a pyramid structure. An example of the resulting person region segmentation is presented in FIG. 4.

When the person region segmentation is complete, person counting is performed on the person regions detected in step S130 with reference to the segmentation result (S150).

The people counted in step S150 are determined by the person criterion defined by the user. That is, among the people detected in step S130, only those who meet the person criterion set by the user are counted.
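One way the segmentation result might be consulted for each detected person is sketched below. The description does not specify this step, so everything here is an assumption: the label map is a 2D grid of part names (or None for background), a detection is a box (x1, y1, x2, y2) in pixel indices, and a part counts as visible when at least min_pixels of its label fall inside the box.

```python
def visible_parts_in_box(label_map, box, min_pixels=1):
    """Collect the body parts whose labels appear inside a detected box."""
    x1, y1, x2, y2 = box
    counts = {}
    for row in label_map[y1:y2]:
        for label in row[x1:x2]:
            if label is not None:
                counts[label] = counts.get(label, 0) + 1
    return {part for part, n in counts.items() if n >= min_pixels}


# Tiny hypothetical segmentation result (4x4 pixels).
labels = [
    ["face",  "face",  None, None],
    ["torso", "torso", None, "face"],
    ["torso", "torso", None, "torso"],
    [None,    None,    None, None],
]
person_a = visible_parts_in_box(labels, (0, 0, 2, 3))
person_b = visible_parts_in_box(labels, (3, 1, 4, 3), min_pixels=2)
```

A pixel-count threshold such as min_pixels is one simple way to keep a few stray labels from marking a part as visible.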

The user can define the person criterion by combining the body parts into which the person region is segmented in step S140. For example, "face AND torso" can be defined as the person criterion, in which case only people whose face and torso both appear are counted.

As another example, "face AND torso AND upper arm AND forearm" can be defined as the person criterion, in which case only people whose face, torso, upper arm, and forearm all appear are counted.

As yet another example, "face AND torso AND (upper arm OR thigh)" can be defined as the person criterion, in which case only people whose face and torso both appear and whose upper arm or thigh appears are counted.

The definition of the person criterion can be changed at any time according to the user's needs. FIG. 5 illustrates person counting results.
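The three example criteria above can be expressed and evaluated with a short sketch. The nested-tuple representation, the part names, and the function names are illustrative assumptions; the description states only that body parts are combined with AND and OR operations.

```python
def meets_criterion(criterion, visible_parts):
    """Evaluate a person criterion against the set of visible body parts.

    A criterion is either a part name or a tuple ("AND" | "OR", operand, ...),
    where each operand is itself a criterion.
    """
    if isinstance(criterion, str):
        return criterion in visible_parts
    operator, *operands = criterion
    results = (meets_criterion(op, visible_parts) for op in operands)
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unsupported operator: {operator}")


def count_people(people, criterion):
    """Count the detected people whose visible parts satisfy the criterion."""
    return sum(1 for parts in people if meets_criterion(criterion, parts))


# Visible body parts of four hypothetical detected people.
detected = [
    {"face", "torso", "upper_arm"},
    {"face", "torso"},
    {"torso", "thigh"},
    {"face", "torso", "thigh", "calf"},
]

face_and_torso = ("AND", "face", "torso")
with_arms = ("AND", "face", "torso", "upper_arm", "forearm")
arm_or_thigh = ("AND", "face", "torso", ("OR", "upper_arm", "thigh"))

counts = [count_people(detected, c) for c in (face_and_torso, with_arms, arm_or_thigh)]
```

Because the criterion is plain data, the user can replace it at any time without changing the detection or segmentation steps.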

For the person counting method according to an embodiment of the present invention, the deep learning part was implemented by porting the Darknet platform to MFC, and the software was otherwise implemented and verified using OpenCV and MATLAB functions.

For real-time processing, the method was demonstrated with a detector using a relatively light deep structure (10 FPS or more at 4K), and a detector using a more complex deep structure achieved an accuracy of 95% or more on the PETS dataset. In addressing several important implementation issues, not only the performance against the quantitative targets but also the qualitative performance on diverse human appearances in a general experimental environment was verified.

FIG. 6 shows the software implementation of the person counting method according to an embodiment of the present invention. Init. initializes the camera and reads in the deep structure parameter values for the detection function, and Grab runs the single-image processing function used to apply non-real-time algorithms that include more complex post-processing. The real-time algorithm can be demonstrated with the Start/Stop functions. It runs at 10 FPS with 4K input, and in the current development environment the input resolution at which 20 FPS is attainable was found experimentally to be QXGA (with a frame drop of less than 1%). The non-real-time algorithm was implemented with a mixture of C++ and MATLAB functions. Person detection and counting results from the implemented software are illustrated in FIG. 7.

FIG. 8 is a block diagram of a deep structure learning-based person counting system according to another embodiment of the present invention. As shown in FIG. 8, the person counting system according to another embodiment of the present invention can be implemented as a computing system that includes a communication unit 210, an output unit 220, a processor 230, an input unit 240, and a storage unit 250.

The communication unit 210 is a communication means for receiving images from an external device (a camera) and an external network.

The input unit 240 is an input means for receiving user setting commands, such as the person criterion, and the output unit 220 is a display for showing the person counting process and results.

The processor 230 counts people in the input image by executing the method shown in FIG. 1. The storage unit 250 provides the storage space that the processor 230 needs to operate.

Preferred embodiments of the deep structure learning-based person counting method have been described in detail above.

The deep structure learning-based person counting method according to an embodiment of the present invention combines computer vision and image processing algorithms that extract the person region by learning the background with person detection and body part segmentation techniques based on deep structure learning.

To this end, the region of interest for person counting was set by a foreground/background separation algorithm that learns the background with a statistical technique. A methodology that learns a deep structure was applied to person detection for person counting, and the annotations for it, rectangular boxes that define the outline of each person, were collected from data available on the Internet.

In addition, a methodology that learns a deep structure was applied to the body part segmentation function for person counting, and the per-part segmentation labels for it, defined at the pixel level, were collected from data available on the Internet.

Furthermore, person detection and body part segmentation allow a person to be defined more generally and let the user specify a subjective criterion for counting people. That is, the user can specify, as the condition under which a person is counted, the degree to which body parts may be occluded in the detected person region. For example, the program can operate on a criterion such as counting a person only when the person's head and torso are both visible.

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical ideas according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. In addition, computer-readable code or programs stored on a computer-readable recording medium may be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

210: communication unit
220: output unit
230: processor
240: input unit
250: storage unit

Claims

1. A person counting method comprising:
setting a person region by separating a foreground and a background in an input image;
segmenting the set person region into body parts;
setting a person criterion that combines human body parts; and
counting people in the input image according to the set person criterion,
wherein the segmenting step segments the person region into a face, a torso, a pelvis, arms, and legs, and then further divides the arms into upper arms and forearms and the legs into thighs and calves,
wherein the person criterion is set and changeable by a user and combines a plurality of body parts according to logical operations, and
wherein the logical operations include an AND operation and an OR operation.

2. (Deleted)

3. The method according to claim 1, further comprising detecting people in the input image based on deep learning,
wherein the counting step counts, among the people detected in the detecting step, those who meet the set person criterion.

4. (Deleted)

5. The method according to claim 1,
wherein the person region setting step is performed based on a statistical technique, and
the segmenting step is performed based on deep learning.

6. A person counting system comprising:
an acquisition unit that acquires an input image; and
a processor that sets a person region by separating a foreground and a background in the input image, segments the set person region into body parts, sets a person criterion combining human body parts, and counts people in the input image according to the set person criterion,
wherein the processor segments the person region into a face, a torso, a pelvis, arms, and legs, and then further divides the arms into upper arms and forearms and the legs into thighs and calves,
wherein the person criterion is set and changeable by a user and combines a plurality of body parts according to logical operations, and
wherein the logical operations include an AND operation and an OR operation.