KR20200046178A

KR20200046178A - Head region detection method and head region detection device

Info

Publication number: KR20200046178A
Application number: KR1020180124314A
Authority: KR
Inventors: 전은솜; 김광중
Original assignee: 주식회사 케이티
Priority date: 2018-10-18
Filing date: 2018-10-18
Publication date: 2020-05-07

Abstract

The present invention relates to a head region detecting method which detects a human head in an input image to determine the number of people in the image, and to a head region detecting apparatus. According to an embodiment of the present invention, the head region detecting method comprises the steps of: receiving an initial image; setting a detection target region; generating a plurality of transformed images; detecting a first head region, which is a candidate estimated to have a head for each of the plurality of transformed images; detecting a second head region for each of the plurality of transformed images; and detecting a third head region which is a final head region in the detection target region.

Description

Head region detection method and head region detection device {HEAD REGION DETECTION METHOD AND HEAD REGION DETECTION DEVICE}

The present invention relates to a head region detection method and a head region detection apparatus, and more specifically to a head region detection method and apparatus that detect human heads in an input image so that the number of people in the image can be determined.

Person detection technology has recently been applied in many areas, including head counting, movement-path analysis, emergency alerts, and games. Collecting various kinds of information about a given space, or configuring the environment of that space, requires efficiently counting the number of people located in it.

Conventionally, the number of people in a captured video was counted by analyzing images acquired with a depth camera or a stereo camera and detecting human heads in them. To count people more accurately, recognition is sometimes performed using multiple cameras, or by extracting composite features of the image such as color and depth.

However, the conventional techniques can recognize only a limited range of head appearances, so recognition rates for heads seen from the side or from behind are low. As a result, the head of a person who is facing away from the camera may not be recognized. In addition, when a person's whole body is not included in the image, or when people overlap one another, the body is difficult to recognize and the recognition rate drops.

A depth camera, or images from multiple cameras covering the same area, may be used to improve the head recognition rate, but this incurs additional cost compared with an ordinary shooting environment.

Furthermore, methods based on personal feature data, such as motion or body information, must account for individual differences in face, height, skin color, and so on, which makes it difficult to detect people in varied environments.

An object of the head region detection method and head region detection apparatus according to an embodiment of the present invention is to detect, accurately and quickly, the head region in which a head is located in an input image, regardless of the state in which the head appears in the image.

Another object of the head region detection method and head region detection apparatus according to an embodiment of the present invention is to reduce the cost of head detection by operating without images from multiple cameras or from a depth camera.

In order to solve the above problems, an object recognition method according to an embodiment of the present invention may include: receiving an initial image; setting a detection target region within the initial image; generating, from the detection target region, a plurality of transformed images having different sizes; detecting, for each of the plurality of transformed images, a first head region that is a candidate region in which a head is estimated to be located; analyzing the first head region to detect a second head region for each of the plurality of transformed images; and overlapping the plurality of transformed images and analyzing the plurality of second head regions to detect a third head region, which is the final head region within the detection target region.

The detection target region may be set by user designation.

Generating the plurality of transformed images may include: selecting the largest object and the smallest object among the objects located in the detection target region; and generating a transformed image enlarged relative to the initial image, or a transformed image reduced relative to the initial image, according to the size ratio of the largest object and the smallest object.

Detecting the first head region may include: filtering the transformed image to generate feature maps related to features of the region in which the head is located, and repeating a convolutional neural network (CNN) operation on the generated feature maps a set number of times to generate a first intermediate layer containing feature maps of compressed size; repeating a CNN operation a set number of times while maintaining the size of the feature maps contained in the first intermediate layer, and generating a plurality of second intermediate layers containing the feature maps resulting from each repetition; and merging the feature maps of the second intermediate layers so generated and performing a CNN operation to generate a third intermediate layer.
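The way feature-map size is first compressed and then held constant can be traced with the standard convolution output-size formula. The following is only a sketch: the kernel sizes, strides, padding, and number of repetitions are hypothetical values chosen for illustration and are not specified in this passage.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after one convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size: int, layers: list) -> list:
    """Return the feature-map size before and after each (kernel, stride, padding) layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Hypothetical settings (assumptions, not taken from the patent):
compress = [(3, 2, 1)] * 3   # stride-2 convolutions shrink the map (first intermediate layer)
keep = [(3, 1, 1)] * 2       # stride-1 convolutions preserve the size (second intermediate layers)

print(trace_sizes(64, compress + keep))  # [64, 32, 16, 8, 8, 8]
```

With these assumed settings a 64-pixel-wide map is halved three times and then stays at 8 pixels, which mirrors the "compress, then maintain size" sequence described above.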

The step of generating the first head region may use a ReLU (Rectified Linear Unit) function in the process of generating the intermediate layers.

The ReLU function may be a function having a negative slope in the negative range.
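One common activation that is nonzero for negative inputs is the leaky ReLU, in which negative inputs are multiplied by a small coefficient that is often called the "negative slope". Whether this is exactly the function intended here is an assumption, and the coefficient 0.01 below is a hypothetical default rather than a value from the patent.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: identity for x >= 0, a small linear response for x < 0.

    negative_slope is a hypothetical coefficient chosen for illustration.
    """
    return x if x >= 0 else negative_slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, leaky_relu(value, negative_slope=0.1))
```

Unlike the plain ReLU, which outputs zero for every negative input, this variant keeps a small signal in the negative range.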

The second intermediate layers may contain a larger number of feature maps than the first intermediate layer.

Detecting the second head region may include: merging the feature maps of the first-generated second intermediate layer with the feature maps of the third intermediate layer and repeating a CNN operation to generate a fourth intermediate layer; and repeating a process of merging the feature maps of the first intermediate layer into the feature maps of the generated fourth intermediate layer to generate a final layer.

Detecting the third head region may include: adjusting the plurality of transformed images of different sizes to the same size; comparing the positions of the second head regions contained in the transformed images adjusted to the same size; and determining the third head region according to the degree to which the positions of the second head regions contained in the plurality of transformed images overlap.

Determining the third head region according to the degree to which the positions of the second head regions overlap may be performed using a non-maximum suppression (NMS) algorithm.
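A minimal sketch of greedy non-maximum suppression is given below. It assumes that each second head region is an axis-aligned box (x1, y1, x2, y2) with a confidence score, that overlap is measured by intersection over union (IoU), and that 0.5 is the suppression threshold; none of these details are fixed by the text above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, so two head regions remain.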

A head region detection apparatus according to an embodiment of the present invention for solving the above problems includes: a memory storing at least one program; and a processor operating under control of the at least one program, wherein the processor may receive an initial image and a detection target region, generate from the detection target region a plurality of transformed images having different sizes, detect for each of the plurality of transformed images a first head region as a candidate region in which a head is estimated to be located, analyze the first head region to detect a second head region for each of the plurality of transformed images, and overlap the plurality of transformed images and analyze the plurality of second head regions to detect a third head region, which is the final head region within the detection target region.

The head region detection method and head region detection apparatus according to an embodiment of the present invention can accurately and quickly detect the head region in which a head is located in an input image, regardless of the state in which the head appears in the image.

The head region detection method and head region detection apparatus according to an embodiment of the present invention operate without images from multiple cameras or from a depth camera, and thereby reduce the cost of head detection.

In addition, detecting the head region requires neither generating a background image nor storing images or feature information for specific motions or environments, so the invention is also well suited to devices with relatively limited processing power, such as mobile terminals.

However, the effects achievable according to an embodiment of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a diagram illustrating a head region detection method according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating in more detail the step of generating a plurality of transformed images in the embodiment of FIG. 1.
FIG. 3 is a diagram illustrating by example the process of selecting objects within the detection target region.
FIG. 4 is a diagram illustrating by example the generation of a plurality of transformed images.
FIG. 5 is a diagram illustrating in more detail the first head region detection step in the embodiment of FIG. 1.
FIG. 6 is another diagram illustrating the first head region detection step.
FIG. 7 is yet another diagram illustrating the first head region detection step.
FIG. 8 is a diagram illustrating by example the sub-sampling process of the first head region detection step.
FIG. 9 is a diagram illustrating the graph of the ReLU function.
FIG. 10 is a diagram illustrating in more detail the second head region detection step of FIG. 1.
FIG. 11 is another diagram illustrating the second head region detection step.
FIG. 12 is yet another diagram illustrating the second head region detection step.
FIG. 13 is a diagram illustrating in more detail the third head region detection step of FIG. 1.
FIG. 14 is a diagram illustrating a head region detection apparatus according to another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the present invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the present invention, detailed descriptions of related known technologies are omitted where they would unnecessarily obscure the subject matter of the invention. In addition, the ordinal numbers used in this specification (for example, first, second) are merely identifiers for distinguishing one component from another.

Further, in this specification, when one component is referred to as being "connected" or "coupled" to another component, the two may be directly connected or coupled, but unless specifically stated otherwise, they may also be connected or coupled via another intervening component.

FIG. 1 is a diagram illustrating a head region detection method according to an embodiment of the present invention. Referring to FIG. 1, the head region detection method may include an initial image input step (S100), a detection target region setting step (S200), a step of generating a plurality of transformed images (S300), a first head region detection step (S400), a second head region detection step (S500), and a third head region detection step (S600).
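The order of steps S100 to S600 can be sketched as a small driver function. This is only a skeleton under stated assumptions: the function name detect_heads and all of its parameters are hypothetical, and each stage is passed in as a callable because the stages themselves are described later in the specification.

```python
def detect_heads(image, set_region, make_transformed, detect_first, detect_second, merge):
    """Run the six steps in order; every stage callable is a hypothetical placeholder."""
    # S100: the initial image is received as the `image` argument.
    region = set_region(image)                # S200: set the detection target region
    transformed = make_transformed(region)    # S300: images of different sizes
    second_regions = []
    for t in transformed:
        first = detect_first(t)                        # S400: candidate head regions
        second_regions.append(detect_second(t, first)) # S500: refined head regions
    return merge(transformed, second_regions)  # S600: final (third) head regions


# Toy stand-ins that only label the data flowing through each stage.
result = detect_heads(
    image="img",
    set_region=lambda img: img + ":roi",
    make_transformed=lambda region: [region + "@0.5", region + "@2"],
    detect_first=lambda t: [t + ":first"],
    detect_second=lambda t, first: [f.replace("first", "second") for f in first],
    merge=lambda transformed, second: [b for boxes in second for b in boxes],
)
print(result)  # ['img:roi@0.5:second', 'img:roi@2:second']
```

Passing the stages as callables keeps the control flow visible without committing to any particular detector implementation.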

Each step of the head region detection method according to an embodiment of the present invention may be performed by the processor of the head region detection apparatus described later. The processor may load the data needed to perform these steps from the memory of the head region detection apparatus and carry out the required operations.

The initial image input step (S100) may be a step in which the processor of the head region detection apparatus loads an initial image from an internal or external database, or receives an initial image transmitted directly from the imaging unit of the apparatus. The initial image may be image data of a still photograph or picture, and such image data may be one frame of a video in which a plurality of frames are displayed in sequence.

The detection target region setting step (S200) may be a step of setting all or part of the initial image as the detection target region. More specifically, the detection target region may be set by a user of the head region detection apparatus designating the area of the initial image in which head regions need to be detected. If necessary, the detection target region may cover the entire initial image.
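Setting the detection target region amounts to cropping a user-designated rectangle, or keeping the whole image. The sketch below assumes the image is a row-major nested list of pixel values and the region is given as (x, y, width, height); both conventions are assumptions made for illustration.

```python
def crop_region(image, region=None):
    """Return the detection target region of a row-major image.

    region is a hypothetical (x, y, width, height) tuple; None selects the whole image.
    """
    if region is None:
        return [row[:] for row in image]
    x, y, width, height = region
    return [row[x:x + width] for row in image[y:y + height]]


# A 4x4 test image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop_region(image, (1, 1, 2, 2)))  # [[5, 6], [9, 10]]
```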

The step of generating a plurality of transformed images (S300) may be a step in which the head region detection apparatus generates, from the detection target region, a plurality of transformed images having different sizes.

Here, the transformed images may be a plurality of images generated by enlarging or reducing the width and height of the detection target region by the same ratio. That is, at least two transformed images may be generated from the detection target region of the initial image, and these may include an image of the detection target region at its unchanged scale. The plurality of transformed images are then processed in the first to third head region detection steps. The step of generating a plurality of transformed images (S300) is described in more detail below.
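Scaling width and height by the same ratio can be illustrated with a simple nearest-neighbor resize. The interpolation method, the nested-list image format, and the scale set (0.5, 1.0, 2.0) are assumptions for this sketch; the text above does not specify them.

```python
def resize_nearest(image, scale):
    """Resize a row-major image by `scale` in both directions using nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [
            image[min(src_h - 1, int(r / scale))][min(src_w - 1, int(c / scale))]
            for c in range(dst_w)
        ]
        for r in range(dst_h)
    ]


def make_transformed_images(region, scales=(0.5, 1.0, 2.0)):
    """Map each (hypothetical) scale factor to the resized detection target region."""
    return {scale: resize_nearest(region, scale) for scale in scales}


region = [[r * 4 + c for c in range(4)] for r in range(4)]
pyramid = make_transformed_images(region)
for scale, img in pyramid.items():
    print(scale, len(img[0]), "x", len(img))
```

Including the scale 1.0 keeps the unchanged detection target region among the transformed images, as the paragraph above allows.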

The first head region detection step (S400) may be a step in which the head region detection apparatus detects, for each of the plurality of transformed images, a first head region, which is a candidate region in which a head is estimated to be located.

Here, the first head region is a region in which a human head is estimated to be located in each of the transformed images generated at different sizes. The first head region may be the head region detected first, through a basic operation, before the second and third head regions are detected. That is, the first head region is subsequently used to detect the second and third head regions. By detecting the first, second, and third head regions in sequence, the head region detection apparatus can progressively increase the probability that a human head is actually present in the detected head region.

When the first head region detection step (S400) is performed, a first head region may be detected for each of the plurality of transformed images. Because the transformed images differ in size, the first head regions detected in the respective transformed images may not be identical. The first head region detection step (S400) is described in more detail below.

The second head region detection step (S500) may be a step in which the head region detection apparatus analyzes the first head region and detects a second head region for each of the plurality of transformed images.

The second head region detection step (S500) may be a step of excluding head regions that were falsely detected among the first head regions and raising the head-region detection probability for regions that more strongly exhibit the features of a human head. Consequently, performing the second head region detection step (S500) on each transformed image makes it possible to detect second head regions that are more likely than the first head regions to contain a human head.

The second head region detection step (S500) is described in more detail below.

The third head region detection step (S600) may be a step in which the head region detection apparatus overlaps the plurality of transformed images and analyzes the plurality of second head regions to detect a third head region, which is the final head region within the detection target region.

Through the preceding second head region detection step (S500), the head region detection apparatus can obtain the second head regions detected for each transformed image. In the third head region detection step (S600), the apparatus may rescale the transformed images back to a common size. It can then determine the degree to which the second head regions overlap across the rescaled transformed images and detect the third head region, in which a human head is finally estimated to be located.
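Bringing detections from differently sized images into one coordinate frame can be sketched by dividing box coordinates by the scale factor of the image they came from. This is a shortcut that has the same effect on positions as resizing the images themselves; the (x1, y1, x2, y2) box format and the dictionary keyed by scale factor are assumptions for illustration.

```python
def to_common_scale(boxes_by_scale):
    """Map boxes detected at each scale back to the original detection target region.

    boxes_by_scale: hypothetical dict {scale_factor: [(x1, y1, x2, y2), ...]}.
    """
    merged = []
    for scale, boxes in boxes_by_scale.items():
        for x1, y1, x2, y2 in boxes:
            merged.append((x1 / scale, y1 / scale, x2 / scale, y2 / scale))
    return merged


detections = {
    2.0: [(20, 20, 40, 40)],  # found in the image enlarged by a factor of 2
    0.5: [(5, 5, 10, 10)],    # found in the image reduced by a factor of 0.5
}
print(to_common_scale(detections))
```

In this example both detections land on the same box in the original frame, which is the kind of overlap the third head region detection step evaluates.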

The third head region detection step (S600) is described in more detail below.

The first head region detection step (S400) and the second head region detection step (S500) may be performed on each of the plurality of transformed images generated in step S300. In this case, steps S400 and S500 may be performed in sequence on one transformed image and then repeated in sequence on a transformed image of a different size. Alternatively, in another embodiment, the first head region detection step (S400) may be performed on all of the transformed images, after which the second head region detection step (S500) is performed on all of the transformed images.

The first head region detection step (S400) and the second head region detection step (S500) may be performed using convolutional neural network (CNN) operations. More specifically, as steps S400 and S500 are performed, the feature maps contained in each intermediate layer are passed through a CNN operation on the way to the next intermediate layer, which can output feature maps in which the features of the head region are further strengthened.

FIG. 2 is a diagram illustrating in more detail the step of generating a plurality of transformed images (S300) in the embodiment of FIG. 1. Referring to FIG. 2, step S300 may include an object selection step (S310) and a step of generating enlarged or reduced transformed images (S320).

The object selection step (S310) may be a step of selecting the largest object and the smallest object among the objects located in the detection target region.

The head region detection apparatus may pick an arbitrary non-background object within the detection target region and, among objects of a similar kind, select the largest and the smallest. In an embodiment of the present invention, an object may be a person's body including the head. The object selection step (S310) may be a step that, before the detailed computation, picks the target objects through a simple and fast operation and selects a large object and a small object.
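Given rough bounding boxes for the objects in the region, choosing the largest and smallest is a single pass. How the objects are found and how "size" is measured are not specified above; the sketch assumes (x1, y1, x2, y2) boxes are already available and uses box area as the size measure.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; using area as the size measure is an assumption."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def select_extremes(boxes):
    """Return (largest_box, smallest_box) from a non-empty list of boxes."""
    if not boxes:
        raise ValueError("at least one object is required")
    return max(boxes, key=box_area), min(boxes, key=box_area)


people = [(0, 0, 40, 100), (60, 10, 75, 50), (90, 5, 120, 80)]
largest, smallest = select_extremes(people)
print(largest, smallest)  # (0, 0, 40, 100) (60, 10, 75, 50)
```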

The step of generating enlarged or reduced transformed images (S320) may be a step of generating a transformed image enlarged relative to the initial image, or a transformed image reduced relative to the initial image, according to the size ratio of the largest object and the smallest object.

When generating the plurality of transformed images, the head region detection apparatus may use the result of the object selection step (S310) to decide by what ratio to enlarge or reduce the transformed images and how many images to generate. That is, the apparatus may compare the size of the largest object and the size of the smallest object selected in step S310 to determine the scale ratios of the transformed images and the number of transformed images to generate.

More specifically, the transformed images may vary according to the size of the detection target region that has been set and the sizes of the objects selected within it. For example, when the size of the detection target region is N and the size of the smallest selected object is A, the ratio Rmin for that object may be set to A/N. Likewise, when the size of the largest selected object is B, the ratio Rmax for that object may be set to B/N. Suppose further that the reference ratio R1 for generating transformed images, set in advance or arbitrarily by the user, is C/N, that Rmin is 0.5 times R1, and that Rmax is 2 times R1. In this case, the head region detection apparatus may generate one transformed image in which the detection target region is enlarged by a factor of 2, based on the smallest object, and another in which it is reduced by a factor of 0.5, based on the largest object. In this example the apparatus generates two transformed images, one enlarged and one reduced, but depending on the settings three or more transformed images may be generated, and the enlargement and reduction ratios may vary.
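One reading of this example is that each scale factor is the reference ratio divided by the object's own ratio, so that the object ends up at the reference size after scaling. The sketch below follows that reading; the function name and the concrete numbers (N = 400, C = 40, A = 20, B = 80) are hypothetical values chosen to reproduce the 0.5 and 2 relationships in the text.

```python
def scale_factor(object_size, region_size, reference_ratio):
    """Scale that brings an object's ratio (object_size / region_size) to reference_ratio."""
    object_ratio = object_size / region_size
    return reference_ratio / object_ratio


N = 400            # size of the detection target region (hypothetical)
R1 = 40 / N        # reference ratio C / N with C = 40 (hypothetical)
A, B = 20, 80      # smallest and largest object sizes (hypothetical)

enlarge = scale_factor(A, N, R1)  # Rmin = 0.5 * R1, so the region is enlarged
reduce_ = scale_factor(B, N, R1)  # Rmax = 2 * R1, so the region is reduced
print(enlarge, reduce_)
```

Under these assumed numbers the two factors come out at approximately 2 and 0.5, matching the enlarged and reduced transformed images in the example above.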

FIG. 3 is a diagram illustrating an example of how objects are selected within the detection target region. Referring to FIG. 3, a detection target region (B) has been set by the user in the initial image (A). The head region detection apparatus may recognize entire human bodies as objects within the detection target region (B) and select the largest object (C) and the smallest object (D) among the recognized objects. Although FIG. 3 illustrates human bodies as the objects recognized by the apparatus, this may be changed according to the settings of the user or designer or according to the environment.

FIG. 4 is a diagram illustrating an example of how the plurality of transformed images is generated. Referring to FIG. 4, once the detection target region of the initial image has been set and target objects have been selected by size within that region, a plurality of transformed images is generated in correspondence with the sizes of the selected objects. As shown in FIG. 4, the transformed images may differ in size. The transformed image (S), generated by enlarging to the largest size, can increase the probability of detecting small head regions. Conversely, the transformed image (L), generated by reducing to the smallest size, can increase the probability of detecting large head regions.

FIG. 5 is a diagram for describing the first head region detection step of the embodiment of FIG. 1 in more detail. Referring to FIG. 5, the first head region detection step (S400) may include a step of generating a first intermediate layer (S410), a step of generating a second intermediate layer (S420), and a step of generating a third intermediate layer (S430).

The step of generating a first intermediate layer (S410) may refer to a step of filtering a transformed image to generate feature maps related to features of the region where a head is located, and repeating a CNN (Convolutional Neural Network) operation on the generated feature maps a set number of times, thereby generating a first intermediate layer that includes feature maps of compressed size.

The head region detection apparatus may repeatedly perform the CNN operation on each of the plurality of transformed images generated in the preceding step. While the first head region detection step (S400) is performed, a transformed image may be output as a plurality of feature maps. Layers may be generated as the CNN operation is repeated on one transformed image, and each such layer may include a plurality of feature maps differentiated into elements for detecting head regions in the transformed image. The first intermediate layer may be understood as a set of layers that include the same number of feature maps.

In addition, the first intermediate layer generation step (S410) may be a step in which CNN convolution and subsampling are repeatedly performed on the transformed image, so that the feature map size of the layer under analysis decreases.

The number of times the CNN operation is repeated in the first intermediate layer generation step (S410) may vary according to the intention of the designer of the head region detection method. In general, the more the CNN operation is repeated, the higher the detection accuracy for head regions, but the longer the computation time.

In the first intermediate layer generation step (S410), the first intermediate layer of the immediately preceding stage may be used to generate the first intermediate layer of the next stage.

The step of generating a second intermediate layer (S420) may refer to a step of repeating the CNN operation a set number of times while maintaining the size of the feature maps included in the first intermediate layer, and generating a plurality of second intermediate layers, each including the feature maps of the corresponding iteration. Here, a second intermediate layer may include a larger number of feature maps than the first intermediate layer.

The second intermediate layer generation step (S420) may be understood as a step in which additional CNN operations are performed on the first intermediate layer generated last in the first intermediate layer generation step (S410). In the second intermediate layer generation step (S420), the feature map size is not reduced through subsampling. This is because the second intermediate layers generated in this step include a larger number of feature maps than the first intermediate layer.

That is, a second intermediate layer generated in the second intermediate layer generation step (S420) may include a larger number of feature maps than the first intermediate layers, and those feature maps may be smaller in size. Compared with the first intermediate layer generation step (S410), the second intermediate layer generation step (S420) generates layers with more feature maps, but because the feature maps are smaller, a wider variety of features can be used with fewer operations, which improves the accuracy of head region detection.

The step of generating a third intermediate layer (S430) may refer to a step of generating a third intermediate layer by merging the feature maps of the plurality of second intermediate layers with the feature maps of the first intermediate layer and performing a CNN operation.

The third intermediate layer may refer to the layer finally generated through the CNN operations of the first head region detection step (S400). The third intermediate layer may be generated by merging not only the second intermediate layer computed last but also all of the second intermediate layers generated in the earlier stages. Because the earlier intermediate layers are used together when generating the third intermediate layer, a wider variety of feature maps is available, which increases the probability of detecting head regions.

Here, the head region detection apparatus may use a ReLU (Rectified Linear Unit) function in the process of generating the first to third intermediate layers. The ReLU function sets negative values to 0 in order to remove noise that arises while the intermediate layers are generated.

In another embodiment, the ReLU may be a function that has a nonzero slope in the negative range. In the ReLU variant used in this embodiment, negative values are not fixed to 0 but are given a predetermined slope, in order to remove noise and improve head region detection accuracy while the intermediate layers are generated. Here, the weight used for the slope is not learned during training but is fixed in advance, which reduces the amount of computation required for training and allows faster computation.

FIG. 6 is another diagram for describing the first head region detection step.

Referring to FIG. 6, a plurality of CNN operations (Conv) is performed on a transformed image, and the step of generating a first intermediate layer (S410), the step of generating a second intermediate layer (S420), and the step of generating a third intermediate layer (S430) are shown by way of example.

In FIG. 6, the step of generating a first intermediate layer (S410) repeats the CNN operation six times on the input transformed image. While this step (S410) is performed, six first intermediate layers (Layer 1 to Layer 6) may be generated. The initial first intermediate layer (Layer 1) is generated from the transformed image, and each subsequent first intermediate layer (Layer 2 to Layer 6) may be generated from the first intermediate layer of the preceding stage.

Next, the step of generating a second intermediate layer (S420) is performed. The first intermediate layer generated last (Layer 6) may be used as the starting point of the second intermediate layer generation step (S420), in which the CNN operation is repeated four times. While the second intermediate layer generation step (S420) is performed, four second intermediate layers (Layer 7 to Layer 10) may be generated. The initial second intermediate layer (Layer 7) is generated from the first intermediate layer generated last (Layer 6), and each subsequent second intermediate layer (Layer 8 to Layer 10) may be generated from the second intermediate layer of the preceding stage.

Finally, in the third intermediate layer generation step (S430), the third intermediate layer may be generated by merging all of the second intermediate layers (Layer 7 to Layer 10). The output (out) of the first head region detection step may then be generated through the third intermediate layer (Layer 11). By merging and processing the previously generated second intermediate layers (Layer 7 to Layer 10) together in this way, feature maps that represent recognition and position information for head regions of various shapes and sizes can be extracted more accurately. This improves the detection probability even for head regions that vary in size and shape.
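The merging of Layer 7 to Layer 10 can be pictured with a small sketch. The source does not specify how the layers are merged, so this example assumes channel-wise concatenation of feature maps that share the same height and width; `merge_layers` and the toy data are hypothetical names introduced only for illustration.

```python
def merge_layers(layers):
    """Concatenate the feature maps of several layers into one list.

    Each layer is a list of feature maps; each feature map is a list of rows.
    Concatenation is an assumed reading of "merge", not a detail from the source.
    """
    height = len(layers[0][0])
    width = len(layers[0][0][0])
    merged = []
    for layer in layers:
        for fmap in layer:
            if len(fmap) != height or len(fmap[0]) != width:
                raise ValueError("all feature maps must share one spatial size")
            merged.append(fmap)
    return merged


# Four toy second intermediate layers, each with B = 3 feature maps of size 2x2.
toy_layers = [[[[k, k], [k, k]] for _ in range(3)] for k in range(7, 11)]
merged = merge_layers(toy_layers)
```

Because the second intermediate layers keep the feature map size fixed, their maps can be stacked directly; the merged stack would then be passed to the CNN operation that produces Layer 11.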

Although FIG. 6 illustrates six first intermediate layers (Layer 1 to Layer 6) and four second intermediate layers (Layer 7 to Layer 10), these numbers are chosen only for convenience of description. The numbers may therefore vary according to the intention of the designer or user of the head region detection method.

FIG. 7 is yet another diagram for describing the first head region detection step.

Referring to FIG. 7, the contents of each intermediate layer generated while the first head region detection step described with reference to FIG. 6 is performed are shown in more detail. In FIG. 7, CONV denotes convolution and RELU denotes the ReLU (Rectified Linear Unit) function. Ker denotes the kernel size and pad denotes the zero padding size. With zero padding, edge pixels on which the CNN operation is not performed are assigned the value 0, so that the size of the resulting image is preserved.
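The relationship between Ker, pad, and the preserved output size can be illustrated with the standard convolution size formula. This sketch is an illustration, not taken from FIG. 7: the function names and the stride parameter are assumptions, and the specific kernel and padding values below are hypothetical.

```python
def conv_output_size(size, ker, pad, stride=1):
    """Output length along one axis for a convolution with zero padding."""
    return (size + 2 * pad - ker) // stride + 1


def zero_pad(fmap, pad):
    """Surround a 2D feature map (list of rows) with `pad` rows and columns of zeros."""
    width = len(fmap[0]) + 2 * pad
    rows = [[0] * width for _ in range(pad)]
    for row in fmap:
        rows.append([0] * pad + list(row) + [0] * pad)
    rows.extend([0] * width for _ in range(pad))
    return rows


same_3 = conv_output_size(50, ker=3, pad=1)    # size preserved with a 3x3 kernel
same_5 = conv_output_size(50, ker=5, pad=2)    # size preserved with a 5x5 kernel
shrunk = conv_output_size(50, ker=5, pad=0)    # no padding: the map shrinks
padded = zero_pad([[1]], 1)
```

With a stride of 1, choosing pad = (Ker - 1) / 2 for an odd kernel keeps the feature map size unchanged, which is the size-preserving behavior the text attributes to zero padding.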

The RELU function may be used when performing the CNN operation for each layer, and the weight for the negative range may be set to 0.01 (negative slope). The RELU function may or may not be applied depending on the layer. The weight used and whether the RELU function is applied to each intermediate layer may vary according to the intention of the designer of the head region detection method.
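For reference, the two activation variants described in this document can be written as below. This is a minimal sketch: the function names are illustrative, and 0.01 is the fixed negative-range weight mentioned in the text rather than a learned parameter.

```python
def relu(x):
    """Plain ReLU: negative inputs are set to 0."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """ReLU variant that scales negative inputs by a fixed, non-learned weight."""
    return x if x >= 0 else negative_slope * x


activations = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```

Because the slope is a constant, no extra parameter needs to be trained or stored per layer, which is the efficiency argument the document makes for the fixed weight.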

In the example of FIG. 7, the first intermediate layers (Layer 1 to Layer 6) each output A feature maps, and the second intermediate layers (Layer 7 to Layer 10) each output B feature maps. The first intermediate layers (Layer 1 to Layer 6) repeatedly use A feature maps to efficiently extract features from the input image and from the intermediate stages. The number of feature maps generated in the second intermediate layers (Layer 7 to Layer 10) is at least twice that of the first intermediate layers (Layer 1 to Layer 6). Their feature maps are relatively small compared with those of the first intermediate layers while being extracted from a larger number of filters, so processing speed can be increased. In addition, because the subsampling step described above is not performed on these feature maps, thresholds can be used effectively to improve computational efficiency.

The third intermediate layer (Layer 11) derives N feature maps, where N corresponds to the amount of information to be detected. For example, when only head regions are to be detected, one feature map may be derived to detect one kind of object. When detection regions for parts other than the head are also required, N may be set to 2 or more.

FIG. 8 is a diagram illustrating an example of the subsampling process of the first head region detection step.

Referring to FIG. 8, the diagram shows how the convolution operation and the subsampling operation are processed when the first intermediate layer generation step (S410) is performed on a transformed image. As shown in FIG. 8, a first intermediate layer is formed using patches of 5x5 pixels, and a subsampling operation such as taking the maximum or average value over 2x2 patches is then performed to obtain a smaller feature map.
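The convolution followed by subsampling can be sketched in plain Python as follows. This is an illustrative sketch only: it assumes a single-channel input, a stride of 1 with no padding for the 5x5 convolution, and max pooling for the 2x2 subsampling; the function names and the all-ones test data are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 patch."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


image = [[1] * 8 for _ in range(8)]          # toy 8x8 input
kernel = [[1] * 5 for _ in range(5)]         # toy 5x5 kernel
feature_map = conv2d_valid(image, kernel)    # 4x4 map
subsampled = max_pool_2x2(feature_map)       # 2x2 map
```

Replacing `max` with an average over the four values would give the average-value variant that the text also mentions.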

FIG. 9 is a diagram for describing graphs of the ReLU function.

FIG. 9 illustrates the ReLU (Rectified Linear Unit) and the Leaky ReLU used in the first intermediate layer generation step (S410) and the second intermediate layer generation step (S420). To reduce noise in the intermediate layers, a ReLU function that sets negative values to 0 may be used. In another embodiment, a fixed weight (ai) may be applied to negative values in order to reduce noise in the intermediate layers while further increasing accuracy. Because the weight (ai) is a fixed value rather than one derived during training, training can proceed quickly and the amount of data that must be stored for computation can be reduced.

FIG. 10 is a diagram for describing the second head region detection step of FIG. 1 in more detail.

Referring to FIG. 10, the second head region detection step (S500) may include a fourth intermediate layer generation step (S510) and a final layer generation step (S520).

The fourth intermediate layer generation step (S510) may refer to a step of generating fourth intermediate layers by merging the feature maps of the second intermediate layer generated first among the second intermediate layers with the feature maps of the third intermediate layer and repeatedly performing the CNN operation.

The final layer generation step (S520) may refer to a step of generating a final layer by repeating the process of merging the feature maps of the first intermediate layer into the feature maps of the generated fourth intermediate layer.

The fourth intermediate layer generation step (S510) and the final layer generation step (S520) are described with reference to FIGS. 11 and 12.

FIG. 11 is another diagram for describing the second head region detection step. The examples of FIGS. 11 and 12 are described by reusing the examples of FIGS. 6 and 7 above.

In FIG. 11, the second intermediate layer (Layer 7) may refer to the second intermediate layer (Layer 7) generated first in the preceding second intermediate layer generation step (S420). That is, the second intermediate layer (Layer 7) may refer to the intermediate layer obtained by applying the CNN operation to the first intermediate layer (Layer 6) generated last in the first intermediate layer generation step (S410).

In FIG. 11, the output (out) produced by the third intermediate layer generation step (S430) is merged with the second intermediate layer (Layer 7) to generate the initial fourth intermediate layer (710). The second intermediate layer (Layer 7) may then be merged with the fourth intermediate layer (710) to generate the fourth intermediate layer of the next stage (720).
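The data flow of this repeated merging can be outlined as follows. The sketch is schematic and rests on assumptions: merging is modeled as concatenating lists of feature maps, and `toy_conv` is a hypothetical stand-in for a trained CNN operation (it simply averages the merged maps), so only the order of operations reflects the description.

```python
def merge(a, b):
    """Assumed merge: concatenate two lists of feature maps."""
    return list(a) + list(b)


def refine(out, layer7, conv_ops):
    """Re-merge Layer 7 before every CNN operation and collect each stage."""
    current = out
    stages = []
    for conv in conv_ops:
        current = conv(merge(current, layer7))
        stages.append(current)
    return stages


def toy_conv(feature_maps):
    """Hypothetical stand-in for a CNN operation: element-wise mean of all maps."""
    n = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[[sum(m[r][c] for m in feature_maps) / n for c in range(w)]
             for r in range(h)]]


out = [[[1.0]]]        # toy output of the third intermediate layer
layer7 = [[[3.0]]]     # toy first-generated second intermediate layer
stages = refine(out, layer7, [toy_conv, toy_conv])
```

Here `stages[0]` plays the role of the fourth intermediate layer 710 and `stages[1]` the role of 720; in both cases Layer 7 is merged in again before the operation is applied.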

That is, the second intermediate layer generated first in the preceding process is used to generate the fourth intermediate layer and may continue to be merged when the fourth intermediate layer of each subsequent stage is generated. The number of times the CNN operation is repeated on the generated fourth intermediate layers may be set differently according to the intention of the designer of the head region detection method. As the CNN operation on the fourth intermediate layers is repeated, the probability of detecting where head regions are located within the detection target region can be further improved.

By repeating this process a predetermined number of times, the head region detection apparatus can derive accurate results while preserving the differences between features and their discriminative power. That is, final accuracy can be increased by excluding falsely detected candidate regions and raising the probability of the parts where targets should be detected.

When the fourth intermediate layer generation step (S510) ends, the final layer generation step (S520), which generates the final layer for the corresponding transformed image, may be performed.

The final layer refers to the layer that finally determines where head regions are located in the corresponding transformed image. A region estimated to be a head region according to the final layer may be defined as a second head region.

FIG. 12 is yet another diagram for describing the second head region detection step.

In FIG. 12, CONV denotes convolution and RELU denotes the ReLU (Rectified Linear Unit) function. Ker denotes the kernel size and pad denotes the zero padding size. With zero padding, edge pixels on which the CNN operation is not performed are assigned the value 0, so that the size of the resulting image is preserved.

In the example of FIG. 12, the fourth intermediate layers (Layer 21 to Layer 24) each output C feature maps. The fourth intermediate layers (Layer 21 to Layer 24) repeatedly use C feature maps to efficiently extract features from the input image and from the intermediate stages.

The final layer (Layer 25) is generated through a CNN operation on the fourth intermediate layer generated last (Layer 24) and may include T final feature maps. Here, T relates to the body parts to be detected, and when only head regions are detected, T is 1. One final layer (Layer 25) may be generated for each transformed image.

FIG. 13 is a diagram for describing the third head region detection step of FIG. 1 in more detail. Referring to FIG. 13, the third head region detection step (S600) may include a transformed image size adjustment step (S610), a second head region position comparison step (S620), and a third head region determination step (S630).

The transformed image size adjustment step (S610) may refer to a step of adjusting the plurality of transformed images, which differ in size, to the same size.

In the preceding step, one final layer is obtained for each transformed image. Because the transformed images differ in size, the feature maps included in the respective final layers also differ in size and need to be readjusted to a reference size. Accordingly, the transformed image size adjustment step (S610) may be a step of resizing all of the transformed images to the same size so that the final layers corresponding to them also have the same size.
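One simple way to bring final-layer maps of different sizes to a common reference size is nearest-neighbor resampling, sketched below. The source does not state which interpolation is used, so the method, the function name `resize_nearest`, and the toy values are assumptions made only for illustration.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a 2D map (list of rows) to out_h x out_w by nearest-neighbor lookup."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[1, 2],
         [3, 4]]                      # toy map from a reduced transformed image
large = [[5] * 8 for _ in range(8)]   # toy map from an enlarged transformed image

# Bring both maps to a hypothetical 4x4 reference size.
small_ref = resize_nearest(small, 4, 4)
large_ref = resize_nearest(large, 4, 4)
```

Once every map has the reference size, positions in the maps from different transformed images refer to the same locations in the detection target region and can be compared directly.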

The second head region position comparison step (S620) may refer to a step of comparing the positions of the second head regions included in the plurality of transformed images adjusted to the same size.

The third head region determination step (S630) may refer to a step of determining the third head region according to the degree to which the positions of the second head regions included in the plurality of transformed images overlap. In this step (S630), the third head region may be determined using an NMS (Non-Maximum Suppression) algorithm. Here, the third head region refers to a region that the head region detection apparatus finally determines to be a head region within the detection target region.

In more detail, the resized transformed images may be placed over the same region. The third head region may then be detected by performing NMS (Non-Maximum Suppression) based on the degree of overlap (IOU, intersection over union), taking into account the block size of the feature map of each second head region. That is, when the degree of overlap between second head regions is greater than or equal to a predetermined value, the head region detection apparatus may determine the corresponding region to be a third head region. Here, the block size may be derived from the size of the feature map relative to the reference image and from a correction threshold. For example, if the reference image is 400x400 and the feature map is 50x50, the block size may be set to 400/50x4.
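The overlap measure and the suppression step can be sketched as below. This shows the commonly used greedy form of NMS as an illustration, not necessarily the exact procedure of the apparatus: boxes are assumed to be (x1, y1, x2, y2) tuples with per-box scores, the 0.5 threshold is a hypothetical value, and the block size handling is omitted.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping candidates for one head plus one separate candidate (toy values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy case the two overlapping candidates, which could come from different transformed images, collapse into a single detection, while the separate candidate is kept as its own head region.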

FIG. 14 is a diagram for describing a head region detection apparatus according to another embodiment of the present invention. Referring to FIG. 14, the head region detection apparatus (100) may include a memory (110), a processor (120), and a photographing unit (130). In describing FIG. 14, descriptions of configurations or effects identical to those of the drawings described above are omitted.

The head region detection apparatus (100) is an apparatus for recognizing and detecting target objects in an input image, and may perform overall service operations such as composing service screens, data input, data transmission and reception, and data storage under the control of a web or mobile site or a dedicated application. The head region detection apparatus (100) includes the photographing unit (130) and may be any of various devices capable of graphics processing on an input image, including not only smartphones but also desktop computers, notebook computers, PDAs, web pads, and tablet PCs.

The memory (110) may store at least one program. It may also store files related to first to third models, described later, and may provide the stored data at the request of the processor.

The processor (120) may process operations for recognizing target objects from input image data.

The photographing unit (130) may photograph target objects. The photographing unit (130) may include a camera built into the object recognition apparatus or an external camera connected as a separate device, and may generate image data by capturing target objects in the form of photographs or video.

In more detail, the processor 120 may receive an initial image and a detection target region; generate, from the detection target region, a plurality of transformed images having different sizes; detect, for each of the plurality of transformed images, a first head region as a candidate region in which a head is estimated to be located; analyze the first head region to detect a second head region for each of the plurality of transformed images; and overlap the plurality of transformed images and analyze the plurality of second head regions, thereby detecting a third head region, which is the final head region within the detection target region.
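The first operations of the processor described above (taking a detection target region from the initial image and producing transformed images of different sizes) can be illustrated with a minimal sketch. The function names, the `(x, y, width, height)` region format, the nearest-neighbor resampling, and the scale values `0.5`, `1.0`, `2.0` are all assumptions chosen for illustration; the specification does not prescribe any of them.

```python
import numpy as np

def crop_region(image, region):
    """Crop the detection target region, given as (x, y, width, height)."""
    x, y, w, h = region
    return image[y:y + h, x:x + w]

def resize_nearest(image, scale):
    """Resize by nearest-neighbor sampling (a stand-in for any resampler)."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map every output row/column back to its source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def make_transformed_images(image, region, scales=(0.5, 1.0, 2.0)):
    """Return (scale, image) pairs of the region at several sizes.

    The scale values are hypothetical defaults, not taken from the patent.
    """
    roi = crop_region(image, region)
    return [(s, resize_nearest(roi, s)) for s in scales]

if __name__ == "__main__":
    initial = np.zeros((100, 200), dtype=np.uint8)
    for scale, img in make_transformed_images(initial, (10, 20, 80, 40)):
        print(scale, img.shape)
```

Keeping the scale alongside each transformed image makes it straightforward to map detections back to a common size later, when the second head regions are compared.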

Here, the detection target region may be set by being designated by a user of the head region detection device.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

S100: Initial image input step
S200: Detection target region setting step
S300: Step of generating a plurality of transformed images
S400: First head region detection step
S500: Second head region detection step
S600: Third head region detection step

Claims

A head region detection method comprising:
receiving an initial image;
setting a detection target region in the initial image;
generating, from the detection target region, a plurality of transformed images having different sizes;
detecting, for each of the plurality of transformed images, a first head region that is a candidate region in which a head is estimated to be located;
analyzing the first head region to detect a second head region for each of the plurality of transformed images; and
detecting a third head region, which is a final head region in the detection target region, by overlapping the plurality of transformed images and analyzing the plurality of second head regions.

The method of claim 1, wherein the detection target region
is set by being designated by a user.

The method of claim 1, wherein the generating of the plurality of transformed images comprises:
selecting a largest object and a smallest object from among objects located in the detection target region; and
generating a transformed image enlarged from the initial image or a transformed image reduced from the initial image according to a size ratio of the largest object and the smallest object.
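The claim above does not state how the size ratio is turned into enlargement or reduction factors. As one possible reading, the following sketch spaces the scale factors geometrically so that the largest object is reduced to, and the smallest object is enlarged to, a reference size. The function name `choose_scales`, the `target` reference size, and the rule of roughly one level per doubling of the ratio are all assumptions made for illustration, not details from the claims.

```python
import math

def choose_scales(largest, smallest, target):
    """Pick scale factors from the largest and smallest object sizes.

    largest, smallest: object sizes in pixels (for example, heights).
    target: hypothetical size at which a detector is assumed to work best.
    Returns factors below 1 (reduction) up to factors above 1 (enlargement).
    """
    if not (largest >= smallest > 0 and target > 0):
        raise ValueError("sizes must be positive and largest >= smallest")
    ratio = largest / smallest
    # Assumed rule: about one extra level per doubling of the size ratio.
    levels = max(1, math.ceil(math.log2(ratio)) + 1)
    low = target / largest    # brings the largest object to the target size
    high = target / smallest  # brings the smallest object to the target size
    if levels == 1:
        return [low]
    step = (high / low) ** (1 / (levels - 1))
    return [low * step ** i for i in range(levels)]

if __name__ == "__main__":
    print(choose_scales(largest=128, smallest=32, target=64))
```

With this rule, a wider spread between the largest and smallest objects yields more transformed images, while a region whose objects are all the same size yields a single one.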

The method of claim 1, wherein the detecting of the first head region comprises:
generating a first intermediate layer by filtering the transformed image to generate a feature map relating to features of a region where a head is located, and repeatedly performing a convolutional neural network (CNN) operation on the generated feature map a predetermined number of times, the first intermediate layer including a feature map of compressed size;
generating a plurality of second intermediate layers by repeating a CNN operation a predetermined number of times while maintaining the size of the feature map included in the first intermediate layer, each second intermediate layer including the feature map of the corresponding iteration; and
generating a third intermediate layer by merging the feature maps of the second intermediate layers generated in the generating of the plurality of second intermediate layers and performing a CNN operation.

The method of claim 4, wherein the detecting of the first head region
uses a rectified linear unit (ReLU) function in generating the intermediate layers.

The method of claim 5, wherein the ReLU function
is a function having a negative slope in the range of negative numbers.
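A standard ReLU outputs zero for every negative input, so a ReLU variant with a non-zero slope in the negative range behaves differently there. One widely used activation of this kind is the leaky ReLU, whose negative-range coefficient is commonly called the "negative slope". Whether the claim above intends exactly this function is an assumption, and the coefficient `0.1` below is a hypothetical value, not one given in the claims.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for x >= 0, negative_slope * x for x < 0.

    negative_slope is a hypothetical coefficient chosen for illustration.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, negative_slope * x)

if __name__ == "__main__":
    print(leaky_relu([-2.0, 0.0, 3.0]))
```

Unlike the standard ReLU, this function passes a scaled-down signal for negative inputs instead of discarding it.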

The method of claim 4, wherein the second intermediate layer
includes a larger number of feature maps than the first intermediate layer.

The method of claim 4, wherein the detecting of the second head region comprises:
generating a fourth intermediate layer by merging the feature map of the second intermediate layer generated first among the second intermediate layers with the feature map of the third intermediate layer and repeatedly performing a CNN operation; and
generating a final layer by repeating a process of merging the feature map of the first intermediate layer with the generated feature map of the fourth intermediate layer.

The method of claim 1, wherein the detecting of the third head region comprises:
adjusting the plurality of transformed images of different sizes to the same size;
comparing positions of the second head regions included in the plurality of transformed images adjusted to the same size; and
determining the third head region according to a degree of overlap of the positions of the second head regions included in the plurality of transformed images.

The method of claim 9, wherein the determining of the third head region according to the degree of overlap of the positions of the second head regions
comprises determining the third head region through a non-maximum suppression (NMS) algorithm.
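The combination of adjusting detections to the same size and applying non-maximum suppression can be sketched as follows. This is a generic greedy NMS over boxes mapped back to a common coordinate frame, offered as an illustration only. The box format `(x1, y1, x2, y2, score)`, the function names, and the IoU threshold `0.5` are assumptions, and the claims do not specify how the degree of overlap is measured.

```python
def rescale_box(box, scale):
    """Map a box found in an image resized by `scale` back to the original size."""
    x1, y1, x2, y2, score = box
    return (x1 / scale, y1 / scale, x2 / scale, y2 / scale, score)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_detections(per_scale, iou_threshold=0.5):
    """Greedy NMS over boxes from all scales.

    per_scale: list of (scale, boxes) pairs, one per transformed image.
    iou_threshold: hypothetical overlap limit above which a box is dropped.
    """
    boxes = [rescale_box(b, s) for s, bs in per_scale for b in bs]
    boxes.sort(key=lambda b: b[4], reverse=True)  # highest score first
    kept = []
    for b in boxes:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(b, k) <= iou_threshold for k in kept):
            kept.append(b)
    return kept

if __name__ == "__main__":
    detections = [
        (1.0, [(10, 10, 30, 30, 0.9), (100, 100, 120, 120, 0.6)]),
        (2.0, [(22, 22, 62, 62, 0.8)]),  # same head, seen in the 2x image
    ]
    for box in merge_detections(detections):
        print(box)
```

In the usage example, the box found in the enlarged image overlaps the higher-scoring box from the unscaled image once both are expressed at the same size, so only one of the two is kept for that head.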

A head region detection device comprising:
a memory in which at least one program is stored; and
a processor that operates under the control of the at least one program,
wherein the processor
receives an initial image and a detection target region; generates, from the detection target region, a plurality of transformed images having different sizes; detects, for each of the plurality of transformed images, a first head region as a candidate region in which a head is estimated to be located; analyzes the first head region to detect a second head region for each of the plurality of transformed images; and overlaps the plurality of transformed images and analyzes the plurality of second head regions, thereby detecting a third head region, which is a final head region in the detection target region.

The device of claim 11, wherein the detection target region
is set by being designated by a user.