KR20180065866A

KR20180065866A - A method and apparatus for detecting a target

Info

Publication number: KR20180065866A
Application number: KR1020170103609A
Authority: KR
Inventors: 비아오 왕; 차오 장; 최창규; 데헹 퀴안; 한재준; 징타오 쑤; 하오 펑
Original assignee: 삼성전자주식회사
Priority date: 2016-12-07
Filing date: 2017-08-16
Publication date: 2018-06-18
Also published as: KR102516360B1; CN108171103B; CN108171103A

Abstract

A target detection method and an apparatus thereof are disclosed. According to an embodiment, the target detection method includes: a step of generating an image pyramid based on an image to be detected; a step of classifying a plurality of candidate regions from the image pyramid using a cascade neural network; and a step of determining a target region corresponding to a target included in the image based on the plurality of candidate regions. The cascade neural network includes a plurality of neural networks. At least one of the plurality of neural networks includes a plurality of parallel sub-neural networks.

Description

A METHOD AND APPARATUS FOR DETECTING A TARGET

The following embodiments relate to the field of computer vision, and specifically to a target detection method and apparatus.

Target detection is a traditional area of research in computer vision. Conventional target detection methods, such as those that combine the AdaBoost (adaptive boosting) algorithm with Haar (Haar wavelet) features or LBP (Local Binary Pattern) features, are already widely applied. However, it is difficult for these conventional methods to significantly improve performance measures such as the detection rate.

The problem with current target detection algorithms is that targets are easily affected by interference, which makes it difficult to improve performance measures such as the detection rate. For example, a face target is subject to various influences caused by face pose, skin color, illumination, occlusion, blur, and other external factors. Therefore, when a face is detected using a conventional target detection method, the detection rate is relatively low.

Recently, target detection methods based on deep learning have appeared, and these methods perform comparatively well in terms of detection rate and error rate. However, deep-learning-based target detection methods have two problems: they are slow, and their classification models are large.

First, a target classification model obtained through deep learning occupies a large amount of storage. For example, the commonly used ZF (Zeiler and Fergus) classification model typically amounts to 200 MB (megabytes) of data, and the VGG (Visual Geometry Group at Oxford University) classification model typically amounts to 500 MB. Such a target classification model occupies a large amount of non-volatile storage (for example, a hard disk or flash memory) and also occupies a large amount of memory while the classification model is running.

Second, the large data volume of the classification model makes computation and loading very slow and consumes a large amount of processor resources. This greatly restricts deep-learning-based target detection methods, particularly on devices with modest hardware specifications or relatively low computational performance. In addition, because running a deep-learning-based target detection method requires additional support from the CPU, this type of method is difficult to use on devices with limited performance.

Embodiments may provide a technique for detecting, from an image to be detected, a target contained in that image.

A target detection method according to an embodiment includes generating an image pyramid based on an image to be detected, classifying a plurality of candidate regions from the image pyramid using a cascade neural network, and determining a target region corresponding to a target included in the image based on the plurality of candidate regions. The cascade neural network includes a plurality of neural networks, and at least one of the plurality of neural networks includes a plurality of parallel sub-neural networks.

The classifying may include classifying a plurality of regions from the image to be detected using a first neural network, and classifying the plurality of regions into a plurality of target candidate regions and a plurality of non-target candidate regions using a second neural network that includes the plurality of parallel sub-neural networks. The plurality of neural networks may include the first neural network and the second neural network.

Each of the plurality of parallel sub-neural networks may correspond to a different target attribute.

When the target included in the image to be detected is a human face, the target attribute may include at least one of front face posture, side face posture, front face or side face by rotation, skin color, light condition, occlusion, and clarity.

The determining may include normalizing the sizes and positions of the plurality of target candidate regions, based on the layer images of the image pyramid to which each target candidate region belongs and on the size and position differences between those layer images, and merging the normalized target candidate regions to obtain the target region.
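The normalization and merging steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiments: the `Box` type, the `layer_scale` convention (layer size divided by original image size), and the use of greedy non-maximum suppression with an IoU threshold of 0.5 as the merge rule are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical candidate region: top-left corner, size, and score."""
    x: float
    y: float
    w: float
    h: float
    score: float


def to_original_scale(box: Box, layer_scale: float) -> Box:
    """Map a box found on a pyramid layer back to original-image coordinates.

    Assumes layer_scale = layer_size / original_size (e.g. 0.5 for a
    half-resolution layer), so coordinates are divided by it.
    """
    return Box(box.x / layer_scale, box.y / layer_scale,
               box.w / layer_scale, box.h / layer_scale, box.score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def merge_candidates(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy non-maximum suppression (an assumed merge rule).

    Boxes are visited from highest to lowest score; a box is kept only if it
    does not overlap an already kept box by iou_threshold or more.
    """
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```

In use, every candidate would first be passed through `to_original_scale` with the scale of the layer it was found on, and the combined list would then be passed to `merge_candidates`.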

The plurality of neural networks may include a convolutional neural network and a Boltzmann network.

A training method according to an embodiment includes receiving an image that contains a target, and training, using the image, a cascade neural network that includes a plurality of neural networks. At least one of the plurality of neural networks includes a plurality of parallel sub-neural networks.

The training may include classifying a sample set composed of a plurality of image regions into a plurality of positive samples and a plurality of negative samples based on the area of a target region corresponding to the target; training a first neural network based on the plurality of negative samples; and training a second neural network, which includes the plurality of parallel sub-neural networks, based on misclassified samples, the plurality of negative samples, and the plurality of positive samples. The plurality of neural networks may include the first neural network and the second neural network.
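The area-based split into positive and negative samples can be sketched as follows. The text only states that the split is based on the area of the target region; the specific criterion used here (the fraction of the target region's area covered by a sample, compared against a threshold of 0.5) and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def overlap_ratio(region: Rect, target: Rect) -> float:
    """Fraction of the target region's area covered by the sample region."""
    rx, ry, rw, rh = region
    tx, ty, tw, th = target
    ix = max(0.0, min(rx + rw, tx + tw) - max(rx, tx))
    iy = max(0.0, min(ry + rh, ty + th) - max(ry, ty))
    return (ix * iy) / (tw * th)


def split_samples(regions: List[Rect], target: Rect,
                  positive_threshold: float = 0.5
                  ) -> Tuple[List[Rect], List[Rect]]:
    """Split image regions into positive and negative samples.

    A region is treated as positive when it covers at least
    positive_threshold of the target area (assumed rule), else negative.
    """
    positives: List[Rect] = []
    negatives: List[Rect] = []
    for region in regions:
        if overlap_ratio(region, target) >= positive_threshold:
            positives.append(region)
        else:
            negatives.append(region)
    return positives, negatives
```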

The training may further include repeatedly fine-tuning at least one of the first neural network and the second neural network until the detection rate for the target decreases or the error rate for the target increases. The fine-tuning may include training the at least one neural network based on the misclassified samples, the plurality of negative samples, and the plurality of positive samples, and classifying a test sample set using the result of that training.
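The stopping rule for fine-tuning can be sketched as a simple loop. This is an illustrative sketch only: `train_round` and `evaluate` are hypothetical callbacks standing in for one round of training and for classifying the test sample set, and returning the last model from before the metrics worsened is an assumption, since the text does not say which model is retained.

```python
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def fine_tune(model: Model,
              train_round: Callable[[Model], Model],
              evaluate: Callable[[Model], Tuple[float, float]],
              max_rounds: int = 100) -> Model:
    """Fine-tune until the detection rate drops or the error rate rises.

    evaluate(model) returns (detection_rate, error_rate) measured on a test
    sample set. max_rounds is a safety bound added for this sketch.
    """
    best = model
    best_detection, best_error = evaluate(best)
    for _ in range(max_rounds):
        candidate = train_round(best)
        detection, error = evaluate(candidate)
        # Stop as soon as either metric gets worse than the best so far.
        if detection < best_detection or error > best_error:
            break
        best, best_detection, best_error = candidate, detection, error
    return best
```

A toy usage, with the "model" reduced to a round counter and the metrics read from fixed lists:

```python
detection_rates = [0.80, 0.85, 0.90, 0.88]
error_rates = [0.10, 0.08, 0.07, 0.07]
final = fine_tune(0, lambda m: m + 1,
                  lambda m: (detection_rates[m], error_rates[m]))
```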

A target detection apparatus according to an embodiment includes an image acquirer that generates an image pyramid based on an image to be detected, a candidate region classifier that classifies a plurality of candidate regions from the image pyramid using a cascade neural network, and a target region determiner that determines a target region corresponding to a target included in the image based on the plurality of candidate regions. The cascade neural network includes a plurality of neural networks, and at least one of the plurality of neural networks includes a plurality of parallel sub-neural networks.

The candidate region classifier may include a first classifier that classifies a plurality of regions from the image to be detected using a first neural network, and a second classifier that classifies the plurality of regions into a plurality of target candidate regions and a plurality of non-target candidate regions using a second neural network that includes the plurality of parallel sub-neural networks. The plurality of neural networks may include the first neural network and the second neural network.

The target region determiner may normalize the sizes and positions of the plurality of target candidate regions, based on the layer images of the image pyramid to which each target candidate region belongs and on the size and position differences between those layer images, and may merge the normalized target candidate regions to obtain the target region.

A training apparatus according to an embodiment includes an image acquirer that receives an image containing a target, and a trainer that uses the image to train a cascade neural network including a plurality of neural networks. At least one of the plurality of neural networks includes a plurality of parallel sub-neural networks.

The trainer may classify a sample set composed of a plurality of image regions into a plurality of positive samples and a plurality of negative samples based on the area of a target region corresponding to the target, train a first neural network based on the plurality of negative samples, and train a second neural network, which includes the plurality of parallel sub-neural networks, based on misclassified negative samples, the plurality of negative samples, and the plurality of positive samples. The plurality of neural networks may include the first neural network and the second neural network.

The trainer may repeatedly fine-tune at least one of the first neural network and the second neural network until the detection rate for the target decreases or the error rate for the target increases. In the fine-tuning, the trainer may train the at least one neural network based on the misclassified samples, the plurality of negative samples, and the plurality of positive samples, and may classify a test sample set using the result of that training.

FIG. 1 shows a schematic block diagram of a target detection apparatus according to an embodiment.
FIG. 2 shows a flowchart of the operation of the target detection apparatus shown in FIG. 1.
FIG. 3A shows a schematic block diagram of a training apparatus according to an embodiment.
FIG. 3B shows a flowchart of an operation of training the first neural network included in the cascade neural network shown in FIG. 1.
FIG. 3C shows a flowchart of the fine-tuning operation shown in FIG. 3B.
FIG. 4 is an example of the structure of the first neural network shown in FIG. 3B.
FIG. 5A shows a flowchart of an operation of training the second neural network included in the cascade neural network shown in FIG. 1.
FIG. 5B shows a flowchart of the fine-tuning operation shown in FIG. 5A.
FIG. 6 is an example of the structure of a sub-neural network included in the second neural network shown in FIG. 5A.
FIG. 7 shows a flowchart of an operation in which the target detection apparatus shown in FIG. 1 generates an image pyramid and detects a target.
FIG. 8 is an example of the image pyramid shown in FIG. 7.
FIG. 9 shows an example of an operation of detecting a target using a two-stage model.

Specific structural or functional descriptions of the embodiments are disclosed for illustration only and may be implemented in various modified forms. Accordingly, the embodiments are not limited to the specific forms disclosed, and the scope of this specification includes changes, equivalents, and alternatives that fall within the technical idea.

Terms such as "first" and "second" may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" and "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. The same reference numerals in the drawings denote the same members.

FIG. 1 shows a schematic block diagram of a target detection apparatus according to an embodiment, and FIG. 2 shows a flowchart of the operation of the target detection apparatus shown in FIG. 1.

Referring to FIGS. 1 and 2, to address the relatively large classification models of deep-learning-based target detection methods, the target detection apparatus 10 may detect a target using a cascade classifier combined with a deep learning model, referred to as a cascade CNN (Cascade Convolutional Neural Network).

A conventional cascade-CNN-based target detection method uses CNNs of as many as six levels as its classification model, so the classification model as a whole occupies a large amount of storage and computation may be slow. In addition, because every level of a conventional cascade CNN is a small CNN, the amount of data each small CNN can handle may be relatively small.

As a result, for face targets, the conventional cascade-CNN-based target detection method cannot accurately represent attribute information for complex conditions such as face pose, skin color, and lighting conditions, and therefore could not improve performance measures such as the target detection rate over conventional approaches such as the AdaBoost algorithm.

The target detection apparatus 10 can address the problems of the conventional cascade CNN, namely its low detection rate and the large storage space occupied by its classification model.

The target detection apparatus 10 may generate an image pyramid based on the image to be detected and classify a plurality of candidate regions from the image pyramid. The target detection apparatus 10 may detect a target in the image by determining, based on the plurality of candidate regions, a target region corresponding to the target included in the image.

A target may include at least one of a body part of a living thing and an environmental object. For example, a body part of a living thing may be a human face, an animal face, or a leaf, flower, or fruit of a plant. An environmental object may include a traffic sign, a traffic light, or the like.

The target detection apparatus 10 may improve classification accuracy by classifying candidate regions of the image to be detected using neural networks. The target detection apparatus 10 may simultaneously classify images captured from a plurality of angles, and may classify the image to be detected comprehensively and accurately. The target detection apparatus 10 may accurately determine the target region based on the accurate classification result.

By using a plurality of sub-neural networks included in a neural network, the target detection apparatus 10 may reduce the total number of neural networks and improve computation speed, and may also significantly reduce the storage space occupied by the classification model composed of the plurality of neural networks. As a result, the target detection apparatus 10 may be applied to low-specification hardware and to devices with low computational performance.

The target detection apparatus 10 includes an image acquirer 101, a candidate region classifier 102, a target region determiner 103, and a cascade neural network 104.

The image acquirer 101 may acquire an image to be detected (S201). The image acquirer 101 may generate an image pyramid based on the image to be detected. The operation of generating the image pyramid is described in detail with reference to FIGS. 7 and 8.
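As a rough illustration of what an image pyramid involves, the sketch below only computes the sizes of successive layers. It is not the procedure of FIGS. 7 and 8: the function name, the fixed per-layer `scale_factor`, and the `min_size` stopping rule are assumptions chosen for this example.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int,
                  scale_factor: float = 0.5,
                  min_size: int = 12) -> List[Tuple[int, int]]:
    """Return (width, height) for each layer of an image pyramid.

    Each layer is the previous one scaled by scale_factor (assumed to be
    between 0 and 1). Layers whose shorter side falls below min_size are
    not generated.
    """
    if not 0.0 < scale_factor < 1.0:
        raise ValueError("scale_factor must be between 0 and 1")
    sizes: List[Tuple[int, int]] = []
    w, h = width, height
    while min(w, h) >= min_size:
        sizes.append((w, h))
        w, h = int(w * scale_factor), int(h * scale_factor)
    return sizes
```

Each layer image would then be produced by resizing the input to the corresponding size, so that a fixed-size classifier window covers larger targets on the smaller layers.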

The image acquirer 101 may acquire the image to be detected through a camera of a terminal.

A terminal, or terminal device, may refer to a device that includes a wireless signal receiver and transmitting and receiving hardware, and that is capable of two-way transmission and reception. The terminal device may include communication devices such as cellular devices, including cellular devices with a single-line display, with a multi-line display, or without a multi-line display.

The terminal may include devices such as a PCS (Personal Communications Service) device capable of voice, data processing, fax, and/or data communication, a PDA (Personal Digital Assistant), an RF (Radio Frequency) receiver, a pager, an Internet network, an Internet browser, a notepad, a calendar, a GPS (Global Positioning System) receiver, a laptop, a handheld computer, a MID (Mobile Internet Device), a mobile phone, a smart TV, and a set-top box.

Such a terminal may also be portable and transportable, and may be installed and/or placed in a vehicle (an aircraft or a ship).

The candidate region classifier 102 may classify a plurality of candidate regions from the image pyramid using the cascade neural network 104 (S202).

A candidate region may refer to any region included in the image to be detected, or to the result of performing image processing on a part of the image to be detected.

The candidate region classifier 102 may include a first classifier 1021 and a second classifier 1022.

The first classifier 1021 and the second classifier 1022 may classify candidate regions using the cascade neural network 104.

The target region determiner 103 may determine a target region corresponding to the target included in the image based on the plurality of candidate regions (S203).

A target region may refer to a region of the image to be detected that contains the target.

The cascade neural network 104 may include a plurality of neural networks, and at least one of the plurality of neural networks included in the cascade neural network may include a plurality of parallel sub-neural networks. In addition, the plurality of neural networks constituting the cascade neural network 104 may include a convolutional neural network and a Boltzmann network.

Each of the plurality of parallel sub-neural networks included in a neural network may correspond to a different target attribute. A target attribute may refer to a characteristic inherent to the target.

For example, when the target is a human face, the target attribute may include at least one of front face posture, side face posture, front face or side face by rotation, skin color, light condition, occlusion, and clarity.

Here, skin color may include white, yellow, black, and brown skin colors. Lighting conditions may include backlight, dim light, and normal lighting, meaning lighting that is neither backlit nor dim. Clarity may include sharp and blurry.

Continuing the example, any one sub-neural network may correspond to a target attribute that includes a front face posture, yellow skin color, backlight, and blur.

The cascade neural network 104 may include two or more neural networks. Each neural network may classify candidate regions from the image to be detected.

The cascade neural network 104 may be implemented as a two-stage model with a tree-shaped architecture (tree arch). The operation of the two-stage model is described in detail with reference to FIG. 9.

When the cascade neural network 104 consists of two neural networks, they may be distinguished as a first neural network and a second neural network. The second neural network may further classify the classification result of the first neural network.

In this case, at least one of the first neural network and the second neural network may include a plurality of parallel sub-neural networks.

For example, when the second neural network includes a plurality of parallel sub-neural networks and the target is a human face, the second neural network may include a sub-neural network corresponding to the front face posture, a sub-neural network corresponding to the side face posture, sub-neural networks corresponding to different skin colors, and a sub-neural network based on backlight.

The plurality of sub-neural networks may be independent of one another and arranged in parallel. That is, the plurality of sub-neural networks may be used at the same time or at different times.

The first classifier 1021 may classify a plurality of regions from the image to be detected using the first neural network. The second classifier 1022 may classify the plurality of regions into a plurality of target candidate regions and a plurality of non-target candidate regions using the second neural network, which includes a plurality of parallel sub-neural networks.
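The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment itself: networks are modeled as plain callables returning a score, the thresholds are arbitrary, and the rule that a region becomes a target candidate when any parallel sub-network scores it at or above the threshold is an assumption, because the text does not specify how sub-network outputs are combined.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")


def cascade_classify(
    regions: Sequence[Region],
    first_net: Callable[[Region], float],
    sub_nets: Sequence[Callable[[Region], float]],
    first_threshold: float = 0.5,
    second_threshold: float = 0.5,
) -> Tuple[List[Region], List[Region]]:
    """Classify regions with a first network, then parallel sub-networks.

    Stage 1 keeps regions that first_net scores at or above first_threshold.
    Stage 2 scores each surviving region with every sub-network and uses the
    highest score (assumed rule) to split the survivors into target
    candidates and non-target candidates.
    """
    if not sub_nets:
        raise ValueError("at least one sub-network is required")
    survivors = [r for r in regions if first_net(r) >= first_threshold]
    targets: List[Region] = []
    non_targets: List[Region] = []
    for region in survivors:
        best_score = max(net(region) for net in sub_nets)
        if best_score >= second_threshold:
            targets.append(region)
        else:
            non_targets.append(region)
    return targets, non_targets
```

A toy usage with string labels standing in for image regions and hand-written scorers standing in for trained networks:

```python
regions = ["background", "frontal", "profile", "blurry_background"]
first_net = lambda r: 0.1 if r == "background" else 0.9
frontal_net = lambda r: 0.95 if r == "frontal" else 0.2
profile_net = lambda r: 0.9 if r == "profile" else 0.1
targets, non_targets = cascade_classify(regions, first_net,
                                        [frontal_net, profile_net])
```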

The cascade neural network 104 may also include three or four neural networks. At least one of the plurality of neural networks may include a plurality of parallel sub-neural networks.

As one example, the first neural network may consist of a single sub-neural network, the second neural network may include a plurality of parallel sub-neural networks, the third neural network may consist of a single sub-neural network, and the fourth neural network may include a plurality of parallel sub-neural networks.

In this case, at least one of the parallel sub-neural networks included in the second neural network may further classify the classification result of the first neural network.

As another example, when the cascade neural network 104 includes three neural networks, the first neural network may consist of a single sub-neural network, the second neural network may include a plurality of parallel sub-neural networks, and the third neural network may include a plurality of parallel sub-neural networks.

In this case, at least one of the parallel sub-neural networks included in the second neural network may classify the classification result of the first neural network, and at least one of the parallel sub-neural networks included in the third neural network may classify the classification result of the second neural network.

Finally, the candidate regions may be determined by obtaining a plurality of target candidate regions and a plurality of non-target candidate regions through the classification performed by the last neural network.
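The staged classification described above can be sketched as follows. This is only an illustrative sketch: the name `run_cascade`, the representation of each sub-neural network as a Boolean callable, and in particular the rule that a region survives a stage when any one of its parallel sub-neural networks accepts it are assumptions made here, since the description does not fix how the outputs of parallel sub-neural networks are combined.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")


def run_cascade(
    regions: Sequence[Region],
    stages: Sequence[Sequence[Callable[[Region], bool]]],
) -> Tuple[List[Region], List[Region]]:
    """Pass regions through successive stages of a cascade.

    Each stage is a list of parallel sub-networks, modeled as callables that
    return True when they accept a region as a target candidate.
    ASSUMPTION: a region survives a stage if ANY sub-network accepts it.
    """
    candidates = list(regions)
    rejected: List[Region] = []
    for stage in stages:
        kept: List[Region] = []
        for region in candidates:
            if any(sub_network(region) for sub_network in stage):
                kept.append(region)
            else:
                rejected.append(region)
        # Only regions accepted by this stage are passed to the next one.
        candidates = kept
    return candidates, rejected


# Toy usage: integers stand in for image regions.
stage1 = [lambda r: r > 0]                           # single sub-network
stage2 = [lambda r: r % 2 == 0, lambda r: r % 3 == 0]  # two parallel sub-networks
targets, non_targets = run_cascade([-1, 1, 2, 3, 4, 5, 6], [stage1, stage2])
```

The regions returned first play the role of the target candidate regions, and the second list plays the role of the non-target candidate regions.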

Hereinafter, the operation of training the cascade neural network is described with reference to FIGS. 3A, 3B, and 3C.

FIG. 3A is a schematic block diagram of a training device according to an embodiment, and FIG. 3B is a flowchart of an operation of training the first neural network included in the cascade neural network shown in FIG. 1.

Referring to FIG. 3A, the training device 20 may include a trainer 300 and a cascade neural network.

The trainer 300 may pre-train the first neural network based on a plurality of positive samples and a plurality of negative samples (S301).

A positive sample may refer to an image region, within a sample set of known image regions, in which the area of the target region reaches a set threshold value. A negative sample may refer to an image region in which the area of the target region does not reach the set threshold value.

For example, the trainer 300 may determine an image region to be a positive sample when the area of the target region reaches 30% of the image region to which the target region belongs, and may determine the image region to be a negative sample when the area of the target region does not reach 30% of that image region.
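The 30% rule in the example above can be illustrated with a short sketch. The function name `label_region` and its arguments are hypothetical, and treating a ratio of exactly 30% as "reaching" the threshold is an interpretation of the wording rather than something the description spells out.

```python
def label_region(target_area: float, region_area: float, threshold: float = 0.30) -> str:
    """Label an image region as a 'positive' or 'negative' sample.

    target_area: area of the target (e.g. face) inside the image region.
    region_area: total area of the image region.
    threshold:   minimum fraction of the region the target must cover
                 (0.30 follows the 30% example in the description).
    """
    if region_area <= 0:
        raise ValueError("region_area must be positive")
    ratio = target_area / region_area
    # ASSUMPTION: "reaches the threshold" is read as ratio >= threshold.
    return "positive" if ratio >= threshold else "negative"
```

For instance, a 100-pixel region containing 45 target pixels would be labeled positive, while one containing 10 target pixels would be labeled negative.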

Taking a human face as the target, the trainer 300 may determine, as positive samples, image regions having target attributes that cover a plurality of poses (frontal face, profile face, rotated face, and the like), a plurality of skin colors, and a plurality of external lighting conditions. The trainer 300 may determine, as negative samples, image regions of various background images that do not contain a face, as well as other image regions.

The trainer 300 may create the first neural network and initialize its parameters.

The trainer 300 may pre-train the created first neural network based on the set of positive samples and a certain number of negative samples randomly drawn from the set of negative samples. Through pre-training, the trainer 300 may determine the network parameters of the first neural network. The trainer 300 may train the first neural network using the backpropagation algorithm.

After pre-training, the trainer 300 may repeatedly fine-tune the first neural network (S302). The trainer 300 may repeat the fine-tuning until the detection rate for the target decreases or the error rate for the target increases.

The trainer 300 may select the first neural network obtained by the second-to-last fine-tuning iteration, that is, the network as it stood before the iteration that degraded performance, as the finally trained neural network.

Here, the condition that the detection rate for the target decreases or the error rate for the target increases covers three cases: the detection rate decreases and the error rate decreases; the detection rate increases and the error rate increases; and the detection rate decreases and the error rate increases.
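The stopping rule and the choice of the second-to-last network can be sketched as a simple loop. The callables `fine_tune_once` and `evaluate`, the `max_rounds` safeguard, and the decision to keep iterating when both metrics are unchanged are all assumptions introduced for illustration; the description only specifies when fine-tuning continues (detection rate up and error rate down) and when it stops (detection rate down or error rate up).

```python
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def fine_tune_until_degraded(
    model: Model,
    fine_tune_once: Callable[[Model], Model],         # hypothetical: one fine-tuning pass
    evaluate: Callable[[Model], Tuple[float, float]],  # hypothetical: (detection_rate, error_rate)
    max_rounds: int = 100,                             # safeguard, not from the description
) -> Model:
    """Fine-tune repeatedly and return the network from before the degrading pass."""
    best = model
    best_detection, best_error = evaluate(best)
    for _ in range(max_rounds):
        candidate = fine_tune_once(best)
        detection, error = evaluate(candidate)
        # Stop when the detection rate drops OR the error rate rises.
        if detection < best_detection or error > best_error:
            return best  # the second-to-last fine-tuned network
        best, best_detection, best_error = candidate, detection, error
    return best


# Toy usage: integers stand in for successive versions of the network.
metrics = {0: (0.80, 0.10), 1: (0.85, 0.08), 2: (0.88, 0.07), 3: (0.87, 0.06)}
chosen = fine_tune_until_degraded(0, lambda m: m + 1, lambda m: metrics[m])
```

In the toy run, version 3 lowers the detection rate, so version 2 is returned even though its error rate is higher than that of version 3.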

FIG. 3C is a flowchart of the fine-tuning operation shown in FIG. 3B.

Referring to FIG. 3C, the trainer 300 may classify a plurality of negative samples based on the first neural network and determine the negative samples misclassified as positive samples (S3021).

Here, the first neural network used by the trainer 300 may be either the pre-trained first neural network or a fine-tuned first neural network.

Based on the first neural network, the trainer 300 may classify each negative sample in the negative sample set into one of two types, positive sample or negative sample, and determine which negative samples were misclassified as positive samples.

For example, the trainer 300 may classify every negative sample in the negative sample set as either a positive sample or a negative sample, and then determine the negative samples that were misclassified as positive samples.

The trainer 300 may train the first neural network based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples (S3022).

The trainer 300 may mix the negative samples misclassified as positive samples with a certain number of negative samples randomly drawn from the negative sample set.

The trainer 300 may train the first neural network based on the mixed negative samples and the plurality of positive samples, thereby obtaining its network parameters. Here, the trainer 300 may use the backpropagation algorithm to determine the first neural network.
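The way the negative training set is assembled in steps S3021 and S3022 can be sketched as below. The function name, the `predicts_positive` callable standing in for the current first neural network, and the parameter `num_random` are hypothetical; the description does not say how many negatives are drawn or whether duplicates are removed, so this sketch leaves duplicates in place.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def build_negative_training_set(
    negatives: Sequence[Sample],
    predicts_positive: Callable[[Sample], bool],  # hypothetical stand-in for the current network
    num_random: int,                               # hypothetical: how many negatives to draw
    rng: random.Random,
) -> List[Sample]:
    """Mix misclassified negatives with randomly drawn negatives."""
    # Negatives that the current network wrongly labels as positive.
    misclassified = [s for s in negatives if predicts_positive(s)]
    # A fixed number of negatives drawn at random (without replacement).
    k = min(num_random, len(negatives))
    randomly_drawn = rng.sample(list(negatives), k)
    # NOTE: a misclassified negative may also be drawn at random; it then appears twice.
    return misclassified + randomly_drawn


# Toy usage: integers stand in for negative image regions.
pool = list(range(10))
mixed = build_negative_training_set(pool, lambda s: s >= 8, 3, random.Random(0))
```

The resulting list would then be combined with the positive samples for the next backpropagation pass.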

The trainer 300 may classify a preset test sample set based on the trained first neural network (S3023). The test sample set may refer to a set of samples whose classification results are already known.

The trainer 300 may classify the plurality of samples included in the preset test sample set into positive samples and negative samples. For example, when the target is a human face, the trainer 300 may classify, based on the first neural network, the samples of the FDDB (Face Detection Data Set and Benchmark) set into face regions and non-face regions. In this case, the preset test sample set corresponds to FDDB, the face regions correspond to positive samples, and the non-face regions correspond to negative samples.

The trainer 300 may fine-tune the first neural network again when the detection rate for the target increases and the error rate for the target decreases, and may terminate the fine-tuning when the detection rate for the target decreases or the error rate for the target increases (S3024).

Here, the trainer 300 may determine the detection rate and the error rate of the first neural network for the target by comparing the classification results obtained after each fine-tuning iteration with the known classification results of the samples in the test sample set.

The error rate may refer to the ratio of misclassified samples, that is, negative samples misclassified as positive plus positive samples misclassified as negative, to all samples. The detection rate may refer to the ratio of the number of positive samples detected in the sample set to the total number of positive samples in the sample set.
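The two metrics defined above can be computed as in the following sketch. The function name and the encoding of labels as Booleans (True for positive) are assumptions for illustration, as is returning a detection rate of 0.0 when the sample set contains no positive samples.

```python
from typing import Sequence, Tuple


def detection_and_error_rate(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (detection_rate, error_rate) for Boolean labels (True = positive)."""
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("truth and predicted must be non-empty and of equal length")
    total_positives = sum(1 for t in truth if t)
    # Positive samples that were correctly detected as positive.
    detected = sum(1 for t, p in zip(truth, predicted) if t and p)
    # Misclassified samples in either direction.
    wrong = sum(1 for t, p in zip(truth, predicted) if t != p)
    # ASSUMPTION: with no positives in the set, report a detection rate of 0.0.
    detection_rate = detected / total_positives if total_positives else 0.0
    error_rate = wrong / len(truth)
    return detection_rate, error_rate


# Three positives (two detected), two negatives (one misclassified as positive).
rates = detection_and_error_rate(
    [True, True, True, False, False],
    [True, True, False, True, False],
)
```

Here two of the three positives are detected and two of the five samples are misclassified.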

The trainer 300 may compare the detection rate and the error rate of the first neural network for the target after fine-tuning with the detection rate and the error rate before fine-tuning. When the comparison shows that the detection rate has increased and the error rate has decreased after fine-tuning, the performance of the first neural network may still be improvable, so the trainer 300 may perform fine-tuning again.

When the comparison shows that, after fine-tuning, the detection rate of the first neural network for the target has decreased and the error rate has increased, the trainer 300 may determine that the performance of the first neural network has reached its peak and terminate the fine-tuning.

FIG. 4 illustrates an example of the structure of the first neural network shown in FIG. 3B.

Referring to FIG. 4, the first neural network may include a plurality of layers. From left to right, the first neural network may include an input layer, five hidden layers, and an output layer. From left to right, the five hidden layers may be a first convolution layer (or first filter layer), a first pooling layer, a second convolution layer, a second pooling layer, and a fully connected layer.

The input layer may be represented as a 12×12 neuron matrix with a height of 12 and a depth of 12. The input image may correspond to a 12×12 pixel matrix.

The first convolution layer may be represented as a cuboid with a height of 10, a depth of 10, and a width of 32, and the trainer 300 may convolve the input image into 32 feature maps. Each feature map may include 10×10 pixels.

The convolution stride (step size) between the input layer and the first convolution layer may denote the convolution stride of the first convolution layer. The first convolution layer may include 32 first convolution kernels (or filters), each kernel corresponding to one feature map, and each convolution kernel may include a 5×5 neuron matrix.

Each convolution kernel may act as a scanning template whose unit is a 5×5 neuron matrix, and may scan the pixels corresponding to the neurons of the input layer at intervals of the convolution stride (in FIG. 4 the convolution stride is 1, meaning one pixel).

During scanning, each convolution kernel may convolve the plurality of 5×5 pixel regions of the input layer that are spaced at the convolution stride. The trainer 300 may map the pixels corresponding to each such 5×5 neuron region of the input layer to one pixel of a feature map in the first convolution result.

The first pooling layer may be represented as a cuboid with a height of 5, a depth of 5, and a width of 32. Passing the 32 feature maps of the first convolution result through the first pooling layer may yield 32 feature maps, each of which may include 5×5 pixels.

The pooling stride (step size) between the first convolution layer and the first pooling layer may denote the pooling stride of the first pooling layer. The first pooling layer may include 32 first pooling kernels and may have 32 feature maps. Each pooling kernel may include a 3×3 neuron matrix.

The trainer 300 may use each pooling kernel as a scanning template whose unit is a 3×3 neuron matrix, and may scan the feature maps of the first convolution layer at intervals of the pooling stride (a pooling stride of 1 means one pixel).

During scanning, the trainer 300 may pool the plurality of 3×3 pixel regions of the first convolution feature maps that are spaced at the pooling stride, and obtain feature maps as the pooling result.

The trainer 300 may map each of the plurality of 3×3 pixel regions of the feature maps of the first convolution layer, taken at intervals of the pooling stride, to the feature maps of the first pooling layer.

The second convolution layer may be represented as a cuboid with a height of 4, a depth of 4, and a width of 32, and the trainer 300 may obtain the 32 feature maps of the second convolution layer by passing the 32 feature maps of the first pooling layer through the second convolution layer. Each feature map may include 4×4 pixels.

The second pooling layer may be represented as a cuboid with a height of 2, a depth of 2, and a width of 32, and the trainer 300 may obtain the 32 feature maps of the second pooling layer by pooling the 32 feature maps of the second convolution layer. Each feature map may include 2×2 pixels.

The second convolution layer and the second pooling layer may operate in the same manner as the first convolution layer and the first pooling layer, respectively.

The fully connected layer may include 32 neurons. Each neuron of the fully connected layer may be independently connected to each neuron of the second pooling layer.

The output layer may include two neurons. Each neuron of the output layer may be independently connected to each neuron of the fully connected layer.

FIG. 5A is a flowchart of an operation of training the second neural network included in the cascade neural network shown in FIG. 1.

Referring to FIG. 5A, the trainer 300 may determine, using the second neural network, the negative samples that were misclassified as positive samples by the first neural network (S501).

The trainer 300 may pre-train the second neural network, which includes a plurality of parallel sub-neural networks, based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples (S502).

The trainer 300 may create the plurality of sub-neural networks included in the second neural network and randomly initialize their parameters. The trainer 300 may mix the negative samples misclassified as positive samples with a certain number of negative samples randomly drawn from the negative sample set.

Based on the plurality of positive samples and the mixed negative samples, the trainer 300 may train the plurality of parallel sub-neural networks included in the second neural network to obtain the network parameters of the second neural network, thereby determining the pre-trained parallel sub-neural networks of the second neural network.

The trainer 300 may perform the pre-training using the backpropagation algorithm.

The trainer 300 may repeatedly fine-tune the plurality of parallel sub-neural networks included in the pre-trained second neural network until the detection rate for the target decreases or the error rate for the target increases (S503).

When the detection rate of the parallel sub-neural networks included in the second neural network decreases or their error rate increases, the trainer 300 may select the sub-neural networks obtained by the second-to-last fine-tuning iteration as the finally trained neural network.

FIG. 5B is a flowchart of the fine-tuning operation shown in FIG. 5A.

Referring to FIG. 5B, the trainer 300 may classify a plurality of negative samples based on the plurality of parallel sub-neural networks of the second neural network and determine the negative samples misclassified as positive samples (S5021).

The plurality of parallel sub-neural networks of the second neural network may be pre-trained sub-neural networks or fine-tuned sub-neural networks.

Based on the plurality of parallel sub-neural networks of the second neural network, the trainer 300 may classify the negative samples of the negative sample set into positive samples and negative samples, and determine which negative samples were misclassified as positive samples.

The trainer 300 may train the plurality of parallel sub-neural networks of the second neural network based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples (S5022).

The trainer 300 may mix the negative samples misclassified as positive samples with a certain number of negative samples randomly drawn from the negative sample set.

The trainer 300 may train the sub-neural networks based on the plurality of positive samples and the mixed negative samples to obtain their network parameters, thereby determining the plurality of parallel sub-neural networks of the second neural network.

The trainer 300 may train the plurality of parallel sub-neural networks of the second neural network using the backpropagation algorithm.

The trainer 300 may classify a preset test sample set based on the plurality of parallel sub-neural networks of the trained second neural network (S5023).

Based on the plurality of parallel sub-neural networks of the second neural network, the trainer 300 may classify the samples of the preset test sample set into a plurality of positive samples and a plurality of negative samples.

For example, the trainer 300 may classify the samples of FDDB into face regions and non-face regions.

The trainer 300 may fine-tune the plurality of parallel sub-neural networks of the second neural network again when the detection rate for the target increases and the error rate for the target decreases, and may terminate the fine-tuning when the detection rate decreases or the error rate increases (S5024).

The trainer 300 may determine the detection rate and the error rate of the parallel sub-neural networks of the second neural network for the target by comparing the classification results obtained after each fine-tuning iteration with the known classification results of the test sample set.

The trainer 300 may compare the detection rate and the error rate after fine-tuning with the detection rate and the error rate before fine-tuning.

When the comparison shows that, after fine-tuning, the detection rate of the parallel sub-neural networks of the second neural network for the target has increased and the error rate for the target has decreased, the trainer 300 may determine that the performance of the parallel sub-neural networks may still be improvable and repeat the fine-tuning.

When the comparison shows that the detection rate of the parallel sub-neural networks of the second neural network for the target has decreased and the error rate for the target has increased, the trainer 300 may determine that the performance of the parallel sub-neural networks has reached its peak and terminate the fine-tuning.

FIG. 6 illustrates an example of the structure of a sub-neural network included in the second neural network shown in FIG. 5A.

Referring to FIG. 6, the sub-neural network may include, from left to right, an input layer, seven hidden layers, and an output layer. From left to right, the hidden layers may be a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer, a third pooling layer, and a fully connected layer.

The input layer may be represented as a 48×48 neuron matrix with a height of 48 and a depth of 48. The input image may correspond to a 48×48 pixel matrix.

The first convolution layer may be represented as a cuboid with a height of 44, a depth of 44, and a width of 32, and the trainer 300 may convolve the input image into the 32 feature maps of the first convolution layer. Each feature map may include 44×44 pixels.

The convolution stride (step size) between the input layer and the first convolution layer may denote the first convolution stride. The first convolution layer may include 32 first convolution kernels (or filters), and the 32 first convolution kernels may correspond to the 32 feature maps. Each first convolution kernel may include a 5×5 neuron matrix.

The trainer 300 may use each first convolution kernel as a scanning template whose unit is a 5×5 neuron matrix, and may scan the pixels corresponding to the neurons of the input layer at intervals of the convolution stride (a convolution stride of 2 means two pixels).

During scanning, the trainer 300 may use each first convolution kernel to convolve the plurality of 5×5 pixel regions of the input layer that are spaced at the convolution stride, and thereby obtain a feature map.

The trainer 300 may map the pixels corresponding to the plurality of 5×5 neuron regions of the input layer, spaced at the convolution stride, to the plurality of pixels of a feature map of the first convolution layer.

The first pooling layer may be represented as a cuboid with a height of 22, a depth of 22, and a width of 32, and the trainer 300 may obtain the 32 feature maps of the first pooling layer by pooling the 32 feature maps of the first convolution layer. Each feature map may include 22×22 pixels.

The stride between the first convolution layer and the first pooling layer may denote the pooling stride of the first pooling layer. The first pooling layer may include 32 first pooling kernels, and the 32 pooling kernels may correspond to 32 feature maps. Each pooling kernel may include a 3×3 neuron matrix.

The trainer 300 may use each first pooling kernel as a scanning template whose unit is a 3×3 neuron matrix, and may scan the pixels of the feature maps of the first convolution layer at intervals of the pooling stride (a pooling stride of 2 means two pixels).

The trainer 300 may use each first pooling kernel to pool the plurality of 3×3 pixel regions of a feature map of the first convolution layer that are spaced at the pooling stride, and thereby obtain a feature map of the first pooling layer.

The trainer 300 may pool the plurality of 3×3 pixel regions of a feature map of the first convolution layer at intervals of the pooling stride and map them to the plurality of pixels of a feature map of the first pooling layer.

The second convolution layer may be represented as a cuboid with a height of 18, a depth of 18, and a width of 32, and the trainer 300 may obtain the 32 feature maps of the second convolution layer by convolving the 32 feature maps of the first pooling layer. Each feature map of the second convolution layer may include 18×18 pixels.

The second pooling layer may be represented as a cuboid with a height of 9, a depth of 9, and a width of 64, and the trainer 300 may obtain the 64 feature maps of the second pooling layer by pooling the 32 feature maps of the second convolution layer. Each feature map of the second pooling layer may include 9×9 pixels.

The third convolution layer may be represented as a cuboid with a height of 7, a depth of 7, and a width of 64, and the trainer 300 may obtain the 64 feature maps of the third convolution layer by convolving the 64 feature maps of the second pooling layer. Each feature map of the third convolution layer may include 7×7 pixels.

The third pooling layer may be represented as a rectangular parallelepiped with a height of 3, a depth of 3, and a width of 64. The learner 300 may pool the 64 feature maps of the third convolution layer to obtain 64 feature maps of the third pooling layer. Each feature map of the third pooling layer may include 3×3 pixel points.

The second and third convolution layers may operate in the same manner as the first convolution layer, and the second and third pooling layers may operate in the same manner as the first pooling layer.

The fully connected layer may include 64×64 neurons. Each neuron of the fully connected layer may be independently connected to the neurons of the third pooling layer.

The output layer may include two neurons. Each neuron of the output layer may be independently connected to the neurons of the fully connected layer.
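The feature-map sizes quoted above (18×18, 9×9, 7×7, 3×3) can be checked with simple size arithmetic. The following is only an illustrative sketch: the pooling stride of 2, the unpadded stride-1 convolution, and the rounding-up of partial pooling windows are assumptions that this passage does not state, and the function names are hypothetical.

```python
import math

def conv_out(size, kernel, stride=1):
    """Side length after an unpadded convolution (assumption: no padding)."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Side length after pooling that rounds partial windows up (assumption)."""
    return math.ceil((size - kernel) / stride) + 1

# Sizes quoted in the description: 18x18 -> 9x9 -> 7x7 -> 3x3.
second_pool = pool_out(18, kernel=3, stride=2)         # second pooling layer
third_conv = conv_out(second_pool, kernel=3)           # third convolution layer
third_pool = pool_out(third_conv, kernel=3, stride=2)  # third pooling layer
print(second_pool, third_conv, third_pool)
```

Under these assumptions the arithmetic reproduces the 9, 7, and 3 side lengths given for the second pooling, third convolution, and third pooling layers.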

FIG. 7 is a flowchart of an operation in which the target detection apparatus shown in FIG. 1 generates an image pyramid and detects a target, and FIG. 8 is an example of the image pyramid referred to in FIG. 7.

Referring to FIGS. 7 and 8, the image acquirer 101 may generate an image pyramid based on the image to be detected (S710).

The image acquirer 101 may acquire the image to be detected. The image may be a standalone single image or a frame of a video.

The image acquirer 101 may progressively reduce the image to be detected, based on a preset scaling factor, until it reaches the template size of the target detection apparatus 10. The template of the target detection apparatus 10 may refer to the unit detection region through which the input layer of the first convolutional neural network acquires an image. Each neuron of the template may correspond to one pixel of the image. When the template and the image to be detected are rectangular, their sizes may be expressed as a length and a width.

The image acquirer 101 may determine the reduction factor based on experimental data, historical data, empirical data, and/or actual conditions. For example, the image acquirer 101 may set the reduction factor to 1.2 and shrink the image to be detected by a factor of 1.2 at each step until it reaches the template size of the target detection apparatus 10.

The image acquirer 101 may generate the image pyramid of the image to be detected by stacking the image to be detected and the progressively reduced images from bottom to top in order of decreasing size. The lowest layer of the image pyramid may be the original image to be detected, and the other layers may be the progressively reduced versions of that image.
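The pyramid construction described above can be sketched by computing only the layer sizes. This is a minimal illustration rather than the disclosed implementation: the function name, the square template, the example template size of 48, and the exact stopping rule (stop once either side would fall below the template) are assumptions. Only the factor 1.2 comes from the text.

```python
def pyramid_sizes(width, height, template, scale=1.2):
    """Return the (width, height) of each pyramid layer, largest first.

    Layer 0 is the original image. Each further layer is `scale` times
    smaller, and shrinking stops once a side would fall below `template`.
    """
    sizes = []
    w, h = float(width), float(height)
    while w >= template and h >= template:
        sizes.append((round(w), round(h)))
        w /= scale
        h /= scale
    return sizes

# Hypothetical 100x80 image and a 48-pixel template.
print(pyramid_sizes(100, 80, template=48))
```

An actual implementation would also resample the pixel data at each layer; the sketch records only the geometry.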

In FIG. 8, the original image scale indicates the size of the original image acquired by the image acquirer 101, and the detector template scale indicates the template size of the target detection apparatus 10. The template size of the target detection apparatus 10 may be equal to the template size of the input layer of the first neural network.

The first classifier 1021 may classify the plurality of candidate regions included in each layer image of the image pyramid based on the first neural network (S702).

The first classifier 1021 may scan each layer image of the image pyramid of the image to be detected by sliding the template of the input layer of the first neural network across it.

Each time the template slides, the first classifier 1021 may acquire one image region of the layer image (the image region within the extent of the template) through the template. An image region of a layer image acquired through the template is defined as a candidate region. The first classifier 1021 may record the correspondence between the candidate regions and the layer images to which they belong.
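The sliding scan and the recorded correspondence between candidate regions and layer images can be sketched as follows. The function name, the square template, and the `stride` parameter are hypothetical; the text does not state how far the template moves at each slide.

```python
def candidate_regions(layer_sizes, template, stride):
    """Slide a square template over every pyramid layer.

    layer_sizes: list of (width, height); the list index is the layer level.
    Returns (level, x, y) records, so each candidate region stays
    associated with the layer image it was taken from.
    """
    regions = []
    for level, (width, height) in enumerate(layer_sizes):
        for y in range(0, height - template + 1, stride):
            for x in range(0, width - template + 1, stride):
                regions.append((level, x, y))
    return regions

# Two tiny hypothetical layers, a 4-pixel template, and a stride of 2.
print(candidate_regions([(6, 4), (4, 4)], template=4, stride=2))
```

Keeping the level in every record is what later allows a region to be mapped back to the original image.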

The neurons of the input layer template and the pixel points of the image region may correspond one to one. The shape of the input layer template and the shape of the candidate region may match exactly. When the template is a matrix of neurons, the corresponding candidate region may be a matrix of pixel points.

The first classifier 1021 may classify each candidate region through the neural network and, at the output layer, classify each candidate region as a target candidate region or a non-target candidate region. A target candidate region is a candidate region that contains the target, and a non-target candidate region is one that does not. Finally, the first classifier 1021 may obtain the classification results for the candidate regions in every layer image of the image pyramid. That is, the classification by the first neural network yields a plurality of target candidate regions and a plurality of non-target candidate regions.

The second classifier 1022 may classify the regions classified by the first neural network into a plurality of target candidate regions and a plurality of non-target candidate regions based on the plurality of parallel sub-neural networks of the second neural network (S703).

The parallel sub-neural networks included in the second neural network may use the classification result of the first neural network as their input. For example, the target candidate regions and non-target candidate regions classified by the first neural network may be used as input to the parallel sub-neural networks of the second neural network.

As described above, the parallel sub-neural networks may correspond to different target attributes.

Because the parallel sub-neural networks are independent of one another and operate in parallel, the second classifier 1022 may use each of them to receive input independently and to output a classification result independently.

The template size of the input layer of the first neural network may match the input layer template size of the parallel sub-neural networks of the second neural network.

The second classifier 1022 may use the parallel sub-neural networks of the second neural network in a non-simultaneous manner, such as interleaved, serial, or random order, to classify the target candidate regions and non-target candidate regions classified by the first neural network.

The second classifier 1022 may output a classification result at the output layer of each sub-neural network and thereby obtain the classification of target candidate regions and non-target candidate regions.

The second classifier 1022 may send a selection command to the parallel sub-neural networks included in the second neural network, and each sub-neural network that receives the selection command may classify the classification result of the first neural network.

By selectively operating the sub-neural networks through the selection command, the second classifier 1022 can flexibly meet a variety of user requirements. Compared with running all of the sub-neural networks for classification, this allows the target detection apparatus 10 to save computing resources. The target detection apparatus 10 can therefore be readily applied to equipment with low hardware specifications or low computing performance.
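The selection mechanism can be illustrated with a small sketch in which each sub-neural network is stood in for by a plain callable. Everything here is hypothetical: the attribute names, the toy score thresholds, and in particular the rule that a region is kept when any selected sub-network accepts it, which this passage does not specify.

```python
def classify_with_selected(region, sub_networks, selected):
    """Run only the selected sub-networks on one candidate region.

    sub_networks: dict mapping an attribute name to a callable that
    returns True when it judges the region to contain the target.
    selected: names of the sub-networks chosen by the selection command.
    Assumption: the region is kept if any selected sub-network accepts it.
    """
    return any(sub_networks[name](region) for name in selected)

# Toy stand-ins: a "region" is just a score, and each sub-network is a threshold.
toy_networks = {
    "frontal": lambda score: score > 0.5,
    "profile": lambda score: score > 0.8,
}
print(classify_with_selected(0.6, toy_networks, ["frontal"]))
print(classify_with_selected(0.6, toy_networks, ["profile"]))
```

Only the callables named in `selected` are invoked, which is where the saving in computation would come from.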

The target region determiner 103 may normalize the sizes and positions of the target candidate regions based on the layer images of the image pyramid to which each target candidate region belongs and on the size and position differences between those layer images, and may merge the normalized target candidate regions to obtain the target region.

Based on the recorded correspondence between candidate regions and the layer images to which they belong, the target region determiner 103 may determine the layer image to which each target candidate region classified by the parallel sub-neural networks of the second neural network belongs.

The target region determiner 103 may normalize the size and position of a target candidate region by determining its position and size from the size and position differences between the layer image to which it belongs and the other layer images of the image pyramid.

For example, the target region determiner 103 may normalize the size and position of each target candidate region by determining the position and size it has in the image to be detected (the bottom layer image of the image pyramid), based on the position and size differences between the layer image to which the region belongs and the image to be detected.
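The normalization to the coordinates of the image to be detected can be sketched as a rescaling by the accumulated reduction factor. This assumes, hypothetically, that level 0 is the original image, that every level is smaller than the one below it by the same factor (1.2 in the earlier example), and that a region is stored as (x, y, width, height). The function name is illustrative.

```python
def to_original_image(region, level, scale=1.2):
    """Map a region (x, y, w, h) found at pyramid `level` back to the
    coordinates of the original image (level 0).

    Assumption: every level is `scale` times smaller than the one below it.
    """
    factor = scale ** level
    return tuple(value * factor for value in region)

# A hypothetical 48x48 region found two levels above the original image.
print(to_original_image((10, 20, 48, 48), level=2))
```

After this step all regions share one coordinate system, so their overlap can be compared directly.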

The target region determiner 103 may merge the normalized target candidate regions to obtain the target region (S705).

For any two normalized target candidate regions, the target region determiner 103 may perform a first merge of the two regions if the layer difference between the layer images to which they belong is not greater than a preset layer difference value, or if the region overlap ratio of the two regions is greater than a preset first region overlap ratio value.

The target region determiner 103 may repeat the first merge until all of the normalized target candidate regions have been merged.

When x and y are target candidate regions obtained by normalization, the target region determiner 103 may determine the intersection and the union of x and y and compute their region overlap ratio (for example, the area of the intersection divided by the area of the union).
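The ratio of intersection area to union area can be computed as in the sketch below. The (x, y, width, height) representation of a region and the function name are assumptions made for illustration.

```python
def region_overlap_ratio(a, b):
    """Intersection area divided by union area for boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
print(region_overlap_ratio((0, 0, 2, 2), (1, 1, 2, 2)))
```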

For example, when the preset first region overlap ratio value is 0.3, the target region determiner 103 may compare the region overlap ratio it computed with 0.3.

The target region determiner 103 may also determine the levels of the layers to which x and y belong in the image pyramid, compute the difference between the levels to obtain the layer difference, and compare it with the preset layer difference value of 4.

When the region overlap ratio of x and y is greater than 0.3 and the layer difference is not greater than 4, the target region determiner 103 may judge that target candidate regions x and y overlap and perform the first merge of x and y.

Target candidate regions with a small layer difference are more likely to overlap (duplicate) each other than those with a large layer difference, and target candidate regions with a large region overlap ratio are more likely to overlap than those with a small one, so such regions are more likely candidates for merging.

A layer image in an upper layer of the image pyramid is obtained by reducing the layer image of a lower layer, and the lower layer image may contain all or some of the pixel points of the upper layer image. Accordingly, target candidate regions with a small layer difference may share many pixel points, as may target candidate regions with a large region overlap ratio.

By first-merging heavily overlapping target candidate regions, the target region determiner 103 reduces the number of target candidate regions, which simplifies subsequent processing, and may also help limit the loss of image features in the merged regions relative to the regions before merging. First-merging heavily overlapping regions may also improve detection efficiency.

The target region determiner 103 may perform the first merge by summing and averaging the positions and the sizes of the two target candidate regions. For example, for target candidate regions x and y, the target region determiner 103 may sum and average the position coordinates of x and y and sum and average their lengths and widths. It may then replace target candidate region x with the averaged region and remove y from the set of target candidate regions.

The target region determiner 103 may thus perform the first merge of two overlapping target candidate regions by averaging them. Because the merge combines the size and position information of the regions before merging, the merged region may comprehensively cover the pixel points (image features) of both target candidate regions. The target region determiner 103 can therefore reduce the number of target candidate regions while retaining the image features of the overlapping regions.
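The first merge can be sketched as a merge test plus an averaging step. The sketch follows the worked example, which requires both conditions (overlap ratio above 0.3 and layer difference of at most 4), whereas the general statement lists the two conditions as alternatives. The function names and the (x, y, width, height) representation are hypothetical, and the overlap ratio is assumed to have been computed beforehand.

```python
def should_first_merge(overlap, level_x, level_y,
                       min_overlap=0.3, max_level_diff=4):
    """True when two regions qualify for the first merge (both conditions)."""
    return overlap > min_overlap and abs(level_x - level_y) <= max_level_diff

def average_regions(x, y):
    """Average position and size of two regions given as (x, y, w, h)."""
    return tuple((a + b) / 2 for a, b in zip(x, y))

# Hypothetical regions from pyramid levels 1 and 3 with an overlap ratio of 0.5.
if should_first_merge(0.5, level_x=1, level_y=3):
    print(average_regions((0, 0, 10, 10), (2, 2, 14, 14)))
```

The averaged region would then take the place of x, and y would be dropped from the candidate set.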

The target region determiner 103 may perform a second merge of two first-merged target candidate regions when their region overlap ratio is greater than a preset second region overlap ratio value. In doing so, it may remove whichever of the two first-merged target candidate regions has the smaller area.

The target region determiner 103 may perform merging between first-merged and second-merged target candidate regions, and may perform such merging two or more times. It may continue merging until no region overlap ratio between the remaining target candidate regions is greater than the preset second region overlap ratio.

The target region determiner 103 may determine at least one of the second-merged target candidate regions as the target region.

For example, for two first-merged target candidate regions x and z, the target region determiner 103 may determine the region overlap ratio between x and z and the area of each. Suppose the area of x is larger than the area of z. The target region determiner 103 compares the region overlap ratio between x and z with the preset second region overlap ratio value of 0.4 and, if the ratio is greater than 0.4, removes the smaller region z and retains x as the target candidate region, which completes the second merge.
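The second merge can be sketched as a pass that keeps larger regions and discards smaller ones that overlap a kept region too strongly. Processing regions from largest to smallest is one possible ordering, chosen here for simplicity; the text does not fix the order in which pairs are examined. The function name and the `overlap` parameter (any function returning the region overlap ratio of two (x, y, width, height) boxes) are hypothetical.

```python
def second_merge(regions, overlap, max_overlap=0.4):
    """Drop the smaller region of every pair overlapping by more than max_overlap.

    regions: list of (x, y, w, h) boxes.
    overlap: function(a, b) returning the region overlap ratio of two boxes.
    Regions are visited from largest to smallest area (an assumed ordering).
    """
    kept = []
    for region in sorted(regions, key=lambda r: r[2] * r[3], reverse=True):
        # Keep a region only if it does not overlap any kept region too much.
        if all(overlap(region, k) <= max_overlap for k in kept):
            kept.append(region)
    return kept
```

On return, no two remaining regions overlap by more than `max_overlap`, which matches the stopping condition described for the repeated merging.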

Target candidate regions with a large region overlap ratio share many pixel points, so the image overlap between them may be relatively high. When the region overlap ratio of two target candidate regions is greater than the preset second region overlap ratio value, the region with the larger area has more pixel points and more image features than the smaller one, and is therefore more representative.

Second-merging target candidate regions that share many pixel points reduces the number of target candidate regions, which benefits subsequent processing, and retains most of the image features, which may help raise the detection efficiency of the target detection apparatus 10.

Table 1 compares the performance of the target detection apparatus 10 with that of existing techniques.

| Method | Detection rate | False detections | Average speed | Model size |
|---|---|---|---|---|
| Adaboost | 88% | 500 | 100 msec | 5 MB |
| CascadeCNN | 83% | 500 | 250 msec | 15 MB |
| Other CNN | 96% | 500 | More than 500 msec | 100 to 500 MB |
| Target detection apparatus (10) | 90% | 500 | 200 msec | 1 MB |

Table 1 may show the results of detection on the FDDB face detection data set. The detection rate of the target detection apparatus 10 may be higher than that of the methods based on the Adaboost algorithm and the CascadeCNN algorithm.

Model size refers to the storage space occupied by the model. When the target detection apparatus 10 performs classification with the cascade convolutional neural network, the classification model occupies less than 1 MB, far less than the existing techniques.

In addition, because the target detection apparatus 10 can perform classification even on equipment with relatively low hardware specifications and low computing performance, it can be used widely.

The average detection speed of the target detection apparatus 10 may be significantly higher than that of the CascadeCNN algorithm. According to Table 1, the target detection apparatus 10 saves 50 msec of detection time relative to the CascadeCNN algorithm. The target detection apparatus 10 may have a relatively high detection rate, a fast average detection speed, and the smallest model size.

The data in Table 1 may have been obtained by running the classification models on equipment with high hardware specifications and computing performance, in order to exclude the influence of hardware specifications and computing performance.

When the hardware specifications and computing performance of the equipment are lowered, the existing techniques, because of their large classification models, may suffer slower detection or a lower detection rate, and the system may stutter or halt to the point of losing practical usefulness.

By contrast, for the target detection apparatus 10, the change in classification performance caused by a change in hardware performance may stay within the test error range, with no apparent degradation. Therefore, when detection rate, detection speed, and model size are compared together, the target detection apparatus 10 can provide a detection method optimized for equipment with low hardware performance.

By classifying candidate regions with two or more neural networks and a plurality of parallel sub-neural networks, the target detection apparatus 10 can increase the classification accuracy for the image to be detected and determine an accurate target region.

In addition, using parallel sub-neural networks reduces the number of cascaded neural networks while improving computation speed, and greatly reduces the storage space occupied by the classification model, so the apparatus can be applied to equipment with low hardware and computing performance.

Furthermore, by associating a different target attribute with each parallel sub-neural network, the target detection apparatus 10 can greatly improve the accuracy with which target candidate regions and non-target candidate regions are distinguished, and thereby improve the target detection rate.

With this improved target detection rate, the target detection apparatus 10 can improve computation speed without using a large number of neural networks and can also reduce the storage space occupied by the classification model.

By repeatedly fine-tuning the neural networks, the target detection apparatus 10 can progressively raise their detection rate and progressively lower their error rate, continuing the fine-tuning until it determines the neural network with the highest detection rate and the lowest error rate.

Through such fine-tuning, the target detection apparatus 10 can fully exploit the potential performance of the neural networks and, with only two neural networks, exceed the performance of an existing classification model that uses six neural networks.

By reducing the number of neural networks and simplifying the structure of the classification model, the target detection apparatus 10 reduces the required storage space and can be applied to equipment with low hardware and computing performance.

By merging target candidate regions, the target detection apparatus 10 can improve the target detection rate without losing image features relative to the regions before merging.

FIG. 9 shows an example of an operation of detecting a target using a two-stage model.

Referring to FIG. 9, the target detection apparatus 10 may include a first stage and a second stage. The first stage may be implemented with the first neural network, and the second stage may be implemented with the second neural network.

The target detection apparatus 10 using the two-stage model may combine leaf models to improve performance, and may improve the performance of a shallow deep model.

The first stage may classify a plurality of candidate regions using the first neural network.

The second stage may determine the target region using the second neural network.

The second stage may refine the classification result of the first stage using the plurality of parallel sub-neural networks, and may obtain the target region by merging the target candidate regions.

The second stage may fine-tune the plurality of parallel sub-neural networks simultaneously. In doing so, the second stage may use a plurality of data sets for the fine-tuning.

The target detection apparatus 10 may detect a human face using the two-stage model. Although it uses fewer neural networks than existing face detection methods that use more than two, its face detection may be both more accurate and faster.

The learner 300 may train the first neural network of the first stage and the second neural network of the second stage simultaneously.

The learner 300 may pre-train the first neural network and the second neural network, and may fine-tune them. The learner 300 may fine-tune the first neural network and the second neural network repeatedly.

The learner 300 may train the first neural network to classify candidate regions. The learner 300 may train the first neural network on positive samples and negative samples so that it classifies the candidate regions. The candidate region classifier 102 may output the classification result of the first neural network to the second neural network.

The learner 300 may train the second neural network by receiving, from the first neural network, information on negative samples, positive samples, and misclassified negative samples.

The target region determiner 103 may detect the target by determining the target region through the second neural network trained with repeated fine-tuning.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

Generating an image pyramid based on an image to be detected;
Classifying a plurality of candidate regions from the image pyramid using a cascade neural network; and
Determining a target region corresponding to a target included in the image based on the plurality of candidate regions,
Wherein the cascade neural network comprises a plurality of neural networks, and at least one neural network of the plurality of neural networks comprises a plurality of parallel sub-neural networks
Target detection method.

The method according to claim 1,
Wherein said classifying comprises:
Classifying a plurality of regions from the image to be detected using a first neural network; and
Classifying the plurality of regions into a plurality of target candidate regions and a plurality of non-target candidate regions using a second neural network including the plurality of parallel sub-neural networks,
Wherein the plurality of neural networks comprises the first neural network and the second neural network
Target detection method.

The method according to claim 1,
Wherein each of the plurality of parallel sub-neural networks corresponds to a different target attribute
Target detection method.

The method of claim 3,
Wherein, when the target included in the image to be detected is a human face,
The target attributes include at least one of a frontal face pose, a profile face pose, a rotated frontal or profile face, a skin color, a lighting condition, an occlusion, and a sharpness
Target detection method.

The method of claim 2,
Wherein the determining comprises:
Standardizing the size and position of the plurality of target candidate regions based on the layer image of the image pyramid to which each of the plurality of target candidate regions belongs and on the size and position differences between the layer images; and
Merging the plurality of standardized target candidate regions to obtain the target region
Target detection method.

The method according to claim 1,
Wherein the plurality of neural networks comprises a convolutional neural network and a Boltzmann network
Target detection method.

Receiving an image containing a target; and
Learning a cascade neural network including a plurality of neural networks using the image,
Wherein at least one of the plurality of neural networks includes a plurality of parallel sub-neural networks
Learning method.

The method of claim 7,
Wherein the learning step comprises:
Classifying a sample set composed of a plurality of image areas into a plurality of positive samples and a plurality of negative samples based on an area of a target area corresponding to the target;
Learning a first neural network based on the plurality of negative samples; and
Learning a second neural network including the plurality of parallel sub-neural networks based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples,
Wherein the plurality of neural networks comprises the first neural network and the second neural network
Learning method.

The method of claim 8,
Wherein the learning step comprises:
Repeatedly fine-tuning at least one of the first neural network and the second neural network until a detection rate for the target decreases or an error rate for the target increases,
Wherein the fine-tuning comprises:
Learning the at least one neural network based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples; and
Classifying a set of test samples through the learning
Learning method.

An image acquiring unit that generates an image pyramid based on an image to be detected;
A candidate region classifier for classifying a plurality of candidate regions from the image pyramid using a cascade neural network; and
A target region determiner for determining a target region corresponding to a target included in the image based on the plurality of candidate regions,
Wherein the cascade neural network comprises a plurality of neural networks, and at least one neural network of the plurality of neural networks comprises a plurality of parallel sub-neural networks
Target detection device.

The device of claim 10,
Wherein the candidate region classifier comprises:
A first classifier for classifying a plurality of regions from the image to be detected using a first neural network; and
A second classifier for classifying the plurality of regions into a plurality of target candidate regions and a plurality of non-target candidate regions using a second neural network including the plurality of parallel sub-neural networks,
Wherein the plurality of neural networks comprises the first neural network and the second neural network
Target detection device.

The device of claim 10,
Wherein each of the plurality of parallel sub-neural networks corresponds to a different target attribute
Target detection device.

The device of claim 12,
Wherein, when the target included in the image to be detected is a human face,
The target attributes include at least one of a frontal face pose, a profile face pose, a rotated frontal or profile face, a skin color, a lighting condition, an occlusion, and a sharpness
Target detection device.

The device of claim 11,
Wherein the target region determiner:
Standardizes the size and position of the plurality of target candidate regions based on the layer image of the image pyramid to which each of the plurality of target candidate regions belongs and on the size and position differences between the layer images, and merges the plurality of standardized target candidate regions to acquire the target region
Target detection device.

The device of claim 10,
Wherein the plurality of neural networks comprises a convolutional neural network and a Boltzmann network
Target detection device.

An image acquiring unit that receives an image including a target; and
A learning unit for learning a cascade neural network including a plurality of neural networks using the image,
Wherein at least one of the plurality of neural networks includes a plurality of parallel sub-neural networks
Learning device.

The device of claim 16,
Wherein the learning unit:
Classifies a sample set composed of a plurality of image areas into a plurality of positive samples and a plurality of negative samples based on an area of a target area corresponding to the target;
Learns a first neural network based on the plurality of negative samples; and
Learns a second neural network including the plurality of parallel sub-neural networks based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples,
Wherein the plurality of neural networks comprises the first neural network and the second neural network
Learning device.

The device of claim 17,
Wherein the learning unit:
Repeatedly fine-tunes at least one of the first neural network and the second neural network until a detection rate for the target decreases or an error rate for the target increases,
Wherein the fine-tuning comprises:
Learning the at least one neural network based on the misclassified negative samples, the plurality of negative samples, and the plurality of positive samples, and classifying a set of test samples through the learning
Learning device.