KR20200027078A

KR20200027078A - Method and apparatus for detecting object independently of size using convolutional neural network

Info

Publication number: KR20200027078A
Application number: KR1020180101128A
Authority: KR
Inventors: 김대진; 김용현
Original assignee: 포항공과대학교 산학협력단
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2020-03-12
Also published as: KR102213600B1; WO2020045903A1; KR102213600B9

Abstract

Disclosed are a method and an apparatus for detecting an object independently of its size using a convolutional neural network (CNN). The method comprises the steps of: obtaining an input image that includes an object to be detected; inputting the input image into a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the object size is normalized and a size-dependent feature extracted from the input image, thereby extracting a size-independent feature of the object; and inputting the extracted size-independent feature into an object detection neural network to detect the object.

Description

METHOD AND APPARATUS FOR DETECTING OBJECT INDEPENDENTLY OF SIZE USING CONVOLUTIONAL NEURAL NETWORK

The present invention relates to a method and apparatus for detecting an object independently of its size using a CNN and, more specifically, to a method and apparatus that use a CNN to learn the correlation between the features of an object normalized to a fixed size and the features of objects of various sizes, and then use the trained CNN to detect an object regardless of its size.

Object detection is a core technology widely used in applications such as robotics, video surveillance, and automotive safety. Since convolutional neural networks (CNNs) were applied to object detection, detection of objects in a single image has advanced rapidly.

Object detection methods using a CNN can be classified into techniques based on region-of-interest (ROI) pooling and techniques based on grid cells.

The ROI pooling-based object detection method computes the convolutional features of an entire image with a CNN and then, for each object candidate region identified using those features, computes a convolutional feature through ROI pooling. ROI pooling divides the candidate region into a predefined grid of bins and assigns each bin the maximum or average of the values it covers. Because the grid size is predefined, a convolutional feature of the same size is extracted regardless of the object's size in the image. Consequently, since objects that appear at various sizes are all mapped to features of the same spatial size, the effective resolution of the feature varies with object size, and features are duplicated (for small objects) or lost (for large objects).
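A minimal pure-Python sketch of the ROI pooling step described above, for a single-channel feature map stored as a list of rows. The function name `roi_pool`, the `(x0, y0, x1, y1)` box format, and the floor/ceil bin-boundary rule are illustrative assumptions, not taken from this document. It reproduces the effect the paragraph describes: a 1×1 region is copied into every output bin (duplication), while a 6×6 region is reduced to one value per 3×3 bin (omission).

```python
import math

def roi_pool(feature, box, out_h, out_w, mode="max"):
    """Pool a region of a 2-D feature map into a fixed out_h x out_w grid.

    feature: list of rows (H x W). box: (x0, y0, x1, y1), end-exclusive.
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    out = []
    for i in range(out_h):
        # Floor for the bin start and ceil for the bin end, so every bin is non-empty.
        ys = y0 + math.floor(i * h / out_h)
        ye = y0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * w / out_w)
            xe = x0 + math.ceil((j + 1) * w / out_w)
            vals = [feature[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            # Max pooling or average pooling over the bin.
            row.append(max(vals) if mode == "max" else sum(vals) / len(vals))
        out.append(row)
    return out
```

Whatever the region size, the output is always `out_h × out_w`, which is exactly why the feature resolution changes with object size.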

The grid cell-based object detection method computes the convolutional features of an entire image with a CNN and associates each grid cell of the resulting feature map with an object. A grid cell represents the objects located at its center, regardless of their size. Because each object is represented by a single grid-cell value without spatial extent, this method cannot learn object size information.
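A hypothetical sketch of grid-cell assignment as described above: each object is assigned to the cell containing its box center, so a small box and a large box with the same center map to the same cell and carry no size information. The 7×7 grid, the `(x0, y0, x1, y1)` box format, and the function name are assumptions for illustration.

```python
def assign_grid_cell(box, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell that contains the box center."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Clamp so a center on the right/bottom border still falls in the last cell.
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col
```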

In summary, current CNN-based techniques for detecting objects in a single image rely on ROI pooling or grid cells without accounting for the fact that object size differs from image to image. As a result, different features are extracted even for the same object, and object detection accuracy suffers.

An object of the present invention, devised to solve the above problems, is to provide a method for detecting an object independently of its size using a convolutional neural network (CNN).

Another object of the present invention, devised to solve the above problems, is to provide an apparatus for detecting an object independently of its size using a convolutional neural network (CNN).

One aspect of the present invention for achieving the above objects provides a method for detecting an object independently of its size using a convolutional neural network (CNN).

The method of detecting an object independently of its size using a CNN may include: obtaining an input image containing an object to be detected; inputting the input image into a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object is normalized and a size-dependent feature extracted from the input image, thereby extracting a size-independent feature of the object; and inputting the extracted size-independent feature into an object detection neural network to detect the object.

The method may further include, before or after obtaining the input image, collecting the fixed-size features to generate a fixed-size feature database (DB).

Generating the fixed-size feature DB may include: collecting size information of the objects; determining a reference size from the collected size information; normalizing the size of the object contained in the input image using the determined reference size; and inputting the normalized input image into a CNN to extract a fixed-size feature.

In determining the reference size, the median of the values in the object size information may be selected as the reference size.
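A minimal sketch of choosing the reference size as the median of the collected object sizes, together with the per-side scale factor that brings an object to that size. Measuring size as area and scaling both sides by the square root of the area ratio (which preserves aspect ratio) are assumptions; the function names are hypothetical.

```python
import math
import statistics

def reference_size(object_areas):
    """Median of the collected object areas, used as the reference size."""
    return statistics.median(object_areas)

def normalization_scale(area, ref_area):
    """Per-side scale factor that maps an object of `area` to `ref_area`."""
    return math.sqrt(ref_area / area)
```

With an even number of sizes, `statistics.median` averages the two middle values, so the reference size need not be one of the observed sizes.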

The size recognition-based CNN may classify the size-dependent features according to the reference size, and the partial neural network corresponding to each classified size may be activated individually to extract the size-independent feature.

The size recognition-based CNN may include three partial neural networks that are activated individually depending on whether the size of the object from which a size-dependent feature was computed is larger than, smaller than, or similar to the reference size.

The correlation between the fixed-size feature and the size-dependent feature may be determined by computing, with a smooth L1 loss function (smooth_L1), the difference between the value r_c obtained by summing all feature values of the size-dependent feature along the spatial axes and the corresponding value obtained by summing all feature values of the fixed-size feature along the spatial axes.

The correlation L_san between the fixed-size feature and the size-dependent feature may be defined by an equation over the channels C, which are determined by the color representation of the input image.

The size recognition-based CNN may include a first size recognition-based CNN, which learns the correlation and sets the internal parameters of the neural network, and a second size recognition-based CNN, which shares the internal parameters of the first size recognition-based CNN and extracts the size-independent feature.

In detecting the object, the result of combining the size-independent feature with the size-dependent feature may be input to the object detection neural network, so that the object is detected taking into account the positional information of the object in the image carried by the size-dependent feature.
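One way to combine the two features is channel-wise concatenation, sketched below. The text says only that the features are "combined", so concatenation (rather than, for example, element-wise addition) is an assumption. Features are represented as lists of H×W channel maps, and the function name is hypothetical.

```python
def combine_features(size_independent, size_dependent):
    """Concatenate two features along the channel axis.

    Each argument is a list of channels, each an H x W list of rows.
    Spatial sizes must match so every position keeps both kinds of information.
    """
    h, w = len(size_dependent[0]), len(size_dependent[0][0])
    for ch in size_independent + size_dependent:
        if len(ch) != h or len(ch[0]) != w:
            raise ValueError("spatial sizes must match for channel concatenation")
    return size_independent + size_dependent
```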

Another aspect of the present invention for achieving the above objects provides an apparatus for detecting an object independently of its size using a convolutional neural network (CNN).

The apparatus for detecting an object independently of its size using a CNN may include at least one processor and a memory storing instructions that direct the at least one processor to perform at least one step.

The at least one step may include: obtaining an input image containing an object to be detected; inputting the input image into a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object is normalized and a size-dependent feature extracted from the input image, thereby extracting a size-independent feature of the object; and inputting the extracted size-independent feature into an object detection neural network to detect the object.

The correlation between the fixed-size feature and the size-dependent feature may be determined by computing, with a smooth L1 loss function (smooth_L1), the difference between the value r_c obtained by summing the feature values of the size-dependent feature along the spatial axes and the corresponding value obtained by summing the feature values of the fixed-size feature along the spatial axes.

The size recognition-based CNN may include a first size recognition-based CNN, which learns the correlation and sets the internal parameters of the neural network, and a second size recognition-based CNN, which shares the internal parameters of the first size recognition-based CNN and extracts the size-independent feature.

When the method and apparatus for detecting an object independently of its size using a CNN according to the present invention are used, an object can be detected from the intrinsic features of the object itself, regardless of its size in the image, so detection performance can be greatly improved.

In addition, when an object is detected by combining the existing convolutional features (size-dependent features) with size-independent features, both the object's positional information in the image, carried by the convolutional features, and the size-independent features can be used for detection.

FIG. 1 is a conceptual diagram of a method and apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram illustrating how fixed-size features of objects in images are collected and built into a database according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram of a size recognition-based CNN according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram illustrating a process of training a size recognition-based CNN according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram of a dual configuration of size recognition-based CNNs that blocks back-propagation of the size-awareness loss according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method for detecting an object independently of size using a CNN according to an embodiment of the present invention.
FIG. 7 is a flowchart of a method of building a database by collecting fixed-size features according to an embodiment of the present invention.
FIG. 8 is a block diagram of an apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.
FIGS. 9A to 9D are graphs showing the object detection performance of the method and apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes within its spirit and scope. Throughout the drawings, similar reference numerals denote similar components.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component without departing from the scope of the present invention, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, no intervening components are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram of a method and apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.

According to an embodiment of the present invention, a method is proposed that extracts the intrinsic features of an object's shape regardless of the size at which the object appears in an image (hereinafter also referred to as size-independent features) and detects the object using the extracted features. To this end, the present invention trains a size recognition-based CNN capable of extracting size-independent features and detects objects using the trained size recognition-based CNN.

Specifically, referring to FIG. 1, an image containing the object to be detected may first be input to the CNN 10 to extract size-dependent features of the object. Because size-dependent features are extracted from objects whose size varies from image to image, they may differ with the object's size even when the same object is shown.

Here, the CNN 10 may include a convolutional layer that extracts features from the input image. The convolutional layer may include at least one of a filter that extracts features of the input image, an activation function that converts the filter output into a nonlinear value, and a pooling layer. A filter is a function, generally expressed as a matrix, that detects characteristic parts of the input image, which is itself represented as a matrix. Convolving the input image with the filter extracts features of the object; the extracted features may also be called a feature map, an activation map, or a region of interest (ROI). The interval at which the convolution is applied is called the stride, and feature maps of different sizes are extracted depending on the stride value. Without padding, convolving the input with a filter larger than 1×1 yields a feature map smaller than the input image, so a padding process may additionally be performed to prevent features from being lost over successive layers.
The padding process keeps the feature map the same size as the input image by adding a preset value (for example, 0) around the border of the input to the convolution.
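A small pure-Python sketch of the convolution, stride, and zero-padding behaviour described above for a single-channel input. As in most CNN frameworks, the kernel is not flipped (strictly, this is cross-correlation). The function names are illustrative. The output length along each axis is (n + 2·pad − k) // stride + 1, so a 3×3 filter with padding 1 and stride 1 keeps a 5×5 input at 5×5.

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output length along one axis for input n, kernel k, stride, zero padding."""
    return (n + 2 * pad - k) // stride + 1

def conv2d(image, kernel, stride=1, pad=0):
    """Single-channel 2-D convolution (cross-correlation, no kernel flip)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the input on all four sides.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for y in range(h):
        for x in range(w):
            padded[y + pad][x + pad] = image[y][x]
    oh = conv_output_size(h, kh, stride, pad)
    ow = conv_output_size(w, kw, stride, pad)
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            # Slide the kernel by `stride` and take the weighted sum of the window.
            row.append(sum(
                padded[i * stride + u][j * stride + v] * kernel[u][v]
                for u in range(kh) for v in range(kw)
            ))
        out.append(row)
    return out
```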

The activation function converts an extracted feature value (or matrix) into a nonlinear value; a sigmoid function, a ReLU function, or the like may be used.
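The two activation functions named above, applied element-wise to a feature map, can be sketched as follows. The function names are illustrative, and the numerically stable sigmoid branch is an implementation choice, not something the text specifies.

```python
import math

def relu(x):
    """ReLU: passes positive values and maps negative values to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def apply_activation(fn, feature_map):
    """Apply an activation element-wise to an H x W feature map."""
    return [[fn(v) for v in row] for row in feature_map]
```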

The pooling layer performs subsampling (pooling) on the extracted feature map to select features representative of it; for example, max pooling, which takes the largest value in each region of the feature map, or average pooling, which takes the average value, may be performed. The pooling layer need not follow every activation function and may be applied selectively.
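A sketch of max and average pooling over a feature map stored as a list of rows. A window size and stride of 2 are assumed defaults, not values from the text, and the function name is illustrative.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size windows moved by `stride`."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(0, h - size + 1, stride):
        row = []
        for x in range(0, w - size + 1, stride):
            window = [fmap[y + u][x + v] for u in range(size) for v in range(size)]
            # Keep the largest value (max) or the mean value (average) of the window.
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out
```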

Since the remaining details of the configuration and operation of the CNN 10 will be readily understood by those skilled in the art, a detailed description is omitted.

Meanwhile, according to an embodiment of the present invention, objects of various sizes displayed in images are normalized to a preset size, and the features extracted from images showing objects of that preset size (hereinafter, fixed-size features) are collected to build a fixed-size feature database (DB) 11 in advance.

The size recognition-based CNN 12 then learns the correlation between the fixed-size features obtained from the pre-built fixed-size feature DB 11 and the size-dependent features obtained from images showing objects of various sizes. Having learned this correlation, the size recognition-based CNN 12 can extract size-independent features from an input image.

Next, once size-independent features are extracted by the size recognition-based CNN 12, the object detection neural network 13 can detect the object shown in the input image using those features.

The object detection neural network 13 may consist of a classifier that classifies the object shown in the input image into one or more candidate objects that may be detected, and a regressor that predicts which object it is through regression analysis. Since the object detection neural network 13 can be readily implemented by those skilled in the art, a detailed description is omitted.

FIG. 2 is a conceptual diagram illustrating how fixed-size features of objects in images are collected and built into a database according to an embodiment of the present invention.

According to an embodiment of the present invention, fixed-size features may be used to train the size recognition-based CNN 12 of FIG. 1. Fixed-size features are features extracted from objects normalized to a fixed size, and must be collected in advance.

Referring to FIG. 2, the process of collecting fixed-size features and building them into a database is as follows. First, the size of each object shown in one or more images (or pictures) may be normalized to a preset size (21). When an image contains more than one object, the region showing each individual object is extracted and resized to the preset size, so that several normalized images may be derived from a single image.
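A minimal sketch of step (21): crop each object's region from an image and resize it to a preset square size, so one image containing several objects yields several normalized images. Nearest-neighbour resizing, the square target size, the `(x0, y0, x1, y1)` box format, and the function names are assumptions; practical systems often use bilinear interpolation instead.

```python
def crop(image, box):
    """Cut the region (x0, y0, x1, y1), end-exclusive, out of a list-of-rows image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def resize_nearest(region, out_h, out_w):
    """Nearest-neighbour resize of an H x W region to out_h x out_w."""
    h, w = len(region), len(region[0])
    return [[region[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def normalize_objects(image, boxes, size):
    """One size x size normalized image per object box in the image."""
    return [resize_nearest(crop(image, b), size, size) for b in boxes]
```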

The preset size may be set according to a reference size obtained from a separately built object DB 22. The object DB 22 stores images showing individual objects, or the sizes of individual objects (area, width, height, and so on; typically area). A size selected from the sizes in the object DB 22 (for example, the median) may therefore be used as the reference size to which objects are normalized.

Once images with normalized object sizes are obtained, they may be input to the CNN 23 to extract features of the objects they contain. Because these features are extracted from size-normalized objects, they are the fixed-size features described with reference to FIG. 1. The CNN 23 may have the same structure as the CNN 10 of FIG. 1, or may be implemented in various ways by those skilled in the art.

FIG. 3 is a conceptual diagram of a size recognition-based CNN according to an embodiment of the present invention.

Referring to FIG. 3, the size recognition-based CNN according to an embodiment of the present invention is a neural network that extracts features of an object independently of its size, and may include at least one of a classifier 31, which classifies the regions of interest extracted from the input image according to the size of the object they contain, and a plurality of partial neural networks 32, which are activated independently according to object size.

The classifier 31 may classify each input region of interest according to object size and activate the partial neural network 32 corresponding to that size. The size used as the classification criterion may be the reference size described with reference to FIG. 2.

For example, one partial neural network 32 may be provided for each of the following cases, giving at least three: the object is smaller than the reference size by at least a preset threshold, larger than it by more than the threshold, or within the threshold of it. The partial neural networks 32 may extract size-independent features by using the previously learned relationship between fixed-size features and size-dependent features to remove the effect of object size from the features of the input object.
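A hypothetical sketch of the classifier's routing rule: compare each region's object size with the reference size and pass its feature to one of three partial networks. Measuring the difference as log2 of the area ratio, and treating a difference of exactly −threshold as "smaller" and exactly +threshold as "similar", are assumptions; the partial networks here are stand-in callables.

```python
import math

def route_by_size(area, ref_area, threshold):
    """Index of the partial network: 0 = smaller, 1 = similar, 2 = larger."""
    d = math.log2(area / ref_area)  # assumed size difference measure
    if d <= -threshold:
        return 0  # smaller than the reference by at least the threshold
    if d > threshold:
        return 2  # larger than the reference by more than the threshold
    return 1      # within the threshold of the reference size

def apply_size_aware(rois, ref_area, threshold, subnets):
    """Activate only the matching partial network for each (feature, area) ROI."""
    return [subnets[route_by_size(area, ref_area, threshold)](feat)
            for feat, area in rois]
```

Only one partial network runs per region, so each network specializes in a single size range.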

FIG. 4 is a conceptual diagram illustrating a process of training the size recognition-based CNN according to an embodiment of the present invention.

Referring to FIG. 4, the size recognition-based CNN according to an embodiment of the present invention may be trained through a first process 40a, which extracts fixed-size features 42 from the input image 40, and a second process 40b, which inputs regions of interest 43 extracted from the input image into the size recognition-based CNN 44 to extract size-dependent features 45.

Here, the fixed-size features 42 may be obtained by convolving the regions of interest 41, normalized to a fixed size, with a filter and optionally performing pooling.
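The fixed-size path (normalize, convolve, optionally pool) can be sketched in plain Python as below. The nearest-neighbour resize, the 8 × 8 target size, 2 × 2 max pooling, and all function names are illustrative assumptions; the text above specifies only normalization to a fixed size, convolution with a filter, and optional pooling.

```python
def resize_nearest(roi, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w (size normalization)."""
    in_h, in_w = len(roi), len(roi[0])
    return [[roi[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def conv2d_valid(fmap, kernel):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def fixed_size_feature(roi, kernel, fixed_size=(8, 8), pool=True):
    """Normalize a region of interest 41 to a fixed size, convolve, optionally pool."""
    normalized = resize_nearest(roi, *fixed_size)
    fmap = conv2d_valid(normalized, kernel)
    return max_pool_2x2(fmap) if pool else fmap
```

Because every region is resized first, the output shape is the same for every region regardless of the original object size (3 × 3 here for a 3 × 3 kernel with pooling).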

The size recognition-based CNN 44 classifies each region of interest 43 according to the reference size, convolves the classified region of interest 43 with a filter, and then applies an activation function (ReLU), thereby extracting the size-dependent features 45. When the input image is an RGB image, the extracted size-dependent feature 45 may consist of three feature values, one for each of the R, G, and B channels.
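The per-branch convolution followed by ReLU can be sketched as below. As a hypothetical simplification, each partial network is modeled as one kernel per input channel, so an RGB region yields three feature maps; the kernels, this per-channel layout, and the function names are assumptions for illustration only.

```python
def conv2d_valid(fmap, kernel):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def size_dependent_feature(roi_channels, branch_kernels):
    """Apply one branch of the size recognition-based CNN 44 to a region of interest 43.

    roi_channels   : list of C 2-D lists at the region's native size (C = 3 for RGB)
    branch_kernels : one kernel per channel, taken from the partial network that the
                     classifier selected for this region (hypothetical layout)
    Returns C feature maps whose spatial size still depends on the region's size.
    """
    if len(roi_channels) != len(branch_kernels):
        raise ValueError("need exactly one kernel per channel")
    return [relu(conv2d_valid(ch, k)) for ch, k in zip(roi_channels, branch_kernels)]
```

Unlike the fixed-size path, no resizing happens here, so a larger region produces larger feature maps; this is why the two kinds of features are compared through per-channel spatial reductions rather than position by position.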

The size recognition-based CNN 44 may learn the relationship between the fixed-size features 42 and the size-dependent features 45 by computing the difference between the global average of the fixed-size feature 42 and the global average of the size-dependent feature 45 with a smooth L1 loss function.

Specifically, the relationship between the fixed-size feature 42 and the size-dependent feature 45 may be defined by the function of Equation 1 below.

$$L_{san} = \sum_{c} \mathrm{smooth}_{L1}\left(r_c - \hat{r}_c\right), \qquad r_c = \sum_{x,y} D_c(x, y), \qquad \hat{r}_c = \sum_{x,y} F_c(x, y) \qquad \text{(Equation 1)}$$

Here $D_c$ and $F_c$ are the channel-$c$ maps of the size-dependent feature 45 and the fixed-size feature 42, respectively; the symbols $\hat{r}_c$, $D_c$, and $F_c$ are notation adopted here. Referring to Equation 1, for each channel $c$ (for example, one each for R, G, and B), the difference between the value $r_c$, obtained by summing the size-dependent feature 45 over the spatial axes, and the value $\hat{r}_c$, obtained by summing the fixed-size feature 42 over the spatial axes $(x, y)$, is input to the smooth L1 loss function ($\mathrm{smooth}_{L1}$), and the results of all channels are summed. This defines the relationship between the fixed-size feature 42 and the size-dependent feature 45. The relationship according to Equation 1 may also be referred to as the size-aware loss.
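A minimal sketch of the size-aware loss of Equation 1 in plain Python follows. It assumes the common smooth L1 form with beta = 1 (0.5·x² for |x| < 1, |x| − 0.5 otherwise), since the document does not spell the function out; the function names are hypothetical.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1: 0.5 * x^2 / beta if |x| < beta, else |x| - 0.5 * beta."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta


def spatial_sum(fmap):
    """Sum one channel's feature map over both spatial axes (x, y)."""
    return sum(sum(row) for row in fmap)


def size_aware_loss(dep_channels, fix_channels):
    """L_san: per channel c, smooth L1 of (r_c - r_hat_c), summed over channels.

    dep_channels : size-dependent feature maps, one per channel (any spatial size)
    fix_channels : fixed-size feature maps, one per channel
    """
    if len(dep_channels) != len(fix_channels):
        raise ValueError("channel counts must match")
    return sum(smooth_l1(spatial_sum(d) - spatial_sum(f))
               for d, f in zip(dep_channels, fix_channels))
```

Because each channel is reduced to a single spatial sum, maps of different spatial sizes can be compared directly. Using the global averages mentioned in the preceding paragraph instead would amount to dividing each sum by its map's height × width before taking the difference.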

Once training of the size recognition-based CNN 44 is complete, the feature extraction process 40b is performed by the size recognition-based CNN 44 using the parameters determined through training. This extracts size-independent features for the objects contained in an image, and the object is finally detected by inputting the extracted size-independent features into the object detection neural network 46.

FIG. 5 is a conceptual diagram of a dual configuration of the size recognition-based CNN for blocking back-propagation of the size-aware loss according to an embodiment of the present invention.

In general, an artificial neural network consists of multiple layers, and the more complex the layers become, the more complex the computation becomes and the harder it is to compute optimal values. To address this, artificial neural networks generally use back-propagation, in which computation proceeds backward from the network's output to update the parameters. However, if the size-aware loss according to an embodiment of the present invention (that is, the learning of the relationship between the fixed-size and size-dependent features) is back-propagated, detection performance may degrade. Therefore, the size recognition-based CNN according to an embodiment of the present invention may be configured in duplicate (dual) to block this back-propagation.

Specifically, referring to FIG. 5, the first size recognition-based CNN 51 learns the size-aware loss and sets the internal parameters of the neural network, and the second size recognition-based CNN 52, which shares those parameters, is used to detect objects in actual images; this blocks back-propagation of the size-aware loss. The second size recognition-based CNN 52, on the other hand, may back-propagate the detection result (or detection loss) of the detection neural network through the entire network.
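One possible reading of the dual configuration is a stop-gradient: both branches share the same weights, but the size-aware loss is computed on a detached copy of the incoming feature, so it updates only the size recognition-based CNN, while the detection loss flows through everything. The toy scalar model below illustrates that reading with hand-written gradients. The backbone weight `b`, the squared-error stand-in for the detection loss, and the function names are hypothetical.

```python
def smooth_l1_grad(x):
    """Derivative of smooth L1 (beta = 1) with respect to its argument."""
    if abs(x) < 1.0:
        return x
    return 1.0 if x > 0 else -1.0


def dual_branch_gradients(x, y, target, b, w):
    """Gradients of (size-aware loss + detection loss) for a toy scalar model.

    b : hypothetical backbone weight producing the shared feature f = b * x
    w : size recognition-based CNN weight, shared by branch 51 and branch 52
    """
    f = b * x

    # First branch (51): size-aware loss on w * f with f treated as a constant
    # (stop-gradient), so this loss updates w but never reaches b.
    s1 = w * f
    e1 = smooth_l1_grad(s1 - target)
    grad_w_san = e1 * f
    grad_b_san = 0.0  # blocked by design

    # Second branch (52): same w, used for detection; the stand-in detection
    # loss (s2 - y)^2 is back-propagated through both w and b.
    s2 = w * f
    e2 = 2.0 * (s2 - y)
    grad_w_det = e2 * f
    grad_b_det = e2 * w * x

    return {"w": grad_w_san + grad_w_det, "b": grad_b_san + grad_b_det}
```

With `x = 1`, `y = 0`, `target = 0`, `b = 1`, and `w = 0.5`, the gradient for `b` is 0.5, all of it from the detection loss; without the block, the size-aware loss would add another 0.25. In an autodiff framework the same effect is usually obtained with a stop-gradient operation, for example `jax.lax.stop_gradient` in JAX or `Tensor.detach()` in PyTorch.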

Meanwhile, the size-independent features produced by the second size recognition-based CNN in FIG. 5 may be input directly to the object detection neural network to detect the object. Alternatively, conventional convolutional features and the size-independent features may be combined, and the combined features input to the detection neural network.

Because the size-independent features exclude information about the position of the object in the image, combining them with conventional convolutional features makes the representation richer. The combined features reflect the object position carried by the conventional convolutional features and may still be size-independent. Referring to FIG. 5, an adder 53 is added to perform this feature combination.
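The adder 53 can be sketched as an element-wise sum. The shapes are an assumption: since the size-independent feature is described as carrying no position information, it is modeled here as one value per channel that is added to every spatial location of the matching channel of the conventional convolutional feature. The function name is hypothetical.

```python
def combine_features(conv_feature, size_independent):
    """Adder 53 sketch: broadcast-add a position-free per-channel vector.

    conv_feature     : list of C feature maps (2-D lists) that keep position information
    size_independent : list of C scalars, one per channel (hypothetical shape)
    """
    if len(conv_feature) != len(size_independent):
        raise ValueError("channel counts must match")
    return [[[v + s for v in row] for row in fmap]
            for fmap, s in zip(conv_feature, size_independent)]
```

The result has the same spatial layout as the convolutional feature, so the position of the object is preserved while every location also receives the size-independent information.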

FIG. 6 is a flowchart of a method for detecting an object independently of size using a CNN according to an embodiment of the present invention. FIG. 7 is a flowchart of a method for building a database by collecting fixed-size features according to an embodiment of the present invention.

Referring to FIG. 6, the method for detecting an object independently of size using a convolutional neural network (CNN) may include: obtaining an input image containing an object to be detected (S100); inputting the input image into a size recognition-based CNN, which has learned the correlation between fixed-size features extracted from images in which the object size is normalized and size-dependent features extracted from the input image, to extract size-independent features of the object (S110); and inputting the extracted size-independent features into an object detection neural network to detect the object (S120).
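The three steps can be sketched as a composition of interchangeable components. Every callable here (`extract_rois`, `size_aware_cnn`, `detector`) is a hypothetical placeholder, and splitting the image into regions of interest follows the description of FIGS. 3 and 4 rather than a stated requirement of S110.

```python
from typing import Any, Callable, List


def detect_objects(
    image: Any,
    extract_rois: Callable[[Any], List[Any]],
    size_aware_cnn: Callable[[Any], Any],
    detector: Callable[[Any], Any],
) -> List[Any]:
    """Compose steps S100 to S120 from hypothetical components."""
    rois = extract_rois(image)                          # regions in the image acquired in S100
    features = [size_aware_cnn(roi) for roi in rois]   # S110: size-independent features
    return [detector(feat) for feat in features]       # S120: one detection result per region
```

Keeping the components as parameters makes it easy to swap in a trained size recognition-based CNN or a different detection network without changing the flow of FIG. 6.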

Before or after obtaining the input image (S100), the method may further include collecting the fixed-size features to generate a fixed-size feature database (DB).

Referring to FIG. 7, generating the fixed-size feature DB may include: collecting size information of the object (S200); determining a reference size using the collected size information of the object (S210); normalizing the size of the object included in the input image using the determined reference size (S220); and inputting the normalized input image into a CNN to extract fixed-size features (S230).

In determining the reference size (S210), the median of the values included in the size information of the object may be determined as the reference size.
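Steps S210 and S220 can be sketched with the standard library's `statistics.median`. Treating size as one scalar per object and normalizing by a simple ratio are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import median


def reference_size(object_sizes):
    """S210: use the median of the collected object sizes as the reference size."""
    if not object_sizes:
        raise ValueError("at least one object size is required")
    return median(object_sizes)


def normalization_scale(object_size, ref_size):
    """S220 (illustrative): factor that rescales an object to the reference size."""
    if object_size <= 0:
        raise ValueError("object size must be positive")
    return ref_size / object_size
```

For collected sizes `[32, 48, 64, 96, 200]` the reference size is 64, and an object of size 32 would be enlarged by a factor of 2.0. Unlike the mean (88 for the same list), the median is not pulled toward a few unusually large or small objects.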


In detecting the object (S120), a result of combining the size-independent feature and the size-dependent feature may be input to the object detection neural network, so that the object is detected in consideration of its position in the image as carried by the size-dependent feature.

FIG. 8 is a block diagram of an apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.

Referring to FIG. 8, the apparatus 100 for detecting an object independently of size using a convolutional neural network (CNN) may include at least one processor 110 and a memory 120 storing instructions that direct the at least one processor 110 to perform at least one step.

The at least one processor 110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. Each of the memory 120 and the storage device 160 may consist of at least one of a volatile storage medium and a nonvolatile storage medium. For example, the memory 120 may consist of at least one of a read-only memory (ROM) and a random access memory (RAM).

The apparatus 100 for detecting an object independently of size using a CNN may also include a transceiver 130 that communicates over a wireless network, and may further include an input interface device 140, an output interface device 150, a storage device 160, and the like. The components of the apparatus 100 may be connected by a bus 170 to communicate with one another.

The at least one step may include: obtaining an input image containing an object to be detected; inputting the input image into a size recognition-based CNN, which has learned the correlation between fixed-size features extracted from images in which the object size is normalized and size-dependent features extracted from the input image, to extract size-independent features of the object; and inputting the extracted size-independent features into an object detection neural network to detect the object.


In the step of detecting the object, a result of combining the size-independent feature and the size-dependent feature may be input to the object detection neural network, so that the object is detected in consideration of its position in the image as carried by the size-dependent feature.

FIGS. 9A to 9D are graphs showing the object detection performance of the method and apparatus for detecting an object independently of size using a CNN according to an embodiment of the present invention.

FIGS. 9A to 9D show the performance of detecting objects in input images (pictures) depicting, respectively, background, aeroplane, bicycle, and bird. Each graph compares a conventional region-of-interest-based object detection method (Pooling) with the method of detecting objects independently of size using the size recognition-based CNN according to an embodiment of the present invention (Pooling with SAN), plotting the root mean square error (RMSE) against object scale.

Referring to FIGS. 9A to 9D, when the method and apparatus for detecting an object independently of size according to an embodiment of the present invention are used, the false detection rate is measured lower than with the prior art across all size ranges.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the following claims.

Claims

A method of detecting an object independently of size using a convolutional neural network (CNN), the method comprising:
obtaining an input image containing an object to be detected;
extracting a size-independent feature of the object by inputting the input image into a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object is normalized and a size-dependent feature extracted from the input image; and
detecting the object by inputting the extracted size-independent feature into an object detection neural network.

The method of claim 1, further comprising,
before or after obtaining the input image,
collecting the fixed-size features to generate a fixed-size feature database (DB).

The method of claim 2, wherein
generating the fixed-size feature DB comprises:
collecting size information of the object;
determining a reference size using the collected size information of the object;
normalizing the size of the object included in the input image using the determined reference size; and
extracting the fixed-size feature by inputting the normalized input image into a CNN.

The method of claim 3, wherein
determining the reference size comprises
determining, as the reference size, the median of the values included in the size information of the object.

The method of claim 3, wherein
the size recognition-based CNN
classifies the size-dependent features according to the reference size and individually activates the partial neural networks corresponding to the classified sizes to extract the size-independent features.

The method of claim 5, wherein
the size recognition-based CNN
compares the size of the object for which the size-dependent feature is calculated with the reference size and includes three partial neural networks that are individually activated depending on whether that size is larger than, smaller than, or similar to the reference size.

The method of claim 1, wherein
the correlation between the fixed-size feature and the size-dependent feature
is determined by computing, with a smooth L1 loss function ($\mathrm{smooth}_{L1}$), the difference between the value ($r_c$) obtained by summing all feature values of the size-dependent feature over the spatial axes and the value obtained by summing all feature values of the fixed-size feature over the spatial axes.

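The method of claim 7, wherein
the correlation ($L_{san}$) between the fixed-size feature and the size-dependent feature is defined, over the channels $c$ determined by the representation format of the input image, by the equation

$$L_{san} = \sum_{c} \mathrm{smooth}_{L1}\left(r_c - \hat{r}_c\right)$$

where $\hat{r}_c$ (notation adopted here) denotes the spatial sum of the fixed-size feature for channel $c$.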

The method of claim 1, wherein
the size recognition-based CNN comprises:
a first size recognition-based CNN that sets internal parameters of the neural network by learning the correlation; and
a second size recognition-based CNN that extracts the size-independent feature by sharing the internal parameters of the first size recognition-based CNN.

The method of claim 1, wherein
detecting the object comprises
inputting a result of combining the size-independent feature and the size-dependent feature into the object detection neural network, thereby detecting the object in consideration of position information of the object in the image according to the size-dependent feature.

An apparatus for detecting an object independently of size using a convolutional neural network (CNN), the apparatus comprising:
at least one processor; and
a memory storing instructions that direct the at least one processor to perform at least one step,
wherein the at least one step comprises:
obtaining an input image containing an object to be detected;
extracting a size-independent feature of the object by inputting the input image into a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object is normalized and a size-dependent feature extracted from the input image; and
detecting the object by inputting the extracted size-independent feature into an object detection neural network.

The apparatus of claim 11, wherein the at least one step further comprises,
before or after obtaining the input image,
collecting the fixed-size features to generate a fixed-size feature database (DB).

The apparatus of claim 12, wherein
generating the fixed-size feature DB comprises:
collecting size information of the object;
determining a reference size using the collected size information of the object;
normalizing the size of the object included in the input image using the determined reference size; and
extracting the fixed-size feature by inputting the normalized input image into a CNN.

The apparatus of claim 13, wherein
determining the reference size comprises
determining, as the reference size, the median of the values included in the size information of the object.

The apparatus of claim 13, wherein
the size recognition-based CNN
classifies the size-dependent features according to the reference size and individually activates the partial neural networks corresponding to the classified sizes to extract the size-independent features.

The apparatus of claim 15, wherein
the size recognition-based CNN
compares the size of the object for which the size-dependent feature is calculated with the reference size and includes three partial neural networks that are individually activated depending on whether that size is larger than, smaller than, or similar to the reference size.

The apparatus of claim 11, wherein
the correlation between the fixed-size feature and the size-dependent feature
is determined by computing, with a smooth L1 loss function ($\mathrm{smooth}_{L1}$), the difference between the value ($r_c$) obtained by summing all feature values of the size-dependent feature over the spatial axes and the value obtained by summing all feature values of the fixed-size feature over the spatial axes.

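The apparatus of claim 17, wherein
the correlation ($L_{san}$) between the fixed-size feature and the size-dependent feature is defined, over the channels $c$ determined by the representation format of the input image, by the equation

$$L_{san} = \sum_{c} \mathrm{smooth}_{L1}\left(r_c - \hat{r}_c\right)$$

where $\hat{r}_c$ (notation adopted here) denotes the spatial sum of the fixed-size feature for channel $c$.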

The apparatus of claim 11, wherein
the size recognition-based CNN comprises:
a first size recognition-based CNN that sets internal parameters of the neural network by learning the correlation; and
a second size recognition-based CNN that extracts the size-independent feature by sharing the internal parameters of the first size recognition-based CNN.

The apparatus of claim 11, wherein
detecting the object comprises
inputting a result of combining the size-independent feature and the size-dependent feature into the object detection neural network, thereby detecting the object in consideration of position information of the object in the image according to the size-dependent feature.