KR20190052785A

KR20190052785A - Method and apparatus for detecting object, and computer program for executing the method

Info

Publication number: KR20190052785A
Application number: KR1020170148498A
Authority: KR
Inventors: 김준광; 정우영; 정희철
Original assignee: 재단법인대구경북과학기술원 (Daegu Gyeongbuk Institute of Science and Technology, DGIST)
Priority date: 2017-11-09
Filing date: 2017-11-09
Publication date: 2019-05-17

Abstract

According to an embodiment of the present invention, an object detection method with high reliability and improved processing speed is disclosed. The method comprises: classifying an input image into one of a plurality of categories by using a neural network model trained to classify images into the plurality of categories; and detecting an object in a first region, which is a partial region of the image, when the image is classified into a first category, and detecting an object in a second region, which is another partial region of the image, when the image is classified into a second category.

Description

FIELD OF THE INVENTION

Embodiments of the present invention relate to object detection methods, apparatuses, and computer programs.

Various algorithms have been developed to automatically detect objects in images, for example, CCTV images. Object detection is used mainly for security purposes and requires both high reliability and real-time performance. Achieving high reliability requires a large amount of computation, but as the amount of computation increases, real-time performance becomes difficult to achieve.

Embodiments of the present invention provide an object detection method, apparatus, and computer program that offer high reliability together with improved processing speed.

An embodiment of the present invention discloses an object detection method comprising: classifying an input image into one of a plurality of categories by using a neural network model trained to classify images into the plurality of categories; and detecting an object in a first region, which is a partial region of the image, when the image is classified into a first category, and detecting an object in a second region, which is another partial region of the image, when the image is classified into a second category.

When the image is divided by a predetermined method into two or more regions including the first region and the second region, the categories may include a first category indicating that an object is included in the first region and a second category indicating that an object is included in the second region.

The categories may further include a third category indicating that the image contains no object, and in the detecting step, no object detection may be performed on the image when the image is classified into the third category.

The categories may further include a fourth category indicating that objects are included in all of the divided regions, and in the detecting step, objects may be detected in the entire image when the image is classified into the fourth category.

When the image is divided into two regions, the first region may be the left region and the second region may be the right region.

The neural network model may include a convolution layer that filters the input image, a pooling layer that compresses the output data of the convolution layer, a fully connected layer (FCL) connected to the output data of the pooling layer, and a softmax layer that receives the output data of the FCL. The softmax layer may include one node for each of the plurality of categories.

The stride of the pooling layer may be set to 1.
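For reference, the standard output-size relation for a pooling window of size $k$ applied with stride $s$ and no padding to an input of height $H$ (width behaves the same way) is given below. The window size $k$ is not specified in this document and is used only as a symbol.

$$H_{\text{out}} = \left\lfloor \frac{H - k}{s} \right\rfloor + 1, \qquad s = 1 \;\Rightarrow\; H_{\text{out}} = H - k + 1$$

For example, with $H = 32$ and $k = 2$, a stride of 2 gives $H_{\text{out}} = 16$, whereas a stride of 1 gives $H_{\text{out}} = 31$, so a stride-1 pooling layer reduces the spatial size of the feature map far less.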

The detecting step may include: detecting candidate regions; and detecting an object included in each of the candidate regions.

Another embodiment of the present invention discloses an object detection apparatus comprising: a category classification unit that classifies an input image into one of a plurality of categories by using a neural network model trained to classify images into the plurality of categories; and a detection unit that detects an object in a first region, which is a partial region of the image, when the image is classified into a first category, and detects an object in a second region, which is another partial region of the image, when the image is classified into a second category.

Another embodiment of the present invention discloses a computer program stored on a medium for executing the above-described method on a computer.

Other aspects, features, and advantages will become apparent from the following drawings, claims, and detailed description of the invention.

These general and specific aspects may be implemented using a system, a method, a computer program, or any combination of systems, methods, and computer programs.

The object detection method, apparatus, and computer program according to embodiments of the present invention first classify the category of an input image and then perform object detection only within an object detection target region, which is at least a partial region of the input image selected according to the classification result. The amount of computation is therefore significantly reduced, and objects can be detected quickly.

The object detection method, apparatus, and computer program according to embodiments of the present invention can classify the category of an input image according to which region contains an object when the image is divided by a predetermined method, and can perform this classification with high accuracy using an artificial neural network. Computation can therefore be reduced while high reliability is maintained.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an object detection system according to an embodiment.
FIG. 2 is a block diagram schematically showing the configuration of an object detection apparatus 100 according to an embodiment.
FIG. 3 is a block diagram schematically showing the configuration of a detection unit 120 according to an embodiment.
FIG. 4 is a flowchart schematically illustrating an object detection method according to an embodiment.
FIG. 5 shows examples of input images.
FIG. 6 shows a method of detecting an object in the first region 51 of the image shown in FIG. 5(a).
FIG. 7 shows an example of an output image.
FIG. 8 illustrates a neural network model of the category classification unit 110 according to an embodiment.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below. The effects and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In this description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

In the following embodiments, terms such as "first" and "second" are used not in a limiting sense but to distinguish one component from another. In the following embodiments, singular expressions include plural expressions unless the context clearly indicates otherwise. In the following embodiments, terms such as "include" or "have" indicate that a feature or component described in the specification is present, and do not preclude the possibility that one or more other features or components may be added. In the drawings, components may be exaggerated or reduced in size for convenience of description. For example, the size and thickness of each component shown in the drawings are arbitrary and chosen for convenience of description, so the present invention is not necessarily limited to what is shown.

FIG. 1 illustrates an object detection system according to an embodiment.

Referring to FIG. 1, an object detection system according to an embodiment includes an object detection apparatus 100 that detects objects in an input image. The object detection system according to an embodiment may further include a display device 200 that displays an output image output from the object detection apparatus 100 and/or a database 300 that stores information output from the object detection apparatus 100.

In an embodiment, an object refers to a target to be detected in an image. The object detection system may include information about the object to be detected.

FIG. 2 is a block diagram schematically showing the configuration of the object detection apparatus 100 according to an embodiment.

Referring to FIG. 2, the object detection apparatus 100 may include a category classification unit 110 and a detection unit 120, and may further include an image output unit 130.

The category classification unit 110 may receive an input image and classify it into one of a plurality of categories. The category classification unit 110 may include a neural network model trained to classify images into a plurality of predetermined categories, and may use this trained model to classify the input image into one of those categories.

The categories used by the category classification unit 110 may be distinguished according to which region contains an object when the image is divided into a plurality of regions. For example, when an image is divided by a predetermined method into two or more regions including a first region and a second region, the categories according to an embodiment may include a first category, indicating that an object is included in the first region, and a second category, indicating that an object is included in the second region.

The detection unit 120 may receive the input image and the category classification result for that image from the category classification unit 110, and, based on the classification result, detect objects within an object detection target region, which is at least a partial region of the input image.

The object detection target region may be determined based on the category classification result. For example, when the input image is classified into the first category described above, the detection unit 120 may detect objects in the first region of the input image. When the input image is classified into the second category described above, the detection unit 120 may detect objects in the second region of the input image.

According to an embodiment, before objects are actually detected in the input image, information on the region in which an object is expected to exist is obtained first, and objects are detected only within that region. The amount of computation is thus significantly reduced compared with detecting objects over the entire input image.
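To make the saving concrete, the sketch below estimates the expected per-frame detection cost relative to always searching the full frame. It assumes, purely for illustration, that detection cost is proportional to the area searched and that the classifier adds a fixed overhead; the category mix and the overhead value are made-up numbers, not measurements from this document.

```python
# Hypothetical cost model: detection cost is assumed proportional to the searched area,
# and the category classifier adds a fixed overhead (as a fraction of one full-frame detection).

# Fraction of the frame searched for each category (left/right two-region split).
AREA_FRACTION = {
    "first": 0.5,   # object only in the first (left) region
    "second": 0.5,  # object only in the second (right) region
    "third": 0.0,   # no object: detection is skipped
    "fourth": 1.0,  # objects in both regions: whole frame
}


def expected_relative_cost(category_probs, classifier_overhead):
    """Expected cost per frame relative to running detection on the full frame every time."""
    if abs(sum(category_probs.values()) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    searched = sum(p * AREA_FRACTION[c] for c, p in category_probs.items())
    return classifier_overhead + searched


# Illustrative (invented) mix for a scene that is empty half of the time.
probs = {"first": 0.2, "second": 0.2, "third": 0.5, "fourth": 0.1}
print(round(expected_relative_cost(probs, classifier_overhead=0.05), 3))
```

Under these assumptions the pipeline would search about 35% of the full-frame workload; the real figure depends on the scene statistics and on how detection cost actually scales with region size.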

FIG. 3 is a block diagram schematically showing the configuration of the detection unit 120 according to an embodiment.

Referring to FIG. 3, the detection unit 120 may include an object detection unit 122 that detects objects within the object detection target region, and may further include a candidate region detection unit 121 that performs preliminary work for object detection and an object classification unit 123 that classifies the detected objects.

The candidate region detection unit 121 may detect, within the object detection target region, candidate regions in which an object is expected to exist.

The object detection unit 122 may detect an object included in each candidate region detected by the candidate region detection unit 121. The candidate regions may include noise, so no object may be detected in at least some of them.

The object classification unit 123 may classify the objects detected by the object detection unit 122 and attach additional information to them. For example, the object classification unit 123 may classify a detected object as a vehicle or a person and assign the corresponding object classification information to the object.
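The three sub-units of the detection unit 120 form a simple chain, which can be sketched as below. This is a minimal illustration, assuming each sub-unit is supplied as a plain callable; the names `run_detection_unit`, `propose`, `verify`, and `classify` are hypothetical and stand in for units 121, 122, and 123 without implying any particular algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (x0, y0, x1, y1) in pixels with x1 and y1 exclusive (an assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    box: Box
    label: str


def run_detection_unit(
    target_region: Box,
    propose: Callable[[Box], List[Box]],
    verify: Callable[[Box], Optional[Box]],
    classify: Callable[[Box], str],
) -> List[Detection]:
    """Chain the sub-units of detection unit 120 (all callables are hypothetical).

    propose  ~ candidate region detection unit 121: candidates inside the target region
    verify   ~ object detection unit 122: object box inside a candidate, or None for noise
    classify ~ object classification unit 123: label such as "person" or "vehicle"
    """
    detections = []
    for candidate in propose(target_region):
        obj_box = verify(candidate)
        if obj_box is None:
            continue  # noisy candidate: no object found inside it
        detections.append(Detection(obj_box, classify(obj_box)))
    return detections


# Toy usage mirroring candidate regions 61 (contains a person) and 62 (noise) of FIG. 6.
toy = {(0, 0, 10, 20): (2, 2, 8, 18), (12, 0, 20, 10): None}
result = run_detection_unit(
    (0, 0, 32, 48),
    propose=lambda region: list(toy),
    verify=lambda cand: toy[cand],
    classify=lambda box: "person",
)
print(result)
```

Keeping the sub-units as injected callables lets any proposal method or detector be swapped in without changing the control flow.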

According to an embodiment, the object detection unit 122 may output data including information on the detected objects. For example, the object detection unit 122 may output an output image in which the detected object information is marked on the input image. Alternatively, the object detection unit 122 may output metadata that includes position information of the detected objects and may further include object classification information.
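One possible shape for the metadata output is a small JSON record per frame. The sketch below is only an assumption: the document does not define a metadata format, so the field names `frame`, `objects`, `box`, and `class` are hypothetical. Position information is always written, and classification information is included only when available.

```python
import json


def detections_to_metadata(frame_id, detections):
    """Serialize detections as JSON metadata (hypothetical schema).

    `detections` is a list of (box, label) pairs, where box = (x0, y0, x1, y1) and
    label is a class name or None when no classification information exists.
    """
    objects = []
    for box, label in detections:
        entry = {"box": list(box)}      # position information (always present)
        if label is not None:
            entry["class"] = label      # optional object classification information
        objects.append(entry)
    return json.dumps({"frame": frame_id, "objects": objects})


print(detections_to_metadata(7, [((2, 2, 8, 18), "person"), ((20, 5, 30, 15), None)]))
```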

The object detection unit 122 may output the output image to the display device 200, or may store the output image and/or the metadata in the database 300.

The output image may be the input image with a predetermined display element related to the object added to it. A display element may consist of points, lines, and/or planes drawn at the position where the object was detected, so that the object's position can be identified. A display element may also present information about the object within the input image, so that the object's classification information can be identified. For example, the output image may include a bounding box drawn around the object at the position where it was detected in the input image. The output image may also show the object's classification information as text and/or a predetermined graphic, either at the position where the object was detected or at a predetermined specific position. Graphics representing classification information may include, for example, a figure depicting a person or a figure depicting a vehicle.
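As an illustration of a line-type display element, the sketch below draws a bounding-box outline into a copy of a grayscale image held as a list of pixel rows. The image representation, the `draw_bounding_box` name, and the exclusive-end box convention are assumptions made for this example; a real system would typically use an imaging library instead.

```python
def draw_bounding_box(image, box, value=255):
    """Return a copy of `image` (list of pixel rows) with the outline of
    box = (x0, y0, x1, y1), x1/y1 exclusive, drawn with pixel `value`."""
    out = [row[:] for row in image]          # leave the input image untouched
    height, width = len(out), len(out[0])
    x0, y0 = max(0, box[0]), max(0, box[1])  # clip the box to the image
    x1, y1 = min(width, box[2]) - 1, min(height, box[3]) - 1  # last inclusive pixel
    if x0 > x1 or y0 > y1:
        return out                           # box lies entirely outside the image
    for x in range(x0, x1 + 1):
        out[y0][x] = value                   # top edge
        out[y1][x] = value                   # bottom edge
    for y in range(y0, y1 + 1):
        out[y][x0] = value                   # left edge
        out[y][x1] = value                   # right edge
    return out


img = [[0] * 6 for _ in range(5)]
for row in draw_bounding_box(img, (1, 1, 5, 4), value=9):
    print(row)
```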

In the following description of the embodiments, reference is made to the components shown in FIGS. 1 to 3.

FIG. 4 is a flowchart schematically illustrating an object detection method according to an embodiment.

The object detection method shown in FIG. 4 may be performed by the object detection apparatus 100 shown in FIGS. 1 to 3. Accordingly, the description given above with reference to FIGS. 1 to 3 applies equally to the flowchart of FIG. 4, even where it is not repeated below.

In step 41, the category classification unit 110 may receive an input image.

In step 42, the category classification unit 110 may use a neural network to classify the input image received in step 41 into one of a plurality of categories. The neural network may be a deep-learning model trained in advance to classify images into the plurality of categories. When an image is divided by a predetermined method into two or more regions including a first region and a second region, the plurality of categories may include a first category, indicating that an object is included in the first region, and a second category, indicating that an object is included in the second region.

In step 43, the detection unit 120 detects objects in the object detection target region determined by the classification result of step 42. If the input image is classified into the first category in step 42, the detection unit 120 may detect objects within the first region of the input image in step 43; if the input image is classified into the second category, the detection unit 120 may detect objects in the second region.

According to an embodiment, the plurality of categories used by the category classification unit 110 in step 42 may further include a third category indicating that the image contains no object. In step 43, when the input image is classified into the third category, the detection unit 120 may determine that there is no object detection target region and end the process without detecting objects in the image.

According to an embodiment, the plurality of categories used in step 42 may include a fourth category indicating that objects are included in both the first region and the second region. When the image is divided into two regions, the first and second regions, the fourth category means that objects are included in all divided regions of the input image. When the input image is classified into the fourth category, the object detection target region may be determined to be the first and second regions; with a two-region split, this is the entire input image. In step 43, when the input image is classified into the fourth category, the detection unit 120 may therefore detect objects in the first and second regions, that is, in the entire image.
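Steps 42 and 43 for the four categories can be sketched as a lookup from category number to detection target regions, followed by detection restricted to those regions. This is a minimal sketch, assuming a left/right split at `width // 2` and treating the classifier and the region detector as caller-supplied callables; `target_regions` and `detect_objects` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive (assumed convention)
Image = Sequence[Sequence[float]]


def target_regions(category: int, width: int, height: int) -> List[Box]:
    """Map a category number (first to fourth, as numbered in this description)
    to the object detection target regions for a left/right two-region split."""
    half = width // 2
    if category == 1:
        return [(0, 0, half, height)]      # first (left) region
    if category == 2:
        return [(half, 0, width, height)]  # second (right) region
    if category == 3:
        return []                          # no object expected: skip detection
    if category == 4:
        return [(0, 0, width, height)]     # both regions, i.e. the whole image
    raise ValueError(f"unknown category: {category}")


def detect_objects(
    image: Image,
    classify: Callable[[Image], int],
    detect_in_region: Callable[[Image, Box], List[Box]],
) -> List[Box]:
    """Step 42 (classify the image), then step 43 (detect only inside the target regions)."""
    height, width = len(image), len(image[0])
    category = classify(image)
    found: List[Box] = []
    for region in target_regions(category, width, height):
        found.extend(detect_in_region(image, region))
    return found


image = [[0.0] * 10 for _ in range(4)]
print(detect_objects(image, classify=lambda img: 1,
                     detect_in_region=lambda img, region: [region]))
```

For the third category the detector is never invoked, which is where the bulk of the saving comes from on frames without objects.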

Before step 42 is performed, the neural network included in the category classification unit 110 may be trained on training data that includes input images already labeled with the plurality of categories.
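The document does not say how the training images are labeled. One plausible approach, assumed here for illustration only, is to derive each image's category from ground-truth bounding boxes. In the sketch below, a box is counted toward every half it overlaps, so an object straddling the center line yields the fourth category; that rule and the `category_from_boxes` name are assumptions.

```python
def category_from_boxes(boxes, width):
    """Derive the category number (first to fourth, as numbered in this description)
    from ground-truth boxes (x0, y0, x1, y1), x1 exclusive, for a split at width // 2.

    Assumption: a box counts toward every half it overlaps.
    """
    half = width // 2
    in_left = any(box[0] < half for box in boxes)    # overlaps the first (left) region
    in_right = any(box[2] > half for box in boxes)   # overlaps the second (right) region
    if in_left and in_right:
        return 4  # objects in both regions
    if in_left:
        return 1  # object only in the first region
    if in_right:
        return 2  # object only in the second region
    return 3      # no object


print(category_from_boxes([(40, 0, 60, 50)], width=100))
```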

The first to fourth categories described above are the categories that can be defined when an image is divided into two halves, the first region and the second region. When an image is divided into three or more regions according to another embodiment, five or more different categories can be defined depending on whether an object is included in each region, and the category classification unit 110 may classify the input image into one of these five or more categories.
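If each category is taken to be one presence/absence pattern over the regions (an assumption; the document only states that five or more categories are possible), then $N$ regions give $2^N$ categories: 4 for two regions, 8 for three. The sketch enumerates the patterns with the standard library.

```python
from itertools import product


def region_categories(num_regions):
    """Enumerate every presence pattern: element i is True if region i contains an object.

    For two regions: (True, False) ~ first category, (False, True) ~ second,
    (False, False) ~ third (no object), (True, True) ~ fourth (all regions).
    """
    return list(product((False, True), repeat=num_regions))


for n in (2, 3, 4):
    print(n, len(region_categories(n)))
```

The exponential growth means the softmax layer needs $2^N$ output nodes under this scheme, which is one reason to keep the number of regions small.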

Hereinafter, embodiments of the present invention are described assuming the basic case in which the image is divided into two regions, a first region and a second region. The embodiments described below apply equally when the image is divided into three or more regions. With a two-region split, the category classification unit 110 may classify the input image into one of: a first category, in which the first region contains an object and the second region does not; a second category, in which the second region contains an object and the first region does not; a third category, in which neither region contains an object; and a fourth category, in which both regions contain objects.

FIG. 5 shows examples of input images.

Various example input images are shown in FIG. 5.

FIG. 5(a) shows an example input image containing an object p1, (b) shows an example input image containing an object p2, (c) shows an example input image containing objects p3 and p4, and (d) shows an example input image containing no object.

Referring to FIG. 5, the category classification unit 110 may classify the category according to whether objects are included in a first region 51 and a second region 52, the two regions obtained by dividing the input image 50 into left and right halves. The first region 51 may refer to the left region and the second region 52 to the right region of this two-way split.
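A left/right split also requires mapping detections made inside a crop back to full-image coordinates. The sketch below assumes an image stored as a list of pixel rows and gives the extra column of an odd-width image to the right region; both choices, and the names `split_left_right` and `to_full_image`, are illustrative assumptions.

```python
def split_left_right(image):
    """Split an image (list of pixel rows) into (first_region, second_region, x_offset).
    The left half is the first region; with an odd width the extra column goes right."""
    half = len(image[0]) // 2
    left = [row[:half] for row in image]
    right = [row[half:] for row in image]
    return left, right, half


def to_full_image(box, x_offset):
    """Shift a box (x0, y0, x1, y1) found in the right-half crop back to full-image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 + x_offset, y0, x1 + x_offset, y1)


image = [list(range(5)) for _ in range(2)]
left, right, offset = split_left_right(image)
print(left[0], right[0], to_full_image((0, 0, 2, 2), offset))
```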

The category classification unit 110 may classify the input image 50 into one of the first to fourth categories according to whether objects are included in the first region 51 and the second region 52. Referring to FIG. 5, the category classification unit 110 may classify the image of FIG. 5(a) into the first category, meaning that an object is included in the first region; the image of (b) into the second category, meaning that an object is included in the second region; the image of (c) into the fourth category, meaning that objects are included in all regions of the image (the first and second regions in FIG. 5); and the image of (d) into the third category, meaning that no object is included.

Based on the result that the image of FIG. 5(a) was classified into the first category, the detection unit 120 may detect objects within the first region. Likewise, the detection unit 120 may detect objects in the second region for the image of FIG. 5(b), classified into the second category; detect objects over the entire image for the image of (c), classified into the fourth category; and skip object detection for the image of (d), classified into the third category.

FIG. 6 shows a method of detecting an object in the first region 51 of the image shown in FIG. 5(a).

Referring to FIG. 6, the detection unit 120 may detect the object p1 in the first region 51 of the input image 50, which has been classified into the first category.

According to an example, the candidate region detection unit 121 may detect candidate regions 61 and 62, in which an object is estimated to exist, within the first region 51. To detect candidate regions, the candidate region detection unit 121 may use a selective search algorithm, a region proposal network algorithm, or the like.
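Whatever proposal method is used, the key point here is that candidates are generated only inside the target region. The sketch below uses a fixed-size sliding window purely as a simple stand-in to show that restriction; it is not an implementation of selective search or of a region proposal network, and the window size and step are arbitrary illustrative parameters.

```python
def sliding_window_candidates(region, win_w, win_h, step):
    """Yield fixed-size windows (x0, y0, x1, y1), x1/y1 exclusive, lying fully inside `region`.

    A deliberately simple stand-in for a proposal method such as selective search
    or a region proposal network; it does not implement either of them.
    """
    rx0, ry0, rx1, ry1 = region
    for y in range(ry0, ry1 - win_h + 1, step):
        for x in range(rx0, rx1 - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


first_region = (0, 0, 8, 4)  # e.g. the left half of an 16x4 image
print(list(sliding_window_candidates(first_region, win_w=4, win_h=4, step=2)))
```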

The object detection unit 122 may detect objects within each of the candidate regions 61 and 62 detected by the candidate region detection unit 121. The object detection unit 122 may use a machine learning algorithm to detect objects.

The object detection unit 122 may process every candidate region to detect objects corresponding to the detection target, and as a result may detect such an object in at least some of the candidate regions. In the example of FIG. 6, the detection target may be a person, and the object detection unit 122 may detect the object p1, corresponding to a person, in candidate region 61. However, the object detection unit 122 may not detect any object corresponding to a person in candidate region 62.

Because it can be important in object detection not to miss any object, the candidate region detection unit 121 may set a low threshold for detecting candidate regions. The candidate region detection unit 121 may then detect a large number of candidate regions, some of which may not actually contain an object, that is, noise. Candidate region 62 in FIG. 6 is an example of such noise.

FIG. 7 shows an example of an output image.

In detail, FIG. 7 shows an example of an output image that the detection unit 120 may output after the object p1 is detected in the input image shown in FIG. 5(a).

Referring to FIG. 7, the detection unit 120 may output an output image 70 in which a bounding box 71 surrounding the object p1 is displayed at the position where the object p1 was detected. The output image 70 may be sent to and displayed on the output device 200, but is not necessarily limited thereto.
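
A minimal sketch of how a bounding box such as box 71 might be drawn onto the output image, assuming a grayscale image stored as a list of rows and a box given in half-open pixel coordinates that lie inside the image. A real system would typically use an imaging library; the function name and the default pixel value are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, inside the image
Image = List[List[int]]          # grayscale pixels, image[y][x]


def draw_bounding_box(image: Image, box: Box, value: int = 255) -> Image:
    """Return a copy of `image` with a 1-pixel outline of `box` drawn in `value`."""
    out = [row[:] for row in image]  # leave the input image unchanged
    x0, y0, x1, y1 = box
    for x in range(x0, x1):
        out[y0][x] = value           # top edge
        out[y1 - 1][x] = value       # bottom edge
    for y in range(y0, y1):
        out[y][x0] = value           # left edge
        out[y][x1 - 1] = value       # right edge
    return out


if __name__ == "__main__":
    blank = [[0] * 8 for _ in range(6)]
    for row in draw_bounding_box(blank, (1, 1, 5, 4), value=1):
        print("".join(str(p) for p in row))
```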

FIG. 8 illustrates the neural network model of the category classification unit 110 according to an embodiment.

Referring to FIG. 8, the category classification unit 110 may include a convolution layer 111 that filters the input image and a pooling layer 112 that reduces the dimensions of the output data of the convolution layer 111. The category classification unit 110 may include two or more convolution layers and pooling layers arranged alternately. For example, the category classification unit 110 may further include a convolution layer 113 and a pooling layer 114.

The category classification unit 110 may further include a fully connected layer (FCL) 115 connected to the output data of the pooling layer 114, and a softmax layer 116 applied to the output data of the FCL 115. The category classification unit 110 may further include an output layer 117 that is connected to the softmax layer 116 and finally outputs the category classification information.
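
To make the layer ordering concrete, the sketch below traces tensor shapes through convolution layer 111, pooling layer 112, convolution layer 113, pooling layer 114, FCL 115, and softmax layer 116. The description does not give kernel sizes, channel counts, or the input resolution, so the 3x3 convolutions, 2x2 pooling windows, 8 and 16 channels, 32x32 RGB input, and an FCL that maps directly to one output per category are all assumptions.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def out_size(n: int, kernel: int, stride: int) -> int:
    """Spatial output size of a valid (unpadded) convolution or pooling window."""
    return (n - kernel) // stride + 1


def trace_classifier(in_shape: Tuple[int, int, int], num_categories: int = 4,
                     pool_stride: int = 1) -> List[Tuple[str, Shape]]:
    """Trace (channels, height, width) through the assumed FIG. 8 layer stack."""
    c, h, w = in_shape
    trace: List[Tuple[str, Shape]] = [("input", (c, h, w))]
    for conv_name, pool_name, out_ch in (("conv111", "pool112", 8),
                                         ("conv113", "pool114", 16)):
        c, h, w = out_ch, out_size(h, 3, 1), out_size(w, 3, 1)  # 3x3 conv
        trace.append((conv_name, (c, h, w)))
        h, w = out_size(h, 2, pool_stride), out_size(w, 2, pool_stride)  # 2x2 pool
        trace.append((pool_name, (c, h, w)))
    flat = c * h * w                                    # flattened feature vector
    trace.append(("fcl115", (flat, num_categories)))    # FCL weight matrix shape
    trace.append(("softmax116", (num_categories,)))     # one node per category
    return trace


if __name__ == "__main__":
    for name, shape in trace_classifier((3, 32, 32)):
        print(name, shape)
```

With stride-1 pooling the FCL sees a 16x26x26 feature map (10816 inputs); with stride 2 it would see only 16x6x6 (576 inputs), which is one way to read the description's emphasis on letting the FCL relate all pixel positions.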

The pooling layer 112 may use a stride value of 1 so that all information in the image is referenced without any of it being missed. By including, as a hidden layer that learns category classification information from the training data, an FCL in which all nodes are fully connected, the category classification unit 110 can take into account the relationships among all pixels.
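
The effect of the pooling stride can be seen with a plain-Python max-pooling sketch (single channel, no padding; the 2x2 window is an assumption). With stride 1 the windows overlap, so a 4x4 map only shrinks to 3x3; with stride 2 the same map shrinks to 2x2 and three quarters of the spatial positions are discarded.

```python
from typing import List

Grid = List[List[float]]


def max_pool2d(x: Grid, kernel: int = 2, stride: int = 1) -> Grid:
    """Valid (unpadded) 2-D max pooling over a single-channel feature map."""
    h, w = len(x), len(x[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    return [[max(x[i * stride + di][j * stride + dj]
                 for di in range(kernel) for dj in range(kernel))
             for j in range(out_w)]
            for i in range(out_h)]


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    print(max_pool2d(fmap, stride=1))  # prints: [[6, 7, 8], [10, 11, 12], [14, 15, 16]]
    print(max_pool2d(fmap, stride=2))  # prints: [[6, 8], [14, 16]]
```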

The softmax layer 116 may include as many nodes as there are categories that the category classification unit 110 can classify. Each node may correspond to the probability of one category.

The output layer 117 may output the information corresponding to the node with the highest probability in the softmax layer 116.
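
The softmax layer 116 and the output layer 117 can be sketched in a few lines: softmax turns one score per category into probabilities that sum to 1, and the output layer reports the category with the highest probability. The logits and the category labels are illustrative; the labels follow the numbering used in the FIG. 5 example.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative labels for the two-region example (numbering as in FIG. 5).
CATEGORY_NAMES = ("first: object in first region only",
                  "second: object in second region only",
                  "third: no object",
                  "fourth: objects in both regions")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert per-category scores into probabilities (one node per category)."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(logits: Sequence[float]) -> Tuple[int, List[float]]:
    """Return the index of the most probable category and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs


if __name__ == "__main__":
    best, probs = output_layer([2.0, 0.5, -1.0, 0.1])
    print(CATEGORY_NAMES[best], [round(p, 3) for p in probs])
```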

According to the embodiments of the present invention described above, the input image is first classified into a category according to which of the regions into which it is divided contains an object, and the second-stage computation for object detection is then performed only within the region(s) estimated, from the classification result, to contain an object. Therefore, compared with unconditionally detecting objects over the entire input image regardless of category, the time required for the object detection computation is significantly reduced.

For example, the candidate region detection algorithm may be set permissively so that hundreds to thousands of candidate regions are detected in a single image. In that case, even if an object exists only in the first region, a large number of candidate regions may also be detected in the second region. All candidate regions detected in the second region are then noise, and processing each of them to detect an object is unnecessary work.

According to the embodiments of the present invention described above, when the input image is classified into the category in which only the first region, corresponding to half of the input image, contains an object, candidate regions are detected only in the first region. Because the number of candidate regions itself is then roughly halved, the computation needed to actually detect objects within the candidate regions is about half of that required when objects are detected over the entire image.
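
The "roughly half" figure can be checked with simple arithmetic, assuming a hypothetical candidate generator that places windows on a fixed grid (64x128 windows, stride 32, 640x480 image). Restricting the grid to the left half yields slightly fewer than half the candidates, because the column of windows that straddles the boundary is no longer generated. With content-driven methods such as selective search the actual counts depend on the image, so this only illustrates the proportional saving.

```python
def window_count(width: int, height: int, win_w: int, win_h: int, stride: int) -> int:
    """Number of grid-placed windows that fit entirely inside a width x height area."""
    nx = (width - win_w) // stride + 1
    ny = (height - win_h) // stride + 1
    return max(nx, 0) * max(ny, 0)


if __name__ == "__main__":
    full = window_count(640, 480, 64, 128, 32)  # whole image
    half = window_count(320, 480, 64, 128, 32)  # first region only
    print(full, half, round(half / full, 3))    # prints: 228 108 0.474
```

If the per-candidate detection cost is roughly constant, total detection work scales with the candidate count, so the same ratio applies to the computation.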

If the input image is classified into the category indicating that it contains no object, the process ends, or the next input image is received, without performing the actual object detection computation at all, so the computational load is significantly reduced.
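
The overall two-stage flow, including the early exit for an image with no object, can be sketched as a dispatch from category to object detection target region(s). The `Category` enum, the left/right split, and the `detect` callback are assumptions for illustration; the numbering follows the FIG. 5 example (third = no object, fourth = objects in both regions).

```python
from enum import Enum
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


class Category(Enum):
    FIRST = 1   # object in the first (left) region only
    SECOND = 2  # object in the second (right) region only
    THIRD = 3   # no object anywhere
    FOURTH = 4  # objects in both regions


def target_regions(category: Category, width: int, height: int) -> List[Box]:
    """Map a classification result to the region(s) that need detection."""
    if category is Category.FIRST:
        return [(0, 0, width // 2, height)]
    if category is Category.SECOND:
        return [(width // 2, 0, width, height)]
    if category is Category.FOURTH:
        return [(0, 0, width, height)]  # whole image
    return []                           # THIRD: skip detection entirely


def process_frame(category: Category, width: int, height: int,
                  detect: Callable[[Box], List[Box]]) -> List[Box]:
    """Run the (hypothetical) detector only inside the target region(s)."""
    detections: List[Box] = []
    for region in target_regions(category, width, height):
        detections.extend(detect(region))
    return detections


if __name__ == "__main__":
    calls: List[Box] = []

    def fake_detect(region: Box) -> List[Box]:  # records which regions were scanned
        calls.append(region)
        return []

    process_frame(Category.THIRD, 640, 480, fake_detect)
    print(calls)  # prints: []  (no detection work for a no-object image)
```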

In addition, because candidate regions are selected within the object detection target region and objects are detected only within those candidate regions, the computational load is lower than when objects are detected by referring to the whole image of the object detection target region.

Meanwhile, the object detection method according to the embodiment of the present invention shown in FIG. 4 may be written as a computer-executable program and implemented on a general-purpose digital computer that runs the program from a computer-readable recording medium. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks) and optically readable media (e.g., CD-ROMs, DVDs).

Meanwhile, the block diagrams shown in FIGS. 2 and 3 show only the components related to the present embodiment so as not to obscure its features; those of ordinary skill in the art will understand that other general-purpose components may be included in addition to those shown.

In addition, the block diagrams shown in FIGS. 2 and 3 may correspond to, or include, at least one processor. A device corresponding to the block diagrams may operate in a form embedded in another hardware device, such as a microprocessor or a general-purpose computer system.

Although the present invention has been described with reference to an embodiment shown in the drawings, this is merely illustrative, and those of ordinary skill in the art will understand that various modifications and variations of the embodiment are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

이와 같이 본 발명은 도면에 도시된 일 실시예를 참고로 하여 설명하였으나 이는 예시적인 것에 불과하며 당해 분야에서 통상의 지식을 가진 자라면 이로부터 다양한 변형 및 실시예의 변형이 가능하다는 점을 이해할 것이다. 따라서, 본 발명의 진정한 기술적 보호 범위는 첨부된 특허청구범위의 기술적 사상에 의하여 정해져야 할 것이다.BRIEF DESCRIPTION OF THE DRAWINGS The present invention is capable of various modifications and various embodiments, and specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention and methods of achieving them will be apparent with reference to the embodiments described in detail below with reference to the drawings. However, the present invention is not limited to the embodiments described below, but may be implemented in various forms.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings, wherein like reference numerals refer to like or corresponding components throughout the drawings, and a duplicate description thereof will be omitted .
In the following embodiments, the terms first, second, and the like are used for the purpose of distinguishing one element from another element, not the limitative meaning. In the following examples, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. In the following embodiments, terms such as inclusive or possessive are intended to mean that a feature, or element, described in the specification is present, and does not preclude the possibility that one or more other features or elements may be added. In the drawings, components may be exaggerated or reduced in size for convenience of explanation. For example, the size and thickness of each component shown in the drawings are arbitrarily shown for convenience of explanation, and thus the present invention is not necessarily limited to those shown in the drawings.
1 illustrates an object detection system according to one embodiment.
Referring to FIG. 1, an object detection system according to an embodiment includes an object detection apparatus 100 for detecting an object from an input image. The object detection system according to one embodiment includes a display device 200 for displaying an output image output from the object detection apparatus 100 and a database 300 for storing information output from the object detection apparatus 100 .
In one embodiment, the object means an object to be detected in the image. The object detection system may include information about an object to be detected.
2 is a block diagram schematically showing the configuration of an object detecting apparatus 100 according to an embodiment.
Referring to FIG. 2, the object detecting apparatus 100 may include a category classifying unit 110, a detecting unit 120, and may further include a video output unit 130.
The category classification unit 110 receives an input image and can classify the input image into one of a plurality of categories. The category classifying unit 110 may include a neural net model that is learned to classify images into a plurality of predetermined categories. The category classification unit 110 can classify the input image into any one of a plurality of predetermined categories by using the learned neural network model.
The category classified by the category classification unit 110 can be classified according to which area the object is included when the image is divided into a plurality of areas. For example, a category according to an embodiment may include a category, which means that when an image is divided into two or more regions including a first region and a second region according to a predetermined method, 1 category, and a second category indicating a case where an object is included in the second area.
The detection unit 120 receives the category classification result of the input image and the input image by the category classification unit 110 and detects the object in the object detection target area that is at least a partial region of the input image based on the category classification result have.
The object detection subject area may be determined based on the category classification result. For example, when the input image is classified into the first category, the detection unit 120 may detect the object in the first area of the input image. The detection unit 120 may detect an object in a second region of the input image when the input image is classified into the second category.
According to one embodiment, the region information expected to exist before the object is actually obtained, and the object can be detected only within such region, before substantially detecting the object in the input image. According to this, the operation speed is remarkably reduced as compared with the case where the object is detected in the entire area of the input image.
3 is a block diagram schematically showing the configuration of the detection unit 120 according to one embodiment.
Referring to FIG. 3, the detection unit 120 may include an object detection unit 122 for detecting an object in an object detection area, and may include a candidate area detection unit 121 for performing a preliminary operation for detecting an object, And an object classification unit 123 for classifying the objects.
The candidate region detection unit 121 can detect a candidate region in which an object is expected to exist in the object detection target region.
The object detecting unit 122 may detect an object included in each of the candidate regions detected by the candidate region detecting unit 121. The candidate region may include noise, and an object may not be detected in at least some candidate regions.
The object classifying unit 123 classifies the objects detected by the object detecting unit 122 and can provide additional information to the objects. For example, the object classification unit 123 can classify the detected object as a vehicle or a person, and assign object classification information to the object.
According to one embodiment, the object detecting unit 122 may output data including the detected object information. For example, the object detection unit 122 may output an output image in which the detected object information is displayed on the input image. Alternatively, the object detection unit 122 may output positional information of the detected object or metadata including the object classification information.
The object detection unit 122 may output the output image to the output device 200 or may store the output image and / or metadata in the database 300. [
The output image may be a predetermined display element added to the object in the input image. The display element may be an element consisting of a point, a line, and / or a face displayed at a position where the object is detected so as to identify the position of the object. The display element may be an element that displays information of the object in the input image so as to identify the classification information of the object. For example, the output image may include a bounding box marked to surround the object at the location where the object was detected in the input image. The output image may be one in which the classification information of the object is displayed in a text and / or a predetermined graphic form at a position where the object is detected in the input image or at a predetermined specific position. The figure representing the classification information of the object may include, for example, a figure formed by a person, a figure shaped by a vehicle, or the like.
In the following, reference will be made to the configurations shown in Figs. 1 to 3 when explaining embodiments of the present invention.
4 is a flowchart schematically illustrating an object detection method according to an embodiment.
The object detection method shown in FIG. 4 can be performed in the object detection apparatus 100 shown in FIGS. 1 to 3. The contents of the present invention described above with reference to FIGS. 1 to 3 will be described with reference to FIG. 4 It is equally applicable to the flowchart shown in Fig.
In step 41, the category classifying unit 110 may receive an input image.
In operation 42, the category classification unit 110 may classify the input image received in operation 41 into one of a plurality of categories using a neural network. The neural network may be a deep-learning model previously learned to classify images into a plurality of categories. The plurality of categories includes a first category indicating that the object is included in the first area when the image is divided into two or more areas including the first area and the second area according to a predetermined method, May be included in the second category.
In step 43, the detection unit 120 detects the object in the object detection subject area determined according to the result classified in step 42. [ If the input image is classified into the first category in step 42, the detection unit 120 may detect the object in the first area of the input image in step 43. If the input image is classified into the second category, Can detect the object in the second area.
According to one embodiment, the plurality of categories classified by the category classifying unit 110 in step 42 may further include a third category indicating that no object is included in the image. In step 43, the detection unit 120 can determine that the object detection target area does not exist when the input image is classified into the third category, and terminate the process without detecting the object in the image.
According to one embodiment, the plurality of categories classified in step 42 may include a fourth category, which means that objects are included in both the first area and the second area. When the image is divided into the first area and the second area, the fourth category may mean that the object is included in all divided areas of the input image. When the input image is classified into the fourth category, the object detection subject area may be determined as the first area and the second area. When the image is divided into the first area and the second area, the object detection area may be determined as the entire area of the input image. In step 43, the detection unit 120 may detect the object in the first area, the second area, or the entire area of the image when the input image is classified into the fourth category.
On the other hand, before the step 42 is performed, the neural network included in the category classification unit 110 can be learned by learning data including input images pre-classified into a plurality of categories.
The above-described first to fourth categories are types of categories that can be set when the video is divided into a first region and a second region. According to another embodiment, when dividing an image into three or more regions, it is possible to set five or more different categories depending on whether the object is included in each of three or more regions, and the category classification unit 110 may classify the input image into five or more Or a category.
Hereinafter, embodiments of the present invention will be described on the assumption that the image is divided into a first area and a second area. However, the embodiments described below are equally applicable to the case of dividing an image into three or more regions. When the image is divided into the first area and the second area, the category classifying unit 110 classifies the input image into a first category corresponding to the case where the object is included in the first area and an object is not included in the second area, A third category, a first area, and a third category corresponding to the case where an object is included in the second category, the first area and the second area corresponding to the case where the object is included in the second area and the object is not included in the first area, And a fourth category corresponding to the case where no objects are included in the second area.
5 shows an example of an input image.
An example of various input images is shown in Fig.
5 (a) shows an example of an input image including an object p1, (b) shows an example of an input image including an object p2, (c) p4), and (d) shows an example of an input image that does not include an object.
Referring to FIG. 5, the category classifying unit 110 classifies the input image 50 into a first region 51, which is one of two regions when the input image 50 is divided into left and right regions, and a second region 52, Categories can be categorized depending on whether the object is included or not. The first area 51 may be a left area when an image is divided into two areas, and the second area 52 may be a right area.
The category classifying unit 110 classifies the input image 50 into any one of the first category to the fourth category according to whether an object is included in the first area 51 and the second area 52 of the input image 50 Can be classified. 5, the category classifying unit 110 classifies the image shown in (a) of FIG. 5 into a first category indicating that an object is included in the first region, (C) is a second category indicating that the object is included in the region, and the fourth category, which means that the object is included in all the regions included in the image (the first region and the second region in Fig. 5) Category, the image shown in (d) can be classified into a third category which means that the object is not included.
The detection unit 110 can detect the object in the first area by referring to the result of classification of the image shown in (a) of FIG. 5 into the first category. The detection unit 110 can detect the object in the second area with reference to the result of classification of the image shown in FIG. 5B into the second category, and the image shown in FIG. 5C can be classified into the fourth category It is possible to detect the object in the entire region of the image with reference to the result of the classification, and not to perform the object detection process with reference to the result of classifying the image shown in (d) into the third category.
FIG. 6 shows a method of detecting an object in the first area 51 of the image shown in FIG. 5 (a).
Referring to FIG. 6, the detection unit 120 may detect the object p1 in the first region 51 of the input image 50 classified into the first category.
According to an example, the candidate region detecting unit 111 may detect, within the first region 51, candidate regions 61 and 62 in which an object is estimated to exist. To detect candidate regions, the candidate region detecting unit 111 may use a selective search algorithm, a region proposal network algorithm, or the like.
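Selective search and region proposal networks are too involved for a short example, so the sketch below swaps in a plain sliding-window generator purely to show how candidate generation can be confined to the region chosen by the classifier. `sliding_window_candidates`, the square window, and the step size are illustrative assumptions, not the patent's method.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive

def sliding_window_candidates(region: Box, win: int, step: int) -> Iterator[Box]:
    """Stand-in proposal generator: yield square windows of size `win`,
    moved by `step` pixels, that lie entirely inside `region`."""
    x0, y0, x1, y1 = region
    for y in range(y0, y1 - win + 1, step):
        for x in range(x0, x1 - win + 1, step):
            yield (x, y, x + win, y + win)

# Only the left half of a 640x480 frame is searched (first category).
left_half_candidates = list(sliding_window_candidates((0, 0, 320, 480), win=64, step=32))
```

Because windows never leave `region`, no candidates (and therefore no noise) are produced in the half the classifier has ruled out.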
The object detecting unit 112 can detect an object in each of the candidate regions 61 and 62 detected by the candidate region detecting unit 111. The object detecting unit 112 can detect an object using a machine learning algorithm.
The object detecting unit 112 may run the detection process for the target object type on all candidate regions and, as a result, may detect an object of that type in at least some of them. In the example of FIG. 6, the detection target may be a person, and the object detecting unit 112 may detect an object p1 corresponding to a person in the candidate region 61. However, the object detecting unit 112 may not detect any object corresponding to a person in the candidate region 62.
In object detection, it may be important not to miss any object. Therefore, the candidate region detecting unit 111 may set a low detection threshold for candidate regions. As a result, the candidate region detecting unit 111 may detect a large number of candidate regions, some of which may not actually contain an object, that is, noise. The candidate region 62 in FIG. 6 is an example of such noise.
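The second stage can be sketched as a filter over the candidates: every candidate is scored, and only those at or above a threshold are kept, so noise such as candidate region 62 is discarded. This is a minimal sketch assuming a generic `score_fn` that returns a probability for the target class; `detect_objects` is a hypothetical name, and the scores below are made-up values standing in for candidate regions 61 and 62, not outputs of the patented system.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

def detect_objects(candidates: List[Box],
                   score_fn: Callable[[Box], float],
                   threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Run an (assumed) classifier on every candidate and keep confident hits."""
    detections = []
    for box in candidates:
        score = score_fn(box)  # e.g. probability that the box contains a person
        if score >= threshold:
            detections.append((box, score))
    return detections

# Hypothetical scores: the first box plays candidate 61, the second candidate 62.
toy_scores = {(0, 0, 50, 100): 0.92, (200, 50, 260, 120): 0.08}
detections = detect_objects(list(toy_scores), lambda box: toy_scores[box])
# detections == [((0, 0, 50, 100), 0.92)]
```

Keeping the proposal threshold low and the detection threshold higher is what lets the first stage favor recall while the second stage removes the noise.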
FIG. 7 shows an example of an output image.
FIG. 7 shows an example of an output image output by the detection unit 120 after the object p1 is detected in the input image shown in FIG. 5(a).
Referring to FIG. 7, the detector 120 may output an output image 70 indicating a bounding box 71 surrounding the object p1 at a position where the object p1 is detected. The output image 70 may be output to the output device 200 and displayed on the output device 200, but is not limited thereto.
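Marking a detection as in FIG. 7 amounts to drawing the outline of the bounding box onto a copy of the input. The sketch below assumes a single-channel image stored as a NumPy array and a box given as (x0, y0, x1, y1) with exclusive right and bottom edges; `draw_bounding_box` is a hypothetical helper, not part of the source.

```python
import numpy as np

def draw_bounding_box(image: np.ndarray, box, value: int = 255,
                      thickness: int = 1) -> np.ndarray:
    """Return a copy of a grayscale image with the outline of `box` drawn in.
    `box` is (x0, y0, x1, y1) with x1, y1 exclusive; the input is not modified."""
    out = image.copy()
    x0, y0, x1, y1 = box
    t = thickness
    out[y0:y0 + t, x0:x1] = value  # top edge
    out[y1 - t:y1, x0:x1] = value  # bottom edge
    out[y0:y1, x0:x0 + t] = value  # left edge
    out[y0:y1, x1 - t:x1] = value  # right edge
    return out

frame = np.zeros((6, 6), dtype=np.uint8)
marked = draw_bounding_box(frame, (1, 1, 5, 5))
```

Returning a copy keeps the original frame available for the next processing step, while the marked copy goes to the output device.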
FIG. 8 illustrates a neural network model of the category classifier 110 according to an embodiment.
Referring to FIG. 8, the category classifier 110 may include a convolution layer 111 for filtering an input image, and a pooling layer 112 for compressing the dimensions of the output data of the convolution layer 111. The category classification unit 110 may include two or more convolution layers and pooling layers arranged alternately. For example, the category classification unit 110 may further include a convolution layer 113 and a pooling layer 114.
The category classification unit 110 may further include a fully connected layer (FCL) 115 connected to the output data of the pooling layer 114, and a softmax layer 116 connected to the output data of the fully connected layer 115. The category classification unit 110 may further include an output layer 117, connected to the softmax layer 116, which finally outputs the category classification information.
The pooling layer 112 may have its stride set to 1 so as not to miss any information in the image. As a hidden layer for learning category classification information from training data, the category classification unit 110 includes a fully connected layer in which all nodes are connected, so that the relationships among all pixels can be taken into account.
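The effect of the stride can be checked with simple arithmetic. The sketch below, assuming unpadded pooling along one axis, computes the pooled output length and the fraction of input positions that fall inside at least one window; both function names are hypothetical. With stride 1, windows overlap, every position is covered, and spatial resolution shrinks by only k - 1; a stride larger than the window size skips positions outright.

```python
def pooled_size(n: int, k: int, stride: int) -> int:
    """Output length of a size-k pooling window sliding over n positions (no padding)."""
    return (n - k) // stride + 1

def covered_fraction(n: int, k: int, stride: int) -> float:
    """Fraction of the n input positions lying inside at least one pooling window."""
    covered = set()
    for i in range(pooled_size(n, k, stride)):
        covered.update(range(i * stride, i * stride + k))
    return len(covered) / n

# A 14-pixel axis pooled with a 2-pixel window:
#   stride 1 -> 13 outputs, all 14 positions covered
#   stride 2 -> 7 outputs, all positions covered, but half the resolution
#   stride 3 -> 5 outputs, only 10 of 14 positions ever seen
```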
The softmax layer 116 may include as many nodes as the number of categories into which the category classifier 110 can classify images. Each node may correspond to the probability of its category.
The output layer 117 may output information corresponding to the node with the highest probability in the softmax layer 116.
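The layer stack of FIG. 8 can be sketched as an untrained forward pass in NumPy. This is an illustration under several assumptions the source does not state: 3x3 kernels, ReLU activations, 2x2 max pooling with stride 1, a single fully connected layer producing one logit per category, a 16x16 single-channel input, and random weights. The function names and the `params` layout are hypothetical.

```python
import numpy as np

def conv2d(x, w, b):
    """Valid 2-D convolution (cross-correlation) followed by ReLU (assumed).
    x: (C, H, W), w: (K, C, kh, kw), b: (K,) -> (K, H-kh+1, W-kw+1)."""
    K, C, kh, kw = w.shape
    _, H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return np.maximum(out, 0.0)

def max_pool(x, k=2, stride=1):
    """Max pooling per channel; stride 1 follows the description of pooling layer 112."""
    C, H, W = x.shape
    oh, ow = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.empty((C, oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[:, i, j] = x[:, r:r + k, c:c + k].max(axis=(1, 2))
    return out

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def classify(image, params):
    h = max_pool(conv2d(image, *params["conv1"]))          # conv 111 + pooling 112
    h = max_pool(conv2d(h, *params["conv2"]))              # conv 113 + pooling 114
    logits = params["fc_w"] @ h.ravel() + params["fc_b"]   # fully connected 115
    probs = softmax(logits)                                # softmax 116: one node per category
    return int(np.argmax(probs)) + 1, probs                # output 117: 1-based category

rng = np.random.default_rng(0)
params = {
    "conv1": (rng.normal(size=(4, 1, 3, 3)), np.zeros(4)),  # 16x16 -> 14x14, pool -> 13x13
    "conv2": (rng.normal(size=(8, 4, 3, 3)), np.zeros(8)),  # 13x13 -> 11x11, pool -> 10x10
    "fc_w": rng.normal(size=(4, 8 * 10 * 10)) * 0.01,
    "fc_b": np.zeros(4),
}
category, probs = classify(rng.random((1, 16, 16)), params)
```

With random weights the predicted category is meaningless; the point is the shape flow from image to one probability per category and the argmax in the output layer.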
According to the embodiments of the present invention described above, the input image, divided into a plurality of regions, is first classified into a category according to which of those regions contains an object; the second-stage computation for object detection is then performed only on the region estimated to contain an object according to the classification result. Therefore, the time required for object detection is significantly shortened compared with the case where objects are detected unconditionally over the entire input image regardless of category.
For example, the candidate region detection algorithm may be configured to detect hundreds to thousands of candidate regions from a single image. In that case, even if an object exists only in the first region, a large number of candidate regions may be detected in the second region. All candidate regions detected in the second region are then noise, and detecting objects in each of them is unnecessary work.
According to the above-described embodiments of the present invention, when the input image is classified into the category indicating that an object is included only in the first region, which corresponds to half of the input image, candidate regions are detected only in the first region. The computational load of detecting candidate regions and of actually detecting objects in them is therefore roughly halved compared with performing detection over the entire image.
If the input image is classified into the category indicating that it contains no object, the process ends without performing any object detection computation, or proceeds to the next input image, so the processing load is significantly reduced.
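The savings argued above can be expressed as a simple expected-cost model. The sketch assumes that second-stage cost scales linearly with the searched area and that classification adds a fixed per-image overhead; the category frequencies and costs in the usage lines are hypothetical numbers chosen for illustration, not measurements. `expected_cost` is a hypothetical name.

```python
from typing import Dict

# Fraction of the image searched by the second stage for each category
# (halves for categories 1 and 2, nothing for 3, the whole image for 4).
SEARCHED_FRACTION = {1: 0.5, 2: 0.5, 3: 0.0, 4: 1.0}

def expected_cost(category_freqs: Dict[int, float],
                  full_image_cost: float,
                  classify_cost: float) -> float:
    """Expected per-image cost of classify-then-detect, assuming detection
    cost is proportional to the searched area (a simplifying assumption)."""
    detect = sum(freq * SEARCHED_FRACTION[cat] * full_image_cost
                 for cat, freq in category_freqs.items())
    return classify_cost + detect

# Hypothetical stream: half the frames are empty, as is common for CCTV.
freqs = {1: 0.2, 2: 0.2, 3: 0.5, 4: 0.1}
cost = expected_cost(freqs, full_image_cost=100.0, classify_cost=5.0)
# about 35 units per frame, versus 100 for always searching the whole image
```

The model also shows the trade-off: the approach pays off only when the classifier's overhead is small relative to the detection work it avoids.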
In addition, since candidate regions are selected within the object detection target region and objects are detected only in those candidate regions, the computational load is lower than when objects are detected by examining the entire object detection target region.
Meanwhile, the object detection method according to an embodiment of the present invention shown in FIG. 4 can be implemented as a computer-executable program, and can be implemented in a general-purpose digital computer that runs the program from a computer-readable recording medium. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk, etc.) and optical reading media (e.g., CD-ROM, etc.).
FIGS. 2 and 3 show only the components related to the present embodiment so as not to obscure its features; those skilled in the art will understand that other general-purpose components may be included in addition to those shown in the figures.
Also, the block diagrams shown in FIGS. 2 and 3 may correspond to at least one processor or may include one or more processors. The devices corresponding to the block diagrams may be implemented embedded in other hardware devices, such as microprocessors or general-purpose computer systems.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the exemplary embodiments, and that various changes and modifications may be made therein without departing from the scope of the present invention. Accordingly, the true scope of the present invention should be determined by the technical idea of the appended claims.

Claims

classifying an input image into one of a plurality of categories using a neural network model trained to classify images into the plurality of categories; and
detecting an object in a first region, which is a part of the image, when the image is classified into a first category, and detecting an object in a second region, which is another part of the image, when the image is classified into a second category,
Object detection method.

The method according to claim 1,
wherein, when the image is divided according to a predetermined method into two or more regions including a first region and a second region, the categories include a first category indicating that an object is included in the first region and a second category indicating that an object is included in the second region,
Object detection method.

The method according to claim 2,
Wherein the category further includes a third category indicating that the image does not include the object,
wherein the detecting step does not detect an object in the image if the image is classified into the third category,
Object detection method.

The method according to claim 2,
Wherein the category further includes a fourth category indicating that the object is included in all of the divided areas,
Wherein the detecting step detects the object in the entire image if the image is classified into the fourth category,
Object detection method.

The method according to claim 1,
Wherein the first area is a left area when the image is divided into two areas and the second area is a right area,
Object detection method.

The method according to claim 1,
wherein the neural network model includes:
a convolution layer for filtering the input image, a pooling layer for compressing output data of the convolution layer, a fully connected layer (FCL) connected to output data of the pooling layer, and a softmax layer connected to output data of the fully connected layer,
wherein the softmax layer comprises a node corresponding to each of the plurality of categories,
Object detection method.

The method according to claim 6,
Wherein the pooling layer has a stride set to one,
Object detection method.

The method according to claim 1,
Wherein the detecting comprises:
detecting candidate regions; and
detecting an object included in each of the candidate regions,
Object detection method.

a category classifying unit configured to classify an input image into one of a plurality of categories using a neural network model trained to classify images into the plurality of categories; and
a detection unit configured to detect an object in a first region, which is a part of the image, when the image is classified into a first category, and to detect an object in a second region, which is another part of the image, when the image is classified into a second category,
Object detection device.

A computer program stored on a medium for causing a computer to execute the method of any one of claims 1 to 8.