KR20220083613A

KR20220083613A - Method and Apparatus for Detecting Object in Image

Info

Publication number: KR20220083613A
Application number: KR1020210175641A
Authority: KR
Inventors: 여건민; 김영일; 박성희; 정운철; 허태욱
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2020-12-11
Filing date: 2021-12-09
Publication date: 2022-06-20

Abstract

Provided are an object recognition method and apparatus that reduce the difficulty of recognizing objects that appear small in an image, allow the camera to be zoomed in quickly, and thereby greatly improve the accuracy of classifying objects in the image. The object recognition method includes: receiving an input image; performing object prediction on the input image to obtain a first object class (c), a first confidence (p), and a bounding box (b); calculating a first object area ratio in a first binarized image, obtained by binarizing the portion of the input image inside the bounding box with respect to a first threshold, and calculating a second object area ratio in a second binarized image, obtained by binarizing the same portion with respect to a second threshold; determining whether the input image is in focus based on the difference between the first and second object area ratios; when the input image is determined to be sharply focused, cropping the input image and enlarging the cropped portion to obtain an enlarged cropped image, and performing object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second confidence (p_m); and determining a final object class based on the first confidence (p) and the second confidence (p_m).

Description

Method and Apparatus for Detecting Object in Image

The present invention relates to a method and apparatus for recognizing objects. More particularly, it relates to a method and apparatus suited to recognizing small objects in images received from a camera or in wide-area surveillance images stored in a storage device.

Machine learning for determining the type and location of objects in an image is trained on diverse image data, and the performance of the trained model depends heavily on the quality of that training. Machine learning based on image analysis is time-consuming and requires very large datasets. At the time of filing of the present application, techniques for recognizing ground objects such as cars, people, or animals in camera images had improved considerably. This progress was driven by faster graphics processors and advances in machine learning.

Meanwhile, incursions by small drones into airports, public places, and other protected areas have recently heightened public concern. In particular, various technologies are being discussed for protecting lives and property from attacks by small unmanned aerial vehicles (UAVs), which can be repurposed for military use. Radar detection, detection based on image-signal analysis, and detection based on acoustic signatures have all been proposed for detecting small UAVs. However, because small UAVs are so small, none of these methods can be said to be effective.

Detection by image analysis in particular has not been an effective solution, unlike the detection of ground objects. Small UAVs are physically small and far from the camera, so they occupy very few pixels in the image. What is needed is a way to detect every intruding small UAV and to recognize each one accurately. One option is to use zoom lenses to enlarge the region a small UAV occupies in the image, and to deploy multiple cameras so that each covers a narrower surveillance area. However, the number of cameras required then grows rapidly, which increases the cost. In addition, a camera takes a relatively long time to reach its zoomed-in state, so zooming in may not complete within the time available.

The present invention addresses these problems by providing an object recognition method and apparatus that reduce the difficulty of recognizing objects that appear small in an image. The method and apparatus also allow the camera to be zoomed in quickly. They thereby greatly improve the accuracy of classifying objects in the image, so that small UAVs can be detected accurately.

An object recognition method according to one aspect of the present invention includes: receiving an input image; performing object prediction on the input image to obtain a first object class (c), a first confidence (p), and a bounding box (b); calculating a first object area ratio in a first binarized image, obtained by binarizing the portion of the input image inside the bounding box with respect to a first threshold, and calculating a second object area ratio in a second binarized image, obtained by binarizing the same portion with respect to a second threshold; determining whether the input image is in focus based on the difference between the first and second object area ratios; when the input image is determined to be sharply focused, cropping the input image and enlarging the cropped portion to obtain an enlarged cropped image, and performing object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second confidence (p_m); and determining a final object class based on the first confidence (p) and the second confidence (p_m).

Determining whether the input image is in focus may include: determining whether the difference between the first and second object area ratios is greater than a predetermined reference value; determining that the input image is not in focus if the difference is greater than the reference value; and determining that the input image is in focus if the difference is not greater than the reference value.
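The focus decision rule above reduces to a single comparison. The sketch below is a minimal illustration and does not reproduce the claimed implementation. The function name `is_in_focus` and the example reference value 0.05 are hypothetical. The difference is taken as the signed value Ratio₁ − Ratio₂, as in the detailed description.

```python
def is_in_focus(ratio1: float, ratio2: float, reference: float) -> bool:
    """Return True if the image is judged to be in focus.

    The image is judged out of focus when Ratio1 - Ratio2 exceeds the
    reference value, and in focus when the difference is not greater.
    """
    return (ratio1 - ratio2) <= reference


# Hypothetical values: a small gap passes, a large gap fails.
print(is_in_focus(0.40, 0.38, 0.05))  # True
print(is_in_focus(0.40, 0.20, 0.05))  # False
```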

In an embodiment, the input image may be received from a camera. In this case, determining whether the input image is in focus may further include supplying a control signal for focus adjustment to the camera when the input image is determined not to be in focus.

The control signal may include a zoom-in control command or an autofocus command.

Determining the final object class may further include applying first and second weights (w1, w2) to the first confidence (p) and the second confidence (p_m), respectively, and comparing the weighted values. The final object class is then determined according to the comparison result.

Determining the final object class may include the following two cases:
- If the second confidence (p_m) multiplied by the second weight (w2) is greater than the first confidence (p) multiplied by the first weight (w1), the second object class (c_m) is determined as the final object class.
- Otherwise, that is, if w2·p_m is not greater than w1·p, the first object class (c) is determined as the final object class.
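The weighted comparison described above can be written as a small selection function. This is a sketch only. The name `select_final_class` and the default weights of 1.0 are assumptions, because the document gives no values for w1 and w2. Ties go to the first object class (c), which matches the "not greater than" wording.

```python
def select_final_class(c: str, p: float, c_m: str, p_m: float,
                       w1: float = 1.0, w2: float = 1.0) -> str:
    """Pick the final class from the original-image and enlarged-image predictions.

    c, p     : class and confidence from the original input image
    c_m, p_m : class and confidence from the cropped-and-enlarged image
    w1, w2   : weights (the defaults here are placeholders)
    """
    if w2 * p_m > w1 * p:
        return c_m  # enlarged-image prediction wins
    return c        # otherwise keep the original prediction (ties included)


print(select_final_class("class_A", 0.48, "class_B", 0.91))          # class_B
print(select_final_class("class_A", 0.60, "class_B", 0.50, w2=1.5))  # class_B (0.75 > 0.60)
print(select_final_class("class_A", 0.60, "class_B", 0.50))          # class_A
```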

An object recognition apparatus according to one aspect of the present invention includes a memory that stores program instructions, and a processor that is connected to the memory and executes the program instructions stored in it. When executed by the processor, the program instructions may cause the processor to: receive an input image; perform object prediction on the input image to obtain a first object class (c), a first confidence (p), and a bounding box (b); calculate a first object area ratio in a first binarized image, obtained by binarizing the portion of the input image inside the bounding box with respect to a first threshold, and calculate a second object area ratio in a second binarized image, obtained by binarizing the same portion with respect to a second threshold; determine whether the input image is in focus based on the difference between the first and second object area ratios; when the input image is determined to be sharply focused, crop the input image and enlarge the cropped portion to obtain an enlarged cropped image, and perform object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second confidence (p_m); and determine a final object class based on the first confidence (p) and the second confidence (p_m).

According to an embodiment of the present invention, the difficulty of recognizing objects that appear small in an image can be reduced, and the camera can be zoomed in quickly. As a result, the accuracy of classifying objects in the image can be greatly improved. Every intruding small UAV can therefore be detected, and each one can be detected accurately.

In particular, according to an embodiment, image binarization allows various environmental conditions to be reflected, such as illuminance, image color, contrast, and object size. A sharply focused image is then secured based on the proportion of the bounding box occupied by the object. In this process, focus is judged from binarized images made with different thresholds, and the camera's focus is controlled accordingly, which can reduce the time spent on physical camera zoom-in. Object classes are extracted both from the original image and from an image regenerated by cropping and enlarging the frame, and whichever is likely to be more accurate is selected as the final object class. Classification accuracy can thus be significantly improved, so that every intruding small UAV can be detected and each one recognized accurately.

FIG. 1 is a screenshot showing an example of a failure to detect a small UAV in an image.
FIG. 2 is a screenshot showing another example of a failure to detect a small UAV in an image.
FIG. 3 is a view illustrating various appearances of a small UAV depending on its altitude, its distance from the camera, the shooting direction, and the tilt angle.
FIG. 4 is a schematic diagram of an object recognition system according to an embodiment of the present invention.
FIG. 5 is a functional block diagram of an object recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of bounding box setting.
FIG. 7 is a diagram illustrating an example of image binarization.
FIG. 8 is a diagram illustrating an example of an image obtained by cropping and enlarging an input image.
FIG. 9 is a physical block diagram of an object recognition apparatus according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating an object recognition method according to an embodiment of the present invention.
FIG. 11 is a detailed flowchart of the final classification determination step shown in FIG. 10.
FIG. 12 is a diagram illustrating the difference in object area ratio between the first binarized image and the second binarized image for an out-of-focus image.

The present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to those specific embodiments. It should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and scope of the present invention. Like reference numerals denote like components throughout the drawings.

Ordinal terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first component could be termed a second component, and likewise a second component could be termed a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of, or any one of, a plurality of related listed items.

When a component is described as being "coupled" or "connected" to another component, it may be directly coupled or connected to that other component, or intervening components may be present. In contrast, when a component is described as being "directly coupled" or "directly connected" to another component, no intervening components are present.

The terms used in this application are intended only to describe specific embodiments and are not intended to limit the present invention. Singular forms include plural forms unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art. They are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

When small UAVs are detected by image analysis, they are small and far from the camera, so they appear small in the image. As a result, the accuracy of object recognition and classification drops significantly. FIG. 1 shows an example of an object classification error when the object is very small. In the illustrated example, the classification confidence is very low at 0.482. In addition, the detected object is misclassified as the IRONMAN model even though it is not that model.

The camera could be zoomed in to enlarge the object before it is fed to the classification model. However, zooming in takes considerable time, so it may not complete within the time available. The target object may also become blurred, so the accuracy of recognition and classification may not improve. FIG. 2 shows an example of a classification error for an enlarged object. In the illustrated example, the classification confidence is a fairly high 0.987. Even so, the small UAV is misclassified as a DJI model although it is a different model. In particular, as shown in FIG. 3, the appearance of a small UAV can vary greatly with its altitude, its distance from the camera, the shooting direction, and the tilt angle. Detecting and classifying small UAVs in a blurred image is therefore difficult.

FIG. 4 is a schematic diagram of an object recognition system according to an embodiment of the present invention. The illustrated system includes a surveillance camera 10 (hereinafter simply "camera") and an object recognition apparatus 100. The apparatus 100 receives images acquired by the camera 10 and detects small UAV objects in them. The object recognition apparatus 100 may be installed near the camera 10. However, the present invention is not limited thereto. The apparatus 100 may instead be installed far from the camera 10, in which case it may be connected to the camera 10 over an IP network. In addition to images acquired by the camera, the apparatus 100 may also detect objects in images stored in internal or external storage devices.

FIG. 5 is a functional block diagram of the object recognition apparatus 100 according to an embodiment of the present invention. For convenience of description, the camera 10 is also shown in FIG. 5. The object recognition apparatus 100 includes the following units:
- a first object prediction unit 110;
- a focusing determination unit 120;
- a camera control unit 140;
- a crop/enlargement unit 150;
- a second object prediction unit 160;
- a final classification determination unit 170.
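The sketch below shows one way the units of FIG. 5 could be wired together for a single frame. It is an illustration under stated assumptions and does not reproduce the apparatus itself. Each unit is modeled as a hypothetical Python callable, and the `(x, y, width, height)` box format is assumed. A return value of `None` stands for "focus was requested; wait for the next frame".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]           # grayscale image as rows of pixel values (assumed)
Box = Tuple[int, int, int, int]   # (x, y, width, height), assumed convention


@dataclass
class Prediction:
    cls: str
    confidence: float
    box: Box


@dataclass
class Units:
    # One hypothetical callable per functional unit of FIG. 5.
    predict: Callable[[Image], Prediction]       # units 110 and 160 (same model)
    in_focus: Callable[[Image, Box], bool]       # focusing determination unit 120
    request_focus: Callable[[], None]            # camera control unit 140
    crop_enlarge: Callable[[Image, Box], Image]  # crop/enlargement unit 150
    w1: float = 1.0                              # weights used by unit 170
    w2: float = 1.0


def process_frame(frame: Image, u: Units) -> Optional[str]:
    """Run one frame through the pipeline; None means 'refocus and retry'."""
    first = u.predict(frame)                     # c, p, b from the input image
    if not u.in_focus(frame, first.box):
        u.request_focus()                        # e.g. zoom-in or autofocus command
        return None
    second = u.predict(u.crop_enlarge(frame, first.box))  # c_m, p_m
    if u.w2 * second.confidence > u.w1 * first.confidence:
        return second.cls
    return first.cls
```

Because the same `predict` callable serves as both prediction units, this mirrors the statement that the second unit may use the same model as the first.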

The first object prediction unit 110 may obtain an object class (c), a confidence (p), and a bounding box (b) from the input image received from the camera 10. As is well known, a bounding box is a rectangle that encloses the region where the object to be detected is most likely to be located, which narrows the search area. An example bounding box is shown in FIG. 6. In an embodiment, the first object prediction unit 110 may operate on the basis of a prediction model trained in advance. The prediction model may be built on, for example, a convolutional neural network (CNN), but the present invention is not limited thereto.
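A bounding box is usually handled as a rectangle in pixel coordinates. The sketch below assumes an `(x, y, width, height)` convention with the origin at the top-left corner, and a grayscale image stored as a list of rows. Both are assumptions, because the document does not fix a format. The hypothetical `region` helper clips the box to the image and returns the enclosed pixels. These are the pixels that the later binarization steps operate on.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # grayscale image as a list of rows (assumed layout)


@dataclass(frozen=True)
class BoundingBox:
    x: int       # left edge (column of the top-left corner)
    y: int       # top edge (row of the top-left corner)
    width: int
    height: int

    def clipped(self, img_width: int, img_height: int) -> "BoundingBox":
        """Return the part of the box that lies inside the image."""
        x0 = min(max(self.x, 0), img_width)
        y0 = min(max(self.y, 0), img_height)
        x1 = min(max(self.x + self.width, x0), img_width)
        y1 = min(max(self.y + self.height, y0), img_height)
        return BoundingBox(x0, y0, x1 - x0, y1 - y0)


def region(img: Image, box: BoundingBox) -> Image:
    """Pixels enclosed by the box, after clipping it to the image bounds."""
    b = box.clipped(len(img[0]), len(img))
    return [row[b.x:b.x + b.width] for row in img[b.y:b.y + b.height]]


img = [[10 * r + c for c in range(5)] for r in range(4)]
print(region(img, BoundingBox(1, 1, 2, 2)))  # [[11, 12], [21, 22]]
```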

The focusing determination unit 120 binarizes the input image and determines whether it is sharply focused, that is, whether the camera 10 that supplied the input image is properly focused. If the input image is determined not to be sharply focused, the focusing determination unit 120 may cause the camera control unit 140 to send a control signal to the camera 10 to change its focus.

In an embodiment, the focusing determination unit 120 may include the following units:
- a first image binarization unit 122;
- a first object ratio calculation unit 124;
- a second image binarization unit 126;
- a second object ratio calculation unit 128;
- an object ratio difference calculation unit 130;
- a determination unit 132.

The first image binarization unit 122 binarizes the input image to generate a first binarized image. FIG. 7 shows an example of image binarization. The first image binarization unit 122 may classify each pixel as '0' (black) or '1' (white) with respect to a first threshold T₁.
- When a pixel's luminance level is greater than T₁, the pixel takes the value '0' in the first binarized image. This indicates that the pixel belongs to the object (pixel On).
- When the luminance level is less than T₁, the pixel takes the value '1'. This indicates that the pixel lies outside the object (pixel Off).

The first object ratio calculation unit 124 then calculates an object area ratio Ratio₁ in the first binarized image. By Equation 1, Ratio₁ is the number of object pixels divided by the total number of pixels in the bounding box.
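The binarization and the object area ratio can be sketched as follows. The sketch follows the convention stated above: a pixel brighter than the threshold becomes '0', an object pixel. It computes the ratio as a plain fraction of object pixels over all pixels in the bounding box, because that is how the text describes Equation 1; the equation itself is not reproduced in this copy. The text does not say how to handle pixels exactly equal to the threshold, so they are treated as background here by assumption. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def binarize(region: Image, threshold: int) -> List[List[int]]:
    """Binarize the bounding-box region with the convention described in the text.

    Luminance above the threshold -> 0 (object pixel, "On");
    otherwise -> 1 (outside the object, "Off"). Values equal to the
    threshold are treated as background here, which is an assumption.
    """
    return [[0 if v > threshold else 1 for v in row] for row in region]


def object_area_ratio(binary: List[List[int]]) -> float:
    """Object pixels divided by all pixels in the bounding box."""
    total = sum(len(row) for row in binary)
    if total == 0:
        raise ValueError("empty bounding box")
    return sum(row.count(0) for row in binary) / total


box_pixels = [[10, 200], [150, 90]]
print(binarize(box_pixels, 100))                     # [[1, 0], [0, 1]]
print(object_area_ratio(binarize(box_pixels, 100)))  # 0.5
print(object_area_ratio(binarize(box_pixels, 160)))  # 0.25
```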

The second image binarization unit 126 binarizes the input image to generate a second binarized image. It may classify each pixel as '0' or '1' with respect to a second threshold T₂. As in Equation 2, T₂ may be set to differ from the first threshold T₁ by a fixed amount. That amount is written here as Δ, because the symbol in the original equation did not survive extraction:

$$T_2 = T_1 + \Delta \qquad (2)$$

The second object ratio calculation unit 128 calculates an object area ratio Ratio₂ in the second binarized image, again according to Equation 1. Ratio₂ is the number of object pixels divided by the total number of pixels in the bounding box.

The object ratio difference calculation unit 130 calculates the difference (Ratio₁ − Ratio₂) between the object area ratios output by the first and second object ratio calculation units 124 and 128. The determination unit 132 then uses this difference to determine whether the input image is sharply focused. In an embodiment, the determination unit 132 may determine that the camera 10 is not in focus if the object area ratio difference is greater than a specific reference value, that is, if Equation 3 is satisfied. The reference value is written here as δ, because the original symbol did not survive extraction:

$$\mathrm{Ratio}_1 - \mathrm{Ratio}_2 > \delta \qquad (3)$$

When the camera 10 is out of focus, the image fades out more than it does when the camera is in focus. Applying the second threshold T₂ therefore raises the proportion of pixels classified as outside the object. As a result, in an out-of-focus image the object area obtained with T₂ is markedly smaller than the object area obtained with T₁. This effect allows an out-of-focus (focus-out) condition to be detected numerically, as in Equation 3.
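The fade-out argument can be checked on a toy example. The sketch below compares a sharp one-row intensity profile with a blurred one. It uses the brighter-is-object convention described for the first binarization unit, with hypothetical values T₁ = 90 and an offset of 60 for T₂. These numbers are illustrative and are not measurements. Blurring spreads the object's edges into intermediate intensities, and the higher threshold strips those pixels away. The difference Ratio₁ − Ratio₂ therefore grows for the blurred profile and stays at zero for the sharp one.

```python
from typing import List

Image = List[List[int]]


def ratio(region: Image, threshold: int) -> float:
    """Share of pixels brighter than the threshold (object pixels)."""
    total = sum(len(row) for row in region)
    return sum(1 for row in region for v in row if v > threshold) / total


def ratio_gap(region: Image, t1: int, delta: int) -> float:
    """Ratio1 - Ratio2 with T2 = T1 + delta."""
    return ratio(region, t1) - ratio(region, t1 + delta)


sharp = [[0, 0, 0, 255, 255, 255, 255, 0, 0, 0]]          # hard edges
blurred = [[0, 40, 100, 160, 220, 220, 160, 100, 40, 0]]  # soft edges

print(ratio_gap(sharp, 90, 60))              # 0.0 (0.4 - 0.4)
print(round(ratio_gap(blurred, 90, 60), 6))  # 0.2 (0.6 - 0.4)
```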

If the camera 10 is not in focus, the determination unit 132 causes the camera control unit 140 to send a control signal for focus adjustment. It may then recalculate the object area ratio difference (Ratio₁ − Ratio₂) for each new image from the camera 10 and wait until Equation 4 is satisfied. In this way, the camera control unit 140 sends focus-adjustment control signals to the camera 10 under the control of the focusing determination unit 120. The control signal may include a zoom-in control command or an autofocus command.
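The wait-for-focus behavior can be sketched as a bounded retry loop. Everything about the camera here is hypothetical. The `Camera` protocol, its `grab_frame` and `send_control` methods, and the command string `"AUTO_FOCUS"` stand in for whatever SDK or network protocol a real camera exposes. The ratio-gap function is passed in; for example, it could be computed over a bounding box that is assumed to stay fixed between frames. A frame counts as in focus when the gap is not greater than the reference value. The retry limit is an added assumption so that the loop cannot wait forever.

```python
from typing import Callable, List, Optional, Protocol

Image = List[List[int]]


class Camera(Protocol):
    """Hypothetical camera interface; real SDKs and protocols will differ."""

    def grab_frame(self) -> Image: ...

    def send_control(self, command: str) -> None: ...


def wait_for_focus(camera: Camera,
                   ratio_gap: Callable[[Image], float],
                   reference: float,
                   command: str = "AUTO_FOCUS",
                   max_attempts: int = 10) -> Optional[Image]:
    """Send focus commands until a frame passes the in-focus test.

    Returns the first frame whose gap (Ratio1 - Ratio2) is not greater than
    `reference`, or None if `max_attempts` frames all fail.
    """
    for _ in range(max_attempts):
        frame = camera.grab_frame()
        if ratio_gap(frame) <= reference:
            return frame                # in focus: continue with crop/enlarge
        camera.send_control(command)    # out of focus: request refocus
    return None
```

Returning `None` leaves the caller to decide what happens next, for example falling back to the first prediction alone.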

On the other hand, when the determination unit 132 determines from the input image that the camera 10 is well focused, the crop/enlargement unit 150 crops the input image over the bounding box, or over a region wider than the bounding box, as shown in FIG. 8, and enlarges the cropped image. The magnification ratio can be expressed as [equation not reproduced]. The second object prediction unit 160 obtains an object class (c_m) and a reliability (p_m) from the cropped and enlarged image. The second object prediction unit 160 may operate on the same prediction model as the first object prediction unit 110. The final classification determination unit 170 assigns weights w1 and w2 to the reliability p generated by the first object prediction unit 110 and the reliability p_m generated by the second object prediction unit 160, respectively, and determines the final object class. In an embodiment, the final object class may be either the object class c determined by the first object prediction unit 110 or the object class c_m determined by the second object prediction unit 160.
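A sketch of the crop/enlarge step, assuming an end-exclusive `(x_min, y_min, x_max, y_max)` box, a hypothetical `margin` for cropping a region wider than the bounding box, and an integer scale factor with nearest-neighbour replication. The actual magnification-ratio expression is not reproduced in this text, and a practical system would more likely use an interpolating resize from an imaging library.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def crop_with_margin(image: Image, bbox: BBox, margin: int = 0) -> List[List[int]]:
    """Crop the bounding box, optionally widened by `margin` pixels on each
    side, clamped to the image borders."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = bbox
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(width, x1 + margin), min(height, y1 + margin)
    return [list(row[x0:x1]) for row in image[y0:y1]]


def enlarge_nearest(patch: Image, scale: int) -> List[List[int]]:
    """Enlarge by an integer factor using nearest-neighbour replication:
    every pixel becomes a scale x scale block."""
    out: List[List[int]] = []
    for row in patch:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out
```

Nearest-neighbour replication is used here only because it keeps the sketch dependency-free; it preserves the binarization-relevant intensities exactly but produces blocky edges.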

As described above, once sharp focus has been secured, the binarized image of the enlarged image is likely to retain the sharpness it had before enlargement, so the reliability of the final class decision can increase. In particular, operating with the weights w1 and w2 and adjusting each of them makes it possible to secure stable reliability that accounts for the performance of the learning model. In addition, image processing by binarization, cropping, and enlargement is faster than physical magnification by zooming the camera in, and can provide rapid detection of moving objects.

The object recognition apparatus 100 illustrated in FIG. 5 may be implemented by a general-purpose data processing apparatus having a processor and a memory. FIG. 9 shows an example physical configuration of the object recognition apparatus 100 according to an embodiment of the present invention. The object recognition apparatus 100 may include at least one processor 220, a memory 240, and a storage device 260. The components of the object recognition apparatus 100 may be connected by a bus to exchange data.

The processor 220 may execute program instructions stored in the memory 240 and/or the storage device 260. The processor 220 may include at least one central processing unit (CPU), graphics processing unit (GPU), or other processor capable of performing the method according to the present invention. The memory 240 may include, for example, volatile memory such as random access memory (RAM) and non-volatile memory such as read-only memory (ROM). The memory 240 may load program instructions stored in the storage device 260 and provide them to the processor 220 so that the processor 220 can execute them. The storage device 260 is a recording medium suitable for storing program instructions and data, and may include, for example, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc read-only memory (CD-ROM) and digital video discs (DVD); magneto-optical media such as floptical disks; and semiconductor memory such as flash memory, erasable programmable ROM (EPROM), or SSDs built on them.

When executed by the processor 220, the program instructions may cause the processor 220 to perform the operations necessary to implement the object recognition method described below. For example, the program instructions, when executed by the processor 220, may cause the processor 220 to: receive an input image; perform object prediction on the input image to obtain a first object class (c), a first reliability (p), and a bounding box (b); calculate a first object area ratio in a first binarized image obtained by binarizing the portion of the input image inside the bounding box based on a first threshold, and calculate a second object area ratio in a second binarized image obtained by binarizing the same portion based on a second threshold; determine whether the input image is focused based on the difference between the first and second object area ratios; when the input image is determined to be clearly focused, crop the input image, enlarge the cropped portion to obtain an enlarged cropped image, and perform object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second reliability (p_m); and determine a final object class based on the first reliability (p) and the second reliability (p_m).

FIG. 10 is a flowchart illustrating an object recognition method according to an embodiment of the present invention.

When an image is input (step 300), the first object prediction unit 110 predicts an object from the input image (step 302). In this object prediction process, the first object prediction unit 110 may obtain an object class (c), a reliability (p), and a bounding box (b).

Next, the first image binarization unit 122 binarizes the input image to generate a first binarized image, and the first object ratio calculation unit 124 calculates the object area ratio Ratio₁ in the first binarized image (step 304). Likewise, the second image binarization unit 126 binarizes the input image to generate a second binarized image, and the second object ratio calculation unit 128 calculates the object area ratio Ratio₂ in the second binarized image (step 306).

Next, the object ratio difference calculation unit 130 calculates the difference between the object area ratios (Ratio₁ − Ratio₂) (step 308), and the focus determination unit 120 determines, based on this difference, whether the input image is clearly focused (step 310). If it is determined from the input image that the camera 10 is not focused, the focus determination unit 120 causes the camera control unit 140 to transmit a focus adjustment command or an auto-focus command to the camera 10 so that the focus of the camera 10 is adjusted (step 312). The process may remain in a standby state until the focus determination unit 120 determines, based on an image newly input from the camera 10, that the camera 10 is well focused.

If it is determined in step 310, based on the input image, that the camera 10 is well focused, the crop/enlargement unit 150 crops the input image over the bounding box, or over a region wider than the bounding box, and enlarges the cropped image (step 314). The second object prediction unit 160 may predict an object from the cropped and enlarged image to obtain an object class (c_m) and a reliability (p_m) (step 316). The final classification determination unit 170 may assign weights w1 and w2 to the reliability p generated by the first object prediction unit 110 and the reliability p_m generated by the second object prediction unit 160, respectively, to determine the final object class (step 318).
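Steps 302 to 318 can be strung together for a single frame roughly as below. This is a sketch, not the patented implementation: `predict` is a hypothetical detector callable returning `(class, confidence, bbox)` that stands in for the shared prediction model of units 110 and 160; object pixels are assumed to be darker than the threshold; and `ref`, `margin`, `scale`, and the default weights are illustrative values. The thresholds default to the 95/55 pair used in the FIG. 12 example.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
BBox = Tuple[int, int, int, int]           # (x0, y0, x1, y1), end-exclusive
Prediction = Tuple[str, float, BBox]       # (class, confidence, bbox)


def _ratio(patch: Image, threshold: int) -> float:
    # Assumed polarity: object pixels are darker than the threshold.
    pixels = [v for row in patch for v in row]
    return sum(v < threshold for v in pixels) / max(len(pixels), 1)


def recognize(image: Image, predict: Callable[[Image], Prediction],
              t1: int = 95, t2: int = 55, ref: float = 0.1,
              margin: int = 0, scale: int = 2,
              w1: float = 1.0, w2: float = 1.0) -> Tuple[bool, str]:
    """Steps 302-318 of FIG. 10 for one frame.

    Returns (focused, class). When the frame is judged out of focus, the
    first-stage class is returned with focused=False so the caller can
    send a focus command (step 312) and retry with a new frame.
    """
    c, p, (x0, y0, x1, y1) = predict(image)                       # step 302
    patch = [row[x0:x1] for row in image[y0:y1]]
    diff = _ratio(patch, t1) - _ratio(patch, t2)                  # steps 304-308
    if diff > ref:                                                # step 310
        return False, c
    h, w = len(image), len(image[0])
    crop = [row[max(0, x0 - margin):min(w, x1 + margin)]
            for row in image[max(0, y0 - margin):min(h, y1 + margin)]]
    enlarged = [[v for v in row for _ in range(scale)]
                for row in crop for _ in range(scale)]            # step 314
    c_m, p_m, _ = predict(enlarged)                               # step 316
    return True, (c_m if w2 * p_m > w1 * p else c)                # step 318
```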

FIG. 11 is a detailed flowchart showing one embodiment of the final classification determination step (step 318). According to the embodiment shown in FIG. 11, the final classification determination unit 170 may compare the reliability p generated by the first object prediction unit 110 multiplied by the first weight w1 with the reliability p_m generated by the second object prediction unit 160 multiplied by the second weight w2 (step 320). If w2·p_m is greater than w1·p, the final classification determination unit 170 may take the object class c_m predicted by the second object prediction unit 160 as the final object class (step 322). If w2·p_m is not greater than w1·p, the final classification determination unit 170 may take the object class c predicted by the first object prediction unit 110 as the final object class (step 324).
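The FIG. 11 rule reduces to a single weighted comparison, sketched below. The default weights of 1.0 are an assumption; the text leaves w1 and w2 as parameters to be tuned to the performance of the learning model.

```python
def final_class(c: str, p: float, c_m: str, p_m: float,
                w1: float = 1.0, w2: float = 1.0) -> str:
    """Choose c_m if w2 * p_m > w1 * p (step 322), otherwise keep c (step 324)."""
    if w2 * p_m > w1 * p:
        return c_m
    return c  # ties keep the first-stage class, matching "not greater than"
```

With equal weights, a second-stage confidence of 0.7 overrides a first-stage 0.6; lowering w2 to 0.8 makes the weighted score about 0.56, so the first-stage class is kept.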

FIG. 12 illustrates, for an out-of-focus image, the difference in object area ratio between the first binarized image obtained with the first threshold T₁ and the second binarized image obtained with the second threshold T₂. In this example, the first threshold T₁ is set to 95 and the second threshold T₂ is set to 55, which is 40 lower (i.e., T₂ − T₁ = −40). As the figure shows, for the out-of-focus image the proportion of the object area is reduced in the second binarized image determined with the second threshold. This indicates that the mechanism of the present invention works as intended.
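To make the FIG. 12 mechanism concrete, the following computes both ratios with T₁ = 95 and T₂ = 55 for two synthetic intensity profiles across a dark object edge. The pixel values are invented for illustration and are not taken from FIG. 12; treating pixels darker than the threshold as object pixels is likewise an assumption.

```python
T1, T2 = 95, 55  # thresholds from the FIG. 12 example (T2 - T1 = -40)


def object_area_ratio(patch, threshold):
    """Fraction of pixels darker than the threshold (assumed object polarity)."""
    pixels = [v for row in patch for v in row]
    return sum(v < threshold for v in pixels) / len(pixels)


# Synthetic 1x8 intensity profiles across a dark object edge (illustrative values).
sharp = [[20, 20, 20, 20, 220, 220, 220, 220]]
blurred = [[20, 20, 40, 75, 130, 180, 220, 220]]

for name, patch in (("sharp", sharp), ("blurred", blurred)):
    r1 = object_area_ratio(patch, T1)
    r2 = object_area_ratio(patch, T2)
    print(f"{name}: Ratio1={r1:.3f} Ratio2={r2:.3f} diff={r1 - r2:.3f}")
# sharp: Ratio1=0.500 Ratio2=0.500 diff=0.000
# blurred: Ratio1=0.500 Ratio2=0.375 diff=0.125
```

In the sharp profile no pixel falls between 55 and 95, so both thresholds agree; in the blurred profile the edge pixel at 75 is counted as object by T₁ but not by T₂, which is exactly the shrinkage that Equation 3 detects.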

As mentioned above, the apparatus and method according to embodiments of the present invention can be implemented as a computer-readable program or code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable program or code is stored and executed in a distributed manner.

The computer-readable recording medium may include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The program instructions may include not only machine code such as that generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although some aspects of the invention have been described in the context of an apparatus, they may also represent a description of the corresponding method, in which a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also represent a corresponding block, item, or feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such an apparatus.

In embodiments, a programmable logic device (e.g., a field-programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, the field-programmable gate array may operate together with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by a hardware device.

Although the invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention can be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A method of recognizing an object in an image, comprising:
receiving an input image;
obtaining a first object class (c), a first reliability (p), and a bounding box (b) by performing object prediction from the input image;
calculating a first object area ratio in a first binarized image obtained by binarizing the portion of the input image inside the bounding box based on a first threshold, and calculating a second object area ratio in a second binarized image obtained by binarizing the portion of the input image inside the bounding box based on a second threshold;
determining whether the input image is focused based on a difference between the first and second object area ratios;
when the input image is determined to be clearly focused, cropping the input image, enlarging the cropped portion to obtain an enlarged cropped image, and performing object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second reliability (p_m); and
determining a final object class based on the first reliability (p) and the second reliability (p_m).

The method according to claim 1, wherein determining whether the input image is focused comprises:
determining whether a difference between the first and second object area ratios is greater than a predetermined reference value;
determining that the input image is not focused when the difference is greater than the reference value; and
determining that the input image is focused when the difference is not greater than the reference value.

The method according to claim 2, wherein the input image is received from a camera, and determining whether the input image is focused further comprises:
supplying a control signal for focus adjustment to the camera when it is determined that the input image is not focused.

The method according to claim 3, wherein the control signal includes a zoom-in control command or an auto-focus command.

The method according to claim 1, wherein determining the final object class comprises:
assigning first and second weights w1 and w2 to the first reliability p and the second reliability p_m, respectively, and comparing the weighted values; and
determining the final object class according to the comparison result.

The method of claim 5, wherein determining the final object class comprises:
determining the second object class (c_m) as the final object class if the value obtained by multiplying the second reliability p_m by the second weight w2 is greater than the value obtained by multiplying the first reliability p by the first weight w1; and
determining the first object class (c) as the final object class if the value obtained by multiplying the second reliability p_m by the second weight w2 is not greater than the value obtained by multiplying the first reliability p by the first weight w1.

An apparatus for recognizing an object in an image, comprising:
a memory storing program instructions; and
a processor connected to the memory and executing the program instructions stored in the memory,
wherein the program instructions, when executed by the processor, cause the processor to:
accept an input image;
perform object prediction on the input image to obtain a first object class (c), a first reliability (p), and a bounding box (b);
calculate a first object area ratio in a first binarized image obtained by binarizing the portion of the input image inside the bounding box based on a first threshold, and calculate a second object area ratio in a second binarized image obtained by binarizing the portion of the input image inside the bounding box based on a second threshold;
determine whether the input image is focused based on a difference between the first and second object area ratios;
when the input image is determined to be clearly focused, crop the input image, enlarge the cropped portion to obtain an enlarged cropped image, and perform object prediction on the enlarged cropped image to obtain a second object class (c_m) and a second reliability (p_m); and
determine a final object class based on the first reliability (p) and the second reliability (p_m).

The apparatus of claim 7, wherein the program instructions that cause the processor to determine whether the input image is in focus cause the processor to:
determine whether a difference between the first and second object area ratios is greater than a predetermined reference value;
determine that the input image is not focused if the difference is greater than the reference value; and
determine that the input image is focused if the difference is not greater than the reference value.

The apparatus of claim 8, wherein the object recognition apparatus is interlocked with a camera to receive the input image from the camera, and
the program instructions that cause the processor to determine whether the input image is in focus cause the processor to:
supply a control signal for focus adjustment to the camera when it is determined that the input image is not focused.

The apparatus of claim 9, wherein the control signal includes a zoom-in control command or an auto-focus command.

The apparatus of claim 7, wherein the program instructions that cause the processor to determine the final object class cause the processor to:
assign first and second weights w1 and w2 to the first reliability p and the second reliability p_m, respectively, compare the weighted values, and determine the final object class according to the comparison result.

The apparatus of claim 11, wherein the program instructions that cause the processor to determine the final object class cause the processor to:
determine the second object class (c_m) as the final object class if the value obtained by multiplying the second reliability p_m by the second weight w2 is greater than the value obtained by multiplying the first reliability p by the first weight w1; and
determine the first object class (c) as the final object class if the value obtained by multiplying the second reliability p_m by the second weight w2 is not greater than the value obtained by multiplying the first reliability p by the first weight w1.