KR20230090255A

KR20230090255A - Apparatus and method for object recognition

Info

Publication number: KR20230090255A
Application number: KR1020220172036A
Authority: KR
Inventors: 여건민; 김영일; 박성희; 정운철; 허태욱
Original assignee: 한국전자통신연구원
Priority date: 2021-12-14
Filing date: 2022-12-09
Publication date: 2023-06-21

Abstract

The present invention relates to an object recognition apparatus comprising an object inference model that processes an original image captured by a camera module to generate an image sized for input to a machine learning inference model. The object inference model includes the machine learning inference model and outputs the recognition and classification results for objects inferred through it, and the machine learning inference model processes an input image to infer the objects contained in that image.

Description

Object recognition device and method {APPARATUS AND METHOD FOR OBJECT RECOGNITION}

The present invention relates to an object recognition apparatus and method and, more particularly, to an object recognition apparatus and method capable of improving the accuracy and speed of inference for object recognition and classification.

Machine learning generally refers to algorithms by which a computer learns from, and makes predictions based on, data supplied by a user. In other words, machine learning is a technology that recognizes the hierarchical structure and regular patterns among related items, makes its own judgments about information that was not explicitly input, and predicts situations that may occur in the future.

One type of machine learning is supervised learning, in which every item of training data is labeled with a result value and the model learns to classify from those labels. Another type is unsupervised learning, which finds commonalities among training data that carry no result values and divides the data into groups. A further type is reinforcement learning, which does not rely on separately prepared training data and instead provides a reward according to the action taken in each situation.

Machine learning is used in various fields such as games, automobiles, and robots.

The background art of the present invention is disclosed in Korean Patent Registration No. 10-2261187 (registered May 31, 2021; surveillance video analysis system and method based on machine learning).

According to one aspect, the present invention was devised to solve the problems described above, and its purpose is to provide an object recognition apparatus and method capable of improving the accuracy and inference speed of object recognition and classification.

An object recognition apparatus according to one aspect of the present invention includes an object inference model that processes an original image captured by a camera module to generate an image sized for input to a machine learning inference model, wherein the object inference model includes the machine learning inference model and outputs the recognition and classification results for objects inferred through the machine learning inference model, and the machine learning inference model processes an input image to infer the objects contained in the input image.

In the present invention, the object inference module extracts, as the image to be input to the machine learning inference model, images of the object-containing regions clustered within the original image, sized to match the input of the machine learning inference model.

In the present invention, to extract the object-containing regions, the object inference module calculates a background line from a reduced binarized image of the original image, calculates a boundary line by adding an offset to the background line toward the sky space above it, and extracts object-containing regions only from the sky space, excluding the area below the boundary line.
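
The boundary-line step can be sketched as follows. This is only an illustration under stated assumptions, not the computation defined in the specification: the background line is assumed to be already available as one row index per column, row 0 is assumed to be the top of the image (so moving toward the sky means subtracting), and the names `boundary_line` and `in_sky` and the offset value are hypothetical.

```python
def boundary_line(background, offset):
    """Shift each background-line row upward (toward row 0) by `offset` rows."""
    return [max(0, row - offset) for row in background]


def in_sky(row, col, boundary):
    """True if pixel (row, col) lies strictly above the boundary line."""
    return row < boundary[col]


# Hypothetical background line: row index of the skyline in each of 4 columns.
background = [10, 12, 9, 11]
boundary = boundary_line(background, offset=3)

print(boundary)                # boundary row for each column
print(in_sky(5, 0, boundary))  # a pixel well above the skyline
print(in_sky(8, 0, boundary))  # a pixel inside the offset band
```

Only pixels for which `in_sky` holds would be searched for objects; the offset band keeps rooftops and treetops near the skyline out of the search area.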

In the present invention, the object inference module converts the original color image into a gray image whose R, G, and B values are equal, and reduces the gray image to generate a reduced binarized image, the binarized image being an image in which a pixel is represented as 1 when its value is greater than or equal to a specified threshold and as 0 when its value is smaller than the specified threshold.
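
A minimal pure-Python sketch of the gray conversion, reduction, and binarization chain is given below. The text fixes only the binarization rule (1 if the pixel value is at or above the threshold, otherwise 0); the channel-mean gray conversion, the subsampling-based reduction, the threshold of 128, and all function names are assumptions made for illustration.

```python
def to_gray(rgb_image):
    """Replace each (R, G, B) pixel by a single gray value (channel mean, an assumption)."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


def downscale(gray, ratio):
    """Keep every `ratio`-th pixel in both directions (simple subsampling, an assumption)."""
    return [row[::ratio] for row in gray[::ratio]]


def binarize(gray, threshold):
    """1 where the pixel value is >= threshold, 0 otherwise, as stated in the text."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


# Hypothetical 4x4 frame: two bright sky rows above two dark ground rows.
sky, ground = (200, 200, 200), (40, 50, 60)
image = [[sky] * 4, [sky] * 4, [ground] * 4, [ground] * 4]

binary = binarize(downscale(to_gray(image), ratio=2), threshold=128)
print(binary)
```

In this toy frame the sky row maps to 1 and the ground row maps to 0, which is the contrast the later background-line step relies on.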

In the present invention, the object inference module calculates N background lines by applying a different threshold to each, as in Equation 2 below, applies a different weight to each of the N background lines, and takes their weighted average to calculate the final background line (B_i), as in Equation 3 below.

(Equation 2)

(Equation 3)
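
The weighted averaging of the N background lines can be illustrated as below. This is a sketch of an ordinary normalized weighted average, not a reproduction of Equation 3; the normalization by the sum of the weights, the function name `final_background_line`, and the sample values are all assumptions.

```python
def final_background_line(lines, weights):
    """Column-wise weighted average of N candidate background lines.

    `lines` holds N lists of equal length (one row index per column);
    `weights` holds one weight per candidate line.
    """
    total = sum(weights)
    width = len(lines[0])
    return [
        sum(w * line[i] for w, line in zip(weights, lines)) / total
        for i in range(width)
    ]


# Three hypothetical background lines obtained with three different thresholds.
lines = [
    [10, 12, 14],
    [12, 12, 16],
    [14, 15, 18],
]
weights = [1, 2, 1]  # the middle threshold is trusted most in this example

print(final_background_line(lines, weights))
```

Averaging over several thresholds makes the final line less sensitive to any single threshold choice, which appears to be the motivation for using N lines rather than one.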

In the present invention, the object inference module calculates an object line in order to detect objects in the reduced binarized image. To calculate the object line, it applies a designated object-line threshold (THRESHOLD_O) to Equation 5 below and repeatedly finds, for each horizontal pixel i of the binarized image, the first vertical pixel index (OBJ_i,O) at which the vertical pixel value is minimal, thereby detecting the object line.

(Equation 5)

In the present invention, the object inference module detects objects independently in each of a plurality of divided regions obtained by dividing the reduced binarized image in the vertical direction, and detects as the position of an object the point at which the slope of the object line increases the most beyond an object detection threshold (THRESHOLD_OBJECT).
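
The per-region detection rule can be sketched as follows. Several points are assumptions rather than statements from the specification: the object line is taken as one value per column, the slope is approximated by the absolute difference between adjacent columns, the divided regions are treated as equal-width column ranges, and the names `detect_in_region` and `detect_objects` are hypothetical.

```python
def detect_in_region(object_line, start, end, threshold):
    """Return the column in [start, end) with the largest jump above `threshold`, or None."""
    best_col, best_jump = None, threshold
    for i in range(start + 1, end):  # use only columns inside this region
        jump = abs(object_line[i] - object_line[i - 1])
        if jump > best_jump:
            best_col, best_jump = i, jump
    return best_col


def detect_objects(object_line, num_regions, threshold):
    """Run the detection independently in each of `num_regions` column ranges."""
    width = len(object_line)
    step = width // num_regions
    hits = []
    for r in range(num_regions):
        start = r * step
        end = width if r == num_regions - 1 else start + step
        col = detect_in_region(object_line, start, end, threshold)
        if col is not None:
            hits.append(col)
    return hits


# Hypothetical object line over 16 columns with two spikes.
object_line = [0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0]
print(detect_objects(object_line, num_regions=2, threshold=3))
```

With these made-up values one detection is reported per region, at columns 3 and 13. Because each region is scanned without reference to the others, the regions could also be processed in parallel.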

In the present invention, once the positions of the objects have been detected in the reduced binarized image, the object inference module clusters the object-containing regions to match the size that can be input to the machine learning inference model, using the smallest number of clusters that can be formed within the image.
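
One simple way to group detected positions into model-input-sized windows is a greedy pass, sketched below. This is a swapped-in heuristic for illustration only: it does not guarantee the smallest possible number of clusters that the text calls for, and the window size, the point format, and the name `cluster_objects` are assumptions.

```python
def cluster_objects(points, win_w, win_h):
    """Greedily group (x, y) points so each group's bounding box fits in win_w x win_h."""
    clusters = []
    for x, y in sorted(points):
        for c in clusters:
            x0, x1 = min(c["x0"], x), max(c["x1"], x)
            y0, y1 = min(c["y0"], y), max(c["y1"], y)
            if x1 - x0 <= win_w and y1 - y0 <= win_h:
                # The point fits: grow this cluster's bounding box.
                c.update(x0=x0, x1=x1, y0=y0, y1=y1)
                c["members"].append((x, y))
                break
        else:
            # No existing cluster can absorb the point: open a new one.
            clusters.append({"x0": x, "x1": x, "y0": y, "y1": y, "members": [(x, y)]})
    return clusters


# Hypothetical object positions in the reduced image and a 32x32 input window.
points = [(5, 5), (10, 8), (60, 7), (62, 30)]
clusters = cluster_objects(points, win_w=32, win_h=32)
for c in clusters:
    print(c["members"])
```

Here the four positions fall into two windows, so the inference model would be run twice instead of four times.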

In the present invention, the object inference module maps an object-containing region clustered in the reduced binarized image to the corresponding object-containing region in the original image by applying a specified ratio (RATIO).

An object recognition method according to another aspect of the present invention includes: receiving, by an object inference module, an original image; extracting, by the object inference module, at least one object-containing region sized for input to a machine learning inference model by processing the original image; and recognizing and classifying, by the object inference module using the machine learning inference model, the objects contained in the at least one object-containing region, and outputting the result.

In the present invention, the step of extracting at least one object-containing region sized for input to the machine learning inference model includes: generating, by the object inference module, a reduced binarized image by converting the original image; clustering at least one object-containing region corresponding to the objects detected in the reduced binarized image; and extracting the object-containing region to be input to the machine learning inference model by mapping the object-containing region clustered in the reduced binarized image to the object-containing region in the original image.

In the present invention, to extract at least one object-containing region from the reduced binarized image, the object inference module calculates a background line from the reduced binarized image, calculates a boundary line by adding an offset to the background line toward the sky space above it, and extracts object-containing regions only from the sky space, excluding the area below the boundary line.

In the present invention, to extract at least one object-containing region from the reduced binarized image, the object inference module divides the reduced binarized image in the vertical direction into a plurality of divided regions and detects objects independently in each divided region, thereby shortening the overall inference time.

An object recognition method according to another aspect of the present invention includes: generating, by an object inference module, a reduced image by reducing an original image; generating, by the object inference module, a reduced binarized image by converting the reduced image into a binarized image; detecting, by the object inference module, objects in the reduced binarized image and clustering the object-containing regions to match the size that can be input to a machine learning inference model; mapping, by the object inference module, the object-containing regions clustered in the reduced binarized image to object-containing regions in the original image; and inferring objects by inputting the object-containing regions mapped to the original image into the machine learning inference model.

In the present invention, to extract the object-containing regions, the object inference module calculates a background line from the reduced binarized image, calculates a boundary line by adding an offset to the background line toward the sky space above it, and extracts object-containing regions only from the sky space, excluding the area below the boundary line.

In the present invention, in the step of generating the reduced binarized image, the object inference module converts the original color image into a gray image whose R, G, and B values are equal, reduces the gray image to generate a reduced image, and then generates the binarized image by representing each pixel as 1 when its value is greater than or equal to a specified threshold and as 0 when its value is smaller than the specified threshold.

In the present invention, to calculate the background line, the object inference module calculates N background lines by applying a different threshold to each, as in Equation 2 below, applies a different weight to each of the N background lines, and takes their weighted average to calculate the final background line (B_i), as in Equation 3 below.

(Equation 2)

(Equation 3)

In the present invention, to detect objects in the reduced binarized image, the object inference module calculates an object line and detects as the position of an object the point at which the slope of the object line increases the most beyond the object detection threshold (THRESHOLD_OBJECT).

In the present invention, to calculate the object line, the object inference module applies a designated object-line threshold (THRESHOLD_O) to Equation 5 below and repeatedly finds, for each horizontal pixel i of the reduced binarized image, the first vertical pixel index (OBJ_i,O) at which the vertical pixel value is minimal, thereby detecting the object line.

(Equation 5)

In the present invention, in the step of mapping the object-containing region clustered in the reduced binarized image to the object-containing region in the original image, the object inference module applies a specified ratio (RATIO) to the clustered object-containing region ((cx, cy), (cx+w, cy+h)) in the reduced binarized image, mapping it to the object-containing region (((cx, cy), (cx+w, cy+h)) × RATIO) in the original image.
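
The mapping ((cx, cy), (cx+w, cy+h)) × RATIO amounts to scaling both corners of the region by the same factor. A short sketch follows; the function name `map_to_original` and the numeric values (including a RATIO of 6) are hypothetical.

```python
def map_to_original(region, ratio):
    """Scale a ((x0, y0), (x1, y1)) region from the reduced image to the original image."""
    (x0, y0), (x1, y1) = region
    return ((x0 * ratio, y0 * ratio), (x1 * ratio, y1 * ratio))


# Hypothetical cluster in the reduced image: top-left (cx, cy), size w x h.
cx, cy, w, h = 12, 7, 40, 40
RATIO = 6  # assumed reduction factor between original and reduced image

reduced_region = ((cx, cy), (cx + w, cy + h))
original_region = map_to_original(reduced_region, RATIO)
print(original_region)  # ((72, 42), (312, 282))
```

The region cropped from the original image at these coordinates keeps the object's full-resolution pixels, which is what is then passed to the inference model.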

According to one aspect of the present invention, instead of simply downscaling the entire original image and inputting it to the machine learning inference model, the object-containing regions are calculated from the original image and input to the machine learning inference model, which improves the inference accuracy and inference speed of object recognition and classification.

According to another aspect, the present invention can be applied directly as an input scheme for a machine learning inference model that detects objects appearing in the sky and identifies their type, and it improves inference accuracy and inference speed for high-resolution images in particular.

According to yet another aspect, the present invention can be applied directly to a system that tracks the physical location of an inferred object.

FIG. 1 is an exemplary diagram showing the schematic configuration of an object recognition apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating problems that may occur, in the apparatus of FIG. 1, when objects contained in an image captured by the camera module are classified using machine learning.
FIG. 3 is an exemplary diagram showing an object recognition error that occurs in the classification process of FIG. 2 when the object is small.
FIG. 4 is an exemplary diagram schematically illustrating a method of increasing the size of the object input to the machine learning model, in order to improve the accuracy of object recognition and classification and the inference speed, according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram showing a more detailed configuration of the object inference module of FIG. 1.
FIG. 6 is an exemplary diagram illustrating a method by which the object inference module of FIG. 1 extracts object-containing regions from the original image.
FIG. 7 is an exemplary diagram illustrating the principle of detecting the background line in the original image of FIG. 6.
FIG. 8 is an exemplary diagram illustrating the image binarization (binary-coding) method performed, in relation to FIG. 7, as a step prior to detecting the background line in the original image.
FIG. 9 is an exemplary diagram illustrating a method of calculating the background line from the reduced binarized image of FIG. 8.
FIG. 10 is an exemplary diagram illustrating a method of detecting objects in the reduced binarized image of FIG. 8.
FIG. 11 is an exemplary diagram illustrating a method of detecting objects using divided regions in the reduced binarized image of FIG. 8.
FIG. 12 is an exemplary diagram illustrating a method of clustering the object-containing regions in the reduced binarized image of FIG. 11.
FIG. 13 is an exemplary diagram illustrating the process, in relation to FIG. 6, of calculating the object-containing regions from the original image and inputting them to the machine learning inference model.

Hereinafter, an embodiment of the object recognition apparatus and method according to the present invention is described with reference to the accompanying drawings. The thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation. In addition, the terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Definitions of these terms should therefore be based on the content of this specification as a whole.

Technology for recognizing objects (e.g., cars, people, animals) in images captured by a camera has recently improved greatly in performance, along with the processing speed of graphics cards and the evolution of machine learning.

Meanwhile, in the defense field, technology that applies such object recognition to detect the appearance of drones at an early stage has recently emerged as a way to counter drone attacks and the like. In the civilian sector, early drone detection is emerging as a very important technology for protecting privacy.

The technology for recognizing objects in an image is implemented through machine learning that determines what the target to be recognized on the screen (i.e., the object) is and in which region it is located. Machine learning for object recognition is carried out by training on images from a variety of scenes, and its performance depends heavily on the quality of the training method.

FIG. 1 is an exemplary diagram showing the schematic configuration of an object recognition apparatus according to an embodiment of the present invention. A method of detecting a drone (unmanned aerial vehicle) and determining its type is described below by way of example.

As shown in FIG. 1, the object recognition apparatus according to this embodiment includes a camera module 101, an object inference module 102, and an image output module 104.

The camera module 101 photographs an object 10 in space.

The object inference module 102 determines the type of the object 10 contained in the image captured by the camera module 101 (i.e., classifies the object) and determines where the object 10 is located on the image output module 104 (i.e., on the screen), for example by displaying a bounding box.

The image output module 104 displays on the screen the type of the object 10 classified by the object inference module 102 and the position of the object (i.e., the bounding box) 105 (see FIG. 3). That is, the image output module 104 is a concept that includes the screen.

Here, the object 103 (i.e., the object image) displayed on the image output module 104 (i.e., on the screen) is the image of the real object 10 in space as captured by the camera module 101 and output on the screen.

In addition, a bounding box 105 is displayed around the object 103 (i.e., the object image) shown on the screen of the image output module 104.

FIG. 2 is an exemplary diagram illustrating problems that may occur, in the apparatus of FIG. 1, when objects contained in an image captured by the camera module are classified using machine learning, and FIG. 3 is an exemplary diagram showing an object recognition error that occurs in the classification process of FIG. 2 when the object is small.

Referring to FIG. 2, because the size of the image that can be input to a machine learning model (or machine learning inference model) is limited, the object inference module 102 reduces the original image 201 to the size accepted by the model (i.e., according to a specified ratio) and then inputs the resulting reduced image 203 to the machine learning inference model 205.

Here, the machine learning inference model 205 is a model that infers objects using machine learning, and the object inference module 102 may include the machine learning inference model 205.

However, reducing the size of the original image 201 as described above results in a loss of information, and the image degradation is especially severe when a high-resolution (e.g., 4K, FHD) image is reduced.

For example, when an image is reduced in this way, the objects in the image also become smaller, so the accuracy of object detection and classification drops markedly. Referring to FIG. 3, the type of the object marked by the bounding box (e.g., the drone model) is reported as "IRONMAN" (the model name of a known drone), but this is a misidentification, and the displayed accuracy value of "0.482" shows that the accuracy is very low. In addition, to recognize small objects by machine learning, the minimum object size on which the learning model is trained must be reduced, which requires relatively more training time than training on large objects and, as a result, makes the final inference time relatively longer than for large objects.

In other words, to recognize objects in real time in the images captured by the camera module 101, inference time is as important a factor as accuracy. Fast inference is needed both to detect the appearance of an object quickly and to identify the type of the detected object quickly.

However, as described above, when objects are classified using machine learning, reducing the image to the size that can be input to the machine learning model (or machine learning inference model) lowers the accuracy of object recognition and classification and may also increase the inference time (i.e., slow down inference).

Accordingly, the object recognition apparatus according to this embodiment provides a method of improving the accuracy of object recognition and classification while also reducing the inference time (i.e., speeding up inference).

FIG. 4 is an exemplary diagram schematically illustrating a method of increasing the size of the object input to the machine learning model (or machine learning inference model), in order to improve the accuracy of object recognition and classification and the inference speed, according to an embodiment of the present invention, and FIG. 5 is an exemplary diagram showing a more detailed configuration of the object inference module of FIG. 1.

Referring to FIG. 4, a background line, corresponding to the boundary formed by buildings and trees on the ground, appears in the lower part of the image, while an object (e.g., an unmanned aerial vehicle) that is very small relative to the image has been captured in the sky space in the upper part of the image.

Therefore, as described above, if the entire original image is reduced and input to the machine learning inference model 205, the object (e.g., an unmanned aerial vehicle) becomes so small that its type is difficult to classify, and the accuracy of object recognition and classification deteriorates further.

Therefore, in the present embodiment, the object inference module 102 does not reduce the entire image. Instead, it inputs an image 305, extracted from only the partial region containing the object, to the machine learning model (or machine learning inference model). As a result, through the object classification processor 102a and the object prediction processor 102b (see FIG. 5), the object becomes larger relative to the image that is input to the model (that is, the image extracted from only the partial region containing the object). This improves the accuracy of object recognition and classification compared with the conventional method of reducing the entire image to the input size of the machine learning model (or machine learning inference model).

Meanwhile, the object classification processor 102a and the object prediction processor 102b may be integrated into a single processor and included in the object inference module 102.

FIG. 6 is an exemplary diagram illustrating a method by which the object inference module of FIG. 1 extracts an object-containing region from an original image.

Referring to FIG. 6, the object inference module 102 divides the original image 201 in the vertical direction into a designated number (N) of regions. Hereinafter, each of the N regions obtained by this vertical division is referred to as a divided region 301.
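The division into N regions can be sketched as follows. This is a minimal illustration, assuming that the divided regions 301 are side-by-side strips described by column ranges and that the strips are made as equal in width as possible; the function name `split_into_strips` is hypothetical and does not appear in the original.

```python
def split_into_strips(width: int, n: int) -> list[tuple[int, int]]:
    """Return n half-open column ranges [start, end) that tile 0..width."""
    if not 1 <= n <= width:
        raise ValueError("n must satisfy 1 <= n <= width")
    # Integer arithmetic keeps the strips contiguous and non-overlapping.
    bounds = [k * width // n for k in range(n + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(n)]


# Hypothetical example: a 3840-pixel-wide frame divided into 8 strips.
strips = split_into_strips(width=3840, n=8)
print(strips[0], strips[-1])
```

Each returned range can then be searched for objects independently of the others.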

The object inference module 102 searches for objects 103 in each divided region 301 of the original image 201, calculates an object-containing region 305 that includes one or more objects 103, and inputs the image information of the object-containing region 305 to the machine learning inference model 205. In other words, the object inference module 102 according to the present embodiment calculates an object-containing region 305 by clustering the objects found in the original image 201, and inputs the image information of the object-containing region 305, rather than the original image 201 (that is, the entire image), to the machine learning inference model 205.

Accordingly, the machine learning inference model 205 receives the image information of the object-containing region 305 as input and produces the final classification result.

Meanwhile, in the original image 201 shown in FIG. 6, the background line 304 represents the actual boundary between ground objects such as buildings or trees and the sky space in the entire image, and the boundary line 302 is the line formed by adding an offset 303 of a predetermined interval upward (that is, toward the sky space) from the background line 304.

The boundary line 302 is set so that the region below the boundary line 302 is excluded from the object search range.

In other words, the boundary line 302 compensates for errors that may occur in estimating the background line 304. For example, it reduces errors in which something near the background line 304 that is not actually an object is misjudged as an object (e.g., a tree branch extending into the sky space being mistaken for an object).

A method of detecting the background line 304 and the boundary line 302 is described below.

FIG. 7 is an exemplary diagram illustrating the principle of detecting the background line in the original image of FIG. 6.

Referring to FIG. 7, in order to detect the background line 304 in the original image 201, the object inference module 102 reduces the original image 201 to the reduced image 203 according to a designated ratio (RATIO).

The original image 201 is reduced to the reduced image 203 in order to reduce the computational load (that is, the load of the calculation for detecting the background line).

That is, because detecting the background line 304 requires pixel-by-pixel calculation, the background line 304 is detected using the reduced image 203, and the position of an object in the reduced image 203 is then restored to its position in the original image 201, which reduces the computational load.

For example, assume that, in order to detect the background line 304 in the original image 201, the object inference module 102 reduces the original image 201 (e.g., height × width in pixels = H × W) according to a designated ratio (RATIO = h/H = w/W) to generate the reduced image 203 (e.g., height × width in pixels = h × w). If the position (that is, the coordinates) of an object in the reduced image 203 is (cx, cy), then restoring that position to the original image 201 gives an object position of (cx/RATIO, cy/RATIO) in the original image 201.
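The coordinate restoration described above amounts to a single division per axis. The following sketch assumes RATIO = h/H = w/W as defined in the text; the function name `to_original` and the example frame sizes are hypothetical.

```python
def to_original(cx: float, cy: float, ratio: float) -> tuple[float, float]:
    """Map a reduced-image coordinate back to the original image."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return (cx / ratio, cy / ratio)


# Hypothetical example: a 2160x3840 (HxW) frame reduced to 540x960.
ratio = 540 / 2160
print(to_original(100, 50, ratio))
```

With a ratio of 0.25, a point found at (100, 50) in the reduced image corresponds to (400, 200) in the original image.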

FIG. 8 is an exemplary diagram illustrating the image binarization (binary-coding) method performed in FIG. 7 as a step preceding detection of the background line in the original image.

Referring to FIG. 8, the object inference module 102 converts a color image 500 into a gray image 501 in which the R, G, and B values are equal.

The object inference module 102 then reduces the gray image 501 according to a designated ratio (e.g., the reduction RATIO) to generate a reduced image 502, and generates a binarized image 503 from the reduced image 502.

Here, the binarized image 503 is an image whose pixel values are expressed as 0 or 1, and can be calculated by Equation 1.

That is, referring to Equation 1, each pixel of the reduced image 502 is expressed as 1 when its value is greater than or equal to a designated threshold (THRESHOLD), and as 0 when its value is less than the designated threshold (THRESHOLD). A plurality of thresholds (THRESHOLD) may be set.
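Equation 1 is not reproduced in this text. Based on the description above, it can plausibly be written as follows, where $p(i, j)$ (the value of the pixel at horizontal index $i$ and vertical index $j$ of the reduced image 502) and $b(i, j)$ (the corresponding pixel of the binarized image 503) are notation assumed here rather than taken from the original.

```latex
b(i, j) =
\begin{cases}
1, & p(i, j) \ge \mathrm{THRESHOLD} \\
0, & p(i, j) < \mathrm{THRESHOLD}
\end{cases}
```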

FIG. 9 is an exemplary diagram illustrating a method of calculating the background line from the reduced binarized image of FIG. 8.

The object inference module 102 calculates a first background line by applying a first threshold (THRESHOLD_1) 600, calculates a second background line by applying a second threshold (THRESHOLD_2) 601, and in the same manner calculates an N-th background line by applying an N-th threshold (THRESHOLD_N) 602 (see Equation 2).

The plurality of background lines is calculated in order to determine the optimal background line from among them (see Equation 3).

For reference, the weights applied to the N background lines differ from one another, and the weights sum to 1 (see Equation 3).

That is, referring to the enlarged vertical-pixel image shown in FIG. 9, the position of the background line is calculated by summing the values of the corresponding vertical pixels, as expressed in Equation 2.

For example, concatenating $L_i$ in Equation 2 yields the background line at the first threshold (THRESHOLD_1), the background line at the second threshold (THRESHOLD_2), and the background line at the N-th threshold (THRESHOLD_N). The final background line (that is, the optimal background line) is calculated as the weighted average of the background lines obtained for the first through N-th thresholds (THRESHOLD_1 to THRESHOLD_N), as expressed in Equation 3.

That is, in FIG. 9, the final background line (304 in FIG. 6) for horizontal pixel i is $B_i$, as given by Equation 3.
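Equations 2 and 3 are not reproduced in this text. One reading consistent with the description (a column sum per threshold, followed by a weighted average whose weights sum to 1) is sketched below. The symbols $b_n(i, j)$ (the binarized image obtained with THRESHOLD_n), $L_{i,n}$ (the background-line position at horizontal pixel $i$ for that threshold), $h$ (the height of the reduced image), and $\alpha_n$ (the weights) are assumptions introduced for illustration; only $L_i$ and $B_i$ appear in the original.

```latex
\begin{aligned}
L_{i,n} &= \sum_{j=1}^{h} b_n(i, j), \qquad n = 1, \dots, N \\
B_i &= \sum_{n=1}^{N} \alpha_n \, L_{i,n}, \qquad \sum_{n=1}^{N} \alpha_n = 1
\end{aligned}
```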

Equation 4 gives $C_i$, the final boundary line 302, which is separated from the background line 304 by the offset (OFFSET).
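The background-line and boundary-line steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the image is a list of rows indexed from the top, sky pixels are bright and therefore binarize to 1, the column sum is taken as the row index at which the sky ends, and the boundary line is placed OFFSET rows above the background line (that is, at a smaller row index). All function names are hypothetical.

```python
def binarize(gray, threshold):
    """Pixel >= threshold becomes 1, otherwise 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def background_line(gray, thresholds, weights):
    """Weighted average of the per-threshold column sums."""
    if len(thresholds) != len(weights):
        raise ValueError("one weight per threshold is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    width = len(gray[0])
    final = [0.0] * width
    for threshold, weight in zip(thresholds, weights):
        binary = binarize(gray, threshold)
        for i in range(width):
            # Column sum = number of bright (sky) pixels in column i.
            final[i] += weight * sum(row[i] for row in binary)
    return final


def boundary_line(background, offset):
    """Shift the background line upward by `offset` rows, clamped at row 0."""
    return [max(0.0, b - offset) for b in background]


# Hypothetical 4x3 gray image: bright sky on top, dark ground below.
gray = [
    [200, 200, 200],
    [200, 150, 200],
    [50, 150, 200],
    [50, 50, 50],
]
bg = background_line(gray, thresholds=[100, 180], weights=[0.5, 0.5])
print(bg, boundary_line(bg, offset=1))
```

Averaging over several thresholds smooths out columns where a single threshold would place the sky/ground transition too high or too low.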

FIG. 10 is an exemplary diagram illustrating a method of detecting an object in the reduced binarized image of FIG. 8.

Referring to FIG. 10, the object inference module 102 detects (calculates) an object line 703 by applying a designated object-line threshold (THRESHOLD_O), used for object detection in the reduced binarized image 503, to Equation 5.

That is, Equation 5 finds, for each horizontal pixel i of the reduced binarized image 503 to which the object-line threshold (THRESHOLD_O) has been applied, the first vertical pixel index ($OBJ_{i,O}$) at which the vertical pixel value is at its minimum. Accordingly, as shown in FIG. 10, the object line 703 runs along the boundary line 302 where there is no object, and forms a vertical straight line extending to the boundary line 302 where there is an object.

As a result, as shown in FIG. 10, an object line 703 is generated across all horizontal pixels that protrudes where an object is present (the height of the protrusion depends on the position of the object) and follows the boundary line 302 where no object is present.
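The construction of the object line can be sketched as follows. The sketch assumes that rows are indexed from the top, that an object appears darker than the sky (so it binarizes to 0 under THRESHOLD_O), and that only rows above the boundary line are searched; `object_line` is a hypothetical name, and the boundary line is passed in as a list of row indices.

```python
def object_line(gray, boundary, threshold_o):
    """For each column, return the first row above the boundary whose pixel
    falls below threshold_o; if there is none, return the boundary row."""
    height = len(gray)
    line = []
    for i, c in enumerate(boundary):
        limit = min(height, int(c))  # search rows 0 .. limit-1 only
        first_dark = next(
            (j for j in range(limit) if gray[j][i] < threshold_o), limit
        )
        line.append(first_dark)
    return line


# Hypothetical 5x4 gray image: one dark object pixel at row 1, column 2.
gray = [
    [220, 220, 220, 220],
    [220, 220, 40, 220],
    [220, 220, 220, 220],
    [220, 220, 220, 220],
    [30, 30, 30, 30],
]
print(object_line(gray, boundary=[3, 3, 3, 3], threshold_o=128))
```

Columns without an object return the boundary row, while the column containing the object returns a smaller row index, which is the protrusion described above.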

FIG. 11 is an exemplary diagram illustrating a method of detecting objects in the reduced binarized image of FIG. 8 using divided regions.

Referring to FIG. 11, the object inference module 102 detects at least one object 103 in each of the plurality of divided regions 301.

As the number of divided regions 301 increases, each divided region 301 becomes narrower, so the probability that two or more objects 103 exist within a single divided region 301 gradually decreases.

Therefore, detecting objects independently (or in parallel) in the plurality of divided regions 301 shortens the overall inference time.

Detection of a single object 103 in each divided region 301 is described in more detail below.

For example, one condition for a point on the object line 703 within a divided region 301 to be regarded as an object 103 is that the increase in the object line 703 exceeds a designated object detection threshold (THRESHOLD_OBJECT).

For instance, treating the object line 703 as a function, the point (x, y) whose y value, that is, whose vertical pixel position, satisfies dy/dx > THRESHOLD_OBJECT, as in Equation 6, is the position at which the object 103 exists. In other words, the point at which the gradient increases the most is the point at which the object is located.
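The gradient test can be sketched with a discrete difference. The sketch assumes that the object line and boundary line are given as row indices per column (rows counted from the top), that the protrusion height is the boundary row minus the object-line row, and that dy/dx is approximated by the difference in height between neighboring columns; `detect_object` is a hypothetical name.

```python
from typing import Optional


def detect_object(object_line, boundary, start, end,
                  threshold_object) -> Optional[tuple[int, int]]:
    """Return (x, y) of the steepest rise above threshold_object within the
    column range [start, end), or None if no rise exceeds the threshold."""
    best = None
    best_rise = threshold_object
    for x in range(max(start, 1), end):
        # Protrusion height above the boundary line at columns x-1 and x.
        prev_h = boundary[x - 1] - object_line[x - 1]
        cur_h = boundary[x] - object_line[x]
        rise = cur_h - prev_h
        if rise > best_rise:
            best, best_rise = (x, object_line[x]), rise
    return best


# Hypothetical object line with one protrusion at column 2.
line = [3, 3, 1, 3]
boundary = [3, 3, 3, 3]
print(detect_object(line, boundary, start=0, end=4, threshold_object=1))
```

Calling the function once per divided region, with that region's column range as `start` and `end`, yields at most one detection per region.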

However, the above description is not intended to limit object detection to the divided regions 301; the same method may also be applied to detect objects in the entire image without dividing it into regions.

Meanwhile, once the positions of the objects have been detected in the reduced binarized image 503 as described above, the object-containing regions 305 are clustered (that is, the objects are combined so that they fall within object-containing regions) according to the size that can be input to the machine learning model (or machine learning inference model), and the clustered object-containing regions 305 are restored to the original image 201.

In this case, minimizing the number of object-containing regions to be calculated (or generated) is desirable for shortening the object inference time.

For example, referring to FIG. 12, depending on the clustering scheme, up to four object-containing regions 305 are generated when each object-containing region 305 contains one object, and two object-containing regions 305 are generated when each contains two objects.

If a single object-containing region 305 containing all four objects were generated, it would exceed the size that can be input to the machine learning model (or machine learning inference model); therefore, in FIG. 12 it is preferable to cluster (906) the objects into two object-containing regions 305.

A method of clustering the object-containing regions is described below.

FIG. 12 is an exemplary diagram illustrating a method of clustering object-containing regions in the reduced binarized image of FIG. 11.

Referring to FIG. 12, when the total number of objects is F, clustering is performed using combinations given by a designated clustering function (${}_F C_f$, with f = F, F-1, ..., 1) so that the number of objects included in each object-containing region 305 is maximized. That is, starting from f = F, objects that have already been clustered are excluded while the remaining objects are clustered.

This process is performed preferentially starting from the clustering that includes the largest number of objects.
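The largest-first clustering can be sketched as a greedy search over combinations. This is an illustrative sketch, not the patented procedure: it assumes each object is represented by a single point in the reduced image, that a group "fits" when its bounding box is no larger than `max_w` by `max_h` (the model input size expressed in reduced-image pixels), and that the first fitting combination of a given size is accepted. The function name and parameters are hypothetical. Enumerating combinations grows quickly with F, so the sketch is only practical for a small number of objects.

```python
from itertools import combinations


def cluster_objects(points, max_w, max_h):
    """Greedily group points, trying the largest group size first."""
    remaining = list(points)
    clusters = []
    f = len(remaining)
    while remaining:
        f = min(f, len(remaining))
        found = False
        for combo in combinations(remaining, f):
            xs = [p[0] for p in combo]
            ys = [p[1] for p in combo]
            # Accept the group if its bounding box fits the model input.
            if max(xs) - min(xs) <= max_w and max(ys) - min(ys) <= max_h:
                clusters.append(list(combo))
                remaining = [p for p in remaining if p not in combo]
                found = True
                break
        if not found:
            f -= 1  # no group of this size fits; try a smaller one
    return clusters


# Hypothetical example: four objects forming two well-separated pairs.
objects = [(10, 10), (20, 12), (200, 15), (210, 18)]
print(cluster_objects(objects, max_w=50, max_h=50))
```

In this example all four objects do not fit in one region, no group of three fits either, and the search settles on two regions of two objects each, which mirrors the outcome described for FIG. 12.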

Meanwhile, the object-containing region ((cx, cy), (cx+w, cy+h)) 305 in the reduced binarized image 503 shown in FIG. 12 can be mapped to the object-containing region (((cx, cy), (cx+w, cy+h)) × RATIO) in the original image 201 by applying the designated ratio (RATIO) to the object-containing region in the reduced binarized image 503, as in Equation 7.

FIG. 13 is an exemplary diagram illustrating the process in FIG. 6 of calculating an object-containing region from the original image and inputting it to the machine learning inference model.

As described with reference to FIG. 12, once the object-containing region (((cx, cy), (cx+w, cy+h)) × RATIO) in the original image 201 has been calculated (extracted) using the object-containing region ((cx, cy), (cx+w, cy+h)) detected in the reduced binarized image 503, that region of the original image 201 can be input to the machine learning inference model 205 to produce the final result.
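Mapping a region back to the original image and cropping it can be sketched as follows. The sketch assumes the earlier definition RATIO = h/H = w/W, under which restoring a reduced-image coordinate means dividing by RATIO; reading "applying the ratio" this way is an assumption. The image is a plain list of rows, and the names `map_region` and `crop` are hypothetical.

```python
Region = tuple[tuple[int, int], tuple[int, int]]


def map_region(region: Region, ratio: float) -> Region:
    """Scale ((x0, y0), (x1, y1)) from the reduced image to the original."""
    (x0, y0), (x1, y1) = region
    return ((int(x0 / ratio), int(y0 / ratio)),
            (int(x1 / ratio), int(y1 / ratio)))


def crop(image, region: Region):
    """Cut the half-open region [x0, x1) x [y0, y1) out of a row-major image."""
    (x0, y0), (x1, y1) = region
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical 8x8 original image whose pixel value encodes its position.
original = [[y * 8 + x for x in range(8)] for y in range(8)]
reduced_region = ((1, 1), (3, 2))  # found in a 4x4 reduced image
full_region = map_region(reduced_region, ratio=0.5)
patch = crop(original, full_region)
print(full_region, patch)
```

The cropped patch, rather than a downscaled copy of the whole frame, is what would be handed to the inference model, so the object keeps its original pixel resolution.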

As described above, the present embodiment does not simply reduce the entire original image 201 and input it to the machine learning inference model 205. Instead, it calculates one or more object-containing regions 305 in the original image 201 and inputs them to the machine learning inference model 205, which improves both the inference accuracy and the inference speed of object recognition and classification.

In addition, the present embodiment can be applied directly as the input scheme of a machine learning inference model for detecting objects (e.g., unmanned aerial vehicles) appearing in the sky space and identifying their type. It provides excellent inference performance particularly on high-resolution (e.g., 4K, FHD) images, and can be applied directly to systems that track the physical position of an inferred object.

The present invention has been described above with reference to the embodiments shown in the drawings, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the technical scope of protection of the present invention should be determined by the claims below. In addition, the implementations described herein may be embodied as, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if discussed only in the context of a single form of implementation (e.g., discussed only as a method), the features discussed may also be implemented in other forms (e.g., an apparatus or a program). An apparatus may be implemented in suitable hardware, software, firmware, and the like. A method may be implemented in an apparatus such as a processor, which refers generally to a processing device including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as computers, cell phones, portable/personal digital assistants (PDAs), and other devices, that facilitate the communication of information between end users.

101: camera module 102: object inference module
102a: object classification processor 102b: object prediction processor
103: object or object image 104: image output module
105: bounding box 205: machine learning inference model

Claims

An object recognition device comprising an object inference model that processes an original image captured by a camera module and generates an image of a size to be input to a machine learning inference model,
wherein the object inference model includes the machine learning inference model and outputs a result of recognition and classification of an object inferred through the machine learning inference model, and
wherein the machine learning inference model processes an input image to infer an object included in the input image.

The object recognition device of claim 1, wherein the object inference module extracts, as the image to be input to the machine learning inference model, images of object-containing regions clustered in the original image according to the size to be input to the machine learning inference model.

The object recognition device of claim 2, wherein, in order to extract the object-containing region, the object inference module calculates a background line from a binarized image obtained by reducing the original image, calculates a boundary line by adding an offset toward the sky space above the background line, and extracts the object-containing region only from the sky space, excluding the region below the boundary line.

The object recognition device of claim 3, wherein the object inference module converts the color image that is the original image into a gray image in which the R, G, and B values are equal, and reduces the gray image to generate a reduced binarized image,
wherein, in the binarized image, a pixel is expressed as 1 when its value is greater than or equal to a designated threshold and as 0 when its value is less than the designated threshold.

The object recognition device of claim 3, wherein the object inference module calculates N background lines by applying different thresholds, as in Equation 2 below, applies a different weight to each of the N background lines, and then takes the weighted average to calculate the final background line ($B_i$), as in Equation 3 below.
(Equation 2)

(Equation 3)

The object recognition device of claim 3, wherein the object inference module calculates an object line in order to detect an object in the reduced binarized image, and, in order to calculate the object line, detects the object line by repeatedly finding the first vertical pixel index ($OBJ_{i,O}$) at which the vertical pixel value for horizontal pixel i of the binarized image is at its minimum, by applying a designated object-line threshold (THRESHOLD_O) to Equation 5 below.
(Equation 5)

The object recognition device of claim 6, wherein the object inference module independently detects objects in a plurality of divided regions obtained by dividing the reduced binarized image in the vertical direction, and detects, as the position of an object, the point at which the slope of the object line increases the most beyond an object detection threshold (THRESHOLD_OBJECT).

The object recognition device of claim 7, wherein, when the positions of objects have been detected in the reduced binarized image, the object inference module clusters object-containing regions according to the size that can be input to the machine learning inference model, clustering them into the minimum number of regions that can be formed in the image.

The object recognition device of claim 8, wherein the object inference module maps the clustered object-containing region in the reduced binarized image to an object-containing region in the original image by applying a designated ratio (RATIO).

An object recognition method comprising:
receiving, by an object inference module, an original image;
extracting, by the object inference module, a region including at least one object according to a size to be input to a machine learning inference model, through processing of the original image; and
performing, by the object inference module, recognition and classification of an object included in the at least one object-containing region using the machine learning inference model, and outputting the result.

The method of claim 10, wherein the extracting of the at least one object-containing region according to the size to be input to the machine learning inference model comprises, by the object inference module:
converting the original image to generate a reduced binarized image;
clustering regions that each include at least one object, corresponding to objects detected in the reduced binarized image; and
extracting the object-containing region to be input to the machine learning inference model by mapping the object-containing region clustered in the reduced binarized image to an object-containing region in the original image.

The method of claim 11, wherein, in order to extract the at least one object-containing region from the reduced binarized image, the object inference module calculates a background line in the reduced binarized image, calculates a boundary line by adding an offset toward the sky space above the background line, and extracts the object-containing region only from the sky space, excluding the region below the boundary line.

The method of claim 11, wherein, in order to extract the at least one object-containing region from the reduced binarized image, the object inference module divides the reduced binarized image in the vertical direction into a plurality of divided regions, and detects objects independently in the plurality of divided regions, thereby reducing the overall inference time.

An object recognition method comprising:
generating, by an object inference module, a reduced image by reducing an original image;
generating, by the object inference module, a reduced binarized image by converting the reduced image into a binarized image;
detecting, by the object inference module, objects in the reduced binarized image and clustering object-containing regions according to a size that can be input to a machine learning inference model;
mapping, by the object inference module, an object-containing region clustered in the reduced binarized image to an object-containing region in the original image; and
inferring an object by inputting the object-containing regions mapped to the original image to the machine learning inference model.

The method of claim 14, wherein, in order to extract the object-containing region, the object inference module calculates a background line from the reduced binarized image, then calculates a boundary line by adding an offset toward the sky space above the background line, and extracts the object-containing region only from the sky space, excluding the region below the boundary line.

The method of claim 13, wherein, in the generating of the reduced binarized image, the object inference module converts the color image that is the original image into a gray image in which the R, G, and B values are equal, reduces the gray image to generate a reduced image, and then generates the binarized image by expressing each pixel as 1 when its value is greater than or equal to a designated threshold and as 0 when its value is less than the designated threshold.

The method of claim 15, wherein, in order to calculate the background line, the object inference module calculates N background lines by applying different thresholds, as in Equation 2 below, applies a different weight to each of the N background lines, and then takes the weighted average to calculate the final background line ($B_i$), as in Equation 3 below.
(Equation 2)

(Equation 3)

The method of claim 14, wherein, in order to detect an object in the reduced binarized image, the object inference module calculates an object line and detects, as the position of the object, the point at which the slope of the object line increases the most beyond an object detection threshold (THRESHOLD_OBJECT).

The method of claim 18, wherein, in order to calculate the object line, the object inference module detects the object line by repeatedly finding the first vertical pixel index ($OBJ_{i,O}$) at which the vertical pixel value for horizontal pixel i of the reduced binarized image is at its minimum, by applying a designated object-line threshold (THRESHOLD_O) to Equation 5 below.
(Equation 5)

The method of claim 14, wherein in the step of mapping an object-containing region clustered in the reduced binarized image to an object-containing region in the original image,
The object inference module,
The object-containing area (((cx, cy), (cx + w, cy + h)) × RATIO).