KR102660089B1

KR102660089B1 - Method and apparatus for estimating depth of object, and mobile robot using the same

Info

Publication number: KR102660089B1
Application number: KR1020210143207A
Authority: KR
Inventors: 김근환; 황정훈; 김동엽; 김태근
Original assignee: 한국전자기술연구원
Priority date: 2021-10-26
Filing date: 2021-10-26
Publication date: 2024-04-23
Anticipated expiration: 2041-10-26
Also published as: KR20230059236A

Abstract

The present invention relates to an object depth estimation method and device that estimate an optimal depth value using a low-cost sensor, and to a mobile device using the same. The object depth estimation method according to the present invention includes acquiring an RGB color image and depth information, designating a region of interest through object recognition in the RGB color image, generating a distribution of depth values for the pixels within the region of interest using the depth information, and estimating, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value.

Description

Method and apparatus for estimating depth of object, and mobile robot using the same

The present invention relates to a method for estimating the depth of an object and, more specifically, to an object depth estimation method and device that estimate an optimal depth value using a low-cost sensor, and to a mobile device using the same.

Methods for estimating three-dimensional (3D) depth information, which is used in various fields such as robot vision, human-computer interfaces, intelligent visual surveillance, 3D image acquisition, and intelligent driver assistance systems, are being actively researched.

Most traditional methods for estimating 3D depth information rely on multiple images, as in stereo vision. Stereo matching estimates depth from the binocular disparity between images obtained from two cameras. Although this approach has many advantages, it has the fundamental limitation of requiring a pair of images of the same scene obtained from two cameras.
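The disparity-to-depth relation that stereo matching relies on can be sketched as follows. This is only an illustration of the standard rectified-stereo formula Z = f * B / d and is not part of the claimed method; the function name and the sample focal length, baseline, and disparity are assumptions chosen for the example.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a point seen by a rectified stereo pair.

    focal_px     : focal length in pixels (assumed identical for both cameras)
    baseline_m   : distance between the two camera centres in metres
    disparity_px : horizontal pixel shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 700 px focal length, 12 cm baseline, 42 px disparity.
z = depth_from_disparity(700.0, 0.12, 42.0)
```

The formula makes the limitation visible: without a second, calibrated view there is no disparity to measure.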

Monocular methods are also being studied as an alternative to methods based on binocular disparity. For example, the depth from defocus (DFD) method is a single-camera depth estimation method that estimates the degree of defocus blur using a pair of images of the same scene taken with different focus settings. However, this method has the limitation of requiring a fixed camera view in order to capture multiple defocused images.

Korean Patent Publication No. 10-2016-0041441 (2016.04.18.)

Therefore, an object of the present invention is to provide an object depth estimation method and device that estimate an optimal depth value using a low-cost sensor, and a mobile device using the same.

The object depth estimation method according to the present invention includes acquiring an RGB color image and depth information, designating a region of interest through object recognition in the RGB color image, generating a distribution of depth values for the pixels within the region of interest using the depth information, and estimating, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value.

In the object depth estimation method according to the present invention, the step of designating the region of interest designates the region of interest through deep learning-based object recognition.

In the object depth estimation method according to the present invention, the step of generating the distribution calculates the distribution as a histogram over all pixels within the region of interest.

In the object depth estimation method according to the present invention, the estimating step estimates, as the depth of the object, the average depth value of the part of the distribution whose frequency is higher than a preset value.

In the object depth estimation method according to the present invention, the acquiring step acquires the RGB color image and the depth information through an RGB-D camera.

An object depth estimation device according to the present invention includes a camera that acquires an RGB color image and depth information, and a control unit that designates a region of interest through object recognition in the RGB color image obtained from the camera, generates a distribution of depth values for all pixels within the region of interest using the depth information, and estimates, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value.

A mobile device according to the present invention includes an object depth estimation device and a robot control unit that detects an object by means of the object depth estimation device. The object depth estimation device includes a camera that acquires an RGB color image and depth information, and a control unit that designates a region of interest through object recognition in the RGB color image obtained from the camera, generates a distribution of depth values for all pixels within the region of interest using the depth information, and estimates, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value.

The object depth estimation method according to the present invention acquires an RGB color image and depth information through a low-cost RGB-D camera, generates a distribution of depth values for the pixels within the region of interest, and estimates, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value, so that an accurate depth value can be estimated in a simple way.

Figure 1 is a block diagram showing the configuration of a mobile device according to an embodiment of the present invention.
Figure 2 is a block diagram showing the configuration of an object depth estimation device according to an embodiment of the present invention.
Figure 3 is a flowchart showing an object depth estimation method according to an embodiment of the present invention.
Figure 4 is a flowchart showing step S200 of Figure 3 in detail.
Figure 5 is an example diagram for explaining an object depth estimation method according to an embodiment of the present invention.

It should be noted that the technical terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in this specification, the technical terms used here should be interpreted as having the meanings commonly understood by those of ordinary skill in the technical field to which the present invention pertains, and should be interpreted neither in an excessively broad sense nor in an excessively narrow sense. If a technical term used in this specification is an incorrect term that fails to express the spirit of the present invention accurately, it should be replaced by, and understood as, a technical term that those skilled in the art can correctly understand. General terms used in the present invention should be interpreted according to their dictionary definitions or according to the context, and should not be interpreted in an excessively narrow sense.

As used in this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprises" or "has" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may not be included, or additional components or steps may be further included.

Terms including ordinal numbers, such as first and second, may be used in this specification to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numbers regardless of the figure, and duplicate descriptions of them are omitted. In describing the present invention, detailed descriptions of related known technologies are omitted where they might obscure the gist of the invention. The accompanying drawings are intended only to make the spirit of the present invention easier to understand and should not be construed as limiting it. The spirit of the present invention should be construed as extending, beyond the accompanying drawings, to all modifications, equivalents, and substitutes.

Figure 1 is a block diagram showing the configuration of a mobile device according to an embodiment of the present invention.

Referring to Figure 1, a mobile device 300 according to an embodiment of the present invention may include an object depth estimation device 100 and a movement control unit 200.

The object depth estimation device 100 is mounted on the mobile device 300. It recognizes objects in captured images and estimates their depth so that the mobile device 300 can detect objects while moving or avoid the detected objects.

Here, the mobile device 300 may be any of various devices that require object recognition, such as a mobile robot or a self-driving car.

The object depth estimation device 100 first acquires an image of the object. Here, the object depth estimation device 100 can acquire an RGB color image and depth information.

Here, the object depth estimation device 100 can acquire the RGB color image and the depth information using an RGB-D camera, which captures both at the same time. However, the invention is not limited to this; the object depth estimation device 100 may instead acquire the RGB color image through an RGB camera and the depth information through a separate depth camera.

The object depth estimation device 100 can also designate a region of interest (bounding box) through object recognition in the acquired RGB color image. Here, the object depth estimation device 100 can designate the region of interest through deep learning-based object recognition. For example, the object depth estimation device 100 may recognize an object in the image using a CNN and designate the region of interest accordingly.

The object depth estimation device 100 can also use the acquired depth information to generate a distribution of depth values for the pixels within the designated region of interest. Here, the object depth estimation device 100 may calculate the distribution as a histogram over all pixels within the region of interest.

The object depth estimation device 100 can also estimate, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value. Specifically, the object depth estimation device 100 may estimate, as the depth of the object, the average depth value of the part of the distribution whose frequency is higher than the preset value. In other words, by taking the average depth over the histogram regions whose frequency is at or above the set value, the object depth estimation device 100 removes noise from the depth values within the region of interest and can estimate the depth of the object more accurately.

The movement control unit 200 can control the movement of the mobile device 300 by detecting objects by means of the object depth estimation device 100 described above.

The configuration of the object depth estimation device 100 according to an embodiment of the present invention is described in detail below.

Figure 2 is a block diagram showing the configuration of an object depth estimation device according to an embodiment of the present invention.

Referring to Figure 2, the object depth estimation device 100 according to an embodiment of the present invention includes a camera 110, a storage unit 120, and a control unit 130.

The camera 110 acquires an image of the object. Here, the camera 110 can acquire an RGB color image and depth information. The camera 110 may be an RGB-D camera, which captures the RGB color image and the depth information at the same time. However, the invention is not limited to this; the camera 110 may instead consist of an RGB camera that acquires the RGB color image and a separate depth camera that acquires the depth information.

The storage unit 120 is a device for storing data and stores the application programs required for the functional operation of the object depth estimation device 100. In particular, the storage unit 120 may store an algorithm for extracting the region of interest, an algorithm for generating the distribution, an algorithm for estimating the object depth, and the like. The storage unit 120 may include at least one storage medium among flash memory, a hard disk, multimedia card micro type memory, card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

The control unit 130 can estimate the depth of an object from the images captured by the camera 110. The control unit 130 may include a region of interest extraction module 131, a distribution generation module 132, and a depth estimation module 133.

The region of interest extraction module 131 can designate a region of interest (bounding box) through object recognition in the RGB color image acquired through the camera 110. Here, the region of interest extraction module 131 can designate the region of interest through deep learning-based object recognition. For example, the region of interest extraction module 131 may recognize an object in the image using a CNN and designate the region of interest accordingly.

First, the region of interest extraction module 131 can convert an input image of a given resolution to the resolution required by the CNN input.

For example, when the input image is a high-resolution image (for example, FHD or QHD) and the CNN expects a low-resolution input (for example, SD), the region of interest extraction module 131 can downscale the input image to the resolution required by the CNN. The input image may be a set of consecutive frames of a video (for example, the three frames at times t-1, t, and t+1, where t is the current time).
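The downscaling step can be sketched as below. The source does not say which interpolation is used or how a box found at network resolution is related back to the full-resolution depth map, so the nearest-neighbour resize, the box rescaling helper, and the 1920x1080 to 640x480 sizes are all assumptions made for illustration.

```python
import numpy as np


def downscale_nearest(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) image to out_h x out_w."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def box_to_source(box, net_size, src_size):
    """Map (x1, y1, x2, y2) from network-input pixels back to source pixels.

    net_size and src_size are (height, width) tuples.
    """
    net_h, net_w = net_size
    src_h, src_w = src_size
    sx, sy = src_w / net_w, src_h / net_h
    x1, y1, x2, y2 = box
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))


# Hypothetical FHD frame reduced to an SD-sized network input.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
net_input = downscale_nearest(frame, 480, 640)
```

Mapping the box back matters only if the depth map is kept at the original resolution; if the depth map is downscaled together with the color image, the box can be used directly.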

Next, the region of interest extraction module 131 passes the preprocessed input image through a pre-trained CNN and obtains, as the CNN output, bounding box candidates for multiple pixel positions together with a confidence score indicating the probability that an object of interest is located in each bounding box candidate.

Here, the CNN can use feature maps of various sizes to estimate the positions and sizes of possible bounding boxes and output the estimated boxes as bounding box candidates. The CNN can also output a confidence score for each candidate. The coefficient values of the CNN can be determined in advance by training on large data sets. The remaining details of the configuration and operation of a CNN for object detection will be readily understood by those skilled in the art, so a detailed description is omitted.

The region of interest extraction module 131 can then determine the region of interest by performing a non-maximum suppression (NMS) algorithm and selecting the bounding box candidates whose confidence is higher than a preset threshold.
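A minimal sketch of this selection step is shown below, assuming the common greedy form of non-maximum suppression with an intersection-over-union (IoU) overlap test. The source specifies only a confidence threshold; the IoU threshold, the function names, and the sample boxes are assumptions.

```python
import numpy as np


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one (x1, y1, x2, y2) box and an N x 4 array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)   # assumes boxes with positive area


def select_rois(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of kept candidates, highest confidence first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Drop candidates at or below the confidence threshold, sort the rest.
    order = [int(i) for i in np.argsort(-scores) if scores[i] > score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        if order:
            overlaps = iou(boxes[best], boxes[order])
            # Suppress candidates that overlap the kept box too strongly.
            order = [i for i, o in zip(order, overlaps) if o <= iou_thresh]
    return keep


candidates = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 260, 260], [0, 0, 50, 50]]
confidences = [0.9, 0.8, 0.7, 0.3]
kept = select_rois(candidates, confidences)
```

In this example the second box is suppressed because it almost coincides with the first, and the fourth is dropped for low confidence, leaving one box per object.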

The distribution generation module 132 can use the depth information acquired through the camera 110 to generate a distribution of depth values for the pixels within the designated region of interest. Here, the distribution generation module 132 may calculate the distribution as a histogram over all pixels within the region of interest.
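The histogram step can be sketched as follows, assuming the depth map is a NumPy array in metres that is pixel-aligned with the color image. Discarding zero or non-finite readings as invalid, the number of bins, and the function name are assumptions; the source states only that a histogram is computed over the pixels in the region of interest.

```python
import numpy as np


def roi_depth_histogram(depth_m: np.ndarray, box, n_bins: int = 20):
    """Histogram of depth values inside box = (x1, y1, x2, y2).

    Returns (counts, bin_edges, valid_values).
    """
    x1, y1, x2, y2 = box
    roi = depth_m[y1:y2, x1:x2].ravel()
    # Assumption: low-cost depth sensors report 0 (or NaN) where no depth was measured.
    valid = roi[np.isfinite(roi) & (roi > 0)]
    counts, edges = np.histogram(valid, bins=n_bins)
    return counts, edges, valid


# Synthetic scene: background at 4.0 m, a 40 x 40 px object at 1.5 m, one dropout pixel.
depth = np.full((100, 100), 4.0)
depth[20:60, 30:70] = 1.5
depth[25, 35] = 0.0
counts, edges, valid = roi_depth_histogram(depth, (25, 15, 75, 65), n_bins=10)
```

Because a bounding box is rectangular, it usually contains background pixels as well as the object, which is why the histogram shows more than one peak here.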

The depth estimation module 133 can estimate, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value. Specifically, the depth estimation module 133 may estimate, as the depth of the object, the average depth value of the part of the distribution whose frequency is higher than the preset value. In other words, by taking the average depth over the histogram regions whose frequency is at or above the set value, the depth estimation module 133 removes noise from the depth values within the region of interest and can estimate the depth of the object more accurately.
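One way to read this step is sketched below: keep only the pixels that fall in histogram bins whose count exceeds the preset value, then average them. The bin count, the threshold, and the function name are assumptions, and the source does not say how the threshold is chosen.

```python
import numpy as np


def estimate_depth(values, n_bins: int = 20, min_count: int = 50):
    """Mean depth of the pixels lying in bins whose count exceeds min_count.

    Returns None when no bin passes the threshold.
    """
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    # Bin index of every value; clip so the maximum lands in the last bin.
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    dense = counts > min_count            # bins with frequency above the preset value
    selected = values[dense[idx]]
    if selected.size == 0:
        return None
    return float(selected.mean())


# Synthetic ROI: object near 1.5 m, a few background pixels at 4.0 m, a few noisy readings.
samples = np.concatenate([
    np.full(900, 1.50), np.full(60, 1.52), np.full(30, 4.0), np.full(10, 0.4),
])
estimate = estimate_depth(samples, n_bins=20, min_count=50)
plain_mean = float(samples.mean())
```

Here the plain mean is pulled away from the object by the background and noise pixels, while the thresholded estimate stays at the dominant peak. If the background peak also exceeded the threshold, both peaks would be averaged together, so the threshold has to be set with the expected box contents in mind.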

The object depth estimation method according to an embodiment of the present invention is described in detail below.

Figure 3 is a flowchart showing an object depth estimation method according to an embodiment of the present invention, Figure 4 is a flowchart showing step S200 of Figure 3 in detail, and Figure 5 is an example diagram for explaining the object depth estimation method according to an embodiment of the present invention.

Referring to Figures 3 to 5, the object depth estimation device first acquires an RGB color image and depth information in step S100.

Next, in step S200, the object depth estimation device can designate a region of interest (bounding box) through object recognition in the acquired RGB color image. Here, the object depth estimation device can designate the region of interest through deep learning-based object recognition. For example, the object depth estimation device may recognize an object in the image using a CNN and designate the region of interest accordingly.

Specifically, in step S210, the object depth estimation device can perform preprocessing that converts an input image of a given resolution to the resolution required by the CNN input.

For example, when the input image is a high-resolution image (for example, FHD or QHD) and the CNN expects a low-resolution input (for example, SD), the object depth estimation device can downscale the input image to the resolution required by the CNN. The input image may be a set of consecutive frames of a video (for example, the three frames at times t-1, t, and t+1, where t is the current time).

Next, in step S220, the object depth estimation device passes the preprocessed input image through a pre-trained CNN and obtains, as the CNN output, bounding box candidates for multiple pixel positions together with a confidence score indicating the probability that an object of interest is located in each bounding box candidate.

Here, the CNN can use feature maps of various sizes to estimate the positions and sizes of possible bounding boxes and output the estimated boxes as bounding box candidates. The CNN can also output a confidence score for each candidate. The coefficient values of the CNN can be determined in advance by training on large data sets. The remaining details of the configuration and operation of a CNN for object detection will be readily understood by those skilled in the art, so a detailed description is omitted.

Then, in step S230, the object depth estimation device can determine the region of interest by performing a non-maximum suppression (NMS) algorithm and selecting the bounding box candidates whose confidence is higher than a preset threshold.

Next, in step S300, the object depth estimation device can use the depth information acquired in step S100 to generate a distribution of depth values for the pixels within the designated region of interest. Here, the object depth estimation device may calculate the distribution as a histogram over all pixels within the region of interest.

Next, in step S400, the object depth estimation device can estimate, as the depth of the object, the depth value of the part of the distribution whose frequency is higher than a preset value. Specifically, the object depth estimation device may estimate, as the depth of the object, the average depth value of the part of the distribution whose frequency is higher than the preset value. In other words, by taking the average depth over the histogram regions whose frequency is at or above the set value, the object depth estimation device removes noise from the depth values within the region of interest and can estimate the depth of the object more accurately.
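Steps S300 and S400 can be written compactly as follows. The notation is introduced here for illustration and does not appear in the source: R is the set of pixels in the region of interest, d_i is the depth of pixel i, B_k is the k-th histogram bin, h_k is its frequency, and tau is the preset value. The strict inequality follows the "higher than a preset value" wording; the text also describes the threshold as "at or above", in which case the comparison becomes h_k >= tau.

```latex
h_k = \bigl|\{\, i \in R : d_i \in B_k \,\}\bigr|,
\qquad
S = \{\, i \in R : d_i \in B_k \text{ for some } k \text{ with } h_k > \tau \,\},
\qquad
\hat{d} = \frac{1}{|S|} \sum_{i \in S} d_i .
```

Pixels in sparsely populated bins never enter S, which is how isolated noisy readings are excluded from the average.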

For example, as shown in FIG. 5, the object depth estimation device may generate a histogram of the depth values of all pixels and estimate, as the depth of the object, the average of the depth values over the region whose depth values exceed the preset depth value (A).

As described above, the object depth estimation method according to an embodiment of the present invention acquires an RGB color image and depth information through a low-cost RGB-D camera, generates a distribution map of the depth values of the pixels within the region of interest, and estimates, as the depth of the object, the depth value of the region of the distribution map whose frequency is higher than a preset value, so that an accurate depth value can be estimated in a simple manner.

In addition, although the drawings depict operations in a particular order in describing embodiments of the present invention, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to obtain desirable results. In certain cases, multitasking and parallel processing may be advantageous. Likewise, the separation of the various system components in the embodiments described above should not be understood as requiring such separation in all embodiments; the described program components and systems can generally be integrated together into a single software product or packaged into multiple software products.

100: object depth estimation device 110: camera
120: storage unit 130: control unit
131: region of interest extraction module 132: distribution map generation module
133: depth estimation module 200: robot control unit
300: mobile device

Claims

An object depth estimation method comprising:
obtaining an RGB color image and depth information;
designating a region of interest through object recognition in the RGB color image;
generating a distribution map of depth values for pixels within the region of interest using the depth information; and
estimating, as the depth of the object, the depth value of a region of the distribution map in which the frequency is higher than a preset value,
wherein the estimating step estimates, as the depth of the object, the average depth value of the region of the distribution map in which the frequency is higher than the preset value.

The object depth estimation method of claim 1,
wherein the step of designating the region of interest
designates the region of interest through deep learning-based object recognition.

The object depth estimation method of claim 1,
wherein the step of generating the distribution map
computes the distribution map as a histogram over all pixels within the region of interest.

(Deleted)

The object depth estimation method of claim 1,
wherein the obtaining step
acquires the RGB color image and the depth information through an RGB-D camera.

An object depth estimation device comprising:
a camera that acquires an RGB color image and depth information; and
a control unit that designates a region of interest through object recognition in the RGB color image obtained from the camera, generates a distribution map of depth values for all pixels within the region of interest using the depth information, and estimates, as the depth of the object, the depth value of a region of the distribution map in which the frequency is higher than a preset value,
wherein the control unit
estimates, as the depth of the object, the average depth value of the region of the distribution map in which the frequency is higher than the preset value.

A mobile device comprising:
an object depth estimation device including a camera that acquires an RGB color image and depth information, and a control unit that designates a region of interest through object recognition in the RGB color image obtained from the camera, generates a distribution map of depth values for all pixels within the region of interest using the depth information, and estimates, as the depth of the object, the depth value of a region of the distribution map in which the frequency is higher than a preset value; and
a movement control unit that controls movement by detecting an object by means of the object depth estimation device,
wherein the control unit
estimates, as the depth of the object, the average depth value of the region of the distribution map in which the frequency is higher than the preset value.