KR102453834B1

KR102453834B1 - A method for structuring the output information of multiple thermal and image cameras as input data of a deep neural network model

Info

Publication number: KR102453834B1
Application number: KR1020200087721A
Authority: KR
Inventors: 이정우; 박지현; 황정환; 이나현; 최영호
Original assignee: 한국로봇융합연구원
Priority date: 2020-07-15
Filing date: 2020-07-15
Publication date: 2022-10-11
Also published as: KR20220009246A

Abstract

A method for structuring data into input data of a deep neural network model according to an embodiment of the present invention includes: when a two-dimensional image is input, classifying, by a plane classification unit, the information included in the image based on the attributes of that information and generating a plurality of planar images, each containing classified information; and generating, by a stereoscopic forming unit, a stereoscopic image having a three-dimensional structure by stacking the plurality of planar images in a plurality of layers.

Description

A method for structuring the output information of multiple thermal and image cameras as input data of a deep neural network model

The present invention relates to a technique for generating input data for a deep neural network model and, more particularly, to a method for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data for a deep neural network model that estimates position information, posture information, and object information.

Various methods that identify objects through deep neural networks are being introduced in place of conventional image processing.

Korean Patent Application Publication No. 2020-0035536, published April 6, 2020 (Title: System and method for enforcing bus-only lane violations using an infrared camera and infrared lighting)

An object of the present invention is to provide a method of deriving input data for training a deep neural network that estimates the absolute position or relative position change, posture information, and object information of an object using a plurality of heterogeneous cameras attached to, connected to, or built into a device in indoor and outdoor environments.

To achieve the above object, a method for structuring data into input data of a deep neural network model according to a preferred embodiment of the present invention includes: when a two-dimensional image is input, classifying, by a plane classification unit, the information included in the image based on the attributes of that information and generating a plurality of planar images, each containing classified information; and generating, by a stereoscopic forming unit, a stereoscopic image having a three-dimensional structure by stacking the plurality of planar images in a plurality of layers.

The generating of the plurality of planar images includes: when the input image is a thermal image, classifying, by the plane classification unit, the pixels of the thermal image into a plurality of unit sections based on the temperature indicated by each pixel value; and generating, by the plane classification unit, a plurality of planar images by constructing, for each of the classified unit sections, a planar image composed of the pixels whose pixel values correspond to temperatures within that unit section.

In the classifying into the plurality of unit sections, the plane classification unit sets the maximum value of the temperature distribution and then divides the temperature range into a predetermined number of unit sections according to an equation (the equation is an image in the original publication and is not reproduced in this text), where Q is the number of divisions, Tstart is the start temperature of a unit section, Tend is the end temperature of the unit section, and Tmax is the maximum value of the temperature distribution.

The generating of the plurality of planar images includes: replacing, by the plane classification unit, any pixel value whose temperature is equal to or greater than the maximum value of the temperature distribution with an effective maximum temperature; generating a plurality of planar images by constructing, for each of the classified unit sections, a planar image that includes feature pixels, whose pixel values are temperatures within the unit section, and padding pixels, for which pixel values of temperatures outside the unit section are converted to 0; and normalizing the pixel values of all pixels of the plurality of planar images into the range [0.0, 1.0).
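The clamping, sectioning, padding, and normalization steps described above can be illustrated with a minimal Python/NumPy sketch. Because the sectioning equation is not available in this text, the sketch assumes equal-width unit sections of width Tmax/Q starting at 0, and it assumes that the effective maximum temperature is the largest floating-point value below Tmax. The function name `thermal_to_planes` and its parameters are hypothetical and are not taken from the patent.

```python
import numpy as np

def thermal_to_planes(thermal, t_max, q):
    """Split a 2-D temperature map into q planar images (one per unit section).

    Assumptions (not confirmed by the source): sections are equal-width,
    start at 0, and the effective maximum is the largest float below t_max.
    """
    t_max = float(t_max)
    t = np.asarray(thermal, dtype=np.float64)
    # Replace temperatures at or above t_max with the effective maximum.
    t_eff = np.nextafter(t_max, 0.0)
    t = np.clip(t, 0.0, t_eff)
    # Index of the unit section each pixel falls into.
    width = t_max / q
    section = np.minimum((t // width).astype(int), q - 1)
    # Feature pixels keep their temperature; all other pixels stay 0 (padding).
    planes = np.zeros((q,) + t.shape)
    for k in range(q):
        mask = section == k
        planes[k][mask] = t[mask]
    # Normalize every pixel into [0.0, 1.0).
    return planes / t_max

planes = thermal_to_planes([[10, 35], [60, 120]], t_max=100, q=4)
```

Under these assumptions each pixel is a feature pixel in exactly one plane and a padding pixel in all others. A pixel whose temperature is exactly 0 cannot be distinguished from padding in this sketch.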

The generating of the plurality of planar images includes: when the input image includes a color image, a depth image, and an infrared image, generating, by the plane classification unit, a plurality of planar images whose pixel values carry at least one of color information, distance information, and brightness information; and normalizing the pixel values of all pixels of the plurality of planar images into the range [0.0, 1.0).

The generating of the plurality of planar images includes: generating, from the color image, an R-channel planar image having only the pixel values of the R (red) channel, a G-channel planar image having only the pixel values of the G (green) channel, and a B-channel planar image having only the pixel values of the B (blue) channel, or generating a grayscale planar image composed of grayscale values; generating, from the depth image, an upper depth planar image and a lower depth planar image according to the magnitude of the pixel values representing distance, or generating a quantized depth planar image by quantizing the pixel values; and generating, from the infrared image, an upper brightness planar image and a lower brightness planar image according to the magnitude of the pixel values representing brightness, or generating a quantized brightness planar image by quantizing the pixel values.
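As a rough illustration of the planar images described above, the following Python/NumPy sketch separates a color image into channel planes, splits a depth or infrared image into upper and lower planes, and builds a quantized plane. The source does not state how the upper and lower planes are separated or how values are quantized, so the single threshold, the uniform quantization, the division by `max_value + 1` for normalization, and all function names are assumptions made for this example.

```python
import numpy as np

def color_planes(rgb, max_value=255):
    """H x W x 3 color image -> R, G, B planar images scaled into [0.0, 1.0)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return np.moveaxis(rgb, -1, 0) / (max_value + 1.0)

def split_planes(img, threshold, max_value):
    """Upper plane keeps values >= threshold, lower plane keeps the rest.

    Pixels that do not belong to a plane are set to 0 (assumed padding).
    """
    img = np.asarray(img, dtype=np.float64)
    upper = np.where(img >= threshold, img, 0.0)
    lower = np.where(img < threshold, img, 0.0)
    return np.stack([upper, lower]) / (max_value + 1.0)

def quantized_plane(img, levels, max_value):
    """Uniformly quantize pixel values to `levels` steps within [0.0, 1.0)."""
    img = np.asarray(img, dtype=np.float64)
    step = np.floor(img / (max_value + 1.0) * levels)
    return step / levels

rgb = np.array([[[255, 0, 128]]], dtype=np.uint8)
channels = color_planes(rgb)
depth = split_planes([[100, 3000]], threshold=1000, max_value=65535)
quant = quantized_plane([[0, 255]], levels=4, max_value=255)
```

Dividing by `max_value + 1` rather than `max_value` keeps the largest possible pixel value strictly below 1.0, which matches the half-open range [0.0, 1.0) stated in the text.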

The method further includes: forming, by a grid forming unit, a grid image by arranging a plurality of stereoscopic images in a grid; and, when at least one of the plurality of stereoscopic images included in the grid image differs in height, matching the heights of the plurality of stereoscopic images included in the grid image by removing at least one planar image from at least one stereoscopic image or by adding at least one padding image to at least one stereoscopic image.
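The height matching and grid arrangement described above might look like the following Python/NumPy sketch. It assumes that each stereoscopic image is an array of shape (layers, height, width), that all stereoscopic images share the same planar size, that surplus planar images are removed from the end of the stack, and that padding images are all-zero. None of these details is specified in the source, and the names `match_height` and `form_grid` are hypothetical.

```python
import numpy as np

def match_height(stack, height):
    """Trim surplus planes or append all-zero padding planes.

    `stack` has shape (layers, H, W); the result has exactly `height` layers.
    """
    stack = np.asarray(stack, dtype=np.float64)
    layers = stack.shape[0]
    if layers >= height:
        return stack[:height]  # remove planar images from the end (assumption)
    pad = np.zeros((height - layers,) + stack.shape[1:])
    return np.concatenate([stack, pad], axis=0)

def form_grid(stacks, rows, cols, height):
    """Arrange rows * cols stereoscopic images into one grid image."""
    if len(stacks) != rows * cols:
        raise ValueError("number of stacks must equal rows * cols")
    matched = [match_height(s, height) for s in stacks]
    # Join each row side by side, then join the rows top to bottom.
    row_blocks = [
        np.concatenate(matched[r * cols:(r + 1) * cols], axis=2)
        for r in range(rows)
    ]
    return np.concatenate(row_blocks, axis=1)

a = np.ones((4, 2, 3))        # four layers, trimmed to three
b = np.full((2, 2, 3), 2.0)   # two layers, padded to three
grid = form_grid([a, b], rows=1, cols=2, height=3)
```

The resulting grid image has shape (3, 2, 6): the left half holds the trimmed stack and the right half holds the padded stack, whose third layer is all zeros.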

The method further includes: when a two-dimensional image is input, generating the plurality of planar images by the plane classification unit, generating the stereoscopic image from the plurality of planar images by the stereoscopic forming unit, and forming a grid image by arranging the plurality of stereoscopic images in a grid by the grid forming unit; collecting, by a learning unit, at least one of position information, posture information, and object information of an object as label data corresponding to the input data, which is the stereoscopic image or the grid image; and labeling, by the learning unit, the input data with the label data.
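One possible way to pair input data with label data, as described above, is sketched below in Python. The record layout is purely illustrative: the source only says that at least one of position, posture, and object information is collected as label data, so the field names, the three-component tuples, and the `label_input` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np

@dataclass
class LabeledSample:
    """One training sample: input data plus whichever labels were collected."""
    input_data: np.ndarray                                   # stereoscopic or grid image
    position: Optional[Tuple[float, float, float]] = None    # assumed (x, y, z)
    posture: Optional[Tuple[float, float, float]] = None     # assumed (roll, pitch, yaw)
    object_info: Optional[str] = None                        # assumed class name

def label_input(input_data, position=None, posture=None, object_info=None):
    """Attach label data to input data; at least one label must be given."""
    if position is None and posture is None and object_info is None:
        raise ValueError("at least one label is required")
    return LabeledSample(np.asarray(input_data), position, posture, object_info)

sample = label_input(np.zeros((3, 2, 2)), position=(1.0, 2.0, 0.0))
```

Keeping the unused labels as `None` lets the same record type serve networks trained for position, posture, or object estimation.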

According to the present invention, the output information of a plurality of thermal imaging cameras or image cameras is processed, converted into an input data structure, and fused, enabling efficient training of a deep neural network model that estimates the absolute position or relative position change, posture information, and the like of a device or an observed target in indoor and outdoor environments. When a deep neural network is trained using the training data generated by the present invention, position and posture can be estimated conveniently and quickly without computing complex image processing algorithms.

FIG. 1 is a conceptual diagram for explaining an information derivation method of a computing device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the configuration of an apparatus for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the detailed configuration of the control unit of the apparatus for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining an example of a deep neural network model for estimating position, posture, and object information based on the output information of a plurality of thermal imaging cameras and image cameras according to an embodiment of the present invention.
FIG. 5 is a flowchart for explaining a method for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a method of generating a stereoscopic image from the output information of a plurality of thermal imaging cameras according to a first embodiment of the present invention.
FIG. 7 is a diagram for explaining a method of generating a stereoscopic image from the output information of a plurality of image cameras according to a second embodiment of the present invention.
FIG. 8 is a diagram for explaining a method of generating a grid image from the output information of a plurality of thermal imaging cameras and image cameras according to an embodiment of the present invention.
FIG. 9 is a flowchart for explaining a method of generating a stereoscopic image according to the first embodiment of the present invention.
FIG. 10 is a flowchart for explaining a method of generating a stereoscopic image according to the second embodiment of the present invention.
FIG. 11 is a flowchart for explaining a method of generating training data according to an embodiment of the present invention.

Prior to the detailed description of the present invention, note that the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; they should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent its entire technical idea, so it should be understood that various equivalents and modifications that could replace them may exist at the time of filing this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals throughout the drawings wherever possible. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size.

First, a method of deriving information through a deep neural network from input data generated from the images of a plurality of thermal imaging cameras and image cameras according to an embodiment of the present invention is described. FIG. 1 is a conceptual diagram for explaining an information derivation method of a computing device according to an embodiment of the present invention. FIG. 2 is a diagram for explaining the configuration of an apparatus for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention. FIG. 3 is a diagram for explaining the detailed configuration of the control unit of that apparatus. FIG. 4 is a diagram for explaining an example of a deep neural network model for estimating position, posture, and object information based on the output information of a plurality of thermal imaging cameras and image cameras according to an embodiment of the present invention.

Referring to FIG. 1, the computing device 10 is basically for training a deep neural network according to an embodiment of the present invention. The computing device 10 may be installed in a robot (R) or a computing apparatus (C) to train the deep neural network. When the deep neural network is trained in the computing apparatus (C), the trained network may be ported to the robot (R). The deep neural network may be trained to identify the position information of the robot (R) or the objects (obj1, obj2, obj3), to estimate posture information, or to identify the objects (obj1, obj2, obj3). To this end, the present invention uses input data generated from a color image, a depth image, an infrared image, and a thermal image, together with training data in which the input data is labeled with position information, posture information, and object information.

Referring to FIG. 2, the apparatus 10 for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention (hereinafter abbreviated as the 'structuring apparatus') includes a camera unit 11, an input unit 12, a display unit 13, a storage unit 14, a communication unit 15, a sensor unit 16, a location information unit 17, and a control unit 18.

The camera unit 11 is for capturing images. The camera unit 11 includes an image camera 110 and a thermal imaging camera 120. The image camera 110 captures a subject and outputs at least one of a color image, a depth image, and an infrared image. The thermal imaging camera 120 captures a subject and outputs a thermal image.

The input unit 12 may receive a user's operation for controlling the computing device 10, generate an input signal, and transmit it to the control unit 18. The input unit 12 includes various buttons, keys, and the like for controlling the computing device 10. When the display unit 13 is a touch screen, the functions of the various keys may be performed on the display unit 13, and when all functions can be performed with the touch screen alone, the input unit 12 may be omitted.

The display unit 13 is for screen display and may visually present to the user the menus of the computing device 10, input data, function setting information, and various other information. The display unit 13 may be formed of a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), or the like. The display unit 13 may also be implemented as a touch screen, in which case it includes a touch sensor that detects the user's touch input. The touch sensor may be a touch-sensing sensor of the capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. Besides these, any kind of sensor device capable of detecting the contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor may detect the user's touch input, generate a detection signal including input coordinates indicating the touched position, and transmit the signal to the control unit 18. In particular, when the display unit 13 is a touch screen, some or all of the functions of the input unit 12 may be performed through the display unit 13.

The storage unit 14 stores the programs and data necessary for the operation of the computing device 10. The storage unit 14 may store, for a predetermined period, the images captured by the camera unit 11, the inertial data sensed by the sensor unit 16, and the like. The various data stored in the storage unit 14 may be deleted, changed, or added according to the user's operation.

The communication unit 15 is for communicating with another object 20, such as a robot. The communication unit 15 may receive from the object 20 the position information that the object 20 received through its GPS receiver, the inertial sensor information that the object 20 measured through its inertial sensor, and the like. The communication unit 15 includes an RF transmitter that up-converts and amplifies the frequency of a transmitted signal and an RF receiver that low-noise amplifies a received signal and down-converts its frequency. The communication unit 15 also includes a modem that modulates transmitted signals and demodulates received signals.

The sensor unit 16 is for measuring inertia. The sensor unit 16 includes at least one of a speed sensor, an acceleration sensor, an angular velocity sensor, a gyro sensor, an inertial measurement unit (IMU), a Doppler velocity log (DVL), and an attitude and heading reference system (AHRS). These sensors may be implemented as micro-electro-mechanical systems (MEMS).

The location information unit 17 is for receiving Global Positioning System (GPS) signals. For example, the location information unit 17 continuously receives GPS signals from GPS satellites and the like and derives position information from the received signals. The derived position information is transmitted to the control unit 18. The position information may be coordinates such as latitude, longitude, and altitude.

The control unit 18 may control the overall operation of the computing device 10 and the signal flow between its internal blocks (11 to 18), and may perform a data processing function. The control unit 18 also basically serves to control the various functions of the computing device 10. Examples of the control unit 18 include a central processing unit (CPU) and a digital signal processor (DSP).

Next, referring to FIG. 3, the control unit 18 includes a configuration unit 200, a learning unit 300, a deep neural network (DNN) 400, and an estimation unit 500.

The configuration unit 200 generates the input data for the deep neural network 400 from the images captured by the camera unit 11. It receives the two-dimensional images captured by the camera unit 11, constructs one or more stereoscopic images (SI: Stereo Image) having a three-dimensional structure from them, and arranges the resulting plurality of stereoscopic images (SI) in a grid to generate a grid image (GI: Grid Image). The stereoscopic image (SI) or the grid image (GI) may be used as the input data of the deep neural network 400.

To this end, the configuration unit 200 includes a plane forming unit 210, a stereoscopic forming unit 220, and a grid forming unit 230. When a two-dimensional image is input, the plane forming unit 210 classifies the information included in the input image based on the attributes of that information and generates a plurality of planar images (FI: Flat Image), each containing classified information. The stereoscopic forming unit 220 stacks the plurality of planar images (FI) generated by the plane forming unit 210 in a plurality of layers to generate a stereoscopic image (SI) having a three-dimensional structure. The grid forming unit 230 arranges a plurality of stereoscopic images (SI) in a grid to form a grid image (GI). The operation of the configuration unit 200, including the plane forming unit 210, the stereoscopic forming unit 220, and the grid forming unit 230, is described in more detail below.

The learning unit 300 generates training data for the deep neural network 400 by labeling the input data, and trains the deep neural network 400 using that training data.

The deep neural network 400 estimates the position, posture, and object information of an object included in an image, or provides probabilities for classifying them. The deep neural network 400 may be a convolutional neural network (CNN) or a recurrent network such as a recurrent neural network (RNN) or long short-term memory (LSTM). However, the deep neural network 400 is not limited to these, and any kind of artificial neural network whose hidden layer consists of a plurality of layers may serve as the deep neural network 400 according to an embodiment of the present invention.

According to an embodiment of the present invention, when the deep neural network 400 is a convolutional neural network, it may include an input layer (IL), at least one pair of a convolution layer (CL) and a pooling layer (PL) repeated alternately, at least one fully connected layer (FL), and an output layer (OL). As shown in FIG. 4, the deep neural network 400 according to an embodiment of the present invention includes, in order, an input layer (IL), a convolution layer (CL), a pooling layer (PL), a fully connected layer (FL), and an output layer (OL).

The convolution layer (CL) and the pooling layer (PL) each consist of at least one feature map (FM). A feature map (FM) is obtained by receiving values to which weights and thresholds have been applied to the operation results of the previous layer and then performing an operation on those values. The weights are applied through a filter, or kernel (W), which is a weight matrix of a predetermined size. In an embodiment of the present invention, the convolution operation of the convolution layer (CL) uses a first filter (W1), and the pooling operation of the pooling layer (PL) uses a second filter (W2).

When input data (a matrix or vector sequence of a predetermined size) is input to the input layer (IL), the convolution layer (CL) performs a convolution operation using the first filter (W1) and an operation by an activation function on the input data of the input layer (IL) to derive at least one first feature map (FM1). The pooling layer (PL) then performs a pooling (or sub-sampling) operation using the second filter (W2) on the at least one first feature map (FM1) of the convolution layer (CL) to derive at least one second feature map (FM2).
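The convolution and pooling steps described above can be sketched in Python/NumPy as follows. This is only an illustration under stated assumptions: a single filter with no padding and stride 1, ReLU as the activation function, and max pooling over a non-overlapping window that stands in for the second filter (W2). The source does not fix any of these choices, and the function names are hypothetical.

```python
import numpy as np

def conv_layer(x, w1, threshold=0.0):
    """Derive one first feature map (FM1) from a C x H x W input.

    Applies a single C x kH x kW filter (no padding, stride 1), adds the
    threshold (bias), and applies ReLU. As in most CNN libraries, the
    "convolution" here is computed without flipping the filter.
    """
    _, h, w = x.shape
    _, kh, kw = w1.shape
    fm1 = np.empty((h - kh + 1, w - kw + 1))
    for i in range(fm1.shape[0]):
        for j in range(fm1.shape[1]):
            fm1[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w1) + threshold
    return np.maximum(fm1, 0.0)

def pool_layer(fm1, size=2):
    """Derive a second feature map (FM2) by max pooling over size x size windows."""
    h2, w2 = fm1.shape[0] // size, fm1.shape[1] // size
    trimmed = fm1[:h2 * size, :w2 * size]
    return trimmed.reshape(h2, size, w2, size).max(axis=(1, 3))

x = np.arange(50, dtype=np.float64).reshape(2, 5, 5)  # a two-layer stereoscopic image
w1 = np.ones((2, 2, 2))
fm1 = conv_layer(x, w1)
fm2 = pool_layer(fm1)
```

The filter spans every layer of the input, which is why stacking planar images into a stereoscopic image lets one convolution combine information from all of them.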

The fully connected layer (FL) consists of a plurality of operation nodes (f1 to fx). The operation nodes (f1 to fx) of the fully connected layer (FL) calculate a plurality of operation values by applying an activation function to the at least one second feature map (FM2) of the pooling layer (PL).

The output layer OL includes a plurality of output nodes g1 to gy. Each of the operation nodes f1 to fx of the fully connected layer FL is connected to the output nodes g1 to gy of the output layer OL through a channel having a weight (W). In other words, the operation values of the operation nodes f1 to fx are weighted and then input to each of the output nodes g1 to gy. The output nodes g1 to gy of the output layer OL thus compute output values by applying an activation function to the weighted operation values received from the fully connected layer FL.

Examples of the activation function used in the convolution layer CL, the fully connected layer FL, and the output layer OL described above include Sigmoid, hyperbolic tangent (tanh), ELU (Exponential Linear Unit), ReLU (Rectified Linear Unit), Leaky ReLU, Maxout, Minout, and Softmax. Any one of these activation functions may be selected and applied to the convolution layer CL, the fully connected layer FL, and the output layer OL.
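For illustration only, several of the activation functions listed above and a single weighted layer of nodes may be sketched as follows. The definitions are the commonly used ones; the default `slope` and `alpha` values, the added bias term, and the function names are assumptions not stated in the specification.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # Small non-zero slope for negative inputs (slope value is an assumption).
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(values):
    # Subtract the maximum first for numerical stability.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases, activation):
    """One layer of nodes: each node applies the activation to a weighted sum."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Operation values of two nodes, then softmax over three output nodes.
hidden = dense([1.0, 2.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], relu)
scores = dense(hidden, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0] * 3, lambda v: v)
probabilities = softmax(scores)
```

Softmax is applied to the whole vector of output-node values rather than node by node, which is why it is kept separate from `dense` in this sketch.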

In summary, as described above, the deep neural network 400 includes a plurality of layers, and the layers include a plurality of operations. The operation result of each layer is passed to the next layer after parameters such as weights and thresholds are applied. The deep neural network 400 can therefore compute an output value by performing, on the input data, a plurality of operations to which the weights of the layers are applied, and can output the computed value.

The estimator 500 uses the deep neural network 400 to estimate or classify the position, posture, and object information of an object included in an image. The output value of the deep neural network 400 may be a probability. Based on the probability output by the deep neural network 400, the estimator 500 may estimate or classify the position, posture, and object information of the object included in the image.
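As a minimal sketch of the classification case, assuming the output layer yields one probability per class, the estimator could select the class with the highest probability. The function name and the example values are hypothetical.

```python
def classify(probabilities, class_names):
    """Return the most probable class and its probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


result = classify([0.1, 0.7, 0.2], ["Robot", "tumbler", "Human"])
```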

Next, a method for structuring the output information of a plurality of thermal imaging cameras and image cameras into input data of a deep neural network model according to an embodiment of the present invention will be described. FIG. 5 is a flowchart illustrating this method. FIG. 6 is a diagram illustrating a method of generating a stereoscopic image from the output information of a plurality of thermal imaging cameras according to the first embodiment of the present invention. FIG. 7 is a diagram illustrating a method of generating a stereoscopic image from the output information of a plurality of image cameras according to the second embodiment of the present invention. FIG. 8 is a diagram illustrating a method of generating a grid image from the output information of a plurality of thermal imaging cameras and image cameras according to an embodiment of the present invention.

Referring to FIG. 5, in step S110 the configuration unit 200 may receive a plurality of images IMG captured by the plurality of image cameras 110 and the plurality of thermal imaging cameras 120 of the camera unit 11. As described above, the camera unit 11 includes a plurality of image cameras 110 and a plurality of thermal imaging cameras 120. Accordingly, the configuration unit 200 may receive a plurality of images including a color image (CI), a depth image (DI), and an infrared image (II), together with a plurality of thermal images (TI).

When the plurality of images (IMG: CI, DI, II, TI) are input in this way, as shown in FIGS. 6 and 7, the plane classification unit 210 of the configuration unit 200, in step S120, classifies the information included in each of the images based on the attributes of that information and generates a plurality of planar images (FI: Flat Image), each containing the classified information. The stereoscopic forming unit 220 of the configuration unit 200 then stacks the planar images FI generated by the plane classification unit 210 in a plurality of layers to generate a plurality of stereoscopic images (SI: Stereo Image) having a three-dimensional structure. In this embodiment, a planar image FI is an image in which a plurality of pixels are arranged in two dimensions, and the pixel value of each pixel carries predetermined information such as temperature, color, depth (distance), or brightness. A stereoscopic image SI is an image in which a plurality of planar images FI are stacked so that all the pixels of the stacked planar images FI together form a three-dimensional shape.

Step S120 described above covers a first embodiment, shown in FIG. 6, in which the input image is a thermal image TI, and a second embodiment, shown in FIG. 7, in which the input images include a color image CI, a depth image DI, and an infrared image II. The first and second embodiments of step S120 are described in more detail below.

When a plurality of stereoscopic images SI have been generated according to the first and second embodiments, the grid forming unit 230 arranges the stereoscopic images SI in a grid in step S130 to form a grid image (GI: Grid Image).

For example, assume that the camera unit 11 includes a first image camera 111, a second image camera 112, a first thermal imaging camera 121, and a second thermal imaging camera 122. Assume further that these four cameras photograph the subject at the same time, so that the first image camera 111 outputs a first image including a color image CI, a depth image DI, and an infrared image II, the second image camera 112 outputs a second image including a color image CI, a depth image DI, and an infrared image II, the first thermal imaging camera 121 outputs a first thermal image TI1, and the second thermal imaging camera 122 outputs a second thermal image TI2. In this case, as shown in FIG. 8, the configuration unit 200 may generate a first stereoscopic image SI1 from the first image, a second stereoscopic image SI2 from the second image, a third stereoscopic image SI3 from the first thermal image TI1, and a fourth stereoscopic image SI4 from the second thermal image TI2. The grid forming unit 230 then arranges the first stereoscopic image SI1, the second stereoscopic image SI2, the third stereoscopic image SI3, and the fourth stereoscopic image SI4 in a grid, as illustrated, to form the grid image GI.

Meanwhile, if at least one of the stereoscopic images SI1, SI2, SI3, and SI4 to be included in the grid image GI has a different height in step S130, the grid forming unit 230 may match the heights of the stereoscopic images SI1, SI2, SI3, and SI4 either by removing at least one planar image FI from at least one stereoscopic image SI or by adding at least one padding image (PI: Padding Image) to at least one stereoscopic image SI. Here, a padding image PI is a planar image FI in which every pixel has the same constant pixel value.
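By way of non-limiting example, height matching and grid arrangement may be sketched as follows. The sketch represents a stereoscopic image as a list of planes and assumes that all planes have the same width and height, that surplus planes are removed from the end of the stack, and that "arranging in a grid" means tiling the stereoscopic images side by side, plane by plane. These choices and the function names are assumptions not fixed by the specification.

```python
def constant_plane(rows, cols, value=0.0):
    """A padding image: every pixel holds the same constant value."""
    return [[value] * cols for _ in range(rows)]


def match_height(stereo, target, pad_value=0.0):
    """Trim surplus planes or append padding planes until the height is target."""
    rows, cols = len(stereo[0]), len(stereo[0][0])
    trimmed = stereo[:target]
    padding = [constant_plane(rows, cols, pad_value)
               for _ in range(target - len(trimmed))]
    return trimmed + padding


def make_grid(stereos, grid_cols, target_height):
    """Tile equally sized stereoscopic images into one grid image."""
    matched = [match_height(s, target_height) for s in stereos]
    grid = []
    for layer in range(target_height):
        plane = []
        for start in range(0, len(matched), grid_cols):
            group = matched[start:start + grid_cols]
            for r in range(len(group[0][layer])):
                # Concatenate row r of every image in this grid row.
                plane.append([v for s in group for v in s[layer][r]])
        grid.append(plane)
    return grid


def uniform_stereo(value, height):
    """Helper for the example: a 2 x 2 stereoscopic image filled with one value."""
    return [constant_plane(2, 2, value) for _ in range(height)]


# SI1 and SI2 have three planes; SI3 and SI4 have two and receive one padding plane.
stereos = [uniform_stereo(1, 3), uniform_stereo(2, 3),
           uniform_stereo(3, 2), uniform_stereo(4, 2)]
grid_image = make_grid(stereos, grid_cols=2, target_height=3)
```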

The first and second embodiments described above will now be explained in more detail, beginning with the method of generating a stereoscopic image SI according to the first embodiment of the present invention. FIG. 9 is a flowchart illustrating the method of generating a stereoscopic image SI according to the first embodiment of the present invention.

Referring to FIGS. 6 and 9, the first embodiment assumes, as described above, that the image input to the configuration unit 200 is a thermal image TI. When the thermal image TI is input, the plane classification unit 210, in step S210, classifies the pixels of the thermal image into a plurality of unit sections based on the temperature indicated by each pixel value.

At this time, the plane classification unit 210 sets a temperature distribution {T | 0 ≤ T < Tmax} and derives the maximum value Tmax of the temperature distribution from the distribution thus set. The plane classification unit 210 then divides the pixels of the thermal image, according to Equation 1 below, into a predetermined number of unit sections based on the temperature indicated by each pixel value.

In Equation 1, Q is the number of divisions and is set in advance. Tstart is the start temperature of a unit section, Tend is the end temperature of a unit section, and Tmax is the maximum value of the temperature distribution. By default, the temperature distribution may be determined from the mean and standard deviation of the pixel values, that is, the temperatures, of all pixels. Alternatively, if a specific temperature range or temperature must be covered for the purpose for which the deep neural network 400 is trained, the temperature distribution may be set around that range or temperature. For example, if the deep neural network 400 is intended to detect an overheated battery or to check the stability of a hydrogen storage tank, the temperature distribution should cover relatively high or relatively low temperatures rather than the ordinary room-temperature range. It is therefore preferable to set the temperature distribution to suit the purpose for which the deep neural network 400 is trained.

Next, in step S220, the plane classification unit 210 replaces any pixel value that indicates a temperature equal to or higher than the maximum value Tmax of the temperature distribution with a pixel value that indicates the effective maximum temperature Tmax-1. Because the upper end of the temperature distribution set above, {T | 0 ≤ T < Tmax}, is open, that is, Tmax itself is excluded, the effective maximum temperature Tmax-1 is obtained by subtracting 1 from the maximum value Tmax of the temperature distribution.

Then, in step S230, the plane classification unit 210 generates a plurality of (Q) planar images FI by constructing, for each of the Q classified unit sections, a planar image made up of the pixels whose pixel values indicate a temperature within that unit section. The plane classification unit 210 replaces with 0 the pixel value of every pixel whose temperature falls outside the unit section. Each planar image FI therefore contains feature pixels, whose pixel values indicate a temperature within the unit section, and padding pixels, whose pixel values indicated a temperature outside the unit section and have been converted to 0. For example, any one of the planar images FI shown in FIG. 6 includes feature pixels FX whose pixel values indicate a temperature within the unit section of that planar image; the remaining area consists entirely of padding pixels PX, and every pixel value of the padding pixels PX is replaced with 0.
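Steps S210 to S230 may be sketched, purely for illustration, as follows. The sketch assumes that the Q unit sections have equal width Tmax/Q, that pixel values are temperatures of 0 or higher, and that the clamping of step S220 is applied before a pixel is assigned to a section. The function name and the example values are hypothetical.

```python
def thermal_to_planes(thermal, t_max, q):
    """Split a thermal image into q planar images, one per temperature section."""
    width = t_max / q                      # assumed equal section width
    planes = [[[0.0] * len(row) for row in thermal] for _ in range(q)]
    for r, row in enumerate(thermal):
        for c, temp in enumerate(row):
            temp = min(temp, t_max - 1)    # S220: clamp to the effective maximum
            k = int(temp // width)         # S210: index of the containing section
            planes[k][r][c] = temp         # S230: feature pixel; the rest stay 0
    return planes


thermal = [[10, 30],
           [60, 120]]
planes = thermal_to_planes(thermal, t_max=100, q=4)
```

With Tmax = 100 and Q = 4, the sections are [0, 25), [25, 50), [50, 75), and [75, 100), and the value 120 is clamped to 99 before it is placed in the last section.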

Next, in step S240, the plane classification unit 210 normalizes the pixel values of the plurality of (Q) planar images FI to the range [0.0, 1.0). The normalization proceeds as follows. From the pixel value of every feature pixel FX in each of the Q planar images FI, the plane classification unit 210 subtracts the start temperature Tstart of the unit section of that planar image. Because the start temperature of the corresponding unit section has been subtracted from each feature pixel FX and the pixel value of the remaining padding pixels PX is 0, every pixel value of every planar image FI lies in the range [0, Tend - Tstart). The plane classification unit 210 then divides every pixel of each planar image FI by Tend - Tstart, the width of the unit section. As a result, every pixel value of all Q planar images FI is converted to the range [0.0, 1.0). The normalization of step S240 is optional and may be omitted.
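A minimal sketch of the normalization of step S240 is given below, assuming that all Q unit sections have the same width Tmax/Q and that the k-th section starts at k times that width. Padding pixels are recognized by their value 0, which is safe under this assumption because every feature pixel outside the first section has a value of at least the section's start temperature. The names and values are illustrative only.

```python
def normalize_planes(planes, t_max):
    """Map each plane's feature pixels from [t_start, t_end) to [0.0, 1.0)."""
    width = t_max / len(planes)            # assumed equal section width
    normalized = []
    for k, plane in enumerate(planes):
        t_start = k * width                # start temperature of section k
        normalized.append([
            # Padding pixels (value 0) stay 0; feature pixels are shifted and scaled.
            [(v - t_start) / width if v != 0 else 0.0 for v in row]
            for row in plane
        ])
    return normalized


planes = [
    [[10, 0], [0, 0]],    # section [0, 25)
    [[0, 30], [0, 0]],    # section [25, 50)
    [[0, 0], [60, 0]],    # section [50, 75)
    [[0, 0], [0, 99]],    # section [75, 100)
]
normalized = normalize_planes(planes, t_max=100)
```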

Next, in step S250, the stereoscopic forming unit 220 stacks the plurality of (Q) planar images FI generated above in a plurality of layers, that is, in the depth direction, to generate a stereoscopic image SI having a three-dimensional structure. The stereoscopic forming unit 220 may then optionally add at least one padding image PI to the stereoscopic image SI in step S260. Here, a padding image PI is a planar image FI in which every pixel has the same constant value C (where C is a real number). The padding image PI provides a reference value along the depth direction.
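For illustration, steps S250 and S260 may be sketched as follows, assuming a stereoscopic image is held as a list of equally sized planes and that padding images are appended after the planar images. The position of the padding images in the stack and the function name are assumptions.

```python
def stack_planes(planes, pad_constants=()):
    """S250/S260: stack planes along the depth axis, then append padding images."""
    rows, cols = len(planes[0]), len(planes[0][0])
    stereo = [[list(row) for row in plane] for plane in planes]
    for c in pad_constants:
        # Each padding image holds the same constant C in every pixel.
        stereo.append([[float(c)] * cols for _ in range(rows)])
    return stereo


planes = [
    [[0.4, 0.0], [0.0, 0.0]],
    [[0.0, 0.2], [0.0, 0.0]],
]
stereo = stack_planes(planes, pad_constants=(1.0,))
```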

Next, a method of generating a stereoscopic image SI according to the second embodiment of the present invention will be described. FIG. 10 is a flowchart illustrating the method of generating a stereoscopic image SI according to the second embodiment of the present invention.

Referring to FIGS. 7 and 10, the second embodiment assumes, as described above, that the images input to the configuration unit 200 are a plurality of images including a color image CI, a depth image DI, and an infrared image II. When these images (CI, DI, II) are input, the plane classification unit 210, in step S310, generates a plurality of planar images whose pixel values carry at least one of color information, distance information, and brightness information. Step S310 is described more specifically below.

First, the color image CI represents each pixel as a combination of 8-bit R (red), G (green), and B (blue) values. Accordingly, the plane classification unit 210 may generate, from the color image CI, an R-channel planar image having only the 8-bit R channel pixel values, a G-channel planar image having only the 8-bit G channel pixel values, and a B-channel planar image having only the 8-bit B channel pixel values. In addition, to condense the information of the three-channel color image, the plane classification unit 210 may convert the pixel values of the color image CI from three-channel RGB values to grayscale values and generate a grayscale planar image whose pixel values are grayscale values.
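By way of non-limiting example, the separation of a color image into channel planes and its reduction to a grayscale plane may be sketched as follows. The sketch assumes that each pixel is an (R, G, B) tuple of 8-bit values. The specification does not state how the grayscale value is computed; the ITU-R BT.601 luma weights used here are an assumption, as are the function names.

```python
def split_color(color_image):
    """Return the R, G and B channel planes of an image of (r, g, b) tuples."""
    r = [[px[0] for px in row] for row in color_image]
    g = [[px[1] for px in row] for row in color_image]
    b = [[px[2] for px in row] for row in color_image]
    return r, g, b


def to_grayscale(color_image):
    """Collapse RGB to one 8-bit plane using BT.601 luma weights (an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in color_image
    ]


color_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
r_plane, g_plane, b_plane = split_color(color_image)
gray_plane = to_grayscale(color_image)
```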

The pixel values of the depth image DI represent a distance, with each pixel having a size between 8 bits (256 levels) and 16 bits (65,536 levels). Accordingly, depending on the size of the pixel values that represent the distance, the plane classification unit 210 may generate from the depth image DI an upper depth planar image that represents the distance in 16 bits (65,536 levels) and a lower depth planar image that represents the distance in 8 bits. Alternatively, to condense the information of the depth image DI, the plane classification unit 210 may quantize the pixel values of the depth image DI to 8 bits and generate a quantized depth planar image.
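One possible reading of the upper and lower planar images, given that this section describes the pixels as classified by channel or by byte order, is that a 16-bit pixel value is split into its high byte and its low byte. The following sketch illustrates that reading and a simple linear 8-bit quantization. Both are assumptions rather than the method fixed by the specification, and the same functions would apply unchanged to a 16-bit infrared image.

```python
def split_bytes(image16):
    """Split 16-bit pixel values into an upper-byte plane and a lower-byte plane."""
    upper = [[v >> 8 for v in row] for row in image16]
    lower = [[v & 0xFF for v in row] for row in image16]
    return upper, lower


def quantize_to_8bit(image16):
    """Linearly rescale 16-bit values (0..65535) to 8-bit values (0..255)."""
    return [[v * 255 // 65535 for v in row] for row in image16]


depth_image = [
    [0x1234, 0x00FF],
    [0xFFFF, 0x0000],
]
upper_plane, lower_plane = split_bytes(depth_image)
quantized_plane = quantize_to_8bit(depth_image)
```

Splitting into bytes loses no information, since the original value can be recovered as `upper * 256 + lower`, whereas quantization yields a single plane at the cost of precision.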

The pixel values of the infrared image II represent brightness, with a size between 8 bits (256 levels) and 16 bits (65,536 levels). Accordingly, depending on the size of the pixel values that represent the brightness, the plane classification unit 210 may generate from the infrared image II an upper brightness planar image that represents the brightness in 16 bits (65,536 levels) and a lower brightness planar image that represents the brightness in 16 bits (65,536 levels). Alternatively, to condense the information of the infrared image II, the plane classification unit 210 may quantize the pixel values of the infrared image II to 8 bits and generate a quantized brightness planar image.

In this way, in the second embodiment of the present invention, the plane classification unit 210 classifies the pixels of each image output by the image camera 110 by channel or by byte order and generates the planar images FI according to that classification.

Next, in step S320, the plane classification unit 210 normalizes the pixel values of the plurality of planar images FI to the range [0.0, 1.0). To do so, the plane classification unit 210 obtains the maximum pixel value of the pixels of each plane and divides the pixel value of every pixel by that maximum. The values of all pixels are thereby converted to the range [0.0, 1.0). The normalization of step S320 is optional and may be omitted.
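The normalization of step S320 may be sketched as follows. Note that dividing by the largest value actually observed in a plane maps that pixel to exactly 1.0; to keep every value strictly below 1.0, one option is to divide by the number of representable levels (for example, 256 for an 8-bit plane). Treating the divisor as a parameter in this way is an assumption, as is the function name.

```python
def normalize_plane(plane, divisor=None):
    """Divide every pixel by divisor, or by the plane's maximum if none is given."""
    if divisor is None:
        divisor = max(max(row) for row in plane)
    if divisor == 0:
        # An all-zero plane stays all zero; avoids division by zero.
        return [[0.0 for _ in row] for row in plane]
    return [[v / divisor for v in row] for row in plane]


plane = [[0, 128], [64, 255]]
by_levels = normalize_plane(plane, divisor=256)   # values stay below 1.0
by_maximum = normalize_plane([[0, 2], [1, 4]])    # largest value becomes 1.0
```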

Next, in step S330, the stereoscopic forming unit 220 stacks the plurality of planar images FI generated above in a plurality of layers, that is, in the depth direction, to generate a stereoscopic image SI having a three-dimensional structure. As described above, the plane classification unit 210 generates three planar images or one planar image for the color image CI, two or one for the depth image DI, and two or one for the infrared image II. The depth of the stereoscopic image SI generated by the stereoscopic forming unit 220 therefore depends on the number of planar images FI generated by the plane classification unit 210. The deepest stereoscopic image SI has a depth of seven layers (3 + 2 + 2), as shown in Equation 2 below.

The shallowest stereoscopic image SI has a depth of three layers (1 + 1 + 1), as shown in Equation 3 below.

The order of the layers in Equations 2 and 3 may be interchanged.
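The dependence of the stack depth on the chosen planar images can be illustrated with a short sketch. The layer labels below are hypothetical names for the planar images described in this section, and the ordering shown is only one of the permitted orderings.

```python
def layer_plan(full_color=True, split_depth=True, split_infrared=True):
    """List the planar images that make up one stereoscopic image."""
    layers = ["R", "G", "B"] if full_color else ["gray"]
    layers += ["depth_upper", "depth_lower"] if split_depth else ["depth_q8"]
    layers += ["ir_upper", "ir_lower"] if split_infrared else ["ir_q8"]
    return layers


deepest = layer_plan()                        # 3 + 2 + 2 layers
shallowest = layer_plan(False, False, False)  # 1 + 1 + 1 layers
```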

Next, the stereoscopic forming unit 220 may optionally add at least one padding image PI to the stereoscopic image SI in step S340. Here, a padding image PI is a planar image FI in which every pixel has the same constant value C (where C is a real number). The padding image PI provides a reference value along the depth direction. For example, as shown in FIG. 7, a planar image with the constant 0 and a planar image with the constant 1 may be added. The order of the layers may be interchanged.

In summary, according to the first and second embodiments of the present invention, a plurality of stereoscopic images SI can be generated from the plurality of images (IMG: CI, DI, II, TI) captured by the thermal imaging cameras 120 or the image cameras 110. As described above with reference to FIG. 5, once the stereoscopic images SI have been generated according to the first and second embodiments, the grid forming unit 230 can arrange them in a grid to form a grid image GI.

The stereoscopic image SI or grid image GI described above is input data for the deep neural network 400, and this input data can be used as learning data for the deep neural network 400 once a label has been assigned to it. A method of generating such learning data is described next. FIG. 11 is a flowchart illustrating a method of generating learning data according to an embodiment of the present invention.

Referring to FIG. 11, in step S410 the configuration unit 200 may generate a plurality of stereoscopic images SI from the plurality of images (IMG: CI, DI, II, TI) captured at time t by the thermal imaging cameras 120 or the image cameras 110 of the camera unit 11, and may arrange the stereoscopic images SI in a grid to form a grid image GI. The stereoscopic images SI or the grid image GI are used as input data for the deep neural network 400.

The learning unit 300 then collects, in step S420, label data corresponding to time t. The label data may vary with the purpose for which the deep neural network 400 is trained. It may include position information, comprising the absolute position or the relative position change of the object 20, posture information of the object 20, and identification information of the object 20.

Then, in step S430, the learning unit 300 generates learning data by setting the label data collected above as the label of the input data. That is, the learning unit 300 labels the input data generated from the plurality of images (IMG: CI, DI, II, TI) captured at time t with the label data collected at time t.
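As a minimal sketch of step S430, assuming that inputs and labels are both keyed by an identical timestamp t, the pairing may be expressed as follows. The `TrainingSample` type, the dictionary layout, and the exact-match rule for timestamps are hypothetical; a practical system might instead match the nearest timestamps.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TrainingSample:
    timestamp: float
    input_data: Any   # stereoscopic or grid image built from images taken at t
    label: dict       # e.g. position, posture, or object information at t


def label_inputs(inputs_by_time, labels_by_time):
    """Pair each input with the label data collected at the same time t."""
    return [
        TrainingSample(t, data, labels_by_time[t])
        for t, data in sorted(inputs_by_time.items())
        if t in labels_by_time   # inputs without a label are skipped
    ]


inputs = {0.0: "grid_image_0", 1.0: "grid_image_1", 2.0: "grid_image_2"}
labels = {0.0: {"position": (0.0, 0.0, 0.0)}, 2.0: {"position": (1.0, 0.0, 0.0)}}
samples = label_inputs(inputs, labels)
```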

According to one embodiment, the learning unit 300 prepares learning data with which the deep neural network 400 estimates the positions of the robot R, which carries the computing device 10, and of the objects obj1, obj2, and obj3. To this end, when the configuration unit 200 generates input data, the learning unit 300 acquires position information as label data for the time at which the input data was generated, that is, time t. The position information includes the absolute position of the robot R and of the objects obj1, obj2, and obj3, expressed on the three axes (x, y, z) of a Cartesian coordinate system, and the relative position change, that is, the change in position relative to the previous position. It is obtained from the inertial information measured by the sensor unit 16 and the GPS signal received by the location information unit 17, from the inertial information and GPS signal received from the object obj1, which is itself a robot, and from the depth of the feature points FP of the objects obj1, obj2, and obj3 extracted from the depth image captured by the camera unit 11. The learning unit 300 then labels the input data of time t with the acquired position information.

According to another embodiment, the learning unit 300 may prepare learning data with which the deep neural network 400 estimates the postures of the robot R and the objects obj1, obj2, and obj3 in indoor and outdoor spaces. To this end, when the configuration unit 200 generates input data, the learning unit 300 may acquire posture information as label data for the time at which the input data was generated, that is, time t. The posture information is either posture information of the robot R or the object obj1 expressed in six degrees of freedom, or posture information indicating the depth of the feature points FP of the objects obj2 and obj3. It is obtained, through a motion capture device or an AHRS (attitude/heading reference system) device, from the inertial information measured by the sensor unit 16 (for example, yaw, roll, and pitch) and the angular velocity and angular acceleration of the motor of each joint of the robot R, from the corresponding inertial information and joint-motor angular velocities and angular accelerations received from the object obj1, which is itself a robot, and from the depth of the feature points FP of the objects obj1, obj2, and obj3 extracted from the depth image captured by the camera unit 11. The learning unit 300 then labels the input data of time t with the acquired posture information.

According to yet another embodiment, the learning unit 300 may train the deep neural network 400 to identify the region occupied by each object (object 1, 2, 3) in the two-dimensional image captured by the camera unit 11 or in a three-dimensional projection space, together with the type of each object (Robot 5911TARGA4S, tumbler, Human). To this end, when the configuration unit 200 generates input data, the learning unit 300 may automatically detect the class (Robot 5911TARGA4S, tumbler, Human) and the bounding box B of each object obj1, obj2, and obj3 by applying an image processing algorithm to the plurality of plane images FI of the input data. Alternatively, a user may manually specify, using labeling software, the class and the bounding box B of each object included in the plurality of plane images FI, and the learning unit 300 may receive this input. The learning unit 300 then labels each of the one or more objects obj1, obj2, and obj3 included in the plurality of plane images FI of the input data with the obtained object information, that is, the class and the bounding box of the object.

Once the training data has been prepared as described above, the learning unit 300 may train the deep neural network 400 in step S440. For example, the learning unit 300 inputs the input data to the deep neural network 400, and the deep neural network 400 computes an output value through a plurality of operations across a plurality of layers, each of which applies weights. The learning unit 300 then trains the deep neural network 400 through an optimization that modifies its parameters so that the difference between the output value and the label data is minimized.
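The optimization described here, adjusting parameters so that the difference between output and label shrinks, can be illustrated with a deliberately small stand-in. The sketch below assumes a single linear layer, a mean squared error, and plain gradient descent; the text specifies none of these, and the function names are hypothetical.

```python
def predict(x, w, b):
    """Output of a single linear layer (a stand-in for the deep network)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(samples, labels, w, b):
    """Mean squared difference between outputs and label data."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(samples, labels)) / len(samples)


def train(samples, labels, lr=0.1, epochs=500):
    """Gradient descent on the mean squared error (assumed loss and optimizer)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(x, w, b) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * err * xi / n
            grad_b += 2.0 * err / n
        # Move each parameter against its gradient to reduce the error.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


samples = [[0.0], [0.5], [1.0]]
labels = [1.0, 2.0, 3.0]  # generated by y = 2x + 1
loss_before = mse(samples, labels, [0.0], 0.0)
w, b = train(samples, labels)
loss_after = mse(samples, labels, w, b)
```

A real deep network would replace `predict` with its layered forward pass and would typically rely on automatic differentiation rather than hand-written gradients.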

When training is complete as described above, the configuration unit 200 of the computing device 10 may, according to the first and second embodiments of the present invention, generate a plurality of stereoscopic images SI from a plurality of images (IMG: CI, DI, II, TI) captured through the thermal imaging camera 120 or the video camera 110, and may arrange the plurality of stereoscopic images SI in a grid to form a grid image GI.

The configuration unit 200 may then input the stereoscopic image SI or the grid image GI to the deep neural network 400 as input data. The deep neural network 400 then outputs, as its output value, probabilities for position information, posture information, or object information, through a plurality of operations to which the weights of the plurality of layers are applied according to the learned parameters. From these probabilities, the estimator 500 may estimate the absolute position, the relative position change, or the posture of the robot R or the objects obj1, obj2, and obj3, or may identify the objects obj1, obj2, and obj3.
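For the object identification case, one simple way to turn output probabilities into a decision is to pick the most probable class. The sketch below assumes this argmax rule; the text does not state how the estimator 500 interprets the probabilities, and the function name `identify` is hypothetical.

```python
def identify(probabilities, class_names):
    """Return the most probable class and its probability (assumed argmax rule)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


result = identify([0.1, 0.7, 0.2], ["Robot", "tumbler", "Human"])
```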

Meanwhile, the method according to the embodiments of the present invention described above may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made in accordance with the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

10: computing device 11: camera unit
12: input unit 13: display unit
14: storage unit 15: communication unit
16: sensor unit 17: location information unit
18: control unit 110: video camera
120: thermal imaging camera 200: configuration unit
210: plane classification unit 220: stereoscopic forming unit
230: grid forming unit 300: learning unit
400: deep neural network 500: estimator

Claims

1. A method for structuring input data of a deep neural network model, the method comprising:
when a two-dimensional image is input, classifying, by a plane classification unit, the information included in the image based on the attributes of that information, and generating a plurality of plane images each including the classified information; and
generating, by a stereoscopic forming unit, a stereoscopic image having a three-dimensional structure by stacking the plurality of plane images in a plurality of layers,
wherein generating the plurality of plane images comprises,
when the input image is a thermal image:
classifying, by the plane classification unit, the pixels of the thermal image into a plurality of unit sections based on the temperature represented by each pixel value; and
generating, by the plane classification unit, for each of the plurality of classified unit sections, a plane image including the pixels whose pixel values represent a temperature included in that unit section, thereby generating the plurality of plane images.

2. (Deleted)

3. The method of claim 1,
wherein, in classifying into the plurality of unit sections,
the plane classification unit,
after setting the maximum value of the temperature distribution,
divides the temperature distribution into unit sections of a predetermined number of divisions according to the formula

[Formula not reproduced in the source text]

where Q is the number of divisions,
Tstart is the starting temperature of a unit section,
Tend is the end temperature of the unit section, and
Tmax is the maximum value of the temperature distribution.
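The formula referred to in this claim does not survive in this text, so the exact division rule is unknown. Purely as an illustration, the sketch below assumes the simplest rule consistent with the named symbols: Q equal-width sections covering the range from 0 to Tmax. Both the equal width and the lower bound of 0 are assumptions, and the function name `unit_sections` is hypothetical.

```python
def unit_sections(t_max, q):
    """Split [0, t_max) into q equal-width (t_start, t_end) sections.

    Assumption: equal widths starting at 0; the claimed formula is not
    available in this text and may differ.
    """
    if q <= 0:
        raise ValueError("q must be a positive number of divisions")
    width = t_max / q
    return [(i * width, (i + 1) * width) for i in range(q)]


sections = unit_sections(100.0, 4)
```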

4. The method of claim 3,
wherein generating the plurality of plane images comprises,
by the plane classification unit:
replacing pixel values whose temperature is equal to or greater than the maximum value of the temperature distribution with an effective maximum temperature;
generating the plurality of plane images by constructing, for each of the plurality of classified unit sections, a plane image including feature pixels, whose pixel values represent a temperature included in the unit section, and padding pixels, obtained by converting pixel values whose temperature is not included in the unit section to 0; and
normalizing the pixel values of all pixels of the plurality of plane images into the range [0.0, 1.0).
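The three steps of this claim (clipping, per-section plane construction with zero padding, and normalization) can be sketched as follows. The claim does not define the effective maximum temperature or the normalization rule; this sketch assumes an effective maximum slightly below Tmax and division by Tmax, so that every value stays below 1.0. The function name `thermal_planes` and the parameter `eps` are hypothetical.

```python
def thermal_planes(image, sections, t_max, eps=1e-3):
    """Build one normalized plane per unit section from a thermal image.

    image: rows of temperatures; sections: list of (t_start, t_end) pairs.
    Assumptions: effective maximum = t_max - eps; normalization = t / t_max.
    """
    effective_max = t_max - eps
    # Step 1: replace temperatures at or above t_max with the effective maximum.
    clipped = [[min(t, effective_max) for t in row] for row in image]
    planes = []
    for t_start, t_end in sections:
        # Steps 2 and 3: keep in-section pixels (normalized), pad the rest with 0.
        plane = [
            [(t / t_max if t_start <= t < t_end else 0.0) for t in row]
            for row in clipped
        ]
        planes.append(plane)
    return planes


image = [[10.0, 30.0], [60.0, 120.0]]
planes = thermal_planes(image, [(0.0, 50.0), (50.0, 100.0)], t_max=100.0)
```

Stacking the returned planes in order gives the layered structure that the stereoscopic forming unit is described as producing. Under these assumptions a feature pixel with temperature 0 is indistinguishable from a padding pixel.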

5. The method of claim 1,
wherein generating the plurality of plane images comprises:
when the input image includes a color image, a depth image, and an infrared image, generating, by the plane classification unit, a plurality of plane images whose pixel values carry at least one of color information, distance information, and brightness information; and
normalizing the pixel values of all pixels of the plurality of plane images into the range [0.0, 1.0).

6. The method of claim 5,
wherein generating the plurality of plane images comprises:
generating, from the color image, either an R channel plane image having only R (red) channel pixel values, a G channel plane image having only G (green) channel pixel values, and a B channel plane image having only B (blue) channel pixel values, or a grayscale plane image composed of grayscale values;
generating, from the depth image, either an upper depth plane image and a lower depth plane image according to the magnitude of the pixel values indicating distance, or a quantized depth plane image obtained by quantizing the pixel values; and
generating, from the infrared image, either an upper luminance plane image and a lower luminance plane image according to the magnitude of the pixel values indicating brightness, or a quantized luminance plane image obtained by quantizing the pixel values.
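Two of the plane-generation options in this claim can be sketched briefly: splitting a color image into R, G, and B channel planes, and splitting a depth or infrared image into upper and lower planes. The claim does not say how the upper and lower planes are separated; the single threshold used here, and zero-filling of the pixels that fall on the other side, are assumptions. The function names are hypothetical.

```python
def split_rgb(color_image):
    """Split rows of (r, g, b) tuples into three single-channel planes."""
    return tuple(
        [[pixel[channel] for pixel in row] for row in color_image]
        for channel in range(3)
    )


def split_by_threshold(image, threshold):
    """Split a single-channel image into upper and lower planes.

    Assumption: pixels at or above the threshold go to the upper plane,
    the rest to the lower plane; excluded pixels are set to 0.
    """
    upper = [[v if v >= threshold else 0 for v in row] for row in image]
    lower = [[v if v < threshold else 0 for v in row] for row in image]
    return upper, lower


r, g, b = split_rgb([[(1, 2, 3), (4, 5, 6)]])
upper, lower = split_by_threshold([[5, 200], [130, 90]], threshold=128)
```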

7. The method of claim 1, further comprising:
forming, by a grid forming unit, a grid image by arranging a plurality of stereoscopic images in a grid; and
when at least one of the plurality of stereoscopic images included in the grid image has a different height, matching the heights of the plurality of stereoscopic images included in the grid image by deleting at least one plane image from the at least one stereoscopic image or by adding at least one padding image to the at least one stereoscopic image.
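The height matching in this claim can be sketched by treating each stereoscopic image as a list of plane images, with its height being the number of planes. The sketch below assumes that padding images are all-zero planes and that deleted planes are taken from the end of the stack; the claim specifies neither. The function name `match_heights` is hypothetical.

```python
def match_heights(stacks, target=None):
    """Make every stack of plane images the same height.

    stacks: list of stacks; each stack is a list of 2-D planes (lists of rows).
    Assumptions: padding planes are all zeros; surplus planes are removed
    from the end. By default the target is the tallest stack.
    """
    if target is None:
        target = max(len(stack) for stack in stacks)
    matched = []
    for stack in stacks:
        rows, cols = len(stack[0]), len(stack[0][0])
        if len(stack) > target:
            fixed = stack[:target]  # delete surplus plane images
        else:
            padding = [
                [[0.0] * cols for _ in range(rows)]
                for _ in range(target - len(stack))
            ]
            fixed = stack + padding  # add padding images
        matched.append(fixed)
    return matched


tall = [[[1.0]], [[2.0]], [[3.0]]]
short = [[[4.0]]]
padded = match_heights([tall, short])
trimmed = match_heights([tall, short], target=2)
```

Once the heights agree, the stacks can be tiled side by side to form the grid image.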

8. The method of claim 7, further comprising,
when a two-dimensional image is input, the plane classification unit generates the plurality of plane images, the stereoscopic forming unit generates the stereoscopic image from the plurality of plane images, and the grid forming unit forms the grid image by arranging the plurality of stereoscopic images in a grid:
collecting, by a learning unit, at least one of position information, posture information, and object information of an object as label data corresponding to the input data, the input data being the stereoscopic image or the grid image; and
labeling, by the learning unit, the input data with the label data.