KR102108953B1

KR102108953B1 - Robust camera and lidar sensor fusion method and system

Info

Publication number: KR102108953B1
Application number: KR1020180055784A
Authority: KR
Inventors: 최준원; 김재겸
Original assignee: 한양대학교 산학협력단
Priority date: 2018-05-16
Filing date: 2018-05-16
Publication date: 2020-05-11
Also published as: KR20190131207A

Abstract

Disclosed are a deep learning-based camera and lidar sensor fusion recognition method and system that are robust to sensor quality degradation. A recognition method performed by a recognition system according to one embodiment may include: obtaining respective feature maps by inputting different data into respective deep neural networks; fusing the obtained feature maps through a fusion network; and detecting an object based on the new feature map fused through the fusion network.

Description

Deep learning-based camera and lidar sensor fusion recognition method and system robust to sensor quality degradation {ROBUST CAMERA AND LIDAR SENSOR FUSION METHOD AND SYSTEM}

The following description relates to a deep learning technique for recognizing objects by fusing different types of data.

Object recognition from image information has been actively studied through a variety of approaches in computer vision. A typical approach extracts features that help recognize objects from the image and then determines the position and class of each object with a classifier. More recently, advances in deep learning based on deep neural network architectures have led to methods that learn features from large volumes of image data and detect objects with very high accuracy. Among the earlier approaches are techniques that find features such as the line structure or shape in an image and compare them with known templates to recognize or detect objects; representative feature extraction techniques include the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG) descriptor. SIFT selects easily identifiable keypoints, such as corners, and then extracts an oriented feature vector from a local patch centered on each keypoint. The feature expresses the direction of the surrounding brightness change and how abrupt that change is. HOG divides the target region into cells of a fixed size, computes for each cell a histogram of the orientations of the edge pixels, and concatenates the histogram bin values into a single vector. Because HOG uses edge orientation information, it can also be regarded as a kind of edge-based template matching. Because it uses the silhouette of an object, it is well suited to identifying objects such as people and cars, which have distinctive contours without complicated internal patterns. These techniques rely only on prior knowledge about the edges or shape of an object, so their recognition performance drops sharply when illumination changes, shape distortion, noise, or occlusion push the feature values outside the expected distribution.
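The HOG procedure described above (split the region into cells, build a per-cell orientation histogram, concatenate the bins) can be sketched as follows. This is a simplified illustration, not the reference HOG implementation: the function name `hog_cell_histograms`, the 8-pixel cell size, the 9 bins, and the magnitude weighting are assumptions, and block normalization and bin interpolation are omitted.

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)                       # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bin_index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=bins)
    return hist.reshape(-1)    # all bin values concatenated into one descriptor vector
```

Because the descriptor depends only on local gradient orientations, a global change in illumination or a partial occlusion shifts the histograms directly, which is the weakness the paragraph above points out.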

Deep learning, which has been studied intensively in recent years, learns the data representation, that is, feature extraction, directly from large amounts of data, and therefore has the advantage of maintaining good performance across varied environments and changes. In particular, the convolutional neural network (CNN) architecture hierarchically extracts local features of a 2D image and thereby effectively obtains the high-level semantic information needed for image recognition. With recent advances in parallel computing and the availability of infrastructure for large-scale data, deep learning is now used as a highly effective recognition technology.

Sensor fusion refers to combining and using data acquired from multiple sensors. Because using different kinds of sensors with different characteristics can raise recognition performance and reliability, sensor fusion is an important technology for autonomous driving and image recognition. Several strategies exist for fusing sensor data with different distributions, generally classified as early fusion, intermediate fusion, and late fusion. Early fusion combines the data sources up front and processes them together, whereas late fusion processes each data source completely, obtains a final recognition result for each, and then combines the results. Intermediate fusion lies between the two. Early fusion struggles to achieve good performance because the data are heterogeneous, and late fusion cannot make good use of the high-level semantic correlation between the data. With the recent emergence of deep learning-based sensor fusion, an intermediate fusion scheme has shown good performance: each sensor signal is passed through a separate CNN, the resulting features are combined midway, and the combined features are processed by a further CNN. Because different sensors yield measurements with different distributions for the same environment, the question of how to fuse such heterogeneous sensor data to optimize performance is becoming important. Rather than processing each sensor's data separately and merging the results, a sensor fusion technique is needed that combines the data effectively by reflecting the characteristics of each sensor.

Deep learning has recently been applied to sensor fusion-based recognition in autonomous driving. In particular, research on fusing vehicle-mounted camera and lidar sensors to improve the reliability of recognition is actively under way. Existing deep learning approaches to camera-lidar fusion include, first, building a top-view image from the lidar points and fusing it with the camera image inside the network using a fully connected layer, and second, converting the top-view image built from the lidar points into a front-view image and fusing that.

However, object recognition through sensor fusion can suffer from the following problem. In approaches that extract a feature map from each sensor's data with a CNN and then fuse the camera and lidar at the network level, the network coefficients are fixed once training on varied data is complete. Such a network performs well when both sensor inputs are intact, but when the quality of one sensor's data degrades, it cannot handle the situation and performance drops. In autonomous driving, where sensor data degrade frequently, this can be the cause of critical failures.

References: KR10-2018-0003535 (2018.01.09.), KR10-1714233 (2017.03.02.), KR10-2016-0096460 (2016.08.16.)

A method and system may be provided for detecting an object based on a new feature map constructed by fusing, through a fusion network, the feature maps obtained by inputting different data into deep neural networks.

In addition, a deep learning-based fusion recognition technique that is robust to sensor quality degradation may be provided. In other words, a sensor fusion method and system may be provided that combine data effectively by reflecting the characteristics of different types of sensors. Specifically, a method and system may be provided that improve object recognition performance through a deep neural network for fusing camera image data with lidar point data, and that maintain a high level of object recognition performance even when part of the camera or lidar data is degraded.

A recognition method performed by a recognition system may include: obtaining respective feature maps by inputting different data into respective deep neural networks; fusing the obtained feature maps through a fusion network; and detecting an object based on the new feature map fused through the fusion network.

Obtaining the respective feature maps by inputting the different data into the respective deep neural networks may include, when the different data are lidar and camera data, passing a two-dimensional three-channel image, produced by preprocessing the lidar data, and the camera image through respective CNNs.

Obtaining the respective feature maps by inputting the different data into the respective deep neural networks may include performing preprocessing that maps the 3D point information acquired from the lidar onto the 2D image of the camera, in which the position of each 3D point is multiplied by a matrix that converts lidar coordinates into camera coordinates to produce coordinates in the 2D image.
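The mapping from lidar points to image coordinates can be sketched as below. This is a minimal illustration under stated assumptions: the source only says that the point position is multiplied by a lidar-to-camera matrix, so the 3 x 4 shape of `projection` (camera intrinsics combined with the lidar-to-camera extrinsics), the homogeneous coordinates, the perspective divide, and the function name `project_lidar_to_image` are all hypothetical choices made here.

```python
import numpy as np

def project_lidar_to_image(points_xyz, projection):
    """Project N x 3 lidar points to pixel coordinates with an assumed 3 x 4 matrix."""
    points_xyz = np.asarray(points_xyz, dtype=np.float64)
    projection = np.asarray(projection, dtype=np.float64)
    ones = np.ones((points_xyz.shape[0], 1))
    homogeneous = np.hstack([points_xyz, ones])       # N x 4
    cam = homogeneous @ projection.T                  # N x 3: (u*w, v*w, w)
    depth = cam[:, 2]
    in_front = depth > 0                              # drop points behind the camera
    uv = cam[in_front, :2] / depth[in_front, None]    # perspective divide (assumption)
    return uv, depth[in_front]
```

For example, with a focal length of 100 pixels and principal point (50, 40), the point (1, 2, 10) lands at pixel (60, 60), while a point with negative depth is discarded.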

Fusing the obtained feature maps through the fusion network may include determining the quality of each feature map, assigning a weight to one or more of the feature maps accordingly, and then fusing the feature maps through the fusion network.

Fusing the obtained feature maps through the fusion network may include, when the obtained feature maps are a camera feature map and a lidar feature map, passing both through the fusion network so that they are fused in parallel to generate a new feature map.

Fusing the obtained feature maps through the fusion network may include performing a first fusion of the camera feature map and the lidar feature map in parallel, and passing the first fused feature map through a deep neural network with a plurality of 3x3 kernels and a plurality of sigmoid functions to judge the robustness of the lidar or the camera.

Fusing the obtained feature maps through the fusion network may include passing the first fused feature map through a deep neural network with a plurality of 3x3 kernels and a sigmoid function, thereby outputting, for each pixel, a value between 0 and 1 as the confidence in the camera data and in the lidar data.

Fusing the obtained feature maps through the fusion network may include performing a second fusion, again in parallel, of the camera feature map multiplied by the camera confidence and the lidar feature map multiplied by the lidar confidence, and performing a third fusion by passing the second fused feature map through a deep neural network with a 1x1 kernel, thereby deriving a new feature map from the camera and lidar feature maps.
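The three fusion stages described above can be sketched in NumPy as follows. This is an illustrative reading rather than the patented implementation: "fused in parallel" is interpreted here as channel concatenation, each gate is assumed to be a single-channel 3x3 convolution followed by a sigmoid, no activation is applied after the 1x1 convolution, and the names `conv2d_same`, `gated_fusion`, and the `params` dictionary layout are hypothetical.

```python
import numpy as np

def conv2d_same(x, weight, bias):
    """Stride-1 'same' convolution. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((weight.shape[0], height, width))
    for i in range(k):
        for j in range(k):
            window = xp[:, i:i + height, j:j + width]      # shifted input, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], window)
    return out + bias[:, None, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(f_cam, f_lidar, params):
    """Fuse two (C, H, W) feature maps using per-pixel confidence gates."""
    first = np.concatenate([f_cam, f_lidar], axis=0)                # first fusion
    gate_cam = sigmoid(conv2d_same(first, *params["cam_gate"]))     # (1, H, W), values in (0, 1)
    gate_lidar = sigmoid(conv2d_same(first, *params["lidar_gate"]))
    second = np.concatenate([gate_cam * f_cam,                      # second fusion of the
                             gate_lidar * f_lidar], axis=0)         # confidence-weighted maps
    fused = conv2d_same(second, *params["mix"])                     # third fusion, 1x1 kernel
    return fused, gate_cam, gate_lidar
```

Because both gates are computed from the concatenated features, a gate near 0 at a pixel suppresses that sensor's contribution there, and the 1x1 convolution then mixes whatever remains.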

Detecting an object based on the new feature map fused through the fusion network may include generating data in which the quality of part of the different data is deliberately degraded, and controlling training so that the degraded data are learned.
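One way to generate such degraded training data is sketched below. The degradation types follow those named elsewhere in this document (brightness change, occlusion, noise, failure), but the specific parameters, the 50% probabilities, and the function names `degrade` and `augment_pair` are assumptions for illustration only.

```python
import numpy as np

def degrade(image, mode, rng):
    """Return a degraded copy of an (H, W, C) image with values in [0, 1]."""
    out = np.array(image, dtype=np.float64, copy=True)
    if mode == "blank":                        # sensor failure: no signal at all
        out[:] = 0.0
    elif mode == "noise":                      # additive Gaussian noise
        out = np.clip(out + rng.normal(0.0, 0.2, out.shape), 0.0, 1.0)
    elif mode == "occlusion":                  # a random rectangle is blocked out
        h, w = out.shape[:2]
        top, left = rng.integers(0, h // 2), rng.integers(0, w // 2)
        out[top:top + h // 2, left:left + w // 2] = 0.0
    elif mode == "brightness":                 # global illumination change
        out = np.clip(out * rng.uniform(0.2, 1.8), 0.0, 1.0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out

def augment_pair(camera_img, lidar_img, rng, p_degrade=0.5):
    """Degrade at most one of the two sensor inputs for a training sample."""
    if rng.random() < p_degrade:
        mode = rng.choice(["blank", "noise", "occlusion", "brightness"])
        if rng.random() < 0.5:
            camera_img = degrade(camera_img, mode, rng)
        else:
            lidar_img = degrade(lidar_img, mode, rng)
    return camera_img, lidar_img
```

Degrading only one input per sample keeps the other sensor informative, so the fusion weights have a usable signal to learn from.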

Detecting an object based on the new feature map fused through the fusion network may include processing the new feature map to determine the position and class of an object simultaneously, the deep neural networks and the sensor fusion network having been trained jointly.

A recognition system may include: a feature map acquisition unit that inputs different data into respective deep neural networks to obtain a first feature map and a second feature map; a fusion unit that fuses the obtained first and second feature maps through a fusion network; and a detection unit that detects an object based on the new feature map fused through the fusion network.

When first data and second data are input as the different data, the feature map acquisition unit may pass a two-dimensional three-channel image, produced by preprocessing the first data, and an image of the second data through respective CNNs.

The fusion unit may determine the quality of the first feature map and the second feature map, assign a weight to one or more of the feature maps accordingly, and then fuse the feature maps through the fusion network.

The fusion unit may perform a first fusion of the first feature map and the second feature map in parallel and pass the first fused feature map through a deep neural network with a plurality of 3x3 kernels and a sigmoid function, thereby outputting, for each pixel, a value between 0 and 1 as the confidence in the camera data and in the lidar data.

The fusion unit may perform a second fusion, again in parallel, of the camera feature map multiplied by the camera confidence and the lidar feature map multiplied by the lidar confidence, and perform a third fusion by passing the second fused feature map through a deep neural network with a 1x1 kernel, thereby deriving a new feature map from the camera and lidar feature maps.

The recognition system according to one embodiment can improve object recognition and detection performance by recognizing objects based on a new feature map in which the feature maps obtained by inputting different data into respective deep neural networks are fused through a fusion network.

In addition, because the fusion network weights the different feature maps before combining them, it can adjust the contribution of a degraded feature map and thereby achieve optimal sensor fusion. In other words, it addresses a problem of conventional networks trained without the fusion network (GFU): when one of the two sensor inputs suffers quality degradation such as a brightness change, occlusion, noise, or failure, the combined feature map is affected and overall recognition performance drops.

A further advantage is that, by applying deep learning to the information contained in the lidar and camera data, the strong classification and generalization performance of deep learning can be exploited directly.

FIG. 1 is a diagram illustrating the overall operation of a recognition system according to one embodiment.
FIG. 2 is a block diagram illustrating the configuration of a recognition system according to one embodiment.
FIG. 3 is a flowchart illustrating an object recognition method of a recognition system according to one embodiment.
FIG. 4 is a diagram illustrating the structure of a deep learning-based object recognition algorithm in a recognition system according to one embodiment.
FIG. 5 is a diagram illustrating the structure of the GFU for fusing different data in a recognition system according to one embodiment.
FIG. 6 shows an example in which camera images were degraded in order to evaluate the performance of a recognition system according to one embodiment.
FIG. 7 is a table describing the performance of a recognition system according to one embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the overall operation of a recognition system according to one embodiment.

This disclosure proposes a deep learning-based object recognition algorithm 100 that fuses camera and lidar sensors to recognize objects and understand the environment in autonomous driving or in Internet of Things settings for smart homes. The deep learning-based object recognition algorithm 100 may be run by a recognition system. Objects (for example, physical objects or things) may be recognized by running the algorithm on data with different characteristics or structures. The embodiments below describe, as an example, a method of recognizing objects using camera data and data from a lidar sensor (hereinafter, 'lidar').

The embodiments below describe a sensor fusion deep learning technique that uses, as simultaneous inputs to deep neural networks, the 2D point image information obtained from the lidar and the image information obtained through the camera. The technique is also designed to perform well on the basis of the remaining sensor's data even when the data from either the lidar or the camera are degraded.

Upon acquiring data from the camera 110 and the lidar 120, the recognition system may preprocess the data and recognize objects using the deep learning-based object recognition algorithm 100. The camera data obtained from the camera 110 are processed by the algorithm 100 to recognize objects; the camera data are a two-dimensional image and may consist of an RGB image 111. The 3D point data (3D point cloud) 121 acquired from the lidar 120 are converted into a 2D image (front-view image) 122 and likewise processed by the algorithm 100. Specifically, preprocessing may be applied to the camera data, the lidar data, or both; the preprocessing of the lidar data is described here. For example, the 3D spatial information obtained from the lidar may be converted into a 2D image containing at least one of depth, height, and reflectance, forming a three-channel image. The preprocessed two-dimensional three-channel image and the RGB image 111 are used as the inputs to two separate convolutional neural networks (CNNs). The feature maps obtained from the two CNNs are combined, and the combined feature map is used to extract the final information for object detection. To remain robust when the quality of one sensor signal degrades under adverse conditions, the feature maps are combined with a sensor fusion scheme that assigns appropriate weights to the feature maps obtained from the lidar and camera signals before fusing them. The weights are computed automatically from the feature maps, and during training the fusion network that computes the weights is trained jointly with the two CNNs.
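The conversion of lidar points into a three-channel front-view image can be sketched as follows, assuming the points have already been projected to pixel coordinates. The channel order (depth, height, reflectance), the nearest-point rule for pixels hit by several points, the zero fill for empty pixels, and the function name `lidar_front_view` are assumptions made for this illustration; any normalization of the channels is omitted.

```python
import numpy as np

def lidar_front_view(uv, depth, height, reflectance, image_shape):
    """Rasterize projected lidar points into an (H, W, 3) front-view image.

    Channels: 0 = depth, 1 = height, 2 = reflectance. Pixels with no lidar
    return stay at zero; when several points hit a pixel, the nearest is kept.
    """
    uv = np.asarray(uv, dtype=np.float64)
    channels = np.stack([depth, height, reflectance], axis=1).astype(np.float64)
    rows, cols = image_shape
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < cols) & (v >= 0) & (v < rows)
    u, v, channels = u[inside], v[inside], channels[inside]

    order = np.argsort(channels[:, 0], kind="stable")      # sort near to far
    u, v, channels = u[order], v[order], channels[order]
    # np.unique returns the first occurrence of each pixel, i.e. the nearest point.
    _, first = np.unique(v * cols + u, return_index=True)

    image = np.zeros((rows, cols, 3))
    image[v[first], u[first]] = channels[first]
    return image
```

The result has the same layout as an RGB image, so it can be fed to a standard image CNN alongside the camera image.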

The recognition system may detect objects, as the recognition result 130, by learning from the camera and lidar data with the deep learning-based object recognition algorithm 100. Compared with existing lidar-camera fusion techniques, good recognition performance can thus be obtained not only when all data are intact but also when the data of a particular sensor are degraded.

FIG. 2 is a block diagram illustrating the configuration of a recognition system according to one embodiment, and FIG. 3 is a flowchart illustrating an object recognition method of the recognition system according to one embodiment.

The recognition system 200 may include a feature map acquisition unit 210, a fusion unit 220, and a detection unit 230. These components may be representations of different functions performed by a processor according to control instructions provided by program code stored in the recognition system 200. The components may control the recognition system 200 to perform steps 310 to 330 of the object recognition method of FIG. 3. The components may be implemented to execute instructions according to the code of an operating system and the code of at least one program held in memory.

The processor of the recognition system 200 may load into memory the program code stored in a program file for the object recognition method. For example, when the program is executed in the recognition system 200, the processor may, under the control of the operating system, control the recognition system to load the program code from the program file into memory. The processor, and the feature map acquisition unit 210, fusion unit 220, and detection unit 230 that it includes, may each be a different functional representation of the processor that executes the corresponding portion of the loaded program code in order to carry out steps 310 to 330.

In step 310, the feature map acquisition unit 210 may obtain respective feature maps by inputting different data into respective deep neural networks. When data from a first sensor and a second sensor are input as the different data, the feature map acquisition unit 210 may pass a two-dimensional three-channel image, produced by preprocessing the first sensor's data, and the image from the second sensor through respective CNNs. For example, when the different data are lidar and camera data, the feature map acquisition unit 210 may pass the two-dimensional three-channel image produced by preprocessing the lidar data and the camera image through respective CNNs. Here, the feature map acquisition unit 210 may perform preprocessing that maps the 3D point information acquired from the lidar onto the camera's 2D image, multiplying the position of each 3D point by a matrix that converts lidar coordinates into camera coordinates to produce coordinates in the 2D image.

In step 320, the fusion unit 220 may fuse the obtained feature maps through a fusion network. The fusion unit 220 may determine the quality of each feature map, assign a weight to one or more of the feature maps accordingly, and then fuse the feature maps through the fusion network. When the obtained feature maps are a camera feature map and a lidar feature map, the fusion unit 220 may pass both through the fusion network so that they are fused in parallel into a new feature map. The fusion unit 220 may first fuse the camera feature map and the lidar feature map in parallel (primary fusion) and pass the primary fused feature map through deep neural networks with 3x3 kernels and sigmoid functions to judge the robustness of the lidar or the camera. By passing the primary fused feature map through the deep neural networks with 3x3 kernels and the sigmoid functions, the fusion unit 220 may output a per-pixel value between 0 and 1 as the reliability of the camera data and of the lidar data. The fusion unit 220 may then fuse in parallel (secondary fusion) the camera feature map multiplied by the camera reliability and the lidar feature map multiplied by the lidar reliability, and pass the secondary fused feature map through a deep neural network with a 1x1 kernel (tertiary fusion) to derive a new feature map from the camera and lidar feature maps.
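The three fusion stages of step 320 can be summarized compactly. The symbols are notation introduced here for illustration, not the patent's: F_C and F_L are the camera and lidar feature maps, [ ; ] denotes combining feature maps in parallel (assumed to be channel-wise concatenation), * is convolution, sigma is the sigmoid function, and the odot symbol is per-pixel multiplication. W_C and W_L are assumed to be 3x3 kernels and W_o a 1x1 kernel.

```latex
\begin{aligned}
F_1 &= [\,F_C \,;\, F_L\,] && \text{(primary fusion)} \\
G_C &= \sigma(W_C * F_1 + b_C), \quad G_L = \sigma(W_L * F_1 + b_L) && \text{(reliabilities in } (0, 1)\text{)} \\
F_2 &= [\,G_C \odot F_C \,;\, G_L \odot F_L\,] && \text{(secondary fusion)} \\
F_{\text{new}} &= W_o * F_2 + b_o && \text{(tertiary fusion)}
\end{aligned}
```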

In step 330, the detection unit 230 may detect an object based on the new feature map fused through the fusion network. With the deep neural networks and the sensor fusion network trained simultaneously, the detection unit 230 may process the new feature map to determine the location and the type of the object at the same time. The detection unit 230 may also generate data in which the quality of part of the different data is degraded, and control training so that the degraded data is learned.

The deep learning-based object recognition algorithm has the advantage that, by applying deep learning techniques to the information contained in the lidar and camera data, it can take full advantage of deep learning's strong classification and generalization performance.

FIG. 4 is a diagram for explaining the structure of the deep learning-based object recognition algorithm of a recognition system according to an embodiment.

FIG. 4 shows a structure 400 of the deep learning-based object recognition algorithm. The camera and lidar sensor data are first passed through separate deep neural networks (for example, CNNs), the quality of each is determined, and the two are then fused to recognize objects. In other words, the camera's RGB image (hereinafter, the camera image) 111 and the lidar image 122 may each be input to their own deep neural network. For sensor fusion, the lidar data may first be pre-processed. The three-dimensional point information acquired by the lidar can be mapped onto the camera's two-dimensional image: the coordinates p = (x, y, z) of each three-dimensional point are multiplied by a matrix that converts lidar coordinates to camera coordinates, which yields the corresponding two-dimensional image coordinates. The image coordinates correspond to pixel positions in the image, and the pixel values can be obtained from depth, height, and reflectance values. For example, a very near point may be converted to a brightness close to 255 and a very distant point to a brightness close to 0. Because the points cannot fill the entire two-dimensional space, they appear sparsely on the image. Pixels with no point can be filled with 0. When the camera and lidar coordinates are already registered, the lidar data can be converted in this way to obtain an additional two-dimensional three-channel image.

The two-dimensional three-channel lidar image 122 obtained by this pre-processing and the camera image 111 may then be passed through separate deep neural networks (for example, CNNs). As shown in FIG. 4, at the input side the lidar three-channel image and the camera image each pass through an independent CNN, and the feature maps obtained at network nodes after a certain point are fused through the fusion network (GFU) 410 to form a new feature map. Processing this feature map determines the location and the type of the object at the same time (420). For example, the object may be detected using the Single Shot Detector (SSD) method, although various other detection methods may also be used.
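The lidar pre-processing described above can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the patent's implementation: the function name `lidar_to_image`, the single 3x4 matrix `proj`, the homogeneous division, the normalization ranges (`max_depth`, `min_h`, `max_h`), and the channel order (depth, height, reflectance) are all hypothetical choices made for the example.

```python
import numpy as np

def lidar_to_image(points, reflectance, proj, height, width,
                   max_depth=80.0, min_h=-2.0, max_h=2.0):
    """Project lidar points into a sparse 3-channel uint8 image.

    points:      (N, 3) lidar coordinates (x, y, z)
    reflectance: (N,) values assumed to lie in [0, 1]
    proj:        (3, 4) assumed lidar-to-pixel projection matrix
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])    # homogeneous coordinates (N, 4)
    cam = homo @ proj.T                            # rows are [s*u, s*v, s]

    # Keep only points in front of the camera.
    front = cam[:, 2] > 1e-6
    cam, points, reflectance = cam[front], points[front], reflectance[front]
    depth = cam[:, 2]
    u = np.floor(cam[:, 0] / depth).astype(int)
    v = np.floor(cam[:, 1] / depth).astype(int)

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[inside], v[inside], depth[inside]
    z, refl = points[inside, 2], reflectance[inside]

    # Near points map close to 255, far points close to 0.
    d_ch = 255.0 * (1.0 - np.clip(depth / max_depth, 0.0, 1.0))
    h_ch = 255.0 * np.clip((z - min_h) / (max_h - min_h), 0.0, 1.0)
    r_ch = 255.0 * np.clip(refl, 0.0, 1.0)

    # Pixels without any point stay 0. If several points hit the same
    # pixel, this sketch does not specify which one is kept.
    image = np.zeros((height, width, 3), dtype=np.uint8)
    image[v, u] = np.stack([d_ch, h_ch, r_ch], axis=1).astype(np.uint8)
    return image
```

The result is mostly zeros with isolated non-zero pixels, which matches the sparse appearance described in the text. A real calibration (for example, the matrices shipped with a dataset) would replace the assumed `proj`.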

FIG. 5 illustrates the structure of the GFU for fusing the camera feature map 510 and the lidar feature map 520. The quality of each feature map is determined, a weight is assigned to one or more of the feature maps accordingly, and the feature maps are then fused by passing them through the fusion network (GFU 410). Specifically, the camera feature map 510 and the lidar feature map 520 are first combined in parallel. The combined feature map (the primary fused feature map) is then passed through CNNs with 3x3 kernels, each followed by a sigmoid function. This step judges robustness, that is, which of the camera and lidar data is more reliable. The output of each 3x3 CNN and sigmoid is a per-pixel value between 0 and 1 that serves as the reliability of the data. Each reliability is multiplied with the corresponding camera or lidar feature map, and the weighted feature maps are again combined in parallel (secondary fusion). The secondary fused feature map is passed through a deep neural network (for example, a CNN) with a 1x1 kernel (tertiary fusion), which produces the new feature map in which the camera and lidar feature maps are finally fused. Finally, the GFU coefficients, the CNNs for each sensor, and the coefficients of the network that extracts detection results from the fused feature map are trained simultaneously. Objects can then be detected based on the new feature map.
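The forward pass of the GFU described above can be sketched in NumPy. This is an illustrative sketch under assumptions, not the patent's implementation: "combined in parallel" is taken to mean channel-wise concatenation, each reliability map is assumed to have a single channel that is broadcast over the feature channels, and the names `conv2d_same`, `gated_fusion`, and the `params` keys are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Zero-padded 'same' convolution (cross-correlation form).

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]       # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out + b[:, None, None]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_cam, f_lidar, params):
    """Fuse camera and lidar feature maps of shape (C, H, W)."""
    # Primary fusion: combine both feature maps in parallel.
    primary = np.concatenate([f_cam, f_lidar], axis=0)
    # 3x3 convolution + sigmoid gives a per-pixel reliability in (0, 1).
    g_cam = sigmoid(conv2d_same(primary, params["w_cam"], params["b_cam"]))
    g_lidar = sigmoid(conv2d_same(primary, params["w_lidar"], params["b_lidar"]))
    # Secondary fusion: weight each feature map by its reliability.
    secondary = np.concatenate([g_cam * f_cam, g_lidar * f_lidar], axis=0)
    # Tertiary fusion: 1x1 convolution mixes the weighted channels.
    fused = conv2d_same(secondary, params["w_out"], params["b_out"])
    return fused, g_cam, g_lidar
```

In training, the kernels in `params` would be learned jointly with the per-sensor CNNs and the detection network, as the text states; this sketch only shows inference with given coefficients.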

For the network to learn to assign suitable weights when the camera or lidar sensor quality is degraded, a data augmentation technique that applies quality degradation to one of the camera and the lidar is needed. For example, FIG. 6 shows camera images that were degraded in order to evaluate the performance of the deep learning-based object recognition algorithm proposed in the embodiment. FIG. 6(a) is image data with manipulated brightness, FIG. 6(b) is partially occluded image data, FIG. 6(c) is image data containing noise, and FIG. 6(d) is the original image data. Additional training data for robustness can be generated through manipulations such as partially occluding the image data, adjusting its brightness, or removing one of the two sensors' data, and the network can be trained on it. Training on data in which part of the input is degraded in this way allows the network to assign weights adaptively according to the quality of the input.
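The degradations listed above can be sketched as simple NumPy operations on uint8 images. This is a hedged illustration, not the patent's procedure: the function names, the parameter ranges, and the rule of degrading exactly one randomly chosen sensor per sample are assumptions made for the example.

```python
import numpy as np

def change_brightness(img, factor):
    """Scale pixel intensities and clip to the valid uint8 range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def occlude(img, top, left, h, w):
    """Zero out a rectangular region; the input is left unmodified."""
    out = img.copy()
    out[top:top + h, left:left + w] = 0
    return out

def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def blank(img):
    """Remove the sensor's data entirely."""
    return np.zeros_like(img)

def degrade_one_sensor(cam_img, lidar_img, rng):
    """Degrade either the camera image or the lidar image, never both."""
    target = rng.integers(2)                 # 0: camera, 1: lidar (assumed rule)
    img = cam_img if target == 0 else lidar_img
    h, w = img.shape[:2]
    choice = rng.integers(4)
    if choice == 0:
        img = change_brightness(img, rng.uniform(0.3, 1.7))
    elif choice == 1:
        top = int(rng.integers(0, h // 2 + 1))
        left = int(rng.integers(0, w // 2 + 1))
        img = occlude(img, top, left, h // 2, w // 2)
    elif choice == 2:
        img = add_noise(img, 20.0, rng)
    else:
        img = blank(img)
    return (img, lidar_img) if target == 0 else (cam_img, img)
```

Applying `degrade_one_sensor` to a fraction of the training pairs would expose the fusion network to cases where one input is unreliable while the other is intact.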

When performing sensor fusion through deep learning, the recognition system according to an embodiment combines the feature maps obtained through the CNNs into a new feature map and applies further CNN layers to that new feature map to perform sensor fusion-based object detection. Because the fusion network (GFU) combines the feature maps after assigning each an appropriate weight, the contribution of a degraded feature map is adjusted and the sensor fusion is optimized accordingly. The fusion can also exploit the respective strengths of the lidar and camera data, and it remains effective even when one sensor's data is degraded.

FIG. 7 is a table for explaining the performance of a recognition system according to an embodiment.

Object recognition and detection performance can be improved through a deep learning technique for fusing camera and lidar data. As an example, using the KITTI benchmark dataset, the lidar and camera coordinates can be registered, the lidar sensor's three-dimensional point data converted into a two-dimensional image, and the lidar two-dimensional image and the camera image each input to a CNN. The fusion network (GFU) then judges the quality of the sensor data and fuses it to recognize objects. FIG. 7 compares the performance of a base network that includes the GFU with the same base network without the GFU. For example, the comparison may be carried out with the camera image blank, partially occluded, partially brightened, or mixed with noise, and also with the lidar image blank or partially occluded. In the comparison, the network with the GFU performed better in every item, and the performance gap was particularly large when noise was mixed in.

The recognition system according to an embodiment can be applied to autonomous driving, where lidar and camera sensors are expected to be used together, and also to various other artificial intelligence fields that recognize environments or objects.

Representative fields of application include autonomous vehicles, robotics, and vehicle monitoring systems. First, object recognition is a prerequisite technology for autonomous vehicles: it determines where surrounding objects and pedestrians are located and whether any hazards are present, which enables safe driving and helps prevent accidents. It is also used as a driving aid to identify road information needed while driving, such as traffic lights and signs. Second, in robotics, object recognition supports a robot's visual activity much as the human eye does. Through object recognition, a robot can identify its surroundings and nearby objects and, on that basis, perform functions suited to its purpose. Object recognition technology can also be applied to vehicle monitoring systems. In a parking management system, for example, it can recognize the type and license plate number of a vehicle entering the parking lot and can further check which spaces are currently vacant to assist with parking.

To apply this technology, data can be acquired in real time using camera and lidar sensors, and the object recognition algorithm can be run on an embedded system equipped with a graphics processing unit (GPU) or similar hardware. This requires collecting data for various environments in advance and using it to train the deep neural network. The trained deep neural network is stored as optimized network coefficients, which are deployed on the embedded system to run the object recognition algorithm on test data arriving in real time and obtain the results.

In addition, object recognition algorithms based on camera and lidar sensors can currently be applied to autonomous driving, mobile robots, and similar uses, and are expected to go beyond recognition in the future to perform more complex functions such as object tracking and future prediction. For example, automated surveillance, traffic monitoring, vehicle search, and traffic light control that takes road traffic conditions into account will become possible. Object recognition also plays an important role in understanding the surrounding environment, and can therefore be applied to home appliances combined with IoT. As such, the deep learning-based object recognition algorithm is a foundation for future technologies and can be regarded as one of the representative artificial intelligence technologies.

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over computer systems connected by a network and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A recognition method performed by a recognition system, the method comprising:
obtaining respective feature maps by inputting different data into respective deep neural networks;
fusing the obtained feature maps through a fusion network; and
detecting an object based on a new feature map fused through the fusion network,
wherein the obtaining of the respective feature maps by inputting the different data into the respective deep neural networks comprises:
when the different data are data related to a lidar and a camera, passing a two-dimensional three-channel image obtained by pre-processing the lidar data and a camera image from the camera through respective CNNs, and performing the pre-processing by mapping three-dimensional point information acquired from the lidar onto a two-dimensional image of the camera, wherein position information of the three-dimensional point information is multiplied by a matrix that converts lidar coordinates to camera coordinates to generate two-dimensional image coordinates.

(Deleted)

The recognition method of claim 1,
wherein the fusing of the obtained feature maps through the fusion network comprises:
determining the quality of each feature map, assigning a weight to one or more of the feature maps accordingly, and then fusing the feature maps through the fusion network.

The recognition method of claim 4,
wherein the fusing of the obtained feature maps through the fusion network comprises:
when the obtained feature maps are a feature map of the camera and a feature map of the lidar, passing the feature map of the camera and the feature map of the lidar through the fusion network so that the feature maps are fused in parallel to generate a new feature map.

A recognition method performed by a recognition system, the method comprising:
obtaining respective feature maps by inputting different data into respective deep neural networks;
fusing the obtained feature maps through a fusion network; and
detecting an object based on a new feature map fused through the fusion network,
wherein the fusing of the obtained feature maps through the fusion network comprises:
determining the quality of each feature map, assigning a weight to one or more of the feature maps accordingly, and then fusing the feature maps through the fusion network; when the obtained feature maps are a feature map of a camera and a feature map of a lidar, passing the feature map of the camera and the feature map of the lidar through the fusion network so that the feature maps are fused in parallel to generate a new feature map; and primarily fusing the feature map of the camera and the feature map of the lidar in parallel and passing the primary fused feature map through deep neural networks having a plurality of 3x3 kernels and a plurality of sigmoid functions to judge the robustness of the lidar or the camera.

The recognition method of claim 6,
wherein the fusing of the obtained feature maps through the fusion network comprises:
passing the primary fused feature map through the deep neural networks having the plurality of 3x3 kernels and the sigmoid functions to output, for each pixel, a value between 0 and 1 as a reliability of the data of the camera and of the lidar.

The recognition method of claim 7,
wherein the fusing of the obtained feature maps through the fusion network comprises:
secondarily fusing, in parallel, the feature map of the camera multiplied by the reliability of the camera and the feature map of the lidar multiplied by the reliability of the lidar, and passing the secondary fused feature map through a deep neural network with a 1x1 kernel as a tertiary fusion to derive a new feature map from the feature map of the camera and the feature map of the lidar.

The recognition method of claim 1 or claim 6,
wherein the detecting of the object based on the new feature map fused through the fusion network comprises:
generating data in which the quality of part of the different data is degraded, and controlling training so that the degraded data is learned.

The recognition method of claim 1 or claim 6,
wherein the detecting of the object based on the new feature map fused through the fusion network comprises:
with the deep neural networks and the fusion network trained simultaneously, processing the new feature map to determine the location and the type of the object at the same time.

A recognition system comprising:
a feature map acquisition unit that inputs different data into respective deep neural networks to obtain a first feature map and a second feature map;
a fusion unit that fuses the obtained first feature map and second feature map through a fusion network; and
a detection unit that detects an object based on a new feature map fused through the fusion network,
wherein the fusion unit
primarily fuses the first feature map and the second feature map in parallel, and passes the primary fused feature map through deep neural networks having a plurality of 3x3 kernels and sigmoid functions to output, for each pixel, a value between 0 and 1 as a reliability of the data of a camera and of a lidar.

The recognition system of claim 11,
wherein the feature map acquisition unit,
when first data and second data are input as the different data, passes a two-dimensional three-channel image obtained by pre-processing the first data and an image of the second data through respective CNNs.

The recognition system of claim 11,
wherein the fusion unit
determines the quality of the first feature map and the second feature map, assigns a weight to one or more of the feature maps accordingly, and then fuses the feature maps through the fusion network.

(Deleted)

The recognition system of claim 11,
wherein the fusion unit
secondarily fuses, in parallel, the feature map of the camera multiplied by the reliability of the camera and the feature map of the lidar multiplied by the reliability of the lidar, and passes the secondary fused feature map through a deep neural network with a 1x1 kernel as a tertiary fusion to derive a new feature map from the feature map of the camera and the feature map of the lidar.