KR20180070258A

KR20180070258A - Method for detecting and learning of objects simultaneous during vehicle driving

Info

Publication number: KR20180070258A
Application number: KR1020160172760A
Authority: KR
Inventors: 박민우; 조아라; 이정관; 황성주; 윤재홍
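Original assignee: 기아자동차주식회사 (Kia Motors Corporation); 울산과학기술원 (UNIST, Ulsan National Institute of Science and Technology); 현대자동차주식회사 (Hyundai Motor Company)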
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2018-06-26
Also published as: KR102540393B1

Abstract

The present invention relates to a method for simultaneously recognizing and learning objects in a driving situation. The method increases the object recognition rate so that objects can be recognized and learned automatically through repeated driving alone. It comprises three steps:
- a first step in which an image processing unit extracts first and second object candidate regions;
- a second step in which a detection unit estimates an object from the first and second object candidate regions, compares it with the type of an actual object, and estimates and stores a probability value indicating the degree of match; and
- a third step of calculating the type and probability value of the object and updating the first and second object candidate regions related to the object.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for simultaneous object recognition and learning in a driving situation. More particularly, it relates to a method that detects objects in a video signal while simultaneously performing automatic learning, so that the detector's performance is continuously upgraded while the vehicle is driving.

When many things around the vehicle demand attention, the driver naturally slows down. This is especially true when turning left or right in a narrow alley or at an intersection. There, the driver is constantly tense, worried that an object approaching from an area outside the driver's view may collide with the vehicle.

In typical object detection research, the ground truth of objects is extracted by hand offline. A detector is then trained on it, and the trained detector is used online to perform object detection.

In this regard, Korean Unexamined Patent Publication No. 10-2016-0121481 (object recognition system and object recognition method thereof) discloses an object recognition system and method. The system extracts left and right feature vectors from a stereo image of an object and finds the robust feature vectors common to both. It then compares the left, right, and robust feature vectors with information stored in a database to extract the object's name and so recognize the object.

However, this conventional system and method separate recognition from learning, and learning is performed offline. As a result, the processor operates only when an object is being recognized, and the object recognition rate is low.

Korean Unexamined (Registered) Patent Publication No. 10-2016-0121481

One object of the present invention is to provide a method that performs recognition and learning automatically by integrating object feature extraction with recognizer training.

Another object of the present invention is to provide a system and method that increase the object recognition rate, so that objects are automatically recognized and learned through repeated driving alone.

To achieve these objects, the present invention provides a method for simultaneous object recognition and learning in a driving situation, which recognizes objects from a video signal captured by a camera while driving. The method comprises:
- a first step in which an image processing unit applies the RPN (Region Proposal Network) technique to the video signal to extract a first object candidate region in which an object can be identified, and tracks the object in a previously stored video signal to extract a second object candidate region;
- a second step in which a detection unit estimates the object from the first and second object candidate regions, compares it with the type of the actual object, and estimates and stores a probability value indicating the degree of match; and
- a third step in which a backwarding unit compares the object and probability value estimated by the detection unit with a threshold preset for each object type. When the value is at or above the threshold, the backwarding unit calculates the object's type and probability value and feeds them back to the image processing unit, which updates the first and second object candidate regions related to the object.

Preferably, the first and second object candidate regions related to an object are updated while the type of the object is being recognized in the driving situation, which increases the object recognition rate.

Preferably, the second object candidate region in the first step may be extracted using KLT (Kanade-Lucas-Tomasi) tracking within a preset time.
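KLT tracking is built on the Lucas-Kanade step. Within a small window, the spatial image gradients and the frame-to-frame intensity change are combined into a 2x2 least-squares system, whose solution is the window's displacement. The sketch below performs one such step for a single window in pure Python. It is a simplified illustration, not the patent's tracker: full KLT implementations add Shi-Tomasi feature selection, iteration, and image pyramids, all omitted here. The names `lucas_kanade_window` and `make_frame` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale intensities, indexed image[y][x]


def lucas_kanade_window(prev: Image, curr: Image, cx: int, cy: int,
                        half: int = 2) -> Tuple[float, float]:
    """One Lucas-Kanade least-squares step: estimate the (u, v) motion of the
    (2*half+1)^2 window centred at (cx, cy) from prev to curr."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            # Central-difference spatial gradients on the previous frame.
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            # Temporal gradient between the two frames.
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        raise ValueError("textureless window or pure edge (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] by Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


def make_frame(shift_x: float, size: int = 11, c: int = 5) -> Image:
    """Synthetic smooth frame whose pattern is moved right by shift_x pixels."""
    def f(x: float, y: float) -> float:
        return 0.1 * (x - c) ** 2 + 0.2 * (y - c) ** 2 + 0.05 * (x - c) * (y - c)
    return [[f(x - shift_x, y) for x in range(size)] for y in range(size)]


prev, curr = make_frame(0.0), make_frame(1.0)
u, v = lucas_kanade_window(prev, curr, cx=5, cy=5)  # expect u close to 1, v close to 0
```

In the method described here, displacements like `(u, v)` would carry a region detected in an earlier frame forward to the current frame. That carried-forward region is what serves as the second object candidate region.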

Preferably, the object types may include stationary or moving objects ahead of the vehicle that are detected while driving.

Preferably, the third step may include:
- calculating a label that matches the object in the first and second object candidate regions;
- applying a softmax function to the labels and selecting the label with the highest probability; and
- when that label's probability is at or above the threshold, assuming it to be the ground truth label and feeding back the difference between the ground truth label and the label measured in real time to the image processing unit.

With the configuration described above, the present invention improves the object recognizer's performance through "simultaneous autonomous learning" on data obtained from continuous vehicle driving.

In addition, objects in a video need not be labeled in every frame for learning, which reduces labeling costs.

FIG. 1 shows a method for simultaneous object recognition and learning in a driving situation, in which simultaneous learning is improved compared with the prior art.
FIG. 2 is a flowchart according to an embodiment of the present invention.
FIG. 3 shows data processing in the method for simultaneous object recognition and learning in a driving situation according to an embodiment of the present invention.
FIG. 4 shows a method in which the image processing unit recognizes images using KLT (Kanade-Lucas-Tomasi) tracking results according to an embodiment of the present invention.
FIG. 5 shows the detailed configuration of the third step according to an embodiment of the present invention.
FIG. 6 shows an improved result screen of object detection using the learning method according to an embodiment of the present invention.
FIG. 7 shows the improvement in the object detection rate according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the exemplary embodiments. The same reference numerals in each drawing denote members that perform substantially the same function.

The objects and effects of the present invention will be understood naturally, or become clearer, from the following description, but they are not limited to that description alone. Detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the invention.

FIG. 1 shows a method for simultaneous object recognition and learning in a driving situation, in which simultaneous learning is improved compared with the prior art. Referring to FIG. 1, existing studies update only the category learning model, so the underlying representation itself is never learned. Conventional object recognition systems and methods also separate recognition from learning, which is done offline. As a result, the processor operates only when an object is being recognized, and the object recognition rate is low.

In contrast, the present invention uses backwarding to integrate object feature extraction with recognizer training, so that recognition and learning are performed automatically. It also increases the recognition rate, so that objects are automatically recognized and learned through repeated driving alone.

FIG. 2 shows a flowchart according to an embodiment of the present invention. Referring to FIG. 2, the method may consist of three steps:
- a first step (S10) in which the image processing unit extracts the first and second object candidate regions;
- a second step (S30) in which a CNN (Convolutional Neural Network)-LSTM (Long Short-Term Memory) uses KLT (Kanade-Lucas-Tomasi)-based tracking results to group several frames into a single track, recognizes them at once, and estimates and stores a probability value; and
- a third step (S50 and S70) in which the type and probability value of the object are fed back to update the first and second object candidate regions related to the object.

In the first step (S10), the image processing unit applies the RPN (Region Proposal Network) technique to the video signal to extract a first object candidate region in which an object can be identified. It also tracks the object in the previously stored video signal to extract a second object candidate region.

The present invention recognizes the type of an object while driving and, at the same time, updates the first and second object candidate regions related to that object, which increases the object recognition rate. The object types may include stationary or moving objects ahead of the vehicle that are detected while driving.

The first object candidate region is the region from which the image processing unit extracts objects in the current video signal. The second object candidate region is a learned candidate region detected a preset time interval earlier.

In the second step (S30), the detection unit estimates an object from the first and second object candidate regions, compares it with the type of the actual object, and estimates and stores a probability value indicating the degree of match.

In the third step (S50 and S70), the backwarding unit compares the object and probability value estimated by the detection unit with the threshold preset for each object type. When the value is at or above the threshold, the backwarding unit calculates the type and probability value of the object and feeds them back to the image processing unit, which updates the first and second object candidate regions related to the object.

FIG. 3 shows data processing in the method for simultaneous object recognition and learning in a driving situation according to an embodiment of the present invention. Referring to FIG. 3, object regions are extracted in the first step. In this embodiment, object candidate regions are extracted using an RPN (Region Proposal Network).

An RPN (Region Proposal Network) performs region-based object recognition. It is a network that finds and proposes candidate regions likely to contain objects, in the form "at this location in the image there is a candidate that is likely to be the object the user is looking for." In other words, it is the front-most network, extracting candidate object regions from the whole image. This stage produces too many candidate regions, however. The number of candidates is therefore reduced by tracking with past images relative to time t, which identifies objects at regular intervals. A CNN then makes the final decision on each object. Objects with high confidence in these final results are used for additional training of the network, which improves the CNN's performance.
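The text does not state how the tracked regions and the RPN proposals are combined to cut down the number of candidates. One common choice, assumed here purely for illustration, is greedy non-maximum suppression over the union of both sets, capped at `top_k` survivors. Tracked regions are listed first so they win ties. The helpers `iou` and `merge_candidates` and the parameters `iou_thr` and `top_k` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Candidate = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_candidates(rpn: List[Candidate], tracked: List[Candidate],
                     iou_thr: float = 0.5, top_k: int = 100) -> List[Candidate]:
    """Greedy NMS over tracked regions (second candidates) plus RPN proposals
    (first candidates). The stable sort keeps tracked boxes ahead on equal scores."""
    pool = sorted(tracked + rpn, key=lambda cand: cand[1], reverse=True)
    kept: List[Candidate] = []
    for box, score in pool:
        # Keep a box only if it does not heavily overlap any box already kept.
        if all(iou(box, k) < iou_thr for k, _ in kept):
            kept.append((box, score))
            if len(kept) == top_k:
                break
    return kept


rpn = [((0, 0, 10, 10), 0.6), ((1, 1, 11, 11), 0.55), ((50, 50, 60, 60), 0.3)]
tracked = [((0, 0, 10, 10), 0.9)]
merged = merge_candidates(rpn, tracked)
```

In this toy input, both RPN boxes that overlap the tracked object (IoU 1.0 and about 0.68) are suppressed. Only the tracked box and the distant proposal remain to be passed on to the CNN.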

Accordingly, objects ahead are searched for in the image recognized at time t. Objects extracted from data learned at time t-1 or earlier, within the same driving section, are tracked to detect the object type. The detected objects are classified by type to recognize the objects extracted while driving. For each extracted object, the method then checks whether its probability value is at or above the previously learned probability threshold for that object.

If an object's probability is at or above the threshold, the thresholds of the previously learned objects may be updated through network backwarding. The detected objects may also be compared to estimate the type and probability of each object.

FIG. 4 shows a method in which the image processing unit recognizes images using KLT (Kanade-Lucas-Tomasi) tracking results according to an embodiment of the present invention. Referring to FIG. 4, unlike the conventional Faster R-CNN, a CNN (Convolutional Neural Network)-LSTM (Long Short-Term Memory) uses the KLT-based tracking results to group several frames into a single track and recognize them at once.

Compared with FIG. 3 described above, FIG. 4 uses a CNN-LSTM to group several frames into a single track so that recognition is performed on all of them at once.

A CNN (Convolutional Neural Network)-LSTM (Long Short-Term Memory) is a type of neural network that, through continued iterative training, learns to produce relatively correct outputs for input patterns. It addresses the problem of classifying input patterns into specific groups by bringing the efficient pattern recognition that humans perform to a computer. To do so, the artificial neural network uses an algorithm that imitates the human ability to learn. Through this algorithm, the network can generate a mapping between input patterns and output patterns, which is what is meant by saying the network has learning ability. Consequently, based on what it has learned, the network can produce relatively correct outputs even for input patterns that were not used in training.

Referring to FIG. 4, the embodiment uses time t-1, but the invention is not limited to this. The object type is detected by tracking objects extracted from the current input or from data preceding previously learned data. The detected objects are classified by type to recognize the objects extracted while driving. Each input value may be stored in a layer, and the layers may be connected to one another, or an existing layer may be upgraded through a recurrent neural network.

The video signal input at time t-1 is fed to the LSTM and recognized as a truck. The video signal input at time t can then be used to upgrade the probability value (0.9) of the object extracted from the t-1 video signal. That is, through repeated learning on the same object, the object in the region ahead is recognized as a truck. Thus, a CNN-LSTM can group several frames into a single track and perform recognition for all of its layers at once.
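The patent gives no architectural details for the CNN-LSTM. The following is a minimal sketch of the general idea only: one track's per-frame feature vectors are read in order by an LSTM, and the whole track is classified from the final hidden state, so the decision aggregates every frame rather than just the current one. The CNN stage is assumed to have already produced the feature vectors. The weights are random placeholders rather than trained values, and `TrackLSTM` and its parameters are hypothetical names.

```python
import math
import random
from typing import Dict, List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vec) -> Vec:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TrackLSTM:
    """Single-layer LSTM over one track's per-frame CNN features, followed by a
    linear classifier on the last hidden state. Weights are random placeholders."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, feat_dim: int, hidden: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Mat:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        # Each gate has its own weights over the concatenation [x_t, h_{t-1}].
        self.W: Dict[str, Mat] = {g: rand(hidden, feat_dim + hidden) for g in self.GATES}
        self.b: Dict[str, Vec] = {g: [0.0] * hidden for g in self.GATES}
        self.head = rand(num_classes, hidden)

    def classify(self, track: List[Vec]) -> Vec:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in track:  # one LSTM step per frame of the track
            xh = x + h   # list concatenation [x_t, h_{t-1}]
            pre = {g: [s + bias for s, bias in zip(matvec(self.W[g], xh), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(z) for z in pre["input"]]
            f = [sigmoid(z) for z in pre["forget"]]
            o = [sigmoid(z) for z in pre["output"]]
            cand = [math.tanh(z) for z in pre["cell"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return softmax(matvec(self.head, h))  # class probabilities for the track


model = TrackLSTM(feat_dim=4, hidden=3, num_classes=3)
track = [[0.2, 0.1, 0.0, 0.5], [0.3, 0.1, 0.1, 0.4], [0.25, 0.0, 0.1, 0.6]]
probs = model.classify(track)
```

Because the cell state `c` is carried from frame to frame, evidence from the t-1 frame can still influence the track-level probability once the frame at time t arrives.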

This is what distinguishes the approach from existing ones. As shown in the figure, the LSTM does not recognize objects from the current time alone; it aggregates results over a certain period, and this aggregated information improves the recognition rate.

FIG. 5 shows the detailed configuration of the third step according to an embodiment of the present invention. Referring to FIG. 5, the third step may include:
- calculating a label that matches the object in the first and second object candidate regions (S701);
- applying a softmax function to the labels and selecting the label with the highest probability (S702 and S703); and
- when that label's probability is at or above the threshold, assuming it to be the ground truth label and feeding back the difference between the ground truth label and the label measured in real time to the image processing unit (S704).

Calculating the label (S701) means selecting and indexing the object with the highest probability in the softmax output. The comparison step (S702) compares this indexed predicted probability p with a threshold Θ. If Θ < p, the method proceeds in learning (back-propagation) mode; if Θ > p, it proceeds in non-learning ("do nothing") mode. Tables 1 and 2 below show pseudocode for the forward and backward passes with respect to the threshold Θ.

The present invention implements a new unlabeled softmax loss layer. Conventional methods compute the loss from the difference between an existing ground truth label and the softmax prediction output, and perform backwarding based on that loss. In the present invention, by contrast, the backwarding unit computes a loss in the correct direction even from raw, unlabeled data, and performs the backwarding computation with it.

In the conventional softmax loss, the softmax function is applied to each label's output computed from the features, giving a probability $x \in [0, 1]$. The loss is the difference between this value and the ground truth label (here, probability $x' = 1$), and backwarding is performed with that loss.

In contrast, the softmax loss of the present invention is an unlabeled softmax loss. The softmax function is applied to the labels' outputs computed from the features, and the highest probability value $x_n$ is selected. Each label has its own independently defined reference point, the threshold $\theta_n$. If $x_n \ge \theta_n$, label $n$ is assumed to be the ground truth label ($x'_n = 1$), and backwarding proceeds with a loss equal to the difference.

In the label calculation steps (S701 to S703) and the feedback step (S704), high-confidence results from the object region are fed back into the CNN. Objects are thus detected in the images while the results are simultaneously learned online, in real time.
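The detect-and-learn loop of steps S701 to S704 can be sketched with a deliberately simplified model. In place of the patent's CNN, this sketch uses a linear softmax classifier with hypothetical names (`OnlineSelfTrainer`, `step`, `lr`). Each call to `step` makes one prediction: the top label is selected (S701 to S703), and the weights are updated with the cross-entropy gradient toward that label only if its probability reaches the per-class threshold (S704). Below the threshold the weights are left unchanged ("do nothing"). Ties at exactly the threshold are treated as learning, following the claims' "greater than or equal" wording.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OnlineSelfTrainer:
    """Linear softmax classifier that detects and learns at the same time:
    a confident prediction becomes a pseudo ground-truth label for one SGD step."""

    def __init__(self, weights: List[List[float]], thresholds: List[float], lr: float = 0.1):
        self.W = [row[:] for row in weights]  # W[k][j]: class k, feature j
        self.thresholds = thresholds          # one threshold theta_n per class
        self.lr = lr

    def step(self, x: List[float]) -> Tuple[int, float, bool]:
        probs = softmax([sum(w * xj for w, xj in zip(row, x)) for row in self.W])
        n = max(range(len(probs)), key=probs.__getitem__)  # top label
        learn = probs[n] >= self.thresholds[n]              # threshold gate
        if learn:
            # d(-log p_n)/dW[k][j] = (p_k - [k == n]) * x_j
            for k, row in enumerate(self.W):
                err = probs[k] - (1.0 if k == n else 0.0)
                for j, xj in enumerate(x):
                    row[j] -= self.lr * err * xj
        return n, probs[n], learn  # probability reported is the pre-update value


trainer = OnlineSelfTrainer([[0.5, 0.0], [0.0, 0.0]], thresholds=[0.6, 0.6])
history = [trainer.step([1.0, 0.5])[1] for _ in range(5)]
```

Each time the same object is seen again, the confidence in `history` rises, which qualitatively mirrors the behavior described for FIGS. 6 and 7. A prediction that never clears its threshold, such as a uniform 0.5 against a 0.6 threshold, leaves the model untouched.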

In most conventional deep learning and machine learning, a classifier or network is first trained separately offline and then used for candidate detection and object detection. Because any further learning also happens offline, the classifier or network does not improve unless it is separately updated.

The embodiment of the present invention, by contrast, needs no offline learning. Simply driving the vehicle both detects objects and learns from them, so performance improves automatically.

FIG. 6 shows an improved result screen of object detection using the learning method according to an embodiment of the present invention. Referring to FIG. 6, applying the present invention improves the results of repeated learning on the same video.

When the first and second object regions are continuously improved through repeated detection on the same video, the object recognition rate increases.

FIG. 7 shows the improvement in the object detection rate according to an embodiment of the present invention. It expresses numerically what FIG. 6 shows visually: continued repeated learning on the same video increases the detection rate. The x-axis is the stepwise order on the detection screen. As tracking is applied on top of the base detector and the number of iterations grows, the probability value on the y-axis improves.

Although the present invention has been described in detail through representative embodiments, those skilled in the art will understand that various modifications can be made to them without departing from the scope of the invention. Therefore, the scope of the invention should not be limited to the described embodiments. It should be defined by the claims below and by all changes or modifications derived from the claims and their equivalents.

Claims

An object recognition and learning method for recognizing an object from a video signal captured by a camera in a driving situation, the method comprising:
a first step of extracting, from the video signal, a first object candidate region in which the object can be identified, and extracting a second object candidate region by tracking the object in a previously stored video signal;
a second step of estimating the object from the first and second object candidate regions, comparing it with the type of an actual object, and estimating and storing a probability value indicating the degree of agreement; and
a third step in which a backwarding unit compares the object and the probability value estimated by the detection unit with a threshold preset for each object type, calculates the type and probability value of the object when the probability value is greater than or equal to the threshold, and transmits the type and probability value of the object to the image processing unit to update the first and second object candidate regions associated with the object.

The method according to claim 1,
wherein the type of the object is recognized in the driving situation while the first and second object candidate regions related to the object are updated, thereby increasing the recognition rate of the object.

The method according to claim 1,
wherein the second object candidate region in the first step is extracted using KLT (Kanade-Lucas-Tomasi) tracking within a predetermined time.

The method according to claim 1,
wherein the type of the object includes a stationary or moving object ahead that is detected in the driving situation.

The method according to claim 1,
wherein the third step comprises:
calculating a label matching the object in the first and second object candidate regions;
calculating the label having the maximum probability value among the values obtained by applying a softmax function to the labels; and
when the maximum probability value is greater than or equal to the threshold, assuming the label to be a ground truth label and feeding back the difference between the ground truth label and the label measured in real time to the image processing unit.

The method according to claim 1,
wherein the first object candidate region is extracted from the video signal using a Region Proposal Network (RPN) technique.