KR101355976B1

KR101355976B1 - Method and apparatus for object tracking using stereo feature matching

Info

Publication number: KR101355976B1
Application number: KR1020120128448A
Authority: KR
Inventors: 임영철; 이충희; 김종환; 박지호; 김남혁
Original assignee: 재단법인대구경북과학기술원 (Daegu Gyeongbuk Institute of Science and Technology)
Priority date: 2012-11-13
Filing date: 2012-11-13
Publication date: 2014-02-03

Abstract

An object tracking method using the matching of spatiotemporal feature points, and an apparatus therefor, are disclosed. The object tracking apparatus using spatiotemporal feature point matching according to an embodiment of the present invention includes: a candidate area setting unit that sets an area (a first search area) in which an object to be tracked is detected in one (a first image) of the left and right images input from a stereo camera; a temporal image feature point extracting unit that extracts feature points within the region of interest (ROI) of the previous frame and within the first search area; an ROI estimating unit that estimates the ROI in the first image by matching feature points between temporal images; a spatial image feature point extracting unit that sets a second search area in the other image (a second image) of the left and right images based on the ROI of the first image and extracts feature points in the second search area; and a spatial image feature point matching unit that, through feature point matching between spatial images, extracts, from among the feature points extracted in the ROI of the first image, the feature points that match the feature points extracted in the second search area. [Reference numerals] (10) Preprocessing unit; (20) Candidate area setting unit; (30) Temporal image feature point extracting unit; (40) ROI estimating unit; (50) Spatial image feature point extracting unit; (60) Spatial image feature point matching unit; (70) Distance estimating unit; (AA) Left and right images

Description

Object tracking method and apparatus using spatiotemporal feature point matching {Method and apparatus for object tracking using stereo feature matching}

The present invention relates to a method and apparatus for tracking a specific object in video frames, and more particularly, to an object tracking method and apparatus using spatiotemporal feature point matching.

Recently, research on camera-based object tracking methods has been actively conducted in various settings such as automobiles, robots, smartphones, and video surveillance. Methods using a single camera have the advantage of a simple system configuration, but information is lost when three-dimensional real-world information is projected onto a two-dimensional image, and as a result, performance degrades in tasks such as distance estimation and object tracking.

FIG. 1 is a flowchart illustrating the sequence of steps of an object tracking method using a stereo camera according to the prior art. Hereinafter, this prior-art method will be described with reference to FIG. 1.

Referring to FIG. 1, the prior-art object tracking method using a stereo camera consists of a preprocessing step (S11) of calibrating and rectifying the stereo images; a stereo matching step (S12) of calculating the disparity between the left and right images; an ROI setting step (S13) of setting the region of interest (ROI) of the object to be tracked; a step (S14) of extracting feature points within the ROI of the previous image and within the search region of the current image; a temporal image feature point matching step (S15) of matching the feature points extracted from the ROI of the previous image with those extracted from the search region of the current image; and a step (S16) of calculating the ROI and the distance to the object using the matched feature points.

Such a conventional object tracking method has the advantage that matching the left and right images reconstructs 3D information, which improves distance precision and tracking performance. However, image matching increases complexity and leaves the method vulnerable to various internal and external environmental changes. In particular, stereo image matching is sensitive not only to geometric differences between the left and right cameras but also to brightness changes, which makes it difficult to overcome the various problems that arise in the external environment.

To solve the above-described problems of the prior art, an object of the present invention is to provide an object tracking method and apparatus using spatiotemporal feature point matching that is robust to various changes in the external environment.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, an object tracking apparatus using spatiotemporal feature point matching according to one aspect of the present invention includes: a candidate region setting unit that sets a region (hereinafter, a first search region) in which an object to be tracked is detected in one of the left and right images input from a stereo camera (hereinafter, the first image); a temporal image feature point extractor that extracts feature points within the region of interest (ROI) of the previous frame and within the first search region; an ROI estimator that estimates the ROI in the first image by matching feature points between temporal images; a spatial image feature point extractor that sets a second search region in the other of the left and right images (hereinafter, the second image) based on the ROI in the first image and extracts feature points within the second search region; and a spatial image feature point matcher that, by matching feature points between spatial images, extracts, from among the feature points extracted within the ROI of the first image, those that match the feature points extracted in the second search region.

The spatial image feature point extractor sets the second search region within a valid disparity range [d_{t,min}, d_{t,max}], taking into account the distance to the object in the current frame, which is predicted with reference to the previous frame image, as expressed by an equation (Equation 2 in the detailed description).

Here, b is the baseline of the stereo camera and α is the focal length expressed in pixels; the remaining terms are, respectively, the estimated disparity, the confidence of the estimated distance, the velocity estimated in the previous frame, and the time between frames.

The spatial image feature point matcher places the feature points extracted within the ROI of the first image and the feature points extracted in the second search region on the same epipolar line, and performs matching by considering only the disparity between feature points lying on that line, as expressed by an equation (Equation 3 in the detailed description).

In that equation, assuming that stereo matching between feature points proceeds from left to right and from top to bottom of the image, the terms are the x-coordinates of the j-th matched feature points in the i-th row of the current-frame left and right images, and the disparity estimated in the i-th row and the disparity estimated in the ROI, respectively.

An object tracking method using spatiotemporal feature point matching according to another aspect of the present invention includes: performing feature point matching in the temporal domain on one of the left and right images input from a stereo camera (hereinafter, the first image) with reference to a region of interest (ROI) defined in the previous frame; and performing feature point matching in the spatial domain with the other of the left and right images (hereinafter, the second image), based on the feature points extracted as a result of the temporal-domain matching.

Meanwhile, the object tracking method according to an aspect of the present invention may be implemented as a computer-executable program and stored on a computer-readable recording medium.

As described above, according to the present invention, robust object tracking and estimation of the distance to an object are possible even in various internal and external environments by using feature point matching in spatiotemporal images.

In addition, the present invention does not require the dense stereo matching process used in conventional stereo vision systems, which reduces complexity. Because stereo matching is sensitive to brightness differences between the left and right images, matching errors occur easily and many problems can arise when it is applied in various external environments. According to the present invention, feature points of the temporal and spatial images are matched simultaneously and the spatiotemporal information between feature points is fused, which improves both distance accuracy and object tracking performance.

FIG. 1 is a flowchart illustrating the sequence of steps of an object tracking method using feature points according to the prior art.
FIG. 2 is a block diagram illustrating the schematic configuration of an object tracking apparatus using spatiotemporal feature point matching according to an embodiment of the present invention.
FIG. 3 is an exemplary view showing an example of spatiotemporal feature point matching according to the present invention.
FIG. 4 is a flowchart illustrating an object tracking method using spatiotemporal feature point matching according to another embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that this disclosure will be complete and will fully convey the scope of the invention to those of ordinary skill in the art, and the present invention is defined only by the scope of the claims. The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms include plural forms unless the context specifically indicates otherwise.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted when they might obscure the gist of the present invention.

The present invention proposes a method and apparatus for robustly tracking objects undergoing various motions in complex external environments. To this end, the configuration of an object tracking apparatus using spatiotemporal feature point matching according to an embodiment of the present invention is described in detail with reference to FIG. 2, which is a block diagram illustrating its schematic configuration.

Referring to FIG. 2, the object tracking apparatus using spatiotemporal feature point matching according to an embodiment of the present invention includes a preprocessor 10, a candidate region setting unit 20, a temporal image feature point extractor 30, an ROI estimator 40, a spatial image feature point extractor 50, a spatial image feature point matcher 60, and a distance estimator 70.

The preprocessor 10 calibrates and rectifies the left and right images input from the stereo camera.

The candidate region setting unit 20 detects the object to be tracked in one of the left and right images (hereinafter, for convenience of explanation, object detection is assumed to be performed on the image input from the left camera). A detected object is indicated as region information in the image; one or more regions may be indicated, and each region represents a candidate region for object tracking.

Object detection may be performed by having a user select the region in which the object is located, or by selecting that region automatically with an algorithm such as AdaBoost or histogram of oriented gradients (HOG).

The temporal image feature point extractor 30 extracts feature points from the region of interest (ROI) of the previous-frame left image and from the candidate region of the current-frame left image. Here, the ROI of the previous frame is the region in which the object to be tracked was estimated to be located in the previous frame image.

Many algorithms can be used to extract distinctive regions in an image; for example, the FAST algorithm may be used in the present invention. The FAST algorithm compares the brightness of the current pixel with that of n pixels in the surrounding area and determines the current pixel to be a feature point when the brightness difference exceeds a specific threshold.
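As a minimal sketch of the FAST segment test mentioned above (in its standard formulation, which the description summarizes), the following pure-Python code takes the n surrounding pixels to be the 16 pixels on a Bresenham circle of radius 3 and declares a corner when at least n contiguous ring pixels are all brighter than the centre plus a threshold, or all darker than the centre minus it. The image is assumed to be a list of rows of grey values; the threshold of 20 and n = 9 are illustrative defaults, and non-maximum suppression is omitted.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3 used by FAST,
# listed in order around the circle so that "contiguous" has a meaning.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Segment test: (x, y) is a corner if at least n contiguous circle pixels
    are all brighter than I_p + threshold or all darker than I_p - threshold."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    # sign = +1: ring pixel brighter than the centre; sign = -1: darker.
    for sign in (1, -1):
        passes = [sign * (v - center) > threshold for v in ring]
        run = 0
        # Scan the ring twice so that arcs wrapping past index 0 are counted.
        for ok in passes + passes:
            run = run + 1 if ok else 0
            if run >= n:
                return True
    return False


def fast_keypoints(img, threshold=20, n=9):
    """Return (x, y) of all pixels passing the test, skipping a 3-pixel border."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, threshold, n)]


# Usage: a single bright pixel on a dark background (its ring is darker).
img = [[0] * 7 for _ in range(7)]
img[3][3] = 100
print(fast_keypoints(img))  # prints [(3, 3)]
```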

The ROI estimator 40 performs feature point matching between the temporal images (the previous-frame and current-frame left images). Descriptors are used to match the feature points, and the feature points whose descriptors are closest to each other are selected. Using the selected feature points, a transformation matrix is estimated as expressed by Equation 1.

In Equation 1, the two point terms are the k-th matched feature points extracted from the previous-frame left image and the current-frame left image, respectively, and p is the total number of matched feature points.

To effectively remove outlier feature points, the ROI estimator 40 estimates the transformation matrix using an algorithm such as least median of squares (LMS) or random sample consensus (RANSAC), and uses the estimated transformation matrix to estimate the ROI in the left image of the current frame.
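The description names LMS and RANSAC without fixing the motion model. The sketch below uses RANSAC and assumes, for illustration only, a uniform-scale-plus-translation model (x' = s·x + t_x, y' = s·y + t_y), which is enough to move and resize a bounding-box ROI; the function names, iteration count, and pixel tolerance are hypothetical.

```python
import random


def fit_scale_translation(src, dst):
    """Least-squares fit of dst ~ s * src + t (uniform scale s, translation t)."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    ux = sum(q[0] for q in dst) / n
    uy = sum(q[1] for q in dst) / n
    num = sum((p[0] - mx) * (q[0] - ux) + (p[1] - my) * (q[1] - uy)
              for p, q in zip(src, dst))
    den = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in src)
    s = num / den if den > 0 else 1.0  # degenerate sample: keep unit scale
    return s, ux - s * mx, uy - s * my


def residual(model, p, q):
    """Euclidean distance between the mapped point s * p + t and q."""
    s, tx, ty = model
    return ((s * p[0] + tx - q[0]) ** 2 + (s * p[1] + ty - q[1]) ** 2) ** 0.5


def ransac_scale_translation(src, dst, iters=200, tol=2.0, seed=0):
    """Fit the model to random 2-point samples, keep the one with the most
    inliers (residual < tol), then refit on those inliers.
    Returns ((s, tx, ty), inlier_indices)."""
    rng = random.Random(seed)
    idx = list(range(len(src)))
    best = []
    for _ in range(iters):
        i, j = rng.sample(idx, 2)
        model = fit_scale_translation([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in idx if residual(model, src[k], dst[k]) < tol]
        if len(inliers) > len(best):
            best = inliers
    if not best:
        raise ValueError("no consensus set found")
    model = fit_scale_translation([src[k] for k in best], [dst[k] for k in best])
    return model, best
```

Two correspondences already over-determine the three parameters, so each sample yields a candidate model; the final refit is an ordinary least-squares fit over the consensus set only, which is what keeps background or mismatched points out of the estimate.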

FIG. 3 is an exemplary diagram illustrating an example of spatiotemporal feature point matching according to the present invention; the temporal feature point matching performed by the ROI estimator 40 is shown on the left side of FIG. 3.

In FIG. 3, the previous frame of the left image is labeled t-1 and the current frame is labeled t. In the previous-frame left image (t-1), the ROI is marked with a white rectangular border, and in the current-frame left image (t), the candidate region (or search region) is shown in blue. Feature points of the ROI in the previous-frame left image are shown as red dots and feature points of the search region in the current-frame left image are shown in blue; the feature points in the current-frame left image that match feature points extracted from the previous-frame left image are shown in red. From this matching result, the ROI estimator 40 can estimate the transformation matrix, and it estimates the ROI in the current-frame left image using the estimated transformation matrix and the ROI in the previous-frame left image. The ROI estimated in the current-frame left image is marked with a white rectangular border in FIG. 3.

The spatial image feature point extractor 50 uses the left image of the current frame as the reference image and the right image of the current frame as the corresponding image, and extracts feature points from the search region of the current-frame right image.

Stereo feature matching, which estimates disparity information only for the feature points extracted from the left and right images, uses the correspondence between feature points extracted from the reference image and those extracted from the corresponding image. Stereo feature matching according to the present invention is characterized in that it is performed on the basis of the feature points obtained from temporal feature point matching with the previous frame image.

Specifically, the spatial image feature point extractor 50 sets a search region in the right image based on the ROI of the left image estimated by temporal feature point matching. To accurately match the feature points inside the ROI of the reference image (left image), it is important to find the search region in the corresponding image (right image) that is most similar to the ROI of the reference image. The spatial image feature point extractor 50 therefore sets the search region within the valid disparity range [d_{t,min}, d_{t,max}], taking into account the distance to the object in the current frame predicted with reference to the previous frame image as expressed by Equation 2, and extracts feature points within the set search region.

In Equation 2, b is the baseline of the stereo camera and α is the focal length expressed in pixels; the remaining terms are, respectively, the estimated disparity, the confidence of the estimated distance, the velocity estimated in the previous frame, and the time between frames.

The spatial image feature point matcher 60 performs feature point matching between the spatial images (the current-frame left and right images) and, using the epipolar constraint established by the rectification in the preprocessor, estimates a transformation matrix that considers only the disparity (horizontal) shift.

Several matching constraints can be used to increase the accuracy of stereo matching and shorten its execution time. One of the most commonly used is the epipolar constraint, which places two corresponding points of a stereo pair on the same line. When the corresponding region in a stereo image is searched under this constraint, only regions on the same line need to be compared, which reduces computation time.
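For rectified images, the epipolar constraint reduces the correspondence search to a single image row. The following sketch assumes features given as (x, y, descriptor) tuples with binary descriptors stored as Python integers and compared by Hamming distance, a row tolerance of zero for perfectly rectified images, and a disparity interval such as [d_{t,min}, d_{t,max}]; the function names and the distance cutoff are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def epipolar_match(left, right, d_min, d_max, row_tol=0, max_dist=64):
    """left/right: lists of (x, y, descriptor). For each left feature, search
    only right features on the same rectified row whose disparity x_L - x_R
    lies in [d_min, d_max], and keep the closest descriptor.
    Returns a list of (left_index, right_index, disparity)."""
    matches = []
    for i, (xl, yl, dl) in enumerate(left):
        best, best_j = max_dist + 1, None
        for j, (xr, yr, dr) in enumerate(right):
            if abs(yl - yr) > row_tol:
                continue  # not on the same epipolar line
            if not d_min <= xl - xr <= d_max:
                continue  # outside the predicted disparity range
            dist = hamming(dl, dr)
            if dist < best:
                best, best_j = dist, j
        if best_j is not None:
            matches.append((i, best_j, xl - right[best_j][0]))
    return matches
```

Only candidates that pass both cheap geometric checks reach the descriptor comparison, which is where the reduction in computation described above comes from.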

That is, the spatial image feature point matcher 60 places the feature points of the left image obtained from temporal matching and the feature points extracted within the search region of the right image on the same epipolar line, and performs matching by considering only the disparity between feature points lying on that line. The horizontal distance (disparity) estimated as a result of this matching can be expressed as Equation 3.

In Equation 3, assuming that stereo matching between feature points proceeds from left to right and from top to bottom of the image, the x-coordinates of the j-th matched feature points in the i-th row of the current-frame left and right images are used, and the two estimated terms represent the disparity estimated in the i-th row and the disparity estimated in the ROI, respectively.

In addition, using the disparity estimated by feature point matching between the left and right images of the current frame, the spatial image feature point matcher 60 removes, as expressed by Equation 4, the feature points that belong to the background among the feature points matched between temporal images, that is, the outlier feature points.

In Equation 4, the two disparity terms are estimated disparities, the remaining term denotes the reliability of the estimated disparity, and k is a constant.

The distance estimator 70 estimates the disparity from the finally selected feature points and uses the estimated disparity to calculate the distance to the object, as in Equation 2.
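For the final distance computation, the pinhole stereo relation Z = bα/d links disparity and distance, and a finite difference of successive distances gives a velocity that can feed the next frame's prediction. Both helpers below are illustrative, assume b in metres and α and d in pixels, and are not taken from the patent.

```python
def distance_from_disparity(b, alpha, d):
    """Pinhole stereo relation Z = b * alpha / d (b in metres, alpha and d in px)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return b * alpha / d


def update_velocity(z_prev, z_curr, dt):
    """Finite-difference range rate (m/s); negative when the object approaches."""
    return (z_curr - z_prev) / dt
```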

Hereinafter, an object tracking method using spatiotemporal feature point matching according to another embodiment of the present invention is described with reference to FIG. 4, which is a flowchart of that method.

The object tracking method according to the present invention is characterized in that feature point matching in the temporal domain and feature point matching in the spatial domain are fused. Specifically, when matching feature points between the left and right images acquired by the stereo camera, the feature points to be matched are not extracted with a generic feature point extractor; instead, the feature points obtained from temporal-domain matching are used as the targets of spatial-domain matching.

That is, feature point matching is first performed in the temporal domain on one of the left and right images, and spatial-domain feature point matching is then performed on the resulting feature points. For example, if temporal-domain matching is performed on the left image, spatial-domain matching is performed between the feature points it yields and the feature points extracted from the right image by a feature point extractor.

In this way, according to the present invention, the feature points that satisfy both temporal-domain and spatial-domain matching are finally extracted; based on them, the transformation matrix and the ROI are estimated in the temporal domain, and the disparity and the distance to the object are estimated in the spatial domain. Below, the object tracking method is described in the order in which temporal-domain matching is performed on the left image and spatial-domain matching with the right image follows. This is only an example; it will be apparent to those skilled in the art that temporal-domain matching may equally be performed on the right image, followed by spatial-domain matching with the left image.

First, a candidate region setting step (S410) for the temporal search is performed on the left image. The temporal search is the process of estimating the ROI in the current image by matching feature points between the previous and current images; step S410 sets the candidate region in which the object to be tracked is searched for in the current image.

The ROI of the object to be tracked has already been set in the previous image. To estimate the ROI in the current image that corresponds to the previous ROI, the search target must first be specified in the current image. In the present invention, the search region may be selected manually by a user or specified automatically by an algorithm such as AdaBoost or histogram of oriented gradients (HOG).

Next, a feature point extraction step (S420) is performed in the ROI of the previous image and the search region of the current image. Feature extraction algorithms such as SURF, SIFT, or FAST may be used, and each extracted feature point is described by a descriptor.

Next, an ROI estimation step (S430) based on feature point matching is performed. Feature points are extracted in step S420 from the predefined ROI of the previous image and from the search region defined in the current image, and the ROI in the current image is estimated by matching the extracted feature points. As in Equation 1, the transformation matrix that minimizes the distance between the matched feature points is estimated, and an algorithm such as least median of squares (LMS) or random sample consensus (RANSAC) is used to effectively remove outlier feature points. The estimated transformation matrix is used to estimate the ROI in the left image of the current frame.
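The descriptor matching in step S430 selects the pairs whose descriptors are closest. A minimal brute-force sketch, assuming binary descriptors stored as integers and compared by Hamming distance (as for BRIEF/ORB-style descriptors; SURF and SIFT descriptors would instead be compared with Euclidean distance), keeps only mutual nearest neighbours as a simple cross-check. The function names and the distance cutoff are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_prev, desc_curr, max_dist=40):
    """Brute-force nearest-neighbour matching with a cross-check: (i, j) is
    kept only if j is i's nearest neighbour in the current frame and i is
    j's nearest neighbour in the previous frame.
    Returns a list of (prev_index, curr_index, distance)."""
    if not desc_prev or not desc_curr:
        return []

    def nearest(d, pool):
        dists = [hamming(d, p) for p in pool]
        j = min(range(len(pool)), key=dists.__getitem__)
        return j, dists[j]

    matches = []
    for i, d in enumerate(desc_prev):
        j, dist = nearest(d, desc_curr)
        if dist <= max_dist and nearest(desc_curr[j], desc_prev)[0] == i:
            matches.append((i, j, dist))
    return matches
```

The cross-check discards one-sided matches before the robust estimator sees them, which reduces the outlier ratio that LMS or RANSAC must then cope with.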

Once the temporal feature point matching of steps S410 to S430 is complete, it yields the transformation matrix, the ROI in the current frame image, and the feature points matched in the time domain. The subsequent steps perform feature point matching in the spatial domain based on these results.

First, a search region for feature point matching in the spatial domain is set in the right image of the current frame (S440). Specifically, the search region is set with reference to the left-image ROI estimated in step S430. To find the search region most similar to the left-image ROI, the region is set within the valid disparity range [d_{t,min}, d_{t,max}], taking into account the distance to the object in the current frame as predicted from the previous frame image according to Equation 2.
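Equation 2 is not reproduced in this text. Based only on the quantities listed in the claims (baseline, focal length relative to pixel size, estimated disparity, confidence of the estimated distance, velocity from the previous frame, and time between frames), one plausible reading is sketched below, and every formula in it is an assumption: depth from the previous disparity as Z = f·B/d, a constant-velocity prediction Z_t = Z_{t-1} + v·Δt, and a disparity range obtained by widening the predicted depth by the confidence margin. The sketch also assumes a rectified pair in which a left-image column x appears at column x - d in the right image. Both function names are hypothetical.

```python
def disparity_search_range(d_prev, baseline, focal_px, velocity, dt, margin):
    """Hypothetical reading of Equation 2; returns (z_pred, d_min, d_max).

    Assumes focal_px is the focal length in pixels, velocity > 0 means the
    object is moving away, and margin is the confidence bound on distance.
    """
    fb = focal_px * baseline
    z_prev = fb / d_prev                # depth implied by the previous disparity
    z_pred = z_prev + velocity * dt     # constant-velocity prediction
    d_min = fb / (z_pred + margin)      # farthest plausible depth -> smallest disparity
    near = z_pred - margin
    d_max = fb / near if near > 0 else float("inf")  # nearest depth -> largest disparity
    return z_pred, d_min, d_max


def right_search_region(roi, d_min, d_max):
    """Columns in the right image that can hold the left (x, y, w, h) ROI."""
    x, y, w, h = roi
    # A left-image column x maps to x - d in the right image, for d in [d_min, d_max].
    x_start = max(0.0, x - d_max)
    x_end = x + w - d_min
    return (x_start, y, x_end - x_start, h)
```

For example, with f·B = 400 (800 px, 0.5 m), a previous disparity of 40 px, a velocity of -2 m/s over 0.5 s, and a 1 m margin, the predicted depth is 9 m and the disparity range is [40, 50] px.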

Next, feature points are extracted within the spatial search region set in the right image (S450).

Next, stereo matching (S460) is performed between the left-image feature points obtained from the temporal feature point matching in step S430 and the feature points extracted in step S450 within the right-image search region. In the present invention, stereo matching places the feature points of the left and right images on the same epipolar line, considers only the horizontal distance between feature points lying on that line, and estimates the horizontal distance (disparity) that yields the minimum value, as in Equation 3.
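Equation 3 is not reproduced here, so the matching cost below is an assumption: the sum of squared differences (SSD) between descriptors. Otherwise the sketch follows the constraints stated in the text: candidates must lie on the same epipolar line (the same row, within a small tolerance, in a rectified pair), only the horizontal offset between points is considered, that offset must fall inside the disparity range set in step S440, and the lowest-cost candidate gives the disparity. Feature points are assumed to be `(x, y, descriptor)` tuples, and `match_on_epipolar` is a hypothetical name.

```python
def match_on_epipolar(left_pts, right_pts, d_min, d_max, row_tol=1.0):
    """Hypothetical helper. Returns (left_index, disparity) for each matched left point."""
    matches = []
    for i, (xl, yl, dl) in enumerate(left_pts):
        best = None  # (cost, disparity) of the best candidate so far
        for xr, yr, dr in right_pts:
            d = xl - xr  # only the horizontal offset along the epipolar line
            if abs(yl - yr) > row_tol or not (d_min <= d <= d_max):
                continue
            cost = sum((a - b) ** 2 for a, b in zip(dl, dr))  # SSD between descriptors
            if best is None or cost < best[0]:
                best = (cost, d)
        if best is not None:
            matches.append((i, best[1]))
    return matches
```

Left points with no candidate on their row inside the disparity range are simply left unmatched rather than forced onto a distant point.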

Using the disparity estimated by stereo matching, feature points belonging to the background are removed from the feature points extracted in step S430 (S470). The feature points that survive this spatiotemporal feature point matching form the final selection, and the distance to the object is then calculated from the estimated disparity as in Equation 2 (S480).
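The text does not state the rule used to decide which feature points belong to the background. A simple assumption, sketched below, treats the median disparity of the matched points as the object's disparity, discards points whose disparity deviates from it by more than a tolerance, and converts the disparity of the remaining points to distance with Z = f·B/d. Both helpers and the tolerance value are hypothetical.

```python
from statistics import median


def remove_background(disparities, tol=3.0):
    """Hypothetical rule: keep indices whose disparity is near the median disparity."""
    ref = median(disparities)
    return [i for i, d in enumerate(disparities) if abs(d - ref) <= tol]


def object_distance(disparities, keep, baseline, focal_px):
    """Distance to the object from the median disparity of the kept points."""
    d = median([disparities[i] for i in keep])
    return focal_px * baseline / d  # Z = f * B / d
```

The median resists a minority of background points. If background points outnumber object points inside the ROI, this rule would lock onto the background instead, so a real system would need a stronger cue, such as the disparity range predicted in step S440.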

The object tracking method according to the present invention described above may be embodied as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all kinds of recording media that store data readable by a computer system, such as read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected by a computer network, so that the code is stored and executed in a distributed manner.

It will be understood by those skilled in the art that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. It is therefore to be understood that the above-described embodiments are illustrative in all aspects and not restrictive. The scope of the present invention is defined by the appended claims rather than the detailed description, and all changes or modifications derived from the scope of the claims and their equivalents should be construed as being included within the scope of the present invention.

Claims

An object tracking device using spatiotemporal feature point matching, comprising:
a candidate region setting unit configured to set a region (hereinafter, a first search region) in which an object to be tracked is detected in one of the left and right images input from a stereo camera (hereinafter, a first image);
a temporal image feature point extracting unit configured to extract feature points in a region of interest (ROI) of a previous frame and in the first search region;
an ROI estimator configured to estimate an ROI in the first image by matching feature points between temporal images;
a spatial image feature point extractor configured to set a second search region in the other of the left and right images (hereinafter, a second image) based on the ROI of the first image, and to extract feature points within the second search region; and
a spatial image feature point matching unit configured to extract, by matching feature points between spatial images, those feature points extracted in the ROI of the first image that match feature points extracted from the second search region.

The object tracking device of claim 1, wherein the spatial image feature point extractor sets the second search region within a valid disparity range [d_{t,min}, d_{t,max}] in consideration of the distance to the object in the current frame predicted with reference to the previous frame image,

where the symbols of the prediction equation denote, respectively, the baseline of the stereo camera, the focal length expressed relative to the pixel size, the estimated disparity, the confidence of the estimated distance, the estimated velocity from the previous frame, and the time between frames.

The object tracking device of claim 1, wherein the spatial image feature point matching unit performs the matching, as expressed below, by considering only the horizontal (disparity) distance between the feature points extracted in the ROI of the first image and the feature points extracted from the second search region that lie on the same epipolar line,

assuming that stereo matching between feature points proceeds from left to right and from top to bottom of the image.

An object tracking method using spatiotemporal feature point matching, comprising:
performing feature point matching in a time domain on one of the left and right images input from a stereo camera (hereinafter, a first image), with reference to a region of interest (ROI) defined in a previous frame; and
performing feature point matching in a spatial domain with the other of the left and right images (hereinafter, a second image), based on the feature points extracted as a result of the feature point matching in the time domain,
wherein performing feature point matching in the spatial domain includes setting a search region in the second image based on the ROI of the first image extracted as a result of the feature point matching in the time domain, and
wherein setting the search region in the second image includes setting the second search region within a valid disparity range [d_{t,min}, d_{t,max}] in consideration of the distance to the object in the current frame predicted with reference to the previous frame image,

where the symbols of the prediction equation denote, respectively, the baseline of the stereo camera, the focal length expressed relative to the pixel size, the estimated disparity, the confidence of the estimated distance, the estimated velocity from the previous frame, and the time between frames.

(Deleted)

A computer-readable recording medium having recorded thereon a program for executing the object tracking method according to claim 4 on a computer.