KR100574227B1

KR100574227B1 - Apparatus and method for separating object motion from camera motion

Info

Publication number: KR100574227B1
Application number: KR1020030093206A
Authority: KR
Inventors: 권형진; 허남호; 안충현; 이수인
Original assignee: 한국전자통신연구원
Priority date: 2003-12-18
Filing date: 2003-12-18
Publication date: 2006-04-26
Also published as: KR20050061115A

Abstract

1. Technical field to which the claimed invention belongs

The present invention relates to an apparatus and method for extracting object motion with camera motion compensated.

2. Technical problem to be solved by the invention

The present invention seeks to provide an object motion extraction apparatus and method that, in a dynamic environment where the camera and the objects each move independently, can separate only the pure motion (three-dimensional motion) of each object, with the camera motion compensated.

3. Summary of the solution

The present invention is an apparatus for extracting image object motion in a dynamic environment in which the camera and the objects each move independently, comprising: motion segmentation means for separating object and background regions using image motion information in at least two image frames, wherein an epipolar constraint is used to separate the background region, whose image motion is due to the camera alone, from the object region, whose image motion is due to both the camera and the object; camera calibration means for obtaining, for each image frame, the internal camera parameters (camera coordinate system) from the background region; camera motion estimation means for obtaining the 'three-dimensional motion of the camera' between the camera coordinate systems of the image frames; camera/object coordinate system conversion means for obtaining, for each image frame, the transformation between the camera coordinate system and each object coordinate system; and object motion estimation means for obtaining the 'pure three-dimensional motion of the object' by using the camera/object coordinate transformations and the inter-frame camera coordinate transformation to remove the 'three-dimensional motion of the camera' from the object region, which contains both object and camera motion in each image frame.

4. Important uses of the invention

The present invention is used in, among other things, apparatus for separating object motion from camera motion.

Dynamic environment, object motion separation, calibration, coordinate system transformation, object motion estimation

Description

Apparatus and method for separating object motion from camera motion

FIG. 1 is a block diagram of an embodiment of an object motion extraction apparatus with camera motion compensated, according to the present invention.

FIG. 2 is a flowchart of an embodiment of an object motion extraction method with camera motion compensated, according to the present invention.

FIG. 3 is an explanatory diagram of an embodiment showing the coordinate system transformation relationships between an object and the camera in two images of a dynamic environment, as used in the present invention.

* Explanation of reference numerals for the main parts of the drawings

11: motion segmentation unit; 12: camera calibration unit

13: camera motion estimation unit; 14: coordinate system conversion unit

15: object motion estimation unit

The present invention relates to the field of object motion estimation, and more particularly to an object motion extraction apparatus and method that, in a dynamic environment where the camera and the objects each move independently, can separate only the pure motion (three-dimensional motion) of each object, with the camera motion compensated.

Reconstructing three-dimensional space from images is a major problem in computer vision, and many methods for it are under active study. However, these methods are usually restricted to environments in which either the camera or the object is fixed. In environments where both move, they only offer ways of tracking objects; no method has yet been proposed for separating the motion of the object alone from the tracking result.

Reconstructing a three-dimensional environment by moving a camera through a fixed scene has long been studied under the name Structure from Motion (SFM). Two approaches are representative: reconstructing the 3D structure after performing camera calibration, and reconstructing the 3D structure from an uncalibrated camera without calibration information.

The first approach, reconstructing the 3D structure after camera calibration, assumes that the camera calibration parameters are known in advance. To this end, the parameters are measured beforehand with a sensor, or camera calibration is performed to find the camera position from a marker whose absolute 3D coordinates are known. Once the 3D position of the camera is known, 3D information can be extracted from images acquired at different positions in space. That is, a camera model relating coordinates in 3D space to pixel coordinates in the image is set up using the calibration parameters; the camera's intrinsic parameters and extrinsic (pose) parameters are extracted with this model, and the 3D information is then extracted.

The second approach, reconstructing the 3D structure from an uncalibrated camera without calibration information, performs 3D reconstruction from the images alone, using the information that exists between two images under projective projection. It first obtains the fundamental matrix, which represents the geometric relationship between the two images, and then obtains the 3D information. This exploits the geometric epipolar constraint of the camera model at projection. The fundamental matrix is the mathematical expression of the epipolar constraint, and it represents the rigidity constraint between two images under projective projection. However, the information obtained this way is not metric information but information defined only up to a projective transformation.
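For reference, the epipolar constraint mentioned above is conventionally written as follows. The notation is the standard textbook one and is an assumption here, since the patent text gives no formula: x and x' are homogeneous pixel coordinates of a pair of corresponding points, K and K' are the intrinsic matrices of the two views, and (R, t) is the rigid motion between the views, with X' = R X + t.

```latex
\mathbf{x}'^{\top} F \, \mathbf{x} = 0,
\qquad
F = K'^{-\top} \, [\mathbf{t}]_{\times} \, R \, K^{-1}
```

F is defined only up to scale and has rank 2, which is why it alone supports a projective rather than a metric reconstruction.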

Thus, although various methods for reconstructing a 3D environment from images already exist, they assume that either the camera or the object is fixed; reconstruction of a 3D environment in a dynamic environment where both move has not yet been attempted. Consequently, when a conventional method is applied in a dynamic environment, the camera and object motions appear mixed together, and the result differs from the actual motion of the object. A way of separating the pure motion of each object in a dynamic environment, where the camera and the objects each move independently, is therefore urgently needed.

The present invention has been proposed to meet the above need. Its object is to provide an object motion extraction apparatus and method that, in a dynamic environment where the camera and the objects each move independently, can separate only the pure motion (three-dimensional motion) of each object, with the camera motion compensated.

To achieve the above object, the present invention provides an apparatus for extracting image object motion in a dynamic environment in which the camera and the objects each move independently, characterized by comprising: motion segmentation means for separating object and background regions using image motion information in at least two image frames, wherein an epipolar constraint is used to separate the background region, whose image motion is due to the camera alone, from the object region, whose image motion is due to both the camera and the object; camera calibration means for obtaining, for each image frame, the internal camera parameters (camera coordinate system) from the background region; camera motion estimation means for obtaining the 'three-dimensional motion of the camera' between the camera coordinate systems of the image frames; camera/object coordinate system conversion means for obtaining, for each image frame, the transformation between the camera coordinate system and each object coordinate system; and object motion estimation means for obtaining the 'pure three-dimensional motion of the object' by using the camera/object coordinate transformations and the inter-frame camera coordinate transformation to remove the 'three-dimensional motion of the camera' from the object region, which contains both object and camera motion in each image frame.

The present invention also provides a method of extracting object motion in a dynamic environment in which the camera and the objects each move independently, characterized by comprising: a motion segmentation step of separating object and background regions using image motion information in at least two image frames, wherein an epipolar constraint is used to separate the background region, whose image motion is due to the camera alone, from the object region, whose image motion is due to both the camera and the object; a camera calibration step of obtaining, for each image frame, the internal camera parameters (camera coordinate system) from the background region; a camera motion estimation step of obtaining the 'three-dimensional motion of the camera' between the camera coordinate systems of the image frames; a camera/object coordinate system conversion step of obtaining, for each image frame, the transformation between the camera coordinate system and each object coordinate system; and an object motion estimation step of obtaining the 'pure three-dimensional motion of the object' by using the camera/object coordinate transformations and the inter-frame camera coordinate transformation to remove the 'three-dimensional motion of the camera' from the object region, which contains both object and camera motion in each image frame.

The method of the present invention is further characterized by comprising an object motion display step of displaying the separated object motion in an absolute coordinate system.

The present invention aims to separate camera motion and object motion from an image sequence that reflects both, in a dynamic environment where both the camera and the objects move. To isolate the motion of the object alone, the 3D motion of the camera must first be recovered from the 2D motion projected onto the image. The object motion can then be obtained by separating the camera motion from the image motion, which reflects both camera and object motion.

The present invention recovers the 3D motion of objects moving in such a dynamic environment. So that existing methods for recovering 3D motion in a static environment can still be applied, it adds a pre-processing step that segments object and background and a post-processing step that separates camera and object motion, and in this way generalizes readily to dynamic environments.

The above objects, features, and advantages will become clearer from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

To aid understanding, the technical principle of the present invention is first described as a whole.

In general, when 3D reconstruction is performed from images alone, without calibration information, using the information that exists between two images under projective projection, the information obtained is not metric but is defined only up to a projective transformation.

Metric reconstruction therefore requires scene constraints (for example, points or lines lying on a plane, or perpendicular or horizontal relations between lines and planes). If the image contains the projection of a parallelepiped, a geometric object that generalizes these scene constraints and embodies all of them, the camera intrinsic parameters can be obtained and metric reconstruction is possible. If no scene constraints can be extracted from the images, only reconstruction up to a projective transformation is possible. A parallelepiped is not strictly required for metric reconstruction, however: it can also be achieved from vanishing points, which are the intersections of the images of parallel lines, together with restrictions on the camera intrinsic parameters. Recent cameras satisfy the square-pixel and zero-skew conditions, so when these conditions are known, metric reconstruction is possible with only partial scene constraints.
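As one concrete illustration of how vanishing points combine with intrinsic-parameter restrictions, the sketch below recovers the focal length from two vanishing points of perpendicular directions. It assumes square pixels, zero skew, and a known principal point (for instance the image center). The function names and numeric values are hypothetical and are not taken from the patent.

```python
import math

def focal_from_orthogonal_vps(v1, v2, principal_point):
    """Focal length (pixels) from two vanishing points of perpendicular 3D directions.

    Assumes square pixels, zero skew and a known principal point p, in which
    case (v1 - p) . (v2 - p) + f^2 = 0.
    """
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with perpendicular directions")
    return math.sqrt(-dot)

def vanishing_point(direction, f, principal_point):
    """Image of the point at infinity along a 3D direction (helper for the synthetic check)."""
    dx, dy, dz = direction
    return (f * dx / dz + principal_point[0], f * dy / dz + principal_point[1])

# Synthetic check: directions (1, 2, 2) and (2, 1, -2) are perpendicular.
p = (320.0, 240.0)
v1 = vanishing_point((1.0, 2.0, 2.0), 800.0, p)
v2 = vanishing_point((2.0, 1.0, -2.0), 800.0, p)
f_est = focal_from_orthogonal_vps(v1, v2, p)   # recovers the 800-pixel focal length
```

With the principal point also unknown, more vanishing points or further constraints would be needed; this sketch covers only the simplest case.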

Accordingly, to satisfy complete scene constraints, the present invention obtains the camera intrinsic parameters from the image projection of a marker whose 3D structure is known in advance, such as a parallelepiped. Alternatively, it uses partial scene constraints together with camera intrinsic parameter information known in advance to obtain the remaining intrinsic parameters (focal length, principal point position). In either case it then performs metric reconstruction. Once the camera intrinsic parameters have been obtained, and if the coordinate system of a moving object is known, the transformation between the object and camera coordinate systems can be obtained. The present invention therefore obtains the transformation between each object and the camera coordinate system using, for each object, either feature points on it whose 3D metric structure is known, or at least four points lying on a 2D metric structure (that is, points that can be parameterized as points on a single plane).

However, when objects in the environment also move (that is, in a dynamic environment), 3D reconstruction cannot be performed with the above method, because triangulation, which determines a 3D position from two views, no longer holds. If the epipolar constraint obtained from the background region were applied as is, the object and camera motions would be reflected as one. In other words, the epipolar constraint takes a different form for the background region, which is related only to the camera motion, and for the object region, which is related to both the camera motion and the object motion.
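The difference in epipolar behavior between background and object points can be sketched as a simple classifier. The example assumes the camera motion (R, t) is already known and that image points are in normalized coordinates (intrinsics removed). The threshold, the function names, and the synthetic scene are illustrative assumptions; the patent does not specify the segmentation procedure at this level of detail.

```python
import math

def essential_from_motion(R, t):
    """E = [t]x R for a camera motion X2 = R X1 + t (normalized image coordinates)."""
    tx = [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]
    return [[sum(tx[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def epipolar_distance(E, x1, x2):
    """Distance from x2 to the epipolar line E x1; points are (x, y) pairs."""
    h1 = (x1[0], x1[1], 1.0)
    line = [sum(E[i][j] * h1[j] for j in range(3)) for i in range(3)]
    return abs(line[0] * x2[0] + line[1] * x2[1] + line[2]) / math.hypot(line[0], line[1])

def segment(E, matches, threshold=1e-3):
    """Label each correspondence by whether it obeys the camera-only epipolar geometry."""
    return ["background" if epipolar_distance(E, a, b) < threshold else "object"
            for a, b in matches]

# Synthetic scene: one static point and one point that moves between the frames.
def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def to_second_camera(R, t, X):
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]

def project(X):
    return (X[0] / X[2], X[1] / X[2])

R, t = rot_y(0.1), [0.5, 0.0, 0.0]
E = essential_from_motion(R, t)
static = [0.3, -0.2, 4.0]
moving_before, moving_after = [-0.5, 0.4, 5.0], [-0.5, 0.9, 5.0]
matches = [
    (project(static), project(to_second_camera(R, t, static))),
    (project(moving_before), project(to_second_camera(R, t, moving_after))),
]
labels = segment(E, matches)
```

A point that happens to move along its own epipolar line still satisfies the constraint, so a test of this kind alone cannot detect every moving object.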

Accordingly, the present invention divides the separation of object and camera motion into two broad stages: obtaining the camera motion (202 and 203 in FIG. 2), and using that camera motion to separate the object motion (204 and 205 in FIG. 2). This requires a step (201) that segments the background, whose image motion is due to the camera alone, from the object regions, whose image motion is due to both the camera and the object.

Given such a segmentation, metric reconstruction of the segmented background region is possible with existing SFM methods, and the camera motion is found in the course of the reconstruction (203). For an object, the rigid transformation between the object and camera coordinate systems can be obtained from at least four points on a plane together with the camera intrinsic parameters obtained from the background (204), so the object motion can also be obtained using the coordinate system transformations of the present invention (205). The best case is that a marker of known 3D structure, such as a parallelepiped, is present, from which both the camera calibration and the object coordinate system are derived. The marker conditions can be relaxed, however: self-calibration, which finds the camera intrinsic parameters from a background in which the parallel and perpendicular relations among points and lines are known in advance, is possible, and on the object it suffices, with the present invention, to know the correspondences of at least four coplanar points.
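Step 204, recovering the object-to-camera rigid transformation from four coplanar points and known intrinsics, can be sketched with a plane homography. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses exactly four noise-free points, zero skew, an object plane at Z = 0, and an object in front of the camera. All function names and numeric values are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def pose_from_four_planar_points(obj_xy, img_uv, fx, fy, cx, cy):
    """Rigid transform (R, t) taking object-plane coordinates (X, Y, 0) to the camera frame."""
    A, b = [], []
    for (X, Y), (u, v) in zip(obj_xy, img_uv):
        x, y = (u - cx) / fx, (v - cy) / fy          # remove the intrinsics
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y]); b.append(x)
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y]); b.append(y)
    h = solve_linear(A, b) + [1.0]                   # homography with h33 fixed to 1
    cols = [[h[j], h[3 + j], h[6 + j]] for j in range(3)]   # columns ~ r1, r2, t
    norm = lambda v: math.sqrt(sum(a * a for a in v))
    scale = 2.0 / (norm(cols[0]) + norm(cols[1]))    # rotation columns have unit length
    r1, r2, t = ([scale * a for a in col] for col in cols)
    r3 = [r1[1] * r2[2] - r1[2] * r2[1],
          r1[2] * r2[0] - r1[0] * r2[2],
          r1[0] * r2[1] - r1[1] * r2[0]]             # r3 = r1 x r2
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t

# Synthetic check with four asymmetric coplanar marker points.
fx = fy = 800.0; cx, cy = 320.0, 240.0
c, s = math.cos(0.3), math.sin(0.3)
R_true = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t_true = [0.1, -0.2, 3.0]
marker = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.8), (0.0, 1.0)]
pixels = []
for X, Y in marker:
    P = [R_true[i][0] * X + R_true[i][1] * Y + t_true[i] for i in range(3)]
    pixels.append((fx * P[0] / P[2] + cx, fy * P[1] / P[2] + cy))
R_est, t_est = pose_from_four_planar_points(marker, pixels, fx, fy, cx, cy)
```

With noisy or more than four points, a least-squares homography followed by re-orthogonalization of the rotation would be the usual refinement.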

On the other hand, when neither the objects nor the background carry a marker that can define such a coordinate system or features that provide partial scene constraints, Euclidean motion cannot be obtained, and only motion up to a projective transformation can be found. The present invention therefore addresses two cases: the case giving the most information, in which markers that can define a coordinate system are present on each object and in the background, and the relaxed case in which constraints permitting self-calibration are present. In these environments it finds the camera intrinsic parameters and motion from the constraints, and finds the object motion through the coordinate system transformation between camera and object. Even when objects and background cannot be distinguished from the images alone, the user can manually designate minimal local corresponding regions; the best-matching corresponding points or lines within those regions are then found and used to obtain the epipolar constraint or the scene constraints. Manual designation also lets the user interactively correct correspondences found by the algorithm that differ from the true ones, so that accurate constraints can easily be obtained when the automatic algorithm fails.

Finally, for images in which feature points satisfying scene constraints are absent or insufficient, a camera constraint is imposed in place of a scene constraint so that object motion can still be separated. Many camera constraints are conceivable, as with scene constraints, but the present invention uses a fixed pair of stereo cameras. 'Fixed' here means that, as the cameras move, the relative orientation of the two cameras does not change and the camera intrinsic parameters do not change. In this case, the motion of the stereo pair determines the plane at infinity, which is equivalent to supplying three vanishing points for a single camera, and affine reconstruction becomes possible with existing SFM. Metric reconstruction can then be achieved by self-calibration using the constraints on the image of the absolute conic. This ultimately yields the relative orientation and the intrinsic parameters of the two cameras. Next, for every pixel classified as background, the corresponding point is found by stereo matching and the 3D coordinates of the pixel are obtained by triangulation. Once 3D coordinates are available, the 3D coordinate change of corresponding points on an object between two frames can be expressed as a single rigid transformation, from which the object motion is readily obtained.
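The triangulation step for a calibrated, fixed stereo pair can be sketched as follows. The midpoint method is used here as one simple choice; the patent does not name a particular triangulation method. Image points are assumed to be in normalized coordinates, and the rig geometry (R, t), with X_right = R X_left + t, is assumed known from the self-calibration described above. All names and values are illustrative.

```python
def triangulate_midpoint(x_left, x_right, R, t):
    """3D point in the left-camera frame from one stereo match (midpoint method).

    x_left, x_right: normalized image points (x, y), intrinsics already removed.
    R, t: fixed rig geometry, with right-camera coordinates X_r = R X_l + t.
    """
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    d1 = [x_left[0], x_left[1], 1.0]                       # left viewing ray
    xr = [x_right[0], x_right[1], 1.0]
    d2 = [sum(R[k][i] * xr[k] for k in range(3)) for i in range(3)]   # R^T xr
    c = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]    # right center, -R^T t
    # Least squares for ray parameters a, b such that a*d1 is closest to c + b*d2.
    a11, a12, a22 = dot(d1, d1), -dot(d1, d2), dot(d2, d2)
    b1, b2 = dot(d1, c), -dot(d2, c)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("viewing rays are parallel; depth is undetermined")
    a = (b1 * a22 - a12 * b2) / det
    b = (a11 * b2 - a12 * b1) / det
    # Midpoint of the two closest points on the rays.
    return [0.5 * (a * d1[i] + c[i] + b * d2[i]) for i in range(3)]

# Synthetic check: parallel cameras with a 0.1-unit baseline.
R_rig = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_rig = [-0.1, 0.0, 0.0]
X_true = [0.2, -0.1, 3.0]
X_right = [X_true[i] + t_rig[i] for i in range(3)]
X_est = triangulate_midpoint((X_true[0] / X_true[2], X_true[1] / X_true[2]),
                             (X_right[0] / X_right[2], X_right[1] / X_right[2]),
                             R_rig, t_rig)
```

Applying this to matched points on an object in two consecutive stereo frames yields two 3D point sets, whose relating rigid transformation gives the combined motion to be separated.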

FIG. 1 is a block diagram of an embodiment of the object motion extraction apparatus with camera motion compensated according to the present invention, and shows the structure for separating camera and object motion.

FIG. 3 shows the coordinate system transformation relationships between an object and the camera in two images of a dynamic environment. The object motion is represented as the transformation between the two object coordinate systems, and this transformation can be obtained by composing the remaining three transformations once they have been found. In FIG. 3, C₁ is the position of the camera center (origin of the camera coordinate system) for the first image, C₂ is the position of the camera center (origin of the camera coordinate system) for the second image, O₁ is the position of the origin of the object coordinate system when the first image is acquired, and O₂ is the position of the origin of the object coordinate system when the second image is acquired.
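The composition described for FIG. 3 can be sketched with 4x4 homogeneous matrices. The conventions below are assumptions, because the patent's own transformation symbols are not recoverable from this text: T_cam maps camera-1 coordinates to camera-2 coordinates for static points, T_obj1 and T_obj2 map object coordinates to camera coordinates at the first and second image, and the resulting object motion is expressed in the first camera frame, taken as the reference frame.

```python
import math

def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rigid(R, t):
    """4x4 homogeneous matrix from a 3x3 rotation and a translation."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return rigid(Rt, t_inv)

def object_motion(T_cam, T_obj1, T_obj2):
    """Pure object motion in the camera-1 frame: inv(T_cam) * T_obj2 * inv(T_obj1)."""
    return matmul4(matmul4(rigid_inverse(T_cam), T_obj2), rigid_inverse(T_obj1))

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Synthetic check: build the second object-to-camera transform from a known motion.
M_true = rigid(rot_z(0.4), [0.3, 0.0, 0.1])        # object motion to be recovered
T_cam = rigid(rot_z(-0.2), [0.1, 0.05, 0.0])       # camera motion between the frames
T_obj1 = rigid(rot_z(0.7), [0.0, 0.2, 4.0])        # object pose seen in the first image
T_obj2 = matmul4(matmul4(T_cam, M_true), T_obj1)   # what the second image would observe
M_est = object_motion(T_cam, T_obj1, T_obj2)

# A static object yields the identity once the camera motion is removed.
M_static = object_motion(T_cam, T_obj1, matmul4(T_cam, T_obj1))
```

Choosing the first camera frame as the reference mirrors the case where no calibration target fixes an absolute coordinate system.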

In the present invention, camera motion and object motion are separated from an image sequence reflecting both, in a dynamic environment where both the camera and the objects move.

To this end, the camera motion information is first extracted from the stationary background, and the extracted camera motion is then used to isolate the motion of the object alone. Existing methods can recover the camera motion and the 3D structure of the environment when the environment is stationary. When an object carries a marker that can define a coordinate system, the rigid transformation between object and camera can also be obtained with existing methods. With existing methods, however, obtaining the object motion requires finding the motion of the object coordinate system relative to the stationary background. This means either obtaining the motion of the object coordinate system directly from the stationary absolute coordinate system in every frame, or obtaining the motion of the camera coordinate system relative to the absolute coordinate system and the transformation of the object coordinate system relative to the camera coordinate system, and composing them to obtain indirectly, in every frame, the transformation from the absolute coordinate system to the object coordinate system. For this to work, the marker defining the absolute coordinate system must be visible in every frame, which requires restricting the camera motion so that the marker is always in view. Even with a fixed camera, moreover, the object motion cannot be obtained if the object occludes the marker.

In the present invention, by contrast, the object motion can be obtained even when the transformation of the object coordinate system from the stationary absolute coordinate system is unknown, whether because the camera is focused on the object whose motion is sought and the background marker is out of view, or because there is no marker at all. The camera motion and the object-to-camera transformations are obtained and composed to give the object motion. This means that, without a calibration target defining the absolute coordinate system, the absolute coordinate system can be assigned arbitrarily, for example to the camera position of the first image, and the object motion can still be obtained. If a calibration target or scene constraints are present, even if not always visible, they can be used as well to estimate the camera motion more accurately. The only constraint to satisfy is that the object carries a marker that defines a unique object coordinate system without ambiguity. The simplest such marker presented in the present invention is four points on a plane with no mirror symmetry. Furthermore, even if the surface of the object is not flat but curved, its motion can be obtained by designating as the marker four or more points on a virtual planar cross-section. When an object lacks even such a marker, a constraint is placed on the camera so that the object motion can be obtained.

This improves on the prior art, which could obtain only the motion relative to the camera frame for an object moving independently of the camera. The motion of the object alone can be separated and displayed on screen, which makes three-dimensional reconstruction possible in dynamic environments as well as static ones.

Referring to FIGS. 1 and 3, the apparatus for extracting video object motion with camera-motion compensation according to the invention comprises: a motion segmentation unit (11) that separates the object and background regions using the image motion information in at least two image frames (I₁, I₂); a camera calibration unit (12) that obtains the camera intrinsic parameters (camera coordinate system) (K₁, K₂) from the background region for each image frame; a camera motion estimation unit (13) that obtains the camera motion between the camera coordinate systems of the image frames (that is, the rigid transformation representing the camera motion from I₁ to I₂); a coordinate system transformation unit (14) that obtains, for each image frame, the transformation between the camera coordinate system and each object coordinate system (that is, the camera-to-object rigid transformation in each image); and an object motion estimation unit (15) that obtains the object's motion (that is, the rigid transformation representing the object motion from I₁ to I₂) from the camera-to-object coordinate transformations in each image and the camera coordinate transformation between the image frames.

Here, the object motion separated by the object motion estimation unit (15) can be expressed in the absolute coordinate system. Thus, in a dynamic environment where the camera and the objects each move differently, the invention separates the pure motion of each object and can display the separated object motion alone in the absolute coordinate system.

When classifying the regions of the fixed background and the independently moving objects, if the background and the objects carry markers that distinguish them, the motion segmentation unit (11) computes the epipole from the epipolar constraint of the background markers and identifies the outlier pixels that do not converge to that epipole, thereby separating the background from the rest. For the remaining region, it repeats the same procedure with each object's markers to classify the region of that object.

When the background has no marker, the user manually designates corresponding regions in the two frames so that the epipolar constraint can be obtained. The motion segmentation unit (11) finds, within those regions, the correspondences with the best local matching (feature point matching) and classifies the pixels that converge to the epipole computed from these feature points as background. The same procedure as the background classification is then applied to the outlier pixels that are not background in order to classify the region of each object.
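The background/object split described above can be sketched as a residual test against the epipolar constraint. This is a minimal illustration, not the patent's procedure: the function names, the threshold, and the choice of residual (distance of a point from its epipolar line) are assumptions made here, and the fundamental matrix `F` is taken as already estimated from background correspondences.

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance (in pixels) of p2 from the epipolar line F @ p1 in image 2."""
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    line = [sum(F[i][k] * x1[k] for k in range(3)) for i in range(3)]
    numerator = abs(sum(x2[i] * line[i] for i in range(3)))
    # Assumes a non-degenerate line (line[0], line[1] not both zero).
    return numerator / math.hypot(line[0], line[1])

def split_background(F, matches, threshold=1.0):
    """Matches consistent with F are background; the rest are outliers (objects)."""
    background, moving = [], []
    for p1, p2 in matches:
        target = background if epipolar_distance(F, p1, p2) < threshold else moving
        target.append((p1, p2))
    return background, moving

# Hypothetical example: camera translating along x with identity intrinsics,
# so F = [t]_x with t = (1, 0, 0) and static points keep their row coordinate.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
matches = [((10.0, 20.0), (14.0, 20.0)),   # static point
           ((30.0, 40.0), (33.0, 40.2)),   # static point with slight noise
           ((50.0, 60.0), (52.0, 75.0))]   # point on an independently moving object
background, moving = split_background(F, matches)
```

Repeating the same test on the `moving` set with a fundamental matrix fitted to one object's correspondences would peel off that object's region, mirroring the repeated classification described in the text.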

To extract the camera intrinsic parameters, the camera calibration unit (12) obtains the three-dimensional camera coordinate system relative to the absolute coordinate system together with the camera intrinsic parameters (camera calibration). When no information about the three-dimensional absolute coordinate system is available, it obtains only the intrinsic parameters, using scene constraints or feature point correspondences across two or more images (self-calibration).

To compute the camera motion between the camera coordinate systems of the image frames, the camera motion estimation unit (13) obtains the camera's rotation and translation from the camera extrinsic parameters when information about the three-dimensional absolute coordinate system is available. When it is not, the unit obtains the rotation and translation from the intrinsic parameters computed by the camera calibration unit (12) and the epipolar constraint of the background region.

The coordinate system transformation unit (14) computes the pose of the camera coordinate system relative to each object coordinate system from a minimal number of points. When there is no three-dimensional marker from which the absolute coordinate system can be determined, the camera intrinsic parameters are obtained, using the result of the camera calibration unit (12), from three or more image frames and only the pixels inside the fixed background region (self-calibration). The camera-to-object transformation is then obtained from the self-calibrated camera parameters and an object coordinate system defined by at least six corresponding points on the object (four if the points are coplanar).
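For the coplanar four-point case, one common way to obtain the camera-to-object transformation is to estimate the plane-to-image homography and factor out the intrinsics. The sketch below is an illustration under assumptions, not the patent's stated algorithm: all function names are hypothetical, the intrinsics are taken as zero-skew with a single focal length `f` and principal point `(cx, cy)`, the four correspondences are assumed known and noise-free, and the marker plane is taken as Z = 0 of the object coordinate system.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def homography_from_points(plane_pts, image_pts):
    """Homography (with H[2][2] fixed to 1) from exactly four plane/image pairs."""
    A, b = [], []
    for (X, Y), (u, v) in zip(plane_pts, image_pts):
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -u * X, -u * Y]); b.append(u)
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -v * X, -v * Y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def pose_from_homography(H, f, cx, cy):
    """Rotation R and translation t mapping object (plane) coordinates to camera coordinates."""
    # K^-1 @ H for K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
    M = [[(H[0][j] - cx * H[2][j]) / f for j in range(3)],
         [(H[1][j] - cy * H[2][j]) / f for j in range(3)],
         [H[2][j] for j in range(3)]]
    m1, m2, m3 = ([M[i][j] for i in range(3)] for j in range(3))
    scale = 1.0 / math.sqrt(sum(c * c for c in m1))
    if m3[2] < 0:            # keep the object in front of the camera
        scale = -scale
    r1 = [scale * c for c in m1]
    r2 = [scale * c for c in m2]
    t = [scale * c for c in m3]
    r3 = cross(r1, r2)
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t

# Synthetic check with a hypothetical asymmetric four-point marker.
f, cx, cy = 800.0, 320.0, 240.0
c, s = math.cos(0.3), math.sin(0.3)
R_true = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t_true = [0.1, -0.2, 4.0]
marker = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.5)]

def project(X, Y):
    p = [R_true[i][0] * X + R_true[i][1] * Y + t_true[i] for i in range(3)]
    return (f * p[0] / p[2] + cx, f * p[1] / p[2] + cy)

image = [project(X, Y) for X, Y in marker]
H = homography_from_points(marker, image)
R, t = pose_from_homography(H, f, cx, cy)
```

With noisy measurements the recovered `R` is only approximately orthonormal and would normally be re-orthonormalized and refined; that step is omitted here.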

The object motion estimation unit (15) computes the three-dimensional motion of the object by composing the camera motion with the coordinate system transformations. That is, the object's motion is obtained from the combined transformation of the inter-frame camera coordinate transformation computed by the camera motion estimation unit (13) and the camera-to-object coordinate transformations computed by the coordinate system transformation unit (14).
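The composition can be sketched with 4x4 homogeneous matrices. The conventions below are assumptions chosen for illustration, since the patent's own symbols are not available in this text: `T_c12` maps camera-1 coordinates of a static point to camera-2 coordinates, `T_co1` and `T_co2` map object coordinates to camera coordinates in frames 1 and 2, and the first camera's frame serves as the absolute coordinate system. All function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rigid(R, t):
    """4x4 homogeneous matrix from a 3x3 rotation (list of rows) and a translation."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(T):
    """Inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return rigid(Rt, t_inv)

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def object_motion(T_c12, T_co1, T_co2):
    """Object motion in the frame of camera 1: inv(T_c12) @ T_co2 @ inv(T_co1)."""
    return matmul(matmul(rigid_inverse(T_c12), T_co2), rigid_inverse(T_co1))

# Synthetic check: the camera moves, and the object moves independently by M_true.
T_c12 = rigid(rot_z(0.3), [0.5, -0.2, 0.1])
T_co1 = rigid(rot_z(-0.4), [0.0, 0.0, 5.0])
M_true = rigid(rot_z(0.2), [1.0, 0.0, 0.0])
T_co2 = matmul(matmul(T_c12, M_true), T_co1)   # what frame 2 would observe
M = object_motion(T_c12, T_co1, T_co2)

# A static object yields the identity even though the camera moved.
M_static = object_motion(T_c12, T_co1, matmul(T_c12, T_co1))
```

Because only the two camera-to-object transformations and the camera motion enter the formula, no marker fixed to the background has to stay visible.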

The operation of the apparatus configured as above is described below with reference to FIG. 2.

First, the motion segmentation unit (11) separates the object and background regions (201).

If markers known in advance are present, the regions are classified by the pairs of corresponding marker points in the two images that satisfy the epipolar constraint.

If no markers are present, a minimal set of corresponding point pairs satisfying the epipolar constraint is designated manually or automatically, and each is labeled as background or object.

In addition, a more accurate motion boundary is obtained by refining the result with other features (for example, contour and color).

The segmented background and object regions then go through the following steps.

First, the camera calibration unit (12) obtains the camera intrinsic parameters from the background (202).

If a marker is given, calibration against it yields the camera intrinsic and extrinsic parameters. The camera motion is obtained directly from the extrinsic parameters, and the intrinsic parameters are used for the object's pose estimation (the rigid transformation between the object and the camera).

If no marker is given, however, self-calibration must be performed automatically from the background region. This uses the existing structure-from-motion (SFM) approach for uncalibrated cameras, in which constraints are needed to upgrade a projective reconstruction to a metric one. The available constraints are scene constraints such as the Euclidean-space information described earlier (perpendicularity, parallelism, and so on), constraints on the camera extrinsic parameters such as camera motion information, and constraints on the camera intrinsic parameters. Typically the camera CCD is assumed to have square pixels, which fixes zero skew and an aspect ratio of 1; the only parameters left to estimate are then the two coordinates of the principal point and the single focal length. Usually the principal point is placed at or near the image center and only the focal length is estimated, by optimizing a suitable cost function.
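Under the square-pixel assumption, the intrinsic matrix has a single free parameter once the principal point is pinned to the image center. The sketch below only illustrates that parameterization; the function names, image size, and focal length are hypothetical, and the cost function used to estimate the focal length is not reproduced because the text does not specify it.

```python
def intrinsic_matrix(focal_px, width, height):
    """K with zero skew, unit aspect ratio, and the principal point at the image center."""
    cx, cy = width / 2.0, height / 2.0
    return [[focal_px, 0.0, cx],
            [0.0, focal_px, cy],
            [0.0, 0.0, 1.0]]

def project(K, point_cam):
    """Pinhole projection of a 3D point given in camera coordinates (z > 0)."""
    x, y, z = point_cam
    u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / z
    v = (K[1][1] * y + K[1][2] * z) / z
    return u, v

K = intrinsic_matrix(800.0, 640, 480)        # focal length is the only unknown
center = project(K, (0.0, 0.0, 2.0))         # a point on the optical axis
offset = project(K, (0.5, -0.25, 2.0))       # an off-axis point at the same depth
```

A self-calibration routine would evaluate candidate values of `focal_px` through a matrix like this one and keep the value that minimizes its chosen cost.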

Once the camera intrinsic parameters are obtained in this way (202), a metric reconstruction is available. Decomposing the resulting Euclidean projection matrix yields an intrinsic parameter matrix and an extrinsic parameter matrix, from which the camera motion estimation unit (13) obtains the camera position (203).

Alternatively, once the camera intrinsic parameters are known, the intrinsic parameter matrices of the two cameras can be factored out of the fundamental matrix that expresses the epipolar constraint, leaving a matrix that depends only on the extrinsic parameters. Exploiting the orthogonality of the rotation matrix, a method such as singular value decomposition (SVD) then recovers the rotation and translation components that describe the camera position. The camera motion follows from composing the matrices that describe the two camera positions.
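The factoring step can be sketched as follows. Removing the intrinsics gives the essential matrix E = K2ᵀ F K1. For the decomposition, the text names SVD; to stay dependency-free this sketch swaps in a closed-form identity instead (for E = [b]ₓR, b bᵀ = ½ tr(E Eᵀ) I − E Eᵀ and (b·b) R = Cof(E) − [b]ₓE), so it should be read as an alternative illustration rather than the patent's procedure. All function names are hypothetical, the convention X₂ = R X₁ + b is assumed, and the input is assumed noise-free.

```python
import math

def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose3(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def skew(b):
    return [[0.0, -b[2], b[1]], [b[2], 0.0, -b[0]], [-b[1], b[0], 0.0]]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def cofactor3(A):
    """Cofactor matrix: each row is the cross product of the other two rows."""
    return [cross(A[1], A[2]), cross(A[2], A[0]), cross(A[0], A[1])]

def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1 removes the intrinsic parameters from F."""
    return matmul3(matmul3(transpose3(K2), F), K1)

def decompose_essential(E):
    """Two (R, b) candidates for E = [b]_x R; the baseline b is known only up to scale."""
    EEt = matmul3(E, transpose3(E))
    half_trace = 0.5 * (EEt[0][0] + EEt[1][1] + EEt[2][2])
    bbt = [[(half_trace if i == j else 0.0) - EEt[i][j] for j in range(3)] for i in range(3)]
    i = max(range(3), key=lambda k: bbt[k][k])          # best-conditioned row of b b^T
    b = [bbt[i][j] / math.sqrt(bbt[i][i]) for j in range(3)]
    bb = sum(c * c for c in b)
    C, BE = cofactor3(E), matmul3(skew(b), E)
    R_a = [[(C[r][c] - BE[r][c]) / bb for c in range(3)] for r in range(3)]
    R_b = [[(C[r][c] + BE[r][c]) / bb for c in range(3)] for r in range(3)]
    return [(R_a, b), (R_b, [-c for c in b])]

# Synthetic check with hypothetical intrinsics shared by both views.
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
c, s = math.cos(0.2), math.sin(0.2)
R_true = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
b_true = [0.6, 0.0, 0.8]
E_true = matmul3(skew(b_true), R_true)
F = matmul3(matmul3(transpose3(K_inv), E_true), K_inv)

E = essential_from_fundamental(F, K, K)
candidates = decompose_essential(E)
```

Since E is defined only up to sign, negating it swaps which rotation pairs with which baseline direction, giving the usual four-fold ambiguity; in practice the valid solution is the one that places reconstructed background points in front of both cameras.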

What remains is the motion of the object. First, the coordinate system transformation unit (14) uses the camera projection model given by the intrinsic parameters from the camera calibration unit (12) to estimate, in each frame, the transformation between the camera and object coordinate systems, that is, the pose of the camera relative to the object coordinate system (204).

Finally, the object motion estimation unit (15) obtains the object's motion by composing the two previously computed camera-to-object transformations with the camera motion (205).

In the existing method, the position of the object (its coordinate system) was obtained by composing the camera coordinate system relative to the absolute coordinate system with the camera-to-object coordinate transformation, which requires the origin of the absolute coordinate system to be visible at all times. In a dynamic environment, however, the origin frequently leaves the field of view as the camera moves or is occluded as the object moves freely. To overcome this restriction, the invention provides a way to obtain the object's motion without fixing an origin.

FIG. 3 shows that the object's motion (a coordinate system transformation) can be obtained by composing the camera motion with the object-to-camera coordinate transformations. These are rigid transformations, which mathematically form a group; because every transformation in a group has a unique inverse, the object's motion is uniquely determined by the other three transformations.
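The uniqueness argument can be written out as a short derivation. The notation is assumed here for illustration, because the patent's own symbols are not available in this text: $T_c$ is the camera motion from I₁ to I₂ (mapping camera-1 coordinates of a static point to camera-2 coordinates), $T_{co}^{(k)}$ maps object coordinates to camera coordinates in frame $k$, and $M$ is the object's motion expressed in the first camera's frame.

```latex
\begin{aligned}
T_{co}^{(2)} &= T_{c}\, M\, T_{co}^{(1)}
  &&\text{(the four transformations close a loop)} \\
M &= T_{c}^{-1}\, T_{co}^{(2)}\, \bigl(T_{co}^{(1)}\bigr)^{-1}
  &&\text{(each inverse exists and is unique)} \\
T &= \begin{pmatrix} R & t \\ 0^{\top} & 1 \end{pmatrix}
  \;\Longrightarrow\;
  T^{-1} = \begin{pmatrix} R^{\top} & -R^{\top} t \\ 0^{\top} & 1 \end{pmatrix}
  &&\text{(the inverse of a rigid transformation is rigid)}
\end{aligned}
```

The first line follows a point fixed on the object: its camera-2 coordinates can be reached either directly through $T_{co}^{(2)}$ or by going to camera 1, applying the object's motion, and then applying the camera motion.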

In summary, the invention proposes methods for separating the object's motion in three broad cases.

First, when both the object and the background carry markers, the method comprises: a first step of calibrating the camera from the background markers to obtain the camera-to-background coordinate transformation and the camera motion; a second step of determining the camera-to-object coordinate transformation from the object's markers; and a third step of obtaining the object-to-background coordinate transformation, that is, the object's motion, by composing the camera-to-background transformation of the first step with the camera-to-object transformation of the second step.

Second, when the object carries at least four points on a plane (six points in general three-dimensional position) and the background provides scene constraints, the method comprises: a first step of self-calibrating the camera to determine its intrinsic parameters from the scene constraints in the background; a second step of determining the camera motion from the epipolar constraint computed from background feature points, using the intrinsic parameters from the first step; a third step of determining the camera-to-object coordinate transformation from the intrinsic parameters and the at least four coplanar points; and a fourth step of determining the object's motion from the results of the second and third steps.

Third, when neither the object nor the background carries markers, the object's motion is recovered with a stereo camera whose relative orientation and intrinsic parameters are fixed. Imposing this camera constraint makes it possible to separate the pure motion of an object even in environments where neither a calibration target nor scene constraints can be provided.

Specifically, the method comprises: a first step of projective reconstruction for each stereo image pair; a second step of obtaining the transformation matrix between the two projective reconstructions, separately for the background and for each object; a third step of computing the plane at infinity from the transformation matrices of the second step and performing an affine reconstruction; a fourth step of self-calibrating on the background region to reach a metric reconstruction and obtaining the camera motion from it; and a fifth step of performing a metric reconstruction for each object and obtaining the object's motion directly from it.

As described above, the invention recovers the three-dimensional motion of an object moving in three-dimensional space by analyzing consecutive images from one camera (or a pair of cameras). Ultimately, this means analyzing the dynamic imagery produced by a moving camera to detect objects that move relative to the camera and to recover their motion in three-dimensional space.

Existing image-based modeling techniques can reconstruct three-dimensional structure only in static environments, whereas the invention extends reconstruction to general dynamic environments.

Existing methods for reconstructing three-dimensional structure rely on special equipment for precise reconstruction, a typical example being laser-based structured light. Such methods usually require expensive equipment and are largely restricted to indoor use, whereas image-based methods are comparatively free of these restrictions. Even image-based methods, however, must impose restrictions such as fixing the camera position in order to reconstruct the structure of a moving object. Yet countless real-world dynamic environments do not permit such restrictions, for example collision checking against other vehicles from a car on the road, video tracking of players or the ball in sports, and detecting and tracking moving people for surveillance in an unmanned store.

If, rather than merely detecting a moving object in images from such dynamic environments, the motion is reconstructed in three dimensions in every frame as in the invention, useful information becomes available. For example, because the distance between the camera and an object on the road is known in every frame, the speed of the vehicle can be controlled in conjunction with its autonomous driving system. In an active camera system, motion or zoom feedback can be sent to the camera according to the target object's movement so that the target is tracked without being lost. The invention is thus useful in a range of fields.

The method of the invention described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, and the like).

The invention described above is not limited by the foregoing embodiments or the accompanying drawings. It will be apparent to a person of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the invention has the effect of recovering the three-dimensional motion of an object moving in three-dimensional space by analyzing consecutive images from one camera (or a pair of cameras).

Accordingly, the invention can be applied without imposing restrictions such as fixing the camera position, or in environments where such restrictions cannot be imposed, for example vehicle collision checking on the road, video tracking of players or the ball in sports, and detecting and tracking moving people for surveillance in an unmanned store. In an active camera system, it allows motion or zoom feedback to be sent to the camera according to the object's movement so that the object is tracked without being lost.

Claims

A camera-motion-compensated object motion extraction apparatus for extracting the motion of a video object in a dynamic environment in which a camera and objects each move differently, the apparatus comprising:

motion segmentation means for separating object and background regions using image motion information in at least two image frames, wherein an epipolar constraint is used to separate the background region, whose apparent motion is due to the camera alone, from the object region, whose apparent motion is due to both the camera and the object;

camera calibration means for obtaining camera intrinsic parameters (camera coordinate system) from the background region for each image frame;

camera motion estimation means for obtaining the 'three-dimensional motion of the camera' between the camera coordinate systems of the image frames;

camera/object coordinate system transformation means for obtaining, for each image frame, the transformation between the camera coordinate system and each object coordinate system; and

object motion estimation means for obtaining the 'pure three-dimensional motion of the object' by using the camera-to-object coordinate transformations and the inter-frame camera coordinate transformation to remove the 'three-dimensional motion of the camera' from the object region, which contains the motion of both the object and the camera, for each image frame.

The apparatus of claim 1,

wherein the motion segmentation means,

in classifying the regions of the fixed background and the independently moving objects when the background and the objects carry markers that distinguish them, computes an epipole from the epipolar constraint of the background markers, identifies outlier pixels that do not converge to the epipole so as to separate the background from the rest, and, for the remaining region, performs the same procedure as the background classification on each object's markers to classify the region of that object.

The apparatus of claim 1,

wherein the motion segmentation means,

in classifying the regions of the fixed background and the independently moving objects when the background has no marker, receives corresponding regions manually designated by a user in the two frame images in order to obtain the epipolar constraint, finds within those regions the correspondences with the best local matching (feature point matching), classifies the pixels that converge to the epipole computed from the feature points as background, and performs the same procedure as the background classification on the outlier pixels that are not background to classify the region of each object.

The apparatus of claim 1,

wherein the camera calibration means,

in extracting the camera intrinsic parameters, obtains the three-dimensional camera coordinate system relative to the absolute coordinate system and the camera intrinsic parameters (camera calibration), and, when no information about the three-dimensional absolute coordinate system is available, obtains only the camera intrinsic parameters using scene constraints or feature point correspondences in two or more images (self-calibration).

The apparatus of claim 1,

wherein the camera motion estimation means,

in computing the camera motion between the camera coordinate systems of the image frames, obtains the camera's rotation and translation from the camera extrinsic parameters when information about the three-dimensional absolute coordinate system is available, and otherwise obtains the camera's rotation and translation from the intrinsic parameters computed by the camera calibration means and the epipolar constraint of the background region.

The apparatus of claim 1,

wherein the camera/object coordinate system transformation means

computes the pose of the camera coordinate system relative to each object coordinate system from a minimal number of points, and, when there is no three-dimensional marker from which the absolute coordinate system can be determined, obtains the camera intrinsic parameters, using the result of the camera calibration means, from three or more image frames and only the pixels inside the fixed background region (self-calibration), and obtains the camera-to-object transformation from the self-calibrated camera parameters and an object coordinate system defined by at least six corresponding points on the object (four if the points are coplanar).

The apparatus of any one of claims 1 to 6,

wherein the object motion estimation means

computes the three-dimensional motion of the object by composing the camera motion with the coordinate system transformations, the object's motion being obtained from the combined transformation of the inter-frame camera coordinate transformation computed by the camera motion estimation means and the camera-to-object coordinate transformations computed by the camera/object coordinate system transformation means.

A camera-motion-compensated object motion extraction method for extracting the motion of an object in a dynamic environment in which a camera and objects each move differently, the method comprising:

a motion segmentation step of separating object and background regions using image motion information in at least two image frames, wherein an epipolar constraint is used to separate the background region, whose apparent motion is due to the camera alone, from the object region, whose apparent motion is due to both the camera and the object;

a camera calibration step of obtaining camera intrinsic parameters (camera coordinate system) from the background region for each image frame;

a camera motion estimation step of obtaining the 'three-dimensional motion of the camera' between the camera coordinate systems of the image frames;

a camera/object coordinate system transformation step of obtaining, for each image frame, the transformation between the camera coordinate system and each object coordinate system; and

an object motion estimation step of obtaining the 'pure three-dimensional motion of the object' by using the camera-to-object coordinate transformations and the inter-frame camera coordinate transformation to remove the 'three-dimensional motion of the camera' from the object region, which contains the motion of both the object and the camera, for each image frame.

The method of claim 8,

further comprising an object motion display step of displaying the motion of the separated object in an absolute coordinate system.

The method of claim 8 or 9,

wherein obtaining the motion of the object,

when the object and the background carry markers, comprises:

a camera calibration step of calibrating the camera from the background markers to obtain the camera-to-background coordinate transformation and the camera motion;

a coordinate system transformation step of determining the camera-to-object coordinate transformation from the object's markers; and

an object motion estimation step of determining the object-to-background coordinate transformation, that is, the motion of the object, by composing the camera-to-background coordinate transformation of the camera calibration step with the camera-to-object coordinate transformation of the coordinate system transformation step.

The method of claim 8 or 9,

wherein obtaining the motion of the object,

when the object carries at least four points on a plane (six points in general three-dimensional position) and the background provides scene constraints, comprises:

a self-calibration step of self-calibrating the camera to determine the camera intrinsic parameters from the scene constraints in the background;

a camera motion estimation step of determining the camera motion from the epipolar constraint computed from background feature points, using the intrinsic parameters obtained in the self-calibration step;

a coordinate system transformation step of determining the camera-to-object coordinate transformation from the camera intrinsic parameters and the at least four points on a plane; and

an object motion estimation step of determining the motion of the object from the results of the camera motion estimation step and the coordinate system transformation step.

The method of claim 8 or 9,

wherein obtaining the motion of the object,

when neither the object nor the background carries markers, recovers the motion of the object using a stereo camera whose relative orientation and intrinsic parameters are fixed, and comprises:

a projective reconstruction step of performing projective reconstruction for each stereo image pair;

a transformation matrix extraction step of obtaining the transformation matrix between two projective reconstructions, separately for the background and for each object;

an affine reconstruction step of computing the plane at infinity from the transformation matrices obtained in the transformation matrix extraction step and performing an affine reconstruction;

a camera motion estimation step of self-calibrating on the background region to reach a metric reconstruction and obtaining the camera motion from it; and

an object motion estimation step of performing a metric reconstruction for each object and obtaining the motion of the object directly from it.