KR20220114819A

KR20220114819A - Real-time object tracking system and method in moving camera video

Info

Publication number: KR20220114819A
Application number: KR1020210018306A
Authority: KR
Inventors: 천세욱; 김종헌
Original assignee: 주식회사 라온버드
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2022-08-17
Also published as: KR102614895B1

Abstract

A system and method for real-time object tracking in moving camera video are disclosed. The tracking system includes: an image capturing camera that is capable of horizontal movement, vertical movement, and rotation, and that generates moving image frames captured while moving over a specific area; and an image analysis server that generates compensated image frames, from which the motion of the image capturing camera has been removed, by applying an inverse homography transformation to the motion vectors resulting from the motion of the image capturing camera with respect to the moving image frames, and that tracks objects detected via key points in the generated compensated image frames according to position similarity and appearance similarity.

Description

REAL-TIME OBJECT TRACKING SYSTEM AND METHOD IN MOVING CAMERA VIDEO

An embodiment according to the concept of the present invention relates to a technology for tracking, in real time, objects in video captured by a dynamic camera. More specifically, it relates to a technology that removes the camera motion component in real time from video captured by a moving camera and extracts and compares feature vectors for each key point of a detected object, thereby tracking objects faster and more accurately.

In modern society, interest in personal and public safety is growing as frequent violent crimes heighten social anxiety. Accordingly, closed-circuit television (CCTV) is increasingly being installed in urban residential areas, schools, roads, and similar locations in order to prevent incidents and accidents and to resolve them quickly. These CCTV systems are evolving from simple recording devices into intelligent CCTV that can also act as a real-time monitor and reporter by extracting key feature points from video to recognize objects. In addition, recent CCTV systems have begun to capture video with dynamic cameras such as pan-tilt-zoom (PTZ) cameras in order to overcome the limited coverage and resolution of CCTV cameras. However, both conventional and recent CCTV systems only propose ways to recognize and track objects in video captured by a fixed camera; they do not propose a way to recognize and track objects in real time in video captured by a moving camera.

The technical problem to be solved by the present invention is to provide a system for real-time tracking of objects in dynamic camera video that can track objects in video captured by a moving camera as well as a fixed camera, in high-density, complex environments as well as low-density environments, and in video currently being captured as well as previously recorded video.

Another technical problem to be solved by the present invention is to provide a method for real-time tracking of objects in dynamic camera video that can track objects in video captured by a moving camera as well as a fixed camera, in high-density, complex environments as well as low-density environments, and in video currently being captured as well as previously recorded video.

A system for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention includes: an image capturing camera that is capable of horizontal movement, vertical movement, and rotation, and that generates moving image frames captured while moving over a specific area; and an image analysis server that generates compensated image frames, from which the motion of the image capturing camera has been removed, by applying an inverse homography transformation to the motion vectors resulting from the motion of the image capturing camera with respect to the moving image frames generated by the image capturing camera, and that tracks objects detected via key points in the generated compensated image frames according to position similarity and appearance similarity.

Here, the image analysis server includes: a camera motion estimation module that calculates, for the moving image frames, the motion vectors resulting from the motion of the image capturing camera; a camera motion compensation module that generates compensated image frames, from which the motion of the image capturing camera has been removed, by applying an inverse homography transformation to the calculated motion vectors with respect to the moving image frames; an object detection module that detects an object by recognizing, in the compensated image frames, the key points corresponding to each part of the object; and an object tracking module that tracks the detected object according to the position similarity and the appearance similarity.

According to an embodiment, the camera motion estimation module converts the pixel values of the moving image frames to gray scale; generates, in a previous image frame among the moving image frames, a grid composed of grid points arranged at regular intervals; selects as feature points those grid points whose gray-scale value change in the current image frame relative to the previous image frame is equal to or greater than a predetermined reference value; estimates the expected positions of the selected feature points in the next image frame using the pyramidal Lucas-Kanade algorithm to obtain the motion vectors of feature point pairs; and computes a two-dimensional histogram of those motion vectors, taking the highest-frequency motion vectors as the motion vectors resulting from the motion of the image capturing camera.
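The histogram step can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the function name `dominant_motion_vector`, the `bin_size` parameter, and the sample vectors are assumptions made for illustration, and the sketch returns only the single most populated bin rather than a set of highest-frequency vectors.

```python
from collections import Counter


def dominant_motion_vector(vectors, bin_size=1.0):
    """Return the centre of the most populated 2-D histogram bin.

    `vectors` is a list of (dx, dy) displacements of feature point pairs.
    `bin_size` is a hypothetical bin width in pixels.
    """
    if not vectors:
        raise ValueError("no motion vectors")
    # Quantise each vector into a (bin_x, bin_y) cell and count occupancy.
    counts = Counter(
        (round(dx / bin_size), round(dy / bin_size)) for dx, dy in vectors
    )
    (bin_x, bin_y), _ = counts.most_common(1)[0]
    return (bin_x * bin_size, bin_y * bin_size)


# Four background points share the camera-induced shift;
# two points on independently moving objects do not.
vectors = [(5.1, -2.0), (4.9, -1.9), (5.0, -2.1), (5.2, -2.0),
           (12.0, 3.0), (-7.5, 0.4)]
camera_shift = dominant_motion_vector(vectors)
```

Because background points usually outnumber points on moving objects, the most populated bin tends to reflect the camera-induced shift, which is the intuition behind taking the histogram peak.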

According to an embodiment, the camera motion compensation module calculates, based on the motion vectors resulting from the motion of the image capturing camera, a homography matrix that maps feature points in a previous image frame among the moving image frames to specific positions in the current image frame, and applies the inverse of the calculated homography to every pixel of the current image frame, thereby generating compensated image frames in which the motion of the image capturing camera in the current image frame relative to the previous image frame has been removed.
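As a minimal sketch of the inverse homography idea, the Python code below builds a 3x3 homography, inverts it, and maps a point back to its position in the previous frame. It assumes the simplest possible case, a pure translation of (5, -2) pixels; the helper names `translation_homography`, `invert_3x3`, and `apply_homography` are hypothetical, and a real system would estimate a general homography from many point pairs and warp every pixel.

```python
def translation_homography(dx, dy):
    """Homography for a pure camera-induced shift (illustrative special case)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m, point):
    """Map an (x, y) point through a homography in homogeneous coordinates."""
    x, y = point
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (xh / w, yh / w)


H = translation_homography(5.0, -2.0)         # previous frame -> current frame
H_inv = invert_3x3(H)                         # current frame -> previous frame
moved = apply_homography(H, (10.0, 10.0))     # where a background point appears now
restored = apply_homography(H_inv, moved)     # camera motion removed
```

Applying `H_inv` to a background point returns it to its previous-frame coordinates, so only points on independently moving objects keep a residual displacement after compensation.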

According to an embodiment, the object detection module extracts, based on the recognized key points, a feature vector representing the appearance of the detected object.

According to an embodiment, the object tracking module uses a Kalman filter to estimate, for a tracked object that had been tracked up to the previous image frame among the objects detected by the object detection module, the position at which it is predicted to be in the current image frame; determines the position similarity by comparing the positions of the candidate objects detected in the current image frame by the object detection module with the position of the tracked object estimated using the Kalman filter; determines the appearance similarity by comparing the feature vector of the tracked object with the feature vectors of the candidate objects; and tracks the tracked object by determining, according to the determined position similarity and appearance similarity, whether the tracked object and a candidate object are the same.

Here, the object tracking module determines whether the tracked object and a candidate object are the same by assigning weights, in a predetermined ratio, to the determined position similarity and appearance similarity. When the candidate objects are detected within a predetermined range adjacent to the estimated position of the tracked object, it assigns the position similarity a larger weight than the appearance similarity; when the candidate objects are detected outside the predetermined range adjacent to the estimated position, it assigns the appearance similarity a larger weight than the position similarity.
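The weighting rule can be sketched in Python as follows. The publication does not give formulas for either similarity or values for the range and weights, so everything concrete here is an assumption for illustration: position similarity is modelled as an exponential decay of Euclidean distance, appearance similarity as cosine similarity, and `scale`, `gate_radius`, and `major_weight` are hypothetical parameters.

```python
import math


def position_similarity(predicted, candidate, scale=50.0):
    """Assumed form: 1.0 at zero distance, decaying with Euclidean distance."""
    return math.exp(-math.dist(predicted, candidate) / scale)


def appearance_similarity(a, b):
    """Assumed form: cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_score(predicted, track_feat, cand_pos, cand_feat,
                gate_radius=40.0, major_weight=0.7):
    """Weighted score: position dominates inside the range, appearance outside."""
    pos = position_similarity(predicted, cand_pos)
    app = appearance_similarity(track_feat, cand_feat)
    inside = math.dist(predicted, cand_pos) <= gate_radius
    w_pos = major_weight if inside else 1.0 - major_weight
    return w_pos * pos + (1.0 - w_pos) * app


# Same appearance, one candidate near the predicted position and one far away.
near = match_score((100.0, 100.0), [1.0, 0.0], (103.0, 104.0), [1.0, 0.0])
far = match_score((100.0, 100.0), [1.0, 0.0], (400.0, 500.0), [1.0, 0.0])
```

With these assumed settings, a distant candidate can still receive a usable score when its appearance matches, which reflects the stated intent of relying on appearance when the position prediction is unreliable.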

A method for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention includes the steps of: an image capturing camera transmitting moving image frames, captured while moving, to a camera motion estimation module; the camera motion estimation module estimating the motion of the image capturing camera by calculating a motion vector for the image capturing camera from the transmitted moving image frames; a camera motion compensation module generating compensated image frames, in which the estimated motion of the image capturing camera has been removed from the moving image frames, and transmitting them to an object detection module; the object detection module detecting a tracked object in a given compensated image frame among the transmitted compensated image frames and extracting a feature vector of the detected tracked object; an object tracking module estimating, using a Kalman filter, the expected position of the tracked object in a compensated image frame subsequent to the given compensated image frame; the object detection module detecting candidate objects in the subsequent compensated image frame and extracting feature vectors of the detected candidate objects; and the object tracking module determining whether there is a match by comparing, between the tracked object and the candidate objects detected within or outside a predetermined range adjacent to the estimated expected position, the position similarity and the appearance similarity based on the extracted feature vectors.

According to an embodiment, the step in which the camera motion estimation module estimates the motion of the image capturing camera includes the steps of: converting the pixel values of the transmitted moving image frames to gray scale; generating, for the moving image frames, a grid composed of grid points arranged at regular intervals; selecting as feature points those grid points whose gray-scale value change in the current image frame relative to the previous image frame is equal to or greater than a predetermined reference value; estimating the positions of the feature points in the next image frame using a predetermined optical flow method; obtaining the motion vectors of feature point pairs from the feature points selected in the current image frame and the feature points estimated in the next image frame; and calculating the motion vector for the image capturing camera by computing a two-dimensional histogram of the obtained motion vectors and determining the highest-frequency motion vectors.

According to an embodiment, the step in which the camera motion compensation module generates the compensated image frames and transmits them to the object detection module includes the steps of: calculating, from the determined highest-frequency motion vectors, a homography matrix that maps feature points in the previous image frame to specific positions in the current image frame; generating the compensated image frames, from which the motion of the image capturing camera has been removed, by applying the inverse of the calculated homography matrix to the pixels in the moving image frame; and transmitting the generated compensated image frames to the object detection module.

As described above, the system and method for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention can derive the motion of the object itself within an image frame by excluding the motion of the camera from the motion of the object observed in the image frame, and can therefore track objects stably in video from a non-fixed camera as well as from a fixed camera.

In addition, the system and method for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention can exclude camera motion very quickly while minimizing system load, and can therefore track objects in real time in video currently being captured as well as in previously recorded video.

Furthermore, the system and method for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention recognize each key point of a person's body parts, and can therefore track objects precisely not only in low-density environments but also in high-density, complex environments where objects overlap.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention may be more fully understood.
FIG. 1 is a block diagram showing the configuration of a system for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram in which the camera motion estimation module shown in FIG. 1 has marked grid points and feature points on an image frame.
FIG. 3 is an exemplary diagram in which the camera motion estimation module shown in FIG. 1 has determined high-frequency motion vectors by computing a two-dimensional histogram of the motion vectors.
FIG. 4 is an exemplary diagram to which the inverse homography transformation according to an embodiment of the present invention has been applied.
FIG. 5 is a flowchart illustrating a method for real-time tracking of objects in dynamic camera video according to an embodiment of the present invention.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments; the embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed, and includes all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another; for example, without departing from the scope of rights according to the concept of the present invention, a first component may be named a second component and, similarly, a second component may be named a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, the present invention is described in detail by describing preferred embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a system for real-time tracking of objects in dynamic camera video (hereinafter, the 'tracking system (10)') according to an embodiment of the present invention.

Referring to FIG. 1, the tracking system (10) includes an image capturing camera (100) that captures a desired area of a specific region, and an image analysis server (200) that removes the camera's motion component from the video of the captured area and tracks objects according to their own motion components.

The image capturing camera (100) may be implemented not only as a typical CCTV camera that captures a fixed area but also as a dynamically moving camera, for example a pan-tilt-zoom (PTZ) camera capable of horizontal movement, vertical movement, and zoom adjustment.

That is, the image capturing camera (100) can move in order to capture a specific region or a specific object, and transmits the moving image frames (1_frame to n_frame) captured while moving to the image analysis server (200).

The image analysis server (200) includes a camera motion estimation module (230), a camera motion compensation module (250), an object detection module (270), and an object tracking module (290). For the image frames transmitted from the image capturing camera (100), it identifies and tracks the movement of objects with the camera's motion component removed.

First, when the camera motion estimation module (230) of the image analysis server (200) acquires the moving image frames (1_frame to n_frame) through the image capturing camera (100), it estimates the motion of the image capturing camera (100) in the current image frame (e.g., k_frame) relative to the previous image frame (e.g., k-1_frame).

Here, the camera motion estimation module (230) estimates the motion of the image capturing camera (100) using a real-time video stitching algorithm, by determining how the position of the background (e.g., buildings, shops) has changed in the current image frame (k_frame) relative to the previous image frame (k-1_frame).

When objects appear to have moved in the current image frame (e.g., k_frame) relative to the previous image frame (e.g., k-1_frame), this may be because the objects themselves moved, because the image capturing camera (100) moved (e.g., the position of the background changed), or because both the objects and the image capturing camera (100) moved.

Apparent object motion may thus have several causes, and the camera motion estimation module (230) estimates only the camera motion within the image frame, excluding the motion of the objects within the image frame.

The camera motion estimation module (230) estimates the camera motion within the image frame because what can be obtained directly from the image frames is the 'motion of an object as observed in the image frame', and excluding the 'motion of the camera' from this observed motion yields the 'motion of the object itself within the image frame'.

The process by which the camera motion estimation module (230) estimates the motion of the image capturing camera (100) from the moving image frames is as follows.

First, the camera motion estimation module (230) generates, in the image frame, a grid composed of grid points arranged at regular intervals. Each grid point holds not only its pixel value in the image frame but also position information consisting of an x-axis coordinate value and a y-axis coordinate value.
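Grid generation can be sketched in a few lines of Python. The function name `make_grid`, the 640x480 frame size, the 20-pixel spacing, and the half-spacing offset from the border are all assumptions chosen for illustration; the publication only states that the grid points are placed at regular intervals.

```python
def make_grid(width, height, spacing):
    """Return (x, y) grid points at regular intervals over a frame.

    Points start half a spacing in from the border (an illustrative choice).
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    offset = spacing // 2
    return [
        (x, y)
        for y in range(offset, height, spacing)
        for x in range(offset, width, spacing)
    ]


grid = make_grid(640, 480, 20)  # 32 columns x 24 rows of candidate points
```

Sampling a fixed grid rather than running a detector over every pixel keeps the number of candidate points small and constant per frame, which is consistent with the stated goal of low computational load.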

Next, the camera motion estimation module (230) extracts, from the generated grid points, the feature points that serve as references for determining motion within the image frame.

For this feature point extraction, a general feature point extraction algorithm could be used, such as SIFT (Scale-Invariant Feature Transform), which uses an image pyramid, or SURF (Speeded-Up Robust Features), which uses the determinant of the Hessian matrix. However, the SIFT and SURF algorithms require a large amount of computation to extract and match feature points, and are therefore not suitable for a real-time environment.

Therefore, the camera motion estimation module (230) uses a feature point extraction method that achieves a significantly higher computation speed than conventional general methods.

The method by which the camera motion estimation module (230) extracts the feature points is described in more detail below.

First, the camera motion estimation module (230) converts the pixel values of the image frames from color values to an achromatic scale, that is, gray scale.

According to an embodiment, the gray scale may be set within the range 0.0 to 1.0 or within the range 0 to 255.
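A minimal Python sketch of the conversion, supporting both ranges mentioned above, is given below. The publication does not state which conversion formula is used; the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumption made here, as are the function name `to_gray` and the `normalized` flag.

```python
def to_gray(pixel, normalized=True):
    """Convert an 8-bit (R, G, B) pixel to gray scale.

    Uses ITU-R BT.601 luma weights, which is an assumed choice.
    Returns a float in 0.0..1.0 when `normalized` is True,
    otherwise an integer in 0..255.
    """
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return luma / 255.0 if normalized else round(luma)


white = to_gray((255, 255, 255))                      # close to 1.0
black = to_gray((0, 0, 0))                            # 0.0
white_8bit = to_gray((255, 255, 255), normalized=False)
```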

Next, for each grid point, the camera motion estimation module (230) calculates the change in gray-scale value in the current image frame (k_frame) relative to the previous image frame (e.g., k-1_frame), and selects the grid point as a feature point only when the calculated change is equal to or greater than a predetermined reference value.

For example, if the gray-scale value of a first grid point with x-coordinate 10 and y-coordinate 10 is 1.0 (white) in the previous image frame (k-1_frame), and the gray-scale value of the first grid point at the same coordinates is 0.0 (black) in the current image frame (k_frame), the change in the gray-scale value of the first grid point is 1.0.

이때, 상기 소정 기준치가 0.5라 한다면, 상기 제1격자점의 그레이 스케일 값 변화량인 1.0은 상기 소정 기준치 0.5 보다 크기 때문에 카메라 움직임 추정 모듈(230)은 상기 제1격자점을 상기 상기 특징점으로 선별한다.At this time, if the predetermined reference value is 0.5, the gray scale value change amount of the first grid point of 1.0 is greater than the predetermined reference value 0.5, so the camera motion estimation module 230 selects the first grid point as the feature point. .

이와 같은 방식으로 카메라 움직임 추정 모듈(230)은 각 격자점들에 대해 그레이 스케일 값 변화량을 계산함으로써 상기 특징점을 선별한다.In this way, the camera motion estimation module 230 selects the feature point by calculating the gray scale value change for each grid point.
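The grid-point selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_feature_points`, the `step` parameter, and the use of the absolute difference as the "change" are assumptions made for the example.

```python
def select_feature_points(prev_gray, curr_gray, step, threshold):
    """Return the grid points whose gray-scale change meets the threshold.

    prev_gray, curr_gray: 2D lists of gray values in [0.0, 1.0], indexed [y][x].
    step: grid spacing in pixels (hypothetical parameter name).
    threshold: the predetermined reference value (0.5 in the text's example).
    """
    height, width = len(curr_gray), len(curr_gray[0])
    selected = []
    for y in range(0, height, step):
        for x in range(0, width, step):
            # Assumption: "change" means the absolute difference between frames.
            change = abs(curr_gray[y][x] - prev_gray[y][x])
            if change >= threshold:
                selected.append((x, y))
    return selected


# 20x20 frames: all white, then the grid point (10, 10) turns black.
prev_frame = [[1.0] * 20 for _ in range(20)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[10][10] = 0.0
features = select_feature_points(prev_frame, curr_frame, step=10, threshold=0.5)
```

Only the pixels that lie on the grid are inspected, which is where the speed advantage over dense feature detectors would come from.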

FIG. 2 is an exemplary diagram in which the camera motion estimation module 230 shown in FIG. 1 has marked grid points and feature points on an image frame.

Referring to FIG. 2, a grid is generated on the image frame, and the grid consists of grid points arranged at predetermined intervals.

Among the grid points, those marked in yellow are the finally selected feature points, and those marked in red are grid points that did not meet the predetermined reference value and were therefore not selected as feature points.

Referring again to FIG. 1, the camera motion estimation module 230 uses a predetermined optical flow method to estimate where the feature points extracted from the current image frame (k_frame) have moved in the next image frame (k+1_frame).

Depending on the embodiment, the optical flow method may be a block matching method, the Horn-Schunck algorithm, the Lucas-Kanade algorithm, or Gunnar Farnebäck's algorithm. However, the camera motion estimation module 230 according to an embodiment of the present invention preferably uses the pyramidal Lucas-Kanade algorithm (Iterative Lucas-Kanade Method with Pyramids), which can analyze optical flow for the extracted feature points and, because its accuracy is high relative to its running time, is best suited to real-time video.

That is, the camera motion estimation module 230 can use the pyramidal Lucas-Kanade algorithm to estimate where the feature points selected as described above have moved in the next image frame (k+1_frame).

From this, the camera motion estimation module 230 obtains the motion vector of each feature point pair (the feature point in the current image frame and the corresponding feature point estimated in the next image frame), using the position of the feature point in the current image frame (k_frame) and its estimated position in the next image frame (k+1_frame).

For example, if the position of a first feature point among the feature points in the current image frame (k_frame) is (41, 25) and its estimated position in the next image frame (k+1_frame) is (73, 81), the motion vector of the first feature point is V1 = (32, 56).

In this way, the camera motion estimation module 230 estimates the position in the next image frame (k+1_frame) of every feature point in the current image frame (k_frame) and obtains the motion vector of each feature point pair.
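The motion vector of a feature point pair is simply the coordinate difference between the two positions. A minimal sketch (the function name `motion_vectors` is hypothetical, and the estimated positions are assumed to come from whatever optical flow routine is used):

```python
def motion_vectors(curr_points, next_points):
    """Subtract each current position from its estimated next position."""
    return [
        (nx - cx, ny - cy)
        for (cx, cy), (nx, ny) in zip(curr_points, next_points)
    ]


# The worked example from the text: (41, 25) is estimated to move to (73, 81).
vectors = motion_vectors([(41, 25)], [(73, 81)])
```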

Meanwhile, among the motion vectors of these feature point pairs, some may arise solely from the motion of objects within the image frame, and others solely from the motion of the image capturing camera 100.

Moving objects such as people and cars occupy only part of the image frame, whereas background such as buildings occupies most of it. Motion vectors arising from the background can therefore be attributed to the motion of the image capturing camera 100.

Accordingly, a two-dimensional histogram (2D histogram) is computed over all motion vectors to find the most frequent motion vectors, and the motion vectors in the most frequent bin are judged to be the motion vectors of the background, that is, the motion vectors due to the motion of the image capturing camera 100.

In practice, an image frame may contain several hundred or more feature-point-pair motion vectors. For convenience, this description assumes a total of four motion vectors and histogram bins that divide x and y in units of 10.

For example, assume that the first motion vector (V1) is (32, 56), the second motion vector (V2) is (41, 25), the third motion vector (V3) is (37, 51), and the fourth motion vector (V4) is (35, 53).

When the first to fourth motion vectors (V1 to V4) are placed into bins of 10 units in x and y, the first motion vector (V1) falls into bin (3, 5), the second motion vector (V2) into bin (4, 2), the third motion vector (V3) into bin (3, 5), and the fourth motion vector (V4) into bin (3, 5).

The bin holding the most motion vectors is bin (3, 5), which contains the first motion vector (V1), the third motion vector (V3), and the fourth motion vector (V4). These three motion vectors are therefore the most frequent motion vectors, that is, the motion vectors of the background.

Accordingly, the camera motion estimation module 230 judges the first motion vector (V1), the third motion vector (V3), and the fourth motion vector (V4), which belong to the background, to be the motion vectors due to the motion of the image capturing camera 100.
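The two-dimensional histogram vote can be sketched as follows, reusing the four vectors and the bin width of 10 from the example above. The function name `dominant_motion_vectors` is hypothetical, and floor division is an assumption about how a vector is assigned to a bin (it reproduces the bins given in the text).

```python
from collections import defaultdict


def dominant_motion_vectors(vectors, bin_size=10):
    """Group motion vectors into 2D bins and return the fullest bin."""
    bins = defaultdict(list)
    for vx, vy in vectors:
        # Floor division: (32, 56) lands in bin (3, 5), as in the text.
        bins[(vx // bin_size, vy // bin_size)].append((vx, vy))
    # On a tie, max() keeps the bin that was filled first.
    best_bin = max(bins, key=lambda b: len(bins[b]))
    return best_bin, bins[best_bin]


vectors = [(32, 56), (41, 25), (37, 51), (35, 53)]  # V1..V4
best_bin, background = dominant_motion_vectors(vectors)
```

Here `background` holds V1, V3, and V4, which the text treats as the motion vectors caused by camera motion.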

The image capturing camera 100 may simply move in the horizontal (or vertical) direction, but it may also rotate (that is, move in the horizontal and vertical directions at the same time). The motion vectors due to the motion of the image capturing camera 100 may therefore appear as a plurality of motion vectors with similar magnitude and direction (e.g., V1, V3, and V4), as described above.

FIG. 3 is an exemplary diagram in which the camera motion estimation module 230 has computed a two-dimensional histogram of the motion vectors and determined the high-frequency motion vectors.

Referring to FIG. 3, the red line is the motion vector finally selected as corresponding to the camera motion. It can be judged that, relative to the previous frame, the current frame moved with roughly the direction and magnitude of the red line.

Referring again to FIG. 1, the camera motion compensation module 250 uses the motion vectors that the camera motion estimation module 230 judged to be due to the motion of the image capturing camera 100 to compute a homography matrix, which maps the feature points in the previous image frame (k-1_frame) to specific positions in the current image frame (k_frame).

In general, a homography is a transformation relating projections of 3D space onto 2D space, that is, a way of converting between two images of the same 3D scene viewed from two different viewpoints. The matrix expressing the relationship between the two images is called the homography matrix.

That is, the homography matrix is a matrix that transforms specific feature points located in the previous image frame (k-1_frame) so that they lie at specific positions in the current image frame (k_frame).

Here, the specific feature points are the feature points whose feature-point-pair motion vectors were judged to be due to the motion of the image capturing camera 100, and the specific positions in the current image frame (k_frame) are the positions at which the optical flow method estimates those feature points to lie in the current image frame (k_frame).

Accordingly, when the homography transformation is applied, the specific feature points located in the previous image frame (k-1_frame) land at those specific positions in the current image frame (k_frame).

In other words, in this specification the homography represents the part of an object's displacement that is due to the motion of the image capturing camera 100, excluding the object's own motion.

Meanwhile, for feature points other than the specific feature points in the previous image frame (k-1_frame), the position in the current image frame (k_frame) obtained by applying the homography transformation may differ from the position estimated in the current image frame (k_frame) by the optical flow method described above.

This difference arises because those other feature points were displaced not only by the motion of the image capturing camera 100 (that is, the homography transformation) but also by the object's own motion.

Accordingly, every feature point in the current image frame (k_frame) holds the position information estimated by the optical flow method, and applying the inverse homography transformation to these positions yields position information from which the motion of the image capturing camera 100 has been removed.

That is, after the inverse homography transformation, objects belonging to the background have the same positions as in the previous image frame (k-1_frame) because the motion of the image capturing camera 100 has been removed, while objects unrelated to the background (e.g., people and cars) have positions that reflect only their own motion.

For this reason, the camera motion compensation module 250 applies the inverse of the homography computed above to every pixel of the current image frame (k_frame), generating compensated image frames (e.g., 1_frame_am to n_frame_am) from which the motion of the image capturing camera 100 relative to the previous image frame has been removed.

As a result, every pixel in the current compensated image frame (k_frame_am) to which the inverse homography transformation has been applied has position information from which the motion of the image capturing camera 100 has been removed.
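The effect of the inverse homography on individual positions can be sketched as follows. This is an illustration under stated assumptions: the matrix `H` is a made-up pure translation standing in for a camera pan, the helper names are hypothetical, and only points are transformed here. Warping every pixel of a frame is normally left to a library routine such as OpenCV's `warpPerspective`.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def invert_3x3(M):
    """Invert a 3x3 matrix with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


# Hypothetical camera pan to the right: the background shifts 30 px to the left.
H = [[1, 0, -30], [0, 1, 0], [0, 0, 1]]
H_inv = invert_3x3(H)

# A background point at (100, 50) in k-1_frame appears at (70, 50) in k_frame.
background_compensated = apply_homography(H_inv, (70, 50))
# A person at (200, 80) in k-1_frame who also walked 5 px to the right
# appears at (175, 80) in k_frame.
person_compensated = apply_homography(H_inv, (175, 80))
```

After compensation the background point returns to its previous position, while the person keeps only the 5 px of their own motion.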

FIG. 4 is an exemplary view in which the inverse homography transformation according to an embodiment of the present invention has been applied.

FIG. 4(a) shows the moving image frames (k-1_frame and k_frame) before the inverse homography transformation is applied, and FIG. 4(b) shows the compensated image frames (k-1_frame_am and k_frame_am) after the inverse homography transformation is applied.

Referring to FIG. 4(a), because the image capturing camera 100 moved to the right, a background object (for example, a utility pole) has moved to the left in the current image frame (k_frame) relative to the previous image frame (k-1_frame).

In contrast, referring to FIG. 4(b), the absolute position of the background object (for example, the utility pole) in the image remains fixed even though the image capturing camera 100 moved.

Thereafter, the camera motion compensation module 250 transmits the compensated image frames (1_frame_am to n_frame_am), generated by removing the motion of the image capturing camera 100 from the image frames as described above, to the object detection module 270.

Referring again to FIG. 1, the object detection module 270 detects the objects to be tracked and counted (e.g., people) within the compensated image frames (1_frame_am to n_frame_am) transmitted from the camera motion compensation module 250, and extracts a feature vector representing the appearance of each detected object.

A typical person detection model recognizes the entire region occupied by a person in the image and returns a bounding box surrounding the person. The object detection module 270 according to an embodiment of the present invention instead recognizes key points for the person's body parts (e.g., eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles) and works backward from them to estimate the person's entire region.

By using key points to extract and recognize information about a person's body parts, this approach can detect people more precisely.

For example, if only the head of a particular person is visible in the image and everything below the head is hidden by a passing car, a typical person detection model that computes a score over the person's entire region is likely to fail to recognize that person as a person.

However, because the object detection module 270 according to an embodiment of the present invention can compute a score for each key point, a high score is still computed for the head even when the score for the whole body is low, so a person with only the head exposed can be detected accurately.
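The difference between whole-body scoring and per-key-point scoring can be illustrated with a small sketch. The scores, the threshold of 0.5, the `min_visible` rule, and the function name `is_person` are all assumptions made for the example. The text does not specify how the per-key-point scores are combined into a detection decision.

```python
def is_person(keypoint_scores, threshold=0.5, min_visible=1):
    """Accept a detection if enough key points score above the threshold.

    keypoint_scores: dict mapping a key point name to a confidence in [0, 1].
    Returns (detected, names of the key points considered visible).
    """
    visible = [name for name, score in keypoint_scores.items() if score >= threshold]
    return len(visible) >= min_visible, visible


# A person whose body below the head is hidden by a passing car (made-up scores).
occluded = {
    "eye": 0.92, "nose": 0.90, "mouth": 0.88,
    "shoulder": 0.10, "knee": 0.05, "ankle": 0.03,
}
detected, visible_parts = is_person(occluded)
# A single averaged score over the whole body falls below the same threshold.
whole_body_score = sum(occluded.values()) / len(occluded)
```

The list of visible key points is also what a later step would need in order to extract appearance features from the visible parts only.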

The object detection module 270 then extracts a feature vector representing the appearance of the detected object.

The feature vector is an appearance vector extracted from the pixel values of the target person region (e.g., the region of each key point), and the object detection module 270 extracts the feature vector using the per-key-point information.

That is, by using the per-key-point information, the object detection module 270 can determine which body parts are visible and which are not visible within the image frame, and it extracts the feature vector from the visible parts of the target.

Accordingly, the object detection module 270 can accurately recognize people and extract their feature vectors even when the image frame is crowded with overlapping people.

Meanwhile, the object tracking module 290 tracks objects using a detection-based tracking method (Track by Detect), which determines whether an object detected in the previous image frame (e.g., k-1_frame_am) is the same as an object detected in the current image frame (e.g., k_frame_am).

The compensated image frames (e.g., 1_frame_am to n_frame_am) on which the object tracking module 290 performs this tracking are the image frames that the camera motion compensation module 250 generated by removing the motion of the image capturing camera 100.

In the detection-based tracking method, the first image frame (1_frame_am) is the frame at which the video begins. Because no object is yet being tracked in the first image frame (1_frame_am), it serves as an initialization step in which tracking starts for every detected object.

The k-th image frame (k_frame_am) is a step in which the objects tracked up to the (k-1)-th image frame (k-1_frame_am) are matched against the objects detected in the k-th image frame (k_frame_am), either continuing existing tracks into the current image frame or starting new tracks for objects that match no existing track.

In tracking an object with the detection-based tracking method, the object tracking module 290 can track the object according to the position similarity and the appearance similarity (the similarity of the per-key-point feature vectors) between the objects detected by the object detection module 270.

The method by which the object tracking module 290 tracks an object is described in detail below.

First, the object tracking module 290 uses a Kalman filter to estimate the position at which an object tracked up to the previous image frame (e.g., k-1_frame_am), namely a tracked object (Tr_obj), is predicted to be in the current frame (e.g., k_frame_am).
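The prediction step of a Kalman filter can be sketched as follows. The text does not state which motion model is used, so the constant-velocity state `[x, y, vx, vy]`, the process noise `q`, and the helper names are assumptions for illustration only. The measurement update step is omitted.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kalman_predict(state, P, dt=1.0, q=0.01):
    """Predict the next state and covariance under a constant-velocity model.

    state: [x, y, vx, vy]; P: 4x4 covariance matrix (list of rows).
    """
    F = [
        [1, 0, dt, 0],
        [0, 1, 0, dt],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]
    predicted = [row[0] for row in mat_mul(F, [[v] for v in state])]
    P_pred = mat_mul(mat_mul(F, P), transpose(F))
    for i in range(4):
        P_pred[i][i] += q  # simple diagonal process noise (assumed)
    return predicted, P_pred


# Tr_obj last seen at (100, 50), moving 4 px right and 2 px up per frame.
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
predicted_state, predicted_cov = kalman_predict([100.0, 50.0, 4.0, -2.0], identity)
```

The first two entries of `predicted_state` give the position around which candidate objects would be searched in the current frame.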

Next, the object tracking module 290 compares an object detected by the object detection module 270 within a predetermined range adjacent to the estimated position (e.g., a candidate object (Ca_obj)) with the tracked object (Tr_obj).

Because both the position estimated by the object tracking module 290 and the position of the object detected by the object detection module 270 are positions in an image frame whose camera motion has already been compensated by the camera motion compensation module 250, the two positions can be compared directly.

That is, the object tracking module 290 determines the position similarity according to whether the candidate object detected by the object detection module 270 lies within a certain range of the estimated position, and determines the appearance similarity by comparing the feature vector of the tracked object (Tr_obj) with the feature vector of the detected candidate object (Ca_obj).
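The two similarity measures might be computed as in the sketch below. The text only says that position similarity depends on whether the candidate lies within a certain range and that appearance similarity compares feature vectors. The linear fall-off inside the gate, the use of cosine similarity, and the names `position_similarity`, `cosine_similarity`, and `gate_radius` are assumptions.

```python
import math


def position_similarity(predicted, candidate, gate_radius):
    """1.0 at the predicted position, falling linearly to 0.0 at the gate edge."""
    distance = math.hypot(candidate[0] - predicted[0], candidate[1] - predicted[1])
    return max(0.0, 1.0 - distance / gate_radius)


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Predicted position of Tr_obj versus the detected position of Ca_obj.
pos_sim = position_similarity((104.0, 48.0), (107.0, 52.0), gate_radius=50.0)
# Toy appearance feature vectors for Tr_obj and Ca_obj.
app_sim = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
outside = position_similarity((104.0, 48.0), (204.0, 48.0), gate_radius=50.0)
```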

As described above, the feature vector is extracted using per-key-point information (body parts such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles), so an accurate comparison is possible even when the image frame is crowded with overlapping people.

In addition, the object tracking module 290 determines the appearance similarity by comparing the per-key-point feature vectors of the tracked object (Tr_obj) with the per-key-point feature vectors of the detected candidate object (Ca_obj), and it may assign a priority to each key point when determining the similarity between objects.

For example, among key points such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles, the highest priority may be assigned to the feature vectors of key points related to the face, such as the eyes, nose, and mouth.
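One way to apply per-key-point priorities is a weighted average of per-key-point similarities, taken over the key points visible in both objects. The priority values, the cosine similarity, and the function names below are assumptions. The text states only that face-related key points may receive the highest priority.

```python
import math

# Hypothetical priorities: face key points count three times as much.
PRIORITY = {"eye": 3.0, "nose": 3.0, "mouth": 3.0}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def appearance_similarity(tracked, candidate, priority, default_priority=1.0):
    """Priority-weighted mean similarity over key points present in both objects."""
    shared = tracked.keys() & candidate.keys()
    if not shared:
        return 0.0
    total = weight_sum = 0.0
    for name in shared:
        weight = priority.get(name, default_priority)
        total += weight * cosine(tracked[name], candidate[name])
        weight_sum += weight
    return total / weight_sum


# Toy features: the eyes match, the shoulders do not; the knee is visible
# only in the candidate and is therefore ignored.
tr_obj = {"eye": [1.0, 0.0], "shoulder": [1.0, 0.0]}
ca_obj = {"eye": [1.0, 0.0], "shoulder": [0.0, 1.0], "knee": [1.0, 1.0]}
score = appearance_similarity(tr_obj, ca_obj, PRIORITY)
```

With equal priorities the score would be 0.5. Giving the eyes a priority of 3 raises it to 0.75.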

In this way, the object tracking module 290 determines whether the candidate object (Ca_obj) and the tracked object (Tr_obj) are the same based on the position similarity and the appearance similarity, and when it judges them to be the same object, the two are defined as matched.

Meanwhile, the image frame (e.g., k_frame_am) may be so crowded that several objects overlap, that is, a plurality of candidate objects (e.g., Ca_obj1 to Ca_obj3) may be detected within the certain range of the estimated position.

In such a case, the object tracking module 290 compares each of the candidate objects (Ca_obj1 to Ca_obj3) with the tracked object (Tr_obj) in terms of appearance similarity and position similarity to determine whether they match.

Here, weights in a predetermined ratio are assigned to the appearance similarity and the position similarity, and the results of applying those weights to the determined appearance similarity and position similarity are combined to match the candidate object, among the candidate objects (Ca_obj1 to Ca_obj3), that is judged to be the same object as the tracked object (Tr_obj).

Depending on the embodiment, the weight for the appearance similarity may be set to a predetermined value greater than 1, and the weight for the position similarity may be set to a predetermined value less than 1.
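The weighted combination can be sketched as a weighted sum, with the highest-scoring candidate taken as the match. The weighted-sum form, the weight values 1.5 and 0.5, the similarity values, and the name `best_match` are assumptions. The text specifies only that one weight may be greater than 1 and the other less than 1.

```python
def best_match(candidates, w_appearance=1.5, w_position=0.5):
    """Return (name, score) of the candidate with the highest combined score.

    candidates: dict mapping a candidate name to a tuple
    (appearance_similarity, position_similarity), each in [0, 1].
    """
    if not candidates:
        return None, 0.0
    scores = {
        name: w_appearance * appearance + w_position * position
        for name, (appearance, position) in candidates.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Made-up similarities of three overlapping candidates to Tr_obj.
candidates = {
    "Ca_obj1": (0.4, 0.9),
    "Ca_obj2": (0.9, 0.6),
    "Ca_obj3": (0.5, 0.5),
}
appearance_first, _ = best_match(candidates)
# Swapping the weights favours position, as in a poorly lit scene.
position_first, _ = best_match(candidates, w_appearance=0.5, w_position=1.5)
# A position weight of 0 matches on appearance alone.
appearance_only, _ = best_match(candidates, w_position=0.0)
```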

This is because, when judging whether two objects are the same, the appearance similarity between them can be regarded as more important comparison information than the position similarity. Of course, where conditions in the image frame make it difficult to weight the appearance similarity heavily (for example, when the object itself is not recognized, when feature vectors could not be extracted for every key point, when the lighting in the image changes, or when the object's pose changes), a higher weight may be set for the position similarity.

For example, when no object is detected within the predetermined range adjacent to the estimated position, the object tracking module 290 compares each of the candidate objects detected outside the predetermined range (e.g., Ca_obj4 ~ Ca_obj) with the tracked object (Tr_obj) in terms of appearance similarity and position similarity to determine whether they match. In this case, the weight for the position similarity may be set to a predetermined value greater than 1, and the weight for the appearance similarity may be set to a predetermined value less than 1.

Meanwhile, there may be an object (Un_obj) that was detected in the previous image frames (1_frame_am to k-1_frame_am) but for which tracking has repeatedly failed, and there may be no object detected within the predetermined range adjacent to the estimated position of that object (Un_obj) in the current image frame (k_frame_am).

In this case, the object tracking module 290 may set the weight for the position similarity to 0 and perform matching between objects using the appearance similarity alone, regardless of the position similarity.

For an object (Un_obj) that fails to match even through the above matching process, the position predicted by the Kalman filter is used as the object's updated position in the current image frame (k_frame_am).

In addition, if matching fails a certain number of times or more in succession, the object is judged to have disappeared from the image frame (e.g., by boarding a vehicle such as a bus or going down into a subway station) and is no longer tracked.
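The handling of unmatched tracks can be sketched with a small bookkeeping structure. The class `Track`, the function `update_track`, and the limit of three consecutive misses are assumptions for illustration. The text says only "a certain number of times".

```python
from dataclasses import dataclass


@dataclass
class Track:
    position: tuple
    misses: int = 0      # consecutive frames without a match
    active: bool = True


def update_track(track, matched_position, predicted_position, max_misses=3):
    """Advance one frame: use the match if there is one, else the prediction."""
    if matched_position is not None:
        track.position = matched_position
        track.misses = 0
    else:
        # No match: fall back on the Kalman-predicted position.
        track.position = predicted_position
        track.misses += 1
        if track.misses >= max_misses:
            track.active = False  # treated as having left the scene
    return track


track = Track(position=(100.0, 50.0))
update_track(track, matched_position=(104.0, 48.0), predicted_position=(104.0, 48.0))
# Three consecutive frames with no matching detection.
for predicted in [(108.0, 46.0), (112.0, 44.0), (116.0, 42.0)]:
    update_track(track, matched_position=None, predicted_position=predicted)
```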

도 5는 본 발명의 일 실시 예에 따른 동적 카메라 영상 내의 객체를 실시간 추적하는 방법을 설명하기 위한 순서도이다.5 is a flowchart illustrating a method of tracking an object in a dynamic camera image in real time according to an embodiment of the present invention.

도 1 내지 도 5를 참조하면, 수평 이동, 수직 이동, 줌 조정이 가능한 팬틸트줌(Pan-Tilt-Zoom, PTZ) 카메라로 구현된 영상 촬영 카메라(100)가 이동하며 촬영한 이동 영상 프레임들(1_frame 내지 n_frame)을 영상 분석 서버(200)로 전송한다(S100).1 to 5 , moving image frames captured by the image capturing camera 100 implemented as a Pan-Tilt-Zoom (PTZ) camera capable of horizontal movement, vertical movement, and zoom adjustment while moving (1_frame to n_frame) is transmitted to the image analysis server 200 (S100).

영상 분석 서버(200)의 카메라 움직임 추정 모듈(230)은 영상 촬영 카메라(100)를 통해 전송된 이동 영상 프레임들(1_frame 내지 n_frame)을 수신하고(S150), 해당 영상 프레임에 일정 간격으로 배치된 격자점들로 구성된 격자(Grid)를 생성한다(S200).The camera motion estimation module 230 of the image analysis server 200 receives the moving image frames 1_frame to n_frame transmitted through the image capturing camera 100 (S150), and is arranged at regular intervals in the image frame. A grid composed of grid points is created (S200).

Thereafter, the camera motion estimation module 230 converts the pixel values of the image frames from color values to an achromatic scale, that is, gray scale (S210).

Subsequently, for each grid point, the camera motion estimation module 230 calculates the change in gray scale value in the current image frame (k_frame) relative to the previous image frame (e.g., k-1_frame), and selects the grid point as a feature point only when the calculated change is greater than or equal to a predetermined reference value (S230).
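The grid-based feature point selection of steps S200 to S230 can be sketched as follows. This is only an illustration under stated assumptions: frames are taken to be equally sized 2D lists of gray scale values, and the function name `select_feature_points`, the grid spacing `step`, and the reference value `threshold` are hypothetical, since the description specifies neither.

```python
def select_feature_points(prev_gray, curr_gray, step=16, threshold=10):
    """Return grid points (x, y) whose gray value changed by >= threshold.

    prev_gray and curr_gray are equally sized 2D lists of gray values.
    step (grid spacing) and threshold are assumed example values.
    """
    height = len(curr_gray)
    width = len(curr_gray[0]) if height else 0
    points = []
    # Visit only the grid points, not every pixel.
    for y in range(0, height, step):
        for x in range(0, width, step):
            change = abs(curr_gray[y][x] - prev_gray[y][x])
            if change >= threshold:
                points.append((x, y))
    return points
```

Evaluating only grid points keeps the cost per frame proportional to the number of grid points rather than to the number of pixels, which is consistent with the real-time goal stated in the description.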

Thereafter, the camera motion estimation module 230 uses a predetermined optical flow method to estimate where the feature points extracted from the current image frame (k_frame) have moved in the next image frame (k+1_frame) (S250).

Depending on the embodiment, the optical flow method may be the pyramidal Lucas-Kanade algorithm, which can analyze the optical flow of the extracted feature points and is most suitable for real-time video because its accuracy is high relative to its running time.

Subsequently, the camera motion estimation module 230 uses the feature point positions in the current image frame (k_frame) and the feature point positions estimated for the next image frame (k+1_frame) to obtain a motion vector for each feature point pair (a feature point in the current frame and its estimated position in the next frame) (S270).

Next, the camera motion estimation module 230 calculates a two-dimensional histogram over all of the obtained motion vectors to determine the motion vectors with the highest frequency (S290).

The motion vectors corresponding to the highest frequency are the motion vectors of the background, that is, the motion vectors associated with the movement of the image capturing camera 100.
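The histogram step (S290) can be sketched as follows. This is a minimal illustration, assuming motion vectors are given as (dx, dy) tuples; the function name `dominant_motion_vectors` and the bin width `bin_size` are hypothetical, since the description does not specify how the histogram is binned.

```python
from collections import Counter


def dominant_motion_vectors(vectors, bin_size=1.0):
    """Return indices of the vectors in the most populated 2D histogram bin.

    vectors is a list of (dx, dy) motion vectors. bin_size is an assumed
    bin width in pixels.
    """
    if not vectors:
        return []
    # Quantize each vector into a 2D bin keyed by (bin_x, bin_y).
    bins = [(int(dx // bin_size), int(dy // bin_size)) for dx, dy in vectors]
    counts = Counter(bins)
    top_bin, _ = counts.most_common(1)[0]
    return [i for i, b in enumerate(bins) if b == top_bin]
```

For example, with four vectors near (2, 0) and two unrelated vectors, the function returns the indices of the four background-like vectors; the remaining vectors are treated as belonging to independently moving objects or to noise.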

Meanwhile, the camera motion compensation module 250 uses the highest-frequency motion vectors determined by the camera motion estimation module 230 to calculate a homography matrix that maps specific feature points in the previous image frame (k-1_frame) to specific positions in the current image frame (k_frame) (S300).

Here, the specific feature points are those whose feature point pair motion vectors are included in the determined highest-frequency motion vectors, and calculating the homography matrix is equivalent to estimating the movement of the image capturing camera 100.

Subsequently, the camera motion compensation module 250 applies the inverse of the calculated homography matrix to the pixels in the current image frame (k_frame) to generate compensation image frames (e.g., 1_frame_am to n_frame_am) from which the movement of the image capturing camera 100 has been removed (S330).
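The effect of applying the inverse homography (S330) can be illustrated on individual pixel coordinates. The sketch below is an assumption-laden simplification: it maps single points rather than resampling a whole image, it takes the 3x3 homography as already estimated, and the helper names `invert_3x3` and `apply_homography` are hypothetical.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix given as nested lists, using the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m, point):
    """Map an (x, y) point through a 3x3 homography in homogeneous form."""
    x, y = point
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (xh / w, yh / w)


# Example: a camera pan that shifts the background by (+5, -3) pixels.
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
H_inv = invert_3x3(H)
# A background pixel seen at (15, 7) in the current frame maps back to
# its position in the previous frame's coordinates.
compensated = apply_homography(H_inv, (15.0, 7.0))
```

After compensation, static background points keep the same coordinates from frame to frame, so any remaining displacement can be attributed to the objects themselves.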

The camera motion compensation module 250 transmits the compensation image frames (1_frame_am to n_frame_am), generated by removing the movement of the image capturing camera 100 as described above, to the object detection module 270 (S350).

The object detection module 270 detects an object to be tracked (e.g., a person) in the compensation image frames (1_frame_am to n_frame_am) transmitted from the camera motion compensation module 250 (S400), and extracts a feature vector representing the appearance of the detected object (S430).

The object detection module 270 according to an embodiment of the present invention can detect a person more precisely by recognizing the key points of the person's body parts (e.g., eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles) and then working backward from them to estimate the person's entire region (S400).

Thereafter, the object detection module 270 extracts a feature vector (appearance vector) obtained from the pixel values at each key point of the detected object (S430).

Subsequently, as described above, the object tracking module 290 uses a Kalman filter to estimate the position at which an object that was being tracked up to the previous image frame (e.g., k-1_frame_am), such as the tracking object (Tr_obj), is predicted to be in the current frame (e.g., k_frame_am) (S500).
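The prediction step (S500) can be sketched for a single coordinate axis as follows. This is an illustration under assumptions that the description does not state: a constant-velocity motion model with state [position, velocity], process noise `q` added to the diagonal of the covariance, and a frame interval `dt` of one. The function name `kalman_predict_axis` is hypothetical, and only the predict step of the filter is shown.

```python
def kalman_predict_axis(pos, vel, P, dt=1.0, q=0.01):
    """Kalman predict step for one axis with state [pos, vel].

    P is the 2x2 state covariance as nested lists. The motion model
    (constant velocity) and the noise level q are assumptions.
    Returns the predicted position, velocity, and covariance.
    """
    # State prediction: x' = F x with F = [[1, dt], [0, 1]].
    pos_pred = pos + vel * dt
    (p00, p01), (p10, p11) = P
    # Covariance prediction: P' = F P F^T + q * I.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return pos_pred, vel, P_pred


# Predict the x and y coordinates of a track independently.
x_pred, vx, Px = kalman_predict_axis(10.0, 2.0, [[1.0, 0.0], [0.0, 1.0]], q=0.0)
y_pred, vy, Py = kalman_predict_axis(40.0, -1.0, [[1.0, 0.0], [0.0, 1.0]], q=0.0)
```

The growth of the predicted covariance between updates reflects increasing uncertainty, which is one reason a search range around the predicted position is a natural companion to this step.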

Thereafter, the object tracking module 290 compares the tracking object (Tr_obj) with the candidate objects detected by the object detection module 270 within a predetermined range adjacent to the estimated position (e.g., Ca_obj1 to Ca_obj3) or outside that range (e.g., Ca_obj4 to Ca_obj6), and determines whether they match (S550).

Here, matching means that the object tracking module 290 determines, in this comparison, that the candidate object (Ca_obj) and the tracking object (Tr_obj) are the same object.
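A weighted combination of location similarity and appearance similarity, with the weights depending on whether the candidate lies within the predetermined range, can be sketched as follows. Everything concrete here is an assumption for illustration: the cosine similarity for appearance, the distance-based location similarity, the range `radius`, and the weights 0.7 and 0.3 are hypothetical values, as are the function names.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def location_similarity(predicted, candidate, scale=50.0):
    """Map distance to (0, 1]; 1.0 means the positions coincide."""
    return 1.0 / (1.0 + math.dist(predicted, candidate) / scale)


def match_score(track_pos, track_feat, cand_pos, cand_feat, radius=50.0):
    """Weighted similarity between a tracked object and one candidate."""
    inside = math.dist(track_pos, cand_pos) <= radius
    # Inside the range, location counts more; outside, appearance does.
    w_loc = 0.7 if inside else 0.3
    w_app = 1.0 - w_loc
    return (w_loc * location_similarity(track_pos, cand_pos)
            + w_app * cosine_similarity(track_feat, cand_feat))
```

Setting `w_loc` to 0 in such a scheme reduces the score to appearance similarity alone, which corresponds to the case in which matching is performed regardless of location.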

The method by which the object tracking module 290 compares the candidate objects with the tracking object to determine whether they match (S550) has been described above, so a redundant description is omitted.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from its essential characteristics.

Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

10: tracking system
100: image capturing camera
200: image analysis server
230: camera motion estimation module
250: camera motion compensation module
270: object detection module
290: object tracking module

Claims

A system for tracking an object in a dynamic camera image in real time, comprising:
an image capturing camera capable of horizontal movement, vertical movement, and rotation, which generates moving image frames captured while moving over a specific area; and
an image analysis server that generates compensation image frames from which the motion of the image capturing camera is removed by applying, to the moving image frames generated by the image capturing camera, an inverse homography transformation of the motion vectors resulting from the motion of the image capturing camera, and that tracks an object, detected by key points in the generated compensation image frames, according to location similarity and appearance similarity.

The system of claim 1, wherein the image analysis server comprises:
a camera motion estimation module that calculates motion vectors resulting from the motion of the image capturing camera with respect to the moving image frames;
a camera motion compensation module that generates compensation image frames from which the motion of the image capturing camera is removed by applying an inverse homography transformation of the calculated motion vectors to the moving image frames;
an object detection module that detects the object in the compensation image frames by recognizing the key points of each part of the object; and
an object tracking module that tracks the detected object according to the location similarity and the appearance similarity.

The system of claim 2, wherein the camera motion estimation module:
converts the pixel values of the moving image frames to gray scale, generates a grid composed of grid points arranged at regular intervals in a previous image frame among the moving image frames, and selects as a feature point each grid point whose gray scale value change in the current image frame relative to the previous image frame is greater than or equal to a predetermined reference value; and
estimates the predicted positions of the selected feature points in the next image frame using the pyramidal Lucas-Kanade algorithm to obtain motion vectors of feature point pairs, calculates a two-dimensional histogram of those motion vectors, and takes the highest-frequency motion vectors as the motion vectors resulting from the motion of the image capturing camera.

The system of claim 2, wherein the camera motion compensation module:
calculates, based on the motion vectors resulting from the motion of the image capturing camera, a homography matrix that maps feature points in a previous image frame among the moving image frames to specific positions in the current image frame; and
applies the inverse of the calculated homography to all pixels of the current image frame to generate compensation image frames in which the motion of the image capturing camera in the current image frame relative to the previous image frame is removed.

The system of claim 2, wherein the object detection module extracts a feature vector representing the appearance of the detected object according to the recognized key points.

The system of claim 2, wherein the object tracking module:
uses a Kalman filter to estimate the position at which a tracking object, which among the objects detected by the object detection module was being tracked up to the previous image frame, is predicted to be in the current image frame;
determines the location similarity by comparing the positions of candidate objects detected in the current image frame by the object detection module with the position of the tracking object estimated using the Kalman filter;
determines the appearance similarity by comparing the feature vector of the tracking object with the feature vectors of the candidate objects; and
tracks the tracking object by determining whether the tracking object and a candidate object are the same according to the determined location similarity and appearance similarity.

The system of claim 6, wherein the object tracking module:
determines whether the tracking object and the candidate object are the same by assigning weights in a predetermined ratio to the determined location similarity and appearance similarity; and
gives the location similarity a greater weight than the appearance similarity when the candidate objects are detected within a predetermined range adjacent to the estimated position of the tracking object, and gives the appearance similarity a greater weight than the location similarity when the candidate objects are detected outside that range.

A method of tracking an object in a dynamic camera image in real time, comprising:
transmitting moving image frames, captured by an image capturing camera while moving, to a camera motion estimation module;
estimating, by the camera motion estimation module, the motion of the image capturing camera by estimating motion vectors of the image capturing camera from the transmitted moving image frames;
generating, by a camera motion compensation module, compensation image frames in which the estimated motion of the image capturing camera is removed from the moving image frames, and transmitting them to an object detection module;
detecting, by the object detection module, a tracking object in a given compensation image frame among the transmitted compensation image frames, and extracting a feature vector of the detected tracking object;
estimating, by an object tracking module using a Kalman filter, an expected position of the tracking object in the compensation image frame following the given compensation image frame;
detecting, by the object detection module, one or more candidate objects in the compensation image frame following the given compensation image frame, and extracting a feature vector of each detected candidate object; and
determining, by the object tracking module, whether the tracking object matches the candidate objects detected within or outside a predetermined range adjacent to the estimated expected position, by comparing their location similarity and their appearance similarity according to the extracted feature vectors.

The method of claim 8, wherein the step of estimating, by the camera motion estimation module, the motion of the image capturing camera comprises:
converting pixel values of the transmitted moving image frames to gray scale;
generating a grid composed of grid points arranged at regular intervals in the moving image frames;
selecting as a feature point, for each of the grid points, a grid point whose gray scale value change in a current image frame relative to a previous image frame is greater than or equal to a predetermined reference value;
estimating the positions of the feature points in a next image frame using a predetermined optical flow method;
obtaining motion vectors of feature point pairs, each pair consisting of a feature point selected in the current image frame and its estimated position in the next image frame; and
calculating the motion vectors of the image capturing camera by calculating a two-dimensional histogram of the obtained motion vectors and determining the motion vectors with the highest frequency.

The method of claim 9, wherein the step of generating, by the camera motion compensation module, the compensation image frames and transmitting them to the object detection module comprises:
calculating a homography matrix that maps feature points in the previous image frame to specific positions in the current image frame using the determined highest-frequency motion vectors;
generating the compensation image frames, from which the motion of the image capturing camera is removed, by applying the inverse of the calculated homography matrix to pixels in the moving image frame; and
transmitting the generated compensation image frames to the object detection module.