KR102614895B1

KR102614895B1 - Real-time object tracking system and method in moving camera video

Info

Publication number: KR102614895B1
Application number: KR1020210018306A
Authority: KR
Inventors: 천세욱; 김종헌
Original assignee: 주식회사 라온버드
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2023-12-19
Also published as: KR20220114819A

Abstract

A system for tracking objects in dynamic camera video in real time is disclosed. The tracking system includes a video capture camera and a video analysis server. The video capture camera is capable of horizontal movement, vertical movement, and rotation, and generates moving image frames captured while moving across a specific area. For the moving image frames generated by the video capture camera, the video analysis server applies an inverse homography transformation to the motion vectors caused by the camera's movement, thereby generating compensated image frames from which the camera's movement has been removed. It then tracks objects detected according to key points in the compensated image frames on the basis of location similarity and appearance similarity.

Description

System and method for real-time tracking objects in dynamic camera images {REAL-TIME OBJECT TRACKING SYSTEM AND METHOD IN MOVING CAMERA VIDEO}

An embodiment according to the concept of the present invention relates to technology for tracking, in real time, an object in video captured by a dynamic camera. More specifically, it relates to technology that tracks objects faster and more accurately by removing the camera's motion component in real time from video captured by a moving camera and by extracting and comparing a feature vector for each key point of a detected object.

In modern society, concern for personal and public safety is growing as social anxiety rises with the frequent occurrence of violent crime. Accordingly, CCTV (closed-circuit television) cameras are increasingly being installed in urban residential areas, schools, roads, and similar places to prevent incidents and accidents and to resolve them quickly. These CCTV systems are evolving from simple recording devices into intelligent CCTV that also performs real-time monitoring and reporting, by extracting key feature points from video to recognize objects. In addition, recent CCTV systems have begun capturing video with dynamic cameras such as PTZ (Pan-Tilt-Zoom) cameras to overcome the limits of a CCTV camera's coverage and resolution. However, conventional and recent CCTV systems only propose ways to recognize and track objects in video captured by a fixed camera; they do not propose a way to recognize and track objects in real time in video captured by a moving camera.

The technical problem to be solved by the present invention is to provide a system for tracking objects in dynamic camera video in real time that can track objects in video captured by moving cameras as well as fixed cameras, in complex high-density environments as well as low-density environments, and in video currently being captured as well as previously recorded video.

Another technical problem to be solved by the present invention is to provide a method for tracking objects in dynamic camera video in real time that can track objects in video captured by moving cameras as well as fixed cameras, in complex high-density environments as well as low-density environments, and in video currently being captured as well as previously recorded video.

A system for tracking objects in dynamic camera video in real time according to an embodiment of the present invention includes a video capture camera and a video analysis server. The video capture camera is capable of horizontal movement, vertical movement, and rotation, and generates moving image frames captured while moving across a specific area. For the moving image frames generated by the video capture camera, the video analysis server applies an inverse homography transformation to the motion vectors caused by the camera's movement, thereby generating compensated image frames from which the camera's movement has been removed, and tracks objects detected according to key points in the compensated image frames on the basis of location similarity and appearance similarity.

Here, the video analysis server includes: a camera motion estimation module that calculates, for the moving image frames, the motion vectors caused by the movement of the video capture camera; a camera motion compensation module that generates compensated image frames, from which the movement of the video capture camera has been removed, by applying an inverse homography transformation to the calculated motion vectors; an object detection module that detects an object by recognizing, in the compensated image frames, the key points corresponding to each part of the object; and an object tracking module that tracks the detected object according to the location similarity and the appearance similarity.

According to an embodiment, the camera motion estimation module converts the pixel values of the moving image frames to grayscale; generates, in a previous image frame among the moving image frames, a grid of grid points arranged at regular intervals; selects as feature points those grid points whose change in grayscale value from the previous image frame to the current image frame is at or above a predetermined threshold; estimates the expected positions of the selected feature points in the next image frame using the pyramidal Lucas-Kanade algorithm to obtain the motion vectors of feature point pairs; and computes a two-dimensional histogram of those motion vectors, taking the highest-frequency motion vectors as the motion vectors caused by the movement of the video capture camera.

According to an embodiment, the camera motion compensation module calculates, from the motion vectors caused by the movement of the video capture camera, a homography matrix that maps feature points in a previous image frame among the moving image frames to specific positions in the current image frame. It then applies the inverse of the calculated homography to every pixel of the current image frame, generating compensated image frames in which the camera movement in the current image frame relative to the previous image frame has been removed.

According to an embodiment, the object detection module extracts, according to the recognized key points, a feature vector representing the appearance of the detected object.

According to an embodiment, the object tracking module uses a Kalman filter to estimate where a tracked object, that is, an object detected by the object detection module and tracked up to the previous image frame, is predicted to be in the current image frame. It determines the location similarity by comparing the positions of the candidate objects detected in the current image frame by the object detection module with the position of the tracked object estimated by the Kalman filter, and determines the appearance similarity by comparing the feature vector of the tracked object with the feature vectors of the candidate objects. It then tracks the tracked object by deciding, from the location similarity and the appearance similarity, whether the tracked object and a candidate object are the same object.
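The position estimate described above can be illustrated with the prediction step of a Kalman filter. The following is a minimal sketch, assuming a constant-velocity motion model applied independently to each image axis; the function name `kalman_predict_axis`, the time step `dt`, and the process-noise value `q` are hypothetical and do not come from the patent, which does not specify the filter's state model.

```python
def kalman_predict_axis(pos, vel, P, dt=1.0, q=0.01):
    """Prediction step of a constant-velocity Kalman filter for one axis.

    pos, vel : current position and velocity estimates
    P        : 2x2 covariance [[p00, p01], [p10, p11]]
    dt, q    : assumed frame interval and process noise (illustrative values)
    """
    # State transition x' = F x with F = [[1, dt], [0, 1]]
    pos_pred = pos + dt * vel
    # Covariance propagation P' = F P F^T + q * I, written out element by element
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    return pos_pred, vel, [[p00, p01], [p10, p11]]


# Predict where a tracked object at (10, 20) moving (+2, -1) px/frame will be next.
identity = [[1.0, 0.0], [0.0, 1.0]]
x_pred, vx, Px = kalman_predict_axis(10.0, 2.0, identity)
y_pred, vy, Py = kalman_predict_axis(20.0, -1.0, identity)
```

Only the prediction step is shown; a full tracker would also correct the state with the matched detection in each frame.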

Here, the object tracking module decides whether the tracked object and a candidate object are the same by weighting the determined location similarity and appearance similarity in a predetermined ratio. When the candidate objects are detected within a predetermined range adjacent to the estimated position of the tracked object, the location similarity is given a greater weight than the appearance similarity; when the candidate objects are detected outside that range, the appearance similarity is given a greater weight than the location similarity.
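The weighting rule can be sketched as follows. This is an illustration only: the patent states that the weights follow a predetermined ratio but gives no values, so the 0.7/0.3 split, the assumption that both similarities lie in [0, 1], and the function name `match_score` are all hypothetical.

```python
def match_score(loc_sim, app_sim, within_range, major=0.7):
    """Combine location and appearance similarity into one matching score.

    loc_sim, app_sim : similarities, assumed here to lie in [0, 1]
    within_range     : True if the candidate lies inside the predetermined
                       range around the predicted position
    major            : assumed weight of the favoured similarity (illustrative)
    """
    minor = 1.0 - major
    if within_range:
        # Near the predicted position: trust location more than appearance.
        w_loc, w_app = major, minor
    else:
        # Far from the predicted position: trust appearance more than location.
        w_loc, w_app = minor, major
    return w_loc * loc_sim + w_app * app_sim


near = match_score(0.9, 0.2, within_range=True)
far = match_score(0.9, 0.2, within_range=False)
```

With these illustrative weights, a candidate that is close in position but looks different scores higher inside the range than outside it, which is the behaviour the paragraph describes.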

A method for tracking objects in dynamic camera video in real time according to an embodiment of the present invention includes: transmitting, to a camera motion estimation module, moving image frames captured by a video capture camera while it moves; estimating, by the camera motion estimation module, the movement of the video capture camera by calculating a motion vector for the video capture camera from the transmitted moving image frames; generating, by a camera motion compensation module, compensated image frames in which the estimated movement of the video capture camera has been removed from the moving image frames, and transmitting them to an object detection module; detecting, by the object detection module, a tracked object in a given compensated image frame among the transmitted compensated image frames and extracting a feature vector of the detected tracked object; estimating, by an object tracking module using a Kalman filter, the expected position of the tracked object in the compensated image frame that follows the given compensated image frame; detecting, by the object detection module, candidate objects in that following compensated image frame and extracting feature vectors of the detected candidate objects; and determining, by the object tracking module, whether there is a match by comparing the tracked object with the candidate objects detected within or outside a predetermined range adjacent to the estimated expected position, in terms of location similarity and of appearance similarity based on the extracted feature vectors.

According to an embodiment, the step in which the camera motion estimation module estimates the movement of the video capture camera includes: converting the pixel values of the transmitted moving image frames to grayscale; generating, for the moving image frames, a grid of grid points arranged at regular intervals; selecting as feature points those grid points whose change in grayscale value from the previous image frame to the current image frame is at or above a predetermined threshold; estimating the positions of the feature points in the next image frame using a predetermined optical flow method; obtaining the motion vectors of feature point pairs from the feature points selected in the current image frame and the estimated feature points in the next image frame; and calculating the motion vector for the video capture camera by computing a two-dimensional histogram of the obtained motion vectors and determining the highest-frequency motion vectors.
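The final step, picking the highest-frequency motion vectors from a two-dimensional histogram, can be sketched as below. The sketch assumes the motion vectors have already been obtained from an optical flow step; the bin size of one pixel and the function name `dominant_motion_vectors` are hypothetical choices, since the patent does not specify the histogram resolution.

```python
import math
from collections import defaultdict


def dominant_motion_vectors(vectors, bin_size=1.0):
    """Return the motion vectors that fall in the most populated 2-D bin.

    vectors  : non-empty list of (dx, dy) displacements of feature point pairs
    bin_size : assumed width of a histogram bin in pixels (illustrative)
    """
    bins = defaultdict(list)
    for dx, dy in vectors:
        # Quantise each displacement onto a 2-D grid of (dx, dy) bins.
        key = (math.floor(dx / bin_size), math.floor(dy / bin_size))
        bins[key].append((dx, dy))
    # The fullest bin is taken as the background (camera) motion.
    return max(bins.values(), key=len)


# Four background points move about (+5, 0); one point on a moving object does not.
vectors = [(5.1, 0.2), (5.3, 0.1), (-3.0, 7.5), (5.2, 0.4), (5.4, 0.3)]
camera_vectors = dominant_motion_vectors(vectors)
```

Because most grid points lie on the static background, the dominant bin reflects the camera's movement, while vectors from independently moving objects land in sparsely populated bins and are discarded.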

According to an embodiment, the step in which the camera motion compensation module generates the compensated image frames and transmits them to the object detection module includes: calculating, from the determined highest-frequency motion vectors, a homography matrix that maps the feature points in the previous image frame to specific positions in the current image frame; generating the compensated image frames, from which the movement of the video capture camera has been removed, by applying the inverse of the calculated homography matrix to the pixels in the moving image frame; and transmitting the generated compensated image frames to the object detection module.
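The inverse homography step can be illustrated on individual pixel coordinates. This is a minimal sketch, assuming the 3x3 homography matrix has already been calculated; the matrix `H` below is a made-up pure translation, and the helper names `invert_3x3` and `apply_homography` are hypothetical.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("homography matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def apply_homography(h, x, y):
    """Map pixel (x, y) through homography h using homogeneous coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w


# Assumed camera motion: the background shifted by (+5, -2) pixels between frames.
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
H_inv = invert_3x3(H)
# A background pixel seen at (15, 8) in the current frame returns to (10, 10).
compensated = apply_homography(H_inv, 15.0, 8.0)
```

Producing a full compensated frame would additionally require resampling every pixel of the image, which is omitted here.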

As described above, the system and method for tracking objects in dynamic camera video in real time according to an embodiment of the present invention can derive the movement of the object itself by excluding the movement of the camera from the movement of the object observed within the image frame. Objects can therefore be tracked stably in video from non-fixed cameras as well as fixed cameras.

In addition, because the system and method according to an embodiment of the present invention can exclude camera movement very quickly while minimizing system load, objects can be tracked in real time in video currently being captured as well as in previously recorded video.

Furthermore, because the system and method according to an embodiment of the present invention recognize a key point for each part of a person's body, objects can be tracked precisely not only in low-density environments but also in complex high-density environments where objects overlap.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a block diagram showing the configuration of a system for tracking objects in dynamic camera video in real time according to an embodiment of the present invention.
FIG. 2 is an example in which the camera motion estimation module shown in FIG. 1 marks grid points and feature points on an image frame.
FIG. 3 is an example in which the camera motion estimation module shown in FIG. 1 computes a two-dimensional histogram of motion vectors to determine the high-frequency motion vectors.
FIG. 4 is an example in which the inverse homography transformation according to an embodiment of the present invention is applied.
FIG. 5 is a flowchart illustrating a method for tracking objects in dynamic camera video in real time according to an embodiment of the present invention.

The specific structural and functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only to explain those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to the specific disclosed forms; they include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but other components may also be present in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between. Other expressions describing the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this specification.

Hereinafter, the present invention is described in detail by explaining preferred embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a system for tracking objects in dynamic camera video in real time (hereinafter, the "tracking system 10") according to an embodiment of the present invention.

Referring to FIG. 1, the tracking system 10 includes a video capture camera 100 that captures a desired area of a specific region, and a video analysis server 200 that removes the camera's motion component from the video of the captured area and tracks objects according to the objects' motion component.

The video capture camera 100 may be implemented not only as a general CCTV camera that captures a fixed area but also as a dynamically moving camera, for example a pan-tilt-zoom (PTZ) camera capable of horizontal movement, vertical movement, and zoom adjustment.

That is, the video capture camera 100 can move in order to capture a specific region or a specific object, and transmits the moving image frames captured while moving (1_frame to n_frame) to the video analysis server 200.

The video analysis server 200 includes a camera motion estimation module 230, a camera motion compensation module 250, an object detection module 270, and an object tracking module 290. For the image frames transmitted from the video capture camera 100, it identifies and tracks the movement of objects after the camera's motion component has been removed.

First, when the camera motion estimation module 230 of the video analysis server 200 obtains the moving image frames (1_frame to n_frame) from the video capture camera 100, it estimates the movement of the video capture camera 100 in the current image frame (for example, k_frame) relative to the previous image frame (for example, k-1_frame).

Here, the camera motion estimation module 230 estimates the movement of the video capture camera 100 on the basis of a real-time video stitching algorithm, by determining how the position of the background (for example, buildings or shops) has changed in the current image frame (k_frame) relative to the previous image frame (k-1_frame).

When objects appear to have moved in the current image frame (for example, k_frame) relative to the previous image frame (for example, k-1_frame), the objects themselves may have moved, the video capture camera 100 may have moved (changing, for example, the position of the background), or both the objects and the video capture camera 100 may have moved.

Object movement can thus arise for several reasons, and the camera motion estimation module 230 excludes the movement of objects within the image frame and estimates only the camera movement within the image frame.

The camera motion estimation module 230 estimates the camera movement because what can be obtained directly from the image frames is the "movement of the object within the image frame". Excluding the "movement of the camera" from this "movement of the object within the image frame" yields the "movement of the object itself".

The camera motion estimation module 230 estimates the movement of the video capture camera 100 from the moving image frames as follows.

First, the camera motion estimation module 230 generates, on the image frame, a grid made up of grid points arranged at regular intervals. Each grid point holds not only its pixel value in the image frame but also position information consisting of an x-coordinate and a y-coordinate.

Next, the camera motion estimation module 230 extracts, from the generated grid points, the feature points that serve as the reference for determining movement within the image frame.

General feature point extraction algorithms could be used for this purpose, such as SIFT (Scale-Invariant Feature Transform), which uses an image pyramid, or SURF (Speeded-Up Robust Features), which uses the determinant of the Hessian. However, SIFT and SURF require a large amount of computation to extract and match feature points, so they are not suitable for a real-time environment.

The camera motion estimation module 230 therefore uses a feature point extraction method that computes significantly faster than these conventional general-purpose methods.

The way in which the camera motion estimation module 230 extracts the feature points is described in more detail below.

First, the camera motion estimation module 230 converts the pixel values of the image frames from color values to an achromatic scale, that is, gray scale.

Depending on the embodiment, the gray scale may be set within the range 0.0 to 1.0 or within the range 0 to 255.

Next, for each grid point, the camera motion estimation module 230 calculates the change in gray-scale value in the current image frame (k_frame) relative to the previous image frame (e.g., k-1_frame), and selects the grid point as a feature point only if the calculated change is equal to or greater than a predetermined threshold.

For example, if a first grid point with x-coordinate 10 and y-coordinate 10 has a gray-scale value of 1.0 (white) in the previous image frame (k-1_frame) and a gray-scale value of 0.0 (black) at the same coordinates in the current image frame (k_frame), the change in the gray-scale value of the first grid point is 1.0.

If the predetermined threshold is 0.5, the change of 1.0 at the first grid point exceeds the threshold, so the camera motion estimation module 230 selects the first grid point as a feature point.

In this way, the camera motion estimation module 230 selects the feature points by calculating the change in gray-scale value at each grid point.
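The grid-based selection described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names `make_grid` and `select_feature_points`, the grid spacing, and the toy 20 x 20 frames are assumptions, while the 0.0 to 1.0 gray scale, the threshold of 0.5, and the grid point (10, 10) come from the example in the text.

```python
def make_grid(width, height, step):
    """Return (x, y) grid points spaced `step` pixels apart (hypothetical helper)."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def select_feature_points(prev_gray, curr_gray, grid, threshold=0.5):
    """Keep grid points whose gray-scale change is at least `threshold`.

    prev_gray and curr_gray are row-major 2D lists of floats in [0.0, 1.0].
    """
    selected = []
    for x, y in grid:
        change = abs(curr_gray[y][x] - prev_gray[y][x])
        if change >= threshold:
            selected.append((x, y))
    return selected


# Toy frames: everything is black except (10, 10), which flips from white to black.
prev_frame = [[0.0] * 20 for _ in range(20)]
curr_frame = [[0.0] * 20 for _ in range(20)]
prev_frame[10][10] = 1.0

grid = make_grid(20, 20, step=10)
features = select_feature_points(prev_frame, curr_frame, grid, threshold=0.5)
print(features)
```

Only one subtraction and one comparison are needed per grid point, which is consistent with the text's claim that this selection is much cheaper than SIFT or SURF.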

FIG. 2 is an example diagram in which the camera motion estimation module 230 shown in FIG. 1 marks grid points and feature points on an image frame.

Referring to FIG. 2, a grid is generated over the image frame, and the grid is composed of grid points arranged at predetermined intervals.

Among the grid points, those shown in yellow are the finally selected feature points, and those shown in red are grid points that did not meet the predetermined threshold and were therefore not selected as feature points.

Referring again to FIG. 1, the camera motion estimation module 230 uses a predetermined optical flow method to estimate where the feature points extracted from the current image frame (k_frame) have moved to in the next image frame (k+1_frame).

Depending on the embodiment, the optical flow method may be the block matching method, the Horn-Schunck algorithm, the Lucas-Kanade algorithm, or Gunnar Farnebäck's algorithm. However, the camera motion estimation module 230 according to an embodiment of the present invention preferably uses the pyramidal Lucas-Kanade algorithm (Iterative Lucas-Kanade Method with Pyramids), which can analyze the optical flow at the extracted feature points and, because its accuracy is high relative to its running time, is the most suitable for real-time video.

That is, the camera motion estimation module 230 can use the pyramidal Lucas-Kanade algorithm to estimate where the feature points selected as described above have moved to in the next image frame (k+1_frame).

Using the position of each feature point in the current image frame (k_frame) and its estimated position in the next image frame (k+1_frame), the camera motion estimation module 230 obtains the motion vector of each feature point pair (the feature point in the current image frame and the corresponding estimated feature point in the next image frame).

For example, if the position of a first feature point among the feature points in the current image frame (k_frame) is (41, 25) and its estimated position in the next image frame (k+1_frame) is (73, 81), the motion vector of the first feature point is V1 = (73 - 41, 81 - 25) = (32, 56).

In the same way, the camera motion estimation module 230 estimates the position in the next image frame (k+1_frame) of every feature point in the current image frame (k_frame) and obtains the motion vector of each feature point pair.
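The motion vector computation is a per-pair subtraction, as the short sketch below illustrates. The function name `motion_vectors` and the second feature point pair are assumptions added for illustration; the first pair, (41, 25) and (73, 81), is the example from the text. The estimated positions would in practice come from the optical flow step, which is not reproduced here.

```python
def motion_vectors(current_points, next_points):
    """Return (dx, dy) for each feature point pair (hypothetical helper)."""
    return [
        (nx - cx, ny - cy)
        for (cx, cy), (nx, ny) in zip(current_points, next_points)
    ]


current_points = [(41, 25), (10, 10)]   # positions in k_frame
next_points = [(73, 81), (47, 61)]      # estimated positions in k+1_frame
vectors = motion_vectors(current_points, next_points)
print(vectors)
```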

Among the motion vectors of these feature point pairs, some may result solely from the motion of an object within the image frame, and others may result solely from the motion of the video capture camera 100.

Moving objects such as people and cars occupy only part of an image frame, whereas background elements such as buildings occupy most of it. Motion vectors arising from the background can therefore be attributed to the motion of the video capture camera 100.

Accordingly, a two-dimensional histogram (2D histogram) is computed over all motion vectors to find the most frequent motion vectors, and the motion vectors in the highest-frequency bin are judged to be the background motion vectors, that is, the motion vectors caused by the motion of the video capture camera 100.

In practice, an image frame may contain hundreds of feature point pair motion vectors or more. For convenience of explanation, however, this description assumes that there are four motion vectors in total and that the bins of the two-dimensional histogram group x and y in units of 10.

For example, assume that the first motion vector (V1) is (32, 56), the second motion vector (V2) is (41, 25), the third motion vector (V3) is (37, 51), and the fourth motion vector (V4) is (35, 53).

When the first to fourth motion vectors (V1 to V4) are placed into bins of 10 units in x and y, V1 falls into bin (3, 5), V2 into bin (4, 2), V3 into bin (3, 5), and V4 into bin (3, 5).

The bin containing the most motion vectors is bin (3, 5), which holds V1, V3, and V4. These three motion vectors are therefore the highest-frequency motion vectors (i.e., the background motion vectors).

Accordingly, the camera motion estimation module 230 judges the first motion vector (V1), the third motion vector (V3), and the fourth motion vector (V4), which are associated with the background, to be the motion vectors caused by the motion of the video capture camera 100.
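The four-vector example above can be reproduced with a small sketch of the 2D histogram vote. The function name `background_vectors` and the use of floor division for binning are assumptions; the bin size of 10 and the vectors V1 to V4 are taken from the text.

```python
from collections import defaultdict


def background_vectors(vectors, bin_size=10):
    """Bin motion vectors in 2D and return the fullest bin with its members."""
    bins = defaultdict(list)
    for vx, vy in vectors:
        key = (int(vx // bin_size), int(vy // bin_size))
        bins[key].append((vx, vy))
    # The bin holding the most vectors is taken as the background (camera) motion.
    top_bin = max(bins, key=lambda k: len(bins[k]))
    return top_bin, bins[top_bin]


V1, V2, V3, V4 = (32, 56), (41, 25), (37, 51), (35, 53)
top_bin, camera_vectors = background_vectors([V1, V2, V3, V4])
print(top_bin, camera_vectors)
```

With these inputs the fullest bin is (3, 5) and it holds V1, V3, and V4, matching the result stated in the description. V2 is left out and would be treated as object motion.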

The video capture camera 100 may move simply in the horizontal (or vertical) direction, but it may also rotate (moving in the horizontal and vertical directions at the same time). The motion caused by the video capture camera 100 can therefore appear, as above, as a plurality of motion vectors with similar magnitudes and directions (e.g., V1, V3, and V4).

FIG. 3 is an example diagram in which the camera motion estimation module 230 computes a two-dimensional histogram of the motion vectors and determines the high-frequency motion vector.

Referring to FIG. 3, the red line is the motion vector finally selected as the camera motion, indicating that between the previous frame and the current frame there was motion of roughly the direction and magnitude of the red line.

Referring again to FIG. 1, the camera motion compensation module 250 uses the motion vectors that the camera motion estimation module 230 judged to be caused by the motion of the video capture camera 100 to calculate a homography matrix that maps feature points in the previous image frame (k-1_frame) to specific positions in the current image frame (k_frame) by a homography transformation.

In general, a homography refers to a transformation that converts between two images of 3D space, each projected into 2D, as viewed from two different viewpoints. The matrix expressing the relationship between the two images is called the homography matrix.

That is, the homography matrix is the matrix that transforms specific feature points located in the previous image frame (k-1_frame) so that they lie at specific positions in the current image frame (k_frame).

Here, the specific feature points are those whose feature point pair motion vectors were judged to be caused by the motion of the video capture camera 100, and the specific positions in the current image frame (k_frame) are the positions at which the optical flow method estimates those feature points to lie in the current image frame (k_frame).

Therefore, when the homography transformation is applied, the specific feature points located in the previous image frame (k-1_frame) land at those specific positions in the current frame (k_frame).

In other words, the homography in this specification represents the component of an object's displacement that is caused by the motion of the video capture camera 100, excluding the object's own motion.

For feature points in the previous image frame (k-1_frame) other than the specific feature points, the position in the current image frame (k_frame) obtained by applying the homography transformation may differ from the position estimated in the current image frame (k_frame) by the optical flow method described above.

This difference arises because feature points other than the specific feature points have undergone not only the displacement caused by the motion of the video capture camera 100 (i.e., the homography transformation) but also the motion of the object itself.

All feature points in the current image frame (k_frame) hold position information estimated by the optical flow method, and applying the inverse homography transformation to them yields position information from which the motion of the video capture camera 100 has been removed.

That is, after the inverse homography transformation, background objects have the same positions as in the previous image frame (k-1_frame) because the motion of the video capture camera 100 has been removed, while objects that are not part of the background (e.g., people and cars) have positions that reflect only their own motion.

For this reason, the camera motion compensation module 250 applies the inverse of the homography calculated above to every pixel of the current image frame (k_frame), thereby generating compensated image frames (e.g., 1_frame_am to n_frame_am) in which the motion of the video capture camera 100 between the previous image frame and the current image frame (k_frame) has been removed.

As a result, every pixel in the current compensated image frame (k_frame_am), to which the inverse homography transformation has been applied, has position information from which the motion of the video capture camera 100 has been removed.
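The effect of the inverse homography on point positions can be illustrated with a small pure-Python sketch. Several things here are assumptions made for illustration: the helper names `invert_3x3` and `apply_homography`, the choice of a pure-translation homography (a real estimate would generally have all eight degrees of freedom), and the moving-object coordinates. The sketch transforms individual points only, whereas the description applies the inverse transformation to every pixel of the frame.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix via the adjugate (hypothetical helper)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m, point):
    """Map (x, y) through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (u / w, v / w)


# Assumed camera motion: the background shifts by (32, 56) between frames.
H = [[1.0, 0.0, 32.0], [0.0, 1.0, 56.0], [0.0, 0.0, 1.0]]
H_inv = invert_3x3(H)

# Background point: was at (41, 25), optical flow places it at (73, 81).
background_compensated = apply_homography(H_inv, (73, 81))

# Moving object: was at (100, 100), optical flow places it at (140, 156).
object_compensated = apply_homography(H_inv, (140, 156))
object_own_motion = (object_compensated[0] - 100, object_compensated[1] - 100)
print(background_compensated, object_own_motion)
```

After compensation the background point returns to its previous position, while the moving object keeps a residual displacement that reflects only its own motion.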

FIG. 4 is an example diagram showing the application of the inverse homography transformation according to an embodiment of the present invention.

FIG. 4(a) shows the moving image frames (k-1_frame and k_frame) before the inverse homography transformation is applied, and FIG. 4(b) shows the compensated image frames (k-1_frame_am and k_frame_am) after it is applied.

Referring to FIG. 4(a), because the video capture camera 100 has moved to the right, a background object (for example, a utility pole) has moved to the left in the current image frame (k_frame) relative to the previous image frame (k-1_frame).

Referring to FIG. 4(b), by contrast, the absolute position of the background object (for example, the utility pole) in the image remains fixed even though the video capture camera 100 has moved.

The camera motion compensation module 250 then transmits the compensated image frames (1_frame_am to n_frame_am), generated by removing the motion of the video capture camera 100 from the image frames as described above, to the object detection module 270.

Referring again to FIG. 1, the object detection module 270 detects the objects to be tracked and counted (e.g., people) in the compensated image frames (1_frame_am to n_frame_am) transmitted from the camera motion compensation module 250, and extracts a feature vector representing the appearance of each detected object.

A typical person detection model recognizes the entire region that a person occupies in the image and returns a bounding box surrounding the person. In contrast, the object detection module 270 according to an embodiment of the present invention recognizes the key points of the person's body (e.g., body parts such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles) and works backward from them to estimate the person's entire region.

By using key points to extract and recognize information about individual body parts, this approach can detect people more precisely.

For example, if only a person's head is visible in the image and everything below the head is hidden by a passing car, a typical person detection model, which computes a score over the person's entire region, is likely to fail to recognize that person as a person.

However, because the object detection module 270 according to an embodiment of the present invention can compute a score for each key point, it obtains a high score for the head even when the score for the whole body is low, and can therefore accurately detect a person whose head alone is visible.

The object detection module 270 then extracts a feature vector representing the appearance of the detected object.

The feature vector is an appearance vector extracted from the pixel values of the target person's region (e.g., the region for each key point), and the object detection module 270 extracts the feature vector using the per-key-point information.

That is, by using the per-key-point information, the object detection module 270 can determine which body parts are visible in the image frame and which are not, and it extracts the feature vector from the visible parts of the target.

Accordingly, the object detection module 270 can accurately recognize people and extract their feature vectors even when the image frame is crowded and several people overlap.

The object tracking module 290 tracks each object using a detection-based tracking method (Track by Detect), which determines whether an object detected in the previous image frame (e.g., k-1_frame_am) and an object detected in the current image frame (e.g., k_frame_am) are the same object.

The compensated image frames (e.g., 1_frame_am to n_frame_am) on which the object tracking module 290 performs this tracking are the image frames that the camera motion compensation module 250 generated by removing the motion of the video capture camera 100.

In the detection-based tracking method, the first image frame (1_frame_am) is the frame at which the video starts. Because no objects are being tracked at that point, the first image frame is an initialization step in which tracking starts for every detected object.

The k-th image frame (k_frame_am) is a step in which the objects tracked up to the (k-1)-th image frame (k-1_frame_am) are matched against the objects detected in the k-th image frame (k_frame_am). Existing tracks that find a match are continued into the current image frame, and new tracks are started for detected objects that match no existing track.
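The initialization and continuation steps of detection-based tracking can be sketched as a simple per-frame update. This is a simplified illustration under stated assumptions: detections are reduced to (x, y) positions, matching is greedy and uses only a hypothetical distance test `close_enough`, and the function name `update_tracks` is invented here. The description itself matches on both position and appearance similarity.

```python
import math
from itertools import count


def update_tracks(tracks, detections, is_match, new_ids):
    """Continue matched tracks and start new tracks for unmatched detections.

    tracks maps track id -> last known detection. With an empty `tracks`
    (the first frame), every detection starts a new track.
    """
    updated = dict(tracks)
    unmatched = list(detections)
    for track_id, last_seen in tracks.items():
        hit = next((d for d in unmatched if is_match(last_seen, d)), None)
        if hit is not None:
            updated[track_id] = hit      # existing track continues
            unmatched.remove(hit)
    for detection in unmatched:
        updated[next(new_ids)] = detection   # new track starts
    return updated


def close_enough(a, b, radius=20.0):
    """Hypothetical matching rule: positions within `radius` pixels."""
    return math.dist(a, b) <= radius


ids = count(1)
tracks = update_tracks({}, [(10, 10), (100, 100)], close_enough, ids)        # frame 1
tracks = update_tracks(tracks, [(12, 11), (300, 50)], close_enough, ids)     # frame k
print(tracks)
```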

In tracking an object with the detection-based tracking method, the object tracking module 290 can track the object according to the position similarity and appearance similarity (similarity of the per-key-point feature vectors) between the objects detected by the object detection module 270.

The method by which the object tracking module 290 tracks an object is described in detail below.

First, the object tracking module 290 uses a Kalman filter to estimate the position at which an object tracked up to the previous image frame (e.g., k-1_frame_am), referred to as the tracked object (Tr_obj), is predicted to be in the current frame (e.g., k_frame_am).
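The description names the Kalman filter but does not specify its state model. The sketch below assumes a constant-velocity model applied independently to each image axis, with a state of (position, velocity); the class name `AxisKalman` and all noise parameters are hypothetical values chosen for illustration.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model)."""

    def __init__(self, pos, pos_var=1.0, vel_var=100.0, process_var=1.0, meas_var=1.0):
        self.x = [pos, 0.0]                          # state: position, velocity
        self.P = [[pos_var, 0.0], [0.0, vel_var]]    # state covariance
        self.q = process_var
        self.r = meas_var

    def predict(self, dt=1.0):
        """Advance one frame: x <- F x, P <- F P F^T + q I, with F = [[1, dt], [0, 1]]."""
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        (p00, p01), (p10, p11) = self.P
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, measured_pos):
        """Correct with a position measurement (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s                    # Kalman gain
        residual = measured_pos - self.x[0]
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# A tracked object seen at x = 0, 10, 20 in three consecutive compensated frames.
kx = AxisKalman(0.0)
for measured_x in (10.0, 20.0):
    kx.predict()
    kx.update(measured_x)
predicted_x = kx.predict()   # predicted x position in the next frame
print(predicted_x)
```

For an object moving about 10 pixels per frame, the prediction for the next frame lands close to 30, which is where the search for candidate objects would be centered.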

Next, the object tracking module 290 compares an object detected by the object detection module 270 within a predetermined range of the estimated position, referred to as a candidate object (Ca_obj), with the tracked object (Tr_obj).

The position estimated by the object tracking module 290 and the position of the object detected by the object detection module 270 are both positions in image frames whose camera motion has already been compensated by the camera motion compensation module 250, so the two positions can be compared directly.

That is, the object tracking module 290 determines the position similarity according to whether the candidate object detected by the object detection module 270 lies within a certain range of the estimated position, and determines the appearance similarity by comparing the feature vector of the tracked object (Tr_obj) with the feature vector of the detected candidate object (Ca_obj).
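The two similarity measures can be sketched as follows. The description does not define the formulas, so both are assumptions: position similarity is modeled as a score that falls linearly from 1.0 at the predicted position to 0.0 at a hypothetical gate radius, and appearance similarity is modeled as the cosine similarity of two feature vectors.

```python
import math


def position_similarity(predicted, detected, gate_radius=50.0):
    """1.0 at the predicted position, 0.0 at or beyond `gate_radius` (assumed form)."""
    distance = math.dist(predicted, detected)
    return max(0.0, 1.0 - distance / gate_radius)


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


predicted_position = (100.0, 100.0)
near_score = position_similarity(predicted_position, (103.0, 104.0))  # 5 px away
far_score = position_similarity(predicted_position, (200.0, 100.0))   # outside the gate
same_look = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
different_look = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(near_score, far_score, same_look, different_look)
```

Because both positions come from camera-compensated frames, the Euclidean distance between them can be used directly without any further correction for camera motion.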

As described above, the feature vector is extracted using per-key-point information (body parts such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles), so an accurate comparison is possible even when the image frame is crowded and several people overlap.

In addition, the object tracking module 290 determines the appearance similarity by comparing the per-key-point feature vectors of the tracked object (Tr_obj) with those of the detected candidate object (Ca_obj), and it may assign a priority to each key point when determining the similarity between objects.

For example, among key points such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles, the highest priority may be assigned to the feature vectors of key points related to the face, such as the eyes, nose, and mouth.
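One way to realize per-key-point priorities is a priority-weighted mean of per-key-point similarities, computed only over key points visible in both objects. The sketch below is an assumption in every detail beyond the idea that face key points rank highest: the priority values, the dictionary layout, and the names `KEYPOINT_PRIORITY` and `appearance_similarity` are hypothetical.

```python
import math

# Hypothetical priorities: face key points count three times as much as others.
KEYPOINT_PRIORITY = {"eye": 3.0, "nose": 3.0, "mouth": 3.0}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def appearance_similarity(track_feats, cand_feats, priority=KEYPOINT_PRIORITY):
    """Priority-weighted mean of per-key-point cosine similarities.

    Each argument maps key point name -> feature vector. Key points missing
    from either object (for example, occluded ones) are skipped.
    """
    shared = [name for name in track_feats if name in cand_feats]
    total_weight = sum(priority.get(name, 1.0) for name in shared)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        priority.get(name, 1.0) * cosine(track_feats[name], cand_feats[name])
        for name in shared
    )
    return weighted / total_weight


tr_obj = {"nose": [1.0, 0.0], "shoulder": [1.0, 0.0]}
ca_obj = {"nose": [1.0, 0.0], "shoulder": [0.0, 1.0], "knee": [1.0, 1.0]}
score = appearance_similarity(tr_obj, ca_obj)
print(score)
```

Here the nose matches and the shoulder does not, but the nose carries three times the weight, so the combined score is 0.75 rather than 0.5.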

In this way, the object tracking module 290 determines whether the candidate object (Ca_obj) and the tracked object (Tr_obj) are the same object based on the position similarity and the appearance similarity. When they are judged to be the same object, they are said to be matched.

When the image frame (e.g., k_frame_am) is crowded enough for several objects to overlap, a plurality of candidate objects (e.g., Ca_obj1 to Ca_obj3) may be detected within the certain range of the estimated position.

In such a case, the object tracking module 290 determines whether there is a match by comparing the appearance similarity and position similarity between each of the candidate objects (Ca_obj1 to Ca_obj3) and the tracked object (Tr_obj).

Here, weights in a predetermined ratio are assigned to the appearance similarity and the position similarity. The weighted appearance similarity and weighted position similarity are combined, and the candidate object among Ca_obj1 to Ca_obj3 that is judged to be the same object as the tracked object (Tr_obj) is matched to it.

Depending on the embodiment, the weight for the appearance similarity may be set to a predetermined value greater than 1, and the weight for the position similarity may be set to a predetermined value less than 1.

This is because, when judging whether two objects are the same, appearance similarity can be regarded as more important than position similarity. Of course, when conditions in the image frame make it difficult to weight appearance similarity heavily (for example, when the object itself is not recognized, when feature vectors could not be extracted for every key point, when the lighting in the image changes, or when the object's posture changes), a higher weight may be set for the position similarity.

For example, when no object is detected within the predetermined range of the estimated position, the object tracking module 290 determines whether there is a match by comparing the appearance similarity and position similarity between the tracked object (Tr_obj) and each of the candidate objects detected outside the predetermined range (e.g., Ca_obj4 to Ca_obj). In this case, the weight for the position similarity may be set to a predetermined value greater than 1, and the weight for the appearance similarity may be set to a predetermined value less than 1.

There may also be an object (Un_obj) that was detected in previous image frames (1_frame_am to k-1_frame_am) but has repeatedly failed to be tracked, and no object may be detected within the predetermined range of the estimated position of that object (Un_obj) in the current image frame (k_frame_am).

In this case, the object tracking module 290 may set the weight for the location similarity to 0 and perform matching between objects using the appearance similarity alone, regardless of the location similarity.
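The weighting scheme described above can be illustrated with a minimal sketch. The passage does not give the similarity formulas at this point, so the inverse-distance location similarity, the cosine appearance similarity, the `scale` and `min_score` parameters, and all function and class names below are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    position: tuple      # (x, y) in the compensated frame
    appearance: list     # appearance (feature) vector


def location_similarity(predicted, observed, scale=50.0):
    """Assumed form: maps Euclidean distance to (0, 1]; 1 means same position."""
    return 1.0 / (1.0 + math.dist(predicted, observed) / scale)


def appearance_similarity(a, b):
    """Assumed form: cosine similarity between two appearance vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_score(predicted_pos, track_appearance, cand, w_loc, w_app):
    """Weighted sum of the two similarities."""
    return (w_loc * location_similarity(predicted_pos, cand.position)
            + w_app * appearance_similarity(track_appearance, cand.appearance))


def best_match(predicted_pos, track_appearance, candidates, w_loc, w_app, min_score):
    """Return the highest-scoring candidate, or None if none reaches min_score."""
    if not candidates:
        return None
    scored = [(match_score(predicted_pos, track_appearance, c, w_loc, w_app), c)
              for c in candidates]
    score, cand = max(scored, key=lambda pair: pair[0])
    return cand if score >= min_score else None
```

Calling `best_match` with `w_loc > 1` and `w_app < 1` corresponds to the first case in the text, and `w_loc = 0` corresponds to matching on appearance alone for an object whose tracking has repeatedly failed.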

For an object (Un_obj) that still fails to match after the above matching process, the position predicted by the Kalman filter is used to update the object's position in the current image frame (k_frame_am).

In addition, if matching fails a certain number of times or more in succession, the object is judged to have disappeared from the image frame (e.g., by boarding a vehicle such as a bus, or by going down into a subway station) and is no longer tracked.
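The handling of unmatched objects can be sketched as a small bookkeeping routine. The `Track` class, the `max_misses` parameter, and its default value are hypothetical; the text only says "a certain number of times" and does not give a value.

```python
from dataclasses import dataclass


@dataclass
class Track:
    position: tuple       # last known (x, y)
    misses: int = 0       # consecutive frames without a match
    active: bool = True   # False once tracking has been abandoned


def update_track(track, matched_position, predicted_position, max_misses=5):
    """Update one track for the current frame.

    matched_position is the matched candidate's position, or None when
    matching failed; predicted_position is the Kalman-filter prediction.
    """
    if matched_position is not None:
        track.position = matched_position
        track.misses = 0
    else:
        # No match: fall back to the predicted position for this frame.
        track.position = predicted_position
        track.misses += 1
        if track.misses >= max_misses:
            # Too many consecutive failures: treat the object as gone.
            track.active = False
    return track
```

Resetting the counter on every successful match means only consecutive failures count toward dropping the track, which is what the text describes.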

FIG. 5 is a flowchart illustrating a method for real-time tracking of an object in a dynamic camera image according to an embodiment of the present invention.

Referring to FIGS. 1 to 5, the video capture camera 100, implemented as a pan-tilt-zoom (PTZ) camera capable of horizontal movement, vertical movement, and zoom adjustment, transmits the moving image frames (1_frame to n_frame) that it captures while moving to the video analysis server 200 (S100).

The camera motion estimation module 230 of the video analysis server 200 receives the moving image frames (1_frame to n_frame) transmitted by the video capture camera 100 (S150), and generates, on each image frame, a grid composed of grid points arranged at regular intervals (S200).

The camera motion estimation module 230 then converts the pixel values of the image frames from color values to an achromatic scale, that is, gray scale (S210).

Next, the camera motion estimation module 230 calculates, for each grid point, the amount of change in gray scale value in the current image frame (k_frame) relative to the previous image frame (e.g., k-1_frame), and selects a grid point as a feature point only when the calculated change is equal to or greater than a predetermined threshold (S230).
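Steps S200 to S230 can be sketched as follows. Frames are represented as nested lists for simplicity; the grid offset (`step // 2`), the BT.601 luma weights used for the gray scale conversion, and the function names are assumptions, since the text does not specify them.

```python
def to_gray(rgb_frame):
    """Convert rows of (R, G, B) tuples to gray values (BT.601 luma weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]


def grid_points(height, width, step):
    """Grid points at regular intervals, as (x, y) pixel coordinates."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]


def select_feature_points(prev_gray, curr_gray, step, threshold):
    """Keep grid points whose gray value changed by at least `threshold`."""
    height, width = len(prev_gray), len(prev_gray[0])
    return [(x, y) for (x, y) in grid_points(height, width, step)
            if abs(curr_gray[y][x] - prev_gray[y][x]) >= threshold]
```

Testing only the grid points, rather than running a corner detector over every pixel, keeps the per-frame cost proportional to the number of grid points.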

The camera motion estimation module 230 then uses a predetermined optical flow method to estimate where the feature points extracted from the current image frame (k_frame) have moved to in the next image frame (k+1_frame) (S250).

Depending on the embodiment, the optical flow method may be the pyramidal Lucas-Kanade algorithm, which analyzes the optical flow of the extracted feature points and is well suited to real-time video because of its high accuracy relative to its running time.
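To make the idea concrete, the sketch below implements one Lucas-Kanade step for a single window at a single resolution. It is deliberately not the pyramidal variant named in the text: there is no image pyramid and no iteration, so it is only meaningful for small displacements. The function name and the `half_win` parameter are assumptions. In practice a library routine such as OpenCV's `cv2.calcOpticalFlowPyrLK` would normally be used instead.

```python
def lucas_kanade_step(prev, curr, cx, cy, half_win=2):
    """Estimate the displacement (u, v) of the window centred at (cx, cy).

    prev and curr are gray images as nested lists indexed [y][x]. The window
    must lie at least one pixel inside the image border. Returns None when
    the window has too little texture to solve for both components.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half_win, cy + half_win + 1):
        for x in range(cx - half_win, cx + half_win + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal change
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

The pyramidal version repeats this step from coarse to fine resolution, which is what lets it handle larger camera motion.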

Next, the camera motion estimation module 230 uses the position of each feature point in the current image frame (k_frame) and its estimated position in the next image frame (k+1_frame) to obtain the motion vector of each feature point pair (the feature point in the current frame and the estimated feature point in the next frame) (S270).

The camera motion estimation module 230 then calculates a two-dimensional histogram over all of the obtained motion vectors and determines the motion vectors with the highest frequency (S290).

The motion vectors with the highest frequency are taken to be the motion vectors of the background, that is, the motion vectors associated with the movement of the video capture camera 100.
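Steps S270 and S290 can be sketched with a dictionary-based two-dimensional histogram. The bin width (`bin_size`) and the function names are assumptions; the text does not state how the histogram is binned.

```python
import math
from collections import Counter


def motion_vectors(curr_points, next_points):
    """Motion vector (dx, dy) for each feature point pair."""
    return [(nx - cx, ny - cy)
            for (cx, cy), (nx, ny) in zip(curr_points, next_points)]


def dominant_motion(vectors, bin_size=1.0):
    """Return (vectors in the most populated 2D bin, index of that bin)."""
    def bin_of(v):
        return (math.floor(v[0] / bin_size), math.floor(v[1] / bin_size))

    if not vectors:
        return [], None
    counts = Counter(bin_of(v) for v in vectors)
    top_bin, _ = counts.most_common(1)[0]
    return [v for v in vectors if bin_of(v) == top_bin], top_bin
```

Because the background usually covers most of the frame, the most populated bin tends to capture the camera-induced motion, while vectors from independently moving objects fall into other bins.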

Meanwhile, the camera motion compensation module 250 uses the highest-frequency motion vectors determined by the camera motion estimation module 230 to calculate a homography matrix that maps specific feature points in the previous image frame (k-1_frame) to their positions in the current image frame (k_frame) (S300).

Here, the specific feature points are those whose feature point pair motion vectors are among the determined highest-frequency motion vectors, and calculating the homography matrix is equivalent to estimating the movement of the video capture camera 100.

Next, the camera motion compensation module 250 applies the inverse of the calculated homography matrix to the pixels in the current image frame (k_frame) to generate compensated image frames (e.g., 1_frame_am to n_frame_am) from which the movement of the video capture camera 100 has been removed (S330).
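The inverse transformation in S330 can be illustrated on individual pixel coordinates. The sketch below assumes the 3x3 homography matrix has already been estimated and is given as nested lists; the function names are hypothetical, and warping a whole image (with interpolation) is omitted.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m, x, y):
    """Map (x, y) through m in homogeneous coordinates."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def compensate_point(homography, x, y):
    """Map a current-frame pixel back to previous-frame coordinates."""
    return apply_homography(invert_3x3(homography), x, y)
```

For a pure translation of (5, -3) pixels, for example, `compensate_point` maps the current-frame pixel (15, 7) back to (10, 10), so static background points land where they were in the previous frame.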

The camera motion compensation module 250 transmits the compensated image frames (1_frame_am to n_frame_am), generated by removing the movement of the video capture camera 100 as described above, to the object detection module 270 (S350).

The object detection module 270 detects an object to be tracked (e.g., a person) within the compensated image frames (1_frame_am to n_frame_am) transmitted from the camera motion compensation module 250 (S400), and extracts a feature vector representing the appearance of the detected object (S430).

The object detection module 270 according to an embodiment of the present invention can detect a person more precisely by recognizing the key points of the person's body (e.g., body parts such as the eyes, nose, mouth, shoulders, elbows, wrists, waist, knees, and ankles) and working backward from them to estimate the person's whole region (S400).
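One simple way to go from recognized key points to a whole-person region is to take their bounding box and pad it. The text does not say how the region is estimated, so the padding rule, the `margin` parameter, and the key point names below are assumptions for illustration only.

```python
def person_box_from_keypoints(keypoints, margin=0.1):
    """Estimate a person's box (x0, y0, x1, y1) from recognised key points.

    keypoints maps a body-part name to (x, y), or to None when that part
    was not recognised. The box is padded by `margin` times its size.
    """
    points = [p for p in keypoints.values() if p is not None]
    if not points:
        return None
    xs, ys = zip(*points)
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
```

Unrecognized key points are simply skipped, so a partially occluded person still yields a box, although a smaller one.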

The object detection module 270 then extracts a feature vector (appearance vector) obtained from the pixel values at each key point of the detected object (S430).
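As a rough sketch of a per-key-point appearance vector, the function below averages the gray values in a small patch around each key point. The actual descriptor is not specified in the text, so the patch mean, the `half` parameter, and the use of `None` for unrecognized key points are assumptions.

```python
def keypoint_features(gray, keypoints, order, half=1):
    """Build an appearance vector with one entry per key point in `order`.

    Each entry is the mean gray value of the (2*half+1)-pixel square patch
    around the key point, clipped to the image, or None when the key point
    was not recognised.
    """
    height, width = len(gray), len(gray[0])
    features = []
    for name in order:
        point = keypoints.get(name)
        if point is None:
            features.append(None)
            continue
        x, y = point
        patch = [gray[j][i]
                 for j in range(max(0, y - half), min(height, y + half + 1))
                 for i in range(max(0, x - half), min(width, x + half + 1))]
        features.append(sum(patch) / len(patch))
    return features
```

Keeping a fixed key point order means two vectors can be compared entry by entry, and entries that are `None` in either vector can be left out of the comparison.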

Next, as described above, the object tracking module 290 uses a Kalman filter to estimate the position at which an object that was being tracked up to the previous image frame (e.g., k-1_frame_am), such as the tracked object (Tr_obj), is expected to be in the current frame (e.g., k_frame_am) (S500).
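The prediction in S500 corresponds to the predict step of a Kalman filter. The sketch below assumes a constant-velocity motion model with state [x, y, vx, vy]; the text does not specify the state or the noise model, so these, together with the `process_noise` value and the function names, are assumptions. The measurement update step is omitted.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def kalman_predict(state, cov, dt=1.0, process_noise=1e-2):
    """Predict step for the state [x, y, vx, vy] and its 4x4 covariance."""
    # Constant-velocity transition: position advances by velocity * dt.
    F = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    new_state = [sum(F[i][k] * state[k] for k in range(4)) for i in range(4)]
    new_cov = matmul(matmul(F, cov), transpose(F))
    for i in range(4):
        new_cov[i][i] += process_noise  # simple diagonal process noise
    return new_state, new_cov
```

The first two entries of the returned state give the expected position in the current frame, around which the "predetermined range" for candidate objects can be placed.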

The object tracking module 290 then compares the object being tracked (Tr_obj) with the candidate objects detected by the object detection module 270 within a predetermined range adjacent to the estimated position (e.g., Ca_obj1 to Ca_obj3) or outside that range (e.g., Ca_obj4 to Ca_obj6) to determine whether there is a match (S550).

Here, a match means that, in this comparison, the object tracking module 290 judges the candidate object (Ca_obj) and the tracked object (Tr_obj) to be the same object.

The method by which the object tracking module 290 compares the candidate object with the tracked object to determine whether they match (S550) has been described above, so a redundant description is omitted.

The above description is merely an illustrative explanation of the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations without departing from its essential characteristics.

Accordingly, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

10: Tracking system
100: Video capture camera
200: Video analysis server
230: Camera motion estimation module
250: Camera motion compensation module
270: Object detection module
290: Object tracking module

Claims

A system for real-time tracking of an object in a dynamic camera image, the system comprising:
a video capture camera that is capable of horizontal movement, vertical movement, and rotation and that generates moving image frames captured while moving over a specific area; and
a video analysis server that generates, for the moving image frames generated by the video capture camera, compensated image frames from which the movement of the video capture camera has been removed by applying an inverse homography transformation to motion vectors corresponding to the movement of the video capture camera, and that tracks objects detected according to key points in the generated compensated image frames according to location similarity and appearance similarity,
wherein the video analysis server comprises:
a camera motion estimation module that converts the pixel values of the moving image frames to gray scale, generates a grid composed of grid points arranged at regular intervals in a previous image frame among the moving image frames, selects as feature points those grid points whose gray scale value change in the current image frame relative to the previous image frame is equal to or greater than a predetermined threshold, estimates the expected positions of the selected feature points in the next image frame using the pyramidal Lucas-Kanade algorithm to obtain motion vectors of feature point pairs, calculates a two-dimensional histogram of the motion vectors of the feature point pairs, and takes the highest-frequency motion vectors as the motion vectors corresponding to the movement of the video capture camera;
a camera motion compensation module that generates, for the moving image frames, compensated image frames from which the movement of the video capture camera has been removed by applying an inverse homography transformation to the calculated motion vectors;
an object detection module that detects the object by recognizing the key points for each part of the object in the compensated image frames; and
an object tracking module that tracks the detected object according to the location similarity and the appearance similarity.

delete

The system of claim 1, wherein the camera motion compensation module:
calculates a homography matrix for transforming feature points in a previous image frame among the moving image frames to specific positions in the current image frame, based on the motion vectors corresponding to the movement of the video capture camera; and
applies the inverse of the calculated homography to all pixels of the current image frame to generate compensated image frames in which the movement of the video capture camera in the current image frame relative to the previous image frame has been removed.

The system of claim 1, wherein the object detection module extracts a feature vector representing the appearance of the detected object according to the recognized key points.

The system of claim 1, wherein the object tracking module:
estimates, using a Kalman filter, the position at which a tracked object, being an object among those detected by the object detection module that was being tracked up to the previous image frame, is expected to be in the current image frame;
determines the location similarity by comparing the positions of candidate objects detected in the current image frame by the object detection module with the position of the tracked object estimated using the Kalman filter;
determines the appearance similarity by comparing feature vectors of the tracked object with feature vectors of the candidate objects; and
tracks the tracked object by determining, according to the determined location similarity and appearance similarity, whether the tracked object and a candidate object are the same.

The system of claim 6, wherein the object tracking module:
determines whether the tracked object and the candidate object are the same by assigning a weight according to a predetermined ratio to each of the determined location similarity and appearance similarity; and
gives a greater weight to the location similarity than to the appearance similarity when the candidate objects are detected within a predetermined range adjacent to the estimated position of the tracked object, and gives a greater weight to the appearance similarity than to the location similarity when the candidate objects are detected outside the predetermined range adjacent to the estimated position.

A method for real-time tracking of an object in a dynamic camera image, the method comprising:
transmitting, to a camera motion estimation module, moving image frames captured by a video capture camera while it moves;
estimating, by the camera motion estimation module, the movement of the video capture camera by calculating a motion vector for the video capture camera from the transmitted moving image frames;
generating, by a camera motion compensation module, compensated image frames in which the estimated movement of the video capture camera has been removed from the moving image frames, and transmitting the compensated image frames to an object detection module;
detecting, by the object detection module, a tracked object in a corresponding compensated image frame among the transmitted compensated image frames, and extracting a feature vector of the detected tracked object;
estimating, by an object tracking module using a Kalman filter, the expected position of the tracked object in the compensated image frame that follows the corresponding compensated image frame;
detecting, by the object detection module, at least one candidate object in the compensated image frame that follows the corresponding compensated image frame, and extracting a feature vector of the detected candidate object; and
determining, by the object tracking module, whether there is a match by comparing, between the tracked object and the candidate objects detected within or outside a predetermined range adjacent to the estimated expected position, the location similarity and the appearance similarity according to the extracted feature vectors,
wherein estimating the movement of the video capture camera by the camera motion estimation module comprises:
converting pixel values of the transmitted moving image frames to gray scale, generating a grid composed of grid points arranged at regular intervals on the moving image frames, and selecting as feature points those grid points whose gray scale value change in the current image frame relative to the previous image frame is equal to or greater than a predetermined threshold;
estimating the positions of the feature points in the next image frame using a predetermined optical flow method;
obtaining motion vectors of feature point pairs, each pair consisting of a feature point in the current image frame and the corresponding estimated feature point in the next image frame; and
calculating a two-dimensional histogram of the obtained motion vectors to determine the highest-frequency motion vectors, and taking them as the motion vector for the video capture camera.

delete

The method of claim 8, wherein generating the compensated image frames by the camera motion compensation module and transmitting them to the object detection module comprises:
calculating, from the determined highest-frequency motion vectors, a homography matrix for transforming feature points in the previous image frame to specific positions in the current image frame;
generating the compensated image frames, from which the movement of the video capture camera has been removed, by applying the inverse of the calculated homography matrix to pixels in the moving image frame; and
transmitting the generated compensated image frames to the object detection module.