KR102594694B1

KR102594694B1 - Method and apparatus of the same person identification among video sequences from multiple cameras, recording medium and device for performing the same

Info

Publication number: KR102594694B1
Application number: KR1020210088010A
Authority: KR
Inventors: 안희준
Original assignee: 서울과학기술대학교 산학협력단
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2023-10-26
Also published as: WO2023282410A1; KR20230007141A

Abstract

A same-person recognition method performed by the same-person recognition device of the present invention includes: receiving a video from at least one camera; an image sequence acquisition step of acquiring an image sequence based on the input video; a representative image selection step in which the image sequence of a reference person for same-person recognition is taken as a query image sequence, and a preset number of representative images are selected from the query image sequence based on the poses of the images it contains; a comparison pair selection step in which the image sequence of a person to be compared for same-person recognition is taken as a comparison image sequence, and comparison pairs are selected by matching comparison images in the comparison image sequence with the representative images; a pose correction step of correcting, through a transformation algorithm, the pose of at least one of the representative image and the comparison image in each comparison pair to a transformation target pose; a feature extraction step of extracting features of the comparison image and the representative image that have undergone the pose correction step; and a calculation step of calculating the similarity and distance between the extracted features. This increases the performance and reliability of person re-detection in a multi-camera system and thereby improves the accuracy of recognizing the same person.

Description

Method for recognizing the same person in images taken by multiple cameras and recording media and devices for performing the same

The present invention relates to a method for recognizing the same person in images captured by multiple cameras, and to a recording medium and device for performing the same. More specifically, it relates to a method that can confirm whether a person found by one camera is the same person as a person in an image captured by another camera, and to a recording medium and device for performing the same.

In industrial fields such as building security camera (CCTV) systems and autonomous driving, automatic object detection relies on technology that checks whether people appearing in images captured by multiple cameras, or captured by one camera at different times, are the same person, or that searches video for the same person. In general, the region considered to be a person is extracted, features are extracted from the person images by an algorithm or by a machine learning method, and the features are compared to determine the similarity between the images.

However, identifying the same person is not easy, because the information in the extracted images varies greatly with the person's pose, the camera position, the lighting, and so on.

To solve this problem, algorithms are being developed that divide the person into parts and compare part by part instead of comparing the whole person. Although this approach can take the characteristics of each part into account, it has the disadvantage that it cannot consider the overall body proportions and structure.

In addition, because the information that can be obtained from a single image is limited, matching methods that use multiple images are also being developed. The number of images extracted from a video is very large, so a way to reduce it effectively is needed. The methods developed so far, however, select images independently, without considering their relationship to the video being compared, and therefore remain ineffective at detecting the same person. In particular, matching methods that use multiple images detect the parts of each person and compare part by part when calculating the similarity of the people in two images, so the amount of computation becomes excessive and practical application is difficult.

Korean Laid-open Patent Publication No. 10-2012-0108319

The present invention was devised to solve the above problems. An object of the present invention is to provide a method for recognizing the same person in images captured by multiple cameras, and a recording medium and device for performing the same, that improve the accuracy of recognizing the same person by increasing the performance and reliability of person re-detection in a multi-camera system.

Another object of the present invention is to provide a method for recognizing the same person in images captured by multiple cameras, and a recording medium and device for performing the same, that improve search speed over conventional image comparison operations by selecting and using representative images from the video.

To achieve the above object, a same-person recognition method performed by a same-person recognition device according to an embodiment of the present invention includes: receiving a video from at least one camera; an image sequence acquisition step of acquiring an image sequence based on the input video; a representative image selection step in which the image sequence of a reference person for same-person recognition is taken as a query image sequence, and a preset number of representative images are selected from the query image sequence based on the poses of the images it contains; a comparison pair selection step in which the image sequence of a person to be compared for same-person recognition is taken as a comparison image sequence, and comparison pairs are selected by matching comparison images in the comparison image sequence with the representative images; a pose correction step of correcting, through a transformation algorithm, the pose of at least one of the representative image and the comparison image in each comparison pair to a transformation target pose; a feature extraction step of extracting features of the comparison image and the representative image that have undergone the pose correction step; and a calculation step of calculating the similarity and distance between the extracted features.
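The calculation step can be illustrated with a minimal sketch. The text does not say which similarity or distance measure is used, so cosine similarity and Euclidean distance are assumptions here, as are the function names and the toy feature vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(a, b) / denom)

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors (assumed measure)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff))

# Hypothetical features of a pose-corrected representative image
# and its paired comparison image.
representative_feature = np.array([1.0, 0.0, 1.0])
comparison_feature = np.array([1.0, 0.0, 0.0])

similarity = cosine_similarity(representative_feature, comparison_feature)
distance = euclidean_distance(representative_feature, comparison_feature)
```

A higher similarity and a lower distance would both indicate that the two images are more likely to show the same person; how the two values are combined into a decision is not specified in the text.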

Here, the image sequence acquisition step is a step of detecting and tracking a person region in the input video to obtain an image sequence that includes a bounding box for the person, and it may include a pose estimation step of extracting joints from each image included in the image sequence to estimate the pose of that image.

In the representative image selection step, based on the poses estimated in the pose estimation step, the pose vector of each image included in the query image sequence may be found and clustered with a clustering algorithm, the center of gravity of each cluster may then be found, and, among the images included in the query image sequence, the image whose pose is closest to the center of gravity of each cluster may be selected as a representative image.

In the comparison pair selection step, the image in the comparison image sequence whose pose is most similar to the pose of the representative image may be matched with the representative image as the comparison image, so that the representative image and the comparison image form a pair.
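The pairing rule above can be sketched as a nearest-neighbor search over pose vectors. The use of Euclidean distance as the measure of pose similarity, the function name, and the toy two-dimensional pose vectors are assumptions made for illustration.

```python
import numpy as np

def select_comparison_pairs(representative_poses, comparison_poses):
    """For each representative pose, return the index of the closest comparison pose.

    representative_poses: array of shape (K, D)
    comparison_poses:     array of shape (T, D)
    Returns a list of (representative_index, comparison_index) pairs.
    """
    rep = np.asarray(representative_poses, dtype=float)
    cmp_ = np.asarray(comparison_poses, dtype=float)
    # Pairwise Euclidean distances, shape (K, T).
    dists = np.linalg.norm(rep[:, None, :] - cmp_[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [(k, int(t)) for k, t in enumerate(nearest)]

# Hypothetical pose vectors for two representative images and three comparison images.
representative_poses = [[0.0, 0.0], [1.0, 1.0]]
comparison_poses = [[0.9, 1.0], [0.1, 0.0], [5.0, 5.0]]

pairs = select_comparison_pairs(representative_poses, comparison_poses)
```

Because only K representative images are matched instead of every query frame, the number of image pairs that go on to pose correction and feature extraction stays small.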

In the pose correction step, the pose of one of the two images in a comparison pair, either the representative image or the comparison image, may be set as the transformation target pose, and the pose of the other image may be corrected to that transformation target pose.

Alternatively, in the pose correction step, an intermediate pose between the pose of the representative image and the pose of the comparison image in a comparison pair may be set as the transformation target pose, and both the pose of the representative image and the pose of the comparison image may be corrected to that transformation target pose.
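The two ways of choosing the transformation target pose can be sketched as follows. The text does not define how the intermediate pose is computed, so the joint-wise midpoint used here is an assumption, as are the function name, the mode strings, and the example joint coordinates. The image warping itself (the transformation algorithm) is not shown.

```python
import numpy as np

def target_pose(representative_pose, comparison_pose, mode="intermediate"):
    """Return (target pose, list of images that must be warped to it).

    Poses are arrays of shape (J, 2): J joints with normalized (x, y) coordinates.
    """
    rep = np.asarray(representative_pose, dtype=float)
    cmp_ = np.asarray(comparison_pose, dtype=float)
    if mode == "representative":
        # The representative pose is the target; only the comparison image is warped.
        return rep, ["comparison"]
    if mode == "comparison":
        # The comparison pose is the target; only the representative image is warped.
        return cmp_, ["representative"]
    if mode == "intermediate":
        # Assumption: the intermediate pose is the joint-wise midpoint.
        return (rep + cmp_) / 2.0, ["representative", "comparison"]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical two-joint poses.
rep_pose = [[0.2, 0.1], [0.4, 0.5]]
cmp_pose = [[0.4, 0.3], [0.6, 0.5]]

target, to_warp = target_pose(rep_pose, cmp_pose)
```

With the intermediate pose, each image is deformed only part of the way, which may limit warping artifacts at the cost of transforming both images.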

Meanwhile, a computer program for performing the same-person recognition method is recorded on a computer-readable recording medium according to an embodiment for realizing another object of the present invention described above.

A same-person recognition device according to an embodiment for realizing yet another object of the present invention includes: an input unit that receives a video from at least one camera; a pre-processing unit that acquires an image sequence based on the input video; a representative image selection unit that takes the image sequence of a reference person for same-person recognition as a query image sequence and selects a preset number of representative images from the query image sequence based on the poses of the images it contains; a comparison target selection unit that takes the image sequence of a person to be compared for same-person recognition as a comparison image sequence and selects comparison pairs by matching comparison images in the comparison image sequence with the representative images; an image correction unit that corrects, through a transformation algorithm, the pose of at least one of the representative image and the comparison image in each comparison pair to a transformation target pose; an extraction unit that extracts features of the representative image and the comparison image that have undergone pose correction by the image correction unit; and a calculation unit that calculates the similarity and distance between the extracted features.

Here, the image correction unit may set the pose of one of the two images in a comparison pair, either the representative image or the comparison image, as the transformation target pose, and correct the pose of the other image to that transformation target pose.

The image correction unit may also set an intermediate pose between the pose of the representative image and the pose of the comparison image in a comparison pair as the transformation target pose, and correct both the pose of the representative image and the pose of the comparison image to that transformation target pose.

According to one aspect of the present invention described above, providing the method for recognizing the same person in images captured by multiple cameras, and the recording medium and device for performing the same, increases the performance and reliability of person re-detection in a multi-camera system and thereby improves the accuracy of recognizing the same person.

In addition, selecting and using representative images from the video improves search speed over conventional image comparison operations. In particular, because the method is based on video tracking instead of matching individual images one by one, it can advance the conventional technique of searching for the same person with only one or two representative images.

Figure 1 is a block diagram illustrating the configuration of a same-person recognition device according to an embodiment of the present invention;
Figure 2 is a diagram illustrating detection of objects in an input video according to an embodiment of the present invention;
Figure 3 is a diagram illustrating an image sequence according to an embodiment of the present invention;
Figure 4 is a diagram illustrating estimation of a person's pose from images included in an image sequence according to an embodiment of the present invention;
Figure 5 is a diagram illustrating the pose change of images included in an image sequence according to an embodiment of the present invention;
Figure 6 is a diagram illustrating clustering for representative image selection and the selection of representative images from a query image sequence according to an embodiment of the present invention;
Figure 7 is a diagram illustrating selection of comparison pairs by matching representative images with comparison images according to an embodiment of the present invention;
Figure 8 is a diagram illustrating how, according to an embodiment of the present invention, the pose of the representative image is set as the transformation target pose, the pose of the comparison image is matched to the pose of the representative image, image features are extracted, and the similarity and distance of the features are calculated;
Figure 9 is a flowchart illustrating a same-person recognition method according to an embodiment of the present invention; and
Figure 10 is a diagram illustrating the results of recognizing the same person with the same-person recognition method according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description that follows is not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. Similar reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

Figure 1 is a block diagram illustrating the configuration of the same-person recognition device 100 according to an embodiment of the present invention; Figure 2 is a diagram illustrating detection of objects (O1, O2, O3, O4) in an input video according to an embodiment of the present invention; Figure 3 is a diagram illustrating an image sequence 10 according to an embodiment of the present invention; and Figure 4 is a diagram illustrating estimation of a person's poses (10-1', 10-2') from images (10-1, 10-2) included in the image sequence 10 according to an embodiment of the present invention.

The same-person recognition device 100 according to this embodiment is provided to recognize the same person in images obtained from video captured by at least one camera. To this end, the same-person recognition device 100 may include a communication unit 110, an input unit 130, a memory 150, an output unit 170, and a processor 190, as shown in Figure 1. Software (an application) for performing the same-person recognition method may be installed and executed on the device, and the communication unit 110, input unit 130, memory 150, output unit 170, and processor 190 may be controlled by the software running on the same-person recognition device 100.

The communication unit 110 is provided to exchange necessary information with an external device or an external network, and video captured by at least one camera may be received over the network through the communication unit 110.

The input unit 130 is an input means for receiving user commands. It may receive video captured by a camera, or information about the video that serves as the reference or the comparison target of a search for the same person.

The memory 150 stores a program for performing the same-person recognition method and provides the storage space the processor 190 needs to operate, temporarily or permanently storing the data the processor 190 processes. It may include a volatile or non-volatile storage medium, but the scope of the present invention is not limited thereto. The memory 150 may also store data accumulated while the same-person recognition method is performed.

The output unit 170 displays the process and results of the same-person recognition method and may include a display.

Meanwhile, to perform the same-person recognition method described above, the processor 190 may include a pre-processing unit 191, a representative image selection unit 192, a comparison target selection unit 193, an image correction unit 194, an extraction unit 195, and a calculation unit 196.

The pre-processing unit 191 may acquire an image sequence based on video input from at least one camera. Specifically, as shown in Figure 2, the pre-processing unit 191 either directly performs detection (O1, O2, O3, O4) and tracking of people, that is, objects, in the video captured by at least one camera, or receives the detection and tracking results from an external device. From these it can obtain an image sequence 10 that includes position information for the person, that is, a bounding box, as shown in Figure 3. When video is input from multiple cameras, the pre-processing unit 191 can obtain image sequences 10 for the c cameras. Such object detection and tracking can use, or be derived from, conventional deep-learning-based object detection techniques, so a detailed description is omitted.

The pre-processing unit 191 may also extract body landmarks or joints from the images (10-1 to 10-Tc) of each person in the image sequence 10 obtained from the video, using a 2D or 3D pose estimation technique. It does so by analyzing each image (10-1 to 10-Tc) of the image sequence 10, which includes the bounding box, and estimating the skeleton of the person in the image. The positions of the extracted landmarks or joints may include the center of the face (nose), the left and right shoulders, the left and right elbows, the left and right wrists, the left and right pelvis centers, the left and right knees, and the left and right ankles. Taking Figure 4 as an example, for the first image 10-1 in Figure 4 (a), which corresponds to a 2D pose, the pose estimated with the 2D pose estimation technique is the first pose estimate 10-1'; for the second image 10-2 in Figure 4 (b), which corresponds to a 3D pose, the pose estimated with the 3D pose estimation technique is the second pose estimate 10-2'. Well-established techniques may be applied to extract these landmarks or joints and estimate the pose of the person in an image.

The image sequence 10 shown in Figure 3 is based on video captured by a single camera, so all of its images are drawn at the same size. For video captured by multiple cameras, however, the person images (10-1 to 10-Tc) in the image sequence 10 differ in size depending on the angle of the person and on the position and resolution of each camera. Accordingly, when the pre-processing unit 191 extracts landmarks or joints to estimate a person's pose, it does not express the landmark or joint coordinates in pixels. Instead, it normalizes them by dividing by the size of each image, or, if the distances between cameras are measured or known in advance, it may use a real physical reference unit such as meters.
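The size normalization described above can be sketched as follows. Dividing the x coordinate by the crop width and the y coordinate by the crop height is one reading of "dividing by the size of each image" and is an assumption, as are the function name and the example coordinates.

```python
import numpy as np

def normalize_joints(joints_px, width, height):
    """Convert joint coordinates from pixels to fractions of the person crop size.

    joints_px: array of shape (J, 2) with (x, y) pixel coordinates inside the crop.
    """
    joints = np.asarray(joints_px, dtype=float)
    return joints / np.array([width, height], dtype=float)

# The same hypothetical two-joint pose seen by two cameras at different resolutions.
crop_a = normalize_joints([[32, 16], [16, 64]], width=64, height=128)
crop_b = normalize_joints([[64, 32], [32, 128]], width=128, height=256)

# Flattening gives a pose vector that can be compared across cameras.
pose_vector_a = crop_a.reshape(-1)
```

After normalization both crops yield the same coordinates, so pose vectors from cameras with different resolutions become directly comparable.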

In this way, the pre-processing unit 191 acquires the image sequence 10 from the video and estimates the person's pose by extracting landmarks or joints from each image of the person in the image sequence 10. In this embodiment, acquiring the image sequence 10 with its bounding boxes and estimating the person's pose from the landmarks or joints of each image may be performed for every video input through the communication unit 110 or the input unit 130.

Figure 5 is a diagram illustrating the pose change of the images included in the query image sequence Q10 according to an embodiment of the present invention. As shown in Figure 5, the pose of a person in a video changes over time, and the pre-processing unit 191 can estimate this change. Thus, as in Figure 5, the pose change of the reference person can be estimated from the query image sequence Q10, which is the image sequence of the reference person among the image sequences 10.

Meanwhile, the representative image selection unit 192 is provided to select a preset number of representative images (Q) from the query image sequence Q10 of the reference person for same-person recognition. In the following description, among the C image sequences 10 obtained from video captured by C cameras, the image sequence corresponding to the reference person for same-person recognition is defined as the query image sequence Q10. Here, the reference person is the person the user wants to find in the videos captured by the multiple cameras.

The query image sequence Q10 that the representative image selection unit 192 uses to select the representative images (Q) may already have undergone pose estimation by the pre-processing unit 191, as shown in Figure 5.

When the user supplies information about the reference person, that is, when the query image sequence Q10 is input separately or selected from among the image sequences 10, the representative image selection unit 192 finds the pose vector of each image in the query image sequence Q10 and clusters the pose vectors with a clustering algorithm.

FIG. 6 is a diagram illustrating the clustering (101 to 105) of the query image sequence Q10 and the selection of the representative images 101-Q to 105-Q according to an embodiment of the present invention.

As illustrated, the representative image selection unit 192 selects at least one representative image (101-Q to 105-Q) based on the poses in the query image sequence Q10, for which pose estimation by the pre-processing unit 191 has been completed as shown in FIG. 5. The number of representative images is a system setting entered by the user: the system can be set so that K representative images are matched against the comparison images 101-S, 102-S, and 103-S of FIG. 7, described later.

More specifically, when the setting value entered through the communication unit 110 or the input unit 130 is 5, the representative image selection unit 192 selects five representative images 101-Q to 105-Q based on the poses in the query image sequence Q10, as shown in FIG. 6. For this purpose, the representative image selection unit 192 may use the K-means algorithm as the clustering algorithm.

Based on the pose and view (angle) information of each image in the query image sequence Q10, the representative image selection unit 192 finds the corresponding pose vectors with the clustering algorithm and then finds the centroid (center of gravity) of each cluster 101 to 105.

The representative image selection unit 192 then takes the pose closest to the centroid of each cluster 101 to 105 as the representative pose and selects the image with that pose as the representative image (101-Q to 105-Q).
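The clustering and nearest-to-centroid selection described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names, the flattened pose-vector layout, the farthest-first initialization, and the restriction of each search to the cluster's own members are all assumptions made here for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two pose vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(vectors, k):
    """Deterministic seeding (an assumption): start at frame 0, then keep adding the farthest frame."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def assign(vectors, centroids):
    """Label each pose vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors]

def kmeans(vectors, k, iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first_init(vectors, k)
    for _ in range(iters):
        labels = assign(vectors, centroids)
        updated = []
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            # Keep the previous centroid if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(vectors, centroids)

def select_representatives(pose_vectors, k):
    """Return one frame index per cluster: the member nearest the cluster centroid."""
    centroids, labels = kmeans(pose_vectors, k)
    reps = []
    for c, centroid in enumerate(centroids):
        members = [t for t, l in enumerate(labels) if l == c]
        if members:
            reps.append(min(members, key=lambda t: sq_dist(pose_vectors[t], centroid)))
    return sorted(reps)
```

Calling `select_representatives(pose_vectors, 5)` on the pose vectors of a query sequence would return up to five frame indices, one per cluster, which play the role of the representative images 101-Q to 105-Q.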

The representative image selection unit 192 selects the representative images 101-Q to 105-Q to avoid the excessive computation required by conventional image matching. For example, if a one-minute video is captured at 30 frames per second, 30 × 60 = 1,800 images are generated, and comparing and matching all of them requires an excessive amount of computation. The present invention therefore selects representative images from among these images and compares only the selected representative images with the comparison images to determine whether the person is the same, so the same person can be recognized with less computation than with conventional image comparison.
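A rough count makes the saving concrete. The figures below assume, purely for illustration, that the comparison sequence is also one minute long at 30 frames per second and that K = 5 representative images are used; the text itself gives only the 1,800-frame figure and the K = 5 example.

```latex
\begin{aligned}
N_{\text{frames}} &= 30 \times 60 = 1800 \\
\text{exhaustive image-to-image comparisons} &= 1800 \times 1800 = 3{,}240{,}000 \\
\text{pose-distance evaluations with } K = 5 &= 5 \times 1800 = 9000 \\
\text{image feature comparisons with } K = 5 &= 5
\end{aligned}
```

Under these assumptions, the expensive image-level comparison runs only K times, while the remaining work is a pose-distance evaluation over a handful of joint coordinates per frame.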

Meanwhile, FIG. 7 is a diagram illustrating how the comparison target selection unit 193 matches the representative images 101-Q to 103-Q with the comparison images 101-S to 103-S to select the comparison target pairs 101-P to 103-P according to an embodiment of the present invention; the figure shows the case where three representative images are selected. The comparison target selection unit 193 matches comparison images in the comparison image sequence S10 with the selected representative images 101-Q to 103-Q to select comparison target pairs. Here, the comparison image sequence S10 is the image sequence, among the image sequences 10, of the person to be compared for same-person recognition. Like the query image sequence Q10, the comparison image sequence S10 may be an image sequence for which the pre-processing unit 191 has already estimated the pose of the person in each image.

Once the representative image selection unit 192 has selected the representative images 101-Q, 102-Q, and 103-Q, the comparison target selection unit 193 selects, from the comparison image sequence S10, the images whose poses are most similar to those of the representative images as the comparison images 101-S, 102-S, and 103-S.

The comparison target selection unit 193 then pairs each selected representative image (101-Q, 102-Q, 103-Q) with its selected comparison image (101-S, 102-S, 103-S) to form the comparison target pairs 101-P, 102-P, and 103-P. The number of comparison target pairs equals the number of representative images; FIG. 7 illustrates the case where three representative images are selected. As shown in FIG. 7, for each of the three representative images, the image in the comparison image sequence S10 with the most similar pose is selected as the comparison image, and the images with the most similar poses are matched to form the comparison target pairs.

In this embodiment, the comparison target selection unit 193 uses a weighted Euclidean distance to match the images with the most similar poses when selecting the comparison target pairs 101-P, 102-P, and 103-P. The weighted Euclidean distance is assumed here because it can be estimated relatively easily and stably when matching the two images with the most similar view and pose, but the invention is not limited to it, and the distance between two poses may of course be obtained in other ways.
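The pose-based pairing can be illustrated as follows. This is a hedged sketch: the function names, the representation of a pose as a list of per-joint coordinate tuples, and the placement of the per-joint weights inside the square root are assumptions for the example, since the exact form of the distance is not recoverable from this text.

```python
import math

def weighted_pose_distance(pose_a, pose_b, weights):
    """Weighted Euclidean distance between two poses.

    pose_a, pose_b: lists of joint coordinates (2D or 3D tuples), same joint order.
    weights: one non-negative weight per joint (assumed per-joint weighting).
    """
    total = 0.0
    for joint_a, joint_b, w in zip(pose_a, pose_b, weights):
        total += w * sum((a - b) ** 2 for a, b in zip(joint_a, joint_b))
    return math.sqrt(total)

def select_comparison_pairs(rep_poses, comparison_poses, weights):
    """For each representative pose, pick the comparison frame with the closest pose.

    Returns a list of (representative_index, comparison_frame_index) pairs.
    """
    pairs = []
    for q_idx, q_pose in enumerate(rep_poses):
        s_idx = min(
            range(len(comparison_poses)),
            key=lambda s: weighted_pose_distance(q_pose, comparison_poses[s], weights),
        )
        pairs.append((q_idx, s_idx))
    return pairs
```

Each returned pair corresponds to one comparison target pair such as 101-P; the weights could, for instance, emphasize joints that are estimated more reliably, although the text leaves their choice to experiment or to the user.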

Meanwhile, the image correction unit 194 uses a transformation algorithm to correct the pose of at least one of the representative image and the comparison image in a comparison target pair to a transformation target pose.

Even when the comparison image 102-S with the pose and view most similar to those of the representative image 102-Q is matched, the poses and views of the two images 102-Q and 102-S in the pair 102-P may not coincide exactly, as shown in FIG. 7. To address this, in this embodiment the image correction unit 194 can align the pose 102-S' of the comparison image 102-S with the pose 102-Q' of the representative image 102-Q.

Specifically, when correcting the pose of at least one of the pose 102-Q' of the representative image 102-Q and the pose 102-S' of the comparison image 102-S in a comparison target pair 102-P to the transformation target pose, the image correction unit 194 can choose one of two methods. In the first, the pose of one image, either the representative image pose 102-Q' or the comparison image pose 102-S', is set as the transformation target pose, and only the pose of the other image is corrected to it. In the second, the intermediate pose between the representative image pose 102-Q' and the comparison image pose 102-S' is set as the transformation target pose, and the poses of both images (102-Q', 102-S') are corrected to it.

FIG. 8 is a diagram illustrating, according to an embodiment of the present invention, how the pose 102-Q' of the representative image 102-Q is set as the transformation target pose, the pose 102-S' of the comparison image is aligned with it, image features are extracted, and the similarity and distance between the features are calculated.

If the transformation target pose is set to the pose 102-Q' of the representative image, the image correction unit 194 uses the poses estimated by the pre-processing unit 191 to correct the pose 102-S' of the comparison image so that it matches the pose 102-Q' of the representative (reference) image, as shown in FIG. 8. In this way the image correction unit 194 generates a corrected image 102-SC, in which the comparison image 102-S has been corrected to the transformation target pose.

In contrast, unlike FIG. 8, if the transformation target pose is set to the intermediate pose between the pose 102-Q' of the representative image and the pose 102-S' of the comparison image, the image correction unit 194 corrects both the representative image 102-Q and the comparison image 102-S to the intermediate pose, generating a corrected representative image (not shown) and a corrected comparison image (not shown).
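The two ways of choosing the transformation target pose can be sketched as below. The function name, the `mode` strings, and the definition of the intermediate pose as the per-joint midpoint of the two poses are assumptions made for illustration; the text does not state how the intermediate pose is computed.

```python
def transformation_target_pose(rep_pose, cmp_pose, mode="representative"):
    """Choose the pose that one or both images of a pair are corrected to.

    rep_pose, cmp_pose: lists of joint coordinate tuples in the same joint order.
    mode: "representative" or "comparison" keeps one pose fixed and corrects the other;
          "intermediate" corrects both images to the per-joint midpoint (assumed definition).
    """
    if mode == "representative":
        return list(rep_pose)
    if mode == "comparison":
        return list(cmp_pose)
    if mode == "intermediate":
        return [
            tuple((a + b) / 2 for a, b in zip(joint_q, joint_s))
            for joint_q, joint_s in zip(rep_pose, cmp_pose)
        ]
    raise ValueError(f"unknown mode: {mode}")
```

With `mode="representative"` only the comparison image needs warping, which matches the case drawn in FIG. 8; with `mode="intermediate"` both images are warped, each by a smaller amount.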

With the image correction unit 194, the present invention reflects the overall body structure through pose alignment, rather than comparing body parts individually as in the conventional DPM (Deformable Part Model), and it also avoids the heavy computation that DPM requires.

In applying the transformation to the target pose, the image correction unit 194 may use a non-rigid geometric transformation based on the joint information, or a neural-network-based method; any method that provides the same function may be used. Examples of non-rigid geometric transformations include the thin-plate spline (TPS) method and the ARAP (as-rigid-as-possible) method.

Meanwhile, the extraction unit 195 extracts features from the corrected image 102-SC, which is the comparison image after pose correction by the image correction unit 194, and from the representative image 102-Q.

As shown in FIG. 8, when the transformation target pose is the pose 102-Q' of the representative image and only the pose 102-S' of the comparison image is corrected to generate the corrected image 102-SC, the extraction unit 195 extracts features from the corrected image 102-SC and the representative image 102-Q. The features may be extracted with a convolutional neural network or with a predefined function.

If the transformation target pose is the intermediate pose, the extraction unit 195 extracts features from the corrected representative image and the corrected comparison image.

Based on the features extracted by the extraction unit 195, the calculation unit 196 uses a distance function to compute the distance between the feature values of the representative image 102-Q and those of the corrected image 102-SC, that is, the corrected comparison image. The distance function may be inverse cosine similarity, or it may be learned with a neural network.
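A cosine-based feature distance can be written as follows. The text names "inverse cosine similarity" without defining it, so this sketch assumes the common convention of one minus the cosine similarity; the function names and the handling of zero vectors are likewise assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between feature vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Assumed form of the feature distance: 1 - cosine similarity, in the range [0, 2]."""
    return 1.0 - cosine_similarity(u, v)
```

Feature vectors pointing in the same direction give a distance near 0, orthogonal ones give 1, and opposite ones give 2, so a smaller value indicates a more similar pair.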

The calculation unit 196 can then apply a Rank-K algorithm, which selects up to K of the closest comparison target pairs according to similarity, and output the K most similar search results. K may be set by the user, and the K search results can be output through the output unit 170 so that the user can check them. The user can then visually inspect the output to judge whether the results show the same person.
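The Rank-K output step amounts to sorting candidates by distance and keeping the first K. The sketch below assumes each candidate is a `(candidate_id, distance)` tuple; that structure and the function name are illustrative, not taken from the source.

```python
def rank_k(candidates, k):
    """Return up to k candidates with the smallest distance, closest first.

    candidates: list of (candidate_id, distance) tuples.
    """
    return sorted(candidates, key=lambda item: item[1])[:k]
```

For example, `rank_k(results, 10)` would give the ten closest matches for display on the output unit 170, leaving the final same-person decision to the user.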

Meanwhile, FIG. 9 is a flowchart of a same person recognition method according to an embodiment of the present invention, and FIG. 10 is a diagram illustrating the result of recognizing the same person with that method.

The same person recognition method of the present invention searches for or confirms whether persons captured by multiple cameras, or by one camera at different angles, are the same person, and it is performed by the same person recognition device of the present invention described above.

First, in the same person recognition method according to the present invention, the same person recognition device 100 receives video from at least one camera (S110).

The same person recognition device 100 then detects and tracks person regions in the input video to obtain an image sequence 10 that includes a bounding box for each person (S120).

Specifically, as shown in FIG. 2, the same person recognition device 100 may itself detect (O1, O2, O3, O4) and track the persons, that is, the objects, in the input video, or it may receive the detection and tracking results from an external device. The device 100 thereby obtains an image sequence 10 that includes the position information of each person, that is, a bounding box, as shown in FIG. 3. Object detection and tracking can use conventional deep-learning-based object detection techniques or techniques derived from them, so a detailed description is omitted.

The C image sequences 10 can be expressed as a collection indexed by c. Each element is the ordered set of all frame images of image sequence c, and each sequence has its own frame count. Here, C is the number of person image sequences 10 under consideration, that is, the number of cameras that captured the person, and t denotes temporal order. The C image sequences 10 may each have a different resolution and size.

The same person recognition device 100 then extracts joints from each image in the image sequence 10 and estimates the pose in each image (S130).

The same person recognition device 100 extracts key body landmarks or joints from each person image in the image sequence 10 obtained as described above, using a 2D or 3D pose estimation technique. The landmark or joint positions may include the center of the face (nose) and the left and right shoulders, elbows, wrists, pelvic centers, knees, and ankles. The extracted landmarks or joints are expressed as a set of joint coordinates. When extracting them, a first image 10-1 corresponding to a 2D pose, as shown in FIG. 4(a), yields the first pose estimate 10-1' by Equation 2 below, and a second image 10-2 corresponding to a 3D pose, as shown in FIG. 4(b), yields the second pose estimate 10-2' by Equation 1 below.

[Equation 1]

[Equation 2]

In the equations above, the pose term denotes the 3D or 2D coordinate values, or normalized coordinate values, of the J landmarks or joints of the person image at time t of image sequence c. The joints extracted in 2D or 3D through these equations are as shown in FIG. 4. Well-established existing techniques may be applied to extract such landmarks or joints and estimate the pose of the person in an image.
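For readers who want a concrete notation, one form consistent with the definitions above is given here. The symbols are assumptions introduced for illustration, since the original equation images are not reproduced in this text: P denotes the pose of the person at time t in sequence c, and J is the number of joints.

```latex
% Assumed notation, not the original Equations 1 and 2.
P^{c}_{t} = \left\{ (x_j,\, y_j,\, z_j) \right\}_{j=1}^{J} \quad \text{(3D pose)}
\qquad
P^{c}_{t} = \left\{ (x_j,\, y_j) \right\}_{j=1}^{J} \quad \text{(2D pose)}
```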

The image sequence 10 shown in FIG. 3 is based on video from a single camera, so all of its images are drawn at the same size. For video from multiple cameras, however, the sizes of the person images 10-1 to 10-Tc in each of the C image sequences 10 may differ depending on the person's angle, the camera positions, or the resolution. Therefore, when extracting landmarks or joints to estimate a person's pose, this method does not use joint coordinates in pixel units. Instead, it normalizes them by dividing by the size of each image or, when the distances between cameras are measured or known in advance, uses a physical reference unit such as meters.
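The size normalization can be illustrated with a short helper. The function name and the choice to divide x by the image width and y by the image height are assumptions; the text says only that coordinates are divided by the size of each image.

```python
def normalize_joints(joints_px, width, height):
    """Convert 2D joint coordinates from pixels to fractions of the person image size.

    joints_px: list of (x, y) pixel coordinates inside the person image.
    width, height: size of that person image in pixels (must be positive).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    return [(x / width, y / height) for x, y in joints_px]
```

After this step, the same pose seen in a 100 × 200 crop and in a 50 × 100 crop yields the same coordinates, so poses from cameras with different resolutions can be compared directly.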

In this way, the image sequence 10 is obtained from the video, and the person's pose is estimated by extracting landmarks or joints from each person image in the image sequence 10. In this embodiment, the step of obtaining the image sequence 10 including bounding boxes (S120) and the step of extracting the landmarks or joints of each image to estimate the person's pose (S130) may be performed on every video received in the video input step (S110).

The same person recognition device 100 then selects a preset number of representative images 101-Q to 105-Q from the query image sequence Q10 based on the poses of the images in that sequence (S140). As before, among the C image sequences 10 obtained from videos captured by C cameras, the image sequence corresponding to the reference person for same-person recognition is defined as the query image sequence Q10, and the reference person is the person the user wants to find in the videos captured by the multiple cameras.

The preset number of representative images is a system setting chosen by the user; for example, if the user sets the value to 5, five representative images 101-Q to 105-Q are selected.

The representative images 101-Q to 105-Q may be chosen as the images in the query image sequence Q10 that carry the most independent information. In the present invention, they are selected based on the person's pose estimated in the pose estimation step (S130) and on the camera view (angle).

As FIG. 5 shows for the query image sequence Q10 after pose estimation, the person's pose changes over time. In this embodiment, clustering is therefore performed to select representative images from the query image sequence Q10, and one image from each cluster 101 to 105 is selected as a representative image 101-Q to 105-Q, as shown in FIG. 6.

In the representative image selection step (S140), the same person recognition device 100 uses the poses estimated in the pose estimation step (S130) to find the pose vector of each image in the query image sequence Q10, clusters the pose vectors with a clustering algorithm, and then finds the centroid of each cluster.

To this end, the same person recognition device 100 can find the centroid of each cluster 101 to 105 based on Equation 3 below, and the K-means algorithm may be used as the clustering algorithm.

[Equation 3]

In Equation 3, q denotes the query image sequence, and the equation yields the centroid of cluster k obtained by clustering the pose vectors of query image sequence q.

Then, based on Equation 4 below, the image whose pose is closest to the centroid of each cluster 101 to 105 found by the clustering algorithm can be selected as the representative image 101-Q to 105-Q.

[Equation 4]

Here, the result of Equation 4 is the time index of the image in query image sequence q whose pose vector represents the centroid of cluster k.
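One way to write the two steps just described is shown below. The symbols are assumed for illustration because the original Equations 3 and 4 are not reproduced in this text: P is the pose vector at time t of query sequence q, μ is the centroid of cluster k, T_q is the number of frames in the query sequence, and K is the preset number of representative images.

```latex
% Assumed notation, not the original Equations 3 and 4.
\left\{ \mu^{q}_{k} \right\}_{k=1}^{K}
  = \operatorname{KMeans}\!\left( \left\{ P^{q}_{t} \right\}_{t=1}^{T_q},\; K \right)

t^{q}_{k} = \arg\min_{t} \left\lVert P^{q}_{t} - \mu^{q}_{k} \right\rVert,
  \qquad k = 1, \dots, K
```

The frame at the resulting time index for cluster k is then the k-th representative image.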

In the same person recognition method of the present invention, the representative images 101-Q to 105-Q are selected in this way to avoid the excessive computation required by conventional image matching. For example, if a one-minute video is captured at 30 frames per second, 30 × 60 = 1,800 images are generated, and it is difficult to compare and match all of them. The present invention therefore selects representative images from among these images and uses the selected representative images as the reference images.

Once the representative images 101-Q to 105-Q have been selected, the same person recognition device 100 matches comparison images in the comparison image sequence S10 with the representative images to select comparison target pairs (S150). Here, the comparison image sequence S10 is the image sequence, among the C image sequences, of the person to be compared for same-person recognition.

In the comparison target pair selection step (S150), as shown in FIG. 7, the images in the comparison image sequence S10 whose poses are most similar to those of the selected representative images 101-Q, 102-Q, and 103-Q are selected as the comparison images 101-S, 102-S, and 103-S. The same person recognition device 100 then matches each selected comparison image with its representative image to form the pairs 101-P, 102-P, and 103-P.

The same person recognition device 100 can select the comparison target pairs based on Equation 5 below.

[Equation 5]

In the equation above, the matched index is the time point in the comparison image sequence whose pose is most similar to the pose at a given representative time point of the query image sequence, and the resulting set is the set of pairs formed by each representative image and the comparison image with the most similar pose, that is, the set of comparison target pairs.

The distance between the pose of a representative image and the pose of a comparison image can be computed in various ways that measure the difference in pose and view. In this embodiment, it is computed as a weighted Euclidean distance, as in Equation 6 below, and the weights can be determined experimentally or chosen by the user.

[Equation 6]

In Equation 6, the two joint terms denote the normalized 2D or 3D coordinate values of the j-th joint in the person's pose at arbitrary time points of query image sequence q and comparison image sequence s, respectively.
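A notation consistent with the descriptions of Equations 5 and 6 is sketched below. It is an assumed reconstruction, not the original formulas: p denotes the normalized coordinates of joint j, w is the per-joint weight, t with superscript q is the time index of the k-th representative image, and the placement of the weights inside the square root is one common convention.

```latex
% Assumed notation, not the original Equations 5 and 6.
d\!\left( P^{q}, P^{s} \right)
  = \sqrt{ \sum_{j=1}^{J} w_j \left\lVert p^{q}_{j} - p^{s}_{j} \right\rVert^{2} }

t^{s}_{k} = \arg\min_{t'} \; d\!\left( P^{q}_{t^{q}_{k}},\; P^{s}_{t'} \right),
\qquad
\mathcal{M} = \left\{ \left( t^{q}_{k},\, t^{s}_{k} \right) \right\}_{k=1}^{K}
```

Here the set M collects the comparison target pairs, one per representative image.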

The weighted Euclidean distance is assumed in this embodiment because it is relatively easy to compute and can be estimated stably. However, the invention is not limited to this method, and the distance between two poses may be obtained in other ways.

Thereafter, as shown in FIG. 8, the same person recognition device (100) may use a transformation algorithm to correct the pose of at least one of the representative image (102-Q) and the comparison image (102-S) forming the comparison target pair (102-P) to a transformation target pose (S160).

After the step of selecting the comparison target pairs (S150), a corrected image is generated by using the transformation algorithm to correct the pose of at least one of the representative image (102-Q) and the comparison image (102-S) to the transformation target pose. This step is performed because, even though the comparison target pair (102-P) is selected in S150 by matching a representative image (102-Q) and a comparison image (102-S) that have similar poses and views, the poses and views of the paired images do not match exactly.

Conventionally, image matching has compared images body part by body part, as in the deformable part model (DPM). However, executing the DPM algorithm requires a large amount of computation, and part-wise comparison does not reflect the overall structure.

Accordingly, the present invention may include a step of matching the poses of the representative image and the comparison image so as to reflect the overall structure without requiring such a large amount of computation.

The transformation target pose may be determined, and the pose of an image transformed to match it, either by a non-rigid geometric transformation that uses joint information or by a neural-network-based method; any method that performs the same function may be used. Examples of non-neural non-rigid geometric transformations include the thin plate spline (TPS) method and the as-rigid-as-possible (ARAP) method. The transformation target pose may be generated based on Equations 7 and 8 below.

[Equation 7]

[Equation 8]

Here, the two resulting terms denote the corrected image of the representative image and the corrected image of the comparison image, respectively, each produced by the transformation algorithm.

In the above equations, the transformation algorithm takes a representative image or comparison image included in a comparison target pair, together with its current pose, and transforms it to the transformation target pose; the transformation target pose is defined at a given time of the query image sequence. The two remaining pose terms denote the pose of the person in each of the two corrected images.
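As one possible illustration of a joint-driven non-rigid transformation, the sketch below fits a two-dimensional thin plate spline that maps the joints of the current pose onto the joints of the transformation target pose. It covers only the coordinate mapping, not image resampling. The function names `fit_tps` and `apply_tps` are hypothetical, and the sketch assumes the joints are distinct and not all collinear, since the linear system is singular otherwise.

```python
import numpy as np

def _tps_kernel(r):
    # Thin plate spline radial basis U(r) = r^2 log r, with U(0) = 0.
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def fit_tps(src_joints, dst_joints):
    """Fit a 2D TPS mapping src_joints (N, 2) onto dst_joints (N, 2)."""
    src = np.asarray(src_joints, dtype=float)
    dst = np.asarray(dst_joints, dtype=float)
    n = src.shape[0]
    K = _tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])  # affine part: [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return src, params[:n], params[n:]  # control points, kernel weights, affine coefficients

def apply_tps(model, points):
    """Map arbitrary 2D points (M, 2) through a fitted TPS model."""
    src, w, a = model
    pts = np.asarray(points, dtype=float)
    U = _tps_kernel(np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2))
    return a[0] + pts @ a[1:] + U @ w
```

To warp an actual image, a typical approach would fit the mapping from target joints back to source joints and sample the source image at the mapped pixel coordinates; that resampling step is omitted here.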

In the pose correction step (S160) of the present invention, a corrected image can be generated by performing pose correction in one of two ways. In the first way, as in Equation 9 below, the pose of one of the two images forming the comparison target pair (102-P), that is, the pose of the representative image (102-Q) or the pose of the comparison image (102-S), is used as the transformation target pose, and the pose of the other image is corrected to that transformation target pose to generate a corrected image.

[Equation 9]

Here, the term defined by Equation 9 is the transformation target pose for a given time of the query image sequence and the corresponding time of the comparison image sequence being compared.

FIG. 8 illustrates the case described above, in which the pose of the representative image (102-Q) is set as the transformation target pose and the pose of the comparison image (102-S) is corrected to match the pose of the representative image (102-Q), generating a corrected image (102-SC) so that the poses of the two images coincide. Conversely, the pose of the comparison image (102-S) may of course be set as the transformation target pose, and the pose of the representative image (102-Q) corrected to match it, generating a corrected image so that the poses of the two images coincide.

In the second way, using Equation 10 below, the intermediate pose between the pose of the representative image (102-Q) and the pose of the comparison image (102-S) forming the comparison target pair (102-P) is used as the transformation target pose. Both the pose of the representative image (102-Q) and the pose of the comparison image (102-S) are corrected to that transformation target pose, generating two corrected images, namely a corrected representative image and a corrected comparison image, so that the poses of the two images coincide.

[Equation 10]

Accordingly, the same person recognition device (100) of the present invention generates the transformation target pose through the above equations and, based on the generated transformation target pose, transforms the pose of the comparison image (102-S) or the pose of the representative image (102-Q) so that the two poses coincide.
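The two ways of choosing the transformation target pose can be sketched as a small helper. The function name `target_pose` and its `mode` values are hypothetical, and the intermediate pose is assumed here to be the arithmetic mean of the two sets of joint coordinates, which this text does not confirm.

```python
import numpy as np

def target_pose(pose_q, pose_s, mode="query"):
    """Choose the transformation target pose for one comparison target pair.

    pose_q: joints (J, D) of the representative (query) image
    pose_s: joints (J, D) of the comparison image
    mode:   "query"        -> use the representative image's pose
            "comparison"   -> use the comparison image's pose
            "intermediate" -> use the joint-wise mean of both (assumed definition)
    """
    pose_q = np.asarray(pose_q, dtype=float)
    pose_s = np.asarray(pose_s, dtype=float)
    if pose_q.shape != pose_s.shape:
        raise ValueError("poses must have the same shape")
    if mode == "query":
        return pose_q.copy()
    if mode == "comparison":
        return pose_s.copy()
    if mode == "intermediate":
        return 0.5 * (pose_q + pose_s)
    raise ValueError(f"unknown mode: {mode}")
```

With `mode="query"` only the comparison image needs to be warped, as in FIG. 8; with `mode="intermediate"` both images are warped.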

The same person recognition device (100) may then extract features of the representative image and the comparison image that have undergone the pose correction step (S160) (S170). For convenience, the description here follows FIG. 8: as shown in the figure, the feature extraction step may extract the features of the representative image (102-Q) and the features of the corrected comparison image, that is, the corrected image (102-SC).

To calculate the similarity between the representative image (102-Q) and the corrected image (102-SC), feature values may be extracted using a convolutional neural network (CNN) or a conventional classical method, which can be formulated as in Equation 11 below.

[Equation 11]

In the above, the feature extractor is a neural network or a predetermined function for extracting features from an image.
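As a stand-in for the "predetermined function" option, the sketch below computes a per-channel color histogram as the feature vector. This is only an illustrative classical feature, not the extractor used in the embodiment; the function name `color_histogram_feature`, the bin count, and the assumption of 8-bit images with three channels are all choices made for the example.

```python
import numpy as np

def color_histogram_feature(image, bins=8):
    """Classical feature: concatenated per-channel histograms, L2-normalized.

    image: array (H, W, 3) with values in [0, 255]
    Returns a vector of length 3 * bins.
    """
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) image")
    feats = []
    for c in range(3):
        # Histogram of one color channel over the full 8-bit range.
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist)
    feat = np.concatenate(feats).astype(float)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```

A CNN embedding could replace this function without changing the later distance computation, since both return a fixed-length vector.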

The above description assumes that the transformation target pose is the pose of the representative image (102-Q), so the features extracted are those of the corrected image (102-SC), that is, the corrected comparison image, and those of the representative image (102-Q). However, the invention is not limited to this case.

Specifically, if the transformation target pose is set to the intermediate pose, the features of the representative image corrected to the transformation target pose and the features of the corrected comparison image are extracted.

Once the features have been extracted, the same person recognition device (100) may calculate the similarity and distance between the extracted features (S180).

As shown in the figure, the distance between the feature value of the representative image (102-Q) and the feature value of the corrected image (102-SC), that is, the corrected comparison image, is computed by a function that calculates the distance between two feature values, as in Equation 12 or Equation 13 below. The sum of these distances over the comparison target pairs (102-P), or a similar aggregate, may be used as the similarity. The distance between feature values may be computed using inverse cosine similarity or obtained through training of a neural network.

[Equation 12]

[Equation 13]

Here, the first feature term is the feature vector of the representative image at time point t for the person in the query image sequence, the second is the feature vector of the corrected comparison image corresponding to that time point, and the distance term is the distance (difference) between these feature vectors. In other words, the distance between the set of representative images and the set of comparison images may be defined either as the sum of the distances between the matched images of each pair or as the smallest of those distances.
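The two aggregation rules, sum and minimum, can be sketched as follows. The function names `cosine_distance` and `sequence_distance` are hypothetical. "Inverse cosine similarity" is interpreted here as one minus the cosine similarity; reading it as the arccosine of the similarity would be an equally plausible alternative that this text does not settle.

```python
import numpy as np

def cosine_distance(f1, f2):
    """Distance between two feature vectors as 1 - cosine similarity (assumed reading)."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    denom = np.linalg.norm(f1) * np.linalg.norm(f2)
    if denom == 0:
        return 1.0  # treat a zero vector as maximally dissimilar to anything non-identical
    return float(1.0 - np.dot(f1, f2) / denom)

def sequence_distance(query_feats, comparison_feats, reduce="sum"):
    """Aggregate per-pair feature distances over all comparison target pairs.

    query_feats[i] and comparison_feats[i] are the features of the i-th matched pair.
    reduce: "sum" adds the pair distances, "min" takes the smallest one.
    """
    if len(query_feats) != len(comparison_feats):
        raise ValueError("feature lists must be paired one-to-one")
    dists = [cosine_distance(q, s) for q, s in zip(query_feats, comparison_feats)]
    if not dists:
        raise ValueError("at least one pair is required")
    if reduce == "sum":
        return float(sum(dists))
    if reduce == "min":
        return float(min(dists))
    raise ValueError(f"unknown reduce mode: {reduce}")
```

The sum rewards sequences that match across all representative poses, while the minimum rewards a single very close pair.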

From the similarity results calculated in this way, an algorithm selects the closest Rank-K entries, that is, up to the K closest comparison targets, and outputs those K search results so that the user can inspect them and determine whether they show the same person. The value of K may vary according to the user's selection.
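Rank-K selection amounts to sorting the candidate sequences by distance and keeping the first K. A minimal sketch, with the hypothetical name `rank_k` and the assumption that a smaller distance means a higher similarity:

```python
import numpy as np

def rank_k(distances, k):
    """Return the indices of the k candidates with the smallest distance, nearest first."""
    distances = np.asarray(distances, dtype=float)
    k = max(0, min(int(k), distances.size))
    # A stable sort keeps the original order among candidates with equal distance.
    order = np.argsort(distances, kind="stable")
    return order[:k].tolist()
```

Setting `k=10` corresponds to the ten results displayed for each query in FIG. 10.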

Meanwhile, FIG. 10 illustrates the result of recognizing the same person using the same person recognition method according to an embodiment of the present invention. The image sequences shown in FIG. 10(a) are query image sequences (Q20, Q30, Q40), that is, image sequences of the reference persons used for same person recognition, and the image sequences shown in FIG. 10(b) are the output results.

As shown, when a query image sequence (Q20, Q30, Q40), that is, the image sequence of a reference person used for same person recognition, is selected or input from among the image sequences acquired from videos captured by a plurality of cameras, the same person recognition device (100) performs the same person recognition method to calculate the similarity. If the device is set to output the 10 image sequences with the highest similarity, 10 results are output as shown. The user can then inspect the output results and determine by eye whether each one is a video containing the same person as the reference person in the representative image sequence.

In the figure, person image sequences marked in red were determined to show a person different from the person being searched for, and person image sequences marked in green were recognized as the same person. Instead of comparing whole person images or dividing them into parts as in conventional approaches, the same person recognition method of the present invention matches the poses of the images before comparing them, so that, as shown, the same person can be identified even when the view (that is, the shooting angle) and the pose differ. This improves the performance and reliability of person re-detection in a multiple disaster-prevention camera system, and selecting representative images from the video also increases the search speed.

The same person recognition method of the present invention may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although various embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: Same person recognition device 110: Communication unit
130: Input unit 150: Memory
170: Output unit 190: Processor
191: Pre-processing unit 192: Representative image selection unit
193: Comparison target selection unit 194: Image correction unit
195: Extraction unit 196: Calculation unit

Claims

A same person recognition method performed by a same person recognition device, the method comprising:
receiving video input from at least one camera;
an image sequence acquisition step of acquiring an image sequence based on the input video;
a representative image selection step of setting, in the image sequence, the image sequence of a reference person for same person recognition as a query image sequence, and selecting a preset number of representative images from the query image sequence based on the poses of the images included in the query image sequence;
a comparison target pair selection step of setting, in the image sequence, the image sequence of a person to be compared for same person recognition as a comparison image sequence, and selecting a comparison target pair by matching a comparison image included in the comparison image sequence with the representative image;
a pose correction step of correcting, through a transformation algorithm, the pose of at least one of the representative image and the comparison image forming the comparison target pair to a transformation target pose;
a feature extraction step of extracting features of the comparison image and the representative image that have undergone the pose correction step; and
a calculation step of calculating the similarity and distance between the extracted features,
wherein the image sequence acquisition step includes
a step of detecting and tracking a person region in the input video to obtain an image sequence including a bounding box for the person, and
a pose estimation step of estimating the pose of each image by extracting joints from each image included in the image sequence,
wherein the representative image selection step,
based on the poses estimated in the pose estimation step, obtains the pose vector of each image included in the query image sequence, clusters the pose vectors through a clustering algorithm, finds the center of gravity of each cluster, and selects, among the images of each cluster, the image whose pose is closest to the center of gravity of that cluster as the representative image, and
wherein the representative images selected in the representative image selection step
are images selected, for each pose of the reference person that changes over time, from the query image sequence for which the pose estimation step has been completed.

delete

The method according to claim 1,
wherein, in the comparison target pair selection step,
the image in the comparison image sequence whose pose is most similar to the pose of the representative image is matched with the representative image as the comparison image, so that the representative image and the comparison image form a pair.

The method according to claim 1,
wherein, in the pose correction step,
the pose of one of the representative image and the comparison image forming the comparison target pair is set as the transformation target pose, and the pose of the other image is corrected to the transformation target pose.

The method according to claim 1,
wherein, in the pose correction step,
an intermediate pose between the pose of the representative image and the pose of the comparison image forming the comparison target pair is set as the transformation target pose, and the pose of the representative image and the pose of the comparison image are both corrected to the transformation target pose.

A computer-readable recording medium on which a computer program is recorded for performing the same person recognition method according to claim 1.

A same person recognition device comprising:
an input unit that receives video from at least one camera;
a pre-processing unit that obtains an image sequence based on the input video;
a representative image selection unit that sets, in the image sequence, the image sequence of a reference person for same person recognition as a query image sequence, and selects a preset number of representative images from the query image sequence based on the poses of the images included in the query image sequence;
a comparison target selection unit that sets, in the image sequence, the image sequence of a person to be compared for same person recognition as a comparison image sequence, and selects a comparison target pair by matching a comparison image included in the comparison image sequence with the representative image;
an image correction unit that corrects, through a transformation algorithm, the pose of at least one of the representative image and the comparison image forming the comparison target pair to a transformation target pose;
an extraction unit that extracts features of the comparison image and the representative image that have undergone pose correction by the image correction unit; and
a calculation unit that calculates the similarity and distance between the extracted features,
wherein the pre-processing unit
detects and tracks a person region in the input video to obtain an image sequence including a bounding box for the person, and
extracts joints from each image included in the image sequence to estimate the pose of each image,
wherein the representative image selection unit,
based on the estimated poses, obtains the pose vector of each image included in the query image sequence, clusters the pose vectors through a clustering algorithm, finds the center of gravity of each cluster, and selects, among the images of each cluster, the image whose pose is closest to the center of gravity of that cluster as the representative image, and
wherein the representative images selected by the representative image selection unit
are images selected, for each pose of the reference person that changes over time, from the query image sequence for which the pose estimation has been completed.

The device according to claim 8,
wherein the image correction unit
sets the pose of one of the representative image and the comparison image forming the comparison target pair as the transformation target pose, and corrects the pose of the other image to the transformation target pose.

The device according to claim 8,
wherein the image correction unit
sets an intermediate pose between the pose of the representative image and the pose of the comparison image forming the comparison target pair as the transformation target pose, and corrects both the pose of the representative image and the pose of the comparison image to the transformation target pose.