
KR20230141371A - Face tracking device and face tracking method using image generation network

Info

Publication number: KR20230141371A
Application number: KR1020220067059A
Authority: KR
Inventors: 한동석; 이동규
Original assignee: 경북대학교 산학협력단
Priority date: 2022-03-30
Filing date: 2022-05-31
Publication date: 2023-10-10

Abstract

The present invention relates to a face tracking device using an image generation network. To track an object in a video input by a user, the device includes: an image generation network unit that generates, from an initial image of the video, at least one additional image containing different features of at least one object; an information extraction unit that extracts feature information present in the initial image and the at least one additional image; an object tracking model generation unit that generates an object tracking model for tracking a face based on the feature information; and an object tracking unit that tracks the face in the video input by the user using the object tracking model. In this way, the various appearances of an object that change with camera movement and the environment are generated by the image generation network, which makes it possible to track the object.

Description

FACE TRACKING DEVICE AND FACE TRACKING METHOD USING IMAGE GENERATION NETWORK

The present invention relates to a face tracking device and a face tracking method using an image generation network and, more specifically, to a face tracking device and a face tracking method that perform object tracking by using an image generation network to capture the features of an object whose appearance changes with camera movement and the environment.

Conventional object tracking technology plays an important role as a foundational technology for advanced applications in various fields such as autonomous vehicles, road traffic monitoring, and medical systems.

Object tracking is a technology that recognizes a specific object in consecutive frames, that is, in a video, and continuously marks that object. It relies on a trained machine learning model and is classified into single-object tracking algorithms, which track a single object, and multi-object tracking algorithms, which track multiple objects.

A single-object tracking algorithm tracks only a specific object regardless of the number of objects in the frame and their classes, whereas a multi-object tracking algorithm sequentially tracks all classes in the frame, including the object to be tracked.

Because these single- or multi-object tracking algorithms use a deep learning model that has been trained in advance to recognize objects, they require a separate initialization step: in the first frame, either a separate object recognition algorithm must be used or the object must be marked manually with a bounding box.

In addition, algorithms that track objects using deep learning rely on the features of the object, but those features change easily with factors such as light, shadows, occlusion by other objects, and camera shake. The initially acquired feature information then becomes unusable, and face tracking accuracy drops.

Therefore, research and development is needed on technology that enables single- or multi-object tracking algorithms to track objects in a video with higher accuracy.

(Republic of Korea) Registered Patent Publication No. 10-2167760

The present invention was devised to solve the above problems. Its purpose is to provide a face tracking device and a face tracking method using an image generation network that extract feature information from images generated by the image generation network and track the face of an object based on the extracted feature information.

To achieve the above purpose, a face tracking device using an image generation network according to an embodiment of the present invention is a face tracking device that uses an image generation network to track an object in a video input by a user, and includes: an image generation network unit that generates, from an initial image of the video, at least one additional image containing different features of at least one object; an information extraction unit that extracts feature information present in the initial image and the at least one additional image; an object tracking model generation unit that generates an object tracking model for tracking a face based on the feature information; and an object tracking unit that tracks the face in the video input by the user using the object tracking model.

Here, at least one additional image may be generated, each containing the face region of the object present in the initial image transformed to a different angle.

In this regard, the image generation network unit may include: a region setting unit that sets the face region of the object present in the initial image; a feature vector generation unit that extracts features from the face region and inputs them into a preset neural network model to generate different feature vectors; and an additional image generation unit that generates, based on the different feature vectors, the additional images in which the face region of the object is transformed to different angles.

Accordingly, the object tracking model generation unit may include: a classification information filter generation unit that generates a classification information filter, which analyzes the feature information extracted from each of the initial image and the at least one additional image and produces classification information by classifying the object as foreground if the face region is the specific object and as background otherwise; and an object frame information filter generation unit that generates an object frame information filter, which marks the face region of the object contained in each of the initial image and the at least one additional image with a frame and computes frame information for the marked face region.

The object tracking model generation unit may further include a model generation unit that generates the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

Thus, the object tracking unit receives, from the information extraction unit, real-time feature information extracted from the video input by the user and applies it to the object tracking model to track and mark the specific object. It uses the classification information filter to classify whether the face region in the real-time feature information is foreground or background, and uses the object frame information filter to mark that face region with a frame, so that the specific object, namely the face, can be tracked and marked in the video.
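The tracking step described above can be illustrated with a minimal sketch. It assumes, in the manner of Siamese trackers, that the classification information filter is applied to the real-time feature map by sliding-window correlation, that the peak response is thresholded into foreground or background, and that the frame has a fixed size. The names `cross_correlate` and `locate_face`, the threshold, and the fixed box size are hypothetical simplifications and are not taken from the source, which uses a separate object frame information filter to compute the frame.

```python
def cross_correlate(search, kernel):
    """Slide `kernel` over the 2-D feature map `search` (valid positions only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(search) - kh + 1
    out_w = len(search[0]) - kw + 1
    return [
        [
            sum(search[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def locate_face(search, cls_kernel, box_size, threshold):
    """Return (left, top, right, bottom) of the best response, or None for background.

    Assumption: a fixed `box_size` stands in for the frame information that the
    object frame information filter would compute.
    """
    scores = cross_correlate(search, cls_kernel)
    best_score, top, left = max(
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
    )
    if best_score < threshold:
        return None  # classified as background
    box_h, box_w = box_size
    return (left, top, left + box_w, top + box_h)


# Toy 4x4 feature map with a 2x2 "face" response at rows 1-2, columns 2-3.
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
kernel = [[1, 1], [1, 1]]
box = locate_face(search, kernel, box_size=(2, 2), threshold=3.0)
no_box = locate_face(search, kernel, box_size=(2, 2), threshold=5.0)
```

In this toy case the peak response lies at the position of the 2x2 block, so `box` covers that block, while raising the threshold above the peak value makes the same frame count as background.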

Meanwhile, a face tracking method according to an embodiment of the present invention for achieving the above purpose is performed by a face tracking device using an image generation network and includes: an input step of receiving a video from a user; an image generation step of generating, from an initial image of the video, at least one additional image containing different features of at least one object; an information extraction step of extracting feature information present in the initial image and the at least one additional image; an object tracking model generation step of generating an object tracking model for tracking a face based on the feature information; and an object tracking step of tracking the face in the video input by the user using the object tracking model.

In this regard, the image generation step may include: a region setting step of setting the face region of the object present in the initial image; a feature vector generation step of extracting features from the face region and inputting them into a preset neural network model to generate different feature vectors; and an additional image generation step of generating, based on the different feature vectors, the additional images in which the face region of the object is transformed to different angles.

Accordingly, the object tracking model generation step may include: a classification information filter generation step of generating a classification information filter, which analyzes the feature information extracted from each of the initial image and the at least one additional image and produces classification information by classifying the object as foreground if the face region is the specific object and as background otherwise; and an object frame information filter generation step of generating an object frame information filter, which marks the face region of the object contained in each of the initial image and the at least one additional image with a frame and computes frame information for the marked face region.

The object tracking model generation step may further include a model generation step of generating the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

Thus, the object tracking step receives the real-time feature information extracted from the video input by the user in the information extraction step and applies it to the object tracking model to track and mark the specific object. It uses the classification information filter to classify whether the face region in the real-time feature information is foreground or background, and uses the object frame information filter to mark that face region with a frame, so that the specific object, namely the face, can be tracked and marked in the video.

According to one aspect of the present invention described above, by providing a face tracking device and a face tracking method using an image generation network, the various appearances of an object that change with camera movement and the environment can be generated by the image generation network and used as training data for the deep learning model that tracks the object.

As a result, a face tracking device and a face tracking method using an image generation network with high tracking accuracy can be provided to users.

FIG. 1 is a block diagram of a face tracking device using an image generation network according to an embodiment of the present invention.
FIG. 2 is a block diagram of the image generation network unit of FIG. 1.
FIG. 3 is a block diagram of the object tracking model generation unit of FIG. 1.
FIGS. 4 and 5 are example diagrams of the face tracking device using the image generation network of FIG. 1.
FIGS. 6 to 8 are flowcharts of a face tracking method according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals denote the same or similar functions throughout the several aspects.

In addition, the features and advantages of the present invention will become clearer from the detailed description based on the accompanying drawings. Terms and words used in this specification and the claims should not be interpreted in their ordinary, dictionary meaning; based on the principle that an inventor may appropriately define the concept of a term in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is a block diagram of a face tracking device using an image generation network according to an embodiment of the present invention, FIG. 2 is a block diagram of the image generation network unit of FIG. 1, FIG. 3 is a block diagram of the object tracking model generation unit of FIG. 1, and FIGS. 4 and 5 are example diagrams of the face tracking device using the image generation network of FIG. 1.

Referring to FIG. 1, the face tracking device using an image generation network (hereinafter, the face tracking device) 10 is provided to track an object in a video input by a user: it generates an object tracking model using an image generation network that produces different images from an initial image, and uses the object tracking model to track and mark a specific object in the video.

Specifically, object tracking, which tracks a specific object within a video, is now widely used in various fields such as autonomous driving, road traffic surveillance, and medical systems.

Such object tracking technology includes a deep learning model for tracking objects in the input video. The model is trained in advance on a large amount of training data so that it learns the characteristics of the tracked object under various environmental changes.

Object tracking performed with a deep learning model has a major advantage over offline tracking methods: its fast computation allows it to track objects in a video accurately in real time. However, it is strongly affected by the performance of the deep learning model.

Moreover, because object tracking performed with such a deep learning model is affected in real time by various factors in the input video, such as light, occlusion, and unstable camera movement, it requires more training data. This calls for an object tracking device that uses an image generation network to generate separate images.

Accordingly, the face tracking device 10 of the present invention may be configured to generate additional images corresponding to the initial image in the video using an image generation network, which is a deep learning model.

In addition, the face tracking device 10 of the present invention may be provided with a Siamese network that extracts feature information from the initial image and the additional images, and a Region Proposal Network (RPN) that generates an object tracking model based on the feature information and tracks a specific object using the object tracking model.

Even without the image generation network, Siamese network, and Region Proposal Network presented in the present invention, any known deep learning model that generates images, extracts features from the generated images, and tracks objects may be used.

To this end, as shown in FIG. 1, the face tracking device 10 according to an embodiment of the present invention may include an image generation network unit 11, an information extraction unit 13, an object tracking model generation unit 15, and an object tracking unit 17.
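As a rough illustration of how the four units could be wired together in software, the following sketch treats each unit as a pluggable callable. The class name `FaceTrackingDevice`, the method names, and the toy string-based stand-ins are hypothetical and serve only to show the data flow: the initial image and its generated variants are turned into features, a tracking model is built from those features, and later frames are passed through the same feature extractor before the model is applied.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class FaceTrackingDevice:
    """Hypothetical wiring of units 11, 13, 15 and 17; every callable is a placeholder."""

    generate_additional_images: Callable[[Any], List[Any]]  # image generation network unit (11)
    extract_features: Callable[[Any], Any]                   # information extraction unit (13)
    build_tracking_model: Callable[[List[Any]], Callable[[Any], Any]]  # model generation unit (15)
    _model: Optional[Callable[[Any], Any]] = field(default=None, init=False)

    def initialize(self, initial_image: Any) -> int:
        """Build the tracking model from the initial image and its generated variants."""
        images = [initial_image] + list(self.generate_additional_images(initial_image))
        features = [self.extract_features(image) for image in images]
        self._model = self.build_tracking_model(features)
        return len(images)

    def track(self, frame: Any) -> Any:
        """Object tracking unit (17): apply the model to features of a new frame."""
        if self._model is None:
            raise RuntimeError("call initialize() with the initial image first")
        return self._model(self.extract_features(frame))


# Toy stand-ins: images are strings, features are upper-cased strings,
# and the "model" simply checks membership in the known feature list.
device = FaceTrackingDevice(
    generate_additional_images=lambda image: [image + "_left", image + "_right"],
    extract_features=lambda image: image.upper(),
    build_tracking_model=lambda features: (lambda f: f in features),
)
n_images = device.initialize("face")
hit = device.track("face_left")
miss = device.track("background")
```

Keeping the units as interchangeable callables mirrors the statement in the text that other known deep learning models may be substituted for the networks named in the embodiment.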

Software (an application) for performing the face tracking method may be installed and executed on the face tracking device 10, and the image generation network unit 11, the information extraction unit 13, the object tracking model generation unit 15, and the object tracking unit 17 may be controlled by that software.

Although not shown in the drawings, the face tracking device 10 may further include components such as an input unit that receives a video from the user, a storage unit that stores at least one deep learning model used by the face tracking device 10, a communication unit for exchanging various video and image information with the user's external device, and an output unit. These components may also be controlled by the software for performing the face tracking method executed on the face tracking device 10.

The face tracking device 10 may be a separate terminal or a module of a terminal. The image generation network unit 11, the information extraction unit 13, the object tracking model generation unit 15, and the object tracking unit 17 may be formed as an integrated module or as one or more modules. Alternatively, each component may be implemented as a separate module.

The face tracking device 10 may be mobile or fixed. It may take the form of a server or an engine, and may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), wireless device, or handheld device.

The face tracking device 10 may run or build various software on top of an operating system (OS). The operating system is a system program that allows software to use the device's hardware, and may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as Windows, Linux, Unix, MAC, AIX, and HP-UX.

First, the image generation network unit 11 generates, from the initial image of the video input by the user, at least one additional image containing different features of at least one object.

Referring to FIG. 2, the image generation network unit 11 may include a region setting unit 111 that sets the face region of the object present in the initial image of the video; a feature vector generation unit 113 that extracts features from the face region and inputs them into a preset neural network model to generate different feature vectors; and an additional image generation unit 115 that generates, based on the different feature vectors, additional images in which the face region of the object is transformed to different angles.

The region setting unit 111, the feature vector generation unit 113, and the additional image generation unit 115 may be implemented within a single deep learning model, as shown in FIG. 4, or each may be implemented by a separate deep learning module.

More specifically, FIG. 4 shows an example of generating, from an image input to the image generation network unit 11, at least one additional image containing different features. Referring to FIG. 4, the image generation network unit 11 may additionally generate, based on the first image of the video input by the user (the initial image), images in which the angle of the object present in the initial image is changed.

When a video is input to the image generation network unit 11, the region setting unit 111 may set the face region of the object by assigning a preset region to the object present in the initial image of the video.
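One simple reading of assigning a preset region to the object is to place a fixed-size box around the object's position and clamp it to the image bounds. The sketch below follows that reading; the function names `set_face_region` and `crop_region`, the use of a center point as input, and the nested-list image are assumptions made for illustration, since the source does not say how the object position is obtained or how the region is represented.

```python
def set_face_region(center_x, center_y, region_w, region_h, image_w, image_h):
    """Return (left, top, right, bottom) for a preset-size region around a center.

    Assumption: the object's center and the preset region size are given.
    The box is clamped so that it never leaves the image.
    """
    left = max(0, center_x - region_w // 2)
    top = max(0, center_y - region_h // 2)
    right = min(image_w, left + region_w)
    bottom = min(image_h, top + region_h)
    return (left, top, right, bottom)


def crop_region(image, box):
    """Cut the face region out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x6 image whose pixel value encodes its position as row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
box = set_face_region(center_x=3, center_y=2, region_w=2, region_h=2,
                      image_w=6, image_h=4)
face = crop_region(image, box)
corner_box = set_face_region(10, 10, 64, 64, 640, 480)
```

The second call shows the clamping behavior: a center close to the top-left corner yields a box that starts at the image origin rather than at negative coordinates.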

The region setting unit 111 may also project the face region of the object present in the initial image into a latent space. The region setting unit 111 projects the initial image into the latent space in order to examine the distribution of the object's features in the initial image in high-dimensional form.

To this end, the region setting unit 111 may project the face region of the object present in the initial image into the latent space using an Inception-V2 network trained on the COCO dataset.

Meanwhile, the feature vector generation unit 113 may extract the features present in the face region of the object and generate different feature vectors based on those features.

More specifically, the feature vector generation unit 113 may extract only the features of the object's face region from the initial image projected by the region setting unit 111 and input the extracted features into a preset neural network model to generate different feature vectors.

The feature vector generation unit 113 may use, as the neural network model, a style GAN (Generative Adversarial Network), which consists of a mapping network that stores the features of the input image and an image generation network that generates images using the features stored in the mapping network.

The feature vector generation unit 113 may align the feature distribution of the face region using the mapping network of the style GAN, and extract, through AdaIN, the feature vectors to be adjusted from the aligned feature distribution.
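For reference, adaptive instance normalization (AdaIN) as commonly defined normalizes each feature channel to zero mean and unit variance and then applies a style-dependent scale and bias: AdaIN(x, y) = y_s * (x - mean(x)) / std(x) + y_b. The sketch below applies this to a single channel stored as a flat list. The function name `adain` and the small `eps` term are illustrative choices, and the source does not specify how the scale and bias are derived in its embodiment.

```python
import math


def adain(channel, style_scale, style_bias, eps=1e-8):
    """Apply AdaIN to one feature channel given a style scale and bias.

    `channel` is a flat list of activations. `eps` guards against division
    by zero for constant channels (an implementation assumption).
    """
    n = len(channel)
    mean = sum(channel) / n
    variance = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(variance + eps)
    return [style_scale * (v - mean) / std + style_bias for v in channel]


styled = adain([1.0, 2.0, 3.0, 4.0], style_scale=2.0, style_bias=0.5)

# After AdaIN the channel statistics follow the style parameters.
styled_mean = sum(styled) / len(styled)
styled_std = math.sqrt(sum((v - styled_mean) ** 2 for v in styled) / len(styled))
```

After the operation, the channel mean matches the style bias and the channel standard deviation matches the style scale, which is what allows a style vector to control the statistics of each feature channel.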

Here, the mapping network is a fully connected network composed of eight FC layers that classifies previously learned data according to layer depth. When a feature distribution, that is, a distribution of data, is input, the mapping network stores it and removes as much as possible of the distortion that arises during mapping, thereby reducing the correlation between feature distributions.
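The structure of an eight-layer fully connected mapping network can be sketched in a few lines. This is a toy version with random, untrained weights; the leaky ReLU activation with slope 0.2, the zero biases, the weight initialization, and the names `make_mapping_network` and `map_latent` are assumptions for illustration and are not specified in the source.

```python
import random


def make_mapping_network(dim, num_layers=8, seed=0):
    """Create `num_layers` fully connected layers of size dim x dim (untrained)."""
    rng = random.Random(seed)
    layers = []
    for _ in range(num_layers):
        weights = [[rng.gauss(0.0, 1.0 / dim ** 0.5) for _ in range(dim)]
                   for _ in range(dim)]
        bias = [0.0] * dim
        layers.append((weights, bias))
    return layers


def leaky_relu(value, slope=0.2):
    """Leaky ReLU activation (the slope is an assumed value)."""
    return value if value > 0 else slope * value


def map_latent(layers, z):
    """Pass the input vector z through every FC layer in turn."""
    w = list(z)
    for weights, bias in layers:
        w = [leaky_relu(sum(wi * xi for wi, xi in zip(row, w)) + b)
             for row, b in zip(weights, bias)]
    return w


layers = make_mapping_network(dim=4)
z = [0.5, -1.0, 0.25, 2.0]
w = map_latent(layers, z)
```

The output `w` has the same dimensionality as the input `z`; in a trained network it would be the intermediate vector whose components are adjusted to vary the generated image.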

In addition, the feature vector generator 113 may adjust features that vary according to the training data on which the mapping network was trained in advance.

For example, when the fifth feature vector in the feature distribution aligned according to the pre-learned data represents the overall face shape, the feature vector generator 113 may adjust the feature vector so that the hair, face color, or the like of a specific object included in the initial image is transformed into different forms.

In this way, the feature vector generator 113 may use the neural network model to extract features from the feature distribution of the face area converted by the area setting unit 111, and adjust the feature vectors of the extracted features to generate feature vectors having different values.
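One simple way to picture "adjusting a feature vector to obtain different values" is to shift a single component by several offsets. The sketch below is an assumption about how such an adjustment could look; `make_variants`, the vector length of 8, and the offsets are hypothetical and are not taken from the description.

```python
def make_variants(w, index, offsets):
    """Return copies of w whose component `index` is shifted by each offset."""
    variants = []
    for delta in offsets:
        v = list(w)          # copy so the original vector is left untouched
        v[index] += delta
        variants.append(v)
    return variants

base = [0.0] * 8
# Index 4 is the fifth component, echoing the example in the text.
variants = make_variants(base, index=4, offsets=[-1.0, 0.5, 1.0])
```

Each variant would then be passed to the image generation network to produce one additional image.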

Meanwhile, the additional image generator 115 may generate an additional image based on the initial image using the neural network model used by the feature vector generator 113, that is, the StyleGAN.

Here, the feature vector generator 113 and the additional image generator 115 are described as separate components for ease of explanation; however, as shown in FIG. 4, they may be provided as a single component executed in the same deep learning model, or as a plurality of components executed through different deep learning models.

More specifically, the additional image generator 115 may use the image generation network of the StyleGAN to apply the feature vectors having different values, generated by the feature vector generator 113, to the initial image, thereby generating additional images corresponding to the initial image. Here, the additional image generator 115 may use a PGAN (Progressive GAN) as the image generation network.

By using the PGAN as the image generation network, the additional image generator 115 may gradually increase or decrease the size of the input and output data, thereby addressing the resolution problem of conventional GANs.
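The gradual change in data size can be illustrated with a doubling schedule of resolutions. The starting resolution of 4 and the function name are assumptions; the final value of 512 matches the 512x512 output resolution mentioned in this description.

```python
def progressive_resolutions(start=4, final=512):
    """List the resolutions visited when the size doubles at each stage."""
    resolutions = []
    r = start
    while r <= final:
        resolutions.append(r)
        r *= 2
    return resolutions

schedule = progressive_resolutions()
# schedule == [4, 8, 16, 32, 64, 128, 256, 512]
```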

In addition, the additional image generator 115 may adjust the sizes of the initial image and the additional image in order to input them into the information extraction unit 13, which is described below.

More specifically, the additional image generator 115 may convert the initial image and the additional image, each generated at a resolution of 512x512, to a size of 127x127x3.

Here, if the converted size is smaller than 127x127, feature extraction performance degrades. To address this problem, the additional image generator 115 may fill the surrounding area with the average RGB value of all pixels in the image to reach a size of 127x127x3.

Accordingly, considering the time required for image generation and the accuracy of the generated image, it is most preferable that the additional image generator 115 convert the images to the input size of 127x127x3.
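The mean-value padding described above can be sketched as follows. Centering the original pixels on the canvas and the name `pad_with_mean` are assumptions; the description only states that the surroundings are filled with the average RGB value of all pixels.

```python
def pad_with_mean(image, size=127):
    """Pad an image (rows of (r, g, b) tuples) to size x size with its mean color."""
    h, w = len(image), len(image[0])
    if h > size or w > size:
        raise ValueError("image is larger than the target size")
    n = h * w
    # Average each of the three channels over every pixel in the image.
    mean = tuple(sum(px[c] for row in image for px in row) / n for c in range(3))
    top, left = (size - h) // 2, (size - w) // 2   # assumed: center the image
    canvas = [[mean for _ in range(size)] for _ in range(size)]
    for y, row in enumerate(image):
        for x, px in enumerate(row):
            canvas[top + y][left + x] = px
    return canvas

small = [[(10, 20, 30), (30, 40, 50)],
         [(50, 60, 70), (70, 80, 90)]]
padded = pad_with_mean(small)   # 127 rows of 127 RGB tuples
```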

Meanwhile, the image generation network unit 11 may further include a size adjustment unit 117 that adjusts the size of the initial image before the additional image is generated from the initial image.

The size adjustment unit 117 may convert the size of the initial image to 512x1 so that it can be used in the StyleGAN used by the feature vector generator 113 and the additional image generator 115.

In this manner, the image generation network unit 11 of the present invention may generate additional images from the initial image in the video input from the user.

At least one such additional image may be generated, including forms in which the face area of the object present in the initial image is transformed to different angles.

The information extraction unit () extracts feature information present in the initial image and in the at least one additional image generated by the image generation network unit ().

More specifically, referring to FIG. 5, the information extraction unit () may use a Siamese network that uses two identical CNNs based on VGG-16.

Here, the Siamese network is a network whose input terminals are provided in pairs so that a plurality of images can be received simultaneously, and which extracts feature information from the simultaneously input images.

The information extraction unit () may input the resized initial image and each of the at least one additional image into the paired input terminals of the Siamese network to extract the feature information present in each image.
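The defining property of a Siamese network is that both inputs pass through the same weights. The toy sketch below uses a single shared linear layer with ReLU as a stand-in for the CNN backbone; `make_extractor`, the weights, and the three-value "images" are hypothetical and chosen only to keep the example short.

```python
def make_extractor(weights):
    """Return a feature extractor that always uses the same shared weights."""
    def extract(pixels):
        # One linear layer with ReLU stands in for the CNN backbone.
        return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
                for row in weights]
    return extract

shared = make_extractor([[0.5, -0.25, 0.0],
                         [0.1, 0.2, 0.3]])

# Both branches call the same function, so they share parameters.
initial_features = shared([1.0, 2.0, 3.0])
additional_features = shared([3.0, 2.0, 1.0])
```

Because the branches are identical, features from the initial image and from an additional image are directly comparable.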

Meanwhile, the object tracking model generator 15 generates an object tracking model for tracking a face based on the feature information extracted from each image.

To this end, referring to FIG. 3, the object tracking model generator 15 may include a classification information filter generator 151, an object frame information filter generator 153, and a model generator 155.

More specifically, the classification information filter generator 151 may analyze the feature information extracted from each of the initial image and the at least one additional image, and generate a classification information filter that produces classification information by classifying the object as foreground when the face area is the specific object, and as background when the face area is not the specific object.

Here, the classification information filter generator may configure the classification information to include a true value when the classification information filter determines the object to be foreground, and a false value when the filter determines the object to be background.

Accordingly, the classification information filter generator 151 may generate a classification information filter that classifies the object as foreground or background depending on whether the object is the specific object, and produces classification information including a true value or a false value according to the classification result.

Meanwhile, the object frame information filter generator 153 may generate an object frame information filter that displays the face area of the object included in each of the initial image and the at least one additional image in the form of a frame, and calculates frame information of the face area so displayed.

More specifically, the object frame information filter generator 153 may configure the object frame information filter in advance to display the face area of the object included in each image in the form of a frame, and to calculate the x-axis value, y-axis value, length value, and height value of the framed face area as the frame information for each image.

Accordingly, the object frame information filter generator 153 may generate an object frame information filter that calculates frame information including the x-axis value, y-axis value, length value, and height value of the face area of the object appearing in each image.
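The two kinds of output described so far, a true or false classification value and frame information with four values, can be represented with small record types. The class names, the score threshold of 0.5, and the helper `classify` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameInfo:
    x: float        # x-axis value of the face frame
    y: float        # y-axis value of the face frame
    length: float   # horizontal extent of the frame
    height: float   # vertical extent of the frame

@dataclass(frozen=True)
class Detection:
    is_foreground: bool   # True: the specific object; False: background
    frame: FrameInfo

def classify(score, threshold=0.5):
    """Map a hypothetical foreground score to the true/false classification value."""
    return score >= threshold

detection = Detection(is_foreground=classify(0.8),
                      frame=FrameInfo(x=40.0, y=32.0, length=24.0, height=30.0))
```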

In this regard, the model generator 155 may generate the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

More specifically, the model generator 155 may use an RPN (Region Proposal Network) to perform the convolution operation on the classification information filter and the object frame information filter. Here, the RPN is a deep learning model that predicts a score for the location of an object and the boundary of the object, and performs computations on features using the feature map extracted by a CNN.

First, the model generator 155 may align the dimensions of the classification information filter and the object frame information filter using a stack layer.

Thereafter, the model generator 155 may perform a convolution operation that converts the dimension-aligned classification information filter and object frame information filter to a size of 6x6x(2k x 512)x3. Here, k is the number of anchors, which are predefined bounding boxes of various sizes.
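To make the role of k concrete, the sketch below builds k anchors from a set of aspect ratios and scales, a convention commonly used in region proposal networks. The base size, ratios, scales, and the name `make_anchors` are assumptions; the description does not state which anchor shapes are used.

```python
import math

def make_anchors(base_size, ratios, scales):
    """Return (width, height) pairs, one per ratio/scale combination."""
    anchors = []
    for ratio in ratios:            # ratio is height / width
        for scale in scales:
            area = (base_size * scale) ** 2
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((width, height))
    return anchors

anchors = make_anchors(base_size=8, ratios=[0.5, 1.0, 2.0], scales=[8])
k = len(anchors)                    # number of anchors
classification_channels = 2 * k     # the 2k factor that appears in the text
```

All anchors here cover the same area and differ only in shape, so the factor 2k grows with the number of shapes that are predefined.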

In addition, the model generator 155 may further perform a 1x1 convolution operation on the resized classification information filter and object frame information filter, adjusting the number of channels while preserving the existing filters as much as possible. The 1x1 convolution is performed so that the filters have the same channel dimension as an image that has been converted into feature information, allowing the object to be tracked in that image later.
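A 1x1 convolution changes only the channel count and leaves the spatial layout untouched, which is why it suits the channel adjustment described above. The sketch below is a minimal pure-Python version; the function name and the tiny 2x2x3 input are illustrative assumptions.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: H x W x C_in nested lists; weights: C_out x C_in.
    Every pixel is mapped independently, so H and W are unchanged.
    """
    return [[[sum(w * v for w, v in zip(row_w, pixel)) for row_w in weights]
             for pixel in row]
            for row in feature_map]

fmap = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [0, 1, 0]]]          # 2 x 2 spatial size, 3 channels
weights = [[1, 0, 0],
           [0.5, 0.5, 0.5]]              # reduce 3 channels to 2
out = conv1x1(fmap, weights)             # 2 x 2 spatial size, 2 channels
```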

In this way, the object tracking model generator 15 may generate, through the classification information filter generator 151, the object frame information filter generator 153, and the model generator 155, an object tracking model that tracks and displays the object in an image converted into feature information.

Meanwhile, the object tracking unit () tracks the face in the video input from the user using the object tracking model.

More specifically, referring again to FIG. 5, the object tracking unit () may receive the feature information that the information extraction unit () extracts from the video input from the user.

In addition, the object tracking unit () may apply the real-time feature information received from the information extraction unit () to the object tracking model to track and display the specific object. It may use the classification information filter to classify whether the face area in the real-time feature information is foreground or background, and use the object frame information filter to display that face area in the form of a frame, thereby tracking and displaying the specific object, namely the face, in the video.

In this regard, when the object tracking unit () uses the classification information filter, it may input the real-time feature information into the classification information filter to predict the anchor corresponding to the face area of the object included in the real-time feature information, and perform filtering that classifies whether the object within the predicted anchor is the specific object or background.

Likewise, when the object tracking unit () uses the object frame information filter, it may input the real-time feature information into the object frame information filter to set the coordinate values of the face area included in the real-time feature information, and perform filtering that displays the face area in the form of a frame based on the set coordinate values.

Thus, the object tracking unit () may use the object tracking model, generated from the feature information extracted from the initial image and from the additional images generated based on the initial image, to track the object in a video from which feature information is extracted in real time, and display the tracked specific object in the form of a frame, thereby tracking and displaying the specific object, that is, the face, in the video in real time.
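The per-frame tracking logic can be sketched as: score every anchor, keep the best one if it counts as foreground, and turn its predicted offsets into a frame. The offset parameterization below is the one commonly used with region proposal networks and is an assumption here, as are the threshold and the names `decode` and `track_frame`.

```python
import math

def decode(anchor, offsets):
    """Convert anchor-relative offsets into an (x, y, length, height) frame."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = offsets
    return (ax + dx * aw, ay + dy * ah, aw * math.exp(dw), ah * math.exp(dh))

def track_frame(anchors, scores, offsets, threshold=0.5):
    """Return the frame of the best foreground anchor, or None if all are background."""
    best = max(range(len(anchors)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None
    return decode(anchors[best], offsets[best])

anchors = [(10.0, 10.0, 20.0, 20.0), (50.0, 50.0, 40.0, 40.0)]
scores = [0.2, 0.9]                                  # hypothetical foreground scores
offsets = [(0.0, 0.0, 0.0, 0.0), (0.1, -0.1, 0.0, 0.0)]
frame = track_frame(anchors, scores, offsets)
```

Running the same function on each incoming video frame yields one frame box per image, or `None` when no anchor is classified as the specific object.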

Meanwhile, FIGS. 6 to 8 are flowcharts of a face tracking method according to an embodiment of the present invention. Because the method is carried out on the same configuration as the face tracking device 10 shown in FIGS. 1 to 5, the same reference numerals are used and repeated descriptions are omitted.

Referring to FIG. 6, the face tracking method according to an embodiment of the present invention is performed by the face tracking device 10 using an image generation network, and includes an input step 610, an image generation step 630, an information extraction step 650, an object tracking model generation step 670, and an object tracking step 690.

First, in the input step 610, the face tracking device 10 receives a video from the user.

Then, in the image generation step 630, the face tracking device 10 generates, from the initial image of the video, at least one additional image including different features of at least one object.

More specifically, referring to FIG. 7, the image generation step 630 may include an area setting step 631 of setting the face area of the object present in the initial image, a feature vector generation step 633 of extracting features from the face area and inputting them into a preset neural network model to generate different feature vectors, and an additional image generation step 635 of generating, based on the different feature vectors, additional images in which the face area of the object is transformed to different angles.

In this regard, in the information extraction step 650, the face tracking device 10 extracts the feature information present in the initial image and the at least one additional image.

Next, in the object tracking model generation step 670, the face tracking device 10 generates an object tracking model for tracking the face based on each piece of feature information.

To this end, the object tracking model generation step may consist of a classification information filter generation step (), an object frame information filter generation step (), and a model generation step ().

More specifically, in the classification information filter generation step (), the face tracking device 10 may analyze the feature information extracted from each of the initial image and the at least one additional image, and generate a classification information filter that produces classification information by classifying the object as foreground when the face area is the specific object, and as background when the face area is not the specific object.

In addition, in the object frame information filter generation step (), the face tracking device () may generate an object frame information filter that displays the face area of the object included in each of the initial image and the at least one additional image in the form of a frame, and calculates frame information of the face area so displayed.

Then, in the model generation step (), the face tracking device () may generate the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

Next, in the object tracking step 690, the face tracking device 10 tracks the face in the video input from the user using the object tracking model.

More specifically, in the object tracking step 690, upon receiving the real-time feature information extracted in the information extraction step 650 from the video input from the user, the device may apply the real-time feature information to the object tracking model to track and display the specific object.

At this time, the object tracking step 690 may use the classification information filter to classify whether the face area in the real-time feature information is foreground or background, and use the object frame information filter to display that face area in the form of a frame, thereby tracking and displaying the specific object, namely the face, in the video.

Thus, the face tracking device 10 using an image generation network may perform the face tracking method to track and display the specific object in the video input from the user.
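The order of the method steps can be summarized as a small driver function. This is only a structural sketch: the four callables are hypothetical placeholders for the image generation, information extraction, model generation, and tracking stages, and treating the first frame as the initial image is an assumption.

```python
def track_faces(frames, generate, extract, build_model, track):
    """Run the steps in order: 610 input, 630 generate, 650 extract, 670 model, 690 track."""
    frames = list(frames)                                   # input step 610
    if not frames:
        return []
    initial = frames[0]                                     # assumed initial image
    additional = generate(initial)                          # image generation step 630
    features = [extract(image) for image in [initial, *additional]]  # step 650
    model = build_model(features)                           # model generation step 670
    return [track(model, extract(frame)) for frame in frames]        # tracking step 690

# Toy stand-ins so the control flow can be exercised without real networks.
results = track_faces(
    [1, 2, 3],
    generate=lambda image: [image + 10, image + 20],
    extract=lambda image: image * 2,
    build_model=lambda feats: sum(feats),
    track=lambda model, feat: (model, feat),
)
```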

The face tracking method described above may be implemented in the form of program instructions executable through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although various embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present invention.

10: Face tracking device using an image generation network
11: Image generation network unit
111: Area setting unit
113: Feature vector generator
115: Additional image generator
117: Size adjustment unit
13: Information extraction unit
131: Classification information extraction unit
133: Feature information extraction unit
15: Object tracking model generator
17: Object tracking unit

Claims

A face tracking device using an image generation network, which tracks a face in a video input from a user, the device comprising:
an image generation network unit that generates, from an initial image of the video, at least one additional image including different features of at least one object;
an information extraction unit that extracts feature information present in the initial image and the at least one additional image;
an object tracking model generator that generates an object tracking model for tracking the face based on the feature information; and
an object tracking unit that tracks the face in the video input from the user using the object tracking model.

The face tracking device of claim 1,
wherein at least one additional image is generated,
including a form in which the face area of the object present in the initial image is transformed to different angles.

The face tracking device of claim 2,
wherein the image generation network unit comprises:
an area setting unit that sets the face area of the object present in the initial image;
a feature vector generator that extracts features from the face area and inputs the features into a preset neural network model to generate different feature vectors; and
an additional image generator that generates, based on the different feature vectors, the additional image in which the face area of the object is transformed to different angles.

The face tracking device of claim 3,
wherein the object tracking model generator comprises:
a classification information filter generator that analyzes the feature information extracted from each of the initial image and the at least one additional image, and generates a classification information filter that produces classification information by classifying the object as foreground when the face area is a specific object and as background when the face area is not the specific object; and
an object frame information filter generator that generates an object frame information filter that displays the face area of the object included in each of the initial image and the at least one additional image in the form of a frame, and calculates frame information of the face area displayed in the form of a frame.

The face tracking device of claim 4,
wherein the object tracking model generator
further comprises a model generator that generates the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

The face tracking device of claim 5,
wherein the object tracking unit
receives real-time feature information extracted by the information extraction unit from the video input from the user, and applies the real-time feature information to the object tracking model to track and display the specific object, by using the classification information filter to classify whether the face area in the real-time feature information is foreground or background, and using the object frame information filter to display the face area in the real-time feature information in the form of a frame, thereby tracking and displaying the specific object, which is the face, in the video.

A face tracking method performed by a face tracking device using an image generation network, the method comprising:
an input step of receiving a video from a user;
an image generation step of generating, from an initial image of the video, at least one additional image including different features of at least one object;
an information extraction step of extracting feature information present in the initial image and the at least one additional image;
an object tracking model generation step of generating an object tracking model for tracking a face based on the feature information; and
an object tracking step of tracking the face in the video input from the user using the object tracking model.

The face tracking method of claim 7,
wherein at least one additional image is generated,
including a form in which the face area of the object present in the initial image is transformed to different angles.

The face tracking method of claim 8,
wherein the image generation step comprises:
an area setting step of setting the face area of the object present in the initial image;
a feature vector generation step of extracting features from the face area and inputting the features into a preset neural network model to generate different feature vectors; and
an additional image generation step of generating, based on the different feature vectors, the additional image in which the face area of the object is transformed to different angles.

The face tracking method of claim 9,
wherein the object tracking model generation step comprises:
a classification information filter generation step of analyzing the feature information extracted from each of the initial image and the at least one additional image, and generating a classification information filter that produces classification information by classifying the object as foreground when the face area is a specific object and as background when the face area is not the specific object; and
an object frame information filter generation step of generating an object frame information filter that displays the face area of the object included in each of the initial image and the at least one additional image in the form of a frame, and calculates frame information of the face area displayed in the form of a frame.

The face tracking method of claim 10,
wherein the object tracking model generation step
further comprises a model generation step of generating the object tracking model by performing a convolution operation on the classification information filter and the object frame information filter.

The face tracking method of claim 11,
wherein the object tracking step comprises
receiving real-time feature information extracted in the information extraction step from the video input from the user, and applying the real-time feature information to the object tracking model to track and display the specific object, by using the classification information filter to classify whether the face area in the real-time feature information is foreground or background, and using the object frame information filter to display the face area in the real-time feature information in the form of a frame, thereby tracking and displaying the specific object, which is the face, in the video.