KR102446882B1

KR102446882B1 - Apparatus for recognizing faces using a large number of facial features and a learning control method thereof

Info

Publication number: KR102446882B1
Application number: KR1020200101421A
Authority: KR
Inventors: 김용현; 신종주; 박원표
Original assignee: 주식회사 카카오; 주식회사 카카오엔터프라이즈
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2022-09-22
Also published as: KR20220020729A

Abstract

The present invention relates to a face recognition apparatus, and a control method thereof, for efficiently training on a larger amount of training data. More specifically, the invention receives an image set for training, computes a first embedding vector for the image set based on a face recognition model, performs first learning based on the computed first embedding vector, and performs second learning based on the computed first embedding vector and a second embedding vector. Training repeats a learning cycle that runs from the computing step through the second learning step, and the second embedding vector is a first embedding vector computed in a previously performed learning cycle.

Description

Apparatus for recognizing a face using a large number of facial features and a learning control method thereof

The present invention relates to a face recognition apparatus and a learning control method thereof. More specifically, it relates to an apparatus, and a learning control method thereof, that sharply increases the number of images referenced in a single learning cycle during the supervised learning the apparatus performs in order to recognize faces, so that training is both faster and more accurate.

Face recognition is a core technology in many biometric authentication applications, such as electronic payment, smartphone screen locking, and video surveillance.

The main face recognition tasks are face verification and face identification. Face verification compares a pair of faces to determine whether they share the same identity. Face identification determines the identity of a given face by comparing it against a gallery of pre-registered identities.

Research on improving face recognition accuracy has been carried out for decades, and the recent adoption of convolutional neural networks (CNNs) has improved recognition accuracy substantially. Face recognition techniques extract a feature representation by placing a feature extraction layer after a CNN backbone network such as ResNet or MobileNet.

A face recognition system gradually improves its accuracy by repeatedly performing supervised learning, that is, learning to find the output that matches a given input from data consisting of input values paired with their corresponding output values.

Both the accuracy of face recognition and the time required for training are affected by the amount of data that can be handled in a single learning cycle. If more image data is analyzed in a single learning cycle, recognition accuracy can rise and training time can fall.

However, extracting the feature representation of an image requires a large amount of memory, so only a small amount of data can be learned in a single learning cycle, and simply adding memory is not an effective approach.

Accordingly, research is needed on a method that can learn from a larger number of images in a single learning cycle without depending on the amount of memory.

One object of the present invention is to provide a face recognition apparatus, and a learning control method thereof, that can improve overall recognition performance by accurately estimating a person's representative feature from a large number of facial features and making use of it.

Another object of the present invention is to provide a face recognition apparatus, and a learning control method thereof, that can accurately compute a person's representative feature (identity representative vector) by accumulating facial features while training proceeds, compensating for the accumulated error that training introduces, and using the accumulated facial features together with the current features.

The technical objects of the present invention are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood from the description below by a person of ordinary skill in the art to which the present invention pertains.

To achieve the above or other objects, one aspect of the present invention provides a learning control method for a face recognition apparatus, the method comprising: receiving an image set for training; computing a first embedding vector for the image set based on a face recognition model; performing first learning based on the computed first embedding vector; and performing second learning based on the computed first embedding vector and a second embedding vector, wherein the training repeatedly performs a learning cycle that runs from the computing step through the second learning step, and the second embedding vector is the first embedding vector computed in a previously performed learning cycle.

The first learning step may include: computing a proximity between an identity representative vector corresponding to a given person and the first embedding vector; and adjusting a parameter so that the computed proximity increases, wherein the adjusted parameter is the face recognition parameter used in the face recognition model.
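As a non-limiting illustration, the first learning step could be sketched in Python as below. The linear toy model `embed`, the dot product used as the proximity measure, the learning rate `lr`, and all function names are assumptions made for this sketch; the specification does not prescribe them. Only the model parameter `theta` is adjusted, while the identity representative vector stays fixed.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def embed(theta, image):
    """Toy stand-in for the face recognition model: a linear map theta @ image."""
    return [dot(row, image) for row in theta]


def first_learning_step(theta, image, rep_vector, lr=0.1):
    """One gradient-ascent step on proximity = dot(rep_vector, embed(theta, image)).

    For the linear toy model, d(proximity)/d(theta[i][j]) = rep_vector[i] * image[j].
    """
    return [
        [theta[i][j] + lr * rep_vector[i] * image[j] for j in range(len(image))]
        for i in range(len(theta))
    ]


theta = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical face recognition parameter
image = [1.0, 2.0]                 # hypothetical (flattened) input image
rep_vector = [1.0, 0.0]            # hypothetical identity representative vector

proximity_before = dot(rep_vector, embed(theta, image))
theta = first_learning_step(theta, image, rep_vector)
proximity_after = dot(rep_vector, embed(theta, image))
```

In this toy setting the proximity rises after the step, which is the direction of adjustment the first learning step calls for.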

The second learning step may include: determining a proximity among an identity representative vector corresponding to a given person, the first embedding vector, and the second embedding vector; and adjusting the identity representative vector so that the determined proximity increases.
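As a non-limiting illustration, the second learning step could be sketched as follows. Cosine similarity as the proximity measure, the gradient-ascent update on the sum of dot products, the learning rate `lr`, and all names are assumptions made for this sketch. The point it shows is that the identity representative vector is adjusted using both the embeddings of the current cycle and the embeddings kept from earlier cycles.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def mean_proximity(rep_vector, embeddings):
    """Average cosine similarity between the representative vector and the embeddings."""
    return sum(cosine(rep_vector, e) for e in embeddings) / len(embeddings)


def second_learning_step(rep_vector, current_embs, past_embs, lr=0.5):
    """Move the representative vector toward current AND past embeddings.

    The gradient of sum_i dot(rep_vector, e_i) with respect to rep_vector
    is simply the sum of the embeddings.
    """
    all_embs = current_embs + past_embs
    grad = [sum(e[k] for e in all_embs) for k in range(len(rep_vector))]
    return normalize([w + lr * g for w, g in zip(rep_vector, grad)])


rep_vector = [1.0, 0.0]                      # hypothetical starting vector
current_embs = [[0.6, 0.8]]                  # first embedding vectors (this cycle)
past_embs = [[0.8, 0.6], [0.0, 1.0]]         # second embedding vectors (earlier cycles)

before = mean_proximity(rep_vector, current_embs + past_embs)
rep_vector = second_learning_step(rep_vector, current_embs, past_embs)
after = mean_proximity(rep_vector, current_embs + past_embs)
```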

The second learning step may further include compensating (interpolating) the second embedding vector, and the proximity may be determined based on the compensated second embedding vector.

The compensating step may include computing, for a specific person, the difference between the identity representative vector of the learning cycle currently being performed and the identity representative vector of a previously performed learning cycle, and the compensation may be based on the computed difference.

The compensating step may further include scaling the computed difference, and the compensation may be based on the scaled difference.
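As a non-limiting illustration, the compensation could be sketched as follows. The text states only that the compensation is based on the (optionally scaled) difference between the current and the earlier identity representative vector; adding that scaled difference to the stored embedding, the value of `scale`, and the function name `compensate` are assumptions made for this sketch.

```python
def compensate(old_embedding, rep_now, rep_then, scale=1.0):
    """Shift an embedding stored in an earlier cycle by the scaled drift
    of its person's identity representative vector.

    rep_now  : representative vector in the current learning cycle
    rep_then : representative vector when the embedding was computed
    """
    diff = [now - then for now, then in zip(rep_now, rep_then)]
    return [e + scale * d for e, d in zip(old_embedding, diff)]


old_embedding = [1.0, 2.0]   # second embedding vector from an earlier cycle
rep_then = [0.5, 0.5]        # representative vector at that earlier cycle
rep_now = [0.7, 0.4]         # representative vector now

compensated = compensate(old_embedding, rep_now, rep_then, scale=2.0)
```

With `scale=0.0` the stored embedding is returned unchanged, which corresponds to using the second embedding vector without compensation.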

The proximity may be determined using a loss function defined by the following equation.

[Equation: loss function; the formula and its symbols are not recoverable from this text]

In the equation, an auxiliary relation is defined, and the remaining symbols denote, in order: the image set; the size (count) of the image set; the first embedding vector; the labeled person; an L2-normalized vector; and the identity representative vector for the corresponding person.

The image set may include a plurality of images of a plurality of persons, and the first embedding vector may be computed for each of the plurality of images.

The method may further include storing the identity representative vector in association with each of the plurality of persons.

The method may further include updating the second embedding vector so that the computed first embedding vector is included in it, and the updated second embedding vector may be used in learning cycles performed after the update.
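As a non-limiting illustration, the repeated learning cycle and the update of the second embedding vectors could be sketched as follows. The bounded first-in-first-out queue (`collections.deque` with `maxlen`), the eviction of the oldest entries, the toy `encode` function, and all names are assumptions made for this sketch; the two learning steps are left as a comment so that only the data flow is shown.

```python
from collections import deque


def run_cycles(mini_batches, encode, queue_size):
    """Run learning cycles and record how many embeddings each cycle can use.

    Returns a list of (num_first_embeddings, num_second_embeddings) per cycle.
    """
    queue = deque(maxlen=queue_size)   # holds the second embedding vectors
    usage = []
    for batch in mini_batches:
        # First embedding vectors: computed in the current cycle.
        first = [(encode(image), label) for image, label in batch]
        # Second embedding vectors: first embeddings of earlier cycles only.
        second = list(queue)
        # (first learning on `first`, second learning on `first` + `second`)
        usage.append((len(first), len(second)))
        # Update step: the new first embeddings join the stored set.
        queue.extend(first)
    return usage


def encode(image):
    """Toy stand-in for the face recognition model."""
    return [float(image), 1.0]


batches = [
    [(1, "a"), (2, "b")],
    [(3, "a"), (4, "b")],
    [(5, "a"), (6, "b")],
]
usage = run_cycles(batches, encode, queue_size=3)
```

Each cycle encodes only its own mini-batch, yet the second learning step can draw on embeddings from earlier cycles, so the number of images considered per cycle grows without encoding them again.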

To achieve the above or other objects, another aspect of the present invention provides a face recognition apparatus comprising: a face recognition model that receives an image set for training and computes a first embedding vector for the input image set; a first learning unit that performs first learning based on the computed first embedding vector; and a second learning unit that performs second learning based on the computed first embedding vector and a second embedding vector, wherein the training repeatedly performs a learning cycle consisting of the computation, the first learning, and the second learning, and the second embedding vector may be the first embedding vector computed in a previously performed learning cycle.

The first learning unit may compute a proximity between an identity representative vector corresponding to a given person and the first embedding vector, and may adjust a parameter so that the computed proximity increases, wherein the adjusted parameter is the face recognition parameter used in the face recognition model.

The second learning unit may determine a proximity among an identity representative vector corresponding to a given person, the first embedding vector, and the second embedding vector, and may adjust the identity representative vector so that the determined proximity increases.

When determining the proximity, the second learning unit may compensate (interpolate) the second embedding vector and make the determination based on the compensated second embedding vector.

In performing the compensation, the second learning unit may compute, for a specific person, the difference between the identity representative vector of the learning cycle currently being performed and the identity representative vector of a previously performed learning cycle, and may compensate based on the computed difference.

In performing the compensation, the second learning unit may scale the computed difference and compensate based on the scaled difference.

The first and second learning units may determine the proximity using a loss function defined by the following equation.

[Equation: loss function; the formula and its symbols are not recoverable from this text]

In the equation, an auxiliary relation is defined, and the remaining symbols denote, in order: the image set; the size (count) of the image set; the first embedding vector; the labeled person; an L2-normalized vector; and the identity representative vector for the corresponding person.

The apparatus may further include a classifier (classification unit) that stores the identity representative vector in association with each of the plurality of persons.

The apparatus may further include a queue storage unit that updates the second embedding vector so that the computed first embedding vector is included in it, and the updated second embedding vector may be used in learning cycles performed after the update.

The effects of the face recognition apparatus and its learning control method according to the present invention are as follows.

According to at least one embodiment of the present invention, the number of images used in training for face recognition can be increased dramatically while memory usage is kept at a reasonable level.

In addition, according to at least one embodiment of the present invention, training for face recognition can be performed faster and more accurately through a learning process that considers a large number of images together.

Further scope of applicability of the present invention will become apparent from the detailed description below. However, because various changes and modifications within the spirit and scope of the invention will be clear to those skilled in the art, the specific embodiments described in the detailed description should be understood as given by way of example only.

FIG. 1 is a conceptual diagram of a face recognition model (110) that extracts a feature representation (facial features) based on a convolutional neural network.
FIG. 2 is a conceptual diagram illustrating the classification-stage operation of a typical classifier (210).
FIG. 3 illustrates one cycle of a typical supervised learning process.
FIG. 4 is a block diagram of a face recognition apparatus (100) according to an embodiment of the present invention.
FIG. 5 illustrates one cycle of a supervised learning process according to an embodiment of the present invention.
FIG. 6 is a flowchart for the face recognition apparatus (100) according to an embodiment of the present invention.
FIGS. 7 and 8 are conceptual diagrams of a method for compensating (interpolating) an embedding vector according to an embodiment of the present invention.
FIG. 9 shows experimental data demonstrating the effect of the compensation according to an embodiment of the present invention.
FIG. 10 compares the increase in required memory as the amount of training data increases.
FIG. 11 shows experimental data for the case where the face recognition method according to an embodiment of the present invention is applied to image retrieval.

Hereinafter, embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting and do not in themselves have distinct meanings or roles. In describing the embodiments, detailed descriptions of related known technology are omitted where they could obscure the gist of the embodiments. The accompanying drawings are provided only to aid understanding of the embodiments; they do not limit the technical idea disclosed in this specification, which should be understood to cover all changes, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms containing ordinals, such as "first" and "second", may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is described as being "connected" or "coupled" to another component, it may be connected or coupled to that component directly, but intervening components may also be present. By contrast, when a component is described as being "directly connected" or "directly coupled" to another component, no intervening component is present.

A singular expression includes the plural unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

* Definitions

- Instance: a single image (also called a sample).

- Identity: a single person (in general vision tasks such as image retrieval, this corresponds to a class).

- Face recognition model (model, encoder): a trained convolutional neural network (CNN); the term covers both the network structure and the model parameters. Given an image, the face recognition model encodes it into an embedding vector that reflects the image's features. The model parameters are optimized through the training process. For example, FaceNet or ResNet can be used to map the facial features contained in an image to a vector.

- Classifier (classification unit): once the face recognition model has produced an embedding vector for a particular image (that is, for the face it contains), the classifier identifies the identity (person) by finding, in its database, the identity representative vector closest (most similar) to that embedding vector. The classifier model holds a database of identity representative vectors for many persons. During training, the classifier model continuously updates the parameters of the stored identity representative vectors according to the result of the loss function.

- Training (learning): the process of optimizing a model; the parameters of the face recognition model or the classifier model are optimized based on tens of thousands of images or more. Because of the physical limits of the computing device, not all images can be learned at once, so training proceeds by iterating over image sets of a preset size (mini-batches).

- Embedding vector: a feature vector, produced by the trained model from a given input, that embeds the meaning of the input image. An image can be encoded into an embedding vector by the face recognition model. The similarity between images can be judged by computing the distance between their embedding vectors: similar images yield vectors that are close together, and the distance grows as similarity falls. For example, the output produced when an image is fed into FaceNet or ResNet can be called an embedding vector.

- Embedding space: the virtual space onto which embedding vectors are mapped.

- Identity representative vector: a weight vector (weight parameter) of the classifier that represents a specific identity (person). Semantically, it is the mean of the embedding vectors of the instances belonging to that identity; in implementation terms, it is a weight vector of the last fully connected layer. For example, a vector that represents the embedding vectors of photos of Donald Trump (for instance, their mean) can be called the identity representative vector for the person Donald Trump. The identity representative vector is optimized by the training process.

- Loss function (cost function): a function that computes how much cost or loss there is; here, it computes the degree of dissimilarity between embedding vectors, or between an embedding vector and an identity representative vector. In the present invention, a large difference between the embedding vector and the identity representative vector gives a large loss, and a small difference gives a small loss. Parameters (face recognition model parameters or classifier parameters) are optimized in the direction that minimizes the value of the loss function.

- Batch: the entire data set of instances (images) used for supervised learning of the model.

- Mini-batch: the data set that is processed as one unit (for example, in a single learning cycle) during supervised learning of the model. For example, if 210 instances (images) are fed to the face recognition model as the training data of a single learning cycle (one iteration), that set of 210 images is called a mini-batch.
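As a non-limiting illustration of the batch and mini-batch definitions above, the following sketch splits a batch into mini-batches of a preset size, one per learning cycle. The function name `mini_batches` and the toy data are assumptions made for this sketch.

```python
def mini_batches(batch, batch_size):
    """Yield consecutive mini-batches of at most `batch_size` instances."""
    for start in range(0, len(batch), batch_size):
        yield batch[start:start + batch_size]


# Hypothetical batch: ten (image_id, identity_label) training pairs.
batch = [(i, "person_%d" % (i % 3)) for i in range(10)]

sizes = [len(mb) for mb in mini_batches(batch, batch_size=4)]
```

Here the ten instances are processed in three learning cycles; the last mini-batch is smaller because the batch size does not divide the data set evenly.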

FIG. 1 is a conceptual diagram of a face recognition model (110, hereinafter "the model") that extracts a feature representation (facial features) based on a convolutional neural network.

When an image of a specific person (identity) is input, the model (110) encodes the feature representation of that person's face. The result of this encoding can be expressed as an N-dimensional vector. This vector reflects the features of the face and is called an embedding vector.

In FIG. 1, when a first image (101-1) of a first person (105-1) is input to the model (110), the model (110) encodes the first image (101-1) into a first embedding vector (102-1). Likewise, it encodes a second image (101-2) of the first person (105-1) into a second embedding vector (102-2), and a third image (101-3) of a second person (105-2) into a third embedding vector (102-3). Here the first and second persons (105-1, 105-2) are different people, and the first and second images (101-1, 101-2) are different images of the same first person (105-1).

The distance (or proximity) between embedding vectors obtained in this way indicates whether two images show the same person or different people. Computing the distances among the first to third embedding vectors (102-1 to 102-3) shows that the first and second embedding vectors (102-1, 102-2) are relatively close to each other, while the third embedding vector (102-3) is relatively far from them. It follows that the first and second images (101-1, 101-2) show the same person (the first person), and the third image (101-3) shows a different person (the second person).

In short, whether two images show the same person can be determined by checking whether the embedding vectors obtained through the model (110) are relatively close together or far apart.
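As a non-limiting illustration of the distance comparison described for FIG. 1, the following sketch compares three toy embedding vectors. The two-dimensional vectors, the Euclidean distance, the `threshold` value, and the function names are assumptions made for this sketch; real embedding vectors are N-dimensional outputs of the model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(e1, e2, threshold):
    """Treat two embeddings as the same identity if they are closer than `threshold`."""
    return euclidean(e1, e2) < threshold


emb_1 = [0.9, 0.1]   # first image of the first person
emb_2 = [0.8, 0.2]   # second image of the first person
emb_3 = [0.1, 0.9]   # image of the second person

d_12 = euclidean(emb_1, emb_2)
d_13 = euclidean(emb_1, emb_3)
verdict_12 = same_person(emb_1, emb_2, threshold=0.5)
verdict_13 = same_person(emb_1, emb_3, threshold=0.5)
```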

The model (110) can be expressed as Equation 1 below.

[Equation 1]
$e = f(x; \theta)$

Here, $f$ is the function representing the face recognition model, $e$ is the embedding vector, $\theta$ is the face recognition parameter used in the model (110), and $x$ is the input image. The face recognition parameter $\theta$ is obtained through supervised learning on tens of thousands of images or more, and it is continuously updated toward an optimum during the supervised learning process. Supervised learning here means learning to find the output that matches a given input from data consisting of input values paired with their corresponding output values. In face recognition, it means training carried out when both the image containing a face and the identity of that face are known, that is, when the correct answer is known.

FIG. 2 is a conceptual diagram illustrating the learning stage of a typical classifier (210).

The classification unit 210 may hold a representative embedding vector for each person. Such a representative embedding vector is referred to as an 'identity-representative vector'. The identity-representative vectors of the classification unit 210 are likewise continuously updated toward an optimum through the supervised learning process described above (described later with reference to FIG. 3).

Hereinafter, with reference to FIG. 3, a specific supervised learning process that continuously refines the identity-representative vectors and the face-recognition parameters toward an optimum will be described.

FIG. 3 is a diagram illustrating one cycle of a general supervised learning process.

As described above, supervised learning refers to learning to find an output when the output value is given together with the input value. That is, supervised learning is performed in a state where an image (input value) is given together with information about who the person in that image is (output value). Data in which an input value and an output value are supplied as a pair in this way is called training data.

The more learning is performed with training data, the better the model 110 and the identity-representative vectors can be optimized. In other words, the more training data there is, the more accurate the resulting model 110 and identity-representative vectors become. A training data set used for face-recognition technology typically contains millions of images of hundreds of thousands of people.

For this reason, it is desirable to base learning on as much training data as possible. However, because of the physical limit of the memory provided in the apparatus, the amount of training data that can be input in a single learning pass is inevitably limited, since the model 110 requires a large amount of memory to encode each input instance (image).

Therefore, it is common practice to split the entire training data set (batch) into small data sets (mini-batches) that fit within the available memory, and to optimize the model 110 and the identity-representative vectors by training on these mini-batches sequentially and repeatedly. That is, the face-recognition apparatus is optimized while the training data is learned repeatedly in units of mini-batches. For example, about 210 instances (images) are typically input as the training data for one pass.
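By way of non-limiting illustration, splitting a training data set into mini-batches can be sketched as follows. The function name `iter_mini_batches`, the shuffling step, and the toy array sizes are assumptions made for the example; only the batch size of 210 echoes the figure given above.

```python
import numpy as np

def iter_mini_batches(images, labels, batch_size, rng):
    """Yield shuffled (images, labels) mini-batches that cover the data once."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
images = np.zeros((1000, 8))      # toy stand-in for 1000 images
labels = np.arange(1000) % 50     # toy identity labels
batches = list(iter_mini_batches(images, labels, batch_size=210, rng=rng))
print(len(batches), [len(b[0]) for b in batches])
```

The last mini-batch is smaller than the others whenever the data set size is not a multiple of the batch size.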

Referring to the figure, a mini-batch 301 (for example, about 210 images) is input to the model 110. The model 110 encodes the input mini-batch 301 into embedding vectors and passes them to the learning unit 310. The learning unit 310 performs learning by optimizing the parameters of the model 110 and the identity-representative vectors so that each embedding vector comes closest to its corresponding identity-representative vector and moves away from the non-corresponding identity-representative vectors, and then updating the model 110 and the classification unit 210 with the optimized values, respectively. In other words, the optimization brings an embedding vector and an identity-representative vector of the same person closer together, and pushes an embedding vector and an identity-representative vector of different persons farther apart.

At this time, the learning unit 310 may calculate how close (proximity) or how far apart (loss) an embedding vector and an identity-representative vector are, based on the loss function of Equation 2 below.

[Equation 2]

Here, the term appearing in Equation 2 can be expressed by Equation 3 below.

[Equation 3]

In Equations 2 and 3, the symbols denote, in order: the image set input as the mini-batch 301; the number of images in that set; the embedding vector; the labeled identity (that is, the ground-truth output value of the training data); an L2-normalized vector; and the identity-representative vector for the corresponding identity. The remaining symbol denotes the set of identity-representative vectors stored in the database.
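By way of non-limiting illustration, one loss of the kind described by these definitions is a plain softmax-based loss over L2-normalized vectors. The form and notation below are assumptions, not a reproduction of the original equations: $X$ is the mini-batch, $x_i$ and $y_i$ are the embedding vector and labeled identity of instance $i$, a tilde marks an L2-normalized vector, and $W_j$ is the identity-representative vector of identity $j$. Any margin or scale term is omitted.

```latex
\mathcal{L}(X) = \frac{1}{|X|} \sum_{i \in X} \ell(x_i, y_i),
\qquad
\ell(x_i, y_i) = -\log
\frac{\exp\!\left(\tilde{W}_{y_i}^{\top} \tilde{x}_i\right)}
     {\sum_{j} \exp\!\left(\tilde{W}_{j}^{\top} \tilde{x}_i\right)}
```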

That is, the learning unit 310 performs learning by modifying the parameters and the identity-representative vectors in the direction that minimizes the 'loss' calculated by the loss function, and then returning the modified parameters to the model 110 and the modified identity-representative vectors to the classification unit 210 so that both are updated with the modified values.
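By way of non-limiting illustration, a loss that is small when an embedding vector lies near its own identity-representative vector and large otherwise can be sketched as follows. The sketch assumes a plain softmax cross-entropy over cosine similarities; the function names and the `scale` parameter are hypothetical and are not taken from this description.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    """Scale vectors to unit length along `axis`."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def softmax_loss(embeddings, labels, W, scale=1.0):
    """Mean cross-entropy between normalized embeddings and identity vectors."""
    logits = scale * l2_normalize(embeddings) @ l2_normalize(W).T   # shape (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())

W = np.eye(3)                                      # toy identity-representative vectors
x = np.array([[2.0, 0.1, 0.0], [0.0, 0.1, 3.0]])   # toy embedding vectors
loss_right = softmax_loss(x, np.array([0, 2]), W, scale=10.0)
loss_wrong = softmax_loss(x, np.array([1, 1]), W, scale=10.0)
print(loss_right, loss_wrong)
```

Minimizing such a loss pulls each embedding vector toward its own identity-representative vector and away from the others, which is the behavior the learning unit 310 is described as producing.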

The parameters and identity-representative vectors updated in this way are the optimized result at that learning cycle. The learning cycle described above may likewise be repeated for the remaining training data, and as the learning cycle is repeated, the parameters and the identity-representative vectors are optimized iteratively.

As described above, because the conventional learning method uses mini-batches, it is difficult to handle a large amount of training data (identities or instances) at once. The small amount of training data that can be learned at one time means that many iterations are needed to visit tens of thousands of identities, and this complicates the task of learning an optimal decision boundary in the embedding space while comprehensively considering all identities. This is because, when learning is repeated in units of mini-batches, the more iterations accumulate, the less properly the results of early learning are reflected; what was learned in the early iterations is hardly reflected later. Consequently, even if learning is repeated for a long time, the accuracy does not rise above a certain level and saturates.

Simply increasing the size of the mini-batch may alleviate some of these problems, but it is impractical because of memory limits. Moreover, merely increasing the mini-batch size does not guarantee improved accuracy.

Therefore, the present invention proposes a method that can comprehensively consider a vast number of identities without being constrained by memory. To this end, the present invention proposes performing learning by considering together the embedding vectors for the mini-batch (first embedding vectors) and the embedding vectors used in past learning (second embedding vectors).

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 4 is a block diagram of a face-recognition apparatus 100 according to an embodiment of the present invention. FIG. 5 is a diagram illustrating one cycle of a supervised learning process according to an embodiment of the present invention. FIG. 6 is a flowchart of the face-recognition apparatus 100 according to an embodiment of the present invention. The following description refers to FIGS. 4 to 6 together.

The face-recognition apparatus 100 according to an embodiment of the present invention may include a model 110, a classification unit 210, a learning unit 310, and a queue storage unit 410. The components shown in FIG. 4 are not all essential for implementing the face-recognition apparatus 100, so the face-recognition apparatus 100 described in this specification may have more or fewer components than those listed above.

When an image of a specific person (identity) is input, the model 110 extracts a feature representation of that person's face and performs encoding based on the extracted feature representation. The result of the encoding may be expressed as an N-dimensional vector. This vector reflects the features of the face and is called an embedding vector.

The classification unit 210 holds an 'identity-representative vector' for each of a number of persons (identities). When an embedding vector is input, the classification unit 210 classifies it by matching it to the identity-representative vector with the highest similarity.
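By way of non-limiting illustration, matching an embedding vector to the most similar identity-representative vector can be sketched as follows. Cosine similarity is assumed as the similarity measure, and the function name `classify` and the toy vectors are hypothetical.

```python
import numpy as np

def classify(embedding, identity_vectors):
    """Return the index of the identity vector most similar to the embedding."""
    e = embedding / np.linalg.norm(embedding)
    W = identity_vectors / np.linalg.norm(identity_vectors, axis=1, keepdims=True)
    return int(np.argmax(W @ e))   # highest cosine similarity wins

# Three toy identity-representative vectors, one per row.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
pred = classify(np.array([0.2, 0.9]), W)
print(pred)
```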

The learning unit 310 performs learning using the input training data and, as a result, updates the face-recognition parameters and the identity-representative vectors in the direction that optimizes them. The learning unit 310 according to an embodiment of the present invention may include a first learning unit 310-1 for optimizing the face-recognition parameters and a second learning unit 310-2 for optimizing the identity-representative vectors. The first and second learning units 310-1 and 310-2 are described in more detail below.

The queue storage unit 410 accumulates and stores the embedding vectors and the identity-representative vectors used in each learning cycle. That is, the queue storage unit 410 holds the accumulated embedding vectors and identity-representative vectors that were used in past learning. The present invention proposes comprehensively considering a vast number of identities (instances) by learning from the past embedding vectors stored in the queue storage unit 410 together with the embedding vectors for the mini-batch input in the current learning cycle.
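By way of non-limiting illustration, a queue storage of this kind can be sketched as a bounded first-in, first-out buffer. The class name `QueueStorage`, the single combined buffer, and the capacity limit are assumptions made for brevity; each entry keeps a past embedding vector, its label, and a copy of the identity-representative vector as it was when the embedding was stored.

```python
from collections import deque
import numpy as np

class QueueStorage:
    """Bounded FIFO of past embeddings, labels and identity vectors."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)   # oldest entries drop out first

    def enqueue(self, embeddings, labels, identity_vectors):
        for x, y in zip(embeddings, labels):
            # Snapshot the identity vector as it was when x was encoded.
            self.items.append((x.copy(), int(y), identity_vectors[int(y)].copy()))

    def load(self):
        xs, ys, ws = zip(*self.items)
        return np.stack(xs), np.array(ys), np.stack(ws)

    def __len__(self):
        return len(self.items)

q = QueueStorage(max_size=4)
W = np.eye(3)
q.enqueue(np.ones((3, 2)), [0, 1, 2], W)
q.enqueue(2 * np.ones((3, 2)), [2, 1, 0], W)
xs, ys, ws = q.load()
print(len(q), ys.tolist())
```

Because the stored vectors are already encoded, loading them does not require running the model again on the past images.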

The learning process of FIG. 5 is described together with the flowchart of FIG. 6. As shown in FIG. 5, when a mini-batch 301 is input (S601), the model 110 encodes the input mini-batch 301 into first embedding vector(s) 501 (S602). The first learning unit 310-1 performs first learning based on the first embedding vector(s) 501 (S603). The mini-batch 301 may include a plurality of images of a plurality of identities, and a first embedding vector may be calculated for each of the plurality of images.

Here, the first learning means updating the face-recognition parameters by modifying them so that the identity-representative vector and the first embedding vector of the same identity come closer to each other. The modified face-recognition parameters are passed to the model 110 so that the update is reflected in the model 110. That is, the face-recognition parameters are optimized in the first learning step (S604).

To this end, the first learning step according to an embodiment of the present invention may include determining the proximity between the identity-representative vector and the first embedding vector, and optimizing the parameters so that the determined proximity increases.

In the step of determining the proximity, the first loss function of Equation 4 may be used. In the optimizing step as well, optimization may be performed in the direction that minimizes the loss calculated through the first loss function.

[Equation 4]

In Equation 4, the symbols denote, in order, the image set input as the mini-batch 301, the number of images in that set, and the first embedding vector.

Furthermore, in order to also use the past data stored in the queue storage unit 410, the learning unit 310 according to an embodiment of the present invention loads second embedding vector(s) 502 (S605) and performs second learning (S606). The second learning unit 310-2 performs the second learning (S606) using the second embedding vector(s) 502 stored in the first queue storage unit 410-1 together with the first embedding vector(s).

The second learning step according to an embodiment of the present invention is similar to the first learning step described above. In the second learning step, the identity-representative vector may be updated by modifying it so that the identity-representative vector, the first embedding vector, and the second embedding vector of the same identity come closer to one another. The modified identity-representative vector is passed to the classification unit 210 so that the update is reflected in the classification unit 210. That is, the identity-representative vectors, which are the parameters of the classification unit 210, are optimized in the second learning step.

To this end, the second learning step according to an embodiment of the present invention may include determining the proximity among the identity-representative vector, the first embedding vector, and the second embedding vector, and optimizing the parameters so that the determined proximity increases.

In the step of determining the proximity, the second loss function of Equation 5 may be used. In the optimizing step as well, optimization may be performed in the direction that minimizes the loss calculated through the second loss function.

[Equation 5]

The additional symbol in Equation 5 denotes the second embedding vector(s) stored in the first queue storage unit 410-1. Descriptions in common with Equation 4 are omitted.
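By way of non-limiting illustration, a loss that takes both sets of embedding vectors into account can be written under assumed notation as an average over their union. The form below is an assumption and not a reproduction of the original equation: $X$ is the current mini-batch, $X^{-}$ is the set of second embedding vectors held in the queue, and $\ell(\cdot,\cdot)$ is a per-instance loss between an embedding vector and the identity-representative vectors.

```latex
\mathcal{L}(X \cup X^{-}) =
\frac{1}{|X \cup X^{-}|}
\left(
  \sum_{i \in X} \ell(x_i, y_i)
  + \sum_{k \in X^{-}} \ell(x_k^{-}, y_k)
\right)
```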

That is, according to the second learning step described above, the identity-representative vectors (the parameters of the classification unit) can be optimized by comprehensively considering not only the training data of the mini-batch 301 input in the current learning cycle but also the training data used in past learning. Because the past training data is not encoded again (the embedding vectors already encoded in the past are used as they are), learning can refer to more training data without significantly increasing memory usage.

The face-recognition apparatus 100 is updated (S608) based on the parameters optimized by the first and second learning steps. Then, for reuse in subsequent learning, the first embedding vectors of the current learning cycle may be accumulated in the first queue storage unit 410-1 (that is, added to the current second embedding vectors), and the identity-representative vectors may be accumulated in the second queue storage unit 410-2.

Next, the process moves to step S609 and determines whether learning has been performed on the entire training data. If learning is not complete, the process returns to step S601 and receives another mini-batch as input.
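By way of non-limiting illustration, one learning cycle of the kind walked through in steps S601 to S609 can be sketched as follows. The sketch makes several simplifying assumptions that are not part of this description: a linear map `theta` stands in for the model 110, a plain softmax cross-entropy stands in for the loss functions, a Python list stands in for the queue storage, and the names `softmax_grad` and `learning_cycle` are hypothetical.

```python
import numpy as np

def softmax_grad(X, y, W):
    """Return the loss and its gradients w.r.t. embeddings X and classifier W."""
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(p[np.arange(n), y]).mean()
    p[np.arange(n), y] -= 1.0          # softmax minus one-hot
    G = p / n                          # dL/dlogits
    return loss, G @ W, G.T @ X        # loss, dL/dX, dL/dW

def learning_cycle(images, labels, theta, W, queue, lr=0.1, max_queue=64):
    x1 = images @ theta                              # S602: first embedding vectors
    _, dX, _ = softmax_grad(x1, labels, W)           # S603: first learning
    new_theta = theta - lr * images.T @ dX           # S604: update encoder only
    if queue:                                        # S605: load second embeddings
        x2 = np.stack([item[0] for item in queue])
        y2 = np.array([item[1] for item in queue])
        X_all = np.vstack([x1, x2])
        y_all = np.concatenate([labels, y2])
    else:
        X_all, y_all = x1, labels
    _, _, dW = softmax_grad(X_all, y_all, W)         # S606: second learning
    new_W = W - lr * dW                              # update identity vectors only
    queue.extend(zip(x1.copy(), labels.tolist()))    # keep for later cycles
    del queue[:-max_queue]                           # retain only the newest entries
    return new_theta, new_W

rng = np.random.default_rng(0)
theta = rng.normal(size=(5, 4)) * 0.1
W = rng.normal(size=(3, 4)) * 0.1
queue = []
for step in range(3):                                # S609: repeat over mini-batches
    images = rng.normal(size=(8, 5))
    labels = rng.integers(0, 3, size=8)
    theta, W = learning_cycle(images, labels, theta, W, queue, max_queue=16)
print(len(queue), theta.shape, W.shape)
```

The past embedding vectors enter only the classifier gradient, so no past image has to be encoded again; this is the property that keeps memory usage low in the scheme described above.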

Furthermore, another embodiment of the present invention proposes that the second embedding vectors stored in the first queue storage unit 410-1 be reflected in learning not as they are but after being interpolated (compensated). This is because embedding vectors learned in the past were encoded with the past parameters and therefore do not reflect the parameter updates made since then.

FIGS. 7 and 8 are conceptual diagrams of a method for interpolating embedding vectors according to an embodiment of the present invention.

Referring to FIG. 7, the face-model parameters had a past value and, after several repetitions of the learning cycle, have been updated to their current value. With the past face-model parameters, the first to third instances of a first identity were encoded into a-th to c-th embedding vectors 701-a to 701-c, and the fourth to sixth instances of a second identity were encoded into d-th to f-th embedding vectors 701-d to 701-f.

However, if the updated parameters are used for encoding instead of the past face-model parameters, the first to third instances of the first identity are encoded into g-th to i-th embedding vectors 701-g, 701-h, and 701-i, and the fourth to sixth instances of the second identity are encoded into j-th to l-th embedding vectors 701-j, 701-k, and 701-l. In other words, as the parameters are updated, the encoded results change as well. Consequently, if an embedding vector used in past learning is used as it is after many repetitions of learning, it can introduce an error as large as that given by Equation 6 below.

[Equation 6]

If learning has not been repeated many times, the error may not be large. However, as learning is repeated, the error gradually accumulates. To minimize this error, an embodiment of the present invention proposes an interpolation function based on Equation 7 below.

[Equation 7]

In Equation 7, one symbol denotes the past embedding vector before interpolation and the other denotes the interpolated past embedding vector.

The interpolation function should minimize the function of Equation 8 below, which is the expected squared error, for a given identity, between the interpolated embedding vector and the current embedding vector.

[Equation 8]

Here, the expectation term in Equation 8 denotes the expected value of the embedding vector for the given identity.

Taking the partial derivative of Equation 8 with respect to the interpolation function yields Equation 9 below.

[Equation 9]

Solving Equation 9 for the optimal interpolation function gives the approximation shown in Equation 10 below.

[Equation 10]

Here, the past identity-representative vector appearing in Equation 10 is the one stored in the second queue storage unit 410-2; it is stored at the same time as the corresponding embedding vector is stored in the first queue storage unit 410-1 during past learning.

As shown in FIG. 8, when the difference between a first center vector indicating the past mean point (class center) 710 of the past embedding vectors 701-a, 701-b, and 701-c of the first identity and a second center vector indicating the current mean point 710' is added to the embedding vectors 701-a, 701-b, and 701-c, the past mean point 710 can be made to coincide with the current mean point 710'. However, as in Equation 11 below, the difference may be multiplied by a constant that matches the scale between the embedding vectors and the identity-representative vectors. The past embedding vectors aligned to the current mean point 710' in this way are referred to as interpolated embedding vectors 701*-1 to 701*-3.

[Equation 11]

In Equation 11, the symbols denote the past embedding vector before interpolation, the past embedding vector after interpolation, the current identity-representative vector for the identity concerned, and the past identity-representative vector for that identity.
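By way of non-limiting illustration, the chain of reasoning from the interpolation function to the final correction can be sketched under assumed notation. The symbols are illustrative choices and the equations are a reconstruction from the surrounding prose, not a reproduction of the original equations: $x_i^{-}$ is a past embedding vector, $x_i$ the embedding the current model would produce for the same instance, $x_i^{*}$ the interpolated vector, $\rho(y)$ the additive interpolation function for identity $y$, $W_y$ and $W_y^{-}$ the current and past identity-representative vectors, and $\lambda$ the scale-matching constant.

```latex
\begin{aligned}
x_i^{*} &= x_i^{-} + \rho(y_i) \\
J(\rho(y)) &= \mathbb{E}\!\left[\, \lVert x_i - (x_i^{-} + \rho(y)) \rVert^{2} \;\middle|\; y_i = y \right] \\
\frac{\partial J}{\partial \rho(y)} &= \mathbb{E}\!\left[\, 2\left(\rho(y) - (x_i - x_i^{-})\right) \;\middle|\; y_i = y \right] = 0 \\
\rho(y) &= \mathbb{E}\!\left[ x_i \mid y_i = y \right] - \mathbb{E}\!\left[ x_i^{-} \mid y_i = y \right]
         \approx \lambda \left( W_y - W_y^{-} \right) \\
x_i^{*} &= x_i^{-} + \lambda \left( W_{y_i} - W_{y_i}^{-} \right)
\end{aligned}
```

The approximation in the fourth line replaces the class means of the embedding vectors, which would require re-encoding, with the identity-representative vectors that are already stored.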

Returning to FIG. 5, in the embodiment described above the second embedding vectors 502 stored in the first queue storage unit 410-1 were used without interpolation. Instead, interpolated embedding vectors, obtained by interpolating the second embedding vectors according to Equation 11 using the past identity-representative vectors stored in the second queue storage unit 410-2, may be used.
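By way of non-limiting illustration, shifting a past embedding vector by the movement of its identity-representative vector can be sketched as follows. The function name `compensate` and the default choice of scale (the ratio of the embedding norm to the past identity-vector norm) are assumptions; this description states only that a scale-matching constant may be applied. The toy vectors are likewise invented for the example.

```python
import numpy as np

def cosine_error(a, b):
    """Return 1 - cosine similarity between two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compensate(x_past, w_now, w_past, scale=None):
    """Shift a past embedding by the change of its identity-representative vector."""
    if scale is None:
        # Assumed scale: match the embedding norm to the identity-vector norm.
        scale = np.linalg.norm(x_past) / np.linalg.norm(w_past)
    return x_past + scale * (w_now - w_past)

w_past = np.array([1.0, 0.0])     # identity vector when x_past was stored
w_now = np.array([0.8, 0.6])      # identity vector after further learning
x_past = np.array([2.0, 0.2])     # embedding stored in the queue
x_now = np.array([1.7, 1.3])      # toy embedding from the current model

x_star = compensate(x_past, w_now, w_past)
err_raw = cosine_error(x_past, x_now)
err_comp = cosine_error(x_star, x_now)
print(err_raw, err_comp)
```

In this toy case the shifted vector lies closer in angle to the current embedding than the stored one does, which is the effect the interpolation is intended to achieve.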

When the interpolated embedding vectors are used, the second loss function of Equation 5 described above may be changed as in Equation 12 below.

[Equation 12]

By applying the methods described above, the present invention can take a considerably larger amount of training data into account in a single learning cycle. The model 110 is still optimized based on the mini-batch input in the current learning cycle, but the parameters of the classification unit 210 (the identity-representative vectors) can be optimized by considering past training data as well as the mini-batch, so the accuracy is found to improve steadily as learning is repeated. Moreover, an embodiment of the present invention can be applied to an existing face-recognition apparatus simply by adding the queue storage unit 410, which has the advantage of easy compatibility with existing face-recognition apparatuses.

The effects of the embodiments described above are demonstrated below with experimental data.

FIG. 9 shows experimental data demonstrating the effect of interpolation according to an embodiment of the present invention.

FIG. 9(a) is an experimental result comparing the cosine error between embedding vectors in the current learning cycle with and without interpolation. The error was computed on randomly sampled instances over 64 learning cycles.

When few learning cycles have been repeated (32 or fewer), the cosine error with and without interpolation does not differ much. When many learning cycles have been repeated (32 or more), however, the error increases, which indicates that interpolation of the past data is necessary. It is evident that the accuracy of learning may decrease as this error grows.

FIG. 9(b) shows a scatter plot of the past, current, and interpolated embedding vectors for 64 instances from the past (64 learning cycles). Because the past embedding vectors move toward the current embedding vectors after interpolation is applied, the proposed interpolation function is confirmed to be effective.

FIG. 10 compares the increase in required memory as the amount of training data increases.

The x-axis represents the total training data size (the mini-batch size or the storage size of the queue storage unit), and the y-axis represents memory usage.

As the dotted-line graph shows, if the training data size is increased in the conventional way, simply by increasing the mini-batch size, the required memory also increases linearly. As the solid-line graph shows, if the size of the queue storage unit is increased instead according to an embodiment of the present invention, the required memory does not exceed 32 GB, the memory size of a single GPU, even when the training data is increased to about 8,192 instances. This is considerably lower than the approximately 952 GB of memory needed to learn 8,192 instances by the conventional approach of enlarging the mini-batch.

* Experimental settings

In the preprocessing stage of this experiment, face images were normalized to 112 × 112 by transforming the face region using five facial landmarks (the two eyes, the nose, and the mouth). ResNet-100, which has been widely used recently, served as the backbone network. After the res5c layer of ResNet-100, a batch normalization block, a fully connected layer, and a batch normalization layer are applied to compute a 512-dimensional embedding vector. The encoded embedding vector and the weight vectors of the linear classification unit are L2-normalized and trained with ArcFace. Training used four synchronized NVIDIA V100 GPUs, with a mini-batch of 128 images assigned to each GPU. In the embodiment of the present invention (hereinafter referred to as BroadFace), the queue storage unit holds up to 8,192 embedding vectors per GPU, accumulated over 64 iterations, so it holds 32,768 embedding vectors in total across the four GPUs. To avoid abrupt changes in the embedding space, the BroadFace network is trained starting from a network pre-trained with a softmax-based loss. A stochastic gradient descent (SGD) optimizer was adopted; the learning rate was set to 5 × 10⁻³ for the first 50k, 5 × 10⁻⁴ for the next 20k, and 5 × 10⁻⁵ for the last 10k, with a weight decay of 5 × 10⁻⁴ and a momentum of 0.9.
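The queue sizes and the step-wise learning rate quoted above follow from simple arithmetic. The sketch below reproduces them. It assumes that the 50k/20k/10k figures count training iterations, which is not stated explicitly, and the names `QUEUE_PER_GPU`, `SCHEDULE`, and `learning_rate` are hypothetical.

```python
IMAGES_PER_GPU = 128      # mini-batch assigned to each GPU
ITERATIONS_QUEUED = 64    # past iterations whose embeddings are kept
NUM_GPUS = 4

QUEUE_PER_GPU = IMAGES_PER_GPU * ITERATIONS_QUEUED   # 8,192 embedding vectors
QUEUE_TOTAL = QUEUE_PER_GPU * NUM_GPUS               # 32,768 embedding vectors

# (stage length, learning rate); the unit "iterations" is an assumption.
SCHEDULE = [(50_000, 5e-3), (20_000, 5e-4), (10_000, 5e-5)]

def learning_rate(step):
    """Return the learning rate for a 0-based training step."""
    boundary = 0
    for length, rate in SCHEDULE:
        boundary += length
        if step < boundary:
            return rate
    return SCHEDULE[-1][1]  # past the last stage: keep the final rate

print(QUEUE_PER_GPU, QUEUE_TOTAL)
print(learning_rate(0), learning_rate(50_000), learning_rate(79_999))
```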

* Training data set

In the experiments, the face recognition apparatus was trained on MSCeleb-1M, which consists of approximately 10 × 10⁶ instances (images) of 1 million identities. A refined version was used, in which the noisy labels of MSCeleb-1M were removed, leaving 3.8 × 10⁶ images of 85 × 10³ identities. Evaluation was then performed on the data sets listed below.

- LFW (Labeled Faces in the Wild) is one of the most commonly used data sets for face verification. It contains 13 × 10³ face images of 5,749 different identities collected from the web, and provides 6 × 10³ pairs of face images. Cross-Age LFW (CALFW) provides pairs with age variation, and Cross-Pose LFW (CPLFW) provides pairs with pose variation, both drawn from the LFW images.

- YTF (YouTube Faces) is a public data set of videos downloaded from YouTube for face verification. It contains 3,425 videos of 1,595 different identities (people).

- MegaFace contains more than 1 × 10⁶ images of 690 × 10³ identities and makes it possible to evaluate recognition accuracy in the presence of various kinds of noise.

- CFP (Celebrities in Frontal-Profile) contains 500 classes (identities). Each subject has 10 frontal images and 4 profile images.

- AgeDB-30 contains 12,240 images of 440 identities with age variation, which makes it suitable for evaluating sensitivity to age changes.

- IJB (IARPA Janus Benchmark) is one of the most challenging data sets designed to evaluate unconstrained face recognition systems. IJB-B consists of 67 × 10³ face images, 7 × 10³ face videos, and 10 × 10³ non-face images. IJB-C extends IJB-B with new subjects that have increased occlusion and greater diversity of geographic origin, and consists of 138 × 10³ face images, 11 × 10³ face videos, and 10 × 10³ non-face images.

LFW and YTF are widely used to evaluate verification performance in unconstrained environments. LFW, which consists of image pairs, evaluates a model by comparing the two embedding vectors of each given pair. YTF consists of videos, each a set of images, ranging from the shortest clip of 48 frames to the longest of 6,070 frames. To compare a pair of videos, YTF compares a pair of representative embedding vectors (identity representative vectors), each being the average of the embedding vectors of the images collected from one video. As Table 1 shows, other conventional methods also achieve very high accuracy on both data sets, but the embodiment of the present invention (BroadFace, bottom row) outperforms them.
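The video comparison described above, averaging the frame embeddings of each video and comparing the two averages, can be sketched as follows. This is a minimal illustration, not the evaluation code used in the experiment: cosine similarity as the comparison measure and the `threshold` value are assumptions, and all function names are hypothetical.

```python
import math

def mean_embedding(frame_embeddings):
    """Average the per-frame embedding vectors of one video (its representative vector)."""
    count = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(vec[d] for vec in frame_embeddings) / count for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed comparison measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_identity(video_a, video_b, threshold=0.5):
    """Verify a video pair by comparing the two representative vectors."""
    score = cosine_similarity(mean_embedding(video_a), mean_embedding(video_b))
    return score >= threshold

# Toy 3-dimensional frame embeddings; real embeddings have far more dimensions.
video_a = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
video_b = [[0.9, 0.1, 0.0], [1.0, 0.1, 0.0]]
video_c = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]

print(same_identity(video_a, video_b))  # frames point in similar directions
print(same_identity(video_a, video_c))  # frames point in different directions
```

An image pair, as in LFW, is the special case in which each "video" contains a single frame.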

CALFW, CPLFW, CFP-FP, and AgeDB-30 are widely used to check how robust a model is to changes in pose or age. CALFW and AgeDB-30 contain multiple instances of a single identity at different ages, while CPLFW and CFP-FP contain multiple instances of a single identity in different poses (frontal and profile). As Table 2 shows, the embodiment of the present invention (BroadFace, bottom row) achieves high accuracy compared with other conventional methods.

MegaFace is designed to evaluate face identification and verification under conditions that include a large amount of noise (distractors). The evaluation was performed with 'MegaFace Challenge 1', whose training data set exceeds 50 million. As Table 3 shows, the embodiment of the present invention (BroadFace) outperforms other face recognition models in both face identification and verification. On the refined MegaFace, from which noisy labels have been removed, BroadFace likewise outperforms other face recognition models.

IJB-B and IJB-C are among the most challenging data sets for evaluating unconstrained face recognition. In IJB, pairs of faces drawn from images or videos are compared, so evaluation is performed by constructing an embedded feature for a given image and a template feature representing a given video. BroadFace is compared with CosFace and with ArcFace on the verification task, without test-time augmentation such as horizontal flipping. Table 4 shows that the embodiment of the present invention (BroadFace) yields a significant improvement at every FAR criterion. On IJB-B, BroadFace improves on the ArcFace result by 8.25 percentage points at FAR = 10⁻⁶, 1.48 percentage points at FAR = 10⁻⁵, and 0.36 percentage points at FAR = 10⁻⁴.
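For readers unfamiliar with the metric, the figures above are true accept rates (TAR) measured at a fixed false accept rate (FAR). The sketch below shows one common way to compute such a value from similarity scores. It is an assumption about the general technique, not the official IJB protocol, and the function name and toy scores are hypothetical.

```python
def tar_at_far(genuine_scores, impostor_scores, target_far):
    """True accept rate at the loosest threshold whose false accept rate <= target_far."""
    impostors = sorted(impostor_scores, reverse=True)
    allowed = int(target_far * len(impostors))  # impostor pairs that may be accepted
    if allowed >= len(impostors):
        threshold = float("-inf")
    else:
        # Accept only scores strictly above the (allowed + 1)-th highest impostor score.
        threshold = impostors[allowed]
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)

# Toy similarity scores for same-identity (genuine) and different-identity (impostor) pairs.
genuine = [0.9, 0.8, 0.65, 0.4]
impostor = [0.7, 0.3, 0.2, 0.1, 0.05, 0.6, 0.15, 0.25, 0.35, 0.45]

loose = tar_at_far(genuine, impostor, target_far=0.1)
strict = tar_at_far(genuine, impostor, target_far=0.0)
print(loose, strict)
print(f"difference: {(loose - strict) * 100:.0f} percentage points")
```

A stricter FAR forces a higher threshold, so fewer genuine pairs are accepted; this is why the differences between methods are largest at the smallest FAR values.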

Face recognition and image retrieval are similar in that both try to learn an optimal embedding space for comparing a given pair of items, such as faces, clothes, or industrial products. Therefore, to demonstrate the generalization ability of the method according to an embodiment of the present invention, the technique according to the embodiment (BroadFace) is compared below with metric learning methods recently proposed for image retrieval.

* Experimental settings for application to image retrieval

ResNet-50 pre-trained on ILSVRC 2012-CLS is used as the backbone network. ArcFace is used as the baseline objective function, and the queue size according to the embodiment of the present invention is set to 32 × 10³. Other details of the training procedure follow widely known practice, and the standard evaluation protocol is followed. Evaluation was performed on two large data sets, In-Shop (In-Shop Clothes Retrieval) and SOP (Stanford Online Products), which, like face recognition data sets, have many classes. The embodiment of the present invention (BroadFace) can help a model learn the large number of classes in these two data sets effectively.

In-Shop and SOP are standard data sets for image retrieval. In-Shop contains 11,735 classes of clothes; 25,882 images of 3,997 classes are used for training, and the 26,830 images of the remaining 7,970 classes are split into a query set and a gallery set for evaluation. SOP contains 22,634 industrial products; 59,551 images of 11,318 classes are used for training, and the 60,499 images of the remaining 11,316 classes are used for evaluation. As Table 5 shows, the baseline model trained with ArcFace performs worse than other state-of-the-art methods in this evaluation. The embodiment of the present invention (BroadFace) substantially improves the recall of the baseline model and outperforms the other methods.
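Recall in this setting is usually reported as Recall@K: the fraction of queries for which at least one of the K nearest gallery items has the same class as the query. The sketch below illustrates that computation for a separate query set and gallery set. Cosine similarity as the ranking measure is an assumption, and the function names and toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_at_k(queries, gallery, k=1):
    """queries and gallery are lists of (embedding, class_label) pairs."""
    hits = 0
    for q_vec, q_label in queries:
        # Rank the gallery from most to least similar to the query.
        ranked = sorted(gallery, key=lambda item: cosine(q_vec, item[0]), reverse=True)
        if any(label == q_label for _, label in ranked[:k]):
            hits += 1
    return hits / len(queries)

# Toy 2-dimensional embeddings with three clothing classes.
gallery = [([1.0, 0.0], "shirt"), ([0.0, 1.0], "dress"), ([0.7, 0.7], "coat")]
queries = [([0.9, 0.1], "shirt"), ([0.1, 0.9], "dress"), ([0.2, 1.0], "coat")]

print(recall_at_k(queries, gallery, k=1))  # the "coat" query is ranked behind "dress"
print(recall_at_k(queries, gallery, k=2))  # the correct item appears within the top 2
```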

The embodiment of the present invention (BroadFace) has only one hyperparameter: the queue size, which determines the maximum number of embedding vectors accumulated from past training iterations. This parameter plays a very important role in determining recognition accuracy, and having a single parameter makes the method easy to tune. As Table 6 shows, performance increases steadily as the queue size grows. Without interpolation ("without Compensation"), accuracy degrades when the queue size becomes very large. With interpolation ("with Compensation"), the errors of past embedding vectors are corrected, which mitigates the degradation.
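The role of the queue size can be illustrated with a fixed-capacity first-in, first-out buffer. The sketch below is a simplified stand-in for the queue storage unit, not its actual implementation: the class name `EmbeddingQueue` and its methods are hypothetical, and the correction of stale embedding vectors is deliberately left out because its formula is not given in this passage.

```python
from collections import deque

class EmbeddingQueue:
    """Fixed-capacity FIFO of (embedding, label) pairs from past iterations."""

    def __init__(self, queue_size):
        # deque(maxlen=...) silently discards the oldest entries once it is full.
        self.items = deque(maxlen=queue_size)

    def enqueue(self, embeddings, labels):
        """Append the embeddings of the current mini-batch with their labels."""
        for vec, label in zip(embeddings, labels):
            self.items.append((list(vec), label))

    def __len__(self):
        return len(self.items)

queue = EmbeddingQueue(queue_size=4)
queue.enqueue([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [7, 7, 9])  # first iteration
queue.enqueue([[0.7, 0.8], [0.9, 1.0]], [9, 3])                 # second iteration

print(len(queue))                           # never exceeds queue_size
print([label for _, label in queue.items])  # the oldest entry has been dropped
```

A larger `queue_size` keeps embedding vectors from more past iterations available to each training step, at the cost of those vectors being older relative to the current model parameters.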

FIG. 11 shows experimental data obtained when the face recognition method according to an embodiment of the present invention is applied to image retrieval.

In the illustrated experiment, the change in recall was measured while the queue size was varied from 0 to 32,000. With the interpolation according to an embodiment of the present invention ("with Compensation"), recall improves continuously as the queue size increases (1101). Without the proposed interpolation ("without Compensation"), however, the recall of the model degrades once the queue size exceeds 16 × 10³ (1102).

As confirmed above, the embodiment of the present invention (BroadFace) can be applied not only to face recognition but also to any objective function and to a variety of backbone networks. As confirmed with reference to Table 4, recognition accuracy increases when BroadFace is applied to the widely used objective functions of CosFace and ArcFace. BroadFace was also applied to several other backbone networks, such as MobileFaceNet, ResNet-18, and ResNet-34. For light backbone networks such as MobileFaceNet and ResNet-18, the embedding vector size was set to 128; for heavy backbone networks such as ResNet-34 and ResNet-100, it was set to 512. As Table 7 shows, BroadFace is highly effective for all of these backbone networks. In particular, ResNet-34 trained with BroadFace achieves performance comparable to ResNet-100 trained with ArcFace alone, even though ResNet-34 requires far fewer GFLOPs.

The embodiment of the present invention (BroadFace) accelerates the training process for both face recognition and image retrieval. In face recognition, many iterations are still needed to overcome the small performance differences among face recognition methods on highly saturated data sets.

FIG. 11(b) is a graph of an experiment on the acceleration of the training process in image retrieval, included to show the effect of the embodiment of the present invention (BroadFace) clearly. BroadFace reaches peak performance much faster than other existing models. Without interpolation ("without compensation"), however, performance degrades as training is repeated.

Embodiments of the apparatus for recognizing a face and the learning control method thereof according to the present invention have been described above. They are described as at least one embodiment, and the technical idea of the present invention and its configuration and operation are not limited thereby; the scope of the technical idea of the present invention is not limited by the drawings or by the description referring to the drawings. In addition, the concepts and embodiments presented herein may be used by those of ordinary skill in the art to which the present invention belongs as a basis for modifying or designing other structures that serve the same purpose. Equivalent structures so modified or changed remain bound by the technical scope of the present invention set forth in the claims, and various changes, substitutions, and alterations are possible without departing from the spirit or scope of the invention described in the claims.

Claims

A learning control method for a device that recognizes faces, the method comprising:
receiving an image set for training;
calculating a first embedding vector for the image set based on a face recognition model;
performing first learning based on the calculated first embedding vector; and
performing second learning based on the calculated first embedding vector and a second embedding vector,
wherein the learning is performed by repeating a learning cycle from the calculating step to the second learning step, and
wherein the second embedding vector is a first embedding vector calculated in a previously performed learning cycle.

The method of claim 1, wherein the first learning step comprises:
calculating a proximity between an identity representative vector corresponding to a given person and the first embedding vector; and
adjusting a parameter to increase the calculated proximity,
wherein the adjusted parameter is a face recognition parameter used in the face recognition model.

The method of claim 1, wherein the second learning step comprises:
determining a proximity between an identity representative vector corresponding to a given person and the first and second embedding vectors; and
adjusting the identity representative vector to increase the determined proximity.

The method of claim 3, wherein the second learning step further comprises interpolating the second embedding vector,
wherein the proximity is determined based on the interpolated second embedding vector.

The method of claim 4, wherein the interpolating comprises:
specific person

calculating the difference between
wherein the interpolating is performed based on the calculated difference.

The method of claim 5, wherein the interpolating further comprises adjusting a scale of the calculated difference,
wherein the interpolating is performed based on the scaled difference.

The method according to any one of claims 2 and 3, wherein the proximity is determined using a loss function defined by an equation (equation image not reproduced in this text) over the following quantities: the image set; the number of image sets; the first embedding vector; the labeled person; the L2-normalized vector; and the identity representative vector for the labeled person.

The method of claim 1,
wherein the image set includes a plurality of images of a plurality of persons, and
wherein the first embedding vector is calculated for each of the plurality of images.

The method of claim 8, further comprising storing an identity representative vector corresponding to each of the plurality of persons.

The method of claim 1, further comprising updating the second embedding vector so that the calculated first embedding vector is included in the second embedding vector,
wherein the updated second embedding vector is used in a learning cycle performed after the update.

A device for recognizing a face, comprising:
a face recognition model that receives an image set for training and calculates a first embedding vector for the input image set;
a first learning unit that performs first learning based on the calculated first embedding vector; and
a second learning unit that performs second learning based on the calculated first embedding vector and a second embedding vector,
wherein the learning is performed by repeating a learning cycle of the calculation, the first learning, and the second learning, and
wherein the second embedding vector is a first embedding vector calculated in a previously performed learning cycle.

The device of claim 11, wherein the first learning unit:
calculates a proximity between an identity representative vector corresponding to a given person and the first embedding vector; and
adjusts a parameter to increase the calculated proximity,
wherein the adjusted parameter is a face recognition parameter used in the face recognition model.

The device of claim 11, wherein the second learning unit:
determines a proximity between an identity representative vector corresponding to a given person and the first and second embedding vectors; and
adjusts the identity representative vector to increase the determined proximity.

The device of claim 13, wherein, in determining the proximity, the second learning unit interpolates the second embedding vector and determines the proximity based on the interpolated second embedding vector.

The device of claim 14, wherein the second learning unit, in performing the interpolation,
specific person

Calculate the difference between
wherein the interpolation is performed based on the calculated difference.

The device of claim 15, wherein, in performing the interpolation, the second learning unit adjusts a scale of the calculated difference and interpolates based on the scaled difference.

The device according to any one of claims 12 and 13, wherein the first and second learning units determine the proximity using a loss function defined by an equation (equation image not reproduced in this text) over the following quantities: the image set; the number of image sets; the first embedding vector; the labeled person; the L2-normalized vector; and the identity representative vector for the labeled person.

The device of claim 11,
wherein the image set includes a plurality of images of a plurality of persons, and
wherein the first embedding vector is calculated for each of the plurality of images.

The device of claim 18, further comprising a classification unit that stores an identity representative vector corresponding to each of the plurality of persons.

The device of claim 11, further comprising a queue storage unit that updates the second embedding vector so that the calculated first embedding vector is included in the second embedding vector,
wherein the updated second embedding vector is used in a learning cycle performed after the update.