KR20220043905A

KR20220043905A - Face recognition system for training face recognition model using frequency components

Info

Publication number: KR20220043905A
Application number: KR1020210128737A
Authority: KR
Inventors: 김성욱; 백지현; 박성찬
Original assignee: 주식회사 포스코아이씨티 (POSCO ICT Co., Ltd.)
Priority date: 2020-09-29
Filing date: 2021-09-29
Publication date: 2022-04-05
Also published as: KR102673999B1

Abstract

A face recognition system according to one aspect of the present invention performs face detection and forgery detection simultaneously. It includes two parts. The first is a face recognition model that recognizes a face in a user's input image and determines whether the input image is forged. The second is a face recognition learning model. This model extracts a feature map from a training image and, based on that feature map, obtains face information, forgery information and frequency information for the training image. It compares these three with the ground-truth values to calculate an error, and trains the face recognition model until the error is smaller than a reference value.

Description

FACE RECOGNITION SYSTEM FOR TRAINING FACE RECOGNITION MODEL USING FREQUENCY COMPONENTS

The present invention relates to a face recognition system capable of recognizing faces.

Face authentication is a field of biometrics. In it, a machine automatically identifies and authenticates a person using the unique feature information in each person's face. It offers better security than conventional methods such as passwords, so it has recently been adopted widely in many fields.

A typical face recognition system works as follows:
1. A device installed at an access gate or similar location captures a face image and sends it to a server.
2. The server performs face recognition and the corresponding user authentication.
3. The server sends the authentication result back to the device, which then decides whether to open the gate.

Such a system has a weakness. Someone may present a printed photograph, or a photo shown on a phone or tablet screen, instead of a real face. A typical system cannot detect this, so it may approve an unauthorized user as a legitimate one.

The present invention is intended to solve this problem. One technical object is to provide a face recognition system that can determine whether a face image is a real (live) image without additional special equipment.

Another technical object of the present invention is to provide a face recognition system that improves the computation speed of face detection and forgery detection.

A face recognition system according to one aspect of the present invention includes:
- a face recognition model that recognizes a face in a user's input image and determines whether the input image is forged; and
- a face recognition learning model that:
  - extracts a feature map from a training image;
  - obtains face information, forgery information and frequency information for the training image from that feature map;
  - calculates an error by comparing these three with the ground-truth values; and
  - trains the face recognition model until the calculated error is smaller than a reference value.
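The training criterion above (compare three kinds of output with ground truth, then train until the error is below a reference value) can be sketched in Python. This is a minimal illustration, not the patent's implementation. The following are all assumptions made for the example:
- the per-head output types;
- the use of squared error for every head;
- the equal default weights;
- the helper names `HeadOutputs`, `total_error` and `train_until`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class HeadOutputs:
    """Hypothetical per-image outputs of the three training heads."""
    face: Sequence[float]       # e.g. face bounding-box coordinates
    forgery: float              # probability that the image is forged
    frequency: Sequence[float]  # frequency-component descriptor


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("length mismatch")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_error(pred: HeadOutputs, truth: HeadOutputs,
                w_face: float = 1.0, w_forgery: float = 1.0,
                w_freq: float = 1.0) -> float:
    """Weighted sum of the face, forgery and frequency errors (weights assumed)."""
    return (w_face * mse(pred.face, truth.face)
            + w_forgery * (pred.forgery - truth.forgery) ** 2
            + w_freq * mse(pred.frequency, truth.frequency))


def train_until(step: Callable[[], float], reference: float,
                max_iters: int = 10_000) -> int:
    """Call `step` (one training update that returns the current error) until
    the error falls below `reference`; return the number of updates used."""
    for i in range(1, max_iters + 1):
        if step() < reference:
            return i
    raise RuntimeError("error did not fall below the reference value")


if __name__ == "__main__":
    pred = HeadOutputs(face=[0.1, 0.2, 0.5, 0.6], forgery=0.9, frequency=[0.3, 0.1])
    truth = HeadOutputs(face=[0.1, 0.2, 0.5, 0.6], forgery=1.0, frequency=[0.3, 0.1])
    print(round(total_error(pred, truth), 6))  # 0.01
```

In a real CNN, `step` would run one optimizer update on a batch and return the batch error. The stopping rule only mirrors the "smaller than a reference value" condition described in the text.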

According to the present invention, the face recognition model is trained to consider frequency components as well as facial features. As a result, it can produce a feature map that reflects frequency components even when the input is an RGB image from an ordinary camera. Forgery can therefore be detected without a separate device such as an infrared sensor, which minimizes environmental constraints and reduces cost.
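This passage does not say how the frequency information is computed. One common choice, assumed here purely for illustration, is the magnitude of the 2-D discrete Fourier transform (DFT) of a grayscale image. A scalar can then be derived from it, such as the share of spectral magnitude outside the lowest frequencies. Recaptured prints and screen images are often reported to differ from live faces in statistics of this kind.

The helper names `dft2_magnitude` and `high_frequency_ratio` are hypothetical. The naive transform below is only practical for small inputs.

```python
import cmath
from typing import List

Image = List[List[float]]  # H x W grayscale values


def dft2_magnitude(img: Image) -> Image:
    """Magnitude of the 2-D DFT of a small grayscale image (naive O(H^2 W^2))."""
    h, w = len(img), len(img[0])
    out = []
    for u in range(h):
        row = []
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            row.append(abs(s))
        out.append(row)
    return out


def high_frequency_ratio(img: Image, cutoff: int = 1) -> float:
    """Fraction of spectral magnitude outside a low-frequency window of radius
    `cutoff` around DC (indices wrap, so negative frequencies count as low too)."""
    mag = dft2_magnitude(img)
    h, w = len(mag), len(mag[0])
    total = sum(map(sum, mag))
    low = sum(mag[u][v] for u in range(h) for v in range(w)
              if min(u, h - u) <= cutoff and min(v, w - v) <= cutoff)
    return 0.0 if total == 0 else (total - low) / total
```

How the two scores behave:
- A flat image has all its energy at DC, so its ratio is close to 0.
- A pixel-level checkerboard has all its energy at the highest frequency, so its ratio is close to 1.

In practice an FFT library would replace the naive loops.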

In addition, a single integrated face recognition model performs face detection and forgery detection simultaneously. This reduces the amount of computation and effectively improves computation speed.

FIG. 1 is a block diagram schematically showing the configuration of a face recognition system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing the configuration of a face recognition server according to an embodiment of the present invention.
FIG. 3 is a block diagram showing an example of the configuration of the face recognition learning model of FIG. 2.
FIG. 4 is a block diagram showing the configuration of the first, second and third convolution units of FIG. 3.
FIG. 5 is a block diagram showing the configuration of the fourth and fifth convolution units of FIG. 3.
FIG. 6 is a block diagram showing the configuration of the single-stage network of FIG. 3.
FIG. 7 is a block diagram showing the configuration of the sixth, eighth and tenth convolution units of FIG. 6.
FIG. 8 is a block diagram showing the configuration of the seventh and ninth convolution units of FIG. 6.
FIG. 9 is a block diagram showing an example of the configuration of the face recognition model of FIG. 2.
FIG. 10 is a block diagram schematically showing the configuration of an edge device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The terms used in this specification should be understood as follows.

Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "first" and "second" only distinguish one element from another, and these terms do not limit the scope of rights.

Terms such as "comprise" or "have" do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The term "at least one" includes every possible combination of one or more of the related items. For example, "at least one of a first item, a second item and a third item" means each of the three items individually. It also means every combination of two or more of them.


FIG. 1 is a block diagram schematically showing the configuration of a face recognition system according to an embodiment of the present invention. FIG. 2 is a block diagram schematically showing the configuration of a face recognition server according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the face recognition system 100 according to an embodiment of the present invention includes a face recognition server 110 and a plurality of edge devices 120.

The face recognition server 110 works as follows:
1. It generates a face recognition model.
2. It uses that model to extract a feature vector from a user image received from a user terminal 130.
3. It uses the feature vector to generate an array file for authenticating the target user.
4. It sends the array file to the edge device 120, so that the edge device 120 can authenticate the target user.

To this end, as shown in FIG. 2, the face recognition server 110 may include:
- a user registration unit 210;
- a user face recognition unit 220;
- a face recognition learning model 230;
- a face recognition model 240;
- an array file generator 250;
- an edge device manager 260; and
- an interface unit 270.

The user registration unit 210 receives one or more user images from the user terminal 130 of a user who wishes to register. When it receives a user image, it verifies that the registering user is the same person shown in the image. If so, it obtains the access permission information granted to that user and registers it, together with the user image, in a user database 212.

In an embodiment, the user registration unit 210 may also receive the user's identification information from the user terminal 130 along with the user image. Examples are the user's ID, name, phone number or employee number. In this embodiment, the user registration unit 210 may register the user's identification information and access permission information, together with the user image, in the user database 212.

When receiving a plurality of user images from the user terminal 130, the user registration unit 210 may prompt the user to submit varied images. Examples are images captured in different environments, under different illuminance, or while wearing a mask. Receiving several images of one user under such varied conditions improves face recognition accuracy.

The user face recognition unit 220 inputs a plurality of user input images to the face recognition model 240, which has been trained by the face recognition learning model 230.

Through the face recognition model 240, the user face recognition unit 220 may determine whether a user input image is forged. If the image is genuine (not forged), the unit obtains a face image containing the face region and extracts a feature vector from it. The face recognition model 240 and the face recognition learning model 230 are described in detail later.
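The gating described above (extract a feature vector only from genuine images in which a face was found) can be sketched as follows. This is a minimal illustration under these assumptions:
- the integrated model returns a forgery flag and a face box in one pass;
- the image is a plain 2-D list;
- `Detection`, `extract_enrollment_vector`, the `model` callable and the `embed` callable are all hypothetical stand-ins, not interfaces from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


@dataclass
class Detection:
    """Hypothetical single-pass output of the integrated model."""
    is_forged: bool
    face_box: Optional[Box]


def extract_enrollment_vector(
    image: Sequence[Sequence[float]],
    model: Callable[[Sequence[Sequence[float]]], Detection],
    embed: Callable[[List[List[float]]], List[float]],
) -> Optional[List[float]]:
    """Return a feature vector only for genuine images that contain a face."""
    det = model(image)
    if det.is_forged or det.face_box is None:
        return None  # reject forged images and images with no detected face
    x1, y1, x2, y2 = det.face_box
    face = [list(row[x1:x2]) for row in image[y1:y2]]  # crop the face region
    return embed(face)
```

Because one model call yields both the forgery decision and the face location, no separate liveness model runs before detection. That matches the speed motivation stated earlier.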

The user face recognition unit 220 extracts feature vectors from the face images of each user. The array file generator 250 uses these vectors to generate an array for each user, then merges the arrays into a single array file.

The array file generator 250 may store the generated array file in an array file database (not shown).

In an embodiment, each array generated by the array file generator 250 may consist of two parts: the feature vectors extracted from the user's face images, and the user's key value. The key value contains:
- the user's identification information, such as ID, name, phone number or employee number, as described above; and
- the user's access permission information, which may include the floors the user is permitted to access.
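One way to represent a per-user array and merge all arrays into a single file is sketched below. The field layout, the JSON serialization and the names `UserArray`, `merge_into_array_file` and `load_array_file` are assumptions for illustration only. The source does not specify the file format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class UserArray:
    """One user's array: key value (identification + access rights) plus
    feature vectors. The layout is a hypothetical example."""
    user_id: str                 # identification info, e.g. employee number
    name: str                    # identification info
    allowed_floors: List[int]    # access-permission info
    feature_vectors: List[List[float]] = field(default_factory=list)


def merge_into_array_file(arrays: List[UserArray]) -> str:
    """Merge every user's array into one JSON document."""
    return json.dumps({"arrays": [asdict(a) for a in arrays]})


def load_array_file(text: str) -> List[UserArray]:
    """Parse an array file produced by merge_into_array_file."""
    return [UserArray(**item) for item in json.loads(text)["arrays"]]
```

Keeping the key value next to the feature vectors lets an edge device return identity and access rights directly from a nearest-vector match.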

In an embodiment, the array file generator 250 may generate a separate array file for each location where an edge device 120 is installed. For example:
- a first array file may consist of the arrays of users granted access to the first floor; and
- a second array file may consist of the arrays of users granted access to the second floor.

To support this, the array file generator 250 may also generate each user's arrays separately for each area that user may access. For example, suppose a first user may access the first and third floors. The array file generator 250 may then generate two separate arrays for that user: a first array with access permission information for the first floor, and a second array with access permission information for the third floor.
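The per-location split described above can be sketched as a grouping step. A user allowed on several floors receives one array per floor, and each floor's list becomes that location's array file. The dictionary inputs and the function name `arrays_per_location` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def arrays_per_location(
    users: Dict[str, List[int]],            # user_id -> floors the user may access
    vectors: Dict[str, List[List[float]]],  # user_id -> enrolled feature vectors
) -> Dict[int, List[dict]]:
    """Build one list of arrays per floor. A user allowed on several floors
    gets a separate array, carrying only that floor's permission, in each list."""
    files: Dict[int, List[dict]] = defaultdict(list)
    for user_id, floors in users.items():
        for floor in floors:
            files[floor].append({
                "user_id": user_id,
                "floor": floor,  # access permission for this location only
                "feature_vectors": vectors.get(user_id, []),
            })
    return dict(files)
```

For example, with `{"u1": [1, 3], "u2": [1]}`, floor 1's file holds arrays for `u1` and `u2`, and floor 3's file holds only `u1`. The device on floor 3 therefore never receives `u2`'s data.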

The array file generator 250 generates one array file per location for a practical reason. When an edge device 120 that authenticates users' faces is installed at each location, each device needs only the array file with the access permission information for its own location. This makes the array files easier to transmit and easier to manage on the edge devices 120.

The embodiment above generates an array file for each location. In a modified embodiment, the array file generator 250 may instead generate a single array file with permission information for all locations where edge devices 120 are installed, and transmit it to every edge device 120.

The edge device manager 260 registers information on the edge devices 120 installed at each location in an edge device database 262. In an embodiment, it may store each edge device's identification information in the edge device database 262, mapped to the location where that device is installed. This identification information may include the device's manufacturer, serial number and the like.

The edge device manager 260 may also receive authentication (access) records from the edge devices 120 at predetermined intervals through the interface unit 270, and store them in the edge device database 262.

An access permission information manager 255 changes the access permission information assigned to each user or adds new access permission information. In an embodiment, it may grant access permission information to each user individually, or to the whole organizational unit to which users belong.

The interface unit 270 encrypts the face recognition model trained by the face recognition learning model 230, along with the array file, using a predetermined method. It then transmits both to each edge device 120. In an embodiment, it may use a public-key encryption algorithm for this step.

The interface unit 270 may transmit the encrypted array file to the edge device 120 according to a protocol agreed upon with that device. It may also receive authentication records from each edge device 120 at predetermined intervals and pass them to the edge device manager 260.

In the face recognition system 100 according to an embodiment of the present invention, the face recognition model 240 performs face detection and forgery detection on an input image simultaneously. The face recognition learning model 230 trains this model.

Specifically, the face recognition learning model 230 trains the face recognition model 240 using training images. By continuously training the convolutional neural network that makes up the face recognition model 240, it can produce an optimized face recognition model 240.

The face recognition learning model 230 is described in detail below with reference to FIGS. 3 to 9.

The figures used in this part of the description are:
- FIG. 3: an example configuration of the face recognition learning model of FIG. 2;
- FIG. 4: the first, second and third convolution units of FIG. 3;
- FIG. 5: the fourth and fifth convolution units of FIG. 3;
- FIG. 6: the single-stage network of FIG. 3;
- FIG. 7: the sixth, eighth and tenth convolution units of FIG. 6;
- FIG. 8: the seventh and ninth convolution units of FIG. 6;
- FIG. 9: an example configuration of the face recognition model of FIG. 2.

Referring to FIGS. 3 to 9, the face recognition learning model 230 is based on a convolutional neural network (CNN). It generates a feature map of an input image, such as a face image, and trains the face recognition model 240 based on that feature map. By downsampling or upsampling the input image up to a predetermined stage, it can generate several feature maps with different resolutions from a single input image.

The face recognition learning model 230 includes a backbone network 310, a first network 320, a second learning network 350 and an error reduction unit 360.

The backbone network 310 generates several training input images at different scales from one training image. The training images are stored in a training image database 305 and may include both genuine images and forged images.

Specifically, the backbone network 310 downsamples a single input training image up to a predetermined stage. This produces several training input images with different resolutions and dimensions. For convenience, the description below uses three downsampling stages, but the invention is not limited to three.

The backbone network 310 may perform a convolution operation on the training image to generate a first training input image with a first resolution and a first dimension. The first resolution may be lower than the training image's resolution, and the first dimension may be greater than the training image's dimension.

The backbone network 310 may then perform a convolution operation on the first training input image to generate a second training input image with a second resolution and a second dimension. The second resolution may be lower than the first resolution, and the second dimension may be greater than the first dimension. For example, the second resolution may be half the first resolution.

Likewise, the backbone network 310 may perform a convolution operation on the second training input image to generate a third training input image with a third resolution and a third dimension. The third resolution may be lower than the second resolution, and the third dimension may be greater than the second dimension. For example, the third resolution may be half the second resolution.
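The resolution and dimension changes across the three backbone stages can be checked with the standard convolution output-size formula. The stage settings below are assumptions; the source states only that resolution decreases (halving, as an example) while dimension increases:
- 3×3 kernel, stride 2 and padding 1 in every stage, including the first;
- channel counts of 64, 128 and 256;
- a 640×640 input.

```python
from typing import List, Tuple


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_shapes(h: int, w: int, channels: List[int],
                    kernel: int = 3, stride: int = 2, padding: int = 1
                    ) -> List[Tuple[int, int, int]]:
    """(height, width, channels) after each downsampling stage."""
    shapes = []
    for c in channels:
        h = conv_output_size(h, kernel, stride, padding)
        w = conv_output_size(w, kernel, stride, padding)
        shapes.append((h, w, c))
    return shapes


print(backbone_shapes(640, 640, [64, 128, 256]))
# [(320, 320, 64), (160, 160, 128), (80, 80, 256)]
```

The three tuples correspond to the first, second and third training input images. The third is the coarsest map and is the one the first feature map generator 322 consumes.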

The backbone network 310 outputs these training input images, each at a different scale, to the first network 320.

The first network 320 extracts a feature map from each of these training input images. It may include a first feature map generator 322, a second feature map generator 324 and a third feature map generator 326.

The first feature map generator 322 generates a first feature map from the third training input image. It may include a first convolution unit 331, which generates the first feature map by applying a first convolution filter to the third training input image.

As shown in FIG. 4, the first convolution unit 331 may include a first convolution operation unit 410, a first normalization unit 420 and a first non-linearization unit 430.

When the third training input image is input, the first convolution operation unit 410 of the first convolution unit 331 performs a convolution on it using the first convolution filter. This produces the first feature map.

In an embodiment, the first convolution filter may have a size of 1×1 and a stride of 1. Such a filter changes only the dimension (channel count) of the third training input image, not its resolution. For example, applying K first convolution filters of size 1×1 changes the image's dimension to K.

The same applies to the feature maps extracted from the other training input images. Bringing all of them to the same specific dimension simplifies the operations between them.
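A 1×1 convolution with stride 1 is a per-pixel linear mix of the input channels. This is why it changes only the channel count to K and leaves height and width untouched. The sketch below uses plain nested lists in C × H × W layout for clarity. The function name `conv1x1` and the optional bias are illustrative assumptions.

```python
from typing import List, Optional

FeatureMap = List[List[List[float]]]  # C x H x W nested lists


def conv1x1(x: FeatureMap, weights: List[List[float]],
            bias: Optional[List[float]] = None) -> FeatureMap:
    """1x1 convolution, stride 1: output[k][i][j] = sum_c weights[k][c] * x[c][i][j] + bias[k].
    Height and width are unchanged; the channel count becomes K = len(weights)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(k) != c_in for k in weights):
        raise ValueError("each filter needs one weight per input channel")
    if bias is None:
        bias = [0.0] * len(weights)
    return [[[sum(k[c] * x[c][i][j] for c in range(c_in)) + b
              for j in range(w)] for i in range(h)]
            for k, b in zip(weights, bias)]
```

For example, a 2-channel 2×3 map passed through four filters yields a 4-channel 2×3 map. Every resolution level can be brought to the same K this way, which is what makes later element-wise operations between levels straightforward.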

The first normalization unit 420 of the first convolution unit 331 normalizes the first feature map, generated by the first convolution operation unit 410, in batch units. A batch is the set of images processed at one time.

Normalizing per batch helps because each image's mean and variance may differ from those of the whole batch. That difference acts as a kind of noise, with a regularizing effect, which can improve overall performance.

Batch normalization also counters internal covariate shift. This is the phenomenon in which the distribution of each layer's inputs changes inconsistently during training. Left unchecked, it increases training complexity and can cause vanishing or exploding gradients.
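Batch normalization as described above can be sketched in plain Python for an N × C × H × W batch. Each channel is normalized with the mean and variance taken over all images and positions in the batch, then scaled by a learnable `gamma` and shifted by `beta`. This is the standard training-mode formulation; the function name and the nested-list layout are illustrative choices.

```python
import math
from typing import List

Batch = List[List[List[List[float]]]]  # N x C x H x W


def batch_norm(batch: Batch, gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> Batch:
    """Training-mode batch normalization: per channel, subtract the batch mean,
    divide by sqrt(batch variance + eps), then scale by gamma and shift by beta."""
    n_ch = len(batch[0])
    stats = []
    for c in range(n_ch):
        vals = [v for img in batch for row in img[c] for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats.append((mean, math.sqrt(var + eps)))
    return [[[[(v - stats[c][0]) / stats[c][1] * gamma[c] + beta[c] for v in row]
              for row in img[c]] for c in range(n_ch)]
            for img in batch]
```

The statistics come from the whole batch, so an individual image's normalized values depend on which other images share its batch. This is the "noise" effect mentioned above. At inference time, frameworks typically replace these batch statistics with running averages collected during training.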

The first non-linearization unit 430 of the first convolution unit 331 applies an activation function to the normalized first feature map, giving it non-linear characteristics. In an embodiment, this function passes positive values of the first feature map through unchanged and reduces the magnitude of negative values, for example by outputting 0 for them.
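The activation described above matches ReLU when negative values become 0. The broader wording ("reduces the magnitude") would also cover a leaky ReLU. Both are sketched below, applied element-wise to a C × H × W map. The slope 0.01 and the function names are illustrative assumptions.

```python
from typing import Callable, List

FeatureMap = List[List[List[float]]]  # C x H x W


def relu(v: float) -> float:
    """Positive values pass through unchanged; negative values become 0."""
    return v if v > 0 else 0.0


def leaky_relu(v: float, slope: float = 0.01) -> float:
    """Positive values pass through; negative values are scaled down, not zeroed."""
    return v if v > 0 else slope * v


def apply_activation(fmap: FeatureMap, fn: Callable[[float], float]) -> FeatureMap:
    """Apply a scalar activation to every element of a feature map."""
    return [[[fn(v) for v in row] for row in ch] for ch in fmap]
```

Chaining the three stages of a convolution unit (1×1 convolution, batch normalization, then this activation) reproduces the convolution → normalization → non-linearization order of FIG. 4.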

The second learning network 350 may use the first feature map, generated by the first feature map generator 322, to train the face recognition model 240. It may use the first feature map in the form the generator produced, but is not necessarily limited to that form.

The first feature map may also be modified by a separate network that performs convolution operations. In one embodiment, the first feature map may be modified by the single-stage network 340, and the modified first feature map may then be used by the second learning network 350 to train the face recognition model 240.

The first feature map generated by the first feature map generator 322 may also be used by the second feature map generator 324 to generate a second feature map.

The second feature map generator 324 generates a second feature map from the second training input image and the first feature map generated by the first feature map generator 322. The second feature map generator 324 may include a first upsampling unit 332, a second convolution unit 333, a third convolution unit 334, and a first operation unit 338.

The second convolution unit 333 may generate a second feature map by applying the first convolution filter to the second training input image. Like the first convolution unit 331, the second convolution unit 333 may include a first convolution operation unit 410, a first normalization unit 420, and a first non-linearization unit 430.

When the second training input image is input, the first convolution operation unit 410 included in the second convolution unit 333 may perform a convolution operation on it using the first convolution filter, thereby generating the second feature map.

In one embodiment, the first convolution filter may be 1*1 in size with a stride of 1. Such a filter changes only the dimension (number of channels) of the second training input image to a specified value, without changing its resolution. For example, if there are K first convolution filters of size 1*1, the first convolution operation unit 410 applies them to the second training input image to change its dimension to K. As a result, the second feature map extracted from the second training input image and the first feature map end up with the same dimension, which simplifies operations between the two maps. Here, the second feature map corresponds to the second training input image with its dimension changed.
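As a sketch (not the patent's implementation), a 1*1 convolution with stride 1 is a per-pixel linear map across channels, so it can be written as a single `einsum`. The input channel count (32) and K = 64 below are hypothetical values chosen for illustration:

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1*1 convolution, stride 1.

    x:      (N, C_in, H, W) training input image or feature map
    weight: (K, C_in), i.e. K filters that are 1*1 in space
    bias:   (K,)
    Returns (N, K, H, W): only the channel dimension changes.
    """
    # Each output pixel is a linear mix of the input channels at the same location.
    return np.einsum("kc,nchw->nkhw", weight, x) + bias.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
image = rng.random((1, 32, 40, 40))  # hypothetical 32-channel input at 40*40
K = 64                               # hypothetical number of 1*1 filters
w = rng.normal(size=(K, 32))
b = np.zeros(K)
lateral = conv1x1(image, w, b)
print(lateral.shape)  # (1, 64, 40, 40)
```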

The first normalization unit 420 included in the second convolution unit 333 normalizes the second feature map generated by the first convolution operation unit 410 in units of batches.

The first non-linearization unit 430 included in the second convolution unit 333 applies an activation function to the normalized second feature map, giving it non-linear characteristics. In one embodiment, the first non-linearization unit 430 may apply an activation function that passes the positive values of the second feature map through unchanged and reduces the magnitude of the negative values, for example by outputting 0 for them.

The first upsampling unit 332 upsamples the first feature map generated by the first feature map generator 322 so that it has the same resolution as the second training input image. The first feature map, generated from the third training input image, has a third resolution equal to that of the third training input image, while the second feature map, generated from the second training input image, may have a second resolution higher than the third resolution. For example, the first feature map may have a resolution of 20*20 and the second feature map a resolution of 40*40.

The first upsampling unit 332 may upsample the first feature map to the same second resolution as the second feature map generated by the second convolution unit 333. For example, the first upsampling unit 332 may upsample the first feature map from 20*20 to 40*40.
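The text does not say which interpolation the upsampling uses. Assuming nearest-neighbour interpolation (bilinear is another common choice), a 2x upsample from 20*20 to 40*40 can be sketched as:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of an (N, C, H, W) map by an integer factor."""
    # Repeat each row, then each column, `factor` times.
    return x.repeat(factor, axis=2).repeat(factor, axis=3)

p1 = np.arange(20 * 20, dtype=float).reshape(1, 1, 20, 20)  # toy 20*20 first feature map
up = upsample_nearest(p1)
print(up.shape)  # (1, 1, 40, 40)
```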

The first operation unit 338 adds the upsampled first feature map to the second feature map. By performing this addition through the first operation unit 338, the face recognition system 100 according to an embodiment of the present invention allows the semantic information of the first feature map to be reflected in the second feature map, so that the second feature map carries more semantic information.
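The merge performed by the first operation unit resembles the top-down pathway of a feature pyramid network. A hedged sketch, assuming nearest-neighbour upsampling and that both maps already share the same channel count K after the 1*1 convolution; the function name `top_down_merge` is hypothetical:

```python
import numpy as np

def top_down_merge(coarse, lateral):
    """Upsample a coarse map 2x and add it element-wise to a finer lateral map.

    coarse:  (N, K, H, W),   e.g. the first feature map at 20*20
    lateral: (N, K, 2H, 2W), e.g. the second feature map at 40*40 after the 1*1 conv
    """
    up = coarse.repeat(2, axis=2).repeat(2, axis=3)  # nearest-neighbour (assumed)
    if up.shape != lateral.shape:
        raise ValueError(f"shape mismatch: {up.shape} vs {lateral.shape}")
    # Element-wise sum requires equal channel counts, hence the 1*1 conv beforehand.
    return up + lateral

rng = np.random.default_rng(1)
first = rng.random((1, 64, 20, 20))
second = rng.random((1, 64, 40, 40))
merged = top_down_merge(first, second)
print(merged.shape)  # (1, 64, 40, 40)
```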

The third convolution unit 334 may produce the final second feature map by applying a second convolution filter to the second feature map output from the first operation unit 338. Here, the second feature map output from the first operation unit 338 may correspond to the second training input image with the first feature map reflected in it.

As shown in FIG. 5, the third convolution unit 334 may include a second convolution operation unit 510, a second normalization unit 520, and a second non-linearization unit 530.

When the second feature map is input, the second convolution operation unit 510 included in the third convolution unit 334 may perform a convolution operation on it using the second convolution filter. The second convolution filter may have the same number of filters as the first convolution filter of the first convolution operation unit 410, but a different size. In one embodiment, the second convolution filter may be 3*3 in size with a stride of 1.
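Keeping the resolution unchanged with a 3*3 filter at stride 1 requires one pixel of padding on each side. The text does not state the padding, so zero padding of 1 is an assumption here. A naive NumPy sketch (cross-correlation, as in most deep-learning frameworks), with normalization and activation omitted:

```python
import numpy as np

def conv3x3_same(x, weight, bias):
    """3*3 convolution, stride 1, zero padding 1, so H and W are preserved.

    x:      (N, C_in, H, W)
    weight: (K, C_in, 3, 3)
    bias:   (K,)
    """
    n, _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            # Input window shifted by (dy, dx), aligned with every output pixel.
            patch = padded[:, :, dy:dy + h, dx:dx + w]
            out += np.einsum("kc,nchw->nkhw", weight[:, :, dy, dx], patch)
    return out + bias.reshape(1, -1, 1, 1)

rng = np.random.default_rng(2)
merged = rng.random((1, 64, 40, 40))            # e.g. output of the first operation unit
kernel = rng.normal(size=(64, 64, 3, 3)) * 0.01  # same filter count K as the 1*1 stage
out = conv3x3_same(merged, kernel, np.zeros(64))
print(out.shape)  # (1, 64, 40, 40)
```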

The second normalization unit 520 included in the third convolution unit 334 normalizes the second feature map output from the second convolution operation unit 510 in units of batches.

The second non-linearization unit 530 included in the third convolution unit 334 applies an activation function to the normalized second feature map, giving it non-linear characteristics. In one embodiment, the second non-linearization unit 530 may apply an activation function that passes the positive values of the second feature map through unchanged and reduces the magnitude of the negative values, for example by outputting 0 for them.

The second feature map generated by the second feature map generator 324 may be used by the second learning network 350 to train the face recognition model 240. The second learning network 350 may use the second feature map in the form produced by the second feature map generator 324, but is not limited to doing so.

The second feature map may also be modified by a separate network that performs convolution operations. In one embodiment, the second feature map may be modified by the single-stage network 340, and the modified second feature map may then be used by the second learning network 350 to train the face recognition model 240.

The second feature map generated by the second feature map generator 324 may also be used by the third feature map generator 326 to generate a third feature map.

The third feature map generator 326 generates a third feature map from the first training input image and the second feature map generated by the second feature map generator 324. The third feature map generator 326 may include a second upsampling unit 335, a fourth convolution unit 336, a fifth convolution unit 337, and a second operation unit 339.

The fourth convolution unit 336 may generate a third feature map by applying the first convolution filter to the first training input image. Like the first convolution unit 331 and the second convolution unit 333, the fourth convolution unit 336 may include a first convolution operation unit 410, a first normalization unit 420, and a first non-linearization unit 430.

When the first training input image is input, the first convolution operation unit 410 included in the fourth convolution unit 336 performs a convolution operation on it using the first convolution filter, thereby generating the third feature map.

In one embodiment, the first convolution filter may be 1*1 in size with a stride of 1. Such a filter changes only the dimension of the first training input image to a specified value, without changing its resolution. For example, if there are K first convolution filters of size 1*1, the first convolution operation unit 410 applies them to the first training input image to change its dimension to K. As a result, the third feature map extracted from the first training input image and the second feature map output from the second feature map generator 324 end up with the same dimension, which simplifies operations between the two maps. Here, the third feature map corresponds to the first training input image with its dimension changed.

The first normalization unit 420 included in the fourth convolution unit 336 normalizes the third feature map generated by the first convolution operation unit 410 in units of batches.

The first non-linearization unit 430 included in the fourth convolution unit 336 applies an activation function to the normalized third feature map, giving it non-linear characteristics. In one embodiment, the first non-linearization unit 430 may apply an activation function that passes the positive values of the third feature map through unchanged and reduces the magnitude of the negative values, for example by outputting 0 for them.

The second upsampling unit 335 upsamples the second feature map generated by the second feature map generator 324 so that it has the same resolution as the first training input image. The second feature map, generated from the second training input image, has a second resolution equal to that of the second training input image, while the third feature map, generated from the first training input image, may have a first resolution higher than the second resolution. For example, the second feature map may have a resolution of 40*40 and the third feature map a resolution of 80*80.

The second upsampling unit 335 may upsample the second feature map to the same first resolution as the third feature map generated by the fourth convolution unit 336. For example, the second upsampling unit 335 may upsample the second feature map from 40*40 to 80*80.

The second operation unit 339 adds the upsampled second feature map to the third feature map. By performing this addition through the second operation unit 339, the face recognition system 100 according to an embodiment of the present invention allows the semantic information of both the first and second feature maps to be reflected in the third feature map, so that the third feature map carries rich semantic information.
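Taken together, the three generators form what resembles a three-level top-down feature pyramid. The sketch below traces that data flow under stated assumptions: every lateral unit (331, 333, 336) is a 1*1 convolution, upsampling is nearest-neighbour, the channel counts are hypothetical, and the 3*3 units 334 and 337 are represented by an optional `smooth` callable (identity by default) so that only the shape bookkeeping and additions are shown. Batch normalization and activations are omitted.

```python
import numpy as np

def lateral_1x1(x, weight):
    # 1*1 convolution: (K, C_in) weights mix channels; spatial size unchanged.
    return np.einsum("kc,nchw->nkhw", weight, x)

def upsample2x(x):
    # Nearest-neighbour 2x upsampling (interpolation method is an assumption).
    return x.repeat(2, axis=2).repeat(2, axis=3)

def top_down_pyramid(inputs, weights, smooth=lambda m: m):
    """inputs: training input images ordered coarse to fine (third, second, first).
    weights: one (K, C_in) 1*1 weight matrix per input.
    smooth: stand-in for the 3*3 units applied after each addition.
    Returns [first, second, third] feature maps.
    """
    maps = [lateral_1x1(inputs[0], weights[0])]  # first feature map (unit 331)
    for x, w in zip(inputs[1:], weights[1:]):
        merged = lateral_1x1(x, w) + upsample2x(maps[-1])  # operation units 338 / 339
        maps.append(smooth(merged))                        # units 334 / 337
    return maps

rng = np.random.default_rng(3)
C_in, K = 32, 64  # hypothetical channel counts
inputs = [rng.random((1, C_in, s, s)) for s in (20, 40, 80)]
weights = [rng.normal(size=(K, C_in)) for _ in inputs]
first, second, third = top_down_pyramid(inputs, weights)
print([m.shape for m in (first, second, third)])
```

Each finer level inherits the coarser level's map through the upsample-and-add step, which is how the semantic information of the first feature map reaches the third.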

The fifth convolution unit 337 may produce the final third feature map by applying the second convolution filter to the third feature map output from the second operation unit 339. Here, the third feature map output from the second operation unit 339 may correspond to the first training input image with the second feature map reflected in it.

Like the third convolution unit 334, the fifth convolution unit 337 may include a second convolution operation unit 510, a second normalization unit 520, and a second non-linearization unit 530.

When the third feature map is input, the second convolution operation unit 510 included in the fifth convolution unit 337 may perform a convolution operation on it using the second convolution filter. The second convolution filter may differ in size from the first convolution filter of the first convolution operation unit 410. In one embodiment, the second convolution filter may be 3*3 in size with a stride of 1.

The second normalization unit 520 included in the fifth convolution unit 337 normalizes the third feature map output from the second convolution operation unit 510 in units of batches.

The second non-linearization unit 530 included in the fifth convolution unit 337 applies an activation function to the normalized third feature map, giving it non-linear characteristics. In one embodiment, the second non-linearization unit 530 may apply an activation function that passes the positive values of the third feature map through unchanged and reduces the magnitude of the negative values, for example by outputting 0 for them.

The third feature map generated by the third feature map generator 326 may be used by the second learning network 350 to train the face recognition model 240. The second learning network 350 may use the third feature map in the form produced by the third feature map generator 326, but is not limited to doing so.

The third feature map may also be modified by a separate network that performs convolution operations. In one embodiment, the third feature map may be modified by the single-stage network 340, and the modified third feature map may then be used by the second learning network 350 to train the face recognition model 240.

When one of the first to third feature maps is input, the single-stage network 340 generates from it a plurality of intermediate feature maps with different receptive fields, then combines these intermediate feature maps into a single output. To this end, the single-stage network 340 may include a plurality of convolution units and a combiner that combines the intermediate feature maps generated by those convolution units.

In one embodiment, the single-stage network 340 may include first to third single-stage networks 340a, 340b, and 340c. The first feature map from the first feature map generator 322 may be input to the first single-stage network 340a, the second feature map from the second feature map generator 324 to the second single-stage network 340b, and the third feature map from the third feature map generator 326 to the third single-stage network 340c.

The description below focuses on the first single-stage network 340a; the second single-stage network 340b and the third single-stage network 340c may operate in the same way.

As shown in FIG. 6, each of the first to third single-stage networks 340a, 340b, and 340c may include a plurality of convolution units 610, 620, 630, 640, and 650, a combiner 660, and a third non-linearization unit 670. As shown in FIG. 7, some of these convolution units (610, 630, and 650) include a third convolution operation unit 710 and a third normalization unit 720, while the remaining units (620 and 640) include a fourth convolution operation unit 810, a fourth normalization unit 820, and a fourth non-linearization unit 830.

In the first single-stage network 340a, three intermediate feature maps are generated from the first feature map: a first intermediate feature map by the sixth convolution unit 610; a second intermediate feature map by the seventh convolution unit 620 followed by the eighth convolution unit 630; and a third intermediate feature map by the seventh convolution unit 620 followed by the ninth convolution unit 640 and the tenth convolution unit 650.

Specifically, when the first feature map is input from the first feature map generator 322, the third convolution operation unit 710 included in the sixth convolution unit 610 may perform a convolution operation on it using a third convolution filter, generating a first intermediate feature map. In one embodiment, the third convolution filter may be 3*3 in size with a stride of 1. In one embodiment, the number of third convolution filters (the number of channels) may correspond to 1/2 of the dimension of the first feature map; for example, the third convolution filter may have a size of 3*3*C/2, where C is the number of dimensions of the first feature map. The third convolution operation unit 710 may thus generate a first intermediate feature map whose dimension is 1/2 that of the first feature map.

The third normalization unit 720 included in the sixth convolution unit 610 normalizes the first intermediate feature map generated by the third convolution operation unit 710 in units of batches.

When the first feature map is input from the first feature map generator 322, the fourth convolution operation unit 810 included in the seventh convolution unit 620 performs a convolution operation on it using a fourth convolution filter, generating a second intermediate feature map. In one embodiment, the fourth convolution filter may be 3*3 in size with a stride of 1. In one embodiment, the number of fourth convolution filters (the number of channels) may correspond to 1/4 of the dimension of the first feature map; for example, the fourth convolution filter may have a size of 3*3*C/4, where C is the number of dimensions of the first feature map. The fourth convolution operation unit 810 may thus generate a second intermediate feature map whose dimension is 1/4 that of the first feature map.

The fourth normalization unit 820 included in the seventh convolution unit 620 normalizes the second intermediate feature map generated by the fourth convolution operation unit 810 in units of batches. The fourth non-linearization unit 830 included in the seventh convolution unit 620 then applies an activation function to the normalized second intermediate feature map, giving it non-linear characteristics.

When the second intermediate feature map is input from the seventh convolution unit 620, the third convolution operation unit 710 included in the eighth convolution unit 630 may perform a convolution operation on it using the third convolution filter. In one embodiment, the third convolution filter may be 3*3 in size with a stride of 1. In one embodiment, the number of third convolution filters (the number of channels) may correspond to 1/4 of the dimension of the second intermediate feature map; for example, the third convolution filter may have a size of 3*3*C/4, where C is the number of dimensions of the second intermediate feature map output from the seventh convolution unit 620. By applying the third convolution filter to the second intermediate feature map, the third convolution operation unit 710 may reduce its dimension.

The third normalization unit 720 included in the eighth convolution unit 630 normalizes the second intermediate feature map generated by the third convolution operation unit 710 in units of batches.

When the second intermediate feature map is input from the seventh convolution unit 620, the fourth convolution operation unit 810 included in the ninth convolution unit 640 performs a convolution operation on it using the fourth convolution filter, generating a third intermediate feature map. In one embodiment, the fourth convolution filter may be 3*3 in size with a stride of 1. In one embodiment, the number of fourth convolution filters (the number of channels) may correspond to 1/4 of the dimension of the second intermediate feature map; for example, the fourth convolution filter may have a size of 3*3*C/4, where C is the number of dimensions of the second intermediate feature map. By applying the fourth convolution filter to the second intermediate feature map, the fourth convolution operation unit 810 may generate a third intermediate feature map whose dimension is 1/4 that of the second intermediate feature map.

The fourth normalization unit 820 included in the ninth convolution unit 640 normalizes the third intermediate feature map generated by the fourth convolution operation unit 810 in units of batches. The fourth non-linearization unit 830 included in the ninth convolution unit 640 then applies an activation function to the normalized third intermediate feature map, giving it non-linear characteristics.

When the third intermediate feature map is input from the ninth convolution unit 640, the third convolution operation unit 710 included in the tenth convolution unit 650 may perform a convolution operation on it using the third convolution filter. In one embodiment, the third convolution filter may be 3*3 in size with a stride of 1. In one embodiment, the number of third convolution filters (the number of channels) may correspond to 1/4 of the dimension of the third intermediate feature map; for example, the third convolution filter may have a size of 3*3*C/4, where C is the number of dimensions of the third intermediate feature map output from the ninth convolution unit 640. By applying the third convolution filter to the third intermediate feature map, the third convolution operation unit 710 may reduce its dimension.

The third normalization unit 720 included in the tenth convolution unit 650 normalizes the third intermediate feature map generated by the third convolution operation unit 710 in units of batches.

As a result, the sixth convolution unit 610, the eighth convolution unit 630, and the tenth convolution unit 650 may output first, second, and third intermediate feature maps, respectively, each with a different receptive field. For example, because the third intermediate feature map passes through three convolution operations, it may have a wider receptive field than the first intermediate feature map.
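The growth of the receptive field with stacked convolutions can be checked with the standard recurrence rf += (kernel - 1) * jump, jump *= stride. The sketch below assumes, for illustration, that the first, second, and third intermediate feature maps pass through one, two, and three 3*3 stride-1 convolutions respectively; the result is measured relative to the input of this module, not the original image.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field of a stack of conv layers given as (kernel, stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1) input steps
        jump *= stride              # distance between adjacent outputs, in input pixels
    return rf


conv3 = (3, 1)  # 3*3 kernel, stride 1
for n in (1, 2, 3):
    print(n, receptive_field([conv3] * n))
# 1 3
# 2 5
# 3 7
```

Under these assumptions, three stacked 3*3 convolutions see a 7*7 region, versus 3*3 for a single convolution.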

The combiner 660 may combine the first, second, and third intermediate feature maps, which have different receptive fields, into a single first feature map. In an embodiment, the combiner 660 may perform a concatenation (concat) operation on the first, second, and third intermediate feature maps.

The third non-linearization unit 670 applies an activation function to the first feature map output from the combiner 660, giving the first feature map non-linear characteristics.
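The combiner and the following non-linearization can be sketched as channel-wise concatenation followed by an activation. The activation is not named here, so ReLU is assumed; the branch channel counts in the example are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][col]


def concat_channels(*maps: FeatureMap) -> FeatureMap:
    """Concatenate feature maps along the channel axis (spatial sizes must match)."""
    h, w = len(maps[0][0]), len(maps[0][0][0])
    for m in maps:
        if len(m[0]) != h or len(m[0][0]) != w:
            raise ValueError("spatial sizes differ")
    return [channel for m in maps for channel in m]


def relu(fmap: FeatureMap) -> FeatureMap:
    """Assumed activation: keep positive values, replace negatives with 0."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in fmap]


f1 = [[[1.0, -2.0]], [[0.5, 0.5]]]  # first intermediate map: 2 channels, 1*2
f2 = [[[-1.0, 3.0]]]                # second intermediate map: 1 channel
f3 = [[[2.0, -0.5]]]                # third intermediate map: 1 channel
first = relu(concat_channels(f1, f2, f3))
print(len(first))                   # 4
```

The output channel count is simply the sum of the branch channel counts, which is why the branches must share the same spatial size.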

Because the single-stage network 340 outputs a feature map that combines first, second, and third intermediate feature maps with different receptive fields, faces of various sizes can be detected simultaneously from the output feature map. In addition, since the single-stage network 340 extracts the feature map along a single path, its algorithm is simple and its processing speed is high.

The face recognition system 100 according to an embodiment of the present invention inputs each of the first feature map generated by the first feature map generator 322, the second feature map generated by the second feature map generator 324, and the third feature map generated by the third feature map generator 326 to the second learning network 350 through the single-stage network 340, so that each feature map carries more robust implicit information.

The second learning network 350 obtains forgery information and frequency information for the training input image based on each of the first to third feature maps. To this end, the second learning network 350 includes a forgery detection sub-network 358 and a frequency sub-network 359.

The forgery detection sub-network 358 may obtain a forgery probability value by applying a forgery detection convolution filter to each of the first to third feature maps. Specifically, the forgery detection sub-network 358 may perform a convolution operation on each of the first to third feature maps using the forgery detection convolution filter to generate a forgery detection feature map.

In an embodiment, the forgery detection convolution filter may have a size of 1*1 and a stride of 1, and may have two channels.

By applying a predetermined classification function to the forgery detection feature map, the forgery detection sub-network 358 may calculate a first probability value indicating whether the training input image is a forged image. For example, the classification function may be a softmax function.
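A 1*1, stride-1 convolution with two output channels is a per-pixel linear map from C channels to two logits, and softmax turns those logits into probabilities. The sketch below assumes that output channel 1 means "forged"; the weights are hypothetical. The same structure applies to the two-channel face classification head described later.

```python
import math
from typing import List


def conv1x1(x, weights, bias):
    """1*1 convolution, stride 1: a per-pixel linear map from C_in to C_out.

    x[c][r][q]; weights[o][c]; returns out[o][r][q].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[bias[o] + sum(weights[o][c] * x[c][r][q] for c in range(c_in))
              for q in range(w)] for r in range(h)] for o in range(len(weights))]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


x = [[[0.2]], [[1.0]], [[-0.5]]]          # C = 3 channels, one pixel
w = [[0.1, -0.3, 0.2], [0.4, 0.5, -0.1]]  # hypothetical 2-channel filter
logits = conv1x1(x, w, bias=[0.0, 0.0])
probs = softmax([logits[0][0][0], logits[1][0][0]])
p_forged = probs[1]                       # assumption: channel 1 = "forged"
print(round(p_forged, 3))
```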

The frequency sub-network 359 may obtain frequency component values by applying a frequency convolution filter to each of the first to third feature maps. Specifically, the frequency sub-network 359 may perform a convolution operation on each of the first to third feature maps using the frequency convolution filter to generate a frequency feature map.

In an embodiment, the frequency convolution filter may be a Fourier transform filter. For example, it may be a Fourier transform filter with a size of 3*3 and a stride of 1.

In an embodiment, the number of dimensions of the frequency convolution filter equals the number of dimensions of the fast Fourier transform (FFT) spectrum and may vary with the resolution of the first to third feature maps: the lower the resolution, the more dimensions the frequency convolution filter may have. A feature map's resolution decreases as it passes through more network layers, and the lower the resolution, the larger the receptive field per pixel, that is, the region of the original image covered by each pixel. The number of dimensions of the FFT spectrum can therefore be set larger for lower-resolution feature maps.

For example, the first, second, and third feature maps may have resolutions of 20*20, 40*40, and 80*80, respectively. The frequency sub-network 359 may apply an 81-dimensional frequency convolution filter, corresponding to 9*9, to the first feature map to generate a first frequency feature map corresponding to a 20*20*81 FFT spectrum. This number of dimensions assumes that one pixel of the first feature map covers a 9*9 region of the original image, and may vary with the receptive field size of the first feature map. Likewise, the frequency sub-network 359 may apply a 49-dimensional frequency convolution filter, corresponding to 7*7, to the second feature map to generate a second frequency feature map corresponding to a 40*40*49 FFT spectrum; this assumes that one pixel of the second feature map covers a 7*7 region of the original image, and may vary with the receptive field size of the second feature map. Finally, the frequency sub-network 359 may apply a 25-dimensional frequency convolution filter, corresponding to 5*5, to the third feature map to generate a third frequency feature map corresponding to an 80*80*25 FFT spectrum; this assumes that one pixel of the third feature map covers a 5*5 region of the original image, and may vary with the receptive field size of the third feature map.
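How the per-pixel spectrum targets are built is not spelled out here. One plausible reading, assumed purely for illustration, is that each feature-map pixel is paired with the k*k image patch it covers, and the target is the magnitude of that patch's 2-D Fourier transform, flattened to k*k values. The sketch uses a direct DFT (an FFT gives the same coefficients faster); the function name and the magnitude choice are hypothetical.

```python
import cmath
from typing import List


def dft2_magnitude(patch: List[List[float]]) -> List[float]:
    """Magnitude of the 2-D DFT of a k*k patch, flattened to k*k values.

    A direct O(k^4) DFT; an FFT computes the same coefficients faster.
    """
    k = len(patch)
    spectrum = []
    for u in range(k):
        for v in range(k):
            coeff = sum(
                patch[r][c] * cmath.exp(-2j * cmath.pi * (u * r + v * c) / k)
                for r in range(k) for c in range(k)
            )
            spectrum.append(abs(coeff))
    return spectrum


# Patch size per feature-map resolution, taken from the example in the text.
PATCH_SIZE = {20: 9, 40: 7, 80: 5}
for res, k in PATCH_SIZE.items():
    print(f"{res}*{res} feature map -> {k}*{k} patch -> {res}*{res}*{k * k} target")

flat = [[1.0] * 5 for _ in range(5)]    # constant 5*5 patch
spec = dft2_magnitude(flat)
print(len(spec), round(spec[0], 6))     # 25 25.0
```

A constant patch puts all its energy in the DC coefficient, which is why only spec[0] is non-zero in the usage example.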

Meanwhile, the frequency sub-network 359 applies an activation function to the frequency feature map to give it non-linear characteristics. In an embodiment, the frequency sub-network 359 may apply an activation function that passes the positive values of the frequency feature map unchanged and reduces the magnitude of negative values, for example by outputting 0 for them.

In an embodiment, the second learning network 350 may further obtain face information for the training input image based on each of the first to third feature maps. In this case, the second learning network 350 may include a face classification sub-network 352 and a face location sub-network 354, and may further include a landmark sub-network 356.

The face classification sub-network 352 may obtain a probability value that a face region is included by applying a face classification convolution filter to each of the first to third feature maps. Specifically, the face classification sub-network 352 may perform a convolution operation on each of the first to third feature maps using the face classification convolution filter to generate a face classification feature map.

In an embodiment, the face classification convolution filter may have a size of 1*1 and a stride of 1, and may have two channels.

By applying a predetermined classification function to the face classification feature map, the face classification sub-network 352 may calculate a second probability value indicating whether the training input image contains a face region. For example, the classification function may be a softmax function.

The face location sub-network 354 may obtain the coordinate values of the face region by applying a face location convolution filter to each of the first to third feature maps. Specifically, the face location sub-network 354 may perform a convolution operation on each of the first to third feature maps using the face location convolution filter to generate a face location feature map.

In an embodiment, the face location convolution filter may have a size of 1*1 and a stride of 1, and may have four channels. The face location sub-network 354 may thus take the four values output along these four channels as the coordinate values of the face region on the training input image. When the region containing the face is represented as a rectangular bounding box, these coordinate values may be defined as the coordinates of its upper-left and lower-right vertices, or as the coordinates of its upper-right and lower-left vertices.
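Since both corner conventions describe the same axis-aligned box, converting between them is a matter of reordering coordinates. This sketch assumes image coordinates with y increasing downward and the four channel values ordered (x, y, x, y); both the ordering and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def tl_br_to_tr_bl(box: Box) -> Box:
    """(x_tl, y_tl, x_br, y_br) -> (x_tr, y_tr, x_bl, y_bl)."""
    x1, y1, x2, y2 = box
    return (x2, y1, x1, y2)


def tr_bl_to_tl_br(box: Box) -> Box:
    """(x_tr, y_tr, x_bl, y_bl) -> (x_tl, y_tl, x_br, y_br)."""
    x_tr, y_tr, x_bl, y_bl = box
    return (x_bl, y_tr, x_tr, y_bl)


box = (10.0, 20.0, 50.0, 80.0)          # upper-left (10, 20), lower-right (50, 80)
print(tl_br_to_tr_bl(box))              # (50.0, 20.0, 10.0, 80.0)
```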

The landmark sub-network 356 may obtain landmark coordinate values for the face in the face region by applying a landmark convolution filter to each of the first to third feature maps. Specifically, the landmark sub-network 356 may perform a convolution operation on each of the first to third feature maps using the landmark convolution filter to generate a landmark feature map.

In an embodiment, the landmark convolution filter may have a size of 1*1 and a stride of 1, and may have ten channels. The landmark sub-network 356 may thus take the ten values output along these ten channels as landmark coordinate values on the training input image. The landmark coordinate values may include the coordinates of the two eyes, the nose, and the two corners of the mouth on the training input image, that is, the left and right corners of the mouth.
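The ten channel values correspond to five (x, y) points. The channel order is not specified, so this sketch assumes interleaved (x, y) pairs in the order left eye, right eye, nose, left mouth corner, right mouth corner; the names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical ordering of the 10 channel values: interleaved (x, y) pairs.
LANDMARK_NAMES = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def decode_landmarks(values: List[float]) -> Dict[str, Tuple[float, float]]:
    """Group 10 landmark outputs into 5 named (x, y) points."""
    if len(values) != 10:
        raise ValueError("expected 10 values (5 points * (x, y))")
    return {name: (values[2 * i], values[2 * i + 1])
            for i, name in enumerate(LANDMARK_NAMES)}


pts = decode_landmarks([30, 40, 70, 40, 50, 60, 35, 80, 65, 80])
print(pts["nose"])                      # (50, 60)
```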

The error reduction unit 360 calculates an error by comparing the obtained information with actual values, and trains the face recognition model 240 so that the calculated error becomes smaller than a reference value. To this end, the error reduction unit 360 includes a forgery detection error reduction unit 368 and a frequency error reduction unit 369.

The forgery detection error reduction unit 368 calculates, using a first error function, a first error between the forgery probability value obtained by the forgery detection sub-network 358 (hereinafter, the "first predicted value") and a first actual value. In an embodiment, the first error function may be a cross-entropy error (CEE) function, as given in Equation 1.

In Equation 1, s^* denotes the first predicted value and s denotes the first actual value.
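The exact form of Equation 1 is not given here. A common cross-entropy form, assumed for illustration, is -sum_i s_i * log(s*_i), where s*_i are the softmax probabilities and s_i is the one-hot label. The same form would also fit the third error function (face classification), which is likewise described as a CEE function.

```python
import math
from typing import Sequence


def cross_entropy(pred: Sequence[float], target: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy error: -sum_i target_i * log(pred_i).

    pred: softmax probabilities (e.g. [p_real, p_forged]); target: one-hot label.
    eps clips probabilities away from 0 so log() stays finite.
    """
    return -sum(t * math.log(max(p, eps)) for p, t in zip(pred, target))


print(cross_entropy([0.5, 0.5], [0, 1]))   # log(2), about 0.693
```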

In an embodiment, the forgery detection error reduction unit 368 may train the forgery detection sub-network 358 until the first error is smaller than a first reference value, updating the forgery detection convolution filter so that the first error falls below the first reference value. For example, the forgery detection error reduction unit 368 may update at least one of the filter coefficients, bias, and weights of the forgery detection convolution filter.
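The optimizer is not specified. As a toy illustration of "update the parameters until the error is below a reference value", the sketch below runs plain gradient descent on a single weight w (standing in for a filter coefficient) with a squared-error loss on one hypothetical training pair, stopping once the loss drops below the threshold.

```python
def train_until(threshold: float, lr: float = 0.1, max_steps: int = 1000):
    """Toy gradient descent on one weight w for the loss (w * x - y) ** 2.

    Stops as soon as the loss falls below `threshold` (the "reference value").
    Returns (w, loss, number_of_updates).
    """
    x, y = 2.0, 3.0          # hypothetical single training pair
    w = 0.0                  # stands in for a filter coefficient
    loss = float("inf")
    for step in range(max_steps):
        loss = (w * x - y) ** 2
        if loss < threshold:
            return w, loss, step
        grad = 2 * (w * x - y) * x   # d(loss)/dw
        w -= lr * grad
    return w, loss, max_steps


w, loss, steps = train_until(1e-6)
print(round(w, 3), steps)
```

In a real network the same stopping rule would apply to the error over the training data, with every filter's coefficients, biases, and weights updated by backpropagation.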

The frequency error reduction unit 369 calculates, using a second error function, a second error between the frequency component values obtained by the frequency sub-network 359 (hereinafter, the "second predicted value") and a second actual value. In an embodiment, the second error function may be a mean squared error (MSE) function, as given in Equation 2.

In Equation 2, f^* denotes the second predicted value and f denotes the second actual value.
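The mean squared error named for Equation 2 is conventionally the average of squared differences between predicted and actual values. The sketch below assumes that standard definition, applied to flattened frequency components.

```python
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of (f*_i - f_i) ** 2 over all components."""
    if len(pred) != len(target):
        raise ValueError("length mismatch")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 4/3
```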

In an embodiment, the frequency error reduction unit 369 may train the frequency sub-network 359 until the second error is smaller than a second reference value, updating the frequency convolution filter so that the second error falls below the second reference value. For example, the frequency error reduction unit 369 may update at least one of the filter coefficients, bias, and weights of the frequency convolution filter.

In an embodiment, when the second learning network 350 includes the face classification sub-network 352, the face location sub-network 354, and the landmark sub-network 356, the error reduction unit 360 may further include a face classification error reduction unit 362, a face location error reduction unit 364, and a landmark error reduction unit 366.

The face classification error reduction unit 362 may calculate, using a third error function, a third error between the face-region probability value obtained by the face classification sub-network 352 (hereinafter, the "third predicted value") and a third actual value. In an embodiment, the third error function may be a cross-entropy error (CEE) function, as given in Equation 3.

In Equation 3, p^* denotes the third predicted value and p denotes the third actual value.

In an embodiment, the face classification error reduction unit 362 may train the face classification sub-network 352 until the third error is smaller than a third reference value, updating the face classification convolution filter so that the third error falls below the third reference value. For example, the face classification error reduction unit 362 may update at least one of the filter coefficients, bias, and weights of the face classification convolution filter.

The face location error reduction unit 364 may calculate, using a fourth error function, a fourth error between the face-region coordinate values obtained by the face location sub-network 354 (hereinafter, the "fourth predicted value") and fourth actual values. In an embodiment, the fourth error function may be as given in Equation 4.

In Equation 4, t^* denotes the fourth predicted value and t denotes the fourth actual value.

The smooth function used in Equation 4 may be defined as in Equation 5.
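Equations 4 and 5 are not reproduced here. A widely used choice for a "smooth" box-regression loss is smooth L1 (0.5 * x^2 when |x| < 1, otherwise |x| - 0.5), summed over the four coordinates of t* - t. The sketch assumes that form; Equation 6 for landmarks may well follow the same pattern over ten values, but that too is an assumption.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Assumed smooth function: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def box_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of smooth L1 over the box coordinates (t* versus t)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


print(box_loss([0.0, 0.0, 10.0, 10.0], [0.5, 0.0, 12.0, 10.0]))  # 1.625
```

The linear tail keeps large coordinate errors from dominating the gradient, while the quadratic core gives smooth gradients near the target.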

In an embodiment, the face location error reduction unit 364 may train the face location sub-network 354 until the fourth error is smaller than a fourth reference value, updating the face location convolution filter so that the fourth error falls below the fourth reference value. For example, the face location error reduction unit 364 may update at least one of the filter coefficients, bias, and weights of the face location convolution filter.

The landmark error reduction unit 366 may calculate, using a fifth error function, a fifth error between the landmark coordinate values obtained by the landmark sub-network 356 (hereinafter, the "fifth predicted value") and fifth actual values. In an embodiment, the fifth error function may be as given in Equation 6.

In Equation 6, l^* denotes the fifth predicted value and l denotes the fifth actual value.

In an embodiment, the landmark error reduction unit 366 may train the landmark sub-network 356 until the fifth error is smaller than a fifth reference value, updating the landmark convolution filter so that the fifth error falls below the fifth reference value. For example, the landmark error reduction unit 366 may update at least one of the filter coefficients, bias, and weights of the landmark convolution filter.

In an embodiment, the error reduction unit 360 may further include an integrated error reduction unit 370 that integrates and manages the errors.

The integrated error reduction unit 370 may calculate a final error using the first to fifth errors calculated by the face classification error reduction unit 362, the face location error reduction unit 364, the landmark error reduction unit 366, the forgery detection error reduction unit 368, and the frequency error reduction unit 369. In doing so, the integrated error reduction unit 370 may assign weights to the first to fifth errors and sum the weighted errors to obtain the final error.

In an embodiment, the integrated error reduction unit 370 may calculate the final error using Equation 7.

In Equation 7, L denotes the final error, λ₁ denotes the weight for the fourth error function, λ₂ the weight for the fifth error function, λ₃ the weight for the first error function, and λ₄ the weight for the second error function.
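Equation 7 is not reproduced here, but the weight definitions suggest a weighted sum. The sketch assumes L = e3 + λ₁·e4 + λ₂·e5 + λ₃·e1 + λ₄·e2, with the third (face classification) error left unweighted because no weight is listed for it; the default weight values are placeholders, not values from the source.

```python
def final_error(e1: float, e2: float, e3: float, e4: float, e5: float,
                lam1: float = 1.0, lam2: float = 1.0,
                lam3: float = 1.0, lam4: float = 1.0) -> float:
    """Weighted sum of the five errors (assumed form of Equation 7).

    e1: forgery (CEE), e2: frequency (MSE), e3: face classification (CEE),
    e4: face location, e5: landmarks. Weight mapping follows the text:
    lam1 -> e4, lam2 -> e5, lam3 -> e1, lam4 -> e2; e3 is assumed unweighted.
    """
    return e3 + lam1 * e4 + lam2 * e5 + lam3 * e1 + lam4 * e2


print(final_error(1.0, 2.0, 3.0, 4.0, 5.0, lam1=0.5, lam2=0.1, lam3=1.0, lam4=0.2))
```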

In an embodiment, the error reduction unit 360 may train the face classification sub-network 352, the face location sub-network 354, the landmark sub-network 356, the forgery detection sub-network 358, and the frequency sub-network 359 until the final error is smaller than a sixth reference value. The error reduction unit 360 may update at least one of the face classification convolution filter, the face location convolution filter, the landmark convolution filter, the forgery detection convolution filter, and the frequency convolution filter so that the final error falls below the sixth reference value.

In the above-described embodiment, the face recognition learning model 230 may be implemented as software in the form of an algorithm and installed on the face recognition server 110.

The face recognition model 240 trained by the face recognition learning model 230 as described above can perform face detection and forgery detection in an integrated manner. That is, the face recognition model 240 may determine whether the input image contains a face region, and whether the input image is forged or a genuine (live) image.

To this end, as shown in FIG. 9, the face recognition model 240 may include a backbone network 910, a first network 920, and a second network 950. The backbone network 910 and the first network 920 of the face recognition model 240 are substantially the same as the backbone network 310 and the first network 320 described for the face recognition learning model 230, so a detailed description of them is omitted.

The second network 950 of the face recognition model 240 may include a face classification network 952, a face location network 954, a landmark network 956, and a forgery detection network 958. Unlike the second learning network 350 of the face recognition learning model 230, the second network 950 of the face recognition model 240 does not include the frequency sub-network 359. In other words, the frequency components serve only an auxiliary role during training and are not used for inference after training. Nevertheless, because the face recognition system 100 according to an embodiment of the present invention trains the face recognition model 240 in the face recognition learning model 230 using frequency components in addition to face-region presence, face-region coordinates, forgery status, and landmark coordinates, the frequency components are reflected when the model determines whether an input image is forged.

Specifically, when a user's input image is input, the face recognition model 240 may obtain the probability value that a face region is included through the face classification network 952, and the coordinate values of the face region through the face location network 954. The face recognition model 240 may also obtain landmark coordinate values through the landmark network 956 and a forgery probability value through the forgery detection network 958.

The face classification network 952 may extract the face-region probability value using the face classification convolution filter, the face location network 954 may extract the face-region coordinate values using the face location convolution filter, the landmark network 956 may extract the landmark coordinate values using the landmark convolution filter, and the forgery detection network 958 may extract the forgery probability value using the forgery detection convolution filter.

Here, the filter coefficients, biases, and weights of the face classification, face location, landmark, and forgery detection convolution filters may take the values learned in the face recognition learning model 230 with the frequency components reflected. Accordingly, the face-region probability value, face-region coordinate values, landmark coordinate values, and forgery probability value that the face recognition model 240 extracts with these filters may be values in which the frequency components are reflected.

The face recognition model 240 may determine that the user's input image contains a face if the face-region probability value is greater than or equal to a first threshold, and that the user's input image is forged if the forgery probability value is greater than or equal to a second threshold.
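The two threshold tests above can be written as a small decision function. The threshold values 0.5 are hypothetical defaults; note that both comparisons are "greater than or equal to", as in the text.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    has_face: bool
    is_forged: bool


def decide(face_prob: float, forgery_prob: float,
           face_threshold: float = 0.5, forgery_threshold: float = 0.5) -> Decision:
    """Apply the first (face) and second (forgery) thresholds."""
    return Decision(has_face=face_prob >= face_threshold,
                    is_forged=forgery_prob >= forgery_threshold)


print(decide(0.92, 0.13))   # Decision(has_face=True, is_forged=False)
```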

The face recognition model 240 may extract the user's face image using the coordinate values of the face region. In an embodiment, the face recognition model 240 may align the face image using the landmark coordinate values. Specifically, it may align the face image by performing at least one of rotation, translation, enlargement, and reduction on the face image based on the landmark coordinate values. The face images are aligned to improve face recognition performance by making the face images supplied for feature vector extraction consistent.
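Rotation, translation, and scaling together form a similarity transform. A minimal sketch, assuming only the two eye landmarks are used and mapped onto hypothetical canonical positions, solves z' = a*z + b on complex numbers: |a| is the scale, arg(a) the rotation, and b the translation. Fitting all five landmarks by least squares is a common refinement not shown here.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def similarity_from_eyes(left_eye: Point, right_eye: Point,
                         target_left: Point,
                         target_right: Point) -> Callable[[Point], Point]:
    """Return a map from image points to the aligned frame.

    Solves z' = a*z + b so both eyes land exactly on their targets.
    """
    p1, p2 = complex(*left_eye), complex(*right_eye)
    t1, t2 = complex(*target_left), complex(*target_right)
    a = (t2 - t1) / (p2 - p1)   # combined rotation and scale
    b = t1 - a * p1             # translation

    def transform(pt: Point) -> Point:
        z = a * complex(*pt) + b
        return (z.real, z.imag)

    return transform


# Tilted eyes mapped to hypothetical canonical eye positions.
align = similarity_from_eyes((30.0, 50.0), (70.0, 30.0), (38.0, 52.0), (74.0, 52.0))
print(align((50.0, 60.0)))  # where a nose landmark would land after alignment
```

Applying the same transform to every pixel coordinate (with interpolation) would produce the aligned face image.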

The face recognition model 240 may extract, from the extracted face image of the user, a plurality of feature vectors capable of identifying the user. In an embodiment, the face recognition model 240 may output 128 or more feature vectors, for example 512 feature vectors.

Referring back to FIG. 1, an edge device 120 is deployed at each specific location. It uses the face recognition model 240 distributed by the face recognition server 110 to recognize the face of a target user who wishes to enter that location. It then authenticates the target user's access based on the recognition result.

In the present invention, face recognition and authentication of the target user are performed by the edge device 120 rather than by the face recognition server 110, for two reasons. First, if the face recognition server 110 performed them, a failure of the server or the network would make face recognition and authentication impossible. Second, the expensive face recognition server 110 would have to be expanded as the number of users grows.

Accordingly, the present invention applies edge computing so that the edge device 120 performs face recognition and authentication of the target user. The face recognition service therefore continues to operate even if the face recognition server 110 or the network fails, which improves service reliability. In addition, no expensive face recognition servers 110 need to be added as the number of users grows, which reduces the cost of building the face recognition system 100.

Hereinafter, the configuration of the edge device 120 according to the present invention will be described in more detail with reference to FIG. 10.

FIG. 10 is a block diagram schematically showing the configuration of an edge device according to an embodiment of the present invention.

Referring to FIG. 10, the edge device 120 according to an embodiment of the present invention includes a photographing unit 1010, a target face recognition and forgery discrimination unit 1020, a face recognition model 1030, an authentication unit 1040, an interface unit 1050, and a memory 1060.

When a target user to be authenticated approaches, the photographing unit 1010 photographs the target user to generate a captured image. It then transmits the captured image to the target face recognition and forgery discrimination unit 1020.

When the target face recognition and forgery discrimination unit 1020 receives the target user's input image from the photographing unit 1010, it uses the face recognition model 1030 distributed by the face recognition server 110 for two tasks. It recognizes a face in the input image, and it determines whether the input image has been forged.

The target face recognition and forgery discrimination unit 1020 inputs the input image into the face recognition model 1030 and obtains face information and forgery information from it. For example, the unit 1020 obtains four values from the face recognition model 1030: the probability that a face region is included, the coordinates of the face region, the landmark coordinates, and the forgery probability value.

In this case, the target face recognition and forgery discrimination unit 1020 may determine that the target user's input image contains a face when the probability that a face region is included is greater than or equal to the first threshold.

In addition, the target face recognition and forgery discrimination unit 1020 may determine that the target user's input image has been forged when the forgery probability value is greater than or equal to the second threshold. The forgery probability value is a real number between 0 and 1, and the closer it is to 1, the more likely the input image is forged. Conversely, when the forgery probability value is less than the second threshold, the unit 1020 may determine that the target user's input image is a real, non-forged image.
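The two threshold tests described above can be combined into a single decision function. The sketch below is a minimal illustration. The `ModelOutput` and `Decision` types, and the choice to skip the forgery check when no face is found, are assumptions. The threshold values are left to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    NO_FACE = "no_face"
    FORGED = "forged"
    REAL = "real"


@dataclass(frozen=True)
class ModelOutput:
    face_prob: float     # probability that a face region is included
    forgery_prob: float  # forgery probability value, a real number in [0, 1]


def decide(out: ModelOutput, first_threshold: float, second_threshold: float) -> Decision:
    """Apply the first (face) and second (forgery) thresholds in turn."""
    if not 0.0 <= out.forgery_prob <= 1.0:
        raise ValueError("forgery probability must lie in [0, 1]")
    if out.face_prob < first_threshold:
        return Decision.NO_FACE      # no face detected: forgery check skipped (assumption)
    if out.forgery_prob >= second_threshold:
        return Decision.FORGED       # at or above the second threshold
    return Decision.REAL             # below the second threshold
```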

The second threshold may be determined as follows. First, the interval from 0 to 1 is divided into equal steps. Next, the performance of the face recognition model 240 at each candidate reference value is evaluated with the F1-score, which combines recall and precision. Finally, the reference value that yields the highest F1-score is selected as the second threshold.
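The threshold search described above can be sketched as follows, treating "forged" as the positive class for precision and recall, with F1 = 2PR / (P + R). Several details are assumptions: the number of steps (ten, giving candidates 0.0, 0.1, ..., 1.0), the tie-breaking rule (keep the lowest threshold with the best score), and the function names. The description states only that [0, 1] is divided equally and that the F1-maximizing value is kept.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, probs: Sequence[float], labels: Sequence[int]) -> float:
    """F1-score for the 'forged' class (label 1) when forgery_prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted_forged = p >= threshold
        if predicted_forged and y == 1:
            tp += 1
        elif predicted_forged and y == 0:
            fp += 1
        elif not predicted_forged and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_second_threshold(probs: Sequence[float], labels: Sequence[int],
                            steps: int = 10) -> Tuple[float, float]:
    """Scan equally spaced candidates in [0, 1]; return (best threshold, its F1)."""
    best_t, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        t = i / steps
        score = f1_at(t, probs, labels)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1
```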

Meanwhile, suppose the target user's input image contains a face region and is determined to be a real image. In that case, the target face recognition and forgery discrimination unit 1020 may extract the target user's face image using the coordinates of the face region obtained through the face recognition model 1030. It may then generate a target feature vector from the extracted face image.

The authentication unit 1040 authenticates the target user by comparing the target feature vector obtained by the target face recognition and forgery discrimination unit 1020 with the array file received from the face recognition server 110. The array file consists of a plurality of arrays, each holding the user feature vectors and identification information of one of a plurality of users. The authentication unit 1040 authenticates the target user by comparing the target feature vector against these arrays.
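The description does not specify the comparison metric or the acceptance criterion. The minimal sketch below makes three assumptions. It uses cosine similarity, a hypothetical acceptance threshold of 0.5, and an in-memory form of the array file with one `UserArray` per user. Each `UserArray` holds the user's identification information and possibly several registered feature vectors.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class UserArray:
    """One entry of a hypothetical in-memory array file."""
    user_id: str                        # identification information
    feature_vectors: List[List[float]]  # one vector per registered image


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def authenticate(target: Sequence[float], array_file: List[UserArray],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching user id, or None if no vector clears the threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for entry in array_file:
        for vec in entry.feature_vectors:
            score = cosine_similarity(target, vec)
            if score > best_score:
                best_id, best_score = entry.user_id, score
    return best_id if best_score >= threshold else None
```

Scanning every registered vector allows a user to be matched through any of several images, for example images registered under different lighting conditions.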

The face recognition model 1030 is generated and distributed by the face recognition server 110 and may be updated at predetermined intervals. For example, whenever the face recognition server 110 updates the face recognition model, the edge device 120 may receive the new face recognition model 1030 from the server. It then replaces the previously distributed face recognition model with the new face recognition model 1030.

When an array file is received from the face recognition server 110 through the interface unit 1050, the first memory 1062 loads it so that the authentication unit 1040 can use it to authenticate the target user. In particular, the memory 1060 according to the present invention allows array files to be loaded dynamically.

Specifically, if a new array file is received from the face recognition server 110 while an array file is loaded in the first memory 1062, the new array file may be loaded into the second memory 1064. When loading of the new array file into the second memory 1064 is complete, the array file in the first memory 1062 may be replaced with the new array file from the second memory 1064.

That is, the first memory 1062 holds the array file currently used by the authentication unit 1040, and the second memory 1064 holds the newly received array file. Once the new array file has been fully loaded into the second memory 1064, the array file in the first memory 1062 is replaced with it.
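The first-memory/second-memory replacement amounts to double buffering. The sketch below assumes that the array file can be represented as a Python dictionary from user identifier to feature vectors. Authentication keeps reading the active buffer while a new file is staged, and the swap happens only after staging completes. The class and method names are hypothetical, and the sketch assumes a single updater thread; the lock only makes the swap point explicit to concurrent readers.

```python
import threading
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory form of an array file: user id -> registered feature vectors.
ArrayFile = Dict[str, List[List[float]]]


class DoubleBufferedArrayStore:
    """Authenticate from the active buffer while a new array file is staged."""

    def __init__(self, initial: ArrayFile) -> None:
        self._active: ArrayFile = initial            # plays the role of the first memory
        self._staging: Optional[ArrayFile] = None    # plays the role of the second memory
        self._lock = threading.Lock()

    def snapshot(self) -> ArrayFile:
        """Return the array file currently used for authentication."""
        with self._lock:
            return self._active

    def stage_and_swap(self, load: Callable[[], ArrayFile]) -> None:
        """Load a new array file into staging, then replace the active one."""
        self._staging = load()  # may be slow; snapshot() still serves the old file
        with self._lock:
            self._active = self._staging
            self._staging = None
```

Readers that took a snapshot before the swap keep a consistent view of the old file until they finish.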

The interface unit 1050 mediates data transmission and reception between the edge device 120 and the face recognition server 110. Specifically, the interface unit 1050 receives the face recognition model 1030 from the face recognition server 110. It also receives array files from the face recognition server 110 and loads them into the first memory 1062 or the second memory 1064. In addition, the interface unit 1050 periodically transmits the authentication records of the authentication unit 1040 to the face recognition server 110.

As described above, according to the present invention, the edge device 120 stores only the face recognition model 1030 and the array file used for face recognition. It does not store users' face images or personal information. Therefore, even if the edge device 120 is hacked, users' personal information cannot be leaked from it, which enhances security.

Referring back to FIG. 1, the user terminal 130 transmits a user image for registering a new user to the face recognition server 110, together with the user's identification information. In an embodiment, the user terminal 130 is equipped with a face registration agent (not shown) that can interwork with the face recognition server 110. By running the face registration agent on the user terminal 130, the user can send a newly captured or previously captured image of his or her face, together with the user identification information, to the face recognition server 110.

In an embodiment, the user terminal 130 may request registration of a plurality of user images for each user. These images may be photographs taken in different environments or under different lighting conditions.

Any type of device may serve as the user terminal 130, provided it can request user registration by transmitting a user image to the face recognition server 110. For example, the user terminal 130 may be implemented as a smartphone, laptop, desktop, or tablet PC.

In the face recognition system 100 according to an embodiment of the present invention, the face recognition learning model 230 trains the face recognition model 240 to consider frequency components as well as facial features. As a result, even when an RGB image captured by an ordinary camera is input to the face recognition model 240, the forgery probability value obtained reflects the frequency components. The face recognition system 100 can therefore determine whether an image is forged without a separate device such as an infrared sensor. This minimizes environmental constraints and reduces cost.
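The description does not state exactly which "frequency information" the frequency sub-network is trained to match. It says only, in claim 7, that the frequency convolution filter is a Fourier transform filter. As an illustration, the sketch below assumes that the target is the magnitude spectrum of the 2-D discrete Fourier transform of a grayscale image. It uses a naive O((HW)^2) loop so that the block runs with the standard library alone; a real pipeline would use an FFT. The function names are hypothetical.

```python
import cmath
import math
from typing import List

Image = List[List[float]]  # grayscale image indexed [row][column]


def dft2_magnitude(img: Image) -> Image:
    """Magnitude |F(u, v)| of the 2-D DFT, computed directly from the definition."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            mag[u][v] = abs(acc)
    return mag


def log_spectrum(img: Image) -> Image:
    """log(1 + |F|), a common way to compress the dynamic range of a spectrum."""
    return [[math.log1p(m) for m in row] for row in dft2_magnitude(img)]
```

A uniform image puts all of its energy in the zero-frequency term (0, 0). A pixel-level checkerboard puts all of its energy in the highest-frequency term, which is (2, 2) for a 4x4 image.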

In addition, the face recognition system 100 according to an embodiment of the present invention can perform face detection and forgery discrimination simultaneously through a single integrated face recognition model 240. This reduces the amount of computation and effectively improves computation speed.

Those skilled in the art to which the present invention pertains will understand that the present invention described above may be embodied in other specific forms without changing its technical spirit or essential features.

Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description above. All changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: face recognition system 110: face recognition server
120: edge device 130: user terminal
210: user registration unit 220: user face recognition unit
230: face recognition learning model 240: face recognition model
250: array file generation unit 260: edge device management unit
270: interface unit

Claims

1. A face recognition system comprising:
a face recognition model that recognizes a face from a user image and determines whether the user image is forged; and
a face recognition learning model that extracts a feature map from a training image and acquires face information, forgery information, and frequency information for the training image based on the extracted feature map. The face recognition learning model calculates an error by comparing the acquired face information, forgery information, and frequency information with actual values, and trains the face recognition model so that the calculated error becomes smaller than a reference value.

2. The face recognition system of claim 1, wherein the face recognition learning model comprises:
a first network that extracts a feature map for each of a plurality of training input images generated based on the training image;
a second network that acquires face information, forgery information, and frequency information for each training input image based on each of the feature maps; and
an error reduction unit that calculates an error by comparing the acquired face information, forgery information, and frequency information with actual values, and trains the face recognition model so that the calculated error becomes smaller than a reference value.

3. The face recognition system of claim 1, wherein the face recognition learning model comprises:
a backbone network that generates three training input images from the training image: a first training input image having a first resolution and a first dimension; a second training input image having a second resolution lower than the first resolution and a second dimension larger than the first dimension; and a third training input image having a third resolution lower than the second resolution and a third dimension larger than the second dimension; and
a first network that generates a first feature map from the third training input image, generates a second feature map using the first feature map and the second training input image, and generates a third feature map using the second feature map and the first training input image.

4. The face recognition system of claim 3, wherein the first network comprises a first convolution unit that generates the first feature map by applying a first convolution filter to the third training input image,
wherein the first convolution filter changes the dimension of the third training input image to a specific dimension.

5. The face recognition system of claim 3, wherein the first network comprises:
a second convolution unit that changes the dimension of the second training input image to a specific dimension by applying a second convolution filter;
a first upsampling unit that upsamples the first feature map to the same resolution as the second training input image;
a first operation unit that adds the upsampled first feature map to the dimension-changed second training input image; and
a third convolution unit that generates the second feature map by applying a third convolution filter to the output of the first operation unit.
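The top-down step of claim 5 resembles the lateral-connection pathway of a feature pyramid network. The sketch below rests on several assumptions. It uses nearest-neighbour upsampling, a 1x1 kernel for the second convolution filter, and a 3x3 kernel for the third, which is one reading of claim 6's "same number of filters but different sizes". Feature maps are nested lists indexed [channel][row][column], and all names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][column]
Kernel3 = List[List[float]]           # 3x3 weights


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Second convolution filter (assumed 1x1): out[o] = sum_c weights[o][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
             for i in range(h)] for wo in weights]


def upsample_nearest(x: FeatureMap, factor: int) -> FeatureMap:
    """First upsampling unit (assumed nearest-neighbour)."""
    return [[[ch[i // factor][j // factor] for j in range(len(ch[0]) * factor)]
             for i in range(len(ch) * factor)] for ch in x]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """First operation unit: element-wise sum of two maps with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def conv3x3_same(x: FeatureMap, kernels: List[List[Kernel3]]) -> FeatureMap:
    """Third convolution filter (assumed 3x3; zero padding keeps the resolution).

    kernels[o][c] is the 3x3 kernel linking input channel c to output channel o.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for k_out in kernels:
        plane = [[0.0] * w for _ in range(h)]
        for c, k in enumerate(k_out):
            for i in range(h):
                for j in range(w):
                    for di in range(3):
                        for dj in range(3):
                            y, z = i + di - 1, j + dj - 1
                            if 0 <= y < h and 0 <= z < w:
                                plane[i][j] += k[di][dj] * x[c][y][z]
        out.append(plane)
    return out


def top_down_merge(coarse: FeatureMap, lateral: FeatureMap,
                   lateral_weights: List[List[float]],
                   smooth_kernels: List[List[Kernel3]]) -> FeatureMap:
    """Claim 5 order: 1x1 on the finer input, upsample the coarser map, add, then 3x3."""
    lat = conv1x1(lateral, lateral_weights)
    up = upsample_nearest(coarse, len(lateral[0]) // len(coarse[0]))
    return conv3x3_same(add(up, lat), smooth_kernels)
```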

6. The face recognition system of claim 5, wherein the second convolution filter and the third convolution filter have the same number of filters but different sizes.

7. The face recognition system of claim 2, wherein the second network comprises:
a forgery discrimination sub-network that obtains a forgery probability value by applying a forgery discrimination convolution filter to each of the feature maps; and
a frequency sub-network that obtains frequency component values by applying a frequency convolution filter to each of the feature maps,
wherein the frequency convolution filter is a Fourier transform filter.

8. The face recognition system of claim 7, wherein the frequency sub-network applies frequency convolution filters having different dimensions to the respective feature maps.

9. The face recognition system of claim 7, wherein the second network further comprises:
a face discrimination sub-network that obtains a probability that a face region is included by applying a face discrimination convolution filter to each of the feature maps;
a face position sub-network that obtains coordinates of the face region, represented as a bounding box, by applying a face position convolution filter to each of the feature maps; and
a landmark sub-network that obtains landmark coordinates for the face in the face region by applying a landmark convolution filter to each of the feature maps.

10. The face recognition system of claim 1, wherein the face recognition learning model comprises:
a forgery discrimination error reduction unit that uses a first error function to calculate a first error between the forgery probability value and a first actual value, and updates the forgery discrimination convolution filter so that the calculated first error becomes smaller than a first reference value; and
a frequency error reduction unit that uses a second error function to calculate a second error between the frequency component values and a second actual value, and updates the frequency convolution filter so that the calculated second error becomes smaller than a second reference value.

11. The face recognition system of claim 10, wherein the face recognition learning model further comprises:
a face discrimination error reduction unit that uses a third error function to calculate a third error between the probability that the face region is included and a third actual value, and updates the face discrimination convolution filter so that the calculated third error becomes smaller than a third reference value;
a face position error reduction unit that uses a fourth error function to calculate a fourth error between the coordinates of the face region and a fourth actual value, and updates the face position convolution filter so that the calculated fourth error becomes smaller than a fourth reference value;
a landmark error reduction unit that uses a fifth error function to calculate a fifth error between the landmark coordinates and a fifth actual value, and updates the landmark convolution filter so that the calculated fifth error becomes smaller than a fifth reference value; and
an integrated error reduction unit that weights the first to fifth errors and sums the weighted errors to calculate a final error. The integrated error reduction unit then updates at least one of the face discrimination convolution filter, the face position convolution filter, the landmark convolution filter, the forgery discrimination convolution filter, and the frequency convolution filter so that the final error becomes smaller than a sixth reference value.

12. The face recognition system of claim 11, wherein each of the first error function and the third error function is a cross-entropy error (CEE) function, and the second error function is a mean squared error (MSE) function.
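Claims 11 and 12 combine five task errors into a weighted final error. A minimal sketch follows. The claims do not name the fourth (face position) and fifth (landmark) error functions, so MSE is used for them here only as a placeholder. The weights and example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy error (CEE) for predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error (MSE) between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def final_error(errors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Integrated error: weighted sum of the individual task errors."""
    return sum(weights[name] * value for name, value in errors.items())


# Example with hypothetical predictions, labels, and weights.
errors = {
    "forgery": cross_entropy(0.9, 1),                          # first error (CEE)
    "frequency": mean_squared_error([1.0, 2.0], [1.0, 4.0]),   # second error (MSE)
    "face": cross_entropy(0.5, 1),                             # third error (CEE)
    "face_position": mean_squared_error([10, 10, 50, 50], [12, 10, 50, 48]),  # fourth (placeholder MSE)
    "landmark": mean_squared_error([0.0], [1.0]),              # fifth (placeholder MSE)
}
weights = {"forgery": 1.0, "frequency": 0.5, "face": 1.0, "face_position": 0.25, "landmark": 0.25}
total = final_error(errors, weights)
```

In training, `total` would be the quantity that the integrated error reduction unit drives below the sixth reference value.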

13. The face recognition system of claim 1, wherein the face recognition model comprises:
a first network that extracts a feature map for each of a plurality of user input images generated based on the user image; and
a second network that acquires face information and forgery information for each user input image based on each of the feature maps,
wherein the face information includes a probability that a face region is included, obtained by applying a face discrimination convolution filter learned by the face recognition learning model to each of the feature maps, and
wherein the forgery information includes a forgery probability value obtained by applying a forgery discrimination convolution filter learned by the face recognition learning model to each of the feature maps.

14. The face recognition system of claim 13, further comprising a face recognition and forgery discrimination unit that determines that the user input image includes a face if the obtained probability that the face region is included is greater than or equal to a preset first threshold, and determines that the user input image is a real image if the obtained forgery probability value is less than a preset second threshold.