KR20210049570A

KR20210049570A - Apparatus and method for detecting forgery or alteration of the face

Info

Publication number: KR20210049570A
Application number: KR1020190134026A
Authority: KR
Inventors: 김원준; 노종혁; 곽용재; 정찬호; 조상래; 김영삼; 조관태
Original assignee: 건국대학교 산학협력단; 한국전자통신연구원
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2021-05-06
Also published as: KR102275803B1

Abstract

The present invention provides a face forgery detection apparatus and method with robust forgery detection performance. The face forgery detection method includes the steps of: (a) acquiring a captured image of a face part of an object through a dual camera; (b) extracting, from the captured image, a compression feature related to a face region in the captured image; (c) extracting a difference of latent features using the captured image; and (d) detecting whether the face region in the captured image is forged or altered by using the compression feature and the difference of latent features.

Description

Face forgery detection device and method {APPARATUS AND METHOD FOR DETECTING FORGERY OR ALTERATION OF THE FACE}

The present application relates to a face forgery detection apparatus and method. In particular, it relates to a face forgery detection apparatus and method using a dual camera.

With the recent rise in sophisticated crime, demand for advanced security technology keeps growing. Biometric recognition, which identifies a person from biometric characteristics unique to that person, is becoming increasingly important in terms of user convenience.

Among biometric techniques, face recognition is one of the most attractive because it recognizes a person naturally from the face, without contact with a dedicated sensor. Its use is gradually expanding as its performance improves with advances in face detection, pose estimation, illumination processing, and facial feature extraction.

Face recognition technology is classified into recognition based on two-dimensional (2D) images and recognition based on three-dimensional (3D) shapes. Because recognition based on 3D shapes requires a large amount of computation, face recognition systems based on 2D images are currently in wide use.

However, a face recognition system based on 2D images is vulnerable to spoofing attacks that use a photograph of a registered person's face. Liveness detection has been studied steadily as a way to address this problem.

One prior-art method detects a fake face by analyzing the Fourier spectrum. It relies on the fact that an image captured from a real face contains more high-frequency components than an image captured from a photograph. However, its performance is unstable, because the difference in high-frequency components between the two kinds of image is less pronounced than the difference caused by changes in the external environment.

In addition, most existing studies distinguish real faces from forged faces using images acquired from a single camera. In particular, there has been active research on detecting subtle differences between real and forged faces using features such as the image quality and texture patterns of the face image.

Recently, encouraged by the success of deep neural networks in image recognition, there have been growing attempts to apply them to forgery detection. However, the forgery detection techniques known to date remain vulnerable to forgery attacks that use variations of paper-printed masks or high-resolution screens (images).

The background art of the present application is disclosed in Korean Registered Patent Publication No. 10-1064945.

The present application is intended to solve the problems of the prior art described above, and aims to provide a face forgery detection apparatus and method whose forgery detection performance is robust even against forgery attacks that use variations of paper-printed masks or high-resolution screens (images).

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a face forgery detection method according to an embodiment of the present application may include: (a) acquiring a captured image of a face part of an object through a dual camera; (b) extracting, from the captured image, a compression feature related to a face region in the captured image; (c) extracting a latent feature difference (Difference of Latent Features) using the captured image; and (d) detecting whether the face region in the captured image is forged or altered by using the compression feature and the latent feature difference.

In addition, the captured image in step (a) may include a first viewpoint image and a second viewpoint image, and step (b) may provide the first viewpoint image and the second viewpoint image as inputs to a first neural network and extract, from the first neural network, the compression feature, which includes a first view compression feature corresponding to the first viewpoint image and a second view compression feature corresponding to the second viewpoint image.

In addition, step (c) may extract the latent feature difference between the first viewpoint image and the second viewpoint image by using the first viewpoint image and the second viewpoint image.

In addition, the latent feature difference in step (c) may be obtained by passing a difference feature through a second neural network, the difference feature being the difference between the features of the first viewpoint image and the features of the second viewpoint image that arises while those features are being compressed within the first neural network.
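The idea of a difference feature taken during compression can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than the patented networks: `halve` is a toy compression layer, both views are assumed to pass through the same layers, and `second_network` merely summarizes each layer's difference instead of being a trained network.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def halve(v: Vector) -> Vector:
    """Toy compression layer (assumption): average adjacent pairs."""
    return [(v[i] + v[i + 1]) / 2.0 for i in range(0, len(v) - 1, 2)]


def branch_features(x: Vector, layers: Sequence[Layer]) -> List[Vector]:
    """Run one view through its layers, keeping every intermediate feature."""
    feats = []
    for layer in layers:
        x = layer(x)
        feats.append(x)
    return feats


def difference_features(first_view: Vector, second_view: Vector,
                        layers: Sequence[Layer]) -> List[Vector]:
    """Element-wise difference of the two views' features at each layer."""
    first = branch_features(first_view, layers)
    second = branch_features(second_view, layers)
    return [[a - b for a, b in zip(f, s)] for f, s in zip(first, second)]


def second_network(diffs: List[Vector]) -> Vector:
    """Hypothetical stand-in for the second neural network:
    the mean absolute difference at each layer."""
    return [sum(abs(d) for d in layer) / len(layer) for layer in diffs]


if __name__ == "__main__":
    layers = [halve, halve, halve]
    left = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    right = [v + 1.0 for v in left]
    print(second_network(difference_features(left, right, layers)))
```

In this toy setting, two identical views give an all-zero latent feature difference, while views that differ give non-zero values at every layer.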

In addition, step (d) may calculate a score for the captured image by using the first view compression feature, the second view compression feature, and the latent feature difference, and detect whether the face region in the captured image is forged or altered based on the calculated score.

In addition, step (d) may combine the first view compression feature, the second view compression feature, and the latent feature difference to generate an integrated combined feature, and detect the forgery based on an integrated-combined-feature score calculated from the generated integrated combined feature.

In addition, step (d) may calculate a first score, related to a combined feature, from the combined feature obtained by combining the first view compression feature and the second view compression feature; calculate a second score, related to the latent feature difference, from the latent feature difference; and detect the forgery based on a summed score obtained by summing the first score and the second score.

In addition, the summed score in step (d) may be the average of the first score and the second score.
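The two scoring schemes of step (d) can be contrasted in a short Python sketch. The source does not say how a score is computed from a feature, so `linear_score` (a linear layer followed by a sigmoid) and all weights are hypothetical; combining features is assumed to mean concatenation.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(feature: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Hypothetical scoring head: linear layer followed by a sigmoid."""
    if len(feature) != len(weights):
        raise ValueError("feature and weights must have the same length")
    return sigmoid(sum(f * w for f, w in zip(feature, weights)) + bias)


def score_integrated(f1: Sequence[float], f2: Sequence[float],
                     diff: Sequence[float],
                     weights: Sequence[float]) -> float:
    """First scheme: concatenate all three features, then score once."""
    integrated = list(f1) + list(f2) + list(diff)
    return linear_score(integrated, weights)


def score_averaged(f1: Sequence[float], f2: Sequence[float],
                   diff: Sequence[float],
                   w_combined: Sequence[float],
                   w_diff: Sequence[float]) -> float:
    """Second scheme: score the combined compression features and the
    latent feature difference separately, then average the two scores."""
    first_score = linear_score(list(f1) + list(f2), w_combined)
    second_score = linear_score(diff, w_diff)
    return (first_score + second_score) / 2.0


if __name__ == "__main__":
    f1, f2, diff = [0.2, 0.4], [0.1, 0.3], [0.5]
    print(score_integrated(f1, f2, diff, [1.0, 1.0, 1.0, 1.0, 1.0]))
    print(score_averaged(f1, f2, diff, [1.0, 1.0, 1.0, 1.0], [1.0]))
```

Comparing the final score against a threshold would then give the forged-or-real decision; the threshold value, like the weights, would have to come from training and is not specified in this section.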

Meanwhile, a face forgery detection apparatus according to an embodiment of the present application may include: an acquisition unit that acquires a captured image of a face part of an object through a dual camera; a first extraction unit that extracts, from the captured image, a compression feature related to a face region in the captured image; a second extraction unit that extracts a latent feature difference (Difference of Latent Features) using the captured image; and a detection unit that detects whether the face region in the captured image is forged or altered by using the compression feature and the latent feature difference.

In addition, the captured image may include a first viewpoint image and a second viewpoint image, and the first extraction unit may provide the first viewpoint image and the second viewpoint image as inputs to a first neural network and extract, from the first neural network, the compression feature, which includes a first view compression feature corresponding to the first viewpoint image and a second view compression feature corresponding to the second viewpoint image.

In addition, the second extraction unit may extract the latent feature difference between the first viewpoint image and the second viewpoint image by using the first viewpoint image and the second viewpoint image.

In addition, the latent feature difference may be obtained by passing a difference feature through a second neural network, the difference feature being the difference between the features of the first viewpoint image and the features of the second viewpoint image that arises while those features are being compressed within the first neural network.

In addition, the detection unit may calculate a score for the captured image by using the first viewpoint feature, the second viewpoint feature, and the latent feature difference, and detect whether the face region in the captured image is forged or altered based on the calculated score.

In addition, the detection unit may combine the first view compression feature, the second view compression feature, and the latent feature difference to generate an integrated combined feature, and detect the forgery based on an integrated-combined-feature score calculated from the generated integrated combined feature.

In addition, the detection unit may calculate a first score, related to a combined feature, from the combined feature obtained by combining the first view compression feature and the second view compression feature; calculate a second score, related to the latent feature difference, from the latent feature difference; and detect the forgery based on a summed score obtained by summing the first score and the second score.

In addition, the summed score may be the average of the first score and the second score.

The above-described means for solving the problem are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the above-described means for solving the problem, providing a face forgery detection apparatus that uses a first viewpoint image and a second viewpoint image acquired through a dual camera makes it possible to perform high-performance face forgery detection that is robust even against forgery attacks using variations of paper-printed masks or high-resolution screens (images).

According to the above-described means for solving the problem, providing a face forgery detection apparatus that uses the latent feature difference, which is a difference between the first viewpoint image and the second viewpoint image, makes it possible to learn implicitly the differences caused by three-dimensional structure, and thereby to detect face forgery effectively.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a diagram showing a schematic configuration of a face forgery detection apparatus according to an embodiment of the present application.
FIG. 2 is a diagram schematically showing a first neural network and a second neural network included in a face forgery detection apparatus according to an embodiment of the present application.
FIG. 3 is a diagram for explaining a first scheme by which the detection unit of a face forgery detection apparatus according to an embodiment of the present application detects forgery.
FIG. 4 is a diagram for explaining a second scheme by which the detection unit of a face forgery detection apparatus according to an embodiment of the present application detects forgery.
FIG. 5 is a diagram for explaining how a face forgery detection apparatus according to an embodiment of the present application identifies a face region in a captured image.
FIG. 6 is a diagram showing performance evaluation results of a face forgery detection apparatus according to an embodiment of the present application.
FIG. 7 is an operation flowchart of a face forgery detection method according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application belongs can easily practice them. However, the present application may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element interposed between them.

Throughout this specification, when a member is said to be positioned "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member exists between the two members.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram showing a schematic configuration of a face forgery detection apparatus 100 according to an embodiment of the present application. FIG. 2 is a diagram schematically showing a first neural network 20 and a second neural network 30 included in the face forgery detection apparatus 100. FIG. 3 is a diagram for explaining a first scheme by which the detection unit 140 of the face forgery detection apparatus 100 detects forgery. FIG. 4 is a diagram for explaining a second scheme by which the detection unit 140 of the face forgery detection apparatus 100 detects forgery. FIG. 5 is a diagram for explaining how the face forgery detection apparatus 100 identifies a face region in a captured image 10.

Hereinafter, for convenience of description, the face forgery detection apparatus 100 according to an embodiment of the present application is referred to as the apparatus 100.

Referring to FIGS. 1 to 5, the apparatus 100 may include an acquisition unit 110, a first extraction unit 120, a second extraction unit 130, and a detection unit 140. The apparatus 100 may be a device that detects face forgery, that is, whether the face of an object (person) included in an input image is a real face or a fake face (a forged or altered face).

The acquisition unit 110 may acquire a captured image 10 of a face part of the object through a dual camera 1. The captured image 10 may include a first viewpoint image 11 and a second viewpoint image 12. Because the image acquired by the acquisition unit 110 is obtained using the dual camera 1, it may include two images 11 and 12.

The dual camera 1 may also be referred to as a stereo camera or the like. The dual camera 1 may include two sensors spaced apart by a predetermined (preset) interval along a horizontal direction parallel to the lower surface of the dual camera 1. Here, a sensor may also be referred to as an image sensor. The distance between the two sensors may differ for each model of dual camera 1. For example, the distance between the two sensors may be 12 cm, but it is not limited thereto, and various values may be applied.

The first viewpoint image 11 may be an image acquired (captured) through one of the two sensors. The second viewpoint image 12 may be an image acquired (captured) through the other sensor (that is, the sensor other than the one that acquires the first viewpoint image).

As one example, the first viewpoint image 11 may be a left image and the second viewpoint image 12 may be a right image, but the present application is not limited thereto. As another example, the first viewpoint image 11 may be a right image and the second viewpoint image 12 may be a left image.

In the following description, the first viewpoint image 11 is taken to be the left image and the second viewpoint image 12 the right image. Accordingly, one of the two sensors in the dual camera 1 may be a left sensor and the other may be a right sensor.

The first extraction unit 120 may use the captured image 10 acquired by the acquisition unit 110 to extract a compression feature 40 related to the face region in the captured image 10. Here, the compression feature 40 may be a compression feature of a preprocessed captured image 10', that is, the captured image 10 after preprocessing.

The compression feature 40 may include a first view compression feature 41, which is a compression feature related to the first viewpoint image 11, and a second view compression feature 42, which is a compression feature related to the second viewpoint image 12. Here, the first view compression feature 41 may be a compression feature of a preprocessed first viewpoint image 11', that is, the first viewpoint image 11 after preprocessing. Likewise, the second view compression feature 42 may be a compression feature of a preprocessed second viewpoint image 12', that is, the second viewpoint image 12 after preprocessing. The preprocessing is more easily understood with reference to FIG. 5.

Referring to FIG. 5, the acquisition unit 110 may acquire, through the dual camera 1, the captured image 10 including the first viewpoint image 11 and the second viewpoint image 12. The first extraction unit 120 may then receive the captured image 10 (11, 12) from the acquisition unit 110; the captured image that the first extraction unit 120 receives may be the preprocessed captured image 10' (11', 12'). To this end, although not shown in the drawings, the apparatus 100 may include a preprocessing unit (not shown).

The preprocessing unit (not shown) may perform preprocessing on the face part of the object in the captured image 10 (11, 12) acquired through the dual camera 1. Specifically, the preprocessing unit may perform preprocessing on the face part s1 of the object in the first viewpoint image 11, and on the face part s2 of the object in the second viewpoint image 12.

Within the captured image 10 (11, 12) acquired from the dual camera 1, the preprocessing unit (not shown) may identify the face regions s1 and s2 containing the face of the object and perform preprocessing that normalizes the size of the identified face regions s1 and s2.

Preprocessing the captured image 10 yields the preprocessed captured image 10', which contains only the identified face region. Preprocessing the first viewpoint image 11 yields the preprocessed first viewpoint image 11', which contains only the identified face region s1. Preprocessing the second viewpoint image 12 yields the preprocessed second viewpoint image 12', which contains only the identified face region s2.

Here, the normalization may, for example, resize the face region to 128 × 128 pixels. However, it is not limited thereto, and various values may be applied. In other words, the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12' produced by the normalizing preprocessing may each have a size of, for example, 128 × 128 pixels.
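The preprocessing described above (keep only the identified face region, then normalize its size) might be sketched as follows in plain Python. This is an illustration under stated assumptions: the face box is taken as given, since the source does not name a face detector; the image is a grayscale list of rows; and nearest-neighbor resampling is an arbitrary choice, as the source does not specify the interpolation.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical format


def crop(image: Image, box: Box) -> Image:
    """Keep only the identified face region."""
    top, left, height, width = box
    if (top < 0 or left < 0 or height <= 0 or width <= 0
            or top + height > len(image) or left + width > len(image[0])):
        raise ValueError("face box lies outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image: Image, size: int = 128) -> Image:
    """Normalize to size x size pixels with nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // size][x * src_w // size]
             for x in range(size)]
            for y in range(size)]


def preprocess(image: Image, face_box: Box, size: int = 128) -> Image:
    """Crop the face region and normalize it to size x size pixels."""
    return resize_nearest(crop(image, face_box), size)


if __name__ == "__main__":
    # Synthetic 200 x 300 image; each pixel encodes its own coordinates.
    view = [[y * 1000 + x for x in range(300)] for y in range(200)]
    face = preprocess(view, (10, 20, 64, 64))
    print(len(face), len(face[0]))
```

The same function would be applied once to each viewpoint image, with its own face box, to obtain the two normalized inputs.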

The preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12' may be passed to the first extraction unit 120.

The first extraction unit 120 may provide the first viewpoint image 11 and the second viewpoint image 12 as inputs to the first neural network 20 and extract, from the first neural network 20, the compression feature 40, which includes the first view compression feature 41 corresponding to the first viewpoint image 11 and the second view compression feature 42 corresponding to the second viewpoint image 12. The first view compression feature 41 and the second view compression feature 42 may be outputs (output values, output results) of the first neural network 20.

In particular, the first extraction unit 120 may receive from the acquisition unit 110, as the preprocessed captured image 10', the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12'. The first extraction unit 120 may then provide the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12' as inputs to the first neural network 20. In other words, the image input to the first neural network 20 may be the preprocessed captured image 10', that is, the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12'.

Accordingly, the first extraction unit 120 may extract, from the first neural network 20, the compression feature 40, which includes the first view compression feature 41 corresponding to the preprocessed first viewpoint image 11' and the second view compression feature 42 corresponding to the preprocessed second viewpoint image 12'.

The first neural network 20 may refer to a deep learning model, an artificial intelligence (AI) algorithm model, a machine learning model, a neural network model (artificial neural network model), a deep neural network model, a neuro-fuzzy model, or the like. The first neural network 20 may be, for example, a convolutional neural network (CNN). However, it is not limited thereto, and various neural networks that are known in the art or developed in the future may be applied as the first neural network 20.

The first neural network 20 may include a first sub neural network 21 and a second sub neural network 22. The first sub neural network 21 may include first layers 21a, 21b, 21c, …, 21x, and the second sub neural network 22 may include second layers 22a, 22b, 22c, …, 22x.

Each of the first layers 21a, 21b, 21c, …, 21x and each of the second layers 22a, 22b, 22c, …, 22x may be a convolutional layer (CONV).

When the preprocessed first viewpoint image 11' is input to the first neural network 20, the features of the preprocessed first viewpoint image 11' may be progressively compressed as they pass through the first layers 21a, 21b, 21c, …, 21x in the first neural network 20, and the first viewpoint compressed feature 41 may accordingly be output from the first neural network 20.

Specifically, among the first layers 21a, 21b, 21c, …, 21x, the first layer 21a may take the preprocessed first viewpoint image 11' as input and output a primary first compressed feature 11a, in which the features of the preprocessed first viewpoint image 11' have been compressed once by the first layer 21a. That is, the primary first compressed feature 11a may be output as the primary feature compression process is performed by the first layer 21a.

In addition, the second layer 21b, which follows the first layer 21a, may take the primary first compressed feature 11a as input and output a secondary first compressed feature 11b, in which the features of the preprocessed first viewpoint image 11' have been compressed twice, by the first layer 21a and the second layer 21b. That is, the secondary first compressed feature 11b may be output as the secondary feature compression process is performed by the second layer 21b.

In this way, the preprocessed first viewpoint image 11' or its features may be progressively compressed through the feature compression process as they pass through the first layers 21a, 21b, 21c, …, 21x in the first neural network 20, and the first viewpoint compressed feature 41 may thereby be output. The first viewpoint compressed feature 41 is the output generated when the preprocessed first viewpoint image 11' passes through the first neural network 20, in particular through the first layers 21a, 21b, 21c, …, 21x, and may refer to the compressed feature of the preprocessed first viewpoint image 11'.

When the preprocessed second viewpoint image 12' is input to the first neural network 20, the features of the preprocessed second viewpoint image 12' may be progressively compressed as they pass through the second layers 22a, 22b, 22c, …, 22x in the first neural network 20, and the second viewpoint compressed feature 42 may accordingly be output from the first neural network 20.

Specifically, among the second layers 22a, 22b, 22c, …, 22x, the first layer 22a may take the preprocessed second viewpoint image 12' as input and output a primary second compressed feature 12a, in which the features of the preprocessed second viewpoint image 12' have been compressed once by the first layer 22a. That is, the primary second compressed feature 12a may be output as the primary feature compression process is performed by the first layer 22a.

In addition, the second layer 22b, which follows the first layer 22a, may take the primary second compressed feature 12a as input and output a secondary second compressed feature 12b, in which the features of the preprocessed second viewpoint image 12' have been compressed twice, by the first layer 22a and the second layer 22b. That is, the secondary second compressed feature 12b may be output as the secondary feature compression process is performed by the second layer 22b.

In this way, the preprocessed second viewpoint image 12' or its features may be progressively compressed through the feature compression process as they pass through the second layers 22a, 22b, 22c, …, 22x in the first neural network 20, and the second viewpoint compressed feature 42 may thereby be output. The second viewpoint compressed feature 42 is the output generated when the preprocessed second viewpoint image 12' passes through the first neural network 20, in particular through the second layers 22a, 22b, 22c, …, 22x, and may refer to the compressed feature of the preprocessed second viewpoint image 12'.

In other words, the preprocessing unit (not shown) may perform preprocessing that identifies the face region in each of the first viewpoint image 11 and the second viewpoint image 12 acquired by the acquisition unit 110 from the dual camera 1 and normalizes the size of the identified face region. The preprocessing unit (not shown) may then deliver the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12' to the first extraction unit 120. To extract features from each of the two preprocessed images 11' and 12', the first extraction unit 120 may then provide the preprocessed first viewpoint image 11' as input to the first sub neural network 21 in the first neural network 20 and the preprocessed second viewpoint image 12' as input to the second sub neural network 22 in the first neural network 20. Through this, the first extraction unit 120 may extract, from the first neural network 20 (through the output of the first neural network), the first viewpoint compressed feature 41 corresponding to the preprocessed first viewpoint image 11' and the second viewpoint compressed feature 42 corresponding to the preprocessed second viewpoint image 12'.
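
The two-branch compression described above can be illustrated with a minimal sketch. It is not the disclosed network: the function `halve` is a hypothetical stand-in for a convolutional layer (it simply averages adjacent values), the images are toy lists of eight numbers, and the names `run_branch`, `feats1`, and `feats2` are assumptions introduced only for illustration. The point it shows is that each branch keeps the output of every layer (11a, 11b, … and 12a, 12b, …), with the last output serving as the compressed feature 41 or 42.

```python
from typing import Callable, List

Feature = List[float]
Layer = Callable[[Feature], Feature]


def halve(x: Feature) -> Feature:
    """Hypothetical stand-in for one CONV layer: average adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def run_branch(image: Feature, layers: List[Layer]) -> List[Feature]:
    """Pass the input through the layers in order and keep every layer's output."""
    outputs: List[Feature] = []
    current = image
    for layer in layers:
        current = layer(current)
        outputs.append(current)
    return outputs


# Toy stand-ins for the preprocessed viewpoint images 11' and 12'.
view1 = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
view2 = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]

branch1 = [halve, halve, halve]  # plays the role of layers 21a, 21b, 21c
branch2 = [halve, halve, halve]  # plays the role of layers 22a, 22b, 22c

feats1 = run_branch(view1, branch1)  # per-layer outputs 11a, 11b, ...
feats2 = run_branch(view2, branch2)  # per-layer outputs 12a, 12b, ...

compressed_41 = feats1[-1]  # final output of the first branch
compressed_42 = feats2[-1]  # final output of the second branch
```

Keeping the intermediate outputs, rather than only the final one, matters because the per-layer outputs of the two branches are what the second extraction unit 130 compares.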

The second extraction unit 130 may extract a latent feature difference (Difference of Latent Features) 50 using the photographed image 10.

The second extraction unit 130 may use the first viewpoint image 11 and the second viewpoint image 12 to extract the latent feature difference 50 between the first viewpoint image 11 and the second viewpoint image 12. Here, the first viewpoint image 11 and the second viewpoint image 12 considered when calculating the latent feature difference 50 may be the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12', respectively. In other words, the second extraction unit 130 may use the preprocessed first viewpoint image 11' and the preprocessed second viewpoint image 12' to extract the latent feature difference 50 between them.

The latent feature difference 50 may be obtained by passing a difference feature through the second neural network 30, where the difference feature is the difference between the features of the first viewpoint image 11 and the features of the second viewpoint image 12 that arises while those features are being compressed in the first neural network 20 (the feature compression process). In particular, the latent feature difference 50 may be obtained by passing through the second neural network 30 the difference feature between the features of the preprocessed first viewpoint image 11' and the features of the preprocessed second viewpoint image 12' that arises while those features are being compressed in the first neural network 20.

To extract the latent feature difference 50, the second extraction unit 130 may calculate, layer by layer, the difference between the compressed features output by each layer of the first sub neural network 21 in the first neural network 20 and the compressed features output by each layer of the second sub neural network 22 in the first neural network 20, and treat each such difference as a difference feature. The second extraction unit 130 may then provide the difference features calculated for each layer to the second neural network 30 and obtain the latent feature difference 50 from the output of the second neural network 30. A detailed description follows.

As described above, as the preprocessed first viewpoint image 11' passes through the first layers 21a, 21b, 21c, …, 21x of the first sub neural network 21 in the first neural network 20, the features of the preprocessed first viewpoint image 11' may be progressively compressed. Likewise, as the preprocessed second viewpoint image 12' passes through the second layers 22a, 22b, 22c, …, 22x of the second sub neural network 22 in the first neural network 20, the features of the preprocessed second viewpoint image 12' may be progressively compressed.

In this case, to extract the latent feature difference 50, the second extraction unit 130 may calculate, as a first difference feature 13a, the difference between the primary first compressed feature 11a, which is the output of the first layer 21a in the first sub neural network 21, and the primary second compressed feature 12a, which is the output of the first layer 22a in the second sub neural network 22. The second extraction unit 130 may also calculate, as a second difference feature 13b, the difference between the secondary first compressed feature 11b, which is the output of the second layer 21b in the first sub neural network 21, and the secondary second compressed feature 12b, which is the output of the second layer 22b in the second sub neural network 22. In this way, the second extraction unit 130 may calculate, as the difference features 13a, 13b, …, the differences between the compressed features 11a, 11b, … output by the respective layers of the first sub neural network 21 and the compressed features 12a, 12b, … output by the respective layers of the second sub neural network 22.

The second neural network 30 may include layers 31a, 31b, 31c, …, 31x whose number corresponds to the number of layers in the first sub neural network 21 or the number of layers in the second sub neural network 22. The calculated first difference feature 13a may be provided to the first layer 31a in the second neural network 30, and the calculated second difference feature 13b may be provided to the second layer 31b in the second neural network 30.

Like the first neural network 20, the second neural network 30 may be, for example, a convolutional neural network (CNN), but is not limited thereto.

The second extraction unit 130 may provide the calculated difference features 13a, 13b, … to the second neural network 30 and may calculate the latent feature difference 50 by passing the difference features 13a, 13b, … through the second neural network 30.
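
The per-layer difference computation and its hand-off to the second neural network 30 can be sketched as follows. Two things here are assumptions, not statements of the disclosure: the difference is taken as an element-wise subtraction, and the way layer k of the second network merges difference feature k with what it receives from layer k-1 (here, adding it to a halved copy of the previous state) is a hypothetical fusion rule chosen only so the example runs. The per-layer feature values are toy numbers.

```python
from typing import List

Feature = List[float]


def difference(a: Feature, b: Feature) -> Feature:
    """Element-wise difference of two same-length features (assumed form of 13a, 13b, ...)."""
    if len(a) != len(b):
        raise ValueError("features must have the same length")
    return [x - y for x, y in zip(a, b)]


def halve(x: Feature) -> Feature:
    """Hypothetical stand-in for one CONV layer: average adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def second_network(diffs: List[Feature]) -> Feature:
    """Hypothetical fusion: layer k compresses the state carried from layer k-1
    and adds the difference feature supplied for level k."""
    state = list(diffs[0])  # the first layer receives 13a directly
    for d in diffs[1:]:
        carried = halve(state)
        state = [c + v for c, v in zip(carried, d)]
    return state


# Toy per-layer outputs of the two sub networks (11a, 11b, ... and 12a, 12b, ...).
feats_view1 = [[1.0, 5.0, 9.0, 13.0], [3.0, 11.0], [7.0]]
feats_view2 = [[2.0, 4.0, 10.0, 16.0], [3.0, 13.0], [8.0]]

# One difference feature per layer, matching the one-to-one layer correspondence.
diffs = [difference(a, b) for a, b in zip(feats_view1, feats_view2)]
latent_50 = second_network(diffs)
```

The one-to-one pairing of difference features with layers is why the second neural network 30 is described as having a number of layers corresponding to that of each sub neural network.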

The detection unit 140 may use the compressed feature 40 extracted by the first extraction unit 120 and the latent feature difference 50 extracted by the second extraction unit 130 to detect whether the face regions s1 and s2 of the object identified in the photographed image 10 have been forged or altered.

The detection unit 140 may calculate a score for the photographed image 10 using the first viewpoint compressed feature 41 and the second viewpoint compressed feature 42 extracted by the first extraction unit 120 and the latent feature difference 50 extracted by the second extraction unit 130. The detection unit 140 may then detect, based on the calculated score, whether the face region of the object in the photographed image 10 has been forged or altered.

In particular, the detection unit 140 may use the features 41, 42, and 50 to calculate a score indicating the likelihood that the face region in the preprocessed photographed image 10' has been forged or altered (or the likelihood that it is a real face). The detection unit 140 may detect, based on the calculated score, whether the face regions s1 and s2 in the photographed image 10 have been forged or altered.

There may be, for example, two methods of calculating the score used for forgery detection. The first method may be more readily understood with reference to FIG. 3, and the second method with reference to FIG. 4.

Referring to FIG. 3, the first score calculation method (first method) performed by the detection unit 140 is as follows. The detection unit 140 may generate an integrated combined feature 43 by combining the first viewpoint compressed feature 41, the second viewpoint compressed feature 42, and the latent feature difference 50. The detection unit 140 may then detect whether the photographed image 10 (in particular, the face region in the photographed image) has been forged or altered, based on an integrated combined feature score 43s calculated using the generated integrated combined feature 43.

The detection unit 140 may provide the integrated combined feature 43 as input to a third sub neural network 61 and calculate the integrated combined feature score 43s from the output of the third sub neural network 61.

Here, the third sub neural network 61 used to calculate the integrated combined feature score 43s may be, for example, a neural network consisting of three convolutional layers (conv) and three fully connected (FC) layers. However, it is not limited thereto, and the number of convolutional layers and the number of fully connected layers in the third sub neural network 61 may vary.

In addition, the latent feature difference 50 used to generate the integrated combined feature 43 may be a compressed latent feature difference 50', whose features have been compressed by passing through a fourth sub neural network 62.

The compressed feature 40 (41, 42) output from the first neural network 20 and the latent feature difference 50 output from the second neural network 30 may differ in dimension. Therefore, to match the dimensions of the features output by the two neural networks 20 and 30, the detection unit 140 may provide the latent feature difference 50 output from the second neural network 30 as input to the fourth sub neural network 62. The features of the latent feature difference 50 may thereby be compressed by the fourth sub neural network 62, and the compressed latent feature difference 50' may be extracted from the fourth sub neural network 62.

The fourth sub neural network 62 used to extract the compressed latent feature difference 50' may consist of, for example, three convolutional layers (conv), but is not limited thereto, and the type and number of layers constituting the network may vary.

The detection unit 140 may detect forgery or alteration using the integrated combined feature 43, which combines the compressed feature 40 (41, 42) and the compressed latent feature difference 50' after their dimensions have been made equal.

When the calculated integrated combined feature score 43s is equal to or greater than a preset score, the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 have not been forged or altered. That is, in this case the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 are the real face of an object that has not been forged or altered. The detection unit 140 may then output, as the forgery detection result 141, that the face is a real face.

On the other hand, when the calculated integrated combined feature score 43s is less than the preset score, the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 have been forged or altered. That is, in this case the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 are a fake face of a forged or altered object. The detection unit 140 may then output, as the forgery detection result 141, that the face is a fake face.
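
The first method can be sketched as a fuse, score, and threshold sequence. This is a minimal sketch under stated assumptions: "combining" is modeled as concatenation, the third sub neural network 61 is replaced by a hypothetical logistic function over a weighted sum, and the weights, bias, and preset score of 0.5 are illustrative values that do not come from the source. Only the decision rule (a score at or above the preset score means a real face, below it a fake face) follows the description.

```python
import math
from typing import List, Sequence


def fuse(f41: Sequence[float], f42: Sequence[float], f50c: Sequence[float]) -> List[float]:
    """Assumed form of the integrated combined feature 43: plain concatenation."""
    return list(f41) + list(f42) + list(f50c)


def score(feature: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical stand-in for sub network 61: logistic of a weighted sum."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def decide(s: float, preset: float) -> str:
    """Score at or above the preset score means real; below it means fake."""
    return "real" if s >= preset else "fake"


# Toy features already brought to the same dimension (41, 42, and 50').
fused_43 = fuse([7.0], [8.0], [-3.0])
score_43s = score(fused_43, weights=[0.25, 0.25, 0.5], bias=0.0)
result_141 = decide(score_43s, preset=0.5)
```

Concatenation requires the three inputs to share a compatible shape, which is the role the description assigns to the fourth sub neural network 62.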

Referring to FIG. 4, the second score calculation method (second method) performed by the detection unit 140 is as follows. The detection unit 140 may calculate a first score 44s, related to the combined feature, using a combined feature 44 that combines the first viewpoint compressed feature 41 and the second viewpoint compressed feature 42. The detection unit 140 may also calculate a second score 50s, related to the latent feature difference, using the latent feature difference 50. The detection unit 140 may then detect whether the photographed image 10 (in particular, the face region in the photographed image) has been forged or altered, based on a summed score 55s obtained by adding the calculated first score 44s and the calculated second score 50s.

Here, the summed score 55s may be the average of the first score 44s and the second score 50s. That is, the detection unit 140 may detect forgery or alteration based on the average score 55s of the first score 44s and the second score 50s.

The detection unit 140 may provide the combined feature 44 as input to a third sub neural network 61' and calculate the first score 44s from the output of the third sub neural network 61'.

Here, the third sub neural network 61' used to calculate the first score 44s may be, for example, a neural network consisting of two convolutional layers (conv) and three fully connected (FC) layers. However, it is not limited thereto, and the number of convolutional layers and the number of fully connected layers in the third sub neural network 61' may vary.

In addition, the detection unit 140 may provide the latent feature difference 50 as input to a fourth sub neural network 62' and calculate the second score 50s from the output of the fourth sub neural network 62'.

Here, the fourth sub neural network 62' used to calculate the second score 50s may be, for example, a neural network consisting of four convolutional layers (conv) and three fully connected (FC) layers. However, it is not limited thereto, and the number of convolutional layers and the number of fully connected layers in the fourth sub neural network 62' may vary.

Accordingly, when the latent feature difference 50 is provided as input to the fourth sub neural network 62', the fourth sub neural network 62' may both compress the features of the latent feature difference 50 and calculate the second score 50s from the compressed features. The detection unit 140 may thus calculate the second score 50s using the fourth sub neural network 62'.

The detection unit 140 may then detect forgery or alteration based on the score 55s obtained by adding the calculated first score 44s and second score 50s and averaging them.

When the summed score (or average score) 55s is equal to or greater than a preset score, the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 have not been forged or altered. That is, in this case the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 are the real face of an object that has not been forged or altered. The detection unit 140 may then output, as the forgery detection result 141, that the face is a real face.

On the other hand, when the summed score 55s is less than the preset score, the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 have been forged or altered. That is, in this case the detection unit 140 may determine that the face regions s1 and s2 in the photographed image 10 are a fake face of a forged or altered object. The detection unit 140 may then output, as the forgery detection result 141, that the face is a fake face.
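
The second method reduces to averaging two independently computed scores and comparing the result with the preset score. In the minimal sketch below, the two input scores are toy values standing in for the outputs of the sub neural networks 61' and 62', and the preset score of 0.5 and the function names are assumptions made for illustration.

```python
def average_score(first_score: float, second_score: float) -> float:
    """Score 55s: the first score 44s and second score 50s added and averaged."""
    return (first_score + second_score) / 2.0


def decide(s: float, preset: float) -> str:
    """Score at or above the preset score means real; below it means fake."""
    return "real" if s >= preset else "fake"


# Toy scores standing in for the outputs of sub networks 61' and 62'.
score_55s_a = average_score(0.75, 0.25)  # exactly at the preset score
score_55s_b = average_score(0.5, 0.25)   # below the preset score

result_a = decide(score_55s_a, preset=0.5)
result_b = decide(score_55s_b, preset=0.5)
```

Unlike the first method, the two feature streams here never need to share a dimension, since they are fused only at the score level.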

이때, 위변조 여부 결과(141)가 실제 얼굴이라는 것은, 듀얼 카메라(1)를 통해 획득된 촬영 영상(10)이 일예로 도 5의 (a)에 도시된 바와 같이 실제 대상체(사람)의 얼굴을 촬영함으로써 획득된 영상임에 따라, 획득된 촬영 영상(10) 내 식별된 얼굴 영역(s1, s2)이 실제 대상체(사람)의 얼굴에 해당하는 것임을 의미할 수 있다.In this case, the fact that the forgery or alteration result 141 is a real face means that the photographed image 10 acquired through the dual camera 1 represents the face of an actual object (person) as shown in FIG. 5A. As the image is acquired by photographing, it may mean that the identified face regions s1 and s2 in the acquired captured image 10 correspond to the face of an actual object (person).

이에 반해, 위변조 여부 결과(141)가 거짓 얼굴이라는 것은, 듀얼 카메라(1)를 통해 획득된 촬영 영상(10)이 일예로 도 5의 (b)에 도시된 바와 같이 종이에 인쇄된 대상체(사람)의 얼굴을 촬영함으로써 획득된 영상임에 따라, 획득된 촬영 영상(10) 내 식별된 얼굴 영역(s1, s2)이 실제 대상체(사람)의 얼굴이 아닌 위변조(위조 혹은 변조)된 거짓 얼굴(가짜 얼굴)에 해당하는 것임을 의미할 수 있다.On the other hand, the fact that the forgery or alteration result 141 is a false face means that the captured image 10 acquired through the dual camera 1 is an object printed on paper as shown in FIG. ), the identified face regions (s1, s2) in the acquired photographed image 10 are forged (forged or altered) not the face of the actual object (person). It may mean that it corresponds to a fake face).

As another example, a forgery detection result 141 of "fake face" may mean that the captured image 10 acquired through the dual camera 1 was obtained by photographing a face of the object (person) displayed on the screen of a mobile terminal, as shown for example in FIG. 5(c), so that the face regions (s1, s2) identified in the captured image 10 correspond not to the face of an actual object (person) but to a forged or altered fake face.

Here, the mobile terminal may be, for example, a smartphone or a tablet PC, but is not limited thereto; any of various terminals capable of displaying (presenting) a face image or a face shape of the object (for example, a face shape rendered with a laser beam) may be used.

Accordingly, when the captured image 10 is an image of the object's face printed on paper, an image of the object's face displayed on a high-resolution screen, or an image of a face model of the object specially made of silicone or the like, the detection unit 140 may detect that the face of the object contained in the captured image 10 is a fake face rather than a real face. That is, the detection unit 140 may output "fake face" as the forgery detection result 141.

As shown in FIG. 5(a), when the captured image 10 acquired through the dual camera 1 is an image of an actual face, there may be a difference between the gaze (gaze direction) of the object in the preprocessed first viewpoint image 11' and the gaze (gaze direction) of the object in the preprocessed second viewpoint image 12'.

In contrast, as shown in FIG. 5(b) or 5(c), when the captured image 10 acquired through the dual camera 1 is an image of a fake face, there may be no difference between the gaze (gaze direction) of the object in the preprocessed first viewpoint image and the gaze (gaze direction) of the object in the preprocessed second viewpoint image.

In other words, when the captured image 10 acquired through the dual camera 1 is an image of an actual face, a latent difference exists between the two views and shows up as a latent feature difference, whereas when the captured image 10 is an image of a fake face, almost no latent difference exists and the latent feature difference may appear to be absent.

In view of this, the apparatus 100 may use the first viewpoint image 11 (left image) and the second viewpoint image 12 (right image) acquired with the dual camera 1, that is, the left/right images 11 and 12, to detect whether the acquired image 10 shows the face of an actual object (person) or a forged or altered fake face of the object (person).

The first neural network 20 considered herein may be a neural network trained in advance to take as input a plurality of captured images (training images) obtained by photographing the face portions of a plurality of objects with the dual camera 1, and to output the forgery detection result matched to each of those images (that is, a value indicating whether the face is real or fake).

Likewise, the second neural network 30 considered herein may be a neural network trained in advance to take as input the latent feature difference of each of a plurality of captured images (training images) obtained by photographing the face portions of a plurality of objects with the dual camera 1, and to output the forgery detection result matched to each latent feature difference (that is, a value indicating whether the face is real or fake).

Noting that, in images of an actual face, the gaze (gaze direction) of the object differs between the images 11 and 12, the present application detects forgery or alteration by applying pre-trained neural networks to the images 11 and 12 acquired from the dual camera 1.

The present application may build a large-scale database for forgery detection based on a larger number of objects (subjects, persons), and may detect forgery or alteration on that basis. The present application proposes a face forgery detection technique in which the apparatus 100 learns the latent difference (that is, the latent feature difference) between the left/right images 11 and 12 based on the structure of the deep neural network 30.

By training a deep neural network, the apparatus 100 learns the degree of difference between the left/right images 11 and 12 in a latent space, which can effectively improve detection performance. Put differently, the apparatus 100 uses the second neural network, trained on a plurality of latent feature differences 50 extracted from captured images of a plurality of subjects, to effectively detect whether the captured image 10 acquired in real time from the dual camera 1 has been forged or altered.

Here, the latent space may refer to the space in which the features of the preprocessed first viewpoint image 11' and the features of the preprocessed second viewpoint image 12' are compressed, that is, the space in which the feature compression process takes place. The latent feature difference 50 may refer to the difference, in this latent space, between the compressed features of the preprocessed first viewpoint image 11' and the compressed features of the preprocessed second viewpoint image 12'.

For the left/right face images 11 and 12 acquired from the dual camera 1, the apparatus 100 may perform preprocessing in which a preprocessor (not shown) identifies (detects) the face region and normalizes the image size of the identified face region. Then, to extract features from the preprocessed captured image 10, the apparatus 100 may provide the images 11 and 12 as inputs to separate deep neural networks, namely the first sub neural network 21 and the second sub neural network 22 within the first neural network 20, respectively.
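The preprocessing described above can be sketched as follows. This is only an illustrative sketch in plain Python: the face detector is not specified in the text, so the bounding boxes are assumed to be supplied by some external detector, and the function names `crop`, `resize_nearest`, and `preprocess_pair`, the `(top, left, height, width)` box layout, and the use of nearest-neighbour sampling are all assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical layout: (top, left, height, width)


def crop(image: Image, box: Box) -> Image:
    """Cut the detected face region out of one viewpoint image."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image: Image, size: int = 128) -> Image:
    """Normalize a crop to size x size pixels with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess_pair(left: Image, right: Image, left_box: Box, right_box: Box,
                    size: int = 128) -> Tuple[Image, Image]:
    """Preprocess the left/right images independently, one per sub-network."""
    return (resize_nearest(crop(left, left_box), size),
            resize_nearest(crop(right, right_box), size))
```

Each returned image would then be passed to its own sub-network; a real implementation would typically use an image library and an interpolating resize instead of nearest-neighbour sampling.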

In addition, to predict the latent feature difference between the left and right images 11 and 12 effectively, the apparatus 100 configures the second neural network 30 as a separate difference-information compression network, rather than relying on the conventional loss-function-based approach.

To combine the compressed feature 40, which is the compressed information extracted through the first neural network 20, with the latent feature difference 50, which is the latent difference information extracted through the second neural network 30, the apparatus 100 may use deep neural network structures of two kinds (the first scheme and the second scheme described above).

Here, the deep neural network structure of the first scheme learns from a simple concatenation of the pieces of compressed information (that is, the compressed feature and the latent feature difference), and may correspond to the structure shown in FIG. 3. That is, FIG. 3 may represent the first-scheme deep neural network structure that the detection unit 140 of the apparatus 100 uses when calculating a score. This first scheme may also be referred to as early fusion.

In the first scheme, to calculate a score for the combined information (that is, the integrated combined feature), the apparatus 100 may additionally provide a third sub neural network 61 that includes a convolutional layer and a fully connected layer. Using this third sub neural network 61, the apparatus 100 may calculate, for the acquired captured image 10, a score indicating whether the face is a real face or a forged or altered fake face. The forgery detection result 141 may be derived from the calculated score.
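The first scheme (early fusion) can be illustrated with a minimal sketch. The concatenation follows the description above; the scoring step does not. In the text the score comes from the third sub neural network 61, which contains convolutional and fully connected layers, whereas the `score` function here is a hypothetical single linear layer with a sigmoid, used only so the example runs. The names `early_fusion` and `score`, the weights, and the convention that a higher score means a real face are assumptions.

```python
import math
from typing import List, Sequence


def early_fusion(first_view: Sequence[float], second_view: Sequence[float],
                 latent_diff: Sequence[float]) -> List[float]:
    """Concatenate both compressed features and the latent feature difference."""
    return list(first_view) + list(second_view) + list(latent_diff)


def score(features: Sequence[float], weights: Sequence[float],
          bias: float = 0.0) -> float:
    """Hypothetical stand-in for sub-network 61: linear layer plus sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# The three feature vectors become one integrated combined feature.
fused = early_fusion([1.0, 2.0], [3.0], [4.0, 5.0])
realness = score(fused, [0.1, 0.0, -0.1, 0.2, 0.0])
```

Because fusion happens before scoring, a single network sees all three feature sets at once and produces one score.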

The deep neural network structure of the second scheme, by contrast, separately configures two networks: a network that learns whether forgery or alteration has occurred using only the information compressed from each of the left/right images 11 and 12 (that is, only the first-viewpoint compressed feature and the second-viewpoint compressed feature), which may be a network comprising the first neural network and a third sub neural network; and a network that learns the latent difference feature (the latent feature difference), which may be a network comprising the second neural network and a fourth sub neural network. The structure then uses the average of the final scores of the two networks (that is, the first score and the second score), and may correspond to the structure shown in FIG. 4. That is, FIG. 4 may represent the second-scheme deep neural network structure that the detection unit 140 of the apparatus 100 uses when calculating a score. This second scheme may also be referred to as late fusion.

In the second scheme, the apparatus 100 may calculate a first score (44s) using the first neural network 20 and the third sub neural network 61', and a second score (50s) using the second neural network 30 and the fourth sub neural network 62'. The apparatus 100 may then average the two scores (44s, 50s) and, based on the averaged score, calculate for the acquired captured image 10 a score indicating whether the face is a real face or a forged or altered fake face. The forgery detection result 141 may be derived from the calculated score.
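The second scheme (late fusion) and the threshold decision can be sketched as below. The averaging and the rule that a score below the preset score indicates a fake face follow the description; the function names, the treatment of a score equal to the threshold as real, and the threshold value 0.5 are assumptions, since the text does not state the preset score.

```python
def late_fusion_score(first_score: float, second_score: float) -> float:
    """Average the score of the compressed-feature branch and the
    score of the latent-feature-difference branch."""
    return (first_score + second_score) / 2.0


def decide(fused_score: float, threshold: float = 0.5) -> str:
    """Below the preset score means fake; the 0.5 default is an assumption."""
    return "real" if fused_score >= threshold else "fake"


result = decide(late_fusion_score(0.75, 0.25))
```

Unlike early fusion, each branch is scored on its own here, so either branch can be trained or replaced without touching the other.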

In one experimental example of the present application, forgery detection was performed on a self-built database of 50 persons in order to verify the performance of the apparatus 100. The database may contain captured images of 29 men and 21 women together with the corresponding forgery detection results.

In one experiment of the present application, the model oCamS-1CGN-U was used as the dual camera 1 (or stereo camera), and the resolution of the captured image 10 obtained with it may be 1280 × 960 pixels. A conventionally known image analysis algorithm (face identification algorithm) was used to identify (detect) the face region in the captured image 10 acquired through the dual camera 1, and preprocessing was performed to normalize the identified face region to 128 × 128 pixels; the preprocessed captured image was then used as the input to the first neural network 20. FIG. 5 shows the process of detecting a face region from the captured image 10 acquired from the dual camera 1.

FIG. 6 is a diagram showing performance evaluation results of the face forgery detection apparatus 100 according to an embodiment of the present application.

In FIG. 6, "Proposed method" refers to the method proposed herein, that is, the forgery detection technique performed by the apparatus 100. The conventional forgery detection technique [1] against which the apparatus 100 is compared in FIG. 6 is, by way of example, the known document [X. Sun, L. Huang, and C. Liu, "Dual Camera Based Feature for Face Spoofing Detection," in Proc. Chinese Conference on Pattern Recognition, Chengdu, China, Nov. 2016, pp. 332-344].

Also, in FIG. 6, "22-fold" may denote an experiment based on 22 subjects, and "50-fold" an experiment based on 50 subjects. "Early fusion" may denote an experiment based on the first scheme of the apparatus 100, and "late fusion" an experiment based on the second scheme. EER (Equal Error Rate) denotes the error rate at which the two error types are equal; a smaller value may indicate better performance.
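For readers unfamiliar with the metric, the EER can be approximated from a set of scores as sketched below. This is a generic illustration, not the evaluation code used for FIG. 6: it assumes that a higher score means "real", that a sample is accepted as real when its score is at or above the threshold, and it approximates the EER at the candidate threshold where the false acceptance rate and the false rejection rate are closest. The function name `equal_error_rate` is hypothetical.

```python
from typing import Sequence


def equal_error_rate(real_scores: Sequence[float],
                     fake_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping the observed scores as thresholds."""
    if not real_scores or not fake_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(real_scores) | set(fake_scores)):
        # False acceptance: a fake face scored at or above the threshold.
        far = sum(s >= t for s in fake_scores) / len(fake_scores)
        # False rejection: a real face scored below the threshold.
        frr = sum(s < t for s in real_scores) / len(real_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With perfectly separated scores the function returns 0.0; overlapping scores push the value up, which is why a lower EER indicates a better detector.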

In one experiment of the present application, an Intel Xeon CPU E5-1650 and an NVIDIA TITAN Xp GPU were used, by way of example, to evaluate the performance of the apparatus 100.

Referring to FIG. 6, the experimental results show that the technique proposed herein (that is, forgery detection by the apparatus 100) achieves better performance than the conventional forgery detection technique, and does so over a larger number of test subjects (objects).

In addition, the experimental results show that, for the apparatus 100, the second scheme (late fusion), which combines scores, achieves a forgery detection performance (EER) of 0.652%, outperforming the first scheme (early fusion), which simply concatenates features.

The performance evaluation results of the experiment described above confirm that the technique proposed for the apparatus 100 can be applied effectively to face forgery detection. That is, the proposed approach (learning the latent feature difference separately and using it for forgery detection) is more effective for detecting face forgery than the conventional technique.

The apparatus 100 can detect face forgery or alteration with high performance based on the left and right images 11 and 12. The apparatus 100 can also implicitly learn the differences that arise from three-dimensional structure.

The apparatus 100 may detect whether a face has been forged or altered by using the left and right images 11 and 12 acquired from the dual camera 1 and their difference (that is, the latent feature difference).

The technique proposed herein can be applied effectively to security-enhancement applications that use face images in various embedded system environments (for example, smartphones and kiosks).

Hereinafter, the operation flow of the present application is briefly described based on the details given above.

FIG. 7 is an operation flowchart of a face forgery detection method according to an embodiment of the present application.

The face forgery detection method shown in FIG. 7 may be performed by the apparatus 100 described above. Accordingly, even where omitted below, the description given for the apparatus 100 applies equally to the face forgery detection method.

Referring to FIG. 7, in step S11 the acquisition unit 110 may acquire a captured image of a face portion of an object through a dual camera.

Here, the captured image may include a first viewpoint image and a second viewpoint image.

Next, in step S12 the first extraction unit 120 may extract a compressed feature related to the face region in the captured image by using the captured image acquired in step S11.

In step S12, the first extraction unit 120 may receive the preprocessed captured image from the acquisition unit 110 and extract the compressed feature from it. Accordingly, in the description that follows, the first viewpoint image and the second viewpoint image considered when extracting the compressed feature and the latent feature difference may be the preprocessed first viewpoint image and the preprocessed second viewpoint image, respectively.

In step S12, the first extraction unit 120 may provide the first viewpoint image (in particular, the preprocessed first viewpoint image) and the second viewpoint image (in particular, the preprocessed second viewpoint image) as inputs to the first neural network, and extract from the first neural network a compressed feature that includes a first-viewpoint compressed feature corresponding to the first viewpoint image and a second-viewpoint compressed feature corresponding to the second viewpoint image.

Next, in step S13 the second extraction unit 130 may extract a latent feature difference (Difference of Latent Features) by using the captured image acquired in step S11.

In step S13, the second extraction unit 130 may use the first viewpoint image and the second viewpoint image to extract the latent feature difference between them. In particular, the second extraction unit 130 may extract the latent feature difference between the preprocessed first viewpoint image and the preprocessed second viewpoint image.

Here, the latent feature difference may be obtained by passing a difference feature through the second neural network, the difference feature being the difference between the features of the first viewpoint image and the features of the second viewpoint image that arises while those features are compressed within the first neural network (the feature compression process).

Next, in step S14 the detection unit 140 may detect whether the face region in the captured image has been forged or altered by using the compressed feature extracted in step S12 and the latent feature difference extracted in step S13.

In step S14, the detection unit 140 may calculate a score for the captured image by using the first-viewpoint compressed feature, the second-viewpoint compressed feature, and the latent feature difference, and detect, based on the calculated score, whether the face region identified in the captured image has been forged or altered.

Also in step S14, the detection unit 140 may combine the first-viewpoint compressed feature, the second-viewpoint compressed feature, and the latent feature difference to generate an integrated combined feature, and detect forgery or alteration based on a score calculated from the generated integrated combined feature.

Alternatively, in step S14 the detection unit 140 may calculate a first score from a combined feature obtained by combining the first-viewpoint compressed feature and the second-viewpoint compressed feature, calculate a second score from the latent feature difference, and detect forgery or alteration based on a summed score obtained by summing the first score and the second score.

Here, the summed score may be the average of the first score and the second score.

In the description above, steps S11 to S14 may be further divided into additional steps or combined into fewer steps, depending on the embodiment of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
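The flow of steps S11 to S14 can be summarized in a short sketch. The networks are passed in as plain callables because their architectures are not reproduced here; every name in the block (`detect_forgery`, `encode_left`, `encode_right`, `compress_difference`, `score_fn`) is hypothetical, and the 0.5 threshold is an assumption. As a simplification, the difference feature is taken between the final compressed features of the two views, whereas the description refers to the difference arising during compression within the first neural network.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def detect_forgery(
    left_image: Sequence[float],    # S11: preprocessed first viewpoint image
    right_image: Sequence[float],   # S11: preprocessed second viewpoint image
    encode_left: Callable[[Sequence[float]], Vector],    # stand-in for sub-network 21
    encode_right: Callable[[Sequence[float]], Vector],   # stand-in for sub-network 22
    compress_difference: Callable[[Vector], Vector],     # stand-in for network 30
    score_fn: Callable[[Vector, Vector, Vector], float],
    threshold: float = 0.5,         # assumed preset score
) -> str:
    # S12: extract the compressed feature of each view.
    left_feat = encode_left(left_image)
    right_feat = encode_right(right_image)
    # S13: form the difference feature and pass it through the second network.
    raw_diff = [a - b for a, b in zip(left_feat, right_feat)]
    latent_diff = compress_difference(raw_diff)
    # S14: score the three features and compare against the preset score.
    fused_score = score_fn(left_feat, right_feat, latent_diff)
    return "real" if fused_score >= threshold else "fake"
```

Either fusion scheme fits behind `score_fn`: an early-fusion scorer concatenates its three arguments, while a late-fusion scorer averages two branch scores.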

The face forgery detection method according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

In addition, the face forgery detection method described above may also be implemented in the form of a computer program or application that is stored on a recording medium and executed by a computer.

The foregoing description of the present application is for illustrative purposes, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: face forgery detection apparatus
110: acquisition unit
120: first extraction unit
130: second extraction unit
140: detection unit

Claims

A face forgery detection method, comprising:
(a) acquiring a captured image of a face portion of an object through a dual camera;
(b) extracting a compressed feature related to a face region in the captured image by using the captured image;
(c) extracting a latent feature difference (difference of latent features) by using the captured image; and
(d) detecting whether the face region in the captured image has been forged or altered by using the compressed feature and the latent feature difference.

The method of claim 1,
wherein, in step (a), the captured image includes a first viewpoint image and a second viewpoint image, and
wherein step (b) comprises providing the first viewpoint image and the second viewpoint image as inputs to a first neural network, and extracting from the first neural network the compressed feature, which includes a first-viewpoint compressed feature corresponding to the first viewpoint image and a second-viewpoint compressed feature corresponding to the second viewpoint image.

The method of claim 2,
wherein step (c) comprises extracting a latent feature difference between the first viewpoint image and the second viewpoint image by using the first viewpoint image and the second viewpoint image.

The method of claim 3,
wherein the latent feature difference in step (c) is obtained by passing a difference feature through a second neural network,
the difference feature being the difference between the feature of the first viewpoint image and the feature of the second viewpoint image, generated when those features are compressed in the first neural network.
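
As a non-limiting illustration of the latent feature difference recited above, the following Python sketch subtracts two intermediate feature vectors element by element and passes the result through a small second network. The element-wise subtraction, the single fully connected layer with ReLU, the toy values, and the names `difference_feature` and `second_network` are assumptions made for illustration only; the claims do not fix the architecture of either neural network.

```python
from typing import List, Sequence


def difference_feature(feat_view1: Sequence[float],
                       feat_view2: Sequence[float]) -> List[float]:
    """Element-wise difference of two intermediate features (assumed form)."""
    if len(feat_view1) != len(feat_view2):
        raise ValueError("both viewpoint features must have the same length")
    return [a - b for a, b in zip(feat_view1, feat_view2)]


def second_network(diff: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the second neural network: one dense layer + ReLU."""
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per weight row")
    out = []
    for row, b in zip(weights, biases):
        if len(row) != len(diff):
            raise ValueError("each weight row must match the input length")
        z = sum(w * x for w, x in zip(row, diff)) + b
        out.append(max(0.0, z))  # ReLU keeps only non-negative activations
    return out


# Toy values only; real features would come from the first neural network.
diff = difference_feature([1.0, 2.0], [0.5, 2.0])
latent_diff = second_network(diff, [[2.0, 0.0], [-2.0, 0.0]], [0.0, 0.0])
```

In this sketch `latent_diff` plays the role of the latent feature difference that is later supplied to the detection step.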

The method of claim 2,
wherein step (d) comprises calculating a score for the photographed image by using the first viewpoint compression feature, the second viewpoint compression feature, and the latent feature difference, and detecting whether the face region in the photographed image has been forged or altered based on the calculated score.

The method of claim 5,
wherein step (d) comprises combining the first viewpoint compression feature, the second viewpoint compression feature, and the latent feature difference to generate an integrated combination feature, and detecting whether the face region has been forged or altered based on a score related to the integrated combination feature, the score being calculated using the generated integrated combination feature.
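
As a non-limiting illustration of the integrated combination feature recited above, the following Python sketch joins the three features into one vector and maps it to a single score. Reading "combined" as concatenation, using a linear layer followed by a sigmoid as the scorer, and the names `integrate_features` and `forgery_score` are all assumptions for illustration; the claims do not specify how the features are combined or how the score is computed.

```python
import math
from typing import List, Sequence


def integrate_features(view1_feat: Sequence[float],
                       view2_feat: Sequence[float],
                       latent_diff: Sequence[float]) -> List[float]:
    """Combine the three features by concatenation (an assumed reading)."""
    return [*view1_feat, *view2_feat, *latent_diff]


def forgery_score(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Hypothetical scorer: linear layer plus sigmoid, giving a value in (0, 1)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy values only; zero weights place the score at the sigmoid midpoint.
combined = integrate_features([0.2, 0.4], [0.1, 0.3], [1.0])
score = forgery_score(combined, [0.0] * len(combined))
```

With trained weights in place of the zero placeholders, the score would be compared against a decision threshold to decide whether the face region is forged or altered.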

The method of claim 5,
wherein step (d) comprises:
calculating a first score related to a combined feature by using the combined feature, which is obtained by combining the first viewpoint compression feature and the second viewpoint compression feature;
calculating a second score related to the latent feature difference by using the latent feature difference; and
detecting whether the face region has been forged or altered based on a summed score obtained by adding the first score and the second score.

The method of claim 7,
wherein, in step (d), the summed score is a score obtained by averaging the first score and the second score.
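
As a non-limiting illustration of the score-level fusion recited above, the following Python sketch averages a first score and a second score and compares the result with a threshold. The threshold value of 0.5, the convention that a higher score indicates forgery, and the names `summed_score` and `is_forged` are assumptions for illustration; the claims state only that the two scores are added or averaged.

```python
def summed_score(first_score: float, second_score: float) -> float:
    """Average of the two branch scores (the averaging variant)."""
    return (first_score + second_score) / 2.0


def is_forged(first_score: float, second_score: float,
              threshold: float = 0.5) -> bool:
    """Decide on the fused score.

    Assumption: a higher score means "more likely forged"; the default
    threshold is hypothetical and would be tuned on validation data.
    """
    return summed_score(first_score, second_score) >= threshold


# Toy scores chosen as exact binary fractions to avoid rounding noise.
fused = summed_score(0.75, 0.25)
```

Averaging keeps the fused score in the same range as the individual scores, so a single threshold can be applied regardless of which branch dominates.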

A face forgery detection device comprising:
an acquisition unit that acquires a photographed image of a face portion of an object through a dual camera;
a first extraction unit that extracts a compression feature related to a face region in the photographed image by using the photographed image;
a second extraction unit that extracts a latent feature difference by using the photographed image; and
a detection unit that detects whether the face region in the photographed image has been forged or altered by using the compression feature and the latent feature difference.

The device of claim 9,
wherein the photographed image includes a first viewpoint image and a second viewpoint image, and
wherein the first extraction unit provides the first viewpoint image and the second viewpoint image as inputs to a first neural network and extracts, from the first neural network, the compression feature, which includes a first viewpoint compression feature corresponding to the first viewpoint image and a second viewpoint compression feature corresponding to the second viewpoint image.

The device of claim 10,
wherein the second extraction unit extracts a latent feature difference between the first viewpoint image and the second viewpoint image by using the first viewpoint image and the second viewpoint image.

The device of claim 11,
wherein the latent feature difference is obtained by passing a difference feature through a second neural network,
the difference feature being the difference between the feature of the first viewpoint image and the feature of the second viewpoint image, generated when those features are compressed in the first neural network.

The device of claim 10,
wherein the detection unit calculates a score for the photographed image by using the first viewpoint feature, the second viewpoint feature, and the latent feature difference, and detects whether the face region in the photographed image has been forged or altered based on the calculated score.

A computer-readable recording medium storing a program for executing the method of claim 1 on a computer.