KR20180065889A

KR20180065889A - Method and apparatus for detecting target

Info

Publication number: KR20180065889A
Application number: KR1020170150501A
Authority: KR
Inventors: 왕 비아오; 안 야오주; 유병인; 최창규; 치엔 드어헝; 한재준; 쉬 징타오
Original assignee: 삼성전자주식회사
Priority date: 2016-12-07
Filing date: 2017-11-13
Publication date: 2018-06-18
Also published as: CN108171250A; KR102449841B1

Abstract

A method and apparatus for detecting the liveness of a target based on image quality are disclosed. A method for detecting a target according to an exemplary embodiment includes: determining a quality type of an image; determining a convolutional neural network of a quality type corresponding to the quality type of the image; determining a detection value of the image based on the convolutional neural network; and determining whether the target in the image is a real target based on the detection value of the image. This makes it possible to detect the target more accurately.

Description

METHOD AND APPARATUS FOR DETECTING TARGET

The present invention relates to the field of computer vision and, more specifically, to a method and apparatus for detecting a target.

Target detection is an important topic in the field of image processing. When a target in an image is detected as a pre-registered living target, the target can be determined to be a real target. The target may be a human face or the like, and target detection can be applied to various functions such as unlocking a mobile phone and mobile payment.

Target detection methods can be vulnerable to various forms of fake attacks. For example, with a conventional target detection method, a paper printout, a photograph, a screen image, a screen video, or a 3D print containing the user's face, rather than the live face of the actual user, may be mistaken for the actual user's live face.

Target detection methods intended to overcome this include intrusive target detection methods and non-intrusive target detection methods.

Existing intrusive target detection methods depend on user interaction. For example, the user performs a designated action prompted by a program, such as blinking, shaking the head, or smiling, and the program identifies the user's action to decide whether the target is real. In practical applications, however, this type of method has drawbacks: the identification step is relatively cumbersome and time-consuming, and requiring the user to perform a designated action is inconvenient.

Existing non-intrusive target detection methods extract features from image or video information acquired through the camera of a terminal device, and determine the authenticity of the target based on the extracted features. One example is a target detection method based on hand-crafted features. In this method, a designer draws on experience in computer vision and image processing research to design a fixed, objective algorithm that extracts features from an image or video. However, because the cameras of different terminal devices differ in performance, the resulting target images also differ. For example, some images may be slightly overexposed or have a reddish tint. There are also differences between images taken in low light or against backlight and images taken under normal illumination. When a hand-crafted feature extraction method such as the local binary pattern (LBP) is used, only the local texture information of the image is considered, and in that case real and fake targets cannot be effectively distinguished under conditions such as low light or backlight. In other words, with existing non-intrusive target detection methods, real and fake targets are easily misdetected in complex and highly variable real-world scenes, which reduces robustness.

Embodiments provide a technique that removes the burden of requiring the user to perform a designated action and increases robustness by more accurately detecting the authenticity of a target from target images acquired with various hardware and/or in various environments.

Embodiments determine the quality type of each target image and then determine the convolutional neural network whose quality type corresponds to it, so that a convolutional neural network suited to the quality type of each target image is used. As a result, embodiments can more accurately detect whether the target in each target image is real.

A target detection method according to one aspect includes: determining a quality type of a target image; determining, from a database including a plurality of convolutional neural networks, a convolutional neural network of a quality type corresponding to the quality type of the target image; determining a detection value of the target image based on the convolutional neural network of the corresponding quality type; and determining whether a target in the target image is a real target based on the detection value of the target image.

The quality type of the target image may be determined based on at least one quality value of the target image determined for at least one quality parameter. The detection value may include a probability that the target image is classified as a positive sample containing a real target.

Determining the quality type of the target image may include: determining at least one quality value of the target image for at least one quality parameter; determining at least one quality classification of the target image based on the at least one quality value and at least one quality type classification standard preset for the at least one quality parameter; and determining the quality type of the target image based on the at least one quality classification. Alternatively, determining the quality type of the target image may include performing blind image quality assessment on the target image to determine the quality type of the target image.

The convolutional neural network may include a cascaded convolutional neural network that includes convolutional neural networks of at least two stages and at least one threshold determination layer.

The cascaded convolutional neural network of each quality type requires a different performance index, and the number of stages included in the cascaded convolutional neural network of each quality type may be determined by the corresponding performance index. The performance index required of the cascaded convolutional neural network may be satisfied by a combination of the performance indices of the convolutional neural networks of the plurality of stages included in the cascaded convolutional neural network.

Determining the detection value of the target image may include: determining a first-stage detection value of the target image based on a first-stage convolutional neural network included in the cascaded convolutional neural network; and performing a process of determining a next-stage detection value of the target image by comparing the first-stage detection value with a preset threshold, based on a threshold determination layer connected to the first-stage convolutional neural network.
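The staged evaluation described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `cascade_detect`, `stages`, and `thresholds` are hypothetical, each stage is modeled as a plain callable that returns a probability, and the early-exit rule (a value at or below the threshold stops the cascade) is an assumption, since this summary only states that the threshold determination layer compares the stage's detection value with a preset threshold before the next stage is evaluated.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for one stage: maps an image to P(real target).
Stage = Callable[[object], float]


def cascade_detect(image, stages: Sequence[Stage],
                   thresholds: Sequence[float]) -> float:
    """Run the stages in order, with one threshold check between stages."""
    if len(thresholds) != len(stages) - 1:
        raise ValueError("need exactly one threshold between consecutive stages")
    value = 0.0
    for i, stage in enumerate(stages):
        value = stage(image)
        is_last = i == len(stages) - 1
        # Assumption: a value at or below the threshold ends the cascade early,
        # so later (typically more expensive) stages are skipped.
        if not is_last and value <= thresholds[i]:
            break
    return value
```

With two stub stages, an image that clears the first threshold is scored by the second stage, while one that does not is scored by the first stage only.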

Determining whether the target in the target image is a real target based on the detection value of the target image may include: when the target image is a current frame in a frame image, determining a blur evaluation value of the current frame; determining a comprehensive detection value of the target image based on the blur evaluation value of the current frame, the detection value of the current frame, blur evaluation values of a plurality of previous frames in the frame image, and detection values of the plurality of previous frames; and determining whether the target in the current frame is a real target based on the comprehensive detection value of the target image.

Determining the comprehensive detection value of the target image may include determining a weighted average of the detection values, using the blur evaluation values of the current frame and the plurality of previous frames as the weights of the corresponding detection values, and taking the weighted average as the comprehensive detection value.
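As a hedged illustration of this weighted average, the sketch below fuses per-frame detection values using per-frame blur evaluation values as weights. The function name `fused_detection_value` is hypothetical. The sketch assumes that blur evaluation values are non-negative and larger for frames that should count more; the text here does not specify how they are computed or scaled. The fallback to a plain mean when all weights are zero is also an assumption.

```python
from typing import Sequence


def fused_detection_value(detection_values: Sequence[float],
                          blur_scores: Sequence[float]) -> float:
    """Weighted average of detection values; blur scores act as weights."""
    if not detection_values or len(detection_values) != len(blur_scores):
        raise ValueError("need one blur score per detection value")
    if any(w < 0 for w in blur_scores):
        raise ValueError("blur scores are assumed to be non-negative")
    total = sum(blur_scores)
    if total == 0:
        # Assumption: with no usable weights, fall back to a plain mean.
        return sum(detection_values) / len(detection_values)
    return sum(w * v for w, v in zip(blur_scores, detection_values)) / total
```

For example, detection values `[1.0, 0.0]` with weights `[3.0, 1.0]` give 0.75, so the frame with the larger weight dominates the result.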

A method of training a convolutional neural network according to one aspect includes: determining a quality type of a sample image; selecting, from among convolutional neural networks of a plurality of quality types, the convolutional neural network of the quality type corresponding to the determined quality type; and training the convolutional neural network of the selected quality type based on whether the sample image is a positive sample or a negative sample.

When the sample image contains a real target, the sample image is a positive sample; when the sample image does not contain a real target, the sample image may be a negative sample. The training method may further include building a database based on the convolutional neural network of each quality type.
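The routing of samples to per-quality-type networks can be sketched as a grouping step that runs before any training. The helper below is illustrative only: `group_samples_by_quality` and `quality_type_of` are hypothetical names, the label convention (1 for positive, 0 for negative) is an assumption, and the actual training of each network is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def group_samples_by_quality(
    samples: Iterable[Tuple[object, bool]],
    quality_type_of: Callable[[object], Hashable],
) -> Dict[Hashable, List[Tuple[object, int]]]:
    """Build one labeled training set per quality type.

    `samples` yields (image, contains_real_target) pairs.
    """
    groups = defaultdict(list)
    for image, contains_real_target in samples:
        label = 1 if contains_real_target else 0  # positive vs. negative sample
        groups[quality_type_of(image)].append((image, label))
    return dict(groups)
```

Each resulting list would then be used to train the convolutional neural network of the matching quality type, and the trained networks stored in the database keyed by quality type.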

A target detection apparatus according to one aspect includes: an image quality type determiner that determines a quality type of a target image; a convolutional neural network determiner that determines, from a database including a plurality of convolutional neural networks, a convolutional neural network of a quality type corresponding to the quality type of the target image; a detection value determiner that determines a detection value of the target image based on the convolutional neural network of the corresponding quality type; and a target authenticity determiner that determines whether a target in the target image is a real target based on the detection value of the target image.

FIG. 1 is a flowchart of a target detection method according to an embodiment.
FIG. 2A is a flowchart of a method of training a convolutional neural network according to an embodiment.
FIG. 2B is a diagram illustrating the structure of a cascaded convolutional neural network according to an embodiment.
FIG. 2C is a diagram illustrating the structure of a single-stage convolutional neural network according to an embodiment.
FIG. 3A is a flowchart of a scenario in which a target detection method according to an embodiment is applied in an actual application.
FIG. 3B is a diagram illustrating a target detection method for a frame image according to an embodiment.
FIG. 4 is a block diagram of a target detection apparatus according to an embodiment.
FIG. 5 is a diagram illustrating robustness against various fake attacks according to an embodiment.
FIG. 6 is a diagram illustrating robustness against fake attacks of a low quality type according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustration only, and the embodiments may be modified and implemented in various forms. Accordingly, the embodiments are not limited to the particular forms disclosed, and the scope of this specification includes modifications, equivalents, and alternatives that fall within its technical idea.

Terms such as "first" and "second" may be used to describe various components, but such terms should be interpreted only as distinguishing one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is described as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same elements.

FIG. 1 is a flowchart of a target detection method according to an embodiment. Referring to FIG. 1, the target detection method includes: determining a quality type of a target image (S101); determining a convolutional neural network of a quality type corresponding to the quality type of the target image (S102); determining a detection value of the target image based on the convolutional neural network of the corresponding quality type (S103); and determining whether the target in the target image is a real target based on the detection value of the target image (S104).
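The four steps S101 to S104 can be read as a short pipeline. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: `detect_target`, `quality_type_of`, and `cnn_database` are hypothetical names, each network is modeled as a callable returning the probability that the image is a positive sample, and the default threshold of 0.5 is an arbitrary placeholder.

```python
from typing import Callable, Hashable, Mapping, Tuple


def detect_target(
    image,
    quality_type_of: Callable[[object], Hashable],
    cnn_database: Mapping[Hashable, Callable[[object], float]],
    threshold: float = 0.5,  # placeholder value, not taken from the source
) -> Tuple[bool, float]:
    """Return (is_real_target, detection_value) for one target image."""
    quality_type = quality_type_of(image)      # S101: quality type of the image
    cnn = cnn_database[quality_type]           # S102: network for that type
    detection_value = cnn(image)               # S103: P(positive sample)
    return detection_value > threshold, detection_value  # S104: decision
```

Keeping the database as a mapping from quality type to network makes S102 a plain lookup; an image whose quality type has no trained network raises `KeyError` in this sketch, and how such a case should be handled is not addressed here.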

By using a different convolutional neural network depending on the quality type of the target image, embodiments provide a technique for accurately detecting a target regardless of the various factors that affect the quality type of the target image. Here, detecting a target means determining whether the target is genuine, for example determining whether the target in the target image is a living target. For example, if the target in the target image is a live face, it is recognized as a real target; if it is a non-live face such as a photograph or a video, it may be recognized as a fake target. Hereinafter, the target detection method can be understood as a method of detecting the liveness of a target.

A target is a body part of a living being and may include, for example, a human face, a human palmprint, a human fingerprint, a human iris, a human limb, an animal face, an animal palmprint, an animal fingerprint, an animal iris, or an animal limb. A real target is a living target, and a fake target may be a non-living target.

The quality type of the target image may be determined based on at least one quality value of the target image determined for at least one quality parameter. Quality parameters may include shooting parameters, attribute parameters, and the like. Shooting parameters may include resolution, ISO (sensitivity), and the like, and attribute parameters may include color accuracy, contrast, brightness, saturation, sharpness, and the like. A quality value can be understood as the value that the target image has for a given quality parameter.

Shooting parameters and attribute parameters may differ from one target image to another, in which case each target image may have a different quality type. For example, the quality type of the target image may vary with the performance of the camera of the user terminal, and also with the shooting environment, such as illumination. If the same convolutional neural network is used even though the quality types differ, the detection result may become inaccurate. By using a convolutional neural network appropriate to the quality type, the embodiments described below can provide accurate detection results across the various terminals and environments that a user may use or encounter.

Determining the quality type of the target image (S101) may include: determining at least one quality value of the target image for at least one quality parameter; determining at least one quality classification of the target image based on the at least one quality value and at least one quality type classification standard preset for the at least one quality parameter; and determining the quality type of the target image based on the at least one quality classification.

A quality type classification standard may be defined for each quality parameter. For each quality parameter, the standard maps the quality value of the target image to a quality classification. For example, the quality type classification standard for the resolution parameter may classify the target image as high resolution, medium resolution, or low resolution according to its resolution value.

The quality type of the target image may be determined as a combination of one or more quality classifications. For example, the quality type of the target image may be expressed as {resolution: high, brightness: low, sharpness: medium}. As described in detail below, the quality type may also be determined in other ways, for example as a representative value among the quality classifications.

Determining the convolutional neural network of the quality type corresponding to the quality type of the target image (S102) may include selecting, from a database including pre-trained convolutional neural networks of a plurality of quality types, the convolutional neural network of the quality type corresponding to the quality type of the target image. As described in detail below, the database may be obtained by determining the quality types of a plurality of sample images and training the convolutional neural network of each quality type on the sample images of that quality type.

According to one embodiment, the convolutional neural network used for target detection may be a single-stage convolutional neural network. Based on the output of the convolutional neural network, it can be determined whether the target in the target image is a living target. As described in detail below, depending on the application, it may be required that the probability of classifying a sample containing a real target as a positive sample be at least a certain value, or that the probability of misclassifying a sample containing a fake target as a positive sample be below a certain value. The performance of the convolutional neural network can be improved by adjusting its structure and/or parameters, so that the performance condition required by each application is satisfied.

According to another embodiment, the convolutional neural network may be a cascaded convolutional neural network. A cascaded convolutional neural network may include convolutional neural networks of a plurality of stages. As described in detail below, when convolutional neural networks of a plurality of stages are used, the convolutional neural network of each stage may have a lower level of performance than the performance condition required of the cascaded convolutional neural network as a whole.

Determining the detection value of the target image based on the convolutional neural network of the corresponding quality type (S103) may include: inputting the target image to the convolutional neural network selected from the database; and obtaining the detection value from the output of the convolutional neural network. As described in detail below, the convolutional neural network may output the probability that the target image is classified as a positive sample containing a real target, and this probability may be used as the detection value.

Determining whether the target in the target image is a real target based on the detection value of the target image (S104) may include determining the authenticity of the target by comparing the detection value with a preset threshold.

Embodiments do not require the user to perform a specific action in order to detect a target, and they also detect targets accurately in target images acquired under various hardware conditions and shooting environments, which increases the robustness of distinguishing real targets from fake ones.

FIG. 2A is a flowchart of a method of training a convolutional neural network according to an embodiment. Referring to FIG. 2A, the training method includes determining the quality types of a plurality of sample images (S201).

For each of the plurality of sample images, a quality value corresponding to at least one quality parameter is obtained, and the quality type of the sample image is determined based on the at least one quality value of the sample image.

The quality type classification standard used to determine a quality type may be predetermined based on experimental data, historical data, empirical data, and/or actual conditions. For example, the quality type classification standard for resolution may include a criterion under which the resolution quality classification of an image is high quality when the resolution of its short side (for example, the vertical resolution) is greater than 1080 pixels, medium quality when it is greater than 720 pixels but not greater than 1080 pixels, and low quality when it is not greater than 720 pixels.
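The resolution criterion in this example translates directly into a small function. In the sketch below, the function name `resolution_class` and the string labels are hypothetical; the 1080-pixel and 720-pixel boundaries and the use of the short side are taken from the text.

```python
def resolution_class(width: int, height: int) -> str:
    """Classify an image by the pixel count of its short side."""
    short_side = min(width, height)
    if short_side > 1080:
        return "high"
    if short_side > 720:      # greater than 720 but not greater than 1080
        return "medium"
    return "low"              # not greater than 720
```

Note that the comparisons are strict, so a 1920x1080 image falls into the medium class and a 1280x720 image into the low class.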

According to one embodiment, the quality type of an image may be determined from quality classifications organized according to a hierarchical structure of the quality parameters. For example, the quality classification of the shooting parameters may be set to that of the shooting parameter with the lowest quality classification, and the quality classification of the attribute parameters may be set to that of the attribute parameter with the lowest quality classification. In this case, the quality type of the image may be determined by the combination of the quality classification of the shooting parameters and the quality classification of the attribute parameters.
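A minimal sketch of this hierarchical rule follows. It assumes three ordered classifications ("low" < "medium" < "high") and represents the combined quality type as a pair; the names `ORDER`, `group_class`, and `hierarchical_quality_type` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Tuple

# Assumed ordering of quality classifications, lowest first.
ORDER = {"low": 0, "medium": 1, "high": 2}


def group_class(classes: Dict[str, str]) -> str:
    """Return the lowest classification among the parameters of one group."""
    return min(classes.values(), key=ORDER.__getitem__)


def hierarchical_quality_type(shooting: Dict[str, str],
                              attribute: Dict[str, str]) -> Tuple[str, str]:
    """Combine the per-group results into (shooting class, attribute class)."""
    return group_class(shooting), group_class(attribute)
```

For an image with shooting parameters {resolution: high, ISO: medium} and attribute parameters {contrast: high, brightness: high}, this yields the pair ("medium", "high").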

As another example, one parameter may be drawn at random from the parameters belonging to the shooting parameters of the image and another from the parameters belonging to the attribute parameters, and the draw may be repeated until the quality grades of the two drawn parameters reach the same quality grade (for example, high quality). The quality type of the image may then be determined based on that quality grade.

As yet another example, the quality type of the image may be determined based on the quality grade of each of the parameters belonging to the shooting parameters and each of the parameters belonging to the attribute parameters.

For example, the resolution quality type of a sample image is determined based on a preset quality type classification standard for resolution and the resolution of the short side of the sample image; the ISO quality type of the sample image is determined based on a preset quality type classification standard for ISO and the ISO of the sample image; and the contrast quality type of the sample image is determined based on a preset quality type classification standard for contrast and the contrast of the sample image.

In this case, the quality type of the sample image is determined based on the quality types of its resolution, ISO, and contrast. In one example, when the resolution, ISO, and contrast of the sample image are all high quality, the quality type of the sample image may be determined to be high quality. In another example, when the resolution, ISO, and contrast of the sample image are high quality, medium quality, and low quality, respectively, the quality type of the sample image may be determined to be medium quality. In yet another example, the quality type of the sample image may be determined as a three-dimensional quality type comprising high-quality resolution, medium-quality ISO, and low-quality contrast.

According to one embodiment, the quality type of a sample image may be determined by performing blind image quality assessment on the sample image. For example, the spatial-information-based BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) method, a method based on GM-LOG (Gradient Magnitude and Laplacian of Gaussian), or a method based on HOSA (High Order Statistics Aggregation) may be used for the blind image quality assessment.

Taking the spatial-information-based BRISQUE method as an example, the method includes: spatially normalizing the original sample image by subtracting the mean and dividing by the standard deviation; fitting a generalized Gaussian distribution to the distribution of the normalized sample image and using the parameters of the fitted distribution as features; and determining an image quality assessment result using an evaluation model obtained in advance by training with support vector regression. Here, the evaluation model is trained on a large number of images labeled with image quality scores: given the image features and the quality scores, support vector regression learns the mapping between the features and the quality scores.

The quality type of the sample image may then be determined by classifying the image quality assessment result obtained with the BRISQUE method according to a quality type classification standard defined for BRISQUE-based assessment results.

The training method of the convolutional neural network includes selecting, from among convolutional neural networks of a plurality of quality types, the convolutional neural network of the quality type corresponding to the determined quality type (S202).

The convolutional neural networks of the plurality of quality types may have the same structure or different structures. However, because the convolutional neural networks of different quality types are trained on different image samples, they have different parameters (for example, synaptic weights).

As described below, when a cascaded convolutional neural network is used, the number of stages may vary because the required performance differs by quality type. Even in this case, the single-stage convolutional neural networks may have the same structure or different structures.

According to one embodiment, before the convolutional neural networks are trained, the selectable quality types are determined in advance, and the structure of the convolutional neural network may be determined for each quality type.

The training method of the convolutional neural network includes training the convolutional neural network of the selected quality type based on whether the sample image is a positive sample or a negative sample (S203). A sample image that contains a real target is a positive sample, and a sample image that does not contain a real target is a negative sample.

For each quality type, the convolutional neural network of that quality type may be trained on a plurality of sample images of that quality type. For example, the sample images may be grouped by the quality types into which they have been classified, so that the sample images in the same group have the same quality type. In this case, the convolutional neural network of each quality type is trained on the sample images of the corresponding group.
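The grouping step described above can be sketched as follows. The sample representation (a dictionary with a `quality_type` key) and the function name are assumptions for illustration; the training of each per-type network is left out.

```python
from collections import defaultdict

def group_by_quality_type(samples):
    """Group sample records so that each group shares one quality type."""
    groups = defaultdict(list)
    for sample in samples:
        # "quality_type" is a hypothetical field holding the type determined earlier.
        groups[sample["quality_type"]].append(sample)
    return dict(groups)
```

Each resulting group would then be used to train only the network of its own quality type.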

For example, if the quality type of a group of sample images comprises high-quality resolution, ISO, and contrast, a high-quality cascaded convolutional neural network is trained on the sample images of that group. Alternatively, if the resolution of a group of sample images is high quality, the ISO is medium quality, and the contrast is low quality, the convolutional neural network of the corresponding quality type (high-quality resolution, medium-quality ISO, and low-quality contrast) is trained on the sample images of that group.

According to one embodiment, a database may be built from the convolutional neural networks of the respective quality types.

FIG. 2B is a diagram illustrating the structure of a cascaded convolutional neural network according to an embodiment. Referring to FIG. 2B, the cascaded convolutional neural network of each quality type includes convolutional neural networks of at least two stages and at least one threshold decision layer. The threshold decision layer of the current stage is connected between the convolutional neural network of the current stage and the convolutional neural network of the next stage. Specifically, the input node of the threshold decision layer of the current stage is connected to the output layer of the convolutional neural network of the current stage, and the output node of the threshold decision layer of the current stage is connected to the input layer of the convolutional neural network of the next stage.

In FIG. 2B, CNN (convolutional neural network) 1 (210) and CNN 2 (220) denote the convolutional neural networks of the first and second stages, respectively. Threshold decision 1 (215) denotes the threshold decision layer of the first stage and is connected between the convolutional neural networks of the first and second stages. Likewise, threshold decision 2 (225) denotes the threshold decision layer of the second stage and is connected between the convolutional neural network of the second stage and the convolutional neural network of the third stage (not shown).

The training method of the cascaded convolutional neural network of each quality type includes a method of determining the number of stages of the cascaded convolutional neural network, a method of training the convolutional neural network of each stage, and a method of determining the threshold of the threshold decision layer of each stage.

In one embodiment, the output of the convolutional neural network of each stage has two performance indices: TPR (true positive rate) and FPR (false positive rate). TPR is the proportion of positive samples that are correctly classified as positive samples. FPR is the proportion of negative samples that are incorrectly classified as positive samples.
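The two indices can be computed from labeled predictions as in the sketch below. The function name and the boolean encoding (True for a real target) are assumptions for illustration.

```python
def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) given true labels and predicted labels (True = real target)."""
    tp = fn = fp = tn = 0
    for is_real, predicted_real in zip(labels, predictions):
        if is_real and predicted_real:
            tp += 1          # positive sample correctly classified as positive
        elif is_real:
            fn += 1          # positive sample missed
        elif predicted_real:
            fp += 1          # negative sample incorrectly classified as positive
        else:
            tn += 1          # negative sample correctly rejected
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```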

The method of determining the number of stages of the cascaded convolutional neural network includes determining the number of stages of the cascaded convolutional neural network of each quality type based on the performance indices required of that network. The cascaded convolutional neural networks of the respective quality types require different performance indices, and the number of stages included in the cascaded convolutional neural network of each quality type may be determined by the corresponding performance indices.

For example, the cascaded convolutional neural network of a quality type may require TPR = 99.5% and FPR = 0.1%. Suppose the threshold of the first-stage convolutional neural network is adjusted so that TPR = 99.9% and FPR = 10%, and the threshold of the second-stage convolutional neural network is likewise adjusted so that TPR = 99.9% and FPR = 10%. Cascading these two convolutional neural networks gives FPR = 10% × 10% = 1%, which does not satisfy the FPR requirement (0.1%) for the cascaded convolutional neural network as a whole. The overall FPR requirement can be reached only by cascading a third-stage convolutional neural network with the same performance as the first and second stages. In this case, the number of stages of the cascaded convolutional neural network may be determined to be three. The cascade of the first, second, and third stages satisfies the overall performance requirement (TPR = 99.9% × 99.9% × 99.9% ≈ 99.7% > 99.5%, FPR = 10% × 10% × 10% = 0.1% ≤ 0.1%). In this way, the performance indices required of the cascaded convolutional neural network can be satisfied by combining the performance indices of the convolutional neural networks of the stages it includes.
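The stage-count arithmetic in this example can be reproduced with a short sketch. It assumes, as the example does, that every stage has the same TPR and FPR and that the per-stage rates simply multiply; the function name and the `max_stages` limit are illustrative. Exact fractions are used so that the boundary case FPR = 0.1% compares cleanly.

```python
from fractions import Fraction

def stages_needed(stage_tpr, stage_fpr, target_tpr, target_fpr, max_stages=10):
    """Smallest number of identical stages meeting both targets, or None."""
    stage_tpr, stage_fpr = Fraction(stage_tpr), Fraction(stage_fpr)
    target_tpr, target_fpr = Fraction(target_tpr), Fraction(target_fpr)
    for n in range(1, max_stages + 1):
        cascade_tpr = stage_tpr ** n   # every stage must accept a real target
        cascade_fpr = stage_fpr ** n   # every stage must be fooled by a fake target
        if cascade_tpr < target_tpr:
            return None                # more stages can only lower the TPR further
        if cascade_fpr <= target_fpr:
            return n
    return None
```

With the figures from the example, `stages_needed("0.999", "0.1", "0.995", "0.001")` evaluates to 3.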

The method of training the convolutional neural network of each stage is described below.

Specifically, among the plurality of sample images of each quality type, sample images containing a real target are used as positive samples, and sample images containing a fake target are used as negative samples. A fake target may include a printed image of a real target, a photograph of a real target, a screen displaying a real target, a 3D-printed model of a real target, and the like.

The convolutional neural networks of the stages are trained iteratively, one after another in stage order. With positive and negative samples as input, the parameters of the first-stage convolutional neural network are trained iteratively using the backpropagation algorithm. The first-stage convolutional neural network may be trained so that its TPR reaches a relatively high value (for example, TPR = 99.9%) while its FPR is kept from being high (for example, FPR = 20%). In this case, some negative samples are still misclassified as positive samples. Next, the positive and negative samples classified by the first-stage convolutional neural network are selected, and the parameters of the second-stage convolutional neural network are trained iteratively. On the same principle, the positive and negative samples classified by the second-stage convolutional neural network are selected, and the parameters of the last-stage convolutional neural network are trained iteratively, so that the convolutional neural network of each stage of the cascaded convolutional neural network is finally obtained. For convenience of description, the case of three stages has been used as an example, but the number of stages may be varied.

FIG. 2C is a diagram illustrating the structure of a single-stage convolutional neural network according to an embodiment. Referring to FIG. 2C, the convolutional neural network includes an input layer, first through sixth sub-networks, a fully connected layer, and an output layer, cascaded in sequence. Each of the first, second, third, fourth, and sixth sub-networks includes a convolution layer, a BN (batch normalization) layer, a ReLU (rectified linear unit) layer, and a pooling layer. The fifth sub-network includes a convolution layer and a BN layer.

In FIG. 2C, the input image (250) has dimensions of 128×128×3, where 128×128 is the resolution of the input image and 3 is the number of image channels (for example, the R (red), G (green), and B (blue) channels). The 120×120×3 shown between the input image and convolution layer 1 (261) of the first sub-network denotes the dimensions of the CNN input. To obtain an image of 120×120×3 from the input image, the input image may be cropped, for example by center cropping, into three 120×120 matrices.

In the 3×3×3×16 of convolution layer 1 (261), 3×3 indicates that the unit scan template of convolution layer 1 is a 3×3 matrix of pixel points; the third 3 denotes the number of pixel-point matrices (that is, image channels) of the preceding level; and 16 denotes the number of convolution kernels, or filters, included in convolution layer 1 (also called the depth of the convolution layer). Each convolution kernel or filter of convolution layer 1 (261) uses the 3×3 matrix of pixel points as its unit scan template and scans the pixel-point matrix of each primary color of the input layer with a preset number of pixel points (for example, 1) as the scan interval. During the scan, each convolution kernel or filter convolves, in turn, each 3×3 block of pixel points in the 120×120 pixel-point matrix of each primary color, and the first convolution results obtained in turn are taken as the pixel points after the first convolution, yielding 120×120 (pixel-point matrix) × 16 (layers) pixel points after the first convolution. Next, the BN1 (first batch normalization) layer (262) normalizes the pixel points of each layer after the first convolution to obtain 16 feature maps after the first convolution, each containing 120×120 pixel points (this is the meaning of the 120×120×16 shown between BN1 (262) and the first ReLU (263)). Normalization by the BN layer can speed up the convergence of the convolutional neural network and reduces the influence of shooting conditions, such as differing illumination, on the performance of the convolutional neural network, which helps improve that performance.
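The dimensions quoted for convolution layer 1 can be checked with the standard convolution output-size formula, as sketched below. A 3×3 kernel with a scan interval of 1 preserves the 120×120 size only if the border is padded by one pixel; that padding is an inference made for this sketch and is not stated in the text. The function names are illustrative.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_weight_count(kernel: int, in_channels: int, filters: int) -> int:
    """Number of kernel weights (biases excluded), e.g. 3 x 3 x 3 x 16."""
    return kernel * kernel * in_channels * filters
```

Under the assumed padding of 1, `conv_output_size(120, 3, stride=1, padding=1)` gives 120, and `conv_weight_count(3, 3, 16)` gives 432 weights for convolution layer 1.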

In FIG. 2C, the first ReLU (263) may specifically be the first activation function and denotes one-sided activation of the 16 feature maps after the first convolution. For example, values in a feature map that are greater than or equal to 0 are kept, and values less than 0 are reset to 0 before being output. This makes the parameters of the feature maps sparse, which weakens the correlation between parameters and reduces the tendency of the convolutional neural network to overfit during training. In FIG. 2C, the first MaxPool (max pooling) (264) denotes the first pooling. The first pooling layer may pool each of the 16 feature maps after the first convolution using max pooling. The pooling region is 2×2; for example, the maximum value within each 2×2 region may be selected as its representative value. As a result, 16 feature maps after the first pooling are obtained (this is the meaning of the 60×60×16 shown between the first MaxPool and convolution layer 2).
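The activation and pooling steps described above can be sketched in plain Python on a single feature map, represented here as a list of rows. The function names are illustrative, and non-overlapping 2×2 regions are assumed, which is consistent with the 120×120 to 60×60 reduction.

```python
def relu(feature_map):
    """Keep values >= 0 and reset negative values to 0."""
    return [[max(0.0, value) for value in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 region."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]
```

Applied to a 120×120 feature map, `max_pool_2x2` returns a 60×60 map, matching the dimensions given for the first pooling.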

The internal structure and operation of the second through sixth sub-networks can be described in substantially the same way as above.

In FIG. 2C, the 4×4×64 shown in the bottom row denotes the 64 feature maps output by the sixth sub-network, each of which comprises a 4×4 matrix of pixel points. The fully connected layer (271) shown in the bottom row of FIG. 2C denotes a layer in which all nodes of adjacent layers are connected to one another, and 1024×2 denotes the parameters of the fully connected layer. The fully connected layer converts the pixels of the 64 feature maps of 4×4 pixel points into a single 1×1024 vector, performs a matrix multiplication of the converted vector and the 1024×2 parameter matrix to obtain a 1×2 result, and outputs the result to the output layer denoted by Softmax (multinomial logistic regression) (272). The result output by the output layer includes the probability that the input image (250) is classified as a positive sample and the probability that the input image (250) is classified as a negative sample. The probability that the input image (250) is classified as a positive sample may be taken as the detection value (280) of the input image as processed by the convolutional neural network.
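The final steps (flattening 64 feature maps of 4×4 into a 1×1024 vector, multiplying by a 1024×2 matrix, and applying softmax) can be sketched as follows. The function names are illustrative, bias terms are omitted, and treating index 0 of the output as the positive-sample probability is an assumption; the text does not state the ordering.

```python
import math

def flatten(feature_maps):
    """Concatenate all values of all feature maps into one vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]

def fully_connected(vector, weights):
    """Multiply a 1xN vector by an NxM weight matrix (no bias term)."""
    n_out = len(weights[0])
    return [sum(v * w[j] for v, w in zip(vector, weights)) for j in range(n_out)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def detection_value(feature_maps, weights):
    """Probability assigned to the positive class (assumed to be index 0)."""
    return softmax(fully_connected(flatten(feature_maps), weights))[0]
```

The two softmax outputs sum to 1, so the detection value is a real number strictly between 0 and 1.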

The following describes how the threshold of each stage's threshold decision layer, which is connected after the convolutional neural network of that stage, is determined.

According to one embodiment, when the convolutional neural network of a stage outputs a detection value for an input image, the threshold decision layer of that stage may screen the detection value. For example, an image whose detection value passes the screening is provided as the input image of the convolutional neural network of the next stage, while an image whose detection value fails the screening is judged to be a fake-target image and may not be provided to the convolutional neural network of the next stage.

According to one embodiment, during training, the convolutional neural network of an early stage may be trained so as to allow some misclassifications, and the misclassified sample images may be used as negative samples for the convolutional neural network of the next stage.

The threshold of each stage's threshold decision layer may be set reasonably based on experimental data, empirical data, historical data, and/or actual conditions such as the real-target recognition rate that must ultimately be reached. The images that reach the convolutional neural network of each subsequent stage after screening by the threshold decision layer help improve the classification accuracy of that network and can improve the overall accuracy of classifying real and fake target images.

In one embodiment, the TPR and FPR performance output by the convolutional neural network of the current stage may in effect be the combined result of the performance of the current stage and all preceding stages. For example, the probability that the input image is classified as a positive sample and the probability that it is classified as a negative sample sum to 1, and the detection value may be a real number greater than 0 and less than 1. In one embodiment, the thresholds of the threshold decision layers of the first, second, and third stages may be set to 0.2, 0.3, and 0.2, respectively.

FIG. 3A is a flowchart of a scenario in which the target detection method according to an embodiment is applied in a practical application. Referring to FIG. 3A, the method may include: acquiring a target image (S301); determining the quality type of the acquired target image (S302); determining the cascaded convolutional neural network of the quality type corresponding to the quality type of the target image (S303); determining the detection value of the target image based on the cascaded convolutional neural network of the corresponding quality type (S304); determining whether the target in the target image is a real target based on the detection value of the target image (S305); and performing corresponding processing based on the determined real target (S306).

A terminal device according to an embodiment acquires a target image of a target through a photographing device. Here, the target image is a single image. Depending on the embodiment, the terminal device may acquire consecutive images of the target and use each image that contains the target as a target image. For example, the terminal device may acquire a video of the target and use each frame that contains the target, that is, each target frame image, as a target image.

The step of determining the quality type of the acquired target image (S302), the step of determining the cascaded convolutional neural network of the quality type corresponding to the quality type of the target image (S303), and the step of determining the detection value of the target image based on the cascaded convolutional neural network of the corresponding quality type (S304) correspond to steps S101, S102, and S103 of FIG. 1, respectively.

In step S304, the detection value of the target image at the current stage may be determined as follows. The detection value of the current stage is determined based on the convolutional neural network of the current stage, and the threshold decision layer of the current stage, connected between the convolutional neural networks of the current stage and the next stage, determines whether the detection value of the current stage is greater than the preset real-target detection threshold of the current stage. If the detection value of the current stage is greater than the threshold of the current stage, the detection value of the next stage is determined in the same way, and the process continues until the detection value of the last stage is determined as the detection value of the target image.

For example, when a target image is input to the cascaded convolutional neural network of FIG. 2B described above, the first-stage detection value of the target image is determined based on the first-stage convolutional neural network (CNN1), and the first-stage threshold determination layer determines whether the first-stage detection value is greater than its threshold (threshold 1). When the first-stage detection value is not greater than that threshold, the target in the target image is determined to be a fake target, the target image no longer takes part in target classification by the convolutional neural networks of the subsequent stages, and the first-stage detection value is output as the detection value of the target image. When the first-stage detection value is greater than the threshold, the target image that passed the first-stage classification is used as the input image for target classification by the second-stage convolutional neural network (CNN2). This continues up to the threshold determination layer of the last stage: when the detection value output by the last-stage convolutional neural network is greater than the threshold of the last-stage threshold determination layer, the detected target is likely to be a real target, and that last-stage detection value is used as the detection value of the target image.
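
The stage-by-stage evaluation described above can be sketched as follows. This is a minimal illustration only: the function name `cascade_detect` and the representation of each stage as a `(score_fn, threshold)` pair are assumptions made for the example, and each `score_fn` is a hypothetical stand-in for a trained stage CNN.

```python
from typing import Callable, Sequence, Tuple


def cascade_detect(image, stages: Sequence[Tuple[Callable, float]]) -> Tuple[float, bool]:
    """Evaluate a cascaded detector with early exit.

    stages: (score_fn, threshold) pairs; score_fn is a hypothetical stand-in
    for the CNN of one stage, threshold for its threshold determination layer.
    Returns (detection_value, passed_all_stages).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    score = 0.0
    for score_fn, threshold in stages:
        score = score_fn(image)
        if not score > threshold:
            # Not greater than the threshold: treat as a fake target and
            # output this stage's value without running later stages.
            return score, False
    # Every stage passed: the last stage's value is the detection value.
    return score, True
```

Because a rejected image never reaches the later stages, the later (typically more expensive) networks only run on images that earlier stages could not rule out.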

The step of determining whether the target in the target image is a real target based on the detection value of the target image (S305) corresponds to step S104 of FIG. 1.

According to an embodiment, when the target image is a single image, the detection value of the target image is compared with a preset real-target detection threshold. When the detection value is greater than the threshold, the target in the target image is determined to be a real target; when the detection value is not greater than the threshold, the target in the target image is determined to be a fake target. Here, the real-target detection threshold may be preset according to experimental data, empirical data, historical data, and/or the actual situation; for example, it may be set to 0.3.

According to an embodiment, when the target image is a frame image, a comprehensive detection value may be determined using the detection value and the blur evaluation value of each frame. The blur evaluation value of a target image may be obtained by evaluating the degree of blur of the target image, and then stored.

For example, the degree of blur of the target image may be evaluated using a method such as JNB (Just Noticeable Blur) or CPBD (Cumulative Probability of Blur Detection).

Taking the CPBD method as an example, the target image is first divided into a plurality of target image blocks, and the horizontal edges of each target image block are detected using a Canny or Sobel edge detection operator. The ratio of edge pixels is then calculated. For example, if the ratio of edge pixels in a target image block is greater than a predetermined number (for example, 0.002), the block is determined to be an edge image block; otherwise, it is determined to be a non-edge image block.
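
The block classification step above can be illustrated with a short sketch. It assumes that an edge detector (such as Sobel or Canny) has already produced a binary edge map; the function name `classify_blocks`, the list-of-rows representation, and the `block_size` parameter are hypothetical, since the text does not specify a block size.

```python
def classify_blocks(edge_map, block_size, min_edge_ratio=0.002):
    """Label each block of a binary edge map as edge block or non-edge block.

    edge_map: list of rows of 0/1 values (1 = edge pixel), assumed to come
    from a separate edge detection step that is not shown here.
    Returns {(block_row, block_col): True if the block is an edge block}.
    """
    height = len(edge_map)
    width = len(edge_map[0]) if height else 0
    labels = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            rows = edge_map[top:top + block_size]
            pixels = [p for row in rows for p in row[left:left + block_size]]
            ratio = sum(pixels) / len(pixels)  # fraction of edge pixels in the block
            labels[(top // block_size, left // block_size)] = ratio > min_edge_ratio
    return labels
```

Only blocks labelled as edge blocks would contribute edge pixels to the blur probability computation that follows.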

Furthermore, using Equation 1, the just-noticeable edge width, which depends on the contrast C, is calculated for each edge pixel e_i in an edge image block.

The actual edge width w(e_i) of the edge pixel e_i is then calculated, and the edge blur probability P_blur is calculated based on Equation 2. In Equation 2, β is a fixed parameter.
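
Equations 1 and 2 are not reproduced in this text. For orientation only, the published CPBD metric defines the corresponding quantities as shown below; that the embodiment uses exactly these forms is an assumption, and the symbol w_JNB(e_i) for the just-noticeable edge width is taken from that literature rather than from this document.

```latex
w_{\mathrm{JNB}}(e_i) =
\begin{cases}
5, & C \le 50 \\
3, & C \ge 51
\end{cases}
\qquad
P_{\mathrm{blur}}(e_i) = 1 - \exp\!\left( - \left| \frac{w(e_i)}{w_{\mathrm{JNB}}(e_i)} \right|^{\beta} \right)
```

Under this form, an edge whose actual width equals its just-noticeable width has P_blur = 1 - e^{-1} ≈ 0.63, which is consistent with the example value 0.63 used for the blur detection value.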

The fraction of edge pixels whose P_blur is less than a predetermined number (for example, 0.63), out of all edge pixels, is taken as the blur detection value. The blurrier the image, the lower the fraction of pixels with a low P_blur, and the smaller the corresponding blur detection value. As described below, the comprehensive detection value is determined using the blur detection value as a weight, so a blurrier image, which has a smaller blur detection value, has less influence on the detection algorithm for a real target (living target).
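
A minimal sketch of the blur detection value follows. It assumes the blur probability takes the form used in the published CPBD metric, P_blur = 1 - exp(-|w / w_JNB|^β), and that per-pixel edge widths are already available; the function names and the default `beta=3.6` are illustrative assumptions, since the text only states that β is a fixed parameter.

```python
import math


def blur_probability(width, jnb_width, beta=3.6):
    """Blur probability of one edge pixel (assumed CPBD form; beta is an assumed value)."""
    return 1.0 - math.exp(-(abs(width / jnb_width) ** beta))


def blur_detection_value(edge_widths, jnb_widths, beta=3.6, p_threshold=0.63):
    """Fraction of edge pixels whose blur probability is below p_threshold.

    edge_widths / jnb_widths: measured and just-noticeable widths per edge pixel.
    Returns 0.0 when there are no edge pixels (an assumption for this sketch).
    """
    probs = [blur_probability(w, j, beta) for w, j in zip(edge_widths, jnb_widths)]
    if not probs:
        return 0.0
    sharp = sum(1 for p in probs if p < p_threshold)
    return sharp / len(probs)
```

A sharp image yields a value near 1 and a heavily blurred image a value near 0, so the value can serve directly as a per-frame weight.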

After the blur detection value of the current-frame target image is determined, it is stored. In the same way, the blur detection values of a plurality of frame target images preceding the current frame may have been determined and stored in advance. The detection value of the current-frame target image determined as described above is also stored. Likewise, the detection values of the plurality of frame target images preceding the current frame may have been determined and stored in advance.

The comprehensive detection value of the current-frame target image is determined based on the detection values and blur evaluation values of the current-frame target image and the plurality of preceding frame target images. For example, the blur evaluation values of the current-frame target image and the preceding frame target images may be used as weights for their respective detection values, and the weighted average of the detection values may be calculated to determine the comprehensive detection value of the current-frame target image.
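
The weighted average described above can be sketched as follows. The function name `comprehensive_detection_value` is hypothetical, and returning 0.0 when all weights are zero is an assumption made so that a window of fully blurred frames fails closed.

```python
def comprehensive_detection_value(detection_values, blur_values):
    """Blur-weighted average of per-frame detection values.

    detection_values: detection values of the current and preceding frames.
    blur_values: blur evaluation values of the same frames, used as weights.
    """
    if len(detection_values) != len(blur_values):
        raise ValueError("one blur value is required per detection value")
    total_weight = sum(blur_values)
    if total_weight == 0:
        return 0.0  # assumption: no usable frame, so report the lowest score
    weighted_sum = sum(d * b for d, b in zip(detection_values, blur_values))
    return weighted_sum / total_weight
```

A frame with a blur evaluation value of zero contributes nothing to the result, so a single badly blurred frame cannot flip the decision on its own.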

FIG. 3B is a diagram illustrating a target detection method for frame images according to an embodiment. Referring to FIG. 3B, the liveness detection value denotes the detection value. Current frame N (311) in the upper row denotes the detection value of the current-frame target image, where N is a positive integer. Frame N-i (312) in the upper row denotes the detection value of the previous frame that is i frames before the current frame, where i is a positive integer smaller than N.

In FIG. 3B, current frame N (321) in the lower row denotes the blur evaluation value of the current-frame target image, and frame N-i (322) in the lower row denotes the blur evaluation value of the previous frame that is i frames before the current frame. For each frame image among the current-frame target image and the previous frame images, the detection value is multiplied by the blur evaluation value, and the products for all of these frames are summed to obtain the comprehensive detection value 330 of the current-frame target image.

Referring again to FIG. 3A, in step S305, whether the target in the current-frame target image is a real target may be determined based on the comprehensive detection value of the current-frame target image. For example, the comprehensive detection value of the current-frame target image is compared with a preset real-target detection threshold. When the comprehensive detection value is greater than the threshold, the target in the current-frame target image is determined to be a real target; when the comprehensive detection value is not greater than the threshold, the target in the current-frame image is determined to be a fake target.

In step S306, corresponding processing may be performed depending on whether the target is real. If the target in the target image was determined to be a real target in the preceding steps, the processing step associated with the target image is executed. For example, a payment step or an unlocking step associated with the target image may be executed. If the target in the target image was determined to be a fake target, execution of the processing step associated with the target image is refused. For example, the unlocking step or payment step associated with the target image may be refused.

FIG. 4 is a block diagram of a target detection apparatus according to an embodiment. Referring to FIG. 4, the target detection apparatus includes an image quality type determination unit 401, a convolutional neural network determination unit 402, a detection value determination unit 403, and a target authenticity determination unit 404.

Here, the image quality type determination unit 401 determines the quality type of the target image. The convolutional neural network determination unit 402 determines the convolutional neural network of the quality type that corresponds to the quality type determined by the image quality type determination unit 401. The detection value determination unit 403 determines the detection value of the target image based on the convolutional neural network of the corresponding quality type determined by the convolutional neural network determination unit 402. The target authenticity determination unit 404 determines whether the target in the target image is a real target based on the detection value determined by the detection value determination unit 403.
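
The data flow between the four units can be sketched as a single function. This is an illustration under stated assumptions, not the apparatus itself: `detect_target`, `classify_quality`, and `networks_by_quality` are hypothetical names, each network is represented by a plain callable, and the default threshold of 0.3 is the example value given earlier for the real-target detection threshold.

```python
def detect_target(image, classify_quality, networks_by_quality, threshold=0.3):
    """Chain the four units: quality type -> network -> detection value -> decision.

    classify_quality: hypothetical callable returning a quality type key.
    networks_by_quality: mapping from quality type to a scoring callable,
    standing in for the database of per-quality-type networks.
    Returns (detection_value, is_real_target).
    """
    quality_type = classify_quality(image)        # role of unit 401
    network = networks_by_quality[quality_type]   # role of unit 402
    detection_value = network(image)              # role of unit 403
    is_real = detection_value > threshold         # role of unit 404
    return detection_value, is_real
```

Keeping the per-quality-type networks behind a mapping mirrors the description of a database from which the matching network is selected.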

According to an embodiment, the convolutional neural network determination unit 402 may access a database 406 containing convolutional neural networks of a plurality of quality types and select the corresponding convolutional neural network. The database 406 may be included in the target detection apparatus, or may be connected to the target detection apparatus by wire or wirelessly.

According to an embodiment, a convolutional neural network training unit 405 may train the convolutional neural networks of the plurality of quality types contained in the database 406. The convolutional neural network training unit 405 may be included in the target detection apparatus, or may be implemented as a separate server or the like.

The descriptions given above apply as they are to the image quality type determination unit 401, the convolutional neural network determination unit 402, the detection value determination unit 403, the target authenticity determination unit 404, the database 406, and the convolutional neural network training unit 405, so a more detailed description is omitted.

The results of a target detection experiment according to an embodiment are described below. The training database contains a total of 391,760 sample images, of which 115,145 are real target images (for example, images of live human faces) and 276,615 are fake target images (for example, fake attack images). The ratio of real target images to fake target images is about 1:3, and the sample images were collected from more than 500 individuals. The attack images include printouts, screen images, photographs, and the like that impersonate a real person's face.

The training database described above was divided into a training set and a test set (for example, 80% of the images were used for training and 20% for testing), and iterative training was carried out sequentially on the convolutional neural network of each stage in the cascaded convolutional neural network. The test results are shown in Table 1 below.

Table 1
Structure of the convolutional neural network | Classification accuracy of the convolutional neural network
Single-stage CNN | TPR = 97.0%, FPR = 1.0%
Cascaded CNN with multiple stages | TPR = 99.2%, FPR = 1.0%

As Table 1 shows, the cascaded CNN with multiple stages outperforms the single-stage CNN in accuracy: at the same FPR of 1.0%, the TPR rises from 97.0% to 99.2%.
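
For reference, the two rates reported in Table 1 can be computed from labelled test results as in the sketch below. The function name `tpr_fpr` and the boolean encoding (True for a real target, True for a prediction of "real") are assumptions for the example; the values in Table 1 come from the experiment, not from this code.

```python
def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) for binary labels and predictions.

    TPR: fraction of positive samples (real targets) predicted as positive.
    FPR: fraction of negative samples (fake targets) predicted as positive.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Comparing detectors at a fixed FPR, as Table 1 does, isolates how many real targets each one accepts for the same rate of accepted attacks.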

FIG. 5 is a diagram illustrating robustness against various fake attacks according to an embodiment. Referring to FIG. 5, the four images on the left are attack images containing fake targets, and the one image on the right is a target image 521 containing a real target. The attack images may include an attack image 511 of a mobile phone screen displaying a real human face image, an attack image 512 of a display screen displaying a real human face image, an attack image 513 containing a photograph of a real human face, and an attack image 514 containing a printed image of a real human face. In other words, the subject photographed in an attack image is not a real target but a photograph, a display screen, or a printed image of a real target.

FIG. 6 is a diagram illustrating robustness against fake attacks of a low quality type according to an embodiment. Referring to FIG. 6, the left image 611 is a blurred image containing a detected real target, and the middle image 621 and the right image 622 are blurred attack images containing detected fake targets.

In practical applications of the target detection system, video frame images are captured continuously through a photographing device included in a terminal device such as a mobile phone, and target detection is performed on them. As in the examples of FIG. 6, shaking and movement of the terminal device can cause motion blur distortion in the captured frame images. Existing algorithms cannot effectively distinguish blurred real target images from blurred attack images (that is, fake target images). In contrast, the embodiments described above effectively reduce the probability that the authenticity of a target is misjudged in a blurred image affected by motion blur distortion, so that a real target in a blurred image can be detected accurately.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

1. A target detection method comprising:
determining a quality type of a target image;
determining, from a database comprising a plurality of convolutional neural networks, a convolutional neural network of a quality type corresponding to the quality type of the target image;
determining a detection value of the target image based on the convolutional neural network of the corresponding quality type; and
determining whether a target in the target image is a real target based on the detection value of the target image.

2. The method of claim 1,
wherein the quality type of the target image is determined based on at least one quality value of the target image determined corresponding to at least one quality parameter.

3. The method of claim 2,
wherein the at least one quality parameter includes a photographing parameter of the target image and an attribute parameter of the target image.

4. The method of claim 1,
wherein the detection value includes a probability that the target image is classified as a positive sample containing a real target.

5. The method of claim 1,
wherein determining the quality type of the target image comprises:
determining at least one quality value of the target image corresponding to at least one quality parameter;
determining at least one quality classification of the target image based on the at least one quality value and at least one quality type classification standard preset corresponding to the at least one quality parameter; and
determining the quality type of the target image based on the at least one quality classification.

6. The method of claim 1,
wherein determining the quality type of the target image comprises
determining the quality type of the target image by performing a blind image quality evaluation on the target image.

7. The method of claim 1,
wherein the database is obtained by
determining quality types of a plurality of sample images and training the convolutional neural network of the corresponding quality type based on the sample images of each quality type.

8. The method of claim 1,
wherein the convolutional neural network
includes a cascaded convolutional neural network comprising convolutional neural networks of at least two stages and at least one threshold determination layer.

9. The method of claim 8,
wherein the cascaded convolutional neural network of each quality type requires a different figure of merit, and the number of stages included in the cascaded convolutional neural network of each quality type is determined by the corresponding figure of merit.

10. The method of claim 9,
wherein the figure of merit includes a true positive rate (TPR), corresponding to the rate at which positive samples are classified as positive samples, and a false positive rate (FPR), corresponding to the rate at which negative samples are classified as positive samples.

11. The method of claim 8,
wherein the figure of merit required by the cascaded convolutional neural network is satisfied by a combination of the figures of merit of the convolutional neural networks of the plurality of stages included in the cascaded convolutional neural network.

12. The method of claim 8,
wherein determining the detection value of the target image comprises:
determining a first-stage detection value of the target image based on a first-stage convolutional neural network included in the cascaded convolutional neural network; and
performing a process of determining a next-stage detection value of the target image by comparing the first-stage detection value with a preset threshold based on a threshold determination layer connected to the first-stage convolutional neural network.

13. The method of claim 12,
wherein determining the next-stage detection value comprises
sequentially determining detection values up to the detection value of the last stage, in accordance with the result of comparing the detection value of each stage from the next stage onward with the corresponding threshold.

14. The method of claim 12, further comprising:
determining the next-stage detection value as the detection value of the target image when the first-stage detection value is greater than the threshold; and
determining the first-stage detection value as the detection value of the target image when the first-stage detection value is smaller than the threshold.

15. The method of claim 1,
wherein determining whether the target in the target image is a real target based on the detection value of the target image comprises:
determining a blur evaluation value of a current frame when the target image is the current frame in a frame image;
determining a comprehensive detection value of the target image based on the blur evaluation value of the current frame, the detection value of the current frame, blur evaluation values of a plurality of previous frames in the frame image, and detection values of the plurality of previous frames; and
determining whether the target in the current frame is a real target based on the comprehensive detection value of the target image.

16. The method of claim 15,
wherein determining the blur evaluation value comprises
calculating the blur evaluation value based on blur probabilities of edge pixels in the current frame.

17. The method of claim 15,
wherein determining the comprehensive detection value of the target image comprises
determining a weighted average of the detection values, using the blur evaluation values of the current frame and the plurality of previous frames as weights of the corresponding detection values, and determining the weighted average as the comprehensive detection value.

18. A method of training a convolutional neural network, the method comprising:
determining a quality type of a sample image;
selecting, from among convolutional neural networks of a plurality of quality types, a convolutional neural network of a quality type corresponding to the determined quality type; and
training the convolutional neural network of the selected quality type based on whether the sample image is a positive sample or a negative sample.

19. The method of claim 18,
wherein the sample image corresponds to a positive sample when the sample image includes a real target, and corresponds to a negative sample when the sample image does not include a real target.

20. The method of claim 18,
wherein the quality type of the sample image is
determined based on at least one quality value of the sample image determined corresponding to at least one quality parameter.

21. The method of claim 18,
wherein the determining of the quality type of the sample image comprises:
determining at least one quality value of the sample image corresponding to at least one quality parameter;
determining at least one quality classification of the sample image based on the at least one quality value and at least one preset quality type classification standard corresponding to the at least one quality parameter; and
determining the quality type of the sample image based on the at least one quality classification.
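The three steps of the claim above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the quality parameters ("brightness", "sharpness"), the cut points, and the encoding of a quality type as a tuple of per-parameter classes are all assumptions made for the example.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical classification standards: sorted cut points per quality parameter.
CLASSIFICATION_STANDARDS: Dict[str, List[float]] = {
    "brightness": [0.3, 0.7],  # three classes: 0, 1, 2
    "sharpness": [0.5],        # two classes: 0, 1
}


def classify_quality_values(
    quality_values: Dict[str, float],
    standards: Dict[str, List[float]] = CLASSIFICATION_STANDARDS,
) -> Dict[str, int]:
    """Map each quality value to a class index using its preset cut points."""
    return {
        parameter: bisect_right(cut_points, quality_values[parameter])
        for parameter, cut_points in standards.items()
    }


def determine_quality_type(
    quality_values: Dict[str, float],
    standards: Dict[str, List[float]] = CLASSIFICATION_STANDARDS,
) -> Tuple[int, ...]:
    """Combine the per-parameter classes into one quality type.

    Assumption: the quality type is the tuple of classes, ordered by
    parameter name so that the result is deterministic.
    """
    classifications = classify_quality_values(quality_values, standards)
    return tuple(classifications[name] for name in sorted(classifications))
```

With these assumed standards, an image with brightness 0.8 and sharpness 0.2 falls into quality type `(2, 0)`.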

22. The method of claim 18,
further comprising establishing a database based on the convolutional neural network of each quality type.

23. The method of claim 18,
wherein the convolutional neural network
comprises a cascaded convolutional neural network including at least two convolutional neural networks and at least one threshold value layer.

24. The method of claim 23,
wherein the cascaded convolutional neural network of each quality type requires a different figure of merit, and the number of stages included in the cascaded convolutional neural network of each quality type is determined by the corresponding figure of merit.

25. A computer program stored in a medium to execute, in combination with hardware, the method of any one of claims 1 to 24.

26. A target detection apparatus comprising:
an image quality type determiner configured to determine a quality type of a target image;
a convolutional neural network determiner configured to determine, from a database comprising a plurality of convolutional neural networks, a convolutional neural network of a quality type corresponding to the quality type of the target image;
a detection value determiner configured to determine a detection value of the target image based on the convolutional neural network of the corresponding quality type; and
a target authenticity determiner configured to determine whether the target in the target image is a real target based on the detection value of the target image.
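The flow through the four units of the apparatus claim above might look like the sketch below. Everything named here is hypothetical: the database is modeled as a plain dictionary keyed by quality type, each network is a stand-in callable that returns a detection value, and the final decision by a fixed threshold is an assumption, since the claim only states that the decision is based on the detection value.

```python
from typing import Callable, Dict, Hashable

# Stand-in for a trained network: takes an image, returns a detection value.
Network = Callable[[object], float]


def detect_target(
    image: object,
    determine_quality_type: Callable[[object], Hashable],
    network_database: Dict[Hashable, Network],
    decision_threshold: float = 0.5,  # assumed decision rule, not from the source
) -> bool:
    """Return True if the target in the image is judged to be a real target."""
    # Image quality type determiner.
    quality_type = determine_quality_type(image)
    # Convolutional neural network determiner: look up the matching network.
    network = network_database[quality_type]
    # Detection value determiner.
    detection_value = network(image)
    # Target authenticity determiner.
    return detection_value >= decision_threshold
```

Keying the database by quality type keeps the selection step a constant-time lookup, whatever the number of quality types.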

27. The apparatus of claim 26,
wherein the convolutional neural network
comprises a cascaded convolutional neural network including convolutional neural networks of at least two stages and at least one threshold value layer.

28. The apparatus of claim 27,
wherein the cascaded convolutional neural network of each quality type requires a different figure of merit, and the number of stages included in the cascaded convolutional neural network of each quality type is determined by the corresponding figure of merit.

29. The apparatus of claim 27,
wherein the detection value determiner is configured to
determine a first-stage detection value of the target image based on a first-stage convolutional neural network included in the cascaded convolutional neural network, and to perform, based on a threshold value layer connected to the first-stage convolutional neural network, a process of determining a next-stage detection value of the target image by comparing the first-stage detection value with a preset threshold value.
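One way to read the staged evaluation in the claim above is sketched below. The stages are stand-in callables rather than real networks, and two points are assumptions not stated in the claim: that one threshold value layer sits between each pair of consecutive stages, and that evaluation stops early when a detection value falls below its threshold.

```python
from typing import Callable, Sequence, Tuple

# Stand-in for one stage of the cascade: takes an image, returns a detection value.
Stage = Callable[[object], float]


def run_cascade(
    image: object,
    stages: Sequence[Stage],
    thresholds: Sequence[float],
) -> Tuple[float, int]:
    """Run the cascade and return (last detection value, number of stages run).

    thresholds[i] models the threshold value layer connected to stages[i];
    the next stage runs only if the current detection value reaches it.
    """
    if not stages or len(thresholds) != len(stages) - 1:
        raise ValueError("expected one threshold between each pair of stages")
    detection_value = stages[0](image)
    stages_run = 1
    for stage, threshold in zip(stages[1:], thresholds):
        if detection_value < threshold:
            break  # assumed early exit: later stages are skipped
        detection_value = stage(image)
        stages_run += 1
    return detection_value, stages_run
```

Under these assumptions, an image that fails the first threshold costs only one network evaluation, which is the usual motivation for a cascade.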

30. The apparatus of claim 26,
wherein the target authenticity determiner is configured to:
determine a blur evaluation value of a current frame when the target image is the current frame in a frame image;
determine a comprehensive detection value of the target image based on the blur evaluation value of the current frame, the detection value of the current frame, blur evaluation values of a plurality of previous frames in the frame image, and detection values of the plurality of previous frames; and
determine whether the target in the current frame is a real target based on the comprehensive detection value of the target image.