KR102277929B1

KR102277929B1 - Real time face masking system based on face recognition and real time face masking method using the same

Info

Publication number: KR102277929B1
Application number: KR1020200021508A
Authority: KR
Inventors: 오흥선; 손성빈; 정준욱
Original assignee: 한국기술교육대학교 산학협력단 (Korea University of Technology and Education Industry-University Cooperation Foundation)
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2021-07-14

Abstract

The present invention relates to a real-time face masking system based on face recognition and a real-time face masking method using the same. The system includes: a content storage unit for receiving and storing video content from a content creator terminal; a frame separation unit for separating the video content into a plurality of image frames according to the video sequence; a reference information storage unit for receiving and storing, from the content creator terminal, a plurality of images containing the face of an unmasked target (a person whose face is to be left unmasked); a matching information generation unit for generating, from those images, matching information including the facial features of the unmasked target; a face detection unit for extracting a face region from each image frame; a comparison unit for comparing the matching information with the face region extracted from each image frame to select a masking target region; and a masking unit for masking, in each image frame, the masking target region selected by the comparison unit. Personal information such as portrait rights is thereby protected.

Description

Real-time face masking system based on face recognition and real-time face masking method using the same

The present invention relates to an image processing system for image or video content and, more particularly, to a real-time face masking system based on face recognition for protecting personal information, and to a real-time face masking method using the same.

According to recently released statistics, 39,151 reports of portrait-rights infringement were filed between 2014 and September 2018. Reports numbered 5,017 in 2014 alone and nearly tripled within four years, indicating that portrait-rights infringement is rapidly becoming a more serious problem.

However, between 2017 and May 2018 only 96 cases were brought to deliberation for resolving portrait-rights infringement, and only 49 of them, about half, were resolved. The published statistics thus make clear that, although interest in portrait-rights infringement and requests for its correction are growing, such cases are rarely resolved in practice.

Meanwhile, as Internet live broadcasting on platforms such as Facebook and YouTube has rapidly gained popularity, the number of people who make live broadcasting their profession has also grown rapidly. During live broadcasts, however, the faces of many people who do not wish to appear are frequently exposed, which infringes their portrait rights and makes them considerably more likely to become targets of crime. This happens because live broadcasts take place in real time, and no technology exists for selectively extracting and masking faces in real-time video content.

The present invention was devised to address the above problems. Its object is to provide a real-time face masking system based on face recognition, and a real-time face masking method using the same, for effectively protecting personal information such as portrait rights.

The technical objects of the present invention are not limited to those mentioned above; other technical objects not mentioned here will be clearly understood by those skilled in the art from the following description.

A real-time face masking system based on face recognition according to an embodiment of the present invention may include: a content storage unit for receiving and storing video content from a content creator terminal; a frame separation unit for dividing the video content into a plurality of image frames according to the video sequence; a reference information storage unit for receiving and storing, from the content creator terminal, a plurality of images containing the face of the unmasked target; a matching information generation unit for generating, from the plurality of images, matching information including facial features of the unmasked target; a face detection unit for extracting a face region from each of the image frames; a comparison unit for selecting a masking target region by comparing the matching information with the face region extracted from each image frame; and a masking processing unit for masking, in each image frame, the masking target region selected by the comparison unit.

The present invention may further include: a frame merging unit for regenerating the video content by merging the masked image frames according to the video sequence; a brightness control unit for adjusting the brightness of each image frame before the comparison unit compares the extracted face region with the matching information to select the masking target region; a body shape detection unit for extracting a body shape region from each image frame; and a masking level control unit for controlling the degree and the range of the masking applied to the masking target region.
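The patent does not specify how the masking level control unit expresses the degree and range of masking. One plausible reading, sketched below under that assumption, treats the degree as a pixelation block size and the range as a margin by which the detected box is enlarged before masking. The function names and the `degree`/`margin` parameters are hypothetical, and frames are modeled as plain lists of grayscale values so the example stays self-contained.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image, frame[row][col] in 0..255
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def expand_box(box: Box, margin: float, frame_w: int, frame_h: int) -> Box:
    """Grow a box by `margin` (a fraction of its size) on every side, clipped to the frame.

    Stands in for the masking *range* setting (hypothetical parameter).
    """
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(frame_w, x + w + dx), min(frame_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


def pixelate(frame: Frame, box: Box, block: int) -> None:
    """Replace each block x block tile inside `box` with its integer mean, in place.

    A larger `block` gives stronger masking (the masking *degree*, hypothetical).
    """
    x, y, w, h = box
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            rows = range(by, min(by + block, y + h))
            cols = range(bx, min(bx + block, x + w))
            values = [frame[r][c] for r in rows for c in cols]
            mean = sum(values) // len(values)
            for r in rows:
                for c in cols:
                    frame[r][c] = mean


def mask_region(frame: Frame, box: Box, degree: int = 8, margin: float = 0.2) -> Box:
    """Enlarge the target box, pixelate it, and return the region actually masked."""
    height, width = len(frame), len(frame[0])
    region = expand_box(box, margin, width, height)
    pixelate(frame, region, degree)
    return region
```

Keeping degree and range as two independent parameters mirrors the two quantities the masking level control unit is said to control; a blur or solid fill could replace `pixelate` without changing `expand_box`.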

In the present invention, the brightness control unit may adjust every image frame to the same brightness level.
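The patent leaves the brightness-equalization technique open. The minimal sketch below assumes a simple global gain: each frame is scaled so its mean gray level approaches a common target (128 here, an arbitrary choice), with values clipped to 0..255. Histogram equalization or gamma correction would be reasonable alternatives; the helper names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale image, values in 0..255


def mean_brightness(frame: Frame) -> float:
    """Average gray level over all pixels of the frame."""
    total = sum(sum(row) for row in frame)
    return total / (len(frame) * len(frame[0]))


def normalize_brightness(frame: Frame, target: float = 128.0) -> Frame:
    """Return a copy of `frame` scaled so its mean brightness is close to `target`.

    Uses one global gain with clipping to 0..255, so for very bright content the
    resulting mean can fall slightly short of `target`.
    """
    current = mean_brightness(frame)
    if current == 0:
        # An all-black frame carries no brightness information to scale.
        return [row[:] for row in frame]
    gain = target / current
    return [[min(255, max(0, round(p * gain))) for p in row] for row in frame]
```

Applying the same target to every frame before comparison is what gives all frames "the same brightness level" in this sketch.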

A real-time face masking method according to an embodiment of the present invention may include: receiving, from a content creator terminal, video content and a plurality of images containing the face of the unmasked target, and storing them in the content storage unit and the reference information storage unit, respectively; dividing the video content into a plurality of image frames according to the video sequence; generating, from the plurality of images, matching information including facial features of the unmasked target; extracting a face region from each image frame; selecting a masking target region by comparing the matching information with the face region extracted from each image frame; and masking the selected masking target region.
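The method steps can be wired together as in the following sketch. It is an illustration only: the face detector, the embedding model that turns faces into comparable vectors, the masking routine, and the 0.6 cosine-similarity threshold are hypothetical placeholders passed in as parameters, not components named by the patent.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def process_video(
    frames: List[Any],
    reference_embeddings: List[Vector],
    detect_faces: Callable[[Any], List[Box]],
    embed_face: Callable[[Any, Box], Vector],
    mask: Callable[[Any, Box], None],
    threshold: float = 0.6,
) -> List[Any]:
    """Mask every detected face that matches none of the unmasked targets."""
    for frame in frames:                    # frames already separated, in sequence order
        for box in detect_faces(frame):     # face region extraction
            emb = embed_face(frame, box)
            exempt = any(cosine(emb, ref) >= threshold for ref in reference_embeddings)
            if not exempt:                  # this region becomes a masking target
                mask(frame, box)            # masking applied in place
    return frames                           # same order, ready for merging


if __name__ == "__main__":
    # Toy frame: each face carries a precomputed embedding, so no real model is needed.
    frames = [{"faces": [((0, 0, 10, 10), [1.0, 0.0]), ((20, 0, 10, 10), [0.0, 1.0])],
               "masked": []}]

    def detect(frame):
        return [box for box, _ in frame["faces"]]

    def embed(frame, box):
        return dict(frame["faces"])[box]

    def record_mask(frame, box):
        frame["masked"].append(box)

    process_video(frames, [[0.9, 0.1]], detect, embed, record_mask)
    print(frames[0]["masked"])  # [(20, 0, 10, 10)]
```

Injecting the detector, embedder, and masker as callables keeps the control flow of the claimed method separate from any particular model, so a different recognition network could be swapped in without changing the loop.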

The method may further include: regenerating the video content by merging the masked image frames according to the video sequence; and transmitting the regenerated video content to a content user terminal.

The method may further include, before the step of selecting the masking target region by comparing the matching information with the face region extracted from each image frame, adjusting the image frames so that all of them have the same brightness level.

In the present invention, extracting the face region from each image frame may include: extracting a body shape region from each image frame using a body shape detection unit; and extracting a face region from each image frame using a face detection unit, with the face search restricted to the body shape region.

In the present invention, the step of receiving the video content and the plurality of images containing the face of the unmasked target from the content creator terminal, and storing them in the content storage unit and the reference information storage unit respectively, may include: receiving the plurality of images, storing them in the reference information storage unit, and transmitting a storage completion signal to the content storage unit; and receiving and storing, by the content storage unit, the video content from the content creator terminal in response to the storage completion signal.

The present technology, based on the solution described above, can be installed on a content management server that provides video content such as live broadcasts to users in real time. By automatically masking people whose face and/or body shape should not be exposed in the video content, it protects personal information such as portrait rights.

In addition, because the present technology includes a brightness control unit, it compares the matching information with the extracted face region while every image frame is held at a constant brightness level, which can improve the accuracy with which masking targets are selected.

The effects of the present invention are not limited to those mentioned above; other effects not mentioned here will be clearly understood by those skilled in the art from the claims.

FIG. 1 is a block diagram schematically illustrating a real-time face masking system based on face recognition according to an embodiment of the present invention.
The second figure is a flowchart illustrating a face masking method using the real-time face masking system based on face recognition according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice it.

The description of the present invention consists merely of embodiments for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. Likewise, the objects and effects presented in the present invention do not mean that a specific embodiment must include all of them or only them, so the scope of the present invention should not be understood as limited thereby.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected to" another component, it may be directly connected to the other component, or intervening components may be present. In contrast, when a component is referred to as being "directly connected to" another component, no intervening components are present. Other expressions describing relationships between components, such as "between" versus "directly between" and "neighboring" versus "directly neighboring", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identifiers used for steps (e.g., a, b, c) are for convenience of description and do not indicate the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated; that is, they may occur in the stated order, be performed substantially simultaneously, or be performed in reverse order.

The present invention may be embodied as computer-readable code on a computer-readable recording medium, which includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of carrier waves (e.g., transmission over the Internet). The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

Combinations of the blocks in the accompanying block diagram and the steps in the flowchart may be performed by computer program instructions. Because these instructions may be loaded onto a processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create means for performing the functions described in each block of the block diagram or each step of the flowchart. The instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement functions in a particular manner, so the instructions stored in that memory may produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram or each step of the flowchart. The instructions may also be loaded onto a computer or other programmable data processing equipment so that a series of operational steps is performed on it to create a computer-executed process; the instructions running on the computer or other programmable equipment may then provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.

In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). Note also that in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially simultaneously, or sometimes in reverse order, depending on the function involved.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meaning in the context of the related art and, unless explicitly defined in the present application, should not be interpreted in an idealized or overly formal sense.

The embodiments of the present invention described below aim to fundamentally prevent the portrait-rights infringement that frequently occurs in live broadcasts on platforms such as Facebook and YouTube, by providing a real-time face masking system based on face recognition for protecting personal information and a real-time face masking method using the same.

For reference, at the time of filing, Facebook and YouTube offered editing functions comparable to a masking system. However, Facebook only uses face recognition to tag people in photos automatically and does not offer face masking to prevent portrait-rights infringement. YouTube offers face-recognition-based masking for videos, but the content creator must manually designate the masking targets and masking ranges, which takes considerable time and effort. Moreover, although pre-produced video content can be masked, if only manually, live broadcasts cannot be masked at all, which is a critical limitation.

Accordingly, the embodiments of the present invention described below provide a real-time face masking system based on face recognition, and a real-time face masking method using the same, that can mask in real time every face other than the preset faces in video shot in real time, that is, live broadcast content.

FIG. 1 is a block diagram schematically illustrating a real-time face masking system based on face recognition according to an embodiment of the present invention.

As shown in FIG. 1, the real-time face masking system 200 based on face recognition may be installed on a content management server 10, and the content management server 10 may connect a content creator terminal 100 and a content user terminal 300 through a wired/wireless communication network. The face masking system 200 may include a content storage unit 210 including a frame separation unit 212; a reference information storage unit 220 including a matching information generation unit 222; a detection unit 230 including a body shape detection unit 232 and a face detection unit 234; a comparison unit 240 including a brightness control unit 242; a masking processing unit 250 including a masking level control unit 252; and a frame merging unit 260.

The content storage unit 210 receives and stores video content, for example video for a live broadcast, from the content creator terminal 100. In other words, the content storage unit 210 may determine whether the video content received from the content creator terminal 100 requires masking and, if so, temporarily hold the video content while masking is performed.

The frame separation unit 212 divides the video content stored in the content storage unit 210 into a plurality of image frames to facilitate masking. Specifically, it decodes the video content received from the content creator terminal 100 and separates it into individual image frames according to the video sequence. The degree of separation, that is, the number of frames, may be set arbitrarily by the content creator.
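A minimal sketch of frame separation and the later merge step, assuming decoding has already produced an iterable of frames. The `step` parameter is a hypothetical stand-in for the creator-configurable number of frames, and each frame keeps its sequence index so the frame merging unit can restore the original order. How skipped frames would be handled (for example, by reusing the previous frame's masks) is not specified in the source.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def separate_frames(decoded: Iterable[T], step: int = 1) -> Iterator[Tuple[int, T]]:
    """Yield (sequence_index, frame) for every `step`-th decoded frame."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for index, frame in enumerate(decoded):
        if index % step == 0:
            yield index, frame


def merge_frames(indexed: Iterable[Tuple[int, T]]) -> List[T]:
    """Reassemble processed frames in their original sequence order."""
    return [frame for _, frame in sorted(indexed, key=lambda item: item[0])]
```

For example, `list(separate_frames("abcdefg", step=3))` keeps frames 0, 3, and 6, and `merge_frames` restores their order even if they finish processing out of order.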

The reference information storage unit 220 receives and stores, from the content creator terminal 100, a plurality of images from which face information and/or body shape information (that is, matching information) of a person not subject to masking can be extracted. Here, matching information refers to information about the face and body shape of an unmasked target, a person whose face and/or body shape may be shown in the video content; the content creator may designate unmasked targets in advance. For example, the matching information may concern the content creator's own face and/or body shape, where body shape information refers to the overall shape of a person, including the face.

In addition, before the content creator uploads video content to the content management server 10, the reference information storage unit 220 may request from the content creator terminal 100 a plurality of images from which matching information can be extracted, and may receive and store the requested images. The content creator may upload the video content to the content storage unit 210 after these images have been registered in the reference information storage unit 220. For example, the reference information storage unit 220 may request at least five images for generating the matching information, which may include a frontal face image, a left-side face image, and a right-side face image of the unmasked target, as well as a full-body front image and a full-body back image of that person.
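The registration order described above, together with the storage completion signal of the claimed method, can be sketched as follows. The five view names, the class names, and the callback used as the completion signal are assumptions chosen for illustration; the source only requires at least five images and that the content storage unit accepts video after receiving the completion signal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical names for the five reference views listed in the description.
REQUIRED_VIEWS = ("face_front", "face_left", "face_right", "body_front", "body_back")


@dataclass
class ContentStore:
    """Hypothetical stand-in for the content storage unit (210)."""
    ready: Set[str] = field(default_factory=set)
    videos: Dict[str, List[bytes]] = field(default_factory=dict)

    def mark_ready(self, creator_id: str) -> None:
        # Receives the storage completion signal for this creator.
        self.ready.add(creator_id)

    def upload(self, creator_id: str, chunk: bytes) -> None:
        if creator_id not in self.ready:
            raise PermissionError("reference images must be registered first")
        self.videos.setdefault(creator_id, []).append(chunk)


@dataclass
class ReferenceStore:
    """Hypothetical stand-in for the reference information storage unit (220)."""
    on_complete: Callable[[str], None]
    images: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def register(self, creator_id: str, views: Dict[str, bytes]) -> None:
        missing = [v for v in REQUIRED_VIEWS if v not in views]
        if missing:
            raise ValueError(f"missing reference views: {missing}")
        self.images[creator_id] = dict(views)
        self.on_complete(creator_id)  # send the storage completion signal
```

Usage: create `content = ContentStore()`, pass `content.mark_ready` as `on_complete` to `ReferenceStore`, and uploads for a creator succeed only after that creator's five views are registered.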

The matching information generation unit 222 generates information about the face and/or body shape of the unmasked target, that is, matching information, from the images stored in the reference information storage unit 220. It may generate the matching information using a deep-learning-based model such as DeepFace or VGGFace (Visual Geometry Group Face). For reference, DeepFace extracts landmarks using a pre-trained 3D face geometry model, aligns the face by an affine transformation, and then uses a nine-layer deep neural network trained on a large dataset collected internally by Facebook to generate the matching information. VGGFace introduced the VGG Face dataset, a large face recognition dataset built through Internet searches, and uses it to train a deep network of 15 convolutional layers to generate the matching information.
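How per-image features are fused into a single piece of matching information is not stated. One common approach, assumed here, is to L2-normalize the embedding of each reference image (produced by a model such as DeepFace or VGGFace, which is not shown) and average them into one unit-length template. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def build_template(embeddings: List[Sequence[float]]) -> List[float]:
    """Fuse the per-image embeddings of one unmasked target into a single unit vector.

    Normalizing before averaging keeps one high-magnitude image from dominating;
    the averaging rule itself is an illustrative choice, not the patent's.
    """
    if not embeddings:
        raise ValueError("need at least one reference embedding")
    dim = len(embeddings[0])
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)
```

With several views per person (front, left, right), the template sits between them, so a single cosine comparison at detection time can cover moderate pose changes.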

The detection unit 230 detects the face regions and body shape regions of both masking targets and unmasked targets in each image frame. In other words, it detects the face region and body shape region of every person in each frame. To this end, the detection unit 230 may include a body shape detection unit 232 and a face detection unit 234. Before a person's face can be recognized in an image frame generated by the frame separation unit 212, the face region on which recognition will be performed must be detected. Live broadcasts must be processed quickly, so in some cases the body shape region may be detected first and the face region then detected within it. That is, the body shape detection unit 232 detects a body shape region in each image frame generated by the frame separation unit 212. The face detection unit 234 then detects a face either within that body shape region or in the whole image frame. The body shape detection unit 232 and the face detection unit 234 may use any of a wide variety of region-detection methods, such as knowledge-based, feature-invariant, template matching, and appearance-based methods. Deep learning may also be used to improve the efficiency of the detection unit 230.
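Claim 1 names template matching as the face-detection method. The sketch below shows the basic idea. It is a minimal sketch that scores every window by the sum of squared differences (SSD) on a grayscale frame stored as nested lists and accepts the best window if its SSD is below a threshold. The helper names and the threshold parameter `max_ssd` are hypothetical. Practical implementations usually rely on normalised cross-correlation, for example OpenCV's `cv2.matchTemplate`, and search several template scales.

```python
from typing import Optional, Sequence

Image = Sequence[Sequence[float]]  # grayscale pixels, row-major


def match_template(image: Image, template: Image) -> tuple[int, int, float]:
    """Slide `template` over `image` and return (row, col, ssd) of the window
    with the smallest sum of squared differences (SSD)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best = (0, 0, float("inf"))
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = 0.0
            for dr in range(th):
                for dc in range(tw):
                    d = image[r + dr][c + dc] - template[dr][dc]
                    ssd += d * d
            if ssd < best[2]:
                best = (r, c, ssd)
    return best


def detect_face(
    image: Image, template: Image, max_ssd: float
) -> Optional[tuple[int, int, int, int]]:
    """Return the best-matching (top, left, height, width) box, or None when
    even the best window differs from the template by more than `max_ssd`."""
    r, c, ssd = match_template(image, template)
    if ssd > max_ssd:
        return None
    return (r, c, len(template), len(template[0]))
```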

The comparison unit 240 compares the face region and/or body shape region extracted by the detection unit 230 with the matching information for the unmasked target generated by the matching information generation unit 222. Based on this comparison, it selects a masking target or masking region in each image frame. Here, a masking target is a person whose face and/or body shape must not be disclosed, to prevent infringement of portrait rights. In other words, the comparison unit 240 compares each face region detected by the face detection unit 234 with the matching information. It then selects masking targets by determining whether each extracted face region matches that information. To decide whether an extracted face region and the matching information belong to the same person, the comparison unit 240 may extract, classify, and recognize facial features from the face region. Various feature extraction and recognition algorithms may be applied for this purpose, such as Gabor filters, PCA (Principal Component Analysis), FDA (Fisher Discriminant Analysis), ICA (Independent Component Analysis), LBP (Local Binary Patterns), and SVM (Support Vector Machine). Deep learning may also be used to improve the matching rate of the comparison unit 240.
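The selection rule can be sketched as follows: any detected face that matches no unmasked target becomes a masking target. This is a minimal sketch under assumptions. It assumes each face region has already been reduced to an embedding vector and compares embeddings by cosine similarity. It does not use any of the specific algorithms listed above. The function names and the default threshold of 0.6 are hypothetical and would need tuning for a real model.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def select_masking_targets(
    face_embeddings: Mapping[int, Sequence[float]],
    matching_info: Sequence[Sequence[float]],
    threshold: float = 0.6,
) -> list[int]:
    """Return the ids of detected faces that match no unmasked target.

    A face whose best similarity to the matching information falls below
    `threshold` is treated as a masking target.
    """
    targets = []
    for face_id, emb in face_embeddings.items():
        best = max(
            (cosine_similarity(emb, ref) for ref in matching_info), default=-1.0
        )
        if best < threshold:
            targets.append(face_id)
    return targets
```

The rule fails safe. A face that cannot be matched confidently, or any face when no matching information exists, is masked rather than exposed.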

The brightness control unit 242 improves the matching rate between the matching information and the extracted face region, which is the output of the comparison unit 240. Specifically, it prevents the matching rate from dropping when the background of an image frame is too dark or too bright. To this end, the brightness control unit 242 may adjust each image frame to a consistent brightness level. This both shortens the time the comparison unit 240 takes to produce its result and improves the accuracy of that result.
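One simple way to give every frame "a certain level of brightness" is to shift its mean intensity to a fixed target. This is a minimal sketch under assumptions. It assumes 8-bit grayscale frames stored as nested lists, a hypothetical target mean of 128, and an additive shift. The source does not say how the brightness control unit 242 actually adjusts brightness. Gain scaling or histogram equalisation would be equally plausible.

```python
from typing import Sequence


def normalize_brightness(
    frame: Sequence[Sequence[float]], target_mean: float = 128.0
) -> list[list[float]]:
    """Shift every pixel of a grayscale frame so the frame's mean brightness
    becomes `target_mean`, clipping results to the 8-bit range [0, 255]."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        raise ValueError("empty frame")
    offset = target_mean - sum(pixels) / len(pixels)
    return [[min(255.0, max(0.0, p + offset)) for p in row] for row in frame]
```

An additive shift keeps all-black frames well defined, which multiplicative gain would not. Clipping means very dark or very bright frames may end slightly off the target mean.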

The masking processing unit 250 masks the face of each masking target selected by the comparison unit 240. In each image frame, it hides at least the face region of everyone other than the unmasked target, whose face may be shown, to protect their personal information. The masking may use mosaic processing.
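Mosaic masking replaces each small tile inside the target region with the tile's average value, as sketched below. This is a minimal sketch under assumptions. It assumes grayscale frames as nested lists and a hypothetical `(top, left, height, width)` box format. A production system would more likely downscale and upscale the region with an image library.

```python
from typing import Sequence

Box = tuple[int, int, int, int]  # (top, left, height, width)


def mosaic(frame: Sequence[Sequence[float]], box: Box, block: int) -> list[list[float]]:
    """Return a copy of `frame` in which the part of `box` lying inside the
    frame is pixelated: each block x block tile becomes its mean value."""
    if block < 1:
        raise ValueError("block size must be at least 1")
    top, left, height, width = box
    top, left = max(0, top), max(0, left)
    bottom = min(box[0] + height, len(frame))
    right = min(box[1] + width, len(frame[0]))
    out = [list(row) for row in frame]
    for r0 in range(top, bottom, block):
        for c0 in range(left, right, block):
            r1, c1 = min(r0 + block, bottom), min(c0 + block, right)
            tile = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            avg = sum(tile) / len(tile)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = avg
    return out
```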

The masking level control unit 252 controls the degree and range of masking according to conditions preset by the content creator. When mosaic processing is used, the degree of masking may be the density of the mosaic pattern, and the masking region may be the face region and/or the body shape region. For example, the entire body shape region containing a masking target's face may need to be masked. In that case, the masking level control unit 252 may receive information about the body shape region from the body shape detection unit 232 and mask the target's entire body shape region.
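A creator-defined masking policy could be represented as shown below. This is a minimal sketch under assumptions. It maps a hypothetical 1 to 5 "level" to the mosaic tile size and a "scope" setting to the region that gets masked. The class name, the level scale, and the 10% of box height per level rule are all illustrative assumptions rather than values from the source.

```python
from dataclasses import dataclass
from typing import Optional

Box = tuple[int, int, int, int]  # (top, left, height, width)


@dataclass(frozen=True)
class MaskingPolicy:
    """Hypothetical creator-set masking policy.

    level: 1 (light) to 5 (heavy); higher levels use coarser mosaic tiles.
    scope: "face" masks only the face box, "body" masks the whole body box.
    """

    level: int
    scope: str = "face"

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 5:
            raise ValueError("level must be between 1 and 5")
        if self.scope not in ("face", "body"):
            raise ValueError("scope must be 'face' or 'body'")

    def block_size(self, box_height: int) -> int:
        # Tile edge is 10% of the box height per level step,
        # never smaller than 2 pixels so some pixelation always occurs.
        return max(2, box_height * self.level // 10)

    def region(self, face_box: Box, body_box: Optional[Box]) -> Box:
        # Fall back to the face box when no body box was detected.
        if self.scope == "body" and body_box is not None:
            return body_box
        return face_box
```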

The frame merging unit 260 encodes the masked image frames according to the video sequence to regenerate the video content. The regenerated video content may then be transmitted to the content user terminal 300 in real time.

The real-time face masking system 200 configured as described above is mounted on a content management server 10 that provides video content, such as live broadcasts, to users in real time. It automatically masks any person whose face and/or body shape should not be exposed in the video content, thereby effectively protecting personal information such as portrait rights.

In addition, the brightness control unit 242 holds every image frame at a consistent brightness level while the matching information is compared with the extracted face region. This improves the accuracy with which masking targets are selected.

FIG. 2 is a flowchart illustrating a face masking method using the real-time face masking system based on face recognition according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the face masking method according to an embodiment of the present invention begins by receiving two kinds of input from the content creator terminal 100: a plurality of images for generating matching information, and the video content. These are stored in the reference information storage unit 220 and the content storage unit 210, respectively. Here, the matching information refers to information about the face and body shape of the unmasked target, whose face and/or body shape may be disclosed in the video content. The video content may be uploaded to the content storage unit 210 after the images for generating matching information have been uploaded to the reference information storage unit 220. This ordering ensures that personal information is protected systematically from the moment a content creator produces video content and provides it to content users. For example, once the images have been uploaded and stored, the reference information storage unit 220 may generate a storage completion signal and transmit it to the content storage unit 210. In response to that signal, the content storage unit 210 receives and stores the video content from the content creator terminal 100.
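The upload ordering driven by the storage completion signal can be sketched with a `threading.Event`. This is a minimal sketch under assumptions. `ReferenceStore` and `ContentStore` are hypothetical stand-ins for units 220 and 210, and the event plays the role of the storage completion signal. Here, a video upload that arrives before any reference images is refused after an optional timeout.

```python
import threading
from typing import Optional


class ReferenceStore:
    """Hypothetical stand-in for the reference information storage unit (220)."""

    def __init__(self) -> None:
        self.images: list[str] = []
        self.stored = threading.Event()  # plays the role of the storage completion signal

    def upload(self, images: list[str]) -> None:
        if not images:
            raise ValueError("at least one reference image is required")
        self.images.extend(images)
        self.stored.set()


class ContentStore:
    """Hypothetical stand-in for the content storage unit (210)."""

    def __init__(self, reference_store: ReferenceStore) -> None:
        self.reference_store = reference_store
        self.videos: list[str] = []

    def upload(self, video: str, timeout: Optional[float] = None) -> bool:
        # Accept the video only once the reference images have been stored;
        # Event.wait returns False if the timeout expires first.
        if not self.reference_store.stored.wait(timeout):
            return False
        self.videos.append(video)
        return True
```

Usage: `ContentStore(ref).upload("stream.mp4", timeout=5.0)` returns `False` unless `ref.upload([...])` has already run.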

Next, the frame separation unit 212 decodes the video content stored in the content storage unit 210 and separates it into individual image frames according to the video sequence. The content creator may freely set how finely the video is separated, that is, the number of frames.
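The creator's choice of frame count could be expressed as a sampling rate, as sketched below. This is a minimal sketch under assumptions. It assumes the creator sets a target frame rate lower than the source rate, and computes which decoded frame indices to keep. The function name and the fps-based interface are hypothetical. The source only says the number of frames is creator-configurable.

```python
import math


def frame_indices(total_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Pick evenly spaced frame indices so roughly `target_fps` frames per
    second are kept from a video decoded at `source_fps`."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(total_frames))  # keep every frame
    step = source_fps / target_fps
    count = math.ceil(total_frames / step)
    # Guard against floating-point rounding producing an out-of-range index.
    return [idx for idx in (int(i * step) for i in range(count)) if idx < total_frames]
```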

Next, the matching information generation unit 222 generates information about the face and/or body shape of the unmasked target, that is, matching information, from the plurality of images stored in the reference information storage unit 220. It may use a deep-learning model such as DeepFace or VGGFace for this purpose.

Next, the face detection unit 234 of the detection unit 230 extracts a face region from each separated image frame. Any one of the knowledge-based, feature-invariant, template matching, and appearance-based methods, or a combination of two or more of them, may be used for this extraction.

Alternatively, to speed up face region extraction in each image frame, the body shape detection unit 232 may first detect the body shape region, and the face region may then be detected within that body shape region.
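The body-first cascade can be sketched as a small wrapper around two detectors. This is a minimal sketch under assumptions. It assumes both detectors are supplied as callables that return `(top, left, height, width)` boxes, and the function names are hypothetical. Faces are searched only inside each body box and then shifted back to full-frame coordinates. If no body is found, the whole frame is searched, matching the description's "body shape region or image frame".

```python
from typing import Callable, Sequence

Box = tuple[int, int, int, int]  # (top, left, height, width)
Frame = Sequence[Sequence[int]]


def crop(frame: Frame, box: Box) -> list[list[int]]:
    top, left, height, width = box
    return [list(row[left:left + width]) for row in frame[top:top + height]]


def detect_faces_body_first(
    frame: Frame,
    detect_bodies: Callable[[Frame], list[Box]],
    detect_faces: Callable[[Frame], list[Box]],
) -> list[Box]:
    """Search for faces only inside detected body boxes, mapping each face box
    back to full-frame coordinates; fall back to the whole frame if no body
    is found."""
    bodies = detect_bodies(frame)
    if not bodies:
        return list(detect_faces(frame))
    faces: list[Box] = []
    for body in bodies:
        top, left, _, _ = body
        for ft, fl, fh, fw in detect_faces(crop(frame, body)):
            faces.append((top + ft, left + fl, fh, fw))
    return faces
```

The speed-up comes from running the more expensive face detector on smaller crops instead of the full frame.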

Next, the comparison unit 240 compares each extracted face region with the matching information to select a masking target or masking region in each image frame. To determine whether the matching information and an extracted face region belong to the same person, facial features may be extracted, classified, and recognized from the face region. Any one or more algorithms selected from Gabor filters, PCA (Principal Component Analysis), FDA (Fisher Discriminant Analysis), ICA (Independent Component Analysis), LBP (Local Binary Patterns), and SVM (Support Vector Machine) may be used for this feature extraction and recognition. To improve the matching rate between the extracted face region and the matching information, the brightness control unit 242 may first bring each image frame to a consistent brightness level. The comparison unit 240 then compares the extracted face region with the matching information and selects the masking target.

Next, the masking processing unit 250 masks the face region of each selected masking target. The masking may use mosaic processing.

Next, the frame merging unit 260 merges and encodes the masked image frames to regenerate the video content, which is then transmitted to the content user terminal 300. Some delay may therefore occur between the time the original video content is uploaded to the content storage unit 210 and the time the regenerated video content reaches the content user terminal 300. Nevertheless, users receive, in real time, video content in which masking prevents infringement of personal information.
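The per-frame order of the method steps can be sketched as a generator that chains them. This is a minimal sketch under assumptions. Each stage is passed in as a callable, so the names `normalize`, `detect`, `is_unmasked`, and `mask` are hypothetical placeholders for units 242, 234, 240, and 250. The sketch also assumes masks are drawn on the original frame rather than the brightness-adjusted copy, because the source does not specify which one is output.

```python
from typing import Any, Callable, Iterable, Iterator

Box = tuple[int, int, int, int]  # (top, left, height, width)


def mask_stream(
    frames: Iterable[Any],
    normalize: Callable[[Any], Any],
    detect: Callable[[Any], list[tuple[Box, Any]]],
    is_unmasked: Callable[[Any], bool],
    mask: Callable[[Any, Box], Any],
) -> Iterator[Any]:
    """Yield each decoded frame with every non-matching face masked.

    Frames are handled one at a time so the output can be re-encoded and
    streamed while later frames are still arriving.
    """
    for frame in frames:
        adjusted = normalize(frame)          # brightness control (242)
        out = frame
        for box, face in detect(adjusted):   # face detection (234)
            if not is_unmasked(face):        # comparison against matching info (240)
                out = mask(out, box)         # masking, e.g. mosaic (250)
        yield out                            # handed to the frame merging unit (260)
```

Using a generator keeps the added latency to roughly one frame's processing time per stage, rather than waiting for the whole video to be processed before encoding begins.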

Although the present invention has been described in detail with reference to preferred embodiments, it is not limited to those embodiments. Those of ordinary skill in the art may make various modifications within the scope of the technical spirit of the present invention.

10: content management server
100: content producer terminal
200: face masking system
210: content storage unit
212: frame separation unit
220: reference information storage unit
222: matching information generation unit
230: detection unit
232: body shape detection unit
234: face detection unit
240: comparison unit
242: brightness control unit
250: masking processing unit
252: masking level control unit
260: frame merging unit
300: content user terminal

Claims

a content storage unit for receiving and storing video content from a content producer terminal;
a frame separator for dividing the video content into a plurality of image frames according to a video sequence;
a reference information storage unit for receiving and storing a plurality of images including a face of an unmasked target from the content creator terminal;
a matching information generator for generating matching information including facial features of the unmasked target from the plurality of images;
a face detection unit for extracting a face region from each of the plurality of image frames through a template matching method;
a comparison unit configured to compare the face region extracted from each of the plurality of image frames with the matching information to select a masking process target region; and
a masking processing unit configured to perform masking processing on the masking processing target area selected by the comparison unit in each of the plurality of image frames;
wherein the matching information generator generates matching information on the face and body shape of the unmasked target from the plurality of images stored in the reference information storage unit, using DeepFace or VGGFace (Visual Geometry Group Face) based on deep learning technology; the DeepFace extracts landmarks using a pre-trained 3D face geometry model, performs face alignment by affine transformation, and generates matching information with a 9-layer convolutional neural network trained on a large amount of data collected internally by Facebook; and the VGGFace generates matching information with a deep network of 15 convolutional layers trained on the VGG face dataset, a publicly released large-scale face recognition dataset built through Internet searches,
Real-time face masking system based on face recognition.

The real-time face masking system based on face recognition according to claim 1, further comprising:
a frame merging unit for regenerating video content by merging each of the plurality of image frames on which the masking process is completed according to the video sequence;
a brightness control unit for adjusting the brightness of each of the plurality of image frames before the comparison unit compares the extracted face region with the matching information to select a masking processing target region;
a body shape detection unit for extracting a body shape region from each of the plurality of image frames; and
a masking level control unit for controlling the degree and range of the masking process when the masking process is performed on the masking processing target region.

The real-time face masking system based on face recognition according to claim 2,
wherein the brightness control unit adjusts each of the plurality of image frames to the same brightness level.

delete