KR102360919B1

KR102360919B1 - A host video directing system based on voice dubbing

Info

Publication number: KR102360919B1
Application number: KR1020210068825A
Authority: KR
Inventors: 육은영; 전성호
Original assignee: 주식회사 유콘
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2022-02-09

Abstract

A voice dubbing-based host video editing system according to the present invention comprises: an image database for receiving a plurality of images including the face of a host and setting and storing a representative image; a region detection module for detecting a face region and a mouth region from the face region from the images stored in the image database; an object extraction module for extracting and storing a mouth-shaped object for each pronunciation of a voice from the mouth region; a voice management module including a voice input unit for receiving an information voice including guide information and a pronunciation analysis unit for analyzing pronunciation of the information voice; a synthesis module for generating a synthesized image by synthesizing the mouth-shaped object into the mouth region of the representative image depending on the pronunciation of the information voice; and an output module for outputting the synthesized image by dubbing the information voice. Accordingly, it is possible to provide a more natural image.

Description

A host video editing system based on voice dubbing {A HOST VIDEO DIRECTING SYSTEM BASED ON VOICE DUBBING}

The present invention relates to a host video editing system based on voice dubbing. More specifically, it relates to a system that synthesizes an information voice, which conveys information, into an image of a host while reshaping the host's mouth to match the pronunciation of the information voice, so that information can be delivered even if the host does not actually film every video.

News, and more preferably TV news, is video that reports on events that occurred that day or are still in progress according to the 5W1H principle: who, when, where, what, why, and how.

Such news features a host, typically an announcer, and conventional news has therefore been produced by filming the host delivering the news in a studio.

However, conventional news production requires filming the host delivering the information in a studio every time, which imposes constraints of time and place and means that a host is always needed to provide news.

In addition, deepfake technology, a term that refers both to fake videos produced with artificial intelligence (AI) and to the production process itself, has recently come into wide use. The word deepfake is a compound of deep learning and fake. It is a technique that uses a machine learning (ML) method called a generative adversarial network (GAN) to superimpose existing photos or videos onto an original.

Accordingly, techniques that superimpose existing photos or videos onto an original, and thereby modify the original to produce a variety of additional videos, have been introduced and are in use.

As prior art in this regard, Korean Patent Registration No. 10-0682889 discloses an 'image-based realistic three-dimensional face modeling method and apparatus'.

That invention relates to an image-based method and apparatus for realistic three-dimensional face modeling. The method generates a realistic three-dimensional face model represented by a textured polygonal mesh model and comprises: detecting facial features in frontal and profile face images of an input image; generating a three-dimensional face model by deforming an initial model using the detected facial features to determine the three-dimensional shape of a specific face; generating a realistic texture from the input image; and mapping the texture onto the three-dimensional model.

However, because the prior art described above generates a three-dimensional face model and a texture, the entire face must be modeled from the facial features, so generating an image takes a long time and the amount of information to be processed is large.

Therefore, to solve the problems described above, there is a need to develop a host video editing system based on voice dubbing that can provide a synthesized image from existing images without filming the host each time information is to be provided, and that reduces the amount of processing by changing only the shape of the mouth.

The main object of the present invention is to make it possible to generate and provide a synthesized image in which the mouth shape is synthesized onto an existing image to match the pronunciation of an information voice, even if the host who appears in the image and provides the information does not film a new video each time.

Another object of the present invention is to provide a more realistic synthesized image by reflecting emotion-dependent changes of the face when the synthesized image is generated.

Yet another object of the present invention is to perform correction processing on the image in order to provide a more natural image.

A further object of the present invention is to remove constraints of location by enabling synthesis of the background as well as of the host's mouth shape.

To achieve the above objects, a host video editing system based on voice dubbing according to the present invention comprises: an image database that receives a plurality of images including the face of a host and sets and stores a representative image; a region detection module that detects, in the plurality of images stored in the image database, a face region and a mouth region within the face region; an object extraction module that extracts from the mouth region, and stores, a mouth-shaped object for each pronunciation of a voice; a voice management module including a voice input unit that receives an information voice containing guide information and a pronunciation analysis unit that analyzes the pronunciation of the information voice; a synthesis module that generates a synthesized image by synthesizing the mouth-shaped object into the mouth region of the representative image according to the pronunciation of the information voice; and an output module that dubs the information voice onto the synthesized image and outputs the result.

Furthermore, the voice input unit further includes an additional-information input part that receives emotion information for the guide information; the object extraction module further includes an emotion classification unit that classifies the plurality of mouth-shaped objects by emotion and an object extraction unit that extracts, as emotion-matching objects, the mouth-shaped objects that match the emotion information of the guide information; and the synthesis module generates the synthesized image by synthesizing the emotion-matching objects into the mouth region of the representative image according to the pronunciation of the information voice.

In addition, the synthesis module further includes a boundary correction unit comprising a skin pixel detection part that detects skin pixels at the boundary between the mouth region and the face region in the synthesized image, a boundary correction part that corrects the saturation and brightness of the boundary, and a smudging part that smudges the boundary.

Further, the image database receives and stores a background image showing either an indoor or an outdoor location; the region detection module includes a background detection unit that distinguishes, in the plurality of images, an object region containing the host from a background region consisting of everything outside the object region; and the synthesis module includes a background synthesis unit that synthesizes the background image into the background region.

The host video editing system based on voice dubbing according to the present invention has the following effects:

1) It generates a synthesized image by synthesizing, onto a previously stored image of the host, mouth shapes that match the pronunciation of an input information voice, and provides it together with the information voice, so that information can be delivered without the host filming a video each time while also reducing the amount of processing.

2) It can generate a synthesized image that reflects the emotional state, since the mouth shape for the same pronunciation differs with emotion, thereby increasing realism.

3) It provides a more natural image by enabling blurring and boundary correction around the synthesized mouth shape.

4) It can provide a more realistic and immersive image by allowing various backgrounds to be synthesized as well.

FIG. 1 is a conceptual diagram showing a schematic configuration of the system of the present invention.
FIG. 2 is a block diagram showing the overall configuration of the system of the present invention.
FIG. 3 is a conceptual diagram illustrating a face region and a mouth region.
FIG. 4 is a conceptual diagram illustrating an example screen of a synthesized image including an object region and a background region according to the present invention.
FIG. 5 is a conceptual diagram illustrating a highlight area, a shade area, a point area, and a shadow area according to the present invention.
FIG. 6 is a conceptual diagram illustrating a gradation area calculated from a virtual connecting line and a virtual progress line.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The accompanying drawings are not drawn to scale, and like reference numbers in each drawing refer to like elements.

FIG. 1 is a conceptual diagram showing a schematic configuration of the system of the present invention.

Referring to FIG. 1, the host video editing system of the present invention comprises a host (1) and a central control server (2).

The host (1) is a person who appears in an image and provides guide information, and may be a news announcer or anchor, a weather forecaster, a show host for home shopping or the like, or the host (1) of another TV program.

The host (1) is the person who actually appears in the images and in the synthesized image generated through synthesis and, as described later, may also be the person who provides the information voice. The host (1) can thus be regarded as the person who lends their face by appearing in an image that provides guide information.

Accordingly, to generate a synthesized image, the host (1) must provide the system of the present invention with a plurality of previously filmed images of themself. These are images in which the host (1) was filmed presenting existing news, programs, or weather forecasts.

The central control server (2) is the entity that generates the synthesized image through the host video editing system of the present invention. It generates the synthesized image by synthesizing mouth shapes onto the representative image to match the mouth shapes of the information voice, and dubs the information voice onto the synthesized image for delivery.

The central control server (2) is the collection of components that implements the system of the present invention and includes a server PC, a network, and the like. The central control server (2) is built on hardware having a central processing unit (CPU) and storage means such as memory and a hard disk, on which a program executable by the CPU, that is, software, is installed and run. The specific configuration of this software is described below in terms of structural units such as 'module', 'unit', and 'part'.

Such a 'module', 'unit', 'interface', or 'part' means software that is installed and stored in the storage means of the central control server (2) and executed by way of the CPU and memory, or a hardware component such as an FPGA or ASIC. The terms 'module', 'unit', and 'interface' are not limited to hardware; such a component may be configured to reside in an addressable storage medium or to execute on one or more processors. As an example, a 'module', 'unit', or 'interface' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functions provided by these 'modules', 'units', or 'interfaces' may be combined into a smaller number of components, 'units', or 'modules', or further separated into additional components, 'units', or 'modules'.

In addition, the central control server (2) means any kind of hardware device that includes at least one processor and, depending on the embodiment, may be understood to also encompass the software running on that hardware device. For example, a computing device serving as the server may be understood to include, without limitation, a smartphone, tablet PC, desktop, or laptop, together with the user clients and applications running on each device.

Hereinafter, the detailed configuration of the host video editing system based on voice dubbing of the present invention, built on this central control server (2), is described with reference to the drawings.

FIG. 2 is a block diagram showing the overall configuration of the system of the present invention, and FIG. 3 is a conceptual diagram illustrating a face region and a mouth region.

Referring to FIGS. 2 and 3, the host video editing system based on voice dubbing of the present invention basically includes an image database (100), a region detection module (200), an object extraction module (300), a voice management module (400), a synthesis module (500), and an output module (600).

The image database (100) receives and stores a plurality of images in which the host (1) appears and the face of the host (1) is captured, and sets a representative image from among them. As described above, these images may be news, weather forecast, TV show, or home shopping footage, or any other footage in which the host (1) appears and provides information.

Accordingly, the input may consist of previously produced images of the host (1), or each new image in which the host (1) appears may be stored in the image database (100) as it is produced.

There is no limit on the number of stored images, but the database should preferably hold at least one image, including the representative image. Because more images mean more material for synthesis, the number of images stored in the image database (100) may be in the tens, hundreds, thousands, or even tens of thousands. To generate a more realistic synthesized image, the number of stored images is preferably large enough to constitute big data.

There is likewise no restriction on where the images are filmed: studio, outdoor, and set footage are all acceptable, so images filmed at various places and times can all be stored in the image database (100) and used to produce the synthesized image described later.

The image database (100) also sets and stores at least one of the images as the representative image. The representative image is the original image before the synthesized image is generated, that is, the original onto which mouth shapes are synthesized. Any image can serve as the representative image, but most preferably it is an image of the host (1) filmed in a studio with a solid-color background to ease synthesis, such as footage from a conventional weather forecast or news studio.

The representative image that serves as the original for synthesis is therefore preferably filmed indoors in a studio rather than outdoors, and may be filmed in an indoor studio with a solid-color background so that the object (the host (1)) is easy to separate from the background. There may be one representative image or several, and the number is not limited.

The region detection module (200) detects, in the images of the host (1) stored in the image database (100), the face region (10) of the host (1) and the mouth region (11) within the face region (10). The detection method for the face region (10) and the mouth region (11) is not limited, but preferably the outline of the face is detected and the area inside the closed curve it forms is taken as the face region (10), and within the face region (10) the outline of the lips is detected and the area it encloses is taken as the mouth region (11).
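As a rough illustration of the preferred approach above, the following Python sketch treats an already detected lip outline as a closed polygon and derives the mouth region from it. The outline coordinates, the function names `bounding_box` and `inside_outline`, and the use of ray casting are assumptions made for illustration; the patent does not specify how the outline is obtained or how membership in the enclosed area is tested.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(outline: List[Point]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the closed outline."""
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    return min(xs), min(ys), max(xs), max(ys)


def inside_outline(outline: List[Point], px: float, py: float) -> bool:
    """Ray-casting test: is pixel (px, py) inside the closed outline?"""
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


# Hypothetical lip outline in pixel coordinates (not taken from the patent).
lip_outline = [(40, 60), (60, 55), (80, 60), (60, 70)]
mouth_box = bounding_box(lip_outline)
```

The same two helpers could be applied first to the face outline and then to the lip outline, mirroring the nesting of the mouth region (11) inside the face region (10).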

Since each image is one in which the face of the host (1) has been captured, the module detects only the face region (10) of the host (1) in the image and, within the face region (10), only the mouth region (11) that shows the mouth shape.

The object extraction module (300) extracts and stores a mouth-shaped object for each pronunciation of a voice from the mouth region (11) detected in each of the plurality of images stored in the image database (100).

Because each image conveys information, the host (1) delivers the content by voice, and the mouth region (11) moves to match the pronunciation of that voice. For example, if the content is '안녕하세요' ('hello'), the host pronounces the syllables '안', '녕', '하', '세', and '요' corresponding to the text of the content.

The mouth shape differs for the pronunciation of each syllable, and the object extraction module (300) extracts and stores a mouth-shaped object for each such pronunciation. That is, only the mouth region (11) is detected and cropped from each image, and the cropped mouth region (11) is stored per pronunciation. For example, the portion of an image obtained by cropping only the mouth region (11) while the host pronounces '안' becomes the mouth-shaped object for the pronunciation '안'.
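The per-pronunciation storage described above can be sketched as a simple lookup table keyed by syllable. The class and field names below (`MouthObject`, `MouthObjectStore`, `source_image_id`, `frame_index`, `box`) are hypothetical, and keeping several candidate crops per syllable is an assumption; the patent only states that a mouth-shaped object is stored for each pronunciation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MouthObject:
    """One cropped mouth region, labeled with the syllable being pronounced."""
    syllable: str
    source_image_id: str             # which stored image the crop came from
    frame_index: int                 # frame within that image
    box: Tuple[int, int, int, int]   # left, top, right, bottom of the crop


class MouthObjectStore:
    """Keeps mouth-shaped objects grouped by pronunciation."""

    def __init__(self) -> None:
        self._by_syllable: Dict[str, List[MouthObject]] = defaultdict(list)

    def add(self, obj: MouthObject) -> None:
        self._by_syllable[obj.syllable].append(obj)

    def lookup(self, syllable: str) -> List[MouthObject]:
        # Return a copy so callers cannot mutate the store by accident.
        return list(self._by_syllable.get(syllable, []))


store = MouthObjectStore()
store.add(MouthObject("안", "news_001", 120, (40, 55, 80, 70)))
```

A lookup for a syllable that was never observed returns an empty list, which makes missing pronunciations easy to detect before synthesis.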

The mouth-shaped objects are preferably stored per Korean pronunciation, but for a foreign language it is equally possible to crop only the mouth region (11) from the image for each pronunciation of that language and store it as a per-pronunciation mouth-shaped object.

Furthermore, because Korean has standard pronunciation rules, it is also possible to convert the content delivered in an input image to text, rewrite the text as it is pronounced under the standard pronunciation rules, split it into syllables, and crop the mouth region (11) of the image corresponding to each syllable to extract and store it as a per-pronunciation mouth-shaped object.
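The syllable-splitting step can be sketched in Python using the arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3). This is only an illustration of splitting already-pronounced text into syllables; it does not apply the standard pronunciation rules themselves, and the function names `split_syllables` and `decompose` are hypothetical. Decomposing a syllable into its initial, medial, and final sounds is an added assumption that might help group similar mouth shapes.

```python
from typing import List, Tuple

HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"           # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"       # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def is_hangul_syllable(ch: str) -> bool:
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def split_syllables(text: str) -> List[str]:
    """Keep only precomposed Hangul syllables, one list entry per syllable."""
    return [ch for ch in text if is_hangul_syllable(ch)]


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one syllable into (initial, medial, final); final may be ''."""
    if len(syllable) != 1 or not is_hangul_syllable(syllable):
        raise ValueError("expected a single precomposed Hangul syllable")
    index = ord(syllable) - HANGUL_BASE
    initial = index // (21 * 28)
    medial = (index % (21 * 28)) // 28
    final = index % 28
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


syllables = split_syllables("안녕하세요!")
```

Each entry of `syllables` could then be paired with the frames in which it is spoken to label the cropped mouth regions.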

The voice management module (400) includes a voice input unit (410) and a pronunciation analysis unit (420); it receives the information voice to be delivered through the newly produced synthesized image and analyzes the pronunciation of that information voice.

The voice input unit (410) receives an information voice containing guide information, where the guide information is the content to be delivered. For example, if the synthesized image to be provided is a weather forecast, the guide information may be '내일의 날씨는 맑겠습니다' ('Tomorrow's weather will be clear'), and the information voice is the voice reading that content aloud.

The person providing the information voice is preferably the host (1), but someone other than the host (1) may provide it instead, and the information voice is preferably input with clear pronunciation. Alternatively, the guide information may be entered as text and converted to speech through a text-to-speech (TTS) function, with the result input as the information voice.

The pronunciation analysis unit (420) analyzes the pronunciation of the information voice. Because the text of the guide information and its actual pronunciation can differ, the unit analyzes the pronunciation of the information voice, which is the spoken form of the guide information.

For example, when the guide information is '내일의 날씨는 맑겠습니다' as described above, the pronunciation of the information voice becomes '내일의 날씨는 말?습니다'. The pronunciation of the information voice is preferably analyzed on the basis of the standard pronunciation rules.

The synthesis module (500) generates the synthesized image by synthesizing, into the mouth region (11) of the representative image, the mouth-shaped object corresponding to each pronunciation of the information voice as analyzed by the pronunciation analysis unit (420).
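At its simplest, this synthesis step amounts to overwriting the mouth region of a representative frame with a stored mouth crop. The sketch below models a frame as a nested list of pixel values and uses a hypothetical `paste_mouth` helper; a real system would work on actual image buffers and, as the patent describes elsewhere, correct the boundary afterwards.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real frame


def paste_mouth(frame: Image, patch: Image, left: int, top: int) -> Image:
    """Return a copy of frame with patch written at (left, top)."""
    out = [row[:] for row in frame]  # leave the representative frame untouched
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            y, x = top + dy, left + dx
            # Ignore patch pixels that would fall outside the frame.
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = value
    return out


representative = [[0] * 4 for _ in range(4)]
mouth_patch = [[9, 9], [9, 9]]
synthesized = paste_mouth(representative, mouth_patch, left=1, top=2)
```

Returning a copy keeps the representative image reusable for every later frame of the synthesized image.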

That is, in step with the pronunciation and playback speed of the information voice, the mouth-shaped object corresponding to each pronunciation is synthesized into the mouth region (11) detected in the representative image that serves as the original, producing a synthesized image in which the mouth shape of the representative image has been changed.

If there is only one representative image, the mouth-shaped objects are synthesized onto that image; if there are several, they are synthesized onto one of them. The representative image is preferably chosen by a system administrator, or the system may select one at random.
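The selection rule above is small enough to sketch directly. The function name `choose_representative`, the use of string identifiers for images, and the optional `rng` parameter (added only so the random choice can be made reproducible) are assumptions for illustration.

```python
import random
from typing import Optional, Sequence


def choose_representative(
    candidates: Sequence[str],
    admin_choice: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the representative image: the administrator's choice, else random."""
    if not candidates:
        raise ValueError("at least one representative image is required")
    if admin_choice is not None:
        if admin_choice not in candidates:
            raise ValueError("admin_choice is not a stored representative image")
        return admin_choice
    # With a single candidate this always returns that candidate.
    return (rng or random).choice(candidates)


picked = choose_representative(["studio_a", "studio_b"], admin_choice="studio_b")
```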

As described above, when the guide information is '내일의 날씨는 맑겠습니다', the pronunciation of the information voice becomes '내일의 날씨는 말?습니다'. Accordingly, in step with the pronunciation and playback speed of the information voice, the mouth-shaped objects pronouncing the syllables '내', '일', '의', '날', '씨', '는', '말', '?', '습', '니', and '다' are synthesized into the mouth region (11) of the representative image so that they correspond to the content and pronunciation of the information voice as well as to its playback speed and length.
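Matching mouth shapes to playback speed can be sketched as assigning one syllable to each output frame. The sketch assumes the pronunciation analysis yields a start and end time in milliseconds for every syllable and that the output has a fixed frame rate; neither detail is stated in the patent, and `syllable_per_frame` is a hypothetical name. Frames that fall in a pause get `None`, which could mean leaving the representative image's mouth unchanged.

```python
from typing import List, Optional, Tuple

SyllableTiming = Tuple[str, int, int]  # (syllable, start_ms, end_ms)


def syllable_per_frame(
    timings: List[SyllableTiming], fps: int
) -> List[Optional[str]]:
    """For each output frame, return the syllable spoken at that instant."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not timings:
        return []
    total_ms = max(end for _, _, end in timings)
    frame_count = (total_ms * fps + 999) // 1000  # ceiling division
    frames: List[Optional[str]] = []
    for i in range(frame_count):
        t_ms = i * 1000 / fps  # timestamp of frame i
        current: Optional[str] = None
        for syllable, start, end in timings:
            if start <= t_ms < end:
                current = syllable
                break
        frames.append(current)
    return frames


# Hypothetical timings for the first three syllables of the example sentence.
plan = syllable_per_frame([("내", 0, 200), ("일", 200, 500), ("의", 600, 800)], fps=10)
```

Each non-empty entry of `plan` would then be looked up among the stored mouth-shaped objects and synthesized onto the corresponding frame.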

Finally, the output module 600 dubs the information voice onto the synthesized image and outputs it. If the representative image from which the synthesized image was made already contains sound such as a voice, that sound is muted and the audio of the synthesized image is overwritten with the information voice, so that the information voice is output as the sound accompanying the synthesized image.

Thus, with this video editing system, the information voice to be delivered is input separately, and a synthesized image is generated in which only the mouth region 11 of previously recorded footage of the host 1 is changed to the mouth shapes corresponding to the pronunciation of the information voice. The input information voice is then dubbed onto that synthesized image for output. As a result, even though the host 1 does not record new footage each time, it appears to the viewer that the host 1 is on screen delivering the guide information.

Moreover, because only the mouth shape is synthesized, less information has to be processed than when the entire face of the host 1 is deepfaked. Video production therefore becomes simpler, and a large number of synthesized images can be generated and provided more quickly.

To provide greater realism, the system may reflect the fact that even the mouth shape for the same pronunciation can differ somewhat depending on emotional state, so that emotion is expressed through the mouth shape and a more realistic synthesized image results. For example, for the same 'ah' sound, the corners of the mouth within the mouth region 11 may be slightly raised for a positive emotion and may droop downward for a negative emotion.

Accordingly, the system may further include components that change the mouth shape, preferably the height of the corners of the mouth, to reflect the content of the information voice, that is, the emotional state of the guide information. To this end, the voice input unit 410 may first further include an additional information input part 411.

The additional information input part 411 receives emotion information for the guide information. When the guide information expresses a particular emotion, for example bright and joyful content, emotion information such as 'positive mood', 'liveliness', or 'joy' can be entered; for sad and gloomy content, emotion information such as 'gloom' or 'sadness' can be entered.

That is, when the guide information carries a particular emotion, its emotional state, which may be hard to judge from text or voice alone, can be entered separately when the information voice is input. This allows the system to identify the emotional state of the guide information corresponding to the information voice. The emotion information is preferably entered by the person who recorded the information voice, or it may be entered by the system administrator.

Furthermore, the object extraction module 300 may further include an emotion classification unit 310 and an object extraction unit 320, in order to classify the mouth-shaped objects by emotion and extract those that match the guide information.

The emotion classification unit 310 classifies the stored mouth-shaped objects for each pronunciation by emotion. The classification is preferably based on the position of the corners of the mouth. Even among mouth-shaped objects for the same pronunciation, one with raised corners generally indicates a positive emotion and can be classified under positive keywords such as 'joy', 'pleasure', 'brightness', 'cheerfulness', and 'happiness'. One with drooping corners generally corresponds to a negative emotion and can be classified under 'sadness', 'gloom', 'tears', 'unhappiness', and the like. When the corners of the mouth are level, with no particular change, the object can be classified as expressing a neutral emotion such as 'no emotion' or 'impassive'.

The emotion classification unit 310 may classify into fine-grained emotions as above, or more simply into only three classes: 'positive', 'negative', and 'no emotion' (neutral).

The classification may be performed by the system administrator. Alternatively, the system may identify the emotion by analyzing the entire face region 10 of the image and assign that emotion to the mouth-shaped object, thereby classifying the objects by emotion.

The object extraction unit 320 extracts, from all stored mouth-shaped objects, only those matching the emotion information entered through the additional information input part 411, and treats them as emotion-matching objects.

For example, when the emotion information is a positive emotion, only the mouth-shaped objects classified as expressing a positive emotion are extracted as emotion-matching objects; when the emotion information carries a negative emotion, the mouth-shaped objects classified as expressing a negative emotion are extracted as emotion-matching objects.

Furthermore, as in the basic configuration described above, the synthesis module 500 generates the synthesized image by compositing mouth-shaped objects into the mouth region 11 of the representative image in step with the pronunciation and playback speed of the information voice. Here, however, it reflects emotion by compositing the emotion-matching objects into the mouth region 11 of the representative image in step with that pronunciation and playback speed.

For example, even when compositing mouth-shaped objects for the same syllables '안', '녕', '하', '세', '요' (the greeting 'hello'), the objects are taken from the emotion-matching objects classified as positive if the content is positive, and from the emotion-matching objects classified as negative if the content carries a negative emotion.
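The emotion classification and extraction described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `MouthObject` record, the `image_id` field, the English keyword spellings, and the romanized syllable labels are hypothetical, and the coarse table merely mirrors the example keywords given in the text.

```python
from dataclasses import dataclass

# Fine-grained keywords mapped to the three coarse classes named in the text.
COARSE = {
    "joy": "positive", "pleasure": "positive", "brightness": "positive",
    "cheerfulness": "positive", "happiness": "positive",
    "sadness": "negative", "gloom": "negative", "tears": "negative",
    "unhappiness": "negative",
    "no emotion": "neutral", "impassive": "neutral",
}


@dataclass(frozen=True)
class MouthObject:
    syllable: str   # pronunciation this mouth shape belongs to
    emotion: str    # label assigned by the emotion classification step
    image_id: str   # hypothetical handle to the stored mouth image


def coarse_emotion(label):
    """Reduce a fine-grained label to positive/negative/neutral when known."""
    return COARSE.get(label, label)


def extract_emotion_matching(objects, emotion_info):
    """Keep only the objects whose emotion class matches the entered emotion."""
    target = coarse_emotion(emotion_info)
    return [o for o in objects if coarse_emotion(o.emotion) == target]


def objects_for_voice(objects, syllables, emotion_info):
    """Pick one emotion-matching object per syllable (None if none is stored)."""
    matching = {o.syllable: o for o in extract_emotion_matching(objects, emotion_info)}
    return [matching.get(s) for s in syllables]


store = [
    MouthObject("an", "positive", "an_up"),
    MouthObject("an", "negative", "an_down"),
    MouthObject("nyeong", "joy", "nyeong_up"),
]
print([o.image_id if o else None
       for o in objects_for_voice(store, ["an", "nyeong"], "happiness")])
```

Returning `None` for a syllable with no emotion-matching object leaves the fallback policy open; the text does not say what the system does in that case.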

In this way the emotional state that the guide information is meant to convey also appears in the mouth-shape synthesis. Expressing the emotional state through the mouth region 11, and in particular through the direction of the corners of the mouth, makes it possible to generate and provide a more realistic synthesized image.

In addition, to generate a more realistic synthesized image, the boundary of the mouth region 11 into which the mouth-shaped object has been inserted can be smudged, that is, blurred, so that the composited part does not stand out. To this end, the synthesis module 500 may further include a boundary correction unit 510. The boundary correction unit 510 includes a skin pixel detection part 511, a boundary correction part 512, and a smudging part 513.

The skin pixel detection part 511 detects skin pixels at the boundary between the mouth region 11 and the face region 10 in the generated synthesized image, that is, along the outline of the composited mouth region 11. Skin pixels here are pixels showing a typical skin color, ranging from ivory through apricot to brown. The part detects, among the pixels close to the outline of the composited mouth region 11, those that have a skin color.

The boundary correction part 512 corrects the saturation and lightness of the detected boundary area. By correcting color and brightness, it reduces the differences in saturation and brightness that compositing the mouth-shaped object can produce between the face region 10 and the mouth region 11, and it evens out the tone of the skin pixels in the boundary area to create a natural-looking boundary.

The smudging part 513 smudges the boundary area, that is, it blurs the composited boundary so that the edge spreads softly and the boundary of the composited part cannot be made out. This makes the skin of the face region 10 look cleaner, and because the blurred boundary is hard to recognize, a more natural synthesized image can be provided.
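The smudging step can be illustrated with a small sketch. The text does not specify a filter, so a plain box blur restricted to a band around the pasted region's outline is swapped in here as an assumption; skin-pixel detection and the saturation/lightness correction are omitted. The function names, the grayscale list-of-lists image, and the `band` and `radius` parameters are all hypothetical.

```python
def near_outline(y, x, box, band):
    """True if (y, x) lies within `band` pixels of the outline of `box`.

    box = (top, left, bottom, right), inclusive bounds of the pasted mouth region.
    """
    top, left, bottom, right = box
    in_outer = top - band <= y <= bottom + band and left - band <= x <= right + band
    in_inner = top + band <= y <= bottom - band and left + band <= x <= right - band
    return in_outer and not in_inner


def smudge_boundary(img, box, band=1, radius=1):
    """Box-blur only the pixels near the outline of `box`; leave the rest as is.

    img: 2-D list of grayscale values (0-255). Returns a new image.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not near_outline(y, x, box, band):
                continue
            # Average over the (2*radius+1)^2 neighbourhood, clipped to the image.
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = round(sum(vals) / len(vals))
    return out


# 7x7 "face" of brightness 100 with a 3x3 pasted "mouth" of brightness 200.
face = [[100] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        face[y][x] = 200
smoothed = smudge_boundary(face, box=(2, 2, 4, 4))
print(smoothed[3])
```

Reading from the original image while writing to a copy keeps each output pixel independent of the order in which pixels are visited.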

FIG. 4 is a conceptual diagram showing an example screen of a synthesized image including an object region and a background region according to the present invention.

In the present invention, the synthesized image is generated by detecting the object in the representative image, that is, the face region 10 of the host 1, and compositing only into the mouth region 11 within the face region 10. However, because the representative image is preferably indoor studio footage that is easy to composite, as described above, the result has the limitation of being confined to indoor settings.

To overcome this, as can be seen from FIG. 4, the system may also be configured to composite the background when generating the synthesized image.

To this end, the image database 100 may additionally receive and store background images shot at either an indoor or an outdoor location. A background image is stored separately from the footage of the host 1 described above and shows only a landscape or a particular place. Outdoor backgrounds may be shot at different times, such as daytime, night, afternoon, evening, and morning, or under different weather conditions.

Thus, by filming various indoor and outdoor spaces and receiving and storing the resulting background images, the system can use them as a database of backgrounds.

In addition, the region detection module 200 may further include a background detection unit 210, which divides the footage of the host 1 input to the image database 100 into an object region 20, the region containing the host 1 as the principal object, and a background region 30, the remaining region outside the object region 20.

In the footage, the host 1 may sit or stand still but may also move. The system therefore recognizes the host 1 as the principal object of the footage and detects the object region 20, the region of the host 1, separately from the background region 30 that excludes it.

The synthesis module 500 may further include a background synthesis unit 520, which composites a background image into the background region 30 of the representative image. This makes it possible not only to change the mouth shape of the host 1 in the representative image but also to change the background.

Thus, by changing only the background of a representative image shot in an indoor studio, a synthesized image can be produced that appears to have been filmed in various places suited to the content of the information voice, and video can be provided free of constraints of time and place.

Going further, a background image may be shot outdoors as well as indoors, and an outdoor background image varies with weather and time. Even for background images shot at the same outdoor location, the atmosphere at a given time of day differs depending on whether the weather is clear, cloudy, or rainy, and the feel of the background differs by time of day, such as morning, midday, evening, and night.

Preferably, therefore, the same outdoor location is filmed repeatedly at different times and under different weather conditions, so that a more vivid and realistic synthesized image can be generated and provided. Realism is maximized when the composited background matches the time and weather at the moment the synthesized image is shown to the viewer.

To produce such realistic video, the video editing system of the present invention may further include an environment identification module 700, which may comprise an external environment identification unit 710 and a background classification unit 720.

The external environment identification unit 710 determines the weather information and time information for the place where the composited background image was shot, as of the moment the output module 600 outputs the synthesized image, that is, the moment the synthesized image is shown to the viewer. The time information is obtained from the output time of the synthesized image, and the weather information is obtained by checking the meteorological administration's weather forecast for that time and place.

The background classification unit 720 classifies the background images stored in the image database 100 by weather information and time information. Because background images shot at the same place can differ by weather and time, a set of background images shot in front of Exit 3 of Seolleung Station, for example, may be divided into footage of clear weather at 5 p.m., rainy weather at 5 p.m., snowy weather at 5 p.m., clear weather at 6 a.m., rainy weather at 6 a.m., rainy weather at 8 p.m., snowy weather at 8 p.m., and so on.

That is, even for background images shot at the same place, the weather information and time information are recorded so that the time and weather of each background image are known and the background can be composited on that basis. The more background images are available, the more effective this configuration becomes.

Once the weather information and time information of the background images are known in this way, the background synthesis unit 520 composites into the background region 30 of the representative image the background image that matches the weather information and time information of the relevant place at the time the output module 600 outputs the synthesized image.

When guide information is provided as video, it is naturally desirable to provide a background image that corresponds to the guide information. If the guide information is weather information for Gangnam-gu, Seoul, and the synthesized image is assumed to be output at 1:30 p.m. on May 5, 2020, the system determines the weather in Gangnam-gu, Seoul at 1:30 p.m. on May 5 and composites into the background region 30 of the representative image a background image of Gangnam-gu, Seoul whose weather information and time information match.

To continue the example, if the weather in Gangnam-gu, Seoul at 1:30 on May 5 is determined to be clear, a clear-weather background image shot outdoors in Gangnam-gu, Seoul at around 1 p.m. in May is composited, giving the viewer the impression that the video is being filmed and delivered in real time.

The same applies to locations abroad. If the guide information is a story about Paris in May, it is preferable to composite a background image shot outdoors in Paris in May. If the generated synthesized image is output to the viewer at 1:30 on May 5, a background image matching the weather in Paris at 1:30 on May 5, and in particular one with the same time of day and seasonal feel, is composited, so that it appears as if the host 1 had gone to Paris to film the video.
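The background selection in the examples above can be sketched as a filter over classified clips. This is a minimal sketch under assumptions: the `BackgroundClip` record, its field names, the file paths, and the rule of picking the clip with the nearest hour when no exact time match exists are all hypothetical; the text only requires that weather and time match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackgroundClip:
    place: str     # where the background was shot
    weather: str   # e.g. "clear", "rain", "snow"
    hour: int      # hour of day (0-23) at which it was shot
    path: str      # hypothetical storage location


def hour_gap(a, b):
    """Distance between two hours on a 24-hour clock."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def select_background(clips, place, weather, hour):
    """Pick the clip for `place` with matching weather and the closest hour.

    Returns None when no clip of that place has the required weather.
    """
    candidates = [c for c in clips if c.place == place and c.weather == weather]
    if not candidates:
        return None
    return min(candidates, key=lambda c: hour_gap(c.hour, hour))


clips = [
    BackgroundClip("Gangnam-gu", "clear", 13, "gangnam_clear_13.mp4"),
    BackgroundClip("Gangnam-gu", "clear", 18, "gangnam_clear_18.mp4"),
    BackgroundClip("Gangnam-gu", "rain", 13, "gangnam_rain_13.mp4"),
    BackgroundClip("Paris", "clear", 13, "paris_clear_13.mp4"),
]
# Output at about 1:30 p.m. with clear weather forecast for Gangnam-gu.
print(select_background(clips, "Gangnam-gu", "clear", 13).path)
```

Measuring the gap on a 24-hour clock keeps a clip shot at 11 p.m. close to an output time of 1 a.m., which a plain difference would not.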

When the background image is composited separately from the host 1 in this way, the brightness of the object region 20, represented by the host 1 in a representative image preferably shot indoors, may differ too much from the brightness of the background region 30 outside the object region 20, and the result may look unnatural.

This is more noticeable in synthesized images that use a background shot outdoors. If the background is at night but the host 1, the object, is too bright, or the background is in daytime but the host 1 is comparatively dark, realism suffers.

To address this, the brightness of the object region 20 in the produced synthesized image can be adjusted to the brightness of the background image, reducing the brightness difference between the background region 30 and the object region 20 and yielding a more natural image. For this purpose the synthesis module 500 may further include an additional image correction unit 530, which comprises a brightness identification part 531 and a brightness adjustment part 532.

The brightness identification part 531 determines, in the generated synthesized image, the background brightness, which is the average brightness of the background region 30, and the object brightness, which is the average brightness of the object region 20. These are obtained by calculating the average brightness of the pixels making up the background region 30 and of the pixels making up the object region 20. Brightness here can be determined by averaging the gain, gamma, and lift values read against the waveform scope when working in DaVinci Resolve, or from the highlight, exposure, and contrast values in a video editing tool such as Premiere Pro.

Most preferably, the brightness value is the luminance value in its usual sense: the value of a picture element in the image, used to express a discrete amount of brightness. It can take a value in the range 0 to 255.

Preferably, the background region 30 and the object region 20 are detected in each frame of the synthesized image, the background brightness and object brightness are calculated for each frame from the detected regions, and then the averages of the background brightness and of the object brightness over all frames are obtained and compared.

The brightness adjustment part 532 adjusts the object brightness, that is, the brightness of the object region 20, according to the difference between the background brightness and the object brightness determined for the background region 30 and the object region 20. In other words, it matches the object region 20 to the brightness of the background region 30 so that the two blend naturally. As described above, this can be implemented by adjusting the gain, gamma, and lift values against the waveform scope in DaVinci Resolve, or by adjusting the highlight, exposure, and contrast values in a video editing tool such as Premiere Pro.

With this configuration, the brightness of the object region 20 can be adjusted to match that of the background region 30, so that the object region 20, the region in which the host 1 was filmed, blends naturally with the background region 30 and a more natural synthesized image can be generated.
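The brightness comparison and adjustment described above can be sketched on 0-255 pixel values. This is an illustrative sketch only: the text describes the adjustment in terms of editing-tool controls, so the additive shift used here, the `strength` parameter, the boolean object mask, and the function names are all assumptions.

```python
def mean_brightness(frames, masks, region):
    """Average brightness of one region over all frames.

    frames: list of 2-D lists of 0-255 values.
    masks:  matching 2-D lists of booleans, True where the pixel is the object.
    region: True for the object region, False for the background region.
    Brightness is averaged within each frame first, then across frames.
    """
    per_frame = []
    for frame, mask in zip(frames, masks):
        vals = [p for row, mrow in zip(frame, mask)
                for p, m in zip(row, mrow) if m == region]
        if vals:
            per_frame.append(sum(vals) / len(vals))
    return sum(per_frame) / len(per_frame) if per_frame else None


def match_object_brightness(frame, mask, background_mean, object_mean, strength=1.0):
    """Shift object pixels toward the background brightness; background is untouched.

    strength=1.0 removes the whole mean difference, 0.5 removes half of it.
    """
    shift = (background_mean - object_mean) * strength
    return [[min(255, max(0, round(p + shift))) if is_obj else p
             for p, is_obj in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]


# One 2x2 frame: dark background on the top row, bright host on the bottom row.
frames = [[[50, 50], [200, 200]]]
masks = [[[False, False], [True, True]]]
bg = mean_brightness(frames, masks, False)
obj = mean_brightness(frames, masks, True)
print(match_object_brightness(frames[0], masks[0], bg, obj, strength=0.5))
```

A partial `strength` is offered because fully equalizing the means could flatten a host who is legitimately lit differently from the scene; the text does not state how much of the difference is removed.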

FIG. 5 is a conceptual diagram showing the highlight region and shadow region of the background and the point region and shadow region of the object according to the present invention, and FIG. 6 is a conceptual diagram showing the gradation area calculated on the basis of a virtual connecting line and a virtual progress line.

With an outdoor background, the main light source is generally sunlight; more generally, for a natural scene without backlighting, light typically pours in naturally from a distant light source.

In this case, light falling downward from above the object, that is, the host 1, looks natural in terms of brightness and gives a picture free of backlighting. When such an outdoor background is used, a more realistic and natural synthesized image can be made if the brightness of the object region 20, the host 1, is adjusted to follow the same flow of light.

To this end, as shown in FIG. 5, a highlight region 40 and a shadow region 50 are set in the background region 30 of the synthesized image, and a corresponding point region 60 and shadow region 70 are set in the object region 20. By adjusting the brightness of the point region 60 and the shadow region 70 differentially, the object region 20 can be given a more natural brightness gradation, as if lit by natural light. For this purpose the additional image correction unit 530 may comprise a highlight setting part 533, a shadow setting part 534, a shading setting part 535, and an additional adjustment part 536.

The highlight setting part 533 divides the periphery of the background region 30 of the synthesized image, that is, the part of the background region 30 along the edges of the synthesized image, into a plurality of sections. It sets a section of the background region 30 on the upper edge as the highlight region 40 and determines the brightness and position of that highlight region 40.

The highlight region 40 is a region of high brightness. For a background image shot outdoors, the most natural lighting is light pouring in from part of the upper edge, so setting the highlight region 40 amounts to finding or designating the region of the light source.

If the light source is sunlight, as in the example of FIG. 5, the upper periphery of the background region 30 that receives the sunlight most directly will generally be the brightest. This section is therefore set as the highlight region 40. Its brightness can be determined, as described above, from the gain, gamma, and lift values read against the waveform scope in DaVinci Resolve, or from the highlight, exposure, and contrast values used in a video editing tool such as Premiere Pro. Its position can be determined from the location of the designated region within the video frame.

The shadow setting part 534 sets, as the shadow region 50, a lower portion of the periphery of the background area 30 located diagonally opposite the highlight area 40. Because the shadow region 50 is the end opposite the light source, it is the area with the lowest brightness. Preferably, the lower portion of the background area 30 in the direction of the toes of the object, that is, the host 1, is set as the shadow region 50.

Diagonally opposite means that when the highlight area 40 is at the upper left of the composite image screen, as in FIG. 5, the shadow region 50 is generally at the lower right of the background area 30, and when the highlight area 40 is at the upper right, the shadow region 50 is at the lower left.
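The diagonal relationship can be expressed as a small lookup. This is a minimal sketch in which corners are encoded as (vertical, horizontal) string pairs; the encoding and the name `shadow_corner` are assumptions made for illustration.

```python
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}

def shadow_corner(highlight_corner):
    """Return the corner diagonally opposite the highlight corner.
    Corners are (vertical, horizontal) pairs such as ("top", "left")."""
    vertical, horizontal = highlight_corner
    return (OPPOSITE[vertical], OPPOSITE[horizontal])

corner = shadow_corner(("top", "left"))
```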

The shade setting part 535 sets the point area 60 and the shadow area 70 within the object area 20. The point area 60 is the highlight zone within the object area 20, that is, the brightest part of the object area 20. It is preferably set according to the position of the highlight area 40: the part of the object area 20 closest to the highlight area 40 is preferably set as the point area 60. In general, it is most natural for the point area 60 to lie near the face area 10 within the object area 20.

Conversely, the shadow area 70 is the part of the object area 20 with the lowest brightness, and it is preferably set according to the position of the shadow region 50. Most preferably, the part of the object area 20 closest to the shadow region 50 is set as the shadow area 70. In general, it is most natural for the shadow area 70 to lie near the toes of the host 1 within the object area 20.

The additional adjustment part 536 adjusts the brightness of the point area 60 and of the shadow area 70 according to the object brightness, the background brightness, the brightness of the highlight area 40, and the brightness of the shadow region 50.

Because the point area 60 is the brightest highlight zone within the object area 20, its brightness should be higher than the object brightness, which is the average brightness of the object area 20, but is preferably kept lower than the brightness of the highlight area 40, which is the light source. In addition, too large a difference from the background brightness may look unnatural, so the brightness is preferably adjusted within a range similar to the background brightness.

Likewise, because the shadow area 70 is the darkest zone within the object area 20, its brightness should be lower than the object brightness, which is the average brightness of the object area 20, but is preferably kept higher than the brightness of the shadow region 50, which is the darkest part of the background area 30. In addition, too large a difference from the background brightness may look unnatural, so the brightness is preferably adjusted within a range similar to the background brightness.
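The ordering constraints on the two brightness values can be sketched as simple clamping. This is an illustrative sketch only: the function names are hypothetical, brightness is assumed to be on a 0 to 255 scale, and the bounds are treated as inclusive for simplicity even though the text asks for values strictly between them.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))

def constrain_point_brightness(proposed, object_brightness, highlight_brightness):
    """Point area 60: at least the object average, at most the light source."""
    return clamp(proposed, object_brightness, highlight_brightness)

def constrain_shadow_brightness(proposed, object_brightness, shadow_region_brightness):
    """Shadow area 70: at most the object average, at least shadow region 50."""
    return clamp(proposed, shadow_region_brightness, object_brightness)

# Illustrative values: object 130, highlight area 220, shadow region 120.
point = constrain_point_brightness(240, 130, 220)
shadow = constrain_shadow_brightness(100, 130, 120)
```

A proposed point brightness above the light source is pulled back to the highlight brightness, and a proposed shadow brightness below the shadow region 50 is raised to that region's brightness.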

The brightness of the point area 60 and the shadow area 70 can be adjusted by the administrator of the video editing system, using a video editing tool such as Premiere Pro or a program such as DaVinci Resolve, as described above.

In this way, the point area 60 and the shadow area 70 are set according to the object brightness, the background brightness, the brightness of the highlight area 40, and the brightness of the shadow region 50, and their brightness is adjusted differentially so that light and shade also appear within the object area 20. This gives a more realistic impression, as if the host 1 had actually gone to the outdoor location and been filmed there.

Referring also to FIG. 6, once the highlight area 40 and the shadow area 70 are set, the composite image screen can be divided with the highlight area 40 as the reference so that brightness decreases gradually, and the point area 60 and the shadow area 70 within the object area 20 can then be corrected in detail on that basis. To this end, the image additional correction unit 530 may further include a virtual connection line generation part 537, a central angle setting part 538, a virtual progress line setting part 539, a virtual arc setting part 540, and a gradation setting part 541.

The virtual connection line generation part 537 connects the center point of the highlight area 40 and the center point of the shadow region 50, as in FIG. 6, to generate a virtual connection line 81 that runs across the composite image between the highlight area 40 and the shadow region 50. This is preferably a line that crosses the composite image diagonally.

The central angle setting part 538 sets the light transmission central angle according to the brightness of the highlight area 40. The light transmission central angle is denoted by theta (θ) in the drawing. The higher the brightness of the highlight area 40, the more widely the light spreads, so the central angle becomes larger; the lower the brightness, the more narrowly the light spreads, so the central angle becomes smaller. The light transmission central angle is preferably adjustable within the range of 30° to 180° and can be set by the system administrator.
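One way to realize "brighter highlight, wider angle" is a linear mapping onto the 30° to 180° range. The text only states the direction of the relationship and leaves the value to the system administrator, so the linear form, the 0 to 255 brightness scale, and the name `light_central_angle` are all assumptions in this sketch.

```python
def light_central_angle(highlight_brightness, min_angle=30.0, max_angle=180.0):
    """Map a 0-255 brightness linearly onto [min_angle, max_angle] degrees.
    The linear mapping is an assumption; only monotonic growth is stated."""
    ratio = max(0.0, min(1.0, highlight_brightness / 255))
    return min_angle + ratio * (max_angle - min_angle)

theta = light_central_angle(220)
```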

The virtual progress line setting part 539 sets two virtual progress lines 82 that are symmetrical about the virtual connection line 81, as shown in FIG. 6. Each virtual progress line 82 starts at the center point of the highlight area 40, and the two lines diverge by the light transmission central angle. That is, each virtual progress line 82 forms an angle of θ/2 with the virtual connection line 81, and the angle between the two virtual progress lines 82 is the light transmission central angle θ.
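The geometry of the two virtual progress lines 82 can be sketched by rotating the direction of the virtual connection line 81 by ±θ/2. This is a minimal sketch; the function name and the use of (x, y) pixel coordinates are assumptions.

```python
import math

def progress_line_directions(highlight_center, shadow_center, theta_deg):
    """Return unit direction vectors of the two virtual progress lines,
    each rotated by theta/2 to either side of the connection line."""
    hx, hy = highlight_center
    sx, sy = shadow_center
    base = math.atan2(sy - hy, sx - hx)   # direction of the connection line
    half = math.radians(theta_deg) / 2    # each line is theta/2 off the axis
    return [(math.cos(base + s * half), math.sin(base + s * half)) for s in (-1, 1)]

d1, d2 = progress_line_directions((0, 0), (10, 0), 90)
```

With θ = 90°, the two direction vectors are perpendicular and mirror each other across the connection line.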

The virtual arc setting part 540 sets a plurality of virtual arcs 83 connecting the two virtual progress lines 82. The virtual arcs 83 are spaced apart from one another by a brightness interval set in the system. The brightness interval is set based on the distance between the highlight area 40 and the shadow area 70, preferably the distance between their center points: the shorter the distance, the narrower the interval, and the longer the distance, the wider the interval. As a result, three to five virtual arcs 83 are preferably set within the composite image screen. Each virtual arc 83 is thus part of the circumference of one of a set of concentric circles spaced apart by the brightness interval.
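The arc spacing can be sketched as concentric radii around the highlight center. The text says only that the interval grows with the distance between the two center points; dividing the distance evenly by the number of arcs is an assumption of this sketch, as is the name `arc_radii`.

```python
import math

def arc_radii(highlight_center, far_center, arcs=4):
    """Return the radii of `arcs` concentric virtual arcs centered on the
    highlight center. The interval is distance / arcs (an assumed rule)."""
    distance = math.dist(highlight_center, far_center)
    interval = distance / arcs
    return [interval * k for k in range(1, arcs + 1)]

radii = arc_radii((0, 0), (30, 40), arcs=5)
```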

The gradation setting part 541 defines the gradation areas, which are the regions formed on the screen of the composite image once the virtual progress lines 82 and the virtual arcs 83 are set, and divides the virtual gradation region accordingly.

When the virtual progress lines 82 and the virtual arcs 83 are drawn on the screen, they delimit areas. These areas represent light spreading gradually from the highlight area 40, which is the region of the light source, to the shadow area 70. In the present invention, each area bounded by the virtual progress lines 82 and the virtual arcs 83 is called a gradation area.

A plurality of gradation areas is set, preferably three to five depending on the number of virtual arcs 83. A gradation area closer to the highlight area 40 is smaller, and a gradation area closer to the shadow area 70 is larger.
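Assigning a pixel to a gradation area can be sketched by testing its angle against the two virtual progress lines 82 and its radius against the virtual arcs 83. This is an illustrative sketch under stated assumptions: areas are numbered from 1 (nearest the highlight), the arcs are evenly spaced by `interval`, and the function name and parameters are hypothetical.

```python
import math

def gradation_area(point, highlight_center, far_center, theta_deg, interval, arcs=4):
    """Return the 1-based gradation area containing `point`, or None if the
    point lies outside the wedge or beyond the outermost arc."""
    hx, hy = highlight_center
    dx, dy = point[0] - hx, point[1] - hy
    radius = math.hypot(dx, dy)
    if radius == 0:
        return 1  # the highlight center itself belongs to the first area
    axis = math.atan2(far_center[1] - hy, far_center[0] - hx)
    # Signed angular offset from the connection line, wrapped to [-pi, pi).
    offset = (math.atan2(dy, dx) - axis + math.pi) % (2 * math.pi) - math.pi
    if abs(offset) > math.radians(theta_deg) / 2:
        return None
    index = int(radius // interval) + 1
    return index if index <= arcs else None

near = gradation_area((20, 20), (0, 0), (100, 100), 60, 40)
far = gradation_area((100, 100), (0, 0), (100, 100), 60, 40)
```

Because the areas are annular sectors of equal radial width, the area of the k-th sector is proportional to 2k − 1, so sectors farther from the highlight center are larger, consistent with the description above.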

Once the gradation areas are set, the additional adjustment part 536 adjusts the brightness of the point area 60 and of the shadow area 70 by taking into account not only the object brightness, the background brightness, the brightness of the highlight area 40, and the brightness of the shadow region 50, but also the gradation area to which each of the point area 60 and the shadow area 70 belongs.

This reflects the fact that the size of the object area 20 can vary widely within the composite image. When gradation areas A1 to A4 are formed as in the drawing, the point area 60 belongs to A2 and the shadow area 70 belongs to A4.

Accordingly, when the brightness of the point area 60 is adjusted, the closer its gradation area is to the highlight area 40, that is, the closer to A1, the closer its brightness is brought to that of the highlight area 40; the closer its gradation area is to A4 (the farther from the highlight area 40), the darker it is made relative to the highlight area 40.

Similarly, when the brightness of the shadow area 70 is adjusted, the closer its gradation area is to the shadow region 50, that is, the closer to A4, the closer its brightness is brought down to that of the shadow region 50; the closer its gradation area is to A1 (the farther from the shadow region 50), the brighter it is made relative to the shadow region 50.

More preferably, a point value and a shadow value may be calculated for adjusting the brightness of the point area 60 and of the shadow area 70, respectively. The point value can be calculated based on Equation 1 below, and the shadow value based on Equation 2 below.

Equation 1,

Equation 2,

(Here, the symbols denote, in order:
the point value,
the shadow value,
the reference point value,
the reference shadow value,
the gradation weight assigned to the gradation area to which the point area 60 belongs,
the gradation weight assigned to the gradation area to which the shadow area 70 belongs,
the background brightness,
the object brightness,
the brightness of the highlight area 40,
the brightness of the shadow region 50.)

Here, the background brightness, the object brightness, the brightness of the highlight area 40, and the brightness of the shadow region 50 are all numerical values. Preferably, they are luminance values, that is, pixel values in the image that represent a discrete amount of brightness, in the range 0 to 255.

The reference point value and the reference shadow value also correspond to brightness, so they can be set within the range 0 to 255; both are set by the system administrator. Preferably, the reference point value is 200 to 255 and the reference shadow value is 0 to 100.

A gradation weight is assigned to each gradation area. In the case shown in the drawing, each of the gradation areas A1 to A4 is given a different gradation weight. The gradation weight is adjustable within the range 0.8 to 1.2. As a rule, a gradation area closer to the highlight area 40 has a larger gradation weight, and a gradation area closer to the shadow region 50 has a smaller gradation weight.
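A possible weight assignment that satisfies these rules is an even spread from 1.2 at the area nearest the highlight down to 0.8 at the area nearest the shadow region. The even spacing is an assumption of this sketch; the text requires only distinct, decreasing weights within 0.8 to 1.2, and the name `gradation_weights` is hypothetical.

```python
def gradation_weights(areas, high=1.2, low=0.8):
    """Return one weight per gradation area, ordered from the area nearest
    the highlight (A1) to the area nearest the shadow region.
    Even spacing between `high` and `low` is an assumed rule."""
    if areas == 1:
        return [high]
    step = (high - low) / (areas - 1)
    return [round(high - i * step, 3) for i in range(areas)]

weights = gradation_weights(4)  # weights for A1 to A4
```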

The point value and the shadow value vary nonlinearly rather than linearly. For natural gradation, brightness does not fall off at a constant linear rate; instead, the brightness difference typically shrinks as the gradation areas widen. A hyperbolic function, whose graph has this shape, is therefore applied so that the point value and the shadow value reflect a natural dimming effect.

For the point value, the brightness of the highlight area 40 is the most important factor, and the difference between the background brightness and the object brightness is compared with the difference between the brightness of the highlight area 40 and that of the shadow region 50. For the shadow value, the brightness of the shadow region 50 is the most important factor, and the same two differences are compared.

As a simple numerical example, assume that the reference point value is 200, the reference shadow value is 130, the gradation weight assigned to the gradation area to which the point area 60 belongs is 1, the gradation weight assigned to the gradation area to which the shadow area 70 belongs is 0.8, the background brightness is 150, the object brightness is 130, the brightness of the highlight area 40 is 220, and the brightness of the shadow region 50 is 120. The point value and the shadow value can then be calculated by substituting these values into Equations 1 and 2.

The shadow value and the point value are thus calculated from these quantified comparisons, and the brightness of the point area 60 and of the shadow area 70 can be corrected differentially according to how high or low those values are: the higher the point value, the more the brightness of the point area 60 is raised, and the higher the shadow value, the more the brightness of the shadow area 70 is lowered. This enables fine-grained control of light and shade.

The configuration and operation of the voice dubbing-based host video editing system according to the present invention have been set out in the above description and drawings, but these are merely examples. The spirit of the present invention is not limited to the description and drawings, and various changes and modifications are possible without departing from the technical idea of the present invention.

1: host 2: central control server
10: face area 11: mouth area
20: object area 30: background area
40: highlight area 50: shadow region
60: point area 70: shadow area
81: virtual connection line 82: virtual progress line
83: virtual arc 100: image database
200: region detection module 210: background detection unit
300: object extraction module 310: emotion classification unit
320: object extraction unit 400: voice management module
410: voice input unit 411: additional information input part
420: pronunciation analysis unit 500: synthesis module
510: boundary correction unit 511: skin pixel detection part
512: boundary correction part 513: smudging part
520: background synthesizing unit 530: image additional correction unit
531: brightness detecting part 532: brightness adjusting part
533: highlight setting part 534: shadow setting part
535: shade setting part 536: additional adjustment part
537: virtual connection line generation part 538: central angle setting part
539: virtual progress line setting part 540: virtual arc setting part
541: gradation setting part 600: output module
700: environment identification module 710: external environment identification unit
720: background classification unit

Claims

A voice dubbing-based host video editing system, comprising:
an image database that receives a plurality of images including the face of a host, sets and stores a representative image, and receives and stores a background image of either an indoor or an outdoor place;
a region detection module that detects a face area, and a mouth area within the face area, in the plurality of images stored in the image database, and that includes a background detection unit for separating, in the plurality of images, an object area containing the host from a background area excluding the object area;
an object extraction module that extracts and stores, from the mouth area, a mouth-shaped object for each pronunciation of a voice;
a voice management module including a voice input unit that receives an information voice containing guide information, and a pronunciation analysis unit that analyzes the pronunciation of the information voice;
a synthesis module that generates a composite image by synthesizing the mouth-shaped object into the mouth area of the representative image according to the pronunciation of the information voice, and that includes a background synthesizing unit for synthesizing the background image into the background area of the representative image; and
an output module that outputs the composite image dubbed with the information voice,
wherein the synthesis module includes an image additional correction unit including:
a brightness detecting part that detects background brightness and object brightness, which are the average brightnesses of the background area and the object area in the composite image, respectively;
a brightness adjusting part that adjusts the object brightness according to the difference between the background brightness and the object brightness;
a highlight setting part that divides the periphery of the background area into a plurality of segments, sets a highlight area, which is a high-brightness area, in the upper part of the divided background area, and determines the brightness and position of the highlight area;
a shadow setting part that sets a shadow region, which is a low-brightness area, in the lower part of the background area, and determines the position and brightness of the shadow region;
a shade setting part that, according to the position of the highlight area and the position of the shadow region, sets within the object area a high-brightness point area near the highlight area and a low-brightness shadow area near the shadow region;
an additional adjustment part that adjusts the brightness of the point area and of the shadow area according to the object brightness, the background brightness, the brightness of the highlight area, and the brightness of the shadow region;
a virtual connection line generation part that generates a virtual connection line connecting the center point of the highlight area and the center point of the shadow region;
a central angle setting part that sets a light transmission central angle according to the brightness of the highlight area;
a virtual progress line setting part that sets two virtual progress lines that start at the center point of the highlight area, are symmetrical about the virtual connection line, and diverge by the light transmission central angle;
a virtual arc setting part that sets a plurality of virtual arcs that connect the two virtual progress lines and are spaced apart from one another by a brightness interval set based on the distance between the highlight area and the shadow area; and
a gradation setting part that sets a plurality of gradation areas according to the virtual progress lines and the virtual arcs,
wherein the additional adjustment part adjusts the brightness of the point area and of the shadow area according to the object brightness, the background brightness, the brightness of the highlight area, the brightness of the shadow region, and the gradation areas to which the point area and the shadow area respectively belong.

The system of claim 1,
wherein the voice input unit includes an additional information input part that receives emotion information for the guide information,
wherein the object extraction module includes:
an emotion classification unit that classifies the plurality of mouth-shaped objects by emotion; and
an object extraction unit that extracts, as an emotion matching object, the mouth-shaped object matching the emotion information of the guide information, and
wherein the synthesis module generates the composite image by synthesizing the emotion matching object into the mouth area of the representative image according to the pronunciation of the information voice.

The system of claim 1,
wherein the synthesis module includes a boundary correction unit including:
a skin pixel detection part that detects skin pixels at the boundary between the mouth area and the face area in the composite image;
a boundary correction part that corrects the saturation and brightness of the boundary area; and
a smudging part that performs smudging on the boundary area.

The system of claim 1,
further comprising an environment identification module including: an external environment identification unit that determines weather information and time information for the place based on the output time of the output module; and a background classification unit that classifies the background images according to weather information and time information,
wherein the background synthesizing unit synthesizes, into the background area, the background image matching the weather information and time information of the place.

The system of claim 1,
wherein the additional adjustment part
calculates a point value based on Equation 1 below and adjusts the brightness of the point area according to the calculated point value, and
calculates a shadow value based on Equation 2 below and adjusts the brightness of the shadow area according to the calculated shadow value.
Equation 1,