KR20210098220A

KR20210098220A - Content Transforming Apparatus Responsive to Audience Reaction

Info

Publication number: KR20210098220A
Application number: KR1020200012129A
Authority: KR
Inventors: 박동욱
Original assignee: 주식회사 아이티엑스에이아이
Priority date: 2020-01-31
Filing date: 2020-01-31
Publication date: 2021-08-10
Also published as: KR102398848B1

Abstract

Disclosed is a technology related to automatic editing of multimedia content. A reaction clip, which is part of the multimedia content originally produced as one-way content, is converted into a customized reaction clip determined according to a viewer reaction. The viewer reaction is detected, and accordingly, a customized reaction is determined. The reaction clip, which is part of the multimedia content, is converted into the customized reaction clip according to the determined customized reaction. The viewer reaction may be detected by analyzing a facial expression, a change in a posture, or a voice of the viewer. The conversion to the customized reaction may include a process of changing a reaction movement or a reaction facial expression without changing an identity of a character.

Description

Content Transforming Apparatus Responsive to Audience Reaction

Disclosed is a technique for the automatic editing of multimedia content.

Korean Patent No. 992,509, filed on April 24, 2009 and registered on November 1, 2010, discloses the idea of modulating the voice or the motion/expression of a character in content into the voice or motion/expression of an acquaintance such as a parent or sibling. However, it does not present a feasible, concrete method for modulating the voice, nor does it present a concrete method for modulating motion or expression.

Moving beyond this level of merely presenting ideas, recent deep-learning-based techniques have brought innovations to the transformation of motion, voice, and faces. Generative Adversarial Network (GAN) technology has made it possible to change the facial expression or posture of a person in a photograph in fine detail. The paper by Liqian Ma, et al., “Pose Guided Person Image Generation”, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 25 May 2017, discloses a technique that uses one form of such a GAN to change the posture of a person in an image based on pose information. The application of VLSI technology is also expanding to artificial intelligence engines, making real-time implementation of these techniques possible.

Meanwhile, producing user-responsive (interactive) multimedia content is practically impossible, because a large number of possible cases must be considered for every scene. Recently, artificial intelligence robots of limited scope have been proposed that communicate with people in natural language while appropriately selecting and playing back stored motion clips, but they must first be trained on a vast amount of data for each application field.

An object of the proposed invention is to present a new technique for changing the motion of an avatar being played back according to the viewer's reaction when multimedia content is played.

A further object of the proposed invention is to convert multimedia content that was not produced as user-responsive content into user-responsive content.

According to one aspect of the proposed invention, a reaction clip, which is part of multimedia content originally produced as one-way content, is converted into a customized reaction clip determined according to a viewer reaction. The viewer reaction is detected, and a customized reaction is determined in response. The reaction clip that is part of the multimedia content is then converted into a customized reaction clip according to the determined customized reaction.

According to a further aspect, the viewer's reaction may be detected by a cloud server. Furthermore, the cloud server may determine the viewer reaction by detecting the reaction of each viewer from a plurality of viewer images received through a network and aggregating the results.

According to a further aspect, the viewer reaction may be detected by classifying the viewer's facial expression into one of predetermined categories. According to another aspect, the viewer reaction may be detected by classifying a motion, derived from changes in the viewer's posture, into one of predetermined categories. According to yet another aspect, the viewer reaction may be detected by classifying the viewer's voice into one of predetermined categories.

According to a further aspect, the customized reaction may include a reaction motion. According to yet another aspect, the customized reaction may include a reaction facial expression.

According to a further aspect, the viewer reaction may be processed by an artificial intelligence engine that receives a viewer image as input and determines and outputs one reaction from a predetermined range of reactions.

According to the proposed invention, a responsive service that responds to viewer reactions can be provided using multimedia content that was originally produced as one-way content. For example, in an Internet lecture the same content is provided to many viewers and does not respond to their reactions at all, which lowers concentration. By capturing how closely the viewer is concentrating on the lecture, or the viewer's momentary reactions, and transforming the content so that the instructor acts appropriately in response, concentration on a recorded lecture can be increased.

FIG. 1 illustrates an example of a digital broadcasting system to which a content conversion apparatus according to an embodiment is applied.
FIG. 2 is a block diagram illustrating the configuration of a content conversion apparatus according to an embodiment.
FIG. 3 is a block diagram illustrating the configuration of an embodiment of the reaction detection unit of FIG. 2.
FIG. 4 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment.
FIG. 5 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment.
FIG. 6 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment.

The foregoing and additional aspects are embodied through the embodiments described with reference to the accompanying drawings. It should be understood that the elements in which each embodiment differs from the other embodiments, or which it has in addition to them, may be combined with one another in various ways unless otherwise stated or mutually contradictory. The claims are intended to cover the various embodiments derived from combinations of the elements of the disclosed embodiments.

FIG. 1 shows an example of a multimedia service system to which a content conversion apparatus according to an embodiment of the proposed invention is applied. The illustrated multimedia service system may include content service servers 30 and 31, a public network 50, and receiving end systems 11 to 17. The content service servers 30 and 31 may be a broadcast transmission system that streams digital content according to a broadcast schedule. As another example, the content service servers 30 and 31 may be an on-demand content service system. Content stored in the content database 31 may be played back by a playback server and streamed to clients by a streaming server. A load balancing system may distribute the client connection load. The public network 50 may be wired, wireless, or a combination thereof. It may include the wired networks of Internet service providers or the mobile communication networks of mobile operators, and may include wired and wireless networks within buildings. In the illustrated system, the receiving end system may consist of a set-top box 11, for example, an over-the-top (OTT) box, and a display 13. As another example, the receiving end system may be a mobile communication terminal such as a tablet 15 or a smartphone 17.

According to one aspect, the content conversion apparatus according to the proposed invention may be implemented as part of a receiving end system, for example, the set-top box 11. As another example, the content conversion apparatus may be implemented as an application program installed on the smartphone 17 or tablet 15 together with dedicated hardware that cooperates with the application program, for example, a peripheral device equipped with an artificial intelligence engine. As yet another example, the content conversion apparatus may be implemented as an application program installed on the smartphone 17 or tablet 15 together with a content conversion server 50 that cooperates with the application program, for example, a high-performance graphics server that executes artificial intelligence programs at high speed. A reaction clip that is part of the one-way content streamed from the content service server 30 may be converted by such a content conversion apparatus into a customized reaction clip determined according to the viewer's reaction.

Although the embodiments described below are described as apparatuses, when they are implemented in an information processing system the hardware is implemented following the order of the information flow, and in some cases they may be implemented as software, that is, as a computer-executable program. In particular, highly complex algorithms such as artificial intelligence algorithms are often implemented in dedicated hardware for the parts where a general-purpose processor cannot keep up with the required execution speed, so the distinction between hardware (the apparatus aspect) and software (the method aspect) is often meaningless. The illustrated embodiments are described from an apparatus point of view, but they also support the appended method claims as they stand.

According to one aspect of the proposed invention, a reaction clip, which is part of multimedia content originally produced as one-way content, is converted into a customized reaction clip determined according to a viewer reaction. The viewer reaction is detected, and a customized reaction is determined in response. The reaction clip that is part of the multimedia content is then converted into a customized reaction clip according to the determined customized reaction.

FIG. 2 is a block diagram illustrating the configuration of a content conversion apparatus according to an embodiment to which this aspect of the proposed invention is applied. The content conversion apparatus may include a memory 230 and an information processing system 10 that reads control data stored in the memory 230 and processes input data. For example, the information processing system 10 may include a general-purpose microprocessor that controls the entire apparatus; a plurality of artificial intelligence engines implemented in dedicated hardware, each with a structure optimized for executing a specific deep learning algorithm; and a processor array with a reconfigurable structure that processes artificial intelligence algorithms, or parts of them, that have a variable structure and are not speed-sensitive.

The multimedia content 250 that is input to the information processing system 10 may be, for example, content that a terminal receives as a stream through a communication module, decodes through a codec, and outputs. As another example, it may be content that the terminal reads from the memory 230, decodes through a codec, and outputs.

According to an embodiment, the content conversion apparatus may include a reaction detection unit 100, a customized reaction determination unit 300, and a reaction clip conversion unit 500. The reaction detection unit 100 detects a viewer reaction to the multimedia content being output.

According to one aspect, the viewer's reaction may be detected by a cloud server. In this embodiment, the reaction detection unit 100 of the information processing system 10 may transmit to the cloud server at least some frames of the video received from the camera 210, or partial regions such as the face and torso extracted from those frames by preprocessing, and may receive the detected reaction information in return. Furthermore, the cloud server may determine the viewer reaction by detecting the reaction of each viewer from a plurality of viewer images received through the network and aggregating the results. That is, the cloud server 50 of FIG. 1 receives viewers' video, audio, sensor information, and the like from a plurality of information processing systems 10 and detects the reaction of each viewer. It may then aggregate these reactions and, for example, determine the reaction that occurs most often among the viewer reactions as the final viewer group reaction.
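The aggregation step described above (choosing the most frequent reaction as the group reaction) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `aggregate_group_reaction` and the string labels are assumptions made for the example.

```python
from collections import Counter

def aggregate_group_reaction(per_viewer_reactions):
    """Return the most frequent reaction label among the viewers.

    per_viewer_reactions: list of reaction labels, one per viewer
    (hypothetical string labels such as "bored" or "joy").
    Returns None when no viewer reaction is available.
    """
    if not per_viewer_reactions:
        return None
    counts = Counter(per_viewer_reactions)
    # most_common(1) yields [(label, count)]; on a tie, the label
    # that was encountered first is returned.
    return counts.most_common(1)[0][0]

group = aggregate_group_reaction(["bored", "joy", "bored", "interest"])
```

A majority vote is only one possible aggregation rule; the text presents it as an example.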

FIG. 3 is a block diagram illustrating the configuration of the reaction detection unit 100 according to an embodiment. According to one aspect of the proposed invention, the reaction detection unit 100 may include an expression-based reaction detection unit 110. The expression-based reaction detection unit 110 may detect a viewer reaction by classifying the viewer's facial expression into one of predetermined categories. The expression-based reaction detection unit 110 first detects the face region in the input image and extracts a face image. It then analyzes the extracted face image to detect the viewer's reaction. According to one aspect, the expression-based reaction detection unit 110 may determine the viewer's reaction by analyzing the expression in the face image. In one embodiment, the expression-based reaction detection unit 110 may be implemented as an artificial intelligence engine, for example, a deep convolutional neural network model, that determines one reaction from a bounded set of categories such as {boredom, no expression, surprise, joy, interest}. Besides the approach of classifying an emotion into one of several distinct categories, approaches that express, for example, the degree of arousal or the degree of positivity/negativity as continuous values are also known. For example, the AffectNet dataset is publicly available for training artificial intelligence engines for such emotion recognition.

According to one aspect of the proposed invention, the reaction detection unit 100 may include a motion-based reaction detection unit 130. The motion-based reaction detection unit 130 may detect a reaction by classifying a motion, derived from changes in the viewer's posture, into one of predetermined categories. The motion-based reaction detection unit 130 detects a pose in the video from feature points, such as the joints that make up the skeleton of the viewer's head, torso, and arms, and may determine the reaction by classifying the change in this pose, that is, the movement, into one of the predetermined categories.

According to one aspect of the proposed invention, the reaction detection unit 100 may include a voice-based reaction detection unit 150. The voice-based reaction detection unit 150 may detect a viewer reaction by classifying the viewer's voice into one of predetermined categories. For recognizing emotion from speech that occurs sporadically in multimedia content, classical techniques are known that follow a hierarchical classification methodology using several classifiers to group and classify the factors of similar emotions in speech, for example, the technique disclosed in Z. Xiao, Dellandrea, L. Chen, W. Dou, “Recognition of emotions in speech by a hierarchical approach,” ACII 2009, 2009, pp.401-408. As another example, speech-based emotion recognition techniques using deep learning are also known, for example, Jiwon Lee et al., “Speech emotion recognition using a multi-task convolutional neural network”, 2017 Summer Conference of the Korean Institute of Communications and Information Sciences, June 2017.

The overall reaction determination unit 170 combines the output information of the reaction detection units 110 to 150 and makes the final determination of the reaction. For example, the overall reaction determination unit 170 may consider the emotion value output by each of the reaction detection units 110 to 150, together with the probability of that emotion value, and select the emotion value with the highest probability.
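As a rough sketch of this selection rule, assume each detector reports one emotion label together with a probability; the combined reaction is then the label with the highest probability. The function name `fuse_reactions`, the detector keys, and the label strings are hypothetical and serve only to illustrate the rule.

```python
def fuse_reactions(detector_outputs):
    """Pick the emotion label with the highest reported probability.

    detector_outputs: dict mapping a detector name (hypothetical keys
    such as "expression", "motion", "voice") to a tuple
    (emotion_label, probability).
    Returns None when no detector produced an output.
    """
    if not detector_outputs:
        return None
    # Compare the (label, probability) pairs by probability only.
    label, _probability = max(detector_outputs.values(),
                              key=lambda pair: pair[1])
    return label

final_reaction = fuse_reactions({
    "expression": ("bored", 0.6),
    "motion": ("interest", 0.8),
    "voice": ("bored", 0.5),
})
```

Other fusion rules, such as summing probabilities per label across detectors, would also fit the wording "considering the emotion value and its probability"; the version above is simply the most direct reading.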

In FIG. 3, the reaction detection unit 100 is shown as including all of the expression-based reaction detection unit 110, the motion-based reaction detection unit 130, and the voice-based reaction detection unit 150, but the proposed invention also covers the three embodiments that each include only one of them. In those cases, the overall reaction determination unit 170 may be omitted. As further embodiments, the proposed invention covers all embodiments that include a combination of two, or all three, of the expression-based reaction detection unit 110, the motion-based reaction detection unit 130, and the voice-based reaction detection unit 150 and that determine the reaction by combining their respective outputs.

For example, a multimodal deep-learning-based emotion recognition technique that determines emotion by simultaneously processing audio and video captured of the viewer, for example, Z. Xiao, Dellandrea, L. Chen, W. Dou, “Recognition of emotions in speech by a hierarchical approach,” ACII 2009, 2009, pp.401-408, may be applied to the proposed invention.

As described above, in the embodiment of FIG. 3, at least part of the processing of each of the reaction detection units 110, 130, and 150 may be executed on a cloud server.

Returning to FIG. 2, the customized reaction determination unit 300 determines a customized reaction for the multimedia content according to the viewer reaction detected by the reaction detection unit 100. For example, the customized reaction may be one of several attention-getting gestures, a playful facial expression, a motion of walking back and forth from side to side, or one of several poses. The determination of the customized reaction can be implemented simply with a mapping function. For example, each viewer reaction may be mapped to a customized reaction that is judged, from experience, to be appropriate. Because a customized reaction that is always mapped in the same way to a detected viewer reaction is monotonous and tends to lose the viewer's interest, the customized reaction mapped to a given viewer reaction value may be selected at random from a plurality of candidates, or may be determined by also taking other variables, such as a time value, into account.
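The random selection among several candidate reactions can be sketched as below. The candidate table, the label strings, and the function name `pick_customized_reaction` are illustrative assumptions; the text only states that one of a plurality of mapped reactions may be chosen at random.

```python
import random

# Hypothetical candidate table: each viewer reaction maps to several
# customized reactions that a planner considers appropriate.
CANDIDATES = {
    "bored": ["attention_gesture", "playful_expression", "side_to_side_walk"],
    "interest": ["pose_1", "pose_2"],
}

def pick_customized_reaction(viewer_reaction, rng=random):
    """Randomly choose one customized reaction for a viewer reaction.

    Returns None when no candidates are registered, meaning the clip
    would be left unchanged.
    """
    options = CANDIDATES.get(viewer_reaction)
    if not options:
        return None
    return rng.choice(options)

# A seeded generator makes the choice reproducible for testing.
choice = pick_customized_reaction("bored", random.Random(0))
```

Passing the random generator as a parameter keeps the function deterministic under test while still varying the reaction in normal use.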

According to a further aspect, the customized reaction determination unit 300 may determine the customized reaction corresponding to a reaction clip of the multimedia content according to both the detected viewer reaction and the content of the reaction clip. In one embodiment, the content of the reaction clip corresponding to the part of the multimedia content 250 to be converted may be determined as one of predetermined categories such as {explaining difficult logic, conveying simple knowledge, arousing interest, pausing briefly}. These categories may be referenced by index.

In an embodiment to which this further aspect is applied, the customized reaction may be implemented as a mapping function such as f(r,c), where r is the viewer reaction index and c is the content index of the reaction clip to be converted. That is, in this embodiment the customized reaction may be determined by a mapping on the viewer reaction and the content of the reaction clip. This mapping may be defined according to the intention of the planner who decides the concept or direction of the content conversion.
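A mapping function f(r, c) of this kind can be sketched as a lookup table keyed by the two indices. The category names, the table entries, and the "no_change" default are assumptions for illustration; in practice the table would be filled in by the planner.

```python
# Index r: viewer reaction category; index c: reaction clip content category.
VIEWER_REACTIONS = ["boredom", "no_expression", "surprise", "joy", "interest"]
CLIP_CONTENTS = ["difficult_logic", "simple_knowledge", "arousing_interest", "pause"]

NO_CHANGE = "no_change"  # hypothetical default: leave the clip as it is

# Hypothetical planner-defined table; unlisted pairs fall back to NO_CHANGE.
MAPPING = {
    (0, 0): "attention_gesture",
    (0, 1): "playful_expression",
    (1, 0): "hand_gesture_1",
    (2, 2): "head_nod",
}

def f(r, c):
    """Return the customized reaction for reaction index r and content index c."""
    if not 0 <= r < len(VIEWER_REACTIONS):
        raise IndexError("viewer reaction index out of range")
    if not 0 <= c < len(CLIP_CONTENTS):
        raise IndexError("clip content index out of range")
    return MAPPING.get((r, c), NO_CHANGE)

reaction = f(VIEWER_REACTIONS.index("boredom"), CLIP_CONTENTS.index("difficult_logic"))
```

Keeping the table as data rather than code lets the mapping be changed per piece of content without touching the lookup logic.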

Referring to FIG. 2, the reaction clip conversion unit 500 converts a reaction clip of the multimedia content being output into a customized reaction clip that reflects the determined customized reaction, changing the character's reaction without changing the character's identity. In one embodiment, the reaction clip conversion unit 500 receives the reaction clip of the multimedia content frame by frame and converts it into a customized reaction clip that reflects the determined customized reaction. The reaction clip conversion unit 500 generates still images converted to the target pose. Rather than converting every frame of the video, the video may be generated by converting only sampled frames and then estimating the frames between them with a separate image processing engine.
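The sample-then-estimate idea can be sketched as follows. Note the substitution: the text leaves the in-between estimation to a separate image processing engine, whereas this sketch uses plain linear blending between converted key frames as a stand-in. The function name, the `convert_frame` callback, and the representation of a frame as a flat list of pixel values are all assumptions.

```python
def convert_sampled_and_interpolate(frames, convert_frame, step):
    """Convert every step-th frame and linearly blend the frames in between.

    frames: list of frames, each a flat list of float pixel values.
    convert_frame: hypothetical callback that converts one frame to the
        target reaction (for example, a generative model).
    step: sampling interval; must be at least 1.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if not frames:
        return []
    # Key frame indices; the last frame is always a key frame.
    key_idx = list(range(0, len(frames), step))
    if key_idx[-1] != len(frames) - 1:
        key_idx.append(len(frames) - 1)
    converted = {i: convert_frame(frames[i]) for i in key_idx}

    out = []
    for a, b in zip(key_idx, key_idx[1:]):
        fa, fb = converted[a], converted[b]
        for i in range(a, b):
            t = (i - a) / (b - a)  # 0 at key frame a, approaching 1 at b
            out.append([(1 - t) * pa + t * pb for pa, pb in zip(fa, fb)])
    out.append(converted[key_idx[-1]])
    return out

# Toy usage: one-pixel frames and a "conversion" that doubles brightness.
result = convert_sampled_and_interpolate(
    [[0.0], [1.0], [2.0], [3.0], [4.0]],
    lambda frame: [p * 2 for p in frame],
    step=2,
)
```

Linear blending produces ghosting on real video with large motion, which is presumably why the text assigns this step to a dedicated estimation engine.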

In the embodiment shown in FIG. 2, the reaction detection unit 100 may be implemented as a convolutional neural network (CNN) with a depth of six layers. Such an artificial intelligence circuit may be implemented as a multi-core graphics processing circuit. As another example, it may be implemented as a gate-array-based dedicated circuit designed for real-time operation. The customized reaction determination unit 300 may be implemented as a computer program, executed on a microprocessor, that determines the customized reaction in a rule-based manner. The reaction clip conversion unit 500 may be implemented to include a deep convolutional GAN (DCGAN) model. Such a DCGAN may be designed and implemented as dedicated hardware matched to the target processing speed. In another example, at least part of the processing of the reaction clip conversion unit 500 may be executed on a cloud server.

Several approaches to such implementation are known for each functional block, depending on the design purpose, the specifications, and the choice of process technology. At present, it does not appear easy to implement an algorithm as complex as that of the proposed invention on a single signal processor or general-purpose microprocessor, or on a single server computer.

FIG. 4 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment. According to one aspect, a reaction clip that is part of multimedia content originally produced as one-way content is replaced with a customized reaction clip determined according to a viewer reaction. In the embodiment shown in FIG. 4, the reaction clip conversion unit 500 may include a reaction clip generation unit 510 and a reaction clip replacement unit 530. Based on an artificial intelligence algorithm, the reaction clip generation unit 510 may generate and output a customized reaction clip that reflects the customized reaction, using the reaction clip included in the multimedia content and the determined customized reaction information. The reaction clip replacement unit 530 may replace the reaction clip included in the multimedia content with the generated customized reaction clip and output the result.
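The replacement step can be pictured as a splice over a sequence of frames. This is a sketch under the assumption that the reaction clip occupies a known half-open frame range; the function name `replace_reaction_clip` and the frame representation are hypothetical.

```python
def replace_reaction_clip(frames, start, end, customized_clip):
    """Return a new frame list with frames[start:end] replaced.

    frames: list of frames of the original one-way content.
    start, end: half-open frame range [start, end) of the reaction clip.
    customized_clip: frames of the generated customized reaction clip;
        its length may differ from the range it replaces.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("invalid reaction clip range")
    return frames[:start] + list(customized_clip) + frames[end:]

# Toy usage with string placeholders standing in for image frames.
original = ["f0", "f1", "r0", "r1", "f4"]
edited = replace_reaction_clip(original, 2, 4, ["c0", "c1", "c2"])
```

The original list is left untouched, so the unmodified content remains available if no conversion is wanted for another viewer.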

According to a further aspect of the proposed invention, the customized reaction may include a reaction action. In the illustrated embodiment, the customized reaction determination unit 300 may include a customized reaction action determination unit 310. The customized reaction action may be designated by an index as one of a predetermined set of actions, for example {hand gesture 1, hand gesture 2, head gesture, standing up}. As another example, the customized reaction action information may be a set of joint-point coordinate values arranged in time-series order. An order is defined for the joint points of a graph model of the human body, and the position of each joint point at a given time is expressed as coordinate values. The motion of the character can then be expressed by listing, in time order, the ordered pairs that give the coordinates of the joint points at each time.
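The two representations described above, an index into a fixed action set and a time series of joint coordinates, might be sketched as follows. The three-joint skeleton, the joint names, and the coordinate values are assumptions made for illustration; the source does not specify a particular skeleton or coordinate system.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class ReactionAction(IntEnum):
    """Index-based form: one action out of a fixed category set."""
    HAND_GESTURE_1 = 0
    HAND_GESTURE_2 = 1
    HEAD_GESTURE = 2
    STAND_UP = 3


# Assumed joint order for a toy three-joint skeleton; a real graph model would have more.
JOINT_ORDER = ("head", "left_hand", "right_hand")

Point = Tuple[float, float]


@dataclass
class MotionSequence:
    """Time-series form: frames[t][j] is the (x, y) position of joint j at time step t."""
    frames: List[List[Point]]

    def trajectory(self, joint: str) -> List[Point]:
        """Return the positions of one joint over time."""
        j = JOINT_ORDER.index(joint)
        return [frame[j] for frame in self.frames]


# A right hand rising over three time steps while the other joints stay still.
raise_hand = MotionSequence(frames=[
    [(0.5, 0.9), (0.3, 0.5), (0.7, 0.5)],
    [(0.5, 0.9), (0.3, 0.5), (0.7, 0.7)],
    [(0.5, 0.9), (0.3, 0.5), (0.7, 0.9)],
])
print(ReactionAction.STAND_UP.value)         # 3
print(raise_hand.trajectory("right_hand"))   # [(0.7, 0.5), (0.7, 0.7), (0.7, 0.9)]
```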

The customized reaction action determination unit 310 determines the customized reaction action by mapping it from the detected viewer reaction. A reaction action that is mapped uniformly to the detected viewer reaction is monotonous and tends to lose the viewer's interest, so the customized reaction action mapped to a given viewer reaction value may be selected at random from a plurality of candidates, or may be determined by also taking into account an additional variable such as a time value. According to an additional aspect, the customized reaction action determination unit 310 may reflect the content of the reaction clip to be converted in the multimedia content 250 when determining the customized reaction. In one embodiment, the content of the reaction clip corresponding to the part of the multimedia content 250 to be converted may be classified into one of a predetermined set of categories, for example {explaining difficult logic, conveying simple knowledge, inducing interest, pausing briefly}. These categories may be referenced by index.
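A rule-based mapping of this kind, with random selection among several candidates, could look like the sketch below. The rule table, the viewer reaction labels ("bored", "confused"), and the function name `choose_action` are hypothetical; only the clip content categories and the action names follow the examples in the text.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: (viewer reaction, clip content category) -> candidate actions.
RULES: Dict[Tuple[str, str], List[str]] = {
    ("bored", "simple_knowledge_transfer"): ["hand_gesture_1", "stand_up"],
    ("bored", "interest_induction"): ["hand_gesture_2", "head_gesture"],
    ("confused", "hard_logic_explanation"): ["hand_gesture_1", "head_gesture"],
}
DEFAULT_ACTIONS = ["head_gesture"]  # used when no rule matches


def choose_action(viewer_reaction: str, clip_category: str,
                  rng: Optional[random.Random] = None) -> str:
    """Pick one of several candidate actions at random so repeats stay varied."""
    rng = rng or random.Random()
    candidates = RULES.get((viewer_reaction, clip_category), DEFAULT_ACTIONS)
    return rng.choice(candidates)


rng = random.Random(0)  # seeded only so the example is repeatable
picks = [choose_action("bored", "interest_induction", rng) for _ in range(5)]
print(picks)
```

A time value or another variable could be added to the lookup key in the same way if the selection should also depend on it.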

According to an aspect, the reaction clip generation unit 510 may include a reaction clip action conversion unit 511. The reaction clip action conversion unit 511 converts the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction action, changing the action without changing the identity of the character. In one embodiment, the reaction clip may be converted using generative adversarial networks (GANs). The aforementioned paper, Liqian Ma, et al., “Pose Guided Person Image Generation”, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 25 May 2017, discloses a technique for generating a target image from a novel pose and an input image. The disclosed system has a two-stage structure. Stage 1 includes a convolutional autoencoder with a U-Net-like structure and integrates the input image with the novel pose, expressed as the coordinates of 18 key points, to generate a somewhat blurry image that has essentially the same color and pose as the target. Stage 2 includes a variant of a conditional deep convolutional GAN (DCGAN) and refines the blurry image from stage 1 through adversarial training.
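The text does not state how the input image and the 18 key points are combined before stage 1. One common encoding, assumed here purely for illustration, rasterizes each key point into its own binary heatmap channel and stacks those channels onto the image channels. The helper names, the 8 x 8 image size, and the key-point positions below are made up.

```python
from typing import List, Tuple

Channel = List[List[float]]  # one H x W plane of values


def keypoint_heatmap(x: int, y: int, height: int, width: int, radius: int = 1) -> Channel:
    """Binary map that is 1.0 within `radius` pixels of (x, y) and 0.0 elsewhere."""
    return [
        [1.0 if (cx - x) ** 2 + (cy - y) ** 2 <= radius ** 2 else 0.0
         for cx in range(width)]
        for cy in range(height)
    ]


def build_generator_input(image: List[Channel],
                          keypoints: List[Tuple[int, int]]) -> List[Channel]:
    """Stack the image channels with one heatmap channel per key point."""
    height, width = len(image[0]), len(image[0][0])
    heatmaps = [keypoint_heatmap(x, y, height, width) for x, y in keypoints]
    return image + heatmaps


H, W = 8, 8
rgb = [[[0.0] * W for _ in range(H)] for _ in range(3)]   # blank 3-channel image
pose = [(i % W, (i * 3) % H) for i in range(18)]           # 18 made-up key points
stacked = build_generator_input(rgb, pose)
print(len(stacked))  # 21: 3 image channels + 18 pose channels
```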

In the illustrated embodiment, the reaction clip replacement unit 530 replaces the reaction clip that forms part of the multimedia content with the customized reaction clip generated by the reaction clip action conversion unit 511, and outputs the result.

According to a further aspect of the proposed invention, the customized reaction may include a reaction expression. FIG. 5 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment. In the illustrated embodiment, the customized reaction determination unit 300 may include a customized reaction expression determination unit 330. The customized reaction expression may be designated by an index as one of a predetermined set of expressions, for example {big laugh, smile, frown, comedy 1, comedy 2, angry expression}. As another example, the customized reaction expression information may be a set of feature-point coordinate values arranged in time-series order. An order is defined for the feature points of a graph model of the face, and the position of each feature point at a given time is expressed as coordinate values. The facial expression can then be expressed by listing, in time order, the ordered pairs that give the coordinates of the feature points at each time.

The customized reaction expression determination unit 330 determines the customized reaction expression by mapping it from the detected viewer reaction. A reaction expression that is mapped uniformly to the detected viewer reaction is monotonous and tends to lose the viewer's interest, so the customized reaction expression mapped to a given viewer reaction value may be selected at random from a plurality of candidates, or may be determined by also taking into account an additional variable such as a time value. According to an additional aspect, the customized reaction expression determination unit 330 may reflect the content of the reaction clip to be converted in the multimedia content 250 when determining the customized reaction. In one embodiment, the content of the reaction clip corresponding to the part of the multimedia content 250 to be converted may be classified into one of a predetermined set of categories, for example {explaining difficult logic, conveying simple knowledge, inducing interest, pausing briefly}. These categories may be referenced by index.

According to an aspect, the reaction clip generation unit 510 may include a reaction clip expression conversion unit 513. The reaction clip expression conversion unit 513 converts the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction expression, changing the expression without changing the identity of the character. In one embodiment, the reaction clip may be converted using generative adversarial networks (GANs). GAN-based facial expression translation techniques that change a character's expression to a target expression while maintaining the character's identity are known. The paper Hao Tang et al., “Expression Conditional GAN for Facial Expression-to-Expression Translation”, ICIP 2019, discloses the Expression Conditional GAN (EC-GAN), which maps from one image domain to another based on an additional expression attribute.

In the illustrated embodiment, the reaction clip replacement unit 530 replaces the reaction clip that forms part of the multimedia content with the customized reaction clip generated by the reaction clip expression conversion unit 513, and outputs the result.

FIG. 6 is a block diagram illustrating the configuration of a content conversion apparatus according to another embodiment. In the illustrated embodiment, the customized reaction determination unit 300 includes a customized reaction expression determination unit 330 and a customized reaction action determination unit 310. The reaction clip conversion unit 500 includes a reaction clip expression conversion unit 513, a reaction clip action conversion unit 511, and a reaction clip replacement unit 530.

The customized reaction expression determination unit 330 is similar to the corresponding component of the embodiment shown in FIG. 5, and the customized reaction action determination unit 310 is similar to the corresponding component of the embodiment shown in FIG. 4. The reaction clip expression conversion unit 513 is similar to the corresponding component of the embodiment shown in FIG. 5. The reaction clip action conversion unit 511 is similar to the corresponding component of the embodiment shown in FIG. 4, except that its input is the image generated by the reaction clip expression conversion unit 513.
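The ordering described for FIG. 6, in which the output of expression conversion becomes the input of action conversion, can be shown with a small sketch. The function names are hypothetical, frames are represented by strings, and both converters are stand-ins that only tag frames rather than GAN models.

```python
from typing import List

Clip = List[str]  # placeholder frames; a real clip would hold image data


def convert_expression(clip: Clip, expression: str) -> Clip:
    """Stand-in for unit 513: tags frames instead of editing faces."""
    return [f"{frame}|expr={expression}" for frame in clip]


def convert_action(clip: Clip, action: str) -> Clip:
    """Stand-in for unit 511: tags frames instead of editing poses."""
    return [f"{frame}|act={action}" for frame in clip]


def convert_clip(clip: Clip, expression: str, action: str) -> Clip:
    # Expression conversion runs first; its output is the input to action conversion.
    return convert_action(convert_expression(clip, expression), action)


converted = convert_clip(["f1", "f2"], "smile", "hand_gesture_1")
print(converted)
# ['f1|expr=smile|act=hand_gesture_1', 'f2|expr=smile|act=hand_gesture_1']
```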

Although the present invention has been described above through embodiments with reference to the accompanying drawings, it is not limited to them and should be construed to encompass the various modifications that a person skilled in the art could obviously derive from them. The claims are intended to cover such modifications.

11: set-top box 13: display
15: tablet 17: smartphone
30: content service server 31: content database
50: public network
10: information processing system
100: reaction detection unit
110: expression-based reaction detection unit 130: motion-based reaction detection unit
150: voice-based reaction detection unit
210: camera 230: memory
250: multimedia content 270: microphone
300: customized reaction determination unit
310: customized reaction action determination unit 330: customized reaction expression determination unit
500: reaction clip conversion unit
510: reaction clip generation unit
511: reaction clip action conversion unit 513: reaction clip expression conversion unit
530: reaction clip replacement unit

Claims

A multimedia content conversion apparatus comprising:
a memory; and
at least one information processing system that reads control data stored in the memory and processes input data, wherein the information processing system comprises:
a reaction detection unit that detects a viewer reaction to multimedia content being output;
a customized reaction determination unit that determines a customized reaction for the multimedia content according to the viewer reaction detected by the reaction detection unit; and
a reaction clip conversion unit that converts a reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction, changing the reaction without changing the identity of the character.

The apparatus according to claim 1, wherein the reaction detection unit comprises at least one of:
an expression-based reaction detection unit that detects a reaction by classifying a viewer's facial expression into one of a predetermined set of categories;
a motion-based reaction detection unit that detects a reaction by classifying a motion, derived from a change in the viewer's posture, into one of a predetermined set of categories; and
a voice-based reaction detection unit that detects a reaction by classifying the viewer's voice into one of a predetermined set of categories.

The apparatus according to claim 1, wherein the reaction detection unit detects each viewer's reaction from images of a plurality of viewers received through a network and combines the detected reactions to determine the viewer reaction.

The apparatus according to claim 1, wherein the customized reaction determination unit determines the customized reaction corresponding to the reaction clip of the multimedia content according to the detected viewer reaction and the content of the reaction clip.

The apparatus according to claim 1, wherein the reaction clip conversion unit comprises:
a reaction clip generation unit that, based on an artificial intelligence algorithm, generates and outputs a customized reaction clip reflecting the customized reaction from the reaction clip included in the multimedia content and the determined customized reaction information; and
a reaction clip replacement unit that replaces the reaction clip included in the multimedia content with the generated customized reaction clip and outputs the result.

The apparatus according to claim 1, wherein
the customized reaction determination unit comprises a customized reaction action determination unit that determines a customized reaction action of the character according to the detected viewer reaction, and
the reaction clip conversion unit comprises a reaction clip action conversion unit that converts the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction action, changing the action without changing the identity of the character.

The apparatus according to claim 1, wherein
the customized reaction determination unit further comprises a customized reaction expression determination unit that determines a customized reaction expression of the character according to the detected viewer reaction, and
the reaction clip conversion unit comprises a reaction clip expression conversion unit that converts the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction expression, changing the reaction expression without changing the identity of the character.

A multimedia content conversion method executed using a memory and at least one information processing system that reads control data stored in the memory and processes input data, the method comprising:
a reaction detection step of detecting a viewer reaction to multimedia content being output;
a customized reaction determination step of determining a customized reaction for the multimedia content according to the viewer reaction detected in the reaction detection step; and
a reaction clip conversion step of converting a reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction, changing the reaction without changing the identity of the character.

The method of claim 8, wherein the reaction detection step comprises at least one of:
an expression-based reaction detection step of detecting a reaction by classifying a viewer's facial expression into one of a predetermined set of categories;
a motion-based reaction detection step of detecting a reaction by classifying a motion, derived from a change in the viewer's posture, into one of a predetermined set of categories; and
a voice-based reaction detection step of detecting a reaction by classifying the viewer's voice into one of a predetermined set of categories.

The method according to claim 1, wherein the reaction detection step detects each viewer's reaction from images of a plurality of viewers received through a network and combines the detected reactions to determine the viewer reaction.

The method of claim 8, wherein the customized reaction determination step determines the customized reaction corresponding to the reaction clip of the multimedia content according to the detected viewer reaction and the content of the reaction clip.

The method of claim 8, wherein the reaction clip conversion step comprises:
a reaction clip generation step of generating and outputting, based on an artificial intelligence algorithm, a customized reaction clip reflecting the customized reaction from the reaction clip included in the multimedia content and the determined customized reaction information; and
a reaction clip replacement step of replacing the reaction clip included in the multimedia content with the generated customized reaction clip and outputting the result.

The method of claim 8, wherein
the customized reaction determination step comprises a customized reaction action determination step of determining a customized reaction action of the character according to the detected viewer reaction, and
the reaction clip conversion step comprises a reaction clip action conversion step of converting the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction action, changing the action without changing the identity of the character.

The method of claim 8, wherein
the customized reaction determination step further comprises a customized reaction expression determination step of determining a customized reaction expression of the character according to the detected viewer reaction, and
the reaction clip conversion step comprises a reaction clip expression conversion step of converting the reaction clip of the multimedia content being output into a customized reaction clip reflecting the determined customized reaction expression, changing the reaction expression without changing the identity of the character.