KR102512468B1

KR102512468B1 - Apparatus and method for analyzing viewers' responses to video content

Info

Publication number: KR102512468B1
Application number: KR1020220148473A
Authority: KR
Inventors: 권성은; 김연태; 장서인
Original assignee: 주식회사 제로투원파트너스
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2023-03-22

Abstract

The present invention provides an apparatus and method for analyzing viewers' reactions to video content. The video content is played while the face of a viewer watching it is filmed to generate a reaction video. Each video frame of the reaction video is applied to an artificial intelligence model trained to determine a viewer's emotion from a face image, which calculates, for each of a set of predefined emotions, the probability that the viewer's face image in that frame corresponds to that emotion. The probability values of each emotion are then averaged over the viewers for each video frame, and the average value of each emotion is used to calculate the viewers' immersion for that frame. By comparing the per-frame immersion obtained in this way with the average immersion of the entire reaction video, the following are selected and provided to a user: 1) the sections with the highest and the lowest instantaneous immersion over the entire playback time of the video content; 2) an immersion maintenance section, in which frame immersion stays above the average, and an immersion decline section, in which it stays below the average; and 3) for each emotion, the sections with the maximum and the minimum probability values. Accordingly, viewers' emotions and reactions to video content can be analyzed in various ways and the analysis results provided.

Description

Apparatus and method for analyzing viewers' responses to video content

The present invention relates to an apparatus and method for analyzing viewers' reactions and, more particularly, to an apparatus and method for analyzing viewers' reactions to advertisements, movies, dramas, and other video content.

Recently, the media environment has been changing rapidly as OTT services have become widespread and the number of providers distributing video content through social media and internet channels such as YouTube and Instagram has grown quickly.

As the amount of video content offered to consumers grows exponentially, the ways consumers consume it are also diversifying. Consumption patterns are shifting toward binge-watching, accelerated-speed playback, scene skipping, and watching only summary content, so the notion of scheduled viewing, such as never missing a show's first broadcast, is fading. For advertisements, dramas, movies, and other video content, new viewing patterns are emerging: consumers pick only the scenes they want, play content faster using speed-up and skip functions, and grasp a drama's story from a summary rather than watching the entire series.

In response to these changes in consumption patterns, video content producers increasingly need to analyze which scenes consumers are satisfied or dissatisfied with, and to reflect those satisfaction and dissatisfaction factors in production.

In particular, dissatisfaction factors can be harder to find than satisfaction factors. Viewers readily leave reviews about parts that satisfied them or greatly disappointed them, but they are unlikely to write reviews about parts that simply failed to interest them.

Therefore, video content production requires a method and apparatus that can analyze consumers' actual reactions, rather than relying only on analysis of viewing reviews or comments.

An object of the present invention is to provide a reaction analysis apparatus and method that detect the reactions viewers show while watching video content and analyze those reactions on a per-video-frame basis.

To achieve the above object, a method for analyzing viewer reactions to video content according to a preferred embodiment of the present invention is performed in a reaction analysis system including a processor and a memory, and includes: (b) extracting a face region image of a viewer from each video frame of a reaction video that captures viewers watching the video content and is synchronized with the video content; (c) applying each extracted face region image to an emotion determination artificial intelligence model, which determines and outputs, for each viewer and each video frame of the reaction video, the probability that the face region image represents each of a set of predefined emotions; (d) calculating the viewers' immersion for each video frame using the per-emotion probability values of the video frames of different viewers' reaction videos that correspond to the same time; and (f) selecting and providing sections of video frames whose immersion meets a predefined condition.

The method may further include, before step (b), (a) providing the emotion determination artificial intelligence model trained, for each emotion, on face images containing facial expressions that represent that emotion.

In step (a), the emotion determination artificial intelligence model may be provided by training an artificial intelligence model to extract a face region from an image containing a human face, analyze the extracted face region, and determine and output, for each predefined emotion, the probability that the person's expression represents that emotion.

The video content and the reaction videos of the different viewers may be synchronized with one another using time stamps.
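The patent only states that the streams are synchronized by time stamps. As a sketch of how frames from different viewers might be matched to the same content time, the following Python groups per-frame probability vectors by timestamp. The input format, the 40 ms snapping grid (one frame at an assumed 25 fps), and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: each viewer maps to a list of
# (timestamp_ms, probabilities) pairs, where timestamp_ms is the content
# playback time recorded with the reaction-video frame.
Frame = Tuple[int, Sequence[float]]


def align_by_timestamp(
    viewers: Dict[str, List[Frame]], frame_ms: int = 40
) -> Dict[int, List[Sequence[float]]]:
    """Group different viewers' frames that correspond to the same content time.

    Timestamps are snapped to a grid of frame_ms (40 ms = 1/25 s, assuming
    25 fps capture). Only grid slots in which every viewer has a frame are kept.
    """
    slots: Dict[int, Dict[str, Sequence[float]]] = defaultdict(dict)
    for viewer_id, frames in viewers.items():
        for timestamp_ms, probs in frames:
            slots[round(timestamp_ms / frame_ms)][viewer_id] = probs

    n_viewers = len(viewers)
    return {
        slot * frame_ms: list(by_viewer.values())
        for slot, by_viewer in sorted(slots.items())
        if len(by_viewer) == n_viewers
    }
```

Dropping incomplete slots keeps every per-frame average over the same set of viewers; a real system might instead average over whichever viewers are present.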

In step (d), the viewers' immersion for each video frame may be calculated by averaging the probability values of each emotion over the video frames of different viewers' reaction videos that correspond to the same time.

In step (d), the immersion of each video frame may be calculated by summing the average probabilities of all emotions except expressionless (neutral).
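Using notation introduced here only for illustration (V viewers, p_{v,t,e} the model's probability for viewer v, frame t, and emotion e, and E the set of predefined emotions), the per-frame averaging and immersion sum of step (d) can be written as below. The last equality is a derived observation, valid only under the assumption that each viewer's probabilities over E sum to 1: under that assumption, this immersion measure equals one minus the average neutral probability.

```latex
\bar{p}_{t,e} = \frac{1}{V} \sum_{v=1}^{V} p_{v,t,e},
\qquad
I_t = \sum_{e \in E \setminus \{\mathrm{neutral}\}} \bar{p}_{t,e}
    = 1 - \bar{p}_{t,\mathrm{neutral}}
```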

In step (f), the frames with the maximum and minimum immersion among all video frames may be selected, and the periods of a predetermined length before and after each selected frame may be designated as the maximum instantaneous immersion section and the minimum instantaneous immersion section, respectively.
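A minimal sketch of selecting the maximum and minimum instantaneous immersion sections, assuming immersion is already a per-frame list. The patent leaves the window length as a predetermined time (elsewhere it mentions 30-second or 1-minute sections), so the 25 fps rate and the 15 s on each side used as defaults here are assumptions, as is the function name.

```python
from typing import List, Tuple

Section = Tuple[int, int]  # inclusive (start_frame, end_frame)


def instant_peak_sections(
    immersion: List[float], fps: float = 25.0, half_window_s: float = 15.0
) -> Tuple[Section, Section]:
    """Return the sections centred on the max- and min-immersion frames."""
    half = int(round(half_window_s * fps))  # e.g. 15 s * 25 fps = 375 frames
    last = len(immersion) - 1

    def window(center: int) -> Section:
        # Clip the window so it stays inside the video.
        return max(0, center - half), min(last, center + half)

    frames = range(len(immersion))
    i_max = max(frames, key=immersion.__getitem__)
    i_min = min(frames, key=immersion.__getitem__)
    return window(i_max), window(i_min)
```

If several frames share the extreme value, `max`/`min` pick the earliest one; the patent does not say how ties are resolved.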

The method may further include, between steps (d) and (f), (e) leveling the immersion in units of a predefined number of video frames.
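The patent does not specify how the leveling in step (e) is done. One plausible reading, assumed here, is block averaging: every group of N consecutive frames is replaced by its mean, which suppresses frame-to-frame jitter in the model output. A moving average would be an equally reasonable choice; the function name and the default of 25 frames (one second at 25 fps) are hypothetical.

```python
from typing import List


def level_immersion(immersion: List[float], block: int = 25) -> List[float]:
    """Replace every run of `block` consecutive frames by that run's mean.

    With block=25 and 25 fps capture this levels immersion per second.
    The output has the same length as the input; the final block may be shorter.
    """
    if block < 1:
        raise ValueError("block must be a positive number of frames")
    leveled: List[float] = []
    for start in range(0, len(immersion), block):
        chunk = immersion[start:start + block]
        mean = sum(chunk) / len(chunk)
        leveled.extend([mean] * len(chunk))
    return leveled
```

Keeping the output at one value per frame lets the later section-selection steps work on frame indices unchanged.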

In step (f), the longest section in which frame immersion stays higher than the average immersion of all video frames may be selected and provided as the immersion maintenance section, and the longest section in which frame immersion stays lower than that average may be selected and provided as the immersion decline section.
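Selecting the immersion maintenance and decline sections reduces to finding the longest run of consecutive frames above, or below, the overall mean. A sketch, assuming immersion is already a per-frame list; the function names are hypothetical.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Section = Tuple[int, int]  # inclusive (start_frame, end_frame)


def runs(values: List[float], keep: Callable[[float], bool]) -> Iterator[Section]:
    """Yield maximal runs of consecutive frames for which keep(value) is true."""
    start = None
    for i, v in enumerate(values):
        if keep(v):
            if start is None:
                start = i
        elif start is not None:
            yield start, i - 1
            start = None
    if start is not None:
        yield start, len(values) - 1


def maintenance_and_decline(
    immersion: List[float],
) -> Tuple[float, Optional[Section], Optional[Section]]:
    """Return (average, immersion maintenance section, immersion decline section)."""
    avg = sum(immersion) / len(immersion)

    def length(section: Section) -> int:
        return section[1] - section[0] + 1

    # max() returns the earliest section when several have the same length.
    above = max(runs(immersion, lambda v: v > avg), key=length, default=None)
    below = max(runs(immersion, lambda v: v < avg), key=length, default=None)
    return avg, above, below
```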

In step (f), an overall average probability may be calculated for each emotion across all video frames. Among the sections in which a frame's probability for that emotion is at or above the overall average, sections may be selected and provided in order, starting from the longest; likewise, among the sections in which that probability is at or below the overall average, sections may be selected and provided in order, starting from the longest.
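The per-emotion selection can be sketched the same way, applied to each emotion's viewer-averaged probability curve. Here "in order" is interpreted, as an assumption, as ranking sections from longest to shortest; the input layout (emotion name mapped to per-frame mean probabilities) and the function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Section = Tuple[int, int]  # inclusive (start_frame, end_frame)


def ranked_sections(series: List[float], at_or_above: bool) -> List[Section]:
    """Sections where the series is >= (or <=) its overall mean, longest first."""
    mean = sum(series) / len(series)

    def keep(v: float) -> bool:
        return v >= mean if at_or_above else v <= mean

    sections: List[Section] = []
    i = 0
    for flag, group in groupby(series, key=keep):
        n = sum(1 for _ in group)  # length of this run of equal flags
        if flag:
            sections.append((i, i + n - 1))
        i += n
    # sorted() is stable, so equally long sections stay in chronological order.
    return sorted(sections, key=lambda s: s[1] - s[0], reverse=True)


def emotion_sections(
    mean_probs: Dict[str, List[float]],
) -> Dict[str, Tuple[List[Section], List[Section]]]:
    """For each emotion: (sections at/above its mean, sections at/below it)."""
    return {
        emotion: (ranked_sections(s, True), ranked_sections(s, False))
        for emotion, s in mean_probs.items()
    }
```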

Meanwhile, a computer program according to a preferred embodiment of the present invention is stored in a non-transitory storage medium and, when executed by a computer including a processor, performs the above method for analyzing viewer reactions to video content.

Meanwhile, an apparatus for analyzing viewer reactions to video content according to a preferred embodiment of the present invention includes a processor and a memory storing predetermined instructions. By executing the instructions stored in the memory, the processor performs: (b) extracting a face region image of a viewer from each video frame of a reaction video that captures viewers watching the video content and is synchronized with the video content; (c) applying each extracted face region image to an emotion determination artificial intelligence model, which determines and outputs, for each viewer and each video frame of the reaction video, the probability that the face region image represents each of a set of predefined emotions; (d) calculating the viewers' immersion for each video frame using the per-emotion probability values of the video frames of different viewers' reaction videos that correspond to the same time; and (f) selecting and providing sections of video frames whose immersion meets a predefined condition.

Before step (b), the processor may further perform (a) providing the emotion determination artificial intelligence model trained, for each emotion, on face images containing facial expressions that represent that emotion.

Between steps (d) and (f), the processor may further perform (e) leveling the immersion in units of a predefined number of video frames.

According to the present invention, a reaction video is generated by filming the face of a viewer while video content is played, and each video frame of the reaction video is applied to an artificial intelligence model trained to determine the viewer's emotion from a face image, yielding the probability that the viewer's face image in each frame corresponds to each predefined emotion. The probability values of each emotion are then averaged over the viewers for each video frame, and these averages are used to obtain the viewers' immersion for that frame. By comparing each frame's immersion with the average immersion of the whole reaction video, the invention selects and provides to a user ① the sections with the maximum and minimum instantaneous immersion over the total playback time of the video content, ② an immersion maintenance section in which frame immersion stays above the average and an immersion decline section in which it stays below the average, and ③ the sections with the maximum and minimum probability values for each emotion. In this way, viewers' emotions and reactions to video content can be analyzed in various ways and the results provided.

FIG. 1 is a diagram showing the overall configuration of a system for analyzing viewer reactions to video content according to a preferred embodiment of the present invention.
FIG. 2 is a diagram illustrating a method for analyzing viewer reactions to video content according to a preferred embodiment of the present invention.
FIGS. 3A to 3C are graphs displaying scaled immersion.

Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.

The above objects, features, and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below.

Throughout the specification, like reference numerals in principle denote like elements. Components having the same function within the scope of the same idea in the drawings of each embodiment are described using the same reference numerals.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless stated otherwise. Terms such as "...unit" and "module" denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of the two.

Detailed descriptions of known functions or configurations related to the present invention are omitted when they could unnecessarily obscure its subject matter. Numbers used in this specification (e.g., first, second) are merely identifiers for distinguishing one component from another.

FIG. 1 is a diagram showing the overall configuration of a system for analyzing viewer reactions to video content according to a preferred embodiment of the present invention.

Referring to FIG. 1, the system for analyzing viewer reactions to video content basically includes a reaction image collection device 230 and a reaction analysis device 250, and may further include a reaction image DB 240 that stores the viewer reaction videos collected by the reaction image collection device 230 and a reaction analysis DB 260 that stores the results analyzed by the reaction analysis device 250. The reaction analysis device 250 may include a processor 253, a communication module 251, and a memory 255.

Looking at the function of each component, the reaction image collection device 230 either contains a camera 210 and a display 220 or is connected to an external camera 210 and display 220. It displays the video content to a viewer through the display 220 while filming the viewer with the camera 210 to generate a reaction video, and then transmits the reaction video to the reaction analysis device 250 or stores it in the reaction image DB 240. Each frame of the reaction video carries a timestamp so that it is synchronized with the playback time of the video content.

In a preferred embodiment of the present invention, the reaction video is captured at a frame rate of 25 fps, but the frame rate is not limited thereto.

The reaction analysis device 250 receives a reaction video for the video content either from the reaction image collection device 230 or from the reaction image DB 240, and analyzes each video frame of the reaction video to determine the emotions each viewer expressed while watching the video content.

The reaction analysis device 250 then calculates, for each frame, the average of each emotion expressed by all viewers, computes the immersion for that frame from these averages, obtains the average immersion over the entire video content, and can generate and provide various statistics based on immersion.
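The per-frame computation described above can be sketched as follows, assuming each frame's input is a list of time-aligned probability vectors, one per viewer, in the seven-emotion order used in this embodiment (neutral last). The English emotion labels and the function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Emotion order follows the seven classes of this embodiment; neutral is last.
EMOTIONS = ["joy", "embarrassment", "anger", "anxiety", "hurt", "sadness", "neutral"]
NEUTRAL = EMOTIONS.index("neutral")


def frame_immersion(per_viewer: Sequence[Sequence[float]]) -> float:
    """Immersion of one frame: sum of the viewer-averaged non-neutral probabilities."""
    n = len(per_viewer)
    means = [sum(p[e] for p in per_viewer) / n for e in range(len(EMOTIONS))]
    return sum(m for e, m in enumerate(means) if e != NEUTRAL)


def immersion_curve(
    frames: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[float], float]:
    """Per-frame immersion for time-aligned frames, plus the average over the video."""
    curve = [frame_immersion(viewers) for viewers in frames]
    return curve, sum(curve) / len(curve)
```

The returned average is the reference value against which the section-selection statistics compare each frame.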

The statistics provided according to a preferred embodiment of the present invention may include information on: sections in which frame immersion is higher than the average immersion, and the section in which it stays above the average for the longest time; sections in which frame immersion is lower than the average, and the section in which it stays below the average for the longest time; and the 30-second or 1-minute sections before and after the frames with the highest and lowest immersion.

The reaction analysis device 250 according to a preferred embodiment of the present invention includes a processor 253, a memory 255, and a communication module 251.

The memory 255 according to a preferred embodiment of the present invention may store instructions executable by the processor 253 and programs executed by the processor 253, as well as input/output data. Examples of the memory 255 include a hard disk drive (HDD), a solid-state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). The memory 255 may alternatively be operated as web storage or a cloud server that performs the function of a storage medium on the Internet.

The processor 253 according to a preferred embodiment of the present invention may be implemented as a central processing unit (CPU) or a similar device (e.g., a micro processing unit (MPU) or a micro controller unit (MCU)), and performs each step of the method for analyzing viewer reactions to video content, described below with reference to FIG. 2, by executing the instructions stored in the memory 255.

The communication module 251 communicates with the reaction image collection device 230 by wire or wirelessly. Representative wired methods used by the communication module 251 are LAN (Local Area Network) and USB (Universal Serial Bus) communication, and other methods are also possible.

For wireless communication, WPAN (Wireless Personal Area Network) methods such as Bluetooth or Zigbee can mainly be used; WLAN (Wireless Local Area Network) methods such as Wi-Fi, mobile communication networks (LTE, 5G, etc.), or other known communication methods can also be used.

FIG. 2 is a diagram illustrating a method for analyzing viewer reactions to video content according to a preferred embodiment of the present invention.

With further reference to FIG. 2, the functions of the system for analyzing viewer reactions to video content and the method for analyzing viewer reactions to video content according to a preferred embodiment of the present invention are described.

First, the reaction analysis device 250 according to a preferred embodiment of the present invention provides an emotion determination artificial intelligence model by training an artificial intelligence model on face images containing facial expressions that represent each emotion, so that the model can extract a face region from an image containing a human face and determine the person's emotion by analyzing the extracted face region (S210).

To this end, in a preferred embodiment of the present invention, the processor 253 first used the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to extract only the face region from each training face image, and then trained the emotion determination artificial intelligence model by supervised learning, applying the extracted face region image to the model's input and the emotion represented by that face region image to its output.
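The data-preparation part of this step could look like the sketch below. It does not call a real MTCNN library: it assumes a detector has already returned a bounding box as (x, y, width, height), which is how some MTCNN implementations report faces, and shows cropping that box and pairing the crop with its emotion label as a supervised training sample. Clamping is a precaution, since detected boxes may extend past the image border. The image representation, English labels, and function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale for simplicity)
Box = Tuple[int, int, int, int]  # (x, y, width, height), assumed detector format

EMOTIONS = ["joy", "embarrassment", "anger", "anxiety", "hurt", "sadness", "neutral"]


def crop_face(image: Image, box: Box) -> Image:
    """Cut the face region out of an image, clamping the box to the image bounds."""
    x, y, w, h = box
    height, width = len(image), len(image[0])
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in image[y0:y1]]


def training_pair(image: Image, box: Box, emotion: str) -> Tuple[Image, int]:
    """Supervised-learning sample: face crop as input, emotion class index as target."""
    return crop_face(image, box), EMOTIONS.index(emotion)
```

In practice the crop would also be resized to the classifier's input resolution before training.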

In a preferred embodiment of the present invention, face regions extracted from about 500,000 emotion-expressing face images provided by the National Information Society Agency (NIA) of Korea were used as training face images, and the EfficientNet-B3 algorithm was used as the emotion determination artificial intelligence model. Naturally, face images from other databases may be used for training, and other known artificial intelligence models may be used for emotion determination.

The training images provided by the National Information Society Agency (NIA) of Korea classify facial expressions into seven emotions: joy, embarrassment, anger, anxiety, hurt, sadness, and neutral (expressionless). The emotion determination artificial intelligence model of the present invention is trained on face images representing all seven emotions, and in the emotion determination process described below it outputs, as a value for each emotion, the probability that the input face image corresponds to that emotion.

For example, when face image A is input, the emotion determination artificial intelligence model outputs the probability that the face image corresponds to each emotion, such as joy: 0.4, embarrassment: 0.1, anger: 0.1, anxiety: 0.1, hurt: 0.05, sadness: 0.15, neutral (expressionless): 0.1, and these probability values sum to 1.
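The patent states only that the per-emotion outputs are probabilities summing to 1. A softmax output layer is the usual way a multi-class classifier such as EfficientNet-B3 produces such values; assuming that, the sketch below converts seven hypothetical raw scores (logits) into labeled probabilities. The English labels and function names are illustrative.

```python
import math
from typing import Dict, List

EMOTIONS = ["joy", "embarrassment", "anger", "anxiety", "hurt", "sadness", "neutral"]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def emotion_probabilities(logits: List[float]) -> Dict[str, float]:
    """Label the seven probabilities with their emotion names."""
    return dict(zip(EMOTIONS, softmax(logits)))
```

With four emotion classes instead of seven, only the label list and the number of logits change.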

Although the present invention uses face images classified into seven emotions as training images, if face images classified into four emotions were used instead, the emotion determination artificial intelligence model would output a probability for each of the four emotions.

After training of the emotion determination artificial intelligence model is completed, the reaction image collection device 230 plays the video content through the display 220, films the viewers watching it with the camera 210, and generates a reaction video synchronized with the video content by time stamps. The generated reaction video is transmitted to the reaction analysis device 250 or stored in the reaction image DB 240; the reaction analysis device 250 then receives the reaction video from the reaction image collection device 230 through the communication module 251, or receives a reaction video previously stored in the reaction image DB 240 from that DB (S220).

In step S220 of the preferred embodiment, the reaction video is generated at a frame rate of about 25 fps, although the frame rate may vary, and each frame of the reaction video carries a timestamp synchronized with the playback time of the video content.

Although the preferred embodiment has been described as generating the reaction video after the emotion determination model has been trained, the reaction image collection device 230 may instead generate the reaction video and store it in the reaction image DB 240 before the emotion determination model is trained.

To analyze viewers' satisfaction or dissatisfaction with video content and their emotional changes while watching it, face images of multiple viewers are required. Therefore, when each reaction video records a single viewer watching the video content alone, multiple reaction videos (one per viewer) are required for each piece of video content.

If, instead, multiple viewers watch the video content together while a single reaction video is recorded, and the face images of all of them appear in the frames of that reaction video, the analysis described below can be performed with that one reaction video alone.

After the reaction video is received, the processor 253 of the reaction analysis device 250 extracts an image of the viewer's face region from each video frame of the reaction video using the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm (S230).

Once the face regions have been extracted, the processor 253 applies the face region image of each video frame to the emotion determination artificial intelligence model described above to determine the emotion expressed by the face region in each frame (S240).

Specifically, when a face image is input to the trained emotion determination artificial intelligence model, the model outputs, for each emotion, a value indicating how likely it is that the face image corresponds to that emotion.

As described above, when the emotion determination artificial intelligence model of the present invention is trained on seven emotions, the analysis result output in step S240 consists of the probabilities that the face image corresponds to each of the seven emotions: joy, embarrassment, anger, anxiety, hurt, sadness, and neutral (expressionless). If the analysis uses all seven emotions, these values are used as-is in the process described below.

However, when fewer emotions are to be analyzed than the model was trained on, as in the preferred embodiment described below, the probability that the face image belongs to each emotion of interest is recalculated over the emotions to be analyzed only.

For example, the preferred embodiment described below analyzes only whether the viewers were immersed in the video content and, if so, in which of the most representative emotions (joy, sadness, or anger). Therefore, instead of using all seven emotions, the probability that the face image belongs to each of four emotions (joy, sadness, anger, and neutral/expressionless) is recalculated from the model's outputs for those four emotions. In the present invention, the probability values for these four of the seven emotions were applied to the Softmax function to calculate the probability that the face image belongs to each of the four emotions. Softmax normalizes its inputs to output values between 0 and 1 whose sum is always 1; since it is widely known in the field of deep learning to which the present invention belongs, a detailed description is omitted.
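A minimal sketch of this re-normalization, assuming the model's seven outputs are available as a dictionary keyed by hypothetical English emotion labels. It follows the text in passing the four selected values through Softmax; the function names and the example numbers are made up for illustration.

```python
import math

# The four emotions analyzed in the preferred embodiment (labels are hypothetical).
SELECTED = ("joy", "sadness", "anger", "neutral")


def softmax(values):
    """Map values to (0, 1) so that the results sum to 1."""
    m = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(model_output, selected=SELECTED):
    """Keep only the selected emotions from the 7-emotion output and apply Softmax."""
    return dict(zip(selected, softmax([model_output[name] for name in selected])))


# Made-up 7-emotion output for one face image.
output = {"joy": 0.40, "embarrassment": 0.05, "anger": 0.10, "anxiety": 0.05,
          "hurt": 0.05, "sadness": 0.15, "neutral": 0.20}
probs = renormalize(output)
print({name: round(p, 3) for name, p in probs.items()})
```

Because Softmax exponentiates its inputs, applying it to values that are already probabilities compresses the differences between them more than dividing each value by the four-value sum would; the sketch follows the text's choice of Softmax.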

For convenience of explanation, consider a simple example in which a reaction video consisting of three frames is generated and analyzed for each of two viewers. Performing steps S230 and S240 on each viewer's reaction video yields the per-viewer emotion analysis results shown in Table 1 below.

Once the probability that the face image belongs to each emotion has been output for each viewer and each video frame, the processor 253 averages the probability values of each emotion across the viewers' reaction videos, frame by frame, using the frames that correspond to the same point in time (S250).
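A sketch of step S250, assuming each viewer's per-frame outputs are already aligned by timestamp so that list index i refers to the same moment of the content for every viewer. The data layout and the function name `average_across_viewers` are hypothetical, and the numbers in the example are invented (they are not the values of Table 1).

```python
from statistics import mean

EMOTIONS = ("joy", "sadness", "anger", "neutral")


def average_across_viewers(per_viewer):
    """Return one dict per frame with each emotion's probability averaged over viewers.

    per_viewer: list with one entry per viewer; each entry is a list with one
    dict (emotion -> probability) per frame, aligned by timestamp.
    """
    n_frames = len(per_viewer[0])
    if any(len(frames) != n_frames for frames in per_viewer):
        raise ValueError("all reaction videos must cover the same frames")
    return [
        {e: mean(frames[i][e] for frames in per_viewer) for e in EMOTIONS}
        for i in range(n_frames)
    ]


viewer_1 = [{"joy": 0.5, "sadness": 0.2, "anger": 0.1, "neutral": 0.2},
            {"joy": 0.6, "sadness": 0.1, "anger": 0.1, "neutral": 0.2}]
viewer_2 = [{"joy": 0.3, "sadness": 0.2, "anger": 0.1, "neutral": 0.4},
            {"joy": 0.4, "sadness": 0.1, "anger": 0.3, "neutral": 0.2}]
avg = average_across_viewers([viewer_1, viewer_2])
print(avg)
```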

In the case of Table 1, averaging the probability of each emotion in viewer 1's first frame with that of the corresponding emotion in viewer 2's first frame gives the values shown for frame 1 in Table 2 below; the results for the second and third frames are also shown in Table 2.

[Table 2]
Frame | Joy | Sadness | Anger | Neutral (expressionless) | Sum
1 | 0.45 | 0.2 | 0.1 | 0.25 | 1
2 | 0.55 | 0.1 | 0.15 | 0.2 | 1
3 | 0.15 | 0.2 | 0.2 | 0.45 | 1

The processor 253 then calculates the viewers' degree of immersion for each video frame using the per-emotion average probabilities for that frame (S260). In the present invention, the degree of immersion is defined as the probability that viewers express a particular emotion (that is, any emotion other than neutral) while watching the video frame. In the example of Tables 1 and 2, a viewer who feels joy, sadness, or anger while watching the video content is immersed in it, so the viewers' immersion in a video frame can be expressed as the sum of the probabilities of those emotions. This equals the total probability of 1 minus the probability of neutral (expressionless), which represents no emotion. Therefore, in the example of Table 2, the viewers' immersion is 0.75 (0.45 + 0.2 + 0.1, or 1 - 0.25) for video frame 1, 0.8 for video frame 2, and 0.55 for video frame 3.
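Step S260 reduces to a one-line computation per frame. The sketch below, using hypothetical emotion labels, reproduces the immersion values of the Table 2 example by summing the non-neutral averages (equivalently, 1 minus the neutral average, since the four averages sum to 1).

```python
# Table 2: per-frame average probability of each emotion across viewers.
TABLE_2 = [
    {"joy": 0.45, "sadness": 0.20, "anger": 0.10, "neutral": 0.25},  # frame 1
    {"joy": 0.55, "sadness": 0.10, "anger": 0.15, "neutral": 0.20},  # frame 2
    {"joy": 0.15, "sadness": 0.20, "anger": 0.20, "neutral": 0.45},  # frame 3
]


def immersion(frame_avg, neutral="neutral"):
    """Immersion = sum of the non-neutral emotion probabilities (= 1 - P(neutral))."""
    return sum(p for emotion, p in frame_avg.items() if emotion != neutral)


scores = [immersion(f) for f in TABLE_2]
print([round(s, 2) for s in scores])  # → [0.75, 0.8, 0.55]
```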

Next, the processor 253 levels the per-frame immersion values by grouping the video frames into units of a predefined number of frames (S270). The present invention aims to provide information useful for producing video content by providing statistics such as the sections in which viewers are immersed in the video content, the sections in which they feel a particular emotion, and the section of a given length with the highest immersion. However, viewers' facial expressions can change from moment to moment, causing immersion to change abruptly; in addition, because the artificial intelligence model is not perfect, its errors can make a viewer's emotion appear to change abruptly even when it has in fact remained the same.

FIGS. 3A to 3C are graphs of immersion scaled to values between 0 and 3.5.

In FIG. 3A, the average immersion over all frames is shown as a white dotted line, and the immersion of each frame is shown in blue. As FIG. 3A shows, when immersion changes abruptly from frame to frame, it is difficult to find a section in which the viewer watches continuously for a considerable time at a steady level of immersion.

Therefore, in the preferred embodiment of the present invention, the video frames are grouped in units of 100 frames, as shown in FIG. 3B, and the average immersion of each group is calculated and output as a graph; alternatively, the frames may be grouped in units of 500 frames and the averages output as a graph.
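A minimal sketch of the leveling in step S270, assuming non-overlapping blocks of `block` frames whose immersion values are averaged. The text does not say how a final partial block is handled; keeping it as a shorter block is an assumption here, as is the function name `level`.

```python
from statistics import mean


def level(immersion, block=100):
    """Average per-frame immersion over consecutive, non-overlapping blocks.

    Each output value stands for `block` frames (e.g. 100 or 500). The last
    block is shorter when the frame count is not a multiple of `block`.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    return [mean(immersion[i:i + block]) for i in range(0, len(immersion), block)]


print(level([0.25, 0.75, 0.5, 1.0, 0.0], block=2))  # → [0.5, 0.75, 0.0]
```

At the 25 fps mentioned earlier, a 100-frame block covers 4 seconds and a 500-frame block covers 20 seconds. A sliding (moving) average would be an alternative that keeps one value per frame, but grouping into fixed units is what the text describes.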

The processor 253 then selects the sections of video frames whose immersion or emotion probability values meet a predefined condition and provides them to the user (S280).

For example, if the predefined condition is ① the sections with the highest and lowest instantaneous immersion over the entire playback time of the video content, the processor 253 selects the frame with the highest immersion (301) and the frame with the lowest immersion (302) among all frames, as shown in FIG. 3A. It designates a one-minute section spanning about 30 seconds before and after each of those frames as the instantaneous immersion maximum section and the instantaneous immersion minimum section, respectively, stores them in the reaction analysis DB 260, and provides them to users.
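A sketch of condition ①, working in frame indices. It assumes the 25 fps frame rate and the 30-second half-window mentioned in the text and clamps the window at the start and end of the video. The function name is hypothetical, as is the `skip_s` parameter for excluding the opening period discussed further below; the text does not give a length for that period, so the 10-second value in the example is arbitrary.

```python
def instant_extreme_sections(immersion, fps=25.0, half_window_s=30.0, skip_s=0.0):
    """Return ((start, peak, end), (start, trough, end)) as frame indices.

    The peak/trough search ignores the first `skip_s` seconds (intro exclusion);
    each window extends `half_window_s` seconds to either side, clamped to the video.
    """
    first = int(round(skip_s * fps))
    if first >= len(immersion):
        raise ValueError("no frames left to search after skipping the intro")
    candidates = range(first, len(immersion))
    peak = max(candidates, key=lambda i: immersion[i])
    trough = min(candidates, key=lambda i: immersion[i])
    half = int(round(half_window_s * fps))
    last = len(immersion) - 1

    def window(center):
        return (max(0, center - half), center, min(last, center + half))

    return window(peak), window(trough)


scores = [0.5] * 3000   # 120 s of per-frame immersion at 25 fps
scores[1500] = 0.9      # most immersive frame (60 s into the content)
scores[100] = 0.1       # low point inside the first 10 s (intro)
scores[2000] = 0.2      # low point after the intro
best, worst = instant_extreme_sections(scores, skip_s=10.0)
print(best, worst)  # → (750, 1500, 2250) (1250, 2000, 2750)
```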

If the predefined condition is ② the immersion maintenance section and the immersion decline section, the processor 253 selects and provides the longest section (303) in which immersion stays above the average immersion and the longest section (304) in which immersion stays below the average immersion. Here, as shown in FIGS. 3B and 3C, the processor 253 selects the immersion maintenance section and immersion decline section from the leveled values, stores them in the reaction analysis DB 260, and provides them to users.
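Condition ② amounts to finding the longest run of consecutive values on either side of the mean. The following sketch applies that to a leveled immersion series; the function names are hypothetical, values exactly equal to the mean are assigned to neither side (an assumption), and using the mean of the leveled values as the reference is also an assumption (it equals the all-frame average when every block has the same size).

```python
from statistics import mean


def runs(values, predicate):
    """Yield (start, end) index pairs, inclusive, of maximal runs where predicate holds."""
    start = None
    for i, v in enumerate(values):
        if predicate(v):
            if start is None:
                start = i
        elif start is not None:
            yield (start, i - 1)
            start = None
    if start is not None:
        yield (start, len(values) - 1)


def run_length(run):
    return run[1] - run[0] + 1


def maintenance_and_decline(leveled):
    """Return (longest run above the mean, longest run below the mean); None if absent."""
    avg = mean(leveled)
    above = list(runs(leveled, lambda v: v > avg))
    below = list(runs(leveled, lambda v: v < avg))
    return (max(above, key=run_length, default=None),
            max(below, key=run_length, default=None))


leveled = [0.5, 0.7, 0.8, 0.6, 0.3, 0.2, 0.4, 0.9]
print(maintenance_and_decline(leveled))  # → ((1, 3), (4, 6))
```

Indices here refer to leveled blocks; multiplying by the block size converts them back to frame positions.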

If the predefined condition is ③ the maximum and minimum sections for each emotion, the processor 253 calculates, for each emotion, the overall average probability across all video frames in Table 2. It then selects, in order starting from the longest, the sections in which each frame's probability for that emotion is above the average, and likewise the sections in which it is below the average; these sections are stored in the reaction analysis DB 260 and provided to users.
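For condition ③, one reading of "selected in order" is to rank every above-average (or below-average) run of an emotion's probability by length, longest first. The sketch below takes that reading, which is an assumption, as are the function names and the default set of emotions; it is applied here to the three frames of Table 2.

```python
from statistics import mean

# Table 2: per-frame average probabilities across viewers.
TABLE_2 = [
    {"joy": 0.45, "sadness": 0.20, "anger": 0.10, "neutral": 0.25},
    {"joy": 0.55, "sadness": 0.10, "anger": 0.15, "neutral": 0.20},
    {"joy": 0.15, "sadness": 0.20, "anger": 0.20, "neutral": 0.45},
]


def ranked_runs(values, above=True):
    """Return maximal (start, end) runs above or below the series mean, longest first."""
    avg = mean(values)
    hit = (lambda v: v > avg) if above else (lambda v: v < avg)
    found, start = [], None
    for i, v in enumerate(values):
        if hit(v):
            if start is None:
                start = i
        elif start is not None:
            found.append((start, i - 1))
            start = None
    if start is not None:
        found.append((start, len(values) - 1))
    # sorted() is stable (also with reverse=True), so equal-length runs keep playback order.
    return sorted(found, key=lambda r: r[1] - r[0] + 1, reverse=True)


def emotion_sections(frames, emotions=("joy", "sadness", "anger")):
    """Per emotion: runs above and below its overall mean probability, longest first."""
    sections = {}
    for e in emotions:
        series = [f[e] for f in frames]
        sections[e] = {"above": ranked_runs(series, True),
                       "below": ranked_runs(series, False)}
    return sections


result = emotion_sections(TABLE_2)
print(result["joy"])  # → {'above': [(0, 1)], 'below': [(2, 2)]}
```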

However, when selecting the maximum and minimum sections of immersion and emotion, a predetermined period immediately after playback of the video content starts is excluded from selection. This is because the intro of the video content plays right after it starts, so little content that could evoke emotion in viewers is shown; immersion is low and viewers express no clear emotion, so this period would otherwise fall into the minimum immersion section or the minimum emotion section.

In step S280, the processor 253 may also store the results of the analysis described above in the reaction analysis DB 260, for example the immersion of each video frame, the probability values for each emotion, and information about the sections that satisfy the predefined conditions.

The method for analyzing viewer reactions to video content according to the preferred embodiment of the present invention described so far may be implemented as a computer program composed of computer-executable instructions and stored in a non-transitory storage medium.

The storage medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable storage media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable storage medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a limiting sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as being included in the present invention.

100: viewers
210: camera
220: display
230: reaction video collection device
240: reaction video DB
250: reaction analysis device
251: communication module
253: processor
255: memory
260: reaction analysis DB

Claims

A viewer reaction analysis method for video content, performed in a reaction analysis system including a processor and a memory, the method comprising:
(b) extracting an image of the viewer's face region from each video frame of a reaction video that captures viewers watching the video content and is synchronized with the video content;
(c) applying each extracted face region image to an emotion determination artificial intelligence model, which determines and outputs, for each video frame of each viewer's reaction video, the probability that the face region image corresponds to each predefined emotion;
(d) calculating the viewers' degree of immersion for each video frame using the probability values for each emotion, for the video frames corresponding to the same time among the reaction videos of different viewers;
(e) leveling the degree of immersion in units of a predefined number of video frames; and
(f) selecting and providing a section of video frames in which the degree of immersion of each video frame meets a predefined condition;
wherein step (f) comprises:
selecting and providing, as an immersion maintenance section, the longest section in which the degree of immersion of the video frames remains higher than the average degree of immersion of all video frames, and, as an immersion decline section, the longest section in which the degree of immersion of the video frames remains lower than that average;
calculating the overall average of the probabilities for each emotion over all video frames; and
selecting and providing, in order, the longest sections among the sections in which the probability value of the corresponding emotion for each video frame is higher than the overall average, and the longest sections among the sections in which it is lower than the overall average.

The method of claim 1, further comprising, before step (b):
(a) providing the emotion determination artificial intelligence model trained, for each emotion, on face images whose facial expressions represent that emotion.

The method of claim 2, wherein, in step (a), the emotion determination artificial intelligence model is an artificial intelligence model trained to extract a face region from an image including a human face, analyze the extracted face region, determine the probability that it represents each predefined emotion, and output the result.

The method of claim 1, wherein the video content and the reaction videos of different viewers are synchronized with one another by time stamps.

The method of claim 1, wherein step (d) comprises calculating, for each video frame corresponding to the same time among the reaction videos of different viewers, the average of the probability values for each emotion, and calculating the viewers' degree of immersion for each video frame from those averages.

The method of claim 5, wherein, in step (d), the degree of immersion of each video frame is calculated by summing the average probabilities of all emotions other than expressionless (neutral).

The method of claim 1, wherein step (f) comprises selecting the frames with the highest and lowest immersion among all video frames, respectively, and selecting a predetermined time before and after each selected frame as the instantaneous immersion maximum section and the instantaneous immersion minimum section, respectively.

delete

A computer program stored in a non-transitory storage medium and executed on a computer including a processor to perform the viewer reaction analysis method for video content according to any one of claims 1 to 7.

An apparatus for analyzing viewer reactions to video content, comprising a processor and a memory storing predetermined instructions,
wherein the processor, by executing the instructions stored in the memory, performs:
(b) extracting an image of the viewer's face region from each video frame of a reaction video that captures viewers watching the video content and is synchronized with the video content;
(c) applying each extracted face region image to an emotion determination artificial intelligence model, which determines and outputs, for each video frame of each viewer's reaction video, the probability that the face region image corresponds to each predefined emotion;
(d) calculating the viewers' degree of immersion for each video frame using the probability values for each emotion, for the video frames corresponding to the same time among the reaction videos of different viewers;
(e) leveling the degree of immersion in units of a predefined number of video frames; and
(f) selecting and providing a section of video frames in which the degree of immersion of each video frame meets a predefined condition;
wherein step (f) comprises:
selecting and providing, as an immersion maintenance section, the longest section in which the degree of immersion of the video frames remains higher than the average degree of immersion of all video frames, and, as an immersion decline section, the longest section in which the degree of immersion of the video frames remains lower than that average;
calculating the overall average of the probabilities for each emotion over all video frames; and
selecting and providing, in order, the longest sections among the sections in which the probability value of the corresponding emotion for each video frame is higher than the overall average, and the longest sections among the sections in which it is lower than the overall average.

The apparatus of claim 12, wherein the processor further performs, before step (b):
(a) providing the emotion determination artificial intelligence model trained, for each emotion, on face images whose facial expressions represent that emotion.

The apparatus of claim 13, wherein, in step (a), the emotion determination artificial intelligence model is an artificial intelligence model trained to extract a face region from an image including a human face, analyze the extracted face region, determine the probability that it represents each predefined emotion, and output the result.

The apparatus of claim 12, wherein the video content and the reaction videos of different viewers are synchronized with one another by time stamps.

The apparatus of claim 12, wherein step (d) comprises calculating, for each video frame corresponding to the same time among the reaction videos of different viewers, the average of the probability values for each emotion, and calculating the viewers' degree of immersion for each video frame from those averages.

The apparatus of claim 16, wherein, in step (d), the degree of immersion of each video frame is calculated by summing the average probabilities of all emotions other than expressionless (neutral).

The apparatus of claim 12, wherein step (f) comprises selecting the frames with the highest and lowest immersion among all video frames, respectively, and selecting a predetermined time before and after each selected frame as the instantaneous immersion maximum section and the instantaneous immersion minimum section, respectively.

delete