KR102459775B1

KR102459775B1 - Automatic editing method, apparatus and system for artificial intelligence-based video content production

Info

Publication number: KR102459775B1
Application number: KR1020220077309A
Authority: KR
Inventors: 하승진
Original assignee: 주식회사 에이치앤케이스튜디오; 하승진
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-10-27

Abstract

An apparatus according to an embodiment receives a production file including a video clip for producing content, an audio file, and subtitle text corresponding to the audio file from a user terminal of a user, creates a content style using the production file, applies the content style to the production file to create a content draft, transmits the content style and the content draft to the user terminal, provides the user terminal with an interface for editing the content style, and applies the content style edited by the interface to the production file to create a final content.

Description

AUTOMATIC EDITING METHOD, APPARATUS AND SYSTEM FOR ARTIFICIAL INTELLIGENCE-BASED VIDEO CONTENT PRODUCTION

The following embodiments relate to a technology for providing an automatic editing method for artificial intelligence-based video content production.

As offline activities have been restricted and online activities have grown due to the impact of COVID-19, more and more people are watching video content. In addition, a growing number of people not only watch video content but also plan and produce it themselves and upload it to video platforms to generate revenue.

To produce video content, a user must plan the subject of the content, cut-edit the captured footage by deciding its length and order, and carry out cumbersome tasks such as inserting subtitles and effects to add visual elements.

Accordingly, there is a need to develop a technology that generates a content style, including a subtitle style, a cut-editing style, and a color style, from the production file of the content so that video content can be edited automatically.

Republic of Korea Patent Publication No. 10-2018-0038318 (published April 16, 2018)
Republic of Korea Patent Publication No. 10-2020-0099427 (published August 24, 2020)
Republic of Korea Patent Publication No. 10-2019-0105533 (published September 17, 2019)
Republic of Korea Patent Publication No. 10-2020-0071031 (published June 18, 2020)

Embodiments are intended to provide an automatic editing method for video content production based on artificial intelligence.

Embodiments are intended to generate a subtitle style from an audio file and subtitle text.

Embodiments are intended to generate a cut-editing style and a color style for video content production.

According to one embodiment, a method performed by an apparatus may include: receiving, from a user terminal of a user, a production file including a video clip for producing content, an audio file, and subtitle text corresponding to the audio file; generating a content style using the production file; generating a content draft by applying the content style to the production file; transmitting the content style and the content draft to the user terminal; providing the user terminal with an interface through which the content style can be edited; and generating final content by applying the content style edited through the interface to the production file.

The content style includes a subtitle style, a cut-editing style, and a color style. The subtitle style may be generated from the audio file and the subtitle text, the cut-editing style from the video clip, and the color style from the subtitle text.

The generating of the subtitle style may include: comparing the intensity of sound in the audio file with a preset first reference intensity; when the sound intensity is less than the first reference intensity, setting the subtitle text corresponding to the sound below the first reference intensity to a first speech subtitle style defined by a first font, a first size, and a first color; when the sound intensity is greater than the first reference intensity, comparing the sound intensity with a preset second reference intensity; when the sound intensity is less than the second reference intensity, setting the subtitle text corresponding to the sound below the second reference intensity to a second speech subtitle style defined by a second font, a second size, and a second color; when the sound intensity is greater than the second reference intensity, determining whether the sound intensity above the second reference intensity is maintained for a preset reference period; when the sound intensity is maintained for the reference period, setting the subtitle text corresponding to the sound at or above the second reference intensity during the reference period to a third speech subtitle style defined by a third font, a third size, and a third color; when the sound intensity is not maintained for the reference period, setting the subtitle text corresponding to the sound at or above the second reference intensity during the reference period to an effect subtitle style; and generating the subtitle style based on the first speech subtitle style, the second speech subtitle style, the third speech subtitle style, and the effect subtitle style.
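The two-threshold decision described above can be illustrated with a minimal Python sketch. The function name, the style table, and the font, size, and color values are assumptions made for illustration, as is the choice to send an intensity exactly equal to a threshold down the "greater than" branch; the source only specifies "less than" and "greater than".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleStyle:
    name: str
    font: str
    size_pt: int
    color: str


# Hypothetical style table: the source only says each style has its own
# font, size, and color. All concrete values here are placeholders.
STYLES = {
    "speech_1": SubtitleStyle("speech_1", "Gulim", 10, "black"),
    "speech_2": SubtitleStyle("speech_2", "Gulim", 14, "black"),
    "speech_3": SubtitleStyle("speech_3", "Gulim", 18, "red"),
    "effect": SubtitleStyle("effect", "Gulim", 24, "yellow"),
}


def pick_subtitle_style(intensity: float, loud_duration_s: float,
                        first_ref: float, second_ref: float,
                        ref_period_s: float) -> SubtitleStyle:
    """Choose a style for one subtitle segment.

    intensity       -- sound intensity of the segment (unit is an assumption)
    loud_duration_s -- how long the sound stayed above second_ref, in seconds
    """
    if first_ref >= second_ref:
        raise ValueError("first_ref must be lower than second_ref")
    if intensity < first_ref:
        return STYLES["speech_1"]   # quiet speech
    if intensity < second_ref:
        return STYLES["speech_2"]   # normal speech
    if loud_duration_s >= ref_period_s:
        return STYLES["speech_3"]   # loud sound sustained for the reference period
    return STYLES["effect"]         # loud but short: effect subtitle
```

A brief loud sound, such as an exclamation, falls through to the effect subtitle style, while sustained loud speech receives the third speech subtitle style.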

The generating of the cut-editing style may include: extracting a background from the video clip according to whether a pattern within a predefined similarity range is repeated beyond a predefined criterion across the entire video clip; obtaining a main object of the video clip by separating the extracted background from the video clip; dividing the video clip into a plurality of clips; for each of the plurality of clips, classifying the clip as a dynamic clip when the movement of the main object is greater than or equal to a preset reference change value, and as a static clip when the movement of the main object is less than or equal to the reference change value; setting a first cut-editing style that applies a first time weight to the dynamic clip; setting a second cut-editing style that applies a second time weight, smaller than the first time weight, to the static clip; and generating the cut-editing style based on the first cut-editing style and the second cut-editing style.
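The classification and weighting steps can be sketched as follows, assuming that a per-clip motion value for the main object has already been computed (background extraction and object separation are outside this sketch). The function name, the default weights, and the choice to treat a motion value exactly equal to the reference as dynamic are assumptions; the sketch records the time weight for each clip without defining how it is later applied.

```python
from typing import Dict, List


def build_cut_editing_style(motion_per_clip: List[float], ref_change: float,
                            first_weight: float = 1.0,
                            second_weight: float = 0.5) -> List[Dict[str, object]]:
    """Label each clip as dynamic or static and attach a time weight."""
    if second_weight >= first_weight:
        raise ValueError("the second time weight must be smaller than the first")
    style = []
    for index, motion in enumerate(motion_per_clip):
        if motion >= ref_change:
            kind, weight = "dynamic", first_weight   # first cut-editing style
        else:
            kind, weight = "static", second_weight   # second cut-editing style
        style.append({"clip": index, "kind": kind, "time_weight": weight})
    return style
```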

The generating of the color style may include: extracting, as main keywords, words mentioned in the subtitle text at least a preset reference number of times; classifying each main keyword as a positive keyword or a negative keyword; when the main keyword is classified as a positive keyword, setting the video clip to a first color style; when the main keyword is classified as a negative keyword, setting the video clip to a second color style; and generating the color style based on the first color style and the second color style.
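As a rough illustration of the keyword steps, the sketch below counts words in English subtitle text and looks them up in two small word lists. The word lists, the tokenizer, and the function names are assumptions for illustration only; the source does not state how keywords are classified, and keywords found in neither list are simply skipped here.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical sentiment lexicons, not taken from the source.
POSITIVE = {"great", "happy", "love"}
NEGATIVE = {"bad", "sad", "angry"}


def main_keywords(subtitle_text: str, min_count: int) -> List[str]:
    """Return the words that occur at least min_count times."""
    words = re.findall(r"[a-z']+", subtitle_text.lower())
    counts = Counter(words)
    return [word for word, count in counts.items() if count >= min_count]


def color_styles(subtitle_text: str, min_count: int = 2) -> Dict[str, str]:
    """Map each classified main keyword to a color style label."""
    result = {}
    for word in main_keywords(subtitle_text, min_count):
        if word in POSITIVE:
            result[word] = "color_style_1"   # first color style (positive)
        elif word in NEGATIVE:
            result[word] = "color_style_2"   # second color style (negative)
    return result
```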

The method performed by the apparatus may further include: obtaining, for the channel to which the user uploads the content, the number of visits to the channel and the visit duration for each country during a preset reference period; generating a visit score for each country based on the number of visits and the visit duration; obtaining, for each country, the content viewing time spent watching content uploaded to the channel during the reference period; generating a content viewing score for each country based on the content viewing time; calculating a channel interest level for each country by summing the visit score and the content viewing score for that country; obtaining, for each country, information on the main age group that watched the content; comparing the main age group information with preset target age group information; updating and storing the channel interest level for each country by applying a weight to countries whose main age group information matches the target age group information; ranking the countries in descending order of channel interest level and setting the country with the highest channel interest level as a target country; and setting the language corresponding to the target country as an additional subtitle language.
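The scoring and ranking steps can be sketched as below. The source does not give the score formulas, so the visit score (visits plus visit minutes), the viewing score (watch minutes), the multiplicative age weight of 1.2, and the country-to-language table are all assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CountryStats:
    visits: int
    visit_minutes: float
    watch_minutes: float
    main_age_group: str


# Hypothetical mapping from country code to subtitle language.
LANGUAGE_BY_COUNTRY = {"KR": "ko", "US": "en", "JP": "ja"}


def channel_interest(stats: CountryStats, target_age_group: str,
                     age_weight: float = 1.2) -> float:
    visit_score = stats.visits + stats.visit_minutes   # assumed formula
    viewing_score = stats.watch_minutes                # assumed formula
    interest = visit_score + viewing_score             # sum of the two scores
    if stats.main_age_group == target_age_group:
        interest *= age_weight                         # weight for matching age group
    return interest


def pick_additional_language(per_country: Dict[str, CountryStats],
                             target_age_group: str) -> Tuple[str, str]:
    """Return (target country, additional subtitle language)."""
    ranked = sorted(per_country,
                    key=lambda c: channel_interest(per_country[c], target_age_group),
                    reverse=True)
    target = ranked[0]
    return target, LANGUAGE_BY_COUNTRY[target]
```

In this sketch the age weight can change the ranking: a country with a slightly lower raw interest level may still become the target country when its main audience matches the target age group.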

The method performed by the apparatus may further include: extracting, as a first image, the video image at a first time point, which is the time point at which the last of the video clips appears in the final content; identifying the area occupied by the background in the first image as a first area and extracting a 1-1 image by cropping the portion of the first image that contains the first area; when a performer is found in the 1-1 image, identifying the area occupied by the performer in the 1-1 image as a second area; when analysis of the final content, played back in reverse chronological order from the first time point, shows that the performer is absent from the second area at a 1-1 time point, extracting the video image at the 1-1 time point in the final content as a second image; extracting a 2-1 image by cropping the portion of the second image that contains the second area; and replacing the portion of the 1-1 image that contains the second area with the 2-1 image.
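The backward search and patch replacement can be sketched in plain Python. This is a simplified illustration under several assumptions: frames are small 2D lists of pixel values, the whole frame stands in for the 1-1 image, the second area is a rectangular box, and performer detection is a caller-supplied function. None of these names or representations come from the source.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)


def crop(frame: Frame, box: Box) -> Frame:
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def paste(frame: Frame, box: Box, patch: Frame) -> Frame:
    """Return a copy of frame with the box region replaced by patch."""
    top, left, _bottom, right = box
    out = [row[:] for row in frame]
    for offset, patch_row in enumerate(patch):
        out[top + offset][left:right] = patch_row
    return out


def remove_performer(frames: List[Frame], first_index: int, box: Box,
                     performer_in_box: Callable[[Frame, Box], bool]) -> Optional[Frame]:
    """Walk backward from first_index to a frame whose box holds no performer,
    then copy that region over the performer's region in the first frame."""
    for index in range(first_index - 1, -1, -1):
        if not performer_in_box(frames[index], box):
            return paste(frames[first_index], box, crop(frames[index], box))
    return None  # no earlier frame with an empty second area was found
```

The function returns a new frame and leaves the input frames unchanged, so the original final content stays available if the replacement is rejected.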

The apparatus according to an embodiment may be controlled by a computer program that is stored in a medium and, in combination with hardware, executes any one of the methods described above.

Embodiments may provide an automatic editing method for video content production based on artificial intelligence.

Embodiments may generate a subtitle style from an audio file and subtitle text.

Embodiments may generate a cut-editing style and a color style for video content production.

FIG. 1 is a diagram for explaining the configuration of a system according to an embodiment.
FIG. 2 is a flowchart for explaining a process of providing an automatic editing method for artificial intelligence-based video content production according to an embodiment.
FIG. 3 is a flowchart for explaining a process of generating a subtitle style according to an embodiment.
FIG. 4 is a flowchart for explaining a process of generating a cut-editing style according to an embodiment.
FIG. 5 is a flowchart for explaining a process of generating a color style according to an embodiment.
FIG. 6 is a flowchart for explaining a process of setting an additional subtitle language according to an embodiment.
FIG. 7 is a diagram for explaining the training of a neural network according to an embodiment.
FIG. 8 is a flowchart for explaining a process of removing a performer from the video image at the first time point according to an embodiment.
FIG. 9 is a diagram for explaining a 1-1 image in which the area containing the performer has been replaced with another image according to an embodiment.
FIG. 10 is an exemplary diagram of the configuration of an apparatus according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. However, since various changes may be made to the embodiments, the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes for the embodiments should be understood to fall within the scope of rights.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the embodiments are not limited to a specific disclosed form, and the scope of the present specification includes modifications, equivalents, and substitutes that fall within its technical spirit.

Although terms such as first and second may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that an intervening component may also be present.

The terms used in the embodiments are for descriptive purposes only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In addition, in the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. In describing the embodiments, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the embodiments.

The embodiments may be implemented in various types of products, such as personal computers, laptop computers, tablet computers, smartphones, televisions, smart home appliances, intelligent vehicles, kiosks, and wearable devices.

In the embodiments, an artificial intelligence (AI) system is a computer system that implements human-level intelligence and, unlike existing rule-based smart systems, is a system in which the machine learns and makes judgments on its own. The more an AI system is used, the more its recognition rate improves and the more accurately it understands user preferences, so existing rule-based smart systems are gradually being replaced by deep learning-based AI systems.

Artificial intelligence technology consists of machine learning and element technologies that make use of machine learning. Machine learning is an algorithmic technology that classifies and learns the features of input data on its own. Element technologies use machine learning algorithms such as deep learning to simulate functions of the human brain such as cognition and judgment, and span technical fields such as linguistic understanding, visual understanding, inference and prediction, knowledge representation, and motion control.

The fields in which artificial intelligence technology is applied are as follows. Linguistic understanding is a technology for recognizing and applying or processing human language and text, and includes natural language processing, machine translation, dialogue systems, question answering, and speech recognition and synthesis. Visual understanding is a technology for recognizing and processing objects in the manner of human vision, and includes object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, and image enhancement. Inference and prediction is a technology for judging information and logically inferring and predicting from it, and includes knowledge-based and probability-based inference, optimization prediction, preference-based planning, and recommendation. Knowledge representation is a technology for automatically processing human experience into knowledge data, and includes knowledge construction (data generation and classification) and knowledge management (data utilization). Motion control is a technology for controlling the autonomous driving of vehicles and the movement of robots, and includes movement control (navigation, collision, driving) and manipulation control (behavior control).

In general, applying a machine learning algorithm to real life involves learning by trial and error, owing to the nature of the basic methodology of machine learning. Deep learning in particular requires hundreds of thousands of iterations. Because it is impossible to run these in a real physical environment, the real physical environment is instead implemented virtually on a computer and learning is performed through simulation.

FIG. 1 is a diagram for explaining the configuration of a system according to an embodiment.

Referring to FIG. 1, a system according to an embodiment may include a user terminal 10 and a device 30 that can communicate with each other through a communication network.

First, the communication network may be configured regardless of its communication mode, whether wired or wireless, and may be implemented in various forms so that server-to-server communication and server-to-terminal communication can be performed.

The user terminal 10 may be a terminal used by a user who wants to produce video content according to the present invention. The user terminal 10 may be a desktop computer, a notebook computer, a tablet, a smartphone, or the like. For example, as shown in FIG. 1, the user terminal 10 may be a smartphone, although a different type of terminal may be used depending on the embodiment.

The user terminal 10 may be configured to perform all or some of the arithmetic, storage/reference, input/output, and control functions of a typical computer. The user terminal 10 may be configured to communicate with the device 30 over a wired or wireless connection.

The user terminal 10 may access a web page operated by a person or organization that provides a service using the device 30, or may have installed an application developed and distributed by such a person or organization. The user terminal 10 may be linked with the device 30 through the web page or the application.

The user terminal 10 may access the device 30 through a web page, an application, or the like provided by the device 30.

Singular expressions in the claims may be understood to include the plural. For example, a user in the claims may refer to one user or to two or more users.

The device 30 may be an in-house server owned by a person or organization that provides a service using the device 30, a cloud server, or a peer-to-peer (p2p) set of distributed nodes. The device 30 may be configured to perform all or some of the arithmetic, storage/reference, input/output, and control functions of a typical computer.

The device 30 may be configured to communicate with the user terminal 10 over a wired or wireless connection, and may control the operation of the user terminal 10 and which information is displayed on its screen.

Meanwhile, although only the user terminal 10 is shown in FIG. 1 for convenience of explanation, the number of terminals may vary depending on the embodiment. As long as the processing capacity of the device 30 allows, the number of terminals is not particularly limited.

The device 30 may provide an automatic editing method for artificial intelligence-based video content production. According to an embodiment, a database may be provided within the device 30, or, without being limited thereto, a database may be configured separately from the device 30. The device 30 may include a plurality of artificial neural networks for executing machine learning algorithms.

FIG. 2 is a flowchart for explaining a process of providing an automatic editing method for artificial intelligence-based video content production according to an embodiment.

Referring to FIG. 2, first, in step S201, the device 30 may receive a production file for producing content from the user terminal 10. The production file is a file used to produce the content and may include, for example, a video clip, an audio file, and subtitle text corresponding to the audio file.

In step S202, the device 30 may generate a content style using the production file. According to an embodiment, the device 30 may generate the content style using the video clip, the audio file, and the subtitle text corresponding to the audio file. The content style may include a subtitle style governing the font, size, and color of the subtitles, a cut-editing style governing the duration of the video clips, and a color style governing the color tone of the video clips.

The processes of generating the subtitle style, the cut-editing style, and the color style are described in detail later with reference to FIGS. 3, 4, and 5, respectively.

In step S203, the device 30 may generate a content draft by applying the content style to the production file.

Specifically, the device 30 may generate a content style, including a subtitle style, a cut-editing style, and a color style, for the video clip, the audio file, and the subtitle text corresponding to the audio file, and may apply the generated content style to the production file to generate the content draft.

In step S204, the device 30 may transmit the content style and the content draft to the user terminal 10.

In step S205, the device 30 may provide the user terminal 10 with an interface through which the content style can be edited.

According to an embodiment, the device 30 may generate an interface through which the content style can be edited and transmit the generated interface to the user terminal 10. The interface may include a page for editing the subtitle style governing the font, size, and color of the subtitles, the cut-editing style governing the duration of the video clips, and the color style governing the color tone of the video clips.

In step S206, the device 30 may generate the final content by applying the content style edited through the interface to the production file.

According to an embodiment, the device 30 may transmit the interface through which the content style can be edited to the user terminal 10, receive from the user terminal 10 the content style edited through the interface, and apply the received edited content style to the production file to generate the final content.
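Steps S201 to S206 can be summarized as a short control-flow sketch. The data classes, field names, and the two callables (one standing in for style generation, one for the round trip to the user terminal 10) are hypothetical placeholders; the sketch shows only the order of operations, not how styles are computed or rendered.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class ProductionFile:          # received in S201
    video_clip: str
    audio_file: str
    subtitle_text: str


@dataclass(frozen=True)
class ContentStyle:
    subtitle: str
    cut_editing: str
    color: str


def apply_style(production: ProductionFile, style: ContentStyle) -> Dict[str, object]:
    """Stand-in for rendering: pair the production file with a style."""
    return {"source": production, "style": style}


def run_pipeline(production: ProductionFile,
                 generate_style: Callable[[ProductionFile], ContentStyle],
                 edit_on_terminal: Callable[[ContentStyle, Dict[str, object]], ContentStyle]
                 ) -> Dict[str, object]:
    style = generate_style(production)         # S202: generate the content style
    draft = apply_style(production, style)     # S203: generate the content draft
    edited = edit_on_terminal(style, draft)    # S204-S205: send both, user edits the style
    return apply_style(production, edited)     # S206: generate the final content
```

The final content is produced from the original production file and the edited style, not from the draft, which matches the description of step S206.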

FIG. 3 is a flowchart illustrating a process of generating a caption style according to an embodiment.

Referring to FIG. 3, first, in step S301, the device 30 may compare the sound intensity in the audio file with a preset first reference intensity. The first reference intensity may be set differently depending on the embodiment.

If it is determined in step S302 that the sound intensity is lower than the first reference intensity, then in step S303 the device 30 may set the subtitle text corresponding to the sound below the first reference intensity to a first speech caption style. A speech caption is a caption that follows spoken words and is displayed mainly at the bottom of the video clip; in the first speech caption style, the subtitle text is set to a first font, a first size, and a first color.

For example, when the first speech caption style is defined as Gulim, 10 pt, black and the subtitle text is set to the first speech caption style, the device 30 may set the subtitle text to Gulim, 10 pt, black.

If it is determined in step S302 that the sound intensity is higher than the first reference intensity, then in step S304 the device 30 may compare the sound intensity with a preset second reference intensity. The second reference intensity may be set differently depending on the embodiment.

In step S305, if the sound intensity is lower than the second reference intensity, the device 30 may set the subtitle text corresponding to the sound below the second reference intensity to a second speech caption style. Like the first, the second speech caption style follows spoken words and is displayed mainly at the bottom of the video clip; its subtitle text is set to a second font, a second size, and a second color.

For example, when the second speech caption style is defined as Gungsuh, 12 pt, blue and the subtitle text is set to the second speech caption style, the device 30 may set the subtitle text to Gungsuh, 12 pt, blue.

In step S306, if the sound intensity is higher than the second reference intensity, the device 30 may determine whether the sound intensity above the second reference intensity is maintained for a preset reference period. The reference period may be set differently depending on the embodiment.

In step S307, if the sound intensity is maintained for the reference period, the device 30 may set the subtitle text corresponding to the sound at or above the second reference intensity during the reference period to a third speech caption style. The third speech caption style likewise follows spoken words and is displayed mainly at the bottom of the video clip; its subtitle text is set to a third font, a third size, and a third color.

For example, when the third speech caption style is defined as Dotum, 14 pt, green and the subtitle text is set to the third speech caption style, the device 30 may set the subtitle text to Dotum, 14 pt, green.

In step S308, if the sound intensity is not maintained for the reference period, the device 30 may set the subtitle text corresponding to the sound at or above the second reference intensity to an effect caption style. The effect caption style is displayed mainly in the middle of the video clip and is used to add entertaining elements to the content, such as onomatopoeia, mimetic words, interjections, and sound effects.

For example, the effect caption style may include, but is not limited to, the font, size, color, and animation effect of the subtitle text.

In step S309, the device 30 may generate the caption style based on the first speech caption style, the second speech caption style, the third speech caption style, and the effect caption style.

According to an embodiment, the caption style may be generated from the audio file and the subtitle text.
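The decision flow of FIG. 3 can be sketched as a small function. The threshold values, the unit (decibels), the function name `choose_caption_style`, and the style labels are all assumptions made for illustration. The text does not say how a value exactly equal to a reference intensity is handled, so this sketch places it in the higher band.

```python
# Example presets taken from the examples in the text; the dictionary layout is hypothetical.
SPEECH_STYLES = {
    "speech_1": ("Gulim", 10, "black"),
    "speech_2": ("Gungsuh", 12, "blue"),
    "speech_3": ("Dotum", 14, "green"),
}


def choose_caption_style(intensity_db: float, sustained_s: float,
                         first_ref_db: float = 50.0,
                         second_ref_db: float = 70.0,
                         ref_period_s: float = 1.0) -> str:
    """Return the caption style label for one subtitle segment.

    intensity_db: sound intensity of the segment (assumed unit).
    sustained_s:  how long the intensity stayed above second_ref_db.
    """
    if intensity_db < first_ref_db:       # S302 -> S303
        return "speech_1"
    if intensity_db < second_ref_db:      # S304 -> S305
        return "speech_2"
    if sustained_s >= ref_period_s:       # S306 -> S307
        return "speech_3"
    return "effect"                       # S308: loud but brief


quiet = choose_caption_style(40.0, 0.0)
normal = choose_caption_style(60.0, 0.0)
shout = choose_caption_style(80.0, 2.0)
burst = choose_caption_style(80.0, 0.2)
```

A short loud burst is routed to the effect caption, which matches the text's description of effect captions as carrying interjections and sound effects rather than ordinary speech.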

FIG. 4 is a flowchart illustrating a process of generating a cut-editing style according to an embodiment.

Referring to FIG. 4, first, in step S401, the device 30 may extract the background from the video clip. According to an embodiment, the device 30 may extract the background from the video clip according to whether a pattern within a predefined similarity range repeats beyond a predefined criterion throughout the entire video clip.

In step S402, the device 30 may obtain the main object of the video clip by separating the extracted background from the video clip.

In step S403, the device 30 may divide the video clip into a plurality of clips. For example, if the video clip is 100 seconds long in total, the device 30 may divide it into 10-second segments, yielding 10 clips.

In step S404, for each of the plurality of clips, the device 30 may compare the movement of the main object with a preset reference change value. The reference change value may be set differently depending on the embodiment.

In step S405, if the movement of the main object is equal to or greater than the reference change value, the device 30 may classify the corresponding clip as a dynamic clip.

Here, a dynamic clip means a clip, among the plurality of clips, in which the movement of the main object is equal to or greater than the reference change value, so that the movement is judged to be changing or dynamic.

In step S406, the device 30 may set a first cut-editing style that applies a first time weight to the dynamic clip. The first time weight may be set differently depending on the embodiment.

In step S407, if the movement of the main object is equal to or less than the reference change value, the device 30 may classify the corresponding clip as a static clip.

Here, a static clip means a clip, among the plurality of clips, in which the movement of the main object is equal to or less than the reference change value, so that the movement is judged to remain unchanged.

In step S408, the device 30 may set a second cut-editing style that applies a second time weight to the static clip. The second time weight may be set smaller than the first time weight, and may be set differently depending on the embodiment.

According to an embodiment, the device 30 may set the first cut-editing style, which applies the first time weight to dynamic clips, and the second cut-editing style, which applies the second time weight to static clips. That is, by increasing the share of running time given to dynamic clips and decreasing the share given to static clips, the device 30 can make the video content less tedious and generate a dynamic, engaging cut-editing style.

In step S409, the device 30 may generate the cut-editing style based on the first cut-editing style and the second cut-editing style.

According to an embodiment, the cut-editing style may be generated from the video clip.
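Steps S403 to S408 can be sketched as below, assuming each segment already has a numeric motion score for its main object. How that score is computed is not specified in the text, and the function names, weights (1.5 and 0.5), and threshold are hypothetical. The text assigns a score exactly equal to the reference change value to both classes; this sketch arbitrarily treats it as dynamic.

```python
def split_clip(total_s: float, segment_s: float) -> list:
    """S403: return (start, end) pairs that cover the clip in fixed-length segments."""
    segments = []
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        segments.append((start, end))
        start = end
    return segments


def cut_style(motion_scores: list, ref_change: float,
              dynamic_weight: float = 1.5, static_weight: float = 0.5) -> list:
    """S404-S408: label each segment and attach its time weight."""
    plan = []
    for score in motion_scores:
        if score >= ref_change:
            plan.append(("dynamic", dynamic_weight))   # first cut-editing style
        else:
            plan.append(("static", static_weight))     # second cut-editing style
    return plan


segments = split_clip(100.0, 10.0)          # the 100-second example from the text
plan = cut_style([0.9, 0.1, 0.5], ref_change=0.5)
```

Because the static weight is smaller than the dynamic weight, static segments receive a smaller share of the final running time, as the text describes.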

FIG. 5 is a flowchart illustrating a process of generating a color style according to an embodiment.

Referring to FIG. 5, first, in step S501, the device 30 may extract, as a main keyword, a word that is mentioned in the subtitle text at least a reference number of times. The reference number may be set differently depending on the embodiment.

In step S502, the device 30 may classify the main keyword as a positive keyword or a negative keyword.

According to an embodiment, words classified as positive or negative may be stored in advance in the database of the device 30. If the extracted main keyword is a word classified as positive, the device 30 may classify the main keyword as a positive keyword; if the extracted main keyword is a word classified as negative, the device 30 may classify the main keyword as a negative keyword.

In step S503, if the main keyword is classified as a positive keyword, the device 30 may set the video clip to a first color style.

According to an embodiment, if the main keyword is classified as a positive keyword and the first color style is a warm tone, the device 30 may adjust the color temperature so that the video clip has a warm tone.

In step S504, if the main keyword is classified as a negative keyword, the device 30 may set the video clip to a second color style.

According to an embodiment, if the main keyword is classified as a negative keyword and the second color style is a cool tone, the device 30 may adjust the color temperature so that the video clip has a cool tone.

In step S505, the device 30 may generate the color style based on the first color style and the second color style.

According to an embodiment, the color style may be generated from the subtitle text.
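A minimal sketch of steps S501 to S504 follows. The word lists stand in for the pre-stored database and are invented for illustration, as are the function names. The text does not say what happens when several main keywords disagree, so this sketch assumes a simple majority vote and returns `None` when there is no clear result.

```python
from collections import Counter

# Hypothetical stand-ins for the pre-stored positive and negative word lists.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"sad", "bad", "angry"}


def main_keywords(subtitle_text: str, min_count: int) -> list:
    """S501: words mentioned at least min_count times (whitespace tokenization)."""
    counts = Counter(subtitle_text.lower().split())
    return [word for word, count in counts.items() if count >= min_count]


def choose_color_style(subtitle_text: str, min_count: int = 2):
    """S502-S504: 'warm' for positive keywords, 'cool' for negative ones."""
    keywords = main_keywords(subtitle_text, min_count)
    positives = sum(1 for word in keywords if word in POSITIVE_WORDS)
    negatives = sum(1 for word in keywords if word in NEGATIVE_WORDS)
    if positives > negatives:
        return "warm"    # first color style
    if negatives > positives:
        return "cool"    # second color style
    return None          # assumption: leave the color tone unchanged


warm = choose_color_style("happy day happy trip great great")
cool = choose_color_style("sad sad rain")
neutral = choose_color_style("hello world")
```

Whitespace tokenization is used only to keep the example short; Korean subtitle text would need a morphological tokenizer instead.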

FIG. 6 is a flowchart illustrating a process of setting an additional subtitle language according to an embodiment.

Referring to FIG. 6, first, in step S601, for the channel to which the user uploads content, the device 30 may obtain the number of visits to the channel and the visit retention period for each country over a preset reference period. The reference period may be set differently depending on the embodiment.

According to an embodiment, the device 30 may obtain channel information including the channel name, the number of subscribers, the view history, and the visit history of the channel to which the user uploads content, and may obtain the number of visits to the channel and the visit retention period for each country. For example, if the reference period is 3 months, the device 30 may obtain, for each country, the number of visits made to the channel during the 3 months and the visit retention period indicating how long visits were sustained during the 3 months.

In step S602, the device 30 may generate a visit score for each country based on the number of visits and the visit retention period.

For example, if the reference period is 3 months, the number of visits from country A is 60, and the visit retention period is 30 days, the device 30 may generate a visit score of 90 for country A; if the number of visits from country B is 55 and the visit retention period is 20 days, the device 30 may generate a visit score of 75 for country B.

In step S603, the device 30 may obtain, for each country, the content viewing time spent watching the content uploaded to the channel during the reference period.

In step S604, the device 30 may generate a content viewing score for each country based on the content viewing time.

For example, if the reference period is 3 months and the content viewing time of country A is 30 hours, the device 30 may generate a content viewing score of 30 for country A; if the content viewing time of country B is 25 hours, the device 30 may generate a content viewing score of 25 for country B.

In step S605, the device 30 may calculate the channel interest for each country by summing the visit score and the content viewing score for that country.

For example, if country A has a visit score of 90 and a content viewing score of 30, the device 30 may calculate the channel interest of country A as 120; if country B has a visit score of 75 and a content viewing score of 25, the device 30 may calculate the channel interest of country B as 100.

In step S606, the device 30 may obtain, for each country, information on the main age group that watched the content.

For example, the device 30 may obtain the 20s as the main age group of country A and the 50s as the main age group of country B.

In step S607, the device 30 may compare the main age group information with preset target age group information. The target age group information means the age group the user set as the target when producing the content, and may be set differently depending on the embodiment.

In step S608, the device 30 may apply a weight to each country whose main age group information matches the target age group information, and may update and store the channel interest for each country.

For example, if the main age group of country A and the target age group are both the 20s, the device 30 may apply the weight to the channel interest of country A and update and store the result. The weight may be set differently depending on the embodiment.

In step S609, the device 30 may sort the countries in descending order of channel interest and set the country with the highest channel interest as the target country.

In step S610, the device 30 may set the language corresponding to the target country as the additional subtitle language. For example, if the target country is country A, the device 30 may set the language corresponding to country A as the additional subtitle language.
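The scoring in steps S602 to S609 can be reproduced with the numbers from the examples. The text does not state the scoring formulas; the sketch below infers from the examples that the visit score is the number of visits plus the retention days (60 + 30 = 90, 55 + 20 = 75) and that the viewing score equals the viewing hours. The multiplicative age weight of 1.2, the function name, and the dictionary keys are assumptions.

```python
def channel_interest(stats: dict, target_age: str, age_weight: float = 1.2) -> dict:
    """Return the channel interest per country (S602-S608)."""
    scores = {}
    for country, s in stats.items():
        visit_score = s["visits"] + s["retention_days"]   # S602 (inferred formula)
        viewing_score = s["viewing_hours"]                # S604 (inferred formula)
        interest = visit_score + viewing_score            # S605
        if s["main_age"] == target_age:                   # S607
            interest = interest * age_weight              # S608 (assumed multiplicative)
        scores[country] = interest
    return scores


stats = {
    "A": {"visits": 60, "retention_days": 30, "viewing_hours": 30, "main_age": "20s"},
    "B": {"visits": 55, "retention_days": 20, "viewing_hours": 25, "main_age": "50s"},
}

unweighted = channel_interest(stats, target_age="none")   # no country matches
weighted = channel_interest(stats, target_age="20s")      # country A matches
target_country = max(weighted, key=weighted.get)          # S609
```

With no age match the scores are 120 for country A and 100 for country B, as in the text; the age weight only widens the lead of country A, which becomes the target country whose language is used in step S610.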

FIG. 7 is a diagram for explaining training of a neural network according to an embodiment.

As shown in FIG. 7, the device 30 may train the neural network 123 to form a content style from the production file received from the user terminal 10. In addition, the device 30 may train the neural network 123 on the history from the production file, which includes the video clip, the audio file, and the subtitle text, to the formation of the final content. According to an embodiment, the device 30 may be an entity separate from the server, but is not limited thereto.

The neural network 123 includes an input layer 121 to which training samples are input and an output layer 125 that outputs training outputs, and may be trained based on the difference between the training outputs and the labels. Here, the labels may be defined based on production files, each including a video clip, an audio file, and subtitle text, and on the content styles corresponding to those production files. The neural network 123 consists of a group of connected nodes and is defined by the weights between the connected nodes and the activation function that activates the nodes.

The device 30 may train the neural network 123 using gradient descent (GD) or stochastic gradient descent (SGD). The device 30 may use a loss function designed from the outputs of the neural network 123 and the labels.

The device 30 may calculate the training error using a predefined loss function. The loss function may be predefined with the labels, the outputs, and the parameters as input variables, where the parameters may be set by the weights in the neural network 123. For example, the loss function may be designed in a mean squared error (MSE) form, an entropy form, or the like, and various techniques or methods may be employed in designing the loss function.

The device 30 may use backpropagation to find the weights that affect the training error. Here, the weights represent the relationships between nodes in the neural network 123. To optimize the weights found through backpropagation, the device 30 may use SGD with the labels and the outputs. For example, the device 30 may use SGD to update the weights on which the loss function, defined in terms of the labels, outputs, and weights, depends.

According to an embodiment, the device 30 may generate training input signals based on training video clips, training audio files, and training subtitle texts. The device 30 may generate output signals from the training input signals.

According to an embodiment, the neural network 123 may be trained by minimizing training errors generated based on the differences between the output signals, which are obtained from the training video clips, training audio files, and training subtitle texts, and the training content styles.
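The training loop described above (a squared-error loss minimized with SGD) can be illustrated on a deliberately tiny model. This is not the network of FIG. 7: it is a one-input linear model fitted to synthetic data, and the function name, learning rate, and data are assumptions chosen so the example runs with the standard library only.

```python
import random


def sgd_fit(samples, lr: float = 0.05, epochs: int = 300, seed: int = 0):
    """Fit y ~ w * x + b by per-sample SGD on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                  # stochastic: visit samples in random order
        for x, label in data:
            output = w * x + b             # forward pass
            error = output - label         # difference between output and label
            # Gradients of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Synthetic labels generated from y = 2x + 1, so the fit should approach w=2, b=1.
samples = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w, b = sgd_fit(samples)
```

In a multi-layer network the two gradient lines would be replaced by backpropagation, which computes the same kind of per-weight gradient layer by layer.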

FIG. 8 is a flowchart illustrating a process of deleting a performer from a video image at a first time point according to an embodiment, and FIG. 9 is a diagram illustrating a 1-1 image in which the region containing the performer has been replaced with another image according to an embodiment.

Referring to FIG. 8, first, in step S801, the device 30 may extract, as a first image, the video image at a first time point, which is the time point at which the last of the video clips appears in the final content. That is, the device 30 may capture the frame at the first time point in the final content and extract the captured image as the first image.

In step S802, the device 30 may extract a 1-1 image by cropping the portion of the first image that contains a first region. The device 30 may designate the area occupied by the background in the first image as the first region.

In step S803, the device 30 may check whether a performer is recognized in the 1-1 image. That is, the device 30 may check whether a human-shaped object other than the background is recognized in the 1-1 image, and thereby determine whether a performer is present in the 1-1 image.

If it is confirmed in step S803 that a performer is present in the 1-1 image, then in step S804 the device 30 may designate the area occupied by the performer in the 1-1 image as a second region. The device 30 may designate the second region in the 1-1 image in the same manner as it designates the first region in the first image.

In step S805, the device 30 may analyze the final content by playing it in reverse chronological order from the first time point. Because the final content is played and analyzed in reverse, the performer retraces the path of movement backwards. That is, when the final content is played in reverse, the performer moves in the opposite direction.

In step S806, as a result of playing and analyzing the final content in reverse chronological order from the first time point, the device 30 may confirm that no performer is present in the second region at a 1-1 time point.

That is, in normal playback the 1-1 time point precedes the first time point. The device 30 may play the final content in reverse from the first time point, monitor the performer in the second region moving backwards, and set the time point at which the performer is confirmed to have left the second region as the 1-1 time point.

In step S807, the device 30 may extract the video image at the 1-1 time point in the final content as a second image. That is, the device 30 may capture the frame at the 1-1 time point in the final content and extract the captured image as the second image.

In step S808, the device 30 may extract a 2-1 image by cropping the portion of the second image that contains the second region.

In step S809, the device 30 may replace the portion of the 1-1 image that contains the second region with the 2-1 image.

That is, as shown in (a) of FIG. 9, the device 30 may confirm that a performer is present in the 1-1 image 700 and designate the area occupied by the performer in the 1-1 image 700 as the second region. The device 30 may set the second region so that it is delimited as a rectangle.

The device 30 may then delete the portion of the 1-1 image that contains the second region and insert the 2-1 image at the deleted position, generating a 1-1 image in which the original 1-1 image and the 2-1 image are combined. That is, the portion of the 1-1 image containing the second region is replaced with the 2-1 image, so the performer that was in the 1-1 image is deleted and background is filled in at that position, and a performer is no longer recognized in the 1-1 image.
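The reverse scan and patch replacement of steps S805 to S809 can be sketched on toy data. Frames are modeled as small 2-D lists of integers, the rectangular second region as a `(top, left, bottom, right)` tuple, and performer detection as a caller-supplied function; all of these representations and names are assumptions, and the cropping to the first region in step S802 is omitted for brevity.

```python
def crop(image, region):
    """Cut the rectangular region out of the image."""
    top, left, bottom, right = region
    return [row[left:right] for row in image[top:bottom]]


def paste(image, region, patch):
    """Return a copy of the image with the region replaced by the patch."""
    top, left, bottom, right = region
    result = [row[:] for row in image]
    for r, patch_row in zip(range(top, bottom), patch):
        result[r][left:right] = patch_row
    return result


def remove_performer(frames, first_idx, region, has_performer):
    """Scan backwards from frames[first_idx] (S805-S806) and patch the region (S807-S809)."""
    for idx in range(first_idx - 1, -1, -1):
        patch = crop(frames[idx], region)          # candidate 2-1 image
        if not has_performer(patch):               # the 1-1 time point is found
            return paste(frames[first_idx], region, patch)
    return None                                    # performer never left the region


# Toy frames: 0 is background, 9 marks a performer pixel.
frames = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 9, 0], [0, 0, 0]],
    [[0, 9, 9], [0, 9, 0]],
]
region = (0, 1, 2, 3)
cleaned = remove_performer(
    frames, first_idx=2, region=region,
    has_performer=lambda patch: any(v == 9 for row in patch for v in row),
)
```

The approach assumes a static camera: the background patch taken at the earlier time point only lines up with the later frame if the background has not moved in between.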

FIG. 10 is a diagram illustrating an example configuration of the device 30 according to an embodiment.

The device 30 according to an embodiment includes a processor 31 and a memory 32. The device 30 according to an embodiment may be the server or terminal described above. The processor 31 may include at least one of the devices described above with reference to FIGS. 1 to 9, or may perform at least one of the methods described above with reference to FIGS. 1 to 9. The memory 32 may store information related to the methods described above or a program in which those methods are implemented. The memory 32 may be a volatile memory or a non-volatile memory.

The processor 31 may execute a program and control the device 30. The code of the program executed by the processor 31 may be stored in the memory 32. The device 30 may be connected to an external device (for example, a personal computer or a network) through an input/output device (not shown) and exchange data.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

An automatic editing method for artificial intelligence-based video content production, performed by an apparatus, the method comprising:
receiving, from a user terminal of a user, a production file for producing content, the production file including a video clip, an audio file, and subtitle text corresponding to the audio file;
generating a content style using the production file;
generating a content draft by applying the content style to the production file;
transmitting the content style and the content draft to the user terminal;
providing the user terminal with an interface for editing the content style; and
generating final content by applying the content style edited through the interface to the production file,
wherein the content style includes a subtitle style, a cut editing style, and a color style,
the subtitle style is generated from the audio file and the subtitle text,
the cut editing style is generated from the video clip, and
the color style is generated from the subtitle text,
wherein generating the cut editing style includes:
extracting a background from the video clip according to whether a pattern within a predefined similarity range is repeated beyond a predefined criterion throughout the entire video clip;
obtaining a main object of the video clip by separating the extracted background from the video clip;
dividing the video clip into a plurality of clips;
classifying each of the plurality of clips as a dynamic clip when the movement of the main object is greater than or equal to a preset reference change value, and as a static clip when the movement of the main object is less than the reference change value;
setting a first cut editing style in which a first time weight is applied to the dynamic clip;
setting a second cut editing style in which a second time weight smaller than the first time weight is applied to the static clip; and
generating the cut editing style based on the first cut editing style and the second cut editing style,
and wherein generating the color style includes:
extracting, as main keywords, words mentioned more than a preset reference number of times in the subtitle text;
classifying each main keyword as a positive keyword or a negative keyword;
setting the video clip to a first color style when the main keyword is classified as a positive keyword;
setting the video clip to a second color style when the main keyword is classified as a negative keyword; and
generating the color style based on the first color style and the second color style.
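
The clip classification and keyword-based color selection recited in the claim above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how movement is measured or how keywords are judged positive or negative, so the per-clip motion scores, the threshold and weight values, the word lists, and all function names here are hypothetical.

```python
from collections import Counter

# Hypothetical values; the claim only requires that the second time
# weight be smaller than the first.
REFERENCE_CHANGE = 0.5
FIRST_TIME_WEIGHT = 1.5   # applied to dynamic clips
SECOND_TIME_WEIGHT = 0.8  # applied to static clips

# Hypothetical word lists standing in for whatever sentiment classifier is used.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"sad", "bad", "angry"}


def cut_editing_style(motion_scores):
    """Label each clip dynamic or static and attach its time weight.

    motion_scores holds one assumed main-object movement score per clip.
    """
    style = []
    for score in motion_scores:
        if score >= REFERENCE_CHANGE:
            style.append(("dynamic", FIRST_TIME_WEIGHT))
        else:
            style.append(("static", SECOND_TIME_WEIGHT))
    return style


def color_style(subtitle_text, reference_count=2):
    """Map each main keyword to a color style.

    A main keyword is a word mentioned more than reference_count times.
    Main keywords found in neither word list are left unassigned.
    """
    counts = Counter(subtitle_text.lower().split())
    styles = {}
    for word, mentions in counts.items():
        if mentions <= reference_count:
            continue  # not mentioned often enough to be a main keyword
        if word in POSITIVE_WORDS:
            styles[word] = "first_color_style"
        elif word in NEGATIVE_WORDS:
            styles[word] = "second_color_style"
    return styles
```

For example, `cut_editing_style([0.9, 0.1])` labels the first clip dynamic and the second static, and `color_style("sad sad sad day")` assigns the second color style to the keyword `sad`.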

The automatic editing method for artificial intelligence-based video content production of claim 1,
wherein generating the subtitle style includes:
comparing the intensity of a sound in the audio file with a preset first reference intensity;
when the intensity of the sound is less than the first reference intensity, setting the subtitle text corresponding to the sound below the first reference intensity to a first subtitle style having a first font, a first size, and a first color;
when the intensity of the sound is greater than the first reference intensity, comparing the intensity of the sound with a preset second reference intensity;
when the intensity of the sound is less than the second reference intensity, setting the subtitle text corresponding to the sound below the second reference intensity to a second subtitle style having a second font, a second size, and a second color;
when the intensity of the sound is greater than the second reference intensity, determining whether the intensity at or above the second reference intensity is maintained for a preset reference period;
when the intensity of the sound is maintained for the reference period, setting the subtitle text corresponding to the sound at or above the second reference intensity during the reference period to a third subtitle style having a third font, a third size, and a third color;
when the intensity of the sound is not maintained for the reference period, setting the subtitle text corresponding to the sound at or above the second reference intensity during the reference period to an effect subtitle style; and
generating the subtitle style based on the first subtitle style, the second subtitle style, the third subtitle style, and the effect subtitle style.
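
The threshold cascade recited in this claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the reference intensities, the reference period, the units, and the function name are hypothetical, and the claim does not say how a sound exactly equal to a reference intensity is handled, so this sketch treats equality as belonging to the higher band.

```python
def select_subtitle_style(intensity, held_seconds,
                          first_ref=40.0, second_ref=70.0,
                          reference_period=1.0):
    """Choose a subtitle style from sound intensity and how long it is held.

    intensity: assumed loudness of the sound aligned with a subtitle line.
    held_seconds: assumed time the sound stays at or above second_ref.
    All default values are hypothetical placeholders.
    """
    if intensity < first_ref:
        return "first_subtitle_style"   # quiet sound
    if intensity < second_ref:
        return "second_subtitle_style"  # moderate sound
    if held_seconds >= reference_period:
        return "third_subtitle_style"   # loud and sustained
    return "effect_subtitle_style"      # loud but brief
```

For example, a sound of intensity 80 held for 0.2 seconds falls into the effect subtitle style under these assumed defaults, while the same intensity held for 2 seconds falls into the third subtitle style.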

(Deleted)