JP2020043454A

JP2020043454A - Video content generation method and generation program

Info

Publication number: JP2020043454A
Application number: JP2018168950A
Authority: JP
Inventors: 幸太武下; Kota Takeshita; 俊兵名波; Shumpei Nawa
Original assignee: Crossfader Inc
Current assignee: Crossfader Inc
Priority date: 2018-09-10
Filing date: 2018-09-10
Publication date: 2020-03-19
Anticipated expiration: 2038-09-10
Also published as: JP7121988B2

Abstract

To provide a video content generation method and a generation program that allow anyone to easily generate good-looking video content. SOLUTION: A video content generation method includes an extraction start point determination step S3 of determining N extraction start points on the basis of the audio information of a first material content, a divided video extraction step S4 of extracting, from the video information of the first material content, N pieces of divided video information each starting at one of the N extraction start points, and a content synthesis step S6 of synthesizing music information and the divided video information on the basis of synthesis information corresponding to a second material content. The synthesis information contains multiple video switching points synchronized with a beat represented by the music information, and each of the video switching points belongs to one of N groups. In the content synthesis step S6, each of the video switching points belonging to the same group is made the start point of one of the N pieces of divided video information. SELECTED DRAWING: Figure 2

Description

The present invention relates to a method and a program for generating new video content from first material content that includes video information and audio information, and in particular to a method and a program for generating new video content by synthesizing first material content created by the user with preset second material content.

With the spread of smartphones in recent years, it has become possible to shoot video easily without preparing expensive equipment. In addition, now that high-speed communication environments are in place, it has also become easy to publish video shot on a smartphone.

"TikTok (registered trademark)" is known as smartphone application software that takes advantage of this situation. A TikTok user can create an original video and publish it by the following procedure (see, for example, Non-Patent Documents 1 and 2).
(1) Select the music to use as BGM from a number of preset tracks.
(2) Shoot a video that suits the selected BGM.
(3) Upload the shot video.
Before uploading, the user can also decorate the shot video by applying one or more special effects selected from a set of special effects called filters or time effects.

Non-Patent Document 1: "Tik Tok (app)", [online], Wikipedia, the free encyclopedia, [retrieved September 3, 2018], Internet <URL: https://ja.wikipedia.org/wiki/Tik_Tok_(%E3%82%A2%E3%83%97%E3%83%AA)>
Non-Patent Document 2: "Basic usage of the free video creation app 'Tik Tok': how to insert songs, save videos, and more!", [online], April 18, 2018, Dohack, [retrieved September 3, 2018], Internet <URL: https://dohack.jp/video/tik-tok>

Generating a video with the above application software calls on the user's aesthetic sense, both when shooting a video that suits the BGM and when applying special effects to the shot video. As a result, the application software has been very hard to approach for people who lack confidence in their own aesthetic sense.

The present invention has been made in view of the above circumstances, and its object is to provide a video content generation method and a generation program that allow anyone to easily generate good-looking video content.

To solve the above problem, a video content generation method according to the present invention generates video content from first material content that includes video information and audio information, and comprises:
a first reading step of reading the first material content from a storage device; a content separation step of separating the read first material content into the video information and the audio information; an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information; a divided video extraction step of extracting, from the separated video information, N pieces of divided video information each starting at one of the N extraction start points; a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and a content synthesis step of generating the video content by synthesizing the music information and the divided video information based on the synthesis information,
wherein the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information, each of the video switching points belongs to one of N groups, and, in the content synthesis step, each of the video switching points belonging to the same group is made the start point of one of the N pieces of divided video information.

In the extraction start point determination step of the above video content generation method, the times of the top N of a plurality of volume peak points included in the separated audio information may be used as the extraction start points.

The above video content generation method may further comprise a command receiving step of receiving a command concerning the selection of the first material content and the second material content. In that case, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and, in the second reading step, one of a plurality of second material contents stored in advance in the storage device and the synthesis information corresponding to that one are read based on the command.

The synthesis information can be described in MIDI format. In that case, the N groups can be represented by notes of a musical scale.

Further, to solve the above problem, a video content generation program according to the present invention causes an information processing device to execute a video content generation method for generating video content from first material content that includes video information and audio information.
The program causes the information processing device to execute: a first reading step of reading the first material content from a storage device; a content separation step of separating the read first material content into the video information and the audio information; an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information; a divided video extraction step of extracting, from the separated video information, N pieces of divided video information each starting at one of the N extraction start points; a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and a content synthesis step of generating the video content by synthesizing the music information and the divided video information based on the synthesis information.
The program may be configured such that the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information, each of the video switching points belongs to one of N groups, and, in the content synthesis step, each of the video switching points belonging to the same group is made the start point of one of the N pieces of divided video information.

In the extraction start point determination step of the above video content generation program, the times corresponding to the top N of a plurality of volume peak points included in the separated audio information may be used as the extraction start points.

The above video content generation program may further cause the information processing device to execute a command receiving step of receiving a command concerning the selection of the first material content and the second material content. In that case, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and, in the second reading step, one of a plurality of second material contents stored in advance in the storage device and the synthesis information corresponding to that one are read based on the command.

According to the present invention, it is possible to provide a video content generation method and a generation program that allow anyone to easily generate good-looking video content.

FIG. 1 is a block diagram showing a schematic configuration of a smartphone while it is executing a video content generation program according to an embodiment of the present invention.
FIG. 2 is a flowchart of a video content generation method according to the embodiment of the present invention.
FIG. 3 is a diagram for explaining the content separation step shown in FIG. 2.
FIG. 4 is a diagram for explaining the extraction start point determination step shown in FIG. 2.
FIG. 5 is a diagram for explaining the divided video extraction step shown in FIG. 2.
FIG. 6 is a diagram showing the configuration of the music information and the synthesis information read in the second reading step shown in FIG. 2.
FIG. 7 is a diagram for explaining the first half of the content synthesis step shown in FIG. 2.
FIG. 8 is a diagram for explaining the second half of the content synthesis step shown in FIG. 2.
FIG. 9 is a flowchart of a video content generation method according to Modification 1 of the present invention.
FIG. 10 is a block diagram showing a schematic configuration of a smartphone while it is executing a video content generation program according to Modification 2 of the present invention.
FIG. 11 is a flowchart of a video content generation method according to Modification 2 of the present invention.
FIG. 12 is another flowchart of the video content generation method according to Modification 2 of the present invention.
FIG. 13 is a block diagram showing a schematic configuration of a smartphone while it is executing a video content generation program according to Modification 3 of the present invention.
FIG. 14 is a diagram showing the configuration of the music information, the synthesis information, and the special effect information read in the second reading step of a video content generation method according to Modification 3 of the present invention.

An embodiment of the video content generation method and generation program according to the present invention is described below with reference to the accompanying drawings.

[Embodiment]
FIG. 1 shows a schematic configuration of a smartphone 10. As shown in the figure, the smartphone 10 includes an information processing device 11 consisting of an MPU (Micro Processor Unit), a storage device 12 consisting of memory, a camera 13, a microphone 14, a touch panel display 15, and a speaker 16.

The information processing device 11 has a first material content generation unit 20 and a video content generation unit 21. Of these, the first material content generation unit 20 is a functional block formed in the information processing device 11 when the user runs the video shooting program provided as standard. The video content generation unit 21, on the other hand, is a functional block formed in the information processing device 11 when the user runs the video content generation program according to this embodiment.

Based on the video signal output by the camera 13 and the audio signal output by the microphone 14, the first material content generation unit 20 generates first material content (a video file) that includes video information and audio information, and stores it in the storage device 12.

In addition to the first material content, the storage device 12 also stores second material content, which is music information, and the synthesis information corresponding to it. The second material content and the synthesis information may be stored in advance, or may be stored when the user runs the video content generation program according to this embodiment. The second material content and the synthesis information are described in detail later.

In this embodiment, it is assumed that one each of the first material content, the second material content, and the synthesis information is already stored in the storage device 12 when the user runs the video content generation program according to this embodiment.

As described above, the video content generation unit 21 is formed when the user runs the video content generation program according to this embodiment. By executing the video content generation method according to this embodiment, the video content generation unit 21 synthesizes the first material content and the second material content stored in the storage device 12 based on the synthesis information and generates new video content. In other words, the video content generation program according to this embodiment generates new video content by causing the information processing device 11 to execute the video content generation method according to this embodiment.

The video content generation unit 21 can play back the generated video content through the touch panel display 15 and the speaker 16. It can also upload the generated video content through a communication unit (not shown) or store it in the storage device 12.

Next, the video content generation method according to this embodiment (that is, the operation of the video content generation unit 21) is described with reference to FIG. 2.

In the first reading step S1, the video content generation unit 21 reads the first material content from the storage device 12. As described above, the first material content is a file of a video shot by the user, and includes video information and audio information.

In the content separation step S2, executed after step S1, the video content generation unit 21 separates the first material content read in step S1 into video information and audio information (see FIG. 3).

In the extraction start point determination step S3, executed after step S2, the video content generation unit 21 analyzes the audio information separated in step S2, identifies the top N of the volume peak points included in that audio information (where N is an integer of 2 or more; N = 7 in this embodiment), and uses the times corresponding to them as the extraction start points. As shown in FIG. 4, in this embodiment the times t1, t2, ..., t6, t7 corresponding to the volume peak points P1, P2, ..., P6, P7 are the extraction start points.
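Step S3 can be illustrated with a minimal Python sketch. The source does not specify how volume peaks are detected, so the peak test used here (a sample louder than its left neighbour and at least as loud as its right neighbour), the function name `top_n_peak_times`, and the sample data are all assumptions made for illustration. The sketch also assumes the extraction start points are numbered in chronological order.

```python
def top_n_peak_times(volume, sample_period, n):
    """Return the times of the n loudest local maxima, in chronological order."""
    # Assumed peak test: louder than the left neighbour and
    # at least as loud as the right neighbour.
    peaks = [
        i for i in range(1, len(volume) - 1)
        if volume[i - 1] < volume[i] >= volume[i + 1]
    ]
    if len(peaks) < n:
        raise ValueError("fewer than n volume peaks found")
    # Keep the n loudest peaks, then restore chronological order.
    loudest = sorted(peaks, key=lambda i: volume[i], reverse=True)[:n]
    return [i * sample_period for i in sorted(loudest)]


# Hypothetical volume envelope sampled every 0.5 s.
envelope = [0, 3, 1, 5, 2, 9, 4, 6, 1, 2, 0]
start_points = top_n_peak_times(envelope, 0.5, 3)
print(start_points)
```

With N = 7, as in the embodiment, the same call would return seven times corresponding to t1 through t7.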

In the divided video extraction step S4, executed after step S3, the video content generation unit 21 extracts, from the video information separated in step S2, seven pieces of divided video information V1, V2, ..., V6, V7, each starting at one of the seven extraction start points t1, t2, ..., t6, t7 determined in step S3 (see FIG. 5). More specifically, the video content generation unit 21 extracts the information between extraction start points t1 and t2 as divided video information V1, the information between t2 and t3 as divided video information V2, and so on up to the information between t6 and t7 as divided video information V6, and extracts the information between extraction start point t7 and the end of the video information as divided video information V7.
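The segment boundaries described for step S4 can be sketched as follows. This only computes the (start, end) time ranges and does not cut actual video; the function name `split_ranges` and the sample times are hypothetical.

```python
def split_ranges(start_points, video_end):
    """Return (start, end) ranges: each runs to the next start point,
    and the last one runs to the end of the video."""
    starts = sorted(start_points)
    ends = starts[1:] + [video_end]
    return list(zip(starts, ends))


# Hypothetical start points (seconds) in a 6-second video.
ranges = split_ranges([1.5, 2.5, 3.5], video_end=6.0)
print(ranges)
```

Each range corresponds to one piece of divided video information, so N start points always yield N pieces.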

In the second reading step S5, executed after step S4, the video content generation unit 21 reads the second material content and its corresponding synthesis information from the storage device 12.

FIG. 6(A) shows the music information included in the read second material content, and FIG. 6(B) shows the read synthesis information. As these figures show, the synthesis information includes a plurality of video switching points SW1, SW2, ..., SW15, SW16 synchronized with the beat represented by the music information. Each of the video switching points SW1, SW2, ..., SW15, SW16 belongs to one of seven groups C3, D3, E3, F3, G3, A3, B3. For example, video switching points SW1 and SW14 belong to group C3, and video switching points SW10, SW11, SW12, and SW13 belong to group F3.

The synthesis information is described in MIDI (Musical Instrument Digital Interface) format. Using the MIDI format makes it easy to describe the temporal positions of the video switching points, each of which belongs to one of the seven groups C3, D3, E3, F3, G3, A3, B3. C3, D3, E3, F3, G3, A3, and B3 denote the notes do, re, mi, fa, sol, la, and ti (si) of the third octave, respectively.
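As a rough illustration of how note events could encode the groups, the sketch below sorts hypothetical note-on events, given as (time in seconds, MIDI note number) pairs, into groups keyed by pitch name. Parsing a real MIDI file is out of scope here, and the event times are invented. Octave labelling also varies between tools: this sketch assumes by default that MIDI note 60 (middle C) is labelled C4, so C3 is note 48, while some tools label note 60 as C3, which the `middle_c_octave` parameter accommodates.

```python
from collections import defaultdict

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note_number, middle_c_octave=4):
    """Name a MIDI note number; MIDI 60 (middle C) is labelled C4 by default."""
    octave = note_number // 12 - 5 + middle_c_octave
    return f"{NOTE_NAMES[note_number % 12]}{octave}"


def group_switch_points(note_on_events, middle_c_octave=4):
    """Map each group (pitch name) to the sorted times of its note-on events."""
    groups = defaultdict(list)
    for time_sec, note_number in sorted(note_on_events):
        groups[note_name(note_number, middle_c_octave)].append(time_sec)
    return dict(groups)


# Hypothetical events: two switching points in C3, two in A3, one in B3.
events = [(0.0, 48), (0.5, 57), (1.0, 59), (1.5, 57), (6.5, 48)]
groups = group_switch_points(events)
print(groups)
```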

In the content synthesis step S6, executed after step S5, the video content generation unit 21 synthesizes the seven pieces of divided video information V1, V2, ..., V6, V7 extracted in step S4 with the music information included in the second material content, based on the video switching points SW1, SW2, ..., SW15, SW16 included in the synthesis information.

More specifically, the video content generation unit 21 first associates the seven pieces of divided video information V1, V2, ..., V6, V7 with the seven groups C3, D3, E3, F3, G3, A3, B3. In this embodiment the association is made at random, and it is assumed that, as a result, group C3 is associated with divided video information V1, group D3 with V3, group E3 with V5, group F3 with V6, group G3 with V2, group A3 with V4, and group B3 with V7 (see FIG. 7).
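The random association can be sketched as a shuffle followed by a one-to-one pairing. The function name `assign_clips_to_groups` and the optional `rng` parameter (useful for reproducible runs) are assumptions for illustration; the source only states that the association is random.

```python
import random


def assign_clips_to_groups(groups, clips, rng=None):
    """Pair each group with exactly one clip, in random order."""
    if len(groups) != len(clips):
        raise ValueError("need exactly one clip per group")
    rng = rng or random.Random()
    shuffled = list(clips)
    rng.shuffle(shuffled)
    return dict(zip(groups, shuffled))


groups = ["C3", "D3", "E3", "F3", "G3", "A3", "B3"]
clips = ["V1", "V2", "V3", "V4", "V5", "V6", "V7"]
mapping = assign_clips_to_groups(groups, clips, rng=random.Random(0))
print(mapping)
```

Because the pairing is one-to-one, every piece of divided video information is used by exactly one group.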

The video content generation unit 21 then joins the pieces of divided video information V1, V2, ..., V6, V7 so that the video switching points SW1 and SW14 belonging to group C3 are start points of divided video information V1, the video switching point SW6 belonging to group D3 is a start point of V3, the video switching point SW5 belonging to group E3 is a start point of V5, the video switching points SW10, SW11, SW12, and SW13 belonging to group F3 are start points of V6, the video switching point SW15 belonging to group G3 is a start point of V2, the video switching points SW2, SW4, SW7, SW9, and SW16 belonging to group A3 are start points of V4, and the video switching points SW3 and SW8 belonging to group B3 are start points of V7. It then synthesizes the joined video with the music information (see FIG. 8).
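The resulting clip order can be reproduced with a small lookup. The group of each switching point and the group-to-clip association below are taken from the embodiment; the names `SWITCH_GROUPS`, `GROUP_TO_CLIP`, and `clip_sequence` are hypothetical.

```python
# Group of each video switching point SW1..SW16, as given in the embodiment.
SWITCH_GROUPS = [
    "C3", "A3", "B3", "A3", "E3", "D3", "A3", "B3",
    "A3", "F3", "F3", "F3", "F3", "C3", "G3", "A3",
]

# Group-to-clip association assumed in the embodiment (FIG. 7).
GROUP_TO_CLIP = {
    "C3": "V1", "D3": "V3", "E3": "V5", "F3": "V6",
    "G3": "V2", "A3": "V4", "B3": "V7",
}


def clip_sequence(switch_groups, group_to_clip):
    """Return the clip that starts at each switching point, in order."""
    return [group_to_clip[group] for group in switch_groups]


sequence = clip_sequence(SWITCH_GROUPS, GROUP_TO_CLIP)
print(sequence)
```

Note that SW10 through SW13 all map to V6, so V6 restarts from its beginning at four consecutive switching points.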

At this time, if, for example, the time between video switching points SW1 and SW2 is shorter than the duration of divided video information V1, the video content generation unit 21 uses only the leading portion of V1; if the time between SW1 and SW2 is longer than the duration of V1, it fills the remainder with the final frame of V1 (a still image).
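The trim-or-pad rule can be expressed as two durations per slot: how long the clip plays and how long its final frame is held. The function name `fit_clip_to_slot` and the sample durations are hypothetical.

```python
def fit_clip_to_slot(clip_duration, slot_duration):
    """Return (played_seconds, frozen_seconds) for one slot between
    two consecutive video switching points."""
    # Play the clip from its start, but never beyond the slot.
    played = min(clip_duration, slot_duration)
    # Hold the final frame for whatever part of the slot remains.
    frozen = max(0.0, slot_duration - clip_duration)
    return played, frozen


print(fit_clip_to_slot(3.0, 2.0))  # clip longer than the slot: trimmed
print(fit_clip_to_slot(1.5, 2.0))  # clip shorter than the slot: padded
```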

Thus, with the video content generation method and generation program according to this embodiment, even if the user does no work that requires an aesthetic sense, the video information included in the first material content is edited to suit the music information (BGM) included in the second material content, based on the audio information included in the user-created first material content and the synthesis information prepared in advance, and good-looking new video content is obtained.

The video content generation method and generation program according to the present invention have a number of modifications, examples of which are given below.

[Modification 1]
The second reading step S5 may be executed before the first reading step S1 (see FIG. 9(A)), or concurrently with the first reading step S1 through the divided video extraction step S4 (see FIG. 9(B)). In other words, in the present invention it is sufficient that the first reading step S1 through the divided video extraction step S4 and the second reading step S5 have all been executed before the content synthesis step S6.
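The concurrent ordering of FIG. 9(B) might look like the following sketch, which runs the two branches on a thread pool and starts synthesis only after both have finished. The two worker functions are hypothetical stand-ins that return placeholder values; the source does not prescribe any particular concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_divided_videos():
    """Stand-in for steps S1 to S4 (hypothetical placeholder)."""
    return ["V1", "V2"]


def read_music_and_synthesis_info():
    """Stand-in for step S5 (hypothetical placeholder)."""
    return "music", ["SW1", "SW2"]


def run_pipeline():
    with ThreadPoolExecutor(max_workers=2) as pool:
        clips_future = pool.submit(prepare_divided_videos)
        music_future = pool.submit(read_music_and_synthesis_info)
        # result() blocks, so both branches are complete past this point.
        clips = clips_future.result()
        music, switch_points = music_future.result()
    # Step S6 (content synthesis) would start here.
    return clips, music, switch_points


print(run_pipeline())
```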

[Modification 2]
As shown in FIG. 10, the storage device 12 may store a plurality of first material contents, a plurality of second material contents, and synthesis information corresponding to each of the second material contents. In this case, as shown in FIG. 11, a selection command receiving step S7 must be executed before the first reading step S1 and the second reading step S5.

In the selection command receiving step S7, the video content generation unit 21 displays information about the first material contents and second material contents stored in the storage device 12 on the touch panel display 15 as options, and receives the user's command, entered through the touch panel display 15, concerning the selection of the first material content and the second material content. Then, in the first reading step S1, the video content generation unit 21 reads the first material content selected by the user. The same applies to the second reading step S5.

When the first reading step S1 and the second reading step S5 are executed serially, a selection command receiving step S7 that receives the command concerning the selection of both the first material content and the second material content may be executed before the first reading step S1 (see FIG. 12(A)). Alternatively, a selection command receiving step S7A that receives the command concerning the selection of the first material content may be executed before the first reading step S1, and a selection command receiving step S7B that receives the command concerning the selection of the second material content may be executed before the second reading step S5 (see FIG. 12(B)).

[Modification 3]
As shown in FIG. 13, the storage device 12 may additionally store special effect information corresponding to the second material content. In this case, in the second reading step S5 the video content generation unit 21 reads the synthesis information and the special effect information together with the second material content. For second material content that has no special effect information, however, the video content generation unit 21 reads only the synthesis information together with the second material content.

As shown in FIG. 14, the special effect information includes a plurality of special effect application periods EF1, EF2, ..., EF6, EF7. Each of the special effect application periods EF1, EF2, ..., EF6, EF7 belongs to one of four groups C4, D4, E4, F4.

The special effect application periods EF1, EF4, and EF6 belonging to group C4 are periods in which the special effect "enlargement" is applied. The special effect application period EF7 belonging to group D4 is a period in which the special effect "fade-out" is applied. The special effect application periods EF2 and EF5 belonging to group E4 are periods in which the special effect "fast-forward" is applied. The special effect application period EF3 belonging to group F4 is a period in which the special effect "slow motion" is applied.

The special effect information is described in MIDI format. Using the MIDI format makes it easy to describe the time range of each special effect application period belonging to one of the four groups C4, D4, E4, and F4. Here, C4, D4, E4, and F4 denote the notes "do", "re", "mi", and "fa" of the fourth octave, respectively.
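The grouping by note can be illustrated with a minimal sketch. It assumes the MIDI file has already been parsed into `(note_name, start_seconds, end_seconds)` tuples; that tuple layout, the `EffectPeriod` type, and the function name `decode_effect_periods` are hypothetical and are not part of the disclosure. Only the note-to-effect mapping is taken from the description of FIG. 14.

```python
from dataclasses import dataclass

# Note-to-effect mapping as described for FIG. 14 (groups C4, D4, E4, F4).
EFFECT_BY_NOTE = {
    "C4": "enlargement",
    "D4": "fade-out",
    "E4": "fast-forward",
    "F4": "slow motion",
}


@dataclass(frozen=True)
class EffectPeriod:
    """One special effect application period (hypothetical container)."""
    effect: str
    start: float  # seconds
    end: float    # seconds


def decode_effect_periods(note_events):
    """Turn parsed note events into effect periods, sorted by start time.

    note_events: iterable of (note_name, start_seconds, end_seconds).
    Notes that are not in EFFECT_BY_NOTE are skipped in this sketch.
    """
    periods = []
    for note, start, end in note_events:
        effect = EFFECT_BY_NOTE.get(note)
        if effect is None:
            continue  # not a special effect group
        if end <= start:
            raise ValueError(f"empty period for note {note}")
        periods.append(EffectPeriod(effect, start, end))
    return sorted(periods, key=lambda p: p.start)


if __name__ == "__main__":
    events = [("E4", 2.0, 3.0), ("C4", 0.0, 1.0), ("A3", 1.0, 1.5)]
    for period in decode_effect_periods(events):
        print(period)
```

Because each note's onset and release give the start and end of a period, one track can carry all four effect types without any extra fields.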

In the content synthesizing step S6, the moving image content generation unit 21 applies special effects in accordance with the special effect information to the sequence formed by joining the plurality of pieces of divided video information. The moving image content generation unit 21 then synthesizes the video information to which the special effects have been applied with the music information.
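As a rough sketch of this ordering (effects first, then synthesis with the music), the joined video timeline can be cut into intervals that each carry at most one effect. The function name `build_effect_timeline`, the `(effect, start, end)` tuple layout, and the assumptions that effect periods do not overlap and are expressed on the timeline of the joined video are all hypothetical and are not stated in the disclosure.

```python
def build_effect_timeline(total_duration, periods):
    """Split [0, total_duration] into (start, end, effect_or_None) intervals.

    periods: iterable of (effect_name, start_seconds, end_seconds).
    Assumes periods do not overlap; raises ValueError if they do.
    """
    timeline = []
    cursor = 0.0
    for effect, start, end in sorted(periods, key=lambda p: p[1]):
        # Clip each period to the duration of the joined video.
        start = max(start, 0.0)
        end = min(end, total_duration)
        if end <= start:
            continue
        if start < cursor:
            raise ValueError("effect periods overlap")
        if start > cursor:
            timeline.append((cursor, start, None))  # plain, unprocessed span
        timeline.append((start, end, effect))
        cursor = end
    if cursor < total_duration:
        timeline.append((cursor, total_duration, None))
    return timeline


if __name__ == "__main__":
    plan = build_effect_timeline(
        10.0, [("fade-out", 8.0, 10.0), ("enlargement", 1.0, 2.0)]
    )
    for interval in plan:
        print(interval)
```

A renderer could then walk the intervals in order, process each one with its effect (or pass it through unchanged), and only afterwards mix in the music information.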

To increase the number of special effect types, G4, A4, and B4, which denote the notes "sol", "la", and "ti" of the fourth octave, may be used, for example. The synthesis information and the special effect information may also be a single file described in MIDI format.

[Other Modifications]
(1) The moving image content generation unit 21 may determine the extraction start points based on something other than the volume peak points included in the audio information of the first material content. For example, the moving image content generation unit 21 may use points at which the volume of the audio information sharply increases or decreases as the extraction start points.

(2) After executing the content synthesizing step S6, the moving image content generation unit 21 may receive a command as to whether to redo the random association between the N pieces of divided video information and the N groups. Redoing the association makes it possible to generate a variety of moving image contents without changing either the first material content or the second material content.

(3) The moving image content generation unit 21 may associate the N pieces of divided video information with the N groups according to a predetermined rule rather than at random.

(4) Before executing the extraction start point determining step S3, the moving image content generation unit 21 may receive a command from the user regarding determination of the extraction start points. This command may designate the range to be analyzed in the extraction start point determining step S3, or may directly designate extraction start points. When the user designates M extraction start points (where M is an integer from 1 to N), the moving image content generation unit 21 determines the remaining (N - M) extraction start points by analyzing the audio information. When the user designates an extraction start point, the moving image content generation unit 21 preferably associates the divided video information starting at the designated extraction start point with a group that contains many video switching points (in the example shown in FIG. 6, group A3, which contains the five video switching points SW2, SW4, SW7, SW9, and SW16).

(5) After executing the content synthesizing step S6, the moving image content generation unit 21 may receive a command regarding determination of the extraction start points. On receiving the command, the moving image content generation unit 21 executes the extraction start point determining step S3, the divided video extracting step S4, and the content synthesizing step S6 again. After executing the content synthesizing step S6, the moving image content generation unit 21 may also receive, in addition to the command regarding determination of the extraction start points, a command as to whether to redo the association.

(6) In the selection command receiving steps S7 and S7B, instead of presenting information on the second material content to the user, the moving image content generation unit 21 may present options regarding the mood of the music (for example, happy or sad) and options regarding the tempo of the music (for example, slow or fast). In this case, the user indirectly selects the second material content by selecting a mood and a tempo.

(7) The audio information included in the first material content may consist of right-side audio information and left-side audio information. In this case, the extraction start points can be determined based on, for example, either the right-side audio information or the left-side audio information alone, or the average of the two.

(8) The synthesis information and the special effect information may be described in a format other than MIDI.

(9) The information processing device 11 is not limited to one provided in a smartphone.
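Modifications (1), (4), and (7) can be combined in a short sketch. It assumes the audio has already been reduced to one volume value per analysis frame for each channel, and it works with frame indices rather than times. All function names, the frame-based representation, and the group names other than A3 (B3 and C3 below) are hypothetical; only the ideas of averaging the two channels, using sharp volume changes, keeping M user-designated points, and giving designated points the groups with the most video switching points come from the text.

```python
import random


def mono_average(left, right):
    """Modification (7): average left and right volume series."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def ranked_change_points(volume):
    """Modification (1): frame indices ordered by how sharply the volume
    rises or falls relative to the previous frame (sharpest first)."""
    deltas = [(abs(volume[i] - volume[i - 1]), i) for i in range(1, len(volume))]
    return [i for _, i in sorted(deltas, key=lambda d: (-d[0], d[1]))]


def complete_start_points(user_points, volume, n):
    """Modification (4): keep the M user-designated points and derive the
    remaining N - M points from the audio analysis."""
    user = list(dict.fromkeys(user_points))  # drop duplicates, keep order
    if not 1 <= len(user) <= n:
        raise ValueError("M must satisfy 1 <= M <= N")
    analysed = [i for i in ranked_change_points(volume) if i not in user]
    analysed = analysed[: n - len(user)]
    if len(user) + len(analysed) < n:
        raise ValueError("not enough candidate points in the audio")
    return user, analysed


def associate(user, analysed, switch_counts, rng):
    """Give user-designated points the groups with the most video switching
    points; associate the remaining points with the other groups at random."""
    groups = sorted(switch_counts, key=lambda g: (-switch_counts[g], g))
    if len(groups) != len(user) + len(analysed):
        raise ValueError("need exactly one group per start point")
    mapping = dict(zip(user, groups))
    rest = groups[len(user):]
    rng.shuffle(rest)
    mapping.update(zip(analysed, rest))
    return mapping


if __name__ == "__main__":
    volume = mono_average([0, 0, 8, 8, 8, 2, 2, 2], [0, 0, 12, 12, 12, 2, 2, 2])
    user, analysed = complete_start_points([6], volume, n=3)
    # A3 has five switching points in FIG. 6; B3 and C3 are made-up groups.
    print(associate(user, analysed, {"A3": 5, "B3": 3, "C3": 2}, random.Random(0)))
```

Calling `associate` again with a different random state corresponds to redoing the association as in modification (2); replacing the shuffle with a fixed ordering would correspond to the rule-based association of modification (3).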

Reference Signs List
10 smartphone
11 information processing device
12 storage device
13 camera
14 microphone
15 touch panel display
16 speaker
20 first material content generation unit
21 moving image content generation unit

Claims

1. A moving image content generation method for generating moving image content from a first material content including video information and audio information, the method comprising:
a first reading step of reading the first material content from a storage device;
a content separating step of separating the read first material content into the video information and the audio information;
an extraction start point determining step of determining N (where N is an integer of 2 or more) extraction start points based on the separated audio information;
a divided video extracting step of extracting, from the separated video information, N pieces of divided video information each starting from a respective one of the N extraction start points;
a second reading step of reading, from the storage device, a second material content that is music information and synthesis information corresponding to the second material content; and
a content synthesizing step of generating the moving image content by synthesizing the music information and the divided video information based on the synthesis information,
wherein the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information,
each of the plurality of video switching points belongs to one of N groups, and
in the content synthesizing step, each of the video switching points belonging to the same group is set as a start point of one of the N pieces of divided video information.

2. The moving image content generation method according to claim 1, wherein, in the extraction start point determining step, times corresponding to the top N of a plurality of volume peak points included in the separated audio information are set as the extraction start points.

3. The moving image content generation method according to claim 1 or 2, further comprising a command receiving step of receiving a command regarding selection of the first material content and the second material content,
wherein, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and
in the second reading step, one of a plurality of second material contents stored in advance in the storage device and the synthesis information corresponding to that one are read based on the command.

4. The moving image content generation method according to any one of claims 1 to 3, wherein the synthesis information is described in MIDI format, and
the N groups correspond to notes of a musical scale.

5. A moving image content generation program for causing an information processing device to execute a moving image content generation method of generating moving image content from a first material content including video information and audio information, the program causing the information processing device to execute:
a first reading step of reading the first material content from a storage device;
a content separating step of separating the read first material content into the video information and the audio information;
an extraction start point determining step of determining N (where N is an integer of 2 or more) extraction start points based on the separated audio information;
a divided video extracting step of extracting, from the separated video information, N pieces of divided video information each starting from a respective one of the N extraction start points;
a second reading step of reading, from the storage device, a second material content that is music information and synthesis information corresponding to the second material content; and
a content synthesizing step of generating the moving image content by synthesizing the music information and the divided video information based on the synthesis information,
wherein the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information,
each of the plurality of video switching points belongs to one of N groups, and
in the content synthesizing step, each of the video switching points belonging to the same group is set as a start point of one of the N pieces of divided video information.

6. The moving image content generation program according to claim 5, wherein, in the extraction start point determining step, times corresponding to the top N of a plurality of volume peak points included in the separated audio information are set as the extraction start points.

7. The moving image content generation program according to claim 5 or 6, further causing the information processing device to execute a command receiving step of receiving a command regarding selection of the first material content and the second material content,
wherein, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and
in the second reading step, one of a plurality of second material contents stored in advance in the storage device and the synthesis information corresponding to that one are read based on the command.

8. The moving image content generation program according to any one of claims 5 to 7, wherein the synthesis information is described in MIDI format, and
the N groups correspond to notes of a musical scale.