JP7121988B2

JP7121988B2 - MOVIE CONTENT GENERATION METHOD AND GENERATION PROGRAM

Info

Publication number: JP7121988B2
Application number: JP2018168950A
Authority: JP
Inventors: 幸太武下; 俊兵名波
Original assignee: 株式会社クロスフェーダー
Priority date: 2018-09-10
Filing date: 2018-09-10
Publication date: 2022-08-19
Anticipated expiration: 2038-09-10
Also published as: JP2020043454A

Description

The present invention relates to a method and a program for generating new video content from first material content that includes video information and audio information, and in particular to a method and a program for generating new video content by combining first material content created by the user with preset second material content.

With the spread of smartphones in recent years, it has become possible to shoot videos easily without expensive equipment. In addition, the development of high-speed communication environments has made it easy to publish videos taken with smartphones.

"TikTok (registered trademark)" is known as smartphone application software that takes advantage of this situation. A TikTok user can create an original video and publish it by the following procedure (see, for example, Non-Patent Documents 1 and 2).
(1) Select a piece of music to use as BGM from a plurality of preset pieces of music.
(2) Shoot a video that matches the selected BGM.
(3) Upload the shot video.
In addition, before uploading the shot video, the user can decorate it by applying one or more special effects selected from a plurality of special effects called filters or time effects.

Non-Patent Document 1: "Tik Tok (app)", [online], Wikipedia, the free encyclopedia, [searched September 3, 2018], Internet <URL: https://ja.wikipedia.org/wiki/Tik_Tok_(%E3%82%A2%E3%83%97%E3%83%AA)>
Non-Patent Document 2: "Basic usage of the free video creation app 'Tik Tok': how to insert songs, save videos, and more!", [online], April 18, 2018, Dohack, [searched September 3, 2018], Internet <URL: https://dohack.jp/video/tik-tok>

When videos are created with the above application software, the user needs an aesthetic sense both to shoot a video that matches the BGM and to apply special effects to the shot video. For this reason, the application software is very hard to approach for people who lack confidence in their own aesthetic sense.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a video content generation method and a generation program that enable anyone to easily generate good-looking video content.

To solve the above problem, a video content generation method according to the present invention is a method for generating video content from first material content that includes video information and audio information, the method comprising:
a first reading step of reading the first material content from a storage device; a content separation step of separating the read first material content into the video information and the audio information; an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information; a divided video extraction step of extracting, from the separated video information, N pieces of divided video information, each starting at one of the N extraction start points; a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and a content synthesis step of generating the video content by synthesizing the music information and the divided video information based on the synthesis information,
wherein the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information, each of the plurality of video switching points belongs to one of N groups, and, in the content synthesis step, each of the video switching points belonging to the same group is used as the start point of one particular piece of the N pieces of divided video information.

In the extraction start point determination step, the video content generation method may use, as the extraction start points, the times of the top N volume peak points among a plurality of volume peak points included in the separated audio information.

The video content generation method may further comprise a command reception step of receiving a command regarding the selection of the first material content and the second material content. In that case, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and in the second reading step, one of a plurality of second material contents stored in advance in the storage device, together with the synthesis information corresponding to that one, is read based on the command.

The synthesis information can be described in MIDI format. In this case, the N groups can be represented by musical notes.

To solve the above problem, a video content generation program according to the present invention is a program that causes an information processing device to execute a video content generation method for generating video content from first material content that includes video information and audio information, the program causing the information processing device to execute:
a first reading step of reading the first material content from a storage device; a content separation step of separating the read first material content into the video information and the audio information; an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information; a divided video extraction step of extracting, from the separated video information, N pieces of divided video information, each starting at one of the N extraction start points; a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and a content synthesis step of generating the video content by synthesizing the music information and the divided video information based on the synthesis information,
wherein the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information, each of the plurality of video switching points belongs to one of N groups, and, in the content synthesis step, each of the video switching points belonging to the same group is used as the start point of one particular piece of the N pieces of divided video information.

In the extraction start point determination step, the video content generation program may set, as the extraction start points, the times corresponding to the top N of a plurality of volume peak points included in the separated audio information.

The video content generation program may further cause the information processing device to execute a command reception step of receiving a command regarding the selection of the first material content and the second material content. In that case, in the first reading step, one of a plurality of first material contents stored in advance in the storage device is read based on the command, and in the second reading step, one of a plurality of second material contents stored in advance in the storage device, together with the synthesis information corresponding to that one, is read based on the command.

According to the present invention, it is possible to provide a video content generation method and a generation program that enable anyone to easily generate good-looking video content.

FIG. 1 is a block diagram showing a schematic configuration of a smartphone executing a video content generation program according to an embodiment of the present invention.
FIG. 2 is a flowchart of a video content generation method according to the embodiment of the present invention.
FIG. 3 is a diagram for explaining the content separation step shown in FIG. 2.
FIG. 4 is a diagram for explaining the extraction start point determination step shown in FIG. 2.
FIG. 5 is a diagram for explaining the divided video extraction step shown in FIG. 2.
FIG. 6 is a diagram showing the structure of the music information and the synthesis information read in the second reading step shown in FIG. 2.
FIG. 7 is a diagram for explaining the first half of the content synthesis step shown in FIG. 2.
FIG. 8 is a diagram for explaining the second half of the content synthesis step shown in FIG. 2.
FIG. 9 is a flowchart of a video content generation method according to Modification 1 of the present invention.
FIG. 10 is a block diagram showing a schematic configuration of a smartphone executing a video content generation program according to Modification 2 of the present invention.
FIG. 11 is a flowchart of a video content generation method according to Modification 2 of the present invention.
FIG. 12 is another flowchart of the video content generation method according to Modification 2 of the present invention.
FIG. 13 is a block diagram showing a schematic configuration of a smartphone executing a video content generation program according to Modification 3 of the present invention.
FIG. 14 is a diagram showing the structure of the music information, the synthesis information, and the special effect information read in the second reading step of the video content generation method according to Modification 3 of the present invention.

An embodiment of the video content generation method and generation program according to the present invention will be described below with reference to the accompanying drawings.

[Embodiment]
FIG. 1 shows a schematic configuration of a smartphone 10. As shown in the figure, the smartphone 10 includes an information processing device 11 consisting of an MPU (Micro Processor Unit), a storage device 12 consisting of memory, a camera 13, a microphone 14, a touch panel display 15, and a speaker 16.

The information processing device 11 has a first material content generation unit 20 and a video content generation unit 21. The first material content generation unit 20 is a functional block formed in the information processing device 11 when the user runs the standard built-in video shooting program. The video content generation unit 21, on the other hand, is a functional block formed in the information processing device 11 when the user runs the video content generation program according to this embodiment.

The first material content generation unit 20 generates first material content (a video file) containing video information and audio information based on the video signal output by the camera 13 and the audio signal output by the microphone 14, and stores it in the storage device 12.

In addition to the first material content, the storage device 12 also stores second material content, which is music information, and synthesis information corresponding to it. The second material content and the synthesis information may be stored in advance, or may be stored when the user runs the video content generation program according to this embodiment. The second material content and the synthesis information are described in detail later.

In this embodiment, it is assumed that one piece each of first material content, second material content, and synthesis information is already stored in the storage device 12 when the user runs the video content generation program according to this embodiment.

As described above, the video content generation unit 21 is formed when the user runs the video content generation program according to this embodiment. By executing the video content generation method according to this embodiment, the video content generation unit 21 synthesizes the first material content and the second material content stored in the storage device 12 based on the synthesis information and generates new video content. In other words, the video content generation program according to this embodiment generates new video content by causing the information processing device 11 to execute the video content generation method according to this embodiment.

The video content generation unit 21 can play back the generated video content through the touch panel display 15 and the speaker 16. It can also upload the generated video content through a communication unit (not shown) or store it in the storage device 12.

Next, the video content generation method according to this embodiment (that is, the operation of the video content generation unit 21) will be described with reference to FIG. 2.

In the first reading step S1, the video content generation unit 21 reads the first material content from the storage device 12. As described above, the first material content is a file of a video shot by the user and contains video information and audio information.

In the content separation step S2, executed after step S1, the video content generation unit 21 separates the first material content read in step S1 into video information and audio information (see FIG. 3).
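The source does not say how step S2 is carried out. One common approach outside the smartphone setting is to call the ffmpeg command-line tool twice: once dropping the audio stream, and once dropping the video stream while decoding the audio to mono PCM WAV for the later volume analysis. This sketch assumes only that `ffmpeg` is installed and on the PATH; the helper names, output file names, and the 16 kHz sample rate are illustrative assumptions, not part of the patent.

```python
import subprocess
from typing import List, Tuple


def separation_commands(src: str, video_out: str, audio_out: str) -> Tuple[List[str], List[str]]:
    """Build (hypothetical) ffmpeg commands that split a clip into video-only and audio-only files."""
    # -an drops the audio stream; -c:v copy keeps the video stream without re-encoding.
    video_cmd = ["ffmpeg", "-y", "-i", src, "-an", "-c:v", "copy", video_out]
    # -vn drops the video stream; the audio is decoded to 16-bit mono PCM at 16 kHz,
    # a convenient form for measuring volume in the next step.
    audio_cmd = ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
                 "-c:a", "pcm_s16le", audio_out]
    return video_cmd, audio_cmd


def separate(src: str, video_out: str = "video.mp4", audio_out: str = "audio.wav") -> None:
    """Run both commands; raises subprocess.CalledProcessError if ffmpeg fails."""
    for cmd in separation_commands(src, video_out, audio_out):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the commands easy to inspect before anything is written to disk; `separate("clip.mp4")` would then produce `video.mp4` and `audio.wav`.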

In the extraction start point determination step S3, executed after step S2, the video content generation unit 21 analyzes the audio information separated in step S2, identifies the top N volume peak points included in that audio information (where N is an integer of 2 or more; N = 7 in this embodiment), and sets the times corresponding to these peaks as the extraction start points. As shown in FIG. 4, in this embodiment the times t1, t2, ..., t6, t7 corresponding to the volume peak points P1, P2, ..., P6, P7 are the extraction start points.
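The patent does not define a "volume peak point" precisely. A minimal sketch, assuming the separated audio has been decoded to a list of samples and that volume is measured as the RMS of fixed-size frames, treats any frame louder than its predecessor and not quieter than its successor as a peak, keeps the N loudest, and returns their times in chronological order. The frame size, the peak definition, and the function names are all assumptions.

```python
import math
from typing import List, Sequence


def rms_envelope(samples: Sequence[float], frame_size: int) -> List[float]:
    """Root-mean-square volume of each consecutive frame of `frame_size` samples."""
    env = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        env.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return env


def local_peaks(envelope: Sequence[float]) -> List[int]:
    """Indices of frames louder than the previous frame and not quieter than the next."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] >= envelope[i + 1]]


def extraction_start_points(envelope: Sequence[float], hop_seconds: float, n: int) -> List[float]:
    """Times (seconds) of the N loudest volume peaks, returned in time order."""
    if n < 2:
        raise ValueError("N must be an integer of 2 or more")
    peaks = local_peaks(envelope)
    if len(peaks) < n:
        raise ValueError(f"only {len(peaks)} volume peaks found; {n} are needed")
    loudest = sorted(peaks, key=lambda i: envelope[i], reverse=True)[:n]
    # Sorting the chosen frames by index lets them be labelled t1 < t2 < ... < tN.
    return [i * hop_seconds for i in sorted(loudest)]
```

With a sample rate `sr`, the hop between envelope values is `frame_size / sr` seconds; for the embodiment one would call `extraction_start_points(env, frame_size / sr, 7)`.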

In the divided video extraction step S4, executed after step S3, the video content generation unit 21 extracts, from the video information separated in step S2, seven pieces of divided video information V1, V2, ..., V6, V7, each starting at one of the seven extraction start points t1, t2, ..., t6, t7 determined in step S3 (see FIG. 5). More specifically, the video content generation unit 21 extracts the information between extraction start points t1 and t2 as divided video information V1, the information between t2 and t3 as divided video information V2, ..., the information between t6 and t7 as divided video information V6, and the information between extraction start point t7 and the end of the video information as divided video information V7.
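The cutting rule in step S4 can be sketched as interval arithmetic on the start points: each piece runs from one start point to the next, and the last runs to the end of the video. Footage before t1 is not used, matching the description. The `Segment` type and the function name are hypothetical; only time ranges are computed, not actual video frames.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    name: str     # "V1", "V2", ...
    start: float  # seconds into the separated video information
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def split_at_start_points(start_points: Sequence[float], video_duration: float) -> List[Segment]:
    """Return [t1, t2), [t2, t3), ..., [tN, end) as named segments."""
    points = sorted(start_points)
    if not points or points[0] < 0 or points[-1] >= video_duration:
        raise ValueError("every start point must lie inside the video")
    ends = points[1:] + [video_duration]
    return [Segment(f"V{k}", s, e)
            for k, (s, e) in enumerate(zip(points, ends), start=1)]
```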

In the second reading step S5, executed after step S4, the video content generation unit 21 reads the second material content and the corresponding synthesis information from the storage device 12.

FIG. 6(A) shows the music information contained in the read second material content, and FIG. 6(B) shows the read synthesis information. As these figures show, the synthesis information includes a plurality of video switching points SW1, SW2, ..., SW15, SW16 synchronized with the beat represented by the music information. Each of the video switching points SW1, SW2, ..., SW15, SW16 belongs to one of seven groups C3, D3, E3, F3, G3, A3, and B3. For example, video switching points SW1 and SW14 belong to group C3, and video switching points SW10, SW11, SW12, and SW13 belong to group F3.

The synthesis information is described in MIDI (Musical Instrument Digital Interface) format. Using the MIDI format makes it easy to describe the temporal positions of the video switching points, each of which belongs to one of the seven groups C3, D3, E3, F3, G3, A3, and B3. Here, C3, D3, E3, F3, G3, A3, and B3 denote the notes "do", "re", "mi", "fa", "sol", "la", and "si" of the third octave, respectively.
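A minimal sketch of how such a MIDI track could be interpreted, assuming the note-on events have already been read from the file and converted from ticks to seconds (that parsing step is not shown). Each note-on becomes a video switching point, and its note name is its group. MIDI itself numbers notes 0 to 127, with 60 as middle C; whether note 60 is called "C3" or "C4" depends on the tool, so the octave convention is a parameter here. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note_number: int, middle_c_octave: int = 3) -> str:
    """Name a MIDI note number; note 60 (middle C) is C3 or C4 depending on convention."""
    octave = note_number // 12 - 5 + middle_c_octave
    return f"{NOTE_NAMES[note_number % 12]}{octave}"


def switching_points(note_ons: List[Tuple[float, int]],
                     middle_c_octave: int = 3) -> List[Tuple[float, str]]:
    """Turn (time, note number) note-on events into time-ordered (time, group) pairs."""
    return [(t, note_name(n, middle_c_octave)) for t, n in sorted(note_ons)]


def points_by_group(points: List[Tuple[float, str]]) -> Dict[str, List[float]]:
    """Collect the switching-point times that belong to each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for t, group in points:
        groups[group].append(t)
    return dict(groups)
```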

In the content synthesis step S6, executed after step S5, the video content generation unit 21 synthesizes the seven pieces of divided video information V1, V2, ..., V6, V7 extracted in step S4 with the music information contained in the second material content, based on the video switching points SW1, SW2, ..., SW15, SW16 included in the synthesis information.

More specifically, the video content generation unit 21 first associates the seven pieces of divided video information V1, V2, ..., V6, V7 with the seven groups C3, D3, E3, F3, G3, A3, and B3. In this embodiment, the association is made randomly; assume that, as a result, group C3 is associated with divided video information V1, group D3 with V3, group E3 with V5, group F3 with V6, group G3 with V2, group A3 with V4, and group B3 with V7 (see FIG. 7).
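The random association is a one-to-one pairing of N groups with N divided videos, which can be sketched as a shuffle. The optional seed is an assumption added so that a pairing can be reproduced; the function name is hypothetical.

```python
import random
from typing import Dict, List, Optional


def associate_randomly(segments: List[str], groups: List[str],
                       seed: Optional[int] = None) -> Dict[str, str]:
    """Pair each of the N groups with a distinct one of the N divided videos, at random."""
    if len(segments) != len(groups):
        raise ValueError("the numbers of divided videos and groups must both be N")
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return dict(zip(groups, shuffled))
```

Calling the function again with a different seed, or with `seed=None`, yields a fresh pairing. This is one way to redo the random association mentioned in the other modifications, without touching either material content.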

Next, the video content generation unit 21 joins the divided video information V1, V2, ..., V6, V7 so that: video switching points SW1 and SW14 of group C3 are start points of divided video information V1; video switching point SW6 of group D3 is the start point of V3; video switching point SW5 of group E3 is the start point of V5; video switching points SW10, SW11, SW12, and SW13 of group F3 are start points of V6; video switching point SW15 of group G3 is the start point of V2; video switching points SW2, SW4, SW7, SW9, and SW16 of group A3 are start points of V4; and video switching points SW3 and SW8 of group B3 are start points of V7. It then synthesizes the joined video with the music information (see FIG. 8).

At this time, if, for example, the interval between video switching points SW1 and SW2 is shorter than the duration of divided video information V1, the video content generation unit 21 uses only the leading portion of V1. If the interval between SW1 and SW2 is longer than the duration of V1, it fills the remaining time with the last frame (a still image) of V1.
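The joining rule can be sketched as an edit list: at each switching point, the divided video assigned to that point's group starts from its beginning and fills the slot up to the next switching point. It is trimmed if too long, or its last frame is held if too short. The source does not say where the last slot ends; this sketch assumes it ends with the music (`music_end`). The `Clip` type and function name are hypothetical, and only timings are computed.

```python
from typing import Dict, List, NamedTuple, Tuple


class Clip(NamedTuple):
    segment: str      # which divided video is shown
    out_start: float  # where the clip starts on the music timeline (s)
    play: float       # seconds taken from the head of the segment
    hold: float       # seconds for which the segment's last frame is held


def build_timeline(points: List[Tuple[float, str]], music_end: float,
                   assignment: Dict[str, str], durations: Dict[str, float]) -> List[Clip]:
    """Lay the assigned divided videos onto the music, one clip per switching point."""
    ordered = sorted(points)
    clips = []
    for k, (t, group) in enumerate(ordered):
        slot_end = ordered[k + 1][0] if k + 1 < len(ordered) else music_end
        slot = slot_end - t
        segment = assignment[group]
        play = min(slot, durations[segment])  # use only the head if the segment is longer
        clips.append(Clip(segment, t, play, slot - play))  # hold last frame if shorter
    return clips
```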

Thus, with the video content generation method and generation program according to this embodiment, the video information contained in the first material content is edited to match the music information (BGM) contained in the second material content, based on the audio information contained in the user-created first material content and on synthesis information prepared in advance, without the user having to perform any work that requires an aesthetic sense. As a result, good-looking new video content is obtained.

The video content generation method and generation program according to the present invention have a plurality of modifications, examples of which are described below.

[Modification 1]
The second reading step S5 may be executed before the first reading step S1 (see FIG. 9(A)), or in parallel with the first reading step S1 through the divided video extraction step S4 (see FIG. 9(B)). In other words, in the present invention it is sufficient that the first reading step S1 through the divided video extraction step S4, and the second reading step S5, have been executed before the content synthesis step S6.
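The parallel ordering of FIG. 9(B) can be sketched with two concurrent branches that the synthesis step waits on. The three step functions below are hypothetical placeholders that return labels instead of real media. Only the ordering constraint is illustrated: step S6 starts only after both branches have finished.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def prepare_divided_video(first_material: str) -> List[str]:
    """Placeholder for steps S1-S4 (read, separate, find start points, split)."""
    return [f"{first_material}:V{i}" for i in range(1, 3)]


def read_second_material(second_material: str) -> Tuple[str, str]:
    """Placeholder for step S5 (read music information and synthesis information)."""
    return (f"{second_material}:music", f"{second_material}:synthesis")


def synthesize(segments: List[str], second: Tuple[str, str]) -> dict:
    """Placeholder for step S6."""
    music, synthesis = second
    return {"segments": segments, "music": music, "synthesis": synthesis}


def generate(first_material: str, second_material: str) -> dict:
    with ThreadPoolExecutor(max_workers=2) as pool:
        video_branch = pool.submit(prepare_divided_video, first_material)  # S1-S4
        music_branch = pool.submit(read_second_material, second_material)  # S5
        # .result() blocks until each branch is done, so S6 always runs last.
        return synthesize(video_branch.result(), music_branch.result())
```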

[Modification 2]
As shown in FIG. 10, the storage device 12 may store a plurality of first material contents, a plurality of second material contents, and synthesis information corresponding to each of the second material contents. In this case, as shown in FIG. 11, a selection command reception step S7 must be executed before the first reading step S1 and the second reading step S5.

In the selection command reception step S7, the video content generation unit 21 displays, as options on the touch panel display 15, information about the plurality of first material contents and second material contents stored in the storage device 12. It then receives a command from the user, entered via the touch panel display 15, regarding the selection of the first material content and the second material content. In the first reading step S1, the video content generation unit 21 reads the first material content selected by the user. The same applies to the second reading step S5.

When the first reading step S1 and the second reading step S5 are executed serially, a selection command reception step S7 that receives a command regarding the selection of both the first and the second material content may be executed before the first reading step S1 (see FIG. 12(A)). Alternatively, a selection command reception step S7A that receives a command regarding the selection of the first material content may be executed before the first reading step S1, and a selection command reception step S7B that receives a command regarding the selection of the second material content may be executed before the second reading step S5 (see FIG. 12(B)).

[Modification 3]
As shown in FIG. 13, the storage device 12 may additionally store special effect information corresponding to the second material content. In this case, in the second reading step S5, the video content generation unit 21 reads the synthesis information and the special effect information together with the second material content. For second material content that has no special effect information, however, the video content generation unit 21 reads only the synthesis information together with the second material content.

As shown in FIG. 14, the special effect information includes a plurality of special effect application periods EF1, EF2, ..., EF6, EF7. Each of the special effect application periods EF1, EF2, ..., EF6, EF7 belongs to one of four groups C4, D4, E4, and F4.

Special effect application periods EF1, EF4, and EF6, which belong to group C4, are periods in which the special effect "zoom in" is applied. Special effect application period EF7, which belongs to group D4, is a period in which the special effect "fade out" is applied. Special effect application periods EF2 and EF5, which belong to group E4, are periods in which the special effect "fast forward" is applied. Special effect application period EF3, which belongs to group F4, is a period in which the special effect "slow motion" is applied.

The special effect information is described in MIDI format. Using the MIDI format makes it easy to describe the temporal range of each special effect application period belonging to one of the four groups C4, D4, E4, and F4. Here, C4, D4, E4, and F4 denote the notes "do", "re", "mi", and "fa" of the fourth octave, respectively.
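Because a MIDI note has both a start (note-on) and an end (note-off), a note naturally describes an application period. A minimal sketch, assuming the note-on/note-off pairs have already been read from the file as `(start, end, group)` triples, maps each group to the effect listed in FIG. 14 and answers which effects are active at a given time. The effect identifiers are illustrative names. Further effects, such as those on G4, A4, and B4, would be extra dictionary entries.

```python
from typing import Dict, List, Tuple

# Group (note name) -> special effect, following the example in FIG. 14.
EFFECT_BY_GROUP: Dict[str, str] = {
    "C4": "zoom_in",
    "D4": "fade_out",
    "E4": "fast_forward",
    "F4": "slow_motion",
}


def effect_periods(notes: List[Tuple[float, float, str]]) -> List[Tuple[float, float, str]]:
    """Convert (note_on, note_off, group) triples into time-ordered (start, end, effect)."""
    return sorted((on, off, EFFECT_BY_GROUP[group]) for on, off, group in notes)


def effects_at(periods: List[Tuple[float, float, str]], t: float) -> List[str]:
    """Effects whose application period covers time t (start inclusive, end exclusive)."""
    return [effect for start, end, effect in periods if start <= t < end]
```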

In the content synthesis step S6, the video content generation unit 21 applies special effects according to the special effect information to the video obtained by joining the plurality of pieces of divided video information. It then synthesizes the video information to which the special effects have been applied with the music information.

To increase the number of special effect types, G4, A4, and B4, which denote the notes "sol", "la", and "si" of the fourth octave, may be used, for example. The synthesis information and the special effect information may also be described together in a single MIDI-format file.

[Other Modifications]
(1) The moving image content generation unit 21 may determine the extraction start points based on features other than the volume peak points in the audio information of the first material content. For example, the moving image content generation unit 21 may use points at which the volume of the audio information increases or decreases sharply as extraction start points.

(2) After executing the content synthesizing step S6, the moving image content generation unit 21 may accept a command specifying whether the random association between the N pieces of divided video information and the N groups should be redone. By redoing the association, a variety of moving image content can be generated without making any change to the first material content or the second material content themselves.

(3) The moving image content generation unit 21 may associate the N pieces of divided video information with the N groups according to a predetermined rule rather than at random.

(4) Before executing the extraction start point determination step S3, the moving image content generation unit 21 may accept a command from the user regarding the determination of the extraction start points. This command includes designation of the range to be analyzed in the extraction start point determination step S3 and direct designation of extraction start points. When the user designates M extraction start points (where M is an integer from 1 to N inclusive), the moving image content generation unit 21 determines the remaining (N − M) extraction start points by analyzing the audio information. Further, when an extraction start point is designated by the user, the moving image content generation unit 21 preferably associates the divided video information starting from the designated extraction start point with a group containing many video switching points (in the example shown in FIG. 6, group A3, which contains five video switching points SW2, SW4, SW7, SW9, and SW16).

(5) The moving image content generation unit 21 may accept a command regarding the determination of the extraction start points after executing the content synthesizing step S6. Upon receiving such a command, the moving image content generation unit 21 executes the extraction start point determination step S3, the divided video extraction step S4, and the content synthesizing step S6 again. After executing the content synthesizing step S6, the moving image content generation unit 21 may also accept, in addition to the command regarding the extraction start points, a command specifying whether the association should be redone.

(6) In the selection command receiving steps S7 and S7B, instead of presenting the user with information about the second material content, the moving image content generation unit 21 may present options regarding the mood of the song (for example, happy or sad) and options regarding its tempo (for example, slow or fast). In this case, the user selects the second material content indirectly by choosing a mood and a tempo.

(7) The audio information included in the first material content may consist of right-channel audio information and left-channel audio information. In this case, the extraction start points can be determined based on, for example, either the right-channel or the left-channel audio information, or on an average of the two.

(8) The synthesis information and the special effect information may be described in a format other than MIDI.

(9) The information processing device 11 is not limited to one built into a smartphone.
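A minimal Python sketch of the audio analysis behind the extraction start points is given below. It combines the top-N volume peaks of claim 2 with modifications (1), (4), and (7): a stereo downmix by averaging, a frame-wise RMS volume envelope, local maxima as volume peak points, sharp volume jumps as alternative candidates, and user-designated points that are kept while the remaining N − M points are filled from the loudest peaks. The frame length, the use of RMS, the local-maximum test, the jump threshold, and all function names are assumptions; the text does not specify how volume is measured.

```python
import math


def downmix(left, right):
    """Modification (7): average the right and left channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def frame_rms(samples, frame_len):
    """Volume envelope: RMS of non-overlapping frames (a trailing partial frame is dropped)."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env


def volume_peaks(env):
    """Frame indices of local maxima of the envelope (candidate volume peak points)."""
    return [i for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]


def sudden_changes(env, threshold):
    """Modification (1): frames where the volume rises or falls by at least `threshold`."""
    return [i for i in range(1, len(env)) if abs(env[i] - env[i - 1]) >= threshold]


def extraction_start_points(env, n, frame_sec, user_points=()):
    """Claim 2 and modification (4), sketched.

    Keeps up to N user-designated start points (seconds) and fills the
    remaining N - M points with the loudest volume peaks, returned in time order.
    Returns (user_points, analysed_points) so callers can treat them differently.
    """
    user_points = list(user_points)[:n]
    taken = {round(p / frame_sec) for p in user_points}
    peaks = [i for i in volume_peaks(env) if i not in taken]
    peaks.sort(key=lambda i: env[i], reverse=True)  # loudest first
    analysed = [i * frame_sec for i in peaks[: n - len(user_points)]]
    return user_points, sorted(analysed)
```

Returning the user-designated points separately makes it straightforward to give them preferential treatment when they are later associated with groups.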
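The association between the N pieces of divided video information and the N groups, which is random by default and can be redone on request under modification (2), or follows a predetermined rule under modifications (3) and (4), could be sketched as follows. The ordering rule in `associate_by_rule` is a hypothetical example of such a "predetermined rule" and is not specified in the text: groups with more video switching points receive the clips that start at user-designated extraction start points first, and then the remaining clips in their original order.

```python
import random


def associate_random(groups, clips, rng):
    """Randomly pair the N groups with the N pieces of divided video information.

    Calling this again (modification (2)) yields a fresh association without
    touching the first or second material content.
    """
    if len(groups) != len(clips):
        raise ValueError("need exactly one clip per group")
    shuffled = list(clips)
    rng.shuffle(shuffled)
    return dict(zip(groups, shuffled))


def associate_by_rule(switch_counts, clips, preferred=()):
    """Hypothetical predetermined rule for modifications (3) and (4).

    switch_counts maps each group to its number of video switching points.
    Groups are ranked by that count (ties keep their original order), and
    clips starting at user-designated extraction start points (`preferred`)
    are assigned first, so they land in the groups with the most switching points.
    """
    if len(switch_counts) != len(clips):
        raise ValueError("need exactly one clip per group")
    groups = sorted(switch_counts, key=lambda g: switch_counts[g], reverse=True)
    ordered = [c for c in clips if c in preferred] + [c for c in clips if c not in preferred]
    return dict(zip(groups, ordered))
```

Passing a seeded `random.Random` instance makes a given association reproducible, while calling `associate_random` again with the same generator produces the re-randomized association of modification (2).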

10 Smartphone
11 Information processing device
12 Storage device
13 Camera
14 Microphone
15 Touch panel display
16 Speaker
20 First material content generation unit
21 Moving image content generation unit

Claims

1. A moving image content generation method for generating moving image content from first material content including video information and audio information, comprising:
a first reading step of reading the first material content from a storage device;
a content separation step of separating the read first material content into the video information and the audio information;
an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information;
a divided video extraction step of extracting, from the separated video information, N pieces of divided video information each starting from one of the N extraction start points;
a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and
a content synthesizing step of synthesizing the music information and the divided video information based on the synthesis information to generate the moving image content,
wherein
the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information,
each of the plurality of video switching points belongs to one of N groups, and
in the content synthesizing step, each of the video switching points belonging to the same group is set as the starting point of the same one of the N pieces of divided video information.

2. The moving image content generation method according to claim 1, wherein, in the extraction start point determination step, times corresponding to the top N of a plurality of volume peak points included in the separated audio information are set as the extraction start points.

3. The moving image content generation method according to claim 1 or 2, further comprising a command receiving step of receiving a command regarding selection of the first material content and the second material content, wherein
in the first reading step, one of a plurality of pieces of first material content stored in advance in the storage device is read based on the command, and
in the second reading step, one of a plurality of pieces of second material content stored in advance in the storage device, together with the synthesis information corresponding to that piece, is read based on the command.

4. The moving image content generation method according to any one of claims 1 to 3, wherein the synthesis information is described in MIDI format, and
the N groups correspond to notes of a musical scale.

5. A moving image content generation program for causing an information processing device to execute a moving image content generation method for generating moving image content from first material content including video information and audio information, the program causing the information processing device to execute:
a first reading step of reading the first material content from a storage device;
a content separation step of separating the read first material content into the video information and the audio information;
an extraction start point determination step of determining N extraction start points (where N is an integer of 2 or more) based on the separated audio information;
a divided video extraction step of extracting, from the separated video information, N pieces of divided video information each starting from one of the N extraction start points;
a second reading step of reading, from the storage device, second material content that is music information and synthesis information corresponding to the second material content; and
a content synthesizing step of synthesizing the music information and the divided video information based on the synthesis information to generate the moving image content,
wherein
the synthesis information includes a plurality of video switching points synchronized with the beat represented by the music information,
each of the plurality of video switching points belongs to one of N groups, and
in the content synthesizing step, each of the video switching points belonging to the same group is set as the starting point of the same one of the N pieces of divided video information.

6. The moving image content generation program according to claim 5, wherein, in the extraction start point determination step, times corresponding to the top N of a plurality of volume peak points included in the separated audio information are set as the extraction start points.

7. The moving image content generation program according to claim 5 or 6, further causing the information processing device to execute a command receiving step of receiving a command regarding selection of the first material content and the second material content, wherein
in the first reading step, one of a plurality of pieces of first material content stored in advance in the storage device is read based on the command, and
in the second reading step, one of a plurality of pieces of second material content stored in advance in the storage device, together with the synthesis information corresponding to that piece, is read based on the command.

8. The moving image content generation program according to any one of claims 5 to 7, wherein the synthesis information is described in MIDI format, and
the N groups correspond to notes of a musical scale.