JP2012182724A

JP2012182724A - Moving image combining system, moving image combining method, moving image combining program and storage medium of the same

Info

Publication number: JP2012182724A
Application number: JP2011045123A
Authority: JP
Inventors: Keiichiro Hoashi; 啓一郎帆足; Hiromi Ishisaki; 広海石先; Toshihiro Ono; 智弘小野
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-03-02
Filing date: 2011-03-02
Publication date: 2012-09-20
Anticipated expiration: 2031-03-02
Also published as: JP5706718B2

Abstract

PROBLEM TO BE SOLVED: To provide a system for automatically combining coacting moving images. SOLUTION: The moving image combining system 10 includes: a material moving image retrieval section 21 that retrieves and obtains, from an external database, a prescribed number of moving image contents matching a retrieval request of a user and sets them as the prescribed number of material moving images; a material moving image feature extracting section 22 that extracts a feature amount time sequence from each material moving image; a success degree calculating section 23 that calculates a success degree time sequence of each material moving image from the feature amount time sequence by using a prescribed relation; and a moving image combining section 24 that produces the coacting moving image by combining the prescribed number of material moving images based on the success degree time sequences. The prescribed relation used in the success degree calculating section 23 is obtained in advance by a success degree model construction section 32, which constructs a model by multiple regression analysis using learning data stored in a learning data storing section 31. The user may also prepare the material moving images without using the retrieval section 21.

Description

The present invention relates to video composition, and in particular to a video composition system and method, a video composition program, and a storage medium therefor, that automatically compose a music ensemble video using, as material, video content published on video sharing sites on the Internet and the like.

Currently, on video sharing sites on the Internet (such as Nico Nico Douga (registered trademark)), a growing number of users share and enjoy video content that shows a person playing a particular instrument of a musical piece while music from a commercially available CD or the like is played back (hereinafter, "performance videos"). A growing number of users also collect and edit such performance videos to create and share video content in which multiple users appear to be performing a single piece together (hereinafter, "ensemble videos").

Conventionally, producing such an ensemble video has required an enormous amount of editing work by the user, so only a limited number of users can enjoy creating such videos. Moreover, because the quality of the resulting ensemble video depends heavily on the skill and taste of the user who created it, it is difficult to efficiently create an ensemble video that leaves a strong impression.

Given this production cost, ensemble videos tend to be produced only for popular pieces. Consequently, even if a user wants to enjoy an ensemble video of an arbitrary piece, it is quite likely that the user cannot view the desired ensemble video.

Related prior art includes the video creation system proposed in Non-Patent Document 1 below. That system uses the large number of videos published on the Web to automatically create a derivative (secondary-creation) video, and supports the creation of a video matching the user's preferences in response to feedback from the user. Related prior art also includes the video data synthesizing apparatus disclosed in Patent Document 1 below.

Non-Patent Document 1: Murofushi, Nakano, Goto, Morishima: "DanceReProducer: A system that can create videos matched to music by reusing existing dance videos", Proceedings of WISS 2009, 2009.

Patent Document 1: JP 2007-74277 A (Moving image data synthesizing apparatus, moving image data synthesizing program, and moving image data synthesizing system)

However, the videos targeted by Non-Patent Document 1 are created by arranging fragmentary videos derived from the same game along the time axis, whereas an ensemble video requires performance videos, in which various instruments are played for the desired piece, to progress simultaneously. That video creation system was not designed for automatic composition of ensemble videos. Therefore, even if performance videos of multiple instruments were used as material with that system, it would be difficult to create an ensemble video by giving appropriate feedback information for each.

In the video data synthesizing apparatus disclosed in Patent Document 1, the user must prepare the target videos in advance by shooting them with a digital camera or the like, with images associated with their shooting times. With that apparatus as well, it would be difficult to automatically compose an ensemble video of a desired piece using performance videos on the Internet as material.

An object of the present invention is to solve the above problems of the prior art and to provide a video composition system and method, a video composition program, and a storage medium therefor, that automatically generate an ensemble video or, more generally, a co-performance video, using as material, for example, videos published on the Internet or predetermined videos whose content is of the same kind.

To achieve the above object, the present invention is a video composition system whose first feature is that it comprises: a material video feature extraction unit that extracts feature time series from a predetermined number of material videos linked to the sound source of a predetermined musical piece; a success degree calculation unit that calculates success degree time series of the material videos from the feature time series using a given relation; and a video composition unit that composes a co-performance video combining the predetermined number of material videos based on the success degree time series.

A second feature of the present invention is that the video composition system further comprises a material video search unit that receives a search request specifying the sound source of the predetermined musical piece, searches an external database for and obtains a predetermined number of pieces of video content matching the search request, and uses them as the predetermined number of material videos.

A further feature is that the material video search unit searches for and obtains video content in which the part of each of a predetermined number of instruments in the sound source of the musical piece is performed, and uses that content as the predetermined number of material videos.

Furthermore, a third feature of the present invention is that the video composition unit composes the co-performance video by highlighting, along the temporal progress of the co-performance video, those material videos whose success degree time series satisfies a predetermined condition.
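As an illustration of this highlighting feature, the sketch below decides, for each time step, which material video to highlight. The "predetermined condition" is not defined in this passage; the rule used here (the highest success degree at that step, provided it reaches a threshold) and all function and parameter names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Sequence


def highlight_schedule(
    success: Dict[str, Sequence[float]], threshold: float
) -> List[Optional[str]]:
    """Return, per time step, the id of the video to highlight, or None."""
    if not success:
        return []
    # Only the time steps covered by every series are considered.
    steps = min(len(series) for series in success.values())
    schedule: List[Optional[str]] = []
    for t in range(steps):
        # Assumed condition: highest value at step t, if it reaches the threshold.
        best = max(success, key=lambda vid: success[vid][t])
        schedule.append(best if success[best][t] >= threshold else None)
    return schedule
```

With two series, one for a guitar video and one for a bass video, the function returns one entry per time step, and an entry is None when no video reaches the threshold.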

According to the present invention, the first feature allows a co-performance video linked to the sound source of a predetermined musical piece to be composed automatically.

The second feature allows a co-performance video linked to the sound source of the piece the user desires, as specified in the search request, to be composed automatically.

Furthermore, the third feature allows automatic composition of a co-performance video in which material videos are highlighted according to their success degree.

FIG. 1 is a functional block diagram including the video composition system of the present invention.
FIG. 2 is a flowchart of the overall processing of the present invention.
FIG. 3 is a flowchart of the video composition processing.
FIG. 4 is an example of a user interface screen that accepts a search request.
FIG. 5 is a detailed functional block diagram of the material video feature extraction unit.
FIG. 6 is a diagram showing examples of the feature time series extracted by the material video feature extraction unit.
FIG. 7 is a diagram explaining synchronization of time information between material videos.
FIG. 8 is a conceptual image of an ensemble video.
FIG. 9 is a diagram showing an example of the timing at which each performance video becomes a highlight target in an ensemble video.
FIG. 10 is a diagram showing examples of the basic layout and various highlight styles.
FIG. 11 is a functional block diagram of an embodiment of the video composition system of the present invention that does not use search.
FIG. 12 is a functional block diagram showing the configuration of the main part of a computer that can function as the video composition system according to the present invention.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a functional block diagram including the video composition system 10 of the present invention. The video composition system 10 comprises, as its constituent modules, a user search request accepting unit 11, a video playback unit 12, a material video search unit 21, a material video feature extraction unit 22, a material video success degree calculation unit 23, a video composition unit 24, a learning data storage unit 31, and a success degree calculation model construction unit 32.

Here, functional block group 1 (the user search request accepting unit 11 and the video playback unit 12) provides the user interface of the video composition system 10. Functional block group 2 (the material video search unit 21, the material video feature extraction unit 22, the material video success degree calculation unit 23, and the video composition unit 24) handles creation of the ensemble video in accordance with the user's search request. Functional block group 3 (the learning data storage unit 31 and the success degree calculation model construction unit 32) performs the advance setup (parameter calculation and the like) needed to operate the material video success degree calculation unit 23.

An overview of the processing performed by each constituent module follows. The details of this processing are described later.

The user search request accepting unit 11 accepts a search request from a user of the video composition system 10. The search request is information that specifies the sound source of the musical piece to be performed in the desired ensemble video. For example, the user enters the title, the artist name, or the like of the piece for which the user wants to view an ensemble video, and this is accepted as the search request. Search requests with more detailed conditions, such as the instruments making up the ensemble in the ensemble video, may also be accepted.

The material video search unit 21 searches video sharing sites on the Internet and the like for material videos that match the search request accepted by the user search request accepting unit 11, and stores them in the video composition system 10. In addition to the material videos themselves, related information about the videos is also collected as needed.

The material video feature extraction unit 22 extracts, from all the material videos collected by the material video search unit 21, the various feature time series needed to compose the ensemble video. The features extracted by this module are described in detail later.

The success degree calculation model construction unit 32 builds a model for calculating the success degree time series of a material video, based on learning data prepared in advance and stored in the learning data storage unit 31. An example of the model construction performed by this module is described later.

The material video success degree calculation unit 23 applies a predetermined relation, based on the success degree calculation model built by the success degree calculation model construction unit 32, to the feature time series extracted by the material video feature extraction unit 22, and thereby calculates a success degree time series for each material video. The processing performed by this module is described in detail later.
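The abstract states that the model is constructed by multiple regression analysis, so one plausible form of the "predetermined relation" is a weighted sum of the feature values at each time step. The following is a minimal sketch under that assumption; the weights, the bias, and the function name are hypothetical and are not taken from the source.

```python
from typing import List, Sequence


def success_degree_series(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> List[float]:
    """Apply an assumed linear regression model to each time step.

    features[t] holds the feature values of one material video at step t.
    """
    series: List[float] = []
    for row in features:
        if len(row) != len(weights):
            raise ValueError("each time step needs one value per weight")
        series.append(bias + sum(w * x for w, x in zip(weights, row)))
    return series
```

The coefficients would come from the model built beforehand from the learning data; here they are simply passed in as arguments.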

The video composition unit 24 composes the ensemble video based on the success degree of each material video calculated by the material video success degree calculation unit 23. The processing performed by this module is described in detail later.

The video playback unit 12 plays back the ensemble video composed by the video composition unit 24. This module lets the user of the video composition system 10 view the ensemble video. Alternatively, the user may receive the ensemble video composed by the video composition unit 24 and play it back on the user's own equipment (for example, on the user's own PC) without using the video playback unit 12.

Alternatively, the video composition system 10 may upload the composed ensemble video, together with information on each material video, to the video sharing site or the like from which the material videos were obtained, and send the user link information for the ensemble video on that site. In this case, the user views the ensemble video using the link information.

Next, the processing of the present invention is described in more detail together with its processing flow. FIG. 2 shows a flowchart of the overall processing of the present invention.

As shown in FIG. 2, first, in step S21, the success degree calculation model construction unit 32 builds the success degree calculation model that the material video success degree calculation unit 23 will use later. Next, in step S22, the user search request accepting unit 11 accepts a search request from the user. As described later, this may be done somewhat interactively with the user.

Next, in step S23, functional block group 2 of FIG. 1 (the material video search unit 21 through the video composition unit 24) composes the ensemble video. Then, in step S24, the composed ensemble video is played back and viewed by the user. As noted above, the user may use the video playback unit 12 for this, or may use a playback unit on the user's own side.
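The order of steps S21 to S24 can be sketched as a small driver function. This is only an outline of the control flow: the four callables stand in for the modules of FIG. 1, and their names and signatures are hypothetical.

```python
from typing import Any, Callable, List


def run_overall_flow(
    build_model: Callable[[], Any],
    accept_request: Callable[[], Any],
    compose: Callable[[Any, Any], Any],
    play: Callable[[Any], None],
) -> List[str]:
    """Run steps S21 to S24 in order and return the labels of the steps run."""
    trace: List[str] = []
    model = build_model()            # S21: build the success degree model
    trace.append("S21")
    request = accept_request()       # S22: accept the user's search request
    trace.append("S22")
    video = compose(request, model)  # S23: compose the ensemble video
    trace.append("S23")
    play(video)                      # S24: play back the result
    trace.append("S24")
    return trace
```

Passing the modules in as callables keeps the sketch independent of how each one is actually implemented.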

FIG. 3 shows a flowchart of the video composition processing, which details step S23 of FIG. 2. As shown in FIG. 3, first, in step S31, the material video search unit 21 searches for material videos matching the user's search request and stores them in the video composition system 10. As indicated by step S32, step S31 is performed for every instrument making up the ensemble video (that is, for the material videos corresponding to each instrument).

Once step S31 has been completed for all instruments, in step S33 the material video feature extraction unit 22 extracts feature time series from all the material videos that were found and stored. As indicated by step S34, step S33 is likewise performed for every instrument making up the ensemble video (that is, for the material videos corresponding to each instrument).

Once step S33 has been completed for all instruments, in step S35 the video composition unit 24 composes the material videos into an ensemble video based on the extracted feature time series. As described later, the material videos are combined by providing a display area for each material video within the ensemble video and by also mixing (summing) their audio so that all of them progress simultaneously.
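The per-instrument loops of FIG. 3 and the audio mixing by summation can be sketched as follows. The sketch assumes that each audio track is a list of samples at a common sampling rate and that the tracks are already time-aligned; the callables and all names are hypothetical, and no gain control or clipping protection is included because the passage does not mention any.

```python
from itertools import zip_longest
from typing import Any, Callable, Dict, List, Sequence


def mix_audio(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Sum the tracks sample by sample; shorter tracks are padded with silence."""
    return [sum(samples) for samples in zip_longest(*tracks, fillvalue=0.0)]


def compose_ensemble(
    instruments: Sequence[str],
    search: Callable[[str], Any],
    extract: Callable[[Any], Any],
    synthesize: Callable[[Dict[str, Any], Dict[str, Any]], Any],
) -> Any:
    """Follow FIG. 3: search per instrument, extract per instrument, then compose."""
    # S31, repeated for every instrument (S32)
    materials = {name: search(name) for name in instruments}
    # S33, repeated for every instrument (S34)
    features = {name: extract(video) for name, video in materials.items()}
    # S35: combine the material videos into one ensemble video
    return synthesize(materials, features)
```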

The search request from the user in step S22 of FIG. 2 is now described. The search request also relates to identifying "all instruments" in steps S32 and S34 of FIG. 3. As stated above, the search request basically specifies the sound source of the piece to be performed in the ensemble video the user wishes to view. Even for the same piece, or for pieces based on the same piece, the arrangement, structure, tempo, and so on may differ, resulting in different sound sources; the sound source may therefore be specified using the artist name or the like in addition to the title.

The sound source of the piece may be specified by text input from the user, by selection from a predetermined list, or by a combination of the two (narrowing down candidates from all items in the list by text input and then selecting one).

As a more detailed search request, information specifying the instruments that make up the ensemble video may be accepted from the user in addition to the specification of the sound source. Various embodiments are possible for specifying the instruments, as follows. In one embodiment, the user enters all desired instrument names as text or the like, for example "guitar, bass, drums". The number of each instrument may also be specifiable, for example by entering "first guitar, second guitar, bass, drums" or "guitar 2, bass 1, drums 1".
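A text specification such as "guitar 2, bass 1, drums 1" could be turned into instrument counts along the following lines. The input grammar (comma-separated items, each optionally ending in a count) is an assumption based on the examples above, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Dict


def parse_instruments(spec: str) -> Dict[str, int]:
    """Parse a comma-separated instrument list into a name-to-count mapping."""
    counts: Counter = Counter()
    # Accept ASCII commas as well as Japanese and full-width commas.
    for item in re.split(r"[,、，]", spec):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"(.+?)\s*(\d+)", item)
        if match:
            # A trailing number is read as the count for that instrument.
            counts[match.group(1).strip()] += int(match.group(2))
        else:
            # No number given: count one instrument of that name.
            counts[item] += 1
    return dict(counts)
```

Names such as "first guitar" and "second guitar" are kept as distinct entries, each counted once.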

In another embodiment, the user search request accepting unit 11 may perform the following interactive processing. First, the user specifies the sound source of the desired piece. The material video search unit 21 then searches using information on that sound source as the search key, and determines the instruments for which material videos of an instrumental performance using that sound source are available. The available instruments are then presented to the user as a list from which the user selects.

In another embodiment, a fixed set of instruments (for example, guitar, bass, and drums) may be used regardless of the sound source. In yet another embodiment, the instruments for which material videos are available are checked and listed in advance for each sound source, so that a fixed set of instruments is used for each sound source or for each genre of music (rock, jazz, classical, and so on). In these two embodiments the user need not enter information specifying instruments; as a variant of each, however, the user may be allowed to select from among the fixed instruments.

With the above embodiments, or any feasible combination of them, "all instruments" used in the ensemble video are identified, and steps S32 and S34 of FIG. 3 are performed for all of those instruments. Because the number of instruments may be specified as described above, "all instruments" is identified in terms of the instrument types and the number of each.

In the search request in step S22 of FIG. 2, in addition to the identification of all instruments, a specification of the profile of the performer of each instrument may also be accepted, using the same kinds of specification methods as in the above embodiments. The performer profile may include whether the performer is a professional or an amateur, the performer's nationality, and, if the user so wishes, the identity of a specific performer.

The search request may further accept a specification of the video sharing sites or the like (more generally, external databases) that the material video search unit 21 is to search for material videos.

As described above, the search request can take various forms. FIG. 4 shows an example of a user interface screen that accepts a search request. B1 is a text input field for the sound source of the piece to be specified, and B2 is a checkbox-style selection field for the instruments desired in the ensemble video.

The material video search processing performed by the material video search unit 21 in step S31 of FIG. 3 using such a search request is now described. Basically, for each instrument specified for the sound source of the piece in the search request, the corresponding material video is searched for and stored in the video composition system 10, and related information about the material video is also collected. If the request specifies conditions on the material videos other than the sound source and the instrument, the search takes those conditions into account as well.

Related information includes, for example, the following. When a video sharing site or the like is searched using, as keys, the information identifying the sound source and the instrument information contained in the search request, the material videos that are hit often have tag information or the like attached that enables searching on that site. Because such tag information also contains information other than what was used in the search, it can be used as related information and is stored in association with the material video.

As information enabling search, the tag information includes information indicating that the material video is a performance video using the sound source of the desired piece, and information on the instrument used in the performance video. Other information contained in the tag information may include various kinds of information, such as performer profiles, for accepting the user's search request in the greater detail described above.

Note that the keywords used in tag information tend to include keywords that are more or less typical of each video sharing site or the like. Therefore, by preparing in advance, in the material video search unit 21, a dictionary of such keywords for each site to be searched, and converting the user's search request into the keywords for each site before searching, performance videos of the desired instrument for the sound source of the desired piece can be found efficiently and used as material videos.
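The per-site keyword dictionary can be pictured as a simple lookup table. In the sketch below the site names, the keywords, and the function name are all invented placeholders; a real table would hold whatever tags are typical on each target site.

```python
from typing import Dict, List

# Hypothetical per-site dictionaries: generic term -> tag typical of that site.
SITE_KEYWORDS: Dict[str, Dict[str, str]] = {
    "site_a": {"performance": "tried-playing", "guitar": "gt"},
    "site_b": {"performance": "cover", "guitar": "guitar"},
}


def build_query(site: str, song: str, terms: List[str]) -> List[str]:
    """Convert generic request terms into the keywords used on the given site."""
    table = SITE_KEYWORDS.get(site, {})
    # Terms without a site-specific keyword are passed through unchanged.
    return [song] + [table.get(term, term) for term in terms]
```

Keeping the conversion in a table means that supporting another site only requires adding one more dictionary entry.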

As related information, the play count, access count, or number of "favorite" registrations (hereinafter, "play count or the like") that the video sharing site or the like records for a hit material video and presents together with the video may also be collected, as may the numbers of positive and negative rating points given by viewers.

Among the material videos of each instrument found in step S31, those subjected to the further processing of step S33 may be limited to videos satisfying a predetermined condition, such as a predetermined number of videos ranked highest by play count or the like, or videos whose play count or the like is at least a predetermined value. Instead of the play count or the like, the limitation may use the number of positive rating points, or the number of positive rating points minus the number of negative rating points. A different limiting method may be used for each instrument. Any other indicator related to viewers' evaluation of a video that is presented on the video sharing site or the like may also be collected and used to limit the videos in the same way.
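The limiting rules described here (top N by some score, or a minimum score) can be sketched as follows. The record fields and function names are hypothetical; the two scoring functions correspond to the play count and to positive minus negative rating points mentioned in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class VideoInfo:
    video_id: str
    plays: int
    likes: int = 0
    dislikes: int = 0


def play_count(video: VideoInfo) -> float:
    return video.plays


def net_rating(video: VideoInfo) -> float:
    # Positive rating points minus negative rating points.
    return video.likes - video.dislikes


def limit_candidates(
    videos: Sequence[VideoInfo],
    score: Callable[[VideoInfo], float] = play_count,
    top_n: Optional[int] = None,
    min_score: Optional[float] = None,
) -> List[VideoInfo]:
    """Keep videos that meet min_score, ranked by score, truncated to top_n."""
    kept = [v for v in videos if min_score is None or score(v) >= min_score]
    kept.sort(key=score, reverse=True)
    return kept if top_n is None else kept[:top_n]
```

Because the scoring function is a parameter, a different rule can be applied to each instrument, as the text allows.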

For example, when "first guitar, second guitar, bass" are specified as the instruments, the search is performed in steps S31 and S32, and storage is limited to the predetermined number (for example, five) of top-ranked videos according to play count or the like, that is, "five first-guitar performance videos, five second-guitar performance videos, and five bass performance videos". The success degree is then obtained for these in steps S33 and S34, one video is selected for each instrument based on the success degree (for example, by choosing the one with the largest maximum success degree, or the largest integral of the success degree over the time axis), and an ensemble video of "one first-guitar performance video, one second-guitar performance video, and one bass performance video" can be composed in step S35.
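The two selection criteria named in the example, the maximum success degree and its integral over time, can be compared with a short sketch. The integral is approximated here by a sum of samples times a fixed step, which assumes uniformly sampled series; the function names are hypothetical.

```python
from typing import Callable, Dict, Sequence


def peak(series: Sequence[float]) -> float:
    """Maximum success degree over the whole video."""
    return max(series)


def integral(series: Sequence[float], dt: float = 1.0) -> float:
    """Rectangle-rule approximation of the integral over the time axis."""
    return sum(series) * dt


def pick_best(
    candidates: Dict[str, Sequence[float]],
    criterion: Callable[[Sequence[float]], float] = peak,
) -> str:
    """Return the id of the candidate video with the largest criterion value."""
    return max(candidates, key=lambda vid: criterion(candidates[vid]))
```

The two criteria can disagree: a video with one short burst wins on the peak, while a steadily strong video wins on the integral.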

Alternatively, at steps S31 and S32 the candidates may be narrowed to the single top-ranked video, or to one video chosen at random from the top predetermined number. The process can also be made interactive: at steps S31 and S32 the top-ranked videos are compiled into a list, the list is presented to the user through the user input reception unit 11, and the user selects the desired video for each instrument. For example, the user may specify "first guitar: performance video a; second guitar: performance video b; bass: performance video c", and the ensemble video is then created from these through steps S33, S34, and S35.

In every case, the ensemble video is created using exactly one performance video for each instrument.

The extraction of feature time series by the material video feature extraction unit 22 in step S33 of FIG. 3 is described next. A feature time series of a material video is a sequence of feature values defined along the video's elapsed playback time. Various features can be used; to create an impressive ensemble video, the present invention uses three types: visual features, acoustic features, and comment features derived from the users viewing each video.

As shown in FIG. 5, the time series of these three feature types are extracted by the visual feature extraction unit 221, the acoustic feature extraction unit 222, and the comment feature extraction unit 223, respectively, all of which belong to the material video feature extraction unit 22. The description below assumes that all three types are used, but using at least one is sufficient: the invention can be practiced with any one type alone or with any two.

The visual feature extraction unit 221 extracts a time series of visual features. Examples include the MPEG-7 motion activity descriptor, which represents motion within the material video, and the motion vector length obtainable from MPEG-encoded video.

The acoustic feature extraction unit 222 extracts a time series of acoustic features, for example the overall volume. The audio may first be restricted to the frequency band of the specified instrument before features such as volume are extracted. Furthermore, as noted earlier, in most performance videos used as material the performer plays along with a recording of the original song, so the difference from the acoustic features of the original recording may also be extracted as an acoustic feature.

In other words, the sound of the original recording playing in the background may be removed or attenuated so that only the instrument being played is heard, or so that it stands out, and the acoustic feature time series may then be extracted, for example as a volume time series. The positions at which to take the difference can be determined by the time-axis correction described later with reference to FIG. 7.

The comment feature extraction unit 223 extracts a time series of user comment features, for example the number of comments along the time axis. This presupposes that, on the video sharing site or other source searched by the material video search unit 21, viewers have attached comments tied to the elapsed playback time of the video. The material video search unit 21 collects these time-stamped comments as related information, and the comment feature extraction unit 223 counts the comments attached in each predetermined interval of playback time to obtain the comment-count feature time series.
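Counting comments per interval of playback time amounts to a simple histogram. The sketch below assumes comment timestamps in seconds and a 5-second interval; both the function name and these units are illustrative choices.

```python
import math


def comment_count_series(comment_times, duration, interval=5.0):
    """Count comments in each fixed-length interval of playback time.

    comment_times: playback positions (seconds) at which comments were attached.
    duration: total playback time of the video in seconds.
    """
    counts = [0] * math.ceil(duration / interval)
    for t in comment_times:
        if 0 <= t < duration:  # ignore timestamps outside the video
            counts[int(t // interval)] += 1
    return counts


series = comment_count_series([1.0, 2.5, 7.0, 19.9, 25.0], duration=20.0)
```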

Viewers tend to attach more of these time-stamped comments at points where the performance is at its most exciting. The comment-count feature time series can therefore serve as an explanatory variable for the success degree described later. The same holds for the acoustic and visual feature time series, and providing several suitable explanatory variables can improve the accuracy of the success degree calculation model.

To exclude comments unrelated to the success degree of a performance video, a rule for removing unwanted comments may be defined (for example, a dictionary listing meaningless comments such as "wwww"), and only comments that do not match the rule are counted. Similarly, a predetermined rule may weight specific comments that co-occur within a predetermined interval, counting them either more or less heavily. Such comment filtering may differ from one video sharing site to another.
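A filtered and weighted count might look like the sketch below. It is a simplification: weights are looked up per comment text and the co-occurrence condition is not modeled. The noise dictionary contents beyond "wwww", the weight table, and the function name are all hypothetical.

```python
import math

NOISE_WORDS = {"wwww"}              # dictionary of comments to discard (hypothetical)
COMMENT_WEIGHTS = {"encore": 2.0}   # per-comment weights (hypothetical)


def weighted_comment_series(comments, duration, interval=5.0):
    """Weighted comment count per interval, skipping dictionary-listed noise.

    comments: iterable of (playback_time_seconds, text) pairs.
    """
    counts = [0.0] * math.ceil(duration / interval)
    for t, text in comments:
        if text in NOISE_WORDS or not 0 <= t < duration:
            continue
        counts[int(t // interval)] += COMMENT_WEIGHTS.get(text, 1.0)
    return counts


sample = [(1.0, "wwww"), (2.0, "nice"), (6.0, "encore"), (7.0, "wwww")]
weighted = weighted_comment_series(sample, duration=10.0)
```

Keeping the dictionary and weights as data rather than code makes it straightforward to swap in a different rule set for each video sharing site.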

FIG. 6 shows an example of the features extracted in this way. For one material video, it plots the time series of the visual feature (motion vector magnitude), the acoustic feature (volume), and the comment-count feature (number of comments). Each feature in FIG. 6 is one-dimensional, but multidimensional features may also be extracted: for example, the visual feature as a two-dimensional feature made up of the motion vector components, the acoustic feature as a multidimensional feature holding the volume in each predetermined frequency band, and the comment-count feature as a multidimensional feature holding the comment count under each predetermined filter.

When feature time series are extracted from the material videos, the time information of all material videos to be combined must be synchronized on the basis of the playback time of the song recording within each video. Every material video contains a span in which the same recording plays in full from beginning to end, so the videos can be synchronized by shifting each video's time little by little and computing the cross-correlation coefficient against the acoustic feature time series of the recording.

Material videos rarely contain prominent sounds other than the instrument being played and the song recording. The cross-correlation coefficient may therefore be computed after removing the frequency band of the played instrument from both the recording and the material video audio. The search-range restriction described later may also be applied.

This synchronization is explained with reference to FIG. 7, which shows the case of two material videos; the same applies to three or more. Part (1) shows material video A (for example, a guitar performance video) and material video B (for example, a bass performance video) before synchronization, and part (2) shows them after synchronization as material videos A' and B'.

As part (1) shows, with C as the time axis, material videos A and B consist of sections A1 and B1 before the performance begins, sections A2 and B2 in which the performer plays along with the recording of the predetermined song, and sections A3 and B3 after the performance ends. In general, A1 and B1 differ in length, as do A3 and B3, depending on how the user who prepared each video recorded and edited it. Sections A2 and B2, however, are played along with the same recording (assumed to be, for example, a track from a CD) and therefore have the same length.

If material videos A and B were simply played together in the state of part (1), sections A2 and B2 would be out of step, mainly because A1 and B1 differ in length, and the result would not work as an ensemble video. The time axes are therefore shifted at the feature extraction stage so that A2 and B2 play simultaneously, as in part (2). C' is the corrected time axis, and it is shared by all processing after feature extraction: it is the time axis of the success degree time series and of the ensemble video.

The shift correction only requires the start times of sections A2 and B2 within material videos A and B, which are obtained by a method such as the cross-correlation approach described above. Provided that the entire recording plays within the material video, the start of the performance section can be no later than the total duration of the material video minus the duration of the recording. This constraint may be used to reduce the start-time search.
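A minimal sketch of the bounded start-time search follows. It assumes that the acoustic feature of each video and of the recording has already been sampled at the same rate, uses the Pearson correlation coefficient as the cross-correlation measure, and checks every candidate start exhaustively; the function names and toy data are illustrative only.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def find_performance_start(video_feat, song_feat):
    """Index in video_feat where the song recording most likely starts.

    The search is limited to starts no later than
    len(video_feat) - len(song_feat), since the whole song must fit.
    """
    max_start = len(video_feat) - len(song_feat)
    if max_start < 0:
        raise ValueError("video is shorter than the song recording")
    n = len(song_feat)
    return max(range(max_start + 1),
               key=lambda s: pearson(video_feat[s:s + n], song_feat))


song = [0, 1, 3, 2, 5, 1]                # toy volume series of the recording
video_a = [0, 0, 0] + song + [0, 0]      # performance starts at index 3
video_b = [0] + song + [0, 0, 0, 0]      # performance starts at index 1
start_a = find_performance_start(video_a, song)
start_b = find_performance_start(video_b, song)
shift = start_a - start_b                # delay B by this many samples to align with A
```

For long recordings an FFT-based correlation would be faster than this exhaustive loop, at the cost of an extra dependency.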

Given the purpose of an ensemble video, the success degree basically needs to be calculated only while the performance is in progress. Feature extraction, and the highlight processing described later, may therefore be limited to the interval t_a to t_b shown in part (2), in which the corrected videos play simultaneously. When the ensemble video is presented, sections such as A0 and B4, which lie outside the performance section and have no footage for the time at which another performance video is playing, may show the nearest available frame as a still image or a separate predetermined image. Alternatively, the ensemble video itself may be created only for the interval t_a to t_b.

Next, an example of the success degree model construction performed by functional block group 3 of FIG. 1 in step S21 of FIG. 2 is described. Based on learning data prepared in advance and stored in the learning data storage unit 31, this process builds a model for calculating how exciting each moment of a material video is. The learning data consists of performance videos of instruments that may appear in an ensemble video, annotated with highlight positions. For example, several subjects watch the performance videos and mark the points they perceive as highlights, and the collected information (performance video plus highlight time information) is used as learning data.

Model construction methods include, for example, multiple regression analysis and support vector machines (SVMs). The example here uses multiple regression analysis, with the success degree as the objective variable and the visual, acoustic, and comment-count features of the performance video at time t as the explanatory variables. Naturally, the features of the learning videos are extracted in advance in the same way that the material video feature extraction unit 22 extracts them from performance videos when an ensemble video is created.

Equation (1) gives the formula for calculating the success degree by multiple regression analysis.

Score(m, t) = α_v·x_{v,t} + α_a·x_{a,t} + α_c·x_{c,t} + α_i    (1)

Here Score(m, t) is the success degree of performance video m at time t; x_{v,t}, x_{a,t}, and x_{c,t} are the visual feature, the acoustic feature, and the comment-count feature of performance video m at time t; α_v, α_a, and α_c are the weight coefficients for the respective features; and α_i is the intercept.
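Evaluating the model over a whole video can be sketched as below. The sketch assumes that equation (1) is the linear combination implied by these definitions (one weight per feature plus the intercept α_i) and that the three feature series share a synchronized time axis; the function name and the numeric weights are made up for illustration.

```python
def success_series(x_v, x_a, x_c, alpha):
    """Success degree at each time step from three one-dimensional feature series.

    alpha: dict with weights 'v', 'a', 'c' and intercept 'i'.
    """
    return [
        alpha["v"] * v + alpha["a"] * a + alpha["c"] * c + alpha["i"]
        for v, a, c in zip(x_v, x_a, x_c)
    ]


weights = {"v": 0.5, "a": 0.25, "c": 0.1, "i": 0.05}   # illustrative values only
scores = success_series(x_v=[0.0, 2.0], x_a=[4.0, 0.0], x_c=[0.0, 10.0], alpha=weights)
```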

The model construction process calculates the weight coefficients α above. As the objective variable for learning, let N be the number of subjects who contributed to the learning data and n_t the number of subjects who selected time t in the performance video as a highlight; the value n_t/N, that is, the proportion of subjects who judged time t to be a highlight, can then be used.
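As one possible way to obtain the weights, the sketch below fits them by ordinary least squares using the normal equations, with n_t/N as the target. The solver, the function names, and the toy data are assumptions; a numerical library would normally be used instead, and the code assumes the feature data is not degenerate (the normal-equation matrix must be invertible).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features, votes, n_subjects):
    """Least-squares weights [alpha_v, alpha_a, alpha_c, alpha_i].

    features: one (x_v, x_a, x_c) tuple per sampled time t.
    votes: n_t for each sampled time; the target is n_t / n_subjects.
    """
    x = [list(f) + [1.0] for f in features]       # trailing 1.0 is the intercept column
    y = [n / n_subjects for n in votes]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Toy data generated from weights (0.1, 0.2, 0.05) and intercept 0.1 with N = 20.
feats = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
n_t = [2, 4, 6, 3, 9, 10]
alphas = fit_weights(feats, n_t, n_subjects=20)
```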

The success degree of a newly input material video can then be calculated by the material video success degree calculation unit 23 from the learned model (equation (1)) and the feature time series obtained by the material video feature extraction unit 22. The result is a success degree time series describing how the success degree changes over the course of the material video.

When not all three of the visual, acoustic, and comment-count features are used, equation (1) is modified to keep only the terms for the features in use. When a feature is multidimensional (for example, n-dimensional), equation (1) is modified to provide n weight coefficients α for that feature.

Next, the ensemble video synthesis performed by the video synthesis unit 24 in step S23 of FIG. 2 is described. To synthesize the ensemble video, the success degrees of the performance videos are compared along the time axis of the whole ensemble video and, for example, a performance video with a high success degree at some time is selected as the highlight target for a predetermined period thereafter. FIG. 8 shows what the resulting ensemble video looks like.

FIG. 8 shows an example ensemble video synthesized from material videos of four instruments. Part (1), on the left, shows the state in which no performance video is highlighted: the screen is divided into four equal sub-regions, each playing one performance video, with the guitar at the upper left, the piano at the upper right, the drums at the lower left, and the violin at the lower right. The ensemble video is formed by synchronizing the videos on the song recording and playing the picture and sound of every material video simultaneously.

Part (2), on the right of FIG. 8, shows the ensemble video after it has changed from the unhighlighted state of (1) to one in which the guitar performance video at the upper left is highlighted. The ensemble video produced by the video synthesis unit 24 can thus take a form in which each material video is emphasized when its highlight timing arrives.

Highlighting is performed, for example, as follows. Given three material videos (m_x, m_y, m_z), the success degree of each at time t can be calculated with equation (1): Score(m_x, t), Score(m_y, t), and Score(m_z, t). The performance video to highlight at time t is selected from these values and from comparisons among them.

In one embodiment, the performance video with the highest success degree at each time t is selected as the highlight target, and the target switches as time passes.
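This per-time selection is an argmax over the videos. The sketch below assumes the success degree series are already on the common time axis; the function name and sample values are illustrative, and ties go to the video listed first.

```python
def highlight_by_argmax(scores):
    """For each time step, name the video with the highest success degree.

    scores: dict mapping a video name to its success degree time series.
    """
    names = list(scores)
    length = min(len(series) for series in scores.values())
    return [max(names, key=lambda name: scores[name][t]) for t in range(length)]


targets = highlight_by_argmax({
    "guitar": [0.9, 0.2, 0.1],
    "bass":   [0.1, 0.8, 0.3],
    "drums":  [0.2, 0.1, 0.7],
})
```

Switching on every sample can make the highlight flicker between videos, which is one motivation for the threshold-and-hold variant.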

In another embodiment, for each performance video, the time t at which its success degree exceeds a predetermined threshold is taken as its highlight start timing. From time t onward, for a preset fixed period, the ensemble video is synthesized with the selected performance video as the highlight target. While one performance video is highlighted, other performance videos may be kept from becoming highlight targets even if their success degree meets the start condition. Conversely, every performance video that meets the start condition may be highlighted for the predetermined period.

In the embodiment that determines a highlight start timing t, if the success degree remains above the predetermined threshold after time t, the length of time it remains there may be added to the highlight duration.
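The threshold-based variant, including the optional extension, could be sketched for a single video as follows. Times are sample indices, `hold` plays the role of the fixed highlight period, and the extension is interpreted here as the number of consecutive samples after the start that stay above the threshold; that interpretation and the function name are assumptions.

```python
def highlight_intervals(score, threshold, hold, extend=True):
    """Return (start, end) index pairs, end exclusive, for one video.

    A highlight starts when score exceeds threshold and lasts `hold` samples,
    plus (if extend) the run of samples after the start that stay above it.
    """
    if hold < 1:
        raise ValueError("hold must be at least 1 sample")
    intervals = []
    n = len(score)
    t = 0
    while t < n:
        if score[t] <= threshold:
            t += 1
            continue
        start = t
        extra = 0
        if extend:
            while start + extra + 1 < n and score[start + extra + 1] > threshold:
                extra += 1
        end = min(n, start + hold + extra)
        intervals.append((start, end))
        t = end  # look for the next start only after this highlight ends
    return intervals


series = [0.1, 0.6, 0.7, 0.2, 0.1, 0.1, 0.1, 0.9, 0.1]
extended = highlight_intervals(series, threshold=0.5, hold=2)
fixed = highlight_intervals(series, threshold=0.5, hold=2, extend=False)
```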

In each embodiment, while the success degrees of all performance videos are low and below the predetermined threshold, an ensemble video without highlighting can be synthesized, as in part (1) on the left of FIG. 8.

FIG. 9 shows an example of how highlight targets are selected and change. In this example, the time at which the success degree exceeds a predetermined threshold is the highlight start timing, and highlighting continues for a predetermined period. Parts (1), (2), and (3) show the success degree time series {Score(m_i, t) | i = x, y, z} of the guitar performance video (m_x), the bass performance video (m_y), and the drum performance video (m_z), respectively, and part (4) shows how the highlight target changes in the ensemble video synthesized from the three. As (1) to (3) show, the success degree exceeds the threshold at time t₁ for the guitar video, at t₂ for the bass video, and at t₃ for the drum video, and each video reaches its highlight start timing at that time.

As parts (1) to (4) of FIG. 9 show, drawn with their time axes aligned, in the ensemble video the guitar performance video is highlighted in the interval t₁ to t₁+T, the bass performance video in t₂ to t₂+T, and the drum performance video in t₃ to t₃+T; these highlight intervals are hatched in (4). T is the predetermined length of time for which highlighting is displayed. The unhatched intervals in (4) have no highlight target.

FIG. 9 shows a case in which the highlight intervals of the performance videos do not overlap. When they do overlap, each performance video may be highlighted independently, as described above, or no new highlight target may be set while an earlier highlight is still being displayed. The threshold for highlight selection and the highlight duration T may be common to all performance videos or set separately for each.

This concludes the description of highlight target selection in the ensemble video synthesis process. Next, the various highlighting styles, and the arrangement of material videos that they presuppose, are described. The arrangement of the material videos when nothing is highlighted is called the basic arrangement. As in part (1) of FIG. 8, for example, the material videos can be placed in predetermined rectangular regions, one for each material video used.
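One way to generate such a basic arrangement is a near-square grid of equal rectangles. The grid rule, the `(x, y, width, height)` pixel convention, and the function name below are assumptions made for illustration; the described system only requires one predetermined rectangle per material video.

```python
import math


def basic_layout(n_videos, width, height):
    """Return one (x, y, w, h) rectangle per video on a near-square grid."""
    if n_videos < 1:
        raise ValueError("need at least one video")
    cols = math.ceil(math.sqrt(n_videos))
    rows = math.ceil(n_videos / cols)
    w, h = width // cols, height // rows
    return [((i % cols) * w, (i // cols) * h, w, h) for i in range(n_videos)]


quad = basic_layout(4, 640, 480)   # the 2 x 2 split of FIG. 8 (1)
trio = basic_layout(3, 640, 480)   # three videos leave one grid cell empty
```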

FIG. 10 shows examples of a basic arrangement and of highlighting. Part (a) is an example basic arrangement of three material videos P1, P2, and P3. Unlike part (1) of FIG. 8, there may be gaps between the material videos, as (a) shows. The basic arrangement may move as the ensemble video plays, but parts (b1), (b2), (c), (d), (e), and (f) show highlighting applied to material video P1 for the stationary arrangement of (a).

(b1) and (b2) show highlighting in the style of part (2) of FIG. 8. When P1 becomes the highlight target, it is enlarged gradually as in (b1) until it fills the whole ensemble video screen as in (b2). When the highlight ends, the display may likewise return gradually to (a) by way of a state like (b1).

(c) is a style in which P1 is enlarged but not far enough to cover the regions of P2 and P3. (d) is a style in which the area around P1 changes to an emphasis color, blinks, or the like. (e) is a style that emphasizes P1 by applying blurring or similar processing to P2 and P3. (f) is a style that displays several copies of P1 at once and hides P2 and P3. In (c) through (f), motion effects or the like may additionally be applied to the highlighted P1 region. A different style may be applied each time a highlight target is selected, and the style may be chosen according to the success degree of the highlight target or the instrument being highlighted.

Beyond these, many settings for the basic arrangement and highlighting are possible, and they can be provided in the video synthesis system 10 as templates. When several templates exist, the user may specify which to use through the user search request reception unit 11.

The ensemble video may also be created with highlight processing omitted entirely, and this choice may be accepted from the user through the user search request reception unit 11.

FIG. 11 is a functional block diagram of an embodiment of the video synthesis system 10 of the present invention that does not use the search function. The description focuses on the differences from the embodiment of FIG. 1. In place of the user search request reception unit 11 and the material video search unit 21 of FIG. 1, this embodiment has a user request reception unit 110 and a video preparation unit 210; the flow of processing is the same, and the other functional blocks are shared.

In this embodiment the user supplies the material videos, and the video synthesis system 10 automatically synthesizes an ensemble video with highlight processing applied. The user prepares a predetermined number of performance videos of the instruments of a predetermined song, for example by searching for them personally, and inputs them together with a creation request (creation command) through the user request reception unit 110. The input performance videos are stored in the video preparation unit 210, and from the material video feature extraction unit 22 onward the processing is exactly the same as in the embodiment of FIG. 1.

As is clear from the purpose of this embodiment, step S22 of FIG. 2, steps S31 and S32 of FIG. 3, and the like are replaced as appropriate by the user's own work. In this embodiment, an ensemble moving image can be synthesized automatically after the user has prepared the desired material moving images.
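The relationship between the two embodiments can be illustrated with a minimal Python sketch. It is not the implementation described in this specification: the function names, the `Video` type, and the `highlight` flag are hypothetical, and the downstream processing of units 22 to 24 is reduced to a placeholder. The sketch only shows that the embodiments differ in how the material moving images are obtained, while the later processing is shared.

```python
from typing import Callable, List, Optional, Sequence

Video = str  # hypothetical stand-in: a video is identified by a path or URL


def prepare_material_videos(
    user_videos: Optional[Sequence[Video]] = None,
    search: Optional[Callable[[str], List[Video]]] = None,
    query: Optional[str] = None,
) -> List[Video]:
    """Return material videos from user input (FIG. 11) or a search (FIG. 1)."""
    if user_videos is not None:
        # FIG. 11 path: the user supplies the videos, so no search is run.
        return list(user_videos)
    if search is None or query is None:
        raise ValueError("either user_videos or both search and query are required")
    # FIG. 1 path: a search function obtains the videos from an external source.
    return search(query)


def synthesize(materials: List[Video], highlight: bool = True) -> dict:
    """Placeholder for units 22 to 24; it only records what would be composited."""
    return {"materials": list(materials), "highlight": highlight}


# Usage: both entry points feed the same downstream function.
prepared = prepare_material_videos(user_videos=["guitar.mp4", "drums.mp4"])
result = synthesize(prepared, highlight=False)
```

Keeping the acquisition step behind a single function is one way to express that the remaining functional blocks are common to both embodiments.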

FIG. 12 is a functional block diagram showing an example of the configuration of the main part of a computer 50 that can function as the moving image synthesis system 10 of the present invention. Its main components are: a ROM 52 storing basic programs, including an operating system (OS), and various basic data; a hard disk drive (HDD) 57 storing various programs and data; a media drive device 56 that reads programs and data from a storage medium 61 such as a CD-ROM or DVD; a CPU 51 that executes the programs; a RAM 53 that provides a work area for the CPU 51; a display 58, a keyboard 59, and a pointing device 60 such as a mouse, connected via an input/output interface (I/F) 55; and a parallel/serial I/F 54 that communicates with external devices.

In the configuration of FIG. 12, the moving image synthesis program according to the present invention is input from the parallel/serial I/F 54 via a network or the like, or is read by the media drive device 56, and is stored in advance in the HDD 57. When it is read by the media drive device 56, the moving image synthesis program according to the present invention is stored in advance on the storage medium 61 and installed on the HDD 57.

In such a configuration, when a user making a search request uses the computer 50 as a moving image synthesis server via a network or the like, an administrator or the like starts the moving image synthesis program on the computer 50 in advance, for example by using the keyboard 59 and the pointing device 60 such as a mouse, and leaves it waiting for search requests. When a user search request is received via the parallel/serial I/F 54, the CPU 51 executes the moving image synthesis program, causing the computer 50 to function as the moving image synthesis system 10 shown in FIG. 1, and the steps shown in FIGS. 2 and 3 are executed. The synthesized ensemble moving image is transmitted to the user via the parallel/serial I/F 54.

Alternatively, when the user uses the computer 50 as their own terminal, the user starts the moving image synthesis program and inputs a search request, for example by using the keyboard 59 and the pointing device 60 such as a mouse, whereupon the ensemble moving image is synthesized and stored in the HDD 57. In this case, the user can view the synthesized moving image on the display 58.

Furthermore, in the same manner as above, a plurality of computers 50 such as that of FIG. 12 may be prepared, one for each functional block or each group of functional blocks of the moving image synthesis system 10 shown in FIG. 1, each executing the corresponding function, and the moving image synthesis system 10 may be realized by having these computers 50 communicate with one another over a network via the parallel/serial I/F 54.

Furthermore, when the moving image synthesis program according to the present invention described above is used, the success degree model construction processing in step S21 of FIG. 2 may be included in the execution processing of the moving image synthesis program, with the evaluation data from the subjects received via the parallel/serial I/F 54 or the like. Alternatively, the success degree model construction processing in step S21 of FIG. 2 may be excluded from the execution processing of the moving image synthesis program, and the parameters and the like obtained as a result of performing the same processing in advance may be provided as constants that are included in the moving image synthesis program and referenced in step S35.
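The two options can be sketched in Python under stated assumptions. The sketch assumes the model is a linear multiple regression with an intercept, fits it by solving the least-squares normal equations, and uses hypothetical names (`fit_weights`, `success_degree`, `PRECOMPUTED_WEIGHTS`) and illustrative numbers that do not come from this specification. The first option fits the weights when the program runs. The second embeds previously obtained weights as constants.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features: Sequence[Sequence[float]],
                ratings: Sequence[float]) -> List[float]:
    """Option 1: least-squares weights (intercept first) from learning data."""
    rows = [[1.0, *f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve_linear(xtx, xty)


def success_degree(features: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Apply the weights to each time step's feature vector."""
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], f))
            for f in features]


# Option 2: weights obtained in advance and embedded as constants.
# Illustrative values only; they are not taken from this specification.
PRECOMPUTED_WEIGHTS = [0.5, 2.0, -1.0]

# Toy learning data generated from the same illustrative weights.
train_x = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
train_y = [0.5, 2.5, -0.5, 1.5, 3.5]
fitted = fit_weights(train_x, train_y)
```

With the constants embedded, the program needs neither the learning data nor the fitting code at run time, at the cost of having to rebuild the program whenever the model is retrained.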

The present invention has been described, as one preferred embodiment, as creating an ensemble moving image for which the instruments to be used are designated. More generally, on the premise that material moving images linked to the sound source of a predetermined piece of music are used, the following embodiment is also possible. That is, moving images in which each part in the sound source of a predetermined piece of music is performed, including not only instruments but also vocals, dance, and the like, can be used as material moving images, and a co-performance moving image combining them can be created. In this case, in the above description, "instrument" may be read as "part", "performance moving image" as "moving image in which a part is performed", and "ensemble moving image" as "co-performance moving image".

10 ... moving image synthesis system, 21 ... material moving image search unit, 22 ... material moving image feature extraction unit, 23 ... material moving image success degree calculation unit, 24 ... moving image synthesis unit

Claims

1. A moving image synthesis system comprising:
a material moving image search unit that searches for and obtains, from an external database, a predetermined number of items of moving image content that match a user's search request;
a material moving image feature extraction unit that extracts a feature amount time series from a predetermined number of material moving images linked to a sound source of a predetermined piece of music;
a success degree calculation unit that calculates a success degree time series of the material moving images from the feature amount time series using a given relationship; and
a moving image synthesis unit that synthesizes, based on the success degree time series, a co-performance moving image combining the predetermined number of material moving images.

2. The moving image synthesis system according to claim 1, further comprising a material moving image search unit that receives, as a search request from a user, a search request designating the sound source of the predetermined piece of music, searches for and obtains from an external database a predetermined number of items of moving image content that match the search request, and sets them as the predetermined number of material moving images.

4. The moving image synthesis system according to claim 2 or 3, wherein the material moving image search unit searches for and obtains moving image content in which each of a predetermined number of parts in the sound source of the music is performed, and sets that content as the predetermined number of material moving images.
3. The moving image synthesis system according to claim 1, wherein the material moving images are moving images in which a predetermined number of parts in the sound source of the music are performed.

4. The moving image synthesis system according to claim 3, wherein the parts include a musical instrument part.

5. The moving image synthesis system according to any one of the above claims, wherein the material moving image feature extraction unit shift-corrects the time axis of each of the material moving images based on an acoustic feature amount time series of the sound source of the predetermined piece of music, so that the linked sound source of the music progresses simultaneously in each of the material moving images, and then extracts the feature amount time series, and wherein the shift-corrected time axes are also used for the success degree time series and the co-performance moving image.

6. The moving image synthesis system according to claim 1, wherein the feature amount time series includes at least one of a visual feature amount time series of the material moving images, an acoustic feature amount time series of the material moving images, and a comment count feature amount time series based on comments attached to the material moving images in association with elapsed playback time.

7. The moving image synthesis system according to claim 1, wherein the given relationship is given by a learning model that uses learning data in which a predetermined time series is associated with a success degree time series.

8. The moving image synthesis system according to claim 7, wherein the learning model is constructed using multiple regression analysis.

9. The moving image synthesis system according to claim 1, wherein the moving image synthesis unit synthesizes the co-performance moving image by applying highlight processing, along the time series of the co-performance moving image, to those material moving images that satisfy a predetermined condition.

10. A moving image synthesis method comprising:
a material moving image search step of searching for and obtaining, from an external database, a predetermined number of items of moving image content that match a user's search request;
a material moving image feature extraction step of extracting a feature amount time series from a predetermined number of material moving images linked to a sound source of a predetermined piece of music;
a success degree calculation step of calculating a success degree time series of the material moving images from the feature amount time series using a given relationship; and
a moving image synthesis step of synthesizing, based on the success degree time series, a co-performance moving image combining the predetermined number of material moving images.

11. A moving image synthesis program for causing a computer to execute the moving image synthesis method according to claim 10.

12. A storage medium on which the moving image synthesis program according to claim 11 is recorded so as to be readable by a computer.