JP2012137560A

JP2012137560A - Karaoke device and control method and control program for karaoke device

Info

Publication number: JP2012137560A
Application number: JP2010288843A
Authority: JP
Inventors: Sukenori Kaneko; 祐紀金子; Midori Nakamae; 碧中前; Kazuyo Kuroda; 和代黒田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2010-12-24
Filing date: 2010-12-24
Publication date: 2012-07-19

Abstract

PROBLEM TO BE SOLVED: To easily use image data, such as uploaded photograph data, as a background dynamic image for karaoke. SOLUTION: Based on an analysis result of two or more uploaded still pictures, a karaoke playing terminal 13 selects an effect setting used for generating a dynamic image that contains at least some of the still pictures. The terminal 13 then generates the dynamic image with the effects constituting the effect setting applied, and reproduces the generated dynamic image in synchronization with the reproduction of a karaoke musical piece.

Description

Embodiments of the present invention relate to a karaoke apparatus, a control method for a karaoke apparatus, and a control program.

Conventionally, a karaoke apparatus processes karaoke music data, outputs the karaoke music as accompaniment through an audio system such as speakers, and outputs a lyrics image to a display in synchronization with the karaoke music.

In parallel with this, karaoke apparatuses are known that process video data stored on a video CD or the like, play back a background video, and superimpose the lyrics image on that background video.
It has also been proposed to display image data input from an external source as the background video of the lyrics image while a karaoke song is being performed.
Apparatuses having a function of applying effects to still images and displaying them are also known.

Japanese Patent Laid-Open No. 11-259079

When a user brings in image data to be displayed as the background video of the lyrics image, the effects to be applied to the still images could, for example, be decided by the user.

However, deciding an effect for each still image is tedious for the user. A karaoke apparatus in particular exists to play karaoke songs for the user to sing, so users are unlikely to spend time and effort editing a background moving image.

Accordingly, an object of the present invention is to provide a karaoke apparatus, a control method for a karaoke apparatus, and a control program that generate a moving image allowing imported image data, such as photograph data, to be used easily as a background moving image for karaoke.

The karaoke apparatus of the embodiment includes analysis means for analyzing a plurality of still images. Based on the analysis result, effect setting selection means selects an effect setting used to generate a moving image containing at least some of the plurality of still images.

Moving image generation means then uses the selected effect setting to generate a moving image to which the effects constituting that effect setting have been applied. Karaoke playback means plays back the karaoke song and, in synchronization with that playback, plays back the generated moving image.

FIG. 1 is an explanatory diagram of the schematic configuration of a communication karaoke system including a karaoke apparatus according to an embodiment.
FIG. 2 is a block diagram of the karaoke performance terminal.
FIG. 3 is an explanatory diagram of the functional configuration of the main part of the karaoke apparatus.
FIG. 4 is an explanatory diagram of the structure of the material information.
FIG. 5 is an explanatory diagram of a configuration example of the analysis information.
FIG. 6 is a diagram illustrating an example of effects determined based on the smile level and the number of people.
FIG. 7 is an explanatory diagram of specific effect examples corresponding to each of the effect collections.
FIG. 8 is a flowchart of the karaoke playback process.
FIG. 9 is a flowchart of the material analysis process.

Next, embodiments will be described with reference to the drawings.
FIG. 1 is an explanatory diagram of the schematic configuration of a communication karaoke system including a karaoke apparatus according to an embodiment.
The communication karaoke system 10 includes a karaoke host 11 having a karaoke database (not shown) that stores karaoke music data and the like; a plurality of karaoke performance terminals 13 connected to the karaoke host 11 via a communication network 12 such as the Internet or a VPN; and a plurality of user operation terminals 14 connected to each karaoke performance terminal 13 via a wireless communication network.

FIG. 2 is a block diagram of the karaoke performance terminal.
The karaoke performance terminal 13 includes a controller 101 that controls the entire karaoke performance terminal 13; a user interface 102 that accepts the user's operation input to the karaoke performance terminal 13, either directly or indirectly via the user operation terminal 14, and accepts data input from a user-owned USB device, memory card, or the like; and a hard disk drive (HDD) 103 that stores various data and databases.

The karaoke performance terminal 13 also includes a communication interface (I/F) 104 that communicates with the karaoke host 11 via the communication network 12; an optical disk drive 105 that records to and plays back optical disks such as CDs and DVDs; and a display controller 108 that presents various displays on a display 107 based on display image data stored in a VRAM 106.

The karaoke performance terminal 13 further includes a sound controller 111 that superimposes the input audio from microphones 109A and 109B on the karaoke audio signal corresponding to the karaoke audio data supplied from the controller side and outputs the result to a speaker 110, and a camera 112 that captures various images.

In the above configuration, the controller 101 includes a CPU 121 that controls the entire controller 101, a ROM 122 that stores various control programs in a nonvolatile manner, and a RAM 123 that temporarily stores various data and serves as a working area.

The user I/F 102 includes an operation panel 125 on which operators (not shown) used by the user for various operations are arranged; a USB controller 127 that controls an external USB device connected via a USB connector 126; a card controller 129 that controls an external memory card connected via a card connector 128; and a remote control interface (I/F) 130 through which remote operation is performed by wireless communication from the user operation terminal 14.

FIG. 3 is an explanatory diagram of the functional configuration of the main part of the karaoke apparatus.
Of the functions of the moving image playback application program 202, the functional configuration that implements the moving image generation function is described here.

This moving image generation function can be applied not only to material data 51 stored from an external device (USB memory, memory card, etc.) via the user I/F 102 (the USB controller 127, card controller 129, etc. described above), but also to material data 51 stored in a predetermined directory on the HDD 103 and to material data 51 stored via the communication interface 104 and the communication network 12.

Taking material data 51 stored in a predetermined directory on the HDD 103 as an example, the material data 51 consists of still image data 301A, audio data 301B, moving image data 301C, and the like.

The moving image playback application program 202 is loaded into the RAM 123 of the controller 101 and, viewed functionally, includes a material input unit 21, a material analysis unit 22, and a moving image playback unit 23.

When material data 51 is input via the user I/F 102 (the USB controller 127, the card controller 129, or the like), the material input unit 21 stores the material data 51 in a material database 301 that forms part of a database 131 on the HDD 103. The material database 301 is a database for storing the material data 51 used in the generated moving image.

Specifically, the material database 301 stores still image data 301A, audio data 301B, moving image data 301C, and the like as material data 51. The material data 51 stored in the material database 301 is used as candidate material for the moving image to be generated.

The material input unit 21 also notifies the material analysis unit 22 that the material data 51 has been stored on the HDD 103.

In response to the notification from the material input unit 21, the material analysis unit 22 starts analyzing the material data 51.
The following description covers the case where photograph data is input as the material data 51 to be analyzed. The analysis outputs, as its result, the facial expressions (smiles in particular) and the number of people contained in the photograph data.

Broadly, the material analysis unit 22 includes a face image detection unit 221, a facial expression detection unit 222, and a people count detection unit 223. The following description assumes that the material data 51 to be analyzed is still image data 301A.

The face image detection unit 221 executes face detection processing that detects face images in the still image data 301A. A face image can be detected, for example, by analyzing the features of the still image data 301A and searching for regions whose features resemble a face image feature sample prepared in advance. The face image feature sample is feature data extracted by statistically processing the face image features of a large number of people.

Executing the face detection processing yields the position (coordinates), size, frontality, and so on of each face image contained in the still image data 301A.

The face image detection unit 221 further classifies the face images detected in the still image data 301A into groups, one group per set of face images presumed to belong to the same person.

The face image detection unit 221 may also identify the person corresponding to a detected face image. In that case, the face image detection unit 221 uses, for example, a face image feature sample of the person to be identified to determine whether the detected face image is that person. Based on these results, the face image detection unit 221 assigns a per-person face ID to each detected face image. The face image detection unit 221 outputs the information on the detected face images (the face images themselves and the classification results) to the facial expression detection unit 222 and the people count detection unit 223.

On receiving the face image information, the facial expression detection unit 222 detects the facial expression corresponding to each face image detected by the face image detection unit 221. The facial expression detection unit 222 then calculates a likelihood, that is, the degree to which the face image plausibly shows the detected expression.

In the present embodiment, the facial expression detection unit 222 determines whether the facial expression corresponding to a detected face image is a "smile". Specifically, the facial expression detection unit 222 judges, for example, a face image whose features resemble a "smile" face image feature sample to be a "smile".

When the facial expression detection unit 222 determines that the expression corresponding to a face image is a "smile", it calculates, as the smile level, the likelihood that the face image is a smile. If a plurality of face images have been detected in one piece of still image data 301A, the facial expression detection unit 222 takes, for example, the average of the smile levels of those face images as the smile level of the still image data 301A.

The smile level is not limited to a numerical value and may be expressed as a relative index such as "high" or "low". When the smile level is expressed as a relative index and a plurality of face images have been detected in one piece of still image data 301A, the facial expression detection unit 222 adopts, for example, the index assigned to the larger number of face images (for example, "high") as the smile level of the still image data 301A.
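The two per-image rules described above (averaging numeric smile levels, or taking the label assigned to the most faces) could be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and how ties or images without faces are handled is an assumption, since the text does not specify it.

```python
from collections import Counter
from statistics import mean


def image_smile_level(face_smile_levels):
    """Average the numeric smile levels of all faces detected in one image.

    Returns None when no face was detected (an assumption; the text does
    not say what happens in that case).
    """
    if not face_smile_levels:
        return None
    return mean(face_smile_levels)


def image_smile_label(face_labels):
    """Return the relative label ("high" or "low") assigned to the most faces.

    On a tie, the label that appears first wins (an arbitrary choice made
    for this sketch).
    """
    if not face_labels:
        return None
    return Counter(face_labels).most_common(1)[0][0]
```

For example, an image whose faces are labelled "high", "low", "high" would be labelled "high" as a whole.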

For simplicity, the following description of the present embodiment uses only the smile level as an example. However, the facial expression detection unit 222 is not limited to smiles and may calculate the likelihood of any expression, such as an angry face, a crying face, a surprised face, or a blank expression.

The people count detection unit 223, meanwhile, detects the number of people contained in the still image data 301A. For example, the people count detection unit 223 takes the number of face images detected by the face image detection unit 221 as the number of people contained in the still image data 301A. The people count detection unit 223 may also calculate a count that includes people captured from behind and the like, for example by detecting the whole body or a part of the body of each person, including the face image.

The number of people is not limited to a numerical value and may be expressed as a relative index such as "many" or "few". For example, when a number of face images equal to or greater than a threshold has been detected in the still image data 301A, the people count detection unit 223 sets the number of people for the still image data 301A to "many".
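The threshold rule for the relative people index could look like the sketch below. The function name and the default threshold of 3 are hypothetical; the text states only that a threshold exists, not its value.

```python
def people_label(face_count, threshold=3):
    """Map a detected face count to the relative index "many" or "few".

    `threshold` is an illustrative value, not one given in the text.
    """
    return "many" if face_count >= threshold else "few"
```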

The material analysis unit 22 stores material information 302A attached to the material data 51 (described later) and analysis information 302B generated by its analysis in a material information database 302 on the HDD 103.

FIG. 4 is an explanatory diagram of the structure of the material information.
The material information 302A includes information indicating a material ID, file path, file size, file format, generation date and time, generation location, type, image size, playback time, and input route.

The "material ID" is identification information uniquely assigned to identify the material data 51.
The "file path" indicates where the material data 51 is stored on the HDD 103.
The "file size" indicates the data size of the material data 51.
The "file format" indicates the data format of the material data 51 (for example, mpeg or wma format for moving images, jpeg or bmp format for still images, and mp3 or wav format for audio).
The "generation date and time" indicates when the material data 51 was generated (for example, November 10, 2010).
The "generation location" indicates position information for the place where the material data 51 was generated (for example, longitude and movement information obtained by GPS positioning).
The "type" indicates the kind of content in the material data 51 (for example, still image, audio, or moving image).
The "image size" indicates the displayed image size (for example, 1024 x 768 pixels) when the material data 51 corresponds to still image data 301A or moving image data 301C.
The "playback time" indicates the playback duration at normal speed when the material data 51 corresponds to audio data 301B or moving image data 301C.
The "input route" indicates the route by which the material data 51 was input to the karaoke performance terminal 13 (for example, an external storage medium, an external storage device, or a server on a network).
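As a rough illustration, the material information record described above could be modelled as a simple data structure. The field names, types, and sample values below are assumptions chosen for this sketch; the text lists only what each item means, not how it is stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class MaterialInfo:
    """Hypothetical in-memory form of one material information 302A record."""
    material_id: str                              # unique ID of the material data
    file_path: str                                # storage location on the HDD
    file_size: int                                # data size in bytes (unit assumed)
    file_format: str                              # e.g. "jpeg", "mp3"
    created_at: Optional[datetime]                # generation date and time
    created_location: Optional[str]               # position information, if any
    kind: str                                     # "still_image", "audio", "moving_image"
    image_size: Optional[Tuple[int, int]] = None  # still or moving images only
    playback_time: Optional[float] = None         # seconds; audio or moving images only
    input_route: str = "unknown"                  # e.g. "external_storage_medium"


# Example record for a photograph (all values are made up).
photo = MaterialInfo(
    material_id="m0001",
    file_path="/materials/m0001.jpg",
    file_size=2_400_000,
    file_format="jpeg",
    created_at=datetime(2010, 11, 10),
    created_location=None,
    kind="still_image",
    image_size=(1024, 768),
    input_route="external_storage_medium",
)
```

Leaving `image_size` and `playback_time` optional mirrors the text, where each applies only to certain material types.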

FIG. 5 is an explanatory diagram of a configuration example of the analysis information.
As shown in FIG. 5, the analysis information 302B includes, for example, the material ID, smile level, number of people, and face image information described above.

The face image information is information based on the analysis result of the face detection processing described above. It therefore includes, for example, information indicating the face image, its size, its position, and the face ID. The face image information may also include the smile level of each face image.
The analysis information 302B stores as many pieces of face image information as there are face images detected in one piece of still image data 301A.
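The relationship between one analysis information record and its per-face entries could be sketched as below. The class and field names are hypothetical, and the face image pixel data itself is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FaceImageInfo:
    """Hypothetical entry for one detected face (pixel data omitted)."""
    face_id: str                        # per-person ID assigned after grouping
    position: Tuple[int, int]           # (x, y) coordinates in the still image
    size: Tuple[int, int]               # (width, height) of the face region
    smile_level: Optional[float] = None # optional per-face smile level


@dataclass
class AnalysisInfo:
    """Hypothetical in-memory form of one analysis information 302B record."""
    material_id: str
    smile_level: float                  # smile level of the whole image
    people: int                         # number of people in the image
    faces: List[FaceImageInfo] = field(default_factory=list)


# One image with two detected faces yields two face entries (values made up).
info = AnalysisInfo(
    material_id="m0001",
    smile_level=0.5,
    people=2,
    faces=[
        FaceImageInfo("f01", (120, 80), (64, 64), smile_level=0.75),
        FaceImageInfo("f02", (300, 90), (60, 60), smile_level=0.25),
    ],
)
```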

The material analysis unit 22 may also detect (recognize) objects in the still image data 301A, such as people (the whole body or a part of the body, including the face image), scenery (sea, mountains, flowers, etc.), and animals (dogs, cats, fish, etc.), and generate analysis information 302B that includes information indicating those detection results.

The material analysis unit 22 may further estimate the shooting time, shooting position, and so on from the material information 302A and the still image data 301A, and generate analysis information 302B that includes information indicating those estimation results. In that case, as shown in FIG. 5, the analysis information 302B includes person image information (person image, size, position, and person ID), scenery information (scenery image, size, position, and attributes), animal information (animal image, size, position, and attributes), the shooting time, and the shooting position.

The material analysis unit 22 may also analyze the audio data 301B and generate analysis information 302B that includes information on the people corresponding to the detected voices, the number of such people, and the mood, genre, and so on of the detected music.

The material analysis unit 22 may further analyze each image frame contained in the moving image data 301C in the same way as the still image data 301A, and generate analysis information 302B that includes the smile level, number of people, face image information, and so on described above.

The material analysis unit 22 notifies the moving image playback unit 23 that the material information 302A and the analysis information 302B corresponding to the input material data 51 have been stored in the material information database 302.

In response to the notification from the material analysis unit 22, the moving image playback unit 23 starts processing that generates a composite video (moving image) from the material data 51 and plays back (displays) the generated composite video. In doing so, the moving image playback unit 23 refers to the material information database 302, extracts material data 51 satisfying a predetermined condition from the material database 301, and generates the composite video.
The moving image playback unit 23 includes an effect extraction unit 231, a composite video generation unit 232, and a composite video output unit 233.

The effect extraction unit 231 extracts effect data 303A suited to the imported material data 51 from an effect database 303. The effect data 303A includes ordinary video effects such as zoom, rotation, noise addition, mosaic, contour extraction, and emboss, as well as transitions that connect scenes.

Specifically, the effect extraction unit 231 first extracts, from the material information database 302, the smile level and the number of people contained in the analysis information 302B corresponding to the extracted material data 51.

The effect extraction unit 231 then selects effect data 303A suited to the extracted material data 51 based on the extracted smile level and number of people. For example, from the smile level and number of people corresponding to each piece of extracted still image data 301A (material data 51), the effect extraction unit 231 calculates a smile level index and a people count index for the whole set of still image data 301A.

That is, the effect extraction unit 231 takes, for example, the number of pieces of extracted still image data 301A that contain a face image whose smile level is equal to or greater than a first threshold as the smile level index for the whole set of still image data 301A.

Alternatively, the effect extraction unit 231 may take, for example, the average of the smile levels corresponding to each piece of extracted still image data 301A as the smile level index for the whole set of still image data 301A.

Similarly, the effect extraction unit 231 takes, for example, the number of pieces of extracted still image data 301A in which the number of people is equal to or greater than a second threshold as the people count index for the whole set of still image data 301A. Alternatively, the effect extraction unit 231 may take, for example, the average of the numbers of people corresponding to each piece of extracted still image data 301A as the people count index for the whole set.
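The numeric whole-set indices described above (a count of images meeting a threshold, or an average over images) could be computed as in the following sketch. The dictionary keys, function name, and threshold values are hypothetical; each image is assumed to carry its per-face smile levels, its own smile level, and its people count.

```python
from statistics import mean


def overall_indices(images, smile_threshold, people_threshold):
    """Compute the alternative whole-set indices for a non-empty image list.

    Each image is a dict with hypothetical keys:
      "face_smiles": per-face smile levels
      "smile_level": smile level of the image as a whole
      "people":      number of people in the image
    """
    return {
        # Images containing at least one face at or above the first threshold.
        "smile_count": sum(
            1 for img in images
            if any(s >= smile_threshold for s in img["face_smiles"])
        ),
        # Average of the per-image smile levels.
        "smile_mean": mean(img["smile_level"] for img in images),
        # Images whose people count is at or above the second threshold.
        "people_count": sum(
            1 for img in images if img["people"] >= people_threshold
        ),
        # Average number of people per image.
        "people_mean": mean(img["people"] for img in images),
    }


# Made-up sample data for three still images.
sample = [
    {"face_smiles": [0.75, 0.25], "smile_level": 0.5, "people": 2},
    {"face_smiles": [0.25], "smile_level": 0.25, "people": 1},
    {"face_smiles": [0.75, 0.75, 0.75], "smile_level": 0.75, "people": 3},
]
indices = overall_indices(sample, smile_threshold=0.7, people_threshold=2)
```

Which of the count-based or average-based indices is used is a design choice the text leaves open; the sketch simply returns both.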

As noted above, the smile level and the number of people may be expressed as relative indices. For example, when a smile level of "high" or "low" is set for each piece of extracted still image data 301A, the effect extraction unit 231 adopts the index set for the larger number of pieces of still image data 301A (for example, "high") as the smile level of the whole set. Alternatively, when a smile level of "high" is set for at least a predetermined proportion (the first threshold) of the extracted still image data 301A, the effect extraction unit 231 sets the smile level of the whole set of still image data 301A to "high".

Similarly, when a number of people of "many" or "few" is set for each piece of extracted still image data 301A, the effect extraction unit 231 adopts the index set for the larger number of pieces of still image data 301A (for example, "few") as the number of people for the whole set. Alternatively, when a number of people of "many" is set for at least a predetermined proportion (the second threshold) of the extracted still image data 301A, the effect extraction unit 231 sets the number of people for the whole set of still image data 301A to "many".
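The proportion-based rule for relative indices could be sketched as follows. The function name and the example proportions are hypothetical; the text says only that a predetermined proportion serves as the threshold.

```python
def label_by_ratio(labels, positive, negative, min_ratio):
    """Return `positive` if at least `min_ratio` of the images carry that
    label, otherwise `negative`.

    `labels` holds the relative index of each still image, for example
    "high"/"low" for smile level or "many"/"few" for people.
    """
    if not labels:
        raise ValueError("at least one still image is required")
    ratio = labels.count(positive) / len(labels)
    return positive if ratio >= min_ratio else negative


# One "many" image out of four meets a 25% threshold but not a 50% one.
overall_people = label_by_ratio(["many", "few", "few", "few"], "many", "few", 0.25)
```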

Based on the smile level and the number of people determined as described above for the whole set of extracted still image data 301A, the effect extraction unit 231 determines the effect data 303A suited to that set of still image data 301A.

FIG. 6 is a diagram illustrating an example of effects determined based on the smile level and the number of people.
Based on the smile level and the number of people corresponding to the whole set of extracted still image data 301A, the effect extraction unit 231 selects effect collection 51A, which is considered suitable for material with many people and a high smile level, when the material has many people and a high smile level.

When the material has many people and a low smile level, the effect extraction unit 231 selects effect collection 51B, which is considered suitable for material with many people and a low smile level.

When the material has few people and a low smile level, the effect extraction unit 231 selects an effect collection 51C considered suitable for such material.

When the material has few people and a high smile level, the effect extraction unit 231 selects an effect collection 51D considered suitable for such material.
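The four cases above amount to a two-by-two lookup on the overall number-of-people and smile-level indices. A minimal sketch, assuming the indices are represented as the strings "many"/"few" and "high"/"low" (a representation chosen for this example only), might look like this:

```python
# Mapping from (number of people, smile level) to the effect collection
# reference sign used in the description of FIG. 6.
EFFECT_COLLECTIONS = {
    ("many", "high"): "51A",
    ("many", "low"): "51B",
    ("few", "low"): "51C",
    ("few", "high"): "51D",
}


def select_effect_collection(people, smile):
    """Return the effect collection for the given overall indices."""
    try:
        return EFFECT_COLLECTIONS[(people, smile)]
    except KeyError:
        raise ValueError(f"unknown indices: {people!r}, {smile!r}") from None


selected = select_effect_collection("few", "high")
```

A table keyed on value ranges rather than two-valued labels would extend the same idea to more finely classified effect collections.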

FIG. 7 is an explanatory diagram of specific example effects corresponding to each of the effect collections described above.
For the effect collection 52A, suited to material with many people and a high smile level, a group of effects whose decoration evokes a happy or energetic impression is used. These effects therefore help liven up the occasion.

For the effect collection 52B, suited to material with many people and a low smile level, a group of effects that evokes a ceremony is used. These effects therefore create, for example, a solemn atmosphere.

For the effect collection 52C, suited to material with few people and a low smile level, a group of effects that evokes a cool or near-futuristic impression is used.
For the effect collection 52D, suited to material with few people and a high smile level, a group of effects that evokes an impression of fantasy or magic is used.

The effect collections 52A to 52D are designed so that the impression perceived by the user changes according to the colors, shapes, movements (motion), objects, and the like used in the effects.

Thus, for example, the effect collection 52A, which evokes a happy or energetic impression, includes effects that use bright or vivid colors. Likewise, the effect collection 52C, which evokes a cool or near-futuristic impression, includes effects that use geometric shapes.

The effect collections from which the effect extraction unit 231 selects effects suited to the plurality of extracted still image data 301A are not limited to the four categories shown in FIGS. 6 and 7; the unit may also select from effect collections that are classified more finely. In that case, a predetermined number of kinds of effect collections, each corresponding to values (value ranges) of the number of people and the smile level, are defined in advance, and the effect extraction unit 231 selects from them the effect collection suited to the plurality of extracted still image data 301A.

The effect extraction unit 231 may also select an effect collection suited to the plurality of extracted still image data 301A using indices other than the number of people and the smile level.

Next, the effect extraction unit 231 extracts the effect data 303A corresponding to the selected effect collection from the effect database 303 and outputs the extracted effect data 303A to the composite video generation unit 232.

The composite video generation unit 232 then generates a composite video, containing the captured material data 51, that serves as the karaoke background moving image.
In this composite video, the effect data 303A extracted by the effect extraction unit 231 has been applied to the material data 51.

For example, the effect data 303A extracted by the effect extraction unit 231 is applied to the face images (objects) of persons appearing in the still image data 301A (material data 51) included in the composite video.
The composite video generation unit 232 generates, for example, a composite video containing still image data 301A displayed at timings defined by the effect data 303A.

The composite video may also include audio data 301B that is output at predetermined timings.
The composite video generation unit 232 then outputs the generated composite video to the composite video output unit 233.

Note that the effect extraction unit 231 may itself apply the effect data 303A, chosen based on facial expression (for example, smile level) and the number of people, to the captured material data 51. In that case, the composite video generation unit 232 generates, as the karaoke background moving image, a moving image (composite video) containing the plurality of still images to which the effect extraction unit 231 has applied the effects.

The composite video output unit 233 outputs the composite video generated by the composite video generation unit 232.
The composite video output unit 233 plays back the composite video and displays it on the screen (display 107).

The composite video output unit 233 may also encode the composite video and store the encoded file in a predetermined storage device (for example, the HDD 103).

With the above configuration, the moving image playback application program 202 determines the effects (effect group) 303A suited to the material data 51 used in the composite video serving as the karaoke background moving image.

Specifically, the effect extraction unit 231 determines, for example, indices of the smile level and the number of people for the plurality of still image data 301A as a whole, based on the smile level and the number of people of each of the still image data 301A used in the composite video. Based on the determined indices, the effect extraction unit 231 selects the effect data 303A suited to the plurality of still image data 301A used in the composite video serving as the karaoke background moving image.

Therefore, without the user having to select the effect data 303A used in the composite video, the composite video generation unit 232 can generate a karaoke background moving image (composite video) containing the plurality of still image data 301A to which appropriate effect data 303A has been applied.

Next, the generation and playback processing of the karaoke background moving image in the embodiment will be described.
FIG. 8 is a flowchart of the karaoke playback process.
First, the controller 101 of the karaoke performance terminal 13 captures the material data 51 to be used in the composite video serving as the karaoke background moving image (step S11).

Possible ways of capturing the material data 51 include capturing it from an external USB device via the USB connector 126, capturing it from an external memory card via the card connector 128, photographing it with the camera 112, reading shared material data stored in the HDD 103, and downloading shared material data from the karaoke host 11 via the communication network 12.

A typical user captures the material data 51 through the USB connector 126, the card connector 128, or the camera 112.

Specifically, when an external storage device such as a USB memory, a USB-connected hard disk, or a USB-connected SSD (Solid State Drive) is connected to the USB connector 126, still image data such as photo data is captured as the material data 51 via the USB controller 127.

When an external memory card is connected to the card connector 128, still image data such as photo data is captured as the material data 51 via the card controller 129.

When a photograph is taken with the camera 112 in response to a user operation, the resulting photo data is captured as the material data 51.

Next, the controller 101 executes the moving image playback application program 202 to perform material analysis processing (step S12).

FIG. 9 is a flowchart of the material analysis process.
The following description assumes that the material data 51 to be analyzed is still image data 301A such as photo data.

First, the material input unit 21 determines whether still image data 301A has been input via the interface unit or the like (step S21).
If no still image data 301A has been input (step S21; No), the material input unit 21 enters a standby state.

If still image data 301A has been input (step S21; Yes), the material input unit 21 stores it in the material database 301 (step S22). The material input unit 21 then notifies the material analysis unit 22 (face image detection unit 221) that still image data 301A has been input.

Next, the face image detection unit 221 detects face images in the input still image data 301A (step S23).
That is, the face image detection unit 221 detects the position (coordinates), size, frontality, and the like of each face image contained in the still image data 301A. The face image detection unit 221 may also recognize (identify) the person corresponding to each detected face image.
The face image detection unit 221 then outputs information indicating the detected face images to the facial expression detection unit 222 and the number-of-people detection unit 223.

The facial expression detection unit 222 then determines the smile level of each face image detected by the face image detection unit 221 (step S24).
Here, the smile level is an index indicating the likelihood that a detected face image shows a smile. When a plurality of face images are detected in one still image data 301A, the smile level of that still image data 301A is determined based on the smile levels of those face images.

The number-of-people detection unit 223 determines the number of persons contained in the still image data 301A based on the number of face images detected by the face image detection unit 221 (step S25).
The material analysis unit 22 then stores analysis information 302B, which includes the smile level, the number of people, face image information, and the like for the still image data 301A, in the material information database 302 (step S26).

Through the above processing, the smile level and the number of people are determined for the face images contained in the input still image data 301A, and the analysis information 302B containing them is stored in the material information database 302.
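Steps S24 to S26 can be illustrated with the following sketch. It assumes that face detection (step S23) has already produced one smile score per detected face, and it takes the mean of those scores as the smile level of the image. The mean is only one possible choice, since this passage says no more than that the image-level value is "based on" the per-face values. The names AnalysisInfo, analyze_still_image, and store_analysis, and the dictionary standing in for the material information database 302, are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnalysisInfo:
    """Stand-in for analysis information 302B of one still image."""
    image_id: str
    smile_level: float
    num_people: int


def analyze_still_image(image_id, face_smile_levels):
    """Summarize per-face smile scores into image-level analysis info."""
    num_people = len(face_smile_levels)  # step S25: one person per detected face
    # Step S24 (assumed rule): average the per-face smile levels.
    smile_level = mean(face_smile_levels) if face_smile_levels else 0.0
    return AnalysisInfo(image_id, smile_level, num_people)


def store_analysis(db, info):
    """Step S26: store the analysis info, keyed by image id."""
    db[info.image_id] = info


material_info_db = {}
info = analyze_still_image("photo_001.jpg", [0.8, 0.4])
store_analysis(material_info_db, info)
```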

Next, based on the moving image playback application program 202, the controller 101 functions as the effect extraction unit 231 and selects, based on the analysis result, the series of effects to be used for generating the moving image (step S13). That is, functioning as the effect extraction unit 231, the controller 101 selects an effect collection suited to the captured material data 51 based on the analysis information 302B corresponding to that material data. The controller 101 then extracts the effect data 303A corresponding to the selected effect collection from the effect database 303.

Subsequently, the controller 101 functions as the composite video generation unit 232 and generates a composite video using the extracted material data 51 and effect data 303A (step S14). The generated composite video contains the material data 51 to which the effect data 303A has been applied. Alternatively, the controller 101 may apply the selected effect data 303A to the captured material data 51 while functioning as the effect extraction unit 231.
In that case, when functioning as the composite video generation unit 232, the controller 101 generates a composite video containing the material data 51 to which the effect data 303A has already been applied.

Subsequently, the controller 101 functions as the composite video output unit 233 and, via the display controller 108, displays the composite video on the display 107 as the background moving image of the karaoke music, together with the lyrics corresponding to the karaoke music.

In parallel with this, the controller 101 controls the sound controller 111 to mix the karaoke music with the user's voice input from the microphones 109A and 109B and output the result as sound from the speaker 110 (step S15).

In this case, if the user has enabled recording, or recording is enabled as a default setting, the composite video serving as the background moving image of the karaoke music and the audio obtained by mixing the karaoke music with the user's input voice are recorded on the HDD 103, or on a writable CD, writable DVD, or the like set in advance in the optical disc drive 105.

The controller 101 also determines whether trick playback is being performed, such as changing the tempo of the song or changing the playback speed of the composite video by fast-forwarding or the like (step S16). If trick playback is being performed (step S16; Yes), the controller 101 calculates the playback end timing of the karaoke music, regenerates the unplayed portion of the composite video so that playback of the composite video serving as the background moving image also ends at that timing (step S17), and returns the processing to step S15.

If trick playback is not being performed (step S16; No), the processing of step S15 continues until playback of the karaoke music ends: the background moving image of the karaoke music is displayed on the display 107 together with the corresponding lyrics, and the karaoke music is mixed with the user's input voice and output as sound from the speaker 110.
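The re-timing performed in step S17 can be sketched as a small calculation. The example below assumes that the tempo change is expressed as a ratio relative to normal speed and that the remaining time is divided evenly among the scenes that have not yet been shown. Both are assumptions for illustration, since the description states only that the unplayed portion is regenerated so that the video ends together with the song. The function name is hypothetical.

```python
def remaining_scene_durations(remaining_song_seconds, tempo_ratio, scenes_left):
    """Return the display time, in seconds, of each unplayed scene.

    remaining_song_seconds: song time left at normal tempo.
    tempo_ratio: playback speed relative to normal (1.5 means 1.5x faster).
    scenes_left: number of scenes not yet displayed.
    """
    if tempo_ratio <= 0:
        raise ValueError("tempo_ratio must be positive")
    if scenes_left <= 0:
        raise ValueError("scenes_left must be positive")
    # Wall-clock time until the song ends at the new tempo.
    remaining_playback = remaining_song_seconds / tempo_ratio
    # Assumed policy: share the remaining time equally among the scenes.
    per_scene = remaining_playback / scenes_left
    return [per_scene] * scenes_left


durations = remaining_scene_durations(120.0, 1.5, 8)
```

With 120 seconds of song left, a tempo 1.5 times faster, and eight scenes remaining, each scene would be shown for 10 seconds, so the video and the song finish together after 80 seconds.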

Through the above processing, a composite video using the captured material data 51 and the effect data 303A suited to that material data can be generated as the background moving image of the karaoke music, and karaoke playback can be performed.

Thus, according to this embodiment, a composite video containing the material data 51 with appropriate effect data 303A applied is generated as the background moving image of the karaoke music without the user doing any work to select the effect data 303A suited to the captured material data 51. The user can then use the karaoke performance terminal 13 with the karaoke music being played back while the generated background moving image is displayed on the display 107.

That is, even if the user has no knowledge at all of the effect data 303A, the moving image playback application program 202 can easily generate, as the karaoke background image, a composite video containing the material data 51 to which appropriate effect data 303A has been applied.

Note that the entire procedure of the composite video generation process of this embodiment can be executed by software. Therefore, the same effects as those of this embodiment can easily be obtained simply by installing a program that executes this procedure on an ordinary computer through a computer-readable storage medium storing the program, and running it.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment.

In the above description, effects, transitions, and scene switching are selected mainly based on the smile level and the number of people.

However, it is also possible to detect the musical tone, beat, and the like of the karaoke music and to select effects, transitions, and scene switching suited to them. It is also possible to detect the boundaries between verses (such as between the first and second verses), interlude periods, and the like in the karaoke music and to select accordingly the still images to display, the effects to add, the transitions, the scene switching, and so on.

It is also possible to extract the lyric display switching timings from the control data for the display (switching) timing of the lyrics and to select still images, effects, transitions, scene switching, and the like accordingly.

Furthermore, it is possible to analyze the lyrics contained in the lyrics data and to select, in line with their content, the still images to display, the effects to add, the transitions, the scene switching, and so on. Specifically, when the lyrics contain many sad words, still images matching a dark, solemn image are displayed, and effects, transitions, scene switching, and the like matching that image are applied. When the lyrics contain many words expressing fun or brightness, still images matching a bright, pop image are displayed, and effects, transitions, scene switching, and the like are applied likewise.
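One simple way to realize the lyric-based selection described above is to count words from predefined vocabularies. The sketch below is a rough illustration under that assumption: the word lists, the mood labels, and the function name are hypothetical and are not taken from the description, and a real system would need vocabulary and tokenization suited to the language of the lyrics.

```python
# Hypothetical vocabularies; a real system would use far larger lists.
SAD_WORDS = {"tears", "goodbye", "lonely"}
HAPPY_WORDS = {"smile", "sunshine", "dance"}


def lyric_mood(lyrics):
    """Classify lyrics by counting sad and happy words."""
    words = [w.strip(".,!?").lower() for w in lyrics.split()]
    sad = sum(w in SAD_WORDS for w in words)
    happy = sum(w in HAPPY_WORDS for w in words)
    if sad > happy:
        return "dark_solemn"  # choose dark, solemn images and effects
    if happy > sad:
        return "bright_pop"   # choose bright, pop images and effects
    return "neutral"          # no clear tendency either way


mood = lyric_mood("Tears and a lonely goodbye")
```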

It is also possible, for example, to photograph the participating members with the camera 112 at the start of karaoke and, according to the age group, gender, and the like of the persons in the resulting photo image, to display still images and apply effects, transitions, scene switching, and the like that those persons are considered more likely to prefer.

The above description assumes that, when the material data 51 is input via the user I/F 102, the moving image is generated using all of the material data (all of the still image data and the like). However, the moving image may instead be generated using only part of the material data, according to the musical tone, performance time, and the like of the karaoke music. For example, when the number of material data 51 is large relative to the performance time of the karaoke music, or when the set number of scenes is smaller than the number of material data 51, the material data is thinned out as appropriate. Likewise, when the performance time is shortened during trick play, the number of material data 51 used to regenerate the moving image may be thinned out as appropriate.
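Thinning out the material data can be sketched as picking an evenly spaced subset. The description does not say which items are dropped, so the even-spacing policy and the function name below are assumptions made for illustration.

```python
def thin_materials(materials, max_count):
    """Keep at most `max_count` items, spread evenly over the original order."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    n = len(materials)
    if n <= max_count:
        return list(materials)  # nothing to thin out
    # Integer arithmetic yields strictly increasing indices because n > max_count.
    return [materials[(i * n) // max_count] for i in range(max_count)]


photos = [f"photo_{i}.jpg" for i in range(10)]
kept = thin_materials(photos, 4)
```

Here `max_count` would be the set number of scenes, or a smaller number derived from a shortened performance time during trick play.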

Some constituent elements may also be deleted from the full set of constituent elements shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

10 Communication karaoke system
11 Karaoke host
12 Communication network
13 Karaoke performance terminal (karaoke apparatus)
21 Material input unit
22 Material analysis unit (analysis means)
23 Video playback unit
51 Material data
101 Controller (analysis means, effect setting selection means, moving image generation means, karaoke playback means)
107 Display (karaoke playback means)
108 Display controller (karaoke playback means)
110 Speaker (karaoke playback means)
111 Sound controller (karaoke playback means)
131 Database (analysis means)
202 Moving image playback application program
221 Face image detection unit (analysis means)
222 Facial expression detection unit (analysis means)
223 Number-of-people detection unit (analysis means)
231 Effect extraction unit (effect setting selection means)
232 Composite video generation unit (moving image generation means)
233 Composite video output unit (karaoke playback means)
302 Material information database (analysis means)
302A Material information (analysis means)
302B Analysis information (analysis means)
303 Effect database (effect setting selection means)

Claims

1. A karaoke apparatus comprising:
analysis means for analyzing a plurality of still images;
effect setting selection means for selecting, based on a result of the analysis, an effect setting used for generating a moving image including at least some of the plurality of still images;
moving image generation means for generating, using the selected effect setting, the moving image to which the effects constituting the effect setting are applied; and
karaoke playback means for playing back karaoke music and playing back the generated moving image in synchronization with the playback of the karaoke music.

2. The karaoke apparatus according to claim 1, wherein the karaoke playback means superimposes a lyrics image corresponding to the karaoke song on the moving image being played in synchronization with the playback of the karaoke song.

3. The karaoke apparatus according to claim 1, further comprising a still image capturing unit that captures the plurality of still images via a camera, a recording medium, or a communication network and outputs the captured still images to the analysis unit.

4. The karaoke apparatus according to any one of claims 1 to 3, wherein, when the playback tempo or playback speed of the karaoke music is changed, the moving image generation means dynamically changes, in accordance with the playback tempo or the playback speed, the number of the still images used for generating the moving image or the time for which each still image is displayed.

5. The karaoke apparatus according to claim 1, further comprising a recording unit that superimposes a user's input voice on the karaoke music and the moving image played back by the karaoke playback means and records the result on a recording medium.

6. The karaoke apparatus according to any one of claims 1 to 5, wherein
the analysis means analyzes the karaoke music, and
the effect setting selection means selects the effect setting used for generating the moving image based on a result of the analysis, such as the musical tone or beat of the karaoke music.

7. A method for controlling a karaoke apparatus, the method being executed in the karaoke apparatus and comprising:
an analysis step of analyzing a plurality of still images;
an effect setting selection step of selecting, based on a result of the analysis, an effect setting used for generating a moving image including at least some of the plurality of still images;
a moving image generation step of generating, using the selected effect setting, the moving image to which the effects constituting the effect setting are applied; and
a karaoke playback step of playing back karaoke music and playing back the generated moving image in synchronization with the playback of the karaoke music.

8. A control program for controlling a karaoke apparatus by a computer, the control program causing the computer to function as:
analysis means for analyzing a plurality of still images;
effect setting selection means for selecting, based on a result of the analysis, an effect setting used for generating a moving image including at least some of the plurality of still images;
moving image generation means for generating, using the selected effect setting, the moving image to which the effects constituting the effect setting are applied; and
karaoke playback means for playing back karaoke music and playing back the generated moving image in synchronization with the playback of the karaoke music.