JP2006126300A - Karaoke machine

Info

Publication number: JP2006126300A
Application number: JP2004311431A
Authority: JP
Inventors: Sukeyuki Shibuya; 資之渋谷
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2004-10-26
Filing date: 2004-10-26
Publication date: 2006-05-18

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke machine capable of displaying the result of evaluating a singing voice to the singer by using simpler image processing.
SOLUTION: The karaoke machine 1 is equipped with playing means 13, 16, 17, and 18, a singing input means 15, evaluating means 341 and 342 for evaluating an input singing voice, a photographing means 41 for photographing the singer, image processing means 343 and 40 for processing a photographed image of the singer according to the evaluation results of the evaluating means, and a display means 44 for displaying the photographed image of the singer processed by the image processing means 343 and 40. It is further equipped with an additional image storage means 112 for storing an additional image to be added to the photographed image of the singer. The image processing means 343 and 40 are equipped with an extracting means 343 for extracting a featured part of the singer in the photographed image and an image composing means 404 for compositing the additional image onto the featured part extracted by the extracting means 343 according to the evaluation results of the evaluating means 341 and 342.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a karaoke apparatus that evaluates a singer's singing voice and displays a photographed image of the singer to which image processing has been applied according to that evaluation.

Karaoke apparatuses that evaluate a singer's singing voice and show the evaluation result to the singer have long been in practical use. Among apparatuses with such an evaluation function, one has been proposed that captures the singer's face with a video camera or the like and deforms the face image according to the evaluation result (see, for example, Patent Document 1). In such an apparatus the degree of deformation changes with the evaluation result, so the result can be shown to the singer in an entertaining way.
Patent Document 1: Japanese Patent Laid-Open No. 11-231881

However, the conventional karaoke apparatus relies on complex image processing, namely deforming the photographed image of the singer. The image processing load is therefore large, and the apparatus must be equipped with a high-speed image processing device (a DSP, or Digital Signal Processor, or the like). In particular, when the apparatus evaluates the singing voice successively during the song and changes the displayed image of the singer according to each interim evaluation result, image processing is applied to the photographed image repeatedly during the song, which calls for an even faster image processing device.

To solve this problem, an object of the present invention is to provide a karaoke apparatus that can show the singer the evaluation result for the singing voice in an entertaining way while using simpler image processing.

As means for solving the above problem, the present invention is a karaoke apparatus comprising: performance means for playing a karaoke song; singing input means for inputting a singer's singing voice; evaluation means for evaluating the singing voice input by the singing input means; photographing means for photographing the singer; image processing means for applying image processing to the photographed image of the singer according to the evaluation result of the evaluation means; and display means for displaying the photographed image of the singer processed by the image processing means. The apparatus is characterized in that it further comprises additional image storage means for storing an additional image to be added to the photographed image of the singer, and in that the image processing means comprises extraction means for extracting a characteristic part of the singer in the photographed image and image composition means for compositing the additional image onto the characteristic part extracted by the extraction means according to the evaluation result of the evaluation means.

According to this configuration, the performance means plays a karaoke song, and when the singer sings along with it, the singing voice is input by the singing input means and evaluated by the evaluation means. The singer is photographed by the photographing means, the image processing means applies image processing to the photographed image according to the evaluation result, and the display means displays the processed image. As the image processing, the extraction means extracts a characteristic part of the singer from the photographed image, and the image composition means composites an additional image stored in the additional image storage means onto that characteristic part. Compared with the conventional technique of deforming the photographed image, this makes it possible to reflect the evaluation result of the singing voice in the photographed image of the singer with simpler image processing.

Further, in the karaoke apparatus described above, the evaluation means may evaluate the singing voice at predetermined time intervals while the performance means is playing the karaoke song, and the image processing means may execute the image processing each time an evaluation result is obtained.

According to this configuration, the evaluation means evaluates the singing voice at predetermined time intervals during the karaoke performance, and the image processing means executes the image processing each time an evaluation result is obtained. Because the display means shows the processed image of the singer successively, the interim evaluation of the singing can be shown during the song to, for example, the singer and the audience watching the singer.

In the karaoke apparatus described above, the extraction means may extract components of the singer's face as the characteristic part of the singer. According to this configuration, facial components (for example, the eyes, nose, and mouth) are extracted as the characteristic part, and an additional image (for example, one imitating the ink marks painted on the loser's face in hanetsuki, the Japanese battledore game, or one imitating a mustache) is composited onto them. The evaluation result of the singing voice can thus be shown to the singer, the audience, and others in a more entertaining way.

In the karaoke apparatus described above, the image composition means may composite more additional images onto the characteristic part of the singer the lower the evaluation by the evaluation means is. According to this configuration, the evaluation result of the singing voice can be shown to the singer, the audience, and others in a visually understandable way, using nothing more than the simple rule that a lower evaluation adds more additional images to the characteristic part.

In the karaoke apparatus described above, the image processing means may further comprise effect processing means for applying at least one of mosaic processing and color change processing to the characteristic part of the singer extracted by the extraction means. According to this configuration, in addition to the additional image being composited according to the evaluation result of the singing voice, effect processing is applied to the characteristic part of the singer according to the evaluation result, so the evaluation result can be expressed in more detail.

The karaoke apparatus described above may further comprise composite image output means capable of writing to a recording medium the photographed image of the singer onto which the additional image has been composited. According to this configuration, the composited image of the singer can be written to a recording medium, so the evaluation result of the singing voice can be saved. This allows, for example, a configuration in which the composited image of the singer is reused in later karaoke performances, so that evaluation results can be accumulated and displayed across multiple performances.

According to the present invention, the evaluation result of the singing voice can be reflected in the photographed image of the singer with simpler image processing than the conventional technique of deforming the photographed image. The evaluation result can therefore be shown to the singer and others through the photographed image of the singer while using simpler image processing than the prior art.

Hereinafter, a karaoke apparatus according to an embodiment of the present invention will be described with reference to the drawings.

In the karaoke apparatus according to the embodiment, a user (the singer, a viewer watching the singer, or the like) can select one mode from a plurality of modes that include an evaluation mode in addition to a normal mode. The normal mode is a mode in which the singing voice is not evaluated. The evaluation mode is a mode in which the level of the singer's singing along with the karaoke performance is evaluated. In the evaluation mode, the karaoke apparatus executes an evaluation process. The evaluation process is a series of steps in which the singer is photographed while singing, the singing voice is evaluated, and, according to the evaluation result, an object imitating the ink marks painted in hanetsuki (for example, a mustache) is composited onto the photographed image of the singer and displayed.

(First embodiment)
FIG. 1 is a block diagram of a karaoke apparatus 1 according to the first embodiment of the present invention. The karaoke apparatus 1 is composed of a CPU (Central Processing Unit) 10 that controls the operation of the entire apparatus and various devices connected to it via buses and connectors (not shown). A hard disk (hereinafter "HDD") 11, a RAM (Random Access Memory) 12, and an operation unit 20 are connected to the CPU 10. A sound source 13, an A/D converter 14, a mixer 16, a sound system (hereinafter "SS") 17, and a vocal adapter 19 are also connected to the CPU 10, as are a singer image processing unit 40, an MPEG (Moving Picture Experts Group) decoder 42, and a synthesis circuit 43. In addition, the karaoke apparatus 1 is configured so that a microphone 15, a speaker 18, a video camera 41, and a monitor 44 can be connected to it.

The HDD 11 stores predetermined programs, such as those for playing karaoke songs, inputting and outputting the singing voice, and displaying images corresponding to the karaoke performance on the monitor 44, and it also stores image data and audio data. Functionally, it comprises a song data storage unit 110, a video data storage unit 111, and an additional image storage unit 112. The song data storage unit 110 is an area that stores song data for playing karaoke songs. The video data storage unit 111 is an area that stores video data such as background video to be displayed on the monitor 44. The additional image storage unit 112 is an area that stores a plurality of types of additional image data. Additional image data is the image data of an object (hereinafter "additional image") to be added to the photographed image of the singer in the evaluation mode. An additional image is, for example, an object imitating the ink-drawn mustache painted as a penalty on the loser's face in hanetsuki (a game played with a battledore), or a figure such as ○ or ×.

The RAM 12 is the work area of the CPU 10. In addition to an area into which programs, song data, and additional images are read from the HDD 11, a scoring log area 120 and other areas are set in it. The scoring log area 120 is an area that records the scoring results of the singing voice obtained in the evaluation mode.

The sound source 13 forms musical tone signals according to the song data (note event data and the like) input from the CPU 10 (the song sequencer 321 described later) and outputs them to the mixer 16. The A/D converter 14 converts the singer's singing voice signal, input through the microphone 15 (an example of the singing input means), into a digital signal and outputs it to the mixer 16 and the vocal adapter 19.

The mixer 16 applies effects such as echo to the plurality of musical tone signals generated by the sound source 13 and to the singer's singing voice signal input via the microphone 15 and the A/D converter 14, and mixes these signals in an appropriate balance. The mixer 16 outputs the mixed digital audio signal to the SS 17. The effects that the mixer 16 applies to each audio signal and the mixing balance are controlled by the CPU 10. The SS 17 includes a D/A converter and a power amplifier; it converts the input digital signal into an analog signal, amplifies it, and emits it from the speaker 18.

The vocal adapter 19 receives the singing voice signal from the A/D converter 14 and a reference from the CPU 10 (the song sequencer 321 described later). The reference is, for example, guide melody data included in the song data. From the input singing voice signal, the vocal adapter 19 determines the volume and frequency of the singing voice; it likewise determines the frequency and volume of the input reference, and outputs this information to the CPU 10 (the evaluation mode processing unit 34 described later) at every sampling interval.

The operation unit 20 consists of, for example, a panel switch interface and a remote control receiving circuit. It is connected to a panel switch and a remote control device (not shown) so that signals can be exchanged with them. The panel switch and the remote control device have various key switches; when the user presses a key switch, they accept a song selection operation or an operation for selecting one mode from the plurality of modes. When one of these devices accepts a user operation, the operation unit 20 receives from it an operation signal indicating the content of the operation and outputs that signal to the CPU 10.

The CPU 10 functionally includes an operation input processing unit 31, a sequencer 32, a background video reproduction unit 33, and an evaluation mode processing unit 34. The operation input processing unit 31 executes processing according to the operation signal input from the operation unit 20. For example, when a song selection signal is input, it notifies the sequencer 32 of the selected karaoke song; when a mode selection signal is input, it sets the selected mode, for example by storing a flag in a predetermined flag storage area of the RAM 12. The initial mode setting is the normal mode, so the normal mode is used unless an operation signal selecting the evaluation mode is input.
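
The dispatch performed by the operation input processing unit 31 can be illustrated with a minimal Python sketch. The class name, the signal kinds "song_select" and "mode_select", and the callback used to notify the sequencer are all hypothetical; the description only states that a selected song is passed to the sequencer 32, that the selected mode is stored as a flag, and that the normal mode is the default.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "normal"
    EVALUATION = "evaluation"


class OperationInputProcessor:
    """Hypothetical stand-in for the operation input processing unit 31."""

    def __init__(self, notify_sequencer: Callable[[str], None]) -> None:
        # The description says the initial setting is the normal mode.
        self.mode = Mode.NORMAL
        self._notify_sequencer = notify_sequencer

    def handle(self, kind: str, value) -> None:
        if kind == "song_select":
            # Pass the selected song on to the sequencer.
            self._notify_sequencer(value)
        elif kind == "mode_select":
            # Stands in for storing a flag in the RAM flag area.
            self.mode = value
```

If no "mode_select" signal ever arrives, `mode` stays at `Mode.NORMAL`, which mirrors the default described above.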

The sequencer 32 executes the processing needed to play a karaoke song and includes a song sequencer 321, a lyrics sequencer 322, and a character pattern creation unit 322a. The song sequencer 321 executes the processing for playing the karaoke song. For example, it reads the song data of the karaoke song selected by the user, as notified by the operation input processing unit 31, from the song data storage unit 110 and stores it in the RAM 12. The song sequencer 321 then reads the song data sequentially from the RAM 12 and uses the data it contains (for example, track data such as the performance data track and the guide melody track) to control the sound generation of the sound source 13, thereby producing the performance sound of the karaoke song. The song sequencer 321, the sound source 13, the mixer 16, the SS 17, and the speaker 18 constitute the performance means.

The lyrics sequencer 322 reads, from the karaoke song loaded by the song sequencer 321, the lyrics track corresponding to the position in the song currently being processed by the song sequencer 321. Based on the lyrics track data, it has the character pattern creation unit 322a create an image pattern and output it to the synthesis circuit 43.

The background video reproduction unit 33, for example when the normal mode is set, sequentially reads the background image data corresponding to the selected karaoke song in response to instructions from the sequencer 32 and outputs it to the synthesis circuit 43.

When the evaluation mode is set, the evaluation mode processing unit 34 oversees the evaluation process described later. It evaluates the singer's singing voice and, using the singer image processing unit 40, has the video camera 41 photograph the singer, applies image processing to the photographed image according to the evaluation result, and displays it on the monitor 44. The evaluation mode processing unit 34 includes an evaluation engine 341, a point extraction unit 342, and an object extraction unit 343.

When the evaluation mode is set, the evaluation engine 341 evaluates the accuracy and intonation of the singing voice from the pitch, volume, and other properties of the singing voice input via the microphone 15. This evaluation is performed using, for example, the singing frequency and singing volume output from the vocal adapter 19 together with the reference frequency and reference volume (the evaluation is described in detail later). The point extraction unit 342 scores the singing voice based on the evaluation result of the evaluation engine 341 (the scoring is described in detail later). It stores the scoring result in the scoring log area 120 and outputs it to the object extraction unit 343. The evaluation engine 341 and the point extraction unit 342 constitute the evaluation means.

The object extraction unit 343 receives the scoring result of the singing voice from the point extraction unit 342. When the scoring result is lower than a predetermined level, it refers to the RAM 12 and, from among the plurality of types of additional images that were read from the additional image storage unit 112 and stored in the RAM 12, reads the additional image corresponding to the scoring result and outputs it to the singer image processing unit 40 (the image composition unit 404 described later). When the scoring result is higher than the predetermined level, the object extraction unit 343 does not extract an additional image.
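
As a rough illustration of this selection rule, the following Python sketch returns an additional image only for scores below a threshold. The 0 to 100 score scale, the threshold of 70, the score bands, and the image names are all assumptions made for the example; the description does not specify them, nor does it say how a score exactly equal to the predetermined level is handled (the sketch treats it as not low).

```python
from typing import Optional

# Hypothetical score bands: (exclusive upper bound, additional image name).
ADDITIONAL_IMAGES = [
    (30, "mustache_and_cross"),
    (50, "mustache"),
    (70, "cross_mark"),
]
PASS_LEVEL = 70  # assumed value for the "predetermined level"


def select_additional_image(score: int) -> Optional[str]:
    """Return the additional image for a low score, or None otherwise."""
    if score >= PASS_LEVEL:
        # Score is not below the predetermined level: nothing is extracted.
        return None
    for upper_bound, image_name in ADDITIONAL_IMAGES:
        if score < upper_bound:
            return image_name
    return None
```

With these assumed bands, a score of 45 selects "mustache" and a score of 90 selects nothing.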

The singer image processing unit 40 is composed of, for example, a DSP, and includes an image storage unit 401, a video analysis unit 402, a feature part extraction unit 403, and an image composition unit 404. The image storage unit 401 consists of, for example, RAM; it stores the additional image output from the object extraction unit 343 and the photographed image data of the singer output from the video camera 41 (an example of the photographing means).

When the object extraction unit 343 extracts an additional image, the video analysis unit 402 analyzes the photographed image data stored in the image storage unit 401, identifies the singer's face from the color information and other properties of that data, and identifies the components of the face (for example, the facial outline, eyes, nose, mouth, ears, eyebrows, and hair) as the characteristic parts of the singer. The feature part extraction unit 403 corresponds to the extraction means of the present invention. It receives the identification result from the video analysis unit 402 and, based on it, extracts the characteristic part that corresponds to the additional image stored in the image storage unit 401. For example, when the additional image is a mustache object, the nose, mouth, and the like are extracted as the characteristic parts corresponding to it.
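
The pairing of an additional image with the facial components it attaches to can be sketched in Python as a lookup. The mapping table, the part names, and the bounding-box format (x, y, width, height) are hypothetical; only the mustache-to-nose-and-mouth pairing comes from the description, and face detection itself is assumed to have been done already.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # assumed format: x, y, width, height

# Hypothetical table: which facial components each additional image needs.
ANCHOR_PARTS: Dict[str, Tuple[str, ...]] = {
    "mustache": ("nose", "mouth"),       # pairing given in the description
    "cross_mark": ("face_outline",),     # assumed for illustration
}


def extract_feature_parts(detected: Dict[str, Box], image_name: str) -> Dict[str, Box]:
    """Keep only the detected parts that the given additional image needs."""
    wanted = ANCHOR_PARTS.get(image_name, ())
    return {part: detected[part] for part in wanted if part in detected}
```

A part that the analysis step failed to detect is simply omitted from the result, so the caller can decide whether to skip compositing for that frame.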

The image composition unit 404 corresponds to the image composition means of the present invention. For the characteristic part extracted by the feature part extraction unit 403, it reads the corresponding additional image from the image storage unit 401 and composites it onto that part (the compositing of the additional image onto the characteristic part is described in detail later). The image composition unit 404 outputs the processed image of the singer to the synthesis circuit 43. The object extraction unit 343 and the singer image processing unit 40 constitute the image processing means.

The MPEG decoder 42 converts input MPEG data into an NTSC (National Television System Committee) video signal and inputs it to the synthesis circuit 43. For example, when the normal mode is set, the MPEG decoder 42 converts the background video data, encoded in MPEG-2 format and output from the background video reproduction unit 33, and outputs the result to the synthesis circuit 43.

The synthesis circuit 43 overlays OSD (On Screen Display) elements, such as the lyrics telop and the scoring result display, on the background video signal output from the MPEG decoder 42 in the normal mode, and on the composited singer image output from the image composition unit 404 in the evaluation mode. The monitor 44 is, for example, a CRT (Cathode Ray Tube) display or an LCD (Liquid Crystal Display), and displays the image output from the synthesis circuit 43.
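
The mode-dependent choice of base picture can be summarized with a small Python sketch. It models frames and OSD elements as plain strings and returns the drawing order from bottom to top; the function name and the string mode values are assumptions, and the real circuit of course operates on video signals rather than lists.

```python
from typing import List


def compose_frame(mode: str, background_frame: str, singer_frame: str,
                  osd_layers: List[str]) -> List[str]:
    """Return the layers to draw, bottom first, for the given mode."""
    # Evaluation mode uses the composited singer image as the base;
    # any other mode falls back to the decoded background video.
    base = singer_frame if mode == "evaluation" else background_frame
    return [base] + list(osd_layers)
```

In both modes the OSD elements (lyrics telop, score display) sit on top of whichever base picture was chosen.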

FIG. 2 is a flowchart showing an example of the evaluation process executed by the karaoke apparatus 1 shown in FIG. 1. This process starts when the evaluation mode processing unit 34 receives from the operation input processing unit 31 an operation signal selecting the evaluation mode.

First, the evaluation mode processing unit 34 instructs the singer image processing unit 40 to start photographing the singer with the video camera 41. The singer image processing unit 40 has the video camera 41 start photographing the singer at predetermined time intervals and output the photographed images, stores each output image in the image storage unit 401, and updates it successively (S1).

Next, the sequencer 32 receives a song selection instruction from the operation input processing unit 31 and starts playing the selected karaoke song (S2). The singer image processing unit 40 then performs the initial image display process (S3). In this process, the singer image processing unit 40 starts outputting the photographed image of the singer stored in the image storage unit 401 to the synthesis circuit 43, has the synthesis circuit 43 combine the lyrics telop output from the sequencer 32 with the photographed image of the singer, and displays the combined image on the monitor 44 as the initial image.

When the singer sings along with the karaoke performance while referring to the displayed lyrics telop, the singing voice is input to the microphone 15. The input singing voice is mixed with the karaoke song by the mixer 16 and output from the speaker 18 via the sound system (SS) 17, and is also output to the vocal adapter 19 via the A/D converter 14.

Next, the evaluation mode processing unit 34 determines whether the evaluation timing for the singing voice has arrived (S4). For example, the evaluation mode processing unit 34 incorporates a timer that issues a notification at predetermined time intervals, and it determines that the evaluation timing has arrived when it receives that notification.

If the evaluation timing has not arrived (NO in S4), the evaluation mode processing unit 34 repeats step S4. If the evaluation timing has arrived (YES in S4), the evaluation mode processing unit 34 performs singing voice evaluation processing (S5). In this processing, the evaluation mode processing unit 34 instructs the vocal adapter 19 to output the frequency and volume of both the reference and the singing voice to the evaluation engine 341, and the evaluation engine 341 evaluates the singing voice from these values. Specifically, the evaluation engine 341 obtains the difference between the singing frequency and the reference frequency and the difference between the singing volume and the reference volume. If the number of times both differences fall within the allowable range reaches a predetermined count within a predetermined time, the engine judges a pass; otherwise it judges a fail. This pass/fail judgment is made for each note.
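The per-note judgment described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sample` type, the function name `judge_note`, and the tolerance and count values are hypothetical, since the description gives no concrete allowable range or predetermined count.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement of the singing voice against the reference (hypothetical type)."""
    sung_hz: float
    ref_hz: float
    sung_volume: float
    ref_volume: float


def judge_note(samples, pitch_tol_hz=10.0, volume_tol=0.2, min_hits=3):
    """Return True (pass) when enough samples are within both tolerances.

    pitch_tol_hz, volume_tol and min_hits are assumed values standing in for
    the "allowable range" and "predetermined count" of the description.
    """
    hits = 0
    for s in samples:
        pitch_ok = abs(s.sung_hz - s.ref_hz) <= pitch_tol_hz
        volume_ok = abs(s.sung_volume - s.ref_volume) <= volume_tol
        # A sample counts only when BOTH differences are within range.
        if pitch_ok and volume_ok:
            hits += 1
    return hits >= min_hits
```

The samples passed to `judge_note` would be those collected within the predetermined time for a single note, so the function is called once per note.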

The point extraction unit 342 scores the singing voice based on the pass/fail judgments. For example, starting from a reference score (for example, 50 points), a predetermined number of points (plus points) is added for each pass and a predetermined number of points (minus points) is subtracted for each fail, with 100 points as the maximum.
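As a rough sketch of this scoring scheme, the function below starts from the reference score and adjusts it per judgment. The function name, the values of `plus` and `minus`, and the lower bound of 0 are assumptions made for illustration; the description specifies only the example base of 50 points and the maximum of 100.

```python
def score_from_judgements(judgements, base=50, plus=2, minus=2,
                          max_score=100, min_score=0):
    """Accumulate a score from per-note pass/fail results.

    judgements is an iterable of booleans (True = pass). plus, minus and
    min_score are assumed values, not taken from the description.
    """
    score = base
    for passed in judgements:
        score += plus if passed else -minus
        # Keep the running score inside the allowed range.
        score = max(min_score, min(max_score, score))
    return score
```

For instance, three passes from the default base give 56 points, and two fails give 46 points.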

The object extraction unit 343 extracts specific additional images from a plurality of types of additional images according to the scoring result of the point extraction unit 342 (S6). Specifically, the additional images are selected by comparing the scoring result with a plurality of thresholds. For example, when the thresholds are 10, 20, 30, 40, and 50 points and the scoring result is 32 points, the score is at or below the 50-point and 40-point thresholds, so the additional images corresponding to those two thresholds are identified and extracted. Thus, the lower the evaluation of the singing voice, the more additional images are extracted. If the scoring result exceeds every threshold, no additional image is extracted.
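The threshold comparison can be illustrated with a short sketch. The thresholds are the ones given in the example above; the overlay names and the function name are hypothetical placeholders for whatever the additional image storage actually holds.

```python
# Thresholds from the example in the description; image names are hypothetical.
THRESHOLD_IMAGES = {
    10: "overlay_10",
    20: "overlay_20",
    30: "overlay_30",
    40: "overlay_40",
    50: "overlay_50",
}


def select_additional_images(score, threshold_images=THRESHOLD_IMAGES):
    """Return the images whose threshold the score is at or below."""
    return [
        name
        for threshold, name in sorted(threshold_images.items())
        if score <= threshold
    ]
```

With these values, a score of 32 selects the images for the 40-point and 50-point thresholds, a score of 42 selects only the 50-point image, and a score of 60 selects none.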

Conversely, additional images may be extracted as the evaluation result becomes higher. For example, for a perfect score, a crown object to be displayed above the singer's head may be extracted.

The point extraction unit 342 outputs to the singer image processing unit 40 a control signal instructing it to execute image processing. When additional images have been extracted, the control signal includes their image data.

On receiving the control signal, the singer image processing unit 40 stores the additional images contained in it in the image storage unit 401. The video analysis unit 402 then determines whether any additional image has been extracted (S7). This determination is made, for example, by checking whether an additional image is stored in the image storage unit 401; if none is stored, the video analysis unit 402 determines that no additional image has been extracted (NO in S7).

If it is determined that an additional image has been extracted (YES in S7), the video analysis unit 402 refers to the image storage unit 401 and analyzes the photographed image of the singer to identify the singer's feature portions, and the feature portion extraction unit 403 extracts, from the identified feature portions, the one corresponding to the additional image (S8). For example, when the additional image is a mustache object, the area under the nose is extracted as the corresponding feature portion.

The image composition unit 404 then composites the additional image onto the extracted feature portion (S9) and outputs the photographed image of the singer to the synthesis circuit 43. The output image is the uncomposited image when no additional image has been extracted, and the image composited with the additional image when one has been extracted.
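A minimal sketch of the compositing step, assuming the photographed image and the additional image are plain 2D grids of pixel values and that the feature portion is given by its top-left corner. The description does not specify image formats or how transparency is handled; the use of `None` for transparent overlay pixels and the function name are assumptions.

```python
def composite(frame, overlay, top, left):
    """Paste overlay onto a copy of frame with its corner at (top, left).

    frame and overlay are lists of rows of pixel values. Overlay pixels set
    to None are treated as transparent (an assumed convention). Pixels that
    would fall outside the frame are skipped.
    """
    result = [row[:] for row in frame]  # leave the stored frame untouched
    for dy, overlay_row in enumerate(overlay):
        for dx, pixel in enumerate(overlay_row):
            if pixel is None:
                continue
            y, x = top + dy, left + dx
            if 0 <= y < len(result) and 0 <= x < len(result[y]):
                result[y][x] = pixel
    return result
```

Working on a copy matches the behavior described above, where either the uncomposited image or the composited image is output depending on whether an additional image was extracted.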

The singer image processing unit 40 causes the synthesis circuit 43 to combine the photographed image of the singer with the lyrics telop and displays the combined image on the monitor 44, thereby updating the display of the monitor 44 (S10).

The evaluation mode processing unit 34 then determines whether the karaoke performance has ended (S11). If it has ended (YES in S11), the evaluation mode processing unit 34 changes the mode setting to the normal mode and terminates the process. If it has not ended (NO in S11), the evaluation mode processing unit 34 determines whether the evaluation timing has arrived (S12). If the evaluation timing has not arrived (NO in S12), the process returns to step S8; if it has arrived (YES in S12), the process returns to step S5.
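The loop structure of the flowchart can be sketched as a small simulation. This is a simplified model under assumptions: time is counted in integer ticks, the evaluation timing arrives every `eval_interval` ticks, and the function name and action labels are hypothetical. It only records which branch is taken on each tick.

```python
def run_evaluation_mode(total_ticks, eval_interval):
    """Simulate the branch taken on each tick of one karaoke performance.

    "wait"     : no evaluation yet, still waiting for the first timing (S4)
    "evaluate" : evaluation timing arrived, re-evaluate and redraw (from S5)
    "refresh"  : no new evaluation, re-extract features and redraw (from S8)
    "end"      : performance finished, return to normal mode
    """
    log = []
    evaluated_once = False
    for tick in range(1, total_ticks + 1):
        if tick % eval_interval == 0:
            evaluated_once = True
            log.append((tick, "evaluate"))
        elif evaluated_once:
            log.append((tick, "refresh"))
        else:
            log.append((tick, "wait"))
    log.append((total_ticks, "end"))
    return log
```

The "refresh" branch reflects that the photographed image keeps being updated between evaluations, so the same additional images are composited again onto newly located feature portions.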

FIG. 3 shows examples of the screen D displayed on the monitor 44 during the evaluation process of FIG. 2: (a) shows a screen D1 in which no additional image is composited onto the photographed image G of the singer, and (b) shows a screen D2 in which additional images O1 and O2 are composited.

Referring to FIG. 3(a), consider the case where the score based on the evaluation of the singing voice in step S5 is 60 points. In step S6, the score of 60 points is compared with the thresholds, here assumed to be 10, 20, 30, 40, and 50 points. Because 60 points exceeds every threshold, no additional image is extracted, the determination in step S7 is NO, and the screen D1 is displayed in step S10.

Referring to FIG. 3(b), consider the case where the score is 32 points. Because 32 points is below the 40-point and 50-point thresholds, the additional images corresponding to those thresholds (here, additional images O1 and O2) are extracted in step S6. The determination in step S7 is then YES; in step S8, the feature portion P1 corresponding to the additional image O1 and the feature portion P2 corresponding to the additional image O2 are extracted; in step S9, the additional image O1 is composited onto the feature portion P1 and the additional image O2 onto the feature portion P2; and in step S10, the screen D2 is displayed. When step S5 is executed again and the score changes, for example to 42 points, the score exceeds the 40-point threshold, so the additional image O1 corresponding to that threshold is no longer extracted. Only the additional image O2 is therefore composited onto the singer's image G on the updated screen, and the display of the monitor 44 changes in step with the evaluation of the singing voice during the performance.

Because more additional images are composited onto the singer's image G as the score decreases, the evaluation of the singing voice can be displayed in a form that is easy for the user to understand. In addition, because changes in the evaluation during the performance can be shown on the monitor 44, the singer feels a sense of tension while singing, which makes the evaluation of the singing voice more entertaining.

In this figure, the feature portions P1 and P2 are drawn with dotted lines for convenience of explanation; they are not displayed on the actual screen. The lyrics telop is also omitted from the figure.

According to the configuration of this embodiment, the singer's singing ability (singing level) can be expressed and displayed using image composition, which is simpler image processing than the deformation applied to the singer's photographed image in the prior art. In particular, in this embodiment the singing voice is evaluated at predetermined time intervals during the performance, and each time it is evaluated, the photographed image of the singer is processed according to the result and the display of the monitor 44 is updated. Compared with the prior art, the image processing can therefore be performed quickly, and the display of the monitor 44 can be updated relatively quickly.

(Second Embodiment)
A second embodiment of the present invention is described below with reference to FIG. 1. The second embodiment differs from the first in that effect processing is applied to the feature portion extracted by the feature portion extraction unit 403. The karaoke apparatus 1a differs from the karaoke apparatus 1 in that the singer image processing unit 40 functionally includes the effect processing unit 405 indicated by a broken line in FIG. 1; the other components are the same. The effect processing unit 405 is an example of effect processing means. It receives the scoring result from the point extraction unit 342 and the extraction result from the feature portion extraction unit 403 and, according to the scoring result, applies effect processing including at least one of mosaic processing and color change processing to the notified feature portion of the singer.
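One common way to realize mosaic processing on a region is block averaging, sketched below for a grayscale image held as a 2D grid of integers. The description does not state how the mosaic is computed or how the scoring result controls its strength, so the block-averaging approach, the function name, and the `block` parameter are assumptions.

```python
def mosaic(frame, top, left, height, width, block=2):
    """Return a copy of frame with the given region pixelated.

    The region (top, left, height, width) is assumed to lie inside the
    frame. Each block x block tile in the region is replaced by the integer
    mean of its pixels; tiles at the region edge may be smaller.
    """
    result = [row[:] for row in frame]
    for by in range(top, top + height, block):
        for bx in range(left, left + width, block):
            cells = [
                (y, x)
                for y in range(by, min(by + block, top + height))
                for x in range(bx, min(bx + block, left + width))
            ]
            mean = sum(frame[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                result[y][x] = mean
    return result
```

A larger `block` gives a coarser mosaic, so one possible design would be to choose a larger block for a lower score, although the description leaves that mapping open.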

According to the configuration of this embodiment, the scoring result can be expressed not only by compositing additional images but also by effect processing on the feature portion, so the evaluation result can be displayed in finer detail.

(Third Embodiment)
A third embodiment of the present invention is described below with reference to FIGS. 1 and 2. The third embodiment differs from the first and second in that the photographed image of the singer processed by the singer image processing unit 40 can be recorded on a recording medium. The karaoke apparatus 1b according to the third embodiment differs from the karaoke apparatuses 1 and 1a in that it further includes a composite image input/output unit 51, indicated by a broken line in FIG. 1; the other components are the same. The composite image input/output unit 51 corresponds to the composite image output means of the present invention and is implemented by, for example, a Memory Stick (registered trademark) drive or a CD-R (CD Recordable) drive. In response to a user instruction given through the operation unit 20, the composite image input/output unit 51 writes the photographed image of the singer stored in the image storage unit 401 and the score stored in the scoring log area 102 to a writable recording medium 511 such as a Memory Stick or a CD-R.

When the evaluation mode is set, the composite image input/output unit 51 also reads, in response to a user instruction given through the operation unit 20, the photographed image of the singer and the score from a recording medium 511 on which they have been written. It outputs the read photographed image to the singer image processing unit 40 and the read score to the scoring log area 102 and the point extraction unit 342. The singer image processing unit 40 uses the photographed image output from the composite image input/output unit 51, and the point extraction unit 342 uses the output score as the initial reference score for the evaluation process, adding and subtracting points from it.
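The carry-over of a score between performances might look like the sketch below. The on-medium format is not specified in the description; storing the data as JSON, the field names, and both function names are assumptions made purely for illustration.

```python
import io
import json


def write_session(fp, score, image_name):
    """Write the score and an image identifier to a file-like object."""
    json.dump({"score": score, "image": image_name}, fp)


def read_base_score(fp, default=50):
    """Read a previously saved score to use as the initial reference score.

    Falls back to the default base (50 in the description's example) when
    the medium holds no readable data.
    """
    try:
        data = json.load(fp)
    except json.JSONDecodeError:
        return default
    return data.get("score", default)


# Usage with an in-memory buffer standing in for the recording medium.
medium = io.StringIO()
write_session(medium, 62, "singer.png")
medium.seek(0)
base = read_base_score(medium)  # the next performance starts from 62
```

The value returned by `read_base_score` plays the role of the initial reference score from which points are then added and subtracted.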

According to the configuration of this embodiment, the singing voice of the singer can be evaluated across multiple karaoke performances, making the evaluation more entertaining.

In the first to third embodiments, one mode can be selectively set from a plurality of modes; however, the invention is not limited to this, and the evaluation mode may always be set.

In the first to third embodiments, the evaluation result of the singing voice is shown only through the photographed image of the singer, but the scoring result may be displayed as well. This is preferable because it informs the user of the evaluation result in more detail.

In the first to third embodiments, a score is calculated from the evaluation result of the singing voice, but scoring is not essential; the plus points and minus points obtained by the evaluation engine may simply be expressed by adding or removing additional images.

In the first to third embodiments, the photographed image of the singer is displayed as live-action footage, but the invention is not limited to this: the "photographed image of the singer" also includes a caricatured version of the image, an animated version of it, or a representation of the singer as an animal or the like. Likewise, although a still image is displayed in the first to third embodiments, a moving image may be displayed instead.

In the evaluation mode, no background image is displayed; however, the invention is not limited to this, and the photographed image of the singer may be displayed within part of the background image.

FIG. 1 is a block diagram of a karaoke apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the evaluation process executed by the karaoke apparatus shown in FIG. 1.
FIG. 3 shows examples of the screen displayed on the monitor during the evaluation process of FIG. 2: (a) shows a screen in which no additional image is composited onto the photographed image of the singer, and (b) shows a screen in which additional images are composited.

Explanation of symbols

1, 1a, 1b: karaoke apparatus
112: additional image storage unit (an example of additional image storage means)
13: sound source (an example of performance means)
15: microphone (an example of singing input means)
16: mixer (an example of performance means)
17: sound system (an example of performance means)
18: speaker (an example of performance means)
341: evaluation engine (an example of evaluation means)
342: point extraction unit (an example of evaluation means)
343: object extraction unit (an example of image processing means)
40: singer image processing unit (an example of image processing means)
403: feature portion extraction unit (an example of extraction means)
404: image composition unit (an example of image composition means)
405: effect processing unit (an example of effect processing means)
41: video camera (an example of photographing means)
51: composite image input/output unit (an example of composite image input means)

Claims

1. A karaoke apparatus comprising: performance means for playing a karaoke song; singing input means for inputting the singing voice of a singer; evaluation means for evaluating the singing voice input by the singing input means; photographing means for photographing the singer; image processing means for performing image processing on the photographed image of the singer in accordance with the evaluation result of the evaluation means; and display means for displaying the photographed image of the singer processed by the image processing means,
the karaoke apparatus further comprising additional image storage means for storing an additional image to be added to the photographed image of the singer,
wherein the image processing means includes:
extracting means for extracting a feature portion of the singer in the photographed image of the singer; and
image synthesis means for compositing the additional image onto the feature portion extracted by the extracting means in accordance with the evaluation result of the evaluation means.

2. The karaoke apparatus according to claim 1, wherein the evaluation means evaluates the singing voice at predetermined time intervals during the performance of the karaoke song by the performance means, and the image processing means executes the image processing each time an evaluation result is obtained.

3. The karaoke apparatus according to claim 1, wherein the extracting means extracts a constituent element of the singer's face as the feature portion of the singer.

4. The karaoke apparatus according to any one of claims 1 to 3, wherein the image synthesis means composites more additional images onto the feature portion of the singer the lower the evaluation by the evaluation means.

5. The karaoke apparatus according to any one of claims 1 to 4, wherein the image processing means further comprises effect processing means for performing at least one of mosaic processing and color change processing on the feature portion of the singer extracted by the extracting means.

6. The karaoke apparatus according to claim 1, further comprising composite image output means capable of writing, to a recording medium, the photographed image of the singer onto which the additional image has been composited.