JP4134921B2

JP4134921B2 - Karaoke equipment

Info

Publication number: JP4134921B2
Application number: JP2004055065A
Authority: JP
Inventors: 航一郎佐藤; 卓也田丸
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2004-02-27
Filing date: 2004-02-27
Publication date: 2008-08-20
Anticipated expiration: 2024-02-27
Also published as: JP2005242230A

Description

The present invention relates to a karaoke apparatus that scores a singer's singing ability.

Among karaoke apparatuses that perform automatic performance based on music data, some take a singing voice signal as input and score the singer's singing ability. This kind of karaoke apparatus scores singing ability by analyzing the input singing voice signal. Such scoring tends to yield a high score when the level of the singing voice signal is high and a low score when it is low. Consequently, even if a person whose singing voice signal is at a low level sings better than a person whose signal is at a high level, the former's singing ability may be scored lower than the latter's.

One conceivable way to solve this problem is to score singing ability using data, other than the singing voice signal, that indicates the state of singing.
One karaoke apparatus that uses data other than the singing voice signal for scoring is the apparatus disclosed in Patent Document 1. This karaoke apparatus photographs the singer with a camera, compares still image data captured from the camera with reference image data stored in advance, and scores the singer's appearance, such as choreography.
JP 2000-29483 A

However, the karaoke apparatus disclosed in Patent Document 1 only makes it possible to also score the singer's movements, such as choreography and dance; singing ability itself is still scored by the conventional method. It therefore does not help solve the above problem.
The present invention has been made in view of the above circumstances, and its object is to provide a karaoke apparatus that can obtain a scoring result commensurate with singing skill regardless of the level of the singing voice signal.

The present invention provides a karaoke apparatus that performs automatic performance based on music data, comprising: shape detection means for detecting the shape of a singer's mouth from an image captured by a camera that images the singer's mouth; and shape scoring means for determining, based on the music data, the appropriateness of the shape detected by the shape detection means and generating image score data representing the determined appropriateness.
In this karaoke apparatus, the shape of the singer's mouth is detected from the image captured by the camera, the appropriateness of this shape is determined based on the music data, and image score data representing the determined appropriateness is generated.

The present invention also provides a karaoke apparatus that performs automatic performance based on music data, comprising: aspect ratio measuring means for measuring the aspect ratio of a singer's mouth from an image captured by a camera that images the singer's mouth; and aspect ratio scoring means for determining, based on the music data, the appropriateness of the aspect ratio measured by the aspect ratio measuring means and generating image score data representing the determined appropriateness.
According to this karaoke apparatus, the aspect ratio of the singer's mouth is measured from the image captured by the camera, the appropriateness of this aspect ratio is determined based on the music data, and image score data representing the determined appropriateness is generated.

The present invention also provides a karaoke apparatus that performs automatic performance based on music data including lyrics data, comprising: shape detection means for detecting the shape of a singer's mouth from an image captured by a camera that images the singer's mouth; storage means storing, for each vowel, shape data representing an appropriate shape of the mouth; vowel specifying means for identifying, from the lyrics data, the vowel that the singer should pronounce; and shape scoring means for determining the appropriateness of the shape detected by the shape detection means, based on the shape data stored in the storage means for the vowel identified by the vowel specifying means, and generating image score data representing the determined appropriateness.
According to this karaoke apparatus, the vowel that the singer should pronounce is identified from the lyrics data, while the shape of the singer's mouth is detected from the image captured by the camera. The appropriateness of this shape is then determined based on the shape data for the identified vowel, and image score data representing the determined appropriateness is generated.

The present invention also provides a karaoke apparatus that performs automatic performance based on music data, comprising: size measuring means for measuring the size of a singer's mouth from an image captured by a camera that images the singer's mouth; and size change scoring means for determining, based on the music data, the appropriateness of the temporal change in the size measured by the size measuring means and generating image score data representing the determined appropriateness.
According to this karaoke apparatus, the size of the singer's mouth is measured from the image captured by the camera. The appropriateness of the temporal change in the measured size is then determined based on the music data, and image score data representing the determined appropriateness is generated.
According to each of the above karaoke apparatuses, the image score data is generated without using the singing voice signal, so a scoring result commensurate with singing skill can be obtained regardless of the level of the singing voice signal.

Each of the above karaoke apparatuses may further be provided with voice scoring means for scoring the singer's singing ability using a signal representing the singer's singing voice and generating voice score data representing the scoring result, and total scoring means for generating total score data representing the singer's singing ability using the image score data and the voice score data.
In this case, the singer's singing ability is scored using a signal representing the singer's singing voice, and voice score data representing this scoring result is generated. Total score data representing the singer's singing ability is then generated using this voice score data and the image score data. Because the total score data is generated using image score data that was itself generated without the singing voice signal, a scoring result commensurate with singing skill can be obtained regardless of the level of the singing voice signal. Moreover, because the total score data is generated using not only the image score data but also the voice score data, which represents a scoring result based on the signal representing the singing voice, a scoring result that indicates singing skill with high accuracy can be obtained.

Embodiments of the present invention will be described below with reference to the drawings. In the figures, common parts are given the same reference numerals.
FIG. 1 is a diagram showing the hardware configuration of a karaoke apparatus according to an embodiment of the present invention.
This karaoke apparatus has a function of performing automatic performance based on music data and a function of scoring a singer's singing ability. The figure also shows a microphone 1 that picks up the singer's singing voice and outputs a singing voice signal.

This karaoke apparatus includes a ROM (Read Only Memory) 2 in which an initial program is written, a CPU (Central Processing Unit) 3 that reads out and executes programs, a RAM (Random Access Memory) 4 used as a work area for the CPU 3, a hard disk 5 in which music data and a system program are written, a voice input I/F (interface) 6 that receives the singing voice signal output from the microphone 1 and supplies it to the CPU 3, an instruction input unit 7 that supplies a performance start instruction to the CPU 3, a display panel 8 that displays the scoring result, and a camera 9 that images the singer's face and supplies image data representing the captured image to the CPU 3.

FIG. 2 is a diagram schematically showing the structure of the music data.
The music data is composed of data such as a header, musical tone data, guide melody data, and lyrics data. The header is bibliographic data about the song, such as the song number used for automatic performance. The musical tone data is sequence data containing a plurality of event data items and duration data indicating the time interval between events, and automatic performance is carried out based on this musical tone data. The guide melody data is sequence data of the melody that the singer should sing. In a general karaoke apparatus that scores singing ability, scoring is based on the pitch, volume, rhythm, and tempo represented by this guide melody data. The lyrics data is sequence data of the lyrics that the singer should sing. A general karaoke apparatus uses this lyrics data to display the lyrics and to change their display color as the song progresses.
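The layout described above can be pictured with a minimal sketch. The class names, field names, tick unit, and the reading of duration data as "interval to the next event" are all assumptions made for illustration; the source does not specify a concrete format.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    duration: int   # duration data: interval to the next event, in ticks (assumed unit)
    payload: str    # event content, e.g. a note or a lyric syllable (hypothetical)

@dataclass
class MusicData:
    header: dict                                      # bibliographic data such as the song number
    tone_events: list = field(default_factory=list)   # musical tone data (drives automatic performance)
    guide_melody: list = field(default_factory=list)  # melody the singer should sing
    lyrics: list = field(default_factory=list)        # lyrics the singer should sing

def event_times(events):
    """Return the absolute start tick of each event by accumulating durations."""
    times, now = [], 0
    for ev in events:
        times.append(now)
        now += ev.duration
    return times

song = MusicData(header={"song_number": 1},
                 lyrics=[Event(480, "sa"), Event(240, "ku"), Event(240, "ra")])
print(event_times(song.lyrics))
```

A scoring unit could use start times like these to decide when a scoring timing falls within a period in which the singer should be producing sound.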

In FIG. 1, the camera 9 automatically adjusts its orientation, zoom, and focus so that the singer's face appears large in the center of the image, and it captures images and supplies image data to the CPU 3 at predetermined time intervals. The CPU 3 executes the initial program when a power supply (not shown) is turned on, and then executes the system program. A performance program for performing automatic performance based on music data and a scoring program for scoring the singer's singing ability are written on the hard disk 5. When a performance start instruction is input via the instruction input unit 7, the CPU 3, which is executing the system program, reads out and executes the performance program and the scoring program.

Of these programs, only the scoring program is novel. The following therefore describes, out of the functional configuration and operation of the karaoke apparatus while the CPU 3 is executing the scoring program, the parts related to scoring the singer's singing ability. Three embodiments that differ in the content of the scoring program are assumed here, and they are described in order.

[First Embodiment]
FIG. 3 is a diagram showing the functional configuration of a karaoke apparatus 10 according to the first embodiment of the present invention. The karaoke apparatus 10 includes the camera 9, an aspect ratio measuring unit 11, an aspect ratio scoring unit 12, a voice scoring unit 13, a total scoring unit 14, and a score display unit 15.
The aspect ratio measuring unit 11 measures the aspect ratio of the singer's mouth using the image data supplied from the camera 9. The aspect ratio of the mouth is the ratio of the vertical width to the horizontal width of the region surrounded by the lips. The horizontal and vertical widths of the region are its maximum lengths in the horizontal and vertical directions of the mouth, respectively.

The aspect ratio scoring unit 12 identifies scoring timings by referring to the guide melody data in the music data being automatically performed. At each identified scoring timing, the aspect ratio scoring unit 12 performs the aspect ratio scoring process described later. In this process, the aspect ratio scoring unit 12 scores the singer's singing ability using the aspect ratio measured by the aspect ratio measuring unit 11 and generates image score data representing the result. A scoring timing is a point in time within a period during which the singer should be producing sound.
The voice scoring unit 13 analyzes the input singing voice signal, scores the singer's singing ability, and generates voice score data representing the result. The music data is referred to during this scoring.

The total scoring unit 14 accumulates the image score data generated by the aspect ratio scoring unit 12 and the voice score data generated by the voice scoring unit 13. The accumulated score data is discarded when the scoring process ends, that is, when the CPU 3 finishes executing the scoring program. At the timing at which the scoring result should be displayed, the total scoring unit 14 performs the total scoring process described later. In this process, the total scoring unit 14 scores the singer's singing ability using the accumulated image score data and voice score data and generates total score data representing the result.
The score display unit 15 causes the display panel 8 to display the scoring result represented by the total score data generated by the total scoring unit 14.

In the karaoke apparatus 10, image data is output from the camera 9 at predetermined time intervals. The aspect ratio measuring unit 11 therefore repeats the process of measuring the aspect ratio of the singer's mouth (see FIG. 4).
In this process, the aspect ratio measuring unit 11 first extracts the contour line of the singer's mouth from the captured image represented by the supplied image data (step SA1). This contour line is also the contour line of the image surrounded by the image of the lips in the captured image (hereinafter referred to as the aperture image). Any encoding scheme may be used for the contour line; in this embodiment it is encoded as a chain code indicating the directions of unit-length line segments. Next, the vertical and horizontal widths of the aperture image are measured based on this contour line and the orientation of the captured image (steps SA2: YES, SA3). The vertical and horizontal widths of the aperture image are its maximum lengths in the vertical and horizontal directions of the captured image, respectively. For example, when the captured image and the contour line are related as schematically shown in FIG. 5, the vertical width of the aperture image is H and the horizontal width is W. Next, the ratio of the vertical width to the horizontal width is calculated and regarded as the aspect ratio of the singer's mouth (step SA4). When the singer's mouth is closed, extraction of the contour line fails, so the aspect ratio measuring unit 11 regards the aspect ratio of the singer's mouth as zero (steps SA2: NO, SA5).
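The measurement in steps SA2 to SA5 can be sketched as follows, assuming the contour has already been extracted (step SA1) as a start point plus an 8-direction chain code. The direction numbering, the function names, and the handling of a zero-width contour are assumptions for illustration, not details given in the source.

```python
# 8-direction chain code: code -> (dx, dy) of one unit step (assumed numbering).
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode_chain(start, codes):
    """Expand a chain code into the list of contour points it visits."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def mouth_aspect_ratio(contour):
    """Return H / W of the aperture image, or 0.0 when no contour was extracted."""
    if not contour:                      # steps SA2: NO, SA5 (mouth closed)
        return 0.0
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width = max(xs) - min(xs)            # W: maximum horizontal extent (step SA3)
    height = max(ys) - min(ys)           # H: maximum vertical extent (step SA3)
    if width == 0:                       # degenerate contour; treated as closed (assumption)
        return 0.0
    return height / width                # step SA4

# A 4-by-2 rectangular contour traced with the chain code.
contour = decode_chain((0, 0), [0] * 4 + [2] * 2 + [4] * 4 + [6] * 2)
print(mouth_aspect_ratio(contour))
```

Taking the bounding extents of the decoded points matches the definition of H and W as maximum lengths along the image axes.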

Meanwhile, when a scoring timing arrives, the aspect ratio scoring unit 12 performs the aspect ratio scoring process whose flow is shown in FIG. 6. In this process, the aspect ratio scoring unit 12 uses the aspect ratio measured by the aspect ratio measuring unit 11 at that point in time to determine whether the singer's mouth is open (step SB1). This determination is made, for example, by comparing the measured aspect ratio with a predetermined threshold. When the singer's mouth is open, the aspect ratio measured at that point is relatively large, so the aspect ratio scoring unit 12 determines that the mouth is open and generates image score data representing a predetermined score (step SB2). This score is a positive value. Conversely, when the singer's mouth is closed, the aspect ratio measured at that point is relatively small, so the aspect ratio scoring unit 12 determines that the mouth is closed and generates image score data representing zero points (step SB3).
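The decision in steps SB1 to SB3 amounts to a threshold comparison. The following is a minimal sketch; the threshold and the positive score are placeholder values chosen for illustration, since the source only says they are predetermined.

```python
OPEN_THRESHOLD = 0.2   # hypothetical threshold on the aspect ratio
OPEN_SCORE = 1         # hypothetical positive score awarded when the mouth is open

def score_aspect_ratio(aspect_ratio, threshold=OPEN_THRESHOLD, open_score=OPEN_SCORE):
    """Return the image score for one scoring timing."""
    if aspect_ratio > threshold:   # step SB1: mouth judged open
        return open_score          # step SB2: predetermined positive score
    return 0                       # step SB3: mouth judged closed, zero points

print(score_aspect_ratio(0.5), score_aspect_ratio(0.0))
```

Because a failed contour extraction is reported as an aspect ratio of zero, a closed mouth always falls on the zero-point branch for any positive threshold.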

In parallel with the above processing, the voice scoring unit 13 analyzes the input singing voice signal, scores the singer's singing ability, and generates voice score data representing the result. When the level of the singing voice signal is low, the scoring result represented by this voice score data is worse than the singer's actual skill warrants. The image score data and voice score data generated in this way are accumulated by the total scoring unit 14.

When the timing at which the scoring result should be displayed arrives, the total scoring unit 14 performs the total scoring process whose flow is shown in FIG. 7. In this process, the total scoring unit 14 adds the score determined from the accumulated image score data to the score determined from the accumulated voice score data (step SC1). Next, it determines whether the sum exceeds a predetermined full score (step SC2); if it does, it generates total score data representing the full score (step SC3). If it does not, it generates total score data representing the sum (step SC4).
The score display unit 15 then displays the scoring result represented by this total score data on the display panel 8.
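Steps SC1 to SC4 of the total scoring process can be sketched as an addition followed by a cap at the full score. How a single score is "determined from" the accumulated data is not stated in the source; the sketch below assumes a plain sum of each list, and the full score of 100 is likewise a placeholder.

```python
FULL_SCORE = 100   # hypothetical predetermined full score

def total_score(voice_scores, image_scores, full_score=FULL_SCORE):
    """Combine accumulated voice and image scores, capped at the full score."""
    combined = sum(voice_scores) + sum(image_scores)   # step SC1 (summing is an assumption)
    if combined > full_score:                          # step SC2
        return full_score                              # step SC3
    return combined                                    # step SC4

print(total_score([30, 40], [5, 5]))
print(total_score([60, 50], [5]))
```

The cap is what keeps the image score a bonus: it can raise a low voice score but can never push the displayed result past the full score.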

As is clear from the above, with the karaoke apparatus 10, the total score rises, within a range not exceeding the full score, if the singer opens the mouth properly at the scoring timings. A good singer opens the mouth at the right timings. Therefore, even a person whose singing ability was scored low relative to actual skill on a general karaoke apparatus because the level of the singing voice signal was low can obtain a scoring result commensurate with singing skill by using the karaoke apparatus 10.

[Second Embodiment]
FIG. 8 is a diagram showing the functional configuration of a karaoke apparatus 20 according to the second embodiment of the present invention. The karaoke apparatus 20 differs from the karaoke apparatus 10 of FIG. 1 in that it has a size measuring unit 21 and a size scoring unit 22 in place of the aspect ratio measuring unit 11 and the aspect ratio scoring unit 12, and in that ideal change data 23, described later, is written in the RAM 4. The ideal change data 23 is written into the RAM 4 when execution of the scoring program starts.

The ideal change data 23 is used in the size scoring process described later and holds, for each case, envelope data representing the ideal temporal change in mouth size. Envelope data exists for each case because the ideal temporal change in mouth size differs from case to case. For example, the ideal temporal change in mouth size at the start of vocalization differs from that at the end of vocalization.

The size measuring unit 21 measures the size of the singer's mouth using the image data supplied from the camera 9 and generates size data representing the measurement result.
The size scoring unit 22 accumulates the size data generated by the size measuring unit 21. It also identifies scoring timings by referring to the guide melody data in the music data being automatically performed, and at each identified scoring timing it performs the size scoring process described later. In this process, the size scoring unit 22 scores the singer's singing ability using the accumulated size data and generates image score data representing the result.

In the karaoke apparatus 20, image data is output from the camera 9 at predetermined time intervals. The size measuring unit 21 therefore repeats the process of measuring the size of the singer's mouth and generating size data (see FIG. 9). This process differs significantly from the process shown in FIG. 5 in that, instead of measuring the aspect ratio, it measures the size (for example, the area) of the aperture image and generates size data representing the measurement result (steps SD1, SD2). If extraction of the contour line fails, the size measuring unit 21 generates size data representing zero (step SD3). The size data also includes data indicating the time at which it was generated.
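One way to picture the size measurement is to take the area enclosed by the extracted contour and attach a timestamp. The sketch below uses the shoelace formula on the contour's vertices; the choice of formula, the dictionary layout, and the function names are assumptions, since the source only says that a size such as the area is measured.

```python
def aperture_area(contour):
    """Area enclosed by a closed polygonal contour (shoelace formula)."""
    if len(contour) < 3:
        return 0.0                       # no usable contour: size zero
    twice_area = 0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(contour, contour[1:] + contour[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

def make_size_data(contour, timestamp):
    """Build one size-data record: the measured size plus its generation time."""
    return {"time": timestamp, "size": aperture_area(contour)}

print(make_size_data([(0, 0), (4, 0), (4, 2), (0, 2)], 0.5))
```

Passing an empty contour models the failure case of step SD3, in which the generated size data represents zero.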

Meanwhile, when a scoring timing arrives, the size scoring unit 22 performs the size scoring process whose flow is shown in FIG. 10. In this process, the size scoring unit 22 first calculates a change fitness that indicates how well the actual temporal change in the singer's mouth size matches the ideal temporal change in mouth size (step SE1). Specifically, it first extracts, from the guide melody data in the music data being automatically performed, the event data whose playback timing arrived most recently, identifies the case based on this event data, and extracts the envelope data corresponding to the identified case from the ideal change data 23. It then calculates the change fitness by comparing the temporal change represented by the extracted envelope data with the temporal change represented by the accumulated size data. The more similar the two changes are, the higher the change fitness.
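The source does not say how the two temporal changes are compared. As one possible sketch, the measured sizes and the ideal envelope can each be scaled to a peak of 1 and compared by mean absolute difference. The case names, envelope values, and the similarity measure itself are all assumptions for illustration.

```python
# Hypothetical ideal envelopes, one per case (values are illustrative only).
IDEAL_ENVELOPES = {
    "onset":   [0.0, 0.5, 1.0],   # mouth opening at the start of vocalization
    "release": [1.0, 0.5, 0.0],   # mouth closing at the end of vocalization
}

def normalise(seq):
    """Scale a sequence of non-negative sizes so that its peak is 1."""
    peak = max(seq)
    return [v / peak for v in seq] if peak > 0 else [0.0] * len(seq)

def change_fitness(measured, case):
    """Return a fitness in [0, 1]; 1.0 means the measured change matches the envelope."""
    ideal = IDEAL_ENVELOPES[case]
    if len(measured) != len(ideal):
        raise ValueError("measured sizes must be sampled at the envelope's instants")
    m, i = normalise(measured), normalise(ideal)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(m, i)) / len(m)
    return 1.0 - mean_abs_diff

print(change_fitness([0, 5, 10], "onset"))
```

Normalising both curves makes the comparison depend on the shape of the change rather than on the absolute mouth size, which varies with the singer and the camera framing.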

Next, the size scoring unit 22 generates image score data representing an image score that corresponds to the calculated change fitness (step SE2). The generated image score data is accumulated by the total scoring unit 14 and used in the total scoring process. The higher the change fitness, the higher the image score. However, the minimum image score is fixed at zero, so the image score never becomes negative no matter how low the change fitness is.
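The mapping from fitness to image score only needs to be non-decreasing and floored at zero. A linear mapping is one simple choice; both the linear form and the maximum of 5 points below are assumptions, not values from the source.

```python
MAX_IMAGE_SCORE = 5   # hypothetical score awarded for a perfect fit

def image_score(fitness, max_score=MAX_IMAGE_SCORE):
    """Map a fitness value to an image score that is never negative."""
    # Scale linearly, round to whole points, and clamp at the zero-point floor.
    return max(0, round(fitness * max_score))

print(image_score(1.0), image_score(0.6), image_score(-0.3))
```

The `max(0, ...)` clamp implements the rule that even a very poor fit yields zero points rather than a deduction.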

In parallel with the above processing, voice score data is generated by the voice scoring unit 13, and the generated voice score data is accumulated by the total scoring unit 14.
When the timing at which the scoring result should be displayed arrives, the total scoring unit 14 performs the total scoring process described above to generate total score data, and the scoring result represented by this total score data is displayed on the display panel 8.

As is clear from the above, with the karaoke apparatus 20, the total score rises, within a range not exceeding the full score, if the singer sings while changing the size of the mouth properly. A good singer changes the size of the mouth properly while singing. Therefore, even a person whose singing ability was scored low relative to actual skill on a general karaoke apparatus because the level of the singing voice signal was low can obtain a scoring result commensurate with singing skill by using the karaoke apparatus 20.

[Third Embodiment]
FIG. 11 is a diagram showing the functional configuration of a karaoke apparatus 30 according to the third embodiment of the present invention. The karaoke apparatus 30 differs from the karaoke apparatus 10 of FIG. 1 in that it has a shape scoring unit 31 in place of the aspect ratio scoring unit 12, and in that ideal shape data 32 is written in the RAM 4. The ideal shape data 32 is written into the RAM 4 when execution of the scoring program starts. The ideal shape data 32 holds, for each vowel, shape data representing the ideal aspect ratio of the mouth, and is used in the shape scoring process described later.

The shape scoring unit 31 identifies scoring timings by referring to the lyrics data in the music data being automatically performed. At each identified scoring timing, the shape scoring unit 31 performs the shape scoring process described later. In this process, the shape scoring unit 31 scores the singer's singing ability using the aspect ratio measured by the aspect ratio measuring unit 11 and generates image score data representing the result.

In the karaoke apparatus 30, the aspect ratio measuring unit 11 repeats the process of measuring the aspect ratio of the singer's mouth (see FIG. 4).
Meanwhile, when a scoring timing arrives, the shape scoring unit 31 performs the shape scoring process whose flow is shown in FIG. 12. In this process, the shape scoring unit 31 first calculates a shape fitness, which is the degree to which the aspect ratio measured by the aspect ratio measuring unit 11 at that point matches the ideal aspect ratio. Specifically, it identifies the syllable the singer should pronounce based on the lyrics data in the music data being automatically played (step SF1), identifies the vowel of that syllable (step SF2), extracts the shape data corresponding to that vowel from the ideal shape data 32 (step SF3), and calculates the shape fitness using the ideal aspect ratio represented by that shape data (step SF4).
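Steps SF2 to SF4 can be illustrated with a minimal Python sketch. The description gives neither the contents of the ideal shape data 32 nor the formula for the shape fitness, so the per-vowel ratios, the romanized syllable input, and the relative-error formula below are all assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical stand-in for the ideal shape data 32: an ideal mouth aspect
# ratio (height / width) per vowel. The values are invented for illustration.
IDEAL_SHAPE_DATA = {"a": 0.9, "i": 0.3, "u": 0.6, "e": 0.5, "o": 1.0}


def vowel_of(syllable: str) -> Optional[str]:
    """Step SF2: return the vowel of a romanized syllable, or None if it has none."""
    for ch in reversed(syllable.lower()):
        if ch in IDEAL_SHAPE_DATA:
            return ch
    return None


def shape_fitness(measured_ratio: float, syllable: str) -> float:
    """Steps SF2 to SF4: fitness in [0, 1], where 1.0 means the ideal ratio was hit."""
    vowel = vowel_of(syllable)
    if vowel is None:
        return 0.0  # assumption: a syllable without a vowel earns no fitness
    ideal = IDEAL_SHAPE_DATA[vowel]  # step SF3: look up the ideal shape
    # Step SF4 (assumed formula): one minus the relative error, floored at zero.
    return max(0.0, 1.0 - abs(measured_ratio - ideal) / ideal)
```

For example, under these assumed values `shape_fitness(0.9, "ka")` gives the highest fitness because the measured ratio equals the ideal ratio assumed for the vowel "a".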

Next, the shape scoring unit 31 generates image score data representing an image score corresponding to the calculated shape fitness (step SF5). The image score data generated by the shape scoring unit 31 is accumulated by the comprehensive scoring unit 14 and used in the comprehensive scoring process. The higher the shape fitness, the higher the image score. However, the minimum image score is fixed at zero, so the image score never becomes negative no matter how low the shape fitness is.
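The two properties stated for step SF5 (the score grows with the fitness, and it never drops below zero) can be sketched as follows. The linear mapping and the `max_points` parameter are assumptions; the description does not say how the fitness is converted into points.

```python
def image_score(fitness: float, max_points: float = 10.0) -> float:
    """Step SF5 (assumed mapping): scale the fitness linearly and floor it at zero."""
    # A higher fitness yields a higher score; the floor keeps the score
    # non-negative even if an upstream fitness measure goes below zero.
    return max(0.0, fitness * max_points)


# Image scores are accumulated for later use in the comprehensive scoring process.
accumulated = [image_score(f) for f in (0.8, -0.2, 0.5)]
```

Here the second sample, whose fitness is negative, contributes zero points rather than a penalty.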

As is clear from the above, with the karaoke apparatus 30, the total score rises, within a range not exceeding the perfect score, if the singer sings with the mouth opened in the correct shape at the scoring timings. A good singer sings with the mouth opened in the correct shape. Therefore, even a person whose singing ability was scored lower than their actual skill on a general karaoke apparatus because the level of the singing voice signal was low can obtain a scoring result commensurate with their skill by using the karaoke apparatus 30.

As described above, according to the embodiments of the present invention, a scoring result commensurate with the singer's skill can be obtained even when the level of the singing voice signal is low. Consequently, a singer who wants to sing at the minimum volume audible to those nearby need not raise their voice against their will, a singer who wants to hold the microphone far from the mouth need not bring it closer against their will, and a singer need not choose a microphone with its sensitivity in mind.
In addition, the embodiments of the present invention take into account that the way the mouth moves during singing differs between individuals, and use the image score only to add points. In other words, even if the singer's mouth movement is not ideal, the singing ability score is not unfairly lowered.
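The additive-only use of the image score, together with the cap at the perfect score, can be sketched as below. This is only one possible combination rule: the function name, the simple summation, and the perfect score of 100 are assumptions, and the comprehensive scoring process of the embodiments may weight the scores differently.

```python
def total_score(voice_score: float, image_scores: list, full_score: float = 100.0) -> float:
    """Add the accumulated (non-negative) image scores to the voice score,
    without ever exceeding the perfect score."""
    bonus = sum(image_scores)  # image scores only add points
    return min(full_score, voice_score + bonus)


print(total_score(70.0, [5.0, 10.0]))  # bonus added to the voice score
print(total_score(95.0, [5.0, 10.0]))  # capped at the perfect score
```

Because the bonus is never negative, a singer whose mouth movement earns no image points simply keeps the voice score.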

[Modifications]
In the first embodiment described above, whether the mouth is open is determined using the aspect ratio of the mouth, but any method of determination may be used. For example, the determination may be made using the area of the opening image, or the mouth may be judged to be open when extraction of the contour line succeeds.

The first and second embodiments described above assume that the singer sings facing the camera 9, and treat the vertical and horizontal widths in the captured image as the vertical and horizontal widths of the mouth. Where this assumption does not hold, a three-dimensional model of the mouth may be generated from the captured image, the vertical and horizontal directions may be defined according to the orientation of the mouth in three-dimensional space, and the maximum lengths of the three-dimensional opening region in those directions may be taken as the vertical and horizontal widths of the mouth. Likewise, in the second embodiment, a three-dimensional model of the mouth may be generated from the captured image, and the size of the opening region on a plane corresponding to the orientation of the mouth in three-dimensional space may be taken as the size of the mouth.
In the first and third embodiments described above, the aspect ratio of the mouth is used for comparison, but encoded data of the contour line may be used instead.

In the third embodiment described above, temporal changes are compared with each other and image score data with a value corresponding to the change fitness is generated. Alternatively, the vowel of the sound the singer is uttering may be estimated from the captured image and compared with the vowel determined from the music data, with image score data indicating a positive value generated when the two match and image score data indicating zero generated when they do not.
In the embodiments described above, the aspect ratio and size are obtained after the contour line of the opening image has been obtained. Alternatively, the top, bottom, left, and right ends of the opening image may be detected, and the aspect ratio and size may be obtained from the positions of these ends in the photographed image.
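The end-point variation can be sketched in Python as follows. The binary mask representation of the opening image, the choice of height divided by width as the aspect ratio, and the use of the bounding-box area as the size are all assumptions made for illustration; the description does not fix any of them.

```python
from typing import List, Optional, Tuple


def mouth_extent(mask: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Find the top, bottom, left, and right ends of the opening region in a
    binary mask (1 = opening pixel) and return (height, width) in pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None  # no opening pixels: the mouth is treated as closed
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return height, width


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
height, width = mouth_extent(mask)
aspect_ratio = height / width  # assumed definition: height over width
size = height * width          # assumed definition: bounding-box area
```

This avoids tracing the full contour line, since only the four extreme positions are needed.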

The embodiments described above use the camera 9, which images the singer's face, but a camera that images only the singer's mouth area or the surroundings of the microphone may be used. In short, it suffices that the singer's mouth area can be imaged. A camera whose orientation, zoom, and focus cannot be adjusted automatically may also be used. In that case, the orientation and zoom of the camera should be set in advance so that it images the range in which the singer's face or mouth may be located. Furthermore, the karaoke apparatus may be provided with an image input I/F that receives image data output from a camera, so that the camera and the karaoke apparatus are separate units.

In the embodiments described above, the image scoring data is generated using music data of a general structure, but data for scoring may be embedded in the music data and used to generate the image scoring data.
In the embodiments described above, the scoring timing is identified based on the music data, but the scoring timing may instead be identified using the captured image. For example, the time at which the singer's mouth starts to close or finishes closing may be detected from the captured image and used as the scoring timing, and the appropriateness of how the mouth is closed may be scored.
In the embodiments described above, the processing related to scoring is performed by the CPU 3, but a DSP (Digital Signal Processor) may be provided to perform part or all of that processing.
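For the image-based scoring timing mentioned above, one rough way to detect the moment the mouth finishes closing is to watch a per-frame mouth size for an open-to-closed transition. The per-frame size series, the threshold, and the function name below are hypothetical; the description does not specify how the closing is detected.

```python
from typing import List


def closing_end_frames(sizes: List[float], closed_threshold: float = 1.0) -> List[int]:
    """Return the frame indices at which the mouth changes from open to closed.

    A frame counts as open when its measured mouth size exceeds the
    (assumed) threshold; each returned index is a candidate scoring timing.
    """
    frames = []
    was_open = False
    for i, size in enumerate(sizes):
        is_open = size > closed_threshold
        if was_open and not is_open:
            frames.append(i)  # first closed frame after an open one
        was_open = is_open
    return frames


timings = closing_end_frames([0, 5, 8, 3, 0, 0, 6, 0])
```

In this sample series the mouth closes twice, so two candidate scoring timings are returned.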

Incidentally, since moving the mouth well leads to good vocalization, the karaoke apparatus according to the embodiments described above could also be used for practicing mouth movement. In that case, to tell the singer how well or poorly the mouth is moving, it is desirable to modify the apparatus so that the result of scoring based on the captured image is displayed as is, or so that negative image score data can be generated.

FIG. 1 is a diagram showing the hardware configuration of the karaoke apparatus according to the embodiments of the present invention.
FIG. 2 is a diagram schematically showing the structure of the music data according to the embodiments.
FIG. 3 is a diagram showing the functional configuration of the karaoke apparatus 10 according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the flow of the process performed by the aspect ratio measuring unit 11.
FIG. 5 is a diagram schematically illustrating the relationship between a captured image and a contour line.
FIG. 6 is a flowchart showing the flow of the aspect ratio scoring process.
FIG. 7 is a flowchart showing the flow of the comprehensive scoring process.
FIG. 8 is a diagram showing the functional configuration of the karaoke apparatus 20 according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of the process performed by the size measuring unit 21.
FIG. 10 is a flowchart showing the flow of the size scoring process.
FIG. 11 is a diagram showing the functional configuration of the karaoke apparatus 30 according to the third embodiment of the present invention.
FIG. 12 is a flowchart showing the flow of the shape scoring process.

Explanation of symbols

1: microphone, 2: ROM, 3: CPU, 4: RAM, 5: hard disk, 6: voice input I/F, 7: instruction input unit, 8: display panel, 9: camera, 10, 20, 30: karaoke apparatus, 11: aspect ratio measuring unit, 12: aspect ratio scoring unit, 13: voice scoring unit, 14: comprehensive scoring unit, 15: score display unit, 21: size measuring unit, 22: size scoring unit, 31: shape scoring unit.

Claims

In a karaoke device that performs automatic performance based on music data,
A shape detecting means for detecting the shape of the singer's mouth from a captured image of a camera that images the singer's mouth;
A karaoke apparatus comprising: shape scoring means for specifying the appropriateness of the shape detected by the shape detecting means based on the music data, and generating image score data representing the specified appropriateness.

In a karaoke device that performs automatic performance based on music data,
An aspect ratio measuring means for measuring an aspect ratio of the singer's mouth from a captured image of a camera that images the singer's mouth;
A karaoke apparatus comprising: aspect ratio scoring means for specifying appropriateness of the aspect ratio measured by the aspect ratio measuring means based on the music data, and generating image score data representing the specified appropriateness.

In a karaoke device that performs automatic performance based on music data including lyrics data,
A shape detecting means for detecting the shape of the singer's mouth from a captured image of a camera that images the singer's mouth;
Storage means for storing shape data representing an appropriate shape of the mouth for each vowel;
Vowel identification means for identifying a vowel that the singer should pronounce from the lyrics data;
A karaoke apparatus comprising: shape scoring means for specifying, for the vowel identified by the vowel identification means, the appropriateness of the shape detected by the shape detecting means based on the shape data stored in the storage means, and generating image score data representing the specified appropriateness.

In a karaoke device that performs automatic performance based on music data,
Size measuring means for measuring the size of the singer's mouth from a captured image of a camera that images the singer's mouth;
A karaoke apparatus comprising: size change scoring means for specifying the appropriateness of a temporal change in the size measured by the size measuring means based on the music data, and generating image score data representing the specified appropriateness.

The karaoke apparatus above, further comprising:
Voice scoring means for scoring the singing ability of the singer using a signal representing the singing voice of the singer, and generating voice score data representing the scoring result; and
Total scoring means for generating total score data representing the singing ability of the singer using the image score data and the voice score data.