JP5803172B2 - Evaluation device

Info

Publication number: JP5803172B2
Application number: JP2011056425A
Authority: JP
Inventors: 成山 隆一; 小林 詠子; 木村 誠
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-03-15
Filing date: 2011-03-15
Publication date: 2015-11-04
Anticipated expiration: 2031-03-15
Also published as: JP2012194241A

Description

The present invention relates to a technique for displaying a character when a sung voice is evaluated.

Conventionally, there are techniques for displaying a character to help a singer at karaoke or the like sing well. Patent Document 1 describes a technique in which, when such assistance is provided by outputting a synthesized singing voice (a so-called guide vocal), a character is displayed whose mouth shape or facial expression matches the phonemes, pitch, sounding timing, and so on of that singing voice.

Patent Document 1: JP 2001-42879 A

In the technique described in Patent Document 1, various facial expressions are displayed within a single piece of music, but for a given piece the pattern of expressions and the timing at which they are displayed are fixed. That is, however the singer sings, the character does not react to it, so it is difficult to give the singer the sense of presence of singing together with the displayed character. For the same reason, the singer cannot tell whether he or she is singing in the same way as the guide vocal, that is, the reference singing voice, and it is difficult to judge whether to keep singing the rest of the piece in the same way or to follow the guide vocal more closely.
The present invention has been made in view of these circumstances, and one of its objects is to display a character that reacts to the degree to which the voice sung by the singer follows the reference to be sung.

To solve the above problem, the present invention provides an evaluation device comprising: storage means for storing position display data that, when reproduced, indicates the position in a piece of music to be sung, and reference data indicating a reference for evaluating a singing voice; reproduction means for reading the position display data from the storage means and reproducing it; evaluation means for evaluating the singing voice represented by an audio signal supplied from sound collection means while the reproduction means is reproducing the position display data, by comparing it with the reference indicated by the reference data read from the storage means; and output means for outputting first image data representing an image of a character while the reproduction means is reproducing the position display data and, when the singing voice has been evaluated by the evaluation means, outputting second image data representing an image in which the character has a facial expression corresponding to the result of that evaluation, the output means outputting, as the second image data, data representing an image in which that facial expression is changed when a specific singing technique is detected in the singing voice represented by the audio signal.

In another preferred aspect, the evaluation means evaluates a first singing voice represented by the audio signal with an evaluation score by comparing it with the reference indicated by the reference data read from the storage means, and adds points to the evaluation score of the first singing voice if that score is higher than the evaluation score of a second singing voice that was compared with the same reference in the past.
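The bonus rule in this aspect can be illustrated with a short sketch. The source does not state how many points are added, so the function name `add_improvement_bonus` and the `bonus` amount are purely hypothetical.

```python
from typing import Optional


def add_improvement_bonus(current: float,
                          previous: Optional[float],
                          bonus: float = 5.0) -> float:
    """Add bonus points when the current score beats a past score.

    `previous` is the score of an earlier performance compared with the
    same reference, or None if there is none. The bonus size is an
    assumption; the source only says that points are added.
    """
    if previous is not None and current > previous:
        return current + bonus
    return current
```

An equal score earns no bonus here, because the text requires the new score to be higher than the past one.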

The present invention also provides an evaluation device comprising: storage means for storing position display data that, when reproduced, indicates the position in a piece of music to be sung, and reference data indicating a reference for evaluating a singing voice; reproduction means for reading the position display data from the storage means and reproducing it; evaluation means for evaluating, with an evaluation score, a first singing voice represented by an audio signal supplied from sound collection means while the reproduction means is reproducing the position display data, by comparing it with the reference indicated by the reference data read from the storage means, the evaluation means adding points to the evaluation score of the first singing voice if that score is higher than the evaluation score of a second singing voice that was compared with the same reference in the past; and output means for outputting first image data representing an image of a character while the reproduction means is reproducing the position display data and, when the first singing voice has been evaluated by the evaluation means, outputting second image data representing an image in which the character has a facial expression corresponding to the result of that evaluation.

In another preferred aspect, when a specific singing technique is detected in the singing voice represented by the audio signal, the output means outputs, as the second image data, data representing an image in which that facial expression is changed.

According to the present invention, it is possible to display a character that reacts to the degree to which the voice sung by the singer follows the reference to be sung.

A block diagram showing the overall configuration of the karaoke apparatus according to the embodiment.
A block diagram showing the functional configuration of the control unit of the karaoke apparatus.
A diagram showing an example of an image displayed on the display.
A diagram showing an example of an image displayed on the display.
A diagram showing an example of an image displayed on the display.
A diagram showing the configuration of a karaoke apparatus according to a modification.
A diagram for explaining how the direction of the display is adjusted.

[Embodiment]
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram showing the overall configuration of the karaoke apparatus 1. The karaoke apparatus 1 is installed in, for example, a karaoke shop. It provides users with the so-called karaoke function and also evaluates the voice (singing voice) sung by a user (singer), and it corresponds to an example of the "evaluation device" according to the present invention. The karaoke apparatus 1 includes a control unit 10, an operation unit 21, an acoustic processing unit 22, a sound collection unit 23, a sound emission unit 24, a stereoscopic image display unit 25, and a storage unit 30.
The control unit 10 includes a CPU (Central Processing Unit) and a memory. The CPU controls each unit of the karaoke apparatus 1 by executing a program stored in the memory. The memory includes a ROM (Read Only Memory) and a RAM (Random Access Memory), and stores the programs and data used by the CPU.
The operation unit 21 includes a plurality of operation buttons and supplies operation data indicating the user's operations to the control unit 10.
The sound collection unit 23 is sound collection means such as a microphone. It receives the singing voice produced by the singer and outputs an audio signal representing that singing voice to the acoustic processing unit 22.
The sound emission unit 24 is sound emission means such as a speaker, and emits the audio signal output from the acoustic processing unit 22 as sound.
The acoustic processing unit 22 includes a signal processing circuit such as a DSP (Digital Signal Processor), a sound source that generates an audio signal from a signal in MIDI (Musical Instrument Digital Interface) format, and the like. The acoustic processing unit 22 performs A/D conversion on the audio signal input from the sound collection unit 23 and outputs the result to the control unit 10. The acoustic processing unit 22 also receives MIDI-format data from the control unit 10 and generates an audio signal based on that data. The acoustic processing unit 22 applies signal processing such as effect processing and amplification to the audio signal generated in this way, the audio signal output from the control unit 10, the audio signal input from the sound collection unit 23, and so on, and then outputs them to the sound emission unit 24.
The stereoscopic image display unit 25 includes a display that supports stereoscopic images, and displays a stereoscopic image according to image data representing a stereoscopic image supplied from the control unit 10.

The storage unit 30 includes a hard disk and stores a music database 310 and a character database 320.
The music database 310 holds data on the pieces of music played for karaoke, specifically accompaniment data 311, reference data 312, and lyrics data 313. The accompaniment data 311 is data representing the accompaniment of a piece and is written in, for example, MIDI format. The reference data 312 is data indicating the reference to be sung, specifically the pitch of each constituent note to be sung. These references, that is, the pitches of the constituent notes, are each associated with the period during which that note is to be sung in the accompaniment represented by the accompaniment data 311, and are used to evaluate the singing voice input to the sound collection unit 23 during that period. This period is hereinafter called the "evaluation period". The start and end of an evaluation period are expressed as the time elapsed since reproduction of the accompaniment data 311 started. In the reference data 312, for example, the pitch and length of each constituent note (the length corresponding to the evaluation period) are written in MIDI format. The lyrics data 313 includes data representing the lyrics of the piece and data indicating the timing for changing the color of the lyrics telop displayed on the stereoscopic image display unit 25.
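As an illustration only, the reference data 312 could be held in memory as one record per constituent note. The sketch below assumes pitches are MIDI note numbers and evaluation periods are given in seconds from the start of the accompaniment; the names `ReferenceNote` and `find_note` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReferenceNote:
    """One constituent note of the reference data (hypothetical layout)."""
    midi_pitch: int  # pitch to be sung, as a MIDI note number
    start: float     # start of the evaluation period, seconds from accompaniment start
    end: float       # end of the evaluation period, seconds from accompaniment start


def find_note(notes: Sequence[ReferenceNote],
              elapsed: float) -> Optional[ReferenceNote]:
    """Return the note whose evaluation period contains `elapsed`, or None."""
    for note in notes:
        if note.start <= elapsed < note.end:
            return note
    return None
```

Treating each period as half-open (`start <= elapsed < end`) is a design choice of this sketch, made so that back-to-back notes never both match the same instant.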

The character database 320 holds data on the images displayed on the stereoscopic image display unit 25 during singing, specifically character image data 321. The character image data 321 is data (image data) representing images of various characters with various expressions. A character here is, for example, a drawn person, animal, or anthropomorphized object (such as a robot), and includes characters used on Internet community sites (so-called avatars). Photographs of real people, animals, or objects may also be used for these pictures. Here, a character's expression means the character's emotion, or information it is meant to convey, shown through its face or gestures: for example, a smiling face, an angry face, walking sadly, jumping for joy, waving a hand in time with the tempo, or signaling a particular timing with a glance.

Next, the functional blocks that are built in this hardware configuration when the control unit 10 executes the program will be described with reference to FIG. 2.
FIG. 2 is a block diagram showing the functional configuration of the control unit 10. The control unit 10 includes a reproduction unit 111, an evaluation unit 112, a specifying unit 113, an image generation unit 114, and a display control unit 115.
When operation data instructing reproduction of a piece of music is supplied from the operation unit 21, the reproduction unit 111 reproduces the accompaniment data 311 and lyrics data 313 of that piece shown in FIG. 1. Specifically, the reproduction unit 111 reads the accompaniment data 311 and lyrics data 313 of the piece from the music database 310. It then supplies the accompaniment data 311 to the acoustic processing unit 22 so that the accompaniment is emitted as sound, and supplies the lyrics data 313 to the display control unit 115 so that the lyrics are displayed. In doing so, the reproduction unit 111 supplies these data so that emission of the accompaniment and display of the lyrics start at the same time. While it is reproducing the accompaniment data 311, the reproduction unit 111 supplies data indicating the time elapsed since reproduction of the accompaniment data 311 started (the elapsed time) to the evaluation unit 112, the image generation unit 114, and the display control unit 115 every few milliseconds.

The evaluation unit 112 is means for evaluating the singer's singing voice by comparing it with the reference indicated by the reference data 312, and operates as follows. The evaluation unit 112 operates when an audio signal is supplied from the sound collection unit 23 shown in FIG. 1 via the acoustic processing unit 22 while data indicating the elapsed time is being supplied from the reproduction unit 111 (that is, while the accompaniment data 311 is being reproduced). First, the evaluation unit 112 reads from the music database 310 the reference data 312 of the piece indicated by the operation data supplied from the operation unit 21. Next, the evaluation unit 112 compares the pitch of the singing voice represented by the audio signal supplied at the time the elapsed-time data was supplied from the reproduction unit 111 with the reference (the pitch of a constituent note) that the reference data 312 associates with the evaluation period containing that elapsed time, and calculates the difference, for example in cents.
The evaluation unit 112 calculates this difference each time data indicating the elapsed time is supplied from the reproduction unit 111 (every few milliseconds). It then calculates, as an evaluation value, the sum of the differences calculated during an evaluation period and, when that value is greater than a first threshold, deducts points from the score that forms the basis of the evaluation score. The evaluation unit 112 keeps calculating evaluation values until reproduction of the accompaniment data 311 ends, and takes the resulting score as the final evaluation score. The evaluation score therefore becomes lower the larger the evaluation values are, that is, the further the pitch departs from the reference, and the more evaluation periods incur a deduction, that is, the more often the pitch departs from the reference. The first threshold is a value indicating how much pitch deviation between the singing voice and the constituent note is tolerated. For example, it may be made smaller to evaluate the singing strictly and larger to evaluate it leniently.
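The per-period calculation described above can be sketched as follows. This is a rough illustration, not the device's actual implementation: it assumes the sung pitch has already been estimated in hertz at each tick, that differences are accumulated as absolute values, and that the deduction is proportional to the evaluation value through a hypothetical `points_per_cent` factor (the source only says that points are deducted).

```python
import math
from typing import Iterable


def cents_between(sung_hz: float, reference_hz: float) -> float:
    """Absolute pitch difference in cents (1200 cents per octave)."""
    return abs(1200.0 * math.log2(sung_hz / reference_hz))


def evaluation_value(sung_hz: Iterable[float], reference_hz: float) -> float:
    """Sum the per-tick cent differences over one evaluation period."""
    return sum(cents_between(f, reference_hz) for f in sung_hz)


def apply_deduction(score: float, value: float, first_threshold: float,
                    points_per_cent: float = 0.001) -> float:
    """Deduct points only when the evaluation value exceeds the first threshold."""
    if value > first_threshold:
        score -= value * points_per_cent  # assumed scaling, not from the source
    return score
```

Calling `apply_deduction` once per evaluation period on a running score gives the behavior described: larger deviations and more deviating periods both lower the final score.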

The evaluation unit 112 also rates a singing voice whose evaluation value is at or below the first threshold as "good", one whose evaluation value is above the first threshold and at or below a second threshold as "normal", and one whose evaluation value is above the second threshold as "bad". The second threshold is a value for identifying places where the deduction was large and the evaluation poor, that is, the singer's "weak spots". Like the first threshold, the second threshold may be made smaller to evaluate the singing strictly and larger to evaluate it leniently. Each time an evaluation period ends, the evaluation unit 112 supplies data indicating the evaluation result ("good", "normal", or "bad") to the image generation unit 114, and when it has calculated the evaluation value for the last reference, it supplies the evaluation score calculated with that value included. When the evaluation result is "bad", the evaluation unit 112 also supplies the result to the specifying unit 113 together with data indicating the reference that was compared in that evaluation.
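The three-way rating can be written directly from the two thresholds. The function name `rate_period` is hypothetical, and the check that the first threshold does not exceed the second is an added safeguard rather than something stated in the source.

```python
def rate_period(value: float, first_threshold: float,
                second_threshold: float) -> str:
    """Rate one evaluation period as 'good', 'normal', or 'bad'."""
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")
    if value <= first_threshold:
        return "good"
    if value <= second_threshold:
        return "normal"
    return "bad"
```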

From among the periods associated with the references indicated by the reference data 312, the specifying unit 113 identifies the periods associated with references whose pitch is the same as that of the reference indicated by the data supplied from the evaluation unit 112, and supplies data indicating the identified periods (for example, data indicating the times at which each evaluation period starts and ends) to the image generation unit 114.
When operation data indicating a piece of music is supplied from the operation unit 21, the image generation unit 114 reads character image data matching the accompaniment data of that piece from the character database 320. The image generation unit 114 then generates image data by editing or compositing the read images and supplies it to the display control unit 115. The read image data is, for example, image data representing a character whose expression shows excitement in the chorus and, where vibrato is wanted, encourages the singer to sing with vibrato. The expression in these images changes while the accompaniment data is being reproduced, but the way it changes is fixed: for the same accompaniment data, the expression changes in the same way every time. The image data representing this character image corresponds to an example of the "first image data" according to the present invention. The image generation unit 114 operates in this way as long as no data is supplied from the evaluation unit 112 or the specifying unit 113.

When data indicating an evaluation result is supplied from the evaluation unit 112, on the other hand, the image generation unit 114 reads from the character database 320 image data of an image showing the character with an expression corresponding to that evaluation result. When data indicating an evaluation period is supplied from the specifying unit 113, the image generation unit 114 reads from the character database 320 the character image data 321 shown in FIG. 1 that shows the character with an expression corresponding to that evaluation period. In each case, the image generation unit 114 generates image data by editing or compositing these character images and supplies it to the display control unit 115. In both cases, the generated image data represents an image of the character with an expression corresponding to the result of the evaluation of the singing by the evaluation unit 112, and corresponds to an example of the "second image data" according to the present invention. In these cases, the image generation unit 114 supplies the generated image data to the display control unit 115 in place of the character image data matching the accompaniment data described above.
The display control unit 115 controls the operation of the stereoscopic image display unit 25 so that the image represented by the image data supplied from the image generation unit 114 is displayed on the display described above in step with the elapsed time indicated by the data supplied from the reproduction unit 111.

The images displayed through the operation of the units described above will now be described in detail with reference to FIGS. 3, 4, and 5. Each of these figures shows an example of an image displayed on the display of the karaoke apparatus 1. In FIGS. 3 and 4, the images displayed on the display as time passes are shown in the order (a), (b), (c).
FIG. 3 shows an example of images in which a character corresponding to the result of evaluating the singing is displayed. These images contain lyrics A, a reference image B, a singing result line C (C1, C2, C3), and a character D (D1, D2, D3). The lyrics A are the lyrics indicated by the lyrics data 313 shown in FIG. 1.
The part of the lyrics A that corresponds to the part of the accompaniment data 311 shown in FIG. 1 currently being reproduced changes from outlined characters to solid black characters, so that the singer can see the position that should be sung now.
The reference image B is an image showing the reference for the singing voice described above, specifically a bar-shaped image representing the pitch of each constituent note and its evaluation period. The reference image B is displayed over a staff, and on this staff time runs in the direction indicated by the arrow R1 along the lines. The reference image B indicates pitch by its position on the staff and the evaluation period by its length in the direction of the arrow R1. The reference image B is displayed based on the pitch indicated by the reference data 312 shown in FIG. 1 and the evaluation period associated with it.

The singing result line C is a line showing the trajectory of the pitch of the singing voice, drawn over the staff displayed together with the reference image B. It is shown as a two-dot chain line in the figures. Specifically, the singing result line C connects, in order, the points on the staff given by the pitch of the voice represented by the audio signal output from the sound collection unit 23 shown in FIG. 1 and the time at which that signal was supplied.
The character D is an image displayed according to the result of evaluating the singing voice. In FIG. 3(a), the part of the singing result line C1 in evaluation period X1 has been rated "normal", so a character D1 with an expression conveying "normal" is displayed. In FIG. 3(b), the part of the singing result line C2 in evaluation period X2 has been rated "bad", so a character D2 with an expression conveying "bad" is displayed. In FIG. 3(c), the part of the singing result line C3 in evaluation period X3 has been rated "good", so a character D3 with an expression conveying "good" is displayed. These images of the character D are displayed when each evaluation period has elapsed and data indicating the evaluation result is supplied from the evaluation unit 112 shown in FIG. 2 to the image generation unit 114.

FIG. 4 shows an example of images in which a character with an expression corresponding to the evaluation period identified by the specifying unit 113 is displayed. The description of FIG. 4 focuses on the differences from FIG. 3. In FIG. 4, a character E (E1, E2, E3) has expressions for signaling the "weak spot" described above. In this example, the evaluation period of a reference that shares its pitch with the reference in evaluation period X2 (pitch G), which was rated "bad" in FIG. 3(b), is evaluation period Y1, which starts at time t3. FIGS. 4(a) and 4(b) show the images displayed at a time T1 before time t3 and at a time T2 before time t3, respectively, and FIG. 4(c) shows the image displayed at time t3.
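The lookup performed by the specifying unit 113 in this example (finding period Y1 from the badly rated period X2) can be sketched as below. The tuple layout and the name `periods_with_same_pitch` are hypothetical. Restricting the search to notes after the badly rated one is also an assumption of this sketch, made because the cues are shown ahead of an upcoming period; the source only says that periods with the same pitch are identified.

```python
from typing import List, Sequence, Tuple

# (MIDI pitch, period start, period end); a hypothetical layout
Note = Tuple[int, float, float]


def periods_with_same_pitch(notes: Sequence[Note],
                            bad_index: int) -> List[Tuple[float, float]]:
    """Return later evaluation periods whose pitch matches the badly rated note."""
    bad_pitch = notes[bad_index][0]
    return [(start, end)
            for pitch, start, end in notes[bad_index + 1:]
            if pitch == bad_pitch]
```

With MIDI note 67 standing in for the pitch G of the example, a "bad" rating on the first G returns every later period in which G must be sung again.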

The character E1 shown in FIG. 4(a) is an image indicating that the "weak spot" is coming up soon and how the singer should sing when it arrives. Specifically, by raising its left arm E1L, the character E1 indicates that the "weak spot" is approaching. By raising its right arm E1R, the character E1 also indicates that the singer should sing slightly higher when producing that note. The character E2 shown in FIG. 4(b) indicates that the "weak spot" is getting closer by gradually lowering its left arm E2L in the direction of the arrow R2. The character E3 shown in FIG. 4(c) indicates that the "weak spot" has been reached by dropping its left arm E3L all at once in the direction of the arrow R3. In this way, the character E is displayed before the "weak spot", that is, before time t3 at which evaluation period Y1 starts, and the change in its expression shows that this timing is approaching. This makes it easier for the singer to start singing in time with the start of evaluation period Y1, so the karaoke apparatus 1 can help the singer sing well.
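One way to choose among the cue images E1, E2, and E3 from the elapsed time is sketched below. It assumes the lead times satisfy T1 > T2 > 0 and that the display falls back to the ordinary character once the evaluation period ends; the function name `cue_for` and the `"default"` label are hypothetical.

```python
def cue_for(elapsed: float, start: float, end: float,
            lead_t1: float, lead_t2: float) -> str:
    """Pick the cue image to show around a weak spot starting at `start`.

    Assumes lead_t1 > lead_t2 > 0 (both measured back from `start`).
    """
    if elapsed >= end:
        return "default"  # period is over: back to the ordinary character
    if elapsed >= start:
        return "E3"       # arm dropped: the weak spot has started
    if elapsed >= start - lead_t2:
        return "E2"       # arm lowering gradually: getting close
    if elapsed >= start - lead_t1:
        return "E1"       # arm raised: weak spot coming up
    return "default"
```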

FIG. 5 is a diagram showing examples of images representing various expressions of the character. FIG. 5(a) shows images for telling the singer how to correct the pitch (the correction method). Character G (G1, G2) indicates whether to raise or lower the pitch by how its right arm is held: when character G1 with the right arm raised is displayed, it indicates that the singer should "raise the pitch", and when character G2 with the right arm lowered is displayed, it indicates that the singer should "lower the pitch". The number of raised fingers may indicate how much the pitch should be raised (or lowered); for example, one finger may indicate a semitone and two fingers a whole tone. The angle at which the right arm is raised may also indicate the degree of pitch change.
FIG. 5(b) shows images for telling the singer how to correct the volume. Character H (H1, H2, H3) indicates what to do with the volume by how its mouth is opened. In this example, when character H1 with a slightly open mouth is displayed, it indicates that the singer should "lower the volume"; when character H2 with a wide-open mouth is displayed, it indicates that the singer should "raise the volume"; and when character H3 with a closed mouth is displayed, it indicates that the singer "should not vocalize".
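As a rough illustration of the pitch cue of FIG. 5(a), the following Python sketch maps a pitch error to a character and a finger count. The function name `pitch_gesture`, the use of MIDI note numbers (67 is G4), and the cap at two fingers for larger errors are assumptions made for this example, not details taken from the description.

```python
def pitch_gesture(reference_midi, sung_midi):
    """Choose the FIG. 5(a) character and finger count for a pitch correction.

    Returns None when the sung pitch rounds to the reference, otherwise
    (character, fingers): "G1" (right arm raised) means raise the pitch,
    "G2" (right arm lowered) means lower it; one finger stands for a
    semitone and two fingers for a whole tone.
    """
    diff = round(reference_midi - sung_midi)  # semitones the singer is off by
    if diff == 0:
        return None
    character = "G1" if diff > 0 else "G2"
    fingers = min(abs(diff), 2)  # assumption: larger errors are capped at two fingers
    return character, fingers


print(pitch_gesture(67, 66))    # sung one semitone flat
print(pitch_gesture(67, 69.1))  # sung about a whole tone sharp
```

The arm angle mentioned in the text could be derived from `diff` in the same way instead of, or in addition to, the finger count.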

FIG. 5(c) shows images for conveying the volume correction method to the singer with a stereoscopic image. Character J (J1, J2, J3) indicates how much the volume should be increased by how far the image appears to pop out. For convenience of explanation, the degree to which each character J appears to pop out is shown virtually as a horizontal offset K1, K2, K3 (K1 < K2 < K3) from the non-stereoscopic image; the larger the offset, the farther the character appears to pop out. In this example, when character J1 with the smallest pop-out (K1) is displayed, it indicates that the singer should "raise the volume slightly"; when character J3 with the largest pop-out (K3) is displayed, it indicates that the singer should "raise the volume a lot"; and when character J2 with an intermediate pop-out (K2) is displayed, it indicates that the singer should "raise the volume".
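One way to picture the three pop-out levels is as a lookup from the required volume increase to a character and its horizontal offset. In the sketch below, only the ordering K1 < K2 < K3 comes from the description; the pixel values, the decibel boundaries, and the name `popout_character` are hypothetical.

```python
# Hypothetical horizontal offsets in pixels; only K1 < K2 < K3 comes from the text.
OFFSETS = {"J1": 4, "J2": 8, "J3": 12}


def popout_character(shortfall_db):
    """Map how much louder the singer should be to a FIG. 5(c) character.

    The 3 dB and 6 dB boundaries are illustrative assumptions.
    Returns None when no increase is needed, else (character, offset).
    """
    if shortfall_db <= 0:
        return None
    if shortfall_db < 3:
        name = "J1"  # raise the volume slightly
    elif shortfall_db < 6:
        name = "J2"  # raise the volume
    else:
        name = "J3"  # raise the volume a lot
    return name, OFFSETS[name]


print(popout_character(4.5))
```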

In the karaoke apparatus 1, the timing at which to carry out the various instructions conveyed by the expressions shown in FIG. 5 is conveyed by the left-arm expressions shown in FIG. 4, so the singer can know when and how to produce the sounds he or she is weak at. Because the singer can prepare in advance to improve even a weak sound before producing it, the evaluation is more likely to improve. In addition, while reproducing the accompaniment data 311, the karaoke apparatus 1 displays an image representing the character and changes the display of the character according to the result of evaluating the singing. The karaoke apparatus 1 can thereby display a character that reacts to the degree to which the singing voice follows the reference to be sung, giving the singer a sense of presence as if singing together with the character.

Further, as shown in FIG. 5(c), the karaoke apparatus 1 can display the character image stereoscopically and tell the singer how to sing by the degree to which the image pops out. The display of the stereoscopic image display unit 25 shown in FIG. 1 is one on which an image appears stereoscopic when viewed from a predetermined region relative to the surface on which the image is displayed (the display surface). For example, the display that the singer sees from the stage of a karaoke room is often invisible or hard to see for anyone other than the singer. If the character is displayed on this display and the way of singing is conveyed by the degree of pop-out, it is hard for anyone other than the singer to tell that the character is conveying how to sing. The singer can thus be told how to sing by the character in a way that others are unlikely to notice, and can improve his or her singing so that it is evaluated highly.

[Modifications]
The embodiment described above is merely one example of carrying out the present invention. Various applications and modifications are possible as follows, and they may be combined as needed.

(Modification 1)
The karaoke apparatus according to the present invention may display images so that the character appears stereoscopic to the singer. For example, when the karaoke apparatus has multiple displays in its display unit, it may detect the position of the singer and display the character only on a display that can be seen from that position, or it may display the character stereoscopically only on that display. The karaoke apparatus may also rotate a display with a motor or the like and display the character on the display whose orientation has been adjusted so that it can be seen from the detected position.

FIG. 6 is a diagram showing the configuration of a karaoke apparatus 1a according to this modification. The sound collection unit 23 has two microphones 231 (231a, 231b), and the sound emission unit 24 has two speakers 241 (241a, 241b). The stereoscopic image display unit 25 has two displays 251 (251a, 251b), and each display is provided with a motor 253 (253a, 253b). Each motor 253 rotates its display, which changes the direction in which the display shows images. The storage unit 30a stores display data 330a. The display data 330a indicates the position at which each display is installed and the direction in which it displays images. These positions and directions are entered through the operation unit 21 shown in FIG. 1 when each display is installed, and are stored as the display data 330a.

The control unit 10a has a position detection unit 117a, a direction calculation unit 118a, and a motor control unit 119a. The position detection unit 117a is a means for detecting the position at which the singer is singing (the singing position). Specifically, the position detection unit 117a outputs data representing a sound for measuring the position of the microphones 231 (a measurement sound) to the two speakers 241 via the acoustic processing unit 22, and causes these speakers 241 to emit the measurement sound. At this time, the position detection unit 117a acquires the time at which each speaker 241 emitted the measurement sound. Next, the position detection unit 117a calculates, from the audio signal representing the sound picked up by each microphone 231, the time at which the measurement sound was picked up. From the time between emission and pickup of the measurement sound, the position detection unit 117a calculates the distance from the speaker that emitted the sound to the microphone that picked it up; by doing this for both speakers 241, it measures the position of each microphone 231. Of the measured microphone positions, the position detection unit 117a detects, as the singing position, the position of the microphone that picked up the voice being evaluated by the evaluation unit 112. The position detection unit 117a supplies data indicating the detected singing position to the direction calculation unit 118a. The microphones 231, the speakers 241, and the position detection unit 117a cooperate to function as detection means for detecting the singing position.
The direction calculation unit 118a determines the display on which the character should be displayed from the position indicated by the supplied data and the positions and directions indicated by the display data 330a read from the storage unit 30a, and calculates the direction in which that display should show the image. Details of the operation of the direction calculation unit 118a are described later with reference to FIG. 7. The direction calculation unit 118a supplies calculation result data, indicating the calculated direction and the display that should show the image in that direction, to the image generation unit 114a, the display control unit 115a, and the motor control unit 119a.
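The position measurement described above amounts to converting two propagation delays into distances and intersecting two circles centered on the speakers. The Python sketch below shows one way this could be done in a two-dimensional floor plan. The function names, the speed-of-sound constant, and the rule for choosing between the two mirror-image intersection points (the one to the left of the line from the first speaker to the second, taken here as "in front of" the speakers) are assumptions for illustration; the description does not say how that ambiguity is resolved.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C


def distance_from_delay(t_emit, t_pickup):
    """Distance travelled by the measurement sound between emission and pickup."""
    return SPEED_OF_SOUND * (t_pickup - t_emit)


def locate_microphone(s1, s2, d1, d2):
    """Intersect the circles of radius d1 around s1 and d2 around s2.

    Two circles generally meet at two mirror-image points; this sketch
    returns the one to the left of the direction s1 -> s2 (an assumption).
    """
    dx, dy = s2[0] - s1[0], s2[1] - s1[1]
    base = math.hypot(dx, dy)
    if base == 0 or d1 + d2 < base or abs(d1 - d2) > base:
        raise ValueError("distances are inconsistent with the speaker positions")
    a = (d1 ** 2 - d2 ** 2 + base ** 2) / (2 * base)  # along-baseline distance from s1
    h = math.sqrt(max(d1 ** 2 - a ** 2, 0.0))          # perpendicular distance from baseline
    px, py = s1[0] + a * dx / base, s1[1] + a * dy / base
    return (px - h * dy / base, py + h * dx / base)


# Hypothetical room: speakers 4 m apart on the x axis, microphone at (1, 2).
mic = locate_microphone((0.0, 0.0), (4.0, 0.0), math.sqrt(5.0), math.sqrt(13.0))
print(mic)
```

With more than two speakers the ambiguity could be removed without a side assumption, but the modification as described uses two.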

The image generation unit 114a generates image data that includes the image representing the character and image data that does not include it, and supplies both to the display control unit 115a.
Of the image data supplied from the image generation unit 114a, the display control unit 115a causes the image indicated by the image data that includes the character image to be displayed on the display indicated by the calculation result data supplied from the direction calculation unit 118a, and causes the image indicated by the image data that does not include it to be displayed on the other displays.
The motor control unit 119a controls the operation of the motor 253 provided on the display indicated by the calculation result data supplied from the direction calculation unit 118a, and adjusts that display so that it faces the direction indicated by the calculation result data. The display control unit 115a, the direction calculation unit 118a, and the motor control unit 119a cooperate to function as the "display control means" according to the present invention.

FIG. 7 is a diagram for explaining how the direction of a display is adjusted. FIG. 7 schematically shows each display and users M1, M2, and M3, including the singer, as viewed from vertically above. The displays 251a and 251b display images on display surfaces 252a and 252b, respectively. In FIG. 7, dotted lines Xa and Xb indicate the boundaries between a first region, from which the image on each display surface appears stereoscopic, and the remaining second region. That is, the first region is the region between the two dotted lines Xa and, likewise, the region between the two dotted lines Xb.
In FIG. 7(a), user M1 is the singer and is inputting voice through microphone 231a. In this case, the position detection unit 117a shown in FIG. 6 detects the position of the microphone 231a used by the singer as the singing position. Because the detected singing position lies within the region indicated by the dotted lines Xa, the direction calculation unit 118a shown in FIG. 6 determines that the display 251a should display the character, and calculates its current direction as the direction in which that display should show the image. The display control unit 115a shown in FIG. 6 accordingly causes the display 251a to show an image including the character and the display 251b to show an image not including the character. Because the calculated direction is the current direction, the motor control unit 119a shown in FIG. 6 does not rotate the motor.

FIG. 7(b) shows a state in which the singer has changed from the state of FIG. 7(a) to user M3, who is inputting voice through microphone 231b. In the state of FIG. 7(a), no display allowed images to be viewed stereoscopically from the position of microphone 231b. In this case, the direction calculation unit 118a calculates, for each display, the direction it would face if turned so that the detected singing position falls within the region indicated by the dotted lines Xa or Xb. Among the displays that can face the calculated direction, the direction calculation unit 118a determines the one that requires the smallest rotation angle to face that direction (display 251b in the example of FIG. 7(b)) as the display that should show the character. The direction calculation unit 118a then calculates the direction in which display 251b should show the image. The display control unit 115a displays an image including the character on display 251b, and the motor control unit 119a rotates display 251b from the state shown by the two-dot chain line in FIG. 7(b) to the state shown by the solid line. As a result, the singer sees the character on the display stereoscopically without turning the display or moving. Thus, wherever the singer is singing, it is hard for others to tell how far the character pops out, and the singer can improve his or her singing so that it is evaluated highly without others knowing.
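The display selection described for FIG. 7 can be sketched as follows. This is a simplified model under several assumptions that are not in the description: the first region is treated as a wedge of half-angle `half_view_deg` around the facing direction, a display outside whose wedge the singer stands is turned to face the singer directly, each motor has a rotation limit `max_turn_deg`, and all names (`Display`, `required_turn`, `choose_display`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Display:
    name: str
    x: float
    y: float
    facing_deg: float     # direction the display surface currently faces
    half_view_deg: float  # half-width of the stereoscopic (first) region, assumed a wedge
    max_turn_deg: float   # largest rotation the motor can apply (assumption)


def _wrap(deg):
    """Wrap an angle to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def required_turn(display, singer):
    """Signed rotation needed so the singer can see the image stereoscopically."""
    bearing = math.degrees(math.atan2(singer[1] - display.y, singer[0] - display.x))
    diff = _wrap(bearing - display.facing_deg)
    # No rotation if the singer is already inside the first region.
    return 0.0 if abs(diff) <= display.half_view_deg else diff


def choose_display(displays, singer):
    """Return (name, new_facing_deg) for the display needing the smallest rotation."""
    candidates = []
    for d in displays:
        turn = required_turn(d, singer)
        if abs(turn) <= d.max_turn_deg:
            candidates.append((d, turn))
    if not candidates:
        return None
    best, turn = min(candidates, key=lambda pair: abs(pair[1]))
    return best.name, best.facing_deg + turn


displays = [
    Display("251a", 0.0, 0.0, 90.0, 15.0, 60.0),
    Display("251b", 4.0, 0.0, 90.0, 15.0, 60.0),
]
print(choose_display(displays, (0.0, 3.0)))  # singer in front of 251a
print(choose_display(displays, (5.0, 3.0)))  # singer off to the side of both
```

In the first call no motor needs to move, which corresponds to FIG. 7(a); in the second, the display with the smaller required rotation is chosen, which corresponds to FIG. 7(b).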

(Modification 2)
The karaoke apparatus according to the present invention may capture an image of the singer and display the captured image combined with the character image. In this case, the karaoke apparatus includes imaging means for capturing an image of the singer, the image generation unit generates image data in which the image captured by the imaging means is combined with the various images described above, and this image data is output to and displayed on the display of the stereoscopic image display unit. The karaoke apparatus can thereby give the singer the feeling of singing a duet with the character, and can heighten the sense of presence compared with the case where the combined image data is not displayed.

(Modification 3)
In addition to the character expressions shown in the embodiment above, the karaoke apparatus according to the present invention may convey emotions and the like with various other expressions. For example, the character may signal the timing with a glance, or, when the singer sings well (when the evaluation is high), the character may jump up and down to liven up the singer. The expression may also be changed independently of the reference, for example when vibrato is detected. The character's expressions and the content they convey may be associated in any way, as long as the singer can recognize the content conveyed by each expression. For example, a table associating each expression with the content it indicates may be provided to users in advance. In this way, even if people interpret expressions differently, the intended emotion or content can be conveyed to the singer. The user may also be allowed to set this association, for example by operating the operation unit 21. In addition to the character's expression, emotions and the like may be conveyed by the background; for example, as the evaluation goes from high to low, the background color may change from blue to red to warn that mistakes are increasing. Text may also be displayed, or speech may be produced. When speech is produced, the character may be displayed with its mouth shape changed according to the sound being pronounced.
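The background-color warning mentioned above could be realized with a simple linear blend. The sketch below assumes a normalized evaluation in the range 0 to 1 (1 being the best) and 8-bit RGB tuples; both the normalization and the name `background_color` are assumptions made for illustration.

```python
def background_color(evaluation):
    """Blend from blue (high evaluation) to red (low evaluation).

    `evaluation` is assumed to be normalized to [0, 1]; values outside
    that range are clamped. Returns an (R, G, B) tuple of 8-bit values.
    """
    level = min(max(evaluation, 0.0), 1.0)
    red = round(255 * (1.0 - level))
    blue = round(255 * level)
    return (red, 0, blue)


print(background_color(1.0))  # best evaluation: pure blue
print(background_color(0.0))  # worst evaluation: pure red
```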

(Modification 4)
The karaoke apparatus according to the present invention need not represent each character expression with a separate image. For example, it may combine several parts that make up an expression, so that a handful of parts yields dozens of expressions, or it may generate expressions using parameters. This makes the character image data smaller than when one image is prepared for each expression.
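A small example shows why combining parts saves storage. The part names below are invented for illustration; the point is only that the number of stored part images grows as a sum while the number of expressible combinations grows as a product.

```python
from itertools import product

# Hypothetical part sets; the names are illustrative only.
EYES = ["open", "closed", "wink"]
MOUTHS = ["closed", "small", "wide", "smile"]
ARMS = ["both_down", "left_up", "right_up"]


def all_expressions():
    """Enumerate every expression that can be assembled from the parts."""
    return [
        {"eyes": eyes, "mouth": mouth, "arms": arms}
        for eyes, mouth, arms in product(EYES, MOUTHS, ARMS)
    ]


part_images = len(EYES) + len(MOUTHS) + len(ARMS)  # images that must be stored
expressions = all_expressions()                    # expressions that can be shown
print(part_images, len(expressions))
```

Here ten stored part images are enough to assemble thirty-six distinct expressions.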

(Modification 5)
In the present invention, various features of the singing voice may be used as the reference for evaluation. For example, features such as beat length or volume may be used as the reference, or singing techniques such as "vibrato" or "kobushi" (a melismatic ornament) may be treated as features and used.

(Modification 6)
The control unit according to the present invention may evaluate singing by various methods, not only the method described in the embodiment above. For example, it may use an additive scheme in which 2 points are added when the evaluation value calculated by the control unit is at or below a first threshold and 1 point is added when it is at or below a second threshold, with the total used as the evaluation score. The evaluation period may also differ from the period in which a constituent note is to be sung; for example, a fixed time such as 1 second or 2 seconds may be used as the evaluation period, or the user may set the length of the evaluation period by operating the operation unit 21 or the like.
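The additive scheme above can be sketched as follows. The sketch assumes that a smaller evaluation value means the singing is closer to the reference, that the first threshold is smaller than the second, and that a value meeting the first threshold earns 2 points in total rather than 2 + 1; none of this is spelled out in the text, and the names are hypothetical.

```python
def points_for(value, first_threshold, second_threshold):
    """Points earned in one evaluation period (smaller value = closer to the reference)."""
    if value <= first_threshold:
        return 2
    if value <= second_threshold:
        return 1
    return 0


def evaluation_score(values, first_threshold, second_threshold):
    """Total the points over all evaluation periods."""
    if first_threshold >= second_threshold:
        raise ValueError("expected first_threshold < second_threshold")
    return sum(points_for(v, first_threshold, second_threshold) for v in values)


# Hypothetical evaluation values (e.g. pitch error in cents) for four periods.
print(evaluation_score([5, 30, 80, 12], first_threshold=20, second_threshold=50))
```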

(Modification 7)
In the embodiment described above, the karaoke apparatus according to the present invention displayed, as separate images, a character with an expression corresponding to the evaluation result from the evaluation unit 112 and a character with an expression for conveying the evaluation period specified by the specifying unit 113. However, these may be displayed at the same time, or only one of them may be displayed. Displaying the former readily gives the singer a sense of presence, and displaying the latter helps the singer sing well.

(Modification 8)
In the present invention, if the singer sings better during an evaluation period specified by the specifying unit 113 than when singing in the past, points may be added to raise the evaluation score. Specifically, the evaluation unit 112 adds one point to the evaluation score when the evaluation period is one specified by the specifying unit 113 and the evaluation value is smaller than the evaluation value for the immediately preceding reference among the references with the same pitch. Alternatively, the point may be added when the evaluation value is smaller than the average of multiple past evaluation values, or the number of points deducted may be reduced instead of adding points.
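The improvement bonus can be sketched with a per-pitch history of evaluation values. As in the text, a smaller evaluation value is treated as better. The dictionary-based history, the `use_average` switch, and the function name `improvement_bonus` are assumptions made for this example.

```python
def improvement_bonus(history, pitch, value, is_specified_period, use_average=False):
    """Return 1 if the singer improved on a past attempt at the same pitch, else 0.

    `history` maps a reference pitch to the list of past evaluation values
    and is updated in place with the new value.
    """
    past = history.get(pitch, [])
    bonus = 0
    if is_specified_period and past:
        baseline = sum(past) / len(past) if use_average else past[-1]
        if value < baseline:  # smaller evaluation value = better singing
            bonus = 1
    history.setdefault(pitch, []).append(value)
    return bonus


history = {}
print(improvement_bonus(history, "G", 40, is_specified_period=False))  # first attempt
print(improvement_bonus(history, "G", 25, is_specified_period=True))   # improved
print(improvement_bonus(history, "G", 30, is_specified_period=True))   # worse than last time
```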

(Modification 9)
In the embodiment described above, the karaoke apparatus according to the present invention included means for displaying stereoscopic images (the stereoscopic image display unit), but it may instead include means for displaying two-dimensional images. In this case, the image generation unit may use perspective to generate, as a two-dimensional image, an image that appears to pop out as shown in FIG. 5(c). Alternatively, instead of displaying an image that appears to pop out, it may display flat character images such as those shown in FIG. 4 and FIGS. 5(a) and 5(b). Even in this case, the karaoke apparatus can display a character that reacts to the degree to which the singing voice follows the reference to be sung.

(Modification 10)
In the embodiment described above, the karaoke apparatus according to the present invention included the display control unit and the stereoscopic image display unit, but these units may be provided in an external device. In this case, the image generation unit may output the image data, for example via a network, to the display control unit provided in the external device.

(Modification 11)
In addition to the character's expression, the karaoke apparatus according to the present invention may use the background to express emotions or information to be conveyed. In this case, background data, which is image data to be displayed as the character's background, may be stored in the storage unit. The background here includes not only a background in the usual sense of scenery behind the character but also anything that, like the character's expression, represents the character's emotions or the information to be conveyed by means of color, shape, text, or the like.

(Modification 12)
In the karaoke apparatus according to the present invention, the time or position at which the singer should sing may be indicated not only by the accompaniment sound but also by images or the like. For example, in the embodiment described above, as shown in FIG. 3, the color of lyrics A is changed and the boundary where the color changes indicates the position to be sung now, so the singer can know the position to sing even without accompaniment. Alternatively, a mark indicating the position currently being played may be displayed on the staff shown in FIG. 3 and moved in the direction of arrow R1 to indicate the position to sing. The "accompaniment data 311" and "lyrics data 313" of the embodiment described above and the data for displaying the mark in this modification are all data that, when reproduced, indicate the position in the song to be sung, and each is an example of the "position display data" according to the present invention.

(Modification 13)
In the embodiment described above, the image generation unit 114 supplied the generated image data (second image data) to the display control unit 115 in place of the character image data corresponding to the accompaniment data (first image data); however, both may be supplied to the display control unit 115. For example, the image generation unit 114 may generate, as the second image data, data showing an image in which the expression of the character indicated by the first image data is represented by an expression corresponding to the evaluation result described above or to the evaluation period specified by the specifying unit 113, and the display control unit 115 may display the image indicated by the second image data overlaid on the image indicated by the first image data. Even in this case, the karaoke apparatus 1 can display a character that reacts to the degree to which the singing voice follows the reference to be sung.

(Modification 14)
The present invention may also be specified as a program for causing a computer to function as the evaluation apparatus according to the present invention. Such a program may be provided recorded on a recording medium such as an optical disc, or may be provided by having a computer download it via a network such as the Internet and install it for use.

DESCRIPTION OF REFERENCE SIGNS: 1 ... karaoke apparatus; 10, 10a ... control unit; 21 ... operation unit; 22 ... acoustic processing unit; 23 ... sound collection unit; 24 ... sound emission unit; 25 ... stereoscopic image display unit; 30 ... storage unit; 111 ... reproduction unit; 112 ... evaluation unit; 113 ... specifying unit; 114 ... image generation unit; 115, 115a ... display control unit; 117a ... position detection unit; 118a ... direction calculation unit; 119a ... motor control unit; 231 ... microphone; 251 ... display; 252 ... display surface; 253 ... motor; 310 ... song database; 311 ... accompaniment data; 312 ... reference data; 313 ... lyrics data; 320 ... character database; 321 ... character image data; 330 ... display data

Claims

An evaluation apparatus comprising:
storage means for storing position display data that, when reproduced, indicates the position in a song to be sung, and reference data indicating a reference for evaluating a singing voice;
reproduction means for reading the position display data from the storage means and reproducing it;
evaluation means for evaluating the singing voice by comparing the singing voice represented by an audio signal supplied from sound collecting means while the reproduction means is reproducing the position display data with the reference indicated by the reference data read from the storage means; and
output means for outputting first image data indicating an image of a character while the reproduction means is reproducing the position display data and, when the singing voice has been evaluated by the evaluation means, outputting second image data indicating an image in which the expression of the character is an expression corresponding to the result of evaluating the singing voice, the output means outputting, as the second image data, data indicating an image in which the expression is changed when a specific singing technique is detected from the singing voice represented by the audio signal.

The evaluation apparatus according to claim 1, wherein the evaluation means compares a first singing voice represented by the audio signal with the reference indicated by the reference data read from the storage means and evaluates the first singing voice with an evaluation score, and, if the evaluation score of the first singing voice is higher than the evaluation score of a second singing voice that was compared with the reference in the past, adds points to the evaluation score of the first singing voice.

An evaluation apparatus comprising:
storage means for storing position display data that, when reproduced, represents the position in a song that should be sung, and reference data indicating a reference for evaluating a singing voice;
reproduction means for reading out the position display data from the storage means and reproducing it;
evaluation means for comparing a first singing voice, represented by an audio signal supplied from sound collecting means while the reproduction means is reproducing the position display data, with the reference indicated by the reference data read from the storage means, and evaluating the first singing voice with an evaluation score, the evaluation means adding points to the evaluation score of the first singing voice if that score is higher than the evaluation score of a second singing voice that was compared with the reference in the past; and
output means for outputting first image data indicating an image of a character while the reproduction means is reproducing the position display data and, when the first singing voice is evaluated by the evaluation means, outputting second image data indicating an image in which the facial expression of the character is an expression according to the result of evaluating the first singing voice.

The evaluation apparatus according to claim 3, wherein, when a specific singing technique is detected from the singing voice represented by the audio signal, the output means outputs, as the second image data, data indicating an image in which the facial expression has been changed.
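
As a non-limiting illustration, the score adjustment recited in claims 2 and 3 and the expression selection recited in claims 1 and 4 might be sketched in Python as follows. The claims do not specify the number of points added, the score thresholds, or the expression names, so `IMPROVEMENT_BONUS`, `EXPRESSION_THRESHOLDS`, `DEFAULT_EXPRESSION`, and the `_changed` suffix are hypothetical values chosen only for this example.

```python
from typing import Optional

# Hypothetical values: the claims give no bonus size, thresholds, or labels.
IMPROVEMENT_BONUS = 5
EXPRESSION_THRESHOLDS = [(80, "happy"), (50, "neutral")]  # highest first
DEFAULT_EXPRESSION = "sad"


def score_with_bonus(first_score: int, past_score: Optional[int]) -> int:
    """Add points when the current (first) score exceeds a past (second) score."""
    if past_score is not None and first_score > past_score:
        return first_score + IMPROVEMENT_BONUS
    return first_score


def select_expression(score: int, technique_detected: bool) -> str:
    """Pick an expression from the score, then alter it if a technique was detected."""
    base = DEFAULT_EXPRESSION
    for threshold, expression in EXPRESSION_THRESHOLDS:
        if score >= threshold:
            base = expression
            break
    # Assumed convention: a suffix marks the "changed" expression image.
    return base + "_changed" if technique_detected else base
```

Under these assumed values, a current score of 78 against a past score of 70 is raised to 83, which selects the "happy" expression; if a singing technique is also detected, the result is "happy_changed".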