JP2019101071A

JP2019101071A - Singing evaluation device, singing evaluation program and karaoke device

Info

Publication number: JP2019101071A
Application number: JP2017228478A
Authority: JP
Inventors: 佳紀原; Yoshinori Hara
Original assignee: Xing Inc
Current assignee: Xing Inc
Priority date: 2017-11-28
Filing date: 2017-11-28
Publication date: 2019-06-24
Anticipated expiration: 2037-11-28
Also published as: JP6810676B2

Abstract

To evaluate singing accurately, with a limited processing load, even for songs that lack the singing evaluation information (main melody information) normally used as the reference for evaluation.

SOLUTION: A singing evaluation device includes: singing pitch vicinity range calculation means that frequency-analyzes a singing voice signal input from a microphone and calculates a singing pitch vicinity range, i.e. the frequency range around the pitch of the singing voice signal; model singing characteristic calculation means that calculates a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing voice; and evaluation means that evaluates the singing voice signal by comparing its singing voice characteristic with the model singing characteristic.

SELECTED DRAWING: Figure 6

Description

The present invention relates to a singing evaluation device, a singing evaluation program, a singing evaluation method, and a karaoke device that evaluate singing performed along with reproduced accompaniment sound, as in karaoke.

Some conventional karaoke devices, which let users sing along with accompaniment, provide a scoring function as a form of singing evaluation. Scoring in a karaoke device is generally performed by comparing the main melody information contained in the accompaniment information with the singing voice signal input from the microphone. Consequently, the scoring function could not be used with sound sources that lack main melody information, such as CDs.

Patent Document 1 discloses a voice evaluation device that can score (evaluate) singing even for a sound source that lacks main melody information, such as a CD. This device evaluates the voice data by comparing an evaluation target pitch, extracted along the time axis from voice data input from a microphone, with the reference pitches of multiple sounds, each extracted along the time axis from audio data containing those sounds.

Patent Document 1: JP 2016-122164 A

The voice evaluation device of Patent Document 1 can evaluate singing even for a sound source that has no main melody (correct data), such as a commercially available music CD. However, because Patent Document 1 must extract multiple reference pitches, the processing load required for singing evaluation is expected to be large. In addition, the evaluation target pitch is compared with the reference pitches of every sound contained in the audio data, such as the main vocal, chorus, and various instruments, which likely makes the comparison itself costly as well. Moreover, unless the evaluation target pitch is compared with the appropriate one of the multiple reference pitches, the singing evaluation may be inaccurate.

The present invention was made in view of these circumstances. Its object is to provide a singing evaluation device, a singing evaluation program, and a karaoke device that can evaluate singing accurately and with a reduced processing load, even for sound sources such as CDs that lack main melody information.

To this end, the singing evaluation device according to the present invention adopts the following configuration. It comprises:
singing pitch vicinity range calculation means for frequency-analyzing a singing voice signal input from a microphone and calculating a singing pitch vicinity range, which is the frequency range near the pitch of the singing voice signal;
model singing characteristic calculation means for calculating a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing voice; and
evaluation means for evaluating the singing voice signal by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.
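The claimed pipeline starts from the singer's own pitch rather than from the accompaniment. A minimal Python sketch of that first step, assuming a crude FFT-peak pitch estimate and a symmetric ±100-cent window around it (both are illustrative choices not taken from this publication; `estimate_pitch_hz` and `pitch_vicinity_range` are hypothetical names):

```python
import numpy as np

# Illustrative sketch only: the estimator and the window width are assumptions.

def estimate_pitch_hz(frame, sr, fmin=80.0, fmax=1000.0):
    """Crude F0 estimate: frequency of the strongest Hann-windowed FFT bin in [fmin, fmax]."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    band = (freqs >= fmin) & (freqs <= fmax)   # restrict to a plausible vocal range
    return float(freqs[band][np.argmax(spectrum[band])])

def pitch_vicinity_range(f0_hz, half_width_cents=100.0):
    """Frequency range (low_hz, high_hz) within +/- half_width_cents of the sung pitch."""
    ratio = 2.0 ** (half_width_cents / 1200.0)  # 1200 cents per octave
    return f0_hz / ratio, f0_hz * ratio
```

A practical system would use a more robust F0 estimator; the point of the sketch is only that the search range is derived from what the user actually sings.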

Further, in the singing evaluation device according to the present invention, the singing pitch vicinity range is a predetermined distribution based on the pitch of the singing voice signal along at least one of the frequency axis and the time axis.
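Figure 7 refers to a two-dimensional normal distribution region, so one way to realize a "predetermined distribution" along both axes is a separable Gaussian over the pitch offset (in cents) and the frame offset. The sketch below assumes that form; the widths `sigma_cents` and `sigma_frames` and the name `vicinity_weights` are hypothetical.

```python
import numpy as np

# Illustrative sketch: Gaussian form and widths are assumptions, not values from the publication.

def vicinity_weights(cents_axis, sung_cents, frame_offsets,
                     sigma_cents=50.0, sigma_frames=2.0):
    """Separable 2-D Gaussian weights (time x frequency) centred on the sung pitch.

    cents_axis    : pitch of each analysis bin, in cents
    sung_cents    : singer's pitch in the current frame, in cents
    frame_offsets : frame offsets relative to the current frame (0 = now)
    Returns an array of shape (len(frame_offsets), len(cents_axis)) whose peak is 1.0.
    """
    df = (np.asarray(cents_axis, dtype=float) - sung_cents) / sigma_cents
    dt = np.asarray(frame_offsets, dtype=float) / sigma_frames
    return np.exp(-0.5 * (dt[:, None] ** 2 + df[None, :] ** 2))
```

Multiplying a block of the performance sound probability density by these weights emphasizes components close to the singer in both pitch and time, which is one plausible reading of the weighting described for the model singing characteristic.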

In another aspect, the singing evaluation device according to the present invention comprises:
singing pitch vicinity range calculation means for calculating the singing pitch vicinity range from an accompaniment sound signal to which either the singing pitch vicinity range itself or information from which that range can be specified has been attached;
model singing characteristic calculation means for calculating a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing voice; and
evaluation means for evaluating a singing voice signal input from a microphone by comparing its singing voice characteristic with the model singing characteristic.

The karaoke device according to the present invention comprises:
any one of the singing evaluation devices described above; and
reproduction means for reproducing an accompaniment sound signal containing a model singing voice.

The singing evaluation program according to the present invention enables execution of:
singing pitch vicinity range calculation processing that frequency-analyzes a singing voice signal input from a microphone and calculates a singing pitch vicinity range, which is the frequency range near the pitch of the singing voice signal;
model singing characteristic calculation processing that calculates a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing voice; and
evaluation processing that evaluates the singing voice signal by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.

In another aspect, the singing evaluation program according to the present invention enables execution of:
singing pitch vicinity range calculation processing that calculates the singing pitch vicinity range from an accompaniment sound signal to which either the singing pitch vicinity range itself or information from which that range can be specified has been attached;
model singing characteristic calculation processing that calculates a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing voice; and
evaluation processing that evaluates a singing voice signal input from a microphone by comparing its singing voice characteristic with the model singing characteristic.

With the singing evaluation device, singing evaluation program, and karaoke device according to the present invention, singing can be evaluated (for example, scored) by calculating the model singing characteristic from the accompaniment sound signal, even when the user sings along with an accompaniment sound signal that carries no main melody information, such as a CD source. Moreover, because the model singing characteristic is calculated only within the singing pitch vicinity range of the user's singing, the entire frequency range of the accompaniment sound signal need not be processed. This keeps the processing load down while still achieving accurate singing evaluation.
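The load-and-accuracy argument can be illustrated on a single analysis frame: searching only the bins near the sung pitch both skips most of the spectrum and avoids locking onto a louder instrument elsewhere. A hedged sketch, assuming the performance sound probability density is available as one value per cents bin and using a hard ±100-cent window (`model_pitch_in_range` and its parameters are hypothetical):

```python
import numpy as np

# Illustrative sketch: the data layout and window width are assumptions.

def model_pitch_in_range(density, cents_axis, sung_cents, half_width_cents=100.0):
    """Most probable pitch of the mixed recording, searched only near the sung pitch.

    density    : performance sound probability density for one frame (one value per bin)
    cents_axis : pitch of each bin, in cents
    Returns the selected pitch in cents, or None if no bin lies inside the range.
    """
    cents_axis = np.asarray(cents_axis, dtype=float)
    density = np.asarray(density, dtype=float)
    # Only bins within +/- half_width_cents of the singer are examined.
    idx = np.flatnonzero(np.abs(cents_axis - sung_cents) <= half_width_cents)
    if idx.size == 0:
        return None
    return float(cents_axis[idx[np.argmax(density[idx])]])
```

For example, if a bass line at 3600 cents carries more probability than the model vocal at 6010 cents, a global maximum would pick the bass, whereas a search around a sung pitch of 6000 cents returns 6010 while examining only about 20 of the bins.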

In addition, the singing pitch vicinity range used by the singing evaluation device, singing evaluation program, and karaoke device according to the present invention has a predetermined distribution along the frequency axis based on the singing pitch. Giving the range such a distribution allows the model singing characteristic to be weighted, which makes the singing evaluation even more accurate.

Furthermore, the singing pitch vicinity range used by the singing evaluation device, singing evaluation program, and karaoke device according to the present invention has a predetermined distribution along the time axis based on the singing pitch. Taking the time axis of the range into account in this way further improves the accuracy of the singing evaluation.

Further, with the singing evaluation device, singing evaluation program, and karaoke device according to the present invention, attaching to the accompaniment sound signal either the singing pitch vicinity range or information from which it can be specified makes it possible to evaluate singing (for example, to score it) by calculating the model singing characteristic from the accompaniment sound signal, even when singing along with an accompaniment sound signal that lacks accurate main melody information, such as a CD source. Because the model singing characteristic is calculated using the pre-attached singing pitch vicinity range, the entire frequency range of the accompaniment sound signal need not be processed, which keeps the processing load down while achieving accurate singing evaluation.

FIG. 1: Diagram showing the configuration of the karaoke system of the present embodiment.
FIG. 2: Diagrams showing the data structures used in the karaoke system of the present embodiment.
FIG. 3: Flowchart showing the performance sound probability density calculation processing of the present embodiment.
FIG. 4: Flowchart showing the music reproduction processing of the present embodiment.
FIG. 5: Flowchart showing the evaluation processing of the present embodiment.
FIG. 6: Diagram for explaining the singing pitch vicinity range of the present embodiment.
FIG. 7: Diagram for explaining the two-dimensional normal distribution region of the pitch probability density of the present embodiment.
FIG. 8: Diagram for explaining the model singing characteristic of the present embodiment.
FIG. 9: Diagram for explaining the comparison processing of the present embodiment.

FIG. 1 shows the configuration of the karaoke system of the present embodiment. The system includes a karaoke device 2 (sometimes called a commander) and a remote control device 1, which are communicatively connected to form a network via a LAN 100 and an access point 130.

The karaoke device 2, installed in a venue such as a karaoke box, includes a sound control unit 25 serving as the performance unit that plays music. It also includes an operation unit 21 that accepts various inputs from the user, an operation processing unit 22 that interprets input from the operation unit 21 and passes it to a CPU 30, and a hard disk 32 serving as a storage unit for various information. The karaoke device 2 further includes a LAN communication unit 24a as communication means for connecting to the LAN 100 and joining the network. The karaoke device 2 of this embodiment also has a wireless LAN communication unit 24b, so it can connect to the network wirelessly through the wireless LAN communication unit 24b instead of by wire through the LAN communication unit 24a.

The karaoke device 2 also includes video reproduction means for displaying lyric video and background video on a monitor 41. This means consists of a video reproduction unit 29 that reproduces video based on video information, a video RAM 28 that temporarily stores the video to be reproduced, and a video control unit 31 that superimposes lyric telops on the reproduced video, applies video effects, and so on.

In addition to the externally connected monitor 41, the karaoke device 2 can display various information on a touch panel monitor 33. The touch panel monitor 33 consists of a display unit 35, which displays video information input from the video control unit 31, overlaid with a touch panel 34, which outputs the touched position to the operation processing unit 22. The touch panel monitor 33 is placed, for example, on the front of the housing of the karaoke device 2 and functions as an input unit, like the operation unit 21 of the karaoke device 2 or the touch panel monitor 11 of the remote control device 1. By selecting a song on the touch panel monitor 33, the user can operate the karaoke device 2 in various ways, for example by reserving the song directly on the karaoke device 2.

The karaoke device 2 further includes a control unit comprising the CPU 30, which controls all the components, and a memory 27, which temporarily stores the information needed to execute various programs.

With this configuration the karaoke device 2 executes various processes; its main functions include music reservation processing and music reproduction processing. Music reservation processing designates and reserves a song according to the user's selection and is executed in cooperation with the remote control device 1: reservation information generated by the song selection processing of the remote control device 1 is sent to the karaoke device 2, which registers it in a reservation table in the memory 27. Music reproduction processing reproduces the reserved songs; in it, music performance processing and lyric display processing are executed in synchronization.

Music performance processing causes the sound control unit 25 to play the song based on the performance information contained in the music information. The music played by the sound control unit 25 is emitted from a speaker 42 together with the singing voice input from microphones 43a and 43b. Lyric display processing assists singing by displaying the lyric information contained in the music information on the monitor 41. Background video display processing, which displays background video combined with the lyrics shown by the lyric display processing, may also be executed.

The remote control device 1, for its part, can execute song selection processing: it searches for songs according to the user's instructions and sends reservation information for the song the user wants played to the karaoke device 2. The remote control device 1 can also receive various information from the karaoke device 2 or from a host device 5 connected to the Internet and execute various processes. In this embodiment, it provides an operation unit 17 and a touch panel monitor 11 as user interfaces for accepting instructions. The touch panel monitor 11 has a display unit 11a and a touch panel 11b; it displays various interfaces on the display unit 11a and accepts touch input from the user.

The remote control device 1 further includes a memory 14, serving as a storage unit for the database needed for song selection, various programs, and the information generated as the programs run, together with a remote-control-side control unit that controls these components. This control unit includes a CPU 15, a video control unit 13 that generates the video displayed on the touch panel monitor 11, a video RAM 12 that temporarily stores the video information to be displayed, and an operation processing unit 18 that interprets input from the touch panel monitor 11 or the operation unit 17 and passes it to the CPU 15.

The remote control device 1 connects wirelessly to the access point 130 through a wireless LAN communication unit 16 and thereby joins the network formed by the LAN 100. Each remote control device 1 is associated in advance with a specific karaoke device 2, and commands output from the remote control device 1 are received by the associated karaoke device 2.

With this configuration, the remote control device 1 accepts user input from the touch panel monitor 11 or the operation unit 17 and presents information by displaying video on the touch panel monitor 11. This allows it to perform various processes, such as song selection processing that sends reservation information to the karaoke device 2.

The karaoke device 2 of this embodiment can reproduce two types of music information. The first type (type A) is music information that includes singing evaluation information for evaluating singing. As is conventionally known, when such music information is reproduced, singing can be evaluated by comparing the singing evaluation information, which serves as main melody information, with the singing voice signal input from the microphone, and calculating a score from the degree of agreement.
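The publication does not give the scoring formula for type A songs. A common way to turn "degree of agreement" into a score is to count the frames in which the sung pitch lies within a tolerance of the reference melody; the sketch below assumes a ±50-cent tolerance and NaN for unvoiced frames or rests, and `melody_match_score` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch: tolerance, NaN convention and scoring rule are assumptions.

def melody_match_score(sung_cents, melody_cents, tolerance_cents=50.0):
    """Percentage of frames in which the singer is within tolerance of the reference melody.

    Frames that are NaN in either sequence (unvoiced singer, rest in the melody) are ignored.
    """
    sung = np.asarray(sung_cents, dtype=float)
    ref = np.asarray(melody_cents, dtype=float)
    valid = ~np.isnan(sung) & ~np.isnan(ref)
    if not valid.any():
        return 0.0
    hits = np.abs(sung[valid] - ref[valid]) <= tolerance_cents
    return 100.0 * float(hits.mean())
```

For sung pitches of 6000, 6030, and 6200 cents against a constant 6000-cent melody, two of the three frames match, giving a score of about 66.7.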

FIG. 2(A) shows the data structure of the music information (type A) of this embodiment. The music information has meta information, containing various information about the song, and actual information used to execute processes such as performance and lyric display. The meta information contains song-related information such as a music ID identifying the music information, the song title, the singer name, and the genre; the user can use these items as search keys when searching for songs. The actual information contains performance information, lyric information, background video information, and so on. The performance information is used to play the karaoke accompaniment and consists of control information for electronic musical instruments based on the MIDI standard, compressed audio of a recorded live performance, or the like. The lyric information is displayed in synchronization with the performance information to assist singing; the displayed lyrics may change color in step with the performance. The singing evaluation information is used to evaluate the user's singing voice during music reproduction and includes, among other things, the melody to be sung.
When singing is evaluated during music reproduction, a score or the like can be calculated by comparing the singing voice input to the microphones 43a and 43b with this singing evaluation information.

The other type (type B) is music information that does not include singing evaluation information, for example a song recorded on a CD. Conventionally, singing could hardly be evaluated when such music information was reproduced, because it lacks singing evaluation information. The karaoke device 2 of this embodiment can evaluate singing for such music information as well.

FIG. 2(B) shows the data structure of the music information (type B) of this embodiment. Like the music information of FIG. 2(A), it has meta information containing various information about the song and actual information for executing processes such as performance and lyric display. The meta information is the same as in FIG. 2(A) and is not described again. The actual information contains performance information that includes the accompaniment sound (corresponding to the "accompaniment sound signal" of the present invention), and video information. The performance information contains an accompaniment recorded from an actual performance, as on a commercially available CD. It contains not only the accompaniment but also the singer's singing voice (corresponding to the "model singing voice" of the present invention); in type B music information, the accompaniment and the singing voice are mixed together. The video information contains background video and lyrics displayed in synchronization with the accompaniment. The karaoke device 2 of this embodiment can evaluate singing using the performance information even for type B music information, which contains no singing evaluation information.

In this embodiment, a performance sound probability density obtained in advance from the performance information is included in the music information. This density is the result of frequency-analyzing the performance sound information, which contains the singing voice in addition to the accompaniment, and therefore covers the pitches of the various instruments used in the accompaniment as well as the pitch of the singing voice.

Type B music information, which lacks directly comparable singing evaluation information such as main melody information, need not include the performance sound probability density as in this embodiment; it may also be music information without it. In that case, the performance sound probability density is calculated during music performance processing.
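The choice between the two reference sources, including the fallback of computing the density at playback time, can be expressed as a small dispatch. This is a sketch under stated assumptions: the field and function names are hypothetical, and `compute_density` merely stands in for the analysis of FIG. 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

# Illustrative sketch: field names and the caching behaviour are assumptions.

@dataclass
class MusicInfo:
    music_id: str
    performance: np.ndarray                               # accompaniment (type B: mixed with the model vocal)
    evaluation_melody: Optional[Sequence[float]] = None   # present only in type A
    performance_density: Optional[np.ndarray] = None      # optional in type B (precomputed)

def evaluation_reference(info: MusicInfo,
                         compute_density: Callable[[np.ndarray], np.ndarray]) -> Tuple[str, object]:
    """Return ("melody", melody) for type A, or ("density", density) for type B."""
    if info.evaluation_melody is not None:
        return "melody", info.evaluation_melody
    if info.performance_density is None:
        # Type B without a stored density: analyse the performance now and keep the result.
        info.performance_density = compute_density(info.performance)
    return "density", info.performance_density
```

Caching the computed density on the object mirrors the trade-off described in the text: precomputing moves the cost offline, while computing at playback depends on the device's processing capacity.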

FIG. 3 is a flowchart of the performance sound probability density calculation processing of this embodiment. The performance sound probability density used in this embodiment is obtained by analyzing the performance information (corresponding to the "accompaniment sound signal" of the present invention), which contains the model singing voice and the accompaniment, and is calculated according to the pitch estimation method described in Japanese Patent No. 3413634. In this embodiment, the performance information in the type B music information is processed in advance and the result is stored in the type B music information. Depending on the processing capacity of the karaoke device 2, however, the performance sound probability density may instead be calculated while the song is playing and used for singing evaluation; in that case, the type B music information does not contain the performance sound probability density.

The performance sound probability density calculation processing starts by acquiring the performance information from the music information to be analyzed (S100). The pitch estimation method of Japanese Patent No. 3413634 passes the signal through band-pass filters (BPFs) to obtain the fundamental frequency trajectory separately for the melody line (singing voice) and the bass line (bass sound), whereas this embodiment analyzes the performance information over all bands without BPFs. To improve analysis accuracy, zero padding is applied as preprocessing before frequency analysis (FFT) (S101), and frequency analysis is then performed on the zero-padded performance information (S102). This embodiment uses the FFT for frequency analysis, but other frequency analysis methods may also be used.
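Zero padding before the FFT adds no new information, but it samples the same spectrum on a finer frequency grid, which helps locate peaks that fall between the original bins. A minimal NumPy sketch, assuming a Hann window and a padding factor of 4 by default (neither is specified in the publication; `padded_spectrum` is a hypothetical name):

```python
import numpy as np

# Illustrative sketch: window type and padding factor are assumptions.

def padded_spectrum(frame, sr, pad_factor=4):
    """Magnitude spectrum of a Hann-windowed frame zero-padded to pad_factor * len(frame) samples.

    Returns (freqs_hz, magnitudes).
    """
    frame = np.asarray(frame, dtype=float)
    n_fft = len(frame) * pad_factor
    windowed = frame * np.hanning(len(frame))
    # Append zeros so the FFT evaluates the underlying spectrum on a finer grid.
    padded = np.concatenate([windowed, np.zeros(n_fft - len(frame))])
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, np.abs(np.fft.rfft(padded))
```

For a 256-sample frame at 8 kHz, the bin spacing drops from 31.25 Hz without padding to about 3.9 Hz with a padding factor of 8. `np.fft.rfft(x, n=n_fft)` pads implicitly; the explicit concatenation only makes the step visible.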

The EM (expectation-maximization) algorithm is applied to the frequency-analyzed performance sound information to calculate the performance sound probability density contained in the performance information. The EM algorithm is executed by alternately and repeatedly applying the E step (expectation step, S103) and the M step (maximization step, S104). During this process, a tone model representing the harmonic structure of a singing sound is applied, so that a highly accurate performance sound probability density is obtained for the singing sound. The calculated performance sound probability density is stored in the analyzed music information (S105).
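The E/M alternation of S103 and S104 can be illustrated with a small EM that estimates weights over fundamental frequency (F0) candidates. This is a generic mixture-weight EM written for illustration, not the procedure of Japanese Patent No. 3413634. The helper names, the Gaussian-per-harmonic tone model with 1/h amplitudes, and all parameter values are hypothetical. Only the mixture weights are re-estimated; the tone models themselves stay fixed.

```python
import numpy as np


def harmonic_tone_model(freqs, f0, n_harmonics=5, width_hz=10.0):
    """Hypothetical tone model: Gaussian peaks at h*f0 weighted 1/h, summing to 1."""
    model = np.zeros(len(freqs))
    for h in range(1, n_harmonics + 1):
        model += (1.0 / h) * np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
    return model / model.sum()


def em_f0_weights(spectrum, freqs, f0_candidates, n_iter=100):
    """Estimate a probability density over F0 candidates with EM."""
    p_obs = spectrum / spectrum.sum()  # treat the spectrum as an observed distribution
    models = np.array([harmonic_tone_model(freqs, f0) for f0 in f0_candidates])
    weights = np.full(len(f0_candidates), 1.0 / len(f0_candidates))
    for _ in range(n_iter):
        mixture = weights @ models  # shape (n_bins,)
        # E step (S103): responsibility of each candidate for each frequency bin.
        resp = weights[:, None] * models / np.maximum(mixture, 1e-300)
        # M step (S104): new weight = share of observed mass explained by each candidate.
        weights = resp @ p_obs
        weights /= weights.sum()  # renormalize mass lost in bins no model covers
    return weights


if __name__ == "__main__":
    freqs = np.arange(0.0, 3000.0, 2.0)
    candidates = np.arange(100, 401, 10)
    spectrum = harmonic_tone_model(freqs, 220.0)  # synthetic harmonic spectrum
    w = em_f0_weights(spectrum, freqs, candidates)
    print("most probable F0:", candidates[np.argmax(w)])
```

Because the component models are fixed, the log-likelihood is concave in the weights, so this EM climbs toward a global optimum; a real analyzer would also need a finer F0 grid and a tone model fitted to voice.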

By using the performance sound probability density calculated in this way, the present embodiment can properly evaluate singing even for music information that, unlike the music information (A type), does not contain singing evaluation information directly indicating the pitch of the singing sound, such as main melody information.

Next, the music reproduction process of the karaoke apparatus 2 of the present embodiment, which includes the evaluation process for singing evaluation, will be described. FIG. 4 is a flowchart showing the music reproduction process of the present embodiment. In the karaoke apparatus 2, a music piece is reserved through an operation on an input unit such as the remote control device 1 or the touch panel monitor 33. FIG. 2(C) shows the data structure of the reservation information stored in the memory 27 of the karaoke apparatus 2 in response to a reservation operation. In addition to the music ID identifying the music information, the reservation information includes a user ID indicating the user who made the reservation, a pitch setting value based on the pitch setting made at the time of reservation, and the like.

The karaoke apparatus 2 checks the reservation table stored and managed in the memory 27 and identifies the music to be reproduced (S201). If there is a music piece to be reproduced next (S202: Yes), the music information corresponding to the music ID in the reservation information is read out and reproduction of the music starts (S203). While the music information is being reproduced, either the standard evaluation process (S205) or the evaluation process (S300) is executed to evaluate the user's singing. If the music information being reproduced is determined to be music information (A type) containing singing evaluation information (S204: Yes), the standard evaluation process (S205) is executed. The standard evaluation process (S205) is the conventional form of singing evaluation: it compares the singing evaluation information (main melody information) contained in the music information with the singing voice signals input from the microphones 43a and 43b and calculates an evaluation result such as a score. Since the standard evaluation process (S205) is well known, a detailed description is omitted here.

If, on the other hand, the music information being reproduced is determined to be music information (B type) that does not contain singing evaluation information (S204: No), the evaluation process (S300) is executed. The evaluation process (S300) is the characteristic feature of the present embodiment: it evaluates singing based on the performance sound probability density contained in the music information (B type). Details of the evaluation process (S300) are described later. For both the A type and the B type, when reproduction of the music ends (S206, S207: Yes), the evaluation result determined in the standard evaluation process (S205) or the evaluation process (S300) is displayed on the monitor 41 or the like, notifying the user of their singing ability. The process then returns to the beginning of the music reproduction process, and the next music piece to be reproduced is identified.
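The A-type/B-type branch of FIG. 4 can be summarized with a short Python sketch. It is a hypothetical model of the control flow only: the class and field names (`Reservation`, `MusicInfo`, `melody_info`, `pitch_setting`) and the callback parameters are assumptions, and actual playback, timing, and display on the monitor 41 are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Reservation:
    """Reservation record modelled on FIG. 2(C); field names are hypothetical."""
    music_id: str
    user_id: str
    pitch_setting: int = 0  # pitch setting chosen at reservation (units assumed)


@dataclass
class MusicInfo:
    music_id: str
    melody_info: Optional[list] = None  # singing evaluation information, A type only


def run_reservations(
    reservations: List[Reservation],
    library: Dict[str, MusicInfo],
    standard_eval: Callable[[MusicInfo], float],
    density_eval: Callable[[MusicInfo], float],
) -> List[Tuple[str, str, float]]:
    """Simplified FIG. 4 loop: playback is omitted, only the branch is modelled."""
    queue = deque(reservations)
    results = []
    while queue:                               # S201/S202: is there a next piece?
        reservation = queue.popleft()
        music = library[reservation.music_id]  # S203: read the music information
        if music.melody_info is not None:      # S204: Yes -> A type
            score = standard_eval(music)       # S205: standard evaluation
        else:                                  # S204: No -> B type
            score = density_eval(music)        # S300: density-based evaluation
        # S206/S207: after playback ends, the result would be shown on the monitor.
        results.append((reservation.user_id, reservation.music_id, score))
    return results
```

Passing the two evaluators in as callables keeps the dispatch logic independent of how either scoring method is implemented.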

FIG. 5 is a flowchart showing the evaluation process (S300) of the present embodiment. In the evaluation process (S300), evaluation is performed based on the singing voice signals input from the microphones 43a and 43b and on the performance sound probability density acquired from the music information. For the singing voice signal input from the microphones 43a and 43b (S303), a probability density is calculated for its fundamental frequency (S304). At the same time, it is determined whether the current time falls within a singing section (S305). The singing section may be determined from the singing voice signal itself, or with reference to the performance sound information or the performance sound probability density. Alternatively, it may be determined from the progress of the lyric information contained in the music information.
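Two of the S305 options listed above (judging from the singing voice signal itself, or from lyric progress) might look like this in Python. The energy criterion, the -40 dBFS threshold, the assumption that samples are scaled to [-1, 1], and the `(start, end)` lyric-interval format are all hypothetical choices; the patent does not specify how the judgment is made.

```python
import numpy as np


def is_singing_frame(frame, threshold_db=-40.0):
    """Energy-based S305 check on the singing voice signal itself.

    Assumes samples scaled to [-1, 1]; the threshold value is an assumption.
    """
    rms = np.sqrt(np.mean(np.square(np.asarray(frame, dtype=float))))
    level_db = 20.0 * np.log10(max(rms, 1e-12))  # floor avoids log10(0)
    return bool(level_db > threshold_db)


def in_lyric_interval(t_sec, lyric_intervals):
    """Alternative S305 check from lyric progress: (start, end) pairs in seconds."""
    return any(start <= t_sec < end for start, end in lyric_intervals)
```

An energy gate alone will also fire on loud non-singing noise, which is one reason the text allows combining it with the performance sound information or lyric timing.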

If the current time is determined to be within a singing section (S305: Yes), a two-dimensional normal distribution region is calculated for the fundamental frequency probability density calculated in S304 (S306). FIG. 6 is a diagram for explaining the two-dimensional normal distribution region of the pitch probability density according to the present embodiment. FIG. 6(A) is a graph showing the probability density of the fundamental frequency, in which a probability density (likelihood) is assigned to the fundamental frequency fc at the time of analysis. In the present embodiment, a predetermined distribution centered on (referenced to) this fundamental frequency fc is applied to improve evaluation accuracy. Specifically, a normal distribution is applied along both the frequency axis and the time axis to create the two-dimensional normal distribution region shown in FIG. 6(B). The distribution referenced to fc is not limited to a two-dimensional normal distribution; it may be applied only along the frequency axis or only along the time axis. Nor is the distribution form limited to a normal distribution; various other distribution forms may be adopted. The two-dimensional normal distribution region thus created is used both to extract the model singing characteristic from the performance sound probability density and to evaluate the singing voice signal.
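A separable two-dimensional normal region like the one in FIG. 6(B) can be built as an outer product of a time-axis Gaussian and a frequency-axis Gaussian. The sketch below is a hypothetical illustration: working in cents on a log-frequency grid, the 55 Hz reference, the standard deviations, and the use of `peak` to carry the probability density assigned to fc are all assumptions.

```python
import numpy as np


def hz_to_cents(f_hz, ref_hz=55.0):
    """Convert Hz to cents relative to ref_hz (the 55 Hz reference is an assumption)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def normal_region_2d(fc_hz, pitch_grid_cents, n_frames,
                     sigma_cents=50.0, sigma_frames=2.0, peak=1.0):
    """Separable 2-D normal region centred on fc and on the middle analysis frame.

    Returns an array of shape (n_frames, n_bins). `peak` can carry the
    probability density assigned to fc in S304. Sigmas are hypothetical.
    """
    fc_cents = hz_to_cents(fc_hz)
    # Frequency-axis normal distribution around fc.
    freq_part = np.exp(-0.5 * ((pitch_grid_cents - fc_cents) / sigma_cents) ** 2)
    # Time-axis normal distribution around the centre frame.
    offsets = np.arange(n_frames) - (n_frames - 1) / 2.0
    time_part = np.exp(-0.5 * (offsets / sigma_frames) ** 2)
    return peak * np.outer(time_part, freq_part)
```

Setting `n_frames=1` reduces the region to the frequency-axis-only variant that the text also allows.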

As explained with reference to FIG. 3, the performance sound probability density contains not only the singing sound needed for singing evaluation but also information on the accompaniment sound, backing chorus, and the like. If this performance sound probability density were used as-is for evaluation, evaluation accuracy could therefore drop. In the present embodiment, the model singing characteristic is extracted by filtering the performance sound probability density with the singing voice signal, and this model singing characteristic is then compared with the singing voice signal, which improves evaluation accuracy. The rationale is that, although singers differ in skill, a properly sung voice yields characteristics similar to the model singing characteristic; conversely, if the user does not sing properly, the model singing characteristic cannot be properly extracted from the performance sound probability density.

In the evaluation process, the performance sound probability density of the section to be evaluated is acquired (S301). A singing pitch vicinity range referenced to the fundamental frequency of the singing voice signal is then extracted from the acquired singing pitch probability density, and the model singing characteristic is calculated (S302). In the present embodiment, multiplying the performance sound probability density by the two-dimensional normal distribution region of the singing voice signal performs the extraction of the singing pitch vicinity range and the calculation of the model singing characteristic at the same time. Note that the user may sing an octave away from the pitch that should originally be sung. The singing pitch vicinity range may therefore be extracted for each of a plurality of octaves. Alternatively, ranges may be extracted for each of a plurality of octaves, and the octave in which the performance sound probability density accounts for the highest proportion may be adopted as the singing pitch vicinity range.
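The single multiplication of S302, combined with the octave-selection alternative described above, might be sketched as follows. To keep the example short it works on one analysis frame, so only the frequency-axis part of the normal region is used; a two-dimensional version would multiply a (frames x bins) window element-wise in the same way. The function name, the 50-cent width, and the candidate shifts of -1, 0, and +1 octave are hypothetical.

```python
import numpy as np


def extract_model_characteristic(perf_density, pitch_grid_cents, sung_cents,
                                 sigma_cents=50.0, octave_shifts=(-1, 0, 1)):
    """Return (octave_shift, model_characteristic) for one analysis frame.

    perf_density: performance sound probability density over pitch_grid_cents.
    Multiplying by a normal window centred on the sung pitch extracts the
    vicinity range and yields the model characteristic in one step (S302).
    The octave whose window captures the largest share of the density wins.
    """
    total = float(perf_density.sum())
    best_share, best_shift, best_char = -1.0, 0, np.zeros_like(perf_density)
    for shift in octave_shifts:
        centre = sung_cents + 1200.0 * shift
        window = np.exp(-0.5 * ((pitch_grid_cents - centre) / sigma_cents) ** 2)
        characteristic = perf_density * window
        share = float(characteristic.sum()) / total if total > 0 else 0.0
        if share > best_share:
            best_share, best_shift, best_char = share, shift, characteristic
    return best_shift, best_char
```

For example, if the performance density peaks at 3600 cents (with accompaniment energy elsewhere) and the user sings at 2400 cents, one octave low, the +1 shift captures the vocal peak and is selected.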

FIG. 7 is a diagram for explaining the singing pitch vicinity range of the present embodiment. FIG. 7(A) is a graph showing the change over time of the singing pitch probability density. FIG. 7(B) shows the singing pitch vicinity range for the graph of FIG. 7(A). The singing pitch vicinity range shown in FIG. 7(B) is referenced to the fundamental frequency of the singing user, and the singing pitch probability density outside this range is excluded from singing evaluation. This narrows the frequency range to be evaluated and thereby reduces the processing load of the evaluation process.
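The load reduction comes from touching only the pitch bins inside the singing pitch vicinity range. The hypothetical helper below returns that range as a slice into a sorted pitch grid; the ±150-cent half-width is an assumption, not a value from the patent.

```python
import numpy as np


def vicinity_slice(pitch_grid_cents, sung_cents, half_width_cents=150.0):
    """Index range of pitch bins inside the singing pitch vicinity range.

    pitch_grid_cents must be sorted ascending; the half-width is hypothetical.
    """
    lo = np.searchsorted(pitch_grid_cents, sung_cents - half_width_cents, side="left")
    hi = np.searchsorted(pitch_grid_cents, sung_cents + half_width_cents, side="right")
    return slice(int(lo), int(hi))
```

On a 10-cent grid spanning 0 to 6000 cents, a sung pitch of 3600 cents leaves 31 of 601 bins to evaluate, roughly a twentieth of the work per frame.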

FIG. 8 is a diagram for explaining the calculation of the model singing characteristic of the present embodiment. FIG. 8(A) shows the performance sound probability density at a certain time together with the two-dimensional normal distribution region calculated from the singing voice signal. Multiplying the performance sound probability density by the two-dimensional normal distribution region yields the model singing characteristic shown in FIG. 8(B).

Singing evaluation is performed by comparing the calculated model singing characteristic with the two-dimensional normal distribution region, that is, by calculating their degree of coincidence (S307). FIG. 9 is a diagram for explaining the comparison process (S307) of the present embodiment. FIG. 9(A) shows the model singing characteristic calculated from the performance sound information, and FIG. 9(B) shows the two-dimensional normal distribution region calculated from the singing voice signal. The comparison process is executed by calculating the degree of coincidence between the two: the more closely they match, the higher the degree of coincidence and hence the singing evaluation. Because the two-dimensional normal distribution region in the present embodiment extends along both the frequency axis and the time axis, the evaluation can also take the time axis into account. In that case, it is preferable either to convert the model singing characteristic into two-dimensional form, for example by also using its change over time, or to convert the two-dimensional normal distribution region into a one-dimensional distribution along the frequency axis based on its change over time, before comparing the two.
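The patent does not name a metric for the degree of coincidence. As one plausible stand-in, the sketch below uses cosine similarity, which for non-negative arrays lies between 0 and 1 and grows as the two shapes match more closely. It also shows the second conversion option above, collapsing a (frames x bins) region to a one-dimensional frequency distribution by summing over time. Both function names and both choices are assumptions.

```python
import numpy as np


def coincidence(model_char, sung_region):
    """Degree of coincidence as cosine similarity (metric choice is an assumption)."""
    a = np.ravel(model_char).astype(float)
    b = np.ravel(sung_region).astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def collapse_time(region_2d):
    """Convert a (frames, bins) region into a 1-D distribution over pitch bins."""
    return np.asarray(region_2d, dtype=float).sum(axis=0)
```

Cosine similarity ignores overall scale, so a quiet but well-pitched singer is not penalized merely for low energy; a different metric (for example histogram intersection) would weigh that differently.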

A score is calculated by accumulating the comparison scores calculated in the comparison process (S307) (S308). Because the evaluation process (S300) is executed repeatedly while the music information is being reproduced, a score for the entire piece is obtained. When it is determined that reproduction of the music has ended (S207: Yes), the score calculated as the evaluation result of the evaluation process (S300) is displayed on the monitor 41 or the like, notifying the singing user of the score as a measure of singing ability.
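The accumulation of S308 can be kept as a running total over the repeated executions of S300. The text only states that comparison scores are accumulated; scaling the mean to a 0 to 100 score for display is an assumption, and `ScoreAccumulator` is a hypothetical name.

```python
class ScoreAccumulator:
    """Running total of per-frame comparison scores (S308)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, comparison_score):
        """Accumulate one comparison score from S307."""
        self.total += comparison_score
        self.count += 1

    def final_score(self, scale=100.0):
        """Mean comparison score scaled for display (scaling is an assumption)."""
        return scale * self.total / self.count if self.count else 0.0
```

Dividing by the number of evaluated frames keeps scores comparable between songs of different lengths, which a raw sum would not.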

As described above, with the karaoke system of the present embodiment, even music information that does not contain singing evaluation information such as main melody information can be evaluated: by restricting evaluation to the singing pitch vicinity range, that is, the frequency range of the performance information near the pitch of the singing voice signal, the processing load is suppressed while highly accurate singing evaluation is achieved.

In the present embodiment, probability densities are calculated for both the singing voice signal and the performance sound information and are compared with each other; however, indices other than probability densities may also be used for either.

In the present embodiment, the performance sound probability density obtained by analyzing the performance information is attached to the music information (B type). Instead of being attached in advance, however, this density may be calculated during the music reproduction process. Likewise, instead of being calculated from the singing voice signal of the user during music reproduction, the singing pitch vicinity range, or information from which it can be identified, may be attached to the music information in advance. In that case, a singer engaged by the music information provider (the manufacturer) sings the piece, and either the singing pitch vicinity range obtained from that singing voice signal or information from which that range can be identified is attached to the music information.

By attaching the singing pitch vicinity range, or information from which it can be identified, to the music information in this way, the music information provider (the manufacturer) can supply an appropriate singing pitch vicinity range using a singer who sings reasonably well, even if not as skillfully as a professional.

Although the present invention has been described above using a karaoke system, it is not limited to karaoke systems. A singing evaluation device that performs singing evaluation inside or outside a karaoke apparatus also falls within the scope of the present invention. It is also now well known to perform karaoke on a smartphone by installing a karaoke application (program), and to run karaoke programs on game consoles. A singing evaluation program implemented in a karaoke program that realizes the functions of the present invention when installed on such information processing apparatuses likewise falls within the scope of the present invention. It goes without saying that a singing evaluation method also falls within the scope of the present invention.

1: Remote control device
2: Karaoke apparatus
5: Host device
11: Touch panel monitor
11a: Display unit
11b: Touch panel
12: Video RAM
13: Video control unit
14: Memory
15: CPU
16: Wireless LAN communication unit
17: Operation unit
18: Operation processing unit
21: Operation unit
22: Operation processing unit
24a: LAN communication unit
24b: Wireless LAN communication unit
25: Sound control unit
27: Memory
28: Video RAM
29: Video reproduction unit
30: CPU
31: Video control unit
32: Hard disk
33: Touch panel monitor
34: Touch panel
35: Display unit
41: Monitor
42: Speaker
43a, 43b: Microphone
130: Access point

Claims

1. A singing evaluation device comprising:
singing pitch vicinity range calculation means for performing frequency analysis of a singing voice signal input from a microphone and calculating a singing pitch vicinity range, which is a frequency range near the pitch of the singing voice signal;
model singing characteristic calculation means for calculating a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing sound; and
evaluation means for evaluating the singing voice signal by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.

2. The singing evaluation device according to claim 1, wherein the singing pitch vicinity range is a predetermined distribution referenced to the pitch of the singing voice signal in at least one of the frequency axis direction and the time axis direction.

3. A singing evaluation device comprising:
singing pitch vicinity range calculation means for calculating a singing pitch vicinity range from an accompaniment sound signal to which the singing pitch vicinity range, or information from which the singing pitch vicinity range can be identified, is attached;
model singing characteristic calculation means for calculating a model singing characteristic within the singing pitch vicinity range from the accompaniment sound signal containing a model singing sound; and
evaluation means for evaluating a singing voice signal input from a microphone by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.

4. A karaoke apparatus comprising:
the singing evaluation device according to any one of claims 1 to 3; and
reproduction means for reproducing an accompaniment sound signal.

5. A singing evaluation program capable of causing execution of:
singing pitch vicinity range calculation processing for performing frequency analysis of a singing voice signal input from a microphone and calculating a singing pitch vicinity range, which is a frequency range near the pitch of the singing voice signal;
model singing characteristic calculation processing for calculating a model singing characteristic within the singing pitch vicinity range from an accompaniment sound signal containing a model singing sound; and
evaluation processing for evaluating the singing voice signal by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.

6. A singing evaluation program capable of causing execution of:
singing pitch vicinity range calculation processing for calculating a singing pitch vicinity range from an accompaniment sound signal to which the singing pitch vicinity range, or information from which the singing pitch vicinity range can be identified, is attached;
model singing characteristic calculation processing for calculating a model singing characteristic within the singing pitch vicinity range from the accompaniment sound signal containing a model singing sound; and
evaluation processing for evaluating a singing voice signal input from a microphone by comparing a singing voice characteristic of the singing voice signal with the model singing characteristic.