JP6810676B2

JP6810676B2 - Singing evaluation device, singing evaluation program and karaoke device

Info

Publication number: JP6810676B2
Application number: JP2017228478A
Authority: JP
Inventors: 佳紀原
Original assignee: Xing Inc
Current assignee: Xing Inc
Priority date: 2017-11-28
Filing date: 2017-11-28
Publication date: 2021-01-06
Anticipated expiration: 2037-11-28
Also published as: JP2019101071A

Description

The present invention relates to a singing evaluation device, a singing evaluation program, a singing evaluation method, and a karaoke device that evaluate singing performed along with a reproduced accompaniment sound, as in karaoke.

Some conventional karaoke devices, which let users enjoy singing along with an accompaniment sound, provide a scoring function as a form of singing evaluation. Scoring in a karaoke device is generally performed by comparing the main melody information in the accompaniment information with the singing voice signal input from a microphone. Consequently, the scoring function could not be used with a sound source that has no main melody information, such as a CD.

Patent Document 1 discloses a voice evaluation device that can perform scoring (singing evaluation) even for a sound source that has no main melody information, such as a CD. This device evaluates voice data by comparing an evaluation target pitch, extracted along the time axis from voice data input from a microphone, with the reference pitch of each of a plurality of sounds extracted along the time axis from audio data containing those sounds.

Patent Document 1: JP-A-2016-122164

According to the voice evaluation device disclosed in Patent Document 1, singing can be evaluated even for a sound source that provides no main melody (correct-answer data), such as a commercially available music CD. However, because Patent Document 1 requires a plurality of reference pitches to be extracted, the processing load required for singing evaluation is expected to be large. The comparison itself is also likely to be costly, because the evaluation target pitch is compared with the reference pitches of every sound in the audio data, including the main vocal, chorus, and various instruments. Moreover, unless the evaluation target pitch is compared with the appropriate one among the plurality of reference pitches, the singing evaluation may be inappropriate.

The present invention has been made in view of these circumstances. Its object is to provide a singing evaluation device, a singing evaluation program, and a karaoke device that can evaluate singing accurately and with a reduced processing load, even for a sound source that has no main melody information, such as a CD.

To this end, the singing evaluation device according to the present invention is characterized by comprising:
a singing pitch neighborhood range calculation means that performs frequency analysis on the singing voice signal input from a microphone and calculates a singing pitch neighborhood range, which is a frequency range near the pitch of the singing voice signal;
a model singing voice characteristic calculation means that calculates, from an accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch neighborhood range calculated by the singing pitch neighborhood range calculation means;
a singing voice characteristic calculation means that performs frequency analysis on the singing voice signal and calculates a singing voice characteristic; and
an evaluation means that evaluates the singing voice signal by comparing the singing voice characteristic calculated by the singing voice characteristic calculation means with the model singing characteristic calculated by the model singing voice characteristic calculation means.
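The configuration above can be illustrated with a minimal Python sketch for a single analysis frame. It is not the patented implementation: the function names, the use of the strongest spectral peak as the sung pitch, the fixed half-width in Hz, and the cosine-similarity score are all assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def singing_pitch_neighborhood(voice_spectrum, freqs, half_width_hz=50.0):
    """Boolean mask of frequency bins near the sung pitch.

    Assumption: the sung pitch is taken as the strongest bin of the
    microphone spectrum, and the neighborhood is a fixed +/- band in Hz.
    """
    pitch_hz = freqs[int(np.argmax(voice_spectrum))]
    return np.abs(freqs - pitch_hz) <= half_width_hz

def evaluate_frame(voice_spectrum, accompaniment_spectrum, freqs,
                   half_width_hz=50.0):
    """Score one frame by comparing voice and accompaniment inside the mask.

    Only the bins inside the neighborhood are compared, so instruments
    far from the sung pitch (for example a bass line) are ignored.
    """
    mask = singing_pitch_neighborhood(voice_spectrum, freqs, half_width_hz)
    model = accompaniment_spectrum[mask]  # stand-in for the model singing characteristic
    voice = voice_spectrum[mask]          # stand-in for the singing voice characteristic
    denom = np.linalg.norm(model) * np.linalg.norm(voice)
    # Cosine similarity in [0, 1] for non-negative spectra (assumed metric).
    return float(model @ voice / denom) if denom > 0 else 0.0
```

Restricting the comparison to the masked bins is what keeps the work per frame small: the accompaniment is never searched for candidate melodies across its full bandwidth.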

Further, in the singing evaluation device according to the present invention, the singing pitch neighborhood range is a predetermined distribution referenced to the pitch of the singing voice signal in at least one of the frequency axis direction and the time axis direction.
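One way to picture a "predetermined distribution" around the sung pitch is a Gaussian weight that falls off with distance along both the frequency axis and the time axis. The sketch below is an illustration only: the function name, the use of cents as the frequency unit, and the two sigma values are assumptions, not values taken from this document.

```python
import math

def neighborhood_weight(freq_cents, time_s, pitch_cents, pitch_time_s,
                        sigma_cents=100.0, sigma_time_s=0.1):
    """Unnormalized two-dimensional Gaussian weight centered on the sung pitch.

    The weight is 1.0 exactly at the sung pitch and time, and decays as the
    point moves away along either the frequency axis or the time axis.
    """
    df = (freq_cents - pitch_cents) / sigma_cents
    dt = (time_s - pitch_time_s) / sigma_time_s
    return math.exp(-0.5 * (df * df + dt * dt))
```

Multiplying the accompaniment's characteristic by such a weight emphasizes components close to what the user actually sang, in both pitch and timing, instead of applying a hard cutoff.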

Another singing evaluation device according to the present invention is characterized by comprising:
a singing pitch neighborhood range calculation means that calculates the singing pitch neighborhood range from an accompaniment sound signal to which the singing pitch neighborhood range, or information from which that range can be identified, is attached;
a model singing voice characteristic calculation means that calculates, from the accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch neighborhood range; and
an evaluation means that evaluates the singing voice signal input from a microphone by comparing the singing voice characteristic of that signal with the model singing characteristic.

A karaoke device according to the present invention is characterized by comprising:
any one of the singing evaluation devices described above; and
a reproduction means that reproduces the accompaniment sound signal including the model singing sound.

A singing evaluation program according to the present invention is characterized by making executable:
a singing pitch neighborhood range calculation process that performs frequency analysis on the singing voice signal input from a microphone and calculates a singing pitch neighborhood range, which is a frequency range near the pitch of the singing voice signal;
a model singing voice characteristic calculation process that calculates, from an accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch neighborhood range calculated by the singing pitch neighborhood range calculation means;
a singing voice characteristic calculation means that performs frequency analysis on the singing voice signal and calculates a singing voice characteristic; and
an evaluation process that evaluates the singing voice signal by comparing the singing voice characteristic calculated by the singing voice characteristic calculation means with the model singing characteristic calculated by the model singing voice characteristic calculation means.

Another singing evaluation program according to the present invention is characterized by making executable:
a singing pitch neighborhood range calculation process that calculates the singing pitch neighborhood range from an accompaniment sound signal to which the singing pitch neighborhood range, or information from which that range can be identified, is attached;
a model singing voice characteristic calculation process that calculates, from the accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch neighborhood range; and
an evaluation process that evaluates the singing voice signal input from a microphone by comparing the singing voice characteristic of that signal with the model singing characteristic.

With the singing evaluation device, singing evaluation program, and karaoke device according to the present invention, singing evaluation such as scoring becomes possible even when the user sings to an accompaniment sound signal that carries no main melody information, such as a CD sound source, because the model singing voice characteristic is calculated from the accompaniment sound signal itself. Furthermore, because that characteristic is calculated using the neighborhood range of the pitch the user is singing, the full frequency range of the accompaniment sound signal need not be processed, which reduces the processing load while allowing highly accurate singing evaluation.

In addition, the singing pitch neighborhood range used by the singing evaluation device, singing evaluation program, and karaoke device according to the present invention has a predetermined distribution referenced to the singing pitch in the frequency axis direction. Giving the range a predetermined distribution in this way weights the model singing characteristic, which allows even more accurate singing evaluation.

Further, the singing pitch neighborhood range used by the singing evaluation device, singing evaluation program, and karaoke device according to the present invention has a predetermined distribution referenced to the singing pitch in the time axis direction. Taking the time axis direction of the range into account in this way allows even more accurate singing evaluation.

Also, with the singing evaluation device, singing evaluation program, and karaoke device according to the present invention, attaching the singing pitch neighborhood range, or information from which that range can be identified, to the accompaniment sound signal makes singing evaluation such as scoring possible even when the user sings to an accompaniment sound signal that carries no exact main melody information, such as a CD sound source, because the model singing voice characteristic is calculated from the accompaniment sound signal. Because that characteristic is calculated using the singing pitch neighborhood range attached in advance, the full frequency range of the accompaniment sound signal need not be processed, which reduces the processing load while allowing highly accurate singing evaluation.

FIG. 1 is a diagram showing the configuration of the karaoke system of the present embodiment.
FIG. 2 is a diagram showing the various data structures used in the karaoke system of the present embodiment.
FIG. 3 is a flow chart showing the performance sound probability density calculation process of the present embodiment.
FIG. 4 is a flow chart showing the music reproduction process of the present embodiment.
FIG. 5 is a flow chart showing the evaluation process of the present embodiment.
FIG. 6 is a diagram for explaining the singing pitch neighborhood range of the present embodiment.
FIG. 7 is a diagram for explaining the two-dimensional normal distribution region of the pitch probability density of the present embodiment.
FIG. 8 is a diagram for explaining the model singing characteristic of the present embodiment.
FIG. 9 is a diagram for explaining the comparison process of the present embodiment.

FIG. 1 is a diagram showing the configuration of the karaoke system of the present embodiment. The karaoke system includes a karaoke device 2 (sometimes called a commander) and a remote control device 1. The karaoke device 2 and the remote control device 1 are communicatively connected so as to form a network using a LAN 100 and an access point 130.

The karaoke device 2, installed in a venue such as a karaoke box, includes an acoustic control unit 25 as a performance unit for playing music. It also includes an operation unit 21 that accepts various inputs from the user, and an operation processing unit 22 that interprets input from the operation unit 21 and passes it to the CPU 30. A hard disk 32 serves as a storage unit for various information. A LAN communication unit 24a serves as a communication means for connecting to the LAN 100 and joining the network. The karaoke device 2 of the present embodiment also includes a wireless LAN communication unit 24b, so a wireless network connection through the wireless LAN communication unit 24b can be used in place of the wired connection through the LAN communication unit 24a.

The karaoke device 2 also includes a video reproduction means for displaying lyrics video and background video on the monitor 41. The video reproduction means comprises a video reproduction unit 29 that reproduces video based on video information, a video RAM 28 that temporarily stores the video to be reproduced, and a video control unit 31 that superimposes lyric captions on the reproduced video, applies video effects, and so on.

In addition to the externally connected monitor 41, the karaoke device 2 can display various information on a touch panel monitor 33. The touch panel monitor 33 consists of a display unit 35, which displays the video information input from the video control unit 31, overlaid with a touch panel 34, which outputs the touched position to the operation processing unit 22. The touch panel monitor 33 is arranged on, for example, the front of the housing of the karaoke device 2 and functions as an input unit in the same way as the operation unit 21 of the karaoke device 2 or the touch panel monitor 11 of the remote control device 1. The user can perform various operations on the karaoke device 2 through the touch panel monitor 33, such as selecting a song to reserve it directly on the karaoke device 2.

The karaoke device 2 further includes a control unit comprising a CPU 30 that controls each component as a whole and a memory 27 that temporarily stores the information needed to execute various programs.

With this configuration the karaoke device 2 executes various processes; its main functions include a music reservation process and a music reproduction process. The music reservation process designates and reserves a song according to the user's instructions and is executed in cooperation with the remote control device 1. The reservation information created by the music selection process of the remote control device 1 is transmitted to the karaoke device 2, which registers the received reservation information in a reservation table in the memory 27. The music reproduction process reproduces a reserved song; in it, the music performance process and the lyrics display process are executed in synchronization.

The music performance process causes the acoustic control unit 25 to perform the song based on the performance information included in the music information. The song played by the acoustic control unit 25 is emitted from the speaker 42 together with the singing voice input from the microphones 43a and 43b. The lyrics display process assists singing by displaying the lyrics information included in the music information on the monitor 41. A background video display process may also be executed to display a background video together with the lyrics shown by the lyrics display process.

The remote control device 1, for its part, can execute a music selection process that searches for songs according to the user's instructions and transmits reservation information to the karaoke device 2 for a song the user has asked to play. The remote control device 1 can also receive various information from the karaoke device 2 or from a host device 5 connected via the Internet and execute various processes. In the present embodiment it has an operation unit 17 and a touch panel monitor 11 as user interfaces for accepting instructions from the user. The touch panel monitor 11 comprises a display unit 11a and a touch panel 11b; it displays various interfaces on the display unit 11a and accepts touch input from the user.

The remote control device 1 further includes a memory 14, serving as a storage unit for the database required for the music selection process, various programs, and various information generated during program execution, and a remote-control-side control unit that controls these components as a whole. The remote-control-side control unit includes a CPU 15, a video control unit 13 that generates the video displayed on the touch panel monitor 11, a video RAM 12 that temporarily holds the video information to be displayed, and an operation processing unit 18 that interprets input from the touch panel monitor 11 or the operation unit 17 and passes it to the CPU 15.

The remote control device 1 joins the network formed by the LAN 100 by connecting wirelessly to the access point 130 through its wireless LAN communication unit 16. Each remote control device 1 is associated in advance with a specific karaoke device 2, and the commands output from a remote control device 1 are received by the associated karaoke device 2.

With this configuration, the remote control device 1 accepts various inputs from the user through the touch panel monitor 11 or the operation unit 17 and presents various information on the touch panel monitor 11. This allows it to perform various processes, such as the music selection process that transmits reservation information to the karaoke device 2.

The karaoke device 2 of the present embodiment can reproduce two types of music information. The first type (A type) is music information that includes singing evaluation information for evaluating singing. As is conventionally known, singing evaluation during reproduction of such music information compares the singing evaluation information, which serves as main melody information, with the singing voice signal input from the microphone, and calculates a score from the degree of agreement.

FIG. 2(A) shows the data structure of the music information (A type) of the present embodiment. The music information consists of meta information, which holds various information related to the song, and actual information used to execute processes such as performance and lyrics display. The meta information contains song-related information such as a music ID that identifies the music information, the song title, the singer name, and the genre. This song-related information can be used as search fields when the user searches for a song. The actual information includes performance information, lyrics information, background video information, and so on. The performance information is the information for playing the karaoke accompaniment; it consists of control information for electronic musical instruments based on the MIDI standard, compressed audio of a recorded performance, or the like. The lyrics information is displayed in synchronization with the performance information to assist singing, and may be configured so that the displayed lyrics change color in time with the performance. The singing evaluation information is used to evaluate the user's singing voice during reproduction and includes, for example, the melody to be sung. When singing is evaluated during reproduction, a score can be calculated by comparing the singing voice input to the microphones 43a and 43b with this singing evaluation information.

The second type (B type) is music information that contains no singing evaluation information, such as a song recorded on a CD. Conventionally, singing was difficult to evaluate when reproducing such music information precisely because the singing evaluation information was missing. The karaoke device 2 of the present embodiment can evaluate singing for this kind of music information as well.

FIG. 2(B) shows the data structure of the music information (B type) of the present embodiment. As with the music information of FIG. 2(A), it consists of meta information, which holds various information related to the song, and actual information used to execute processes such as performance and lyrics display. The meta information is the same as in FIG. 2(A), so its description is omitted here. The actual information contains performance information including the accompaniment sound (corresponding to the "accompaniment sound signal" of the present invention) and video information. The performance information contains an accompaniment that is a recording of an actual performance, as on a commercially available CD. It contains not only the accompaniment sound but also the singer's singing voice (corresponding to the "model singing sound" of the present invention); in the music information (B type), the accompaniment sound and the singing voice are mixed together. The video information contains the background video and lyrics displayed in synchronization with the accompaniment sound. The karaoke device 2 of the present embodiment can evaluate singing using the performance information even for music information (B type), which contains no singing evaluation information.

In the present embodiment, the performance sound probability density, obtained in advance from the performance information, is included in the music information. The performance sound probability density is obtained by frequency analysis of the performance sound information, which contains the singing sound in addition to the accompaniment sound; it therefore covers the pitches of the various instruments used in the accompaniment as well as the pitch of the singing sound.

Music information (B type), which contains no singing evaluation information for direct comparison such as main melody information, is not limited to the form that includes the performance sound probability density as in the present embodiment; it may also omit the performance sound probability density. In that case, the performance sound probability density is calculated during the music performance process.
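The two kinds of music information described above can be summarized as simple record types. The sketch below is a hypothetical Python rendering: every class name, field name, and field type is an assumption made for illustration, since the document describes only what each record contains and not how it is encoded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class MetaInfo:
    """Song-related information used for identification and search."""
    music_id: str
    title: str
    singer: str
    genre: str

@dataclass
class MusicInfoA:
    """A type: carries singing evaluation information (melody to be sung)."""
    meta: MetaInfo
    performance: bytes                 # MIDI control data or compressed audio
    lyrics: str
    background_video: bytes
    singing_evaluation: Sequence[int]  # assumed encoding: one note number per step

@dataclass
class MusicInfoB:
    """B type: recorded audio with accompaniment and vocals mixed, no melody data."""
    meta: MetaInfo
    performance: bytes                 # the "accompaniment sound signal"
    video: bytes                       # background video with synchronized lyrics
    # Precomputed performance sound probability density; None when it must be
    # calculated during playback instead.
    performance_sound_density: Optional[Sequence[Sequence[float]]] = None

def needs_density_at_playback(info) -> bool:
    """True when a B-type record has no precomputed density attached."""
    return isinstance(info, MusicInfoB) and info.performance_sound_density is None
```

Making the density field optional mirrors the two variants in the text: the density can be shipped with the song, or computed while the song is playing if the device has enough processing capacity.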

FIG. 3 is a flow chart showing the performance sound probability density calculation process of the present embodiment. The performance sound probability density used in the present embodiment is obtained by analyzing performance information that contains the model singing sound and the accompaniment sound (corresponding to the "accompaniment sound signal" of the present invention), and is calculated according to the pitch estimation method described in Japanese Patent No. 3413634. In the present embodiment, the performance information included in the music information (B type) is processed in advance and the result is stored in the music information (B type). Depending on the processing capacity of the karaoke device 2, however, the performance sound probability density may instead be calculated while the song is playing and then used for singing evaluation. In that case, the music information (B type) does not include the performance sound probability density.

The performance sound probability density calculation process starts by acquiring the performance information from the music information to be analyzed (S100). The pitch estimation method described in Japanese Patent No. 3413634 passes the signal through band-pass filters (BPF) to obtain the fundamental frequency trajectory separately for the melody line (singing sound) and the bass line (bass sound); the present embodiment instead targets the performance information across all bands without applying a BPF. To improve analysis accuracy, zero padding is applied as preprocessing before the frequency analysis (S101). Frequency analysis is then executed on the zero-padded performance information (S102). The present embodiment uses the FFT as the frequency analysis method, but other frequency analysis methods may also be used.
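Steps S101 and S102 can be illustrated with a short NumPy sketch. The function name, the padding factor, and the Hann window are assumptions added for illustration; the document itself only states that zero padding precedes the FFT.

```python
import numpy as np

def padded_spectrum(frame, sample_rate, pad_factor=4):
    """Zero-pad one frame and return (frequencies in Hz, magnitude spectrum).

    Zero padding does not add information, but it samples the spectrum on a
    finer frequency grid, which makes peak positions easier to locate.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    n_fft = pad_factor * n
    windowed = frame * np.hanning(n)  # assumed window, not taken from the document
    # np.fft.rfft pads the input with zeros up to n_fft when n_fft > n.
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, spectrum
```

For example, a 512-sample frame at 8000 Hz has a bin spacing of about 15.6 Hz without padding; with `pad_factor=4` the spacing drops to about 3.9 Hz.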

The performance sound probability density contained in the performance information is calculated from the frequency-analyzed performance sound information by using the EM (Expectation-Maximization) algorithm. The EM algorithm is executed by alternately and repeatedly applying the E step (Expectation step, S103) and the M step (Maximization step, S104). Applying a sound model with the harmonic structure of the singing sound at this stage yields a highly accurate performance sound probability density for the singing sound. The calculated performance sound probability density is stored in the music information that was analyzed (S105).
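The alternation of E and M steps can be sketched as follows. This is a heavily simplified illustration, not the method of Japanese Patent No. 3413634: the tone model (Gaussians at integer multiples of the fundamental with 1/h amplitudes), the candidate list, and all parameter values are assumptions. The sketch estimates only the mixture weight of each fundamental frequency candidate, which plays the role of a probability density over pitch.

```python
import math


def tone_model(f0, freqs, n_harmonics=4, sigma=10.0):
    """Hypothetical harmonic tone model: Gaussians at f0, 2*f0, ..., summing to 1."""
    shape = [
        sum((1.0 / h) * math.exp(-0.5 * ((x - h * f0) / sigma) ** 2)
            for h in range(1, n_harmonics + 1))
        for x in freqs
    ]
    total = sum(shape)
    return [v / total for v in shape]


def em_f0_weights(observed, models, n_iter=30):
    """Estimate the weight of each F0 candidate by alternating E and M steps."""
    total = sum(observed)
    observed = [v / total for v in observed]      # treat the spectrum as a density
    weights = [1.0 / len(models)] * len(models)   # uniform initial guess
    for _ in range(n_iter):
        new_weights = [0.0] * len(models)
        for x, p_x in enumerate(observed):
            # E step (cf. S103): share of bin x explained by each candidate
            joint = [w * m[x] for w, m in zip(weights, models)]
            norm = sum(joint)
            if norm == 0.0:
                continue
            for k, j in enumerate(joint):
                # M step (cf. S104): accumulate the expected mass per candidate
                new_weights[k] += p_x * j / norm
        s = sum(new_weights)
        weights = [w / s for w in new_weights]
    return weights


freqs = [10.0 * i for i in range(200)]   # 0 Hz to 1990 Hz in 10 Hz steps
candidates = [200.0, 300.0, 440.0]       # hypothetical F0 candidates
models = [tone_model(f0, freqs) for f0 in candidates]
# Synthetic "performance": 70 % of a 300 Hz tone plus 30 % of a 440 Hz tone.
observed = [0.7 * a + 0.3 * b for a, b in zip(models[1], models[2])]
weights = em_f0_weights(observed, models)
```

In this synthetic case the weights settle near 0, 0.7, and 0.3, so the 200 Hz candidate is rejected even though one of its harmonics coincides with the 600 Hz harmonic of the 300 Hz tone. That is the benefit of modelling the whole harmonic structure rather than single spectral peaks.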

By using the performance sound probability density calculated in this way, the present embodiment can evaluate singing appropriately even for music information that, unlike music information (A type), does not include singing evaluation information directly indicating the pitch of the singing sound, such as main melody information.

Next, the music reproduction process of the karaoke device 2 of the present embodiment, which includes the evaluation process for evaluating singing, will be described. FIG. 4 is a flowchart showing the music reproduction process of the present embodiment. In the karaoke device 2, music is reserved based on an operation on an input unit such as the remote control device 1 or the touch panel monitor 33. FIG. 2(C) shows the data structure of the reservation information stored in the memory 27 of the karaoke device 2 based on the reservation operation. The reservation information includes a music ID for identifying the music information, a user ID indicating the user who made the reservation, a pitch setting value based on the pitch setting at the time of reservation, and the like.

The karaoke device 2 checks the reservation table stored and managed in the memory 27 and identifies the music to be played next (S201). If there is music to play next (S202: Yes), it reads the music information corresponding to the music ID in the reservation information and starts playback (S203). While the music information is being played, either the standard evaluation process (S205) or the evaluation process (S300) is executed to evaluate the user's singing. If the music information being played is determined to be music information (A type), which includes singing evaluation information (S204: Yes), the standard evaluation process (S205) is executed. The standard evaluation process (S205) is the conventional singing evaluation: it compares the singing evaluation information (main melody information) included in the music information with the singing voice signal input from the microphones 43a and 43b and calculates an evaluation result such as a score. Because the standard evaluation process (S205) is well known, a detailed description is omitted here.

If, on the other hand, the music information being played is determined to be music information (B type), which does not include singing evaluation information (S204: No), the evaluation process (S300) is executed. The evaluation process (S300) is the characteristic process of the present embodiment and evaluates singing based on the performance sound probability density included in the music information (B type). Details of the evaluation process (S300) are described later. For both the A type and the B type, when playback of the music ends (S206, S207: Yes), the evaluation result determined by the standard evaluation process (S205) or the evaluation process (S300) is displayed on the monitor 41 or the like to notify the user of his or her singing ability. The process then returns to the beginning of the music reproduction process, and the next music to be played is checked.
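The control flow of FIG. 4 can be summarized in a short Python sketch. The class, field, and function names, and the `has_melody_info` flag used to tell A type from B type, are hypothetical; both evaluation functions are placeholders that only report which branch was taken.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reservation:
    song_id: str        # identifies the music information
    user_id: str        # user who made the reservation
    pitch_setting: int  # pitch setting value chosen at reservation time


def standard_evaluation(song):
    """Placeholder for S205: comparison against main melody information."""
    return "standard"


def density_evaluation(song):
    """Placeholder for S300: evaluation from the performance sound probability density."""
    return "density"


def play_reserved_songs(reservations, catalog):
    """Walk the reservation queue (cf. S201 to S207) and pick an evaluation per song."""
    results = []
    while reservations:                      # S202: is there music to play next?
        reservation = reservations.popleft()
        song = catalog[reservation.song_id]  # S203: read the music information
        if song["has_melody_info"]:          # S204: A type or B type?
            method = standard_evaluation(song)
        else:
            method = density_evaluation(song)
        results.append((reservation.user_id, reservation.song_id, method))
    return results


catalog = {
    "A001": {"has_melody_info": True},   # A type
    "B001": {"has_melody_info": False},  # B type
}
queue = deque([Reservation("A001", "user1", 0), Reservation("B001", "user2", -2)])
results = play_reserved_songs(queue, catalog)
```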

FIG. 5 is a flowchart showing the evaluation process (S300) of the present embodiment. In the evaluation process (S300), the evaluation is based on the singing voice signal input from the microphones 43a and 43b and the performance sound probability density acquired from the music information. For the singing voice signal input from the microphones 43a and 43b (S303), the probability density of its fundamental frequency is calculated (S304). It is also determined whether the current point in time falls within a singing section (S305). The singing section may be determined from the singing voice signal itself, or by referring to the performance sound information or the performance sound probability density. It may also be determined from the progress of the lyrics information included in the music information.
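The patent leaves the singing section test of S305 open. One simple possibility, when the decision is made from the singing voice signal itself, is an energy threshold on each frame. The sketch below assumes samples normalized to the range -1 to 1; the function name and the threshold value are illustrative assumptions.

```python
import math


def is_singing_section(frame, threshold=0.02):
    """Return True when the RMS level of the frame reaches the threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms >= threshold


# Hypothetical frames: silence, and a 220 Hz tone at half amplitude (8 kHz sampling).
silent_frame = [0.0] * 256
voiced_frame = [0.5 * math.sin(2 * math.pi * 220 * i / 8000) for i in range(256)]
silent_result = is_singing_section(silent_frame)
voiced_result = is_singing_section(voiced_frame)
```

An energy threshold reacts to any sound picked up by the microphone, so a practical system might combine it with the other cues the text mentions, such as the progress of the lyrics information.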

If a singing section is determined (S305: Yes), a two-dimensional normal distribution region is calculated for the probability density of the fundamental frequency calculated in S304 (S306). FIG. 6 is a diagram for explaining the two-dimensional normal distribution region of the pitch probability density in the present embodiment. FIG. 6(A) is a graph showing the probability density of the fundamental frequency, in which a probability density (likelihood) is given for the fundamental frequency fc at the time of analysis. In the present embodiment, evaluation accuracy is improved by applying a predetermined distribution centered on this fundamental frequency fc as a reference. Specifically, a normal distribution is applied in both the frequency direction and the time axis direction to create the two-dimensional normal distribution region shown in FIG. 6(B). The distribution referenced to the fundamental frequency fc is not limited to a two-dimensional normal distribution; it may extend only along the frequency axis or only along the time axis. The form of the distribution is likewise not limited to a normal distribution, and various other distributions may be adopted. The two-dimensional normal distribution region created in this way is used both to extract the model singing characteristic from the performance sound probability density and to evaluate the singing voice signal.
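A region of the kind shown in FIG. 6(B) can be sketched as the product of two Gaussian weights on a grid of time frames and frequency bins. The grid size, the standard deviations, and the `peak` argument (the probability density value at fc) are assumptions; the patent does not give numerical values.

```python
import math


def gaussian_region(center_freq_bin, center_frame, n_freq_bins, n_frames,
                    sigma_freq=2.0, sigma_time=1.5, peak=1.0):
    """Return region[frame][freq_bin], a separable Gaussian weight around fc."""
    region = []
    for t in range(n_frames):
        w_time = math.exp(-0.5 * ((t - center_frame) / sigma_time) ** 2)
        row = [
            peak * w_time * math.exp(-0.5 * ((f - center_freq_bin) / sigma_freq) ** 2)
            for f in range(n_freq_bins)
        ]
        region.append(row)
    return region


# Hypothetical grid: 5 time frames by 21 frequency bins, fc at bin 10 in frame 2.
region = gaussian_region(center_freq_bin=10, center_frame=2,
                         n_freq_bins=21, n_frames=5)
# Frequency-axis-only variant: an infinite time spread gives every frame weight 1.
freq_only = gaussian_region(10, 2, 21, 5, sigma_time=math.inf)
```

Setting `sigma_time` to infinity reproduces the frequency-axis-only distribution mentioned in the text, and a very large `sigma_freq` approximates the time-axis-only case.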

As described with reference to FIG. 3, the performance sound probability density contains information on accompaniment sounds, backing chorus, and the like in addition to the singing sound needed for singing evaluation. Using the performance sound probability density for evaluation as it is could therefore lower evaluation accuracy. In the present embodiment, the model singing characteristic is extracted by filtering the performance sound probability density with the singing voice signal, and this model singing characteristic is compared with the singing voice signal to improve evaluation accuracy. The rationale is that, although singers differ in skill, singing properly yields a characteristic similar to the model singing characteristic; conversely, if the user does not sing properly, the model singing characteristic cannot be properly extracted from the performance sound probability density.

In the evaluation process, the performance sound probability density of the section to be evaluated is acquired (S301). From the acquired probability density, a singing pitch vicinity range referenced to the fundamental frequency of the singing voice signal is extracted, and the model singing characteristic is calculated (S302). In the present embodiment, multiplying the performance sound probability density by the two-dimensional normal distribution region of the singing voice signal performs the extraction of the singing pitch vicinity range and the calculation of the model singing characteristic at the same time. A user may sing in an octave different from the pitch at which the song should be sung. The singing pitch vicinity range may therefore be extracted for each of a plurality of octaves. Alternatively, ranges may be extracted for a plurality of octaves, and the octave in which the performance sound probability density accounts for a high proportion may be adopted as the singing pitch vicinity range.
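The multiplication in S302 and the octave handling can be sketched for a single time frame as follows. The log-frequency axis with one bin per semitone, the window width, and the rule of keeping the octave with the largest extracted mass are assumptions made for illustration.

```python
import math

BINS_PER_OCTAVE = 12  # assumption: one bin per semitone on a log-frequency axis


def frequency_window(center_bin, n_bins, sigma=1.0):
    """Gaussian weight along the frequency axis, centred on the sung pitch."""
    return [math.exp(-0.5 * ((b - center_bin) / sigma) ** 2) for b in range(n_bins)]


def model_characteristic(performance_density, center_bin, sigma=1.0):
    """Element-wise product of the performance density and the window (cf. S302)."""
    window = frequency_window(center_bin, len(performance_density), sigma)
    return [p * w for p, w in zip(performance_density, window)]


def best_octave(performance_density, sung_bin, shifts=(-1, 0, 1)):
    """Try the sung pitch shifted by whole octaves; keep the shift with most mass."""
    best_shift, best_mass, best_char = None, -1.0, None
    for shift in shifts:
        center = sung_bin + shift * BINS_PER_OCTAVE
        if not 0 <= center < len(performance_density):
            continue
        char = model_characteristic(performance_density, center)
        mass = sum(char)
        if mass > best_mass:
            best_shift, best_mass, best_char = shift, mass, char
    return best_shift, best_char


# Hypothetical frame: model vocal at bin 30, accompaniment at bin 10.
density = [0.0] * 48
density[30] = 0.6
density[10] = 0.4
# The user sings one octave below the model vocal (bin 18).
shift, characteristic = best_octave(density, sung_bin=18)
```

Here the accompaniment component at bin 10 lies outside every candidate window and is suppressed, while the model vocal at bin 30 is recovered once the sung pitch is shifted up by one octave.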

FIG. 7 is a diagram for explaining the singing pitch vicinity range of the present embodiment. FIG. 7(A) is a graph showing the change over time of the singing pitch probability density. FIG. 7(B) is a graph showing the singing pitch vicinity range on the graph of FIG. 7(A). The singing pitch vicinity range shown in FIG. 7(B) is a range referenced to the fundamental frequency of the singing user, and any singing pitch probability density outside this range is excluded from the singing evaluation. This narrows the frequency range to be evaluated and reduces the load of the evaluation process.

FIG. 8 is a diagram for explaining the calculation of the model singing characteristic of the present embodiment. FIG. 8(A) shows the performance sound probability density at a given time together with the two-dimensional normal distribution region calculated from the singing voice signal. Multiplying the performance sound probability density by the two-dimensional normal distribution region yields the model singing characteristic shown in FIG. 8(B).

The singing evaluation is performed by comparing the calculated model singing characteristic with the two-dimensional normal distribution region, that is, by calculating their degree of agreement (S307). FIG. 9 is a diagram for explaining the comparison process (S307) of the present embodiment. FIG. 9(A) shows the model singing characteristic calculated from the performance sound information, and FIG. 9(B) shows the two-dimensional normal distribution region calculated from the singing voice signal. The comparison process is executed by calculating the degree of agreement between the two. The better the two match, the higher the degree of agreement and hence the singing evaluation. In the present embodiment, the two-dimensional normal distribution region extends along both the frequency axis and the time axis, so the evaluation can also take the time axis direction into account. When the time axis direction is taken into account, it is preferable either to make the model singing characteristic two-dimensional as well, for example by using its change over time, or to convert the two-dimensional normal distribution region into a one-dimensional distribution along the frequency axis based on its change over time before comparing.
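The patent does not state which measure is used for the degree of agreement in S307. As one possible stand-in, the sketch below uses cosine similarity between the two distributions along the frequency axis; the function name and the sample values are assumptions.

```python
import math


def agreement(model_characteristic, singing_region):
    """Cosine similarity of two non-negative distributions (0 = none, 1 = same shape)."""
    dot = sum(a * b for a, b in zip(model_characteristic, singing_region))
    norm_a = math.sqrt(sum(a * a for a in model_characteristic))
    norm_b = math.sqrt(sum(b * b for b in singing_region))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical one-dimensional slices along the frequency axis.
model = [0.0, 0.2, 0.6, 0.2, 0.0]
on_pitch = [0.0, 0.25, 0.5, 0.25, 0.0]    # sung close to the model pitch
off_pitch = [0.5, 0.25, 0.0, 0.0, 0.25]   # sung away from the model pitch
good = agreement(model, on_pitch)
bad = agreement(model, off_pitch)
```

Cosine similarity compares only the shapes of the two distributions and ignores their overall magnitude, so a practical measure would probably also need to reflect how much of the performance sound probability density fell inside the singing pitch vicinity range.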

A score is calculated by accumulating the comparison scores calculated in the comparison process (S307) (S308). Repeating the evaluation process (S300) throughout playback of the music information yields a score for the entire piece of music. When it is determined that playback of the music has ended (S207: Yes), the score calculated as the result of the evaluation process (S300) is displayed on the monitor 41 or the like, notifying the user who sang of the score as a measure of singing ability.
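The accumulation in S308 can be sketched as a small running total. The class name and the final scaling to a 0 to 100 range are assumptions; the patent says only that the comparison scores are accumulated.

```python
class ScoreAccumulator:
    """Running total of per-frame comparison scores (cf. S308)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, frame_score):
        """Add the comparison score of one evaluated frame."""
        self.total += frame_score
        self.count += 1

    def final_score(self):
        """Average the accumulated scores and scale to 0-100 (assumed scale)."""
        if self.count == 0:
            return 0.0
        return 100.0 * self.total / self.count


accumulator = ScoreAccumulator()
for frame_score in [1.0, 0.5, 0.0]:  # hypothetical per-frame agreement values
    accumulator.add(frame_score)
final = accumulator.final_score()
```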

As described above, according to the karaoke system of the present embodiment, even for music information that does not include singing evaluation information such as main melody information, the evaluation targets the singing pitch vicinity range, which is the frequency range in the performance information near the pitch of the singing voice signal. This keeps the processing load low while enabling highly accurate singing evaluation.

In the present embodiment, probability densities are calculated for both the singing voice signal and the performance sound information and are then compared. However, each may be represented not only by a probability density but also by various other indices.

In the present embodiment, the performance sound probability density obtained by analyzing the performance information is attached to the music information (B type). This performance sound probability density need not be attached in advance and may instead be calculated during the music reproduction process. Likewise, instead of being calculated from the singing voice signal of the user singing during the music reproduction process, the singing pitch vicinity range, or information from which the singing pitch vicinity range can be identified, may be attached to the music information in advance. In that case, a singer arranged by the provider of the music information (the manufacturer) sings the song, and either the singing pitch vicinity range obtained from that singing voice signal, or information from which that range can be identified, is attached to the music information.

Attaching the singing pitch vicinity range, or information from which it can be identified, to the music information in this way allows the provider of the music information (the manufacturer) to supply an appropriate singing pitch vicinity range by using a singer who, while not as skilled as a professional, can sing reasonably well.

The present invention has been described above using a karaoke system, but it is not limited to karaoke systems. A singing evaluation device that performs singing evaluation inside or outside a karaoke device also falls within the scope of the present invention. In addition, it is now common to perform karaoke on a smartphone by installing a karaoke application (program) on it, and karaoke programs that run on game machines are also well known. A singing evaluation program that is implemented in such a karaoke program and realizes the functions of the present invention when installed on these various information processing devices also falls within the scope of the present invention. It goes without saying that the singing evaluation method likewise falls within the scope of the present invention.

1: Remote control device
2: Karaoke device
5: Host device
11: Touch panel monitor
11a: Display unit
11b: Touch panel
12: Video RAM
13: Video control unit
14: Memory
15: CPU
16: Wireless LAN communication unit
17: Operation unit
18: Operation processing unit
21: Operation unit
22: Operation processing unit
24a: LAN communication unit
24b: Wireless LAN communication unit
25: Acoustic control unit
27: Memory
28: Video RAM
29: Video reproduction unit
30: CPU
31: Video control unit
32: Hard disk
33: Touch panel monitor
34: Touch panel
35: Display unit
41: Monitor
42: Speaker
43a, 43b: Microphones
130: Access point

Claims

A singing evaluation device comprising:
singing pitch vicinity range calculation means for calculating, by frequency analysis of a singing voice signal input from a microphone, a singing pitch vicinity range that is a frequency range near the singing pitch;
model singing voice characteristic calculation means for calculating, from an accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch vicinity range calculated by the singing pitch vicinity range calculation means;
singing voice characteristic calculation means for calculating a singing voice characteristic by frequency analysis of the singing voice signal; and
evaluation means for evaluating the singing voice signal by comparing the singing voice characteristic of the singing voice signal calculated by the singing voice characteristic calculation means with the model singing characteristic calculated by the model singing voice characteristic calculation means.

The singing evaluation device according to claim 1, wherein the singing pitch vicinity range is a predetermined distribution based on the pitch of the singing voice signal in at least one of the frequency axis direction and the time axis direction.

A singing evaluation device comprising:
singing pitch vicinity range calculation means for calculating a singing pitch vicinity range from an accompaniment sound signal to which the singing pitch vicinity range, or information from which the singing pitch vicinity range can be identified, is attached;
model singing voice characteristic calculation means for calculating, from the accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch vicinity range; and
evaluation means for evaluating a singing voice signal input from a microphone by comparing the singing voice characteristic of the singing voice signal with the model singing characteristic.

A karaoke device comprising:
the singing evaluation device according to any one of claims 1 to 3; and
reproduction means for reproducing the accompaniment sound signal.

A singing evaluation program that enables execution of:
a singing pitch vicinity range calculation process that calculates, by frequency analysis of a singing voice signal input from a microphone, a singing pitch vicinity range that is a frequency range near the singing pitch;
a model singing voice characteristic calculation process that calculates, from an accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch vicinity range calculated by the singing pitch vicinity range calculation means;
singing voice characteristic calculation means that calculates a singing voice characteristic by frequency analysis of the singing voice signal; and
an evaluation process that evaluates the singing voice signal by comparing the singing voice characteristic of the singing voice signal calculated by the singing voice characteristic calculation means with the model singing characteristic calculated by the model singing voice characteristic calculation means.

A singing evaluation program that enables execution of:
a singing pitch vicinity range calculation process that calculates a singing pitch vicinity range from an accompaniment sound signal to which the singing pitch vicinity range, or information from which the singing pitch vicinity range can be identified, is attached;
a model singing voice characteristic calculation process that calculates, from the accompaniment sound signal including a model singing sound, a model singing characteristic within the singing pitch vicinity range; and
an evaluation process that evaluates a singing voice signal input from a microphone by comparing the singing voice characteristic of the singing voice signal with the model singing characteristic.