JP2008015388A - Singing skill evaluation method and karaoke machine

Info

Publication number: JP2008015388A
Application number: JP2006188742A
Authority: JP
Inventors: Hideyo Takeuchi; 英世竹内; Masahiro Hoguro; 政大保黒
Original assignee: DDS KK
Current assignee: DDS KK
Priority date: 2006-07-10
Filing date: 2006-07-10
Publication date: 2008-01-24

Abstract

PROBLEM TO BE SOLVED: To score only the sung parts by determining whether sound input from a microphone is accompaniment information or a human singing voice. SOLUTION: A scoring processing unit 12 incorporated in a karaoke apparatus 1 includes an accompaniment/singing voice determination unit 21 that determines whether an audio signal input from the microphone 2 is karaoke accompaniment or a human singing voice. The accompaniment/singing voice determination unit 21 calculates a spectrum by frequency analysis of the audio signal input from the microphone 2 and determines that the input audio signal is accompaniment information when the energy of the spectrum is detected to be concentrated in a high frequency band. COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a singing ability evaluation method and a karaoke apparatus.

Almost all karaoke apparatuses currently on the market have a karaoke scoring function. A known conventional karaoke scoring device, such as the one described in Patent Document 1, compares the pitch extracted from the singer's voice input through a microphone with the pitch of a guide melody and calculates a score based on the degree of agreement.
Patent Document 1: Japanese Patent No. 2925759

These karaoke scoring devices detect the pitch from the singer's voice input through the microphone and score the singing based on that pitch. However, the microphone does not always receive only the singer's voice. For example, with a karaoke apparatus installed in a karaoke box, the microphone often receives an audio signal in which the singer's voice and the accompaniment are mixed. When the singer holds the microphone away from the mouth, sings quietly, or when the accompaniment is loud, the karaoke accompaniment leaks into the singer's microphone and can affect the scoring result (see FIG. 13).

The present invention has been made to solve the above problem. Its object is to provide a singing ability evaluation method, and a karaoke apparatus having a singing ability evaluation function, that can determine whether an audio signal input from audio signal input means is the singer's singing voice information or accompaniment information.

To achieve the above object, the singing ability evaluation method according to claim 1 of the present invention causes a computer to execute: a spectrum calculation step of performing frequency analysis on an input audio signal input from audio signal input means to calculate a spectrum; a spectrum storage step of storing the spectrum obtained in the spectrum calculation step in spectrum storage means; and an accompaniment/singing voice determination step of determining that the input audio signal is accompaniment information when the energy of the spectrum read from the spectrum storage means is detected to be concentrated in a high frequency band.
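The three steps of claim 1 can be illustrated with a minimal sketch. The text does not say how "concentration in a high frequency band" is measured, so the energy-ratio test, the 2 kHz cutoff, the 0.5 ratio threshold, and all function names below are assumptions chosen for illustration only.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def high_band_ratio(spectrum, sample_rate, frame_len, cutoff_hz):
    """Fraction of the total spectral energy located at or above cutoff_hz."""
    total = sum(spectrum)
    if total == 0.0:
        return 0.0
    high = sum(p for k, p in enumerate(spectrum)
               if k * sample_rate / frame_len >= cutoff_hz)
    return high / total


def is_accompaniment(frame, sample_rate, cutoff_hz=2000.0, ratio_threshold=0.5):
    """Hypothetical decision rule: accompaniment if most energy is in the high band."""
    spectrum = power_spectrum(frame)            # spectrum calculation step
    stored = list(spectrum)                     # stands in for the spectrum storage step
    ratio = high_band_ratio(stored, sample_rate, len(frame), cutoff_hz)
    return ratio > ratio_threshold              # accompaniment/singing voice determination
```

With these assumed defaults, a frame dominated by a 250 Hz tone would be classified as singing voice and a frame dominated by a 3 kHz tone as accompaniment. In practice the cutoff and threshold would need tuning on real recordings.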

In the singing ability evaluation method according to claim 2 of the present invention, in addition to the configuration of the invention according to claim 1, the accompaniment/singing voice determination step calculates a pitch from the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when the pitch is higher than a fixed threshold.
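A rough sketch of the pitch-threshold test in claim 2 follows. It takes the strongest non-DC DFT bin as the pitch, which is a deliberate simplification and not necessarily how the embodiment extracts pitch; the 1 kHz threshold and the function names are likewise assumptions.

```python
import cmath
import math


def peak_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(frame)
    best_bin, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(acc) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def is_accompaniment_by_pitch(frame, sample_rate, pitch_threshold_hz=1000.0):
    """Hypothetical rule: a pitch above the threshold is treated as accompaniment."""
    return peak_frequency(frame, sample_rate) > pitch_threshold_hz
```

The idea behind such a threshold is that sung fundamentals rarely exceed roughly 1 kHz, whereas accompaniment instruments can; the exact value is a design choice.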

In the singing ability evaluation method according to claim 3 of the present invention, in addition to the configuration of the invention according to claim 1, the accompaniment/singing voice determination step calculates the slope of the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when the calculated slope is greater than a fixed threshold.
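One plausible way to compute a spectral slope for claim 3 is a least-squares line fitted to the log-power spectrum. The text does not define the slope, so the dB scale, the bin-index axis, the sign convention (positive means energy rising toward high frequencies), and the zero threshold are all assumptions.

```python
import math


def spectral_slope(power_spectrum, floor=1e-12):
    """Least-squares slope of log-power (dB) against bin index."""
    ys = [10.0 * math.log10(max(p, floor)) for p in power_spectrum]
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def is_accompaniment_by_slope(power_spectrum, slope_threshold=0.0):
    """Hypothetical rule: a slope above the threshold is treated as accompaniment."""
    return spectral_slope(power_spectrum) > slope_threshold
```

Under this convention, a spectrum that falls by 10 dB per bin has a slope of -10 and would be treated as singing voice, while one that rises by 10 dB per bin would be treated as accompaniment.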

The singing ability evaluation method according to claim 4 of the present invention is the method according to any one of claims 1 to 3, wherein the accompaniment/singing voice determination step further measures the amount of overtones contained in the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when that amount exceeds a fixed threshold.

The singing ability evaluation method according to claim 5 of the present invention is the method according to any one of claims 1 to 3, wherein the accompaniment/singing voice determination step further counts the number of extrema contained in the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when that number exceeds a fixed threshold.
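The extrema count of claim 5 could be computed as in the sketch below. Counting strict interior local maxima and minima is one possible reading of "extrema", and the threshold of 8 is a placeholder, not a value from the text.

```python
def count_extrema(values):
    """Count interior points that are strict local maxima or strict local minima."""
    count = 0
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


def is_accompaniment_by_extrema(power_spectrum, extrema_threshold=8):
    """Hypothetical rule: many spectral extrema suggest several simultaneous sources."""
    return count_extrema(power_spectrum) > extrema_threshold
```

A real spectrum is noisy, so a practical version would likely smooth the spectrum or ignore extrema below some prominence before counting.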

The karaoke apparatus equipped with a singing ability evaluation function according to claim 6 of the present invention comprises: spectrum calculation means for performing frequency analysis on an input audio signal input from audio signal input means to calculate a spectrum; spectrum storage means for storing the spectrum obtained by the spectrum calculation means; and accompaniment/singing voice determination means for determining that the input audio signal is accompaniment information when the energy of the spectrum read from the spectrum storage means is detected to be concentrated in a high frequency band.

In the karaoke apparatus according to claim 7 of the present invention, in addition to the configuration of the invention according to claim 6, the accompaniment/singing voice determination means calculates a pitch from the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when the pitch is higher than a fixed threshold.

In the karaoke apparatus according to claim 8 of the present invention, in addition to the configuration of the invention according to claim 6, the accompaniment/singing voice determination means calculates the slope of the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when the calculated slope is greater than a fixed threshold.

The karaoke apparatus according to claim 9 of the present invention is the karaoke apparatus according to any one of claims 6 to 8, wherein the accompaniment/singing voice determination means further measures the amount of overtones contained in the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when that amount exceeds a fixed threshold.

The karaoke apparatus according to claim 10 of the present invention is the karaoke apparatus according to any one of claims 6 to 8, wherein the accompaniment/singing voice determination means further counts the number of extrema contained in the spectrum read from the spectrum storage means and determines that the input audio signal is accompaniment information when that number exceeds a fixed threshold.

The singing ability evaluation method according to claim 1 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the frequency characteristics of the spectrum calculated from the input audio signal. Because this prevents accompaniment information input through a microphone or the like from being mistakenly scored as singing voice information, singing ability can be evaluated with high accuracy.

The singing ability evaluation method according to claim 2 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the pitch calculated from the input audio signal. Because this prevents accompaniment information input through a microphone or the like from being mistakenly scored as singing voice information, singing ability can be evaluated with high accuracy.

The singing ability evaluation method according to claim 3 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the slope of the spectrum calculated from the input audio signal. Because this prevents accompaniment information input through a microphone or the like from being mistakenly scored as singing voice information, singing ability can be evaluated with high accuracy.

The singing ability evaluation method according to claim 4 of the present invention, in addition to the effects of the invention according to any one of claims 1 to 3, also uses the amount of overtones contained in the spectrum for the accompaniment/singing voice determination, which makes the singing ability evaluation more accurate.

The singing ability evaluation method according to claim 5 of the present invention, in addition to the effects of the invention according to any one of claims 1 to 3, also uses the number of extrema of the spectrum for the accompaniment/singing voice determination, which makes the singing ability evaluation more accurate.

The karaoke apparatus according to claim 6 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the frequency characteristics of the spectrum calculated from the input audio signal. In a karaoke box or similar setting, this prevents accompaniment that has leaked into the microphone from being mistakenly scored as singing voice information, so a highly accurate singing ability evaluation function can be provided.

The karaoke apparatus according to claim 7 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the pitch calculated from the input audio signal. In a karaoke box or similar setting, this prevents accompaniment that has leaked into the microphone from being mistakenly scored as singing voice information, so a highly accurate singing ability evaluation function can be provided.

The karaoke apparatus according to claim 8 of the present invention can determine whether the input audio signal is accompaniment information or singing voice information by using the slope of the spectrum calculated from the input audio signal. In a karaoke box or similar setting, this prevents accompaniment that has leaked into the microphone from being mistakenly scored as singing voice information, so a highly accurate singing ability evaluation function can be provided.

The karaoke apparatus according to claim 9 of the present invention, in addition to the effects of the invention according to any one of claims 6 to 8, also uses the amount of overtones contained in the spectrum for the accompaniment/singing voice determination, so a more accurate singing ability evaluation function can be provided.

The karaoke apparatus according to claim 10 of the present invention, in addition to the effects of the invention according to any one of claims 6 to 8, also uses the number of extrema of the spectrum for the accompaniment/singing voice determination, so a more accurate singing ability evaluation function can be provided.

Next, an embodiment of the present invention is described in detail with reference to the drawings. The embodiment is a karaoke apparatus equipped with a singing ability evaluation device. FIG. 1 shows the external appearance of the karaoke scoring device of this embodiment. As shown in FIG. 1, a microphone 2, a display 3, and an AMP 4 are connected to the karaoke apparatus 1, and a speaker 5 is connected to the AMP 4.

FIG. 2 is a block diagram of the internal structure of the karaoke apparatus 1. As shown in FIG. 2, the karaoke apparatus 1 is built from electronic circuits centered on a CPU (Central Processing Unit) 19. The CPU 19 is connected to an HDD (Hard Disk Drive) 18, a video controller 6, a mixer 7, a RAM (Random Access Memory) 9, and a scoring processing unit 12, and controls the operation of each. The microphone 2 is connected to an A/D conversion unit 17 inside the karaoke apparatus 1. The A/D conversion unit 17 is connected to the RAM 9 and the mixer 7. The mixer 7 is connected to a performance device 8, and the output of the mixer 7 is passed through the performance device 8 to the external AMP 4. The A/D conversion unit 17, the scoring processing unit 12, and the CPU 19 are connected to the RAM 9. The RAM 9 stores the audio signal digitized by the A/D conversion unit 17, the karaoke scoring results calculated by the scoring processing unit 12, and similar data.

The scoring processing unit 12 consists of a pitch extraction unit 13, a vibrato detection unit 14, and a score calculation unit 15. The score calculation unit 15 calculates the score based on the outputs of the pitch extraction unit 13 and the vibrato detection unit 14.

The HDD 18 stores background video, performance data, lyric captions, and other information for a large number of karaoke songs. The operation unit 16 consists of panel switches and a remote control receiver circuit and passes the user's operation signals to the CPU 19. When the singer selects a song from the karaoke catalog and enters it through the operation unit 16, the CPU 19 receives the signal, reads the performance data of the selected song from the HDD 18, and outputs it to the mixer 7.

Meanwhile, the singer's voice input from the microphone 2 is sampled by the A/D conversion unit 17 and sent to the mixer 7. The mixer 7 combines the singer's voice from the microphone 2 with the performance data read from the HDD 18 and outputs the result to the performance device 8. The combined performance data is output from the speaker 5 via the AMP 4. At the same time, the CPU 19 sends the background video and lyric captions to the video controller 6. The lyric captions are shown on the display 3 in synchronization with the performance, and the color of the lyrics currently being performed changes progressively. The singer sings along with the accompaniment while watching the lyric captions. The CPU 19 controls this whole sequence of operations. The video controller 6 corresponds to the "display control means" in the claims, and the display 3 corresponds to the "display means" in the claims.

Next, the operation of the scoring processing unit 12 is described. Whether to use the karaoke scoring function of the karaoke apparatus is up to the singer. A singer who wants to be scored operates the operation unit 16 to turn the scoring function on. When the singer's karaoke performance begins, the CPU 19 issues a scoring start instruction to the scoring processing unit 12, which then starts karaoke scoring. Once scoring has started, the CPU 19 reads the data of the song being sung from the HDD 18 and begins writing the guide melody contained in that data to the RAM 9. Meanwhile, the singer's voice input from the microphone 2 is sampled by the A/D conversion unit 17 and recorded in the RAM 9 as an audio signal by DMA (Direct Memory Access). The pitch extraction unit 13 reads the audio signal from the RAM 9 and calculates the pitch. At the same time, the vibrato detection unit 14 reads the audio signal from the RAM 9 and detects vibrato. The score calculation unit 15 compares the pitch information detected by the pitch extraction unit 13 with the guide melody read from the RAM 9, and calculates the score by combining this comparison result with the vibrato information detected by the vibrato detection unit 14 and the accompaniment/singing voice determination result output by the accompaniment/singing voice determination unit 21. The calculated score is written to the RAM 9 as the scoring result. The A/D conversion unit 17 corresponds to the "audio signal input means" in the claims.

After the performance ends, the CPU 19 issues a scoring end instruction to the scoring processing unit 12, and the karaoke scoring process ends. The CPU 19 sends the scoring result read from the RAM 9 to the video controller 6. The scoring result is shown on the display 3, and the singer checks it with delight or disappointment. In this embodiment the score is shown on the display 3 after the performance ends, but the apparatus is not limited to this: the intermediate score from the start of singing up to the current moment may instead be shown on the display continuously so that the singer can check the score while singing.

The scoring processing unit 12 consists of a DSP (Digital Signal Processor) and dedicated scoring firmware. In typical karaoke scoring devices the DSP is used only for pitch calculation and the CPU usually performs the scoring, but in this embodiment the DSP (scoring processing unit) handles almost all scoring-related processing. This configuration greatly increases the freedom in designing the scoring circuit and enables very detailed analysis.

Next, the storage areas set up in the RAM 9 are described with reference to FIG. 3. The audio signal storage memory 9A stores the audio signal digitized by the A/D conversion unit 17. The autocorrelation function storage memory 9B stores the autocorrelation function calculated by the pitch extraction unit 13 inside the scoring processing unit 12. The Fourier series storage memory 9C stores the Fourier series calculated by the pitch extraction unit 13 inside the scoring processing unit 12. The voiced/unvoiced determination result storage memory 9D stores the result of determining whether an extracted audio frame read from the RAM 9 is voiced or unvoiced. The accompaniment/singing voice determination result storage memory 9E stores the result of determining whether an extracted audio frame read from the RAM 9 is singing voice information or accompaniment information. The pitch storage memory 9F stores the pitch calculated by the pitch extraction unit 13. The pitch change amount storage memory 9G stores the pitch change amount calculated by the vibrato detection unit 14. The vibrato information storage memory 9H stores the vibrato information calculated by the vibrato detection unit 14. The instantaneous score storage memory 9I stores the instantaneous score calculated by the score calculation unit 15. The cumulative score storage memory 9J stores the cumulative score calculated by the score calculation unit 15. The work memory 9W is used by the scoring processing unit for temporary storage during scoring. The autocorrelation function storage memory 9B and the Fourier series storage memory 9C correspond to the "spectrum storage means" in the claims.

FIG. 4 is a flowchart of the scoring procedure performed by the scoring processing unit 12. The operation of the scoring processing unit 12 is described with reference to FIGS. 3 and 4. The scoring processing unit 12 consists of the pitch extraction unit 13, the vibrato detection unit 14, the accompaniment/singing voice determination unit 21, and the score calculation unit 15.

First, the operation of the pitch extraction unit 13 is described with reference to the flowchart in FIG. 4. In the pitch extraction process, an autocorrelation function is first calculated from the audio signal read from the audio signal storage memory 9A of the RAM 9 and written to the autocorrelation function storage memory 9B of the RAM 9 (S10). Next, a fast Fourier transform is applied to the audio signal read from the audio signal storage memory 9A, and the resulting Fourier series is written to the Fourier series storage memory 9C of the RAM 9 (S11). Next, whether the input audio is voiced or unvoiced is determined from the autocorrelation function read from the autocorrelation function storage memory 9B, and the result is written to the voiced/unvoiced determination result storage memory 9D of the RAM 9 (S12). Next, the autocorrelation function and the Fourier series are read from the autocorrelation function storage memory 9B and the Fourier series storage memory 9C, the accompaniment/singing voice determination is made from them, and the result is written to the accompaniment/singing voice determination result storage memory 9E of the RAM 9 (S13). Next, the autocorrelation function and the Fourier series are read from the autocorrelation function storage memory 9B and the Fourier series storage memory 9C, the pitch is extracted from them, and the detected pitch is written to the pitch storage memory 9F of the RAM 9 (S14). The pitch extraction unit 13 performs steps S10 to S14. Steps S10 and S11 correspond to the "spectrum calculation step" and the "spectrum storage step" in the claims, and the scoring processing unit (DSP) 12 that executes steps S10 and S11 corresponds to the "spectrum calculation means". Step S13 corresponds to the "accompaniment/singing voice determination step" in the claims, and the scoring processing unit (DSP) 12 that executes steps S10 and S11 corresponds to the "accompaniment/singing voice determination means".

Next, the processing of the vibrato detection unit 14 is described. The vibrato detection unit 14 calculates the pitch change amount from the pitch read from the pitch storage memory 9F of the RAM 9 and writes it to the pitch change amount storage memory 9G of the RAM 9 (S15). It then detects vibrato from the pitch change amount read from the pitch change amount storage memory 9G and writes the vibrato detection result to the vibrato information storage memory of the RAM 9 (S16). The vibrato detection unit 14 performs steps S15 and S16.

The score calculation unit 15 reads the pitch, guide melody, voiced/unvoiced determination result, accompaniment/singing voice determination result, pitch change amount, and vibrato detection result from, respectively, the pitch storage memory 9F, guide melody storage memory 9M, voiced/unvoiced determination result storage memory 9D, accompaniment/singing voice determination result storage memory 9E, pitch change amount storage memory 9G, and vibrato information storage memory 9H of the RAM 9, and calculates the score from them (S17). The score obtained in step S17 is written to the instantaneous score storage memory 9I and the cumulative score storage memory 9J of the RAM 9. The instantaneous score storage memory 9I holds the instantaneous score obtained by analyzing a short interval, and the cumulative score storage memory 9J holds the average score obtained by accumulating and averaging the instantaneous scores from the start of scoring to the present.
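The relationship between the instantaneous score (memory 9I) and the averaged cumulative score (memory 9J) can be sketched as a running mean. The class and attribute names are hypothetical, and the incremental update is one convenient way to maintain the average; the text only states that instantaneous scores are accumulated and averaged.

```python
class ScoreAccumulator:
    """Keeps the latest instantaneous score and the running average of all scores."""

    def __init__(self):
        self.count = 0
        self.instantaneous = None   # plays the role of memory 9I
        self.average = 0.0          # plays the role of memory 9J

    def add(self, instantaneous_score):
        """Record one instantaneous score and update the running average."""
        self.count += 1
        self.instantaneous = instantaneous_score
        # Incremental mean: avoids storing every past score.
        self.average += (instantaneous_score - self.average) / self.count
        return self.average
```

Feeding the scores 80, 100, and 60 in turn leaves the average at 80, the same value a plain sum divided by the count would give.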

The instantaneous score is obtained by comparing the pitch with the guide melody and taking their similarity as the score. However, no score is calculated for an unvoiced section, that is, a section whose determination result read from the voiced/unvoiced determination result storage memory 9D is "unvoiced". Likewise, no score is calculated for an accompaniment section, that is, a section whose determination result read from the accompaniment/singing voice determination result storage memory 9E is "accompaniment information". Sections in which the pitch change amount read from the pitch change amount storage memory 9G is large are also excluded from score calculation. For a section determined to be a "vibrato section" from the information read from the vibrato information storage memory 9H, the beauty of the vibrato is calculated and used as the score. This series of scoring steps ends when a scoring end instruction is received from the CPU 19 (S18). The average score written to the cumulative score storage memory 9J becomes the score for the singer's song.
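The exclusion rules and the running average described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `ScoreAccumulator`, its fields, and the three boolean flags are hypothetical, and the instantaneous score is assumed to be supplied by the caller.

```python
class ScoreAccumulator:
    """Running mean of instantaneous scores (hypothetical helper, not from the source)."""

    def __init__(self) -> None:
        self.count = 0    # number of frames that contributed to the score
        self.mean = 0.0   # cumulative (average) score so far

    def add(self, score: float, voiced: bool, accompaniment: bool,
            unstable_pitch: bool) -> float:
        # Frames that are unvoiced, judged to be accompaniment, or whose
        # pitch change is too large are skipped and leave the mean unchanged.
        if (not voiced) or accompaniment or unstable_pitch:
            return self.mean
        self.count += 1
        self.mean += (score - self.mean) / self.count  # incremental average
        return self.mean


# Usage: only the first and last frames count toward the average.
acc = ScoreAccumulator()
acc.add(80.0, voiced=True, accompaniment=False, unstable_pitch=False)
acc.add(0.0, voiced=False, accompaniment=False, unstable_pitch=False)
acc.add(60.0, voiced=True, accompaniment=False, unstable_pitch=False)
```

Keeping only a count and a mean avoids storing every instantaneous score, which matches the idea of a single cumulative score memory.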

The operations of the pitch extraction unit 13, the vibrato detection unit 14, and the score calculation unit 15 are described in detail below. As preprocessing, the pitch extraction unit 13 performs autocorrelation analysis on the input voice to obtain an autocorrelation function. In this embodiment, the singing voice input from the microphone 2 is sampled by the A/D converter 17 at a sampling frequency of 48 kHz and written to the audio signal storage memory 9A of the RAM 9. For each analysis, the scoring processing unit 12 cuts an analysis frame of 1440 points (30 ms at 48 kHz) out of the audio signal storage memory 9A and analyzes it. The autocorrelation function method computes the correlation between the analysis frame F_0 = {x(1), x(2), ..., x(N)} and the same frame shifted by i points, F_i = {x(1+i), x(2+i), ..., x(N+i)}. An example of an expression for calculating the correlation value R(0, i) is shown in Equation 1.

In the autocorrelation function method, the shift i between F_0 and F_i is varied from 1 to N points and the correlation value (similarity) R(0, i) is calculated for each shift in turn. The autocorrelation function R(0, i) calculated in this way is written to the autocorrelation function storage memory 9B of the RAM 9.
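The correlation sweep described above can be sketched as below. Equation 1 is not reproduced in this text, so the plain sum of products used here is an assumption (the original may normalize differently), and the function name `autocorrelation` and its parameters are illustrative only.

```python
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], frame_len: int, max_lag: int) -> List[float]:
    """Return [R(0, 0), R(0, 1), ..., R(0, max_lag)] for the first frame of x.

    Assumed form: R(0, i) = sum over k of x[k] * x[k + i], k = 0 .. frame_len - 1.
    """
    if len(x) < frame_len + max_lag:
        raise ValueError("need at least frame_len + max_lag samples")
    return [sum(x[k] * x[k + i] for k in range(frame_len))
            for i in range(max_lag + 1)]


# Usage: a sine with a period of 8 samples matches itself again at lag 8
# and is inverted at lag 4.
signal = [math.sin(2 * math.pi * k / 8) for k in range(64)]
r = autocorrelation(signal, frame_len=48, max_lag=16)
```

This direct form costs on the order of `frame_len * max_lag` multiplications per frame; it is written for clarity rather than speed.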

In pitch extraction using the autocorrelation function, the pitch (fundamental frequency) f is calculated from the correlation values R(0, i) by Equation 2. Equation 2 means that the shift argmax_i {R(0, i)} at which the correlation value R(0, i) is largest, as the shift i is varied in turn, is detected as the fundamental period of the audio signal, and that the sampling frequency of 48000 Hz divided by this fundamental period is taken as the pitch. The pitch extracted in this way is written to the pitch storage memory 9F of the RAM 9.
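As a hedged sketch of the rule just described (pitch = sampling frequency divided by the lag of the largest correlation value), the function below searches only lags that correspond to a chosen analysis band. Restricting the search range is an assumption made so that lag 0 is never picked. The name `pitch_from_autocorrelation` and the default band limits are illustrative.

```python
import math
from typing import Sequence


def pitch_from_autocorrelation(r: Sequence[float], fs: float = 48000.0,
                               f_min: float = 70.0, f_max: float = 1200.0) -> float:
    """Return fs / argmax_i r[i], searching only lags inside [fs/f_max, fs/f_min]."""
    lag_min = max(1, math.ceil(fs / f_max))
    lag_max = min(len(r) - 1, math.floor(fs / f_min))
    if lag_min > lag_max:
        raise ValueError("autocorrelation is too short for the requested band")
    best_lag = max(range(lag_min, lag_max + 1), key=lambda i: r[i])
    return fs / best_lag


# Usage with a synthetic, slowly decaying autocorrelation whose first
# peak after lag 0 is at lag 100, i.e. a 480 Hz tone at 48 kHz.
r_demo = [math.cos(2 * math.pi * i / 100) * (1 - i / 1000) for i in range(700)]
pitch = pitch_from_autocorrelation(r_demo)
```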

Here, the method of determining whether the singer's voice is voiced or unvoiced is briefly described. Human speech contains voiced and unvoiced sounds, and it is generally known that a pitch cannot be calculated from unvoiced sounds. A voiced/unvoiced determination is therefore needed before the pitch is calculated. Voiced and unvoiced sounds can be distinguished simply by using the ratio R(0, i_max) / R(0, 0) of the autocorrelation function of Equation 1. If R(0, i_max) / R(0, 0) is greater than a fixed threshold, the sound is determined to be voiced. If it is smaller, the sound is determined to be unvoiced. In this embodiment, unvoiced sounds are not used for singing skill evaluation because their pitch is unreliable. The voiced/unvoiced determination is not limited to the autocorrelation ratio. Other known techniques, such as the zero-crossing method, can also be used.
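The ratio test above might look like the following sketch. The threshold value 0.3 is an arbitrary placeholder, since the source only says "a fixed threshold", and the function name `is_voiced` and the lag-range parameters are assumptions.

```python
from typing import Sequence


def is_voiced(r: Sequence[float], lag_min: int, lag_max: int,
              threshold: float = 0.3) -> bool:
    """Voiced if R(0, i_max) / R(0, 0) exceeds the threshold (placeholder value)."""
    if r[0] <= 0.0:                      # silent frame: no energy, treat as unvoiced
        return False
    peak = max(r[lag_min:lag_max + 1])   # R(0, i_max) within the search range
    return peak / r[0] > threshold
```

Dividing by R(0, 0), the frame energy, makes the decision independent of how loudly the singer sings.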

In this embodiment, pitch extraction using the autocorrelation function is combined with pitch extraction using the fast Fourier transform (FFT) to achieve more reliable pitch extraction. In FFT-based pitch extraction, an FFT spectrum is calculated from the audio signal read from the audio signal storage memory 9A of the RAM 9 and written to the Fourier series storage memory 9C. The pitch is detected as the frequency at which the FFT spectrum takes its maximum value. The autocorrelation method is effective for extracting pitch from low male voices, and the FFT method is effective for extracting pitch from high female voices. The pitch f used for scoring is selected from the pitch f1 calculated from the autocorrelation function and the pitch f2 calculated by the FFT, for example according to the selection criteria shown in Table 1: when both f1 and f2 are greater than a fixed threshold FTH, f = f2 is selected, and otherwise f = f1 is selected. This yields a highly reliable pitch f. FTH is a preset threshold that separates high notes from low notes. Here, for example, FTH = 400 Hz.
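The selection rule stated in the text (Table 1 itself is not reproduced here) can be written in a few lines. The function name `select_pitch` is illustrative; the 400 Hz default comes from the example value given for FTH.

```python
F_TH_HZ = 400.0  # example threshold from the text separating high and low notes


def select_pitch(f1: float, f2: float, f_th: float = F_TH_HZ) -> float:
    """Pick the FFT pitch f2 only when both estimates are above the threshold."""
    if f1 > f_th and f2 > f_th:
        return f2   # high register: trust the FFT estimate
    return f1       # otherwise: trust the autocorrelation estimate
```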

Next, the operation of the vibrato detection unit 14 is described. The vibrato detection unit 14 first calculates the pitch change amount. The pitch change amount D(i) is calculated by Equation 3 from the current pitch f_{i+1}, read from the pitch storage memory 9F of the RAM 9, and the pitch f_i of one analysis frame earlier. The calculated pitch change amount D(i) is written to the pitch change amount storage memory 9G.

The pitch change amount storage memory 9G of the RAM 9 buffers the pitch change signal calculated by Equation 3 for, for example, 500 ms. With a frame shift of 10 ms, 500 ms corresponds to 50 frames, so a pitch change signal of N = 50 points is buffered. For this 50-point pitch change signal D(i), the autocorrelation function e(τ) given by Equation 4 is calculated. The autocorrelation function is well suited to examining the periodicity of a signal. When e(τ) exceeds a fixed threshold, the pitch change signal can be regarded as having a certain degree of periodicity, so the input audio signal can be determined to contain vibrato. The vibrato determination result obtained in this way is written to the vibrato information storage memory 9H.
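A rough sketch of this vibrato test follows. Equations 3 and 4 are not reproduced in this text, so two forms are assumed here: D(i) = f_{i+1} - f_i for the pitch change, and an energy-normalized autocorrelation for e(τ). The lag range of 10 to 25 frames and the threshold 0.5 are placeholders, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pitch_change(pitches: Sequence[float]) -> List[float]:
    """Assumed Equation 3: frame-to-frame pitch difference."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def normalized_autocorr(d: Sequence[float], tau: int) -> float:
    """Assumed Equation 4: autocorrelation of d at lag tau divided by its energy."""
    energy = sum(v * v for v in d)
    if energy == 0.0:
        return 0.0
    return sum(d[k] * d[k + tau] for k in range(len(d) - tau)) / energy


def detect_vibrato(d: Sequence[float], tau_min: int = 10, tau_max: int = 25,
                   threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (vibrato found, largest e(tau) over the searched lags)."""
    e_max = max(normalized_autocorr(d, t) for t in range(tau_min, tau_max + 1))
    return e_max > threshold, e_max


# Usage: 51 pitch frames wobbling around 440 Hz with a 20-frame period
# (200 ms at a 10 ms frame shift) give 50 pitch change values.
pitches = [440.0 + 5.0 * math.sin(2 * math.pi * k / 20) for k in range(51)]
d = pitch_change(pitches)
found, e_max = detect_vibrato(d)
```

The returned `e_max` can be reused later as the measure of vibrato regularity when a vibrato section is scored.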

Next, the operation of the score calculation unit 15 is described in detail. The score calculation unit 15 classifies the input audio signal into the four sections (i) to (iv) shown in Table 2. Sections of type (i), unvoiced sections, and type (iii), sections with large pitch change, are not used for score calculation. Scores are calculated for type (ii), vibrato sections, and type (iv), normal singing sections. A section determined to be accompaniment information by the accompaniment/singing voice determination unit 21 is classified as an unvoiced section (i) and is not used for score calculation. In a normal singing section (iv), the pitch extracted from the input audio signal is compared with the guide melody and a score proportional to their similarity is calculated. The score of a vibrato section (ii) is calculated, for example, by multiplying the maximum value of the correlation strength e(τ) obtained by Equation 4 by a preset constant, because the larger the value of e(τ), the stronger the periodicity of the vibrato and the more beautiful it is considered to be. The final score is calculated as the sum of the scores of the vibrato sections (ii) and the normal singing sections (iv).
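The four-way classification can be illustrated as follows. Table 2 is not reproduced in this text, so the order in which the conditions are tested (vibrato before large pitch change) is an assumption, as are the names `Section`, `classify_frame`, `frame_score` and the placeholder constants.

```python
from enum import Enum
from typing import Optional


class Section(Enum):
    UNVOICED = "i"         # also used for frames judged to be accompaniment
    VIBRATO = "ii"
    UNSTABLE_PITCH = "iii"
    NORMAL = "iv"


def classify_frame(voiced: bool, accompaniment: bool, vibrato: bool,
                   pitch_change: float, max_pitch_change: float = 50.0) -> Section:
    if (not voiced) or accompaniment:
        return Section.UNVOICED
    if vibrato:                                  # assumed to take priority over (iii)
        return Section.VIBRATO
    if abs(pitch_change) > max_pitch_change:     # placeholder limit
        return Section.UNSTABLE_PITCH
    return Section.NORMAL


def frame_score(section: Section, similarity: float, e_max: float,
                vibrato_gain: float = 100.0) -> Optional[float]:
    """Score for sections (ii) and (iv); None for sections that are not scored."""
    if section is Section.VIBRATO:
        return vibrato_gain * e_max   # preset constant times max e(tau)
    if section is Section.NORMAL:
        return similarity             # similarity to the guide melody
    return None
```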

FIG. 5 is a block diagram of the score calculation unit 15. The reliability calculation module 151 reads the voiced/unvoiced determination result, the accompaniment/singing voice determination result, and the pitch change amount from the voiced/unvoiced determination result storage memory 9D, the accompaniment/singing voice determination result storage memory 9E, and the pitch change amount storage memory 9G of the RAM 9, respectively, calculates the pitch reliability from them, and writes the calculated pitch reliability to the work area 9W of the RAM 9. The instantaneous score calculation module 152 reads the pitch, the vibrato determination result, the guide melody, and the pitch reliability from the pitch storage memory 9F, the vibrato information storage memory 9H, the guide melody storage memory 9M, and the work area 9W of the RAM 9, respectively, calculates the instantaneous score for one analysis frame of the singer's voice, and writes it to the instantaneous score storage memory 9I of the RAM 9. The score accumulation module 153 accumulates the instantaneous scores read from the instantaneous score storage memory 9I, calculates the cumulative score from the start of karaoke scoring up to the present, and records it in the cumulative score storage memory 9J of the RAM 9.

After karaoke scoring ends, the cumulative score recorded in the cumulative score storage memory 9J is read from the RAM 9 and shown on the display 3 via the video controller 6. The singer sees the scoring result on the display 3 and is delighted or disappointed accordingly. The cumulative score may be displayed as it is, or it may be displayed after conversion by a score conversion function or a conversion table. Such conversion tables and functions are designed after a prior statistical survey of the score distribution, for example so that a score of 100 points is produced in no more than 5% of all cases.

Next, the operation of the accompaniment/singing voice determination unit 21 is described in detail. To design a circuit that determines whether the sound input from the dynamic microphone connected to the karaoke apparatus is karaoke accompaniment or a human singing voice, the frequency characteristics of the dynamic microphone are considered first.

FIG. 6 shows the frequency characteristics of a dynamic microphone connected to a typical karaoke apparatus. The frequency characteristics of a dynamic microphone change with the distance from the singer's mouth to the microphone. For example, (a) when the distance between the sound source and the microphone is 25 mm, the microphone has a low-frequency emphasis characteristic: low tones around 200 Hz are emphasized most and high tones of 1000 Hz and above are somewhat weakened. (b) When the distance is 50 mm, the characteristic is close to flat. (c) When the sound source is 600 mm from the microphone, low tones around 100 Hz are hardly picked up and high tones of 1000 Hz and above are emphasized, giving a curve that rises to the right (a high-frequency emphasis characteristic).

FIG. 7 explains this microphone frequency characteristic with a more intuitive example. A dynamic microphone can pick up a soprano singer's voice even when she sings from a distance: her high voice is picked up even from 1 to 2 m away. In other words, a distant microphone still picks up high-pitched sound. In contrast, the low voice of a bass vocalist is no longer picked up once the microphone is 10 cm away.

Because of this property of dynamic microphones, when karaoke accompaniment leaks into the microphone from a speaker that is farther away than the singer holding the microphone, the accompaniment that enters the microphone is the speaker output with high-frequency emphasis applied. That is, in the sound that leaks into the microphone, the bass line (low tones) carries little weight, while the drum hi-hat and the electric guitar (high tones) carry much weight (see FIG. 8). Consequently, when frequency analysis is applied to accompaniment picked up by a dynamic microphone and a spectrum is calculated, the energy of the spectrum tends to be concentrated in the high frequency band.

This property can be used to distinguish accompaniment information from a human voice. One way to detect the concentration of spectral energy in the high frequency band is to use pitch extraction. A pitch extracted from accompaniment that has leaked in from a speaker farther away than the singer is often higher in frequency than a human singing voice (although in the high female range, around 800 Hz and above, the bands can overlap). The method of discriminating accompaniment from singing voice is described below.

In this embodiment, the pitch extraction unit 13 first calculates the pitch while taking into account a band higher than the human voice. Specifically, the analysis band used in the autocorrelation function calculation (S10) and the Fourier series calculation (S11) of FIG. 4 is set wider than the human singing band (70 Hz to 1200 Hz), for example to 70 Hz to 7000 Hz, and the pitch is extracted from the autocorrelation function and Fourier series computed over this band. When a pitch of 1200 Hz or higher is detected in this configuration, that portion is regarded as accompaniment information and is not used for score calculation. With this method, part of the singing voice may be regarded as accompaniment, for example when a female singer with a strong high register sings. However, no singer sings continuously at pitches of 1200 Hz (about D6) or higher, so this causes no practical problem in karaoke scoring. The accompaniment/singing voice determination result obtained by this method is written to the accompaniment/singing voice determination result storage memory 9E.

However, accompaniment information around the 800 Hz band may also enter the microphone, so the above determination alone cannot remove accompaniment information completely. The following describes how the characteristics of the autocorrelation function are used to identify accompaniment when the detected pitch lies in a moderately high band, around 800 Hz, where it is hard to tell a female voice from accompaniment. For example, when a woman sings a high note around 800 Hz, the autocorrelation function computed from her voice is often a relatively smooth waveform, as shown in FIG. 9 (few overtones). In contrast, the autocorrelation function computed from accompaniment information contains an extremely large number of extrema (local peaks), as shown in FIG. 10 (many overtones). Low male voices also contain many overtones, but their detected pitch is low in frequency.

Using this characteristic, even when a pitch that is hard to attribute to either a female voice or accompaniment is detected, for example between 480 Hz and 1200 Hz, the portion can be determined to be accompaniment or noise if a spectrum such as the autocorrelation function has, for example, 100 or more local peaks (extrema), that is, an abnormally large number of overtones. In this embodiment, the number of local peaks (extrema) of the autocorrelation function stored in the autocorrelation function storage memory 9B of the RAM 9 is counted by this method. When the number of local peaks is greater than a preset threshold TH (for example, 100), the input sound is determined to be accompaniment information (or noise) rather than a human singing voice. The accompaniment/singing voice determination result obtained by this method is recorded in the accompaniment/singing voice determination result storage memory 9E of the RAM 9.
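The two-stage decision described so far (a pitch of 1200 Hz or higher, or a pitch in the ambiguous band together with many local peaks) can be sketched as below. Counting strict local maxima is an assumption, since the text speaks of "local peaks (extrema)" without defining them precisely. The function names are illustrative and the numeric defaults are the example values from the text.

```python
import math
from typing import Sequence


def count_local_peaks(r: Sequence[float]) -> int:
    """Number of interior samples strictly larger than both neighbours."""
    return sum(1 for i in range(1, len(r) - 1) if r[i - 1] < r[i] > r[i + 1])


def is_accompaniment(pitch_hz: float, r: Sequence[float],
                     low_hz: float = 480.0, high_hz: float = 1200.0,
                     peak_threshold: int = 100) -> bool:
    if pitch_hz >= high_hz:
        return True                                   # above the singing band
    if pitch_hz >= low_hz:
        return count_local_peaks(r) > peak_threshold  # ambiguous band
    return False                                      # low pitch: treated as voice


# Usage: a smooth periodic curve versus a jagged one, 1440 points each.
smooth = [math.cos(2 * math.pi * i / 55) for i in range(1440)]
jagged = [float(i % 2) for i in range(1440)]
```

Frames with a pitch below the ambiguous band are never rejected by the peak count, which mirrors the remark that low male voices are rich in overtones but low in pitch.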

There are various other ways to detect an abnormally large number of overtones. For example, the length L of the line connecting adjacent elements of the spectrum can be calculated, and the number of overtones can be judged abnormally large when this measure L exceeds a fixed threshold. If the autocorrelation function R(0, i) calculated by Equation 1 is taken as the "spectrum", the length L can be calculated from it using Equation 5 or Equation 6. Equations 5 and 6 are essentially equivalent, and either can be used to measure (quantify) the amount of overtones. Thus, even when a pitch that is hard to attribute to either a female voice or accompaniment is detected, for example between 480 Hz and 1200 Hz, the portion can be determined to be accompaniment or noise if L exceeds a fixed threshold. Needless to say, an FFT spectrum or another spectrum can be used in place of the autocorrelation function.
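Equations 5 and 6 are not reproduced in this text. As a guess at what "the length connecting adjacent elements" could mean, the sketch below gives two candidate measures: the sum of absolute differences between neighbours, and the Euclidean length of the polyline with unit spacing between samples. Both forms and both function names are assumptions.

```python
import math
from typing import Sequence


def total_variation(r: Sequence[float]) -> float:
    """Candidate measure: sum of |r[i+1] - r[i]|."""
    return sum(abs(b - a) for a, b in zip(r, r[1:]))


def polyline_length(r: Sequence[float]) -> float:
    """Candidate measure: length of the polyline through (i, r[i])."""
    return sum(math.hypot(1.0, b - a) for a, b in zip(r, r[1:]))
```

Both measures grow as the curve becomes more jagged. In practice the curve would probably need to be normalized first (for example by R(0, 0)) so that a single threshold works regardless of input level; that step is also an assumption.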

In this embodiment, in addition to the accompaniment/singing voice determination based on the autocorrelation function described above, a further determination is made using the Fourier series (FFT spectrum) stored in the Fourier series storage memory 9C of the RAM 9, which further increases the reliability of the determination. The determination method using the Fourier series is described below. FIG. 11(a) shows the FFT spectrum obtained by applying the fast Fourier transform to a singing voice picked up by the microphone, and FIG. 11(b) shows the FFT spectrum obtained from accompaniment that leaked into the microphone. The pitch can be detected, for example, as the frequency at which the FFT spectrum takes its maximum value. The pitch extracted from the singing voice is around 400 Hz, whereas the FFT spectrum of the accompaniment that leaked into the microphone is distributed around 1500 Hz. For example, by treating any portion in which a pitch of 1200 Hz or higher is detected as accompaniment information and excluding it from score calculation, the accuracy of karaoke scoring can be improved. The accompaniment/singing voice determination result obtained by this method is recorded in the accompaniment/singing voice determination result storage memory 9E of the RAM 9.

Finally, in this embodiment, in addition to the determinations described above, an accompaniment/singing voice determination is made from the slope of the FFT spectrum (Fourier series) read from the Fourier series storage memory 9C of the RAM 9. The method is as follows. The portion of the spectrum between 200 Hz and 7000 Hz is cut out, and the slope of the straight line that approximates it by the least squares method is calculated. When the slope of this line is greater than a preset fixed threshold TH, the input sound is determined to be accompaniment information. The accompaniment/singing voice determination result obtained by this method is recorded in the accompaniment/singing voice determination result storage memory 9E of the RAM 9.
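A least-squares slope over the 200 Hz to 7000 Hz band can be computed as in the sketch below. The text does not say whether the spectrum is on a linear or logarithmic magnitude scale, nor what the threshold TH is, so the magnitudes are taken as given and the threshold must be passed in by the caller. The function names are illustrative.

```python
from typing import Sequence


def band_slope(freqs_hz: Sequence[float], mags: Sequence[float],
               lo_hz: float = 200.0, hi_hz: float = 7000.0) -> float:
    """Slope of the least-squares line through the spectrum inside [lo_hz, hi_hz]."""
    pts = [(f, m) for f, m in zip(freqs_hz, mags) if lo_hz <= f <= hi_hz]
    if len(pts) < 2:
        raise ValueError("need at least two spectrum bins inside the band")
    n = len(pts)
    mean_f = sum(f for f, _ in pts) / n
    mean_m = sum(m for _, m in pts) / n
    sxx = sum((f - mean_f) ** 2 for f, _ in pts)
    sxy = sum((f - mean_f) * (m - mean_m) for f, m in pts)
    return sxy / sxx


def is_accompaniment_by_slope(slope: float, threshold: float) -> bool:
    """Signed slope above the threshold is treated as accompaniment."""
    return slope > threshold
```

The comparison uses the signed slope: a spectrum tilted toward high frequencies has a larger signed slope than one that falls off steeply.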

FIG. 12(a) shows the slope of the FFT spectrum obtained by applying the fast Fourier transform to a singing voice input from the microphone, and FIG. 12(b) shows the slope of the FFT spectrum obtained from accompaniment that leaked into the microphone. It can be observed that the slope of the FFT spectrum calculated from the singing voice is large and the slope calculated from the accompaniment is small. With this characteristic it is easy to distinguish the singing voice from accompaniment that has leaked into the microphone. In this way, accompaniment and singing voice can be distinguished by using the fact that sound arriving from a source farther away than the singer is emphasized at high frequencies by the high-frequency emphasis characteristic of the dynamic microphone.

In this embodiment, the autocorrelation function method and the Fourier transform method are used for pitch detection, but other known pitch detection methods can also be used, such as the cross-correlation method, the cepstrum method, autocorrelation of the square-root or fourth-root spectrum, autocorrelation of the logarithmic spectrum, and linear prediction. The present invention determines the input sound to be accompaniment information when a high pitch that is unlikely to be a human voice is detected, and the pitch detection method is not particularly limited. Needless to say, the spectrum used for accompaniment discrimination may also be, for example, an LPC spectrum obtained by linear prediction analysis, a cross-correlation function, a cepstrum, an LPC cepstrum, or a square-root or fourth-root spectrum.

Other known frequency features, such as an LPC spectrum, a group delay spectrum, an LPC cepstrum, a cepstrum, an autocorrelation function, or a cross-correlation function, can also be used as the frequency feature from which the slope of the straight line is obtained.

For example, the fast Fourier transform is applied to the input sound to obtain an FFT spectrum. Applying the inverse fast Fourier transform to the FFT spectrum then yields a feature called the cepstrum. Replacing the high-order coefficients of this cepstrum with 0 and applying the fast Fourier transform again yields a smoothed spectrum. In this smoothed spectrum, the average value AH of the high-order spectral coefficients, for example those at 600 Hz and above, and the average value AL of the low-order coefficients, those below 600 Hz, are calculated. The ratio RATE of the high-order to the low-order spectral coefficients is then obtained, for example by Equation 7, and when RATE is greater than a fixed threshold, the input sound can be determined to be accompaniment information.

In the above example, the smoothed spectrum was obtained from the cepstrum produced by the inverse Fourier transform, but AH, AL, and the ratio RATE of high-order to low-order coefficients may instead be calculated from Fourier transform coefficients that have not been smoothed. Needless to say, other known frequency features, such as an LPC spectrum, a group delay spectrum, an LPC cepstrum, a cepstrum, an autocorrelation function, or a cross-correlation function, can also be used as the frequency feature from which the ratio RATE or the slope of the straight line is obtained.
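The high/low ratio can be illustrated on an unsmoothed spectrum, which the text explicitly allows. Equation 7 is not reproduced here, so the plain quotient RATE = AH / AL is an assumption, as are the function name `high_low_rate` and the handling of an empty or all-zero low band.

```python
import math
from typing import Sequence


def high_low_rate(freqs_hz: Sequence[float], mags: Sequence[float],
                  split_hz: float = 600.0) -> float:
    """Assumed Equation 7: mean magnitude at or above split_hz over mean below it."""
    high = [m for f, m in zip(freqs_hz, mags) if f >= split_hz]
    low = [m for f, m in zip(freqs_hz, mags) if f < split_hz]
    if not high or not low:
        raise ValueError("spectrum must have bins on both sides of split_hz")
    ah = sum(high) / len(high)   # AH: average of the high-order coefficients
    al = sum(low) / len(low)     # AL: average of the low-order coefficients
    return math.inf if al == 0.0 else ah / al
```

A frame would then be flagged as accompaniment when the returned value exceeds a threshold chosen from measured data; the source does not give that threshold.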

This embodiment has shown an example in which the pitch extracted from the singer's voice is compared with the guide melody and the score is calculated from their similarity. However, a scoring system that does not refer to a guide melody can also be constructed. For example, the scoring system may use an algorithm that calculates the minimum distance between the pitch extracted from the singer's voice and the nearest note of the equal-tempered scale (or of the just-intonation scale) and gives a higher score the smaller this distance is. A scoring system may also measure the amount of overtones contained in the singing voice and give a higher score the more overtones there are. Alternatively, an algorithm may estimate the notes that harmonize with the guide melody (notes that do not form a dissonance) and add a high score when the singer's voice matches such a harmonizing note.
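The guide-free variant based on the nearest equal-tempered note might be sketched as follows. Measuring the distance in cents, the A4 = 440 Hz reference, and the linear mapping from distance to a 0 to 100 score are all assumptions for illustration; the source only says that a smaller distance should give a higher score.

```python
import math


def cents_off_nearest_semitone(f_hz: float, a4_hz: float = 440.0) -> float:
    """Distance in cents (0 to 50) from f_hz to the nearest equal-tempered note."""
    semitones = 12.0 * math.log2(f_hz / a4_hz)
    return abs(semitones - round(semitones)) * 100.0


def temperament_score(f_hz: float, a4_hz: float = 440.0) -> float:
    """Hypothetical mapping: 100 when exactly on a note, 0 when a quarter tone off."""
    return 100.0 * (1.0 - cents_off_nearest_semitone(f_hz, a4_hz) / 50.0)
```

Working in cents rather than hertz keeps the tolerance the same in every octave, since the spacing between notes in hertz grows with pitch.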

The present invention can be used in karaoke scoring devices mounted on karaoke apparatuses and in singing ability evaluation devices.

FIG. 1 is an external view of the karaoke apparatus in the first embodiment of the present invention.
FIG. 2 is a block diagram of the karaoke apparatus in the first embodiment of the present invention.
FIG. 3 is a diagram showing the storage areas secured in the RAM 9.
FIG. 4 is a flowchart showing the operation of the scoring processing unit of the karaoke apparatus.
FIG. 5 is a block diagram showing the score calculation procedure in the embodiment of the present invention.
FIG. 6 is a diagram showing the frequency characteristics of a dynamic microphone.
FIG. 7 is a diagram specifically explaining the phenomenon caused by the frequency characteristics of a dynamic microphone.
FIG. 8 is a diagram explaining the nature of the accompaniment information that leaks into the microphone in karaoke.
FIG. 9 is a diagram showing the autocorrelation function (16 local peaks) calculated from a woman's singing voice at 880 [Hz].
FIG. 10 is a diagram showing the autocorrelation function calculated from accompaniment information.
FIG. 11 is a diagram showing the FFT spectrum extracted from a singing voice and the FFT spectrum extracted from accompaniment.
FIG. 12 is a diagram comparing the slope of the spectrum calculated from a singing voice with the slope of the spectrum calculated from accompaniment that leaked into the microphone.
FIG. 13 is a diagram explaining the wraparound of accompaniment in karaoke scoring.

Explanation of symbols

1 Karaoke apparatus
2 Microphone
3 Display
4 AMP
5 Speaker
6 Video controller
7 Mixer (effector)
8 Performance device
9 RAM (Random Access Memory)
12 Scoring processing unit (scoring DSP)
13 Pitch extraction unit
14 Vibrato detection unit
15 Score calculation unit
16 Operation unit
17 A/D conversion unit
18 HDD (Hard Disk Drive)
19 CPU (Central Processing Unit)
21 Accompaniment/singing voice determination unit
151 Reliability calculation module
152 Instantaneous score calculation module
153 Score accumulation module

Claims

A singing ability evaluation method characterized by causing a computer to execute:
a spectrum calculation step of performing frequency analysis on an input audio signal input from audio signal input means and calculating a spectrum;
a spectrum storage step of storing the spectrum obtained in the spectrum calculation step in spectrum storage means; and
an accompaniment/singing voice determination step of determining that the input audio signal is accompaniment information when it is detected that the energy of the spectrum read from the spectrum storage means is concentrated in a high frequency band.

The singing ability evaluation method according to claim 1, wherein, in the accompaniment/singing voice determination step,
a pitch is calculated from the spectrum read from the spectrum storage means, and
the input audio signal is determined to be accompaniment information when the pitch is higher than a certain threshold value.

The singing ability evaluation method according to claim 1, wherein, in the accompaniment/singing voice determination step,
the slope of the spectrum is calculated from the spectrum read from the spectrum storage means, and
the input audio signal is determined to be accompaniment information when the calculated slope of the spectrum is larger than a certain threshold value.

The singing ability evaluation method according to any one of claims 1 to 3, wherein,
in the accompaniment/singing voice determination step, the amount of overtones contained in the spectrum read from the spectrum storage means is measured, and
the input audio signal is determined to be accompaniment information when the amount of overtones is greater than a certain threshold value.

The singing ability evaluation method according to any one of claims 1 to 3, wherein,
in the accompaniment/singing voice determination step, the number of extreme values contained in the spectrum read from the spectrum storage means is counted, and
the input audio signal is determined to be accompaniment information when the number of extreme values is greater than a certain threshold value.

A karaoke apparatus equipped with a singing ability evaluation function, comprising:
spectrum calculation means for performing frequency analysis on an input audio signal input from audio signal input means and calculating a spectrum;
spectrum storage means for storing the spectrum obtained by the spectrum calculation means; and
accompaniment/singing voice determination means for determining that the input audio signal is accompaniment information when it is detected that the energy of the spectrum read from the spectrum storage means is concentrated in a high frequency band.

The karaoke apparatus equipped with the singing ability evaluation function according to claim 6, wherein the accompaniment/singing voice determination means
calculates a pitch from the spectrum read from the spectrum storage means, and
determines that the input audio signal is accompaniment information when the pitch is higher than a certain threshold value.

The karaoke apparatus equipped with the singing ability evaluation function according to claim 6, wherein the accompaniment/singing voice determination means
calculates the slope of the spectrum from the spectrum read from the spectrum storage means, and
determines that the input audio signal is accompaniment information when the calculated slope of the spectrum is larger than a certain threshold value.

The karaoke apparatus equipped with the singing ability evaluation function according to any one of claims 6 to 8, wherein
the accompaniment/singing voice determination means further measures the amount of overtones contained in the spectrum read from the spectrum storage means, and
determines that the input audio signal is accompaniment information when the amount of overtones is greater than a certain threshold value.

The karaoke apparatus equipped with the singing ability evaluation function according to any one of claims 6 to 8, wherein
the accompaniment/singing voice determination means further counts the number of extreme values contained in the spectrum read from the spectrum storage means, and
determines that the input audio signal is accompaniment information when the number of extreme values is greater than a certain threshold value.
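Two of the determination features named in the claims, the slope of the spectrum and the number of extreme values, can be sketched in Python as follows. This is an illustrative sketch only, not the claimed implementation: fitting the slope by least squares over a log-magnitude spectrum and counting only strict local maxima as extreme values are assumptions, and the function names are hypothetical.

```python
def spectrum_slope(log_mags):
    """Least-squares slope of a log-magnitude spectrum against bin index.

    Assumption: the slope is taken from a straight-line fit; a larger
    (less negative) slope means relatively more high-frequency energy.
    """
    n = len(log_mags)
    mean_x = (n - 1) / 2.0
    mean_y = sum(log_mags) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(log_mags))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def count_local_maxima(values):
    """Count strict interior local maxima.

    Assumption: only maxima are counted as extreme values here.
    """
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )
```

Each feature would be compared with its own threshold; the threshold values and the way several criteria are combined are not given in this section and would have to be chosen for the actual device.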