JP2007033471A

JP2007033471A - Singing grading apparatus, and program

Info

Publication number: JP2007033471A
Application number: JP2005211943A
Authority: JP
Inventors: Katsu Setoguchi; 克瀬戸口
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2005-07-21
Filing date: 2005-07-21
Publication date: 2007-02-08

Abstract

PROBLEM TO BE SOLVED: To provide a singing grading apparatus that grades singing in closer agreement with human hearing, regardless of the pitch of the voice uttered when singing.

SOLUTION: A karaoke system 11 performs karaoke by reproducing sequence data. During the performance, it calculates the ratio between the pitch of the voice input from a microphone 12 and the pitch of the voice that the sequence data indicates should be uttered, and evaluates the singing using groups of two or more pitch ratios as grading units. After the karaoke performance ends, the system grades the user's singing from the evaluations of the grading units and displays the result on a display device 14.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a technique for evaluating and scoring a user's singing of a piece of music.

Karaoke, in which people sing along with an accompaniment, is now enjoyed as entertainment by many people. Many commercial karaoke machines include a singing scoring device that rates the user's singing ability, for people who want an objective measure of how well they sing.

Scoring usually focuses on at least one of two things: the pitch of the voice the user utters while singing, and the timing of that utterance. Conventional singing scoring devices that score in this way include, for example, those described in Patent Documents 1 and 2.

In the conventional singing scoring devices of Patent Documents 1 and 2, the pitch-based evaluation (evaluation of pitch deviation) works as follows. It computes the pitch difference between the pitch of the user's voice (hereinafter the "user voice" and "user pitch") and the pitch of the voice that should be uttered at that moment (hereinafter the "reference voice" and "reference pitch"). It then evaluates that difference in a fixed manner. To make the scoring result agree better with aural evaluation, the device of Patent Document 1 first obtains the average pitch of each portion bounded by a peak and a trough of the pitch fluctuation. Among the averages that fall within the utterance period of the reference voice, it selects the one closest to the reference pitch for scoring. The device of Patent Document 2 treats a fixed length of about 200 ms as one frame and builds a histogram of the pitches detected in that frame. It adopts the class value of the most frequent class as the user pitch for the frame.
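The following is a minimal sketch of the per-frame histogram step attributed to Patent Document 2 above. Two details are assumptions, not given in the text: the 5 Hz class width and the function name `frame_user_pitch`. The text says only that the class value of the most frequent class is used, and the sketch takes that value to be the class midpoint.

```python
from collections import Counter
from typing import Optional, Sequence


def frame_user_pitch(pitches_hz: Sequence[float], class_width_hz: float = 5.0) -> Optional[float]:
    """User pitch for one frame (about 200 ms of pitch detections): the class value
    of the most frequent histogram class, taken here as the class midpoint."""
    if not pitches_hz:
        return None  # nothing detected in this frame
    # Histogram: count detections per class of width class_width_hz.
    counts = Counter(int(p // class_width_hz) for p in pitches_hz)
    mode_class, _ = counts.most_common(1)[0]
    return (mode_class + 0.5) * class_width_hz


print(frame_user_pitch([219.0, 221.0, 222.0, 223.0, 240.0]))
```

In this example three of the five detections fall in the 220 to 225 Hz class, so the frame's user pitch is 222.5 Hz. The outlier at 240 Hz has no effect on the result.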

The human auditory system is extremely sensitive to differences in pitch. However, the perceived size of an interval (the difference in pitch) is known to be roughly proportional to the frequency ratio. Two tones at 440 Hz and 880 Hz are in a 1:2 ratio, and so are two tones at 880 Hz and 1760 Hz. The two pairs are therefore heard as the same difference, even though the second pair is twice as far apart in hertz.
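The 1:2 example can be checked numerically by expressing intervals in cents, a logarithmic unit in which 1200 cents equals one octave. The function name is illustrative only.

```python
import math


def interval_cents(f_low: float, f_high: float) -> float:
    """Size of the interval between two frequencies in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_high / f_low)


# Same frequency ratio, different frequency differences.
print(880 - 440, interval_cents(440, 880))    # 440 Hz apart
print(1760 - 880, interval_cents(880, 1760))  # 880 Hz apart
```

Both lines report an interval of 1200.0 cents, although the differences in hertz are 440 and 880.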

Because human hearing works this way, the pitch difference at which a pitch deviation becomes noticeable depends on the pitch itself. The conventional devices of Patent Documents 1 and 2 use a fixed evaluation based on the difference between the user pitch and the reference pitch. With such an evaluation, the gap between the evaluation (scoring) result and an aural evaluation varies with pitch. To narrow that gap, in other words to score in closer agreement with human hearing, it is also considered important that the evaluation (scoring) result not vary with pitch.

Patent Document 1: Japanese Patent Laid-Open No. 10-26995
Patent Document 2: Japanese Patent Laid-Open No. 11-224094
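The problem with a fixed difference-based evaluation can be shown with two hypothetical pass/fail tests. The first uses a fixed tolerance in hertz and the second a tolerance on the pitch ratio. Both thresholds (5 Hz and 1 %) are made-up values chosen for illustration.

```python
TOLERANCE_HZ = 5.0      # hypothetical fixed difference threshold
TOLERANCE_RATIO = 0.01  # hypothetical ratio threshold (1 %)


def passes_difference(user_hz: float, ref_hz: float) -> bool:
    """Fixed evaluation on the pitch difference, as in the conventional approach."""
    return abs(user_hz - ref_hz) <= TOLERANCE_HZ


def passes_ratio(user_hz: float, ref_hz: float) -> bool:
    """Evaluation on the pitch ratio reference / user."""
    return abs(ref_hz / user_hz - 1.0) <= TOLERANCE_RATIO


for ref in (110.0, 880.0):
    user = ref * 1.03  # the singer is 3 % sharp (about half a semitone) in both cases
    print(ref, passes_difference(user, ref), passes_ratio(user, ref))
```

The output is `110.0 True False` followed by `880.0 False False`. The same audible error passes the hertz test at 110 Hz but fails it at 880 Hz. The ratio test rejects the error at both pitches.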

An object of the present invention is to provide a singing scoring device that scores in closer agreement with human hearing, regardless of the pitch of the voice to be uttered during singing.

The singing scoring device of the present invention evaluates and scores a user's singing of a piece of music. It comprises pitch ratio calculation means and scoring means. The pitch ratio calculation means calculates the pitch ratio between a first pitch and a second pitch. The first pitch is the pitch of the voice signal input by the user. The second pitch is the pitch of the voice to be uttered at the singing position of the piece that corresponds to that voice signal. The scoring means scores the user's singing using the pitch ratio calculated by the pitch ratio calculation means.

Preferably, the scoring means treats a predetermined number of pitch ratios calculated by the pitch ratio calculation means as one scoring unit. For each scoring unit, it evaluates the pitch deviation from those ratios to produce the score. It is also preferable to exclude any pitch ratio in a scoring unit that lies outside a predetermined range before evaluating that unit. It is further preferable that the evaluation result of each scoring unit can be reported to the user at predetermined times as scoring proceeds.

The program of the present invention provides functions for implementing each of the means of the above singing scoring device.

The present invention calculates the pitch ratio between a first pitch and a second pitch. The first pitch is the pitch of the voice signal input by the user. The second pitch is the pitch of the voice to be uttered at the singing position of the piece that corresponds to that voice signal. The invention then scores the user's singing using that pitch ratio.

In the human auditory system, the perceived size of an interval (the difference in pitch) is known to be roughly proportional to the frequency ratio. Calculating the pitch ratio and using it to score singing therefore yields more appropriate scoring that matches this property of hearing, regardless of the pitch of the voice to be uttered.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 illustrates the configuration of a karaoke setup built with a karaoke system equipped with the singing scoring device of this embodiment.

As shown in FIG. 1, the karaoke setup consists of a karaoke system (karaoke apparatus) 11 with several devices connected to it: a microphone 12 for voice input, a plurality of speakers 13 for sound output, and a display device 14 such as a television or a liquid crystal display. The karaoke system 11 displays lyrics and other information on the display device 14. It outputs both the voice input from the microphone 12 and the karaoke accompaniment through the speakers 13.

FIG. 2 is a configuration diagram of the karaoke system 11. As shown in FIG. 2, the system comprises the following components:
- a CPU 201 that controls the entire system 11;
- a ROM 202 that stores the programs executed by the CPU 201 and various control data;
- a RAM 203 that the CPU 201 uses as working memory;
- an A/D converter (ADC) 204 that converts the analog voice signal output by the microphone 12 into digital data (voice data);
- a switch (SW) unit 205 with various controls;
- a sequencer 206 that plays back the karaoke accompaniment;
- a sound source 207 that generates tone waveform data as instructed by the sequencer 206;
- a mixer 208 that mixes the voice data output by the CPU 201 with the waveform data output by the sound source 207 and outputs the result as waveform data for sound output;
- a D/A converter (DAC) that converts the waveform data output by the mixer 208 into an analog audio signal.

The sequence data for playing back the karaoke accompaniment is stored, for example, as a Standard MIDI File (SMF) in the ROM 202 or in the sequencer 206. The following description assumes that the SMFs are stored in the ROM 202. For each SMF (piece of music), the ROM 202 also stores text data for displaying the lyrics.

The switch unit 205 has several operable switches, including a numeric keypad used for song selection and a playback start/stop switch for starting and stopping karaoke playback. It also has a number of adjustment knobs, for example for the mixing level, the volume, and the key (pitch) of the karaoke accompaniment. A detection circuit in the unit monitors these controls for changes of state. When a change occurs, the circuit notifies the CPU 201 of the switch that was pressed or, for a knob, of which knob changed and its new state. The CPU 201 then performs the control that the user's instruction calls for.

Songs are selected with the numeric keypad, for example by entering a song number. When the user enters the number, the CPU 201 recognizes it from the notification sent by the switch unit 205. It then reads the SMF (sequence data) assigned to that number from the ROM 202 and sends it to the sequencer 206, which plays back the SMF received from the CPU 201.

In the SMF, each item of MIDI data describing a performance event carries time data that indicates when it is to be processed. Playback therefore consists of processing the MIDI data in sequence according to the time data. Through this processing, the sequencer 206 instructs the sound source 207 to start and stop tones at specified pitches.

When the sequencer 206 processes MIDI data for the melody tones, it notifies the CPU 201 of the pitch (reference pitch: note number) of each tone that the data starts or stops. From these notifications the CPU 201 knows two things during playback of the sequence data (the karaoke performance): the reference pitch of the voice to be uttered, and the period during which it should be uttered. So that melody data can be identified, it is stored on a predetermined MIDI channel. The reference pitch changes when the key adjustment knob is operated, but for simplicity such changes are not considered here.
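The description above leaves several details open, and this sketch fills them with assumptions. It shows how a reference-pitch timeline could be derived from delta-timed melody events.
- The melody channel number is hypothetical, since the text says only "a predetermined channel".
- Events are simplified to note-on and note-off. In a real SMF, a note-on with velocity 0 also ends a note.
- The tempo is assumed constant. `ticks_per_quarter` and `us_per_quarter` correspond to the SMF header division and the tempo meta event, and 500 000 µs per quarter note (120 BPM) is the SMF default tempo.
- Note numbers are converted with the usual equal-tempered mapping, in which note 69 is 440 Hz.
- All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MELODY_CHANNEL = 3  # hypothetical: the text only says "a predetermined channel"


@dataclass
class MidiEvent:
    delta_ticks: int  # SMF time data: ticks since the previous event
    channel: int
    kind: str         # "note_on" or "note_off" (simplified)
    note: int         # MIDI note number


def note_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def reference_timeline(events: List[MidiEvent], ticks_per_quarter: int = 480,
                       us_per_quarter: int = 500_000) -> List[Tuple[float, Optional[float]]]:
    """(time in s, reference pitch in Hz or None) after each melody-channel event,
    assuming a constant tempo."""
    ticks, current, out = 0, None, []
    for ev in events:
        ticks += ev.delta_ticks  # time advances for every event, melody or not
        if ev.channel != MELODY_CHANNEL:
            continue             # accompaniment parts do not define the reference pitch
        if ev.kind == "note_on":
            current = ev.note
        elif ev.kind == "note_off" and ev.note == current:
            current = None       # no voice is to be uttered until the next note
        seconds = ticks * us_per_quarter / ticks_per_quarter / 1_000_000
        out.append((seconds, None if current is None else note_to_hz(current)))
    return out
```

In the example below, an A4 melody note lasts one quarter note (0.5 s at the default tempo) and is then followed by A5. The accompaniment note on channel 0 is ignored.

```python
events = [MidiEvent(0, 3, "note_on", 69), MidiEvent(240, 0, "note_on", 40),
          MidiEvent(240, 3, "note_off", 69), MidiEvent(0, 3, "note_on", 81)]
print(reference_timeline(events))  # [(0.0, 440.0), (0.5, None), (0.5, 880.0)]
```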

During playback of the sequence data, the CPU 201 uses the voice data from the ADC 204 to detect the pitch of the voice entered through the microphone 12 (the user pitch). It uses the detection result to score the user's singing. The score is displayed on the display device 14 after playback of the sequence data ends.

The ADC 204 samples the voice signal at a predetermined sampling period and outputs voice data. For pitch detection, the CPU 201 keeps the most recent voice data, covering a predetermined period, in an area reserved in the RAM 203. Playback of the sequence data and the scoring performed during playback are described in detail below with reference to FIGS. 3 and 4.
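One way to keep "the most recent voice data covering a predetermined period" is a fixed-capacity buffer that silently drops the oldest sample. The sketch below is an assumption about how the reserved RAM area could behave, not the patent's implementation. The class name, the capacity, and the frame size are all hypothetical.

```python
from collections import deque
from typing import List, Optional


class VoiceHistory:
    """Hypothetical model of the RAM 203 area: holds the newest `capacity` samples."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def push(self, sample: float) -> None:
        self._buf.append(sample)  # once full, the oldest sample is discarded automatically

    def latest_frame(self, size: int) -> Optional[List[float]]:
        """Newest `size` samples for pitch detection, or None if too few have arrived."""
        if len(self._buf) < size:
            return None
        return list(self._buf)[-size:]
```

Pushing one sample per sampling period and calling `latest_frame` whenever a pitch estimate is needed keeps memory use constant, however long the song lasts.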

FIG. 3 is a flowchart of the playback process. It shows an excerpt of the processing the CPU 201 performs to play back sequence data. The playback process is described first, with reference to FIG. 3. It is implemented by the CPU 201 executing a program stored in the ROM 202.

First, in step 301, a song is selected according to the user's operation of the switch unit 205. The song number entered by the user is identified from the keys that the switch unit 205 reports as operated and from their order. After the user enters the song number, the process moves to step 302 and waits for an instruction to play back the sequence data. As described above, this instruction is given with the playback start/stop switch, and operating that switch moves the process to step 303.

In step 303, the sequence data (SMF) designated by the user is read from the ROM 202 and sent to the sequencer 206 together with an instruction to start playback. The corresponding text data is also read from the ROM 202 and shown as lyrics on the display device 14. The process then moves to step 304.

Instructing the sequencer 206 to start playback of the sequence data starts the karaoke performance, and the user then begins singing at the appropriate moment. Steps 304 onward handle the user's singing.

First, in step 304, a scoring data collection process collects the data used to score the user's singing. In step 305, the display device 14 is made to show the pitch deviation between the voice input from the microphone 12 as the user sings (user voice) and the voice to be uttered (reference voice). In step 306, depending on the settings, the voice data is processed to correct the pitch of the user voice to the reference pitch, and the result is output to the mixer 208. The process then moves to step 307.

The pitch deviation display produced by step 305 is updated, for example, once for each note to be uttered. Showing the deviation helps the user sing at a more accurate pitch while still singing.

In step 307, the process checks whether the user has asked to stop playback of the sequence data or whether playback has finished. The determination is YES in either of two cases: the user has ended playback with the playback start/stop switch, or the sequencer 206 has reported that playback ended. In that case, step 308 displays the scoring result (score) on the display device 14 and the process ends. If the user ended playback, step 308 also instructs the sequencer 206 to stop. Otherwise the determination is NO and the process returns to step 304. The loop formed by steps 304 to 307 therefore repeats until playback of the sequence data ends, running once per sampling period.

FIG. 4 is a flowchart of the scoring data collection process executed as step 304. This collection process is described next in detail with reference to FIG. 4.

First, in step 401, the pitch of the voice input as voice data via the ADC 204 is detected. The pitch ratio is then calculated between this pitch (the user pitch) and the pitch that should be uttered at this moment (the reference pitch), as notified by the sequencer 206. In this embodiment the pitch ratio is the reference pitch divided by the user pitch. In step 402 the ratio is assigned to the variable p_ratio. In step 403 the variable diff is assigned p_ratio minus 1 (diff = p_ratio - 1.0), and the process moves to step 404.

The method of detecting the user pitch is not particularly limited. This embodiment uses the method described in the specification attached to the applicant's Japanese Patent Application No. 2005-54481, which calculates the pitch ratio as follows.
(1) A frame of voice data of predetermined size is extracted from the area reserved in the RAM 203 and converted to the frequency domain by a DFT (Discrete Fourier Transform).
(2) From the frequency components obtained for each frequency channel by the transform, find two or more frequency channels that contain a harmonic or the fundamental. If no such channels are found, it is judged that no voice is being input or that the input voice is unvoiced, and no pitch ratio is calculated.
(3) If two or more frequency channels are found in (2), take one of them as the base channel.
(4) Calculate the greatest common divisor of the frequencies corresponding to the channel indices of the channels found.
(5) From that greatest common divisor, determine which harmonic the base channel corresponds to.
(6) Multiply the inter-frame phase difference in the frequency channel corresponding to the reference pitch notified by the sequencer 206 by the harmonic number determined in (5). The result is the target phase difference.
(7) Calculate the ratio between the inter-frame phase difference of the base channel and the target phase difference calculated in (6).

The ratio calculated in (7) is the pitch ratio, that is, the reference pitch divided by the user pitch. The pitch ratio is therefore 1 when the user sings at the correct pitch. The variable diff is positive when the voice is lower than the reference pitch and negative when it is higher. Calculated this way, the pitch ratio can be obtained reliably even for tones with a so-called missing fundamental, in which the fundamental frequency is absent or very weak compared with the other frequencies.
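The steps above can be sketched as follows. This is a simplified reading under stated assumptions, not the method of Japanese Patent Application No. 2005-54481 itself, whose details are not given here.
- FS, N, HOP, `peak_fraction` and `tol_hz` are made-up values.
- Step (1) analyses two Hann-windowed frames HOP samples apart, so that an inter-frame phase difference exists.
- Peaks are picked as local maxima above a fraction of the largest magnitude.
- Peak frequencies are refined with a phase-vocoder estimate. Their common divisor is then found with a tolerance-based Euclidean GCD, instead of exact channel indices.
- The strongest peak is taken as the base channel.
- The "phase difference in the channel corresponding to the reference pitch" in step (6) is taken to be the nominal advance 2π·f_ref·HOP/FS.

```python
import numpy as np

FS = 8000   # sampling rate in Hz (assumed)
N = 1024    # DFT frame size in samples (assumed)
HOP = 128   # offset between the two analysed frames (assumed)


def princarg(phase):
    """Wrap a phase into the range [-pi, pi]."""
    return phase - 2.0 * np.pi * np.round(phase / (2.0 * np.pi))


def approx_gcd(a, b, tol):
    """Euclidean GCD of two frequencies, treating remainders within tol as zero."""
    a, b = max(a, b), min(a, b)
    while b > tol:
        r = a % b
        if r < tol or b - r < tol:
            return b
        a, b = b, r
    return a


def pitch_ratio(signal, ref_hz, peak_fraction=0.1, tol_hz=5.0):
    """Estimate ref_hz / user pitch; None if fewer than two harmonic peaks exist."""
    window = np.hanning(N)
    spec1 = np.fft.rfft(window * signal[:N])           # step (1), first frame
    spec2 = np.fft.rfft(window * signal[HOP:HOP + N])  # step (1), second frame
    mag = np.abs(spec1)
    # Step (2): channels that are local maxima above a fraction of the largest one.
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
             and mag[k] > peak_fraction * mag.max()]
    if len(peaks) < 2:
        return None  # silence or unvoiced sound: no ratio is calculated

    def advance(k):
        """Unwrapped inter-frame phase advance of channel k (phase-vocoder style)."""
        nominal = 2.0 * np.pi * k * HOP / N
        return nominal + princarg(np.angle(spec2[k]) - np.angle(spec1[k]) - nominal)

    freqs = [advance(k) * FS / (2.0 * np.pi * HOP) for k in peaks]
    base = int(np.argmax(mag[peaks]))  # step (3): strongest peak as base channel
    g = freqs[0]                       # step (4): approximate GCD of peak frequencies
    for f in freqs[1:]:
        g = approx_gcd(g, f, tol_hz)
    h = max(1, int(round(float(freqs[base] / g))))  # step (5): harmonic number
    target = h * 2.0 * np.pi * ref_hz * HOP / FS      # step (6): target phase advance
    return target / advance(peaks[base])              # step (7): reference / user
```

The test signal below contains only harmonics 2 to 4 of a 225 Hz voice, so its fundamental is missing. With a 220 Hz reference, the sketch returns approximately 220/225 ≈ 0.978. For silence it returns None.

```python
n = np.arange(N + HOP)
voice = sum(np.cos(2 * np.pi * m * 225.0 * n / FS) for m in (2, 3, 4))
print(pitch_ratio(voice, 220.0))
```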

In step 404, the process checks whether the absolute value of diff exceeds a predetermined constant OUTER_THRESH. If it is at most OUTER_THRESH, the determination is NO and the process moves to step 405. Otherwise, the determination is YES and the process ends here.

Pitch ratio calculation (pitch detection) can produce errors. In this embodiment, therefore, the check in step 404 treats any pitch ratio outside the range set by OUTER_THRESH as a calculation error and excludes it from scoring. In two further cases the process also ends right after step 401 (not shown in the figure): when fewer than two frequency channels containing harmonics or the fundamental are found, and when there is no voice to be uttered. Excluding pitch ratios and periods that should not count toward the evaluation makes the evaluation more appropriate. OUTER_THRESH is chosen, for example, using the interval of a fifth as a guide.
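The text gives no value for OUTER_THRESH beyond using a fifth as a guide. Because diff is defined as reference/user minus 1, the same interval maps to different diff magnitudes above and below the reference. The short calculation below uses a hypothetical helper to show this.

```python
def diff_for_interval(semitones: float) -> float:
    """diff (= p_ratio - 1, with p_ratio = reference / user) when the user sings
    `semitones` above (positive) or below (negative) the reference pitch."""
    user_over_ref = 2.0 ** (semitones / 12.0)  # equal-tempered interval
    p_ratio = 1.0 / user_over_ref
    return p_ratio - 1.0


print(round(diff_for_interval(-7), 4))  # user a fifth flat
print(round(diff_for_interval(+7), 4))  # user a fifth sharp
```

A fifth flat gives diff ≈ +0.498 and a fifth sharp gives diff ≈ -0.333. A single threshold on |diff| therefore cannot correspond to exactly a fifth in both directions, and which side it matches is a design choice.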

In practice, the pitch of a sung voice often wavers. To allow for this, scoring data is collected per scoring unit made up of a fixed number of data items; this unit is called a "scoring frame" here. The variables accum, data_counter, and avg are used to evaluate the pitch of each scoring frame. The fixed number of data items is given by the constant frame_num.

In step 405, diff is added to accum (accum = accum + diff). In step 406, data_counter is incremented. In step 407, the process checks whether data_counter equals the constant frame_num. The value of data_counter is the number of valid pitch ratios calculated so far in the current scoring frame. Once all the valid pitch ratios for the scoring frame have been calculated, the determination is YES and the process moves to step 408. Otherwise, the determination is NO and the process ends here.

A YES in step 407 means that no pitch ratios remain to be calculated for the current scoring frame. From step 408 onward, therefore, the pitch of that scoring frame is evaluated and the result is stored.

First, in step 408, the variable frame_counter, which counts the scoring frames evaluated, is incremented. In step 409, avg is set to accum divided by frame_num. In step 410, the process checks whether the absolute value of avg is smaller than a predetermined constant PASS_THRESH. PASS_THRESH defines the range of pitch ratios within which no pitch deviation is considered to have occurred. avg expresses the average pitch ratio of the scoring frame relative to 1 (it is the mean of diff), and PASS_THRESH is set with this in mind. If the average lies within the range, the absolute value of avg is smaller than PASS_THRESH and the determination is YES. pass_counter is then incremented in step 411 and the process moves to step 412. Otherwise, the determination is NO and step 412 is executed next.

The human auditory system is extremely sensitive to pitch differences. As is well known, the frequency discrimination threshold varies between individuals. This threshold is the smallest frequency difference at which a difference in pitch between two successive tones can be heard. It is said to be about 0.2 % even for ordinary listeners, and 0.1 % or less for experts with keen ears. PASS_THRESH is set with considerations such as these in mind.

In step 412, accum is set to 0, and in step 413, data_counter is set to 0. With these variables reset for the next scoring frame, the process ends.
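Steps 403 to 413 can be condensed into a small accumulator that receives each p_ratio from step 402. This is a sketch only. The values of OUTER_THRESH, PASS_THRESH and FRAME_NUM are placeholders, since the text gives none; PASS_THRESH is picked loosely in line with the 0.1 to 0.2 % discrimination figures. The class name is hypothetical, and step 410 uses "smaller than" as stated.

```python
from dataclasses import dataclass

OUTER_THRESH = 0.4   # placeholder: |diff| above this is treated as a detection error
PASS_THRESH = 0.003  # placeholder: a frame passes if |avg| is smaller than this
FRAME_NUM = 8        # placeholder for frame_num: valid ratios per scoring frame


@dataclass
class ScoringData:
    accum: float = 0.0
    data_counter: int = 0
    frame_counter: int = 0
    pass_counter: int = 0

    def collect(self, p_ratio: float) -> None:
        """Steps 403-413 of FIG. 4 for one pitch ratio (reference / user)."""
        diff = p_ratio - 1.0                # step 403
        if abs(diff) > OUTER_THRESH:        # step 404: exclude outliers
            return
        self.accum += diff                  # step 405
        self.data_counter += 1              # step 406
        if self.data_counter != FRAME_NUM:  # step 407: frame not complete yet
            return
        self.frame_counter += 1             # step 408
        avg = self.accum / FRAME_NUM        # step 409
        if abs(avg) < PASS_THRESH:          # step 410
            self.pass_counter += 1          # step 411
        self.accum = 0.0                    # step 412
        self.data_counter = 0               # step 413
```

Feeding eight on-pitch ratios (1.0) completes one passing frame. Eight ratios of 1.1 (about 10 % flat) complete a failing frame. A ratio of 2.0 is rejected as an error without touching the counters.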

The voice data output to the mixer 208 is generated in a separate area reserved in the RAM 203. In step 306 of FIG. 3, the voice is first pitch-scaled according to the pitch ratio that step 402 assigned to p_ratio. The scaled voice data is then overlap-added to the voice data already stored in that area. One sample of the overlap-added voice data is output to the mixer 208.
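The text does not say how the pitch scaling or the overlap-add is carried out. The sketch below assumes plain linear-interpolation resampling of a short grain and a shift-register style output buffer from which one sample leaves per sampling period. Both `pitch_scale` and `OverlapAddBuffer` are hypothetical names. Reading the input at a step of p_ratio (reference / user) multiplies the user pitch by p_ratio, which moves it onto the reference pitch.

```python
import numpy as np


def pitch_scale(grain: np.ndarray, p_ratio: float) -> np.ndarray:
    """Multiply the pitch of `grain` by p_ratio by reading it at a step of p_ratio
    samples per output sample (plain linear-interpolation resampling, which also
    changes the grain length by a factor of 1 / p_ratio)."""
    positions = np.arange(0.0, len(grain) - 1 + 1e-9, p_ratio)
    return np.interp(positions, np.arange(len(grain)), grain)


class OverlapAddBuffer:
    """Hypothetical stand-in for the separate RAM area: scaled grains are summed in,
    and one sample per sampling period is handed to the mixer."""

    def __init__(self, size: int):
        self.buf = np.zeros(size)

    def add(self, grain: np.ndarray) -> None:
        n = min(len(grain), len(self.buf))
        self.buf[:n] += grain[:n]         # overlap-add at the current output position

    def pop_sample(self) -> float:
        out = float(self.buf[0])          # oldest sample goes to the mixer
        self.buf = np.roll(self.buf, -1)  # advance by one sample
        self.buf[-1] = 0.0
        return out
```

A practical version would window each grain (for example with `np.hanning`) before adding it, so that overlapping grains cross-fade smoothly.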

In step 308 of FIG. 3, the score is calculated from pass_counter and frame_counter and then displayed. Denoting the score by grade, it is calculated, for example, by the following equation.

grade＝pass＿counter／frame＿counter×100
It is known that, in the human auditory system, the perceived size of a musical interval (the difference in pitch between two tones) is approximately proportional to the frequency ratio of the tones rather than to their difference in hertz. Building on this, the embodiment evaluates and scores the user's singing using the pitch ratio between the user pitch and the reference pitch. Scoring therefore matches human hearing regardless of the pitch tendency of the voice to be sung in the music (the number of notes to be sung at each pitch). As a result, more people can accept the score as appropriate.
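The point about frequency ratios can be checked numerically. In the sketch below, the reference notes A3 = 220 Hz and A5 = 880 Hz are illustrative choices, and the helper name is hypothetical. A singer who is one equal-tempered semitone sharp makes an error four times larger in hertz at A5 than at A3, yet the pitch ratio, and hence the deviation in cents, is the same in both cases.

```python
import math

SEMITONE = 2 ** (1 / 12)  # equal-tempered semitone as a frequency ratio


def compare(ref_hz: float, sung_hz: float):
    """Return (difference in Hz, pitch ratio, deviation in cents)."""
    ratio = sung_hz / ref_hz
    return sung_hz - ref_hz, ratio, 1200 * math.log2(ratio)


low = compare(220.0, 220.0 * SEMITONE)   # A3 reference, sung a semitone sharp
high = compare(880.0, 880.0 * SEMITONE)  # A5 reference, sung a semitone sharp
# The error in Hz is about 13.08 at A3 and about 52.33 at A5,
# but both ratios are about 1.05946 (100 cents).
```

A score based on differences in hertz would penalize the same musical error more harshly on high notes. A score based on the ratio treats the two cases alike.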

In the present embodiment, singing is evaluated and scored by looking only at pitch, but an evaluation based on the utterance timing of the voice (the timing of at least one of utterance start and end, the utterance duration, and so on) may also be performed. Several different values of PASS_THRESH may be made selectable so that scoring can match the user's level; preferably, the user can make this selection (or setting) freely. The method of calculating the pitch ratio and the scoring method that uses it are not limited to the present embodiment, and various modifications are possible. Voice input may be received via a network or via a recording medium such as a CD. As the reference pitch, the pitch of a musical tone played by a part other than the melody may also be used.
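The selectable-threshold variation can be sketched as a lookup keyed by level. The sketch assumes that each scoring frame's average deviation has already been expressed in cents. The level names and cent values in PASS_THRESH_CENTS are hypothetical and do not come from the source.

```python
# Hypothetical difficulty presets: PASS_THRESH expressed in cents.
PASS_THRESH_CENTS = {"beginner": 50.0, "intermediate": 25.0, "expert": 10.0}


def frame_passes(avg_cents: float, level: str) -> bool:
    """A frame passes when its average deviation is inside the level's threshold."""
    return abs(avg_cents) < PASS_THRESH_CENTS[level]


def grade_for_level(frame_avgs_cents, level: str) -> float:
    """Percentage of frames that pass at the chosen level (0.0 if there are none)."""
    frames = len(frame_avgs_cents)
    passed = sum(frame_passes(a, level) for a in frame_avgs_cents)
    return passed / frames * 100 if frames else 0.0
```

The same performance, with frame averages of 5, -15, 30 and -60 cents, scores 75.0 at "beginner", 50.0 at "intermediate" and 25.0 at "expert".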

All or part of a program that implements the singing scoring apparatus described above, or any of its modifications, may be stored on a recording medium and distributed, or distributed via a communication medium that forms part of a network. In that case, a user can obtain the program and load it into a data processing device (computer) to apply the present invention to that device.

FIG. 1 is a diagram explaining the configuration of a karaoke setup built using the karaoke system equipped with the singing scoring apparatus according to the present embodiment.
FIG. 2 is a configuration diagram of the karaoke system equipped with the singing scoring apparatus according to the present embodiment.
FIG. 3 is a flowchart of the reproduction process.
FIG. 4 is a flowchart of the scoring data collection process.

Explanation of symbols

1 CPU
5 Sound generator
5-1 Integer address calculation block
5-2 Fractional address calculation block
5-3 ENV generation block
5-4 Waveform ROM data latch block
5-5 Multiply-accumulate block
5-6 Accumulation block
6 Waveform ROM
501 to 504, 509 to 511, 521 to 523 Registers
505, 506, 516, 517 Multiplexers
507 Adder/subtractor
514 Right 2-bit shifter
515 Arithmetic unit
519 Multiplier
520 Adder

Claims

A singing scoring device that evaluates and scores a user's singing of a piece of music, the device comprising:
pitch ratio calculation means for calculating a pitch ratio between a first pitch, which is the pitch of a voice signal input by the user, and a second pitch, which is the pitch of the voice to be uttered at the singing position in the music corresponding to the voice signal; and
scoring means for scoring the user's singing using the pitch ratio calculated by the pitch ratio calculation means.

The singing scoring device according to claim 1, wherein the scoring means treats a predetermined number of the pitch ratios calculated by the pitch ratio calculation means as one scoring unit, and performs the scoring by evaluating, for each scoring unit, the pitch deviation indicated by those pitch ratios.

The singing scoring device according to claim 2, wherein the scoring means evaluates each scoring unit after excluding, from the pitch ratios making up that unit, any pitch ratio that falls outside a predetermined range.

The singing scoring device according to claim 2 or 3, wherein the scoring means is capable of sequentially notifying the user of the evaluation result of each scoring unit at a predetermined timing.

A program executed by a singing scoring device that evaluates and scores a user's singing of a piece of music, the program causing the device to realize:
a function of calculating a pitch ratio between a first pitch, which is the pitch of a voice signal input by the user, and a second pitch, which is the pitch of the voice to be uttered at the singing position in the music corresponding to the voice signal; and
a function of scoring the user's singing using the pitch ratio calculated by the calculating function.