JP4135615B2

JP4135615B2 - Musical sound comparison device and musical sound comparison program

Info

Publication number: JP4135615B2
Application number: JP2003365554A
Authority: JP
Inventors: 純一南高
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-10-27
Filing date: 2003-10-27
Publication date: 2008-08-20
Anticipated expiration: 2023-10-27
Also published as: JP2005128372A

Abstract

PROBLEM TO BE SOLVED: To accurately judge whether input speech matches the pitch of the musical tone that should actually be produced. SOLUTION: A singing guide device is equipped with: a pitch extraction section 32 that extracts a pitch for each prescribed frame from speech data based on a speech signal input by the user; a reproduction control section 30 that generates a musical tone at the pitch given by the musical tone data at the sound production timing in the music data being reproduced, and silences that tone at the silencing timing; and a comparison decision section 34 that compares, from the sound production timing until the silencing timing, the pitch extracted for each frame with the pitch given by the musical tone data, accumulates the per-frame comparison results in association with the musical tone data, and, after the silencing timing has passed, decides the accuracy of the pitch extracted from the speech data based on the accumulated results. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an apparatus that compares a voice signal input by a user with the pitch of the musical tone that should be produced.

Apparatuses have been proposed that use pitch extraction techniques to extract the pitch of a user's singing and compare it with the musical tone that should actually be sounded.

For example, Patent Document 1 discloses a music reproduction device that compares voice data based on a singing voice signal with performance data and can indicate out-of-tune passages and the like on the lyrics or score shown on the screen of a display device.
Patent Document 1: JP 2000-29473 A

For example, the device described in Patent Document 1 locates the positions where mismatches such as off-pitch or off-tempo singing occur. However, the judgment of whether the pitch of the user's singing deviates from the actual note pitch varies depending on the timing at which the mismatch is evaluated, so an accurate judgment cannot be made. For instance, if a deviation from the intended pitch that lasts only a brief moment is treated as a pitch mismatch, then even a slight slip in the user's singing causes that passage to be judged wrong.

An object of the present invention is to provide an apparatus that can accurately judge whether an input voice matches the pitch of the musical tone that should actually be produced.

The object of the present invention is achieved by a musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of the musical tone that should be produced and outputs the comparison result, the device comprising: pitch extraction means for extracting a pitch for each predetermined frame from voice data obtained by digitizing the voice signal input by the user; reproduction means for generating, at a sound generation timing in the music data to be reproduced, a musical tone at the pitch based on the musical tone data, and for muting that musical tone at a mute timing in the music data; comparison means for comparing, from the sound generation timing until the mute timing, the pitch extracted for each frame with the pitch based on the musical tone data and accumulating the per-frame comparison results in association with the musical tone data, the comparison means incrementing, for each frame, the count value of one of a first counter indicating that the extracted pitch matches the pitch based on the musical tone data, a second counter indicating that it is higher than that pitch, a third counter indicating that it is lower than that pitch, and a fourth counter indicating that there was no input because the level of the voice signal did not reach a predetermined value; determination means for referring, after the mute timing has elapsed, to the counter values of the first to fourth counters and determining, based on the comparison results, the accuracy of the pitch extracted from the voice data; and display control means for displaying the determination result of the determination means on a display device.

The object of the present invention is also achieved by a musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of the musical tone that should be produced and outputs the comparison result, the device comprising: pitch extraction means for extracting a pitch for each predetermined frame from voice data obtained by digitizing the voice signal input by the user; reproduction means for generating, at a sound generation timing in the music data to be reproduced, a musical tone at the pitch based on the musical tone data, and for muting that musical tone at a mute timing in the music data; comparison means for comparing, from the sound generation timing until the mute timing, the pitch extracted for each frame with the pitch based on the musical tone data and accumulating the per-frame comparison results in association with the musical tone data, the comparison means incrementing, for each frame, the count value of one of a first counter indicating that the extracted pitch matches the pitch based on the musical tone data, a second counter indicating that it is higher than that pitch, a third counter indicating that it is lower than that pitch, and a fourth counter indicating that there was no input because the level of the voice signal did not reach a predetermined value; determination means for referring, after the mute timing has elapsed, to the counter values of the first to fourth counters and determining, based on the comparison results, the accuracy of the pitch extracted from the voice data; and display control means for displaying the determination result of the determination means on a display device, wherein the determination means is activated in response to the reproduction means muting the musical tone at the mute timing.

In yet another preferred embodiment, the comparison means includes a fifth counter that counts the number of times the extracted pitch and the note pitch were compared, frame by frame. When the counter value of the first counter is larger than the counter values of the other counters, the determination means obtains the proportion of comparisons in which the extracted pitch matched the note pitch from (counter value of the first counter) / (counter value of the fifth counter), and determines that the pitch was accurate for that note when the proportion exceeds a predetermined threshold. The threshold may be varied according to a set difficulty level.
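The ratio test in this embodiment can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the function name `judge_note`, its parameter names, and the example threshold of 0.6 are hypothetical. The text only states that the match counter must exceed the other counters and that the match count divided by the comparison count must exceed a predetermined threshold.

```python
def judge_note(match_cnt, up_cnt, down_cnt, no_input_cnt, total_cnt,
               threshold=0.6):
    """Return True when the sung pitch is judged accurate for one note.

    match_cnt, up_cnt, down_cnt, no_input_cnt: first to fourth counters.
    total_cnt: number of per-frame comparisons (the "fifth counter").
    threshold: hypothetical value; the text says it may follow a
    difficulty setting.
    """
    if total_cnt == 0:
        return False  # nothing was compared, so accuracy cannot be confirmed
    # The match counter must be larger than every other counter.
    if match_cnt <= max(up_cnt, down_cnt, no_input_cnt):
        return False
    # Proportion of comparisons in which the pitch matched.
    return match_cnt / total_cnt > threshold
```

For example, `judge_note(8, 1, 1, 0, 10)` accepts the note (ratio 0.8), while `judge_note(5, 3, 2, 0, 10)` rejects it because 0.5 does not exceed the assumed threshold. Raising `threshold` corresponds to a harder difficulty level.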

The object of the present invention is also achieved by a musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of the musical tone that should be produced and outputs the comparison result, the device comprising: pitch extraction means for extracting a pitch for each predetermined frame from voice data obtained by digitizing the voice signal input by the user; reproduction means for generating, at a sound generation timing in the music data to be reproduced, a musical tone at the pitch based on the musical tone data, and for muting that musical tone at a mute timing in the music data; comparison means for comparing, from the sound generation timing until the mute timing, the pitch extracted for each frame with the pitch based on the musical tone data and accumulating the per-frame comparison results in association with the musical tone data, the comparison means incrementing, for each frame, the count value of any of a first counter indicating that the extracted pitch matches the pitch based on the musical tone data, a second counter indicating that it is higher than that pitch, a third counter indicating that it is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the note pitch were compared; determination means for determining, after the mute timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, the determination means obtaining, when the counter value of the first counter is larger than the counter values of the other counters, the proportion of comparisons in which the extracted pitch matched the note pitch from (counter value of the first counter) / (counter value of the fourth counter), and determining that the pitch was accurate for that note when the proportion exceeds a predetermined threshold; and display control means for displaying the determination result of the determination means on a display device. The threshold may be varied according to a set difficulty level.

The object of the present invention is also achieved by a musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of the musical tone that should be produced and outputs the comparison result, the device comprising: pitch extraction means for extracting a pitch for each predetermined frame from voice data obtained by digitizing the voice signal input by the user; reproduction means for generating, at a sound generation timing in the music data to be reproduced, a musical tone at the pitch based on the musical tone data, and for muting that musical tone at a mute timing in the music data; comparison means for comparing, from the sound generation timing until the mute timing, the pitch extracted for each frame with the pitch based on the musical tone data and accumulating the per-frame comparison results in association with the musical tone data, the comparison means incrementing, for each frame, the count value of any of a first counter indicating that the extracted pitch matches the pitch based on the musical tone data, a second counter indicating that it is higher than that pitch, a third counter indicating that it is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the note pitch were compared; determination means for determining, after the mute timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, the determination means obtaining, when the counter value of the first counter is larger than the counter values of the other counters, the proportion of comparisons in which the extracted pitch matched the note pitch from (counter value of the first counter) / (counter value of the fourth counter), and determining that the pitch was accurate for that note when the proportion exceeds a predetermined threshold; and display control means for displaying the determination result of the determination means on a display device, wherein the determination means is activated in response to the reproduction means muting the musical tone at the mute timing. The threshold may be varied according to a set difficulty level.

According to the present invention, it is possible to provide an apparatus that can accurately judge whether an input voice matches the pitch of the musical tone that should actually be produced.

Embodiments of the present invention will be described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the hardware configuration of a singing guide device according to a first embodiment of the present invention. As shown in FIG. 1, the singing guide device 10 has a CPU 12; a ROM 14 that stores various programs for guiding singing and the music data of the pieces to be played; a RAM 16 that temporarily stores data derived from the singer's singing and serves as a work area during program execution; an input device 16 such as a keyboard or mouse; a display device 18 consisting of a CRT display or an LCD; a MIDI (Musical Instrument Digital Interface) interface (I/F) 20 that receives music data from, or outputs music data to, an external device; and an AD converter (ADC) 24 that converts the audio signal delivered through a microphone 22 into a digital signal.

FIG. 2 is a functional block diagram of the CPU 12 and its peripheral components. As shown in FIG. 2, the CPU 12 has a reproduction control unit 30 that receives music data and reproduces the music based on it; a pitch extraction unit 32 that extracts the pitch of the singer's voice from the digital signal supplied by the ADC 26; a comparison determination unit 34 that compares the pitch of the music being reproduced with the extracted pitch; a score display control unit 36 that controls the score display of the music being reproduced; and a scroll display control unit 38 that controls scrolling of the displayed score.

The music data is stored in the ROM 14 of the singing guide device 10, or is supplied to the reproduction control unit 30 from an external device via the MIDI I/F 22. FIG. 3 shows the data structure of the music data. As shown in FIG. 3, the music data has, for each musical tone sounded during reproduction, a data group denoted Note[0], Note[1], ..., Note[n]. The data group for each tone (for example, Note[0]) contains: ITime, the sounding start tick of the tone, that is, the number of clocks until the timing at which the tone starts to sound; IGate, the gate time, which corresponds to how long the sounded tone is sustained; Pit, the pitch (note number) of the tone to be sounded; Vel, the velocity of the tone to be sounded; iMatchCnt, iUpCnt, iDownCnt, and sMatch, which hold the results of comparing the tone with the pitch extracted from the digitized voice signal; and a note pointer iNp indicating the note number of the next tone.

The first pitch comparison result, iMatchCnt, indicates the number of times the pitch matched. The second pitch comparison result, iUpCnt, indicates the number of times the sung pitch was higher than the pitch of the musical tone, that is, sharp. The third pitch comparison result, iDownCnt, indicates the number of times the sung pitch was lower than the pitch of the musical tone, that is, flat. The fourth comparison result, sMatch, indicates the final outcome of the pitch comparison. sMatch can take, for example, a value corresponding to "○" (circle), indicating that the pitches matched, or a value corresponding to "×" (cross), indicating that they did not.
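The per-note fields and their per-frame update can be sketched as follows. This is a hedged Python illustration only: the `Note` class, the snake_case field names, and the `count_frame` helper are assumptions that mirror Pit, iMatchCnt, iUpCnt, iDownCnt, and sMatch, and the comparison is made on whole note numbers.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pit: int              # target note number (Pit)
    i_match_cnt: int = 0  # frames where the sung pitch matched
    i_up_cnt: int = 0     # frames where the sung pitch was sharp
    i_down_cnt: int = 0   # frames where the sung pitch was flat
    s_match: str = ""     # final result, filled in after the note ends


def count_frame(note, extracted_note_number):
    """Update exactly one counter of `note` for a single analysis frame."""
    if extracted_note_number == note.pit:
        note.i_match_cnt += 1
    elif extracted_note_number > note.pit:
        note.i_up_cnt += 1
    else:
        note.i_down_cnt += 1
```

Calling `count_frame` once per frame while a note is sounding accumulates the evidence that the later judgment uses, so a single stray frame cannot by itself mark the note as wrong.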

In the present embodiment, ITime, IGate, Pit, Vel, and iNp of the data group described above are prepared in advance for each musical tone. The pitch comparison results are filled in with the required values by the comparison determination process described later.

FIG. 4 is the main flowchart of the processing executed by the singing guide device according to the present embodiment. As shown in FIG. 4, when the power is turned on, the singing guide device 10 first initializes the variables used in the processing (step 401) and displays the score based on the music data (step 402).

FIG. 5 shows the variables used in the processing. In the present embodiment there are parameters used for reproduction control (see FIG. 5(a)) and parameters used for pitch extraction (see FIG. 5(b)). The reproduction control parameters include: iResolution, the resolution of a quarter note; lBeat, the numerator of the time signature; lBeatDenomi, its denominator; lBPM, the tempo value (BPM); doTimeOfMeas, the number of seconds in one measure; IOffsetX, the number of score frames displayed ahead of the timing at which a tone should sound; INowTime, the current time during playback; IBaseTime, the playback start time; INowTick, the current playback position in ticks; IOffset, the lead time used when displaying ahead; the playback note pointer np, which indicates the note currently being played; the display note pointer nd, which indicates how far back among the already sounded notes the display should extend; and the look-ahead display note pointer na, which indicates the note to be displayed in advance. The reproduction control parameters further include Note[N], the music data to be sounded; OnBuf[N], a buffer for the music data being sounded; and iPlay and iWave, the state variables for playback and voice input. These parameters are described in more detail in connection with the reproduction process.

As shown in FIG. 5(b), the pitch extraction parameters include: cmData[N], a complex-valued variable used for the FFT; doData[N], the spectral power; doFS, the sampling frequency; iFrameSize, the frame size; iLowerNN, the note number of the lowest pitch to be extracted; iUpperNN, the note number of the highest pitch to be extracted; iBNoteN, the integer part of the extracted pitch; iBNoteD, the fractional part of the extracted pitch; and iJudge, the scoring result.

After steps 401 and 402, the singing guide device 10 waits for the user to operate the input device 18 (step 403). More specifically, it waits for key or mouse input instructing it to start or stop playback. If the input is a start instruction (Yes in step 404), the singing guide device 10 starts the reproduction routine (step 405) and also starts the voice input routine (step 406). If, on the other hand, the input is an end instruction (Yes in step 407), it stops the reproduction routine and the voice input routine (steps 408 and 409).

FIG. 6 is a flowchart showing the reproduction routine executed by the reproduction control unit 30 of the singing guide device 10. The reproduction routine is started in step 405 of the main flow and runs in parallel with the main routine.

In the reproduction routine, the note pointers np, nd, and na among the reproduction control parameters are initialized first (step 601). Next, initialization for the scroll display is performed (step 602), and the current time is assigned to the playback start time IBaseTime (step 603).

Next, it is determined whether the state variable iPlay is "0" (step 604). iPlay is set to "1" during playback and to "0" when playback ends, so if the result of step 604 is Yes, the scoring process is executed (step 605).

If, on the other hand, the result of step 604 is No, the current time is assigned to INowTime, the current time during playback (step 606). The reproduction control unit 30 then updates the time variables (step 607). Here, INowTick, which indicates the tick of the current playback position, is updated using the following equation.
INowTick = (INowTime - IBaseTime) * iResolution * IBPM / 60

In other words, the position (tick) from the beginning of the piece that corresponds to the time elapsed since the start of playback is calculated from the tempo variable (IBPM) and the number of ticks defined for a quarter note (iResolution). Because IBPM is the number of beats per minute, the product is divided by 60.
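The tick calculation can be checked with a short sketch. The Python function below is illustrative only: the name `now_tick` and its parameter names are assumptions, and times are assumed to be in seconds, which is what the division by 60 implies.

```python
def now_tick(now_time, base_time, resolution, bpm):
    """Convert elapsed playback time (seconds) to a tick position.

    resolution: ticks per quarter note (iResolution)
    bpm: tempo in beats per minute (IBPM)
    """
    elapsed_seconds = now_time - base_time
    beats_per_second = bpm / 60
    return elapsed_seconds * resolution * beats_per_second
```

With 480 ticks per quarter note at 120 BPM, two seconds of playback span four beats, so `now_tick(12.0, 10.0, 480, 120)` gives 1920 ticks.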

Next, the reproduction control unit 30 refers to the calculated INowTick. If it is time for a key-off, the note-off process described later (step 608) turns off the musical tone of the music data for the relevant note (Note[i]); if it is time for a key-on, the note-on process described later (step 609) sounds the music data of the relevant note. The reproduction control unit 30 also compares INowTick for the current time with the music data and the sounding start timing of each note to decide which notes to display (scroll display process: step 610), and then scrolls the score (score display process: step 611). The scroll display process is also described in detail later. The reproduction control unit 30 repeats steps 606 to 611 until a stop instruction is issued.

Next, the voice input routine will be described with reference to FIG. 7. The voice input routine is started in step 406 of the main flow and runs in parallel with the main routine at predetermined intervals, for example each time a digital signal has been placed in the processing area of the RAM 16 via the ADC 26. As shown in FIG. 7, the CPU 12 copies the input voice data into complex-valued variables for the FFT and stores them temporarily in the RAM 16 (step 701). The variables are then multiplied by a window function such as a Hanning window (step 702), and the FFT is executed (step 703). Because the result appears as a real part and an imaginary part, the power is obtained as the square root of the sum of their squares (step 704). Since low-frequency noise or wide-band noise can be picked up when voice is input through a microphone, the frequency-domain data is filtered (step 705); this can be done simply by multiplication. The fundamental frequency identification process (step 706), the comparison determination process (step 707), and the pitch display process (step 708), all described later, are then executed.
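Steps 701 to 704 can be sketched in dependency-free Python as follows. This is an illustration under stated assumptions, not the device's code: a direct O(N²) discrete Fourier transform stands in for the FFT, the function name `magnitude_spectrum` is hypothetical, and the frequency-domain filtering of step 705 is omitted.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Window one frame and return the magnitude of each frequency bin.

    Only the first half of the bins is returned, since the input is real.
    """
    n = len(frame)
    # Step 702: apply a Hann (Hanning) window to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / n))
                for t, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2):
        # Step 703 (a plain DFT standing in for the FFT).
        acc = sum(w * cmath.exp(-2j * math.pi * k * t / n)
                  for t, w in enumerate(windowed))
        # Step 704: square root of (real^2 + imag^2).
        spectrum.append(abs(acc))
    return spectrum
```

For a 64-sample frame containing a sine wave with exactly 8 cycles, the largest value of the returned list sits at index 8. A real implementation would use an FFT routine for speed; the values it produces for these bins are the same.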

FIG. 8 is a flowchart showing the fundamental frequency identification process in more detail. In this process, the fundamental frequency is identified from the power spectrum. A singing voice generally contains overtones of its fundamental frequency, and the present embodiment exploits this property: it identifies the fundamental frequency by matching against the spectral structure of a voice that includes overtones. Restricting the pitch range searched during matching to the typical range of a voice also makes the identification faster. First, the matching evaluation value variable dPVMax and the note number variable iBNoteN, which corresponds to the identified pitch, are initialized (step 801). Next, the note number variable i is set to iLowerNN, the note number of the lowest pitch to be extracted (step 802). The processing from step 804 onward is repeated until the note number i reaches iUpperNN, the highest pitch to be extracted (see step 803).

The CPU 12 initializes a variable k used for pitch steps finer than a note number (step 804). The following processing is repeated until k reaches a predetermined maximum KMAX (see step 805).

The CPU 12 initializes the overtone counter j and the evaluation value variable dPitVal for a given pitch (step 806). Then, until j reaches the overtone upper limit JMAX (step 807), the CPU 12 obtains the frequency to be examined (step 808), identifies which element of the power spectrum that frequency corresponds to and calculates its index (step 809), adds the power spectrum value to the evaluation value (step 810), and increments the overtone counter j (step 811). In more detail, step 808 derives the frequency from the note number, taking the reference pitch as A4 = 440 Hz with the note number of A4 = 69. Specifically, the frequency is obtained as
dFr = 440 * power(2, ((i - 69) + k / KMAX) / 12) * j

In step 809, the index iN can be obtained as
iN = (dFr * iFrameSize / doFs)

Furthermore, in step 810, the evaluation value dPitVal is updated to
dPitVal + doData[iN] * (JMAX − j) / JMAX

When the overtone counter j reaches the overtone upper limit JMAX (Yes in step 807), the evaluation value dPitVal for the frequency under test is compared with the maximum value so far, dpVMax, and the larger of the two is stored as dpVMax (steps 812 and 813). In step 813, the note number i is also set in the note number variable iBNoteN, and k * 100 / KMAX is set in the variable iBNoteD, which represents the pitch step finer than a note number. In other words, the pitch that gave the maximum value is specified by the integer part iBNoteN and the fractional part iBNoteD. After this processing, the variable k is incremented.
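The search in FIG. 8 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: the function name identify_pitch, the default values of kmax and jmax, and the choice to run the overtone counter j from 1 to JMAX − 1 are all assumptions (the text only says that j is initialized and compared with JMAX).

```python
A4_HZ = 440.0    # reference pitch given in the text (A4 = 440 Hz)
A4_NOTE = 69     # note number of A4 given in the text


def identify_pitch(power, fs, frame_size, lower_nn, upper_nn, kmax=4, jmax=8):
    """Return (note_number, cents) of the candidate with the largest score.

    power      -- power spectrum of one frame (doData in the text)
    fs         -- sampling rate in Hz (doFs)
    frame_size -- FFT length (iFrameSize)
    kmax, jmax -- sub-semitone steps and overtone limit; these defaults are
                  illustrative assumptions, not values from the text
    """
    best_val, best_note, best_cents = 0.0, -1, 0
    for i in range(lower_nn, upper_nn + 1):            # steps 802-803
        for k in range(kmax):                          # steps 804-805
            score = 0.0                                # step 806
            for j in range(1, jmax):                   # steps 807, 811 (assumed start at 1)
                # step 808: frequency of the j-th harmonic of the candidate
                freq = A4_HZ * 2.0 ** (((i - A4_NOTE) + k / kmax) / 12.0) * j
                idx = int(freq * frame_size / fs)      # step 809
                if idx >= len(power):
                    break                              # harmonic lies beyond the spectrum
                score += power[idx] * (jmax - j) / jmax    # step 810
            if score > best_val:                       # steps 812-813
                best_val, best_note, best_cents = score, i, k * 100 // kmax
    return best_note, best_cents
```

Because lower harmonics receive larger weights, a candidate one octave below the true pitch, which only lines up with every second peak, scores lower than the true fundamental.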

Next, the comparison determination process (step 707 in FIG. 7) will be described in more detail with reference to FIG. 9. In this process, the pitch extracted by the fundamental frequency identification process is compared with the note number currently being sounded to check whether they match or, if not, which is larger.
First, the pointer p of the sounding note is initialized (step 901). The CPU 12 executes the following processing until the pointer p exceeds a predetermined number (step 902).

For data that is being sounded (Yes in step 903), if the extracted pitch iBNoteN is equal to the pitch Pitch of the sounding buffer OnBuf[p] (Yes in step 904), the CPU 12 increments iMatchCnt, the count of matches among the pitch comparison results of Note[OnBuf[p].iNP] in the music data (step 905).

If the pitch Pitch of the sounding buffer OnBuf[p] is larger than the extracted pitch iBNoteN (Yes in step 906), the CPU 12 increments iUpCnt, the count of sharp (too high) frames among the pitch comparison results of Note[OnBuf[p].iNP] in the music data (step 907).

Alternatively, if the pitch Pitch of the sounding buffer OnBuf[p] is smaller than the extracted pitch iBNoteN (Yes in step 908), the CPU 12 increments iDownCnt, the count of flat (too low) frames among the pitch comparison results of Note[OnBuf[p].iNP] in the music data (step 909). The note pointer p is then incremented (step 910), and the process returns to step 902. In this way, a pitch comparison result is obtained for each musical sound (Note[i]) of the music data (see FIG. 3).
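The per-frame counting of FIG. 9 might be sketched as follows. The class NoteResult and the function compare_frame are hypothetical names. Note one assumption in particular: the sketch counts a frame as sharp when the sung pitch is above the reference pitch and as flat when it is below, which is the usual musical meaning of those terms; the description above words the comparisons in terms of the buffer pitch being larger or smaller than the extracted pitch.

```python
from dataclasses import dataclass


@dataclass
class NoteResult:
    match_cnt: int = 0   # iMatchCnt
    up_cnt: int = 0      # iUpCnt (sung pitch sharp)
    down_cnt: int = 0    # iDownCnt (sung pitch flat)


def compare_frame(extracted, sounding, results):
    """Update the counters for one analysis frame.

    extracted -- note number extracted from the voice for this frame (iBNoteN)
    sounding  -- list of (note_index, pitch) pairs for notes now sounding
    results   -- mapping from note_index to NoteResult
    """
    for note_index, pitch in sounding:
        r = results[note_index]
        if extracted == pitch:
            r.match_cnt += 1
        elif extracted > pitch:      # assumption: sung above reference = sharp
            r.up_cnt += 1
        else:
            r.down_cnt += 1
```

Calling compare_frame once per extracted frame between note-on and note-off accumulates, for each note, how many frames matched, were sharp, or were flat.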

Next, steps 605, 608, 609, 610 and so on in the main flow (FIG. 6) will be described. FIG. 10 is a flowchart showing the note-on process. The note-on process generates a musical sound of the appropriate pitch when there is a musical sound to be sounded at the current time. It also sets data in the sounding buffer so that the note-off process can be carried out easily later. The CPU 12 initializes the note pointer p (step 1001) and executes the processing from step 1003 onward for all notes (musical sounds). If the current time INowTick has passed the sounding-start tick ITime of Note[p] in the musical sound data (Yes in step 1003), the CPU 12 sends a note-on event to MIDI OUT (step 1004). Next, the CPU 12 initializes the pointer q of the buffer OnBuf, which stores data on sounds in progress (step 1005), and if OnBuf[q] is free (Yes in step 1006), creates the data needed to send the note-off to MIDI OUT (step 1007). More specifically, in step 1007 the following are set in OnBuf[q]: ITime is set to the sum of ITime and IGate of the musical sound data Note[p], IGate is set to "1", Pit is set to the Pit of Note[p], Stat is set to the Stat of Note[p], and Vel is set to "0".

The CPU 12 also initializes each pitch comparison result counter in OnBuf[q] to "0", stores the invalid value "−1" in the pitch comparison result sMatch, and sets the note pointer iNP to p (step 1008). The note pointer p is then incremented (step 1010), and the process returns to step 1002. If, on the other hand, the result of step 1006 is No, the parameter q is incremented to search for a free OnBuf[q] (step 1009).
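As a rough illustration of the note-on bookkeeping in FIG. 10, the following Python sketch starts every note whose start tick has been reached and records its release time in a sounding buffer. The names Note, OnBufEntry and note_on, the started flag, and the choice to grow the buffer when no slot is free are assumptions made for the example; the MIDI output is represented only by a returned list of events.

```python
from dataclasses import dataclass


@dataclass
class Note:
    time: int              # ITime: sounding-start tick
    gate: int              # IGate: duration in ticks
    pit: int               # Pit: note number
    started: bool = False  # hypothetical flag so each note is started once


@dataclass
class OnBufEntry:
    off_time: int    # ITime + IGate of the source note
    pit: int         # pitch to release at note-off
    note_index: int  # iNP: index of the source note


def note_on(notes, now_tick, on_buf):
    """Start due notes and return the list of note-on events produced."""
    events = []
    for p, note in enumerate(notes):
        if note.started or note.time > now_tick:
            continue
        events.append(("note_on", note.pit))
        note.started = True
        entry = OnBufEntry(note.time + note.gate, note.pit, p)
        for q, slot in enumerate(on_buf):
            if slot is None:          # first free slot
                on_buf[q] = entry
                break
        else:
            on_buf.append(entry)      # assumption: grow when no slot is free
    return events
```

Storing the release tick and the source note index in the buffer is what lets a later note-off pass silence the note and look up its comparison counters without searching the music data again.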

次に、図１１を参照して、ノートオフ処理についてより詳細に説明する。ノートオフ処理においては、経過時間に伴い、消音すべき楽音を消音するとともに、比較判定処理（図９）により取得されたカウンタ値にしたがって、正しく歌えたか否かの判定を行う。ＣＰＵ１２は、ノートポインタｐを初期化し（ステップ１１０１）、すべての発音中バッファＯｎＢｕｆ［ｐ］について以下の処理を実行する。 Next, the note-off process will be described in more detail with reference to FIG. In the note-off process, the musical sound to be muted is muted along with the elapsed time, and whether or not the song has been sung correctly is determined according to the counter value acquired by the comparison determination process (FIG. 9). The CPU 12 initializes the note pointer p (step 1101), and executes the following processing for all sound generating buffers OnBuf [p].

The CPU 12 determines whether the data is being sounded and whether the current time has passed the silencing time (step 1103). If Yes in step 1103, the CPU 12 sends a note-off event to MIDI OUT (step 1104). Next, based on iMatchCnt, iUpCnt and iDownCnt in the musical sound data Note[OnBuf[p].iNP] from which the sounding buffer OnBuf[p] was derived, the pitch comparison result sMatch is set to "CORRECT" if the pitch was correct (steps 1105, 1106), to "UPPIT" if it was sharp (steps 1107, 1108), or to "DOWNPIT" if it was flat (steps 1109, 1110). After this processing, the note pointer p is incremented (step 1111).

FIG. 12 is a flowchart showing the scoring process (step 605 in FIG. 6) in more detail. The scoring process counts the number of musical sounds in the reproduced music whose pitch comparison result was judged correct, that is, "CORRECT". More specifically, the CPU 12 initializes the note pointer p and the counter iOKCnt (step 1201), and executes the following processing until the pointer p points to the end of the music (step 1202).

If the pitch comparison result sMatch of the music data Note[p] is "CORRECT", the CPU 12 increments the counter iOKCnt, and this is repeated for each note (steps 1203 to 1205). Finally, the score iJudge is calculated as 100 * iOKCnt / p (step 1206). This expresses the proportion of notes sung correctly as a percentage.
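The scoring rule can be checked with a short Python sketch. The function name score_percent, the use of integer division, and the handling of an empty piece are assumptions made for the example.

```python
def score_percent(results):
    """Return 100 * (number of CORRECT notes) / (number of notes).

    results -- list of per-note judgments such as "CORRECT", "UPPIT", "DOWNPIT"
    """
    if not results:
        return 0  # assumption: an empty piece scores 0
    ok_cnt = sum(1 for r in results if r == "CORRECT")   # iOKCnt
    return 100 * ok_cnt // len(results)                  # iJudge
```

For example, two correct notes out of four give a score of 50.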

次に、画像表示に関する処理について説明する。図１３は、スクロール表示初期化処理（図６のステップ６０２）をより詳細に示すフローチャートである。本実施の形態では、楽曲データに基づく楽譜のスクロール表示において、楽曲の演奏タイミングに先行して表示を開始し、音符と抽出されたピッチとを重ねあわせて表示できるようになっている。また、楽曲データに基づく楽譜は左方向にスクロールし、現在再生されている箇所の画面中の画像位置は一定となる。また、楽曲の再生は楽曲の時間単位であるティック（Ｔｉｃｋ）に基づいて表示されるが、入力音声のピッチは、切り出されたフレーム単位となる。そこで、１小節が、何フレーム分に相当するかを計算し、先行表示するＸ軸上の距離を算出しておく必要がある。 Next, processing related to image display will be described. FIG. 13 is a flowchart showing the scroll display initialization process (step 602 in FIG. 6) in more detail. In the present embodiment, in the scroll display of the musical score based on the music data, the display is started prior to the performance timing of the music, and the notes and the extracted pitch can be displayed in an overlapping manner. In addition, the score based on the music data is scrolled to the left, and the position of the image on the screen at the currently played location is constant. In addition, the reproduction of the music is displayed based on a tick that is a time unit of the music, but the pitch of the input voice is the cut frame unit. Therefore, it is necessary to calculate how many frames one bar corresponds to and calculate the distance on the X axis for the preceding display.

As shown in FIG. 13, the CPU 12 calculates the number of ticks corresponding to one measure (step 1301). More specifically,
IOffset = iResolution * IBeat * 4 / IBeatDenomi

Next, the CPU 12 calculates the real time of one measure (step 1302). More specifically,
doTimeOfMeas = (60 * IBeat * 4 / IBeatDenomi) / IBPM
Here, IBPM (beats per minute) is the tempo value, that is, the number of quarter notes per minute. Multiplying by "60" therefore converts the unit to seconds.

Next, the number of frames corresponding to one measure is calculated (step 1303). More specifically, it can be obtained from
IOffsetX = doTimeOfMeas * doFs / iFrameSize
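The three quantities computed in the scroll display initialization can be worked through with a small Python sketch. The function name measure_metrics and the numbers in the usage note (480 ticks per quarter note, 4/4 time, 120 BPM, 44.1 kHz sampling, 1024-sample frames) are illustrative assumptions, not values from the embodiment.

```python
def measure_metrics(resolution, beat, beat_denomi, bpm, fs, frame_size):
    """Return (ticks, seconds, frames) for one measure.

    resolution  -- ticks per quarter note (iResolution)
    beat        -- time signature numerator (IBeat)
    beat_denomi -- time signature denominator (IBeatDenomi)
    bpm         -- tempo in quarter notes per minute (IBPM)
    fs          -- sampling rate in Hz (doFs)
    frame_size  -- samples per analysis frame (iFrameSize)
    """
    ticks = resolution * beat * 4 // beat_denomi        # IOffset
    seconds = (60.0 * beat * 4 / beat_denomi) / bpm     # doTimeOfMeas
    frames = seconds * fs / frame_size                  # IOffsetX
    return ticks, seconds, frames
```

With the assumed values above, one measure is 1920 ticks, lasts 2.0 seconds, and spans about 86.1 frames, which is the horizontal look-ahead distance in frame units.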

Next, the scroll display process (step 610 in FIG. 6) will be described in more detail. In the scroll display, whether to display each note is decided by considering INowTick, the tick corresponding to the current time, together with the music data and the sounding-start timing of each musical sound (note). Steps 1401 to 1403 form a loop for the preceding (look-ahead) display, and steps 1404 to 1406 form a loop for display at the timing at which the note should be performed.

In the first loop, for the look-ahead display note pointer na, as long as the sum of the ITime of the musical sound data Note[na], which is performed after the current performance timing, and the IOffset calculated in the scroll display initialization (FIG. 13) is smaller than INowTick (Yes in step 1401), the CPU 12 displays that musical sound (steps 1402, 1403). In the second loop, for the display note pointer nd, as long as the ITime of Note[nd] is smaller than INowTick (No in step 1404), the CPU 12 displays that musical sound (steps 1405, 1406). In this way, the musical sounds that should be displayed are displayed. The display position in the Y direction (vertical direction) is determined from the Pit of the relevant Note[na] or Note[nd].

Next, the pitch display process (step 708 in FIG. 7) will be described in more detail with reference to FIG. 15. In the pitch display process, the entire screen is scrolled at the timing at which the pitch is displayed, and, based on the pitch extracted in the fundamental frequency identification process (see FIG. 8), a mark indicating the pitch of the user's voice is added at the position of the note representing the musical sound, or above or below it in the Y direction (vertical direction).
First, the CPU 12 scrolls the entire screen to the left by copying a rectangular area of the screen (step 1501). Next, the CPU 12 sets w to the width of the display screen and h to its height (step 1502).

ＣＰＵ１２は、印を表示すべきｘ座標を算出する（ステップ１５０３）。ここでは、幅の所定割合の位置（本実施の形態では２／３の位置）をｘ座標として決定している。また、ＣＰＵ１２は、印を表示すべきｙ座標を算出する（ステップ１５０４）。ここでは、表示画面の高さの最上部をピッチの最大値（ｉＵｐｐｅｒＮＮ）に対応させ、最下部をピッチの最小値（ｉＬｏｗｅｒＮＮ）に対応させている。その後、算出されたｘ座標およびｙ座標で特定される位置に、所定の印が描画される（ステップ１５０５）。 The CPU 12 calculates the x coordinate where the mark is to be displayed (step 1503). Here, a position of a predetermined ratio of the width (a position of 2/3 in the present embodiment) is determined as the x coordinate. Further, the CPU 12 calculates the y coordinate where the mark is to be displayed (step 1504). Here, the uppermost part of the height of the display screen corresponds to the maximum pitch value (iUpperNN), and the lowermost part corresponds to the minimum value of pitch (iLowerNN). Thereafter, a predetermined mark is drawn at a position specified by the calculated x coordinate and y coordinate (step 1505).
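The coordinate calculation of steps 1503 and 1504 might look like the following Python sketch. It assumes a raster coordinate system with y = 0 at the top of the screen and a linear mapping between note number and height; the text states only that the top corresponds to iUpperNN and the bottom to iLowerNN. The function name mark_position is hypothetical.

```python
def mark_position(note, width, height, lower_nn, upper_nn):
    """Return the (x, y) pixel position of the mark for a note number.

    x is fixed at two-thirds of the screen width; y maps upper_nn to the
    top row (0) and lower_nn to the bottom row (height - 1).
    """
    x = width * 2 // 3
    span = upper_nn - lower_nn
    y = (upper_nn - note) * (height - 1) // span
    return x, y
```

On a 600 x 200 pixel area with the range 56 to 75, the highest note lands at (400, 0) and the lowest at (400, 199).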

FIG. 16 is a diagram showing an example of a display image. In this example, a pitch display area (reference numeral 1601) is provided on the display screen, and it shows marks corresponding to the original pitches of the score (see, for example, reference numeral 1602) and marks indicating the pitch extracted from the user's voice (see reference numeral 1603). The top of the pitch display area 1601 corresponds to the highest pitch to be extracted (pitch = 75; E♭5), and the bottom corresponds to the lowest pitch to be extracted (pitch = 56; A♭3). The position two-thirds of the way across the screen in the X direction is the performance timing, and the look-ahead display timing is provided ahead of it (on the left side in the X direction).

As described above, in the present embodiment, when the user sings into the microphone while the music is played back, the display screen shows marks indicating the pitch of the user's own voice together with marks for the pitches of the musical sounds. Because the pitch based on the music data and the changes in the pitch of the user's singing are displayed graphically, the user can see intuitively which passages are sung well and which are not, and can train effectively. In addition, even without a musical score, the user can practice singing in a playful way by referring to the screen.

The present invention is not limited to the above embodiment; various modifications are possible within the scope of the invention described in the claims, and it goes without saying that these are also included in the scope of the present invention.
For example, in the above embodiment, as shown in FIG. 15, the marks indicating the pitches of the music are displayed at equal intervals without reflecting note length, but the invention is not limited to this. The marks indicating pitch may instead be displayed at intervals proportional to the lengths of the notes that make up the music.

In addition, the marks indicating the extracted pitch are shown as discontinuous points (see reference numeral 1603 in FIG. 15). However, the invention is not limited to this, and adjacent marks may be connected to represent a continuous change.
In the present embodiment, the coordinate of the current position in the music data is fixed on the display, but the invention is not limited to this, and the current position may instead move to the right in the X direction. In that case, the entire screen may be redrawn when the pointer representing the X coordinate of the current position reaches the right end of the display area.

The note-off process (FIG. 11) and the pitch display process (see FIG. 15) may also be combined. FIG. 17 is a flowchart in which the note-off process and the pitch display process are combined. Here, steps 1701 to 1710 are the same as the processing of FIG. 11. After that, the real time of the gate time IGate of the musical sound under judgment is obtained (step 1711). Next, the frame corresponding to that real time is identified (step 1712). Then the x coordinate at which to display within the display area is calculated (step 1713), and a mark indicating the extracted pitch is added at the position given by the calculated x coordinate and the y coordinate corresponding to the pitch (step 1714). The note pointer p is then incremented (step 1715), and the process returns to step 1702.

さらに、前記実施の形態の表示（図１５参照）に加え、ピッチ比較結果を表示するように構成してもよい。たとえば、図１８の例では、ユーザの歌唱のピッチが、実際の音高より高い場合（符号１８０１）、低い場合（符号１８０２）、或いは、正しい場合（符号１８０３）のそれぞれについて、特有の符号を付与してもよい。無論、この符号は、矢印などに限定されず、丸印、バツ印であってもよいし、より漫画化した印（たとえば、風船の絵／風船が割れている絵、笑顔／泣き顔）であってもよい。 Furthermore, in addition to the display of the above embodiment (see FIG. 15), the pitch comparison result may be displayed. For example, in the example of FIG. 18, a unique code is used for each of the case where the pitch of the user's song is higher (reference numeral 1801), lower (reference numeral 1802), or correct (reference numeral 1803). It may be given. Of course, this code is not limited to an arrow or the like, and may be a circle or cross mark, or a more cartoonized mark (for example, a picture of a balloon / a picture of a broken balloon, a smile / crying face). May be.

Next, a second embodiment of the present invention will be described. In the second embodiment, a vocal range variable is set according to the specified attribute of the subject, that is, the singing user (child, adult female, adult male), which enables more accurate pitch extraction. In addition, by sounding the pitch the user should produce as a guide sound, singing practice can be made more effective; alternatively, by not sounding the guide sound, that is, by muting it, the user can practice sight-reading a score.

さらに、難易度を指定して、歌唱するユーザに応じて、難易度にしたがった判定を実現している。たとえば、初心者や子供には、比較的やさしく、熟練者や大人には厳しい判定を提示することが可能となる。 Furthermore, the difficulty level is designated and the determination according to the difficulty level is realized according to the user who sings. For example, it is relatively easy for beginners and children, and strict judgments can be presented to skilled people and adults.

第２の実施の形態におけるハードウェア構成、機能ブロックダイヤグラム、メインフロー等は、先に説明した第１の実施の形態のものと同様であるため、ここでは、追加された処理フローのみを説明する。 Since the hardware configuration, functional block diagram, main flow, etc. in the second embodiment are the same as those in the first embodiment described above, only the added processing flow will be described here. .

図１９は、対象者指定処理を示すフローチャートである。ユーザは、あらかじめ入力装置を操作して、ユーザ属性として、「子供」、「大人女性」或いは「大人男性」のいずれかを指定しておく。これに応答して、ＣＰＵ１２は、ユーザ属性ｉＶＲａｎｇｅとしてＲＡＭ１６に記憶しておく。対象者指定処理において、ＣＰＵ１２は、抽出するピッチの最低音ｉＬｏｗｅｒＮＮおよび最高音ｉＵｐｐｅｒＮＮの初期値を、それぞれ設定する（ステップ１９０１）。 FIG. 19 is a flowchart showing the target person specifying process. The user operates the input device in advance and designates “child”, “adult female”, or “adult male” as the user attribute. In response to this, the CPU 12 stores the user attribute iVRange in the RAM 16. In the subject designation process, the CPU 12 sets initial values of the lowest sound iLowerNN and the highest sound iUpperNN of the pitch to be extracted (step 1901).

The CPU 12 refers to the specified user attribute and sets the lowest and highest notes as follows (steps 1902 to 1906).
Child: lowest note = 58, highest note = 79
Adult female: lowest note = 55, highest note = 79
Adult male: lowest note = 43, highest note = 67
If the user has not specified a user attribute, the initial values are kept.
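The range selection above amounts to a table lookup, sketched here in Python. The dictionary keys and the function name pitch_range are hypothetical. The description does not state the initial values of iLowerNN and iUpperNN, so the default below simply reuses the 56 to 75 range quoted for the display example of FIG. 16 as a stand-in.

```python
# (lowest note, highest note) per user attribute, taken from the list above
VOICE_RANGES = {
    "child": (58, 79),
    "adult_female": (55, 79),
    "adult_male": (43, 67),
}

# Assumption: stand-in for the unspecified initial values (FIG. 16 example range)
DEFAULT_RANGE = (56, 75)


def pitch_range(user_attribute=None):
    """Return (iLowerNN, iUpperNN) for the given attribute, or the default."""
    return VOICE_RANGES.get(user_attribute, DEFAULT_RANGE)
```

Narrowing the range reduces the number of candidate pitches that the fundamental frequency search has to score, which is where the speed-up comes from.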

このように、ユーザ属性により、最低音および最高音の範囲を決めることにより、より正確でかつ高速なピッチ抽出が可能となる。 Thus, by determining the range of the lowest sound and the highest sound according to the user attribute, more accurate and high-speed pitch extraction can be performed.

次に、第２の実施の形態にかかるガイド音の発音について説明する。図２０は第２の実施の形態にかかるノートオン処理を示すフローチャートである。この処理は、図１０のノートオン処理に、部分的に処理ステップが加えられたものであり、ステップ２００１〜２００３は、図１０のステップ１００１〜１００３に対応し、ステップ２００７〜２０１３は、図１０のステップ１００４〜１０１０に対応する。したがって、以下に加えられた処理ステップのみについて説明する。また、ユーザは、あらかじめ入力装置を操作して、ガイド音の出力の有無を指定しておく。これに応答して、ＣＰＵ１２は、ガイド音出力の有無を示す変数ｉＧｕｉｄｅＮｏｔｅの値（「１」：ガイド音あり、「０」：ガイド音なし）をＲＡＭ１６に記憶しておく。 Next, the pronunciation of the guide sound according to the second embodiment will be described. FIG. 20 is a flowchart showing note-on processing according to the second embodiment. In this process, processing steps are partially added to the note-on process in FIG. 10, steps 2001 to 2003 correspond to steps 1001 to 1003 in FIG. 10, and steps 2007 to 2013 are in FIG. Corresponds to Steps 1004 to 1010. Therefore, only the processing steps added below will be described. In addition, the user operates the input device in advance to designate whether or not guide sound is output. In response to this, the CPU 12 stores the value of the variable iGuideNote (“1”: with guide sound, “0”: no guide sound) indicating whether or not the guide sound is output in the RAM 16.

When it is time to sound Note[p] in the musical sound data (Yes in step 2003), the CPU 12 sets the velocity Vel of that Note[p] to the minimum value "1" (step 2004). It then checks the value of iGuideNote, and if the value is "1", that is, if the guide sound should be output (Yes in step 2005), it sets the velocity Vel of Note[p] to a predetermined value GVEL (step 2006). As a result, when "no guide sound" is specified, the sound the user should sing is not output (that is, it is muted), whereas when "guide sound" is specified, the sound the user should sing is output at a predetermined volume.
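The velocity selection of steps 2004 to 2006 reduces to a single conditional, as in this Python sketch. The value 100 for GVEL is a hypothetical placeholder; the text says only that GVEL is a predetermined value.

```python
GVEL = 100  # hypothetical guide-sound velocity; the text does not give a value


def note_on_velocity(guide_note, gvel=GVEL):
    """Return the velocity for the note-on event.

    guide_note -- iGuideNote: 1 = guide sound on, 0 = guide sound off
    """
    vel = 1              # step 2004: start from the minimum velocity
    if guide_note == 1:  # step 2005
        vel = gvel       # step 2006
    return vel
```

The note-on event is sent in both cases, so the timing and comparison logic stay identical whether or not the guide sound is audible.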

Next, the comparison determination process according to the second embodiment will be described with reference to FIG. 21. This process extends the comparison determination process of the first embodiment (FIG. 9) so that frames in which the level of the input waveform of the user's voice does not reach a predetermined value are counted, and processing steps for that purpose are added. In FIG. 21, steps 2101 to 2103 correspond to steps 901 to 903 in FIG. 9, and steps 2107 to 2113 correspond to steps 904 to 910 in FIG. 9. Only the added processing steps are therefore described.

ＣＰＵ１２は、楽音データＮｏｔｅ［］中のピッチ比較結果として、カウンタｉＡｌｌＣｎｔおよびｉＮｏＩｎｐｕｔＣｎｔを設けている。発音中のデータであるとき（ステップ２１０３でイエス(Yes)）、ＣＰＵ１２は、Ｎｏｔｅ［ＯｎＢｕｆ［ｐ］．ｉＮｐ］のカウンタｉＡｌｌＣｎｔをインクリメントする（ステップ２１０５）。また、入力なし、つまり、ユーザ入力にかかる音声の波形のレベルが所定値に達していない場合には（ステップ２１０６でイエス(Yes)）、Ｎｏｔｅ［ＯｎＢｕｆ［ｐ］．ｉＮｐ］のカウンタｉＮｏＩｎｐｕｔＣｎｔをインクリメントする。これにより、楽音ごとに入力の総数（ｉＡｌｌＣｎｔ中の最終的なカウンタ値）と、音声入力がなかったときの総数（ｉＮｏＩｎｐｕｔＣｎｔの最終的なカウンタ値）とを知ることができる。 The CPU 12 provides counters iAllCnt and iNoInputCnt as pitch comparison results in the musical sound data Note []. When the data is sounding (Yes in Step 2103), the CPU 12 determines that Note [OnBuf [p]. iNp] counter iAllCnt is incremented (step 2105). If there is no input, that is, if the level of the waveform of the voice applied to the user input has not reached the predetermined value (Yes in step 2106), Note [OnBuf [p]. iNp] counter iNoInputCnt is incremented. Thereby, it is possible to know the total number of inputs (final counter value in iAllCnt) and the total number when there is no voice input (final counter value of iNoInputCnt) for each musical tone.

Next, the note-off process according to the second embodiment will be described with reference to FIG. 22. Here, by using a threshold corresponding to the difficulty level iDiff, which is set by user input or according to the user attribute described above, the correctness of each note can be judged in a way that depends on the difficulty level.

The CPU 12 finds the largest of the pitch comparison results iMatchCnt, iUpCnt, iDownCnt and iNoInputCnt for the sounding buffer OnBuf[p] (step 2203). If something other than iMatchCnt is the largest (No in step 2204), the result corresponding to whichever of iUpCnt, iDownCnt and iNoInputCnt has the largest value (one of UPPIT, DOWNPIT and NG) is stored as the determination result sMatch of the musical sound data Note[OnBuf[p].iNP] corresponding to the sounding note (step 2205).

If the result of step 2204 is Yes, the ratio r of the number of correctly sung frames iMatchCnt to the total number of determinations iAllCnt is obtained (step 2206). Next, the CPU 12 compares r with the threshold iCRate[iDiff] set according to the difficulty level iDiff. If r is larger (Yes in step 2207), the correct answer (CORRECT) is stored as the determination result sMatch (step 2208); otherwise, NG is stored as sMatch (step 2209). This makes it possible to vary how readily a note is judged correct (CORRECT) according to the difficulty level. Setting the difficulty level therefore allows training at various levels, from beginner to advanced.
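The difficulty-dependent judgment can be sketched in Python as follows. The function name judge_note and the way ties between counters are broken (the first entry in the dictionary wins) are assumptions; the description does not say how ties are handled, and the threshold values used in the usage note are invented for illustration.

```python
def judge_note(match_cnt, up_cnt, down_cnt, no_input_cnt, all_cnt, threshold):
    """Return "CORRECT", "UPPIT", "DOWNPIT" or "NG" for one note.

    threshold -- iCRate[iDiff], the required ratio of matching frames
    """
    counts = {
        "CORRECT": match_cnt,
        "UPPIT": up_cnt,
        "DOWNPIT": down_cnt,
        "NG": no_input_cnt,
    }
    best = max(counts, key=counts.get)    # step 2203 (ties: first key wins)
    if best != "CORRECT":                 # No in step 2204
        return best                       # step 2205
    ratio = match_cnt / all_cnt if all_cnt else 0.0   # step 2206
    return "CORRECT" if ratio > threshold else "NG"   # steps 2207-2209
```

With 6 matching frames out of 10, a lenient threshold of 0.5 yields "CORRECT", while a strict threshold of 0.8 yields "NG" for the same singing.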

また、第２の実施の形態において、図２３に示すように、楽譜上にピッチ比較判定の結果を表示するように構成しても良い。ここでは、歌えなかった音符には「バツ印」を、上ずった部分の音符には上向きの矢印を、下がった（フラットした）部分の音符には下向きの矢印を付している。また、正しく歌えた音符には「丸印」を付している。このような表示にすることで、ユーザの苦手或いは得意な場所が、楽譜上速やかにわかるので、独習にも好適であり、また、生徒と教師とがいっしょにレッスンする場合にも適切な指導のための資料とすることができる。 Further, in the second embodiment, as shown in FIG. 23, the result of the pitch comparison determination may be displayed on the score. Here, the notes that could not be sung are marked with a “cross”, the upward notes are marked with an upward arrow, and the down (flat) notes are marked with a downward arrow. In addition, the notes that were sung correctly are marked with a “circle”. By displaying in this way, the user's weakness or where they are good at can be quickly identified on the score, so it is also suitable for self-study, and appropriate instruction is also provided when students and teachers take lessons together. Can be used as a reference material.

なお、前記第１の実施の形態および第２の実施の形態においては、ユーザの歌唱をガイドする装置に本発明を適用したが、入力される音声は声に限定されず、楽器から発せられマイクにより取得されるものであっても良い。また、電子ギターのように当初から電気信号が出力される場合であっても本発明を適用できることはいうまでもない。 In the first embodiment and the second embodiment, the present invention is applied to a device that guides the user's singing. However, the input voice is not limited to a voice, but is emitted from an instrument and a microphone. It may be acquired by. Further, it goes without saying that the present invention can be applied even when an electric signal is output from the beginning as in an electronic guitar.

FIG. 1 is a block diagram showing the hardware configuration of the singing guide device according to the first embodiment of the present invention.
FIG. 2 is a functional block diagram of the CPU and its peripheral members.
FIG. 3 is a diagram for explaining music data according to the present embodiment.
FIG. 4 is a flowchart showing the main flow of the singing guide device according to the present embodiment.
FIG. 5 is a diagram for explaining control variables used in the present embodiment.
FIG. 6 is a flowchart showing a reproduction routine according to the present embodiment.
FIG. 7 is a flowchart showing a voice input routine according to the present embodiment.
FIG. 8 is a flowchart showing the frequency identification process according to the present embodiment.
FIG. 9 is a flowchart showing the comparison determination process according to the present embodiment.
FIG. 10 is a flowchart showing the note-on process according to the present embodiment.
FIG. 11 is a flowchart showing the note-off process according to the present embodiment.
FIG. 12 is a flowchart showing the scoring process according to the present embodiment.
FIG. 13 is a flowchart showing the scroll display initialization process according to the present embodiment.
FIG. 14 is a flowchart showing the scroll display process according to the present embodiment.
FIG. 15 is a flowchart showing the pitch display process according to the present embodiment.
FIG. 16 is a diagram showing an example of an image displayed on the screen of the display device in the present embodiment.
FIG. 17 is a flowchart showing another example of the note-off process according to the present embodiment.
FIG. 18 is a diagram showing an example of an image displayed on the screen of the display device in the present embodiment.
FIG. 19 is a flowchart showing the target person specifying process according to the second embodiment.
FIG. 20 is a flowchart showing the note-on process according to the second embodiment.
FIG. 21 is a flowchart showing the comparison determination process according to the second embodiment.
FIG. 22 is a flowchart showing the note-off process according to the second embodiment.
FIG. 23 is a diagram showing an example of an image displayed on the screen of the display device in the present embodiment.

Explanation of symbols

10 Singing guide device
12 CPU
14 ROM
16 RAM
18 Input device
20 Display device
22 MIDI I/F
24 ADC
30 Playback control unit
32 Pitch extraction unit
34 Comparison judgment unit
36 Score display control unit
38 Scroll display control unit

Claims

A musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced, and outputs a comparison result, the device comprising:
pitch extraction means for extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
reproduction means for generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and for silencing the musical sound at the silencing timing of the musical piece data;
comparison means for comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and for accumulating the comparison result for each frame in association with the musical sound data, the comparison means counting up, for each frame, the count value of any one of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter indicating that there is no input because the level of the voice signal does not reach a predetermined value;
determination means for determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, with reference to the counter values of the first to fourth counters; and
display control means for displaying the determination result of the determination means on a display device.
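
As an illustration only, the frame-by-frame counting recited in this claim could be sketched in Python as follows. The names `NoteCounters` and `compare_frame`, the representation of pitch in semitones, and the `level_floor` and `tolerance` values are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass
class NoteCounters:
    """Per-note tallies of the frame-by-frame comparison results."""
    match: int = 0      # first counter: extracted pitch matches the note
    higher: int = 0     # second counter: extracted pitch is higher
    lower: int = 0      # third counter: extracted pitch is lower
    no_input: int = 0   # fourth counter: signal level below the floor


def compare_frame(counters, frame_pitch, frame_level, note_pitch,
                  level_floor=0.05, tolerance=0.5):
    """Count up exactly one counter for one frame.

    Pitches are in semitones (for example MIDI note numbers).
    level_floor and tolerance are illustrative values, not from the patent.
    """
    if frame_level < level_floor:
        counters.no_input += 1
    elif abs(frame_pitch - note_pitch) <= tolerance:
        counters.match += 1
    elif frame_pitch > note_pitch:
        counters.higher += 1
    else:
        counters.lower += 1
    return counters


# Frames captured between note-on and note-off of a note with pitch 60.
counters = NoteCounters()
frames = [(60.1, 0.8), (60.4, 0.7), (61.2, 0.6), (58.9, 0.5), (0.0, 0.01)]
for pitch, level in frames:
    compare_frame(counters, pitch, level, note_pitch=60)
print(counters)
```

Keeping the four tallies per note, rather than a single running score, is what lets a later determination distinguish singing that was sharp, flat, or absent.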

A musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced, and outputs a comparison result, the device comprising:
pitch extraction means for extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
reproduction means for generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and for silencing the musical sound at the silencing timing of the musical piece data;
comparison means for comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and for accumulating the comparison result for each frame in association with the musical sound data, the comparison means counting up, for each frame, the count value of any one of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter indicating that there is no input because the level of the voice signal does not reach a predetermined value;
determination means for determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, with reference to the counter values of the first to fourth counters; and
display control means for displaying the determination result of the determination means on a display device,
wherein the determination means is activated in response to the reproduction means silencing the musical sound at the silencing timing.
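
The triggering recited in this claim, where silencing the note is what starts the determination, might be sketched as the small event handler below. This is a hedged illustration: the class `NoteJudge`, its method names, the exact-equality pitch test, and the choice to report the dominant counter as the verdict are all assumptions for the example, not the patent's implementation.

```python
class NoteJudge:
    """Accumulates per-frame results and judges a note when it is silenced."""

    def __init__(self):
        self.active_pitch = None
        self.counts = None
        self.results = []

    def note_on(self, pitch):
        # Sounding timing: start a fresh set of counters for this note.
        self.active_pitch = pitch
        self.counts = {"match": 0, "higher": 0, "lower": 0, "no_input": 0}

    def frame(self, sung_pitch):
        # Called once per analysis frame; None stands for "no input".
        if self.active_pitch is None:
            return
        if sung_pitch is None:
            key = "no_input"
        elif sung_pitch == self.active_pitch:
            key = "match"
        elif sung_pitch > self.active_pitch:
            key = "higher"
        else:
            key = "lower"
        self.counts[key] += 1

    def note_off(self):
        # Silencing timing: only now is the determination started.
        if self.active_pitch is None:
            return
        verdict = max(self.counts, key=self.counts.get)
        self.results.append((self.active_pitch, verdict))
        self.active_pitch = None


judge = NoteJudge()
judge.note_on(60)
for p in (60, 60, 62, None):
    judge.frame(p)
judge.note_off()
judge.note_on(62)
for p in (60, 60, 62):
    judge.frame(p)
judge.note_off()
print(judge.results)
```

Deferring the verdict to `note_off` means every frame of the note contributes before anything is decided.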

The musical sound comparison device according to claim 1 or 2, wherein the comparison means comprises a fifth counter that counts the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame; and
wherein, when the counter value of the first counter is larger than the counter values of the other counters, the determination means obtains the ratio at which the extracted pitch and the pitch based on the musical sound data match as the counter value of the first counter / the counter value of the fifth counter, and determines that the pitch is accurate when the ratio is larger than a predetermined threshold.
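
A minimal sketch of this ratio test, under stated assumptions, is given below. The function name `judge_note`, the default threshold of 0.6, and the reading of "the other counters" as the second to fourth counters are assumptions made for illustration; the claim itself does not fix these details.

```python
def judge_note(match, higher, lower, no_input, compared, threshold=0.6):
    """Return True when the note is judged accurate.

    match, higher, lower, no_input: first to fourth counter values.
    compared: fifth counter value (number of frame comparisons).
    threshold: illustrative value, not taken from the patent.
    """
    if compared == 0:
        return False  # nothing was compared, so nothing can be accurate
    if match <= max(higher, lower, no_input):
        return False  # the match counter is not the largest
    return match / compared > threshold


print(judge_note(7, 1, 1, 1, 10))  # ratio 0.7 exceeds the threshold
print(judge_note(4, 3, 2, 1, 10))  # match is largest, but ratio 0.4 is too low
print(judge_note(3, 5, 1, 1, 10))  # "higher" dominates, so not accurate
```

Requiring the match counter to dominate before computing the ratio keeps a note that was mostly sharp or flat from passing on a lenient threshold.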

A musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced, and outputs a comparison result, the device comprising:
pitch extraction means for extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
reproduction means for generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and for silencing the musical sound at the silencing timing of the musical piece data;
comparison means for comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and for accumulating the comparison result for each frame in association with the musical sound data, the comparison means counting up, for each frame, the count values of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame;
determination means for determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, wherein, when the counter value of the first counter is larger than the counter values of the other counters, the determination means obtains the ratio at which the extracted pitch and the pitch based on the musical sound data match as the counter value of the first counter / the counter value of the fourth counter, and determines that the pitch is accurate when the ratio is larger than a predetermined threshold; and
display control means for displaying the determination result of the determination means on a display device.

A musical sound comparison device that compares a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced, and outputs a comparison result, the device comprising:
pitch extraction means for extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
reproduction means for generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and for silencing the musical sound at the silencing timing of the musical piece data;
comparison means for comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and for accumulating the comparison result for each frame in association with the musical sound data, the comparison means counting up, for each frame, the count values of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame;
determination means for determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, wherein, when the counter value of the first counter is larger than the counter values of the other counters, the determination means obtains the ratio at which the extracted pitch and the pitch based on the musical sound data match as the counter value of the first counter / the counter value of the fourth counter, and determines that the pitch is accurate when the ratio is larger than a predetermined threshold; and
display control means for displaying the determination result of the determination means on a display device,
wherein the determination means is activated in response to the reproduction means silencing the musical sound at the silencing timing.

The musical sound comparison device according to any one of claims 3 to 5, wherein the threshold value changes according to a set difficulty level.
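
One way a difficulty-dependent threshold could look is sketched below. The difficulty names, the threshold values in `THRESHOLDS`, and the function `is_accurate` are purely hypothetical; the patent states only that the threshold changes with the set difficulty level.

```python
# Hypothetical mapping from difficulty level to the required match ratio.
THRESHOLDS = {"easy": 0.5, "normal": 0.65, "hard": 0.8}


def is_accurate(match_count, compared_count, difficulty="normal"):
    """Judge a note using a threshold selected by the difficulty level."""
    if compared_count == 0:
        return False
    return match_count / compared_count > THRESHOLDS[difficulty]


# The same performance (6 matching frames out of 10) at each level.
for level in ("easy", "normal", "hard"):
    print(level, is_accurate(6, 10, level))
```

With these illustrative values, the same performance passes only at the easiest setting.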

A computer-readable musical sound comparison program that operates a computer to compare a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced and to output a comparison result, the program causing the computer to execute:
a pitch extraction step of extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
a reproduction step of generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and silencing the musical sound at the silencing timing of the musical piece data;
a comparison step of comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and accumulating the comparison result for each frame in association with the musical sound data, the comparison step counting up, for each frame, the count value of any one of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter indicating that there is no input because the level of the voice signal does not reach a predetermined value;
a determination step of determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, with reference to the counter values of the first to fourth counters; and
a display control step of displaying the determination result of the determination step on a display device.

A computer-readable musical sound comparison program that operates a computer to compare a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced and to output a comparison result, the program causing the computer to execute:
a pitch extraction step of extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
a reproduction step of generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and silencing the musical sound at the silencing timing of the musical piece data;
a comparison step of comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and accumulating the comparison result for each frame in association with the musical sound data, the comparison step counting up, for each frame, the count value of any one of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter indicating that there is no input because the level of the voice signal does not reach a predetermined value;
a determination step of determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, with reference to the counter values of the first to fourth counters; and
a display control step of displaying the determination result of the determination step on a display device,
wherein the determination step is started in response to the musical sound being silenced at the silencing timing in the reproduction step.

The musical sound comparison program according to claim 7 or 8, causing the computer to execute:
in the comparison step, a step of counting up, by a fifth counter, the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame; and
in the determination step, when the counter value of the first counter is larger than the counter values of the other counters, a step of obtaining the ratio at which the extracted pitch and the pitch based on the musical sound data match as the counter value of the first counter / the counter value of the fifth counter, and a step of determining that the pitch is accurate when the ratio is larger than a predetermined threshold.

A computer-readable musical sound comparison program that operates a computer to compare a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced and to output a comparison result, the program causing the computer to execute:
a pitch extraction step of extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
a reproduction step of generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and silencing the musical sound at the silencing timing of the musical piece data;
a comparison step of comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and accumulating the comparison result for each frame in association with the musical sound data, the comparison step counting up, for each frame, the count values of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame;
a determination step of determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, wherein, when the counter value of the first counter is larger than the counter values of the other counters, the ratio at which the extracted pitch and the pitch based on the musical sound data match is obtained as the counter value of the first counter / the counter value of the fourth counter, and the pitch is determined to be accurate when the ratio is larger than a predetermined threshold; and
a display control step of displaying the determination result of the determination step on a display device.

A computer-readable musical sound comparison program that operates a computer to compare a pitch extracted from a voice signal input by a user with the pitch of a musical sound that should be produced and to output a comparison result, the program causing the computer to execute:
a pitch extraction step of extracting a pitch for each predetermined frame based on voice data obtained by digitizing the voice signal input by the user;
a reproduction step of generating a musical sound having a pitch based on musical sound data at the sounding timing of the musical piece data to be reproduced, and silencing the musical sound at the silencing timing of the musical piece data;
a comparison step of comparing, from the sounding timing until the silencing timing, the pitch extracted for each frame with the pitch based on the musical sound data, and accumulating the comparison result for each frame in association with the musical sound data, the comparison step counting up, for each frame, the count values of a first counter indicating that the extracted pitch matches the pitch based on the musical sound data, a second counter indicating that the extracted pitch is higher than that pitch, a third counter indicating that the extracted pitch is lower than that pitch, and a fourth counter that counts the number of times the extracted pitch and the pitch based on the musical sound data are compared, frame by frame;
a determination step of determining, after the silencing timing has elapsed, the accuracy of the pitch extracted from the voice data based on the comparison results, wherein, when the counter value of the first counter is larger than the counter values of the other counters, the ratio at which the extracted pitch and the pitch based on the musical sound data match is obtained as the counter value of the first counter / the counter value of the fourth counter, and the pitch is determined to be accurate when the ratio is larger than a predetermined threshold; and
a display control step of displaying the determination result of the determination step on a display device,
wherein the determination step is started in response to the musical sound being silenced at the silencing timing in the reproduction step.

The musical sound comparison program according to any one of claims 9 to 11, wherein the threshold value changes in accordance with a set difficulty level.