JP2005208196A

JP2005208196A - Device and method for karaoke, and program

Info

Publication number: JP2005208196A
Application number: JP2004012763A
Authority: JP
Inventors: Kazuyoshi Tominaga; 一喜冨永
Original assignee: Konami Corp; Konami Computer Entertainment Studios Inc
Current assignee: Konami Computer Entertainment Studios Inc; Konami Group Corp
Priority date: 2004-01-21
Filing date: 2004-01-21
Publication date: 2005-08-04
Anticipated expiration: 2024-01-21
Also published as: JP3878180B2

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke apparatus and method that properly detect timing deviations in singing and suitably present them to the singer, and a program for realizing them on a computer.
SOLUTION: In a game machine 201, an input reception unit 202 receives input of voice data; an analysis unit 203 analyzes the frequency components and/or loudness (hereinafter "feature information") of the input voice data; a storage unit 204 stores the analyzed feature information in association with the elapsed time from the start of input reception; an extraction unit 205, when a specified comparison condition is met after input reception has started, compares the change over elapsed time in the feature information of scoring data with the change over elapsed time in the stored feature information, and extracts the sections of the latter in which its deviation from the former exceeds a specified range; and an output unit 206 outputs, from the stored feature information, information on the extracted sections and their deviations.
COPYRIGHT: (C)2005, JPO & NCIPI

Description

The present invention relates to a karaoke apparatus, a karaoke method, and a program for realizing these on a computer.

Various karaoke apparatuses have been provided in the past. A karaoke apparatus holds, prepared in advance, music data in formats such as MIDI (Musical Instrument Digital Interface) and audio waveform data in formats such as PCM (Pulse Code Modulation). It outputs this data as accompaniment and displays lyrics on the screen of a television or similar device in time with the audio output. It also accepts the singer's voice waveform data from a microphone connected to the apparatus, mixes it with the accompaniment data as appropriate, and outputs the result from a speaker. The accompaniment data may be recorded on an information recording medium such as a CD-ROM (Compact Disk Read Only Memory) or DVD-ROM (Digital Versatile Disk ROM), or it may be obtainable from another storage device via a computer communication network.

Furthermore, with advances in computer technology, an environment is emerging in which karaoke can be enjoyed using the game devices found in ordinary homes. In karaoke on such a game device, the video output is connected to a television or similar device and used to display lyrics and the like. The microphone is connected directly to the game device, and the sound input from the microphone is mixed with accompaniment data stored on a DVD-ROM for the game device or downloaded via a computer communication network, then output from speakers via the audio input of the television or a stereo system.

Conventionally, a karaoke apparatus has generally scored a performance by applying frequency analysis and volume analysis to the voice data that captures the singer's singing through the microphone, and comparing the resulting changes over time, in real time, with the changes over time in frequency components and volume prepared in advance for scoring.

However, when scoring is performed in real time, it is difficult to detect timing deviations such as starting to sing too early or too late. At the same time, whether each measure of a song is sung at the right time has a large bearing on how well the song is sung, so there is strong demand to use this in scoring.

Accordingly, there is a strong demand for a technique that properly detects timing deviations in karaoke singing and presents them to the singer.
An object of the present invention is to provide a karaoke apparatus, a karaoke method, and a program realizing these on a computer, which are suitable for properly detecting timing deviations in singing and presenting them to the singer.

To achieve the above object, the following invention is disclosed in accordance with the principle of the present invention.
A karaoke apparatus according to a first aspect of the present invention includes an input reception unit, an analysis unit, a storage unit, an extraction unit, and an output unit, and is configured as follows.

That is, the input reception unit receives input of voice data.
Typically, the singer's singing is input as voice data from a microphone connected to the karaoke apparatus.

Meanwhile, the analysis unit analyzes feature information of the received voice data. Typically, the frequency components of the voice data, its volume, or both can be adopted as the feature information.
Frequency components are typically analyzed using a fast Fourier transform, or using multiple bandpass filters, one prepared for each pitch (or for frequency bands that subdivide the pitches further). The component with the highest intensity is then taken as the pitch of the singer's singing.
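The per-pitch filter bank described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it swaps in the Goertzel recurrence as a stand-in for one bandpass filter per pitch, and the function names, the MIDI note numbering, and the default note range are all assumptions made for the example.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency component via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def detect_pitch(samples, sample_rate, notes=range(48, 85)):
    """Return the candidate note whose band holds the most power.

    The candidate range (hypothetical default: C3 to C6) plays the role of
    the bank of per-pitch filters; the strongest band is taken as the pitch.
    """
    return max(notes,
               key=lambda n: goertzel_power(samples, sample_rate, midi_to_hz(n)))
```

For a frame containing a pure 440 Hz tone, `detect_pitch` picks note 69. A real voice has strong harmonics, so a practical detector would need more care than taking the single strongest band.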

Volume analysis generally uses the amplitude of the voice data, or the square of the amplitude, as it is, but the intensity of the component at the singer's sung pitch may also be adopted as the volume.
Note that detecting timing deviations, such as in the start of a phrase, often does not require both pitch and volume. Either one alone may therefore be adopted as the feature information for detecting timing deviations, or both may be adopted.
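As a small illustration of the volume measures mentioned above, the sketch below computes a per-frame volume from the squared amplitude and, as an alternative, from the peak amplitude. Averaging over a frame is an assumption of this example; the text only says that the amplitude or its square is used.

```python
def frame_volume(samples):
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)

def frame_peak(samples):
    """Peak absolute amplitude of one frame (0.0 for an empty frame)."""
    return max((abs(x) for x in samples), default=0.0)
```

Either value can be stored per frame; for onset timing alone, a simple threshold on one of them may already be enough to tell singing from silence.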

Further, the storage unit stores the analyzed feature information in association with the elapsed time from the start of input reception.
Typically, the analyzed feature information is stored in order, at a predetermined time interval (the sampling interval), in an array prepared in RAM (Random Access Memory) or in an external storage device such as a hard disk. The sampling interval can be chosen as appropriate by weighing the accuracy required for detecting timing deviations against the capacity of the storage device holding the array.
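One way to picture the array keyed by elapsed time is the sketch below. The class name, its fields, and the use of integer milliseconds are assumptions for illustration; the array index stands in for the elapsed time, so no timestamp needs to be stored per entry.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Feature frames stored in order; index * interval_ms is the elapsed time."""
    interval_ms: int                      # sampling interval (hypothetical unit: ms)
    pitches: list = field(default_factory=list)
    volumes: list = field(default_factory=list)

    def append(self, pitch, volume):
        """Store the features analyzed for the next sampling interval."""
        self.pitches.append(pitch)
        self.volumes.append(volume)

    def time_of(self, index):
        """Elapsed time in ms at which frame `index` was captured."""
        return index * self.interval_ms

    def index_at(self, elapsed_ms):
        """Index of the frame that covers the given elapsed time."""
        return elapsed_ms // self.interval_ms

def frames_needed(duration_ms, interval_ms):
    """Array length needed for a song of the given duration."""
    return -(-duration_ms // interval_ms)  # ceiling division
```

The trade-off in the text shows up directly: a 4-minute song needs 4800 entries at a 50 ms interval and 24000 at 10 ms, while the finer interval resolves smaller timing deviations.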

Then, when a predetermined comparison condition is satisfied after input reception has started, the extraction unit compares the change over elapsed time in the feature information of scoring data prepared in advance with the change over elapsed time in the stored feature information of the voice data, and extracts the sections of the latter in which its deviation from the former exceeds a predetermined range.

Here, the "scoring data" is typically performance data for playing at the pitch and volume that should serve as the model for singing. If the scoring data were played as it is, the audio output would sound as though an instrument were being played at the pitch and volume the singer should follow.
If the difference between the scoring data and the voice data is large, the singer is singing poorly; if the degree of agreement is high, the singer is singing well.

As described above, until the predetermined comparison condition is satisfied, the feature information obtained by analyzing the voice data input from the microphone is merely stored in the storage unit, and no detection of timing deviations is performed. Only once the condition is satisfied are the size of the timing deviation and the sections in which it occurs extracted. Typically, therefore, timing deviations are detected after the singer has finished singing a song.
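The deferred comparison can be sketched as below. The patent text does not specify how the deviation is measured, so this example assumes one simple approach: for each section of the scoring data, try a range of frame lags and keep the lag at which the sung pitch track matches the reference best. The function names, the per-frame pitch lists (with `None` for silence), and the section format are all hypothetical.

```python
def best_lag(reference, sung, start, end, max_lag):
    """Lag (in frames) at which sung[start+lag:end+lag] best matches reference[start:end].

    A positive lag means the singer was late. Ties prefer the smallest |lag|.
    """
    def mismatch(lag):
        errors = 0
        for i in range(start, end):
            j = i + lag
            sung_value = sung[j] if 0 <= j < len(sung) else None
            if sung_value != reference[i]:
                errors += 1
        return errors

    lags = sorted(range(-max_lag, max_lag + 1), key=abs)  # 0, -1, 1, -2, 2, ...
    return min(lags, key=mismatch)

def extract_deviating_sections(reference, sung, sections, max_lag, tolerance):
    """Return (start, end, lag) for each section whose |lag| exceeds the tolerance.

    `sections` lists (start, end) frame ranges defined on the scoring data.
    """
    result = []
    for start, end in sections:
        lag = best_lag(reference, sung, start, end, max_lag)
        if abs(lag) > tolerance:
            result.append((start, end, lag))
    return result
```

Because the whole sung track is already stored, the search can look both earlier and later than the reference position, which is what a frame-by-frame real-time comparison cannot do.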

Note that the extracted "section" is defined with respect to the scoring data. As described later, when the timing deviation is presented to the singer, it is presented using the scoring data rather than the feature information stored in RAM or the like.
The "predetermined comparison condition" is typically whether the singer has finished singing a song, but it may instead be satisfied when the singer instructs the karaoke apparatus, via a controller or the like, to perform the comparison, or it may be set to be satisfied when a predetermined time has elapsed since singing began.

Meanwhile, the output unit outputs, from the stored feature information, information on the extracted sections and their deviations.
Various output methods are conceivable, such as audio output and on-screen display. Besides presenting the size of the deviation as it is or in an easily understood form, the apparatus may score the singing according to the size of the deviations and the number and duration of the sections in which they occur, and output the scoring result.
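The text names three inputs to a timing score (deviation size, number of deviating sections, and their duration) but gives no formula. The sketch below is one hypothetical way to combine them; every weight, and the 100-point scale, is an assumption chosen only for illustration.

```python
def timing_score(deviations, interval_ms,
                 per_lag_second=10.0, per_section=5.0, per_duration_second=2.0):
    """Score out of 100 from a list of (start, end, lag) frame tuples.

    Each deviating section costs a fixed amount, plus amounts proportional
    to the size of its lag and to its own duration. All weights are made up.
    """
    penalty = 0.0
    for start, end, lag in deviations:
        lag_seconds = abs(lag) * interval_ms / 1000
        duration_seconds = (end - start) * interval_ms / 1000
        penalty += (per_section
                    + per_lag_second * lag_seconds
                    + per_duration_second * duration_seconds)
    return max(0.0, 100.0 - penalty)
```

With a 50 ms interval, a single 4-frame section sung 2 frames late costs 5 + 1 + 0.4 points under these weights.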

According to the present invention, when the timing of the singing deviates in karaoke, the deviation can be detected and reported to the singer.

In the karaoke apparatus of the present invention, the output unit can be configured to output, as audio, the scoring data associated with the extracted section.

That is, for a section in which a timing deviation has arisen between the scoring data and the voice data, the scoring data, which has the correct pitch, dynamics, and tempo, is played back as audio.

According to the present invention, the user learns which sections of the song were sung with a timing deviation from audio played at the correct pitch, dynamics, and tempo. The user can therefore easily identify the problem parts of his or her own singing, which helps in practicing.

The karaoke apparatus of the present invention can further include a playback unit and be configured as follows.

That is, the playback unit plays back the received voice data and accompaniment data.
In a typical karaoke apparatus, the user does not sing a cappella (unaccompanied); instead, the accompaniment data is output as audio from a speaker or the like, and the user's singing is mixed into that audio output. The playback unit performs this mixed playback.

Meanwhile, the input reception unit starts receiving the input when playback of the accompaniment data starts.
That is, the present invention uses the start of accompaniment playback as the trigger for starting to process the voice input of the user's singing. Consequently, any voice input before the accompaniment is played, even if present, is regarded as not being voice data of the singing and is not processed.

Further, the predetermined comparison condition is satisfied when playback of the accompaniment data ends.
Adopting the end of the accompaniment as the "predetermined comparison condition" confirms that the user has finished singing the song.
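The way accompaniment playback gates both input reception and the comparison can be sketched as a small state holder. The class and method names are hypothetical; the sketch only shows that frames arriving before the accompaniment starts are discarded and that the comparison becomes possible once it ends.

```python
class KaraokeSession:
    """Tracks when voice frames are accepted and when comparison may run."""

    def __init__(self):
        self.receiving = False          # True while the accompaniment plays
        self.comparison_ready = False   # the "predetermined comparison condition"
        self.frames = []

    def on_accompaniment_start(self):
        self.receiving = True
        self.comparison_ready = False
        self.frames = []

    def on_voice_frame(self, features):
        # Input before the accompaniment starts is not treated as singing.
        if self.receiving:
            self.frames.append(features)

    def on_accompaniment_end(self):
        self.receiving = False
        self.comparison_ready = True
```

Since the first stored frame coincides with the start of the accompaniment, the frame index measures elapsed time on the same clock as the scoring data.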

The output unit then outputs, as audio, the accompaniment data associated with the extracted section together with the scoring data associated with that section.
In the invention described above, playing back the scoring data for the section in which a deviation occurred told the user which section of the song was sung off time; in the present invention, the accompaniment data is played back along with that scoring data.

According to the present invention, the user can sing while listening to the karaoke accompaniment, and if a timing deviation occurs at the start of a phrase, the accompaniment data and scoring data for that section are played back, so the user can tell even more easily which section of the song was sung off time.

In the karaoke apparatus of the present invention, the output unit can also be configured to output the scoring data as audio shifted in time, relative to the audio output of the accompaniment data, by the deviation for that section.
For example, if the singing in a certain section started N seconds late, the output unit reports this by starting playback of the scoring data N seconds after the start of the accompaniment data for that section.
According to the present invention, shifting the playback of the scoring data in time relative to the accompaniment data lets the user easily grasp how large the timing deviation in the start of a phrase or the like was.
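The shifted playback can be expressed as two start times for one section. The sketch below covers the late case given in the text and, as an assumption of this example only, handles an early singer by delaying the accompaniment so that neither start time is negative. The function name and the millisecond unit are hypothetical.

```python
def schedule_section(lag_ms):
    """Start offsets (ms) for replaying one section.

    lag_ms > 0: the singer was late, so the scoring data starts after the
    accompaniment. lag_ms < 0: the singer was early, so the scoring data
    starts first (this handling is an assumption, not stated in the text).
    """
    lead = max(0, -lag_ms)
    return {"accompaniment": lead, "scoring": lead + lag_ms}
```

For a singer who came in 300 ms late, this yields the accompaniment at 0 ms and the scoring data at 300 ms, so the gap heard between them equals the detected deviation.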

In the karaoke apparatus of the present invention, the output unit can also be configured to display on screen characters or figures associated with the section of the scoring data, in synchronization with the audio output of that scoring data.

In karaoke generally, characters representing the lyrics are displayed on screen in time with playback of the accompaniment data. In one form of the present invention, when the output unit plays back the scoring data, the lyrics are displayed in synchronization with the progress of that playback. When a figure is displayed in synchronization instead, one possibility is to synchronize the changing length of a figure such as a bar graph, which indicates how much of the scoring data has been played, with the audio output of the scoring data.

According to the present invention, for a part where a timing deviation occurred in the start of a phrase or the like, the lyrics or other items associated with the section of the scoring data are displayed progressively on screen in time with playback of the scoring data, so the user can easily follow the passage of time in the scoring data.

In the karaoke apparatus of the present invention, the output unit can also be configured to additionally display on screen the characters or figures associated with the section of the scoring data, in synchronization with the output that would result if the scoring data were output as audio without being shifted in time by the deviation for that section.

In the invention described above, characters such as lyrics, or figures, were displayed in synchronization with playback of the scoring data. In the present invention, characters such as lyrics, or figures, are additionally displayed in synchronization with "the output that would result if the scoring data were output as audio without being shifted in time by the deviation for that section" (in other words, in synchronization with the output of the accompaniment data). For example, when lyrics are displayed and the detected deviation is a delay of N seconds, the lyrics are displayed as they would be for correct singing (matching the accompaniment), and the same lyrics are displayed separately N seconds later. When progress is shown with a bar graph or the like, a second bar graph is displayed N seconds behind the bar graph showing the progress of correct singing.
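The two offset lyric displays can be sketched as one merged list of display events. The cue format `(time_ms, text)`, the track labels, and the function name are assumptions for illustration, and the example assumes a non-negative lag (a late singer), which is the case the text describes.

```python
def dual_lyric_cues(cues, lag_ms):
    """Merge on-time and shifted lyric cues into one time-ordered event list.

    cues: list of (time_ms, text) aligned with the accompaniment.
    Returns (time_ms, track, text) tuples, where track "model" follows the
    accompaniment and track "sung" is shifted by the detected deviation.
    """
    events = [(t, "model", text) for t, text in cues]
    events += [(t + lag_ms, "sung", text) for t, text in cues]
    return sorted(events)
```

Rendering the two tracks on separate rows lets the horizontal or temporal gap between identical words show the size of the deviation.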

In the present invention, for a part where a timing deviation occurred in the start of a phrase or the like, two displays are presented offset from each other by that deviation, so both the part where the deviation occurred and its size can be presented to the user in an easily understood way.

In the karaoke apparatus of the present invention, the output unit can also be configured to shift the scoring data in time, relative to the audio output of the accompaniment data, by the deviation for that section and output the shifted audio through a predetermined first output system, while outputting the scoring data unshifted relative to the accompaniment data through a predetermined second output system.

For example, if the karaoke apparatus of the present invention is capable of stereo playback, one of the right and left channels can be assigned to the first output system and the other to the second. The channel of the first output system carries the scoring data timed as the user actually sang it, and the channel of the second output system carries the scoring data timed as it would be for correct singing. Since the accompaniment data is also output as audio, the scoring data on the second output system's channel is synchronized with the accompaniment, as a model for singing should be.
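The two-channel output can be sketched on plain sample lists. In this hypothetical example the left channel is taken as the first output system (scoring data shifted by the detected lag) and the right channel as the second (unshifted); the assignment of left and right, the function name, and the truncation of samples shifted past the end are assumptions. Mixing in the accompaniment is omitted.

```python
def stereo_comparison(scoring, lag):
    """Return (left, right) sample pairs for one section.

    left:  scoring samples shifted by `lag` samples (timed as the user sang)
    right: scoring samples unshifted (timed as correct singing)
    Samples shifted outside the section are dropped; gaps are silence.
    """
    n = len(scoring)
    left = [0.0] * n
    for i, x in enumerate(scoring):
        j = i + lag
        if 0 <= j < n:
            left[j] = x
    return list(zip(left, scoring))
```

A practical version would pad the buffer by the lag instead of truncating, so that the end of a late phrase is still heard on the shifted channel.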

In the present invention, particularly when listening through headphones, the user can perceive timing deviations in the start of a phrase or the like as the time difference between the left and right sounds.

The present invention can also be configured as follows.
That is, the analysis unit analyzes both the frequency components and the volume as the feature information of the voice data.
In the invention described above, provided that timing deviations could be detected, the feature information could be the frequency components alone, the volume alone, or both together; in the present invention, both together are used as the feature information.

Meanwhile, instead of "outputting the scoring data as audio", the output unit "outputs as audio the section of the change over elapsed time in the stored feature information that corresponds to the extracted section of the scoring data".
In the invention described above, the scoring data was played back to tell the user by sound which part was off time. In this embodiment, of the changes in frequency components and volume of the voice data input from the user's voice via a microphone or the like, the part in which the timing deviation occurred is output as audio.

According to the present invention, rather than recording the user's singing and playing it back as it is, only its frequency components (pitch) and volume are analyzed in advance, and for sections in which the singing was off time, audio is played at that pitch and volume to reproduce the user's singing to some degree. This makes it possible to present the problem parts of the singing to the user in an easily understood way.
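Playing back the stored pitch and volume instead of a recording can be sketched as a simple sine resynthesis. The text does not say what kind of sound is generated, so the sine oscillator, the function name, the MIDI note numbering, and the use of the stored volume directly as an amplitude are all assumptions of this example.

```python
import math

def resynthesize(frames, frame_len, sample_rate):
    """Render stored (note, amplitude) frames as a sine tone.

    frames: list of (midi_note or None, amplitude); None means silence.
    Phase is carried across frames so that pitch changes do not click.
    """
    out = []
    phase = 0.0
    for note, amplitude in frames:
        if note is None:
            out.extend([0.0] * frame_len)
            continue
        freq = 440.0 * 2.0 ** ((note - 69) / 12.0)
        step = 2.0 * math.pi * freq / sample_rate
        for _ in range(frame_len):
            out.append(amplitude * math.sin(phase))
            phase += step
    return out
```

Only two numbers per frame are needed, which is far less storage than keeping the raw recording, at the cost of losing the words and timbre of the voice.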

A karaoke method according to another aspect of the present invention is used in a karaoke apparatus that includes an input reception unit, an analysis unit, a storage unit, an extraction unit, and an output unit. The method includes an input reception step, an analysis step, a storage step, an extraction step, and an output step, and is configured as follows.

That is, in the input reception step, the input reception unit receives input of voice data.
Meanwhile, in the analysis step, the analysis unit analyzes feature information of the received voice data.
Further, in the storage step, the analyzed feature information is stored in the storage unit in association with the elapsed time from the start of input reception.

Then, in the extraction step, when a predetermined comparison condition is satisfied after input reception has started, the extraction unit compares the change over elapsed time in the stored feature information of the voice data with the change over elapsed time in scoring data prepared in advance, and extracts the sections in which the deviation between the two exceeds a predetermined threshold.
Meanwhile, in the output step, the output unit outputs, from the stored feature information, information on the extracted sections.

A program according to another aspect of the present invention is configured to cause a computer to function as each unit of the above karaoke apparatus, or to cause a computer to execute each step of the above karaoke method.
The program of the present invention can be recorded on a computer-readable information recording medium such as a compact disk, flexible disk, hard disk, magneto-optical disk, digital video disk, magnetic tape, or semiconductor memory. The program can be distributed and sold via a computer communication network independently of the computer on which it is executed. The information recording medium can likewise be distributed and sold independently of that computer.

According to the present invention, it is possible to provide a karaoke apparatus, a karaoke method, and a program realizing these on a computer, which are suitable for properly detecting timing deviations in singing and presenting them to the singer.

Embodiments of the present invention are described below. For ease of understanding, the embodiment described applies the present invention to a game device, but the invention can be applied in the same way to information processing devices such as various computers, PDAs, and mobile phones. That is, the embodiments described below are for explanation and do not limit the scope of the present invention. Those skilled in the art can adopt embodiments in which any or all of these elements are replaced with equivalents, and such embodiments are also included in the scope of the present invention.

FIG. 1 is a schematic diagram showing the general configuration of a typical game device on which a karaoke apparatus according to one embodiment of the present invention is realized. The description below refers to this figure.

The game apparatus 100 includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an interface 104, a controller 105, an external memory 106, an image processing unit 107, a DVD (Digital Versatile Disk)-ROM drive 108, and a NIC (Network Interface Card) 109.

When a DVD-ROM storing the game program and data is loaded into the DVD-ROM drive 108 and the game apparatus 100 is powered on, the program is executed and the karaoke apparatus of this embodiment is realized.

The CPU 101 controls the overall operation of the game apparatus 100 and is connected to each component to exchange control signals and data. Using an ALU (Arithmetic Logic Unit) (not shown), the CPU 101 can operate on registers (not shown), which are storage areas that can be accessed at high speed, to perform arithmetic operations such as addition, subtraction, multiplication, and division; logical operations such as logical OR, logical AND, and logical NOT; and bit operations such as bitwise OR, bitwise AND, bit inversion, bit shift, and bit rotation. In addition, some CPUs 101 are themselves designed, and others are equipped with a coprocessor, to perform at high speed the saturating arithmetic (addition, subtraction, multiplication, division, and so on) used for multimedia processing, as well as trigonometric functions and vector operations.

The ROM 102 stores an IPL (Initial Program Loader) that is executed immediately after power-on. When the IPL runs, the program recorded on the DVD-ROM is read into the RAM 103 and the CPU 101 begins executing it. The ROM 102 also stores the operating system program and various data needed to control the overall operation of the game apparatus 100.

The RAM 103 temporarily stores data and programs. It holds the program and data read from the DVD-ROM, as well as other data needed for game progress and chat communication. The CPU 101 also sets up a variable area in the RAM 103 and either applies the ALU directly to the value stored in a variable, or first loads a value stored in the RAM 103 into a register, operates on the register, and writes the result back to memory.

インターフェース１０４を介して接続されたコントローラ１０５は、ユーザがカラオケなどのゲーム実行の際に行う操作入力を受け付ける。 The controller 105 connected via the interface 104 receives an operation input performed by the user when executing a game such as karaoke.

インターフェース１０４を介して着脱自在に接続された外部メモリ１０６には、カラオケのプレイ状況（過去に歌った楽曲や採点結果等）を示すデータ、ゲームの進行状態を示すデータ、チャット通信のログ（記録）のデータなどが書き換え可能に記憶される。ユーザは、コントローラ１０５を介して指示入力を行うことにより、これらのデータを適宜外部メモリ１０６に記録することができる。 The external memory 106, detachably connected via the interface 104, rewritably stores data indicating the karaoke play status (songs sung in the past, scoring results, etc.), data indicating the progress of the game, chat communication log (record) data, and the like. The user can record these data in the external memory 106 as appropriate by inputting an instruction via the controller 105.

ＤＶＤ−ＲＯＭドライブ１０８に装着されるＤＶＤ−ＲＯＭには、ゲームを実現するためのプログラムとゲームに付随する画像データや音声データが記録される。ＣＰＵ１０１の制御によって、ＤＶＤ−ＲＯＭドライブ１０８は、これに装着されたＤＶＤ−ＲＯＭに対する読み出し処理を行って、必要なプログラムやデータを読み出し、これらはＲＡＭ１０３等に一時的に記憶される。 A DVD-ROM mounted on the DVD-ROM drive 108 stores a program for realizing the game and image data and audio data associated with the game. Under the control of the CPU 101, the DVD-ROM drive 108 performs a reading process on the DVD-ROM loaded therein, reads out necessary programs and data, and these are temporarily stored in the RAM 103 or the like.

画像処理部１０７は、ＤＶＤ−ＲＯＭから読み出されたデータをＣＰＵ１０１や画像処理部１０７が備える画像演算プロセッサ（図示せず）によって加工処理した後、これを画像処理部１０７が備えるフレームメモリ（図示せず）に記録する。フレームメモリに記録された画像情報は、所定の同期タイミングでビデオ信号に変換され画像処理部１０７に接続されるモニタ（図示せず）へ出力される。これにより、各種の画像表示が可能となる。 The image processing unit 107 processes the data read from the DVD-ROM with the CPU 101 or with an image arithmetic processor (not shown) included in the image processing unit 107, and then records the result in a frame memory (not shown) included in the image processing unit 107. The image information recorded in the frame memory is converted into a video signal at a predetermined synchronization timing and output to a monitor (not shown) connected to the image processing unit 107. This enables various kinds of image display.

画像演算プロセッサは、２次元の画像の重ね合わせ演算やαブレンディング等の透過演算、各種の飽和演算を高速に実行できる。 The image arithmetic processor can execute two-dimensional image overlay operations, transparency operations such as α blending, and various saturating operations at high speed.
また、仮想３次元空間に配置され、各種のテクスチャ情報が付加されたポリゴン情報を、Ｚバッファ法によりレンダリングして、所定の視点位置から仮想３次元空間に配置されたポリゴンを俯瞰したレンダリング画像を得る演算の高速実行も可能である。 It can also execute at high speed the operation of rendering, by the Z buffer method, polygon information that is arranged in a virtual three-dimensional space and has various texture information added, to obtain a rendered image that looks down on the polygons from a predetermined viewpoint position.

さらに、ＣＰＵ１０１と画像演算プロセッサが協調動作することにより、文字の形状を定義するフォント情報にしたがって、文字列を２次元画像としてフレームメモリへ描画したり、各ポリゴン表面へ描画することが可能である。フォント情報は、ＲＯＭ１０２に記録されているが、ＤＶＤ−ＲＯＭに記録された専用のフォント情報を利用することも可能である。 Further, by operating in cooperation, the CPU 101 and the image arithmetic processor can draw a character string as a two-dimensional image in the frame memory, or on the surface of each polygon, according to font information that defines the character shapes. The font information is recorded in the ROM 102, but dedicated font information recorded on the DVD-ROM can also be used.

ＮＩＣ１０９は、ゲーム装置１００をインターネット等のコンピュータ通信網（図示せず）に接続するためのものであり、ＬＡＮ（Local Area Network）を構成する際に用いられる１０ＢＡＳＥ−Ｔ／１００ＢＡＳＥ−Ｔ規格にしたがうものや、電話回線を用いてインターネットに接続するためのアナログモデム、ＩＳＤＮ（Integrated Services Digital Network）モデム、ＡＤＳＬ（Asymmetric Digital Subscriber Line）モデム、ケーブルテレビジョン回線を用いてインターネットに接続するためのケーブルモデム等と、これらとＣＰＵ１０１との仲立ちを行うインターフェース（図示せず）により構成される。 The NIC 109 connects the game apparatus 100 to a computer communication network (not shown) such as the Internet. It consists of a device conforming to the 10BASE-T / 100BASE-T standard used when configuring a LAN (Local Area Network), or an analog modem, ISDN (Integrated Services Digital Network) modem, or ADSL (Asymmetric Digital Subscriber Line) modem for connecting to the Internet over a telephone line, or a cable modem for connecting to the Internet over a cable television line, together with an interface (not shown) that mediates between these devices and the CPU 101.

音声処理部１１０は、ＤＶＤ−ＲＯＭから読み出した音声データをアナログ音声信号に変換し、これに接続されたスピーカ（図示せず）から出力させる。また、ＣＰＵ１０１の制御の下、ゲームの進行の中で発生させるべき効果音や楽曲データを生成し、これに対応した音声をスピーカから出力させる。 The audio processing unit 110 converts audio data read from the DVD-ROM into an analog audio signal and outputs the analog audio signal from a speaker (not shown) connected thereto. Further, under the control of the CPU 101, sound effects and music data to be generated during the progress of the game are generated, and sound corresponding to this is output from the speaker.

音声処理部１１０は、ＤＶＤ−ＲＯＭに記録された音声データがＭＩＤＩデータである場合には、これが有する音源データを参照して、ＭＩＤＩデータをＰＣＭデータに変換する。また、ADPCM形式やOgg Vorbis形式等の圧縮済音声データである場合には、これを展開してＰＣＭデータに変換する。ＰＣＭデータは、そのサンプリング周波数に応じたタイミングでＤ／Ａ（Digital/Analog）変換を行って、スピーカに出力することにより、音声出力が可能となる。 When the audio data recorded on the DVD-ROM is MIDI data, the audio processing unit 110 refers to the sound source data included in the audio data and converts the MIDI data into PCM data. If the compressed audio data is in ADPCM format or Ogg Vorbis format, it is expanded and converted to PCM data. The PCM data can be output by performing D / A (Digital / Analog) conversion at a timing corresponding to the sampling frequency and outputting it to a speaker.

さらに、ゲーム装置１００には、インターフェース１０４を介してマイク１１１を接続することができる。この場合、マイク１１１からのアナログ信号に対しては、適当なサンプリング周波数でＡ／Ｄ変換を行い、ＰＣＭ形式のディジタル信号として、音声処理部１１０でのミキシング等の処理ができるようにする。 Furthermore, a microphone 111 can be connected to the game apparatus 100 via the interface 104. In this case, the analog signal from the microphone 111 is subjected to A / D conversion at an appropriate sampling frequency so that processing such as mixing in the sound processing unit 110 can be performed as a PCM format digital signal.

ゲーム装置１００をカラオケ装置として利用する場合には、ＤＶＤ−ＲＯＭから読み出した音声データ、もしくは、ＮＩＣ１０９を介してコンピュータ通信網から取得した音声データを伴奏データとし、マイクから入力された音声データを歌唱データとして、伴奏データと歌唱データを音声処理部１１０がミキシングし、スピーカから出力する。また、スピーカにかえて、ヘッドホン（図示せず）やイヤフォン（図示せず）を用いて、音声を出力させることもできる。 When the game apparatus 100 is used as a karaoke apparatus, audio data read from a DVD-ROM or audio data acquired from a computer communication network via the NIC 109 is used as accompaniment data, and audio data input from a microphone is used. The audio processing unit 110 mixes accompaniment data and singing data as singing data, and outputs them from the speaker. Moreover, it is possible to output sound using headphones (not shown) or earphones (not shown) instead of the speakers.

このほか、ゲーム装置１００は、ハードディスク等の大容量外部記憶装置を用いて、ＲＯＭ１０２、ＲＡＭ１０３、外部メモリ１０６、ＤＶＤ−ＲＯＭドライブ１０８に装着されるＤＶＤ−ＲＯＭ等と同じ機能を果たすように構成してもよい。 In addition, the game apparatus 100 uses a large-capacity external storage device such as a hard disk so as to perform the same functions as the ROM 102, the RAM 103, the external memory 106, the DVD-ROM attached to the DVD-ROM drive 108, and the like. It may be configured.

（カラオケ装置の構成） (Configuration of the karaoke apparatus)
図２は、上記ゲーム装置１００等の上に実現されるカラオケ装置の概要構成を示す説明図である。図３は、当該カラオケ装置にて実行されるカラオケ方法の制御の流れを示すフローチャートである。以下、本図を参照して説明する。 FIG. 2 is an explanatory diagram showing a schematic configuration of a karaoke apparatus realized on the game apparatus 100 or the like. FIG. 3 is a flowchart showing the flow of control of the karaoke method executed by the karaoke apparatus. The following description refers to these figures.

本実施形態に係るカラオケ装置２０１は、入力受付部２０２、分析部２０３、記憶部２０４、抽出部２０５、出力部２０６、および、表示部２０７、操作部２０８を備え、ゲーム装置１００上に実現される。 The karaoke apparatus 201 according to the present embodiment includes an input reception unit 202, an analysis unit 203, a storage unit 204, an extraction unit 205, an output unit 206, a display unit 207, and an operation unit 208, and is realized on the game apparatus 100.

まず、カラオケ装置２０１は、表示部２０７にメッセージを表示して、カラオケをしたい楽曲をユーザに選択させる（ステップＳ３０１）。ここで、表示部２０７としては、画像処理部１０７を用い、メッセージは、テレビジョン装置などのモニタに表示される。 First, the karaoke apparatus 201 displays a message on the display unit 207, and allows the user to select a song for which karaoke is desired (step S301). Here, the image processing unit 107 is used as the display unit 207, and the message is displayed on a monitor such as a television device.

そして、ユーザが、コントローラ１０５などにより構成される操作部２０８を用いて、所望の楽曲を選択すると、選択された楽曲の伴奏データと採点データを、コンピュータ通信網を介してＮＩＣ１０９経由で接続されたカラオケサーバ装置からダウンロードする（ステップＳ３０２）。 When the user selects a desired piece of music using the operation unit 208 configured by the controller 105 or the like, the accompaniment data and scoring data of the selected piece of music are connected via the NIC 109 via the computer communication network. Download from the karaoke server device (step S302).

本実施形態では、伴奏データと採点データは、ＭＩＤＩデータとして提供される。ＭＩＤＩデータは、複数の音源に対応付けられる複数のチャンネルからなる。各チャンネルは、たとえば、ピアノ、ギター、ベース、ドラムなどの楽器に割り当てられており、ゲーム装置１００の音声処理部１１０にあらかじめ用意された音源データに対応付けられている。 In this embodiment, accompaniment data and scoring data are provided as MIDI data. MIDI data is composed of a plurality of channels associated with a plurality of sound sources. Each channel is assigned to, for example, a musical instrument such as a piano, guitar, bass, and drum, and is associated with sound source data prepared in advance in the sound processing unit 110 of the game apparatus 100.

ＭＩＤＩデータの指令は、先の指令を出してからどれだけ時間が経過してからこの指令を実行するかを示す経過時間、どのチャンネルに対する指令かを表す番号（チャンネル指定がない場合もある。）、音高・音程、音量、音の演出効果を示す識別子ならびにパラメータを含むものが一般的であるが、ベンダーが自由に利用できるように予約された指令（「システム・エクスクルーシブ」と呼ばれる。）もある。 A MIDI data command generally contains an elapsed time indicating how long after the previous command this command is to be executed, a number indicating which channel the command is for (some commands have no channel designation), a pitch, a volume, and an identifier and parameters indicating a sound effect. There are also commands reserved for vendors to use freely (called "system exclusive").

本実施形態では、伴奏データは、上記のような伴奏に用いられる音源（楽器）に割り当てられたチャンネルに対応付けられるＭＩＤＩデータによって表現される。一方、採点データは、通常のカラオケの伴奏には用いない音源に割り当てられたチャンネルに対応付けられる。 In the present embodiment, accompaniment data is represented by MIDI data associated with a channel assigned to a sound source (musical instrument) used for accompaniment as described above. On the other hand, the scoring data is associated with a channel assigned to a sound source that is not used for normal karaoke accompaniment.

したがって、採点データを、何らかの音源に割り当てて再生すれば、カラオケを歌う際の見本となる音程・音量で（カラオケの歌詞を無視してハミングしたかのように）再生されることになる。 Therefore, if the scoring data is assigned to some sound source and played back, it will be played back at the pitch and volume that serve as a sample for singing karaoke (as if humming ignoring the lyrics of the karaoke).

また、後述するように、本実施形態では、歌唱の時間的なずれを検知するが、その検知を行う単位を１小節ごとに行う。すなわち、「この小節は、○○秒歌い始めが遅れている」「この小節は、○○秒歌い始めが進んでいる」のような情報を検知することによって、ユーザの歌唱の実力向上に役立てようとしている。 As will be described later, this embodiment detects the temporal deviation of the singing, and it does so one bar at a time. That is, by detecting information such as "in this bar, the singing started XX seconds late" or "in this bar, the singing started XX seconds early", the embodiment aims to help the user improve their singing ability.

そこで、小節と小節の区切は、上記のシステム・エクスクルーシブに適当なベンダ拡張命令を定義して、これを用いることによって表現する。 Therefore, a bar-bar delimiter is expressed by defining an appropriate vendor extension instruction for the above system exclusive.

このように、ＭＩＤＩデータの「経過時間」の情報を逐次加算していけば、演奏開始からの経過時間を知ることができる。また、「システム・エクスクルーシブ」によって、時間のずれを検出する区間の区切を表現する。さらに、採点用のデータを扱うチャンネルを伴奏用のデータとは別に設けることによって、カラオケをプレイするとき（ユーザが歌うとき）には、採点データのチャンネルを再生しないこととし、後述するように時間のずれを再現する際に採点データのチャンネルを再現することができる。 Thus, by successively adding up the "elapsed time" values in the MIDI data, the elapsed time from the start of the performance can be obtained. The "system exclusive" commands mark the boundaries of the sections in which a time deviation is detected. Further, because the scoring data is carried on a channel separate from the accompaniment data, the scoring data channel is not played while karaoke is being played (while the user sings), and it can be played back when the time deviation is reproduced, as described later.
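
The running-sum idea in this paragraph can be sketched as follows. This is a minimal illustration, not the format used by the embodiment: the event tuples, the use of seconds for the delta times, and the `"bar"` marker standing in for the vendor-defined system exclusive command are all assumptions made for the example.

```python
from itertools import accumulate

# Hypothetical event records: (delta_time_in_seconds, kind, payload).
# "bar" stands in for the vendor-defined system exclusive command
# that marks a bar boundary; real MIDI files use ticks, not seconds.
events = [
    (0.0, "bar", None),
    (0.0, "note", 60),
    (1.0, "note", 62),
    (1.0, "bar", None),
    (0.5, "note", 64),
    (1.5, "bar", None),
]

def absolute_times(events):
    """Running sum of the per-command delta times = time since the start."""
    return list(accumulate(delta for delta, _kind, _payload in events))

def bar_starts(events):
    """Elapsed times s(0), s(1), ... at which a bar-boundary marker occurs."""
    times = absolute_times(events)
    return [t for t, (_delta, kind, _payload) in zip(times, events) if kind == "bar"]
```

With the sample events above, `bar_starts(events)` gives the bar boundaries that the later extraction step would use as s(i).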

このほか、カラオケにおいては、歌詞が伴奏に同期して画面に表示される。そこで、このような歌詞データもＭＩＤＩデータの「システム・エクスクルーシブ」に含めるようにすることが望ましい。ＣＰＵ１０１は、ＭＩＤＩデータを音声処理部１１０に引き渡して処理させるのと同時に、音声処理部１１０がどこまでＭＩＤＩデータを処理したのかを監視し、それに応じて、ＭＩＤＩデータから歌詞データを走査して、画像処理部１０７を制御して対応する歌詞データをモニタに表示させるのである。その詳細については、後述する。 In addition, in karaoke, lyrics are displayed on the screen in synchronization with the accompaniment. Therefore, it is desirable to include such lyric data in the “system exclusive” of the MIDI data. The CPU 101 delivers the MIDI data to the voice processing unit 110 for processing, and at the same time, monitors how far the voice processing unit 110 has processed the MIDI data, and scans the lyrics data from the MIDI data accordingly. The image processing unit 107 is controlled to display the corresponding lyric data on the monitor. Details thereof will be described later.

尚、ＭＩＤＩデータを用いなくとも、上記のような機能を果たす独自のフォーマットで、採点データや伴奏データを表現した場合にも、本発明の範囲に含まれる。たとえば、採点データは１チャンネルのＭＩＤＩデータで表現し、伴奏データはOgg Vorbis形式の音声データで表現する等である。この場合は、伴奏データと採点データとに矛盾が生じないように、それぞれの経過時間や区間の定義等に整合性を持たせるようにデータを定義する必要がある。 Even if the MIDI data is not used, the case where the scoring data and the accompaniment data are expressed in a unique format that performs the above functions is also included in the scope of the present invention. For example, the scoring data is expressed by one channel of MIDI data, and the accompaniment data is expressed by audio data in the Ogg Vorbis format. In this case, it is necessary to define the data so that the elapsed time and the definition of the sections are consistent so that the accompaniment data and the scoring data do not contradict each other.

また、本実施形態は、いわゆるネットワークカラオケに相当するものであり、ダウンロードしたＭＩＤＩデータはＲＡＭ１０３に一時的に格納される。また、ハードディスク等の外部記憶装置を有する場合には、そちらに蓄積保存することとしても良い。また、採点データや伴奏データとして、ＤＶＤ−ＲＯＭやＣＤ−ＲＯＭに記録されたものを利用することとし、ＤＶＤ−ＲＯＭドライブ１０８を用いてこれらの情報を読み出すように構成することも可能である。 The present embodiment corresponds to a so-called network karaoke, and the downloaded MIDI data is temporarily stored in the RAM 103. If an external storage device such as a hard disk is provided, it may be stored and stored there. It is also possible to use data recorded on a DVD-ROM or CD-ROM as scoring data or accompaniment data, and to read out such information using the DVD-ROM drive 108.

次に、ＣＰＵ１０１は、ＲＡＭ１０３に記憶したＭＩＤＩデータを指定して、当該ＭＩＤＩデータを演奏するように音声処理部１１０に指示を出す（ステップＳ３０３）。すると、音声処理部１１０は、ＭＩＤＩデータを順次解釈し、適切な音源データを取得してＰＣＭデータに変換し、これをＤ／Ａ変換してスピーカから出力する、という処理を、以降のＣＰＵ１０１における処理と並行して行うこととなる。なお、音声処理部１１０は、後述するように、マイク１１１から入力された音声データと伴奏データとのミキシングも行う。 Next, the CPU 101 designates the MIDI data stored in the RAM 103 and instructs the audio processing unit 110 to play it (step S303). The audio processing unit 110 then interprets the MIDI data sequentially, acquires the appropriate sound source data, converts it into PCM data, D/A converts the result, and outputs it from the speaker, all in parallel with the subsequent processing in the CPU 101. As will be described later, the audio processing unit 110 also mixes the audio data input from the microphone 111 with the accompaniment data.

ついで、ＣＰＵ１０１は、演奏を開始してからの経過時間を記憶する領域をＲＡＭ１０３内に確保し、当該経過時間領域に値０（演奏を開始した時刻であることを意味する。）を格納する（ステップＳ３０４）。 Next, the CPU 101 secures an area for storing the elapsed time from the start of the performance in the RAM 103, and stores a value of 0 (meaning the time when the performance is started) in the elapsed time area. (Step S304).

ついで、音声処理部１１０に、ＭＩＤＩデータの演奏を開始してから経過時間が、ＲＡＭ１０３内の経過時間領域に格納された経過時間に所定の間隔値を加算したものとなるまで待機する（ステップＳ３０５）。ここで待機するためには、音声処理部１１０に、演奏開始からの経過時間を問い合わせ、上記条件が満たされるまでこれを繰り返すのでも良いし、割込処理を利用するのでも良い。 Next, the CPU 101 waits until the time elapsed since the audio processing unit 110 started playing the MIDI data reaches the elapsed time stored in the elapsed-time area in the RAM 103 plus a predetermined interval value (step S305). To wait here, the CPU 101 may repeatedly query the audio processing unit 110 for the elapsed time from the start of the performance until the above condition is satisfied, or it may use interrupt processing.

また、「所定の間隔値」は、マイク１１１から入力されるＰＣＭデータをバッファリングできる時間長などの定数としても良いし、現在演奏されているＭＩＤＩデータのテンポに応じて変化させても良い。また、１小節の時間長を所定の整数で割った長さ（たとえば、４分音符の長さの２５６分の１等。）、とするのでも良い。以下では、理解を容易にするため、定数dであるとする。 The “predetermined interval value” may be a constant such as a time length during which PCM data input from the microphone 111 can be buffered, or may be changed according to the tempo of MIDI data currently being played. Alternatively, a length obtained by dividing the time length of one measure by a predetermined integer (for example, 1/256 of the length of a quarter note) may be used. In the following, it is assumed that the constant is d for easy understanding.

そして、ＣＰＵ１０１は、ＲＡＭ１０３に記憶されたＭＩＤＩデータを走査して、「ＲＡＭ１０３内の経過時間領域に格納された経過時間」から、「ＲＡＭ１０３内の経過時間領域に格納された経過時間に所定の間隔値を加算した経過時間」までの間（以下「繰り返し単位」という。）に、処理すべきものとして指定されている歌詞データを抽出し（ステップＳ３０６）、当該歌詞データにしたがって画像処理部１０７に指示を出して、モニタに歌詞を表示させる（ステップＳ３０７）。 Then, the CPU 101 scans the MIDI data stored in the RAM 103 and extracts the lyric data designated to be processed in the interval from "the elapsed time stored in the elapsed-time area in the RAM 103" to "that elapsed time plus the predetermined interval value" (hereinafter the "repetition unit") (step S306). It then instructs the image processing unit 107, according to that lyric data, to display the lyrics on the monitor (step S307).
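
As a rough sketch of step S306, the lyric fragments that fall inside one repetition unit can be selected with a half-open interval test. The `(time, text)` pair layout and the function name are hypothetical; the source only says that lyric data is embedded in the MIDI data.

```python
def lyrics_in_unit(lyric_events, start, d):
    """Lyric fragments scheduled in the half-open interval [start, start + d).

    lyric_events: iterable of (elapsed_time, text) pairs (assumed layout).
    start: elapsed time stored in the elapsed-time area.
    d: the predetermined interval value.
    """
    return [text for t, text in lyric_events if start <= t < start + d]
```

Using a half-open interval means a fragment that sits exactly on a unit boundary is handled once, by the later unit.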

カラオケにおいては、歌い手のために、モニタにこれから歌うべき歌詞を表示し、伴奏が進むにつれて歌うべき場所の歌詞の色が変化していくという処理を行うことが一般的であるが、適切に歌詞データをＭＩＤＩデータ内に埋め込んでおくことによって、これを実現することができる。 In karaoke, it is common for the singer to display the lyrics to be sung on the monitor and to change the color of the lyrics at the place to sing as the accompaniment progresses. This can be achieved by embedding data in the MIDI data.

さらに、ＣＰＵ１０１は、歌い手がマイク１１１を介して歌声を入力し、マイク１１１やこれをインターフェース１０４や適切なサンプリング周波数でＡ／Ｄ変換を行うことによって得られるＰＣＭデータのうち、今回の繰り返し単位の間に入力されたものを取得してＲＡＭ１０３に保存し（ステップＳ３０８）、当該ＰＣＭデータを音声処理部１１０に渡して伴奏データとミキシングさせて、スピーカから再生させる（ステップＳ３０９）。 Further, the singer inputs a singing voice through the microphone 111, and PCM data is obtained by A/D converting the signal from the microphone 111 via the interface 104 at an appropriate sampling frequency. Of this PCM data, the CPU 101 acquires the portion input during the current repetition unit and stores it in the RAM 103 (step S308), then passes that PCM data to the audio processing unit 110, which mixes it with the accompaniment data and plays it from the speaker (step S309).

「所定の間隔値」を十分に小さくすれば、ステップＳ３０８〜ステップＳ３０９の処理によって歌声が遅れる度合は十分に無視できる。なお、マイク１１１から入力されるＰＣＭデータは、ＣＰＵ１０１の処理とは並行して音声処理部１１０に渡されるものとして、ミキシングによる遅延が生じないようにしても良い。 If the “predetermined interval value” is made sufficiently small, the degree to which the singing voice is delayed by the processing in steps S308 to S309 can be sufficiently ignored. Note that the PCM data input from the microphone 111 may be passed to the audio processing unit 110 in parallel with the processing of the CPU 101 so that no delay due to mixing occurs.

上記のように、本処理では、伴奏が開始されて以降にマイク１１１から入力されたＰＣＭデータを処理の対象とするため、伴奏が開始される以前にマイク１１１に音声が入力されたとしても、その入力は無視されることになる。 As described above, in this process, since PCM data input from the microphone 111 after the start of accompaniment is targeted, even if sound is input to the microphone 111 before the accompaniment starts, That input will be ignored.

さて、ここで得られるＰＣＭデータは、今回の繰り返し単位の間に入力が受け付けられたものである。そこで、ＣＰＵ１０１は、ＰＣＭデータを分析して、以下のような情報を得る（ステップＳ３１０）。したがって、ＣＰＵ１０１は、分析部２０３として機能する。 The PCM data obtained here is the input received during the current repetition unit. The CPU 101 analyzes this PCM data and obtains the following information (step S310). The CPU 101 therefore functions as the analysis unit 203.
（ａ）現在のＰＣＭデータの基本周波数。これによって現在の歌声の音程が判明する。たとえば、ＰＣＭデータを高速フーリエ変換することによって周波数スペクトルを得れば、最も成分が大きい周波数が基本周波数となる。このほか、複数の互に異なる周波数帯の信号を通過させるバンドパスフィルタを用意して、これらの出力のうち、最も大きい成分を出力したバンドパスフィルタの周波数帯を基本周波数としても良い。 (a) The fundamental frequency of the current PCM data. This reveals the pitch of the current singing voice. For example, if a frequency spectrum is obtained by applying a fast Fourier transform to the PCM data, the frequency with the largest component is taken as the fundamental frequency. Alternatively, a plurality of band-pass filters that pass signals in mutually different frequency bands may be prepared, and the frequency band of the filter that outputs the largest component may be taken as the fundamental frequency.
（ｂ）現在のＰＣＭデータの変位の２乗平均（パワー）。これによって現在の歌声の大きさが判明する。たとえば、ＰＣＭデータは、数値の列と見ることができるが、「当該数値の列に含まれるそれぞれの数値の自乗の平均」から、「当該数値の列に含まれる数値の平均の自乗」を減算すれば、「ＰＣＭデータの変位の２乗平均」が得られる。 (b) The mean square (power) of the displacement of the current PCM data. This reveals the current loudness of the singing voice. For example, PCM data can be viewed as a sequence of numbers; subtracting "the square of the mean of the numbers in the sequence" from "the mean of the squares of the numbers in the sequence" yields "the mean square of the displacement of the PCM data".
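
The two features (a) and (b) can be sketched in a few lines. To stay dependency-free, this illustration uses a naive discrete Fourier transform instead of the fast Fourier transform named in the text; it finds the same spectral peak but is only practical for short blocks. Function names and the choice to skip the DC bin are assumptions of the sketch.

```python
import cmath
import math

def fundamental_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest non-DC component of a naive DFT.

    Follows the text's rule that the largest spectral component is taken
    as the fundamental; the DC bin is skipped (an assumption).
    """
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n

def power(samples):
    """Mean of the squares minus the square of the mean."""
    n = len(samples)
    mean = sum(samples) / n
    mean_of_squares = sum(x * x for x in samples) / n
    return mean_of_squares - mean * mean
```

Subtracting the squared mean makes the power insensitive to a constant offset in the PCM samples, which is why the text defines it on the displacement rather than on the raw values.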

そして、これらの特徴情報を、ＲＡＭ１０３に保存する（ステップＳ３１１）。したがって、ＲＡＭ１０３は、ＣＰＵ１０１と共働して、記憶部２０４として機能する。 Then, the feature information is stored in the RAM 103 (step S311). Therefore, the RAM 103 functions as the storage unit 204 in cooperation with the CPU 101.

以下、理解を容易にするため、演奏を開始してからi回目の繰り返し単位（「演奏開始からの経過時間id」から「演奏開始からの経過時間(i+1)d」までの間）における基本周波数はf[i]と、パワーはa[i]と書くこととする。 In the following, for ease of understanding, in the i-th repeating unit (between “elapsed time id from performance start” to “elapsed time (i + 1) d from performance start)” The fundamental frequency is written as f [i], and the power is written as a [i].

尚、ＲＡＭ１０３の容量が小さいために、１曲のカラオケをプレイした場合のすべてのf[i]やa[i]を記憶させることができない場合がある。その場合は、適宜間引を行えば良い。 Note that since the capacity of the RAM 103 is small, all f [i] and a [i] may not be stored when one karaoke is played. In that case, thinning may be performed as appropriate.

たとえば、N回に１回だけ記憶させるとするならば、iがNの倍数のときにだけＲＡＭ１０３にf[i]，a[i]を記憶させることとし、このf[i]，a[i]は、「演奏開始からの経過時間id」から、「演奏開始からの経過時間(i+N)ｄ」までの間における基本周波数とパワーであるものとして扱えば良い。すなわち、f[i] = f[i+1] = … = f[i+N-1]，a[i] = a[i+1] = … = a[i+N-1]，であるものとして扱えば良いのである。尚、以下では、理解を容易にするため、N=1であるものとして説明する。 For example, if it is stored only once in N times, f [i], a [i] is stored in the RAM 103 only when i is a multiple of N, and this f [i], a [ i] may be treated as a fundamental frequency and power from “elapsed time id from the start of performance” to “elapsed time (i + N) d from the start of performance”. That is, f [i] = f [i + 1] = ... = f [i + N-1], a [i] = a [i + 1] = ... = a [i + N-1] It can be treated as a thing. In the following description, it is assumed that N = 1 for easy understanding.
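
A minimal sketch of the thinning rule: only indices that are multiples of N are stored, and any other index is served by the most recent stored one. The dictionary layout and the function name are illustrative assumptions.

```python
def thinned_lookup(stored, i, n):
    """Value for repetition unit i when only every n-th unit was stored.

    stored: dict mapping multiples of n to a feature value (assumed layout).
    Implements f[i] = f[i+1] = ... = f[i+n-1] for i a multiple of n.
    """
    return stored[(i // n) * n]
```

For example, with N = 4, indices 0 to 3 all read the value stored at index 0.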

このほか、ミキシングをする際の「所定の間隔値」と分析を行う際の「所定の間隔値」とを、異なる値とすることによっても、分析された結果のデータの容量を調節することができる。 In addition, the data volume of the analysis result can be adjusted by setting different values for the “predetermined interval value” at the time of mixing and the “predetermined interval value” at the time of analysis. it can.

そして、ステップＳ３０５においてＲＡＭ１０３内の経過時間領域に格納された値にdを加算して更新し（ステップＳ３１２）ＭＩＤＩデータの演奏が終わっていないか調べ（ステップＳ３１３）、終わっていなければ（ステップＳ３１３；Ｎｏ）、ステップＳ３０５に戻る。 Then, the value in the elapsed-time area in the RAM 103 that was referred to in step S305 is updated by adding d (step S312), and it is checked whether the performance of the MIDI data has ended (step S313). If it has not ended (step S313; No), the process returns to step S305.

なお、ステップＳ３１３における判断は、「伴奏が終了したか否か」に相当するが、ユーザがコントローラ１０５を操作することによって途中で歌唱を中止した場合には、直ちに抽出処理を終了して、以降の処理に進むようにしても良い。この場合は、以降の処理で対象となる区間として、ステップＳ３１２で直前に処理を行った区間を最後の区間とすることとなる。 Note that the determination in step S313 corresponds to “whether or not the accompaniment has ended”, but if the user stops singing in the middle by operating the controller 105, the extraction process is immediately ended, and thereafter You may make it progress to the process of. In this case, as a section to be processed in the subsequent processes, the section that has been processed immediately before in step S312 is set as the last section.

一方終わっていれば、後述する抽出処理を実行し（ステップＳ３１４）、その抽出の結果を後述する出力処理によって出力して（ステップＳ３１５）、本処理を終了する。 If, on the other hand, the performance has ended, the extraction process described later is executed (step S314), the result of the extraction is output by the output process described later (step S315), and this process ends.

図４は、抽出処理の制御の流れを示すフローチャートである。 FIG. 4 is a flowchart showing the flow of control of the extraction process.

上記のように、ＲＡＭ１０３には、時間間隔dで分析した結果得られた基本周波数の数列f[0]，f[1]，f[2]，…とパワーの数列a[0]，a[1]，a[2]，…とが記憶されている。 As described above, the RAM 103 stores the basic frequency sequences f [0], f [1], f [2],..., And the power sequence a [0], a obtained as a result of the analysis at the time interval d. [1], a [2], ... are stored.

一方、採点用データとして、ＭＩＤＩデータのあるチャンネルに、演奏開始からの経過時間に対応付けることが可能なように音程データと音量データとが採点データとして埋め込まれている。ＭＩＤＩデータを走査すれば、各データ断片が処理されるべき「演奏開始からの経過時間」を得ることができ、音程データからは対応する基本周波数を得ることができる。また、音量はパワーと相関がある。典型的には、音量の２乗がパワー、あるいは、音量そのものがパワーである。 Meanwhile, pitch data and volume data are embedded as scoring data in one channel of the MIDI data in such a way that they can be associated with the elapsed time from the start of the performance. By scanning the MIDI data, the "elapsed time from the start of the performance" at which each data fragment is to be processed can be obtained, and the corresponding fundamental frequency can be obtained from the pitch data. The volume is correlated with the power: typically, the power is the square of the volume, or the volume itself.

そこで、演奏開始からの経過時間をtとしたときの、tにおける採点データの基本周波数をg(t)、パワーをb(t)と書くこととする。 Therefore, when the elapsed time from the start of performance is t, the basic frequency of the scoring data at t is written as g (t) and the power is written as b (t).

また、ＭＩＤＩデータには、小節ごとの区切のデータも含まれている。そこで、i番目の小節が始まる「演奏開始からの経過時間」をs(i)と書くこととすると、i番目の小節は、「演奏開始からの経過時間t」がs(i)≦t<s(i+1)である区間になる。 The MIDI data includes data for each bar. So, if s (i) is written as the “elapsed time from the start of performance” when the i-th measure starts, the “elapsed time t from the start of performance” is s (i) ≦ t < The section is s (i + 1).

そこで、対比のために、当該区間の採点データから、対比用の基本周波数g(s(i))，g(s(i)+d)，…，g(s(i+1)-d)、および、パワーb(s(i))，b(s(i)+d)，…，b(s(i+1)-d)を得る。これらはいずれも、(s(i+1)-s(i))/d個の数値データであるから、(s(i+1)-s(i))/d次元のベクトルと見ることができる。そこで以下では、前者をベクトルG[i]と、後者をベクトルB[i]と、それぞれ書くこととする。 Therefore, for comparison, from the scoring data of the section, the fundamental frequency for comparison g (s (i)), g (s (i) + d), ..., g (s (i + 1) -d) And powers b (s (i)), b (s (i) + d),..., B (s (i + 1) -d) are obtained. Since these are (s (i + 1) -s (i)) / d numerical data, they can be viewed as (s (i + 1) -s (i)) / d-dimensional vectors. it can. Therefore, in the following, the former is written as vector G [i] and the latter as vector B [i].

一方、採点データに対する入力された音声データ（歌声）の時間的なずれをδdとすると、これに対応する歌声の区間の基本周波数はf[s(i)/d+δ]，f[s(i)/d+δ+1]，…，f[s(i+1)/d+δ-1]、パワーは、a[s(i)/d+δ]，a[s(i)/d+δ+1]，…，a[s(i+1)/d+δ-1]となる。 On the other hand, if the temporal deviation of the input voice data (singing voice) from the scoring data is δd, then the fundamental frequencies of the corresponding section of the singing voice are f[s(i)/d+δ], f[s(i)/d+δ+1], …, f[s(i+1)/d+δ-1], and the powers are a[s(i)/d+δ], a[s(i)/d+δ+1], …, a[s(i+1)/d+δ-1].

そこで、これらも(s(i+1)-s(i))/d次元のベクトルと見て、前者をF[i,δ]と、後者をA[i,δ]と、それぞれ書くこととする。 Therefore, seeing these as (s (i + 1) -s (i)) / d-dimensional vectors, writing the former as F [i, δ] and the latter as A [i, δ] To do.
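
The construction of the reference vector G[i] and the shifted sung vector F[i,u] can be sketched as below (B[i] and A[i,u] are built the same way from b and a). The function names are hypothetical, and filling indices outside the recording with 0.0 is an assumption of this sketch; the source does not say how such indices are handled.

```python
def reference_vector(g, s_i, s_next, d):
    """G[i]: sample the reference g(t) at s(i), s(i)+d, ..., s(i+1)-d."""
    count = round((s_next - s_i) / d)
    return [g(s_i + k * d) for k in range(count)]

def sung_vector(f, s_i, s_next, d, u):
    """F[i,u]: f[s(i)/d + u], ..., f[s(i+1)/d + u - 1].

    f is the stored per-unit feature list; indices outside the recording
    are filled with 0.0 (assumption, not stated in the source).
    """
    start = round(s_i / d) + u
    count = round((s_next - s_i) / d)
    return [f[j] if 0 <= j < len(f) else 0.0 for j in range(start, start + count)]
```

Both vectors have (s(i+1)-s(i))/d elements, so they can be compared element by element for any shift u.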

さて、時間的なずれがδdであることから、G[i]とF[i,u]の相関の値、および、B[i]とA[i,u]の相関の値は、いずれも、u = δで極大となるべきである。逆に言えば、ある所定の範囲でuを変化させて、これらの相関の値を求め、これが最大となるuをδとすれば、時間的なずれδdを求めることができる。これは、通信技術におけるスライド同期に相当する。なお、歌声のサンプリングの精度を考えれば、得られるずれの精度はd程度であるから、u，δとしては整数を考えれば十分である。 Since the temporal deviation is δd, the correlation between G[i] and F[i,u] and the correlation between B[i] and A[i,u] should both peak at u = δ. Conversely, the temporal deviation δd can be found by varying u over a predetermined range, computing these correlations, and taking as δ the u at which they are largest. This corresponds to sliding synchronization in communication technology. Given the precision with which the singing voice is sampled, the resulting deviation is accurate only to about d, so it is sufficient to consider integer values of u and δ.

uを変化させる範囲については、歌がどの程度までずれることを想定するか、によって、適宜決めれば良い。たとえば、前後L秒までのずれを検出するのであれば、-L/d≦u≦L/dの範囲でuを変化させればよい。 The range in which u is changed may be determined as appropriate depending on how far the song is supposed to be shifted. For example, if a deviation up to L seconds before and after is detected, u may be changed in a range of −L / d ≦ u ≦ L / d.

さて、相関の値の具体的な計算方法であるが、１つの手法としては、２つのベクトルp，qがあるときに、これらの相関の値は、ベクトルp，qがなす角の余弦（cosine）とする手法がある。すなわち、(p・q)/(|p| |q|)をp，qの相関の値C(p,q)とするのである。 Now, as a specific calculation method of the correlation value, as one method, when there are two vectors p and q, these correlation values are cosine (cosine) of the angle formed by the vectors p and q. ). That is, (p · q) / (| p | | q |) is set as a correlation value C (p, q) between p and q.

N次元ベクトルpのi番目の要素をp[i]と表記することとすると、ベクトルの内積p・qは、以下の計算によって求めることができる。
p・q = Σ_{i=0}^{N-1} p[i] q[i]
また、ベクトルの大きさ|p|は、以下の計算によって求めることができる。
|p| = (p・p)^{1/2}
If the i-th element of an N-dimensional vector p is written p[i], the inner product p・q of two vectors can be obtained by the following calculation.
p・q = Σ_{i=0}^{N-1} p[i] q[i]
The magnitude |p| of a vector can be obtained by the following calculation.
|p| = (p・p)^{1/2}
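
The correlation C(p,q) = (p・q)/(|p| |q|) follows directly from these two formulas. The sketch below is one straightforward way to compute it; returning 0.0 when either vector has zero magnitude is an assumption, since the source does not cover that case.

```python
import math

def dot(p, q):
    """Inner product: sum of p[i] * q[i] over i = 0 .. N-1."""
    return sum(a * b for a, b in zip(p, q))

def norm(p):
    """Magnitude |p| = (p . p) ** (1/2)."""
    return math.sqrt(dot(p, p))

def correlation(p, q):
    """C(p, q): cosine of the angle between p and q."""
    denom = norm(p) * norm(q)
    # A zero vector has no direction; 0.0 is an assumed fallback.
    return dot(p, q) / denom if denom > 0.0 else 0.0
```

Because the cosine ignores overall scale, a singer who is uniformly louder or quieter than the reference still correlates perfectly in power as long as the shape over the bar matches.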

さて、本実施形態では、基本周波数の相関と、パワーの相関と、の、２つの値が求められる。したがって、それぞれについて時間的なずれを求めても良いが、両者が異なる場合に、ユーザに両方の値を出力してもユーザがとまどうことも多い。 In the present embodiment, two values, a correlation between fundamental frequencies and a correlation between powers, are obtained. Therefore, the time lag may be obtained for each, but if both are different, the user is often confused even if both values are output to the user.

そこで、あらかじめ正定数KおよびHを定めておき、i番目の区間について、-L/d≦u≦L/dの範囲でuを変化させて
z(i,u) = K C(G[i],F[i,u]) + H C(B[i],A[i,u])
を計算し、その値が極大となるuを求めて、これをδとすれば良い。
Therefore, positive constants K and H are determined in advance, and for the i-th section u is varied over the range -L/d ≦ u ≦ L/d to compute
z(i,u) = K C(G[i],F[i,u]) + H C(B[i],A[i,u])
The u at which this value is largest is then taken as δ.

なお、δ = 0の場合はずれがなく、δが正の場合は、歌唱が伴奏に比べて遅れていることになり、δが負の場合は、歌唱が伴奏に比べて進んでいることになる。 When δ = 0, there is no deviation. When δ is positive, the singing is delayed compared to the accompaniment. When δ is negative, the singing is advanced compared to the accompaniment. .
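
Given the two correlations for each candidate shift, combining them with the weights K and H and picking the best shift can be sketched as follows. The dictionary inputs and the function name are hypothetical; ties are resolved in favor of the smallest u, which is one possible choice.

```python
def best_shift(pitch_corr, power_corr, K=1.0, H=1.0):
    """Return (delta, z_max) for one section.

    pitch_corr[u] = C(G[i], F[i,u]) and power_corr[u] = C(B[i], A[i,u])
    for each candidate integer shift u (assumed input layout).
    """
    z = {u: K * pitch_corr[u] + H * power_corr[u] for u in pitch_corr}
    delta = max(sorted(z), key=lambda u: z[u])  # smallest u wins a tie
    return delta, z[delta]
```

Raising K relative to H makes the pitch contour dominate the decision, and raising H does the same for loudness; a positive result means the singing lags the accompaniment.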

以下、具体的に抽出処理について述べる。 The extraction process will be specifically described below.

まず、ＣＰＵ１０１は、ＲＡＭ１０３内に用意された変数領域iに0を格納する（ステップＳ４０１）。 First, the CPU 101 stores 0 in the variable area i prepared in the RAM 103 (step S401).

そして、ＣＰＵ１０１は、ＲＡＭ１０３内に記憶されたＭＩＤＩデータを走査して、i番目の区間の開始時間s(i)と終了時間s(i+1)を求める（ステップＳ４０２）。 Then, the CPU 101 scans the MIDI data stored in the RAM 103 to obtain the start time s (i) and end time s (i + 1) of the i-th section (step S402).

ついで、ＣＰＵ１０１は、ＲＡＭ１０３内に記憶されたＭＩＤＩデータを走査して、採点データを調べ、ベクトルG[i]およびベクトルB[i]を取得する（ステップＳ４０３）。 Next, the CPU 101 scans the MIDI data stored in the RAM 103, examines the scoring data, and acquires the vector G [i] and the vector B [i] (step S403).

さらに、ＣＰＵ１０１は、これらのベクトルの大きさ|G[i]|および|B[i]|を計算する（ステップＳ４０４）。 Further, the CPU 101 calculates the magnitudes | G [i] | and | B [i] | of these vectors (step S404).

次に、ＣＰＵ１０１は、ＲＡＭ１０３内に用意された変数領域uに値-L/dを、変数領域δに値0を、変数領域mに値0を、それぞれ格納する（ステップＳ４０５）。 Next, the CPU 101 stores the value −L / d in the variable area u prepared in the RAM 103, the value 0 in the variable area δ, and the value 0 in the variable area m (step S405).

そして、u≦L/dである間（ステップＳ４０６）、以下のステップＳ４０７〜ステップＳ４１２を繰り返す。 Then, while u ≦ L / d (step S406), the following steps S407 to S412 are repeated.

ＲＡＭ１０３内に記憶されたＰＣＭデータ（マイク１１１から入力された音声を前述のようにサンプリングしたもの）から、ベクトルF[i,u]およびベクトルA[i,u]を取得する（ステップＳ４０７）。 Vector F [i, u] and vector A [i, u] are acquired from the PCM data stored in RAM 103 (sound input from microphone 111 is sampled as described above) (step S407). .

次に、ＣＰＵ１０１は、これらのベクトルの大きさ|F[i,u]|および|A[i,u]|を計算する（ステップＳ４０８）。 Next, the CPU 101 calculates the magnitudes | F [i, u] | and | A [i, u] | of these vectors (step S408).

さらに、ＣＰＵ１０１は、相関値z[i,u]を計算する（ステップＳ４０９）。 Further, the CPU 101 calculates a correlation value z [i, u] (step S409).

そして、ＣＰＵ１０１は、z[i,u]>mであるか否かを調べ（ステップＳ４１０）、そうである場合（ステップＳ４１０；Ｙｅｓ）、変数領域δに値uを、変数領域mに値z[i,u]を、それぞれ格納する（ステップＳ４１１）、 Then, the CPU 101 checks whether z[i,u] > m (step S410). If so (step S410; Yes), it stores the value u in the variable area δ and the value z[i,u] in the variable area m (step S411).

ついで、uの値を１増やし（ステップＳ４１２）、ステップＳ４０６に戻る。 Next, the value of u is incremented by 1 (step S412), and the process returns to step S406.

さて、ステップＳ４０６〜ステップＳ４１２の繰り返しが終わったら（ステップＳ４０６；Ｎｏ）、ＣＰＵ１０１は、i番目の区間についての時間のずれをδdであるとして、この情報をＲＡＭ１０３に記録する（ステップＳ４１３）。本実施形態では、あらかじめ、ＲＡＭ１０３にずれを記憶するための配列wを用意し、ＣＰＵ１０１が、i番目の区間についての時間のずれを、当該配列のi番目の要素w[i]に記憶させる。 When the repetition of steps S406 to S412 ends (step S406; No), the CPU 101 takes the time lag for the i-th section to be δd and records this information in the RAM 103 (step S413). In this embodiment, an array w for storing the lags is prepared in the RAM 103 in advance, and the CPU 101 stores the time lag for the i-th section in the i-th element w[i] of the array.

そして、終了時間s(i+1)がＭＩＤＩデータの演奏時間に至ったか否かを調べ（ステップＳ４１４）、至っていない場合（ステップＳ４１４；Ｎｏ）、iを１増やして（ステップＳ４１５）、ステップＳ４０２に戻る。一方、至った場合（ステップＳ４１４；Ｙｅｓ）、本処理を終了する。 Then, the CPU 101 checks whether the end time s(i+1) has reached the performance time of the MIDI data (step S414). If it has not (step S414; No), i is incremented by 1 (step S415) and the process returns to step S402. If it has (step S414; Yes), this process ends.
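The loop structure of steps S401 to S415 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `extract_lags`, the caller-supplied scoring function `z(i, u)`, and the use of a section count in place of the end-of-performance check in step S414 are all hypothetical simplifications, and `max_shift` stands for L/d, assumed here to be an integer.

```python
def extract_lags(num_sections, max_shift, d, z):
    """Return the array w, where w[i] is the time lag (delta * d) of section i.

    num_sections: number of sections (stands in for the check in step S414).
    max_shift:    L/d, the largest shift examined, in steps of d.
    d:            time corresponding to one step of the shift u.
    z:            hypothetical callable z(i, u) returning the correlation score.
    """
    w = []
    for i in range(num_sections):      # S401, S414, S415
        u = -max_shift                 # S405: u starts at -L/d
        delta = 0                      # S405: best shift so far
        m = 0.0                        # S405: best score so far
        while u <= max_shift:          # S406
            score = z(i, u)            # S407 to S409
            if score > m:              # S410
                delta, m = u, score    # S411
            u += 1                     # S412
        w.append(delta * d)            # S413: lag of section i is delta * d
    return w
```

Because m starts at 0 and the comparison in step S410 is strict, a section whose scores never exceed 0 keeps δ = 0 and is recorded as having no lag.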

このようにして、各区間についての時間のずれが抽出された後は、当該時間のずれを歌い手に知らせる必要がある。最も単純には、配列wの要素を順に画面に表示して、各小節（区間）ごとにどれだけの時間のずれがあったかを知らせる手法がある。 In this way, after the time lag for each section is extracted, it is necessary to inform the singer of the time lag. The simplest method is to display the elements of the array w in order on the screen and notify how much time has shifted for each measure (section).

もっとも、本実施形態では、さらに歌い手に時間のずれをわかりやすく通知するための工夫を行っている。以下では、このための出力処理の詳細について説明する。 In this embodiment, however, a further measure is taken to convey the time lag to the singer in an easily understood way. The output process for this purpose is described in detail below.

図５は、出力処理の制御の流れを示すフローチャートである。本実施形態では、歌い手の歌唱の時間のずれを、主として聴覚によって知らせる手法を採用する。以下、本図を参照して説明する。 FIG. 5 is a flowchart showing the flow of control of the output process. This embodiment adopts a technique that conveys the lag in the singer's timing mainly through hearing. The description below refers to this figure.

上記のように、ＲＡＭ１０３内の配列wに各区間の時間のずれが格納されているので、まず、ＣＰＵ１０１は、当該配列のうち、時間のずれの絶対値|w[i]|を大きい順に上位から所定の数だけ選択する（ステップＳ５０１）。 As described above, since the time lag of each section is stored in the array w in the RAM 103, the CPU 101 first increases the absolute value | w [i] | of the time lag in the array. A predetermined number is selected from the top in order (step S501).

次に、ＣＰＵ１０１は、ステップＳ５０１で選択されたものから、絶対値|w[i]|が所定の閾値よりも大きいものをさらに選択する（ステップＳ５０２）。ここでの「所定の閾値」とは、これよりも時間が短かければ、人間の聴覚には、遅れたり進んだりしているとは認識できないような、短かい時間を、あらかじめ設定すればよい。 Next, the CPU 101 further selects, from those selected in step S501, the elements whose absolute value |w[i]| is larger than a predetermined threshold (step S502). The "predetermined threshold" here may be set in advance to a time short enough that human hearing cannot perceive any shorter lag as being late or early.

そして、ＣＰＵ１０１は、ステップＳ５０２において選択された配列の要素の数が０個であるか否かを調べる（ステップＳ５０３）。０個である場合（ステップＳ５０３；Ｙｅｓ）、歌い手の歌い出しは正確である旨を報告して（ステップＳ５０４）、本処理を終了する。 Then, the CPU 101 checks whether or not the number of elements in the array selected in step S502 is zero (step S503). If it is zero (step S503; Yes), it is reported that the singer's singing is accurate (step S504), and the process is terminated.

一方、ステップＳ５０２において選択された配列の要素の数が０個でない場合（ステップＳ５０３；Ｎｏ）、ＣＰＵ１０１は、ステップＳ５０２において選択された配列の要素を、所定の順序に並べる（ステップＳ５０５）。ここで、所定の順序としては、以下のような態様が考えられる。
（ａ）添字iの順番。すなわち、歌の中でそれぞれの区間が出現する順。
（ｂ）時間のずれの絶対値の大きい順。すなわち、歌唱の中での、ずれが大きい順。
（ｃ）時間のずれの値の順またはその逆順。すなわち、歌唱の中での遅れまたは進みが大きい順。
On the other hand, when the number of array elements selected in step S502 is not zero (step S503; No), the CPU 101 arranges the array elements selected in step S502 in a predetermined order (step S505). The following orders are conceivable.
(a) Order of the subscript i; that is, the order in which the sections appear in the song.
(b) Descending order of the absolute value of the time lag; that is, largest lag first.
(c) Ascending or descending order of the signed time-lag value; that is, ordered by how late or how early the singing was.
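The selection and ordering in steps S501, S502 and S505 can be sketched in Python as follows. The function name `select_sections`, its parameters and the order labels are hypothetical; only the three orderings themselves come from the list above.

```python
def select_sections(w, top_n, threshold, order="appearance"):
    """Return the indices of the sections to report, in playback order.

    w:         time lag per section (positive = late, negative = early).
    top_n:     how many of the largest lags to consider (step S501).
    threshold: lags no larger than this are treated as inaudible (step S502).
    order:     "appearance", "magnitude" or "signed" (step S505).
    """
    # S501: pick the top_n sections with the largest absolute lag.
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    candidates = ranked[:top_n]
    # S502: keep only lags whose absolute value exceeds the threshold.
    selected = [i for i in candidates if abs(w[i]) > threshold]
    # S505: arrange the survivors in the requested order.
    if order == "appearance":   # order of the subscript i
        return sorted(selected)
    if order == "magnitude":    # largest absolute lag first
        return sorted(selected, key=lambda i: abs(w[i]), reverse=True)
    if order == "signed":       # earliest entry first, latest entry last
        return sorted(selected, key=lambda i: w[i])
    raise ValueError("unknown order: " + order)
```

An empty result corresponds to the Yes branch of step S503, where the singer is told that the timing was accurate.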

次に、ステップＳ５０２において選択された配列の要素に対応する区間のそれぞれについて、ステップＳ５０５において得られた順に、ステップＳ５０７〜ステップＳ５１３の処理を繰り返す（ステップＳ５０６）。 Next, the processing in steps S507 to S513 is repeated in the order obtained in step S505 for each of the sections corresponding to the elements of the array selected in step S502 (step S506).

当該区間がi番目の小節であるとすると、ＣＰＵ１０１は、w[i]の正であるか否かを判断する（ステップＳ５０７）。 If the section is the i-th measure, the CPU 101 determines whether w [i] is positive (step S507).

w[i]が正である場合（ステップＳ５０７；Ｙｅｓ）、ＣＰＵ１０１は、音声処理部１１０に対して、i番目の区間のＭＩＤＩデータのうち、伴奏データに係るチャンネルの再生を開始する旨の指示を出す（ステップＳ５０８）。そして、ＣＰＵ１０１は、w[i]に相当する時間だけ待機してから（ステップＳ５０９）、音声処理部１１０に対して、i番目の区間のＭＩＤＩデータのうち、採点データに係るチャンネルの再生を開始する旨の指示を出す（ステップＳ５１０）。 When w [i] is positive (step S507; Yes), the CPU 101 instructs the audio processing unit 110 to start playing the channel related to the accompaniment data in the i-th interval MIDI data. An instruction is issued (step S508). Then, after waiting for a time corresponding to w [i] (step S509), the CPU 101 causes the audio processing unit 110 to reproduce the channel related to the scoring data among the MIDI data in the i-th section. An instruction to start is issued (step S510).

上記のように、採点データに係るチャンネルは、カラオケのプレイ時には音源を割り当てず、再生も行わない（ただし、見本音声として、たとえば小さな音量で再生するようにしてもよい。）。一方、出力時には、採点データに係るチャンネルに適当な音源（伴奏に使われていない音源データや、正弦波、方形波のように容易に生成できる音源データ等。）を当該採点データに割り当てて再生を行う。 As described above, the channel for the scoring data is not assigned a sound source and is not played during karaoke play (although it may be played as a sample voice, for example at low volume). At output time, on the other hand, a suitable sound source (sound source data not used for the accompaniment, or easily generated sound source data such as a sine wave or square wave) is assigned to the scoring data for playback.

一方、w[i]が正でない場合（ステップＳ５０７；Ｎｏ）、ＣＰＵ１０１は、i番目の区間のＭＩＤＩデータのうち、採点データに係るチャンネルの再生を開始する旨の指示を出す（ステップＳ５１１）。そして、|w[i]|に相当する時間だけ待機してから（ステップＳ５１２）、音声処理部１１０に対して、i番目の区間のＭＩＤＩデータのうち、伴奏データに係るチャンネルの再生を開始する旨の指示を出す（ステップＳ５１３）。 On the other hand, if w[i] is not positive (step S507; No), the CPU 101 issues an instruction to start playback of the channel for the scoring data among the MIDI data of the i-th section (step S511). Then, after waiting for a time corresponding to |w[i]| (step S512), it instructs the audio processing unit 110 to start playback of the channel for the accompaniment data among the MIDI data of the i-th section (step S513).

ステップＳ５０６〜ステップＳ５１３の処理の繰り返しが終わったら、本処理を終了する。 When the repetition of steps S506 to S513 is finished, this process ends.
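The branch in steps S507 to S513 amounts to deciding which channel starts first and how long the other one waits. A minimal Python sketch follows; the function name `playback_plan` and the channel labels are hypothetical, and the actual instructions to the audio processing unit 110 are not modeled.

```python
def playback_plan(lag):
    """Return [(start_delay, channel), ...] for one section.

    lag: the recorded time lag w[i] of the section
         (positive = singing was late, otherwise early or on time).
    """
    if lag > 0:
        # S507 Yes: start the accompaniment (S508), wait w[i] (S509),
        # then start the scoring-data channel (S510).
        return [(0.0, "accompaniment"), (lag, "scoring")]
    # S507 No: start the scoring-data channel (S511), wait |w[i]| (S512),
    # then start the accompaniment (S513).
    return [(0.0, "scoring"), (abs(lag), "accompaniment")]
```

For a section sung 0.2 time units late, the scoring-data channel starts 0.2 after the accompaniment; for a section sung early, the roles are reversed, so the listener hears the same offset the singer produced.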

本実施形態によれば、ユーザの歌唱データそのものをすべて録音する必要がないため、ＲＡＭ１０３などの記憶容量に制限があるゲーム装置１００などの安価なハードウェアにおいても、伴奏と歌唱とのずれをユーザが聴覚によって知得することができる。 According to this embodiment, the user's singing data itself need not be recorded in full, so even on inexpensive hardware with limited storage capacity in the RAM 103 and the like, such as the game device 100, the user can perceive the deviation between the accompaniment and the singing by ear.

なお、ハードディスクなどを外部記憶装置として備えるゲーム装置１００においては、伴奏データと採点データとの再生開始の時間をずらして再生するのではなく、ユーザの歌唱のＰＣＭデータをそのまま（もしくは適宜圧縮して）録音し、伴奏データの当該区間と、当該区間の再生時に録音されたＰＣＭデータと、を同時にミキシングして再生することとしても良い。この場合には、よりリアルに、ユーザの歌唱のずれを再現することができる。 In a game device 100 equipped with a hard disk or the like as an external storage device, instead of playing the accompaniment data and the scoring data with staggered start times, the PCM data of the user's singing may be recorded as is (or suitably compressed), and the relevant section of the accompaniment data may be mixed and played simultaneously with the PCM data recorded while that section was playing. In this case, the lag in the user's singing can be reproduced more realistically.

また、上記実施形態では、原則として伴奏の開始から終了までにマイク１１１から入力された音声データの時間のずれを検出してこれを報告しようとするが、ある一定の範囲の区間について何回も練習したい場合には、ユーザの指示入力によって、「伴奏全体の開始から終了まで」にかえて、「ユーザが指定した範囲の区間」について、上記の処理を実行しても良い。 In the above embodiment, in principle, an attempt is made to detect and report a time lag of the audio data input from the microphone 111 from the start to the end of the accompaniment, but many times in a certain range of sections. When the user wants to practice, the above-described processing may be executed for “a section in a range designated by the user” instead of “from the start to the end of the entire accompaniment” by an instruction input by the user.

また、上記実施形態では、時間のずれが生じている区間のうち上位のものについて、歌唱と伴奏とを当該区間についてのみ再生する。したがって、ある小節のみにおいて時間のずれが生じていた場合は、当該小節のみが出力されることとなる。 In the above embodiment, for the top-ranked sections among those in which a time lag occurred, the singing and the accompaniment are played only for that section. Consequently, if a time lag occurred in only one measure, only that measure is output.

しかしながら、時間のずれが生じている場合には、その前後の様子も合わせて再生した方が、ユーザにわかりやすいことも多い。 However, when there is a time lag, it is often easier for the user to reproduce the situation before and after it.

そこで、i (i≧1)番目の区間について出力の処理を行う場合には、伴奏データについては、i-1番目の区間からi+1番目の区間まで順に連続して再生することとしても良い。この場合、採点データについては、i-1番目の区間の再生を開始してからs(i-1) + w[i]だけ時間が経過した後に再生を開始する。 Therefore, when output processing is performed for the i-th section (i ≧ 1), the accompaniment data may be played continuously in order from the (i−1)-th section to the (i+1)-th section. In this case, playback of the scoring data starts after a time of s(i-1) + w[i] has elapsed from the start of playback of the (i−1)-th section.

これによって、特に、歌い出しが早い場合に、そのずれをわかりやすくユーザに提示することができるようになる。 This makes it possible to present the lag to the user clearly, particularly when the singer comes in early.

このほか、カラオケ装置でステレオ出力ができる場合においては、採点データの再生の際に、一方の出力（たとえば、左側出力）では、伴奏データに同期して採点データを再生し、他方の出力（たとえば、右側出力）では、上記のように伴奏データとはw[i]を考慮して時間をずらして採点データを再生するようにしても良い。 In addition, when the karaoke apparatus supports stereo output, the scoring data may be played in synchronization with the accompaniment data on one output (for example, the left output) and, on the other output (for example, the right output), played with its timing shifted from the accompaniment data by w[i] as described above.

この場合は、正しい歌い出し（見本）とユーザが行った歌唱（実際）の歌い出しとが、ヘッドホンやスピーカの左右からずれて再生されるため、ユーザはずれの程度をわかりやすく把握できるようになる。 In this case, the correct entry (the sample) and the entry the user actually sang are played offset from each other on the left and right of the headphones or speakers, so the user can readily grasp the extent of the lag.
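The stereo variant can be sketched in Python as a set of start offsets measured from the earliest event. The assignment of the sample to the left output and the actual timing to the right output follows the example in the text; the function name `stereo_plan` and the dictionary keys are hypothetical.

```python
def stereo_plan(lag):
    """Return start offsets for the accompaniment and both scoring-data outputs.

    lag: the recorded time lag w[i] of the section
         (positive = singing was late, negative = singing was early).
    """
    # If the singing was early, the shifted copy must start before the
    # accompaniment, so everything else is pushed back by |lag|.
    base = max(0.0, -lag)
    return {
        "accompaniment": base,
        "left_scoring_sample": base,        # in sync with the accompaniment
        "right_scoring_actual": base + lag,  # shifted by w[i]
    }
```

With a positive lag the right output enters after the left one; with a negative lag it enters first, which mirrors what the singer did.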

また、上記のように、カラオケ装置においては、採点データ（を伴奏データと時間のずれなく再生したもの）に同期して歌詞がモニタに表示される。そこで、「伴奏データと同期して採点データを再生したとした場合の歌詞表示」と、「伴奏データとw[i]だけ時間をずらして採点データを再生した場合の歌詞表示」と、を画面の異なる位置に表示することとしても良い。 In addition, as described above, in the karaoke apparatus, lyrics are displayed on the monitor in synchronization with the scoring data (reproduced without time difference from the accompaniment data). Therefore, the screen displays "Lyrics display when scoring data is played back in synchronization with accompaniment data" and "Lyric display when scoring data is played back with accompaniment data shifted by w [i]". They may be displayed at different positions.

たとえば、スピーカやヘッドホンの左右から採点データを「見本」と「実際」として再生する場合には、「見本」に対応する歌詞の表示を画面上段に、「実際」に対応する歌詞の表示を画面下段に、それぞれ行うものとする。 For example, when scoring data is reproduced as “sample” and “actual” from the left and right of a speaker or headphones, the lyrics corresponding to “sample” are displayed at the top of the screen, and the lyrics corresponding to “actual” are displayed on the screen. It shall be performed in the lower row.

一般に、カラオケにおける歌詞の表示は、歌詞に含まれる文字の色を、歌の進行に合わせて変更する形態をとることが多い。 In general, the display of lyrics in karaoke often takes the form of changing the color of characters included in the lyrics as the song progresses.

上記実施形態によれば、「見本」となる歌詞表示と、「実際」の歌詞表示とが、時間をずらして行われるので、ユーザは、聴覚のみならず視覚を通じても、自分の歌唱の時間のずれを容易に知得できるようになる。 According to the above embodiment, the lyric display of “sample” and the lyric display of “actual” are performed at different times, so that the user can determine the time of his singing not only by hearing but also visually. The shift can be easily learned.

上記のように、本発明によれば、歌唱の時間的なずれを適切に検知して歌い手に提示するのに好適なカラオケ装置、カラオケ方法、ならびに、これらをコンピュータによって実現するプログラムを提供することができ、カラオケボックス等の専用施設に利用されるカラオケ装置の他、汎用ゲーム装置や汎用コンピュータ上に実現されるカラオケ装置においても、本発明を適用することができる。 As described above, according to the present invention, it is possible to provide a karaoke apparatus, a karaoke method, and a program that realizes these by a computer, which are suitable for appropriately detecting a time difference in singing and presenting it to a singer. In addition to a karaoke device used for a dedicated facility such as a karaoke box, the present invention can also be applied to a karaoke device realized on a general-purpose game device or a general-purpose computer.

本発明の実施形態に係るカラオケ装置が実現される典型的なゲーム装置の概要構成を示す模式図である。 A schematic diagram showing the outline configuration of a typical game device on which the karaoke apparatus according to an embodiment of the present invention is realized.
本実施形態のカラオケ装置の概要構成を示す説明図である。 An explanatory diagram showing the outline configuration of the karaoke apparatus of this embodiment.
本実施形態のカラオケ装置にて実行されるカラオケ方法の制御の流れを示すフローチャートである。 A flowchart showing the flow of control of the karaoke method executed by the karaoke apparatus of this embodiment.
本実施形態のカラオケ装置にて実行される抽出処理の制御の流れを示すフローチャートである。 A flowchart showing the flow of control of the extraction process executed by the karaoke apparatus of this embodiment.
本実施形態のカラオケ装置にて実行される出力処理の制御の流れを示すフローチャートである。 A flowchart showing the flow of control of the output process executed by the karaoke apparatus of this embodiment.

Explanation of symbols

１００ゲーム装置
１０１ＣＰＵ
１０２ＲＯＭ
１０３ＲＡＭ
１０４インターフェース
１０５コントローラ
１０６外部メモリ
１０７画像処理部
１０８ＤＶＤ−ＲＯＭドライブ
１０９ＮＩＣ
１１０音声処理部
１１１マイク
２０１カラオケ装置
２０２入力受付部
２０３分析部
２０４記憶部
２０５抽出部
２０６出力部
100 Game device
101 CPU
102 ROM
103 RAM
104 Interface
105 Controller
106 External memory
107 Image processing unit
108 DVD-ROM drive
109 NIC
110 Audio processing unit
111 Microphone
201 Karaoke apparatus
202 Input reception unit
203 Analysis unit
204 Storage unit
205 Extraction unit
206 Output unit

Claims

An input receiving unit for receiving voice data input;
An analysis unit for analyzing the feature information of the audio data that has received the input;
A storage unit for storing the analyzed feature information in association with an elapsed time from the start of reception of the input;
An extraction unit that, when a predetermined contrast condition is satisfied after the start of accepting the input, contrasts the change over elapsed time of the feature information of scoring data prepared in advance with the change over elapsed time of the stored feature information of the voice data, and extracts a section of the latter in which the deviation of the latter change from the former change exceeds a predetermined range; and
A karaoke apparatus comprising: an output unit that outputs information on the extracted section and its deviation in the stored feature information.

The karaoke apparatus according to claim 1,
The output unit outputs the scoring data associated with the extracted section by voice.

The karaoke apparatus according to claim 2,
A playback unit that plays back the audio data that has received the input and the accompaniment data;
The input accepting unit starts accepting the input by starting the reproduction of the accompaniment data,
The predetermined contrast condition is satisfied by the end of reproduction of the accompaniment data,
The output unit outputs the accompaniment data associated with the extracted section together with scoring data associated with the section by voice.

The karaoke apparatus according to claim 3,
The output unit outputs the scoring data by voice shifted in time, relative to the voice output of the accompaniment data, by the deviation for the section.

The karaoke apparatus according to claim 4,
The output unit displays and outputs a character or a figure associated with a section of the scoring data in synchronization with output of the scoring data by voice.

The karaoke apparatus according to claim 5,
The output unit further displays and outputs the character or figure associated with the section of the scoring data in synchronization with the output that would result if the scoring data were output by voice without being shifted in time by the deviation for the section.

The karaoke apparatus according to claim 3, wherein the output unit includes:
The scoring data is shifted with respect to the output of the accompaniment data in terms of time, and the shifted sound is output by sound through a predetermined first output system,
The scoring data is output by voice through a predetermined second output system without shifting the scoring data in time with respect to the output of the accompaniment data by voice.

A karaoke apparatus according to any one of claims 2 to 7,
The analysis unit analyzes both frequency components and volume as characteristic information of audio data,
Instead of "outputting the scoring data by voice", the output unit "outputs by voice the section of the change over elapsed time of the stored feature information that corresponds to the extracted section of the scoring data".

A karaoke method used in a karaoke apparatus including an input reception unit, an analysis unit, a storage unit, an extraction unit, and an output unit,
An input receiving step in which the input receiving unit receives input of audio data;
An analysis step in which the analysis unit analyzes the feature information of the voice data that has received the input;
A storage step of storing the analyzed feature information in the storage unit in association with an elapsed time from the start of reception of the input;
An extraction step in which, when a predetermined contrast condition is satisfied after the start of receiving the input, the extraction unit contrasts the change over elapsed time of the stored feature information with the change over elapsed time of scoring data prepared in advance, and extracts a section in which the deviation between the former change and the latter change exceeds a predetermined threshold; and
An output step in which the output unit outputs the information of the extracted section from among the stored feature information.

A program that causes a computer to function as:
An input receiving unit for receiving voice data input;
An analysis unit for analyzing the feature information of the audio data that has received the input;
A storage unit for storing the analyzed feature information in association with an elapsed time from the start of reception of the input;
An extraction unit that, when a predetermined contrast condition is satisfied after the start of accepting the input, contrasts the change over elapsed time of the stored feature information with the change over elapsed time of scoring data prepared in advance, and extracts a section in which the deviation between the former change and the latter change exceeds a predetermined threshold; and
An output unit that outputs the information of the extracted section from among the stored feature information.