JP2010217382A

JP2010217382A - Following performance evaluation system, karaoke system, and program

Info

Publication number: JP2010217382A
Application number: JP2009062730A
Authority: JP
Inventors: Noriaki Asemi; 典昭阿瀬見
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-03-16
Filing date: 2009-03-16
Publication date: 2010-09-30
Anticipated expiration: 2029-03-16
Also published as: JP5262875B2

Abstract

PROBLEM TO BE SOLVED: To determine how well a user can sing in time with the tempo of a song.
SOLUTION: The utterance time vlen[k] of each constituent sound k is determined by collating the singing voice with a model voice (s180). A non-reproducibility nvlen[k] is then determined by comparing each utterance time with the corresponding model utterance time clen[k] (s190). The lower the periodicity contained in the series of non-reproducibility values nvlen[k], the higher the evaluation value output as the followability of the singing to the target song (s210).
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a follow-up evaluation system that evaluates how well a user's singing follows a target song, based on the singing voice produced when the user sings that song.

In recent years, a technique has been proposed that judges whether singing is slow or fast relative to a target song (see Patent Document 1). It bases this judgment on two things: the tendency of pitch change extracted from the singing voice, and the tendency of pitch change in the target song itself.

Patent Document 1: JP-A-10-149180

However, the above technique can only decide whether the singing is behind, too fast, or just right for the target song. It cannot determine how well the singing follows the target song, or more specifically, how well the singer keeps to its tempo.

The present invention was made to solve this problem. Its object is to provide a technique for determining how well a singer sings in time with the tempo.

To solve the above problem, the first configuration comprises the following means.

Timing collating means: The singing change timings are the timings at which the constituent sounds of the singing voice change when a user sings the target song. The model change timings cst[k] (k = 1 to n) are the timings at which the constituent sounds change in a model voice, that is, the voice obtained when the target song is sung properly. This means collates the two sets of timings and determines which model change timing each singing change timing corresponds to.

Deviation specifying means: For each singing change timing, this means specifies a timing deviation vdt[k]. The deviation is measured relative to the model change timing that the timing collating means matched to that singing change timing.

Utterance specifying means: This means specifies the utterance time vlen[k] of each constituent sound in the singing voice. It uses adjacent model change timings cst[k] and cst[k+1], together with the deviations vdt[k] and vdt[k+1] that the deviation specifying means specified for them.

Reproducibility specifying means: The model utterance time clen[k] is the duration obtained when the section between the model change timings cst[k] and cst[k+1] (those referred to in specifying vlen[k]) is sung properly. This means compares each utterance time vlen[k] with clen[k] and thereby specifies the non-reproducibility nvlen[k] of the model utterance time clen[k] in the utterance time vlen[k].

Evaluation output means: This means arranges the non-reproducibility values nvlen[k] in the order in which their model change timings cst[k] arrive, forming a series. The lower the periodicity contained in the change pattern of that series, the higher the evaluation value it outputs as the followability of the singing to the target song.

The follow-up evaluation system configured in this way first collates the singing voice with the model voice to specify the utterance time of each constituent sound. It then compares each utterance time with the corresponding model utterance time to specify its non-reproducibility. Finally, the lower the periodicity contained in this series of non-reproducibility values, the higher the evaluation value it outputs as the followability of the singing to the target song.

If the singer keeps up with the target song, that is, sings properly in time with its tempo, no strong periodicity appears in the change pattern of the non-reproducibility series. The reason is that singing in time with the tempo keeps the non-reproducibility of the model utterance time from growing. It stays at a roughly constant magnitude, so the change pattern cannot take on a strongly periodic shape.

On the other hand, suppose the singer cannot keep to the tempo of the target song and sings behind or ahead of it. The singer is then expected to repeat the following cycle. The non-reproducibility of the model utterance time grows. The singer notices the resulting drift in singing timing. The singer then changes the pitch of the constituent sounds to coincide with the model change timings.

In this case, the non-reproducibility of the model utterance time repeatedly grows and then drops back below its earlier level. This repetition appears in the series as strong periodicity. The less the singer keeps up with the target song, that is, the less the singing matches the tempo, the stronger this periodicity becomes.

Therefore, the lower the periodicity in the series of non-reproducibility values of the model utterance times, the better the target song is being sung in time with its tempo, and the higher the followability of the singing.

Accordingly, the above configuration outputs a higher evaluation value for followability the lower the periodicity. This evaluation value can serve as the result of determining how well the singing follows the target song, that is, how well the singer keeps to its tempo.

In this configuration, "outputting the evaluation value" means, for example, outputting a message indicating the evaluation value from a display unit or a speaker. It may also mean passing the evaluation value to another device, such as the karaoke device described later, for further processing.

In this configuration, the utterance time vlen[k] of each constituent sound in the singing voice may be specified, for example, as the value calculated by
vlen[k] = {cst[k+1] + vdt[k+1]} − {cst[k] + vdt[k]}.
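The formula can be sketched as follows. This sketch assumes cst and vdt are given in seconds as equal-length Python lists indexed from 0, so n change timings yield n − 1 utterance times. The function name utterance_times is hypothetical.

```python
def utterance_times(cst, vdt):
    """Return vlen[k] = (cst[k+1] + vdt[k+1]) - (cst[k] + vdt[k]) for each adjacent pair."""
    if len(cst) != len(vdt):
        raise ValueError("cst and vdt must have the same length")
    # Actual singing change timings: each model change timing plus its deviation.
    sung = [c + d for c, d in zip(cst, vdt)]
    # Utterance time of constituent sound k is the gap between consecutive sung timings.
    return [sung[k + 1] - sung[k] for k in range(len(sung) - 1)]


vlen = utterance_times([0.0, 1.0, 2.0, 3.0], [0.0, 0.2, 0.1, 0.0])
print(vlen)  # approximately [1.2, 0.9, 0.9]
```

A late start on one note (vdt = 0.2) lengthens the preceding sung note and shortens the following one, as the formula implies.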

The "non-reproducibility" in the above configuration indicates how far the actual utterance time in the singing voice fails to reproduce the model utterance time of the same constituent sound. For example, one may calculate the ratio of the utterance time vlen[k] to the model utterance time clen[k] (vlen[k]/clen[k], or clen[k]/vlen[k]). This value may then be specified as a non-reproducibility nvlen[k] that becomes larger the further the ratio is from 1.
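The text does not fix exactly how the ratio becomes a value that grows with its distance from 1. The sketch below assumes nvlen[k] = |log(vlen[k]/clen[k])|. This choice treats singing a note twice as long and half as long alike. The function name non_reproducibility is hypothetical.

```python
import math


def non_reproducibility(vlen, clen):
    """Return nvlen[k] = |log(vlen[k] / clen[k])|, which is 0 when clen[k] is reproduced exactly.

    Assumption: the log-ratio is one possible mapping that grows as the ratio moves
    away from 1 in either direction; the source does not prescribe the mapping.
    """
    if any(v <= 0 or c <= 0 for v, c in zip(vlen, clen)):
        raise ValueError("durations must be positive")
    return [abs(math.log(v / c)) for v, c in zip(vlen, clen)]


nvlen = non_reproducibility([1.0, 2.0, 0.5], [1.0, 1.0, 1.0])
```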

In the above configuration, any method may be used to collate the model voice with the singing voice when specifying which model change timing in the model voice each singing change timing corresponds to.

As a specific example, the change timings may be matched by collating the pitch transition patterns of the model voice and the singing voice along the time axis. Such a pattern may be, for example, a waveform showing how the pitch changes over time.

For this purpose, the first configuration may be arranged as the second configuration (claim 2) below.
In this configuration, the timing collating means collates the pitch transition patterns of the singing voice and the model voice along their time axes. In this way, it specifies which model change timing cst[k] in the model voice corresponds to each singing change timing, that is, each timing at which the pitch changes in the singing voice.

With this configuration, the pitch transition patterns of the model voice and the singing voice are collated. For each singing change timing, one can then search the model voice for a model change timing whose surrounding pitch transition pattern matches the pattern around that singing change timing with a similarity at or above a predetermined threshold. That model change timing is identified as the one corresponding to the singing change timing.

The pitch transition pattern of the model voice used for collation in this configuration may be a waveform showing the actual pitch transition along the time axis. Alternatively, it may be an information string indicating the pitch and note value of each constituent sound of the model voice, specifically musical score data.

Note that, in this configuration, the model voice may contain consecutive constituent sounds at the same pitch. Their model change timings are then hard to match from the pitch transition alone, so it is desirable to use another collation method as well. One such method is collation based on the transition pattern of the voice level. For this purpose, the above configuration may be arranged as the third configuration (claim 3) below.

In this configuration, the timing collating means considers sections along the time axis of the singing voice that meet two conditions. First, the section is bounded by singing change timings whose correspondence to model change timings has already been specified. Second, it corresponds to a model change timing cst[k] whose correspondence has not been specified. Within such a section, the timing at which the voice level falls to or below a certain level is specified as the singing change timing corresponding to that unmatched model change timing cst[k].

With this configuration, the correspondence of change timings is first specified by collating the pitch transition patterns of the model voice and the singing voice. Sections for which no correspondence was found are then collated using the voice level transition pattern. This makes it possible to identify which singing change timing corresponds to the model change timing between consecutive constituent sounds of the same pitch in the model voice.

Therefore, even if the model voice contains consecutive constituent sounds at the same pitch, it is possible to properly identify which singing change timing each of their model change timings corresponds to.
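The level-based fallback can be sketched as follows. This sketch assumes frame-wise voice levels in a list and an interval given by the frame indices of two already-matched singing change timings, with exactly one unmatched model change timing between them. The threshold value and the helper name level_drop_timing are illustrative assumptions.

```python
from typing import Optional, Sequence


def level_drop_timing(levels: Sequence[float], start: int, end: int,
                      threshold: float) -> Optional[int]:
    """Return the first frame strictly between start and end whose level is <= threshold.

    start and end are frame indices of singing change timings already matched by pitch;
    None means no dip was found, so the model change timing stays unmatched.
    """
    for i in range(start + 1, end):
        if levels[i] <= threshold:
            return i
    return None


levels = [0.8, 0.9, 0.7, 0.1, 0.05, 0.8, 0.9]
print(level_drop_timing(levels, 0, 6, 0.2))  # 3
```

Returning the first crossing is one reading of "the timing at which the voice level falls to or below a certain level". Taking the frame with the minimum level inside the section would be an equally plausible variant.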

In each of the above configurations, determining the evaluation value for the followability of the singing requires specifying the periodicity contained in the non-reproducibility series. This may be done when the evaluation value is determined, or in advance of that determination.

For the latter, each of the above configurations may be arranged as the fourth configuration (claim 4) below.
This configuration includes period specifying means for specifying the periodicity contained in the non-reproducibility series. The lower the periodicity specified by the period specifying means, the higher the evaluation value that the evaluation output means outputs as the followability of the singing to the target song.

With this configuration, the periodicity contained in the non-reproducibility series can be specified before the evaluation value for the followability of the singing is determined.
The method of specifying periodicity in this configuration is not particularly limited. For example, the non-reproducibility series may be regarded as a waveform whose amplitude is the magnitude of non-reproducibility. The periodicity may then be specified from the distribution of the frequency components of that waveform.

To this end, the fourth configuration may be arranged as the fifth configuration (claim 5) below.
In this configuration, the period specifying means regards the non-reproducibility series as a waveform whose amplitude is the magnitude of non-reproducibility and which advances in the order in which the model change timings cst[k] arrive. It calculates the distribution of the frequency components of that waveform and thereby specifies the periodicity defined by that distribution. The evaluation output means works from this distribution of frequency components. The smaller the sharpness of the distributed frequency components, the lower it judges the periodicity in the change pattern of the non-reproducibility series to be, and the higher the evaluation value it outputs.

With this configuration, the non-reproducibility series is regarded as a waveform whose amplitude is the magnitude of non-reproducibility, and the distribution of its frequency components is calculated. A smaller sharpness (the so-called Q value) of those frequency components is taken to mean lower periodicity in the change pattern, and in that case a higher evaluation value is output.

If the periodicity in the non-reproducibility series is strong, the spectral intensity of a particular frequency component will be large, and a peak will appear in the frequency-component distribution. A frequency component with such a large spectral intensity should then show a large sharpness. Conversely, if the periodicity in the non-reproducibility series is weak, the sharpness is small.

Therefore, the configuration above treats smaller sharpness as lower periodicity in the change pattern and outputs a higher evaluation value in that case. The evaluation value can then serve as a high rating of the followability of the singing to the target song.

In this configuration, the evaluation value may be determined from the sharpness of any frequency component whose spectral intensity is large in the distribution. A simple choice is the sharpness of the frequency component with the largest spectral intensity.
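The peak-sharpness idea can be sketched with NumPy, which is an assumed dependency. Several choices below are assumptions not stated in the source:
- The mean is removed before the DFT so that a constant offset is not counted as periodicity.
- Sharpness is taken as Q = peak bin / half-power bandwidth in bins.
- evaluation_value is a hypothetical monotone mapping from Q to a score.

All function names are illustrative.

```python
import numpy as np


def peak_sharpness(nvlen):
    """Return (peak_bin, q) for the strongest non-DC component of the nvlen series."""
    x = np.asarray(nvlen, dtype=float)
    x = x - x.mean()                     # a constant offset is not periodicity
    power = np.abs(np.fft.rfft(x)) ** 2
    power[0] = 0.0                       # ignore the DC bin
    peak = int(np.argmax(power))
    if power[peak] == 0.0:
        return peak, 0.0                 # flat series: no periodicity at all
    half = power[peak] / 2.0
    left = peak
    while left > 1 and power[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(power) - 1 and power[right + 1] >= half:
        right += 1
    bandwidth = right - left + 1         # number of bins at or above half power
    return peak, peak / bandwidth        # Q = peak frequency / half-power bandwidth


def evaluation_value(q, scale=100.0):
    """Hypothetical mapping: the lower the Q (weaker periodicity), the higher the value."""
    return scale / (1.0 + q)


periodic = [0.0, 0.0, 1.0, 1.0] * 8     # drift, recover, drift, recover, ...
impulse = [0.0] * 31 + [1.0]            # one isolated slip: flat spectrum
print(peak_sharpness(periodic))          # (8, 8.0)
```

The repeating drift-and-recover pattern concentrates its energy in one bin, giving a high Q and a low evaluation. The isolated slip spreads its energy across all bins, giving Q of at most 1 and a higher evaluation.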

For this purpose, the above configuration may be arranged as the sixth configuration (claim 6) below.
In this configuration, the evaluation output means takes, from the distribution of frequency components calculated by the period specifying means, the frequency component with the largest spectral intensity. The smaller that component's sharpness, the lower it judges the periodicity in the change pattern of the non-reproducibility series to be, and the higher the evaluation value it determines.

With this configuration, the evaluation value can be determined from the sharpness of the frequency component with the largest spectral intensity in the frequency-component distribution.
Each of the above configurations may also be arranged as the seventh configuration (claim 7) below.

This configuration includes singing data acquisition means. It acquires singing data, which represents the singing voice produced when a user sings the target song, together with identification information that identifies the song that was sung. The timing collating means then collates the singing voice in that singing data with the model voice of the target song identified by the accompanying identification information.

With this configuration, singing data can be acquired each time a user sings a target song, and an evaluation value can be determined from that singing data and output.
The follow-up evaluation system in each of the above configurations may be configured as a single device, or as a plurality of communicably connected devices operating in cooperation.

As another configuration for solving the above problem, a karaoke system may be arranged as the eighth configuration (claim 8) below.
This configuration includes the following:
- the follow-up evaluation system according to any one of claims 1 to 7;
- singing scoring means, which divides the singing voice represented by the singing data into unit sections of a predetermined length along the time series and scores the sung song by comparing, for each unit section, singing parameters relating to the voice in that section with ideal parameters based on the correct voice to be uttered there;
- result notification means, which reports the scoring result given by the singing scoring means.

The singing scoring means first obtains a score by comparing the singing parameters with the ideal parameters. It then determines the final scoring result by adding points to or subtracting points from that score according to the evaluation value output by the evaluation output means.
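The scoring flow can be illustrated under several assumptions that are not in the source:
- The singing parameter is pitch in cents, one value per unit section.
- A section counts as correct if it lies within a fixed tolerance of the ideal pitch.
- The followability evaluation shifts the score linearly around a reference value, and the result is clamped to 0 to 100.

Both helper names are hypothetical.

```python
def base_score(sung_cents, ideal_cents, tolerance=50.0):
    """Percentage of unit sections whose sung pitch is within `tolerance` cents of the ideal."""
    if len(sung_cents) != len(ideal_cents) or not ideal_cents:
        raise ValueError("need one sung and one ideal value per unit section")
    hits = sum(1 for s, i in zip(sung_cents, ideal_cents) if abs(s - i) <= tolerance)
    return 100.0 * hits / len(ideal_cents)


def final_score(base, evaluation, reference=50.0, weight=0.1):
    """Add points if the followability evaluation exceeds `reference`, subtract otherwise; clamp to 0..100."""
    return max(0.0, min(100.0, base + weight * (evaluation - reference)))


base = base_score([6000, 6100, 6300], [6000, 6120, 6200])
print(base)                      # 66.66666666666667
print(final_score(80.0, 100.0))  # 85.0
```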

With this configuration, the same operations and effects as in each of the above configurations can be obtained. In addition, a scoring result that takes the output evaluation value into account can be reported.
To solve the above problem, the invention may also take the form of a program (claim 9). The program causes a computer system to execute the processing procedures needed to function as all the means in any one of the first to eighth configurations.

A computer system that executes this program can constitute part of the follow-up evaluation system according to any one of the first to eighth configurations.
The program consists of an ordered sequence of instructions suitable for processing by a computer system. It is provided via various recording media or communication lines to follow-up evaluation systems, karaoke systems, and their users.

FIG. 1: Block diagram showing the overall configuration of the karaoke system.
FIG. 2: Flowchart showing the follow-up evaluation process.
FIG. 3: Diagram showing how the correspondence of change timings is specified based on the pitch transition pattern.
FIG. 4: Diagram showing how the correspondence of change timings is specified based on the voice level transition pattern.
FIG. 5: Diagram showing how the periodicity of the change pattern in the non-reproducibility series is specified.
FIG. 6: Flowchart showing the music performance process.

Embodiments of the present invention are described below with reference to the drawings.
(1) Overall configuration
The karaoke system 1 consists of a server 2, built from a well-known computer system, and one or more karaoke devices 3. These are communicably connected via a network 100.

The server 2 includes the following:
- a control unit 21 that controls the whole server;
- a storage unit 23 that stores various information;
- a communication unit 25 that controls communication via the network 100;
- a user interface (U/I) unit 27 consisting of a keyboard, a display, and the like;
- a media drive 29 that reads and writes information via recording media.
The server 2 functions as the follow-up evaluation system of the present invention.

The karaoke device 3 includes the following:
- a control unit 31 that controls the whole device;
- a storage unit 33 that stores video data and music data indicating the accompaniment and lyrics of songs to be performed;
- a communication unit 35 that controls communication via the network 100;
- a display unit 41 that displays various images;
- an operation unit 43 consisting of a plurality of keys and switches;
- an audio input/output unit 49 that controls audio input from a microphone 45 and audio output from a speaker 47.
(2) Follow-up evaluation process by the server 2
The control unit 21 of the server 2 executes the follow-up evaluation process according to a program stored in its built-in memory or in the storage unit 23. The procedure is described below with reference to FIG. 2.

The follow-up evaluation process starts when singing data is acquired from one of the karaoke devices 3 (s110).
A karaoke device 3 transmits this singing data after a user sings a song on it. The data represents, as a digital signal, the audio signal of that singing along the time series, and carries the identification information (song number) of the sung song. The singing data may also be acquired independently of singing on a karaoke device 3.

When the follow-up evaluation process starts, the voice waveform represented by the singing data received at startup is first converted into a discrete frequency spectrum (s120).

First, the voice waveform v[i] (i: time index; see FIG. 3(a)) is cut into waveform segments v_w[p] (p = 1, 2, ..., N₀) using Equation 1. Each segment is taken with a time window w[n] of length N₀ (for example, several tens of milliseconds), and the window is shifted by a predetermined number of sampling points n₀ each time.

Each waveform segment v_w[p] represents the voice waveform in the time region t[m]. Equation 2 determines t[m] from the window's sequence number m and the sampling frequency F_s of the digital signal.

The waveform segments v_w[p] are then transformed by a discrete Fourier transform (Equation 3). This yields the discrete frequency spectrum V[i'] of the voice waveform v[i].
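Equations 1 to 3 are not reproduced in this text. The framing and DFT steps can be sketched with NumPy under assumed forms of those equations:
- a Hann window;
- frame m covering samples m·n₀ to m·n₀ + N₀ − 1;
- t[m] = m·n₀ / F_s.

The name frame_spectra is hypothetical.

```python
import numpy as np


def frame_spectra(v, n0, N0, fs):
    """Return (t, V): frame start times t[m] and magnitude spectra V[m] of Hann-windowed frames."""
    v = np.asarray(v, dtype=float)
    if len(v) < N0:
        raise ValueError("signal shorter than one window")
    n_frames = 1 + (len(v) - N0) // n0
    w = np.hanning(N0)
    # Assumed form of Equation 1: frame m covers samples m*n0 .. m*n0 + N0 - 1.
    frames = np.stack([v[m * n0 : m * n0 + N0] * w for m in range(n_frames)])
    # Assumed form of Equation 2: time of frame m.
    t = np.arange(n_frames) * n0 / fs
    # Equation 3 stand-in: DFT of each windowed frame (positive frequencies only).
    V = np.abs(np.fft.rfft(frames, axis=1))
    return t, V


fs = 8000
v = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s of a 440 Hz tone
t, V = frame_spectra(v, n0=256, N0=512, fs=fs)
print(V.shape)                                     # (30, 257)
```

At 8 kHz, N₀ = 512 corresponds to 64 ms, which is in line with the "several tens of milliseconds" given in the text.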

Next, the fundamental frequency of the harmonic-structure components contained in the discrete frequency spectrum V[i'] obtained in s120 is estimated (s130). The harmonic structure model V_HM[i'] (Equation 4) consists of a fundamental frequency F₀ and its harmonic (overtone) components. For each time region t[m] described above, the F₀ that maximizes the correlation between this model and the spectrum V[i'] (i': frequency index) is found. That F₀ is taken as the estimated fundamental frequency vf0[m].
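Equation 4 is not reproduced here. A harmonic-model search of this kind can be sketched under stated assumptions:
- The model is a comb of Gaussian bumps at F₀, 2F₀, ..., 5F₀.
- "Correlation" is taken as the normalized dot product with the magnitude spectrum.
- F₀ is searched over a 1 Hz grid.

The name estimate_f0 and its parameters are hypothetical.

```python
import numpy as np


def estimate_f0(spectrum, fs, n_fft, candidates, n_harmonics=5, width_hz=20.0):
    """Return the candidate F0 whose harmonic comb best correlates with a magnitude spectrum."""
    freqs = np.arange(len(spectrum)) * fs / n_fft
    s = spectrum / (np.linalg.norm(spectrum) + 1e-12)
    best_f0, best_r = None, -np.inf
    for f0 in candidates:
        # Assumed harmonic structure model: Gaussian bumps at h * F0, h = 1..n_harmonics.
        model = sum(np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
                    for h in range(1, n_harmonics + 1))
        r = float(np.dot(s, model / (np.linalg.norm(model) + 1e-12)))  # normalized correlation
        if r > best_r:
            best_f0, best_r = f0, r
    return best_f0


fs, n_fft = 8000, 2048
n = np.arange(n_fft)
x = sum(np.sin(2 * np.pi * h * 220 * n / fs) / h for h in range(1, 5))  # 220 Hz tone with overtones
spec = np.abs(np.fft.rfft(x * np.hanning(n_fft)))
f0 = estimate_f0(spec, fs, n_fft, np.arange(80.0, 800.0, 1.0))
print(f0)  # close to 220
```

Normalizing both vectors keeps sub-harmonic candidates such as 110 Hz from winning. Half of their comb teeth fall where the spectrum has no energy, which lowers their correlation.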

Plotting the estimated fundamental frequencies vf0[m] against their time windows gives the transition of the fundamental frequency in the voice waveform of the singing data, as shown in FIG. 3(b). This is the pitch transition pattern.

Next, the identification information (song number) attached to the singing data received in s110 identifies the song (hereinafter the "sung song"). Based on it, model data representing the correct voice to be uttered in that song (hereinafter the "model voice") is read out (s140). The model data is selected from the plural sets stored in advance in the model-data storage area of the storage unit 23.

This model data defines the pitch transition pattern of the model voice of the sung song along the time axis. For each constituent sound k (= 1, 2, ...) of the model voice, it gives the utterance start timing cst[k], pitch cf0[k], note value clen[k], and voice level cvol[k]. In this embodiment, it is musical score data in which each constituent sound is represented as a note.
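A minimal sketch of this score-style model data and of turning it into a frame-wise pitch contour for collation. The sketch assumes times in seconds, pitch as a MIDI note number, and 0.0 for frames outside any note. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelNote:
    cst: float   # utterance start timing [s]
    cf0: float   # pitch (assumed: MIDI note number)
    clen: float  # note value, i.e. duration [s]
    cvol: float  # voice level


def model_pitch_contour(notes: List[ModelNote], frame_period: float) -> List[float]:
    """Sample the score as a step-wise pitch contour; frames outside any note stay 0.0."""
    end = max(n.cst + n.clen for n in notes)
    n_frames = int(round(end / frame_period))
    contour = [0.0] * n_frames
    for note in notes:
        first = int(round(note.cst / frame_period))
        last = int(round((note.cst + note.clen) / frame_period))
        for i in range(first, min(last, n_frames)):
            contour[i] = note.cf0
    return contour


notes = [ModelNote(0.0, 60.0, 0.5, 1.0), ModelNote(0.5, 62.0, 0.5, 1.0)]
contour = model_pitch_contour(notes, 0.1)
```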

Next, two pitch transition patterns along the time axis are collated: that of the model voice represented by the model data read in s140, and that of the singing voice represented by the singing data received in s110. This collation determines which change timing in the singing voice (a timing at which consecutive constituent sounds change, hereinafter a "singing change timing") corresponds to each change timing cst[k] (k = 1 to n) at which consecutive constituent sounds change in the model voice (hereinafter a "model change timing") (s150).

In s150, the model change timings are identified first from the pitch transition pattern of the model voice. For each transition between consecutive constituent sounds, a predetermined point between the start and end of the transition (in this embodiment, its midpoint) is taken as the model change timing.

Next, a reference period is centered on each model change timing (for example, the period extending to each of the adjacent constituent sounds). The pitch transition patterns of the singing voice and of the model voice over the same reference period are then collated with each other (see FIG. 3(c)).

Only the reference periods of the model voice that are centered on model change timings where the pitch actually changes are collated. For each such period, the transition pattern of the singing voice over the same reference period is shifted along the time axis. The shift at which the similarity (correlation) between the two patterns is greatest gives two results: that maximum similarity and the time difference vdt[k] along the time axis. The method used to calculate the similarity (correlation) and the time difference is not particularly limited; for example, the method described in JP-A-2005-107330 could be used.
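The shift-and-compare step can be sketched as a brute-force lag search that uses Pearson correlation as the similarity. The source leaves the method open and cites JP-A-2005-107330, which is not reproduced here. The correlation measure, the frame-based lag range and the function names are therefore assumptions. A positive lag means the singer reached the change later than the model, and vdt[k] would be the lag multiplied by the frame hop time.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_lag(model_f0: Sequence[float], sung_f0: Sequence[float],
             start: int, length: int, max_lag: int) -> Tuple[float, int]:
    """Slide the sung pitch contour against model_f0[start:start+length].

    Returns (max similarity, lag in frames). Returns (-inf, 0) if no lag fits.
    """
    ref = model_f0[start:start + length]
    best = (float("-inf"), 0)
    for lag in range(-max_lag, max_lag + 1):
        s = start + lag
        if s < 0 or s + length > len(sung_f0):
            continue  # shifted window would fall outside the singing data
        sim = pearson(ref, sung_f0[s:s + length])
        if sim > best[0]:
            best = (sim, lag)
    return best
```

For example, if the model steps from pitch 60 to 62 at frame 10 and the singer makes the same step at frame 13, `best_lag(model, sung, 5, 10, 5)` returns a similarity of 1.0 at a lag of 3 frames.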

Each model change timing for which a similarity and time difference were calculated by this collation is then taken to correspond to the singing change timing contained in the reference period of the singing voice it was collated against. For model change timings between consecutive constituent sounds of the same pitch, no pattern collation and no calculation of vdt[k] is performed; their time differences vdt[k] are left at the initial value of 0.

Next, the voice waveform represented by the singing data received in s110 is converted into a level waveform that shows the transition of the voice level (s160).

As in s120, waveform segments v_w[p] are first obtained by Formula 1 above. The voice waveform v[i] (see FIG. 3(a)) is cut sequentially with a time window w[n] of length N₀, and the sampling point of the digital signal is advanced by a predetermined number n₀ for each segment.

The waveform segments v_w[p] obtained in this way are then converted by Formula 5 below into a level waveform v_p[m] that shows the transition of the voice level.

When the voice level of each time window of the converted level waveform v_p[m] is plotted, it shows the transition pattern of the voice level in the voice waveform represented by the singing data, as shown in FIG. 4(a).
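Steps s120 and s160 share the same framing, and it can be sketched as follows. Neither Formula 1 nor Formula 5 appears in this text, so the sketch makes two assumptions. The window w[n] is taken to be a Hann window, and the level of a segment is taken to be its mean-square value. `frame_len` plays the role of N₀ and `hop` the role of n₀. All function names are illustrative.

```python
import math
from typing import List, Sequence


def hann(n: int, length: int) -> float:
    """Hann window value w[n] for a window of `length` samples (length >= 2)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))


def frame_segments(v: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut v[i] into windowed segments of frame_len (N0) samples, advancing hop (n0) samples."""
    segments = []
    for start in range(0, len(v) - frame_len + 1, hop):
        seg = v[start:start + frame_len]
        segments.append([x * hann(n, frame_len) for n, x in enumerate(seg)])
    return segments


def level_waveform(v: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean-square level of each windowed segment (assumed stand-in for Formula 5)."""
    return [sum(x * x for x in seg) / frame_len
            for seg in frame_segments(v, frame_len, hop)]
```

Applied to 400 samples of silence followed by 400 samples of a sine tone, with `frame_len=100` and `hop=100`, this yields eight levels. The first level is 0.0, and the levels of the tone frames are clearly positive.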

Next, certain sections along the time axis of the singing voice represented by the singing data received in s110 are examined. Each such section lies between singing change timings whose correspondence to model change timings was identified in s150, and it corresponds to a model change timing for which no correspondence was identified. In each of these sections, the timing at which the voice level falls to or below a certain level is identified as the singing change timing corresponding to that unmatched model change timing (s170).

Specifically, as shown in FIG. 4(b), the timing at which the voice level reaches its minimum within the relevant section of the level waveform obtained in s160 is found. This timing is taken as the singing change timing corresponding to the model change timing that was left unmatched in that section.

Any method may be used to find this timing. In this embodiment, it is the timing of the index m, in the level waveform v_p[m] obtained by Formula 5, that satisfies both conditions: the first derivative v_p′[m] = 0 and the second derivative v_p″[m] > 0.

The derivatives are approximated by Formulas 6 and 7 below.

$$v_p'[m] = v_p[m+1] - v_p[m] \qquad \text{(Formula 6)}$$

$$v_p''[m] = v_p'[m+1] - v_p'[m] \qquad \text{(Formula 7)}$$

In s170, the time difference vdt[k] between each singing change timing matched in this way and its model change timing cst[k] is also calculated, by Formula 8 below. In Formula 8, the index m found as described above is written m0.

Next, the utterance time vlen[k] of each constituent sound k in the singing voice is determined from the time differences vdt[k] identified in s150 and s170 (s180). The utterance time length vlen[k] of constituent sound k is calculated by Formula 9 below from two values: the time difference vdt[k] for constituent sound k, and the time difference vdt[k+1] for the adjacent constituent sound k+1.
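Formula 9 is not reproduced in this text. A natural reading, consistent with the claims (which say vlen[k] is based on cst[k], cst[k+1], vdt[k] and vdt[k+1]), is that each sung change timing equals its model change timing plus its time difference. Under that reading, vlen[k] is the gap between consecutive sung change timings. The sketch below implements this assumed form. The function name is illustrative.

```python
from typing import List, Sequence


def utterance_times(cst: Sequence[float], vdt: Sequence[float]) -> List[float]:
    """vlen[k] under an assumed Formula 9: (cst[k+1] + vdt[k+1]) - (cst[k] + vdt[k]).

    cst[k] + vdt[k] is taken to be the singer's change timing into sound k, so
    vlen[k] is the time the singer actually spent on sound k.
    """
    if len(cst) != len(vdt):
        raise ValueError("cst and vdt must have the same length")
    return [(cst[k + 1] + vdt[k + 1]) - (cst[k] + vdt[k]) for k in range(len(cst) - 1)]
```

With model change timings 0.0 s, 1.0 s and 2.0 s, and a singer who is 0.2 s late only at the middle change, the sketch gives vlen = [1.2, 0.8].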

Next, for each utterance time vlen[k] calculated in s180, the non-reproducibility nvlen[k] of the model utterance time clen[k] for the same section is determined (s190). "Non-reproducibility" expresses how far the actual utterance time in the singing voice falls short of reproducing the model utterance time of the same constituent sound.

Each utterance time vlen[k] determined in s180 is compared with the model utterance time clen[k]. clen[k] is the utterance time obtained when the section between the model change timings cst[k] and cst[k+1], which were used to calculate vlen[k], is sung correctly. From this comparison, the non-reproducibility nvlen[k] = vlen[k] / clen[k] is calculated.
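The ratio can be computed as follows. The `inverse` flag reflects the modification described in this document that allows clen[k] / vlen[k] to be used instead. Either form equals 1.0 when the model utterance time is reproduced exactly. The positivity check and the function name are additions made for this sketch.

```python
from typing import List, Sequence


def non_reproducibility(vlen: Sequence[float], clen: Sequence[float],
                        inverse: bool = False) -> List[float]:
    """nvlen[k] = vlen[k] / clen[k], or clen[k] / vlen[k] when inverse=True.

    1.0 means the model utterance time was reproduced exactly; the further a
    value is from 1.0, the less faithfully it was reproduced.
    """
    if len(vlen) != len(clen):
        raise ValueError("vlen and clen must have the same length")
    out = []
    for sung, model in zip(vlen, clen):
        if sung <= 0 or model <= 0:
            raise ValueError("utterance times must be positive")
        out.append(model / sung if inverse else sung / model)
    return out
```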

Next, the periodicity contained in the change pattern of the non-reproducibility series nvlen[k] determined in s190 is identified (s200).

For this purpose, the non-reproducibility series is treated as a waveform whose amplitude is the magnitude of non-reproducibility, taken in the order in which the corresponding change timings occur (see FIG. 5(a)). The frequency spectrum distribution NVLEN[k] of that waveform is calculated by Formula 10 below, and the periodicity defined by this distribution is identified.
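Formula 10 is not reproduced here. The sketch below assumes it is an ordinary discrete Fourier transform magnitude, written directly with `cmath` so that it has no dependencies. It also subtracts the series mean before the transform (another assumption). Without that step, the steady part of nvlen, which sits near 1.0, would dominate bin 0. Only bins 0 to n//2 are returned because the input is real-valued.

```python
import cmath
from typing import List, Sequence


def nvlen_spectrum(nvlen: Sequence[float]) -> List[float]:
    """Magnitude DFT of the mean-removed non-reproducibility series (assumed Formula 10)."""
    n = len(nvlen)
    mean = sum(nvlen) / n
    x = [v - mean for v in nvlen]
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
```

A singer who drifts late for two notes and then catches up for two notes produces a series such as `[1.2, 1.2, 0.8, 0.8]` repeated. For 32 values, the spectrum peaks at bin 8 (32 / 4), the bin of that four-note cycle. A perfectly steady series gives an all-zero spectrum.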

When the spectral intensity of the distribution NVLEN[k] obtained in this way is plotted for each index k, as shown in FIG. 5(b), the spectral intensity of a frequency component grows with the periodicity at that frequency in the non-reproducibility change pattern. In other words, a larger spectral intensity in NVLEN[k] indicates higher periodicity at the corresponding frequency component.

Then, an evaluation value for the followability of the singing in the singing voice represented by the singing data received in s110 is determined from the periodicity identified in s200 (s210).

Here, a given frequency component of the distribution NVLEN[k] identified in s200 is examined (for example, the component with the highest spectral intensity). The smaller the sharpness Q of that component, the lower the periodicity of the non-reproducibility change pattern is judged to be, and the higher the evaluation value that is determined.

Specifically, let k0 be the index k at the peak of that frequency component, and let Δk be the width in k over which the component stays at or above half of its peak value. The sharpness Q is the ratio k0/Δk, and its reciprocal is taken as the evaluation value SC (= 1/Q).
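The Q and SC computation can be sketched on a magnitude spectrum such as the one from s200. The sketch makes three assumptions of its own. The peak search skips bin 0. Δk is counted as the number of contiguous bins at or above half the peak. A spectrum with no non-DC energy (a perfectly steady series) receives Q = 0 and an unbounded SC.

```python
import math
from typing import Sequence, Tuple


def sharpness_and_score(spectrum: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, SC), where Q = k0 / dk and SC = 1 / Q, for the strongest non-DC component."""
    if len(spectrum) < 2:
        raise ValueError("need at least one non-DC bin")
    k0 = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    peak = spectrum[k0]
    if peak <= 0.0:
        return 0.0, math.inf  # no periodicity at all: best possible followability
    half = peak / 2.0
    left = k0
    while left - 1 >= 1 and spectrum[left - 1] >= half:
        left -= 1
    right = k0
    while right + 1 < len(spectrum) and spectrum[right + 1] >= half:
        right += 1
    dk = right - left + 1  # half-maximum width in bins
    return k0 / dk, dk / k0  # SC computed directly as 1/Q
```

A narrow peak such as `[0, 0, 1, 2, 10, 2, 1, 0]` gives Q = 4.0 and SC = 0.25. A broad peak at the same bin, such as `[0, 0, 6, 8, 10, 8, 6, 0]`, gives Q = 0.8 and SC = 1.25. The broader peak therefore scores higher, matching the rule that lower sharpness yields a higher evaluation.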

In s210, conventional scoring may also be performed on the singing data in addition to determining the evaluation value SC. Points are then added to or subtracted from that score according to SC to give the final scoring result. The scoring can work, for example, as follows. The singing voice represented by the singing data is divided into unit sections of predetermined length along the time series. In each unit section, the singing parameters of the voice are compared with ideal parameters based on the correct voice for that section. A value corresponding to the parameter errors across the unit sections is used as the scoring result.
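The source leaves both the conventional score and the SC-based adjustment unspecified. The following is therefore only a hypothetical illustration of their combination. The 0 to 100 scale, the penalty per unit of mean absolute parameter error, the reference value `sc_ref` and the gain are all invented for this sketch.

```python
from typing import Sequence


def base_score(sung_params: Sequence[float], ideal_params: Sequence[float],
               penalty_per_unit: float = 10.0) -> float:
    """Hypothetical conventional score: 100 minus a penalty on the mean absolute
    per-section parameter error, clipped to [0, 100]."""
    if len(sung_params) != len(ideal_params) or not sung_params:
        raise ValueError("need equal, non-zero numbers of unit sections")
    errors = [abs(s - i) for s, i in zip(sung_params, ideal_params)]
    mean_err = sum(errors) / len(errors)
    return max(0.0, min(100.0, 100.0 - penalty_per_unit * mean_err))


def final_score(base: float, sc: float, sc_ref: float = 1.0, gain: float = 5.0) -> float:
    """Add points when SC exceeds a hypothetical reference value and subtract them when
    it falls short, clipped to [0, 100]."""
    return max(0.0, min(100.0, base + gain * (sc - sc_ref)))
```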

The evaluation value SC determined in s210 (or the evaluation value and the scoring result; hereinafter "evaluation value etc.") is then returned to the karaoke device 3, which sent the singing data (s220), and this followability evaluation process ends.

On receiving the evaluation value etc., the karaoke device 3 displays it on the display unit 41 as part of the music performance process described below.

(3) Music performance process by the karaoke device 3

The procedure of the music performance process, which the control unit 31 of the karaoke device 3 executes according to a program stored in its built-in memory or in the storage unit 33, is described below with reference to FIG. 6. This process is executed repeatedly after the karaoke device 3 starts up.

When the music performance process starts, it first waits until the user performs an operation to select a song to sing (s310: NO).

When a song-selection operation is performed (s310: YES), the song number of the selected song (the designated song) is acquired (s320).

Next, a notification request is generated that carries the song number acquired in s320 and the user ID acquired with it (s330). The request asks for the music data needed to play the designated song identified by that song number, and it is transmitted to the server 2 (s340).

On receiving this notification request, the server 2 returns the music data for playing the designated song identified by the song number in the request.

After the notification request has been transmitted in s340, the music data returned from the server 2 is received (s350) and stored in the storage unit 33 (s360).
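The request and response exchange of s330 to s360 can be sketched with in-process stand-ins. The source specifies no message format or transport, so every class, field and method name below is hypothetical. A direct method call stands in for the network 100.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class NotificationRequest:
    song_number: str  # identifies the designated song (s320)
    user_id: str      # acquired together with the song number


class Server:
    """Stand-in for the server 2: returns music data for the requested song."""

    def __init__(self, music_db: Dict[str, bytes]):
        self.music_db = music_db

    def handle_notification_request(self, req: NotificationRequest) -> bytes:
        return self.music_db[req.song_number]


@dataclass
class KaraokeDevice:
    """Stand-in for the karaoke device 3."""
    server: Server
    user_id: str
    storage: Dict[str, bytes] = field(default_factory=dict)  # storage unit 33

    def request_song(self, song_number: str) -> bytes:
        req = NotificationRequest(song_number, self.user_id)  # s330
        data = self.server.handle_notification_request(req)  # s340 to s350
        self.storage[song_number] = data                      # s360
        return data
```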

Next, performance of the designated song from the music data stored in the storage unit 33 in s360 begins (s380). At the same time, generation of singing data begins (s390). This data represents the voice input from the microphone 45 during the performance, that is, the voice singing the designated song.

Once the performance of the designated song has started, the process waits until it ends (s400: NO). When the performance ends (s400: YES), the generation of singing data started in s390 stops, and the singing data generated up to that point is acquired (s410).

Next, the singing data acquired in s410 is transmitted to the server 2 (s420). The server 2 evaluates the followability of this singing data through the followability evaluation process described above. It then returns the evaluation result, namely the evaluation value or the scoring result (evaluation value etc.).

Here the singing data itself is transmitted to the server 2. As an alternative, only the parameters the server 2 needs to determine the evaluation value etc. may be transmitted.

After the singing data has been transmitted to the server 2 in s420, the evaluation value etc. sent back from the server 2 is received (s430). It is displayed on the display unit 41 (s440), and the music performance process ends.

(4) Operation and effects

The karaoke system 1 configured as described above works in three steps. The singing voice and the model voice are collated to determine the utterance time vlen[k] of each constituent sound k (s180 in FIG. 2). Each vlen[k] is compared with the corresponding model utterance time clen[k] to determine the non-reproducibility nvlen[k] (s190 in the same figure). The lower the periodicity contained in the series of nvlen[k], the higher the evaluation value determined for the followability of the singing to the target song (s210 in the same figure).

If the singing follows the target song, that is, if it is sung properly to the tempo, no strong periodicity appears in the change pattern of the non-reproducibility series. When the target song is sung properly to its tempo, the non-reproducibility nvlen[k] of the model utterance time clen[k] does not grow large but stays at a roughly constant magnitude. The series therefore cannot form a change pattern with strong periodicity.

If, instead, the singer cannot keep to the tempo of the target song and lags behind or rushes ahead of it, a repeating behavior is expected. First, nvlen[k] grows. The singer then notices the resulting timing deviation and changes the pitch of the constituent sound to match a model change timing. This cycle then repeats.

In this case, nvlen[k] repeatedly grows and then drops below its previous level, and this appears as strong periodicity in the series. The periodicity becomes stronger the less the singing follows the target song, that is, the less it is sung to the tempo.

Therefore, the lower the periodicity contained in the series of non-reproducibility nvlen[k], the more properly the target song can be said to be sung to its tempo, and the higher the followability of the singing.

In the embodiment above, collating the pitch transition patterns of the model voice and the singing voice identifies, along the time axis of the model voice, the model change timing whose transition pattern approximates the transition pattern at a singing change timing of the singing voice to at least a predetermined threshold (that is, with the maximum similarity). That model change timing is then identified as the one corresponding to the singing change timing (s150 in FIG. 2).

Also, the correspondence of change timings is first identified by collating the pitch transition patterns of the model voice and the singing voice (s150 in FIG. 2). The voice-level transition pattern is then examined in the sections where no correspondence was found. This makes it possible to determine which singing change timing corresponds to each model change timing between consecutive constituent sounds of the same pitch in the model voice (s170 in the same figure).

Also, the periodicity contained in the non-reproducibility series can be identified before the evaluation value for the followability of the singing is determined (s200 in FIG. 2).

Also, the "non-reproducibility series" is treated as a waveform whose amplitude varies with the magnitude of non-reproducibility, and the distribution of its frequency components is calculated. The smaller the sharpness (the so-called Q value) of a frequency component, the lower the periodicity in the change pattern is judged to be, and in that case a high evaluation value is output (s200 and s210 in FIG. 2).

If the periodicity in the non-reproducibility series is high, the spectral intensity of a specific frequency component should be large, and a peak appears in the frequency component distribution. Such a strong frequency component should also have a large sharpness. Conversely, if the periodicity in the series is low, the sharpness is small (see FIG. 5).

Therefore, in a configuration like the embodiment above, a smaller sharpness is taken to mean lower periodicity in the change pattern and results in a high evaluation value. The evaluation value is thus high exactly when the singing follows the target song well.

Also, the evaluation value can be determined from the sharpness of the frequency component with the largest spectral intensity in the distribution (s210 in FIG. 2).

Also, singing data can be acquired each time the user sings the target song (s110 in FIG. 2), and an evaluation value can be determined from that data and output (s120 to s210 in the same figure).

Also, if the result of conventional scoring is adjusted up or down according to the evaluation value SC, the scoring result that is reported reflects the evaluation value determined for followability (s220 in FIG. 2).

(5) Modifications

An embodiment of the present invention has been described above. The present invention is by no means limited to that embodiment, and it can of course take various forms within its technical scope.

For example, in the embodiment above the evaluation value is output by displaying it on the display unit 41 of the karaoke device 3 (s440 in FIG. 6). The evaluation value may instead be output, for example, as a message indicating it that is presented on a display unit of the server 2 or through a speaker.

Also, in the embodiment above, the model data is musical score data that represents each constituent sound of the model voice as a note. The model data may instead be data representing the pitch waveform or the voice-level waveform of the model voice.

Also, the embodiment above illustrates a karaoke system 1 in which the server 2 and the karaoke device 3 operate cooperatively. The system may instead consist of the server 2 alone, with the functions of the karaoke device 3 implemented on the server 2.

The server 2 in the embodiment above may also carry out part or all of its processing in cooperation with other devices, so that the whole, including those devices, functions as the server 2.

Also, in the embodiment above, the time difference between a model change timing and a singing change timing is calculated by collating transition patterns. The time difference may instead be calculated by comparison with the lyrics. The characters obtained by speech recognition of the singing voice, and the timings at which they were sung, are compared with the characters of the target song's lyrics and the timings at which they should be sung.

Also, in the embodiment above, the non-reproducibility nvlen[k] is calculated as vlen[k] / clen[k]. However, nvlen[k] only needs to indicate greater non-reproducibility the further it is from 1. For example, a value calculated as clen[k] / vlen[k] may be used instead.

Also, in the embodiment above, the model V_HM[i′] of Formula 4 is used to estimate the fundamental frequency in s130 of FIG. 2. The model used for this estimation is not limited to that one; for example, the model of Formula 11 below may be used.
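Formula 11 is not reproduced in this text, but the note on its parameter σ in this section offers a clue. It says that, σ bins away from a peak, the distribution has fallen to about 37% of the peak value. That matches e^(-1) ≈ 0.368, which is the value of a Gaussian kernel of the form exp(-(Δ/σ)²) at Δ = σ. The sketch below therefore assumes that Formula 11 is a sum of such Gaussians centered on the harmonics h·F₀. This reconstruction is an inference, not the source's formula.

```python
import math
from typing import List


def gaussian_harmonic_model(n_bins: int, f0_bin: float, sigma: float,
                            n_harmonics: int) -> List[float]:
    """Assumed form of Formula 11: sum over h of exp(-((i' - h * F0) / sigma) ** 2).

    A smaller sigma gives narrow, pointed harmonic peaks; a larger sigma gives
    broad, smooth ones. At i' = h * F0 +/- sigma each peak falls to exp(-1).
    """
    return [sum(math.exp(-((i - h * f0_bin) / sigma) ** 2)
                for h in range(1, n_harmonics + 1))
            for i in range(n_bins)]
```

Such a model could replace the plain comb in the s130 correlation, with the inner product taken over all bins instead of only the harmonic bins.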

Here, σ in Formula 11 is a parameter that adjusts the spread of the spectrum. It is the offset in frequency index i at which the distribution falls from its peak value to a predetermined fraction X% of that value (several tens of percent; about 37% under the conditions of this embodiment). The smaller this value, the narrower and sharper each component of the harmonic structure; the larger it is, the broader and smoother. The value of σ may be set to a value smaller than the predetermined fraction X% (as a concrete example, about 10 to 20%).

(6) Correspondence with the present invention

In the embodiment described above, the steps correspond to the means of the present invention as follows:
- s150 and s170 in FIG. 2: deviation specifying means
- s180: utterance specifying means
- s190: reproducibility specifying means
- s220: evaluation output means
- s200: period specifying means
- s110: singing data acquisition means
- s210: singing scoring means
- s440 in FIG. 6: result notification means

Description of reference numerals: 1: karaoke system; 2: server; 21: control unit; 23: storage unit; 25: communication unit; 27: user interface unit; 29: media drive; 3: karaoke device; 31: control unit; 33: storage unit; 35: communication unit; 41: display unit; 43: operation unit; 45: microphone; 47: speaker; 49: audio input/output unit; 100: network.

Claims

A followability evaluation system comprising:
timing collating means for collating each of the timings at which the constituent sounds of a singing voice change when a user sings a target musical piece (hereinafter referred to as “singing change timings”) against the timings at which the constituent sounds change in a model voice in which the target piece is sung appropriately (hereinafter referred to as “model change timings”) cst[k] (k = 1 to n), to determine which model change timing each singing change timing corresponds to;
deviation specifying means for specifying, for each singing change timing, a timing deviation amount vdt[k] relative to the model change timing that the timing collating means has collated as corresponding to that singing change timing;
utterance specifying means for specifying the utterance time vlen[k] of each constituent sound on the basis of adjacent model change timings cst[k] and cst[k+1] and the deviation amounts vdt[k] and vdt[k+1] specified by the deviation specifying means for those model change timings;
reproducibility specifying means for specifying, for each utterance time vlen[k] specified by the utterance specifying means, the non-reproducibility nvlen[k] of the model utterance time clen[k] in that utterance time, by comparing vlen[k] with the model utterance time clen[k] for which the sound is uttered when the piece is sung appropriately in the section between the model change timings cst[k] and cst[k+1] referred to in specifying vlen[k]; and
evaluation output means for arranging the non-reproducibilities nvlen[k] specified by the reproducibility specifying means in the order of arrival of the model change timings cst[k] referred to in specifying them, thereby forming a non-reproducibility series, and for outputting, as the followability of the singing with respect to the target piece, a higher evaluation value the lower the periodicity contained in the pattern of change of non-reproducibility in that series.
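The utterance-time and non-reproducibility steps of claim 1 can be sketched as follows. This is a minimal illustration under assumptions the claim does not state: the sung onset matched to model change timing cst[k] is taken to be cst[k] + vdt[k], the model utterance time clen[k] is taken to be cst[k+1] - cst[k], and nvlen[k] is taken to be the relative deviation |vlen[k] - clen[k]| / clen[k]. The function names are hypothetical.

```python
from typing import List, Sequence


def utterance_times(cst: Sequence[float], vdt: Sequence[float]) -> List[float]:
    """vlen[k]: sung duration of constituent sound k (assumed definition).

    Assumption: the sung onset matched to model change timing cst[k] is
    cst[k] + vdt[k], so vlen[k] is the gap between consecutive sung onsets.
    """
    if len(cst) != len(vdt):
        raise ValueError("cst and vdt must have the same length")
    return [(cst[k + 1] + vdt[k + 1]) - (cst[k] + vdt[k])
            for k in range(len(cst) - 1)]


def model_utterance_times(cst: Sequence[float]) -> List[float]:
    """clen[k]: model duration between cst[k] and cst[k+1] (assumed definition)."""
    return [cst[k + 1] - cst[k] for k in range(len(cst) - 1)]


def non_reproducibility(vlen: Sequence[float], clen: Sequence[float]) -> List[float]:
    """nvlen[k]: hypothetical relative deviation |vlen[k] - clen[k]| / clen[k]."""
    if len(vlen) != len(clen):
        raise ValueError("vlen and clen must have the same length")
    return [abs(v - c) / c for v, c in zip(vlen, clen)]


# Model change timings (seconds) and the singer's deviation at each of them.
cst = [0.0, 1.0, 2.0, 4.0]
vdt = [0.0, 0.1, 0.1, -0.2]
vlen = utterance_times(cst, vdt)
clen = model_utterance_times(cst)
nvlen = non_reproducibility(vlen, clen)
```

With these inputs vlen is approximately [1.1, 1.0, 1.7], clen is [1.0, 1.0, 2.0], and nvlen is approximately [0.1, 0.0, 0.15]; the nvlen list, taken in order of k, is the non-reproducibility series whose periodicity the evaluation output means examines.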

The followability evaluation system according to claim 1, wherein the timing collating means collates the pitch transition patterns of the singing voice and the model voice along their respective time axes, thereby determining which of the model change timings cst[k] corresponds to each singing change timing at which the pitch changes in the singing voice.
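Claim 2 does not say how the two pitch transition patterns are collated. One common way to align two pitch contours that differ in timing is dynamic time warping (DTW); the sketch below uses it purely as an illustrative substitute, not as the method of this publication. It assumes both voices have already been reduced to per-frame pitch values (here MIDI note numbers) on the same frame grid, and maps each model frame where the pitch changes to the first sung frame aligned with it. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> List[Tuple[int, int]]:
    """Minimum-cost monotone alignment of the frames of a with the frames of b."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both pitch sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def change_frames(pitch: Sequence[float]) -> List[int]:
    """Frames at which the pitch differs from the previous frame."""
    return [t for t in range(1, len(pitch)) if pitch[t] != pitch[t - 1]]


def match_change_timings(model_pitch: Sequence[float],
                         sung_pitch: Sequence[float]) -> Dict[int, int]:
    """Map each model change frame to the first sung frame DTW aligns with it."""
    first_sung: Dict[int, int] = {}
    for i, j in dtw_path(model_pitch, sung_pitch):
        first_sung.setdefault(i, j)
    return {t: first_sung[t] for t in change_frames(model_pitch)}


model_pitch = [60, 60, 62, 62, 64, 64]
sung_pitch = [60, 60, 60, 62, 62, 64]  # the singer reaches each new note one frame late
matches = match_change_timings(model_pitch, sung_pitch)
```

Here the model changes pitch at frames 2 and 4 and the sketch maps them to sung frames 3 and 5. Because the local cost is a plain absolute pitch difference, a practical system would probably also need octave-error handling and masking of unvoiced frames, which are omitted.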

The followability evaluation system according to claim 2, wherein, for a model change timing cst[k] whose correspondence with a singing change timing has not been specified, the timing collating means takes the section along the time axis of the singing voice that lies between singing change timings whose correspondence with model change timings has been specified and that corresponds to that model change timing cst[k], and specifies the timing within that section at which the audio level falls below a certain level as the singing change timing corresponding to that model change timing cst[k].
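The fallback in claim 3 can be sketched as a threshold search over a frame-level loudness envelope. Assumptions not taken from the source: the audio level is the RMS of fixed-length, non-overlapping frames, the section is given as a half-open frame range between two already-matched singing change timings, and the first frame whose level falls below the threshold is chosen. The names `frame_rms` and `level_drop_timing` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS level of each complete, non-overlapping frame of frame_len samples."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def level_drop_timing(levels: Sequence[float], start: int, end: int,
                      threshold: float) -> Optional[int]:
    """First frame in [start, end) whose level is below threshold, else None."""
    for t in range(max(start, 0), min(end, len(levels))):
        if levels[t] < threshold:
            return t
    return None


# A sustained note, a short dip (e.g. a breath between sounds), then the next note.
samples = [0.5, -0.5] * 4 + [0.01, -0.01] * 2 + [0.5, -0.5] * 4
levels = frame_rms(samples, 4)
```

For this signal the frame levels are roughly [0.5, 0.5, 0.01, 0.5, 0.5], so searching frames 1 to 3 with a threshold of 0.1 selects frame 2 as the substitute singing change timing.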

The followability evaluation system, further comprising period specifying means for specifying the periodicity contained in the non-reproducibility series,
wherein the evaluation output means outputs, as the followability of the singing with respect to the target piece, a higher evaluation value the lower the periodicity specified by the period specifying means.

The followability evaluation system according to claim 4, wherein the period specifying means treats the non-reproducibility series as a waveform that changes in the order of arrival of the model change timings cst[k], with the magnitude of the non-reproducibility as its amplitude, calculates the distribution of frequency components of that waveform, and thereby specifies the periodicity indicated by that distribution; and
the evaluation output means, on the basis of the distribution of frequency components calculated by the period specifying means, judges the periodicity contained in the time difference change pattern to be lower, and outputs a higher evaluation value, the lower the sharpness of the frequency components in that distribution.
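The frequency analysis of claim 5 can be sketched with a plain discrete Fourier transform of the mean-removed non-reproducibility series. Assumptions: the series is treated as evenly sampled in k (one sample per model change timing, ignoring the real time between them), the mean is removed so the zero-frequency bin does not dominate, and the distribution is the power of bins 1 to N//2. A library FFT would normally replace the O(N^2) loop; it is written out here only to keep the sketch dependency-free. The function name is hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(series: Sequence[float]) -> List[float]:
    """Power of DFT bins 1..N//2 of the mean-removed series; element i is bin i+1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(series) / n
    x = [v - mean for v in series]
    spectrum = []
    for f in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# A singer who alternates accurate and inaccurate note lengths produces a
# non-reproducibility series with period 2, i.e. power in the highest bin.
alternating = [0.0, 0.2] * 4
spec = power_spectrum(alternating)
```

For this eight-element series nearly all the power (about 0.64) falls in the last bin, which is the kind of sharp distribution the claim associates with high periodicity; a series whose deviations do not recur at a fixed interval spreads its power across the bins instead.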

The followability evaluation system according to claim 5, wherein the evaluation output means, on the basis of the distribution of frequency components calculated by the period specifying means, judges the periodicity contained in the time difference change pattern to be lower, and determines a higher evaluation value, the smaller the sharpness of the frequency component having the highest spectral intensity in that distribution.
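Claim 6 leaves the sharpness measure open. As one hypothetical choice, the sketch below takes the share of total spectral power held by the strongest bin, rescales it so that a perfectly flat spectrum maps to 0 and a single spectral line maps to 1, and converts the result into a 0 to 100 evaluation value that rises as the sharpness falls. The measure, the 0 to 100 scale and the function name are assumptions, not taken from the source.

```python
from typing import Sequence


def followability_score(spectrum: Sequence[float], max_score: float = 100.0) -> float:
    """Hypothetical evaluation value: higher when the strongest peak is less sharp.

    spectrum: power of each frequency bin (zero-frequency bin excluded) of the
    non-reproducibility series.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two spectral bins")
    total = sum(spectrum)
    if total <= 0.0:
        # No variation at all: nothing recurs, so periodicity is treated as zero.
        return max_score
    sharpness = max(spectrum) / total   # lies in [1/n, 1]
    floor = 1.0 / n                     # sharpness of a perfectly flat spectrum
    periodicity = (sharpness - floor) / (1.0 - floor)  # rescaled to [0, 1]
    return max_score * (1.0 - periodicity)
```

A flat spectrum such as [1.0, 1.0, 1.0, 1.0] scores 100, a single line such as [0.0, 0.0, 0.0, 0.64] scores 0, and [1.0, 2.0, 1.0, 1.0] falls in between at 80. Normalizing by the flat-spectrum floor keeps the score comparable between series of different lengths.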

The followability evaluation system according to any one of claims 1 to 6, further comprising singing data acquisition means for acquiring singing data representing the singing voice produced when the user sings the target piece, together with identification information capable of identifying the target piece that was sung,
wherein the timing collating means collates the singing voice represented by the singing data acquired by the singing data acquisition means with the model voice of the target piece identified by the identification information acquired together with that singing data.

A karaoke system comprising:
the followability evaluation system according to any one of claims 1 to 7;
singing scoring means for scoring the singing by dividing the singing voice represented by the singing data into unit sections of predetermined length along the time series and, for each unit section, comparing singing parameters relating to the sound of that unit section with ideal parameters based on the correct sound to be uttered in that unit section; and
result notification means for notifying the scoring result produced by the singing scoring means,
wherein the singing scoring means determines a final scoring result by adding to or subtracting from the scoring result obtained from the comparison between the singing parameters and the ideal parameters in accordance with the evaluation value output by the evaluation output means.
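The scoring adjustment of claim 8 can be sketched as follows. Assumptions not in the source: the singing parameter of each unit section is a pitch value in semitones, a unit section counts as correct when it lies within a tolerance of the ideal pitch, the base score is the percentage of correct sections, and the followability evaluation value (0 to 100) adds or subtracts points linearly around a neutral value of 50, with the result clamped to 0 to 100. All names and constants are hypothetical.

```python
from typing import Sequence


def base_score(sung: Sequence[float], ideal: Sequence[float],
               tolerance: float = 0.5) -> float:
    """Percentage of unit sections whose sung pitch is within tolerance of the ideal."""
    if len(sung) != len(ideal) or not ideal:
        raise ValueError("need equally long, non-empty parameter sequences")
    hits = sum(1 for s, i in zip(sung, ideal) if abs(s - i) <= tolerance)
    return 100.0 * hits / len(ideal)


def final_score(base: float, evaluation: float,
                neutral: float = 50.0, weight: float = 0.1) -> float:
    """Add points for followability above neutral, subtract below; clamp to 0..100."""
    adjusted = base + weight * (evaluation - neutral)
    return max(0.0, min(100.0, adjusted))


sung = [60.0, 62.2, 64.0, 65.0]   # one pitch per unit section (semitones)
ideal = [60.0, 62.0, 64.0, 67.0]
base = base_score(sung, ideal)
```

With these values the base score is 75; a followability evaluation of 90 raises it to 79 and one of 10 lowers it to 71. A linear adjustment around a neutral point is only one way to realize the claimed addition or subtraction.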

A program for causing a computer system to execute the processing procedures by which the computer system functions as all of the means according to claim 1.