JP2010197738A

JP2010197738A - Tone pitch determination system, register determination system, and program

Info

Publication number: JP2010197738A
Application number: JP2009042811A
Authority: JP
Inventors: Noriaki Asemi; 典昭阿瀬見
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-02-25
Filing date: 2009-02-25
Publication date: 2010-09-09
Anticipated expiration: 2029-02-25
Also published as: JP5298945B2

Abstract

PROBLEM TO BE SOLVED: To properly determine the tone pitches and register that suit a user.

SOLUTION: The transition pattern of the singing tone pitch along the time axis when a user sings a song is compared with the transition pattern of the guide tone pitch k when the song is sung properly. The error d between the two patterns is calculated and reflected in the user's error distribution. Then, the guide tone pitch k that minimizes the error ratio |Δd[k−1]/Δd[k]|, obtained by comparing the differences Δd between the errors d at the adjacent guide tone pitches k−1 and k+1, is determined to be the highest tone pitch kup in the register of the user corresponding to that error distribution (s310-s330). Similarly, the guide tone pitch k that minimizes the reciprocal error ratio |Δd[k]/Δd[k−1]| is determined to be the lowest tone pitch klo in the user's register (s410-s430). The user's register is determined from the highest pitch kup and the lowest pitch klo (s260).

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a system for determining the pitches within a user's vocal range.

In recent years, various techniques have been proposed for identifying the vocal range of a user who has sung a song.

For example, one technique detects the highest pitch and the lowest pitch in the voice input from a microphone while a user sings a song, and determines the span from the detected highest pitch down to the detected lowest pitch to be that user's range (see Patent Document 1).

Patent Document 1: JP 2002-73058 A

However, because this technique simply takes the span between the highest and lowest pitches detected during singing as the user's range, it treats every detected pitch as one the user can utter properly, regardless of whether it was actually sung properly, and then takes the range bounded by those detected pitches as the user's range.

Specifically, suppose a user sings a song containing a section whose pitch is difficult for that user. Even if the user momentarily reaches the correct pitch in that section, the pitch transition pattern in that section is likely to differ greatly from the pattern produced when the section is sung properly.

Such a pitch is one the user is forcing out, and it can hardly be called a pitch the user can produce in a state suitable for singing. It is therefore desirable that such a pitch not be judged to be one the user can utter, and that the user's range be determined so as to exclude it.

The present invention was made to solve this problem, and its object is to provide a technique for more appropriately determining the pitches and the range that a user can utter.

To solve the above problem, a first configuration comprises:
error calculation means that, based on singing data representing the transition of pitch along the time axis as a user sings a song and on guide data representing the transition of pitch along the time axis when that song is sung properly, compares the transition pattern of the pitch (hereinafter "singing pitch") in each unit section on the time axis of the singing data (hereinafter "singing section") with the transition pattern of the pitch (hereinafter "guide pitch") in the unit section on the time axis of the guide data (hereinafter "guide section") that corresponds to that singing section, and calculates the error between the two patterns as the error of the singing pitch relative to the guide pitch k (any of 1 to n) to be uttered in that guide section;
distribution updating means that, among error distributions prepared for each of a plurality of users, each of which distributes the errors of that user's singing pitch relative to the guide pitch over the guide pitches, updates the error distribution of the user whose singing triggered the calculation, by additionally distributing each calculated error into it as the error d[k] of the singing pitch relative to the guide pitch k of the guide section referred to in the calculation; and
highest-note determination means that extracts, from among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k minimizing the error ratio |Δd[k−1]/Δd[k]|, which compares the difference Δd[k−1] (= d[k] − d[k−1]) from the error d[k−1] at the adjacent lower guide pitch k−1 with the difference Δd[k] (= d[k+1] − d[k]) from the error d[k+1] at the adjacent higher guide pitch k+1, and determines the extracted guide pitch k to be the highest note kup in the range of the user corresponding to that error distribution.

In the pitch determination system of this configuration, the transition pattern of the pitch along the time axis as the user sings (the singing pitch) is first compared with the transition pattern of the pitch when the song is sung properly (the guide pitch); the error d between these patterns is calculated and reflected in the user's error distribution.

Then, among guide pitches 1 to n in the error distribution, the guide pitch k that minimizes the error ratio |Δd[k−1]/Δd[k]|, which compares the differences Δd between the errors d at adjacent guide pitches, is determined to be the highest note kup in the range of the user corresponding to that error distribution.

Because the error distribution collects, for each guide pitch k, the error d[k] between the singing-pitch transition pattern and the guide pitch k of a specific guide section, a guide pitch k with a small error d[k] can be regarded as one the user utters properly, with a similar transition pattern, whereas a guide pitch with a large error d[k] can be regarded as one the user is forcing.

Consequently, in the high-side inflection region, where the guide pitches go from k, which the user utters properly, to k+1, which the user forces, the difference Δd[k] from the error d[k+1] at the adjacent higher guide pitch k+1 tends to increase sharply relative to the difference Δd[k−1] from the error d[k−1] at the adjacent lower guide pitch k−1. This tendency is expected to be most pronounced near the highest guide pitch k that the user can utter properly; there, the error ratio |Δd[k−1]/Δd[k]| reaches its minimum because its denominator Δd[k] becomes markedly large.

For this reason, the above configuration can identify the guide pitch k at which the error ratio |Δd[k−1]/Δd[k]| is smallest as the highest pitch kup among the guide pitches the user can utter properly.
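The selection rule for kup can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the error distribution has already been reduced to one error value d[k] per guide pitch, stored in a 0-based list from the lowest guide pitch upward, and the function name `highest_note` is hypothetical. Pitches without both neighbours, and pitches where Δd[k] is zero (ratio undefined), are skipped.

```python
from typing import Optional, Sequence


def highest_note(d: Sequence[float]) -> Optional[int]:
    """Return the guide pitch index k that minimises |Δd[k-1] / Δd[k]|.

    d[k] is a hypothetical per-pitch error taken from the user's error
    distribution. Returns None if no interior pitch has a defined ratio.
    """
    best_k: Optional[int] = None
    best_ratio = float("inf")
    # Only interior pitches have both a lower (k-1) and a higher (k+1) neighbour.
    for k in range(1, len(d) - 1):
        lower_diff = d[k] - d[k - 1]  # Δd[k-1]
        upper_diff = d[k + 1] - d[k]  # Δd[k]
        if upper_diff == 0:
            continue  # ratio undefined; skip this pitch
        ratio = abs(lower_diff / upper_diff)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Made-up errors: small up to index 4, then climbing steeply.
errors = [0.30, 0.20, 0.21, 0.20, 0.22, 0.60, 1.20, 1.90]
print(highest_note(errors))  # 4
```

At index 4 the step up to the next pitch (0.38) dwarfs the step from the previous one (0.02), so the ratio there is far smaller than anywhere else.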

In this configuration, it is desirable, in terms of processing load and determination accuracy, to restrict to some extent the range of guide pitches k referred to when determining the highest note kup. One way to do this is the second configuration (claim 2) described below.

In this configuration, the highest-note determination means extracts, from among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch for which the difference Δd[k] is greater than 0 and the error ratio |Δd[k−1]/Δd[k]| is smallest.

This configuration limits the guide pitches k considered when determining the highest note kup to those whose difference Δd[k] is greater than 0.

In the inflection region described above, the error d[k+1] at the adjacent higher guide pitch k+1 is expected to keep increasing as the guide pitch rises above the highest note that can be uttered properly, so the difference Δd[k] (= d[k+1] − d[k]) between d[k] and d[k+1] is naturally greater than 0.

Therefore, even if the candidate guide pitches k are limited to those with Δd[k] greater than 0, the guide pitch k that should be extracted as the highest note kup is never excluded, so the processing load can be reduced appropriately.

Furthermore, because only suitable guide pitches k are considered, the accuracy of determining kup is kept from degrading due to, for example, extracting an unsuitable guide pitch k.
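The effect of the second configuration's Δd[k] > 0 filter can be seen in a small Python sketch. It assumes, as a hypothetical simplification, one error value per guide pitch in a 0-based list; the function name and the `require_rising` flag are illustrative, not taken from the source.

```python
from typing import Optional, Sequence


def highest_note(d: Sequence[float], require_rising: bool = True) -> Optional[int]:
    """Pick k minimising |Δd[k-1] / Δd[k]|, optionally only where Δd[k] > 0."""
    best_k: Optional[int] = None
    best_ratio = float("inf")
    for k in range(1, len(d) - 1):
        lower_diff = d[k] - d[k - 1]  # Δd[k-1]
        upper_diff = d[k + 1] - d[k]  # Δd[k]
        if upper_diff == 0:
            continue  # ratio undefined
        if require_rising and upper_diff < 0:
            continue  # second configuration: keep only pitches with Δd[k] > 0
        ratio = abs(lower_diff / upper_diff)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Made-up data: a flat stretch (0.3, 0.3) followed by a dip gives a zero ratio at k=2.
errors = [0.5, 0.3, 0.3, 0.25, 0.26, 0.7, 1.4]
print(highest_note(errors, require_rising=False))  # 2 (spurious pick)
print(highest_note(errors))                        # 4
```

Without the filter, the zero numerator at index 2 wins; with it, index 2 is discarded because the error falls towards the next pitch there, and the real inflection at index 4 is found.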

Further, in each of the above configurations, the guide pitch k that becomes the highest note kup lies in the high-side inflection region among all guide pitches in the error distribution, so the guide pitches k referred to when determining kup may be limited to the band forming that region. One way to do this is the third configuration (claim 3) described below.

This configuration includes high-range error inflection band extraction means that extracts, from among the guide pitches in the error distribution updated by the distribution updating means, a band consisting of a predetermined number of guide pitches on the higher-pitch side as the high-range error inflection band. The highest-note determination means then extracts the guide pitch that becomes the highest note kup from among the guide pitches in the band extracted by the high-range error inflection band extraction means.

By limiting in advance the guide pitches k considered when determining kup to the high-range error inflection band, this configuration reduces the processing load of the determination.

The specific means of extracting the high-range error inflection band in this configuration is not particularly limited; for example, the fourth configuration (claim 4) described below may be used.

In this configuration, the high-range error inflection band extraction means extracts a band consisting of guide pitches that lie on the higher-pitch side of the error distribution updated by the distribution updating means and whose rate of change in error relative to each adjacent guide pitch is at or above a fixed value.

In the high-pitch band of the error distribution, the error is expected to grow as the guide pitch k rises above the pitches that can be uttered properly, and as it grows, the change in error between adjacent guide pitches grows as well.

Therefore, by setting the "fixed rate of change" used for band extraction to the value expected for pitches above the guide pitches the user can utter properly, an appropriate band can be extracted as the high-range error inflection band.
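The fourth configuration's threshold-based band extraction can be sketched as below. The source does not define "rate of change" or how large the higher-pitch side is, so this sketch assumes, hypothetically, that the rate is the per-step difference Δd[k] = d[k+1] − d[k] and that the higher-pitch side is the top `high_fraction` of the guide pitches. The function name and parameters are illustrative.

```python
from typing import List, Sequence


def high_inflection_band(d: Sequence[float], threshold: float,
                         high_fraction: float = 0.5) -> List[int]:
    """Return indices k in the upper part of d with Δd[k] >= threshold.

    Assumptions (hypothetical): rate of change = d[k+1] - d[k], and the
    candidate region is the top `high_fraction` of all guide pitches.
    k starts at 1 or above so each candidate also has a lower neighbour.
    """
    start = max(1, len(d) - int(len(d) * high_fraction))
    return [k for k in range(start, len(d) - 1) if d[k + 1] - d[k] >= threshold]


errors = [0.30, 0.20, 0.21, 0.20, 0.22, 0.60, 1.20, 1.90]
print(high_inflection_band(errors, threshold=0.1))  # [4, 5, 6]
```

The kup search from the first configuration would then run only over the returned indices, which is where the processing-load saving comes from.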

More specifically, this configuration may take the form of the fifth configuration (claim 5) described below.

In this configuration, the high-range error inflection band extraction means classifies the guide pitches located at least on the higher-pitch side of the error distribution updated by the distribution updating means into groups of a predetermined number of guide pitches and averages the errors of the guide pitches in each group. Then, for one or more groups including the group whose adjacent error ratio (the average error of the adjacent higher group divided by that group's average error) is largest, it extracts the band consisting of the guide pitches classified into those groups.

With this configuration, the guide pitch errors are averaged per group of a predetermined number of guide pitches, and the high-range error inflection band consisting of the guide pitches in the one or more groups including the group with the largest adjacent error ratio can be extracted.

In this configuration, the "guide pitches located on the higher-pitch side" that are classified into groups need only lie on the higher-pitch side of the error distribution; for example, they may be a fixed proportion (say, several tens of percent) of the whole guide-pitch range at its high end.
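The grouping procedure of the fifth configuration can be sketched as follows. Several details are assumptions made only for illustration: the group size, the 50 % high-end portion, strictly positive errors, and returning exactly the maximising group plus the group just above it (the source only requires "one or more groups" that include the maximising group). The function name is hypothetical.

```python
from typing import List, Sequence


def high_band_by_groups(d: Sequence[float], group_size: int = 2,
                        high_fraction: float = 0.5) -> List[int]:
    """Return the guide pitches of the group with the largest adjacent error
    ratio, mean(next higher group) / mean(this group), plus the next group.
    """
    start = len(d) - int(len(d) * high_fraction)
    pitches = list(range(start, len(d)))
    groups = [pitches[i:i + group_size] for i in range(0, len(pitches), group_size)]
    if len(groups) < 2:
        return pitches  # too few groups to form a ratio
    means = [sum(d[k] for k in g) / len(g) for g in groups]  # per-group average error
    ratios = [means[g + 1] / means[g] for g in range(len(groups) - 1)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return groups[best] + groups[best + 1]


errors = [0.9, 0.5, 0.3, 0.25, 0.2, 0.22, 0.2, 0.21, 0.23, 0.5, 1.1, 1.8]
print(high_band_by_groups(errors))  # [8, 9, 10, 11]
```

Averaging within groups smooths out pitch-to-pitch noise, so the band is chosen from the coarse trend rather than from a single noisy difference.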

To solve the above problem, a sixth configuration comprises:
error calculation means that, based on singing data representing the transition of pitch along the time axis as a user sings a song and on guide data representing the transition of pitch along the time axis when that song is sung properly, compares the transition pattern of the pitch (hereinafter "singing pitch") in each unit section on the time axis of the singing data (hereinafter "singing section") with the transition pattern of the pitch (hereinafter "guide pitch") in the unit section on the time axis of the guide data (hereinafter "guide section") that corresponds to that singing section, and calculates the error between the two patterns as the error of the singing pitch relative to the guide pitch k (any of 1 to n) to be uttered in that guide section;
distribution updating means that, among error distributions prepared for each of a plurality of users, each of which distributes the errors of that user's singing pitch relative to the guide pitch over the guide pitches, updates the error distribution of the user whose singing triggered the calculation, by additionally distributing each calculated error into it as the error d[k] of the singing pitch relative to the guide pitch k of the guide section referred to in the calculation; and
lowest-note determination means that extracts, from among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k minimizing the error ratio |Δd[k]/Δd[k−1]|, which compares the difference Δd[k] (= d[k+1] − d[k]) from the error d[k+1] at the adjacent higher guide pitch k+1 with the difference Δd[k−1] (= d[k] − d[k−1]) from the error d[k−1] at the adjacent lower guide pitch k−1, and determines the extracted guide pitch k to be the lowest note klo in the range of the user corresponding to that error distribution.

In the pitch determination system of this configuration, as in the configuration above, the user's error distribution is updated; then, among guide pitches 1 to n in that distribution, the guide pitch k that minimizes the error ratio |Δd[k]/Δd[k−1]|, which compares the differences between the errors at adjacent guide pitches, is determined to be the lowest note klo in the range of the user corresponding to that error distribution.

As described above, a guide pitch k with a small error d[k] can be regarded as one the user utters properly, with a similar transition pattern, whereas a guide pitch with a large error d[k] can be regarded as one the user is forcing.

Consequently, in the low-side inflection region, where the guide pitches go from those the user forces to those the user utters properly, the difference Δd[k−1] from the error d[k−1] at the adjacent lower guide pitch k−1 tends to grow sharply in magnitude relative to the difference Δd[k] from the error d[k+1] at the adjacent higher guide pitch k+1. This tendency is expected to be most pronounced near the lowest guide pitch k that the user can utter properly; there, the error ratio |Δd[k]/Δd[k−1]| reaches its minimum because its denominator Δd[k−1] becomes markedly large.

For this reason, the above configuration can identify the guide pitch k at which the error ratio |Δd[k]/Δd[k−1]| is smallest as the lowest pitch klo among the guide pitches the user can utter properly.
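The mirror-image rule for klo can be sketched the same way. Again this is only an illustration under a hypothetical simplification (one error value d[k] per guide pitch, 0-based from the lowest pitch); the function name is not from the source.

```python
from typing import Optional, Sequence


def lowest_note(d: Sequence[float]) -> Optional[int]:
    """Return the guide pitch index k that minimises |Δd[k] / Δd[k-1]|."""
    best_k: Optional[int] = None
    best_ratio = float("inf")
    for k in range(1, len(d) - 1):
        lower_diff = d[k] - d[k - 1]  # Δd[k-1], the denominator here
        upper_diff = d[k + 1] - d[k]  # Δd[k]
        if lower_diff == 0:
            continue  # ratio undefined
        ratio = abs(upper_diff / lower_diff)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Made-up errors: steeply falling down to index 3, then flat.
errors = [1.90, 1.20, 0.60, 0.22, 0.20, 0.21, 0.20, 0.30]
print(lowest_note(errors))  # 3
```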

In this configuration, it is desirable, in terms of processing load and determination accuracy, to restrict to some extent the range of guide pitches k referred to when determining the lowest note klo. One way to do this is the seventh configuration (claim 7) described below.

In this configuration, the lowest-note determination means extracts, from among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k for which the difference Δd[k] is less than 0 and the error ratio |Δd[k]/Δd[k−1]| is smallest.

This configuration limits the guide pitches k considered when determining the lowest note klo to those whose difference Δd[k] is less than 0.

In the inflection region described above, the error d[k] is expected to keep increasing as the guide pitch k falls below the lowest note that can be uttered properly, so d[k] exceeds the error d[k+1] at the adjacent higher guide pitch k+1, and the difference Δd[k] (= d[k+1] − d[k]) is naturally less than 0.

Therefore, even if the candidate guide pitches k are limited to those with Δd[k] less than 0, the guide pitch k that should be extracted as the lowest note klo is never excluded, so the processing load can be reduced appropriately.

Furthermore, because only suitable guide pitches k are considered, the accuracy of determining klo is kept from degrading due to, for example, extracting an unsuitable guide pitch k.
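The seventh configuration's Δd[k] < 0 filter can be illustrated with a small sketch, again assuming one error value per guide pitch in a 0-based list. The function name and the `require_falling` flag are hypothetical.

```python
from typing import Optional, Sequence


def lowest_note(d: Sequence[float], require_falling: bool = True) -> Optional[int]:
    """Pick k minimising |Δd[k] / Δd[k-1]|, optionally only where Δd[k] < 0."""
    best_k: Optional[int] = None
    best_ratio = float("inf")
    for k in range(1, len(d) - 1):
        lower_diff = d[k] - d[k - 1]  # Δd[k-1]
        upper_diff = d[k + 1] - d[k]  # Δd[k]
        if lower_diff == 0:
            continue  # ratio undefined
        if require_falling and upper_diff >= 0:
            continue  # seventh configuration: keep only pitches with Δd[k] < 0
        ratio = abs(upper_diff / lower_diff)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Made-up data: a flat stretch (0.3, 0.3) on the high side gives a zero ratio at k=4.
errors = [1.4, 0.7, 0.26, 0.25, 0.3, 0.3, 0.5]
print(lowest_note(errors, require_falling=False))  # 4 (spurious pick)
print(lowest_note(errors))                         # 2
```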

Further, in each of the above configurations, the guide pitch k that becomes the lowest note klo lies in the low-side inflection region among all guide pitches in the error distribution, so the guide pitches k referred to when determining klo may be limited to the band forming that region. One way to do this is the eighth configuration (claim 8) described below.

This configuration includes low-range error inflection band extraction means that extracts, from among the guide pitches in the error distribution updated by the distribution updating means, a band consisting of a predetermined number of guide pitches on the lower-pitch side as the low-range error inflection band. The lowest-note determination means then extracts the guide pitch that becomes the lowest note klo from among the guide pitches in the band extracted by the low-range error inflection band extraction means.

By limiting in advance the guide pitches k considered when determining klo to the low-range error inflection band, this configuration reduces the processing load required for extraction and determination.

The specific means of extracting the low-range error inflection band in this configuration is not particularly limited; for example, the ninth configuration (claim 9) described below may be used.

In this configuration, the low-range error inflection band extraction means extracts a band consisting of guide pitches that lie on the lower-pitch side of the error distribution updated by the distribution updating means and whose rate of change in error relative to each adjacent guide pitch is at or above a fixed value.

In the low-pitch band of the error distribution, the error is expected to grow as the guide pitch k falls below the pitches that can be uttered properly, and as it grows, the change in error between adjacent guide pitches grows as well.

Therefore, by setting the "fixed rate of change" used for band extraction to the value expected for pitches below the guide pitches the user can utter properly, an appropriate band can be extracted as the low-range error inflection band.
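A sketch of the ninth configuration, under the same hypothetical assumptions as the high-side version: the "rate of change" is taken to be the drop in error from the lower neighbour, d[k−1] − d[k] (that is, −Δd[k−1]), and the lower-pitch side is the bottom `low_fraction` of the guide pitches. Index 0 is excluded because it has no lower neighbour. Names are illustrative.

```python
from typing import List, Sequence


def low_inflection_band(d: Sequence[float], threshold: float,
                        low_fraction: float = 0.5) -> List[int]:
    """Return indices k in the lower part of d whose error dropped by at
    least `threshold` from the next lower guide pitch (d[k-1] - d[k])."""
    # Keep k <= len(d) - 2 so each candidate also has a higher neighbour.
    stop = min(len(d) - 1, int(len(d) * low_fraction))
    return [k for k in range(1, stop) if d[k - 1] - d[k] >= threshold]


errors = [1.90, 1.20, 0.60, 0.22, 0.20, 0.21, 0.20, 0.30]
print(low_inflection_band(errors, threshold=0.1))  # [1, 2, 3]
```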

More specifically, this configuration may take the form of the tenth configuration (claim 10) described below.

In this configuration, the low-range error inflection band extraction means classifies the guide pitches located at least on the lower-pitch side of the error distribution updated by the distribution updating means into groups of a predetermined number of guide pitches and averages the errors of the guide pitches in each group. Then, for one or more groups including the group whose adjacent error ratio (the average error of the adjacent higher group divided by that group's average error) is smallest, it extracts the band consisting of the guide pitches classified into those groups.

With this configuration, the guide pitch errors are averaged per group of a predetermined number of guide pitches, and the low-range error inflection band consisting of the guide pitches in the one or more groups including the group with the smallest adjacent error ratio can be extracted.

In this configuration, the "guide pitches located on the lower-pitch side" that are classified into groups need only lie on the lower-pitch side of the error distribution; for example, they may be a fixed proportion (say, several tens of percent) of the whole guide-pitch range at its low end.
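The tenth configuration's grouping can be sketched by mirroring the high-side version. As before, the group size, the 50 % low-end portion, positive errors, and returning exactly the minimising group plus the group just above it are illustrative assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence


def low_band_by_groups(d: Sequence[float], group_size: int = 2,
                       low_fraction: float = 0.5) -> List[int]:
    """Return the guide pitches of the group with the smallest adjacent error
    ratio, mean(next higher group) / mean(this group), plus the next group.
    """
    pitches = list(range(int(len(d) * low_fraction)))
    groups = [pitches[i:i + group_size] for i in range(0, len(pitches), group_size)]
    if len(groups) < 2:
        return pitches  # too few groups to form a ratio
    means = [sum(d[k] for k in g) / len(g) for g in groups]  # per-group average error
    ratios = [means[g + 1] / means[g] for g in range(len(groups) - 1)]
    best = min(range(len(ratios)), key=ratios.__getitem__)
    return groups[best] + groups[best + 1]


errors = [1.8, 1.1, 0.5, 0.23, 0.21, 0.2, 0.22, 0.2, 0.25, 0.3, 0.5, 0.9]
print(low_band_by_groups(errors))  # [0, 1, 2, 3]
```

On the low side the errors fall as pitch rises, so the steepest drop shows up as the smallest ratio rather than the largest.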

Further, in each of the above configurations, the error of the singing pitch relative to the guide pitch in a specific guide section may be any measure. For example, one may use a numerical value for the difference in shape between the singing-pitch transition pattern and the guide-pitch transition pattern, or a numerical value for the difference between the pitches finally reached in the two patterns.
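The two example error measures can be sketched as follows. The source does not specify either formula, so this assumes, hypothetically, that both curves are sampled at the same time points in semitone units (for example MIDI note numbers), uses the mean absolute difference as the "shape" measure, and compares only the last sample for the "finally reached" measure. Function names are illustrative.

```python
from typing import Sequence


def shape_error(sung: Sequence[float], guide: Sequence[float]) -> float:
    """Mean absolute difference between two equally sampled pitch curves."""
    if len(sung) != len(guide) or not guide:
        raise ValueError("curves must be non-empty and the same length")
    return sum(abs(s - g) for s, g in zip(sung, guide)) / len(guide)


def final_pitch_error(sung: Sequence[float], guide: Sequence[float]) -> float:
    """Absolute difference between the last pitch reached in each curve."""
    return abs(sung[-1] - guide[-1])


sung = [60.0, 61.0, 62.5, 63.5]
guide = [60.0, 62.0, 64.0, 64.0]
print(shape_error(sung, guide), final_pitch_error(sung, guide))  # 0.75 0.5
```

The shape measure penalises a sluggish approach to the target, while the final-pitch measure only asks whether the target was eventually reached; which is preferable depends on what the distribution is meant to capture.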

When acquiring guide data in the above configurations, the guide data corresponding to the song the user sang in the singing data may be acquired from among multiple guide data prepared in advance. Which guide data "corresponds to the song the user sang" can be identified by associating the singing data with the song the user sang and using that association.

Likewise, when updating the error distribution of the relevant user, the singing data may be associated with the user who sang it, and the error distribution of the user identified through that association may be updated.
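A per-user error distribution keyed through that association might look like the sketch below. The storage layout, the use of string user IDs and integer guide pitches, and taking d[k] as the mean of all section errors recorded for pitch k are all assumptions for illustration; the source does not say how the distribution is aggregated. The class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class ErrorDistributions:
    """Per-user store of section errors, bucketed by guide pitch."""

    def __init__(self) -> None:
        # user id -> guide pitch -> list of section errors
        self._errors: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))

    def update(self, user_id: str, section_errors: Iterable[Tuple[int, float]]) -> None:
        """Add (guide_pitch, error) pairs from one performance to the user's distribution."""
        for guide_pitch, error in section_errors:
            self._errors[user_id][guide_pitch].append(error)

    def distribution(self, user_id: str) -> Dict[int, float]:
        """Return d[k] for each guide pitch k, taken here as the mean error."""
        per_pitch = self._errors.get(user_id, {})
        return {k: sum(v) / len(v) for k, v in sorted(per_pitch.items())}


store = ErrorDistributions()
store.update("alice", [(60, 0.25), (62, 1.0)])
store.update("alice", [(60, 0.75)])
print(store.distribution("alice"))  # {60: 0.5, 62: 1.0}
```

Keeping the raw errors rather than a running mean makes it easy to swap in another aggregate (median, trimmed mean) later.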

The singing data may be acquired in either of two ways. It may be generated separately as the user sings and then acquired from outside the system, or it may be generated and acquired each time the user sings a song.

To generate the singing data in the latter way, it suffices to generate it from voice data of the user singing the song. For this purpose, the above configuration may be modified as in configuration A below.

In this configuration, the voice data of a user singing a song is processed as follows. The pitch at each position along the time axis of the voice data is calculated, and data indicating the transition of that pitch along the time axis is acquired as the singing data.
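As a rough illustration of configuration A, the sketch below turns sampled voice data into a pitch track. It estimates the fundamental frequency of successive frames with plain autocorrelation. The patent does not specify a pitch-estimation method, so several details are assumptions:
- the use of autocorrelation itself;
- the frame length and hop size;
- the 80 to 1000 Hz search range;
- the function names, which are hypothetical.

```python
import math
from typing import List, Optional


def frame_pitch(frame: List[float], sr: int,
                fmin: float = 80.0, fmax: float = 1000.0) -> Optional[float]:
    """Estimate one frame's fundamental frequency (Hz) by autocorrelation.

    Returns None for a (near-)silent frame, i.e. an unvoiced position.
    """
    if sum(x * x for x in frame) < 1e-9:
        return None
    lag_min = max(1, int(sr / fmax))               # shortest period considered
    lag_max = min(len(frame) - 1, int(sr / fmin))  # longest period considered
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else None


def pitch_track(samples: List[float], sr: int,
                frame_len: int = 1024, hop: int = 512) -> List[Optional[float]]:
    """Pitch at successive positions along the time axis (the singing data)."""
    return [frame_pitch(samples[s:s + frame_len], sr)
            for s in range(0, len(samples) - frame_len + 1, hop)]
```

For example, half a second of a 220 Hz sine sampled at 8 kHz yields six frames, each estimated close to 220 Hz. A production system would more likely use a normalized or more robust estimator.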

With this configuration, singing data can be generated from the voice data of the user's singing, and the pitch can be determined on that basis.

The singing data described above indicates the transition of pitch along the time axis of the singing. Within each singing section, however, the pitch may actually begin to change at a different moment from when it would change if the song were sung properly.

In this case, calculating the error simply by comparing the transition patterns, as described above, lets such a timing offset inflate the error. For the purpose of determining the user's pitch, however, the shape of the transition pattern matters more than its timing. It is therefore desirable to compensate for such timing offsets in advance.

To compensate for the timing offset, each of the above configurations may be modified as in configuration B below.

In this configuration, the position of each singing section indicated by the singing data is first corrected on the time axis. The correction maximizes the degree of approximation between the singing-pitch transition pattern in that section and the guide-pitch transition pattern in the corresponding guide section indicated by the guide data.

The error calculation means then takes each unit section of the corrected singing data and compares its singing-pitch transition pattern with the guide-pitch transition pattern of the corresponding guide section. The error between the two patterns is calculated as the error of the singing pitch relative to the guide pitch to be voiced in that guide section.

With this configuration, the singing pitch is repositioned on the time axis so that its transition pattern most closely approximates the guide-pitch transition pattern. Closely matching transition patterns also means closely matching timings of pitch change. This therefore compensates for offsets in the moment at which a pitch change begins.

Any technique may be used to correct the position on the time axis so that the transition patterns match; for example, a technique such as that described in JP-A-2005-107330.

To solve the above problem, an eleventh configuration (claim 11) is a sound range determination system with the following features:
- It includes all the means of any one of the first to fifth configurations and of any one of the sixth to tenth configurations.
- It further includes sound range determination means. These means take the range of pitches from the highest tone (as determined by the highest tone determination means) to the lowest tone (as determined by the lowest tone determination means). That range is determined to be the sound range of the user whose error distribution was referred to in those determinations.

With this configuration, as described above, the highest tone kup and the lowest tone klo are determined so as to exclude pitches that the user voices only by straining. The user's sound range can therefore be appropriately determined from them.

To solve the above problem, a twelfth configuration (claim 12) is a program that causes a computer to execute processing procedures so that it functions as all the means of any one of the first to fifth configurations.

A computer controlled by this program can function as part of any one of the first to fifth configurations.

To solve the above problem, a thirteenth configuration (claim 13) is a program that causes a computer to execute processing procedures so that it functions as all the means of any one of the sixth to twelfth configurations.

A computer controlled by this program can function as part of any one of the sixth to twelfth configurations.

To solve the above problem, a fourteenth configuration (claim 14) is a program that causes a computer to execute processing procedures so that it functions as all the means of the eleventh configuration.

A computer controlled by this program can function as part of the eleventh configuration.

The program described above consists of an ordered sequence of instructions suited to processing by a computer system. It is provided to pitch determination systems, and to the users of such systems, via various recording media and communication lines.

Block diagram showing the overall configuration of the sound range determination system
Flowchart showing the error aggregation process
Diagram showing the pitch transition patterns indicated by the singing data and the guide data
Diagram showing (a) the voice waveform indicated by the voice data and (b) the pitch transition pattern indicated by the singing data
Diagram showing the error distribution
Flowchart showing the sound range determination process (1/2)
Flowchart showing the sound range determination process (2/2)

Embodiments of the present invention are described below with reference to the drawings.
(1) Hardware Configuration
The sound range determination system 1 is realized by installing a program on a terminal device or a karaoke device built from a known computer system.

First, consider the case where the program is installed on a "terminal device". As shown in FIG. 1(a), the hardware configuration then includes:
- a control unit 11 that controls the entire system;
- a storage unit 13 that stores various information;
- a communication unit 15 that controls communication via a network 2;
- a user interface (U/I) unit 17 including a keyboard, a display, and the like;
- a media drive 19 that inputs and outputs information via a recording medium.

In this configuration, a predetermined command may be received from outside via the user interface unit 17 or the communication unit 15. The control unit 11 then executes various processes according to the program stored in the storage unit 13, and the device thereby functions as the sound range determination system of the present invention.

Next, consider the case where the program is installed on a "karaoke device". As shown in FIG. 1(b), the hardware configuration then includes:
- a control unit 11 that controls the entire system;
- a storage unit 13 that stores video data and music data indicating the accompaniment and lyrics of the songs to be performed;
- a communication unit 15 that controls communication via the network 2;
- a display unit 21 that displays various images;
- an operation unit 23 including a plurality of keys and switches;
- an audio input/output unit 29 that controls audio input from a microphone 25 and audio output to a speaker 27.

In this configuration, a predetermined command may be received from outside via the operation unit 23 or the communication unit 15. The control unit 11 then executes various processes according to the program stored in the storage unit 13, and the device thereby functions as the sound range determination system of the present invention.

In the present embodiment, the sound range determination system 1 is a single device (a terminal device or a karaoke device). It may of course instead consist of a plurality of devices operating in cooperation.
(2) Processing by Control Unit 11
The processing procedures that the control unit 11 executes according to the program stored in the storage unit 13 are described below.
(2-1) Error Aggregation Process
First, the procedure of the error aggregation process is described with reference to FIG. 2.

How this error aggregation process is started depends on the device that realizes the sound range determination system 1:
- On a terminal device, the process starts when a prescribed operation is performed via the user interface unit 17, or when a command is received from the network 2 via the communication unit 15.
- On a karaoke device, the process starts each time the user sings a song (i.e., each time a song is played back from music data) while the device's operation mode is set to the sound range determination mode.

When the error aggregation process starts, the first step acquires singing data (s110). This data indicates how the pitch changed along the time axis as the user sang a song; this sung pitch is hereinafter called the "singing pitch". Specifically, the singing data is a transition pattern over time of the fundamental frequency contained in the user's singing voice (see FIG. 3(a)).

In s110, when the sound range determination system 1 is realized by a terminal device, the singing data is acquired in one of two ways:
- designated through an operation on the user interface unit 17 and read from the storage unit 13 or the media drive 19 (i.e., a recording medium);
- received via the communication unit 15.

The singing data acquired in this way carries user identification information identifying the user who sang, and song identification information identifying the song that was sung.

When the sound range determination system 1 is realized by a karaoke device, the singing data is generated as the user sings the song, and that data is acquired. In this case:
1. The user's voice data input from the audio input/output unit 29 during the singing is acquired (see FIG. 4(a)).
2. The pitch at each position along the time axis of the voice data is calculated.
3. Data indicating the transition of that pitch along the time axis is generated as the singing data (see FIG. 4(b)).

The singing data acquired in this way carries two identifiers: user identification information for the user logged in to the karaoke device at the time of singing, and song identification information for the song that was sung.

Next, guide data is acquired for the song sung in the singing data obtained in s110 (s120). The guide data indicates how the pitch should change along the time axis when the song is sung properly. Specifically, it is a transition pattern over time of the fundamental frequency that a properly sung voice should contain (see FIG. 3(a)).

In the present embodiment, the storage unit 13 stores guide data for each of a plurality of songs. Each item indicates how the pitch changes along the time axis when that song is sung properly. In s120, the guide data corresponding to the song identified by the song identification information attached to the singing data from s110 is read out and acquired. Alternatively, the guide data may be obtained from a server device or the like connected via the network 2.

Next, the singing data's transition pattern is divided into unit sections along the time axis, hereinafter "singing sections". The position of each singing section is corrected based on the guide data acquired in s120 (s130).

The guide data is likewise divided into unit sections on the time axis, hereinafter "guide sections". The pitch in a guide section is hereinafter called the "guide pitch".

Each singing section is matched with its corresponding guide section. The section's position on the time axis is then corrected, by shifting it forward or backward along the time axis, so that its singing-pitch transition pattern most closely approximates the guide-pitch transition pattern of that guide section, i.e. so that the degree of approximation is maximized (see FIG. 3(b)).
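The patent refers to JP-A-2005-107330 for this alignment step; the sketch below does not reproduce that method. Instead, it swaps in a plain brute-force search over integer frame shifts, assuming these conventions:
- Mean squared pitch difference over the overlapping voiced frames stands in for the "degree of approximation" (smaller is more similar).
- Pitches are frame-wise values, with None marking unvoiced frames.
- The function names and the `min_overlap` guard are hypothetical.

```python
from typing import List, Optional, Tuple


def best_shift(sung: List[Optional[float]], guide: List[Optional[float]],
               max_shift: int, min_overlap: int = 1) -> Optional[Tuple[int, float]]:
    """Find the frame shift that best aligns a singing section with its guide section.

    Shift s compares guide[i] with sung[i + s]; s > 0 means the singer was late
    by s frames. Returns (shift, mse), or None if no shift has enough overlap.
    """
    best: Optional[Tuple[int, float]] = None
    for s in range(-max_shift, max_shift + 1):
        # Squared pitch differences over frames that overlap and are voiced in both.
        sq = [(sung[i + s] - g) ** 2
              for i, g in enumerate(guide)
              if 0 <= i + s < len(sung) and g is not None and sung[i + s] is not None]
        if len(sq) < min_overlap:
            continue  # too little overlap to judge similarity reliably
        mse = sum(sq) / len(sq)
        if best is None or mse < best[1]:
            best = (s, mse)
    return best


def apply_shift(sung: List[Optional[float]], shift: int) -> List[Optional[float]]:
    """Move the section `shift` frames earlier (later if negative), padding with None."""
    if shift >= 0:
        return sung[shift:] + [None] * shift
    return [None] * (-shift) + sung[:shift]
```

Requiring a minimum overlap keeps very large shifts, which compare only a few frames, from winning by accident.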

Any technique may be used to correct the position on the time axis so that the transition patterns match in this way; for example, a technique such as that described in JP-A-2005-107330.

The singing sections and guide sections described above are normally two or more sections into which the whole song is divided along the time axis. Alternatively, the whole song may be treated as a single singing section and a single guide section without division.

Next, the error of the singing pitch relative to the guide pitch is calculated (s140). This is done by comparing two patterns: the singing-pitch transition pattern from the singing data, with its time-axis positions corrected in s130, and the guide-pitch transition pattern from the guide data acquired in s120.

Here, the singing-pitch transition pattern in each singing section is compared with the transition pattern in the corresponding guide section. This yields an error d[k] of the singing pitch relative to the guide pitch k (any of 1 to n) that should be voiced in that guide section.

The "error" here may be, for example, any of the following:
- a numerical measure of the difference in shape between the singing-pitch and guide-pitch transition patterns;
- a numerical measure of the difference between the final pitches reached in the two transition patterns;
- a numerical measure of the difference in how long the singing pitch stays at the guide pitch.
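The example error measures above can be sketched as simple functions. The sketch makes these assumptions:
- Inputs are aligned frame-wise pitch lists in semitone units (e.g. MIDI note numbers).
- A tolerance of half a semitone counts as "the same pitch".
- The definitions and function names are illustrative only, not the patent's formulas.

```python
from typing import List


def shape_error(sung: List[float], guide: List[float]) -> float:
    """Mean absolute pitch difference over aligned frames (difference in shape)."""
    n = min(len(sung), len(guide))
    return sum(abs(s - g) for s, g in zip(sung, guide)) / n


def final_pitch_error(sung: List[float], guide: List[float]) -> float:
    """Difference between the last pitches reached by the two transition patterns."""
    return abs(sung[-1] - guide[-1])


def off_pitch_frames(sung: List[float], guide: List[float], tol: float = 0.5) -> int:
    """Frames in which the singer is not on the guide pitch, i.e. how much shorter
    the on-pitch time is than the guide's duration."""
    return sum(1 for s, g in zip(sung, guide) if abs(s - g) > tol)
```

For a singer who drifts upward from the target (sung `[60, 61, 62]` against guide `[60, 60, 60]`), the three measures give 1.0, 2.0 and 2 respectively.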

Next, error distribution information is acquired for the user who sang in the singing data from s110 (s150). This information is prepared for each user. It describes an error distribution, in which the errors of the singing pitch relative to the guide pitch are grouped by guide pitch. The information acquired is the one belonging to the user identified by the user identification information attached to the singing data.

As shown in FIG. 5(a), the "error distribution" plots the singing-pitch errors against each guide pitch k. The vertical axis is the accumulated error, and the horizontal axis is the guide pitch k, defined by fundamental frequency. FIG. 5 draws the distribution as an envelope connecting the accumulated errors at the individual guide pitches k.

Next, the error distribution information acquired in s150 is updated so that each error calculated in s140 is added to the distribution (s160). Each per-guide-pitch error from s140 is accumulated into the error of the same guide pitch in the distribution, and the information is updated to describe the new accumulated distribution.
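A minimal sketch of the per-user accumulation in s150 and s160 follows. It assumes each user's error distribution is held as a dictionary from guide pitch k to accumulated error. The dictionaries are stored under the user identification information. The store layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

ErrorDistribution = Dict[int, float]  # guide pitch k -> accumulated error d[k]


def update_error_distribution(store: Dict[str, ErrorDistribution], user_id: str,
                              section_errors: Iterable[Tuple[int, float]]) -> ErrorDistribution:
    """Fetch the user's distribution (s150) and add each (k, error) pair to it (s160)."""
    dist = store.setdefault(user_id, {})  # a new user starts with an empty distribution
    for k, err in section_errors:
        dist[k] = dist.get(k, 0.0) + err
    return dist
```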

Next, the sound range of the user is determined from the error distribution information updated in s160 (s170). This is done by running the sound range determination process described later, with the updated error distribution information passed as its argument.

Then, the user's sound range determined in s170 is either reported or stored, depending on the device's operation settings (s180):
- If the settings call for reporting it, the range is displayed on the user interface unit 17 or the display unit 21, or transmitted to an external device via the communication unit 15.
- If the settings call for keeping a record of user ranges, information indicating the range is stored in the storage unit 13 or on a recording medium.
(2-2) Sound Range Determination Process
Next, the procedure of the sound range determination process, which is step s170 of the error aggregation process, is described with reference to FIG. 6.

In this sound range determination process, the error distribution passed in from the error aggregation process is first divided into groups, each containing a predetermined number of guide pitches (s210). As shown in FIG. 5(b), the horizontal axis of the distribution is split into bands, each spanning a predetermined number of guide pitches k. The guide pitches in one band form one group m, where m is the group index.

Next, for each group from s210, the errors d[k] of its guide pitches k are averaged to give an average error d[m] (s220). As shown in FIG. 5(c), every error d[k] in the distribution thus contributes to exactly one group average d[m].
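Steps s210 and s220 can be sketched as splitting the guide pitches into consecutive groups and averaging each group's errors. The sketch makes these assumptions:
- The per-pitch errors are passed as a list ordered from low to high guide pitch.
- A trailing partial group is averaged over its own members.
- The function name is hypothetical.

```python
from typing import List


def group_average(d: List[float], group_size: int) -> List[float]:
    """Average errors d[k] over consecutive groups of `group_size` guide pitches."""
    averages = []
    for start in range(0, len(d), group_size):
        members = d[start:start + group_size]  # guide pitches in group m
        averages.append(sum(members) / len(members))
    return averages
```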

Next, for each group from s210, the rate at which the error changes between group m and the adjacent guide pitches is calculated (s230). In the present embodiment, this rate is represented by an adjacent error ratio: the average error of the group adjacent to group m divided by the average error d[m] of group m.

More specifically, as shown in FIG. 5(d), group m−1 is the group adjacent to group m on the low-pitch side. The value d[m−1]/d[m], its average error divided by that of group m, is the adjacent error ratio of group m. The larger the rate of change, the farther this ratio is from 1. The ratio is calculated here against the low-side neighbour m−1, but it may instead be calculated against the high-side neighbour m+1.
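The adjacent error ratio d[m−1]/d[m] of s230 can be computed as in the sketch below. It assumes the group averages are given low to high. The ratio is left undefined (None) for the lowest group and for any group whose average error is zero, a guard the text does not discuss. The function name is hypothetical.

```python
from typing import List, Optional


def adjacent_error_ratios(avg: List[float]) -> List[Optional[float]]:
    """ratio[m] = avg[m-1] / avg[m]; None where undefined (m = 0 or avg[m] == 0)."""
    ratios: List[Optional[float]] = [None]  # the lowest group has no low-side neighbour
    for m in range(1, len(avg)):
        ratios.append(avg[m - 1] / avg[m] if avg[m] != 0 else None)
    return ratios
```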

Next, a target band is extracted for identifying the highest tone of the sound range in the subsequent processing (s240). The band consists of one or more of the groups from s210 and includes a group whose rate of change from s230 is at or above a certain level.

In the present embodiment, as shown in FIG. 5(e), the target band is the high-side error inflection band (see shaded portion A in the figure). It consists of two groups:
- the group m whose adjacent error ratio is the largest on the high-pitch side (farthest from 1 in the positive direction);
- the group m−1, whose average error d[m−1] was used in calculating that ratio.
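The s240 selection can be sketched as below. The code follows the rule exactly as the text states it: on the high-pitch side, take the group m with the largest adjacent error ratio, together with group m−1. The text does not define where the "high-pitch side" begins, so the `high_fraction` split is a hypothetical parameter, as is the function name.

```python
from typing import List, Optional, Tuple


def high_side_band(ratios: List[Optional[float]],
                   high_fraction: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (m - 1, m) for the group m with the largest adjacent error ratio
    among the upper `high_fraction` of groups, or None if no ratio is defined there."""
    n = len(ratios)
    candidates = [m for m in range(1, n)
                  if m >= n * (1 - high_fraction) and ratios[m] is not None]
    if not candidates:
        return None
    m = max(candidates, key=lambda j: ratios[j])
    return (m - 1, m)
```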

This assumes the adjacent error ratio is calculated from group m and group m−1, so the extracted band is group m (largest ratio) plus group m−1. If the ratio is instead calculated from group m and group m+1, the band extracted as the high-side error inflection band may be group m (largest ratio) plus group m+1.

Once the target band has been extracted in s240, the differences Δd are calculated for each guide pitch in the band's groups (s310). Each Δd is the difference between the error at that guide pitch and the error at an adjacent guide pitch.

In the present embodiment, two differences are calculated for each guide pitch k:
- Δd[k] (= d[k+1] − d[k]): the error d[k+1] at the high-side neighbour k+1 minus the error d[k] at guide pitch k.
- Δd[k−1] (= d[k] − d[k−1]): the error d[k] at guide pitch k minus the error d[k−1] at the low-side neighbour k−1.

Next, for each guide pitch k in the target band's groups, the two differences from s310 are compared to give the error ratio |Δd[k−1]/Δd[k]| (s320).

Next, among the guide pitches in the target band's groups, the highest tone kup of the user's sound range is selected (s330). It is the guide pitch k whose difference Δd[k] from s310 is greater than 0 and whose error ratio |Δd[k−1]/Δd[k]| from s320 is the smallest.

The error distribution records, for each guide pitch k, the error d[k] of the singing-pitch transition pattern. A guide pitch k with a small error d[k] can therefore be regarded as a pitch the user voices properly, following a similar transition pattern. A guide pitch with a large error d[k] can be regarded as a pitch the user reaches only by straining.

Consider the high-side error inflection band, which spans from guide pitches the user voices properly to guide pitches the user reaches only by straining. Let k be the highest guide pitch the user voices properly. The error d[k+1] at its high-side neighbour k+1 tends to rise sharply compared with the error d[k] at k and the error d[k−1] at the low-side neighbour k−1 (see FIG. 7(a)).

In this case, the difference Δd[k] between the error d[k] at this guide pitch k and the error d[k+1] at its high-side neighbour k+1 is naturally greater than 0. The error ratio |Δd[k−1]/Δd[k]| compares this with the difference toward the low-side neighbour k−1, and it takes a small value. More precisely, the inflection point is the highest guide pitch k that the user voices properly within the error inflection band. At that point the denominator Δd[k] becomes markedly large, so the error ratio reaches its minimum there.

Thus, the guide pitch k at which the difference Δd[k] is greater than 0 and the error ratio |Δd[k−1]/Δd[k]| is smallest can be taken as the highest tone kup (see shaded portion A in FIG. 7(b)).
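Steps s310 to s330 can be sketched as follows, under these assumptions:
- The accumulated errors are passed as a list `d` indexed by guide pitch (0-based, low to high).
- The target band is given as a collection of indices into `d`.
- Guide pitches lacking a neighbour on either side are skipped, a detail the text leaves open.
- The function name is hypothetical.

```python
import math
from typing import Iterable, List, Optional


def highest_tone(d: List[float], band: Iterable[int]) -> Optional[int]:
    """Among guide pitches k in the target band, return the k with Δd[k] > 0
    and the smallest error ratio |Δd[k-1] / Δd[k]| (None if no k qualifies)."""
    best_k, best_ratio = None, math.inf
    for k in band:
        if k - 1 < 0 or k + 1 >= len(d):
            continue                      # both neighbours are needed
        dd_k = d[k + 1] - d[k]            # Δd[k]
        dd_prev = d[k] - d[k - 1]         # Δd[k-1]
        if dd_k <= 0:
            continue                      # error must rise toward the high side
        ratio = abs(dd_prev / dd_k)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k
```

With errors that creep up slowly and then jump (e.g. `[1.0, 1.2, 1.5, 1.9, 6.0, 10.5]`), the selected index is 3. That is the last pitch before the jump, matching the reasoning above.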

Next, among the groups classified in s210, a band consisting of one or more groups whose rate of change calculated in s230 is at or beyond a certain level is extracted as the target band in which the lowest note of the range is identified in subsequent processing (s250). In the present embodiment, as shown in FIG. 5(e), the extracted target band on the low side is the low-side error inflection band consisting of group m, whose adjacent error ratio (calculated as the rate of change) is the smallest (farthest from "1" in the negative direction), and group m−1, whose average error d[m−1] was referenced when calculating that adjacent error ratio (see shaded part B in the figure).

Here, the band consisting of groups m and m−1 with the minimum adjacent error ratio is extracted on the assumption that the adjacent error ratio is calculated from the relationship between group m and group m−1. If the adjacent error ratio is instead calculated from the relationship between group m and group m+1, the band consisting of groups m and m+1 with the minimum adjacent error ratio may be extracted as the low-side error inflection band.
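The low-side band extraction of s210 to s250 can be illustrated as follows. Grouping and averaging follow the text, but this section does not give the exact formula for the adjacent error ratio, so the sketch assumes r[m] = (average error of group m) / (average error of group m−1), which falls below 1 where errors drop toward the properly voiced middle of the range. All function names are hypothetical, and indices are 0-based.

```python
from typing import Dict, List, Sequence


def group_averages(d: Sequence[float], size: int) -> List[float]:
    """Average the errors of consecutive groups of `size` guide pitches (s210, s220)."""
    return [sum(d[i:i + size]) / len(d[i:i + size]) for i in range(0, len(d), size)]


def adjacent_ratios(avg: Sequence[float]) -> Dict[int, float]:
    """Assumed adjacent error ratio r[m] = avg[m] / avg[m-1] for m >= 1 (s230)."""
    return {m: avg[m] / avg[m - 1] for m in range(1, len(avg)) if avg[m - 1] > 0}


def low_side_band(d: Sequence[float], size: int) -> List[int]:
    """Guide-pitch indices of groups m-1 and m, where r[m] is smallest (s250)."""
    ratios = adjacent_ratios(group_averages(d, size))
    if not ratios:
        return []  # fewer than two groups: no adjacent ratio can be formed
    m = min(ratios, key=ratios.get)
    start = (m - 1) * size
    stop = min((m + 1) * size, len(d))
    return list(range(start, stop))
```

With groups of two, the errors `[9.0, 8.0, 5.0, 1.2, 1.0, 1.1, 1.0, 1.0]` average to about 8.5, 3.1, 1.05, and 1.0; the smallest ratio falls at group 2, so the band covers guide pitches 2 to 5.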

After the target band is extracted in s250, the differences Δd from the errors at the adjacent guide pitches are calculated for each guide pitch of the groups constituting the target band (s410). In the present embodiment, as in s310 above, the difference Δd[k] (= d[k+1] − d[k]) and the difference Δd[k−1] (= d[k] − d[k−1]) are calculated.

Next, for each guide pitch k of the groups constituting the target band, the error ratio |Δd[k] / Δd[k−1]|, which compares the differences Δd[k] and Δd[k−1] calculated in s410 for that guide pitch, is calculated (s420).

Next, from the guide pitches of the groups constituting the target band, the guide pitch k for which the difference Δd[k] calculated in s410 is less than 0 and the error ratio |Δd[k] / Δd[k−1]| calculated in s420 is minimum is extracted and determined to be the lowest note klo in the user's range (s430).

As described above, a guide pitch k with a small error d[k] can be regarded as a pitch the user voices properly with a similar transition pattern, whereas a guide pitch with a large error d[k] can be regarded as a pitch the user forces out. Therefore, in the low-side error inflection band, relative to the error d[k+1] at guide pitch k+1, adjacent on the high side of the lowest properly voiced guide pitch k, the error d[k−1] at guide pitch k−1, adjacent on the low side, tends to rise steeply (see FIG. 7(a)).

In this case, the difference Δd[k] (= d[k+1] − d[k]) between the error d[k] for guide pitch k and the error d[k+1] for the guide pitch k+1 adjacent on the high side is naturally less than 0. Meanwhile, the error ratio |Δd[k] / Δd[k−1]|, which compares this difference with the difference Δd[k−1] (= d[k] − d[k−1]) toward the guide pitch k−1 adjacent on the low side, takes a small value. More specifically, with the lowest properly voiced guide pitch k in the error inflection band as the inflection point, the magnitude of the denominator Δd[k−1] becomes extremely large, so the error ratio reaches its minimum there.

Thus, the guide pitch k for which the difference Δd[k] is less than 0 and the error ratio |Δd[k] / Δd[k−1]| is minimum can be determined as the lowest note klo (see shaded part B in FIG. 7(b)).
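The mirror-image search of s410 to s430 can be sketched the same way. As before, this is a hypothetical illustration assuming 0-based indices into a list of accumulated errors d; `find_lowest_note` is not a name from the patent.

```python
from typing import Optional, Sequence


def find_lowest_note(d: Sequence[float],
                     candidates: Optional[Sequence[int]] = None) -> Optional[int]:
    """Return the index k with Δd[k] < 0 that minimizes |Δd[k] / Δd[k-1]|.

    Δd[k] = d[k+1] - d[k] and Δd[k-1] = d[k] - d[k-1].
    `candidates` optionally restricts the search, e.g. to the low-side band.
    """
    n = len(d)
    if candidates is None:
        candidates = range(n)
    best_k, best_ratio = None, float("inf")
    for k in candidates:
        if k < 1 or k + 1 >= n:
            continue  # both neighbours are needed to form the two differences
        forward = d[k + 1] - d[k]   # Δd[k]
        backward = d[k] - d[k - 1]  # Δd[k-1]
        if forward >= 0 or backward == 0:
            continue  # error must still be falling; ratio undefined if Δd[k-1] is 0
        ratio = abs(forward / backward)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k
```

For errors `[9.0, 5.0, 1.2, 1.0, 1.1, 1.0]`, the steep drop from 5.0 to 1.2 marks index 2 as the lowest properly voiced pitch.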

Then, the range of pitches from the highest note kup determined in s330 down to the lowest note klo determined in s430 is determined to be the user's range (s260; see FIG. 7(b)).

After s260 is completed, the process returns to the error totaling process (s180).
(3) Operation and Effects
With the range determination system 1 of the present embodiment, the user's range can be determined appropriately based on the highest note kup and the lowest note klo of that range, both of which are determined so as to exclude pitches the user forces out (s260).

To determine the highest note kup and the lowest note klo, the transition pattern of the pitch along the time axis as the user sings (the singing pitch) is first compared with the transition pattern of the pitch when the song is sung properly (the guide pitch); the error d between these transition patterns is calculated (s140 in FIG. 6) and reflected in the user's error distribution (s160 in the same figure).
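Steps s140 and s160 can be pictured as computing one error per section and appending it to a per-user, per-guide-pitch accumulator. The error metric below (mean absolute pitch difference over frames that are already time-aligned) is an assumption for illustration; this section does not specify how d is derived from the two transition patterns. The class and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence


def section_error(singing: Sequence[float], guide: Sequence[float]) -> float:
    """Assumed error metric: mean absolute pitch difference between an aligned
    singing section and its guide section (e.g. in semitones)."""
    if not singing or len(singing) != len(guide):
        raise ValueError("sections must be non-empty and of equal length")
    return fmean(abs(s - g) for s, g in zip(singing, guide))


class ErrorDistributions:
    """One error distribution per user: user -> guide pitch -> list of errors."""

    def __init__(self) -> None:
        self._dist: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))

    def add(self, user: str, guide_pitch: int, error: float) -> None:
        """Distribution update (s160): record one more error for this guide pitch."""
        self._dist[user][guide_pitch].append(error)

    def mean_errors(self, user: str) -> Dict[int, float]:
        """Return {guide pitch: mean error} for one user, ordered by pitch."""
        return {k: fmean(v) for k, v in sorted(self._dist[user].items())}
```

Averaging the recorded errors per guide pitch is one simple way to turn the distribution into the d[k] sequence used by the highest-note and lowest-note searches.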

Then, among guide pitches 1 to n in the error distribution, the guide pitch k that minimizes the error ratio |Δd[k−1] / Δd[k]|, which compares the differences Δd from the errors d at the adjacent guide pitches, is determined to be the highest note kup in the range of the user corresponding to that error distribution (s310 to s330 in the same figure). Similarly, the guide pitch k that minimizes the error ratio |Δd[k] / Δd[k−1]| is determined to be the lowest note klo in that user's range (s410 to s430 in the same figure).

In this way, in the above embodiment, the guide pitch k that minimizes the error ratio |Δd[k−1] / Δd[k]| can be determined as the highest note kup, and the guide pitch k that minimizes the error ratio |Δd[k] / Δd[k−1]| can be determined as the lowest note klo.

Furthermore, in the above embodiment, the guide pitches k considered when determining the highest note kup are limited to those for which the difference Δd[k] is greater than 0 (s330 in FIG. 6).

In the inflection region described above, as the guide pitches rise above the highest note the user can voice properly, the error d[k+1] at the guide pitch k+1 adjacent on the high side is expected to keep increasing, so the difference Δd[k] between the errors d[k] and d[k+1] is naturally greater than 0.

Consequently, restricting the candidate guide pitches k to those with Δd[k] greater than 0 never excludes the guide pitch k that should be extracted as the highest note kup, so the processing load can be reduced without loss. Moreover, because only suitable guide pitches k are considered, the accuracy of determining the highest note kup is protected from degradation caused by, for example, extracting an unsuitable guide pitch k.

Similarly, in the above embodiment, the guide pitches k considered when determining the lowest note klo are limited to those for which the difference Δd[k] is less than 0 (s430 in FIG. 6).

In the inflection region described above, as the guide pitches rise toward the lowest note the user can voice properly, the error d[k+1] at the guide pitch k+1 adjacent on the high side is expected to decrease, so the difference Δd[k] between the errors d[k] and d[k+1] is naturally less than 0.

Consequently, restricting the candidate guide pitches k to those with Δd[k] less than 0 never excludes the guide pitch k that should be extracted as the lowest note klo, so the processing load can be reduced without loss. Moreover, because only suitable guide pitches k are considered, the accuracy of determining the lowest note klo is protected from degradation caused by, for example, extracting an unsuitable guide pitch k.

Furthermore, in the above embodiment, a band consisting of a predetermined number of guide pitches on the high-pitch side of the error distribution is first extracted as the high-side error inflection band, and the highest note kup is then determined within it (s240 to s330). Because the guide pitch k that becomes the highest note kup lies in the high-side inflection region of the error distribution, restricting the guide pitches considered for kup to the band forming that region causes no problem, and it reduces the processing load required to determine kup.

Likewise, in the above embodiment, a band consisting of a predetermined number of guide pitches on the low-pitch side of the error distribution is first extracted as the low-side error inflection band, and the lowest note klo is then determined within it (s250 to s430). Because the guide pitch k that becomes the lowest note klo lies in the low-side inflection region of the error distribution, restricting the guide pitches considered for klo to the band forming that region causes no problem, and it reduces the processing load required to determine klo.

In the above embodiment, the high-side error inflection band is extracted as the set of guide pitches that lie on the high-pitch side of the error distribution and whose rate of change in error relative to the adjacent guide pitches is at or above a certain level (s240 in FIG. 6).

Therefore, as described above, by setting the "certain rate of change" used for band extraction to the value expected for pitches at and above the highest guide pitch the user can voice properly (in the present embodiment, the value at which the adjacent error ratio is maximum), an appropriate band can be extracted as the high-side error inflection band.

Likewise, in the above embodiment, the low-side error inflection band is extracted as the set of guide pitches that lie on the low-pitch side of the error distribution and whose rate of change in error relative to the adjacent guide pitches is at or above a certain level (s250 in FIG. 6).

Therefore, as described above, by setting the "certain rate of change" used for band extraction to the value expected for pitches at and below the lowest guide pitch the user can voice properly (in the present embodiment, the value at which the adjacent error ratio is minimum), an appropriate band can be extracted as the low-side error inflection band.

In the above embodiment, the guide pitch errors are also averaged over each group of a predetermined number of guide pitches (s220 in FIG. 6), and for the one or more groups at which the adjacent error ratio between neighboring groups is maximum, the high-side error inflection region consisting of the guide pitches in those groups can be extracted (s240 in the same figure). Similarly, for the one or more groups at which the adjacent error ratio is minimum, the low-side error inflection region consisting of the guide pitches in those groups can be extracted.
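Putting the group-based band extraction (s210 to s250) together with the two searches (s310 to s430) and the range decision (s260) gives the end-to-end sketch below. It is a hypothetical illustration, not the patent's code: indices are 0-based, and the adjacent error ratio is assumed to be r[m] = avg[m] / avg[m−1], so the high-side band sits at its maximum and the low-side band at its minimum.

```python
from typing import List, Optional, Sequence, Tuple


def _group_averages(d: Sequence[float], size: int) -> List[float]:
    """Mean error of each consecutive group of `size` guide pitches (s210, s220)."""
    return [sum(d[i:i + size]) / len(d[i:i + size]) for i in range(0, len(d), size)]


def _pick(d: Sequence[float], band: range, rising: bool) -> Optional[int]:
    """Ratio search inside a band: highest note if rising, else lowest note."""
    best_k, best_ratio = None, float("inf")
    for k in band:
        if not 1 <= k < len(d) - 1:
            continue
        fwd = d[k + 1] - d[k]  # Δd[k]
        bwd = d[k] - d[k - 1]  # Δd[k-1]
        if rising:  # kup: Δd[k] > 0, minimize |Δd[k-1] / Δd[k]|
            if fwd <= 0:
                continue
            ratio = abs(bwd / fwd)
        else:       # klo: Δd[k] < 0, minimize |Δd[k] / Δd[k-1]|
            if fwd >= 0 or bwd == 0:
                continue
            ratio = abs(fwd / bwd)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


def determine_range(d: Sequence[float], size: int) -> Optional[Tuple[int, int]]:
    """Return (klo, kup) as 0-based guide-pitch indices, or None if undetermined."""
    avg = _group_averages(d, size)
    r = {m: avg[m] / avg[m - 1] for m in range(1, len(avg)) if avg[m - 1] > 0}
    if not r:
        return None
    high_m = max(r, key=r.get)  # s240: largest jump in average error
    low_m = min(r, key=r.get)   # s250: largest drop in average error

    def band(m: int) -> range:  # guide pitches of groups m-1 and m
        return range((m - 1) * size, min((m + 1) * size, len(d)))

    kup = _pick(d, band(high_m), rising=True)
    klo = _pick(d, band(low_m), rising=False)
    if kup is None or klo is None:
        return None
    return klo, kup  # s260: the range spans klo..kup
```

For the errors `[9.0, 8.0, 5.0, 1.2, 1.0, 1.1, 1.0, 1.2, 5.0, 9.0]` with groups of two, the sketch returns `(3, 7)`, i.e. the five low-error pitches in the middle.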

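Furthermore, in the above embodiment, the position of the singing pitch on the time axis is corrected so that the transition pattern of the singing pitch most closely matches that of the guide pitch, and the pitch error is calculated after this correction (s140 in FIG. 2). Because closely matching transition patterns imply that the timings of pitch changes also match, this compensates for offsets in the timing at which the pitch changes begin.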
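The timing correction described above can be illustrated with a brute-force search over integer frame shifts. This is a hedged sketch: the search window `max_shift`, the mean-absolute-difference matching criterion, and the function name are all assumptions, not details taken from the patent.

```python
from typing import Optional, Sequence, Tuple


def best_alignment(singing: Sequence[float], guide: Sequence[float],
                   max_shift: int) -> Optional[Tuple[int, float]]:
    """Return (shift, error) for the frame shift whose overlap matches best.

    singing[i + shift] is compared with guide[i]; only overlapping frames count.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(singing[i + shift], g) for i, g in enumerate(guide)
                 if 0 <= i + shift < len(singing)]
        if not pairs:
            continue  # no overlap at this shift
        error = sum(abs(s - g) for s, g in pairs) / len(pairs)
        if best is None or error < best[1]:
            best = (shift, error)
    return best
```

Here a singer who enters one frame late (`[59, 60, 60, 62, 62, 64, 64]` against the guide `[60, 60, 62, 62, 64, 64]`) is matched at shift 1 with zero error. Because only overlapping frames are compared, large shifts with little overlap can score deceptively well; a practical version would likely require a minimum overlap.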
(4) Modifications
An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment and can, of course, take various forms within its technical scope.

For example, in the above embodiment, the guide pitches k considered when determining the highest note kup are limited to those for which the difference Δd[k] is greater than 0 (s330 in FIG. 6). However, the guide pitches considered here need only lie on the high-pitch side of the error distribution and may be chosen independently of Δd[k]. As a specific example, they could be a fixed proportion (for example, several tens of percent) of the entire guide pitch range on the high-pitch side.
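This modification can be sketched by replacing the Δd[k] > 0 filter with a fixed top slice of the guide pitches. The percentage parameter and both function names are hypothetical; because the sign filter is dropped, the sketch skips only the pitches where Δd[k] is exactly 0 and the ratio is undefined.

```python
import math
from typing import Iterable, Optional, Sequence


def top_percent_candidates(n: int, percent: int) -> range:
    """Indices of the highest `percent` % of n guide pitches (at least one)."""
    count = max(1, -(-n * percent // 100))  # integer ceiling of n * percent / 100
    return range(n - count, n)


def highest_note_in(d: Sequence[float], candidates: Iterable[int]) -> Optional[int]:
    """Minimize |Δd[k-1] / Δd[k]| over the candidates, without the Δd[k] > 0 filter."""
    best_k, best_ratio = None, math.inf
    for k in candidates:
        if not 1 <= k < len(d) - 1:
            continue
        forward = d[k + 1] - d[k]  # Δd[k]
        if forward == 0:
            continue  # ratio undefined
        ratio = abs((d[k] - d[k - 1]) / forward)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k
```

Integer ceiling division is used so that, for example, 30 % of 10 pitches yields exactly three candidates instead of being disturbed by floating-point rounding.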

Likewise, in the above embodiment, the guide pitches k considered when determining the lowest note klo are limited to those for which the difference Δd[k] is less than 0 (s430 in FIG. 6). However, the guide pitches considered here need only lie on the low-pitch side of the error distribution and may be chosen independently of Δd[k]. As a specific example, they could be a fixed proportion (for example, several tens of percent) of the entire guide pitch range on the low-pitch side.
(5) Correspondence with the Present Invention
In the embodiment described above, s140 in FIG. 2 corresponds to the error calculating means of the present invention, s160 in the same figure to the distribution updating means, s330 in FIG. 6 to the highest note determining means, s240 in the same figure to the high-range error inflection band extracting means, s430 in the same figure to the lowest note determining means, s250 in the same figure to the low-range error inflection band extracting means, and s260 in the same figure to the range determining means.

Description of reference signs: 1 ... range determination system, 2 ... network, 11 ... control unit, 13 ... storage unit, 15 ... communication unit, 17 ... user interface unit, 19 ... media drive, 21 ... display unit, 23 ... operation unit, 25 ... microphone, 27 ... speaker, 29 ... audio input/output unit.

Claims

Error calculating means for comparing singing data, which indicates the transition of pitch along the time axis as a user sings a song, with guide data, which indicates the transition of pitch along the time axis when that song is sung properly, such that for each unit section on the time axis indicated by the singing data (hereinafter a "singing section"), the transition pattern of its pitch (hereinafter the "singing pitch") is compared with the transition pattern of the pitch (hereinafter the "guide pitch") in the corresponding unit section on the time axis indicated by the guide data (hereinafter a "guide section"), thereby calculating the error d[k] of the singing pitch with respect to the guide pitch k (= 1 to n) to be voiced in that guide section;
distribution updating means for updating error distributions, one prepared for each of a plurality of users and each distributing that user's singing-pitch errors over the guide pitches, by additionally distributing each error calculated by the error calculating means into the error distribution of the user who sang, as the error d[k] of the singing pitch with respect to the guide pitch k of the guide section referenced in that calculation; and
highest note determining means for extracting, among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k that minimizes the error ratio |Δd[k−1] / Δd[k]|, which compares the difference Δd[k−1] (= d[k] − d[k−1]) from the error d[k−1] at the guide pitch k−1 adjacent on the lower side with the difference Δd[k] (= d[k+1] − d[k]) from the error d[k+1] at the guide pitch k+1 adjacent on the higher side, and determining the extracted guide pitch k to be the highest note kup in the range of the user corresponding to that error distribution;
a pitch determination system characterized by comprising the above means.

The pitch determination system according to claim 1, wherein the highest note determining means extracts, among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch for which the difference Δd[k] is greater than 0 and the error ratio |Δd[k−1] / Δd[k]| is minimum.

The pitch determination system according to claim 1 or 2, further comprising high-range error inflection band extracting means for extracting, among the plurality of guide pitches in the error distribution updated by the distribution updating means, a band consisting of a predetermined number of guide pitches located on the higher-pitch side as a high-range error inflection band,
wherein the highest note determining means extracts the guide pitch that becomes the highest note kup from the guide pitches in the high-range error inflection band extracted by the high-range error inflection band extracting means.

The pitch determination system according to claim 3, wherein the high-range error inflection band extracting means extracts, as the high-range error inflection band, the guide pitches that are located on the higher-pitch side among the plurality of guide pitches in the error distribution updated by the distribution updating means and whose rate of change in error relative to the respective adjacent guide pitches is at or above a certain value.

The pitch determination system according to claim 4, wherein the high-range error inflection band extracting means classifies at least the guide pitches located on the higher-pitch side among the plurality of guide pitches in the error distribution updated by the distribution updating means into groups of a predetermined number of guide pitches, averages, for each group, the errors of the guide pitches classified into that group, and, for one or more groups including the group with the maximum adjacent error ratio, obtained by dividing the average error of the adjacent group on the higher-pitch side by the average error of that group, extracts the band consisting of the guide pitches classified into those groups.

Error calculating means for comparing singing data, which indicates the transition of pitch along the time axis as a user sings a song, with guide data, which indicates the transition of pitch along the time axis when that song is sung properly, such that for each unit section on the time axis indicated by the singing data (hereinafter a "singing section"), the transition pattern of its pitch (hereinafter the "singing pitch") is compared with the transition pattern of the pitch (hereinafter the "guide pitch") in the corresponding unit section on the time axis indicated by the guide data (hereinafter a "guide section"), thereby calculating the error d[k] of the singing pitch with respect to the guide pitch k (= 1 to n) to be voiced in that guide section;
distribution updating means for updating error distributions, one prepared for each of a plurality of users and each distributing that user's singing-pitch errors over the guide pitches, by additionally distributing each error calculated by the error calculating means into the error distribution of the user who sang, as the error d[k] of the singing pitch with respect to the guide pitch k of the guide section referenced in that calculation; and
lowest note determining means for extracting, among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k that minimizes the error ratio |Δd[k] / Δd[k−1]|, which compares the difference Δd[k] (= d[k+1] − d[k]) from the error d[k+1] at the guide pitch k+1 adjacent on the higher side with the difference Δd[k−1] (= d[k] − d[k−1]) from the error d[k−1] at the guide pitch k−1 adjacent on the lower side, and determining the extracted guide pitch k to be the lowest note klo in the range of the user corresponding to that error distribution;
a pitch determination system characterized by comprising the above means.

The pitch determination system according to claim 6, wherein the lowest note determining means extracts, among the guide pitches k in the error distribution updated by the distribution updating means, the guide pitch k for which the difference Δd[k] is less than 0 and the error ratio |Δd[k] / Δd[k−1]| is minimum.

The pitch determination system according to claim 6 or 7, further comprising low-range error inflection band extracting means for extracting, among the plurality of guide pitches in the error distribution updated by the distribution updating means, a band consisting of a predetermined number of guide pitches located on the lower-pitch side as a low-range error inflection band,
wherein the lowest note determining means extracts the guide pitch that becomes the lowest note klo from the guide pitches in the low-range error inflection band extracted by the low-range error inflection band extracting means.

The pitch determination system according to claim 8, wherein the low-range error inflection band extracting means extracts, as the low-range error inflection band, the guide pitches that are located on the lower-pitch side among the plurality of guide pitches in the error distribution updated by the distribution updating means and whose rate of change in error relative to the respective adjacent guide pitches is at or above a certain value.

The pitch determination system according to claim 9, wherein the low-range error inflection band extracting means classifies at least the guide pitches located on the lower-pitch side among the plurality of guide pitches in the error distribution updated by the distribution updating means into groups of a predetermined number of guide pitches, averages, for each group, the errors of the guide pitches classified into that group, and, for one or more groups including the group with the minimum adjacent error ratio, obtained by dividing the average error of the adjacent group on the higher-pitch side by the average error of that group, extracts the band consisting of the guide pitches classified into those groups.

A range determination system comprising all the means according to any one of claims 1 to 5 and all the means according to any one of claims 6 to 10, and further comprising:
range determining means for determining the range of pitches from the highest note determined by the highest note determining means to the lowest note determined by the lowest note determining means to be the range of the user corresponding to the error distribution referenced in these determinations.

A program for causing a computer to execute various processing procedures for causing the computer to function as all the means according to claim 1.

A program for causing a computer to execute processing procedures for functioning as all the means according to any one of claims 6 to 10.

A program for causing a computer to execute processing procedures for functioning as all the means according to claim 11.