JP2016177144A

JP2016177144A - Evaluation reference generation device and singing evaluation device

Info

Publication number: JP2016177144A
Application number: JP2015057488A
Authority: JP
Inventors: Ryuichi Nariyama; Shuichi Matsumoto; Tatsuya Terajima
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-03-20
Filing date: 2015-03-20
Publication date: 2016-10-06

Abstract

PROBLEM TO BE SOLVED: To give the evaluation of singing techniques some redundancy (tolerance) while stabilizing the reference for singing evaluation. SOLUTION: An evaluation reference generation device according to one embodiment of this invention includes: a reference acquisition unit configured to acquire reference data; a reference feature amount calculation unit configured to calculate, on the basis of the reference data, feature amount data indicating the time change of a feature amount of sound; and a generation unit configured to generate evaluation reference data defining a reference feature amount range that is set so as to contain, as a reference, the feature amount indicated by the feature amount data, that contains an allowable range from the reference, and whose allowable range is changed based on information contained in the reference data. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for evaluating singing.

Karaoke devices often have a function for analyzing and evaluating a singing voice. Various methods are used to evaluate singing. As one such method, Patent Document 1, for example, discloses a technique that compares the pitch of the singing voice with the pitch of a model voice to be sung and evaluates the singing according to the pitch difference. Patent Document 1 also discloses a technique that, when vibrato is detected in the singing voice, loosens the evaluation criteria so that a high evaluation can be obtained in the section containing the vibrato even if the pitch difference is large.

Patent Document 1: JP 2011-209654 A

When the singing voice contains no vibrato at a point where the model voice does, the difference is a matter of whether a singing technique is present, that is, a matter of expression, so the singing should not be rated very low merely because the vibrato expression differs. With the technique disclosed in Patent Document 1, however, a singing voice without vibrato is treated as failing to follow the vibrato of the model voice, and its evaluation is lowered. In addition, that technique determines the evaluation criteria depending on the singing voice. As a result, even in a section where vibrato is unnecessary, vibrato contained in the singing voice changes the evaluation criteria, which makes the criteria unstable.

One object of the present invention is to give the evaluation of singing techniques some redundancy (tolerance) while stabilizing the reference for singing evaluation.

According to an embodiment of the present invention, there is provided an evaluation reference generation device comprising: a reference acquisition unit that acquires reference data; a reference feature amount calculation unit that calculates, from the reference data, feature amount data indicating a temporal change in a feature amount of sound; and a generation unit that generates evaluation reference data defining a reference feature amount range that is set to include, as a reference, the feature amount indicated by the feature amount data, that includes an allowable range from the reference, and whose allowable range is changed based on information contained in the reference data.

According to an embodiment of the present invention, there is provided a singing evaluation device comprising: a singing sound acquisition unit that acquires a singing input sound; a singing feature amount calculation unit that calculates, from the singing input sound, singing feature amount data indicating a temporal change in a feature amount; an evaluation reference acquisition unit that acquires evaluation reference data defining a reference feature amount range that is set to include a reference feature amount serving as the reference for evaluating the feature amount, that includes an allowable range from the reference feature amount, and whose allowable range is changed from section to section; a comparison unit that compares the reference feature amount range indicated by the evaluation reference data with the temporal change in the feature amount indicated by the singing feature amount data; and an evaluation unit that calculates an evaluation value for the singing input sound based on the result of the comparison. The allowable range of the reference feature amount range defined by the evaluation reference data acquired by the evaluation reference acquisition unit may be set by a first rule outside a predetermined specific section and by a second rule within that specific section.

According to an embodiment of the present invention, there is provided a singing evaluation device comprising: a singing sound acquisition unit that acquires a singing input sound; a singing feature amount calculation unit that calculates, from the singing input sound, singing feature amount data indicating a temporal change in a feature amount; an evaluation reference acquisition unit that acquires evaluation reference data defining a reference feature amount range that is set by an upper limit value and a lower limit value as the reference for evaluating the feature amount, and in which the span from the upper limit value to the lower limit value is changed from section to section; a comparison unit that compares the reference feature amount range indicated by the evaluation reference data with the temporal change in the feature amount indicated by the singing feature amount data; and an evaluation unit that calculates an evaluation value for the singing input sound based on the result of the comparison. The reference feature amount range defined by the evaluation reference data acquired by the evaluation reference acquisition unit may include a first section whose allowable range has a first width between the upper limit value and the lower limit value, and a second section whose allowable range has a second width different from the first width. The reference feature amount range may also include a low evaluation range with a large allowable range and a high evaluation range with a small allowable range, and the evaluation unit may calculate the evaluation value for the singing input sound based on the result of comparing the temporal change in the feature amount indicated by the singing feature amount data with the low evaluation range and the result of comparing it with the high evaluation range.

According to an embodiment of the present invention, the evaluation of singing techniques can be given some redundancy (tolerance) while the reference for singing evaluation is kept stable.

A block diagram showing the configuration of the evaluation device 1 in the first embodiment of the present invention.
A block diagram showing the configuration of the evaluation reference generation function and the evaluation function in the first embodiment of the present invention.
A diagram explaining the first rule applied to determine the reference pitch range in a normal section in the first embodiment of the present invention.
A diagram explaining the second rule applied to determine the reference pitch range in a vibrato detection section in the first embodiment of the present invention.
A diagram explaining the second rule applied to determine the reference pitch range in a kobushi detection section in the first embodiment of the present invention.
A diagram explaining the second rule applied to determine the reference pitch range in a fall detection section in the first embodiment of the present invention.
A diagram explaining the second rule applied to determine the reference pitch range in a shakuri detection section in the first embodiment of the present invention.
A diagram explaining the second rule applied to determine the reference pitch range in a long tone detection section in the first embodiment of the present invention.
A diagram explaining the singing evaluation method using the singing pitch in the first embodiment of the present invention.
A diagram explaining the reference volume speed range in the second embodiment of the present invention.
A diagram explaining the method of determining the reference pitch range in the normal section and the vibrato detection section in the third embodiment of the present invention.
A block diagram showing the configuration of the evaluation function in the fourth embodiment of the present invention.
A block diagram showing the configuration of the evaluation reference generation function in the sixth embodiment of the present invention.

Hereinafter, an evaluation device according to an embodiment of the present invention will be described in detail with reference to the drawings. The embodiments described below are examples of embodiments of the present invention, and the present invention is not limited to them.

<First Embodiment>
The evaluation device according to the first embodiment of the present invention will be described in detail with reference to the drawings. The evaluation device according to the first embodiment generates an evaluation reference for evaluating the singing voice of a singing user (hereinafter sometimes called the singer) and evaluates the singing voice against that reference. The device generates the pitch range that serves as the reference for evaluation (hereinafter sometimes called the reference pitch range) from an input sound (in this example, a singing voice that serves as a model, called the model voice). In doing so, it sets the reference pitch range by different rules in sections where the model voice contains a predetermined singing technique and in all other sections. The model voice is not limited to the voice of a model singer; it may be a sound generated by a computer or the like, such as a synthesized voice, or a sound other than a human voice, such as a musical instrument sound.

The evaluation device also evaluates the singing voice by comparing the singing pitch with the reference pitch range generated in this way. When the reference pitch range is set by the method described below, the evaluation of the singing is not lowered even if the singing voice contains no singing technique in a section where the model voice contains one. Likewise, when the singing voice contains a singing technique in the same section as the model voice, the evaluation is not lowered even if the singing is not exactly the same as the model voice. Such an evaluation device is described below.

[Hardware]
FIG. 1 is a block diagram showing the configuration of the evaluation device 1 in the first embodiment of the present invention. The evaluation device 1 is, for example, a karaoke device. The evaluation device 1 may instead be a portable device such as a smartphone, or may consist of a plurality of devices that cooperate via a network. The evaluation device 1 includes a control unit 11, a storage unit 13, an operation unit 15, a display unit 17, a communication unit 19, and a signal processing unit 21. These components are connected via a bus. A microphone 23 and a speaker 25 are connected to the signal processing unit 21.

The control unit 11 includes an arithmetic processing circuit such as a CPU. The control unit 11 executes, on the CPU, the control program stored in the storage unit 13 to realize various functions in the evaluation device 1. These functions include a function for evaluating a singing voice (evaluation function) and a function for generating the reference for that evaluation (evaluation reference generation function). Only one of the two functions may be realized. That is, the device may realize the evaluation function without the evaluation reference generation function (an evaluation device), or it may realize the evaluation reference generation function without the evaluation function (an evaluation reference generation device).

The storage unit 13 is a storage device such as a nonvolatile memory or a hard disk. The storage unit 13 stores the control program for realizing the evaluation function. The control program may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. In that case, the evaluation device 1 only needs to include a device that reads the recording medium. The control program may also be downloaded via a network.

The storage unit 13 also stores music data, singing voice data, and evaluation reference data as data related to singing. The music data includes data related to a karaoke song, for example, guide melody data, accompaniment data, and lyrics data. The guide melody data indicates the melody of the song. The accompaniment data indicates the accompaniment of the song. The guide melody data and the accompaniment data may be expressed in MIDI format. The lyrics data consists of data for displaying the lyrics of the song and data indicating the timing at which the displayed lyrics telop changes color. The singing voice data indicates the singing voice input by the singer through the microphone 23. In this example, the singing voice data is buffered in the storage unit 13 until the singing voice has been evaluated by the evaluation function. The storage unit 13 may also store data indicating the model voice that is input in order to generate the evaluation reference data.

The evaluation reference data is information that the evaluation function uses as the reference for evaluating a singing voice, and it is associated with the music data indicating the song to be evaluated (the song being output while the singing voice is input). In this example, the evaluation reference data is generated by the evaluation reference generation function from the input model voice, as data defining a reference pitch range (reference feature amount range) that changes over time as the song to be evaluated progresses. The reference pitch range is defined by an upper limit pitch and a lower limit pitch. It is not necessarily defined as a single range and may be defined as a plurality of ranges. In that case, the ranges are defined as a first range between a first upper limit pitch and a first lower limit pitch, and a second range between a second upper limit pitch and a second lower limit pitch. The reference pitch range is also determined so that its width from the reference pitch differs between the normal section and the specific section, both described later.

The evaluation reference data is not limited to a definition by upper and lower limit values of the feature amount. It may instead distinguish the normal section from the specific section by an identifier or the like, and be defined by the reference feature amount, the width of the feature amount in the normal section, and the width of the feature amount in the specific section.
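The two representations of the evaluation reference data described above can be sketched as follows. This is only an illustrative sketch: the class and field names (`LimitCriteria`, `WidthCriteria`, `is_specific`, and so on) are hypothetical, and treating each "width" as a distance measured on either side of the reference is an assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LimitCriteria:
    """Form 1: per-frame upper and lower limits (None where no range exists)."""
    upper: List[Optional[float]]
    lower: List[Optional[float]]


@dataclass
class WidthCriteria:
    """Form 2: reference feature amount plus one width per kind of section."""
    reference: List[Optional[float]]  # reference feature amount per frame
    is_specific: List[bool]           # identifier: True inside a specific section
    normal_width: float               # assumed distance from the reference, normal section
    specific_width: float             # assumed distance from the reference, specific section

    def to_limits(self) -> LimitCriteria:
        """Expand form 2 into the upper/lower-limit form."""
        upper: List[Optional[float]] = []
        lower: List[Optional[float]] = []
        for ref, specific in zip(self.reference, self.is_specific):
            if ref is None:
                upper.append(None)
                lower.append(None)
                continue
            width = self.specific_width if specific else self.normal_width
            upper.append(ref + width)
            lower.append(ref - width)
        return LimitCriteria(upper, lower)
```

Either form carries the same information for a comparison step; the second is more compact when the width is constant within each kind of section.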

The operation unit 15 is a device such as operation buttons provided on an operation panel or a remote controller, a keyboard, or a mouse, and outputs a signal corresponding to the input operation to the control unit 11. The display unit 17 is a display device such as a liquid crystal display or an organic EL display, and displays a screen under the control of the control unit 11. The operation unit 15 and the display unit 17 may be integrated as a touch panel. The communication unit 19 connects to a communication line such as the Internet under the control of the control unit 11 and exchanges information with an external device such as a server. The function of the storage unit 13 may be realized by an external device with which the communication unit 19 can communicate.

The signal processing unit 21 includes a sound source that generates an audio signal from a MIDI-format signal, an A/D converter, a D/A converter, and the like. The singing voice is converted into an electric signal by the microphone 23, input to the signal processing unit 21, A/D converted there, and output to the control unit 11. As described above, the singing voice is buffered in the storage unit 13 as singing voice data. The accompaniment data is read out by the control unit 11, D/A converted by the signal processing unit 21, and output from the speaker 25 as the accompaniment of the song. The guide melody may also be output from the speaker 25 at this time.

[Functions of the Evaluation Device 1]
The evaluation reference generation function and the evaluation function, which are realized when the control unit 11 of the evaluation device 1 executes the control program, are described below. Part or all of the configuration that realizes these functions may be implemented in hardware, or may be implemented on a server via a network.

FIG. 2 is a block diagram showing the configuration of the evaluation reference generation function and the evaluation function in the first embodiment of the present invention. The evaluation reference generation function 100 generates evaluation reference data from the input model voice and stores it in the storage unit 13. The evaluation function 200 compares the input singing voice with the evaluation reference data and evaluates the singing voice based on the comparison result. Each function is described in detail below.

[Evaluation Reference Generation Function]
The evaluation reference generation function 100 includes a reference acquisition unit 101, a reference feature amount calculation unit 103, a specific section detection unit 105, a generation unit 107, and an evaluation accuracy setting unit 109. The reference acquisition unit 101 acquires data (reference sound data) indicating the reference sound (in this example, the model voice) input to the microphone 23. At this time, the music data with which the generated evaluation reference data is to be associated is designated. The reference acquisition unit 101 may output the accompaniment sound and the like of the song in the designated music data from the speaker 25 and acquire the reference sound data in real time as the song progresses, or it may acquire reference sound data stored in advance by reading it from a storage device such as the storage unit 13. Because the acquired reference sound data only needs to indicate a sound that serves as the reference for evaluation, it does not necessarily have to indicate a voice that serves as a model for singing.

The reference feature amount calculation unit 103 analyzes the model voice (reference input sound) indicated by the reference sound data acquired by the reference acquisition unit 101 and calculates the temporal change in the feature amount of the model voice. In this example, the feature amount is the pitch. The reference feature amount calculation unit 103 therefore calculates the temporal change in the pitch of the model voice (hereinafter sometimes called the model pitch), that is, the model pitch waveform. Specifically, the model pitch waveform (feature amount data) is calculated by a known method, such as a method using the zero crossings of the model voice waveform or a method using an FFT (Fast Fourier Transform).
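As a rough illustration of the zero-crossing approach mentioned above, the following sketch estimates one pitch value per frame from the spacing of rising zero crossings. The function names, the fixed non-overlapping framing, and the use of Hz as the pitch unit are all assumptions made for this example; a practical pitch tracker would need filtering and voicing detection that are omitted here.

```python
import math
from typing import List, Optional


def zero_cross_pitch(frame: List[float], sample_rate: float) -> Optional[float]:
    """Estimate the fundamental frequency (Hz) of one frame from rising zero crossings."""
    crossings = []
    for i in range(1, len(frame)):
        if frame[i - 1] < 0.0 <= frame[i]:
            # Linear interpolation gives a sub-sample crossing position.
            frac = -frame[i - 1] / (frame[i] - frame[i - 1])
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None  # silence or too little signal: no pitch for this frame
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period


def pitch_waveform(signal: List[float], sample_rate: float,
                   frame_len: int) -> List[Optional[float]]:
    """Split the signal into non-overlapping frames and estimate one pitch per frame."""
    return [zero_cross_pitch(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, frame_len)]
```

Frames with no detectable pitch are returned as `None`, which corresponds to the stretches where no model pitch waveform exists.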

The specific section detection unit 105 analyzes the model pitch waveform and detects, within the input period of the model voice, sections that satisfy a predetermined specific condition (in this example, sections that contain a singing technique). A detected section is called a specific section. Each detected specific section is associated with the type of singing technique found in it. The detected singing techniques include, for example, the following. Pitch changes other than these (specific transition states) may also be detected, provided they are set in advance as sections satisfying a predetermined specific condition.
(1) Vibrato: The pitch oscillates up and down rapidly (with a period no longer than a predetermined period). A specific example of vibrato detection is disclosed in Japanese Patent Application Laid-Open No. 2005-107087.
(2) Kobushi: The pitch rises temporarily (within a predetermined time) and then returns to the original pitch. A specific example of kobushi detection is disclosed in Japanese Patent Application Laid-Open No. 2008-268370.
(3) Shakuri: The pitch rises over a predetermined time and then stabilizes. A specific example of shakuri detection is disclosed in Japanese Patent Application Laid-Open No. 2005-107334.
(4) Fall: The pitch drops over a predetermined time, after which the singing breaks off. A specific example of fall detection is disclosed in Japanese Patent Application Laid-Open No. 2008-225115.
(5) Long tone: The pitch stays within a narrow range for a certain period or longer. A specific example of long tone detection is disclosed in Japanese Patent Application Laid-Open No. 2008-225114.
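To make the idea of a specific section concrete, the sketch below detects long tones using only the informal condition given in item (5): the pitch stays within a narrow range for at least a minimum duration. It is not the method of the cited publication; the function name and the parameters `max_spread` and `min_frames` are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_long_tones(pitch: List[Optional[float]], max_spread: float,
                      min_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of candidate long tones.

    A candidate is a run of voiced frames whose highest and lowest pitch differ
    by at most max_spread and that lasts at least min_frames frames.
    """
    sections = []
    n = len(pitch)
    start = 0
    while start < n:
        if pitch[start] is None:
            start += 1
            continue
        lo = hi = pitch[start]
        end = start + 1
        # Grow the run while the pitch stays inside the allowed spread.
        while end < n and pitch[end] is not None:
            new_lo, new_hi = min(lo, pitch[end]), max(hi, pitch[end])
            if new_hi - new_lo > max_spread:
                break
            lo, hi = new_lo, new_hi
            end += 1
        if end - start >= min_frames:
            sections.append((start, end))
            start = end
        else:
            start += 1
    return sections
```

Each returned pair would then be recorded as a specific section associated with the long tone technique; detectors for the other techniques would produce sections in the same form.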

The generation unit 107 determines the reference pitch range using the model pitch waveform as the reference. In this example, the reference pitch range is determined by different rules (a first rule and a second rule) in the normal section (the part of the section in which the singing voice was input that excludes the specific sections) and in the specific sections. The first rule, which is applied to determine the reference pitch range in the normal section, is described first.

FIG. 3 is a diagram explaining the first rule applied to determine the reference pitch range in the normal section in the first embodiment of the present invention. The description takes as an example the case where the model pitch waveform is the waveform PS shown in FIG. 3. First, a window CS with a width PW in the pitch direction and a width TW in the time direction is generated. The width TW may be omitted.

The center C of the generated window CS is moved along the model pitch waveform PS. The region that the window CS passes through is taken as the reference pitch range AW1. The upper limit of the reference pitch range AW1 is shown as the upper limit pitch UL, and the lower limit as the lower limit pitch LL. The span from the model pitch waveform PS up to the upper limit pitch UL and the span down to the lower limit pitch LL are the allowable ranges for the model pitch. If there is a section NT in which the reference pitch range AW1 does not exist, for example because the model pitch waveform PS does not exist there, the data may include an identifier or the like indicating that no reference pitch range exists in the section NT.
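The window sweep of the first rule can be sketched on a frame-sampled pitch waveform as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, `tw` is measured in frames, and the swept region is approximated by its upper and lower envelope (for each frame, the highest and lowest model pitch within half a window width in time, extended by half of `pw`).

```python
from typing import List, Optional, Tuple


def first_rule_range(ps: List[Optional[float]], pw: float,
                     tw: int) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """Return (upper limit pitch UL, lower limit pitch LL) per frame.

    ps: model pitch per frame, None where no pitch exists.
    pw: window width in the pitch direction.
    tw: window width in the time direction, in frames (0 means no time width).
    """
    half_t = tw // 2
    upper: List[Optional[float]] = []
    lower: List[Optional[float]] = []
    for t in range(len(ps)):
        # Model pitch values whose window, centered on PS, covers frame t.
        nearby = [p for p in ps[max(0, t - half_t):t + half_t + 1] if p is not None]
        if not nearby:
            # Corresponds to a section NT: no reference pitch range here.
            upper.append(None)
            lower.append(None)
            continue
        upper.append(max(nearby) + pw / 2)
        lower.append(min(nearby) - pw / 2)
    return upper, lower
```

With `tw = 0` the range reduces to the model pitch plus and minus half of `pw` at every frame, which matches the case where the width TW is omitted.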

In this way, the reference pitch range in the normal section is determined from the model pitch waveform according to the first rule. Next, the second rule, which is applied to determine the reference pitch range in a specific section, is described. The second rule determines the reference pitch range by a method different from the first rule. The method also differs according to the type of singing technique associated with the specific section (in this example, vibrato detection section, kobushi detection section, fall detection section, shakuri detection section, and long tone detection section). The second rule applied to each type is described below.

FIG. 4 is a diagram explaining the second rule applied to determine the reference pitch range in the vibrato detection section in the first embodiment of the present invention. As shown in FIG. 4, the reference pitch range AW2 determined by applying the second rule in the vibrato detection section VP is the range obtained by extending outward, by a predetermined width (for example, a pitch equal to half the width PW of the window CS), from the maximum and minimum values of the model pitch waveform PS. The upper limit pitch UL and the lower limit pitch LL are thus set to constant values throughout the vibrato detection section. As a result, even if the period of the vibrato, the timing of its peaks, and so on do not match the model, the singing pitch falls within the reference pitch range as long as the vibrato is reasonably close.
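The vibrato rule described above can be sketched as follows, taking half of `pw` as the predetermined extension, as in the example given in the text. The function name and the frame-index interface (`start` inclusive, `end` exclusive) are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def vibrato_rule_range(ps: List[Optional[float]], start: int, end: int,
                       pw: float) -> Tuple[float, float]:
    """Return constant (upper limit pitch UL, lower limit pitch LL) for frames start..end-1."""
    section = [p for p in ps[start:end] if p is not None]
    if not section:
        raise ValueError("vibrato section contains no pitch values")
    ul = max(section) + pw / 2  # extend above the highest model pitch
    ll = min(section) - pw / 2  # extend below the lowest model pitch
    return ul, ll
```

Because UL and LL are constant over the section, a sung vibrato whose phase or period differs from the model still stays inside the range, provided its depth is similar.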

This determination method is only an example. It suffices that the reference pitch range AW2 in the vibrato detection section VP contains the entire model pitch waveform PS and is determined by a rule different from the one used to determine the reference pitch range AW1 in the normal section NP.

FIG. 5 is a diagram explaining the second rule applied to determine the reference pitch range in a kobushi detection section in the first embodiment of the present invention. As shown in FIG. 5, in the reference pitch range AW3 determined by applying the second rule in the kobushi detection section KP, the upper limit pitch UL is determined from the model pitch waveform PS in the same way as under the first rule. The lower limit pitch LL, on the other hand, is determined in the same way as under the first rule but from a waveform VS that assumes the model voice contains no kobushi. This prevents singing that omits the kobushi from falling outside the reference pitch range; that is, it avoids a low evaluation merely because the kobushi was not performed exactly as in the model voice.

This determination method is only an example. It suffices that the reference pitch range AW3 in the kobushi detection section KP contains the entire model pitch waveform PS and is determined by a rule different from the one used to determine the reference pitch range AW1 in the normal section NP.
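
A rough sketch of the kobushi rule follows. It assumes, purely for illustration, that the first rule can be approximated by a band of fixed half-width around a waveform; the actual first rule uses the window CS. The names `kobushi_range`, `inside`, and `half_width` are hypothetical.

```python
def kobushi_range(model_pitch, plain_pitch, half_width):
    """Per-frame (lower, upper) limits for a kobushi section.

    The upper limit follows the model waveform PS (with kobushi), and the
    lower limit follows the waveform VS that assumes no kobushi.
    """
    return [(v - half_width, p + half_width)
            for p, v in zip(model_pitch, plain_pitch)]


def inside(sung_pitch, limits):
    """True if every sung frame lies within its (lower, upper) limits."""
    return all(lo <= s <= hi for s, (lo, hi) in zip(sung_pitch, limits))


ps = [0.0, 80.0, 0.0]   # hypothetical model pitch with a kobushi bump (cents)
vs = [0.0, 0.0, 0.0]    # hypothetical pitch with the kobushi removed
limits = kobushi_range(ps, vs, half_width=25.0)

print(inside(ps, limits))  # singing with the kobushi
print(inside(vs, limits))  # singing without the kobushi
```

Both the ornamented and the plain rendition stay in range, which is the stated purpose of taking the two limits from different waveforms.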

FIG. 6 is a diagram explaining the second rule applied to determine the reference pitch range in a fall detection section in the first embodiment of the present invention. As shown in FIG. 6, the reference pitch range determined by applying the second rule in the fall detection section FP includes a first range AW4 and a second range AW5. The upper limit pitch UL1 and the lower limit pitch LL1 of the first range AW4 are determined from the model pitch waveform PS in the same way as under the first rule. The upper limit pitch UL2 and the lower limit pitch LL2 of the second range AW5, on the other hand, are determined in the same way as under the first rule but from a waveform VS that assumes the model voice contains no fall. This prevents singing that omits the fall from falling outside the reference pitch range; that is, it avoids a low evaluation merely because the fall was not performed exactly as in the model voice.

Alternatively, the upper limit pitch UL1 and the lower limit pitch LL2 may be disregarded, and the entire pitch range between the upper limit pitch UL2 and the lower limit pitch LL1 may be determined as the reference pitch range. This avoids a low evaluation even when the singer lowers the pitch less than the model does during the fall. This determination method is only an example. It suffices that the reference pitch range in the fall detection section FP (the first range AW4 and the second range AW5) contains the entire model pitch waveform PS within a part of one of the ranges and is determined by a rule different from the one used to determine the reference pitch range AW1 in the normal section NP.
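
The two variants of the fall rule (two separate ranges, or one merged range from LL1 to UL2) can be sketched as follows. As a labeled assumption, the first rule is replaced here by a simple band of fixed half-width; `band`, `in_fall_range`, and the sample values are hypothetical.

```python
def band(waveform, half_width):
    """Stand-in for the first rule: (lower, upper) per frame."""
    return [(p - half_width, p + half_width) for p in waveform]


def in_fall_range(pitch, first, second, merged=False):
    """Check one sung frame against the fall-section ranges.

    first  = (LL1, UL1), derived from the model waveform PS (with fall).
    second = (LL2, UL2), derived from the waveform VS (no fall).
    With merged=True the whole span from LL1 up to UL2 is accepted.
    """
    ll1, ul1 = first
    ll2, ul2 = second
    if merged:
        return ll1 <= pitch <= ul2
    return ll1 <= pitch <= ul1 or ll2 <= pitch <= ul2


ps = [0.0, -100.0, -200.0, -300.0]  # hypothetical falling model pitch (cents)
vs = [0.0, 0.0, 0.0, 0.0]           # hypothetical pitch without the fall
aw4 = band(ps, 25.0)
aw5 = band(vs, 25.0)

# A shallow fall that reaches only -100 cents at frame 2.
print(in_fall_range(-100.0, aw4[2], aw5[2]))
print(in_fall_range(-100.0, aw4[2], aw5[2], merged=True))
```

The same structure would apply to the shakuri section, where the model pitch approaches the note from below instead of dropping away from it.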

FIG. 7 is a diagram explaining the second rule applied to determine the reference pitch range in a shakuri detection section in the first embodiment of the present invention. As shown in FIG. 7, the reference pitch range determined by applying the second rule in the shakuri detection section SP includes a first range AW6 and a second range AW7. The upper limit pitch UL1 and the lower limit pitch LL1 of the first range AW6 are determined from the model pitch waveform PS in the same way as under the first rule. The upper limit pitch UL2 and the lower limit pitch LL2 of the second range AW7, on the other hand, are determined in the same way as under the first rule but from a waveform VS that assumes the model voice contains no shakuri. This prevents singing that omits the shakuri from falling outside the reference pitch range; that is, it avoids a low evaluation merely because the shakuri was not performed exactly as in the model voice.

Alternatively, the upper limit pitch UL1 and the lower limit pitch LL2 may be disregarded, and the entire pitch range between the upper limit pitch UL2 and the lower limit pitch LL1 may be determined as the reference pitch range. This avoids a low evaluation even when the singer raises the pitch less than the model does during the shakuri. This determination method is only an example. It suffices that the reference pitch range in the shakuri detection section SP (the first range AW6 and the second range AW7) contains the entire model pitch waveform PS within a part of one of the ranges and is determined by a rule different from the one used to determine the reference pitch range AW1 in the normal section NP.

FIG. 8 is a diagram explaining the second rule applied to determine the reference pitch range in a long tone detection section in the first embodiment of the present invention. As shown in FIG. 8, the reference pitch range AW8 determined by applying the second rule in the long tone detection section LP is a range widened by a predetermined width (for example, a pitch equal to half the width PW of the window CS) around the model pitch waveform PS. Because a pitch deviation is conspicuous in a long tone, this allows the criterion in the long tone detection section to be made stricter.

In this case, the width may be made narrower toward the later part of the long tone detection section LP. This determination method is only an example. It suffices that the reference pitch range AW8 in the long tone detection section LP contains the entire model pitch waveform PS and is determined by a rule different from the one used to determine the reference pitch range AW1 in the normal section NP.
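
One possible reading of the narrowing width is sketched below. The linear taper is an assumption; the specification only says the width may become narrower later in the section. `long_tone_range`, `start_half_width`, and `end_half_width` are hypothetical names.

```python
def long_tone_range(model_pitch, start_half_width, end_half_width):
    """Per-frame (lower, upper) limits whose half-width shrinks linearly
    from start_half_width at the first frame to end_half_width at the last."""
    n = len(model_pitch)
    limits = []
    for i, p in enumerate(model_pitch):
        t = i / (n - 1) if n > 1 else 0.0
        half = start_half_width + (end_half_width - start_half_width) * t
        limits.append((p - half, p + half))
    return limits


# Hypothetical flat long tone, five frames, pitch in cents.
limits = long_tone_range([0.0] * 5, start_half_width=40.0, end_half_width=20.0)
for lower, upper in limits:
    print(lower, upper)
```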

このようにして、生成部１０７は、基準ピッチ範囲を決定し、基準ピッチ範囲を示すパラメータ（例えば、上限ピッチおよび下限ピッチ）により規定された評価基準データを生成する。 In this way, the generation unit 107 determines the reference pitch range, and generates evaluation reference data defined by the parameters indicating the reference pitch range (for example, the upper limit pitch and the lower limit pitch).

Returning to FIG. 2, the description continues. The evaluation accuracy setting unit 109 sets the accuracy of the singing evaluation. The evaluation accuracy is set by the user. It is a parameter that determines the allowable range under at least one of the first rule and the second rule in the generation unit 107. For example, the generation unit 107 generates the evaluation reference data so that the reference pitch range (AW1, AW2, and so on) becomes narrower as the evaluation accuracy becomes higher. Specifically, the higher the evaluation accuracy is set, the smaller the area (in particular, the width PW) of the window CS described above is made when the reference pitch range is determined. The evaluation accuracy setting unit 109 may be omitted, in which case the evaluation accuracy is simply fixed at a predetermined value.
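
As a hedged illustration, the relationship between evaluation accuracy and the window width PW could be modeled as below. The linear mapping, the 0-to-1 accuracy scale, and the names `window_width` and `min_scale` are assumptions; the specification states only that a higher accuracy gives a smaller window.

```python
def window_width(base_width, accuracy, min_scale=0.25):
    """Shrink the window width as the accuracy setting rises.

    accuracy is assumed to lie in [0, 1]; 0 keeps the base width and
    1 shrinks it to base_width * min_scale.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    scale = 1.0 - (1.0 - min_scale) * accuracy
    return base_width * scale


for accuracy in (0.0, 0.5, 1.0):
    print(accuracy, window_width(80.0, accuracy))
```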

[Evaluation function]
Next, the evaluation function 200 is described. The evaluation function 200 includes an accompaniment output unit 201, a singing sound acquisition unit 203, a singing feature amount calculation unit 205, an evaluation reference acquisition unit 207, a feature amount comparison unit 209, and a voice evaluation unit 211. The accompaniment output unit 201 reads the accompaniment data corresponding to the song designated by the singer and outputs the accompaniment sound from the speaker 25 via the signal processing unit 21. The singing sound acquisition unit 203 acquires singing voice data representing the singing voice input from the microphone 23. In this example, the sound input to the microphone 23 while the accompaniment sound is being output is recognized as the singing voice to be evaluated. The singing sound acquisition unit 203 acquires the singing voice data buffered in the storage unit 13; it may instead acquire the data after the singing voice data for the entire song has been stored in the storage unit 13, or acquire the data directly from the signal processing unit 21.

歌唱特徴量算出部２０５は、歌唱音取得部２０３によって取得された歌唱音声データを解析し、歌唱音声の特徴量の時間的変化を算出する。この例では、上述と同様に、特徴量はピッチである。したがって、歌唱特徴量算出部２０５は、歌唱音声のピッチ（以下、歌唱ピッチという場合がある）の時間的な変化、すなわち、歌唱ピッチ波形を算出する。算出方法は、上述した模範ピッチ波形の算出方法と同様である。 The singing feature amount calculation unit 205 analyzes the singing voice data acquired by the singing sound acquisition unit 203, and calculates a temporal change in the singing voice feature amount. In this example, as described above, the feature amount is a pitch. Accordingly, the singing feature amount calculation unit 205 calculates a temporal change in the pitch of the singing voice (hereinafter, sometimes referred to as a singing pitch), that is, a singing pitch waveform. The calculation method is the same as the above-described exemplary pitch waveform calculation method.

評価基準取得部２０７は、評価対象となる歌唱曲（伴奏出力部２０１において出力した伴奏音）の楽曲データに対応付けられた評価基準データを記憶部１３から読み出して取得する。特徴量比較部２０９は、評価基準データが示す基準ピッチ範囲と、歌唱ピッチとを比較する。 The evaluation reference acquisition unit 207 reads out and acquires evaluation reference data associated with the song data of the song to be evaluated (accompaniment sound output from the accompaniment output unit 201) from the storage unit 13. The feature amount comparison unit 209 compares the reference pitch range indicated by the evaluation reference data with the singing pitch.

FIG. 9 is a diagram explaining the singing evaluation method using the singing pitch in the first embodiment of the present invention. The waveform IS (singing pitch waveform), which represents the change over time of the singing pitch calculated by the singing feature amount calculation unit 205, is compared with the reference pitch range defined by the upper limit pitch UL and the lower limit pitch LL. FIG. 9 shows an example in which the singing pitch has been calculated up to a point slightly past the vibrato detection section VP. At this point, the singing pitch waveform IS is lower than the lower limit pitch LL in two out-of-range sections NG and is therefore outside the reference pitch range. As one parameter of the evaluation result for the singing voice, the degree of mismatch is set, for example, so that it increases as the proportion of out-of-range sections NG in the whole section increases.
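
A minimal sketch of this comparison is shown below, treating the degree of mismatch as the fraction of frames whose sung pitch lies outside the per-frame limits. Representing frames without a detected pitch as `None` and skipping them is an assumption, as are the function name and sample values.

```python
def mismatch_degree(sung_pitch, limits):
    """Fraction of pitched frames outside their (lower, upper) limits.

    Frames whose sung pitch is None (no pitch detected) are skipped.
    """
    evaluated = 0
    outside = 0
    for pitch, (lower, upper) in zip(sung_pitch, limits):
        if pitch is None:
            continue
        evaluated += 1
        if not lower <= pitch <= upper:
            outside += 1
    return outside / evaluated if evaluated else 0.0


# Hypothetical data: nine frames, constant limits of +/-25 cents.
limits = [(-25.0, 25.0)] * 9
sung = [0.0, 10.0, -30.0, -40.0, 5.0, None, 20.0, 0.0, -5.0]
print(mismatch_degree(sung, limits))
```

Here two of the eight pitched frames fall below the lower limit, giving a mismatch degree of 0.25.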

In the vibrato detection section VP, even if the vibrato period, peak timing, and so on of the singing pitch waveform IS are shifted, the out-of-range sections NG do not increase, because the reference pitch range is determined by the second rule described above. Consequently, the degree of mismatch does not rise. Thus, with this singing evaluation method, defining an allowable range around the model pitch waveform in advance as the reference pitch range stabilizes the singing evaluation criterion while giving the evaluation redundancy with respect to singing techniques. If the singing pitch waveform is present in the section NT, that part may be excluded from evaluation or may be treated as an out-of-range section NG.

図２に戻って説明を続ける。音声評価部２１１は、特徴量比較部２０９における比較結果に基づいて、歌唱音声の評価の指標となる評価値を算出する。この例では、特徴量比較部２０９で算出された不一致度が高いほど評価値が低く算出され、歌唱音声の評価が悪くなる。なお、音声評価部２１１は、この不一致度のみに基づいて評価値を算出するのではなく、さらに他の要素に基づいて評価値を算出してもよい。他の要素は、歌唱技法および歌唱音声データから抽出可能な他のパラメータなどが想定される。音声評価部２１１による評価結果は、表示部１７において提示されてもよい。以上、評価基準生成機能１００および評価機能２００の説明である。 Returning to FIG. 2, the description will be continued. The voice evaluation unit 211 calculates an evaluation value serving as an index for evaluating the singing voice based on the comparison result in the feature amount comparison unit 209. In this example, the higher the mismatch degree calculated by the feature amount comparison unit 209, the lower the evaluation value is calculated, and the evaluation of the singing voice becomes worse. Note that the voice evaluation unit 211 may calculate the evaluation value based on another element, instead of calculating the evaluation value based only on the degree of mismatch. As other elements, a singing technique and other parameters that can be extracted from the singing voice data are assumed. The evaluation result by the voice evaluation unit 211 may be presented on the display unit 17. The above is the description of the evaluation reference generation function 100 and the evaluation function 200.

<Second Embodiment>
In the first embodiment described above, pitch was used as the feature amount. The second embodiment describes an example in which the first-order time derivative of the volume (the volume velocity) is used as the feature amount. Because the only difference lies in the feature amount calculated by the reference feature amount calculation unit 103 and the singing feature amount calculation unit 205, a detailed description is omitted.

図１０は、本発明の第２実施形態における基準音量速度範囲を説明する図である。図１０（ａ）は、模範音声から算出された模範音量の波形ＡＳ１を示している。図１０（ｂ）は、模範音量波形ＡＳ１を時間で一階微分した音量速度波形ＡＳ２を示す。また、この音量速度波形ＡＳ２を基準として、第１実施形態で説明した複数のルール（第１ルールおよび第２ルール）に相当するルールを用いて決められた基準音量速度範囲ＡＷ１０、ＡＷ１１（上限音量速度ＵＬと下限音量速度ＬＬとの間）についても示されている。基準音量速度範囲ＡＷ１０は、通常区間ＮＰでの音量速度の範囲である。基準音量速度範囲ＡＷ１１は、ビブラート検出区間ＶＰにおける音量速度の範囲である。 FIG. 10 is a diagram for explaining a reference volume speed range in the second embodiment of the present invention. FIG. 10A shows a waveform AS1 of the model volume calculated from the model sound. FIG. 10B shows a volume velocity waveform AS2 obtained by first-order differentiation of the model volume waveform AS1 with respect to time. Further, with reference to this volume speed waveform AS2, reference volume speed ranges AW10 and AW11 (upper limit volume) determined using rules corresponding to the plurality of rules (first rule and second rule) described in the first embodiment. Also shown is between the speed UL and the lower limit volume speed LL. The reference volume speed range AW10 is a volume speed range in the normal section NP. The reference volume speed range AW11 is a volume speed range in the vibrato detection section VP.

このようにして決定された基準音量速度範囲を示す評価基準データが記憶部１３に記憶される。そして、歌唱音の評価の際には、歌唱音声から特徴量として算出した音量速度を、基準音量速度範囲と比較する。このように、一階微分をすることによって音量の直流成分を除去し、音量の変化の傾向を比較する。これによって、歌唱音の入力レベルが模範音声の入力レベルと大きく異なっても、変化の傾向を用いた比較を行うことができる。なお、音量速度を用いると、例えば、例示したビブラートの他にも、歌唱の抑揚、ブレスといった技法の検出も可能となる。 Evaluation reference data indicating the reference volume speed range determined in this manner is stored in the storage unit 13. Then, when evaluating the singing sound, the volume speed calculated as the feature amount from the singing voice is compared with the reference volume speed range. In this way, the direct current component of the volume is removed by performing the first-order differentiation, and the tendency of the volume change is compared. Thereby, even if the input level of the singing sound is greatly different from the input level of the model voice, the comparison using the tendency of change can be performed. When the volume speed is used, for example, techniques such as singing inflection and breathing can be detected in addition to the exemplified vibrato.
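
The effect of differentiation can be illustrated with a small sketch. It assumes the volume is expressed in decibels per frame, so that a constant gain difference between the two recordings appears as an additive offset; `volume_velocity` and the sample values are hypothetical.

```python
def volume_velocity(volume, frame_period=1.0):
    """First-order difference of a volume sequence (change per unit time)."""
    return [(b - a) / frame_period for a, b in zip(volume, volume[1:])]


# Hypothetical volume envelopes in dB; the sung take is 12 dB quieter.
model = [60.0, 62.0, 66.0, 64.0]
sung = [v - 12.0 for v in model]

print(volume_velocity(model))
print(volume_velocity(sung))
```

The two velocity sequences are identical even though the absolute levels differ, which is why the comparison is insensitive to the input level.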

The feature amount may also be a frequency distribution. For example, using the power ratio between the fundamental and its harmonics, or a parameter derived from that ratio, as the feature amount allows an evaluation based on the change in voice quality over time. Whichever feature amount is adopted, in a specific section where a special singing style in the model voice is detected, the second rule is set so that the singing voice remains within the reference feature amount range even if it does not include that special singing style.

<Third Embodiment>
The third embodiment describes an example in which the reference pitch range contained in the evaluation reference data is defined in two levels. Although the reference pitch range is defined in two levels in the following description, it may be defined in more levels.

FIG. 11 is a diagram explaining how the reference pitch range is determined in the normal section and the vibrato detection section in the third embodiment of the present invention. FIG. 11 corresponds to FIG. 4. As shown in this figure, in the normal section NP, the reference pitch range containing the model pitch waveform PS includes a high evaluation range AW1 and a low evaluation range AW1a. In the vibrato detection section VP, the reference pitch range containing the model pitch waveform PS includes a high evaluation range AW2 and a low evaluation range AW2a. In this example, the low evaluation ranges AW1a and AW2a are the pitch widths between an upper limit pitch ULa, obtained by raising the upper limit pitch UL by a predetermined pitch, and a lower limit pitch LLa, obtained by lowering the lower limit pitch LL by a predetermined pitch.

The evaluation reference data may thus indicate a reference pitch range divided into multiple levels, such as a high evaluation range and a low evaluation range. A reference pitch range divided in this way can be used in various ways when evaluating the singing voice. In the first embodiment, the judgment was only whether or not the singing pitch waveform lay within the reference pitch range, and the amount of deviation was not considered. By providing multiple levels in the reference pitch range as in this example, it is also possible to identify sections in which the singing pitch waveform lies between the upper limit pitches ULa and UL or between the lower limit pitches LLa and LL. This makes it possible to distinguish a singing pitch waveform that leaves the reference pitch range only slightly from one that leaves it completely, and to weight the calculation of the degree of mismatch accordingly, so that the singing voice can be evaluated in more detail.

The region between the upper limit pitches ULa and UL and the region between the lower limit pitches LLa and LL may also be distinguished from each other, and it may be determined which of them contains more of the singing pitch waveform. This makes it possible to evaluate whether the singing pitch deviates toward the sharp side (higher pitch) or the flat side (lower pitch).

In addition, the feature amount comparison unit 209 may calculate a mismatch degree 1, obtained by comparing the singing pitch waveform with the reference pitch widths AW1 and AW2 between the upper limit pitch UL and the lower limit pitch LL, and a mismatch degree 2, obtained by comparing the singing pitch waveform with the reference pitch widths AW1a and AW2a between the upper limit pitch ULa and the lower limit pitch LLa. The relationship between mismatch degree 1 and mismatch degree 2 (for example, their ratio) can then be used as a parameter for evaluation.

The pitch width AW1Ua and the pitch width AW1La need not be equal and may differ. For example, making the pitch width AW1Ua narrow and the pitch width AW1La larger than AW1Ua makes the evaluation stricter toward singing that deviates to the sharp side.
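
The multi-level range can be sketched as a per-frame classification followed by a weighted mismatch. The label names, the penalty weights, and the constant limits are assumptions chosen for illustration; the specification does not prescribe a particular weighting.

```python
def classify(pitch, inner, outer):
    """Label one frame against the high range (LL, UL) and the
    wider low range (LLa, ULa)."""
    ll, ul = inner
    lla, ula = outer
    if ll <= pitch <= ul:
        return "high"
    if ul < pitch <= ula:
        return "near_sharp"
    if lla <= pitch < ll:
        return "near_flat"
    return "out"


def weighted_mismatch(sung_pitch, inner, outer, near_weight=0.5):
    """Average penalty: 0 inside the high range, near_weight inside the
    low range only, 1 outside both."""
    penalty = {"high": 0.0, "near_sharp": near_weight,
               "near_flat": near_weight, "out": 1.0}
    labels = [classify(p, inner, outer) for p in sung_pitch]
    return sum(penalty[l] for l in labels) / len(labels), labels


inner = (-25.0, 25.0)   # hypothetical LL, UL in cents
outer = (-50.0, 40.0)   # hypothetical LLa, ULa; sharp side is narrower
score, labels = weighted_mismatch([0.0, 30.0, -40.0, 60.0], inner, outer)
print(score, labels)
```

Counting the `near_sharp` and `near_flat` labels separately gives the sharp-versus-flat tendency, and the asymmetric outer limits show how deviation on the sharp side can be treated more strictly.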

<Fourth Embodiment>
The fourth embodiment describes an evaluation function 200A that detects singing techniques in the singing voice and changes the processing of the feature amount comparison unit 209A or the voice evaluation unit 211A accordingly.

図１２は、本発明の第４実施形態における評価機能の構成を示すブロック図である。評価機能２００Ａの構成のうち、上述した第１実施形態における評価機能２００とは異なる構成を説明する。特定区間検出部２１３は、第１実施形態における評価基準生成機能１００の特定区間検出部１０５と同様な機能を有し、歌唱ピッチ波形から特定の歌唱技法が含まれる第２特定区間を検出する。なお、以下の説明では、特定区間検出部１０５において検出された特定区間を第１特定区間と表記して、この第２特定区間と区別する。 FIG. 12 is a block diagram showing the configuration of the evaluation function in the fourth embodiment of the present invention. Of the configuration of the evaluation function 200A, a configuration different from the evaluation function 200 in the first embodiment described above will be described. The specific section detection unit 213 has the same function as the specific section detection unit 105 of the evaluation reference generation function 100 in the first embodiment, and detects a second specific section including a specific singing technique from the singing pitch waveform. In the following description, the specific section detected by the specific section detection unit 105 is described as a first specific section and is distinguished from the second specific section.

The feature amount comparison unit 209A calculates the degree of mismatch by almost the same processing as the feature amount comparison unit 209 in the first embodiment, with the following difference. The feature amount comparison unit 209A determines whether there is a section in which a normal section of the evaluation reference data (that is, a section that is not a first specific section) overlaps a second specific section, in which the singing pitch waveform corresponds to a specific singing technique. If such an overlapping section exists, it is not used in calculating the degree of mismatch. When the singer uses a singing technique in a section different from the model voice, the singing pitch may leave the reference pitch range; this processing suppresses the resulting increase in the degree of mismatch. The first embodiment, which does not perform this processing, may be used when singing close to the model voice is required.
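
The exclusion of overlapping sections can be sketched with per-frame flags. The boolean-flag representation and the names `mismatch_with_exclusion`, `ref_specific`, and `sung_specific` are assumptions; the specification describes sections, not frames.

```python
def mismatch_with_exclusion(sung_pitch, limits, ref_specific, sung_specific):
    """Fraction of frames outside the limits, skipping frames where the
    singer uses a technique (second specific section) but the reference
    is in a normal section (not a first specific section)."""
    evaluated = 0
    outside = 0
    for pitch, (lower, upper), ref, sung in zip(
            sung_pitch, limits, ref_specific, sung_specific):
        if sung and not ref:
            continue  # overlap of normal section and second specific section
        evaluated += 1
        if not lower <= pitch <= upper:
            outside += 1
    return outside / evaluated if evaluated else 0.0


# Hypothetical data: the singer adds a vibrato on frames 1 to 3.
limits = [(-25.0, 25.0)] * 6
sung = [0.0, 40.0, -40.0, 40.0, 0.0, 30.0]
ref_flags = [False] * 6
sung_flags = [False, True, True, True, False, False]

with_exclusion = mismatch_with_exclusion(sung, limits, ref_flags, sung_flags)
without_exclusion = mismatch_with_exclusion(sung, limits, ref_flags, [False] * 6)
print(with_exclusion, without_exclusion)
```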

In this example, the voice evaluation unit 211A calculates the evaluation value by almost the same processing as the voice evaluation unit 211 in the first embodiment, but when a second specific section overlaps a normal section of the evaluation reference data as described above, it calculates the evaluation value so that the evaluation becomes higher. Instead of raising the evaluation in that case, the voice evaluation unit 211A may calculate the evaluation value so that the evaluation becomes higher when the first specific section and the second specific section coincide.

<Fifth Embodiment>
In the embodiments described above, the specific section detection unit 105 detected specific sections (evaluation target sections) in which the model pitch waveform contains a waveform corresponding to a singing technique. In the fifth embodiment, the specific section detection unit 105 additionally detects specific sections to be excluded from evaluation (evaluation exclusion sections) when the model pitch waveform contains a waveform that satisfies a particular condition. The specific sections therefore include evaluation target sections and evaluation exclusion sections, each determined according to the condition that the model pitch waveform satisfies. An evaluation target section is a section containing a singing technique, as described above. An evaluation exclusion section may be, for example, a section in which the model pitch changes steeply, or a section in which the model pitch cannot be calculated reliably and the values scatter. Under the second rule applied in the generation unit 107, the same processing as in the first embodiment is performed for evaluation target sections, while for evaluation exclusion sections an identifier indicating that the section is not subject to evaluation is added to the evaluation reference data. In other words, for specific sections that are not subject to evaluation, the second rule can be said to generate evaluation reference data in which the allowable range is set to "0".

All specific sections may be treated as evaluation exclusion sections. In that case, sections containing a singing technique may be treated as evaluation exclusion sections. When the second specific section described in the fourth embodiment overlaps an evaluation exclusion section (or is contained in one), the evaluation value may be calculated so that the evaluation becomes higher.

<Sixth Embodiment>
In the embodiments described above, the reference feature amount calculation unit 103 calculated the feature amount (for example, the model pitch) from the model voice acquired as the reference input sound. As noted earlier, the model voice may be something other than a human voice, such as synthesized speech or other computer-generated sound, or the sound of a musical instrument. The sixth embodiment describes several examples in which data other than sound waveform data such as a model voice is used as the reference data for calculating the feature amount.

The reference data may be sequence data that specifies sounds, such as data in MIDI format. It may also be data obtained by hand-drawn input of a waveform PS like the one shown in FIG. 3 (for example, by scanning a waveform drawn on paper and inputting it to a computer as image data, or by inputting it with the operation unit 15). Such hand-drawn input may be performed while listening to a model performance of the song. Thus, as long as the reference data can be converted into the change over time of a sound feature amount, it need not be data representing the sound itself (the model voice in the first embodiment).
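
As an illustration of converting sequence data into a pitch waveform, the sketch below takes a list of already-parsed note events and samples it at fixed frame intervals. The tuple layout `(start_ms, end_ms, note_number)` and the function names are assumptions; no MIDI file parsing is shown. The note-number-to-frequency conversion uses the standard equal-temperament relation with A4 (note 69) at 440 Hz.

```python
def midi_note_to_hz(note_number):
    """Equal-temperament frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)


def notes_to_pitch_waveform(notes, frame_ms, duration_ms):
    """Sample note events into one pitch value per frame.

    notes is a list of (start_ms, end_ms, note_number) tuples.
    Frames not covered by any note get None.
    """
    waveform = []
    for t in range(0, duration_ms, frame_ms):
        pitch = None
        for start_ms, end_ms, note_number in notes:
            if start_ms <= t < end_ms:
                pitch = midi_note_to_hz(note_number)
                break
        waveform.append(pitch)
    return waveform


# Hypothetical two-note melody: A4 for 500 ms, then A5 for 500 ms.
notes = [(0, 500, 69), (500, 1000, 81)]
waveform = notes_to_pitch_waveform(notes, frame_ms=250, duration_ms=1250)
print(waveform)
```

The resulting sequence plays the same role as the model pitch waveform PS and could be passed to the range-determination rules in place of a waveform computed from audio.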

When the reference data is a model voice as in the first embodiment, a specific section is defined as a section in which the temporal change of the feature amount satisfies a specific condition. The presence of a predetermined singing technique was described as a concrete example of such a condition. However, the determination of whether the specific condition is satisfied is not limited to the temporal change of the feature amount; it may instead be based on the value of the feature amount at each timing, for example whether the feature amount is outside a predetermined range (if the feature amount is pitch, whether the note is high or low).
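A condition based on the value of the feature amount at each timing can be sketched as follows. This is only an illustrative example: the function name, the frame-indexed list representation, and the `low` and `high` thresholds are assumptions and do not appear in the embodiment.

```python
def find_out_of_range_sections(pitch, low, high):
    """Return sections in which the per-frame value lies outside [low, high].

    pitch: list of per-frame pitch values; None marks frames without pitch.
    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    Frames without pitch are treated as not satisfying the condition.
    """
    sections = []
    start = None
    for i, p in enumerate(pitch):
        outside = p is not None and (p < low or p > high)
        if outside and start is None:
            start = i                      # a new section begins
        elif not outside and start is not None:
            sections.append((start, i))    # the current section ends
            start = None
    if start is not None:                  # section runs to the last frame
        sections.append((start, len(pitch)))
    return sections


if __name__ == "__main__":
    curve = [6000, 6000, 7300, 7400, 6000, 5000, None, 7500]
    print(find_out_of_range_sections(curve, low=5500, high=7200))
```

Each returned pair could then be handled as a specific section in which the allowable range differs from that of the normal section.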

When the reference data includes information on the song, a specific section in which the allowable range is changed from that of the normal section may be detected based on this song information. The song information is, for example, song structure information (opening, chorus, A melody, B melody, start of the A melody, ending, etc.), range information (high notes, low notes, etc.), and note information (chord tones, passing tones, etc.). Note that the feature amount described above is also information obtained by calculation from the reference data (model voice), and is therefore a kind of information contained in the reference data.

The amount by which the allowable range changes may be set by the user via the operation unit 15, or may be set in advance. The amount of change may be set to differ depending on the type of information contained in the reference data. The allowable range need not change only between the normal section and the specific section; it may also be changed without distinguishing between sections. The evaluation reference generation function 100B for such a case is described below.

FIG. 13 is a block diagram showing the configuration of the evaluation reference generation function in the sixth embodiment of the present invention. As shown in FIG. 13, the evaluation reference generation function 100B executes its processing without distinguishing specific sections, and therefore need not include the specific section detection unit 105. Instead, the generation unit 107B acquires information contained in the reference data from the reference acquisition unit 101 or the reference feature amount calculation unit 103, and determines the reference data range while changing the allowable range based on this information. Thus, a configuration for detecting specific sections is not necessarily required.
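The behavior attributed to the generation unit 107B can be illustrated with the following sketch, which changes the allowable range frame by frame according to information attached to the reference data, without any section detection step. The tag names (`"chorus"`, `"high"`), the cent values, and the data layout are hypothetical choices for the example; the embodiment does not specify them.

```python
def build_reference_range(model_pitch, frame_info, base_width=50, width_change=None):
    """Build a per-frame (lower, upper) range around the model pitch.

    model_pitch: per-frame model pitch in cents, or None where there is no pitch.
    frame_info: per-frame set of tags taken from the reference data
                (hypothetical labels such as "chorus" or "high").
    base_width: allowable range in cents when no tag applies (assumed value).
    width_change: mapping from tag to the change in allowable range, so that
                  different types of information can change it by different amounts.
    """
    width_change = width_change or {}
    ranges = []
    for pitch, info in zip(model_pitch, frame_info):
        if pitch is None:
            ranges.append(None)            # nothing to evaluate in this frame
            continue
        width = base_width + sum(width_change.get(tag, 0) for tag in info)
        width = max(width, 0)              # never allow a negative width
        ranges.append((pitch - width, pitch + width))
    return ranges


if __name__ == "__main__":
    pitch = [6000, 6200, None, 7400]
    info = [{"verse"}, {"chorus"}, set(), {"chorus", "high"}]
    print(build_reference_range(pitch, info, width_change={"chorus": 30, "high": 20}))
```

Because the width is computed per frame from the tags, the same code covers both the case where the range changes at section boundaries and the case where it changes without regard to sections.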

<Seventh Embodiment>
In the embodiments described above, the generation unit 107 determines the reference pitch range according to different rules for the normal section and the specific section. In the seventh embodiment, the reference pitch range is determined according to a single rule, without separating normal and specific sections. As long as this single rule determines the reference pitch range with the model pitch waveform as the reference, it may be a rule equivalent to the first rule described above, or another rule, for example a rule that widens the allowable range toward lower pitches, or a rule that changes the allowable range according to the amount of change in the model pitch waveform. In such a case, there is no need to detect specific sections to which the second rule of the above embodiments would apply, so the specific section detection unit 105 may be omitted.
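The two example rules mentioned above (a wider allowable range toward lower pitches, and an allowable range that follows the amount of change in the model pitch waveform) can be combined into one rule as in the following sketch. The linear form of the rule and every parameter (`base`, `pivot`, `low_gain`, `change_gain`) are assumptions introduced for illustration only.

```python
def single_rule_range(model_pitch, base=50.0, pivot=6000.0,
                      low_gain=0.0, change_gain=0.0):
    """Determine a (lower, upper) pitch range per frame using one rule.

    model_pitch: per-frame model pitch in cents (no missing frames assumed).
    base: allowable range in cents before any adjustment.
    pivot, low_gain: below `pivot`, the range widens by `low_gain` cents
                     per cent of distance from the pivot.
    change_gain: the range widens by `change_gain` cents per cent of
                 frame-to-frame change in the model pitch.
    """
    ranges = []
    previous = None
    for pitch in model_pitch:
        width = base
        width += low_gain * max(0.0, pivot - pitch)       # lower pitch, wider range
        if previous is not None:
            width += change_gain * abs(pitch - previous)  # fast change, wider range
        ranges.append((pitch - width, pitch + width))
        previous = pitch
    return ranges


if __name__ == "__main__":
    print(single_rule_range([6000.0, 5000.0, 5000.0], low_gain=0.05))
    print(single_rule_range([6000.0, 6200.0, 6200.0], change_gain=0.5))
```

Since the same function is applied to every frame, no specific section needs to be detected beforehand.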

When the second rule is not used in this way, the reference feature amount calculation unit 103 may apply filter processing such as an LPF (low-pass filter) to the waveform indicating the temporal change in the feature amount of the model voice, and use the result as the model pitch waveform (feature amount data). A model pitch waveform obtained through an LPF has, for example, its vibrato component removed. Setting the allowable range somewhat wide in this state improves the redundancy of the evaluation with respect to singing techniques. Even when the second rule is used, the reference feature amount calculation unit 103 may calculate a model pitch waveform to which filter processing has been applied.
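The effect of low-pass filtering on a pitch waveform containing vibrato can be seen in the following sketch. The embodiment does not specify the filter design, so a centered moving average is used here purely as a simple stand-in for an LPF; the function name, the window length, and the synthetic vibrato are assumptions.

```python
def moving_average_lpf(pitch, window=5):
    """Smooth a pitch curve with a centered moving average.

    pitch: per-frame pitch values (no missing frames assumed).
    window: number of frames averaged; near the ends of the curve the
            window is shortened to the frames that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(pitch)):
        lo = max(0, i - half)
        hi = min(len(pitch), i + half + 1)
        segment = pitch[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    # Synthetic vibrato: +/-50 cents around 6000 cents with a 4-frame period.
    vibrato = [6000 + [0, 50, 0, -50][i % 4] for i in range(16)]
    print(moving_average_lpf(vibrato, window=5))
```

With this synthetic input, the swing of plus or minus 50 cents is reduced to at most 10 cents away from the ends of the curve, which is why a uniformly wider allowable range around the filtered waveform can accommodate a singer's vibrato.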

DESCRIPTION OF SYMBOLS
1 ... evaluation device, 11 ... control unit, 13 ... storage unit, 15 ... operation unit, 17 ... display unit, 19 ... communication unit, 21 ... signal processing unit, 23 ... microphone, 25 ... speaker, 100 ... evaluation reference generation function, 101 ... reference acquisition unit, 103 ... reference feature amount calculation unit, 105 ... specific section detection unit, 107 ... generation unit, 109 ... evaluation accuracy setting unit, 200, 200A ... evaluation function, 201 ... accompaniment output unit, 203 ... signal acquisition unit, 205 ... singing feature amount calculation unit, 207 ... evaluation reference acquisition unit, 209, 209A ... feature amount comparison unit, 211, 211A ... voice evaluation unit, 213 ... specific section detection unit

Claims

An evaluation reference generation device comprising:
a reference acquisition unit that acquires reference data;
a reference feature amount calculation unit that calculates, from the reference data, feature amount data indicating a temporal change in a feature amount of sound; and
a generation unit that generates evaluation reference data defining a reference feature amount range, the reference feature amount range being set so as to include, as a reference, the feature amount indicated by the feature amount data and to include an allowable range from the reference, the allowable range being changed based on information contained in the reference data.

The evaluation reference generation device according to claim 1, further comprising a detection unit that detects, from the feature amount data, a specific section in which the feature amount satisfies a specific condition,
wherein the allowable range of the reference feature amount range defined by the evaluation reference data generated by the generation unit is set by a first rule outside the specific section and by a second rule within the specific section, whereby the allowable range is changed based on information contained in the reference data.

The evaluation reference generation device according to claim 1, further comprising an evaluation accuracy setting unit that sets an evaluation accuracy,
wherein the generation unit changes the allowable range based on the evaluation accuracy set by the evaluation accuracy setting unit.

The evaluation reference generation device according to claim 1, wherein the allowable range is divided into a plurality of evaluation ranges that differ in evaluation method.

The evaluation reference generation device according to claim 2, wherein
the reference data indicates an input sound,
the feature amount includes a pitch of the input sound,
the allowable range set by the first rule is a pitch width from the pitch of the input sound, and
the allowable range set by the second rule is wider than the allowable range set by the first rule.

The evaluation reference generation device according to claim 5, wherein the second rule includes a rule for setting an allowable range comprising a first pitch width that includes the pitch of the reference sound and a second pitch width that does not include the pitch and is not continuous with the first pitch width.

The evaluation reference generation device according to claim 1, wherein the feature amount includes a differential value of a volume of the reference sound.

The evaluation reference generation device, wherein
the specific section includes an evaluation target section and an evaluation exclusion section that are determined according to a condition that the feature amount satisfies, and
the second rule sets the allowable range in the evaluation target section and, for the evaluation exclusion section, adds to the evaluation reference data an identifier indicating that the section is not subject to evaluation.

The evaluation reference generation device according to claim 1, wherein the reference feature amount calculation unit calculates the feature amount data indicating a waveform obtained by applying predetermined filter processing to a waveform indicating a temporal change in a feature amount of sound derived from the reference data.

A singing evaluation device comprising:
the evaluation reference generation device according to any one of claims 1 to 9;
a singing sound acquisition unit that acquires a singing input sound;
a singing feature amount calculation unit that calculates, from the singing input sound, singing feature amount data indicating a temporal change in the feature amount;
a comparison unit that compares the reference feature amount range indicated by the evaluation reference data with the temporal change in the feature amount indicated by the singing feature amount data; and
an evaluation unit that calculates an evaluation value for the singing input sound based on the comparison result.

A singing evaluation device comprising:
a singing sound acquisition unit that acquires a singing input sound;
a singing feature amount calculation unit that calculates, from the singing input sound, singing feature amount data indicating a temporal change in a feature amount;
an evaluation reference acquisition unit that acquires evaluation reference data defining a reference feature amount range, the reference feature amount range being set so as to include a reference feature amount serving as a reference for evaluating the feature amount and to include an allowable range from the reference feature amount, the allowable range being changed according to the section;
a comparison unit that compares the reference feature amount range indicated by the evaluation reference data with the temporal change in the feature amount indicated by the singing feature amount data; and
an evaluation unit that calculates an evaluation value for the singing input sound based on the comparison result.