JP6650248B2

JP6650248B2 - Singing evaluation method, singing evaluation program, singing evaluation device, and singing evaluation system

Info

Publication number: JP6650248B2
Application number: JP2015216942A
Authority: JP
Inventors: 敏秀金
Original assignee: JE International Corp
Current assignee: JE International Corp
Priority date: 2015-11-04
Filing date: 2015-11-04
Publication date: 2020-02-19
Anticipated expiration: 2035-11-04
Also published as: JP2017090545A

Description

The present invention relates to a singing evaluation method, a singing evaluation program, a singing evaluation device, and a singing evaluation system.

Various methods for scoring the skill of a user's singing have been proposed. For example, Patent Literature 1 below proposes a method that determines the accuracy of the user's pitch by comparing the pitch (musical interval) extracted from the user's voice data with the pitch based on musical sound data.

Patent Literature 1: JP 2005-128372 A

If the pitch is accurate, the user is singing accurately as far as pitch is concerned. However, singing skill is not judged by pitch accuracy alone. It is also necessary to judge the timing at which the singing voice is interrupted and the length of the singing voice, and both depend on the user's breathing sections. By adjusting the breathing sections, the user can sustain the singing voice longer, sing louder, and widen the range of expression. The determination method of Patent Literature 1 does not evaluate the user's breathing sections.

The present invention has been made in view of the above circumstances, and its object is to provide a singing evaluation method, a singing evaluation program, a singing evaluation device, and a singing evaluation system capable of evaluating a user's breathing sections.

A singing evaluation method according to the present invention that achieves the above object is a singing evaluation method for evaluating the skill of a user's singing. The singing evaluation method includes a frequency specifying step of specifying, from target voice data representing a target voice that the user aims for, the frequency of the target voice as it changes over time, and specifying, from user voice data representing a user voice that is the voice of the user, the frequency of the user voice as it changes over time. The singing evaluation method includes a breathing section specifying step of specifying, based on the frequency of the target voice, a breathing section of the target voice in which the frequencies are scattered evenly and countless frequencies exist over a wide range as a target breathing section, and specifying, based on the frequency of the user voice, a breathing section of the user voice in which the frequencies are scattered evenly and countless frequencies exist over a wide range as a user breathing section. The singing evaluation method includes a breathing section comparing step of comparing the target breathing section and the user breathing section, and an output step of outputting the result of the comparison made in the breathing section comparing step.

A singing evaluation program according to the present invention that achieves the above object is a singing evaluation program for causing a computer to execute the above-described singing evaluation method.

A singing evaluation device according to the present invention that achieves the above object is a singing evaluation device that evaluates the skill of a user's singing. The singing evaluation device includes a frequency specifying unit that specifies, from target voice data representing a target voice that the user aims for, the frequency of the target voice as it changes over time, and specifies, from user voice data representing a user voice that is the voice of the user, the frequency of the user voice as it changes over time. The singing evaluation device includes a breathing section specifying unit that specifies, based on the frequency of the target voice, a breathing section of the target voice in which the frequencies are scattered evenly and countless frequencies exist over a wide range as a target breathing section, and specifies, based on the frequency of the user voice, a breathing section of the user voice in which the frequencies are scattered evenly and countless frequencies exist over a wide range as a user breathing section. The singing evaluation device includes a breathing section comparison unit that compares the target breathing section and the user breathing section, and an output unit that outputs the result of the comparison made by the breathing section comparison unit.

A singing evaluation system according to the present invention that achieves the above object includes a user terminal that generates user voice data from a user voice, which is the voice of a user, and transmits the user voice data via a network. The singing evaluation system includes a server device that functions as the above-described singing evaluation device: it receives the user voice data from the user terminal, compares the user voice data with target voice data representing a target voice that the user aims for, and transmits the result of the singing evaluation. The user terminal receives the result of the singing evaluation from the server device and outputs the result.

According to the above-described singing evaluation method, singing evaluation program, singing evaluation device, and singing evaluation system, the target breathing section is specified based on the frequency of the target voice, and the user breathing section is specified based on the frequency of the user voice. The target breathing section and the user breathing section are then compared, and the comparison result is output. The user can therefore grasp how far the user breathing section deviates from the target breathing section. Accordingly, it is possible to provide a singing evaluation method, a singing evaluation program, a singing evaluation device, and a singing evaluation system capable of evaluating a user's breathing sections.

A block diagram showing the schematic configuration of the singing evaluation device according to the present embodiment.

A block diagram showing the functional configuration of the CPU of the singing evaluation device.

A diagram showing an example of the change in voice frequency over time.

A diagram showing an example of the change in voice level over time.

A flowchart showing the singing evaluation method according to the present embodiment.

A diagram showing an example of a result displayed on the display unit.

A diagram showing how the breathing section, the pitch (frequency), and the loudness of the voice (level) can be selected.

A block diagram showing the schematic configuration of the singing evaluation system.

Embodiments of the present invention are described below with reference to the accompanying drawings. In the description of the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted. The dimensional ratios in the drawings are exaggerated for convenience of description and may differ from the actual ratios.

FIG. 1 is a block diagram showing the schematic configuration of a singing evaluation device 100 according to the present embodiment.

The singing evaluation device 100 is a computer terminal such as a desktop PC (Personal Computer) or a notebook PC. The singing evaluation device 100 may also be incorporated into part of a karaoke device or the like. The singing evaluation device 100 accepts the user's voice input and evaluates the skill of the user's singing.

The singing evaluation device 100 includes a CPU (Central Processing Unit) 110, a memory 120, a hard disk 130, a communication I/F unit 140, a display unit 150, an operation unit 160, a voice input unit 170, and a voice output unit 180. These components are connected via a bus 190 so that they can communicate with one another.

The CPU 110 controls each component and executes various arithmetic processes according to programs recorded in the memory 120 and on the hard disk 130.

The memory 120 includes a ROM (Read Only Memory) that stores various programs and data, a RAM (Random Access Memory) that temporarily stores programs and data as a work area, and the like.

The hard disk 130 stores various programs, including an operating system, and various data. The hard disk 130 stores music data that includes target voice data representing the target voice that the user aims for and accompaniment data for providing accompaniment sound to the user. In this specification, the target voice means, for example, a voice sung by a professional singer, but it is not particularly limited as long as it is a voice sung by a person who serves as a model for the song. The hard disk 130 also stores user voice data representing the user voice, which is the voice of the user, and stores advice information related to the singing evaluation result.

The communication I/F unit 140 is an interface for communicating with other devices via a network and uses standards such as Ethernet (registered trademark), FDDI (Fiber Distributed Data Interface), and Wi-Fi (Wireless Fidelity). The communication I/F unit 140 receives music data including target voice data and accompaniment data, as well as advice information related to the singing evaluation result, from an external server or the like.

The display unit 150 is, for example, a liquid crystal display and displays various information such as the singing evaluation result.

The operation unit 160 is, for example, a keyboard or a pointing device such as a mouse and is used by the user to input various information.

The voice input unit 170 includes a microphone that converts the user voice into an electric signal, an amplifier that amplifies the converted electric signal, an A/D converter that converts the electric signal from analog to digital, and the like. In other words, the voice input unit 170 converts the user voice into a digital signal, and the CPU 110 processes the converted digital signal as user voice data.

The voice output unit 180 includes a D/A converter that converts an electric signal from digital to analog, an amplifier that amplifies the electric signal, a speaker or headphones that convert the electric signal into sound and output it, and the like.

Part or all of the voice input unit 170 and the voice output unit 180 may be provided outside the singing evaluation device 100, and their configuration is not limited to the example shown in FIG. 1. For example, part of the voice input unit 170 and the voice output unit 180 may be connected to the singing evaluation device 100 via a voice input terminal and a voice output terminal provided on the singing evaluation device 100.

FIG. 2 is a block diagram showing the functional configuration of the CPU 110 of the singing evaluation device 100.

By executing various programs, the CPU 110 functions as, for example, a frequency specifying unit 111, a breathing section specifying unit 112, a level specifying unit 113, an accompaniment data reproducing unit 114, a breathing section comparison unit 115, a frequency comparison unit 116, a level comparison unit 117, and an output unit 118. Each functional component is described below.

The frequency specifying unit 111 specifies, from the target voice data, the frequency of the target voice as it changes over time. The frequency specifying unit 111 also specifies, from the user voice data, the frequency of the user voice as it changes over time. Specifically, the frequency specifying unit 111 applies a Fourier transform to the target voice data and to the user voice data. The frequency specifying unit 111 then identifies the frequency components of the voice at predetermined time intervals and takes the lowest of the identified frequency components as the "fundamental frequency". The fundamental frequency of a voice corresponds to the height of the sound (pitch) perceived by a human. The identified frequency components can be represented, for example, as shown in FIG. 3.

FIG. 3 is a diagram showing an example of the change in voice frequency over time. The frequency specifying unit 111 identifies the frequency of the voice at predetermined time intervals as the singing progresses (as time advances), which reveals how the frequency of the voice changes over time. The predetermined time is arbitrary and is, for example, 0.1 seconds. The shorter the predetermined time, the more precisely the transition of the voice frequency can be identified. Of the frequency components shown in FIG. 3, the lowest component, indicated by arrow A, is the fundamental frequency, and the components indicated by arrow B are overtone frequencies.
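The frame-by-frame identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the naive discrete Fourier transform, and the `rel_threshold` rule for deciding which spectral peaks count as "identified components" are all hypothetical, and the 0.1-second frame length follows the example in the text.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the one-sided discrete Fourier transform (naive O(n^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def lowest_component(frame, sample_rate, rel_threshold=0.5):
    """Lowest frequency (Hz) that is a local spectral peak with magnitude of at
    least rel_threshold times the strongest bin; None for a silent frame.
    The threshold rule is an assumption made for this sketch."""
    mags = dft_magnitudes(frame)
    peak = max(mags[1:])  # ignore the DC bin
    if peak < 1e-9:
        return None
    for k in range(1, len(mags) - 1):
        is_local_peak = mags[k] >= mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_peak and mags[k] >= rel_threshold * peak:
            return k * sample_rate / len(frame)
    return None


def track_fundamental(samples, sample_rate, frame_seconds=0.1):
    """Return (time in seconds, fundamental frequency or None) for each frame."""
    size = round(sample_rate * frame_seconds)
    return [
        (start / sample_rate, lowest_component(samples[start:start + size], sample_rate))
        for start in range(0, len(samples) - size + 1, size)
    ]


# Example: a 200 Hz tone with a weaker 400 Hz overtone, 0.2 s at 2000 Hz.
example_rate = 2000
example_signal = [
    math.sin(2 * math.pi * 200 * i / example_rate)
    + 0.6 * math.sin(2 * math.pi * 400 * i / example_rate)
    for i in range(400)
]
example_track = track_fundamental(example_signal, example_rate)
```

In the example, the 400 Hz overtone is also a peak above the threshold, but the lower 200 Hz component is the one reported for each 0.1-second frame, mirroring the "lowest identified component" rule in the text.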

Returning to FIG. 2, the breathing section specifying unit 112 specifies a breathing section in the target voice as a target breathing section based on the frequency of the target voice. The breathing section specifying unit 112 also specifies a breathing section in the user voice as a user breathing section based on the frequency of the user voice. As indicated by arrow C in FIG. 3, the breathing section specifying unit 112 specifies a portion where the frequencies are scattered evenly as a breathing section. More specifically, the breathing section specifying unit 112 specifies as a breathing section not a section in which specific frequencies exist, such as the fundamental frequency indicated by arrow A or the overtone frequencies indicated by arrow B, but a section in which countless frequencies exist over a wide range. The sound of breathing has no specific frequency component and is spread across frequencies, so a portion with such a spread can be specified as a breathing section.
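One way to detect a frame whose energy is spread over many frequencies is spectral flatness (the geometric mean of the power spectrum divided by its arithmetic mean). The text does not name a specific measure, so the following is only a sketch that substitutes spectral flatness as an assumed criterion; the function names, the `threshold` value, and the `min_power` silence guard are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Power of each non-DC bin of a naive one-sided DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))) ** 2
        for k in range(1, n // 2 + 1)
    ]


def spectral_flatness(frame, min_power=1e-6):
    """Near 0 for a few discrete frequencies, much higher for noise-like sound.
    Frames with almost no energy return 0.0 so that silence is not counted."""
    power = power_spectrum(frame)
    mean_power = sum(power) / len(power)
    if mean_power < min_power:
        return 0.0
    log_mean = sum(math.log(p + 1e-12) for p in power) / len(power)
    return math.exp(log_mean) / mean_power


def find_breathing_sections(samples, sample_rate, frame_seconds=0.1, threshold=0.25):
    """Return (start, end) times in seconds of runs of noise-like frames."""
    size = round(sample_rate * frame_seconds)
    n_frames = len(samples) // size
    sections, start = [], None
    for index in range(n_frames):
        frame = samples[index * size:(index + 1) * size]
        noisy = spectral_flatness(frame) >= threshold
        if noisy and start is None:
            start = index * size / sample_rate
        elif not noisy and start is not None:
            sections.append((start, index * size / sample_rate))
            start = None
    if start is not None:
        sections.append((start, n_frames * size / sample_rate))
    return sections


# Example: 0.2 s of tone, 0.2 s of noise standing in for a breath, 0.2 s of tone.
example_rate = 2000
tone = [math.sin(2 * math.pi * 200 * i / example_rate) for i in range(400)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
example_sections = find_breathing_sections(tone + noise + tone, example_rate)
```

With this synthetic input, the noise stretch should be reported as a single section from about 0.2 s to 0.4 s, while the tonal frames on either side are not flagged. Real breath sounds are not white noise, so the threshold would need tuning on actual recordings.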

The level specifying unit 113 specifies, from the target voice data, the level of the target voice as it changes over time. The level specifying unit 113 also specifies, from the user voice data, the level of the user voice as it changes over time. The level specifying unit 113 identifies the voice level at predetermined time intervals for both the target voice data and the user voice data. The voice level corresponds to the strength of the voice (sound pressure). The voice level can be represented, for example, as shown in FIG. 4.

FIG. 4 is a diagram showing an example of the change in voice level over time. The level specifying unit 113 identifies the voice level at predetermined time intervals as the singing progresses (as time advances), which reveals how the voice level changes over time. The predetermined time is arbitrary and is, for example, 0.1 seconds. The shorter the predetermined time, the more precisely the transition of the voice level can be identified.
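The per-interval level identification can be sketched as below. The text does not say how the level is computed, so the root-mean-square amplitude of each frame is used here as an assumption; the function name `track_level` is hypothetical.

```python
import math


def track_level(samples, sample_rate, frame_seconds=0.1):
    """Return (time in seconds, RMS level) for each frame of frame_seconds."""
    size = round(sample_rate * frame_seconds)
    levels = []
    for start in range(0, len(samples) - size + 1, size):
        frame = samples[start:start + size]
        rms = math.sqrt(sum(x * x for x in frame) / size)
        levels.append((start / sample_rate, rms))
    return levels


# Example: a quiet 0.1 s frame followed by a louder one, sampled at 100 Hz.
example_levels = track_level([0.5] * 10 + [-1.0] * 10, sample_rate=100)
```

A shorter `frame_seconds` gives a finer view of how the level changes, at the cost of more frames to store and compare.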

Returning to FIG. 2, the accompaniment data reproducing unit 114 reproduces the accompaniment data, which is stored on the hard disk 130 or the like and included in the music data together with the target voice data, in order to provide accompaniment to the user. The accompaniment data reproducing unit 114 outputs an electric signal based on the accompaniment data to the voice output unit 180 and causes the voice output unit 180 to output it as the accompaniment sound of the music. The user sings while listening to the accompaniment sound output by the voice output unit 180.

The breathing section comparison unit 115 compares the target breathing section and the user breathing section. Specifically, the breathing section comparison unit 115 compares the start timings and the end timings of the target breathing section and the user breathing section.
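The comparison of start and end timings can be sketched as follows. Sections are represented as `(start, end)` pairs in seconds. Pairing each target section with the user section whose start time is nearest is an assumption made for this sketch; the function name is hypothetical.

```python
def compare_breathing_sections(target_sections, user_sections):
    """For each target section, return (target section, start offset, end offset).

    Offsets are user time minus target time in seconds, so a positive value
    means the user was late. Offsets are None if the user never breathed.
    """
    results = []
    for t_start, t_end in target_sections:
        if not user_sections:
            results.append(((t_start, t_end), None, None))
            continue
        # Assumed pairing rule: the user section that starts closest in time.
        u_start, u_end = min(user_sections, key=lambda s: abs(s[0] - t_start))
        results.append(((t_start, t_end), u_start - t_start, u_end - t_end))
    return results


# Example: the user starts the first breath 0.25 s late and ends the second 0.5 s late.
example_comparison = compare_breathing_sections(
    [(1.0, 1.5), (4.0, 4.5)],
    [(1.25, 1.5), (4.0, 5.0)],
)
```

Small offsets in both columns indicate that the user breathes at nearly the same points as the model singer.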

The frequency comparison unit 116 compares the frequency of the target voice and the frequency of the user voice at the same timing in the progressing song. Specifically, the frequency comparison unit 116 obtains the ratio of the frequency of the user voice to the frequency of the target voice at the same timing in the progressing song. Instead of the ratio, the frequency comparison unit 116 may obtain the difference between the frequency of the target voice and the frequency of the user voice at the same timing in the progressing song.
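A minimal sketch of this same-timing comparison follows, assuming both tracks are lists of `(time, frequency)` pairs sampled at identical time steps and that frames without an identified frequency carry `None`. The function name and the `use_difference` switch are hypothetical.

```python
def compare_frequencies(target_track, user_track, use_difference=False):
    """Return (time, user/target ratio) per frame, or (time, user - target)
    when use_difference is True. Target frequencies are assumed positive."""
    results = []
    for (time, target_hz), (_, user_hz) in zip(target_track, user_track):
        if target_hz is None or user_hz is None:
            continue  # skip frames where no frequency was identified
        value = user_hz - target_hz if use_difference else user_hz / target_hz
        results.append((time, value))
    return results


# Example: on pitch in the first frame, flat in the second frame.
example_target = [(0.0, 220.0), (0.1, 440.0), (0.2, None)]
example_user = [(0.0, 220.0), (0.1, 330.0), (0.2, 300.0)]
example_ratios = compare_frequencies(example_target, example_user)
example_diffs = compare_frequencies(example_target, example_user, use_difference=True)
```

A ratio of 1 (or a difference of 0) means the user matched the target pitch in that frame.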

The level comparison unit 117 compares the level of the target voice and the level of the user voice at the same timing in the progressing song. Because the target voice and the user voice are not recorded under the same conditions, the level of the user voice may be lower or higher overall than the level of the target voice. The comparison is therefore performed, for example, by the following procedure. First, the level comparison unit 117 calculates the average level of the target voice and the average level of the user voice over part or all of the song, and calculates the ratio of the average level of the user voice to the average level of the target voice. The CPU 110 then corrects the level of the target voice by multiplying it by the calculated ratio. Next, the CPU 110 compares the corrected level of the target voice with the level of the user voice at the same timing in the progressing song. Specifically, the level comparison unit 117 obtains the ratio of the level of the user voice to the corrected level of the target voice at the same timing in the progressing song. Instead of the ratio, the level comparison unit 117 may obtain the difference between the corrected level of the target voice and the level of the user voice at the same timing in the progressing song.
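The correction procedure above can be sketched as follows, assuming both inputs are lists of per-frame levels taken at the same time steps and that all levels are positive. The function name is hypothetical.

```python
def compare_levels(target_levels, user_levels):
    """Scale the target levels by (user average / target average), then
    return the per-frame ratio of user level to corrected target level."""
    target_mean = sum(target_levels) / len(target_levels)
    user_mean = sum(user_levels) / len(user_levels)
    scale = user_mean / target_mean
    corrected_target = [level * scale for level in target_levels]
    return [user / target for user, target in zip(user_levels, corrected_target)]


# Example: the user recording is half as loud overall, so the target is
# scaled by 0.5 before the frame-by-frame comparison.
example_level_ratios = compare_levels([0.5, 1.0, 1.5], [0.25, 0.75, 0.5])
```

Because of the scaling, a uniformly quieter or louder recording does not by itself move the ratios away from 1; only frames where the user's dynamics differ from the model's do.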

The output unit 118 outputs the results obtained by the breathing section comparison unit 115, the frequency comparison unit 116, and the level comparison unit 117 to the display unit 150. The details are as follows.

When the comparison by the breathing section comparison unit 115 shows that the deviation between the start and end timings of the target breathing section and those of the user breathing section is small, the output unit 118 determines that the singing is skillful. Conversely, when the comparison shows that the deviation between the start and end timings is large, the output unit 118 determines that the singing is poor. The output unit 118 then displays advice related to the breathing section comparison result on the display unit 150.

In the comparison by the frequency comparison unit 116, the closer the ratio of the frequency of the user voice to the frequency of the target voice is to 1, the more skillful the output unit 118 determines the singing to be. The output unit 118 then displays advice related to the frequency comparison result on the display unit 150.

In the comparison by the level comparison unit 117, the closer the ratio of the level of the user voice to the corrected level of the target voice is to 1, the more skillful the output unit 118 determines the singing to be. The output unit 118 then displays advice related to the level comparison result on the display unit 150.

The output unit 118 has a function of scoring based on the results obtained by the breathing section comparison unit 115, the frequency comparison unit 116, and the level comparison unit 117. For example, a rule such as deducting one point when a breathing section deviates by 0.1 seconds is determined in advance, the deductions for all sections of the music are totaled, and the total is subtracted from 100 points. The output unit 118 outputs the resulting score to the display unit 150.
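The deduction rule in the example can be sketched as below for the breathing sections alone. The text gives only "one point per 0.1 seconds" as an example rule, so applying it to both the start and the end offset of every section, counting only whole 0.1-second steps, and clamping the score at 0 are assumptions of this sketch; the function name is hypothetical.

```python
def score_breathing(offsets, step_seconds=0.1, points_per_step=1, full_score=100):
    """offsets: list of (start offset, end offset) in seconds per breathing section.
    Deducts points_per_step for every whole step_seconds of deviation."""
    deduction = 0
    for start_offset, end_offset in offsets:
        for offset in (start_offset, end_offset):
            # Rounding first avoids losing a step to floating-point error
            # (for example, 0.3 / 0.1 evaluates to slightly less than 3).
            steps = int(round(abs(offset) / step_seconds, 6))
            deduction += steps * points_per_step
    return max(0, full_score - deduction)


# Example: deviations of 0.25 s and 0.5 s cost 2 and 5 points respectively.
example_score = score_breathing([(0.25, 0.0), (0.0, 0.5)])
```

Deductions for frequency and level could be accumulated in the same way under their own predetermined rules before subtracting the total from 100.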

Next, the singing evaluation method according to the present embodiment is described with reference to FIG. 5. FIG. 5 is a flowchart showing the singing evaluation method according to the present embodiment. The processing shown in the flowchart of FIG. 5 is stored as a program in the memory 120 or on the hard disk 130 of the singing evaluation device 100 and is executed by the CPU 110. The program of the singing evaluation device 100 according to the present embodiment compares the user voice data with the target voice data to evaluate the skill of the user's singing.

First, the CPU 110, acting as the frequency specifying unit 111, specifies the frequency of the target voice as it changes over time from the target voice data included in the music data stored on the hard disk 130 (step S101). Specifically, the CPU 110 identifies the fundamental frequency of the target voice (hereinafter referred to as the target fundamental frequency) at predetermined time intervals. After identifying the target fundamental frequency, the CPU 110 stores, in the memory 120 or the like, information that associates each time at which the target fundamental frequency was identified in the target voice data with that target fundamental frequency.

Next, the CPU 110, acting as the breathing section specifying unit 112, specifies the target breathing sections based on the target fundamental frequency identified in step S101 (step S102). Specifically, the CPU 110 specifies a target breathing section for each period of the target voice data in which breathing occurs in the target voice. After specifying the target breathing sections, the CPU 110 stores, in the memory 120 or the like, information that associates each time at which a breathing section was specified in the target voice with the breathing section at that time.

Next, the CPU 110, acting as the level specifying unit 113, specifies the level of the target voice as it changes over time from the target voice data (step S103). Specifically, the CPU 110 identifies the level of the target voice (hereinafter referred to as the target level) at predetermined time intervals. After identifying the target level, the CPU 110 stores, in the memory 120 or the like, information that associates each time at which the target level was identified in the target voice data with that target level.

Steps S101 to S103 up to this point identify the features of the target voice that the user aims for. These steps need not be executed consecutively with the steps that follow. They may be executed in advance, with the target breathing sections and the frequency and level of the target voice stored on the hard disk 130 or the like.

Next, the CPU 110, acting as the accompaniment data reproducing unit 114, reproduces the accompaniment sound based on the accompaniment data through the voice output unit 180 (step S104). The user sings into the voice input unit 170 while listening to the accompaniment sound.

While the user is singing (while the accompaniment sound is being reproduced), the CPU 110 acquires the user voice data via the voice input unit 170 (step S105).

Next, the CPU 110, acting as the frequency specifying unit 111, specifies the frequency of the user voice as it changes over time from the user voice data acquired in step S105 (step S106). Specifically, the CPU 110 identifies the fundamental frequency of the user voice (hereinafter referred to as the user fundamental frequency) at predetermined time intervals. After identifying the user fundamental frequency, the CPU 110 stores, in the memory 120 or the like, information that associates each time at which the user fundamental frequency was identified in the user voice data with that user fundamental frequency.

Next, the CPU 110, as the breathing section specifying unit 112, specifies the user breathing sections based on the user fundamental frequency specified in step S106 (step S107). Specifically, the CPU 110 specifies a user breathing section for each section of the user voice data in which the user is breathing. After specifying the user breathing sections, the CPU 110 stores, in the memory 120 or the like, information that associates each time at which a breathing section was specified in the user voice with the breathing section at that time.
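The claims characterize a breathing section as a span in which frequencies are evenly scattered over a wide range. The sketch below uses a much simpler stand-in: it assumes that frames with no detectable fundamental frequency (`None` in the pitch track) are breathing candidates and merges consecutive such frames into sections. The function name `find_breath_sections` and the minimum length of 0.1 s are hypothetical.

```python
def find_breath_sections(f0_track, frame_sec, min_sec=0.1):
    """Merge consecutive pitchless frames into (start_sec, end_sec) sections.

    f0_track is a list of (time_sec, f0_or_None) pairs in time order.
    Assumption: a frame without a fundamental frequency is a breathing
    candidate; runs shorter than min_sec are discarded.
    """
    sections = []
    start = None
    for time_sec, f0 in f0_track:
        if f0 is None:
            if start is None:
                start = time_sec          # a pitchless run begins
        elif start is not None:
            sections.append((start, time_sec))  # run ends at this frame
            start = None
    if start is not None:                 # track ended inside a run
        sections.append((start, f0_track[-1][0] + frame_sec))
    return [(s, e) for s, e in sections if e - s >= min_sec - 1e-9]
```

Applying the same function to the target pitch track would give target breathing sections on the same time axis, which is what the later comparison step needs.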

Next, the CPU 110, as the level specifying unit 113, specifies the level of the user voice as it changes over time from the user voice data acquired in step S105 (step S108). Specifically, the CPU 110 specifies the level of the user voice (hereinafter referred to as the user level) at predetermined time intervals. After specifying the user level, the CPU 110 stores, in the memory 120 or the like, information that associates each time in the user voice data at which the user level was specified with that user level.

Next, the CPU 110, as the breathing section comparison unit 115, compares the target breathing sections specified in step S102 with the user breathing sections specified in step S107 (step S109). Specifically, the CPU 110 refers to the information stored in the memory 120 or the like in steps S102 and S107, and compares the start timing and end timing of the target breathing section and the user breathing section associated with the same time. This processing is performed for all user breathing sections specified in step S107, that is, for the entire user voice.
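The comparison of start and end timings can be sketched as follows. The pairing rule (each target section is matched with the user section whose start is closest, within `max_gap_sec`) is an assumption of this sketch; the description only says that sections associated with the same time are compared. The function name `compare_breath_sections` is hypothetical.

```python
def compare_breath_sections(target_sections, user_sections, max_gap_sec=1.0):
    """Compare start and end timings of target and user breathing sections.

    Both arguments are lists of (start_sec, end_sec) tuples. A positive
    start_offset means the user started breathing later than the target.
    """
    results = []
    for t_start, t_end in target_sections:
        candidates = [u for u in user_sections
                      if abs(u[0] - t_start) <= max_gap_sec]
        if not candidates:
            # The user did not breathe anywhere near this target section.
            results.append({"target": (t_start, t_end), "user": None})
            continue
        u_start, u_end = min(candidates, key=lambda u: abs(u[0] - t_start))
        results.append({
            "target": (t_start, t_end),
            "user": (u_start, u_end),
            "start_offset": u_start - t_start,
            "end_offset": u_end - t_end,
        })
    return results
```

A late start combined with an early end, for instance, indicates a user breathing section that is shorter than the target one.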

Next, the CPU 110, as the frequency comparison unit 116, compares the target fundamental frequency specified in step S101 with the user fundamental frequency specified in step S106 at the same timing in the ongoing singing (step S110). Specifically, the CPU 110 refers to the information stored in the memory 120 or the like in steps S101 and S106, and compares the target fundamental frequency and the user fundamental frequency associated with the same time. This processing is performed for all user fundamental frequencies specified in step S106, that is, for the entire user voice.
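A minimal sketch of the same-time frequency comparison is shown below. It assumes that both pitch tracks were sampled on the same time grid and expresses the deviation in cents (1200 cents per octave); the unit and the function name `compare_pitch` are illustrative choices, since the description does not specify how the difference is measured.

```python
import math

def compare_pitch(target_track, user_track):
    """Return (time_sec, cents) pairs for times present in the target track.

    Each track is a list of (time_sec, f0_or_None). cents is positive when
    the user is sharp, negative when flat, and None where either voice has
    no fundamental frequency at that time.
    """
    user_by_time = dict(user_track)
    deviations = []
    for time_sec, target_f0 in target_track:
        user_f0 = user_by_time.get(time_sec)
        if target_f0 is None or user_f0 is None:
            deviations.append((time_sec, None))
        else:
            cents = 1200.0 * math.log2(user_f0 / target_f0)
            deviations.append((time_sec, cents))
    return deviations
```

The target and user levels of step S111 could be compared in the same way by subtracting decibel values at matching times.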

Next, the CPU 110, as the level comparison unit 117, compares the target level specified in step S103 with the user level specified in step S108 at the same timing in the ongoing singing (step S111). Specifically, the CPU 110 refers to the information stored in the memory 120 or the like in steps S103 and S108, and compares the target level and the user level associated with the same time. This processing is performed for all user levels specified in step S108, that is, for the entire user voice.

Next, the CPU 110, as the output unit 118, outputs the comparison results of steps S109, S110, and S111 to the display unit 150, as shown in FIG. 6 (step S112). FIG. 6 shows an example of the results displayed on the display unit 150. In FIG. 6, the thick line indicates the target fundamental frequency and the thin line indicates the user fundamental frequency. On the display unit 150, it is preferable to show musical pitch rather than frequency so that the user can more easily understand how well he or she sang.
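Displaying pitch rather than frequency requires a conversion from hertz to a note name. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation; neither convention is stated in the description, and the function name `hz_to_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq_hz, a4_hz=440.0):
    """Convert a frequency in Hz to the nearest note name, e.g. 'A4'.

    Assumes equal temperament tuned to a4_hz and MIDI note 69 = A4.
    """
    midi = int(round(69 + 12 * math.log2(freq_hz / a4_hz)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```

For example, `hz_to_note(220.0)` gives `"A3"`, one octave below the reference pitch.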

FIG. 6 shows the result for a case in which the breathing section before the lyric "Morinonaka" is short, the voice on the lyric "a" is quiet, and the voice on the lyric "no" is extremely high. In such a case, advice based on the comparison results is displayed at each place where the user voice differs from the target voice. For example, advice such as "Breathe longer" is displayed where the breathing section is short. Each comparison result can also be converted into a numerical value and displayed as a score. FIG. 6 shows an example in which "94 points" is displayed at the upper left. It is also preferable to display a breakdown of the points deducted from 100. For example, FIG. 6 shows a case in which the 6 points deducted from 100 break down into breathing section: -2 points, frequency (pitch): -2 points, and level: -2 points.
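The score arithmetic of FIG. 6 can be reproduced with a short sketch. How each comparison result is converted into a deduction is not described, so the deductions are taken here as given inputs; the function name `total_score` is hypothetical.

```python
def total_score(deductions, full_marks=100):
    """Return (score, breakdown_text) from per-item deductions.

    deductions maps an item name to the number of points deducted.
    The score is clamped at zero.
    """
    score = full_marks - sum(deductions.values())
    breakdown = ", ".join(f"{item}: -{points}"
                          for item, points in deductions.items())
    return max(score, 0), breakdown

score, breakdown = total_score({
    "breathing section": 2,
    "frequency (pitch)": 2,
    "level": 2,
})
print(f"{score} points ({breakdown})")
```

With the three deductions of 2 points each, the score is 100 - 6 = 94, matching the example in FIG. 6.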

FIG. 6 also shows the lyrics of the song. The lyrics may be displayed in full, or only in part as shown in FIG. 6. In the latter case, earlier or later lyrics can be displayed by switching screens, scrolling, or the like.

As described above, according to the singing evaluation method and the singing evaluation device of the present embodiment, a breathing section in the target voice is specified as a target breathing section based on the frequency of the target voice, and a breathing section in the user voice is specified as a user breathing section based on the frequency of the user voice. The target breathing section and the user breathing section are then compared, and the comparison result is output. The user can therefore see how the user breathing section deviates from the target breathing section. Accordingly, the breathing section can be evaluated.

In addition, the frequency of the target voice and the frequency of the user voice at the same timing in the ongoing singing can be compared, and the frequency comparison result can be output. The user can therefore see how the frequency of the user voice deviates from the frequency of the target voice. Frequency can thus be evaluated in addition to the breathing section, and the singing can be evaluated more precisely.

Furthermore, the target level as it changes over time is specified from the target voice data, the user level as it changes over time is specified from the user voice data, the target level and the user level at the same timing in the ongoing singing are compared, and the level comparison result can be output. The user can therefore see how the level of the user voice deviates from the level of the target voice. Level can thus be evaluated in addition to the breathing section and frequency, and the singing can be evaluated more precisely.

The comparison results are also displayed on the display unit 150 as a score. This makes it easy for the user to judge how well he or she sang.

Advice related to the comparison results is also displayed on the display unit 150. This makes it easier for the user to identify weak points.

The present invention is not limited to the embodiment described above and can be modified in various ways within the scope of the claims.

For example, in the embodiment described above, the user breathing section, the frequency of the user voice, and the user level were all compared. However, as shown in FIG. 7, the three items of breathing section, pitch (frequency), and loudness (level) may be presented as a checklist so that the user can select the items to be evaluated. This allows the user to evaluate the items that he or she considers weak points, and practicing those items intensively helps improve the user's singing ability.

In the embodiment described above, the frequency comparison unit 116 compares the target fundamental frequency with the user fundamental frequency to evaluate deviations in pitch. However, in addition to or instead of this, the frequency comparison unit 116 may compare the timings of places where the scale rises, places where the scale falls, and vibrato. Evaluating such items enables a more precise singing evaluation.
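As a rough sketch of how rise and fall timings might be extracted for such a comparison, the following function rounds each fundamental frequency to the nearest semitone and records the times at which the note changes. The semitone quantization, the A4 = 440 Hz reference, and the function name `pitch_transitions` are assumptions; vibrato detection is not covered by this sketch.

```python
import math

def pitch_transitions(f0_track, a4_hz=440.0):
    """Return (time_sec, 'up' or 'down') events where the sung note changes.

    f0_track is a list of (time_sec, f0_or_None). Frames without a
    fundamental frequency are skipped, so a change across a pause is
    reported at the first pitched frame after the pause.
    """
    events = []
    prev_note = None
    for time_sec, f0 in f0_track:
        if f0 is None:
            continue
        # Quantize to the nearest equal-tempered semitone (MIDI number).
        note = round(69 + 12 * math.log2(f0 / a4_hz))
        if prev_note is not None and note != prev_note:
            events.append((time_sec, "up" if note > prev_note else "down"))
        prev_note = note
    return events
```

Running this on both the target and the user pitch tracks yields two event lists whose times can then be compared in the same manner as the breathing section timings.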

In the embodiment described above, the singing evaluation device 100 compares the target fundamental frequency with the user fundamental frequency in the singing evaluation method shown in FIG. 5. However, the present invention is not limited to this, and frequencies other than the fundamental frequency may be compared. For example, the singing evaluation device 100 may compare the frequency of the second harmonic of the target voice with the frequency of the second harmonic of the user voice.

The embodiment described above concerns a singing evaluation method that evaluates the user's singing after the user has sung the song from beginning to end. However, the present invention is not limited to this, and the singing may be evaluated in real time while the user sings. In that case, for example, when the user breathing section deviates from the target breathing section, a buzzer may sound as a warning, or a message to that effect may be shown on the display unit 150.

In the embodiment described above, step S108 of specifying the level of the user voice is executed after step S106 of specifying the frequency of the user voice and step S107 of specifying the user breathing section. However, the present invention is not limited to this, and step S108 may be executed in parallel with steps S106 and S107, or before them.

The singing evaluation results in the embodiment described above may be saved as data on the hard disk 130. Saving the singing evaluation results as data allows the user to check how his or her singing improves from day to day.

In the embodiment described above, step S103 of specifying the level of the target voice, step S108 of specifying the level of the user voice, step S110 of comparing the target fundamental frequency with the user fundamental frequency, and step S111 of comparing the target level with the user level may be omitted.

<Modification 1>
Hereinafter, a modification of the embodiment described above will be described.

In the embodiment described above, the singing evaluation device 100 alone acquires the user voice and evaluates the singing. In Modification 1, a singing evaluation system including a plurality of devices acquires the user voice and evaluates the singing.

FIG. 8 is a block diagram showing a schematic configuration of the singing evaluation system 200.

As shown in FIG. 8, the singing evaluation system 200 includes a PC 210, a mobile terminal 220, and a server device 230.

The PC 210 is a computer terminal such as a desktop PC or a notebook PC.

The mobile terminal 220 is a terminal that the user can carry, such as a tablet terminal or a smartphone.

The PC 210 and/or the mobile terminal 220 functions as a user terminal.

The server device 230 is an information processing device that executes the singing evaluation method. The server device 230 has substantially the same hardware configuration as the singing evaluation device 100 shown in FIG. 1. The server device 230 also has the same functional configuration as the CPU 110 of the singing evaluation device 100 shown in FIG. 2.

The network 240 consists of a LAN (Local Area Network) conforming to a standard such as Ethernet (registered trademark), FDDI, or Wi-Fi, a WAN (Wide Area Network) in which LANs are connected by dedicated lines, or the like. The types and numbers of components connected to the network 240 are not limited to the example shown in FIG. 8.

The operation of the singing evaluation system 200 will now be described.

The PC 210 or the mobile terminal 220, serving as the user terminal, acquires accompaniment data from the server device 230. The following description assumes that the mobile terminal 220 has acquired the accompaniment data. The mobile terminal 220 plays the accompaniment data and accepts input of the user voice. The user voice may be input through a microphone built into the mobile terminal 220 or through a microphone connected to the mobile terminal 220. The mobile terminal 220 generates user voice data from the input user voice. The mobile terminal 220 then transmits the user voice data to the server device 230 via the network 240.

The server device 230 receives the user voice data generated by the mobile terminal 220 from the mobile terminal 220. The server device 230 then starts the singing evaluation method shown in FIG. 5. Because the steps of the singing evaluation method shown in FIG. 5 are substantially the same as in the embodiment described above, a detailed description is omitted. Using the singing evaluation method, the server device 230 compares the user voice with the target voice and transmits (outputs) the comparison result to the mobile terminal 220 as the result of the singing evaluation.

The mobile terminal 220 receives the result of the singing evaluation from the server device 230 and outputs it. The mobile terminal 220 outputs the result of the singing evaluation by, for example, displaying it on the screen of the mobile terminal 220 or reading it out as speech.

As described above, in the singing evaluation system 200 according to Modification 1, the user voice is acquired by a user terminal such as the PC 210 or the mobile terminal 220, and the singing is evaluated based on the user voice by the server device 230. The user terminal, which has relatively low processing capability, does not need to perform the singing evaluation. Instead, the server device, which has relatively high processing capability, can perform the singing evaluation. In other words, the device with high processing capability executes the processing with a heavy load, and the device with low processing capability executes the processing with a light load. The processing can therefore be optimized.

The user does not need to own a PC on which the singing evaluation program is installed or to travel to a place where such a PC is located. In other words, with the singing evaluation system 200, the singing evaluation service can easily be provided using the user's own PC 210 or mobile terminal 220.

By routing the evaluation through the server device 230, a system may also be provided in which, for example, the user's singing evaluation results are compared with those of third parties and ranked. A singing evaluation system 200 configured in this way helps motivate the user to practice singing.
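A ranking of this kind could be sketched as below, assuming the server device holds one score per user. The tie-handling rule (equal scores share a rank and the next rank is skipped) and the function name `rank_results` are assumptions of this sketch.

```python
def rank_results(scores):
    """Rank users by score, highest first.

    scores maps a user name to a score. Returns a list of
    (rank, name, score); users with equal scores share a rank.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    ranking = []
    for index, (name, score) in enumerate(ordered):
        if index > 0 and score == ordered[index - 1][1]:
            rank = ranking[-1][0]      # tie: reuse the previous rank
        else:
            rank = index + 1
        ranking.append((rank, name, score))
    return ranking
```

For instance, two users with 94 points would both be ranked first, and a user with 88 points would be ranked third.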

The processing by the singing evaluation device 100 according to the present invention can be realized either by dedicated hardware circuits for executing the procedures described above or by a CPU executing a program that describes those procedures. When the present invention is realized by the latter, the program that operates the singing evaluation device 100 may be provided on a computer-readable recording medium such as a USB memory, a floppy (registered trademark) disk, or a CD-ROM, or may be provided online via a network such as the Internet. In this case, the program recorded on the computer-readable recording medium is usually transferred to and stored in a memory, a hard disk, or the like. The program may be provided, for example, as standalone application software, or may be incorporated into the software of the singing evaluation device 100 as one of its functions.

100 singing evaluation device,
110 CPU,
111 frequency specifying unit,
112 breathing section specifying unit,
113 level specifying unit,
114 accompaniment data playback unit,
115 breathing section comparison unit,
116 frequency comparison unit,
117 level comparison unit,
118 output unit,
120 memory,
130 hard disk,
140 communication I/F unit,
150 display unit,
160 operation unit,
170 voice input unit,
180 audio output unit,
190 bus,
200 singing evaluation system,
210 PC (user terminal),
220 mobile terminal (user terminal),
230 server device,
240 network.

Claims

A singing evaluation method for evaluating the skill of a user's singing, comprising:
a frequency specifying step of specifying, from target voice data representing a target voice that the user aims for, the frequency of the target voice as it changes over time, and specifying, from user voice data representing a user voice that is the voice of the user, the frequency of the user voice as it changes over time;
a breathing section specifying step of specifying, based on the frequency of the target voice, a breathing section of the target voice in which the frequencies are evenly scattered and innumerable frequencies exist over a wide range as a target breathing section, and specifying, based on the frequency of the user voice, a breathing section of the user voice in which the frequencies are evenly scattered and innumerable frequencies exist over a wide range as a user breathing section;
a breathing section comparison step of comparing the target breathing section with the user breathing section; and
an output step of outputting the result of the comparison in the breathing section comparison step.

The singing evaluation method according to claim 1, further comprising a frequency comparison step of comparing, based on the frequency of the target voice and the frequency of the user voice, the frequency of the target voice with the frequency of the user voice at the same timing in the ongoing singing,
wherein in the output step, the result of the comparison in the frequency comparison step is also output.

The singing evaluation method according to claim 1, further comprising:
a level specifying step of specifying, from the target voice data, the level of the target voice as it changes over time, and specifying, from the user voice data, the level of the user voice as it changes over time; and
a level comparison step of comparing the level of the target voice with the level of the user voice at the same timing in the ongoing singing,
wherein in the output step, the result of the comparison in the level comparison step is also output.

The singing evaluation method according to any one of claims 1 to 3, wherein in the output step, the result is displayed as a score on a display unit.

The singing evaluation method according to any one of claims 1 to 4, wherein in the output step, advice related to the result is displayed on a display unit.

A singing evaluation program for causing a computer to execute the singing evaluation method according to any one of claims 1 to 5.

A singing evaluation device for evaluating the skill of a user's singing, comprising:
a frequency specifying unit that specifies, from target voice data representing a target voice that the user aims for, the frequency of the target voice as it changes over time, and specifies, from user voice data representing a user voice that is the voice of the user, the frequency of the user voice as it changes over time;
a breathing section specifying unit that specifies, based on the frequency of the target voice, a breathing section of the target voice in which the frequencies are evenly scattered and innumerable frequencies exist over a wide range as a target breathing section, and specifies, based on the frequency of the user voice, a breathing section of the user voice in which the frequencies are evenly scattered and innumerable frequencies exist over a wide range as a user breathing section;
a breathing section comparison unit that compares the target breathing section with the user breathing section; and
an output unit that outputs the result of the comparison by the breathing section comparison unit.

The singing evaluation device according to claim 7, further comprising a frequency comparison unit that compares, based on the frequency of the target voice and the frequency of the user voice, the frequency of the target voice with the frequency of the user voice at the same timing in the ongoing singing,
wherein the output unit also outputs the result of the comparison by the frequency comparison unit.

The singing evaluation device according to claim 7, further comprising:
a level specifying unit that specifies, from the target voice data, the level of the target voice as it changes over time, and specifies, from the user voice data, the level of the user voice as it changes over time; and
a level comparison unit that compares the level of the target voice with the level of the user voice at the same timing in the ongoing singing,
wherein the output unit also outputs the result of the comparison by the level comparison unit.

The singing evaluation device according to any one of claims 7 to 9, wherein the output unit displays a result as a score on a display unit.

The singing evaluation device according to any one of claims 7 to 10, wherein the output unit displays advice related to the result on a display unit.

A singing evaluation system comprising:
a user terminal that generates user voice data from a user voice that is the voice of a user and transmits the user voice data via a network; and
a server device that receives the user voice data from the user terminal, compares the user voice data with target voice data representing a target voice that the user aims for, and transmits a result of singing evaluation, or a server device functioning as the singing evaluation device according to claim 1,
wherein the user terminal receives the result of the singing evaluation from the server device and outputs the result.