JP5585320B2

JP5585320B2 - Singing voice evaluation device

Info

Publication number: JP5585320B2
Application number: JP2010198323A
Authority: JP
Inventors: 伸悟神谷; 辰弥寺島; 秀一松本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-09-03
Filing date: 2010-09-03
Publication date: 2014-09-10
Anticipated expiration: 2030-09-03
Also published as: JP2012058277A

Description

The present invention relates to a technique for outputting the result of evaluating a singing voice.

There is a technique for karaoke apparatuses in which songs that a singer can sing are registered in advance, the highest and lowest notes are detected from the guide melody data of the registered songs, and the singer's vocal range is displayed to inform the singer (see, for example, Patent Document 1).

Patent Document 1: JP 2004-205818 A

In the technique disclosed in Patent Document 1, the songs that a singer can sing are registered in advance. However, because the singer decides which songs can be sung, there is no reliable indication of whether the singer can actually sing them. In addition, even when a singer registers a song as singable, it remains unknown whether the singer can sing beyond the highest and lowest notes of that song, so the singer does not learn the limits of his or her own vocal range. The singer therefore has to try many songs, find the ones that can only just be sung, and register those.
An object of the present invention is to evaluate a singer's vocal range without relying on the singer's own judgment.

To solve the above problem, the present invention provides a singing voice evaluation device comprising: acquisition means for acquiring a singing voice input during reproduction of music data; pitch specifying means for specifying a singing pitch of the acquired singing voice; range specifying means for specifying, among pitch ranges divided in octave units, the pitch range to which the pitch of the acquired singing voice belongs; calculation means for comparing, for each constituent sound to be sung as designated by the music data, the designated pitch given by the music data with the singing pitch, and calculating the degree of coincidence of the pitches when they are converted in octave units so that the designated pitch and the singing pitch belong to the same pitch range; evaluation means for converting the designated pitch of each constituent sound to be sung in octave units so that it belongs to the pitch range specified for that constituent sound, and evaluating the singing at the converted pitch based on the degree of coincidence calculated for that constituent sound; and output means for outputting information corresponding to the result of the singing evaluation.

In another preferred aspect, the pitch specifying means specifies the singing pitch by analyzing the acquired singing voice with a first analysis method, and the range specifying means specifies the pitch range by analyzing it with a second analysis method different from the first analysis method.

In another preferred aspect, the range specifying means calculates, for each constituent sound to be sung, a reliability that serves as an index of the accuracy of the specified pitch range. When the reliability of the pitch range specified for a constituent sound is below a predetermined threshold, the evaluation means converts the designated pitch so that it belongs to a pitch range determined from the relationship between the pitch range specified for an earlier constituent sound whose reliability was at or above the threshold and the pitch range to which the designated pitch of that earlier constituent sound belongs.

In another preferred aspect, the device further comprises vocal range specifying means for specifying a certain range of pitches as the vocal range of the singing voice according to the result of the singing evaluation, and the information output by the output means is vocal range information indicating the specified vocal range.

According to the present invention, a singer's vocal range can be evaluated without relying on the singer's own judgment.

FIG. 1 is a block diagram illustrating the configuration of a karaoke apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram illustrating the configuration of the singing voice evaluation function according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating a specific example of the coincidence calculation performed by the calculation unit according to the embodiment of the present invention.
FIG. 4 is a flowchart illustrating the operation of the evaluation process according to the embodiment of the present invention.
FIG. 5 is a flowchart illustrating the operation of the correction process according to the embodiment of the present invention.
FIG. 6 is a diagram illustrating the content of the evaluation result information according to the embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of how the evaluation result is displayed according to the embodiment of the present invention.

<Embodiment>
[Hardware configuration]
FIG. 1 is a block diagram illustrating the configuration of a karaoke apparatus 1 according to an embodiment of the present invention. The karaoke apparatus 1 is an example of the singing voice evaluation device of the present invention and evaluates an input singing voice. The karaoke apparatus 1 receives a singer's singing voice and evaluates the vocal range of that voice. First, the hardware configuration of the karaoke apparatus 1 will be described.

The karaoke apparatus 1 includes a control unit 10, an operation unit 20, a display unit 30, a communication unit 40, a storage unit 50, and an acoustic processing unit 60. These components are connected via a bus. The karaoke apparatus 1 also has a speaker 61 and a microphone 62 connected to the acoustic processing unit 60.

The control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like. By executing a control program stored in the ROM or the storage unit 50, the control unit 10 controls each part of the karaoke apparatus 1 via the bus. In this example, the control unit 10 executes the control program to realize a singing voice evaluation function for evaluating the vocal range of the input singing voice.

The operation unit 20 consists of operation devices such as operation buttons on an operation panel, operation buttons on a remote control, a keyboard, and a mouse. It accepts the singer's operations and outputs an operation signal indicating their content to the control unit 10.
The display unit 30 is a display device such as a liquid crystal display, and displays content under the control of the control unit 10. The displayed content includes background images that follow the progress of the karaoke song, lyric captions, menu screens, and singing voice evaluation results.
The communication unit 40 connects to a communication line such as the Internet under the control of the control unit 10 and exchanges information with a communication device such as a server. The control unit 10 may update the information stored in the storage unit 50 using information acquired through the communication unit 40.
The storage unit 50 is storage means such as a hard disk or non-volatile memory, and has separate storage areas for music data, singing voice data, and evaluation result information.

The music data contains data related to a song to be sung in karaoke, for example guide melody data (hereinafter, GM data), accompaniment data, and lyric data. The GM data represents the melody of the song's vocal part, that is, it designates the constituent sounds to be sung, and is described in, for example, the MIDI (Musical Instrument Digital Interface) format. The accompaniment data represents the accompaniment of the song and is described in, for example, the MIDI format. The lyric data includes data representing the lyrics of the song and data indicating the timing for changing the color of the lyric captions displayed on the display unit 30. The music data may also include information defining the position of each section of the song, such as the position of the chorus and the position where the melody begins.
The control unit 10 reads the music data for the song that the singer selects with the operation unit 20 and uses it to output the karaoke accompaniment from the speaker 61 and to display the lyric captions on the display unit 30.

The singing voice data represents the singing voice input through the microphone 62 by a singer singing the selected karaoke song, and is stored in, for example, the WAVE format. The control unit 10 associates the stored singing voice data with the music data of the song that was sung.
The evaluation result information is generated by the singing voice evaluation function and is used to evaluate the vocal range of the singing voice (see FIG. 6). Its specific content is described in the explanation of the singing voice evaluation function.

The microphone 62 receives the singer's singing voice and outputs an audio signal representing it to the acoustic processing unit 60. The speaker 61 emits sound according to the audio signal output from the acoustic processing unit 60. The acoustic processing unit 60 includes a signal processing circuit such as a DSP (Digital Signal Processor) and a sound source that generates an audio signal from a MIDI-format signal. The acoustic processing unit 60 A/D-converts the audio signal input from the microphone 62 and outputs it to the control unit 10. The acoustic processing unit 60 also receives a MIDI-format signal based on the music data from the control unit 10 and generates an audio signal from it. The acoustic processing unit 60 applies signal processing such as effects and amplification to the audio signal generated in this way, the audio signal output from the control unit 10, the audio signal input from the microphone 62, and so on, and then outputs the result to the speaker 61.

While the control unit 10 reads and reproduces the music data and the accompaniment of the song is being output from the speaker 61, the control unit 10 acquires the audio signal output from the acoustic processing unit 60, generates singing voice data, and stores it in the storage unit 50 in association with the music data.
This concludes the description of the hardware configuration of the karaoke apparatus 1.

[Singing voice evaluation function]
Next, the singing voice evaluation function realized by the control unit 10 of the karaoke apparatus 1 executing the control program will be described. Some or all of the components of the singing voice evaluation unit 100 that realizes this function may instead be implemented in hardware.

FIG. 2 is a functional block diagram illustrating the configuration of the singing voice evaluation unit 100 according to the embodiment of the present invention. The singing voice evaluation unit 100 includes an acquisition unit 110, a pitch specifying unit 120, a range specifying unit 130, a calculation unit 140, an evaluation unit 150, a vocal range specifying unit 160, and an output unit 170.
From the singing voice data stored in the storage unit 50, the acquisition unit 110 acquires the portion corresponding to the singing voice in a predetermined evaluation period (in this example, the entire song) and outputs it to the pitch specifying unit 120 and the range specifying unit 130.

From the singing voice data received from the acquisition unit 110, the pitch specifying unit 120 specifies the pitch of the singing voice (hereinafter, the singing pitch) for each constituent sound to be sung. Specifically, for each frame, the pitch specifying unit 120 detects the zero crossings at which the waveform of the audio signal represented by the singing voice data changes from negative to positive, and measures the time interval between those zero crossings to specify the singing pitch (frequency) of the frame. Before this step, high-frequency components that act as noise may be removed from the audio signal with a low-pass filter, and the DC component may be removed with a high-pass filter.
The pitch specifying unit 120 outputs information indicating the singing pitch specified in this way to the calculation unit 140 in time series.
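The zero-cross measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the function name `zero_cross_pitch`, the representation of a frame as a list of floats, the linear interpolation between samples, and the averaging over all crossings in a frame are all choices made for this example.

```python
def zero_cross_pitch(frame, sample_rate):
    """Estimate the frequency (Hz) of one frame from negative-to-positive zero crossings.

    Returns None when the frame contains fewer than two such crossings.
    """
    crossings = []
    for i in range(1, len(frame)):
        prev, cur = frame[i - 1], frame[i]
        if prev < 0.0 <= cur:
            # Linear interpolation gives a sub-sample crossing position (assumption).
            frac = -prev / (cur - prev)
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None
    # Mean interval between consecutive crossings, in samples.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period
```

Because a strong overtone can add extra zero crossings, a detector of this kind may report a harmonic instead of the sung note, which is one reason to filter the signal first.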

From the singing voice data received from the acquisition unit 110, the range specifying unit 130 specifies the pitch range to which the singing pitch belongs. Pitch ranges are ranges of pitch divided in octave units: a first pitch range, a second pitch range one octave above the first, a third pitch range one octave above the second, and so on. Each is defined as a frequency band.

In this example, the pitch ranges are defined in correspondence with MIDI note numbers (written below as note names with octave numbers). For example, the third pitch range is the frequency band from the lower limit of the frequencies regarded as "C3" to the upper limit of the frequencies regarded as "B3", and the fourth pitch range is the frequency band from the lower limit of the frequencies regarded as "C4" to the upper limit of the frequencies regarded as "B4". Thus a singing pitch of 440 Hz, which corresponds to "A3", is treated as belonging to the third pitch range. Because the ranges only need to be divided in octave units, a single pitch range could instead be, for example, the band from the frequency regarded as "A4" to the frequency regarded as "A5".
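The mapping from a frequency to a pitch range can be sketched as follows. The only facts taken from the text are that "A3" corresponds to 440 Hz and that a range runs from "C" to "B" of the same octave number. The note number 69 for that A, equal temperament, the ±50 cent band around each note, and the names `A3_HZ`, `A3_NOTE`, and `pitch_range` are assumptions made for this sketch.

```python
import math

A3_HZ = 440.0   # from the text: "A3" corresponds to 440 Hz
A3_NOTE = 69    # assumed MIDI note number for that A (so note 60 is "C3")

def pitch_range(freq_hz):
    """Return n such that freq_hz falls in the n-th pitch range ("Cn" to "Bn")."""
    # Fractional note number in equal temperament (1 semitone = 100 cents).
    note = A3_NOTE + 12.0 * math.log2(freq_hz / A3_HZ)
    # Snap to the nearest semitone so each note covers +/-50 cents.
    nearest = math.floor(note + 0.5)
    # With note 60 named "C3", the octave number is note // 12 - 2.
    return nearest // 12 - 2
```

With these assumptions, 440 Hz falls in the third pitch range and 220 Hz in the second, which agrees with the examples in the description.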

The range specifying unit 130 specifies the pitch range to which the singing pitch belongs as follows. First, it applies an FFT (Fast Fourier Transform) to the singing voice data received from the acquisition unit 110 and calculates a frequency spectrum for each frame. It then detects the first formant frequency from the calculated spectrum and specifies the pitch range corresponding to the frequency band that contains the first formant frequency as the pitch range to which the singing pitch belongs. At this point, the range specifying unit 130 also calculates a reliability that serves as an index of the accuracy of the first formant detection, that is, of the accuracy of the specified pitch range. This reliability varies with the shape of the frequency spectrum. For example, the reliability may be made higher the more closely the positions of the spectral peaks satisfy a harmonic relationship and the higher the peak levels are. In that case, the reliability is low in parts of the singing voice that are consonants or contain a lot of noise.
The range specifying unit 130 outputs range information indicating the pitch range specified in this way (hereinafter, the specified pitch range) and the calculated reliability (hereinafter, the calculated reliability) to the evaluation unit 150 in time series.
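The description gives no formula for the reliability, only the tendency that it rises when the spectral peaks sit closer to a harmonic series and when the peak levels are higher. The following is one hypothetical scoring function with that behavior. The function name `reliability`, the peak list format, the normalized levels, and the `tolerance` parameter are all assumptions, and peak picking from the FFT output is left out.

```python
def reliability(peaks, f0, tolerance=0.03):
    """Hypothetical reliability score in the range 0..1.

    peaks: list of (frequency_hz, level) pairs, with level normalized to 0..1.
    f0: candidate lowest-peak frequency in Hz.
    tolerance: relative deviation from a harmonic at which the fit drops to 0.
    """
    if not peaks or f0 <= 0:
        return 0.0
    score = 0.0
    for freq, level in peaks:
        ratio = freq / f0
        nearest = max(1, round(ratio))          # nearest harmonic number
        deviation = abs(ratio - nearest) / nearest
        fit = max(0.0, 1.0 - deviation / tolerance)
        score += fit * level                    # well-placed, strong peaks count most
    return score / len(peaks)
```

A frame with strong peaks at 220, 440, and 660 Hz scores higher than a frame with weak peaks at unrelated frequencies, matching the statement that consonants and noisy passages yield low reliability.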

The range specifying unit 130 could also specify the singing pitch from the first formant frequency, but raising the accuracy enough to do so increases the processing load. Here, the first formant frequency is therefore detected with only enough accuracy to specify the pitch range. The pitch specifying unit 120, for its part, specifies the pitch from the zero crossings of the audio waveform. With this detection method, determining which of several harmonically related pitches is the actual pitch is less accurate than with the FFT processing in the range specifying unit 130. However, when zero-cross detection is restricted to the frequency band of a particular pitch range, it specifies the pitch more accurately than the FFT processing.
In this example, the singing pitch and the pitch range are thus specified from pitches detected by different methods in the pitch specifying unit 120 and the range specifying unit 130. Alternatively, if the FFT processing can detect the first formant frequency with high accuracy, the range specifying unit 130 may output a singing pitch indicating the detected first formant frequency to the calculation unit 140 in place of the pitch specifying unit 120.

The calculation unit 140 acquires the information output from the pitch specifying unit 120 and the GM data stored in the storage unit 50. For each constituent sound to be sung, as designated in time series in the GM data, the calculation unit 140 compares the pitch of that constituent sound (hereinafter, the designated pitch) with the singing pitch specified by the pitch specifying unit 120 and calculates a degree of coincidence indicating how well the pitches match. The designated pitch may be expressed as a frequency or as a value corresponding to a note number, such as "A4".

Because the calculation unit 140 acquires the singing pitch in time series, it can identify the time-series correspondence between each constituent sound designated in the GM data and the singing pitch. In calculating the degree of coincidence, the calculation unit 140 performs a conversion in octave units so that the designated pitch and the singing pitch belong to the same pitch range, and then compares the converted designated pitch with the singing pitch. For example, a singing pitch of 220 Hz (corresponding to "A2") belongs to the second pitch range, so the pitch of the constituent sound at the corresponding time is converted to belong to the second pitch range (for example, a designated pitch of "A3" is converted to "A2") before the comparison.
The calculation unit 140 outputs to the evaluation unit 150 coincidence information that associates information indicating the calculated degree of coincidence (hereinafter, the calculated degree of coincidence) with information identifying the constituent sound to be sung and its designated pitch. A specific example of the processing in the calculation unit 140 is described with reference to FIG. 3.
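The octave-unit conversion can be sketched with note numbers. This sketch assumes the naming in which note number 60 is "C3" and 69 is "A3" (consistent with "A3" at 440 Hz in the description). The helper names `range_of`, `shift_to_range`, and `name_of` are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def range_of(note):
    """Pitch range index of a note number, assuming note 60 is "C3"."""
    return note // 12 - 2

def shift_to_range(note, target_range):
    """Move a note by whole octaves so that it falls in target_range."""
    return note + 12 * (target_range - range_of(note))

def name_of(note):
    """Note name with octave number, e.g. 69 -> "A3" under the assumed naming."""
    return f"{NOTE_NAMES[note % 12]}{range_of(note)}"

# A designated "A3" compared against a voice sung in the second pitch range:
converted = shift_to_range(69, 2)
```

Here `name_of(converted)` is "A2", the same conversion as the example in the text.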

FIG. 3 is a diagram illustrating a specific example of the coincidence calculation performed by the calculation unit 140 according to the embodiment of the present invention. In FIG. 3, the horizontal axis represents time and the vertical axis represents pitch. Each constituent sound to be sung, as designated by the GM data, corresponds to a hatched rectangle labeled GM, and labels such as "C4" and "E4" indicate the designated pitch of each constituent sound. The rectangles without hatching are the constituent sounds converted in octave units. Along the time axis, the extent of a rectangle indicates the length of the constituent sound. Along the pitch axis, it indicates the range of frequencies regarded as the designated pitch, which is ±50 cents centered on the frequency of the designated pitch.
The singing pitch specified by the pitch specifying unit 120 corresponds to the curve labeled P.
N1, N2, and so on along the horizontal axis denote, in time series, the periods corresponding to the individual constituent sounds. Below, the constituent sound in period N1 is called constituent sound N1, and likewise for the others. For example, constituent sound N2 has the designated pitch "E4".

For constituent sound N1, the designated pitch is "C4", but the singing pitch lies in the third pitch range, so the calculation unit 140 compares the singing pitch with "C3", the designated pitch converted one octave down. In this example, the calculation unit 140 calculates the degree of coincidence for constituent sound N1 as the proportion of the period of constituent sound N1 during which the singing pitch lies within ±50 cents of the frequency of "C3" (70% in this example). The calculation unit 140 then outputs to the evaluation unit 150 coincidence information that associates information indicating constituent sound N1 and its designated pitch "C4" with information indicating the calculated degree of coincidence "70%".
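The degree of coincidence described here is a time ratio, which can be sketched as below. The sketch assumes the singing pitch is available as one frequency per equal-length frame over the note's period, with `None` for frames where no pitch was detected. The function names and the value used for "C3" are illustrative assumptions.

```python
import math

def cents_between(freq_hz, ref_hz):
    """Signed distance from ref_hz to freq_hz in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def coincidence_degree(sung_hz, target_hz, tolerance_cents=50.0):
    """Fraction of frames whose pitch lies within +/-tolerance_cents of target_hz.

    sung_hz: per-frame frequencies over the note's period; None means no pitch.
    """
    if not sung_hz:
        return 0.0
    hits = sum(
        1
        for f in sung_hz
        if f is not None and abs(cents_between(f, target_hz)) <= tolerance_cents
    )
    return hits / len(sung_hz)

# Seven of ten frames close to "C3" (about 261.63 Hz when "A3" is 440 Hz):
frames = [262.0] * 7 + [240.0] * 3
degree = coincidence_degree(frames, 261.63)
```

In this made-up input, seven of the ten frames are within 50 cents of the target, so `degree` comes out at 0.7, the same 70% figure as the example for constituent sound N1.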

Returning to FIG. 2, the evaluation unit 150 acquires the coincidence information output from the calculation unit 140 and the range information output from the range specifying unit 130, and based on this information performs an evaluation process that evaluates each pitch. The evaluation process is described below.

FIG. 4 is a flowchart illustrating the operation of the evaluation process according to the embodiment of the present invention. The evaluation unit 150 starts the evaluation process when it acquires the coincidence information and the range information. The evaluation process described below is performed for each constituent sound to be sung. Because the evaluation unit 150 acquires the specified pitch ranges in time series and the constituent sounds are also expressed in time series, it can identify the correspondence between them.

First, referring to the content of the acquired coincidence information and range information, the evaluation unit 150 stores, for the constituent sound, the designated pitch, the calculated degree of coincidence, the specified pitch range, and the calculated reliability in association with one another (step S110). The evaluation unit 150 keeps this information for constituent sounds going back a predetermined number (for example, 20 sounds) and deletes the stored content for constituent sounds earlier than that. Alternatively, the evaluation unit 150 may keep all of the information back to the first constituent sound of the song without deleting any stored content.
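The bounded history kept in step S110 can be sketched with a fixed-length queue. The record type `NoteRecord`, its field names, and the helper `make_history` are hypothetical. Only the four stored items and the example limit of 20 sounds come from the description.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class NoteRecord:
    designated_note: int     # designated pitch as a note number (assumed encoding)
    coincidence: float       # calculated degree of coincidence, 0..1
    specified_range: int     # pitch range index from the range specifying unit
    reliability: float       # calculated reliability

def make_history(max_notes=20):
    """Queue that silently drops the oldest record once max_notes is exceeded."""
    return deque(maxlen=max_notes)

history = make_history()
history.append(NoteRecord(designated_note=72, coincidence=0.7,
                          specified_range=3, reliability=0.9))
```

Passing `max_notes=None` gives an unbounded queue, which corresponds to the variant in which nothing is deleted.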

Next, the evaluation unit 150 determines whether the calculated reliability satisfies a predetermined condition (step S120). The predetermined condition is that the calculated reliability is at or above a predetermined threshold. When the evaluation unit 150 determines that the calculated reliability does not satisfy the condition (step S120; No), it performs a correction process on the specified pitch range (step S130). The correction process is described with reference to FIG. 5.

FIG. 5 is a flowchart illustrating the operation of the correction process according to the embodiment of the present invention. Referring to the content stored in step S110 for past constituent sounds, the evaluation unit 150 acquires the calculated reliability of the constituent sound immediately before the one being evaluated (step S210). The evaluation unit 150 determines whether the acquired calculated reliability satisfies the predetermined condition (step S220). This determination is the same as the one in step S120. When the calculated reliability does not satisfy the condition (step S220; No), the evaluation unit 150 returns to step S210 and continues, that is, it acquires the calculated reliability of the constituent sound one further back (step S210).

If the calculated reliability satisfies the condition (step S220; Yes), the evaluation unit 150 determines, for the constituent sound that satisfies the condition, how far the specified pitch range deviates from the pitch range to which the designated pitch belongs (step S230). The deviation is expressed in octaves. For example, if the designated pitch is "C4" and the specified pitch range is the third pitch range, the specified pitch range lies one octave below the fourth pitch range to which "C4" belongs, so the deviation is expressed as "-1".

The evaluation unit 150 sets the specified pitch range of the constituent sound originally under evaluation (the one whose calculated reliability does not satisfy the condition) to the pitch range obtained by shifting the pitch range to which that sound's designated pitch belongs by the deviation determined in step S230 (step S240). Continuing the example above, if the designated pitch of the constituent sound originally under evaluation is "B3", the second pitch range, one octave below the third pitch range to which "B3" belongs, becomes the specified pitch range for that sound. The evaluation unit 150 then updates the contents stored in step S110 accordingly and ends the correction process. In this way, whenever the calculated reliability does not satisfy the condition, the specified pitch range stored in step S110 is replaced with the pitch range determined in step S240, whatever pitch range it originally indicated.

If no calculated reliability satisfying the condition is found even after going back to the first constituent sound of the song, a predetermined fallback process may be performed. For example, the evaluation unit 150 may set the specified pitch range of the constituent sound under evaluation to the pitch range to which the singing pitch specified by the pitch specifying unit 120 belongs, update the contents stored in step S110, and end the correction process.
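The correction process of steps S210 to S240, including the fallback, can be sketched as below. This is a hedged illustration rather than the patented implementation: `NoteRecord`, `corrected_range`, and the idea of passing pitch ranges as plain integer indices are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NoteRecord:
    """Hypothetical per-sound record; ranges are octave indices (e.g. 3 = third range)."""
    designated_range: int   # pitch range containing the designated pitch
    specified_range: int    # pitch range specified from the singing voice
    reliability: float      # calculated reliability of specified_range


def corrected_range(current: NoteRecord,
                    earlier: Sequence[NoteRecord],
                    threshold: float,
                    sung_range: int) -> int:
    """Return the specified pitch range to use for `current`.

    `earlier` holds the stored records, oldest first.
    `sung_range` is the range of the singing pitch, used only as the fallback.
    """
    if current.reliability >= threshold:          # S120: Yes, no correction needed
        return current.specified_range
    for past in reversed(earlier):                # S210: step back one sound at a time
        if past.reliability >= threshold:         # S220: Yes
            deviation = past.specified_range - past.designated_range   # S230
            return current.designated_range + deviation                # S240
    return sung_range                             # no reliable earlier sound found
```

With the values from the text (an earlier reliable sound designated in the fourth range and specified in the third, and a current sound designated in the third range), the function returns the second range.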

Returning to FIG. 4, the description continues. When the evaluation unit 150 determines that the calculated reliability satisfies the condition (step S120; Yes), or when the correction process (step S130) has finished, it refers to the contents stored in step S110 and converts the designated pitch in octave units so that it belongs to the specified pitch range (step S140). If the correction process (step S130) has been performed, the designated pitch is converted so as to belong to the corrected specified pitch range.
The evaluation unit 150 then records the calculated degree of coincidence in the evaluation result information as the singing evaluation for the converted pitch (step S150) and ends the evaluation process for that constituent sound. The evaluation unit 150 repeats this evaluation process for every constituent sound to be sung. The evaluation result information is described next with reference to FIG. 6.
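The octave conversion of step S140 can be illustrated with MIDI-style note numbers. The sketch below is an assumption-laden example: it supposes that pitches are integers with 12 per octave and that note 60 is written "C3", so that range n covers notes 12 * (n + 2) to 12 * (n + 2) + 11. Neither the function names nor `OCTAVE_OFFSET` appear in the source.

```python
OCTAVE_OFFSET = 2  # assumption: note number 60 is labelled "C3"


def pitch_range_of(note: int) -> int:
    """Index of the octave-wide pitch range containing `note`."""
    return note // 12 - OCTAVE_OFFSET


def convert_to_range(designated_note: int, target_range: int) -> int:
    """Move `designated_note` by whole octaves into `target_range` (step S140).

    The pitch class (position within the octave) is preserved.
    """
    pitch_class = designated_note % 12
    return pitch_class + 12 * (target_range + OCTAVE_OFFSET)
```

For instance, under this numbering a designated note 71 ("B3") converted to the second range becomes note 59, one octave lower.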

FIG. 6 is a diagram illustrating the contents of the evaluation result information in the embodiment of the present invention. The evaluation result information associates an evaluation pitch, an evaluation score, and a calculated coincidence history with one another, and indicates the result of the singing evaluation for each pitch.

The evaluation pitch is a pitch subject to evaluation; in this example, the evaluation pitches are defined in correspondence with MIDI note numbers.
The calculated coincidence history holds the calculated degrees of coincidence recorded in step S150 of the evaluation process. Each time a value is recorded as the singing evaluation for a pitch, it is appended to the history for the corresponding evaluation pitch. In the example shown in FIG. 6, the calculated degree of coincidence has been recorded four times for the evaluation pitch "D3".
The evaluation score is calculated from the values recorded in the calculated coincidence history; in this example, it is the average of those values. In the example shown in FIG. 6, the values recorded for the evaluation pitch "C3" are "70" and "60", so the evaluation score is their average, "65". The evaluation unit 150 calculates the evaluation score as part of the processing in step S150 of the evaluation process.
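The per-pitch bookkeeping described for FIG. 6 can be sketched as a small class. The class and method names are hypothetical, and the evaluation pitch is used as an opaque dictionary key; only the averaging rule and the "C3" example values come from the text.

```python
from statistics import mean


class EvaluationResult:
    """Hypothetical container for the evaluation result information."""

    def __init__(self):
        # evaluation pitch -> list of recorded degrees of coincidence
        self.history = {}

    def record(self, evaluation_pitch, coincidence):
        """Step S150: append one calculated degree of coincidence to the pitch's history."""
        self.history.setdefault(evaluation_pitch, []).append(coincidence)

    def score(self, evaluation_pitch):
        """Evaluation score: the average of the recorded values for that pitch."""
        return mean(self.history[evaluation_pitch])
```

Recording 70 and then 60 for "C3" yields a score of 65, as in the example of FIG. 6.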

By recording in the evaluation result information in this way, the evaluation unit 150 evaluates the singing per pitch rather than per constituent sound. In other words, the evaluation score is an index of how well each pitch is sung.

Returning to FIG. 2, the description continues. The vocal range specifying unit 160 refers to the evaluation result information and specifies a certain range of pitches as the range of the singing voice (hereinafter, the singing range). In this example, the vocal range specifying unit 160 checks the evaluation pitches in the evaluation result information in ascending order of pitch, determining whether each evaluation score exceeds a predetermined passing score, and detects the first evaluation pitch whose score exceeds the passing score as the lowest range pitch. Likewise, it checks the evaluation pitches in descending order of pitch and detects the first evaluation pitch whose score exceeds the passing score as the highest range pitch.
The vocal range specifying unit 160 then specifies the span from the detected lowest range pitch to the detected highest range pitch as the singing range and outputs information indicating the singing range to the output unit 170.
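The two scans described above can be sketched as follows. The function name, the dictionary layout (integer evaluation pitch mapped to evaluation score), and the `None` result when no pitch passes are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def singing_range(scores: Dict[int, float],
                  passing_score: float) -> Optional[Tuple[int, int]]:
    """Return (lowest range pitch, highest range pitch), or None if no pitch passes."""
    ascending = sorted(scores)
    # Scan upward from the lowest evaluation pitch.
    lowest = next((p for p in ascending if scores[p] > passing_score), None)
    # Scan downward from the highest evaluation pitch.
    highest = next((p for p in reversed(ascending) if scores[p] > passing_score), None)
    if lowest is None or highest is None:
        return None
    return lowest, highest
```

Note that a pitch lying between the two detected endpoints is included in the singing range even if its own score does not exceed the passing score, which follows from the scanning rule as described.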

Based on the information output from the vocal range specifying unit 160, the output unit 170 determines the content to be displayed on the display unit 30 as the result and outputs a control signal that causes the display unit 30 to display that content. Any content is acceptable as long as it shows the singing range with the lowest range pitch and the highest range pitch clearly indicated. The display content can take various forms; one example is shown in FIG. 7.

FIG. 7 is a diagram illustrating an example of how the evaluation result is displayed in the embodiment of the present invention. In this example, the display shows not only the singer's singing range but also the range of the song that was sung, together with a recommendation of how far the key should be shifted for the singer to be able to sing it. To produce this display, the output unit 170 detects in advance the lowest and highest pitches among the constituent sounds designated by the GM data. The output unit 170 then calculates how much the key must be changed for both the lowest pitch and the highest pitch to fall within the singing range.
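One way to compute such a key change is sketched below, with all pitches given as semitone numbers. The function names and the choice of the midpoint of the feasible window as the recommendation are assumptions for illustration; the input values in the usage note are likewise hypothetical and are not taken from FIG. 7.

```python
from typing import Optional, Tuple


def key_shift_window(song_low: int, song_high: int,
                     voice_low: int, voice_high: int) -> Optional[Tuple[int, int]]:
    """Return the (min, max) shift in semitones that keeps the song inside the singing range.

    Negative values lower the key. Returns None if the song's range is wider
    than the singing range, so that no shift can fit it.
    """
    lowest_shift = voice_low - song_low      # song_low + shift >= voice_low
    highest_shift = voice_high - song_high   # song_high + shift <= voice_high
    if lowest_shift > highest_shift:
        return None
    return lowest_shift, highest_shift


def recommended_shift(song_low: int, song_high: int,
                      voice_low: int, voice_high: int) -> Optional[int]:
    """Pick the middle of the feasible window so the song sits centrally in the range."""
    window = key_shift_window(song_low, song_high, voice_low, voice_high)
    if window is None:
        return None
    return (window[0] + window[1]) // 2
```

For a song spanning notes 55 to 69 and a singing range of 48 to 64, the feasible window is -7 to -5 semitones and the recommended shift is -6.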

In the example shown in FIG. 7, the evaluation result gives a singing range whose highest range pitch is "C4" and whose lowest range pitch is "C3". The song used for the evaluation has a range whose highest pitch is "G4" and whose lowest pitch is "A3". In this case, the output unit 170 calculates a key change that lowers the range of the song so that it falls within the singing range. Here, lowering the key by six to eight semitones brings the song within the singing range. The output unit 170 therefore decides to recommend lowering the key by seven semitones, so that the song sits in the middle of the singing range, and causes the display unit 30 to display the singing range, the range of the song, the range of the song after the key change, and the recommendation comment.

As described above, the karaoke apparatus 1 in the embodiment of the present invention can evaluate the singer's singing range and display content corresponding to the evaluation result on the display unit 30. The evaluated singing range does not indicate the widest range of pitches the singer can produce; it indicates the widest range over which the singer can sing in tune with the pitches to be sung, even when singing an octave off. The singer can therefore obtain a highly reliable evaluation of his or her singing range.

＜Modifications＞
Although an embodiment of the present invention has been described above, the present invention can be implemented in various other forms, as follows.
[Modification 1]
In the embodiment described above, the karaoke apparatus 1 evaluates the singing range after the song has ended, treating the entire song as a single evaluation period. Alternatively, a song may be divided into a plurality of evaluation periods, with an evaluation made for each period. The evaluation periods may correspond to structural units of the song, for example the period of the first verse and the period of the second verse, or they may be periods of a fixed length of time.
In this case, the evaluation unit 150 may identify the evaluation periods by referring to the music data or by keeping time, and generate and record evaluation result information for each evaluation period.

[Modification 2]
In the embodiment described above, the output unit 170 outputs to the display unit 30 information for displaying the singing range that the vocal range specifying unit 160 specified from the evaluation result information. Alternatively, it may output to the display unit 30 information for displaying a graph, such as a histogram, of the evaluation score of each evaluation pitch in the evaluation result information. This also lets the singer see how wide a singing range his or her voice covers.
The output unit 170 may also calculate a range evaluation score for the singing range specified from the evaluation result information and output to the display unit 30 information for displaying that score. The range evaluation score may be calculated based on a comparison between the singing range and the range of the song.
Thus, the output unit 170 may output any information, provided that it causes the display unit 30 to display content corresponding to the evaluation result information.

[Modification 3]
In the embodiment described above, the range specifying unit 130 calculates a reliability for the pitch range it specifies, but the reliability need not be calculated. In a configuration that does not calculate the reliability, steps S120 and S130 of FIG. 4 are unnecessary in the evaluation process of the evaluation unit 150.

[Modification 4]
In the embodiment described above, the threshold used in the determination of step S120 in the evaluation process of the evaluation unit 150 is predetermined, but it may be made changeable. For example, when an instruction to change the threshold is input through the operation unit 20, the control unit 10 may change the threshold according to the instruction. The threshold may also vary with the calculated degree of coincidence that the calculation unit 140 calculated for the corresponding constituent sound. For example, the threshold may be raised when the calculated degree of coincidence is low and lowered when it is high.
The thresholds used in other determinations, such as the passing score that the vocal range specifying unit 160 uses when specifying the singing range, may likewise be made changeable.

[Modification 5]
In the embodiment described above, the evaluation score in the evaluation result information is the average of the values in the calculated coincidence history, but it may be determined by another method. For example, the highest value in the calculated coincidence history may be used as the evaluation score, or the lowest value may be used. In that case, the evaluation unit 150 need not record the calculated degrees of coincidence as a history in the evaluation result information. When the highest value is used, for instance, the evaluation unit 150 only needs to compare the current evaluation score with the newly acquired calculated degree of coincidence and replace the evaluation score with that value when it is higher.
The evaluation unit 150 may also determine the evaluation score from a statistic (such as the median or the mode) computed from the values recorded in the calculated coincidence history.
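The history-free variant that keeps only the highest value can be sketched as below. The function name and the dictionary used to hold one score per evaluation pitch are assumptions for illustration.

```python
def update_best(scores: dict, evaluation_pitch, coincidence: float) -> None:
    """Keep only the highest degree of coincidence seen so far for each pitch.

    No history is stored: the current score is overwritten only when the
    newly acquired value is higher (or when the pitch has no score yet).
    """
    best = scores.get(evaluation_pitch)
    if best is None or coincidence > best:
        scores[evaluation_pitch] = coincidence
```

Using the lowest value instead would only require reversing the comparison.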

[Modification 6]
In the embodiment described above, the vocal range specifying unit 160 checks the evaluation pitches in the evaluation result information in ascending order of pitch, determining whether each evaluation score exceeds the predetermined passing score, and detects the first evaluation pitch whose score exceeds the passing score as the lowest range pitch. Other methods may be used instead. For example, the vocal range specifying unit 160 may specify the singing range by applying statistical or other computation to the distribution of evaluation scores over the evaluation pitches in the evaluation result information. Although the lowest range pitch is used as the example here, the detection of the highest range pitch can be modified in the same way.

[Modification 7]
In the embodiment described above, the range specifying unit 130 may specify the pitch range using the singing pitch specified by the pitch specifying unit 120. In this case, the range specifying unit 130 acquires the information output from the pitch specifying unit 120 and determines the pitch range by judging which of the frequencies obtained by dividing or multiplying the singing pitch by an integer is closest to the detected first formant frequency.
For example, if the singing pitch specified by the pitch specifying unit 120 is 440 Hz, the range specifying unit 130 may determine the pitch range by judging whether the first formant frequency is closest to 110, 220, 440, 880, or 1760 Hz. If the first formant frequency is detected as 240 Hz, it is closest to 220 Hz, so the range specifying unit 130 specifies the second pitch range, which is the pitch range to which 220 Hz belongs.
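The candidate selection in this example can be sketched as follows. The function name, the limit of two octaves in each direction (matching the five candidates listed), and the use of the absolute difference in Hz as the closeness measure are assumptions; the text does not state how closeness is measured.

```python
def nearest_octave_candidate(singing_hz: float, formant_hz: float,
                             max_octaves: int = 2) -> float:
    """Return the octave multiple of `singing_hz` closest to `formant_hz`.

    Candidates are singing_hz * 2**k for k in [-max_octaves, max_octaves];
    for 440 Hz and max_octaves=2 these are 110, 220, 440, 880 and 1760 Hz.
    """
    candidates = [singing_hz * 2.0 ** k
                  for k in range(-max_octaves, max_octaves + 1)]
    return min(candidates, key=lambda f: abs(f - formant_hz))
```

With a singing pitch of 440 Hz and a detected first formant frequency of 240 Hz, the function selects 220 Hz, as in the example above.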

[Modification 8]
In the embodiment described above, the evaluation unit 150 sets the specified pitch range in step S240 to the pitch range corresponding to the deviation determined in step S230, but the deviation may be determined by a different method. In the embodiment, the evaluation unit 150 determines the deviation of the specified pitch range from the pitch range of the designated pitch for a single constituent sound that satisfies the condition. Instead, it may go further back and extract a plurality of constituent sounds that satisfy the condition, and determine the deviation for each of them. This yields one deviation per extracted constituent sound, and the evaluation unit 150 calculates the deviation to be used in step S230 from these values. For example, if the deviations are "-1", "-1", "-1", and "-2", the result may be "-1", obtained by averaging them and rounding to the nearest integer, or "-1", which is the mode.
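The two ways of combining several deviations can be sketched as below. The function names are hypothetical, and rounding ties away from zero is an assumption about how the rounding is meant to treat negative values.

```python
import math
from statistics import mean, mode
from typing import Sequence


def deviation_by_mean(deviations: Sequence[int]) -> int:
    """Average the deviations and round to the nearest integer (ties away from zero)."""
    avg = mean(deviations)
    return int(math.copysign(math.floor(abs(avg) + 0.5), avg))


def deviation_by_mode(deviations: Sequence[int]) -> int:
    """Use the most frequent deviation."""
    return mode(deviations)
```

For the deviations -1, -1, -1 and -2, the average is -1.25, which rounds to -1, and the mode is also -1.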

[Modification 9]
In the embodiment described above, the information output from the output unit 170 is information for displaying the singing range on the display unit 30, but it may represent other content. It only needs to notify the singer of the evaluation result for the singing range, so it may be, for example, audio data in which the evaluation result is spoken. It may also be MIDI sequence data to be sounded by the sound source of the sound processing unit 60.

The evaluation result for the singing range may also be conveyed to the singer by light, scent, movement, or the like. In this case, an external device is connected, such as a light emitting device using LEDs (Light Emitting Diodes) that emit light in various patterns, a scent emitting device that can release gases with various scent components, or a robot that can perform various motions. The information output from the output unit 170 is then control data for controlling that external device over time.

[Modification 10]
The control program in the embodiment described above may be provided stored on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. The karaoke apparatus 1 may also download the control program over a network.

DESCRIPTION OF SYMBOLS: 1 ... karaoke apparatus, 10 ... control unit, 20 ... operation unit, 30 ... display unit, 40 ... communication unit, 50 ... storage unit, 60 ... sound processing unit, 61 ... speaker, 62 ... microphone, 100 ... singing voice evaluation unit, 110 ... acquisition unit, 120 ... pitch specifying unit, 130 ... range specifying unit, 140 ... calculation unit, 150 ... evaluation unit, 160 ... vocal range specifying unit, 170 ... output unit

Claims

A singing voice evaluation apparatus comprising:
acquisition means for acquiring a singing voice input during reproduction of music data;
pitch specifying means for specifying a singing pitch of the acquired singing voice;
range specifying means for specifying a pitch range that is one of pitch ranges divided in octave units and to which the pitch of the acquired singing voice belongs;
calculation means for comparing, for each constituent sound to be sung that is designated by the music data, a designated pitch designated by the music data with the singing pitch, and calculating a degree of coincidence between the pitches as obtained when they are converted in octave units so that the designated pitch and the singing pitch belong to the same pitch range;
evaluation means for converting the designated pitch of the constituent sound to be sung in octave units so that it belongs to the pitch range specified for that constituent sound, and performing a singing evaluation for the converted pitch based on the degree of coincidence calculated for that constituent sound; and
output means for outputting information corresponding to a result of the singing evaluation.

The pitch specifying means specifies the singing pitch based on the pitch detected by the first detection method from the acquired singing voice,
The singing voice evaluation device according to claim 1, wherein the range specifying unit specifies the pitch range based on a pitch detected by a second detection method different from the first detection method.

The singing voice evaluation apparatus according to claim 1 or 2, wherein the range specifying means calculates a reliability that is an index of the accuracy of the pitch range specified for the constituent sound to be sung, and
when the reliability of the pitch range specified for a constituent sound is less than a predetermined threshold, the evaluation means converts the designated pitch so that it belongs to a pitch range determined based on the relationship between the pitch range specified for an earlier constituent sound whose reliability exceeds the threshold and the pitch range to which the designated pitch of that earlier constituent sound belongs.

The singing voice evaluation apparatus according to any one of claims 1 to 3, further comprising vocal range specifying means for specifying, according to the result of the singing evaluation, a certain range of pitches as the range of the singing voice,
wherein the information output by the output means is range information indicating the specified range.