JP2013190564A - Voice evaluation device

Info

Publication number: JP2013190564A
Application number: JP2012056044A
Authority: JP
Inventors: Ryuichi Nariyama; 隆一成山; Shuichi Matsumoto; 秀一松本; Tatsuya Terajima; 辰弥寺島
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-03-13
Filing date: 2012-03-13
Publication date: 2013-09-26
Anticipated expiration: 2032-03-13
Also published as: JP6024130B2

Abstract

PROBLEM TO BE SOLVED: To suitably evaluate singing in a singing style that places emphasis on inflection and a sense of rhythm. SOLUTION: When playback of a karaoke song ends, a control unit 10 evaluates the user's singing voice data stored in a user singing voice data storage area 25. In sections other than specific-singing sections, the control unit 10 evaluates the singing according to the difference in pitch between the user's singing voice data and GM data. In specific-singing sections, the control unit 10 instead evaluates the singing according to the difference between the amount of change across displacement points of the pitch of the user's singing voice data and the amount of change across the displacement points indicated by scoring data.

Description

The present invention relates to a voice evaluation apparatus.

Some karaoke apparatuses have a function for scoring how well a singer sings. For example, Patent Document 1 discloses a method of scoring an audio signal input from a microphone in a plurality of modes, each with a different scoring standard.

Patent Document 1: JP 10-26992 A

Some songs use a singing style called rap. Because rap emphasizes inflection and a sense of rhythm rather than pitch values, scoring it with conventional methods such as the technique described in Patent Document 1 has sometimes been difficult.
The present invention has been made in view of the above background, and its object is to suitably evaluate singing in a singing style in which inflection and a sense of rhythm are emphasized.

To solve the above problem, the present invention provides a voice evaluation apparatus comprising: a voice acquisition unit that acquires voice data representing a voice waveform; a model sound acquisition unit that acquires model sound data representing features of a model sound; a feature specifying unit that specifies a feature of the sound represented by the voice data acquired by the voice acquisition unit; a difference specifying unit that specifies the difference between the amount of change in the feature specified by the feature specifying unit and the amount of change in the feature of the sound represented by the model sound data; and an evaluation unit that evaluates the sound represented by the voice data based on the difference specified by the difference specifying unit and outputs an evaluation result.

In a preferred aspect of the present invention, where a displacement point is defined as a point at which the slope of a graph representing the temporal change of a sound feature changes by at least a predetermined threshold, the apparatus may further comprise: a displacement point specifying unit that specifies the displacement points of the feature specified by the feature specifying unit; and an association unit that associates each displacement point of the feature represented by the model sound data with a displacement point of the voice data, specified by the displacement point specifying unit, that appears within a predetermined time difference of the time of that model displacement point. In specifying the difference, the difference specifying unit may then specify the difference between the amount of change in the feature at the displacement point of the voice data and the amount of change in the feature at the displacement point of the model sound data, for the displacement points that the association unit has associated with each other.
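The association and difference steps of this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `DisplacementPoint`, `associate`, and `change_differences` are hypothetical, the nearest-unused-point matching rule is an assumption (the text only requires that the matched point appear within a predetermined time difference), and the amount of change is taken here to be the difference in feature value between consecutive displacement points.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DisplacementPoint:
    time: float   # seconds from the start of the song (assumed unit)
    value: float  # feature value at that time, e.g. pitch or volume


def associate(model: List[DisplacementPoint],
              sung: List[DisplacementPoint],
              max_gap: float) -> List[Tuple[DisplacementPoint, DisplacementPoint]]:
    """Pair each model point with the nearest unused sung point within max_gap seconds."""
    pairs = []
    used = set()
    for m in model:
        candidates = [
            (abs(s.time - m.time), i)
            for i, s in enumerate(sung)
            if i not in used and abs(s.time - m.time) <= max_gap
        ]
        if candidates:
            _, best = min(candidates)
            used.add(best)
            pairs.append((m, sung[best]))
    return pairs


def change_differences(pairs) -> List[float]:
    """Difference between sung and model change amounts across consecutive pairs."""
    diffs = []
    for (m0, s0), (m1, s1) in zip(pairs, pairs[1:]):
        model_change = m1.value - m0.value
        sung_change = s1.value - s0.value
        diffs.append(abs(sung_change - model_change))
    return diffs


# Illustrative values only: the sung line sits lower overall but moves similarly.
model = [DisplacementPoint(0.0, 100.0), DisplacementPoint(1.0, 300.0),
         DisplacementPoint(2.0, 200.0)]
sung = [DisplacementPoint(0.1, 50.0), DisplacementPoint(1.05, 260.0),
        DisplacementPoint(3.0, 0.0)]
pairs = associate(model, sung, max_gap=0.2)
diffs = change_differences(pairs)
```

In this example the third model point finds no sung point within 0.2 seconds and is left unpaired. The one remaining difference is small even though the sung values are consistently lower than the model values, because only the amounts of change are compared.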

In a further preferred aspect, the apparatus may comprise a time displacement specifying unit that specifies the difference between the time of a displacement point of the voice data and the time of a displacement point of the model sound data, and the evaluation unit may evaluate the sound represented by the voice data based on both the difference specified by the difference specifying unit and the difference specified by the time displacement specifying unit, and output an evaluation result.

In a further preferred aspect, the evaluation unit may output the same evaluation result whenever the difference specified by the difference specifying unit is within a predetermined threshold, and otherwise output an evaluation result that indicates a lower evaluation the larger the specified difference is.
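One way to read this rule is as a score with a tolerance band. The sketch below is an assumption-laden illustration: the function name `score_from_difference`, the 100-point scale, and the linear penalty are all hypothetical choices, since the text only requires an identical result inside the threshold and a lower evaluation as the difference grows beyond it.

```python
def score_from_difference(diff: float,
                          tolerance: float,
                          full_score: float = 100.0,
                          penalty_per_unit: float = 0.5) -> float:
    """Return the same score inside the tolerance, then decrease with the difference."""
    if diff <= tolerance:
        # Any difference within the predetermined threshold is evaluated identically.
        return full_score
    # Beyond the threshold, a larger difference yields a lower score (floored at 0).
    return max(0.0, full_score - penalty_per_unit * (diff - tolerance))
```

For instance, with a tolerance of 20, differences of 10 and 20 both score 100.0, while a difference of 60 scores 80.0.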

In a further preferred aspect, the apparatus may comprise a section data acquisition unit that acquires section data representing the sections of the voice data in which singing in a predetermined singing style is performed. Within the sections indicated by the section data, the evaluation unit may perform either the evaluation based on the difference specified by the difference specifying unit or a conventional, pitch-centered singing evaluation; in all other sections, it may perform an evaluation based on the difference between the feature specified by the feature specifying unit and the feature of the sound represented by the model sound data.

In another preferred aspect, the apparatus may comprise a section specifying unit that analyzes the voice data according to a predetermined algorithm and, according to the analysis result, specifies the sections in which singing in a predetermined singing style is performed. Within the sections specified by the section specifying unit, the evaluation unit may perform either the evaluation based on the difference specified by the difference specifying unit or a conventional, pitch-centered singing evaluation; in all other sections, it may perform an evaluation based on the difference between the feature specified by the feature specifying unit and the feature of the sound represented by the model sound data.

The present invention also provides a voice evaluation apparatus comprising: a voice acquisition unit that acquires voice data representing a voice waveform; a feature specifying unit that specifies a feature of the sound represented by the voice data acquired by the voice acquisition unit; a peak value specifying unit that specifies a plurality of peak values appearing in a graph representing the temporal change of the feature specified by the feature specifying unit; a change amount specifying unit that specifies the amount of change in the peak values specified by the peak value specifying unit; and an evaluation unit that evaluates the sound represented by the voice data based on the amount of change specified by the change amount specifying unit and outputs an evaluation result.
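The peak-based aspect needs no model sound data, only the sung feature curve. A minimal sketch, assuming the feature has already been sampled into a list and that a peak means a strict local maximum (the function names `find_peaks` and `peak_changes` are hypothetical, and this passage does not say how the change amounts are turned into a score):

```python
from typing import List


def find_peaks(values: List[float]) -> List[int]:
    """Indices of strict local maxima in a sampled feature curve."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


def peak_changes(values: List[float]) -> List[float]:
    """Change in peak value from each peak to the next."""
    peaks = [values[i] for i in find_peaks(values)]
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]


# Illustrative curve with peaks of height 3, 5 and 4.
curve = [0, 3, 1, 5, 2, 4, 0]
changes = peak_changes(curve)
```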

The present invention also provides a voice evaluation apparatus comprising: a voice acquisition unit that acquires voice data representing a voice waveform; a beat data acquisition unit that acquires beat data indicating the beats of a song; a feature specifying unit that specifies a feature of the sound represented by the voice data acquired by the voice acquisition unit; a displacement point specifying unit that specifies, as a displacement point, a point at which the slope of a graph representing the temporal change of the feature specified by the feature specifying unit changes by at least a predetermined threshold; and an evaluation unit that evaluates the sound represented by the voice data based on the time difference between the displacement point specified by the displacement point specifying unit and the time indicated by the beat data acquired by the beat data acquisition unit, and outputs an evaluation result.
In this aspect, the apparatus may further comprise a difference specifying unit that specifies the difference between the amount of change in the feature specified by the feature specifying unit and the amount of change in the feature of the sound represented by model sound data representing features of a model sound, and the evaluation unit may evaluate the sound represented by the voice data based on the difference specified by the difference specifying unit in addition to performing the evaluation based on the time difference between the displacement point and the time indicated by the beat data.
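The beat-based evaluation reduces to measuring how far each displacement point falls from a beat. The sketch below assumes the beat data is available as a sorted, non-empty list of beat times in seconds and compares each displacement point with its nearest beat; both the representation and the nearest-beat rule are assumptions, and the names `nearest_beat_offset` and `mean_beat_offset` are hypothetical.

```python
import bisect
from typing import List


def nearest_beat_offset(t: float, beat_times: List[float]) -> float:
    """Absolute time difference between t and the closest beat (beat_times sorted)."""
    i = bisect.bisect_left(beat_times, t)
    # Only the beats immediately before and after t can be the nearest one.
    candidates = beat_times[max(0, i - 1): i + 1]
    return min(abs(t - b) for b in candidates)


def mean_beat_offset(displacement_times: List[float],
                     beat_times: List[float]) -> float:
    """Average distance of the displacement points from their nearest beats."""
    offsets = [nearest_beat_offset(t, beat_times) for t in displacement_times]
    return sum(offsets) / len(offsets)


beats = [0.0, 0.5, 1.0, 1.5]  # illustrative: 120 beats per minute
```

A smaller mean offset would indicate singing that lands closer to the beat; how that offset maps to a score is left open here.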

According to the present invention, singing in a singing style in which inflection and a sense of rhythm are important can be evaluated suitably.

System configuration diagram of an embodiment of the present invention
Block diagram showing the hardware configuration of the karaoke apparatus
Schematic diagram showing the contents of the accompaniment data storage area
Diagram showing an example of the contents of the scoring data
Diagram showing an example of the contents of the displacement point data
Block diagram showing an example of the functional configuration of the karaoke apparatus
Block diagram showing an example of the functional configuration of the scoring unit
Diagram for explaining the processing performed by the pitch comparison unit
Diagram for explaining the processing performed by the pitch comparison unit
Flowchart showing the flow of the processing performed by the control unit
Flowchart showing the flow of the scoring processing performed by the control unit
Diagram for explaining the pitch comparison processing
Diagram for explaining the pitch comparison processing
Diagram for explaining the pitch comparison processing
Diagram for explaining the score calculation processing
Diagram for explaining the pitch comparison processing
Diagram showing a histogram of displacement points
Diagram for explaining the pitch comparison processing
Diagram showing an example of the change in pitch of a singing voice

<Embodiment>
<Configuration>
FIG. 1 shows the configuration of a system according to an embodiment of the present invention. The system includes a karaoke apparatus 100, a server apparatus 200, and a network NW. The karaoke apparatus 100 plays back karaoke songs at the user's request and evaluates the user's singing of the song being played. The network NW is a LAN (Local Area Network) or the Internet, and is the communication network over which the karaoke apparatus 100 and the server apparatus 200 exchange data. The server apparatus 200 stores various data, such as content data for karaoke songs, in storage means such as an HDD (Hard Disk Drive) provided inside or outside it, and supplies this content data to the karaoke apparatus 100 over the network NW at the request of the karaoke apparatus 100. Here, content refers to the combination of audio and video for a karaoke song. That is, content data consists of so-called accompaniment data, which contains accompaniment and chorus but no lead vocal, and video data, which contains the song's lyrics and the video displayed behind the lyrics. A plurality of karaoke apparatuses 100 may exist for one server apparatus 200, and a plurality of server apparatuses 200 may exist for one karaoke apparatus 100.

FIG. 2 is a block diagram showing the hardware configuration of the karaoke apparatus 100. The karaoke apparatus 100 includes a control unit 10, a storage unit 20, an operation unit 30, a display unit 40, a communication control unit 50, an audio processing unit 60, a microphone 61, and a speaker 62, which are connected to one another via a bus 70. The control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like. In the control unit 10, the CPU controls each unit of the karaoke apparatus 100 by reading a computer program stored in the ROM or the storage unit 20, loading it into the RAM, and executing it.

The operation unit 30 includes various controls and outputs to the control unit 10 an operation signal representing the operation performed by the user. The display unit 40 includes, for example, a liquid crystal panel and, under the control of the control unit 10, displays various images such as the lyric captions and background video for each karaoke song. The communication control unit 50 connects the karaoke apparatus 100 to the network NW by wire or wirelessly, and controls data communication between the karaoke apparatus 100 and the server apparatus 200 over the network NW.

The server apparatus 200 is a computer that includes a CPU and various memories (not shown) and, in particular, a network storage 210. The network storage 210 is, for example, an HDD, and stores various data such as the content data of karaoke songs. In FIG. 2 the server apparatus 200 includes one network storage 210, but the number of network storages is not limited to one; the server apparatus 200 may include a plurality of them. When the content data of a karaoke song reserved by the user is stored in the network storage 210, the karaoke apparatus 100 communicates with the server apparatus 200 under the control of the communication control unit 50 and performs streaming playback: it downloads the content data read from the network storage 210 over the network NW and plays back each part in order as soon as that part has been downloaded.

The microphone 61 outputs an analog audio signal representing the collected sound to the audio processing unit 60. The audio processing unit 60 has an A/D (Analog/Digital) converter, which converts the analog audio signal output by the microphone 61 into digital audio data and outputs it to the control unit 10, and the control unit 10 acquires this data. The audio processing unit 60 also has a D/A (Digital/Analog) converter, which converts digital audio data received from the control unit 10 into an analog audio signal and outputs it to the speaker 62. The speaker 62 emits sound based on the analog audio signal received from the audio processing unit 60. This embodiment describes the case in which the microphone 61 and the speaker 62 are part of the karaoke apparatus 100, but the audio processing unit 60 may instead be provided with an input terminal and an output terminal, with an external microphone connected to the input terminal via an audio cable and, likewise, an external speaker connected to the output terminal via an audio cable. This embodiment also describes the case in which the audio signal passing from the microphone 61 to the speaker 62 is an analog audio signal, but digital audio data may be input and output instead. In that case, the audio processing unit 60 does not need to perform A/D or D/A conversion. The same applies to the operation unit 30 and the display unit 40: an external output terminal may be provided and an external monitor connected to it.

The storage unit 20 is storage means for storing various data, for example an HDD or a nonvolatile memory. The storage unit 20 includes a plurality of storage areas: an accompaniment data storage area 21, a video data storage area 22, a GM (Guide Melody) data storage area 23, a scoring data storage area 24, and a user singing voice data storage area 25.

FIG. 3 is a schematic diagram showing the contents of the accompaniment data storage area 21. The accompaniment data storage area 21 stores information about the accompaniment data that represents the accompaniment sound of each song. It holds a plurality of accompaniment data records, each consisting of the items "song number", "song title", "singer name", "genre", and "file storage location". The "song number" uniquely identifies a song and consists of, for example, a 4-digit parent number and a 2-digit branch number. The "song title" is the name of the song. The "singer name" is the name of the song's singer. The "genre" is the musical genre to which the song belongs, out of a plurality of genres classified according to a predetermined standard. The "file storage location" is the location of the data file that holds the song's accompaniment data itself: if it includes a folder named server1, the accompaniment data file is stored on the server apparatus 200, and if it does not, the accompaniment data file is stored on the karaoke apparatus 100. For example, in FIG. 3, the accompaniment data file of the song titled "BBB" is stored on the server apparatus 200, and the accompaniment data file of the song titled "CCC" is stored in the storage unit 20 of the karaoke apparatus 100. The accompaniment data file is, for example, a file in MIDI (Musical Instrument Digital Interface) format.
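The rule for deciding where an accompaniment file lives can be sketched as a simple check on the "file storage location" string. The exact path syntax is shown only in FIG. 3, so the example paths, the accepted separators, and the function name `is_stored_on_server` below are assumptions made for illustration.

```python
import re


def is_stored_on_server(file_location: str) -> bool:
    """True if the location includes a folder named server1 (hypothetical path syntax)."""
    # Split on forward or backward slashes and drop the final component (the file name).
    folders = re.split(r"[\\/]+", file_location)[:-1]
    return "server1" in folders
```

With made-up paths, `is_stored_on_server("server1/songs/0001-01.mid")` is true, while `is_stored_on_server("local/songs/0002-01.mid")` is false, so the karaoke apparatus would stream the first file and read the second from its own storage unit.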

The video data storage area 22 stores lyric data representing the lyrics of each song and background video data representing the background video displayed behind the lyrics. During karaoke singing, the lyrics represented by the lyric data are displayed on the display unit 40 as lyric captions as the song progresses, and the background video represented by the background video data is displayed on the display unit 40 behind the lyric captions as the song progresses. The GM data storage area 23 stores guide melody data (hereinafter "GM data"), which represents the melody of the song's vocal part, that is, data specifying the notes to be sung. The control unit 10 uses the GM data as the reference for comparison when it evaluates the skill of the user's singing in sections of a song other than those in which rap singing or imitation singing (hereinafter "specific singing") is performed. The evaluation processing performed by the control unit 10 is described later, so a detailed description is omitted here. The GM data is described in, for example, MIDI format.

The scoring data storage area 24 stores the data (hereinafter "scoring data") used to score the sections of a song in which specific singing is performed (hereinafter "specific sections"). FIG. 4 shows an example of the contents of the scoring data. As illustrated, the scoring data contains the items "song number", "specific section data", "pitch displacement point data", and "volume displacement point data". Of these, the "song number" is as described above. The "specific section data" indicates the specific sections. In the example of FIG. 4, the song with "song number" "1004-19" has two specific sections: the section from time t11 to time t20 and the section from time t21 to time t30. A song may contain one specific section or several. For a song with no specific section, no scoring data is stored.
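One scoring data record could be modeled as below. This is a sketch under assumptions: the class name `ScoringData`, the field names, the use of seconds, and the treatment of section boundaries as inclusive are hypothetical, and the numeric times stand in for the symbolic times t11, t20, t21, and t30 of FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ScoringData:
    song_number: str                               # e.g. "1004-19"
    specific_sections: List[Tuple[float, float]]   # (start, end) of each specific section
    pitch_points: List[Tuple[float, float]] = field(default_factory=list)   # (time, pitch)
    volume_points: List[Tuple[float, float]] = field(default_factory=list)  # (time, volume)

    def section_at(self, t: float) -> Optional[Tuple[float, float]]:
        """Return the specific section containing time t, or None if t is outside all of them."""
        for start, end in self.specific_sections:
            if start <= t <= end:
                return (start, end)
        return None


# Placeholder times standing in for t11..t20 and t21..t30.
record = ScoringData("1004-19", [(11.0, 20.0), (21.0, 30.0)])
```

A lookup such as `record.section_at(15.0)` then tells the scorer whether the scoring data applies at a given playback time.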

In FIG. 4, the "pitch displacement point data" is generated from model sound data (for example, GM data) representing a model sound. It contains time data indicating the times at which the trend of the pitch change (the slope of the graph of pitch) changes in the model sound data, and pitch data indicating the pitch value at each of those times. In the example of FIG. 4, in the specific section from time t11 to time t20 of the song with "song number" "1004-19", the points (time, pitch) = (t11, p11), (t12, p12), ... are given as pitch displacement points. The model sound data is not limited to GM data; it may be, for example, data representing a model singing voice, and any data representing a model sound may be used. The "volume displacement point data" is likewise generated from the model sound data. It contains time data indicating the times at which the trend of the volume change (the slope of the graph of volume) changes in the model sound data, and volume data indicating the volume value at each of those times. In the following description, where the pitch displacement point data and the volume displacement point data need not be distinguished, they are referred to collectively as "displacement point data". That is, displacement point data indicates the points at which the slope of the graph representing the temporal change in pitch (or volume) differs before and after the point by at least a predetermined threshold.

FIG. 5 shows an example of the contents of the displacement point data. In the figure, the horizontal axis represents time and the vertical axis represents pitch (or volume). The solid line 500 represents the change in the pitch (or volume) of the guide melody represented by the GM data, and is hereinafter called the GM curve 500. In this embodiment, the points at which the slope of the GM curve 500 changes greatly (hereinafter "displacement points") are the positions at which the amount of change in the slope of the graph of pitch (or volume), that is, the GM curve 500, is at least a predetermined threshold. These include a point where the pitch (or volume) turns from rising to falling (for example, time t19 in FIG. 5), a point where it stops rising and settles within a certain range, a point where it starts to rise (for example, time t14), a point where it turns from falling to rising (for example, time t17), a point where it stops falling and settles within a certain range (for example, time t13), and a point where it starts to fall (for example, time t11). In this embodiment, in addition to the points where the amount of change in the slope of the GM curve 500 is at least the predetermined threshold, the point at which pitch detection begins at the start of singing (for example, time t11 in FIG. 5) is also used as a displacement point. This is not a limitation, however; the point at which pitch detection begins at the start of singing need not be used as a displacement point.
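Displacement point detection as described here can be sketched for a curve sampled at discrete times. The sketch assumes strictly increasing sample times and estimates the slope on each side of a sample from its immediate neighbors; the function name `find_displacement_points`, the neighbor-based slope estimate, and the `include_start` switch (which mirrors the optional use of the first detected point) are assumptions, not details given in the text.

```python
from typing import List, Tuple


def find_displacement_points(times: List[float],
                             values: List[float],
                             slope_threshold: float,
                             include_start: bool = True) -> List[Tuple[float, float]]:
    """Return (time, value) samples where the slope changes by at least slope_threshold."""
    points = []
    if include_start and times:
        # The first sample, where detection begins, may also count as a displacement point.
        points.append((times[0], values[0]))
    for i in range(1, len(times) - 1):
        slope_before = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        slope_after = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        if abs(slope_after - slope_before) >= slope_threshold:
            points.append((times[i], values[i]))
    return points


# Illustrative curve: falls, stays level, then rises.
sample_times = [0, 1, 2, 3, 4, 5]
sample_values = [5, 3, 1, 1, 1, 3]
detected = find_displacement_points(sample_times, sample_values, slope_threshold=1.0)
```

Here the curve stops falling at time 2 and starts rising at time 4, so those two samples are reported along with the starting sample.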

The user singing voice data storage area 25 stores, for each song sung as karaoke, the voice data generated when the user's singing voice, collected by the microphone 61 while the song's accompaniment data is being played back, is converted into digital data by the audio processing unit 60. This voice data is called user singing voice data. The user singing voice data represents a voice waveform and is stored, for example, as a data file in WAVE (RIFF waveform Audio Format) format. The control unit 10 associates the user singing voice data for each song with the GM data of that song.
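Storing and reloading waveform data in WAVE format can be illustrated with Python's standard `wave` module. The sketch assumes mono, 16-bit PCM samples and works on an in-memory buffer rather than a file in the storage area; the function names `write_wav` and `read_wav` are hypothetical.

```python
import io
import struct
import wave
from typing import List, Tuple


def write_wav(samples: List[int], sample_rate: int) -> bytes:
    """Encode mono 16-bit PCM samples as WAVE data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def read_wav(data: bytes) -> Tuple[int, List[int]]:
    """Decode mono 16-bit WAVE data into (sample_rate, samples)."""
    with wave.open(io.BytesIO(data), "rb") as r:
        n = r.getnframes()
        frames = r.readframes(n)
        return r.getframerate(), list(struct.unpack("<%dh" % n, frames))


rate, decoded = read_wav(write_wav([0, 1000, -1000, 32767], 44100))
```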

FIG. 6 is a block diagram showing an example of the functional configuration of the karaoke apparatus 100. In FIG. 6, the playback unit 11 and the scoring unit 12 are realized by the CPU of the control unit 10 reading a computer program stored in the ROM or the storage unit 20, loading it into the RAM, and executing it. The playback unit 11 plays back karaoke songs. Specifically, the playback unit 11 causes the speaker 62 to emit sound based on the accompaniment data and the GM data, and causes the display unit 40 to display video based on the video data.

The scoring unit 12 scores the data representing the singer's singing voice (hereinafter referred to as "user singing voice data"). The scoring unit 12 determines whether the section being sung is a section in which specific singing is performed (hereinafter referred to as a "specific section") or any other section (hereinafter referred to as a "standard section"). In a specific section it scores using the scoring data, whereas in a standard section it scores using the GM data. More specifically, in a standard section the scoring unit 12 evaluates the singing according to the difference between the pitch of the singing voice and the pitch of the GM data, whereas in a specific section it evaluates the singing such that the smaller the difference between the amount of change in the singing pitch and the amount of change in the pitch of the GM data, the higher the evaluation.

FIG. 7 is a block diagram showing an example of the functional configuration of the scoring unit 12. In FIG. 7, the pitch specifying unit 121 functions as a voice acquisition unit that acquires the user singing voice data stored in the user singing voice data storage area 25, and also functions as a pitch specifying unit that analyzes the acquired user singing voice data and specifies the pitch of the sound it represents. The pitch specifying unit 121 outputs data representing the specified pitch (hereinafter referred to as "pitch data") to the section determination unit 123. The volume specifying unit 122 specifies the volume of the user singing voice data stored in the user singing voice data storage area 25, and outputs data representing the specified volume (hereinafter referred to as "volume data") to the section determination unit 123.

The section determination unit 123 refers to the section data stored in the scoring data storage area 24 and determines whether or not the acquired user singing voice data belongs to a specific section. When the section is determined to be a specific section, the section determination unit 123 outputs the pitch data acquired from the pitch specifying unit 121 to the pitch displacement point specifying unit 124; otherwise, it outputs that pitch data to the pitch comparison unit 126. Likewise, when the section is determined to be a specific section, the section determination unit 123 outputs the volume data acquired from the volume specifying unit 122 to the volume displacement point specifying unit 125; otherwise, it outputs that volume data to the volume comparison unit 127.
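The routing performed by the section determination unit 123 can be sketched as follows. This is only an illustrative sketch: the format of the section data is not given in this passage, so the list of half-open `(start, end)` time ranges and the function names `is_specific_section` and `route_pitch_data` are assumptions made for the example.

```python
def is_specific_section(time, specific_sections):
    """Return True if `time` falls in any specific section.

    `specific_sections` is an assumed format: a list of (start, end)
    times in seconds, treated as half-open intervals [start, end).
    """
    return any(start <= time < end for start, end in specific_sections)


def route_pitch_data(time, specific_sections):
    """Name the unit that receives the pitch data observed at `time`."""
    if is_specific_section(time, specific_sections):
        # Specific section: pitch data goes to displacement point detection.
        return "pitch displacement point specifying unit 124"
    # Standard section: pitch data goes straight to the comparison.
    return "pitch comparison unit 126"
```

Volume data would be routed in the same way, with the volume displacement point specifying unit 125 and the volume comparison unit 127 as the two destinations.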

The pitch displacement point specifying unit 124 identifies, as a displacement point, a location at which the slope of the graph representing the temporal change of the pitch represented by the pitch data supplied from the section determination unit 123 changes by at least a predetermined threshold between before and after that location. That is, the pitch displacement point specifying unit 124 identifies the time at which the amount of change in the slope of the graph represented by the pitch data is equal to or greater than the predetermined threshold, and also identifies the pitch value at that time. The slope of the graph represented by the pitch data is obtained, for example, as follows. The pitch displacement point specifying unit 124 may obtain the slope from adjacent samples, or from an approximate curve fitted to a plurality of samples. It may apply a low-pass filter (LPF) to the sequence of slopes obtained from adjacent samples, or apply an LPF to the samples before calculating the slope. It may also differentiate point by point to obtain the slope of the tangent. The pitch displacement point specifying unit 124 outputs pitch displacement point data representing the identified time and pitch to the pitch comparison unit 126.
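As a minimal sketch of the simplest variant described above (slopes taken from adjacent samples, with no low-pass filtering), displacement points could be found as follows. The function name `find_displacement_points`, the sample representation, and the assumption that sample times are strictly increasing are illustrative choices, not details taken from the embodiment.

```python
def find_displacement_points(times, pitches, threshold):
    """Return (time, pitch) pairs where the slope changes by >= threshold.

    Slopes are computed from adjacent samples only. `times` is assumed to
    be strictly increasing, so no segment has zero duration.
    """
    points = []
    for i in range(1, len(times) - 1):
        # Slope of the segment arriving at sample i.
        slope_before = (pitches[i] - pitches[i - 1]) / (times[i] - times[i - 1])
        # Slope of the segment leaving sample i.
        slope_after = (pitches[i + 1] - pitches[i]) / (times[i + 1] - times[i])
        if abs(slope_after - slope_before) >= threshold:
            points.append((times[i], pitches[i]))
    return points
```

For a pitch track that rises steadily and then falls steadily, only the turning point is reported. The same routine could be applied to volume data, which is how the volume displacement points are described as being obtained.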

The volume displacement point specifying unit 125 identifies the displacement points of the volume in the user singing voice data from the volume data supplied from the section determination unit 123. That is, the volume displacement point specifying unit 125 identifies the time at which the slope of the graph represented by the volume data changes by at least a predetermined threshold between before and after that time, and also identifies the volume value at that time. The volume displacement point specifying unit 125 outputs volume displacement point data representing the identified time and volume to the volume comparison unit 127.

The pitch comparison unit 126 performs different processing in standard sections and in specific sections. In a standard section, the pitch comparison unit 126 acquires the pitch data of the user singing voice output from the section determination unit 123 and the GM data corresponding to that user singing voice. The GM data serves as the basis of comparison when the control unit 10 evaluates the skill of the user's singing, and is a reference determined in advance for the song to be sung. The control unit 10 aligns the user singing voice data and the GM data along the time axis, compares the pitch of the user singing voice data with the pitch of the GM data according to this alignment, and generates comparison result data representing the difference between the two.

In a specific section, on the other hand, the pitch comparison unit 126 acquires, from the scoring data storage area 24, pitch displacement point data representing the pitch at the displacement points of the model sound data. This pitch displacement point data is an example of model sound data indicating the pitch of a model sound. The pitch comparison unit 126 associates each displacement point indicated by the pitch displacement point data acquired from the scoring data storage area 24 (that is, a displacement point of the pitch indicated by the model sound data) with a displacement point indicated by the pitch displacement point data output from the pitch displacement point specifying unit 124 (that is, a displacement point of the pitch indicated by the user singing voice data) that appears within a predetermined time difference from the time of the former. It then determines the difference between the amount of change in pitch at the associated displacement point of the user singing voice data and the amount of change in pitch at the displacement point of the model sound data. The displacement points are associated, for example, as follows. The pitch comparison unit 126 obtains displacement points from the pitch sequence of the user singing voice data and from the pitch sequence of the model sound data. The temporal position of each displacement point is given by its time measured from the beginning of the song, which is taken as 0 (zero). The pitch comparison unit 126 searches for a displacement point obtained from the pitch sequence of the user singing voice data in the vicinity of each displacement point obtained from the pitch sequence of the model sound data. The vicinity of a displacement point may be defined in terms of time, for example within one second before or after the displacement point, or may depend on the tempo, for example one beat. If there is no displacement point in the vicinity, the pitch comparison unit 126 treats the displacement point of the model sound data as having no corresponding displacement point. If there is exactly one displacement point in the vicinity, the pitch comparison unit 126 treats it as the corresponding displacement point. If there are two or more displacement points in the vicinity, the pitch comparison unit 126 adopts the one closest in time as the corresponding displacement point. Note that the pitch comparison unit 126 compares only displacement points whose mode of change in the pitch slope is the same (for example, turning from rising to falling, or turning from falling to rising).
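The association procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions: each displacement point is represented as a `(time, pitch, kind)` tuple, where `kind` is a hypothetical label for the mode of slope change (such as `"peak"` or `"valley"`), and candidates are filtered by `kind` before the nearest one is chosen. The text does not specify in which order the two criteria are applied.

```python
def match_displacement_points(model_points, user_points, window):
    """Pair each model displacement point with a user displacement point.

    Points are (time, pitch, kind) tuples. A user point is a candidate if
    it has the same kind and lies within `window` seconds of the model
    point; the candidate closest in time is chosen. The result is a list
    of (model_point, user_point_or_None) pairs.
    """
    matches = []
    for m in model_points:
        candidates = [
            u for u in user_points
            if u[2] == m[2] and abs(u[0] - m[0]) <= window
        ]
        if candidates:
            best = min(candidates, key=lambda u: abs(u[0] - m[0]))
        else:
            best = None  # no corresponding displacement point exists
        matches.append((m, best))
    return matches
```

A tempo-dependent vicinity, such as one beat, could be obtained by computing `window` from the tempo before calling the function.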

The pitch comparison unit 126 generates comparison result data representing the determined difference. The pitch comparison unit 126 corresponds to an example of the association unit and the difference specifying unit according to the present invention. In this embodiment, the pitch comparison unit 126 generates the comparison result data by the following processing. FIGS. 8 and 9 are diagrams for explaining the processing of the pitch comparison unit 126 in a specific section. FIG. 8 shows the calculation of the difference between the amounts of change in time at the displacement points, and FIG. 9 shows the calculation of the difference between the amounts of change in pitch at the displacement points. In FIGS. 8 and 9, the horizontal axis indicates time and the vertical axis indicates pitch. The GM curve 500 is the same as that shown in FIG. 5. The solid line 300 represents the change in the pitch of the user's voice during singing as represented by the user singing voice data, and is hereinafter referred to as the user singing voice curve 300. First, for each displacement point X_i included in the scoring data (i is an integer from 1 to n, where n > 1), the pitch comparison unit 126 calculates the amount of change between the pitch displacement point data (gt_i, gp_i) at X_i and the pitch displacement point data (gt_{i-1}, gp_{i-1}) at the immediately preceding displacement point X_{i-1}: (Δgt_i, Δgp_i) = (gt_i − gt_{i-1}, gp_i − gp_{i-1}). Similarly, for each displacement point UX_i of the user singing voice data, the pitch comparison unit 126 calculates the amount of change between the pitch displacement point data (ut_i, up_i) at UX_i and the pitch displacement point data (ut_{i-1}, up_{i-1}) at the immediately preceding displacement point UX_{i-1}: (Δut_i, Δup_i) = (ut_i − ut_{i-1}, up_i − up_{i-1}).

Next, the pitch comparison unit 126 calculates the difference (Δt_i, Δp_i) = (Δgt_i − Δut_i, Δgp_i − Δup_i) between the amount of change (Δgt_i, Δgp_i) obtained for each displacement point of the scoring data and the amount of change (Δut_i, Δup_i) obtained for each displacement point of the user singing voice data, and outputs the resulting time difference value Δt_i and pitch difference value Δp_i as comparison result data.
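The two steps above (amounts of change between consecutive displacement points, then the difference between the model and user amounts) can be sketched as follows. The function names are hypothetical, and the sketch assumes the two lists of `(time, pitch)` points have already been associated one-to-one and have equal length.

```python
def change_amounts(points):
    """Return (Δt_i, Δp_i) between each point and the one before it."""
    return [
        (t1 - t0, p1 - p0)
        for (t0, p0), (t1, p1) in zip(points, points[1:])
    ]


def change_differences(model_points, user_points):
    """Return (Δt_i, Δp_i) = (Δgt_i − Δut_i, Δgp_i − Δup_i) per point.

    Both arguments are lists of (time, pitch) displacement points that
    are assumed to be already matched index by index.
    """
    return [
        (dgt - dut, dgp - dup)
        for (dgt, dgp), (dut, dup) in zip(
            change_amounts(model_points), change_amounts(user_points)
        )
    ]
```

Because only changes between consecutive points are compared, a user who sings the whole phrase at a constant pitch offset from the model yields pitch differences of zero.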

The volume comparison unit 127 performs different processing in standard sections and in specific sections. In a standard section, the volume comparison unit 127 compares the volume of the user singing voice data stored in the user singing voice data storage area 25 with a predetermined volume reference value, and generates comparison result data representing the difference between the two.

In a specific section, on the other hand, the volume comparison unit 127 uses the volume displacement point data included in the scoring data and the volume displacement point data of the user singing voice data to generate comparison result data representing the difference between their amounts of change. This comparison is the same as the pitch comparison performed by the pitch comparison unit 126. That is, for the volume displacement point data of each displacement point included in the scoring data stored in the scoring data storage area 24, the volume comparison unit 127 calculates the amount of change (Δgt_i, Δgv_i) between the volume displacement point data (gt_i, gv_i) at the displacement point X_i and the volume displacement point data (gt_{i-1}, gv_{i-1}) at the immediately preceding displacement point X_{i-1}. It performs the same processing on the displacement points of the user singing voice data to calculate the amount of change (Δut_i, Δuv_i), and outputs the difference (Δt_i, Δv_i) = (Δgt_i − Δut_i, Δgv_i − Δuv_i) as comparison result data.

The scoring output unit 128 evaluates the singing voice based on the comparison result data output from the pitch comparison unit 126 and the comparison result data output from the volume comparison unit 127, and outputs the evaluation result to the display unit 40 or the like. The scoring output unit 128 acquires specific section data representing the specific sections from the scoring data storage area 24. In the specific sections indicated by the acquired data, it performs the evaluation such that the smaller the difference between the amount of change in the pitch of the singing voice data and the amount of change in the pitch of the model sound data, the higher the evaluation. In sections other than the specific sections, it evaluates the user singing voice data based on the difference between the pitch of the user singing voice data and the pitch of the GM data. More specifically, in a standard section, for example, the scoring output unit 128 compares the change in the pitch of the voice indicated by the user singing voice data with the change in the pitch of the guide melody indicated by the GM data, and calculates an evaluation value indicating the degree to which they match. For a given note, the evaluation value may be 100% (that is, no deduction) if the difference between the two pitches stays within a predetermined allowable range, and 50% if the period during which the difference falls outside that range covers half of the length of the note in the GM data. In other words, for a given note, the evaluation value is the period during which the difference between the two pitches stays within the allowable range, divided by the length of that note in the GM data. The control unit 10 determines the points to be deducted based on the calculated evaluation value. For example, when a note is assigned "2 points" and the evaluation value is calculated as 50%, the control unit 10 determines the deduction to be "1 point".
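The per-note evaluation value and the resulting deduction can be sketched as follows. As an assumption for illustration, the user's pitch over the note is represented by evenly spaced frames, so that the fraction of in-range frames stands in for the ratio of in-range duration to note length. The function names and the tolerance parameter are hypothetical.

```python
def note_evaluation(user_pitches, note_pitch, tolerance):
    """Return the fraction of the note during which the pitch is in range.

    `user_pitches` holds the user's pitch at evenly spaced frames covering
    the note (an assumed representation). An empty list scores 0.0.
    """
    if not user_pitches:
        return 0.0
    in_range = sum(1 for p in user_pitches if abs(p - note_pitch) <= tolerance)
    return in_range / len(user_pitches)


def deduction(points_for_note, evaluation):
    """Return the points deducted from a note given its evaluation value."""
    return points_for_note * (1.0 - evaluation)
```

With a note worth 2 points and the user in range for half of the frames, the evaluation value is 0.5 and the deduction is 1 point, which matches the example given in the text.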

The evaluation value may be, for example, 100% (that is, no deduction) for a given note if the difference between the two pitches stays within a predetermined allowable range, and 50% if the period during which the difference falls outside that range covers half of the length of the note in the GM data. The volume reference value may be set for each note included in the GM data, or may be set for each predetermined section, such as each measure.

In a specific section, on the other hand, the scoring output unit 128 may, for example, assign a score to each displacement point of the model sound data and take the average after the specific section ends. More specifically, for example, the scoring output unit 128 may calculate a degree of deviation x_i by equation (1) below and score such that the smaller the average of the degrees of deviation x_i over all displacement points, the higher the score. In equation (1), Δt_i denotes the time difference of the pitch displacement point (the difference value Δt_i calculated by the pitch comparison unit 126), and Δp_i denotes the pitch difference of the displacement point (the difference value Δp_i calculated by the pitch comparison unit 126). α and β are weighting coefficients.
x_i = |Δt_i| * α + |Δp_i| * β … (1)
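Equation (1) and the averaging step can be sketched as follows. The functions `deviation` and `mean_deviation` follow the text directly. The mapping from mean deviation to a score is not specified in the text, so `score_from_mean_deviation` and its `scale` parameter are purely hypothetical, chosen only so that a smaller mean deviation gives a higher score.

```python
def deviation(dt, dp, alpha, beta):
    """Degree of deviation x_i = |Δt_i| * α + |Δp_i| * β (equation (1))."""
    return abs(dt) * alpha + abs(dp) * beta


def mean_deviation(diffs, alpha, beta):
    """Average x_i over all displacement points; diffs holds (Δt_i, Δp_i)."""
    return sum(deviation(dt, dp, alpha, beta) for dt, dp in diffs) / len(diffs)


def score_from_mean_deviation(mean, scale=10.0):
    """Hypothetical mapping: 100 for no deviation, decreasing linearly to 0."""
    return max(0.0, 100.0 - scale * mean)
```

The weights α and β let time errors and pitch errors contribute on comparable scales, since the two are measured in different units.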

The manner of scoring is not limited to the above and may take other forms. For example, the scoring output unit 128 may first convert the deviation of the time difference and the deviation of the pitch difference into points for each displacement point, then calculate a score s_i for each displacement point by equation (2) below, and score such that the larger the average of the scores over all displacement points, the higher the score. In equation (2), st_i denotes the points based on the deviation of the time difference at the displacement point X_i, and sp_i denotes the points based on the deviation of the pitch difference at the displacement point X_i.
s_i = st_i * α + sp_i * β … (2)

<Operation>
FIG. 10 is a flowchart showing the flow of processing performed by the control unit 10. When a song is reserved by the user via the operation unit 30 (step S100; Yes), the control unit 10 searches the storage unit 20 for the reserved song (step S102). Specifically, in step S102, the control unit 10 searches each of the accompaniment data storage area 21, the video data storage area 22, and the GM data storage area 23 for the data relating to the selected song, using its song number or song title as the key, and reads the retrieved data into the RAM.

Next, the control unit 10 plays back the karaoke song based on the accompaniment data, video data, and GM data stored in the RAM (step S104). Specifically, in step S104, the control unit 10 causes the speaker 62 to emit sound based on the accompaniment data and the GM data, and causes the display unit 40 to display video based on the video data. The control unit 10 then stores, in the user singing voice data storage area 25, the user singing voice data obtained when the user's singing voice picked up by the microphone 61 is converted into digital data by the voice processing unit 60 (step S106). When playback of the karaoke song ends, the control unit 10 scores the singing based on the user singing voice data stored in the user singing voice data storage area 25, the GM data, and the scoring data (step S108). The control unit 10 then causes the display unit 40 to display the scoring result (step S110).

FIG. 11 is a flowchart showing the flow of the scoring process (step S108 in FIG. 10) performed by the control unit 10. First, the control unit 10 specifies the pitch of the sound indicated by the user singing voice data (step S200). Next, the control unit 10 identifies the pitch displacement points from the user singing voice data in the specific sections (step S210). The control unit 10 then scores the user singing voice by performing the following steps S220 to S250 for each predetermined unit section. First, the control unit 10 determines whether the section to be scored is a standard section or a specific section (step S220). If it is a standard section (step S220; NO), the control unit 10 compares the pitch of the user singing voice data with the pitch of the GM data and calculates an evaluation value according to the difference between the two, and also compares the volume of the user singing voice data with a predetermined volume reference value and calculates an evaluation value according to the difference between the two (step S230). If it is a specific section (step S220; YES), the control unit 10 compares the amount of change of the data representing the pitch displacement points identified from the user singing voice data with the amount of change of the pitch displacement point data stored in the scoring data storage area 24 and calculates an evaluation value according to the difference between the two amounts of change, and also compares the data representing the volume displacement points identified from the user singing voice data with the volume displacement point data stored in the scoring data storage area 24 and calculates an evaluation value according to the difference between their amounts of change (step S240).

The control unit 10 determines whether or not to end the process by checking whether any section remains unscored (step S250). If a section remains to be scored (step S250; NO), it returns to step S220 and scores the next section. If it determines that scoring has reached the end of the song (step S250; YES), it ends the scoring process.

In rap, the singer delivers the words rhythmically, as if speaking, with little melody, while rhyming at positions such as the ends of measures. In scoring rap, therefore, the degree of pitch match is not very important; intonation and rhythm are what matter. Ordinary singing scoring is intended for singing performed to a melody, and so emphasizes the absolute value of the pitch. Because rap emphasizes inflection and a sense of rhythm rather than pitch values, it is difficult to score with conventional methods. In this embodiment, scoring is performed according to the difference between the amount of change in the pitch of the user singing voice data and the amount of change in the pitch of the GM data, so that voices in which inflection and a sense of rhythm are important, such as rap and vocal impersonation, can be suitably scored.

<Modifications>
The above embodiment can be modified as follows. The following modifications may be combined with one another as appropriate, and may also be combined with the above embodiment.

<Modification 1>
In the above embodiment, the section data is referred to and different evaluation processing is performed in the specific sections and in the other sections. However, the configuration is not limited to this, and the determination of whether a section is a specific section may be omitted. In that case, scoring using the scoring data described above may be performed in all sections of the song. More specifically, for a song whose genre is "rap", for example, the scoring process using the scoring data may be performed in all sections of the song. According to this aspect, the control unit 10 does not need to switch between scoring processes, and the specific sections do not need to be designated in the song data.

＜変形例２＞
<Modification 2>
In the above-described embodiment, the control unit 10 identifies pitch displacement points and volume displacement points from the user singing voice data, and compares the differences in the amount of change between adjacent displacement points in the user singing voice data with those in the scoring data. The comparison between the user singing voice data and the scoring data is not limited to this, and may be, for example, the following process. First, the control unit 10 calculates the amount of change in the pitch of the user singing voice data at predetermined time intervals, and likewise calculates the amount of change in the pitch of the model sound data (for example, GM data) at the same predetermined time intervals. Next, the control unit 10 compares the calculated amount of change in the pitch of the user singing voice data with that of the model sound data, and calculates the difference between the two. In the example shown in FIG. 12, the pitch differences of the model sound data at regular intervals, (b−a), (c−b), (d−c), ..., are compared with the pitch differences of the user singing voice data at regular intervals, (b′−a′), (c′−b′), (d′−c′), ..., and the difference between them is calculated. The smaller the difference obtained by this calculation, the smaller the deviation of the difference between the pitch of the user singing voice data and the pitch of the model sound data. Therefore, when the control unit 10 performs an evaluation process that gives a higher evaluation as the calculated difference becomes smaller, an evaluation similar to that of the above-described embodiment is obtained. In this way, instead of scoring according to the difference in the amount of pitch change at displacement points, the control unit 10 may compare the pitch of the user singing voice data with the pitch of the sound indicated by the model sound data for each predetermined unit time, and perform the evaluation process based on the difference between their amounts of change. In this aspect as well, as in the above-described embodiment, singing in a style that emphasizes intonation and rhythm can be suitably evaluated. In short, the control unit 10 need only evaluate the sound indicated by the user singing voice data based on the difference between the amount of change in the pitch of the user singing voice data and the amount of change in the pitch of the sound indicated by the model sound data, and output the evaluation result.
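The fixed-interval comparison described for FIG. 12 can be sketched as follows. This is only an illustrative sketch: the function names, the use of a mean absolute difference, and the linear mapping from difference to score are assumptions that do not appear in the specification.

```python
def interval_changes(pitches):
    """Pitch change between consecutive samples taken at a fixed interval."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def change_difference(user_pitches, model_pitches):
    """Mean absolute difference between the two sequences of pitch changes."""
    user = interval_changes(user_pitches)
    model = interval_changes(model_pitches)
    n = min(len(user), len(model))
    if n == 0:
        return 0.0
    return sum(abs(u - m) for u, m in zip(user, model)) / n


def score_from_difference(diff, scale=100.0, penalty=10.0):
    """Hypothetical mapping: a smaller difference gives a higher score."""
    return max(0.0, scale - penalty * diff)
```

Because only the changes (b−a), (c−b), ... are compared, a voice that follows the same contour at a different absolute pitch yields a difference of zero under these assumptions.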

<Modification 3>
The comparison between the user singing voice data and the scoring data may also be the following process. First, the control unit 10 obtains the slope of the GM curve 500 and divides the specific section into a plurality of sections according to the range of the slope value. For example, the control unit 10 may divide it into sections where the slope is positive (that is, the pitch is rising), sections where the slope is negative (that is, the pitch is falling), and sections where the slope is zero or within a predetermined threshold close to zero (that is, the pitch changes little). Similarly, the control unit 10 obtains the slope of the user singing voice curve 300 and divides the specific section into a plurality of sections according to the range of the slope value. Next, the control unit 10 may perform the evaluation process so that sections in which the mode of pitch change is the same (for example, sections where both slopes are positive; see section A1 in FIG. 13) receive a high evaluation, while sections in which the mode of pitch change differs (for example, sections where one slope is positive and the other is negative; see section A2 in FIG. 13) receive a low evaluation. In sections where the mode of pitch change differs, the difference between the amount of change in the pitch of the user singing voice data and that of the model sound data is large. This evaluation process therefore also gives a lower evaluation as the difference between the amount of change in the pitch of the singing voice data and that of the model sound data becomes larger, as in the above-described embodiment. Accordingly, in this aspect as well, singing in a style that emphasizes intonation and rhythm can be suitably evaluated. Moreover, according to this aspect, there is no need to identify displacement points when evaluating the singing.
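As a rough illustration of the slope-based comparison, the sketch below classifies each sampling interval as rising, falling, or flat and counts how often the two curves agree. The names `slope_class` and `direction_agreement`, the flat threshold of 0.5, and the use of an agreement ratio as the evaluation are all assumptions made for this example.

```python
def slope_class(slope, flat_threshold=0.5):
    """Classify a slope as rising (+1), falling (-1), or flat (0)."""
    if slope > flat_threshold:
        return 1
    if slope < -flat_threshold:
        return -1
    return 0


def direction_agreement(user_pitches, model_pitches, flat_threshold=0.5):
    """Fraction of intervals where both curves change in the same direction."""
    user = [slope_class(b - a, flat_threshold)
            for a, b in zip(user_pitches, user_pitches[1:])]
    model = [slope_class(b - a, flat_threshold)
             for a, b in zip(model_pitches, model_pitches[1:])]
    n = min(len(user), len(model))
    if n == 0:
        return 0.0
    matches = sum(1 for u, m in zip(user, model) if u == m)
    return matches / n
```

No displacement points are extracted here, which matches the advantage stated for this modification.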

<Modification 4>
The comparison between the user singing voice data and the scoring data may also be the following process. First, the control unit 10 identifies the pitch from the user singing voice data and calculates the average pitch. The control unit 10 also calculates the average pitch of the model sound data and generates section data indicating the sections in which the difference between the pitch value and the calculated average is equal to or greater than a predetermined threshold (see sections A11, A12, ... in FIG. 14). The control unit 10 need not generate this section data itself; the section data may instead be included in the scoring data in advance and stored in the scoring data storage area 24. The control unit 10 identifies the sections in which the difference between the pitch of the user singing voice data and its average is equal to or greater than a predetermined threshold (see sections A21, A22, ... in FIG. 14), compares the identified sections with the sections indicated by the section data, and performs the evaluation process so that a larger overlap gives a higher evaluation. In this aspect, the evaluation depends on the amount of overlap between sections in which the deviation from the average pitch is equal to or greater than the threshold, so singing in a style that emphasizes intonation and rhythm can be suitably scored.
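The overlap evaluation of FIG. 14 might be sketched as below, assuming both pitch sequences are non-empty and sampled at the same instants. Representing sections as per-sample boolean masks and normalizing the overlap by the model's marked samples are choices made for this example, not details given in the specification.

```python
def deviation_mask(pitches, threshold):
    """True where the pitch differs from the mean by at least `threshold`."""
    mean = sum(pitches) / len(pitches)
    return [abs(p - mean) >= threshold for p in pitches]


def overlap_ratio(user_pitches, model_pitches, threshold):
    """Share of the model's marked samples that the user's voice also marks."""
    user = deviation_mask(user_pitches, threshold)
    model = deviation_mask(model_pitches, threshold)
    marked = sum(model)
    if marked == 0:
        return 0.0
    both = sum(1 for u, m in zip(user, model) if u and m)
    return both / marked
```

Each sequence is compared against its own average, so a singer in a different register is not penalized under these assumptions.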

As another example, the control unit 10 may acquire the pitch of the user singing voice data at predetermined time intervals (for example, about 500 ms) and calculate the sum of the absolute values of the pitch differences between adjacent samples, calculate the same sum for the model sound data with the pitch acquired at the same time intervals, and compare the sum for the user singing voice data with the sum for the model sound data. In this case, the evaluation process may be performed so that a smaller difference between the sum calculated from the user singing voice data and the sum calculated from the model sound data gives a higher evaluation. According to this aspect, the evaluation can be performed using the pitch of the model sound data, and the amount of computation required for the evaluation process can be reduced.
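The summed-difference comparison is simple enough to state directly. The sketch below assumes the pitches have already been sampled at the chosen interval; the function names are hypothetical.

```python
def total_variation(pitches):
    """Sum of absolute pitch differences between adjacent samples."""
    return sum(abs(b - a) for a, b in zip(pitches, pitches[1:]))


def total_variation_gap(user_pitches, model_pitches):
    """Difference between the two sums; a smaller gap means a higher evaluation."""
    return abs(total_variation(user_pitches) - total_variation(model_pitches))
```

Only one number per recording is compared, which is why this variant is cheap, but it also means that two quite different contours can produce the same sum.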

<Modification 5>
In the above-described embodiment, the control unit 10 scores each displacement point using equation (1) above, but the scoring process is not limited to this. For example, the score may be calculated using a scoring function 400 as shown in FIG. 15. FIG. 15 shows how the score for a single displacement point is calculated; the horizontal axis represents the difference in the amount of change in time (or pitch) between the user singing voice data and the scoring data, and the vertical axis represents the score. In the example shown in FIG. 15, the control unit 10 calculates the score so that it is the full score while the difference in the amount of change is within a certain range, decreases beyond that range, and is the minimum score once the deviation reaches a certain amount. That is, the control unit 10 outputs the same evaluation result when the difference between the amount of change in the user singing voice data and that in the scoring data is within a predetermined threshold, and otherwise outputs a lower evaluation result as the difference becomes larger. According to this aspect, a calculation method that tolerates a certain amount of deviation can be adopted, so a scoring result closer to the listener's impression is obtained.
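A scoring function with a tolerance band, in the spirit of the scoring function 400, could look like the following sketch. The linear fall-off between the two thresholds and all default parameter values are assumptions; the specification only states that the score is constant inside a range, decreases beyond it, and bottoms out past a certain deviation.

```python
def tolerant_score(diff, full_range=1.0, zero_range=5.0,
                   max_score=100.0, min_score=0.0):
    """Full score inside `full_range`, minimum beyond `zero_range`, linear between."""
    d = abs(diff)
    if d <= full_range:
        return max_score
    if d >= zero_range:
        return min_score
    fraction = (d - full_range) / (zero_range - full_range)
    return max_score - fraction * (max_score - min_score)
```

With the defaults, any deviation up to 1.0 is treated as perfect, which is the "allowed deviation" behaviour described above.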

<Modification 6>
In the above-described embodiment, the specific section is scored by scoring each displacement point and taking the average of the per-point scores, but the way the specific section is scored is not limited to this. For example, when the scoring data has a pitch displacement point at a certain time, a high score may be given if the singing pitch has a displacement point very close to it in time, with the score decreasing as the two move further apart in time. Also, for example, the control unit 10 may deduct points when the singing pitch has a displacement point at a time where the scoring data has no pitch displacement point.
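One possible reading of this proximity-based scoring is sketched below. The greedy nearest-point matching, the matching window, the linear decay of the score with time distance, and the fixed penalty for unmatched singing displacement points are all assumptions introduced for illustration.

```python
def proximity_score(model_times, user_times, window=0.5, penalty=0.2):
    """Up to 1 point per model displacement point; extra user points cost `penalty`."""
    total = 0.0
    matched = set()
    for t in model_times:
        # Candidate user displacement points not yet paired with a model point.
        candidates = [(abs(u - t), i) for i, u in enumerate(user_times)
                      if i not in matched]
        if not candidates:
            continue
        gap, i = min(candidates)
        if gap <= window:
            matched.add(i)
            total += 1.0 - gap / window  # closer in time gives a higher score
    extra = len(user_times) - len(matched)
    return total - penalty * extra
```

For instance, with model points at 1.0 s and 2.0 s and singing points at 1.0 s, 2.25 s, and 3.5 s, the first two are matched and the third is treated as a displacement point with no counterpart in the scoring data.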

As another example, the control unit 10 may perform statistical processing over the specific section as a whole. More specifically, for example, the average and the deviation of the differences between the times of the displacement points indicated by the scoring data and the times of the displacement points of the singing (see Δt50 in FIG. 16) may be obtained, and a higher score given as the average is closer to zero and as the deviation is smaller. At each displacement point, the time difference between the scoring data and the user singing voice data is calculated; the closer the average of these time differences is to zero, and the closer the deviation is to zero, the more closely the user has followed the model singing indicated by the scoring data. The control unit 10 may therefore calculate the score by equation (3) below, where A is the average, D is the deviation, and a, b, and c are coefficients.

score = aA + bD + c ... (3)
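Equation (3) can be tried out with the sketch below. Several points are assumptions made here rather than statements from the specification: A is taken as the absolute value of the mean time difference, D as the population standard deviation, displacement points are assumed to be already paired in order, and a and b are negative so that the score falls as A and D grow.

```python
from statistics import mean, pstdev


def timing_score(model_times, user_times, a=-20.0, b=-20.0, c=100.0):
    """score = a*A + b*D + c over paired displacement-point times."""
    diffs = [u - m for m, u in zip(model_times, user_times)]
    A = abs(mean(diffs))   # how far the singing is shifted on average
    D = pstdev(diffs)      # how unevenly the shift varies between points
    return a * A + b * D + c
```

A singer who is consistently 0.5 s late has D = 0 but A = 0.5, so the score drops only through the first term.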

In this aspect, the control unit 10 may identify specific displacement points (for example, points that take a local maximum) by processing such as taking a histogram (see FIG. 17), and apply a weight to the identified displacement points. According to this aspect, time deviations at the specific displacement points are emphasized, so the scoring comes closer to the listener's impression.

<Modification 7>
In the above-described embodiment, the control unit 10 scores the specific section using the scoring data, but the scoring may also be performed without the scoring data. In this case, for example, the evaluation process may use the difference between the beat times included in the music information (or the times that divide each beat interval into two or four) and the times of the displacement points identified from the user singing voice data (see FIG. 18). That is, the control unit 10 may acquire beat data indicating the beats of the music and evaluate the singing voice based on the time difference between the times indicated by the acquired beat data and the times of the pitch displacement points in the user singing voice. In this case, the control unit 10 may perform the evaluation process so that a larger deviation (that is, time difference) between a beat and a displacement point gives a lower evaluation. According to this aspect, the singing can be evaluated without using model sound data, which saves the labor of creating model sound data. The control unit 10 may also combine the evaluation process of this modification with that of the above-described embodiment. That is, the control unit 10 may perform an evaluation based on the time difference between the times of the pitch displacement points of the user singing voice and the times indicated by the beat data of the music, together with an evaluation based on the difference between the amount of change in the pitch of the user singing voice and that of the model sound data.
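The beat-based evaluation can be illustrated as follows. The helper names, the representation of beats as a list of times in seconds, and the use of the mean distance to the nearest grid time are assumptions for this sketch.

```python
def beat_grid(beat_times, subdivisions=1):
    """Beat times plus evenly spaced subdivision times between consecutive beats."""
    grid = []
    for start, end in zip(beat_times, beat_times[1:]):
        step = (end - start) / subdivisions
        grid.extend(start + k * step for k in range(subdivisions))
    if beat_times:
        grid.append(beat_times[-1])
    return grid


def mean_beat_offset(event_times, beat_times, subdivisions=1):
    """Mean distance from each displacement point to its nearest grid time."""
    grid = beat_grid(beat_times, subdivisions)
    if not grid or not event_times:
        return 0.0
    return sum(min(abs(t - g) for g in grid) for t in event_times) / len(event_times)
```

Setting `subdivisions` to 2 or 4 corresponds to dividing each beat interval into two or four; a larger mean offset would map to a lower evaluation.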

As another aspect that does not use the scoring data, the control unit 10 may, for example, perform statistical processing on the local maxima among the displacement points and give a higher score as their variance is smaller. That is, the control unit 10 may calculate the variation (deviation) in the pitch values of the peaks that appear in the graph representing the temporal change in the pitch of the sound indicated by the user singing voice data (see displacement points p91, p92, and p93 in FIG. 19), and perform the evaluation process so that a smaller calculated variation (deviation) gives a higher evaluation. The score may be calculated, for example, as the value obtained by subtracting the deviation from 100. In singing such as rap, the values at the displacement points where the pitch is at a local maximum tend to be more uniform for more skilled singers (see FIG. 19). Such an evaluation process therefore gives a higher evaluation to singing in which the values at these displacement points are uniform. According to this aspect, the singing can be evaluated without using model sound data, which saves the labor of creating model sound data.
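The "100 minus deviation" example might be realized as in the sketch below. Detecting peaks by comparing each sample with its neighbours, using the population standard deviation as the deviation, clamping the score at zero, and returning `None` when fewer than two peaks exist are all assumptions made here.

```python
from statistics import pstdev


def local_maxima(pitches):
    """Pitch values that are higher than the previous sample and not lower than the next."""
    return [b for a, b, c in zip(pitches, pitches[1:], pitches[2:]) if a < b >= c]


def peak_consistency_score(pitches):
    """100 minus the deviation of the peak pitch values (None if too few peaks)."""
    peaks = local_maxima(pitches)
    if len(peaks) < 2:
        return None
    return max(0.0, 100.0 - pstdev(peaks))
```

The result depends on the unit in which pitch is expressed, which the specification leaves open.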

As another aspect that does not use the scoring data, the control unit 10 may, for example, identify the rising portions of the volume of the user singing voice and evaluate the rhythm of the singing using the identified rising portions. In this case, for example, the control unit 10 may perform the evaluation process using the difference between the beat times included in the music information (or the times that divide each beat interval into two or four) and the times of the identified rising portions of the volume. The control unit 10 may perform the evaluation process so that a larger deviation (that is, time difference) between a beat and a rising portion gives a lower evaluation. According to this aspect, the evaluation can be performed without using the scoring data. In addition, the singing can be evaluated without using model sound data, which saves the labor of creating model sound data. The control unit 10 is not limited to the rising portions of the volume, and may evaluate the rhythm of the singing using the rising portions of the pitch. In this case, the control unit 10 may identify the rising portions of the pitch of the user singing voice and perform the evaluation process using the difference between the beat times included in the music information and the times of the identified rising portions of the pitch. The control unit 10 is likewise not limited to the rising portions of the pitch, and may evaluate the rhythm of the singing using the timing at which detection of the user singing voice starts. In this case, the control unit 10 may identify the timing at which detection of the user singing voice starts and perform the evaluation process using the time difference between the beat times included in the music information and the identified timing. According to this aspect, the singing can be evaluated without using model sound data, which saves the labor of creating model sound data.
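A minimal sketch of the volume-based rhythm evaluation is given below. Defining a rising portion as the sample at which the volume crosses a fixed threshold from below is an assumption, as are the function names; the specification does not say how rising portions are detected. `beat_times` is assumed to be non-empty.

```python
def volume_onsets(times, volumes, threshold):
    """Times at which the volume rises from below `threshold` to at or above it."""
    onsets = []
    for prev, cur, t in zip(volumes, volumes[1:], times[1:]):
        if prev < threshold <= cur:
            onsets.append(t)
    return onsets


def nearest_beat_offsets(onsets, beat_times):
    """Distance from each onset to the nearest beat time."""
    return [min(abs(t - b) for b in beat_times) for t in onsets]
```

The same offsets could be computed from pitch onsets or from the time at which the voice is first detected, as the paragraph above suggests.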

<Modification 8>
In the above-described embodiment, the control unit 10 acquires section data indicating the specific section. Instead, the control unit 10 may analyze the user singing voice data according to a predetermined algorithm and identify the specific section according to the analysis result. Specifically, for example, the control unit 10 may identify, as the specific section, a section in which the mode of pitch change satisfies a predetermined condition. In this case as well, as in the above-described embodiment, the control unit 10 may score the specific section using the scoring data. According to this aspect, the labor of describing the specific section in the song data in advance is saved.

<Modification 9>
In the above-described embodiment, the control unit 10 performs the scoring process using both the difference in the amount of pitch change and the difference in the amount of volume change, but the change in volume need not be taken into account in the scoring. That is, the evaluation value may be calculated based on the change in pitch alone. In this case, the scoring data need not include data representing volume displacement points.

In the above-described embodiment, the scoring data is stored in advance in the scoring data storage area 24. However, the configuration is not limited to this, and the control unit 10 may analyze data representing a model singing performance (hereinafter, "model singing data") and generate the scoring data from it.

In the above-described embodiment, the control unit 10 stores the singing voice data in the user singing voice data storage area 25 and performs the scoring after the singing has ended. However, the scoring process may instead be performed in real time during the singing.

In the above-described embodiment, the control unit 10 compares the pitch of the singing voice with the pitch of the GM data and performs the evaluation process according to the comparison result, but the evaluation process is not limited to this and may take other forms. For example, an evaluation value (that is, an evaluation result) may be calculated for each evaluation item using various known methods such as frequency analysis using an FFT (Fast Fourier Transform) or volume analysis.

In the above-described embodiment, the control unit 10 outputs the scoring result to the display unit 40. However, the output is not limited to this: data indicating the scoring result may be output to an externally connected storage device, or the scoring result may be output by transmitting it to a server device connected via a communication network. Also, in this embodiment, the user is notified of the scoring result by outputting it to the display unit 40, but the manner of notification is not limited to this. For example, the notification may be given by a voice message or a notification sound, and any means that notifies the user of the scoring result may be used.

In the above-described embodiment, the control unit 10 evaluates the singing voice of a singer, but it may instead evaluate the sound of a musical instrument played by a performer. The term "voice" in this embodiment covers various kinds of sound, such as sounds uttered by a person and the performance sounds of musical instruments.

In the above-described embodiment, the control unit 10 uses pitch and volume as the features of the sound, but the features are not limited to pitch and volume and may be other features. For example, the feature may be the fluctuation in the power of a specific harmonic, the ratio of the power of a specific harmonic to that of the fundamental, the ratio of the total power of the harmonic components to the power of the fundamental, the signal-to-noise ratio, or loudness (a value obtained by correcting the volume according to the frequency characteristics of hearing; also called "A-weighted sound pressure level" or "sound level"; defined in JIS C1509). Any quantity that represents a feature of the sound may be used.

In the above-described embodiment, the control unit 10 evaluates the user singing voice data based on the difference between the amount of change in the times of the displacement points of the user singing voice data and the amount of change in the times of the displacement points of the model sound data. However, the evaluation is not limited to this, and a higher evaluation may be given when the absolute times of the displacement points match. That is, the control unit 10 may identify the difference between the time of a displacement point of the user singing voice data and the time of the corresponding displacement point of the model sound data, and evaluate the sound indicated by the user singing voice data based on the identified difference. The control unit 10 is an example of the time displacement specifying unit according to the present invention.

<Modification 10>
In the above-described embodiment, the control unit 10 performs evaluation in the specific section based on the difference in the amount of change in the sound feature of the user singing voice data, whereas in sections other than the specific section it performs evaluation based on the difference between the sound feature of the user singing voice data and the sound feature indicated by the GM data. The configuration is not limited to this. In the specific section, the control unit 10 may either perform evaluation based on the difference in the amount of change in the sound feature or perform conventional pitch-centered singing evaluation (that is, evaluation based on the difference between the sound feature of the user singing voice data and the sound feature indicated by the GM data).

<Modification 11>
In the above-described embodiment, two or more devices connected via a communication network may share the functions of the karaoke device 100, so that a system comprising those devices implements the karaoke device 100 of the embodiment. For example, the system may be configured such that a computer device equipped with a microphone, a speaker, a display device, an operation unit, and the like is connected via a communication network to a server device that executes the scoring process. In this case, for example, the computer device may convert the sound picked up by the microphone into an audio signal and transmit it to the server device, and the server device may analyze and score the received audio signal and transmit the scoring result to the computer device. According to this aspect, the processing load on the karaoke terminal is reduced, and statistical processing on the server becomes possible.

<Modification 12>
In addition to an evaluation device, the present invention can also be understood as a method for realizing such a device, and as a program for causing a computer to realize the voice evaluation function. Such a program may be provided in the form of a recording medium, such as an optical disc, on which it is stored, or may be provided by having it downloaded to a computer via the Internet or the like and then installed and used. According to this aspect, the service according to the above-described embodiment can be provided on a home PC (Personal Computer), a mobile terminal (including a smartphone), or the like.

DESCRIPTION OF SYMBOLS
10 ... control unit, 20 ... storage unit, 21 ... accompaniment data storage area, 22 ... video data storage area, 23 ... GM data storage area, 24 ... scoring data storage area, 25 ... user singing voice data storage area, 30 ... operation unit, 40 ... display unit, 50 ... communication control unit, 60 ... audio processing unit, 61 ... microphone, 62 ... speaker, 70 ... bus, 100 ... karaoke device, 200 ... server device, 210 ... network storage, 300 ... user singing voice curve, 400 ... scoring function, 500 ... GM curve

Claims

A voice evaluation device comprising:
a voice acquisition unit that acquires voice data indicating a voice waveform;
a model sound acquisition unit that acquires model sound data indicating a feature of a model sound;
a feature specifying unit that specifies a feature of the sound indicated by the voice data acquired by the voice acquisition unit;
a difference specifying unit that specifies the difference between the amount of change in the feature specified by the feature specifying unit and the amount of change in the feature of the sound indicated by the model sound data; and
an evaluation unit that evaluates the sound indicated by the voice data based on the difference specified by the difference specifying unit and outputs an evaluation result.

The voice evaluation device according to claim 1, wherein, with a displacement point defined as a point at which the slope of a graph representing the temporal change in a feature of a sound changes by a predetermined threshold or more, the device further comprises:
a displacement point specifying unit that specifies displacement points of the feature specified by the feature specifying unit; and
an associating unit that associates a displacement point of the feature indicated by the model sound data with a displacement point of the voice data, specified by the displacement point specifying unit, that appears within a predetermined time difference from the time of that displacement point,
and wherein, in specifying the difference, the difference specifying unit specifies the difference between the amount of change in the feature at the displacement point of the voice data associated by the associating unit and the amount of change in the feature at the corresponding displacement point of the model sound data.

The voice evaluation device according to claim 2, further comprising a time displacement specifying unit that specifies the difference between the time of the displacement point of the voice data and the time of the displacement point of the model sound data,
wherein the evaluation unit evaluates the sound indicated by the voice data based on the difference specified by the difference specifying unit and the difference specified by the time displacement specifying unit, and outputs an evaluation result.

The voice evaluation device according to any one of claims 1 to 3, wherein the evaluation unit outputs the same evaluation result when the difference specified by the difference specifying unit is within a predetermined threshold, and otherwise outputs an evaluation result indicating a lower evaluation as the specified difference becomes larger.

The voice evaluation device according to claim 1, further comprising a section data acquisition unit that acquires section data representing a section of the voice data in which singing is performed in a predetermined singing manner,
wherein the evaluation unit performs evaluation based on the difference specified by the difference specifying unit in the section indicated by the section data acquired by the section data acquisition unit, and, in sections other than the section indicated by the section data, performs evaluation based on the difference between the feature specified by the feature specifying unit and the feature of the sound indicated by the model sound data.

The voice evaluation device according to claim 1, further comprising a section specifying unit that analyzes the voice data according to a predetermined algorithm and, according to the analysis result, specifies a section in which singing is performed in a predetermined singing manner,
wherein the evaluation unit performs evaluation based on the difference specified by the difference specifying unit in the section specified by the section specifying unit, and, in sections other than the specified section, performs evaluation based on the difference between the feature specified by the feature specifying unit and the feature of the sound indicated by the model sound data.