JP2013195738A

JP2013195738A - Singing evaluation device

Info

Publication number: JP2013195738A
Application number: JP2012063263A
Authority: JP
Inventors: Shuichi Matsumoto (松本 秀一); Tatsuya Terajima (寺島 辰弥)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-03-21
Filing date: 2012-03-21
Publication date: 2013-09-30
Anticipated expiration: 2032-03-21
Also published as: JP5919928B2

Abstract

PROBLEM TO BE SOLVED: To evaluate the sense of rhythm in a user's singing while taking beats into consideration. SOLUTION: When a user specifies a song (Sa1), a control unit acquires, from a server device, guide melody data and musical sound data representing the accompaniment of the specified song (Sa2). The control unit creates an evaluation table based on the acquired guide melody data (Sa3) and classifies each reference component sound of the guide melody data into a beat-related type (Sa4). When the song is played back (Sa5), a voice signal representing the user's singing is acquired (Sa6). The control unit identifies the pitch of the user's singing from the voice signal (Sa7), identifies the time at which the user started singing each reference component sound (Sa8), and evaluates the singing for each reference component sound (Sa9). The control unit then calculates the average score for each beat-related type (Sa10) and evaluates the overall sense of rhythm (Sa11).

Description

The present invention relates to a technique for evaluating the sense of rhythm in a user's singing.

Karaoke apparatuses that evaluate a user's sense of rhythm are widely known. For example, Patent Document 1 discloses a technique for evaluating the sense of rhythm of a user's singing relative to a guide melody.

Patent Document 1: JP 2004-102148 A

The technique described in Patent Document 1 evaluates the sense of rhythm of the user's singing uniformly for every sound constituting the guide melody. However, many recent popular songs of the kind called J-POP use syncopation, anticipation, and similar devices in their rhythm, so the relationship between the singing and the beat determines how rhythmic the singing sounds. In addition, such songs often pack many lyric characters into a unit of time, so the singer must keep good timing not only on the on-beats but also on the off-beats.

The present invention has been made in view of the above background, and its object is to evaluate the sense of rhythm in a user's singing while taking beats into consideration.

To solve the above problem, the present invention provides a singing evaluation device comprising: model sound data acquisition means for acquiring model sound data that represents, for each of a plurality of reference component sounds constituting a model sound that a user refers to when performing a piece of music, the pitch and the sounding start time of that reference component sound; classification means for classifying each of the reference component sounds whose pitch and sounding start time are represented by the acquired model sound data into one of a plurality of beat-related types determined according to a predetermined rule; performance sound signal acquisition means for acquiring a performance sound signal representing a performance sound, that is, a sound generated by the user's performance; pitch specifying means for specifying the pitch of the user's performance sound from the acquired performance sound signal; sounding start time specifying means for specifying, according to a predetermined rule and for each of the reference component sounds, the sounding start time at which the user performed that reference component sound; component sound evaluation data generation means for generating, for each of the reference component sounds, component sound evaluation data indicating a higher evaluation the smaller the difference between the sounding start time of that reference component sound and the sounding start time of the user's performance specified for it by the sounding start time specifying means; and per-beat-type evaluation data generation means for generating, for each beat-related type, per-beat-type evaluation data indicating an evaluation for that type, based on the component sound evaluation data generated for the one or more reference component sounds classified into that type by the classification means.

In a preferred aspect, for each of the reference component sounds whose pitch and sounding start time are represented by the model sound data, the sounding start time specifying means specifies, as the sounding start time at which that reference component sound was performed, the time at which the pitch of the user's performance sound specified by the pitch specifying means enters a pitch range determined with reference to the pitch of that reference component sound, within a period determined with reference to its sounding start time.
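The rule in this aspect can be illustrated with a minimal Python sketch. The function name, the representation of the pitch track as time-ordered `(time, pitch)` samples, and the default window and tolerance values are all assumptions made for illustration; the document does not specify them.

```python
from typing import Optional, Sequence, Tuple


def find_singing_start(
    pitch_track: Sequence[Tuple[float, Optional[float]]],
    ref_onset: float,
    ref_pitch: float,
    window: Tuple[float, float] = (-0.3, 0.3),  # assumed search period (s) around the reference onset
    tolerance: float = 1.0,                     # assumed pitch range (semitones) around the reference pitch
) -> Optional[float]:
    """Return the first time inside the search period at which the sung pitch
    falls inside the allowed pitch range, or None if that never happens.

    pitch_track holds (time_in_seconds, pitch_in_semitones) samples sorted by
    time; pitch is None for unvoiced or silent samples.
    """
    start = ref_onset + window[0]
    end = ref_onset + window[1]
    for time, pitch in pitch_track:
        if pitch is None or time < start:
            continue
        if time > end:
            break
        if abs(pitch - ref_pitch) <= tolerance:
            return time
    return None


# Hypothetical pitch track: the singer slides up to the target pitch (60).
track = [(3.8, None), (3.9, 57.0), (4.05, 59.6), (4.1, 60.0)]
start_time = find_singing_start(track, ref_onset=4.0, ref_pitch=60.0)
```

Returning `None` when no sample qualifies lets a caller treat the reference component sound as not sung, which is one possible design choice rather than something the document prescribes.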

In another preferred aspect, the model sound data includes, for each of one or more of the reference component sounds, timing data indicating the timing at which that reference component sound is to be sounded, and the classification means performs the classification based on the timing data.

In another preferred aspect, the model sound data includes data specifying the start time of each of the measures obtained by dividing the performance period of the piece of music into a plurality of sections, and data specifying the time signature of each of those measures, and the classification means performs the classification based on both sets of data.
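One way such a measure-based classification could work is sketched below in Python. It assumes that the end of a measure is known (for example, from the start time of the next measure), that beats are evenly spaced within the measure, and that a small timing tolerance is acceptable; the function name and the labels "on-beat" (表拍), "off-beat" (裏拍), and "other" are illustrative choices.

```python
def classify_by_measure(onset, measure_start, measure_end, beats_per_measure, tol=1e-3):
    """Classify a note onset using the measure boundaries and the number of
    beats per measure (the numerator of the time signature).

    Times are in seconds. tol is an assumed tolerance for timing jitter.
    """
    beat_len = (measure_end - measure_start) / beats_per_measure
    half_beat = beat_len / 2.0
    # Index of the nearest half-beat grid point counted from the measure start.
    k = round((onset - measure_start) / half_beat)
    if abs(onset - (measure_start + k * half_beat)) > tol:
        return "other"
    return "on-beat" if k % 2 == 0 else "off-beat"


# Hypothetical 4/4 measure running from 4.0 s to 6.0 s (0.5 s per beat).
labels = [classify_by_measure(t, 4.0, 6.0, 4) for t in (4.5, 4.75, 4.6)]
```

Because the grid is derived from the time signature, the same function covers other meters such as 3/4 without modification.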

In another preferred aspect, the model sound data includes, for each of one or more of the reference component sounds, a coefficient by which the evaluation score for that reference component sound is to be multiplied, and the component sound evaluation data generation means generates component sound evaluation data indicating the score obtained by multiplying, by that coefficient, an evaluation score determined from the difference between the sounding start time of the reference component sound and the sounding start time of the user's performance.
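The following Python sketch illustrates a per-note score scaled by a coefficient. The threshold values and point values in `base_score` are hypothetical placeholders (the actual evaluation criteria are not given in this passage), as are the function names and the choice to score an unsung note as zero.

```python
def base_score(diff_seconds):
    """Map a timing difference to a score; smaller differences score higher.
    The thresholds and point values below are hypothetical placeholders."""
    abs_diff = abs(diff_seconds)
    if abs_diff <= 0.05:
        return 100
    if abs_diff <= 0.15:
        return 70
    if abs_diff <= 0.30:
        return 40
    return 0


def note_score(ref_onset, sung_onset, coefficient=1.0):
    """Score one reference component sound and apply its coefficient.
    A note with no detected singing start (None) scores 0 (an assumption)."""
    if sung_onset is None:
        return 0.0
    return base_score(sung_onset - ref_onset) * coefficient


# A note weighted 1.5x that was sung 20 ms late.
score = note_score(ref_onset=4.0, sung_onset=4.02, coefficient=1.5)
```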

In another preferred aspect, the per-beat-type evaluation data generated by the per-beat-type evaluation data generation means expresses the evaluation for each beat-related type as a numerical value, and the device further comprises calculation means for multiplying each of those values by a ratio coefficient predetermined for its type and calculating the sum of the products.
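As a minimal Python sketch of this weighted sum: the ratio values and type labels below are hypothetical, chosen only to show the arithmetic, and the function name is an assumption.

```python
def overall_rhythm_score(per_type_scores, ratios):
    """Multiply each beat type's score by its predetermined ratio and sum the
    products. Types that have no score are skipped (an assumption)."""
    return sum(ratios[beat_type] * value for beat_type, value in per_type_scores.items())


# Hypothetical ratios: on-beats count for half of the overall score.
ratios = {"on-beat": 0.5, "off-beat": 0.25, "other": 0.25}
per_type_scores = {"on-beat": 80, "off-beat": 60, "other": 40}
total = overall_rhythm_score(per_type_scores, ratios)
```

With these numbers the result is 0.5 x 80 + 0.25 x 60 + 0.25 x 40 = 65.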

According to the present invention, the sense of rhythm in a user's singing can be evaluated while taking beats into consideration.

FIG. 1 is a diagram showing the configuration of the singing evaluation system.
FIG. 2 is a block diagram showing the hardware configuration of the karaoke apparatus.
FIG. 3 is a schematic diagram showing the classification of beat-related types.
FIG. 4 is a diagram showing an example of guide melody data.
FIG. 5 is a diagram showing an example of the evaluation table.
FIG. 6 is a block diagram showing the functional configuration of the karaoke apparatus.
FIG. 7 is a processing flow diagram of the karaoke apparatus.
FIG. 8 is a schematic diagram for explaining detection of the user's singing.
FIG. 9 is a diagram showing an example of the evaluation criteria table.
FIG. 10 is a diagram showing an example of the evaluation table after evaluation has been performed.
FIG. 11 is a diagram showing an example of the frequency distribution table.
FIG. 12 is a diagram showing an example of the evaluation comment table.
FIG. 13 is a diagram showing an example of guide melody data according to Modification 1.
FIG. 14 is a diagram showing an example of guide melody data according to Modification 2.
FIG. 15 is a diagram showing an example of guide melody data according to Modification 3.
FIG. 16 is a diagram showing an example of the evaluation comment table according to Modification 4.

<Embodiment>
<Configuration>
FIG. 1 is a diagram showing the configuration of a singing evaluation system 100 according to an embodiment of the present invention. The singing evaluation system 100 comprises a karaoke apparatus 10 and a server device 20. The server device 20 stores musical sound data representing the accompaniment of each piece of music, together with information on the guide melody that the user refers to when singing that piece (hereinafter, guide melody data), and supplies both to the karaoke apparatus 10. When the user specifies on the karaoke apparatus 10 a piece of music to sing, the karaoke apparatus 10 acquires the musical sound data and the guide melody data for that piece from the server device 20. When the user sings along with the accompaniment generated from the musical sound data, the karaoke apparatus 10 evaluates the sense of rhythm in the user's singing based on the voice signal representing the singing and on the guide melody data.

FIG. 2 is a block diagram showing the hardware configuration of the karaoke apparatus 10. The karaoke apparatus 10 includes a control unit 11, a storage unit 12, a communication unit 13, a UI (User Interface) unit 14, a voice input unit 15, and a voice output unit 16. The control unit 11 is an example of the singing evaluation device according to the present invention; it includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory), and controls the karaoke apparatus 10 as a whole. The CPU performs various kinds of control by loading programs stored in the ROM or the storage unit 12 into the RAM and executing them. The storage unit 12 is a mass storage device such as a hard disk, and stores various programs, the musical sound data and guide melody data described above, and the evaluation table, thresholds, and other data described later. The communication unit 13 communicates with the server device 20 over a wired or wireless connection. The UI unit 14 has, for example, a touch screen and keys; it accepts user operations and displays images corresponding to those operations on the touch screen. The voice input unit 15 has a sound pickup means such as a microphone, acquires a voice signal representing the user's voice input through the microphone, and supplies it to the control unit 11. The voice output unit 16 has a sound emitting means such as a speaker; it converts the musical sound data representing the accompaniment and the voice signal representing the user's voice from digital to analog and outputs them from the speaker. When the user specifies a piece of music to sing using the UI unit 14, the communication unit 13, under the control of the control unit 11, notifies the server device 20 of the specified piece and receives the musical sound data and guide melody data that the server device 20 transmits in response. The control unit 11 stores the acquired musical sound data and guide melody data in the storage unit 12.

The guide melody data described above records, for each component sound of the guide melody that the user refers to when singing, information such as its pitch and sounding start time. This per-sound information describes the sounds that serve as the model for the user's singing. The guide melody data is an example of the model sound data according to the present invention. The control unit 11 classifies each component sound indicated by the guide melody data into one of a plurality of beat-related types determined in advance according to a predetermined rule, and evaluates the sense of rhythm in the user's singing for each of those types. The classification into beat-related types is described below with reference to FIG. 3.
FIG. 3 is a schematic diagram showing the classification of beat-related types, using one measure in 4/4 time as an example. Because the time signature is 4/4, the notes on the first through fourth beats are "on-beats," the notes midway between adjacent on-beats are "off-beats," and all remaining notes are "other." In other words, when one measure is divided into eight equal time sections, the start of each odd-numbered section corresponds to an on-beat and the start of each even-numbered section corresponds to an off-beat. For the triplet shown in the figure, the first note falls on an on-beat, but the remaining two notes are neither on-beats nor off-beats and are therefore "other."
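The eight-section rule for a 4/4 measure can be sketched in Python as follows. The tick resolution (480 ticks per quarter note, so 1920 ticks per measure) and the function name are assumptions made so that positions can be compared exactly as integers; the labels stand for on-beat (表拍), off-beat (裏拍), and everything else.

```python
TICKS_PER_MEASURE = 1920  # assumed resolution: 480 ticks per quarter note in 4/4


def classify_position(tick_in_measure, ticks_per_measure=TICKS_PER_MEASURE):
    """Classify a note by its start position inside one 4/4 measure.

    The measure is split into eight equal sections. A note that starts an
    odd-numbered section (1st, 3rd, ...) is an on-beat, a note that starts an
    even-numbered section is an off-beat, and any other position is "other".
    """
    section_len = ticks_per_measure // 8
    if tick_in_measure % section_len != 0:
        return "other"
    section_index = tick_in_measure // section_len  # 0-based, so even index = odd-numbered section
    return "on-beat" if section_index % 2 == 0 else "off-beat"


# A quarter-note triplet figure starting on beat 1: positions 0, 160, 320.
triplet = [classify_position(t) for t in (0, 160, 320)]
```

The triplet example reproduces the behavior described for the figure: the first note is an on-beat and the other two fall into "other".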

In this embodiment, beat-related types are classified into three: "on-beat," "off-beat," and "other." The "other" type is a concept that covers the two notes of a triplet other than its first note, as well as notes corresponding to syncopation, anticipation, shuffle, and the like. The terms "strong beat," "weak beat," "syncopation," "anticipation," "triplet," and "shuffle," which have not been defined above, are defined as follows. A "strong beat" is the note at the start of the first beat in a given meter, and a "weak beat" is any note other than the strong beat. In 4/4 time, for example, the note on the first beat is the strong beat and the notes on the second through fourth beats are weak beats. "Syncopation" is a rhythm that arises when a sound sustains from a weaker beat into a stronger one. For example, when a tie joins a weak beat of a measure to the next strong beat within that measure so that they sound as a single note, the result is called a syncopated rhythm. In this embodiment, notes tied in this way are classified as "other" under the beat-related types and as "syncopation" under a finer classification. "Anticipation" is syncopation across a bar line. For example, when a tie joins a weak beat of one measure, across the bar line, to the strong beat at the start of the next measure so that they sound as a single note, the result is called an anticipation rhythm. In this embodiment, notes tied in this way are classified as "other" under the beat-related types and as "anticipation" under a finer classification. "Triplet" refers to the notes obtained by dividing a reference note, such as a quarter note, into three equal parts. In this embodiment, the notes of a triplet that cannot be classified as "on-beat" or "off-beat" are classified as "other" under the beat-related types and as "triplet" under a finer classification. "Shuffle" is a rhythm in which, of two consecutive notes, the first is lengthened and the second is shortened. A shuffle rhythm is equivalent to a triplet whose middle note is replaced by a rest. In this embodiment, the second of such two consecutive notes is classified as "other" under the beat-related types and as "shuffle" under a finer classification.
Next, the guide melody data will be described with reference to the drawings.

FIG. 4 is a diagram showing an example of guide melody data. The guide melody data describes, in chronological order, information on each component sound of the guide melody that the user refers to when singing. It consists of a plurality of items: "No.", "pitch", "sounding start time", "sounding end time", and "beat type". Each guide melody data entry corresponds to one component sound of the guide melody. Hereinafter, a component sound of the guide melody indicated by the guide melody data is called a reference component sound. "No." is an identifier that uniquely identifies each guide melody data entry; numbers of, for example, four digits are assigned in ascending order from the earliest sounding start time. "Pitch" is the pitch of the reference component sound. "Sounding start time" is the time at which sounding of the reference component sound starts, measured from the start of the piece of music. "Sounding end time" is the time at which sounding of the reference component sound ends, measured from the start of the piece. "Beat type" indicates the beat-related type of the reference component sound. In the example in the figure, the "beat type" of each reference component sound that falls on an on-beat, which serves as the rhythmic reference as the piece progresses, states that it is an on-beat. Because the "beat type" identifies a particular beat as the timing at which the reference component sound should be sounded, it indicates the timing at which that sound is to be sounded. The "beat type" is an example of the timing data according to the present invention. Next, the evaluation table mentioned above will be described with reference to the drawings.
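As an illustration of this record layout, the Python sketch below models one guide melody entry and assigns four-digit "No." identifiers in ascending order of sounding start time. The field names, the use of seconds for times, the note-name strings for pitch, and the sample values are all assumptions; the actual encoding shown in the figure is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GuideNote:
    no: str                          # four-digit identifier, e.g. "0001"
    pitch: str                       # pitch of the reference component sound (format assumed)
    start: float                     # sounding start time, seconds from the start of the song (assumed unit)
    end: float                       # sounding end time, seconds from the start of the song (assumed unit)
    beat_type: Optional[str] = None  # "on-beat" when pre-labelled, otherwise None


def number_notes(raw: List[Tuple[str, float, float, Optional[str]]]) -> List[GuideNote]:
    """Sort raw (pitch, start, end, beat_type) tuples by start time and assign
    four-digit identifiers in ascending order."""
    ordered = sorted(raw, key=lambda item: item[1])
    return [
        GuideNote(f"{index:04d}", pitch, start, end, beat_type)
        for index, (pitch, start, end, beat_type) in enumerate(ordered, start=1)
    ]


# Hypothetical entries, deliberately given out of order.
notes = number_notes([
    ("E4", 5.0, 5.8, None),
    ("C4", 4.0, 4.8, "on-beat"),
])
```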

FIG. 5 is a diagram showing an example of the evaluation table. The evaluation table associates the guide melody data with information on the user's singing for each corresponding reference component sound, and records a rhythm evaluation score for each reference component sound. Each time the user specifies a piece of music to sing using the UI unit 14, the control unit 11 creates an evaluation table based on the guide melody data of that piece and stores it in the storage unit 12. FIG. 5 shows the evaluation table immediately after creation. The evaluation table consists of a plurality of items: "No.", "pitch", "sounding start time", "sounding end time", "beat classification information", "singing start time", and "evaluation score". "No.", "pitch", "sounding start time", and "sounding end time" are the same items as in the guide melody data. Because the control unit 11 creates the evaluation table from the guide melody data, these four items contain the same values as the guide melody data from which the table was created. "Beat classification information" is the result of classifying each reference component sound by beat-related type. For each reference component sound whose "beat type" in the guide melody data is on-beat, the control unit 11 sets on-beat in the "beat classification information" of the evaluation table. In the newly created table, the "beat classification information" of all other reference component sounds is blank. "Singing start time" is the time at which the user is deemed to have started singing the reference component sound. "Evaluation score" is the score for the sense of rhythm in the user's singing of the reference component sound. In the newly created table, "singing start time" and "evaluation score" are blank.
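The creation of the initial evaluation table can be sketched in Python as follows. The dictionary keys and sample values are illustrative assumptions; blank cells are represented by `None`.

```python
def build_evaluation_table(guide_notes):
    """Create the initial evaluation table from guide melody entries.

    The first four items are copied unchanged. Only entries pre-labelled as
    on-beat get a beat classification at this point; every other cell that is
    filled in later starts out as None (blank).
    """
    table = []
    for note in guide_notes:
        table.append({
            "no": note["no"],
            "pitch": note["pitch"],
            "start": note["start"],
            "end": note["end"],
            "beat_class": "on-beat" if note.get("beat_type") == "on-beat" else None,
            "singing_start": None,  # filled in once the user's singing has been analysed
            "score": None,          # filled in when the note is evaluated
        })
    return table


# Hypothetical guide melody entries.
guide = [
    {"no": "0001", "pitch": "C4", "start": 4.0, "end": 4.8, "beat_type": "on-beat"},
    {"no": "0002", "pitch": "E4", "start": 5.0, "end": 5.8, "beat_type": None},
]
table = build_evaluation_table(guide)
```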

FIG. 6 is a block diagram showing the functional configuration of the karaoke apparatus 10. A model sound data acquisition unit 111, a classification unit 112, a performance sound signal acquisition unit 113, a pitch specifying unit 114, a sounding start time specifying unit 115, a component sound evaluation data generation unit 116, and a per-beat-type evaluation data generation unit 117 are implemented by the control unit 11. The model sound data acquisition unit 111 acquires model sound data that represents, for each of a plurality of reference component sounds constituting the model sound that the user refers to when performing a piece of music, the pitch and the sounding start time of that reference component sound. The classification unit 112 classifies each of the reference component sounds represented by the acquired model sound data into one of a plurality of beat-related types determined in advance according to a predetermined rule. The performance sound signal acquisition unit 113 acquires a performance sound signal representing the performance sound, that is, the sound generated by the user's performance. The pitch specifying unit 114 specifies the pitch of the user's performance sound from the acquired performance sound signal. For each of the reference component sounds represented by the model sound data, the sounding start time specifying unit 115 specifies, as the sounding start time at which that reference component sound was performed, the time at which the pitch specified by the pitch specifying unit 114 enters a pitch range determined according to a predetermined rule with reference to the pitch of that reference component sound, within a period determined according to a predetermined rule with reference to its sounding start time. For each of the reference component sounds, the component sound evaluation data generation unit 116 generates, as the component sound evaluation data for that sound, evaluation data indicating a higher evaluation the smaller the difference between the sounding start time of the reference component sound and the sounding start time of the user's performance specified for it by the sounding start time specifying unit 115. For each beat-related type, the per-beat-type evaluation data generation unit 117 generates per-beat-type evaluation data indicating the evaluation for that type, based on the component sound evaluation data generated by the component sound evaluation data generation unit 116 for the one or more reference component sounds that the classification unit 112 classified into that type.
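The per-beat-type aggregation can be illustrated with the short Python sketch below. Averaging follows the abstract, which mentions average points per beat-related type; the function name, the input format, and the decision to skip notes that have no score are assumptions.

```python
from collections import defaultdict


def average_by_beat_type(rows):
    """Average the per-note scores separately for each beat-related type.

    rows is an iterable of (beat_type, score) pairs; pairs whose score is
    None (note not evaluated) are skipped, which is an assumption.
    """
    totals = defaultdict(lambda: [0.0, 0])  # beat_type -> [sum of scores, count]
    for beat_type, score in rows:
        if score is None:
            continue
        totals[beat_type][0] += score
        totals[beat_type][1] += 1
    return {beat_type: total / count for beat_type, (total, count) in totals.items()}


# Hypothetical per-note results.
averages = average_by_beat_type([
    ("on-beat", 100), ("on-beat", 80), ("off-beat", 50), ("other", None),
])
```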

<Operation>
Next, the operation of this embodiment will be described.
FIG. 7 is a processing flow diagram of the karaoke apparatus 10. First, when the user specifies a piece of music to sing using the UI unit 14, the control unit 11 accepts the specification (step Sa1). Next, the control unit 11 uses the communication unit 13 to acquire, from the server device 20, the musical sound data representing the accompaniment of the specified piece and the guide melody data of the specified piece, and stores them in the storage unit 12 (step Sa2). The control unit 11 creates an evaluation table based on the acquired guide melody data and stores it in the storage unit 12 (step Sa3). Next, the control unit 11 classifies each reference component sound in the acquired guide melody data by beat-related type (step Sa4). The processing in step Sa4 is described in detail below.

As described above, the guide melody data describes the sound generation start time of each reference component sound, and each reference component sound that falls on a front beat is associated with "front beat" as its "beat type". In step Sa4, for a reference component sound C lying between a front-beat reference component sound A and the next front-beat reference component sound B, the control unit 11 classifies the beat type of C based on the time difference between the sound generation start time of C and those of A and B. For example, suppose that the sound generation start time of A is "00:04:00", that of B is "00:06:00", and that of C is "00:05:00". In this case, the sound generation start time of C lies exactly midway between those of the front-beat sounds A and B, so the control unit 11 classifies the beat of C as a back beat. In short, taking the interval between the sound generation start times of consecutive front-beat reference component sounds as the yardstick, a beat is classified based on the time difference between the sound generation start time of a front-beat reference component sound and that of an unclassified reference component sound. The control unit 11 classifies the beat-related type of each reference component sound as "front beat", "back beat", or "other", and writes the result in the beat classification information of the evaluation table stored in the storage unit 22. In this way, following the rule described above, the control unit 11 classifies each reference component sound into one of a plurality of predetermined beat-related types.
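The classification rule of step Sa4 can be sketched roughly as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `classify_beats`, the representation of times as seconds, and the `tolerance` parameter (how close to the midpoint still counts as a back beat) are all assumptions that the description does not specify.

```python
def classify_beats(onsets, beat_types, tolerance=0.05):
    """Classify each note as "front", "back", or "other".

    onsets     -- sound generation start times in seconds (assumed unit)
    beat_types -- "front" for notes already marked as front beats, else None
    tolerance  -- hypothetical allowance around the exact midpoint,
                  as a fraction of the front-to-front interval
    """
    front_times = [t for t, b in zip(onsets, beat_types) if b == "front"]
    result = []
    for t, b in zip(onsets, beat_types):
        if b == "front":
            result.append("front")
            continue
        # Nearest front beats before (A) and after (B) this note (C).
        prev_front = max((f for f in front_times if f < t), default=None)
        next_front = min((f for f in front_times if f > t), default=None)
        if prev_front is None or next_front is None:
            # Assumption: a note not enclosed by two front beats is "other".
            result.append("other")
            continue
        # Relative position of C between A and B (0.5 is the exact midpoint).
        position = (t - prev_front) / (next_front - prev_front)
        result.append("back" if abs(position - 0.5) <= tolerance else "other")
    return result
```

With the example above read as 4, 5 and 6 seconds (an assumption about the time format), the note at 5 seconds sits at the midpoint and is classified as a back beat.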

Once the processing of step Sa4 has started, the control unit 11 reproduces the song based on the acquired musical tone data and guide melody data and supplies the reproduction signal to the audio output unit 16, which emits the accompaniment sound and the guide melody of the song from a speaker (step Sa5). When the user starts singing into a microphone, the audio input unit 15 acquires the audio signal representing the user's voice supplied from the microphone and stores it in the storage unit 22 in the order acquired (step Sa6). Next, the control unit 11 performs frequency analysis on the acquired audio signal to identify the pitch of the user's voice (step Sa7). Any well-known frequency analysis method may be used. Next, the control unit 11 identifies the user's singing timing for each reference component sound, specifically the time at which the user started singing it (step Sa8). Step Sa8 is described below with reference to the drawings.

FIG. 8 is a schematic diagram for explaining detection of the user's singing. In FIG. 8, the horizontal axis represents time, with time advancing toward the right. The vertical axis represents pitch, with pitch rising toward the top. The three rectangles represent the reference component sounds RS0, RS1, and RS2 indicated by the guide melody data. The curve P is the pitch trajectory, that is, the locus of the pitch of the singing. The pitch of RS0 is "F", that of RS1 is "G", and that of RS2 is "C". Each reference component sound has a pitch detection range and a detection period. Here, as an example, the pitch detection range extends one semitone above and one semitone below the pitch of the reference component sound. That is, the pitch detection range of RS0 is from "F#" to "E", that of RS1 is from "G#" to "F#", and that of RS2 is from "C#" to "B". In FIG. 5, the upper and lower limits of the pitch detection range are indicated by broken lines. Also as an example, the detection period here is "two seconds before and after the sound generation start time of the reference component sound". The pitch detection range and the detection period are stored in the storage unit 22, but the user may be allowed to change them freely within an appropriate range using the UI unit 14.

In step Sa8, the control unit 11 identifies the time at which, within the detection period referenced to the sound generation start time of a given reference component sound, the pitch of the user's singing enters the pitch detection range referenced to the pitch of that reference component sound. This time serves as the reference time from which the singing start time for that reference component sound is derived. In the case of FIG. 8, for example, the pitch of the user's singing enters the pitch detection range of RS1 at time t1, which falls within the detection period, so the control unit 11 identifies the reference time for RS1 as "t1". The control unit 11 then traces the pitch trajectory P back from the reference time over a predetermined period, and identifies the timing at which it determines that the pitch trajectory P matches one of several predetermined states as the singing start time "tx1" of RS1. The predetermined period is, for example, "100 milliseconds", and is stored in advance in the storage unit 12 as a parameter.
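The search for the reference time can be sketched as below. This is only an illustration under stated assumptions: pitches are expressed as MIDI-style semitone numbers, the pitch trajectory is a list of `(time_in_seconds, pitch_or_None)` samples, and the function name `find_reference_time` is hypothetical. The default range of one semitone and period of two seconds follow the example values in the description.

```python
def find_reference_time(track, ref_onset, ref_pitch,
                        pitch_range=1.0, period=2.0):
    """Return the first sample time at which the sung pitch lies inside
    the pitch detection range during the detection period, or None.

    track       -- list of (time_s, pitch) samples in time order;
                   pitch is None where no voice was detected (assumed format)
    ref_onset   -- sound generation start time of the reference component sound
    ref_pitch   -- pitch of the reference component sound, in semitones
    pitch_range -- half-width of the pitch detection range, in semitones
    period      -- half-width of the detection period, in seconds
    """
    for time_s, pitch in track:
        if pitch is None:
            continue  # unvoiced sample, nothing to compare
        in_period = abs(time_s - ref_onset) <= period
        in_range = abs(pitch - ref_pitch) <= pitch_range
        if in_period and in_range:
            return time_s
    return None
```

The returned time corresponds to the reference time (t1 in FIG. 8); tracing back from it to the actual singing start time is a separate step and is not covered by this sketch.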

Next, the predetermined states mentioned above are described. There are three patterns, and when the control unit 11 determines that the pitch trajectory matches any of them, it identifies that timing as the singing start time. The first pattern is when the trajectory breaks off as the control unit 11 traces the pitch trajectory P back in time. This state indicates that singing started at the point where the pitch trajectory P breaks off, so the control unit 11 identifies that point as the singing start time of the reference component sound. The second pattern is when, as the control unit 11 traces the pitch trajectory P back in time, the angle of inclination of the curve of P with respect to the time axis becomes gentler than a predetermined threshold. This threshold is, for example, "30 degrees", and is stored in advance in the storage unit 12 as a parameter. This case indicates that there is a certain difference in pitch between the reference component sound for which the reference time was identified and the one immediately before it. In other words, the user is singing a reference component sound whose pitch differs from that of the immediately preceding one, so the control unit 11 identifies this timing as the singing start time of the reference component sound for which the reference time was identified. The third pattern is when, as the control unit 11 traces the pitch trajectory P back in time, the angle of inclination of the curve of P with respect to the time axis becomes larger than a predetermined threshold. This threshold is, for example, "60 degrees", and is stored in advance in the storage unit 12 as a parameter. This case indicates that there is a large difference in pitch between the reference component sound for which the reference time was identified and the one immediately before it. Here too, the user is singing a reference component sound whose pitch differs from that of the immediately preceding one, so this timing is identified as the singing start time of the reference component sound for which the reference time was identified.

As illustrated, "tx1", detected as the singing start time, is later than the sound generation start time of the reference component sound RS1. This indicates either that the user was late in starting to sing or that the user intentionally applied tame. Here, "tame" refers to a state in which the singing lags behind the nominal timing, whereas "hashiri" refers to a state in which the singing runs ahead of the nominal timing. Similarly, the control unit 11 identifies the reference time of the reference component sound RS2 as "t2" and its singing start time as "tx2". The control unit 11 writes each identified singing start time in the evaluation table in association with the corresponding reference component sound.

Returning to the description of FIG. 7, after step Sa8 the control unit 11 evaluates the user's singing of each reference component sound (step Sa9). In accordance with the contents of the evaluation table, the control unit 11 generates evaluation data that is higher the smaller the absolute value of the difference between the user's singing start time and the sound generation start time of the reference component sound, and lower the larger that absolute value. Here, the evaluation data is a score expressed as a numerical value. The evaluation may use a predetermined calculation formula, but here an evaluation criterion table, which is a table describing the correspondence, is used.

FIG. 9 shows an example of the evaluation criterion table. In the evaluation criterion table, evaluation points are described in association with time differences (in milliseconds), that is, the ranges that the difference between the user's singing start time and the sound generation start time of the reference component sound can take. Here, evaluation points are on a six-level scale from "0" to "5". There are three kinds of evaluation points: "standard", used for normal evaluation; "tame", used to evaluate singing that makes effective use of tame; and "hashiri", used to evaluate singing that makes effective use of hashiri. The "tame" and "hashiri" evaluation points are used when, after evaluation points have been calculated once, the overall rhythm evaluation determines that the singing has the characteristics of tame or hashiri. First, evaluation is performed here using the "standard" evaluation points. As illustrated, with the "standard" evaluation points, the smaller the absolute value of the difference obtained by subtracting the sound generation start time of the reference component sound from the user's singing start time, the higher the evaluation. For example, when the control unit 11 calculates this difference for a given reference component sound as "+100 milliseconds", it refers to the evaluation criterion table and determines the evaluation point of that reference component sound to be "5". Based on what it has determined, the control unit 11 writes the evaluation point for each reference component sound in the evaluation table. This evaluation point is an example of component sound evaluation data according to the present invention.
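A table lookup of this kind might look like the sketch below. The band boundaries in `STANDARD_BANDS` are hypothetical: the text does not reproduce the contents of FIG. 9, and the values here were chosen only so that a difference of +100 milliseconds yields "5", as in the example. The function name `standard_point` is likewise an assumption.

```python
# Hypothetical "standard" column: (upper bound of |difference| in ms, point).
# These boundaries are illustrative and are NOT taken from FIG. 9.
STANDARD_BANDS = [(150, 5), (450, 4), (750, 3), (1050, 2), (1350, 1)]

def standard_point(sing_start_ms, ref_onset_ms, bands=STANDARD_BANDS):
    """Return an evaluation point from 0 to 5 for one reference component sound.

    The point depends only on the absolute value of
    (singing start time - sound generation start time).
    """
    diff = abs(sing_start_ms - ref_onset_ms)
    for upper_bound, point in bands:
        if diff <= upper_bound:
            return point
    return 0  # larger than every band: lowest point
```

A formula could replace the table, as the description notes; the table form simply makes the thresholds easy to adjust.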

FIG. 10 shows an example of the evaluation table after evaluation has been performed up to step Sa9. The "beat classification information", which was blank for everything other than front beats when the evaluation table was created, now describes the beat-related type of every reference component sound as a result of the classification in step Sa4. In addition, values have been written in "singing start time" as a result of detecting the user's singing in step Sa8, and in "evaluation point" as a result of evaluating the user's singing for each reference component sound in step Sa9. After step Sa9, the control unit 11 generates evaluation data for each beat-related type based on the evaluation table after the user's singing has been evaluated (step Sa10). Here, the evaluation data for each beat-related type is a numerical value, namely the average of the evaluation points for that type, and is an example of per-beat-type evaluation data according to the present invention. Specifically, the control unit 11 calculates the average for each beat-related type by dividing the sum of the evaluation points of the reference component sounds that share the same "beat classification information" in the evaluation table by the number of those reference component sounds. That is, when step Sa10 is completed, the average score of the user's singing has been calculated for each of "front beat", "back beat", and "other".
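The per-type averaging of step Sa10 amounts to a grouped mean. A minimal sketch, assuming the evaluation table is available as a list of `(beat_type, evaluation_point)` pairs (the row format and the function name `average_by_beat_type` are illustrative):

```python
from collections import defaultdict

def average_by_beat_type(rows):
    """Return {beat_type: average evaluation point}.

    rows -- iterable of (beat_type, evaluation_point) pairs, one per
            reference component sound (assumed representation of the
            evaluation table)
    """
    totals = defaultdict(lambda: [0, 0])  # beat_type -> [sum, count]
    for beat_type, point in rows:
        totals[beat_type][0] += point
        totals[beat_type][1] += 1
    # Sum of points divided by the number of sounds of that type.
    return {t: total / count for t, (total, count) in totals.items()}
```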

After step Sa10, the control unit 11 evaluates the overall sense of rhythm based on the average singing scores calculated for each beat classification (step Sa11). The evaluation in step Sa11 is performed, for example, based on Formula (1) below, where α is a coefficient satisfying 0 < α < 1.
Overall rhythm evaluation = {((average score of "back beat" + average score of "other") / 2) / (average score of "front beat")} × {((average score of "back beat" + average score of "other") / 2) × α + (average score of "front beat") × (1 - α)} ... Formula (1)
The coefficient α is stored in the storage unit 12, and the user may be allowed to change its value using the UI unit 14. The larger α is, the greater the weight given to the evaluation of "back beat" and "other"; the smaller α is, the greater the weight given to the evaluation of "front beat". Since the coefficients "α" and "1 - α" sum to 1, they represent the ratio of the coefficients applied to each beat type. That is, the control unit 11 multiplies the numerical values indicated by the evaluation data for each beat-related type by coefficients in a ratio predetermined for each type, and calculates the value obtained by adding the products. The control unit 11 is an example of the calculation means according to the present invention. For the first factor of Formula (1), {((average score of "back beat" + average score of "other") / 2) / (average score of "front beat")}, the control unit 11 caps the value at 1. That is, if the calculation of this factor yields a value greater than 1, the control unit 11 corrects it to 1. The reason this factor has the average score of "front beat" as its denominator and the average of "back beat" and "other" as its numerator is as follows. The "front beat" is generally easy for the user to recognize, so singing timing on it rarely drifts, whereas singing timing on "back beat" and "other" drifts more easily. Therefore, the smaller the deviation in singing timing on the "back beat" and "other" reference component sounds, the more likely it is that the user has a good sense of rhythm. Based on this idea, the present embodiment places the average of "back beat" and "other" in the numerator so that the value of this factor becomes larger when the evaluation of "back beat" and "other" is high.

For example, if in Formula (1) the coefficient α is "0.6", the average score of "front beat" is "4", and the average scores of "back beat" and "other" are each "3", the result is given by Formula (2) below.
Overall rhythm evaluation = {((3 + 3) / 2) / 4} × {((3 + 3) / 2) × 0.6 + 4 × 0.4} = (3 / 4) × 3.4 = 2.55 ... Formula (2)
In Formula (2), the average score of "front beat" is high, but the average scores of "back beat" and "other" are lower than that of "front beat", and the value of α gives greater weight to the evaluation of "back beat" and "other". As a result, the overall rhythm evaluation obtained from Formula (2) is somewhat low. By changing the value of α using the UI unit 14, the user can change the weighting of each beat type in the overall rhythm evaluation.
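Formula (1), including the cap of 1 on its first factor, can be written as a short function. This is a sketch for checking the arithmetic; the function and parameter names are illustrative, and the handling of a zero "front beat" average (rejected here) is an assumption, since the description does not say what happens in that case.

```python
def overall_rhythm_evaluation(front_avg, back_avg, other_avg, alpha):
    """Evaluate Formula (1).

    front_avg, back_avg, other_avg -- average points per beat-related type
    alpha -- weighting coefficient, required to satisfy 0 < alpha < 1
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if front_avg == 0:
        # Assumption: the source does not define this case.
        raise ValueError("front-beat average must be non-zero")
    off_avg = (back_avg + other_avg) / 2
    # First factor: ratio of the off-beat average to the front-beat
    # average, capped at 1.
    ratio = min(off_avg / front_avg, 1.0)
    # Second factor: weighted sum with weights alpha and (1 - alpha).
    weighted = off_avg * alpha + front_avg * (1 - alpha)
    return ratio * weighted
```

With α = 0.6, a "front beat" average of 4 and "back beat" and "other" averages of 3, the function reproduces the 2.55 of Formula (2) up to floating-point rounding.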

The karaoke apparatus 10 displays on the UI unit 14 a comment that analyzes the evaluation result. For example, after step Sa11, the control unit 11 creates a frequency distribution table for each classified beat-related type based on the evaluation table. FIG. 11 shows an example of the frequency distribution table. In the figure, the horizontal axis represents the time difference between the sound generation start time of the reference component sound and the singing start time. In FIG. 11, positions farther to the right indicate singing later than the nominal timing, and positions farther to the left indicate singing earlier than the nominal timing. The vertical axis represents frequency, that is, the number of occurrences of each singing timing. In FIG. 11, the broken line represents the frequency distribution for "front beat", the one-dot chain line that for "back beat", and the two-dot chain line that for "other". FIG. 11(a) is the frequency distribution based on the evaluation result of user A's singing, and FIG. 11(b) is that based on the evaluation result of user B's singing. As shown in FIG. 11(a), in user A's singing the frequency peaks for "front beat", "back beat", and "other" are concentrated at almost the same singing timing. User A's singing timing is therefore constant regardless of beat-related type, and user A can be said to have a good sense of rhythm. In FIG. 11(b), by contrast, the singing timings at which the frequency peaks occur for "front beat", "back beat", and "other" in user B's singing are scattered. User B's singing timing therefore differs depending on beat-related type, and user B cannot be said to have a good sense of rhythm. The control unit 11 may display a determination result based on such a frequency distribution table on the UI unit 14 as a comment, or may display the frequency distribution table itself on the UI unit 14.
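One way to turn the comparison of FIG. 11(a) and FIG. 11(b) into a number is to histogram the time differences per beat-related type and compare where the peaks fall. The sketch below does this; the bin width of 50 milliseconds, the tie-breaking rule, and the function names are all assumptions, and the description does not state how close the peaks must be for the sense of rhythm to count as good.

```python
from collections import Counter

def peak_bin(diffs_ms, bin_ms=50):
    """Return the lower edge (ms) of the most populated histogram bin.

    diffs_ms -- integer differences (singing start - sound generation start)
    Ties are broken in favor of the bin closest to zero (an assumption).
    """
    counts = Counter((d // bin_ms) * bin_ms for d in diffs_ms)
    return max(counts, key=lambda b: (counts[b], -abs(b)))

def peak_spread(diffs_by_type, bin_ms=50):
    """Return (peaks per type, distance between earliest and latest peak)."""
    peaks = {t: peak_bin(d, bin_ms) for t, d in diffs_by_type.items()}
    return peaks, max(peaks.values()) - min(peaks.values())
```

A small spread corresponds to the situation of user A, where the peaks coincide; a large spread corresponds to user B.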

Another way to display a comment is, for example, to use an evaluation comment table in which evaluation comments are described. FIG. 12 shows an example of the evaluation comment table. The evaluation comment table of FIG. 12 describes an evaluation comment for each combination of the average score of "front beat" and the average score of "back beat" and "other". Each average score is classified as high or low based on a threshold such as "3.5". The evaluation comment table is stored in the storage unit 12 together with this threshold, which the user may be allowed to change using the UI unit 14. For example, if in step Sa9 the average score of the front beat is calculated as "4" and the average score of the back beat and other is calculated as "3", the control unit 11 displays the comment "The sense of rhythm is bad." on the UI unit 14. Using such an evaluation comment table makes it possible to evaluate the overall sense of rhythm based on the combined result of the "front beat" evaluation and the "back beat" and "other" evaluation.

Here, if the average score of "front beat" is low and the average score of "back beat" and "other" is high, the control unit 11 displays the comment "There is tame/hashiri in the singing." on the UI unit 14. The reason is as follows. In general, it is harder to keep in rhythm on "back beat" and "other" than on "front beat". Therefore, when the evaluation of the more difficult "back beat" and "other" is high and the evaluation of the easier "front beat" is low, it is considered that the difference between the sound generation start time of the reference component sound and the singing start time arises because the user intentionally applied tame or hashiri on the front beats. When the evaluation using the evaluation comment table finds that the average score of "front beat" is low and that of "back beat" and "other" is high, the control unit 11 recalculates the evaluation points for the front-beat reference component sounds. Based on the evaluation criterion table of FIG. 9 and the difference obtained by subtracting the sound generation start time of the target reference component sound from the user's singing start time, the control unit 11 uses the "tame" evaluation points when the difference is positive and the "hashiri" evaluation points when the difference is negative. For example, a difference of "+400 milliseconds" means that the user sang 400 milliseconds later than the sound generation start time of the reference component sound, so the control unit 11 determines the evaluation point to be "5" based on the "tame" evaluation points in the evaluation criterion table. Under normal evaluation this would be an evaluation point of "4", but the result differs because the evaluation points for tame are used. The control unit 11 updates the evaluation table with the determined evaluation points. In this way, intentionally sung tame and hashiri can be evaluated.
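The comment selection and the choice of scoring column can be sketched as follows. Only two entries of the comment table are known from the text; the other two are left as explicit placeholders. Treating "high" as strictly greater than the threshold, and falling back to the "standard" column when the difference is exactly zero, are assumptions, as are all names used here.

```python
THRESHOLD = 3.5  # example threshold from the description

# Keys: (front average is high, back/other average is high).
COMMENTS = {
    (True, False): "The sense of rhythm is bad.",            # from the text
    (False, True): "There is tame/hashiri in the singing.",  # from the text
    (True, True): "<comment not given in the text>",         # placeholder
    (False, False): "<comment not given in the text>",       # placeholder
}

def select_comment(front_avg, off_avg, threshold=THRESHOLD):
    """Pick the comment for a pair of averages (off_avg covers back/other)."""
    return COMMENTS[(front_avg > threshold, off_avg > threshold)]

def needs_rescoring(front_avg, off_avg, threshold=THRESHOLD):
    """True when front-beat sounds should be re-scored for tame/hashiri."""
    return front_avg <= threshold and off_avg > threshold

def rescoring_column(diff_ms):
    """Choose the column of the evaluation criterion table by the sign of
    (singing start time - sound generation start time)."""
    if diff_ms > 0:
        return "tame"     # sung late
    if diff_ms < 0:
        return "hashiri"  # sung early
    return "standard"     # assumption: the zero case is not specified
```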

As described above, in the present embodiment each reference component sound in the guide melody data representing the guide melody of a song is classified by beat-related type. The singing is then evaluated for each classified reference component sound, and an evaluation for each beat-related type is made based on those results, yielding a rhythm evaluation for each beat-related type. Thus, according to the present embodiment, the sense of rhythm in the user's singing can be evaluated with the beat taken into consideration.

<Modification>
Although an embodiment of the present invention has been described above, the present invention can be modified as follows. The following modifications may also be combined as appropriate.
(Modification 1)
The guide melody data is not limited to "beat type" and may include beat-related information such as the following.
FIG. 13 shows an example of guide melody data according to Modification 1. Compared with the guide melody data shown in FIG. 4, that shown in FIG. 13 adds the items "beat type 2" and "evaluation coefficient". "Beat type 2" in FIG. 13 further classifies "front beat" into "strong beat" and "weak beat". For example, when four quarter notes are lined up in 4/4 time, the first quarter note is a strong beat and the remaining three quarter notes are weak beats. The "evaluation coefficient" is a coefficient by which the evaluation point is multiplied. Because strong beats are easier to time than weak beats and are more conspicuous, the evaluation coefficients are set so that the evaluation points are weighted accordingly. The evaluation coefficients are stored in the storage unit 22 and may be made settable by the user using the UI unit 14. Items added to the guide melody data in Modification 1 relative to the embodiment are also included in the evaluation table that is created. The same applies to the modifications below. By adding information that distinguishes strong beats from weak beats to the guide melody data in advance and making the evaluation coefficients settable in this way, the user can obtain a rhythm evaluation that takes the beat into account more fully.

(Modification 2)
The guide melody data may include the following beat-related information.
FIG. 14 is a diagram illustrating an example of guide melody data according to Modification 2. Compared with the guide melody data shown in FIG. 4, the guide melody data shown in FIG. 14 additionally includes the items “beat type 2” and “evaluation coefficient”. The “evaluation coefficient” is a coefficient by which the evaluation score for a reference constituent sound is to be multiplied. “Beat type 2” in FIG. 14 classifies the beat-related type as “syncopation”. In the example of FIG. 14, syncopation is treated as important, and an evaluation coefficient of “1.2” is set for the reference constituent sounds that fall on syncopated beats. The control unit 11 generates an evaluation score by multiplying the evaluation score determined from the difference between the sound generation start time of the reference constituent sound and the user's singing start time by this evaluation coefficient. The classification of beat-related types is not limited to the above; for example, “triplet” or “shuffle” may be classified under “beat type 2”. In this case, when the control unit 11 performs the evaluation for each beat-related type in step Sa10, it evaluates not only “on-beat”, “off-beat”, and “other” but also each beat-related type included in “beat type 2”. By classifying more finely than “on-beat”, “off-beat”, and “other”, the user can obtain a more detailed rhythm evaluation.
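The multiplication by the evaluation coefficient can be illustrated with a minimal sketch. The class and field names are hypothetical, and the base score of 4 is an arbitrary example; only the coefficient “1.2” for syncopation comes from the FIG. 14 example.

```python
from dataclasses import dataclass


@dataclass
class ReferenceSound:
    beat_type_2: str                     # e.g. "syncopation"; "" if none (assumed encoding)
    evaluation_coefficient: float = 1.0  # assumed default when no coefficient is set


def weighted_score(base_score: float, sound: ReferenceSound) -> float:
    """Multiply the timing-based evaluation score by the sound's coefficient."""
    return base_score * sound.evaluation_coefficient


syncopated = ReferenceSound("syncopation", 1.2)
plain = ReferenceSound("")
scores = [weighted_score(4, syncopated), weighted_score(4, plain)]
```

With the same base score, the syncopated sound contributes more to the evaluation than the unweighted one.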

(Modification 3)
The evaluation weighting may be made arbitrarily settable for each reference constituent sound.
FIG. 15 is a diagram illustrating an example of guide melody data according to Modification 3. In the guide melody data of Modification 3, an evaluation coefficient can be set for each reference constituent sound.
For example, the very beginning of the vocal entry, the portion corresponding to the chorus (sabi), portions where the time signature changes because of a bridge, a break, or the like, and portions where the inter-beat time changes with a change in tempo are all difficult to sing and are also memorable, conspicuous portions of the song as a whole. For the reference constituent sounds corresponding to these portions, evaluation coefficients that apply a positive weight to the evaluation score are therefore described. By setting such evaluation coefficients, even if the user's singing is rough, for example, a high rhythm evaluation becomes easier to obtain as long as the singing timing matches at conspicuous portions such as the opening and the chorus.

(Modification 4)
The evaluation comment table is not limited to the example described above. FIG. 16 is a diagram illustrating an example of an evaluation comment table according to Modification 4. In the evaluation comment table of FIG. 16, an evaluation comment is described for each combination of the distribution of evaluation scores for “on-beat” and the distribution of evaluation scores for “off-beat” and “other”. Each of these distributions is classified as narrow or wide on the basis of the standard deviation of the evaluation scores for the respective beat-related type and a predetermined threshold. This threshold is stored in advance in the storage unit 12 and may be made changeable by the user via the UI unit 14. For example, the control unit 11 determines the distribution to be “wide” when the standard deviation of the “on-beat” evaluation scores is equal to or greater than the threshold, and determines the distribution to be “narrow” when the standard deviation of the “off-beat” and “other” evaluation scores is less than the threshold. In this case, the control unit 11 causes the UI unit 14 to display the comment “Your singing contains tame / hashiri.” (tame: holding back behind the beat; hashiri: rushing ahead of it). In this way, the rhythm of the user's singing can be evaluated on the basis of the distribution of the user's singing timing.
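The wide/narrow decision and the comment lookup might look like the sketch below. Several points are assumptions: the population standard deviation (`statistics.pstdev`) is used because the text does not say which form of standard deviation is meant, the threshold value 1.0 and the sample scores are made up, and the table holds only the single combination the text describes for FIG. 16.

```python
from statistics import pstdev

# Only this one combination and its comment are given in the text;
# the other combinations are left out rather than guessed.
COMMENT_TABLE = {
    ("wide", "narrow"): "Your singing contains tame / hashiri.",
}


def spread(scores, threshold):
    """Classify a score distribution as "wide" or "narrow"."""
    return "wide" if pstdev(scores) >= threshold else "narrow"


def evaluation_comment(on_beat_scores, other_scores, threshold):
    """Look up the comment for (on-beat spread, off-beat/other spread)."""
    key = (spread(on_beat_scores, threshold), spread(other_scores, threshold))
    return COMMENT_TABLE.get(key)  # None when the combination is not in the table


comment = evaluation_comment([1, 5, 1, 5], [3, 3, 4, 3], threshold=1.0)
```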

(Modification 5)
The information described in advance under “beat type” in the guide melody data is not limited to “on-beat”. For example, the beat type may indicate “off-beat” or “strong beat”. In short, the information need not be the on-beat as long as it can be used as a reference when classifying beat-related types. Alternatively, “beat type” may be described in advance for all reference constituent sounds in the guide melody data. In this case, the “beat type” in the guide melody data is written as-is into the “beat classification information” of the evaluation table that is created, so the classification processing in step Sa4 becomes unnecessary.

(Modification 6)
In the embodiment described above, the beat-related types are classified into three: “on-beat”, “off-beat”, and “other”. However, the classification is not limited to this. For example, the classification may be based only on “on-beat” and “off-beat”, or may be more detailed.

(Modification 7)
The rhythm evaluation for each beat-related type and the overall rhythm evaluation may, for example, be displayed on the UI unit 14 alongside the melody evaluation result, or summed with it, and thereby notified to the user. In addition, the control unit 11 of the karaoke apparatus 10 may use the communication unit 13 to transmit these data to a printer (not shown), to a web server apparatus managed by the karaoke establishment, or to a mobile phone carried by the user, so that the user can check the evaluation on the mobile phone or in a web browser, or print it. In this way, the user can learn his or her overall singing skill, with a beat-aware sense of rhythm reflected in it.

(Modification 8)
When the user's singing receives high evaluations consecutively, the subsequent singing may be evaluated with a weight applied. For example, when evaluating the user's singing against each reference constituent sound in step Sa9, if evaluation scores equal to or greater than a predetermined threshold (here, for example, “4”) continue for a predetermined number of times or more, the control unit 11 multiplies the next evaluation score by an evaluation coefficient such as “1.1”. The predetermined threshold, the predetermined number, and the evaluation coefficient are stored in the storage unit 22 and may be made changeable by the user via the UI unit 14. The control unit 11 continues to use the evaluation coefficient until an evaluation score again falls below the predetermined threshold. In this way, the longer the user's singing timing stays correct, that is, the longer singing that is in time with the beat continues, the higher the evaluation. This makes it easier to obtain a rhythm evaluation that takes beats into account more closely.
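One possible reading of this streak rule is sketched below. The threshold “4” and the coefficient “1.1” follow the example in the text; the run length of 3 is a made-up value, and two details are assumptions: the streak is counted on the unweighted scores, and the score that ends a streak is itself still multiplied because it is the “next” score after the run.

```python
def apply_streak_bonus(base_scores, threshold=4, run_length=3, coefficient=1.1):
    """Return the scores with the streak coefficient applied where it is active."""
    weighted = []
    streak = 0  # consecutive base scores at or above the threshold so far
    for score in base_scores:
        bonus_active = streak >= run_length
        weighted.append(score * coefficient if bonus_active else score)
        # A score below the threshold ends the run and switches the bonus off.
        streak = streak + 1 if score >= threshold else 0
    return weighted


result = apply_streak_bonus([4, 5, 4, 5, 3, 4])
```

Here the fourth and fifth scores follow a run of three high scores and are multiplied; the fifth score is below the threshold, so the sixth is left unweighted.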

(Modification 9)
The object of evaluation is not limited to the user's singing; the user's performance on a musical instrument may also be evaluated. In this case, the karaoke apparatus 10 includes an interface for acquiring a performance sound signal from the instrument. The data structure of the guide melody data is the same as when singing is evaluated as in the embodiment, but the guide melody itself is not limited to the vocal melody and may be, for example, the melody of a keyboard part. The classification of beat-related types and the evaluation processing are performed by the same procedure as in the embodiment. In this way, the user can obtain a rhythm evaluation that takes beats into account not only for singing but also for playing a musical instrument.

(Modification 10)
The user may be notified of the beat-related type for which the evaluation of the user's singing was low. For example, for a beat-related type that received an evaluation score lower than a predetermined threshold, a comment such as “You seem to have trouble with off-beats.” may be displayed on the UI unit 14. In this way, the user can clearly see which beat-related types are his or her weak points and can therefore practice efficiently.
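As a rough illustration, the selection of weak beat-related types could be written as follows. The per-type averages, the threshold of 3.0, and the message template are all hypothetical; only the off-beat comment itself mirrors the example in the text.

```python
def weak_beat_types(average_by_type, threshold):
    """Return the beat-related types whose average score is below the threshold."""
    return [beat_type for beat_type, avg in average_by_type.items() if avg < threshold]


averages = {"on-beat": 4.2, "off-beat": 2.1, "other": 3.5}  # made-up averages
weak = weak_beat_types(averages, threshold=3.0)
messages = [f"You seem to have trouble with {beat_type}s." for beat_type in weak]
```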

(Modification 11)
Beat-related types may also be classified by having the guide melody data hold the following information in place of “beat type”. In this case, the guide melody data includes the start time of each of the measures obtained by dividing the performance period of the song, from start to end, into a plurality of time segments, and time signature information specifying the time signature of each of the measures. For example, with times expressed as “mm:ss:sss”, suppose that the measure whose start time is “00:00:00” is in “4/4 time” and that the start time of the next measure is “00:04:00”. In this case, the control unit 11 determines from the difference between the measure start times that the first measure lasts 4 seconds. Since the measure is in “4/4 time”, each beat lasts 1 second. Accordingly, the control unit 11 identifies the timings “00:00:00”, “00:01:00”, “00:02:00”, and “00:03:00” in the first measure as on-beats. Once the on-beat timings are known in this way, beat-related types can be classified on the basis of time differences as in the embodiment. This configuration also provides the same effects as the embodiment.
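The arithmetic in this example can be checked with a short sketch. Times are given directly in seconds to avoid committing to a parser for the time strings, and the function name is hypothetical. Taking the number of beats from the numerator of the time signature is an assumption that fits simple meters such as 4/4.

```python
def on_beat_times(measure_start_s, next_measure_start_s, beats_per_measure):
    """Return the on-beat times (in seconds) inside one measure."""
    beat_length = (next_measure_start_s - measure_start_s) / beats_per_measure
    return [measure_start_s + i * beat_length for i in range(beats_per_measure)]


# First measure of the example: starts at 0 s, next measure at 4 s, 4/4 time.
first_measure = on_beat_times(0.0, 4.0, 4)
```

For the example measure this yields on-beats at 0, 1, 2, and 3 seconds, matching the four timings identified by the control unit 11.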

(Modification 12)
The configuration is not limited to one in which the server device 20 stores the musical sound data representing the accompaniment of the song and the guide melody data; the karaoke apparatus 10 may instead store these data in the storage unit 12. With this configuration, the karaoke apparatus 10 does not need to communicate with the server device 20, so the processing time required for communication can be eliminated.

(Modification 13)
The evaluation of tame (holding back behind the beat) and hashiri (rushing ahead of the beat) based on the evaluation criterion table may be varied according to the genre of the song or the beat type. In this case, the musical sound data transmitted from the server device 20 includes an identifier indicating the genre of the song. For example, when the genre of the song is “enka”, tame tends to be used in the singing. The karaoke apparatus 10 stores in advance, in the storage unit 12, a correspondence table that associates each genre identifier with a flag indicating whether to perform an evaluation targeting tame or hashiri. On the basis of the received genre identifier and this correspondence table, the control unit 11 determines whether to perform an evaluation targeting tame or hashiri. Because tame is generally likely to be used in “enka”, the flag for performing a tame-targeted evaluation is on for “enka” in the correspondence table. In this case, when performing the evaluation in step Sa9, the control unit 11 uses the “tame” evaluation scores in the evaluation criterion table. In addition, the control unit 11 may determine the evaluation method in advance according to the beat type, for example by performing a tame-targeted evaluation for “on-beat”. In that case, which beat types are subject to a tame-targeted or hashiri-targeted evaluation is stored in advance in the storage unit 12 as a correspondence table. In this way, tame and hashiri can be evaluated in accordance with the characteristics of the song and the beat type.
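A minimal sketch of the genre lookup follows. The table layout, the identifier string "enka", and the function name are assumptions; the only entry taken from the text is that the tame-targeted flag is on for enka.

```python
# Hypothetical correspondence table: genre identifier -> evaluation flags.
GENRE_TABLE = {
    "enka": {"tame": True},
}


def evaluation_targets(genre_id):
    """Return the timing habits (tame / hashiri) to evaluate for a genre."""
    flags = GENRE_TABLE.get(genre_id, {})  # unknown genres get no special target
    return [name for name, enabled in flags.items() if enabled]


targets = evaluation_targets("enka")
```

A second table keyed by beat type could be consulted in the same way for the per-beat-type variant.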

DESCRIPTION OF SYMBOLS 10 ... Karaoke apparatus, 11 ... Control unit, 12 ... Storage unit, 13 ... Communication unit, 14 ... UI unit, 15 ... Voice input unit, 16 ... Voice output unit, 20 ... Server device, 100 ... Singing evaluation system

Claims

A singing evaluation device comprising:
model sound data acquisition means for acquiring model sound data that represents, for each of a plurality of reference constituent sounds constituting a model sound to be referred to when a user performs a piece of music, the pitch of the reference constituent sound and the sound generation start time of the reference constituent sound;
classification means for classifying each of the plurality of reference constituent sounds whose pitches and sound generation start times are represented by the model sound data acquired by the model sound data acquisition means into one of a plurality of beat-related types determined in accordance with a predetermined rule;
performance sound signal acquisition means for acquiring a performance sound signal representing a performance sound, which is a sound produced by the user's performance;
pitch specifying means for specifying the pitch of the user's performance sound from the performance sound signal acquired by the performance sound signal acquisition means;
sound generation start time specifying means for specifying, in accordance with a predetermined rule and for each of the plurality of reference constituent sounds whose pitches and sound generation start times are represented by the model sound data, the sound generation start time at which the reference constituent sound was performed;
constituent sound evaluation data generation means for generating, for each of the plurality of reference constituent sounds represented by the model sound data, constituent sound evaluation data indicating a higher evaluation the smaller the difference between the sound generation start time of the reference constituent sound and the sound generation start time of the performance specified for that reference constituent sound by the sound generation start time specifying means; and
beat-type evaluation data generation means for generating, for each beat-related type, beat-type evaluation data indicating an evaluation for that type on the basis of the constituent sound evaluation data generated by the constituent sound evaluation data generation means for the one or more reference constituent sounds classified into that type by the classification means.

The singing evaluation device according to claim 1, wherein, for each of the plurality of reference constituent sounds whose pitches and sound generation start times are represented by the model sound data, the sound generation start time specifying means specifies, as the sound generation start time of the performance of the reference constituent sound, the time at which, within a period determined with reference to the sound generation start time of the reference constituent sound, the pitch of the user's performance sound specified by the pitch specifying means comes within a pitch range determined with reference to the pitch of the reference constituent sound.

The singing evaluation device according to claim 1, wherein the model sound data includes, for each of at least one of the reference constituent sounds, timing data indicating the timing at which the reference constituent sound is sounded, and
the classification means performs the classification on the basis of the timing data.

The singing evaluation device according to claim 1 or 2, wherein the model sound data includes data specifying the start time of each of the measures obtained by dividing the performance period of the piece of music into a plurality of segments, and data specifying the time signature of each of the plurality of measures, and
the classification means performs the classification on the basis of the data specifying the start time of each of the measures and the data specifying the time signature of each of the plurality of measures.

The singing evaluation device according to any one of claims 1 to 4, wherein the model sound data includes, for each of at least one of the reference constituent sounds, a coefficient by which the evaluation score for the reference constituent sound is to be multiplied, and
the constituent sound evaluation data generation means generates constituent sound evaluation data indicating the evaluation score obtained by multiplying, by the coefficient, the evaluation score determined on the basis of the difference between the sound generation start time of the reference constituent sound and the sound generation start time of the user's performance.

6. The beat-type evaluation data generated by the beat-type evaluation data generation means is data indicating a numerical value of the evaluation for each beat-related type,
and the device comprises calculation means for calculating a numerical value obtained by multiplying the numerical value indicated by the beat-type evaluation data for each type by a ratio coefficient predetermined for that type and summing the products. The singing evaluation device according to any one of.