JP2015069082A

JP2015069082A - Information processing device, data generation method and program

Info

Publication number: JP2015069082A
Application number: JP2013204484A
Authority: JP
Inventors: 典昭阿瀬見; Noriaki Asemi
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2013-09-30
Filing date: 2013-09-30
Publication date: 2015-04-13
Anticipated expiration: 2033-09-30
Also published as: JP6060867B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for generating evaluation data. SOLUTION: In evaluation data generation processing, a control unit generates, as evaluation data, information including a weight that becomes larger the more characteristic a singing technique is of a musical piece, on the basis of the average value and standard deviation of the technique feature amount for each singing technique in one musical piece and the standard technique feature amounts over a plurality of musical pieces (S280). The control unit also generates reference data (S290) and stores it together with the evaluation data in a storage unit 14 of an information processing server 10 (S300). In karaoke scoring processing, the control unit calculates a technique evaluation point by a weighted calculation based on the evaluation data. The control unit further compares the pitch transition in the singing waveform with that in the reference data, calculates a reference evaluation point that is higher the closer the match, and calculates the sum of the reference evaluation point and the technique evaluation point as a total evaluation point.

Description

The present invention relates to an information processing apparatus that generates evaluation data, a data generation method, and a program.

Conventionally, a karaoke apparatus calculates an evaluation score by adding, to a reference score based on the pitch transition in the singing voice, an additional score that evaluates the singing techniques used during singing (see Patent Document 1).

Such a karaoke apparatus calculates the reference score so that the score is higher the smaller the deviation between the reference data, which represents the melody to be sung and is prepared in advance for each piece of music, and the pitch transition in the voice when the piece is sung. Furthermore, the karaoke apparatus described in Patent Document 1 analyzes the singing voice to detect the various singing techniques used during singing, and calculates a larger additional score the more often singing techniques are used.

Patent Document 1: JP 2007-233013 A

In general popular music, the type of singing technique that is central to singing a piece (hereinafter referred to as a “characteristic singing technique”) differs depending on the genre of the piece, the singer, and so on.
For this reason, in the karaoke apparatus described in Patent Document 1, it is conceivable to award the additional score when a singing technique detected from the singing voice matches evaluation data generated in advance. Such evaluation data could, for example, associate the singing techniques to be used when singing a piece with the timing at which each technique should be used. However, such evaluation data normally has to be created by hand in advance for each piece of music, and with the conventional technology it is difficult to generate the evaluation data automatically from music data.

Accordingly, an object of the present invention is to provide a technique for generating evaluation data.

The present invention, made to achieve the above object, is an information processing apparatus comprising music data acquisition means, extraction means, first determination means, second determination means, and execution means.
The music data acquisition means acquires music data from a first storage unit that stores music data containing a sung singing sound. The extraction means extracts, from the music data acquired by the music data acquisition means, vocal data representing the sung singing sound.

The first determination means determines, for the vocal data extracted by the extraction means, technique feature amounts representing evaluations of a plurality of singing techniques. The second determination means determines a characteristic singing technique that satisfies a predetermined condition from among the technique feature amounts determined by the first determination means, based on evaluation information stored in a second storage unit. The second storage unit stores vocal data of a plurality of pieces of music in association with evaluation information obtained by evaluating the plurality of singing techniques for that vocal data.

Furthermore, the execution means evaluates singing of the piece of music using the characteristic singing technique determined by the second determination means.
According to the information processing apparatus of the present invention, the characteristic singing technique can be generated automatically as evaluation data. Moreover, if the predetermined condition is that the singing technique is characteristic of the piece of music, the characteristic singing technique generated as evaluation data is a singing technique that is characteristic of that piece.

Therefore, evaluating singing with such a characteristic singing technique makes it possible to evaluate whether the singing technique a user employs when singing the piece is a characteristic singing technique, and reduces the likelihood that a person who listens to the user's singing finds the evaluation result unnatural.

The information processing apparatus of the present invention may also include singing acquisition means that acquires singing data representing the voice input while the piece of music is being played.
In this case, the execution means may evaluate singing of the piece of music by applying the characteristic singing technique determined by the second determination means to the singing data acquired by the singing acquisition means.

According to such an information processing apparatus, when a piece of music is sung, that singing can be evaluated.
Furthermore, the extraction means may extract, from the music data, both the vocal data and accompaniment data representing the accompaniment sound of the piece.

In this case, the information processing apparatus of the present invention may further include performance means that plays the piece of music based on the accompaniment data extracted by the extraction means. The singing acquisition means may then acquire, as singing data, the voice input while the performance means is playing the piece.

According to such an information processing apparatus, a piece of music can be played based on the music data stored in the first storage unit, and the voice (singing) input during the performance can be evaluated.
The execution means may also derive the difference between the technique feature amount in the singing data and the characteristic singing technique, and give the singing of the piece a higher evaluation the larger the derived difference is.

According to such an information processing apparatus, the singing is evaluated more highly the larger the difference between the technique feature amount in the singing data and the characteristic singing technique.
That is, when this difference is large, the technique used during singing can be said to appear more strongly than the characteristic singing technique.

Consequently, the information processing apparatus can give a high evaluation to singing that strongly expresses the characteristic techniques of the piece, and reduces the likelihood that a person who listens to the user's singing finds the evaluation result unnatural.

Furthermore, the first determination means may determine the technique feature amount for each singing technique, and the second determination means may determine the characteristic singing technique for each singing technique. In this case, the execution means may derive, for each singing technique, the difference between the technique feature amount in the singing data and the characteristic singing technique, assign a larger weight to a singing technique that more strongly represents the character of the piece, and evaluate the singing of the piece by taking the weighted average of the derived differences.
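The weighted averaging described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the dictionary layout, the technique keys, and the numeric values are hypothetical and do not come from the specification, which does not fix a concrete data format here.

```python
def weighted_technique_score(sung, characteristic, weights):
    """Weighted average of per-technique differences (hypothetical helper).

    sung:           technique feature amount of the user's singing, per technique
    characteristic: characteristic singing technique value of the piece, per technique
    weights:        larger for techniques that more strongly characterize the piece
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(
        weights[name] * (sung[name] - characteristic[name]) for name in weights
    )
    return weighted_sum / total_weight


# Illustrative values only (assumed, not taken from the source).
sung = {"vibrato": 0.8, "kobushi": 0.5}
characteristic = {"vibrato": 0.5, "kobushi": 0.5}
weights = {"vibrato": 3.0, "kobushi": 1.0}
score = weighted_technique_score(sung, characteristic, weights)
```

Because the vibrato weight dominates in this example, the vibrato difference contributes three quarters of the result, which mirrors the stated intent of emphasizing the techniques that characterize the piece.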

According to such an information processing apparatus, singing techniques that strongly represent the character of the piece can be given a larger share of the evaluation of the singing.
The information processing apparatus of the present invention may further include standard calculation means, evaluation information generation means, and storage control means.

The standard calculation means calculates a standard feature amount representing a standard evaluation of the singing techniques for the vocal data of a plurality of pieces of music. The evaluation information generation means performs numerical processing on the standard feature amount derived by the standard calculation means and calculates the result of that numerical processing as the evaluation information. The storage control means then stores the evaluation information calculated by the evaluation information generation means in the second storage unit in association with the vocal data of the plurality of pieces of music.

According to such an information processing apparatus, the evaluation information can be derived from the vocal data of a plurality of pieces of music.
The numerical processing referred to here is, for example, obtaining an arithmetic mean or a standard deviation.
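As a rough illustration of this kind of numerical processing, the sketch below computes an arithmetic mean and a standard deviation per singing technique from feature amounts gathered over several pieces of music. The function name, the data layout, and the choice of the population standard deviation are assumptions made for the example; the text only names the mean and standard deviation as examples.

```python
from statistics import mean, pstdev


def summarize_standard_features(features_by_technique):
    """Return {technique: {"mean": ..., "std": ...}} (hypothetical layout).

    features_by_technique maps a technique name to the list of standard
    feature amounts observed for that technique across several pieces.
    """
    return {
        name: {"mean": mean(values), "std": pstdev(values)}
        for name, values in features_by_technique.items()
    }


# Illustrative values only (assumed, not taken from the source).
evaluation_info = summarize_standard_features({"vibrato": [2.0, 4.0, 6.0]})
```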

The present invention may also be implemented as a data generation method for evaluating singing.
In this case, the data generation method comprises: a music data acquisition step of acquiring music data from the first storage unit; an extraction step of extracting vocal data from the acquired music data; a first determination step of determining technique feature amounts for a plurality of singing techniques for the extracted vocal data; a second determination step of determining, based on the evaluation information stored in the second storage unit, a characteristic singing technique that satisfies a predetermined condition from among the technique feature amounts determined in the first determination step; and an execution step of evaluating singing of the piece of music using the determined characteristic singing technique.

Such a data generation method provides the same effects as the information processing apparatus according to claim 1.
The present invention may also be implemented as a program executed by a computer.

In this case, the program of the present invention causes a computer to execute: a music data acquisition procedure of acquiring music data from the first storage unit; an extraction procedure of extracting vocal data from the acquired music data; a first determination procedure of determining technique feature amounts for a plurality of singing techniques for the extracted vocal data; a second determination procedure of determining, based on the evaluation information stored in the second storage unit, a characteristic singing technique that satisfies a predetermined condition from among the technique feature amounts determined by the first determination procedure; and an execution procedure of evaluating singing of the piece of music using the determined characteristic singing technique.

When the present invention is implemented as a program, it can be used by loading it from a recording medium into a computer and starting it as needed, or by having the computer obtain it over a communication line and start it as needed. Having the computer execute each procedure makes the computer function as the information processing apparatus according to claim 1.

The recording medium referred to here includes computer-readable electronic media such as DVD-ROMs, CD-ROMs, and hard disks.

A block diagram showing the schematic configuration of a karaoke system including an information processing apparatus to which the present invention is applied.
A flowchart showing the procedure of the standard feature amount calculation process.
A flowchart showing the procedure of the evaluation data generation process.
A flowchart showing the procedure of the karaoke scoring process.
A diagram showing an example of the evaluation data.

Embodiments of the present invention will be described below with reference to the drawings.
<Configuration of karaoke system>
The karaoke system 1 shown in FIG. 1 is a system that plays a piece of music designated by a user, and the user sings along with the performance.

To achieve this, the karaoke system 1 includes an information processing server 10 and at least one karaoke apparatus 30. The information processing server 10 and the karaoke apparatus 30 are connected via a communication network, which may be wired or wireless.

The information processing server 10 stores music data MD-1 to MD-N, one prepared for each piece of music. The karaoke apparatus 30 acquires the music data MD corresponding to the piece designated by the user from the information processing server 10, plays the piece based on that music data MD, and accepts voice input while the piece is being played.

The suffix “N” is an identifier that identifies the music data MD, and “N” is a natural number of 2 or more.
<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

The communication unit 12 allows the information processing server 10 to communicate with external devices via the communication network.
The control unit 16 is a known control device built around a known microcomputer that includes a ROM 18, a RAM 20, and a CPU 22. The ROM 18 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 20 temporarily stores processing programs and data. The CPU 22 executes each process (various calculations) in accordance with the processing programs stored in the ROM 18 and the RAM 20.

That is, the control unit 16 controls each unit of the information processing server 10 and performs data communication with the karaoke apparatus 30.
The storage unit 14 is a known storage device whose contents can be read and written. The storage unit 14 stores at least a plurality of pieces of music data MD.

Each piece of music data MD includes music management information that describes information about the piece, master waveform data representing the performance sound of the piece, and lyrics data representing the lyrics of the piece. The music management information includes at least music identification information (for example, a song number) that identifies the piece.

The master waveform data of this embodiment is audio data that contains the performance sounds of a plurality of musical instruments and the singing sound of the main melody. This audio data may be an audio file in an uncompressed audio file format or an audio file in a compressed audio format.

In the following, the audio waveform data representing the performance sounds contained in the master waveform data is referred to as accompaniment data, and the audio waveform data representing the singing sound contained in the master waveform data is referred to as vocal data.

The instrument sounds contained in the accompaniment data of this embodiment include those of percussion instruments (for example, drums, taiko drums, and cymbals), stringed instruments (for example, guitar and bass), struck-string instruments (for example, piano), and wind instruments (for example, trumpet and clarinet). In typical music, percussion instruments and bass are usually used as rhythm instruments.

The music data MD stored in the storage unit 14 includes, in addition to music data MD of pieces composed by professionals, music data MD of pieces composed by general users of the karaoke system 1. The music data MD of a piece composed by a general user is created on a known information processing terminal (for example, a personal computer or a mobile terminal) and uploaded to the information processing server 10.
<Karaoke apparatus>
The karaoke apparatus 30 includes a communication unit 32, an input receiving unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 allows the karaoke apparatus 30 to communicate with external devices via the communication network. The input receiving unit 34 is an input device that accepts information and commands entered through external operations. Input devices in this embodiment include, for example, keys, switches, and a remote control receiver.

The music playback unit 36 plays music based on music data MD stored in the storage unit 38 or music data MD downloaded from the information processing server 10. The audio control unit 40 is a device that controls audio input and output, and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, through which the microphone input unit 44 acquires the user's singing sound. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the piece played by the music playback unit 36 and the sound source signal of the singing sound from the microphone input unit 44. The speaker 60 converts the sound source signal output from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video based on video data sent from the control unit 50. A display unit 64 that displays the video is connected to the video control unit 46.
The control unit 50 is built around a known computer that has at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 54 temporarily stores processing programs and data. The CPU 56 executes each process (various calculations) in accordance with the processing programs stored in the ROM 52 and the RAM 54.

The ROM 52 stores the processing programs with which the control unit 50 executes the karaoke scoring process, the evaluation data generation process, and the standard feature amount calculation process.

The karaoke scoring process plays the piece of music designated by the user and evaluates the voice input through the microphone 62 during the performance. The evaluation data generation process generates, for each piece of music data MD, the evaluation data required by the karaoke scoring process. The standard feature amount calculation process calculates the standard feature amounts used to generate the evaluation data.

In other words, the karaoke apparatus 30 calculates the standard feature amounts as evaluation information in the standard feature amount calculation process, and generates the evaluation data as the characteristic singing technique for each piece of music data MD in the evaluation data generation process. In the karaoke scoring process, the karaoke apparatus 30 plays the target piece based on the corresponding music data MD and acquires the voice input through the microphone 62 during the performance as singing data. The karaoke apparatus 30 then scores and evaluates the acquired singing data in the same process.

That is, the karaoke apparatus 30 functions as an information processing apparatus that executes the standard feature amount calculation process, the evaluation data generation process, and the karaoke scoring process.
<Standard feature amount calculation process>
Next, the standard feature amount calculation process executed by the control unit 50 of the karaoke apparatus 30 will be described.

The standard feature amount calculation process is started at predetermined time intervals. The start timing is not limited to this, however; for example, the process may be started when a start command for launching the processing program (application) is input via the input receiving unit 34.

As shown in FIG. 2, when the standard feature amount calculation process is started, the control unit 50 first acquires one piece of music data MD from all the music data MD stored in the information processing server 10 (S110). The control unit 50 then acquires the master waveform data contained in the music data MD acquired in S110 (S120).

Next, the control unit 50 separates and extracts the accompaniment data and the vocal data from the master waveform data acquired in S120 (S130). In S130, the control unit 50 may separate the accompaniment data and the vocal data using a known method (for example, “PreFEst” described in JP 2008-134606 A). PreFEst is a method that separates the most dominant audio waveform in the master waveform data as the vocal data and the remaining audio waveform as the accompaniment data.

Subsequently, the control unit 50 transcribes the vocal data extracted in S130 (S140). The transcription in S140 is a known method that transcribes notes based on the change over time of the sound pressure in the vocal data and the change over time of the pitch in the vocal data.

Specifically, in the transcription, the control unit 50 identifies each timing at which the time-varying sound pressure in the vocal data becomes equal to or greater than a specified threshold as the start timing nnt(a, i) of a note in the vocal melody of the piece. Likewise, the control unit 50 identifies each timing at which the time-varying sound pressure in the vocal data becomes equal to or less than a specified threshold as the end timing nft(a, i) of a note in the vocal melody of the piece.

The control unit 50 then identifies the interval bounded by each corresponding pair of start timing nnt(a, i) and end timing nft(a, i) as the note interval of that note. At the same time, the control unit 50 identifies the pitch in each note interval from the change over time of the pitch in the vocal data, and associates each note interval with its pitch nn(a, i).
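The note segmentation described above can be sketched roughly as follows. This is a simplified illustration, not the transcription method of the specification: it assumes the vocal data has already been reduced to a per-frame sound-pressure envelope and a per-frame pitch track, uses separate start and end thresholds, and takes the median pitch of each interval. All of these choices, and every name in the code, are assumptions made for the example.

```python
from statistics import median


def segment_notes(envelope, frame_sec, start_threshold, end_threshold):
    """Return (start_sec, end_sec) pairs, playing the role of nnt(a, i) and nft(a, i)."""
    notes = []
    start = None
    for k, level in enumerate(envelope):
        if start is None and level >= start_threshold:
            start = k * frame_sec                 # note start timing
        elif start is not None and level <= end_threshold:
            notes.append((start, k * frame_sec))  # note end timing
            start = None
    if start is not None:                         # note still sounding at the end
        notes.append((start, len(envelope) * frame_sec))
    return notes


def note_pitches(pitch_track, notes, frame_sec):
    """Assign one pitch per note interval, playing the role of nn(a, i)."""
    pitches = []
    for start, end in notes:
        first = round(start / frame_sec)
        last = round(end / frame_sec)
        pitches.append(median(pitch_track[first:last]))
    return pitches


# Illustrative frames only (assumed values, 0.5 s per frame).
envelope = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.9, 0.2]
pitch_track = [0, 0, 60, 60, 61, 0, 0, 62, 62, 0]
notes = segment_notes(envelope, 0.5, start_threshold=0.5, end_threshold=0.2)
pitches = note_pitches(pitch_track, notes, 0.5)
```

Using a lower end threshold than start threshold is a design choice of this sketch that avoids a note being opened and closed on the same borderline level; the text itself only speaks of a specified threshold.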

Here, the symbol a identifies the piece of music, and the symbol i identifies a note interval of the vocal melody in that piece.
In the standard feature amount calculation process, the control unit 50 further determines, for each note interval in the piece, technique feature amounts representing evaluations of a plurality of singing techniques (S150). The singing techniques referred to here include “vibrato”, “tame” (deliberately delaying the onset of a note), “shakuri” (scooping up into the pitch), “fall”, and “kobushi” (a brief ornamental turn of pitch).

To calculate the technique feature amount for “vibrato” (hereinafter referred to as the “vibrato feature amount”) vib(a, i), the control unit 50 first extracts the audio waveform corresponding to each note interval from the vocal data and performs frequency analysis (DFT) on the audio waveform of each note interval. The control unit 50 then calculates the vibrato feature amount vib(a, i) according to equation (1) below.

In equation (1), vib_per(a, i) is an index representing how prominently the spectral peak stands out in the speech waveform of each note section. It may be obtained by dividing the peak value of the frequency analysis result (that is, the amplitude spectrum) by the mean value of the frequency analysis result. Also, vip_dep(a, i) in equation (1) is the standard deviation of the speech waveform of each note section.
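The two quantities that enter equation (1) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a naive DFT is used for self-containment, the DC bin is excluded from the amplitude spectrum, and the population standard deviation is used. How the two values are combined into vib(a, i) is given by equation (1) and is not reproduced here.

```python
import cmath
import math
import statistics


def amplitude_spectrum(samples):
    """Naive DFT magnitudes for bins 1..N/2 (skipping the DC bin is an assumption)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append(abs(coeff))
    return spectrum


def vibrato_indices(samples):
    """Return (peak_ratio, depth) for one note section.

    peak_ratio : spectral peak divided by spectral mean (the vib_per index)
    depth      : standard deviation of the waveform (the second index in equation (1))
    """
    spectrum = amplitude_spectrum(samples)
    peak_ratio = max(spectrum) / statistics.fmean(spectrum)
    depth = statistics.pstdev(samples)
    return peak_ratio, depth
```

A strongly periodic section concentrates its energy in one bin, so the peak ratio is large; a noisy or flat section yields a ratio close to 1.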

To calculate the technique feature amount for "tame" (hereinafter, "tame feature amount") tt(a, i), the control unit 50 first extracts the waveform of the inharmonic components in the accompaniment data as the performance waveform of the rhythm instruments. A well-known technique may be used to extract the inharmonic components. Concrete examples include passing the accompaniment data through a filter prepared in advance to represent the waveform of inharmonic components, and the technique described in "Separation of harmonic and percussive sounds based on the anisotropy of spectrogram smoothness" (Proceedings of the Spring Meeting of the Acoustical Society of Japan, 2-5-8, pp. 903-904 (2008.03)).

Further, in calculating the tame feature amount tt(a, i), the control unit 50 estimates, as beat positions, the timings at which the sound pressure in the performance waveform of the rhythm instruments becomes equal to or greater than a prescribed value. Next, the control unit 50 extracts the note with the shortest note value (hereinafter, "shortest note") from among the notes constituting the vocal melody of the song. The control unit 50 then identifies the utterance timings by dividing the interval between beat positions by the note value of the extracted shortest note. An utterance timing here is a timing at which singing of each note i may begin.

In calculating the tame feature amount tt(a, i), the control unit 50 further identifies the utterance timing that satisfies a prescribed condition. The utterance timing satisfying the prescribed condition is the one that is later than the start timing nnt(a, i) and for which the absolute value of the result of subtracting it from the start timing nnt(a, i) is smallest. The time length obtained by subtracting the identified utterance timing from the start timing nnt(a, i) is then calculated as the tame feature amount tt(a, i).
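The derivation of tt(a, i) from beat positions can be sketched as follows. The function names, the representation of the shortest note as an integer number of subdivisions per beat, and the handling of a note with no later utterance timing are assumptions made for this example; the condition and the subtraction order follow the text literally.

```python
def utterance_timings(beat_times, subdivisions):
    """Split each beat interval into equal parts to get candidate utterance timings.

    subdivisions is assumed to be the number of shortest notes that fit in one beat.
    """
    timings = []
    for left, right in zip(beat_times, beat_times[1:]):
        step = (right - left) / subdivisions
        timings.extend(left + k * step for k in range(subdivisions))
    timings.append(beat_times[-1])
    return timings


def tame_feature(note_start, timings):
    """Return note_start minus the nearest utterance timing later than note_start.

    Returns None when no later timing exists (an assumption; the text does not
    cover this case).
    """
    later = [u for u in timings if u > note_start]
    if not later:
        return None
    nearest = min(later, key=lambda u: abs(note_start - u))
    return note_start - nearest
```

With beats at 0, 1 and 2 seconds and four subdivisions per beat, a note starting at 0.6 s is matched to the grid point at 0.75 s.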

To calculate the technique feature amount for "shakuri" (hereinafter, "shakuri feature amount") rise(a, i), the control unit 50 first derives the differential change obtained by differentiating the temporal change of pitch in the vocal data. Next, the control unit 50 identifies the timings, at or before the start timing nnt(a, i) of each note, at which the differential change becomes positive along the time axis. The control unit 50 then derives, as the shakuri feature amount rise(a, i), the cross-correlation value between the temporal change of pitch in the vocal data over the section from each identified timing to the start timing nnt(a, i) and a predefined model curve.

To calculate the technique feature amount for "fall" (hereinafter, "fall feature amount") fall(a, i), the control unit 50 identifies the first timing, at or after the end timing nft(a, i) of each note section, at which the differential change becomes positive along the time axis. The control unit 50 then derives, as the fall feature amount fall(a, i), the cross-correlation value between the temporal change of pitch in the vocal data over the section from the end timing nft(a, i) of each note section constituting the vocal melody to the identified timing and a predefined model curve.
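Both rise(a, i) and fall(a, i) are cross-correlation values between a pitch contour and a model curve. The text does not specify the exact form of the cross-correlation, so the sketch below assumes a zero-lag normalized correlation between two equally long sequences; the function name and this choice are hypothetical.

```python
import math


def normalized_correlation(contour, template):
    """Zero-lag normalized cross-correlation of two equally long sequences.

    contour  : pitch values over the section of interest (assumed input)
    template : predefined model curve sampled at the same points (assumed input)
    """
    if len(contour) != len(template) or len(contour) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_c = sum(contour) / len(contour)
    mean_t = sum(template) / len(template)
    num = sum((c - mean_c) * (t - mean_t) for c, t in zip(contour, template))
    den = math.sqrt(sum((c - mean_c) ** 2 for c in contour)
                    * sum((t - mean_t) ** 2 for t in template))
    return num / den if den else 0.0
```

A contour that follows the shape of the model curve gives a value near 1, and one that moves in the opposite direction gives a value near -1.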

To calculate the technique feature amount for "kobushi" (hereinafter, "kobushi feature amount") kob(a, i), the control unit 50 first identifies kobushi sections. A kobushi section here is a section in which notes spanning a plurality of pitches are sung on the same vowel while the pitch changes.

To this end, in calculating the kobushi feature amount kob(a, i), the control unit 50 identifies sections uttered on the same vowel (hereinafter, "same-vowel sections"). To identify same-vowel sections, the similarity of the mean mel-frequency cepstral coefficients (MFCC) of the note sections is derived by cross-correlation, and note sections whose cross-correlation value is equal to or greater than a threshold are identified as a same-vowel section.

The control unit 50 also identifies, among the same-vowel sections, only those satisfying a set condition as kobushi sections. The set condition is that the time interval between the end timing nft(a-1, i) and the start timing nnt(a, i) of note sections adjacent along the time axis is equal to or less than a threshold, and that the pitches of the adjacent note sections all differ.
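The set condition for a kobushi section can be sketched as a simple check over the notes of one same-vowel section. The function name, the tuple layout `(start, end, pitch)`, and the reading of "all differ" as "each adjacent pair differs" are assumptions for this example.

```python
def is_kobushi_section(notes, max_gap):
    """Check the set condition on one same-vowel section.

    notes   : list of (start, end, pitch) tuples in time order (assumed layout)
    max_gap : threshold on the gap between one note's end and the next note's start
    """
    if len(notes) < 2:
        return False  # a single note cannot span a plurality of pitches
    for (_, prev_end, prev_pitch), (next_start, _, next_pitch) in zip(notes, notes[1:]):
        if next_start - prev_end > max_gap:
            return False  # notes are not sung in close succession
        if prev_pitch == next_pitch:
            return False  # adjacent notes must differ in pitch
    return True
```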

Then, in calculating the kobushi feature amount kob(a, i), the control unit 50 calculates a chroma vector from the vocal waveform in the kobushi section. The control unit 50 further calculates the chroma vector of the accompaniment data in the same kobushi section, and calculates its cross-correlation value with the chroma vector of the vocal waveform as the kobushi feature amount kob(a, i).

Next, in the standard feature amount calculation process, the control unit 50 calculates song feature amounts as evaluation information evaluating the plurality of singing techniques (S160). A song feature amount is the mean value within one song of each of the vibrato feature amount vib, the tame feature amount tt, the shakuri feature amount rise, the fall feature amount fall, and the kobushi feature amount kob.

To this end, the control unit 50 calculates the song feature amount svib of the vibrato feature amount vib according to equation (2) below, the song feature amount stt of the tame feature amount tt according to equation (3) below, and the song feature amount srise of the shakuri feature amount rise according to equation (4) below. Further, the control unit 50 calculates the song feature amount sfall of the fall feature amount fall according to equation (5) below, and the song feature amount skob of the kobushi feature amount kob according to equation (6) below.

In equations (2) to (5), the symbol Ni is the number of note sections constituting the vocal melody of the song. In equation (6), the symbol Nj is the number of kobushi sections contained in the vocal data.
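Since each song feature amount is described as a mean within one song, the step in S160 can be sketched as below. The function name and the dictionary layout are assumptions; the divisor is simply the number of values supplied for each technique, which corresponds to Ni for note sections and Nj for kobushi sections.

```python
def song_features(section_features):
    """Average per-section technique feature amounts within one song.

    section_features maps a technique name (e.g. "vib", "tt", "rise", "fall",
    "kob") to the list of values obtained for that song (assumed layout).
    Techniques with no values are omitted from the result.
    """
    return {name: sum(values) / len(values)
            for name, values in section_features.items() if values}
```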

Next, in the standard feature amount calculation process, the control unit 50 determines whether the processing up to the calculation of the song feature amounts (that is, S110 to S160) has been executed for all of the song data MD (S170). If the result of the determination in S170 is that this processing has not been executed for all of the song data MD (S170: NO), the control unit 50 returns the standard feature amount calculation process to S110. The control unit 50 then acquires song data MD that has not yet been processed from the information processing server 10 and proceeds to S120.

On the other hand, if the result of the determination in S170 is that the processing up to the calculation of the song feature amounts has been executed for all of the song data MD (S170: YES), the control unit 50 advances the standard feature amount calculation process to S180.

In S180, the control unit 50 calculates the standard feature amounts. The standard feature amounts include the mean value and the standard deviation, taken over all of the song data MD, of each of the song feature amounts svib, stt, srise, sfall, and skob.

Of these, the control unit 50 calculates the mean value msvib of the song feature amount svib according to equation (7) below. Further, the control unit 50 calculates the mean value mstt of the song feature amount stt according to equation (8) below, and the mean value msrise of the song feature amount srise according to equation (9) below. The control unit 50 calculates the mean value msfall of the song feature amount sfall according to equation (10) below, and the mean value mskob of the song feature amount skob according to equation (11) below.

The control unit 50 also calculates the standard deviation sdvib of the song feature amount svib according to equation (12) below, the standard deviation sdtt of the song feature amount stt according to equation (13) below, and the standard deviation sdrise of the song feature amount srise according to equation (14) below. Further, the control unit 50 calculates the standard deviation sdfall of the song feature amount sfall according to equation (15) below, and the standard deviation sdkob of the song feature amount skob according to equation (16) below.

In equations (7) to (16), the symbol NS represents the number of pieces of song data.
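The mean and standard deviation over the NS songs can be sketched as follows. The function name and data layout are assumptions, and the population standard deviation (division by NS) is used as an assumption, because equations (12) to (16) are not reproduced in this text.

```python
import statistics


def standard_features(per_song_features):
    """Compute (mean, standard deviation) of each song feature amount over all songs.

    per_song_features is a list with one dict per song, each mapping a
    technique name to that song's feature amount (assumed layout).
    """
    result = {}
    for name in per_song_features[0]:
        values = [song[name] for song in per_song_features]
        result[name] = (statistics.fmean(values), statistics.pstdev(values))
    return result
```

For example, two songs with svib values of 1.0 and 3.0 give a mean of 2.0 and a standard deviation of 1.0 under this assumption.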

In the standard feature amount calculation process, the control unit 50 uploads the mean values msvib, mstt, msrise, msfall, and mskob and the standard deviations sdvib, sdtt, sdrise, sdfall, and sdkob calculated in S180 to the storage unit 14 of the information processing server 10 as the standard feature amounts (S190). Further, in S190, the control unit 50 stores the vocal data of each song in association with the song feature amounts for the vocal data of the plurality of songs in the storage unit 14 of the information processing server 10.

The standard feature amount calculation process then ends.
That is, in the standard feature amount calculation process of this embodiment, the control unit 50 calculates technique feature amounts representing the evaluation of each singing technique for the vocal data of a plurality of songs. Further, the control unit 50 performs numerical processing on the calculated technique feature amounts and stores the result of that numerical processing in the storage unit 14 of the information processing server 10 as the standard feature amounts.
<Evaluation data generation process>
Next, the evaluation data generation process executed by the control unit 50 of the karaoke apparatus 30 will be described.

The evaluation data generation process is started when a start command for executing the evaluation data generation process is input.
When the evaluation data generation process is started, as shown in FIG. 3, the control unit 50 first acquires one piece of song data MD from among all of the song data MD stored in the information processing server 10 (S210). Next, the control unit 50 acquires the master waveform data included in the song data MD acquired in S210 (S220).

Further, in the evaluation data generation process, the control unit 50 separates the master waveform data acquired in S220 into accompaniment data and vocal data, and extracts the accompaniment data and the vocal data (S230). The separation in S230 may use the same technique as S130 in the standard feature amount calculation process.

Next, in the evaluation data generation process, the control unit 50 performs the transcription process on the vocal data extracted in S230 (S240). The transcription process in S240 may use the same method as the transcription process in S140 of the standard feature amount calculation process.

In the evaluation data generation process, the control unit 50 further determines, for each note section in the song, each of the technique feature amounts for the vocal data extracted in S230 (S250). The technique feature amounts in S250 may be determined by the same method as S150 in the standard feature amount calculation process.

Next, in the evaluation data generation process, the control unit 50 calculates song feature amounts (S260). The song feature amounts calculated in S260 include, for the vocal data extracted in S230, the song (vibrato) feature amount nsvib, the song (tame) feature amount nstt, the song (shakuri) feature amount nsrise, the song (fall) feature amount nsfall, and the song (kobushi) feature amount nskob. As in S160 of the standard feature amount calculation process, these song feature amounts may be derived by obtaining, for each singing technique, the mean value of the technique feature amounts calculated in S250, or by obtaining their standard deviation.

Further, in the evaluation data generation process, the control unit 50 generates evaluation data (S270). The evaluation data referred to here are weights defined for each singing technique such that the more characteristic a singing technique is in the song, the larger its value.

Specifically, in S270, the control unit 50 first calculates the evaluation values dvib, dtt, drise, dfall, and dkob according to equations (17) to (21) below.

Then, in S270, the control unit 50 calculates the weights wvib, wtt, wrise, wfall, and wkob for the singing techniques according to equations (22) to (26) below. That is, in the evaluation data generation process of this embodiment, the characteristic singing techniques, namely those whose technique feature amounts satisfy a predetermined condition, are expressed by the weights.
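Equations (17) to (26) are not reproduced in this text, so the following is only a hypothetical stand-in that satisfies the stated property (a technique that is more characteristic of the song receives a larger weight). It assumes that the evaluation value is the absolute deviation of the song feature amount from the standard mean, measured in standard deviations, and that the weights are the evaluation values normalized to sum to 1. Neither assumption is confirmed by the source, and the function name is invented for the example.

```python
def technique_weights(song_values, standard):
    """Hypothetical weights: larger for techniques that deviate more from the standard.

    song_values : technique name -> song feature amount (e.g. nsvib)
    standard    : technique name -> (mean, standard deviation) over all songs
    """
    evaluation = {}
    for name, value in song_values.items():
        mean, stdev = standard[name]
        # Assumed evaluation value: absolute deviation in units of standard deviation.
        evaluation[name] = abs(value - mean) / stdev if stdev > 0 else 0.0
    total = sum(evaluation.values())
    if total == 0:
        # No technique stands out; fall back to equal weights (an assumption).
        return {name: 1.0 / len(evaluation) for name in evaluation}
    return {name: d / total for name, d in evaluation.items()}
```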

In addition, in S270, the control unit 50 obtains, for each of the song feature amounts nsvib, nstt, nsrise, nsfall, and nskob, the directions dirvib, dirtt, dirrise, dirfall, and dirkob relative to the standard feature amounts.

Specifically, the control unit 50 calculates dirvib, dirtt, dirrise, dirfall, and dirkob according to the following equations.
dirvib = sign(nsvib - msvib)
dirtt = sign(nstt - mstt)
dirrise = sign(nsrise - msrise)
dirfall = sign(nsfall - msfall)
dirkob = sign(nskob - mskob)
Here, "sign" is the sign function, which returns "1" or "-1" according to the sign of the value in parentheses. That is, each of the directions dirvib, dirtt, dirrise, dirfall, and dirkob is either "1" or "-1".
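The direction calculation can be sketched as follows. The text states that the function returns only "1" or "-1" and does not say what happens when the difference is exactly zero; returning 1 in that case is an assumption of this sketch, as is the function name.

```python
def direction(song_value, standard_mean):
    """Return 1 if the song feature amount is at or above the standard mean, else -1.

    Treating a zero difference as 1 is an assumption; the text does not cover it.
    """
    return 1 if song_value - standard_mean >= 0 else -1
```

For example, `direction(nsvib, msvib)` would give dirvib for the vibrato technique.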

Then, in S270, as shown in FIG. 5, the control unit 50 generates the evaluation data by associating, for each singing technique, the song identification information, the standard feature amounts, the weights wvib, wtt, wrise, wfall, and wkob, and the directions dirvib, dirtt, dirrise, dirfall, and dirkob. The numerical values enclosed in quotation marks in FIG. 5 are examples of the standard feature amounts, weights, and directions.

Next, in the evaluation data generation process, the control unit 50 generates reference data from the result of the transcription process in S240 (S290). The reference data referred to here are data representing the notes (that is, the pitches and note values) constituting the melody to be sung.

Then, in the evaluation data generation process, the control unit 50 associates the song identification information, the evaluation data, and the reference data with one another and uploads them to the storage unit 14 of the information processing server 10 (S300).

The evaluation data generation process then ends.
That is, in the evaluation data generation process, the control unit 50 generates, as an evaluation feature amount, information including weights that take larger values for singing techniques that are more characteristic of the song. Further, the control unit 50 generates the reference data and stores it together with the evaluation data in the storage unit 14 of the information processing server 10.
<Karaoke scoring process>
Next, the karaoke scoring process executed by the control unit 50 of the karaoke apparatus 30 will be described.

The karaoke scoring process is started when a command to start the processing program for executing the karaoke scoring process is input.
When the karaoke scoring process is started, as shown in FIG. 4, the control unit 50 first acquires, from the storage unit 14 of the information processing server 10, the song data MD corresponding to the song specified via the input receiving unit 34 (S510). Next, the control unit 50 extracts the accompaniment data included in the song data MD acquired in S510 (S520).

Then, in the karaoke scoring process, the control unit 50 reproduces the accompaniment data to play the song (S530). Specifically, in S530, the control unit 50 outputs the accompaniment data to the music reproducing unit 36, and the music reproducing unit 36, having received the accompaniment data, reproduces the song. The sound source signal of the song reproduced by the music reproducing unit 36 is output to the speaker 60 via the output unit 42. The speaker 60 then converts the sound source signal into sound and outputs it.

Further, in the karaoke scoring process, the control unit 50 acquires the voice input via the microphone 62 and the microphone input unit 44 as singing data (S540). The control unit 50 then stores the singing data acquired in S540 in the storage unit 38 (S550).

Next, in the karaoke scoring process, the control unit 50 determines whether the performance of the song has ended (S560). If the result of this determination is that the performance has not ended (S560: NO), the control unit 50 returns the karaoke scoring process to S540. On the other hand, if the result of the determination in S560 is that the performance has ended (S560: YES), the control unit 50 advances the karaoke scoring process to S570.

In S570, the control unit 50 acquires all of the singing data stored in the storage unit 38. The control unit 50 then extracts, from the singing data along the time axis of the song, the singing waveform of each section in which a note constituting the vocal melody was sung (hereinafter, "note singing section") (S580). The note singing sections may be identified by the same method as S140 in the standard feature amount calculation process.

Next, in the karaoke scoring process, the control unit 50 calculates technique feature amounts evaluating the singing techniques in the singing data (hereinafter, "singing feature amounts") (S590). The method of calculating the singing feature amounts is the same as S150 and S160 in the standard feature amount calculation process, except that "vocal data" is read as "singing data", so a detailed description is omitted here.

Further, in the karaoke scoring process, the control unit 50 calculates a reference evaluation score (S600). To calculate the reference evaluation score in S600, the control unit 50 first compares the pitch transition of the singing waveform in each note singing section with the pitch transition in the reference data. The control unit 50 then calculates a reference evaluation score that is higher the greater the degree of agreement found by the comparison.
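The text only requires that the reference evaluation score increase with the degree of agreement between the sung pitches and the reference data. One possible scoring rule, shown purely as an assumption, is the fraction of note singing sections whose pitch lies within a tolerance of the reference pitch, scaled to a full mark. The function name, the tolerance of 0.5 (in semitones), and the 100-point scale are all hypothetical.

```python
def reference_score(sung_pitches, reference_pitches, tolerance=0.5, full_marks=100.0):
    """Score the agreement between sung and reference pitches per note section.

    Pitches are assumed to be given in semitone units (e.g. MIDI note numbers),
    one value per note singing section, in the same order for both lists.
    """
    if len(sung_pitches) != len(reference_pitches) or not reference_pitches:
        raise ValueError("pitch lists must be non-empty and equally long")
    matched = sum(1 for sung, ref in zip(sung_pitches, reference_pitches)
                  if abs(sung - ref) <= tolerance)
    return full_marks * matched / len(reference_pitches)
```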

Next, in the karaoke scoring process, the control unit 50 calculates a technique evaluation score (S610). To calculate the technique evaluation score in S610, the control unit 50 first averages the singing feature amounts calculated in S590 for each singing technique. The control unit 50 then calculates the technique evaluation score according to the following equation.

Technique evaluation score = α × Σ(weight × direction × (singing feature amount - mean of song feature amount) / standard deviation of song feature amount)
In this equation, the sum Σ is taken over the singing techniques, and the direction is either "1" or "-1". The symbol α is a weight applied to the technique evaluation score and is a predefined constant.

That is, the above equation for the technique evaluation score is a weighted sum of the differences between the singing feature amounts and the means of the song feature amounts, in which a singing technique that more strongly characterizes the song is given a larger weight. The method of obtaining the technique evaluation score is not limited to this; it may instead be a weighted average in which a singing technique that more strongly characterizes the song is given a larger weight.
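The technique evaluation score equation can be written directly as code. The sketch below follows the equation term by term; the function name and the use of dictionaries keyed by technique name are assumptions made for illustration.

```python
def technique_score(alpha, weights, directions, singing, means, stdevs):
    """alpha * sum over techniques of weight * direction * (singing - mean) / stdev.

    All dict arguments map a technique name to a value:
      weights, directions : from the evaluation data
      singing             : singing feature amount averaged per technique
      means, stdevs       : mean and standard deviation of the song feature amount
    """
    total = 0.0
    for name in weights:
        deviation = (singing[name] - means[name]) / stdevs[name]
        total += weights[name] * directions[name] * deviation
    return alpha * total
```

Because the deviation is multiplied by the direction, a singer who departs from the mean in the same direction as the original recording gains points, while departing in the opposite direction loses points.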

Further, in the karaoke scoring process, the control unit 50 calculates a total evaluation score by adding the technique evaluation score calculated in S610 to the reference evaluation score calculated in S600 (S620). The control unit 50 then displays the total evaluation score calculated in S620 on the display unit 64 (S630). The display in S630 is realized by the control unit 50 outputting a control signal to the display unit 64 via the video control unit 46. The scores displayed on the display unit 64 are not limited to the total evaluation score and may include at least one of the reference evaluation score and the technique evaluation score.

The karaoke scoring process then ends and waits until the next start timing.
That is, in the karaoke scoring process, the control unit 50 stores the voice input during the performance of the song as singing data. The control unit 50 then analyzes the stored singing data to calculate the singing feature amounts. Further, based on the weights serving as the evaluation data, the control unit 50 calculates the technique evaluation score by a weighted calculation in which the weights are applied so that the score is higher the more strongly the techniques characteristic of the song are expressed.

Also, in the karaoke scoring process, the control unit 50 compares the pitch transition of the singing waveform in each note singing section with the pitch transition in the reference data, and calculates a reference evaluation score that is higher the greater the degree of agreement. The control unit 50 then calculates the sum of the technique evaluation score and the reference evaluation score as the total evaluation score.
[Effects of the embodiment]
As described above, according to the karaoke system 1, evaluation data can be generated automatically by analyzing the song data MD.

Moreover, according to the karaoke system 1, evaluation data can be generated that include weights taking larger values for singing techniques that are more characteristic of the song.
In addition, according to the karaoke system 1, singing can be evaluated using such evaluation data, and it is possible to evaluate whether the singing techniques a user employed when singing the song are the characteristic techniques.

In the karaoke scoring process of this embodiment, the technique evaluation score is computed so that the larger the difference between the singing feature amount and the mean of the song feature amounts, the higher the evaluation.
In other words, the larger the difference between the singing feature amount and the mean of the song feature amounts, the more strongly the techniques characteristic of the song can be said to be expressed.

このため、本実施形態のカラオケ採点処理によれば、楽曲における特徴技巧を強く表現した歌唱を高く評価できる。したがって、カラオケシステム１によれば、評価結果について、利用者の歌唱を聴いた人物が違和感を覚えることを低減できる。
［その他の実施形態］
以上、本発明の実施形態について説明したが、本発明は上記実施形態に限定されるものではなく、本発明の要旨を逸脱しない範囲において、様々な態様にて実施することが可能である。 For this reason, according to the karaoke scoring process of this embodiment, the song which strongly expressed the characteristic technique in the music can be highly evaluated. Therefore, according to the karaoke system 1, it can reduce that the person who listened to the user's song feels strange about an evaluation result.
[Other Embodiments]
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various forms without departing from the gist of the present invention.

For example, in S270 of the evaluation data generation process of the above embodiment, the weights wvib, wtt, wrise, wfall, and wkob are presented as the characteristic singing techniques that satisfy a predetermined condition among the skill feature amounts, but the characteristic singing technique is not limited to weights.

That is, in the present invention, the singing technique for which the evaluation value dvib, dtt, drise, dfall, or dkob is largest may be specified as the characteristic singing technique. In this case, the control unit 50 may generate, as the evaluation data, information that associates the average value and standard deviation of the music feature amount for the characteristic singing technique, the direction from the standard feature amount, and the music identification information.
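This variation can be sketched as follows. It is a minimal sketch and not the embodiment's data format: the function name `build_evaluation_data`, the dictionary keys, and the encoding of the direction as +1, -1, or 0 are assumptions introduced for illustration.

```python
def build_evaluation_data(song_id, d_values, song_mean, song_std, standard):
    """Pick the technique with the largest evaluation value and build a record.

    d_values: technique -> evaluation value (dvib, dtt, drise, dfall, dkob).
    song_mean, song_std: technique -> mean / standard deviation of the music
        feature amount in the piece.
    standard: technique -> standard feature amount over many pieces.
    The record layout and the sign encoding are assumptions.
    """
    # The characteristic singing technique is the one whose value is largest.
    technique = max(d_values, key=d_values.get)
    diff = song_mean[technique] - standard[technique]
    # +1: piece mean lies above the standard, -1: below, 0: equal.
    direction = (diff > 0) - (diff < 0)
    return {
        "song_id": song_id,
        "technique": technique,
        "mean": song_mean[technique],
        "std": song_std[technique],
        "direction": direction,
    }
```

Storing the direction alongside the mean and standard deviation lets a later scoring step tell whether the piece uses the technique more or less than is standard, which a magnitude alone would not show.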

In the above embodiment, the karaoke device 30 executes the standard feature amount calculation process, the evaluation data generation process, and the karaoke scoring process, but the device that executes these processes is not limited to the karaoke device 30. For example, these processes may be executed by the information processing server 10 or by another information processing apparatus.

An aspect in which part of the configuration of the above embodiment is omitted, to the extent that the problem can still be solved, is also an embodiment of the present invention. An aspect configured by appropriately combining the above embodiment and the modifications is also an embodiment of the present invention. Furthermore, any aspect conceivable without departing from the essence of the invention specified by the wording of the claims is also an embodiment of the present invention.
[Correspondence between Embodiment and Claims]
Finally, the relationship between the description of the above embodiment and the description of the claims will be explained.

The function obtained by executing S210 in the evaluation data generation process of the above embodiment corresponds to the music data acquisition means recited in the claims, and the function obtained by executing S220 corresponds to the extraction means. The function obtained by executing S240 to S260 in the evaluation data generation process corresponds to the first determining means recited in the claims, and the function obtained by executing S270 and S280 corresponds to the second determining means.

The function obtained by executing the karaoke process of the above embodiment corresponds to the execution means recited in the claims, the function obtained by executing S540 of the karaoke process corresponds to the singing acquisition means, and the function obtained by executing S530 corresponds to the performance means.

Further, the function obtained by executing S110 to S160 in the standard feature amount calculation process of the above embodiment corresponds to the standard calculation means recited in the claims, the function obtained by executing S180 corresponds to the evaluation information generation means, and the function obtained by executing S190 corresponds to the storage control means.

[Description of Symbols]
1 ... Karaoke system, 10 ... Information processing server, 12 ... Communication unit, 14 ... Storage unit, 16 ... Control unit, 18 ... ROM, 20 ... RAM, 22 ... CPU,
30 ... Karaoke device, 32 ... Communication unit, 34 ... Input reception unit, 36 ... Music playback unit, 38 ... Storage unit, 40 ... Audio control unit, 42 ... Output unit, 44 ... Microphone input unit, 46 ... Video control unit,
50 ... Control unit, 52 ... ROM, 54 ... RAM, 56 ... CPU, 60 ... Speaker, 62 ... Microphone, 64 ... Display unit

Claims

1. An information processing apparatus comprising:
music data acquisition means for acquiring music data from a first storage unit that stores music data including sung vocal sounds;
extraction means for extracting, from the music data acquired by the music data acquisition means, vocal data representing the vocal sounds;
first determining means for determining, for the vocal data extracted by the extraction means, skill feature amounts representing evaluations of a plurality of singing techniques;
second determining means for determining, based on evaluation information stored in a second storage unit that stores vocal data of a plurality of pieces of music and evaluation information obtained by evaluating the plurality of singing techniques for the vocal data of the plurality of pieces of music, a characteristic singing technique that satisfies a predetermined condition among the skill feature amounts determined by the first determining means; and
execution means for evaluating singing of the music using the characteristic singing technique determined by the second determining means.

2. The information processing apparatus according to claim 1, further comprising
singing acquisition means for acquiring singing data representing voice input during the performance of the music,
wherein the execution means evaluates the singing represented by the singing data acquired by the singing acquisition means, using the characteristic singing technique determined by the second determining means.

3. The information processing apparatus according to claim 2, wherein
the extraction means extracts, from the music data, the vocal data and accompaniment data representing accompaniment sounds of the music,
the information processing apparatus further comprises performance means for playing the music based on the accompaniment data extracted by the extraction means, and
the singing acquisition means acquires, as the singing data, voice input while the performance means is playing the music.

4. The information processing apparatus according to claim 1, wherein
the execution means derives a difference between the skill feature amount in the singing data and the characteristic singing technique, and evaluates the singing of the music more highly as the derived difference is larger.

5. The information processing apparatus according to claim 4, wherein
the first determining means determines the skill feature amount for each singing technique,
the second determining means determines the characteristic singing technique for each singing technique, and
the execution means derives, for each singing technique, the difference between the skill feature amount in the singing data and the characteristic singing technique, assigns a greater weight to a singing technique that more strongly expresses a feature of the music, and evaluates the singing of the music by taking a weighted average of the derived differences.

6. The information processing apparatus according to any one of the above claims, further comprising:
standard calculation means for calculating a standard feature amount representing a standard evaluation of a singing technique for the vocal data of the plurality of pieces of music;
evaluation information generation means for numerically processing the standard feature amount derived by the standard calculation means and calculating the result of the numerical processing as the evaluation information; and
storage control means for storing the evaluation information calculated by the evaluation information generation means in the second storage unit in association with the vocal data of the plurality of pieces of music.

7. A data generation method comprising:
a music data acquisition step of acquiring music data from a first storage unit that stores music data including sung vocal sounds;
an extraction step of extracting, from the music data acquired in the music data acquisition step, vocal data representing the vocal sounds;
a first determination step of determining, for the vocal data extracted in the extraction step, skill feature amounts representing evaluations of a plurality of singing techniques;
a second determination step of determining, based on evaluation information stored in a second storage unit that stores vocal data of a plurality of pieces of music and evaluation information obtained by evaluating the plurality of singing techniques for the vocal data of the plurality of pieces of music, a characteristic singing technique that satisfies a predetermined condition among the skill feature amounts determined in the first determination step; and
an execution step of evaluating singing of the music using the characteristic singing technique determined in the second determination step.

8. A program for causing a computer to execute:
a music data acquisition procedure of acquiring music data from a first storage unit that stores music data including sung vocal sounds;
an extraction procedure of extracting, from the music data acquired by the music data acquisition procedure, vocal data representing the vocal sounds;
a first determination procedure of determining, for the vocal data extracted by the extraction procedure, skill feature amounts representing evaluations of a plurality of singing techniques;
a second determination procedure of determining, based on evaluation information stored in a second storage unit that stores vocal data of a plurality of pieces of music and evaluation information obtained by evaluating the plurality of singing techniques for the vocal data of the plurality of pieces of music, a characteristic singing technique that satisfies a predetermined condition among the skill feature amounts determined by the first determination procedure; and
an execution procedure of evaluating singing of the music using the characteristic singing technique determined by the second determination procedure.