JP2015069083A

JP2015069083A - Information processing device, data generation method and program

Info

Publication number: JP2015069083A
Application number: JP2013204485A
Authority: JP
Inventors: 典昭阿瀬見; Noriaki Asemi
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2013-09-30
Filing date: 2013-09-30
Publication date: 2015-04-13
Anticipated expiration: 2033-09-30
Also published as: JP6011506B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for generating evaluation data. SOLUTION: In evaluation data generation processing, a control unit 50 extracts vocal data representing a singing voice from musical piece data (S230), and determines, for each note section (the section of a predetermined note composing the musical piece data), a skill characteristic amount representing evaluation of a plurality of singing skills (S240). Furthermore, in the evaluation data generation processing, a skill characteristic amount which satisfies a predetermined condition is determined among the skill characteristic amounts in the respective note sections, and data in which the note section corresponding to the determined skill characteristic amount is associated with the determined skill characteristic amount is generated as evaluation data used for singing evaluation (S280).

Description

The present invention relates to an information processing apparatus, a data generation method, and a program for generating evaluation data.

Conventionally, karaoke apparatuses calculate an evaluation score by adding an additional score, which evaluates the singing techniques used during singing, to a reference score based on the pitch transition of the singing voice (see Patent Document 1).

In such a karaoke apparatus, the reference score is calculated so that it becomes higher as the deviation becomes smaller between reference data, which represents the melody to be sung and is prepared in advance for each song, and the pitch transition of the voice singing the song. Furthermore, the karaoke apparatus described in Patent Document 1 analyzes the singing voice to detect the various singing techniques used during singing, and calculates a larger additional score the more often singing techniques are used.

JP 2007-233013 A

In typical popular songs, the types of singing technique that are mainly used when singing a song (hereinafter referred to as "characteristic techniques") differ depending on the genre of the song, the singer, and so on.
For this reason, in the karaoke apparatus described in Patent Document 1, it is conceivable to check the singing techniques detected from the singing voice against evaluation data generated in advance and to award the additional score when they match. Such evaluation data could, for example, associate the singing technique to be used when singing the song with the timing at which that technique should be used. However, such evaluation data normally has to be created by hand in advance for each song, and with conventional technology it has been difficult to generate evaluation data automatically from music data.

Accordingly, an object of the present invention is to provide a technique for generating evaluation data.

The present invention, made to achieve the above object, is an information processing apparatus comprising music data acquisition means, extraction means, determination means, and generation means.
The music data acquisition means acquires music data from a first storage unit that stores music data containing a sung singing sound. The extraction means extracts, from the music data acquired by the music data acquisition means, vocal data representing the singing sound.

Further, for the vocal data extracted by the extraction means, the determination means determines, for each note section (the section of a given note constituting the music data), technique feature amounts representing evaluations of a plurality of singing techniques. The generation means then determines, among the technique feature amounts of the note sections determined by the determination means, the technique feature amounts that satisfy a predetermined condition, and generates, as evaluation data used for evaluating singing, data in which each note section corresponding to a determined technique feature amount is associated with that technique feature amount.

Such an information processing apparatus can automatically generate evaluation data corresponding to music data.
Therefore, according to the present invention, evaluation data can be generated automatically even for music data of a song written and composed by an ordinary person.

The information processing apparatus of the present invention may further comprise singing acquisition means and evaluation means.
The singing acquisition means acquires singing data representing the voice input during the performance of a song. The evaluation means uses the evaluation data generated by the generation means to evaluate the singing techniques in the input voice, based on the singing data acquired by the singing acquisition means.

Such an information processing apparatus can evaluate the singing techniques in the voice of a person singing the song.
Further, the extraction means may extract, from the music data, both the vocal data and accompaniment data representing the accompaniment sound of the song.

In this case, the information processing apparatus of the present invention may further comprise performance means for playing the song based on the accompaniment data extracted by the extraction means. The singing acquisition means may then acquire, as the singing data, the voice input while the performance means is playing the song.

Such an information processing apparatus can play a song based on the music data stored in the first storage unit and evaluate the singing techniques in the voice (singing voice) input during the performance.
Further, the generation means may determine, among the technique feature amounts of the note sections determined by the determination means, the technique feature amounts corresponding to singing techniques used characteristically in the song as the technique feature amounts that satisfy the predetermined condition.

Such an information processing apparatus can determine the technique feature amounts corresponding to the singing techniques used characteristically in the song as the technique feature amounts that satisfy the predetermined condition.
As a result, when singing is evaluated using the evaluation data generated by such an information processing apparatus, the evaluation can match the impression that many people have of the song.

The information processing apparatus of the present invention may comprise standard acquisition means for acquiring standard feature amounts from a second storage unit that stores standard feature amounts, each representing a standard evaluation of a singing technique used across a plurality of songs.

In this case, when the difference between a technique feature amount of a note section determined by the determination means and the standard feature amount acquired by the standard acquisition means falls outside a reference range, the generation means may determine that technique feature amount as a technique feature amount satisfying the predetermined condition.
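The selection rule described here can be illustrated with a minimal Python sketch. The record type `EvaluationEntry`, the function name `select_outliers`, the dictionary layout of the inputs, and the concrete reference range are all hypothetical choices made for illustration; the source only states that a technique feature amount is selected when its difference from the standard feature amount lies outside a reference range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationEntry:
    """One evaluation-data record: a note section paired with a feature amount."""
    note_index: int   # index i of the note section (hypothetical layout)
    technique: str    # name of the singing technique, e.g. "vibrato"
    feature: float    # technique feature amount for that note section


def select_outliers(features, standard, ref_range):
    """Keep feature amounts whose difference from the standard is outside ref_range.

    features  -- list with one dict per note section, mapping technique -> value
    standard  -- dict mapping technique -> standard feature amount
    ref_range -- (low, high) bounds on the allowed difference (an assumption)
    """
    low, high = ref_range
    entries = []
    for i, per_note in enumerate(features):
        for technique, value in per_note.items():
            diff = value - standard[technique]
            if diff < low or diff > high:
                entries.append(EvaluationEntry(i, technique, value))
    return entries
```

With a reference range of (-0.2, 0.2), a note section whose vibrato value exceeds the standard by 0.4 would be kept, while sections that stay close to the standard would be dropped.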

Such an information processing apparatus can generate evaluation data based on the difference between the standard feature amounts and the technique feature amounts.
Further, the generation means may comprise distribution calculation means for calculating a feature amount distribution in which the technique feature amounts of the note sections determined by the determination means are aggregated for each combination of pitch and note value of the note sections.

In this case, when a technique feature amount falls within a range of the feature amount distribution calculated by the distribution calculation means that is significant as a characteristic of the song, the generation means may determine that technique feature amount as a technique feature amount satisfying the predetermined condition.
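The aggregation by pitch and note value can be sketched as follows. This is only an illustration: the function name `feature_distribution`, the tuple layout of the input, and the use of the arithmetic mean as the aggregate are assumptions, and the sketch does not attempt to define the "significant range", which this passage leaves unspecified.

```python
from collections import defaultdict
from statistics import mean


def feature_distribution(notes):
    """Aggregate technique feature amounts per (pitch, note value) pair.

    notes -- iterable of (pitch, note_value, feature) tuples, one per note
             section (hypothetical layout). The mean is used as the aggregate,
             which is an assumption rather than something the source specifies.
    """
    buckets = defaultdict(list)
    for pitch, note_value, feature in notes:
        buckets[(pitch, note_value)].append(feature)
    return {key: mean(values) for key, values in buckets.items()}
```

Because the grouping uses only information derived from the vocal data of the song itself, no externally prepared standard values are needed for this step.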

With such an information processing apparatus, the only data needed to generate the evaluation data is the vocal data of the song.
The present invention may also be implemented as a data generation method for generating evaluation data.

In this case, the data generation method may comprise: a music data acquisition step of acquiring music data from the first storage unit; an extraction step of extracting vocal data from the acquired music data; a determination step of determining, for the extracted vocal data, technique feature amounts for a plurality of singing techniques for each note section, which is the section of a given note constituting the music data; and a generation step of determining, among the determined technique feature amounts of the note sections, the technique feature amounts that satisfy a predetermined condition, and generating, as evaluation data used for evaluating singing, data in which each note section corresponding to a determined technique feature amount is associated with that technique feature amount.

Such a data generation method provides the same effects as the information processing apparatus according to claim 1.
The present invention may also be implemented as a program executed by a computer.

In this case, the program causes a computer to execute: a music data acquisition procedure of acquiring music data from the first storage unit; an extraction procedure of extracting vocal data from the acquired music data; a determination procedure of determining, for the extracted vocal data, technique feature amounts for a plurality of singing techniques for each note section, which is the section of a given note constituting the music data; and a generation procedure of determining, among the determined technique feature amounts of the note sections, the technique feature amounts that satisfy a predetermined condition, and generating, as evaluation data used for evaluating singing, data in which each note section corresponding to a determined technique feature amount is associated with that technique feature amount.

When the present invention is implemented as a program, it can be used by loading it into a computer from a recording medium and starting it as needed, or by having the computer obtain it over a communication line and start it as needed. By causing a computer to execute each procedure, the computer can function as the information processing apparatus according to claim 1.

The recording medium referred to here includes, for example, computer-readable electronic media such as a DVD-ROM, a CD-ROM, and a hard disk.

A block diagram showing the schematic configuration of a karaoke system including an information processing apparatus to which the present invention is applied.
A flowchart showing the procedure of the standard feature amount calculation process.
A flowchart showing the procedure of the evaluation data generation process in the first embodiment.
A flowchart showing the procedure of the karaoke scoring process.
A flowchart showing the procedure of the evaluation data generation process in the second embodiment.

Embodiments of the present invention are described below with reference to the drawings.
[First embodiment]
<Configuration of karaoke system>
The karaoke system 1 shown in FIG. 1 is a system that plays a song specified by a user, with the user singing along to the performance.

To achieve this, the karaoke system 1 includes an information processing server 10 and at least one karaoke apparatus 30. The information processing server 10 and the karaoke apparatus 30 are connected via a communication network, which may be either wired or wireless.

The information processing server 10 stores music data MD-1 to MD-N, prepared for each song. The karaoke apparatus 30 acquires from the information processing server 10 the music data MD corresponding to the song specified by the user, plays the song based on that music data MD, and accepts voice input while the song is being played.

Here, the suffix "N" is an identifier that distinguishes pieces of music data MD, and N is a natural number of 2 or more.
<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

The communication unit 12 handles communication between the information processing server 10 and external devices via the communication network.
The control unit 16 is a well-known control device built around a well-known microcomputer including a ROM 18, a RAM 20, and a CPU 22. The ROM 18 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 20 temporarily stores processing programs and data. The CPU 22 executes each process (various calculations) in accordance with the processing programs stored in the ROM 18 and the RAM 20.

That is, the control unit 16 controls each unit of the information processing server 10 and carries out data communication with the karaoke apparatus 30.
The storage unit 14 is a well-known storage device whose contents can be read and written. The storage unit 14 stores at least a plurality of pieces of music data MD.

Each piece of music data MD includes music management information describing information about the song, master waveform data representing the performance sound of the song, and lyrics data representing the lyrics of the song. The music management information includes at least music identification information (for example, a song number) that identifies the song.

The master waveform data of this embodiment is audio data containing the performance sounds of a plurality of musical instruments and the singing sound of the main melody. This audio data may be an audio file in an uncompressed audio file format or an audio file in a compressed audio format.

In the following, the audio data representing the performance sounds contained in the master waveform data is referred to as accompaniment data, and the audio data representing the singing sound contained in the master waveform data is referred to as vocal data.
The instrument sounds contained in the accompaniment data of this embodiment include the sounds of percussion instruments (for example, drums, taiko drums, and cymbals), string instruments (for example, guitar and bass), struck string instruments (for example, piano), and wind instruments (for example, trumpet and clarinet). In typical songs, the percussion instruments and the bass usually serve as the rhythm instruments.

The music data MD stored in the storage unit 14 includes, in addition to music data MD of songs composed by professionals, music data MD of songs written and composed by ordinary users of the karaoke system 1. The music data MD of a song written and composed by an ordinary user is created on a well-known information processing terminal (for example, a personal computer or a mobile terminal) and uploaded to the information processing server 10.
<Karaoke apparatus>
The karaoke apparatus 30 includes a communication unit 32, an input reception unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 handles communication between the karaoke apparatus 30 and external devices via the communication network. The input reception unit 34 is an input device that accepts information and commands in response to external operations. Input devices in this embodiment include, for example, keys, switches, and a remote-control receiver.

The music playback unit 36 plays back songs based on the music data MD stored in the storage unit 38 or the music data MD downloaded from the information processing server 10. The audio control unit 40 is a device that controls audio input and output, and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, through which the microphone input unit 44 acquires the user's singing sound. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the song played back by the music playback unit 36 and the sound source signal of the singing sound from the microphone input unit 44. The speaker 60 converts the sound source signals output from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video based on the video data sent from the control unit 50. A display unit 64 that displays the video is connected to the video control unit 46.
The control unit 50 is built around a well-known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 54 temporarily stores processing programs and data. The CPU 56 executes each process (various calculations) in accordance with the processing programs stored in the ROM 52 and the RAM 54.

The ROM 52 stores a processing program with which the control unit 50 executes the karaoke scoring process, a processing program for executing the evaluation data generation process, and a processing program for executing the standard feature amount calculation process.

The karaoke scoring process plays the song specified by the user and evaluates the voice input through the microphone 62 during the performance. The evaluation data generation process generates, for each piece of music data MD, the evaluation data required by the karaoke scoring process. The standard feature amount calculation process calculates the standard feature amounts used to generate the evaluation data.

In other words, the karaoke apparatus 30 calculates the standard feature amounts, which serve as evaluation information, in the standard feature amount calculation process, and generates evaluation data for each piece of music data MD in the evaluation data generation process. In the karaoke scoring process, the karaoke apparatus 30 plays a song based on the music data MD corresponding to the target song and acquires, as singing data, the voice input through the microphone 62 during the performance. The karaoke apparatus 30 then scores and evaluates the acquired singing data in the karaoke scoring process.

That is, the karaoke apparatus 30 functions as an information processing apparatus that executes the standard feature amount calculation process, the evaluation data generation process, and the karaoke scoring process.
<Standard feature amount calculation process>
Next, the standard feature amount calculation process executed by the control unit 50 of the karaoke apparatus 30 is described.

The standard feature amount calculation process is started at predefined time intervals. The start timing is not limited to these intervals, however; the process may also be started when a start command for launching the processing program (application) that executes the standard feature amount calculation process is input via the input reception unit 34.

As shown in FIG. 2, when the standard feature amount calculation process is started, the control unit 50 first acquires one piece of music data MD from among all the music data MD stored in the information processing server 10 (S110). The control unit 50 then acquires the master waveform data contained in the music data MD acquired in S110 (S120).

Next, in the standard feature amount calculation process, the control unit 50 separates the master waveform data acquired in S120 into accompaniment data and vocal data, and extracts both (S130). In S130, the control unit 50 may separate the accompaniment data from the vocal data using a well-known method (for example, "PreFEst" described in JP 2008-134606 A). PreFEst is a method that separates the most dominant audio waveform in the master waveform data as the vocal data and the remaining audio waveform as the accompaniment data.

Next, in the standard feature amount calculation process, the control unit 50 performs a transcription process on the vocal data extracted in S130 (S140). The transcription process in S140 is a well-known technique that transcribes notes based on the time variation of the sound pressure and the time variation of the pitch in the vocal data.

Specifically, in the transcription process, the control unit 50 identifies each timing at which the time variation of the sound pressure in the vocal data becomes equal to or greater than a specified threshold as the start timing nnt(a, i) of a note constituting the singing melody of the song. The control unit 50 likewise identifies each timing at which the time variation of the sound pressure in the vocal data becomes equal to or less than the specified threshold as the end timing nft(a, i) of a note constituting the singing melody of the song.

In the transcription process, the control unit 50 identifies the section delimited by each corresponding pair of start timing nnt(a, i) and end timing nft(a, i) as the note section of that note. In addition, the control unit 50 identifies the pitch in each note section based on the time variation of the pitch in the vocal data, and associates each note section with its pitch nn(a, i).
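The transcription step can be sketched in Python under some simplifying assumptions. Here the "time variation of the sound pressure" is read as a per-frame sound-pressure envelope compared against a single threshold, the pitch of a note section is taken as the median of the per-frame pitch values, and the function name `transcribe` and its parameters are hypothetical. None of these choices is confirmed by the source, which refers only to a well-known technique.

```python
from statistics import median


def transcribe(envelope, pitch, threshold, frame_sec):
    """Return a list of (start_time, end_time, pitch) tuples, one per note section.

    envelope  -- per-frame sound pressure of the vocal data (assumed input)
    pitch     -- per-frame pitch estimate, same length as envelope
    threshold -- sound-pressure threshold for note onsets and offsets
    frame_sec -- duration of one frame in seconds
    """
    notes = []
    start = None
    for k, level in enumerate(envelope):
        if start is None and level >= threshold:
            # Envelope rises to the threshold: start timing nnt of a note.
            start = k
        elif start is not None and level < threshold:
            # Envelope falls below the threshold: end timing nft of the note.
            # A strict "<" is used here so that a note cannot end on the same
            # frame in which it started.
            notes.append((start * frame_sec, k * frame_sec, median(pitch[start:k])))
            start = None
    if start is not None:
        # The vocal data ended while a note was still sounding.
        end = len(envelope)
        notes.append((start * frame_sec, end * frame_sec, median(pitch[start:end])))
    return notes
```

The median is used only because it is robust against brief pitch excursions at the edges of a note; any other per-section pitch estimate could be substituted.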

Here, the symbol a identifies the song, and the symbol i identifies a note section of the singing melody in the song.
In the standard feature amount calculation process, the control unit 50 further determines, for each note section in the song, technique feature amounts representing evaluations of a plurality of singing techniques (S150). The singing techniques referred to here include "vibrato", "tame" (deliberately delaying the onset of a note), "shakuri" (scooping up into a note from a lower pitch), "fall", and "kobushi" (a brief ornamental turn in pitch).

To calculate the technique feature amount for "vibrato" (hereinafter referred to as the "vibrato feature amount") vib(a, i), the control unit 50 first extracts from the vocal data the audio waveform corresponding to each note section and performs frequency analysis (DFT) on the audio waveform of each note section. The control unit 50 then calculates the vibrato feature amount vib(a, i) according to Equation (1).

In Equation (1), vib_per(a, i) is an index representing how prominent the spectral peak is in the audio waveform of each note section. It can be obtained by dividing the peak value of the frequency analysis result (that is, the amplitude spectrum) by the mean value of the frequency analysis result. Also in Equation (1), vip_dep(a, i) is the standard deviation of the audio waveform in each note section.
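The two quantities defined here can be computed with a short Python sketch. The function names, the direct O(n^2) DFT, the restriction to the non-negative frequency bins, and the use of the population standard deviation are assumptions made for illustration. The sketch deliberately stops at the two components and does not combine them, because Equation (1) itself is not reproduced in this text.

```python
import cmath
from statistics import mean, pstdev


def amplitude_spectrum(samples):
    """Amplitude spectrum of a real signal via a direct DFT (bins 0..n/2)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def vibrato_components(samples):
    """Return (vib_per, vip_dep) for the waveform of one note section.

    vib_per -- peak of the amplitude spectrum divided by its mean
    vip_dep -- standard deviation of the waveform (population form assumed)
    """
    spectrum = amplitude_spectrum(samples)
    avg = mean(spectrum)
    vib_per = max(spectrum) / avg if avg > 0 else 0.0
    vip_dep = pstdev(samples)
    return vib_per, vip_dep
```

A waveform whose energy is concentrated in one frequency bin yields a large vib_per, while a flat spectrum yields a value close to 1.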

To calculate the technique feature amount for "tame" (hereinafter referred to as the "tame feature amount") tt(a, i), the control unit 50 first extracts the audio waveform of the non-harmonic components in the accompaniment data as the performance waveform of the rhythm instruments. The non-harmonic components may be extracted, for example, by passing the accompaniment data through a filter prepared in advance to represent the audio waveform of non-harmonic components, or by the method described in "Separation of harmonic and percussive sounds based on anisotropy of spectrogram smoothness" (Proceedings of the Spring Meeting of the Acoustical Society of Japan, 2-5-8, pp. 903-904 (March 2008)).

Further, to calculate the tame feature value tt(a, i), the control unit 50 estimates, as beat positions, the timings at which the sound pressure in the rhythm-instrument waveform is at or above a specified value. Next, the control unit 50 extracts, from the notes that make up the singing melody of the song, the note with the shortest note value (hereinafter, the "shortest note"). The control unit 50 then identifies the utterance timings by dividing the interval between beat positions by the note value of the extracted shortest note. An utterance timing here is a timing at which singing of a note i may begin.

In calculating the tame feature value tt(a, i), the control unit 50 further identifies the utterance timing that satisfies a prescribed condition. The utterance timing that satisfies the prescribed condition is the one that is later than the start timing nnt(a, i) and whose difference from the start timing nnt(a, i) has the smallest absolute value. The time length obtained by subtracting the identified utterance timing from the start timing nnt(a, i) is then calculated as the tame feature value tt(a, i).
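
The grid of utterance timings and the selection rule above can be sketched as follows. This is a minimal illustration, assuming the beat positions have already been estimated and are given in seconds. The parameter `subdivisions` (the number of shortest notes that fit in one beat interval) and both function names are hypothetical. The result is the literal subtraction described in the text, so it is never positive; taking its magnitude would be another reading of "time length".

```python
def utterance_grid(beat_times, subdivisions):
    """Split each interval between consecutive beats into equal steps."""
    if not beat_times:
        return []
    grid = []
    for start, end in zip(beat_times, beat_times[1:]):
        step = (end - start) / subdivisions
        grid.extend(start + k * step for k in range(subdivisions))
    grid.append(beat_times[-1])
    return grid


def tame_feature(nnt, grid):
    """Return nnt minus the closest utterance timing later than nnt.

    Returns None when no utterance timing lies after nnt (an assumption;
    the text does not cover this case).
    """
    later = [t for t in grid if t > nnt]
    if not later:
        return None
    nearest = min(later, key=lambda t: abs(t - nnt))
    return nnt - nearest
```

For example, with beats at 0, 1 and 2 seconds and four subdivisions per beat, a note that starts at 0.6 s is matched to the utterance timing at 0.75 s.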

To calculate the technique feature value for shakuri (a scoop up into the note) (hereinafter, the "shakuri feature value") rise(a, i), the control unit 50 first derives the differential change, that is, the time derivative of the pitch contour of the vocal data. Next, the control unit 50 identifies the timings, at or before the start timing nnt(a, i) of each note section, at which the differential change becomes positive along the time axis. The control unit 50 then derives, as the shakuri feature value rise(a, i), the cross-correlation value between the pitch contour of the vocal data in the section from each identified timing to the start timing nnt(a, i) and a predefined model curve.
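
The shakuri calculation can be sketched as follows, under several assumptions: the pitch contour is a list of per-frame pitch values, the derivative is approximated by a first difference, "cross-correlation value" is read as the normalized correlation at zero lag, and the model curve is supplied as a hypothetical function `model_curve(n)` that returns n samples. None of these details are fixed by the text.

```python
import math


def rising_onsets(pitch, start_index):
    """Frames before start_index where the pitch slope turns positive."""
    slope = [b - a for a, b in zip(pitch, pitch[1:])]
    return [i for i in range(1, min(start_index, len(slope)))
            if slope[i] > 0 and slope[i - 1] <= 0]


def correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def rise_feature(pitch, onset, start_index, model_curve):
    """Correlate the contour from onset to start_index with the model curve."""
    segment = pitch[onset:start_index + 1]
    return correlation(segment, model_curve(len(segment)))
```

The same correlation helper could serve for the fall feature value by taking the segment after the end timing of the note section instead.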

To calculate the technique feature value for fall (hereinafter, the "fall feature value") fall(a, i), the control unit 50 identifies the first timing, at or after the end timing nft(a, i) of each note section, at which the differential change becomes positive along the time axis. The control unit 50 then derives, as the fall feature value fall(a, i), the cross-correlation value between the pitch contour of the vocal data in the section from the end timing nft(a, i) of each note section of the singing melody to the identified timing and a predefined model curve.

To calculate the technique feature value for kobushi (hereinafter, the "kobushi feature value") kob(a, i), the control unit 50 first identifies the kobushi sections. A kobushi section here is a section in which note sections spanning multiple pitches are sung on the same vowel while the pitch changes.

To this end, in calculating the kobushi feature value kob(a, i), the control unit 50 identifies the sections uttered on the same vowel (hereinafter, "same-vowel sections"). To identify them, it derives the cross-correlation of the mean mel-frequency cepstral coefficients (MFCC) of the note sections and treats note sections whose cross-correlation value is at or above a threshold as a same-vowel section.

The control unit 50 then identifies as kobushi sections only those same-vowel sections that satisfy a set condition. The set condition is satisfied when the time interval between the end timing nft(a-1, i) and the start timing nnt(a, i) of note sections adjacent along the time axis is at or below a threshold and the pitches of adjacent note sections all differ.
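
The set condition above can be checked with a short routine such as the following sketch. It assumes the same-vowel section has already been identified and is given as a time-ordered list of notes. The `Note` type, the function name, and the rule that a single note cannot form a kobushi section are assumptions made for illustration.

```python
from typing import NamedTuple


class Note(NamedTuple):
    start: float  # start timing nnt, in seconds
    end: float    # end timing nft, in seconds
    pitch: int    # pitch, for example as a MIDI note number


def is_kobushi_section(notes, max_gap):
    """Check the gap and pitch conditions on every pair of adjacent notes."""
    if len(notes) < 2:
        return False  # a kobushi section spans more than one pitch
    for prev, cur in zip(notes, notes[1:]):
        if cur.start - prev.end > max_gap:
            return False  # notes are too far apart in time
        if cur.pitch == prev.pitch:
            return False  # adjacent pitches must differ
    return True
```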

Then, in calculating the kobushi feature value kob(a, i), the control unit 50 calculates a chroma vector from the vocal waveform in the kobushi section. The control unit 50 also calculates the chroma vector of the accompaniment data in the same kobushi section and calculates its cross-correlation value with the chroma vector of the vocal waveform as the kobushi feature value kob(a, i).

Next in the standard feature value calculation process, the control unit 50 calculates song feature values as evaluation information on the plurality of singing techniques (S160). A song feature value is the mean within the song of each of the vibrato feature value vib, the tame feature value tt, the shakuri feature value rise, the fall feature value fall, and the kobushi feature value kob. In S160, however, the control unit 50 calculates the song feature value of each singing technique separately for each combination of note value and pitch of the note sections that make up the singing melody.

Next in the standard feature value calculation process, the control unit 50 determines whether the processing up to the calculation of the song feature values (that is, S110 to S160) has been executed for all the music data MD (S170). If it has not (S170: NO), the control unit 50 returns the process to S110, acquires from the information processing server 10 music data MD that has not yet been processed, and proceeds to S120.

If, on the other hand, the determination in S170 shows that the processing up to the calculation of the song feature values has been executed for all the music data MD (S170: YES), the control unit 50 advances the process to S180.

In S180, the control unit 50 calculates, as the standard feature value, the mean and standard deviation over all the music data of each song feature value calculated for each combination of note value and pitch of the note sections. The standard feature value is therefore also calculated separately for each combination of note value and pitch.
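
The aggregation in S160 and S180 can be sketched as follows for a single singing technique; one such table would be kept per technique. The data layout (tuples of note value, pitch and feature value), the function names, and the use of the population standard deviation are assumptions, since the text does not specify them.

```python
import statistics
from collections import defaultdict


def song_feature(notes):
    """S160: mean feature value per (note value, pitch) within one song.

    notes: iterable of (note_value, pitch, feature) tuples.
    """
    groups = defaultdict(list)
    for note_value, pitch, feature in notes:
        groups[(note_value, pitch)].append(feature)
    return {key: statistics.fmean(values) for key, values in groups.items()}


def standard_feature(song_features):
    """S180: (mean, standard deviation) per key over all songs.

    song_features: list of dicts as returned by song_feature().
    """
    pooled = defaultdict(list)
    for features in song_features:
        for key, value in features.items():
            pooled[key].append(value)
    return {key: (statistics.fmean(values), statistics.pstdev(values))
            for key, values in pooled.items()}
```

A key that appears in only one song gets a standard deviation of zero, which later steps that divide by the standard deviation would need to handle.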

In the standard feature value calculation process, the control unit 50 then uploads the standard feature values calculated in S180 for each combination of note value and pitch of the note sections to the storage unit 14 of the information processing server 10 (S190).

The standard feature value calculation process then ends.
In short, in the standard feature value calculation process of this embodiment, the control unit 50 calculates technique feature values that represent an evaluation of each singing technique for the vocal data of a plurality of songs. It then obtains the mean and standard deviation of the calculated technique feature values for each combination of note value and pitch of the note sections that make up the singing melody. Finally, it stores the standard feature values obtained for each note value and pitch in the storage unit 14 of the information processing server 10.
<Evaluation data generation process>
Next, the evaluation data generation process executed by the control unit 50 of the karaoke apparatus 30 is described.

The evaluation data generation process starts when a start command for executing it is input.
Once started, as shown in FIG. 3, the control unit 50 first acquires one piece of music data MD from among all the music data MD stored in the information processing server 10 (S210). The control unit 50 then acquires the master waveform data included in the music data MD acquired in S210 (S220).

Further in the evaluation data generation process, the control unit 50 separates and extracts the accompaniment data and the vocal data from the master waveform data acquired in S220 (S230). The separation in S230 may use the same technique as S130 of the standard feature value calculation process.

Next in the evaluation data generation process, the control unit 50 transcribes the vocal data extracted in S230 (S240). The transcription in S240 may use the same method as the transcription in S140 of the standard feature value calculation process.

In the evaluation data generation process, the control unit 50 further determines each technique feature value of the vocal data extracted in S230 for each note section of the song (S250). The technique feature values in S250 may be determined by the same method as S150 of the standard feature value calculation process.

In the evaluation data generation process, the control unit 50 acquires the standard feature values stored in the storage unit 14 of the information processing server 10 (S260). The control unit 50 then calculates a feature distance for each singing technique (S270).

Specifically, in S270 the control unit 50 calculates the feature distance as the absolute value of the difference between the technique feature value determined in S250 and the mean in the standard feature value, divided by the standard deviation in the standard feature value. In S270, the control unit 50 calculates the feature distance for each combination of note value and pitch of the note sections and for each singing technique.
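
The feature distance in S270 amounts to the absolute value of a z-score. A minimal sketch follows; the function name and the handling of a zero standard deviation are assumptions that the text does not address.

```python
def feature_distance(feature, mean, std):
    """|technique feature value - mean| / standard deviation (S270)."""
    if std == 0:
        return 0.0  # assumption: no spread means no measurable distance
    return abs(feature - mean) / std
```

For example, a technique feature value of 3.0 against a standard mean of 1.0 and standard deviation of 0.5 gives a feature distance of 4.0.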

Further in the evaluation data generation process, the control unit 50 generates the evaluation data (S280). In S280, the control unit 50 first identifies, for each note section, the singing technique corresponding to the feature distance that satisfies a specific condition. A feature distance satisfies the specific condition when, among the feature distances of the singing techniques calculated in S270, it is at or above a predefined reference value and is the largest.

The control unit 50 then generates, as the evaluation data, information that associates each technique feature value satisfying the specific condition (hereinafter, a "specific feature value") with the note section to which that specific feature value corresponds.

That is, in S270, when the difference between the technique feature value and the standard feature value in a note section of the singing melody falls outside a reference range, the control unit 50 determines that technique feature value to be a technique feature value satisfying the predetermined condition (that is, a specific feature value). The control unit 50 then generates, as the evaluation data, information that associates each note section in which a characteristic singing technique is used with the technique feature value of that characteristic singing technique.
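
The selection of specific feature values can be sketched as follows. The layout of `sections` (one dict per note section holding the technique feature values and feature distances keyed by technique name), the function name, and the shape of the returned evaluation data are assumptions made for illustration.

```python
def build_evaluation_data(sections, reference):
    """Keep, per note section, the technique with the largest feature distance,
    provided that distance is at or above the reference value.

    sections: list of {"features": {technique: value},
                       "distances": {technique: distance}}.
    Returns {section index: (technique, technique feature value)}.
    """
    evaluation = {}
    for index, section in enumerate(sections):
        distances = section["distances"]
        if not distances:
            continue  # nothing was measured for this note section
        technique = max(distances, key=distances.get)
        if distances[technique] >= reference:
            evaluation[index] = (technique, section["features"][technique])
    return evaluation
```

Note sections in which no technique reaches the reference value are simply absent from the result, which matches the idea that only characteristic sections carry evaluation data.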

Next in the evaluation data generation process, the control unit 50 generates reference data from the result of the transcription in S240 (S290). Reference data here is data representing the note sections (that is, the pitches and note values) that make up the melody to be sung.

Then, in the evaluation data generation process, the control unit 50 associates the song identification information, the evaluation data, and the reference data with one another and uploads them to the storage unit 14 of the information processing server 10 (S300).

The evaluation data generation process then ends.
In short, in the evaluation data generation process, the control unit 50 generates evaluation data that evaluates the singing technique for each note section of the song in which a characteristic singing technique is used. The control unit 50 also generates the reference data and stores it, together with the evaluation data, in the storage unit 14 of the information processing server 10.
<Karaoke scoring process>
Next, the karaoke scoring process executed by the control unit 50 of the karaoke apparatus 30 is described.

The karaoke scoring process starts when a command to launch the processing program for executing it is input via the input receiving unit 34.
Once started, as shown in FIG. 4, the control unit 50 first acquires from the storage unit 14 of the information processing server 10 the music data MD corresponding to the song specified via the input receiving unit 34 (S510). The control unit 50 then extracts the accompaniment data included in the music data MD acquired in S510 (S520).

Then, in the karaoke scoring process, the control unit 50 plays the song by reproducing the accompaniment data (S530). Specifically, in S530 the control unit 50 outputs the accompaniment data to the music reproducing unit 36, which reproduces the song on receiving it. The sound source signal of the song reproduced by the music reproducing unit 36 is output to the speaker 60 via the output unit 42, and the speaker 60 converts the sound source signal into sound and outputs it.

Further in the karaoke scoring process, the control unit 50 acquires the voice input via the microphone 62 and the microphone input unit 44 as singing data (S540). The control unit 50 then stores the singing data acquired in S540 in the storage unit 38 (S550).

Next in the karaoke scoring process, the control unit 50 determines whether the performance of the song has ended (S560). If the performance has not ended (S560: NO), the control unit 50 returns the karaoke scoring process to S540. If the performance has ended (S560: YES), the control unit 50 advances the karaoke scoring process to S570.

In S570, the control unit 50 acquires all the singing data stored in the storage unit 38. The control unit 50 then extracts, from the singing data along the time axis of the song, the singing waveform of each section in which a note of the singing melody was sung (hereinafter, a "note singing section") (S580). The note singing sections may be identified by the same method as S140 of the standard feature value calculation process, reading "vocal data" as "singing data".

Next in the karaoke scoring process, the control unit 50 calculates technique feature values that evaluate the singing techniques in the singing data (hereinafter, "singing feature values") (S590). Apart from reading "vocal data" as "singing data", the singing feature values are calculated in the same way as in S150 and S160 of the standard feature value calculation process, so a detailed description is omitted here.

Further in the karaoke scoring process, the control unit 50 calculates a base evaluation score (S600). To calculate the base evaluation score in S600, the control unit 50 compares the pitch contour of the singing waveform in each note singing section with the pitch contour in the reference data. The closer the match, the higher the base evaluation score the control unit 50 calculates.

Next in the karaoke scoring process, the control unit 50 calculates a technique evaluation score (S610). To calculate the technique evaluation score in S610, the control unit 50 first acquires the standard feature values and the evaluation data of the music data MD acquired in S510. Then, based on the singing feature values calculated in S590, the acquired evaluation data, and the standard feature values, the control unit 50 calculates a per-note evaluation score for each note section according to the following formula.

Per-note evaluation score = α × direction × (singing feature value - mean in the standard feature value) / standard deviation in the standard feature value
In this formula, direction is the sign of the difference between the specific feature value included in the evaluation data and the singing feature value, and is either "1" or "-1".

The symbol α in the formula is the weight given to the score for singing technique and is a predefined constant.
Further, in calculating the technique evaluation score, the control unit 50 calculates the mean of the per-note evaluation scores over the whole song as the technique evaluation score.
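
The per-note formula and the averaging step can be sketched as follows. The function names are hypothetical, and `direction` is taken as an input that is already 1 or -1, because the text does not state which of the two feature values is subtracted from the other when the sign is determined.

```python
import statistics


def note_score(alpha, direction, singing, mean, std):
    """alpha * direction * (singing feature value - mean) / standard deviation."""
    if direction not in (1, -1):
        raise ValueError("direction must be 1 or -1")
    return alpha * direction * (singing - mean) / std


def technique_score(note_scores):
    """Mean of the per-note evaluation scores over the whole song."""
    return statistics.fmean(note_scores)
```

With α = 2.0, direction = -1, a singing feature value of 3.0, a standard mean of 1.0 and a standard deviation of 0.5, the per-note evaluation score is -8.0.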

Further in the karaoke scoring process, the control unit 50 calculates an overall evaluation score by adding the technique evaluation score calculated in S610 to the base evaluation score calculated in S600 (S620). The control unit 50 then displays the overall evaluation score calculated in S620 on the display unit 64 (S630). The display in S630 is achieved by the control unit 50 outputting a control signal to the display unit 64 via the video control unit 46. The scores shown on the display unit 64 are not limited to the overall evaluation score and may also include at least one of the base evaluation score and the technique evaluation score.

The karaoke scoring process then ends, and the apparatus waits until the next start timing.
In short, in the karaoke scoring process, the control unit 50 stores the voice input during the performance of the song as singing data. It then analyzes the stored singing data to calculate the singing feature values. Further, it calculates the technique evaluation score so that the score is larger the more strongly the characteristic techniques of the song appear.

Also in the karaoke scoring process, the control unit 50 compares the pitch contour of the singing waveform in each note singing section with the pitch contour in the reference data and calculates a base evaluation score that is higher the closer the match. The control unit 50 then calculates the sum of the technique evaluation score and the base evaluation score as the overall evaluation score.
[Effects of the first embodiment]
As described above, the karaoke system 1 can automatically generate the evaluation data corresponding to the music data MD.

Consequently, in the karaoke system 1, evaluation data can be generated automatically even for the music data MD of a song written and composed by a member of the general public.
Furthermore, the karaoke system 1 identifies, as characteristic techniques, the singing techniques that are used characteristically in the note sections of the singing melody. In the karaoke scoring process, if a characteristic technique appears strongly in the voice singing that note section, a large technique evaluation score is added to the base evaluation score.

As a result, evaluating singing with the karaoke system 1 yields an evaluation that matches the impression most people have of the song.
[Second embodiment]
The karaoke system of the second embodiment differs from the karaoke system 1 of the first embodiment mainly in the content of the evaluation data generation process. In this embodiment, therefore, configurations and processes that are the same as in the first embodiment are given the same reference signs and their description is omitted; the description focuses on the evaluation data generation process, which differs from that of the first embodiment.
<Evaluation data generation process>
The evaluation data generation process of this embodiment starts when a start command for executing it is input.

Once started, as shown in FIG. 5, the control unit 50 first acquires one piece of music data MD from among all the music data MD stored in the information processing server 10 (S710). The control unit 50 then acquires the master waveform data included in the music data MD acquired in S710 (S720).

Further in the evaluation data generation process, the control unit 50 separates and extracts the accompaniment data and the vocal data from the master waveform data acquired in S720 (S730). The separation in S730 may use the same technique as S130 of the standard feature value calculation process.

Next in the evaluation data generation process, the control unit 50 transcribes the vocal data extracted in S730 (S740). The transcription in S740 may use the same method as the transcription in S140 of the standard feature value calculation process.

In the evaluation data generation process, the control unit 50 further determines each technique feature value of the vocal data extracted in S730 for each note section of the song (S750). The technique feature values in S750 may be determined by the same method as S150 of the standard feature value calculation process.

Next in the evaluation data generation process, the control unit 50 calculates a feature distribution (S760). The feature distribution calculated in S760 is a distribution that tallies the technique feature values calculated in S750 for each pitch and note value of the note sections that make up the singing melody. In S760, the control unit 50 calculates a feature distribution for each singing technique.

Further in the evaluation data generation process, the control unit 50 generates the evaluation data (S770). In S770, the control unit 50 identifies as the specific feature value, from the feature distribution calculated in S760, the technique feature value that lies in a range significant as a characteristic of the song and that is the largest among the technique feature values of the singing techniques in each note section. The control unit 50 then generates, as the evaluation data, information that associates the identified specific feature value with the note section to which it corresponds.

Next in the evaluation data generation process, the control unit 50 generates reference data from the result of the transcription in S740 (S780).
Then, in the evaluation data generation process, the control unit 50 associates the song identification information, the evaluation data, and the reference data with one another and uploads them to the storage unit 14 of the information processing server 10 (S790).

The evaluation data generation process then ends.
[Effects of the second embodiment]
According to the karaoke system 1 of the second embodiment, the vocal data of the song is the only data needed to generate the evaluation data.
[Other embodiments]
Embodiments of the present invention have been described above, but the present invention is not limited to them and can be carried out in various forms without departing from the gist of the invention.

In the above embodiments, the karaoke apparatus 30 executes the standard feature value calculation process, the evaluation data generation process, and the karaoke scoring process, but the apparatus that executes these processes is not limited to the karaoke apparatus 30. For example, they may be executed by the information processing server 10 or by another information processing apparatus.

In the evaluation data generation process of the above embodiments, information in which each specific feature amount is associated with the note section corresponding to that specific feature amount is generated as the evaluation data; however, the evaluation data is not limited to this.

For example, the evaluation data may include information on note sections in which the skill feature amount is smaller than a predefined reference threshold, that is, information representing note sections in which using a given singing technique is inappropriate. This allows the evaluation data to include information representing note sections in which use of a singing technique is prohibited.
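The thresholding described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the record layout `NoteSectionFeature`, the function name `find_prohibited_sections`, the technique label, and the numeric values are all hypothetical, chosen only to show how note sections whose skill feature amount falls below a reference threshold might be collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteSectionFeature:
    """Hypothetical record: one skill feature amount for one note section."""
    section_index: int     # position of the note section within the piece
    technique: str         # singing technique label (illustrative only)
    feature_amount: float  # skill feature amount for that technique


def find_prohibited_sections(features, reference_threshold):
    """Return (section_index, technique) pairs whose skill feature amount
    is smaller than the reference threshold."""
    return [
        (f.section_index, f.technique)
        for f in features
        if f.feature_amount < reference_threshold
    ]


# Illustrative values, not taken from the patent.
features = [
    NoteSectionFeature(0, "vibrato", 0.05),
    NoteSectionFeature(1, "vibrato", 0.80),
    NoteSectionFeature(2, "vibrato", 0.10),
]
prohibited = find_prohibited_sections(features, reference_threshold=0.2)
```

The resulting pairs could then be stored in the evaluation data alongside the entries for techniques that should be used.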

The evaluation data generated by the evaluation data generation process has been described as data representing the note sections in which a characteristic technique should be used, together with the content of that characteristic technique and its skill feature amount; however, the evaluation data generated by the process is not limited to this. The evaluation data may be, for example, data representing the weight to be assigned to each singing technique for each note section.

In this case, in the karaoke scoring process, the control unit 50 may calculate the skill evaluation score by the following procedure.
First, the control unit 50 calculates an evaluation score for each note according to the following formula.

Evaluation score for each note = α × Σ (weight × direction × (singing feature amount − mean of the standard feature amount) / standard deviation of the standard feature amount)

In the above formula, the sum (that is, the Σ) is taken over the singing techniques. The direction is the sign of the difference between the specific feature amount included in the evaluation data and the singing feature amount, and is either "1" or "−1".
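The per-note formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `TechniqueEntry`, `direction`, and `per_note_score` are hypothetical, the difference is assumed to be taken as (specific feature amount − singing feature amount), a zero difference is assumed to map to +1, and the standard deviation is assumed to be positive. The text does not specify any of these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueEntry:
    """Hypothetical per-technique data for one note."""
    weight: float            # weight given in the evaluation data
    specific_feature: float  # specific feature amount in the evaluation data
    standard_mean: float     # mean of the standard feature amount
    standard_std: float      # standard deviation of the standard feature amount


def direction(specific_feature: float, singing_feature: float) -> int:
    """Sign of (specific - singing); the order and the zero case are assumptions."""
    return 1 if specific_feature - singing_feature >= 0 else -1


def per_note_score(alpha, entries, singing_features):
    """alpha * sum over techniques of weight * direction * standardized deviation."""
    total = 0.0
    for technique, entry in entries.items():
        if entry.standard_std <= 0:
            raise ValueError(f"non-positive standard deviation for {technique}")
        singing = singing_features[technique]
        standardized = (singing - entry.standard_mean) / entry.standard_std
        total += entry.weight * direction(entry.specific_feature, singing) * standardized
    return alpha * total


# Illustrative values, not taken from the patent.
entries = {
    "vibrato": TechniqueEntry(weight=0.5, specific_feature=1.0,
                              standard_mean=0.5, standard_std=0.25),
    "kobushi": TechniqueEntry(weight=2.0, specific_feature=0.0,
                              standard_mean=0.5, standard_std=0.5),
}
singing_features = {"vibrato": 0.75, "kobushi": 0.25}
score = per_note_score(2.0, entries, singing_features)
```

With these values the vibrato term contributes 0.5 × 1 × 1.0 and the kobushi term contributes 2.0 × (−1) × (−0.5), so the sum is 1.5 and the score is 3.0 after scaling by α = 2.0.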

Note that a form in which part of the configuration of the above embodiments is omitted, to the extent that the problem can still be solved, is also an embodiment of the present invention. A form obtained by appropriately combining the above embodiments and modifications is also an embodiment of the present invention. Furthermore, any form conceivable without departing from the essence of the invention as specified by the wording of the claims is also an embodiment of the present invention.
[Correspondence between Embodiments and Claims]
Finally, the relationship between the description of the above embodiments and the description of the claims will be explained.

The function obtained by executing S210 and S710 in the evaluation data generation process of the above embodiments corresponds to the music data acquisition means recited in the claims, and the function obtained by executing S220, S230, S720, and S730 corresponds to the extraction means. The function obtained by executing S250 and S750 corresponds to the determination means, and the function obtained by executing S270, S280, S770, and S780 corresponds to the generation means.

Furthermore, the function obtained by executing S540, S550, and S570 in the karaoke scoring process of the above embodiments corresponds to the singing acquisition means recited in the claims, and the function obtained by executing S610 corresponds to the evaluation means. The function obtained by executing S530 corresponds to the performance means.

Note that the function obtained by executing S260 in the evaluation data generation process of the first embodiment corresponds to the standard acquisition means recited in the claims, and the function obtained by executing S760 in the evaluation data generation process of the second embodiment corresponds to the distribution calculation means.

DESCRIPTION OF SYMBOLS
1 ... Karaoke system, 10 ... Information processing server, 12 ... Communication unit, 14 ... Storage unit, 16 ... Control unit, 18 ... ROM, 20 ... RAM, 22 ... CPU, 30 ... Karaoke device, 32 ... Communication unit, 34 ... Input reception unit, 36 ... Music reproduction unit, 38 ... Storage unit, 40 ... Audio control unit, 42 ... Output unit, 44 ... Microphone input unit, 46 ... Video control unit, 50 ... Control unit, 52 ... ROM, 54 ... RAM, 56 ... CPU, 60 ... Speaker, 62 ... Microphone, 64 ... Display unit

Claims

1. An information processing apparatus comprising:
music data acquisition means for acquiring music data, which includes a sung singing sound, from a first storage unit storing the music data;
extraction means for extracting vocal data representing the singing sound from the music data acquired by the music data acquisition means;
determination means for determining, for the vocal data extracted by the extraction means, a skill feature amount representing an evaluation of a plurality of singing techniques for each note section, the note section being a predetermined section of notes constituting the music data; and
generation means for identifying, among the skill feature amounts of the note sections determined by the determination means, a skill feature amount that satisfies a predetermined condition, and for generating, as evaluation data used for singing evaluation, data in which the note section corresponding to the identified skill feature amount is associated with the identified skill feature amount.

2. The information processing apparatus according to claim 1, further comprising:
singing acquisition means for acquiring singing data representing voice input during the performance of a music piece; and
evaluation means for evaluating, for the singing data acquired by the singing acquisition means, the singing technique in the input voice using the evaluation data generated by the generation means.

3. The information processing apparatus according to claim 2, wherein
the extraction means extracts, from the music data, the vocal data and accompaniment data representing an accompaniment sound in the music piece,
the information processing apparatus further comprises performance means for playing the music piece based on the accompaniment data extracted by the extraction means, and
the singing acquisition means acquires, as the singing data, voice input while the performance means is playing the music piece.

4. The information processing apparatus according to any one of claims 1 to 3, wherein
the generation means identifies, among the skill feature amounts of the note sections determined by the determination means, a skill feature amount corresponding to a singing technique used characteristically in the music piece as the skill feature amount satisfying the predetermined condition.

5. The information processing apparatus according to claim 4, further comprising
standard acquisition means for acquiring a standard feature amount, which represents a standard evaluation of the singing techniques used in a plurality of music pieces, from a second storage unit storing the standard feature amount, wherein,
when the difference between a skill feature amount of a note section determined by the determination means and the standard feature amount acquired by the standard acquisition means is outside a reference range, the generation means identifies the skill feature amount outside the reference range as the skill feature amount satisfying the predetermined condition.

6. The information processing apparatus according to claim 4, wherein
the generation means includes distribution calculation means for calculating a feature amount distribution in which the skill feature amounts of the note sections determined by the determination means are tabulated for each pitch and note value of the note sections, and,
when the feature amount distribution calculated by the distribution calculation means falls within a range that is significant as a feature of the music piece, the generation means identifies the skill feature amounts included in the significant range as the skill feature amounts satisfying the predetermined condition.

7. A data generation method comprising:
a music data acquisition step of acquiring music data, which includes a sung singing sound, from a first storage unit storing the music data;
an extraction step of extracting vocal data representing the singing sound from the music data acquired in the music data acquisition step;
a determination step of determining, for the vocal data extracted in the extraction step, a skill feature amount representing an evaluation of a plurality of singing techniques for each note section, the note section being a predetermined section of notes constituting the music data; and
a generation step of identifying, among the skill feature amounts of the note sections determined in the determination step, a skill feature amount that satisfies a predetermined condition, and generating, as evaluation data used for singing evaluation, data in which the note section corresponding to the identified skill feature amount is associated with the identified skill feature amount.

8. A program for causing a computer to execute:
a music data acquisition procedure of acquiring music data, which includes a sung singing sound, from a first storage unit storing the music data;
an extraction procedure of extracting vocal data representing the singing sound from the music data acquired by the music data acquisition procedure;
a determination procedure of determining, for the vocal data extracted by the extraction procedure, a skill feature amount representing an evaluation of a plurality of singing techniques for each note section, the note section being a predetermined section of notes constituting the music data; and
a generation procedure of identifying, among the skill feature amounts of the note sections determined by the determination procedure, a skill feature amount that satisfies a predetermined condition, and generating, as evaluation data used for singing evaluation, data in which the note section corresponding to the identified skill feature amount is associated with the identified skill feature amount.