JP2015191177A - Program, information processing device, and data generation method

Info

Publication number: JP2015191177A
Application number: JP2014069740A
Authority: JP
Inventors: Seiji Kurokawa (誠司黒川)
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2014-03-28
Filing date: 2014-03-28
Publication date: 2015-11-02
Anticipated expiration: 2034-03-28
Also published as: JP6056799B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of generating the data needed to present the features of a singing style specific to a singer.

SOLUTION: The information processing device executes: first and second acquisition steps for acquiring musical score data and music data, respectively; a pitch transition derivation step for deriving a vocal pitch transition; a specification step for specifying a vibrato waveform on the basis of the vocal pitch transition; a reference calculation step for calculating a high tone reference pitch and a low tone reference pitch on the basis of the pitches at the vertices of the vibrato waveform; a determination step for determining at least one of a center pitch difference, based on the musical score data, the high tone reference pitch, and the low tone reference pitch, and a vibrato depth, based on the high tone reference pitch and the low tone reference pitch; and a data generation step for generating singing feature data by associating at least one of the center pitch difference and the vibrato depth with the performance period in the musical score data that corresponds to the vibrato waveform.

Description

The present invention relates to a program for generating data, an information processing device, and a data generation method.

Conventionally, singing evaluation techniques that evaluate how skillfully the vocal melody of a song has been sung are known (see Patent Document 1).
In the technique described in Patent Document 1, skill is evaluated by comparing the voice sung by the user against a model voice prepared for each song, and a comment corresponding to the evaluation is presented after the singing.

Patent Document 1: JP 2007-233013 A

In songs sung by professional singers, characteristics unique to the individual singer appear in many cases. When a user sings using a karaoke device, an information processing device, or the like, the singing sounds more skillful if the user can reproduce the features of that singer's particular way of singing.

For this reason, karaoke devices, information processing devices, and the like are required to present the features of a singer's particular way of singing before the user actually sings.
However, the conventional technique only presents the deviation between the model voice and the user's voice after the user has sung. Users of karaoke devices, information processing devices, and the like therefore cannot recognize, before singing, at what timing and in what manner they should sing to reproduce those features.

In other words, karaoke devices, information processing devices, and the like have presented nothing about the features of a singer's particular way of singing before the user actually sings.
Therefore, an object of the present invention is to provide a technique capable of generating the data needed to present the features of a singing style specific to a singer.

The present invention, made to achieve the above object, relates to a program to be executed by a computer.
The program of the present invention causes a computer to execute a first acquisition step, a second acquisition step, a pitch transition determination step, a specifying step, a reference calculation step, a determination step, and a data generation step.

In the first acquisition step, musical score data is acquired from a first storage unit. The musical score data represents the score of a song composed of a plurality of notes, with a pitch and a performance period associated with each of the notes. In the second acquisition step, music data including the vocal sound of the song being sung is acquired from a second storage unit. In the extraction step, vocal data representing the vocal sound is extracted from the music data.

Further, in the pitch transition determination step, a vocal pitch transition representing the transition of pitch in the vocal data is determined on the basis of the vocal data. In the specifying step, a vibrato waveform, which is a singing section of the vocal pitch transition sung with vibrato, is specified on the basis of the vocal pitch transition. In the reference calculation step, a high tone reference pitch representing the pitch of the high range of the vibrato waveform and a low tone reference pitch representing the pitch of the low range of the vibrato waveform are calculated on the basis of the pitches at the vertices of the vibrato waveform.

In the determination step, at least one of a center pitch difference and a vibrato depth is determined on the basis of the high tone reference pitch and the low tone reference pitch. The center pitch difference is the difference between the center pitch of the vibrato waveform and the pitch of the note corresponding to the singing section in the musical score data. The vibrato depth is the width of fluctuation of the vibrato waveform along the frequency axis.

In the data generation step, singing feature data is generated by associating at least one of the center pitch difference and the vibrato depth with the section of the musical score data corresponding to the singing section.
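As an illustration only, the association described above could be held in a small record such as the one below. The class name `VibratoFeature`, the helper `make_feature`, and the use of seconds for the section boundaries are assumptions made for this sketch; the source does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VibratoFeature:
    """Hypothetical record: one vibrato section of the score and its features."""
    start: float                               # section start (assumed: seconds)
    end: float                                 # section end (assumed: seconds)
    center_pitch_diff: Optional[float] = None  # may be absent
    vibrato_depth: Optional[float] = None      # may be absent


def make_feature(start, end, center_pitch_diff=None, vibrato_depth=None):
    """Build a record, requiring at least one of the two features."""
    if center_pitch_diff is None and vibrato_depth is None:
        raise ValueError("at least one of the two features is required")
    return VibratoFeature(start, end, center_pitch_diff, vibrato_depth)
```

Both fields are optional because the text requires only "at least one of" the two features to be associated with a section.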

The singing feature data generated by such a program associates at least one of the “center pitch difference” and the “vibrato depth”, which are features of the vibrato used by the singer who sang the song, with the notes of the musical score data. The “center pitch difference” and the “vibrato depth” are information representing features of the singer's vibrato and are unique to each singer.

With such singing feature data, when the notes constituting the song, that is, the singing melody, are displayed on the basis of the musical score data, the “center pitch difference” and “vibrato depth” of the vibrato used by the singer can be presented on a display unit connected to a karaoke device or an information processing device. Moreover, the “center pitch difference” and “vibrato depth” can be presented before the user of the karaoke device or information processing device actually sings.

A user of a karaoke device or information processing device on whose display the “center pitch difference” and “vibrato depth” of the vibrato are presented can recognize how the professional singer uses “vibrato” as a singing technique in the song being sung. As a result, the user can bring his or her own way of singing closer to that of the professional singer.

In the determination step of the present invention, the average of the high tone reference pitch and the low tone reference pitch may be determined as the center pitch of the vibrato waveform, and the difference between that center pitch and the pitch of the note corresponding to the singing section in the musical score data may be determined as the center pitch difference.

According to such a program, the average of the high tone reference pitch and the low tone reference pitch can be calculated as the center pitch of the vibrato waveform, and the difference between that center pitch and the pitch of the note can be calculated as the center pitch difference.

Furthermore, in the determination step of the present invention, the difference between the high tone reference pitch and the low tone reference pitch may be calculated as the vibrato depth.
According to such a program, the difference between the high tone reference pitch and the low tone reference pitch can be calculated as the vibrato depth.
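The two calculations described above reduce to simple arithmetic, sketched below. The function names are hypothetical, and two points are assumptions of this sketch rather than statements of the source: pitches are expressed in semitone units (for example, fractional MIDI note numbers), and the center pitch difference is signed as center pitch minus note pitch.

```python
def center_pitch_difference(high_ref: float, low_ref: float, note_pitch: float) -> float:
    """Center pitch (mean of the two reference pitches) minus the note pitch.

    The sign convention (center minus note) is an assumption of this sketch.
    """
    center_pitch = (high_ref + low_ref) / 2.0
    return center_pitch - note_pitch


def vibrato_depth(high_ref: float, low_ref: float) -> float:
    """Difference between the high and low reference pitches."""
    return high_ref - low_ref
```

For example, with a high tone reference pitch of 61.0, a low tone reference pitch of 60.0, and a note pitch of 60.0, the center pitch is 60.5, the center pitch difference is 0.5, and the vibrato depth is 1.0.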

In the reference calculation step of the present invention, the computer may be caused to execute a vertex detection step, a high tone reference calculation step, and a low tone reference calculation step. In the vertex detection step, inflection points in the vibrato waveform specified in the specifying step are detected. In the high tone reference calculation step, a representative value of the pitches at those detected inflection points that are local maxima is calculated as the high tone reference pitch. In the low tone reference calculation step, a representative value of the pitches at those detected inflection points that are local minima is calculated as the low tone reference pitch.
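A minimal sketch of these three sub-steps follows, assuming the vibrato waveform is available as a list of sampled pitch values. The "inflection points" are treated here as strict local extrema of that list, and the arithmetic mean is used as the representative value; the passage does not say which representative value is intended, so the mean is an assumption, as are the function names.

```python
from statistics import mean


def find_extrema(pitches):
    """Return (maxima_indices, minima_indices) of strict local extrema."""
    maxima, minima = [], []
    for i in range(1, len(pitches) - 1):
        prev, cur, nxt = pitches[i - 1], pitches[i], pitches[i + 1]
        if cur > prev and cur > nxt:
            maxima.append(i)   # local maximum (upper vertex)
        elif cur < prev and cur < nxt:
            minima.append(i)   # local minimum (lower vertex)
    return maxima, minima


def reference_pitches(pitches):
    """Return (high_ref, low_ref) as the mean pitch at maxima and at minima."""
    maxima, minima = find_extrema(pitches)
    if not maxima or not minima:
        raise ValueError("waveform needs at least one maximum and one minimum")
    high_ref = mean([pitches[i] for i in maxima])
    low_ref = mean([pitches[i] for i in minima])
    return high_ref, low_ref
```

A median would be an equally plausible representative value and is less sensitive to a single stray vertex; swapping `mean` for `statistics.median` is the only change needed.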

According to such a program, the high tone reference pitch and the low tone reference pitch can be calculated more reliably.
Further, in the reference calculation step of the present invention, the high tone reference pitch and the low tone reference pitch may be calculated with the number of inflection points that are local minima of the vibrato waveform made equal to the number of inflection points that are local maxima.
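One way to make the two counts equal is sketched below. The passage does not state which vertices are discarded, so the rule used here (keep the earliest vertices of the longer list and drop the surplus at the end) is purely an assumption for illustration, as is the function name `equalize_extrema`.

```python
def equalize_extrema(maxima, minima):
    """Trim the longer index list so both lists have the same length.

    Assumption of this sketch: surplus vertices are dropped from the end.
    """
    n = min(len(maxima), len(minima))
    return maxima[:n], minima[:n]
```

Because local maxima and minima alternate along a waveform, the two counts normally differ by at most one, so this trimming discards at most a single vertex in that case.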

According to such a program, singular points among the inflection points of the vibrato waveform can be excluded, and the accuracy with which the high tone reference pitch and the low tone reference pitch are calculated can be improved.
The program of the present invention may also cause the computer to execute a display control step. In this case, the display control step causes the display unit to display the plurality of notes constituting the musical score data acquired in the first acquisition step, and to display at least one of the center pitch difference and the vibrato depth included in the singing feature data in association with the section of the musical score data corresponding to the singing section.

According to such a program, the “center pitch difference” and “vibrato depth” of the vibrato used by the singer can be presented on a display unit connected to a karaoke device or an information processing device.
As a result, a user of a karaoke device or information processing device on whose display the “center pitch difference” and “vibrato depth” of the vibrato are presented can recognize how the professional singer uses “vibrato” as a singing technique in the song being sung.

The present invention may be implemented as an information processing device.
The information processing device of the present invention includes first acquisition means, second acquisition means, extraction means, pitch transition determination means, specifying means, reference calculation means, determination means, and data generation means.

The first acquisition means acquires the musical score data from the first storage unit, and the second acquisition means acquires the music data from the second storage unit. The extraction means extracts the vocal data, and the pitch transition determination means determines a vocal pitch transition representing the transition of pitch in that vocal data. The specifying means specifies a vibrato waveform, which is a singing section of the vocal pitch transition sung with vibrato. The reference calculation means calculates the high tone reference pitch and the low tone reference pitch on the basis of the pitches at the vertices of the vibrato waveform.

Further, the determination means determines at least one of the center pitch difference and the vibrato depth on the basis of the high tone reference pitch and the low tone reference pitch. The data generation means generates singing feature data by associating at least one of the center pitch difference and the vibrato depth with the section of the musical score data corresponding to the singing section.

According to such an information processing device, effects similar to those of the program according to claim 1 can be obtained.
The present invention may also be implemented as a data generation method executed by an information processing device.

The data generation method of the present invention includes a first acquisition procedure, a second acquisition procedure, an extraction procedure, a pitch transition determination procedure, a specifying procedure, a reference calculation procedure, a determination procedure, and a data generation procedure.
In the first acquisition procedure, the information processing device acquires the musical score data from the first storage unit, and in the second acquisition procedure, the information processing device acquires the music data from the second storage unit. In the extraction procedure, the information processing device extracts the vocal data, and in the pitch transition determination procedure, the information processing device determines the vocal pitch transition on the basis of the extracted vocal data. In the specifying procedure, the information processing device specifies the vibrato waveform on the basis of the determined vocal pitch transition.

In the reference calculation procedure, the information processing device calculates the high tone reference pitch and the low tone reference pitch on the basis of the pitches at the vertices of the specified vibrato waveform. In the determination procedure, the information processing device determines at least one of the center pitch difference and the vibrato depth. In the data generation procedure, the information processing device generates singing feature data by associating at least one of the center pitch difference and the vibrato depth with the section of the musical score data corresponding to the singing section.

According to such a data generation method, effects similar to those of the program according to claim 1 can be obtained.

FIG. 1 is a block diagram showing the schematic configuration of a system including an information processing device to which the present invention is applied.
FIG. 2 is a flowchart showing the procedure of the data generation process executed by the information processing device.
FIG. 3 is a flowchart showing the procedure of the vibrato feature calculation process.
FIG. 4 is a diagram illustrating an outline of the method for calculating vibrato features.
FIG. 5 is a diagram illustrating an outline of the singing feature data.
FIG. 6 is a flowchart showing the procedure of the display process executed by the karaoke device.
FIG. 7 is a diagram illustrating how the display appears on the display unit when the display process is executed.

Embodiments of the present invention will be described below with reference to the drawings.
<System configuration>
The karaoke device 30 shown in FIG. 1 plays a song specified by the user and causes the display unit 64 to display the features of the professional singer's way of singing that appear in that song.

The system 1, which is constructed so that the karaoke device 30 can display such features of a professional singer's way of singing, includes an information processing device 2, an information processing server 10, and the karaoke device 30.

The information processing device 2 calculates singing feature data SF on the basis of music data WD and MIDI music MD prepared for each song. The singing feature data SF is data representing the features of the way of singing of the professional singer who sings the song.

In the information processing server 10, at least the singing feature data SF calculated by the information processing device 2 and the MIDI music MD are stored in the storage unit 14 in association with each corresponding song.
<Music data>
The music data WD is prepared in advance for each specific song and includes music management information, in which information about the song is described, and master waveform data representing the performance sound of the song. The music management information includes music identification information (hereinafter referred to as a music ID) that identifies the song.

The master waveform data of this embodiment is audio data that includes the performance sounds of a plurality of musical instruments and the vocal sound of a professional singer singing the vocal melody. This audio data may be an audio file in an uncompressed audio file format or an audio file in a compressed audio format.

In the following, the audio waveform data representing the performance sounds of the musical instruments included in the master waveform data is referred to as accompaniment data, and the audio waveform data representing the vocal sound included in the master waveform data is referred to as vocal data.

The instrument performance sounds included in the accompaniment data of this embodiment include the sounds of percussion instruments (for example, drums, taiko drums, and cymbals), stringed instruments (for example, guitar and bass), struck string instruments (for example, piano), and wind instruments (for example, trumpet and clarinet).
<MIDI music>
The MIDI music MD is prepared in advance for each song and has performance data and lyrics data.

Of these, the performance data represents the score of one song in accordance with the well-known MIDI (Musical Instrument Digital Interface) standard. The performance data has at least a music ID and score tracks, each representing the score for one of the instruments used in the song.

For each performance sound output from the MIDI sound source, a score track defines at least a pitch (the so-called note number) and the period during which the MIDI sound source outputs that performance sound (hereinafter referred to as the note length). The note length in a score track is defined by a performance start timing (the so-called note-on timing), which is the time from the start of the song's performance until output of the performance sound begins, and a performance end timing (the so-called note-off timing), which is the time from the start of the song's performance until output of the performance sound ends.

That is, in a score track, one note NO is defined by a note number and a note length represented by a note-on timing and a note-off timing. A score track functions as one score by having the notes NO arranged in performance order. Score tracks are prepared for each instrument, for example, keyboard instruments, stringed instruments, percussion instruments, and wind instruments. In this embodiment, a specific instrument (for example, vibraphone) is designated as the instrument responsible for the vocal melody of the song.
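The note definition above maps naturally onto a small data structure. The sketch below is illustrative only: the names `Note` and `make_track` are hypothetical, and timings are held as seconds from the start of the performance, whereas an actual MIDI file would normally store them as ticks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical model of one note NO in a score track."""
    note_number: int   # pitch (MIDI note number)
    note_on: float     # performance start timing (assumed: seconds)
    note_off: float    # performance end timing (assumed: seconds)

    @property
    def length(self) -> float:
        """Note length: the period between note-on and note-off."""
        return self.note_off - self.note_on


def make_track(notes):
    """Arrange notes in performance order (by note-on timing)."""
    return sorted(notes, key=lambda n: n.note_on)
```

Usage: `make_track([Note(62, 1.0, 1.5), Note(60, 0.0, 1.0)])` returns the two notes with the one starting at 0.0 first.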

The lyrics data, on the other hand, is data about the lyrics of the song and includes lyrics telop data and lyrics output data. The lyrics telop data represents the characters that make up the lyrics of the song (hereinafter referred to as lyrics constituent characters). The lyrics output data defines a timing correspondence that associates the lyrics output timing, which is the timing at which the lyrics constituent characters are output, with the performance of the performance data.

Specifically, in the timing correspondence of this embodiment, the timing at which output of the lyrics telop data starts is associated with the timing at which performance of the performance data starts. In addition, the lyrics output timing of each lyrics constituent character along the time axis of the song is defined by the elapsed time from the start of performance of the performance data. As a result, each performance sound defined in a score track (that is, each note NO) is associated with a lyrics constituent character.
<Information processing device>
The information processing device 2 is a well-known information processing device (for example, a personal computer) that includes an input receiving unit 3, an information output unit 4, a storage unit 5, and a control unit 6.

The input receiving unit 3 is an input device that receives information and commands from the outside. Examples of the input device include keys and switches, a reading drive that reads data stored in a portable storage medium (for example, a CD, DVD, or flash memory), and a communication port that acquires information via a communication network. The information output unit 4 is an output device that outputs information to the outside. Examples of the output device include a writing drive that writes data to a portable storage medium and a communication port that outputs information to a communication network.

The storage unit 5 is a well-known storage device configured so that its stored contents can be read and written. The storage unit 5 stores at least one piece of music data WD and at least one piece of MIDI music MD, associated with each other for each song they have in common.

The control unit 6 is a well-known control device built around a well-known microcomputer including a ROM 7, a RAM 8, and a CPU 9. The ROM 7 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 8 temporarily stores processing programs and data. The CPU 9 executes each process in accordance with the processing programs stored in the ROM 7 and the RAM 8.

The ROM 7 of this embodiment stores a processing program with which the control unit 6 executes a data generation process that calculates the singing feature data SF on the basis of the music data WD and the MIDI music MD stored in the storage unit 5.
<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

Of these, the communication unit 12 allows the information processing server 10 to communicate with external devices via a communication network. That is, the information processing server 10 is connected to the karaoke device 30 via the communication network. The communication network may be either wired or wireless.

The storage unit 14 is a well-known storage device configured so that its stored contents can be read and written. The storage unit 14 stores at least a plurality of pieces of MIDI music MD. The symbol “m” shown in FIG. 1 is an identifier that identifies a piece of MIDI music MD stored in the storage unit 14 of the information processing server 10, and is a natural number of 1 or more. The storage unit 14 also stores the singing feature data SF generated when the information processing device 2 executes the data generation process.

The control unit 16 is a well-known control device built around a well-known microcomputer including a ROM 18, a RAM 20, and a CPU 22. The ROM 18, RAM 20, and CPU 22 are configured in the same way as the ROM 7, RAM 8, and CPU 9, respectively.
<Karaoke device>
The karaoke device 30 includes a communication unit 32, an input receiving unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 allows the karaoke device 30 to communicate with external devices via a communication network. The input receiving unit 34 is an input device that receives information and commands in response to external operations. Examples of the input device include keys, switches, and a remote control receiver.

The music playback unit 36 performs a song on the basis of the MIDI music MD downloaded from the information processing server 10. The music playback unit 36 is, for example, a MIDI sound source. The audio control unit 40 is a device that controls audio input and output, and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, through which the microphone input unit 44 acquires the user's singing sound. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the music played by the music playback unit 36 and the sound source signal of the singing sound from the microphone input unit 44. The speaker 60 converts the sound source signal output from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video or images based on the video data sent from the control unit 50. A display unit 64 that displays the video or images is connected to the video control unit 46.

The control unit 50 is built around a known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52, RAM 54, and CPU 56 are configured in the same way as the ROM 7, RAM 8, and CPU 9, respectively.

The ROM 52 stores a processing program that the control unit 50 runs to execute the display process. The display process plays the music specified by the user and displays on the display unit 64 the features of the professional singer's way of singing that appear in that music, together with the lyrics and the singing melody of the music.
<Data generation processing>
Next, the data generation process executed by the control unit 6 of the information processing device 2 is described.

The data generation process starts when a start command for launching the processing program is input via the input reception unit 3 of the information processing device 2.
Once started, as shown in FIG. 2, the control unit 6 first acquires one piece of music data WD from among all the music data WD stored in the storage unit 5 of the information processing device 2 (S110). In S110 of the present embodiment the control unit 6 acquires the music data WD from the storage unit 5, but it may instead acquire the music data WD via the input reception unit 3 from a portable storage medium, or from a server or the like via a communication network.

In the data generation process, the control unit 6 then acquires the master waveform data contained in the music data WD acquired in S110 (hereinafter, "acquired music data") (S120). The control unit 6 further separates and extracts vocal data and accompaniment data from the master waveform data acquired in S120 (S130). One conceivable way for the control unit 6 to separate the accompaniment data from the vocal data in S130 is to use the pitch and harmonic components estimated by a known technique (for example, "PreFEst" described in Japanese Patent Laid-Open No. 2008-134606). PreFEst is a technique that treats the most dominant voice waveform in the master waveform data as the vocal data and estimates the vocal pitch (that is, the fundamental frequency) and the magnitudes of the harmonic components.

In the data generation process, the control unit 6 then specifies a vocal sound pressure transition, which represents how the sound pressure level changes in the vocal data (S140). The control unit 6 also specifies a vocal frequency transition, which represents how the fundamental frequency f0 changes in the vocal data (S150).

Specifically, in S140 and S150 of the present embodiment, the control unit 6 first sets specified time windows AW(j) on the vocal data. Each specified time window AW(j) is an analysis window with a predetermined unit length (for example, 10 ms). In the present embodiment, the specified time windows AW are set so that they are adjacent to one another and continuous along the time axis. The symbol j is an identifier for a specified time window AW.

The control unit 6 then calculates, by a known method, the sound pressure level Lp in each specified time window AW(j) of the vocal data. The sound pressure level Lp is obtained by dividing the root mean square p of the sound pressure in the specified time window AW(j) by a reference sound pressure p0, taking the common logarithm of the quotient, and multiplying it by a predetermined coefficient (normally 20); that is, Lp = 20 × log10(p / p0).
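The sound pressure level formula above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the list-of-samples input, and the default reference pressure of 20 µPa (the usual reference for sound in air) are not taken from this description, and digital audio samples are generally not calibrated in pascals, so the resulting levels are relative.

```python
import math

def sound_pressure_levels(samples, sample_rate, window_sec=0.010, p0=20e-6):
    """Return Lp = 20 * log10(p / p0) for each adjacent, non-overlapping window.

    samples     : sequence of sound pressure values (hypothetical input format)
    sample_rate : samples per second
    window_sec  : length of one specified time window AW(j), 10 ms here
    p0          : reference sound pressure (assumed default: 20 micropascals)
    """
    size = max(1, round(sample_rate * window_sec))
    levels = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        # Root mean square p of the sound pressure in this window.
        rms = math.sqrt(sum(x * x for x in window) / size)
        # A silent window has no finite level, so report negative infinity.
        levels.append(20.0 * math.log10(rms / p0) if rms > 0 else float("-inf"))
    return levels
```

Arranging the returned values in order of j gives a sequence that plays the role of the vocal sound pressure transition.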

The control unit 6 then specifies the vocal sound pressure transition by arranging the sound pressure levels Lp of the individual specified time windows AW(j) along the time axis of the vocal data.
To specify the vocal frequency transition, the control unit 6 derives the fundamental frequency f0 in each specified time window AW(j) of the vocal data. Various known methods can be used to derive the fundamental frequency f0. As one example, the control unit 6 may perform frequency analysis (for example, a DFT) on each specified time window AW(j) set on the vocal data and take as the fundamental frequency f0 the frequency component that autocorrelation shows to be the strongest.

The control unit 6 then specifies the vocal frequency transition by arranging the fundamental frequencies f0 derived for the individual specified time windows AW(j) along the time axis of the vocal data.
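As a rough illustration of deriving f0 for one analysis window, the following Python sketch picks the lag with the largest autocorrelation. It is a simplification and not the method of this description: it works in the time domain only (no DFT step), the search range `fmin` to `fmax` is an assumed parameter, and a window must span at least one full pitch period, so in practice it would need more context than a single 10 ms window for low voices.

```python
def estimate_f0(window, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate f0 of one window as the lag with the highest autocorrelation.

    fmin and fmax bound the search range in Hz (assumed values).
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(window) - 1, int(sample_rate / fmin))
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the window at this lag.
        score = sum(window[n] * window[n + lag]
                    for n in range(len(window) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

Calling this once per window and listing the results in order of j yields a sequence analogous to the vocal frequency transition.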

In the data generation process, the control unit 6 then acquires the one MIDI music MD associated with the same music ID as the music data WD acquired in S110 (S160). The control unit 6 further adjusts the MIDI music MD acquired in S160 (hereinafter, "acquired MIDI") so that the performance timing of each of its notes matches the playback time of the corresponding sound in the acquired music data (S170). A known method (for example, the method described in Japanese Patent No. 5310677) may be used for this adjustment. In the method described in Japanese Patent No. 5310677, the control unit 6 renders the acquired MIDI and converts both the rendering result and the master waveform data of the acquired music data into spectrum data in specified time units. It then corrects the performance start timing and performance end timing of each performance sound so that the time positions in the two sets of spectrum data are synchronized. DP matching may be used when synchronizing the time positions in the spectrum data.
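For readers unfamiliar with DP matching, the sketch below shows the general idea in Python using dynamic time warping on two sequences of scalar features. It is not the method of Japanese Patent No. 5310677, whose details are not given here; real spectrum data would be vectors per frame with a suitable distance, and the function name and absolute-difference cost are assumptions made for illustration.

```python
def dp_match(a, b):
    """Return a minimum-cost alignment path [(index_in_a, index_in_b), ...]."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Trace back from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path
```

Each pair in the returned path says which frame of one sequence corresponds to which frame of the other, which is the information needed to shift note start and end timings.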

In the data generation process, the control unit 6 further acquires the melody track, which represents the singing melody, from the MIDI music MD whose timing was adjusted in S170 (S180). The melody track acquired in S180 defines the notes NO(i) that make up the singing melody (hereinafter, "melody notes"). Each melody note NO(i) has note properties such as a note number (pitch), a performance start timing nnt(i), a performance end timing nft(i), and various kinds of time control information (for example, tempo and resolution). The symbol i is an identifier for a melody note and is defined so that it increases along the time axis of the singing melody.

In the data generation process, the control unit 6 generates time-series note data (S190). The time-series note data associates the note properties of each melody note NO(i) with a note transition in which the melody notes NO(i) are arranged along the time axis.

Specifically, in S190 of the present embodiment, the control unit 6 first generates a note transition in which the melody notes NO(i) are arranged along the time axis. The control unit 6 then sets specified time windows AW(j) on that note transition. The specified time windows AW(j) set on the note transition are shared with those set on the vocal data. In other words, a specified time window AW(j) on the note transition and one on the vocal data represent the same timing if they have the same index j.

The control unit 6 then compares the number of specified time windows AW set on the note transition with the number set on the vocal data, and adopts the larger of the two as the number of specified time windows AW (that is, "jmax") for both the note transition and the vocal data. The control unit 6 may pad the shorter of the two with zero values so that the numbers of specified time windows AW match.

The control unit 6 further identifies, among the specified time windows AW(j) set on the note transition, the windows that correspond to the performance start timing nnt(i) and the performance end timing nft(i) of each melody note NO(i). The control unit 6 associates a note start flag with the specified time window AW(j) corresponding to the performance start timing nnt(i), and a note end flag with the specified time window AW(j) corresponding to the performance end timing nft(i). In addition, the control unit 6 associates the pitch of the corresponding melody note NO(i) with every specified time window AW from the one carrying the note start flag to the one carrying the note end flag. This produces the time-series note data.
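The construction of the time-series note data in S190 can be sketched in Python as follows. The data layout is hypothetical: each melody note is given as a tuple of note number, start time, and end time in integer milliseconds, and each window is represented by a small dictionary. Integer milliseconds are used only to keep the window index arithmetic exact.

```python
def build_time_series(notes, jmax, window_ms=10):
    """Return one record per specified time window AW(j).

    notes     : iterable of (note_number, start_ms, end_ms) tuples (assumed layout)
    jmax      : number of windows shared by the note transition and vocal data
    window_ms : length of one window in milliseconds
    """
    series = [{"pitch": None, "note_start": False, "note_end": False}
              for _ in range(jmax)]
    for note_number, start_ms, end_ms in notes:
        j_start = start_ms // window_ms
        if j_start >= jmax:
            continue  # the note begins after the last window
        j_end = min(jmax - 1, end_ms // window_ms)
        series[j_start]["note_start"] = True   # note start flag
        series[j_end]["note_end"] = True       # note end flag
        # Every window from start to end carries the note's pitch.
        for j in range(j_start, j_end + 1):
            series[j]["pitch"] = note_number
    return series
```

Because the index j is shared with the vocal data, the records can later be paired window by window with the sound pressure and f0 sequences to form the synchronization data.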

In the data generation process, the control unit 6 associates, for each specified time window AW(j), the vocal sound pressure transition specified in S140, the vocal frequency transition specified in S150, and the time-series note data generated in S190 (S200). Data in which the vocal sound pressure transition, the vocal frequency transition, and the time-series note data are associated in this way is hereinafter also called synchronization data.

In the data generation process, the control unit 6 executes a vibrato feature calculation process that, based on the synchronization data, identifies the sections of the vocal data sung with vibrato as a singing technique and calculates vibrato features (S210).

The vibrato features are one of the features of a professional singer's way of singing, and are information about the vibrato that the professional singer used in the music. The vibrato features calculated in S210 of the present embodiment include a center pitch difference and a vibrato depth. The center pitch difference is the difference between the center pitch of the vibrato in a section of the vocal data sung with vibrato and the pitch of the melody note NO(i) whose performance timing corresponds to that section. The vibrato depth represents the width of the swing, along the frequency axis, of the fundamental frequency f0 in a section of the vocal data sung with vibrato.

In the data generation process, the control unit 6 further generates the singing feature data SF (S220). Specifically, in S220 of the present embodiment, the singing feature data SF is generated by attaching identification information (hereinafter, "vibrato ID") to the vibrato feature amounts calculated in S210.

In the data generation process, the control unit 6 stores the singing feature data SF generated in S220 in the storage unit 5 (S230).
The control unit 6 then ends the data generation process.
<Vibrato feature calculation processing>
In the vibrato feature calculation process executed in S210 of the data generation process, as shown in FIG. 3, the control unit 6 acquires the vocal frequency transition specified earlier in S150 (S710).

The control unit 6 then applies smoothing differentiation to the vocal frequency transition acquired in S710, taking a specified number of specified time windows AW at a time (S720). The control unit 6 further specifies, as intersection points PI(j), the specified time windows AW(j) at which the result of the smoothing differentiation crosses a specified axis (the so-called X axis) (S720). Each intersection point PI(j) is a peak or trough of the vocal frequency transition, which this description calls an inflection point. In S720, the control unit 6 also records for each intersection point PI whether it is a local-maximum inflection point or a local-minimum inflection point.
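One way to picture S720 is the Python sketch below. The smoothing derivative used here (mean of the following `half` windows minus mean of the preceding `half` windows) is only one possible choice and is an assumption, as are the function name and the "max"/"min" labels; the description does not fix a particular smoothing differentiation.

```python
def find_extrema(f0, half=2):
    """Return [(j, kind), ...] where the smoothed slope of f0 changes sign.

    f0   : fundamental frequency per specified time window AW(j)
    half : number of windows averaged on each side (assumed parameter)
    kind : "max" for a peak, "min" for a trough
    """
    extrema = []
    last_sign, last_j = 0, None
    for j in range(half, len(f0) - half):
        ahead = sum(f0[j + 1:j + 1 + half]) / half
        behind = sum(f0[j - half:j]) / half
        slope = ahead - behind
        sign = (slope > 0) - (slope < 0)
        if sign == 0:
            continue  # flat here; wait for the next clearly signed slope
        if last_sign and sign != last_sign:
            # The zero crossing lies between the last signed window and this one.
            kind = "max" if last_sign > 0 else "min"
            extrema.append(((last_j + j) // 2, kind))
        last_sign, last_j = sign, j
    return extrema
```

The returned indices correspond to the intersection points PI(j), each tagged as a local maximum or local minimum.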

In the vibrato feature calculation process, the control unit 6 extracts one intersection point PI(j) from among the intersection points PI(j) (S730). In S730, the control unit 6 first acquires the earliest intersection point PI(j) along the time axis. The intersection point PI(j) acquired in S730 is hereinafter called the determination target intersection point PI(j).

The control unit 6 then determines whether the number of intersection points PI that lie within a specified time length from the determination target intersection point PI(j) is greater than a predetermined threshold Thc (for example, 4) (S740). If the number of intersection points PI is equal to or less than the threshold Thc (S740: NO), the control unit 6 moves the vibrato feature calculation process to S770, described later. If the number of intersection points PI is greater than the threshold Thc (S740: YES), the control unit 6 moves the vibrato feature calculation process to S750.

In S750, the control unit 6 determines whether the pitch difference between the highest pitch and the lowest pitch of the vocal frequency transition, within the specified time length from the determination target intersection point PI(j), is smaller than a predetermined specified value. If the pitch difference within the specified time length is equal to or greater than the specified value (S750: NO), the control unit 6 moves the vibrato feature calculation process to S770. If the pitch difference within the specified time length is smaller than the specified value (S750: YES), the control unit 6 moves the vibrato feature calculation process to S760.

In S760, the control unit 6 assigns a vibrato flag to every specified time window AW(j) that lies within the specified time length from the determination target intersection point PI(j). The vibrato flag indicates that the window belongs to a section that may have been sung with vibrato. One area made up of specified time windows AW(j) carrying the vibrato flag is hereinafter called a vibrato candidate area CS(k). The symbol k is an identifier for a vibrato candidate area CS and serves as the vibrato ID.

In the subsequent S770, the control unit 6 determines whether the processing from S730 to S760 has been executed for all intersection points PI. If it has not (S770: NO), the control unit 6 returns the vibrato feature calculation process to S730. In that S730, the control unit 6 extracts the earliest intersection point PI along the time axis from among the intersection points PI whose specified time windows AW do not carry the vibrato flag. The control unit 6 then executes the steps up to S770 again.
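The loop from S730 to S770 can be sketched in Python as follows. Several details are assumptions made for illustration: the time length is expressed as a window count `span`, the pitch sequence is given in cents, the determination target itself is included when counting intersection points, and the default values of `span` and `max_range` are invented, since the description gives an example value only for Thc.

```python
def find_vibrato_candidates(extrema, pitch_cents, span=50, thc=4, max_range=200.0):
    """Return [(first_window, last_window), ...] vibrato candidate areas.

    extrema     : sorted window indices of the intersection points PI
    pitch_cents : vocal pitch per window, in cents (assumed unit)
    span        : specified time length, counted in windows (assumed)
    thc         : threshold Thc on the number of intersection points
    max_range   : specified value for the pitch difference, in cents (assumed)
    """
    candidates = []
    flagged_until = -1
    for j in extrema:
        if j <= flagged_until:
            continue  # this point already lies inside a flagged area
        end = min(j + span, len(pitch_cents) - 1)
        # S740: enough intersection points within the time length?
        if sum(1 for e in extrema if j <= e <= end) <= thc:
            continue
        # S750: is the pitch swing within the allowed range?
        segment = pitch_cents[j:end + 1]
        if max(segment) - min(segment) >= max_range:
            continue
        # S760: flag all windows in the range as one candidate area.
        candidates.append((j, end))
        flagged_until = end
    return candidates
```

Skipping points that already lie inside a flagged area mirrors the rule that S730 only picks intersection points from windows without the vibrato flag.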

If, on the other hand, the processing from S730 to S760 has been executed for all intersection points PI (S770: YES), the control unit 6 moves the vibrato feature calculation process to S780. In S780, the control unit 6 determines, for each vibrato candidate area CS, whether the pitch of the melody note NO changes within it. If the pitch of the melody note NO does not change (S780: NO), the control unit 6 sets each vibrato candidate area CS(k) as a vibrato region VT(k) and moves the vibrato feature calculation process to S800, described later. A vibrato region VT is a section of specified time windows AW in which the vocal data is regarded as sung with vibrato.

If the pitch of the melody note NO does change (S780: YES), the control unit 6 divides the vibrato candidate area CS at the specified time window AW(j) corresponding to the timing at which the pitch changed (S790). The control unit 6 then sets all vibrato candidate areas CS, including the divided ones, as vibrato regions VT and moves the vibrato feature calculation process to S800. In S800, the control unit 6 acquires the vocal frequency transition corresponding to one of the vibrato regions VT as a vibrato waveform.
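The division in S780 and S790 amounts to cutting a candidate area wherever the melody pitch changes. A minimal Python sketch, assuming a candidate area is a pair of window indices and `note_pitch` holds the melody note pitch for each window (both hypothetical representations):

```python
def split_at_pitch_changes(area, note_pitch):
    """Split one vibrato candidate area at every change of the melody pitch.

    area       : (first_window, last_window), inclusive
    note_pitch : melody note pitch for each specified time window AW(j)
    Returns a list of (first_window, last_window) vibrato regions.
    """
    first, last = area
    regions, start = [], first
    for j in range(first + 1, last + 1):
        if note_pitch[j] != note_pitch[j - 1]:
            regions.append((start, j - 1))  # close the region before the change
            start = j
    regions.append((start, last))
    return regions
```

An area with no pitch change comes back unchanged as a single region, which corresponds to the S780: NO branch.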

The control unit 6 then calculates a high tone reference pitch and a low tone reference pitch based on the vibrato waveform acquired in S800 (hereinafter, "acquired vibrato waveform") (S810). The high tone reference pitch represents the pitch of the upper range of the acquired vibrato waveform, and the low tone reference pitch represents the pitch of its lower range. Specifically, in S810, the control unit 6 calculates, as the high tone reference pitch, a representative value of the fundamental frequency f0 at the local-maximum inflection points of the acquired vibrato waveform (that is, at the specified time windows AW of the vocal frequency transition that correspond to intersection points PI). Likewise, the control unit 6 calculates, as the low tone reference pitch, a representative value of the fundamental frequency f0 at the local-minimum inflection points of the acquired vibrato waveform. The representative value may be the arithmetic mean of the fundamental frequencies f0 at the local-maximum or local-minimum inflection points. It may also be the arithmetic mean after excluding the highest and the lowest of those fundamental frequencies f0. Alternatively, the representative value may be the median or the mode of the fundamental frequencies f0 at the local-maximum or local-minimum inflection points.

In the present embodiment, the control unit 6 calculates the high tone reference pitch and the low tone reference pitch using equal numbers of local-minimum and local-maximum inflection points in the acquired vibrato waveform.

In the vibrato feature calculation process, the control unit 6 then calculates the vibrato center pitch (S820). The vibrato center pitch is the center pitch of the acquired vibrato waveform. Specifically, in S820, the control unit 6 calculates the arithmetic mean of the high tone reference pitch and the low tone reference pitch calculated in S810 as the vibrato center pitch.

The control unit 6 further calculates, as the center pitch difference, the difference between the vibrato center pitch calculated in S820 and the pitch of the melody note NO(i) whose performance period corresponds to the acquired vibrato waveform (S830). The control unit 6 then calculates the difference between the high tone reference pitch and the low tone reference pitch as the vibrato depth (S840). In the present embodiment, the control unit 6 calculates the center pitch difference and the vibrato depth in cents.
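Steps S810 to S840 can be summarized in a short Python sketch. It rests on assumptions that this description does not settle: the representative value is the plain arithmetic mean, the averaging is done in hertz before converting differences to cents, and the melody note pitch is obtained from its MIDI note number with the usual equal-temperament convention (note 69 = 440 Hz). The function and key names are hypothetical.

```python
import math

def midi_to_hz(note_number):
    """Equal-temperament frequency of a MIDI note number (69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)

def cents_between(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def vibrato_features(peaks_hz, troughs_hz, note_number):
    """Compute the center pitch difference and vibrato depth in cents.

    peaks_hz   : f0 at the local-maximum points of the vibrato waveform
    troughs_hz : f0 at the local-minimum points of the vibrato waveform
    """
    high = sum(peaks_hz) / len(peaks_hz)      # S810: upper reference pitch
    low = sum(troughs_hz) / len(troughs_hz)   # S810: lower reference pitch
    center = (high + low) / 2.0               # S820: vibrato center pitch
    return {
        "center_pitch_diff": cents_between(center, midi_to_hz(note_number)),  # S830
        "vibrato_depth": cents_between(high, low),                            # S840
    }
```

A positive center pitch difference means the vibrato is centered above the notated pitch, and a negative value means it is centered below.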

In the vibrato feature calculation process, the control unit 6 determines whether the steps from S800 to S840 have been executed for all vibrato regions VT (S850). If they have not (S850: NO), the control unit 6 returns the vibrato feature calculation process to S800. In that S800, the control unit 6 acquires, as the vibrato waveform, the vocal frequency transition corresponding to one of the vibrato regions VT for which the steps from S800 to S840 have not yet been executed. The control unit 6 then executes the steps up to S850 again.

If, on the other hand, the steps from S800 to S840 have been executed for all vibrato regions VT (S850: YES), the control unit 6 ends the vibrato feature calculation process and moves to S220 of the data generation process.

In short, in the vibrato feature calculation process, as shown in FIG. 4, the control unit 6 identifies the sections of the vocal data sung with vibrato, that is, the vibrato regions VT, based on the vocal frequency transition. The control unit 6 then calculates the high tone reference pitch and the low tone reference pitch based on the pitches at the peaks and troughs of the vocal frequency transition in each identified vibrato region VT. The control unit 6 further calculates, as the center pitch difference, the difference between the center pitch of the vibrato waveform and the pitch of the melody note NO(i) whose performance period corresponds to that vibrato waveform. It also calculates the difference between the high tone reference pitch and the low tone reference pitch as the vibrato depth.

In S220 of the data generation process, which is executed after the vibrato feature calculation process ends, the control unit 6 attaches a vibrato ID to the vibrato feature amounts, as shown in FIG. 5. That is, the control unit 6 generates the singing feature data SF by associating the vibrato feature amounts with the vibrato regions VT.

The singing feature data SF generated when the control unit 6 of the information processing device 2 executes the data generation process may be stored in the storage unit 14 of the information processing server 10 by means of a portable storage medium. If the information processing device 2 and the information processing server 10 are connected via a communication network, the singing feature data SF stored in the storage unit 5 of the information processing device 2 may instead be transferred over the communication network and stored in the storage unit 14 of the information processing server 10. In either case, the singing feature data SF is stored in the storage unit 14 of the information processing server 10 in association with the MIDI music MD of the corresponding music.
<Display processing>
The display process executed by the control unit 50 of the karaoke apparatus 30 starts when a command to launch the processing program for the display process is input.

Once the display process starts, as shown in FIG. 6, the control unit 50 acquires from the storage unit 14 of the information processing server 10 the MIDI music MD corresponding to the music specified via the input reception unit 34 (S910). The control unit 50 then acquires the lyrics data corresponding to the MIDI music MD acquired in S910 (S920). The control unit 50 further acquires the singing feature data SF corresponding to the MIDI music MD acquired in S910 (S930).

In the display process, the control unit 50 then plays the MIDI music MD acquired in S910 (S940). Specifically, in S940, the control unit 50 outputs the MIDI music MD to the music playback unit 36. On receiving the MIDI music MD, the music playback unit 36 plays the music. The sound source signal of the music played by the music playback unit 36 is output to the speaker 60 via the output unit 42, and the speaker 60 converts the sound source signal into sound and outputs it.

Further, in the display process, the control unit 50 performs display based on the MIDI music MD acquired in S910, the lyrics data acquired in S920, and the singing feature data SF acquired in S930 (S950). Specifically, in S950, the control unit 50 outputs the melody track of the MIDI music MD, the lyrics data, and the singing feature data SF to the video control unit 46. The video control unit 46 sequentially outputs the lyrics data and the melody notes NO(i) constituting the melody track to the display unit 64 along the time axis. The display unit 64 then sequentially displays the lyrics data along the time axis and, as shown in FIG. 7, sequentially displays the melody notes NO(i) constituting the melody track along the time axis.

Furthermore, the video control unit 46 sequentially outputs the vibrato feature amounts included in the singing feature data SF to the display unit 64 along the time axis, each at the timing of its corresponding vibrato region VT. The display unit 64 then sequentially displays each vibrato feature amount along the time axis at the timing of the corresponding vibrato region VT.
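The timing rule above can be illustrated with a minimal Python sketch. The record layout (`VibratoEntry`), the use of seconds for the start and end of a vibrato region VT, and the function name `entry_at` are all assumptions made for illustration; the source does not specify how the singing feature data SF is laid out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VibratoEntry:
    """One hypothetical record of singing feature data SF."""
    start: float              # start of the vibrato region VT (seconds, assumed unit)
    end: float                # end of the vibrato region VT (seconds, assumed unit)
    center_pitch_diff: float  # center pitch difference (cents, assumed unit)
    depth: float              # vibrato depth (cents, assumed unit)


def entry_at(entries: List[VibratoEntry], t: float) -> Optional[VibratoEntry]:
    """Return the entry whose vibrato region contains playback time t, if any."""
    for entry in entries:
        if entry.start <= t < entry.end:
            return entry
    return None
```

A display loop could call `entry_at` once per frame with the current playback time and draw the returned feature amounts only while a vibrato region is active.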

In S950, the control unit 50 may also output, to the video control unit 46, the pitch transition of the singing voice sung by the user of the karaoke apparatus 30 in time with the performance of the music. The video control unit 46 may then display the pitch transition of the user's singing voice (for example, the solid-line waveform shown in FIG. 7) in a manner that lets the user recognize its deviation from the melody notes NO(i). Furthermore, in S950, the control unit 50 may output, to the video control unit 46, the pitch transition of vocal data sung by a professional singer. The video control unit 46 may then display that pitch transition (that is, the vocal frequency transition; the broken-line waveform shown in FIG. 7).

In FIG. 7, the vibrato region VT is illustrated so that the correspondence between the vibrato feature amount and the vibrato region VT can be easily understood. However, the display unit 64 of the present embodiment may display only the vibrato feature amount, without displaying the vibrato region VT.

Further, in S950, the control unit 50 may output, to the video control unit 46, the result of classifying the vibrato feature amount into predefined classification ranges. For example, if the center pitch difference is higher by a specified pitch difference or more, the control unit 50 classifies the vibrato feature as “center pitch high”. If the center pitch difference is lower by the specified pitch difference or more, the control unit 50 classifies it as “center pitch low”. Likewise, if the vibrato depth is less than a specified pitch difference, the control unit 50 classifies it as “shallow vibrato”, and if the vibrato depth is equal to or greater than the specified pitch difference, the control unit 50 classifies it as “deep vibrato”. The control unit 50 may then output the classification result to the video control unit 46 as the vibrato feature. In this case, the video control unit 46 causes the display unit 64 to display the classification result.
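As a rough illustration of this classification, the following Python sketch maps a center pitch difference and a vibrato depth to labels. The threshold values, the use of cents as the unit, the sign convention (positive means sung above the note), and the name `classify_vibrato` are assumptions for illustration only; the source says only that the thresholds are "specified" pitch differences.

```python
from typing import List


def classify_vibrato(
    center_pitch_diff: float,
    depth: float,
    center_threshold: float = 20.0,  # assumed value, in cents
    depth_threshold: float = 50.0,   # assumed value, in cents
) -> List[str]:
    """Classify vibrato features into the label ranges described in the text."""
    labels: List[str] = []

    # Center pitch: labeled only when it deviates by the threshold or more.
    if center_pitch_diff >= center_threshold:
        labels.append("center pitch high")
    elif center_pitch_diff <= -center_threshold:
        labels.append("center pitch low")

    # Depth: always labeled, shallow below the threshold and deep otherwise.
    if depth < depth_threshold:
        labels.append("shallow vibrato")
    else:
        labels.append("deep vibrato")

    return labels
```

In this sketch a center pitch difference inside the threshold band produces no center pitch label; the source does not say how that case is handled, so this is a design choice of the example.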

Subsequently, in the display process, the control unit 50 determines whether the performance of the music has ended (S960). If the performance has not ended (S960: NO), the control unit 50 returns the display process to S940. If, on the other hand, the performance has ended (S960: YES), the control unit 50 ends the display process.
[Effect of the embodiment]
As described above, the singing feature data SF of the present embodiment is obtained by specifying a vibrato region VT and associating the “center pitch difference” and the “vibrato depth” with the specified vibrato region VT. The “center pitch difference” and the “vibrato depth” are information representing characteristics of the vibrato used by a singer, and are characteristics specific to each singer.

The karaoke apparatus 30, which displays information on the display unit 64 based on the singing feature data SF, displays the “center pitch difference” and the “vibrato depth” sequentially along the time axis.
For this reason, the user of the karaoke apparatus 30 can recognize how a professional singer uses “vibrato” as a singing technique in the music being sung. As a result, the user can bring his or her own way of singing closer to that of the professional singer.

In addition, in the data generation process of the present embodiment, the number of local-maximum inflection points and the number of local-minimum inflection points in the vibrato waveform that are used to calculate the high pitch reference pitch and the low pitch reference pitch are made equal.

For this reason, according to the data generation process, singular points (outliers) can be excluded from the high pitch reference pitch and the low pitch reference pitch, and the accuracy with which these reference pitches are calculated can be improved.
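The equal-count rule, together with the center pitch (the average of the two reference pitches) and the vibrato depth (their difference) recited in the claims, can be sketched in Python as follows. Several details are assumptions not stated in the source: the mean is used as the "representative value", the counts are equalized by keeping only the first `n` extrema of each kind, extrema are found by a simple neighbor comparison, and pitches are expressed in cents.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def find_extrema(pitch: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (local maxima, local minima) of a sampled pitch contour."""
    maxima: List[float] = []
    minima: List[float] = []
    for prev, cur, nxt in zip(pitch, pitch[1:], pitch[2:]):
        if cur > prev and cur > nxt:
            maxima.append(cur)
        elif cur < prev and cur < nxt:
            minima.append(cur)
    return maxima, minima


def vibrato_features(pitch: Sequence[float], note_pitch: float) -> Dict[str, float]:
    """Compute reference pitches, center pitch difference, and vibrato depth."""
    maxima, minima = find_extrema(pitch)

    # Use the same number of maxima and minima (assumed: truncate to the smaller count).
    n = min(len(maxima), len(minima))
    if n == 0:
        raise ValueError("no complete vibrato cycle found")

    high_ref = mean(maxima[:n])  # high pitch reference pitch (assumed: mean)
    low_ref = mean(minima[:n])   # low pitch reference pitch (assumed: mean)
    center = (high_ref + low_ref) / 2

    return {
        "high_ref": high_ref,
        "low_ref": low_ref,
        "center_pitch_diff": center - note_pitch,
        "depth": high_ref - low_ref,
    }
```

Truncating to the smaller count is only one way to equalize the two sets; dropping the extremum farthest from the others would be another, and the source does not say which is intended.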
[Other Embodiments]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various forms without departing from the gist of the present invention.

For example, the data generation process in the above embodiment is executed by the information processing device 2, but the device that executes the data generation process in the present invention is not limited to the information processing device 2. That is, the device that executes the data generation process may be the information processing server 10 or the karaoke apparatus 30. In this case, the information processing device 2 may be omitted from the system 1.

The display process in the above embodiment is executed by the karaoke apparatus 30, but the device that executes the display process in the present invention is not limited to the karaoke apparatus 30 and may be the information processing device 2. In this case, the information processing device 2 needs to be provided with the video control unit 46, and the display unit 64 needs to be connected to that video control unit 46.

In the above embodiment, the data generation process and the display process are configured as separate processes. In the present invention, however, the data generation process and the display process may be configured as a single process. In this case, the single process consisting of the data generation process and the display process may be executed by the information processing device 2 or by the karaoke apparatus 30.

In the vibrato feature calculation process of the above embodiment, both the center pitch difference and the vibrato depth are calculated as vibrato features. In the present invention, however, only one of the center pitch difference and the vibrato depth may be calculated as the vibrato feature. In this case, S820 and S830 may be omitted from the vibrato feature calculation process, or S840 may be omitted.

In the data generation process of the above embodiment, the singing feature data SF is generated by associating the “center pitch difference” and the “vibrato depth” with the vibrato region VT. However, the object with which the “center pitch difference” and the “vibrato depth” are associated is not limited to the vibrato region VT. For example, they may be associated with the melody note NO that contains the vibrato region VT.

In this case as well, what is associated with the melody note NO may be at least one of the “center pitch difference” and the “vibrato depth”.
An aspect in which part of the configuration of the above embodiment is omitted, to the extent that the problem can still be solved, is also an embodiment of the present invention. An aspect configured by appropriately combining the above embodiment and the modifications is also an embodiment of the present invention. Furthermore, any aspect conceivable without departing from the essence of the invention specified by the wording of the claims is also an embodiment of the present invention.

DESCRIPTION OF SYMBOLS: 1 ... System; 2 ... Information processing device; 3 ... Input receiving unit; 4 ... Information output unit; 5 ... Storage unit; 6, 16, 50 ... Control unit; 7, 18, 52 ... ROM; 8, 20, 54 ... RAM; 9, 22, 56 ... CPU; 10 ... Information processing server; 12 ... Communication unit; 14 ... Storage unit; 30 ... Karaoke apparatus; 32 ... Communication unit; 34 ... Input receiving unit; 36 ... Music playback unit; 38 ... Storage unit; 40 ... Voice control unit; 42 ... Output unit; 44 ... Microphone input unit; 46 ... Video control unit; 60 ... Speaker; 62 ... Microphone; 64 ... Display unit

Claims

A program causing a computer to execute:
a first acquisition step of acquiring, from a first storage unit, musical score data representing a musical score of a piece of music composed of a plurality of notes, the musical score data associating a pitch and a performance period with each of the plurality of notes;
a second acquisition step of acquiring, from a second storage unit, music data including a vocal sound of the music being sung;
an extraction step of extracting, from the music data, vocal data representing the vocal sound;
a pitch transition derivation step of deriving, based on the vocal data, a vocal pitch transition representing the transition of pitch in the vocal data;
a specifying step of specifying, based on the vocal pitch transition, a vibrato waveform, which is a singing section of the vocal pitch transition sung using vibrato;
a reference calculation step of calculating, based on the pitches at the vertices of the vibrato waveform, a high pitch reference pitch representing the high pitch of the vibrato waveform and a low pitch reference pitch representing the low pitch of the vibrato waveform;
a determination step of determining, based on the high pitch reference pitch and the low pitch reference pitch, at least one of a center pitch difference, which is the difference between the center pitch of the vibrato waveform and the pitch of the note corresponding to the singing section in the musical score data, and a vibrato depth, which represents the fluctuation width of the vibrato waveform along the frequency axis; and
a data generation step of generating singing feature data by associating at least one of the center pitch difference and the vibrato depth with the section in the musical score data corresponding to the singing section.

The program according to claim 1, wherein the determination step:
determines the average value of the high pitch reference pitch and the low pitch reference pitch as the center pitch of the vibrato waveform; and
determines the difference between the center pitch and the pitch of the note corresponding to the singing section in the musical score data as the center pitch difference.

The program according to claim 1 or 2, wherein the determination step calculates the difference between the high pitch reference pitch and the low pitch reference pitch as the vibrato depth.

The program according to any one of claims 1 to 3, wherein the reference calculation step includes:
a vertex detection step of detecting inflection points in the vibrato waveform;
a high pitch reference calculation step of calculating, as the high pitch reference pitch, a representative value of the pitches at those inflection points of the vibrato waveform that are local maxima; and
a low pitch reference calculation step of calculating, as the low pitch reference pitch, a representative value of the pitches at those inflection points of the vibrato waveform that are local minima.

The program according to claim 4, wherein the reference calculation step calculates the high pitch reference pitch and the low pitch reference pitch using the same number of local-minimum inflection points and local-maximum inflection points in the vibrato waveform.

The program according to any one of the preceding claims, further causing the computer to execute a display control step of displaying, on a display unit, the plurality of notes constituting the musical score data acquired in the first acquisition step, and displaying, on the display unit, at least one of the center pitch difference and the vibrato depth included in the singing feature data in association with the section in the musical score data corresponding to the singing section.

An information processing device comprising:
first acquisition means for acquiring, from a first storage unit, musical score data representing a musical score of a piece of music composed of a plurality of notes, the musical score data associating a pitch and a performance period with each of the plurality of notes;
second acquisition means for acquiring, from a second storage unit, music data including a vocal sound of the music being sung;
extraction means for extracting, from the music data, vocal data representing the vocal sound;
pitch transition derivation means for deriving, based on the vocal data, a vocal pitch transition representing the transition of pitch in the vocal data;
specifying means for specifying, based on the vocal pitch transition, a vibrato waveform, which is a singing section of the vocal pitch transition sung using vibrato;
reference calculation means for calculating, based on the pitches at the vertices of the vibrato waveform, a high pitch reference pitch representing the high pitch of the vibrato waveform and a low pitch reference pitch representing the low pitch of the vibrato waveform;
determination means for determining, based on the high pitch reference pitch and the low pitch reference pitch, at least one of a center pitch difference, which is the difference between the center pitch of the vibrato waveform and the pitch of the note corresponding to the singing section in the musical score data, and a vibrato depth, which represents the fluctuation width of the vibrato waveform along the frequency axis; and
data generation means for generating singing feature data by associating at least one of the center pitch difference and the vibrato depth with the section in the musical score data corresponding to the singing section.

A data generation method comprising:
a first acquisition procedure in which an information processing device acquires, from a first storage unit, musical score data representing a musical score of a piece of music composed of a plurality of notes, the musical score data associating a pitch and a performance period with each of the plurality of notes;
a second acquisition procedure in which the information processing device acquires, from a second storage unit, music data including a vocal sound of the music being sung;
an extraction procedure in which the information processing device extracts, from the music data, vocal data representing the vocal sound;
a pitch transition derivation procedure in which the information processing device derives, based on the vocal data, a vocal pitch transition representing the transition of pitch in the vocal data;
a specifying procedure in which the information processing device specifies, based on the vocal pitch transition, a vibrato waveform, which is a singing section of the vocal pitch transition sung using vibrato;
a reference calculation procedure in which the information processing device calculates, based on the pitches at the vertices of the vibrato waveform, a high pitch reference pitch representing the high pitch of the vibrato waveform and a low pitch reference pitch representing the low pitch of the vibrato waveform;
a determination procedure in which the information processing device determines, based on the high pitch reference pitch and the low pitch reference pitch, at least one of a center pitch difference, which is the difference between the center pitch of the vibrato waveform and the pitch of the note corresponding to the singing section in the musical score data, and a vibrato depth, which represents the fluctuation width of the vibrato waveform along the frequency axis; and
a data generation procedure in which the information processing device generates singing feature data by associating at least one of the center pitch difference and the vibrato depth with the section in the musical score data corresponding to the singing section.