JP5983670B2

JP5983670B2 - Program, information processing apparatus, and data generation method

Info

Publication number: JP5983670B2
Application number: JP2014069731A
Authority: JP
Inventors: 誠司黒川
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2014-03-28
Filing date: 2014-03-28
Publication date: 2016-09-06
Anticipated expiration: 2034-03-28
Also published as: JP2015191170A

Description

The present invention relates to a program for generating data, an information processing apparatus, and a data generation method.

Singing evaluation techniques that assess how skillfully a user sings the vocal melody of a song are known (see Patent Document 1).
The technique described in Patent Document 1 evaluates skill by comparing the voice sung by the user against a model voice prepared for each song, and presents a comment corresponding to the evaluation after the singing ends.

Patent Document 1: JP 2007-233013 A

In songs sung by professional singers, characteristics peculiar to each singer often appear. When a user sings with a karaoke apparatus, an information processing apparatus, or the like, reproducing the singer's characteristic way of singing makes the user's singing sound more skillful.

For this reason, karaoke apparatuses, information processing apparatuses, and the like are required to present the characteristics of a singer's way of singing before the user actually sings.
Conventional techniques, however, only present the deviation between the voice sung by the user and the model voice after the user has sung. Users of such apparatuses therefore cannot learn, before singing, how they should sing in order to reproduce the singer's characteristic way of singing.

In other words, conventional karaoke apparatuses, information processing apparatuses, and the like present nothing about the singer's characteristic way of singing before the user actually sings.
An object of the present invention is therefore to provide a technique capable of generating the data needed to present the characteristics of a singer's way of singing.

The present invention, made to achieve the above object, relates to a program to be executed by a computer. The program causes the computer to execute a first acquisition step, a second acquisition step, an extraction step, a timing specifying step, a first generation step, a tame time calculation step, and a second generation step.

In the first acquisition step, score data is acquired from a first storage unit. The score data represents the score of a song composed of a plurality of notes, and defines at least a pitch and a performance start timing for each of the notes. In the second acquisition step, music data containing a vocal sound of the song being sung is acquired from a second storage unit.

In the extraction step, vocal data representing the vocal sound is extracted from the music data acquired in the second acquisition step. In the timing specifying step, each utterance timing, that is, each timing in the vocal data that can be regarded as the start of an utterance, is specified on the basis of the extracted vocal data. In the first generation step, timing pair data is generated on the basis of the utterance timings specified in the timing specifying step and the performance start timings of the notes in the score data acquired in the first acquisition step. The timing pair data associates a performance start timing with an utterance timing when the two satisfy a predetermined condition.

In the tame time calculation step, a tame time is calculated for each item of timing pair data generated in the first generation step. The tame time is the time difference between the performance start timing and the utterance timing in that timing pair data. In the second generation step, singing feature data representing the characteristics of the singer's way of singing each note is generated by associating each calculated tame time with the note of the score data that corresponds to the performance start timing used to calculate it.
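The tame time calculation step and the second generation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_singing_features`, the record layout, and the sign convention (utterance timing minus performance start timing, in seconds) are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def build_singing_features(
    note_numbers: List[int],
    timing_pairs: List[Tuple[int, float, float]],
) -> List[Dict[str, float]]:
    """Attach a tame time to each paired note.

    timing_pairs holds (note_index, performance_start, utterance_time),
    with both timings in seconds from the start of the song.
    """
    features = []
    for note_index, performance_start, utterance_time in timing_pairs:
        features.append({
            "note_index": note_index,
            "note_number": note_numbers[note_index],
            # Assumed sign convention: positive means the voice starts late.
            "tame_time": utterance_time - performance_start,
        })
    return features


features = build_singing_features([60, 62], [(0, 1.0, 1.25), (1, 2.0, 1.875)])
```

Keeping the note index in each record is one simple way to preserve the association between a tame time and its note, which is what later allows the value to be shown next to that note.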

The singing feature data generated by such a program associates a tame time with each note of the score data. In particular, the program calculates the tame time as the difference between the performance start timing of a note in the score data and the utterance timing detected from the vocal sound of a professional singer contained in the music data. The tame time therefore represents the characteristics of "tame" (deliberately holding back the voice), a singing technique that professional singers use when singing a song.

Moreover, in the singing feature data, each tame time is associated with the note of the score data corresponding to the performance start timing used to calculate it.
The singing feature data therefore allows the tame time of each note to be presented on a display unit connected to a karaoke apparatus or an information processing apparatus. In particular, the tame time can be presented before the user of the apparatus actually sings.

A user of a karaoke apparatus or an information processing apparatus whose display presents the tame times can see how the professional singer uses tame as a singing technique in the song being sung. As a result, the user can bring his or her own way of singing closer to that of the professional singer.

In the timing specifying step of the present invention, each utterance start timing at which the sound pressure in the vocal data becomes equal to or higher than a predefined threshold may be specified as an utterance timing.
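A minimal sketch of this threshold test is shown below. It assumes that the vocal data has already been reduced to one sound pressure value per fixed-length frame, and it treats an utterance start as the frame where the pressure rises from below the threshold to at or above it. The function name `utterance_start_times` and the framing are assumptions made for illustration only.

```python
from typing import List, Sequence


def utterance_start_times(
    pressure: Sequence[float], frame_sec: float, threshold: float
) -> List[float]:
    """Return the times (seconds) at which pressure first reaches the threshold."""
    starts = []
    previously_above = False
    for index, value in enumerate(pressure):
        above = value >= threshold
        # Record only the rising edge, not every frame above the threshold.
        if above and not previously_above:
            starts.append(index * frame_sec)
        previously_above = above
    return starts


starts = utterance_start_times([0.0, 0.1, 0.6, 0.7, 0.2, 0.9], 0.25, 0.5)
```

Recording only the rising edge yields one utterance start per sung segment rather than one per loud frame.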

Further, the first generation step may generate timing pair data that combines the performance start timing of a note in the score data with the utterance start timing whose time distance from that performance start timing is shortest. The tame time calculation step may then calculate, as the tame time, a pronunciation tame time, which is the time difference between the performance start timing and the utterance start timing in the generated timing pair data.
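The nearest-timing pairing and the resulting pronunciation tame time might look like the sketch below. The name `pair_with_nearest_start` and the sign convention (utterance start minus performance start) are assumptions for illustration; the specification only requires that the time distance be the shortest.

```python
from typing import List, Sequence, Tuple


def pair_with_nearest_start(
    note_on_times: Sequence[float], utterance_starts: Sequence[float]
) -> List[Tuple[int, float, float, float]]:
    """Pair each note-on with its nearest utterance start.

    Returns (note_index, note_on, utterance_start, pronunciation_tame_time).
    """
    pairs = []
    if not utterance_starts:
        return pairs
    for index, note_on in enumerate(note_on_times):
        nearest = min(utterance_starts, key=lambda t: abs(t - note_on))
        pairs.append((index, note_on, nearest, nearest - note_on))
    return pairs


pairs = pair_with_nearest_start([1.0, 2.0], [1.125, 1.875])
```

In this example the first note is sung 0.125 s late and the second 0.125 s early, so the two pronunciation tame times have opposite signs.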

Such a program can calculate the pronunciation tame time as one kind of tame time. The pronunciation tame time is the time difference between the performance start timing of a note in the score data and the utterance start timing at which the singer actually begins to vocalize in the music data, and it is a characteristic of that singer's way of singing.

The program of the present invention can therefore generate singing feature data that includes the pronunciation tame time as a characteristic of the singer's way of singing.
As a result, a karaoke apparatus or an information processing apparatus can present how far the timing at which the professional singer begins to vocalize deviates from the performance start timing of each note.

In the tame time calculation step of the present invention, a pronunciation tame representative value, which is a representative value of the pronunciation tame times in the song, may be calculated. Then, on the basis of the pronunciation tame times and the pronunciation tame representative value, a special pronunciation tame time, which is a pronunciation tame time that deviates from the representative value by a specified time length or more, may be calculated as the tame time.
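The selection of special pronunciation tame times can be sketched as below. The specification does not say which statistic serves as the representative value, so the use of the median here is an assumption, as are the function name `special_tame_times` and the parameter `min_deviation`.

```python
from statistics import median
from typing import List, Sequence, Tuple


def special_tame_times(
    tame_times: Sequence[float], min_deviation: float
) -> List[Tuple[int, float]]:
    """Return (note_index, tame_time) for values far from the representative value."""
    if not tame_times:
        return []
    # Assumption: the median stands in for the "representative value".
    representative = median(tame_times)
    return [
        (index, value)
        for index, value in enumerate(tame_times)
        if abs(value - representative) >= min_deviation
    ]


special = special_tame_times([0.0, 0.125, 0.0, 0.5, 0.125], 0.25)
```

Here the representative value is 0.125 s, and only the fourth note (0.5 s) deviates from it by at least 0.25 s.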

The special pronunciation tame time specified by such a program is the tame time at a note where tame as a singing technique appears conspicuously in the song, and it is a prominent feature of the way the singer sings the song.

The program of the present invention can therefore generate singing feature data that includes the special pronunciation tame time as a prominent feature of the singer's way of singing.
As a result, the places in the song where the professional singer uses tame conspicuously can be presented via a karaoke apparatus or an information processing apparatus. The user of the apparatus can then bring his or her own way of singing closer to the professional singer's, even if only at those places.

In the timing specifying step of the present invention, a pitch change timing is specified as the utterance timing. A pitch change timing is a timing at which the vocal frequency transition, which represents the change of pitch in the vocal data, enters the pitch range of a note in the score data. In the first generation step, timing pair data is generated that associates the performance start timing of a note in the score data with the pitch change timing that lies within a predefined time range along the time axis from that performance start timing and whose time distance from the performance start timing is shortest.

Further, in this case, the tame time calculation step calculates, as the tame time, a pitch tame time, which is the time difference between the performance start timing and the pitch change timing in the timing pair data generated in the first generation step.
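One possible reading of the pitch-based variant is sketched below. It assumes the vocal frequency transition is available as (time, pitch) samples with pitch expressed on the MIDI note number scale, that the pitch range of a note is the note number plus or minus a tolerance, and that the specified time range is symmetric around the performance start timing. All names and these three choices are assumptions, not details from the specification.

```python
from typing import List, Optional, Sequence, Tuple


def pitch_change_times(
    contour: Sequence[Tuple[float, float]], note_number: int, tolerance: float = 0.5
) -> List[float]:
    """Times at which the sung pitch enters the note's pitch range."""
    times = []
    previously_inside = False
    for time_sec, pitch in contour:
        inside = abs(pitch - note_number) <= tolerance
        if inside and not previously_inside:
            times.append(time_sec)
        previously_inside = inside
    return times


def pitch_tame_time(
    contour: Sequence[Tuple[float, float]],
    note_number: int,
    note_on: float,
    window_sec: float,
    tolerance: float = 0.5,
) -> Optional[float]:
    """Pitch change timing nearest to note_on within the window, minus note_on."""
    candidates = [
        t for t in pitch_change_times(contour, note_number, tolerance)
        if abs(t - note_on) <= window_sec
    ]
    if not candidates:
        return None  # no pair is formed for this note
    nearest = min(candidates, key=lambda t: abs(t - note_on))
    return nearest - note_on


contour = [(0.75, 58.0), (1.0, 59.0), (1.25, 59.75), (1.5, 60.0)]
result = pitch_tame_time(contour, 60, 1.0, 0.5)
```

Returning `None` when no pitch change timing falls inside the window reflects the idea that a note without a qualifying timing simply does not produce timing pair data.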

Such a program can calculate the pitch tame time as one kind of tame time. The pitch tame time is the time difference between the timing at which the pitch of the singer's voice matches the pitch of a note in the score data and the performance start timing of that note, and it appears as a characteristic of the singer's way of singing.

The program of the present invention can therefore generate singing feature data that includes the pitch tame time, a characteristic of the way of singing peculiar to each singer.
As a result, a karaoke apparatus or an information processing apparatus can present how far the professional singer shifts the timing at which the pitch of the sung voice is brought into agreement with the pitch of each note in the score data.

Further, in the tame time calculation step of the present invention, a pitch tame representative value, which is a representative value of the pitch tame times in the song, may be calculated. Then, on the basis of the pitch tame times and the pitch tame representative value, a special pitch tame time, which is a pitch tame time that deviates from the representative value by a specified time length or more, may be specified.

The special pitch tame time specified by such a program is the tame time at a note where the shift in the timing at which the professional singer's pitch is matched to the pitch of the note in the score data appears more conspicuously, and it is a prominent feature of the way the singer sings the song.

The program of the present invention can therefore generate singing feature data that includes the special pitch tame time, a prominent feature of the singer's way of singing.
As a result, the places in the song where this pitch-matching timing shift is conspicuous can be presented via a karaoke apparatus or an information processing apparatus. The user of the apparatus can then bring his or her own way of singing closer to the professional singer's, even if only at those places.

Further, the program of the present invention may cause the computer to execute a third acquisition step, a fourth acquisition step, and a display control step. In the third acquisition step, the score data of a designated song is acquired. In the fourth acquisition step, the singing feature data that corresponds to the score data acquired in the third acquisition step and that was generated in the second generation step is acquired. In the display control step, each note constituting the score data acquired in the third acquisition step is displayed on a display unit, and each tame time contained in the singing feature data acquired in the fourth acquisition step is displayed on the display unit in association with the note of the score data that corresponds to the performance start timing used to calculate that tame time.
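As a rough illustration of the display control step, the sketch below builds one text row per note and appends the tame time where the singing feature data has one. A real karaoke display would draw graphics rather than text. The function name `display_rows`, the row format, and the use of a dictionary keyed by note index are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def display_rows(
    notes: Sequence[Tuple[int, float]], tame_by_note: Dict[int, float]
) -> List[str]:
    """Format each (note_number, note_on) with its tame time, if any."""
    rows = []
    for index, (note_number, note_on) in enumerate(notes):
        tame = tame_by_note.get(index)
        label = "" if tame is None else f"{tame:+.2f}s"
        rows.append(f"{note_on:6.2f}s note={note_number:3d} {label}".rstrip())
    return rows


rows = display_rows([(60, 1.0), (62, 2.0)], {0: 0.25})
```

Because the rows are built from the score data rather than from live audio, they can be produced before the user starts singing.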

Such a program can present, on a display unit connected to a karaoke apparatus or an information processing apparatus, the tame time of each note constituting the song on the basis of the score data. Furthermore, if the notes constituting the score data are displayed on the display unit sequentially along the time axis, the tame times can be displayed before the user of the apparatus actually sings.

As a result, a user of a karaoke apparatus or an information processing apparatus whose display unit presents the tame times can see how the professional singer uses tame as a singing technique in the song being sung.

The present invention may also be implemented as an information processing apparatus.
The information processing apparatus according to the present invention includes first acquisition means, second acquisition means, extraction means, timing specifying means, first generation means, tame time calculation means, and second generation means.

The first acquisition means acquires the score data from the first storage unit, and the second acquisition means acquires the music data from the second storage unit. The extraction means extracts the vocal data from the music data, and the timing specifying means specifies each utterance timing on the basis of the extracted vocal data. The first generation means generates the timing pair data on the basis of the specified utterance timings and the performance start timings of the notes in the score data, and the tame time calculation means calculates a tame time on the basis of each item of generated timing pair data. The second generation means then generates the singing feature data by associating each calculated tame time with the note of the score data corresponding to the performance start timing used to calculate it.

Such an information processing apparatus provides the same effects as the program according to the present invention.
The present invention may also be implemented as a data generation method executed by an information processing apparatus.

The data generation method according to the present invention includes a first acquisition procedure, a second acquisition procedure, an extraction procedure, a timing specifying procedure, a first generation procedure, a tame time calculation procedure, and a second generation procedure.
In the first acquisition procedure, the information processing apparatus acquires the score data from the first storage unit, and in the second acquisition procedure, it acquires the music data from the second storage unit. In the extraction procedure, the information processing apparatus extracts the vocal data from the acquired music data, and in the timing specifying procedure, it specifies each utterance timing on the basis of the extracted vocal data. In the first generation procedure, the information processing apparatus generates the timing pair data on the basis of the specified utterance timings and the performance start timings of the notes in the score data, and in the tame time calculation procedure, it calculates a tame time on the basis of each item of generated timing pair data. In the second generation procedure, the information processing apparatus generates the singing feature data by associating each calculated tame time with the note of the score data corresponding to the performance start timing used to calculate it.

Such a data generation method provides the same effects as the program according to the present invention.

FIG. 1 is a block diagram showing the schematic configuration of a system including an information processing apparatus to which the present invention is applied.
FIG. 2 is a flowchart showing the procedure of the data generation process executed by the information processing apparatus.
FIG. 3 is a flowchart showing the procedure of the pronunciation tame calculation process.
FIG. 4 is a diagram illustrating an overview of the method of calculating the pronunciation tame time.
FIG. 5 is a flowchart showing the procedure of the pitch tame calculation process.
FIG. 6 is a diagram illustrating an overview of the method of calculating the pitch tame time.
FIG. 7 is a diagram illustrating an overview of the singing feature data.
FIG. 8 is a flowchart showing the procedure of the display process executed by the karaoke apparatus.
FIG. 9 is a diagram illustrating the display shown on the display unit when the display process is executed.

Embodiments of the present invention will be described below with reference to the drawings.
<System configuration>
The karaoke apparatus 30 shown in FIG. 1 plays the song specified by the user and displays, on a display unit 64, the characteristics of the professional singer's way of singing that appear in that song.

The system 1, built so that the karaoke apparatus 30 can display these characteristics of a professional singer's way of singing, includes an information processing apparatus 2, an information processing server 10, and the karaoke apparatus 30.

The information processing apparatus 2 calculates singing feature data SF on the basis of music data WD and MIDI music MD prepared for each song. The singing feature data SF is data representing the characteristics of the way a professional singer sings the song.

The information processing server 10 stores, in a storage unit 14, at least the singing feature data SF calculated by the information processing apparatus 2 and the MIDI music MD, associated with each other for each corresponding song.
<Music data>
The music data WD is prepared in advance for each specific song and includes music management information, which describes information about the song, and master waveform data, which represents the performance sound of the song. The music management information includes music identification information (hereinafter referred to as a music ID) that identifies the song.

The master waveform data of this embodiment is audio data containing the performance sounds of a plurality of musical instruments and the vocal sound of a professional singer singing the vocal melody. The audio data may consist of an audio file in an uncompressed audio file format or an audio file in a compressed audio format.

In the following, the audio waveform data representing the instrument performance sounds contained in the master waveform data is called accompaniment data, and the audio waveform data representing the vocal sound contained in the master waveform data is called vocal data.

The instrument performance sounds contained in the accompaniment data of this embodiment include those of percussion instruments (for example, drums, taiko, and cymbals), string instruments (for example, guitar and bass), struck string instruments (for example, piano), and wind instruments (for example, trumpet and clarinet).
<MIDI music>
The MIDI music MD is prepared in advance for each song and has performance data and lyrics data.

The performance data represents the score of one song in accordance with the well-known MIDI (Musical Instrument Digital Interface) standard. It has at least a music ID and a score track representing the score for each instrument used in the song.

For each performance sound output from a MIDI sound source, the score track defines at least a pitch (the note number) and the period during which the MIDI sound source outputs that sound (hereinafter referred to as the note length). The note length in the score track is defined by a performance start timing (the note-on timing), which is the time from the start of the song's performance until output of the sound begins, and a performance end timing (the note-off timing), which is the time from the start of the song's performance until output of the sound ends.

That is, in the score track, one note NO is defined by a note number and a note length expressed by the note-on timing and the note-off timing. The score track functions as one score because the notes NO are arranged in performance order. A score track is prepared for each instrument, for example keyboard, string, percussion, and wind instruments. In this embodiment, a specific instrument (for example, vibraphone) is designated as the instrument that carries the vocal melody of the song.
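The note NO described above can be modeled as a small record, as in the sketch below. It assumes the note-on and note-off timings have already been converted to seconds from the start of the song. The class `Note` and the function `build_score_track` are hypothetical names and do not correspond to any MIDI library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Note:
    note_number: int  # pitch as a MIDI note number
    note_on: float    # performance start timing, seconds from song start
    note_off: float   # performance end timing, seconds from song start

    @property
    def length(self) -> float:
        """Note length implied by the note-on and note-off timings."""
        return self.note_off - self.note_on


def build_score_track(events: Iterable[Tuple[int, float, float]]) -> List[Note]:
    """Arrange notes in performance order so the list works as one score."""
    notes = [Note(number, on, off) for number, on, off in events]
    return sorted(notes, key=lambda note: note.note_on)


track = build_score_track([(62, 1.0, 1.5), (60, 0.0, 1.0)])
```

Sorting by note-on timing mirrors the statement that the score track functions as a score because its notes are arranged in performance order.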

The lyrics data, on the other hand, is data about the lyrics of the song and includes lyrics telop data and lyrics output data. The lyrics telop data represents the characters that make up the lyrics (hereinafter referred to as lyrics constituent characters). The lyrics output data defines a timing correspondence that associates the lyrics output timing, that is, the timing at which each lyrics constituent character is output, with the performance of the performance data.

Specifically, in the timing correspondence of this embodiment, the timing at which output of the lyrics telop data starts is associated with the timing at which performance of the performance data starts. The timing correspondence also defines the lyrics output timing of each lyrics constituent character along the time axis of the song as the elapsed time from the start of performance of the performance data. As a result, the note NO of each performance sound defined in the score track is associated with a lyrics constituent character.
<Information processing apparatus>
The information processing apparatus 2 is a well-known information processing apparatus (for example, a personal computer) that includes an input receiving unit 3, an information output unit 4, a storage unit 5, and a control unit 6.

The input receiving unit 3 is an input device that accepts information and commands from outside. Examples include keys and switches, a reading drive that reads data stored in a portable storage medium (for example, a CD, DVD, or flash memory), and a communication port that acquires information via a communication network. The information output unit 4 is an output device that outputs information to the outside, such as a writing drive that writes data to a portable storage medium or a communication port that outputs information to a communication network.

The storage unit 5 is a known storage device configured so that its stored contents can be read and written. The storage unit 5 stores at least one piece of music data WD and at least one MIDI music MD, associated with each other for each common piece of music.

The control unit 6 is a known control device built around a known microcomputer including a ROM 7, a RAM 8, and a CPU 9. The ROM 7 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 8 temporarily stores processing programs and data. The CPU 9 executes each process according to the processing programs stored in the ROM 7 or the RAM 8.

The ROM 7 of the present embodiment stores a processing program with which the control unit 6 executes a data generation process that calculates singing feature data SF based on the music data WD and the MIDI music MD stored in the storage unit 5.

<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

Of these, the communication unit 12 allows the information processing server 10 to communicate with external devices via a communication network. That is, the information processing server 10 is connected to the karaoke apparatus 30 via the communication network. The communication network referred to here may be either a wired or a wireless communication network.

The storage unit 14 is a known storage device configured so that its stored contents can be read and written. The storage unit 14 stores at least a plurality of MIDI music MD. The symbol "m" shown in FIG. 1 is an identifier that identifies each MIDI music MD stored in the storage unit 14 of the information processing server 10, and is a natural number of 1 or more. The storage unit 14 also stores the singing feature data SF generated when the information processing apparatus 2 executes the data generation process.

The control unit 16 is a known control device built around a known microcomputer including a ROM 18, a RAM 20, and a CPU 22. The ROM 18, the RAM 20, and the CPU 22 are configured in the same manner as the ROM 7, the RAM 8, and the CPU 9, respectively.

<Karaoke apparatus>
The karaoke apparatus 30 includes a communication unit 32, an input receiving unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 allows the karaoke apparatus 30 to communicate with external devices via the communication network. The input receiving unit 34 is an input device that receives information and commands in accordance with external operations. Examples of the input device include keys, switches, and the receiver of a remote controller.

The music playback unit 36 performs the music based on the MIDI music MD downloaded from the information processing server 10. The music playback unit 36 is, for example, a MIDI sound source. The audio control unit 40 is a device that controls audio input and output, and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, through which the microphone input unit 44 acquires the user's singing sound. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the music played back by the music playback unit 36 and the sound source signal of the singing sound from the microphone input unit 44. The speaker 60 converts the sound source signal output from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video or images based on the video data sent from the control unit 50. A display unit 64 that displays the video or images is connected to the video control unit 46.

The control unit 50 is built around a known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52, the RAM 54, and the CPU 56 are configured in the same manner as the ROM 7, the RAM 8, and the CPU 9, respectively.

The ROM 52 stores a processing program with which the control unit 50 executes a display process. The display process plays the music designated by the user and displays on the display unit 64 the characteristics of the professional singer's way of singing that appear in that music, the lyrics of the music, and the singing melody.

<Data generation process>
Next, the data generation process executed by the control unit 6 of the information processing apparatus 2 will be described.

The data generation process is started when a start command for launching the processing program is input via the input receiving unit 3 of the information processing apparatus 2.
As shown in FIG. 2, once the data generation process has started, the control unit 6 first acquires one piece of music data WD from among all the music data WD stored in the storage unit 5 of the information processing apparatus 2 (S110). Although in S110 of the present embodiment the control unit 6 acquires the music data WD from the storage unit 5, it may instead acquire the music data WD via the input receiving unit 3 from a portable storage medium or, over a communication network, from a server or the like.

In the data generation process, the control unit 6 then acquires the master waveform data included in the music data WD acquired in S110 (hereinafter, "acquired music data") (S120). The control unit 6 further separates and extracts vocal data and accompaniment data from the master waveform data acquired in S120 (S130). One conceivable way for the control unit 6 to separate the accompaniment data and the vocal data in S130 is to use the pitch and harmonic components estimated by a well-known technique (for example, "PreFEst" described in JP 2008-134606 A). PreFEst is a technique that regards the most dominant sound waveform in the master waveform data as the vocal data and estimates the vocal pitch (that is, the fundamental frequency) and the magnitudes of the harmonic components.

In the data generation process, the control unit 6 then specifies a vocal sound pressure transition, which represents how the sound pressure level in the vocal data changes over time (S140). The control unit 6 further specifies a vocal frequency transition, which represents how the fundamental frequency f0 in the vocal data changes over time (S150).

Specifically, in S140 and S150 of the present embodiment, the control unit 6 first sets specified time windows AW(j) on the vocal data. Each specified time window AW(j) is an analysis window having a predefined unit time (for example, 10 [ms]). In the present embodiment, the specified time windows AW are set so as to be adjacent and contiguous along the time axis. The symbol j is an identifier that identifies each specified time window AW.

The control unit 6 then calculates, by a known method, the sound pressure level Lp in each specified time window AW(j) of the vocal data. The sound pressure level Lp can be obtained by dividing the root mean square p of the sound pressure in the specified time window AW(j) of the vocal data by a reference sound pressure p₀, taking the common logarithm of the quotient, and multiplying it by a predetermined coefficient (usually 20), that is, Lp = 20 × log₁₀(p / p₀).
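The per-window level calculation can be sketched as follows. This is only an illustration under stated assumptions: the function name, the plain list of samples, and the default reference `p_ref=1.0` (a full-scale digital reference rather than a calibrated acoustic pressure) are hypothetical and do not come from the embodiment.

```python
import math

def sound_pressure_levels(samples, sample_rate, window_ms=10.0, p_ref=1.0):
    """Return Lp = 20 * log10(p_rms / p_ref) for each contiguous window AW(j)."""
    win = max(1, int(sample_rate * window_ms / 1000.0))  # samples per window
    levels = []
    for start in range(0, len(samples) - win + 1, win):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        # A silent frame has no finite level; -inf never exceeds a threshold.
        levels.append(20.0 * math.log10(rms / p_ref) if rms > 0 else float("-inf"))
    return levels
```

Windows are taken back to back with no overlap, matching the adjacent and contiguous arrangement described for AW(j); any trailing partial window is dropped in this sketch.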

The control unit 6 then specifies the vocal sound pressure transition by arranging the sound pressure levels Lp of the specified time windows AW(j) along the time axis of the vocal data.
To specify the vocal frequency transition, the control unit 6 derives the fundamental frequency f0 in each specified time window AW(j) of the vocal data. Various known methods can be used to derive the fundamental frequency f0. As one example, the control unit 6 may perform frequency analysis (for example, a DFT) on each specified time window AW(j) set on the vocal data and take, as the fundamental frequency f0, the frequency component found to be strongest by autocorrelation.

The control unit 6 then specifies the vocal frequency transition by arranging the fundamental frequencies f0 derived for the respective specified time windows AW(j) along the time axis of the vocal data.
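As a rough illustration of one way to derive f0 per window, the sketch below picks the lag with the largest time-domain autocorrelation. It is not the embodiment's exact procedure, which is left open in the text. The function name, the search range `f_min`/`f_max`, and the assumption that the analysis frame is long enough to contain at least two pitch periods (possibly longer than the 10 ms window spacing) are all hypothetical.

```python
import math

def estimate_f0(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(frame) - 1, int(sample_rate / f_min))
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # 0.0 means "no pitch found" (for example, a silent frame).
    return sample_rate / best_lag if best_lag else 0.0
```

Calling `estimate_f0` once per window and collecting the results in window order gives a list that plays the role of the vocal frequency transition.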

In the data generation process, the control unit 6 then acquires one MIDI music MD associated with the same music ID as the music data WD acquired in S110 (S160). The control unit 6 further adjusts the MIDI music MD acquired in S160 (hereinafter, "acquired MIDI") so that the performance timing of each note constituting the acquired MIDI matches the playback time of the corresponding sound in the acquired music data (S170). A known method (for example, the method described in Japanese Patent No. 5310677) may be used to adjust the acquired MIDI. In the method described in Japanese Patent No. 5310677, the control unit 6 renders the acquired MIDI and converts both the rendering result and the master waveform data of the acquired music data into spectral data in units of a specified time. The performance start timing and performance end timing of each performance sound are then corrected so that the times on the two sets of spectral data are synchronized. DP matching may be used when adjusting the times on the spectral data so that they are synchronized.
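DP matching is commonly implemented as dynamic time warping. The sketch below aligns two one-dimensional feature sequences and is meant only to illustrate the idea; it is not the method of Japanese Patent No. 5310677. A real implementation would compare spectral frames rather than scalars, and the function name and absolute-difference cost are assumptions.

```python
def dtw_path(a, b):
    """Return the minimum-cost alignment path [(index_in_a, index_in_b), ...]."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path
```

Each pair in the returned path says which frame of the rendered MIDI corresponds to which frame of the recording, which is the information needed to shift note start and end timings.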

In the data generation process, the control unit 6 further acquires the melody track representing the singing melody from the MIDI music MD whose timing was adjusted in S170 (S180). The melody track acquired in S180 defines the notes NO(i) that constitute the singing melody (hereinafter, "melody notes"). For each melody note NO(i), note properties are defined, such as the note number (pitch), the performance start timing nnt(i), the performance end timing nft(i), and various kinds of time control information (for example, tempo and resolution). The symbol i is an identifier that identifies each melody note, and is defined so as to increase along the time axis of the singing melody.

In the data generation process, the control unit 6 generates time-series note data (S190). The time-series note data associates the note properties of each melody note NO(i) with a note transition in which the melody notes NO(i) are arranged along the time axis.

Specifically, in S190 of the present embodiment, the control unit 6 first generates a note transition in which the melody notes NO(i) are arranged along the time axis. The control unit 6 then sets specified time windows AW(j) on the note transition. The specified time windows AW(j) set on the note transition are the same as those set on the vocal data. That is, specified time windows AW(j) set on the note transition and on the vocal data that share the same symbol j represent the same timing.

The control unit 6 then compares the number of specified time windows AW set on the note transition with the number of specified time windows AW set on the vocal data. The control unit 6 adopts the larger of the two as the number of specified time windows AW (that is, "jmax") for both the note transition and the vocal data. The control unit 6 may pad the one with fewer windows with zero values so that the numbers of specified time windows AW match.

The control unit 6 further identifies, among the specified time windows AW(j) set on the note transition, the specified time windows AW(j) corresponding to the performance start timing nnt(i) and the performance end timing nft(i) of each melody note NO(i). The control unit 6 associates a note start flag, indicating correspondence to the performance start timing nnt(i), with the specified time window AW(j) corresponding to the performance start timing nnt(i). Likewise, the control unit 6 associates a note end flag, indicating correspondence to the performance end timing nft(i), with the specified time window AW(j) corresponding to the performance end timing nft(i). In addition, the control unit 6 generates the time-series note data by associating the pitch of the corresponding melody note NO(i) with every specified time window AW from the one associated with the note start flag to the one associated with the note end flag.
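The flagging and pitch assignment described above might look like the following sketch. The `MelodyNote` class, the millisecond units, the 0-based window index (the text counts from j = 1), and the use of 0 for "no note" are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class MelodyNote:
    pitch: int        # MIDI note number
    start_ms: float   # performance start timing nnt(i)
    end_ms: float     # performance end timing nft(i)

def build_note_series(notes, n_windows, window_ms=10.0):
    """Return per-window pitch (0 = no note), note start flags, and note end flags."""
    pitch = [0] * n_windows
    start_flag = [False] * n_windows
    end_flag = [False] * n_windows
    for note in notes:
        j_start = int(note.start_ms // window_ms)
        if j_start >= n_windows:
            continue  # note begins after the last window
        j_end = min(n_windows - 1, int(note.end_ms // window_ms))
        start_flag[j_start] = True
        end_flag[j_end] = True
        # Every window from the start flag to the end flag carries the pitch.
        for j in range(j_start, j_end + 1):
            pitch[j] = note.pitch
    return pitch, start_flag, end_flag
```

For example, a note with pitch 60 lasting from 20 ms to 45 ms occupies windows 2 through 4 when the window length is 10 ms.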

In the data generation process, the control unit 6 associates the vocal sound pressure transition specified in S140, the vocal frequency transition specified in S150, and the time-series note data generated in S190 with one another for each specified time window AW(j) (S200). Hereinafter, the data in which the vocal sound pressure transition, the vocal frequency transition, and the time-series note data are associated is also referred to as synchronization data.
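A minimal sketch of how the three series could be joined per window is shown below, including the optional zero padding up to jmax. The dictionary layout and field names are hypothetical; note also that padding a sound pressure series with the value 0 follows the text literally and does not represent silence on a decibel scale.

```python
def build_sync_data(sound_pressure, f0, note_pitch):
    """Join the three per-window series into one record per window AW(j)."""
    j_max = max(len(sound_pressure), len(f0), len(note_pitch))

    def pad(series):
        # Append zero values so every series has j_max entries.
        return list(series) + [0] * (j_max - len(series))

    rows = zip(pad(sound_pressure), pad(f0), pad(note_pitch))
    # j is 1-based here to match the numbering used in the text.
    return [{"j": j + 1, "lp": lp, "f0": f, "note": n}
            for j, (lp, f, n) in enumerate(rows)]
```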

In the data generation process, the control unit 6 executes a pronunciation delay calculation process that calculates a "pronunciation delay time" based on the synchronization data generated in S200 (S210). The "pronunciation delay time" is one of the characteristics of a professional singer's way of singing. The "pronunciation delay time" calculated in S210 of the present embodiment is the time difference between the performance start timing nnt(i) of a melody note NO(i) and the timing at which the professional singer starts to vocalize that melody note NO(i).

In the data generation process, the control unit 6 further executes a pitch delay calculation process that calculates a "pitch delay time" based on the synchronization data (S220). The "pitch delay time" is one of the characteristics of a professional singer's way of singing. The "pitch delay time" calculated in S220 of the present embodiment is the time difference between the performance start timing nnt(i) of a melody note NO(i) and the timing at which the pitch sung by the professional singer comes to match the pitch of that melody note NO(i).

In the data generation process, the control unit 6 further generates the singing feature data SF (S230). Specifically, in S230 of the present embodiment, the control unit 6 associates each pronunciation delay time calculated in S210 with the performance start timing nnt(i) used to calculate it. The control unit 6 likewise associates each pitch delay time calculated in S220 with the performance start timing nnt(i) used to calculate it.

In the data generation process, the control unit 6 stores the singing feature data SF generated in S230 in the storage unit 5 (S240).
The control unit 6 then ends the data generation process.

<Pronunciation delay calculation process>
In the pronunciation delay calculation process executed in S210 of the data generation process, as shown in FIG. 3, the control unit 6 acquires the vocal sound pressure transition specified in S140 (S310). The control unit 6 then sets the specified time window AW(j) in the vocal sound pressure transition to its initial value (j = 1) (S320).

In the pronunciation delay calculation process, the control unit 6 then determines whether the sound pressure level Lp(j) of the vocal sound pressure transition in the currently set specified time window AW(j) is greater than a threshold Tha (S330). The threshold Tha is defined in advance as a sound pressure level at which a person can be regarded as having vocalized.

If the determination in S330 finds that the sound pressure level Lp(j) of the vocal sound pressure transition in the currently set specified time window AW(j) is equal to or less than the threshold Tha (S330: NO), the control unit 6 advances the pronunciation delay calculation process to S350, described below. If, on the other hand, the sound pressure level Lp(j) in the specified time window AW(j) is greater than the threshold Tha (S330: YES), the control unit 6 sets a pronunciation start flag on the currently set specified time window AW(j) (S340). The pronunciation start flag indicates a timing at which vocalization started in the vocal data. The specified time window AW associated with the pronunciation start flag is an example of the utterance timing recited in the claims.

The control unit 6 then increments the identifier of the currently set specified time window AW(j) by one (S350). The control unit 6 further determines whether the identifier of the currently set specified time window AW(j) is equal to the number of specified time windows AW plus 1 (that is, jmax + 1) (S360). If the determination in S360 finds that the identifier is smaller than the number of specified time windows AW plus 1 (S360: NO), the control unit 6 returns the pronunciation delay calculation process to S330 and repeats the steps from S330 to S360.
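The loop over S320 to S360 can be sketched as below. The function name is hypothetical and the index is 0-based, whereas the text counts from j = 1 and stops at jmax + 1. As in the text, every window whose level exceeds the threshold receives the flag.

```python
def flag_pronunciation_starts(levels, threshold):
    """Visit every window AW(j) and flag those with Lp(j) > Tha."""
    flags = []
    j = 0                                    # S320: initial window (0-based here)
    while j != len(levels):                  # S360: stop after the last window
        flags.append(levels[j] > threshold)  # S330 / S340: flag if Lp(j) > Tha
        j += 1                               # S350: move to the next window
    return flags
```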

If, on the other hand, the determination in S360 finds that the identifier of the currently set specified time window AW(j) is equal to the number of specified time windows AW plus 1 (S360: YES), the control unit 6 advances the pronunciation delay calculation process to S370. In S370, the control unit 6 acquires one specified time window AW(j) from among the specified time windows AW(j) associated with a note start flag. In the present embodiment, the first time the process reaches S370, the control unit 6 acquires the earliest such specified time window AW(j) along the time axis. Hereinafter, the specified time window AW(j) acquired in S370 is referred to as the note start timing NS.

The control unit 6 then identifies, among the specified time windows AW(j) on which the pronunciation start flag is set, the specified time window AW(j) that satisfies a specified condition (S380). The specified condition is that the time length from the note start timing NS is within a specified range and is the shortest. In the present embodiment, the specified range is defined both before and after the note start timing NS along the time axis. The time length after the note start timing NS may be defined to be longer than the time length before it, or the two time lengths may be the same. Hereinafter, the specified time window AW(j) identified in S380 is referred to as the pronunciation start timing PS.

At the same time, the control unit 6 generates timing pair data consisting of the identified pronunciation start timing PS and the note start timing NS (S380).
In the pronunciation delay calculation process, the control unit 6 calculates, as the pronunciation delay time, the time difference between the note start timing NS and the pronunciation start timing PS that constitute the timing pair data generated in S380 (S390). In S390, the pronunciation delay time may be calculated by multiplying the number of time windows between the specified time window AW serving as the note start timing NS and the specified time window AW serving as the pronunciation start timing PS by the unit time of the time window.

If the control unit 6 cannot identify a pronunciation start timing PS in S380, it may set the pronunciation start timing PS to undetermined. In that case, in S390 the control unit 6 may set a predefined undetermined value (for example, "-9.9999" [ms]) as the pronunciation delay time for that note start timing NS.
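Steps S370 to S390, including the undetermined case, can be sketched as follows. The window indices, the range limits `max_before` and `max_after`, and the choice to return a signed delay (negative when the singer starts before the note) are assumptions made for illustration; the text only fixes the sentinel example and the "window count times unit time" rule.

```python
UNDETERMINED = -9.9999  # example undetermined value [ms] given in the text

def pronunciation_delays(note_starts, onset_windows, window_ms=10.0,
                         max_before=5, max_after=20):
    """For each note start NS, pair it with the nearest flagged window PS in range."""
    delays = []
    for ns in note_starts:
        # Keep only flagged windows inside the specified range around NS.
        candidates = [ps for ps in onset_windows
                      if -max_before <= ps - ns <= max_after]
        if not candidates:
            delays.append(UNDETERMINED)
            continue
        ps = min(candidates, key=lambda w: abs(w - ns))  # shortest time length
        delays.append((ps - ns) * window_ms)             # windows x unit time
    return delays
```

Here the search range extends further after the note start than before it, which is one of the two arrangements the text allows.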

The control unit 6 then determines whether S370 to S390 have been completed for all the specified time windows AW(j) associated with a note start flag (S400). If the determination in S400 finds that S370 to S390 have not been completed for all of them (S400: NO), the control unit 6 returns the pronunciation delay calculation process to S370. In S370, the control unit 6 acquires, from among the specified time windows AW(j) associated with a note start flag, the one that comes next along the time axis after the previously acquired specified time window AW(j). The control unit 6 then executes the steps up to S400.

If, on the other hand, the determination in S400 finds that S370 to S390 have been completed for all the specified time windows AW(j) associated with a note start flag (S400: YES), the control unit 6 advances the pronunciation delay calculation process to S410. In S410, the control unit 6 calculates a representative value of the pronunciation delay time for the music (hereinafter, "pronunciation delay representative value").

Specifically, in S410, the control unit 6 calculates, as the pronunciation delay representative value, the arithmetic mean of all the pronunciation delay times calculated in S390 excluding undetermined values. The method of calculating the pronunciation delay representative value is not limited to this. For example, the median of all the pronunciation delay times excluding undetermined values, or their mode, may be used as the pronunciation delay representative value instead.
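The three alternatives for the representative value map directly onto Python's standard `statistics` module. The function name, the `method` parameter, and the `None` result for the case where every value is undetermined are illustrative assumptions.

```python
import statistics

UNDETERMINED = -9.9999  # example undetermined value [ms] given in the text

def representative_delay(delays, method="mean"):
    """Summarize the delays of one piece of music, ignoring undetermined values."""
    valid = [d for d in delays if d != UNDETERMINED]
    if not valid:
        return None  # nothing to summarize
    if method == "mean":
        return statistics.fmean(valid)   # arithmetic mean
    if method == "median":
        return statistics.median(valid)
    if method == "mode":
        return statistics.mode(valid)
    raise ValueError(f"unknown method: {method}")
```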

In the pronunciation delay calculation process, the control unit 6 further identifies special pronunciation delay times (S420). Specifically, in S420, the control unit 6 identifies as a special pronunciation delay time each pronunciation delay time that, among all those calculated in S390 excluding undetermined values, deviates from the pronunciation delay representative value by a specified time length or more. That is, a special pronunciation delay time is the pronunciation delay time at a note NO where the "tame" (a deliberate delay used as a singing technique) appears prominently in the music.

The control unit 6 then sets, on the specified time window AW(j) associated with the note start flag for each identified special pronunciation delay time, a flag indicating that it is the start timing of a special pronunciation delay time.
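Selecting the special pronunciation delay times in S420 amounts to a simple deviation test. In the sketch below, the function name, the returned list of indices, and the argument `min_gap_ms` standing in for the "specified time length" are hypothetical.

```python
def special_delay_indices(delays, representative, min_gap_ms, undetermined=-9.9999):
    """Return indices of delays that deviate from the representative value."""
    return [i for i, d in enumerate(delays)
            if d != undetermined and abs(d - representative) >= min_gap_ms]
```

The returned indices identify the notes whose start windows would receive the special flag.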

The control unit 6 then ends the pronunciation delay calculation process and proceeds to S220 of the data generation process.
In short, in the pronunciation delay calculation process, as shown in FIG. 4, the control unit 6 identifies each timing at which the sound pressure level Lp(j) of the vocal sound pressure transition exceeds the threshold Tha. From among the identified timings, the control unit 6 identifies the pronunciation start timing PS that satisfies the specified condition. The specified condition is that the time length from the note start timing NS (that is, the performance start timing nnt) is within the specified range and is the shortest.

The control unit 6 further identifies the timing pair made up of the identified onset timing PS and the note start timing NS. The control unit 6 then calculates the time difference between the note start timing NS and the onset timing PS of that timing pair as the "onset tame time". After calculating the onset tame time for every note start timing NS in one piece of music, the control unit 6 identifies, from among the calculated onset tame times, the special onset tame times at which "tame" as a singing technique appears prominently in the music.
<Pitch tame calculation process>
In the pitch tame calculation process executed in S220 of the data generation process, as shown in FIG. 5, the control unit 6 acquires the vocal frequency transition identified earlier in S140 (S510). The control unit 6 then converts the vocal frequency transition into a vocal pitch transition (S520). In S520, the control unit 6 performs this conversion by replacing the fundamental frequency f0 of each prescribed time window AW(j) in the vocal frequency transition with a scale pitch MS(j) (that is, a pitch corresponding to a note number). The replacement can be done by choosing the scale pitch closest to the fundamental frequency f0.
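One common way to realize the conversion in S520 is to round the fundamental frequency to the nearest equal-tempered semitone. The Python sketch below assumes the usual MIDI convention (note number 69 = A4 = 440 Hz) and measures closeness on a logarithmic (semitone) scale; neither choice is stated in the embodiment, and the function name is hypothetical.

```python
import math


def nearest_note_number(f0_hz, a4_hz=440.0):
    """Map a fundamental frequency to the nearest MIDI-style note number.

    Assumes equal temperament with note 69 tuned to a4_hz. Returns None
    for non-positive input (for example, an unvoiced window).
    """
    if f0_hz <= 0:
        return None
    return round(69 + 12 * math.log2(f0_hz / a4_hz))
```

Applying this function to the f0 of every prescribed time window AW(j) would yield a sequence corresponding to the scale pitches MS(j).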

In the pitch tame calculation process, the control unit 6 next sets the prescribed time window AW(j) in the vocal pitch transition to its initial value (j = 1) (S530). The control unit 6 then determines whether the currently set prescribed time window AW(j) satisfies the setting condition (S540).

The setting condition in S540 is satisfied only when both of the following hold. First, the sound pressure level Lp(j) in the set prescribed time window AW(j) is greater than the threshold Tha. Second, in the vocal pitch transition, the absolute value of the difference between the scale pitch MS(j) in the prescribed time window AW(j) and the scale pitch MS(j-1) in the prescribed time window AW(j-1) is equal to or greater than a predefined setting threshold Thb. In other words, in S540 the control unit 6 determines whether, in the prescribed time window AW(j), the singer is vocalizing and the pitch of the singing voice has changed by the setting threshold Thb or more.

If the determination in S540 finds that the set prescribed time window AW(j) does not satisfy the setting condition (S540: NO), the control unit 6 advances the pitch tame calculation process to S560, described later. If the set prescribed time window AW(j) does satisfy the setting condition (S540: YES), the control unit 6 sets, in that prescribed time window AW(j), a pitch change flag indicating that the pitch has changed in the vocal data (S550).

Next, the control unit 6 increments the identifier of the set prescribed time window AW(j) by one (S560). The control unit 6 then determines whether the identifier of the currently set prescribed time window AW(j) equals the number of prescribed time windows AW plus one (that is, jmax + 1) (S570). If the identifier is smaller than that value (S570: NO), the control unit 6 returns the pitch tame calculation process to S540 and repeats S540 through S570.
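The loop over S540 to S570 can be sketched in Python as below. The sketch uses zero-based lists instead of the identifiers j = 1 to jmax, assumes every window carries a numeric scale pitch, and leaves the first window unflagged because it has no predecessor; these are illustrative assumptions, as is the function name.

```python
def pitch_change_flags(levels, pitches, tha, thb):
    """Return one boolean per window: True where the singer is vocalizing
    (level > tha) and the scale pitch differs from the previous window's
    by at least thb."""
    flags = [False] * len(pitches)
    for j in range(1, len(pitches)):
        voiced = levels[j] > tha                       # first setting condition
        changed = abs(pitches[j] - pitches[j - 1]) >= thb  # second setting condition
        flags[j] = voiced and changed
    return flags
```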

If, on the other hand, the determination in S570 finds that the identifier equals the number of prescribed time windows AW plus one (S570: YES), the control unit 6 advances the pitch tame calculation process to S580. In S580, the control unit 6 acquires one prescribed time window AW(j) from among those associated with a note start flag. In the present embodiment, on the first transition to S580, the control unit 6 acquires the earliest such window along the time axis. Hereinafter, the prescribed time window AW(j) acquired in S580 is also referred to as the note start timing NS.

Next, from among the prescribed time windows AW(j) in which the pitch change flag is set, the control unit 6 identifies the one that satisfies the second prescribed condition (S590). The second prescribed condition requires, first, that the prescribed time window AW(j) be associated with a scale pitch MS(j) that matches the pitch of the melody note NO(i) corresponding to the note start timing NS. It further requires that the time length from the note start timing NS fall within a prescribed range and be the shortest among the candidates. Hereinafter, the prescribed time window AW(j) identified in S590 is referred to as the pitch change timing CT.

At the same time, the control unit 6 generates timing pair data made up of the identified pitch change timing CT and the note start timing NS (S590).
In the pitch tame calculation process, the control unit 6 calculates, as the pitch tame time, the time difference between the note start timing NS and the pitch change timing CT that make up the timing pair data generated in S590 (S600). The pitch tame time in S600 can be calculated by multiplying the number of prescribed time windows AW from the window serving as the note start timing NS to the window serving as the pitch change timing CT by the unit time of a prescribed time window AW.

If the pitch change timing CT cannot be identified in S590, the control unit 6 may set the pitch change timing CT to undetermined. In that case, in S600, the control unit 6 may set a predefined undetermined value (for example, "-9.999" [ms]) as the pitch tame time for that note start timing NS.
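A minimal Python sketch of S590 and S600 for a single note follows. It assumes zero-based window indices, a symmetric prescribed range expressed in windows, and a signed result (negative if the pitch change precedes the note start); the function and parameter names are hypothetical. Only the example undetermined value -9.999 comes from the text.

```python
UNDETERMINED_MS = -9.999  # example undetermined value given in the text


def pitch_tame_time_ms(note_start, note_pitch, flags, pitches, max_gap, window_ms):
    """Return the pitch tame time for one note, or UNDETERMINED_MS.

    A window qualifies as pitch change timing CT when its pitch change flag
    is set, its scale pitch equals the note's pitch, and it lies within
    max_gap windows of note_start; the closest qualifying window is chosen.
    """
    candidates = [
        j for j, flagged in enumerate(flags)
        if flagged and pitches[j] == note_pitch and abs(j - note_start) <= max_gap
    ]
    if not candidates:
        return UNDETERMINED_MS
    ct = min(candidates, key=lambda j: abs(j - note_start))
    # number of windows between NS and CT, times the window length
    return (ct - note_start) * window_ms
```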

Next, the control unit 6 determines whether S580 to S600 have been completed for all prescribed time windows AW(j) associated with a note start flag (S610). If they have not (S610: NO), the control unit 6 returns the pitch tame calculation process to S580. In S580, the control unit 6 then acquires, from among the prescribed time windows AW(j) associated with a note start flag, the one that follows the previously acquired window along the time axis. The control unit 6 then executes the steps up to S610.

If, on the other hand, S580 to S600 have been completed for all prescribed time windows AW(j) associated with a note start flag (S610: YES), the control unit 6 advances the pitch tame calculation process to S620. In S620, the control unit 6 calculates a representative value of the pitch tame time for the music (hereinafter, the "pitch tame representative value").

Specifically, in S620, the control unit 6 calculates the arithmetic mean of all pitch tame times calculated in S600, excluding undetermined values, as the pitch tame representative value. The calculation method is not limited to this: for example, the median or the mode of all pitch tame times other than undetermined values may be used as the pitch tame representative value instead.
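The three options named for S620 (arithmetic mean, median, mode) map directly onto Python's standard `statistics` module. The sketch below is illustrative only; the function name, the `method` parameter, and the handling of an all-undetermined input (returning None) are assumptions.

```python
import statistics


def representative_value(times_ms, method="mean", undetermined=-9.999):
    """Compute a representative tame time, ignoring undetermined entries."""
    valid = [t for t in times_ms if t != undetermined]
    if not valid:
        return None  # nothing measurable in this piece of music
    if method == "mean":
        return statistics.mean(valid)
    if method == "median":
        return statistics.median(valid)
    if method == "mode":
        return statistics.mode(valid)
    raise ValueError(f"unknown method: {method}")
```

The mean is sensitive to a few very long tame times, which is one reason the median or mode might be preferred in practice.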

Further, in the pitch tame calculation process, the control unit 6 identifies special pitch tame times (S630). Specifically, in S630, among all pitch tame times calculated in S600 other than undetermined values, the control unit 6 identifies as a special pitch tame time each pitch tame time that deviates from the pitch tame representative value by a prescribed time length or more. The control unit 6 then sets, in the prescribed time window AW(j) associated with the note start flag of each identified special pitch tame time, a flag indicating that the window is the start timing of a special pitch tame time.

Thereafter, the control unit 6 ends the pitch tame calculation process and proceeds to S230 of the data generation process.
In other words, in the pitch tame calculation process, as shown in FIG. 6, the control unit 6 identifies, in the vocal pitch transition, a timing that satisfies the setting condition as the pitch change timing CT. The control unit 6 further identifies the timing pair made up of the identified pitch change timing CT and the note start timing NS, and calculates the time difference between the two as the "pitch tame time". After calculating the pitch tame time for every note start timing NS in one piece of music, the control unit 6 identifies, from among the calculated pitch tame times, the special pitch tame times at which the lag until the pitches match appears prominently.

In S230 of the data generation process, executed after the pitch tame calculation process ends, the control unit 6 associates each onset tame time with the performance start timing nnt(i) used to calculate it, and thus with the melody note NO(i), as shown in FIG. 7. Likewise, the control unit 6 associates each pitch tame time with the performance start timing nnt(i) used to calculate it, and thus with the melody note NO(i).

If an onset tame time or pitch tame time calculated in the data generation process is the undetermined value, the control unit 6 may associate the undetermined value with the melody note NO(i) as that onset tame time or pitch tame time.
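The association made in S230 can be pictured as one record per melody note. The Python sketch below shows one possible in-memory layout; the class name, field names, and list-of-records structure are hypothetical and are not the format of the singing feature data SF described in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class NoteFeature:
    note_index: int       # i of melody note NO(i), zero-based here
    start_ms: float       # performance start timing nnt(i)
    onset_tame_ms: float  # may hold the undetermined value
    pitch_tame_ms: float  # may hold the undetermined value


def build_singing_features(note_starts_ms, onset_tames_ms, pitch_tames_ms):
    """Attach each pair of tame times to the note it was computed from."""
    return [
        NoteFeature(i, start, onset, pitch)
        for i, (start, onset, pitch) in enumerate(
            zip(note_starts_ms, onset_tames_ms, pitch_tames_ms)
        )
    ]
```

Keeping undetermined values in the records, rather than dropping the notes, preserves the one-to-one correspondence with the melody notes.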

The singing feature data SF generated when the control unit 6 of the information processing apparatus 2 executes the data generation process may be stored in the storage unit 14 of the information processing server 10 by way of a portable storage medium. If the information processing apparatus 2 and the information processing server 10 are connected via a communication network, the singing feature data SF stored in the storage unit 5 of the information processing apparatus 2 may instead be transferred over the network and stored in the storage unit 14 of the information processing server 10. In either case, the singing feature data SF is stored in the storage unit 14 of the information processing server 10 in association with the MIDI music MD of the corresponding music.
<Display process>
The display process executed by the control unit 50 of the karaoke apparatus 30 starts when a command to launch the processing program for the display process is input.

Once the display process has started, as shown in FIG. 8, the control unit 50 acquires the MIDI music MD corresponding to the music specified via the input reception unit 34 from the storage unit 14 of the information processing server 10 (S910). The control unit 50 then acquires the lyrics data corresponding to the MIDI music MD acquired in S910 (S920), and further acquires the singing feature data SF corresponding to that MIDI music MD (S930).

In the display process, the control unit 50 next plays the MIDI music MD acquired in S910 (S940). Specifically, in S940, the control unit 50 outputs the MIDI music MD to the music playback unit 36, which then performs the music. The sound source signal of the music performed by the music playback unit 36 is output to the speaker 60 via the output unit 42, and the speaker 60 converts the sound source signal into sound and outputs it.

Further, in the display process, the control unit 50 produces a display based on the MIDI music MD acquired in S910, the lyrics data acquired in S920, and the singing feature data SF acquired in S930 (S950). Specifically, in S950, the control unit 50 outputs the melody track of the MIDI music MD, the lyrics data, and the singing feature data SF to the video control unit 46. The video control unit 46 outputs the lyrics data and the melody notes NO(i) that make up the melody track to the display unit 64 sequentially along the time axis, and the display unit 64 displays them sequentially along the time axis as shown in FIG. 9. The video control unit 46 also outputs the onset tame times and pitch tame times contained in the singing feature data SF to the display unit 64 sequentially along the time axis. The display unit 64 then displays each onset tame time and pitch tame time sequentially along the time axis, in association with the performance start timing nnt(i) of the corresponding melody note NO(i).

In S950, the control unit 50 may also output to the video control unit 46 the pitch transition of the singing voice sung by the user of the karaoke apparatus 30 along with the performance of the music. The video control unit 46 may then display that pitch transition (for example, the solid-line waveform in FIG. 9) in a manner that lets the user recognize its deviation from the melody notes NO(i). Further, in S950, the control unit 50 may output the pitch transition of the vocal data sung by a professional singer to the video control unit 46, and the video control unit 46 may display it (that is, the vocal frequency transition; the broken-line waveform in FIG. 9).

Further, in S950, the control unit 50 may output to the video control unit 46 the result of classifying the onset tame times and pitch tame times into predefined classification ranges. For example, the control unit 50 classifies an onset tame time of up to a first time (for example, 200 [ms]) as "onset tame: small", indicating a short onset tame time. It classifies an onset tame time from the first time up to a second time (for example, 400 [ms]) as "onset tame: medium", and one equal to or longer than the second time as "onset tame: large". The control unit 50 may then output the classification result to the video control unit 46 as the onset tame time, in which case the video control unit 46 causes the display unit 64 to display the classification result.

Similarly, the control unit 50 classifies a pitch tame time of up to the first time as "pitch tame: small", indicating a short pitch tame time. It classifies a pitch tame time from the first time up to the second time as "pitch tame: medium", and one equal to or longer than the second time as "pitch tame: large". The control unit 50 may then output the classification result to the video control unit 46 as the pitch tame time, in which case the video control unit 46 causes the display unit 64 to display the classification result.
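The three-way classification described for S950 amounts to two threshold comparisons. The Python sketch below uses the example thresholds from the text (200 ms and 400 ms) as defaults; the function name, the label strings, and the choice to treat exactly the first time as "medium" are assumptions, since the text does not settle which class the boundary value belongs to. Undetermined values are not handled here and would need to be filtered out beforehand.

```python
def classify_tame(time_ms, first_ms=200.0, second_ms=400.0):
    """Classify a tame time as 'small', 'medium', or 'large'."""
    if time_ms < first_ms:
        return "small"
    if time_ms < second_ms:
        return "medium"
    return "large"  # equal to or longer than the second time
```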

Furthermore, of the onset tame times and pitch tame times, the control unit 50 may output only the special onset tame times and special pitch tame times to the video control unit 46. That is, in S950, only the special onset tame times and special pitch tame times may be presented on the display unit 64 in association with the melody notes NO.

Next, in the display process, the control unit 50 determines whether the performance of the music has ended (S960). If it has not ended (S960: NO), the control unit 50 returns the display process to S940. If the performance has ended (S960: YES), the control unit 50 ends the display process.
[Effects of the embodiment]
As described above, the singing feature data SF of the present embodiment associates the "onset tame time" and the "pitch tame time" with the melody notes NO(i).

Here, the "onset tame time" and the "pitch tame time" represent the characteristics of "tame" as a singing technique that professional singers use when singing a piece of music.
The onset tame time is the time difference between the performance start timing nnt(i) of a melody note NO(i) and the utterance start timing at which the singer actually began vocalizing in the music data WD. The pitch tame time is the time difference between the timing at which the pitch of the singer's voice matched the pitch of the melody note NO(i) and the performance start timing nnt(i) of that melody note NO(i).

The karaoke apparatus 30 displays the "onset tame time" and the "pitch tame time" on the display unit 64 in association with the performance start timings nnt of the melody notes NO, which are displayed sequentially along the time axis.

Therefore, the karaoke apparatus 30 can present, via the display unit 64, the places in the music where the professional singer uses "tame". The karaoke apparatus 30 can also present how far the professional singer shifts the timing at which the pitch of the singing voice matches the pitch of a melody note NO(i) from the performance start timing nnt(i) of that melody note.

Moreover, the karaoke apparatus 30 causes the display unit 64 to display the "onset tame time" and "pitch tame time" of each melody note NO(i) sequentially along the time axis. The karaoke apparatus 30 can therefore present the "onset tame time" and "pitch tame time" of a melody note NO(i) before the user of the karaoke apparatus 30 actually sings it.

A user of the karaoke apparatus 30 to whom these tame times are presented on the display unit 64 can recognize how the professional singer uses "tame" as a singing technique in the music being sung. As a result, the user can bring his or her own way of singing closer to that of the professional singer.

The "special onset tame time" identified in the data generation process is the "onset tame time" at a melody note NO where "tame" as a singing technique appears prominently in the music. The "special pitch tame time" is the lag at a melody note NO where the shift in the timing at which the pitch of the professional singer's voice matches the pitch of the melody note NO(i) appears more prominently.

The singing feature data SF of the present embodiment includes the "special onset tame time" and the "special pitch tame time".
The karaoke apparatus 30 can therefore display, via the display unit 64, the places in the music where the professional singer makes prominent use of "tame". It can also display, via the display unit 64, the places where the shift in the timing at which the pitch of the professional singer's voice matches the pitch of a note is prominent.

Therefore, the user of the karaoke apparatus 30 can bring his or her own way of singing closer to that of the professional singer by attending only to the places where the professional singer makes prominent use of "tame" and the places where the shift in the timing at which the pitch of the professional singer's voice matches the pitch of the melody note NO(i) is prominent.
[Other embodiments]
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various forms without departing from the gist of the present invention.

For example, the data generation process in the above embodiment is executed by the information processing apparatus 2, but the apparatus that executes the data generation process in the present invention is not limited to the information processing apparatus 2. The apparatus that executes the data generation process may be the information processing server 10 or the karaoke apparatus 30. In that case, the information processing apparatus 2 may be omitted from the system 1.

The display process in the above embodiment is executed by the karaoke apparatus 30, but the apparatus that executes the display process in the present invention is not limited to the karaoke apparatus 30 and may be the information processing apparatus 2. In that case, the information processing apparatus 2 needs to be configured in the same way as the karaoke apparatus 30.

In the above embodiment, the data generation process and the display process are configured as separate processes, but in the present invention they may be configured as a single process. In that case, the single process made up of the data generation process and the display process may be executed by the information processing apparatus 2 or by the karaoke apparatus 30.

The data generation process of the above embodiment executes both the onset tame calculation process and the pitch tame calculation process, but in the present invention it is sufficient that at least one of the two is executed.

A form in which part of the configuration of the above embodiment is omitted, to the extent that the problem can still be solved, is also an embodiment of the present invention. A form configured by appropriately combining the above embodiment with the modifications is also an embodiment of the present invention. Likewise, any form conceivable without departing from the essence of the invention as specified by the wording of the claims is an embodiment of the present invention.

Description of reference signs: 1: system; 2: information processing apparatus; 3: input reception unit; 4: information output unit; 5: storage unit; 6, 16, 50: control unit; 7, 18, 52: ROM; 8, 20, 54: RAM; 9, 22, 56: CPU; 10: information processing server; 12: communication unit; 14: storage unit; 30: karaoke apparatus; 32: communication unit; 34: input reception unit; 36: music playback unit; 38: storage unit; 40: audio control unit; 42: output unit; 44: microphone input unit; 46: video control unit; 60: speaker; 62: microphone; 64: display unit

Claims

A program causing a computer to execute:
a first acquisition step of acquiring, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
a second acquisition step of acquiring, from a second storage unit, music data including a vocal sound in which the music is sung;
an extraction step of extracting, from the music data, vocal data representing the vocal sound;
a timing specifying step of specifying, as utterance timings that can be regarded as timings at which utterance starts in the vocal data, each utterance start timing at which the sound pressure in the vocal data becomes equal to or higher than a predetermined threshold;
a first generation step of generating, based on each of the utterance timings specified in the timing specifying step and the performance start timings of the notes in the score data acquired in the first acquisition step, timing pair data that pairs the performance start timing of a note in the score data with the utterance start timing having the shortest time length from that performance start timing;
a time calculation step of calculating an utterance delay representative value, which is a representative value within the music of utterance delay times, each being the time difference between the performance start timing and the utterance start timing in the timing pair data generated in the first generation step, and of calculating, as a delay time, based on the utterance delay times and the utterance delay representative value, a special utterance delay time, which is an utterance delay time that deviates from the utterance delay representative value by a specified time length or more; and
a second generation step of generating singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated in the time calculation step with the note of the score data corresponding to the performance start timing used for calculating that delay time.
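
The timing specifying step and the first generation step of this claim can be illustrated with a minimal sketch. It assumes the vocal data has already been reduced to a time-ordered list of sound-pressure samples, and it treats an utterance start as the sample at which the pressure rises from below the threshold to at or above it; that rising-edge reading, and the function names `utterance_start_timings` and `pair_with_nearest`, are illustrative assumptions and are not taken from the patent.

```python
from bisect import bisect_left


def utterance_start_timings(times, pressures, threshold):
    """Return the times at which sound pressure rises to or above threshold.

    Assumption: an utterance start is a rising crossing of the threshold.
    """
    onsets = []
    below = True  # treat the signal as silent before the first sample
    for t, p in zip(times, pressures):
        if p >= threshold and below:
            onsets.append(t)
        below = p < threshold
    return onsets


def pair_with_nearest(note_starts, onsets):
    """Pair each performance start timing with the nearest utterance start.

    `onsets` must be sorted in ascending order. Ties go to the earlier onset.
    """
    pairs = []
    if not onsets:
        return pairs
    for start in note_starts:
        i = bisect_left(onsets, start)
        # Only the neighbours on either side of `start` can be the nearest.
        candidates = onsets[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda t: abs(t - start))
        pairs.append((start, nearest))
    return pairs


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    pressures = [0.0, 0.5, 0.6, 0.1, 0.7, 0.8]
    onsets = utterance_start_timings(times, pressures, threshold=0.4)
    print(pair_with_nearest([0.0, 0.5], onsets))
```

Each resulting pair holds a note's performance start timing and the utterance start timing closest to it, which is the input the time calculation step needs.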

A program causing a computer to execute:
a first acquisition step of acquiring, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
a second acquisition step of acquiring, from a second storage unit, music data including a vocal sound in which the music is sung;
an extraction step of extracting, from the music data, vocal data representing the vocal sound;
a timing specifying step of specifying, as an utterance timing that can be regarded as a timing at which utterance starts in the vocal data, a pitch change timing, which is a timing at which the vocal frequency transition, representing the transition of pitch in the vocal data, comes within the range of the pitch of a note in the score data;
a first generation step of generating, based on each of the utterance timings specified in the timing specifying step and the performance start timings of the notes in the score data acquired in the first acquisition step, timing pair data that associates the performance start timing of a note in the score data with the pitch change timing that lies within a specified time range, defined in advance along the time axis from that performance start timing, and that has the shortest time length from that performance start timing;
a time calculation step of calculating a pitch delay representative value, which is a representative value within the music of pitch delay times, each being the time difference between the performance start timing and the pitch change timing in the timing pair data generated in the first generation step, and of calculating, as a delay time, based on the pitch delay times and the pitch delay representative value, a special pitch delay time, which is a pitch delay time that deviates from the pitch delay representative value by a specified time length or more; and
a second generation step of generating singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated in the time calculation step with the note of the score data corresponding to the performance start timing used for calculating that delay time.
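
The pitch-based variant of the timing specifying step and the windowed pairing can be sketched as follows. The sketch assumes the vocal frequency transition is available as a time-ordered list of fundamental-frequency estimates in hertz (with `None` for unvoiced frames), that a note's pitch range is given as a lower and upper frequency bound, and that the specified time range is symmetric around the performance start timing. All of these, together with the function names, are illustrative assumptions and are not taken from the patent.

```python
def pitch_change_timings(times, f0_values, low_hz, high_hz):
    """Return the times at which the vocal pitch enters [low_hz, high_hz].

    A timing is recorded only on entry into the range, not for every
    frame that stays inside it.
    """
    timings = []
    inside = False
    for t, f0 in zip(times, f0_values):
        now_inside = f0 is not None and low_hz <= f0 <= high_hz
        if now_inside and not inside:
            timings.append(t)
        inside = now_inside
    return timings


def pair_within_window(note_start, timings, window):
    """Return the pitch change timing nearest to note_start within +/- window.

    Returns None when no pitch change timing falls inside the window.
    """
    in_range = [t for t in timings if abs(t - note_start) <= window]
    if not in_range:
        return None
    return min(in_range, key=lambda t: abs(t - note_start))


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 0.3]
    f0 = [None, 400.0, 438.0, 441.0]
    timings = pitch_change_timings(times, f0, low_hz=430.0, high_hz=450.0)
    print(pair_within_window(0.1, timings, window=0.15))
```

Restricting candidates to a window before taking the nearest one keeps a note from being paired with a pitch change that belongs to a distant note.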

The program according to claim 2, wherein
the timing specifying step specifies, as the utterance timing, each utterance start timing at which the sound pressure in the vocal data becomes equal to or higher than a predetermined threshold,
the first generation step generates the timing pair data by pairing the performance start timing of a note in the score data with the utterance start timing having the shortest time length from that performance start timing, and
the time calculation step calculates, as the delay time, an utterance delay time, which is the time difference between the performance start timing and the utterance start timing in the timing pair data generated in the first generation step.

The program according to claim 3, wherein the time calculation step
calculates an utterance delay representative value, which is a representative value within the music of the utterance delay times, and
calculates, as the delay time, based on the utterance delay times and the utterance delay representative value, a special utterance delay time, which is an utterance delay time that deviates from the utterance delay representative value by a specified time length or more.
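
The representative-value comparison in the time calculation step can be sketched as below. The claims do not say which statistic serves as the representative value, so the median used here is an assumption, as are the function name `special_delay_times` and the choice of milliseconds as the unit. Each input pair is a performance start timing and its matched vocal timing.

```python
from statistics import median


def special_delay_times(pairs, min_deviation):
    """Return (performance_start, delay) for delays far from the representative.

    `pairs` holds (performance_start, vocal_timing) tuples. The delay is
    vocal_timing - performance_start, so a positive value means the singer
    started after the note. The median is an assumed representative value.
    """
    delays = [(start, vocal - start) for start, vocal in pairs]
    if not delays:
        return []
    representative = median(d for _, d in delays)
    return [
        (start, d)
        for start, d in delays
        if abs(d - representative) >= min_deviation
    ]


if __name__ == "__main__":
    # Timings in milliseconds: three notes are sung 50 ms late, one 300 ms late.
    pairs = [(0, 50), (1000, 1050), (2000, 2300), (3000, 3050)]
    print(special_delay_times(pairs, min_deviation=100))
```

Because each result keeps the performance start timing alongside the delay, it can be matched back to the corresponding note of the score data when the singing feature data is assembled.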

An information processing apparatus comprising:
first acquisition means for acquiring, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
second acquisition means for acquiring, from a second storage unit, music data including a vocal sound in which the music is sung;
extraction means for extracting, from the music data, vocal data representing the vocal sound;
timing specifying means for specifying, as utterance timings that can be regarded as timings at which utterance starts in the vocal data, each utterance start timing at which the sound pressure in the vocal data becomes equal to or higher than a predetermined threshold;
first generation means for generating, based on each of the utterance timings specified by the timing specifying means and the performance start timings of the notes in the score data acquired by the first acquisition means, timing pair data that pairs the performance start timing of a note in the score data with the utterance start timing having the shortest time length from that performance start timing;
time calculation means for calculating an utterance delay representative value, which is a representative value within the music of utterance delay times, each being the time difference between the performance start timing and the utterance start timing in the timing pair data generated by the first generation means, and for calculating, as a delay time, based on the utterance delay times and the utterance delay representative value, a special utterance delay time, which is an utterance delay time that deviates from the utterance delay representative value by a specified time length or more; and
second generation means for generating singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated by the time calculation means with the note of the score data corresponding to the performance start timing used for calculating that delay time.

An information processing apparatus comprising:
first acquisition means for acquiring, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
second acquisition means for acquiring, from a second storage unit, music data including a vocal sound in which the music is sung;
extraction means for extracting, from the music data, vocal data representing the vocal sound;
timing specifying means for specifying, as an utterance timing, a pitch change timing, which is a timing at which the vocal frequency transition, representing the transition of pitch in the vocal data, comes within the range of the pitch of a note in the score data;
first generation means for generating, based on each of the utterance timings specified by the timing specifying means and the performance start timings of the notes in the score data acquired by the first acquisition means, timing pair data that associates the performance start timing of a note in the score data with the pitch change timing that lies within a specified time range, defined in advance along the time axis from that performance start timing, and that has the shortest time length from that performance start timing;
time calculation means for calculating a pitch delay representative value, which is a representative value within the music of pitch delay times, each being the time difference between the performance start timing and the pitch change timing in the timing pair data generated by the first generation means, and for calculating, as a delay time, based on the pitch delay times and the pitch delay representative value, a special pitch delay time, which is a pitch delay time that deviates from the pitch delay representative value by a specified time length or more; and
second generation means for generating singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated by the time calculation means with the note of the score data corresponding to the performance start timing used for calculating that delay time.

A data generation method comprising:
a first acquisition procedure in which an information processing apparatus acquires, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
a second acquisition procedure in which the information processing apparatus acquires, from a second storage unit, music data including a vocal sound in which the music is sung;
an extraction procedure in which the information processing apparatus extracts, from the music data, vocal data representing the vocal sound;
a timing specifying procedure in which the information processing apparatus specifies, as utterance timings that can be regarded as timings at which utterance starts in the vocal data, each utterance start timing at which the sound pressure in the vocal data becomes equal to or higher than a predetermined threshold;
a first generation procedure in which the information processing apparatus generates, based on each of the utterance timings specified in the timing specifying procedure and the performance start timings of the notes in the score data acquired in the first acquisition procedure, timing pair data that pairs the performance start timing of a note in the score data with the utterance start timing having the shortest time length from that performance start timing;
a time calculation procedure in which the information processing apparatus calculates an utterance delay representative value, which is a representative value within the music of utterance delay times, each being the time difference between the performance start timing and the utterance start timing in the timing pair data generated in the first generation procedure, and calculates, as a delay time, based on the utterance delay times and the utterance delay representative value, a special utterance delay time, which is an utterance delay time that deviates from the utterance delay representative value by a specified time length or more; and
a second generation procedure in which the information processing apparatus generates singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated in the time calculation procedure with the note of the score data corresponding to the performance start timing used for calculating that delay time.

A data generation method comprising:
a first acquisition procedure in which an information processing apparatus acquires, from a first storage unit, score data representing the score of a piece of music composed of a plurality of notes, in which a pitch and a performance period are associated with each of the plurality of notes;
a second acquisition procedure in which the information processing apparatus acquires, from a second storage unit, music data including a vocal sound in which the music is sung;
an extraction procedure in which the information processing apparatus extracts, from the music data, vocal data representing the vocal sound;
a timing specifying procedure in which the information processing apparatus specifies, as an utterance timing that can be regarded as a timing at which utterance starts in the vocal data, a pitch change timing, which is a timing at which the vocal frequency transition, representing the transition of pitch in the vocal data, comes within the range of the pitch of a note in the score data;
a first generation procedure in which the information processing apparatus generates, based on each of the utterance timings specified in the timing specifying procedure and the performance start timings of the notes in the score data acquired in the first acquisition procedure, timing pair data that associates the performance start timing of a note in the score data with the pitch change timing that lies within a specified time range, defined in advance along the time axis from that performance start timing, and that has the shortest time length from that performance start timing;
a time calculation procedure in which the information processing apparatus calculates a pitch delay representative value, which is a representative value within the music of pitch delay times, each being the time difference between the performance start timing and the pitch change timing in the timing pair data generated in the first generation procedure, and calculates, as a delay time, based on the pitch delay times and the pitch delay representative value, a special pitch delay time, which is a pitch delay time that deviates from the pitch delay representative value by a specified time length or more; and
a second generation procedure in which the information processing apparatus generates singing feature data representing the characteristics of the singer's singing style by associating the delay time calculated in the time calculation procedure with the note of the score data corresponding to the performance start timing used for calculating that delay time.