JP5549651B2

JP5549651B2 - Lyric output data correction device and program

Info

Publication number: JP5549651B2
Application number: JP2011167210A
Authority: JP
Inventors: 久美太田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2011-07-29
Filing date: 2011-07-29
Publication date: 2014-07-16
Anticipated expiration: 2031-07-29
Also published as: JP2013029762A; WO2013018397A1

Description

The present invention relates to a lyrics output data correction device, and a program, that correct the output timing of lyrics defined in association with musical score data so that it is time-synchronized with music data.

Conventionally, systems are known that align lyrics, prepared separately from a music acoustic signal, along the time axis with the music acoustic signal of a song containing a singing voice and accompaniment (see, for example, Patent Document 1).

In this type of system, predefined feature quantities are extracted from the signal corresponding to the vocal part of the music acoustic signal (hereinafter, the vocal signal), and each phoneme of the vocal is identified by speech recognition, which matches those features against a classifier (a so-called discriminative model) generated in advance by machine learning. The system then associates each identified phoneme with each character of the lyrics, in order along the time axis.

Patent Document 1: JP 2008-134606 A

However, the system described in Patent Document 1 uses speech recognition processing to identify the phonemes in the vocal, so the amount of processing needed to identify each phoneme in the recognition phase is enormous. Moreover, because the discriminative model used for speech recognition is generated by machine learning, the amount of processing required in the learning phase is also enormous.

In short, the system of Patent Document 1 has the problem that the total amount of processing required to identify the time in the music acoustic signal with which each character of the lyrics should be associated is enormous.

An object of the present invention is therefore to associate the characters of the lyrics with the corresponding times in the music acoustic signal by a simpler method.

In the lyrics output data correction device of the present invention, made to achieve this object, musical sound transition acquisition means acquires a musical sound transition waveform, in which the sound pressure of the musical sounds constituting a target song changes along the time axis. Output sound transition acquisition means acquires an output sound transition waveform, in which the sound pressure of output sounds changes along the time axis, based on musical score data. The musical score data represents the score of a song that imitates the target song and defines at least a pitch and a performance start timing for each output sound output from a sound source module.

Lyrics output data acquisition means acquires lyrics output data. This data defines the lyrics output timing, that is, the output timing of the lyrics constituent characters that make up the lyrics of the target song, and in it the lyrics output timing of at least one lyrics constituent character is associated with a specific start timing, which is at least one timing defined for the musical score data.

Time shift amount deriving means then compares musical sound information, which represents characteristics of the musical sound transition waveform and is extracted from that waveform, with output sound information, which represents characteristics of the output sound transition waveform and is extracted from that waveform. Based on the result, it derives a time shift amount representing how far the performance start timing of the musical sound corresponding to each output sound deviates from the performance start timing of that output sound.

Timing correction means then defines a corrected lyrics output timing by correcting the lyrics output timing in the lyrics output data acquired by the lyrics output data acquisition means, in accordance with the time shift amount derived by the time shift amount deriving means, so that it coincides with the performance start timing of the musical sound.
In the time shift amount deriving means of the present invention, musical sound change deriving means extracts, from the musical sound transition waveform, the musical sound non-harmonic, which is the non-harmonic component of that waveform, and derives as the musical sound information a musical sound change representing how the musical sound non-harmonic changes along the time axis. Likewise, output sound change deriving means extracts, from the output sound transition waveform, the output sound non-harmonic, which is the non-harmonic component of that waveform, and derives as the output sound information an output sound change representing how the output sound non-harmonic changes along the time axis.
Time correlation deriving means then derives a time correlation value representing the correlation between the musical sound change and the output sound change. It aligns a set position on the time axis of the output sound change with a reference position defined on the time axis of the musical sound change, derives the time correlation value each time the output sound change is expanded or contracted along the time axis, and sequentially moves the set position along the time axis within a specified range. Time correction amount deriving means derives, as the time correction amount (the time shift amount), the expansion/contraction rate and set position of the output sound change that correspond to the largest of the derived time correlation values.
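The search over expansion/contraction rates and set positions described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the two inputs are assumed to be already-extracted change sequences (one value per analysis frame), the set position is modeled as a frame offset, and the function names `stretch`, `time_correlation`, and `derive_time_correction`, the linear interpolation, and the energy normalization are all hypothetical choices made for the example.

```python
from math import sqrt


def stretch(seq, rate):
    """Expand (rate > 1) or contract (rate < 1) seq along the time axis
    using linear interpolation (an assumed resampling method)."""
    n = max(1, round(len(seq) * rate))
    out = []
    for i in range(n):
        pos = i / rate
        lo = int(pos)
        if lo >= len(seq) - 1:
            out.append(seq[-1])
        else:
            frac = pos - lo
            out.append(seq[lo] * (1.0 - frac) + seq[lo + 1] * frac)
    return out


def time_correlation(music_change, output_change, position):
    """Correlation with output_change placed `position` frames into
    music_change, normalized by the total energy of both sequences."""
    num = 0.0
    for i, value in enumerate(output_change):
        j = i + position
        if 0 <= j < len(music_change):
            num += music_change[j] * value
    den = sqrt(sum(v * v for v in music_change)
               * sum(v * v for v in output_change))
    return num / den if den else 0.0


def derive_time_correction(music_change, output_change, rates, max_position):
    """Return the (rate, position) pair with the largest time correlation."""
    best = (None, None)
    best_value = float("-inf")
    for rate in rates:
        stretched = stretch(output_change, rate)
        for position in range(-max_position, max_position + 1):
            value = time_correlation(music_change, stretched, position)
            if value > best_value:
                best_value = value
                best = (rate, position)
    return best


# Hypothetical data: the musical sound change is the output sound change
# delayed by three frames.
output_change = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
music_change = [0, 0, 0] + output_change
rate, position = derive_time_correction(
    music_change, output_change, [0.9, 1.0, 1.1], 5)
```

Normalizing by the energy of the whole sequences, rather than of the overlapping part only, keeps a short accidental overlap from scoring as high as a full match.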

With such a lyrics output data correction device, the lyrics output timing in the lyrics output data can be corrected so that it coincides, along the time axis, with the performance start timing of the musical sounds constituting the target song.

Moreover, the lyrics output data correction device of the present invention does not need to execute any speech recognition processing when correcting the lyrics output timing (that is, when defining the corrected lyrics output timing). The total amount of processing required to identify the performance start timing of the musical sound (that is, the time in the musical sound transition waveform, or music acoustic signal) with which a lyrics constituent character should be associated can therefore be reduced compared with the device described in Patent Document 1.

In other words, the lyrics output data correction device of the present invention can associate each lyrics constituent character with the performance start timing of the corresponding musical sound (that is, the time in the musical sound transition waveform, or music acoustic signal) by a simpler method.

Furthermore, consider a system that distributes the lyrics output data corrected by the lyrics output data correction device of the present invention, together with lyrics telop data representing the individual lyrics constituent characters and music acoustic data, to a device that has no sound source module. If that device outputs the lyrics constituent characters in time synchronization with the musical sounds in the music acoustic data while playing back that data, karaoke can be enjoyed on that device as well.

The musical sound transition waveform referred to here includes, for example, a sampled version of the analog waveform in which the sound pressure of all the musical sounds constituting the target song changes along the time axis. The output sound transition waveform referred to here includes an audio signal generated by rendering data that represents, in MIDI format, a song imitating the target song.

The lyrics constituent characters referred to here may be the individual characters of the lyrics, or may be clauses or phrases formed by grouping those characters according to a specific rule.

In general, the non-harmonic components contained in the musical sound transition and the output sound transition are often the sounds of instruments that keep the rhythm (for example, drums and bass).
The sounds of these rhythm instruments can be detected more reliably than other instrument sounds. The time shift amount derived by the lyrics output data correction device of the present invention therefore makes it possible to match the performance start timing of each output sound in the musical score data to the performance start timing of the musical sound more reliably.

Thus, the lyrics output data correction device of the present invention can match the corrected lyrics output timing to the performance start timing of the musical sound more reliably.
Furthermore, in the lyrics output data correction device of the present invention, pitch correction amount deriving means may derive a pitch correction amount, based on the result of comparing one item of the musical sound information with one item of the output sound information, such that the pitch of each output sound matches the pitch of the corresponding musical sound. Score data correction means may then generate corrected score data by shifting the pitch of each output sound defined in the musical score data in accordance with the derived pitch correction amount.

In this case, the time shift amount deriving means may use a corrected sound transition waveform, which is the output sound transition waveform based on the corrected score data, as the output sound transition waveform acquired by the output sound transition acquisition means.

With such a lyrics output data correction device, the output sound transition waveform acquired by the output sound transition acquisition means is the corrected sound transition waveform, so the pitch deviation from the musical sound transition waveform is kept to a minimum and the time shift amount can be derived more accurately. As a result, the corrected lyrics output timing can be matched to the output timing of the musical sound more reliably.

In the pitch correction amount deriving means of the lyrics output data correction device of the present invention, musical sound distribution deriving means may derive, as one item of the musical sound information, a musical sound pitch distribution that represents the frequencies contained in the musical sound transition waveform and the strength of each frequency, normalized with respect to strength. Output sound distribution deriving means may derive, as one item of the output sound information, an output pitch distribution that represents the frequencies contained in the output sound transition waveform and the strength of each frequency, likewise normalized with respect to strength. Pitch correlation deriving means may then derive a pitch correlation value, representing the correlation between the output pitch distribution and the musical sound pitch distribution, each time the output pitch distribution is shifted along the frequency axis from a predefined specified position in the musical sound pitch distribution. In this case, the pitch correction amount deriving means may derive, as the pitch correction amount, the shift along the frequency axis from the specified position that corresponds to the largest of the derived pitch correlation values.

If the musical score data is corrected in accordance with a pitch correction amount derived in this way, the frequencies contained in the corrected output sound transition waveform, and the ratio of their strengths, can be brought closer to those of the musical sound transition waveform.

In particular, the musical sound pitch distribution and the output pitch distribution derived by the lyrics output data correction device of the present invention are normalized with respect to the strength of each frequency contained in the musical sound transition waveform and the output sound transition waveform. Consequently, even if the amplitude of the musical sound transition waveform differs greatly from that of the output sound transition waveform, the output sound transition waveform based on the corrected score data can be brought close to the musical sound transition waveform.
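The pitch correction described above can be illustrated with a minimal sketch. Several points are assumptions made only for the example: each pitch distribution is a list with one strength value per semitone bin, normalization divides by the total strength, the correlation is a plain sum of products, and the names `normalize`, `pitch_correlation`, `derive_pitch_correction`, and `shift_note_numbers` are hypothetical.

```python
def normalize(distribution):
    """Scale strengths so they sum to 1, removing overall amplitude."""
    total = sum(distribution)
    return [v / total for v in distribution] if total else list(distribution)


def pitch_correlation(music_dist, output_dist, shift):
    """Correlation with output_dist shifted `shift` bins up the frequency axis."""
    return sum(
        music_dist[i + shift] * value
        for i, value in enumerate(output_dist)
        if 0 <= i + shift < len(music_dist)
    )


def derive_pitch_correction(music_dist, output_dist, max_shift):
    """Return the shift (in bins) with the largest pitch correlation value."""
    music = normalize(music_dist)
    output = normalize(output_dist)
    best_shift, best_value = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        value = pitch_correlation(music, output, shift)
        if value > best_value:
            best_value, best_shift = value, shift
    return best_shift


def shift_note_numbers(note_numbers, shift):
    """Apply the pitch correction amount to every note in the score data."""
    return [n + shift for n in note_numbers]


# Hypothetical data: same spectral shape, two bins higher and ten times louder.
output_dist = [0, 4, 0, 2, 0, 0, 6, 0, 0, 0, 0, 0]
music_dist = [0, 0, 0, 40, 0, 20, 0, 0, 60, 0, 0, 0]
pitch_shift = derive_pitch_correction(music_dist, output_dist, 6)
corrected_notes = shift_note_numbers([60, 62, 65], pitch_shift)
```

Because both distributions are normalized before comparison, the tenfold amplitude difference in the example data does not affect which shift is selected.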

In the present invention, the lyrics output data may define the lyrics output timing of at least some of the lyrics constituent characters as an elapsed time from the specific start timing.

In this case, in the lyrics output data correction device of the present invention, association means may define the corrected lyrics output timing at least for the lyrics constituent characters whose lyrics output timing is defined by an elapsed time.

With such a lyrics output data correction device, the corrected lyrics output timing can be defined even for lyrics output data in which the lyrics output timing is defined by the elapsed time from one lyrics constituent character.
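As a rough illustration of lyrics output timings defined by elapsed time, the sketch below resolves elapsed times into absolute timings and then moves them together with their anchor. The assumptions are significant: times are in seconds, the time shift amount is treated as a single offset applied to the specific start timing, expansion or contraction of the time axis is ignored, and the function names are hypothetical.

```python
def absolute_lyrics_timings(specific_start, elapsed_times):
    """Resolve elapsed-time entries into absolute lyrics output timings."""
    return [specific_start + elapsed for elapsed in elapsed_times]


def corrected_lyrics_timings(specific_start, elapsed_times, time_shift):
    """Assumed correction: shift the anchor, keep the elapsed times unchanged."""
    return absolute_lyrics_timings(specific_start + time_shift, elapsed_times)


# Hypothetical values: three characters anchored at 12.0 s, shifted by 0.25 s.
original = absolute_lyrics_timings(12.0, [0.0, 0.5, 1.25])
corrected = corrected_lyrics_timings(12.0, [0.0, 0.5, 1.25], 0.25)
```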

The corrected lyrics output timing of lyrics constituent characters whose lyrics output timing is defined by an elapsed time may be defined after associating the lyrics output timing of each lyrics constituent character with the performance start timing of the corresponding output sound, or without performing that association. As a specific method of association, a section of the target song in which the tempo is constant may be identified, and, within that same constant-tempo section, the performance start timings of the output sounds contained in the musical score data may be associated with the lyrics output timings of the lyrics constituent characters contained in the lyrics output data. Alternatively, for example, when the musical score data has been generated in advance as data conforming to the MIDI (Musical Instrument Digital Interface) standard, a MIDI track representing the lyrics output timings of the lyrics output data may be newly added. That is, in the newly added track, the lyrics output timing of each lyrics constituent character may be expressed in association with the performance start timing of the corresponding output sound.

In the present invention, the musical score data may define the performance start timing of at least some of the output sounds as the specific start timing, and the lyrics output data may associate the lyrics output timing of each lyrics constituent character with the performance start timing of the output sound corresponding to that character.

In this case, the timing correction means of the present invention may define the corrected lyrics output timing for each of the lyrics constituent characters.
With such a lyrics output data correction device, lyrics output data can be generated in which the lyrics output timing of each lyrics constituent character is associated with the performance start timing of the output sound corresponding to that character.

Furthermore, in the present invention, performance start timing correction means may derive a corrected performance start timing by shifting the performance start timing of an output sound by the time shift amount, and the timing correction means may use the corrected performance start timing derived by the performance start timing correction means as the corrected lyrics output timing.

With such a lyrics output data correction device, the corrected performance start timing can be used directly as the corrected lyrics output timing.
Alternatively, in the present invention, the performance start timing correction means may derive a corrected performance start timing by shifting the performance start timing of an output sound by the time shift amount, and the timing correction means may define the corrected lyrics output timing by shifting the lyrics output timing by the difference between that corrected performance start timing and the performance start timing of the output sound.

With such a lyrics output data correction device, the corrected lyrics output timing can be defined by shifting the lyrics output timing by the difference between the corrected performance start timing and the performance start timing of the output sound.
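The two ways of defining the corrected lyrics output timing described above can be compared in a short sketch. It assumes times in seconds and a time shift amount expressed as a single per-note offset; the function names are hypothetical and chosen only for illustration.

```python
def corrected_performance_start(performance_start, time_shift):
    """Shift the performance start timing of an output sound by the time shift."""
    return performance_start + time_shift


def lyrics_timing_by_replacement(performance_start, time_shift):
    """Variant 1: use the corrected performance start timing as the
    corrected lyrics output timing."""
    return corrected_performance_start(performance_start, time_shift)


def lyrics_timing_by_difference(lyrics_timing, performance_start, time_shift):
    """Variant 2: shift the lyrics output timing by the difference between
    the corrected and the original performance start timing."""
    corrected = corrected_performance_start(performance_start, time_shift)
    return lyrics_timing + (corrected - performance_start)


# Hypothetical values: the lyric is displayed 0.25 s before its note.
by_replacement = lyrics_timing_by_replacement(10.0, 0.5)
by_difference = lyrics_timing_by_difference(9.75, 10.0, 0.5)
```

Under these assumptions, the second variant preserves any lead or lag between a lyrics constituent character and its output sound, whereas the first snaps the character onto the corrected note start.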

The present invention may also be a program for causing a computer to function as the lyrics output data correction device.
When the present invention is implemented as a program, the program, in a musical sound transition acquisition procedure, acquires a musical sound transition waveform in which the sound pressure of the musical sounds constituting a target song changes along the time axis. In an output sound transition acquisition procedure, it acquires an output sound transition waveform, in which the sound pressure of output sounds changes along the time axis, based on musical score data that represents the score of a song imitating the target song and defines at least a pitch and a performance start timing for each output sound output from a sound source module. In a lyrics output data acquisition procedure, it acquires lyrics output data that defines the lyrics output timing, that is, the output timing of the lyrics constituent characters making up the lyrics of the target song, and in which the lyrics output timing of at least one lyrics constituent character is associated with a specific start timing, which is at least one timing defined for the musical score data.

Then, in a time shift amount derivation procedure, the program compares musical sound information, which represents characteristics of the musical sound transition waveform and is extracted from it, with output sound information, which represents characteristics of the output sound transition waveform and is extracted from it, and based on the result derives a time shift amount representing how far the performance start timing of the musical sound corresponding to each output sound deviates from the performance start timing of that output sound. In a timing correction procedure, it defines a corrected lyrics output timing by correcting the lyrics output timing in the lyrics output data acquired in the lyrics output data acquisition procedure, in accordance with the time shift amount derived in the time shift amount derivation procedure, so that it coincides with the performance start timing of the musical sound.
The time shift amount derivation procedure causes the computer to execute four procedures. A musical sound change derivation procedure extracts, from the musical sound transition waveform acquired in the musical sound transition acquisition procedure, the musical sound non-harmonic, which is the non-harmonic component of that waveform, and derives as the musical sound information a musical sound change representing how the musical sound non-harmonic changes along the time axis. An output sound change derivation procedure extracts, from the output sound transition waveform acquired in the output sound transition acquisition procedure, the output sound non-harmonic, which is the non-harmonic component of that waveform, and derives as the output sound information an output sound change representing how the output sound non-harmonic changes along the time axis. A time correlation derivation procedure derives a time correlation value, representing the correlation between the musical sound change and the output sound change, each time the output sound change is expanded or contracted along the time axis with a set position on its time axis aligned to a reference position defined on the time axis of the musical sound change, and sequentially moves the set position along the time axis within a specified range. A time correction amount derivation procedure derives, as a time correction amount, the expansion/contraction rate and set position of the output sound change that correspond to the largest of the time correlation values derived in the time correlation derivation procedure. The time correction amount derived in the time correction amount derivation procedure is used as the time shift amount.

If the program of the present invention is configured in this way, it can be used by recording it on a computer-readable recording medium such as a DVD-ROM, CD-ROM, or hard disk and loading it into a computer and starting it as needed, or by having a computer acquire it over a communication line and start it as needed. By causing the computer to execute each procedure, the computer can be made to function as the lyrics output data correction device described in claim 1.

A block diagram showing the schematic configuration of a music data distribution system built around an information processing device to which the present invention is applied.
A flowchart showing the procedure of the data correction process in the first embodiment.
A flowchart showing the procedure of the pitch correction process.
An explanatory diagram outlining the pitch correction process.
A flowchart showing the procedure of the time shift amount derivation process.
An explanatory diagram outlining the time shift amount derivation process.
A diagram outlining the lyrics output data in the second embodiment.
A flowchart showing the procedure of the data correction process in the second embodiment.

Embodiments of the present invention will be described below with reference to the drawings.
[First embodiment]
<About the music data distribution system>
FIG. 1 is a block diagram showing the schematic configuration of a music data distribution system including a lyrics output data correction device to which the present invention is applied.

This music data distribution system 1 includes a music data storage server 3 that stores music data MD including lyrics output data DO, an information processing device 20 that processes the music data MD stored in the music data storage server 3, and at least one mobile terminal 5A to 5n (n is a natural number of 1 or more representing the number of mobile terminals) to which the music data MD processed by the information processing device 20 is distributed.
<About the music data storage server>
The music data storage server 3 is a device that functions as a database storing music data MD1 to MDm (m is a natural number of 1 or more representing the number of music data items). The music data MD in this embodiment includes music acoustic data DW, music MIDI data DM, and lyrics data DL.

Of these, the music acoustic data DW is data obtained by sampling the analog waveform (that is, the musical sound transition waveform) in which the sound pressure of all the musical sounds constituting one song (hereinafter, the specific song) changes along the time axis. It is, for example, an audio file in WAV or MP3 format prepared in advance for each song.

The music MIDI data DM is data that represents, according to the well-known MIDI (Musical Instrument Digital Interface) standard, the score of a piece that imitates the specific music (that is, it corresponds to the musical score data of the present invention), and is prepared in advance for each piece of music. Each item of music MIDI data DM includes at least identification data that distinguishes the piece, a score track representing the score for each instrument used in the piece, and tempo data representing the tempo in each of the sections into which the piece is divided (for example, the verse (A-melody) or the chorus).

Of these, the score track defines, for each output sound output from the MIDI sound source, at least a pitch (the so-called note number) and the period during which the sound module outputs the output sound (hereinafter referred to as the note length). The note length in the score track is defined by a performance start timing (so-called note-on timing), which represents the time from the start of performance of the piece until output of the output sound begins, and a performance end timing (so-called note-off timing), which represents the time from the start of performance of the piece until output of the output sound ends.
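The relationship between note number, note-on timing, note-off timing, and note length can be illustrated with a minimal sketch. The class name `OutputSound`, its field names, and the use of seconds as the time unit are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSound:
    """One output sound in a score track (hypothetical representation)."""
    note_number: int  # pitch, as a MIDI note number
    note_on: float    # performance start timing, seconds from start of the piece
    note_off: float   # performance end timing, seconds from start of the piece

    @property
    def note_length(self) -> float:
        # The note length is defined by the note-on and note-off timings.
        return self.note_off - self.note_on


# A tiny score track with two output sounds.
track = [OutputSound(60, 0.0, 0.5), OutputSound(64, 0.5, 1.5)]
```

Storing the two timings rather than the length mirrors the description above, in which the note length is derived from the performance start and end timings.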

A score track is prepared for each instrument, for example keyboard instruments (such as piano and pipe organ), stringed instruments (such as violin, viola, guitar, and koto), percussion instruments (such as drums, cymbals, timpani, and xylophone), and wind instruments (such as clarinet, trumpet, flute, and shakuhachi).

The lyrics data DL is data on the lyrics displayed on the display device of a well-known karaoke machine. It comprises lyrics telop data DT, which represents the characters making up the lyrics of the specific music (hereinafter referred to as lyric constituent characters), and lyrics output data DO, which defines a timing correspondence that associates the lyrics output timing, that is, the output timing of the lyric constituent characters, with the performance of the music MIDI data DM.

Specifically, in the timing correspondence of the present embodiment, the timing at which output of the lyrics telop data DT starts is associated with the timing at which performance of the music MIDI data DM starts (an example of the specific start timing in the present invention), and the lyrics output timing of each lyric constituent character along the time axis of the target music is defined by the elapsed time from the start of performance of the music MIDI data DM. The elapsed time here is, for example, a time representing the timing at which the color change of a displayed lyric constituent character is executed, and is defined by the color change speed. A lyric constituent character here may be an individual character of the lyrics, or a clause or phrase formed by grouping such characters according to a specific rule along the time axis.

As part of the timing correspondence in the present embodiment, the timing at which output of each lyric constituent character ends, as determined from the color change speed (hereinafter referred to as the lyrics output end timing), may also be defined by the elapsed time from the start of performance of the music MIDI data DM.

The music acoustic data DW, the music MIDI data DM, and the lyrics data DL are stored in the music data storage server 3 in association with one another for each corresponding piece of music.
<About mobile devices>
The mobile terminal 5 is a terminal (for example, a well-known mobile phone) capable of playing back the music acoustic data DW acquired from the information processing device 20, and includes an information receiving unit 6, a display unit 7, a sound output unit 8, a communication unit 9, a storage unit 10, and a control unit 11.

Of these, the information receiving unit 6 receives information input via an input device (not shown). The display unit 7 displays, based on a command from the control unit 11, an image including at least information indicated by character codes. The sound output unit 8 at least plays back and outputs the music acoustic data DW, and includes, for example, a PCM sound source and a speaker.

The communication unit 9 allows the mobile terminal 5 to exchange information with the outside via a communication network (for example, a public wireless communication network or a network line). The storage unit 10 stores various processing programs and various data. The control unit 11 controls the units 6, 7, 8, 9, and 10 that constitute the mobile terminal 5 according to the processing programs and the like stored in the storage unit 10.
<Information processing device>
Next, the information processing apparatus 20 will be described.

The information processing device 20 includes a communication unit 21, an input receiving unit 22, a display unit 23, a voice input unit 24, a voice output unit 25, a sound source module 26, a storage unit 27, and a control unit 30.

Of these, the communication unit 21 allows the information processing device 20 to communicate with the outside via a communication network (for example, a public wireless communication network or a network line). The input receiving unit 22 is an input device (for example, a keyboard or a pointing device) that receives input of information and commands in accordance with external operations. The display unit 23 is a display device (for example, a liquid crystal display or a CRT) that displays an image including at least information indicated by character codes. The voice input unit 24 is a device (a so-called microphone) that converts voice into an electrical signal and inputs it to the control unit 30. The voice output unit 25 is a device (a so-called speaker) that converts an electrical signal from the control unit 30 into sound and outputs it.

The sound source module 26 is a device (for example, a MIDI sound source) that outputs, based on the music MIDI data DM, a sound that simulates a sound from a sound source (that is, an output sound). The storage unit 27 is a non-volatile storage device (for example, a hard disk device) whose stored contents can be read and written.

The control unit 30 is built around a well-known computer having at least a ROM 31 that stores processing programs and data whose contents must be retained even when the power is turned off, a RAM 32 that temporarily stores processing programs and data, and a CPU 33 that executes each process (various operations) according to the processing programs stored in the ROM 31 and the RAM 32.

The ROM 31 stores a processing program that causes the CPU 33 to execute a data correction process, which corrects the lyrics output timing in the lyrics output data DO corresponding to the target music so that it coincides with the performance start timing of the musical sounds in the music acoustic data DW corresponding to the target music. That is, by executing the data correction process, the information processing device 20 functions as the lyrics output data correction device of the present invention.
<About data correction processing>
Next, data correction processing executed by the CPU 33 will be described.

FIG. 2 is a flowchart showing the procedure of the data correction process in the present embodiment.
The data correction process is started when an activation command for starting the data correction process is input via the input receiving unit 22.

As shown in FIG. 2, when the data correction process is started, it acquires from the music data storage server 3 the music MIDI data DM corresponding to the piece of music specified by information input via the input receiving unit 22 (hereinafter referred to as the target music) (S110; S denotes a step).

Subsequently, the music acoustic data DW corresponding to the target music is acquired from the music data storage server 3 (S130). The musical sound transition waveform of that music acoustic data DW is then acquired from it (S140).

Then, based on the music MIDI data DM acquired in S110 and the musical sound transition waveform acquired in S140, a pitch correction process is executed that corrects the music MIDI data DM so that the pitch of the output sounds matches the pitch of the musical sounds constituting the target music (S150). Hereinafter, the music MIDI data DM whose output sounds have been corrected is referred to as the corrected music MIDI data DM.

Further, a time deviation amount derivation process is executed (S170). This process derives the amount of deviation (hereinafter referred to as the time deviation amount) between the performance start timing of the output sounds whose pitch has been corrected by the pitch correction process to match the pitch of the musical sounds (hereinafter referred to as corrected output sounds) and the performance start timing of the musical sounds, and corrects the corrected music MIDI data DM so that the performance start timing of each output sound matches the performance start timing of the musical sounds constituting the target music.
<Pitch correction processing details>
Here, the pitch correction process started in S150 of the data correction process will be described.

When the pitch correction process is started, as shown in FIG. 3, it acquires an output sound transition waveform, which is the waveform of all output sounds as they change along the time axis, based on all the score tracks included in the music MIDI data DM acquired in S110 (S310). Specifically, the output sound transition waveform in the present embodiment is acquired by well-known rendering, which generates an audio signal (waveform) from MIDI standard data.

Subsequently, the acquired output sound transition waveform is subjected to frequency analysis (in the present embodiment, a discrete Fourier transform) for each unit time set along the time axis, and a power spectrum representing the frequencies contained in the output sound transition waveform of that unit time and the intensity at each frequency is derived (S320). Based on the derived power spectra, an average output sound spectrum is derived by taking, for each frequency, the arithmetic mean of the intensity along the time axis (S330). The intensities of the derived average output sound spectrum are then averaged within each of the predefined frequency ranges whose boundaries adjoin one another (for example, in semitone units; hereinafter referred to as specified pitch ranges) to obtain representative values (S340). Further, a normalized output sound spectrum (see FIG. 4(A)) is derived by normalizing the intensities of the average output sound spectrum averaged in S340 so that the variance is 1 and the mean is 0 (S350).

Subsequently, the musical sound transition waveform acquired in S140 is subjected to frequency analysis for each unit time set along the time axis, and the power spectrum for that unit time is derived (S360). Based on the derived power spectra, an average musical sound spectrum is derived by taking, for each frequency, the arithmetic mean of the intensity along the time axis (S370). The intensities of the derived average musical sound spectrum are averaged within each specified pitch range to obtain representative values (S380), and a normalized musical sound spectrum (see FIG. 4(B)) is derived by normalizing the intensities averaged in S380 so that the variance is 1 and the mean is 0 (S390).
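The averaging and normalization applied to both waveforms (S330 to S350 and S370 to S390) can be sketched as follows. This is only an illustration under stated assumptions: the per-unit-time power spectra are taken as given, the function names are hypothetical, and each specified pitch range is simplified to a fixed number of adjacent frequency bins, whereas semitone-wide ranges on a linear frequency axis would in practice contain varying numbers of bins.

```python
import math


def average_spectrum(frames):
    """Arithmetic mean of the intensity at each frequency over all unit times.

    frames: list of power spectra, one equal-length list per unit time.
    """
    count = len(frames)
    return [sum(frame[k] for frame in frames) / count
            for k in range(len(frames[0]))]


def band_representatives(spectrum, bins_per_band):
    """Average adjacent bins to get one representative value per band.

    Assumption: every band spans the same number of bins.
    """
    reps = []
    for start in range(0, len(spectrum), bins_per_band):
        band = spectrum[start:start + bins_per_band]
        reps.append(sum(band) / len(band))
    return reps


def normalize(values):
    """Rescale values to mean 0 and variance 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0] * len(values)  # flat spectrum: nothing to scale
    return [(v - mean) / std for v in values]


frames = [[1, 3, 5, 7], [3, 5, 7, 9]]      # two unit times, four bins
avg = average_spectrum(frames)              # [2.0, 4.0, 6.0, 8.0]
reps = band_representatives(avg, 2)         # [3.0, 7.0]
normalized = normalize(reps)                # [-1.0, 1.0]
```

Normalizing both spectra to the same mean and variance makes the later correlation insensitive to overall loudness differences between the rendered MIDI and the recorded audio.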

The representative value obtained in S340 and S380 of the present embodiment may instead be the intensity at the frequency corresponding to the center of the specified pitch range. In that case, specifically, for every 20 cents (one fifth of a semitone), the value (power) at the frequency closest to the 20-cent grid point is extracted.

Then, as described in detail later, a correlation value between the normalized output sound spectrum and the normalized musical sound spectrum (hereinafter referred to as the pitch correlation value) is derived (S400). It is then determined whether the shift amount of the normalized output sound spectrum relative to the normalized musical sound spectrum is equal to or greater than a predetermined upper limit (S410). If the shift amount is less than the upper limit (S410: NO), the normalized output sound spectrum is shifted along the frequency axis by a predetermined amount (S420), and the process returns to S400 to derive the pitch correlation value again.

That is, in S400 to S420 of the present embodiment, as shown in FIG. 4(C), the normalized output sound spectrum is shifted along the frequency axis relative to the normalized musical sound spectrum from the lower limit until the upper limit is reached, and a pitch correlation value is derived each time the normalized output sound spectrum is shifted.

When the shift amount of the normalized output sound spectrum becomes equal to or greater than the upper limit (S410: YES), a correction amount for matching the pitch of the output sounds to the pitch of the musical sounds constituting the target music (hereinafter referred to as the pitch correction amount) is derived (S430). Specifically, in S430 of the present embodiment, the shift amount of the normalized output sound spectrum that corresponds to the largest of all the pitch correlation values derived in S400 is taken as the pitch correction amount.
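The shift-and-correlate loop of S400 to S430 amounts to an exhaustive search over shift amounts. The following is a minimal sketch under assumptions not stated in the source: the correlation is taken to be a plain inner product, shifts are expressed as whole bands, and bands shifted outside the spectrum are dropped. All function names are hypothetical.

```python
def correlation(a, b):
    """Inner product of two equal-length spectra (assumed correlation measure)."""
    return sum(x * y for x, y in zip(a, b))


def shifted(spectrum, shift):
    """Shift a spectrum by `shift` bands along the frequency axis, zero-padded."""
    size = len(spectrum)
    result = [0.0] * size
    for index, value in enumerate(spectrum):
        target = index + shift
        if 0 <= target < size:
            result[target] = value
    return result


def pitch_correction_amount(output_spec, music_spec, lower, upper, step=1):
    """Return the shift in [lower, upper] that maximizes the correlation."""
    best_shift, best_corr = lower, float("-inf")
    shift = lower
    while shift <= upper:                      # S410: stop at the upper limit
        corr = correlation(shifted(output_spec, shift), music_spec)  # S400
        if corr > best_corr:
            best_shift, best_corr = shift, corr
        shift += step                          # S420: shift by the fixed amount
    return best_shift                          # S430: shift with max correlation


output_spec = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
music_spec = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
amount = pitch_correction_amount(output_spec, music_spec, lower=-3, upper=3)
```

In this toy input the output sound peak sits two bands below the musical sound peak, so the search returns a shift of 2 bands.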

Subsequently, corrected music MIDI data is generated by correcting, according to the derived pitch correction amount, the pitch of each output sound defined in all the score tracks of the music MIDI data DM (S440). That is, in the corrected music MIDI data generated in S440 of the present embodiment, the pitch of each output sound is shifted by the pitch correction amount from the pitch of the output sound prepared in advance.

After that, the pitch correction process ends and control returns to the data correction process.
<About the processing content of the time deviation amount derivation process>
Next, the time deviation amount derivation process started in S170 of the data correction process will be described.

When the time deviation amount derivation process is started, as shown in FIG. 5, it acquires a corrected sound transition waveform, which is the waveform of all corrected output sounds as they change along the time axis, based on all the score tracks included in the corrected music MIDI data generated in S440 (S510). The corrected sound transition waveform in the present embodiment may be acquired by the same method as in S310.

Subsequently, the output sound non-harmonic, which is the non-harmonic component of the acquired corrected sound transition waveform, is derived from the corrected sound transition waveform (S520), and the musical sound non-harmonic, which is the non-harmonic component of the musical sound transition waveform acquired in S140, is derived from the musical sound transition waveform (S530). These non-harmonic components may be derived by passing the corrected sound transition waveform or the musical sound transition waveform through a filter prepared in advance.

Further, the output sound non-harmonic and the musical sound non-harmonic are each divided into specific blocks, each having a time length defined along the time axis (S540). Each specific block is a constant-tempo section, that is, a section of the target music in which the tempo is constant. The constant-tempo sections are determined by identifying the times at which the tempo changes, according to the tempo defined in the tempo data of the music MIDI data DM, as the start time and end time of each constant-tempo section. The specific blocks of the musical sound non-harmonic are determined after those of the output sound non-harmonic: the times from the start of performance of the target music that correspond to the start time and end time of each specific block of the output sound non-harmonic are taken as the start time and end time of the corresponding specific block of the musical sound non-harmonic.

Then, one pair of specific blocks is selected from the specific blocks divided in S540 (S550), and for that pair, unit data representing the change along the time axis is generated for both the musical sound non-harmonic and the output sound non-harmonic (S560). As shown in FIGS. 6(A) and 6(B), the unit data in the present embodiment is generated by summing, for each specified section that is shorter than the specific block, the amplitude values of the non-harmonic component within that section, and then normalizing the sums obtained for the sections. Hereinafter, the unit data for the output sound non-harmonic is referred to as the output sound unit data (corresponding to the output sound change in the present invention), and the unit data for the musical sound non-harmonic is referred to as the musical sound unit data (corresponding to the musical sound change in the present invention).
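Generating unit data from a non-harmonic component (S560) can be sketched as below. The source does not specify how the amplitude value or the normalization is defined, so this sketch assumes the absolute sample value as the amplitude and division by the largest section sum as the normalization. The function name and the sample-count parameter are hypothetical.

```python
def unit_data(non_harmonic, section_len):
    """Sum amplitudes per specified section, then normalize to a peak of 1.

    non_harmonic: samples of the non-harmonic component in one specific block.
    section_len:  number of samples per specified section (assumed fixed).
    """
    sums = [
        sum(abs(sample) for sample in non_harmonic[start:start + section_len])
        for start in range(0, len(non_harmonic), section_len)
    ]
    peak = max(sums, default=0.0)
    if peak == 0:
        return [0.0] * len(sums)   # silent block: nothing to normalize
    return [value / peak for value in sums]


envelope = unit_data([1, -1, 0, 0, 2, 2], section_len=2)
```

Here the three sections sum to 2, 0, and 4, so the unit data becomes `[0.5, 0.0, 1.0]`, a coarse envelope in which percussive onsets stand out.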

The output sound setting position defined on the time axis of the output sound unit data is aligned with the musical sound setting position defined on the time axis of the musical sound unit data, and a correlation value between the output sound unit data and the musical sound unit data (hereinafter referred to as the time correlation value) is derived (S570). It is then determined whether the expansion/contraction rate of the output sound unit data relative to the musical sound unit data is equal to or greater than a predetermined upper limit (the upper limit of the expansion/contraction rate) (S580). If the expansion/contraction rate is less than its upper limit (S580: NO), the output sound unit data is expanded along the time axis by a predetermined rate (S590), and the process returns to S570.

If the expansion/contraction rate has reached its upper limit (S580: YES), it is determined whether the shift amount of the output sound unit data along the time axis relative to the musical sound unit data is equal to or greater than a predetermined upper limit (the upper limit of the shift amount) (S600). If the shift amount is less than its upper limit (S600: NO), the setting position of the output sound unit data is shifted by a predetermined time (S610), the expansion/contraction rate of the output sound unit data is reset to its lower limit, and the process returns to S570.

That is, in S570 to S610 of the present embodiment, as shown in FIG. 6(C), a time correlation value is derived each time the output sound unit data is expanded relative to the musical sound unit data, until the expansion/contraction rate reaches its upper limit. This derivation of time correlation values is repeated while the output sound unit data is shifted along the time axis relative to the musical sound unit data, until the shift amount reaches its upper limit.

On the other hand, if the determination in S600 shows that the shift amount of the output sound unit data is equal to or greater than its upper limit (S600: YES), a correction amount for matching the performance start timing of the corrected output sounds to the performance start timing of the musical sounds constituting the target music, that is, the time deviation amount, is derived (S620). Specifically, in S620 of the present embodiment, the expansion/contraction rate and shift amount of the output sound unit data that correspond to the largest of all the time correlation values derived in S570 for the pair of specific blocks are taken as the time deviation amount for the specific block selected in S550.
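The nested loop of S570 to S620 (all expansion/contraction rates for each shift amount, keeping the pair with the largest time correlation value) can be sketched as a grid search. Several choices here are assumptions and not taken from the source: stretching uses nearest-neighbor resampling, the correlation is an unnormalized inner product over the overlapping part, and shifts are whole sections. The function names are hypothetical.

```python
def resample(unit, rate):
    """Stretch unit data along the time axis by `rate` (nearest neighbor)."""
    length = max(1, round(len(unit) * rate))
    return [unit[min(len(unit) - 1, int(i / rate))] for i in range(length)]


def time_correlation(out_unit, music_unit, shift):
    """Inner product of the overlap after shifting out_unit by `shift` sections."""
    total = 0.0
    for index, value in enumerate(out_unit):
        target = index + shift
        if 0 <= target < len(music_unit):
            total += value * music_unit[target]
    return total


def time_deviation(out_unit, music_unit, rates, shifts):
    """Return the (rate, shift) pair with the largest time correlation value."""
    best = None
    for shift in shifts:                 # S600/S610: step through shift amounts
        for rate in rates:               # S580/S590: step through rates
            corr = time_correlation(resample(out_unit, rate), music_unit, shift)
            if best is None or corr > best[0]:
                best = (corr, rate, shift)
    return best[1], best[2]              # S620: rate and shift at the maximum


out_unit = [1, 0, 0, 0]
music_unit = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
deviation = time_deviation(out_unit, music_unit, rates=[1.0, 2.0], shifts=[0, 1, 2, 3])
```

With this input the best match is found when the output sound unit data is stretched to twice its length and shifted by two sections. An unnormalized inner product tends to favor longer stretched signals, so a practical implementation would likely normalize the correlation by the overlap length.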

Corrected score data, in which the performance start timing of each output sound has been corrected according to the derived time deviation amount, is then generated (S630). In S630 of the present embodiment, the start time and end time of the specific block in the score data whose output sound pitch has already been corrected are corrected based on the shift amount and the expansion/contraction rate of the output sound unit data derived as the time deviation amount for the specific block selected in S550. The performance start timings of the output sounds are then expanded or contracted to fit the period defined by the corrected start time and end time, so that the ratios between the intervals of the performance start timings before correction are preserved. This yields corrected score data in which the performance start timing of each output sound in the specific block has been corrected. In S630 of the present embodiment, the performance end timing of each output sound is also corrected, using the same method as for the performance start timing.
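Expanding or contracting the timings within a block while preserving their interval ratios is a linear mapping from the old block period to the corrected one. The sketch below assumes the corrected start and end times of the block have already been computed from the shift amount and expansion/contraction rate. The function and parameter names are hypothetical.

```python
def rescale_timings(timings, old_start, old_end, new_start, new_end):
    """Map timings linearly from [old_start, old_end] to [new_start, new_end].

    A linear map keeps the ratios between successive intervals unchanged.
    """
    scale = (new_end - new_start) / (old_end - old_start)
    return [new_start + (t - old_start) * scale for t in timings]


# Block originally spanning 10 s to 20 s is corrected to span 11 s to 16 s.
note_on = [10.0, 12.0, 15.0]
corrected = rescale_timings(note_on, 10.0, 20.0, 11.0, 16.0)
```

The original intervals of 2 s and 3 s become 1 s and 1.5 s, so their 2:3 ratio is preserved. The same mapping could be applied to performance end timings and to lyrics output timings.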

Subsequently, it is determined whether the time deviation amount has been derived for all the specific blocks divided in S540 (S640). If it has not been derived for all the specific blocks (S640: NO), the process returns to S550, where a new specific block is selected and the steps up to S620 are executed. In S550, the specific blocks are selected in descending order of time length, and the time deviation amount is derived for each. However, for a specific block adjacent to one whose time deviation amount has already been derived, the corrected start time or end time of that already-processed block is used as the corresponding value for the block in question.

On the other hand, if the determination in S640 shows that the time deviation amount has been derived for all the specific blocks (S640: YES), this process ends and control returns to the data correction process.

When control moves to S190 of the data correction process (see FIG. 2), the lyrics output timing of each lyric constituent character in the lyrics output track defined in S120 is acquired (S190). Subsequently, the lyrics output timing of each lyric constituent character acquired in S190 is corrected, according to the time deviation amount derived in the time deviation amount derivation process, so that it coincides with the performance start timing of the musical sounds in the music acoustic data DW (S200).

Specifically, the lyrics output timing may be corrected in S200 of the present embodiment by the same method as that used in S630 to correct the performance start timing and performance end timing of the output sounds.

Then, lyrics output data DO (that is, corrected lyrics output data) is generated that defines the lyrics output timing corrected in S200 (that is, the corrected lyrics output timing) and the lyrics output end timing (S210).

Thereafter, the data correction process ends.
[Effect of the first embodiment]
As described above, according to such data correction processing, the lyrics output timing in the lyrics output data DO is made to coincide with the performance start timing of the musical sound constituting the target music along the time axis. It can be corrected.

Moreover, the data correction process of the present embodiment does not need to execute any speech recognition processing when correcting the lyrics output timing (that is, when defining the corrected lyrics output timing). The total amount of processing required to identify the performance start timing of the musical sound with which the lyrics output timing of each lyrics constituent character should be associated can therefore be reduced compared with the device described in Patent Document 1.

In other words, the lyrics output data correction device of the present invention can associate the lyrics output timing of each lyrics constituent character with the performance start timing of the musical sound corresponding to that character by a simpler method.

In particular, the data correction process of the present embodiment executes the pitch correction process before the time shift amount derivation process, generates corrected music MIDI data DM in which the pitches of the output sounds have been corrected to match the pitches of the musical sounds, and then acquires the output sound transition waveform from that corrected music MIDI data DM. As a result, the pitch discrepancy between the output sound transition waveform and the musical sound transition waveform is kept to a minimum, which improves the accuracy with which the time shift amount is derived.

The corrected lyrics output data DO generated by the data correction process of the present embodiment is data for making the lyrics output timing of each lyrics constituent character coincide with the performance start timing of the musical sound in the music acoustic data DW. Therefore, in the music data distribution system 1 of the present embodiment, if the lyrics telop data DT and the corrected lyrics output data DO are distributed together with the music acoustic data DW to a portable terminal 5 that has no sound source module, and the portable terminal 5 outputs the lyrics constituent characters in time synchronization with the musical sounds in the music acoustic data DW as it plays that data back, karaoke can be enjoyed on the portable terminal 5 as well.
[Second Embodiment]
Next, a second embodiment of the present invention will be described.

The music data distribution system of the second embodiment differs from the music data distribution system 1 of the first embodiment only in the structure of the lyrics output data DO and in the content of the data correction process executed by the information processing apparatus 20. In the second embodiment, therefore, components identical to those of the music data distribution system 1 of the first embodiment are given the same reference signs and their description is omitted; the description focuses on the structure of the lyrics output data DO and the content of the data correction process executed by the information processing apparatus 20.
<About lyrics output data DO>
Like the lyrics output data DO of the first embodiment, the lyrics output data DO of the present embodiment defines a timing correspondence that associates the lyrics output timing of each lyrics constituent character with the performance of the music MIDI data DM.

Specifically, in the timing correspondence of the present embodiment, as shown in FIG. 7, the lyrics output timing of each lyrics constituent character is associated with the performance start timing of the output sound corresponding to that character. In addition, as also shown in FIG. 7, the lyrics output end timing of each lyrics constituent character is associated with the performance end timing of the output sound corresponding to that character.
<About data correction processing>
Next, the data correction process of the present embodiment will be described.

FIG. 8 is a flowchart showing the procedure of the data correction process in the present embodiment.
Execution of this data correction process starts when an activation command for starting it is input via the input receiving unit 22.

As shown in FIG. 8, when the data correction process is started, the music MIDI data DM corresponding to the target music is acquired from the music data storage server 3 (S710).
Next, the music acoustic data DW corresponding to the target music is acquired from the music data storage server 3 (S730), and the musical sound transition waveform of that music acoustic data DW is acquired from it (S740).

The pitch correction process is then executed (S750). This process is the same as the pitch correction process of the first embodiment (S150), so a detailed description is omitted.
The time shift amount derivation process is executed next (S770). This process is the same as the time shift amount derivation process of the first embodiment (S170), so a detailed description is omitted.

Next, the difference between the performance start timing of each output sound in the music MIDI data DM corrected in S770 and the performance start timing of that output sound in the uncorrected music MIDI data DM acquired in S710 is derived (S790). Specifically, in S790 of the present embodiment, the differences (hereinafter, the onset difference time dOnset and the offset difference time dOffset) are derived for each output sound based on equation (1) below.

In equation (1), aOnset is the performance start timing of the output sound in the corrected music MIDI data DM, and bOnset is the performance start timing of the output sound in the music MIDI data DM before correction. Likewise, aOffset is the performance end timing of the output sound in the corrected music MIDI data DM, and bOffset is the performance end timing of the output sound in the music MIDI data DM before correction.
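Equation (1) itself does not survive in this text. The sketch below is one reading consistent with the definitions above, taking each difference as the corrected timing minus the uncorrected timing. That sign convention is an assumption, and `Note` and `difference_times` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Note:
    onset: float   # performance start timing of an output sound (seconds)
    offset: float  # performance end timing of an output sound (seconds)


def difference_times(
    corrected: Sequence[Note], original: Sequence[Note]
) -> List[Tuple[float, float]]:
    """Return (dOnset, dOffset) for each output sound.

    Assumed reading of equation (1):
        dOnset  = aOnset  - bOnset
        dOffset = aOffset - bOffset
    where a* come from the corrected data and b* from the data before
    correction.
    """
    if len(corrected) != len(original):
        raise ValueError("both note lists must describe the same output sounds")
    return [
        (a.onset - b.onset, a.offset - b.offset)
        for a, b in zip(corrected, original)
    ]
```

With this convention a positive dOnset means that the corrected note starts later than the note in the original score data.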

That is, in S790, the onset difference time dOnset and the offset difference time dOffset are derived for each output sound.
Next, the lyrics output data DO for the target music is acquired from the music data storage server 3, and the lyrics output timing of each lyrics constituent character in that lyrics output data DO is acquired (S800).

Then, in accordance with the onset difference time dOnset and the offset difference time dOffset derived in S790, the lyrics output timing of each lyrics constituent character in the lyrics output data DO acquired in S800 is corrected so that it coincides with the performance start timing of the musical sound in the music acoustic data DW (S810).

Specifically, in S810 of the present embodiment, the corrected lyrics output timing mOnset and the corrected lyrics output end timing mOffset are derived for each lyrics constituent character based on equation (2) below.

In equation (2), lOnset is the lyrics output timing of the lyrics constituent character in the lyrics output data DO, and lOffset is the lyrics output end timing of the lyrics constituent character in the lyrics output data DO.

That is, in S810, the lyrics output timing and the lyrics output end timing of each lyrics constituent character are corrected by shifting them by the onset difference time dOnset and the offset difference time dOffset, respectively, so that they coincide with the performance start timing of the corresponding musical sound in the music acoustic data DW.
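Equation (2) is likewise missing from this text. The sketch below assumes that it adds the difference times to the original lyric timings, that is, mOnset = lOnset + dOnset and mOffset = lOffset + dOffset, with each difference taken as corrected minus uncorrected. It further assumes that lyric character i corresponds to output sound i. `LyricTiming` and `shift_lyrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LyricTiming:
    char: str      # one lyrics constituent character
    onset: float   # lyrics output timing (lOnset), seconds
    offset: float  # lyrics output end timing (lOffset), seconds


def shift_lyrics(
    lyrics: Sequence[LyricTiming],
    diffs: Sequence[Tuple[float, float]],
) -> List[LyricTiming]:
    """Shift each character by the (dOnset, dOffset) of its output sound.

    Assumed reading of equation (2):
        mOnset  = lOnset  + dOnset
        mOffset = lOffset + dOffset
    The index-aligned pairing of characters and output sounds is an
    assumption made for this sketch.
    """
    if len(lyrics) != len(diffs):
        raise ValueError("expected one (dOnset, dOffset) pair per character")
    return [
        LyricTiming(l.char, l.onset + d_on, l.offset + d_off)
        for l, (d_on, d_off) in zip(lyrics, diffs)
    ]
```

Because the lyric timings are already tied to note onsets and offsets in this embodiment, the correction reduces to one addition per character and needs no search.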

Lyrics output data (that is, corrected lyrics output data) defining the lyrics output timing corrected in S810 (that is, the corrected lyrics output timing) and the lyrics output end timing is then generated (S820).

The data correction process then ends.
[Effects of the Second Embodiment]
The data correction process of the present embodiment provides the same effects as the data correction process of the first embodiment.

In particular, when lyrics output data DO structured as in the present embodiment is corrected, there is no need to associate the lyrics output timing of each lyrics constituent character with the performance start timing of an output sound, so the corrected lyrics output data DO can be generated easily.
[Other Embodiments]
Embodiments of the present invention have been described above, but the present invention is not limited to them and can be implemented in various forms without departing from the gist of the invention.

For example, in S810 of the data correction process of the second embodiment, the lyrics output timing and the lyrics output end timing of each lyrics constituent character are corrected by shifting them by the onset difference time dOnset and the offset difference time dOffset. The method of correcting these timings is not limited to this, however, and the method of correcting the lyrics output data used in the data correction process of the first embodiment may be applied instead.

In the data correction process of the first embodiment, the lyrics output data DO corresponding to the target music may be acquired from the music data storage server 3, the performance start timing of each output sound defined in the music MIDI data DM may be associated with the lyrics output timing of the lyrics constituent character defined in the lyrics output data DO, and the lyrics output timing associated with that performance start timing may then be corrected. In this case, the lyrics output end timing may also be associated with the performance end timing of the output sound estimated to correspond to each lyrics constituent character.

As a method for this association, a new track of the music MIDI data DM (hereinafter, the lyrics output track) is generated in which the lyrics output timing of each lyrics constituent character, identified from the color-change speed, is set as the performance start timing of the output sound estimated to correspond to that character.

The lyrics output timing (or lyrics output end timing) of the lyrics constituent characters in the data correction process of the first embodiment may also be corrected by shifting, for each constant-tempo section, the lyrics output timing (or lyrics output end timing) of the lyrics constituent characters contained in that section by the onset difference time dOnset or the offset difference time dOffset.

In the data correction process of the above embodiments (here, both the first and second embodiments), both the pitch correction process and the time shift amount derivation process are executed. The data correction process may, however, execute only the time shift amount derivation process.
[Correspondence between the Embodiments and the Claims]
Finally, the relationship between the description of the above embodiments and the claims is explained.

S140 and S740 of the data correction process of the above embodiments correspond to the musical sound transition acquisition means in the claims, S510 of the time shift amount derivation process corresponds to the output sound transition acquisition means, and S190 and S800 of the data correction process correspond to the lyrics output data acquisition means. Further, S520 to S620 of the time shift amount derivation process correspond to the time shift amount derivation means, and S200 and S810 of the data correction process correspond to the timing correction means.

In the time shift amount derivation process, S520 and S540 to S560 correspond to the output sound change deriving means, S530 to S560 correspond to the musical sound change deriving means, S570 to S610 correspond to the time correlation deriving means, and S620 corresponds to the time correction amount deriving means. In the pitch correction process, S320 to S430 correspond to the pitch correction amount deriving means, and S440 corresponds to the score data correction means. Of these, S360 to S390 correspond to the musical sound distribution deriving means, S320 to S350 correspond to the output sound distribution deriving means, and S400 to S420 correspond to the pitch correlation deriving means.

Further, S630 of the time shift amount derivation process corresponds to the performance start timing correction means.

DESCRIPTION OF SYMBOLS: 1 … music data distribution system, 3 … music data storage server, 5 … portable terminal, 6 … information reception unit, 7 … display unit, 8 … sound output unit, 9 … communication unit, 10 … storage unit, 11 … control unit, 20 … information processing apparatus, 21 … communication unit, 22 … input receiving unit, 23 … display unit, 24 … audio input unit, 25 … audio output unit, 26 … sound source module, 27 … storage unit, 30 … control unit, 31 … ROM, 32 … RAM, 33 … CPU

Claims

1. A lyrics output data correction device comprising:
musical sound transition acquisition means for acquiring a musical sound transition waveform representing how the sound pressure of the musical sounds constituting a target music changes along the time axis;
output sound transition acquisition means for acquiring, based on score data that represents the score of a piece simulating the target music and that defines at least a pitch and a performance start timing for each output sound output from a sound source module, an output sound transition waveform representing how the sound pressure of the output sounds changes along the time axis;
lyrics output data acquisition means for acquiring lyrics output data, which is data defining a lyrics output timing that is the output timing of each lyrics constituent character constituting the lyrics of the target music, and in which the lyrics output timing of at least one of the lyrics constituent characters is associated with a specific start timing that is at least one timing defined for the score data;
time shift amount derivation means for deriving, based on a result of comparing musical sound information, which is extracted from the musical sound transition waveform acquired by the musical sound transition acquisition means and represents characteristics of that waveform, with output sound information, which is extracted from the output sound transition waveform acquired by the output sound transition acquisition means and represents characteristics of that waveform, a time shift amount representing the amount by which the performance start timing of the musical sound corresponding to each output sound is shifted relative to the performance start timing of that output sound; and
timing correction means for defining a corrected lyrics output timing obtained by correcting the lyrics output timing in the lyrics output data acquired by the lyrics output data acquisition means, in accordance with the time shift amount derived by the time shift amount derivation means, so that it coincides with the performance start timing of the musical sound,
wherein the time shift amount derivation means includes:
musical sound change deriving means for extracting, from the musical sound transition waveform acquired by the musical sound transition acquisition means, a musical sound non-harmonic component that is a non-harmonic component of the musical sound transition waveform, and deriving, as the musical sound information, a musical sound change representing the change of the musical sound non-harmonic component along the time axis;
output sound change deriving means for extracting, from the output sound transition waveform acquired by the output sound transition acquisition means, an output sound non-harmonic component that is a non-harmonic component of the output sound transition waveform, and deriving, as the output sound information, an output sound change representing the change of the output sound non-harmonic component along the time axis;
time correlation deriving means for deriving a time correlation value, which represents a correlation value between the musical sound change derived by the musical sound change deriving means and the output sound change derived by the output sound change deriving means, each time the output sound change is expanded or contracted along the time axis with a set position set on the time axis of the output sound change aligned with a reference position defined on the time axis of the musical sound change, while sequentially moving the set position along the time axis within a specified range; and
time correction amount deriving means for deriving, as a time correction amount, the expansion/contraction rate of the output sound change and the set position that correspond to the time correlation value having the maximum value among the time correlation values derived by the time correlation deriving means,
and wherein the time correction amount derived by the time correction amount deriving means is used as the time shift amount.

2. The lyrics output data correction device according to claim 1, further comprising:
pitch correction amount deriving means for deriving, based on a result of comparing one item of the musical sound information with one item of the output sound information, a pitch correction amount such that the pitch of each output sound matches the pitch of the musical sound corresponding to that output sound; and
score data correction means for generating corrected score data in which the score data is corrected by shifting the pitch of each of the output sounds defined in the score data in accordance with the pitch correction amount derived by the pitch correction amount deriving means,
wherein the time shift amount derivation means uses, as the output sound transition waveform acquired by the output sound transition acquisition means, a corrected sound transition waveform that is the output sound transition waveform based on the corrected score data generated by the score data correction means.

3. The lyrics output data correction device according to claim 2, wherein the pitch correction amount deriving means includes:
musical sound distribution deriving means for deriving, as one item of the musical sound information, a musical sound pitch distribution that represents the frequencies contained in the musical sound transition waveform and the intensity of each frequency, normalized with respect to the intensity;
output sound distribution deriving means for deriving, as one item of the output sound information, an output pitch distribution that represents the frequencies contained in the output sound transition waveform and the intensity of each frequency, normalized with respect to the intensity; and
pitch correlation deriving means for deriving a pitch correlation value, which represents a correlation value between the output pitch distribution derived by the output sound distribution deriving means and the musical sound pitch distribution derived by the musical sound distribution deriving means, each time the output pitch distribution is shifted along the frequency axis from a specified position defined in advance for the musical sound pitch distribution,
and wherein the shift amount along the frequency axis from the specified position that corresponds to the pitch correlation value having the maximum value among the pitch correlation values derived by the pitch correlation deriving means is derived as the pitch correction amount.

4. The lyrics output data correction device according to any one of claims 1 to 3, wherein
in the lyrics output data, the lyrics output timing of at least some of the lyrics constituent characters is defined by the elapsed time from the specific start timing, and
the timing correction means defines the corrected lyrics output timing for the lyrics output timing of each lyrics constituent character whose lyrics output timing is defined by the elapsed time.

5. The lyrics output data correction device according to any one of claims 1 to 3, wherein
the performance start timing of at least some of the output sounds is defined as the specific start timing,
in the lyrics output data, the lyrics output timing of each of the lyrics constituent characters is associated with the specific start timing corresponding to that lyrics constituent character, and
the timing correction means defines the corrected lyrics output timing for each of the lyrics constituent characters.

6. The lyrics output data correction device according to claim 5, further comprising performance start timing correction means for deriving a corrected performance start timing obtained by shifting the performance start timing of the output sound by the time shift amount,
wherein the timing correction means uses the corrected performance start timing derived by the performance start timing correction means as the corrected lyrics output timing.

7. The lyrics output data correction device according to claim 4 or 5, further comprising performance start timing correction means for deriving a corrected performance start timing obtained by shifting the performance start timing of the output sound by the time shift amount,
wherein the timing correction means defines the corrected lyrics output timing by shifting the lyrics output timing by the difference between the corrected performance start timing derived by the performance start timing correction means and the performance start timing of the output sound.

8. A program causing a computer to execute:
a musical sound transition acquisition procedure for acquiring a musical sound transition waveform representing how the sound pressure of the musical sounds constituting a target music changes along the time axis;
an output sound transition acquisition procedure for acquiring, based on score data that represents the score of a piece simulating the target music and that defines at least a pitch and a performance start timing for each output sound output from a sound source module, an output sound transition waveform representing how the sound pressure of the output sounds changes along the time axis;
a lyrics output data acquisition procedure for acquiring lyrics output data, which is data defining a lyrics output timing that is the output timing of each lyrics constituent character constituting the lyrics of the target music, and in which the lyrics output timing of at least one of the lyrics constituent characters is associated with a specific start timing that is at least one timing defined for the score data;
a time shift amount derivation procedure for deriving, based on a result of comparing musical sound information, which is extracted from the musical sound transition waveform acquired in the musical sound transition acquisition procedure and represents characteristics of that waveform, with output sound information, which is extracted from the output sound transition waveform acquired in the output sound transition acquisition procedure and represents characteristics of that waveform, a time shift amount representing the amount by which the performance start timing of the musical sound corresponding to each output sound is shifted relative to the performance start timing of that output sound; and
a timing correction procedure for defining a corrected lyrics output timing obtained by correcting the lyrics output timing in the lyrics output data acquired in the lyrics output data acquisition procedure, in accordance with the time shift amount derived in the time shift amount derivation procedure, so that it coincides with the performance start timing of the musical sound,
wherein the time shift amount derivation procedure causes the computer to execute:
a musical sound change derivation procedure for extracting, from the musical sound transition waveform acquired in the musical sound transition acquisition procedure, a musical sound non-harmonic component that is a non-harmonic component of the musical sound transition waveform, and deriving, as the musical sound information, a musical sound change representing the change of the musical sound non-harmonic component along the time axis;
an output sound change derivation procedure for extracting, from the output sound transition waveform acquired in the output sound transition acquisition procedure, an output sound non-harmonic component that is a non-harmonic component of the output sound transition waveform, and deriving, as the output sound information, an output sound change representing the change of the output sound non-harmonic component along the time axis;
a time correlation derivation procedure for deriving a time correlation value, which represents a correlation value between the musical sound change derived in the musical sound change derivation procedure and the output sound change derived in the output sound change derivation procedure, each time the output sound change is expanded or contracted along the time axis with a set position set on the time axis of the output sound change aligned with a reference position defined on the time axis of the musical sound change, while sequentially moving the set position along the time axis within a specified range; and
a time correction amount derivation procedure for deriving, as a time correction amount, the expansion/contraction rate of the output sound change and the set position that correspond to the time correlation value having the maximum value among the time correlation values derived in the time correlation derivation procedure,
and wherein the time correction amount derived in the time correction amount derivation procedure is used as the time shift amount.