JP2011209592A

JP2011209592A - Musical instrument sound separation device and program

Info

Publication number: JP2011209592A
Application number: JP2010078665A
Authority: JP
Inventors: Noriaki Asemi; 典昭阿瀬見; Seiji Kurokawa; 誠司黒川
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2010-03-30
Filing date: 2010-03-30
Publication date: 2011-10-20
Anticipated expiration: 2030-03-30
Also published as: JP5267495B2

Abstract

PROBLEM TO BE SOLVED: To provide a musical instrument sound separation device and a program that improve accuracy in separating and extracting a musical instrument sound from a mixed sound. SOLUTION: In sound source separation processing, a corresponding range is specified for one specific sound, the corresponding range being the range in which the correlation value between a spectrum amplitude value ntsp(twi, fi) derived from the specific sound and a spectrum amplitude value tusp(twi, fi) is at its maximum. An amplitude ratio kr is derived in the specified corresponding range, and the derived amplitude ratio kr is multiplied by the spectrum amplitude value ntsp(twi, fi), thereby deriving a separation spectrum that expresses the complex spectrum of the musical sound in the musical instrument sound transition corresponding to the specific sound (S310). An inverse Fourier transform of the separation spectrum is performed to derive a section transition (S320). The musical instrument sound transition trwf is then updated by replacing the corresponding range of the musical instrument sound transition trwf with the derived section transition (S330).

Description

The present invention relates to a musical instrument sound separation device and a program for separating an instrument sound from a mixed sound in which a plurality of sounds are superimposed.

Conventionally, a sound source separation device is known that removes the instrument sounds of a piece of music from a mixed sound in which the instrument sounds played in the piece (for example, a piece played as BGM) are superimposed on a main sound such as a voice or ambient noise. The device uses a known musical sound waveform in which the sound pressure of the musical sounds constituting the piece varies along the time axis (see Patent Document 1).

The sound source separation device described in Patent Document 1 assumes that the instrument sound waveform, in which the sound pressure of the instrument sounds of the piece contained in the mixed sound varies along the time axis, coincides with the known musical sound waveform along the time axis (that is, the waveforms are congruent). The device takes the timing at which the difference between the sound pressure of the mixed sound and the sound pressure of the known musical sound waveform is smallest as the start position of both the instrument sound waveform and the known musical sound waveform, and subtracts the entire known musical sound waveform from the mixed sound, thereby removing all of the piece's instrument sounds from the mixed sound.

Japanese Patent No. 4274418

Even for the same piece of music, however, the output timing and pitch of each musical sound constituting the piece often differ from performer to performer. Moreover, even with the same performer, the output timing and pitch of each musical sound may be arranged differently each time the piece is played.

When the output timing and pitch of each musical sound differ each time the piece is played, the instrument sound waveform of the piece contained in the mixed sound is highly likely to deviate from the known musical sound waveform (that is, the instrument sound waveform is not congruent with the known musical sound waveform). Consequently, the instrument sound that the sound source separation device of Patent Document 1 separates and extracts from the mixed sound differs from the musical sound actually played in the sections where the output timing or pitch differs.

That is, the sound source separation device described in Patent Document 1 has the problem that its accuracy in separating and extracting instrument sounds from a mixed sound is low.
Therefore, an object of the present invention is to provide a musical instrument sound separation device and a program that improve accuracy in separating and extracting instrument sounds from a mixed sound.

The musical instrument sound separation device of the present invention, made to achieve the above object, includes musical sound acquisition means, musical sound analysis means, specific sound acquisition means, specific sound analysis means, range specifying means, amplitude ratio deriving means, section transition deriving means, and instrument sound separation means.

In the musical instrument sound separation device of the present invention, the musical sound acquisition means acquires a musical sound transition, in which the sound pressure of the musical sounds constituting a piece of music varies along the time axis. The musical sound analysis means derives a musical sound spectrogram, in which frequency spectra representing the frequencies contained in the acquired musical sound transition and the strength of each frequency are arranged along the time axis. Based on performance data representing the score of the piece as played by a sound source module that outputs specific sounds simulating the instrument sounds of at least one type of instrument, the specific sound acquisition means acquires a specific sound transition, in which the sound pressure of the specific sounds of a target instrument (one specified type of instrument) varies along the time axis. The specific sound analysis means then derives, for each analysis section, a specific sound spectrogram in which frequency spectra representing the frequencies contained in the acquired specific sound transition and the strength of each frequency are arranged along the time axis. Each analysis section is the time length over which one specific sound of the target instrument is played by the sound source module.

Further, in the musical instrument sound separation device of the present invention, the range specifying means collates each specific sound spectrogram derived by the specific sound analysis means with the musical sound spectrogram derived by the musical sound analysis means, and specifies a corresponding range, which is the range of the musical sound spectrogram that the specific sound spectrogram matches best along the frequency axis and the time axis. For each specified corresponding range, the amplitude ratio deriving means derives, for each frequency, an amplitude ratio representing the ratio between the frequency strength of the musical sound spectrogram and the frequency strength of the specific sound spectrogram. The section transition deriving means then derives a section transition, which is a transition of sound pressure along the time axis, from a separation spectrum obtained by multiplying each derived amplitude ratio by the strength of each frequency of the frequency spectra constituting the musical sound spectrogram corresponding to the analysis section. The instrument sound separation means arranges the derived section transitions along the time axis of the piece, thereby generating an instrument sound transition, in which the sound pressure of the instrument sound of the target instrument within the musical sound transition varies along the time axis.

According to this musical instrument sound separation device, an instrument sound transition can be generated (extracted) from a musical sound transition.
The corresponding range specified by the device is the range in which the musical sound spectrogram and the specific sound spectrogram match best along both the frequency axis and the time axis. Therefore, even if the musical sound transition acquired by the musical sound acquisition means has been arranged so that the output timing or pitch of each musical sound differs from the specific sound transition in the performance data, the range corresponding to the output timing and pitch of the arranged musical sound can be specified as the corresponding range.

Therefore, according to the musical instrument sound separation device of the present invention, the instrument sound transition generated from the musical sound transition can be brought close to the instrument sound actually played in the piece. As a result, the accuracy of generating the instrument sound transition from the musical sound transition can be improved.
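The range specification and amplitude ratio derivation described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `find_corresponding_range` and `amplitude_ratio`, the use of normalized cross-correlation as the matching score, and the way the ratio is averaged over time are all assumptions made for illustration; the text only states that the best-matching range is found along both axes and that a ratio of frequency strengths is derived for each frequency.

```python
import numpy as np

def find_corresponding_range(tu_mag, nt_mag):
    """Slide nt_mag (specific sound, time x freq) over tu_mag (musical sound)
    and return the (time, freq) offset with the highest normalized correlation.
    Normalized correlation is an assumed matching score."""
    T, F = tu_mag.shape
    t, f = nt_mag.shape
    nt_c = nt_mag - nt_mag.mean()
    nt_norm = np.linalg.norm(nt_c)
    best, best_score = (0, 0), -np.inf
    for dt in range(T - t + 1):
        for df in range(F - f + 1):
            patch = tu_mag[dt:dt + t, df:df + f]
            p_c = patch - patch.mean()
            denom = np.linalg.norm(p_c) * nt_norm
            score = (p_c * nt_c).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best, best_score = (dt, df), score
    return best, best_score

def amplitude_ratio(tu_patch, nt_mag, eps=1e-12):
    """Per-frequency ratio of musical sound strength to specific sound strength
    inside the corresponding range (summed over time; an assumed definition)."""
    return tu_patch.sum(axis=0) / (nt_mag.sum(axis=0) + eps)

# Synthetic check: hide a scaled copy of the specific sound in a quiet mixture.
rng = np.random.default_rng(0)
nt_mag = rng.random((4, 6)) + 0.5          # specific sound spectrogram
tu_mag = 0.01 * rng.random((20, 30))       # musical sound spectrogram (background)
tu_mag[3:7, 5:11] += 2.0 * nt_mag          # shifted in time and frequency, twice as loud

(dt, df), score = find_corresponding_range(tu_mag, nt_mag)
kr = amplitude_ratio(tu_mag[dt:dt + 4, df:df + 6], nt_mag)
```

In this synthetic case the search recovers the offset at which the copy was embedded, and every element of `kr` is close to the scale factor 2. The exhaustive double loop is chosen for readability only.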

The performance data may define a sound generation timing, which represents the timing at which the sound source module starts outputting a specific sound, and an end timing, which corresponds to that sound generation timing and represents the timing at which output of the specific sound ends. In this case, as described in claim 2, the specific sound analysis means of the device may define each period from a sound generation timing to the corresponding end timing as an analysis section.

According to this device, a section transition can be derived for each period from a sound generation timing to its end timing, that is, for each note in the performance data. As a result, even if only a single note of the piece has been arranged in the musical sound transition, the instrument sound transition can be generated accurately.

Further, as described in claim 3, the section of the musical sound spectrogram against which the range specifying means collates a specific sound spectrogram may be a section whose start and end bracket, along the time axis, the period of the musical sound transition that corresponds to the period from the sound generation timing to the corresponding end timing. Such a section contains the section predicted to have a high degree of coincidence with the specific sound spectrogram.

Therefore, according to the musical instrument sound separation device of the present invention, the amount of processing required to collate the specific sound spectrogram with the musical sound spectrogram and specify the corresponding section, and consequently to generate the instrument sound transition, can be reduced.
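As a small illustration of the restricted search section described above, the following Python sketch widens the note period by a margin on both sides so that the section's start and end bracket the note. The function name `search_section` and the `margin` parameter are hypothetical; the text does not say how far the section extends beyond the note.

```python
def search_section(on, off, margin, total):
    """Return (start, end) in seconds of the section to search.

    on, off : sound generation timing and end timing of the note
    margin  : assumed extra time searched before and after the note
    total   : length of the musical sound transition
    """
    start = max(0.0, on - margin)    # never search before the start of the piece
    end = min(total, off + margin)   # never search past the end of the piece
    return start, end

section = search_section(on=1.0, off=2.0, margin=0.25, total=10.0)
```

Only the spectrogram frames inside `section` would then need to be collated with the specific sound spectrogram, which is where the reduction in processing comes from.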

In the musical instrument sound separation device of the present invention, as described in claim 4, the range specifying means may specify the corresponding range each time the specific sound analysis means derives a specific sound spectrogram, the amplitude ratio deriving means may derive the amplitude ratio each time the range specifying means specifies a corresponding range, and the section transition deriving means may derive the section transition each time the amplitude ratio deriving means derives an amplitude ratio.

According to this device, the instrument sound transition can be generated by a single series of processes along the time axis of the musical sound transition and the specific sound transition, without executing the processes iteratively.
As a result, the amount of processing required to generate the instrument sound transition can be reduced, and consequently the time required to generate the instrument sound transition can be shortened.

In the specific sound acquisition means of the device, as described in claim 5, each instrument corresponding to a specific sound that the sound source module outputs according to the performance data may be defined as a target instrument.

According to this device, the instrument sound transitions of all instruments output by the sound source module according to the performance data can be generated (separated and extracted) from the musical sound transition.
Further, as described in claim 6, storage control means may derive a residual musical sound transition by subtracting the section transition derived by the section transition deriving means from the musical sound transition, and store the derived residual musical sound transition in a storage device. Transition deriving means may sequentially derive the section transitions for each instrument corresponding to the specific sounds that the sound source module outputs according to the performance data. Each time the transition deriving means derives a section transition, updating means may subtract that section transition from the residual musical sound transition stored in the storage device, thereby updating the stored residual musical sound transition.

According to this device, when a musical sound transition containing a singing voice is acquired, subtracting the section transitions (that is, the instrument sound transitions) of all instrument sounds from the musical sound transition leaves the transition of the sound pressure of the singing voice along the time axis. In other words, the device can extract the transition of the sound pressure of the singing voice in the piece.
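The residual update described above can be sketched in a few lines of Python. The function name `update_residual` and the representation of each transition as a NumPy array of samples are assumptions made for illustration; the example also assumes ideal section transitions, which a real separation would only approximate.

```python
import numpy as np

def update_residual(residual, section, start):
    """Subtract one section transition from the residual musical sound transition.

    residual : 1-D array, current residual waveform
    section  : 1-D array, derived section transition of one note
    start    : sample index (>= 0) at which the section begins
    """
    out = residual.copy()
    if start >= len(out):
        return out                          # section lies entirely past the end
    end = min(start + len(section), len(out))
    out[start:end] -= section[: end - start]
    return out

# Toy mixture: a "singing voice" plus two instrument notes.
vocal = np.array([0.1, -0.2, 0.3, 0.0, 0.5])
piano = np.array([1.0, 2.0])
bass = np.array([0.5, 0.5, 0.5])

music = vocal.copy()
music[1:3] += piano
music[2:5] += bass

residual = update_residual(music, piano, start=1)
residual = update_residual(residual, bass, start=2)
```

After every instrument's section transitions have been subtracted, `residual` holds only the vocal samples in this idealized case.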

The present invention may also be a program for causing a computer to function as a musical instrument sound separation device.
When the present invention is embodied as such a program, the program must, as described in claim 8, cause the computer to execute: a musical sound acquisition procedure for acquiring a musical sound transition; a musical sound analysis procedure for deriving a musical sound spectrogram from the acquired musical sound transition; a specific sound acquisition procedure for acquiring a specific sound transition based on performance data; a specific sound analysis procedure for deriving a specific sound spectrogram for each analysis section from the acquired specific sound transition; a range specifying procedure for collating each derived specific sound spectrogram with the musical sound spectrogram and specifying a corresponding range; an amplitude ratio deriving procedure for deriving, for each frequency, the amplitude ratio in each specified corresponding range; a section transition deriving procedure for deriving a section transition, which is a transition of sound pressure along the time axis, from a separation spectrum obtained by multiplying each derived amplitude ratio by the strength of each frequency of the frequency spectra constituting the musical sound spectrogram corresponding to the analysis section; and an instrument sound separation procedure for generating an instrument sound transition by arranging the derived section transitions along the time axis of the piece.

If the program of the present invention is configured in this way, it can be used, for example, by recording it on a computer-readable recording medium such as a DVD-ROM, a CD-ROM, or a hard disk and loading it into a computer and starting it as needed, or by having a computer acquire it via a communication line and start it as needed. By causing the computer to execute each procedure, the computer can be made to function as the musical instrument sound separation device described in claim 1.

A block diagram showing the schematic configuration of the musical instrument sound separation device in the embodiment.
An explanatory diagram showing an outline of the musical sound transition and the data structure of the score data.
A flowchart showing the procedure of the sound source separation process executed by the control unit of the sound source separation device.
An explanatory diagram showing an outline of the musical sound spectrogram and the specific sound transition derived in the sound source separation process.
An explanatory diagram showing an outline of the specific sound spectrogram derived in the sound source separation process and of the method of correcting the time-frequency shift.
A flowchart showing the procedure of the instrument sound separation process executed by the control unit in the first embodiment.
An explanatory diagram showing an outline of the method of deriving the separation spectrum in the instrument sound separation process.
An explanatory diagram showing an outline of the method of updating the instrument sound waveform in the instrument sound separation process.
A flowchart showing the procedure of the instrument sound separation process executed by the control unit in the second embodiment.

Embodiments of the present invention will be described below with reference to the drawings.
[First embodiment]
The musical instrument sound separation device to which the present invention is applied separates and extracts an instrument sound transition trwf, in which the sound pressure of one type of instrument sound varies along the time axis, from acoustic data in which the sound pressure of a musical sound, formed by superimposing the instrument sounds of a plurality of types of instruments played in a piece of music, varies along the time axis. The device is constituted by the information processing apparatus 10 shown in FIG. 1.

<Configuration of the musical instrument sound separation device>
As shown in FIG. 1, the information processing apparatus 10 includes a communication unit 11, an acoustic data reading unit 12, an input receiving unit 13, a display unit 14, an audio input unit 15, an audio output unit 16, a sound source module 17, a storage unit 18, and a control unit 20.

Of these, the communication unit 11 connects the information processing apparatus 10 to a network (for example, a dedicated line or a WAN) and communicates with the outside via the connected network.
The acoustic data reading unit 12 is a device (for example, a CD or DVD reader) that sequentially reads, along the time axis, the piece of music corresponding to acoustic data stored on a storage medium. The acoustic data is generated by sampling an analog waveform in which the sound pressure of the musical sound varies along the time axis, as shown in FIG. 2(A).

The input receiving unit 13 is an input device (for example, a keyboard or a pointing device) that receives input of information and commands in accordance with external operations. The display unit 14 is a display device (for example, a liquid crystal display or a CRT) that displays images. The audio input unit 15 is a device (a so-called microphone) that converts sound into an electrical signal and inputs the signal to the control unit 20. The audio output unit 16 is a device (a so-called speaker) that converts an electrical signal from the control unit 20 into sound and outputs the sound.

The sound source module 17 is a device that, based on performance data representing the score of a piece of music, outputs sounds (hereinafter, specific sounds) simulating the instrument sounds of instruments registered in advance (hereinafter, simulated instruments). In the present embodiment, it is constituted by a well-known MIDI (Musical Instrument Digital Interface) sound source. In general, the simulated instruments include at least keyboard instruments (for example, piano and pipe organ), stringed instruments (for example, violin, viola, guitar, and koto), percussion instruments (for example, drums, cymbals, timpani, and xylophone), and wind instruments (for example, clarinet, trumpet, flute, and shakuhachi).

The storage unit 18 is a nonvolatile storage device (for example, a hard disk device) whose stored contents can be read and written. The storage unit 18 stores at least the processing programs and the performance data.

<Structure of the performance data>
The performance data is data expressed in accordance with the well-known MIDI standard, and has at least identification data, which distinguishes pieces of music, and score data, which represents the score of the piece as played by the sound source module 17.

Of these, the score data is prepared for each type of simulated instrument played in the piece, and an index number mti (mti = 1 to MTN) is assigned according to the type of simulated instrument. As shown in FIG. 2(B), each set of score data is a sequence of notes NO_ni, each representing the period during which the sound source module 17 outputs a specific sound (hereinafter, the note length) and the pitch of that specific sound (the note number in FIG. 2) NN_ni. The score data further contains, for each note NO_ni, the strength of the corresponding specific sound output by the sound source module 17 (so-called attack, velocity, decay, and the like).

The note length in the score data is defined by a sound generation timing (the note-on timing in FIG. 2) ON_ni, which represents the time from the start of the performance of the piece until output of the specific sound starts, and an end timing (the note-off timing in FIG. 2) OFF_ni, which represents the time from the start of the performance until output of the specific sound ends. That is, the note length of the note NO_ni is the time length from the sound generation timing ON_ni to the end timing OFF_ni.
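The note structure described above can be modeled with a small Python data class. This is a sketch for illustration only: the class name `Note`, its field names, the use of seconds as the time unit, and the single `velocity` field standing in for the strength information are assumptions, not the MIDI file layout or the patent's data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    """One note NO_ni of the score data for a single simulated instrument."""
    ni: int            # index: position of the note counted from the start of the piece
    note_number: int   # pitch NN_ni (MIDI note number, 0 to 127)
    on: float          # sound generation timing ON_ni, seconds from the start
    off: float         # end timing OFF_ni, seconds from the start
    velocity: int = 64 # assumed stand-in for the strength of the specific sound

    @property
    def length(self) -> float:
        """Note length: time from ON_ni to OFF_ni."""
        return self.off - self.on

# Score data for one instrument is then simply a list of notes in playing order.
score = [
    Note(ni=1, note_number=60, on=0.5, off=1.25),
    Note(ni=2, note_number=64, on=1.25, off=2.0),
]
first_length = score[0].length
```

Each `Note` yields one analysis section when the period from `on` to `off` is used to delimit the specific sound spectrogram.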

In the present embodiment, the subscript ni is an index number indicating the position, counted from the start of the performance of the piece, at which the specific sound corresponding to the note NO_ni is played.
<Configuration of the control unit>
The control unit 20 is built around a well-known computer having at least a ROM 21, a RAM 22, and a CPU 23.

Of these, the ROM 21 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 22 temporarily stores processing programs and data. The CPU 23 executes each process (various calculations) in accordance with the processing programs stored in the ROM 21 or the RAM 22.

In the present embodiment, a processing program that separates and extracts the instrument sound transition trwf from the acoustic data is prepared in advance as a processing program executed by the control unit 20. Hereinafter, the process of separating and extracting the instrument sound transition trwf from the acoustic data is referred to as the sound source separation process.

<Details of the sound source separation process>
Next, the sound source separation process executed by the control unit 20 will be described.
The sound source separation process starts when an activation command for starting it is input via the input receiving unit 13.

As shown in FIG. 3, when the sound source separation process is started, it first acquires the performance data (that is, the score data) corresponding to the piece designated by information input via the input receiving unit 13 (S110; S denotes a step).

Next, a musical sound transition tuwf(ti), which is a waveform in which the sound pressure of the musical sound varies along the time axis, is acquired based on the acoustic data (S120). Specifically, in the present embodiment, the piece read by the acoustic data reading unit 12 is reproduced, and the reproduced sound (that is, the instrument sounds of a plurality of instruments) is output from the audio output unit 16. The sound input via the audio input unit 15 is then sampled to acquire the musical sound transition tuwf(ti). The index ti is the order of sampling along the time axis.

In this embodiment, it is assumed that, before the sound source separation process is started, a storage medium storing acoustic data of the same piece of music as the performance data acquired in S110 has been placed in the acoustic data reading unit 12.

Next, the musical sound transition tuwf acquired in S120 is frequency-analyzed for each analysis time window twi, which has a predetermined time length WL, and the result of the frequency analysis is stored in the RAM 22 (or the storage unit 18) (S130).

The frequency analysis of this embodiment is performed by the well-known discrete Fourier transform (DFT). The DFT is executed repeatedly over the period from the start time to the end time of the musical sound transition tuwf while the analysis time window twi is shifted along the time axis by a shift width WSL, which is a predetermined time length (where the shift width WSL << the time length WL of the analysis time window). As a result of the frequency analysis in S130, the strength of each frequency contained in each analysis time window twi of the musical sound transition tuwf (hereinafter, the spectral amplitude value) tusp(twi, fi) is derived. In this embodiment, the spectral amplitude value tusp is derived for both the real part and the imaginary part. The symbol fi denotes the frequency division (that is, the frequency division derived by the DFT; unit: [bin]).
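The windowed analysis of S130 can be illustrated with a minimal sketch. It assumes that WL and WSL are given in samples and that no window function is applied; the names `dft` and `sliding_dft` are hypothetical and do not come from the specification.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one analysis window."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def sliding_dft(tuwf, wl, wsl):
    """Return tusp[twi][fi], one complex spectrum per analysis window.

    wl  -- window length WL in samples (assumed unit)
    wsl -- shift width WSL in samples (assumed unit), with 0 < wsl <= wl
    """
    if not 0 < wsl <= wl:
        raise ValueError("expected 0 < wsl <= wl")
    starts = range(0, len(tuwf) - wl + 1, wsl)
    return [dft(tuwf[s:s + wl]) for s in starts]
```

Each entry keeps its real and imaginary parts, so the magnitude needed for the spectrogram is simply `abs(tusp[twi][fi])`.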

That is, in this embodiment, a complex (frequency) spectrum is derived by arranging the spectral amplitude values tusp(twi, fi) along a frequency axis expressed on a logarithmic scale. A spectrogram (see FIG. 4A; hereinafter, the musical sound spectrogram) is then derived by arranging, along the time axis, the amplitude spectra obtained by taking the absolute value of the spectral amplitude values tusp(twi, fi) of the complex spectrum. In the musical sound spectrogram shown in FIG. 4A, the magnitude of the spectral amplitude value tusp(twi, fi) is represented by shading.

Then, the index number mti of the musical score data acquired in S110 is set to an initial value (0 in this embodiment) (S140). Next, it is determined whether the currently set index number of the musical score data (hereinafter, the set index) mti is less than the largest index number in the musical score data (hereinafter, the final index) MTN (S150).

If the determination in S150 finds that the set index mti is less than the final index MTN (S150: YES), the set index mti is incremented by one (S160). The instrument sound transition trwf(mti, ti) is then set to its initial value (S170). In this embodiment, the initial value of the instrument sound transition trwf is a zero waveform in which the sound pressure is "0" everywhere along the time axis.

Then, the index number ni of the notes NO in the musical score data corresponding to the set index mti (hereinafter, the note index) is set to an initial value (0 in this embodiment) (S180). Next, it is determined whether the note index ni is less than the largest index number in the musical score data corresponding to the set index mti (hereinafter, the final note) NNPT(mti) (S190).

If the determination in S190 finds that the note index ni is less than the final note NNPT(mti) (S190: YES), the note index ni is incremented by one (S200). The process then acquires the specific sound transition ntwf(ti), a waveform such as that shown in FIG. 4B representing how the sound pressure of the single specific sound corresponding to the note NO_ni with the incremented note index ni (hereinafter, the target note) changes along the time axis (S210). Specifically, in S210 of this embodiment, the sound source module 17 is made to output the specific sound corresponding to the target note NO_ni, and the specific sound transition ntwf(ti) is acquired by receiving that sound via the voice input unit 15.

Then, the acquired specific sound transition ntwf(ti) is frequency-analyzed (S220). In this embodiment, the frequency analysis is performed by the discrete Fourier transform, which is executed repeatedly over the period from the start to the end of the specific sound transition ntwf(ti) (that is, of one specific sound) while the analysis time window twi is shifted along the time axis by the shift width WSL.

As a result of this frequency analysis, S220 derives, for each frequency contained in each analysis time window twi of the specific sound transition ntwf(ti), the strength of that frequency (that is, the spectral amplitude value) ntsp(twi, fi), for both the real part and the imaginary part. That is, in this embodiment, a complex (frequency) spectrum is derived by arranging the spectral amplitude values ntsp(twi, fi) along the frequency axis. A spectrogram (see FIG. 5A; hereinafter, the specific sound spectrogram) is then derived by arranging, along the time axis, the amplitude spectra obtained by taking the absolute value of the spectral amplitude values ntsp of the complex spectrum. In the specific sound spectrogram of FIG. 5A, the magnitude of the spectral amplitude value ntsp(twi, fi) is represented by shading.

Next, the process derives a correction amount in the time-axis direction (hereinafter, a shift amount) dtwi and a shift amount dfi along the frequency axis at which the correlation value between the spectral amplitude values ntsp(twi, fi) derived in S220 and the spectral amplitude values tusp(twi, fi) stored in the RAM 22 (or the storage unit 18) is maximized (S230).

Specifically, in S230 of this embodiment, the analysis time window notwi in the musical sound transition tuwf that corresponds to the sounding timing ON_ni of the target note NO_ni is identified by equation (1). The function round in equation (1) returns the integer obtained by rounding the fractional part half up.

Then, using equation (2), the process identifies the analysis time window twi and the frequency division fi at which the correlation value between the spectral amplitude values ntsp(twi, fi) of the specific sound spectrogram derived in S220 and the spectral amplitude values tusp over the entire range of the musical sound spectrogram is maximized. The shift amount dtwi derived by equation (2) indicates how many analysis time windows the window twi with the maximum correlation value lies from the analysis time window notwi along the time axis, and the shift amount dfi indicates how many frequency divisions the division fi with the maximum correlation value lies from the minimum frequency division fi_MIN along the frequency axis. The function argmax in equation (2) returns the variables (p, q) that maximize the function in parentheses (the correlation value in this embodiment).

That is, as shown in FIG. 5B, equation (2) takes the analysis time window notwi identified by equation (1) as the origin and shifts the spectral amplitude values ntsp(twi, fi) of the specific sound spectrogram derived in S220 along the frequency axis and the time axis, deriving the shift amounts dtwi and dfi at which the correlation value is maximized.
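The search in S230 can be sketched as a brute-force scan over candidate shifts. Equation (2) is not reproduced in this text, so the correlation used here (an unnormalized sum of products of magnitudes over the overlapping region) is an assumption, as are the search bounds `max_dt` and `max_df` and the function name `best_shift`.

```python
def best_shift(tu_mag, nt_mag, notwi, max_dt, max_df):
    """Return (dtwi, dfi) that maximizes an assumed correlation score.

    tu_mag -- magnitudes of the musical sound spectrogram, tu_mag[twi][fi]
    nt_mag -- magnitudes of the specific sound spectrogram, nt_mag[twi][fi]
    notwi  -- analysis window index used as the time origin
    max_dt, max_df -- hypothetical bounds on the shifts that are tried
    """
    best, best_score = (0, 0), float("-inf")
    for p in range(-max_dt, max_dt + 1):          # candidate time shift
        for q in range(-max_df, max_df + 1):      # candidate frequency shift
            score = 0.0
            for t, row in enumerate(nt_mag):
                tt = notwi + p + t
                if not 0 <= tt < len(tu_mag):
                    continue                      # outside the spectrogram
                for f, v in enumerate(row):
                    ff = f + q
                    if 0 <= ff < len(tu_mag[tt]):
                        score += v * tu_mag[tt][ff]
            if score > best_score:
                best_score, best = score, (p, q)
    return best
```

Because the frequency axis is logarithmic, a shift of q divisions corresponds to a transposition of the note, which is why one template can match a rearranged pitch.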

Next, based on the shift amounts dtwi and dfi derived in S230, the process identifies, among the spectral amplitude values tusp(twi, fi) over the entire range of the musical sound spectrogram, the range corresponding to the spectral amplitude values ntsp(twi, fi) of the specific sound spectrogram derived in S220 (hereinafter, the corresponding range) (S240). Specifically, in this embodiment, the origin is placed at the analysis time window notwi + dtwi, obtained by adding the shift amount dtwi derived in S230 to the analysis time window notwi, and at the frequency division fi + dfi, obtained by adding the shift amount dfi to the minimum frequency division fi_MIN. The range on the musical sound spectrogram that then corresponds to the spectral amplitude values ntsp(twi, fi) of the specific sound spectrogram is taken as the corresponding range.

Next, the process derives the amplitude ratio kr(twi, fi), which represents the ratio between the spectral amplitude value tusp(twi, fi) in the corresponding range and the spectral amplitude value ntsp(twi, fi) of the specific sound spectrogram (S250). Here, the spectral amplitude values ntsp(twi, fi) of the specific sound spectrogram are those shifted from the minimum frequency division fi_MIN by the shift amount dfi derived in S230.

Specifically, in S250, the amplitude ratio kr is derived from the absolute values of the complex spectra, for each frequency division fi in each analysis time window twi. In this embodiment, the amplitude ratio kr is set to "1" if the spectral amplitude value ntsp(twi, fi) of the specific sound spectrogram is larger than the spectral amplitude value tusp(twi, fi), and to the ratio of the two spectral amplitude values if ntsp(twi, fi) is smaller than tusp(twi, fi).
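The rule of S250 can be written as a small function. The text does not say which value is the numerator of the ratio; this sketch assumes |ntsp| / |tusp|, so that kr never exceeds 1, and it treats equal magnitudes (including two zeros) as kr = 1. The name `amplitude_ratio` is hypothetical.

```python
def amplitude_ratio(tusp, ntsp):
    """Amplitude ratio kr for one (twi, fi) cell.

    tusp -- complex value from the corresponding range of the mixture
    ntsp -- complex value from the (shifted) specific sound spectrogram
    """
    tu, nt = abs(tusp), abs(ntsp)
    if nt >= tu:
        # The template is at least as strong as the mixture: keep everything.
        return 1.0
    # Assumed direction of the ratio: template magnitude over mixture magnitude.
    return nt / tu
```

For example, a mixture value of 3+4j (magnitude 5) and a template magnitude of 2.5 give kr = 0.5.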

Then, based on the amplitude ratio kr derived in S250, the process executes the instrument sound separation process (S260). This process derives the section transition ntcpwf(ti), a waveform representing how the sound pressure of the instrument sound changes along the time axis during the specific period, that is, the period of the musical sound transition tuwf that corresponds to the corresponding range. It also updates the specific period of the instrument sound transition trwf(mti, ti) to the derived section transition ntcpwf(ti).

The process then returns to S190, and if the note index ni of the target note NO_ni is less than the final note NNPT(mti) for the set index mti (S190: YES), steps S190 to S260 are repeated. When the note index ni of the target note NO_ni becomes equal to or greater than the final note NNPT(mti) for the set index mti (S190: NO), the instrument sound transition trwf(mti, ti) updated by the instrument sound separation process in S260 is stored in the storage unit 18 (S270). That is, once the instrument sounds, from the one corresponding to the first note NO_1 in the musical score data through the one corresponding to the last note NO_NNPT(mti), have been separated from the acoustic data, the process returns to S150 via S270.

On returning to S150 via S270, if the set index mti is less than the final index MTN (S150: YES), steps S150 to S270 are repeated. When the set index mti becomes equal to or greater than the final index MTN (S150: NO), the sound source separation process ends. That is, the process ends once the instrument sound transition trwf(mti, ti) has been generated and separated from the acoustic data for every instrument played in the piece of music corresponding to the performance data.
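The control flow of S140 to S270 amounts to two nested loops, one over the score data of each instrument and one over its notes. The sketch below mirrors only that flow; the per-note work of S210 to S260 is delegated to a hypothetical callback `separate_note`, and representing the score as a list of note lists is likewise an assumption.

```python
def separate_all(score, n_samples, separate_note):
    """Run the outer loops of the sound source separation process.

    score         -- list of parts; each part is a list of notes (assumed layout)
    n_samples     -- length of each instrument sound transition trwf
    separate_note -- hypothetical callback (trwf, note) -> updated trwf
    """
    stored = {}
    mti = 0                                  # S140: initial set index
    while mti < len(score):                  # S150: mti < MTN ?
        mti += 1                             # S160
        trwf = [0.0] * n_samples             # S170: zero waveform
        notes = score[mti - 1]
        ni = 0                               # S180: initial note index
        while ni < len(notes):               # S190: ni < NNPT(mti) ?
            ni += 1                          # S200
            trwf = separate_note(trwf, notes[ni - 1])   # S210 to S260
        stored[mti] = trwf                   # S270: store trwf(mti, ti)
    return stored
```

Because both indices start at 0 and are incremented before use, the parts and notes are effectively numbered from 1, which matches the first note being written NO_1.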

<Instrument sound separation process>
Next, the instrument sound separation process started in S260 of the sound source separation process is described.
As shown in FIG. 6, when the instrument sound separation process is started, it derives, for each frequency of the instrument sound corresponding to the target note NO_ni contained in the specific period of the musical sound transition tuwf, the spectral amplitude value ntcpsp(twi, fi) representing the strength of that frequency (corresponding to the separation spectrum of the present invention) (S310).

In S310 of this embodiment, among the spectral amplitude values tusp(twi, fi) over the entire range of the musical sound spectrogram stored in the RAM 22 (or the storage unit 18), the spectral amplitude values tusp(twi, fi) in the corresponding range are multiplied by the amplitude ratio kr to derive the spectral amplitude values ntcpsp(twi, fi). Specifically, as shown in FIGS. 7A and 7B, the spectral amplitude values tusp of the analysis time window twi, in both the real part and the imaginary part of the complex spectrum, are multiplied by the amplitude ratio kr corresponding to each combination of analysis time window twi and frequency division fi. This multiplication by the amplitude ratio kr is performed for each frequency. In FIG. 7, the solid line shows the spectral amplitude values ntcpsp(twi, fi) derived as the separation spectrum, and the broken line shows the spectral amplitude values tusp(twi, fi) of the musical sound spectrogram.
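A minimal sketch of the multiplication in S310 follows. It assumes the corresponding range has already been cut out of the musical sound spectrogram as a grid of complex values with a matching grid of amplitude ratios; the name `separation_spectrum` is hypothetical.

```python
def separation_spectrum(tusp_range, kr):
    """Return ntcpsp for the corresponding range.

    tusp_range -- complex values tusp[twi][fi] inside the corresponding range
    kr         -- real amplitude ratios kr[twi][fi] of the same shape
    """
    return [
        [k * z for z, k in zip(z_row, k_row)]
        for z_row, k_row in zip(tusp_range, kr)
    ]
```

Multiplying a complex value by a real ratio scales its real and imaginary parts equally, so the phase of the mixture is kept and only the magnitude is reduced.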

Then, the spectral amplitude values ntcpsp(twi, fi) of the separation spectrum derived in S310 are subjected to the inverse discrete Fourier transform (IDFT) to derive the section transition ntcpwf (S320). Based on the derived section transition ntcpwf, the instrument sound transition trwf_old is updated to the instrument sound transition trwf_new in accordance with equation (3) (S330). The subscript old denotes the instrument sound transition trwf before the update, and the subscript new denotes the instrument sound transition trwf after the update.

That is, in S330 of this embodiment, the instrument sound transition trwf_old in the specific period, which had been set to the initial value as shown in FIG. 8A, is replaced with the section transition ntcpwf as shown in FIG. 8B, thereby updating it to the instrument sound transition trwf_new.
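S320 and S330 can be sketched for a single analysis window. Equation (3) is not reproduced in this text, and the way overlapping windows are recombined is not described, so this example assumes one window whose inverse transform directly overwrites the specific period; `idft` and `replace_period` are hypothetical names.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT of one window, returning the real part."""
    n = len(spectrum)
    return [
        (sum(z * cmath.exp(2j * cmath.pi * k * t / n)
             for k, z in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def replace_period(trwf_old, ntcpwf, start):
    """Return trwf_new with the specific period replaced by ntcpwf.

    start -- first sample of the specific period (assumed to be known)
    """
    if start < 0 or start + len(ntcpwf) > len(trwf_old):
        raise ValueError("specific period lies outside trwf")
    trwf_new = list(trwf_old)
    trwf_new[start:start + len(ntcpwf)] = ntcpwf
    return trwf_new
```

The function returns a new list rather than modifying its input, which keeps trwf_old available for comparison.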

Thereafter, the process returns to the sound source separation process and proceeds to S190.
In summary, the sound source separation process of this embodiment identifies the corresponding range in which the correlation value between the spectral amplitude values ntsp(twi, fi) derived from one specific sound and the spectral amplitude values tusp(twi, fi) is maximized. The amplitude ratio kr is then derived from the identified corresponding range, and the spectral amplitude values tusp(twi, fi) are multiplied by the derived amplitude ratio kr to derive the spectral amplitude values ntcpsp(twi, fi), which represent the complex spectrum of the instrument sound in the musical sound transition tuwf corresponding to the specific sound.

Furthermore, the spectral amplitude values ntcpsp(twi, fi) are inverse Fourier transformed to derive the section transition ntcpwf, and the instrument sound transition trwf is updated by replacing its corresponding range with the derived section transition ntcpwf.

[Effects of the first embodiment]
As described above, the instrument sound separation device 10 of this embodiment can generate (extract) the instrument sound transition trwf from the musical sound transition tuwf.

In particular, the corresponding range identified by the instrument sound separation device 10 is the range in which the musical sound spectrogram and the specific sound spectrogram match best along both the frequency axis and the time axis. Therefore, even if the acquired musical sound transition tuwf has been arranged so that the output timing or pitch of each musical sound differs from the sounding timing or pitch in the performance data, the instrument sound separation device 10 can identify the range corresponding to the arranged output timing and pitch as the corresponding range.

Accordingly, the instrument sound separation device 10 can bring the instrument sound transition trwf generated from the musical sound transition tuwf closer to the instrument sound actually played in the piece of music. As a result, the accuracy of generating the instrument sound transition trwf from the musical sound transition tuwf can be improved.

Moreover, because the instrument sound separation device 10 of this embodiment derives the section transition ntcpwf for each note NO in the performance data, it can generate the instrument sound transition trwf accurately even from the musical sound transition tuwf of a piece in which only a single note NO has been arranged.

Furthermore, the sound source separation process of this embodiment derives the shift amount dtwi only after identifying the index number notwi of the analysis time window twi in the musical sound transition tuwf that corresponds to the sounding timing ON_ni of the target note NO_ni. The instrument sound separation device 10 can therefore reduce the amount of processing required to identify the corresponding range and, in turn, to generate the section transition ntcpwf.

In addition, the sound source separation process of this embodiment can separate and extract, from the musical sound transition tuwf, the instrument sound transitions trwf of all the instruments that the sound source module 17 outputs in accordance with the performance data.

Note that the sound source separation process of this embodiment can generate the instrument sound transition trwf for one simulated instrument through a single series of processes along the time axis of the musical sound transition tuwf and the specific sound transition ntwf. As a result, the amount of processing required to generate the instrument sound transition trwf can be reduced.

[Second embodiment]
Next, a second embodiment of the present invention is described.
The instrument sound separation device of this embodiment differs from the instrument sound separation device 10 of the first embodiment only in the content of the instrument sound separation process. Accordingly, configurations and processes that are the same as those of the instrument sound separation device 10 of the first embodiment are given the same reference symbols and their description is omitted; the description below focuses on the instrument sound separation process, which differs from that of the first embodiment.
<Instrument sound separation process>
As shown in FIG. 9, when the instrument sound separation process of this embodiment is started, it derives, for each frequency of the instrument sound corresponding to the target note NO_ni contained in the specific period of the musical sound transition tuwf, the spectral amplitude value ntcpsp(twi, fi) representing the strength of that frequency (S410). The method of deriving the spectral amplitude value ntcpsp(twi, fi) in S410 is the same as in S310 of the instrument sound separation process of the first embodiment, so a detailed description is omitted here.

Furthermore, the spectral amplitude values tusp(twi, fi) stored in the storage unit 18 are updated based on equation (4) (S420).

That is, the spectral amplitude values ntcpsp(twi, fi) derived in S410 are subtracted from the spectral amplitude values tusp in the corresponding range to derive new spectral amplitude values tusp. In equation (4), the subscript old denotes the spectral amplitude value tusp before the update, and the subscript new denotes the spectral amplitude value tusp after the update.
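The update of S420 can be sketched as an element-wise subtraction inside the corresponding range. Equation (4) itself is not reproduced in this text, so the sketch follows only the prose description; the function name `subtract_in_range` and the origin arguments `t0` and `f0` are hypothetical.

```python
def subtract_in_range(tusp_old, ntcpsp, t0, f0):
    """Return tusp_new with ntcpsp subtracted inside the corresponding range.

    tusp_old -- complex spectrogram tusp[twi][fi] of the mixture
    ntcpsp   -- separation spectrum for one note
    t0, f0   -- origin of the corresponding range (assumed to be known)
    """
    tusp_new = [row[:] for row in tusp_old]      # leave the input untouched
    for t, row in enumerate(ntcpsp):
        for f, z in enumerate(row):
            tusp_new[t0 + t][f0 + f] -= z
    return tusp_new
```

Values outside the corresponding range are copied unchanged, so later notes are matched against a mixture from which the already separated notes have been removed.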

Next, the spectral amplitude values ntcpsp(twi, fi) derived in S410 are subjected to the inverse discrete Fourier transform (IDFT) to derive the section transition ntcpwf (S430). Based on the derived section transition ntcpwf, the instrument sound transition trwf_old is updated to the instrument sound transition trwf_new in accordance with equation (3) above (S440).

Thereafter, the process returns to the sound source separation process and proceeds to S190.
[Effects of the second embodiment]
In other words, the instrument sound separation process of this embodiment differs from that of the first embodiment in that, when the spectral amplitude values ntcpsp(twi, fi) are derived, the amplitude ratio kr is multiplied by spectral amplitude values tusp from which the spectral amplitude values ntcpsp of the simulated instruments have already been subtracted.

Therefore, when the instrument sound separation device 10 of this embodiment acquires a musical sound transition tuwf that contains a singing voice, subtracting the section transitions ntcpwf (that is, the instrument sound transitions trwf) of all the simulated instruments from that musical sound transition tuwf leaves the transition of the sound pressure of the singing voice along the time axis. That is, the instrument sound separation device 10 can extract the transition of the sound pressure of the singing voice in the piece of music.
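The extraction of the singing voice described above can be illustrated in the time domain. This is a sketch under the assumption that every instrument sound transition has the same length and sample alignment as the musical sound transition; the name `residual_transition` is hypothetical.

```python
def residual_transition(tuwf, trwf_list):
    """Subtract every instrument sound transition from the mixture.

    tuwf      -- musical sound transition (mixture), one value per sample
    trwf_list -- instrument sound transitions, each as long as tuwf (assumed)
    """
    residual = list(tuwf)
    for trwf in trwf_list:
        for i, v in enumerate(trwf):
            residual[i] -= v
    return residual
```

What remains after the loop is whatever the score did not account for, which for a song with accompaniment is mainly the vocal part.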

[Other embodiments]
Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments and can be implemented in various forms without departing from the gist of the invention.

For example, in the above embodiments the information processing device 10 acquires the performance data and the acoustic data as separate pieces of data, but the method of acquiring them is not limited to this. The performance data and the acoustic data may be paired for each piece of music and acquired from outside, pair by pair, via the communication unit 11.

The method of acquiring the specific sound transition ntwf based on the performance data is also not limited to inputting, via the voice input unit 15, the specific sound output from the sound source module 17. For example, if the information processing device 10 is configured so that the sound source module 17 generates an acoustic signal (an electrical signal) representing the waveform of the specific sound along the time axis and the sound output unit 16 sounds in accordance with that signal, the acoustic signal generated by the sound source module 17 may be acquired as the specific sound transition ntwf.

Likewise, the method of acquiring the musical sound transition tuwf based on the acoustic data is not limited to inputting, via the voice input unit 15, the sound output from the sound output unit 16. For example, if the information processing device 10 is configured so that the acoustic data reading unit 12 or the control unit 20 generates a musical sound signal (an electrical signal) representing the waveform of the sound along the time axis and the sound output unit 16 sounds in accordance with that signal, the musical sound signal generated by the acoustic data reading unit 12 or the control unit 20 may be acquired as the musical sound transition tuwf.

In addition, in the above embodiments the frequency divisions of the amplitude spectra that make up the musical sound spectrogram and the specific sound spectrogram are expressed on a logarithmic axis. However, as long as the correlation value can be derived with the frequency ratios preserved when the corresponding range is derived, the frequency components of the amplitude spectra may be expressed as real numbers.
[Correspondence between the embodiments and the claims]
Finally, the relationship between the description of the above embodiments and the description of the claims is explained.

S120 of the sound source separation process in the above embodiment corresponds to the musical sound acquisition means of the present invention, S130 corresponds to the musical sound analysis means, S210 corresponds to the specific sound acquisition means, and S220 corresponds to the specific sound analysis means. Further, S230 and S240 of the sound source separation process correspond to the range specifying means of the present invention, and S250 corresponds to the amplitude ratio deriving means.
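As an illustration of what the range specifying means and the amplitude ratio deriving means compute, the following is a minimal sketch and not the embodiment's implementation. It assumes that both spectrograms are lists of frames on a logarithmic frequency axis (so that shifting by a fixed number of bins keeps frequency ratios intact), that the correlation value is a normalized correlation, and that the amplitude ratio kr for each frequency is the ratio of amplitudes summed over the frames of the corresponding range. The function names and the toy data are hypothetical; only ntsp, tusp, and kr come from the text.

```python
import math


def similarity(patch, template):
    """Normalized correlation (cosine similarity) of two equally sized 2-D lists."""
    dot = sum(p * t for pr, tr in zip(patch, template) for p, t in zip(pr, tr))
    norm_p = math.sqrt(sum(p * p for row in patch for p in row))
    norm_t = math.sqrt(sum(t * t for row in template for t in row))
    return dot / (norm_p * norm_t) if norm_p > 0 and norm_t > 0 else 0.0


def find_corresponding_range(tusp, ntsp):
    """Return the (time, frequency) offset in tusp where ntsp correlates best."""
    n_t, n_f = len(ntsp), len(ntsp[0])
    best_score, best_offset = -1.0, (0, 0)
    for dt in range(len(tusp) - n_t + 1):
        for df in range(len(tusp[0]) - n_f + 1):
            patch = [row[df:df + n_f] for row in tusp[dt:dt + n_t]]
            score = similarity(patch, ntsp)
            if score > best_score:
                best_score, best_offset = score, (dt, df)
    return best_offset


def amplitude_ratios(tusp, ntsp, offset):
    """Assumed definition: per-frequency ratio of amplitudes summed over time."""
    dt, df = offset
    ratios = []
    for fi in range(len(ntsp[0])):
        mix = sum(tusp[dt + ti][df + fi] for ti in range(len(ntsp)))
        ref = sum(ntsp[ti][fi] for ti in range(len(ntsp)))
        ratios.append(mix / ref if ref > 0 else 0.0)
    return ratios


# Toy data: the musical sound spectrogram contains the specific sound
# at twice the amplitude, shifted by 2 frames and 1 frequency bin.
ntsp = [[1.0, 3.0, 2.0], [4.0, 1.0, 5.0]]
tusp = [[0.0] * 5 for _ in range(5)]
for ti, row in enumerate(ntsp):
    for fi, value in enumerate(row):
        tusp[2 + ti][1 + fi] = 2.0 * value

offset = find_corresponding_range(tusp, ntsp)
kr = amplitude_ratios(tusp, ntsp, offset)
```

In this toy case the search recovers the shift that was used to build the data, and every entry of kr is the scale factor that was applied. A practical version would restrict the search to a window around the expected note position rather than scanning the whole spectrogram.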

In addition, S310 and S320 of the instrument sound separation process correspond to the section transition deriving means of the present invention, and S330 of the instrument sound separation process corresponds to the instrument sound separation means. S410, S430, and S440 of the instrument sound separation process correspond to the transition deriving means of the present invention, and S420 of the instrument sound separation process corresponds to the storage control means and the update means.
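The arrangement of section transitions into the instrument sound transition trwf, and the residual update attributed to the storage control means and update means, can be pictured with the following minimal sketch. It assumes that waveforms are plain lists of samples and that each section transition comes with the sample index at which its corresponding range starts. The helper names, the instrument names, and the sample values are hypothetical; only tuwf and trwf come from the text.

```python
def place_section(trwf, start, section):
    """Replace the corresponding range of trwf with the section transition."""
    updated = list(trwf)
    end = min(len(updated), start + len(section))
    updated[start:end] = section[:end - start]
    return updated


def subtract_section(residual, start, section):
    """Subtract one section transition from the residual musical sound transition."""
    updated = list(residual)
    for i, value in enumerate(section):
        if 0 <= start + i < len(updated):
            updated[start + i] -= value
    return updated


# Toy musical sound transition and per-instrument section transitions.
tuwf = [1.0] * 8
sections_by_instrument = {
    "piano": [(1, [0.5, 0.5])],
    "bass": [(4, [0.25, 0.25, 0.25])],
}

residual = list(tuwf)          # residual musical sound transition
instrument_transitions = {}    # one trwf per target instrument
for instrument, sections in sections_by_instrument.items():
    trwf = [0.0] * len(tuwf)
    for start, section in sections:
        trwf = place_section(trwf, start, section)
        residual = subtract_section(residual, start, section)
    instrument_transitions[instrument] = trwf
```

After the loop, each entry of instrument_transitions holds one instrument's separated waveform, and residual holds what remains of the mixture once every derived section transition has been subtracted.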

DESCRIPTION OF SYMBOLS
10 ... information processing apparatus (instrument sound separation device); 11 ... communication unit; 12 ... acoustic data reading unit; 13 ... input reception unit; 14 ... display unit; 15 ... voice input unit; 16 ... sound output unit; 17 ... sound source module; 18 ... storage unit; 20 ... control unit; 21 ... ROM; 22 ... RAM; 23 ... CPU

Claims

A musical instrument sound separation device comprising:
musical sound acquisition means for acquiring a musical sound transition, which is the transition along the time axis of the sound pressure of the musical sounds constituting a musical piece;
musical sound analysis means for deriving a musical sound spectrogram in which frequency spectra, each representing the frequencies contained in the musical sound transition acquired by the musical sound acquisition means and the intensity of each frequency, are arranged along the time axis;
specific sound acquisition means for acquiring, based on performance data that represents the score of the musical piece and is played by a sound source module that outputs specific sounds simulating the instrument sounds of at least one type of musical instrument, a specific sound transition, which is the transition along the time axis of the sound pressure of the specific sounds of a target instrument, the target instrument being a specified type of musical instrument;
specific sound analysis means for deriving a specific sound spectrogram, in which frequency spectra, each representing the frequencies contained in the specific sound transition acquired by the specific sound acquisition means and the intensity of each frequency, are arranged along the time axis, for each analysis section, the analysis section being the length of time over which each specific sound of the target instrument is played by the sound source module;
range specifying means for collating each specific sound spectrogram derived by the specific sound analysis means with the musical sound spectrogram derived by the musical sound analysis means, and specifying a corresponding range, which is the range of the musical sound spectrogram that the specific sound spectrogram matches most closely along the frequency axis and the time axis;
amplitude ratio deriving means for deriving, for each frequency, an amplitude ratio representing the ratio between the intensity of that frequency in the musical sound spectrogram within each corresponding range specified by the range specifying means and the intensity of that frequency in the specific sound spectrogram;
section transition deriving means for deriving a section transition, which is a sound pressure transition along the time axis, from a separated spectrum obtained by multiplying each amplitude ratio derived by the amplitude ratio deriving means by the intensity of each frequency spectrum constituting the musical sound spectrogram corresponding to the analysis section; and
instrument sound separation means for generating an instrument sound transition, which is the transition along the time axis of the sound pressure of the instrument sound of the target instrument within the musical sound transition, by arranging the section transitions derived by the section transition deriving means along the time axis of the musical piece.

The musical instrument sound separation device according to claim 1, wherein
the performance data defines a sound generation timing indicating when output of a specific sound from the sound source module starts, and an end timing that corresponds to the sound generation timing and indicates when output of that specific sound ends, and
the specific sound analysis means treats each period from a sound generation timing to the end timing corresponding to that sound generation timing as the analysis section.

The musical instrument sound separation device according to claim 2, wherein
the range specifying means collates the specific sound spectrogram against a section of the musical sound spectrogram that is defined so as to sandwich, along the time axis, the beginning and the end of the period of the musical sound transition corresponding to the period from the sound generation timing to the end timing corresponding to that sound generation timing.

The musical instrument sound separation device according to any one of claims 1 to 3, wherein
the range specifying means specifies the corresponding range each time a specific sound spectrogram is derived by the specific sound analysis means,
the amplitude ratio deriving means derives the amplitude ratio each time the corresponding range is specified by the range specifying means, and
the section transition deriving means derives the section transition each time the amplitude ratio is derived by the amplitude ratio deriving means.

The musical instrument sound separation device according to claim 1, wherein
the specific sound acquisition means treats each of the instruments corresponding to the specific sounds output from the sound source module according to the performance data as the target instrument.

The musical instrument sound separation device according to any one of claims 1 to 5, further comprising:
storage control means for deriving a residual musical sound transition by subtracting the section transition derived by the section transition deriving means from the musical sound transition, and storing the derived residual musical sound transition in a storage device;
transition deriving means for sequentially deriving the section transition for each instrument corresponding to the specific sounds output by the sound source module according to the performance data; and
update means for updating the residual musical sound transition stored in the storage device by subtracting each derived section transition from the stored residual musical sound transition each time the section transition is derived by the transition deriving means.

A program causing a computer to execute:
a musical sound acquisition procedure for acquiring a musical sound transition, which is the transition along the time axis of the sound pressure of the musical sounds constituting a musical piece;
a musical sound analysis procedure for deriving a musical sound spectrogram in which frequency spectra, each representing the frequencies contained in the musical sound transition acquired in the musical sound acquisition procedure and the intensity of each frequency, are arranged along the time axis;
a specific sound acquisition procedure for acquiring, based on performance data that represents the score of the musical piece and is played by a sound source module that outputs specific sounds simulating the instrument sounds of at least one type of musical instrument, a specific sound transition, which is the transition along the time axis of the sound pressure of the specific sounds of a target instrument, the target instrument being a specified type of musical instrument;
a specific sound analysis procedure for deriving a specific sound spectrogram, in which frequency spectra, each representing the frequencies contained in the specific sound transition acquired in the specific sound acquisition procedure and the intensity of each frequency, are arranged along the time axis, for each analysis section, the analysis section being the length of time over which each specific sound of the target instrument is played by the sound source module;
a range specifying procedure for collating each specific sound spectrogram derived in the specific sound analysis procedure with the musical sound spectrogram derived in the musical sound analysis procedure, and specifying a corresponding range, which is the range of the musical sound spectrogram that the specific sound spectrogram matches most closely along the frequency axis and the time axis;
an amplitude ratio deriving procedure for deriving, for each frequency, an amplitude ratio representing the ratio between the intensity of that frequency in the musical sound spectrogram within each corresponding range specified in the range specifying procedure and the intensity of that frequency in the specific sound spectrogram;
a section transition deriving procedure for deriving a section transition, which is a sound pressure transition along the time axis, from a separated spectrum obtained by multiplying each amplitude ratio derived in the amplitude ratio deriving procedure by the intensity of each frequency spectrum constituting the musical sound spectrogram corresponding to the analysis section; and
an instrument sound separation procedure for generating an instrument sound transition, which is the transition along the time axis of the sound pressure of the instrument sound of the target instrument within the musical sound transition, by arranging the section transitions derived in the section transition deriving procedure along the time axis of the musical piece.
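
As an informal illustration of the analysis section of claim 2 and the collation section of claim 3, the following minimal sketch derives one analysis section per note from pairs of sound generation and end timings, then widens each section so that its beginning and end are sandwiched along the time axis. The margin value, the timings, and the function names are hypothetical and are not taken from the embodiment.

```python
def analysis_sections(note_events):
    """Each (sound generation timing, end timing) pair becomes one analysis section."""
    return [(on, off) for on, off in note_events if off > on]


def collation_window(section, margin, total_duration):
    """Widen a section by `margin` seconds on both sides, clipped to the piece."""
    on, off = section
    return (max(0.0, on - margin), min(total_duration, off + margin))


# Toy performance data: timings in seconds for two notes of one instrument.
note_events = [(0.5, 1.0), (1.0, 1.75)]
sections = analysis_sections(note_events)
windows = [collation_window(s, margin=0.25, total_duration=2.0) for s in sections]
```

Collating within such a window, rather than only at the nominal timings, leaves room for the played note to start slightly earlier or end slightly later than the performance data indicates.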