JP5732910B2 - Method and apparatus for encoding acoustic signal

Info

Publication number: JP5732910B2
Application number: JP2011043549A
Authority: JP
Inventors: 茂出木 敏雄
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2011-03-01
Filing date: 2011-03-01
Publication date: 2015-06-10
Anticipated expiration: 2031-03-01
Also published as: JP2012181304A

Description

The present invention relates to techniques for encoding acoustic signals, and in particular to an encoding technique suited to producing code data in the MIDI format or similar formats.

Conventionally, acoustic signals have been converted into code data such as MIDI codes so that they can be reproduced with a MIDI sound source (see Patent Documents 1 to 3). Because a MIDI sound source reproduces sound with a limited number of simultaneous frequencies (for example, 32-voice polyphony), encoding requires selecting a limited number of frequencies. The applicant has also proposed a technique for selecting and encoding a limited number of frequencies from an acoustic signal (see Patent Document 1).

Patent Document 1: JP 2002-41037 A
Patent Document 2: Japanese Patent No. 4061070
Patent Document 3: Japanese Patent No. 4156268

Non-Patent Document 1: 赤木正人 (Masato Akagi), "聴覚フィルタとそのモデル" (Auditory filters and their models), Journal of the IEICE 77(9), 948-956, 1994-09-25.

However, the invention described in Patent Document 1 uniformly attenuates the signal components around a previously selected high-intensity frequency so that mutually adjacent frequency components are selected as rarely as possible. As a result, components that are actually needed to express consonants in vocal reproduction are also removed, and reproducibility is degraded rather than improved.

Meanwhile, human hearing performs frequency discrimination with 24 auditory nerve channels and can in theory be approximated by 24 band-pass filters; it is known that multiple frequency components (fewer than 12 at any one time) are distinguished from the ratios of the 24 output signals (see Non-Patent Document 1). In other words, only a single frequency component within each band-pass filter can be perceived at a given instant, and multiple frequency components falling within the same band-pass filter are perceived with a time difference (as beats; although it varies between individuals, fine frequency differences of about a quarter of a semitone within the same band are perceived).

Accordingly, an object of the present invention is to provide a method and an apparatus for encoding an acoustic signal that exploit human auditory characteristics so that consonant components can be faithfully reproduced with a sound source that reproduces a limited number of frequencies.

To solve the above problem, in a first aspect of the present invention, an acoustic signal given as a time-series sequence of J samples digitized at a predetermined sampling frequency is encoded as follows. Unit sections each consisting of a predetermined number T (T < J) of samples are set on the sample sequence so that adjacent unit sections overlap by a predetermined number W (W < T) of samples along the time axis. For each unit section, a frequency transform is performed for each of at least N frequencies f(n) to be analyzed, yielding spectral intensities corresponding to the N frequencies. The N frequencies f(n) are divided into a predetermined number of mutually non-overlapping frequency groups. For each unit section, the spectral intensities of all frequencies in each frequency group other than the frequency with the maximum spectral intensity are attenuated by a predetermined ratio, producing corrected spectral intensities. Code data in a predetermined format are then generated from the time difference between the start time of the unit section and the start time of the immediately following unit section, together with the corrected spectral intensities of the unit section.

According to the first aspect, a frequency transform is performed on each unit section to calculate spectral intensities for N frequencies, the N frequencies f(n) are divided into a predetermined number of non-overlapping frequency groups, the spectral intensities other than the maximum within each group are attenuated by a predetermined ratio, and code data are generated from the resulting analysis. The consonant components of an acoustic signal can therefore be faithfully reproduced with a sound source limited in the number of frequencies it reproduces, such as a 32-voice MIDI sound source.

In a second aspect of the present invention, in the first aspect, the predetermined number of frequency groups is set, based on the characteristics of the human auditory system and with the value of n defined as the standard note number, as 24 groups bounded by 17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98, 101, 104, 106, 109, 113, 116, 119, 123, 127, and 128.

In a third aspect of the present invention, in the first aspect, the predetermined number of frequency groups is set, based on the characteristics of the human auditory system, as 24 groups bounded by frequency f(n) values of 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, and 15500 Hz.
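The 25 boundary values of the third aspect define 24 groups, and looking up the group for a given frequency is a simple binary search. The following is a minimal sketch; the function name `group_index` and the choice of half-open intervals (lower edge included, upper edge excluded) are assumptions made for illustration and are not stated in the source.

```python
from bisect import bisect_right

# Band edges in Hz, copied from the third aspect (25 edges give 24 groups).
BAND_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def group_index(freq_hz):
    """Return the 0-based group containing freq_hz, or None if out of range.

    Assumption (not from the source): each group is the half-open
    interval [lower edge, upper edge).
    """
    if not BAND_EDGES_HZ[0] <= freq_hz < BAND_EDGES_HZ[-1]:
        return None
    return bisect_right(BAND_EDGES_HZ, freq_hz) - 1
```

For example, `group_index(440)` falls in the group bounded by 400 Hz and 510 Hz.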

According to the second and third aspects, the correction is performed after classifying the frequencies into 24 frequency groups based on the characteristics of the human auditory system, so encoding can use the frequency components best suited to human hearing. In particular, according to the second aspect, the frequency groups are classified by the note number n of the MIDI standard used by the reproducing sound source, so the encoding is suited to MIDI playback.
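The correction described in the first and second aspects (keep the strongest frequency in each group, attenuate the rest) can be sketched as follows. This is an illustrative sketch only: the function name `correct_spectrum`, the dictionary representation, the default `ratio` of 0.5, the interpretation of "attenuate by a predetermined ratio" as multiplication by `ratio`, and the half-open group intervals are all assumptions, not details taken from the source.

```python
# Group boundaries as note numbers, copied from the second aspect.
NOTE_EDGES = [17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98,
              101, 104, 106, 109, 113, 116, 119, 123, 127, 128]


def correct_spectrum(intensity, ratio=0.5, edges=NOTE_EDGES):
    """Attenuate every non-peak note inside each frequency group.

    intensity: dict mapping note number -> spectral intensity.
    ratio:     assumed multiplier applied to non-peak intensities.
    Notes outside all groups are returned unchanged.
    """
    corrected = dict(intensity)
    for lo, hi in zip(edges, edges[1:]):
        # Assumption: a group covers note numbers lo .. hi-1.
        members = [n for n in range(lo, hi) if n in intensity]
        if not members:
            continue
        peak = max(members, key=lambda n: intensity[n])
        for n in members:
            if n != peak:
                corrected[n] = intensity[n] * ratio
    return corrected
```

With this grouping, notes 60, 62, and 63 share the group bounded by 57 and 64, so only the strongest of the three keeps its full intensity, while note 64 belongs to the next group.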

In a fourth aspect of the present invention, in any one of the first to third aspects, the spectral intensity is calculated as follows. For each unit section, a frequency transform is performed for each of at least N frequencies f(n) to be analyzed, yielding for unit section p first spectral intensities E1(p, n) corresponding to the N frequencies. Only when an evaluation value based on the per-frequency change from the first spectral intensities E1(p-1, n) of the immediately preceding unit section p-1 exceeds a predetermined threshold is unit section p selected as the q-th (q ≤ p) selected unit section q. For each of at least N frequencies f(n) to be analyzed, a frequency transform of higher precision than that of the first spectrum calculation step is then performed, yielding second spectral intensities E2(q, n) corresponding to the N frequencies, which serve as the spectral intensity. The spectrum correction attenuates the second spectral intensities E2(q, n) by the predetermined ratio and produces corrected spectral intensities E'(q, n) as the corrected spectral intensity.

According to the fourth aspect, a simple first frequency transform is performed on every unit section; a unit section whose intensity exceeds that of the immediately preceding unit section by at least a predetermined criterion is selected as a selected unit section; a more precise second frequency transform is performed on that selected unit section; and code data are generated from the result. Information is thus analyzed over the whole acoustic signal at fixed intervals while only the characteristic portions are encoded, which makes it possible to analyze acoustic signals containing chords and the frequency changes of speech signals more appropriately.
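The selection step of the fourth aspect can be illustrated with a short sketch. The source only says that the evaluation value is "based on the per-frequency change" from the preceding unit section; the concrete choice below (the sum of intensity increases over all frequencies) is an assumption, as are the function name `select_sections` and the decision to compare the first unit section against an all-zero spectrum.

```python
def select_sections(e1, threshold):
    """Return the indices p of unit sections chosen as selected unit sections.

    e1: list of first spectra E1(p, n), one equal-length list per unit section.
    The returned list plays the role of the mapping P(q) = p.
    """
    selected = []
    # Assumption: the first section is compared against silence.
    previous = [0.0] * len(e1[0]) if e1 else []
    for p, current in enumerate(e1):
        # Assumed evaluation value: total increase in intensity.
        rise = sum(max(c - b, 0.0) for c, b in zip(current, previous))
        if rise > threshold:
            selected.append(p)
        previous = current
    return selected
```

Only the sections returned here would then go through the more precise second spectrum calculation.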

In a fifth aspect of the present invention, in the fourth aspect, the first and second spectrum calculations set, for each of the N frequencies f(n), a predetermined number M of sub-frequencies f(n, m) within a range that does not exceed the adjacent frequency, and calculate, as the first spectral intensity E1(p, n) and the second spectral intensity E2(q, n), the intensity value of the sub-frequency showing the greatest intensity among the M sub-frequencies.

According to the fifth aspect, setting the analysis frequencies at finer intervals enables more detailed frequency analysis and allows the appropriate frequency components to be identified.

In a sixth aspect of the present invention, in the fourth or fifth aspect, when the code data are generated, for two adjacent selected unit sections q and q+1, P(q) = p is defined when selected unit section q is the p-th unit section p. If the difference between any one of the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), and E1(P(q+1), n+1), corresponding to frequencies f(n), f(n-1), and f(n+1) in selected unit section q+1, and the first spectral intensity E1(P(q+1)-1, n), corresponding to frequency f(n) in the unit section P(q+1)-1 immediately preceding selected unit section q+1, is less than a predetermined threshold Ldif, and if that one of E1(P(q+1), n), E1(P(q+1), n-1), and E1(P(q+1), n+1) and the first spectral intensity E1(P(q+1)-1, n) are both greater than a predetermined threshold Lmin, then selected unit sections q and q+1 are connected, and the time difference used as the basis of the code data is the difference between the start time of selected unit section q and the start time of selected unit section q+2, which immediately follows selected unit section q+1.

According to the sixth aspect, when the code data are generated, two adjacent selected unit sections are connected if the difference in intensity between the later selected unit section and the unit section immediately preceding it is less than a predetermined threshold and both of those intensities are greater than a predetermined threshold. The unit section that is closest in time, even though it was not itself selected, is thus taken into account in the connection decision, so sound components can be connected appropriately.
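The connection test of the sixth aspect compares the later selected unit section with the unselected unit section just before it. A minimal sketch follows, assuming that "difference" means absolute difference and that out-of-range neighbors n-1 or n+1 are simply skipped; the function name `should_connect` and its argument names are hypothetical.

```python
def should_connect(e1, p_next, n, l_dif, l_min):
    """Decide whether selected sections q and q+1 are connected at frequency n.

    e1:     first spectra, e1[p][n] = E1(p, n).
    p_next: P(q+1), the unit-section index of selected section q+1 (>= 1).
    l_dif:  threshold Ldif on the intensity difference.
    l_min:  threshold Lmin that both intensities must exceed.
    """
    prev = e1[p_next - 1][n]          # E1(P(q+1)-1, n)
    if prev <= l_min:
        return False
    for m in (n, n - 1, n + 1):       # f(n), f(n-1), f(n+1)
        if 0 <= m < len(e1[p_next]):
            cur = e1[p_next][m]       # E1(P(q+1), m)
            # Assumption: "difference" is taken as an absolute difference.
            if cur > l_min and abs(cur - prev) < l_dif:
                return True
    return False
```

When the test succeeds, the duration encoded for the connected note would run from the start of selected unit section q to the start of selected unit section q+2, as described above.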

In a seventh aspect of the present invention, in the sixth aspect, the first and second spectrum calculations set, for each of the N frequencies f(n), a predetermined number M of sub-frequencies f(n, m) within a range that does not exceed the adjacent frequency, and calculate, as E1(p, n) and E2(q, n), the intensity values corresponding to the sub-frequencies f1(n, mmax1) and f2(n, mmax2) showing the greatest intensity among the M sub-frequencies. When the code data are generated, selected unit sections q and q+1 are connected only if the following condition is also satisfied: the difference between at least one of the sub-frequency numbers mmax1, mmax2, and mmax3 that determine the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), and E1(P(q+1), n+1), and the sub-frequency number mmax0 that determines the first spectral intensity E1(P(q+1)-1, n), is less than a predetermined threshold Ndif.

According to the seventh aspect, setting the analysis frequencies at finer intervals enables more detailed frequency analysis. In addition, because the connection condition for sound components further requires that the difference in sub-frequency between the later selected unit section and the unit section immediately preceding it be less than a threshold, sound components can be connected on the basis of more precise analysis results.

In an eighth aspect of the present invention, in the seventh aspect, when selected unit section q is already connected to another selected unit section, the first selected unit section of the run to which selected unit section q is connected is denoted qo, and the encoding step connects selected unit sections q and q+1 only if the following condition is also satisfied: the difference between at least one of the sub-frequency numbers mmax1, mmax2, and mmax3 that determine E1(P(q+1), n), E1(P(q+1), n-1), and E1(P(q+1), n+1), and the sub-frequency number mmaxo that determines the first spectral intensity E1(P(qo), n), is less than a predetermined threshold Nadif.

According to the eighth aspect, when the earlier selected unit section is already connected to another selected unit section, the connection condition further requires that the difference in sub-frequency between the later selected unit section and the first selected unit section of the connected run be less than a threshold. This prevents a later selected unit section that belongs to a different sound component whose sub-frequency changes gradually from being connected by mistake, giving more accurate connection of sound components.

In a ninth aspect of the present invention, in any one of the fourth to eighth aspects, the first spectrum calculation prepares N element signals, which are to be the constituents of the section signal of a unit section, each as T(n) samples, where T(n) corresponds to an integer multiple of the period of the frequency f(n) and is the value closest to T; and calculates the first spectral intensities E1(p, n) by correlating the element signal for each of the N frequencies f(n) with the section signal consisting of the corresponding T(n) samples of unit section p. The second spectrum calculation correlates the prepared element signal for each of the N frequencies f(n) with the section signal consisting of the corresponding T(n) samples of selected unit section q; selects the element signal of the frequency f(nmax) with the highest correlation value as the harmonic signal; takes as the contained signal the T(nmax) samples given by the product of the selected harmonic signal and its correlation value; subtracts the contained signal from the section signal to obtain a difference signal of T(nmax) samples; takes the T(n) samples updated to reflect those T(nmax) samples as the new section signal; repeats the selection of a harmonic signal and the calculation of a difference signal to obtain N contained signals; and calculates the second spectral intensities E2(q, n) corresponding to the N frequencies from the correlation values of the contained signals thus obtained.

According to the ninth aspect, the first spectrum calculation for all unit sections is performed by a simple discrete Fourier transform, and the second spectrum calculation for the selected unit sections is performed by high-precision generalized harmonic analysis. Information on the selected unit sections can thus be obtained with high precision, and efficiently overall, while the analysis results for all unit sections are still available for reference.
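The iterative "select the best-correlated harmonic signal, subtract it, repeat" loop of the second spectrum calculation can be sketched in plain Python. This is a simplified illustration under several assumptions that are not in the source: each element signal is modeled as a sine/cosine pair, T(n) is taken as the largest whole number of periods that fits in the section (rather than the closest to T), the intensity is the amplitude of the accumulated sine/cosine coefficients, and all function names are hypothetical.

```python
import math


def window_length(freq, fs, t):
    """Assumed T(n): the largest whole number of periods fitting in t samples."""
    period = fs / freq
    cycles = max(1, int(t // period))
    return min(t, max(1, round(cycles * period)))


def correlate(signal, freq, fs, length):
    """Correlate the first `length` samples with sine and cosine at freq."""
    a = b = 0.0
    for i in range(length):
        phase = 2.0 * math.pi * freq * i / fs
        a += signal[i] * math.sin(phase)
        b += signal[i] * math.cos(phase)
    return 2.0 * a / length, 2.0 * b / length


def generalized_harmonic_analysis(section, freqs, fs):
    """Return one intensity per frequency after len(freqs) subtraction rounds."""
    residual = list(section)
    coeff = [[0.0, 0.0] for _ in freqs]
    for _ in freqs:
        best = None
        for n, f in enumerate(freqs):
            length = window_length(f, fs, len(residual))
            a, b = correlate(residual, f, fs, length)
            power = a * a + b * b
            if best is None or power > best[0]:
                best = (power, n, a, b, length)
        _, n, a, b, length = best
        coeff[n][0] += a
        coeff[n][1] += b
        # Subtract the contained signal to form the new section signal.
        for i in range(length):
            phase = 2.0 * math.pi * freqs[n] * i / fs
            residual[i] -= a * math.sin(phase) + b * math.cos(phase)
    return [math.hypot(a, b) for a, b in coeff]
```

Because each round removes the strongest remaining component before the next correlation, a weak component close to a strong one is less likely to be masked than in a single-pass transform.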

In a tenth aspect of the present invention, in the ninth aspect, the correlation in the first spectrum calculation is computed from the immediately preceding correlation result, that is, the correlation result for each frequency f(n) in the immediately preceding unit section p-1. A correlation is computed for the first W samples of unit section p-1 and the resulting per-frequency correlation values are subtracted from the preceding correlation result; a correlation is computed for the last W samples of the T(n) samples of unit section p and the resulting per-frequency correlation values are added to the preceding correlation result. This gives the correlation result for each frequency f(n) in unit section p, from which the first spectral intensities E1(p, n) are calculated.

According to the tenth aspect, the simple correlation for each unit section in the first spectrum calculation reuses the correlation result of the immediately preceding unit section: the contribution of the head of the preceding result is removed, the correlation for the tail of the current unit section is computed, and that result is added. Most of the preceding unit section's correlation result can thus be reused, which speeds up the processing of all unit sections.
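The incremental update of the tenth aspect can be sketched as a sliding sum. The sketch below assumes that the head and tail pieces have the same length as the offset between consecutive unit sections (called `shift` here, a name chosen for illustration) and that the sine and cosine phases are measured from the start of the whole signal, so that sums from different sections line up. The function names are hypothetical.

```python
import math


def direct_correlation(x, start, length, freq, fs):
    """Sine and cosine correlation sums over x[start : start + length]."""
    a = b = 0.0
    for t in range(start, start + length):
        # Phase uses the absolute sample index t (assumption, see lead-in).
        phase = 2.0 * math.pi * freq * t / fs
        a += x[t] * math.sin(phase)
        b += x[t] * math.cos(phase)
    return a, b


def slide_correlation(prev, x, prev_start, length, shift, freq, fs):
    """Update the previous section's sums to the section starting `shift` later."""
    a, b = prev
    head_a, head_b = direct_correlation(x, prev_start, shift, freq, fs)
    tail_a, tail_b = direct_correlation(x, prev_start + length, shift, freq, fs)
    return a - head_a + tail_a, b - head_b + tail_b
```

Each update touches only `2 * shift` samples instead of `length`, and the intensity `math.hypot(a, b)` does not depend on the common phase reference.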

The present invention has the effect of providing a method and an apparatus for encoding an acoustic signal that can more appropriately analyze chord signals and the frequency changes of speech signals.

FIG. 1 is a hardware configuration diagram of the acoustic signal encoding apparatus of this embodiment.
FIG. 2 is a flowchart outlining the acoustic signal encoding method according to the present invention.
FIG. 3 is a diagram showing the concepts of expansion along the time axis and of frequency increase and time-information reduction.
FIG. 4 is a diagram showing the concept of section setting in this embodiment compared with the prior art.
FIG. 5 is a diagram showing the relationship between the logical and physical ranges of the analysis frequencies in this embodiment.
FIG. 6 is a diagram showing the relationship between unit sections and the analysis range in this embodiment.
FIG. 7 is a diagram showing the analysis processing of unit sections in this embodiment.
FIG. 8 is a diagram showing the correspondence between the sample sequence in a unit section extracted from the acoustic signal after time-axis expansion and the harmonic signal.
FIG. 9 is a diagram explaining the time-axis extension of the analysis frame when the unit section length T is at least 1/2 period and less than 3/4 period of the harmonic signal.
FIG. 10 is a diagram explaining the time-axis extension of the analysis frame when the unit section length T is at least 1/4 period and less than 1/2 period of the harmonic signal.
FIG. 11 is a diagram showing the frequency-based band-pass filters created from Non-Patent Document 1.
FIG. 12 is a diagram showing the MIDI-note-number-based band-pass filters obtained by converting the units of the band-pass filters shown in FIG. 11.
FIG. 13 is a flowchart showing the details of the auditory filter correction in S8.
FIG. 14 is a diagram showing the relationship between the selected unit sections subject to the connection decision and the unit sections.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a hardware configuration diagram of an acoustic signal encoding apparatus according to an embodiment of the present invention. The encoding apparatus can be implemented on a general-purpose computer and, as shown in FIG. 1, comprises a CPU 1 (Central Processing Unit); a RAM 2 (Random Access Memory), which is the computer's main memory; a large-capacity data storage device 3 (for example, a hard disk) for storing data; a program storage device 4 (for example, a hard disk) for storing programs executed by the CPU; a key input interface 5 for a keyboard, mouse, or the like; a data input/output interface 6 for data communication with external devices (data storage media); and a display output interface 7 for sending information to a display device. These components are connected to one another via a bus.

A dedicated program that operates the CPU 1 and causes the computer to function as the acoustic signal encoding apparatus is installed in the program storage device 4. By executing this program, the CPU 1 implements the functions of section setting means, first spectrum calculation means (including element signal preparation means and correlation calculation means), second spectrum calculation means (including element signal preparation means, harmonic signal selection means, and difference signal calculation means), spectrum correction means, and encoding means. The data storage device 3 stores the various data needed for processing.

FIG. 2 is a flowchart outlining the acoustic signal encoding method according to this embodiment. The method is carried out by a computer executing a program in which the detailed procedure of each step shown in FIG. 2 is recorded. A general-purpose computer can be used, equipped with a CPU and memory for arithmetic processing, a storage device such as a hard disk for programs and data, a data input device for inputting data such as acoustic signals, input devices such as a keyboard and mouse for entering instructions, and a display device such as a liquid crystal display for showing necessary information on screen. A general-purpose computer in which that program is installed constitutes the acoustic signal encoding apparatus according to this embodiment.

First, the computer (encoding apparatus) reads the digital acoustic signal to be processed from the data input device. The digital acoustic signal is an analog acoustic signal sampled at a predetermined sampling frequency and quantization bit depth; this embodiment is described below for a sampling frequency of 44.1 kHz and 16-bit quantization. At 44.1 kHz, the digital acoustic signal is a sample sequence (an array of intensity values) with 44,100 samples per second.

After reading the digital acoustic signal, the computer (encoding apparatus) expands it along the time axis by a predetermined factor K, where K is an integer (S1). Specifically, the number of samples making up the signal is multiplied by K: every K-th sample takes the same value as an original sample, and the (K-1) samples in between are given values linearly interpolated from the original samples on either side. Letting x(j) be the value of sample j (j = 0, ..., J-1) of the original acoustic signal, the computer calculates the value x'(j·K + k) of each sample j·K + k (0 ≤ k ≤ K-1) of the expanded signal according to [Formula 1] below, where w = k/K is a real number in the range 0 ≤ w ≤ 1.

[Formula 1]
$$x'(j \cdot K + k) = (1 - w)\,x(j) + w\,x(j + 1)$$

As a result of the processing in S1, the J samples constituting the digital acoustic signal are expanded to J × K samples. FIG. 3(a) shows the change in waveform caused by the expansion in S1. The waveforms in FIG. 3(a) are plots of sample values joined by line segments, but because there are so many samples they appear as curves. Executing the processing of [Formula 1] changes the waveform shown on the left into the waveform shown on the right. For convenience of explanation, FIG. 3 shows the case K = 2. Analyzing the acoustic signal after expanding it along the time axis in this way improves the time resolution of the analysis, while maintaining frequency analysis accuracy equivalent to that of analyzing the original signal directly, so that frequency fluctuations can be extracted with high accuracy.
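The expansion of S1 can be illustrated with a minimal sketch of [Formula 1]. The function name `expand_time_axis` is hypothetical, and the handling of the final sample (where x(j+1) does not exist) is an assumption, since the text does not specify it.

```python
def expand_time_axis(x, K):
    """Stretch the sample list x by an integer factor K (Formula 1)."""
    J = len(x)
    out = []
    for j in range(J):
        # Assumption: past the end of the signal, reuse the last sample as x(j+1).
        nxt = x[j + 1] if j + 1 < J else x[j]
        for k in range(K):
            w = k / K                      # w = k/K
            out.append((1.0 - w) * x[j] + w * nxt)
    return out

expanded = expand_time_axis([0.0, 2.0, 4.0], 2)
```

With K = 2, as in FIG. 3, every second output sample is an original value and the samples in between are midpoints.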

In the subsequent steps S2 to S5, the computer (encoding device) performs frequency analysis on predetermined sections. In this embodiment, the sections subjected to frequency analysis are determined by first setting unit sections and then selecting, as selected unit sections, those unit sections that satisfy a predetermined condition. The concept of setting the sections to be analyzed is explained here by comparison with the prior art represented by Patent Document 2. FIG. 4 shows the concept of section setting in this embodiment in comparison with the prior art. As described later, the prior art and this embodiment are alike in that a predetermined number of samples obtained by sampling is set as a unit section and frequency analysis is performed on each unit section. As shown in FIG. 4(a), the prior art disclosed in Patent Document 2 identifies zero-crossing points and uses them to vary the position at which each unit section is set. FIG. 4(a) shows three unit sections as an example, and the intervals between their start positions (left ends) are not uniform. Both a discrete Fourier transform and generalized harmonic analysis are then executed on each unit section to obtain the analysis results.

In contrast, as shown in FIG. 4(b), this embodiment sets unit sections at a fixed interval and executes a discrete Fourier transform on each unit section to obtain an analysis result. That result is compared with the result for the immediately preceding unit section, and the unit section is selected as a selected unit section if a predetermined condition is satisfied. In the example of FIG. 4(b), unit sections 1, 5, and 6 are selected as selected unit sections 1, 2, and 3, respectively. Generalized harmonic analysis is then executed on the selected unit sections to obtain the analysis results.

Specifically, the computer (encoding device) first sets unit sections on the samples expanded along the time axis (S2). The length of a unit section (number of samples T) is set in relation to the sampling frequency; at a sampling frequency of 44.1 kHz, at least 4096 samples are needed to analyze faithfully down to the low-frequency range. In this embodiment, therefore, unit sections are set with T = 4096 samples per unit section.

As disclosed in Patent Documents 1 and 3, unit sections are set by extracting samples sequentially from the head of the digital acoustic signal. The unit sections are set so that no sample is left out, and preferably so that consecutive unit sections overlap. In the present invention, the interval between the heads of successive unit sections (called the shift width) is a fixed value; that is, the number of overlapping samples is constant. In this embodiment, the shift width is fixed at W = 64. With T = 4096, the first unit section is therefore j = 0 to 4095, the second is j = 64 to 4159, the third is j = 128 to 4223, and so on, each section being shifted by 64 samples so that consecutive sections overlap by T − W = 4032 samples. The value x(j) of each sample is then expressed, for each unit section p (p is an integer of 0 or more), as x(p, i) (0 ≤ i ≤ T−1).
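As a rough illustration of the fixed-shift sectioning described above, the sketch below lists the sample ranges of the unit sections. The function name `unit_sections` is hypothetical, and dropping a trailing partial section is an assumption made only to keep the example short.

```python
def unit_sections(num_samples, T=4096, W=64):
    """Return (first, last) sample indices j of each unit section p = 0, 1, ..."""
    sections = []
    start = 0
    # Assumption: a trailing section shorter than T samples is simply not emitted.
    while start + T <= num_samples:
        sections.append((start, start + T - 1))
        start += W                 # fixed shift width between section heads
    return sections

secs = unit_sections(4096 + 2 * 64)
overlap = 4096 - 64                # samples shared by consecutive sections
```

Under this indexing, sample i of unit section p is sample j = p·W + i of the expanded signal.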

Next, a discrete Fourier transform, which is the first frequency analysis, is executed on each unit section that has been set, and the spectrum of each unit section is calculated (S3). As disclosed in Patent Documents 1 to 3, the spectrum of each unit section is calculated by extracting 128 components through a discrete Fourier transform based on harmonic signals (harmonic functions) at 128 analysis frequencies f(n) = 440·2^{(n−69)/12} corresponding to MIDI note numbers n. The figures "128 frequencies" and "128 components" are only examples. The MIDI standard covers note numbers n = 0 to 127, but the standard range for reproducing a grand piano is n = 21 to 108, in which case 88 components are extracted using 88 analysis frequencies. Furthermore, when frequency analysis is performed on an acoustic signal expanded along the time axis by the magnification K as described above, omitting the time-axis extension processing described later makes analysis impossible at the analysis frequencies in the bass range, so the number of analysis frequencies is reduced further. For example, when K = 4, the analysis frequencies are limited to note numbers n = 24 to 127 (104 frequencies) even under the MIDI standard, and to note numbers n = 45 to 108 (64 frequencies) within the standard range of the grand piano.

In this embodiment, because the acoustic signal has been expanded K times along the time axis, the upper and lower limits of n are each moved down by α. Here α is the integer defined by α = 12·log₂K (for example, α = 24 when K = 4). Thus, whereas 0 ≤ n ≤ 127 in Patent Documents 1 to 3, in this embodiment −α ≤ n ≤ 127−α. The frequency of each harmonic signal is thereby set to 1/K times its original value. FIG. 5 shows the relationship between the logical and physical ranges of the analysis frequencies in this embodiment. As shown in FIG. 5, the standard range of the grand piano is n = 21 to 108, so ordinary analysis would be performed over n = 21 to 108. In the present invention, however, the time-axis expansion shifts the frequencies toward the bass side before analysis. In addition, for note numbers n = 21 and below, one period of the corresponding harmonic signal is longer than the unit section, so long-period analysis using time-axis extension (described later) is performed. As a result, frequency components are obtained for n = −3 to 84, and a final correction yields frequency components in the range n = 21 to 108.
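The relationship between the note number, the analysis frequency, and the shift α can be checked with a short sketch. The function names are hypothetical, and rounding α to an integer is an assumption for magnifications K that are not powers of two.

```python
import math

def note_frequency(n):
    """Analysis frequency f(n) = 440 * 2^((n - 69) / 12) in Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def note_shift(K):
    """alpha = 12 * log2(K); rounding is an assumption for non-power-of-two K."""
    return round(12.0 * math.log2(K))

def analysis_range(K):
    """Shifted note-number range -alpha <= n <= 127 - alpha."""
    alpha = note_shift(K)
    return (-alpha, 127 - alpha)

alpha = note_shift(4)
ratio = note_frequency(69 - alpha) / note_frequency(69)   # expected 1/K
```

For K = 4 the sketch gives α = 24, so each shifted harmonic signal has one quarter of the frequency of the unshifted one, and the grand-piano range n = 21 to 108 maps to n = −3 to 84.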

When analysis frequencies are set according to note number n, the frequency spacing between note numbers widens as the frequency rises, and analysis accuracy degrades, particularly once n exceeds 60. In this embodiment, therefore, as disclosed in Patent Document 3, the interval between note numbers is divided into M microtones (sub-frequencies), analysis is performed using 128M element signals f(n, m) = 440·2^{(n−69+m/M)/12}, and 128M components are extracted. Unless special encoding such as adding pitch bend codes is performed in the code generation processing of S10 described later, the information on the M microtones of each note number is unnecessary. The maximum of the M microtone components is therefore taken to represent the component at that note number, so that 128 components are extracted in the end.
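A minimal sketch of the microtone subdivision follows. The names `element_frequency` and `representative` are hypothetical; the second function only illustrates picking the strongest of the M microtone components as the note's representative.

```python
def element_frequency(n, m, M):
    """f(n, m) = 440 * 2^((n - 69 + m/M) / 12): microtone m of note n."""
    return 440.0 * 2.0 ** ((n - 69 + m / M) / 12.0)

def representative(intensities):
    """Given the M intensities of one note, return (max intensity, m_max)."""
    m_max = max(range(len(intensities)), key=lambda m: intensities[m])
    return intensities[m_max], m_max

f0 = element_frequency(69, 0, 4)          # equals f(69)
f2 = element_frequency(69, 2, 4)          # halfway (in pitch) to note 70
best = representative([0.1, 0.7, 0.3])
```

The index returned alongside the maximum plays the role of the sub-frequency that the text stores per note number.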

As a concrete processing procedure, for each unit section p the computer (encoding device) first sets up an array of intensity values E1(p, n) (−α ≤ n ≤ 127−α), one per note number, and a sub-frequency array S(p, n), and initializes all entries to 0. It then executes the processing of [Formula 2] below for −α ≤ n ≤ 127−α and 0 ≤ m ≤ M−1, and finds the (nmax, mmax) that maximizes E1(p, n, m). If the bass range is not needed, processing for n < 0 can be omitted to reduce the processing load.

[Formula 2]
$$A(p,n,m) = \frac{1}{T(n)} \sum_{i=0}^{T(n)-1} x(p,i)\,\sin\!\left(\frac{2\pi f(n,m)\,(i + pW)}{\mathit{fs}}\right)$$
$$B(p,n,m) = \frac{1}{T(n)} \sum_{i=0}^{T(n)-1} x(p,i)\,\cos\!\left(\frac{2\pi f(n,m)\,(i + pW)}{\mathit{fs}}\right)$$
$$E1(p,n,m) = \{A(p,n,m)\}^2 + \{B(p,n,m)\}^2$$

In [Formula 2], T(n) is the analysis frame length. When one period of the harmonic signal (harmonic function) is no longer than the unit section length T, T(n) is set to the largest integer multiple of the period that does not exceed T. In the present invention, however, the frequencies are shifted toward the bass side by the time-axis expansion before analysis, so one period of the harmonic signal may exceed the unit section length T. Specifically, when one period of the harmonic signal is longer than T, T(n) is given by T(n) = g × fs / f(n, m), and the values of x(i) for T < T(n) are set by the time-axis extension processing described later. Here g is an integer of 1 or more, and fs is the sampling frequency (for example, 44.1 kHz).
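The correlation of [Formula 2] and the choice of T(n) for the short-period case can be sketched as follows. The function names are hypothetical, rounding T(n) to a whole number of samples is an assumption, and the long-period case (which needs time-axis extension) is deliberately rejected here.

```python
import math

def frame_length(f, fs, T):
    """T(n): the largest whole number of periods of f that fits in T samples."""
    period = fs / f
    if period > T:
        raise ValueError("period exceeds T; time-axis extension would be needed")
    g = math.floor(T / period)
    return round(g * period)       # assumption: round to an integer sample count

def correlate(x_p, p, f, fs, W, Tn):
    """Return (A, B, E1) of Formula 2 for unit section p with samples x_p."""
    a = b = 0.0
    for i in range(Tn):
        phase = 2.0 * math.pi * f * (i + p * W) / fs   # phase uses the absolute index
        a += x_p[i] * math.sin(phase)
        b += x_p[i] * math.cos(phase)
    A, B = a / Tn, b / Tn
    return A, B, A * A + B * B

fs, T, W = 44100, 4096, 64
f = 441.0                                        # illustrative frequency, period 100 samples
Tn = frame_length(f, fs, T)
x0 = [math.sin(2.0 * math.pi * f * i / fs) for i in range(T)]
A, B, E1 = correlate(x0, 0, f, fs, W, Tn)
```

For a unit-amplitude sine at the analysis frequency, the normalization by T(n) gives A = 0.5 and B close to 0, so E1 is 0.25.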

The processing of [Formula 2] can also be executed on every unit section to obtain A(p, n, m), B(p, n, m), and E1(p, n, m). FIG. 6 shows the relationship between unit sections and analysis ranges in this embodiment. In FIG. 6, the waveform at the top schematically represents the original acoustic signal and the waveform at the bottom the harmonic function. The example in FIG. 6 shows only the target unit section and the unit section immediately preceding it; the correlation calculation range of each is the horizontal length of the corresponding rectangle. In this embodiment the correlation calculation range T is 4096 samples and the shift width W is 64 samples, so the overlap is very large. For the overlapping portion, therefore, this embodiment reuses the analysis result of the immediately preceding unit section to make the analysis more efficient.

FIG. 7 shows how unit sections are analyzed in this embodiment. As shown in FIG. 7, the analysis result for the target unit section is obtained by reusing the portion that overlaps the immediately preceding unit section. Specifically, the contribution of the leading part of the preceding unit section, which does not overlap the target unit section, is removed, and a correlation calculation is performed only for the trailing part of the target unit section, which does not overlap the preceding unit section, and its result is added. Consequently, a correlation calculation over the whole unit section is performed only for the first unit section (p = 0).

When p ≥ 1, that is, when processing the second or a later unit section p, A(p−1, n, m) and B(p−1, n, m) have already been calculated for the immediately preceding unit section (p−1). In this embodiment, A(p, n, m) and B(p, n, m) for unit section p are calculated from A(p−1, n, m) and B(p−1, n, m) by executing the processing of [Formula 3] below.

[Formula 3]
$$\begin{aligned}
A(p,n,m) = {} & A(p-1,n,m) - \frac{1}{T(n)} \sum_{i=0}^{W-1} x(p-1,i)\,\sin\!\left(\frac{2\pi f(n,m)\,(i + (p-1)W)}{\mathit{fs}}\right) \\
& + \frac{1}{T(n)} \sum_{i=T(n)-W}^{T(n)-1} x(p,i)\,\sin\!\left(\frac{2\pi f(n,m)\,(i + pW)}{\mathit{fs}}\right)
\end{aligned}$$
$$\begin{aligned}
B(p,n,m) = {} & B(p-1,n,m) - \frac{1}{T(n)} \sum_{i=0}^{W-1} x(p-1,i)\,\cos\!\left(\frac{2\pi f(n,m)\,(i + (p-1)W)}{\mathit{fs}}\right) \\
& + \frac{1}{T(n)} \sum_{i=T(n)-W}^{T(n)-1} x(p,i)\,\cos\!\left(\frac{2\pi f(n,m)\,(i + pW)}{\mathit{fs}}\right)
\end{aligned}$$
$$E1(p,n,m) = \{A(p,n,m)\}^2 + \{B(p,n,m)\}^2$$
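The sliding update of [Formula 3] can be compared against a full recomputation with the sketch below. All names are hypothetical; the sketch assumes x(p, i) is sample p·W + i of one expanded signal `sig` and that T(n) ≥ W, and the test signal is arbitrary.

```python
import math

def direct_AB(sig, p, f, fs, W, Tn):
    """Full correlation over unit section p (Formula 2)."""
    a = b = 0.0
    for i in range(Tn):
        ph = 2.0 * math.pi * f * (i + p * W) / fs
        a += sig[p * W + i] * math.sin(ph)
        b += sig[p * W + i] * math.cos(ph)
    return a / Tn, b / Tn

def incremental_AB(A_prev, B_prev, sig, p, f, fs, W, Tn):
    """Update A, B from section p-1 to section p (Formula 3)."""
    a = b = 0.0
    for i in range(W):                     # leading W samples of section p-1 drop out
        ph = 2.0 * math.pi * f * (i + (p - 1) * W) / fs
        a -= sig[(p - 1) * W + i] * math.sin(ph)
        b -= sig[(p - 1) * W + i] * math.cos(ph)
    for i in range(Tn - W, Tn):            # trailing W samples of section p come in
        ph = 2.0 * math.pi * f * (i + p * W) / fs
        a += sig[p * W + i] * math.sin(ph)
        b += sig[p * W + i] * math.cos(ph)
    return A_prev + a / Tn, B_prev + b / Tn

fs, W, Tn, f = 44100, 64, 512, 1000.0      # small illustrative sizes
sig = [math.sin(0.01 * j) + 0.5 * math.cos(0.037 * j) for j in range(1024)]
A0, B0 = direct_AB(sig, 0, f, fs, W, Tn)
A1, B1 = incremental_AB(A0, B0, sig, 1, f, fs, W, Tn)
A1_ref, B1_ref = direct_AB(sig, 1, f, fs, W, Tn)
```

Because the phase term uses the absolute position i + pW, the samples shared by the two sections contribute identical terms to both sums, so only 2W terms are evaluated per update instead of T(n).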

Next, for each note number n, the mmax that maximizes E1(p, n, m) over the range 0 ≤ m ≤ M−1 is found, and E1(p, n) = E1(p, n, mmax) and S(p, n) = mmax are set. The calculated E1(p, n) and S(p, n) are temporarily stored in memory and are used in the single-tone component concatenation processing described later.

Next, the change between the spectral intensity E1(p, n) calculated for unit section p and the spectral intensity E1(p−1, n) calculated for the immediately preceding section (p−1) is evaluated (S4). Specifically, the change evaluation value dE(p−1, p) of unit section p relative to the immediately preceding section (p−1) is first calculated by executing the processing of [Formula 4] below.

[Formula 4]
$$dE(p-1,p) = \frac{100}{N} \sum_{n=0}^{N-1} \frac{\left|E1(p,n) - E1(p-1,n)\right|}{E1(p,n) + E1(p-1,n)}$$

If the resulting change evaluation value dE(p−1, p) is less than a predetermined threshold (for example, 40), p is incremented (p ← p + 1) and processing returns to S2 to set the next unit section p.

If, on the other hand, the change evaluation value dE(p−1, p) is equal to or greater than the predetermined threshold, unit section p is selected as selected unit section q, and generalized harmonic analysis is performed on selected unit section q (S5). The value of q is 0 for the first selected unit section and is incremented by 1 each time a further section is selected. Specifically, the maximum E1(p, nmax) of the E1(p, n) values set in S3 is found first. That is, among all n in −α ≤ n ≤ 127−α, the value of n that maximizes E1(p, n) is taken as nmax, and the corresponding E1(p, n) as E1(p, nmax). This is done by executing the processing of [Formula 2] for all n and choosing the largest of the calculated E1(p, n) values. Then, using the nmax obtained, mmax is set to S(p, nmax).
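The change evaluation of [Formula 4] and the threshold test can be sketched as below. The function names are hypothetical, and skipping terms whose denominator is zero is an assumption, since the text does not say how that case is handled.

```python
def change_score(E_cur, E_prev):
    """dE(p-1, p): mean normalized spectral change, scaled to 0..100 (Formula 4)."""
    N = len(E_cur)
    total = 0.0
    for e1, e0 in zip(E_cur, E_prev):
        denom = e1 + e0
        if denom > 0.0:            # assumption: a 0/0 term contributes nothing
            total += abs(e1 - e0) / denom
    return 100.0 * total / N

def is_selected(E_cur, E_prev, threshold=40.0):
    """A unit section is selected when its score reaches the threshold."""
    return change_score(E_cur, E_prev) >= threshold

same = change_score([1.0, 1.0], [1.0, 1.0])
swapped = change_score([1.0, 0.0], [0.0, 1.0])
mild = change_score([3.0, 1.0], [1.0, 1.0])
```

Identical spectra score 0 and completely disjoint spectra score 100, so the example threshold of 40 sits between "unchanged" and "entirely different".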

Using the nmax and mmax obtained, A(p, nmax, mmax) and B(p, nmax, mmax) are calculated by executing the processing of [Formula 5] below. Before doing so, taking unit section p to be the q-th selected unit section q, P(q) = p is set, a correlation intensity array E2(q, n) with one entry per note number is defined for selected unit section q, and all its entries are initialized to a value less than 0 (for example, −1).

[Formula 5]
$$A(p,n_{max},m_{max}) = \frac{1}{T(n_{max})} \sum_{i=0}^{T(n_{max})-1} x(p,i)\,\sin\!\left(\frac{2\pi f(n_{max},m_{max})\,i}{\mathit{fs}}\right)$$
$$B(p,n_{max},m_{max}) = \frac{1}{T(n_{max})} \sum_{i=0}^{T(n_{max})-1} x(p,i)\,\cos\!\left(\frac{2\pi f(n_{max},m_{max})\,i}{\mathit{fs}}\right)$$
$$E2(q,n_{max}) = \{A(p,n_{max},m_{max})\}^2 + \{B(p,n_{max},m_{max})\}^2$$

Using the calculated A(p, nmax, mmax) and B(p, nmax, mmax), the values x(p, i) of the samples (p, i) in unit section p are then updated over 0 ≤ i ≤ T(nmax)−1 by executing the processing of [Formula 6] below.

[Formula 6]
$$x(p,i) \leftarrow x(p,i) - A(p,n_{max},m_{max})\,\sin\!\left(\frac{2\pi f(n_{max},m_{max})\,i}{\mathit{fs}}\right) - B(p,n_{max},m_{max})\,\cos\!\left(\frac{2\pi f(n_{max},m_{max})\,i}{\mathit{fs}}\right)$$

The processing of [Formula 6] removes the contained signal from the original acoustic signal. For the acoustic signal from which the contained component has been removed, a new E2(q, nmax) that maximizes E2(q, n) is then found among the values of n other than the nmax already processed, and the processing of [Formula 5] and [Formula 6] is executed with the new nmax. As a result, a further contained signal is removed from the acoustic signal. The computer (encoding device) repeats this processing for all 128 values of n to obtain E2(q, n).
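The loop over [Formula 5] and [Formula 6] can be sketched in simplified form as follows. This is not the patented procedure itself: the function `gha` and its arguments are hypothetical, one frequency per note is used (no microtones), and re-ranking the remaining notes on the current residual at every pass is an assumption about how the next nmax is chosen.

```python
import math

def gha(x, freqs, fs, frame_len):
    """freqs: {note: Hz}, frame_len: {note: T(n)}. Returns {note: E2} in pick order."""
    x = list(x)                              # work on a copy of the section
    E2 = {}

    def ab(n):
        f, Tn = freqs[n], frame_len[n]
        a = sum(x[i] * math.sin(2.0 * math.pi * f * i / fs) for i in range(Tn)) / Tn
        b = sum(x[i] * math.cos(2.0 * math.pi * f * i / fs) for i in range(Tn)) / Tn
        return a, b                          # Formula 5 (phase uses i only)

    while len(E2) < len(freqs):
        best = None
        for n in freqs:                      # assumption: rank unprocessed notes on the residual
            if n in E2:
                continue
            a, b = ab(n)
            if best is None or a * a + b * b > best[1]:
                best = (n, a * a + b * b, a, b)
        n, e, a, b = best
        E2[n] = e
        f = freqs[n]
        for i in range(frame_len[n]):        # Formula 6: subtract the contained signal
            ph = 2.0 * math.pi * f * i / fs
            x[i] -= a * math.sin(ph) + b * math.cos(ph)
    return E2

fs = 44100
freqs = {"low": 441.0, "high": 882.0}        # illustrative frequencies, not MIDI notes
frame_len = {"low": 4000, "high": 4000}
x = [math.sin(2 * math.pi * 441.0 * i / fs) + 0.5 * math.sin(2 * math.pi * 882.0 * i / fs)
     for i in range(4096)]
result = gha(x, freqs, fs, frame_len)
```

In this toy input the stronger component is picked first, and each note is processed exactly once, mirroring the statement that the processing is repeated for every n.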

In this embodiment, to reduce the processing load, the value of M is set variably according to the note number, for example so that the frequency spacing analyzed is about 100 Hz. Note numbers 60 and below are not divided, so M = 1 for them. Alternatively, at some cost in accuracy, S(p, n) may be determined in the processing of [Formula 2], which determines the correlation intensity array E1(p, n), and the processing of [Formula 5], which determines the correlation intensity array E2(q, n), may then be performed with m fixed at S(p, n), omitting the microtone analysis. In the processing of [Formula 5], signal components with different sub-frequencies may end up being analyzed several times for the same note number; a note number for which a value has already been set in E2(q, n) may therefore be excluded from the candidates for the maximum of E1(p, n).

The setting of analysis frames within a unit section is now described. The description below applies equally to the selected unit sections described above. FIG. 8 shows the correspondence between the sample sequence in a unit section extracted from the time-axis-expanded acoustic signal and the harmonic signals. FIG. 8(a) is the sample sequence in a unit section extracted from the time-axis-expanded acoustic signal; joining the 4096 sample values gives the waveform shown in FIG. 8(a). Of the 128 harmonic signals, consider a treble-range analysis harmonic signal whose period is no longer than the unit section length T, as shown in FIG. 8(b). When the correlation with such a signal is calculated, and when the contained signal (the harmonic signal selected) is subtracted from the unit section, the analysis frame length T(n) is the period of the harmonic signal multiplied by the largest integer for which the result does not exceed the unit section length T, and the analysis frame consists of T(n) samples taken from the head of the unit section.

When one period of the harmonic signal is longer than the unit section length T, the analysis frame length T(n), which is the correlation calculation interval, is made equal to one period of the harmonic signal, and time-axis extension is performed by appending T(n) − T samples to the unit section. For this purpose, one period of the harmonic signal is divided into four divided sections Q1 to Q4. As shown in FIG. 8(d), when the unit section length T corresponds to 3/4 of the period of the harmonic signal, the unit-section samples corresponding to divided section Q3 (from 1/2 period to 3/4 period) are reversed in the time-axis direction about the 3/4-period point (270 degrees, the boundary between Q3 and Q4) and appended. The reversal about 270 degrees exploits the fact that, assuming one period of the harmonic signal is a sine wave, Q3 and Q4 (from 3/4 period to one period) are mirror images of each other in the time-axis direction about 270 degrees. Time-axis extension performed in this way yields an analysis frame (5461 samples) with a waveform like that shown in FIG. 8(c). FIGS. 8(c) and 8(d) show the case where T corresponds to exactly 3/4 of the period; when T corresponds to 3/4 of the period or more, the same processing is performed and the analysis frame length T(n) is at most 5461 samples. In that case, for the unit-section samples in the portion beyond 3/4 of the period, samples within the unit section are used more than once.

As shown in FIG. 9(b), when the unit section length T corresponds to at least 1/2 but less than 3/4 of the period of the harmonic signal (when the last sample of the unit section falls in divided section Q3), the whole of Q4 and part of Q3 are missing. The unit-section samples corresponding to Q2 (from 1/4 period to 1/2 period) are therefore inverted in both the time-axis and amplitude-axis directions about the 1/2-period point (180 degrees, the boundary between Q2 and Q3) and added to Q3, and the samples of Q3 so obtained are then reversed about the 3/4-period point (270 degrees) and appended. The inversion in both the time-axis and amplitude-axis directions about 180 degrees exploits the fact that, assuming one period of the harmonic signal is a sine wave, Q2 and Q3 are symmetric under a 180-degree rotation about the 180-degree point. Time-axis extension performed in this way yields an analysis frame (at most 8192 samples) with a waveform like that shown in FIG. 9(a). In this case, for the unit-section samples in the portion beyond 1/2 of the period, samples within the unit section are used more than once.

As shown in FIG. 10(b), when the unit section length T corresponds to at least 1/4 but less than 1/2 of the period of the harmonic signal (when the last sample of the unit section falls in divided section Q2), the whole of Q3 and Q4 and part of Q2 are missing. The unit-section samples corresponding to Q1 (the first 1/4 period) are therefore reversed in the time-axis direction about the 1/4-period point (90 degrees, the boundary between Q1 and Q2) and added to Q2; the samples of Q2 so obtained are inverted in both the time-axis and amplitude-axis directions about the 1/2-period point (180 degrees) and added to Q3; and the samples of Q3 so obtained are reversed in the time-axis direction about the 3/4-period point (270 degrees) and appended. The reversal about 90 degrees exploits the fact that, assuming one period of the harmonic signal is a sine wave, Q1 and Q2 are mirror images of each other in the time-axis direction about 90 degrees. Time-axis extension performed in this way yields an analysis frame (at most 16384 samples) with a waveform like that shown in FIG. 10(a). In this case, for the unit-section samples in the portion beyond 1/4 of the period, samples within the unit section are used more than once.
The unit section length T may also correspond to less than 1/4 of the period of the harmonic signal (when the last sample of the unit section falls in divided section Q1). Even if time-axis extension and correlation calculation were performed on that basis, however, the source information would be too scarce to yield a meaningful correlation value, so frequencies for which T corresponds to less than 1/4 of the period are excluded from analysis.
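The three extension cases can be sketched with one routine that fills the missing part of a period from the symmetries of a sine wave. The function name is hypothetical, and several simplifications are assumptions: the period is an integer number of samples divisible by 4, samples already present in the unit section are kept as they are, and only missing positions are filled.

```python
import math

def extend_to_full_period(x, period):
    """Extend unit-section samples x to one full harmonic period by mirroring."""
    T = len(x)
    if period % 4 != 0:
        raise ValueError("sketch assumes a period divisible by 4")
    quarter, half, three_q = period // 4, period // 2, 3 * period // 4
    if not (quarter < T < period):
        raise ValueError("T must cover more than 1/4 and less than one period")
    y = list(x) + [0.0] * (period - T)
    for i in range(T, period):
        if i <= half:
            y[i] = y[half - i]             # Q2 from Q1: time reversal about 90 degrees
        elif i <= three_q:
            y[i] = -y[period - i]          # Q3 from Q2: time and amplitude inversion about 180 degrees
        else:
            y[i] = y[2 * three_q - i]      # Q4 from Q3: time reversal about 270 degrees
    return y

def max_error(period, T=4096):
    """Largest deviation from a true sine when the input is that sine truncated to T."""
    sine = [math.sin(2.0 * math.pi * i / period) for i in range(period)]
    y = extend_to_full_period(sine[:T], period)
    return max(abs(a - b) for a, b in zip(y, sine))

err_q3 = max_error(8000)     # T lies in Q3 (between 1/2 and 3/4 of the period)
err_q2 = max_error(12000)    # T lies in Q2 (between 1/4 and 1/2 of the period)
err_q4 = max_error(5000)     # T lies in Q4 (beyond 3/4 of the period)
```

When the input really is a sine of the given period, the extended frame reproduces the full period, which is the property the symmetry argument in the text relies on.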

After frequency analysis has been performed on each selected unit section q while varying the analysis frame and the spectrum (128 frequency components) has been calculated, single-tone components are created for each selected unit section on the basis of the spectrum calculated in S5 (S6). For each of the N frequencies, a single-tone component consists of frequency information that identifies the frequency, the corresponding spectral intensity, and time information that identifies the start and end of the selected unit section. Specifically, the time and time-length information for each note number n is added to the calculated spectrum to create single-tone components of the form [start time, time length, main frequency n, sub-frequency S(P(q), n), intensity E2(q, n)]. The "start time" may be any information that identifies the time of the head of the selected unit section within the entire digital acoustic signal; in this embodiment, the sample number within the entire digital acoustic signal (the absolute sample address, corresponding to j) assigned to the first sample (i = 0) of the unit section is recorded. Dividing this absolute sample address by the sampling frequency (44100) gives the time from the head of the acoustic signal. A feature of this embodiment is that the time length is variable for each selected unit section: it is the difference up to the start time of the immediately following selected unit section on which generalized harmonic analysis was performed (start time of the following selected unit section − start time of the current selected unit section). When no selected unit section follows (the last selected unit section), the shift width W of the unit section is used as the time length.
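
The time information of S6 can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the shift width W is given in samples, like the absolute sample addresses.

```python
SAMPLING_RATE = 44100  # sampling frequency stated in the text

def time_info(start_addresses, shift_width):
    """Return (start_seconds, length_seconds) for each selected unit section.

    start_addresses: absolute sample addresses of the selected unit sections,
    in ascending order. shift_width: W in samples (assumption).
    """
    out = []
    for k, addr in enumerate(start_addresses):
        if k + 1 < len(start_addresses):
            length = start_addresses[k + 1] - addr   # up to the next selected section
        else:
            length = shift_width                     # last selected section: use W
        out.append((addr / SAMPLING_RATE, length / SAMPLING_RATE))
    return out

# Usage with hypothetical addresses.
info = time_info([0, 44100, 66150], 2205)
```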

Next, each single-tone component is corrected to undo the change caused by processing the signal in a form expanded along the time axis (S7). Once the single-tone components have been created, 12·log₂K is added to all frequency information (which corresponds to note number values). For example, when K = 4 the pitch is raised overall by 24 semitones (2 octaves). Because multiplying the number of samples by K in S1 lowered the frequencies to 1/K, this processing multiplies the frequencies by K to restore the original state. Codes whose note number exceeds the standard upper limit of 127 as a result of this correction are deleted; specifically, codes whose note number before correction is 128 − 12·log₂K or higher are deleted.

Next, all start times and time lengths are multiplied by 1/K. As a result, when the components are converted into MIDI codes by the encoding processing described later, the performance time of the entire MIDI code and the sounding time of each note event are reduced to 1/K. Because multiplying the number of samples by K in S1 made the total performance time K times longer, this processing scales the times by 1/K to restore the original state.

As a result of the processing in S7, the frequency (pitch) is multiplied by K and the time information is multiplied by 1/K. FIG. 3(b) shows how a tone component changes under the correction processing of S7, using musical notes for the case K = 2. The correction in S7 changes the "mi" note on the left into a "mi" note one octave higher (twice the frequency) on the right, and changes the quarter note on the left into an eighth note, half as long, on the right.
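
A minimal Python sketch of the S7 correction is given below. The tuple layout (start, length, note, intensity) and the function name are illustrative assumptions, and K is assumed to be a power of 2 so that 12·log₂K is an integer number of semitones.

```python
import math

def correct_components(components, k):
    """Raise pitch by 12*log2(K) semitones and scale times by 1/K.

    components: list of (start, length, note, intensity) tuples (hypothetical layout).
    """
    shift = round(12 * math.log2(k))   # e.g. K = 4 gives 24 semitones
    corrected = []
    for start, length, note, intensity in components:
        new_note = note + shift
        if new_note > 127:             # above the note number upper limit: delete
            continue
        corrected.append((start / k, length / k, new_note, intensity))
    return corrected

# Usage: with K = 4, note 104 would become 128 and is therefore deleted.
result = correct_components([(4.0, 2.0, 64, 1.0), (0.0, 1.0, 104, 1.0)], 4)
```

With K = 4 the deletion threshold is 128 − 24 = 104, which matches the rule that components with a pre-correction note number of 128 − 12·log₂K or higher are removed.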

After the frequency of each single-tone component has been multiplied by K and its time axis by 1/K, auditory filter correction is performed (S8). Specifically, the frequency range 0 ≤ n ≤ 127 obtained by the processing up to S7 is divided into 24 bands, and E2(q, n) is corrected band by band.

The band-pass filters used for auditory filter correction in this embodiment are described here. FIG. 11 shows band-pass filters created on the basis of Non-Patent Document 1. In FIG. 11, the 24 bands are assigned band numbers 1 to 24, and the lower limit frequency, center frequency, upper limit frequency, and bandwidth are shown for the band identified by each band number. The lower and upper limit frequencies are the lower and upper bounds of each band. They coincide with the lower and upper cutoff frequencies of the corresponding band-pass filter, that is, the frequencies below or above which frequency components are not passed (the filter gain in dB approaches minus infinity). The center frequency is the peak frequency at which the band-pass filter passes the signal component most strongly (the filter gain in dB approaches plus infinity). The bandwidth is the width of each band, namely the difference between the upper and lower limit frequencies.

FIG. 12 shows the band-pass filters of FIG. 11 converted into MIDI note number units. That is, the frequency f(n) (in Hz) shown in FIG. 11 and the note number n shown in FIG. 12 satisfy f(n) = 440·2^((n−69)/12). However, so that no note number belongs to more than one band, the upper limit pitch of each band is set to one less than the lower limit pitch of the next higher band.
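
The conversion between note number and frequency can be written as a short Python sketch. The function names are illustrative, and the inverse function is simply the stated relation solved for n.

```python
import math

def note_to_hz(n):
    """f(n) = 440 * 2^((n - 69) / 12), the relation stated in the text."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

def hz_to_note(f):
    """Inverse of note_to_hz; may return a fractional note number."""
    return 69 + 12 * math.log2(f / 440.0)

# Usage: note number 69 corresponds to 440 Hz, and 12 note numbers span one octave.
a4 = note_to_hz(69)
a5 = note_to_hz(81)
```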

Each band is identified by its band number b as the range of note numbers n satisfying Nr(b) ≤ n < Nr(b+1). The 25 band boundary values Nr(b) (b = 1, ..., 25) are Nr(b) = {17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98, 101, 104, 106, 109, 113, 116, 119, 123, 127, 128}. Of these, the values for b = 1 to 24 are the lower limit pitches of the respective bands; only the boundary value 128 for b = 25 is not a lower limit but closes the highest band, so that band 24 is 127 ≤ n < 128. The band-pass filters shown in FIGS. 11 and 12 are typical examples, and the upper limit, center, and lower limit values may be changed as appropriate. Likewise, although 24 bands is the currently accepted number, the number of bands is not limited to this and may be changed in light of new findings.
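
As an illustration of the band definition Nr(b) ≤ n < Nr(b+1), the following Python sketch looks up the band number for a note number. The function name is hypothetical; the boundary values are those listed above.

```python
from bisect import bisect_right

# Band boundary values Nr(1) .. Nr(25) from the text.
NR = [17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98,
      101, 104, 106, 109, 113, 116, 119, 123, 127, 128]

def band_of(n):
    """Return the band number b (1..24) with Nr(b) <= n < Nr(b+1), or None."""
    if n < NR[0] or n >= NR[-1]:
        return None               # note numbers below 17 belong to no band
    return bisect_right(NR, n)

# Usage: note number 69 is the lower limit pitch of band 5.
band_a4 = band_of(69)
```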

FIG. 13 is a flowchart showing the details of the auditory filter correction in S8. First, the band number b is set to 1 (S21), so that processing starts from band number 1, the lowest band. Next, within the range from the lower limit pitch Nr(b) to the upper limit pitch Nr(b+1) − 1 of band number b, the maximal note number nbmax at which the correlation value array E2(q, n) is largest is searched for (S22).

Next, within the same range from the lower limit pitch Nr(b) to the upper limit pitch Nr(b+1) − 1 of band number b, the values of the correlation value array E2(q, n) for note numbers n other than the maximal note number nbmax are attenuated by a predetermined ratio (S23). Specifically, E2(q, n) for each note number n other than nbmax is multiplied by a predetermined real value γ less than 1. Since γ serves to attenuate, any value less than 1 may be chosen; in this embodiment γ = 0.1. The processing in S23 therefore attenuates each correlation value E2(q, n) for note numbers other than nbmax to 1/10.

After the search for the maximal note number nbmax and the attenuation of the correlation values for the other note numbers have been carried out for band number b, the band number b is incremented (b ← b + 1) (S24). The band number b is then compared with band number 24, the highest band (S25). If b is 24 or less, the processing returns to S22, and the search for nbmax (S22) and the attenuation of the correlation values for the other note numbers (S23) are carried out for the next band. If b is 25 or more, all bands have been processed and the auditory filter correction processing ends.

The result of the auditory filter correction processing in S8 is output as the corrected correlation intensity array E'(q, n). Of the 128 values E2(q, n), 104 are attenuated in E'(q, n) and the remaining 24 are left unattenuated.
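
The loop S21 to S25 can be sketched in Python as follows. The function name is hypothetical. How note numbers below Nr(1) = 17 are treated is not spelled out in this passage, so this sketch leaves them unchanged; when several values in a band tie for the maximum, it keeps the lowest note number. Both choices are assumptions.

```python
NR = [17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98,
      101, 104, 106, 109, 113, 116, 119, 123, 127, 128]
GAMMA = 0.1  # attenuation factor of this embodiment

def auditory_filter_correction(e2, gamma=GAMMA):
    """Keep the strongest note of each band and attenuate the others.

    e2: list of 128 correlation values E2(q, n) for one selected unit section.
    Returns the corrected array E'(q, n).
    """
    e_corr = list(e2)
    for b in range(1, 25):                                # S21, S24, S25
        lo, hi = NR[b - 1], NR[b]                         # band covers lo <= n < hi
        nbmax = max(range(lo, hi), key=lambda n: e2[n])   # S22
        for n in range(lo, hi):                           # S23
            if n != nbmax:
                e_corr[n] = e2[n] * gamma
    return e_corr

# Usage with a hypothetical rising spectrum: the top note of each band survives.
e2 = [float(n) for n in range(128)]
e_corr = auditory_filter_correction(e2)
```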

After the auditory filter correction processing, single-tone components in consecutive selected unit sections are connected (merged) (S9). Specifically, when the single-tone components in consecutive selected unit sections satisfy predetermined connection conditions, the two single-tone components are connected. FIG. 14 shows the relationship between the selected unit sections and the unit sections used in the connection determination.

FIG. 14, like FIG. 4(b), shows an example in which unit sections 1, 5, and 6 of unit sections 1 to 6 have been selected as selected unit sections 1, 2, and 3, respectively. A feature of this embodiment is that the connection determination is made not between the selected unit sections chosen as tone components, but between a selected unit section and its adjacent unit section. In the example of FIG. 14, when deciding whether to connect selected unit section 1 and selected unit section 2 (unit section 5), the connection conditions are evaluated not between selected unit section 1 and selected unit section 2 (unit section 5) but between selected unit section 2 (unit section 5) and the immediately preceding unit section 4. The temporally nearest unit section is thus taken into account in the connection determination, so tone components can be connected appropriately.

Any state in which continuity as the same tone holds may be set as the connection condition. In this embodiment, four conditions are evaluated to decide whether to connect selected unit section q and selected unit section q+1. The first condition is that the value obtained by subtracting the intensity of the single-tone component in the immediately preceding unit section P(q+1)−1 from the intensity of the single-tone component in selected unit section q+1 (unit section P(q+1)) is less than a predetermined threshold Ldif. The second condition is that both of these intensities are greater than a predetermined threshold Lmin. The third condition is that the frequency difference, taking the sub-frequency into account and allowing for a shift of one note number up or down, is less than a predetermined threshold Ndif. The fourth condition is that, when selected unit section q is already connected to other selected unit sections, the frequency difference, taking the sub-frequency into account, between the head selected unit section qo of that connected run and selected unit section q+1 is less than a predetermined threshold Nadif. When all four conditions are satisfied, the components are regarded as continuous and the following single-tone component is connected to the preceding one. The shift of one note number up or down is also allowed for in the first and second conditions. In this embodiment, whether to connect is decided by checking whether the four conditions of [Formula 7] to [Formula 9] below, which allow for the note number shift, are satisfied.

[Formula 7]
E1(P(q+1), n) − E1(P(q+1)−1, n) < Ldif
E1(P(q+1)−1, n) > Lmin and E1(P(q+1), n) > Lmin
|S(P(q+1)−1, n) − S(P(q+1), n)| < Ndif
|S(P(qo), n) − S(P(q+1), n)| < Nadif

In [Formula 7] above, the first line states that the value obtained by subtracting the intensity of the preceding single-tone component from the intensity of the following single-tone component is less than the predetermined threshold Ldif. The second line states that the intensities of the preceding and following single-tone components are both greater than the predetermined threshold Lmin. The third line states that the frequency difference in note number units, taking the sub-frequency into account, is less than the predetermined threshold Ndif. The fourth line states that, when the preceding selected unit section is already connected to selected unit sections before it, the frequency difference in note number units, taking the sub-frequency into account, between the head selected unit section qo and the following selected unit section q+1 is less than the predetermined threshold Nadif.

[Formula 8]
E1(P(q+1), n−1) − E1(P(q+1)−1, n) < Ldif
E1(P(q+1)−1, n) > Lmin and E1(P(q+1), n−1) > Lmin
|S(P(q+1)−1, n) − S(P(q+1), n−1) − M| < Ndif
|S(P(qo), n) − S(P(q+1), n−1) − M| < Nadif

Whereas [Formula 7] above compares components with the same note number, [Formula 8] differs in that the following single-tone component is taken one note number lower. Expressed in note number units, the third line states that the difference between the sub-frequency S(P(q+1), n−1) that determines the first spectral intensity E1(P(q+1), n−1) and the sub-frequency S(P(q+1)−1, n) that determines the first spectral intensity E1(P(q+1)−1, n) is less than the predetermined threshold Ndif. The fourth line likewise states that the difference between the sub-frequency S(P(q+1), n−1) and the sub-frequency S(P(qo), n) that determines the first spectral intensity E1(P(qo), n) is less than the predetermined threshold Nadif.

[Formula 9]
E1(P(q+1), n+1) − E1(P(q+1)−1, n) < Ldif
E1(P(q+1)−1, n) > Lmin and E1(P(q+1), n+1) > Lmin
|S(P(q+1)−1, n) − S(P(q+1), n+1) + M| < Ndif
|S(P(qo), n) − S(P(q+1), n+1) + M| < Nadif

Whereas [Formula 7] above compares components with the same note number, [Formula 9] differs in that the following single-tone component is taken one note number higher. Expressed in note number units, the third line states that the difference between the sub-frequency S(P(q+1), n+1) that determines the first spectral intensity E1(P(q+1), n+1) and the sub-frequency S(P(q+1)−1, n) that determines the first spectral intensity E1(P(q+1)−1, n) is less than the predetermined threshold Ndif. The fourth line likewise states that the difference between the sub-frequency S(P(q+1), n+1) and the sub-frequency S(P(qo), n) that determines the first spectral intensity E1(P(qo), n) is less than the predetermined threshold Nadif.

As the specific procedure in this embodiment, it is first determined whether all the conditions of [Formula 7] are satisfied. If they are, the processing proceeds to connection. If any condition of [Formula 7] is not satisfied, it is determined whether all the conditions of [Formula 8] are satisfied, and if they are, the processing proceeds to connection. If any condition of [Formula 8] is not satisfied, it is determined whether all the conditions of [Formula 9] are satisfied, and if they are, the processing proceeds to connection. If any condition of [Formula 9] is not satisfied, it is decided that no connection is made.
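
The cascade over [Formula 7] to [Formula 9] can be sketched in Python as below. The function names and the data layout (E1 and S indexed first by unit section, then by note number) are illustrative assumptions. The signs of the M terms are taken literally from the formulas, the fourth condition is skipped when there is no earlier connected head section (passed here as None), and S and the thresholds are assumed to be expressed in the same unit.

```python
def satisfies(e1, s, p_next, p_head, n, d, m, ldif, lmin, ndif, nadif):
    """Check one formula; d = 0 (Formula 7), -1 (Formula 8), +1 (Formula 9)."""
    p_prev = p_next - 1              # unit section immediately before P(q+1)
    nn = n + d                       # note number used for the following component
    if not 0 <= nn <= 127:
        return False
    if not e1[p_next][nn] - e1[p_prev][n] < ldif:                 # line 1
        return False
    if not (e1[p_prev][n] > lmin and e1[p_next][nn] > lmin):      # line 2
        return False
    if not abs(s[p_prev][n] - s[p_next][nn] + d * m) < ndif:      # line 3
        return False
    if p_head is not None:                                        # line 4
        if not abs(s[p_head][n] - s[p_next][nn] + d * m) < nadif:
            return False
    return True

def should_connect(e1, s, p_next, p_head, n, m, ldif, lmin, ndif, nadif):
    """Try Formula 7, then Formula 8, then Formula 9; connect on the first success."""
    return any(satisfies(e1, s, p_next, p_head, n, d, m, ldif, lmin, ndif, nadif)
               for d in (0, -1, 1))

# Usage with hypothetical values for unit sections 4 and 5 at note number 60.
e1 = {4: [0.0] * 128, 5: [0.0] * 128}
s = {4: [0] * 128, 5: [0] * 128}
e1[4][60], e1[5][60] = 50.0, 52.0
s[4][60], s[5][60] = 3, 4
same_note = should_connect(e1, s, 5, None, 60, m=25, ldif=10, lmin=1, ndif=4, nadif=8)
```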

After connection, the main frequency, sub-frequency, and intensity are each taken from whichever of the two single-tone components is larger, and the time length is the sum of the two time lengths. The intensity used here is the value after auditory filter correction. The specific thresholds for the connection conditions in this embodiment are Ldif = 10 [unit: 128-step velocity], Lmin = 1 [unit: 128-step velocity], Ndif = 4/25 [unit: note number], and Nadif = 8/25 [unit: note number]. Because the connection processing is performed before conversion to codes, each threshold is expressed in terms of note number or velocity. Single-tone components that were not connected in S9 remain as they are. Connection yields connected tone components, which, like single-tone components, have the form [start time, time length, main frequency n, sub-frequency S(P(q), n), intensity E'(q, n)] but with a time length greater than that of a single-tone component. After the connection processing, single-tone components and connected tone components coexist; these are hereinafter collectively called tone components.

The connection processing in S9 is generally desirable, because the result is expressed with longer notes, the code amount decreases, and a MIDI sound source performs more smoothly and naturally. It is not essential, however: unless pitch bend codes or the like are added, subtle temporal variations such as vibrato are lost, and the result may sound unnatural on a MIDI sound source. When the connection processing in S9 is not performed, everything is expressed as short notes.
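
Merging two tone components in S9 can be sketched in Python as follows. The dictionary layout and function name are hypothetical, and the sketch assumes that "larger" refers to the larger intensity and that the connected component keeps the start time of the preceding component.

```python
def connect(a, b):
    """Connect following component b to preceding component a.

    Each component is a dict with keys start, length, note, sub, intensity
    (hypothetical layout); intensity is the value after auditory filter correction.
    """
    stronger = a if a["intensity"] >= b["intensity"] else b
    return {
        "start": a["start"],                    # start of the preceding component
        "length": a["length"] + b["length"],    # sum of both time lengths
        "note": stronger["note"],
        "sub": stronger["sub"],
        "intensity": stronger["intensity"],
    }

# Usage with hypothetical components.
first = {"start": 0.0, "length": 0.5, "note": 60, "sub": 2, "intensity": 5.0}
second = {"start": 0.5, "length": 0.25, "note": 61, "sub": 0, "intensity": 8.0}
merged = connect(first, second)
```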

After the connection processing of S9, the finally obtained tone components [start time, time length, main frequency n, sub-frequency S(P(q), n), intensity E'(q, n)] are converted into codes (S10). The codes may take any format that carries frequency information, the spectral intensity corresponding to each frequency, and time information that identifies the start and end of the unit section; in this embodiment they are converted into MIDI format. Because MIDI issues the start and end of sounding as separate events, one tone component is converted into two MIDI note events in this embodiment. Specifically, a note-on event for note number n is issued at the "start time", with the velocity value given by 128·{E'(q, n)/Emax}^(1/4), where Emax is the maximum value of the intensity E'(q, n). In a Standard MIDI File, times must be given relative to the immediately preceding event (delta time); the time unit can be defined by an arbitrary integer value, and the times are given after conversion to units of, for example, 1/1536 [second]. A note-off event for note number n is then issued at the end time, whose absolute time is "start time" + "time length" (in delta time, the end time given by "time length"). Here the time length is multiplied by a real number from 0 to 1. Although this depends on the timbre of the MIDI sound source used, the purpose is to issue the note-off instruction early to allow for the lingering decay of the MIDI sound source. Using the time length unchanged causes no problem in processing by the MIDI sound source, but the sound may then partially overlap the following note.
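
The conversion of tone components into note-on and note-off events with delta times can be sketched in Python as below. This is not a Standard MIDI File writer; it only produces an event list. The function names, the gate factor default of 0.9, the rounding, and the clamping of the velocity to the range 1 to 127 are assumptions added for illustration (the formula itself yields 128 when E' equals Emax).

```python
TICKS_PER_SECOND = 1536   # time unit of 1/1536 second, as in the example in the text

def velocity(e, emax):
    """128 * (E'/Emax)^(1/4), rounded and clamped to 1..127 (clamping is an assumption)."""
    v = round(128 * (e / emax) ** 0.25)
    return max(1, min(127, v))

def note_events(components, emax, gate=0.9):
    """Convert (start_s, length_s, note, intensity) tuples into delta-time events.

    gate: real number from 0 to 1 that shortens the time length so that the
    note-off is issued early. Returns (delta_ticks, kind, note, velocity) tuples.
    """
    events = []
    for start, length, note, e in components:
        on = round(start * TICKS_PER_SECOND)
        off = round((start + length * gate) * TICKS_PER_SECOND)
        events.append((on, "on", note, velocity(e, emax)))
        events.append((off, "off", note, 0))
    # Order by absolute tick; at equal ticks, note-offs come first.
    events.sort(key=lambda ev: (ev[0], ev[1] != "off"))
    out, prev = [], 0
    for tick, kind, note, vel in events:
        out.append((tick - prev, kind, note, vel))   # delta from the previous event
        prev = tick
    return out

# Usage with two hypothetical tone components.
events = note_events([(0.0, 1.0, 60, 16.0), (1.0, 0.5, 64, 1.0)], emax=16.0, gate=0.5)
```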

After the code conversion processing of S10, the necessary adjustments are made to the codes (S11). For example, when converting to MIDI codes, the number of simultaneous notes must be adjusted to take into account the number of simultaneous notes that the MIDI sound source can process. When the MIDI sound source can process 32 simultaneous notes, the number of note events that are sounding (in the note-on state) is counted continuously along the time axis. Where more than 32 note events exist at the same time, the note-off event paired with each note-on is searched for in the neighboring section, the priority of each note event pair is evaluated by the product (energy value) of its velocity value and duration value (note-off time − note-on time), and note event pairs of low priority (small energy value) are deleted locally so that the count does not exceed the specified number of simultaneous notes (here, 32). "Locally" means only in the portions where more than 32 note event pairs exist. In addition, any pair whose velocity value or duration value is below a predetermined lower limit is deleted regardless of priority.

At this stage the effect of the auditory filter correction proposed in the present application is pronounced: of the 32 note events selected, 24 are likely to be chosen one each from the note numbers in the 24 bands defined in FIG. 12, and the remaining 8 are likely to be chosen as additional notes from any 8 of those bands. In practice, as shown in FIG. 5, the note numbers to be encoded are often limited to the piano range of 21 to 108. The bands of FIG. 12 that are effective are then limited to bands 1 to 18, so that at least one note number is likely to be chosen from each of the 18 bands and the remaining 14 are likely to be chosen as additional notes from any 14 of those bands. This correction only raises the probability of such a selection, however; depending on the selected unit section, the selected note events may come from fewer than 18 bands, or three or more note numbers may be selected from a single band. When MIDI data encoded in this way is played back on a MIDI sound source capable of sounding 32 or more notes simultaneously, the notes are not confined to particular frequency bands but cover all the frequency bands that the human auditory system can hear simultaneously, so a listener can hear the result clearly, with a sense of presence comparable to listening to the original acoustic signal before encoding.
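
One possible way to realize the adjustment of the number of simultaneous notes is the greedy Python sketch below. It is a simplified variant, not necessarily the procedure of the embodiment: note pairs are visited from highest to lowest energy (velocity × duration), and a pair is kept only if fewer than the permitted number of already kept notes sound at every instant of its duration, so pairs in uncrowded portions are never removed. The function name, tuple layout, and default lower limits are assumptions.

```python
def limit_polyphony(notes, max_voices=32, min_velocity=1, min_duration=0.0):
    """notes: list of (on_time, off_time, note_number, velocity) pairs."""
    # Pairs below either lower limit are deleted regardless of priority.
    candidates = [n for n in notes
                  if n[3] >= min_velocity and n[1] - n[0] >= min_duration]
    # Highest energy (velocity * duration) first.
    candidates.sort(key=lambda n: n[3] * (n[1] - n[0]), reverse=True)
    kept = []
    for cand in candidates:
        # The count of sounding notes can only rise at a note-on instant,
        # so it suffices to check the candidate's start and the starts inside it.
        instants = [cand[0]] + [k[0] for k in kept if cand[0] < k[0] < cand[1]]
        fits = all(sum(1 for k in kept if k[0] <= t < k[1]) < max_voices
                   for t in instants)
        if fits:
            kept.append(cand)
    return sorted(kept)

# Usage with a hypothetical limit of 2 simultaneous notes.
notes = [(0.0, 1.0, 60, 100), (0.0, 1.0, 64, 50), (0.0, 1.0, 67, 10), (2.0, 3.0, 72, 5)]
limited = limit_polyphony(notes, max_voices=2)
```

In the usage example the weakest of the three overlapping notes is removed, while the low-energy note at 2.0 seconds survives because nothing else is sounding there.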

The bit rate is also adjusted, to take into account the bit rate that can be processed for the codes. When converting to MIDI codes, the number of note event pairs is counted along the time axis at intervals of, for example, 1 second. Taking the data amount of each code as 5 bytes (40 bits) on average and the maximum bit rate that the MIDI sound source can process as 9000 [bps (bits/second)], a section in which the number of events per second exceeds 9000/40 = 225 is handled as follows: for each note-on or note-off event in that section, the paired note-off or note-on event is searched for in the neighboring section, the priority of each note event pair is evaluated by the product (energy value) of its velocity value and duration value (note-off time − note-on time), and note event pairs of low priority (small energy value) are deleted locally so that the count does not exceed the specified number of events (here, 225). In addition, any pair whose velocity value or duration value is below a predetermined lower limit is deleted regardless of priority. At this stage too, the effect of the auditory filter correction proposed in the present application is pronounced, and the note event for at least one note number in each of the 24 bands defined in FIG. 12 (18 bands when operation is limited to the piano range) is likely to survive without being deleted.

The preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications are possible. For example, in the above embodiment the analysis uses M microtones (sub-frequencies) between note numbers, but the analysis may instead use only the N types of frequencies corresponding to the note numbers, without microtones. In this case the analysis accuracy drops slightly, but the processing load is reduced because fewer frequencies are analyzed. When microtones are not used, the expressions on the third and fourth lines of [Equation 7] to [Equation 9] are not evaluated in the decision for the single-tone component connection processing of S9.

In the above embodiment, the number of samples is expanded in the time-series direction in S1 before the unit sections are set, and, after the single-tone components are created, each single-tone component is corrected in S7 by multiplying its frequency by K and its time axis by 1/K. This processing need not be performed when a particularly high time resolution is not required.

In the above embodiment, the frequency analysis is divided into a first frequency analysis and a second frequency analysis, and the second frequency analysis is performed on the selected unit sections that satisfy a predetermined condition as a result of the first frequency analysis. Alternatively, a known frequency analysis such as those disclosed in Patent Documents 1 to 3 may be performed on every unit section.

The present invention uses a technique for converting an acoustic signal obtained by PCM or the like into code data such as MIDI code, and can be used in the audio content production industry for broadcast media (digital radio and television broadcasting via terrestrial, BS, and similar channels), communication media (CS broadcasting, Internet streaming broadcasting, mobile phone services, mobile music distribution services, and the like), and package media (CD, DVD, Blu-ray, memory IC cards, and the like).

1 ... CPU
2 ... RAM
3 ... Data storage device
4 ... Program storage device
5 ... Key input I/F
6 ... Data input/output I/F
7 ... Display output I/F

Claims

A method for encoding an acoustic signal given as a time-series sequence of J samples digitized at a predetermined sampling frequency, the method comprising:
a section setting stage of setting, on the sample sequence, unit sections each composed of a predetermined number T (T < J) of samples, while overlapping each unit section with the adjacent unit section by a predetermined number W (W < T) of samples in the time-axis direction;
a first spectrum calculation stage of calculating, for each unit section p, a first spectral intensity E1(p, n) corresponding to each of N types of frequencies by performing frequency conversion for at least the N types of frequencies f(n) to be analyzed;
a second spectrum calculation stage of selecting the unit section p as the q-th (q ≤ p) selected unit section q only when an evaluation value, based on the per-frequency change from the first spectral intensity E1(p-1, n) of the unit section p-1 located immediately before the unit section p, is greater than a predetermined threshold, and of calculating a second spectral intensity E2(q, n) corresponding to the N types of frequencies by performing, for at least the N types of frequencies f(n) to be analyzed, a frequency conversion of higher accuracy than the frequency conversion of the first spectrum calculation stage;
a spectrum correction stage of dividing the N types of frequencies f(n) into a predetermined number of frequency groups that do not overlap each other, and of generating a corrected spectral intensity E'(q, n) by attenuating by a predetermined ratio, within each frequency group, the second spectral intensity E2(q, n) of every frequency other than the frequency having the maximum second spectral intensity E2(q, n); and
an encoding stage of generating code data of a predetermined format based on the time difference between the start time of a unit section and the start time of the immediately following unit section, and on the corrected spectral intensity of that unit section.

In claim 1,
An acoustic signal encoding method characterized in that the predetermined number of frequency groups is set to 24, based on the characteristics of the human auditory system, with the value of n defined as a standard note number and with n = 17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98, 101, 104, 106, 109, 113, 116, 119, 123, 127, and 128 as the group boundaries.

In claim 1,
An acoustic signal encoding method characterized in that the predetermined number of frequency groups is set to 24, based on the characteristics of the human auditory system, with the values 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, and 15500 Hz of the frequency f(n) as the group boundaries.
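As an illustration of how the 25 boundary values yield 24 groups and how the correction within each group might operate, the following is a minimal sketch. The function names, the attenuation ratio of 0.5, and the reading of "attenuate by a predetermined ratio" as a multiplication are assumptions made for this example and are not taken from the claims.

```python
import bisect

# The 25 boundary frequencies listed in the claim; adjacent pairs delimit 24 groups.
BAND_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500]


def band_of(freq_hz):
    """Return the 0-based group index of a frequency, or None if out of range."""
    if not BAND_EDGES_HZ[0] <= freq_hz < BAND_EDGES_HZ[-1]:
        return None
    return bisect.bisect_right(BAND_EDGES_HZ, freq_hz) - 1


def correct_spectrum(freqs_hz, e2, ratio=0.5):
    """Keep the strongest frequency of each group and attenuate the others.

    freqs_hz[i] is f(n) and e2[i] the matching second spectral intensity.
    ratio is a hypothetical attenuation factor (assumption).
    """
    peak = {}  # group index -> index of the strongest frequency in that group
    for i, f in enumerate(freqs_hz):
        b = band_of(f)
        if b is not None and (b not in peak or e2[i] > e2[peak[b]]):
            peak[b] = i
    corrected = []
    for i, f in enumerate(freqs_hz):
        b = band_of(f)
        keep = b is None or peak[b] == i
        corrected.append(e2[i] if keep else e2[i] * ratio)
    return corrected
```

For example, 440 Hz and 466.16 Hz both fall in the 400 to 510 Hz group, so only the weaker of the two is attenuated, while a component at 523.25 Hz sits alone in the next group and is left unchanged.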

In any one of claims 1 to 3,
An acoustic signal encoding method characterized in that, in the first spectrum calculation stage and the second spectrum calculation stage, M predetermined types of sub-frequencies f(n, m) are set for each of the N types of frequencies f(n) within a range not exceeding the adjacent frequencies, and the intensity value corresponding to the sub-frequency showing the highest intensity among the M types of sub-frequencies is calculated as the first spectral intensity E1(p, n) and the second spectral intensity E2(q, n).
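The selection of the strongest of M sub-frequencies per note can be sketched as below. The sketch makes several assumptions that the claim does not state: equal temperament with note 69 at 440 Hz, sub-frequencies spaced evenly on a logarithmic scale between f(n) and f(n+1), and a simple sine/cosine correlation as the intensity measure. All function names are hypothetical.

```python
import math


def sub_frequency(n, m, M):
    """Hypothetical f(n, m): m-th of M log-spaced steps from note n toward n + 1."""
    return 440.0 * 2.0 ** ((n - 69 + m / M) / 12.0)


def intensity(signal, freq, fs):
    """Amplitude estimate from correlation with a sine/cosine pair at freq."""
    a = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    b = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(a, b) / len(signal)


def note_intensity(signal, n, M, fs):
    """Return (intensity, m) for the sub-frequency of note n with the highest intensity."""
    values = [intensity(signal, sub_frequency(n, m, M), fs) for m in range(M)]
    best = max(range(M), key=values.__getitem__)
    return values[best], best
```

With M = 1 the function reduces to analyzing only the note-number frequency itself, which corresponds to the lighter-weight variant without sub-frequencies.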

In any one of claims 1 to 4,
An acoustic signal encoding method characterized in that the encoding stage, for two adjacent selected unit sections q and q+1, defines P(q) = p when the selected unit section q is the p-th unit section p, connects the selected unit section q and the selected unit section q+1 when the difference between the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) corresponding to the frequencies f(n), f(n-1), f(n+1) in the selected unit section q+1 and the first spectral intensity E1(P(q+1)-1, n) corresponding to the frequency f(n) in the unit section P(q+1)-1 located immediately before the selected unit section q+1 is less than a predetermined threshold Ldif, and the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the first spectral intensity E1(P(q+1)-1, n) are greater than a predetermined threshold Lmin, and uses, as the time difference on which the code data is based, the time difference between the start time of the selected unit section q and the start time of the selected unit section q+2 immediately following the selected unit section q+1.

In claim 5,
An acoustic signal encoding method characterized in that the encoding stage connects the selected unit section q and the selected unit section q+1 only when the further condition is satisfied that the difference, in note number units, between the sub-frequency that determines at least any one of the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the sub-frequency that determines the first spectral intensity E1(P(q+1)-1, n) is less than a predetermined threshold Ndif.

In claim 6,
An acoustic signal encoding method characterized in that, when the selected unit section q is already connected to another selected unit section, the first selected unit section of the connection to which the selected unit section q belongs is defined as qo, and
the encoding stage connects the selected unit section q and the selected unit section q+1 only when the condition is satisfied that the difference, in note number units, between the sub-frequency that determines at least any one of the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the sub-frequency that determines the first spectral intensity E1(P(qo), n) is less than a predetermined threshold Nadif.

In any one of claims 1 to 7,
An acoustic signal encoding method characterized in that the first spectrum calculation stage comprises:
an element signal preparation stage of preparing N element signals, which are to be constituent elements of the section signal of a unit section, each as T(n) samples, where T(n) corresponds to an integer multiple of the period of the frequency f(n) and is closest to T; and
a correlation calculation stage of calculating the first spectral intensity E1(p, n) by performing a correlation operation between the element signal corresponding to each of the N frequencies f(n) and the section signal composed of the T(n) samples of the corresponding unit section p,
and the second spectrum calculation stage comprises:
the element signal preparation stage;
a harmonic signal selection stage of performing a correlation operation between the element signal corresponding to each of the N frequencies f(n) and the section signal composed of the T(n) samples of the corresponding selected unit section q, and selecting the element signal corresponding to the frequency f(nmax) with the highest correlation value as a harmonic signal; and
a difference signal calculation stage of taking the T(nmax) samples given by the product of the selected harmonic signal and the correlation value obtained for it as a contained signal, and subtracting the contained signal from the section signal to obtain a difference signal composed of T(nmax) samples,
wherein the T(n) samples updated to reflect the T(nmax) samples of the difference signal are used as a new section signal, the harmonic signal selection stage and the difference signal calculation stage are executed repeatedly to obtain new contained signals and difference signals until N contained signals are obtained, and the second spectral intensity E2(q, n) corresponding to the N types of frequencies is calculated based on the correlation values of the obtained contained signals.
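The iterative select-and-subtract procedure of the second spectrum calculation can be sketched as follows. This is an illustrative simplification under stated assumptions: each element signal is modeled as a sine/cosine pair, the number of passes is a caller-chosen parameter rather than N, and the helper names are hypothetical.

```python
import math


def element_length(freq, fs, T):
    """T(n): sample count closest to T that spans a whole number of periods."""
    period = fs / freq
    cycles = max(1, round(T / period))
    return min(T, max(1, round(cycles * period)))


def correlate(signal, freq, fs, length):
    """Sine and cosine correlation coefficients over the first `length` samples."""
    a = 2.0 / length * sum(signal[i] * math.sin(2 * math.pi * freq * i / fs)
                           for i in range(length))
    b = 2.0 / length * sum(signal[i] * math.cos(2 * math.pi * freq * i / fs)
                           for i in range(length))
    return a, b


def harmonic_analysis(section, freqs, fs, passes):
    """Repeatedly pick the best-correlated frequency and subtract its contribution."""
    residual = list(section)
    T = len(residual)
    coef = [[0.0, 0.0] for _ in freqs]  # accumulated (sine, cosine) per frequency
    for _ in range(passes):
        best = None
        for k, f in enumerate(freqs):
            length = element_length(f, fs, T)
            a, b = correlate(residual, f, fs, length)
            mag = math.hypot(a, b)
            if best is None or mag > best[0]:
                best = (mag, k, a, b, length)
        _, k, a, b, length = best
        f = freqs[k]
        # Subtract harmonic signal x correlation value over T(nmax) samples.
        for i in range(length):
            phase = 2 * math.pi * f * i / fs
            residual[i] -= a * math.sin(phase) + b * math.cos(phase)
        coef[k][0] += a
        coef[k][1] += b
    return [math.hypot(a, b) for a, b in coef], residual
```

Because each pass removes the dominant component before the next correlation, weaker components are measured with less interference from stronger ones, which is the sense in which this conversion is more accurate than a single correlation pass.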

In claim 8,
An acoustic signal encoding method characterized in that the correlation calculation stage,
starting from the previous correlation calculation result corresponding to each frequency f(n) in the immediately preceding unit section p-1, performs a correlation calculation on the first W samples of the unit section p-1 and subtracts the resulting correlation value for each frequency from the previous correlation calculation result, performs a correlation calculation on the last W samples of the T(n) samples of the unit section p and adds the resulting correlation value for each frequency to the previous correlation calculation result, thereby obtaining the correlation calculation result corresponding to each frequency f(n) in the unit section p, and calculates the first spectral intensity E1(p, n) based on that correlation calculation result.
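The incremental update described here avoids recomputing the full correlation for every unit section. The following is a minimal sketch under stated assumptions: the section is taken to advance by W samples per step (as the wording "first W samples" and "last W samples" suggests), a single fixed length T is used instead of a per-frequency T(n), and the correlation is kept as a complex sum referenced to the absolute sample index so that partial sums can be reused. The function names are hypothetical.

```python
import cmath
import math


def partial(signal, start, stop, freq, fs):
    """Complex correlation sum over samples [start, stop), phase tied to absolute index."""
    return sum(signal[i] * cmath.exp(-2j * math.pi * freq * i / fs)
               for i in range(start, stop))


def sliding_intensities(signal, freq, fs, T, W):
    """Intensity of `freq` for successive sections of T samples, shifted by W each time."""
    acc = partial(signal, 0, T, freq, fs)          # full correlation, first section only
    out = [2.0 * abs(acc) / T]
    start = 0
    while start + W + T <= len(signal):
        acc -= partial(signal, start, start + W, freq, fs)          # leaving samples
        acc += partial(signal, start + T, start + T + W, freq, fs)  # entering samples
        start += W
        out.append(2.0 * abs(acc) / T)
    return out
```

Each update touches only 2W samples instead of T, so the saving grows as the overlap between adjacent sections increases.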

In any one of claims 1 to 9,
An acoustic signal encoding method characterized in that the encoding stage performs encoding using the MIDI format as the code data, uses a note number as the frequency information of the code data, uses the second spectral intensity as the velocity, defines for the selected unit section q a delta time 1 and a delta time 2, each being a relative time from the immediately preceding MIDI event, creates a MIDI note-on event based on the defined note number, velocity, and delta time 1, and creates a MIDI note-off event based on the note number and delta time 2.
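For illustration, a note-on/note-off pair with its two delta times can be serialized as Standard MIDI File track data as sketched below. The function names, the default channel, and the note-off velocity of 0 are assumptions for this example; the mapping from spectral intensity to a velocity value is outside its scope.

```python
def vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def note_events(note, velocity, delta1, delta2, channel=0):
    """Return the bytes of a note-on (after delta1 ticks) and note-off (after delta2 ticks)."""
    note_on = vlq(delta1) + bytes([0x90 | channel, note, velocity])
    note_off = vlq(delta2) + bytes([0x80 | channel, note, 0])
    return note_on + note_off
```

Calling `note_events(60, 100, 0, 480)` yields a middle C that starts immediately after the preceding event and ends 480 ticks later.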

An apparatus for encoding an acoustic signal given as a time-series sequence of J samples digitized at a predetermined sampling frequency, the apparatus comprising:
section setting means for setting, on the sample sequence, unit sections each composed of a predetermined number T (T < J) of samples, while overlapping each unit section with the adjacent unit section by a predetermined number W (W < T) of samples in the time-axis direction;
spectrum calculation means for calculating, for each unit section, a spectral intensity corresponding to each of N types of frequencies by performing frequency conversion for at least the N types of frequencies f(n) to be analyzed;
spectrum correction means for dividing the N types of frequencies f(n) into a predetermined number of frequency groups that do not overlap each other, and for creating a corrected spectral intensity by attenuating by a predetermined ratio, for each unit section and within each frequency group, the spectral intensity of every frequency other than the frequency having the maximum spectral intensity; and
encoding means for generating code data of a predetermined format based on the time difference between the start time of a unit section and the start time of the immediately following unit section, and on the corrected spectral intensity of that unit section,
wherein the predetermined number of frequency groups is set to 24, based on the characteristics of the human auditory system, with the value of n defined as a standard note number and with n = 17, 45, 57, 64, 69, 72, 76, 79, 82, 85, 88, 91, 93, 96, 98, 101, 104, 106, 109, 113, 116, 119, 123, 127, and 128 as the group boundaries.

An apparatus for encoding an acoustic signal given as a time-series sequence of J samples digitized at a predetermined sampling frequency, the apparatus comprising:
section setting means for setting, on the sample sequence, unit sections each composed of a predetermined number T (T < J) of samples, while overlapping each unit section with the adjacent unit section by a predetermined number W (W < T) of samples in the time-axis direction;
spectrum calculation means for calculating, for each unit section, a spectral intensity corresponding to each of N types of frequencies by performing frequency conversion for at least the N types of frequencies f(n) to be analyzed;
spectrum correction means for dividing the N types of frequencies f(n) into a predetermined number of frequency groups that do not overlap each other, and for creating a corrected spectral intensity by attenuating by a predetermined ratio, for each unit section and within each frequency group, the spectral intensity of every frequency other than the frequency having the maximum spectral intensity; and
encoding means for generating code data of a predetermined format based on the time difference between the start time of a unit section and the start time of the immediately following unit section, and on the corrected spectral intensity of that unit section,
wherein the predetermined number of frequency groups is set to 24, based on the characteristics of the human auditory system, with the values 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, and 15500 Hz of the frequency f(n) as the group boundaries.

An apparatus for encoding an acoustic signal given as a time-series sequence of J samples digitized at a predetermined sampling frequency, the apparatus comprising:
section setting means for setting, on the sample sequence, unit sections each composed of a predetermined number T (T < J) of samples, while overlapping each unit section with the adjacent unit section by a predetermined number W (W < T) of samples in the time-axis direction;
first spectrum calculation means for calculating, for each unit section p, a first spectral intensity E1(p, n) corresponding to each of N types of frequencies by performing frequency conversion for at least the N types of frequencies f(n) to be analyzed;
second spectrum calculation means for selecting the unit section p as the q-th (q ≤ p) selected unit section q only when an evaluation value, based on the per-frequency change from the first spectral intensity E1(p-1, n) of the unit section p-1 located immediately before the unit section p, is greater than a predetermined threshold, and for calculating a second spectral intensity E2(q, n) corresponding to the N types of frequencies by performing, for at least the N types of frequencies f(n) to be analyzed, a frequency conversion of higher accuracy than the frequency conversion of the first spectrum calculation means;
spectrum correction means for dividing the N types of frequencies f(n) into a predetermined number of frequency groups that do not overlap each other, and for generating a corrected spectral intensity E'(q, n) by attenuating by a predetermined ratio, within each frequency group, the second spectral intensity E2(q, n) of every frequency other than the frequency having the maximum second spectral intensity E2(q, n); and
encoding means for generating code data of a predetermined format based on the time difference between the start time of a unit section and the start time of the immediately following unit section, and on the corrected spectral intensity of that unit section.

In claim 13,
An acoustic signal encoding apparatus characterized in that the first spectrum calculation means and the second spectrum calculation means set M predetermined types of sub-frequencies f(n, m) for each of the N types of frequencies f(n) within a range not exceeding the adjacent frequencies, and calculate, as the first spectral intensity E1(p, n) and the second spectral intensity E2(q, n), the intensity value corresponding to the sub-frequency showing the highest intensity among the M types of sub-frequencies.

In claim 13 or claim 14,
An acoustic signal encoding apparatus characterized in that the encoding means, for two adjacent selected unit sections q and q+1, defines P(q) = p when the selected unit section q is the p-th unit section p, connects the selected unit section q and the selected unit section q+1 when the difference between the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) corresponding to the frequencies f(n), f(n-1), f(n+1) in the selected unit section q+1 and the first spectral intensity E1(P(q+1)-1, n) corresponding to the frequency f(n) in the unit section P(q+1)-1 located immediately before the selected unit section q+1 is less than a predetermined threshold Ldif, and the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the first spectral intensity E1(P(q+1)-1, n) are greater than a predetermined threshold Lmin, and uses, as the time difference on which the code data is based, the time difference between the start time of the selected unit section q and the start time of the selected unit section q+2 immediately following the selected unit section q+1.

In claim 15,
An acoustic signal encoding apparatus characterized in that the encoding means connects the selected unit section q and the selected unit section q+1 only when the further condition is satisfied that the difference, in note number units, between the sub-frequency that determines at least any one of the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the sub-frequency that determines the first spectral intensity E1(P(q+1)-1, n) is less than a predetermined threshold Ndif.

In claim 16,
An acoustic signal encoding apparatus characterized in that, when the selected unit section q is already connected to another selected unit section, the first selected unit section of the connection to which the selected unit section q belongs is defined as qo, and
the encoding means connects the selected unit section q and the selected unit section q+1 only when the condition is satisfied that the difference, in note number units, between the sub-frequency that determines at least any one of the first spectral intensities E1(P(q+1), n), E1(P(q+1), n-1), E1(P(q+1), n+1) and the sub-frequency that determines the first spectral intensity E1(P(qo), n) is less than a predetermined threshold Nadif.

In any one of claims 13 to 17,
An acoustic signal encoding apparatus characterized in that the first spectrum calculation means comprises:
element signal preparation means for preparing N element signals, which are to be constituent elements of the section signal of a unit section, each as T(n) samples, where T(n) corresponds to an integer multiple of the period of the frequency f(n) and is closest to T; and
correlation calculation means for calculating the first spectral intensity E1(p, n) by performing a correlation operation between the element signal corresponding to each of the N frequencies f(n) and the section signal composed of the T(n) samples of the corresponding unit section p,
and the second spectrum calculation means comprises:
the element signal preparation means;
harmonic signal selection means for performing a correlation operation between the element signal corresponding to each of the N frequencies f(n) and the section signal composed of the T(n) samples of the corresponding selected unit section q, and selecting the element signal corresponding to the frequency f(nmax) with the highest correlation value as a harmonic signal; and
difference signal calculation means for taking the T(nmax) samples given by the product of the selected harmonic signal and the correlation value obtained for it as a contained signal, and subtracting the contained signal from the section signal to obtain a difference signal composed of T(nmax) samples,
wherein the T(n) samples updated to reflect the T(nmax) samples of the difference signal are used as a new section signal, the processing by the harmonic signal selection means and the difference signal calculation means is executed repeatedly to obtain new contained signals and difference signals until N contained signals are obtained, and the second spectral intensity E2(q, n) corresponding to the N types of frequencies is calculated based on the correlation values of the obtained contained signals.

In claim 18,
An acoustic signal encoding apparatus characterized in that the correlation calculation means,
starting from the previous correlation calculation result corresponding to each frequency f(n) in the immediately preceding unit section p-1, performs a correlation calculation on the first W samples of the unit section p-1 and subtracts the resulting correlation value for each frequency from the previous correlation calculation result, performs a correlation calculation on the last W samples of the T(n) samples of the unit section p and adds the resulting correlation value for each frequency to the previous correlation calculation result, thereby obtaining the correlation calculation result corresponding to each frequency f(n) in the unit section p, and calculates the first spectral intensity E1(p, n) based on that correlation calculation result.

In any one of claims 13 to 19,
An acoustic signal encoding apparatus characterized in that the encoding means performs encoding using the MIDI format as the code data, uses a note number as the frequency information of the code data, uses the second spectral intensity as the velocity, defines for the selected unit section q a delta time 1 and a delta time 2, each being a relative time from the immediately preceding MIDI event, creates a MIDI note-on event based on the defined note number, velocity, and delta time 1, and creates a MIDI note-off event based on the note number and delta time 2.

A program for causing a computer to execute the acoustic signal encoding method according to any one of claims 1 to 10.

A program for causing a computer to function as the acoustic signal encoding apparatus according to any one of claims 11 to 20.