JP5552794B2

JP5552794B2 - Method and apparatus for encoding acoustic signal

Info

Publication number: JP5552794B2
Application number: JP2009244307A
Authority: JP
Inventors: Toshio Modegi (茂出木敏雄)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2009-10-23
Filing date: 2009-10-23
Publication date: 2014-07-16
Anticipated expiration: 2029-10-23
Also published as: JP2011090189A

Description

The present invention relates to acoustic signal encoding techniques, and more particularly to an encoding technique suited to encoding an acoustic signal into code data such as MIDI-format data.

As a method for converting an acoustic signal into code data such as MIDI codes, the applicant has proposed a technique that applies frequency analysis, such as a Fourier transform, to the acoustic signal (see Patent Document 1). However, because the input acoustic signal is digitally sampled, in the high-frequency region the number of samples per period available for the correlation calculation with the harmonic functions becomes small, and the analysis accuracy deteriorates markedly. In response, the applicant proposed performing the frequency analysis on a logarithmic frequency scale, like the equal-tempered scale used in MIDI codes (see Patent Document 2); in the high-frequency region, however, this widens the frequency spacing between the reference harmonic functions, so adjacent frequencies (MIDI note numbers) can be misidentified. Accordingly, the applicant proposed a method that divides the interval between the main analysis frequencies (a semitone) into finer frequencies, improving the analysis accuracy chiefly in the high-frequency region where adjacent frequencies are widely spaced (see Patent Document 3). The applicant has further proposed a technique using generalized harmonic analysis, which mitigates the tendency of the Fourier transform to extract an excess of spurious frequency components (see Patent Document 4).

These proposed methods make it possible to analyze the frequency of stationary signals with high accuracy. An acoustic signal, however, changes in frequency over time, so ideally one would measure the instantaneous frequency at each moment, which is not feasible. Each of the above proposals therefore sets a short interval called a frame around a given time and performs a short-time frequency analysis over that interval. A longer frame (more samples) improves frequency analysis accuracy, particularly for low frequencies, whereas a shorter frame improves time resolution. In other words, frequency resolution and time resolution are in a trade-off (the uncertainty principle). The applicant therefore proposed fixing the frame length at a value long enough to analyze the lowest MIDI note (4096 samples at 44.1 kHz sampling) and improving the time resolution by advancing the frame in variable steps (see Patent Document 5).

Patent Document 1: Japanese Patent No. 3795201
Patent Document 2: Japanese Patent No. 4037542
Patent Document 3: Japanese Patent No. 4156268
Patent Document 4: Japanese Patent No. 4132362
Patent Document 5: Japanese Patent No. 4061070

The above conventional techniques have limits, however: when the frequency of a signal such as speech changes sharply, the analysis cannot follow the variation. In particular, for foreign-language speech and relatively fast Japanese speech, frequency variations cannot be extracted properly, and clear MIDI playback cannot be obtained. One contributing factor is that speech is generally recorded at a lower sampling frequency than instrument sounds, but the main factor is that its frequency changes much faster than in instrumental performance. Yet shortening the frame length any further degrades frequency analysis accuracy, which leaves a dilemma.

Accordingly, an object of the present invention is to provide an acoustic signal encoding method and apparatus that improve the time resolution of the analysis while maintaining frequency analysis accuracy equivalent to that of the prior art, and can thereby extract frequency variations, mainly in speech signals, with high accuracy.

To solve the above problem, in a first aspect of the present invention, an acoustic signal given as a time-series intensity array of J samples digitized at a predetermined sampling period is encoded as follows. The intensity array is expanded along the time axis by a predetermined integer magnification Q and converted into an expanded time-series intensity array of J × Q samples. On the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number of samples T (T < J), are set so that adjacent unit sections overlap along the time axis. For each unit section, spectral intensities corresponding to P frequencies are calculated, and P codes are created for the P frequencies obtained, each code consisting of frequency information identifying the frequency, the corresponding spectral intensity, and time information identifying the start and end of the unit section. The frequency information is corrected so that the frequency of each of the P codes is multiplied by Q; codes whose corrected frequency information exceeds the maximum frequency information before correction are deleted; and for the remaining Ph codes, the time information is corrected so that the time axis is scaled by 1/Q.

According to the first aspect, each intensity array of the digitized acoustic signal is expanded along the time axis by a predetermined magnification; then, for each unit section consisting of T intensity values, spectral intensities corresponding to P frequencies are calculated, P codes containing frequency and time are obtained, and the frequencies of the P codes are multiplied by Q and their times by 1/Q. This improves the time resolution of the analysis while maintaining frequency analysis accuracy equivalent to that of the prior art, making it possible to extract frequency variations, mainly in speech signals, with high accuracy.

In a second aspect of the present invention, to encode an acoustic signal given as a time-series intensity array of J samples digitized at a predetermined sampling period, the intensity array is expanded along the time axis by a predetermined integer magnification Q and converted into an expanded time-series intensity array of J × Q samples. On the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number of samples T (T < J), are set so that adjacent unit sections overlap along the time axis, and for each unit section, spectral intensities corresponding to P frequencies are calculated. Each of the P frequencies obtained in the spectrum calculation step is corrected by multiplying it by Q; spectral intensities whose corrected frequency exceeds the maximum frequency before correction are deleted, leaving spectral intensities corresponding to the remaining Ph frequencies; and the start and end times of each unit section are corrected by a factor of 1/Q. Then, for each unit section, Ph codes are created for the Ph frequencies corrected in the spectrum correction step, each consisting of frequency information identifying the frequency, the corresponding spectral intensity, and time information identifying the start and end of the unit section.

According to the second aspect, each intensity array of the digitized acoustic signal is expanded along the time axis by a predetermined magnification; for each unit section of T intensity values, spectral intensities corresponding to P frequencies are calculated; the frequencies are multiplied by Q; spectral intensities for frequencies exceeding the maximum frequency before correction are deleted, leaving those for the remaining Ph frequencies; the times are scaled by 1/Q; and only then are Ph codes containing frequency and time created. This improves the time resolution of the analysis while maintaining frequency analysis accuracy equivalent to that of the prior art, making it possible to extract frequency variations, mainly in speech signals, with high accuracy.

In a third aspect of the present invention, in the first or second aspect, the intensity array is expanded along the time axis by a factor of Q using linear interpolation, without changing the sampling period. This lowers the frequencies of the acoustic signal as a whole to 1/Q and stretches the time axis by a factor of Q.

In a fourth aspect of the present invention, in any of the first to third aspects, the codes are MIDI-format codes: the note number serves as the frequency information, the velocity as the spectral intensity, and delta time 1 and delta time 2, which are relative times from the immediately preceding MIDI event, as the time information. A MIDI note-on event is created from the converted note number, velocity, and delta time 1, and a MIDI note-off event is created from the note number and delta time 2.
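A minimal sketch of how the fourth aspect's note-on and note-off events could be assembled, assuming each code carries absolute start and end times in MIDI ticks. The names `NoteCode` and `to_midi_events`, the tick unit, and channel 0 are hypothetical; only the status bytes 0x90 (note-on) and 0x80 (note-off) are standard MIDI. Delta time 1 and delta time 2 fall out as the gap to the immediately preceding event once all events are merged in time order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteCode:
    note: int      # MIDI note number (frequency information)
    velocity: int  # MIDI velocity (spectral intensity)
    start: int     # absolute start time of the unit section, in ticks (assumption)
    end: int       # absolute end time of the unit section, in ticks (assumption)


def to_midi_events(codes, channel=0):
    """Return (delta_ticks, status, note, velocity) tuples in playback order."""
    timed = []
    for c in codes:
        # The second tuple element orders note-offs (0) before note-ons (1) at the same tick.
        timed.append((c.start, 1, 0x90 | channel, c.note, c.velocity))
        timed.append((c.end, 0, 0x80 | channel, c.note, 0))
    timed.sort(key=lambda e: (e[0], e[1]))

    events, prev_tick = [], 0
    for tick, _, status, note, velocity in timed:
        # Delta time = ticks elapsed since the immediately preceding MIDI event.
        events.append((tick - prev_tick, status, note, velocity))
        prev_tick = tick
    return events


if __name__ == "__main__":
    codes = [NoteCode(60, 100, 0, 480), NoteCode(64, 90, 240, 720)]
    for event in to_midi_events(codes):
        print(event)
```

Working from absolute times and converting to deltas at the end keeps the bookkeeping simple when notes from different unit sections overlap.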

In a fifth aspect of the present invention, in the first aspect, the codes are MIDI-format codes: the note number serves as the frequency information, the velocity as the spectral intensity, and delta time 1 and delta time 2, which are relative times from the immediately preceding MIDI event, as the time information. A MIDI note-on event is created from the converted note number, velocity, and delta time 1, and a MIDI note-off event is created from the note number and delta time 2. Further, 12·log₂Q (the base-2 logarithm of Q multiplied by 12) is added to each note number; codes whose note number is 128 − 12·log₂Q or higher (that is, whose shifted note number would exceed 127) are deleted; and delta time 1 and delta time 2 of the remaining Ph codes are multiplied by 1/Q.
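The fifth-aspect correction can be sketched as follows. Assumptions, all hypothetical: codes hold absolute tick times rather than deltas (scaling every absolute time by 1/Q is equivalent to scaling every delta time by 1/Q, and it avoids having to fold a deleted event's delta into the next event); the semitone shift 12·log₂Q is rounded to an integer, which is exact only when Q is a power of two; corrected times are rounded to whole ticks; and the deletion test is applied to the note number before the shift.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteCode:
    note: int
    velocity: int
    start: int  # absolute ticks on the expanded time axis (assumption)
    end: int


def correct_for_expansion(codes, q):
    """Undo a Q-times time-axis expansion on note codes (hypothetical helper)."""
    shift = round(12 * math.log2(q))  # semitones equivalent to multiplying frequency by Q
    corrected = []
    for c in codes:
        if c.note >= 128 - shift:
            continue  # the shifted note number would exceed 127, the MIDI maximum
        corrected.append(
            NoteCode(c.note + shift, c.velocity, round(c.start / q), round(c.end / q))
        )
    return corrected


if __name__ == "__main__":
    codes = [NoteCode(60, 100, 0, 480), NoteCode(120, 80, 480, 960)]
    print(correct_for_expansion(codes, 2))
```

With Q = 2 the shift is exactly one octave (12 semitones), so note 60 becomes 72 and note 120 is dropped.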

In a sixth aspect of the present invention, in any of the first to fifth aspects, the spectrum is calculated as follows. A plurality of element signals that are candidate components of the section signal of a unit section are prepared. From among them, the element signal with the highest correlation with the section signal is selected as a harmonic signal, and a difference signal is obtained by subtracting from the section signal the contained signal, which is the product of the harmonic signal and the correlation value obtained for it. Taking the difference signal as the new section signal, the selection of a harmonic signal and the computation of a difference signal are repeated to obtain a new contained signal and a new difference signal; repeating this process yields P contained signals. Spectral intensities corresponding to the P frequencies are then calculated from the amplitude values of the contained signals.
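The loop in the sixth aspect is structurally a greedy matching-pursuit iteration. The sketch below is one way to realize it, under stated assumptions: element signals are unit-norm cosines and sines at each analysis frequency, and the spectral intensity of a frequency is taken as the root-sum-square of the correlation values assigned to it. Neither choice is specified in this section, and all function names are hypothetical. Setting `iterations` equal to P mirrors the claim's "P contained signals".

```python
import math


def unit_vector(v):
    norm = math.sqrt(sum(s * s for s in v))
    return [s / norm for s in v]


def make_elements(freqs, n, fs):
    """Unit-norm cosine and sine element signals for each analysis frequency.

    Assumes 0 < f < fs/2 so that neither vector is identically zero.
    """
    elements = []
    for p, f in enumerate(freqs):
        for wave in (math.cos, math.sin):
            v = [wave(2 * math.pi * f * t / fs) for t in range(n)]
            elements.append((p, unit_vector(v)))
    return elements


def greedy_harmonic_analysis(section, elements, num_freqs, iterations):
    """Repeatedly pick the element most correlated with the residual and subtract it."""
    residual = list(section)
    energy = [0.0] * num_freqs
    for _ in range(iterations):
        best_p, best_g, best_c = None, None, 0.0
        for p, g in elements:
            c = sum(r * gi for r, gi in zip(residual, g))  # correlation value
            if abs(c) > abs(best_c):
                best_p, best_g, best_c = p, g, c
        if best_g is None:  # residual is exactly zero; nothing left to extract
            break
        # Contained signal = harmonic signal * correlation; the difference signal
        # becomes the new section signal for the next iteration.
        residual = [r - best_c * gi for r, gi in zip(residual, best_g)]
        energy[best_p] += best_c ** 2
    # Assumed intensity measure: root-sum-square of coefficients per frequency.
    return [math.sqrt(e) for e in energy], residual
```

Because each subtraction removes the projection onto a unit-norm element, the residual energy never increases from one iteration to the next.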

In a seventh aspect of the present invention, in the first aspect, a plurality of second unit sections to be encoded, each consisting of a predetermined number of samples T (T < J), are also set on the original time-series intensity array of J samples, with adjacent second unit sections overlapping along the time axis, and for each second unit section, second spectral intensities corresponding to P frequencies are calculated. P second codes are created, each consisting of one of the P frequencies obtained in the second spectrum calculation, the corresponding second spectral intensity, and the start and end times of the second unit section. From the P second codes, Pl second codes with frequencies lower than the frequency range of the corrected Ph codes are extracted and combined with the corrected Ph codes contained in the unit sections corresponding to that second unit section, producing Ph + Pl combined codes consisting of the Ph corrected codes and the Pl second codes.
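A minimal sketch of the low-band merge in the seventh aspect (the eighth aspect applies the same idea to spectral intensities before encoding). Assumptions, all hypothetical: results for one section are given as `{note_number: intensity}` maps; the pairing between a second unit section and its unit sections has already been resolved; and "lower than the frequency range of the corrected codes" is read as below the lowest note the corrected codes can reach, which is the shift 12·log₂Q itself since original note numbers start at 0.

```python
import math


def merge_low_band(corrected, original, q):
    """Combine expanded-signal results with low notes from the original-signal analysis.

    corrected: {note_number: intensity} from the Q-times expanded signal, already
               shifted up by 12*log2(Q) semitones.
    original:  {note_number: intensity} from the unexpanded signal, matching section.
    """
    shift = round(12 * math.log2(q))  # lowest note number the corrected codes can reach
    merged = dict(corrected)
    for note, intensity in original.items():
        if note < shift:  # only notes the expanded analysis cannot represent
            merged[note] = intensity
    return merged


if __name__ == "__main__":
    print(merge_low_band({12: 50, 72: 100}, {5: 30, 12: 40, 60: 20}, 2))
```

Notes at or above the shift are taken only from the expanded analysis, which is the one with the better time resolution.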

According to the seventh aspect, frequency analysis is performed on the original acoustic signal, separately from the time-expanded acoustic signal, to create second codes, and these are combined with the codes obtained from the time-expanded signal. This supplies the low-frequency codes missing from the codes obtained from the time-expanded signal, so that even acoustic signals with important low-frequency components can be encoded and reproduced faithfully.

In an eighth aspect of the present invention, in the second aspect, a plurality of second unit sections to be encoded, each consisting of a predetermined number of samples T (T < J), are set on the original time-series intensity array of J samples, with adjacent second unit sections overlapping along the time axis, and for each second unit section, second spectral intensities corresponding to P frequencies are calculated. From these, second spectral intensities corresponding to Pl frequencies lower than the corrected Ph frequency range are extracted and combined with the spectral intensities for the corrected Ph frequencies contained in the unit sections corresponding to that second unit section, producing combined spectral intensities for Ph + Pl frequencies. In the encoding step, Ph + Pl codes are created for the combined Ph + Pl frequencies, each consisting of frequency information identifying the frequency, the corresponding combined spectral intensity, and time information identifying the start and end of the unit section.

According to the eighth aspect, frequency analysis is performed on the original acoustic signal, separately from the time-expanded acoustic signal, to obtain spectral intensities, which are combined with the spectral intensities obtained from the time-expanded signal before the codes are created. This supplies the low-frequency codes missing from the codes obtained from the time-expanded signal, so that even acoustic signals with important low-frequency components can be encoded and reproduced faithfully.

The present invention improves the time resolution of the analysis while maintaining frequency analysis accuracy equivalent to that of the prior art, making it possible to extract frequency variations, mainly in speech signals, with high accuracy.

FIG. 1 is a flowchart outlining the acoustic signal encoding method according to the present invention.
FIG. 2 is a diagram illustrating the concepts of time-axis expansion, frequency increase, and time-information reduction.
FIG. 3 is a flowchart showing a modification of the acoustic signal encoding method according to the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a flowchart outlining the acoustic signal encoding method according to the present embodiment. The method is carried out by a computer executing a program that records the detailed procedure of each step shown in FIG. 1. A general-purpose computer can be used, equipped with a CPU and memory for arithmetic processing, a storage device such as a hard disk for storing programs and data, a data input device for inputting data such as acoustic signals, input devices such as a keyboard and mouse for entering instructions, and a display device such as a liquid crystal display for showing necessary information. The acoustic signal encoding apparatus according to the present embodiment is realized by a general-purpose computer in which a program recording the detailed procedure of each step shown in FIG. 1 is installed.

First, the computer (encoding apparatus) reads the digital acoustic signal to be processed from the data input device. The digital acoustic signal is an analog acoustic signal sampled at a predetermined sampling frequency and quantization bit depth; this embodiment is described using a sampling frequency of 44.1 kHz and 16-bit quantization as an example. At 44.1 kHz, the digital acoustic signal is a sample sequence (sample array, i.e., intensity array) of 44,100 samples (intensity values) per second.

After reading the digital acoustic signal, the computer expands the samples constituting it along the time axis by a predetermined integer magnification Q (S1). Specifically, the number of samples is multiplied by Q: every Q-th position receives the value of an original sample, and the (Q − 1) samples in between receive values linearly interpolated from the original samples on either side. Letting x(j) be the value of sample j (j = 0, ..., J − 1) of the original acoustic signal, the computer calculates the value x′(j·Q + k) of each sample j·Q + k (0 ≤ k ≤ Q − 1) of the expanded acoustic signal according to Formula 1 below, where w is a real number given by k/(Q − 1), with 0 ≤ w ≤ 1.

[Formula 1]
$$x'(j \cdot Q + k) = (1 - w)\,x(j) + w\,x(j + 1)$$
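The sketch below implements the expansion step S1 and demonstrates the third aspect's claim that stretching at an unchanged sampling rate lowers every frequency to 1/Q. It takes w = k/Q, which places the original sample at k = 0 and the Q − 1 interpolated samples evenly spaced strictly between x(j) and x(j + 1), as the prose of S1 describes. With the definition w = k/(Q − 1) given alongside Formula 1, the sample at k = Q − 1 would coincide with x(j + 1). Holding the last sample (since x(J) does not exist) and the zero-crossing frequency estimator are illustrative assumptions; the function names are hypothetical.

```python
import math


def expand_time_axis(x, q):
    """Stretch x by an integer factor q using linear interpolation (rate unchanged)."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    out = []
    for j, current in enumerate(x):
        # Assumption: hold the last sample, since x(J) does not exist.
        nxt = x[j + 1] if j + 1 < len(x) else current
        for k in range(q):
            w = k / q  # k = 0 reproduces x(j); k = 1..q-1 lie strictly between
            out.append((1 - w) * current + w * nxt)
    return out


def zero_crossing_frequency(x, fs):
    """Rough frequency estimate: two sign changes per period."""
    changes = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return changes / 2 / (len(x) / fs)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 500 * t / fs + 0.1) for t in range(fs)]  # 1 s at 500 Hz
    stretched = expand_time_axis(tone, 2)
    print(len(tone), len(stretched))
    print(zero_crossing_frequency(tone, fs), zero_crossing_frequency(stretched, fs))
```

Run as a script, this reports 8000 and 16000 samples and an estimated frequency of roughly 500 Hz before and 250 Hz after expansion: the same number of cycles now spans twice as many samples.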

As a result of S1, the J samples constituting the digital acoustic signal are expanded to J × Q samples. FIG. 2(a) shows how the waveform changes under the expansion in S1. The waveforms in FIG. 2(a) are plots of sample values joined by line segments, but because there are many samples they appear as smooth curves. Applying Formula 1 turns the waveform shown on the left into the waveform shown on the right. For convenience of explanation, FIG. 2 shows the case Q = 2.

Next, the computer sets unit sections on the time-expanded samples (S2). The length of a unit section (number of samples T) is set in relation to the sampling frequency; at a sampling frequency of 44.1 kHz, at least 4096 samples are needed to analyze faithfully down to the low-frequency range. This embodiment therefore uses T = 4096 samples per unit section.
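A short worked calculation, added for illustration, of where the gain in time resolution comes from. Because T is counted on the expanded signal, each unit section spans only T/Q samples of the original signal. `FS` and `T` follow the embodiment's example values; the helper name is hypothetical.

```python
FS = 44_100  # sampling frequency in Hz (the embodiment's example)
T = 4096     # samples per unit section, counted on the expanded signal


def frame_span_ms(t, q, fs=FS):
    """Original-signal duration covered by a t-sample section of the q-times expanded signal."""
    return 1000.0 * t / (q * fs)


if __name__ == "__main__":
    for q in (1, 2, 4):
        print(f"Q={q}: {frame_span_ms(T, q):.1f} ms")
```

This prints about 92.9 ms for Q = 1, 46.4 ms for Q = 2 and 23.2 ms for Q = 4, while the analysis itself still works on 4096-sample sections.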

As disclosed in Patent Documents 1 to 5, unit sections are set by extracting samples sequentially from the beginning of the digital acoustic signal. The unit sections are set so that no sample is omitted, and preferably so that consecutive unit sections overlap. The interval between the starting points of successive unit sections (called the shift width) can be set according to various rules. The simplest is to fix the shift width, i.e., to keep the number of overlapping samples constant. For example, with T = 4096, the first unit section is j = 0 to 4095, the second is j = 2048 to 6143, the third is j = 4096 to 8191, and so on, each overlapping the previous one by 2048 (= T/2) samples. A smaller shift width improves time resolution but increases computation time. Moreover, if the shift width is made smaller than necessary, the linking conditions in the single-tone component linking process in S4 of FIG. 1 (described later) are no longer satisfied, and the linking process stops working properly. Therefore, to set a shift width suited to the state of the acoustic signal, this embodiment uses the approach disclosed in Patent Document 5: zero crossings at which the frequency changes markedly are selected by analyzing the density of zero-crossing intervals or by autocorrelation analysis, and the sample located at such a zero crossing is used as the start of a unit section.
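The fixed-shift rule with T = 4096 and a shift of T/2 can be sketched as below; it reproduces the example sections given above. The function name is hypothetical, and padding the final section with zeros where it runs past the end of the signal is an assumption.

```python
def fixed_shift_sections(num_samples, t=4096, shift=2048):
    """Inclusive (first, last) sample indices of overlapping unit sections.

    Sections are emitted until one reaches the end of the signal; the last section
    may run past num_samples, which this sketch assumes is zero-padded.
    """
    if not 0 < shift <= t:
        raise ValueError("shift must be in (0, t] so that no sample is skipped")
    sections, start = [], 0
    while True:
        sections.append((start, start + t - 1))
        if start + t >= num_samples:
            return sections
        start += shift


if __name__ == "__main__":
    print(fixed_shift_sections(10_000))
```

Requiring shift ≤ t enforces the rule that no sample is left out of every section.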

A zero crossing is a point at which a bipolar (positive/negative) acoustic signal crosses the signal's zero level; here it denotes a time at which the intensity value (amplitude) of the acoustic signal is zero. A digitized acoustic signal, however, does not necessarily contain a sample at the zero crossing of the analog signal. In practice, therefore, in addition to samples whose intensity is exactly 0, whenever the intensity changes from positive to negative or from negative to positive between two sampling points, one of those two points is treated as the zero crossing. Zero-crossing detection also requires the acoustic signal to be bipolar, so any DC component must be removed first; various well-known methods can be used for this, and a detailed description is omitted here. Unit sections basically start at samples located at zero crossings, but in some cases a unit section starts at a position other than a zero crossing so that the shift width between consecutive unit sections stays within a fixed range. Specifically, if the next zero crossing lies beyond the maximum shift width (for example, T/2), the unit section starts at the position corresponding to the maximum shift width even though it is not a zero crossing. Conversely, if the shift width would fall below the minimum shift width (for example, T/8), some zero crossings are skipped and the unit section starts at the first zero crossing beyond the minimum shift width; if no zero crossing exists between the minimum and maximum shift widths, the unit section again starts at the position corresponding to the maximum shift width.
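The start-selection rules above can be sketched as follows. Simplifying assumptions, all hypothetical: every zero crossing is a candidate (the selection of crossings with marked frequency change by interval density or autocorrelation is omitted); the later sample of a sign change is taken as the crossing; and DC removal is a plain mean subtraction.

```python
from bisect import bisect_left


def zero_crossings(x):
    """Indices treated as zero crossings: exact zeros, or the later sample of a sign change."""
    return [i for i in range(1, len(x)) if x[i] == 0 or x[i - 1] * x[i] < 0]


def section_starts(x, t=4096, min_shift=None, max_shift=None):
    """Start indices of unit sections, preferring zero crossings within [min_shift, max_shift]."""
    min_shift = t // 8 if min_shift is None else min_shift
    max_shift = t // 2 if max_shift is None else max_shift
    mean = sum(x) / len(x)  # crude DC removal (assumption; any DC filter would do)
    zc = zero_crossings([v - mean for v in x])
    starts = [0]
    while starts[-1] + t < len(x):
        s = starts[-1]
        i = bisect_left(zc, s + min_shift)  # skip crossings closer than min_shift
        if i < len(zc) and zc[i] <= s + max_shift:
            starts.append(zc[i])
        else:
            starts.append(s + max_shift)  # no crossing in range: fall back to max_shift
    return starts
```

Keeping the crossings sorted lets `bisect_left` find the first admissible crossing directly instead of rescanning the list for every section.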

Next, frequency analysis is performed on each unit section that has been set, and the spectrum of each unit section is calculated (S3). As disclosed in Patent Documents 1 to 5, the spectrum of each unit section is calculated by extracting 128 components through generalized harmonic analysis based on element signals (element functions) at the 128 analysis frequencies $f(n) = 440 \cdot 2^{(n-69)/12}$ corresponding to the MIDI note numbers n (0 ≤ n ≤ 127). The figures "128 kinds" and "128 components" are examples; in general, P components are extracted using P kinds of analysis frequencies. When the analysis frequencies are set according to note number n, the frequency spacing between adjacent note numbers widens as the frequency rises, so the analysis accuracy drops, particularly for n above 60. This embodiment therefore, as disclosed in Patent Document 3, performs the analysis with 128M element signals $f(n, m) = 440 \cdot 2^{(n-69+m/M)/12}$, obtained by dividing each note-number interval into M microtones, and extracts 128M components. Unless special encoding such as adding pitch-bend codes is performed in S4 of FIG. 1 (described later), the information on the M microtones of each note number is not needed, so the sum of the M microtone components is used as the component of that note number, and 128 components are extracted as a result.
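A minimal sketch of the analysis frequencies, assuming the A4 = 440 Hz reference used in the formulas above; the function names are hypothetical. It also illustrates why accuracy suffers at high note numbers: the spacing between adjacent note numbers is about 15.6 Hz around n = 60 but about 157 Hz around n = 100.

```python
def analysis_freq(n, m=0, M=1):
    """f(n, m) = 440 * 2**((n - 69 + m/M) / 12); m = 0, M = 1 gives f(n)."""
    return 440.0 * 2.0 ** ((n - 69 + m / M) / 12.0)


def semitone_spacing(n):
    """Gap in Hz between note number n and note number n + 1."""
    return analysis_freq(n + 1) - analysis_freq(n)


def collapse_microtones(E_nm):
    """Sum the M microtone components of each note: E(n) = sum over m of E(n, m)."""
    return [sum(row) for row in E_nm]
```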

As a concrete processing procedure on a computer, an intensity array E(n) (0 ≤ n ≤ 127) and a sub-frequency array S(n), one element per note number, are first set up with all initial values 0. Then the processing of [Formula 2] below is executed for 0 ≤ n ≤ 127 and 0 ≤ m ≤ M−1 to find the pair (nmax, mmax) that maximizes E(n, m).

[Formula 2]
$$A(n,m) = \frac{1}{T(n)} \sum_{i=0}^{T(n)-1} x(i)\,\sin\!\left(\frac{2\pi f(n,m)\,i}{f_s}\right)$$
$$B(n,m) = \frac{1}{T(n)} \sum_{i=0}^{T(n)-1} x(i)\,\cos\!\left(\frac{2\pi f(n,m)\,i}{f_s}\right)$$
$$E(n,m)^2 = A(n,m)^2 + B(n,m)^2$$
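The following sketch evaluates [Formula 2] for one candidate frequency and picks (nmax, mmax), in pure Python. Assumptions: x is a list of the samples of one unit section, fs and T are given in samples, and T(n) is taken as the largest whole number of element-signal periods that fits in T samples, as the next paragraph describes, rounded to an integer sample count; the fallback used when one period is longer than T is hypothetical, as are the function names. For a unit-amplitude sinusoid analyzed over whole periods, the 1/T(n) normalization gives E = 0.5.

```python
import math


def frame_length(f, fs, T):
    """Largest whole number of periods of f (in samples) not exceeding T."""
    k = math.floor(T * f / fs + 1e-9)
    if k == 0:
        return T  # assumption: period longer than T, use the whole section
    return min(T, round(k * fs / f))


def correlate(x, f, fs, T):
    """A, B and E of [Formula 2] for one candidate frequency f."""
    Tn = frame_length(f, fs, T)
    w = 2.0 * math.pi * f / fs
    A = sum(x[i] * math.sin(w * i) for i in range(Tn)) / Tn
    B = sum(x[i] * math.cos(w * i) for i in range(Tn)) / Tn
    return A, B, math.hypot(A, B)


def best_candidate(x, fs, T, notes, M=1):
    """(nmax, mmax) maximising E(n, m) over the given note numbers and microtones."""
    def E(nm):
        n, m = nm
        f = 440.0 * 2.0 ** ((n - 69 + m / M) / 12.0)
        return correlate(x, f, fs, T)[2]
    return max(((n, m) for n in notes for m in range(M)), key=E)
```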

In [Formula 2], T(n) is the analysis frame length. It is set to the largest integer multiple of the element signal's period that does not exceed the unit section length T; that is, with k a suitable integer, T(n) = k/f(n, m) (k whole periods, i.e. k·fs/f(n, m) samples). The frequency f(nmax, mmax) given by the pair (nmax, mmax) that maximizes E(n, m) is selected as the harmonic signal. Once (nmax, mmax) has been found, the computer uses A(nmax, mmax) and B(nmax, mmax) to perform the processing of [Formula 3] below, updating every element of the sample array x(i) (0 ≤ i ≤ T−1).

[Formula 3]
$$x(i) \leftarrow x(i) - A(n_{max}, m_{max})\,\sin\!\left(\frac{2\pi f(n_{max}, m_{max})\,i}{f_s}\right) - B(n_{max}, m_{max})\,\cos\!\left(\frac{2\pi f(n_{max}, m_{max})\,i}{f_s}\right)$$

[Formula 3] subtracts the extracted component from x(i). Next, the processing of [Formula 4] below is performed to update the intensity array E(n) and the sub-frequency array S(n).

[Formula 4]
$$E(n_{max}) \leftarrow E(n_{max}) + E(n_{max}, m_{max})$$
$$S(n_{max}) \leftarrow m_{max}$$

The computer performs the processing of [Formula 2] to [Formula 4] for all n (0 ≤ n ≤ 127) and thereby determines all values of E(n) and S(n).
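Putting [Formula 2] to [Formula 4] together gives the greedy loop below. It is a sketch under stated assumptions, not the embodiment's code: `extract_notes` and its parameters are hypothetical, the microtone count M is fixed for all notes, and it applies the optional rule from the text of excluding note numbers whose E(n) is already set. Because [Formula 2] normalizes by 1/T(n), a unit-amplitude sinusoid analyzed over whole periods yields A or B of about 0.5, so [Formula 3] as written removes about half of the selected component's amplitude; the sketch keeps the formulas exactly as written.

```python
import math


def _freq(n, m, M):
    """Analysis frequency f(n, m) = 440 * 2**((n - 69 + m/M) / 12)."""
    return 440.0 * 2.0 ** ((n - 69 + m / M) / 12.0)


def _frame_length(f, fs, T):
    """T(n): whole periods of f (in samples) that fit in T samples."""
    k = math.floor(T * f / fs + 1e-9)
    return T if k == 0 else min(T, round(k * fs / f))


def _correlate(x, f, fs, T):
    """A, B and E of [Formula 2] for one candidate frequency."""
    Tn = _frame_length(f, fs, T)
    w = 2.0 * math.pi * f / fs
    A = sum(x[i] * math.sin(w * i) for i in range(Tn)) / Tn
    B = sum(x[i] * math.cos(w * i) for i in range(Tn)) / Tn
    return A, B, math.hypot(A, B)


def extract_notes(x, fs, T, notes=range(128), M=1, iterations=128):
    """Greedy extraction of E(n) and S(n) following [Formula 2] to [Formula 4]."""
    x = list(x[:T])
    E = [0.0] * 128
    S = [0] * 128
    done = set()  # note numbers whose E(n) is already set
    for _ in range(iterations):
        cands = [(n, m) for n in notes if n not in done for m in range(M)]
        if not cands:
            break
        # [Formula 2]: pick (nmax, mmax) with the largest E(n, m).
        (A, B, e), nmax, mmax = max(
            ((_correlate(x, _freq(n, m, M), fs, T), n, m) for n, m in cands),
            key=lambda s: s[0][2],
        )
        w = 2.0 * math.pi * _freq(nmax, mmax, M) / fs
        # [Formula 3]: subtract the selected component from every sample.
        for i in range(T):
            x[i] -= A * math.sin(w * i) + B * math.cos(w * i)
        # [Formula 4]: accumulate the intensity and record the sub-frequency.
        E[nmax] += e
        S[nmax] = mmax
        done.add(nmax)
    return E, S
```

For a section containing a 440 Hz tone plus a half-amplitude 880 Hz tone, two iterations set E(69) to about 0.5 and E(81) to about 0.25 and leave every other E(n) at 0.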

To reduce the processing load, this embodiment sets the value of M variably according to the note number, for example so that the analyzed frequency interval is about 100 Hz; note numbers 60 and below are not subdivided (M = 1). Alternatively, at a slight cost in accuracy, S(n) may be determined in the first pass of [Formula 2], and the second and subsequent passes of [Formula 2] performed with m fixed at S(n), omitting the microtone analysis. Also, because the processing of [Formula 2] may analyze signal components of the same note number with different sub-frequencies several times, a note number for which E(n) and S(n) have already been set may be excluded from the candidates for the maximum of E(n, m).
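One way to read the variable-M rule is sketched below. The text only says that M depends on the note number, that note numbers up to 60 use M = 1, and that the analyzed frequency interval is, for example, about 100 Hz; interpreting that as "a microtone step of roughly 100 Hz" is an assumption, and `microtone_count` is a hypothetical helper.

```python
def microtone_count(n, target_hz=100.0, split_above=60):
    """Hypothetical M(n): no split up to note 60, else steps of about target_hz."""
    if n <= split_above:
        return 1
    # Width in Hz of the semitone between note n and note n + 1.
    spacing = 440.0 * (2.0 ** ((n - 68) / 12.0) - 2.0 ** ((n - 69) / 12.0))
    return max(1, round(spacing / target_hz))
```

Under this reading, M stays 1 through the middle register, rises to 2 around n = 100 and reaches 7 at n = 127, where a semitone spans roughly 746 Hz.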

Once the spectrum (128 frequency components) of each unit section has been calculated, code codes are created, each consisting of frequency information, the spectral intensity corresponding to each frequency, and time information that identifies the start and end of the unit section (S4). To create the code codes, time and duration information for each note number n is first added to the calculated spectrum, producing single-tone components of the form [start time, duration, main frequency n, sub-frequency S(n), intensity E(n)]. The start time may be any information that identifies the start of the unit section within the whole digital acoustic signal; this embodiment records the sample number, within the whole digital acoustic signal, of the first sample (i = 0) of the unit section (the absolute sample address, corresponding to j). Dividing this absolute sample address by the sampling frequency (44100) gives the time from the beginning of the acoustic signal. A feature of this embodiment is that the duration varies from one unit section to another: it is the difference up to the start time of the immediately following unit section (start time of the following unit section − start time of this unit section).
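The single-tone records and their timing can be sketched as follows. The record fields mirror [start time, duration, main frequency n, sub-frequency S(n), intensity E(n)]; the class and function names are hypothetical, zero-intensity components are skipped for brevity, and because the text does not say how long the last unit section lasts, the sketch assumes it runs to the end of the signal.

```python
from dataclasses import dataclass

FS = 44100  # sampling frequency used in the text


@dataclass
class SingleTone:
    start: int        # absolute sample address of the unit section's first sample
    length: int       # samples until the next unit section starts
    note: int         # main frequency n
    sub: int          # sub-frequency S(n)
    intensity: float  # E(n)


def section_tones(heads, spectra, subs, n_samples):
    """Build one record per non-zero component of every unit section."""
    tones = []
    for k, head in enumerate(heads):
        # Duration runs to the next head; the last section is assumed to run
        # to the end of the signal (not specified in the text).
        nxt = heads[k + 1] if k + 1 < len(heads) else n_samples
        for n, e in enumerate(spectra[k]):
            if e > 0:
                tones.append(SingleTone(head, nxt - head, n, subs[k][n], e))
    return tones


def seconds(sample_address, fs=FS):
    """Time from the beginning of the signal for an absolute sample address."""
    return sample_address / fs
```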

For each unit section set in S2, 128 single-tone components are created; S4 additionally links single-tone components across consecutive unit sections. Specifically, when the single-tone components with the same note number in consecutive unit sections satisfy a predetermined linking condition, the two components are linked. Any condition expressing continuity as the same sound may be used; in this embodiment, the two components are regarded as continuous, and the following component is linked to the preceding one, when the difference in frequency taking the sub-frequency into account (main frequency + sub-frequency) is less than a threshold Ndif, both intensities are at least a threshold Lmin, and the difference between the two intensities is less than a threshold Ldif. The main frequency, sub-frequency and intensity of the linked component are taken from the larger of the two components, and its duration is the sum of both durations. The specific thresholds in this embodiment are Ndif = 8/25 [unit: note numbers], Lmin = 1 [unit: 128-step velocity] and Ldif = 10 [unit: 128-step velocity]. Because linking is performed before conversion to code codes, each threshold is expressed in note-number or velocity units.
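A sketch of the linking rule with the embodiment's thresholds (Ndif = 8/25 note number, Lmin = 1 and Ldif = 10 in velocity units). Assumptions: intensities are already scaled to velocity units, the sub-frequency is stored as a fraction of a semitone (S(n)/M), the "larger" component is taken to be the one with the higher intensity, and two sections count as consecutive when one ends exactly where the next begins. The names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class Tone:
    start: int
    length: int
    note: int
    sub: float       # sub-frequency as a fraction of a semitone, S(n)/M
    velocity: float  # intensity already scaled to 128-step velocity


def can_link(a, b, ndif=8 / 25, lmin=1, ldif=10):
    """Linking condition for two same-note tones in adjacent unit sections."""
    return (a.start + a.length == b.start
            and abs((a.note + a.sub) - (b.note + b.sub)) < ndif
            and a.velocity >= lmin and b.velocity >= lmin
            and abs(a.velocity - b.velocity) < ldif)


def link_track(tones):
    """Link a time-ordered list of same-note tones, section by section."""
    out = []
    for t in tones:
        if out and can_link(out[-1], t):
            prev = out[-1]
            # Pitch and intensity come from the stronger tone; durations add up.
            big = prev if prev.velocity >= t.velocity else t
            out[-1] = replace(big, start=prev.start, length=prev.length + t.length)
        else:
            out.append(t)
    return out
```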

The linking of single-tone components with the same note number is repeated on the components of subsequent unit sections for as long as the linking condition is satisfied, and each resulting single-tone component [start time, duration, main frequency n, sub-frequency S(n), intensity E(n)] is converted into code codes. Any code code format may be used as long as it has frequency information, the spectral intensity corresponding to each frequency, and time information that identifies the start and end of the unit section; this embodiment converts to the MIDI format. Because MIDI treats the start and the end of a sound as separate events, this embodiment converts each single-tone component into two MIDI note events. Specifically, a note-on event for note number n is issued at the start time, with a velocity of $128\,(E(n)/E_{max})^{1/4}$, where Emax is the maximum value of the intensity E(n). In a Standard MIDI File, times must be given relative to the immediately preceding event (delta time); the time unit can be defined by any integer value, and times are converted, for example, to units of 1/1536 s. A note-off event for note number n is then issued at the end time given by start time + duration. At this point the duration is multiplied by a real number between 0 and 1, so that the note-off is issued early to allow for the release tail of the MIDI sound source, although the appropriate factor depends on the timbre of the sound source used. Using the duration unchanged causes no processing problem for the MIDI sound source, but the sound may then partly overlap the following note.
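The note-on/note-off conversion can be sketched as below. It follows the velocity formula 128·(E(n)/Emax)^{1/4} and the example tick unit of 1/1536 s; the rounding, the clamping to 1 to 127 (MIDI velocities are 7-bit, and a note-on with velocity 0 acts as a note-off), the shortening ratio of 0.9 and the tuple layout are assumptions. It produces (delta, kind, note, velocity) tuples rather than the bytes of a real Standard MIDI File.

```python
def velocity(E, Emax):
    """128 * (E / Emax) ** (1/4), rounded and clamped to 1..127."""
    return max(1, min(127, round(128 * (E / Emax) ** 0.25)))


def to_midi_events(tones, Emax, ratio=0.9, ticks_per_second=1536):
    """tones: (start_s, duration_s, note, E) -> [(delta_ticks, kind, note, vel)].

    ratio (0..1) shortens each note so the note-off precedes the release tail.
    """
    events = []
    for start, dur, note, E in tones:
        on = round(start * ticks_per_second)
        off = round((start + dur * ratio) * ticks_per_second)
        events.append((on, 1, note, velocity(E, Emax)))  # 1 = note-on
        events.append((off, 0, note, 0))                 # 0 = note-off
    events.sort(key=lambda e: (e[0], e[1]))  # note-offs first at equal times
    out, prev = [], 0
    for t, kind, note, vel in events:
        out.append((t - prev, "on" if kind else "off", note, vel))
        prev = t
    return out
```

For two one-second notes starting at 0 s and 0.5 s with ratio 0.75, the delta times are 0, 768, 384 and 768 ticks.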

When converting to MIDI code, the polyphony (number of simultaneous voices) that the MIDI sound source can handle must also be taken into account, so the polyphony is adjusted. If the MIDI sound source can handle 32 simultaneous voices, the number of note events in their sounding period (note-on state) is counted continuously along the time axis. Wherever a point with 32 simultaneous note events is found, the paired note-off event of each is searched for in the neighboring interval, each note-event pair is prioritized by the product (energy value) of its velocity and its duration (note-off time − note-on time), and low-priority (low-energy) note-event pairs are deleted locally so that the count does not exceed the specified number of voices (here, 32). "Locally" means only in the portions where more than 32 note events exist. At the same time, pairs whose velocity or duration is below a predetermined lower limit are deleted regardless of priority.
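A sketch of the local polyphony limit, assuming note-event pairs are given as (on_tick, off_tick, note, velocity) tuples; `limit_polyphony`, `vmin` and `dmin` are hypothetical names. Priority is velocity × duration, pairs below either lower limit are dropped first, and pairs are removed only at note-on times where more than `limit` notes sound at once.

```python
def limit_polyphony(pairs, limit=32, vmin=1, dmin=1):
    """pairs: (on_tick, off_tick, note, velocity). Returns the pairs kept."""
    def energy(p):
        return p[3] * (p[1] - p[0])  # velocity x duration

    # Pairs below either lower limit are dropped regardless of priority.
    kept = [p for p in pairs if p[3] >= vmin and p[1] - p[0] >= dmin]
    for t in sorted({p[0] for p in kept}):
        active = [p for p in kept if p[0] <= t < p[1]]
        # Delete only where more than `limit` notes sound at once ("locally").
        while len(active) > limit:
            victim = min(active, key=energy)
            active.remove(victim)
            kept.remove(victim)
    return kept
```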

Furthermore, when converting to MIDI code, the bit rate that the MIDI sound source can handle must also be taken into account, so the bit rate is adjusted. The number of note-on and note-off events is counted along the time axis in one-second intervals. Assuming an average code length of 5 bytes (40 bits) per event and a maximum bit rate of 9000 bps (bits/s) for the MIDI sound source, wherever an interval with more than 9000/40 = 225 events per second is found, the note-off or note-on event paired with each note-on or note-off event in that interval is searched for in the neighboring interval, each note-event pair is prioritized by the product (energy value) of its velocity and its duration (note-off time − note-on time), and low-priority (low-energy) pairs are deleted locally so that the count does not exceed the specified number of events (here, 225). At the same time, pairs whose velocity or duration is below a predetermined lower limit are deleted regardless of priority.
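The bit-rate check can be sketched in the same style, with times in seconds and one-second windows. With the text's figures, 9000 bps and 40 bits per event allow 9000 // 40 = 225 events per window; the function name and the tuple layout are assumptions, and removing a pair removes both of its events even when they fall in different windows.

```python
def limit_bitrate(pairs, max_bps=9000, bits_per_event=40, window=1.0, vmin=1, dmin=0.0):
    """pairs: (on_s, off_s, note, velocity). Keep at most max_bps // bits_per_event
    note-on/off events in every window, deleting low-energy pairs first."""
    limit = max_bps // bits_per_event  # 9000 // 40 = 225 events per second

    def energy(p):
        return p[3] * (p[1] - p[0])  # velocity x duration

    kept = [p for p in pairs if p[3] >= vmin and p[1] - p[0] >= dmin]
    if not kept:
        return kept
    last = max(p[1] for p in kept)
    w = 0
    while w * window <= last:
        lo, hi = w * window, (w + 1) * window

        def hits(p):
            # Number of this pair's events (0, 1 or 2) inside the window.
            return (lo <= p[0] < hi) + (lo <= p[1] < hi)

        while sum(hits(p) for p in kept) > limit:
            kept.remove(min((p for p in kept if hits(p)), key=energy))
        w += 1
    return kept
```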

Once the code codes have been created, each code code is corrected to undo the changes introduced by processing the signal after stretching it along the time axis (S5). Specifically, 12·log₂Q is first added to the note number of every note event (note-on or note-off event). For example, when Q = 4, the overall pitch is raised by 24 semitones (two octaves). This is done because multiplying the number of samples by Q in S1 lowered the frequency to 1/Q, so the frequency must be multiplied by Q to restore the original state. Code codes whose note number would exceed 127, the upper limit of the standard, after this correction are deleted; specifically, code codes whose note number before correction is 128 − 12·log₂Q or more are deleted.
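A sketch of the pitch part of S5 on (time, kind, note, velocity) tuples, which is an assumed layout. Rounding 12·log₂Q to an integer is also an assumption: it is exact for Q = 2, 4, 8, but for other integer Q the fractional part would be lost unless pitch bend were used. Filtering by note number removes the note-on and note-off of a deleted note together.

```python
import math


def shift_notes(events, Q):
    """Add 12*log2(Q) to every note number; drop events that would exceed 127."""
    shift = round(12 * math.log2(Q))  # exact for Q = 2, 4, 8, ...
    return [(t, kind, n + shift, v) for (t, kind, n, v) in events if n + shift <= 127]
```

With Q = 4 the shift is 24, so a note 104 (= 128 − 24) is deleted while a note 60 becomes 84.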

Next, the time of every note event (note-on or note-off time) is multiplied by 1/Q. This reduces the playing time of the whole MIDI code, and the sounding time of each note event, to 1/Q. This is done because multiplying the number of samples by Q in S1 made the total playing time Q times longer, so the times must be scaled by 1/Q to restore the original state. Because this increases the number of note events per unit time by a factor of Q, the bit-rate adjustment performed in S4 is executed again.
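The time part of S5, and why the bit-rate check has to be repeated, can be illustrated with two hypothetical helpers: one scales event times (in seconds) by 1/Q, the other reports the largest number of events falling in any one-second window.

```python
from collections import Counter


def compress_time(events, Q):
    """Multiply every event time (seconds) by 1/Q."""
    return [(t / Q, kind, n, v) for (t, kind, n, v) in events]


def peak_events_per_second(events):
    """Largest number of events in any one-second window [k, k + 1)."""
    counts = Counter(int(e[0] // 1.0) for e in events)
    return max(counts.values(), default=0)
```

Events one second apart become half a second apart for Q = 2, doubling the peak event density that the bit-rate check sees.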

As a result of S5, the frequency (pitch) is multiplied by Q and the time information by 1/Q. FIG. 2(b) shows how the MIDI events (note events of the MIDI code) change through the correction of S5, using musical notes for the case Q = 2. The correction turns the "mi" (E) note on the left into the "mi" one octave higher (twice the frequency) on the right, and the quarter note on the left into an eighth note, half as long, on the right.

The effect of the present invention is obtained even if the MIDI code obtained in S5 is output unchanged as the final code. However, because S5 adds 12·log₂Q to every note number, the entire set of code codes is shifted upward: part of the treble is deleted as falling outside the MIDI standard, and no code codes remain in the bass. The bass is therefore filled in by the processing of S6 to S9, described below.

The processing of S6 to S8 is basically the same as that of S2 to S4, except that S2 to S4 operate on the digital acoustic signal with the increased number of samples, whereas S6 to S8 operate on the original digital acoustic signal. The processing of S6 to S8 can therefore be realized with the known techniques disclosed in Patent Documents 1 to 5; in this embodiment, the processing shown in S6 to S8 is applied to the original digital acoustic signal.

When the code codes have been obtained in S8, the code codes obtained in S5 (code codes obtained from the digital acoustic signal with the increased number of samples; hereinafter "first code codes") and the code codes obtained in S8 (code codes obtained from the original digital acoustic signal; hereinafter "second code codes") are combined (S9). Because S5 adds the positive value 12·log₂Q to the note numbers of the first code codes, the first code codes contain no note events with note numbers 0 to 12·log₂Q − 1. (Conversely, note events whose note number before S5 was 128 − 12·log₂Q or more fall in the treble range outside the MIDI standard and are deleted.) A threshold of 12·log₂Q is therefore set on the note number: from the first code codes, those at or above the threshold are adopted, and from the second code codes, those below the threshold are adopted, yielding a combined code group. The velocities of note events from the second code codes may be used as they are, but because the length of acoustic signal corresponding to a unit section is Q times longer than for the first code codes, the velocities may become unbalanced. The following correction may therefore be applied as needed.

Specifically, the mean velocity V1 of all note events in the first code group whose note number is at or above the threshold is computed first. Next, the mean velocity V2 of all note events in the second code group whose note number is at or above the same threshold is computed. The velocity of every note event in the second code group whose note number is below the threshold is then multiplied by V1/V2. As a result, the bass part of the second code group, with its intensity corrected, is added below the first code group. The two groups can be merged by sorting them by their time information according to the Standard MIDI File specification, or they can be stored as independent tracks. Performing S6 to S9 makes it possible to generate code codes for the entire frequency range of the MIDI standard, which is particularly useful for music (instrument sounds) that extends into the bass. Because, after the combination, the first code group has on average Q times as many note events as the second code group and the total number of note events per unit time increases, the polyphony adjustment and the bit-rate adjustment performed in S4 are executed again.
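A sketch of S9 under assumed conventions: note pairs are (on, off, note, velocity) tuples, the threshold 12·log₂Q is rounded to an integer, V1 and V2 are means over the pairs' velocities, and scaled velocities are clamped to 1 to 127. The result is a single time-sorted list, corresponding to the "sort and mix" option rather than separate tracks; `merge_codes` is a hypothetical name.

```python
import math


def merge_codes(first, second, Q):
    """Combine first (expanded-signal) and second (original-signal) note pairs.

    Pairs are (on, off, note, velocity); the threshold is 12*log2(Q).
    """
    thr = round(12 * math.log2(Q))
    hi1 = [p[3] for p in first if p[2] >= thr]
    hi2 = [p[3] for p in second if p[2] >= thr]
    # V1 / V2: ratio of mean velocities in the band both groups cover.
    if hi1 and hi2 and sum(hi2):
        scale = (sum(hi1) / len(hi1)) / (sum(hi2) / len(hi2))
    else:
        scale = 1.0
    low = [(on, off, n, max(1, min(127, round(v * scale))))
           for (on, off, n, v) in second if n < thr]
    kept = [p for p in first if p[2] >= thr]
    return sorted(kept + low, key=lambda p: p[0])
```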

(Modification concerning the combination)
The above describes combining at the stage of MIDI event data processed into code codes. As a modification, a method of combining at the spectrum calculation stage, before the first and second code codes are created, is described with reference to FIG. 3. Steps S1 to S3 are the same as in FIG. 1; in place of S4, the spectrum is corrected in S10. This corresponds to S5 of FIG. 1: the frequencies of the calculated spectrum (both main and sub-frequencies) are multiplied by Q (in note-number units, 12·log₂Q is added), the spectral intensities corresponding to high frequencies pushed outside the MIDI standard are deleted, and the start time of each unit section is scaled by 1/Q. Meanwhile, S6 and S7 of FIG. 3 are the same as in FIG. 1; instead of performing S8 of FIG. 1, the spectrum corrected in S10 and the second spectrum are combined (S11). As in S5, because the positive value 12·log₂Q is added to the note numbers of the spectrum corrected in S10, that spectrum contains no spectral intensities for note numbers 0 to 12·log₂Q − 1. (Conversely, spectral intensities whose note number before S10 was 128 − 12·log₂Q or more fall in the treble range outside the MIDI standard and are deleted.) Therefore only the spectral intensities of the second spectrum below the threshold 12·log₂Q are adopted, yielding a combined spectrum. The spectral intensities of the second spectrum are used as they are; the intensity correction described in the previous section is normally not applied. (Because the length of acoustic signal corresponding to a unit section of the spectrum corrected in S10 is only 1/Q of that of the second spectrum, the intensity values may vary between the two. However, applying the correction described above would make the subsequent linking of single-tone components in S12 work poorly because of discontinuities in the intensity values.) Then, in creating the code codes in S12, single-tone components are linked including the spectral intensities of the second spectrum, and MIDI event data is created. With this combining method, data near the threshold from the two kinds of spectral intensity generated in S10 and S7 may be linked into a single code code, so code codes with higher coding efficiency than with the method of FIG. 1 can be generated. In addition, whereas the embodiment of FIG. 1 adjusts the polyphony twice (S4 and S9) and the bit rate three times (S4, S5 and S9), this method needs to perform each only once, in the final step S12.
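The spectral correction of S10 and the band split of S11 can be sketched as follows, with each unit section's spectrum stored as a sparse {note number: intensity} dict paired with its start time, a hypothetical representation. The linking in S12 is not shown; the sketch only produces the time-sorted (start, note, intensity) records that S12 would consume, and the function names are illustrative.

```python
import math


def correct_spectra(sections, Q):
    """S10: add 12*log2(Q) to note numbers, drop those above 127, scale starts by 1/Q."""
    s = round(12 * math.log2(Q))
    return [(start / Q, {n + s: e for n, e in spec.items() if n + s <= 127})
            for start, spec in sections]


def combine_spectra(first_corrected, second, Q):
    """S11: every corrected bin plus the second-spectrum bins below 12*log2(Q)."""
    s = round(12 * math.log2(Q))
    records = [(t, n, e) for t, spec in first_corrected for n, e in spec.items()]
    records += [(t, n, e) for t, spec in second for n, e in spec.items() if n < s]
    return sorted(records)
```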

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications are possible. For example, in the above embodiment the processing of S6 to S9 was added to fill in the bass, but, as noted above, the code codes obtained by the processing up to S5 alone, without filling in the bass, can still reproduce the original acoustic signal adequately. In particular, when the acoustic signal is human speech or music in which the bass has no significant role, the code codes obtained by the processing up to S5 are sufficient. Adding the processing of S6 to S9 allows acoustic signals whose bass is significant, such as instrument sounds, to be reproduced more faithfully.

Also, in the embodiment of FIG. 1, the polyphony is adjusted twice (S4 and S9) and the bit rate three times (S4, S5 and S9), but each adjustment only needs to be performed at least once in the end. Thus, when the bass is filled in, each adjustment may be performed once in S9; when it is not, once in S5. However, performing the adjustments at intermediate stages can reduce the overall processing load, so the arrangement may be changed as appropriate to the situation.

In the above embodiment, preferred examples of the processing of S2 to S4 and S6 to S8 were described in detail, but for these steps the techniques disclosed in the known Patent Documents 1 to 5 may be used without departing from the spirit of the present invention.

The present invention, which converts acoustic signals obtained by PCM or the like into code codes such as MIDI codes, can be used in the audio content production industry for broadcast media (terrestrial and BS digital radio and television broadcasting, etc.), communication media (CS broadcasting, Internet streaming, mobile phone services, portable music distribution services, etc.) and package media (CD, DVD, Blu-ray, memory IC cards, etc.).

Claims

An encoding method for encoding an acoustic signal given as a time-series array of J intensity values digitized at a predetermined sampling period,
A time-series expansion step of expanding the intensity array by a predetermined factor Q (Q being an integer) in the time-axis direction, converting it into an expanded intensity array of J × Q time-series values;
A section setting step of setting, for the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), while allowing adjacent unit sections to overlap in the time-axis direction;
A spectrum calculation step of calculating, for each unit section, spectral intensities corresponding to P kinds of frequencies;
An encoding step of creating, for each unit section, P code codes corresponding to the P kinds of frequencies obtained in the spectrum calculation step, each consisting of frequency information identifying the frequency, the spectral intensity corresponding to that frequency, and time information identifying the start and end of the unit section; and
A code code correction step of correcting the frequency information so that the frequency of each of the P code codes is multiplied by Q, deleting the code codes whose corrected frequency information exceeds the maximum frequency information before correction, and correcting the time information of each of the remaining Ph code codes so that the time axis is scaled by 1/Q;
A method for encoding an acoustic signal, comprising:

An encoding method for encoding an acoustic signal given as a time-series array of J intensity values digitized at a predetermined sampling period,
A time-series expansion step of expanding the intensity array by a predetermined factor Q (Q being an integer) in the time-axis direction, converting it into an expanded intensity array of J × Q time-series values;
A section setting step of setting, for the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), while allowing adjacent unit sections to overlap in the time-axis direction;
A spectrum calculation step of calculating, for each unit section, spectral intensities corresponding to P kinds of frequencies;
A spectrum correction step of correcting each of the P kinds of frequencies obtained in the spectrum calculation step so that it is multiplied by Q, deleting the spectral intensities corresponding to frequencies that exceed the maximum frequency before correction, thereby obtaining spectral intensities corresponding to the remaining Ph kinds of frequencies, and correcting the start and end times of each unit section so that they are scaled by 1/Q; and
An encoding step of creating, for each unit section, Ph code codes corresponding to the Ph kinds of frequencies corrected in the spectrum correction step, each consisting of frequency information identifying the frequency, the spectral intensity corresponding to that frequency, and time information identifying the start and end of the unit section;
A method for encoding an acoustic signal, comprising:
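A minimal sketch of the overlapping section setting and of the spectrum correction step of this claim. The hop size `hop` (how far each unit section advances), the frequency grid, and the sampling rate `fs` are assumptions for illustration; the claim only requires that adjacent sections overlap.

```python
import numpy as np


def unit_sections(n_samples, t, hop):
    """(start, end) sample indices of unit sections of length t, advanced by hop < t so they overlap."""
    return [(s, s + t) for s in range(0, n_samples - t + 1, hop)]


def correct_spectrum(freqs, spectrum, q, start, end, fs):
    """Spectrum correction for one unit section.

    freqs/spectrum: P analysis frequencies (Hz) and their intensities on the
    expanded signal; start/end: section bounds in expanded-signal samples.
    Returns the Ph retained frequencies, their intensities, and start/end
    times in seconds on the original time axis (1/q of the expanded one).
    """
    scaled = freqs * q
    keep = scaled <= freqs.max()  # drop frequencies above the pre-correction maximum
    return scaled[keep], spectrum[keep], start / (fs * q), end / (fs * q)


sections = unit_sections(n_samples=16, t=8, hop=4)
freqs = np.array([100.0, 200.0, 400.0, 800.0])
spectrum = np.array([1.0, 2.0, 3.0, 4.0])
print(sections)
print(correct_spectrum(freqs, spectrum, q=2, start=4, end=12, fs=8000))
```

Here the 800 Hz bin would map to 1600 Hz, beyond the original 800 Hz ceiling, so only three of the four intensities survive as the Ph entries.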

In claim 1 or claim 2,
A method for encoding an acoustic signal, characterized in that, in the time-series expansion step, linear interpolation is applied to the intensity array in the time axis direction without changing the sampling period, so that all frequencies of the acoustic signal are lowered to 1/Q and the time axis is extended by a factor of Q.
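The time-series expansion by linear interpolation can be sketched with NumPy's `np.interp`. The test tone, sampling rate, and the choice to hold the last sample for the final Q − 1 output positions are assumptions of this sketch; the claim only specifies linear interpolation at an unchanged sampling period.

```python
import numpy as np


def expand_time_series(x, q):
    """Stretch x by integer factor q along the time axis with linear interpolation,
    keeping the sampling period, so every frequency drops to 1/q and length becomes J*q."""
    x = np.asarray(x, dtype=float)
    j = len(x)
    pos = np.arange(j * q) / q  # output sample positions in units of input samples
    # np.interp holds the last input value for positions past index j - 1
    return np.interp(pos, np.arange(j), x)


fs = 8000
n = np.arange(800)
tone = np.sin(2 * np.pi * 1000 * n / fs)  # 1000 Hz test tone
stretched = expand_time_series(tone, 2)
peak = np.fft.rfftfreq(len(stretched), 1 / fs)[np.argmax(np.abs(np.fft.rfft(stretched)))]
print(len(stretched), peak)  # 1600 samples, peak near 500 Hz
```

Played back at the same sampling period, the stretched array lasts twice as long and its dominant component sits at half the original frequency, which is what lets the later steps analyse high-pitched content with more samples per period.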

In any one of Claims 1-3,
A method for encoding an acoustic signal, wherein the encoding step performs encoding in the MIDI format, using a note number as the frequency information of each code, a velocity as the spectrum intensity, and delta time 1 and delta time 2, each a time relative to the preceding MIDI event, as the time information, and creates a MIDI note-on event from the converted note number, velocity, and delta time 1, and a MIDI note-off event from the note number and delta time 2.
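How one code might become a note-on/note-off pair in a Standard MIDI File track is sketched below. Delta times use the standard MIDI variable-length quantity encoding; the helper names, the channel default, the use of status 0x80 for note-off, and the assumption that each note-off follows its own note-on directly in the stream (so delta time 2 equals the note length) are simplifications of this sketch.

```python
def vlq(n):
    """Encode a non-negative integer as a MIDI variable-length quantity (7 bits per byte)."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # continuation bit on every byte but the last
        n >>= 7
    return bytes(reversed(out))


def note_events(note, velocity, delta1, delta2, channel=0):
    """Note-on after delta1 ticks, then note-off delta2 ticks after that note-on."""
    on = vlq(delta1) + bytes([0x90 | channel, note, velocity])
    off = vlq(delta2) + bytes([0x80 | channel, note, 0])
    return on + off


print(note_events(60, 100, 0, 480).hex(" "))  # 00 90 3c 64 83 60 80 3c 00
```

When several codes overlap in time, their events would have to be merged in time order and the delta times recomputed relative to whichever event actually precedes each one.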

In claim 1,
The encoding step performs encoding in the MIDI format, using a note number as the frequency information of each code, a velocity as the spectrum intensity, and delta time 1 and delta time 2, each a time relative to the preceding MIDI event, as the time information, and creates a MIDI note-on event from the converted note number, velocity, and delta time 1, and a MIDI note-off event from the note number and delta time 2;
The code correcting step adds 12·log₂Q (twelve times the base-2 logarithm of Q) to each note number,
deletes each code whose note number before the addition is 128 − 12·log₂Q or more (that is, whose corrected note number would exceed 127), and multiplies delta time 1 and delta time 2 of the remaining Ph codes by 1/Q. A method for encoding an acoustic signal characterized as above.
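The note-number correction can be sketched as below. Two assumptions go beyond the claim text: codes are held with absolute start/end tick times (scaling absolute times by 1/Q is equivalent to scaling the delta times by 1/Q, and deriving delta times after deletion keeps a dropped code from shifting later events), and 12·log₂Q is rounded, which is exact only when Q is a power of two.

```python
import math


def correct_midi_codes(codes, q):
    """Shift notes up by 12*log2(q) semitones, drop notes that would pass 127,
    and scale times by 1/q.

    codes: (note, velocity, on_tick, off_tick) with absolute tick times, an
    assumed representation; delta times are derived after deletion so that a
    dropped code does not shift the events that follow it.
    """
    shift = round(12 * math.log2(q))  # exact only when q is a power of two
    limit = 128 - shift
    corrected = []
    for note, vel, on, off in codes:
        if note >= limit:
            continue  # corrected note number would be 128 or more
        corrected.append((note + shift, vel, round(on / q), round(off / q)))
    return corrected


codes = [(60, 100, 0, 960), (120, 80, 960, 1920)]
print(correct_midi_codes(codes, 2))  # [(72, 100, 0, 480)]
```

For Q = 2 the shift is one octave (12 semitones), so any code with note number 116 or above is deleted because it would land at 128 or higher.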

In any one of Claims 1-5,
The spectrum calculation step includes:
An element signal preparation step of preparing a plurality of element signals serving as components of the section signal of the unit section;
A harmonic signal selection step of selecting, from among the plurality of element signals, the element signal having the highest correlation value with the section signal as a harmonic signal;
A difference signal calculation step of obtaining a difference signal by subtracting, from the section signal, a contained signal given by the product of the harmonic signal and the correlation value obtained for that harmonic signal;
An acoustic signal encoding method, wherein the difference signal is used as a new section signal and the harmonic signal selection step and the difference signal calculation step are executed repeatedly, each time obtaining a new contained signal and a new difference signal, until P contained signals are obtained, and the spectrum intensities corresponding to the P kinds of frequencies are calculated from the amplitude values of the obtained contained signals.
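The iterative select-and-subtract loop of this claim resembles matching pursuit. In the sketch below the element signals are assumed to be unit-norm cosines and sines at a handful of candidate frequencies; the actual element signals, candidate grid, and any generalized-harmonic-analysis refinements used by the applicant are not specified here. The "amplitude value" is taken as the peak absolute value of each contained signal.

```python
import numpy as np


def element_signals(freqs, t, fs):
    """Unit-norm cosine and sine element signals of length t for each candidate frequency."""
    n = np.arange(t) / fs
    atoms, labels = [], []
    for f in freqs:
        for wave in (np.cos, np.sin):
            a = wave(2 * np.pi * f * n)
            norm = np.linalg.norm(a)
            if norm > 1e-12:  # skip degenerate atoms (e.g. sine at 0 Hz)
                atoms.append(a / norm)
                labels.append(f)
    return np.array(atoms), np.array(labels)


def extract_components(section, atoms, labels, p):
    """Repeat p times: pick the element with the largest |correlation|, subtract
    correlation * element (the contained signal), continue on the difference."""
    residual = np.asarray(section, dtype=float).copy()
    found = []
    for _ in range(p):
        corr = atoms @ residual                # correlation with every element signal
        k = int(np.argmax(np.abs(corr)))       # harmonic signal
        contained = corr[k] * atoms[k]
        residual = residual - contained        # difference signal
        found.append((float(labels[k]), float(np.max(np.abs(contained)))))
    return found, residual


fs, t = 8000, 800
n = np.arange(t) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * n) + 0.25 * np.cos(2 * np.pi * 500 * n)
atoms, labels = element_signals([250.0, 500.0, 1000.0, 2000.0], t, fs)
found, residual = extract_components(x, atoms, labels, p=2)
print(found)  # approximately [(1000.0, 0.5), (500.0, 0.25)]
```

Because each pass removes the strongest remaining component, a spurious peak that merely leaks from a strong neighbour tends to disappear once that neighbour is subtracted, rather than being reported as an extra frequency.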

In claim 1,
A second section setting step of setting, on the original intensity array of J time-series samples, a plurality of second unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent second unit sections overlap in the time axis direction;
A second spectrum calculation step of calculating, for each second unit section, second spectrum intensities corresponding to P kinds of frequencies;
A second encoding step of creating P second codes, each consisting of one of the P frequencies obtained in the second spectrum calculation step, the second spectrum intensity corresponding to that frequency, and the start time and end time of the second unit section;
A code synthesis step of extracting, from the P second codes created in the second encoding step, the Pl second codes whose frequencies are lower than the frequency range of the Ph codes corrected in the code correcting step, and adding them to the Ph corrected codes included in the unit section corresponding to the second unit section, thereby creating Ph + Pl synthesized codes consisting of the Ph corrected codes and the Pl second codes;
A method for encoding an acoustic signal, further comprising the above steps.
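A sketch of the code synthesis step for one unit section. The tuple layout `(freq_hz, intensity, start_s, end_s)`, the lowest analysis frequency `f_min`, and the reading of "the frequency range of the Ph corrected codes" as starting at Q × `f_min` are assumptions made for illustration.

```python
def synthesize_codes(corrected, second, q, f_min):
    """Code synthesis for one unit section.

    corrected: Ph corrected codes from the expanded analysis; second: P codes
    from the unexpanded analysis of the matching second unit section. Codes are
    (freq_hz, intensity, start_s, end_s) tuples (an assumed layout). After the
    x q correction, corrected codes cannot lie below q * f_min, so only second
    codes below that bound are taken (the Pl codes).
    """
    bound = q * f_min
    low = [c for c in second if c[0] < bound]
    return sorted(corrected + low, key=lambda c: (c[2], c[0]))


corrected = [(110.0, 0.7, 0.0, 0.1)]
second = [(41.2, 0.6, 0.0, 0.1), (110.0, 0.5, 0.0, 0.1)]
print(synthesize_codes(corrected, second, q=2, f_min=27.5))
```

The low band that the Q-fold correction pushes out of range is thus filled from the unexpanded analysis, while the band both analyses cover is taken from the expanded one.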

In claim 2,
A second section setting step of setting, on the original intensity array of J time-series samples, a plurality of second unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent second unit sections overlap in the time axis direction;
A second spectrum calculation step of calculating, for each second unit section, second spectrum intensities corresponding to P kinds of frequencies;
A spectrum synthesis step of extracting, from the second spectrum intensities corresponding to the P kinds of frequencies created in the second spectrum calculation step, the second spectrum intensities corresponding to Pl kinds of frequencies lower than the range of the Ph kinds of frequencies corrected in the spectrum correction step, and adding them to the spectrum intensities corresponding to the Ph kinds of corrected frequencies in the unit section corresponding to the second unit section, thereby creating synthesized spectrum intensities corresponding to Ph + Pl kinds of synthesized frequencies, consisting of the spectrum intensities for the Ph kinds of corrected frequencies and the second spectrum intensities for the Pl kinds of frequencies;
An acoustic signal encoding method further having the above steps, wherein the encoding step creates, corresponding to the Ph + Pl kinds of frequencies synthesized in the spectrum synthesis step, Ph + Pl codes each consisting of frequency information identifying the frequency, the synthesized spectrum intensity corresponding to that frequency, and time information identifying the start and end of the unit section.
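The spectrum synthesis step works on intensity arrays rather than finished codes. In this sketch, frequencies and intensities are assumed to be parallel NumPy arrays, and the lower end of the corrected range is taken as the smallest corrected frequency on the grid.

```python
import numpy as np


def synthesize_spectrum(corr_freqs, corr_spec, second_freqs, second_spec):
    """Spectrum synthesis: prepend the second spectrum's components that lie below the
    corrected (Ph) frequency range, giving Ph + Pl frequencies sorted ascending."""
    low = second_freqs < corr_freqs.min()  # the Pl frequencies
    freqs = np.concatenate([second_freqs[low], corr_freqs])
    spec = np.concatenate([second_spec[low], corr_spec])
    order = np.argsort(freqs)
    return freqs[order], spec[order]


second_freqs = np.array([100.0, 200.0, 400.0, 800.0])
second_spec = np.array([1.0, 2.0, 3.0, 4.0])
corr_freqs = np.array([200.0, 400.0, 800.0])
corr_spec = np.array([10.0, 20.0, 30.0])
print(synthesize_spectrum(corr_freqs, corr_spec, second_freqs, second_spec))
```

The encoding step would then create one code per entry of the combined arrays, as in claim 2 but over Ph + Pl frequencies.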

An encoding device for encoding an acoustic signal given as a time-series intensity array of J samples digitized at a predetermined sampling period,
Time-series expansion means for expanding the intensity array by a predetermined magnification Q (Q is an integer) in the time axis direction, converting it into an expanded intensity array of J × Q time-series samples;
Section setting means for setting, on the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent unit sections overlap in the time axis direction;
Spectrum calculation means for calculating, for each unit section, spectrum intensities corresponding to P kinds of frequencies;
Encoding means for creating, for each unit section, P codes corresponding to the P kinds of frequencies obtained by the spectrum calculation means, each code consisting of frequency information identifying the frequency, the spectrum intensity corresponding to that frequency, and time information identifying the start and end of the unit section;
Code correcting means for correcting the frequency information so that the frequencies of the P codes are multiplied by Q, deleting each code whose corrected frequency information exceeds the maximum frequency information before correction, and correcting the time information of the remaining Ph codes so that the time axis is scaled by 1/Q;
A device for encoding an acoustic signal, comprising the above means.

An encoding device for encoding an acoustic signal given as a time-series intensity array of J samples digitized at a predetermined sampling period,
Time-series expansion means for expanding the intensity array by a predetermined magnification Q (Q is an integer) in the time axis direction, converting it into an expanded intensity array of J × Q time-series samples;
Section setting means for setting, on the expanded intensity array, a plurality of unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent unit sections overlap in the time axis direction;
Spectrum calculation means for calculating, for each unit section, spectrum intensities corresponding to P kinds of frequencies;
Spectrum correction means for multiplying each of the P kinds of frequencies obtained by the spectrum calculation means by Q, deleting the spectrum intensities corresponding to frequencies that, after correction, exceed the maximum frequency before correction, retaining the spectrum intensities corresponding to the remaining Ph kinds of frequencies, and correcting the start and end times of each unit section so that they are multiplied by 1/Q;
Encoding means for creating, for each unit section, Ph codes corresponding to the Ph kinds of frequencies corrected by the spectrum correction means, each code consisting of frequency information identifying the frequency, the spectrum intensity corresponding to that frequency, and time information identifying the start and end of the unit section;
A device for encoding an acoustic signal, comprising the above means.

In claim 9 or claim 10,
An apparatus for encoding an acoustic signal, characterized in that the time-series expansion means applies linear interpolation to the intensity array in the time axis direction without changing the sampling period, so that all frequencies of the acoustic signal are lowered to 1/Q and the time axis is extended by a factor of Q.

In any one of Claims 9-11,
An apparatus for encoding an acoustic signal, wherein the encoding means performs encoding in the MIDI format, using a note number as the frequency information of each code, a velocity as the spectrum intensity, and delta time 1 and delta time 2, each a time relative to the preceding MIDI event, as the time information, and creates a MIDI note-on event from the converted note number, velocity, and delta time 1, and a MIDI note-off event from the note number and delta time 2.

In claim 9,
The encoding means performs encoding in the MIDI format, using a note number as the frequency information of each code, a velocity as the spectrum intensity, and delta time 1 and delta time 2, each a time relative to the preceding MIDI event, as the time information, and creates a MIDI note-on event from the converted note number, velocity, and delta time 1, and a MIDI note-off event from the note number and delta time 2;
The code correcting means adds 12·log₂Q (twelve times the base-2 logarithm of Q) to each note number, deletes each code whose note number before the addition is 128 − 12·log₂Q or more (that is, whose corrected note number would exceed 127), and multiplies delta time 1 and delta time 2 of the remaining Ph codes by 1/Q. An apparatus for encoding an acoustic signal characterized as above.

In any one of Claims 9-13,
The spectrum calculation means includes:
Element signal preparation means for preparing a plurality of element signals serving as components of the section signal of the unit section;
Harmonic signal selection means for selecting, from among the plurality of element signals, the element signal having the highest correlation value with the section signal as a harmonic signal;
Difference signal calculation means for obtaining a difference signal by subtracting, from the section signal, a contained signal given by the product of the harmonic signal and the correlation value obtained for that harmonic signal;
An apparatus for encoding an acoustic signal, wherein, using the difference signal as a new section signal, the harmonic signal selection means and the difference signal calculation means repeat the process of obtaining a new contained signal and a new difference signal until P contained signals are obtained, and the spectrum intensities corresponding to the P kinds of frequencies are calculated from the amplitude values of the obtained contained signals.

In claim 9,
Second section setting means for setting, on the original intensity array of J time-series samples, a plurality of second unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent second unit sections overlap in the time axis direction;
Second spectrum calculation means for calculating, for each second unit section, second spectrum intensities corresponding to P kinds of frequencies;
Second encoding means for creating P second codes, each consisting of one of the P frequencies obtained by the second spectrum calculation means, the second spectrum intensity corresponding to that frequency, and the start time and end time of the second unit section;
Code synthesis means for extracting, from the P second codes created by the second encoding means, the Pl second codes whose frequencies are lower than the frequency range of the Ph codes corrected by the code correcting means, and adding them to the Ph corrected codes included in the unit section corresponding to the second unit section, thereby creating Ph + Pl synthesized codes consisting of the Ph corrected codes and the Pl second codes;
A device for encoding an acoustic signal, further comprising the above means.

In claim 10,
Second section setting means for setting, on the original intensity array of J time-series samples, a plurality of second unit sections to be encoded, each consisting of a predetermined number T of samples (T < J), such that adjacent second unit sections overlap in the time axis direction;
Second spectrum calculation means for calculating, for each second unit section, second spectrum intensities corresponding to P kinds of frequencies;
Spectrum synthesis means for extracting, from the second spectrum intensities corresponding to the P kinds of frequencies created by the second spectrum calculation means, the second spectrum intensities corresponding to Pl kinds of frequencies lower than the range of the Ph kinds of frequencies corrected by the spectrum correction means, and adding them to the spectrum intensities corresponding to the Ph kinds of corrected frequencies in the unit section corresponding to the second unit section, thereby creating synthesized spectrum intensities corresponding to Ph + Pl kinds of synthesized frequencies, consisting of the spectrum intensities for the Ph kinds of corrected frequencies and the second spectrum intensities for the Pl kinds of frequencies;
An acoustic signal encoding apparatus further having the above means, wherein the encoding means creates, corresponding to the Ph + Pl kinds of frequencies synthesized by the spectrum synthesis means, Ph + Pl codes each consisting of frequency information identifying the frequency, the synthesized spectrum intensity corresponding to that frequency, and time information identifying the start and end of the unit section.

A program for causing a computer to function as the acoustic signal encoding device according to any one of claims 9 to 16.